The Apache Software Foundation (ASF) has been a major driver of Web technology and a very visible publisher of useful open source tools since the early days of the Web. Some ASF tools, like the Apache HTTP Server and Tomcat, are exceptionally well known, but others are hidden gems.
If we do a little digging we can unearth some valuable ASF tools, some of which may be integrated into commercial tools you're already using. The Apache License explicitly permits commercial use; its main requirement is that copyright and license notices be preserved in derived works, including proprietary ones.
Many of the hidden beauties are Apache Incubator projects. Some are top-level Apache projects that have been around for a while, but haven't yet achieved public visibility.
Some of these incubator projects move very quickly into wide use, while others can "gestate" for a long time. The ASF has a formal, well-evolved process for working with open source developers to sort through volumes of software and identify the most promising projects.
Let's begin our review with some long-running ASF projects that don't get as much publicity. Tools from these projects work behind the scenes in many applications and development projects.
- Xerces and Xalan -- Xerces is an XML parser toolkit and Xalan is an XSLT stylesheet processor. Both are available in Java and C++ and support most of the current XML standards.
- Apache Commons -- A huge collection of well-designed Java classes demonstrating the object-oriented programming principle of reusable code.
- HttpComponents -- This project spun off from the Commons when it became obvious that many developers were only interested in HTTP tools.
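To give a feel for what an XSLT processor such as Xalan does, here is a minimal sketch that applies a stylesheet to an XML document through the standard JAXP interfaces in the JDK. To my understanding, the default JAXP implementation bundled with common JDKs is derived from Xerces and Xalan, but the sketch does not depend on that. The class name, sample document, and stylesheet are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical sample input, made up for this example.
    public static final String XML =
        "<projects><project>Xerces</project><project>Xalan</project></projects>";

    // Hypothetical stylesheet: emit each project name followed by a semicolon.
    public static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" "
      + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"projects/project\">"
      + "<xsl:value-of select=\".\"/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Parse the stylesheet, then run the XML document through it.
    public static String transform(String xml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XML, XSLT)); // prints "Xerces;Xalan;"
    }
}
```

Because the code talks only to the JAXP factory, a different XSLT processor placed on the classpath can typically be used without changing the calling code.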
Rising stars in the Apache firmament
In this category we find projects which have only recently attracted a lot of attention. After being nurtured in the Apache Incubator system they have become top-level ASF projects and are used as a starting point for commercial services and software offerings.
- Lucene -- A text indexing and search library written entirely in Java, Lucene started life as a SourceForge project, entered the ASF ecosystem in 2001 and became a top-level project in 2005. The basic technology has supported a number of related subprojects. Commercial developers have built substantial businesses offering Lucene-enabled products and services.
- Hadoop -- A distributed computing toolkit inspired by Google's MapReduce (for distributing jobs and combining the results) and the Google File System (for efficient handling of very large amounts of data), Hadoop rapidly rose to become an Apache top-level project. It is extensively supported and used by organizations with big distributed processing jobs, such as Yahoo, and by IBM's Watson program of "Jeopardy!" fame, where it was combined with UIMA.
- UIMA -- The Unstructured Information Management Architecture (UIMA) provides a scalable analytical framework which facilitates content analysis of bulk text resources. UIMA has made rapid progress: it entered the Apache Incubator in 2006 and became a top-level Apache project in 2010. Version 1.0 of the specification was accepted as a standard by the Organization for the Advancement of Structured Information Standards (OASIS) in 2009. This sort of capability fits right in with the goals of machine learning and the Semantic Web.
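The "distributing jobs and combining the results" idea behind MapReduce can be sketched in a few lines of plain Java. This is a single-process illustration of the map, shuffle, and reduce phases using a word count, not the Hadoop API; the class and method names are hypothetical. In a real cluster, each input split would be mapped on a different node and the grouped values would be reduced in parallel.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {
    // Map phase: emit a (word, 1) pair for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run the three phases in sequence inside one process.
    public static Map<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String split : splits) {
            emitted.addAll(map(split));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("Apache Hadoop", "Apache Lucene and Apache UIMA");
        // prints {and=1, apache=3, hadoop=1, lucene=1, uima=1}
        System.out.println(wordCount(splits));
    }
}
```

The map and reduce functions hold no shared state, which is what makes it possible to spread the same logic across many machines.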
Potentially interesting ASF projects
Many ASF projects are progressing in relative obscurity. Some may never make a big impact, but I recommend keeping an eye on the following.
- Mahout -- The goal of the Mahout project is to build machine learning and data mining tools that scale to large text data sets by using Hadoop's MapReduce model and distributed file system. A number of clustering and pattern recognition algorithms have been implemented. Mahout is already in commercial use for analysis of customer behavior.
- Apache FOP -- The Formatting Objects Processor toolkit takes XML input in the W3C's XSL-FO document formatting standard and produces output in PDF and other formats. The full W3C specification is so complex that the recently released FOP version 1.0 meets only the basic specification.
- OpenNLP -- The OpenNLP project is creating a Java-based "machine learning based toolkit for the processing of natural language text." All attempts at natural language understanding require common tasks such as parsing and identifying parts of speech. OpenNLP hopes to facilitate language processing projects by providing basic tools so developers won't have to reinvent the wheel.
- VXQuery -- The Versatile XQuery project aims to create a Java implementation of the XML Query language that gets around the requirement of having a complete DOM (Document Object Model) in memory before a query can be applied.
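The problem VXQuery is targeting, answering a query without first building a complete DOM, can be illustrated with the streaming StAX parser that ships with the JDK. This is a sketch of the general streaming idea only, not VXQuery's API or implementation; the class name and sample document are hypothetical. It answers roughly what the XPath expression count(//project) would, while holding only the current parse event in memory.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingQuerySketch {
    // Count elements with the given local name by walking parse events
    // one at a time; no document tree is ever constructed.
    public static int countElements(String xml, String name) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample input, made up for this example.
        String xml = "<projects><project>FOP</project><project>Mahout</project>"
                   + "<project>OpenNLP</project></projects>";
        System.out.println(countElements(xml, "project")); // prints 3
    }
}
```

With a DOM parser the memory needed grows with the size of the document; with the event loop above it stays roughly constant, which matters when the input is too large to load whole.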
The ever-increasing number of projects and the growing developer participation at the Apache Software Foundation show that the ASF model of open source tools, which commercial projects are free to use, fills an important niche in the computing ecosystem. Take a look; you might find some real jewels.