
Success with Java-based Hadoop demands a variety of skills

Increased interest in 'big data' analytics is prompting development team managers to consider Hadoop, which calls for new programming skills.

A surge in interest in "big data" analytics is leading many development team managers to consider Hadoop technology. When they do, they also need to take inventory of the skills their teams have available for working with Hadoop.

Based on Google's MapReduce model, Hadoop distributes computing jobs across a cluster of machines and then combines the results. Because Hadoop is Java-based, it typically requires Java programming skills.
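To give a feel for the programming model involved, the sketch below walks through the classic word-count example in plain Java. It is only an illustration of the map-group-reduce flow that Hadoop automates at scale; a real Hadoop job would instead implement the framework's Mapper and Reducer classes and run on a cluster, and the input strings here are invented for the example.

```java
import java.util.*;
import java.util.stream.*;

// Toy illustration of the MapReduce flow that Hadoop distributes:
// map each record to key/value pairs, group by key, then reduce.
public class WordCount {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: combine all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "big data needs big skills",
                "hadoop handles big data");

        // "Shuffle" step: group mapped pairs by key (word), sorted
        // here only to make the output deterministic.
        Map<String, List<Integer>> grouped = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue,
                                Collectors.toList())));

        grouped.forEach((word, counts) ->
                System.out.println(word + "\t" + reduce(word, counts)));
    }
}
```

In a real deployment, the map and reduce functions run on different nodes and the framework handles partitioning, shuffling and fault tolerance, which is why distributed-systems experience matters alongside the Java itself.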

Implementing Hadoop is not the same type of Java development project that enterprise application development teams are probably used to, although effective big data analytics does share some similarity with traditional SOA -- and even batch-oriented development.

Hadoop is "not about real time operational [business intelligence], but more about the discovery, exploration and analysis of large amounts of multistructured data," said Helena Schwenk, analyst at MWD Advisors. She said via email that a well-rounded Hadoop implementation team's skills should include experience in large-scale distributed systems and knowledge of languages such as Java, C++, Pig Latin and HiveQL. Data exploration and analysis skills such as predictive modeling, natural language processing and text analysis are also important parts of the mix.

Schwenk went on to explain that other areas to consider are data management, integration of both structured and unstructured data, a range of data latency demands, and architectural support for scalability and high-speed processing.

Clearly, flexibility is important and team members need to be ready to update and broaden their skills. "Big data challenges cannot be solved by a single platform or engine," said Schwenk. Instead, she said, team members need to employ a variety of technologies, components and architectures. She went on to say that technologies such as Hadoop, MapReduce and distributed NoSQL databases will likely be part of the mix, but that "technologies such as in-memory databases, columnar databases and massively parallel-processing architectures" are also possibilities.

Of course, the value for many enterprises will really come from the integration of big data analytics with their existing enterprise architecture. One way to do this, according to Schwenk, is to merge big data projects with existing business processes and data assets such as a data warehouse for a fuller picture of their business.

"Big data," said Schwenk, "will require you to think carefully about sourcing and investing in the right people, analytic skills and experience to make sure you can take advantage of the … opportunities that big data presents."

This may mean that some application development teams will have to hire new talent or provide training for the developers they already have.
