Sunday, June 24, 2012


Exciting developments in the Spring for Apache Hadoop project

A few days back, the SpringSource community announced the second milestone of the Spring for Apache Hadoop (SHDP) project. When I first came to know about this initiative from VMware, I was almost sure that it was going to be a boon for the Java Spring development community interested in big data. In the second milestone, they have made it a point to address the major challenges developers face by bringing many of Spring’s powerful concepts to Apache Hadoop development.

The project is evolving gradually and is rightly targeting the main challenges of developers who work with Hadoop and its peripheral technologies. The new milestone not only covers the core support for Hadoop-based MapReduce jobs but also provides support for related technologies like HBase, Hive and Pig.

For developers who use HBase in their stack, this release is a wow moment. Spring has brought its powerful DAO (Data Access Object) support to HBase. Developers can now use a template for HBase CRUD operations without worrying about exception handling, writing boilerplate code, or acquiring and disposing of resources. This means that you don’t need to worry about tedious tasks like looking up an HBase table, running the query, building a scanner and cleaning up resources.

In case you really want to get a feel for how much of a headache it removes for a developer, here is a sample code snippet provided by SpringSource that reads each row from an HBase table.

    // read each row from 'MyTable'
    List<String> rows = template.find("MyTable", "SomeColumn", new RowMapper<String>() {
        @Override
        public String mapRow(Result result, int rowNum) throws Exception {
            return result.toString();
        }
    });

This definitely saves many lines of code. The other noticeable point is that the different types of exceptions raised by the underlying HBase APIs are converted into Spring’s DataAccessException, which again eases development for the layers involved in CRUD operations with HBase. So developers who use HBase in their technology stack are destined to enjoy this release.
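To make the idea a little more concrete, here is a minimal, self-contained sketch of the template pattern itself. It does not use HBase or SHDP at all, and it is not how Spring implements its template. Every name in it (SimpleTemplate, FakeScanner, DataAccessFailure and this simplified RowMapper) is a hypothetical stand-in I made up for illustration. It only shows the two things the template takes off your hands: closing the scanner and translating whatever goes wrong into a single unchecked exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TemplateSketch {

    /** Hypothetical unchecked exception, playing the role of a DataAccessException. */
    static class DataAccessFailure extends RuntimeException {
        DataAccessFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Simplified callback: maps one raw row to a value and may throw anything. */
    interface RowMapper<T> {
        T mapRow(String row, int rowNum) throws Exception;
    }

    /** Hypothetical stand-in for a table scanner that has to be closed after use. */
    static class FakeScanner implements AutoCloseable {
        static int openCount = 0; // number of scanners currently open

        private final List<String> rows;

        FakeScanner(List<String> rows) {
            this.rows = rows;
            openCount++;
        }

        List<String> rows() {
            return rows;
        }

        @Override
        public void close() {
            openCount--;
        }
    }

    /** The template owns resource handling and exception translation. */
    static class SimpleTemplate {
        private final List<String> table;

        SimpleTemplate(List<String> table) {
            this.table = table;
        }

        <T> List<T> find(RowMapper<T> mapper) {
            // try-with-resources closes the scanner even if the mapper throws
            try (FakeScanner scanner = new FakeScanner(table)) {
                List<T> results = new ArrayList<>();
                int rowNum = 0;
                for (String row : scanner.rows()) {
                    results.add(mapper.mapRow(row, rowNum++));
                }
                return results;
            } catch (Exception e) {
                // every failure surfaces as one unchecked exception type
                throw new DataAccessFailure("Scan failed", e);
            }
        }
    }

    public static void main(String[] args) {
        SimpleTemplate template = new SimpleTemplate(Arrays.asList("row1", "row2"));
        List<String> rows = template.find((row, rowNum) -> rowNum + ":" + row);
        System.out.println(rows);
    }
}
```

The calling code only supplies the mapping logic. It never opens or closes the scanner and never has to catch a checked exception, which is the same division of labour the HBase template gives you.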

There is also a lot of good news for the development community using Hive and Pig. Spring seems to support both clients (Thrift and JDBC) for Hive. For the Thrift client, a dedicated namespace has been provided. In case you want to use the JDBC client, you can leverage the rich JDBC support in Spring, which provides facilities like JdbcTemplate, NamedParameterJdbcTemplate etc. For Pig developers, Spring provides easy configuration and instantiation of Pig servers for registering and executing Pig scripts either locally or remotely.

There is a lot more, like Cascading integration and security support, which I am excited to delve into, but one thing is quite clear: the small elephant is definitely going to enjoy this spring :)

Here are some useful links in case you are interested in getting into the details of all these developments:


  1. Good post - do keep them coming regularly. And did you check out the latest big data conference