A few days back, the SpringSource community announced the second
milestone of the Spring for Apache Hadoop (SHDP) project. When I first
heard about this initiative from VMware, I was almost sure it would be
a boon for Java Spring developers interested in big data. With the second
milestone, the team has made a point of addressing developers' major
pain points by bringing many of Spring's powerful concepts to
Apache Hadoop development.
The project is evolving steadily and squarely targeting the
main challenges faced by developers working with Hadoop and its peripheral
technologies. The new milestone not only covers core support for Hadoop-based
MapReduce jobs but also supports related technologies such as HBase,
Hive and Pig.
For developers who use HBase in their stack, this
release is a wow moment. Spring now offers its powerful DAO (Data Access
Object) support for HBase. Developers can use the template for
HBase CRUD operations without worrying about exception handling, writing
boilerplate code, or acquiring and disposing of resources. This means you no
longer need to handle tedious tasks like looking up an HBase table, running
the query, building a scanner and cleaning up resources.
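To make the idea concrete, here is a minimal plain-Java sketch of the template-and-callback pattern described above. It is not SHDP code: TemplateSketch, FakeTable and this RowMapper are hypothetical stand-ins I made up for illustration, and the "table" is just an in-memory list. The point is only that the template owns the lookup, the iteration and the cleanup, while the caller supplies the mapping logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins: none of these types are the real SHDP or HBase classes.
class TemplateSketch {

    /** Callback that maps one raw row to a result object. */
    interface RowMapper<T> {
        T mapRow(String row, int rowNum) throws Exception;
    }

    /** In-memory stand-in for a table handle that must be closed after use. */
    static class FakeTable implements AutoCloseable {
        static int openCount = 0;
        static int closeCount = 0;
        private final List<String> rows;

        FakeTable(List<String> rows) {
            this.rows = rows;
            openCount++;
        }

        List<String> scan() {
            return rows;
        }

        @Override
        public void close() {
            closeCount++;
        }
    }

    /** The template opens the table, walks the rows and always closes the table. */
    static <T> List<T> find(List<String> tableData, RowMapper<T> mapper) {
        try (FakeTable table = new FakeTable(tableData)) {
            List<T> results = new ArrayList<>();
            int rowNum = 0;
            for (String row : table.scan()) {
                results.add(mapper.mapRow(row, rowNum++));
            }
            return results;
        } catch (Exception e) {
            // A real template would translate this into a richer exception type.
            throw new RuntimeException("row mapping failed", e);
        }
    }

    public static void main(String[] args) {
        // The caller writes only the mapping logic, nothing about resources.
        List<String> upper = find(Arrays.asList("row1", "row2"),
                (row, rowNum) -> rowNum + ":" + row.toUpperCase());
        System.out.println(upper);
        System.out.println("opened=" + FakeTable.openCount + " closed=" + FakeTable.closeCount);
    }
}
```

Because the table is opened in a try-with-resources block, it is closed even when the mapper throws, which is the kind of housekeeping a template takes off the developer's plate.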
In case you want to get a feel for how much headache
it removes for a developer, here is a sample code snippet provided by
SpringSource that reads each row from an HBase table.
    // read each row from 'MyTable'
    List<String> rows = template.find("MyTable", "SomeColumn", new RowMapper<String>() {
        @Override
        public String mapRow(Result result, int rowNum) throws Exception {
            return result.toString();
        }
    });
This definitely saves a lot of code. The other
notable point is that the different exceptions raised by the
underlying HBase APIs are converted into a single DataAccessException, which
further eases development for the layers involved in CRUD operations with
HBase. So developers who use HBase in their technology stack are bound to
enjoy this release.
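The exception translation idea can also be sketched in a few lines of plain Java. Again, this is only an illustration under my own assumptions: DataAccessFailure is a hypothetical unchecked exception standing in for Spring's DataAccessException, and execute is a made-up helper, not an SHDP method. It shows how callers end up catching one exception type regardless of what the underlying store API throws.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: these names are illustrative, not part of Spring or HBase.
class TranslationSketch {

    /** Unchecked exception standing in for Spring's DataAccessException. */
    static class DataAccessFailure extends RuntimeException {
        DataAccessFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs the action and rethrows any failure as one unchecked exception type. */
    static <T> T execute(Callable<T> action) {
        try {
            return action.call();
        } catch (DataAccessFailure e) {
            throw e; // already translated, pass it through unchanged
        } catch (Exception e) {
            // Keep the original exception as the cause so no detail is lost.
            throw new DataAccessFailure("store access failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate a low-level checked exception from the storage client.
            execute(() -> {
                throw new IOException("region server unreachable");
            });
        } catch (DataAccessFailure e) {
            // Upper layers handle a single type and can still inspect the cause.
            System.out.println("caught " + e.getClass().getSimpleName()
                    + " caused by " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Since the translated exception is unchecked, service and web layers do not have to declare or catch storage-specific exceptions, yet the original cause stays available for logging.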
There is also plenty of good news for developers
who use Hive and Pig. Spring appears to support both Hive clients (Thrift
and JDBC). A dedicated namespace is provided for the Thrift client.
If you prefer the JDBC client, you can leverage Spring's rich
JDBC support, which provides facilities like JdbcTemplate and
NamedParameterJdbcTemplate. For Pig developers, Spring provides easy configuration
and instantiation of Pig servers for registering and executing Pig scripts,
either locally or remotely.
There is a lot more, like Cascading integration and security
support, that I am excited to delve into, but one thing is quite
clear: the small elephant is definitely going to enjoy this spring :)
Here are some useful links in case you are interested
in getting into the details of all these developments: