Thursday, January 26, 2012

MapReduce v/s SQL

I was introduced to the MapReduce paradigm a couple of years ago and have been working with Hadoop, the open source implementation of the MapReduce framework, ever since. We started experimenting with Hadoop to understand the pros and cons of the framework, and as of today we have made considerable progress by building our new platform on top of this powerful distributed computing framework.

During this journey we ran into many challenges and questions. One of the most frequent was: why Hadoop and not a SQL-based system? After many discussions, technical articles, and my own exposure to the system, I thought I would share my experience with this question, which keeps coming up across different development communities.

MapReduce was introduced to solve the problem of scale when dealing with huge amounts of data, which nowadays is no longer a problem only for big corporations (like Google, Microsoft etc.). With the explosion of information everywhere, dealing with huge data sets has become a far more common problem than before. Most of the time people try to solve it with conventional systems like SQL databases. In many places this has succeeded to some extent, but beyond a certain threshold it becomes very challenging to tackle the problem with those conventional systems. There are several reasons for this.

1. Unstructured data processing: SQL systems are optimized for structured data. The whole concept of a relational database rests on a relational schema for storing information, which becomes a challenge when you are dealing with unstructured data. In such cases you end up retrofitting your information into a tight schema, which makes it harder to dig insight out of the data.

2. Control over processing steps: SQL is mostly a declarative language. When you query with SQL, you state the result you are interested in and the data sources it can be retrieved from. The actual details of how to get to the result remain under the control of the query processing engine. Most of the time you can only pass some hints to the engine to influence its processing strategy, so you are left with little choice but to rely on the genius of the engine to process your query optimally.

3. Scale (out v/s up): Conventional systems like relational databases are designed to run on monolithic systems rather than distributed clusters. This means that to address scale you end up buying costlier hardware. It is worth noting that hardware cost does not increase linearly: a machine with 5 times the power of a standard machine costs more than 5 times as much. Because of this, it makes more sense to address the scale challenge horizontally than vertically, that is, to build a framework in which you can scale just by adding more machines. Of course, adding more machines brings the overhead of more coordination. If you have a framework like Hadoop at your disposal to handle that, scaling out with it can be a smart move.

4. Offline v/s online processing: MapReduce originated from the requirement to process huge amounts of data without caring about things like real-time processing or transaction support. Hence these systems are optimized for offline processing rather than real-time processing of data. Many other technologies are trying to address these pieces as well, but as of today the framework is, at its heart, fundamentally an offline distributed processing framework.

5. Raw talent v/s conventional wisdom: Relational database systems have existed in the industry for many years and have served the needs of their age very successfully. Those successful years of SQL adoption in the software industry have produced a lot of experts in the technology, so when you use SQL-based systems you have the luxury of their advice. In the MapReduce world, by contrast, the thought process is quite different, and in the early days of adoption you are very likely to end up designing your system the relational way even though you are using these new frameworks underneath. Designing for MapReduce requires fresh thinking; the moment you retrofit it with the conventional approach of building your stack around entities and their relationships, you deprive yourself of the real benefits of the MapReduce framework.
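
To make points 1 and 2 a little more concrete, here is a minimal sketch in plain Python. It is not a Hadoop program, only an illustration of the style of thinking. It counts status codes in a handful of made-up log lines in two ways: once with explicit map, shuffle and reduce steps that you write and control yourself, and once by loading the lines into a fixed schema and letting a SQL engine (SQLite, in memory) decide how to compute the answer. The log format, the table name `hits` and all function names are my own assumptions for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw log lines; the format is invented for this illustration.
LOG_LINES = [
    "2012-01-26 GET /home 200",
    "2012-01-26 GET /about 404",
    "malformed line",
    "2012-01-26 POST /login 200",
    "2012-01-26 GET /home 200",
]


def map_line(line):
    """Map step: emit (status_code, 1) for each parseable line, skip the rest."""
    parts = line.split()
    if len(parts) == 4 and parts[3].isdigit():
        yield parts[3], 1


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def count_with_mapreduce(lines):
    """You spell out every processing step yourself."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


def count_with_sql(lines):
    """You fit the data into a schema first, then only declare the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (day TEXT, method TEXT, path TEXT, status TEXT)"
    )
    # Lines that do not fit the four-column schema have to be dropped here.
    rows = [tuple(line.split()) for line in lines if len(line.split()) == 4]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    result = dict(
        conn.execute("SELECT status, COUNT(*) FROM hits GROUP BY status")
    )
    conn.close()
    return result


if __name__ == "__main__":
    print(count_with_mapreduce(LOG_LINES))
    print(count_with_sql(LOG_LINES))
```

Both functions give the same counts on this tiny input. The difference is in who decides the "how": in the first version the parsing and grouping logic is yours to change, while in the second the schema comes first and the engine chooses the execution plan.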

Conclusion
This post certainly does not cover every aspect of both systems, but IMHO it gives you some data points to think about while planning to build your stack around these technologies. The intent of the post is not to prove anything, but to dig out the relevant points around these technologies. For a given requirement it is quite possible (and in many cases not a remote possibility) that you end up building your stack as a marriage of both technologies.

Signing off for now with a request for your fruitful comments…
--RBK

Tuesday, January 24, 2012

Tanhai (Loneliness)...

Every moment gone by returns carrying a picture of its own,
And frightens me in my loneliness.
I place my hope not in myself but in my loneliness,
And a strange restlessness is its destiny.
Loneliness asks me to tell the truth,
Sometimes it even starts to insist,
I cannot deny it,
And however much I want to, I can never win.

I want to talk to myself,
I want to find the answers to a few riddles.
But loneliness has no faith in me,
And by now I believe the same myself.
When light flashes behind closed eyes,
When someone's footstep stirs in the silence,
Every time, my loneliness wakes before I do,
And, having woken me, laughs a heartless laugh.

I know it can do me no harm,
Yet suddenly I find myself afraid.
And I am glad that sometimes, at least, I fear the truth,
And even after all this I ache to talk to it.

I have one wish: that someday it will show me the way,
And explain why I am not like it.
But before I can ask, I grow afraid again,
And once more find myself alone in the same place.


-- RBK

Sunday, July 19, 2009

“Kyun hai” (“Why is it so?”) – My first poem

Why is there feeling in every breath?
Why is there a thirst in every feeling?

Why is there impurity in every thirst?
Why is there a longing in every pain?
Why is there a pretence in every longing?
Why is there an artifice in every pretence?

Why is every moment so scattered?
Why is every today so needy?
Why is every face so desolate?
Why must I live every single day?

I wonder, why think at all?
Why is not thinking the better thing?

Why is every desire so stubborn?
Why is every man so puny?
Why is every night so short?
Why is every day so heavy?

I think I should find a foundation,
Find a world to live in,
But why is that world needed at all?

I think everything here is illusion,
Yet why is there so much reality in it all?

                                         --RBK

Monday, July 13, 2009

Many many congratulations “Raza and Mubeen”

From the title of this post, most readers will have guessed that it is probably a congratulation to a newly wed couple. You are right. I am on my return journey from Hyderabad to Bangalore, trying to recollect the moments I spent this weekend. There are so many sweet memories attached to the trip, and I would like to share them with you. Some of the main reasons it was memorable are:

  1. It was the wedding of one of my very good friends (Raza).
  2. It was the first time I attended a Muslim wedding, and I was very excited about it.
  3. The day of the marriage was also my birthday :)
  4. I met many old friends, and spending time with them was real fun.

What more could you want from a normal weekend?

I left the office on Friday evening and met one of my very old friends at the station. It was his first day at work after graduating from IIMB. We settled into the train, and my phone started ringing from 11 P.M. onwards. Many of the people who knew I was travelling that night were wishing me early, as they were worried I might be out of signal coverage. So, for the first time, I celebrated my birthday inside an Indian Railways coach.

Next morning we were in the city with which I have lots of memories, as it is where my career started. Then we met the groom, and here we were a little disappointed to see his tummy (Raza, I am sorry, but it's the truth). Next I went with friends to a nearby shopping mall (GVK 1) and did some shopping for my birthday. We celebrated the birthday with pastries from Karachi Bakery and in the evening finally reached the place where the marriage was to take place. I was thrilled, as I wanted to see how the ceremony goes, and I found it very exciting. We met the groom, congratulated him, and then finally had that awesome biryani, which was the crux of the whole thing. Thanks, Raza, for the awesome biryani. I suggest you open a restaurant with the same caterer.

The next day was fun time with friends, and we met Raza and bhabhi (Mubeen). Vineet, Himanshu and I did lots of masti (fun), and now I am finally returning from Hyderabad to Bangalore. When I recollect the moments, many things still make me laugh or smile, like:

  • A joke by Himanshu (because I feel I have some responsibility towards society, I can't publish it here, but it was a killer one :) ).
  • Dhokla, raj kachauri and masala milk from Tiwari. I miss them terribly in Bangalore.


Wish you all the best, Raza and Mubeen, in your new life.