Thursday, April 18, 2013

What is Apache Hadoop?

It's an open-source software framework for reliable, scalable, distributed computing.

The framework provides distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself detects and handles failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.

Hadoop's own distributed file system (HDFS) enables rapid data transfer among nodes and lets the system continue operating uninterrupted when a node fails. Because HDFS replicates data blocks across nodes, the risk of losing data or halting the system stays low even if a significant number of nodes become inoperative.
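
As a rough illustration of how an application talks to HDFS, here is a minimal sketch using Hadoop's Java FileSystem API. The file paths are placeholders, and the snippet assumes the cluster configuration (core-site.xml, hdfs-site.xml) is on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopyExample {
        public static void main(String[] args) throws Exception {
            // Load cluster settings from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Copy a local file into HDFS; the NameNode decides which DataNodes
            // store the replicated blocks, so losing one node does not lose the data
            Path local = new Path("/tmp/input.txt");        // hypothetical local path
            Path remote = new Path("/user/demo/input.txt"); // hypothetical HDFS path
            fs.copyFromLocalFile(local, remote);

            System.out.println("Exists in HDFS: " + fs.exists(remote));
            fs.close();
        }
    }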

Hadoop was inspired by Google's MapReduce, a programming model in which an application is broken into small parts that run independently on different nodes (Map), after which the partial results are collected and combined into a single output (Reduce). A word-count sketch of this model follows below.
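
The sketch below follows Hadoop's classic word-count example: the Map step emits (word, 1) pairs from each input split, and the Reduce step sums the counts for each word. Input and output paths are placeholders passed on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map: each node independently splits its portion of the input into words
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);   // emit (word, 1)
                }
            }
        }

        // Reduce: counts for each word are collected and compiled into one total
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);     // emit (word, total count)
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }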

