The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability,
the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
This section contains simple Hadoop MapReduce examples: word count, matrix multiplication, finding the maximum temperature, and converting small image files to sequence files.
These examples give a basic idea of how to use the MapReduce programming framework on a Hadoop cluster, introduce some of the basic APIs required to write a Hadoop MapReduce program, and show how key-value mapping is done to produce the desired output.
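The key-value flow described above can be sketched in plain Java, without the Hadoop APIs: map emits (word, 1) pairs, the shuffle step groups pairs by key, and reduce sums each group. The class and method names here are illustrative only and are not part of Hadoop; a real job would implement Hadoop's Mapper and Reducer classes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the MapReduce word-count flow (no Hadoop dependencies).
public class WordCountSketch {

    // "map" phase: emit a (word, 1) pair for every token in a line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "shuffle" + "reduce" phases: group pairs by key, then sum each group's values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"Hello Hadoop", "hello MapReduce"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hadoop=1, hello=2, mapreduce=1}
    }
}
```

In a real cluster the map calls run in parallel across input splits and the shuffle happens over the network, but the key-value contract is the same as in this single-process sketch.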
Java™ 1.5.x, preferably from Sun, must be installed.
ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Hadoop-1.1.x or 2.2.x must be installed.
To verify the Hadoop setup, run the hadoop version command; its output should resemble:

  Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
  Compiled by hortonmu on 2013-10-07T06:28Z
  Compiled with protoc 2.5.0
  From source with checksum 79e53ce7994d1628b240f09af91e1af4...