Hadoop on a single host
Install the Hadoop distribution
First, download the binary tarball of the latest Hadoop 1 release (1.2.1 at the time of writing) and untar it in your home directory:
tar xfz hadoop-1.2.1-bin.tar.gz
You should now have a hadoop-1.2.1 directory.
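If you would rather script this step, the Python sketch below does the same job as the tar command using the standard-library tarfile module. The function name extract_release is hypothetical, not part of Hadoop. It also returns the top-level directory it unpacked, which should be hadoop-1.2.1 for this tarball. Only extract archives you trust.

```python
import tarfile
from pathlib import Path


def extract_release(tarball: Path, dest: Path) -> Path:
    """Extract a .tar.gz release into dest and return its top-level directory.

    Roughly equivalent to running `tar xfz <tarball>` from inside dest.
    """
    with tarfile.open(tarball, "r:gz") as archive:
        # Collect the first path component of every member; a release tarball
        # is expected to unpack into a single directory such as hadoop-1.2.1.
        top_dirs = {Path(m.name).parts[0] for m in archive.getmembers()
                    if Path(m.name).parts}
        archive.extractall(dest)
    if len(top_dirs) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(top_dirs)}")
    return dest / top_dirs.pop()
```

For example, assuming the tarball was downloaded to your home directory: extract_release(Path.home() / "hadoop-1.2.1-bin.tar.gz", Path.home()).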
HDFS
Edit hdfs-site.xml
Within the hadoop-1.2.1 directory, open the file conf/hdfs-site.xml in a text editor and change it to have the following contents:
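Hadoop's *-site.xml files all share one simple format: a configuration root element holding property entries, each with a name and a value. If you prefer to generate such a file from a script rather than edit it by hand, here is a minimal Python sketch using xml.etree.ElementTree. write_hadoop_conf is a hypothetical helper, and the property in the usage line below is only an illustration, not necessarily what this walkthrough's file contains.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict


def write_hadoop_conf(path: Path, properties: Dict[str, str]) -> None:
    """Write name/value pairs as a Hadoop-style <configuration> XML file.

    Overwrites path if it already exists.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree = ET.ElementTree(root)
    ET.indent(tree)  # pretty-print (ET.indent requires Python 3.9+)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, run from the hadoop-1.2.1 directory: write_hadoop_conf(Path("conf/hdfs-site.xml"), {"dfs.replication": "1"}). A replication factor of 1 is a common choice on a single host, since there is only one datanode to hold each block.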
Format HDFS
Within the hadoop-1.2.1 directory, run the following:
bin/hadoop namenode -format
Sample output of this command is shown here:
Note the prompt Re-format filesystem in /tmp/hadoop-ekoontz/dfs/name ? (Y or N) in the output above. It appears because I had run this command before, so the directory already had contents. Answering Y erases the existing filesystem metadata there; I confirmed with Y because this installation is only for temporary work and I have no valuable data in it.
The directory /tmp/hadoop-XXXX will vary for your installation depending on your username: your username will be used instead of ekoontz.
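Because the directories follow this /tmp/hadoop-username pattern, you can work out where your namenode and datanode data will live before running anything. The Python sketch below assumes the default locations seen in this walkthrough (/tmp/hadoop-username/dfs/name and /tmp/hadoop-username/dfs/data) and that getpass.getuser() returns the same name Hadoop uses. Both helper names are hypothetical, and the reformat check is only a heuristic.

```python
import getpass
from pathlib import Path
from typing import Dict, Optional


def hadoop_tmp_dirs(user: Optional[str] = None) -> Dict[str, Path]:
    """Return the default Hadoop 1 storage directories for a user.

    Assumes the layout seen in this walkthrough: /tmp/hadoop-<user>/dfs/name
    for the namenode and /tmp/hadoop-<user>/dfs/data for the datanode.
    """
    user = user or getpass.getuser()
    base = Path("/tmp") / f"hadoop-{user}"
    return {"base": base, "name": base / "dfs" / "name", "data": base / "dfs" / "data"}


def will_prompt_for_reformat(name_dir: Path) -> bool:
    """Heuristic: if the namenode directory already exists, `namenode -format`
    will most likely ask 'Re-format filesystem in ... ? (Y or N)'."""
    return name_dir.exists()
```

For example, dirs = hadoop_tmp_dirs() followed by will_prompt_for_reformat(dirs["name"]) tells you whether to expect the prompt described above.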
Start namenode
Start the namenode with:
bin/hadoop namenode
Sample output of this command:
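Besides reading the log, you can check from another shell that the namenode is accepting connections. The HDFS commands later in this walkthrough address it as hdfs://localhost:8020, so a TCP connection to localhost port 8020 should succeed once it has started. A minimal Python sketch follows; is_listening is a hypothetical helper, and it only shows that something accepts connections on that port, not that it is a healthy namenode.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, is_listening("localhost", 8020) should return True while the namenode is running.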
Start datanode
Open up another shell, cd to the hadoop-1.2.1 directory, and run:
bin/hadoop datanode
If you get an error about Incompatible namespaceIDs like this:
Then do the following:
rm -rf /tmp/hadoop-ekoontz/dfs/data
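where ekoontz should be replaced with your username, as mentioned above in the “Format HDFS” section. Then rerun bin/hadoop datanode.
This error typically appears after you re-format the namenode: formatting assigns a new namespaceID, but the datanode's data directory still records the old one. Deleting the data directory also deletes any blocks stored there, which is harmless on a freshly formatted filesystem.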
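To confirm the diagnosis before deleting anything, you can compare the namespaceID recorded on each side. This Python sketch assumes the default Hadoop 1 layout, in which the namenode's directory (e.g. /tmp/hadoop-ekoontz/dfs/name) and the datanode's directory (e.g. /tmp/hadoop-ekoontz/dfs/data) each contain a current/VERSION file of key=value lines, one of which is namespaceID. The helper names are hypothetical, and the parser handles only plain key=value lines, not every Java properties-file escape.

```python
from pathlib import Path
from typing import Dict, Optional


def read_version_file(path: Path) -> Dict[str, str]:
    """Parse a VERSION file made of key=value lines, skipping '#' comments."""
    props = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def namespace_ids_match(name_dir: Path, data_dir: Path) -> Optional[bool]:
    """Compare the namespaceID values; return None if either VERSION file is missing."""
    name_version = name_dir / "current" / "VERSION"
    data_version = data_dir / "current" / "VERSION"
    if not (name_version.exists() and data_version.exists()):
        return None
    return (read_version_file(name_version).get("namespaceID")
            == read_version_file(data_version).get("namespaceID"))
```

A result of False for namespace_ids_match(Path("/tmp/hadoop-ekoontz/dfs/name"), Path("/tmp/hadoop-ekoontz/dfs/data")) points to exactly the mismatch described above.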
If all goes well, the output of starting the datanode should look like this:
Glance at the first terminal, where you started the namenode. You should see an acknowledgement from the namenode that the datanode you started in the second terminal has connected to it:
Test HDFS
Open up a third terminal and cd to the hadoop-1.2.1 directory as before. Run the following:
bin/hadoop fs -ls hdfs://localhost:8020/
bin/hadoop fs -mkdir hdfs://localhost:8020/mydir
bin/hadoop fs -copyFromLocal README.txt hdfs://localhost:8020/mydir
bin/hadoop fs -lsr hdfs://localhost:8020/
(-lsr means ‘ls, recursively’, similar to the Unix command ls -R.)
The output from the above should look similar to the following:
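If you want to repeat this check later, the four HDFS commands in this section can be scripted. The Python sketch below runs them in order with subprocess.run(..., check=True), which raises CalledProcessError as soon as one exits with a non-zero status. COMMANDS and run_all are hypothetical names; the runner parameter exists only so the function can be exercised without a running cluster. Run it from the hadoop-1.2.1 directory, since the paths are relative.

```python
import subprocess
from typing import Callable, List, Sequence

HDFS = "hdfs://localhost:8020"

# The four commands from this section, as argument lists (run from hadoop-1.2.1).
COMMANDS: List[List[str]] = [
    ["bin/hadoop", "fs", "-ls", f"{HDFS}/"],
    ["bin/hadoop", "fs", "-mkdir", f"{HDFS}/mydir"],
    ["bin/hadoop", "fs", "-copyFromLocal", "README.txt", f"{HDFS}/mydir"],
    ["bin/hadoop", "fs", "-lsr", f"{HDFS}/"],
]


def run_all(commands: Sequence[Sequence[str]],
            runner: Callable[..., object] = subprocess.run) -> None:
    """Echo and run each command in order.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on the first command that exits non-zero.
    """
    for argv in commands:
        print("$", " ".join(argv))
        runner(list(argv), check=True)
```

Calling run_all(COMMANDS) prints each command before running it, so its output lines up with what you would see typing them by hand.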
MapReduce
Edit mapred-site.xml
Within the hadoop-1.2.1 directory, open the file conf/mapred-site.xml in a text editor and change it to have the following contents:
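To double-check what a configuration file actually sets after editing it, you can read it back. Below is a minimal Python sketch using xml.etree.ElementTree; read_hadoop_conf is a hypothetical helper. In Hadoop 1 the jobtracker's host:port is configured by the mapred.job.tracker property, so that is a natural key to look up in mapred-site.xml.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict


def read_hadoop_conf(path: Path) -> Dict[str, str]:
    """Return the name -> value pairs from a Hadoop <configuration> file."""
    root = ET.parse(path).getroot()
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A property with no <value> element is recorded as an empty string.
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf
```

For example, read_hadoop_conf(Path("conf/mapred-site.xml")).get("mapred.job.tracker") returns the configured jobtracker address, or None if the file does not set it.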
Start Jobtracker
Open up a new shell, cd to the hadoop-1.2.1 directory and run:
bin/hadoop jobtracker
The output should look something like:
Start Tasktracker
Open up a new shell, cd to the hadoop-1.2.1 directory and run:
bin/hadoop tasktracker
The output should look something like:
In the shell where you started the Jobtracker, you should see an acknowledgement that the Jobtracker received a connection from the Tasktracker:
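With all four daemons running, each should also be serving a web UI. Assuming you have not changed the defaults, Hadoop 1 uses port 50070 for the namenode, 50075 for the datanode, 50030 for the jobtracker and 50060 for the tasktracker. This Python sketch (web_ui_status and DEFAULT_WEB_UIS are hypothetical names) reports which of them answer HTTP requests at all. It bypasses any HTTP proxy configured in the environment, since the targets are on localhost.

```python
import urllib.error
import urllib.request
from typing import Dict

# Default Hadoop 1 web UI addresses, assuming the ports were not changed in conf/*.xml.
DEFAULT_WEB_UIS: Dict[str, str] = {
    "namenode": "http://localhost:50070/",
    "datanode": "http://localhost:50075/",
    "jobtracker": "http://localhost:50030/",
    "tasktracker": "http://localhost:50060/",
}


def web_ui_status(urls: Dict[str, str], timeout: float = 2.0) -> Dict[str, bool]:
    """Map each daemon name to True if an HTTP server answered at its URL."""
    # An empty ProxyHandler disables proxies picked up from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    status = {}
    for name, url in urls.items():
        try:
            with opener.open(url, timeout=timeout):
                status[name] = True
        except urllib.error.HTTPError:
            status[name] = True  # the server answered, just with an error status
        except OSError:
            status[name] = False  # connection refused, timed out, etc.
    return status
```

Calling web_ui_status(DEFAULT_WEB_UIS) should map all four names to True at this point; a False points at the daemon whose terminal to inspect.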
Test MapReduce
We are now ready to run a sample MapReduce job. The example we will use is “Pi”, which estimates the value of the mathematical constant Pi by random sampling. This job’s definition is found in a jar file in your hadoop-1.2.1 directory called hadoop-examples-1.2.1.jar.
As before, open up a new terminal, cd to hadoop-1.2.1 and run:
bin/hadoop jar hadoop-examples-1.2.1.jar pi 5 5
The output of running the job in this terminal should look something like this:
The job should take a few seconds to complete. While it runs, watch the terminals running the Namenode, Datanode, Jobtracker, and Tasktracker: each shows log messages about the work that component is doing for the job.
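For reference, here is what the pi job computes, sketched in plain Python with no Hadoop involved. Points are sampled in a unit square, and the fraction that falls inside the inscribed circle approximates the area ratio Pi/4. The first argument to pi is the number of map tasks and the second the number of samples per map, so pi 5 5 uses only 25 points and gives a rough estimate. As far as I know, Hadoop's own example draws its points from a Halton sequence (bases 2 and 3) rather than a pseudo-random generator, and this sketch does the same. The function names are hypothetical; the structure only mirrors the map/reduce split and is not Hadoop's code.

```python
from typing import List, Tuple


def van_der_corput(index: int, base: int) -> float:
    """index-th term of the base-`base` van der Corput sequence (radical inverse)."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def map_task(task_id: int, n_samples: int) -> Tuple[int, int]:
    """Count (inside, outside) points for one map task's slice of the Halton sequence."""
    inside = outside = 0
    start = task_id * n_samples + 1  # skip index 0, which maps to a corner point
    for i in range(start, start + n_samples):
        # 2-D Halton point (bases 2 and 3), shifted so the square is centred at 0.
        x = van_der_corput(i, 2) - 0.5
        y = van_der_corput(i, 3) - 0.5
        if x * x + y * y <= 0.25:  # inside the circle of radius 0.5
            inside += 1
        else:
            outside += 1
    return inside, outside


def reduce_task(partials: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the per-map counts, as a single reducer would."""
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


def estimate_pi(n_maps: int, n_samples: int) -> float:
    """Circle/square area ratio is Pi/4, so Pi is about 4 * inside / total."""
    inside, outside = reduce_task([map_task(k, n_samples) for k in range(n_maps)])
    return 4.0 * inside / (inside + outside)
```

estimate_pi(5, 5) corresponds to the job above; raising either argument, e.g. estimate_pi(10, 1000), brings the estimate much closer to Pi, which is also true of the real job.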