Install Hadoop distribution

First, download the binary tarball of the latest Hadoop 1 release (currently 1.2.1).

Untar this to your home directory:

tar xfz hadoop-1.2.1-bin.tar.gz

You should now have a hadoop-1.2.1 directory:

HDFS

Edit hdfs-site.xml

Within this directory, find the file conf/hdfs-site.xml and use a text editor to change the file to have the following contents:
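A minimal sketch of such a file, written from the shell. The single fs.default.name property and its value are assumptions, chosen to match the hdfs://localhost:8020 address used in the “Test HDFS” commands; they are not necessarily the exact contents intended here. The helper name write_hdfs_site is hypothetical.

```shell
# Sketch: write a minimal conf/hdfs-site.xml (assumed contents).
write_hdfs_site() {
  # $1: conf directory (defaults to ./conf inside hadoop-1.2.1)
  conf_dir="${1:-conf}"
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: namenode address matching the hdfs://localhost:8020 URIs in "Test HDFS" -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF
}
```

From inside the hadoop-1.2.1 directory, this would be run as write_hdfs_site conf. It overwrites any existing conf/hdfs-site.xml.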

Format HDFS

Within the hadoop-1.2.1 directory, run the following:

bin/hadoop namenode -format

Sample output of this command is shown here:

Note the prompt Re-format filesystem in /tmp/hadoop-ekoontz/dfs/name ? (Y or N) in the output above. It appears because I had run this command before, so the directory already had contents. I confirmed with Y (uppercase; Hadoop 1.x treats any other answer as a refusal) because this is only temporary work and I have no valuable data there.

The directory /tmp/hadoop-XXXX will vary for your installation depending on your username: your username will be used instead of ekoontz.
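To see which directories apply to your own account, the path can be derived from the username. This is a small sketch that assumes Hadoop's default hadoop.tmp.dir of /tmp/hadoop-${user.name} has not been overridden, and that id -un reports the same name Java sees. The function name hadoop_tmp_dir is hypothetical.

```shell
# Sketch: print the per-user directories Hadoop 1.x uses by default.
hadoop_tmp_dir() {
  # $1: username (defaults to the current user)
  user="${1:-$(id -un)}"
  printf '/tmp/hadoop-%s\n' "$user"
}

echo "namenode dir: $(hadoop_tmp_dir)/dfs/name"
echo "datanode dir: $(hadoop_tmp_dir)/dfs/data"
```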

Start namenode

Start the namenode with: bin/hadoop namenode.

Sample output of this command:

Start datanode

Open up another shell, cd to the hadoop-1.2.1 directory, and do: bin/hadoop datanode.

If you get an error about Incompatible namespaceIDs like this:

Then do the following:

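rm -rf /tmp/hadoop-ekoontz/dfs/data

where ekoontz should be replaced with your username, as mentioned above in the “Format HDFS” section. This deletes the datanode’s old storage directory, which still carries the namespace ID from before the namenode was reformatted. Then rerun the command bin/hadoop datanode.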
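The same cleanup can be wrapped in a small guarded function, so the username is filled in automatically and nothing is deleted if the directory is not there. This is only a sketch: reset_datanode_dir is a hypothetical helper, and it assumes the default /tmp/hadoop-USERNAME layout described above.

```shell
# Sketch: remove a stale datanode storage directory, with a basic guard.
reset_datanode_dir() {
  # $1: username (defaults to the current user)
  # $2: tmp root (defaults to /tmp; mainly useful for trying this out safely)
  user="${1:-$(id -un)}"
  root="${2:-/tmp}"
  data_dir="$root/hadoop-$user/dfs/data"
  if [ ! -d "$data_dir" ]; then
    echo "nothing to remove: $data_dir does not exist" >&2
    return 1
  fi
  # Only the datanode's data directory is removed; dfs/name is left alone.
  rm -rf "$data_dir"
  echo "removed $data_dir"
}
```

Stop the datanode before calling reset_datanode_dir, since this discards any blocks it was storing.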

If all goes well, the output of starting the datanode should look like this:

Glance at the first terminal, where you started the namenode. You should see some acknowledgement from the Namenode that the Datanode that you started in the second terminal has connected to the Namenode:

Test HDFS

Open up a third terminal and cd to the hadoop-1.2.1 directory as before. Run the following:

bin/hadoop fs -ls hdfs://localhost:8020/
bin/hadoop fs -mkdir hdfs://localhost:8020/mydir
bin/hadoop fs -copyFromLocal README.txt hdfs://localhost:8020/mydir
bin/hadoop fs -lsr hdfs://localhost:8020/

(-lsr means ‘ls, recursively’, similar to the Unix command ls -R).

The output from the above should look similar to the following:

MapReduce

Edit mapred-site.xml

Within the hadoop-1.2.1 directory, find the file conf/mapred-site.xml and use a text editor to change the file to have the following contents:
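A minimal sketch of such a file, written from the shell. mapred.job.tracker is the Hadoop 1 property that tells clients and tasktrackers where the jobtracker listens; the value localhost:8021 is a placeholder assumption, not necessarily the address intended here, and your setup may need further properties. The helper name write_mapred_site is hypothetical.

```shell
# Sketch: write a minimal conf/mapred-site.xml (assumed contents).
write_mapred_site() {
  # $1: conf directory (defaults to ./conf inside hadoop-1.2.1)
  conf_dir="${1:-conf}"
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: jobtracker host and port; 8021 is a placeholder choice -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
EOF
}
```

From inside the hadoop-1.2.1 directory, this would be run as write_mapred_site conf. It overwrites any existing conf/mapred-site.xml.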

Start Jobtracker

Open up a new shell, cd to the hadoop-1.2.1 directory and do: bin/hadoop jobtracker.

The output should look something like:

Start Tasktracker

Open up a new shell, cd to the hadoop-1.2.1 directory and do: bin/hadoop tasktracker.

The output should look something like:

Looking at the shell in which you started the Jobtracker, you should see some acknowledgement that the Jobtracker received a connection from the Tasktracker:

Test MapReduce

We are now ready to run a sample MapReduce job. The example we will use is “Pi”, which estimates the value of the mathematical constant Pi by random sampling. This job’s definition is found in a jar file in your hadoop-1.2.1 directory called hadoop-examples-1.2.1.jar.
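To get a feel for what the job computes, here is a rough single-process sketch of estimating Pi by sampling. It is not Hadoop's implementation (the bundled example uses a quasi-random sequence and splits the work across map tasks). It mirrors only the idea and the two arguments of the pi example: the number of maps and the number of samples per map. The function name estimate_pi and the seed argument are hypothetical.

```shell
# Sketch: estimate Pi by plain random sampling (not Hadoop's actual code).
estimate_pi() {
  # $1: number of "maps"; $2: samples per map; $3: optional random seed
  awk -v maps="$1" -v samples="$2" -v seed="${3:-1}" 'BEGIN {
    srand(seed)
    total = maps * samples
    inside = 0
    for (i = 0; i < total; i++) {
      x = rand(); y = rand()
      # Count points that fall inside the quarter circle of radius 1.
      if (x * x + y * y <= 1) inside++
    }
    # The quarter circle covers Pi/4 of the unit square.
    printf "%.6f\n", 4 * inside / total
  }'
}

estimate_pi 5 5       # only 25 points, so a coarse estimate
estimate_pi 10 10000  # 100000 points, usually much closer to 3.14
```

With so few samples, an estimate that is visibly off from 3.14159 is expected rather than a sign of a broken setup.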

As before, open up a new terminal, cd to hadoop-1.2.1 and do:

bin/hadoop jar hadoop-examples-1.2.1.jar pi 5 5

The output of running the job in this terminal should look something like this:

The job should take a few seconds to complete. While it runs, watch the terminals running the Namenode, Datanode, Jobtracker, and Tasktracker: each should print log messages showing the work it is doing for the job.
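If the job does not start, it can help to confirm that all four daemons are still running. The sketch below assumes the JDK's jps tool is on your PATH; it reads jps-style output and prints the name of each expected daemon that is missing. The function name missing_daemons is hypothetical.

```shell
# Sketch: report which of the four daemons are absent from jps-style output.
missing_daemons() {
  # Reads "<pid> <MainClass>" lines on stdin, as printed by jps.
  seen="$(awk '{print $2}')"
  for d in NameNode DataNode JobTracker TaskTracker; do
    # -x matches whole lines only, so SecondaryNameNode does not count as NameNode.
    printf '%s\n' "$seen" | grep -qx "$d" || echo "$d"
  done
}

# Usage, assuming jps is available:
#   jps | missing_daemons
```

No output means all four daemons were found.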

Published

28 October 2013

Hadoop on a single host