Secure Hadoop From Scratch
This is a short guide to getting started with building and running the latest Hadoop, with security support, from source code. It builds on Hadoop From Source. This post is meant to be self-contained, so you can just start from this, or if you have trouble with it, you can look at Hadoop From Source and see if that works better for you. Let me know of any problems by emailing me (ekoontz@hiro-tan.org) or filing a GitHub issue and/or a patch.
Prerequisites
Prerequisites are the same as given by [Hadoop From Source] (2012-02-27-hadoop-from-source#Prerequisites), plus:
Kerberos
I used MIT Kerberos for my setup. Athough I haven’t had time to try alternative Kerberos implementations such as:
I would like to hear of anyone’s experiences with them.
DNS
Make sure that you can resolve hostnames correctly, both forward and reverse (that is, if you configure Hadoop to use a host H, that H is resolvable to the correct IP I, and that reverse DNS works too, so that I reverse-DNS-resolves to H.
Also, you will need to add the following Java property in various places below:
-Dsun.net.spi.nameservice.provider.1=dns,sun
This serves to override Java’s default name resolution protocol ‘sun’, which seems to have problems when interacting with Kerberos names that include hostnames (e.g. service/hostname@REALM).
In particular, if this property is not explicitly set, Java will try to use the wrong IP for H. We will show symptoms of this behavior below so that you can be aware of it and how to correct it if necessary.
To avoid such problems, however, please note below where this property is mentioned as being required.
Root
You will need root access for two reasons:
- Having root access allows you to run kadmin.local to add or modify Kerberos principals, which will be necessary below.
- Starting the Hadoop datanode securely requires root access. (though, why?)
Secure core-site.xml
Secure environment variables
YARN_OPTS=-DHADOOP_JAAS_DEBUG=true \
-Dsun.net.spi.nameservice.provider.1=dns,sun\
-Djava.security.krb5.realm=HADOOP.LOCALDOMAIN \
-Djava.security.krb5.kdc=mac.foofers.org
HADOOP_OPTS=-DHADOOP_JAAS_DEBUG=true \
-Dsun.net.spi.nameservice.provider.1=dns,sun
-Djava.security.krb5.realm=HADOOP.LOCALDOMAIN \
-Djava.security.krb5.kdc=mac.foofers.org
Troubleshooting
You may see something like:
Note the long pause in log message timestamps between 22:41:13
and
22:42:44
. This indicates that the Kerberos server could not be
reached. Check to see that the hostname H
given in your
-Djava.security.kerb5.krc=H
is reachable: do telnet H 88
. If this
succeeds, then the Kerberos server is probably working ok : the
problem is that Java is unable to resolve H to the correct IP. Check
to make sure that -Dsun.net.spi.nameservice.provider.1=dns,sun
is
set.
For more on DNS and Java interaction, have a look at my hostname-testing Github repository, which has some sample code to test interactions between Java and DNS.
Related Material
hostname-testing, sample code to test interactions between Java and DNS.
jaas_and_kerberos, sample code to test interactions between Java, JAAS, and Kerberos.
Up and Running With Secure Zookeeper, another guide to getting started with a sample Kerberos setup (and how to integrate Zookeeper with it).
Tips
In a terminal window, run the following and keep it where you can watch it:
sudo tail -f /var/log/krb5kdc/kdc.log
blog comments powered by Disqus