17 December 2014

Building Hadoop Sources

If you'd like to hack on the Hadoop source code in your favorite IDE, here are the steps to follow to avoid a lot of frustration.

1. Checkout/Clone the repo

I'm using the Cloudera distribution, so I'd like to stay in sync with the exact version I'm running, but the community repo https://github.com/apache/hadoop would do as well:

git clone https://github.com/cloudera/hadoop-common.git hadoop-cdh

Go to the freshly cloned repo:

cd hadoop-cdh
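To stay in sync with the version you're running, you can take the CDH suffix reported by hadoop version and list the tags of the clone that contain it. This is a minimal sketch assuming the first line of hadoop version looks like "Hadoop 2.5.0-cdh5.2.0"; the helper names cdh_suffix and matching_tags are hypothetical, and the only assumption about Cloudera's tag naming is that the relevant tags contain that suffix.

```shell
#!/bin/sh
# Hypothetical helpers for finding the tag that matches a running CDH cluster.

# Extract the "cdhX.Y.Z" part of a `hadoop version` line, e.g.
#   "Hadoop 2.5.0-cdh5.2.0" -> "cdh5.2.0"
# Prints nothing if the line has no cdh suffix.
cdh_suffix() {
  printf '%s\n' "$1" | sed -n 's/.*-\(cdh[0-9][0-9.]*\).*/\1/p'
}

# List the tags of the current clone whose name contains the given string.
matching_tags() {
  git tag --list "*$1*"
}
```

Once the functions are loaded in your shell, run this from inside hadoop-cdh:

matching_tags "$(cdh_suffix "$(hadoop version | head -n 1)")"

and git checkout whichever of the listed tags matches your cluster.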

2. Ensure Java 7+

Well, you'd better anticipate the Java 7 EOL in April 2015 and go with Java 8, but you need at least Java 7:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
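Exporting JAVA_HOME doesn't tell you what that JDK actually is, so a quick check can save you a failed build later. Here is a minimal sketch, with hypothetical helper names, that runs $JAVA_HOME/bin/java -version and accepts Java 7 or later. It assumes the classic version string format ("1.7.0_72", "1.8.0_25"), where the major number is the second field, and also handles the newer "11.0.2" style, where it is the first.

```shell
#!/bin/sh
# Hypothetical helpers: check that a JDK is Java 7 or later before building.

# Extract the quoted version from `java -version` output (read on stdin), e.g.
#   openjdk version "1.8.0_25"  ->  1.8.0_25
parse_java_version() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Major Java version: "1.7.0_72" -> 7, "1.8.0_25" -> 8, "11.0.2" -> 11
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}

# Return 1 unless the JDK installed under $1 reports Java 7 or later.
require_java7() {
  v=$("$1/bin/java" -version 2>&1 | parse_java_version)
  major=$(java_major "$v")
  case "$major" in
    ''|*[!0-9]*) major=0 ;;  # missing or unparseable output counts as too old
  esac
  if [ "$major" -lt 7 ]; then
    echo "JDK at $1 reports '${v:-nothing}'; Java 7+ is required" >&2
    return 1
  fi
  echo "Using Java $major from $1"
}
```

Usage: require_java7 "$JAVA_HOME". Running mvn -version is also worth a glance, since it prints the Java home Maven will actually use.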

3. Ensure Protobuf 2.5+

You'll find more info on where to get protobuf 2.5+ and how to build it on Ubuntu 12.x here: http://steveloughran.blogspot.ch/2013/09/how-to-update-ubuntu-12x-box-to-protoc.html

You can then point the Hadoop Maven plugin to your custom protobuf installation:

export HADOOP_PROTOC_CDH5_PATH=/path/to/protobuf-2.5.0/src/protoc
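In the same spirit, here is a sketch that checks the protoc binary before you start a long Maven run. It assumes protoc --version prints something like "libprotoc 2.5.0", falls back to the protoc on your PATH when HADOOP_PROTOC_CDH5_PATH is unset (a choice of this sketch, not necessarily of the build), and only enforces the 2.5+ lower bound. The helper names are hypothetical. Note that Hadoop 2.x POMs generally pin protobuf to 2.5.0, so the build itself may still reject a newer protoc.

```shell
#!/bin/sh
# Hypothetical helpers: verify the protoc that HADOOP_PROTOC_CDH5_PATH points to.

# "libprotoc 2.5.0" (read on stdin) -> "2.5.0"
parse_protoc_version() {
  sed -n 's/^libprotoc \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 is at least 2.5 (compares major.minor only).
protoc_at_least_2_5() {
  major=$(printf '%s\n' "$1" | cut -d. -f1)
  minor=$(printf '%s\n' "$1" | cut -d. -f2)
  [ -n "$major" ] && [ -n "$minor" ] || return 1
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 5 ]; }
}

check_protoc() {
  protoc_bin=${HADOOP_PROTOC_CDH5_PATH:-protoc}
  v=$("$protoc_bin" --version 2>/dev/null | parse_protoc_version)
  if ! protoc_at_least_2_5 "$v"; then
    echo "$protoc_bin reports '${v:-nothing}'; protobuf 2.5+ is required" >&2
    return 1
  fi
  echo "$protoc_bin is protoc $v"
}
```

Usage: check_protoc && mvn clean install -DskipTests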

You can now compile the project:
mvn clean install -DskipTests

4. Run Maven Eclipse

Unfortunately, the maven-eclipse-plugin has a regression after version 2.6, so you need to run exactly that version:

mvn org.apache.maven.plugins:maven-eclipse-plugin:2.6:clean org.apache.maven.plugins:maven-eclipse-plugin:2.6:eclipse
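To see which modules you can actually import, you can list the directories where the eclipse goal wrote a .project file (the plugin also generates .classpath files next to them). A minimal sketch with a hypothetical eclipse_projects helper:

```shell
#!/bin/sh
# Hypothetical helper: list module directories that got an Eclipse .project file.
# Takes the repo root as $1 (defaults to the current directory).
eclipse_projects() {
  find "${1:-.}" -name .project -type f | sed 's|/\.project$||' | sort
}
```

For example, eclipse_projects . | wc -l gives the number of importable modules.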

And that's it: you're ready to import the projects you need into your favorite IDE!

1 comment:

  1. For HDP, it's about the same, except that:

    - the source code is stored in https://github.com/hortonworks/hadoop-release/
    - the protoc variable to set is HADOOP_PROTOC_PATH
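Following that comment, only the repo URL and the protoc variable name change between the two distributions, so a small helper can keep both in one place. This is a sketch: the distribution keys cdh and hdp and the function names are hypothetical, while the URLs and variable names come from the post and the comment above.

```shell
#!/bin/sh
# Hypothetical helpers: per-distribution repo URL and protoc variable name.

repo_url_for() {
  case "$1" in
    cdh) echo https://github.com/cloudera/hadoop-common.git ;;
    hdp) echo https://github.com/hortonworks/hadoop-release.git ;;
    *)   echo "unknown distribution: $1" >&2; return 1 ;;
  esac
}

# Name of the environment variable the distribution's build reads for protoc.
protoc_var_for() {
  case "$1" in
    cdh) echo HADOOP_PROTOC_CDH5_PATH ;;
    hdp) echo HADOOP_PROTOC_PATH ;;
    *)   echo "unknown distribution: $1" >&2; return 1 ;;
  esac
}

# Export the protoc path ($2) under the right variable for distribution $1.
setup_protoc() {
  var=$(protoc_var_for "$1") || return 1
  export "$var=$2"
}
```

For HDP, that gives:

git clone "$(repo_url_for hdp)" hadoop-hdp
setup_protoc hdp /path/to/protobuf-2.5.0/src/protoc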