HDP 2.2 is out. Woot!
Disclaimer: This article describes outright hacks to install HDP 2.2 on Ubuntu 14.04, which is not (yet) officially supported by Hortonworks. Use it at your own risk.
The Minimal Requirement page specifies "64-bit Ubuntu Precise (12.04)", which I understood to be the minimal (i.e. oldest) Ubuntu version HDP runs on. In practice that is not the case: HDP 2.2 does not run on any other version of Ubuntu. And that's a pity. On my brand new cluster I would obviously like to install 14.04, which ships a newer version of Ansible, to name just one benefit.
So what's the problem?
The Ambari agent detects the OS version and reports it to the Ambari server. The server has a hardcoded list of OS version strings it accepts, including "ubuntu12".
The idea here is to trick the Ambari agent into reporting ubuntu12 to the Ambari server, so that the installation process can move forward.
Hack it!
The guilty file is /usr/lib/ambari-agent/lib/ambari_commons/os_check.py, which reads /etc/*-release to find out the OS version. Replacing 14.04 with 12.04 in these files does the trick.
$ sed -e "s/14\.04/12\.04/g" -i /etc/*-release
All set! Now Ambari agent can successfully contact the server.
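Because the sed one-liner edits system files in place, it can be worth keeping a copy of the originals so the hack is easy to undo. Here is a minimal sketch, assuming GNU sed (as shipped with Ubuntu); the function names patch_release_file and restore_release_file are my own and not part of Ambari:

```shell
#!/bin/sh
# Hypothetical helper: rewrite 14.04 to 12.04 in one release file,
# keeping an untouched copy next to it so the change can be reverted.
patch_release_file() {
    file="$1"
    # Keep the first backup only, so re-running never overwrites the original.
    [ -e "${file}.orig" ] || cp -p "$file" "${file}.orig"
    # Escaped dots match a literal "." rather than any character.
    sed -i -e 's/14\.04/12\.04/g' "$file"
}

# Hypothetical helper: put back the copy saved by patch_release_file.
restore_release_file() {
    file="$1"
    [ -e "${file}.orig" ] && mv "${file}.orig" "$file"
}

# Possible usage, as root:
#   for f in /etc/*-release; do patch_release_file "$f"; done
```

The backup name ends in .orig, so it is not picked up again by the /etc/*-release glob on a second run.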
Ganglia configuration
Ganglia relies on apache2, and the configuration layout in 14.04 changed from /etc/apache2/conf.d to /etc/apache2/conf-available and /etc/apache2/conf-enabled (more info on Debian sources). So another hack is required: create a symlink from /etc/apache2/conf.d to /etc/apache2/conf-enabled, otherwise Ganglia will fail to install.
$ ln -fs /etc/apache2/conf-enabled/ /etc/apache2/conf.d
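One gotcha if this step ends up in a provisioning script that runs more than once: when conf.d already exists as a symlink to a directory, GNU ln follows it and drops the new link inside the target directory instead of replacing it. A minimal sketch of a re-runnable variant, where link_conf_dir is a hypothetical helper name of mine:

```shell
#!/bin/sh
# Hypothetical helper: link_conf_dir TARGET LINK
# Creates or refreshes LINK as a symlink to TARGET, safely on repeated runs.
link_conf_dir() {
    target="$1"
    link="$2"
    # Refuse to touch a real directory: ln would create the link inside it.
    if [ -d "$link" ] && [ ! -L "$link" ]; then
        echo "$link is a real directory, leaving it alone" >&2
        return 1
    fi
    # -n treats an existing symlink-to-directory as a plain file,
    # so -f replaces it instead of descending into it.
    ln -sfn "$target" "$link"
}

# Possible usage, as root:
#   link_conf_dir /etc/apache2/conf-enabled /etc/apache2/conf.d
```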
Lib Postgresql to point to the correct jar
The libpostgresql-jdbc-java package is required to use Ambari, Hive, Oozie etc. with PostgreSQL. In 14.04, it provides two jars, /usr/share/java/postgresql-jdbc3.jar and /usr/share/java/postgresql-jdbc4.jar. Hive, at least, looks for the version-agnostic /usr/share/java/postgresql-jdbc.jar. A symlink to the jdbc4 jar needs to be created to avoid a startup failure:
- In Ansible:
file: src=/usr/share/java/postgresql-jdbc4.jar dest=/usr/share/java/postgresql-jdbc.jar state=link
- Simply in bash:
ln -sf /usr/share/java/postgresql-jdbc4.jar /usr/share/java/postgresql-jdbc.jar
A second symlink makes the same driver available in the Ambari server resources directory:
- In Ansible:
file: src=/usr/share/java/postgresql-jdbc.jar dest=/var/lib/ambari-server/resources/postgres-jdbc-driver.jar state=link
- Simply in bash:
ln -sf /usr/share/java/postgresql-jdbc.jar /var/lib/ambari-server/resources/postgres-jdbc-driver.jar
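Since the second link points at the first one, a typo in either leaves a dangling link that only shows up when a service starts. A quick sanity check can be sketched like this, assuming GNU readlink; check_jar_link is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: check_jar_link LINK
# Prints where LINK finally resolves, or fails if it is missing or dangling.
check_jar_link() {
    link="$1"
    # -L: it is a symlink; -e follows the chain and fails on a dangling link.
    if [ -L "$link" ] && [ -e "$link" ]; then
        echo "$link -> $(readlink -f "$link")"
    else
        echo "$link is missing or dangling" >&2
        return 1
    fi
}

# Possible usage:
#   check_jar_link /usr/share/java/postgresql-jdbc.jar
#   check_jar_link /var/lib/ambari-server/resources/postgres-jdbc-driver.jar
```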
Testing your deployment (aka updated 1TB terasort copy-paste example for HDP 2.2)
$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teragen -Ddfs.block.size=536870912 -Dmapred.map.tasks=20 \
    10000000000 /tmp/1Tsort/input
$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    terasort -Ddfs.block.size=536870912 -Dmapred.reduce.tasks=20 -Dmapreduce.terasort.output.replication=3 \
    /tmp/1Tsort/input /tmp/1Tsort/output
$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teravalidate /tmp/1Tsort/output /tmp/1Tsort/report
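For anyone wanting to scale the test down, here is where the numbers in the teragen command come from. This is a small sketch in shell arithmetic; it relies on teragen writing 100-byte rows, and the even split of rows across map tasks is my understanding of how teragen divides the work rather than something stated above:

```shell
#!/bin/sh
rows=10000000000        # row count passed to teragen
row_bytes=100           # teragen writes 100-byte rows
block_bytes=536870912   # value of dfs.block.size
maps=20                 # value of mapred.map.tasks

total_bytes=$((rows * row_bytes))
block_mib=$((block_bytes / 1024 / 1024))
# Assumption: rows are divided evenly across the map tasks.
rows_per_map=$((rows / maps))

echo "total bytes:  $total_bytes"
echo "block size:   ${block_mib} MiB"
echo "rows per map: $rows_per_map"
```

Dividing rows by 10 or 100 gives a 100 GB or 10 GB run with the same commands.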
Find more details about HDP 2.2 in the complete HDP 2.2 documentation.