16 December 2014

Install HDP 2.2 on Ubuntu 14.04 Trusty Tahr


HDP 2.2 is out. Woot!

Disclaimer: This article describes a set of hacks to install HDP 2.2 on Ubuntu 14.04, which is not (yet) officially supported by Hortonworks. Use them at your own risk.


The Minimum Requirements page specifies "64-bit Ubuntu Precise (12.04)", which I understood as the minimal (i.e. oldest) Ubuntu version HDP runs on. That's actually not the case: HDP 2.2 does not run on any other version of Ubuntu. And that's a pity. In my brand new cluster, I'd obviously like to install 14.04, which ships with the newest version of Ansible, to name only that one...

So what's the problem?

The Ambari agent detects the OS version and reports it to the Ambari server. The server has a hardcoded list of OS versions it accepts, including "ubuntu12".

The idea here is to "trick" the Ambari agent into reporting ubuntu12 to the Ambari server, so the installation process can move forward.

Hack it!

The guilty file is /usr/lib/ambari-agent/lib/ambari_commons/os_check.py, which reads /etc/*-release to find out the OS version. Replacing 14.04 with 12.04 in these files does the trick:

$ sed -e "s/14.04/12.04/g" -i /etc/*-release

All set! Now Ambari agent can successfully contact the server.
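Before touching the real files, the substitution can be dry-run on a scratch copy (a minimal sketch; the release file content below is illustrative):

```shell
# Work on a scratch copy so the real /etc/*-release files stay untouched
tmp=$(mktemp -d)
printf 'DISTRIB_RELEASE=14.04\nDISTRIB_DESCRIPTION="Ubuntu 14.04 LTS"\n' > "$tmp/lsb-release"

# Same substitution as above, applied to the copy
sed -e "s/14.04/12.04/g" -i "$tmp/lsb-release"

grep DISTRIB_RELEASE "$tmp/lsb-release"   # DISTRIB_RELEASE=12.04
```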

Ganglia configuration

Ganglia relies on apache2, whose configuration layout changed in 14.04 from /etc/apache2/conf.d to /etc/apache2/conf-available and /etc/apache2/conf-enabled (more info in the Debian sources). So another hack is required: create a symlink from /etc/apache2/conf.d to /etc/apache2/conf-enabled, otherwise Ganglia will fail to install:

$ ln -fs /etc/apache2/conf-enabled/ /etc/apache2/conf.d
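The effect of the symlink can be sanity-checked in a scratch directory first (a hedged sketch; the paths here are illustrative, not the real apache2 ones):

```shell
# Recreate the 14.04 apache2 layout in a sandbox
tmp=$(mktemp -d)
mkdir "$tmp/conf-enabled"

# conf.d becomes a symlink to conf-enabled, mirroring the command above
ln -fs "$tmp/conf-enabled" "$tmp/conf.d"

# Anything the ganglia package would drop into conf.d lands in conf-enabled
touch "$tmp/conf.d/ganglia.conf"
ls "$tmp/conf-enabled"   # ganglia.conf
```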

Pointing libpostgresql-jdbc-java to the correct jar

The libpostgresql-jdbc-java package is required to use Ambari, Hive, Oozie, etc. with PostgreSQL. In 14.04, it provides two jars, /usr/share/java/postgresql-jdbc3.jar and /usr/share/java/postgresql-jdbc4.jar. Hive, at least, looks for /usr/share/java/postgresql-jdbc.jar (version agnostic), so a symlink to the jdbc4 jar needs to be created to avoid a startup failure:
  • In Ansible:
    file: src=/usr/share/java/postgresql-jdbc4.jar dest=/usr/share/java/postgresql-jdbc.jar state=link
  • Simply in bash:
    ln -sf  /usr/share/java/postgresql-jdbc4.jar /usr/share/java/postgresql-jdbc.jar
Another link to update is for the PostgreSQL jar file sent out by Ambari to test the JDBC connection.
  • In Ansible:
    file: src=/usr/share/java/postgresql-jdbc.jar dest=/var/lib/ambari-server/resources/postgres-jdbc-driver.jar state=link
  • Simply in bash:
    ln -sf /usr/share/java/postgresql-jdbc.jar /var/lib/ambari-server/resources/postgres-jdbc-driver.jar
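A quick way to confirm the chain of links resolves correctly is readlink -f; sketched here in a sandbox so it does not require the real jars (paths are illustrative):

```shell
# Fake jdbc4 jar plus the two links described above, in a sandbox
tmp=$(mktemp -d)
touch "$tmp/postgresql-jdbc4.jar"
ln -sf "$tmp/postgresql-jdbc4.jar" "$tmp/postgresql-jdbc.jar"
ln -sf "$tmp/postgresql-jdbc.jar" "$tmp/postgres-jdbc-driver.jar"

# Both names must resolve to the real jdbc4 jar
readlink -f "$tmp/postgres-jdbc-driver.jar"
```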

Testing your deployment (aka updated 1TB terasort copy-paste example for HDP 2.2)


$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teragen -Ddfs.block.size=536870912 -Dmapred.map.tasks=20 \
    10000000000 /tmp/1Tsort/input
$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    terasort -Ddfs.block.size=536870912 -Dmapred.reduce.tasks=20 -Dmapreduce.terasort.output.replication=3 \
    /tmp/1Tsort/input /tmp/1Tsort/output
$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teravalidate /tmp/1Tsort/output /tmp/1Tsort/report
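Teragen writes rows of 100 bytes each, so the row count above does indeed yield about 1 TB of input:

```shell
# 10,000,000,000 rows x 100 bytes/row = 10^12 bytes, i.e. ~1 TB
echo $((10000000000 * 100))   # 1000000000000
```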

Find more details about HDP 2.2 in the complete HDP 2.2 documentation.

6 comments:

  1. If your namenode fails to (re)start with an error message like

    2015-01-27 21:06:44,360 - Retrying after 10 seconds. Reason: Execution of 'su -s /bin/bash - hdfs -c 'export PATH=$PATH:/usr/hdp/current/hadoop-client/bin ; hdfs --config /etc/hadoop/conf dfsadmin -safemode get' | grep 'Safe mode is OFF'' returned 1. stdin: is not a tty

    you might want to apply the following patch in the node running the namenode:

    # diff -u /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_namenode.py-orig /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_namenode.py
    --- /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_namenode.py-orig 2015-01-27 21:43:56.589503784 +0100
    +++ /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_namenode.py 2015-01-27 21:41:36.012206081 +0100
    @@ -49,7 +49,7 @@
       else:
         dfs_check_nn_status_cmd = None

    -  namenode_safe_mode_off = format("su -s /bin/bash - {hdfs_user} -c 'export PATH=$PATH:{hadoop_bin_dir} ; hdfs --config {hadoop_conf_dir} dfsadmin -safemode get' | grep 'Safe mode is OFF'")
    +  namenode_safe_mode_off = format("su -s /bin/bash - {hdfs_user} -c 'export PATH=$PATH:{hadoop_bin_dir} ; hdfs --config {hadoop_conf_dir} dfsadmin -safemode get' | grep 'Safe mode is OFF'; true")

       if params.security_enabled:
         Execute(format("{kinit_path_local} -kt {hdfs_user_keytab} {hdfs_principal_name}"),

    Yeah, basically: return true even if the su command fails, in order to move forward. The drawback is that the restart will not wait for Safe Mode to be turned off.
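    The effect of the trailing `; true` on the exit status is easy to reproduce in a plain shell:

```shell
# Without the trailing true: grep finds no match, the pipeline exits 1,
# and Ambari keeps retrying
sh -c "echo 'Safe mode is ON' | grep 'Safe mode is OFF'"; echo $?   # 1

# With '; true' appended, the command list always exits 0
sh -c "echo 'Safe mode is ON' | grep 'Safe mode is OFF'; true"; echo $?   # 0
```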

  2. Thanks for your precious information!
    To avoid modifying the /etc/*-release system files, you can instead edit the os_check.py file in these functions:
    get_os_version: dist = "12.04"
    get_os_major_version: return "12"

    Also, depending on the Java version installed on Ubuntu and the one set in Ambari, you should probably set a symlink or use alternatives between Java versions.

    Note: this works fine with a client HDP 2.2 installation on a cluster running Ambari 1.7.0.
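    For illustration, the hardcoded overrides could look roughly like this (a sketch only; the actual code in your Ambari version's os_check.py will differ, and the real module may wrap these in an OSCheck class):

```python
# Hypothetical hardcoded overrides for ambari_commons/os_check.py;
# function bodies are assumptions, not the shipped implementation.
def get_os_version():
    # Always report the Ambari-supported Ubuntu release
    return "12.04"

def get_os_major_version():
    # Ambari matches "ubuntu" + major version against its hardcoded list
    return "12"

print("ubuntu" + get_os_major_version())  # ubuntu12
```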

  3. This is great stuff! I was able to install HDP 2.2 with Ambari 2.0.1 using your sed hack. I didn't have to do the remaining adjustments. However, I installed Storm, and I had to make a symbolic link in order for it to work on all my machines running Storm:
    sudo ln -s /usr/lib/jvm/java-7-oracle/bin/jps /usr/lib/jvm/java-7-oracle/jre/bin/jps

  4. Benoit:

    How did you get the Ambari package lists to work for apt? In my Trusty setup it ignores these packages:

    ...
    Hit http://us.archive.ubuntu.com trusty/restricted Translation-en
    Ign http://public-repo-1.hortonworks.com Ambari/main Translation-en_US
    Ign http://public-repo-1.hortonworks.com Ambari/main Translation-en
    Hit http://us.archive.ubuntu.com trusty/universe Translation-en
    ...

    Here is the content of my ambari.list: (I am trying Ambari 2.0.1)
    deb http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.1 Ambari main

    Replies
    1. I just brewed something for you: https://github.com/killerwhile/hdp-utils
      In a nutshell, Hortonworks maintains a URL in S3 with pointers to the most recent version of HDP: http://s3.amazonaws.com/dev.hortonworks.com/HDP/hdp_urlinfo.json
      Interesting to see there that HDP 2.4 is in preparation, as well as a few other major upgrades.
