14 January 2015

Rename a host in Ambari (1.7.0)

Ambari can be tricky for some real-life operations, like moving a component from a dead node to another one (for instance, the App Timeline Server can't be moved to another host if the current host is dead). We end up in corner cases where it's easier to rename a node in the database than to keep failing at moving the component.

But updating the hostname is not as straightforward as update hosts set host_name = 'new-hostname' where host_name = 'old-hostname', because the hostname is the primary key of the hosts table and is referenced by foreign keys in several other tables.

Updating the hostname in all the required tables in the same transaction does not work either, because the foreign key checks fail while the update is running. We need to defer (or disable) the constraint checks for the transaction to run successfully.
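
For the record, PostgreSQL can postpone foreign key checks until COMMIT, but only for constraints declared DEFERRABLE, which the Ambari schema does not appear to do; that is why the PostgreSQL procedure below drops and re-creates the constraints instead. As a sketch only, the deferred approach would look like this if the constraints were deferrable (hostnames are placeholders):

# Reference sketch: works ONLY if the foreign keys were created as DEFERRABLE,
# which is not the case in a stock Ambari database.
sudo -u postgres psql -d ambari <<'SQL'
BEGIN;
SET CONSTRAINTS ALL DEFERRED;  -- postpone all deferrable FK checks until COMMIT
UPDATE ambari.hosts SET host_name = 'new-hostname' WHERE host_name = 'old-hostname';
-- ... same UPDATE for the mapping and state tables listed below ...
COMMIT;
SQL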
MySQL

If you chose MySQL as the backend DB, lucky you, it's fairly trivial:
SET FOREIGN_KEY_CHECKS=0; 

BEGIN;
UPDATE ambari.clusterhostmapping set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.configgrouphostmapping set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.host_role_command set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hostcomponentdesiredstate set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hostcomponentstate set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hostconfigmapping set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hoststate set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hosts set host_name = '${new-name}' WHERE host_name = '${old-name}';
COMMIT;

SET FOREIGN_KEY_CHECKS=1;
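
One way to run it, as a sketch (the ambari user and database names are the defaults and may differ in your setup):

# Save the statements above as rename-host.sql, with the placeholders filled in,
# then execute the whole script in one go.
mysql -u ambari -p ambari < rename-host.sql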
PostgreSQL

If you have Postgres as the backend DB, well, there's a bit more work to do:
  1. Delete the foreign key constraints referencing the hosts.host_name primary key (yes, really)
  2. Update the hostname
  3. Re-create the foreign key constraints
Automatically find the constraints

Connect to postgres with the admin user

sudo su - postgres
psql -d ambari

Run a query which will generate the ALTER commands to drop all the constraints on hosts.host_name

SELECT 'ALTER TABLE '||nspname||'.'||relname||' DROP CONSTRAINT '||conname||';'
FROM pg_constraint
INNER JOIN pg_class ON conrelid=pg_class.oid
INNER JOIN pg_namespace ON pg_namespace.oid=pg_class.relnamespace
where conname in ('hstcmponentdesiredstatehstname', 'hostcomponentstate_host_name', 'fk_hoststate_host_name', 'fk_hostconfmapping_host_name', 'fk_host_role_command_host_name', 'fk_cghm_hname', 'clusterhostmapping_cluster_id') ORDER BY CASE WHEN contype='f' THEN 0 ELSE 1 END,contype,nspname,relname,conname;

Run a query which will generate the ALTER commands to re-create all the constraints on hosts.host_name

SELECT 'ALTER TABLE '||nspname||'.'||relname||' ADD CONSTRAINT '||conname||' '|| pg_get_constraintdef(pg_constraint.oid)||';'
FROM pg_constraint
INNER JOIN pg_class ON conrelid=pg_class.oid
INNER JOIN pg_namespace ON pg_namespace.oid=pg_class.relnamespace
where conname in ('hstcmponentdesiredstatehstname', 'hostcomponentstate_host_name', 'fk_hoststate_host_name', 'fk_hostconfmapping_host_name', 'fk_host_role_command_host_name', 'fk_cghm_hname', 'clusterhostmapping_cluster_id')
ORDER BY CASE WHEN contype='f' THEN 0 ELSE 1 END DESC,contype DESC,nspname DESC,relname DESC,conname DESC;

Save the ALTER commands in your favorite text editor for later, or use the ones provided below.
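
If you'd rather not copy/paste, the generated statements can also be fed straight back to psql. A sketch, assuming the DROP-statement generator query above has been saved as drop_constraints.sql (a file name chosen here for illustration; the same trick works for the ADD statements):

# -t removes headers/footers and -A the alignment padding, so only the
# generated ALTER TABLE statements are piped into the second psql.
psql -d ambari -t -A -f drop_constraints.sql | psql -d ambari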
Delete the constraints

 ALTER TABLE ambari.clusterhostmapping DROP CONSTRAINT clusterhostmapping_cluster_id;
 ALTER TABLE ambari.configgrouphostmapping DROP CONSTRAINT fk_cghm_hname;
 ALTER TABLE ambari.host_role_command DROP CONSTRAINT fk_host_role_command_host_name;
 ALTER TABLE ambari.hostcomponentdesiredstate DROP CONSTRAINT hstcmponentdesiredstatehstname;
 ALTER TABLE ambari.hostcomponentstate DROP CONSTRAINT hostcomponentstate_host_name;
 ALTER TABLE ambari.hostconfigmapping DROP CONSTRAINT fk_hostconfmapping_host_name;
 ALTER TABLE ambari.hoststate DROP CONSTRAINT fk_hoststate_host_name;
Update the hostname

Replace ${old-name} and ${new-name} with the appropriate values.
BEGIN;
UPDATE ambari.clusterhostmapping set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.configgrouphostmapping set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.host_role_command set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hostcomponentdesiredstate set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hostcomponentstate set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hostconfigmapping set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hoststate set host_name = '${new-name}' WHERE host_name = '${old-name}';
UPDATE ambari.hosts set host_name = '${new-name}' WHERE host_name = '${old-name}';
COMMIT;
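
Before re-creating the constraints, a quick sanity check doesn't hurt (a sketch; replace old-hostname with the value you used for ${old-name}):

# Every count should be 0 once the rename went through.
for t in clusterhostmapping configgrouphostmapping host_role_command \
         hostcomponentdesiredstate hostcomponentstate hostconfigmapping \
         hoststate hosts; do
  echo -n "$t: "
  psql -d ambari -t -A -c "SELECT count(*) FROM ambari.$t WHERE host_name = 'old-hostname'"
done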

Recreate the constraints
 ALTER TABLE ambari.hoststate ADD CONSTRAINT fk_hoststate_host_name FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);
 ALTER TABLE ambari.hostconfigmapping ADD CONSTRAINT fk_hostconfmapping_host_name FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);
 ALTER TABLE ambari.hostcomponentstate ADD CONSTRAINT hostcomponentstate_host_name FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);
 ALTER TABLE ambari.hostcomponentdesiredstate ADD CONSTRAINT hstcmponentdesiredstatehstname FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);
 ALTER TABLE ambari.host_role_command ADD CONSTRAINT fk_host_role_command_host_name FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);
 ALTER TABLE ambari.configgrouphostmapping ADD CONSTRAINT fk_cghm_hname FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);
 ALTER TABLE ambari.clusterhostmapping ADD CONSTRAINT clusterhostmapping_cluster_id FOREIGN KEY (host_name) REFERENCES ambari.hosts(host_name);

Done.

Restart ambari-server

Once you have updated the hostname, you need to restart ambari-server.
Actually, you should shut it down before running the queries, to make sure the DB stays in a consistent state.
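
Something along these lines (a sketch):

# Stop Ambari before touching the database, run the SQL above, then start it again.
ambari-server stop
# ... rename the host in the backend DB ...
ambari-server start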

Resources

If you're interested in more general reading on deferred foreign key checks, I can recommend the following article, which was a great inspiration to me:



06 January 2015

Hadoop hftp:// returns an error HTTP_OK expected, received 500

It's fairly common these days to migrate from one version of Hadoop to another. Hopefully you have two clusters: the plan is to upgrade one cluster, sync some data, then migrate the second one. This means that for a certain period of time you'll have two different versions of Hadoop running (CDH4 and CDH5 for instance, HDP 1.3 and HDP 2.2, or a mix of both).
To transfer data between two clusters running different major versions like these, the best way is to use distcp, reading from the source via HFTP and writing to the destination cluster via HDFS, HFTP being read-only.
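
For example, run from the destination cluster, something along these lines (hostnames and paths are placeholders; 50070 is the default NameNode HTTP port):

# Read from the old cluster over HFTP (read-only), write to the new cluster over HDFS.
hadoop distcp hftp://old-namenode:50070/data/logs hdfs://new-namenode:8020/data/logs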

Now things become trickier when your distcp jobs fail with the error java.io.IOException: HTTP_OK expected, received 500:

FAIL randomFile.txt : java.io.IOException: HTTP_OK expected, received 500
 at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderUrlOpener.connect(HftpFileSystem.java:380)
 at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
 at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
 at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
 at java.io.DataInputStream.read(DataInputStream.java:83)
 at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:426)
 at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:549)
 at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:316)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)

I will first give a quick solution to the problem, then explain the root cause of the issue, dig into the HFTP protocol, get my hands into the NameNode source code and, more generally, share a few tricks for debugging the Hadoop stack.

Short version

Disable "Wildcard binding" in the NameNode. HFTP does not work with the namenode binding on all the interfaces (0.0.0.0). Seriously!

In Cloudera Manager, go to the HDFS configuration and search for Wildcard.

Ensure the property dfs.namenode.rpc-address in hdfs-site.xml points to the NameNode's actual address, and is not set to 0.0.0.0 as in the broken configuration below:

<property>
    <name>dfs.namenode.rpc-address</name>
    <value>0.0.0.0:8020</value>
</property>

Fix the value if needed, and restart the NameNode.
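
You can double-check the value the configuration actually resolves to with hdfs getconf, run on the NameNode host:

# Should print the real NameNode address (e.g. namenode1:8020), not 0.0.0.0:8020.
hdfs getconf -confKey dfs.namenode.rpc-address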

I reported the issue in Apache Jira: HDFS-7586

If you're eager for more details, keep reading!

HFTP

HFTP is another data transfer protocol built into Hadoop core.
From the server perspective, it exposes two HTTP endpoints (servlets) in the NameNode and another one in the DataNode:

  • NameNode
    • http://namenode:50070/listPaths/${path_to_file_in_hdfs} is handled by org.apache.hadoop.hdfs.server.namenode.ListPathServlet
    • http://namenode:50070/data/${path_to_file_in_hdfs} is handled by org.apache.hadoop.hdfs.server.namenode.FileDataServlet and redirects (header Location) the client to a datanode (storing the first block)
  • DataNode
    • http://datanode:50075/streamFile/${path_to_file_in_hdfs} is handled by org.apache.hadoop.hdfs.server.namenode.StreamFile and streams the file to the client.
On the client side, the org.apache.hadoop.fs.FileSystem implementation is org.apache.hadoop.hdfs.HftpFileSystem, but since it's all plain HTTP requests, curl can be used to emulate the calls:

  1. Calling the namenode to find out where to download from
    $ curl -vv 'http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs' 2>&1 | grep "Location:"
     Location: http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=10.0.0.4:8020
  2. Downloading the file
    $ curl -vv 'http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=10.0.0.4:8020'
Easy.
Note: we can call any http://datanode:50075 to download the file, but calling the one returned by the namenode guarantees that you're reading from the datanode storing the most blocks locally (see the org.apache.hadoop.hdfs.server.common.JspHelper.getBestNode() function for more details).

Note 2: it seems the datanode receives the namenode IP as a parameter. Not sure why yet, but it's required.
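
For instance, any other datanode can serve the same file as long as the nnaddr parameter is passed along (the datanode name is a placeholder; quote the URL so the shell does not swallow the &):

# Any datanode works as a proxy, but the one returned by the namenode
# holds the most blocks of the file locally.
curl -v 'http://datanode007:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=10.0.0.4:8020'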
Debugging without using curl

Not everybody is familiar with curl, but there's another way, easier and thus more limited, to debug the Hadoop client: turning on verbose logging in the hadoop or hdfs command-line tool.
In CDH4:
export HADOOP_ROOT_LOGGER=TRACE,console
In CDH5:
export HADOOP_OPTS="-Droot.logger=TRACE,console"

After setting this environment variable, hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt becomes much noisier:

$ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt
15/01/06 07:39:29 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
15/01/06 07:39:29 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
...
15/01/06 07:39:30 TRACE fs.FileSystem: url=http://namenode1:50070/listPaths/tmp/randomFile.txt?ugi=hdfs
15/01/06 07:39:30 DEBUG security.Groups: Returning cached groups for 'hdfs'
15/01/06 07:39:30 TRACE fs.FileSystem: url=http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs
cat: HTTP_OK expected, received 500

As you can see, we get a lot more information. You still need curl to dig further here, but in many other scenarios, turning on DEBUG or TRACE logging on the client is a vital first step.

HTTP_OK expected, received 500

We now understand HFTP and are more familiar with its servlet endpoints.

The trace above is still missing some details, mainly which HTTP request is made to the datanode. For that, we need curl.

$ curl -vv 'http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs' 2>&1 | grep "Location:"
 Location: http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=0.0.0.0:8020

Ouch, 0.0.0.0:8020. We saw earlier that the datanode tries to contact the namenode address received as a parameter. Opening a connection to 0.0.0.0 will always fail...

And here is the stack trace from the datanode for the failing request:

java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:510)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:604)
 at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
 at org.apache.hadoop.ipc.Client.call(Client.java:1209)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy15.getBlockLocations(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor147.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at $Proxy15.getBlockLocations(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:155)
 at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
 at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
 at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:239)
 at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:206)
 at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:199)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1103)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1089)
 at org.apache.hadoop.hdfs.server.namenode.StreamFile.doGet(StreamFile.java:94)

Obviously, the datanode can't connect to the NameNode. We've got the root cause.


But where does this 0.0.0.0 come from?

That's the question. The FileDataServlet is responsible for redirecting the client, so it should be the culprit. Digging into the org.apache.hadoop.hdfs.server.namenode.FileDataServlet source code shows that the redirection URI is generated using the dfs.namenode.rpc-address parameter (DFSConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY) to get the IP address. But if you turned on wildcard binding, this value is 0.0.0.0:8020 in hdfs-site.xml:

<property>
    <name>dfs.namenode.rpc-address</name>
    <value>0.0.0.0:8020</value>
</property>

This concludes our investigation: HFTP is broken when the wildcard binding feature is turned on.

Now, how to fix that?

The easiest way is to disable wildcard binding if it's not really needed. But if it is really needed, then a code change is required:

Since the servlet knows which of its IP addresses the request came in on, it could use that IP instead of 0.0.0.0. This might not work in every scenario, e.g. when HFTP listens on one interface and HDFS RPC on another, but it would already do a better job than the current, broken behaviour.

--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java
@@ -45,6 +45,7 @@ import org.apache.hadoop.util.ServletUtil;
 public class FileDataServlet extends DfsServlet {
   /** For java.io.Serializable */
   private static final long serialVersionUID = 1L;
+  private static final String WILDCARD_INTERFACE = "0.0.0.0";
 
   /** Create a redirection URL */
   private URL createRedirectURL(String path, String encodedPath, HdfsFileStatus status, 
@@ -80,6 +81,19 @@ public class FileDataServlet extends DfsServlet {
     NameNode nn = NameNodeHttpServer.getNameNodeFromContext(
         getServletContext());
     String addr = NetUtils.getHostPortString(nn.getNameNodeAddress());
+    // Wildcard binding special case: returning 0.0.0.0 to the DN is always wrong.
+    if (addr != null && addr.startsWith(WILDCARD_INTERFACE + ":")) {
+        LOG.warn("Wildcard binding seems to be turned on. Returning 0.0.0.0 is always wrong. " +
+            "The best we can do is using the ip address the requested is connected to " +
+            "(HttpServletRequest.getLocalAddr()) instead (" + request.getLocalAddr() +"), but this might " +
+            "not always be correct (multiple interface, hftp listen to one interface, hdfs-rpc to another one). " +
+            "Please turn off wildcard binding if you're using HFTP, i.e. " +
+            "dfs.namenode.rpc-address (DFSConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY) and " +
+            "dfs.namenode.http-address must (DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY) " +
+            "must point to a ip address different than 0.0.0.0.");
+
+        addr = addr.replace(WILDCARD_INTERFACE, request.getLocalAddr());
+    }
     String addrParam = JspHelper.getUrlParam(JspHelper.NAMENODE_ADDRESS, addr);
     
     return new URL(scheme, hostname, port,

Happy Hadoop debugging and patching!

Final Note: Hadoop Core has really good test coverage, but the MiniDFSCluster forces listening on 127.0.0.1, i.e. it does not test anything with regard to wildcard binding. That's probably why this issue wasn't caught by the tests.