06 January 2015

Hadoop hftp:// returns an error HTTP_OK expected, received 500

It's fairly common these days to migrate from one version of Hadoop to another. Ideally you have two clusters: you upgrade one cluster, sync some data, then migrate the second one. This means that for a certain period of time, you'll have two different versions of Hadoop running (CDH4 and CDH5 for instance, HDP1.3 and HDP2.2, or a mix of both).
To transfer data between two clusters with different major versions like these, the best way is to use distcp, reading from the source via HFTP and writing to the destination cluster with HDFS, HFTP being read-only.
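Such a cross-version copy is typically launched from the destination (newer) cluster, so that the newer client reads over HFTP and writes natively to its own HDFS. A sketch, with placeholder hostnames and the default ports (50070 for the NameNode HTTP server, 8020 for RPC):

```shell
# Run from the destination (newer) cluster.
# Source is read over HFTP (read-only); destination is written over HDFS.
# "old-namenode" and "new-namenode" are placeholders, not real hosts.
hadoop distcp \
  hftp://old-namenode:50070/data/dir \
  hdfs://new-namenode:8020/data/dir
```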

Now things become more tricky when your distcp jobs fail with the error java.io.IOException: HTTP_OK expected, received 500:

FAIL randomFile.txt : java.io.IOException: HTTP_OK expected, received 500
 at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderUrlOpener.connect(HftpFileSystem.java:380)
 at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
 at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
 at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
 at java.io.DataInputStream.read(DataInputStream.java:83)
 at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:426)
 at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:549)
 at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:316)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)

I will give a quick solution to the problem first, but in a second phase I will explain the root cause of this issue, dig into the HFTP protocol, get my hands into the namenode source code and, more generally, share some tricks for debugging the Hadoop stack.

Short version

Disable "Wildcard binding" in the NameNode. HFTP does not work with the namenode bound to all the interfaces (0.0.0.0). Seriously!

In Cloudera Manager, go to the HDFS configuration and search for Wildcard.

Ensure the property dfs.namenode.rpc-address in hdfs-site.xml points to something different from 0.0.0.0.

And restart the NameNode.
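For instance, a non-wildcard setting in hdfs-site.xml would look like this (the hostname is a placeholder for your NameNode's actual address; 8020 is the default RPC port):

```xml
<!-- hdfs-site.xml: bind the NameNode RPC to a real address, not 0.0.0.0 -->
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>namenode1.example.com:8020</value>
</property>
```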

I reported the issue in Apache Jira: HDFS-7586

If you're eager for more details, keep reading!


HFTP is another data transfer protocol built into Hadoop core.
From the server perspective, it exposes two HTTP endpoints (servlets) in the NameNode and one in the DataNode:

  • NameNode
    • http://namenode:50070/listPaths/${path_to_file_in_hdfs} is handled by org.apache.hadoop.hdfs.server.namenode.ListPathServlet
    • http://namenode:50070/data/${path_to_file_in_hdfs} is handled by org.apache.hadoop.hdfs.server.namenode.FileDataServlet and redirects (header Location) the client to a datanode (storing the first block)
  • DataNode
    • http://datanode:50075/streamFile/${path_to_file_in_hdfs} is handled by org.apache.hadoop.hdfs.server.namenode.StreamFile and streams the file to the client.
From the client side, the org.apache.hadoop.fs.FileSystem implementation is org.apache.hadoop.hdfs.HftpFileSystem, but since it's all plain HTTP requests, curl can be used to emulate the calls:

  1. Calling the namenode to find out where to download from
    $ curl -vv "http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs" 2>&1 | grep "Location:"
     Location: http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=
  2. Downloading the file
    $ curl -vv "http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr="
Note: we could call any http://datanode:50075 to download the file, but calling the one returned by the namenode guarantees you're reading from the datanode storing the most blocks locally (see the org.apache.hadoop.hdfs.server.common.JspHelper.getBestNode() function for more details).

Note 2: it seems the datanode receives the namenode address as a parameter (nnaddr). Not sure why, but it's required.
Debugging without using Curl

Not everybody is familiar with curl, but there's another, easier (and thus more limited) way to debug the hadoop client: turning on logging in the hadoop or hdfs command-line tool.
In CDH4:
export HADOOP_ROOT_LOGGER=TRACE,console
In CDH5:
export HADOOP_OPTS="-Droot.logger=TRACE,console"

After setting this environment variable, hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt will start to be more noisy:

$ hdfs dfs -cat hftp://namenode1/tmp/randomFile.txt
15/01/06 07:39:29 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
15/01/06 07:39:29 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
15/01/06 07:39:30 TRACE fs.FileSystem: url=http://namenode1:50070/listPaths/tmp/randomFile.txt?ugi=hdfs
15/01/06 07:39:30 DEBUG security.Groups: Returning cached groups for 'hdfs'
15/01/06 07:39:30 TRACE fs.FileSystem: url=http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs
cat: HTTP_OK expected, received 500
As you can see, we get more information this way. You still need curl to dig further here, but in a lot of other scenarios, turning on DEBUG or TRACE logging on the client is a vital first step.

HTTP_OK expected, received 500

We now understand HFTP and are more familiar with its Servlet endpoints.

The trace above is still missing some details, mostly which HTTP request is made to the datanode. For this, we need curl.

$ curl -vv "http://namenode1:50070/data/tmp/randomFile.txt?ugi=hdfs" 2>&1 | grep "Location:"
 Location: http://datanode003:50075/streamFile/tmp/randomFile.txt?ugi=hdfs&nnaddr=

Ouch. We saw earlier that the datanode tries to contact the namenode address received as a parameter. Opening a connection to 0.0.0.0 will always fail...

And the stacktrace relative to the failed request to the datanode:

java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:510)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:604)
 at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
 at org.apache.hadoop.ipc.Client.call(Client.java:1209)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy15.getBlockLocations(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor147.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at $Proxy15.getBlockLocations(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:155)
 at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
 at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
 at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:239)
 at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:206)
 at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:199)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1103)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1089)
 at org.apache.hadoop.hdfs.server.namenode.StreamFile.doGet(StreamFile.java:94)
Obviously, the datanode can't connect to the NameNode. We got our root cause.

But where does this come from?

That's the question. The FileDataServlet is responsible for redirecting the client, so it must be the guilty party. Digging into the org.apache.hadoop.hdfs.server.namenode.FileDataServlet source code shows that the redirection URI is generated using the dfs.namenode.rpc-address parameter (DFSConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY) to get the IP address. But if you turned on wildcard binding, this value in hdfs-site.xml contains 0.0.0.0.

This concludes our investigation: HFTP is broken with the wildcard binding feature turned on.

Now, how to fix that?

The easiest way is to disable wildcard binding if it's not really needed. But if it is really needed, then a code change is required:

Since the Servlet knows which of its IP addresses the request came in on, it could use this IP instead of 0.0.0.0. This might not work in all scenarios (e.g. HFTP listening on one interface and HDFS RPC on another), but it would already do a better job than the currently broken behavior.

--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileDataServlet.java
@@ -45,6 +45,7 @@ import org.apache.hadoop.util.ServletUtil;
 public class FileDataServlet extends DfsServlet {
   /** For java.io.Serializable */
   private static final long serialVersionUID = 1L;
+  private static final String WILDCARD_INTERFACE = "0.0.0.0";

   /** Create a redirection URL */
   private URL createRedirectURL(String path, String encodedPath, HdfsFileStatus status,
@@ -80,6 +81,19 @@ public class FileDataServlet extends DfsServlet {
     NameNode nn = NameNodeHttpServer.getNameNodeFromContext(getServletContext());
     String addr = NetUtils.getHostPortString(nn.getNameNodeAddress());
+    // Wildcard binding special case: returning 0.0.0.0 to the client is always wrong.
+    if (addr != null && addr.startsWith(WILDCARD_INTERFACE + ":")) {
+        LOG.warn("Wildcard binding seems to be turned on. Returning " + WILDCARD_INTERFACE +
+            " is always wrong. The best we can do is using the ip address the request is " +
+            "connected to (HttpServletRequest.getLocalAddr()) instead (" + request.getLocalAddr() +
+            "), but this might not always be correct (multiple interfaces, hftp listening on " +
+            "one interface, hdfs-rpc on another one). Please turn off wildcard binding if " +
+            "you're using HFTP, i.e. dfs.namenode.rpc-address (DFSConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY) " +
+            "and dfs.namenode.http-address (DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY) must " +
+            "point to an ip address different from " + WILDCARD_INTERFACE + ".");
+        addr = addr.replace(WILDCARD_INTERFACE, request.getLocalAddr());
+    }
     String addrParam = JspHelper.getUrlParam(JspHelper.NAMENODE_ADDRESS, addr);
     return new URL(scheme, hostname, port,
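The substitution at the heart of this patch can be exercised standalone. A sketch with hypothetical placeholder addresses (not from the original setup):

```shell
# Sketch of the patch's logic: if the configured NameNode address is
# wildcard-bound, substitute the interface the request actually arrived on.
addr="0.0.0.0:8020"      # what dfs.namenode.rpc-address resolves to (assumed)
local_addr="10.0.0.12"   # stand-in for HttpServletRequest.getLocalAddr()
case "$addr" in
  0.0.0.0:*) addr="${local_addr}:${addr#0.0.0.0:}" ;;
esac
echo "$addr"   # prints 10.0.0.12:8020
```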

Happy Hadoop debugging and patching!

Final Note: Hadoop Core has really good code coverage, but MiniDFSCluster forces the services to listen on 127.0.0.1, i.e. it does not test anything with regard to wildcard binding. That's probably why this issue wasn't caught by the tests.
