Hadoop File Upload Process Inner Workings

I'm currently stuck on a problem where I can upload files to HDFS when running the client from any of the nodes actually in the cluster, but can't do the same when running the client from my local computer (even though I can do things like run an ls from my local client). I'm pretty sure this is a ports issue, but the smaller problem got me thinking that I'd like to understand exactly what communication happens between my client computer, the namenode, and the datanodes when I try to upload a file. So, can anyone enlighten me? What exactly happens, when, over what ports, and between which computers?
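For illustration, here is a minimal sketch of the two operations in question, assuming a classic Hadoop 1.x/2.x Java client with default ports (the hostnames and paths are placeholders, not taken from the question). An ls is a single RPC to the namenode, while an upload additionally requires the client to connect directly to the datanode addresses the namenode hands back:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsUploadProbe {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // All metadata operations go over the namenode's RPC port
            // (commonly 8020 or 9000).
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

            // An ls only talks to the namenode, so it works even when the
            // client can't reach the datanodes.
            for (FileStatus s : fs.listStatus(new Path("/"))) {
                System.out.println(s.getPath());
            }

            // An upload first asks the namenode to allocate blocks; the
            // namenode replies with a list of datanode addresses. The client
            // then streams the block data directly to the first datanode
            // (data-transfer port, 50010 by default in that era), which
            // pipelines it to the replicas. This direct client-to-datanode
            // connection is the step that fails if the addresses the
            // namenode returns aren't reachable from the client.
            fs.copyFromLocalFile(new Path("/tmp/somefile.txt"), new Path("/user/me/somefile.txt"));

            fs.close();
        }
    }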

Answers


This turned out to be an EC2 issue: the namenode returned the datanodes' EC2 private IPs to all clients, regardless of whether they were inside EC2 or on our private network. Those IPs obviously wouldn't work for clients outside of EC2, so any operation that involved a datanode and was initiated from outside of EC2 would fail. I never found a good solution for this and just decided to make people query from inside EC2 for now.
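One related knob worth noting, not mentioned in the answer above and so an assumption about your Hadoop version: later releases added the client-side property dfs.client.use.datanode.hostname, which makes the client connect to datanodes by hostname instead of the namenode-reported IP, so that DNS visible to the external client can resolve those hostnames to reachable (e.g. public EC2) addresses. A minimal sketch, assuming a version that supports it:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ExternalClientConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Resolve datanodes by hostname on the client side rather than
            // using the private IPs the namenode hands back (assumes the
            // datanode hostnames resolve to reachable addresses from here).
            conf.setBoolean("dfs.client.use.datanode.hostname", true);
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);
            // ... uploads now stream to datanodes via their hostnames ...
            fs.close();
        }
    }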

