Difference between Rack Awareness and Name node
I was going through Hadoop, I have doubt whether there is difference between Rack wareness and Name Node. Will Rack wareness and name node will remain on same box
As Aviral rightly said, the question has been quite vague. But just quoting for your understanding,
Namenode : The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds the successful requests by returning a list of relevant DataNode servers where the data lives. You can read in detail about this concept here.
Rack Awareness : In simple words rack awareness is the strategy namenode employs to choose the nearest datanode based on rack information. You can read details here
Further more, I would like to suggest this blog
Image credits Brad Hedlund
From Apache HDFS Users Guide
HDFS is the primary distributed storage used by Hadoop applications.
A HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data
Typically large Hadoop clusters are arranged in racks and network traffic between different nodes with in the same rack is much more desirable than network traffic across the racks. In addition NameNode tries to place replicas of block on multiple racks for improved fault tolerance.
From RackAwareness tutorial:
Hadoop components are rack-aware. For example, HDFS block placement will use rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
Let's see how Hadoop writes are implemented.
If the writer is on a datanode, the 1st replica is placed on the local machine, otherwise a random datanode.
The 2nd replica is placed on a datanode that is on a different rack.
The 3rd replica is placed on a datanode which is on a different node of the rack as the second replica.
Due to replication of data blocks on three different nodes across two different RACs, Hadoop read operations provides high availability of data blocks.
At least one replica is stored on different RAC. If one RAC is not accessible, still Hadoop can fetch data block from other RAC.