Adding new, bigger disks to a datanode
I am running an HDFS cluster with several datanodes; each datanode has 8 x 1TB hard drives.
I want to add 2 x 2TB hard drives to each datanode. I know how to add new drives to a datanode, but I am confused because the new drives are bigger than the old ones, so there may be a problem with data distribution among the drives on a datanode.
I am thinking it would be better to create 2 logical volumes (1TB each) on every 2TB drive and mount those to the OS, so that every datanode data directory has the same capacity.
I need some advice. Thanks for reading!
If you have mixed-size disks in a datanode, it is a common problem that the smaller disks fill faster than the bigger ones. This is because the default volume choosing policy in the datanode is round robin: the datanode writes new blocks to each disk in turn, taking no account of the size of the disks or their free space.
There is an alternative volume choosing policy, AvailableSpaceVolumeChoosingPolicy, which is ideal for datanodes with mixed-size disks. I am not sure which Hadoop distribution you are using, but it is covered in the CDH documentation.
If you change to that policy, then by default 75% of new writes go to the under-used disks until they catch up with the others, after which writes fall back to round robin.
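To enable it, something like the following fragment in hdfs-site.xml on each datanode should work (the property names are from stock Apache Hadoop; the threshold and fraction values shown here are the documented defaults, so adjust them only if needed):

```xml
<!-- hdfs-site.xml on each datanode -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>

<!-- Disks whose free space differs by less than this many bytes are
     considered balanced and written to round-robin; default is 10 GB -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>

<!-- Fraction of new block writes directed to the disks with more free
     space while they are unbalanced; default is 0.75 -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

You will need to restart the datanode for the change to take effect. With this in place there is no need to partition the 2TB drives into 1TB logical volumes; you can mount them whole and let the policy even out the free space over time.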