6 mappers created by default in Sqoop?
I always believed that if you do not specify the -m option in a sqoop import, 4 mappers are created by default. However, in my case 6 mappers are being created. Can someone explain this? This is the sqoop command:
sqoop import --connect jdbc:mysql://localhost/cloudera --target-dir hdfsout --split-by employeename --username root --password XXXXX --table employee
The employee table has 3 columns: employeename, age and dateofjoining. In HDFS, 6 map part files (00000 through 00005) are created as well.
First of all, splitting on an integer column is recommended. Sqoop's own splitter code for text columns logs this warning:
LOG.warn("You are strongly encouraged to choose an integral split column.");
Sqoop does not guarantee that it will create exactly the number of mappers you pass with -m <number of mappers>. The relevant part of the code:
// Use this as a hint. May need an extra task if the size doesn't
// divide cleanly.
int numSplits = ConfigurationHelper.getConfNumMaps(conf);
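To illustrate how a text column can end up with 6 mappers when the hint is 4, here is a simplified Java re-creation of the idea, based on my reading of Sqoop's text splitter rather than its actual code. Assumptions not confirmed by the snippet above: each string is mapped to a number by treating its characters as base-65536 digits, only the first 8 characters are used, the real code's common-prefix handling is left out, and the spacing of the intermediate points is simplified. The class and method names are hypothetical. When the min and max values are longer than 8 characters, converting the boundary numbers back to strings does not reproduce them, so the exact values are added as extra split points and 4 ranges become 6:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch; not Sqoop's actual TextSplitter code.
class TextSplitSketch {
    // Assumption: each character is treated as one digit in base 65536.
    static final BigDecimal ONE_PLACE = BigDecimal.valueOf(65536);
    // Assumption: only the first 8 characters of a value are used.
    static final int MAX_CHARS = 8;

    // Map a string to a fraction in [0, 1): character i adds charCode / 65536^(i + 1).
    static BigDecimal stringToBigDecimal(String str) {
        BigDecimal place = ONE_PLACE;
        BigDecimal result = BigDecimal.ZERO;
        for (int i = 0; i < Math.min(str.length(), MAX_CHARS); i++) {
            // 65536^k is a power of two, so this division is always exact.
            result = result.add(new BigDecimal((int) str.charAt(i)).divide(place));
            place = place.multiply(ONE_PLACE);
        }
        return result;
    }

    // Inverse mapping: read back at most MAX_CHARS base-65536 digits.
    static String bigDecimalToString(BigDecimal bd) {
        BigDecimal cur = bd.stripTrailingZeros();
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < MAX_CHARS; n++) {
            cur = cur.multiply(ONE_PLACE);
            int code = cur.intValue();
            if (code == 0) {
                break; // nothing left to decode
            }
            cur = cur.subtract(new BigDecimal(code));
            sb.append((char) code);
        }
        return sb.toString();
    }

    // Split points for a text column; each consecutive pair becomes one mapper.
    static List<String> splitPoints(int numSplits, String min, String max) {
        BigDecimal minVal = stringToBigDecimal(min);
        BigDecimal maxVal = stringToBigDecimal(max);
        BigDecimal range = maxVal.subtract(minVal);
        List<String> points = new ArrayList<>();
        // numSplits is only a hint: it sets how many evenly spaced points we start with.
        for (int i = 0; i < numSplits; i++) {
            BigDecimal offset = range.multiply(BigDecimal.valueOf(i))
                    .divide(BigDecimal.valueOf(numSplits), MathContext.DECIMAL128);
            points.add(bigDecimalToString(minVal.add(offset)));
        }
        points.add(bigDecimalToString(maxVal));
        // The round trip keeps at most MAX_CHARS characters, so the end points may
        // not match the real min/max. If so, the exact values are added as extra points.
        if (!points.get(0).equals(min)) {
            points.add(0, min);
        }
        if (!points.get(points.size() - 1).equals(max)) {
            points.add(max);
        }
        return points;
    }

    static int mapperCount(int numSplits, String min, String max) {
        return splitPoints(numSplits, min, max).size() - 1;
    }

    public static void main(String[] args) {
        System.out.println(mapperCount(4, "Adam", "Zoe"));                    // prints 4
        System.out.println(mapperCount(4, "Alexandria", "Zacharias Smith")); // prints 6
    }
}
```

In this sketch the first of the six ranges runs from "Alexandria" up to "Alexandr", which is empty, so one of the extra mappers would simply write an empty part file.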
If you add --verbose to the command, the log will show the boundary values at which the splits are made.
If you split on an integer column instead, only 4 mappers should run in this case.
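For comparison, here is a minimal Java sketch of evenly splitting an integer range, which shows why an integral split column is expected to give the suggested number of mappers in the common case. It is a hypothetical simplified splitter, not Sqoop's actual integer splitter; it assumes boundaries are spread evenly with rounding. Even here the hint is only an upper bound: a column with fewer distinct values than the hint produces fewer ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified integer splitter; not Sqoop's actual code.
class IntegerSplitSketch {
    // Split [min, max] into at most numSplits ranges, each returned as {lo, hi}.
    // Boundaries are min + round((max - min) * i / numSplits) for i = 0..numSplits.
    static List<long[]> split(int numSplits, long min, long max) {
        List<Long> points = new ArrayList<>();
        for (int i = 0; i <= numSplits; i++) {
            long p = min + Math.round((double) (max - min) * i / numSplits);
            // Skip repeated boundaries, which appear when the range is narrower than the hint.
            if (points.isEmpty() || points.get(points.size() - 1).longValue() != p) {
                points.add(p);
            }
        }
        List<long[]> ranges = new ArrayList<>();
        if (points.size() == 1) {
            // min == max: a single range covering that one value.
            ranges.add(new long[] {min, max});
            return ranges;
        }
        for (int j = 0; j + 1 < points.size(); j++) {
            // Half-open [lo, hi) except the last range, which also includes max.
            ranges.add(new long[] {points.get(j), points.get(j + 1)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(split(4, 1, 1000).size()); // prints 4
        System.out.println(split(4, 1, 3).size());    // prints 2
    }
}
```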
4 is the default value for the SUGGESTED number of mappers.
So running a sqoop command without suggesting a number of mappers should be equivalent to running it with a suggestion of 4 mappers.
However, sqoop may not follow the suggestion. Your case shows the suggestion being overridden when you rely on the default, and it can equally be overridden when you pass a number explicitly with -m.