Why does Sqoop launch 4 mappers by default?
I am trying to understand why Sqoop launches 4 mappers by default. In some cases, raising the number of mappers to 8 could give better performance. So what criteria were considered in choosing 4 as the default? Thanks in advance.
I will quote section 7.2.4, "Controlling Parallelism", from Apache Sqoop's official site.
By default, four tasks are used. Some databases may see improved performance by increasing this value to 8 or 16.
Do not increase the degree of parallelism greater than that available within your MapReduce cluster; tasks will run serially and will likely increase the amount of time required to perform the import.
Do not increase the degree of parallelism higher than that which your database can reasonably support. Connecting 100 concurrent clients to your database may increase the load on the database server to a point where performance suffers as a result.
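To make those two limits concrete, here is a rough Python sketch, not anything Sqoop itself does internally. It caps a requested mapper count by what the cluster and the database can take, estimates how many serial "waves" the tasks would need, and builds an import command using Sqoop's `--num-mappers` option (short form `-m`). The names `cluster_slots` and `db_max_connections`, the helper functions, the JDBC URL, and the table name are all hypothetical, and the wave count is a simplified model that assumes every task takes one slot.

```python
import math


def pick_num_mappers(requested, cluster_slots, db_max_connections):
    """Cap the requested mapper count by both limits named in the docs.

    cluster_slots and db_max_connections are hypothetical inputs you
    would have to estimate for your own environment.
    """
    return max(1, min(requested, cluster_slots, db_max_connections))


def waves(num_mappers, cluster_slots):
    """Simplified model: number of serial rounds needed to run all tasks."""
    return math.ceil(num_mappers / cluster_slots)


def build_sqoop_import(jdbc_url, table, num_mappers):
    """Build a 'sqoop import' argument list with an explicit mapper count."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # Asking for 16 mappers on a cluster with only 8 free slots
    # would need 2 waves, so the tasks partly run serially.
    print(waves(16, 8))

    # Capping the request avoids that and keeps database load bounded.
    n = pick_num_mappers(requested=16, cluster_slots=8, db_max_connections=100)
    # Placeholder JDBC URL and table name, for illustration only.
    print(" ".join(build_sqoop_import("jdbc:mysql://dbhost/mydb", "mytable", n)))
```

If the capped value comes out lower than what you asked for, that is the point at which asking for more mappers would stop helping according to the quoted guidance.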
So the answer is performance. Four was chosen as the default because it gives reasonable performance under normal conditions; raise it only if both your cluster and your database can handle the extra parallelism. Hope that helps.