Difference between mapreduce split and spark paritition

I wanted to ask is there any significant difference in data partitioning when working with Hadoop/MapReduce and Spark? They both work on HDFS(TextInputFormat) so it should be same in theory.

Are there any cases where the there procedure of data partitioning can differ? Any insights would be very helpful to my study.

Thanks

Answers


Is any significant difference in data partitioning when working with Hadoop/mapreduce and Spark?

Spark supports all hadoop I/O formats as it uses same Hadoop InputFormat APIs along with it's own formatters. So, Spark input partitions works same way as Hadoop/MapReduce input splits by default. Data size in a partition can be configurable at run time and It provides transformation like repartition, coalesce, and repartitionAndSortWithinPartition can give you direct control over the number of partitions being computed.

Are there any cases where their procedure of data partitioning can differ?

Apart from Hadoop, I/O APIs Spark does have some other intelligent I/O Formats(Ex: Databricks CSV and NoSQL DB Connectors) which will directly return DataSet/DateFrame(more high-level things on top of RDD) which are spark specific.

Key points on spark partitions when reading data from Non-Hadoop sources

  • The maximum size of a partition is ultimately by the connectors,
    • for S3, the property is like fs.s3n.block.size or fs.s3.block.size.
    • Cassandra property is spark.cassandra.input.split.size_in_mb.
    • Mongo prop is, spark.mongodb.input.partitionerOptions.partitionSizeMB.
  • By default the number of partitions is the max(sc.defaultParallelism, total_data_size / data_block_size). some times number of available cores in the cluster also imflunce the number of partitions like sc.parallelize() without partitions param.

Read more.. link1


Need Your Help

Calling command prompt from Qt application without freezing?

c++ qt command-line

In my Qt GUI application, I am calling the command prompt through:

Zend Form View Helper

php zend-framework zend-form zend-form-element

I've built a View helper that will build a checkbox label and input. The issue I'm having is on populate the checkbox value is ignored.