hadoop + Writable interface + readFields throws an exception in reducer

I have a simple map-reduce program in which my map and reduce primitives look like this

map(K,V) = (Text, OutputAggregator) reduce(Text, OutputAggregator) = (Text,Text)

The important point is that from my map function I emit an object of type OutputAggregator which is my own class that implements the Writable interface. However, my reduce fails with the following exception. More specifically, the readFieds() function is throwing an exception. Any clue why ? I use hadoop 0.18.3

10/09/19 04:04:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/19 04:04:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.JobClient: Running job: job_local_0001
10/09/19 04:04:59 INFO mapred.MapTask: numReduceTasks: 1
10/09/19 04:04:59 INFO mapred.MapTask: io.sort.mb = 100
10/09/19 04:04:59 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/19 04:04:59 INFO mapred.MapTask: record buffer = 262144/327680
Length = 10
10/09/19 04:04:59 INFO mapred.MapTask: Starting flush of map output
10/09/19 04:04:59 INFO mapred.MapTask: bufstart = 0; bufend = 231; bufvoid = 99614720
10/09/19 04:04:59 INFO mapred.MapTask: kvstart = 0; kvend = 10; length = 327680
10/09/19 04:04:59 WARN mapred.LocalJobRunner: job_local_0001
 at org.myorg.OutputAggregator.readFields(OutputAggregator.java:46)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
 at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:751)
 at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:691)
 at org.apache.hadoop.mapred.Task$CombineValuesIterator.next(Task.java:770)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:117)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:1)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
 at org.myorg.xxxParallelizer.main(xxxParallelizer.java:145)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)


When posting a question about custom code: Post the relevant piece of code. So the content of line 46 and a few lines before & after would really help ...:)

However this may help:

THE pitfall when writing your own Writable Class is the fact that Hadoop reuses the actual instance of the class over and over again. Between calls to readFields you do NOT get a shiny new instance.

So at the start of the readFields method you MUST assume the object you are in is filled with "garbage" and must be cleared before continuing.

My suggestion to you is to implement a "clear()" method that fully wipes the current instance and resets it to the state it would be in the moment after it was created and the constructor completed. And of course you call that method as the first thing in your readFields for both the key and the value.


In addition to Niels Basjes answer: Just initialize your member variables within the empty constructor (which you have to supply, otherwise Hadoop can not init your object), e.g.:

public OutputAggregator() {
    this.member = new IntWritable();

assuming that this.member is of type IntWritable.

Need Your Help

How to ajaxify Zend_Form validation

ajax json zend-framework zend-form

I have a Zend_Form subclass. Some elements are set to belong to arrays.

Why does my page display multiple questions when I only want one?

php jquery html

I'm trying to display one question at a time. My question page displays every question from my database when I only want one question then hit the "next button" to move on to the next question. Can