Hadoop Map/Reduce Job progress counters

I have a map/reduce job and I want to track the number of records processed in its map phase. To do that, I am using a custom counter, which I increment by 1 for each record in my map phase, and I poll the counter every 30 seconds.

However, when I check the counter's progress through the job client, the number of records processed does not grow evenly from one polling interval to the next. Sometimes the value changes and sometimes it does not change at all.

context.getCounter(ApplicationCounters.TOTAL_NUMRECORDS_PROCESSEDBY_MAP)
            .increment(1);

My Hadoop cluster's heartbeat interval is 15 seconds. Doesn't that mean I should get regular, consistent updates?

Answers


Make sure your job has finished before you read the counters; using waitForCompletion is recommended. Querying a counter while the job is still running can give strange results.

The counters are globally aggregated by the framework at the end of the job.
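
To illustrate why a mid-job reading can look erratic, here is a minimal toy simulation. It does not use the Hadoop API: ToyTask, pollTotals and demoTasks are hypothetical names made up for this sketch, and the timings (two tasks, a 15-second heartbeat, a 30-second poll) are assumptions chosen to mirror the question. Each task counts records locally and only publishes its count on its own heartbeat, so a poll sees whatever was last published, not the true total.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterPollingSketch {

    /** Toy stand-in for a map task; this is NOT a Hadoop class. */
    public static final class ToyTask {
        final int startSec, endSec, firstHeartbeatSec;
        final long recordsPerSec;
        long local;     // records the task has really counted so far
        long reported;  // value the master last received from this task

        ToyTask(int startSec, int endSec, long recordsPerSec, int firstHeartbeatSec) {
            this.startSec = startSec;
            this.endSec = endSec;
            this.recordsPerSec = recordsPerSec;
            this.firstHeartbeatSec = firstHeartbeatSec;
        }

        /** Advance the task to second t: process records, then maybe heartbeat. */
        void tick(int t, int heartbeatSecs) {
            if (t > startSec && t <= endSec) {
                local += recordsPerSec;
            }
            if (t >= firstHeartbeatSec && (t - firstHeartbeatSec) % heartbeatSecs == 0) {
                reported = local; // publish the local count
            }
        }
    }

    /** Returns the aggregated counter value a client would see at each poll. */
    public static List<Long> pollTotals(List<ToyTask> tasks, int totalSecs,
                                        int heartbeatSecs, int pollSecs) {
        List<Long> seen = new ArrayList<>();
        for (int t = 1; t <= totalSecs; t++) {
            long sum = 0;
            for (ToyTask task : tasks) {
                task.tick(t, heartbeatSecs);
                sum += task.reported;
            }
            if (t % pollSecs == 0) {
                seen.add(sum);
            }
        }
        return seen;
    }

    /** Assumed workload: one short early task, one task that starts late. */
    public static List<ToyTask> demoTasks() {
        List<ToyTask> tasks = new ArrayList<>();
        tasks.add(new ToyTask(0, 25, 10, 10));   // runs seconds 1..25
        tasks.add(new ToyTask(70, 120, 10, 80)); // runs seconds 71..120
        return tasks;
    }

    public static void main(String[] args) {
        // Polls at 30, 60, 90 and 120 seconds.
        System.out.println(pollTotals(demoTasks(), 120, 15, 30));
    }
}
```

With these assumed timings the program prints [250, 250, 350, 650]: one interval shows no change at all, the others show different jumps, and the last polled value is still below the 750 records the two tasks actually processed. In this toy model, only a reading taken after every task has made its final report matches the real total, which is why the counters should be read once the job has completed.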

