How to find time spent by mappers and reducers in Hadoop?

How can I find the time spent by each mapper and reducer, as well as the time spent shuffling (sorting), from within the code (not the web interface) in Hadoop? How about the total time spent by all mappers (or reducers)?

Answers


There is an API for the JobTracker as described here which gives you a bunch of information on the cluster itself as well as details for all jobs.

In particular, if you know the job id and you want metrics for each individual map and reduce task, you can call getMapTaskReports, which returns an array of TaskReport instances (detailed here), one per task. Each TaskReport gives you access to methods such as getStartTime and getFinishTime, both of which return timestamps in milliseconds. So for example:

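import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

// Assumes `jobtracker` is an already-obtained handle to the JobTracker API.
// The report methods take a JobID, so parse the job id string first.
JobID jobId = JobID.forName("your_job_id");

// One TaskReport per map task; times are in milliseconds.
TaskReport[] maps = jobtracker.getMapTaskReports(jobId);
for (TaskReport rpt : maps) {
  long duration = rpt.getFinishTime() - rpt.getStartTime();
  System.out.println("Mapper duration (ms): " + duration);
}
// One TaskReport per reduce task.
TaskReport[] reduces = jobtracker.getReduceTaskReports(jobId);
for (TaskReport rpt : reduces) {
  long duration = rpt.getFinishTime() - rpt.getStartTime();
  System.out.println("Reducer duration (ms): " + duration);
}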

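To get the total time spent by all mappers or all reducers in your job, sum the individual task durations in your code. Note that this sum is cumulative task time, not wall-clock time, since tasks run in parallel.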
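As a rough sketch of that summation, the following standalone example uses a hypothetical TaskTiming class in place of TaskReport so that it runs without a cluster. The class and method names are made up for illustration; in real code the start and finish values would come from getStartTime and getFinishTime. Skipping tasks whose finish time is earlier than their start time is an assumption about how unfinished tasks might show up, not documented behavior.

```java
import java.util.Arrays;
import java.util.List;

class TaskTimeTotals {
    /** Hypothetical stand-in for the two timestamps read from a TaskReport. */
    static final class TaskTiming {
        final long startMillis;
        final long finishMillis;

        TaskTiming(long startMillis, long finishMillis) {
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }
    }

    /** Sums per-task durations: cumulative task time, not wall-clock time. */
    static long totalMillis(List<TaskTiming> tasks) {
        long total = 0L;
        for (TaskTiming t : tasks) {
            // Assumption: a finish time before the start time means the
            // task has not finished yet, so it is left out of the total.
            if (t.finishMillis >= t.startMillis) {
                total += t.finishMillis - t.startMillis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<TaskTiming> mappers = Arrays.asList(
            new TaskTiming(1000L, 4000L),
            new TaskTiming(2000L, 7000L));
        System.out.println("Total mapper time (ms): " + totalMillis(mappers));
    }
}
```

The same helper can be called once with the map task timings and once with the reduce task timings to get the two totals separately.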

Regarding the shuffling: the jobtracker generally counts it as 33% of each reduce task's progress. That does not necessarily mean it takes 33% of the time, but I don't think there's an automated way to get the shuffle time per task, so you could just go with the simple heuristic of treating it as 33% of the reduce task's duration.
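Applied to a single reduce task, the heuristic could look something like the sketch below. The class and method names are hypothetical, the one-third split is only the rule of thumb from the paragraph above, and the result should be read as a rough estimate rather than a measurement.

```java
class ShuffleEstimate {
    /**
     * Estimates shuffle time as one third of a reduce task's duration.
     * Inputs are start and finish timestamps in milliseconds, such as the
     * values a TaskReport would return for a reduce task.
     */
    static long estimateShuffleMillis(long startMillis, long finishMillis) {
        // Guard against unfinished tasks or bad timestamps.
        long duration = Math.max(0L, finishMillis - startMillis);
        // Heuristic only: shuffle assumed to be 1/3 of the reduce task.
        return duration / 3;
    }

    public static void main(String[] args) {
        long start = 10_000L;
        long finish = 100_000L;
        System.out.println("Estimated shuffle time (ms): "
            + estimateShuffleMillis(start, finish));
    }
}
```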

Keep in mind, though, that with time measurements taken from the jobtracker API as shown above, the reducer times can be inflated. When a reduce task starts, it first does the shuffling (up to 33% as explained), then waits until all map tasks have finished, and only then starts the actual reduce. A reduce measurement is therefore the sum of these three periods (shuffle + wait + reduce).

