How to see the input records of a particular Hadoop task?

I am running a Hadoop job. All but 4 tasks are done, and I am wondering why those chunks are taking so much longer to process. My guess is that their input records are "hard" for my job to process. To test locally, I would like to retrieve those input records. How can I do this?

The status column for the task says hdfs://10.4.94.75:8020/user/someuser/myfilename:154260+3

But what does it mean?

Answers


The last part of the status gives you information about the split. More specifically:

  hdfs://10.4.94.75:8020/user/someuser/myfilename:154260+3

tells you that the task with this status processed the split of "myfilename" that starts at byte offset 154260 and is 3 bytes long.

Given this information, you can find the records assigned to this task by skipping to byte 154260 in the file and reading 3 bytes.
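As a rough illustration of that approach, here is a minimal Python sketch. It assumes you have already copied the file out of HDFS to a local path (for example with `hdfs dfs -get`); the function names `parse_split_status` and `read_split_bytes` are made up for this example and are not part of Hadoop. Note also that with line-oriented input formats a record can straddle a split boundary, so the task may have read somewhat past the nominal end of the split; the sketch returns only the raw bytes of the split itself.

```python
from typing import Tuple


def parse_split_status(status: str) -> Tuple[str, int, int]:
    """Break a status string of the form 'path:offset+length' into its parts.

    The path may itself contain colons (e.g. 'hdfs://host:8020/...'),
    so we split on the LAST colon only.
    """
    path, _, span = status.rpartition(":")
    offset, _, length = span.partition("+")
    return path, int(offset), int(length)


def read_split_bytes(local_path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at byte `offset` of a local copy of the file."""
    with open(local_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    status = "hdfs://10.4.94.75:8020/user/someuser/myfilename:154260+3"
    path, offset, length = parse_split_status(status)
    print(path, offset, length)
    # Assumption: "myfilename" below is a local copy of the HDFS file.
    # print(read_split_bytes("myfilename", offset, length))
```

If the input is plain text, it may be more useful to read from the offset up to the next newline after the split's end, so that you get whole lines rather than a fragment.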

