Saving garbage collection logs into ${yarn.nodemanager.log-dirs}/application_${appid}/container_${contid} for mappers and reducers on Hadoop Yarn

I'm trying to log garbage collection metrics for my mappers and reducers. However, I'm unable to get the logs written to the path: ${yarn.nodemanager.log-dirs}/application_${appid}/container_${contid}

Here is what my mapred-site.xml with the relevant properties looks like:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xloggc:${yarn.nodemanager.log-dirs}/application_${appid}/container_${contid}/gc-@taskid@.log -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xloggc:${yarn.nodemanager.log-dirs}/application_${appid}/container_${contid}/gc-@taskid@.log -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags</value>
</property>

But the logs do not appear in that location despite the above configuration. Any insight into this issue would be highly appreciated.

Answers


  1. Run ps xww or inspect /proc/<pid>/cmdline to verify that the flags are passed to the JVM with the expected values.
  2. Check that the target directories exist.

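The first step above can be sketched as a small shell helper. The PID in the commented example is hypothetical; find the real one with ps xww (the task JVMs usually show up as YarnChild processes):

```shell
# extract_gc_flag: print the -Xloggc argument from a NUL-separated
# /proc/<pid>/cmdline file (path given as $1). /proc/<pid>/cmdline
# separates arguments with NUL bytes, so translate them to newlines
# before grepping for the flag.
extract_gc_flag() {
  tr '\0' '\n' < "$1" | grep -- '-Xloggc'
}

# Against a live task JVM (12345 is a placeholder PID):
#   extract_gc_flag /proc/12345/cmdline
```

If the flag prints with an unexpected or truncated value, the problem is in how the configured value is expanded, not in where the JVM writes the file.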
Since the flags you show look about right, I would start by printing the flags the Java process actually receives, as suggested by the8472.

Personally, I'm not familiar with Hadoop, but one of my first steps in the scenario you describe would be to check the values of the variables being used. In particular, ${yarn.nodemanager.log-dirs} may expand to something containing a space, like /opt/path to my/app, which would cause the -Xloggc value to be truncated to just /opt/path.
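That truncation is easy to demonstrate: when the expanded directory contains an unquoted space, what was meant to be one -Xloggc option arrives at the JVM as several separate arguments. A minimal sketch (the path is hypothetical):

```shell
# first_arg: print only the first argument it receives, the way a JVM
# would see the first token of an option list split on whitespace.
first_arg() { printf '%s\n' "$1"; }

# The space after "path" splits the option into three arguments.
first_arg -Xloggc:/opt/path to my/app/gc.log -verbose:gc
# → -Xloggc:/opt/path
```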

Finally, I'd suggest testing with a fixed path to make sure the values are being interpreted correctly:

-Xloggc:/tmp/application_${appid}/container_${contid}/gc-@taskid@.log -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags

I'll try to improve this answer and detail it a bit more later on.


I resolved this issue by using the property ${yarn.app.container.log.dir}, which resolves to the ${yarn.nodemanager.log-dirs}/application_${appid}/container_${contid} directory.

So the full configuration I used is as follows:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xloggc:${yarn.app.container.log.dir}/gc-@taskid@.log -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xloggc:${yarn.app.container.log.dir}/gc-@taskid@.log -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags</value>
</property>
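To confirm the fix took effect, the container log directories on a node can be searched for the GC log files after a job runs. A small sketch; the base directory passed in is whatever yarn.nodemanager.log-dirs is set to on your cluster (the example path in the comment is an assumption, not a given):

```shell
# list_gc_logs: find GC log files under a container log root ($1),
# matching the gc-@taskid@.log naming pattern from the config above.
list_gc_logs() {
  find "$1" -name 'gc-*.log' -type f
}

# Hypothetical local log root; check yarn.nodemanager.log-dirs in
# yarn-site.xml for the actual value on your nodes:
#   list_gc_logs /var/log/hadoop-yarn/containers
```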
