Custom log4j appender in Hadoop 2
How to specify custom log4j appender in Hadoop 2 (amazon emr)?
Hadoop 2 ignores my log4j.properties file containing a custom appender and overrides it with its internal log4j.properties file. There is a -Dhadoop.root.logger flag that sets the logging threshold, but it does not help with custom appenders.
1. In order to change log4j.properties at the name node, you can edit /home/hadoop/log4j.properties.
2. In order to change log4j.properties for the container logs, you need to change it inside the YARN node manager jar, since the containers load the file directly from the jar's resources.
2.1 ssh to the slave node (on EMR you can also simply add this as a bootstrap action, so you don't need to ssh to each of the nodes).
2.2 Override container-log4j.properties in the jar's resources:
jar uf /home/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.2.0.jar container-log4j.properties
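If the jar tool isn't available on a node, the same update can be scripted, since a jar is just a zip archive. Below is a sketch of what jar uf does when the entry already exists (rewrite the archive with that one entry replaced); the function name is mine, and the paths in the usage comment follow the command above, so match them to your cluster's layout:

```python
import os
import shutil
import tempfile
import zipfile


def replace_jar_entry(jar_path, entry_name, data):
    """Rewrite the jar with `entry_name` replaced by `data` --
    the effect of `jar uf` when the entry is already present."""
    fd, tmp_path = tempfile.mkstemp(suffix=".jar")
    os.close(fd)
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every member except the one we are replacing...
        for item in src.infolist():
            if item.filename != entry_name:
                dst.writestr(item, src.read(item.filename))
        # ...then write the new copy of the entry.
        dst.writestr(entry_name, data)
    shutil.move(tmp_path, jar_path)


# Usage on a node (paths from the answer above):
# with open("container-log4j.properties", "rb") as f:
#     replace_jar_entry(
#         "/home/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.2.0.jar",
#         "container-log4j.properties",
#         f.read(),
#     )
```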
I know this question has been answered already, but there is a better way of doing this, and this information isn't easily available anywhere. There are actually at least two log4j.properties files that get used in Hadoop (at least for YARN). I'm using Cloudera, but it will be similar for other distributions.
Local properties file
Location: /etc/hadoop/conf/log4j.properties (on the client machines)
This is the log4j.properties used by the normal Java process. It affects the logging of everything that happens in that process, but not inside YARN/MapReduce. So all your driver code, and anything that plugs MapReduce jobs together (e.g., Cascading initialization messages), will log according to the rules you specify here. This is almost never the logging properties file you care about.
As you'd expect, this file is parsed after invoking the hadoop command, so you don't need to restart any services when you update your configuration.
If this file exists, it will take priority over the one sitting in your jar (because it's usually earlier in the classpath). If this file doesn't exist the one in your jar will be used.
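For reference, a minimal client-side log4j.properties might look like the stock log4j 1.x console setup below. The appender name "console" and the layout pattern are arbitrary choices for illustration:

```properties
# Root logger: threshold plus one appender
log4j.rootLogger=INFO, console

# Console appender (the name "console" is arbitrary)
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
```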
Container properties file
Location: /etc/hadoop/conf/container-log4j.properties (on the data node machines)
This file decides the properties of the output from all the map and reduce tasks, and is nearly always what you want to change when you're talking about hadoop logging.
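So if you want the custom appender from the question to apply to your task logs, this file is where it goes. A sketch using a rolling file appender follows; the appender name and file name are made up, and I'm assuming the ${yarn.app.container.log.dir} property that YARN sets for container logging, but any log4j 1.x appender is wired up the same way:

```properties
log4j.rootLogger=INFO, myAppender

# Hypothetical custom appender -- substitute your own appender class here
log4j.appender.myAppender=org.apache.log4j.RollingFileAppender
log4j.appender.myAppender.File=${yarn.app.container.log.dir}/custom.log
log4j.appender.myAppender.MaxFileSize=10MB
log4j.appender.myAppender.MaxBackupIndex=5
log4j.appender.myAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myAppender.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
```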
In newer versions of Hadoop/YARN someone caught a dangerously virulent strain of logging fever, and now the default logging configuration ensures that single jobs can generate several hundred megs of unreadable junk, making your logs quite hard to read. I'd suggest putting something like this at the bottom of the container-log4j.properties file to get rid of most of the extremely helpful messages about how many bytes have been processed:
log4j.logger.org.apache.hadoop.mapreduce=WARN
log4j.logger.org.apache.hadoop.mapred=WARN
log4j.logger.org.apache.hadoop.yarn=WARN
log4j.logger.org.apache.hadoop.hive=WARN
log4j.security.logger=WARN
By default this file usually doesn't exist, in which case the copy found in hadoop-yarn-server-nodemanager-stuff.jar (as mentioned by uriah kremer) will be used. However, as with the other log4j.properties file, if you do create /etc/hadoop/conf/container-log4j.properties, it will be used for all your YARN stuff. Which is good!
Note: No matter what you do, a copy of container-log4j.properties in your jar will not be used for these properties, because the YARN nodemanager jars are higher in the classpath. Similarly, despite what the internet tells you, -Dlog4j.configuration=PATH_TO_FILE will not alter your container logging properties, because the option doesn't get passed on to YARN when the container is initialized.
Look for hadoop-config.sh in the deployment. That is the script sourced before the hadoop command executes. I see the following code in hadoop-config.sh; see if modifying that helps.