Hadoop HBase CDH4 job fails to start with permission errors

Within the CDH4 ecosystem, I am trying to get a MapReduce job to output to an HBase table. For some reason it fails during the addDependencyJars call in the job setup.

From what I can tell, the HBase configuration does not pick up the Hadoop configuration (see the deprecation warnings in the job output). I have provided hdfs-site.xml, the job configuration, the job output with stack trace, and the relevant file permissions.

Any help or insight into how to debug this further will be greatly appreciated.

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- replication configuration -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/hadoop/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/hadoop/datanode</value>
    </property>
</configuration>

Job configuration

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf);
job.setJarByClass(LocalCsvCdrHbaseJob.class);
job.setJobName("Local CSV CDR Venue Session Analysis to hbase");
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(VenueSession.class);
job.setMapperClass(VenueMapper.class);
job.setReducerClass(VenueSessionCountHbaseReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
// Note: initTableReducerJob already calls addDependencyJars internally
// (see the stack trace below), so the explicit call after it is redundant.
TableMapReduceUtil.initTableReducerJob("venue_session", VenueSessionCountHbaseReducer.class, job);
TableMapReduceUtil.addDependencyJars(job);
job.waitForCompletion(true);

The hbase classpath definitely contains the hadoop conf directory (etc/hadoop/conf).

:~ # sudo -u mapred HADOOP_CLASSPATH=`hbase classpath` hadoop jar /home/mapred/cdr-hadoop-0.0.0-SNAPSHOT.jar net.thecloud.bi.cdr.jobs.LocalCsvCdrHbaseJob /cdr-venue-sessions/2013-05-22.cdr.csv
13/08/08 11:03:12 WARN conf.Configuration: dfs.df.interval is deprecated. Instead, use fs.df.interval
13/08/08 11:03:12 WARN conf.Configuration: dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
13/08/08 11:03:12 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/08/08 11:03:12 WARN conf.Configuration: dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
13/08/08 11:03:12 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/08/08 11:03:12 WARN conf.Configuration: fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.block.size is deprecated. Instead, use dfs.blocksize
13/08/08 11:03:12 WARN conf.Configuration: dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
13/08/08 11:03:12 WARN conf.Configuration: dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
13/08/08 11:03:12 WARN conf.Configuration: dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
13/08/08 11:03:12 WARN conf.Configuration: dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
13/08/08 11:03:12 WARN conf.Configuration: dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
13/08/08 11:03:12 WARN conf.Configuration: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
13/08/08 11:03:12 WARN conf.Configuration: dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
13/08/08 11:03:12 WARN conf.Configuration: dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
13/08/08 11:03:12 WARN conf.Configuration: dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
13/08/08 11:03:12 WARN conf.Configuration: dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
13/08/08 11:03:12 WARN conf.Configuration: dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
13/08/08 11:03:12 WARN conf.Configuration: dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
13/08/08 11:03:12 WARN conf.Configuration: dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
13/08/08 11:03:12 WARN conf.Configuration: dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
13/08/08 11:03:12 WARN conf.Configuration: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
13/08/08 11:03:12 WARN conf.Configuration: dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
13/08/08 11:03:12 WARN conf.Configuration: fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
13/08/08 11:03:12 WARN conf.Configuration: dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
13/08/08 11:03:12 WARN conf.Configuration: topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
13/08/08 11:03:12 WARN conf.Configuration: dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
13/08/08 11:03:12 WARN conf.Configuration: dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
13/08/08 11:03:12 WARN conf.Configuration: fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
13/08/08 11:03:12 WARN conf.Configuration: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
13/08/08 11:03:12 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:598)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:549)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:513)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:456)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:393)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:363)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:346)
    at net.thecloud.bi.cdr.jobs.LocalCsvCdrHbaseJob.main(LocalCsvCdrHbaseJob.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.util.JarFinder.getJar(JarFinder.java:164)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:595)
    ... 12 more
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.checkAndCreate(File.java:1704)
    at java.io.File.createTempFile(File.java:1792)
    at org.apache.hadoop.util.JarFinder.getJar(JarFinder.java:156)
    ... 17 more

file permissions

:~ # ls -l /var/hadoop/
total 12
drwxrwxrwx 2 hdfs   hdfs   4096 Aug  8 09:23 datanode
drwxrwxrwx 3 mapred hadoop 4096 Aug  8 09:41 mapred
drwxrwxrwx 3 hdfs   hdfs   4096 Aug  8 09:59 namenode

hdfs permissions

:~ # hdfs dfs -ls -R /
drwxrwxrwx   - hdfs  hadoop          0 2013-08-08 09:36 /cdr-venue-sessions
-rw-rw-rw-   3 hdfs  hadoop   27014304 2013-08-08 09:36 /cdr-venue-sessions/2013-05-22.cdr.csv
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:07 /hbase/.logs
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:06 /hbase/.oldlogs
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/.tmp
-rw-rw-rw-   3 hbase hadoop         38 2013-08-08 10:06 /hbase/hbase.id
-rw-rw-rw-   3 hbase hadoop          3 2013-08-08 10:06 /hbase/hbase.version
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session
-rw-rw-rw-   3 hbase hadoop        711 2013-08-08 10:10 /hbase/venue_session/.tableinfo.0000000001
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session/.tmp
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session/5cd64eee2dea6b1464023f24eee3daf0
-rw-rw-rw-   3 hbase hadoop        246 2013-08-08 10:10 /hbase/venue_session/5cd64eee2dea6b1464023f24eee3daf0/.regioninfo
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session/5cd64eee2dea6b1464023f24eee3daf0/values
drwxrwxrwt   - hdfs  hadoop          0 2013-08-08 09:41 /tmp
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:41 /tmp/hadoop-mapred
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:41 /tmp/hadoop-mapred/mapred
drwxrwxrwx   - mapred hadoop          0 2013-08-08 10:06 /tmp/hadoop-mapred/mapred/system
-rw-rw-rw-   3 mapred hadoop          4 2013-08-08 10:06 /tmp/hadoop-mapred/mapred/system/jobtracker.info
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:30 /user-venue-types
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:28 /var
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:28 /var/hadoop
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:28 /var/hadoop/mapred
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:27 /var/lib
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache/mapred
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:30 /venues

Answers


Permissions are rarely straightforward in Hadoop. A couple of debugging points:

  • Check which user you run the job as, and which user is actually 'visible' to the Hadoop cluster.
  • Check what happens inside the failing method and which files it tries to create or modify. Your stack trace bottoms out in File.createTempFile inside JarFinder.getJar, which writes to the local directory named by java.io.tmpdir — so this looks like a local filesystem permission problem for the mapred user, not an HDFS one.
  • Make sure the required permissions are in place. If they are not, you can grant the user write access to that directory, disable HDFS permissions, or present yourself to the cluster as a different user.
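Since the trace ends in File.createTempFile, a quick way to confirm this is to run a small probe as the same user that launches the job. This is an illustrative sketch, not part of the original job; the class name TmpDirProbe is made up for the example:

```java
import java.io.File;
import java.io.IOException;

public class TmpDirProbe {
    public static void main(String[] args) throws IOException {
        // JarFinder.getJar builds the dependency jar via File.createTempFile,
        // which writes into the directory named by java.io.tmpdir (normally /tmp).
        String tmpDir = System.getProperty("java.io.tmpdir");
        System.out.println("java.io.tmpdir = " + tmpDir);

        // If this throws "Permission denied", the job will fail the same way.
        File probe = File.createTempFile("hadoop-probe-", ".jar");
        System.out.println("created " + probe.getAbsolutePath());
        probe.delete();
    }
}
```

Run it as `sudo -u mapred java TmpDirProbe`. If it fails, either fix the permissions on that directory or point the JVM at a writable one with `-Djava.io.tmpdir=...` (for example via HADOOP_OPTS).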
