Giraph job always running in local mode

I am running Giraph 1.1.0 on Hadoop 2.6.0. My mapred-site.xml looks like this:

<configuration>

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs. Can be one of
    local, classic or yarn.</description>
</property>

<!-- Each <property> holds exactly one name/value pair. -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
</configuration>

My giraph-site.xml looks like this:

<configuration>
<property>
        <name>giraph.SplitMasterWorker</name>
        <value>true</value>
</property>
<property>
        <name>giraph.logLevel</name>
        <value>error</value>
</property>
</configuration>

I do not want the job to run in local mode. I have also set the environment variable MAPRED_HOME to the value of HADOOP_HOME. This is the command I use to run the program:

hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 1

When I run this program, which computes the betweenness centrality of the vertices in a graph, I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!
        at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)
        at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
        at hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation.runMain(BetweennessComputation.java:214)
        at hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation.main(BetweennessComputation.java:218)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

What should I do to ensure that the job does not run in local mode?

Answers


I ran into the same problem a few days ago and solved it as follows.

Edit the configuration file mapred-site.xml: make sure the property 'mapreduce.framework.name' is set to 'yarn', and, if it is not already present, add the property 'mapreduce.jobtracker.address' with the value 'yarn'.

The mapred-site.xml looks like this:

<configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
   <property>
     <name>mapreduce.jobtracker.address</name>
     <value>yarn</value>
   </property>
</configuration>

Restart Hadoop after modifying mapred-site.xml. Then run your program with the value after '-w' (the number of workers) set to more than 1 and 'giraph.SplitMasterWorker' set to 'true'. It will probably work.

As for the cause of the problem, I will just quote someone else's explanation: "These properties are designed for single-node executions and will have to be changed when executing things in a cluster of nodes. In such a situation, the jobtracker has to point to one of the machines that will be executing a NodeManager daemon (a Hadoop slave). As for the framework, it should be changed to 'yarn'."
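To make the cause more concrete, here is a simplified Python model of the check that fails in the stack trace. It is an assumption based on reading GiraphJob.checkLocalJobRunnerConfiguration in Giraph 1.1.0, not Giraph's actual code: the check appears to apply only when the job-tracker address (mapreduce.jobtracker.address in Hadoop 2, deprecated alias mapred.job.tracker) is "local", which is also the Hadoop 2 default. The function names and error messages are made up for illustration.

```python
from typing import Mapping

# Assumption: Hadoop 2's default job-tracker address (from mapred-default.xml).
DEFAULT_JOBTRACKER = "local"


def is_true(value: str) -> bool:
    """Case-insensitive 'true', roughly how Hadoop reads boolean settings."""
    return value.strip().lower() == "true"


def check_local_job_runner(conf: Mapping[str, str], max_workers: int) -> None:
    """Hypothetical model (not Giraph's code) of the LocalJobRunner check.

    Raises ValueError when the settings cannot work with LocalJobRunner.
    """
    # Prefer the Hadoop 2 key, then the deprecated alias, then the default.
    tracker = conf.get("mapreduce.jobtracker.address",
                       conf.get("mapred.job.tracker", DEFAULT_JOBTRACKER))
    if tracker != "local":
        return  # not treated as LocalJobRunner, so nothing is checked
    if max_workers != 1:
        raise ValueError("local mode: only one worker is allowed")
    # Assumption: giraph.SplitMasterWorker defaults to true.
    if is_true(conf.get("giraph.SplitMasterWorker", "true")):
        raise ValueError("local mode: split master/worker mode is not allowed")


# The question's setup: framework is yarn, but no job-tracker address is set.
question = {"mapreduce.framework.name": "yarn",
            "giraph.SplitMasterWorker": "true"}
try:
    check_local_job_runner(question, max_workers=1)
except ValueError as err:
    print("rejected:", err)

# With the extra property from the mapred-site.xml above, the check is skipped.
fixed = {**question, "mapreduce.jobtracker.address": "yarn"}
check_local_job_runner(fixed, max_workers=4)
print("accepted")
```

Under this model, setting mapreduce.framework.name to 'yarn' on its own is not enough: the job-tracker address must also differ from "local", which is what the added mapreduce.jobtracker.address property does.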


The stack trace shows that the LocalJobRunner configuration check fails. This is a bit misleading, because it makes us assume that the job runs in local mode. You already found the responsible configuration option, giraph.SplitMasterWorker, and in your case it is set to true. However, the last command-line parameter, 1, specifies only a single worker. Hence the framework decides that you MUST be running in local mode. You have two options:

  • Set giraph.SplitMasterWorker to false, even though you are running on a cluster.
  • Increase the number of workers by changing the last parameter of the command-line call, for example:

    hadoop jar myjar.jar hu.elte.inf.mbalassi.msc.giraph.betweenness.BetweennessComputation /user/$USER/inputbc/inputgraph.txt /user/$USER/outputBC 1.0 4

Please also refer to my other answer on Stack Overflow ("Apache Giraph master / worker mode") for details on the local-mode problem.


If you want to split the master from the workers, you can use:

-ca giraph.SplitMasterWorker=true

To specify the number of workers, you can use:

-w #

where "#" is the number of workers you want to use.
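As a sketch of how the two flags combine, the hypothetical Python helper below assembles a command line for org.apache.giraph.GiraphRunner, the generic Giraph driver that parses -w and -ca. The jar name and computation class are placeholders, and a real job also needs the input/output options listed in GiraphRunner's usage text. These flags only take effect when the job is launched through GiraphRunner; a custom main class that takes positional arguments, like the one in the question, would have to handle them itself.

```python
import shlex
from typing import List, Mapping, Sequence


def giraph_runner_command(jar: str, computation_class: str, workers: int,
                          custom_args: Mapping[str, str],
                          io_options: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: build the argv for a GiraphRunner job."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    cmd = ["hadoop", "jar", jar, "org.apache.giraph.GiraphRunner",
           computation_class]
    cmd.extend(io_options)             # input/output options, passed through
    cmd.extend(["-w", str(workers)])   # number of workers
    for key, value in custom_args.items():
        # Assumption: -ca may be repeated, one key=value pair per flag.
        cmd.extend(["-ca", f"{key}={value}"])
    return cmd


# Placeholder jar and class names.
argv = giraph_runner_command("myjar.jar", "com.example.MyComputation", 4,
                             {"giraph.SplitMasterWorker": "true"})
print(shlex.join(argv))
# hadoop jar myjar.jar org.apache.giraph.GiraphRunner com.example.MyComputation -w 4 -ca giraph.SplitMasterWorker=true
```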

