Aggregates Supported by Mongo Hadoop Connector?

I am trying to do an aggregation operation on a MongoDB collection using the mongo-hadoop library (https://github.com/mongodb/mongo-hadoop) for Spark. I pass my query through the mongo.input.query configuration property, which is then given as input to newAPIHadoopRDD.

Configuration mongodbConfig = new Configuration();
mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
mongodbConfig.set("mongo.input.uri", "mongodb://" + mongodbHost + "/" + database + "." + collection);
mongodbConfig.set("mongo.input.query", query);

JavaPairRDD<Object, BSONObject> audienceRDD =
        sc.newAPIHadoopRDD(mongodbConfig, MongoInputFormat.class, Object.class, BSONObject.class);
audienceRDD.foreach(e -> System.out.println("data: " + e.toString()));


query={ "aggregate" : "__collection__" , "pipeline" : [ 
{ "$match" : { "date" : { "$gte" : { "$date" : "2016-08-09T00:00:00.000Z"} , "$lte" : { "$date" : "2016-08-11T00:00:00.000Z"}}}} , 
{ "$unwind" : "$segments"} , 
{ "$group" : { "_id" : "$segments" , "audienceSize" : { "$sum" : "$count"}}}]}, sort={ }, fields={ }, limit=0, notimeout=false}

The operation succeeds when I use a normal find-style query. But when I try to use a group/aggregation pipeline, I don't get any records in the RDD. Can someone suggest a way to do an aggregation operation on a MongoDB collection using the mongo-hadoop connector?
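
For comparison, here is the kind of find-style query that does return records for me. As far as I can tell, the connector treats mongo.input.query as a plain find() filter, so only the $match portion of the pipeline above can be expressed this way (same date bounds, just as a filter document):

// A find()-style filter document works in mongo.input.query;
// the $unwind/$group stages of the pipeline cannot be expressed here.
String findQuery = "{ \"date\" : { \"$gte\" : { \"$date\" : \"2016-08-09T00:00:00.000Z\" }, "
                 + "\"$lte\" : { \"$date\" : \"2016-08-11T00:00:00.000Z\" } } }";
mongodbConfig.set("mongo.input.query", findQuery);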

Answers


Anyway, due to the 16 MB limit on aggregation results, I ended up creating a temporary collection with the records and then querying that temporary collection instead. I stored the response in an RDD and, once I had finished what I wanted to do, dropped the temporary collection.
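
Roughly, the workaround looks like the sketch below. It uses the MongoDB Java driver (3.x) to run the pipeline with a $out stage into a temporary collection, reads that collection back through the connector as an ordinary find, and drops it afterwards. The database and collection names ("mydb", "audience", "tmp_audience") are placeholders, sc and mongodbHost are the same variables as in the question, and $out is just one way to materialize the intermediate result:

import java.time.Instant;
import java.util.Arrays;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.bson.BSONObject;

import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.hadoop.MongoInputFormat;

MongoClient client = new MongoClient(mongodbHost);
MongoDatabase db = client.getDatabase("mydb");

// Run the aggregation server-side and write the result into a temporary
// collection via $out, so it is not returned as a single 16 MB-bounded document.
db.getCollection("audience").aggregate(Arrays.asList(
        Aggregates.match(Filters.and(
                Filters.gte("date", Date.from(Instant.parse("2016-08-09T00:00:00.000Z"))),
                Filters.lte("date", Date.from(Instant.parse("2016-08-11T00:00:00.000Z"))))),
        Aggregates.unwind("$segments"),
        Aggregates.group("$segments", Accumulators.sum("audienceSize", "$count")),
        Aggregates.out("tmp_audience")))
    .toCollection();   // force the pipeline (including $out) to execute

// Point the connector at the temporary collection and read it as a plain find().
Configuration tmpConfig = new Configuration();
tmpConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
tmpConfig.set("mongo.input.uri", "mongodb://" + mongodbHost + "/mydb.tmp_audience");
JavaPairRDD<Object, BSONObject> resultRDD =
        sc.newAPIHadoopRDD(tmpConfig, MongoInputFormat.class, Object.class, BSONObject.class);

// ... work with resultRDD ...

// Drop the temporary collection once done.
db.getCollection("tmp_audience").drop();
client.close();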

That said, I think the ability to run aggregation queries through mongo.input.query would be a nice addition to this otherwise nice connector library.

