Scala DataFrame: Explode an array

I am using the Spark libraries in Scala. I have created a DataFrame using:

import org.apache.spark.sql.types._

val searchArr = Array(
  StructField("log",IntegerType,true),
  StructField("user", StructType(Array(
    StructField("date",StringType,true),
    StructField("ua",StringType,true),
    StructField("ui",LongType,true))),true),
  StructField("what",StructType(Array(
    StructField("q1",ArrayType(IntegerType, true),true),
    StructField("q2",ArrayType(IntegerType, true),true),
    StructField("sid",StringType,true),
    StructField("url",StringType,true))),true),
  StructField("where",StructType(Array(
    StructField("o1",IntegerType,true),
    StructField("o2",IntegerType,true))),true)
)

val searchSt = new StructType(searchArr)

val searchData = sqlContext.jsonFile(searchPath, searchSt)
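As a side note, jsonFile is deprecated as of Spark 1.4 in favour of the DataFrameReader API. A minimal sketch of the equivalent call, assuming Spark 1.4 or later (where sqlContext.read exists); the helper name loadWithSchema is hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.StructType

// Hypothetical helper: reads line-delimited JSON from `path` using an explicit
// schema, i.e. the DataFrameReader counterpart of sqlContext.jsonFile(path, schema).
def loadWithSchema(sqlContext: SQLContext, path: String, schema: StructType): DataFrame =
  sqlContext.read.schema(schema).json(path)
```

With the schema above this would be called as loadWithSchema(sqlContext, searchPath, searchSt). Supplying the schema up front also skips Spark's schema-inference pass over the input.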

I now want to explode the field what.q1, which should contain an array of integers, but the documentation is limited: http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#explode(java.lang.String,%20java.lang.String,%20scala.Function1,%20scala.reflect.api.TypeTags.TypeTag)

So far I have tried a few things without much luck, for example:

val searchSplit = searchData.explode("q1", "rb")(q1 => q1.getList[Int](0).toArray())

Any ideas/examples of how to use explode on an array?
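For reference, a minimal sketch of the column-based alternative, assuming a Spark version that provides org.apache.spark.sql.functions.explode(Column) (added in 1.4). It addresses the nested field by its dotted path instead of passing a Scala function; the helper name explodeQ1 is hypothetical:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper: emits one output row per element of what.q1, storing the
// element in a new top-level column "q1". Rows whose what.q1 is null or empty
// produce no output rows, because explode generates nothing for them.
def explodeQ1(df: DataFrame): DataFrame =
  df.withColumn("q1", explode(col("what.q1")))
```

On Spark 2.2 or later, functions.explode_outer can be used instead if rows with a null or empty q1 should be kept (with q1 set to null).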

Answers


Have you tried a UDF on the field "what"? Something like this could be useful (note that it extracts a single element of q1 rather than producing one row per element):

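import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

// Returns the first element of "q1" (field 0 of the "what" struct) as a string,
// or "" when "what" or "q1" is null or "q1" is empty.
// Named firstQ1 rather than explode so it does not shadow functions.explode.
val firstQ1 = udf { (what: Row) =>
  what match {
    case null                           => ""
    case w if w.isNullAt(0)             => ""
    case w if w.getList[Int](0).isEmpty => ""
    case w                              => w.getList[Int](0).get(0).toString
  }
}

val newDF = searchData.withColumn("newColumn", firstQ1(col("what")))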

where:

  • getList(0) returns the "q1" field (index 0 of the "what" struct)
  • get(0) returns the first element of "q1"

I'm not sure, but you could also try getAs[T](fieldName: String) instead of getList(index: Int), which looks the field up by name rather than by position.
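A minimal sketch of the by-name variant, assuming Spark passes the "what" struct to the UDF as a Row that carries its schema (which getAs[T](fieldName) requires). The name firstQ1ByName is hypothetical, and returning Option[Int] is a choice made here so that missing values become SQL nulls instead of "":

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Hypothetical UDF: looks up "q1" by field name and returns its first element.
// Yields None (null in the result column) when "what" is null, "q1" is null,
// or "q1" is empty. scala.collection.Seq is used because that is the common
// supertype of the array wrappers Spark may hand back.
val firstQ1ByName = udf { (what: Row) =>
  Option(what)
    .flatMap(w => Option(w.getAs[scala.collection.Seq[Int]]("q1")))
    .flatMap(_.headOption)
}
```

It is applied the same way as the positional version, e.g. searchData.withColumn("newColumn", firstQ1ByName(col("what"))), but keeps working if the order of fields inside "what" changes.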

