MapReduce to parse JSON data in Hadoop 2.2

Hello, I have JSON in the following format. I need to parse it in the map function to get the gender of every record.

        "SeasonTicket" : false, 
        "name" : "Vinson Foreman", 
        "gender" : "male", 
        "age" : 50, 
        "email" : "", 
        "annualSalary" : "$98,501.00", 
        "id" : 0
        "SeasonTicket": true, 
        "name": "Genevieve Compton", 
        "gender": "female", 
        "age": 28, 
        "email": "", 
        "annualSalary": "$46,881.00", 
        "id": 1
        "SeasonTicket": false, 
        "name": "Christian Crawford", 
        "gender": "male", 
        "age": 53, 
        "email": "", 
        "annualSalary": "$53,488.00", 
        "id": 2

I have tried using JSONparser but could not get through the JSON structure. I have been advised to use JAQL and Pig, but I cannot do so.

Any help would be appreciated.


As I understand it, you have a huge file containing an array of JSON objects, and you need to read it in a mapper and emit, say, <id : gender>. The challenge is that each JSON object spans multiple lines.

If that is the case, I would suggest changing the default record delimiter from "\n" to "}".

With that delimiter, each value passed to the map method is one fragment of the JSON array. You can discard the key (i.e. the byte offset) and do a little cleanup on the value: strip the unwanted [, ] or , characters, append the closing "}" that the delimiter consumed, and then parse the resulting string.
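The cleanup step described above could look roughly like the sketch below. It is plain Java with no Hadoop dependency so it can be tried on its own; the class and method names (`JsonFragments`, `cleanFragment`, `idAndGender`) are made up for illustration, and the field extraction uses simple regular expressions that assume flat records like the ones in the question. In a real job you would more likely hand the cleaned string to a proper JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns one "}"-delimited fragment into an (id, gender) pair.
class JsonFragments {
    // Assumes "gender" is a quoted string and "id" is a bare integer, as in the sample data.
    private static final Pattern GENDER =
            Pattern.compile("\"gender\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern ID =
            Pattern.compile("\"id\"\\s*:\\s*(\\d+)");

    // Strips the leading '[' / ',' and trailing ']' left over from the array,
    // then restores the '}' that was consumed as the record delimiter.
    // Returns an empty string when the fragment holds no record (e.g. the final "]").
    static String cleanFragment(String raw) {
        String s = raw.replaceAll("^[\\[,\\s]+", "").replaceAll("[\\]\\s]+$", "");
        return s.isEmpty() ? "" : s + "}";
    }

    // Returns {id, gender}, or null if the fragment is empty or lacks either field.
    static String[] idAndGender(String raw) {
        String json = cleanFragment(raw);
        if (json.isEmpty()) {
            return null;
        }
        Matcher id = ID.matcher(json);
        Matcher gender = GENDER.matcher(json);
        if (!id.find() || !gender.find()) {
            return null;
        }
        return new String[] { id.group(1), gender.group(1) };
    }
}
```

Inside the mapper you would call something like `JsonFragments.idAndGender(value.toString())`, skip null results, and write the two strings out through `context.write(...)` wrapped in `Text` objects.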

This solution works because, in the given example, the JSON objects are not nested, so } only ever marks the end of a record. It would break if the objects were nested or if a string value contained a } character.

To change the default delimiter, just set the property textinputformat.record.delimiter to "}".
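As a rough sketch, the property can be set in the job driver before the job is created. The class name `GenderJobDriver` and the job name are placeholders, the input and output paths are assumed to come from the command line, and the mapper setup is left out because it depends on your own mapper class.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: only the delimiter setting comes from the answer above.
public class GenderJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Split records on "}" instead of the default newline.
        conf.set("textinputformat.record.delimiter", "}");

        // Set the property before this call: the Job works on its own copy of conf.
        Job job = Job.getInstance(conf, "gender-by-id");
        job.setJarByClass(GenderJobDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        // job.setMapperClass(...) would go here with your own mapper.

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```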

Please check out this example.

Also check this jira.

