MapReduce to parse JSON data in Hadoop 2.2

Hello, I have JSON in the following format. I need to parse it in the map function to get the gender information of all the records.

[
    {
        "SeasonTicket" : false, 
        "name" : "Vinson Foreman", 
        "gender" : "male", 
        "age" : 50, 
        "email" : "vinsonforeman@cyclonica.com", 
        "annualSalary" : "$98,501.00", 
        "id" : 0
    }, 
    {
        "SeasonTicket": true, 
        "name": "Genevieve Compton", 
        "gender": "female", 
        "age": 28, 
        "email": "genevievecompton@cyclonica.com", 
        "annualSalary": "$46,881.00", 
        "id": 1
    }, 
    {
        "SeasonTicket": false, 
        "name": "Christian Crawford", 
        "gender": "male", 
        "age": 53, 
        "email": "christiancrawford@cyclonica.com", 
        "annualSalary": "$53,488.00", 
        "id": 2
    }
]

I have tried using JSONparser but am not able to get through the JSON structure. I have been advised to use JAQL and Pig but cannot do so.

Any help would be appreciated.

Answers


What I understand is that you have a huge file containing an array of JSON objects, and you need to read it in a mapper and emit, say, <id : gender>. The challenge is that each JSON object spans multiple lines.

If this is the case, I would suggest changing the default record delimiter to "}" instead of "\n".

With that delimiter, each call to the map method receives one part of the JSON as its value. You can discard the key (i.e. the byte offset) and lightly clean up the value: remove the unwanted [ ] or , characters, add back the closing "}", and then parse the remaining string.
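The clean-up step can be tried outside Hadoop first. The following is a minimal sketch in plain Java, assuming the input is split on "}" the same way the record reader would split it. The class name JsonFragments and its methods are hypothetical, and the regular expressions are only a stand-in for a real JSON parser, which you would normally call on the repaired string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: simulates what a mapper would do with "}"-delimited values.
final class JsonFragments {
    // Stand-ins for a JSON parser: pull out the "id" and "gender" fields.
    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");
    private static final Pattern GENDER =
            Pattern.compile("\"gender\"\\s*:\\s*\"([^\"]*)\"");

    // Rebuilds one JSON object from a fragment, or returns null when the
    // fragment holds no object (for example the trailing "]").
    static String repair(String fragment) {
        int start = fragment.indexOf('{');
        if (start < 0) {
            return null;
        }
        // Drop the leading "[" or "," and restore the "}" consumed as delimiter.
        return fragment.substring(start).trim() + "}";
    }

    // Splits the whole input on "}" and returns "id<TAB>gender" per record.
    static List<String> idGenderPairs(String input) {
        List<String> pairs = new ArrayList<>();
        for (String fragment : input.split(Pattern.quote("}"))) {
            String object = repair(fragment);
            if (object == null) {
                continue;
            }
            Matcher id = ID.matcher(object);
            Matcher gender = GENDER.matcher(object);
            if (id.find() && gender.find()) {
                pairs.add(id.group(1) + "\t" + gender.group(1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String sample = "[\n  { \"gender\" : \"male\", \"id\" : 0 },\n"
                + "  { \"gender\": \"female\", \"id\": 1 }\n]";
        for (String pair : idGenderPairs(sample)) {
            System.out.println(pair);
        }
    }
}
```

In a real mapper, the body of the loop would run once per map call on value.toString(), and the pair would be written to the context instead of collected in a list.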

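This solution works because there is no nesting within the JSON objects, so } only ever appears at the end of a record in the given example. If the objects contained nested objects, or a string value contained "}", the records would be split in the wrong places.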

To change the default delimiter, just set the property textinputformat.record.delimiter to "}".
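As a configuration fragment, setting the property in the job driver might look like the sketch below. It assumes the Hadoop 2.x client libraries are on the classpath and that the job uses the default TextInputFormat. The class name GenderJobConfig and the job name are hypothetical, and the mapper, reducer and input/output paths still have to be set on the returned Job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver helper: only the delimiter setting comes from the answer.
public class GenderJobConfig {
    public static Job newJob() throws IOException {
        Configuration conf = new Configuration();
        // Records handed to the mapper now end at "}" instead of at a newline.
        conf.set("textinputformat.record.delimiter", "}");
        // Set the property before creating the Job, which copies the Configuration.
        return Job.getInstance(conf, "gender-extract");
    }
}
```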

Please check out this example.

Also check this jira.

