How to iterate through a JSON response that has multiple pages

I have created a program that iterates through a multi-page JSON object.

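import json
import subprocess

def get_orgs(token, url):
    # With shell=False, curl's arguments must be passed as a list, not one string
    cmd = ['curl', '-i', '-k', '-X', 'GET',
           '-H', 'Content-Type:application/json',
           '-H', 'Authorization:Bearer ' + token, url]
    pipe = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    data = pipe.communicate()[0].decode('utf-8')
    row = None  # stays None if no line of the output parses as JSON
    for line in data.split('\n'):
        print(line)
        try:
            # curl -i also prints the HTTP headers; those lines fail to parse and are skipped
            row = json.loads(line)
            print("next page url ", row['next'])
        except (ValueError, TypeError, KeyError):
            pass
    return row
my_data = get_orgs(u'MyBearerToken', "https://data.ratings.com/v1.0/org/576/portfolios/36/companies/")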

The json object is as below:

[{"results": [{"liquidity":"Strong","earningsPerformance":"Average"}]
,"next":"https://data.ratings.com/v1.0/org/576/portfolios/36/companies/?page=2"}]

I am using the 'next' key to iterate, but at times it points to an "Invalid Page" (a page that doesn't exist). Does a JSON object have a rule about how many records there are on each page? If so, I will use that to estimate how many pages there are.

EDIT: Adding more details. The JSON has just 2 keys: ['results', 'next']. If there are multiple pages, the 'next' key holds the next page's URL (as you can see in the output above). Otherwise, it contains 'None'. The problem is that at times, instead of 'None', it points to a next page that does not exist. So I want to see if I can count the rows in the JSON and divide by a number to know how many pages the loop needs to iterate through.

Answers


In my opinion, using urllib2 or urllib.request would be a much better option than curl, since it makes the code easier to understand, but if curl is a constraint, I can work with that ;-)

Assuming the JSON response is all on one line (otherwise your json.loads will throw an exception), the task is pretty simple, and this will let you get the number of items under the results key:

row = [{'next': 'https://data.ratings.com/v1.0/org/576/portfolios/36/companies/?page=2', 'results': [{'earningsPerformance':'Average','liquidity': 'Strong'}, {'earningsPerformance':'Average','liquidity': 'Strong'}]}]
result_count = len(row[0]["results"])

The alternative solution using httplib2 should look something like this (I didn't test this):

import httplib2
import json
h = httplib2.Http('.cache')
url = "https://data.ratings.com/v1.0/org/576/portfolios/36/companies/"
token = "Your_token"
try:
    response, content = h.request(
        url,
        headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token}
    )
    # Convert the response to a string
    content = content.decode('utf-8') # You could get the charset from the header as well
    try:
        data = json.loads(content)  # named "data" to avoid shadowing the built-in "object"
        result_count = len(data[0]["results"])
        # Yay, we got the result count!
    except Exception:
        # Do something if the server responds with garbage
        pass
except httplib2.HttpLib2Error:
    # Handle the exceptions, here's a list: https://httplib2.readthedocs.io/en/latest/libhttplib2.html#httplib2.HttpLib2Error
    pass

For more on httplib2 and why it's amazing, I suggest reading Dive Into Python.
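
On the paging problem itself: JSON has no notion of pages or records per page, so any page size is defined by the API, not by the format. Rather than estimating a page count, one option is to follow 'next' and stop as soon as a page does not look like a valid one. Below is a minimal sketch of that idea (not tested against the real API). The names iter_results, fetch_page and max_pages are made up for illustration, and fetch_page is a hypothetical callable that you would implement with curl or httplib2: it takes a URL and returns the parsed JSON, or None when the page cannot be fetched or parsed. That an "Invalid Page" shows up as non-JSON or as an empty 'results' list is an assumption.

```python
def iter_results(fetch_page, first_url, max_pages=1000):
    """Yield every item under 'results', following 'next' from page to page.

    fetch_page is a hypothetical callable: it takes a URL and returns the
    parsed JSON for that page, or None if the page could not be fetched or parsed.
    """
    url = first_url
    seen = set()  # URLs already visited, to guard against a 'next' that loops
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        # The sample response wraps the object in a one-element list
        if isinstance(page, list) and page:
            page = page[0]
        if not isinstance(page, dict):
            break  # assumed: an "Invalid Page" response is not the expected JSON
        results = page.get("results") or []
        if not results:
            break  # assumed: an empty page means there is nothing further to read
        for item in results:
            yield item
        url = page.get("next")
        if url in (None, "None"):
            break  # the API signals the last page with None


# Usage with canned pages standing in for the real API
base = "https://data.ratings.com/v1.0/org/576/portfolios/36/companies/"
fake_pages = {
    base: [{"results": [{"liquidity": "Strong"}], "next": base + "?page=2"}],
    base + "?page=2": [{"results": [{"liquidity": "Weak"}], "next": base + "?page=3"}],
    # page 3 does not exist, so fake_pages.get returns None for it
}
items = list(iter_results(fake_pages.get, base))
print(len(items))
```

Here the second page points to a third page that does not exist, which is the situation described in the question; the loop simply ends when that page fails to load, so no page-count estimate is needed.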

