Working with JSONPath in Python: A Tutorial to Get Used to It

Would you like to learn how to use JSONPath in Python to extract specific data from your JSON documents? You are in the right place.

JSONPath is a query language that can be used to extract data from JSON documents (e.g. a JSON string or a JSON file). One of the main implementations of JSONPath for Python is the module jsonpath-ng. This module understands JSONPath syntax and returns the part of the document you want to select with a JSONPath expression.

We will go through a few examples starting from a very simple one so you can get used to the syntax of the jsonpath-ng module.

Once you get familiar with this module, it will be a lot easier to understand more complex parsing expressions.

What is JSONPath?

Have you ever wondered how to extract data from a JSON document?

One of the ways is with JSONPath…

JSONPath is a query language that allows extracting specific data from a JSON document, similarly to XPath for XML.

An alternative to JSONPath is to traverse the data structure returned by the Python json module programmatically. That code gets more verbose as the document becomes more deeply nested, while a JSONPath expression describes the data you want in a single string.
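For comparison, here is a minimal sketch of the manual approach using only the json module. It assumes the structure of the cities.json file introduced later in this tutorial (a "cities" array of objects). The data is inlined as a string so the snippet runs on its own, and the function name city_names is hypothetical.

```python
import json

# Same structure as the cities.json file used later in this tutorial (inlined here)
raw = """
{
  "cities": [
    {"city": "Paris", "country": {"name": "France", "identifier": "FR"}},
    {"city": "London", "country": {"name": "United Kingdom", "identifier": "UK"}},
    {"city": "New York", "country": {"name": "United States", "identifier": "US"}}
  ]
}
"""

def city_names(data):
    """Hypothetical helper: walk the structure by hand to collect every city name."""
    names = []
    # Each level of nesting has to be navigated explicitly
    for entry in data.get("cities", []):
        if "city" in entry:
            names.append(entry["city"])
    return names

data = json.loads(raw)
print(city_names(data))  # ['Paris', 'London', 'New York']
```

With JSONPath, the same selection becomes the single expression $.cities[*].city, which is covered later in this tutorial.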

While going through this tutorial you can test JSONPath expressions in your browser using this online tool.

Which Module Can You Use to Evaluate a JSONPath on a JSON String in Python?

To evaluate a JSONPath on a JSON string with Python you can use the jsonpath-ng module.

The same applies to JSON data retrieved from a file.

How to Install the jsonpath-ng Module

To install the module jsonpath-ng you can use the following PIP command:

pip3 install jsonpath-ng

Note: if you don’t have the jsonpath-ng module installed locally, you will see the following error when you try to import it.

ModuleNotFoundError: No module named 'jsonpath_ng'
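If you prefer to check for the module before importing it, a small sketch using the standard library’s importlib.util.find_spec() could look like this. Notice that the import name is jsonpath_ng (with an underscore), while the package name you pass to pip is jsonpath-ng. The function name jsonpath_ng_available is hypothetical.

```python
import importlib.util

def jsonpath_ng_available():
    """Hypothetical helper: return True if jsonpath_ng can be imported here."""
    # find_spec() returns None when the module cannot be found
    return importlib.util.find_spec("jsonpath_ng") is not None

if jsonpath_ng_available():
    print("jsonpath_ng is installed")
else:
    print("jsonpath_ng is missing: install the jsonpath-ng package with pip")
```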

How to Get the Value of an Attribute using Python JSONPath

Let’s take a simple JSON file called cities.json that contains a single JSON object.

{
    "city": "Paris",
    "country": "France"
}

First of all, use the json module to retrieve the content of the file.

import json

with open("cities.json", "r") as jsonfile:
    json_data = json.load(jsonfile)

print(type(json_data))
print(json_data)

As you can see below, the variable json_data is a dictionary that contains the JSON read from the file.

$ python jsonpath_example.py
<class 'dict'>
{'city': 'Paris', 'country': 'France'}

The next step is to define a rule that retrieves the value of an attribute from the JSON data, for example the value of the attribute “city”.

To do that we first define an expression using jsonpath-ng…

import json, jsonpath_ng

with open("cities.json", "r") as json_file:
    json_data = json.load(json_file)

jsonpath_expr = jsonpath_ng.parse("$.city")

We have used the dollar symbol at the beginning of the expression passed to jsonpath_ng.parse().

How does the dollar sign work with jsonpath-ng?

When writing a JSONPath expression in Python, the dollar sign represents the root object (the whole JSON document).

The next step is to use this expression to find the data we are looking for in the JSON.

We can use the following line of code:

extracted_data = jsonpath_expr.find(json_data)

We are using the find method of the jsonpath_expr object.

Let’s find out more about the variable extracted_data returned by the find method using the Python print function.

print(f"The variable extracted_data is of type {type(extracted_data)} and it has {len(extracted_data)} elements.")
print(f"The value of extracted_data is {extracted_data}")

Note: in these two print statements we are using f-strings.

The output is…

The variable extracted_data is of type <class 'list'> and it has 1 elements.
The value of extracted_data is [DatumInContext(value='Paris', path=Fields('city'), context=DatumInContext(value={'city': 'Paris', 'country': 'France'}, path=Root(), context=None))]

Interesting…

We have learned something new: the variable returned by the find method (extracted_data) is a Python list, and it contains one element.

You can see the value of that element in the output of the second print statement.

But how do we get the value of the attribute city?

We do it by accessing the value attribute of the first element of the list (index 0, since it’s the only element in the list).

print(f"The city is {extracted_data[0].value}")

[output]
The city is Paris

Another Example of Getting the Value of a JSON Attribute with JSONPath

To get more familiar with jsonpath-ng let’s update the content of our JSON file as shown below.

{
    "city": "Paris",
    "country": {
        "name": "France",
        "identifier": "FR"
    }
}

This time the value of the country attribute is not a string but it’s a JSON object.

Let’s see what happens when we try to retrieve the value of the attribute country.

jsonpath_expr = jsonpath_ng.parse("$.country")
extracted_data = jsonpath_expr.find(json_data)
print(f"The data is {extracted_data[0].value}")

Note: the rest of the code stays the same.

[output]
The data is {'name': 'France', 'identifier': 'FR'}

And now let’s see if we can get the identifier by simply using the dot notation again in the expression we have passed to jsonpath_ng.parse().

The Python code becomes…

jsonpath_expr = jsonpath_ng.parse("$.country.identifier")
extracted_data = jsonpath_expr.find(json_data)
print(f"The data is {extracted_data[0].value}")

And the output is…

The data is FR

That’s good: we now have a basic understanding of how to retrieve attributes.

Let’s see something a bit more complex…

How to Parse a JSON Array in Python using JSONPath

Update the JSON file we are working on to include multiple cities instead of just one.

In other words, the JSON file will contain a JSON array.

Here is the updated file…

{
    "cities": [
        {
            "city": "Paris",
            "country": {
                "name": "France",
                "identifier": "FR"
            }
        },
        {
            "city": "London",
            "country": {
                "name": "United Kingdom",
                "identifier": "UK"
            }
        },
        {
            "city": "New York",
            "country": {
                "name": "United States",
                "identifier": "US"
            }
        }
    ]
}

Let’s say we want to retrieve the city attribute for each element in the JSON array.

How can we do that?

Let’s open the Python shell to try a few things out…

>>> import json, jsonpath_ng
>>> with open("cities.json", "r") as json_file:
...     json_data = json.load(json_file)
... 
>>> jsonpath_expr = jsonpath_ng.parse("$.cities.city")
>>> extracted_data = jsonpath_expr.find(json_data)
>>> extracted_data
[]

This doesn’t work: we got back an empty list.

Let’s see if we can pass an index to the cities array in the parsing expression.

>>> jsonpath_expr = jsonpath_ng.parse("$.cities[0].city")
>>> extracted_data = jsonpath_expr.find(json_data)  
>>> extracted_data[0].value
'Paris'

It works!

So, how can you extract the value of the same attribute from each JSON object in the JSON array?

To refer to all the elements in a JSON array using JSONPath in Python you can use [*] next to the name of the JSON array.

Our code becomes…

>>> jsonpath_expr = jsonpath_ng.parse("$.cities[*].city")
>>> extracted_data = jsonpath_expr.find(json_data)
>>> extracted_data[0].value
'Paris'
>>> extracted_data[1].value
'London'
>>> extracted_data[2].value
'New York'
>>> extracted_data[3].value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

We get an IndexError (list index out of range) when accessing the fourth element of extracted_data because this list has only three elements, one for each city in the JSON file.

We can also use a for loop to print the cities extracted using the JSONPath expression:

>>> for match in extracted_data:
...     print(match.value)
... 
Paris
London
New York

Makes sense?

Another Example of JSONPath Expression in Python

Let’s keep our JSON file the same…

I want to show you something else you can do with JSONPath.

Can we extract data from an inner JSON object without specifying every single node of the JSON document in the parsing expression?

The answer is yes, and we can do it using the following syntax:

jsonpath1..jsonpath2

This expression allows retrieving all nodes matched by jsonpath2 descending from any node matching jsonpath1.

So, in our scenario, we can extract the “country” attribute without having to specify the “cities” JSON array in the parsing expression.

Here’s how…

import json, jsonpath_ng

with open("cities.json", "r") as json_file:
    json_data = json.load(json_file)

jsonpath_expr = jsonpath_ng.parse("$..country")
extracted_data = jsonpath_expr.find(json_data)

for match in extracted_data:
    print(f"The country data is {match.value}")

If you execute this code you get the following output:

The country data is {'name': 'France', 'identifier': 'FR'}
The country data is {'name': 'United Kingdom', 'identifier': 'UK'}
The country data is {'name': 'United States', 'identifier': 'US'}

What is the Difference Between JSON and JSONPath?

JSON stands for JavaScript Object Notation and it’s a format for storing and exchanging data between systems or applications.

The json module, part of the Python standard library, is the most common way to read and write JSON data in Python.

JSONPath, on the other hand, is a query language that lets you extract data from a JSON document without writing your own code to traverse the data structure the json module returns when reading a JSON string or a JSON file.

Conclusion

I hope you have found this JSONPath tutorial useful and that the examples I went through have given you enough knowledge to continue testing more JSONPath expressions as part of your Python code.

Also, don’t worry if you find JSONPath a bit tricky to grasp at the beginning; that’s perfectly normal.

