Fetching Data from an HTTP API with Python

In this quick tip, excerpted from Useful Python, Stuart shows you how easy it is to use an HTTP API from Python using a couple of third-party modules.

Most of the time when working with third-party data we'll be accessing an HTTP API. That is, we'll be making an HTTP call to a web page designed to be read by machines rather than by people. API data is usually in a machine-readable format, normally either JSON or XML. (If we come across data in another format, we can use the techniques described elsewhere in this book to convert it to JSON, of course!) Let's look at how to use an HTTP API from Python.

The general principles of using an HTTP API are simple:

  1. Make an HTTP call to the URLs for the API, possibly including some authentication information (such as an API key) to show that we're authorized.
  2. Get back the data.
  3. Do something useful with it.

Python provides enough functionality in its standard library to do all this without any additional modules, but it will make our life a lot easier if we pick up a couple of third-party modules to smooth over the process. The first is the requests module. This is an HTTP library for Python that makes fetching HTTP data more pleasant than Python's built-in urllib.request, and it can be installed with python -m pip install requests.
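For a sense of what requests smooths over, here's a rough sketch of the same kind of call using only the standard library's urllib.request, where we have to build the query string and decode the JSON ourselves. (The URL here is a placeholder for illustration, not part of the book's example.)

import json
import urllib.parse
import urllib.request

# Build the query string by hand and parse the JSON response ourselves.
params = urllib.parse.urlencode({"q": "fruit"})
with urllib.request.urlopen(f"https://example.com/api/?{params}") as resp:
    data = json.loads(resp.read().decode("utf-8"))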

To show how easy it is to use, we'll use Pixabay's API (documented here). Pixabay is a stock photo site where the images are all available for reuse, which makes it a very useful destination. What we'll focus on here is fruit. We'll use the fruit pictures we gather later on, when manipulating files, but for now we just want to find pictures of fruit, because it's tasty and good for us.

To start, we'll take a quick look at what images are available from Pixabay. We'll grab 100 images, quickly look through them, and choose the ones we want. For this, we'll need a Pixabay API key, so we need to create an account and then grab the key shown in the API documentation under "Search Images".
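A side note: in the examples that follow, the key is hard-coded for simplicity. In real code we'd probably want to keep it out of the source, for example by reading it from an environment variable. (The variable name here is just an assumption, not something Pixabay prescribes.)

import os

# Hypothetical environment variable; set it in the shell first, e.g.
#   export PIXABAY_API_KEY="your-real-key"
PIXABAY_API_KEY = os.environ["PIXABAY_API_KEY"]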

The requests Module

The basic version of making an HTTP request to an API with the requests module involves constructing an HTTP URL, requesting it, and then reading the response. Here, that response is in JSON format. The requests module makes each of these steps easy. The API parameters are a Python dictionary, a get() function makes the call, and if the API returns JSON, requests makes that available as .json() on the response. So a simple call will look like this:

import requests

PIXABAY_API_KEY = "11111111-7777777777777777777777777"

base_url = "https://pixabay.com/api/"
base_params = {
    "key": PIXABAY_API_KEY,
    "q": "fruit",
    "image_type": "photograph",
    "class": "meals",
    "safesearch": "true"
}

response = requests.get(base_url, params=base_params)
results = response.json()

This will return a Python object, as the API documentation suggests, and we can look at its parts:

>>> print(len(outcomes["hits"]))
20
>>> print(outcomes["hits"][0])
{'id': 2277, 'pageURL': 'https://pixabay.com/pictures/berries-fruits-food-blackberries-2277/', 'sort': 'photograph', 'tags': 'berries, fruits, meals', 'previewURL': 'https://cdn.pixabay.com/photograph/2010/12/13/10/05/berries-2277_150.jpg', 'previewWidth': 150, 'previewHeight': 99, 'webformatURL': 'https://pixabay.com/get/gc9525ea83e582978168fc0a7d4f83cebb500c652bd3bbe1607f98ffa6b2a15c70b6b116b234182ba7d81d95a39897605_640.jpg', 'webformatWidth': 640, 'webformatHeight': 426, 'largeImageURL': 'https://pixabay.com/get/g26eb27097e94a701c0569f1f77ef3975cf49af8f47e862d3e048ff2ba0e5e1c2e30fadd7a01cf2de605ab8e82f5e68ad_1280.jpg', 'imageWidth': 4752, 'imageHeight': 3168, 'imageSize': 2113812, 'views': 866775, 'downloads': 445664, 'collections': 1688, 'likes': 1795, 'feedback': 366, 'user_id': 14, 'person': 'PublicDomainPictures', 'userImageURL': 'https://cdn.pixabay.com/person/2012/03/08/00-13-48-597_250x250.jpg'}
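The snippets above assume the request succeeded. If we want to be a little more defensive, requests can raise an exception when the API returns an HTTP error status, before we try to read the JSON. A minimal sketch:

response = requests.get(base_url, params=base_params)
# Raise requests.HTTPError on a 4xx/5xx response (a wrong API key,
# for example, comes back as an error status rather than as JSON hits).
response.raise_for_status()
results = response.json()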

The API returns 20 hits per page, and we need 100 results. To do this, we add a page parameter to our list of params. However, we don't want to alter our base_params each time, so the way to approach this is to create a loop and then make a copy of base_params for each request. The built-in copy module does exactly this, so we can call the API five times in a loop:

import copy

for page in range(1, 6):
    this_params = copy.copy(base_params)
    this_params["page"] = page
    response = requests.get(base_url, params=this_params)

This will make five separate requests to the API, one with page=1, the next with page=2, and so on, getting different sets of image results with each call. This is a convenient way to walk through a large set of API results. Most APIs implement pagination, where a single call to the API only returns a limited set of results. We then ask for more pages of results, much like looking through query results from a search engine.

Since we want 100 results, we could simply decide that this is five calls of 20 results each, but it would be more robust to keep requesting pages until we have the hundred results we need and then stop. This protects us in case Pixabay changes the default number of results to 15 or similar. It also lets us handle the situation where there aren't 100 images for our search terms. So we have a while loop and increment the page number each time, and then, if we've reached 100 images, or if there are no images to retrieve, we break out of the loop:

images = []
page = 1
while len(images) < 100:
    this_params = copy.copy(base_params)
    this_params["page"] = page
    response = requests.get(base_url, params=this_params)
    if not response.json()["hits"]: break
    for result in response.json()["hits"]:
        images.append({
            "pageURL": result["pageURL"],
            "thumbnail": result["previewURL"],
            "tags": result["tags"],
        })
    page += 1

This way, when we finish, we'll have 100 images, or we'll have all the images if there are fewer than 100, stored in the images array. We can then go on to do something useful with them. But before we do that, let's talk about caching.
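If we found ourselves doing this for more than one search term, we might wrap the loop up as a function. This is just one possible arrangement, not part of the book's example; it reuses the base_url and base_params defined earlier:

def fetch_images(query, wanted=100):
    # Page through the API until we have `wanted` images or run out.
    found = []
    page = 1
    while len(found) < wanted:
        this_params = copy.copy(base_params)
        this_params["q"] = query
        this_params["page"] = page
        hits = requests.get(base_url, params=this_params).json()["hits"]
        if not hits:
            break
        for hit in hits:
            found.append({
                "pageURL": hit["pageURL"],
                "thumbnail": hit["previewURL"],
                "tags": hit["tags"],
            })
        page += 1
    return found[:wanted]

images = fetch_images("fruit")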

Caching HTTP Requests

It's a good idea to avoid making the same request to an HTTP API more than once. Many APIs have usage limits in order to avoid being overtaxed by requesters, and a request takes time and effort on their part and on ours. We should try not to repeat wasteful requests we've already made. Fortunately, there's a useful way to do this when using Python's requests module: install requests-cache with python -m pip install requests-cache. This will seamlessly record any HTTP calls we make and save the results. Then, later, if we make the same call again, we'll get back the locally saved result without going to the API for it at all. This saves both time and bandwidth. To use requests_cache, import it and create a CachedSession, and then instead of requests.get use session.get to fetch URLs, and we'll get the benefit of caching with no extra effort:

import requests_cache
session = requests_cache.CachedSession('fruit_cache')
...
response = session.get(base_url, params=this_params)
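By default, cached responses are kept indefinitely. requests-cache also accepts an expire_after argument on CachedSession, so entries older than a given lifetime are refetched from the API; something like this should do it:

from datetime import timedelta

import requests_cache

# Cached responses older than a day are considered stale and refetched.
session = requests_cache.CachedSession("fruit_cache", expire_after=timedelta(days=1))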

Making Some Output

To see the results of our query, we need to display the images somewhere. A convenient way to do this is to create a simple HTML page that shows each of the images. Pixabay provides a small thumbnail of each image, which it calls previewURL in the API response, so we could put together an HTML page that shows all of these thumbnails and links them to the main Pixabay page, from which we could choose to download the images we want and credit the photographer. So each image in the page might look like this:

<li>
    <a href="https://pixabay.com/pictures/berries-fruits-food-blackberries-2277/">
        <img src="https://cdn.pixabay.com/photograph/2010/12/13/10/05/berries-2277_150.jpg" alt="berries, fruits, meals">
    </a>
</li>

We can construct that from our images list using a list comprehension, and then join all the results together into one big string with "\n".join():

html_image_list = [
    f"""<li>
            <a href="{image["pageURL"]}">
                <img src="{image["thumbnail"]}" alt="{image["tags"]}">
            </a>
        </li>
    """
    for image in images
]
html_image_list = "\n".join(html_image_list)

At that point, if we write out a very plain HTML page containing that list, it's easy to open it in a web browser for a quick overview of all the search results we got from the API, and click any one of them to jump to the full Pixabay page for downloads:

html = f"""<!doctype html>
<html><head><meta charset="utf-8">
<title>Pixabay seek for {base_params['q']}</title>
<model>
ul {{
    list-style: none;
    line-height: 0;
    column-count: 5;
    column-gap: 5px;
}}
li {{
    margin-bottom: 5px;
}}
</model>
</head>
<physique>
<ul>
{html_image_list}
</ul>
</physique></html>
"""
output_file = f"searchresults-{base_params['q']}.html"
with open(output_file, mode="w", encoding="utf-8") as fp:
    fp.write(html)
print(f"Search outcomes abstract written as {output_file}")

[Screenshot: the search results page, showing many fruits]

This article is excerpted from Useful Python, available on SitePoint Premium and from ebook retailers.


