How to change the user agent, use proxies and make concurrent requests with the Python AIOHTTP library

AIOHTTP is an asynchronous HTTP client/server framework for asyncio and Python. It is frequently used to build bots and scrapers because its asynchronous nature makes it easy for developers to make requests in parallel.

How to install AIOHTTP?

You can install it using the following command:

pip install aiohttp

How to make GET requests with AIOHTTP?

The code snippet below shows how to make a simple HTTP request with the AIOHTTP library to https://deviceandbrowserinfo.com/api/http_headers, print the status code of the response (200 if successful), and print the content of the response. Since the requests are asynchronous, you must use await to obtain the response content.


import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://deviceandbrowserinfo.com/api/http_headers') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

How to modify the default user agent?

The code above makes a request to https://deviceandbrowserinfo.com/api/http_headers, which returns the list of HTTP headers and their associated values. In the case of AIOHTTP, we obtain the following results:


    {
      "Connection": "upgrade",
      "Host": "deviceandbrowserinfo.com",
      "X-Forwarded-For": "xx.yy.zz.aa",
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "User-Agent": "Python/3.9 aiohttp/3.8.6"
    }

We see that by default, AIOHTTP uses the following user agent: Python/3.9 aiohttp/3.8.6. Note that the versions in the user agent depend on your Python and aiohttp versions.

To change the user agent used by the AIOHTTP client, we need to pass the headers parameter with a User-Agent property when making an HTTP request:


async with aiohttp.ClientSession() as session:
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15'}
    async with session.get('https://deviceandbrowserinfo.com/api/http_headers', headers=headers) as resp:
        print(resp.status)
        print(await resp.text())

With the headers parameter, the server returns our new user agent along with the previous HTTP headers:


    {
      "Connection": "upgrade",
      "Host": "deviceandbrowserinfo.com",
      "X-Forwarded-For": "xx.yy.zz.aa",
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate"
    }

How can I change AIOHTTP headers?

We may want to change all the HTTP headers to appear more human and avoid being blocked (HTTP 403 responses). In this case, we need to provide a headers dictionary that contains all the headers we want to modify. For example, to make it look like the requests are coming from a Chrome browser on macOS, we could provide the following headers:


async with aiohttp.ClientSession() as session:
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'Accept-Language': 'en,fr-FR;q=0.9,fr;q=0.8',
        'Connection': 'keep-alive',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
        'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
        'sec-ch-ua-form-factors': '"Desktop"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
    }
    async with session.get('https://deviceandbrowserinfo.com/api/http_headers', headers=headers) as resp:
        print(resp.status)
        print(await resp.text())

How can I use AIOHTTP with a proxy?

You need to pass the proxy parameter with your proxy URL when making the request:


proxy_url = 'http://your-proxy-url:port'

async with aiohttp.ClientSession() as session:
    async with session.get('https://deviceandbrowserinfo.com/api/http_headers', proxy=proxy_url) as resp:
        print(resp.status)
        print(await resp.text())
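If your proxy requires credentials, aiohttp also accepts a proxy_auth parameter that takes an aiohttp.BasicAuth object. The sketch below uses a hypothetical proxy endpoint and credentials (proxy.example.com, proxy_user, proxy_password); substitute your own:

```python
import aiohttp
import asyncio

# Hypothetical proxy endpoint and credentials -- replace with your own.
PROXY_URL = 'http://proxy.example.com:8080'
PROXY_AUTH = aiohttp.BasicAuth('proxy_user', 'proxy_password')

async def fetch_via_proxy(url):
    async with aiohttp.ClientSession() as session:
        # proxy_auth generates the Proxy-Authorization header for us
        async with session.get(url, proxy=PROXY_URL, proxy_auth=PROXY_AUTH) as resp:
            return await resp.text()

# asyncio.run(fetch_via_proxy('https://deviceandbrowserinfo.com/api/http_headers'))
```

The last line is left commented out because it requires a reachable proxy.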

Does AIOHTTP execute JavaScript?

No. When you make an HTTP request to a page that also contains JavaScript, AIOHTTP doesn’t execute any JavaScript. It only retrieves the raw content of the page (HTML, JS and CSS). If you want to execute JS, you should use a headless browser such as Headless Chrome.

How can I make concurrent/parallel requests with AIOHTTP?

The asynchronous nature of AIOHTTP makes it convenient to make parallel HTTP requests. To control the level of concurrency, i.e. the maximum number of concurrent requests, we can use the asyncio.Semaphore synchronization primitive. The example below shows how we can use AIOHTTP to make at most 5 concurrent GET requests over a list of URLs.


import aiohttp
import asyncio

async def fetch(url, session, semaphore):
    async with semaphore:
        async with session.get(url) as response:
            data = await response.text()
            return data

async def fetch_all(urls, max_concurrent_requests):
    semaphore = asyncio.Semaphore(max_concurrent_requests)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(url, session, semaphore) for url in urls]
        results = await asyncio.gather(*tasks)
    return results

async def main():
    urls = [
        'https://example.com',
        '...'
    ]
    max_concurrent_requests = 5
    results = await fetch_all(urls, max_concurrent_requests)
    for i, result in enumerate(results):
        print(f"Result {i+1}:\n{result}\n")

if __name__ == "__main__":
    asyncio.run(main())
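To see the Semaphore cap in action without hitting the network, here is a self-contained sketch that replaces the HTTP call with a short sleep and records how many coroutines run at once:

```python
import asyncio

async def worker(semaphore, counter):
    async with semaphore:
        # Track how many workers hold the semaphore simultaneously
        counter['active'] += 1
        counter['peak'] = max(counter['peak'], counter['active'])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request
        counter['active'] -= 1

async def run_demo(num_tasks=20, max_concurrent=5):
    semaphore = asyncio.Semaphore(max_concurrent)
    counter = {'active': 0, 'peak': 0}
    await asyncio.gather(*(worker(semaphore, counter) for _ in range(num_tasks)))
    return counter['peak']

peak = asyncio.run(run_demo())
print(peak)  # prints 5: the concurrency never exceeds the cap
```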

How can I parse HTML with AIOHTTP?

To parse and analyze HTML content retrieved with AIOHTTP, you need to leverage the Beautiful Soup library. The example below shows how you can make a request to https://deviceandbrowserinfo.com/learning_zone, extract all the links on the page, and print their text.


import aiohttp
import asyncio
from bs4 import BeautifulSoup

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://deviceandbrowserinfo.com/learning_zone') as resp:
            soup = BeautifulSoup(await resp.text(), 'html.parser')
            links = soup.find_all('a')

            for link in links:
                link_text = link.get_text()
                print(link_text)

asyncio.run(main())
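Note that hrefs extracted this way may be relative to the page. To turn them into absolute URLs, you can combine Beautiful Soup with urllib.parse.urljoin. Here is an offline sketch on a static HTML snippet (the snippet and base URL are illustrative):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Static HTML snippet standing in for a fetched page
html = '''
<html><body>
  <a href="/learning_zone/article1">Article 1</a>
  <a href="https://example.com/external">External</a>
</body></html>
'''

base_url = 'https://deviceandbrowserinfo.com'
soup = BeautifulSoup(html, 'html.parser')

# urljoin resolves relative hrefs against the base URL and
# leaves absolute URLs untouched
links = [urljoin(base_url, a['href']) for a in soup.find_all('a')]
print(links)
```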

How can I block requests coming from AIOHTTP?

Block with the user agent: You can block requests whose user agent contains the aiohttp substring. However, you should keep in mind that an attacker can easily change this value, as shown earlier in this article.

Block using missing and inconsistent HTTP headers: In case the attacker simply changes the user agent, you can block HTTP requests that claim to come from standard browsers such as Chrome, Firefox, and Safari but that lack standard HTTP headers, for example:

  • Missing accept-language
  • Missing client hints, such as sec-ch-ua

You should be careful about potential false positives when making this kind of blocking decision, as there may be edge cases with certain less common (outdated or non-standard) browsers.
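As an illustration, here is a hypothetical server-side heuristic combining the two checks above (the function name and the exact header set are ours, not part of any library, and would need tuning in production):

```python
def looks_like_aiohttp(headers):
    """Hypothetical heuristic: flag requests that either advertise
    aiohttp directly, or claim to be Chrome without the headers a
    real Chrome browser always sends."""
    # Normalize header names to lowercase for case-insensitive lookup
    h = {k.lower(): v for k, v in headers.items()}
    user_agent = h.get('user-agent', '')

    # Check 1: default aiohttp user agent (e.g. "Python/3.9 aiohttp/3.8.6")
    if 'aiohttp' in user_agent.lower():
        return True

    # Check 2: claims to be Chrome but lacks accept-language or client hints
    if 'Chrome' in user_agent:
        if 'accept-language' not in h or 'sec-ch-ua' not in h:
            return True

    return False

print(looks_like_aiohttp({'User-Agent': 'Python/3.9 aiohttp/3.8.6'}))  # prints True
```

As noted above, such heuristics should be deployed carefully to limit false positives on unusual but legitimate browsers.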

Block using TLS fingerprinting: Another solution is to leverage the TLS fingerprint to block values linked to AIOHTTP.
