How to change the user agent, use proxies and make concurrent requests with the Python AIOHTTP library
AIOHTTP (user agent: Python/3.x aiohttp/3.x.x) is an asynchronous HTTP client/server library for asyncio and Python. It is frequently used to build bots and scrapers because its asynchronous design makes it easy for developers to make requests in parallel.
How to install AIOHTTP?
You can install it with pip using the following command:
pip install aiohttp
How to make GET requests with AIOHTTP?
The code snippet below shows how to make a simple HTTP GET request with the AIOHTTP library to https://deviceandbrowserinfo.com/api/http_headers and print the status code of the response (200 if successful) followed by the content of the response. Since the request is asynchronous, you must use await to get the response content.
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://deviceandbrowserinfo.com/api/http_headers') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())
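In practice you may also want a timeout and an explicit failure on error responses. The sketch below is one possible variant, not the only way to do it: `fetch_json` and `timeout_seconds` are hypothetical names, and it assumes the endpoint answers with a JSON body served as `application/json`.

```python
import asyncio
import aiohttp

async def fetch_json(url, timeout_seconds=10):
    # Give up if the whole request takes longer than timeout_seconds
    timeout = aiohttp.ClientTimeout(total=timeout_seconds)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as resp:
            # Raises aiohttp.ClientResponseError on 4xx/5xx status codes
            resp.raise_for_status()
            # Decodes the body as JSON (assumes an application/json response)
            return await resp.json()
```

Calling `asyncio.run(fetch_json('https://deviceandbrowserinfo.com/api/http_headers'))` should then return the headers as a Python dictionary rather than as raw text.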
How to modify the default user agent?
The code above makes a request to https://deviceandbrowserinfo.com/api/http_headers, which returns the list of HTTP headers it received and their associated values. In the case of AIOHTTP, we obtain the following result:
{
    "Connection": "upgrade",
    "Host": "deviceandbrowserinfo.com",
    "X-Forwarded-For": "xx.yy.zz.aa",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "User-Agent": "Python/3.9 aiohttp/3.8.6"
}
We see that, by default, AIOHTTP sends the following user agent: Python/3.9 aiohttp/3.8.6. Note that the versions in the user agent depend on the Python version and the library version in use.
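To see which value your own environment would send, the sketch below rebuilds the string locally. It assumes the default format is `Python/<major>.<minor> aiohttp/<version>`, as suggested by the header observed above; `default_user_agent` is a hypothetical helper, not part of AIOHTTP.

```python
import sys
import aiohttp

def default_user_agent():
    # Assumed format, inferred from the observed header:
    # Python/<major>.<minor> aiohttp/<library version>
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"Python/{python_version} aiohttp/{aiohttp.__version__}"

print(default_user_agent())
```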
To change the user agent used by the AIOHTTP client, we need to pass the headers parameter with a User-Agent entry when making an HTTP request:
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        # Override the default user agent for this request
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15'}
        async with session.get('https://deviceandbrowserinfo.com/api/http_headers', headers=headers) as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())
With the headers parameter set, the server returns our new user agent along with the other HTTP headers:
{
    "Connection": "upgrade",
    "Host": "deviceandbrowserinfo.com",
    "X-Forwarded-For": "xx.yy.zz.aa",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate"
}
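If every request should carry the same user agent, it can be less repetitive to set it once on the session instead of on each call. The following is a minimal sketch of that approach; `fetch_with_default_user_agent` and `USER_AGENT` are hypothetical names chosen for illustration.

```python
import asyncio
import aiohttp

# Example user agent string; replace it with the one you want to send
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 '
              '(KHTML, like Gecko) Version/16.1 Safari/605.1.15')

async def fetch_with_default_user_agent(url, user_agent=USER_AGENT):
    # Headers given to ClientSession are sent with every request made through it
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}) as session:
        async with session.get(url) as resp:
            return resp.status, await resp.text()
```

Headers passed to an individual `session.get(...)` call can still be used on top of these session-level defaults when a single request needs something different.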
How can I change AIOHTTP headers?
We may want to change all the HTTP headers to appear more human and avoid being blocked (403 response). In this case, we need to provide a headers dictionary that contains all the headers we want to modify. For example, to make it look like the requests are coming from a Chrome browser on macOS, we could provide the following headers:
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        # Headers similar to those sent by Chrome 125 on macOS
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
            'Accept-Language': 'en,fr-FR;q=0.9,fr;q=0.8',
            'Connection': 'keep-alive',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
            'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
            'sec-ch-ua-form-factors': '"Desktop"',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"macOS"',
        }
        async with session.get('https://deviceandbrowserinfo.com/api/http_headers', headers=headers) as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())
How can I use AIOHTTP with a proxy?
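You need to pass a proxy parameter that contains the URL of your proxy. If the proxy requires credentials, they can be embedded in the URL (http://user:password@your-proxy-url:port) or supplied separately through the proxy_auth parameter. A basic example: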
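# Run this inside an async function, as in the previous examples
proxy_url = 'http://your-proxy-url:port'
async with aiohttp.ClientSession() as session:
    async with session.get('https://deviceandbrowserinfo.com/api/http_headers', proxy=proxy_url) as resp:
        print(resp.status)
        print(await resp.text())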
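When the proxy requires a username and password, one option is to pass them through `proxy_auth` with `aiohttp.BasicAuth`. The sketch below assumes a proxy that accepts HTTP basic authentication; `fetch_via_proxy` is a hypothetical helper, and the proxy URL and credentials you pass to it are placeholders to replace with your own.

```python
import asyncio
import aiohttp

async def fetch_via_proxy(url, proxy_url, username=None, password=None):
    # Build basic-auth credentials only when a username is supplied
    proxy_auth = aiohttp.BasicAuth(username, password or '') if username else None
    async with aiohttp.ClientSession() as session:
        async with session.get(url, proxy=proxy_url, proxy_auth=proxy_auth) as resp:
            return resp.status, await resp.text()
```

For example, `asyncio.run(fetch_via_proxy('https://deviceandbrowserinfo.com/api/http_headers', 'http://your-proxy-url:port', 'user', 'password'))` would route the request through the proxy, and the X-Forwarded-For value returned by the page should then reflect the proxy's address rather than yours.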
Does AIOHTTP execute JavaScript?
No. When you make an HTTP request to a page that contains JavaScript, AIOHTTP doesn’t execute any of it. It only retrieves the content of the page (HTML, JS and CSS) as text. If you want to execute JS, you should use a headless browser such as Headless Chrome.
How can I make concurrent/parallel requests with AIOHTTP?
The asynchronous nature of AIOHTTP makes it convenient to issue HTTP requests in parallel. To control the level of concurrency, i.e. the maximum number of concurrent/parallel requests in flight at the same time, we can use the asyncio.Semaphore synchronisation primitive. The example below shows how we can use AIOHTTP to make at most 5 concurrent/parallel GET requests on a list of urls.
import aiohttp
import asyncio

async def fetch(url, session, semaphore):
    async with semaphore:
        async with session.get(url) as response:
            data = await response.text()
            return data

async def fetch_all(urls, max_concurrent_requests):
    semaphore = asyncio.Semaphore(max_concurrent_requests)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(url, session, semaphore) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

async def main():
    urls = [
        'https://example.com',
        '...'
    ]
    max_concurrent_requests = 5
    results = await fetch_all(urls, max_concurrent_requests)
    for i, result in enumerate(results):
        print(f"Result {i+1}:\n{result}\n")

if __name__ == "__main__":
    asyncio.run(main())
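To convince yourself that the semaphore really caps the number of requests in flight, you can run the same pattern without any network access. The sketch below is an illustration only: it replaces `session.get(...)` with `asyncio.sleep` as a stand-in for network latency, and `fake_fetch`, `run_demo` and the `stats` dictionary are hypothetical names.

```python
import asyncio

async def fake_fetch(index, semaphore, stats):
    async with semaphore:
        # Count how many "requests" are inside the semaphore right now
        stats['in_flight'] += 1
        stats['peak'] = max(stats['peak'], stats['in_flight'])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request
        stats['in_flight'] -= 1
        return index

async def run_demo(n_tasks=20, max_concurrent_requests=5):
    semaphore = asyncio.Semaphore(max_concurrent_requests)
    stats = {'in_flight': 0, 'peak': 0}
    tasks = [fake_fetch(i, semaphore, stats) for i in range(n_tasks)]
    # gather returns results in the same order as the tasks were given
    results = await asyncio.gather(*tasks)
    return results, stats['peak']

results, peak = asyncio.run(run_demo())
print(f"{len(results)} tasks completed, peak concurrency: {peak}")
```

With 20 tasks and a limit of 5, the reported peak stays at 5, and the results come back in the order of the input list even though the tasks finish at different times.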
How can I parse HTML with AIOHTTP?
AIOHTTP only retrieves the page. To parse and analyze the HTML content it returns, you need to leverage a library such as Beautiful Soup (pip install beautifulsoup4). The example below shows how you can make a request to https://deviceandbrowserinfo.com/learning_zone, extract all the links in the page, and print the text of each link.
import aiohttp
import asyncio
from bs4 import BeautifulSoup

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://deviceandbrowserinfo.com/learning_zone') as resp:
            soup = BeautifulSoup(await resp.text(), 'html.parser')
            links = soup.find_all('a')
            for link in links:
                link_text = link.get_text()
                print(link_text)

asyncio.run(main())
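If you need the link targets rather than the link text, a small variation can collect the href attributes and resolve relative URLs against the page address. This is a sketch under the assumption that the HTML has already been downloaded (for instance with `await resp.text()`); `extract_links` and `sample_html` are hypothetical names used for illustration.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links(html, base_url):
    soup = BeautifulSoup(html, 'html.parser')
    # href=True skips <a> tags that have no href attribute;
    # urljoin turns relative paths into absolute URLs
    return [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]

sample_html = '<a href="/learning_zone">Learning zone</a> <a>No target</a>'
print(extract_links(sample_html, 'https://deviceandbrowserinfo.com/'))
```

Keeping the parsing in a plain function like this also makes it easy to test without making any HTTP request.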
How can I block requests coming from AIOHTTP?
Block with the user-agent: You can block requests whose user agent contains the aiohttp substring. However, you should keep in mind that an attacker can easily change this value, cf. the section "How to modify the default user agent?" above.
Block using missing and inconsistent HTTP headers: In case the attacker simply changes its user agent, you can block HTTP requests that claim to come from standard browsers such as Chrome, Firefox, and Safari but that don’t have standard HTTP headers, for example:
- Missing accept-language
- Missing client hints, such as sec-ch-ua
You should be careful of potential false positives when taking this kind of blocking decision as there might be edge cases on certain less common (outdated/non-standard) browsers.
Block using TLS fingerprinting: Another solution is to leverage the TLS fingerprint to block values linked to AIOHTTP.
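The first two strategies can be sketched as a simple server-side check on the received headers. This is a rough heuristic for illustration, not a production detection rule: `is_suspicious_request` is a hypothetical function, the substring checks are simplifications, and the sec-ch-ua check assumes requests arrive over HTTPS, since Chromium-based browsers only send client hints on secure connections. TLS fingerprinting is not covered here.

```python
def is_suspicious_request(headers):
    # Header names are case-insensitive, so normalize them first
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get('user-agent', '').lower()

    # 1. Default AIOHTTP user agent, e.g. "Python/3.9 aiohttp/3.8.6"
    if 'aiohttp' in user_agent:
        return True

    # 2. Claims to be a standard browser but lacks accept-language
    claims_browser = any(token in user_agent for token in ('chrome/', 'firefox/', 'safari/'))
    if claims_browser and 'accept-language' not in normalized:
        return True

    # 3. Claims to be Chrome but sends no client hints
    #    (only checked for Chrome, as Firefox and Safari do not send sec-ch-ua)
    if 'chrome/' in user_agent and 'sec-ch-ua' not in normalized:
        return True

    return False

print(is_suspicious_request({'User-Agent': 'Python/3.9 aiohttp/3.8.6', 'Accept': '*/*'}))
```

As noted above, rules like these can produce false positives on outdated or non-standard browsers, so it is safer to log or challenge matching requests before blocking them outright.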