How to change the user agent, HTTP headers and use proxies with the Python requests library
Requests (python-requests/2.31.0) is a Python HTTP library that makes it easy for developers to make HTTP(S) requests (GET, POST, etc.).
How to install Python requests?
You can install it using the following command:
pip install requests
How to make GET requests with Python requests?
The code snippet below shows how you can make a simple HTTP GET request with the Python requests library to https://deviceandbrowserinfo.com/api/http_headers, then print the status code of the response (200 if successful) and the content of the response.
import requests
response = requests.get('https://deviceandbrowserinfo.com/api/http_headers')
print(response.status_code)
print(response.text)
How to modify the default user agent?
The code above makes a request to https://deviceandbrowserinfo.com/api/http_headers, which returns the list of HTTP headers it received and their associated values. In the case of Python requests, we obtain the following result:
{
    "Connection": "upgrade",
    "Host": "deviceandbrowserinfo.com",
    "X-Forwarded-For": "xx.yy.zz.aa",
    "User-Agent": "python-requests/2.31.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*"
}
We see that by default, Python requests has the following user agent: python-requests/2.31.0. Note that the version in the user agent depends on the library version.
To change the Python requests user agent, we need to pass the headers parameter, a dictionary with a User-Agent key, when making an HTTP request:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15'}
response = requests.get('https://deviceandbrowserinfo.com/api/http_headers', headers=headers)
With the headers parameter set, the server returns our new user agent, while the other HTTP headers are unchanged:
{
    "Connection": "upgrade",
    "Host": "deviceandbrowserinfo.com",
    "X-Forwarded-For": "xx.yy.zz.aa",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*"
}
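If every request should carry the same user agent, one option is to set it once on a requests.Session instead of passing headers to each call. The sketch below assumes that pattern; the variable names are illustrative, and the request is only prepared (not sent) to show which User-Agent would go out.

```python
import requests

SAFARI_UA = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/16.1 Safari/605.1.15'
)

# A Session keeps default headers that are merged into every request it sends.
session = requests.Session()
session.headers.update({'User-Agent': SAFARI_UA})

# Prepare (but do not send) a request to inspect the headers that would go out.
request = requests.Request('GET', 'https://deviceandbrowserinfo.com/api/http_headers')
prepared = session.prepare_request(request)
print(prepared.headers['User-Agent'])

# To actually send it: session.send(prepared), or simply session.get(url).
```

Preparing the request first is a convenient way to check headers locally before any traffic reaches the server.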
How can I change Python requests HTTP headers?
We may want to change all the HTTP headers to appear more human and avoid being blocked (for example, with a 403 response). In this case, we need to provide a headers dictionary that contains all the headers we want to modify. For example, to make it look like the requests are coming from a Chrome browser on macOS, we could provide the following headers:
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'en,fr-FR;q=0.9,fr;q=0.8',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
    'sec-ch-ua-form-factors': '"Desktop"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
}
response = requests.get('https://deviceandbrowserinfo.com/api/http_headers', headers=headers)
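Headers that are left out of the dictionary are still filled in with the library's own defaults, which can make the final set inconsistent with a real browser. A minimal sketch for checking what would actually be sent, by preparing the request without sending it; the shortened headers dictionary here is only for illustration.

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
    'Accept-Language': 'en,fr-FR;q=0.9,fr;q=0.8',
}

session = requests.Session()
request = requests.Request(
    'GET',
    'https://deviceandbrowserinfo.com/api/http_headers',
    headers=headers,
)
prepared = session.prepare_request(request)

# Our values override the session defaults; headers we did not set
# (such as Accept and Accept-Encoding) keep the library's default values.
for name, value in prepared.headers.items():
    print(f'{name}: {value}')
```

If one of the remaining defaults does not match the browser being imitated, it can be added to the headers dictionary explicitly.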
How can I use Python requests with a proxy?
You need to pass a proxies parameter: a dictionary that maps each protocol to the URL of your proxy, including its credentials:
proxies = {
    'http': 'http://username:password@proxyserver:port',
    'https': 'http://username:password@proxyserver:port',
}
response = requests.get('https://deviceandbrowserinfo.com/api/ip_address', proxies=proxies)
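Usernames or passwords that contain characters such as @ or : need to be percent-encoded before they go into the proxy URL, otherwise the URL may be parsed incorrectly. A minimal sketch of a helper that builds the proxies dictionary; build_proxies, the host name and the credentials are hypothetical and not part of the requests library.

```python
from urllib.parse import quote


def build_proxies(username, password, host, port):
    """Return a proxies dict for requests with percent-encoded credentials."""
    # safe='' makes quote() encode every reserved character (including '/'),
    # so nothing in the credentials can be mistaken for URL structure.
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    proxy_url = f'http://{user}:{pwd}@{host}:{port}'
    # The same HTTP proxy is used for both plain HTTP and HTTPS traffic.
    return {'http': proxy_url, 'https': proxy_url}


proxies = build_proxies('user', 'p@ss:word', 'proxy.example.com', 8080)
print(proxies['https'])  # http://user:p%40ss%3Aword@proxy.example.com:8080
```

The resulting dictionary can be passed as the proxies argument in the same way as the hand-written one.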
Does Python requests execute JavaScript?
No. When you make an HTTP request to a page that also contains JavaScript, Python requests doesn’t execute any JavaScript. It only retrieves the content of the page (HTML, JS and CSS) as text. If you want to execute JS, you should use a headless browser such as Headless Chrome.
How can I parse HTML with Python requests?
Python requests only downloads the page; to parse and analyze the HTML content, you can use the Beautiful Soup library. The example below shows how you can make a request to https://deviceandbrowserinfo.com/learning_zone, find all the links in the page, and print their text.
import requests
from bs4 import BeautifulSoup
response = requests.get('https://deviceandbrowserinfo.com/learning_zone')
soup = BeautifulSoup(response.content, 'html.parser')
links = soup.find_all('a')
for link in links:
    link_text = link.get_text()
    print(link_text)
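The snippet above prints the text of each link. If the URLs themselves are needed, a sketch along these lines reads the href attribute and resolves relative paths against the page URL. It parses a small inline HTML string with made-up links (an assumption, to keep the example offline); with a live page, response.content would take its place.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = 'https://deviceandbrowserinfo.com/learning_zone'

# Stand-in for response.content, so the example runs without a network call.
# The links below are hypothetical.
html = '''
<a href="/page-one">Relative link</a>
<a href="https://example.com/docs">Absolute link</a>
<a>Anchor without href</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# href=True skips <a> tags that have no href attribute;
# urljoin turns relative paths into absolute URLs.
urls = [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]

for url in urls:
    print(url)
```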
How can I block requests coming from Python requests?
Block with the user-agent: You can block requests whose user agent contains the python-requests substring. However, you should keep in mind that an attacker can easily change this value.
Block using missing and inconsistent HTTP headers: If the attacker simply changes their user agent, you can block HTTP requests that claim to come from standard browsers such as Chrome, Firefox, and Safari but that don’t have the HTTP headers those browsers normally send, for example:
- Missing accept-language
- Missing client hints, such as sec-ch-ua
You should be careful about potential false positives when making this kind of blocking decision, as there might be edge cases with less common (outdated or non-standard) browsers.
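The two checks described so far could be sketched as a small server-side function. This is a heuristic under stated assumptions, not a complete detection method: is_suspicious is a hypothetical helper, headers is assumed to be a plain dictionary of request headers received over HTTPS, and the sec-ch-ua check is applied only to user agents that claim to be Chrome, since other browsers may not send client hints.

```python
def is_suspicious(headers):
    """Heuristic check for requests that look automated (can misfire)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get('user-agent', '').lower()

    # 1. Default user agent of the Python requests library.
    if 'python-requests' in user_agent:
        return True

    # 2. Claims to be a mainstream browser but sends no accept-language.
    claims_browser = any(b in user_agent for b in ('chrome', 'firefox', 'safari'))
    if claims_browser and 'accept-language' not in h:
        return True

    # 3. Claims to be Chrome but sends no client hints.
    #    Assumption: old Chrome versions without client hints are not a concern.
    if 'chrome' in user_agent and 'sec-ch-ua' not in h:
        return True

    return False


print(is_suspicious({'User-Agent': 'python-requests/2.31.0', 'Accept': '*/*'}))  # True
```

In practice a result like this is probably better treated as one signal among several than as a blocking decision on its own.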
Block using TLS fingerprinting: Another solution is to compute the TLS fingerprint of incoming connections and block the fingerprint values associated with Python requests.