How to detect (modified|headless) Chrome instrumented with Puppeteer (2024 edition)

In this article, we present 3 efficient techniques to detect bots that leverage Puppeteer with headless and non-headless Chrome. These techniques have been tested in June 2024.

TL;DR:

If you only want the code for the detection techniques, see the snippet below. The rest of this article details these techniques and explains how attackers can bypass some of them. The 3 techniques work as follows:

  1. Checking the User-Agent HTTP header, or navigator.userAgent in JS, for user agents linked to Headless Chrome: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/125.0.0.0 Safari/537.36
  2. Detecting whether navigator.webdriver is true in JavaScript
  3. Detecting the side effects of CDP

JavaScript code to detect (headless) Chrome instrumented with Puppeteer:

let isBot = false;
if (navigator.userAgent.includes("HeadlessChrome")) {
    isBot = true;
}

if (navigator.webdriver) {
    isBot = true;
}

var cdpDetected = false;
var e = new Error();
Object.defineProperty(e, 'stack', {
   get() {
    cdpDetected = true;
   }
});

// This is part of the detection, the console.log shouldn't be removed!
console.log(e);

if (cdpDetected) {
    isBot = true;
}

if (isBot) {
    console.log("Your bot has been detected!");
}

Technique 1: How to detect an unmodified Headless Chrome automated with Puppeteer

To illustrate the first detection technique, we create a simple bot based on Headless Chrome and Puppeteer. The bot visits https://deviceandbrowserinfo.com/http_headers and we take a screenshot of the page to observe the HTTP headers sent by our bot:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://deviceandbrowserinfo.com/http_headers');

    const headersTable = await page.$('#headers')
    await headersTable.screenshot({ path: './vanila-headless-chrome.png' })
    await browser.close()
})();

We obtain the following HTTP headers:

Thus, we notice that the user agent sent by an unmodified Headless Chrome instrumented with Puppeteer indicates the presence of Headless Chrome. Note that the user agent can also be obtained on the client side, e.g. if you want to exclude Headless Chrome traffic from your Google Analytics. You can access it using navigator.userAgent. In the case of our bot, it returns Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/125.0.0.0 Safari/537.36.

Disclaimer: this detection technique works only for Headless Chrome. It doesn’t work for normal Chrome instrumented with Puppeteer since the user agent won’t contain the HeadlessChrome substring.
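The same substring check can also be applied on the server side to the value of the User-Agent request header. The sketch below is only illustrative: the helper name isHeadlessChromeUserAgent is hypothetical, and how you obtain the header value depends on your server framework.

```javascript
// Hypothetical helper (not part of any library): returns true when a
// User-Agent string advertises Headless Chrome.
function isHeadlessChromeUserAgent(userAgent) {
    // A missing or non-string header is treated as "not detected" here.
    if (typeof userAgent !== 'string') {
        return false;
    }
    return userAgent.includes('HeadlessChrome');
}

// User agent sent by the unmodified headless bot shown above.
const headlessUA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/125.0.0.0 Safari/537.36';
// User agent of a normal (non-headless) Chrome.
const regularUA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36';

console.log(isHeadlessChromeUserAgent(headlessUA)); // true
console.log(isHeadlessChromeUserAgent(regularUA));  // false
```

As the disclaimer says, a normal Chrome user agent passes this check, so it should be treated as one signal among several.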

Technique 2: How to detect a modified (headless) Chrome instrumented with Puppeteer

This second technique works both for headless and non-headless Chrome and can also be used to detect bots that changed their user agent.

To lie about the user agent, we just need to use page.setUserAgent with the user agent we want to forge. We need to do it before we visit the page, e.g. just after the page has been created.

const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36');

Once we change the user agent, it no longer contains the HeadlessChrome substring, even when using Headless Chrome.

However, from JavaScript, you can still detect bots that forged their user agent by verifying if the navigator.webdriver property is equal to true.
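A minimal sketch of this check is shown below. The function name hasWebdriverFlag and its nav parameter are assumptions made for illustration; the parameter only exists so the check can be exercised with a stand-in object outside a browser.

```javascript
// Hypothetical helper: returns true when the given navigator-like object
// exposes webdriver === true. Defaults to the page's own navigator.
function hasWebdriverFlag(nav = globalThis.navigator) {
    // Guard against environments where no navigator object is available.
    return Boolean(nav) && nav.webdriver === true;
}

// In a page, call it without arguments:
//   const isBot = hasWebdriverFlag();
// Stand-in objects illustrate both outcomes:
console.log(hasWebdriverFlag({ webdriver: true }));  // true
console.log(hasWebdriverFlag({ webdriver: false })); // false
```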

Technique 3: How to detect a modified (headless) Chrome instrumented with Puppeteer that removed navigator.webdriver = true

The navigator.webdriver = true property presented in the previous section can easily be removed by passing the --disable-blink-features=AutomationControlled argument when creating the Puppeteer browser instance:

const browser = await puppeteer.launch({args: ['--disable-blink-features=AutomationControlled']});

When a browser is created this way, navigator.webdriver returns false and can’t be used anymore for detection.

Thus, to detect attackers that actively lie about their nature by forging their user agent and by getting rid of navigator.webdriver, we need to find another detection technique. We can use CDP detection, a technique I presented in a recent DataDome blog post.

Under the hood, Puppeteer leverages the Chrome DevTools Protocol (CDP) to instrument (headless) Chrome. By using a specially crafted challenge shown below, we can detect the use of CDP, and therefore the fact that a browser is automated:

var detected = false;
var e = new Error();
Object.defineProperty(e, 'stack', {
   get() {
       detected = true;
   }
});
console.log(e);

If the value of detected is equal to true then it means that the browser is automated. One of the side effects of this detection technique is that it will flag human users with dev tools open as bots. Note that the console.log(e) is part of the challenge since it is what triggers the serialisation in CDP. You can find more details about this challenge in my DataDome blog post.
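For reuse, the challenge can be wrapped in a function that returns the result. This is a sketch, not the original implementation: the name detectCdpSideEffect and the injectable log parameter are assumptions added here so the logic can be exercised without a browser. In a page you would call it with no arguments so that console.log is used, as in the challenge above. Outside a browser (for example in Node.js) the default logger may read the stack on its own, so the result is only meaningful in a page context.

```javascript
// Hypothetical wrapper around the challenge above.
// `log` defaults to console.log; it is injectable only for illustration.
function detectCdpSideEffect(log = (value) => console.log(value)) {
    let detected = false;
    const e = new Error();
    // The getter only runs if something reads e.stack, which is what
    // happens when the logged error gets serialised.
    Object.defineProperty(e, 'stack', {
        get() {
            detected = true;
            return '';
        }
    });
    // Logging the error is part of the detection and must not be removed.
    log(e);
    return detected;
}

// A logger that never touches the error leaves the flag unset...
console.log(detectCdpSideEffect(() => {})); // false
// ...while a logger that reads the stack (simulating serialisation) sets it.
console.log(detectCdpSideEffect((err) => { void err.stack; })); // true
```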

Limits of current detection techniques

In this article, we presented 3 different detection techniques that can be used to detect bots based on (headless) Chrome instrumented with Puppeteer:

  1. Checking the User-Agent HTTP header, or navigator.userAgent in JS, for user agents linked to Headless Chrome: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/125.0.0.0 Safari/537.36
  2. Detecting whether navigator.webdriver is true in JavaScript
  3. Detecting the side effects of CDP

Even though these detection techniques are quite effective (they have no false positives, apart from the CDP detection technique, which flags people with dev tools open), sophisticated attackers are aware of them and have started to develop countermeasures to avoid being detected. In particular, certain frameworks, such as nodriver, enable bot developers to bypass CDP detection by avoiding the use of the Runtime.enable CDP command.
