How to securely authenticate Google Read Aloud requests

You may see requests with the following user agents in your logs:

  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
  • Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)

These requests come from the Google Read Aloud service. It lets users listen to web pages using text-to-speech and is used by services such as Google Go, Google Read it, and Read Aloud on the Google app.

How can I verify if a Google Read Aloud request comes from a real Google application?

Google Read Aloud is considered a user-triggered fetcher, i.e. its requests result from an action initiated by a user, who triggers a product-specific fetch. Google publishes a list of IP ranges used by user-triggered fetchers.

These IP addresses belong to the GOOGLE autonomous system. If we use the host command to do a reverse DNS lookup on one of these IPs, e.g. 66.249.83.72, we observe that it resolves to a hostname of the form google-proxy-xxxxxx.google.com:

host 66.249.83.72
72.83.249.66.in-addr.arpa domain name pointer google-proxy-66-249-83-72.google.com
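
The same lookup can be scripted. The following is a minimal Python sketch, not an official method: the function names reverse_dns and looks_like_google_proxy are made up for illustration, and the hostname pattern is inferred only from the host output above. The resolver is passed as a parameter so the check can be exercised without network access.

```python
import socket
from typing import Callable


def reverse_dns(ip: str) -> str:
    """Return the PTR hostname for an IP (raises OSError if there is none)."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def looks_like_google_proxy(
    ip: str, resolver: Callable[[str], str] = reverse_dns
) -> bool:
    """True if the reverse DNS name matches google-proxy-*.google.com.

    The pattern is an assumption based on the single example in this article.
    """
    try:
        hostname = resolver(ip)
    except OSError:
        # No PTR record, or the lookup failed: treat as not verified.
        return False
    hostname = hostname.rstrip(".").lower()
    return hostname.startswith("google-proxy-") and hostname.endswith(".google.com")
```

Calling looks_like_google_proxy("66.249.83.72") performs a real DNS query. A matching hostname is only a hint; the IP range check described below remains the main signal.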

Thus, to verify that a request comes from the legitimate Google Read Aloud service, you should check that:

  1. The IP address of the request belongs to the list provided by Google;
  2. The user agent of the request is one of the following:
    1. Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
    2. Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
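
The two checks above can be combined as in the following Python sketch. It rests on several assumptions: the IP list is supplied as a JSON document with a "prefixes" array whose entries carry an "ipv4Prefix" or "ipv6Prefix" key (the layout Google uses for its published IP range files, to be confirmed against the file you actually download), and the names load_networks and is_google_read_aloud are hypothetical. The prefixes in the usage example are reserved documentation ranges, not Google's real ones.

```python
import ipaddress
import json

# The two user agents listed in this article.
READ_ALOUD_USER_AGENTS = frozenset({
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; "
    "+https://support.google.com/webmasters/answer/1061943)",
    "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; "
    "Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)",
})


def load_networks(ranges_json: str) -> list:
    """Parse an IP range file (assumed layout: {"prefixes": [...]})."""
    data = json.loads(ranges_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_google_read_aloud(ip: str, user_agent: str, networks: list) -> bool:
    """Require BOTH a listed IP and one of the known user agents."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed IP string
    ip_ok = any(address in network for network in networks)
    return ip_ok and user_agent in READ_ALOUD_USER_AGENTS


# Usage with placeholder prefixes (documentation ranges, NOT Google's list).
SAMPLE_RANGES = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
sample_networks = load_networks(SAMPLE_RANGES)
```

In practice the list would be downloaded from Google and refreshed periodically, since the published ranges can change.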

Disclaimer:

  1. Relying solely on the user agent is not a safe way to authenticate Google Read Aloud, as this attribute can easily be forged by an attacker.
  2. You should verify that the IP address is part of the list provided by Google, rather than only checking that it belongs to Google. This ensures the request comes from Google itself and not from an attacker who rented a Google Cloud Platform (GCP) machine.

How can I prevent Google Read Aloud from accessing my website?

Google Read Aloud is not considered a crawler since it is triggered by user requests. Thus, it does not respect the content of your robots.txt file. To opt out of Google Read Aloud and prevent it from using the content of your website, you need to add the nopagereadaloud meta tag to your pages:

<meta name="google" content="nopagereadaloud">
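
To confirm that a page actually serves this tag, a small check along the following lines can be run against its HTML. This is only a sketch using Python's standard html.parser module. The name has_read_aloud_opt_out is hypothetical, and treating the content attribute as a comma-separated list of values is an assumption, not something this article states.

```python
from html.parser import HTMLParser


class _ReadAloudOptOutFinder(HTMLParser):
    """Sets .found when a <meta name="google"> tag carries nopagereadaloud."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "google":
            return
        # Assumption: several values may be combined with commas.
        values = {v.strip().lower() for v in attributes.get("content", "").split(",")}
        if "nopagereadaloud" in values:
            self.found = True


def has_read_aloud_opt_out(html: str) -> bool:
    finder = _ReadAloudOptOutFinder()
    finder.feed(html)
    return finder.found
```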

How can I prevent Google Read Aloud from accessing paid content on my website?

To prevent Google Read Aloud from accessing paid or paywalled content, you should use structured data for subscription and paywalled content. The structured data JSON objects need to have the isAccessibleForFree property set to False.
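
As an illustration, the following Python sketch builds such a JSON-LD object. The shape (a NewsArticle with a hasPart element pointing at the paywalled section through cssSelector) follows the schema.org vocabulary used for paywalled content, but the function name, the headline and the ".paywall" selector are placeholders to adapt to your own markup. Note that Python's False is serialized as the JSON boolean false.

```python
import json


def paywalled_article_json_ld(headline: str, paywall_css_selector: str) -> str:
    """Return JSON-LD marking an article and its paywalled section as not free."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # CSS selector of the element wrapping the paywalled content.
            "cssSelector": paywall_css_selector,
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
print(paywalled_article_json_ld("Example headline", ".paywall"))
```

The resulting string is meant to be embedded in the page inside a script element of type application/ld+json.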
