How to securely authenticate Google Read Aloud requests
You may see requests with the following user agents in your logs:
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
- Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
These requests come from the Google Read Aloud service, which lets users listen to web pages through text-to-speech. It is used by services such as Google Go, Google Read it, and Read Aloud on the Google app.
How can I verify if a Google Read Aloud request comes from a real Google application?
Google Read Aloud is considered a user-triggered fetcher, i.e. a fetcher that acts only when a user initiates a product-specific action. Google publishes a list of the IP ranges used by its user-triggered fetchers.
These IP addresses belong to the GOOGLE autonomous system. If we use the host command to do a reverse DNS lookup on one of these IPs, e.g. 66.249.83.72, we observe that it resolves to a google-proxy-xxxxxx.google.com hostname:
host 66.249.83.72
72.83.249.66.in-addr.arpa domain name pointer google-proxy-66-249-83-72.google.com
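The same lookup can be sketched in Python with the standard socket module. The helper names reverse_dns and is_google_proxy_hostname are hypothetical, and the hostname pattern is an assumption inferred from the single example above, not a format Google guarantees. Treat the result as a complementary signal; the check described in this article remains the published IP list.

```python
import re
import socket
from typing import Optional

# Assumed pattern, inferred from google-proxy-66-249-83-72.google.com.
_PROXY_HOSTNAME = re.compile(r"google-proxy-\d{1,3}(?:-\d{1,3}){3}\.google\.com\.?")


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


def is_google_proxy_hostname(hostname: str) -> bool:
    """Check that the whole hostname matches the assumed google-proxy pattern."""
    return _PROXY_HOSTNAME.fullmatch(hostname.lower()) is not None


# Usage (requires network access):
#   name = reverse_dns("66.249.83.72")
#   ok = name is not None and is_google_proxy_hostname(name)
```

Matching the full hostname rather than a substring avoids accepting names such as google-proxy-1-2-3-4.google.com.attacker.example.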
Thus, to verify that a request comes from the legitimate Google Read Aloud service, you should check that:
- The IP address of the request belongs to the list provided by Google;
- The user agent of the request is one of the following:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
Disclaimer:
- Relying solely on the user agent is not a safe way to authenticate Google Read Aloud as this attribute can be easily modified/forged by an attacker.
- You should verify that the IP address is part of the list provided by Google to ensure the request comes from Google and not from an attacker who rented a Google Cloud Platform (GCP) machine.
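The two checks above can be combined in a short Python sketch. It assumes the IP list has already been downloaded and parsed into a dictionary with a "prefixes" array of "ipv4Prefix" / "ipv6Prefix" entries; that layout is an assumption about Google's published file, so verify it against the actual list. The function names are hypothetical, and SAMPLE_IP_RANGES uses reserved documentation ranges as placeholders, not real Google ranges.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# The two user agents listed in this article.
READ_ALOUD_USER_AGENTS = {
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; "
    "+https://support.google.com/webmasters/answer/1061943)",
    "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 "
    "(compatible; Google-Read-Aloud; "
    "+https://support.google.com/webmasters/answer/1061943)",
}

# Placeholder data only: documentation ranges, NOT Google's real prefixes.
SAMPLE_IP_RANGES = {
    "prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]
}


def parse_prefixes(ip_ranges: dict) -> List[Network]:
    """Turn the parsed IP-range document into network objects."""
    networks: List[Network] = []
    for entry in ip_ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_google_read_aloud(ip: str, user_agent: str, networks: List[Network]) -> bool:
    """Accept a request only if both the user agent and the source IP match."""
    if user_agent not in READ_ALOUD_USER_AGENTS:
        return False
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:  # malformed IP string
        return False
    return any(address in network for network in networks)
```

The IP passed in should be the address of the actual TCP peer, or one set by a proxy you control, since client-supplied headers can be forged just like the user agent.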
How can I prevent Google Read Aloud from accessing my website?
Google Read Aloud is not considered a crawler, since it is triggered by user requests. It therefore does not respect the content of your robots.txt file. To opt out of Google Read Aloud and prevent it from using the content of your website, you need to use the nopagereadaloud meta tag:
<meta name="google" content="nopagereadaloud">
How can I prevent Google Read Aloud from accessing paid content on my website?
To prevent Google Read Aloud from accessing paid or paywalled content, you should use structured data for subscription and paywalled content. The structured data JSON objects need to have the isAccessibleForFree property set to False.
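As an illustration, the following Python sketch builds such a structured data object. The NewsArticle type, the hasPart / WebPageElement / cssSelector fields, and the ".paywall" class are assumptions based on the schema.org vocabulary, and the function name is hypothetical; check Google's structured data documentation for the exact shape it expects. Note that Python's False is serialized as the JSON boolean false.

```python
import json


def paywalled_article_json_ld(headline: str, paywall_css_selector: str) -> str:
    """Build a JSON-LD string that marks an article as not free to access."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",  # assumed type; use the one that fits your page
        "headline": headline,
        "isAccessibleForFree": False,
        # Assumed way to point at the paywalled section of the page.
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            "cssSelector": paywall_css_selector,
        },
    }
    return json.dumps(data, indent=2)


# Usage: embed the returned string in a
# <script type="application/ld+json"> element on the page.
snippet = paywalled_article_json_ld("Example headline", ".paywall")
```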