How to securely authenticate Google Read Aloud requests

You may see requests with the following user agents in your logs:

  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
  • Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)

These requests come from the Google Read Aloud service. It lets users listen to web pages using text-to-speech and is used by services such as Google Go, Google Read it, and Read Aloud on the Google app.

How can I verify if a Google Read Aloud request comes from a real Google application?

Google Read Aloud is considered a user-triggered fetcher, i.e. a fetcher that acts when a user initiates an action, in order to perform a product-specific function. Google provides a list of IP ranges related to user-triggered fetchers.

These IP addresses belong to the GOOGLE autonomous system. Running the host command to perform a reverse DNS lookup on one of these IPs, e.g. 66.249.83.72, shows that it resolves to a hostname of the form google-proxy-xxxxxx.google.com:

host 66.249.83.72
72.83.249.66.in-addr.arpa domain name pointer google-proxy-66-249-83-72.google.com
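
The same lookup can be scripted. The following is a minimal sketch, not a complete verification: the hostname pattern is an assumption generalised from the single example above, and the helper names reverse_dns and looks_like_google_proxy are hypothetical. It only illustrates the observation; the IP list check described in this article remains the actual verification step.

```python
import re
import socket

# Assumed pattern, generalised from the one example hostname shown above
# (google-proxy-66-249-83-72.google.com). Google may use other formats.
GOOGLE_PROXY_HOSTNAME = re.compile(r"google-proxy-[0-9a-z-]+\.google\.com\.?")


def looks_like_google_proxy(hostname):
    """Return True if the hostname matches the assumed google-proxy pattern."""
    # fullmatch anchors both ends, so suffixes such as
    # "...google.com.evil.example" are rejected.
    return GOOGLE_PROXY_HOSTNAME.fullmatch(hostname.lower()) is not None


def reverse_dns(ip):
    """Return the PTR hostname for an IP address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname
```

Calling reverse_dns("66.249.83.72") and passing a non-empty result to looks_like_google_proxy reproduces the host check above; the result depends on live DNS data.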

Thus, to verify that a request comes from the legitimate Google Read Aloud service, check that:

  1. The IP address of the request belongs to the list provided by Google;
  2. The user agent of the request is one of the following:
    1. Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
    2. Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
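
The two checks above can be sketched in Python as follows. This is an illustration under stated assumptions: the ranges are assumed to be available as a JSON object with a "prefixes" array of "ipv4Prefix" / "ipv6Prefix" entries, the prefixes in SAMPLE_RANGES are placeholder documentation ranges rather than real Google ranges, and the function names are hypothetical.

```python
import ipaddress

# Placeholder data: these are reserved documentation ranges, NOT Google's.
# The structure ("prefixes" with "ipv4Prefix"/"ipv6Prefix") is an assumption
# about how the published list is laid out.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

# The two user agents listed in this article.
READ_ALOUD_USER_AGENTS = {
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; "
    "+https://support.google.com/webmasters/answer/1061943)",
    "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 "
    "Mobile Safari/537.36 (compatible; Google-Read-Aloud; "
    "+https://support.google.com/webmasters/answer/1061943)",
}


def load_networks(ranges):
    """Turn the assumed JSON structure into a list of ip_network objects."""
    networks = []
    for entry in ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_google_read_aloud(ip, user_agent, networks):
    """Return True only if both the user agent and the IP address match."""
    if user_agent not in READ_ALOUD_USER_AGENTS:
        return False
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IPv4/IPv6 address.
        return False
    # An IPv4 address is never contained in an IPv6 network, and vice versa.
    return any(address in network for network in networks)
```

In practice, the list published by Google would replace SAMPLE_RANGES and be refreshed periodically, since the ranges can change.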

Disclaimer:

  1. Relying solely on the user agent is not a safe way to authenticate Google Read Aloud, as this attribute can easily be forged by an attacker.
  2. Check the IP address against the list provided by Google rather than only confirming that it belongs to Google. This ensures that the request comes from the Read Aloud service and not from an attacker who rented a Google Cloud Platform (GCP) machine.

How can I prevent Google Read Aloud from accessing my website?

Google Read Aloud is not considered a crawler since it is triggered by user requests. Thus, it does not respect the rules in your robots.txt file. To opt out of Google Read Aloud and prevent it from using the content of your website, you need to use the nopagereadaloud meta tag:

<meta name="google" content="nopagereadaloud">
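
To confirm that a page actually serves this tag, a small check can be written with Python's standard html.parser module. This is a sketch only: the class and function names are hypothetical, and accepting a comma-separated content value is an assumption rather than documented behaviour.

```python
from html.parser import HTMLParser


class ReadAloudOptOutFinder(HTMLParser):
    """Looks for <meta name="google" content="nopagereadaloud">."""

    def __init__(self):
        super().__init__()
        self.opted_out = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "google":
            return
        # Assumption: several directives may be comma-separated in one tag.
        directives = [
            part.strip().lower()
            for part in attributes.get("content", "").split(",")
        ]
        if "nopagereadaloud" in directives:
            self.opted_out = True


def has_read_aloud_opt_out(html):
    """Return True if the HTML contains the nopagereadaloud meta tag."""
    finder = ReadAloudOptOutFinder()
    finder.feed(html)
    return finder.opted_out
```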

How can I prevent Google Read Aloud from accessing paid content on my website?

To prevent Google Read Aloud from accessing paid or paywalled content, you should use structured data for subscription and paywalled content. The structured data JSON objects must have the isAccessibleForFree property set to False.
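
As an illustration, the snippet below builds such a JSON object with Python's json module. It is a sketch under assumptions: the NewsArticle type, the headline, and the .paywall CSS selector are placeholders, the hasPart / WebPageElement layout is one common way to mark the paywalled section, and Python's False is serialised as the JSON boolean false.

```python
import json


def paywalled_article_json_ld(headline, paywall_selector):
    """Build JSON-LD that marks an article and its paywalled part as not free."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",  # placeholder type for this example
        "headline": headline,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # CSS selector of the element that wraps the paywalled content.
            "cssSelector": paywall_selector,
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical headline and selector, for illustration only.
print(paywalled_article_json_ld("Example headline", ".paywall"))
```

The resulting JSON is normally embedded in the page inside a script element of type application/ld+json.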