What is the Facebook external hit user agent?

You may observe requests with a user agent containing the facebookexternalhit substring in your logs and wonder if they are all linked to Facebook/Meta. These requests don’t always originate from Facebook. They may also come from the iMessage link preview feature or from an attacker spoofing the user agent. In this article, we provide more information to distinguish between these situations.

Is the Facebook external hit substring always linked to Facebook/Meta?

No. Not all requests whose user agent contains the facebookexternalhit substring are linked to Meta. Only requests whose user agent matches one of the following strings and whose IP address belongs to AS32934 (Facebook, Inc.) come from Facebook/Meta:
  • facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
  • facebookexternalhit/1.1
  • facebookcatalog/1.0
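
As a rough illustration of the user-agent half of this check, here is a minimal Python sketch. The function name is_documented_facebook_ua is hypothetical, and treating "match" as an exact string comparison (after trimming whitespace) is an assumption made for this example. Passing this check alone proves nothing, since the IP address must also belong to AS32934.

```python
# User agents listed in this article as used by the Facebook/Meta crawler.
DOCUMENTED_FACEBOOK_UAS = frozenset({
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "facebookexternalhit/1.1",
    "facebookcatalog/1.0",
})


def is_documented_facebook_ua(user_agent: str) -> bool:
    """Return True if the header exactly matches one of the listed strings.

    Assumption: an exact comparison after stripping surrounding whitespace.
    This does NOT authenticate the request; the header is trivially spoofed.
    """
    return user_agent.strip() in DOCUMENTED_FACEBOOK_UAS
```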

Why does Facebook/Meta make requests with facebookexternalhit to my website?

Facebookexternalhit is the Facebook crawler. It retrieves information about websites or applications that are shared on Facebook. For example, when you paste a link into Messenger or Facebook, the crawler makes a request with the following user agent: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

The request comes from a Facebook IP address. In my experiment it was 31.13.127.2, which belongs to AS32934 (Facebook, Inc.).

Facebookexternalhit is also linked to the iMessage (iPhone message) link preview feature

In your logs you may also see requests with a user agent that looks as follows: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0

User agents that contain both the facebookexternalhit and Twitterbot substrings are linked to the Apple iMessage application. Whenever you receive a link in a conversation, iMessage triggers a request with the previous user agent to retrieve information such as the title, a short description, and the favicon of the site.

Contrary to requests made by the Facebook crawler, these requests come from the end-user IP address. Thus, the IP addresses you observe in the logs will be linked to different (mobile) ISPs such as AT&T, Verizon and Comcast but are not linked to Facebook or Twitter (X).
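
The rule described in this section (both substrings present in the same user agent) can be sketched in a few lines of Python. The function name looks_like_imessage_preview is hypothetical, and the case-insensitive comparison is an assumption of this sketch. Like any user-agent test, it only tells you what the client claims to be.

```python
def looks_like_imessage_preview(user_agent: str) -> bool:
    """Return True if the user agent contains both substrings that this
    article associates with the iMessage link preview feature.

    Assumption: the comparison is case-insensitive.
    """
    ua = user_agent.lower()
    return "facebookexternalhit" in ua and "twitterbot" in ua
```

A request flagged by this function should not be expected to come from a Facebook IP address, since iMessage previews are fetched from the end user's own connection.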

How can I verify if a facebook external hit request comes from Facebook?

Facebook provides a procedure to authenticate its crawlers. As always, you should never rely solely on the user-agent to authenticate a good bot as this HTTP header can be easily spoofed by an attacker. Thus, you should:
  1. Verify that the request user agent matches one of the following: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php), facebookexternalhit/1.1 or facebookcatalog/1.0
  2. And verify that the request originates from AS32934 (Facebook, Inc.). To do that, you can either use the whois command below, which returns the list of IP ranges linked to this AS, or you can use IP-related APIs such as IP Info.

The whois command to retrieve AS32934 (Facebook, Inc.) IP ranges: whois -h whois.radb.net -- '-i origin AS32934' | grep ^route

It returns the IP ranges (CIDRs) linked to AS32934:

route:          31.13.24.0/21
route:          31.13.64.0/18
route:          31.13.64.0/19
route:          31.13.64.0/24
...
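
To show how this output could be used, here is a minimal Python sketch that parses the route lines and tests whether an IP address falls inside one of them. The names parse_routes, ip_in_networks and SAMPLE_WHOIS_OUTPUT are hypothetical, and the sample contains only the four ranges shown above. A real check needs the full, regularly refreshed output of the whois command.

```python
import ipaddress

# Excerpt of the whois output shown above (incomplete on purpose).
SAMPLE_WHOIS_OUTPUT = """\
route:          31.13.24.0/21
route:          31.13.64.0/18
route:          31.13.64.0/19
route:          31.13.64.0/24
"""


def parse_routes(whois_output: str) -> list:
    """Turn 'route:' / 'route6:' lines into ip_network objects."""
    networks = []
    for line in whois_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("route", "route6"):
            # strict=False tolerates entries with host bits set.
            networks.append(ipaddress.ip_network(value.strip(), strict=False))
    return networks


def ip_in_networks(ip: str, networks: list) -> bool:
    """Return True if the address belongs to at least one network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    facebook_networks = parse_routes(SAMPLE_WHOIS_OUTPUT)
    # 31.13.127.2 is the address observed in the experiment described earlier.
    print(ip_in_networks("31.13.127.2", facebook_networks))
```

A request should only be treated as coming from the Facebook crawler when both the user-agent check and this IP check pass.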

I'm seeing spikes of traffic coming from facebookexternalhit, is it normal?

The first step is to verify whether the requests actually come from the Facebook crawler (see the previous section) or from a malicious bot. Reminder: you should never rely only on the user agent to authenticate a good bot.

If the requests come from Facebook's autonomous system, then the spike is not malicious. Note that it may still harm your infrastructure, for example by causing a high CPU load and an increase in latency. This is a known issue that has been discussed for years.

That's also something I experience on all of my websites, including small websites with no particular information worth scraping. In the context of my work, we also observe many websites from all fields (e-commerce, insurance, classifieds, travel, finance, etc.) facing a similar situation. I have already contacted Facebook about the traffic spikes caused by their crawlers but never received a response.

You may have read this DataDome article about the Facebook crawler that was abused by attackers to scrape content. However, the spikes observed today are not linked to this exploit since it has been fixed by Facebook.

Conclusion

Requests with a user agent that contains the facebookexternalhit substring are either linked to the Facebook crawler or the iMessage link preview feature. The requests that come from iMessage look as follows: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0. They contain both facebookexternalhit and Twitterbot in their user agent.

To verify whether a request that contains facebookexternalhit actually comes from Facebook, you should not rely solely on the user agent. You should also verify that the IP address is linked to AS32934 (Facebook, Inc.).
