Everything you want to know about the user agent HTTP header

Origin of the user agent header

The user-agent string was designed as a way for clients (web browsers, for example) to introduce themselves to servers. It enabled servers to deliver content optimized for different browsers or to understand the capabilities of the client requesting the information. Initially, this might have been as simple as distinguishing between the first web browsers like Mosaic and the early versions of Netscape.

Variability across browsers and operating systems

Today, the diversity of user agents reflects the vast ecosystem of devices, operating systems, and browsers. For example, a user agent string from Chrome running on Windows 10 differs significantly from one sent by Safari on an iPhone. The user agent string generally includes the browser name, its version, and information about the operating system, but the format and level of detail vary widely.

Consider these examples:

  • Chrome on Windows 10: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
  • Safari on an iPhone (iOS): Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1
Analyzing these strings provides insights into the device type (desktop, mobile), operating system (Windows, iOS), and browser (Chrome, Safari), among other details. However, the absence of a strict standard for user agent strings means they can be quite varied and, at times, misleading and difficult to parse. That's one of the reasons why client hints HTTP headers such as sec-ch-ua-platform were recently introduced in browsers. For example, the sec-ch-ua-platform header provides the user's platform (e.g., macOS) without having to parse the user agent.
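To make the difference concrete, here is a minimal Python sketch that reads the platform from sec-ch-ua-platform when it is present and falls back to pattern matching on the user agent otherwise. The function name detect_platform and the short pattern table are illustrative assumptions, not a complete parser; real user agents vary far more than this handles.

```python
import re

# Ordered (pattern, platform) pairs for the fallback path. This table is a
# deliberately small assumption. Order matters: iPhone user agents also
# contain "Mac OS X", and Android user agents also contain "Linux".
_UA_PLATFORM_PATTERNS = [
    (re.compile(r"iPhone|iPad"), "iOS"),
    (re.compile(r"Android"), "Android"),
    (re.compile(r"Windows NT"), "Windows"),
    (re.compile(r"Mac OS X"), "macOS"),
    (re.compile(r"Linux"), "Linux"),
]


def detect_platform(headers):
    """Return a platform name from a dict of request headers, or "Unknown"."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Preferred path: the client hint is a quoted string such as "macOS".
    hint = normalized.get("sec-ch-ua-platform", "").strip().strip('"')
    if hint:
        return hint

    # Fallback path: best-effort matching on the raw user agent string.
    user_agent = normalized.get("user-agent", "")
    for pattern, platform in _UA_PLATFORM_PATTERNS:
        if pattern.search(user_agent):
            return platform
    return "Unknown"
```

Keep in mind that both values are sent by the client, so this is only suitable for non-critical uses such as analytics or content adaptation.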

Limits of using the user agent for authentication and (good) bot detection

The user agent header can also be used to identify bot traffic. For example, it can be used to detect bots that declare their presence in their user agent, such as Googlebot: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

However, you should always keep in mind that the user agent, like any data sent by the client, can be modified. When the user agent has been intentionally modified, we talk about user agent spoofing.
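To illustrate how little effort spoofing takes, the following Python sketch builds (without sending) a request that claims to be Googlebot, using only the standard library. The function name build_spoofed_request and the https://example.com/ URL are placeholders chosen for this example.

```python
from urllib.request import Request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_spoofed_request(url, user_agent=GOOGLEBOT_UA):
    """Build, but do not send, a request that claims an arbitrary user agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_spoofed_request("https://example.com/")
# urllib stores header names in capitalized form, hence "User-agent".
print(request.get_header("User-agent"))
```

Any HTTP client, script, or browser extension can do the same, which is why a server cannot treat the declared user agent as proof of identity.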

Thus, while relying on the user agent is acceptable for non-critical use cases such as analytics or generic browser detection, it is not secure enough to authenticate and allow good bot traffic or partners. To do so securely, you should rely on information that is harder to forge, such as:
  • A secret authentication token;
  • The IP address (or IP range) of the good bot;
  • A reverse DNS lookup.

Google provides different methods you can use to authenticate and identify Googlebot traffic safely.
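The checks listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not a production implementation: the function names are hypothetical, the trusted host name suffixes should be confirmed against Google's current documentation, and the forward lookup used here only returns IPv4 addresses. The reverse DNS check is forward-confirmed: the host name returned for the IP must itself resolve back to that IP.

```python
import hmac
import ipaddress
import socket

# Assumption: host name suffixes accepted for Googlebot. Confirm the current
# list in Google's documentation before relying on it.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def token_matches(presented, expected):
    """Compare a presented secret token to the expected one in constant time."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def ip_in_ranges(ip, cidr_ranges):
    """Return True if ip belongs to one of the allowed CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def _reverse_lookup(ip):
    # Reverse DNS: IP address -> primary host name.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    # Forward DNS: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(ip, reverse_lookup=_reverse_lookup,
                          forward_lookup=_forward_lookup):
    """Return True only if reverse and forward DNS both confirm the IP."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False
        # Forward-confirm: the host name must resolve back to the same IP,
        # otherwise anyone controlling their own reverse DNS zone could lie.
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses, so any
        # lookup failure is treated as "not verified".
        return False
```

The lookup functions are passed as parameters so the logic can be tested without network access. In a real deployment you would likely cache the results, since DNS lookups on every request are slow.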

Human users can also modify their user agent

Legitimate human users can also alter their user agent strings, using browser extensions such as User-Agent Switcher. This capability is often used for testing purposes or to access content restricted to certain devices or browsers. It underscores the point that user agent strings, while useful for understanding user behavior in aggregate, should not be trusted for authentication or access control.

Conclusion

The user agent HTTP header remains a valuable tool for web developers and security professionals. It offers insights into the devices and browsers accessing web content, enabling optimized experiences and the detection of patterns that may indicate bot activity. However, its utility is limited by the ease with which it can be spoofed. When it comes to security, especially in distinguishing between human traffic, good bot traffic, and potentially harmful traffic, it's vital to employ more reliable methods than user agent string analysis alone.

If you want to leverage the user agent for feature detection, you should also check whether the client hints HTTP headers are a better fit for your use case. While these headers may not be available on outdated browsers, they expose structured information about the OS and the browser on recent browsers, all without having to parse the user agent.