Forget IPs: using cryptography to verify bot and agent traffic

With the rise of traffic from AI agents, what’s considered a bot is no longer clear-cut. There are some clearly malicious bots, like ones that DoS your site or do credential stuffing, and ones that most site owners do want to interact with their site, like the bot that indexes your site for a search engine, or ones that fetch RSS feeds.

Historically, Cloudflare has relied on two main signals to verify legitimate web crawlers from other types of automated traffic: user agent headers and IP addresses. The User-Agent header allows bot developers to identify themselves, i.e. MyBotCrawler/1.1. However, user agent headers alone are easily spoofed and are therefore insufficient for reliable identification. To address this, user agent checks are often supplemented with IP address validation, the inspection of published IP address ranges to confirm a crawler's authenticity. However, the logic around IP address ranges representing a product or group of users is brittle – connections from the crawling service might be shared by multiple users, such as in the case of privacy proxies and VPNs, and these ranges, often maintained by cloud providers, change over time.

Cloudflare will always try to block malicious bots, but we think our role here is to also provide an affirmative mechanism to authenticate desirable bot traffic. By using well-established cryptography techniques, we’re proposing a better mechanism for legitimate agents and bots to declare who they are, and provide a clearer signal for site owners to decide what traffic to permit.

Today, we’re introducing two proposals – HTTP message signatures and request mTLS – for friendly bots to authenticate themselves, and for customer origins to identify them. In this blog post, we’ll share how these authentication mechanisms work, how we implemented them, and how you can participate in our closed beta.

Existing bot verification mechanisms are broken

        </a>
      </div>
      <p>Historically, if you’ve worked on ChatGPT, Claude, Gemini, or any other agent, you’ve had several options to identify your HTTP traffic to other services: </p><ol><li><p>You define a <a href="https://www.rfc-editor.org/rfc/rfc9110#name-user-agent"><u>user agent</u></a>, an HTTP header described in <a href="https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent"><u>RFC 9110</u></a>. The problem here is that this header is easily spoofable and there’s not a clear way for agents to identify themselves as semi-automated browsers — agents often use the Chrome user agent for this very reason, which is discouraged. The RFC <a href="https://www.rfc-editor.org/rfc/rfc9110.html#section-10.1.5-9"><u>states</u></a>:

“If a user agent masquerades as a different user agent, recipients can assume that the user intentionally desires to see responses tailored for that identified user agent, even if they might not work as well for the actual user agent being used.”

You publish your IP address range(s). This has limitations because the same IP address might be shared by multiple users or multiple services within the same company, or even by multiple companies when hosting infrastructure is shared (like Cloudflare Workers, for example). In addition, IP addresses are prone to change as underlying infrastructure changes, leading services to use ad-hoc sharing mechanisms like CIDR lists.

You go to every website and share a secret, like a Bearer token. This is impractical at scale because it requires developers to maintain separate tokens for each website their bot will visit.

We can do better! Instead of these arduous methods, we’re proposing that developers of bots and agents cryptographically sign requests originating from their service. When protecting origins, reverse proxies such as Cloudflare can then validate those signatures to confidently identify the request source on behalf of site owners, allowing them to take action as they see fit.

A typical system has three actors:

User: the entity that wants to perform some actions on the web. This may be a human, an automated program, or anything taking action to retrieve information from the web.
Agent: an orchestrated browser or software program. For example, Chrome on your computer, or OpenAI’s Operator with ChatGPT. Agents can interact with the web according to web standards (HTML rendering, JavaScript, subrequests, etc.).
Origin: the website hosting a resource. The user wants to access it through the browser. This is Cloudflare when your website is using our services, and it’s your own server(s) when exposed directly to the Internet.

In the next section, we’ll dive into HTTP Message Signatures and request mTLS, two mechanisms a browser agent may implement to sign outgoing requests, with different levels of ease for an origin to adopt.

Introducing HTTP Message Signatures

        </a>
      </div>
      <p><a href="https://www.rfc-editor.org/rfc/rfc9421.html"><u>HTTP Message Signatures</u></a> is a standard that defines the cryptographic authentication of a request sender. It’s essentially a cryptographically sound way to say, “hey, it’s me!”. It’s not the only way that developers can sign requests from their infrastructure — for example, AWS has used <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html"><u>Signature v4</u></a>, and Stripe has a framework for <a href="https://docs.stripe.com/webhooks#verify-webhook-signatures-with-official-libraries"><u>authenticating webhooks</u></a> — but Message Signatures is a published standard, and the cleanest, most developer-friendly way to sign requests.  </p><p>We’re working closely with the wider industry to support these standards-based approaches. For example, OpenAI has started to sign their requests. In their own words:   </p><blockquote><p><i>"Ensuring the authenticity of Operator traffic is paramount. With HTTP Message Signatures (</i><a href="https://www.rfc-editor.org/rfc/rfc9421.html"><i><u>RFC 9421</u></i></a><i>), OpenAI signs all Operator requests so site owners can verify they genuinely originate from Operator and haven’t been tampered with” </i>– Eugenio, Engineer, OpenAI</p></blockquote><p>Without further delay, let’s dive in how HTTP Messages Signatures work to identify bot traffic.</p>
      <div>
        <h3>Scoping standards to bot authentication</h3>
        <a href="#scoping-standards-to-bot-authentication">
          
        </a>
      </div>
    <p>Generating a message signature works like this: before sending a request, the agent signs the target origin with a public key. When fetching <code>https://example.com/path/to/resource</code>, it signs <code>example.com</code>. This public key is known to the origin, either because the agent is well known, because it has previously registered, or any other method. Then, the agent writes a <b>Signature-Input</b> header with the following parameters:</p><ol><li><p>A validity window (<code>created</code> and <code>expires</code> timestamps)</p></li><li><p>A Key ID that uniquely identifies the key used in the signature. This is a <a href="https://www.rfc-editor.org/rfc/rfc7638.html"><u>JSON Web Key Thumbprint</u></a>.  </p></li><li><p>A tag that shows websites the signature’s purpose and validation method, i.e. <code>web-bot-auth</code> for bot authentication.</p></li></ol><p>In addition, the <code>Signature-Agent</code> header indicates where the origin can find the public keys the agent used when signing the request, such as in a directory hosted by <code>signer.example.com</code>. This header is part of the signed content as well.</p><p>Here’s an example:</p>
        <pre><code>GET /path/to/resource HTTP/1.1

Host: www.example.com User-Agent: Mozilla/5.0 Chrome/113.0.0 MyBotCrawler/1.1 Signature-Agent: signer.example.com Signature-Input: sig=("@authority" “signature-agent”);
created=1700000000;
expires=1700011111;
keyid=“ba3e64==”;
tag=“web-bot-auth” Signature: sig=abc==

For those building bots, we propose signing the authority of the target URI, i.e. crawler.search.google.com for Google Search, operator.openai.com for OpenAI Operator, workers.dev for Cloudflare Workers, and a way to retrieve the bot public key in the form of signature-agent, if present.

The User-Agent from the example above indicates that the software making the request is Chrome, because it is an agent that uses an orchestrated Chrome to browse the web. You should note that MyBotCrawler/1.1 is still present. The User-Agent header can actually contain multiple products, in decreasing order of importance. If our agent is making requests via Chrome, that’s the most important product and therefore comes first.

At Internet-level scale, these signatures may add a notable amount of overhead to request processing. However, with the right cryptographic suite, and compared to the cost of existing bot mitigation, both technical and social, this seems to be a straightforward tradeoff. This is a metric we will monitor closely, and report on as adoption grows.

Generating request signatures

        </a>
      </div>
    <p>We’re making several examples for generating Message Signatures for bots and agents <a href="https://github.com/cloudflareresearch/web-bot-auth/"><u>available on Github</u></a> (though we encourage other implementations!), all of which are standards-compliant, to maximize interoperability. </p><p>Imagine you’re building an agent using a managed Chromium browser, and want to sign all outgoing requests. To achieve this, the <a href="https://github.com/w3c/webextensions"><u>webextensions standard</u></a> provides <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/onBeforeSendHeaders"><u>chrome.webRequest.onBeforeSendHeaders</u></a>, where you can modify HTTP headers before they are sent by the browser. The event is <a href="https://developer.chrome.com/docs/extensions/reference/api/webRequest#life_cycle_of_requests"><u>triggered</u></a> before sending any HTTP data, and when headers are available.</p><p>Here’s what that code would look like: </p>
        <pre><code>chrome.webRequest.onBeforeSendHeaders.addListener(

function (details) { // Signature and header assignment logic goes here // <CODE> }, { urls: ["<all_urls>"] }, [“blocking”, “requestHeaders”] // requires “installation_mode”: “force_installed” );

Cloudflare provides a web-bot-auth helper package on npm that helps you generate request signatures with the correct parameters. onBeforeSendHeaders is a Chrome extension hook that needs to be implemented synchronously. To do so, we import {signatureHeadersSync} from “web-bot-auth”. Once the signature completes, both Signature and Signature-Input headers are assigned. The request flow can then continue.

const request = new URL(details.url);
const created = new Date();
const expired = new Date(created.getTime() + 300_000)
// Perform request signature
const headers = signatureHeadersSync(
request,
new Ed25519Signer(jwk),
{ created, expires }
);
// headers object now contains Signature and Signature-Input headers that can be used

This extension code is available on GitHub, alongside a debugging server, deployed at https://http-message-signatures-example.research.cloudflare.com.

Validating request signatures

        </a>
      </div>
    <p>Using our <a href="https://http-message-signatures-example.research.cloudflare.com"><u>debug server</u></a>, we can now inspect and validate our request signatures from the perspective of the website we’d be visiting. We should now see the Signature and Signature-Input headers:  </p>
      <figure>
      <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18P5OyGxu2fU0Dpyv70Gjz/d82d62355524ad1914deb41b601bcad2/image3.png" />
      </figure><p><sup><i>In this example, the homepage of the debugging server validates the signature from the RFC 9421 Ed25519 verifying key, which the extension uses for signing.</i></sup></p><p>The above demo and code walkthrough has been fully written in TypeScript: the verification website is on Cloudflare Workers, and the client is a Chrome browser extension. We are cognisant that this does not suit all clients and servers on the web. To demonstrate the proposal works in more environments, we have also implemented bot signature validation in Go with a <a href="https://github.com/cloudflareresearch/web-bot-auth/tree/main/examples/caddy-plugin"><u>plugin</u></a> for <a href="https://caddyserver.com/"><u>Caddy server</u></a>.</p>
      <div>
        <h2>Experimentation with request mTLS</h2>
        <a href="#experimentation-with-request-mtls">
          
        </a>
      </div>
      <p>HTTP is not the only way to convey signatures. For instance, one mechanism that has been used in the past to authenticate automated traffic against secured endpoints is <a href="https://www.cloudflare.com/learning/access-management/what-is-mutual-tls/"><u>mTLS</u></a>, the “mutual” presentation of TLS certificates. As described in our <a href="https://www.cloudflare.com/learning/access-management/what-is-mutual-tls/"><u>knowledge base</u></a>:</p><blockquote><p><i>Mutual TLS, or mTLS for short, is a method for</i><a href="https://www.cloudflare.com/learning/access-management/what-is-mutual-authentication/"><i> </i><i><u>mutual authentication</u></i></a><i>. mTLS ensures that the parties at each end of a network connection are who they claim to be by verifying that they both have the correct private</i><a href="https://www.cloudflare.com/learning/ssl/what-is-a-cryptographic-key/"><i> </i><i><u>key</u></i></a><i>. The information within their respective</i><a href="https://www.cloudflare.com/learning/ssl/what-is-an-ssl-certificate/"><i> </i><i><u>TLS certificates</u></i></a><i> provides additional verification.</i></p></blockquote><p>While mTLS seems like a good fit for bot authentication on the web, it has limitations. If a user is asked for authentication via the mTLS protocol but does not have a certificate to provide, they would get an inscrutable and unskippable error. Origin sites need a way to conditionally signal to clients that they accept or require mTLS authentication, so that only mTLS-enabled clients use it.</p>
      <div>
        <h3>A TLS flag for bot authentication</h3>
        <a href="#a-tls-flag-for-bot-authentication">
          
        </a>
      </div>
    <p>TLS flags are an efficient way to describe whether a feature, like mTLS, is supported by origin sites. Within the IETF, we have proposed a new TLS flag called <a href="https://datatracker.ietf.org/doc/draft-jhoyla-req-mtls-flag/"><code><u>req mTLS</u></code></a> to be sent by the client during the establishment of a connection that signals support for authentication via a client certificate. </p><p>This proposal leverages the <a href="https://www.ietf.org/archive/id/draft-ietf-tls-tlsflags-14.html"><u>tls-flags</u></a> proposal under discussion in the IETF. The TLS Flags draft allows clients and servers to send an array of one bit flags to each other, rather than creating a new extension (with its associated overhead) for each piece of information they want to share. This is one of the first uses of this extension, and we hope that by using it here we can help drive adoption.</p><p>When a client sends the <a href="https://datatracker.ietf.org/doc/draft-jhoyla-req-mtls-flag/"><code><u>req mTLS</u></code></a> flag to the server, they signal to the server that they are able to respond with a certificate if requested. The server can then safely request a certificate without risk of blocking ordinary user traffic, because ordinary users will never set this flag. </p><p>Let’s take a look at what an example of such a req mTLS would look like in <a href="https://www.wireshark.org/"><u>Wireshark</u></a>, a network protocol analyser. You can follow along in the packet capture <a href="https://github.com/cloudflareresearch/req-mtls/tree/main/assets/demonstration-capture.pcapng"><u>here</u></a>.</p>
        <pre><code>Extension: req mTLS (len=12)
Type: req mTLS (65025)
Length: 12
Data: 0b0000000000000000000001</code></pre>
        <p>The extension number is 65025, or 0xfe01. This corresponds to an unassigned block of <a href="https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#tls-extensiontype-values-1"><u>TLS extensions</u></a> that can be used to experiment with TLS Flags. Once the standard is adopted and published by the IETF, the number would be fixed. To use the <code>req mTLS</code> flag the client needs to set the 80<sup>th</sup> bit to true, so with our block length of 12 bytes, it should  contain the data 0b0000000000000000000001, which is the case here. The server then responds with a certificate request, and the request follows its course.</p>
      <div>
        <h3>Request mTLS in action</h3>
        <a href="#request-mtls-in-action">
          
        </a>
      </div>
    <p><i>Code for this section is available in GitHub under </i><a href="https://github.com/cloudflareresearch/req-mtls"><i><u>cloudflareresearch/req-mtls</u></i></a></p><p>Because mutual TLS is widely supported in TLS libraries already, the parts we need to introduce to the client and server are:</p><ol><li><p>Sending/parsing of TLS-flags</p></li><li><p>Specific support for the <code>req mTLS</code> flag</p></li></ol><p>To the best of our knowledge, there is no public implementation of either scheme. Using it for bot authentication may provide a motivation to do so.</p><p>Using <a href="https://github.com/cloudflare/go"><u>our experimental fork of Go</u></a>, a TLS client could support req mTLS as follows:</p>
        <pre><code>config := &amp;tls.Config{
	TLSFlagsSupported:  []tls.TLSFlag{0x50},
	RootCAs:       	rootPool,
	Certificates:  	certs,
	NextProtos:    	[]string{"h2"},

} trans := http.Transport{TLSClientConfig: config, ForceAttemptHTTP2: true}

This example library allows you to configure Go to send req mTLS 0xfe01 bytes in the TLS Flags extension. If you’d like to test your implementation out, you can prompt your client for certificates against req-mtls.research.cloudflare.com using the Cloudflare Research client cloudflareresearch/req-mtls. For clients, once they set the TLS Flags associated with req mTLS, they are done. The code section taking care of normal mTLS will take over at that point, with no need to implement something new.

Two approaches, one goal

        </a>
      </div>
      <p>We believe that developers of agents and bots should have a public, standard way to authenticate themselves to CDNs and website hosting platforms, regardless of the technology they use or provider they choose. At a high level, both HTTP Message Signatures and request mTLS achieve a similar goal: they allow the owner of a service to authentically identify themselves to a website. That’s why we’re participating in the standardizing effort for both of these protocols at the IETF, where many other authentication mechanisms we’ve discussed here — from TLS to OAuth Bearer tokens –— been developed by diverse sets of stakeholders and standardized as RFCs.   </p><p>Evaluating both proposals against each other, we’re prioritizing <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>HTTP Message Signatures for Bots</u></a> because it relies on the previously adopted <a href="https://datatracker.ietf.org/doc/html/rfc9421"><u>RFC 9421</u></a> with several <a href="https://httpsig.org/"><u>reference implementations</u></a>, and works at the HTTP layer, making adoption simpler. <a href="https://datatracker.ietf.org/doc/draft-jhoyla-req-mtls-flag/"><u>request mTLS</u></a> may be a better fit for site owners with concerns about the additional bandwidth, but <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-tlsflags"><u>TLS Flags</u></a> has fewer implementations, is still waiting for IETF adoption, and upgrading the TLS stack has proven to be more challenging than with HTTP. Both approaches share similar discovery and key management concerns, as highlighted in a <a href="https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-glossary/"><u>glossary</u></a> draft at the IETF. We’re actively exploring both options, and would love to <a href="https://www.cloudflare.com/lp/verified-bots/"><u>hear</u></a> from both site owners and bot developers about how you’re evaluating their respective tradeoffs.</p>
      <div>
        <h2>The bigger picture </h2>
        <a href="#the-bigger-picture">
          
        </a>
      </div>
      <p>In conclusion, we think request signatures and mTLS are promising mechanisms for bot owners and developers of AI agents to authenticate themselves in a tamper-proof manner, forging a path forward that doesn’t rely on ever-changing IP address ranges or spoofable headers such as <code>User-Agent</code>. This authentication can be consumed by Cloudflare when acting as a reverse proxy, or directly by site owners on their own infrastructure. This means that as a bot owner, you can now go to content creators and discuss crawling agreements, with as much granularity as the number of bots you have. You can start implementing these solutions today and test them against the research websites we’ve provided in this post.</p><p>Bot authentication also empowers site owners small and large to have more control over the traffic they allow, empowering them to continue to serve content on the public Internet while monitoring automated requests. Longer term, we will integrate these authentication mechanisms into our <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>AI Audit</u></a> and <a href="https://developers.cloudflare.com/bots/get-started/bot-management/"><u>Bot Management</u></a> products, to provide better visibility into the bots and agents that are willing to identify themselves.</p><p>Being able to solve problems for both origins and clients is key to helping build a better Internet, and we think identification of automated traffic is a step towards that. If you want us to start verifying your message signatures or client certificates, have a compelling use case you’d like us to consider, or any questions, please <a href="https://www.cloudflare.com/lp/verified-bots/"><u>reach out</u></a>.</p>