Lightweight Latency Measurement with Server-Timing

Principal Solutions Architect, Fastly

Lightning-fast, instantly loading websites don't happen by accident: you need insight into where time is being spent before you can improve it. What if your browser could show not only the timings of each request and response, but also the timings from systems inside your infrastructure? Like this:

What Is Server-Timing and How Does It Work?
The Server-Timing specification describes a way for servers to communicate performance metrics about the request-response cycle to the user-agent. It’s well-established: the first working draft was in 2015. It’s well-supported by most browsers and their developer tools.
The specification defines an HTTP response header, Server-Timing, which can store metrics. Metrics have a name and parameters, typically dur for duration in milliseconds and desc for description. Here’s an example which includes a database lookup, a template processor, and a cache read:
< Server-Timing: miss, db;dur=53, app;dur=47.2
< Server-Timing: customView, dc;desc=atl
< Server-Timing: cache;desc="Cache Read";dur=23.2
< Server-Timing: total;dur=123.4

As these headers are passed along the network to your browser, they can be visualised by the browser developer tools as above.
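To make the format concrete, here's a small hand-rolled sketch in Rust that parses a Server-Timing header value into named metrics. It's a simplification (real parsing must handle quoted strings containing commas or semicolons), and the names are my own, not from any particular crate:

```rust
/// One metric parsed from a Server-Timing header value.
#[derive(Debug, PartialEq)]
struct Metric {
    name: String,
    dur: Option<f64>,
    desc: Option<String>,
}

/// Parse a Server-Timing value like `miss, db;dur=53, app;dur=47.2`.
/// Simplified sketch: does not handle quoted strings with embedded
/// commas or semicolons.
fn parse_server_timing(value: &str) -> Vec<Metric> {
    value
        .split(',')
        .filter_map(|entry| {
            let mut parts = entry.trim().split(';');
            let name = parts.next()?.trim().to_string();
            if name.is_empty() {
                return None;
            }
            let mut metric = Metric { name, dur: None, desc: None };
            for param in parts {
                match param.trim().split_once('=') {
                    Some(("dur", v)) => metric.dur = v.trim().parse().ok(),
                    Some(("desc", v)) => {
                        metric.desc = Some(v.trim().trim_matches('"').to_string())
                    }
                    _ => {}
                }
            }
            Some(metric)
        })
        .collect()
}
```

For example, `parse_server_timing("miss, db;dur=53, app;dur=47.2")` yields three metrics: `miss` with no duration, `db` at 53.0 ms, and `app` at 47.2 ms.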
A Real-World Example: Measuring Compression Latency
Recently, I built a custom compression experiment, which should make pages smaller and therefore faster to load. But what if it produced smaller pages yet took longer to compress them? Clearly, I needed some insight.
My experiment has three main systems: an origin which serves pages, a Fastly Compute application which compresses using my custom experimental code, and a Fastly Delivery application which caches content.
We need each system to send a Server-Timing header.
Adding Server-Timing at the Origin (Rust)
The origin is a Rust application that runs on a central cloud. I created some middleware that wrapped the real response, measured the time taken, and added it as a header:
// Imports assume an axum-style framework.
use axum::{extract::Request, http::header, middleware::Next, response::Response};
use std::time::Instant;

/// Middleware that adds a Server-Timing header to track response generation time.
async fn add_server_timing(request: Request, next: Next) -> Response {
    let start = Instant::now();
    let mut response = next.run(request).await;
    let duration = start.elapsed();
    let timing_value = format!("origin;dur={:.2}", duration.as_secs_f64() * 1000.0);
    response.headers_mut().insert(
        header::HeaderName::from_static("server-timing"),
        // Safe to unwrap: the formatted value contains only visible ASCII.
        header::HeaderValue::from_str(&timing_value).unwrap(),
    );
    response
}
Great, now we have some information about how long the origin has taken to generate the page.
The custom compression Compute application is also written in Rust. I created a similar wrapper that adds the time taken. I chose to add multiple metrics together in one header rather than as separate headers:
/// Add compute timing to the Server-Timing header, preserving existing values.
fn add_server_timing(response: &mut Response, timing: &ServerTiming) {
    let compute_value = timing.to_value();
    if let Some(existing) = response.get_header_str("Server-Timing") {
        response.set_header("Server-Timing", format!("{},{}", existing, compute_value));
    } else {
        response.set_header("Server-Timing", compute_value);
    }
}
Great, now we have added information about how long the Compute application took to fetch the response and compress it.
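The `ServerTiming` helper type isn't shown above; a minimal sketch of what it might look like (the struct layout and metric name here are my assumptions, not the actual implementation):

```rust
use std::time::Instant;

/// Hypothetical helper that tracks how long the Compute app has been
/// handling the request and renders it as a Server-Timing metric value.
struct ServerTiming {
    start: Instant,
}

impl ServerTiming {
    /// Start the clock when the request arrives.
    fn new() -> Self {
        Self { start: Instant::now() }
    }

    /// Render a metric value like `compute;dur=19.21`.
    fn to_value(&self) -> String {
        let ms = self.start.elapsed().as_secs_f64() * 1000.0;
        format!("compute;dur={:.2}", ms)
    }
}
```

A `ServerTiming` would be created at the top of the request handler, and `to_value()` called just before the response is returned.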
The Delivery application is configured using the Varnish Configuration Language (VCL). Because this application caches content, when content is served from cache the historic Server-Timing headers sent by the origin and Compute application are no longer relevant, so I overwrite them. For uncached content, I append another header. In vcl_deliver():
if (fastly.ff.visits_this_service == 0) {
  if (resp.http.X-Cache ~ "HIT") {
    # Any Server-Timing header is cached and old, so overwrite
    set resp.http.Server-Timing = "delivery;dur=" + time.elapsed.msec;
  } else {
    # Communicated with origin, so append if present
    if (resp.http.Server-Timing) {
      add resp.http.Server-Timing = "delivery;dur=" + time.elapsed.msec;
    } else {
      set resp.http.Server-Timing = "delivery;dur=" + time.elapsed.msec;
    }
  }
}
We’ve now added information about how long the Delivery application took to serve the response (and, on a cache miss, fetch it from the Compute app).
End-to-End Latency Visibility in Practice
For a cache miss, we can trace the latency through all three systems:
Server-Timing: origin;dur=1.63,compute;dur=19.21
Server-Timing: delivery;dur=25.00

For a cache hit (fetching content from cache is super speedy):
Server-Timing: delivery;dur=0

Now I can navigate my website with my browser developer tools open, on the Network panel's Timing tab, and see how long the origin took to generate the page, how long the Compute application took to fetch and compress it, and how long the Delivery application took to fetch, cache, and serve it, just like in the top image.
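Since the delivery duration includes the time spent waiting on the inner systems, a rough way to account for total server-side time on a miss is simply to collect every dur value across the headers. A minimal sketch (my own helper, not part of any of the applications above):

```rust
/// Extract every `dur` value from a set of Server-Timing header values
/// and sum them. Simplified sketch: assumes no quoted strings that
/// happen to contain `dur=`.
fn total_server_time_ms(headers: &[&str]) -> f64 {
    headers
        .iter()
        .flat_map(|h| h.split(','))          // split each header into metrics
        .flat_map(|entry| entry.split(';'))  // split each metric into params
        .filter_map(|param| param.trim().strip_prefix("dur="))
        .filter_map(|v| v.parse::<f64>().ok())
        .sum()
}
```

For the cache-miss example above, `total_server_time_ms(&["origin;dur=1.63,compute;dur=19.21", "delivery;dur=25.00"])` gives 45.84 ms, though note that `delivery` overlaps the other two, so this is an upper bound on distinct work rather than a strict total.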
Beyond Duration: What Else Should You Measure?
I focused on sending a single metric per system: the duration. You might be tempted to add more information. Why didn’t I send timestamps instead, so that we could build a network waterfall? As the spec says:
Because there can be no guarantee of clock synchronization between client, server, and intermediaries, it is impossible to map a meaningful startTime onto the client's timeline. For that reason, any startTime attribution is purposely omitted from this specification.
Despite the out-of-sync caveat, some organizations do send timestamps as metrics. And why stop at durations? The user ID? The cache status? The name of the template processed? The APIs called? The protocol used? Some go as far as sending low-level network details like this metric description to the browser in a Server-Timing header:
?proto=QUIC&rtt=117661&min_rtt=112982&rtt_var=3499&sent=34239&recv=3637&lost=2406&retrans=2406&sent_bytes=40813863&recv_bytes=219344&delivery_rate=3226151&ipace=1016132&icwnd=33600&ss_exit_cwnd=1529088&ss_exit_bw=4655177&ss_exit_reason=1&cwnd=377496&unsent_bytes=0&cid=767cb529d01ecefd&ts=19348&inflight_dur=16087&x=125
The Server-Timing specification also standardizes a JavaScript interface to enable applications to collect, process, and act on these metrics to optimize application delivery. That way, a real user monitoring (RUM) JavaScript application on the page can collect the Server-Timing metrics used to generate the page and send them for processing, along with other performance metrics.
With all this additional information, the “Timing” part of the header seems a bit inaccurate, but it’s a bit late for a rename.
Another common metric is the caching behaviour of content delivery networks. This use case was extracted into a separate header with the Cache-Status RFC.
All these additional metrics might start to remind you of distributed tracing like OpenTelemetry. The Trace Context specification defines standard HTTP headers and a value format to propagate context information that enables distributed tracing scenarios.
How Server-Timing Is Used Across the Web
As Server-Timing is a public response header, I thought I would investigate how it is used, by researching the December 2025 HTTP Archive crawl. HTTP Archive periodically crawls the top sites on the web and records detailed information about fetched resources, the web platform APIs and features used, and execution traces of each page.
I found two and a half billion Server-Timing metrics. I dug into the dur values and found that most of them are very tiny, indicating measurement of very brief internal processing, like a database lookup or template processing:

The top metrics are named after companies and represent the time taken to process pages or images.
I found a number of funny metric names:
actually-load-product-groups
CHECK-IF-SPECS-EXIST
getRamProperties
getToast
grape_key
iron_gate_lock
lighthouse-operations
lion
load-product-groups-to-load
lunar-data
mid-OfflineMiddleWare
mw-hotness-cold
not_red
pet
piano
prolog
query-total-lara
retrieve-cats
SauronOverallTimer
terminator_rt
undefined
Vanilla
The top descriptions represent resource types, cache statuses, datacenters, countries and networks:
image
HIT
bur
dca
US
396982
Privacy and Security Considerations
Wait, how was I able to read all these metric names, descriptions, and values in the first place? Some of them might be considered sensitive information. Remember that the Server-Timing header is public, which means that your users and any crawlers also have access to it. As the spec says, you should be careful not to “expose potentially sensitive application and infrastructure information.”
Stop Guessing: Measure Latency Where It Actually Happens
Stop guessing where your latency is coming from. Server-Timing is a lightweight way to expose performance data across your entire stack. If you are debugging requests through complex systems, this is the tool you need.
Add Server-Timing to your origin or Compute app and start measuring real end-to-end latency today.