OpenTelemetry Part 1: Making the Edge less distant
One of the main reasons you use Fastly is that we are close to your end users, able to respond in a few milliseconds. But that can also make it feel like Fastly is "outside" your system, "in front". To feel like Fastly is truly part of your application architecture, you need to observe your whole system at the same time, in one place. OpenTelemetry is a new standard that can help.
When people first started building distributed applications for the web, they had to figure out every component of the application architecture themselves: physical servers, operating systems, server software, databases, and finally, code.
Today the serverless revolution has abstracted almost all of that away. I can write code using cloud functions or buy a database plan from a SaaS provider and be sending queries seconds later from my code without installing a thing.
As a result, you can start to run your code in more and more places, on platforms that are more and more ephemeral. If the environment that runs your code took no effort to provision, you don't care if it only lasts a few seconds. But this presents a new challenge: how do you keep track of what was executed where, for whom, and most importantly, if something goes wrong, what happened, where, and why?
Imagine you have a microservices architecture that is designed to serve a newspaper website and it looks like this:
This is a relatively simple architecture but it already has some interesting features:
Two layers of Fastly concentrate globally-distributed requests to a point close to your core infrastructure (we call this shielding).
Some requests may be processed entirely at an edge location; some might involve a call directly from the edge location to a non-Fastly service.
A gateway service running in your core cloud platform reaches out to multiple microservices.
Some of those microservices might be "opaque" vendor SaaS services, while some are your own, but some might even be fronted by or hosted by Fastly.
Multiple vendors. An unknown and unknowable number of layers. Opaque services, apparent circular references… it's pretty terrifying. But if all these components speak OpenTelemetry (OTel), it's possible to understand a complete journey through this system in a single visualization:
What's more, using OpenTelemetry not only allows you to instrument every component in your system, it also avoids locking you into any one observability vendor. The trace above is being rendered by Honeycomb, which is a really great service, but if you prefer, you could use Lightstep or New Relic, or even run your own analysis using an open source self-hosted tool like Zipkin. They can all understand the same telemetry format.
In this example you can see spans from Fastly's edge servers nesting around spans occurring in a Google App Engine instance. The benefit of the open standard is that trace data can be generated by any system, in any language, and still be understood and analyzed by the same collector. Libraries exist for most languages - in my demo, for example, I'm using the standard instrumentation for Node.js, which will pick up spans automatically from an Express app.
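To make the idea of "nesting" concrete, here is a minimal sketch of how a trace viewer might turn a flat list of spans into an indented outline. It does not use the OpenTelemetry libraries: the `Span` shape is a simplified stand-in for the real span model, and `renderTrace`, the `service` field, and the example span names and timings are all hypothetical, chosen only for illustration.

```typescript
// Simplified span record. Real OpenTelemetry spans carry more
// (attributes, events, status); these fields are enough to show nesting.
interface Span {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // absent on the root span
  name: string;
  service: string; // hypothetical label for the system that emitted the span
  startMs: number;
  endMs: number;
}

// Render the spans of one trace as an indented outline, children under parents.
function renderTrace(spans: Span[]): string[] {
  const known: Record<string, boolean> = {};
  for (const span of spans) known[span.spanId] = true;

  // A span is a root if it has no parent, or its parent was never reported.
  const isRoot = (span: Span): boolean =>
    span.parentSpanId === undefined || !known[span.parentSpanId];

  const byStart = (a: Span, b: Span): number => a.startMs - b.startMs;
  const lines: string[] = [];

  const walk = (level: Span[], indent: string): void => {
    for (const span of level.slice().sort(byStart)) {
      lines.push(`${indent}${span.service}: ${span.name} (${span.endMs - span.startMs} ms)`);
      walk(spans.filter((s) => s.parentSpanId === span.spanId), indent + "  ");
    }
  };

  walk(spans.filter(isRoot), "");
  return lines;
}

// Made-up spans: two Fastly layers wrapping an application request.
const exampleTrace: Span[] = [
  { traceId: "t1", spanId: "a", name: "edge fetch", service: "fastly-edge", startMs: 0, endMs: 120 },
  { traceId: "t1", spanId: "b", parentSpanId: "a", name: "shield fetch", service: "fastly-shield", startMs: 5, endMs: 115 },
  { traceId: "t1", spanId: "c", parentSpanId: "b", name: "GET /article", service: "app-engine", startMs: 20, endMs: 100 },
  { traceId: "t1", spanId: "d", parentSpanId: "c", name: "db query", service: "app-engine", startMs: 30, endMs: 60 },
];

const outline = renderTrace(exampleTrace);
```

The point of the sketch is that the emitting systems never need to know about each other. Each one only reports its own span together with the id of the span that called it, and whichever tool collects the spans can rebuild the hierarchy afterwards.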
OpenTelemetry and Fastly
OpenTelemetry data can be emitted from both VCL and Compute@Edge services, and both approaches are already used by large Fastly customers across many industries and countries (for example, I've recently been collaborating with Condé Nast - who publish titles like Wired, GQ and Vogue - on their telemetry). These customers have complex, colorful architectures combining on-premises technology, cloud computing, vendor PaaS, SaaS and serverless services, frontend single-page applications, and native apps!
Slowly but surely all the moving parts of these complex machines of the digital age are starting to speak the same language of OpenTelemetry, and are able to contribute to the same story of each transaction or request that flows across the systems that make up the machine.
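For every component to contribute to the same story, each hop has to pass the identity of the trace on to the next one. OpenTelemetry's default way of doing this over HTTP is the W3C Trace Context `traceparent` header, which packs a version, a trace ID, the caller's span ID, and a flags byte into one string. The sketch below parses an incoming header and builds the value to send downstream. The names `parseTraceparent` and `formatTraceparent` are hypothetical helpers rather than part of any OpenTelemetry library, and the sketch handles only version `00` of the header.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex characters, shared by every span in the trace
  parentId: string; // 16 lowercase hex characters, the caller's span id
  sampled: boolean; // lowest bit of the trace-flags byte
}

// Version 00 layout: 00-<trace-id>-<parent-id>-<trace-flags>
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for anything that is not a valid version-00 header.
function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // The specification treats all-zero ids as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Build the header for an outgoing request: same trace id, but this
// component's own span id becomes the parent id seen by the next hop.
function formatTraceparent(ctx: TraceContext, spanId: string): string {
  return `00-${ctx.traceId}-${spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Example value taken from the W3C Trace Context specification.
const incoming = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");

// "b7ad6b7169203331" stands in for a freshly generated span id.
const outgoing = incoming ? formatTraceparent(incoming, "b7ad6b7169203331") : null;
```

In practice an OpenTelemetry SDK does this propagation for you. The sketch is only meant to show how little has to be shared between otherwise unrelated systems for their spans to end up in the same trace.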
At Fastly we know that we are rarely the only platform you're using, and we don't really want to be - we're never going to be the best solution to every problem. We love OpenTelemetry because it helps Fastly shine as part of the architecture you want to build:
It keeps you from getting locked in to any vendor, including us
It empowers you to use whatever analytics and insight tooling works best for you
It makes Fastly a first-class component in your system architecture
We'll be doing follow-up posts in the next few weeks to walk through the details of extracting OpenTelemetry from VCL and Compute@Edge services, and a case study of how we're using OpenTelemetry to monitor our own Fiddle tool.