Observability: Embracing the messiness of distributed systems
In this fireside chat, Honeycomb CEO and founder Charity Majors spoke with our very own Lisa Phillips, Fastly’s VP of Data Governance. They discussed embracing messiness, enabling customers to fix their own problems, and the power of structured data.
Here are some highlights:
If you instrumented your code and adopted the observability principles, you could realistically, radically reduce alerts, because you trust yourself and your team to answer any question in a very short amount of time. You don’t have to rely on getting paged about clusters of things that mean this other thing that you can’t really watch for. You can just alert on the most common significant events, and that’s it. And I think it’s important that we have a different term for observability, because its best practices are different enough from, and diametrically opposed to, those of monitoring. Monitoring means you do not look at graphs all day. The system informs you when it’s broken. That’s great.
But with observability, you have to get comfortable with a lot more fuzziness and messiness. And it’s more like, “No, I’m making a change to the system. I know that changes introduce problems. I don’t know what problem it could cause, but I’m going to go look.” It embraces the complexity of modern distributed systems and how they’re being used. You simply can’t predict everything. We need to learn to embrace that and we need to meet that with the ability to observe.
Enabling customers to fix their problems
The only thing that’s better than giving people the right answers is giving them the power to get those answers themselves. Fastly is a really good match for us, because we both believe in empowering developers, not putting a lot of layers between them and what they want to do. It’s always better to give people the ability to ask and answer any arbitrary question than to predefine, “Here are the questions that we expect and assume that you will care about. Here is your top ten list. What about number 11?” It’s just easier and more powerful if you can give them the building blocks, the low-level primitives, so that they can ask and answer any question.
The power of structured data
The internet still runs on strings, dots, metrics, and baling wire. But as it becomes more chaotic and more complex, we need to retain a certain amount of discipline. I think of debugging like looking for a needle in a stack of burning needles. You need context. You need to know, “What is the unique ID of that needle?” Not, “Give me all the silver needles. Give me all the pointy ones.” And that’s what traditional dashboards and metrics let you do. They let you look for the big, very rough groupings of things. They don’t let you say, “Well, I want a needle that was created in 1997 that is exactly three centimeters long that has this unique ID that was built by this company.” All of these really fine-grained things.
For that, you need structured data. Structured data is really powerful. You need something that will let you explore context and that doesn’t confine you to predefined aggregates around set intervals and tags. We want to make it the easiest, best way for people to ask questions about their data to help them get their work done. Ultimately, this is what we’re trying to build.
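To make the needle analogy concrete, here is a minimal sketch in Python of what structured events allow. It is an illustration only, not Honeycomb's or Fastly's API or data model: the event fields (request_id, customer_id, endpoint, status, duration_ms) and the helpers where and count_by are hypothetical names invented for this example. The idea is that when each request is recorded as one event with many fields, you can pick out a single needle by its unique ID, or group by any field you choose at query time, rather than being limited to aggregates defined in advance.

```python
from collections import Counter

# Hypothetical structured events: one record per request.
# Every field name and value here is made up for illustration.
events = [
    {"request_id": "a1", "customer_id": "c-17", "endpoint": "/purge", "status": 200, "duration_ms": 12},
    {"request_id": "a2", "customer_id": "c-42", "endpoint": "/purge", "status": 503, "duration_ms": 950},
    {"request_id": "a3", "customer_id": "c-42", "endpoint": "/stats", "status": 200, "duration_ms": 30},
    {"request_id": "a4", "customer_id": "c-42", "endpoint": "/purge", "status": 503, "duration_ms": 1010},
]


def where(events, **conditions):
    """Return the events whose fields equal every given condition."""
    return [e for e in events if all(e.get(k) == v for k, v in conditions.items())]


def count_by(events, field):
    """Count events grouped by any field, chosen when the question is asked."""
    return Counter(e.get(field) for e in events)


# "What is the unique ID of that needle?" Look up exactly one request.
needle = where(events, request_id="a2")

# A question nobody predefined: which endpoints are failing for one customer?
failures = where(events, customer_id="c-42", status=503)
breakdown = count_by(failures, "endpoint")
```

A pre-aggregated metric such as a total count of 503 responses would report that something is failing, but it could not answer either question above, because the per-request fields are gone by the time the count is stored.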