Krystal Mejia
Senior Software Engineer, CBS
Matt Ball
Senior Video DevOps Engineer, CBS
Bryce Fisher-Fleig
Lead Video Software Engineer, CBS
The world’s appetite for streaming — of both live and on-demand content — has become more voracious than ever before. This talk delves into three interconnected projects that exemplify CBS Interactive’s brand of high-velocity innovation in the video streaming arena: Airspace, which is set to scale up automation on thousands of CDN services; Propeller, the live-streaming platform that orchestrates all CBS live events; and Visor, an in-house analytics platform that delivers visibility from initial request to final playback.
Lightning-fast streaming at scale
Hear how we enable our video and audio streaming customers to deliver pixel-perfect content to the world.
Krystal Mejia (00:08):
Hello, everyone. My name is Krystal Mejia. I'm a senior software engineer in the Video Technology Group (VTG) at Viacom CBS. I'm joined today by two of my teammates, Matt Ball and Bryce Fisher-Fleig, to talk to you about our innovative approach to distributing events in this seemingly new era of streaming. First, a little background about where we work. VTG's mission is to provide advanced technologies to all the different brands that fall under the Viacom CBS umbrella. As you can see, the corporation is made up of quite a few brands, and we work with many different combinations of them. Every brand has its own unique workflow and integrations, and we're streaming a lot of content.
Krystal Mejia (00:44):
So I'll quickly show you some examples. We've got ETLive, a 24/7 live linear channel with a lot of seek-back interactivity, which we call DVR. We also have hybrid approaches like Dabl, where capture happens on premises but encoding happens in the cloud. We even have huge events like Big Brother Live, a DRM-protected channel with three months of interactivity and multiple incoming feeds. We also stream ad hoc college sports events: in partnership with 350 schools, we provide the video infrastructure to stream hundreds of concurrent events throughout the school year. And speaking of sports, despite a global pandemic, sports do not seem to be slowing down, and more and more people are looking to watch these events from the comfort of their homes. We successfully streamed the National Women's Soccer League, as well as the UEFA Champions League, including the women's competition.
Krystal Mejia (01:34):
And of course, you can't forget the biggest sporting event of the year: the Super Bowl. The Super Bowl breaks its own streaming record every year, which makes it really hard to provision for ahead of time, because we don't know what the numbers will be. Two years ago, after we streamed Super Bowl 53, we got together as a team to figure out how to build a platform that could create all of these streaming events on demand rather than having us provision them manually.
Krystal Mejia (02:03):
That brings us to Propeller. Propeller is our vendor-agnostic live streaming platform, used to provision, manage, and create resources in the cloud. The goal of Propeller is to be agile: we want to support as many events and workflows as possible. Because we are vendor-agnostic, we can select the right encoder for the job depending on the complexity of the event. And with an API-driven approach, we're able to do all of this dynamically, with no need to manually configure and create events.
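To make "the right encoder for the job" concrete, here is a minimal sketch of requirement-driven encoder selection. It is not Propeller's actual logic: the `EventRequirements` and `EncoderProfile` types, the requirement fields, the vendor profile names, and the cost numbers are all hypothetical. The idea is simply to filter encoder profiles by what the event needs, then pick the cheapest one that qualifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRequirements:
    """Hypothetical per-event requirements a caller might send to a Propeller-like API."""
    drm: bool = False
    ad_insertion: bool = False
    captions: bool = False
    feeds: int = 1


@dataclass(frozen=True)
class EncoderProfile:
    """Hypothetical description of one encoder vendor's capabilities and relative cost."""
    name: str
    supports_drm: bool
    supports_ad_insertion: bool
    supports_captions: bool
    max_feeds: int
    relative_cost: int


def select_encoder(req: EventRequirements, profiles: list[EncoderProfile]) -> EncoderProfile:
    """Return the cheapest profile that satisfies every requirement of the event."""
    candidates = [
        p for p in profiles
        if (p.supports_drm or not req.drm)
        and (p.supports_ad_insertion or not req.ad_insertion)
        and (p.supports_captions or not req.captions)
        and p.max_feeds >= req.feeds
    ]
    if not candidates:
        raise ValueError("no encoder profile satisfies these requirements")
    return min(candidates, key=lambda p: p.relative_cost)


# Hypothetical vendor profiles; names and numbers are illustrative only.
PROFILES = [
    EncoderProfile("basic-cloud", False, False, True, 1, 1),
    EncoderProfile("premium-cloud", True, True, True, 4, 3),
    EncoderProfile("hybrid-onprem", True, False, True, 8, 2),
]
```

For example, a captioned single-feed event lands on `basic-cloud`, while a DRM event with three feeds lands on `hybrid-onprem`. Keeping vendors behind a common profile type is what lets a new vendor be added as data rather than as a new code path.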
Krystal Mejia (02:35):
Speaking of encoding complexity, configuring events is not an easy task. You have to think through all these different use cases up front. What [inaudible 00:02:43] protocols are you using? What are your downstream requirements? Do you need encryption? Is monetization needed? Is your stream accessible? All of these complexities can really bog down a platform, and of course they increase the bandwidth you need to reserve up front. As the complexity of these events increases, you're going to need more bandwidth. This is Fastly Altitude, so the question here isn't really how we create events, but how we're able to deliver them successfully. Several backend services support the Propeller platform, Airspace and Visor being the two biggest. Imagine this: we have hundreds of concurrent events running on the platform at any given time. How are we monitoring these events? How are we creating CDN configs? We can't possibly be doing this manually. So I'm going to hand it off to Matt to talk a little bit about Airspace and how it helps us with CDN provisioning.
Matt Ball (03:37):
Thanks for filling us in on Propeller, Krystal. My name's Matt Ball, and I'm a senior video DevOps engineer here on the Viacom CBS team. I work on a project called Airspace, and I want to give you a high-level overview of it now. Here's the overall architecture and where Airspace fits in relation to the other products we're talking about today. Before Super Bowl 53, my colleagues and I were tasked with creating all the workflows manually. It was very time-consuming and error-prone. Although the day of the event went really well, we discovered a few human errors. For instance, we weren't varying correctly for [inaudible 00:04:17] content, so users were getting [inaudible 00:04:20] content whether or not they supported it.
Matt Ball (04:22):
After the event, as we began to migrate most of our day-to-day workflows to Fastly, we realized we needed some sort of configuration management system. We explored Terraform but found it lacking for our needs: many vendors, including some we wanted to use in our multi-CDN workflows, didn't have modules. So we decided to build our own tool internally. We called it Airspace. Fastly had a great API that we could leverage, and so did many of the other vendors we wanted to use. That's how we decided to embark on the journey of creating Airspace.
Matt Ball (05:07):
So here's what Airspace does. In the beginning, it was primarily an ops tool that made it easier for us to create CDN configurations and keep them consistent. But we realized we also needed to expose a REST API, in particular so that Propeller could create properties on demand. So we exposed that REST API, and now Airspace works hand in glove with Propeller and Visor. Instead of clients creating tickets for us humans to create CDN configurations, they can create them programmatically, through the API or from the command line, however they want to do it. That lets teams concentrate on their own work instead of worrying about how to configure the CDN and deliver the content to the viewer. It also lets us thoroughly test any change before it hits production, using a fully functional suite of Behave tests. We've caught many errors with these tests before they reached production: the tests simply fail and prevent anything from going out.
Matt Ball (06:18):
Here are the Airspace components. There are three primary ones: the REST API, the static-source module, and the workers that churn through the queue of jobs. Creating an Airspace CDN workflow involves many jobs and many moving parts. And of course we rely on the various third-party APIs, without which we couldn't do any of this work.
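The API-plus-workers split can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration, not Airspace's code: the `Job` type, `plan_workflow`, and `run_workers` are hypothetical names. One API request is expanded into one job per CDN vendor, and a pool of workers drains the queue concurrently. Ordered steps for a single vendor (create, configure logging, activate) are assumed to run inside one job so they never race each other.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical unit of work: configure one vendor's CDN for one workflow."""
    workflow_id: str
    vendor: str


def plan_workflow(workflow_id, vendors):
    """Expand one API request into one job per CDN vendor."""
    return [Job(workflow_id, vendor) for vendor in vendors]


def run_workers(jobs, handler, num_workers=4):
    """Enqueue all jobs, then let a pool of worker threads drain the queue."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            result = handler(job)  # a real worker would call the vendor's API here
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every job is enqueued before the workers start, a worker can stop as soon as the queue is empty. A long-running service would instead block on `q.get()` and keep accepting new jobs from the REST API.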
Matt Ball (06:44):
The static-source module dynamically generates Fastly VCL based on what's been requested. For properties that involve EdgeCast or CloudFront, it creates rules engine rule sets, Lambda@Edge functions, or WAF rules. The VCL fragments we generate have been developed over the past few years since Super Bowl 53, and they've improved over time as we worked through issues and worked with Fastly on improvements. So today we're able to create single-CDN workflows, which are pretty standard, straightforward, and fine for a lot of our needs.
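Generating VCL from a request can be illustrated with a small template renderer. This is a hypothetical sketch rather than Airspace's actual fragments. It assumes a common live-video caching rule: short TTLs for HLS manifests (`.m3u8`) that change every few seconds, and long TTLs for immutable segments (`.ts`). It emits a `vcl_fetch` body using Fastly's `req.url.ext` and `beresp.ttl` variables. The `TtlRule` type and the TTL values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtlRule:
    """Hypothetical rule: cache responses whose URL has this extension for ttl_seconds."""
    extension: str
    ttl_seconds: int


def render_ttl_fragment(rules):
    """Render an if/else-if chain for vcl_fetch that sets beresp.ttl per file extension."""
    lines = []
    for i, rule in enumerate(rules):
        # Reject anything that could break out of the VCL string literal.
        if not rule.extension.isalnum():
            raise ValueError(f"unsafe extension: {rule.extension!r}")
        keyword = "if" if i == 0 else "} else if"
        lines.append(f'{keyword} (req.url.ext == "{rule.extension}") {{')
        lines.append(f"  set beresp.ttl = {rule.ttl_seconds}s;")
    if lines:
        lines.append("}")  # close the final branch
    return "\n".join(lines)
```

Rendering `[TtlRule("m3u8", 2), TtlRule("ts", 3600)]` yields a two-branch `if` / `else if` block. Keeping the rules as data and the VCL as a template is what lets a generator like this produce consistent configurations for every new property.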
Matt Ball (07:31):
For more robust delivery needs, we have multi-CDN workflows that we can also configure on demand. In this case, we use a Fastly property as both edge and Media Shield: all the other CDNs in the mix use Fastly as their origin, and Fastly in turn protects our origin. Something very important that Airspace does is maintain the logging configurations for all of our different CDNs. It keeps things consistent across the board and lets us update them quickly. If the Visor team needs to change the way the logs are delivered or the log format, we can change the configurations across the board within a matter of minutes. So Airspace has not only made it possible to create optimized video delivery CDN configurations quickly, it has made it much easier to keep them consistent and to make updates. And here to talk about what Visor does with all the logs it receives (and that's a lot of logs, from both origin and CDN edge) is Bryce Fisher-Fleig.
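The multi-CDN topology described here, with Fastly acting as both an edge and the shield in front of origin, can be expressed as a small planning function. This is a hypothetical sketch: `CdnProperty`, `plan_multi_cdn`, and the `vendor.event.domain` hostname scheme are illustrative assumptions, not Airspace's real naming. The property it demonstrates is that only the Fastly property points at the real origin, and every other CDN uses Fastly as its origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdnProperty:
    vendor: str
    hostname: str     # hostname that viewers (or other CDNs) request
    origin_host: str  # where this property fetches content from


def plan_multi_cdn(event_id, origin_host, domain, other_vendors):
    """Plan one Fastly edge-plus-shield property and N other CDNs behind it.

    The hostname scheme "<vendor>.<event_id>.<domain>" is hypothetical.
    """
    fastly = CdnProperty("fastly", f"fastly.{event_id}.{domain}", origin_host)
    others = [
        # Every non-Fastly CDN uses the Fastly property as its origin.
        CdnProperty(vendor, f"{vendor}.{event_id}.{domain}", fastly.hostname)
        for vendor in other_vendors
    ]
    return [fastly, *others]
```

Funneling all CDNs through a single shield means cache misses from three or four partners collapse into one set of requests to origin, which is the protection the talk attributes to Fastly.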
Bryce Fisher-Fleig (08:43):
Thanks, Matt. My name is Bryce Fisher-Fleig, and I'm the tech lead on the Visor project. Let's go back to the slide that shows how all these pieces fit together. First, Propeller spins up all the encoders and origins, and then reaches out to Airspace to set up CDN delivery. For up to a few hundred thousand concurrent viewers, that's really all you need, and it'll work great. You have end-to-end bit flow all the way from the studio out to your customers and viewers at the edge.
Bryce Fisher-Fleig (09:10):
Now let's talk about how Visor fits in. When Propeller creates what it calls channels, it gives each channel a unique ID, and channels are grouped into organizations. Because I asked them to do it this way, Propeller sends the channel ID and organization ID to Airspace when it requests the CDN configuration. Airspace injects those values into the logging format. Then every time someone sends a request to Fastly, the resulting log line, carrying the channel ID and organization ID, makes its way into Visor.
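Injecting the IDs into the logging format can be sketched as building a JSON log template in which the channel and organization IDs are baked in as literals, while per-request fields stay as placeholders for the CDN to fill in. The placeholders below use Fastly's `%{<VCL expression>}V` log-format syntax, but the specific fields, and the `build_log_format` helper itself, are illustrative assumptions rather than the format Airspace actually generates.

```python
import json

# Per-request fields as Fastly log-format placeholders. The choice of fields
# is a hypothetical example, not Airspace's real format.
REQUEST_FIELDS = {
    "url": "%{json.escape(req.url)}V",
    "status": "%{resp.status}V",
}


def build_log_format(channel_id, org_id):
    """Return a JSON log-line template with this channel's IDs baked in."""
    static = {"channel_id": channel_id, "org_id": org_id}
    # json.dumps safely escapes the static IDs; the placeholders contain no
    # quotes or backslashes, so they pass through unchanged.
    return json.dumps({**static, **REQUEST_FIELDS}, separators=(",", ":"))
```

Because the IDs are fixed when Airspace creates the service, every log line for that channel arrives already tagged, and the consumer only has to parse JSON to recover them.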
Bryce Fisher-Fleig (09:46):
Visor periodically scrapes AWS to find metrics in CloudWatch about the channels and the origins, and we can use those same channel IDs and org IDs to connect the dots through the whole system. Visor also collects data from player beacons and from various QoE systems. So by having information from the encoder and origin, from delivery, and from players, we're able to see everything happening end to end. Our goal is to provide that information in one minute or less, if possible.
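A one-minute freshness goal suggests rolling raw events up into per-minute buckets keyed by the shared IDs. The sketch below is a hypothetical in-memory illustration, not Visor's pipeline, which runs in a database. It counts requests and 5xx errors per `(org_id, channel_id, minute)`. The record shape (`org_id`, `channel_id`, `ts`, `status`) is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute (timezone is preserved)."""
    return ts.replace(second=0, microsecond=0)


def aggregate_per_minute(records):
    """Count requests and 5xx responses per (org_id, channel_id, minute).

    Each record is a dict with 'org_id', 'channel_id', 'ts' (datetime), and 'status' (int).
    """
    totals = defaultdict(lambda: {"requests": 0, "errors_5xx": 0})
    for r in records:
        key = (r["org_id"], r["channel_id"], minute_bucket(r["ts"]))
        totals[key]["requests"] += 1
        if 500 <= r["status"] <= 599:
            totals[key]["errors_5xx"] += 1
    return dict(totals)
```

Keying the buckets by channel and org ID, rather than by anything specific to one data source, is what lets the same rollup apply to CDN logs, origin metrics, and player beacons.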
Bryce Fisher-Fleig (10:23):
Propeller existed before Visor, so we designed Visor around Propeller from day one. There are a few special things we have to do to accommodate that. Because Propeller lets channels come and go dynamically with user interaction, we had to make sure we could support the same kind of dynamism on our side. So one of the rules we came up with, and this is the most important one, is that live streams only ever exist as data in our system. They never exist as code. That's so important because if we represented live streams as code, we'd have to do a code deploy every time there's a new live stream, and we'd never be able to keep up with the hundreds of channels Propeller has at any one time.
Bryce Fisher-Fleig (11:08):
The other thing we do is take that metadata, the channel ID and the org ID, and make sure that every fact table inside Visor uses the same identifiers. That lets us connect all the dots inside our database in a single place, so you can write one query that ties all of that information together. The really cool thing is that, because of all this dynamism we support, we don't need any out-of-band signaling. As soon as you start sending your data to Visor, we support you out of the box.
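The two rules above, that channels exist only as data and that every fact table shares the same identifiers, can be illustrated with an in-memory join. In Visor this is a single database query. The Python below is only a hypothetical sketch of the idea, and the table names and metric fields are assumptions. A channel ID that has never been seen before simply becomes a new key, with no code change or deploy.

```python
from collections import defaultdict


def join_by_channel(**fact_tables):
    """Merge rows from several fact tables that share (org_id, channel_id) keys.

    Assumes each table holds at most one pre-aggregated row per channel;
    a later row for the same key would overwrite an earlier one.
    """
    joined = defaultdict(dict)
    for table_name, rows in fact_tables.items():
        for row in rows:
            key = (row["org_id"], row["channel_id"])
            # Everything except the shared identifiers is treated as a metric.
            metrics = {k: v for k, v in row.items() if k not in ("org_id", "channel_id")}
            joined[key][table_name] = metrics
    return dict(joined)
```

No channel is registered anywhere in advance: a channel appears in the joined view the moment any source reports it, which mirrors the "no out-of-band signaling" property described in the talk.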
Bryce Fisher-Fleig (11:38):
So let's talk a little more about how Super Bowl 55, Fastly, and Visor all fit together. No one CDN is big enough to carry the entire Super Bowl, so we work with three or four different CDN partners for the event. But we need one place that stops all of that traffic from coming in and taking down the origin. As you saw on that slide, we trust Fastly to play that pivotal role of protecting our origin while also serving as an edge. It has worked out great. We're really big fans of Fastly: we love the configurability, and we have thousands and thousands of services. So it just feels natural to use Fastly in this pivotal role.
Bryce Fisher-Fleig (12:17):
When we think about the Super Bowl, we think about how to make sure that all of our partners and vendors can see as much information in context as possible. Our key stakeholder there is Fastly. We actually design our dashboards, logging formats, and visualizations around and for the Fastly team. We regularly get together with Gino and the folks in ops, build dashboards to their specifications, and really see them as our key customer there.
Bryce Fisher-Fleig (12:50):
What we're able to offer is more context on what's happening on the origin and encoding side, in the Propeller world. We also know what's happening with the other CDNs, and we have all this QoE data from the players. So we're able to provide all that rich context. Because Fastly really values its customers' privacy, it makes a deliberate choice not to collect too much information about every individual request. Collecting that per-request detail is exactly what Visor does. So we're able to take all that information and provide it back to Fastly as troubleshooting information, specifically for this event. We're really excited to provide them all this extra context.
Bryce Fisher-Fleig (13:26):
Under the hood, Visor uses a relational database. If you know anything about relational databases, you don't usually think of them as super scalable, but the database we use is a product from Azure called Azure Data Explorer. It's designed to be horizontally scalable, but we still want to do everything we can to make sure our database doesn't fall over.
Bryce Fisher-Fleig (13:46):
Step one is capacity planning. We take big events that have happened recently, such as the presidential debates (and we'll use the upcoming election as well), and look at the number of people watching the streams. We determine how much load that put on each component of our system, multiply that out to the traffic load we expect for the Super Bowl, and then load test against that. One thing we do that's really unique: because we want to be very sure nothing goes wrong, we practice having our whole database fall over and measure how long it takes us to bring it back up. And we build SLAs around that as well.
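The "multiply that out" step is a linear extrapolation from a recent event's audience to the expected Super Bowl audience. Here is a minimal sketch, assuming load scales linearly with concurrent viewers. The optional `headroom` multiplier, the component names, and all numbers are hypothetical and not taken from the talk.

```python
def scale_load(baseline_load, baseline_viewers, target_viewers, headroom=1.0):
    """Linearly extrapolate per-component load from a baseline event to a target audience.

    baseline_load maps a component name to the load it saw at baseline_viewers.
    headroom (> 1.0) adds a safety margin on top of the linear estimate.
    """
    if baseline_viewers <= 0:
        raise ValueError("baseline_viewers must be positive")
    factor = target_viewers / baseline_viewers * headroom
    return {component: load * factor for component, load in baseline_load.items()}
```

For instance, a component that handled 20,000 log events per second at a hypothetical 100,000 viewers would be load tested at 300,000 events per second for a 1,000,000-viewer target with 1.5x headroom. Linear scaling is an assumption: components with caches or fixed overheads may scale better or worse, which is exactly what the load test is there to reveal.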
Bryce Fisher-Fleig (14:23):
To be able to do all of that, we spend a lot of time talking to the experts at Azure, Fastly, and AWS, all across the board. No matter how prepared you are, anything can happen on game day. So we try to make sure we have lots of safety valves to ensure that we don't fall over, or that if we do, we can recover quickly. One of the key things we want is the ability to sample our logs: we throw away some percentage of logs so that we don't overwhelm our system. Fastly doesn't support that out of the box, but we talked to the product managers and other folks at Fastly, and they said, "Oh yeah, sure. We actually already have VCL code that can do that." So we threw that over to the Airspace team, and they added that feature for us. We also said, "Hey, we're worried about overwhelming any one place that we're sending logs to in Azure. Is there any way we can split and load balance?"
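The two safety valves discussed here, sampling log lines and splitting them across several ingestion endpoints, can both be sketched with a stable hash of a request ID. This is not the VCL that Fastly provided. It is a hypothetical Python illustration of the technique: hashing makes each decision deterministic per request, and salting the hash differently for sampling and for routing keeps the two decisions independent. The request-ID field and endpoint names are assumptions.

```python
import hashlib


def _bucket(key: str, buckets: int) -> int:
    """Map a key to a stable bucket in [0, buckets) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def should_log(request_id: str, sample_percent: int) -> bool:
    """Keep roughly sample_percent% of log lines, deterministically per request ID."""
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")
    return _bucket("sample:" + request_id, 100) < sample_percent


def pick_endpoint(request_id: str, endpoints: list[str]) -> str:
    """Spread log lines evenly across several ingestion endpoints."""
    if not endpoints:
        raise ValueError("need at least one endpoint")
    return endpoints[_bucket("route:" + request_id, len(endpoints))]
```

Turning `sample_percent` down during a traffic spike sheds load on the logging pipeline without touching delivery. Adding endpoints to the list spreads what remains so that no single ingestion point is overwhelmed.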
Bryce Fisher-Fleig (15:10):
And they said, "Yeah, I think we've got code for that too." They threw that over the fence, and we were able to put it into Airspace, so now those are Airspace features as well. By using the big budgets we get for mega events like the Super Bowl, we're able to drive innovation and add new features that support even the smallest live streams Viacom CBS does. And by breaking our work apart into these different products, we're able to go very deep and provide more value to each brand at Viacom CBS than it could get on its own. If you have any questions or want to talk about anything we've discussed in our talk, please feel free to reach out to us. Thanks again.