Senior Product Manager, Gannett
Site Reliability Engineer, Platform Engineering, Gannett
The engineering teams at USA TODAY take you behind the scenes of November 3, 2020, to relive the spikes, dips, and most-astounding stats they witnessed as the ballots rolled in and the headlines rolled out. Plus, hear how fronting USA TODAY’s origin with Fastly’s edge empowers their teams to provide accurate live results and critical news updates for the thousands of local and national elections with no downtime or customer experience impact.
Danny Sanchez (00:08):
Hello everybody, and welcome to Altitude. We're very excited to be here to talk to you about election night, and really election week. I'm Danny Sanchez, Senior Product Manager at Gannett, responsible for our election results system. I'm joined by my colleague Yanyan Ni, a Site Reliability Engineer at Gannett as well; we'll hear from her in a bit. Before we jump into elections and all that entails, I want to tell you a little bit about our organization and how we approach elections and journalism. Gannett is a news and information company spanning 250 newsrooms across 46 states, with 5,000 journalists working to bring news to those communities. Our approach is unique in that it encompasses both the national level and the hyper-local level. We have newsrooms as large as USA Today, which has a national perspective, but also newsrooms in small rural communities, mid-sized cities, and large cities. So when we talk about elections and how to cover them, we're concerned with the big picture as well as what's happening in individual communities. Next slide.
Danny Sanchez (01:16):
When we talk about election day or election night, we tend to think of it as one monolithic event. We see everything tidy in one big chart, say, a US map. But because we are 50 states with 3,200 counties and five US territories, all with their own laws and procedures, it really is a much more complicated event under the hood. The tools we've built help us collect that data efficiently and present it so that our readers get a complete picture of the election, from the presidential race down to their very local races. Next slide.
Danny Sanchez (01:53):
Before we dive into all the nitty-gritty details of elections, let's take a quick look at what we're talking about. This is a sample of our election results pages and components: US maps that give the big picture, state maps with county-level data, charts, infographics, and balance of power graphics that show control of the United States Senate and the United States House of Representatives. In addition, our system covers multiple election dates, not just the general election. For example, it also covers primaries, where members of a party compete for their party's nomination, and what you see on the right is an example of that sort of map. So we cover special elections, primaries, and all kinds of other contests with our system. Next slide.
Danny Sanchez (02:39):
So let's take a closer look at our key component, the homepage embed that ran across the sites of USA Today and 300 of our newsrooms. This module gives a quick glance at the top narratives of the general election: what's happening in the presidential race, a few key states we want readers to pay attention to, and control of the Senate and the House. Next slide.
Danny Sanchez (03:06):
We also have special modules that require special publishing actions to work. These two allow us to highlight particular key races for our readers. At the top, you see our presidential states to watch. This module guides readers, on election night and throughout the election, to the states that are most in contention, such as swing states, or states that haven't yet been called by the Associated Press, or the AP as we know it, which is our provider for election results.
Danny Sanchez (03:35):
What you see below is our local races to watch. This special module allowed each of the 300 newsrooms I mentioned to select the AP races they care about in their community. We all talk about the presidential race, but at the local level there may be a governor's race, a controversial ballot initiative, or even small state house races that are of just as much interest to local readers as the presidential race. Each of these modules could be updated in real time, with the selections managed through a Google Sheets tool we developed. That meant making sure we had timely results and also that our editors' decisions were reflected immediately across all our templates. Next slide.
Danny Sanchez (04:15):
In addition to the data itself, we also created a mechanism for our editors and reporters to add written context and hyperlinks to any individual race or any landing page. As you may know, this was a pretty confusing election, with a lot going on around mail-in balloting and the pandemic. This allowed us to give readers extra context: explaining unusual rules in places that hold runoffs, for instance, or what a ballot initiative actually is. Next slide.
Danny Sanchez (04:47):
Lastly, this module is our in-story embed. In addition to those landing pages, we produce a lot of journalism about these races, and this module allowed us to embed live results within any news story across the network for any AP-covered race, including state house and down-ballot races. Our editors created 1,100 instances of these modules to embed in their stories, so they could provide live results right there amid their journalism. This also worked across native apps, Google Accelerated Mobile Pages, and Facebook Instant Articles. Next slide.
Danny Sanchez (05:26):
So why show you all these modules? Because they involve a lot of intricate cache handling. What you see here is an assembly of all the different modules that would update when a single vote count changes in a US House race: at the top, the balance of power graphic on our homepage, then the various maps, the single-race charts, the county-level data and maps, and the race embed, which may appear in an unknown number of stories. The way we handle this with Fastly is that every single race, and even every reporting unit inside a race, such as a county, has its own unique surrogate key. So when we get fresh data from the AP for a race, we can selectively and strategically bust the cache across all of the modules you see here, and only those modules, which keeps us robust but fast to update. Next slide.
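The per-race, per-reporting-unit surrogate key approach can be illustrated with a toy in-memory model in Python. This is a sketch, not Fastly's or Gannett's implementation (Fastly does the tagging and purging at the edge); the module URLs and the `race-<id>` / `ru-<id>-<county FIPS>` key names are hypothetical.

```python
from collections import defaultdict


class SurrogateKeyCache:
    """Toy in-memory model of an edge cache whose objects carry surrogate keys."""

    def __init__(self):
        self._objects = {}               # URL -> cached response body
        self._index = defaultdict(set)   # surrogate key -> URLs tagged with it

    def store(self, url, body, surrogate_keys):
        """Cache a response and tag it with every race/reporting-unit key it depends on."""
        self._objects[url] = body
        for key in surrogate_keys:
            self._index[key].add(url)

    def get(self, url):
        return self._objects.get(url)

    def purge_key(self, key):
        """Evict only the objects tagged with `key`; return the URLs actually evicted."""
        evicted = set()
        for url in self._index.pop(key, set()):
            if url in self._objects:
                del self._objects[url]
                evicted.add(url)
        return evicted


# Hypothetical key scheme: "race-<race id>" and "ru-<race id>-<county FIPS>".
cache = SurrogateKeyCache()
cache.store("/embed/balance-of-power/house", "<svg>...</svg>", ["race-NY12", "race-PA01"])
cache.store("/embed/race/NY12", "<div>...</div>", ["race-NY12"])
cache.store("/maps/ny/race/NY12", "<svg>...</svg>", ["race-NY12", "ru-NY12-36061"])
cache.store("/embed/race/PA01", "<div>...</div>", ["race-PA01"])

# A fresh AP update for the NY-12 House race evicts three modules and nothing else.
evicted = cache.purge_key("race-NY12")
```

Because a shared module such as the balance of power graphic carries the key of every race it displays, one purge invalidates it along with the race-specific embeds, while modules for other races stay cached.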
Danny Sanchez (06:26):
In addition to the national data, we also handle local data, using a Google Sheets-based tool we created that lets newsrooms enter their local races. That's another cache handling situation: we need the races updated immediately when a newsroom publishes, but we also need robust infrastructure to handle all of the load on that tool. Next slide.
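On the publishing side, a tool like the Google Sheets one could fire a surrogate-key purge as soon as an editor publishes. The sketch below builds, without sending, a request against Fastly's batch purge endpoint (`POST /service/{service_id}/purge` with a space-separated `Surrogate-Key` header and an optional `Fastly-Soft-Purge: 1` header, per Fastly's purge API documentation). The hook name, newsroom ID, and the `newsroom-` / `local-race-` key scheme are hypothetical.

```python
import urllib.request

FASTLY_API = "https://api.fastly.com"


def build_surrogate_key_purge(service_id, api_token, surrogate_keys, soft=True):
    """Build (but do not send) a Fastly batch purge-by-surrogate-key request."""
    headers = {
        "Fastly-Key": api_token,                    # API token with purge permission
        "Surrogate-Key": " ".join(surrogate_keys),  # space-separated keys to purge
        "Accept": "application/json",
    }
    if soft:
        # Soft purge marks matching content stale instead of evicting it outright.
        headers["Fastly-Soft-Purge"] = "1"
    return urllib.request.Request(
        f"{FASTLY_API}/service/{service_id}/purge", headers=headers, method="POST"
    )


def on_sheet_published(newsroom_id, race_ids, service_id, api_token):
    """Hypothetical publish hook for the local-races Google Sheets tool."""
    keys = [f"newsroom-{newsroom_id}"] + [f"local-race-{r}" for r in race_ids]
    return build_surrogate_key_purge(service_id, api_token, keys)


req = on_sheet_published("desmoines", ["101", "102"], "SERVICE_ID", "API_TOKEN")
# urllib.request.urlopen(req) would actually send the purge.
```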
Danny Sanchez (06:45):
Lastly, engineering for elections in general is a complicated undertaking. We like to call it the Super Bowl of newsroom development; for those of you outside the US, that's our biggest sporting event, in American football. We have to handle an immediate flood of traffic, with data that needs to be updated constantly and in real time, because readers expect up-to-the-second results as soon as they're available. We also have to handle unusual data conditions in our designs, like races with multiple winners and runoffs. To talk a lot more about the engineering side and how it works under the hood, I'm very happy to introduce my colleague, Yanyan Ni. Yanyan, take it away.
Yanyan Ni (07:26):
Thank you, Danny. Hello everyone, I'm Yanyan Ni, a Site Reliability Engineer at Gannett. Let's first take a look at our election results pipeline. When a client hits our election pages, the request goes to a Fastly service. Based on the URL path, the Fastly service directs the traffic to our web server. The web server then uses an API key to access our microservice, thorium, which sits behind another Fastly service: FAM, the Fastly API Management. In short, FAM is an API gateway built on Fastly; I will talk about it in more detail in a few slides. When thorium receives the request, it sends a GraphQL query on to our API services.
Yanyan Ni (08:20):
The API service is a complex system that sits behind two layers of Fastly, including the FAM service. In general, it gets the data from the Associated Press and returns the GraphQL response to our microservice. The thorium service then arranges the data into multiple results modules and returns the results to our web server. Finally, the election data is displayed on the client side.
Yanyan Ni (08:57):
The diagram on the right shows how the API services work. They are composed of several key components. Election Aggregation is one of these applications; its job is to pull data from the Associated Press. Whenever the data is updated, it formats the data and calls the Content Ingestor API, which writes the data to the internal database. At the same time, the Content Ingestor API generates a series of surrogate keys and purges the modules in the Fastly services using those keys. The Content API is a key application in the API services: its job is to parse GraphQL queries, fetch the data from the database, and return the cached GraphQL response to its upstream applications.
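The aggregation step, detecting what changed in an AP update and deriving the surrogate keys to purge, might look roughly like this. The snapshot shape (race ID mapped to per-reporting-unit vote tuples) and the key naming are assumptions for illustration; the talk does not describe Gannett's data model.

```python
def changed_surrogate_keys(previous, current):
    """Return the sorted surrogate keys to purge after an AP update.

    Each snapshot maps race_id -> {reporting_unit_id: vote_counts}.
    A race key is purged if any of its reporting units changed, and each
    changed reporting unit gets its own key purged too.
    """
    keys = set()
    for race_id, units in current.items():
        old_units = previous.get(race_id, {})
        changed = [ru for ru, votes in units.items() if old_units.get(ru) != votes]
        if changed:
            keys.add(f"race-{race_id}")
            keys.update(f"ru-{race_id}-{ru}" for ru in changed)
    return sorted(keys)


previous = {"NY12": {"36061": (100, 90), "36047": (50, 60)}}
current = {
    "NY12": {"36061": (120, 95), "36047": (50, 60)},  # one county reported more votes
    "PA01": {"42017": (10, 5)},                       # race appears for the first time
}
to_purge = changed_surrogate_keys(previous, current)
# These keys would then be passed to a purge call such as Fastly's batch purge.
```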
Yanyan Ni (09:58):
Cache control is mainly set up in the API services layer, and it is critical for delivering our content efficiently and accurately. As Danny said before, each race, and each county in the race, has its own surrogate keys, which allows us to purge election-related data efficiently. And the automatic purge set up in the Content Ingestor API allows us to serve the latest content instantly.
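At the origin, the API layer attaches the keys to each response so Fastly can index it. A minimal sketch of those headers, assuming the same hypothetical key scheme: `Surrogate-Key` carries the space-separated keys and `Surrogate-Control` sets Fastly's TTL independently of the browser `Cache-Control`. The TTL values are placeholders, not Gannett's settings.

```python
def election_cache_headers(race_ids, reporting_units, edge_ttl=86400, browser_ttl=5):
    """Build response headers for an election API response (hypothetical values).

    race_ids: races whose data appears in the response.
    reporting_units: (race_id, reporting_unit_id) pairs, e.g. counties.
    """
    keys = [f"race-{race_id}" for race_id in race_ids]
    keys += [f"ru-{race_id}-{ru}" for race_id, ru in reporting_units]
    return {
        # Fastly indexes the cached object under each space-separated key.
        "Surrogate-Key": " ".join(keys),
        # Edge TTL; Fastly honors Surrogate-Control ahead of Cache-Control.
        "Surrogate-Control": f"max-age={edge_ttl}",
        # Browser TTL kept short, because browser copies cannot be purged.
        "Cache-Control": f"public, max-age={browser_ttl}",
    }


headers = election_cache_headers(["NY12"], [("NY12", "36061"), ("NY12", "36047")])
```

A long edge TTL is reasonable here because purges, not expiry, keep results fresh; the short browser TTL limits how long a client holds a copy that Fastly cannot purge.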
Yanyan Ni (10:34):
In addition, we set a TTL on the GraphQL query that returns all election dates and race types, to make sure this data expires in time. Thanks to these caching strategies, together with the automatic purge system, the time from an Associated Press update to display on the page is surprisingly fast: less than one second for each single message. This slide shows the average response times of our key election-related applications on election day.
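Data that is not tied to a race's surrogate keys, like the election dates and race types query, has to rely on TTL expiry instead. One way to express that policy, assuming hypothetical GraphQL field names and TTL values, is to give each response the shortest TTL of any field it selects:

```python
# Hypothetical per-field edge TTLs, in seconds.
FIELD_TTL = {
    "raceResults": 86400,   # long: freshness comes from surrogate-key purges
    "electionDates": 300,   # short: no per-race key, so expiry must refresh it
    "raceTypes": 300,
}
DEFAULT_TTL = 60            # conservative fallback for fields not listed


def ttl_for_fields(fields):
    """Give a GraphQL response the shortest TTL among the fields it selects."""
    return min((FIELD_TTL.get(f, DEFAULT_TTL) for f in fields), default=DEFAULT_TTL)
```

Taking the minimum means a query that mixes purge-driven results with metadata never outlives its least durable part.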
Yanyan Ni (11:19):
Now I will talk about the Fastly API Management. Back in the 2016 election, it was the first time we containerized our production applications, and also the first time we used Kubernetes. So in the spirit of 2016, just a few months before this election, we decided to build our own API management gateway entirely in Fastly.
Yanyan Ni (11:47):
We migrated 11 critical election APIs from our existing API management product to FAM just a few days before election night. These included some applications I mentioned before, such as Content API, a high-speed data store containing all of our news stories, photos, sports data, and other assets that power our journalism, as well as the thorium microservice, a key application that arranges the election data for display in multiple results modules.
Yanyan Ni (12:20):
As an API gateway, FAM uses Private Edge Dictionaries to store API keys. When a client or an upstream Fastly service uses an API key to access FAM, FAM verifies the key by checking whether it exists in the Private Edge Dictionaries. Once the key is verified, FAM passes the traffic to the downstream origins. We store both the key values and the key metadata in our Private Edge Dictionaries. Beyond key verification, FAM also provides multiple API gateway features, such as load balancing traffic across multiple backends, header manipulation, and path rewriting.
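The key check FAM performs can be modeled in Python with a dictionary standing in for the Private Edge Dictionary. In Fastly itself this logic would live in VCL, typically as a `table.lookup()` on the dictionary; the metadata schema, the per-proxy scoping, and the status codes here are assumptions for illustration, not details from the talk.

```python
import json

# Stand-in for a Private Edge Dictionary: API key -> JSON metadata.
# Key values, owners, and the "proxies" scoping field are hypothetical.
EDGE_DICTIONARY = {
    "k-3f9a": json.dumps({"owner": "thorium", "proxies": ["content-api"]}),
}


def authorize(api_key, proxy):
    """Return (status, owner) for a request to `proxy` carrying `api_key`."""
    raw = EDGE_DICTIONARY.get(api_key)
    if raw is None:
        return 401, None            # unknown key: reject at the edge
    meta = json.loads(raw)
    if proxy not in meta["proxies"]:
        return 403, meta["owner"]   # valid key, but not scoped to this proxy
    return 200, meta["owner"]       # verified: forward to the proxy's origin


print(authorize("k-3f9a", "content-api"))  # (200, 'thorium')
```

Rejecting unknown keys at the edge means invalid traffic never reaches the origins behind FAM.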
Yanyan Ni (13:10):
Because FAM is a single Fastly service that includes multiple proxies maintained by different teams, we need to make sure each team can maintain its own proxy easily without interfering with the others. So encapsulation became very important. We designed it so that each proxy is configured under its own proxy folder, separately maintained by the owning team. We also templated the Terraform and the VCL so that each proxy's related configs are appended automatically, avoiding human mistakes.
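The templating idea, where each team owns its proxy definition and a generator appends the VCL, can be sketched as below. The template, the `F_<name>` backend naming, and the path-prefix routing are hypothetical stand-ins for Gannett's Terraform and VCL templates; the duplicate-prefix check shows one way to keep proxies from interfering with each other.

```python
from string import Template

# Hypothetical per-proxy VCL fragment, intended for vcl_recv.
PROXY_TEMPLATE = Template(
    '# --- proxy: $name (owner: $team) ---\n'
    'if (req.url ~ "^/$prefix/") {\n'
    '  set req.backend = F_$backend;\n'
    '  set req.url = regsub(req.url, "^/$prefix/", "/");\n'
    '}\n'
)


def render_proxies(proxies):
    """Render one VCL block per team-owned proxy, refusing overlapping prefixes."""
    seen = set()
    blocks = []
    for p in sorted(proxies, key=lambda p: p["name"]):
        if p["prefix"] in seen:
            raise ValueError(f"duplicate path prefix: {p['prefix']}")
        seen.add(p["prefix"])
        blocks.append(PROXY_TEMPLATE.substitute(
            name=p["name"],
            team=p["team"],
            prefix=p["prefix"],
            backend=p["name"].replace("-", "_"),  # VCL identifiers cannot contain "-"
        ))
    return "\n".join(blocks)


vcl = render_proxies([
    {"name": "thorium", "team": "elections", "prefix": "thorium"},
    {"name": "content-api", "team": "content", "prefix": "content"},
])
```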
Yanyan Ni (13:47):
We also developed two very handy tools to speed up the migration: one is a Command Prompt, the other is a Key Migration Job. The Command Prompt is a Go binary that teams use to create new proxies. They just answer a series of questions, and it automatically creates a proxy folder with all the configs set up inside. The Key Migration Job is a Jenkins job that migrates existing API keys from the old API platform to the corresponding Private Edge Dictionary used by FAM. These two handy tools really sped up our migration and ensured we could move the election-related components to FAM before election night.
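The core of a key migration job is reshaping exported keys into edge dictionary writes. Fastly's API accepts batched dictionary item updates (`PATCH /service/{service_id}/dictionary/{dictionary_id}/items` with a list of operations such as `upsert`); this sketch only builds those payloads. The export format is hypothetical, and the 1,000-item batch size should be checked against Fastly's current limits.

```python
import json


def migration_batches(exported_keys, batch_size=1000):
    """Reshape exported API keys into Fastly edge dictionary batch-update payloads.

    exported_keys: list of {"key": str, "metadata": dict} (hypothetical export format).
    Each payload is a body for PATCH /service/{service_id}/dictionary/{dictionary_id}/items.
    """
    items = [
        {
            "op": "upsert",  # create the entry, or overwrite it if it already exists
            "item_key": k["key"],
            "item_value": json.dumps(k["metadata"], sort_keys=True),
        }
        for k in exported_keys
    ]
    return [{"items": items[i:i + batch_size]} for i in range(0, len(items), batch_size)]


exported = [{"key": f"key-{i}", "metadata": {"owner": "elections"}} for i in range(2500)]
batches = migration_batches(exported)
```

Using `upsert` makes the job safe to re-run if a batch fails partway through.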
Yanyan Ni (14:43):
Since Fastly supports real-time log streaming to Google BigQuery, we stream FAM's logs to BigQuery and then use Data Studio to visualize them, as shown on the right. This is one of the dashboards we built for FAM monitoring. You can select a single proxy or multiple proxies to check request counts, the ratio of response statuses, and the details of each request, which helps us troubleshoot FAM issues.
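The dashboard's per-proxy status ratio is a simple aggregation over log rows. In practice this would be a BigQuery query behind Data Studio; the Python version below, with hypothetical `proxy` and `status` log fields, just shows the computation:

```python
from collections import Counter, defaultdict


def status_ratios(rows, proxies=None):
    """Share of each status class (2xx, 4xx, ...) per proxy, optionally filtered."""
    counts = defaultdict(Counter)
    for row in rows:
        if proxies and row["proxy"] not in proxies:
            continue  # proxy not selected in the dashboard filter
        counts[row["proxy"]][f"{row['status'] // 100}xx"] += 1
    return {
        proxy: {cls: n / sum(c.values()) for cls, n in sorted(c.items())}
        for proxy, c in counts.items()
    }


rows = (
    [{"proxy": "content-api", "status": 200}] * 3
    + [{"proxy": "content-api", "status": 503}]
    + [{"proxy": "thorium", "status": 404}]
)
ratios = status_ratios(rows)
```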
Yanyan Ni (15:21):
Here are some exciting results I really want to share with you. We compared the transaction times of the old API gateway, shown as the pink line, with FAM, shown as the green line. As you can see, FAM reduced transaction times by 20 to 30% on average compared to the old API gateway solution. This reduction improved not just FAM but the overall performance of our whole election pipeline. So we're very glad this result paid off our hard work migrating all the election-related components to FAM before the election.
Yanyan Ni (16:10):
Before I show some election-day data, I want to talk about the monitoring system we used for the election. We enabled live event monitoring on our seven election-related Fastly services. Fastly provided real-time information in shared ChatOps channels, and they also had dedicated engineers to help troubleshoot via the War Room, which was very helpful for making sure our Fastly services were working properly.
Yanyan Ni (16:44):
Since most of our Fastly logs are sent to Sumo Logic, we created a custom dashboard based on our Fastly logs and the Fastly stats API, so we can check the status of multiple services on a single page. For key Fastly services, such as FAM and Content API, we send the logs to Google BigQuery and use Data Studio for visualization, as I showed before.
Yanyan Ni (17:14):
Here is part of our election dashboard built in Sumo Logic. As you can see, we can check the error statuses of different Fastly services on this single page. Some things aren't shown on this screen, but we also check request counts and backend status, since we send our backend logs to Sumo Logic as well.
Yanyan Ni (17:44):
This slide shows the requests we received during the election. As you can see, on election night requests jumped to around four times our regular traffic, and to around three times on the following day.
Yanyan Ni (18:06):
Here is an example of a push notification we sent to our users. A mobile push notification triggers a request burst of up to 10 times normal traffic. Luckily, with proper cache control in place, Fastly protected us from being hit by these request bursts, which saved us the cost of over-provisioning backend resources to handle them.
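Why a burst barely reaches the origin can be shown with a simple model: for a single cached URL, the origin is contacted once per TTL window no matter how many requests arrive. The request rates and the 10-second TTL below are hypothetical, and the model ignores fetch latency, so it does not capture concurrent misses, which Fastly's request collapsing handles in practice.

```python
def origin_fetches(request_times_ms, ttl_ms):
    """Count origin fetches for one cached URL under a stream of requests.

    A request is a miss (and triggers a fetch) only if there is no cached copy
    or the cached copy has expired; every other request is served from cache.
    """
    fetches = 0
    expires_at = None
    for t in sorted(request_times_ms):
        if expires_at is None or t >= expires_at:
            fetches += 1
            expires_at = t + ttl_ms
    return fetches


# Hypothetical burst: 10,000 requests spread over one minute, 10-second edge TTL.
normal = origin_fetches(range(0, 60_000, 6), ttl_ms=10_000)
# Ten times the traffic after a push notification: 100,000 requests in the same minute.
burst = origin_fetches([t / 10 for t in range(0, 600_000, 6)], ttl_ms=10_000)
```

In this model both the normal minute and the ten-times burst cost the origin the same six fetches; only the edge absorbs the extra load.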
Yanyan Ni (18:42):
In summary, cache control ensures the speed and accuracy of content delivery. The Content API application increased GraphQL caching efficiency and fundamentally decreased the resources required for each election data request. FAM, an API gateway built natively in Fastly, greatly reduced transaction times between critical election APIs. Finally, I want to give a shout-out to Fastly Mission Control and our account team: without your help, we could not have delivered our content so smoothly and efficiently. Thank you all.