GlobalSign TLS certificate revocation errors
October 18, 2016
On October 13, 2016 around 11:10am GMT, users visiting websites using GlobalSign TLS certificates, including some hosted by Fastly, started experiencing TLS certificate validation errors. This issue was caused by incorrect certificate revocation information published by our certificate vendor, GlobalSign.
This security advisory describes the root cause of the issue and the actions Fastly has taken to limit customer impact.
Some website visitors may have seen certificate validation errors, such as “NET::ERR_CERT_REVOKED”, when accessing websites that use GlobalSign certificates.
This incident affected Fastly customers because several of our TLS options use certificates that Fastly procures on customers’ behalf from our TLS certificate authority vendor, GlobalSign. For instance, customers using Fastly shared certificates automatically use certificates issued through GlobalSign. Customers who use our Customer Certificate Hosting or SNI customer certificate hosting with certificates issued by a vendor other than GlobalSign were not affected by the issue.
Errors would have been inconsistent and would have affected only a small number of users, namely those who received incorrect Online Certificate Status Protocol (OCSP) responses. The issue may have persisted for up to four days for users whose local machine cached the OCSP response. Fastly reviewed traffic levels at the time of the incident and did not observe a notable decrease in traffic across our customers.
Around October 7, 2016, the Certificate Authority GlobalSign revoked a cross-certificate and issued a Certificate Revocation List (CRL) listing this revocation. Due to a technical error on their end in compiling Online Certificate Status Protocol (OCSP) responses, starting October 13, 2016, their OCSP responder returned inaccurate responses for a number of intermediate certificates. This led some browsers to infer that several intermediate certificates issued by GlobalSign had been revoked.
This information was propagated through GlobalSign’s OCSP responders and, in some cases, was cached by an intermediate, non-Fastly CDN and on client systems.
These inaccurate OCSP responses led some web users to experience certificate validation errors from websites using GlobalSign certificates, including many Fastly-hosted services. Affected users either had to accept a certificate warning before they could reach the target website or were prevented from accessing it at all.
Errors would have been inconsistent and limited to a subset of users, as not all browsers check OCSP before allowing access to a website, and the issue did not affect other mechanisms for signaling validity, such as Certificate Revocation Lists (CRLs).
Our mitigation and response
Once Fastly was informed of the issue, we contacted GlobalSign. GlobalSign investigated, determined the root cause, and addressed the issue by removing the incorrect OCSP responses.
However, by that time incorrect responses were cached at a number of levels, including the local OCSP cache that is part of the operating system. Responses are commonly stored until the end of their validity period; GlobalSign issues OCSP responses with a validity of four days, which means that a client that has received one will treat it as valid for a full four days.
GlobalSign’s OCSP responder used a CDN other than Fastly, which may have cached responses and caused the responder to keep returning failures even after the root cause was addressed. In addition, due to the caching behavior of client operating systems, some client machines that had received a failed OCSP response while accessing a website continued to cache that response. As a result, clients on those machines could not access affected websites even after GlobalSign had addressed the issue.
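The caching behavior described above can be illustrated with a minimal sketch. It is a toy model rather than a description of any real operating system cache: the `OcspCache` class, the `check` helper, the two responder functions, and the serial `"intermediate-1"` are all hypothetical, and the model assumes each response is generated at query time with the four-day validity mentioned above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: four-day validity, as stated above for GlobalSign responses.
VALIDITY = timedelta(days=4)


class OcspCache:
    """Toy client-side OCSP cache keyed by certificate serial (hypothetical)."""

    def __init__(self):
        self._entries = {}  # serial -> (status, next_update)

    def store(self, serial, status, next_update):
        self._entries[serial] = (status, next_update)

    def lookup(self, serial, now):
        entry = self._entries.get(serial)
        if entry is None:
            return None
        status, next_update = entry
        if now >= next_update:
            # The cached response has expired, so drop it.
            del self._entries[serial]
            return None
        return status


def check(cache, serial, now, responder):
    """Return the cached status if still valid, else query the responder."""
    cached = cache.lookup(serial, now)
    if cached is not None:
        return cached
    status, next_update = responder(serial, now)
    cache.store(serial, status, next_update)
    return status


def broken_responder(serial, now):
    # Stands in for the responder during the incident.
    return "revoked", now + VALIDITY


def fixed_responder(serial, now):
    # Stands in for the responder after the root cause was addressed.
    return "good", now + VALIDITY


incident = datetime(2016, 10, 13, 11, 10, tzinfo=timezone.utc)
cache = OcspCache()

during = check(cache, "intermediate-1", incident, broken_responder)
day_later = check(cache, "intermediate-1", incident + timedelta(days=1), fixed_responder)
after_expiry = check(cache, "intermediate-1", incident + timedelta(days=4), fixed_responder)
print(during, day_later, after_expiry)  # revoked revoked good
```

In this model the client keeps seeing "revoked" a day after the responder is fixed, because the fixed responder is never consulted until the cached entry expires.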
While users can flush the OCSP cache on their machine manually, and GlobalSign published guidance on how to do so, this went beyond the technical capability of most end users. In addition, the workarounds provided by GlobalSign were not effective in all situations.
Fastly did not consider this a sufficient mitigation to pass along to our customers’ end users. After careful investigation, Fastly offered customers additional options to address the issue:
- Customers with a dedicated map could roll to a new certificate issued from a GlobalSign intermediate that was not affected by the issue. Since the cached entry would apply only to the old certificate, this fully addressed the issue for those customers;
- Customers on a shared certificate could move to additional shared endpoints with similar newly issued certificates. Customers who contacted us to report issues were offered a move to the new mapping.
Fastly was unable to act independently of our customers to address the issue, as we recognize that some customers use certificate pinning (see “Background on certificate validation and revocation checking,” below) in their client applications. Because of this, we could not immediately roll existing customers to a new intermediate certificate without customer acknowledgement.
Fix / workarounds
Revocation errors should disappear after the expiry of the OCSP response lifetime, four days after the original incident. Customers who migrated to the updated maps provided by Fastly would have seen the issue mitigated shortly after their move.
Fastly recognizes that customers rely on third-party certificate authorities as well as on the CDN to successfully accept and deliver user traffic. As an outcome of this event, we are working on the following remediation and mitigation steps:
- Ensuring our incident response plans take into account third-party failures within TLS certificate delivery as a potential reason for a customer outage. With this event, we have put in place the necessary infrastructure to rapidly migrate across TLS hierarchies, and we will continue to put in place additional contingency plans with regard to these third-party vendors;
- Reviewing our status messaging mechanisms to ensure they do not rely on critical services that would directly affect delivery of content through the CDN, in this case the same Certificate Authority. Fastly’s status page at status.fastly.com used a GlobalSign certificate and may have been similarly affected during the incident; and
- Evaluating, with GlobalSign, opportunities to increase the reliability of providing OCSP responses for GlobalSign certificates to Fastly customers.
We are working with our vendor GlobalSign to ensure plans are put in place to mitigate future events related to certificate issuance and revocation.
Background on certificate validation and revocation checking
When a browser connects to a website and evaluates an X.509 certificate, the browser typically wants to ensure that the certificate is still valid. To support this, X.509 allows the certificate authority to confirm validity in a number of different ways. We include a reference explanation below, as it helps explain why specific clients were or were not affected by this incident. In principle, most issues would have been seen by users whose browser or operating system performed interactive OCSP requests:
- CRL: Certificate authorities publish Certificate Revocation Lists (CRLs), which are files that contain lists of certificates which have been revoked by the Certificate Authority. Most browsers do not interactively download the CRL for each request, as it is typically large in size and downloading and validating it upon each connection would have significant performance implications.
- CRLSets: Google Chrome uses CRLSets as a mechanism to quickly block certificates. Chrome generally does not perform interactive OCSP and CRL checks, though specific operating system libraries may perform these checks on a system using Chrome to access a webpage. Google crawls the CRLs published by Certificate Authorities, extracts the entries of value, and makes them available for automated download by Chrome browsers.
- OCSP: Certificate Authorities also operate an Online Certificate Status Protocol (OCSP) responder, which a browser can query interactively to check whether the certificate is still valid. Many browsers interactively perform an OCSP check when validating a TLS certificate. When a successful OCSP response is received, the result, whether positive (valid) or negative (invalid), is cached in a local OCSP cache that is part of the operating system. Responses are commonly stored until the end of their validity period, which in GlobalSign’s case was four days.
- OCSP stapling: OCSP responses can also be delivered by the web server through a mechanism called OCSP stapling. In this case, the confirmation of validity, signed by the Certificate Authority, is retrieved by the web server and “stapled” during the TLS handshake. OCSP stapling has both performance and security benefits, as it no longer requires a separate connection to the OCSP responder of the Certificate Authority. Stapling is a best practice for certificate revocation and is deployed across the Fastly fleet.
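As a rough illustration of why stapling removes the separate connection to the Certificate Authority, the following sketch models a client that uses a fresh stapled response when one is present and only otherwise queries the responder. This is a simplified model under stated assumptions, not the logic of any particular browser: `OcspResponse`, `revocation_status`, and `fake_responder` are hypothetical names, and real clients apply additional checks such as signature verification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class OcspResponse:
    status: str            # "good" or "revoked"
    next_update: datetime  # end of the response's validity period


def revocation_status(
    stapled: Optional[OcspResponse],
    now: datetime,
    query_responder: Callable[[], OcspResponse],
) -> str:
    """Prefer a fresh stapled response; otherwise fall back to the responder."""
    if stapled is not None and now < stapled.next_update:
        return stapled.status
    return query_responder().status


now = datetime(2016, 10, 18, tzinfo=timezone.utc)
responder_calls = []


def fake_responder() -> OcspResponse:
    # Hypothetical stand-in for a network round trip to the CA's responder.
    responder_calls.append(now)
    return OcspResponse("good", now + timedelta(days=4))


stapled = OcspResponse("good", now + timedelta(days=2))

with_staple = revocation_status(stapled, now, fake_responder)
calls_with_staple = len(responder_calls)

without_staple = revocation_status(None, now, fake_responder)
calls_without_staple = len(responder_calls)

print(with_staple, calls_with_staple)        # good 0
print(without_staple, calls_without_staple)  # good 1
```

With a stapled response the responder is never contacted; without one, the client makes the extra request itself.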
Browsers may, in addition to validity checking, also check whether a certificate is valid for a specific site through Public Key Pinning. With this mechanism, the client application, whether an app or a browser, validates whether the root, intermediate, or end-entity certificate of a service is one that it expects. Pins can be hardcoded in the client, or distributed through the Public Key Pinning Extension for HTTP (HPKP). Pinning is a valuable and commonly used security feature that reduces the risk of a Certificate Authority being subverted to issue an otherwise valid certificate for a service. However, pinning may also reduce the flexibility of a website to rapidly move to another certificate hierarchy. In this incident, the possible use of pinning by our customers limited Fastly’s ability to automatically and transparently migrate all customers to another hierarchy.
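The effect of pinning on a certificate roll can be sketched as follows. HPKP-style pins are the base64-encoded SHA-256 hash of a certificate’s SubjectPublicKeyInfo; everything else here is an assumption made for illustration. The byte strings are placeholders rather than real DER-encoded keys, and `spki_pin` and `chain_matches_pins` are hypothetical helper names.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of SubjectPublicKeyInfo bytes."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def chain_matches_pins(chain_spkis, pinned) -> bool:
    """A chain is accepted if any certificate in it matches any pin."""
    return any(spki_pin(spki) in pinned for spki in chain_spkis)


# Placeholder bytes standing in for DER-encoded public keys (hypothetical).
OLD_LEAF = b"old-leaf-spki"
NEW_LEAF = b"new-leaf-spki"
OLD_INTERMEDIATE = b"old-intermediate-spki"
NEW_INTERMEDIATE = b"new-intermediate-spki"

# A client application that pinned only the old intermediate.
client_pins = {spki_pin(OLD_INTERMEDIATE)}

old_chain_ok = chain_matches_pins([OLD_LEAF, OLD_INTERMEDIATE], client_pins)
new_chain_ok = chain_matches_pins([NEW_LEAF, NEW_INTERMEDIATE], client_pins)

print(old_chain_ok)  # True
print(new_chain_ok)  # False
```

In this sketch, a client that pinned only the old intermediate rejects the chain built on the new one, which is why a roll to a different intermediate needed customer acknowledgement.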
GlobalSign Incident Report
GlobalSign has published an incident report with information on their incident response at https://www.globalsign.com/en/customer-revocation-error/. This document contains further information on the steps taken by GlobalSign to prevent recurrence of this type of incident.