One common challenge many software development teams encounter is verifying certificates and completing SSL handshakes. This issue manifests in multiple places:
- Custom or COTS code fails to connect to external APIs
- Dependency management fails to trust https repositories
- External partners fail to connect to internally developed services without disabling certificate validation
According to Stack Overflow, this isn't a new problem. While the last decade has seen technologies come and go, questions about keystores and X.509 certificates remain a constant source of trouble for many development teams.
This post provides a short overview of how digital trust works, common issues, and some pointers about how to resolve these issues.
Digital Trust Overview
As in the analogue world, in the digital world we regularly need to establish and verify identity. How do we know that when we type google.com into our browser, we're really connected to Google? And how do we know that when we connect to a new domain we're not talking to an imposter or man-in-the-middle?
Verifying digital trust leverages the concepts of digital signatures and chains of trust. Digital signatures use cryptography to prove that a certificate was really signed by the signatory on the certificate. Unlike signatures on a check, these can't be easily forged. Certificate chains allow us to use the transitive property: if I trust Alice and Alice trusts Bob, I trust Bob. Similarly, if I trust a given certificate authority (CA) and it trusts an intermediate CA, I will trust the intermediate CA. Extending that rule allows me to trust any certificate signed by the intermediate CA.
The original set of certificates that you trust typically contains roughly 120 to 150 certificates. A quick glance at my Windows 10 trust store (run certmgr.msc and look in the Trusted Root Certification Authorities folder) shows 115. My AdoptOpenJDK distro has 138 (with version jdk-188.8.131.52, counted using keytool.exe -list -cacerts | grep fingerprint | wc -l with the password changeit). Eclipse's Temurin version 17.0.2_8-jre-centos7 ships with 131. Amazon's latest Corretto JDK ships with 142 (in /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt). Firefox comes with its own trust store and seems to have a similar number (counting them is tedious).
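The same kind of quick count can be sketched for Python, which usually relies on the trust anchors OpenSSL finds on the host. This is a minimal sketch, not a definitive measurement: the function names are made up for illustration, and the number it reports depends entirely on the machine and will not match the figures above.

```python
import ssl


def count_default_trust_anchors() -> int:
    """Return how many CA certificates the default client context has loaded."""
    context = ssl.create_default_context()
    # get_ca_certs() only reports certificates already loaded into the context.
    # CAs that OpenSSL looks up lazily from a directory (capath) are not counted,
    # so this can under-report on some Linux systems.
    return len(context.get_ca_certs())


def default_trust_locations() -> dict:
    """Return the file and directory OpenSSL consults for trust anchors by default."""
    paths = ssl.get_default_verify_paths()
    return {"cafile": paths.cafile, "capath": paths.capath}


if __name__ == "__main__":
    print("trust anchors loaded:", count_default_trust_anchors())
    print("default locations:", default_trust_locations())
```

If the count comes back as zero, check the capath entry: the anchors may live in a directory that is only consulted during a handshake.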
Your OS, JVM, or application trusts every certificate in its trust store, any certificate signed by one of those, and any certificate signed by a certificate they signed. (This doesn't apply to certificates that are no longer valid because they have passed their validity date or have been revoked.) If, while establishing a TLS connection, your application fails to validate the certificate, it is nearly always because the certificate doesn't chain to a known trust anchor.
A Simple Example
It may help to provide a simple example here of how chains work in practice. Take the certificate chain for google.com:
In this case, our operating system includes a certificate with the friendly name GlobalSign Root CA - R1 and a Subject Key Identifier of 607b661a450d97ca89502f7d04cd34a8fffcfd4b in its list of trust anchors. Because of that, we can also trust GTS Root R1 and because that is trusted, we'll also trust GTS CA 1C3. Finally, because www.google.com presents a certificate for *.google.com that is issued by GTS CA 1C3, we can have confidence that the endpoint we're connected to really is owned by Google and not some Blue Coat Proxy or a nefarious man-in-the-middle attacker who pretends to be Google. At each level of trust, our application or operating system uses secure cryptographic libraries to verify that the certificates were signed by the stated issuer.
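The walk from leaf to trust anchor described above can be sketched as a toy model. This is an illustration only: the Cert class and chain_is_trusted function are hypothetical, certificates are linked by the names used in this post, and no signatures are verified, which is the part real TLS libraries do with cryptography.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str  # who the certificate identifies
    issuer: str   # who signed it


# Toy trust store holding the single anchor from the example above.
TRUST_ANCHORS = {"GlobalSign Root CA - R1"}


def chain_is_trusted(leaf, presented, anchors=TRUST_ANCHORS, max_depth=10):
    """Follow issuer links from the leaf until an anchor is reached or the chain breaks."""
    by_subject = {cert.subject: cert for cert in presented}
    current = leaf
    for _ in range(max_depth):
        if current.issuer in anchors:
            return True  # the issuer is a trust anchor, so the chain is complete
        parent = by_subject.get(current.issuer)
        if parent is None or parent.subject == current.subject:
            return False  # missing intermediate, or a self-signed certificate we don't trust
        current = parent
    return False


leaf = Cert("*.google.com", "GTS CA 1C3")
intermediates = [
    Cert("GTS CA 1C3", "GTS Root R1"),
    Cert("GTS Root R1", "GlobalSign Root CA - R1"),
]

print(chain_is_trusted(leaf, intermediates))  # full chain presented
print(chain_is_trusted(leaf, []))             # leaf only, no intermediates
```

The second call shows why an incomplete chain fails even when the root is trusted: without the intermediates there is no path from the leaf to the anchor.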
There are a few common indicators that you probably have a trust problem with a service provider. As I write this, I find over a dozen questions asked on Stack Overflow within the last week and they include these symptoms:
- SSLHandshakeException or PKIX path building failed. A Java application fails to connect to a remote service. While the endpoint may include self-signed certificates, that isn't always the case. Fundamentally, however, a self-signed certificate is no different from any other untrusted certificate.
- Curl fails with an error message of CERTIFICATE_VERIFY_FAILED
- SSL certificate problem: self-signed certificate in certificate chain (maybe)
- Gradle can’t download dependencies and --debug reveals that Gradle doesn’t trust some repositories
Each of these messages, whether from Java, an OS command, Node, or Python, points to the same problem: the application doesn't know that it is OK to trust the endpoint.
Fixing the Problem
The issues listed above typically stem from one of two problems:
- The service provides a certificate that is not anchored by a trusted root CA
- The certificate does chain up to a trusted anchor but the certificate chain that the server presents is incomplete.
You will see the first issue whenever you deal with a self-signed certificate or a certificate that is issued by a non-public CA. Neither is fundamentally different from a certificate issued by a public CA. However, no popular set of trust anchors includes those non-public certificates. In this scenario, you can solve your connection issues by adding the self-signed certificate or the non-public CA certificate to your trust store (technical details below).
The second case is also quite common. In the example above for google.com, Google's servers present three certificates. You can see this by running openssl s_client -connect www.google.com:443. If the google.com server only presented the leaf certificate for *.google.com, your application would be unable to verify its authenticity because it doesn't have the public key (needed for cryptographic verification) for GTS CA 1C3. (While Windows can use the certificate's Authority Information Access extension to find the intermediate certificates, Linux and Java, along with most application frameworks, will not fill in the missing pieces by default.) The solution here is to make sure that the endpoint server offers the entire certificate chain up to, though not necessarily including, its trust anchor.
This concludes the technical overview. The sections below provide technical details for various steps described briefly above.
Adding Trusted Certificates
Every platform has a unique method for adding trusted certificates. As a general rule, never add a leaf certificate (i.e. for a specific URL). Instead, add a known CA that you or your organization consider to be a trust anchor. This should generally be a self-signed certificate that is securely managed.
- Java: you can add a certificate to your cacerts file using keytool -import -cacerts -file <your-pem-file>. Use the -keystore parameter in place of -cacerts to point to a specific keystore file that isn't a cacerts file.
- Java: you can use an entirely custom trust store that replaces the existing one using JVM parameters: -
- Linux: While various Linux distros differ in certificate management, the basic idea is that you add a new CA in a known location and prompt the OS to refresh its list of trust anchors through a privileged command. On Ubuntu, add a CA in PEM format to /usr/local/share/ca-certificates and then run sudo update-ca-certificates. On CentOS, you can copy the file to
- Python and Linux commands typically rely on the trust anchors offered by the operating system. Each, however, has a way to trust additional CAs. For example, with curl, you can use the --cacert flag to ask curl to use a given set of trust anchors.
- Node: ships with its own bundled set of root CAs rather than reading the OS certificates, but you can export NODE_EXTRA_CA_CERTS to point to a file with additional trust anchors before running your node application.
- Windows: In a large organization, the Windows trust stores are defined by organizational policies and implemented and enforced using Group Policy Objects.
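For Python specifically, one way to trust an additional CA without touching the OS trust store is to load it into an SSL context. The sketch below assumes a PEM file containing your organization's CA certificate; the function name and the file path in the usage note are hypothetical.

```python
import ssl
from typing import Optional


def build_client_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying client context, optionally trusting one more CA bundle."""
    # Starts from the default trust anchors, with certificate and hostname
    # verification enabled.
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        # Adds the CA certificates in the PEM file on top of the defaults.
        context.load_verify_locations(cafile=extra_ca_file)
    return context
```

A context built this way can be passed to standard library clients, for example urllib.request.urlopen(url, context=build_client_context("/path/to/internal-ca.pem")). Verification stays on, which is the point: the fix is to add trust, not to disable checking.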
Adding Certificates to a Certificate Chain
It's important to note that the terms keystore and trust store are often used interchangeably. However, they do have a key distinction: while a trust store simply stores a set of trust anchors that a client uses to decide whether to trust a service, a keystore contains a key pair used by a service to prove its identity to clients. All keystore formats allow a private key to be stored together with one or more certificates. In the Google example above, the server's keystore includes a private key along with the corresponding certificate for *.google.com. It also includes certificates for GTS CA 1C3 and GTS Root R1. It does not include the trust anchor certificate for GlobalSign Root CA since it would be unhelpful: if the client trusts GlobalSign Root CA, it already has a copy of it; if it doesn't, including it in the chain won't be a reason for it to trust it.
When creating a Java JKS file to be used as a keystore, make sure that the PEM file, which will always include the private key and a corresponding public certificate, also includes the other certificates in the chain. After creating the JKS file, you can validate that it includes the certificates by running keytool -list -v to see that the single entry has a certificate chain of three certificates, or whatever number is needed to chain up to the trust anchor.
Other Causes for Connection Failures
- Invalid Chain (mismatch): in order for you to trust certificate X, it needs to be signed by a CA you trust. Every certificate includes a signature from the issuing CA, and the CA is identified using an Authority Key Identifier such as 59a4660652a07b95923ca394072796745bf93dd0, not a common name like GTS CA 1C3. If a certificate for capitaltg.com is issued by a CA called Good CA with public key X, the chain for capitaltg.com must include a Good CA certificate with the same public key. A chain that instead includes the certificate for Good CA with public key Y will fail validation on non-Windows clients.
- Certificate Expiration: while your leaf certificate may not have expired, check that the intermediate certificates are still valid as well. An expired certificate is not valid even if a valid replacement exists. Intermediate certificates can be replaced in the chain without invalidating the certificates they previously issued, provided the new certificate uses the same public and private key pair. One way to confirm this is to compare the Subject Key Identifier of the old and new certificates.
- Subject Alternative Names: the Subject Alternative Name (SAN) is an X.509 extension that allows certificates to specify the DNS names of the servers instead of relying on the common name. This allows a certificate to be used for multiple host names and prevents some attacks. If you are connecting to 127.0.0.1 but the certificate is issued with a SAN for localhost, your client will note a mismatch between the certificate and the address used to reach the service. If you connect to a server as some.hostname.com, your client will fail to connect if a SAN for some.hostname.com is not defined, even if the Common Name is some.hostname.com.
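The three checks in the list above can be sketched as small helper functions. These are simplified illustrations with hypothetical names, not what any particular TLS library does: the wildcard rule only handles a single left-most "*" label, and the identifiers and dates are supplied by the caller rather than read from real certificates.

```python
import ipaddress
from datetime import datetime, timezone


def issuer_key_matches(child_aki: str, issuer_ski: str) -> bool:
    """Compare a certificate's Authority Key Identifier with its issuer's Subject Key Identifier."""
    def normalize(value: str) -> str:
        return value.replace(":", "").lower()
    return normalize(child_aki) == normalize(issuer_ski)


def is_within_validity(not_before: datetime, not_after: datetime, now=None) -> bool:
    """True when 'now' falls inside the certificate's validity window."""
    now = now or datetime.now(timezone.utc)
    return not_before <= now <= not_after


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def san_matches(target: str, dns_names, ip_addresses=()) -> bool:
    """Check the name or address used to connect against the certificate's SAN entries."""
    if _is_ip(target):
        # An IP address only matches IP-type SAN entries, never DNS names.
        wanted = ipaddress.ip_address(target)
        return any(ipaddress.ip_address(ip) == wanted for ip in ip_addresses)
    host = target.lower().rstrip(".").split(".")
    for name in dns_names:
        pattern = name.lower().rstrip(".").split(".")
        if len(pattern) != len(host):
            continue  # a wildcard stands in for exactly one label
        if pattern[0] == "*" and len(pattern) > 1:
            if pattern[1:] == host[1:]:
                return True
        elif pattern == host:
            return True
    return False


print(san_matches("www.google.com", ["*.google.com"]))
print(san_matches("127.0.0.1", ["localhost"]))
```

Note that the Common Name never enters san_matches, which mirrors the behavior described above: a matching Common Name does not help when the SAN is missing.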