Secrets, such as private keys or API tokens, are regularly leaked by developers in source code repositories. This often happens because DevOps tools require credentials to deliver the full benefits of automation. In this session, we look at ci_secrets, one tool for automating detection of leaked secrets in a DevOps-friendly way. We'll examine its strengths and weaknesses and discuss why it's important to understand the DevOps pipeline to effectively secure it.
Phillip Marlow is a cybersecurity and DevOps engineer. He helps organizations understand how to adopt DevOps practices to increase their security rather than sacrifice it in the name of speed. He is also the author of ci_secrets, a DevOps tool for detecting leaked credentials in source code.
One developer was working on code demonstrating DevOps security techniques. This person isn’t new to security: he has spoken at Black Hat and owns a security company. One night he got an email from Amazon saying that his AWS access key had been published on GitHub and that he needed to delete or rotate the key or his account would be suspended. It turned out he had inadvertently published his AWS keys when checking in a test file. These kinds of mistakes happen all the time. In his case, it cost him $500 in fraudulent charges. In other cases, it could be a lot worse (Mogull, 2014).
DevOps tools often require credentials in order to deliver the full benefits of automation. But if developers are untrained, careless, or simply not vigilant enough, these credentials can end up being leaked on public source code repositories. In this presentation, ci_secrets is shown to be a DevOps-friendly way to identify leaked secrets and notify the project team so that the incident can be handled appropriately.
DevOps is a great enabler for the business. Automatic deployments can deliver value faster than a traditional software rollout, and collaboration tools like Slack can deliver feedback to developers faster than traditional reports. However, for the software to do all these things for you, it needs the appropriate credentials. And if those credentials aren’t properly protected, they become another opportunity for the adversary to gain a foothold or pivot within your environment. For example, when AWS keys are leaked, it may result in fraudulent charges when a cryptominer is created in your environment. Or if your Slack API token is found, it might be used to read your company’s internal communication, perhaps even finding other credentials that have been shared between developers on Slack.
Many developers don’t know how to properly protect credentials within a DevOps pipeline. We know this because researchers keep finding these credentials stored on sites like GitHub or GitLab. The types of secrets that are being leaked vary from private key files to Slack API tokens to AWS keys, but they are often easy for researchers, or malicious actors, to find. In the recent study from NC State, researchers found the median time to discovery for a newly leaked secret was 20 seconds (Meli, McNiece, & Reaves, 2019). That doesn’t provide the developers and defenders much time to mitigate a leak before it potentially becomes a problem.
DevOps moves too quickly to have a human reviewer check for leaks. Several tools have been developed to identify leaked credentials. For example, truffleHog hunts for secrets stored within the repository history, and git-secrets is a git hook which prevents secrets from being published to a remote repository. However, to truly integrate with the DevOps workflow, project staff need a tool which integrates into the CI/CD pipeline. This ensures that running the tool can be enforced across every contributor rather than relying on each developer to maintain the tool within their environment. A tool that does not require any external resources, such as a persistent server or storage, is also easier to integrate and minimizes any cost concerns that might put it out of reach of small businesses or open source projects.
ci_secrets is a Python tool which searches git commit history for leaked secrets. It can be easily run and configured from the command line which makes it easy to integrate with existing CI systems. It also provides logging and exit codes in the standard format to fail a build and trigger standard notifications when a leaked secret is identified. Finally, it also uses a plugin framework to allow new methods for identifying secrets to be easily integrated when necessary. These plugins are reusable across ci_secrets and Yelp’s detect-secrets project.
ci_secrets scans the changes introduced by each commit to the repository individually. For each commit, it runs the selected plugins to identify various types of secrets including private key files and AWS keys. If a secret is discovered, it logs a hash of that secret and identifies the commit at which it was found. This makes responding to the incident easier as it identifies both what was compromised and where. If more than one secret is discovered, ci_secrets logs each and prints a cumulative count to ensure responders do not miss any leaked credentials. Finally, if anything was found, it will fail the build which sends standard notifications and alerts development teams to a problem.
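The per-commit loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the ci_secrets source: the two detector functions, their regular expressions, the choice of SHA-256 for the logged hash, and the (commit ID, added lines) input format are all assumptions made for the example.

```python
import hashlib
import re
from typing import Callable, List, Tuple

# Hypothetical detectors standing in for plugins. Each takes one added line
# and returns the secrets it recognizes in that line.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
PRIVATE_KEY_HEADER = re.compile(
    r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
)


def find_aws_keys(line: str) -> List[str]:
    return AWS_KEY_ID.findall(line)


def find_private_keys(line: str) -> List[str]:
    return PRIVATE_KEY_HEADER.findall(line)


PLUGINS: List[Callable[[str], List[str]]] = [find_aws_keys, find_private_keys]


def scan_commits(commits: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """Scan each commit's added lines; return (commit_id, sha256_of_secret) pairs."""
    findings = []
    for commit_id, added_lines in commits:
        for line in added_lines:
            for plugin in PLUGINS:
                for secret in plugin(line):
                    # Log a hash, never the secret itself (SHA-256 is an assumption).
                    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
                    findings.append((commit_id, digest))
    return findings


def build_exit_code(findings: List[Tuple[str, str]]) -> int:
    """A non-zero exit code is what makes a CI system mark the build as failed."""
    return 1 if findings else 0


# Illustrative history using AWS's documented example key ID.
history = [
    ("1cb", ["print('hello')"]),
    ("de9", ["aws_access_key_id = AKIAIOSFODNN7EXAMPLE"]),
]
findings = scan_commits(history)
for commit_id, digest in findings:
    print(f"secret with sha256 {digest[:12]}... found in commit {commit_id}")
print(f"{len(findings)} secret(s) found, exit code {build_exit_code(findings)}")
```

Logging a hash rather than the secret keeps the build log from becoming a second leak, while still letting responders confirm which credential was exposed.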
To test that ci_secrets actually finds leaked credentials, three common scenarios were tested:
Adding an AWS credentials file to the repository and publishing it.
Adding an AWS credentials file to the repository and publishing it in the middle of a sequence of commits.
Adding an AWS credentials file to the repository, removing it in a new commit, and publishing those commits.
For each test, the build resulting from the publication of leaked credentials was expected to fail when running ci_secrets. A failed build indicated that ci_secrets had successfully identified the leaked credential. In addition, a merge request that contained a leaked credential was expected to fail, also indicating that ci_secrets had successfully identified the leak in the branch to be merged.
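The third scenario is the one that a scan of only the latest file contents would miss, since the credentials file no longer exists at the tip of the branch even though it remains in the history. The sketch below illustrates the difference under simplifying assumptions: commits are modelled as hypothetical (ID, added lines, removed lines) tuples rather than real git objects, and a single AWS key ID pattern stands in for the configured plugins.

```python
import re
from typing import List, Tuple

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
SECRET_LINE = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"

# Hypothetical history: (commit_id, lines_added, lines_removed).
Commit = Tuple[str, List[str], List[str]]
commits: List[Commit] = [
    ("c1", ["import os"], []),
    ("c2", [SECRET_LINE], []),   # credentials file added
    ("c3", [], [SECRET_LINE]),   # credentials file removed in a later commit
]


def final_tree(history: List[Commit]) -> List[str]:
    """Replay the history and return the lines present at the branch tip."""
    lines: List[str] = []
    for _, added, removed in history:
        lines.extend(added)
        for line in removed:
            lines.remove(line)
    return lines


def tree_has_secret(lines: List[str]) -> bool:
    """Scan only the current contents, as a naive check would."""
    return any(AWS_KEY_ID.search(line) for line in lines)


def commits_with_secrets(history: List[Commit]) -> List[str]:
    """Scan what each commit added, one commit at a time."""
    return [
        commit_id
        for commit_id, added, _ in history
        if any(AWS_KEY_ID.search(line) for line in added)
    ]


print("tip contains secret:", tree_has_secret(final_tree(commits)))
print("commits that introduced a secret:", commits_with_secrets(commits))
```

Scanning each commit's changes individually is what lets the removed file still be reported, along with the commit that introduced it.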
Consider the commit history presented. A new branch, Branch C, has been published and its most recent commit is 22a. If the CI system does not tell you the source of the new branch, and therefore the most recently scanned commit, how can ci_secrets determine this? It might look at de9 since it is the most recent ancestor that has a named branch, but this is not guaranteed to make it the start of the branch. It is possible that both commits de9 and 22a were published at the same time and the true source is actually commit 42b. Unless the CI system or the developer specifies otherwise, ci_secrets has no way of knowing the latest scanned commit, so it will scan only one commit back. This is usually fine since a new branch will often start with only a single commit, but this is a limitation of the tool and it will print a warning message if this case is encountered.
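The fallback described above can be sketched as a small commit-selection function. This is an illustration rather than the tool's actual code: the function name, the parent map, and the commit IDs are hypothetical, and "one commit back" is interpreted here as scanning only the head commit.

```python
import warnings
from typing import Dict, List, Optional


def commits_to_scan(
    parents: Dict[str, Optional[str]],
    head: str,
    since: Optional[str] = None,
) -> List[str]:
    """Return the commits to scan, newest first.

    `since` is the most recently scanned commit and is excluded from the result.
    """
    if since is None:
        # Without a known starting point the branch origin is ambiguous,
        # so fall back to the head commit only and say so.
        warnings.warn(
            "latest scanned commit unknown; scanning only the most recent commit"
        )
        return [head]

    selected = []
    current: Optional[str] = head
    while current is not None and current != since:
        selected.append(current)
        current = parents.get(current)
    return selected


# Hypothetical linear history: each commit maps to its parent.
history = {"d4": "c3", "c3": "b2", "b2": "a1", "a1": None}

print(commits_to_scan(history, "d4", since="b2"))  # start point supplied by CI
print(commits_to_scan(history, "d4"))              # start point unknown
```

If a new branch is published with several unscanned commits and no starting point is supplied, everything behind the head is skipped, which is the gap the warning is meant to flag.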
In addition to checking that the credentials were correctly identified during their publication, each test scenario also included a test to see if the credential was identified during the merge or pull request build. This is important as a backup if the leak is not resolved immediately upon publication. It is also helpful to surface the issue at that time because it is a common time for a human reviewer to see and attempt to resolve any issues before the request is approved. This also helps to mitigate the issue with identifying the source of the branch discussed on the previous slide by double checking all the commits in a branch during the merge or pull request.
Consider the case where Branch B is being merged into Branch A. For this build, ci_secrets will scan commit de9, which is the latest commit being scanned, and commit 1cb, its parent. It will stop at commit f2c, which is the most recent common ancestor of both branches. It is assumed that it and any previous commits were scanned when they were published to Branch A and don’t represent any new changes. It is also assumed that commit 42b was scanned when it was published to Branch A. As you can see, this double checking is kept to the minimum number of commits to reduce the time to discovery of any leaked credentials. It also prevents the tool from becoming an impediment to the developer’s workflow. However, to maintain complete coverage of the repository, the tool must be run for every published commit or else some changes, and therefore leaked credentials, may be missed.
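The selection of commits for a merge build can be sketched as a walk over the commit graph: collect everything reachable from the source branch that is not already reachable from the target branch. The parent links below are reconstructed from the description above, and the assumption that 42b's parent is f2c is mine; the function names are hypothetical and this is not the ci_secrets implementation.

```python
from typing import Dict, List, Set

# Commit graph as described: Branch B is de9 -> 1cb -> f2c,
# Branch A is 42b -> f2c (the 42b -> f2c link is an assumption).
PARENTS: Dict[str, List[str]] = {
    "de9": ["1cb"],
    "1cb": ["f2c"],
    "42b": ["f2c"],
    "f2c": [],
}


def ancestors(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """Return `start` and every commit reachable from it."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commits_for_merge(
    parents: Dict[str, List[str]], source_head: str, target_head: str
) -> List[str]:
    """Commits on the source branch that the target branch has not seen."""
    already_scanned = ancestors(parents, target_head)
    result: List[str] = []
    visited: Set[str] = set()
    stack = [source_head]
    while stack:
        commit = stack.pop()
        if commit in already_scanned or commit in visited:
            continue  # stop at the common ancestor and anything behind it
        visited.add(commit)
        result.append(commit)
        stack.extend(parents.get(commit, []))
    return result


print(commits_for_merge(PARENTS, source_head="de9", target_head="42b"))
```

This is the same set of commits that git's two-dot range selects, for example `git log A..B`, which is why the scan stops at f2c without needing to be told where the branch began.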
ci_secrets is not a silver bullet for the issue of leaked credentials in source code. It is an excellent detection tool, but does nothing to prevent secrets from being leaked in the first place. At this time it is also only able to scan Git repositories, so users of other source control tools such as Subversion or Mercurial can’t use ci_secrets. While the use of a plugin framework is an overall strength because it allows adding new detection methods easily, ci_secrets does nothing to augment these detection abilities. So if a plugin has not been created and configured for the type of secret a user wants to detect, it won’t be found. Finally, as was discussed earlier, it is difficult to identify the beginning of a branch. Because of this, there are edge cases where ci_secrets may not find a leaked credential. Despite this, ci_secrets provides an improvement on the current standard practice of training developers to not commit secrets.
Using ci_secrets is fairly straightforward and it can be easily added to a CI script file. The two most important configuration options are the --since flag and the --includesMergeCommit flag. The --since flag identifies the most recently scanned commit. This ensures that ci_secrets scans far enough back into the history without re-scanning commits, which would duplicate work and extend the run time. This commit ID is often provided by the CI system, although it may be named differently depending on the specific system. The --includesMergeCommit flag is needed only when a CI system creates a prospective merge commit when running a build for a pull request. If it is present, ci_secrets does not stop scanning at the merge commit if it is also the most recent commit. Finally, ci_secrets also supports standard logging levels if needed to verify it is running as expected. It will log found secrets at all log levels.
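As a rough sketch of how a CI script might assemble these options, the helper below builds an argument list from the CI environment. Several things here are assumptions rather than documented behaviour: that the executable is invoked as ci_secrets, that --since takes the commit ID as the following argument, and that the previous commit comes from GitLab's CI_COMMIT_BEFORE_SHA variable (other CI systems name it differently). The helper itself is hypothetical and only constructs the command; it does not run it.

```python
import os
import shlex
from typing import List, Mapping


def build_command(env: Mapping[str, str], is_merge_build: bool = False) -> List[str]:
    """Assemble a ci_secrets invocation from CI-provided environment variables."""
    command = ["ci_secrets"]  # assumed executable name

    # Assumption: GitLab CI exposes the previous commit as CI_COMMIT_BEFORE_SHA
    # and reports an all-zero SHA when there is no previous commit.
    since = env.get("CI_COMMIT_BEFORE_SHA")
    if since and set(since) != {"0"}:
        command += ["--since", since]  # assumed "flag value" argument form

    # Only needed when the CI system builds a prospective merge commit.
    if is_merge_build:
        command.append("--includesMergeCommit")
    return command


# Show the command this environment would produce.
print(shlex.join(build_command(os.environ)))
```

When no usable previous commit is available, the sketch simply omits --since, which leaves the tool to apply its own fallback and warning.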
ci_secrets reads the configuration for which plugins to use from a file named .ci_secrets.yml. This file lists which plugins to use and any initialization parameters required. This allows users to configure different settings depending on the needs of the project. For example, the AWS plugin might be excluded by a project that does not use AWS.
ci_secrets and usage examples are available on both GitLab and GitHub. Published releases are available on PyPI for easy download and use.