Secrets Scanning: Detecting API Keys, Tokens, and Passwords in Code
A comprehensive guide to detecting secrets in source code and git history using detect-secrets, GitHub Secret Scanning, GitGuardian, and TruffleHog, plus a practical rotation workflow when a secret is confirmed exposed.
Hardcoded secrets in source code and git history are one of the most common and most damaging security findings. The 2023 GitGuardian State of Secrets Sprawl report found 10 million secrets exposed in public GitHub repositories in a single year — a number that represents only the public surface. Private repositories, internal tooling, and developer laptops contain a much larger volume of exposed credentials that never show up in statistics.
The challenge with secrets is that they are easy to introduce accidentally and extraordinarily difficult to remove once committed — git history persists even after file deletion, and the secret may already have been cloned, forked, or mirrored. This guide covers the tools for detecting secrets before and after they reach a repository, and the process for responding when detection fails.
Why Secrets End Up in Code
Understanding the common paths to accidental secret exposure helps you target your defenses:
Convenience during development: Developers hardcode credentials to avoid setting up proper environment variable injection during local development. The .env file gets added to .gitignore, but the secret-containing config file does not.
Copy-paste from documentation: Internal documentation, Confluence pages, and Slack messages frequently contain real credentials in code examples. These get copied into codebases.
Test fixtures and seed data: Real credentials are used in test fixtures or database seed files because they are "just for testing." In some cases the "test" database turns out to be the production database.
Debugging: console.log(process.env) or a debugging snippet that prints configuration is committed and forgotten.
Misconfigured .gitignore: .env.local is in .gitignore but .env.production is not, or the .gitignore entry is added after the secrets file has already been tracked. Git keeps tracking a file that was committed before it was ignored.
CI/CD pipelines: Secrets are printed in CI logs for debugging and then stored in log archives accessible to all team members.
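As a rough illustration of the .gitignore failure mode above, the following sketch flags tracked files whose names suggest they hold credentials. The pattern list and the function name are hypothetical choices for this example, not part of any tool discussed in this guide.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical filename patterns that often hold credentials; tune for your stack.
RISKY_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "id_rsa", "credentials.json"]
# Template files that are normally safe to commit.
SAFE_NAMES = {".env.example", ".env.sample", ".env.template"}


def find_risky_tracked_files(tracked_paths):
    """Return the tracked paths whose file name matches a risky pattern."""
    risky = []
    for path in tracked_paths:
        name = PurePosixPath(path).name
        if name in SAFE_NAMES:
            continue
        if any(fnmatch(name, pattern) for pattern in RISKY_PATTERNS):
            risky.append(path)
    return risky
```

The input is meant to be the output of git ls-files split into lines, so the check covers files that are still tracked even though a later .gitignore entry names them.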
detect-secrets (Yelp)
detect-secrets is an open-source Python tool designed for use as a pre-commit hook and in CI pipelines. It maintains a baseline file (.secrets.baseline) that records known false positives, allowing the tool to alert only on new secrets.
Installation:
pip install detect-secrets
# Or via pre-commit
pip install pre-commit
Initialize a baseline (run this in an existing repository to acknowledge existing findings):
detect-secrets scan > .secrets.baseline
Review the baseline file and audit each finding. For confirmed false positives (test tokens, placeholder values), mark them as false positives in the baseline:
detect-secrets audit .secrets.baseline
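To see which baseline entries still need a verdict, a short script can read the baseline directly. This is a sketch that assumes the baseline is JSON with a "results" object keyed by filename, and that the audit step records its verdict in an "is_secret" field; check both assumptions against the baseline your installed version produces.

```python
import json


def unaudited_findings(baseline_text):
    """Return (filename, line_number, type) for entries with no audit verdict."""
    baseline = json.loads(baseline_text)
    pending = []
    for filename, findings in sorted(baseline.get("results", {}).items()):
        for finding in findings:
            # Assumption: `detect-secrets audit` stores its verdict as "is_secret".
            if "is_secret" not in finding:
                pending.append(
                    (filename, finding.get("line_number"), finding.get("type"))
                )
    return pending
```

Calling unaudited_findings(open(".secrets.baseline").read()) before merging a baseline change gives reviewers a list of findings nobody has looked at yet.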
Pre-commit hook configuration (.pre-commit-config.yaml):
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: |
          (?x)^(
            package-lock\.json|
            yarn\.lock|
            \.secrets\.baseline
          )$
Install the hooks:
pre-commit install
After installation, every git commit runs detect-secrets. If a new secret is found that is not in the baseline, the commit is blocked:
Detect secrets...........................................................Failed
- hook id: detect-secrets
- exit code: 1
ERROR: Potential secrets about to be committed to git repo!
Secret Type: Secret Keyword
Location: src/config.ts:15
Please follow the steps below to remove the secret:
1. Run 'git checkout -- src/config.ts'
2. If it is a false positive, add it to .secrets.baseline with 'detect-secrets audit .secrets.baseline'
CI integration:
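# .github/workflows/secrets-scan.yml (step fragment)
- name: Check for new secrets
  run: |
    pip install detect-secrets
    # detect-secrets-hook exits non-zero when it finds a secret that is not in
    # the baseline; `detect-secrets scan --baseline` only updates the baseline.
    if ! git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline; then
      echo "New secrets detected!"
      exit 1
    fi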
Supported detectors: detect-secrets ships with detectors for AWS keys, Azure storage keys, base64 high-entropy strings, basic auth passwords in URLs, Cloudant and IBM Cloud credentials, GitHub tokens, hex high-entropy strings, JWTs, keywords (password, secret, api_key, etc.), private keys (PEM), Slack tokens, Stripe keys, and more.
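The high-entropy detectors rest on a simple idea: randomly generated keys use their alphabet more evenly than ordinary identifiers or words. A minimal sketch of that measurement follows; the threshold and minimum length are illustrative values chosen for this example, not the tool's defaults.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_high_entropy(token, threshold=4.0, min_length=20):
    """Flag long tokens whose character distribution looks random.

    `threshold` and `min_length` are illustrative assumptions.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

Entropy alone produces many false positives (hashes, UUIDs, minified assets), which is why it is combined with keyword and provider-specific pattern detectors.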
TruffleHog
TruffleHog is designed for deep git history scanning — it traverses every commit in a repository's history and scans all changed files for secrets. It is particularly valuable for finding secrets that were committed and "deleted" (deletion in git does not remove from history).
TruffleHog v3 uses a combination of regular expression matching, Shannon entropy analysis, and direct verification against provider APIs. Verification is TruffleHog's key differentiator — it actively tests detected credentials to confirm they are valid and active, drastically reducing false positives.
Installation:
# Homebrew
brew install trufflehog
# Docker
docker pull trufflesecurity/trufflehog:latest
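# Go (build from source; `go install ...@latest` can fail because the module's go.mod uses replace directives)
git clone https://github.com/trufflesecurity/trufflehog.git
cd trufflehog && go install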
Scanning a local repository:
# Scan entire git history
trufflehog git file://. --only-verified --json
# Scan with specific branch
trufflehog git file://. --branch main --only-verified
# Scan a remote repository
trufflehog github --repo https://github.com/your-org/your-repo --only-verified
The --only-verified flag returns only secrets that TruffleHog has confirmed are active by calling the provider's API. This is the right default for finding immediately actionable issues.
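When --json is used, the output can be post-processed to build a summary for triage. The sketch below assumes one JSON object per line with "Verified" and "DetectorName" fields; confirm the field names against the output of your TruffleHog version before relying on them.

```python
import json
from collections import Counter


def verified_findings(json_lines):
    """Yield parsed findings whose "Verified" field is true."""
    for line in json_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and non-JSON log output
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Assumption: verification status is reported in a "Verified" field.
        if record.get("Verified") is True:
            yield record


def count_by_detector(json_lines):
    """Count verified findings per detector without echoing secret values."""
    return Counter(
        record.get("DetectorName", "unknown")
        for record in verified_findings(json_lines)
    )
```

The summary deliberately reports only detector names and counts, so the raw secret never ends up in a ticket or chat message.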
GitHub Actions integration:
# .github/workflows/trufflehog.yml
name: TruffleHog Secret Scan
on:
  push:
    branches: [ main, develop ]
  pull_request:
jobs:
  trufflehog:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Fetch full history for complete scan
      - name: TruffleHog OSS
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          head: HEAD
          extra_args: --only-verified
When run on a PR, TruffleHog scans only the diff between the base branch and HEAD, making it fast enough for CI use.
Scanning S3 buckets and other sources: TruffleHog supports scanning beyond git repositories:
# Scan an S3 bucket
trufflehog s3 --bucket my-backup-bucket --only-verified
# Scan Syslog
trufflehog syslog --address tcp://127.0.0.1:514
# Scan GitHub organization
trufflehog github --org your-org --only-verified
GitGuardian
GitGuardian is a commercial SaaS product built specifically for secrets detection at scale. It monitors your GitHub (or GitLab/Bitbucket) organization in real time, detecting secrets in new commits within seconds of push. Its key advantages over open-source tools are:
- Real-time monitoring: Detection within seconds of a push, before the developer has moved on
- Historical scanning: Scans your entire git history on first integration
- Developer notifications: Alerts the committing developer directly via email or Slack
- Incident management: Tracks each exposed secret as an incident, with status (new, assigned, resolved)
- False positive management: Extensive pattern library tuned for low false positive rates
- Internal secret detection: Detects generic high-entropy strings and patterns beyond named provider tokens
Integration: Connect via GitHub App. The app requests read access to your repositories and registers a webhook to receive push events. No sensitive permissions are required.
The GitGuardian workflow for a new finding:
- Developer pushes a commit containing an AWS access key
- GitGuardian detects it within seconds
- The committing developer receives an email: "Potential secret detected in commit abc123"
- Security team receives a Slack notification
- The incident dashboard shows the finding with: severity, file path, commit, which developer, whether the key is valid (live validation against AWS)
- Developer rotates the key, removes it from the repository, cleans history
- Incident marked as resolved
GitGuardian for internal monitoring: The self-hosted option (gitguardian.com/vms) allows scanning internal code, documents, and logs that cannot be sent to a third party. Appropriate for highly regulated environments.
GitHub Secret Scanning (Built-in)
For repositories on GitHub, the platform's native Secret Scanning is the lowest-effort baseline. It scans every push and historical content in real time using patterns provided by GitHub and 200+ partner providers. Partners (including AWS, Google, GitHub, Stripe, Twilio) receive alerts when their token format is detected and can immediately invalidate the exposed credential before it is exploited.
Details on configuration are covered in the GitHub Security Features article. For this guide, the key operational point is: enable it for all repositories, period. For private repositories this requires GitHub Advanced Security, but the cost is justified by the continuous protection it provides.
Handling a Confirmed Exposed Secret
When a secret is confirmed exposed (either detected as active/verified or observed in logs with no evidence of compromise yet), the response must be immediate:
Step 1: Rotate immediately (this takes priority over everything else)
The secret is exposed from the moment it was first committed, not from when you discovered it. Minutes matter. Rotate the credential before investigating:
- AWS: IAM > Users > Security credentials > Access keys > Deactivate and create new
- GitHub: Developer settings > Personal access tokens > Revoke
- Stripe: Dashboard > Developers > API keys > Roll key
- Generic: Call the service's API or use its dashboard to invalidate the key
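For AWS, the rotation can be scripted so it takes seconds rather than a console session. This sketch takes an IAM client as a parameter (for example boto3.client("iam"), which is an assumption about your tooling) and uses the IAM UpdateAccessKey and CreateAccessKey operations; the function name is hypothetical.

```python
def rotate_access_key(iam, user_name, exposed_key_id):
    """Deactivate the exposed key, then issue a replacement.

    `iam` is expected to behave like a boto3 IAM client.
    """
    # Deactivate first so the leaked key stops working immediately.
    iam.update_access_key(
        UserName=user_name, AccessKeyId=exposed_key_id, Status="Inactive"
    )
    # IAM allows at most two access keys per user, so this call fails if the
    # user already has two; delete an unused key first in that case.
    response = iam.create_access_key(UserName=user_name)
    new_key = response["AccessKey"]
    return new_key["AccessKeyId"], new_key["SecretAccessKey"]
```

Deactivating rather than deleting keeps the old key ID available for the log review in the next step. Store the returned secret in your secrets manager, never in the repository.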
Step 2: Determine exposure scope
Review access logs for the exposed credential for the period between the first commit date and rotation:
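# AWS CloudTrail: find API calls using the exposed access key
# Note: lookup-events only covers the last 90 days of management events;
# set the time range to the window between first commit and rotation.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIAIOSFODNN7EXAMPLE \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-12-31T23:59:59Z \
  --max-results 50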
For other services, check their activity logs or audit trails. Look for:
- Unexpected geographic locations
- Unusual access times
- Actions that your application should not be performing
- Data exports or bulk reads
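The first pass over those logs can be automated. The sketch below assumes events shaped like the CloudTrail lookup response as returned by boto3 (an "EventName", an "EventTime" datetime, and a "CloudTrailEvent" JSON string containing "sourceIPAddress"); the allowlists of expected actions and IP addresses are hypothetical inputs you would supply.

```python
import json
from datetime import datetime


def suspicious_events(events, window_start: datetime, window_end: datetime,
                      expected_actions, expected_ips):
    """Return (event name, source IP, reasons) for events that look unexpected.

    window_start/window_end must match the timezone-awareness of EventTime.
    """
    flagged = []
    for event in events:
        when = event["EventTime"]
        if not (window_start <= when <= window_end):
            continue  # outside the exposure window
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        source_ip = detail.get("sourceIPAddress")
        reasons = []
        if event["EventName"] not in expected_actions:
            reasons.append("unexpected action")
        if source_ip not in expected_ips:
            reasons.append("unexpected source IP")
        if reasons:
            flagged.append((event["EventName"], source_ip, reasons))
    return flagged
```

An empty result is not proof that the key was unused. It only means nothing fell outside the allowlists you supplied, so treat it as a triage aid, not a verdict.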
Step 3: Remove from git history
Removing from git history is important to prevent future exposure, but it does not protect against anyone who cloned the repository during the exposure window. History rewriting is for hygiene, not security remediation.
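# Using git-filter-repo (preferred over BFG); run this in a fresh clone
pip install git-filter-repo
# Replace all occurrences of the secret value
git filter-repo \
  --replace-text <(echo "ACTUAL_SECRET_VALUE==>PLACEHOLDER")
# git-filter-repo removes the origin remote as a safety measure; add it back
git remote add origin REPOSITORY_URL
# Force-push the rewritten history, including tags
git push --force --all origin
git push --force --tags origin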
After history rewriting:
- Notify all repository collaborators — their local clones contain the old history and must be re-cloned or carefully rebased
- If the repository is public, contact GitHub support to purge cached views of the file containing the secret
- If the repository was forked, the fork retains the old history — contact GitHub to remove publicly accessible forks if necessary
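When several secrets need scrubbing, passing them on the command line leaves them in shell history. One alternative is to generate the replacement file programmatically. This sketch writes one literal==>placeholder rule per line, which is the format the --replace-text option reads; the function name and default placeholder are choices made for this example.

```python
from pathlib import Path

# Prefixes that git-filter-repo would treat as rule types rather than literal text.
RULE_PREFIXES = ("regex:", "glob:", "literal:")


def write_replacements(secrets, path, placeholder="***REMOVED***"):
    """Write literal replacement rules for git filter-repo --replace-text."""
    lines = []
    for secret in secrets:
        if (not secret or "\n" in secret or "==>" in secret
                or secret.startswith(RULE_PREFIXES)):
            raise ValueError("secret cannot be expressed as a plain literal rule")
        lines.append(f"{secret}==>{placeholder}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The resulting file is passed as git filter-repo --replace-text replacements.txt. It contains the raw secrets, so keep it outside the repository and delete it once the rewrite is done.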
Step 4: Post-incident controls
After an incident, implement controls to prevent recurrence:
- Add detect-secrets as a pre-commit hook for all developers
- Enable Push Protection on GitHub
- Add the specific secret type to your GitGuardian or TruffleHog monitoring
- Run a full history scan to identify any other secrets that may be lurking
- Add a CI check that blocks merges if new secrets are detected
The goal is for secret exposure incidents to be caught within minutes of introduction — ideally at pre-commit time, and at worst by automated monitoring within seconds of push. Manual discovery weeks or months after the fact should not be how you learn about exposed credentials.