Secrets Scanning: Detecting API Keys, Tokens, and Passwords in Code

Hardcoded secrets in source code and git history are one of the most common and most damaging security findings. The 2023 GitGuardian State of Secrets Sprawl report found 10 million secrets exposed in public GitHub repositories in a single year — a number that represents only the public surface. Private repositories, internal tooling, and developer laptops contain a much larger volume of exposed credentials that never show up in statistics.

The challenge with secrets is that they are easy to introduce accidentally and extraordinarily difficult to remove once committed — git history persists even after file deletion, and the secret may already have been cloned, forked, or mirrored. This guide covers the tools for detecting secrets before and after they reach a repository, and the process for responding when detection fails.

Why Secrets End Up in Code

Understanding the common paths to accidental secret exposure helps you target your defenses:

Convenience during development: Developers hardcode credentials to avoid setting up proper environment variable injection during local development. The .env file gets added to .gitignore, but the secret-containing config file does not.

Copy-paste from documentation: Internal documentation, Confluence pages, and Slack messages frequently contain real credentials in code examples. These get copied into codebases.

Test fixtures and seed data: Real credentials are used in test fixtures or database seed files because they are "just for testing," even when the "test" database is actually production.

Debugging: console.log(process.env) or a debugging snippet that prints configuration is committed and forgotten.

Misconfigured .gitignore: .env.local is in .gitignore but .env.production is not. Or the .gitignore is added after the secrets file has already been tracked.

CI/CD pipelines: Secrets are printed in CI logs for debugging and then stored in log archives accessible to all team members.
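
One way to close off the first of these paths is to read credentials from the environment and fail fast when they are missing, so there is never a reason to paste a value into a tracked file. The following is a minimal Python sketch under that assumption; the helper name require_env and the variable PAYMENTS_API_KEY are hypothetical and used only for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_env(name, environ=None):
    """Return the value of environment variable `name`, or fail loudly.

    `environ` defaults to os.environ; passing a dict makes the helper testable.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(
            f"{name} is not set; export it or load it from your secrets manager"
        )
    return value


if __name__ == "__main__":
    # PAYMENTS_API_KEY is a hypothetical variable name.
    try:
        key = require_env("PAYMENTS_API_KEY")
        # Print only the length, never the value itself.
        print("loaded key of length", len(key))
    except MissingSecretError as exc:
        print("configuration error:", exc)
```

Failing at startup, rather than falling back to a default value, removes the temptation to commit a "temporary" fallback credential.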

detect-secrets (Yelp)

detect-secrets is an open-source Python tool designed for use as a pre-commit hook and in CI pipelines. It maintains a baseline file (.secrets.baseline) that records known false positives, allowing the tool to alert only on new secrets.

Installation:

pip install detect-secrets

# pre-commit is needed to run detect-secrets as a git hook
pip install pre-commit

Initialize a baseline (run this in an existing repository to acknowledge existing findings):

detect-secrets scan > .secrets.baseline

Review the baseline file and audit each finding. For confirmed false positives (test tokens, placeholder values), mark them as false positives in the baseline:

detect-secrets audit .secrets.baseline

Pre-commit hook configuration (.pre-commit-config.yaml):

repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: |
          (?x)^(
            package-lock\.json|
            yarn\.lock|
            \.secrets\.baseline
          )$

Install the hooks:

pre-commit install

After installation, every git commit runs detect-secrets. If a new secret is found that is not in the baseline, the commit is blocked:

Detect secrets...........................................................Failed
- hook id: detect-secrets
- exit code: 1

ERROR: Potential secrets about to be committed to git repo!

Secret Type: Secret Keyword
Location:    src/config.ts:15

Please follow the steps below to remove the secret:
1. Run 'git checkout -- src/config.ts'
2. If it is a false positive, add it to .secrets.baseline with 'detect-secrets audit .secrets.baseline'
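
The baseline comparison described above can be sketched in a few lines of Python. This is an illustration of the idea, not detect-secrets' actual implementation. It assumes the baseline and a fresh scan are both JSON documents with a top-level "results" object mapping file names to lists of findings, each carrying "type", "hashed_secret", and "line_number" fields; check those names against your own .secrets.baseline before relying on them.

```python
def new_findings(baseline, scan):
    """Return (filename, type, line_number) for findings absent from the baseline.

    Findings are matched on (filename, hashed_secret), so a known finding that
    merely moved to another line is not reported again.
    """
    known = {
        (filename, item["hashed_secret"])
        for filename, items in baseline.get("results", {}).items()
        for item in items
    }
    fresh = []
    for filename, items in scan.get("results", {}).items():
        for item in items:
            if (filename, item["hashed_secret"]) not in known:
                fresh.append((filename, item["type"], item.get("line_number")))
    return sorted(fresh, key=lambda f: (f[0], f[2] or 0))


if __name__ == "__main__":
    # Hypothetical data shaped like the assumed baseline format.
    baseline = {"results": {"src/config.ts": [
        {"type": "Secret Keyword", "hashed_secret": "aaa", "line_number": 3},
    ]}}
    scan = {"results": {"src/config.ts": [
        {"type": "Secret Keyword", "hashed_secret": "aaa", "line_number": 4},
        {"type": "AWS Access Key", "hashed_secret": "bbb", "line_number": 15},
    ]}}
    for filename, kind, line in new_findings(baseline, scan):
        print(f"NEW: {kind} at {filename}:{line}")
```

Only the hash of each secret is compared, which is why the baseline file can be committed without re-exposing the values it records.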

CI integration:

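# .github/workflows/secrets-scan.yml
- name: Check for new secrets
  run: |
    pip install detect-secrets
    # "detect-secrets scan --baseline" only updates the baseline and exits 0.
    # detect-secrets-hook exits non-zero when it finds a secret that is not
    # in the baseline, which fails this step.
    git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline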

Supported detectors: detect-secrets ships with detectors for AWS keys, Azure storage keys, base64 high-entropy strings, basic auth passwords in URLs, Cloudant credentials, GitHub tokens, hex high-entropy strings, JWTs, keywords (password, secret, api_key, etc.), private keys (PEM), Slack tokens, Stripe keys, and more.
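
To make the "high-entropy string" detectors concrete, here is a small Python sketch of Shannon entropy scoring. It illustrates the general technique rather than detect-secrets' own code; the function names, the 20-character minimum, and the 4.5 threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_high_entropy(token, threshold=4.5, min_length=20):
    """Flag tokens that are both long and close to random.

    `threshold` and `min_length` are illustrative values, not tool defaults
    you should rely on.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold


if __name__ == "__main__":
    for candidate in ("password", "a" * 24,
                      "abcdefghijklmnopqrstuvwxyzABCDEF0123456789"):
        print(candidate, round(shannon_entropy(candidate), 2),
              looks_high_entropy(candidate))
```

Entropy alone produces many false positives (hashes, UUIDs, minified assets), which is why tools pair it with keyword context and a baseline of reviewed findings.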

TruffleHog

TruffleHog is designed for deep git history scanning — it traverses every commit in a repository's history and scans all changed files for secrets. It is particularly valuable for finding secrets that were committed and "deleted" (deletion in git does not remove from history).

TruffleHog v3 uses a combination of regular expression matching, Shannon entropy analysis, and direct verification against provider APIs. Verification is TruffleHog's key differentiator — it actively tests detected credentials to confirm they are valid and active, drastically reducing false positives.
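
The detect-then-verify idea can be sketched in Python as follows. This is not TruffleHog's code: the Finding type, the scan_text function, and the verify callback are hypothetical, and the verifier is a stub where a real tool would make an authenticated call to the provider. The pattern shown matches the common "AKIA" prefix of AWS access key IDs.

```python
import re
from typing import NamedTuple, Optional

# Long-term AWS access key IDs commonly start with "AKIA" followed by
# 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


class Finding(NamedTuple):
    detector: str
    value: str
    verified: Optional[bool]  # None means verification was not attempted


def scan_text(text, verify=None):
    """Find candidate AWS key IDs and optionally pass each to `verify`.

    `verify` is a hypothetical callback returning True when the credential
    is live. Here it is supplied by the caller so the sketch stays offline.
    """
    findings = []
    for match in AWS_ACCESS_KEY_ID.finditer(text):
        candidate = match.group(1)
        verified = verify(candidate) if verify is not None else None
        findings.append(Finding("aws_access_key_id", candidate, verified))
    return findings


if __name__ == "__main__":
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"  # documentation example key'
    # Stub verifier: treat every candidate as inactive.
    for finding in scan_text(sample, verify=lambda value: False):
        print(finding)
```

Separating pattern matching from verification is what lets a scanner report "verified" and "unverified" findings differently.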

Installation:

# Homebrew
brew install trufflehog

# Docker
docker pull trufflesecurity/trufflehog:latest

# Go (build from a clone of the repository)
git clone https://github.com/trufflesecurity/trufflehog.git
cd trufflehog && go install

Scanning a local repository:

# Scan entire git history
trufflehog git file://. --only-verified --json

# Scan with specific branch
trufflehog git file://. --branch main --only-verified

# Scan a remote repository
trufflehog github --repo https://github.com/your-org/your-repo --only-verified

The --only-verified flag returns only secrets that TruffleHog has confirmed are active by calling the provider's API. This is the right default for finding immediately actionable issues.
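
When --json is used, the output can be post-processed with a short script. The Python sketch below counts verified findings per detector. It assumes each finding is a single JSON object per line with "DetectorName" and "Verified" fields; confirm those field names against the output of your TruffleHog version. The function name summarize_verified is hypothetical.

```python
import json


def summarize_verified(lines):
    """Count verified findings per detector from JSON-lines scanner output.

    Lines that are empty, not valid JSON, or lack the assumed "DetectorName"
    field (for example log messages) are skipped.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or "DetectorName" not in record:
            continue
        if record.get("Verified"):
            name = record["DetectorName"]
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines shaped like the assumed output format.
    sample = [
        '{"DetectorName": "AWS", "Verified": true}',
        '{"DetectorName": "Github", "Verified": false}',
        "not json at all",
    ]
    print(summarize_verified(sample))
```

In practice the scan output would be saved to a file and the function called with the open file handle, since iterating a file yields one line at a time.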

GitHub Actions integration:

# .github/workflows/trufflehog.yml
name: TruffleHog Secret Scan

on:
  push:
    branches: [ main, develop ]
  pull_request:

jobs:
  trufflehog:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        fetch-depth: 0  # Fetch full history for complete scan

    - name: TruffleHog OSS
      uses: trufflesecurity/trufflehog@main
      with:
        path: ./
        base: ${{ github.event.repository.default_branch }}
        head: HEAD
        extra_args: --only-verified

When run on a PR, TruffleHog scans only the diff between the base branch and HEAD, making it fast enough for CI use.

Scanning S3 buckets and other sources: TruffleHog supports scanning beyond git repositories:

# Scan an S3 bucket
trufflehog s3 --bucket my-backup-bucket --only-verified

# Listen for syslog messages and scan them
trufflehog syslog --address 127.0.0.1:514 --protocol tcp --format rfc5424

# Scan GitHub organization
trufflehog github --org your-org --only-verified

GitGuardian

GitGuardian is a commercial SaaS product built specifically for secrets detection at scale. It monitors your GitHub (or GitLab/Bitbucket) organization in real time, detecting secrets in new commits within seconds of push. Its key advantages over open-source tools are:

Real-time monitoring: Detection within seconds of a push, before the developer has moved on
Historical scanning: Scans your entire git history on first integration
Developer notifications: Alerts the committing developer directly via email or Slack
Incident management: Tracks each exposed secret as an incident, with status (new, assigned, resolved)
False positive management: Extensive pattern library tuned for low false positive rates
Internal secret detection: Detects generic high-entropy strings and patterns beyond named provider tokens

Integration: Connect via GitHub App. The app requests read access to your repositories and registers a webhook to receive push events. No sensitive permissions are required.

The GitGuardian workflow for a new finding:

Developer pushes a commit containing an AWS access key
GitGuardian detects it within seconds
The committing developer receives an email: "Potential secret detected in commit abc123"
Security team receives a Slack notification
The incident dashboard shows the finding with: severity, file path, commit, which developer, whether the key is valid (live validation against AWS)
Developer rotates the key, removes it from the repository, cleans history
Incident marked as resolved

GitGuardian for internal monitoring: The self-hosted option (gitguardian.com/vms) allows scanning internal code, documents, and logs that cannot be sent to a third party. Appropriate for highly regulated environments.

GitHub Secret Scanning (Built-in)

For repositories on GitHub, the platform's native Secret Scanning is the lowest-effort baseline. It scans every push and historical content in real time using patterns provided by GitHub and 200+ partner providers. Partners (including AWS, Google, GitHub, Stripe, Twilio) receive alerts when their token format is detected and can immediately invalidate the exposed credential before it is exploited.

Details on configuration are covered in the GitHub Security Features article. For this guide, the key operational point is: enable it for all repositories, period. For private repositories this requires GitHub Advanced Security, but the cost is justified by the continuous protection it provides.

Handling a Confirmed Exposed Secret

When a secret is confirmed exposed (either detected as active/verified or observed in logs with no evidence of compromise yet), the response must be immediate:

Step 1: Rotate immediately (this takes priority over everything else)

The secret is exposed from the moment it was first committed, not from when you discovered it. Minutes matter. Rotate the credential before investigating:

AWS: IAM > Users > Security credentials > Access keys > Deactivate and create new
GitHub: Developer settings > Personal access tokens > Revoke
Stripe: Dashboard > Developers > API keys > Roll key
Generic: Call the service's API or use its dashboard to invalidate the key

Step 2: Determine exposure scope

Review access logs for the exposed credential for the period between the first commit date and rotation:

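# AWS CloudTrail: find API calls using the exposed access key
# Note: lookup-events only covers the last 90 days of management events;
# for older activity, query the CloudTrail logs delivered to S3.
# Set the start and end times to the exposure window.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIAIOSFODNN7EXAMPLE \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-12-31T23:59:59Z \
  --max-results 50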

For other services, check their activity logs or audit trails. Look for:

Unexpected geographic locations
Unusual access times
Actions that your application should not be performing
Data exports or bulk reads
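
Once log entries have been exported, the first pass over them can be automated. The Python sketch below keeps events inside the exposure window and flags those with an unexpected action or source IP. The event shape (dicts with "time", "action", and "source_ip" keys) and the function name suspicious_events are assumptions for illustration; real audit logs will need to be mapped into this form first.

```python
from datetime import datetime, timezone


def suspicious_events(events, window_start, window_end,
                      expected_actions, expected_ips):
    """Return (event, reasons) pairs for events that look out of place.

    Only events between `window_start` and `window_end` (first commit of the
    secret to rotation) are considered.
    """
    flagged = []
    for event in events:
        when = datetime.fromisoformat(event["time"])
        if not (window_start <= when <= window_end):
            continue
        reasons = []
        if event["action"] not in expected_actions:
            reasons.append("unexpected action")
        if event["source_ip"] not in expected_ips:
            reasons.append("unexpected source IP")
        if reasons:
            flagged.append((event, reasons))
    return flagged


if __name__ == "__main__":
    # Hypothetical events; 203.0.113.9 is a documentation-range address.
    start = datetime(2024, 3, 1, tzinfo=timezone.utc)
    end = datetime(2024, 3, 2, tzinfo=timezone.utc)
    events = [
        {"time": "2024-03-01T10:00:00+00:00", "action": "GetObject",
         "source_ip": "10.0.0.5"},
        {"time": "2024-03-01T11:00:00+00:00", "action": "CreateUser",
         "source_ip": "203.0.113.9"},
    ]
    for event, reasons in suspicious_events(
            events, start, end, {"GetObject"}, {"10.0.0.5"}):
        print(event["time"], event["action"], reasons)
```

An allowlist of expected actions tends to be easier to maintain than a denylist, because the application's normal behavior is usually a small, known set.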

Step 3: Remove from git history

Removing from git history is important to prevent future exposure, but it does not protect against anyone who cloned the repository during the exposure window. History rewriting is for hygiene, not security remediation.

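# Using git-filter-repo (preferred over BFG)
pip install git-filter-repo

# Run this in a fresh clone; git-filter-repo refuses to rewrite
# a clone that is not fresh unless --force is given.
# Replace all occurrences of the secret value
git filter-repo \
  --replace-text <(echo "ACTUAL_SECRET_VALUE==>PLACEHOLDER")

# git-filter-repo removes the origin remote as a safety measure;
# add it back before pushing (REPO_URL is a placeholder)
git remote add origin REPO_URL

# Force-push the rewritten history, including tags
git push --force --all
git push --force --tags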
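
When several secrets need to be scrubbed at once, the replacement rules can be written to a file and passed to --replace-text instead of using shell process substitution. The Python sketch below generates such a file using the "literal:VALUE==>REPLACEMENT" line syntax; the function name write_replacements and the file name are hypothetical.

```python
from pathlib import Path


def write_replacements(secrets, path, placeholder="***REMOVED***"):
    """Write one 'literal:SECRET==>PLACEHOLDER' rule per secret to `path`.

    Returns the rule lines that were written. Values containing a newline or
    the '==>' separator are rejected because they cannot be expressed as a
    single rule line.
    """
    rules = []
    for secret in secrets:
        if not secret or "\n" in secret or "==>" in secret:
            raise ValueError("secret cannot be expressed as a single rule line")
        rules.append(f"literal:{secret}==>{placeholder}")
    Path(path).write_text("\n".join(rules) + "\n", encoding="utf-8")
    return rules


if __name__ == "__main__":
    # The rules file contains the secrets in clear text: keep it outside the
    # repository and delete it once the rewrite is done.
    rules = write_replacements(["ACTUAL_SECRET_VALUE"], "replacements.txt")
    print(f"wrote {len(rules)} rule(s)")
```

The file would then be used as: git filter-repo --replace-text replacements.txt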

After history rewriting:

Notify all repository collaborators — their local clones contain the old history and must be re-cloned or carefully rebased
If the repository is public, contact GitHub support to purge cached views of the file containing the secret
If the repository was forked, the fork retains the old history — contact GitHub to remove publicly accessible forks if necessary

Step 4: Post-incident controls

After an incident, implement controls to prevent recurrence:

Add detect-secrets as a pre-commit hook for all developers
Enable Push Protection on GitHub
Add the specific secret type to your GitGuardian or TruffleHog monitoring
Run a full history scan to identify any other secrets that may be lurking
Add a CI check that blocks merges if new secrets are detected

The goal is for secret exposure incidents to be caught within minutes of introduction — ideally at pre-commit time, and at worst by automated monitoring within seconds of push. Manual discovery weeks or months after the fact should not be how you learn about exposed credentials.