Data Retention Policies: How Long to Keep Data and How to Delete It Securely

Why Data Retention Policies Are a Security Issue

Organizations that retain data indefinitely accumulate risk. Every database row, log file, and backup archive adds to the potential impact of a breach. Consider a customer who signed up in 2015 and closed their account in 2016: if their PII is still in your database in 2025, they become a victim of any breach that happens before you finally delete it.

Data retention is simultaneously a legal obligation, a privacy right, and a security control. The regulatory landscape creates minimum retention floors (you must keep tax records for at least X years) and maximum retention ceilings (GDPR's storage limitation principle says personal data should not be kept longer than necessary). Satisfying both requires knowing what data you hold, why you hold it, and when you no longer need it.

Legal Retention Requirements by Data Type

Retention requirements vary by jurisdiction, industry, and data category. What follows covers the most common categories for technology companies.

Financial and Tax Records

United States: The IRS generally requires businesses to retain financial records for at least 3 years (the standard audit limitation period), 6 years if income was underreported by more than 25% of the gross income shown on the return, and 7 years for records related to bad debt deductions or worthless securities. Employment tax records must be kept for at least 4 years.

European Union: There is no single EU-wide retention period for accounting records. Member state law sets it, typically somewhere between 5 and 10 years. VAT records: 5–10 years depending on member state.

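United Kingdom: HMRC requires self-employed individuals and partnerships to keep business records for at least 5 years after the 31 January submission deadline for the relevant tax year. Limited companies must keep records for 6 years from the end of the financial year they relate to.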

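Payment data: PCI DSS does not set a minimum retention period for transaction data. It requires audit logs to be retained for at least 12 months, with the most recent 3 months immediately available for analysis. Stored cardholder data (such as PANs) must be kept to the minimum needed and purged once it is no longer required for business or legal purposes. Sensitive authentication data such as CVVs must not be stored after authorization at all.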

Employment Records

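US: EEOC regulations require employers to keep personnel and employment records, including applications, for 1 year (2 years for educational institutions and state and local governments). Payroll records: at least 3 years (FLSA). OSHA requires employee exposure records to be retained for 30 years and employee medical records for the duration of employment plus 30 years.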

EU/UK: Employment contracts and payroll records typically 6–7 years after employment ends. Medical records related to occupational health may require longer retention.

Healthcare and Medical Data

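US (HIPAA): HIPAA itself does not set a retention period for medical records. It requires HIPAA-related documentation (policies, procedures, authorizations, and similar records) to be retained for at least 6 years from creation or from the date it was last in effect, whichever is later. Retention of the medical records themselves is governed by state law: California, for example, requires 7 years, and some states require records for patients treated as minors to be kept until the patient reaches a specified age in adulthood (up to 27 in some cases).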

EU (GDPR + national law): Medical records are typically retained 10 years post-treatment under most EU member state laws, though this varies significantly.

Personal Data Under GDPR

GDPR does not specify retention periods—it requires that personal data be retained no longer than necessary for the purpose for which it was collected (Article 5(1)(e)). "Necessary" is determined by:

The original purpose of collection (service provision ends when the account is closed)
Legal obligations that require retention (financial records, litigation)
Legitimate interests that require retention (fraud prevention, security logs)

Practically, most GDPR-compliant organizations define retention periods by data category and publish them in their privacy notice. Common GDPR-compliant retention schedules:

Customer account data: retain for the duration of the relationship + 3–7 years for legal claims limitation period
Marketing data: retain until consent is withdrawn + reasonable time for preference propagation
Website analytics: typically anonymized or deleted within 13–26 months
Security and audit logs: 6–24 months (enough for incident investigation without indefinite accumulation)
Support tickets: 3–5 years after ticket closure

Security and Access Logs

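Common security frameworks set or imply log retention as follows:

PCI DSS: At least 12 months, with 3 months immediately available
SOC 2: No fixed period is prescribed; auditors typically expect 90 days to 1 year depending on the criteria in scope
HIPAA: 6 years for audit logs is the common practice, derived from the rule's 6-year documentation retention requirement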
ISO 27001: Evidence of security controls typically retained for the duration of the certification plus one audit cycle

Litigation Hold

When litigation is reasonably anticipated or has commenced, the normal retention schedule must be suspended for potentially relevant data. "Potentially relevant" is broadly interpreted by courts. A litigation hold:

Suspends automated deletion for covered data
Covers custodians (employees) who are likely to have relevant information
Applies to all storage media: email, Slack, databases, backups, cloud storage

Failing to preserve data subject to litigation hold is spoliation and can result in adverse inference instructions (the court telling the jury to assume the destroyed evidence was unfavorable to the party that destroyed it) or sanctions.

Practically: your data retention system must have a litigation hold mechanism that exempts specific data from automated deletion. Every deletion pipeline needs a "hold check" before executing.
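
One way to structure the hold check is as a small pure function that every deletion pipeline calls before it deletes anything. The sketch below is illustrative only: the hold shape (`custodianIds`, `categories`, `releasedAt`) and the function names `isUnderHold` and `applyHoldCheck` are assumptions made for this example, not part of any standard or library.

```javascript
// Hypothetical in-memory hold registry; a real system would load this from a database.
const holds = [
  { id: 'hold-001', custodianIds: ['u42'], categories: ['email', 'chat'], releasedAt: null },
  { id: 'hold-002', custodianIds: ['u7'], categories: ['email'], releasedAt: new Date('2024-01-15') },
];

// Returns true if any active (unreleased) hold covers this record's owner and data category.
function isUnderHold(record, activeHolds) {
  return activeHolds.some(
    (hold) =>
      hold.releasedAt === null &&
      hold.custodianIds.includes(record.ownerId) &&
      hold.categories.includes(record.category)
  );
}

// Splits deletion candidates into records that may be deleted and records that must be kept.
function applyHoldCheck(candidates, activeHolds) {
  const deletable = [];
  const held = [];
  for (const record of candidates) {
    (isUnderHold(record, activeHolds) ? held : deletable).push(record);
  }
  return { deletable, held };
}

const candidates = [
  { id: 1, ownerId: 'u42', category: 'email' },   // covered by hold-001
  { id: 2, ownerId: 'u42', category: 'billing' }, // category not covered by any hold
  { id: 3, ownerId: 'u7', category: 'email' },    // hold-002 has been released
];
const { deletable, held } = applyHoldCheck(candidates, holds);
console.log('deletable:', deletable.map((r) => r.id), 'held:', held.map((r) => r.id));
```

Keeping the check separate from the deletion logic means it can be reused by every pipeline (database, object storage, backups) and tested on its own.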

Secure Deletion Methods

Deleting a file does not destroy the data; it only marks the storage blocks as available for reuse. Until those blocks are overwritten, the data can often be recovered. Secure deletion aims to make recovery practically infeasible.

On Magnetic Hard Drives (HDD)

The traditional standard is the DoD 5220.22-M method: overwrite with a fixed pattern (e.g., all zeros), then the complement (all ones), then a random pattern, then verify. Modern research suggests that a single overwrite with random data is sufficient for current HDD technology (NIST SP 800-88 concurs), but three-pass overwrites remain common in regulated industries.

Tools: shred (Linux), Eraser (Windows), DBAN (Darik's Boot and Nuke) for full disk erasure.

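# Securely delete a file on Linux (3 random passes, plus a final zero pass from -z)
# Note: shred assumes in-place overwrites; journaling or copy-on-write filesystems and SSDs may keep old copies
shred -vzu -n 3 /path/to/sensitive-file

# Overwrite an entire block device. CAUTION: this destroys ALL data on the drive, not just free space
shred -vz /dev/sda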

On Solid State Drives (SSD)

SSDs are fundamentally different from HDDs. Wear leveling algorithms spread writes across the entire drive to extend lifespan, meaning a file overwrite command may write to different physical cells than the original data. The original data may remain in cells that wear leveling has retired.

For SSDs, the correct approach is:

ATA Secure Erase command: Built into SSD firmware, triggers a manufacturer-implemented erasure procedure. Generally reliable, although it depends on the firmware implementing it correctly. It must be issued via BIOS/UEFI or specialized tools (hdparm on Linux, Samsung Magician for Samsung drives). NVMe drives expose equivalent Format and Sanitize commands.
Cryptographic erasure: Keep the entire drive encrypted with a strong key from the first write, then destroy the key. The data on the drive becomes cryptographically inaccessible. This is the approach used by iOS and Android factory reset: the data remains but the key is gone.
Physical destruction: For drives leaving organizational control (decommissioned servers, leased hardware), shred or pulverize the media. Degaussing, which works on magnetic media, is ineffective for SSDs. Use a NAID AAA-certified media destruction service for auditable chain-of-custody destruction.

In Cloud Environments

Cloud storage complicates secure deletion because:

Data may be replicated across multiple availability zones
Object storage (S3, GCS) does not expose the underlying storage blocks
Snapshots and versioning may retain deleted data
Backup systems may archive deleted data before the deletion propagates

The practical approach for cloud environments is cryptographic erasure:

Generate a unique data encryption key (DEK) per record or per data category
Encrypt data at rest using the DEK
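Protect the DEK with a key management service (AWS KMS, GCP Cloud KMS, HashiCorp Vault), either by storing it there or by wrapping it with a KMS-held key
To "delete" data: destroy the DEK, including any wrapped or cached copies of it

Without the DEK, the encrypted data in S3 or the database is cryptographically inert, provided no copy of the key survives in caches, exports, or key backups. If the key you destroy is itself a KMS key, note that AWS KMS enforces a waiting period of 7 to 30 days before a scheduled deletion takes effect. This approach handles: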

Object storage where physical overwrite is impossible
Database backups (the backup contains encrypted data but not the key)
Data replication across AZs (all copies are encrypted with the same key)
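
The idea can be sketched with Node's built-in crypto module. This is a minimal illustration, not a production design: the in-memory `keyStore` Map stands in for a real KMS, and the names `encryptRecord`, `decryptRecord`, and `cryptoErase` are hypothetical helpers invented for this example.

```javascript
const nodeCrypto = require('node:crypto');

// Hypothetical in-memory key store standing in for a KMS; keys are indexed by record ID.
const keyStore = new Map();

// Encrypts plaintext under a fresh per-record DEK using AES-256-GCM.
function encryptRecord(recordId, plaintext) {
  const dek = nodeCrypto.randomBytes(32);
  const iv = nodeCrypto.randomBytes(12);
  const cipher = nodeCrypto.createCipheriv('aes-256-gcm', dek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  keyStore.set(recordId, dek);
  return { recordId, iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypts a stored blob; throws if the DEK has been destroyed.
function decryptRecord(blob) {
  const dek = keyStore.get(blob.recordId);
  if (!dek) throw new Error(`No key for ${blob.recordId}: data has been erased`);
  const decipher = nodeCrypto.createDecipheriv('aes-256-gcm', dek, blob.iv);
  decipher.setAuthTag(blob.tag);
  return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]).toString('utf8');
}

// "Deletes" the record by destroying its key; the ciphertext can stay wherever it was replicated.
function cryptoErase(recordId) {
  const dek = keyStore.get(recordId);
  if (dek) dek.fill(0); // best-effort zeroing of the in-memory copy
  return keyStore.delete(recordId);
}

const blob = encryptRecord('user-123', 'alice@example.com');
console.log(decryptRecord(blob)); // readable while the key exists
cryptoErase('user-123');          // after this, decryptRecord(blob) throws
```

The ciphertext in `blob` is never touched by `cryptoErase`, which is the point: copies in replicas, snapshots, and backups become unreadable at the same moment the key is destroyed.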

Additional steps for cloud deletion:

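Explicitly delete all object versions and delete markers in versioned S3 buckets (versioning can only be suspended once enabled, and suspending it does not remove existing versions)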
Delete snapshots and automated backups after the retention period
Remove data from CDN caches (CloudFront invalidation, etc.)
Check log aggregation pipelines that may have captured data

Database Deletion

Hard deletes remove rows from the database. Soft deletes set a deleted_at timestamp on the row while preserving it in the database. That is useful for audit trails, but it does not satisfy legal erasure obligations on its own.

For GDPR right-to-erasure compliance, you cannot rely on soft deletes alone. Options:

Hard delete after the retention period
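Anonymize the record by overwriting PII fields with null or random values, preserving the row skeleton for statistical/analytical purposes. The fields that remain must not allow re-identification in combination; otherwise the record is still personal data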
Pseudonymize by replacing PII with a consistent token, then destroying the mapping table when erasure is required

Anonymization is typically more practical than deletion for analytics systems where you want to preserve aggregate patterns but not individual identities.
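
The difference between the anonymize and pseudonymize options can be shown in a few lines. The sketch below is a simplified illustration under stated assumptions: the `PII_FIELDS` list, the record shapes, and the helper names are all hypothetical, and the mapping table is an in-memory Map rather than a protected datastore.

```javascript
const { randomUUID } = require('node:crypto');

// Fields treated as PII in this example; a real list would come from your data map.
const PII_FIELDS = ['name', 'email', 'phone'];

// Anonymize: null out PII fields and keep only non-identifying attributes.
function anonymize(record) {
  const out = { ...record };
  for (const field of PII_FIELDS) {
    if (field in out) out[field] = null;
  }
  out.anonymizedAt = new Date().toISOString();
  return out;
}

// Pseudonymize: replace the user ID with a stable random token held in a mapping table.
const tokenByUserId = new Map();
function pseudonymize(event) {
  if (!tokenByUserId.has(event.userId)) {
    tokenByUserId.set(event.userId, randomUUID());
  }
  return { ...event, userId: tokenByUserId.get(event.userId) };
}

// Erasure for pseudonymized data: drop the mapping entry so the token can no longer be linked back.
function forgetUser(userId) {
  return tokenByUserId.delete(userId);
}

const user = { id: 7, name: 'Alice', email: 'alice@example.com', phone: '555-0100', plan: 'pro' };
console.log(anonymize(user)); // PII fields are null, plan is preserved

const e1 = pseudonymize({ userId: 7, action: 'login' });
const e2 = pseudonymize({ userId: 7, action: 'purchase' });
console.log(e1.userId === e2.userId); // true: the same user maps to the same token
forgetUser(7);
```

Note that the anonymized row above still carries its numeric `id`. If that ID links to other tables that hold identifying data, the row is not truly anonymous, so the same treatment has to be applied wherever the ID appears.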

Retention Automation

Manual retention enforcement is unreliable. Automated systems run consistently; humans forget or deprioritize "delete old data" in favor of feature work.

Retention Schedule Design

Define a retention schedule as a structured policy:

{
  "data_categories": [
    {
      "category": "customer_pii",
      "models": ["User", "Address"],
      "retention_days_after_account_closure": 1095,
      "deletion_method": "anonymize",
      "legal_basis": "legitimate_interests_limitation_period"
    },
    {
      "category": "auth_logs",
      "models": ["AuthEvent"],
      "retention_days": 365,
      "deletion_method": "hard_delete",
      "legal_basis": "security_monitoring"
    },
    {
      "category": "payment_records",
      "models": ["Transaction"],
      "retention_days": 2555,
      "deletion_method": "archive_then_delete",
      "legal_basis": "financial_regulation"
    }
  ]
}
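
A schedule like this only helps if code can evaluate it. The following sketch shows one possible way to turn a schedule entry into a yes/no expiry decision. It assumes record fields named `createdAt` and `accountClosedAt` and uses a trimmed copy of the schedule keyed by category; both are choices made for this example, not requirements of the format above.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A trimmed copy of the schedule above, keyed by category for lookup.
const schedule = {
  customer_pii: { retention_days_after_account_closure: 1095, deletion_method: 'anonymize' },
  auth_logs: { retention_days: 365, deletion_method: 'hard_delete' },
};

// Returns the date on which a record becomes eligible for deletion,
// or null if the retention clock has not started yet.
function eligibleAt(category, record) {
  const rule = schedule[category];
  if (!rule) throw new Error(`No retention rule for category: ${category}`);
  if (rule.retention_days_after_account_closure !== undefined) {
    if (!record.accountClosedAt) return null; // account still open
    return new Date(
      record.accountClosedAt.getTime() + rule.retention_days_after_account_closure * MS_PER_DAY
    );
  }
  return new Date(record.createdAt.getTime() + rule.retention_days * MS_PER_DAY);
}

// True once the eligibility date has passed.
function isExpired(category, record, now = new Date()) {
  const due = eligibleAt(category, record);
  return due !== null && due <= now;
}

const now = new Date('2025-06-01T00:00:00Z');
console.log(isExpired('auth_logs', { createdAt: new Date('2024-01-01T00:00:00Z') }, now)); // true
console.log(isExpired('customer_pii', { accountClosedAt: null }, now)); // false
```

Throwing on an unknown category is deliberate: data without a retention rule should be surfaced as a gap in the schedule rather than silently kept or silently deleted.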

Automated Deletion Jobs

Implement cron jobs or scheduled functions that enforce the retention schedule:

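// Example: nightly retention enforcement job.
// Assumes AuthEvent is a Mongoose model and logger is your application logger.
const RETENTION_DAYS = 365; // matches the auth_logs entry in the retention schedule
const MS_PER_DAY = 24 * 60 * 60 * 1000;

async function enforceRetention() {
  // Millisecond arithmetic keeps the cutoff independent of local DST changes
  const cutoffDate = new Date(Date.now() - RETENTION_DAYS * MS_PER_DAY);

  // Hard delete old auth events
  const result = await AuthEvent.deleteMany({
    createdAt: { $lt: cutoffDate },
    legalHold: { $ne: true }  // Always check for legal hold
  });

  logger.info('Retention enforcement completed', {
    model: 'AuthEvent',
    cutoffDate,
    deletedCount: result.deletedCount
  });
}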

Key design requirements:

Legal hold check: Never delete records flagged for legal hold
Idempotent: Safe to run multiple times
Auditable: Log every deletion action with counts, cutoff dates, and operator identity
Batch size limited: Delete in batches (e.g., 10,000 records at a time) to avoid database lock contention
Alerting: Alert on unexpected zero-deletion runs (might indicate the job is misconfigured) and on unexpectedly large deletion counts
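
The batching, idempotency, and alerting requirements can be combined in one loop. The sketch below is storage-agnostic and runs against an in-memory stand-in; the `store` adapter with its `findExpired` and `deleteByIds` methods, and the `maxExpected` threshold, are hypothetical names introduced for illustration.

```javascript
// Deletes expired records in bounded batches and reports totals for auditing and alerting.
// `store` is a hypothetical adapter; its findExpired must skip records under legal hold.
function runRetentionJob(store, { cutoff, batchSize = 10000, maxExpected = Infinity }) {
  let total = 0;
  for (;;) {
    const ids = store.findExpired(cutoff, batchSize);
    if (ids.length === 0) break; // nothing left, so re-running is a no-op (idempotent)
    const removed = store.deleteByIds(ids);
    if (removed === 0) break; // guard against looping forever if deletes silently fail
    total += removed;
  }
  const alerts = [];
  if (total === 0) alerts.push('zero_deletions');
  if (total > maxExpected) alerts.push('unexpectedly_large_deletion');
  return { cutoff, deleted: total, alerts };
}

// In-memory stand-in for a database table, used only to exercise the loop.
function makeMemoryStore(rows) {
  return {
    rows,
    findExpired(cutoff, limit) {
      return this.rows
        .filter((r) => r.createdAt < cutoff && r.legalHold !== true)
        .slice(0, limit)
        .map((r) => r.id);
    },
    deleteByIds(ids) {
      const before = this.rows.length;
      const doomed = new Set(ids);
      this.rows = this.rows.filter((r) => !doomed.has(r.id));
      return before - this.rows.length;
    },
  };
}

const store = makeMemoryStore([
  { id: 1, createdAt: new Date('2023-01-01') },
  { id: 2, createdAt: new Date('2023-02-01') },
  { id: 3, createdAt: new Date('2023-03-01') },
  { id: 4, createdAt: new Date('2023-04-01'), legalHold: true },
  { id: 5, createdAt: new Date('2025-01-01') },
]);
const cutoff = new Date('2024-06-01');
const first = runRetentionJob(store, { cutoff, batchSize: 2 });
const second = runRetentionJob(store, { cutoff, batchSize: 2 });
console.log(first.deleted, first.alerts, second.deleted, second.alerts);
```

In this sketch the large-deletion alert fires after the fact. A stricter variant would count the candidates first and refuse to start when the count exceeds the threshold.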

GDPR Erasure Request Workflow

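Right-to-erasure requests under GDPR Article 17 must be answered without undue delay and at the latest within one month of receipt (Article 12(3)); the period can be extended by two further months for complex or numerous requests. Build a workflow: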

User submits erasure request via privacy settings or email
System validates the request (confirm identity)
Check for legal grounds to refuse (outstanding debt, ongoing litigation, legal retention obligation)
If no grounds to refuse: trigger deletion/anonymization across all systems
Confirm completion to the user within the one-month deadline
Log the request and the fulfillment action

This must cover all systems: primary database, backups (mark for deletion and let the scheduled backup purge handle it), analytics systems, email marketing platforms, CRM, support ticketing. If a backup is restored before it has been purged, re-apply the erasure to the restored data. Maintain a data map so you know where every category of personal data is stored.
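
The workflow steps can be sketched as a single function that validates the request, checks for refusal grounds, and fans out to every system in the data map. Everything here is illustrative: the request shape, the `systems` adapters with an `erase` method, and the status strings are assumptions. The deadline helper follows a common reading of GDPR Article 12(3), which sets the response deadline at one month from receipt, taking the same calendar day in the following month or the last day of that month if it is shorter.

```javascript
// Adds one calendar month to the receipt date, clamping to the last day of shorter months.
function responseDeadline(receivedAt) {
  const d = new Date(receivedAt.getTime());
  const day = d.getUTCDate();
  d.setUTCDate(1); // avoid overflow while changing the month
  d.setUTCMonth(d.getUTCMonth() + 1);
  const lastDay = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + 1, 0)).getUTCDate();
  d.setUTCDate(Math.min(day, lastDay));
  return d;
}

// Runs the erasure workflow for one request across all registered systems.
function processErasureRequest(request, systems) {
  if (!request.identityVerified) {
    return { status: 'rejected', reason: 'identity_not_verified' };
  }
  if (request.refusalGrounds.length > 0) {
    return { status: 'refused', reason: request.refusalGrounds.join(',') };
  }
  // Fan out to every system in the data map and record each outcome for the audit log.
  const results = systems.map((system) => ({
    system: system.name,
    erased: system.erase(request.userId),
  }));
  return {
    status: results.every((r) => r.erased) ? 'completed' : 'partial',
    deadline: responseDeadline(request.receivedAt),
    results,
  };
}

// Hypothetical adapters; real ones would call each system's deletion API.
const systems = [
  { name: 'primary_db', erase: () => true },
  { name: 'crm', erase: () => true },
  { name: 'email_marketing', erase: () => false }, // simulated failure
];
const outcome = processErasureRequest(
  { userId: 'u42', identityVerified: true, refusalGrounds: [], receivedAt: new Date('2025-01-31T10:00:00Z') },
  systems
);
console.log(outcome.status, outcome.deadline.toISOString());
```

Returning a `partial` status rather than throwing keeps the per-system results available, so the failed systems can be retried before the deadline and the audit log shows exactly what was erased and when.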

Data retention is unglamorous infrastructure work, but the organizations that do it well have smaller breach impact, smoother regulatory audits, and—critically—a more trustworthy relationship with the users whose data they hold.