Compliance

Data Classification Policy: How to Classify and Protect Sensitive Data

Learn how to implement a 4-tier data classification model with labeling, handling requirements, and DLP controls to protect sensitive data across your organization.

March 9, 2026 · 7 min read · ShipSafer Team


Data classification is the foundation of information security. Without knowing which data requires the most protection, security teams end up either over-securing everything (expensive and operationally painful) or applying uniform controls that leave the most sensitive data exposed. A well-designed classification policy tells every employee exactly how to handle different types of data — and gives security tools the context to enforce those rules automatically.

Why Classification Matters for Compliance

Major frameworks require it, directly or indirectly:

  • SOC 2 (Confidentiality criteria C1.1): Requires identifying and maintaining confidential information
  • ISO 27001 (Annex A 5.9–5.13): Asset management and information classification controls
  • GDPR (Article 32): Requires appropriate technical and organisational measures based on risk — classification is how you identify which data is high risk
  • HIPAA (Security Rule): Protected Health Information (PHI) must be identified before appropriate safeguards can be applied
  • PCI DSS: Requires media containing cardholder data to be classified by sensitivity, and accurate scoping of the cardholder data environment depends on knowing where cardholder data lives

Auditors for every one of these frameworks will ask: "How do you know where your sensitive data is?" Classification is the answer.

The Four-Tier Model

A four-tier classification model balances completeness with usability: fewer tiers blur distinctions that matter for handling, while more tiers create confusion about which tier applies in borderline cases.

Tier 1: Public

Definition: Information that is intentionally made available to the public with no expectation of restriction. Disclosure would not harm the organization, its customers, or partners.

Examples: Marketing materials, published blog posts, product documentation, press releases, job postings.

Handling requirements: No special controls. Can be shared freely.

Tier 2: Internal

Definition: Information generated in the ordinary course of business that is not intended for public disclosure but whose exposure would cause minimal harm.

Examples: Internal memos, general company policies, non-sensitive meeting notes, most internal email, employee directory (names and roles without contact details).

Handling requirements:

  • Store in company-approved systems (Google Drive, SharePoint, Notion)
  • Do not share externally without business justification
  • Encrypt when transmitted over public networks (standard HTTPS is sufficient)
  • No special disposal requirements (standard delete is acceptable)

Tier 3: Confidential

Definition: Sensitive business information that could cause significant harm to the organization or individuals if disclosed without authorization. Includes most regulated data categories.

Examples: Customer data (names, contact information, usage data), financial records, business contracts and NDAs, source code, employee compensation data, sales pipeline and forecasts, security audit reports.

Handling requirements:

  • Store in access-controlled systems with audit logging
  • Restrict access to employees who need it for their job function
  • MFA required for systems storing Confidential data
  • Encrypt at rest and in transit
  • Do not transmit via unencrypted channels (no plain-text email for files containing customer data)
  • Secure deletion required — use certified wipe tools or physical destruction for hardware
  • Third parties accessing Confidential data must sign an NDA and be subject to a vendor security assessment

Tier 4: Restricted

Definition: The most sensitive category. Data subject to strict regulatory requirements, or whose unauthorized disclosure would cause severe harm — financial, legal, reputational, or physical.

Examples: Payment card data (PAN, CVV), protected health information (PHI), Social Security numbers and government IDs, biometric data, passwords and authentication credentials, encryption keys, data subject to specific contractual restrictions, merger and acquisition information before public announcement.

Handling requirements:

  • Store in dedicated, hardened systems with strict access control
  • Access limited to named individuals with documented business justification
  • All access must be logged and reviewed
  • Encrypted at rest using AES-256 or equivalent, with keys managed separately
  • Application-level encryption for database fields where feasible
  • MFA enforced for all access; privileged access managed via a PAM solution
  • No storage on personal devices or unmanaged endpoints
  • Physical documents must be stored in locked, access-controlled areas
  • Disposal requires certified destruction (NIST 800-88 compliant wipe or physical destruction)
  • Cannot be shared with third parties without explicit legal authorization (DPA for personal data, explicit contract clause for payment data)

Data Labeling

Labeling is how classification becomes operationally visible. Data must be labeled so that anyone handling it knows its classification and therefore its handling requirements.

Document Labeling

For files and documents, use classification labels in headers and footers. Common approaches:

  • Manual labeling: Users apply labels when creating documents. Works for small teams but breaks down as the organization scales.
  • Microsoft Purview / Google Drive labels: Native label capabilities in enterprise productivity suites. Allow mandatory labeling prompts on document creation, automatic application of protection settings (encryption, sharing restrictions) based on label.
  • Automatic classification: Tools like Microsoft Purview Information Protection can automatically detect and label sensitive content based on patterns (credit card numbers, SSNs, PHI).

For documents containing multiple data types, apply the highest applicable classification.
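The "highest classification wins" rule is simple enough to automate. A minimal sketch in Python (the tier names follow the four-tier model above; the function name is our own):

```python
# Tier order from least to most sensitive, per the four-tier model.
TIERS = ["public", "internal", "confidential", "restricted"]

def effective_classification(labels):
    """Return the highest classification among the labels found in a document.

    A document containing both Internal and Restricted content must be
    handled as Restricted.
    """
    if not labels:
        raise ValueError("at least one label is required")
    return max(labels, key=TIERS.index)

print(effective_classification(["internal", "restricted", "public"]))
# restricted
```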

Database Column Classification

For structured data stores, annotate schema definitions with classification levels. Some database management tools support column-level classifications natively. For others, maintain a data dictionary:

# Example data dictionary entry
table: customers
columns:
  - name: customer_id
    classification: internal
    pii: false
  - name: email
    classification: confidential
    pii: true
    regulation: [gdpr, ccpa]
  - name: stripe_payment_method_id
    classification: restricted
    pii: false
    regulation: [pci-dss]
  - name: date_of_birth
    classification: restricted
    pii: true
    regulation: [gdpr, hipaa]

This data dictionary feeds your data discovery scans, DLP tool configuration, and access control policy.
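To show how the dictionary can drive tooling, here is a small sketch that walks entries like the one above (represented as a plain Python structure rather than YAML, to avoid a parser dependency) and pulls out the columns that need a given level of handling:

```python
# The data dictionary entry from above, as a Python structure.
DATA_DICTIONARY = {
    "customers": [
        {"name": "customer_id", "classification": "internal", "pii": False},
        {"name": "email", "classification": "confidential", "pii": True,
         "regulation": ["gdpr", "ccpa"]},
        {"name": "stripe_payment_method_id", "classification": "restricted",
         "pii": False, "regulation": ["pci-dss"]},
        {"name": "date_of_birth", "classification": "restricted", "pii": True,
         "regulation": ["gdpr", "hipaa"]},
    ],
}

TIER_ORDER = ("public", "internal", "confidential", "restricted")

def columns_at_or_above(level):
    """Yield (table, column) pairs classified at or above the given tier."""
    threshold = TIER_ORDER.index(level)
    for table, columns in DATA_DICTIONARY.items():
        for col in columns:
            if TIER_ORDER.index(col["classification"]) >= threshold:
                yield table, col["name"]

# Columns that need Restricted handling (application-level encryption, PAM, ...):
print(list(columns_at_or_above("restricted")))
# [('customers', 'stripe_payment_method_id'), ('customers', 'date_of_birth')]
```

The same traversal can emit DLP detection rules or access-policy entries instead of a printed list.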

Code and Repository Classification

Source code repositories often contain secrets that should be Restricted (API keys, passwords) embedded in code that the policy itself classifies as Confidential. Train developers on:

  • Never commit secrets to source control
  • Use .gitignore to exclude configuration files that may contain credentials
  • Use pre-commit hooks to scan for credentials before they are committed
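To illustrate what a pre-commit credential scan does, here is a hedged sketch using regex patterns. Real scanners such as gitleaks or detect-secrets ship far larger rule sets; the patterns below are simplified examples, not a production configuration:

```python
import re

# Simplified example patterns; production scanners use much larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)(?:password|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
}

def scan_text(text):
    """Return the names of patterns that match, so the commit can be blocked."""
    return sorted(name for name, pat in SECRET_PATTERNS.items()
                  if pat.search(text))

print(scan_text('db_password = "hunter2hunter2"'))
# ['generic_assignment']
```

A pre-commit hook would run this over the staged diff and exit non-zero when `scan_text` returns anything.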

Data Loss Prevention Tools

A classification policy without enforcement tooling relies entirely on employees following the rules manually — which they will not do consistently. DLP tools automate detection and enforcement.

Endpoint DLP

Endpoint DLP agents monitor file activity on company devices and enforce handling rules:

  • Block upload of files labeled Restricted to unauthorized cloud storage
  • Alert when Confidential files are transferred to removable storage
  • Require justification and approval for certain data transfers

Tools: Microsoft Purview DLP, Symantec DLP, Forcepoint, Nightfall.

Network DLP

Network DLP inspects traffic leaving the corporate network or cloud environment:

  • Detect and block transmission of credit card numbers via email or web uploads
  • Alert when large volumes of Confidential data are downloaded via API
  • Inspect HTTPS traffic via a TLS inspection proxy
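Pattern matching alone produces many false positives on long digit strings, which is why DLP engines typically pair the card-number regex with a Luhn checksum. A minimal sketch (the Luhn algorithm is standard; the candidate regex here is deliberately simplified):

```python
import re

# 14-19 digits, optionally separated by spaces or hyphens (simplified).
CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,18}\d\b")

def luhn_valid(digits):
    """Standard Luhn checksum used to validate card numbers."""
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:       # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0

def find_card_numbers(text):
    """Return candidate digit strings that pass the Luhn check."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

print(find_card_numbers("order ref 4111 1111 1111 1111, ticket 1234567812345678"))
# ['4111111111111111']
```

The second number is rejected because it fails the checksum, even though it looks like a card number to the regex.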

Cloud DLP

For SaaS applications and cloud storage, cloud-native DLP capabilities are increasingly available:

  • Google Cloud DLP: Scans Cloud Storage, BigQuery, and Datastore for sensitive data types, can automatically de-identify or report findings
  • AWS Macie: Discovers and classifies sensitive data in S3 buckets
  • Microsoft Purview: Covers M365 and Azure data stores

Run periodic DLP scans of your cloud storage to detect data that has been stored outside its designated location.

Training and Culture

Even the best technical controls fail when employees do not understand classification. Train all employees on:

  • The four classification levels with concrete examples relevant to their role
  • How to label documents they create
  • What to do when they are unsure of the classification (escalate to their manager or the security team — do not default to the lowest tier)
  • What to do when they discover a potential misclassification or data in the wrong place

Reinforce through scenario-based training: "You receive an email from a customer with their health insurance card attached. What classification is this and what do you do with it?" Answering these questions in a training context is far better than answering them incorrectly in the field.

Include classification requirements in your onboarding program and in annual security awareness training. Measure completion rates — an assigned training that employees skip is not a control.
