Penetration Testing Guide: How to Run Your First Pentest

A penetration test is one of the highest-signal investments a startup can make in its security program. Unlike automated scanning, which identifies known vulnerabilities against a database of signatures, a skilled penetration tester reasons about your system the way an attacker would — chaining small weaknesses into significant exploits, finding logic flaws that no scanner detects, and stress-testing your assumptions about what an outsider can see and do.

For first-timers, the process is opaque enough that companies often waste the engagement. This guide walks through every step, from deciding what kind of test to run to getting maximum value from the findings.

Types of Penetration Tests

Black Box Testing

The tester receives no prior information about your system — no architecture diagrams, no credentials, no source code. They start from the same position as an external attacker: a company name and a target scope.

Pros: Most accurately simulates a real external attack. Findings are high-confidence because the tester didn't know where to look.

Cons: Expensive relative to coverage. The tester spends significant time on reconnaissance that your engineers could have summarized in an hour. Misses application-layer logic flaws that require authenticated access.

Best for: Testing your external perimeter, validating that your public attack surface is hardened, checking whether your detection and alerting fire when someone actively probes your network.

Gray Box Testing

The tester receives partial information — typically credentials for an authenticated user account, and sometimes a high-level architecture overview. They skip external reconnaissance and focus on what an attacker with a foothold could achieve.

Pros: Balances realism with efficiency. Simulates a credential compromise or an insider threat. Covers more of the application logic than pure black box.

Cons: Not as realistic for assessing external attack paths. The tester begins with a foothold that a real attacker would have to earn.

Best for: Web application security assessments, API testing, testing your application's authorization logic. The most common choice for a first pentest.

White Box Testing

The tester receives full access: source code, architecture documentation, credentials, network diagrams. This is closer to a thorough security code review and architecture assessment combined with exploitation validation.

Pros: Highest coverage. The tester can identify vulnerabilities in code that would be impossible to discover through black-box probing. Best value per dollar when the goal is finding as many vulnerabilities as possible rather than simulating realistic attack paths.

Cons: Least realistic. Doesn't test your detection capabilities because the tester isn't evading anything.

Best for: Deep application security reviews, pre-launch security assessments of critical features, validating that specific high-risk code paths are implemented correctly.

Scoping the Engagement

The most common mistake in a first pentest is scope that's either too broad ("test everything") or too vague ("the web app"). Good scoping takes 30–60 minutes with the vendor and produces a written scope document that both parties sign before work begins.

Define what's in scope:

Specific domains and subdomains (e.g., app.yourcompany.com, api.yourcompany.com — not *.yourcompany.com unless you've inventoried every subdomain)
IP ranges and cloud environments
Specific application features (authentication flow, payment processing, admin panel)
Mobile applications if relevant

Define what's explicitly out of scope:

Third-party services you don't control (your payment processor's infrastructure, for example)
Production databases unless you're comfortable with the risk of data exposure during testing
Social engineering of employees (keep this separate from application security tests unless you specifically want it)
Denial-of-service testing (destructive and usually not worth the risk to uptime)
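
One way to keep the written scope unambiguous is to record it as data that both parties can check targets against. The following is a minimal sketch, not a standard tool: the function name is_in_scope, the IP range (a reserved documentation range), and the third-party hostname are all hypothetical placeholders. It assumes hostnames must match exactly, with no wildcard expansion.

```python
import ipaddress

# Hypothetical values standing in for a signed scope document.
IN_SCOPE_HOSTS = {"app.yourcompany.com", "api.yourcompany.com"}
IN_SCOPE_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder documentation range
OUT_OF_SCOPE_HOSTS = {"payments.processor.example"}  # placeholder third-party service


def is_in_scope(target: str) -> bool:
    """Return True only when the target is explicitly listed as in scope."""
    target = target.strip().lower()
    if target in OUT_OF_SCOPE_HOSTS:
        return False  # explicit exclusions always win
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP address, so treat it as a hostname. Exact match only:
        # no wildcard expansion, mirroring the advice against *.yourcompany.com.
        return target in IN_SCOPE_HOSTS
    return any(address in network for network in IN_SCOPE_NETWORKS)
```

Under these assumptions, is_in_scope("api.yourcompany.com") is true, while an uninventoried subdomain such as staging.yourcompany.com is rejected until someone adds it to the scope document deliberately.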

Define the testing window:

Specify whether testing can occur 24/7 or only during business hours. Some teams want to observe their alerting in real time; others prefer that the tester work off-hours to avoid impacting development velocity.

Get an emergency contact in writing:

Both parties should have a direct contact for the "stop test" scenario — if the tester discovers active malicious activity or if testing causes unintended production impact. This prevents a test from becoming an incident.

Choosing a Vendor

Penetration testing quality varies enormously. A $5,000 engagement from an inexperienced firm may produce little more than an automated Nessus scan with a report attached. A $25,000 engagement from a skilled team can produce findings that your engineers will be fixing for months.

Signals of a quality vendor:

Named, credentialed testers: You should know who will actually perform your assessment. CVEs they've discovered or public research (blog posts, conference talks) indicate real capability.
Methodology documentation: Can they describe their testing methodology? OWASP Testing Guide, PTES (Penetration Testing Execution Standard), and NIST SP 800-115 are common frameworks. A vendor who can't articulate their methodology is running ad hoc assessments.
Sample reports: Request a redacted sample report. The quality of findings — severity rationale, reproduction steps, remediation guidance — predicts the quality of your report.
References from companies at your scale: Enterprise pentesting firms aren't necessarily better for startups. Ask for references from 50–200 person SaaS companies with similar architecture.
Communication during the engagement: The best firms provide daily check-ins or a shared communication channel during active testing. You should know what they're working on and hear preliminary findings before the report is finalized.

Price ranges (2026):

Web application (gray box, 1 week): $8,000–$20,000
External network and web application combined: $15,000–$35,000
Mobile application (iOS or Android): $10,000–$25,000
Red team engagement (multi-week, full attack chain): $40,000–$100,000+

What a Good Report Contains

A pentest report should be actionable, not just impressive. The deliverable includes:

Executive summary: A 1–2 page narrative suitable for leadership and board. No technical jargon. Covers scope, testing period, overall risk assessment, and highest-priority findings.

Finding details: For each vulnerability:

Title and severity rating (Critical / High / Medium / Low / Informational)
Affected asset
Detailed description and root cause
Proof of concept (steps to reproduce)
Impact if exploited
Remediation recommendation
References (CVE, CWE, OWASP)

Findings summary table: A spreadsheet-friendly summary that your engineering team can use to track remediation.
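
If the vendor supplies findings in a structured form, the summary table can be generated rather than retyped. The sketch below is illustrative only: the function name findings_to_csv, the column names, and the owner and status fields are assumptions added for tracking, not part of any vendor's report format.

```python
import csv
import io

# Severity scale used in the finding details, most urgent first.
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Informational"]

# Hypothetical column set; owner and status are tracking fields, not report fields.
COLUMNS = ["title", "severity", "affected_asset", "owner", "status"]


def findings_to_csv(findings: list) -> str:
    """Return the findings as CSV text, sorted from most to least severe."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    buffer = io.StringIO()
    # Extra keys such as a long description are ignored; missing keys are left blank.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


# Made-up example findings, for illustration only.
example_findings = [
    {"title": "Verbose error pages", "severity": "Low", "affected_asset": "app.yourcompany.com"},
    {"title": "Broken access control on invoices", "severity": "Critical",
     "affected_asset": "api.yourcompany.com", "description": "not exported"},
]
summary_csv = findings_to_csv(example_findings)
```

The resulting text opens in any spreadsheet tool and lists the Critical finding before the Low one.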

If the report is just a list of vulnerabilities without reproduction steps or remediation guidance, send it back.

The Remediation Workflow

A pentest report without a remediation plan is just documentation of your risk. Structure the remediation process:

Triage findings within 48 hours: Assign each finding to a team owner and validate severity. The tester's severity rating is a suggestion — your team understands business context.
Fix Critical and High findings first: These represent exploitable vulnerabilities with material impact. Target remediation within 2–4 weeks.
Schedule Medium findings into the sprint: Medium findings should have a fix committed within 60–90 days.
Treat Low and Informational findings as a backlog: These are improvements, not urgencies. Review them quarterly.
Track remediation in your issue tracker: Don't manage pentest findings in a spreadsheet divorced from your engineering workflow. Create issues in GitHub, Jira, or Linear so they get sprint-level prioritization.
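
The target windows above can be turned into concrete due dates when issues are created. This is a rough sketch under stated assumptions: it uses the outer bound of each window (4 weeks, 90 days), the function names are hypothetical, and it treats Low findings like Informational ones (no deadline, quarterly review), which is a judgment call on our part.

```python
from datetime import date, timedelta
from typing import Optional

TRIAGE_DAYS = 2  # triage within 48 hours of receiving the report

# Outer bound of each remediation window, in days.
REMEDIATION_DAYS = {
    "Critical": 28,  # 2 to 4 weeks
    "High": 28,      # 2 to 4 weeks
    "Medium": 90,    # 60 to 90 days
}


def triage_due(report_received: date) -> date:
    """Date by which every finding should have an owner and a validated severity."""
    return report_received + timedelta(days=TRIAGE_DAYS)


def remediation_due(severity: str, report_received: date) -> Optional[date]:
    """Return the fix deadline, or None for backlog items reviewed quarterly."""
    days = REMEDIATION_DAYS.get(severity)
    if days is None:
        return None  # assumed: Low and Informational go to the backlog
    return report_received + timedelta(days=days)
```

The returned dates can be set as due dates on the corresponding GitHub, Jira, or Linear issues, so overdue findings surface in the same views as other engineering work.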

Retesting: Don't Skip It

Most vendors offer a retest at reduced cost (typically 25–50% of original engagement price) where they verify that specific findings have been remediated. This is worth doing for Critical and High findings.
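
For budgeting, the retest range is simple arithmetic on the original price. The helper below is a hypothetical sketch using the 25–50% figure above; actual retest pricing depends on the vendor and on how many findings are rechecked.

```python
def retest_cost_range(original_price: float, low: float = 0.25, high: float = 0.50) -> tuple:
    """Return the (minimum, maximum) expected retest cost for an engagement."""
    return (original_price * low, original_price * high)


# Example: a $15,000 engagement implies a retest budget of $3,750 to $7,500.
low_estimate, high_estimate = retest_cost_range(15_000)
```

Including this range in the initial budget request avoids a second approval cycle when the retest comes due.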

Retesting serves two purposes: it confirms your fix actually addresses the root cause (not just the symptom), and it produces a document showing that known vulnerabilities were remediated, which is useful for SOC 2 audits, security questionnaires, and M&A due diligence.

Running Your Second Pentest Differently

After your first pentest, you know your weaknesses. Use that knowledge to make subsequent tests more valuable:

Rotate vendors every 2–3 years to get fresh eyes
Scope subsequent tests to areas that changed significantly (new product features, new infrastructure)
Consider moving from gray-box to red team as your detection and response capability matures — the question evolves from "can they get in?" to "can we catch them?"