Why companies are moving to a COST (continuous offensive security testing) model
Zoran Gorgiev, Gavin Sutton
In March 2026, Gartner published an article titled “The Future of Pen Testing Is Continuous Offensive Security Testing.”
In it, the tech research firm gave a formal name to a security testing model some organizations had already been quietly applying: COST (continuous offensive security testing). It also gave the enterprise market and our industry a shared language for a change that many of us already knew was coming.
We covered COST and its mechanics elsewhere. The question we’re investigating here is why companies are making this move toward COST, and why now, rather than three or four years ago.
We strongly believe that most of them held on to annual or periodic penetration testing in the past, not because they believed it was sufficient, but because there was no practical alternative, and especially not at scale. The tools, operational models, and automation required for continuous offensive testing simply didn’t exist in suitable forms at the time.
But now they do. And four organizational pressures are making the transition to COST inevitable.
Why CISOs are turning to COST
Traditional penetration testing broke down slowly, then all at once. Here’s what changed and why CISOs are turning to COST.
Annual pentesting leaves organizations acting on outdated data
The main purpose of a pentest report is to guide consequential decisions. The report:
- Shapes remediation priorities
- Supports risk tracking
- Feeds board reporting
- Gives leadership confidence in the environment’s security posture
However, if the report is refreshed only every 12 months, you have a problem. It remains the reference point for resource allocation, remediation sequencing, and risk acceptance decisions long after it has stopped accurately reflecting your environment.
This drawback makes annual testing more than just a problem of frequency. Quarterly testing can reduce the gap, but it does not solve the underlying issue. It is still periodic testing, with the same structural limits and many of the same problems.
COST, on the other hand, removes this problem precisely by being continuous. When a deployment changes the authorization logic on a high-value API, a triggered test either validates that the change is safe or produces a new finding. Either way, the update happens within hours, not next year.
When the risk signal updates every time the risk itself changes, the downstream effect on decision quality is substantial:
- Remediation teams work from current findings rather than a backlog inherited from a point-in-time assessment.
- Risk owners make prioritization calls based on current exposures.
- CISOs report to the board on what is happening in production and development right now, rather than on what was true when the last test was scoped.
In short, COST changes the underlying logic: from a schedule-driven snapshot to a change-driven, ongoing picture of what is and isn’t validated.
Hiring can’t scale security testing
An ISACA 2025 study found that 55% of cybersecurity teams are understaffed, and 65% have unfilled roles. As is often argued, offensive security expertise sits at the scarce end of this gap. There simply aren’t enough qualified offensive security professionals, and by extension, penetration testers, to meet market demand.
This limitation is inherently structural. Training pipelines cannot possibly produce offensive security talent fast enough to close a gap of that size — a gap that has been widening for years. Accordingly, organizations that try to scale their security testing program through hiring alone are working against a supply constraint that isn’t going away anytime soon.
AI agents change this situation. An AI penetration testing platform allows you to cover a massive, frequently changing enterprise environment without a matching increase in headcount.
The operational infrastructure for COST, which didn’t exist at a viable cost or scale three years ago, now does. The market reflects this: the pentesting market, valued at $2.4–2.7 billion in 2025, is projected to reach $5.5–7.4 billion by the early 2030s. That growth is driven largely by the move toward continuously delivered testing, which AI has made possible.
Modern attack surfaces outgrow any fixed test scope
Traditional penetration tests are scoped based on what an organization knows about its systems at a specific point in time. That’s a reasonable starting point. However, the modern application stack has greatly outgrown it.
Most production environments now rely on components your organization didn’t write and doesn’t fully control:
- Third-party APIs
- AI inference endpoints
- Open-source libraries
- Identity providers
- Cloud-managed services
- MCP servers that wire AI agents to live business data
Each of these extends the attack surface beyond the boundary of the scope document.
That matters immensely, as some of the most consequential security incidents of recent years didn’t originate in poorly tested systems. They occurred in supply chain components that were out of the known scope; the affected component simply wasn’t on anyone’s radar when the incident happened.
A periodic human-led testing model lacks a mechanism to handle this state. If your team integrates a new AI API in March and the next scheduled test is in September, the integration will remain untested for six months.
But under a COST model, a new third-party integration can itself be a trigger. A CVE published against a dependency in your stack can also be a trigger. The scope keeps expanding as the attack surface grows; it doesn’t stay fixed to what you know at one point in time.
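The trigger behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TestScope`, `Event`, and `handle_event` names, the three event kinds, and the rule that a matching CVE retests every in-scope asset are all hypothetical and do not describe any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class TestScope:
    """Hypothetical model of what is currently covered by testing."""
    assets: set = field(default_factory=set)        # APIs, services, integrations
    dependencies: set = field(default_factory=set)  # libraries and frameworks in the stack


@dataclass(frozen=True)
class Event:
    kind: str    # "deployment", "new_integration", or "cve_published"
    target: str  # asset name, or dependency name for a CVE


def handle_event(scope: TestScope, event: Event) -> list:
    """Return the assets to test for this event, growing the scope when needed."""
    if event.kind in ("deployment", "new_integration"):
        # A changed or newly added asset joins the scope and is tested right away.
        scope.assets.add(event.target)
        return [event.target]
    if event.kind == "cve_published":
        # A CVE triggers testing only if the affected dependency is in the stack.
        # Simplifying assumption: retest every asset currently in scope.
        if event.target in scope.dependencies:
            return sorted(scope.assets)
        return []
    raise ValueError(f"unknown event kind: {event.kind}")


scope = TestScope(assets={"billing-api"}, dependencies={"example-framework"})
to_test_now = handle_event(scope, Event("new_integration", "ai-inference-endpoint"))
after_cve = handle_event(scope, Event("cve_published", "example-framework"))
```

The point of the sketch is that the scope is a mutable set updated by events, not a document fixed at the start of an engagement.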
That’s a structural difference in how you define and capture security risk. And it becomes even more meaningful as supply chains grow more complex, and AI components become standard parts of production architectures.
Record vulnerability volumes make periodic testing obsolete
The number of new vulnerabilities is now so high that the time between periodic penetration tests is becoming an unmanageable exposure window.
FIRST projects a median forecast of 59,427 CVEs for 2026, which would make it the first year on record to exceed 50,000. NIST also reported that CVE submissions in the first three months of 2026 were almost one-third higher than in the same period in 2025.
Each disclosure is a potential exposure event for any organization running the affected software. For organizations on annual or quarterly test cycles, the vast majority of those disclosures fall into a gap between assessments. They remain unvalidated and unaddressed, and nobody knows whether they affect anything in the live environment.
Moreover, exploitation doesn’t wait for defenders to catch up. Mandiant’s M-Trends 2026 report puts the mean time to exploit at -7 days, indicating that attackers exploit vulnerabilities before public disclosure in more and more cases.
Continuous testing addresses this head-on:
- When a CVE affecting a framework in the stack becomes public, COST fires a test. The organization knows within hours whether the vulnerability is present and exploitable in its environment.
- Since a good AI-based continuous security testing platform doesn’t depend exclusively on known vulnerabilities — for reference, see “Can AI identify 0-days” — it can also help you detect weaknesses before the security community knows they exist.
That’s what systematically driving down exposure windows looks like in practice: shortening the time between when a vulnerability appears and when the organization learns about it, and then eliminating the vulnerability.
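The exposure window discussed above comes down to simple date arithmetic. The sketch below is a rough illustration, not a measurement: the `exposure_days` function and all dates are made up for the example, and it counts whole days from a disclosure to the first test that could validate it.

```python
from datetime import date
from typing import Optional


def exposure_days(disclosed: date, test_dates: list) -> Optional[int]:
    """Days from disclosure to the first test on or after it, or None if there is none."""
    upcoming = [t for t in test_dates if t >= disclosed]
    if not upcoming:
        return None
    return (min(upcoming) - disclosed).days


disclosed = date(2026, 3, 1)  # hypothetical disclosure date

# Periodic schedules (hypothetical dates).
annual = [date(2026, 9, 1)]
quarterly = [date(2026, 1, 15), date(2026, 4, 15), date(2026, 7, 15), date(2026, 10, 15)]

# Change-driven testing: assume a test fires the day after disclosure.
triggered = [date(2026, 3, 2)]

annual_gap = exposure_days(disclosed, annual)        # 184 days
quarterly_gap = exposure_days(disclosed, quarterly)  # 45 days
triggered_gap = exposure_days(disclosed, triggered)  # 1 day
```

Quarterly testing narrows the gap but still ties it to the calendar, while the triggered case ties it to the disclosure itself.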
What it means when Gartner names a model
When Gartner formally defines a security category, practical applications follow for the enterprise market:
- Procurement teams use the vocabulary in their RFPs (requests for proposal).
- Budget owners use it to build internal business cases.
- The category name serves as a shared reference point for legal, compliance, and the CISO’s office.
Security leaders who have been running informal versions of continuous penetration testing for two or three years now have practically standardized terminology for what they do. Those who haven’t started yet have a clear reference point for the conversation with their boards and CFOs, backed by renowned analysts.
Organizations that develop this capability before it becomes a norm will have an edge in being ready for audits, guaranteeing the quality of risk evidence, and reporting to the board. The gap between early adopters and the industry norm is already narrowing. If you wait to catch up, it will likely cost more than if you take action now.
Making COST practical for API-based environments

Equixly was built specifically for API-first environments, where COST is hardest to implement with traditional tooling. Its Agentic AI Hacker can:
- Run continuously both in production and development.
- Run on deployment and configuration changes.
- Chain authorization bypasses and business logic attacks the way a skilled human pentester would.
- Route confirmed, exploitable findings to remediation owners with the evidence they need to act.
For mature security teams, Equixly provides a route to making offensive security continuous, event-driven, and embedded in the ways modern applications are built and operated in practice.
See COST in action for API-first environments. Book a demo to get started with Equixly.
FAQs
How do you build a business case for COST to a CFO who views it as just extra spending on security?
Show the total annual cost of testing from start to finish, including:
- The coverage the testing provides
- Time spent fixing issues
- Evidence preparation for review
- Audit preparation
- Breach risk accumulated over an overly long exposure window
Then, demonstrate how continuous testing infrastructure replaces that recurring overhead with a current evidence trail.
What is the difference between a COST program and running penetration tests more frequently?
More frequent testing still follows a calendar. COST, by contrast, fires tests in response to concrete changes in your environment. Testing effort therefore focuses on where and when security risk materializes, rather than on a fixed schedule that ignores what has changed.
If COST runs tests based on triggers, how can you prevent it from running during sensitive release windows?
Set clear COST governance rules ahead of time. That includes scheduling execution times, setting blackout periods, and deciding how deep to test. By doing this, you can pause automated testing or run it in a less disruptive way during maintenance, major launches, or other important business periods.
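As a rough sketch of what such governance rules could look like in code, the example below checks a triggered test against a list of blackout windows. The `BlackoutWindow` type, the `decide_run_mode` function, and the three modes ("full", "light", "pause") are assumptions made for illustration, not features of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BlackoutWindow:
    start: datetime
    end: datetime
    mode: str  # "pause" to skip testing, "light" to run a less disruptive test


def decide_run_mode(now: datetime, windows: list) -> str:
    """Return "full", "light", or "pause" for a test triggered at `now`."""
    active = [w.mode for w in windows if w.start <= now < w.end]
    if "pause" in active:
        return "pause"  # the most restrictive active rule wins
    if "light" in active:
        return "light"
    return "full"


# Hypothetical rules: a release freeze and a longer maintenance period.
windows = [
    BlackoutWindow(datetime(2026, 6, 1, 0, 0), datetime(2026, 6, 2, 0, 0), "pause"),
    BlackoutWindow(datetime(2026, 6, 1, 0, 0), datetime(2026, 6, 8, 0, 0), "light"),
]

during_freeze = decide_run_mode(datetime(2026, 6, 1, 12, 0), windows)
during_maintenance = decide_run_mode(datetime(2026, 6, 3, 12, 0), windows)
normal_day = decide_run_mode(datetime(2026, 6, 10, 12, 0), windows)
```

A paused trigger would typically be queued and replayed once the window closes, so the change is still tested rather than silently skipped.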
Zoran Gorgiev
Technical Content Specialist
Zoran is a technical content specialist with SEO mastery and practical cybersecurity and web technologies knowledge. He has rich international experience in content and product marketing, helping both small companies and large corporations implement effective content strategies and attain their marketing objectives. He applies his philosophical background to his writing to create intellectually stimulating content. Zoran is an avid learner who believes in continuous learning and never-ending skill polishing.
Gavin Sutton
Head of Marketing
Gavin is a marketing leader with more than a decade of experience in the cybersecurity industry, helping startups and scale-ups grow internationally. He has a passion for working with disruptive technology companies that can reshape the security landscape with their innovative solutions.