What is continuous offensive security testing (COST) and how does it relate to CTEM?

Zoran Gorgiev, Gavin Sutton

Table of contents

Why CTEM needs continuous validation
What is continuous offensive security testing (COST)?
What are the four phases of COST?
How does COST relate to CTEM?
What changes when teams adopt COST?
What are the common mistakes when implementing COST?
Why COST and CTEM matter for API-based systems and applications
The future of CTEM depends on continuous validation

Continuous threat exposure management (CTEM) offered a way out of the vulnerability-chasing treadmill. Rank your exposures by business impact. Validate the ones that matter. Fix them. Repeat.

However, in practice, many CTEM programs hit the same wall: the validation step. Teams know something looks risky. They do not know whether an attacker could turn it into a breach. And the traditional way to find out, booking a pentest six weeks from now, does not fit into a framework meant to operate at the speed of today’s attack surface change.

This is the problem that continuous offensive security testing (COST) addresses. Gartner introduced the term in its March 2026 research note, “The Future of Pen Testing Is Continuous Offensive Security Testing” (Poole et al., 2026).

COST is an approach that is gaining attention across the cybersecurity industry: an operating model in which offensive testing begins when something material changes, rather than when the calendar dictates.

In this article, we will examine the nature and importance of COST, how it naturally integrates with CTEM, and what the model means for teams that ship software frequently.

Why CTEM needs continuous validation

Gartner defined CTEM in 2022 as a cycle with five stages:

Scope
Discover
Prioritize
Validate
Mobilize

5 phases of CTEM (continuous threat exposure management)

The first three stages are now well understood. Attack surface management tools map assets. Vulnerability scanners surface findings. Risk engines rank them. All of this runs continuously in modern security stacks.

The fourth stage, validate, is where things get tricky. Validation answers the following question: given this exposure, can an attacker exploit it in our environment, with our controls, against our data?

A CVE score cannot answer this question, and neither can a vulnerability scanner. Offensive testing can: it chains flaws together, tries to get past the controls you have in place, and shows whether an attacker can actually reach a valuable production asset.

Here is the problem: CTEM runs continuously, but traditional penetration testing does not. You get one point-in-time engagement a year, maybe two. Everything that changes between assessments (each new deployment, microservice, and identity change) goes unvalidated. Each interval between tests is an unvalidated exposure window, and that is exactly where attackers operate.

COST closes this gap.

What is continuous offensive security testing (COST)?

Continuous offensive security testing (COST) is an operating model for offensive security that validates an organization’s security defenses through trigger‑driven, adversary‑based testing as environments and threats change.

Instead of scheduling tests on a calendar, you define the events that automatically start a test. For each trigger, you specify which testing method runs, at what depth, and on what timeline.

When those events happen, testing begins promptly. It runs inside a governed execution window. Results feed back into your remediation workflows.

Among the usual triggers are:

A new deployment in CI/CD
A configuration change in a cloud account
A threat-intelligence alert regarding a technology you use
A zero day affecting your software
An asset appearing in your external attack surface
A change to authorization logic or identity rules
A CTEM prioritization engine flagging an exposure as critical
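A trigger rule can be written down as plain data. The following Python sketch is purely illustrative: the event names, method labels, depths, and start windows are hypothetical placeholders, not values prescribed by Gartner or by any product. It shows the shape of a rule (which method runs for which trigger, at what depth, and how soon) and a lookup that returns no rule for events that should not start a test.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class TriggerRule:
    """A hypothetical COST rule: what to run, how deep, and how soon."""
    method: str              # e.g. "red_team", "pentest", "control_validation"
    depth: str               # e.g. "deep", "targeted"
    start_within: timedelta  # maximum delay from trigger to test start


# Illustrative rule table keyed by hypothetical event names that mirror the
# trigger list above; names and windows are placeholders to tune per environment.
RULES: Dict[str, TriggerRule] = {
    "zero_day_affecting_software": TriggerRule("red_team", "deep", timedelta(hours=1)),
    "ctem_critical_exposure": TriggerRule("pentest", "deep", timedelta(hours=2)),
    "new_external_asset": TriggerRule("pentest", "targeted", timedelta(hours=2)),
    "authz_or_identity_change": TriggerRule("control_validation", "targeted", timedelta(hours=2)),
    "threat_intel_alert": TriggerRule("control_validation", "targeted", timedelta(hours=2)),
    "ci_cd_deployment": TriggerRule("pentest", "targeted", timedelta(hours=8)),
    "cloud_config_change": TriggerRule("control_validation", "targeted", timedelta(hours=8)),
}


def plan_test(event_type: str) -> Optional[TriggerRule]:
    """Return the rule for an event, or None if the event should not start a test."""
    return RULES.get(event_type)
```

Calling `plan_test("ci_cd_deployment")` returns the deployment rule, while an event with no entry returns `None`, which is one simple way to keep unlisted events from starting tests.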

COST pulls together testing methods your team already knows: penetration testing, red teaming, bug bounty, and control validation. It does not replace them, and it is not a separate product category. Think of it as a way to organize offensive security so that testing happens when risk changes, not six months later.

What are the four phases of COST?

4 phases of COST

Gartner describes COST as a cycle comprising four phases:

Target definition: What are we testing, and why is it worth testing right now?
Plan: Which method, which attacker objectives, and what depth?
Execute: Run the test with a mix of automation, AI, and human reasoning.
Report: Turn findings into actions that the right team can pick up immediately.

This process runs constantly. It does not start in January and end in December. It fires whenever a trigger occurs and feeds back into CTEM’s mobilization step.

Triggers fall into one of the following three buckets:

High risk: Externally reachable change, privilege-path change, threat-aligned indicator, or CTEM-critical exposure. Testing starts as soon as possible (within one to two hours at the latest).
Medium risk: Internal posture changes that do not widen the external attack surface. Testing runs within the same business day.
Low risk: Routine updates with minimal exposure impact. Automated checks handle these on a rolling basis.

In short, the urgency and the depth of testing are proportional to risk severity, with higher-risk exposures requiring more extensive and immediate attention.

For instance:

A zero day against a public-facing service would trigger red team-level work.
A rewritten Identity and Access Management (IAM) policy would initiate control validation to check whether those changes introduced new vulnerabilities.
A minor library bump would run through automated checks without waking anyone up.
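The tier rules above can be expressed as a small classification function. This Python sketch assumes a hypothetical `Trigger` record with boolean flags for the risk signals the tiers name, and it simplifies "same business day" to "by the end of the same calendar day"; a real implementation would use your own business calendar. Low-risk triggers get no fixed deadline because automated checks handle them on a rolling basis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Tier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Trigger:
    """Hypothetical trigger record; each flag mirrors a signal named in the tier rules."""
    fired_at: datetime
    externally_reachable: bool = False
    privilege_path_change: bool = False
    threat_aligned: bool = False
    ctem_critical: bool = False
    internal_posture_change: bool = False


def classify(t: Trigger) -> Tier:
    """Check signals from most to least severe; any high-risk signal wins."""
    if (t.externally_reachable or t.privilege_path_change
            or t.threat_aligned or t.ctem_critical):
        return Tier.HIGH
    if t.internal_posture_change:
        return Tier.MEDIUM
    return Tier.LOW


def start_deadline(t: Trigger) -> Optional[datetime]:
    """Latest acceptable test start time, or None for rolling automated checks."""
    tier = classify(t)
    if tier is Tier.HIGH:
        # Upper bound of the "one to two hours" window.
        return t.fired_at + timedelta(hours=2)
    if tier is Tier.MEDIUM:
        # Simplification: end of the same calendar day stands in for "same business day".
        return t.fired_at.replace(hour=23, minute=59, second=59, microsecond=0)
    return None
```

A trigger flagged as both externally reachable and an internal posture change still lands in the high tier, because the checks run from most to least severe.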

How does COST relate to CTEM?

COST and CTEM

The cleanest way to frame the relationship is to say that CTEM is the program, and COST is how you validate inside it.

CTEM’s fourth stage asks, “Is this exposure exploitable, and what does it mean for my business?” COST makes it possible for you to answer that question continuously. Findings from COST flow into CTEM’s fifth stage (mobilization), so that confirmed, exploitable weaknesses get routed to the right asset owners with context and evidence.

Without COST, CTEM validation defaults to periodic security tests. That leaves the same exposure windows CTEM was meant to close wide open for threat actors to exploit. With COST, the validation stage runs at the same cadence as discovery and prioritization.

Two implications follow from this understanding:

CTEM without continuous validation is incomplete. It is traditional vulnerability management under a new name.
COST without CTEM risks drifting into alert noise and fatigue. Triggers fire, tests run, and reports keep piling up. What is missing is the logic that continually tells you which exposures are exploitable, which carry significant business risk for your organization and industry, and which findings to act on first.

What changes when teams adopt COST?

Teams that adopt a continuous offensive security testing model make three concrete changes to their operations.

Triggers replace schedules

The list of “penetration tests to schedule this quarter” becomes a list of “events that should automatically start a test.” That maps onto DevOps signals your team already emits: deployment hooks, Terraform plans, identity changes, and CI/CD pipeline completions. The security team stops worrying about schedules and calendars and starts writing trigger rules.
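One of the DevOps signals listed above, a Terraform plan, can be turned into a trigger with very little code. The sketch below reads the JSON produced by `terraform show -json <planfile>`, whose `resource_changes` entries carry each resource's `address`, `type`, and planned `change.actions`. The watchlist of resource types is a hypothetical AWS example; which types deserve a test is a decision for your own trigger rules.

```python
import json
from typing import Any, Dict, List

# Hypothetical watchlist: AWS resource types whose changes should start a test.
WATCHED_TYPES = {"aws_iam_policy", "aws_iam_role", "aws_security_group", "aws_lb_listener"}

# Planned actions that do not modify infrastructure.
NON_CHANGING = (["no-op"], ["read"])


def load_plan(path: str) -> Dict[str, Any]:
    """Load a plan exported with `terraform show -json tfplan > plan.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def triggers_from_plan(plan: Dict[str, Any]) -> List[str]:
    """Return addresses of watched resources that the plan would actually change."""
    hits = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if rc.get("type") in WATCHED_TYPES and actions and actions not in NON_CHANGING:
            hits.append(rc["address"])
    return hits
```

Running this as a pipeline step after `terraform plan` would let an IAM or network change start a test before it is applied, rather than after the next scheduled engagement.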

Risk tiers replace static scope documents

Scope negotiations used to take weeks. With a tier model — that is, the security testing model based on risk severity we discussed earlier — you define the tiers once. After that, each trigger routes itself to the right methodology and execution window.

This makes your organization far more responsive. As Gartner itself notes, COST reduces trigger-to-start times from weeks to minutes.

Outcomes replace test counts

Success stops looking like “we ran 12 penetration tests this year” and starts taking the form of:

Average time from trigger occurrence to validation start
Reduction in exposure time for each asset type
Rate of SLA adherence for critical triggers
Improved detection coverage based on validation findings

This measurement frame aligns offensive security with the metrics CISOs already report to the board, such as risk reduction, resilience, and time-to-remediate.
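The first and third metrics listed above are simple arithmetic over test records. This Python sketch assumes a hypothetical `TriggeredRun` record holding each trigger's tier, when it fired, and when testing started. The two-hour SLA for critical (high-tier) triggers is an assumption borrowed from the upper bound of the high-risk window described earlier, not a universal standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TriggeredRun:
    """Hypothetical record of one triggered test."""
    tier: str               # "high", "medium", or "low"
    triggered_at: datetime
    started_at: datetime


def average_trigger_to_start(runs: List[TriggeredRun]) -> Optional[timedelta]:
    """Mean delay between a trigger firing and its test starting."""
    if not runs:
        return None
    total = sum((r.started_at - r.triggered_at for r in runs), timedelta())
    return total / len(runs)


def sla_adherence(runs: List[TriggeredRun], tier: str = "high",
                  sla: timedelta = timedelta(hours=2)) -> Optional[float]:
    """Share of runs in the given tier whose test started within the SLA."""
    relevant = [r for r in runs if r.tier == tier]
    if not relevant:
        return None
    met = sum(1 for r in relevant if r.started_at - r.triggered_at <= sla)
    return met / len(relevant)
```

With two high-tier runs that started 30 minutes and 3 hours after their triggers, `sla_adherence` returns 0.5, which is the kind of number that maps directly onto a board-level resilience report.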

What are the common mistakes when implementing COST?

COST is not a plug-and-play solution. Knowing up front what can go wrong during implementation can save you months of rework.

Trigger sprawl

The first instinct after reading about COST is to wire up every possible event: every deployment, config change, CVE alert, and new asset. Within weeks, tests will be firing constantly, reports will be stacking up, and nobody will be reading them.

The best strategy is to start small. Pick a few high-risk triggers connected to your most valuable assets. Make sure the process works from start to finish before expanding. Only add triggers to your list if they prove they’re necessary.

Findings with no clear owner

A test that confirms an exploitable authorization flaw is only useful if someone fixes it. COST results must land in the backlog of the team that owns the affected service rather than in a shared inbox or a generic security queue.

Before turning on a single trigger, map each asset class to an owner and a ticketing destination. If you can’t answer “who gets paged when this fires?”, you are not ready to fire it yet.
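The ownership check can be enforced in code before a trigger is enabled. In this minimal Python sketch, the asset classes, team names, and ticket queues are hypothetical. The point is that routing a finding for an unmapped asset fails loudly instead of falling into a shared inbox, and that a pre-flight check lists the assets whose triggers are not ready to fire.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Owner:
    team: str          # hypothetical owning team
    ticket_queue: str  # hypothetical ticketing destination


# Hypothetical mapping from asset class to owner and ticket destination.
OWNERS: Dict[str, Owner] = {
    "payments-api": Owner("payments-team", "PAY"),
    "identity-service": Owner("iam-team", "IAM"),
}


def route_finding(asset_class: str) -> Owner:
    """Return where a finding should land; raise instead of using a shared inbox."""
    owner = OWNERS.get(asset_class)
    if owner is None:
        raise LookupError(f"No owner mapped for asset class {asset_class!r}")
    return owner


def not_ready_to_fire(trigger_assets: Iterable[str]) -> List[str]:
    """Asset classes covered by triggers that still have no owner mapping."""
    return [a for a in trigger_assets if a not in OWNERS]
```

Running `not_ready_to_fire` over the asset classes a new trigger covers, before enabling it, is one way to answer the "who gets paged?" question in advance.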

Risk tiers that are never updated

The risk tier model can work beautifully on day one, only to quietly rot over the following six months. New services launch, architectures shift, a side project becomes revenue-critical.

Tiers written against last year’s environment route today’s triggers to the wrong depth of testing. Yes, we said you define tiers once, but like almost everything else in security, they need maintenance. Treat the tier model as a living document and review it at least every quarter.

Mistaking automation for validation

Automated checks are great for the low-risk tier. They are not a substitute for the chained, business-logic-aware testing that catches the vulnerabilities attackers exploit in the wild: Broken Object Level Authorization (BOLA), Broken Function Level Authorization (BFLA), Broken Object Property Level Authorization (BOPLA), and, more generally, logic flaws in multi-step workflows.

Teams that rely heavily on scanners essentially still work within the limitations of traditional vulnerability management. One of the reasons agentic AI testing exists is to overcome these exact limitations by emulating the depth of a human pentester’s reasoning, at the cadence COST demands.

Security teams defining COST rules alone

COST lives on DevOps signals — deployment hooks, IaC plans, and pipeline events. If security writes trigger rules in isolation, engineering feels tested at rather than tested with. The programs that work treat trigger definition as a joint undertaking, with platform engineering at the table from day one.

View these warnings as predictable failure modes of any COST implementation. At the same time, keep in mind that every one of them is solvable with a thoughtful rollout and the right tooling underneath.

Why COST and CTEM matter for API-based systems and applications

Equixly’s platform was built for API-driven architectures, and COST is the clearest description yet of what continuous offensive security looks like in practice for API-first architectures.

APIs are a harsh test case for the traditional vulnerability management model. In some organizations, they can go through multiple changes a day. Their attack surface expands with every new endpoint, integration, microservice, and MCP implementation.

Authorization logic, the biggest class of API vulnerabilities per the OWASP API Security Top 10, is in constant motion. A pentest from last quarter almost certainly does not reflect the API security posture today.

Continuous penetration testing on APIs is a natural fit for COST’s trigger-driven model:

A new endpoint shipping to production is a trigger.
An updated OpenAPI specification is a trigger.
A new third-party integration is a trigger.
A CVE against a language runtime you use is a trigger.

Equixly’s Agentic AI Hacker runs tests when these events occur. It chains requests, probes business logic, validates authorization boundaries, and surfaces evidence of exploitability rather than theoretical risk. The findings flow into CTEM’s mobilization phase with the context and evidence that asset owners need to act.

This AI penetration testing approach compresses the validation cycle for exposures in API-based systems, including web applications and GenAI applications, from weeks to hours, and reduces the associated costs.

And that is precisely what COST asks for at the high-risk tier.
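Detecting two of the API triggers listed above, a new endpoint and an updated OpenAPI specification, can start with a simple diff of OpenAPI documents. The sketch below is a hypothetical illustration, not how Equixly's platform works: it takes two OpenAPI 3.x specifications already loaded as Python dictionaries and returns the operations (HTTP method plus path) present only in the newer one. Each new operation could then be raised as a trigger for testing.

```python
from typing import Any, Dict, Set, Tuple

# HTTP methods that OpenAPI 3.x allows as keys of a Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations(spec: Dict[str, Any]) -> Set[Tuple[str, str]]:
    """Collect (METHOD, path) pairs, skipping non-operation keys such as 'parameters'."""
    ops = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:
                ops.add((key.upper(), path))
    return ops


def new_operations(old_spec: Dict[str, Any], new_spec: Dict[str, Any]) -> Set[Tuple[str, str]]:
    """Operations that exist in the new spec but not in the old one."""
    return operations(new_spec) - operations(old_spec)
```

This sketch only covers additions. Changed request schemas or security requirements on existing operations would need their own comparison, since authorization changes often hide there.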

The future of CTEM depends on continuous validation

The attack surface does not pause for quarterly reviews. Every new deployment, configuration change, or endpoint is a potential exposure waiting to be validated. CTEM gave you a framework for managing security exposure continuously. COST gives you the operational model to act on it.

For security leaders, this is a major step forward. Offensive testing stops being a scheduled activity and instead responds to security issues as they emerge. That is a material improvement in security posture, and one that compounds over time as trigger rules mature, tier definitions sharpen, and remediation workflows tighten around confirmed, evidence-backed findings.

Organizations that build this capability now, ahead of what will likely become a compliance expectation, will carry a measurable advantage in resilience, response time, and board-level risk reporting.

Gartner’s direction is clear. The gap between where most programs operate today and where the discipline is heading is closing fast, and the cost of waiting is measured in unvalidated exposure windows.

Curious how COST works in practice?

Talk to us.

FAQs

Does COST mean we stop running periodic penetration tests?

Not necessarily. Some compliance requirements, such as [DORA Article 26](https://equixly.com/blog/2024/05/15/the-dora-regulation-and-api-security-testing/), call for scheduled engagements and deep human-led testing like TLPT (threat-led penetration testing). In such cases, organizations should keep both: run the mandated periodic tests for compliance purposes and use COST to cover the exposure windows between them.

Can a small team with limited resources implement COST?

One of the primary functions of an AI penetration testing platform, such as Equixly, is to [augment security teams](https://equixly.com/blog/2025/10/20/equixly-for-security-teams/). Even a two- or three-person security team can run COST when an agentic platform handles the continuous offensive testing. Equixly’s Agentic AI Hacker runs unattended across your API-based environment, chains attacks the way a human pentester would, and operates at a scale that makes continuous validation a practical and viable reality for any team.

How does Equixly support COST in a CTEM program?

Equixly is the continuous offensive testing capability that sits where CTEM validation meets COST execution. When Equixly’s Agentic AI Hacker confirms an exploitable issue in your API-based application or system during testing, it feeds the finding, with proof-of-concept evidence, into CTEM’s mobilization stage and routes it to the right asset owners. [Security testing](https://equixly.com/blog/2024/07/15/guide-to-api-security-testing/) runs on events such as new [CI/CD deployments](https://equixly.com/blog/2026/04/20/how-to-build-api-security-into-your-ci-cd-pipeline-a-devsecops-playbook/) or API changes, keeping CTEM validation current as your environment changes. In short, if you’re working within a CTEM program, applying COST with an agentic AI solution like Equixly lets you bridge the gap between periodic tests and [continuous security validation](https://equixly.com/blog/2025/09/22/ai-vs-ai-llms-apis-agents/).

Zoran Gorgiev

Technical Content Specialist

Zoran is a technical content specialist with SEO mastery and practical knowledge of cybersecurity and web technologies. He has rich international experience in content and product marketing, helping both small companies and large corporations implement effective content strategies and attain their marketing objectives. He applies his philosophical background to his writing to create intellectually stimulating content. Zoran is an avid learner who believes in continuous learning and never-ending skill polishing.

Gavin Sutton

Head of Marketing

Gavin is a marketing leader with more than a decade of experience in the cybersecurity industry, helping startups and scale-ups grow internationally. He has a passion for working with disruptive technology companies that can reshape the security landscape with their innovative solutions.

04/24/26 AI Penetration Testing