Cloud security has a last-mile problem.
The industry has spent billions making detection faster and smarter. CNAPPs, CSPMs, and vulnerability scanners surface thousands of findings every day. But visibility isn’t the issue anymore. In cloud security remediation, what matters is what happens after issues are identified.
Tamnoon’s own research shows most security teams investigate about 4% of the alerts their tools generate, with less than 1% resulting in a confirmed fix. For the ones that do get addressed, it’s not fast. Tamnoon’s State of Cloud Remediation 2025 revealed that critical cloud misconfigurations sit unresolved for an average of 128 days before remediation. That’s over four months of exposure per finding.
Why? Because every investigation starts with simple questions like:
- Is this resource in production?
- Who owns it?
- What breaks if we change it?
Answering those questions takes time most teams don’t have, so the backlog grows by nearly 40% year over year.
This reflects a remediation gap rooted in how issues are resolved, and it is exactly the problem automated remediation is meant to solve. But successful AI-powered remediation depends on how work moves from alert to resolution.
This article walks through each stage of the CNAPP remediation workflow: how findings are prioritized and investigated, where automation applies, and where human judgment is still required to protect production systems.
How Manual Remediation Typically Works
Before getting into what automated cloud remediation looks like, it’s worth laying out what most teams are actually dealing with today.
- An alert fires: Maybe it’s an S3 bucket without HTTPS enforcement, or an EC2 instance running outdated metadata service settings. An analyst picks it up, sometimes within hours, sometimes days later, depending on what else is competing for their attention.
- Someone investigates it: They log into the cloud console. Check whether the resource is in production or a test environment. Try to figure out who owns it by digging through resource tags, Slack messages, or internal wikis. If the resource has dependencies, they need to map those too. All of this is manual, and for a single finding it can take hours.
- A ticket is created: If they determine the issue is worth fixing, they write a ticket and hand it off to the resource owner, usually someone on the engineering or DevOps team.
- Someone fixes it (eventually): That person gets to it when they can, often without the full picture of why it was flagged or what’s at stake. The fix might take five minutes. The handoff and wait time might take five days, or it might take months.
Where This Breaks Down
There are a few specific friction points that make this cloud security remediation process unsustainable at scale.
Context gathering is the biggest time sink, and it’s almost entirely manual. There’s no systematic way to assess blast radius, so teams either move cautiously and leave risk open, or move fast and risk breaking production.
Every alert gets roughly the same treatment regardless of whether it’s a production database or an idle test bucket. Cloud security alert fatigue sets in, and the handoff between security and engineering creates friction that compounds over time; security loses credibility each time it escalates something that turns out to be low priority.
None of this is a failure of the people involved, but a broader problem with the process. The tools that find problems weren’t designed to fix them, and the space between those two steps is where risk accumulates.
Here’s how the two approaches compare at each stage of the workflow:
| Workflow Stage | Manual Remediation | Automated Remediation |
| --- | --- | --- |
| Alert intake | Analyst reviews alerts one at a time. Duplicates across tools are common and often treated as separate issues. | AI deduplicates across CNAPPs and groups related alerts into prioritized initiatives. ~145 alerts per initiative on average. |
| Context gathering | Hours per finding. Analyst checks cloud consoles, resource tags, Slack threads, and internal wikis manually. | Tami runs automated queries against cloud APIs in minutes. All read-only. No production impact. |
| Safety assessment | Based on severity rating and analyst judgment. No systematic blast radius analysis. | RCI score updated progressively by targeted investigative automations. Each step narrows uncertainty. |
| Execution | Ticket handoff to resource owner, who re-researches the issue with minimal context. Days to weeks of wait time. | Parameterized scripts from battle-tested playbooks. Human approval gates for RISKY findings. UNSAFE findings routed with full context. |
| Validation | Check the CNAPP dashboard the next day to see if the alert disappeared. | Post-remediation scans, configuration drift checks, and complete audit trail with full attribution. |
| Prevention | Rarely reached. Same misconfigurations resurface. Alert fatigue cycle restarts. | Guardrails implemented via SCPs, Azure Blueprints, and policy-as-code. Same issue can’t be reintroduced. |
How AI-Powered Remediation Changes the Workflow
The workflow still follows a logical progression. You don’t skip steps; instead, AI and human expertise are applied at different stages to eliminate the bottlenecks described above.
In Tamnoon’s case, the AI agent driving this workflow is called Tami, trained on millions of real-world cloud security alerts and remediation outcomes. Tami handles the investigative and operational work. Human experts (called CloudPros) handle judgment calls and high-risk approvals.
One thing worth noting upfront: most of this work happens outside the production environment. Investigation, enrichment, safety analysis, and remediation planning all occur without touching live infrastructure. The only moment production is involved is the final, validated execution step. That distinction matters when the fear of breaking something is the main reason remediation stalls.
Here’s what each stage looks like when automated cloud remediation is done right:
Stage 1: Ingestion and Aggregation
Every automated cloud remediation workflow starts with the same problem: too many alerts from too many tools, with no unified view of what actually needs attention.
This stage pulls findings from whatever CNAPPs, CSPMs, or detection tools the organization is running. Findings are normalized into a common format, deduplicated across tools, and grouped into unified initiatives based on shared root cause or affected resource.
All of this is handled by AI. No human involvement at this stage. The system recognizes when two different tools flag the same S3 bucket for the same issue and groups them into a single initiative rather than creating duplicate work.
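To make the grouping idea concrete, here is a minimal Python sketch of cross-tool deduplication: findings that share a root-cause rule and an affected resource collapse into one initiative. The field names, rule IDs, and resource ARNs are illustrative, not Tamnoon’s actual schema or logic.

```python
# Illustrative grouping: findings from different scanners that share a
# root-cause rule and an affected resource collapse into one initiative.
from collections import defaultdict

def group_findings(findings):
    """findings: list of dicts with 'source_tool', 'rule_id', 'resource_arn'."""
    initiatives = defaultdict(list)
    for finding in findings:
        # Two tools flagging the same bucket for the same issue share a key.
        key = (finding["rule_id"], finding["resource_arn"])
        initiatives[key].append(finding)
    return initiatives

alerts = [
    {"source_tool": "cnapp_a", "rule_id": "S3_NO_HTTPS", "resource_arn": "arn:aws:s3:::prod-logs"},
    {"source_tool": "cspm_b", "rule_id": "S3_NO_HTTPS", "resource_arn": "arn:aws:s3:::prod-logs"},
]

print(len(group_findings(alerts)))  # 1 initiative instead of 2 separate tickets
```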
In a manual workflow, this step either doesn’t exist or happens in an analyst’s head. They might see the same finding from two solutions and treat them as separate issues, doubling their workload. Or they miss that six alerts all trace back to the same underlying cloud misconfiguration. There’s no systematic deduplication, just tribal knowledge and spreadsheets.
With AI, the analyst’s queue goes from thousands of individual alerts to a manageable set of prioritized initiatives before a human ever looks at it. That alone changes the starting point for everything that follows.
Stage 2: Contextual Enrichment
A raw alert doesn’t contain enough information to act on safely. It tells you what’s wrong, but not whether fixing it will break something. This stage builds the full picture around each initiative.
The AI runs automated queries against live cloud APIs to pull context that would take an analyst hours to gather manually. Depending on the finding type, this includes:
- Checking resource usage patterns over the past 90 days (is this asset actually active, or has it been sitting idle?)
- Pulling access logs from CloudTrail to understand who and what is interacting with the resource
- Mapping workload dependencies and associations (is this container image running in production? Which EC2 instances rely on it?)
- Identifying environment type: production, staging, development, or test
- Checking encryption status and public exposure levels
- Looking up ownership through resource tags, IAM mappings, and account structure
There’s no human involvement yet. And critically, all of this is read-only. The system is querying metadata and logs, not modifying anything in the environment. Nothing in production is being touched.
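As a rough illustration of what those read-only queries can look like, the boto3 sketch below enriches a single hypothetical S3 bucket with tags, exposure, encryption posture, and recent CloudTrail activity. It uses only get/lookup calls; the bucket name and tag keys are assumptions, not part of any real environment.

```python
# Read-only enrichment sketch for a single hypothetical S3 bucket.
# Every call below is a get/lookup; nothing in the environment is modified.
import datetime
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
cloudtrail = boto3.client("cloudtrail")
bucket = "example-prod-logs"  # hypothetical resource under investigation

def safe_get(fn, default, **kwargs):
    # Several "get" APIs raise if the configuration simply doesn't exist;
    # for enrichment purposes, treat that as an empty answer.
    try:
        return fn(**kwargs)
    except ClientError:
        return default

# Ownership and environment hints from resource tags.
tag_set = safe_get(s3.get_bucket_tagging, {"TagSet": []}, Bucket=bucket)["TagSet"]
tags = {t["Key"]: t["Value"] for t in tag_set}

# Public exposure and encryption posture.
policy_status = safe_get(
    s3.get_bucket_policy_status, {"PolicyStatus": {"IsPublic": False}}, Bucket=bucket
)
is_public = policy_status["PolicyStatus"]["IsPublic"]
encryption = safe_get(s3.get_bucket_encryption, {}, Bucket=bucket)

# Recent management activity on the bucket from CloudTrail (last 90 days).
since = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=90)
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "ResourceName", "AttributeValue": bucket}],
    StartTime=since,
)["Events"]

print(tags.get("owner"), tags.get("env"), is_public, bool(encryption), len(events))
```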
In a manual workflow, this is the step that kills velocity. An analyst doing this work logs into cloud consoles, runs CLI queries, cross-references internal documentation, and pieces together a picture that might take hours per finding. In many cases, they skip enrichment entirely because they don’t have time, which means they’re making remediation decisions without understanding what they’re actually touching.
The output of this stage is a fully enriched initiative with enough context to answer the real question: is this safe to fix?
Stage 3: Investigation And Safety Assessment
Enrichment builds the case file. Investigation answers the question: what happens if we fix this right now?
This is where automated cloud remediation diverges most sharply from the manual approach. Instead of relying on an analyst’s judgment call based on a severity rating, the AI runs targeted investigative automations tailored to the specific finding type. Each one answers a single safety question, and the results feed into a dynamic remediation confidence score called the Remediation Confidence Indicator (RCI).
Depending on the finding, the AI might:
- Analyze CloudTrail logs to determine whether actual HTTP traffic exists to an S3 bucket before enforcing HTTPS (would the fix break active workloads?)
- Check whether a flagged container image is deployed and running in production, or sitting unused in a registry
- Query VPC Flow Logs and CloudWatch metrics to detect calls to the EC2 instance metadata service before enforcing IMDSv2
- Pull a full software bill of materials to understand what’s inside a vulnerable image and whether a patch path exists
- Identify whether a resource is behind CloudFront, a load balancer, or accessed directly
Each automation updates the RCI, which rates every finding as SAFE (proceed with confidence), RISKY (remediation path exists but requires human review), or UNSAFE (needs team coordination before anyone touches it). The score isn’t static, but rather evolves with each new piece of evidence, narrowing uncertainty step by step.
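To ground one of those checks, here is a hedged sketch of the IMDSv2 case: the CloudWatch MetadataNoToken metric counts token-less (IMDSv1) metadata calls per instance, so a zero sum over the lookback window is strong evidence that enforcing IMDSv2 won’t break anything. The instance ID, window, and the SAFE/RISKY labels are illustrative; the actual RCI scoring is Tamnoon’s own.

```python
# Sketch: check the CloudWatch "MetadataNoToken" metric, which counts
# token-less (IMDSv1) metadata calls, before enforcing IMDSv2 on an instance.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
instance_id = "i-0123456789abcdef0"  # hypothetical instance under review

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=14)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="MetadataNoToken",
    Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
    StartTime=start,
    EndTime=end,
    Period=86400,          # one datapoint per day
    Statistics=["Sum"],
)
imdsv1_calls = sum(point["Sum"] for point in stats["Datapoints"])

# Evidence-based routing (illustrative labels, not the real RCI logic):
# zero observed IMDSv1 calls means enforcement is unlikely to break anything;
# any observed calls mean a human reviews before execution.
verdict = "SAFE" if imdsv1_calls == 0 else "RISKY"
print(f"{instance_id}: {int(imdsv1_calls)} IMDSv1 calls in 14 days -> {verdict}")
```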
What This Looks Like In Practice
Say several S3 buckets get flagged for the same policy violation: no HTTPS enforcement. Same alert, same severity. After investigation, the system determines:
- One bucket is completely empty and unused. SAFE to delete or enforce policy.
- Two buckets show 100% HTTPS traffic with no HTTP calls detected. SAFE to enforce.
- Another bucket has 45 active HTTP GetObject calls in the last 90 days. Enforcing HTTPS would break a production workload. UNSAFE, routed for coordination with the application team.
Same finding. Three different outcomes. That distinction doesn’t exist in a manual workflow, where the analyst checks the severity, maybe confirms whether the resource is in production, and either escalates or attempts a fix. There’s no progressive confidence scoring, no systematic blast radius assessment, and no clear framework for distinguishing “safe to proceed” from “needs more context.”
All of the investigation work described above is still read-only. No changes to production. The RCI is built entirely from observed behavior and metadata, not from test executions against live infrastructure.
See How RCI Works in a Live Environment
See how Tami investigates findings, assigns safety scores, and routes remediation based on what’s actually safe to act on.
Stage 4: Remediation Execution
This is the one stage where production is actually touched, and only for findings that have passed through investigation and safety assessment. Nothing reaches this point without an RCI attached.
The remediation path depends on that score:
- SAFE findings: get parameterized remediation scripts generated from battle-tested playbooks. These are production-ready artifacts: IaC template updates, CLI commands, or policy-as-code changes tailored to the specific finding and environment. For organizations running in supervised autopilot mode, human experts (CloudPros) execute these fixes on the organization’s behalf. In co-pilot mode, the organization’s own team executes using the AI-generated plan (a minimal example of such a script is sketched after this list).
- RISKY findings: get full remediation plans with human approval gates. The fix exists, and the path is clear, but the action requires human review before execution. A container image rebuild based on an updated Dockerfile, for example, or an AMI upgrade for an instance with active metadata service usage. The AI has done the investigation. The human makes the final call.
- UNSAFE findings: Not auto-remediated. They’re routed to the appropriate team with the complete investigation context: what was found, what was checked, why it’s unsafe to proceed without coordination, and who owns the resource. The developer or DevOps engineer receiving this isn’t getting a raw alert. They’re getting an investigated, enriched finding with a clear explanation of what needs to happen next.
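As a concrete illustration of the SAFE path, here is a minimal, playbook-style sketch for the HTTPS example: append a deny-non-TLS statement to a bucket’s policy. It is a generic script under assumed names, not Tamnoon’s production artifact, and would only ever run against buckets already scored SAFE.

```python
# Playbook-style sketch: enforce HTTPS on a bucket already scored SAFE by
# appending a deny-non-TLS statement to its policy. Names are placeholders.
import json
import boto3
from botocore.exceptions import ClientError

def enforce_https(bucket: str) -> None:
    s3 = boto3.client("s3")
    deny_http = {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }
    try:
        policy = json.loads(s3.get_bucket_policy(Bucket=bucket)["Policy"])
    except ClientError:
        policy = {"Version": "2012-10-17", "Statement": []}  # no existing policy
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # Statement may be a single object, not a list
        statements = [statements]
    if not any(s.get("Sid") == "DenyInsecureTransport" for s in statements):
        statements.append(deny_http)
        policy["Statement"] = statements
        s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

enforce_https("example-empty-bucket")  # only for findings already scored SAFE
```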
In a manual workflow, none of this sorting exists. The developer gets a ticket that says “fix this S3 bucket policy” with minimal context. They re-research the issue, figure out the right fix, test it, and deploy it. The investigation that already happened on the security side doesn’t transfer. With agentic remediation, that work carries through to execution. Nothing gets repeated.
Stage 5: Validation And Prevention
The fix is deployed, but the workflow isn’t done.
This stage confirms that the remediation actually worked, that it resolved the root cause, and didn’t introduce new problems. The AI runs post-remediation scans to verify the finding is closed, checks for configuration drift or regressions, and documents everything in a complete audit trail.
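Continuing the HTTPS example, a post-remediation check can be as simple as re-reading the bucket policy and recording whether the deny-non-TLS statement is in place. The sketch below is illustrative; the function name, finding ID, and record fields are assumptions.

```python
# Sketch of a post-remediation check for the HTTPS fix: re-read the bucket
# policy and record whether the deny-non-TLS statement is present.
import datetime
import json
import boto3

def verify_https_enforced(bucket: str) -> dict:
    s3 = boto3.client("s3")
    policy = json.loads(s3.get_bucket_policy(Bucket=bucket)["Policy"])
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    enforced = any(
        stmt.get("Effect") == "Deny"
        and stmt.get("Condition", {}).get("Bool", {}).get("aws:SecureTransport") == "false"
        for stmt in statements
    )
    # The verification result becomes one entry in the remediation audit record.
    return {
        "bucket": bucket,
        "finding": "S3_NO_HTTPS",  # illustrative finding ID
        "resolved": enforced,
        "checked_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

print(verify_https_enforced("example-empty-bucket"))
```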
Closing The Loop
That audit trail captures the full timeline: what was found, what context was gathered, what the RCI was at each stage, what action was taken, who approved it, and what the verification results showed.
Every step is attributed, whether it was performed by the AI agent or a human expert. For organizations operating under SOC2, HIPAA, or similar frameworks, this is the compliance record that proves the issue was handled properly.
Preventing Recurrence
Then comes the step most manual workflows never reach: prevention.
The system implements guardrails like Service Control Policies, Azure Blueprints, and policy-as-code rules to ensure the same misconfiguration can’t be reintroduced. This is what turns a single fix into a permanent posture improvement.
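For example, a guardrail against the metadata-service finding discussed earlier could be a Service Control Policy that denies instance launches unless IMDSv2 is required. The sketch below shows one way to express that with boto3; the policy name and scope are placeholders, and a real rollout would be tailored to the organization.

```python
# Sketch of a preventive guardrail: a Service Control Policy that denies
# launching EC2 instances unless IMDSv2 is required, so the metadata-service
# finding cannot be reintroduced. Name and scope are placeholders.
import json
import boto3

require_imdsv2 = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "RequireImdsV2OnLaunch",
        "Effect": "Deny",
        "Action": "ec2:RunInstances",
        "Resource": "arn:aws:ec2:*:*:instance/*",
        "Condition": {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}},
    }],
}

orgs = boto3.client("organizations")
policy = orgs.create_policy(
    Name="require-imdsv2",
    Description="Deny instance launches that do not enforce IMDSv2",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(require_imdsv2),
)
# A real rollout would then attach the policy to the relevant OU or account:
# orgs.attach_policy(PolicyId=policy["Policy"]["PolicySummary"]["Id"], TargetId="ou-...")
```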
In a manual workflow, validation usually means checking the CNAPP dashboard the next day to see if the alert disappeared. Root cause confirmation is rare. Prevention is rarer. The same cloud misconfiguration resurfaces weeks later, the same alert fires again, and the cycle restarts.
That recurrence is a major driver of cloud security alert fatigue, and it’s entirely avoidable when validation and prevention are built into the workflow.
What Changes When Remediation Actually Scales
The difference isn’t just speed, though that matters. Our own data shows organizations using AI-powered cloud security remediation see up to a 95% reduction in MTTR, a 25x increase in investigated alerts, and up to a 95% reduction in ticket volume.
But the bigger shift is operational:
- Security teams stop gathering context and start making decisions.
- Developers stop receiving raw alerts and start receiving production-safe remediation plans with clear next steps.
- The backlog shrinks instead of growing, and findings that get fixed stay fixed, because prevention is built into the workflow.
Automated remediation solves the last mile of cloud security across all stages. Ingestion turns thousands of alerts into manageable initiatives, enrichment builds context, investigation determines what’s safe to act on, execution follows the evidence, and validation and prevention close the loop permanently.
At every stage, AI handles the volume and investigative heavy lifting, while humans handle judgment calls and high-risk approvals. Neither replaces the other. Together, they close the last mile of cloud security.
Organizations using Tamnoon reduce open cloud exposures by up to 82% within 90 days. Book a demo to see how agent-led, expert-supervised remediation can work with your CNAPP.