Detection methodology

How ZephMatrix finds hidden AWS costs

Nine signal categories. Direct AWS API data. Exact qualification thresholds — not heuristics. And the execution layer that turns findings into confirmed savings.

9 signal categories

11 waste sub-types

7 AWS data sources

100% of actions human-gated

Detection is step one

Finding waste is the easy part. Doing something about it is the hard part.

Every FinOps tool surfaces findings. Most stop there. ZephMatrix continues: the agent investigates the highest-value finding daily — adding evidence, safety classification, and utilization context — then prepares the action, executes after human approval, and confirms whether savings materialized against a pre-action baseline.

Detect

Nine signal categories, seven AWS data sources

Investigate

Agent adds owner, safety, and utilization context

Approve & execute

Human approves. Agent executes in AWS.

Verify

Savings confirmed against pre-action baseline

Scan mechanics

How a scan works

Connection

< 10 min setup

Create a cross-account IAM role and paste the ARN — no agent or collector to install
Read-only scope: cost data, inventory, tags, CloudWatch metrics, Compute Optimizer signals
No data leaves your account — ZephMatrix queries AWS APIs directly using your role

Scan scope

All enabled regions

Inventory signals scan every enabled AWS region in parallel
Cost Explorer signals are account-level — not region-specific
Multi-account: member inventory via cross-account role; anomaly and commitment signals require a payer/management account connection
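
As a rough illustration of the parallel regional fan-out described above, here is a minimal sketch. `scan_region` and `fake_scan` are hypothetical stand-ins for the per-region inventory calls, not ZephMatrix's actual scanner; in practice the region list would likely come from the account's enabled regions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def scan_all_regions(
    regions: Iterable[str],
    scan_region: Callable[[str], List[dict]],
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Run scan_region once per region concurrently and collect findings by region."""
    region_list = list(regions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with region_list.
        results = list(pool.map(scan_region, region_list))
    return dict(zip(region_list, results))


def fake_scan(region: str) -> List[dict]:
    # Hypothetical stand-in: a real scanner would call AWS inventory APIs here.
    return [{"region": region, "finding": "example"}]


findings = scan_all_regions(["us-east-1", "eu-west-1"], fake_scan)
```

Threads suit this kind of work because each regional scan mostly waits on network calls.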

Cadence

Waste & anomaly signals

Every 6 h

Idle resources, orphaned storage, billing anomalies — refreshed automatically, no action needed

Cost hotspot signals

Each report run

Rightsizing, network, transfer, observability, managed services, containers — fresh on every report

Data sources

Where the data comes from

ZephMatrix uses authoritative AWS APIs directly — not third-party pricing databases or scraped cost estimates. Each signal category names its exact source below.

AWS data source	What ZephMatrix uses it for
AWS EC2 / EBS / ELB APIs	Inventory-based waste — volumes, snapshots, load balancers, NAT gateways, Elastic IPs, AMIs. Direct resource inspection, not billing estimates.
AWS CloudWatch Metrics	Utilization signals — CPU, database connections, network bytes. Used to distinguish genuinely idle resources from temporarily quiet ones.
AWS Cost Explorer	Billing-layer hotspots — data transfer, CloudWatch costs, managed service spend, container platform spend. Reveals cost patterns that inventory APIs cannot.
AWS Compute Optimizer	Rightsizing recommendations for EC2, Lambda, ECS, and RDS, backed by 14 days of CloudWatch utilization data analyzed by AWS ML.
AWS Cost Anomaly Detection	ML-identified spend spikes with root-cause attribution by service, region, and usage type. Requires payer or management account access.
AWS Savings Plans & RI APIs	Commitment coverage, utilization rates, and expiry timelines. Identifies gaps between on-demand spend and committed capacity.
AWS S3 / ECR / CloudWatch Logs APIs	Storage hygiene signals — incomplete multipart uploads, untagged ECR images, and log groups without retention policies.

The nine categories

What ZephMatrix looks for — and why

Not all nine categories are waste. Each has a different nature — read the signal type badge on each card to understand what you are looking at.

Direct waste: idle or orphaned resource, safe to clean up

Financial incident: unexpected spend spike, investigate and route

Financial gap: on-demand spend that could be committed

Optimization signal: over-provisioned compute, owner decision needed

Spend hotspot: where the money is going, not necessarily waste

Spend concentration: large service line, efficiency review candidate

Resource hygiene

Waste

Direct waste · High confidence · Immediate win

Idle and orphaned resources billed at full rate

EC2, EBS, ELB, RDS, S3, ECR, CloudWatch APIs

Direct inventory inspection across EC2, EBS, ELB, RDS, VPC, S3, ECR, and CloudWatch. Each sub-type has a specific qualification threshold — not a heuristic, a defined rule. High confidence, immediately actionable.

11 waste sub-types — qualification thresholds

Sub-type	Qualifies when
Unattached EBS volumes	Unattached for more than 7 days
Orphaned snapshots	Source volume deleted and snapshot age over 30 days
Stale AMIs	Older than 90 days, not in any running instance or launch template
Idle EC2 instances	Average CPU below 5% over 14 days, launched more than 3 days ago
Idle RDS instances	Average connections below 1 over 14 days
Idle NAT gateways	Less than 1 MB egress over 7 days
Unused Elastic IPs	No AssociationId, InstanceId, or NetworkInterfaceId present
gp2 → gp3 upgrade opportunities	Volume type is gp2 and same-size gp3 saves more than $1/month
CloudWatch log groups without retention	No retentionInDays set and stored bytes above 1 GB
Incomplete S3 multipart uploads	Incomplete uploads present and no AbortIncompleteMultipartUpload lifecycle rule configured
ECR repositories without lifecycle policies	No lifecycle policy, untagged images older than 14 days, cleanup potential above 1 GB
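
To make a few of the thresholds above concrete, here is a minimal sketch of how such rules could be expressed. The function names are hypothetical, the gp2/gp3 prices are illustrative us-east-1 list prices that vary by region, and treating 1 MB as 1,000,000 bytes is an assumption; the dictionary keys follow the EC2 `DescribeAddresses` and `DescribeVolumes` response fields.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Sequence

# Illustrative us-east-1 list prices in USD per GB-month (assumption; varies by region).
GP2_USD_PER_GB = 0.10
GP3_USD_PER_GB = 0.08


def is_unused_elastic_ip(address: dict) -> bool:
    """True when none of the three association fields is present."""
    keys = ("AssociationId", "InstanceId", "NetworkInterfaceId")
    return not any(address.get(key) for key in keys)


def is_idle_ec2(cpu_daily_avg: Sequence[float], launch_time: datetime, now: datetime) -> bool:
    """Average CPU below 5% over the last 14 days, launched more than 3 days ago."""
    window = list(cpu_daily_avg)[-14:]
    if not window:
        return False  # assumption: no datapoints means no verdict
    return now - launch_time > timedelta(days=3) and mean(window) < 5.0


def is_idle_nat_gateway(egress_bytes_7d: int) -> bool:
    """Less than 1 MB egress over 7 days (1 MB taken as 1,000,000 bytes)."""
    return egress_bytes_7d < 1_000_000


def qualifies_for_gp3(volume: dict) -> bool:
    """gp2 volume where a same-size gp3 volume saves more than $1/month."""
    saving = volume["Size"] * (GP2_USD_PER_GB - GP3_USD_PER_GB)
    return volume.get("VolumeType") == "gp2" and saving > 1.0


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
idle = is_idle_ec2([2.0] * 14, now - timedelta(days=30), now)
```

Each rule is a pure function over already-fetched data, so the thresholds can be checked without touching AWS.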

Rightsizing

Optimization signal · Medium confidence · Strategic

Oversized compute flagged by AWS's own ML

AWS Compute Optimizer (EC2, Lambda, ECS, RDS)

ZephMatrix pulls recommendations directly from AWS Compute Optimizer, which analyzes 14 days of CloudWatch utilization data. We surface the highest-savings opportunity across EC2, Lambda, ECS, and RDS. Medium confidence — rightsizing still needs owner and workload validation.

Qualification criteria

· AWS Compute Optimizer internal ML heuristics (14-day CloudWatch lookback)
· Cross-region aggregation: all enabled regions scanned
· Top recommendation ranked by estimated monthly savings

Note — Rightsizing recommendations require Compute Optimizer to be enabled in your AWS account. ZephMatrix reads the results — it does not run its own sizing analysis.

Usage and architecture

Network

Spend hotspot · Medium confidence · Owner review

NAT gateway traffic hotspots

EC2 DescribeNatGateways + CloudWatch BytesInFromSource, BytesOutToDestination

NAT gateways are charged per hour plus per GB processed — costs that can exceed the underlying EC2 spend. ZephMatrix measures actual 30-day traffic per NAT gateway and flags high-cost gateways. These are concentration signals, not waste — the finding shows where cost is and helps an owner decide if a VPC endpoint review is warranted.

Qualification criteria

· Traffic volume measured over 30 days
· Qualifies as a hotspot if total processed bytes exceed 100 GB/month
· Cost estimated at $32/month base + $0.045/GB data processing
· Top hotspot by estimated monthly spend is surfaced
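
The cost estimate above is simple arithmetic, sketched here under stated assumptions: total processed bytes is taken as the sum of the two CloudWatch metrics named for this signal, a GB is treated as 1024³ bytes, and the function name is hypothetical.

```python
NAT_BASE_USD_PER_MONTH = 32.0   # base charge from the criteria above
NAT_USD_PER_GB = 0.045          # data processing rate from the criteria above
HOTSPOT_GB_THRESHOLD = 100.0
BYTES_PER_GB = 1024 ** 3        # assumption: binary gigabytes


def nat_monthly_estimate(bytes_in_from_source: int, bytes_out_to_destination: int) -> dict:
    """Estimate monthly NAT gateway cost from 30-day byte totals."""
    # Assumption: "processed bytes" is the sum of both traffic directions.
    processed_gb = (bytes_in_from_source + bytes_out_to_destination) / BYTES_PER_GB
    return {
        "processed_gb": processed_gb,
        "estimated_usd": NAT_BASE_USD_PER_MONTH + NAT_USD_PER_GB * processed_gb,
        "is_hotspot": processed_gb > HOTSPOT_GB_THRESHOLD,
    }


estimate = nat_monthly_estimate(150 * BYTES_PER_GB, 50 * BYTES_PER_GB)
```

For 200 GB of processed traffic this works out to roughly $32 + 200 × $0.045 = $41 per month, which clears the 100 GB hotspot threshold.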

Data transfer

Spend hotspot · Medium confidence · Owner review

Inter-region and egress cost hotspots

AWS Cost Explorer GetCostAndUsage (30-day window)

Data transfer charges appear in billing as dozens of granular usage types — easily missed in a standard Cost Explorer view. ZephMatrix queries Cost Explorer filtered on transfer-type usage and groups by service, usage type, and region. Surfacing this tells you where the transfer cost is concentrated — not necessarily that it is waste.

Qualification criteria

· Filters on usage types containing datatransfer, dataxfer, or natgateway-bytes
· Qualifies if monthly cost is $25 or above
· Top 20 hotspots by cost returned, top finding surfaced in report
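
The filtering and ranking steps above could look roughly like this. The row shape is a hypothetical flattened form of Cost Explorer grouped results, not the raw `GetCostAndUsage` response, and case-insensitive matching is an assumption.

```python
from typing import List

TRANSFER_MARKERS = ("datatransfer", "dataxfer", "natgateway-bytes")


def transfer_hotspots(rows: List[dict], min_usd: float = 25.0, limit: int = 20) -> List[dict]:
    """Keep transfer-type rows at or above the threshold, highest cost first."""
    matches = [
        row for row in rows
        if any(marker in row["usage_type"].lower() for marker in TRANSFER_MARKERS)
        and row["monthly_usd"] >= min_usd
    ]
    return sorted(matches, key=lambda row: row["monthly_usd"], reverse=True)[:limit]


rows = [
    {"service": "Amazon EC2", "usage_type": "USE1-DataTransfer-Out-Bytes", "monthly_usd": 80.0},
    {"service": "Amazon EC2", "usage_type": "USE1-NatGateway-Bytes", "monthly_usd": 140.0},
    {"service": "Amazon EC2", "usage_type": "USE1-BoxUsage:t3.micro", "monthly_usd": 300.0},
    {"service": "Amazon S3", "usage_type": "USE1-DataTransfer-Out-Bytes", "monthly_usd": 4.0},
]
hotspots = transfer_hotspots(rows)
top_finding = hotspots[0] if hotspots else None
```

The same filter-threshold-rank shape applies to the other Cost Explorer hotspot categories, with different filters and thresholds.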

Observability

Spend hotspot · Medium confidence · Owner review

CloudWatch log ingestion and storage cost

AWS Cost Explorer GetCostAndUsage (30-day window)

CloudWatch costs blend across ingestion, storage, and vended logs — categories that blur together in standard billing views. ZephMatrix isolates CloudWatch-specific usage types and surfaces the highest-cost line. This is a concentration signal — useful for identifying which log groups to review for retention policy gaps.

Qualification criteria

· Filtered on CloudWatch service
· Usage types: timedstorage, dataprocessing, vendedlog, logs, putlogevents
· Qualifies if monthly cost is $20 or above
· Top 20 hotspots returned, top finding surfaced in report

Financial control

Commitment

Financial gap · Medium confidence · Strategic

Savings Plans and Reserved Instance gaps

AWS Cost Explorer, Savings Plans API, RI Utilization API (60-day window)

On-demand spend that could be covered by a Savings Plan or RI without any infrastructure change. ZephMatrix runs three analyses: coverage gaps, upcoming expiries, and utilization warnings. These are strategic signals — purchase decisions need finance review and spend stability confirmation.

Qualification criteria

· Coverage gap: on-demand compute spend analyzed over 60 days. Confidence is scored by the coefficient of variation of daily spend; stable spend earns higher confidence. High priority if confidence ≥ 75% and savings ≥ $500/month.
· Expiry alert: active Savings Plans or RIs expiring within 60 days
· Utilization warning: active commitments with utilization below 70% (over-committed, paying for unused capacity)
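
The coefficient of variation is the standard deviation of daily spend divided by its mean. The sketch below shows one way stability could translate into a confidence score. The mapping `confidence = (1 − CV) × 100` is purely an assumption for illustration; the exact formula is not stated here. Only the 75% and $500/month cut-offs come from the criteria above.

```python
from statistics import mean, pstdev
from typing import Sequence


def spend_confidence(daily_spend: Sequence[float]) -> float:
    """Map spend stability to a 0-100 score (hypothetical mapping)."""
    if not daily_spend:
        return 0.0
    average = mean(daily_spend)
    if average <= 0:
        return 0.0
    cv = pstdev(daily_spend) / average  # coefficient of variation
    return max(0.0, 1.0 - cv) * 100.0


def is_high_priority(confidence_pct: float, monthly_savings_usd: float) -> bool:
    """High priority if confidence >= 75% and savings >= $500/month."""
    return confidence_pct >= 75.0 and monthly_savings_usd >= 500.0


stable = spend_confidence([100.0] * 60)          # perfectly flat spend
volatile = spend_confidence([0.0, 200.0] * 30)   # spend swings every day
```

Flat spend scores at the top of the range and highly volatile spend at the bottom, which matches the intent that stable spend earns higher confidence.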

Anomaly

Financial incident · High confidence · Owner review

ML-identified spend spikes with root-cause attribution

AWS Cost Anomaly Detection (ce:GetAnomalies, 30-day window)

ZephMatrix reads directly from AWS Cost Anomaly Detection — AWS's own ML service that monitors spend continuously and identifies abnormal cost changes. We surface the top 2 anomalies by dollar impact with their root-cause attribution. This is an incident signal — the first step is always investigation and review, not deletion.

Qualification criteria

· Any anomaly returned by AWS Cost Anomaly Detection qualifies; no custom threshold is applied
· Top 2 anomalies by total dollar impact are surfaced
· Root causes captured: service, region, linked account, usage type
· Requires payer or management account access; member-account-only connections will not see this signal
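
Ranking anomalies by dollar impact can be sketched as below. The input dictionaries mirror the `Anomalies` entries returned by Cost Explorer `GetAnomalies` (`AnomalyId`, `Impact.TotalImpact`, `RootCauses`); the function name, output shape, and sample values are hypothetical.

```python
from typing import List


def top_anomalies(anomalies: List[dict], limit: int = 2) -> List[dict]:
    """Return the highest-impact anomalies with their root causes."""
    ranked = sorted(
        anomalies,
        key=lambda a: a.get("Impact", {}).get("TotalImpact", 0.0),
        reverse=True,
    )
    return [
        {
            "anomaly_id": a["AnomalyId"],
            "total_impact_usd": a.get("Impact", {}).get("TotalImpact", 0.0),
            "root_causes": a.get("RootCauses", []),
        }
        for a in ranked[:limit]
    ]


# Made-up sample data in the GetAnomalies response shape.
sample = [
    {"AnomalyId": "a1", "Impact": {"MaxImpact": 10.0, "TotalImpact": 40.0},
     "RootCauses": [{"Service": "Amazon EC2", "Region": "us-east-1"}]},
    {"AnomalyId": "a2", "Impact": {"MaxImpact": 90.0, "TotalImpact": 300.0},
     "RootCauses": []},
    {"AnomalyId": "a3", "Impact": {"MaxImpact": 5.0, "TotalImpact": 120.0}},
]
top = top_anomalies(sample)
```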

Note — ZephMatrix does not run its own anomaly detection algorithm. AWS Cost Anomaly Detection is the same service AWS uses internally. We surface and contextualize its output.

Managed services

Spend concentration · Low confidence · Strategic

OpenSearch, ElastiCache, and Redshift spend

AWS Cost Explorer GetCostAndUsage (30-day window)

Managed database and cache services often represent large, stable monthly costs that grow unnoticed. ZephMatrix isolates these service lines from billing and surfaces any exceeding the threshold. These are concentration signals: they do not label anything as waste, but flag where the spend is so an owner can decide if a sizing or efficiency review is warranted.

Qualification criteria

· Services monitored: Amazon OpenSearch Service, Amazon ElastiCache, Amazon Redshift
· Qualifies if monthly spend is $50 or above per service
· Top 20 services returned, top finding surfaced in report

Container platforms

Spend concentration · Low confidence · Strategic

EKS, ECS, and Fargate spend

AWS Cost Explorer GetCostAndUsage (30-day window)

Container platform costs are opaque: cluster charges, task hours, and Fargate compute blend across multiple billing line items. ZephMatrix isolates these service lines and surfaces any exceeding the threshold. This is a concentration signal: it shows where cluster efficiency work should start, not that there is definite waste.

Qualification criteria

· Services monitored: Amazon EKS, Amazon ECS, AWS Fargate
· Qualifies if monthly spend is $40 or above per service
· Top 20 services returned, top finding surfaced in report

The execution layer

Detection tells you what is wrong. The execution layer is what fixes it.

Most FinOps tools stop at the finding. ZephMatrix runs a daily investigation loop: enriching findings with context, routing ownership, executing approved actions in AWS, and confirming whether savings materialized. This is the part that actually reduces your bill.

Detect

✓ Waste and anomaly signals refresh every 6 hours automatically
✓ Hotspot signals (rightsizing, network, transfer, etc.) refresh on each report run
✓ Findings ranked by savings potential, confidence, and actionability

Investigate

✓ Agent investigates the highest-value finding daily
✓ Adds team and tag context: who created it, which environment it belongs to
✓ Safety classification: is it safe to act on, or does it need review?
✓ Utilization context: corroborating evidence from CloudWatch and Cost Explorer

Approve & execute

✓ Proposed action presented with full evidence
✓ You approve or reject inline; nothing executes without explicit approval
✓ Agent executes the approved action directly in AWS via the scoped IAM role
✓ Slack routing available for async team review

Verify

✓ Baseline captured before any action executes
✓ AWS Cost Explorer polled after execution to confirm spend changed
✓ Savings confirmed or flagged, not estimated
✓ Full audit trail: finding → case → approval → execution → outcome

How execution works

The base IAM role is read-only by default. ZephMatrix executes through four named workflows — each maps to a specific class of findings. Three of the four require an optional IAM capability add-on; one (anomaly investigation) runs on read-only access alone.

Workflow	What it does	Execution capability required
Non-Prod EC2 Stop Candidate Review	Identifies idle non-production EC2 instances, verifies owner and safety context, and prepares an approval-gated stop action. Does not handle rightsizing, termination, or storage changes.	ec2_scheduler
Orphaned Storage and Idle Resource Review	Reviews unattached EBS volumes, orphaned snapshots, unused Elastic IPs, stale AMIs, and idle load balancers or RDS instances. Execution-eligible items go through the approval gate; review-only items are routed to owners.	resource_cleanup_execution
Safe Savings Policy Review	Reviews low-risk policy optimization opportunities: EBS gp2 → gp3 upgrades, CloudWatch log retention gaps, S3 incomplete multipart cleanup, and ECR lifecycle policy gaps. Zero downtime, fully reversible.	safe_savings_execution
AWS Billing Anomaly Root-Cause and Routing	Investigates billing anomalies from AWS Cost Anomaly Detection, explains likely cost drivers with supporting evidence, and routes the issue to the right owner. No write actions — runs on read-only access alone.	None required

IAM actions added per capability group

Capability	IAM actions added to the role
ec2_scheduler	ec2:StopInstances, ec2:StartInstances
resource_cleanup_execution	ec2:DeleteVolume, ec2:DeleteSnapshot, ec2:ReleaseAddress, ec2:DeregisterImage
safe_savings_execution	ec2:ModifyVolume, logs:PutRetentionPolicy, s3:PutLifecycleConfiguration, ecr:PutLifecyclePolicy

Coming next — Rightsizing execution (changing EC2/RDS instance types), load balancer deregistration, and RDS snapshot cleanup require additional owner validation steps and are not yet supported.

Approval-gated actions

Every action requires explicit human approval before it executes. Actions are classified into two categories by blast radius.

Cleanup actions

Irreversible — review resource before approving

EBS volume deletion

Delete an unattached volume. Irreversible — agent confirms no attachment before surfacing for approval.

Snapshot deletion

Delete an orphaned snapshot. Irreversible — source volume verified as deleted first.

Elastic IP release

Release an unassociated Elastic IP. The released address itself may not be recoverable, but a new EIP can be allocated at any time.

AMI deregistration

Deregister a stale AMI and delete its backing snapshots. Agent verifies no running instance or launch template references it.

Low-risk optimizations

Safe to approve — no data loss, fully reversible

gp2 → gp3 upgrade

Live volume type change. No downtime and no data loss. gp3's 3,000 IOPS baseline meets or exceeds the gp2 baseline for volumes up to 1,000 GiB, at a lower per-GB price.

CloudWatch log retention

Set a 90-day retention policy on log groups with no policy set. No existing log data deleted on approval.

S3 multipart lifecycle rule

Add an AbortIncompleteMultipartUpload rule to a bucket. No existing objects touched.

ECR lifecycle policy

Add an untagged-image cleanup rule to an ECR repository. Only untagged images older than the policy window are removed.

EC2 start / stop scheduling (optional add-on)

When the ec2_scheduler capability is enabled, the agent can stop and start non-production EC2 instances on a configured schedule (e.g. stop dev instances at 7 pm, start at 8 am). Only instances tagged ZephMatrixManaged=true are eligible — the IAM role rejects the call otherwise. This is an opt-in capability enabled per AWS connection.
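
A tag-gated permission of this kind is typically expressed with an IAM condition on the resource tag. The statement below is a minimal sketch of what such a policy could look like, not the policy ZephMatrix generates during setup; the `Sid` and the resource ARN pattern are assumptions.

```python
import json


def ec2_scheduler_statement(tag_key: str = "ZephMatrixManaged", tag_value: str = "true") -> dict:
    """Allow stop/start only on instances carrying the opt-in tag."""
    return {
        "Sid": "Ec2SchedulerTaggedOnly",  # hypothetical statement id
        "Effect": "Allow",
        "Action": ["ec2:StopInstances", "ec2:StartInstances"],
        "Resource": "arn:aws:ec2:*:*:instance/*",  # assumption: all instances, narrowed by tag
        "Condition": {
            # The call is allowed only when the target instance has this tag value.
            "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
        },
    }


policy = {"Version": "2012-10-17", "Statement": [ec2_scheduler_statement()]}
policy_json = json.dumps(policy, indent=2)
```

Because the condition is evaluated by IAM itself, an untagged instance is rejected at the API layer regardless of what the caller intends.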

What is never actioned autonomously — Production instances managed by Auto Scaling Groups, IaC-managed resources (Terraform, CDK, CloudFormation), and anything ZephMatrix classifies as protected scope are excluded from the approval queue entirely. The agent surfaces them as findings for human review only.

Verified savings

Confirmed outcomes, not projections

Before any approved action executes, ZephMatrix captures the current resource state and cost baseline. After execution, it checks AWS Cost Explorer to confirm whether spend actually changed. The result is a timestamped savings record tied to the specific action — not an estimate.
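
The baseline comparison can be sketched as follows. The function name, the use of average daily spend, and the `min_fraction` tolerance are all assumptions made for illustration; the actual verification logic is not described in that detail here.

```python
from statistics import mean
from typing import Sequence


def verify_savings(
    baseline_daily_usd: Sequence[float],
    post_daily_usd: Sequence[float],
    expected_daily_savings_usd: float,
    min_fraction: float = 0.5,  # hypothetical tolerance before flagging
) -> dict:
    """Compare average daily spend before and after an action."""
    observed = mean(baseline_daily_usd) - mean(post_daily_usd)
    confirmed = observed >= expected_daily_savings_usd * min_fraction
    return {
        "observed_daily_savings_usd": observed,
        "status": "confirmed" if confirmed else "flagged",
    }


record = verify_savings([10.0, 10.0, 10.0], [7.0, 7.0, 7.0], expected_daily_savings_usd=3.0)
unchanged = verify_savings([10.0, 10.0], [10.0, 10.0], expected_daily_savings_usd=3.0)
```

A result of "flagged" here simply means spend did not drop as expected and the case needs a closer look.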

Pre-action

Resource state and cost baseline recorded

Execution

Approved action runs in AWS via scoped IAM role

Verification

Cost Explorer polled — savings confirmed or flagged

Audit trail

Finding → case → approval → outcome — full chain visible

Data access

What ZephMatrix accesses — and what it does not

What we access

✓ Resource inventory metadata (IDs, types, states, tags)
✓ CloudWatch metrics (CPU utilization, connection counts, traffic bytes)
✓ AWS Cost Explorer billing data (spend by service, usage type, region)
✓ Compute Optimizer recommendations
✓ Cost Anomaly Detection results
✓ Savings Plans and Reserved Instance utilization data
✓ S3 bucket metadata, lifecycle configuration, and multipart upload listings
✓ ECR repository metadata, image listings, and lifecycle policies
✓ CloudWatch log group metadata and retention settings

What we never access

✕ Application data, database rows, or file contents
✕ S3 object contents (only bucket metadata and lifecycle configuration)
✕ CloudTrail event history
✕ Secrets, credentials, or parameter store values
✕ VPC Flow Logs or network packet data
✕ EC2 instance memory contents
✕ RDS query logs or database schema

IAM policy

The exact IAM policy used for read access — and the separate, narrower policy for approved write actions — is generated during setup and visible in your account dashboard. No permissions are requested beyond what each scan category requires.

Start free

Connect AWS and get your first Hidden Cost Report in under ten minutes.

The Discovery plan is free — no credit card required. Connect AWS, run recurring Hidden Cost Report refreshes during the pilot, and see all findings in full. Upgrade when you want the daily investigation loop and governed execution to run automatically.