Detection methodology
How ZephMatrix finds hidden AWS costs
Nine signal categories. Direct AWS API data. Exact qualification thresholds — not heuristics. And the execution layer that turns findings into confirmed savings.
- 9 signal categories
- 11 waste sub-types
- 7 AWS data sources
- 100% of actions human-gated
Detection is step one
Finding waste is the easy part. Doing something about it is the hard part.
Every FinOps tool surfaces findings. Most stop there. ZephMatrix continues: the agent investigates the highest-value finding daily — adding owner attribution, safety classification, and utilization context — then routes action, executes under human approval, and confirms whether savings materialized against a pre-action baseline.
01
Detect
Nine signal categories, seven AWS data sources
02
Investigate
Agent adds owner, safety, and utilization context
03
Approve & execute
Human approves. Agent executes in AWS.
04
Verify
Savings confirmed against pre-action baseline
Scan mechanics
How a scan works
01
Connection
- Create a cross-account IAM role and paste the ARN — no agent or collector to install
- Read-only scope: cost data, inventory, tags, CloudWatch metrics, Compute Optimizer signals
- No data leaves your account — ZephMatrix queries AWS APIs directly using your role
02
Scan scope
- Inventory signals scan every enabled AWS region in parallel
- Cost Explorer signals are account-level — not region-specific
- Multi-account: member inventory via cross-account role; anomaly and commitment signals require a payer/management account connection
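As an illustration of the parallel per-region scan described above, here is a minimal Python sketch. The function names `scan_all_regions` and `scan_region` are hypothetical, and the per-region scanner is passed in as a callable so the fan-out pattern can be shown without assuming anything about ZephMatrix's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def scan_all_regions(
    regions: Iterable[str],
    scan_region: Callable[[str], List[str]],  # hypothetical per-region inventory scanner
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Run one inventory scan per enabled region concurrently."""
    region_list = list(regions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with region_list
        results = list(pool.map(scan_region, region_list))
    return dict(zip(region_list, results))
```

Account-level signals such as Cost Explorer data would sit outside this loop, since they are not region-specific.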
03
Cadence
Waste & anomaly signals
- Every 6 h: idle resources, orphaned storage, billing anomalies. Refreshed automatically, no action needed.
Cost hotspot signals
- Each report run: rightsizing, network, transfer, observability, managed services, containers. Fresh on every report.
Data sources
Where the data comes from
ZephMatrix uses authoritative AWS APIs directly — not third-party pricing databases or scraped cost estimates. Each signal category names its exact source below.
| AWS data source | What ZephMatrix uses it for |
|---|---|
| AWS EC2 / EBS / ELB APIs | Inventory-based waste — volumes, snapshots, load balancers, NAT gateways, Elastic IPs, AMIs. Direct resource inspection, not billing estimates. |
| AWS CloudWatch Metrics | Utilization signals — CPU, database connections, network bytes. Used to distinguish genuinely idle resources from temporarily quiet ones. |
| AWS Cost Explorer | Billing-layer hotspots — data transfer, CloudWatch costs, managed service spend, container platform spend. Reveals cost patterns that inventory APIs cannot. |
| AWS Compute Optimizer | Rightsizing recommendations for EC2, Lambda, ECS, and RDS, backed by 14 days of CloudWatch utilization data analyzed by AWS ML. |
| AWS Cost Anomaly Detection | ML-identified spend spikes with root-cause attribution by service, region, and usage type. Requires payer or management account access. |
| AWS Savings Plans & RI APIs | Commitment coverage, utilization rates, and expiry timelines. Identifies gaps between on-demand spend and committed capacity. |
| AWS S3 / ECR / CloudWatch Logs APIs | Storage hygiene signals — incomplete multipart uploads, untagged ECR images, and log groups without retention policies. |
The nine categories
What ZephMatrix looks for — and why
Not all nine categories are waste. Each has a different nature — read the signal type badge on each card to understand what you are looking at.
Waste
Direct waste · High confidence · Immediate win
Idle and orphaned resources billed at full rate
Direct inventory inspection across EC2, EBS, ELB, RDS, VPC, S3, ECR, and CloudWatch. Each sub-type has a specific qualification threshold — not a heuristic, a defined rule. High confidence, immediately actionable.
11 waste sub-types — qualification thresholds
| Sub-type | Qualifies when |
|---|---|
| Unattached EBS volumes | Unattached for more than 7 days |
| Orphaned snapshots | Source volume deleted and snapshot age over 30 days |
| Stale AMIs | Older than 90 days, not in any running instance or launch template |
| Idle EC2 instances | Average CPU below 5% over 14 days, launched more than 3 days ago |
| Idle RDS instances | Average connections below 1 over 14 days |
| Idle NAT gateways | Less than 1 MB egress over 7 days |
| Unused Elastic IPs | No AssociationId, InstanceId, or NetworkInterfaceId present |
| gp2 → gp3 upgrade opportunities | Volume type is gp2 and same-size gp3 saves more than $1/month |
| CloudWatch log groups without retention | No retentionInDays set and stored bytes above 1 GB |
| Incomplete S3 multipart uploads | Incomplete uploads present and no AbortIncompleteMultipartUpload lifecycle rule configured |
| ECR repositories without lifecycle policies | No lifecycle policy, untagged images older than 14 days, cleanup potential above 1 GB |
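Two of the thresholds in the table can be expressed as simple predicates. The sketch below is illustrative only: the `Volume` and `Instance` records and their field names are hypothetical, and the real scan would populate them from the EC2 and CloudWatch APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Volume:
    volume_id: str
    attached: bool
    detached_at: Optional[datetime]  # hypothetical field: when the volume became unattached


@dataclass
class Instance:
    instance_id: str
    launched_at: datetime
    avg_cpu_14d: float  # average CPU percent over the last 14 days


def is_unattached_volume(v: Volume, now: datetime) -> bool:
    """Qualifies when unattached for more than 7 days."""
    if v.attached or v.detached_at is None:
        return False
    return now - v.detached_at > timedelta(days=7)


def is_idle_instance(i: Instance, now: datetime) -> bool:
    """Qualifies when average CPU is below 5% over 14 days and launch was more than 3 days ago."""
    return i.avg_cpu_14d < 5.0 and now - i.launched_at > timedelta(days=3)
```

Encoding each rule as a pure function of recorded state keeps the threshold explicit and easy to test against boundary cases such as exactly 7 days.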
Rightsizing
Optimization signal · Medium confidence · Strategic
Oversized compute flagged by AWS's own ML
ZephMatrix pulls recommendations directly from AWS Compute Optimizer, which analyzes 14 days of CloudWatch utilization data. We surface the highest-savings opportunity across EC2, Lambda, ECS, and RDS. Medium confidence — rightsizing still needs owner and workload validation.
Qualification criteria
- AWS Compute Optimizer internal ML heuristics (14-day CloudWatch lookback)
- Cross-region aggregation: all enabled regions scanned
- Top recommendation ranked by estimated monthly savings
Note — Rightsizing recommendations require Compute Optimizer to be enabled in your AWS account. ZephMatrix reads the results — it does not run its own sizing analysis.
Network
Spend hotspot · Medium confidence · Owner review
NAT gateway traffic hotspots
NAT gateways are charged per hour plus per GB processed — costs that can exceed the underlying EC2 spend. ZephMatrix measures actual 30-day traffic per NAT gateway and flags high-cost gateways. These are concentration signals, not waste — the finding shows where cost is and helps an owner decide if a VPC endpoint review is warranted.
Qualification criteria
- Traffic volume measured over 30 days
- Qualifies as a hotspot if total processed bytes exceed 100 GB/month
- Cost estimated at $32/month base + $0.045/GB data processing
- Top hotspot by estimated monthly spend is surfaced
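The cost estimate and hotspot rule above reduce to a small piece of arithmetic. This sketch uses the stated figures ($32/month base, $0.045/GB, 100 GB threshold); the function and constant names are hypothetical.

```python
NAT_BASE_USD_PER_MONTH = 32.0    # stated base charge per gateway
NAT_USD_PER_GB = 0.045           # stated data-processing rate
HOTSPOT_THRESHOLD_GB = 100.0     # stated qualification threshold


def estimate_nat_monthly_cost(processed_gb: float) -> float:
    """Estimated monthly cost: base charge plus per-GB processing."""
    return NAT_BASE_USD_PER_MONTH + NAT_USD_PER_GB * processed_gb


def is_nat_hotspot(processed_gb: float) -> bool:
    """A gateway qualifies when 30-day processed traffic exceeds 100 GB."""
    return processed_gb > HOTSPOT_THRESHOLD_GB
```

For example, a gateway that processed 500 GB in the window qualifies as a hotspot with an estimate of roughly $54.50/month, of which $22.50 is data processing.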
Data transfer
Spend hotspot · Medium confidence · Owner review
Inter-region and egress cost hotspots
Data transfer charges appear in billing as dozens of granular usage types — easily missed in a standard Cost Explorer view. ZephMatrix queries Cost Explorer filtered on transfer-type usage and groups by service, usage type, and region. Surfacing this tells you where the transfer cost is concentrated — not necessarily that it is waste.
Qualification criteria
- Filters on usage types containing datatransfer, dataxfer, or natgateway-bytes
- Qualifies if monthly cost is $25 or above
- Top 20 hotspots by cost returned; top finding surfaced in report
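The filter, threshold, and top-20 cut above can be sketched as one function over grouped billing rows. The row shape (`service`, `usage_type`, `region`, `monthly_cost`) and the case-insensitive match are assumptions made for illustration; in practice the rows would come from Cost Explorer grouped results.

```python
from typing import Dict, Iterable, List

TRANSFER_MARKERS = ("datatransfer", "dataxfer", "natgateway-bytes")
MIN_MONTHLY_COST_USD = 25.0
MAX_HOTSPOTS = 20


def transfer_hotspots(rows: Iterable[Dict]) -> List[Dict]:
    """Keep transfer-type rows at or above $25/month, highest cost first, capped at 20."""
    matches = [
        r for r in rows
        # assumption: substring match on the lowercased usage type
        if any(m in r["usage_type"].lower() for m in TRANSFER_MARKERS)
        and r["monthly_cost"] >= MIN_MONTHLY_COST_USD
    ]
    matches.sort(key=lambda r: r["monthly_cost"], reverse=True)
    return matches[:MAX_HOTSPOTS]
```

The first element of the returned list corresponds to the top finding surfaced in the report.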
Observability
Spend hotspot · Medium confidence · Owner review
CloudWatch log ingestion and storage cost
CloudWatch costs blend across ingestion, storage, and vended logs — categories that blur together in standard billing views. ZephMatrix isolates CloudWatch-specific usage types and surfaces the highest-cost line. This is a concentration signal — useful for identifying which log groups to review for retention policy gaps.
Qualification criteria
- Filtered on CloudWatch service
- Usage types: timedstorage, dataprocessing, vendedlog, logs, putlogevents
- Qualifies if monthly cost is $20 or above
- Top 20 hotspots returned; top finding surfaced in report
Commitment
Financial gap · Medium confidence · Strategic
Savings Plans and Reserved Instance gaps
On-demand spend that could be covered by a Savings Plan or RI without any infrastructure change. ZephMatrix runs three analyses: coverage gaps, upcoming expiries, and utilization warnings. These are strategic signals — purchase decisions need finance review and spend stability confirmation.
Qualification criteria
- Coverage gap: on-demand compute spend analyzed over 60 days. Confidence is scored by the coefficient of variation of daily spend, so stable spend earns higher confidence. High priority if confidence ≥ 75% and savings ≥ $500/month.
- Expiry alert: active Savings Plans or RIs expiring within 60 days
- Utilization warning: active commitments with utilization below 70%, meaning over-committed and paying for unused capacity
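The coverage-gap confidence score can be illustrated with a short sketch. The coefficient of variation (standard deviation divided by mean) is standard; the linear mapping from CV to a 0-100% confidence in `confidence_from_cv` is a hypothetical stand-in, since the page does not state the actual scoring curve.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(daily_spend: Sequence[float]) -> float:
    """Standard deviation of daily spend divided by its mean."""
    if not daily_spend:
        raise ValueError("daily_spend must not be empty")
    avg = mean(daily_spend)
    if avg == 0:
        return float("inf")  # no spend at all: treat as maximally unstable
    return pstdev(daily_spend) / avg


def confidence_from_cv(cv: float) -> float:
    """Hypothetical mapping: CV of 0 gives 100%, CV of 1 or more gives 0%."""
    return max(0.0, min(1.0, 1.0 - cv)) * 100.0


def is_high_priority(daily_spend: Sequence[float], est_monthly_savings: float) -> bool:
    """High priority if confidence >= 75% and savings >= $500/month."""
    confidence = confidence_from_cv(coefficient_of_variation(daily_spend))
    return confidence >= 75.0 and est_monthly_savings >= 500.0
```

Under this assumed mapping, perfectly flat daily spend scores 100% confidence, while spend that alternates between zero and double the average scores 0%.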
Anomaly
Financial incident · High confidence · Owner review
ML-identified spend spikes with root-cause attribution
ZephMatrix reads directly from AWS Cost Anomaly Detection — AWS's own ML service that monitors spend continuously and identifies abnormal cost changes. We surface the top 2 anomalies by dollar impact with their root-cause attribution. This is an incident signal — the first step is always investigation and owner routing, not deletion.
Qualification criteria
- Any anomaly returned by AWS Cost Anomaly Detection qualifies; no custom threshold applied
- Top 2 anomalies by total dollar impact are surfaced
- Root causes captured: service, region, linked account, usage type
- Requires payer or management account access; member-account-only connections will not see this signal
Note — ZephMatrix does not run its own anomaly detection algorithm. AWS Cost Anomaly Detection is the same service AWS uses internally. We surface and contextualize its output.
Managed services
Spend concentration · Low confidence · Strategic
OpenSearch, ElastiCache, and Redshift spend
Managed database and cache services often represent large, stable monthly costs that grow unnoticed. ZephMatrix isolates these service lines from billing and surfaces any exceeding the threshold. These are concentration signals — not calling something wasteful, but flagging where the spend is so an owner can decide if a sizing or efficiency review is warranted.
Qualification criteria
- Services monitored: Amazon OpenSearch Service, Amazon ElastiCache, Amazon Redshift
- Qualifies if monthly spend is $50 or above per service
- Top 20 services returned; top finding surfaced in report
Container platforms
Spend concentration · Low confidence · Strategic
EKS, ECS, and Fargate spend
Container platform costs are opaque — cluster charges, task hours, and Fargate compute blend across multiple billing line items. ZephMatrix isolates these service lines and surfaces any exceeding the threshold. Concentration signal — shows where cluster efficiency work should start, not that there is definite waste.
Qualification criteria
- Services monitored: Amazon EKS, Amazon ECS, AWS Fargate
- Qualifies if monthly spend is $40 or above per service
- Top 20 services returned; top finding surfaced in report
The execution layer
Detection tells you what is wrong. The execution layer is what fixes it.
Most FinOps tools stop at the finding. ZephMatrix runs a daily investigation loop: enriching findings with context, routing ownership, executing approved actions in AWS, and confirming whether savings materialized. This is the part that actually reduces your bill.
01
Detect
- ✓ Waste and anomaly signals refresh every 6 hours automatically
- ✓ Hotspot signals (rightsizing, network, transfer, etc.) refresh on each report run
- ✓ Findings ranked by savings potential, confidence, and actionability
02
Investigate
- ✓ Agent investigates the highest-value finding daily
- ✓ Adds owner attribution: who created it, who is responsible
- ✓ Safety classification: is it safe to act on, or does it need review?
- ✓ Utilization context: corroborating evidence from CloudWatch and Cost Explorer
03
Approve & execute
- ✓ Proposed action presented with full evidence
- ✓ You approve or reject inline; nothing executes without explicit approval
- ✓ Agent executes the approved action directly in AWS via the scoped IAM role
- ✓ Slack routing available for async team review
04
Verify
- ✓ Baseline captured before any action executes
- ✓ AWS Cost Explorer polled after execution to confirm spend changed
- ✓ Savings confirmed or flagged, not estimated
- ✓ Full audit trail: finding → case → approval → execution → outcome
How execution works
The base IAM role is read-only by default. ZephMatrix executes through four named workflows — each maps to a specific class of findings. Three of the four require an optional IAM capability add-on; one (anomaly investigation) runs on read-only access alone.
| Workflow | What it does | Execution capability required |
|---|---|---|
| Non-Prod EC2 Stop Candidate Review | Identifies idle non-production EC2 instances, verifies owner and safety context, and prepares an approval-gated stop action. Does not handle rightsizing, termination, or storage changes. | ec2_scheduler |
| Orphaned Storage and Idle Resource Review | Reviews unattached EBS volumes, orphaned snapshots, unused Elastic IPs, stale AMIs, and idle load balancers or RDS instances. Execution-eligible items go through the approval gate; review-only items are routed to owners. | resource_cleanup_execution |
| Safe Savings Policy Review | Reviews low-risk policy optimization opportunities: EBS gp2 → gp3 upgrades, CloudWatch log retention gaps, S3 incomplete multipart cleanup, and ECR lifecycle policy gaps. Zero downtime, fully reversible. | safe_savings_execution |
| AWS Billing Anomaly Root-Cause and Routing | Investigates billing anomalies from AWS Cost Anomaly Detection, explains likely cost drivers with supporting evidence, and routes the issue to the right owner. No write actions — runs on read-only access alone. | None required |
IAM actions added per capability group
| Capability | IAM actions added to the role |
|---|---|
| ec2_scheduler | ec2:StopInstances, ec2:StartInstances |
| resource_cleanup_execution | ec2:DeleteVolume, ec2:DeleteSnapshot, ec2:ReleaseAddress, ec2:DeregisterImage |
| safe_savings_execution | ec2:ModifyVolume, logs:PutRetentionPolicy, s3:PutLifecycleConfiguration, ecr:PutLifecyclePolicy |
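The capability table can be read as a lookup from enabled add-ons to the write actions granted on top of the read-only base role. The sketch below encodes the table as data; the helper name `write_actions_for` is hypothetical, and resource scoping and conditions are deliberately left out because the page does not describe them here.

```python
from typing import Iterable, List

# Actions per capability, copied from the table above
CAPABILITY_ACTIONS = {
    "ec2_scheduler": ["ec2:StopInstances", "ec2:StartInstances"],
    "resource_cleanup_execution": [
        "ec2:DeleteVolume",
        "ec2:DeleteSnapshot",
        "ec2:ReleaseAddress",
        "ec2:DeregisterImage",
    ],
    "safe_savings_execution": [
        "ec2:ModifyVolume",
        "logs:PutRetentionPolicy",
        "s3:PutLifecycleConfiguration",
        "ecr:PutLifecyclePolicy",
    ],
}


def write_actions_for(enabled: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated write actions for the enabled capabilities."""
    enabled = list(enabled)
    unknown = set(enabled) - CAPABILITY_ACTIONS.keys()
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    actions = set()
    for capability in enabled:
        actions.update(CAPABILITY_ACTIONS[capability])
    return sorted(actions)
```

With no capabilities enabled the result is an empty list, which matches the read-only default.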
Coming next — Rightsizing execution (changing EC2/RDS instance types), load balancer deregistration, and RDS snapshot cleanup require additional owner validation steps and are not yet supported.
Approval-gated actions
Every action requires explicit human approval before it executes. Actions are classified into two categories by blast radius.
Cleanup actions
Irreversible — review resource before approving
EBS volume deletion
Delete an unattached volume. Irreversible — agent confirms no attachment before surfacing for approval.
Snapshot deletion
Delete an orphaned snapshot. Irreversible — source volume verified as deleted first.
Elastic IP release
Release an unassociated Elastic IP. The released address itself may not be recoverable, but a new EIP can be allocated at any time.
AMI deregistration
Deregister a stale AMI and delete its backing snapshots. Agent verifies no running instance or launch template references it.
Low-risk optimizations
Safe to approve — no data loss, fully reversible
gp2 → gp3 upgrade
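Live volume type change. No downtime, no data loss. gp3 includes a 3,000 IOPS baseline regardless of size, which matches or exceeds gp2 for volumes up to 1,000 GiB, at a lower per-GB price. Larger or throughput-heavy volumes may need provisioned IOPS or throughput to match.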
CloudWatch log retention
Set a 90-day retention policy on log groups with no policy set. Log events older than 90 days expire once the policy applies; newer data is untouched.
S3 multipart lifecycle rule
Add an AbortIncompleteMultipartUpload rule to a bucket. No existing objects touched.
ECR lifecycle policy
Add an untagged-image cleanup rule to an ECR repository. Only untagged images older than the policy window are removed.
EC2 start / stop scheduling (optional add-on)
When the ec2_scheduler capability is enabled, the agent can stop and start non-production EC2 instances on a configured schedule (e.g. stop dev instances at 7 pm, start at 8 am). Only instances tagged ZephMatrixManaged=true are eligible — the IAM role rejects the call otherwise. This is an opt-in capability enabled per AWS connection.
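A tag-scoped restriction like the one described is commonly expressed in IAM with a condition on the `aws:ResourceTag/<key>` condition key. The sketch below builds such a statement as a Python dict. It is an illustration only: the actual policy is generated during setup, and the `Sid`, the resource ARN pattern, and the function name are assumptions.

```python
import json
from typing import Dict


def scheduler_statement(tag_key: str = "ZephMatrixManaged", tag_value: str = "true") -> Dict:
    """Allow stop/start only on instances that carry the given tag."""
    return {
        "Sid": "ZephMatrixEc2Scheduler",  # hypothetical statement ID
        "Effect": "Allow",
        "Action": ["ec2:StopInstances", "ec2:StartInstances"],
        "Resource": "arn:aws:ec2:*:*:instance/*",  # assumption: all instances, narrowed by the condition
        "Condition": {
            "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
        },
    }


# Wrap the statement in a policy document and render it as JSON
policy_json = json.dumps(
    {"Version": "2012-10-17", "Statement": [scheduler_statement()]},
    indent=2,
)
```

Because the condition is evaluated by IAM, a stop or start call against an untagged instance is denied before it reaches EC2, independent of any checks in the agent itself.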
What is never actioned: production instances managed by Auto Scaling Groups, IaC-managed resources (Terraform, CDK, CloudFormation), and anything ZephMatrix classifies as protected scope are excluded from the approval queue entirely. The agent surfaces them as findings for human review only.
Verified savings
Confirmed outcomes, not projections
Before any approved action executes, ZephMatrix captures the current resource state and cost baseline. After execution, it checks AWS Cost Explorer to confirm whether spend actually changed. The result is a timestamped savings record tied to the specific action — not an estimate.
Pre-action
Resource state and cost baseline recorded
Execution
Approved action runs in AWS via scoped IAM role
Verification
Cost Explorer polled — savings confirmed or flagged
Audit trail
Finding → case → approval → outcome — full chain visible
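The baseline-versus-outcome comparison can be sketched as follows. This is a simplified illustration under stated assumptions: the record shape, the use of mean daily cost, and the `tolerance` parameter are all hypothetical, and the page does not specify how ZephMatrix actually compares the two periods.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class SavingsRecord:
    action_id: str
    baseline_daily_usd: float
    post_daily_usd: float
    daily_savings_usd: float
    status: str  # "confirmed" or "flagged"


def verify_savings(
    action_id: str,
    baseline_daily_costs: Sequence[float],  # daily cost before the action
    post_daily_costs: Sequence[float],      # daily cost after the action
    tolerance: float = 0.01,                # hypothetical minimum drop, in USD/day
) -> SavingsRecord:
    """Compare mean daily cost before and after an approved action."""
    baseline = mean(baseline_daily_costs)
    post = mean(post_daily_costs)
    saved = baseline - post
    status = "confirmed" if saved > tolerance else "flagged"
    return SavingsRecord(action_id, baseline, post, saved, status)
```

A record with status `flagged` signals that spend did not drop as expected and the action needs a second look.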
Data access
What ZephMatrix accesses — and what it does not
What we access
- ✓ Resource inventory metadata (IDs, types, states, tags)
- ✓ CloudWatch metrics (CPU utilization, connection counts, traffic bytes)
- ✓ AWS Cost Explorer billing data (spend by service, usage type, region)
- ✓ Compute Optimizer recommendations
- ✓ Cost Anomaly Detection results
- ✓ Savings Plans and Reserved Instance utilization data
- ✓ S3 bucket metadata, lifecycle configuration, and multipart upload listings
- ✓ ECR repository metadata, image listings, and lifecycle policies
- ✓ CloudWatch log group metadata and retention settings
What we never access
- ✕ Application data, database rows, or file contents
- ✕ S3 object contents (only bucket metadata and lifecycle configuration are read)
- ✕ CloudTrail event history
- ✕ Secrets, credentials, or parameter store values
- ✕ VPC Flow Logs or network packet data
- ✕ EC2 instance memory contents
- ✕ RDS query logs or database schema
IAM policy
The exact IAM policy used for read access — and the separate, narrower policy for approved write actions — is generated during setup and visible in your account dashboard. No permissions are requested beyond what each scan category requires.
Start free
Connect AWS and get your first Hidden Cost Report in under ten minutes.
The Discovery plan is free — no credit card required. Connect one AWS account, run up to three report refreshes per month, and see all findings in full. Upgrade when you want the daily investigation loop and governed execution to run automatically.