AWS EC2 Cost Optimization: Complete Guide (2025)

EC2 costs usually come down to three levers: pricing model, instance choice, and waste control. This complete guide shows what to check, how to fix it safely, and how to stop the same issues coming back—manually or with automation.

Quick Wins
EC2 Pricing in 90 Seconds
Pick the Right Instance Type
Use Spot, RI & Savings Plans Responsibly
Waste Fixes
Schedule Start/Stop & Hibernate Dev/Test
Guardrails That Stick
What to Automate
KPIs & Reporting
EC2 FAQs
TL;DR

Quick Wins (2 Minute Checklist)

Start here. Use this to triage fast; detailed steps live further below in Waste fixes (step‑by‑step).

Public IPv4 / Elastic IPs — remove any you don’t need; prefer private + NAT/ALB or IPv6. (See detailed fix below.)

Unattached/idle EBS volumes — snapshot if needed, then delete or downsize. (See detailed fix below.)

Idle ALB/NLB — validate no active targets/routes; then delete. (See detailed fix below.)

Low‑traffic NAT gateways — retire or re‑architect; avoid cross‑AZ consolidation. (See detailed fix below.)

ℹ️ Tip: Use a rules+automation view that surfaces disassociated EIPs, unattached/idle EBS, idle LBs, and idle/low-traffic NAT gateways and lets you fix in one click (release EIP, delete volume/snapshot/LB/NAT). Hyperglance provides those exact checks and actions.

EC2 Pricing in 90 Seconds (2025)

You don’t need a PhD to pick a model. Use this quick guide.

Model	Best for	Pros	Watch‑outs
On‑Demand	Unpredictable or short‑lived	Zero commitment	Most expensive for steady 24×7
Savings Plans (Compute/EC2)	Steady but evolving	Broad coverage, flexible	Commit to $/hour for a 1- or 3-year term
Reserved Instances (Zonal/Regional)	Long‑lived, stable	Deeper discounts; capacity reservation (zonal RIs only)	Less flexible than SPs
Spot	Interruptible (batch, CI/CD, ML training)	Biggest discounts	Two‑minute interruption notice; automate rebalancing

How to choose (rule of thumb):

Measure steady state (the “always on” baseline).
Cover it with Savings Plans (SP) / Reserved Instances (RI) to the comfort level of your org.
Layer Spot for interruptible/batch.
Leave the spiky remainder on On‑Demand.
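The rule of thumb above can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the function name, the 10th-percentile definition of "steady state", and the `interruptible_share` parameter are all assumptions chosen for the example.

```python
def plan_coverage(hourly_usage, interruptible_share=0.3, baseline_percentile=0.10):
    """Split usage (e.g. instance-hours per hour) into commit / Spot / On-Demand.

    baseline_percentile and interruptible_share are illustrative assumptions.
    """
    if not hourly_usage:
        raise ValueError("hourly_usage must not be empty")
    ordered = sorted(hourly_usage)
    # Steady state: the level that usage meets or exceeds in roughly 90% of hours.
    baseline = ordered[int(baseline_percentile * (len(ordered) - 1))]
    # Average usage above the baseline, per hour.
    excess = sum(max(u - baseline, 0) for u in hourly_usage) / len(hourly_usage)
    spot = excess * interruptible_share
    return {"commit": baseline, "spot": spot, "on_demand": excess - spot}


# 20 quiet hours at 10 instances, 4 busy hours at 20 instances.
plan = plan_coverage([10] * 20 + [20] * 4)
```

Choosing a low percentile rather than the mean keeps the commitment under the "always on" floor, which matches the advice to cover the baseline and not the peaks.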

Note on AWS public IPv4 pricing (2024): AWS now charges for every public IPv4 address at $0.005 per IP‑hour, whether or not it is attached to a running resource. Free Tier includes 750 hours/month for one in‑use public IPv4; BYOIP is still free. Plan migrations (ALB/NLB, NAT, IPv6) accordingly.
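To see what the per-hour rate means on a monthly bill, here is a small arithmetic sketch. The rate comes from the note above; the 730-hour month and the function name are assumptions for illustration.

```python
IPV4_RATE_PER_HOUR = 0.005  # USD per public IPv4 address per hour (from the note above)


def monthly_ipv4_cost(address_count, hours_in_month=730, free_hours=0):
    """Estimate the monthly charge for a number of public IPv4 addresses."""
    billable_hours = max(address_count * hours_in_month - free_hours, 0)
    return round(billable_hours * IPV4_RATE_PER_HOUR, 2)


one_address = monthly_ipv4_cost(1)     # about 3.65 USD per address per month
forty_addresses = monthly_ipv4_cost(40)
```

At this rate a single address is a few dollars a month, so the cost only becomes visible at fleet scale, which is why an inventory of addresses matters more than any one release.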

ℹ️ Tip: Pair right-sizing & commitment planning with a cost dashboard so you can see what to cover with RI/SP and what to leave On-Demand. Hyperglance includes RI recommendations, right-sizing, budgets and tagged spend views to guide safer commits.

Pick the Right Instance Type (and Generation)

Three common mistakes:

Over‑provisioning vCPU/RAM “just in case.”
Ignoring newer generations (paying more for less).
Staying on legacy families when workload needs changed (e.g., memory‑ vs. compute‑bound).

Simple method:

Baseline CPU, memory, network, and EBS metrics for a representative period.
Test next‑gen families (e.g., move from m5 to m7g/m7i equivalents where compatible).
Validate performance; if equal or better, lock in savings with Savings Plans (SP) / Reserved Instances (RI).

Right‑sizing 101 (size ladder & example)

A t3.micro costs roughly one-eighth as much as a t3.large at On-Demand rates, which adds up quickly for tiny dev/test services. On new AWS accounts (<12 months), one t2.micro (or t3.micro where t2.micro isn’t available) is Free Tier–eligible (up to 750 hours/month).

Start small, watch utilization (CPU/RAM/EBS throughput), and scale up (bigger size) or out (more instances) based on bottlenecks. Use autoscaling where appropriate to avoid paying for idle headroom. Always verify Free Tier eligibility in your console.
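A size ladder can be expressed as a simple lookup. The sketch below assumes p95 CPU and memory utilization as inputs and uses 40% and 80% as step-down and step-up thresholds; those thresholds and the function name are hypothetical, so tune them to your own risk tolerance.

```python
# t3 sizes from smallest to largest.
T3_LADDER = ["nano", "micro", "small", "medium", "large", "xlarge", "2xlarge"]


def next_size(current, cpu_p95, mem_p95, low=40.0, high=80.0):
    """Suggest one step up, one step down, or no change on the size ladder."""
    i = T3_LADDER.index(current)
    peak = max(cpu_p95, mem_p95)  # size for the tighter of the two resources
    if peak > high and i < len(T3_LADDER) - 1:
        return T3_LADDER[i + 1]
    if peak < low and i > 0:
        return T3_LADDER[i - 1]
    return current


suggestion = next_size("large", cpu_p95=12.0, mem_p95=25.0)  # "medium"
```

Moving one step at a time and re-measuring is deliberate: it keeps each change small enough to roll back.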

ℹ️ Tip: Run a right-sizing pass before you lock in commitments. Hyperglance offers right-sizing recommendations and commit planning so you can trial newer families, then commit with confidence.

Other Considerations

Burstable (t) and credit math: t-series instances accrue and spend CPU credits. For low steady CPU with bursts, they’re cost-effective. If credits deplete, Standard mode throttles the instance to its baseline, while Unlimited mode keeps bursting and bills surplus credits at published per-vCPU-hour rates, so costs rise. Monitor CPUCreditBalance and budget alerts. Consider t4g (Graviton) for better price/perf when your stack supports ARM.
Network/storage‑bound: choose instances with higher network/EBS bandwidth.
GPU/AI workloads: For training, consider P5 (H100) or P4d (A100); for inference/graphics, G5/G6 (A10G/L4‑class). Evaluate Trn1 (Trainium) for training and Inf2 (Inferentia) for inference where supported—often lower $/throughput. Right‑size GPU count & memory to model size; aim for >70% sustained GPU utilization and high memory occupancy.
Throughput bottlenecks: Keep accelerators fed with fast I/O—EFA for distributed training, FSx for Lustre/S3 data pipelines, and adequate EBS/instance NVMe. Optimize dataloaders and batch sizes to minimize host‑side stalls.
Reserving capacity & pricing: For steady GPU queues, combine Savings Plans/Reserved Instances with On‑Demand Capacity Reservations to secure availability; use Spot only for fault‑tolerant jobs with frequent checkpoints and automatic rebalancing.
Make the same work cheaper: Use mixed precision (FP16/bfloat16/FP8), gradient checkpointing, and quantization (INT8/FP8) where frameworks allow to reduce GPU hours and memory needs.
AMD vs Intel price/perf: AMD (m7a/c7a) often has a lower on‑demand price than comparable Intel (m7i/c7i) in many regions—check current pricing for your region/size before committing. Validate with a short canary.
Graviton (ARM) savings: Where supported, Graviton (arm64) can deliver up to ~40% better price/perf vs x86/x64. Test multi‑arch builds and measure p95/p99 before rollout.
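For the burstable credit math mentioned in the list above, a rough simulation helps show when a t-series instance runs dry. One CPU credit is one vCPU at 100% for one minute. The defaults below (2 vCPUs, 12 credits earned per hour, a 288-credit cap) are the figures commonly listed for t3.micro; treat them as assumptions and check the current AWS table for your size. The model is Standard mode only, where the balance simply stops at zero.

```python
def simulate_credits(utilization_pct_by_hour, vcpus=2, earned_per_hour=12.0,
                     max_balance=288.0, start_balance=0.0):
    """Return the CPU credit balance after each hour (Standard mode sketch)."""
    balance = start_balance
    history = []
    for util_pct in utilization_pct_by_hour:
        # Credits spent this hour: vCPUs x utilization x 60 minutes.
        spent = vcpus * util_pct * 60 / 100
        balance = min(max(balance + earned_per_hour - spent, 0.0), max_balance)
        history.append(balance)
    return history


# Two quiet hours, one hour at the 10% baseline, then a 50% burst.
balances = simulate_credits([5, 5, 10, 50])
```

In this run the balance grows while utilization stays under the baseline, holds flat at the baseline, and drops to zero in the burst hour. A balance that sits at zero for long stretches is the signal to size up or move off the t-class.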

Use Spot, RI & Savings Plans Responsibly

Spot: AWS can reclaim capacity with as little as a two‑minute interruption notice—often not enough time to react manually. Design for interruptions (checkpointing, idempotent jobs, diversified pools), and use Instance Rebalance Recommendations (IMDS + EventBridge) and Capacity Rebalancing in Auto Scaling to launch replacements before the 2‑minute timer, when signals arrive early. Keep an On‑Demand fallback.
Best‑effort note: Rebalance signals can arrive close to the 2‑minute warning—automation beats human intervention.
Savings Plans vs RI:
- SPs are more flexible (especially Compute SPs); RIs can offer deeper discounts or capacity guarantees.
- Don’t over‑commit: cover the known steady part of your usage, not the peaks.
Coverage approach: Measure → Cover steady with SP/RI → Add Spot for interruptible → Review quarterly.
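To illustrate how little time the two-minute notice leaves, the sketch below parses an interruption notice and computes the seconds remaining. It assumes the JSON shape that the instance metadata path `spot/instance-action` returns (an `action` and a UTC `time`); fetching the document from IMDS is left out, and the function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json, now):
    """Parse a Spot instance-action notice; return (action, seconds left)."""
    notice = json.loads(notice_json)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    remaining = max((deadline - now).total_seconds(), 0.0)
    return notice["action"], remaining


sample = '{"action": "terminate", "time": "2025-01-01T12:02:00Z"}'
action, remaining = seconds_until_interruption(
    sample, now=datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
)
```

A drain handler would typically call something like this in a polling loop and start checkpointing as soon as a notice appears, rather than waiting for a person to react.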

🧠 Need to brush up on your jargon? Head over to our FinOps glossary.

Waste Fixes (Step-by-Step)

Unattached/idle EBS volumes

Spot it: Unattached volumes; volumes attached to stopped instances; massively over-provisioned gp3/io* vs usage.
Fix safely: Snapshot, then delete or right-size.
Prevent: IaC + image pipelines set sane defaults; enforce DeleteOnTerminate=true; automation flags unattached > N days.
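The "unattached > N days" check can be sketched as a filter over volume records. The dictionaries below mirror the `VolumeId`, `State`, and `CreateTime` fields returned by the EC2 DescribeVolumes API, where a state of `available` means the volume is not attached. Note one assumption: `CreateTime` is used as a stand-in for age, because the API does not report when a volume was detached.

```python
from datetime import datetime, timedelta, timezone


def stale_unattached_volumes(volumes, now, min_age_days=7):
    """Return IDs of unattached volumes older than min_age_days."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and v["CreateTime"] <= cutoff
    ]


now = datetime(2025, 6, 10, tzinfo=timezone.utc)
sample_volumes = [
    {"VolumeId": "vol-a", "State": "available",
     "CreateTime": datetime(2025, 5, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-b", "State": "in-use",
     "CreateTime": datetime(2025, 4, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-c", "State": "available",
     "CreateTime": datetime(2025, 6, 9, tzinfo=timezone.utc)},
]
candidates = stale_unattached_volumes(sample_volumes, now)  # ["vol-a"]
```

Treat the result as a review list: snapshot first, then delete, as described above.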

ℹ️ Tip: Automate the hygiene loop: snapshot → delete/right-size after N days of being unattached/idle. Hyperglance includes rules for unattached/idle EBS and actions to delete volumes and snapshots.

Public IPv4 / Elastic IPs (in-use or idle)

Spot it: Any public IPv4 (Elastic IP) in your account—attached or idle—accrues cost; idle ones add zero value.
Fix safely: Re-associate or release disassociated EIPs; review whether the workload actually needs a public IPv4 (ALB/NLB, NAT, or IPv6 may suffice).
Prevent: Tag ownership and purpose; alert on disassociated EIPs and on resources with unnecessary public IPv4; prefer IPv6 where supported.
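A disassociated-EIP alert boils down to finding addresses with no association. The sketch below assumes records shaped like the EC2 DescribeAddresses response, where an associated address carries an `AssociationId` and an idle one does not. The `Owner` tag lookup is a hypothetical convention, and tags are simplified to a plain dictionary.

```python
def disassociated_eips(addresses):
    """Return (public_ip, owner) pairs for Elastic IPs with no association."""
    idle = []
    for addr in addresses:
        if "AssociationId" not in addr:
            # Owner tag is an assumed convention; fall back to "unknown".
            owner = addr.get("Tags", {}).get("Owner", "unknown")
            idle.append((addr["PublicIp"], owner))
    return idle


sample_addresses = [
    {"PublicIp": "203.0.113.10", "AssociationId": "eipassoc-1"},
    {"PublicIp": "203.0.113.11", "Tags": {"Owner": "team-x"}},
    {"PublicIp": "203.0.113.12"},
]
idle = disassociated_eips(sample_addresses)
```

Returning the owner alongside the address makes it easier to route the alert to someone who can confirm the release.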

ℹ️ Tip: Keep a live inventory of public IPv4 and auto-clean drift: when an EIP is disassociated, trigger an automation to release it after a grace period. Hyperglance ships the rule and the Release Elastic IP action.

Idle ALB/NLB

Spot it: No registered targets; near-zero requests over a rolling window.
Fix safely: Double-check DNS and health checks; then delete.
Prevent: Lifecycle rules to retire after X idle days.

ℹ️ Tip: When an ALB/NLB has no targets or sustained near-zero traffic, raise a change ticket and, on approval, delete it automatically. Hyperglance has idle/no-targets LB rules and a Delete Load Balancer action.

NAT gateways with negligible traffic

Spot it: Bytes processed near zero; duplicated NAT per tiny subnets/AZs.
Fix safely: Consolidate only when traffic is negligible; avoid consolidating across AZs (cross-AZ data processing/transfer charges can erase savings); verify route tables; consider alternatives (per-AZ NATs, VPC gateway endpoints for S3/DynamoDB) when appropriate.
Prevent: Minimum throughput policy + alerts.
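The cross-AZ warning above is easier to judge with numbers. This sketch compares the hourly charge saved by removing one NAT gateway against the cross-AZ transfer added when that traffic is rerouted. The rates are illustrative assumptions (0.045 USD per NAT gateway hour, 0.02 USD per GB for cross-AZ traffic counting both directions); check current pricing for your region before acting on the result.

```python
def nat_consolidation_delta(rerouted_gb_per_month, nat_hourly=0.045,
                            cross_az_per_gb=0.02, hours_in_month=730):
    """Monthly savings (positive) or extra cost (negative) from removing one NAT.

    All rates are illustrative assumptions, not quoted prices.
    """
    saved = nat_hourly * hours_in_month
    added = rerouted_gb_per_month * cross_az_per_gb
    return round(saved - added, 2)


light_traffic = nat_consolidation_delta(100)    # small positive saving
heavy_traffic = nat_consolidation_delta(5000)   # negative: consolidation costs more
```

Under these assumed rates the break-even sits at roughly 1.6 TB of rerouted traffic per month, which is why consolidation only makes sense when traffic is negligible.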

ℹ️ Tip: Flag idle/underutilized NATs and, where safe, queue a Delete NAT Gateway action; highlight per-AZ traffic so teams avoid cross-AZ consolidation mistakes. Hyperglance includes the idle NAT rule and a Delete NAT Gateway action.

Snapshot hygiene

Spot it: Orphaned or ancient snapshots that outlived their purpose.
Fix safely: Apply retention policies (e.g., 7/30/90); clean snapshots for terminated resources.
Prevent: Automated lifecycle management + exception tags.
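A retention policy with exception tags can be sketched as follows. The source does not define what 7/30/90 maps to, so this example assumes one possible reading: 7 days for dev, 30 for staging, 90 for prod. The `Retain` exception tag and the flattened tag dictionary are also hypothetical; `StartTime` and `SnapshotId` follow the EC2 DescribeSnapshots field names.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping of the 7/30/90 policy to environments.
RETENTION_DAYS = {"dev": 7, "staging": 30, "prod": 90}


def expired_snapshots(snapshots, now, default_days=30):
    """Return snapshot IDs past their retention window, skipping exceptions."""
    expired = []
    for snap in snapshots:
        tags = snap.get("Tags", {})
        if tags.get("Retain") == "true":  # hypothetical exception tag
            continue
        days = RETENTION_DAYS.get(tags.get("Environment"), default_days)
        if snap["StartTime"] < now - timedelta(days=days):
            expired.append(snap["SnapshotId"])
    return expired


now = datetime(2025, 6, 30, tzinfo=timezone.utc)
sample_snapshots = [
    {"SnapshotId": "snap-dev-old", "StartTime": datetime(2025, 6, 1, tzinfo=timezone.utc),
     "Tags": {"Environment": "dev"}},
    {"SnapshotId": "snap-prod", "StartTime": datetime(2025, 6, 1, tzinfo=timezone.utc),
     "Tags": {"Environment": "prod"}},
    {"SnapshotId": "snap-kept", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc),
     "Tags": {"Environment": "dev", "Retain": "true"}},
]
to_delete = expired_snapshots(sample_snapshots, now)  # ["snap-dev-old"]
```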

ℹ️ Tip: Enforce 7/30/90 retention and bulk-remove orphaned snapshots older than your policy. Hyperglance includes an “Orphaned EBS snapshots >30 days” rule and a Delete EBS Snapshot action.

Schedule Start/Stop & Hibernate Dev/Test

If it doesn’t need to run 24×7, don’t pay 24×7.

When to schedule: Dev, QA/UAT, training, lab environments, data science sandboxes.
Approach:
- Create a simple tag‑based schedule (e.g., office hours).
- Or, segment by AWS account per environment (e.g., a dedicated dev account) and apply a blanket automation to that account (e.g., shut down all non‑exempt instances at 20:00 Friday), with an allowlist for exceptions.
- Support hibernate for fast resume when workable.
- Maintain exception lists (e.g., Schedule=NeverStop).

Example tags you can standardize:

Owner=team-x
Environment=dev
Schedule=OfficeHours # e.g., 08:00–19:00 Mon–Fri
Retire-After=2025-12-31
CostCenter=CC-1234
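A scheduler that reads these tags might decide whether an instance should be running like this. It is a minimal sketch: the `SCHEDULES` table, the function name, and the choice to leave untagged instances alone are assumptions, and time zones are ignored for brevity.

```python
from datetime import datetime

# Schedule name -> weekdays (Mon=0) and [start, stop) hours; values mirror the tag example.
SCHEDULES = {"OfficeHours": {"days": range(0, 5), "start": 8, "stop": 19}}


def should_run(tags, when):
    """Return True if an instance with these tags should be running at `when`."""
    schedule = tags.get("Schedule")
    if schedule == "NeverStop":
        return True  # explicit exception
    window = SCHEDULES.get(schedule)
    if window is None:
        return True  # unknown or missing schedule: do not touch the instance
    return when.weekday() in window["days"] and window["start"] <= when.hour < window["stop"]


dev_tags = {"Owner": "team-x", "Environment": "dev", "Schedule": "OfficeHours"}
monday_morning = should_run(dev_tags, datetime(2025, 6, 2, 9, 0))    # True
monday_evening = should_run(dev_tags, datetime(2025, 6, 2, 20, 0))   # False
```

Defaulting to "keep running" for unrecognized schedules is the cautious choice; a stricter policy could stop anything untagged in a dedicated dev account.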

ℹ️ Tip: Apply blanket schedules by tag or by environment account (e.g., dev), then report hours avoided weekly. Hyperglance supports automated stop/start schedules for compute resources.

Guardrails That Stick (Prevention Layer)

Tagging standards that drive automation: Owner, CostCenter, Environment, Schedule, Retire-After, DataClass.

Golden images / IaC defaults:
- Right‑sized root volumes (not 100 GB “just because”)
- Latest generation families as defaults
- DeleteOnTerminate=true
Same‑AZ by default: Prefer same‑AZ data paths where HA allows; be explicit about cross‑AZ data transfer costs; design per‑AZ NATs and cache layers to minimize inter‑AZ traffic.
Change windows & approvals: Auto‑fix the no‑brainers; ticket the risky stuff.

ℹ️ Tip: Treat tag completeness & normalization as a control so cost ownership and schedules stick. Hyperglance (or similar) offers Tag Normalization plus automations to add/update/remove tags at scale.

What to Automate (and What Not To)

Safe to auto‑fix:

Disassociated EIPs older than 24h → release
Unattached EBS older than 7 days → snapshot + delete
Idle ALB/NLB with no targets for 14 days → delete
Snapshot lifecycle enforcement

Alert/ticket first:

Anything that could alter production routing
NAT gateway changes
Instance family migrations for critical apps
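The split between auto-fix and ticket-first can be written down as a small policy table. The thresholds come from the lists above; the finding names and the `decide` function are hypothetical labels for illustration.

```python
# Hours a finding must persist before it is safe to auto-fix (from the list above).
AUTO_FIX_AFTER_HOURS = {
    "disassociated_eip": 24,
    "unattached_ebs": 7 * 24,
    "idle_lb_no_targets": 14 * 24,
}
# Findings that always go to a human first.
TICKET_FIRST = {"production_routing_change", "nat_gateway_change",
                "instance_family_migration"}


def decide(kind, idle_hours):
    """Return 'auto-fix', 'wait', or 'ticket' for a finding."""
    if kind in TICKET_FIRST:
        return "ticket"
    threshold = AUTO_FIX_AFTER_HOURS.get(kind)
    if threshold is None:
        return "ticket"  # unknown finding types are never auto-fixed
    return "auto-fix" if idle_hours >= threshold else "wait"


eip_decision = decide("disassociated_eip", idle_hours=30)   # "auto-fix"
nat_decision = decide("nat_gateway_change", idle_hours=500)  # "ticket"
```

Sending unknown finding types to a ticket keeps the automation conservative as new rules are added.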

KPIs & Reporting

Track these to prove impact:

Waste removed ($/month) split by environment/team
Time‑to‑fix (alert → remediation)
Coverage by SP/RI (% of steady usage covered)
Spot utilization hours (and fallback events)
Exception count (and recurring offenders)
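Two of these KPIs are simple to compute once the inputs are collected. The sketch below assumes you already have committed and steady-state usage in the same unit, plus a list of (alerted, remediated) timestamps; the function names are hypothetical.

```python
from datetime import datetime
from statistics import median


def coverage_pct(committed_per_hour, steady_per_hour):
    """Share of steady usage covered by SP/RI, capped at 100%."""
    if steady_per_hour <= 0:
        raise ValueError("steady_per_hour must be positive")
    covered = min(committed_per_hour, steady_per_hour)
    return round(100 * covered / steady_per_hour, 1)


def median_time_to_fix_hours(events):
    """Median hours between alert and remediation for (alerted, fixed) pairs."""
    return median((fixed - alerted).total_seconds() / 3600 for alerted, fixed in events)


events = [
    (datetime(2025, 6, 1, 9), datetime(2025, 6, 1, 11)),   # 2 hours
    (datetime(2025, 6, 2, 9), datetime(2025, 6, 2, 13)),   # 4 hours
    (datetime(2025, 6, 3, 9), datetime(2025, 6, 4, 15)),   # 30 hours
]
coverage = coverage_pct(8, 10)                 # 80.0
time_to_fix = median_time_to_fix_hours(events)  # 4.0
```

The median is used rather than the mean so that one long-running ticket does not hide an otherwise fast remediation loop.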

ℹ️ Tip: Use dashboards for tagged spend, budgets, trends/anomalies and a “savings from automations” roll-up that non-FinOps stakeholders can understand. Hyperglance (or similar) covers Cost Explorer (tagged spend), budgets, trends/anomalies out of the box.

EC2 FAQs

How do I reduce EC2 costs quickly? Start with high‑confidence waste: public IPv4/EIPs (attached or idle), unattached EBS, idle LBs/NAT, and stopped instances with billable storage. Then right‑size and schedule non‑prod.

Which EC2 pricing model is cheapest for steady workloads? Savings Plans or RIs usually beat On‑Demand for 24×7 usage. Cover the measured baseline, not the peaks.

Are EC2 Spot instances reliable for production? Yes—for fault‑tolerant services engineered for loss of nodes. Plan for only ~2 minutes’ notice before interruption. Automate early drain/replace using Instance Rebalance Recommendations and Capacity Rebalancing; keep diversified instance types/AZs and an On‑Demand fallback.

Savings Plans vs Reserved Instances—what’s the difference? SPs are more flexible (especially Compute SPs). RIs can offer deeper discounts or capacity reservations. Many teams mix both.

What’s the cheapest EC2 instance type for dev/test? Start by right‑sizing: if the workload is truly tiny, a t3.micro costs far less than a t3.large, and on new AWS accounts (<12 months) one t2.micro (or t3.micro where t2.micro isn’t available) is Free Tier–eligible (up to 750 hours/month). Often t‑class burstable instances work well for dev/test—just monitor CPU credits. If you’re constantly depleting them, switch to a non‑burstable general‑purpose instance (e.g., m7i.large) or size up within t‑class.

How do I find and remove unattached EBS volumes? List unattached volumes, snapshot if needed, then delete or right‑size. Prevent repeats by setting DeleteOnTerminate=true in your images/IaC.

Why am I paying for public IPv4 addresses when my instance is stopped? Since 1 Feb 2024, AWS bills all public IPv4 (Elastic IP) addresses at $0.005 per IP‑hour—attached or idle, regardless of instance state. The Free Tier includes 750 hours/month for one in‑use public IPv4; BYOIP remains free. Reduce usage (prefer private + NAT/ALB or IPv6), release unneeded EIPs, and alert on drift.

Can I automatically stop EC2 instances after hours? Yes—use tag‑driven schedules or account‑level automation for a dedicated dev/test account, plus a nightly job. Keep an exception tag for critical resources.

🤓 What's the FinOps Framework? Find out in our guide to FinOps.

TL;DR Checklist

Pick the right pricing model: cover steady with SP/RI; layer Spot for interruptible.
Right‑size onto current‑gen instance types.
Purge waste: EBS, EIP, idle LB/NAT, stale snapshots.
Schedule non‑prod.
Bake guardrails into IaC and enforce with automation.

Why Teams Choose Hyperglance

Hyperglance gives FinOps teams, architects, and engineers real-time visibility across AWS, Azure, and GCP — costs, security, and performance in one view.

Spot waste, fix issues automatically, and stay ahead of your spend with built-in FinOps intelligence and no-code automation.

Visual clarity: Interactive diagrams show every relationship and cost driver.
Actionable automation: Detect and fix cost and security issues automatically.
Built for FinOps: Hundreds of optimization rules and analytics, out of the box.
Multi-cloud ready: Unified visibility across AWS, Azure, and GCP.

Book a demo today, or find out how Hyperglance helps you cut waste and complexity.


Hyperglance, a FinOps Certified Platform

About The Author: David Gill

As Hyperglance's Chief Technology Officer (CTO), David looks after product development & maintenance, providing strategic direction for all things tech. Having been at the core of the Hyperglance team for over 10 years, cloud optimization is at the heart of everything David does.

