AWS launched the AWS Well-Architected Framework to help cloud architects design and operate workloads securely and efficiently, and to help teams make better-informed decisions when building applications.
You've probably heard about Well-Architected. Maybe you'd like to get certified as an AWS Well-Architected Partner?
Here's what you need to know about the framework:
What is the AWS Well-Architected Framework?
The AWS Well-Architected Framework was introduced in 2015. It describes the key concepts, design principles, and best practices to consider when operating in the cloud. Nominally it's written for AWS, but the vast majority of the content applies to any cloud platform.
Following the framework should help ensure that your cloud is secure, resilient, performant, and sustainable. After answering the framework's foundational questions, you can see how well your architecture aligns with best practice, and how to improve it.
In AWS' own words:
"if you neglect the six pillars...it can become challenging to build a system that delivers on your expectations and requirements"
The framework also includes domain-specific 'lenses'. These lenses go into more detail than the general guidance does, covering domains such as machine learning, data analytics, IoT, media streaming, financial services, and gaming.
What are the AWS Well-Architected Framework Pillars?
The AWS Well-Architected Framework has grown steadily from 4 to 6 pillars since its inception in late 2015. The current pillars are:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Sustainability
The addition of Operational Excellence turned 4 pillars into 5 in November 2016. The sixth pillar, Sustainability, was announced by AWS in December 2021, a convenient fit with AWS' target of 100% renewable power by 2025.
Lots of us have grown weary of construction references in technology (think Agile v Waterfall debates), but they remain difficult to avoid. Think of the pillars as core operating principles, or areas of high-level guidance. To use another construction reference, the pillars are at the foundation of how cloud architects should govern their AWS setup.
The AWS Well-Architected Framework Pillars
AWS Well-Architected Pillar Structure
Each of the six pillars has:
- An official Definition
- Multiple Design Principles
- Multiple Best Practices, grouped into categories
- A prescriptive guide (referred to as a whitepaper) with links to other useful resources, e.g. case studies, training, and detailed guides
Importantly, each Best Practice category has at least one self-assessment Question.
These questions are uniquely labeled (e.g. OPS 1, OPS 2) and are designed to help organizations assess their adherence to each pillar and discover improvements they can make.
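To make that structure concrete, here is a minimal Python sketch of how a team might model the checklist while tracking its own review. The names `Pillar`, `Question`, `parse_question_id`, and `group_by_pillar` are hypothetical and not part of any AWS API; the only assumption about the labels is the pillar prefix plus number pattern (with or without a space).

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Matches labels such as "OPS 1", "SEC10" or "REL 13" (the space is optional).
QUESTION_ID = re.compile(r"^([A-Z]+)\s*(\d+)$")


def parse_question_id(qid: str) -> tuple[str, int]:
    """Split a label such as 'OPS 1' into ('OPS', 1)."""
    match = QUESTION_ID.match(qid.strip())
    if match is None:
        raise ValueError(f"not a question label: {qid!r}")
    return match.group(1), int(match.group(2))


@dataclass
class Question:
    qid: str   # e.g. "OPS 1"
    text: str  # e.g. "How do you determine what your priorities are?"


@dataclass
class Pillar:
    name: str    # e.g. "Operational Excellence"
    prefix: str  # e.g. "OPS"
    design_principles: list[str] = field(default_factory=list)
    # Best practice category -> topics, e.g. "Organization" -> ["Operating Model"]
    best_practices: dict[str, list[str]] = field(default_factory=dict)
    questions: list[Question] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every question must carry this pillar's prefix (OPS, SEC, REL, ...).
        for question in self.questions:
            prefix, _ = parse_question_id(question.qid)
            if prefix != self.prefix:
                raise ValueError(f"{question.qid} does not belong to {self.name}")


def group_by_pillar(qids: list[str]) -> dict[str, list[int]]:
    """Group question labels by pillar prefix, with numbers sorted ascending."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for qid in qids:
        prefix, number = parse_question_id(qid)
        grouped[prefix].append(number)
    return {prefix: sorted(numbers) for prefix, numbers in grouped.items()}
```

For example, `group_by_pillar(["SEC 2", "OPS1", "SEC 1"])` returns `{"SEC": [1, 2], "OPS": [1]}`, which makes it easy to see which questions a review has already covered per pillar.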
The AWS Well-Architected Framework Checklist
If you're after a one-page summary of all of the pillars, design principles, areas of best practice, and questions, you're in the right place.
A Map of the AWS Well-Architected Framework (source: AWS)
1. Operational Excellence Pillar
This pillar focuses on the day-to-day running and monitoring of systems, and the continuous improvement of processes and procedures. Important topics include defining standards, automating changes, and responding to events.
Design Principles:
- Perform operations as code
- Make frequent, small, reversible changes
- Refine operations procedures frequently
- Anticipate failure
- Learn from all operational failures
Best Practice Categories & Topics:
- Organization > Organization Priorities
- Organization > Operating Model
- Organization > Organizational Culture
- Prepare > Design Telemetry
- Prepare > Design for Operations
- Prepare > Mitigate Deployment Risks
- Prepare > Operational Readiness and Change Management
- Operate > Understanding Workload Health
- Operate > Understanding Operational Health
- Operate > Responding to Events
- Evolve > Learn, Share, and Improve
Questions:
- OPS 1: How do you determine what your priorities are?
- OPS 2: How do you structure your organization to support your business outcomes?
- OPS 3: How does your organizational culture support your business outcomes?
- OPS 4: How do you design your workload so that you can understand its state?
- OPS 5: How do you reduce defects, ease remediation, and improve flow into production?
- OPS 6: How do you mitigate deployment risks?
- OPS 7: How do you know that you are ready to support a workload?
- OPS 8: How do you understand the health of your workload?
- OPS 9: How do you understand the health of your operations?
- OPS 10: How do you manage workload and operations events?
- OPS 11: How do you evolve operations?
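Several of these design principles ("perform operations as code", "make frequent, small, reversible changes", and "anticipate failure") come together in a deployment step that checks health after a change and reverts automatically. The sketch below is a minimal illustration of that pattern, assuming hypothetical `deploy` and `healthy` callables supplied by your own tooling; it does not model any specific AWS service.

```python
from typing import Callable


def apply_with_rollback(
    current: str,
    candidate: str,
    deploy: Callable[[str], None],
    healthy: Callable[[], bool],
    checks: int = 3,
) -> str:
    """Deploy `candidate`, run `checks` health checks, and revert on any failure.

    Returns the version that is left running.
    """
    # Operations as code: the change is applied by a function, not by hand.
    deploy(candidate)
    for _ in range(checks):
        if not healthy():
            # The change is small, so reverting to the known-good version is cheap.
            deploy(current)
            return current
    return candidate
```

In a real pipeline, `healthy` might poll an alarm or a synthetic check and `deploy` might invoke your infrastructure-as-code tool; both are placeholders here.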
2. Security Pillar
This pillar focuses on protecting your information and systems. Important topics include user permissions, security event detection, and data integrity & confidentiality.
Design Principles:
- Implement a strong identity foundation
- Enable traceability
- Apply security at all layers
- Automate security best practices
- Protect data in transit and at rest
- Keep people away from data
- Prepare for security events
Best Practice Categories & Topics:
- Security > Shared Responsibility
- Security > Governance
- Security > Operating Your Workloads Securely
- Security > AWS Account Management and Separation
- Identity & Access Management > Identity Management
- Identity & Access Management > Permissions Management
- Detection > Configure
- Detection > Investigate
- Infrastructure Protection > Protecting Networks
- Infrastructure Protection > Protecting Compute
- Data Protection > Data Classification
- Data Protection > Protecting Data at Rest
- Data Protection > Protecting Data in Transit
- Incident Response > Design Goals of Cloud Response
- Incident Response > Educate
- Incident Response > Prepare
- Incident Response > Simulate
- Incident Response > Iterate
Questions:
- SEC 1: How do you securely operate your workload?
- SEC 2: How do you manage identities for people and machines?
- SEC 3: How do you manage permissions for people and machines?
- SEC 4: How do you detect and investigate security events?
- SEC 5: How do you protect your network resources?
- SEC 6: How do you protect your compute resources?
- SEC 7: How do you classify your data?
- SEC 8: How do you protect your data at rest?
- SEC 9: How do you protect your data in transit?
- SEC 10: How do you anticipate, respond to, and recover from incidents?
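As one illustration of SEC 3 and "implement a strong identity foundation", the sketch below scans an IAM policy document for `Allow` statements that grant every action (or every action in a service, such as `s3:*`) or every resource, a common sign that least privilege hasn't been applied. It relies only on the standard IAM JSON policy shape (`Statement`, `Effect`, `Action`, `Resource`); the function name is hypothetical, and it deliberately ignores conditions, `NotAction`, and other subtleties a real policy analyzer would handle.

```python
import json


def _as_list(value) -> list:
    """IAM accepts either a single string or a list for Action and Resource."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def overly_broad_statements(policy_json: str) -> list[str]:
    """Return a finding for each Allow statement that uses wildcard actions or resources."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements narrow access, so they are not flagged
        sid = statement.get("Sid", f"statement {index}")
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        # "*" grants every action; "s3:*" grants every action in one service
        if any(action == "*" or action.endswith(":*") for action in actions):
            findings.append(f"{sid}: wildcard action")
        if "*" in resources:
            findings.append(f"{sid}: wildcard resource")
    return findings
```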
3. Reliability Pillar
This pillar focuses on ensuring workloads perform their intended functions and can recover quickly when things go wrong. Important topics include recovery planning, adapting to ever-changing requirements, and distributed system design.
Design Principles:
- Automatically recover from failure
- Test recovery procedures
- Scale horizontally to increase aggregate workload availability
- Stop guessing capacity
- Manage change in automation
Best Practice Categories & Topics:
- Foundations > Manage Service Quotas and Constraints
- Foundations > Plan your Network Topology
- Workload Architecture > Design Your Workload Service Architecture
- Workload Architecture > Design Interactions in a Distributed System to Prevent Failures
- Workload Architecture > Design Interactions in a Distributed System to Mitigate or Withstand Failures
- Change Management > Monitor Workload Resources
- Change Management > Design your Workload to Adapt to Changes in Demand
- Change Management > Implement Change
- Failure Management > Back up Data
- Failure Management > Use Fault Isolation to Protect Your Workload
- Failure Management > Design your Workload to Withstand Component Failures
- Failure Management > Test Reliability
- Failure Management > Plan for Disaster Recovery (DR)
Questions:
- REL 1: How do you manage service quotas and constraints?
- REL 2: How do you plan your network topology?
- REL 3: How do you design your workload service architecture?
- REL 4: How do you design interactions in a distributed system to prevent failures?
- REL 5: How do you design interactions in a distributed system to mitigate or withstand failures?
- REL 6: How do you monitor workload resources?
- REL 7: How do you design your workload to adapt to changes in demand?
- REL 8: How do you implement change?
- REL 9: How do you back up data?
- REL 10: How do you use fault isolation to protect your workload?
- REL 11: How do you design your workload to withstand component failures?
- REL 12: How do you test reliability?
- REL 13: How do you plan for disaster recovery (DR)?
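REL 5 asks how you design interactions in a distributed system to withstand failures. A widely used technique is retrying transient errors with capped exponential backoff and "full jitter", which spreads retries out so that many clients don't hit a recovering dependency at the same moment. This is a minimal sketch, assuming the operation signals transient faults by raising `ConnectionError` or `TimeoutError`; `sleep` and `rng` are injectable so the behavior can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying transient errors with capped exponential backoff.

    Uses "full jitter": before retry n (0-based) it waits a random time in
    [0, min(cap, base * 2**n)] seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts - 1):
        try:
            return operation()
        except retry_on:
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    return operation()  # last attempt: any exception propagates to the caller
```

Capping the delay keeps worst-case latency bounded, while the random component is what prevents synchronized retry storms.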
4. Performance Efficiency Pillar
This pillar focuses on the structured and streamlined allocation of IT resources. Important topics include monitoring, maintaining efficiency as requirements evolve, and optimizing resource size and type to match workloads.
Design Principles:
- Democratize advanced technologies
- Go global in minutes
- Use serverless architectures
- Experiment more often
- Consider mechanical sympathy
Best Practice Categories & Topics:
- Selection > Performance Architecture Selection
- Selection > Compute Architecture Selection
- Selection > Storage Architecture Selection
- Selection > Database Architecture Selection
- Selection > Network Architecture Selection
- Review > Evolve Your Workload to Take Advantage of New Releases
- Monitoring > Monitor Your Resources to Ensure That They Are Performing as Expected
- Trade-offs > Using Trade-offs to Improve Performance
Questions:
- PERF 1: How do you select the best performing architecture?
- PERF 2: How do you select your compute solution?
- PERF 3: How do you select your storage solution?
- PERF 4: How do you select your database solution?
- PERF 5: How do you configure your networking solution?
- PERF 6: How do you evolve your workload to take advantage of new releases?
- PERF 7: How do you monitor your resources to ensure they are performing?
- PERF 8: How do you use trade-offs to improve performance?
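PERF 7 is about knowing whether your resources are performing as expected. Averages tend to hide slow outliers, so teams often alert on a latency percentile instead. This sketch uses the nearest-rank percentile definition; the function names, the 95th-percentile default, and the target values are illustrative assumptions, not AWS guidance.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def breaches_target(latencies_ms: list[float], target_ms: float, p: float = 95.0) -> bool:
    """True if the p-th percentile latency exceeds the target."""
    return percentile(latencies_ms, p) > target_ms
```

With 19 requests at 100 ms and one at 900 ms, the p95 is still 100 ms while the p99 is 900 ms, so the percentile you choose determines whether that slow request ever shows up in monitoring.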
5. Cost Optimization Pillar
This pillar focuses on avoiding unnecessary costs. Key topics include understanding spending over time and controlling fund allocation, selecting resources of the right type and quantity, and scaling to meet business needs without overspending.
Applied correctly, the Cost Optimization pillar can help your organization achieve zen-like AWS cost efficiency. Its principles guide you through best practices and force you to analyze your current challenges and needs.
Design Principles:
- Implement Cloud Financial Management
- Adopt a consumption model
- Measure overall efficiency
- Stop spending money on undifferentiated heavy lifting
- Analyze and attribute expenditure
Best Practice Categories & Topics:
- Practice Cloud Financial Management > Functional Ownership
- Practice Cloud Financial Management > Finance and Technology Partnership
- Practice Cloud Financial Management > Cloud Budgets and Forecasts
- Practice Cloud Financial Management > Cost-Aware Processes
- Practice Cloud Financial Management > Cost-Aware Culture
- Practice Cloud Financial Management > Quantify Business Value Delivered Through Cost Optimization
- Expenditure and usage awareness > Governance
- Expenditure and usage awareness > Monitor Cost and Usage
- Expenditure and usage awareness > Decommission Resources
- Cost-effective resources > Evaluate Cost When Selecting Services
- Cost-effective resources > Select the Correct Resource Type, Size, and Number
- Cost-effective resources > Select the Best Pricing Model
- Cost-effective resources > Plan for Data Transfer
- Manage demand and supply resources > Manage Demand
- Manage demand and supply resources > Dynamic Supply
- Optimize over time
Questions:
- COST 1: How do you implement cloud financial management?
- COST 2: How do you govern usage?
- COST 3: How do you monitor usage and cost?
- COST 4: How do you decommission resources?
- COST 5: How do you evaluate cost when you select services?
- COST 6: How do you meet cost targets when you select resource type, size and number?
- COST 7: How do you use pricing models to reduce cost?
- COST 8: How do you plan for data transfer charges?
- COST 9: How do you manage demand, and supply resources?
- COST 10: How do you evaluate new services?
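The design principle "Analyze and attribute expenditure" (along with COST 2 and COST 3) depends on being able to say who is spending what. A common approach is cost allocation tags; the sketch below totals hypothetical billing line items by a `team` tag and reports untagged spend separately, since that bucket is often where ownership is unclear. The line-item shape and function names are assumptions for illustration, not the format of any particular AWS billing export.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str = "team") -> dict[str, float]:
    """Sum line-item cost per value of `tag_key`; missing or empty tags go to 'UNTAGGED'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key) or "UNTAGGED"
        totals[owner] += item["cost"]
    return {owner: round(total, 2) for owner, total in totals.items()}


def untagged_share(line_items: list[dict], tag_key: str = "team") -> float:
    """Fraction of total spend that cannot be attributed to an owner."""
    totals = cost_by_tag(line_items, tag_key)
    grand_total = sum(totals.values())
    return totals.get("UNTAGGED", 0.0) / grand_total if grand_total else 0.0
```

Tracking the untagged share over time gives a simple, measurable target for improving cost attribution.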
6. Sustainability Pillar
This pillar focuses on minimizing your cloud's environmental impact. Important topics include understanding the impact, optimizing utilization, and establishing a shared responsibility model for sustainability.
Design Principles:
- Understand your impact
- Establish sustainability goals
- Maximize utilization
- Anticipate and adopt new, more efficient hardware and software offerings
- Use managed services
- Reduce the downstream impact of your cloud workloads
Best Practice Categories & Topics:
- Region selection
- User behavior patterns > Scale infrastructure with user load
- User behavior patterns > Align SLAs with sustainability goals
- User behavior patterns > Eliminate creation and maintenance of unused assets
- User behavior patterns > Optimize geographic placement of workloads for user locations
- User behavior patterns > Optimize team member resources for activities performed
- Software and architecture patterns > Optimize software and architecture for asynchronous and scheduled jobs
- Software and architecture patterns > Remove or refactor workload components with low or no use
- Software and architecture patterns > Optimize areas of code that consume the most time or resources
- Software and architecture patterns > Optimize impact on customer devices and equipment
- Software and architecture patterns > Use software patterns and architectures that best support data access and storage patterns
- Data patterns > Implement a data classification policy
- Data patterns > Use technologies that support data access and storage patterns
- Data patterns > Use lifecycle policies to delete unnecessary data
- Data patterns > Minimize over-provisioning in block storage
- Data patterns > Remove unneeded or redundant data
- Data patterns > Use shared file systems or object storage to access common data
- Data patterns > Minimize data movement across networks
- Data patterns > Back up data only when difficult to recreate
- Hardware patterns > Use the minimum amount of hardware to meet your needs
- Hardware patterns > Use instance types with the least impact
- Hardware patterns > Use managed services
- Hardware patterns > Optimize your use of GPUs
- Development and deployment process > Adopt methods that can rapidly introduce sustainability improvements
- Development and deployment process > Keep your workload up to date
- Development and deployment process > Increase utilization of build environments
- Development and deployment process > Use managed device farms for testing
Questions:
- SUS 1: How do you select Regions to support your sustainability goals?
- SUS 2: How do you take advantage of user behavior patterns to support your sustainability goals?
- SUS 3: How do you take advantage of software and architecture patterns to support your sustainability goals?
- SUS 4: How do you take advantage of data access and usage patterns to support your sustainability goals?
- SUS 5: How do your hardware management and usage practices support your sustainability goals?
- SUS 6: How do your development and deployment processes support your sustainability goals?
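For data patterns such as "Use lifecycle policies to delete unnecessary data", Amazon S3 lifecycle rules are one concrete mechanism. This sketch only builds the configuration dictionary; the rule shape follows the S3 lifecycle configuration format (`Rules`, `Filter`, `Status`, `Transitions`, `Expiration`). The function name, the prefix, the day counts, and the choice of the `GLACIER` storage class are illustrative assumptions; pick values that match your own retention requirements.

```python
def archive_then_expire_rule(prefix: str, archive_after_days: int, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration with one archive-then-expire rule."""
    if not 0 < archive_after_days < expire_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": f"archive-then-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                # Move to an archival storage class once the data is rarely read...
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # ...and delete it once it is no longer needed at all.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

The resulting dictionary could be passed as the `LifecycleConfiguration` argument to boto3's S3 `put_bucket_lifecycle_configuration` call, together with your bucket name.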
How Can I Apply AWS' Framework?
There's a lot of information to take in, without even going into the details of best practices. You're probably wondering what to do next!
Here's what we'd recommend:
- Complete AWS' Well-Architected training session, which is free and only takes 90 minutes.
- Form a project team including key technology and business stakeholders. Don't forget teams such as Compliance, Product Management, and Marketing, all of which could be directly affected by, or heavily reliant on, changes to your AWS setup. Pull together the team, scope, and objectives using a lightweight project management framework or template.
- Define the workload, i.e. the scope of your review. This could be as small as a static website, or a large, complex microservices architecture.
- Using the framework questions and best practices, pull together your technical team and conduct your own architectural assessment. This won't be a 5-minute job, so plan it well and be patient.
- Use the AWS Well-Architected Tool, available at no cost in the AWS Management Console, to carry out an initial review of your architecture and identify improvements.
- Enforcing hundreds of best practices manually is time-consuming and error-prone. Look for a tool that has built-in Well-Architected monitoring and automated remediation. You'll quickly see that these tools save you time and money and reduce risk.
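To illustrate what automated Well-Architected monitoring involves, here is a minimal sketch of the detection half: each check is tagged with the question it supports and run over a resource inventory. The inventory records, check functions, and question mapping are all hypothetical; a real tool would gather inventory from AWS APIs and typically add remediation actions on top.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    question: str     # Well-Architected question the check supports, e.g. "SEC 8"
    resource_id: str
    message: str


def unencrypted_volume(resource: dict) -> Optional[str]:
    """Flag storage volumes without encryption at rest."""
    if resource["type"] == "volume" and not resource.get("encrypted", False):
        return "volume is not encrypted at rest"
    return None


def unattached_volume(resource: dict) -> Optional[str]:
    """Flag volumes that are not attached to anything (candidates for decommissioning)."""
    if resource["type"] == "volume" and not resource.get("attached_to"):
        return "volume is not attached to any instance"
    return None


# Hypothetical mapping of checks to the questions they relate to.
CHECKS: list[tuple[str, Callable[[dict], Optional[str]]]] = [
    ("SEC 8", unencrypted_volume),
    ("COST 4", unattached_volume),
]


def run_checks(inventory: list[dict]) -> list[Finding]:
    """Run every check against every resource and collect the findings."""
    findings = []
    for resource in inventory:
        for question, check in CHECKS:
            message = check(resource)
            if message is not None:
                findings.append(Finding(question, resource["id"], message))
    return findings
```

Keeping the question label on each finding means results can be reported back against the same framework questions your team uses in its reviews.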
Hyperglance & AWS
Hyperglance gives you complete cloud management, so you can have confidence in your security posture and cost management, whilst providing you with enlightening, real-time architecture diagrams.
Monitor your cloud security & compliance, manage costs & reduce your bill, explore interactive diagrams & inventory, and utilize powerful built-in automation. Save time & money and get complete peace of mind.
Experience it all, for free, with a 14-day trial.
About The Author: David Gill
As Hyperglance's Chief Technology Officer, David looks after product development & maintenance, providing strategic direction for all things tech. Having been at the core of the Hyperglance team for over 10 years, AWS, Azure and cloud optimisation are at the heart of everything David does.