Mistakes are commonplace in any networking environment, especially when administering complex public cloud infrastructure with multiple engineers working asynchronously.
Too often, we get wrapped up in “deploying tool X” or “connecting to external network Y” and forget to check on our own network posture.
Let’s take a look at 3 common mistakes that could cost your organization millions of dollars.
Mistake 1: Assuming Security is Built-In
Now, this may seem obvious, but basic security practices are often neglected in favor of convenience or speed.
According to IBM’s Cost of a Data Breach Report 2020, the average cost of a data breach that year was $3.86 million.
Cryptojacking is another common attack, in which an attacker obtains credentials to your cloud account and spins up resources to mine cryptocurrency, potentially running up millions of dollars in compute charges.
Let’s go over a few basic security controls that your organization should have in place to prevent unauthorized access.
Restrict root access to 1-2 people:
The root account is the equivalent of a super admin in your cloud, so its username and password should be locked away safely. A disgruntled employee or hacker could use these credentials to wreak havoc in your cloud.
One option is to give the root password to one team member and pair the account’s multi-factor authentication (MFA) device with a second employee who cannot see the password. This effectively enforces a two-person rule for every root login (think nuclear submarine). CSPs provide account recovery procedures in case the password holder or the MFA holder leaves or changes roles.
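To confirm these controls stay in place, you can periodically check the account summary that IAM reports, which shows whether the root user has MFA enabled and whether it has access keys (it generally shouldn’t). The sketch below is a minimal example that assumes you have already fetched the `SummaryMap` with boto3’s `iam.get_account_summary()`; the function name and sample values are hypothetical.

```python
from typing import Dict, List


def root_account_findings(summary_map: Dict[str, int]) -> List[str]:
    """Return warnings about the root user, given IAM's account SummaryMap."""
    findings = []
    # AccountMFAEnabled is 1 when the root user has an MFA device, else 0.
    if summary_map.get("AccountMFAEnabled", 0) != 1:
        findings.append("Root user has no MFA device")
    # AccountAccessKeysPresent is non-zero when root access keys exist.
    if summary_map.get("AccountAccessKeysPresent", 0) != 0:
        findings.append("Root user has access keys; delete them")
    return findings


if __name__ == "__main__":
    # Hypothetical sample; in practice pass
    # boto3.client("iam").get_account_summary()["SummaryMap"].
    sample = {"AccountMFAEnabled": 0, "AccountAccessKeysPresent": 1}
    for finding in root_account_findings(sample):
        print(finding)
```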
Require MFA for every employee (especially root):
Some organizations avoid this because of the inconvenience of checking a code on a phone at every login.
MFA is still one of the most effective ways to prevent unauthorized access, so we’d recommend looking into physical MFA options such as YubiKey for a more convenient experience.
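One way to make MFA mandatory in AWS is an IAM policy that denies almost every action when the request was not authenticated with MFA. The sketch below builds such a policy as a Python dict, loosely following the deny-unless-MFA pattern in AWS’s IAM documentation. The short list of exempt actions (enough for users to enroll their own device) is an assumption and is narrower than AWS’s published example. The policy only denies: users still need separate policies that grant their normal permissions.

```python
import json
from typing import Dict, List

# Assumed minimal set of actions a user may call before enrolling MFA.
ALLOW_WITHOUT_MFA: List[str] = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:GetUser",
    "iam:ListMFADevices",
    "iam:ListVirtualMFADevices",
    "iam:ResyncMFADevice",
    "sts:GetSessionToken",
]


def require_mfa_policy() -> Dict:
    """Build an IAM policy that denies everything else unless MFA was used."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptListedIfNoMFA",
                "Effect": "Deny",
                "NotAction": ALLOW_WITHOUT_MFA,
                "Resource": "*",
                # BoolIfExists also matches when the key is absent, which is
                # the case for requests signed with long-term access keys.
                "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(require_mfa_policy(), indent=2))
```

You could attach the printed JSON as a managed policy to a group containing all human users, then test it with a user who has no MFA device before rolling it out widely.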
Enable and use API & activity logging:
In AWS, this is provided by CloudTrail, and the audit trail it records will be vital during incident investigations. At a minimum, each account should deliver all of its CloudTrail logs to its own S3 bucket with a storage lifecycle policy. Restrict access to the bucket and the trail to security personnel to prevent tampering or accidental deletion.
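For the CloudTrail bucket, a lifecycle rule can move older logs to cheaper storage and eventually expire them. This is a minimal sketch that assumes a 90-day archive point and a 365-day expiry (placeholders; use the periods your retention policy sets) and that the trail writes under the default `AWSLogs/` prefix, which changes if you configured a custom prefix. The returned dict has the shape boto3’s `put_bucket_lifecycle_configuration` expects for its `LifecycleConfiguration` argument.

```python
import json
from typing import Dict


def cloudtrail_lifecycle(archive_after_days: int = 90, expire_after_days: int = 365) -> Dict:
    """Build an S3 lifecycle configuration for a CloudTrail log bucket."""
    if expire_after_days <= archive_after_days:
        raise ValueError("logs must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": "cloudtrail-log-retention",
                # Default CloudTrail object prefix when no custom prefix is set.
                "Filter": {"Prefix": "AWSLogs/"},
                "Status": "Enabled",
                # Move logs to archival storage after the archive period...
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # ...and delete them once the retention period ends.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(cloudtrail_lifecycle(), indent=2))
```

To apply it, you might call `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-cloudtrail-logs", LifecycleConfiguration=cloudtrail_lifecycle())`, where the bucket name is hypothetical.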
For environments with higher security requirements, consider centralizing CloudTrail logs in a separate “logging” account. This account could also host a security information and event management (SIEM) tool. Always establish a data retention and archival policy for logs in the SIEM, as costs can quickly skyrocket if the data is left unchecked.
Use leased roles and integrate authentication with your identity provider (IdP):
A key benefit of IdP integration is that you won’t have to worry about employees retaining access to your cloud after they’ve left the company. Take advantage of your SSO/LDAP/AD to simplify access provisioning. Better yet, use leased roles for all day-to-day activities: instead of logging in as an IAM user whose access keys stay valid until someone rotates them, each person is granted temporary credentials that expire after a short period, such as 2 hours.
The method will vary depending on your CSP and identity provider/SSO. Okta is one of the simplest to set up, and there are workarounds for integrating with other providers such as G Suite. In most cases, a quick search will turn up the steps you need to take.
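The 2-hour session described above corresponds to the `DurationSeconds` parameter of an STS role assumption. The sketch below only builds and validates the request arguments, assuming boto3’s `sts.assume_role`; with SAML-based SSO the equivalent call is `assume_role_with_saml`, which accepts the same parameter. The role ARN and session name are hypothetical.

```python
from typing import Dict, Union

# STS accepts DurationSeconds from 900 (15 minutes) up to the role's
# MaxSessionDuration, which can be set as high as 43200 (12 hours).
MIN_SESSION_SECONDS = 900
MAX_SESSION_SECONDS = 43200


def assume_role_request(
    role_arn: str, session_name: str, hours: float = 2.0
) -> Dict[str, Union[str, int]]:
    """Build keyword arguments for an STS AssumeRole call with a short session."""
    seconds = int(hours * 3600)
    if not MIN_SESSION_SECONDS <= seconds <= MAX_SESSION_SECONDS:
        raise ValueError(f"session length {seconds}s is outside STS limits")
    return {
        "RoleArn": role_arn,
        # Using the person's IdP username makes CloudTrail entries traceable.
        "RoleSessionName": session_name,
        "DurationSeconds": seconds,
    }


if __name__ == "__main__":
    # Hypothetical role ARN and user name.
    request = assume_role_request("arn:aws:iam::123456789012:role/EngineerReadOnly", "jane.doe")
    print(request)
    # With boto3 installed and credentials configured you would then call:
    # boto3.client("sts").assume_role(**request)
```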
Follow least-privilege principles for users and networks:
Even well-intentioned users can damage your production networks if they are overprivileged and left to their own devices. Expect non-networking professionals to assign public IPs, create lax security groups, and spin up resources with little consideration for cost or security. Limit IAM privileges to administrators so users can’t bypass authentication controls.
Least privilege should also apply to your networks. Always restrict ingress, and restrict egress too! Make full use of both your stateless (network ACL) and stateful (security group) firewalls. When possible, put a load balancer with a web application firewall (WAF) in front of your web servers, route outbound traffic through a NAT gateway, and keep the servers themselves safely in private subnets.
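A simple audit for overly permissive ingress can be run against the security group data AWS returns. The sketch below assumes the dict shape of boto3’s `ec2.describe_security_groups()["SecurityGroups"]` and flags rules open to `0.0.0.0/0` or `::/0` that either allow all traffic or cover a short, hypothetical list of sensitive ports. It only inspects ingress; egress rules (`IpPermissionsEgress`) and network ACLs would need similar checks.

```python
from typing import Dict, List

# Hypothetical list of ports that should rarely be reachable from the internet.
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP", 3306: "MySQL", 5432: "PostgreSQL"}
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}
# Protocols for which FromPort/ToPort are real port numbers.
PORT_PROTOCOLS = {"tcp", "udp", "6", "17"}


def open_ingress_findings(security_groups: List[Dict]) -> List[str]:
    """Flag ingress rules exposing all traffic or sensitive ports to the internet."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
            if not cidrs & WORLD_CIDRS:
                continue  # rule is limited to specific addresses
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                # "-1" means all protocols on all ports.
                findings.append(f"{sg['GroupId']}: all traffic open to the internet")
                continue
            if protocol not in PORT_PROTOCOLS:
                continue  # e.g. ICMP, where the "ports" are type/code values
            low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
            for port, name in SENSITIVE_PORTS.items():
                if low <= port <= high:
                    findings.append(f"{sg['GroupId']}: {name} ({port}) open to the internet")
    return findings
```

Port 443 open to the world is deliberately not flagged here, since a public load balancer legitimately needs it.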
Hyperglance is particularly effective at quickly identifying unsecured resources. It has over 200 built-in rules that will identify security holes and compliance failings in real-time.
Mistake 2: Underutilized Resources & Data
Underutilized or unnecessary cloud resources can easily drive up costs.
As your organization grows, the overhead tied to underutilized resources will also grow if you’re not careful.
Let’s look at a few common scenarios.
Overprovisioned instances:
Rarely do engineers deploy instances to the exact specifications required by the apps running on them. There may be opportunities to downsize instances and clusters where CPU, memory, and storage aren’t being fully utilized.
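A first pass at spotting downsizing candidates can be as simple as comparing utilization samples against thresholds. This sketch assumes you have already collected CPU utilization percentages per instance (for example from CloudWatch’s `CPUUtilization` metric); the 20% average and 50% peak thresholds are hypothetical starting points. Memory is left out because EC2 does not publish memory metrics unless the CloudWatch agent is installed.

```python
from statistics import mean
from typing import Dict, List


def downsize_candidates(
    cpu_samples: Dict[str, List[float]],
    avg_threshold: float = 20.0,
    peak_threshold: float = 50.0,
) -> List[str]:
    """Return instance IDs whose CPU stays well below capacity."""
    candidates = []
    for instance_id, samples in cpu_samples.items():
        if not samples:
            continue  # no data, so no recommendation
        # Require both a low average and a low peak so bursty workloads are kept.
        if mean(samples) < avg_threshold and max(samples) < peak_threshold:
            candidates.append(instance_id)
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical hourly CPU utilization (%) samples per instance.
    samples = {"i-0aaa": [5, 10, 12], "i-0bbb": [30, 70, 40], "i-0ccc": [15, 55, 10]}
    print(downsize_candidates(samples))
```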
Hyperglance’s cost dashboard conveniently identifies these resources for you as shown below:
Sandbox & test instances left running:
Every cloud needs a sandbox environment, usually in a separate account, to help you distinguish what’s “live” in production from what is still being deployed and configured.
Busy engineers may forget to spin a sandbox resource down, leaving potentially costly assets running in limbo.
Best practice is to build automation to ensure your sandbox resources aren’t accumulating.
Hyperglance’s rules engine, paired with its built-in automation, enables you to filter resources by environment and create a cleanup job in minutes!
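If you build your own cleanup automation, the core is a filter over your instance inventory. The sketch below assumes the dict shape returned by boto3’s `ec2.describe_instances()["Reservations"]` and a hypothetical tagging convention of `Environment=sandbox`; it returns running sandbox instances older than a configurable age (7 days by default, also an assumption).

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_sandbox_instances(
    reservations: List[Dict], max_age_days: int = 7, now: Optional[datetime] = None
) -> List[str]:
    """Return IDs of running sandbox instances older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            # Hypothetical tagging convention: Environment=sandbox.
            if tags.get("Environment") != "sandbox":
                continue
            if instance["State"]["Name"] != "running":
                continue
            # boto3 returns LaunchTime as a timezone-aware datetime.
            if instance["LaunchTime"] < cutoff:
                stale.append(instance["InstanceId"])
    return stale
```

The resulting IDs could be passed to `ec2.stop_instances(InstanceIds=...)` on a schedule; stopping rather than terminating gives the owner a chance to object. For accounts with many instances, use the `describe_instances` paginator to collect every page of reservations.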
Storing data without lifecycle policies:
The cost of storing cloud data varies depending on use case and accessibility. Data costs can skyrocket when tied to other systems such as a security information and event management (SIEM) tool, usually built on Elasticsearch or Splunk.
Create a data retention policy. Your organization’s cybersecurity team or regulatory compliance guidance should tell your engineers which types of data (snapshots, backups, logs) need to be retained and for how long.
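Once the policy exists, enforcing it is mostly a date comparison per data type. This minimal sketch uses hypothetical records (`id`, `type`, `created`) and hypothetical retention periods; anything whose type the policy doesn’t cover is kept rather than deleted, so a gap in the policy never causes data loss.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical retention periods in days; your security or compliance team
# sets the real values.
RETENTION_DAYS: Dict[str, int] = {"snapshot": 30, "backup": 365, "log": 90}


def past_retention(items: List[Dict], now: datetime) -> List[str]:
    """Return IDs of items older than the retention period for their data type."""
    expired = []
    for item in items:
        days = RETENTION_DAYS.get(item["type"])
        if days is None:
            # Not covered by the policy: keep it and let a human decide.
            continue
        if now - item["created"] > timedelta(days=days):
            expired.append(item["id"])
    return expired
```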
Mistake 3: Overusing On-Demand Pricing
Most cloud professionals know that on-demand pricing is costly, yet many hesitate to move away from it, often because of unpredictable environment changes or growth.
There are a few ways to tackle this issue without leaving money on the table.
Understand the cost differentials:
Research the different options available from your CSP. AWS, for example, offers 1- and 3-year reservations with No Upfront, Partial Upfront, and All Upfront payment options, in addition to Spot Instances.
AWS Reserved Instances can save you up to 72% compared to On-Demand pricing, and Spot Instances can save up to 90%!
Conduct quarterly cloud workload reviews:
Once every three months, take some time with the engineering team to go through your cloud workloads, focusing on the high-cost resources first.
While reviewing, ask “Do we need this?” and, if so, “Will this requirement still exist in 1-3 years?” If the answer is yes, note the instance type and expected duration.
Add the on-demand and reservation pricing for each item, and by the end of this exercise you’ll have a list to present to your CFO that makes a strong case for upfront reservation pricing.
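The list from this review boils down to a small calculation per workload. The sketch below is a hypothetical worksheet: it skips anything the team no longer needs, annualizes on-demand and 1-year reserved hourly rates over 8,760 hours, and sorts by savings. The `Workload` fields and the rates in the example are assumptions; pull real rates from your CSP’s pricing pages.

```python
from dataclasses import dataclass
from typing import List, Tuple

HOURS_PER_YEAR = 24 * 365  # 8,760


@dataclass
class Workload:
    name: str
    instance_type: str
    on_demand_hourly: float  # USD per hour
    reserved_hourly: float  # effective USD per hour of a 1-year reservation
    still_needed_years: int  # 0 if the answer to "Do we need this?" is no


def reservation_case(workloads: List[Workload]) -> List[Tuple[str, str, float, float, float]]:
    """Return (name, type, annual on-demand, annual reserved, annual savings), largest savings first."""
    rows = []
    for w in workloads:
        if w.still_needed_years < 1:
            continue  # not worth committing to a reservation
        on_demand = w.on_demand_hourly * HOURS_PER_YEAR
        reserved = w.reserved_hourly * HOURS_PER_YEAR
        rows.append((w.name, w.instance_type, round(on_demand, 2),
                     round(reserved, 2), round(on_demand - reserved, 2)))
    return sorted(rows, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    # Hypothetical workloads and reserved rates.
    for row in reservation_case([
        Workload("api", "t3.xlarge", 0.1664, 0.10, 3),
        Workload("old-batch", "m5.large", 0.096, 0.06, 0),
    ]):
        print(row)
```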
Hyperglance’s interactive diagram, inventory list, and cost explorer are particularly useful for this exercise, enabling you to see costs for different environments right away and drill down as needed.
Consider working with a reseller:
If your team is small or overburdened, it may make sense to pass off the management of instance reservations to a reseller.
Resellers purchase instance reservations on your behalf and then apply them across multiple customers; the more customers they have, the lower the risk.
Typically you’ll see pass-through savings of 4-15%, with the added benefit of not having to manage your own instance reservations.
Find the right balance:
Perhaps you’re working with Kubernetes clusters or EC2 Auto Scaling groups as well as standalone EC2 instances.
It’s important to find the right balance of on-demand, reservation, and spot options for each environment.
For example, suppose you’re tasked with saving money on a cluster that has three t3.xlarge master nodes ($0.1664 per hour on demand) and at least three t3.xlarge worker nodes at all times, scaling up to 10 worker nodes as needed.
The cluster is also expected to remain in use for the next 12 months. The most cost-effective approach would be to buy 1-year reservations for the three master nodes and the three always-on worker nodes, saving about 41% on each, and then configure the cluster to prefer spot instances ($0.0499 per hour) when scaling, falling back to on-demand if spot capacity isn’t available.
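The arithmetic behind this recommendation can be checked with a few lines. The sketch uses the figures quoted above ($0.1664 per hour on demand, a 41% reservation discount, and $0.0499 per hour for spot) and assumes every extra worker runs on spot; in practice spot prices fluctuate and any on-demand fallback raises the cost.

```python
ON_DEMAND_HOURLY = 0.1664  # t3.xlarge on-demand price from the example
RESERVED_DISCOUNT = 0.41  # 1-year reservation saving from the example
SPOT_HOURLY = 0.0499  # spot price from the example; real spot prices vary
BASELINE_NODES = 6  # 3 masters + 3 always-on workers, all reserved
MAX_EXTRA_WORKERS = 7  # scaling from 3 up to 10 workers


def _check(extra_workers: int) -> None:
    if not 0 <= extra_workers <= MAX_EXTRA_WORKERS:
        raise ValueError(f"extra_workers must be 0..{MAX_EXTRA_WORKERS}")


def hourly_all_on_demand(extra_workers: int) -> float:
    """Hourly cost if every node is billed at the on-demand rate."""
    _check(extra_workers)
    return (BASELINE_NODES + extra_workers) * ON_DEMAND_HOURLY


def hourly_mixed(extra_workers: int) -> float:
    """Hourly cost with reserved baseline nodes and spot scaling workers."""
    _check(extra_workers)
    reserved = BASELINE_NODES * ON_DEMAND_HOURLY * (1 - RESERVED_DISCOUNT)
    return reserved + extra_workers * SPOT_HOURLY


if __name__ == "__main__":
    for extra in (0, MAX_EXTRA_WORKERS):
        od, mixed = hourly_all_on_demand(extra), hourly_mixed(extra)
        print(f"{BASELINE_NODES + extra} nodes: on-demand ${od:.4f}/h, "
              f"mixed ${mixed:.4f}/h, saving {1 - mixed / od:.0%}")
```

At the baseline of six nodes the mixed approach costs 41% less per hour. At the full 13 nodes it brings the hourly rate from about $2.16 down to about $0.94, roughly 57% less, as long as spot capacity is available. Keep in mind that reservations are billed whether or not the nodes are running.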
Hyperglance - Cloud Management You Control
Hyperglance gives you complete cloud management enabling you to have confidence in your security posture and cost management whilst providing you with enlightening, real-time architecture diagrams.
Monitor security & compliance, manage costs & reduce your bill, interactive diagrams & inventory, built-in automation. Save time & money and get complete peace of mind.
Experience it all, for free, with a 14-day trial.