FinOps: The Art of Avoiding Bankruptcy in the Cloud

In 2021 a small start-up began preliminary testing of its newest product on Google Cloud Platform (GCP). Within moments the team received an email informing them they had exceeded their initial $7 budget limit. Not initially alarmed, they checked the dashboard and found the bill already at around the $5,000 mark. By the time they had shut down the experiment just two hours later, the badly written code had driven costs to nearly $72,000.

This story ends well: Google cancelled the bill as a one-time gesture. However, the internet is littered with tales of developers and tech companies stung by Graphics Processing Unit (GPU) instances or Machine Learning (ML) notebooks left running, generically named storage buckets, or lax security protocols. Distributed Denial of Service (DDoS) attacks – millions of API calls in just a few moments – can lead to eye-watering bills of thousands of dollars or more if no protections are in place. Even the dutiful and careful can end up genuinely using resources that incur hefty costs: Netflix’s estimated AWS bill in 2019 was $9.6 million per month, and the company was reportedly expecting to pay a billion dollars across 2023.

With sums like these at play it was inevitable that companies would try to find ways of reducing cost and limiting spend. A recent report found that 80% of those surveyed acknowledged poor financial management of cloud costs had a negative impact on their business, and estimated that up to 32% of cloud spend was wasted. However, adding limits to cloud computing is difficult: most providers will let you rack up costs ad infinitum with little warning, expecting you to create your own alerts and protections. After all, the point of cloud-based services is to scale to demand, so you wouldn’t want them limiting your services just when your customers are using them!

The Rise of FinOps

In the past couple of years the word FinOps has become widely recognised. Depending on who you ask, it is a portmanteau of Financial and Operations, or of Finance and another portmanteau, DevOps (a portmanteau-teau?). It describes the process of streamlining your operational processes from a cost perspective. This can start as simply as setting alerts to notify you when your predicted spend is set to exceed pre-set limits, and extend to managing your resources based on time and predicted use. But there are pitfalls to watch for and balances to strike at every turn.

Early Warning Systems

Every sensible cloud-based setup should have at least basic alerts on its accounts. These can be set broadly against a single account or group of accounts, comparing the current bill to previous spend over a set timeframe (week, month, year, etc.). They can also extend to a more granular level, focussing on a particular service or even an individual instance. While more information is always useful, the cost in development time and resources of creating warnings for every item can negate any benefit, and alerts will not save you the spend already incurred – in the opening example, that start-up did receive warnings, but the damage was done by the time they began to act. And what if the warnings arrive when no one is on call to deal with them? If an alarm rings in the middle of a wood with no one around, does it even make a sound?
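As a minimal sketch – the thresholds and budget figures here are illustrative assumptions, not any provider's defaults – a basic budget alert boils down to comparing current spend against fractions of a pre-set budget:

```python
from dataclasses import dataclass


@dataclass
class BudgetAlert:
    """Compare current spend against thresholds derived from a budget."""
    budget: float                          # period budget in billing currency
    thresholds: tuple = (0.5, 0.9, 1.0)    # alert at 50%, 90% and 100%

    def triggered(self, current_spend: float) -> list:
        """Return the threshold fractions the current spend has crossed."""
        return [t for t in self.thresholds if current_spend >= t * self.budget]


alert = BudgetAlert(budget=1000.0)
print(alert.triggered(950.0))   # → [0.5, 0.9]: past 50% and 90%, not yet 100%
```

Real providers offer exactly this shape of configuration in their billing consoles; the point is that each crossed threshold should page someone who can act, not just send an email.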

Elastic Fantastic

The great appeal of working in the cloud is that you can start with a small pool of resources, add more as your needs grow, then contract again when demand drops. This elasticity is fundamental to using cloud services effectively to meet your users’ requirements. But if your offering’s popularity grows exponentially, or if you haven’t configured your scaling settings correctly, those costs can quickly spiral. We want our projects to be successful and would hope our income grows at the same rate, but that may not always be the case; popularity is often driven by offering customers a cheap or free deal. So how can we strike a balance between being popular and risking bankruptcy?

  • The first option is to set a hard limit on our scaling, halting the addition of resources at a ceiling. This will obviously have the desired effect of keeping costs down – but it will stop new customers from using your services, and can degrade current users’ experience too.
  • We could instead step-scale our resources, adding capacity in larger, deliberate increments, giving us time to confirm we are using the services we already have effectively and to introduce efficiencies here or elsewhere.
  • Finally, we can use our FinOps nous to be as efficient as possible, watching our configurations and metrics to keep everything as financially lean as we can.
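The first two options can be sketched together – a hypothetical step-scaler with a hard ceiling, where the step size, target load and ceiling are assumed numbers for illustration, not recommendations:

```python
def next_capacity(current: int, avg_load: float,
                  target_load: float = 0.7,
                  step: int = 4, ceiling: int = 32) -> int:
    """Step-scaling with a hard limit: when average load per instance
    exceeds the target, add a whole block of instances at once, but
    never grow past the ceiling that caps our maximum spend."""
    if avg_load > target_load:
        return min(current + step, ceiling)
    return current


print(next_capacity(8, avg_load=0.9))    # → 12: one step of 4 added
print(next_capacity(30, avg_load=0.9))   # → 32: capped at the ceiling
```

The ceiling keeps the worst-case bill predictable; the coarse step gives the team breathing room between scaling events to check whether the existing fleet is actually being used efficiently.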

Are you Developer Experienced (DevX)?

DevX (or DX if you’re feeling lazy) plays a huge part in a successful tech business’s progress – happy developers who feel at ease with their tooling, valued, and invested in their work iterate quickly and efficiently and produce better code. But great DX comes at a cost. Ephemeral environments that any developer can spin up to test an idea? They’re expensive. Effective guardrails and rollbacks to ensure experiments are safe? They get costly too, and can actively hinder a developer’s output. Balance is key, and FinOps should be used as a scalpel making precise incisions, not an axe lopping off major limbs.

A great example is switching off developer-only resources outside office hours. This can be an effective way of creating savings, as virtual machines lying dormant in the middle of the night are effectively dead weight. But what if your employees work unusual hours? Or an architect has a crazy idea in the early hours that might be an absolute game-changer, and wants to test it before they forget? And the initial boot-up of a VM itself adds cost every morning, and wastes developer time on basic setup tasks. Another option is to keep resources at the lowest viable tier, selecting the smallest CPUs that can do the job for your environments. Be aware that this can frustrate a high-performing developer and hinder their output, with unnecessary time spent waiting for code to run or webpages to render.
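A sketch of that shutdown decision, with an escape hatch for out-of-hours experiments – the `always-on` tag name and the office hours are assumptions invented for illustration:

```python
from datetime import datetime


def should_run(now: datetime, tags: frozenset = frozenset(),
               office_hours: tuple = (8, 18)) -> bool:
    """Decide whether a developer-only VM should be running.
    Machines tagged 'always-on' are exempt from the overnight
    shutdown, so unusual working hours aren't blocked outright."""
    if "always-on" in tags:
        return True
    start, end = office_hours
    return start <= now.hour < end


print(should_run(datetime(2024, 5, 1, 3)))                            # → False
print(should_run(datetime(2024, 5, 1, 3), frozenset({"always-on"})))  # → True
```

The point of the tag is that the override is cheap and self-service: a developer who needs their machine at 3 a.m. flips one label rather than filing a ticket, which keeps the saving from becoming a hindrance.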

Enforcing thousands of unit tests on every CI/CD pipeline run will ensure code is safe and free from side effects, but may end up costing developer hours as people wait for them to complete on every push.

A good work environment needs to cater for these eventualities, without creating mechanisms to override them that are so complex and costly they negate the original saving. 

The advent of Platform Engineering along with standard DevOps has created a focus on good developer experience, and we should limit hindrance of that endeavour as much as possible. In these scenarios FinOps practitioners need to look at the problem holistically, rather than focussing on the bottom line.

Deep Frozen Storage

Keeping data safe, durable and accessible is a permanent consideration for any company, not just those in tech. As your business grows, so will your data and its associated storage costs; some data may grow stale quickly and/or need updating, but government regulation or other obligations may necessitate keeping even the most dried-out of information. FinOps really comes into its own here, defining rules for which storage tier data should live in, plus automations to move it around – or delete it, if genuinely unneeded – without day-to-day human oversight. The online design tool Canva has used a careful selection of storage classes (varying from active and instantly available all the way to ‘deep freeze’ archives) and an effective data lifecycle to save an estimated $3 million every year, so the potential for heavy savings is enormous.
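A lifecycle rule of this kind can be sketched as a simple tiering function – the tier names and cut-offs below are illustrative assumptions, not any provider's actual storage classes or pricing:

```python
def storage_tier(age_days: int, days_since_access: int) -> str:
    """Pick a storage tier from an object's age and access recency."""
    if days_since_access <= 30:
        return "standard"       # hot: recently read, keep instantly available
    if age_days <= 365:
        return "infrequent"     # cooler: cheaper per GB, pricier reads
    return "deep-archive"       # 'deep freeze': cheapest, retrieval takes hours


print(storage_tier(age_days=10, days_since_access=5))     # → standard
print(storage_tier(age_days=400, days_since_access=90))   # → deep-archive
```

In practice you would express these rules declaratively in your provider's lifecycle configuration rather than in application code, but the decision logic is the same: access patterns and age drive the tier, and the tier drives the cost.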

There aren’t many downsides to this approach beyond the initial cost of setting it up, but pay attention to access patterns and any regulatory requirements to ensure you don’t delete something important or lose quick access to data you might need urgently.

Protection Racket

Protecting your cloud presence from attack and error can be incredibly expensive – but the costs incurred from outage, outside attack or compromise can be devastating. Asking yourself which safety and security measures are essential, which are merely desirable, and what the implications of going without them would be should be the first and most important step towards a financially balanced security approach.

  • Any personal data, be it of users or employees, should have the highest levels of protection available, whereas test data can be afforded only mild security, if any. 
  • Is encryption essential on all your data, and if so, does it warrant regularly rotated encryption keys, or will a simpler, cheaper level of protection do?
  • Do we need to back up every database we own, or can we apply different levels of backup – snapshots versus full redundancy – according to how vital the data is?
  • Will enforcing two-factor authentication across all staff be seen as a barrier to development or too costly to set up – and how does that compare with the cost of a huge data breach because a single account was compromised?
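One way to make trade-offs like the backup question explicit is to write the tiered policy down as data – the tiers, strategies and frequencies below are purely hypothetical, for illustration:

```python
# Hypothetical mapping from data criticality to backup approach.
BACKUP_POLICY = {
    "critical":  {"strategy": "full-redundancy", "frequency_hours": 1},
    "important": {"strategy": "snapshot",        "frequency_hours": 24},
    "test":      {"strategy": "none",            "frequency_hours": None},
}


def backup_plan(criticality: str) -> dict:
    """Look up the agreed backup strategy for a class of data,
    defaulting to the strictest tier for anything unclassified."""
    return BACKUP_POLICY.get(criticality, BACKUP_POLICY["critical"])


print(backup_plan("test")["strategy"])      # → none
print(backup_plan("unknown")["strategy"])   # → full-redundancy
```

Defaulting unclassified data to the strictest tier is the deliberately conservative choice: the only way to pay less is to consciously classify the data as less vital.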

Good security can feel like a sunk cost, as most of the time its presence isn’t felt and the effect is business as usual – but the temptation to loosen security levels to save money should be considered very carefully.

It’s (not) all about the money, honey

The FinOps Foundation – an offshoot of The Linux Foundation – defines FinOps as both an operational framework and a cultural practice, and likens it to security in the cloud: “everyone’s responsibility”, but one that can be overseen by a central team when appropriate. By gaining early buy-in across our organisations, from developers all the way up to CEOs, we can make FinOps a consideration in all development practices, customer-facing and non-customer-facing alike. Looking at the areas mentioned in this article, along with others, can help protect against unwanted charges and spiralling costs. However, there will be many other domains worth considering in your FinOps journey.

Cloud Computing is enabling innovation and inclusivity in all areas of technology and when handled effectively and successfully, FinOps can help ensure longevity and prosperity.

For more insights from our fantastic tech leads, head over to our Insights page.
