How the Home Office’s Immigration Technology department reduced its cloud costs by 40% – GOV.UK

The Home Office uses the cloud as its primary hosting platform and is one of the biggest cloud users in UK government. Its Immigration Technology department has over 500 developers working on immigration projects.

Home Office senior leaders wanted Immigration Technology to reduce its cloud spending, so over the last year its platform team investigated ways to cut costs.

Immigration Technology reduced its overall cloud costs by 40% by using a variety of optimisation techniques across storage, usage and resources. The team is confident that, as it continues to experiment with these techniques, it can save at least another 20%.

The team looked at how developers were working and found that, because of their focus on delivery, they were not using cloud resources as efficiently as they could. To meet mission-critical deadlines, developers were using more expensive on-demand services and not planning resource usage. Because most cloud providers charge for cloud resources by the second, this led to unexpectedly expensive cloud bills, sometimes called bill shock.

Immigration Technology researched industry practices to help reduce cloud costs and discovered 7 strategies aimed at increasing efficiency.

From its research, Immigration Technology found that using excess capacity in the cloud was the best way to reduce its overall cloud computing bill. This is sometimes referred to as using Spot Instances, Low-Priority VMs, Transient Servers or Preemptible VMs.

Some cloud providers sell their spare, unused compute capacity at discounted rates. However, excess capacity can be very volatile: the provider can reclaim the resources at any time if another customer wants them, sometimes with only a few minutes' warning. Immigration Technology used a provider that offered a bidding system to get the lowest price.

To mitigate the risk of losing vital compute capacity, Immigration Technology does not use excess capacity to power important production services that must always be available. But it is ideal for non-production services and temporary jobs, and it currently makes up about 30% of the compute powering the department's non-production containerised clusters.

Potential cost saving: the department paid approximately 80% less for cloud resources by using excess capacity services.
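The placement rule described above can be sketched as a small function. This is a minimal illustration, not the department's tooling: the workload names are hypothetical, and the blended-saving calculation assumes the roughly 80% discount applies evenly to all excess-capacity compute. Under that assumption, running about 30% of non-production compute on excess capacity would cut that cluster's compute bill by roughly 0.30 × 0.80 = 24%.

```python
from dataclasses import dataclass

# Figures from the text; applying the 80% discount uniformly is an assumption.
EXCESS_CAPACITY_DISCOUNT = 0.80  # "approximately 80% less"
EXCESS_CAPACITY_SHARE = 0.30     # about 30% of non-production cluster compute


@dataclass
class Workload:
    name: str
    must_be_available: bool  # True for important production services


def capacity_type(workload: Workload) -> str:
    """Keep must-be-available services off reclaimable excess capacity."""
    return "on-demand" if workload.must_be_available else "excess-capacity"


def blended_saving(share: float, discount: float) -> float:
    """Fractional saving on a cluster's compute bill when `share` of its
    compute runs on capacity discounted by `discount` and the rest is on-demand."""
    return share * discount


# Hypothetical workloads for illustration.
for w in [Workload("visa-application-api", True), Workload("nightly-test-run", False)]:
    print(w.name, capacity_type(w))
print(f"{blended_saving(EXCESS_CAPACITY_SHARE, EXCESS_CAPACITY_DISCOUNT):.0%}")  # → 24%
```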

Because most cloud providers charge for cloud services by the second, Immigration Technology experimented with scheduling compute and storage resources so that they run only when needed. This approach provided big savings.

The team started by identifying which services they could turn on and off easily. For example, most development and testing systems were not used overnight or at weekends, so these were automatically turned off for 12 hours every weeknight and 24 hours a day at weekends.
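A scheduler needs only a rule for when an environment should be running. The sketch below assumes a hypothetical 08:00 to 20:00 weekday window (off for 12 hours each weeknight and all day on Saturday and Sunday) and counts switched-off hours across one week. It estimates compute hours only; storage may still be billed while compute is off. Under these assumptions compute runs for 60 of 168 hours, which is consistent with the reported saving of over 60%.

```python
from datetime import datetime, timedelta

# Hypothetical working window: on 08:00-20:00 Monday to Friday, off otherwise.
START_HOUR = 8
END_HOUR = 20


def should_be_running(now: datetime) -> bool:
    """Return True if a development or test environment should be switched on."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and START_HOUR <= now.hour < END_HOUR


def weekly_off_fraction(start: datetime) -> float:
    """Fraction of the 168 hours from `start` during which the environment is off."""
    hours = [start + timedelta(hours=h) for h in range(7 * 24)]
    off = sum(1 for t in hours if not should_be_running(t))
    return off / len(hours)


# Any full week gives the same result; 1 January 2024 was a Monday.
print(f"{weekly_off_fraction(datetime(2024, 1, 1)):.0%}")  # → 64%
```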

Potential cost saving: Immigration Technology reduced the cost of these services by over 60% by turning them off overnight and at weekends, without affecting user functionality.

Most cloud providers offer an autoscaling product. Immigration Technology encouraged teams to use autoscaling, but not all developers were doing so, so it incorporated an autoscaling component into its standard build templates. This means every standard build in the department now includes autoscaling automatically.

For example, the team running the Access UK online visa application service reduced its production footprint from 20 containers to 2, which scale up only when demand increases.
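The text does not say which autoscaling product the department used. As an illustration only, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)), with a floor of 2 containers as in the Access UK example. The 20-container ceiling and the 60% CPU target are assumptions.

```python
MIN_REPLICAS = 2           # baseline from the Access UK example
MAX_REPLICAS = 20          # assumed ceiling: the previous fixed container count
TARGET_CPU_PERCENT = 60    # hypothetical target average CPU utilisation


def desired_replicas(current: int, avg_cpu_percent: int) -> int:
    """Proportional scaling rule: move average utilisation towards the target,
    keeping the replica count within the configured bounds."""
    # Integer ceiling division avoids floating-point rounding surprises.
    wanted = -(-(current * avg_cpu_percent) // TARGET_CPU_PERCENT)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))


print(desired_replicas(2, 30))    # quiet period → 2
print(desired_replicas(2, 90))    # demand rising → 3
print(desired_replicas(10, 95))   # busy period → 16
print(desired_replicas(20, 100))  # capped → 20
```

Production autoscalers typically add a tolerance band and a stabilisation window so that replica counts do not flap; this sketch omits both.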

Potential cost savings: by autoscaling all standard services, Immigration Technology pays only for the compute it needs at the time.

After analysing its cloud estate, Immigration Technology discovered that environments did not always have the appropriate compute and storage resources.

For example, the team monitored some of their compute instances over a 2-week period and found that, on average, they were using only 10% to 20% of the resources they were paying for. Immigration Technology wanted to raise its compute utilisation to between 60% and 80% to get better value for money.
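Once monitoring data is available, rightsizing comes down to a simple calculation. The sketch below is a minimal illustration that assumes CPU is the only constraint and that demand carries over unchanged to a new size. The vCPU catalogue is hypothetical. It picks the smallest size whose projected utilisation stays at or below 80%, the upper bound of the department's target range.

```python
# Hypothetical catalogue of instance sizes (vCPUs), smallest first.
SIZES_VCPU = [1, 2, 4, 8, 16, 32]
UPPER_BOUND = 0.80  # top of the 60% to 80% utilisation goal


def rightsize(current_vcpu: int, avg_utilisation: float) -> int:
    """Return the smallest size whose projected utilisation does not exceed
    UPPER_BOUND, given the average utilisation observed on current_vcpu."""
    used_vcpu = current_vcpu * avg_utilisation
    for size in SIZES_VCPU:
        if used_vcpu / size <= UPPER_BOUND:
            return size
    return SIZES_VCPU[-1]  # demand exceeds the catalogue; keep the largest size


# A 16-vCPU instance averaging 15% over the monitoring period.
new_size = rightsize(16, 0.15)
print(new_size, f"{16 * 0.15 / new_size:.0%}")  # → 4 60%
```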

This was because development, pre-production and production environments were often set up identically to simplify development. As a result, lightly used development servers could be running on the same full-size infrastructure as a production service.

To maximise utilisation, Immigration Technology created standard build templates that size services appropriately for each environment, a strategy called rightsizing.

Immigration Technology created a small build template for development purposes, which does not include high availability options and keeps costs down. Immigration Technology uses a more expensive large build for production services, which need high availability options.

When looking at databases, Immigration Technology found many developers were using expensive default options they did not need, such as enhanced monitoring. These options were not required for a development database and could increase the service cost considerably, so the platform teams agreed to use smaller database services in development.
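The small and large build templates could be expressed as data that a build pipeline looks up by environment. The field names, database size labels and replica counts below are hypothetical. The sketch mirrors only the differences the text describes: development gets neither high availability nor enhanced database monitoring, production gets both, and every template carries autoscaling bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildTemplate:
    name: str
    high_availability: bool        # e.g. spread across multiple zones
    database_size: str             # hypothetical size label
    enhanced_db_monitoring: bool   # costly option not needed in development
    min_replicas: int              # autoscaling lower bound
    max_replicas: int              # autoscaling upper bound


SMALL = BuildTemplate("small", high_availability=False, database_size="small",
                      enhanced_db_monitoring=False, min_replicas=1, max_replicas=2)
LARGE = BuildTemplate("large", high_availability=True, database_size="large",
                      enhanced_db_monitoring=True, min_replicas=2, max_replicas=20)

# Only the two environments the text describes; others would need a decision.
TEMPLATES = {"development": SMALL, "production": LARGE}


def template_for(environment: str) -> BuildTemplate:
    """Look up the standard build template for an environment."""
    try:
        return TEMPLATES[environment]
    except KeyError:
        raise ValueError(f"no standard template for {environment!r}") from None


print(template_for("development"))
```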

Potential cost savings: By rightsizing services, Immigration Technology kept costs low in non-production and other low usage environments.

In some cases, the platforms team found the best way to reduce costs was to re-architect the service to make it cloud native. For example, Immigration Technology found that folding multiple databases into one managed database instance helped to reduce costs considerably. They also moved applications into containers to reduce the resource footprint of the services.

Immigration Technology also made another big change to reduce its compute footprint: the team moved some services that performed relatively simple tasks to serverless technology. They also replaced commercial middleware tools with simpler functions, such as using storage lifecycle policies instead of a more expensive document management tool.
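Storage lifecycle policies are declared on a storage bucket and applied by the cloud provider, so no application code needs to run. To show the logic such a policy encodes, the sketch below evaluates hypothetical age-based rules (archive after 90 days, delete after 7 years) for a single object. The thresholds are assumptions, not the department's retention settings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LifecycleRule:
    min_age_days: int
    action: str  # "archive" (move to cheaper storage) or "delete"


# Hypothetical rules; real retention periods depend on the records involved.
RULES = [LifecycleRule(90, "archive"), LifecycleRule(7 * 365, "delete")]


def action_for(created: date, today: date, rules: list[LifecycleRule] = RULES) -> str:
    """Return the action of the highest age threshold the object has passed,
    or "keep" if it is younger than every threshold."""
    age_days = (today - created).days
    passed = [rule for rule in rules if age_days >= rule.min_age_days]
    if not passed:
        return "keep"
    return max(passed, key=lambda rule: rule.min_age_days).action


today = date(2024, 1, 1)
for created in [date(2023, 12, 1), date(2023, 6, 1), date(2015, 1, 1)]:
    print(created, action_for(created, today))
```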

Potential cost savings: by re-architecting services that could not be made more efficient in other ways, Immigration Technology was able to use cheaper cloud native options.

Development teams created large numbers of cloud resources while testing or building, but did not always stop or delete them once testing was complete. Immigration Technology saved about 1% of costs by removing unused resources and encouraging development teams to keep their environments tidy.

Immigration Technology development teams regularly perform housekeeping tasks like:

Potential cost savings: Doing regular housekeeping tasks and creating policies to archive older assets saved around 1% on cloud costs.
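One housekeeping task that lends itself to automation is finding resources that nobody has used recently. The sketch below assumes a hypothetical inventory that records each resource's environment and last-used time (in practice this data would come from the provider's monitoring or billing tools) and a 14-day idle threshold. It only flags candidates and leaves the decision to stop or delete them with the owning team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Resource:
    resource_id: str
    environment: str
    last_used: datetime


IDLE_THRESHOLD = timedelta(days=14)  # hypothetical


def stale_resources(resources: list[Resource], now: datetime) -> list[str]:
    """IDs of non-production resources idle for longer than IDLE_THRESHOLD."""
    return [
        r.resource_id
        for r in resources
        if r.environment != "production" and now - r.last_used > IDLE_THRESHOLD
    ]


now = datetime(2024, 3, 1)
inventory = [
    Resource("test-db-1", "development", datetime(2024, 1, 10)),
    Resource("build-cache", "development", datetime(2024, 2, 28)),
    Resource("api-prod", "production", datetime(2024, 1, 1)),
]
print(stale_resources(inventory, now))  # → ['test-db-1']
```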

Most cloud providers offer a discount in return for committing to a specified level of usage. These discounts are sometimes referred to as reserved or committed use instances. Immigration Technology found it could get a discount of up to 40% on compute resources by committing to buy in bulk up front.

However, up-front usage commitments are often restricted to a specific server type, and the server must be switched on 24 hours a day, 7 days a week. This means the strategy cannot be combined with other cost saving techniques such as scheduling, where servers are switched off for a period of time.

Immigration Technology also found the savings from up-front usage commitments were often lower than from many of the other techniques. For example, turning off servers overnight and at weekends can save over 60% on cloud costs. Immigration Technology realised it could save more money by using this method only for services where no other technique would work.
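A rough weekly comparison for a single server that could be scheduled illustrates this. The comparison assumes that the commitment earns the full 40% discount, that the server would otherwise run 12 hours a day on weekdays, that prices are normalised to 1 per on-demand hour, and that no other costs apply. Under those assumptions, the committed server costs 40% less than running on demand all week, while the scheduled server costs about 64% less.

```python
HOURS_PER_WEEK = 7 * 24
COMMITMENT_DISCOUNT = 0.40   # "up to 40%", taken at its maximum
SCHEDULED_HOURS = 5 * 12     # on 12 hours a day, weekdays only


def committed_weekly_cost(hourly_rate: float) -> float:
    """Up-front commitment: discounted rate, but paid for every hour of the week."""
    return HOURS_PER_WEEK * hourly_rate * (1 - COMMITMENT_DISCOUNT)


def scheduled_weekly_cost(hourly_rate: float) -> float:
    """On-demand rate, paid only while the scheduler has the server switched on."""
    return SCHEDULED_HOURS * hourly_rate


rate = 1.0  # normalised on-demand price per server-hour
print(f"committed: {committed_weekly_cost(rate):.1f}")  # → committed: 100.8
print(f"scheduled: {scheduled_weekly_cost(rate):.1f}")  # → scheduled: 60.0
```

For always-on services that cannot be scheduled, the comparison flips, which is why commitments suit only services where no other technique works.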

Potential cost savings: Immigration Technology only considered up-front usage commitments if no other technique would have worked.

Having tested and compared the effectiveness of all of these techniques, Immigration Technology then needed to encourage developers to use them regularly. Some techniques need development effort; for example, rightsizing requires monitoring to calculate what size the underlying compute should be. Developers often prioritise delivery over cost efficiency, so Immigration Technology worked directly with developers to give them time to think about their working practices.

To do this, Immigration Technology uses a cost efficiency scale for each service and team. Known as the Hosting Cost Efficiency Rating, the scale gives different weighting to the following techniques:

The scale shows developers and teams how they can improve their service. Immigration Technology is planning to gamify the process to encourage teams to become even more efficient.
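The techniques and weightings that make up the Hosting Cost Efficiency Rating are not listed here. Purely to illustrate how a weighted scale of this kind could score a service, the sketch below applies hypothetical weights to some of the techniques described earlier. It reports the weighted share of those techniques that a service has adopted, on a scale from 0 to 100.

```python
# Hypothetical weights: not the real rating's techniques or weightings.
WEIGHTS = {
    "excess_capacity": 3,
    "scheduling": 3,
    "autoscaling": 2,
    "rightsizing": 2,
    "housekeeping": 1,
}


def efficiency_rating(adopted: set[str]) -> float:
    """Weighted percentage (0-100) of the listed techniques a service uses."""
    unknown = adopted - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown techniques: {sorted(unknown)}")
    earned = sum(WEIGHTS[t] for t in adopted)
    return 100 * earned / sum(WEIGHTS.values())


print(round(efficiency_rating({"scheduling", "autoscaling"}), 1))  # → 45.5
```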

Immigration Technology is also working to get its service teams to incorporate the cost saving practices by:

GDS will be producing guidance on cost optimisation in the new year that will further explain some of the techniques that Immigration Technology used to reduce their costs in the cloud.

For more information on using cloud cost optimisation techniques to save money, contact cloud-strategy@digital.cabinet-office.gov.uk.
