Blog: Our customer reduced costs by over 20% by optimizing Azure elastic pools

A leading provider of payroll and HR services in the United States faced the difficulties with configuring the Azure elastic pools for the needs of their SaaS Azure app. The Azure tools provided by Microsoft were not flexible enough to fine-tune the performance of 20,000 customer databases with different utilization patterns. Moreover, it was almost impossible to maintain so many databases manually.

We came up with the solution that allows automatic management of the customer databases basing on extensive monitoring of elastic pools performance.

Now the customer can optimize the Azure resources utilization that results in over 20 percent reduction of the Azure services costs.

Have a look at how we optimized elastic pools utilization in the following video.

The problem with elastic pools utilization

Our customer is maintaining the application that served hundreds of thousands of payroll clients. They decided to move the application to the Azure cloud and employ the Azure elastic pools in order to optimize the databases utilization and reduce the costs. Unlike on-premise model, the computational resources are located at the databases level in Azure. The Azure elastic pools solution allowed us to provision certain resources (Database Transaction Units) to a pool, and share it across many databases. The major challenge for our customer was to configure elastic pools with 20,000 databases the most convenient and efficient way. The customer also faced the problem with monitoring since the standard Azure tools didn’t provide sufficient information on database performance.

DB Best solution

We implemented the main functionality of our solution by the means of 3 modules:

  • Aggregation engine,
  • Suggestion engine,
  • Execution engine.

We delivered these modules as PowerShell scripts.

In order to assess the environment, we needed to create a way to provide clear visibility into elastic pools performance. To solve this task, we developed the telemetry tool which monitors each database during the certain time period. Before the main engines start to work, we collect the telemetry data of each elastic pool performance for the last 7 days.

Aggregation phase

With the Aggregation engine, we use the collected data to calculate the aggregated storage indicator in order to determine which part each database occupies in total Pool’s utilization. We then use the values of this indicator later while solving the task of unloading the overloaded pools.

We use the Aggregation engine with a specially developed script that we execute in Windows using task scheduling. We then execute this script anytime with any frequency to perform the most recent telemetry calculation.

Suggestion phase

We then use the aggregated data to pass on to the Suggestion Engine to classify the workloads. The Suggestion engine allows us to see how databases consume transaction units in different pools during the time interval. According to this information, we classify the pools into the three groups: stable, need attention, or open to additional workload. We then can decide which databases we must remove or add to the pool to return the problematic pool to the stable state or make it open to additional workload.

suggestion engine

When we want to unload a pool, at first we consider the pools with the least amount of used resources in terms of size and Database Transaction Units. When the pool resources exceed the specified limit, this pool is no longer considered for additional workload. If there are no suitable pools for unloading, we create a new pool.

Execution phase

Now we optimize the pools by moving the databases that do not comply with the current pool utilization pattern. Using the a rebalancing PowerShell script, we return the problematic pool to the stable state or make it open to additional workload. Eventually, the Execution engine regroups all suggested databases between the pools to consume the Azure resources the most efficient way.

Execution Engine

Development overview

We started the project with analyzing the existing customer system and creating specifications that were reviewed by the customer prior to investing in a lot of coding. According to the approved specifications, we developed 3 main modules. Then we performed the unit and integration testing in the customer environment. We ensured that all delivered scripts were working properly. After completing the testing stage, we delivered our solution to the customer and provided deployment support.

Development Cycle

Results

In this project, we overcame the challenge of inability to optimally configure the cloud system using standard Azure resources. When we deal with constantly changing cloud services, it is especially important to distinctly see the computational resources consumption in the whole system. So we came up with the solution that improved the visibility of Azure resources consumed by the SaaS application. This enabled us to keep up with possible cloud utilization changes. Moreover, our tool can be run automatically on a frequent schedule. As a result, we made it possible to utilize the Azure elastic pools more reasonably. This leads to over 20 percent lower recurring Azure costs.

Feel free to contact DB Best to become more efficient with your cloud resources.