Blog: Getting the most out of a comprehensive database health check for SQL Server

One of our public sector customers was running a distributed SQL Server 2016 database system and was considering an upgrade to unite the servers in one data center, or even a move to the cloud. We helped them by using our DBMSys service to perform an overall database health check. As part of the health check, we found several high-priority issues that required immediate action. In addition, we provided the customer with detailed information on their existing database system, as well as a set of best practices and suggestions on how to upgrade it.

Watch the following video to learn more about the technical approach we used.

Typical issues we found with the database health check using DBMSys

We started by using DBMSys to discover the customer’s systems as part of the database health check. DBMSys is a lightweight yet powerful online service designed by DB Best specifically for collecting data from multiple environments. The net result is that DBMSys can help develop a business case for an upgrade or modernization of the data infrastructure, calculating the expenses for all possible options and making them easily comparable. DBMSys installs on a single machine in the customer’s environment and runs over 3-4 weeks, gathering the required information about devices, databases, licenses, etc.

We found that the customer’s system included about a hundred databases. The system utilized no more than 500 GB of storage, which made some of these databases good candidates for hybrid integration with Azure.

The system was distributed between two independent data centers. Each of them hosted a fully functional replica of the customer’s system, and both participated in Always On Availability Groups, leveraging the high availability features of SQL Server 2016.

However, the customer’s IT department wanted to eliminate the remote data center that hosted the secondary replica, because supporting it was difficult and required additional funding. To do so, they considered either upgrading the main data center or moving the system to the cloud.

As a result of the discovery, we found that the customer did not utilize the system at full capacity. However, it had a number of critical issues that required immediate action. The most critical ones were related to the servers’ memory and storage. In addition, we found some security issues and a lack of backups for some of the databases.

Best practices for addressing the memory issues

Proper memory configuration is one of the most important tasks in database server management. To run smoothly, applications must have enough memory, so available resources need to be managed carefully.
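As a minimal illustration of that kind of resource management, the sketch below caps SQL Server’s max server memory so the operating system keeps some RAM for itself. The 28 GB value is purely an assumed example and would need to be sized for the actual server.

```sql
-- Minimal sketch: cap SQL Server memory so Windows keeps enough RAM for itself.
-- The 28672 MB (28 GB) value is an assumed example, not the customer's actual setting.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 28672;
RECONFIGURE;
```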

We discovered random RAM peaks caused by a lack of memory left for the Windows Server operating system. This issue can be addressed in two ways. First, you can add more RAM to bring the peak values under control. However, this is a dead end: 99 percent of the time the system consumed only around 30 percent of the available RAM, which means adding memory is not really necessary. So, we prefer the second approach, which relies on query optimization: track down the queries or processes that cause the RAM peaks and optimize them.
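One way to start that hunt is to look at the queries holding the largest memory grants at the moment a peak occurs. The query below is a generic sketch against the standard dynamic management views, not the exact script we ran for this customer.

```sql
-- Sketch: show the running queries with the largest memory grants,
-- which are the usual suspects behind sudden RAM peaks.
SELECT TOP (10)
       mg.session_id,
       mg.requested_memory_kb,
       mg.granted_memory_kb,
       t.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
ORDER BY mg.granted_memory_kb DESC;
```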

Best practices for addressing the storage issues

Examining the storage issues, we found that each of the servers in this environment showed signs of significantly reduced storage performance. On average, disk latency for most drives met the expected performance levels. However, one of the disk drives experienced 7-second latency during query execution, against an expected value of about 10 milliseconds. The database health check easily flagged this bottleneck, which was 700 times over the allowed value.
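For reference, per-file latency of this kind can be checked with the standard I/O statistics DMV. The query below is a generic sketch rather than the exact check DBMSys performs.

```sql
-- Sketch: average read/write latency per database file, in milliseconds.
-- Values far above ~10 ms for data files usually indicate a storage bottleneck.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```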

The bottleneck turned out to be a single drive on this server that contained the majority of the database data files. So, we suggested distributing the data across other drives, or even other servers, to avoid it.
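A hedged sketch of what that redistribution could look like: adding a data file for the busiest database on a separate drive. The database name and the E:\ path are assumptions for illustration only.

```sql
-- Sketch: add a data file on a separate drive so I/O for the busiest database
-- is spread across more than one disk. Names and paths are placeholders.
ALTER DATABASE [BusyDatabase]
ADD FILE
(
    NAME = N'BusyDatabase_Data2',
    FILENAME = N'E:\SQLData\BusyDatabase_Data2.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
);
```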

In addition, we leveraged cloud backups for the databases. This SQL Server 2016 feature allows storing backups of the on-premises databases in Azure Blob Storage, which saves local storage hardware.
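A minimal sketch of a backup to URL, assuming a shared access signature credential; the storage account, container, SAS token, and database name below are placeholders.

```sql
-- Sketch: back up a database directly to Azure Blob Storage (backup to URL).
-- The storage account, container, SAS token, and database name are placeholders.
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/sqlbackups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';

BACKUP DATABASE [CustomerDB]
TO URL = N'https://mystorageaccount.blob.core.windows.net/sqlbackups/CustomerDB.bak'
WITH COMPRESSION, STATS = 10;
```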

General suggestions and resulting benefits

The biggest issues in this customer’s system came from poor optimization, which caused an apparent shortage of resources. Of course, the customer could simply choose to move the system to the cloud and forget about resource management.

We suggested that our customer move only the secondary replica to the Microsoft Azure cloud. This would maintain high availability while keeping the hardware in the primary data center. Since we discovered that this hardware was underutilized, it leaves room for future growth of the database system. In this way, they could also reach one of the project goals of eliminating the remote secondary physical server, saving the costs of its hardware support.
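In outline, joining an Azure-hosted SQL Server VM to the existing availability group as an asynchronous secondary could look like the sketch below; the group, replica, and endpoint names are placeholders, not the customer’s actual configuration.

```sql
-- Sketch: add an Azure-hosted SQL Server instance to the availability group
-- as an asynchronous, read-only secondary. All names are placeholders.
ALTER AVAILABILITY GROUP [CustomerAG]
ADD REPLICA ON N'AZURE-SQL-VM'
WITH (
    ENDPOINT_URL = N'TCP://azure-sql-vm.customer.local:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
);
```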

Diagram: SQL Server hybrid Availability Group with a secondary replica in Azure

To address the performance issues, we decided to leverage read-only routing. With this configuration, the system redirects read-only queries to the secondary replica, decreasing the workload on the primary. This improves query execution times even if the customer places the secondary replica in the cloud.
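A hedged sketch of the routing configuration, with placeholder replica and host names: the secondary advertises a routing URL, and the primary’s routing list points read-intent connections at it. Clients then connect to the availability group listener with ApplicationIntent=ReadOnly to be redirected.

```sql
-- Sketch: route read-intent connections to the secondary replica.
-- Replica and host names are placeholders for the customer's environment.
ALTER AVAILABILITY GROUP [CustomerAG]
MODIFY REPLICA ON N'AZURE-SQL-VM'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://azure-sql-vm.customer.local:1433'));

ALTER AVAILABILITY GROUP [CustomerAG]
MODIFY REPLICA ON N'PRIMARY-SQL'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'AZURE-SQL-VM')));
```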

Regarding the security and high availability issues, we found that the original system lacked backups for some of the databases. So, we created automated encrypted backups for all databases to prevent data loss in the event of a disaster. We also planned to run integrity checks on a weekly basis. Even though the customer had already implemented the SQL Server 2016 high availability features, this approach allows possible issues in the high availability system to be discovered as they occur.
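For illustration, an encrypted backup plus a weekly integrity check could look like the sketch below; the database name, backup path, and certificate are placeholders, and the certificate itself would have to be created and backed up beforehand.

```sql
-- Sketch: encrypted full backup and a weekly integrity check (scheduled via SQL Server Agent).
-- Database name, backup path, and certificate are placeholders.
BACKUP DATABASE [CustomerDB]
TO DISK = N'X:\Backups\CustomerDB_full.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);

DBCC CHECKDB (N'CustomerDB') WITH NO_INFOMSGS;
```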