Tags: ETL, OLAP, optimization
When a company grows, it’s no surprise that the data stored in its databases grows as well. This holds for every company, and our customer, a global retailer in the beauty industry, was no exception. While working on this project, we increased the average speed of query execution by 30 times, with one particularly heavy query now running 107 times faster, cut the memory consumption of a single query from 190 GB to less than 1 GB, freed up 28 GB of disk storage space, and made the OLAP cube calculate about 3 times faster. Behind these great results lies a thorough optimization of the customer’s system, which is built on OLAP cube technology. We managed to reach them during a complex big data management project.
Watch the following video to see how we did it!
Find out more facts and figures, as well as some optimization tricks and best practices, below.
Overview of the customer’s original system
Imagine a big international company with many branches across different continents. At the end of each day, every branch creates a sales report and stores other valuable data in its own database. This data, initially spread across various sources, is then aggregated by the ETL process and loaded into the OLAP cubes.
This is how it works: the data must be arranged into a consistent format and combined into a single repository. This is the job of the ETL processes, which convert data from OLTP databases and other sources into a form the data warehouse and OLAP cubes can consume. An ETL process runs in three steps:
- Extracting data from the source servers.
- Transforming and cleaning the data.
- Loading it into the data warehouse.
When these actions are completed, the ETL process instructs the OLAP cube to connect to the source and process the freshly loaded data. As a result, the OLAP cube contains the same information as the data warehouse, but here the data is organized in a special multidimensional format that allows much faster query execution.
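To make the idea of the multidimensional format a bit more concrete, here is a minimal MDX query sketch against a hypothetical [Sales] cube; the cube, measure, and dimension names are illustrative only and are not taken from the customer’s system. Instead of joining fact and dimension tables the way a SQL query would, an MDX query simply places measures and dimension members on the axes of the result:

```mdx
-- Minimal illustrative MDX query; [Sales], [Measures].[Sales Amount],
-- [Measures].[Order Count] and [Product].[Category] are hypothetical names.
SELECT
    { [Measures].[Sales Amount], [Measures].[Order Count] } ON COLUMNS,
    NON EMPTY [Product].[Category].[Category].Members        ON ROWS
FROM [Sales]
```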
Issues of the original system
The customer’s analytics and sales managers need up-to-date business intelligence reports every day. This is a critical requirement for the company’s business processes, but in reality the daily reports were generated with a 15-hour delay. You could hardly call these reports daily, as the company’s analytics team had to work with yesterday’s data. The same issue affected other types of reports (weekly, monthly, annual) as well.
Essentially, the slow generation of business intelligence reports was the main reason the customer contacted DB Best. The solution to the problem was not simple. We started with a deep technical analysis of each of the system’s components and, as a result, discovered critical problems related to the system’s architecture. For example, some jobs implemented earlier were still being executed in the cube even though their results were no longer used in the reports. In addition, changes made in some of the system’s components were causing issues in others.
This happened because the system had been developed over a long period of time with a variety of software solutions, and maintenance was usually performed by specialists who solved their local task without taking the complexity of the whole system into account. Essentially, the number of issues in the customer’s original system was growing like a snowball, and the whole system required optimization. We performed this optimization in three steps:
- Optimizing the OLAP cubes,
- Improving the ETL process, and
- Auditing the MDX code.
Let’s take a closer look at these operations. Before that, we should mention that at the start of the project the customer provided access only to the SSAS part of the original system, which was the part actually used for report generation. Only after we made some critical improvements and provided comprehensive support did the customer grant access to the required parts of the original system.
Optimization of the OLAP cubes
This was the most important stage, and it concerned the structure of the customer’s OLAP cubes. We applied quite a few optimization tricks to the cubes’ multidimensional structure and built some new OLAP cubes as well.
We removed the unused dimension hierarchies, deleted useless links between tables and databases, and reviewed the data calculation system inside the OLAP cube. As a result, the cube’s calculation time decreased by around 3 times. Moreover, we reduced disk space usage by 28 GB.
Adding data slices (one of the recognized best practices) increased the performance of frequently used queries. This allowed us to speed up query execution and cut memory usage: a query that previously got suspended while consuming 190 GB of memory now completes using less than 1 GB.
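To illustrate how a slice narrows the work a query has to do, here is a hedged MDX sketch, again with purely hypothetical cube and dimension names. The WHERE clause restricts the query to a single calendar year, and when the cube’s partitions carry matching slice definitions, the storage engine can skip every partition that holds other years’ data:

```mdx
-- Hypothetical names; the slicer limits the query to one year so that,
-- with per-year partition slices defined, only that partition is scanned.
SELECT
    { [Measures].[Sales Amount] }                              ON COLUMNS,
    NON EMPTY [Store].[Store Country].[Store Country].Members  ON ROWS
FROM [Sales]
WHERE ( [Date].[Calendar Year].&[2016] )
```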
Summarizing the results of this optimization stage, we want to highlight that the customer had planned a server upgrade for the OLAP cubes, which were rapidly growing in size. After the optimization was completed, however, the customer decided to stay on the current server, as there was no longer any need to upgrade. This is exactly what big data management is all about!
Improving the ETL process
The ETL process in the customer’s original system was based on IBM InfoSphere DataStage and SQL Server Integration Services, a common combination for big data systems. However, the customer couldn’t take full advantage of the ETL process, and the number of issues at this stage kept growing.
We made a number of substantial improvements to increase the performance of the complex ETL processes. The data is now extracted from the smaller regional databases, transformed into the required format, and loaded into the data warehouse within the desired time window. As a result, the ETL process runs about 2.5 times faster than before.
Optimizing the MDX code
Last, but not least, on the list of optimizations is the audit of the MDX query code. Here’s just one significant example: while analyzing the MDX code, we found a query that consumed about 16 GB of RAM. We examined this query, which was executed quite often, and found that it contained 5,000 lines of code. After optimization, the query shrank to just 1,100 lines and now executes 107 times faster than before.
Overall, the MDX code optimization allowed us to speed up query execution by 30x on average.
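We can’t publish the customer’s actual query, but the sketch below, built on hypothetical names, shows the kind of rewrite such an audit typically produces. A frequent offender is a FILTER call that forces cell-by-cell evaluation over a large leaf level; replacing it with NONEMPTY lets the storage engine remove empty rows in one bulk operation, which makes the query both shorter and faster:

```mdx
-- Before (slow, hypothetical): FILTER evaluates every customer one by one.
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    FILTER(
        [Customer].[Customer].[Customer].Members,
        NOT ISEMPTY( [Measures].[Sales Amount] )
    ) ON ROWS
FROM [Sales]

-- After (fast): NONEMPTY removes empty cells in a single bulk operation.
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    NONEMPTY(
        [Customer].[Customer].[Customer].Members,
        [Measures].[Sales Amount]
    ) ON ROWS
FROM [Sales]
```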
Reaching the project’s goals
The results of this hard optimization work left the customer pretty happy. The daily reports are now generated in 7 hours, so the analytics and sales managers get updated business intelligence reports at the beginning of the working day, which was one of the customer’s critical requirements.
On top of that, the business intelligence reports are now generated in multiple formats, including IBM Cognos, SAS, Microsoft Excel, and SQL Server Reporting Services. The reports in SSRS format are used as the data source for mobile applications.
This is a classic example of what happens when you can’t handle big data, and why big data should be managed by experts who understand best practices. Contact DB Best Technologies, the experts in big data management, to increase the performance and reliability of your data warehouses.
Be sure to check out our related video on the general principles of OLAP systems.