The evolution of data warehouse solutions

Advances in Data Warehouse (DW) and Data Lake (DL) solutions from major players like Oracle, Teradata, Microsoft, IBM, Amazon Web Services, and Google Cloud have blurred the distinction between the two. Last year, Gartner recognized the shift and renamed their DW report to Data Management Solutions for Analytics (DMSA).

Dark Data

We’ve experienced this transition first hand as our customers look for new ways to use what Gartner calls “dark data” with their traditional DW data. Dark data consists mainly of data that must be retained for various compliance reasons. But since the data is stored as log files, text files, and other semi-structured formats on file system archives, it’s basically not accessible to business analysts.

Dark Data

 

Serverless Query Against Dark Data

With Data Lake technologies like Azure Data Lake Analytics, Amazon Athena on top of S3 storage, and Google Cloud Dataproc, organizations can register dark data in the form of virtual tables. This allows traditional BI tools like Microsoft Excel, Power BI, Tableau, QlikView, IBM Cognos, and every other tool that has access to ODBC or JDBC drivers, to access dark data.

Here is an example of what the data processing pipeline looks like using Amazon Athena for serverless queries. For Microsoft Azure, the pipeline looks identical. You simply can replace:

  • Amazon S3 with Azure Blob Storage or Azure Data Lake Store
  • Amazon EMR with Azure HDInsight
  • Amazon Athena with Azure Data Lake Analytics
  • Amazon Redshift with Azure SQL Data Warehouse
  • Amazon QuickSight with Microsoft Power BI

Amazon Athena Sample Pipeline for Analyzing Dark Data

Here is where things get tricky. You now have business analysts processing petabytes of dark data and then joining the results with traditional DW data inside of the client BI tools. Needless to say, the Business Analysts will complain about performance inconsistencies.

Data Warehouse External Tables

The traditional DW vendors recognized this problem and have solutions to access “dark data” located on Hadoop Distributed File Systems (HDFS) or native cloud storage like Azure Blog Store/Data Lake, Amazon S3, Google Cloud Storage, etc. This means that features like PolyBase for SQL Server 2017 and Azure Data Warehouse, and Amazon Redshift Spectrum can join exabytes of cataloged dark data with DW tables with great performance.

Here is an example of a data processing pipeline using PolyBase for Azure SQL Data Warehouse. Like the prior pipeline above, the AWS pipeline is identical. You can simply replace:

  • Azure Data Lake Analytics with Amazon Athena
  • Azure Data Lake Store with Amazon S3
  • Azure Data Factory with Amazon Glue
  • Azure Data Warehouse with Amazon Redshift
  • Polybase with Amazon Redshift Spectrum
  • Microsoft Power BI with Amazon QuickSight

How SQL Server Polybase simplifies access to Dark Data.

More importantly, organizations can operationalize data for analytics using enterprise OLAP engines, data cleansing, and master data management solutions that already connect to the DW tables. This is because cataloged dark data looks like a regular SQL table or view.

Quite simply, it’s all about pushing the computations for aggregating and filtering data as close to the persisted data as possible. This will reduce network and storage latency, ensuring the best performance for your organization.

We are ready to help you make sense of how these technologies all fit together to support traditional BI and advanced analytics using machine learning for your organization.

Learn more about Data Warehouses
Contact us

Featured technologies for data warehouse solutions

Our technology partners like Microsoft, Amazon, and Google have developed innovative solutions for accessing traditional data warehouse data, operational data, and dark data. They’ve created robust and customizable solutions to analyze, present, and take actions to transform your business into a data-driven business.

Check out the different services that we provide for our featured data warehouse technologies.

Other supported data warehouse and big data solutions

With advances in big data analytics solutions, we see that the line between the data warehouse and big data technologies is becoming increasingly fuzzy. Here are some of the other data warehouse and big data solutions that we’ve helped customers with. Whether it was migrating to a newer platform, or optimizing what they were already using.

Traditional Data Warehouse Platforms

  • Oracle Data Warehouse
  • Snowflake Computing
  • Teradata
  • Pivotal Greenplum
  • IBM PureData System for Analytics (formerly Netezza)
  • Micro Focus Vertica
  • SAP HANA and SAP BW
  • SAP IQ (formerly Sybase IQ)

Big Data Solutions

  • Azure Data Lake Store with Azure Data Lake Analytics featuring U-SQL
  • Amazon Athena using standard SQL against S3 storage
  • Apache Hadoop and Spark including HortonWorks Data Platform, Cloudera, Azure HDInsight, AWS EMR, and Google Cloud Dataproc
  • Apache HBase including deployments using Azure HDInsight, AWS EMR, and Google BigTable.
  • Apache Cassandra including DataStax distribution on Azure, AWS, and Google Cloud.

What we do with data warehouse and big data technologies

We recommend starting with our Future-State Architectural Design Engagement to better understand your needs and then work with your team to develop a technology roadmap for your data warehouse and analytics solutions.

The process starts with understanding your current state and goals for your new solution either on-premises or on Azure, AWS, or GCP. This example shows how we looked at a customer’s current data warehouse solution to see how they could meet their scalability needs while controlling costs.

scheme_psd_1_y

We then work with your team to develop a high-level architecture to identify which services are needed to satisfy your objectives. Here is an example of a high-level solution architecture using AWS services that addresses the future state requirement. This includes maintaining the customer’s current investment with Pentaho for analytics.

scheme_psd_2_y

 

The process of understanding your on-premises and cloud-based data and then looking at the technologies used to ingest & collect, store, process & analyze, consume & visualize to deliver answers and insights provides your team with a greater understanding of the solution. We use this as a roadmap for developing a project plan and creating a return on investment model for a cost-effective solution which can grow as your business needs change.

Our core services with data warehouse technologies include the following:

Let's get together to talk about Data Warehouses
Contact us today

Learn more

Reference data sheet

Check out our data sheets featuring data warehouse technologies.

BI & AA Offering for Microsoft
8 November 2016 Bill Ramos

In combination with Microsoft SQL Server, Office, SharePoint and Cortana Analytics suite, DB Best has helped hundreds of customers deliver Business Intelligence solutions across their ...

BI and AA Offering for AWS
8 November 2016 Bill Ramos

In combination with Amazon RDS, Amazon Redshift, Amazon Kinesis and Amazon EMR, DB Best has helped dozens of customers deliver Business Intelligence solutions across their entire organ...

Let us help you with your project!
Contact us for a FREE quote today.
Request a quote