Category Archives: data modeling

Data Warehouse Design Fundamentals

I was asked to review an initial design for a data warehouse a few weeks ago for a state government agency. It was in the early stages of design so it was nothing more than a data mart, but it was going to go through a formal review by a big name outside consulting company, so they wanted my feedback before they got hammered by this company.

It was in sad shape, to put it mildly. The design was very basic and not really ready for review. It had a single fact table, surrounded by 7 or 8 dimension tables. These tables were basically a copy of the same tables that existed in a source application, with the main change being that they appended the word ‘dimension’ or ‘fact’ at the end of each table. Even though they weren’t far along, it looked to me that they were definitely going in the wrong direction. And there’s no way they were ready for a review by the big name consulting company.

As flawed as this design was, it’s not unique. So many times when I review a design or an existing warehouse, it’s not even close to the true architecture of what one should be. I touched on this problem in an earlier post titled Data Warehouse Renovations. Over the next several days I will describe 8 core principles to follow when creating your data warehouse. Abiding by these principles will make your warehouse a powerful tool for measuring, analyzing and reporting on your company’s performance.

The Business Process Workflow Data Warehouse

There are a number of different ‘types’ of data warehouses. I say this with trepidation because a data warehouse is usually not referred to as having a type. Perhaps I should rephrase my statement; there are a number of different types of data marts. A data warehouse can contain multiple data marts, each with a different aim. For the purposes of this discussion, I’m talking about a data warehouse as if it has a primary objective, which in my experience most do.

Read any book on data warehousing, look at a blog with a sample star schema, or take an introductory data warehouse design class, and you will see the same pattern. You will see a fact table representing a well-defined transaction surrounded by several dimensions. The fact table will be easy to understand; it usually reflects a common transaction that has been mapped from an application in a straightforward manner. It will look something like this:

In the diagram above, the fact table is derived from a sales transaction that occurred in an OLTP database. This star schema is a good learning tool and it’s a pattern that can be reused for many transactional processes, but let’s talk about business process workflows.

Examples of Business Process Workflows

Business Process Workflows happen over periods of time, are vague in their scope, and acquire new or changing data as the process moves forward. Also, it can be uncertain when they are actually complete. These are a few examples of the kinds of business processes I’m referring to:

– A financial services company receives a request from a customer to open a new 401k account.
– A social worker places a child in a foster home.
– A laptop in sent in to the manufacturer for warranty repair.

What’s different about the Business Process Workflow data warehouse vs. the Transactional data warehouse? The following are significant distinctions:

1. Workflows span long periods of time, some take hours, but others months or even years.
2. Workflows don’t have definite time periods and some may not end.
3. Workflows almost always require accumulating facts that must be updated after they are stored when more data arrives.
4. Workflow data can be removed or hidden due to incorrect data entry, legal issues, or duplicate data entry.
5. There are no standard practices for data as there are with financial, accounting and sales applications.
6. Data is often much more poorly entered and verified.

In the next several days, I will discuss each of these points in more detail, review why they make the Business Process Workflow data warehouse a true challenge, and discuss solutions for each. There is a paradigm shift that must be grasped and some deep-seated data warehouse modeling principles must be relinquished.

Data Warehouse Renovations

I’ve run across a number of incomplete data warehouses. The kind where release 1 made it out the door, but wasn’t very successful. The fundamental problem that connected each of these failures is that the data warehouse wasn’t built as a data warehouse. What do I mean by that? The warehouse did not meet most of the accepted patterns that are required of a database for it to be designated a data warehouse. They were not dimensional models. Some were not even close.

In one case the warehouse was a copy of the database used for an OLTP application. The IT group did this to allow users to separate reporting from transactional usage, and then called this the data warehouse. In other cases an ETL process was performed on the OLTP database where a subset of data was moved to a different server. Lookup tables were employed to decode data, and the data tables were reconfigured, to some extent, to support reporting.

In each case, the approach taken was to support a certain number of reports with hard and fast definitions. The schema design was developed as a rigid structure, supporting the reports developed, but little else. In one case I saw, the warehouse could not deliver the reports by itself, so pre-report building stored procedures had to run to populate tables with the report data. This appears to defy the logic and purpose of data warehousing, but sometimes people just want to make things work.

A layer of abstraction would be useful to introduce, but that’s for a subsequent article. The point here is to highlight the difference in implementations in the real world. We often see this in IT. We have short timeframes for delivery, a lack of skills coverage, and limited bandwidth, making it very difficult to implement the right design all of the time, or even some of the time. Learning to work with and adapt to ‘inferior’ designs is a normal part of my job, and as a consultant, one I need to do well. In a project I’m currently working, my job is to turn one of these pseudo-data warehouses into a true data warehouse, and in a short amount of time. It’s a job that is part science and part art, and in a few months I hope to write about the techniques I used to make it a successful implementation.