Core Principles of Data Warehouse Design #5 – 8

Last year I started a sequence of blog posts to discuss what I believe are the core principles you should adhere to when developing a data warehouse.  I meant to post them over a period of several weeks, but I only made it through the first half.  In order to complete what I start, this post will finish the sequence.  I’ll cram the last 4 posts into one, which I probably should have done to begin with.  These were the posts that comprise the first half of this sequence, followed by the remaining four principles:

#5 Slowly Changing Dimensions

A good data warehouse has good history, and this can’t be done in a dimensional model without slowly changing dimensions (SCD).  The key to appropriate SCD development to identify for each dimension table on a column-by-column basis whether it is important for that column’s previous state to be maintained in the DW.  For an Employee dimension, the worker’s name might not be a something I want to maintain historically.  If the name changes, it’s still the same worker, so when I’m reporting sales figures by worker, I want all of the sales reported under the worker’s current name.  However, if the worker changes department, I would want the change to be maintained historically, so when I report sales by department, the numbers are reported correctly.

Here’s an example of a Worker dimension before any changes.  For worker Fred Smith, he changes offices from West to East.  Jane Doe’s has a name change to Jane Jones.  Since office is a SCD type 2 column the historical row is maintained, but it is end dated.  A new row is added and it is made the current row since it has current data.  Since Name is a SCD type 1 column, a new row is not added for Jane Jones, instead the existing row is updated.

Worker Dimension

Worker Dim ID Name Gender Office Start Date End Date Current
545 Fred Smith M West 6/1/08 true
711 Jane  Doe M North 7/15/09 true

After SCD Process

Worker Dim ID Name Gender Office Start Date End Date Current
545 Fred Smith M West 6/1/08 5/21/10 false
910 Fred Smith M East 5/22/10 true
711 Jane  Jones F North 7/15/09 true

#6 New Data Structures

This principle says that new database tables should be created that don’t exist in any source system.  It’s a reminder to not get locked in to how source databases are structured, and to invent new entities that supoort the requirements of the data warehouse.  An example of this is a data warehouse I designed for a foster care agency.  The OLTP database tracked abuse incidents, but the agency needed to report on repeat abuse incidents within a certain time frame and in particular situations.  This caused me to create a fact table that specifically tracked Repeat Abuse.  This way the agency could report on this metric easily, without concern for applying the right business rules or time frames.

#7 Ease of Query

The data warehouse should be easy to query; no long SQL, complex joins, nested subqueries, etc.  The queries should perform well and they should be easy and obvious.  There’s a standard query that should deliver most metrics from the dimensional model and it looks like this:

SELECT col1, col2, SUM(col3) FROM Fact Table
 JOIN Date Dimension
 JOIN Dimension2
 JOIN Dimension3
 WHERE col3 = ? and col4 = ?
 GROUP BY col1, col2
 

#8 Completeness

This is simple.  How often does the data warehouse satisfy a user’s request for a metric, report, or analysis?  How often do they have to go to sources other than the DW to get the answer?  This is a measure of Completeness.  A high percentage demonstrates that the DW contains most or all of the data it requires.  A low number indicates that users still don’t use the DW as much as they should.  Completeness is difficult to achieve in Phase 1 of a development effort.  A DW that can provide 65-70% of the data requests in Phase 1 is a good accomplishment.

Conclusion

If you are embarking on your first Data Warehouse design effort, keep these 8 principles in mind.  Achieve them all and you will produce an excellent DW that will be valuable right away on initial deployment.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: