Monthly Archives: July, 2010

8 Core Principles of Data Warehouse Design – #3 Transformation

Transformation isn’t clearly stated in the requirements gathering phase, but it must be derived by carefully analyzing the business requirements. Too many times I’ve seen data warehouses attempted without transformation. The data is often a denormalized mirror of the source data and this does not help your users at all.

After you have modeled the data warehouse in an autonomous fashion (see sections #1 and #2) then you must capture the source data and transform it so that it fits the new model. Transformation of the data is code and ETL that is executed behind the scenes and not visible to end users. It makes use of metadata to execute the necessary transformations.
Some transformations can be simple, such as a two columns titled FIRST_NAME and LAST_NAME being concatenated into the data warehouse as a new column called FULL_NAME. Others can be very complicated, such as determining the primary location of a criminal act based on various reports and numerous sources. In any case, whether simple or complex, make sure you transform the data.

Advertisements

8 Core Principles of Data Warehouse Design – #2 Dimensional Model

The reasons for building a data warehouse are different forms of reporting and analytics, activities that require slicing and dicing of data in an accessible and comprehensible form. A dimensional model is a great way to provide this clear view of the data. I’m not going to show how to create the dimensional model in this series, but I do want to make the point that a good data warehouse includes a dimensional model.

Begin the modeling process by turning business requirements into a conceptual star schema. Draw simple diagrams with facts in the middle of the star and dimensions surrounding it. Give the facts and dimensions high-level names, such as Measurements and Factory. Continue through the requirements and conceptual modeling process until you’ve completely identified all the facts and dimensions. Then create a logical model by defining the columns that will make up each fact and dimension, and identify each column as a string or a number. This logical model should be able to answer all of the questions posed by the business requirements, and it can then be turned into your physical data warehouse.
Providing a good dimensional model will make your business users and your IT staff who have to use the data warehouse very happy. It is a much easier way for them to get the data they need from the data warehouse.
For more info on dimensional modeling, start with the Kimball books and web site. Go to http:\\www.ralphkimball.com.