Don’t be afraid of redundancy. As developers and database developers we try to reuse components and objects as much as possible. We try not to create two functions that do the same thing, and we try to model relational OLTP databases so that every data element is defined just once. Let go of this objective.
A data warehouse serves individual purposes, and everyone’s purpose. To make this happen with a non-redundant streamlined data model is very difficult. Data that needs to be measured transactionally may also need to be aggregated, and it may need point-in-time reporting capabilities. This could mean you store the same data three different ways. And that’s ok.
The reasons for building a data warehouse are different forms of reporting and analytics, activities that require slicing and dicing of data in an accessible and comprehensible form. A dimensional model is a great way to provide this clear view of the data. I’m not going to show how to create the dimensional model in this series, but I do want to make the point that a good data warehouse includes a dimensional model.
Begin the modeling process by turning business requirements into a conceptual star schema. Draw simple diagrams with facts in the middle of the star and dimensions surrounding it. Give the facts and dimensions high-level names, such as Measurements and Factory. Continue through the requirements and conceptual modeling process until you’ve completely identified all the facts and dimensions. Then create a logical model by defining the columns that will make up each fact and dimension, and identify each column as a string or a number. This logical model should be able to answer all of the questions posed by the business requirements, and it can then be turned into your physical data warehouse.
Providing a good dimensional model will make your business users and your IT staff who have to use the data warehouse very happy. It is a much easier way for them to get the data they need from the data warehouse.
For more info on dimensional modeling, start with the Kimball books and web site. Go to http:\\www.ralphkimball.com.
This is an easy one to state. Design the Data Warehouse based on the requirements of the business. It sounds so obvious. The requirements should be driven by the Reporting needs, Analytics, Dashboards, Scorecards, Ad-Hoc Analysis and Data Mining. Why state this obvious principle? In all failed DW efforts, this principle is ignored.
A better way to understand Autonomous Design is by stating what not to do. Don’t design the DW by using a relational database as the source data model. Don’t make the DW look like a denormalized relational OLTP database. Don’t allow the source data models to determine the DW data model.
This is what I mean. An OLTP application has a Customer, Product and Order table. The star schema has 3 Dimension tables, Customer_Dimension, Product_Dimension and Order_Dimension. These Dimension tables have the same data elements as their source tables, except that codes are decodes. A fact table is added called Order_Fact that has 3 foreign keys, one to each Dimension table. Not much was done to create a good data model for Business Intelligence functions. The data model should be conceived from the ground up and would support the required BI functions. This is autonomous design.