Category Archives: data modeling

No Clustered Index

Seeing another SQL Server database this week without what I consider to be ‘the basics’ got me thinking about what those basics really are.  In other words, what are the core, fundamental practices we should follow every time we create a new database?  The problem database I saw this week had several tables without clustered indexes.  In my world that wouldn’t happen.  Every database table I create, even in a small, simple, or temporary database, gets a clustered index right off the bat.

Here are the things I always do when creating a database or adding tables (a sketch pulling them together follows the list):

1. Every table has a Clustered Index.

2. I always assign a Primary Key.

3. I always create a Surrogate Key.

4. I always assign an appropriate data type to each column: if a column holds a number I make its data type numeric, and if it holds a date I make it datetime.  I don’t get lazy and make these strings.

5. I almost always put an auto-incrementing identity column on the table.  I can’t say I do this every time, but I only skip it when I have a good reason.
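
To pull those together, here’s a minimal sketch of the kind of table definition I start with.  The table and column names (Customer, CustomerID, and so on) are just placeholders for illustration; the pattern is what matters: an auto-incrementing identity column as the surrogate key, a primary key backed by a clustered index, and real data types instead of strings.

CREATE TABLE dbo.Customer
(
    -- Surrogate key: an auto-incrementing identity column
    CustomerID   INT IDENTITY(1,1) NOT NULL,

    -- Appropriate data types, not strings
    CustomerName VARCHAR(100)  NOT NULL,
    CreditLimit  DECIMAL(12,2) NOT NULL,
    CreatedDate  DATETIME      NOT NULL,

    -- Primary key, explicitly backed by the clustered index
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
);

If a different column ever makes a better clustering key, the primary key can be declared NONCLUSTERED and the clustered index created separately, but the point is the table never exists without one.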

What kinds of things do you always do?

I find that when you don’t do these things at the start, they can be forgotten later.  I suspect that’s what happened with the database I reviewed this week.  It’s been changed and added to so much over the years that I believe some tables created ‘temporarily’ in development ultimately became the real thing, and nobody went back to update them.

It might be drudgery, but creating things right from the start saves a lot of pain and effort later.

The NFL’s Overhyped Quarterbacks

The NFL and its fans have never had the wild devotion to statistics that baseball fans have.  Still, watching the Super Bowl this past Sunday and much of the playoffs leading up to the big game, I noticed more announcers using statistics to extol the greatness of a particular player.  The comparison of today’s quarterbacks to the gunslingers of yesteryear is especially appealing, since the numbers racked up by today’s QBs are so much better than those of the past.  But that is primarily due to rule changes the NFL has made over the years that make it easier for the passing game to succeed, not because of increasingly talented players at the quarterback position.

During the pregame banter leading up to the game, one pundit mentioned how Aaron Rodgers had already had two seasons with a passer rating of more than 100, while Brett Favre had only done it once in his career.  Well, in 1996, when Favre went to his first Super Bowl at age 27, he had a completion percentage of 59.9 and a passer rating of 95.8; the league that year had a completion percentage of 57.6 and a passer rating of 76.9.  In 2010, Aaron Rodgers won the big game at age 27 with numbers of 65.7 and 101.2, while the league averaged 60.8 and 82.2.  Measured against their leagues, both posted passer ratings roughly 23 to 25 percent above the average, so without doing much more analysis, it seems Rodgers and Favre had essentially the same season.

This pattern of improving QB statistics goes back a long way and has trended steadily upward.  In 1970 the average completion percentage for quarterbacks was 51.1 percent and the passer rating was 65.6, well below the numbers from the 1990s, let alone today’s.

In the Business Intelligence and Data Architecture professions, we need to make sure the data we deliver is put in the proper context.  The business world has seen productivity improvements and new processes over the years, and we need to make sure our data models stay relevant as things change.
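
One small, hedged illustration of putting a number in context: if per-season statistics (or any business metric by period) sat in a table, comparing each value to the average for its own period tells a fairer story than the raw figure.  The table and column names below (qb_season_stats, season, player_name, passer_rating) are hypothetical, with passer_rating assumed to be a decimal column.

SELECT
    season,
    player_name,
    passer_rating,
    -- Average for that same season, computed with a window function
    AVG(passer_rating) OVER (PARTITION BY season) AS season_avg_rating,
    -- Ratio of the player's number to his era's baseline
    passer_rating / AVG(passer_rating) OVER (PARTITION BY season) AS vs_season_avg
FROM qb_season_stats;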

8 Core Principles of Data Warehouse Design – #3 Transformation

Transformation isn’t clearly stated in the requirements gathering phase; it has to be derived by carefully analyzing the business requirements. Too many times I’ve seen data warehouses attempted without transformation: the result is a denormalized mirror of the source data, and that does not help your users at all.

After you have modeled the data warehouse in an autonomous fashion (see principles #1 and #2), you must capture the source data and transform it so that it fits the new model. The transformation itself is ETL code that runs behind the scenes, invisible to end users, and it makes use of metadata to execute the necessary transformations.

Some transformations can be simple, such as two columns titled FIRST_NAME and LAST_NAME being concatenated into a new data warehouse column called FULL_NAME. Others can be very complicated, such as determining the primary location of a criminal act based on various reports from numerous sources. In any case, whether simple or complex, make sure you transform the data.
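
As a rough sketch of the simple case above, the FULL_NAME transformation might be applied during the ETL load something like this. The staging and warehouse names (stg.Person, dw.DimCustomer) are hypothetical; the point is that the work happens in the load, driven by the warehouse model, not in the users’ queries.

INSERT INTO dw.DimCustomer (FULL_NAME)
SELECT
    -- Concatenate the two source columns into the warehouse's single FULL_NAME column;
    -- CONCAT treats NULLs as empty strings, and the trim cleans up any stray spaces
    LTRIM(RTRIM(CONCAT(FIRST_NAME, ' ', LAST_NAME)))
FROM stg.Person;

Even a transformation this trivial belongs in the ETL, where it is documented by metadata, rather than being left for every report writer to redo.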