Query a Star Schema for Trends over Time

A data mart in the form of star schema provides us with a powerful but easy way to develop SQL for metrics and reports.  The typical query will get data from a fact table, join to one or more dimensions, and perform some grouping and aggregations.  Very quickly the foundation for a new report can be developed.

With a good data model we can develop a fair number of reports with queries like this, but there are more complex queries that we need to write.  SCD type 2 dimensions ensure that fact data is related to dimension data in the state the dimension was in when the fact was created.  However, to report accurate sales results requires us to stray from the typical star schema query.

A requirement comes to you asking you to compare 1st quarter sales with 2nd quarter for each business unit in your organization, but to only include business units that have greater than $100K in sales year to date.

The first thing we do is get a list of business units with YTD sales greater than $100K.

Select  bud.bus_unit_id from Sales_Fact sf
Join Business_Unit_Dim bud on bud. business_unit_dim_id = sf. business_unit_dim_id
Join Date_Dim dd on dd.date_dim_id = sf.date_dim_id
Where dd.Year_YYYY = 2012
Group by bud.bus_unit_id
Having sum(sales_amt) > 100000

The Business_Unit_Dim table has SCD type 2 changes but the query doesn’t care.  It will return the right results even if a particular business unit has been changed 100 times this year.  The key is getting the right list of business unit ids and then using them as a subquery in the outer query.

The outer query looks like this:

Select bud.business_unit_name, dd.quarter_name, sum(sales_amt) from Sales_Fact sf
Join Business_Unit_Dim bud on bud. business_unit_dim_id = sf. business_unit_dim_id
Join Date_Dim dd on dd.date_dim_id = sf.date_dim_id
Where dd.Year = 2012 and dd.Quarter in (1,2) and
Bud.business_unit_id in
(Select  bud.bus_unit_id from Sales_Fact sf
Join Business_Unit_Dim bud on bud. business_unit_dim_id = sf. business_unit_dim_id
Join Date_Dim dd on dd.date_dim_id = sf.date_dim_id
Where dd.Year_YYYY = 2012
Group by bud.bus_unit_id
Having sum(sales_amt) > 100000)
Group by bud.business_unit_name, dd.quarter_name
Order by bud.business_unit_name, dd.quarter_name

This example shows us how to trend data over time using a star schema and that it’s the date dimension that give us the ability to do that.  It doesn’t depend on SCD changes and it shows we often need to query a star schema to get the right subset of data we want first, and then we query the star again to get the ‘report’ data.  It also shows us that a healthy library of queries that we can use as subqueries can greatly improve our query development process.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: