T-SQL Tuesday and APPLY Knowledge: An Essential tool for BI Development

Business Intelligence solutions require the arrangement of data in ways that that are different from our OLTP databases.  Navigating the source databases can often require advanced T-SQL techniques and the APPLY operator is a great asset to the BI developer’s toolkit.

Here’s an example.  Consider how the APPLY operator can be very useful to you as you move data from one database to the data warehouse. You’re developing ETL and have to perform complex data transformations to populate a star schema and the requirements call for a dimensiontable to have a particular piece of data at a point in time.  Do you want to get the largest purchase made by a customer within the last year?  Would you like to see date of the first time the customer made a purchase larger than $100?  Or the average of their last five purchases?

Below is a Customer Dimension table that I want to populate.  The MaxPurchase column contains the largest purchase made by a customer on a single order within the last year and MaxPurchaseDate has the date of that purchase, and I’ll focus on these columns in my example:

I’ll show how I think most developers would approach this problem, and then I’ll demonstrate how to do it with APPLY.  Often, a developer’s first inclination is to start with a standard join and then tweak it to get the right answer:

SELECT c.CustomerId, c.CustomerName, oh1.OrderDate, oh1.PurchaseAmt  FROM Customer c
INNER JOIN OrderHeader oh1 ON oh1.CustomerId = c.CustomerId
WHERE oh1.OrderDate BETWEEN ’04/01/2010′ AND ’03/31/2011′

Of course, this returns every order for every customer for the year and that’s not what I want.  I only need the order with the maximum purchase amount, so I wrap the original query and do something like this:

SELECT CustomerId, CustomerName, OrderDate, MAX(PurchaseAmt) AS MaxPurchase FROM
(SELECT c.CustomerId, c.CustomerName, oh1.OrderDate, oh1.PurchaseAmt
FROM Customer c
INNER JOIN OrderHeader oh1 ON oh1.CustomerId = c.CustomerId
WHERE oh1.OrderDate BETWEEN ’04/01/2010′ AND ’03/31/2011′) AS tbl
GROUP BY CustomerId, CustomerName, OrderDate
ORDER BY CustomerId

At first this looks right, but then I realize I’m still getting more than one order per customer, since this query only returns the maximum purchase for a single date for a customer, but not the max purchase for the year. Where do I go next? Maybe I start to add row numbers, or I add a join back to the OrderHeader table, or I add a subquery.

Instead, let’s see why using the APPLY operator gives us the most elegant solution:

SELECT CustomerId, CustomerName, p.OrderDate, p.PurchaseAmt
FROM Customer c
CROSS APPLY (SELECT TOP 1 oh.OrderDate, oh.PurchaseAmt FROM OrderHeader oh
WHERE oh.CustomerId = c.CustomerId AND
oh.OrderDate BETWEEN ’04/01/2010′ AND ’03/31/2011′
ORDER BY PurchaseAmt DESC) p
ORDER BY CustomerId

Now I have a concise query that allows me to populate my Customer dimension with the maximum purchase and the date it occurred. The only choice I have to make is whether to use CROSS APPLY or OUTER APPLY. The former returns only customers that had a purchase within the last year, the latter returns all my customers even if they didn’t make a purchase during the time frame (OrderDate and PurchaseAmt would return as NULL, just like a LEFT OUTER JOIN that filed to join).

What about performance? That probably deserves another blog post to compare in detail with other solutions. However, if the OrderHeader table is indexed on CustomerId and OrderDate (descending) then performance would be satisfactory, and might be the best of the alternatives.

The Apply operator: Elegant, concise, essential for Business Intelligence, useful in so many scenarios.


2 responses

  1. Thanks for posting, Carlos. I know I’ve found myself writing layer upon layer of query just to get the info I’ve wanted. Next time, I’ll think about APPLY.

  2. […] Bossy ( Blog ) shows how APPLY can elegantly replace layers of subqueries when querying for that “right” […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: