Performance of Analytic Functions in 2012

In my session Fast-track to BI Analytics with SQL Server 2012, I showed some of the new analytic functions that were released with SQL Server 2012 this past weekend at SQL Saturday #131 in Phoenix.  One of the questions that came up during the session was performance, and would these functions be poor performers.  My response was that I hadn’t measured performance to any extent, but I would expect them to perform better than the convoluted queries we had to write to get the same results without these functions, especially since the new process takes place inside the DB engine.

I decided to look at one of my simpler demo queries and see how it performed.  I used the ContosoRetailDW database for this query, which sums sales by day for each day in 2008 and show the sales for the day, the sales 7 days in the future, and the sales 14 days in the past for comparison.

SELECT OrderDateKey, DailySales,
LEAD(DailySales,7,0) OVER (ORDER BY OrderDateKey ) [7_Days_Future],
LAG(DailySales,14,0) OVER (ORDER BY OrderDateKey ) [14_Days_Past]
FROM
(SELECT f.DateKey as OrderDateKey, SUM(SalesAmount) as DailySales
FROM FactOnlineSales f
inner join DimDate dd on dd.DateKey = f.DateKey
where dd.CalendarYear = 2008
group by f.DateKey) as DailyTable

order by OrderDateKey

These are the statistics for CPU and I/O used by the query.  One thing that was noticeable is that the DB engine made good use of the quad-core CPU on my laptop.

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0…
Table ‘FactOnlineSales’. Scan count 9, logical reads 46937, physical reads 0, read-ahead reads 0…
Table ‘DimDate’. Scan count 9, logical reads 334, physical reads 0, read-ahead reads 0…

SQL Server Execution Times: CPU time = 18094 ms, elapsed time = 2885 ms.

I coded the query without the LAG and LEAD functions and did it in a way I think a lot of developers would do it, with a number of self-joins:

with SalesTbl(OrderDateKey, DailySales) as
(SELECT f.DateKey as OrderDateKey, SUM(SalesAmount) as DailySales
FROM FactOnlineSales f
inner join DimDate dd on dd.DateKey = f.DateKey
where dd.CalendarYear = 2008
group by f.DateKey)
SELECT st1.OrderDateKey, st1.DailySales, st2.DailySales as [7_Days_Future], st3.DailySales as [14_Days_Past]
FROM SalesTbl st1
LEFT JOIN SalesTbl st2 on dateadd(d,7,st2.OrderDateKey) = st1.OrderDateKey
LEFT JOIN SalesTbl st3 on dateadd(d,-14,st3.OrderDateKey) = st1.OrderDateKey
order by st1.OrderDateKey

This query performed much worse, doing 3 times as many logical reads and using quite a bit more CPU.

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0…
Table ‘FactOnlineSales’. Scan count 27, logical reads 140767, physical reads 0, read-ahead reads 0…
Table ‘DimDate’. Scan count 27, logical reads 1002, physical reads 0, read-ahead reads 0…

SQL Server Execution Times:
CPU time = 64317 ms, elapsed time = 9622 ms.

I refactored the query to make it perform better by putting the aggregation subquery into a table variable clustered by date, and then I used the table variable to perform the self-joins:

declare @SalesTbl table
(OrderDateKey datetime primary key clustered,
DailySales int)

insert into @SalesTbl
SELECT f.DateKey as OrderDateKey, SUM(SalesAmount) as DailySales
FROM FactOnlineSales f
inner join DimDate dd on dd.DateKey = f.DateKey
where dd.CalendarYear = 2008
group by f.DateKey

SELECT st1.OrderDateKey, st1.DailySales, st2.DailySales as [7_Days_Future], st3.DailySales as [14_Days_Past]
FROM @SalesTbl st1
LEFT JOIN @SalesTbl st2 on dateadd(d,7,st2.OrderDateKey) = st1.OrderDateKey
LEFT JOIN @SalesTbl st3 on dateadd(d,-14,st3.OrderDateKey) = st1.OrderDateKey
order by st1.OrderDateKey

This performed much better, doing the same number of logical reads as the query using LAG and LEAD but using less CPU.  Interestingly though, the query didn’t make as good of a use of my quad-core environment while populating the table variable, so it responded 3 times slower in real people time than the one using the analytic functions.

Table ‘#A43DC7FA’. Scan count 0, logical reads 733, physical reads 0, read-ahead reads 0…
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0…
Table ‘FactOnlineSales’. Scan count 1, logical reads 46535, physical reads 0, read-ahead reads 0…
Table ‘DimDate’. Scan count 1, logical reads 113, physical reads 0, read-ahead reads 0…

SQL Server Execution Times:
CPU time = 9828 ms, elapsed time = 9848 ms.

(366 row(s) affected)

(366 row(s) affected)
Table ‘#A43DC7FA’. Scan count 3, logical reads 1468, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 93 ms, elapsed time = 101 ms.

Based on this little comparison, it’s clear that the LAG and LEAD functions are tuned for fast performance, and I would assume the rest of the analytic functions are too.  I didn’t spend any time trying to tune the original query I showed, so I wonder if I could squeeze more performance out of it if I wanted to.  But as is, it performs good enough, and probably better than most queries doing this kind of thing without the analytic functions.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: