Category Archives: SQL Saturday

Columnstore Indexing at SQL Saturday #104

SQL Server 2012 will give us more index options by adding the Columnstore index. This post is an overview of this new feature, and it’s also a prelude to the presentation I’m delivering on Columnstore indexing January 7, 2011 at SQL Saturday #104 in Colorado Springs, CO.

My presentation, Using Columnstore Indexes in SQL Server 2012, will include demos that cover the important things to know about Columnstore indexing. If you haven’t heard much about this type of index structure, here are a few things to know:

–          Intended for Data Warehousing, Reporting and Ad-hoc Queries
–          Does not use B-Tree structure like row indexing
–          Can only have 1 per table
–          Does not allow table to be updated
–          Best built with columns that have non-unique, repetitive data
–          More advantage is gained from bigger data
–          Column order doesn’t matter
–          Maximum benefit gained with queries that aggregate data
–          Don’t use for single-row type queries
–          Memory-intensive to build and execute

It’s Fast

Why build a Columnstore index? It can be very fast to query, much faster than the typical row-based indexing we are used to. They’ll make reporting faster, and they can be part of the solution for the elusive ad-hoc query challenge that power business users want.

Creating a Columnstore

A Columnstore can and probably should contain every column in the table with a few exceptions.  Unique data columns in some cases should be avoided.  Also, creating a Columnstore index requires a lot of memory, so you can reduce the stress on your system by not including every column.

Using the Index

The SQL Server optimizer will decide if it can benefit from using the index just like it does with other indexes. You’ll be able to check the execution plan to see if SQL Server decided to use it.

Here’s a preview of the type of comparisons, analysis and best practices I’ll be presenting at SQL Saturday #104. I’ll focus on comparing this new type of index with row-based indexes and I’ll help you determine when the right time to use it is.

This query uses the AdventureWorksDW and is a common type of query when using a dimensional model. It aggregates data from a fact table while joining to dimension tables:

SELECT dp.EnglishProductName, dp.ProductLine, dd.CalendarQuarter,
sum(SalesAmount) as TotalSales
FROM FactInternetSales f
inner join DimProduct dp on dp.ProductKey = f.ProductKey
inner join DimDate dd on dd.DateKey = f.OrderDateKey
where dd.CalendarYear = 2004
group by dp.EnglishProductName, dp.ProductLine, dd.CalendarQuarter
order by dp.EnglishProductName, dp.ProductLine, dd.CalendarQuarter

This is the I/O and CPU usage:

CPU time = 200 ms,  elapsed time = 493 ms.
Table ‘DimProduct’. Scan count 0, logical reads 516, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘FactInternetSales’. Scan count 1, logical reads 2062, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘DimDate’. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the execution plan which is showing us that SQL Server is doing a clustered index scan on the fact table:

Columnstore Query

I’ll run the same query after adding a Columnstore index that includes every column in the FactInternetSales table.  Below is an abbreviated version of the DDL:

CREATE NONCLUSTERED COLUMNSTORE INDEX CX_FACT_INTERNET_SALES ON dbo.FactInternetSales
(
ProductKey,
OrderDateKey,
DueDateKey,

)

Executing the same query produces the following results:

CPU time = 80 ms,  elapsed time = 343 ms.
Table ‘DimProduct’. Scan count 0, logical reads 516, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘FactInternetSales’. Scan count 1, logical reads 64, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘DimDate’. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As you can see from the execution plan, the query optimizer used the Columnstore index this time.  The CPU utilization was reduced from 200ms to 80ms and the logical reads went from 2062 to 64 on the FactInternetSales table. The disclaimer is that this is not a scientific test, but it’s a quick preview of what Columnstore indexes can do.

SQL Saturday #104

This is a snippet of what you’ll see if you attend my presentation in Colorado Springs. I’ll use larger tables to demonstrate how to get the most out of Columnstore indexes, how to use them with a partial set of columns and with partitioning, and I’ll compare them to their equivalent row-based queries.

SQL Saturday #92 – Portland

With the PASS Summit just a few days after SQL Saturday #92 in Portland, OR, and just 180 miles north, you might think that the Portland event is just a warm-up, like going to a minor league baseball game, but that’s nothing further from the truth.  You can always count on Arnie Rowland putting together a first-class event and attracting speakers and sponsors of PASS Summit quality.  Take a glance at the October 8 roster of speakers and sessions and you’ll see what I’m talking about.  If the great schedule isn’t enough, Portland is a really fun, beauitful town with tons to do and some of the best food in the country.  I’ve had the opportunity to travel to Portland in the last 2 years because of a consulting engagement in the area and I’m happy to have had a chance to take in the scenery, sample the restaurants and enjoy the ambiance.

If you’re from out of town and new to Portland, take a stroll downtown and along the Willamette river.  Pop in to a local coffee shop and drink the city’s own Stumptown coffee.  Go to Powell’s bookstore for one of the best book browsing experiences you can have.  Make sure you take in the view of Mt. Hood if it’s a clear day.  You can always go to the nationally famous Voodoo Donuts where I find the standing in line and meeting people experience more interesting than the donuts themselves.  And if you want good food and drink you can’t miss if you partake at Pine State Biscuits, A Pizza Scholls, Deschutes Brewery, Moroccan cuisine at Dar Essalam in Wilsonville, or Petite Provence, probably the best french boulangerie/patisserie in the world (sorry France!)  I don’t think I’ve had a disappointing meal yet in Portland.   To top it off, the SQL Saturday event itself is on the waterfront.

See you there and don’t forget to come to my session Real-time Data Warehouse and Reporting at 10:15 in the morning.  I’m going to cover numerous topics in a little over an hour; SQL Server Replication, CDC, Star Schemas, SSIS, and Reporting, all in real-time.

SQL Saturday #94 – Salt Lake City

I’ve been too busy to post lately and now SQL Saturday #92 in Portland, OR is upon us.  Since I didn’t recap SQL Saturday #94 in Salt Lake City on September 10 and thank everyone, I’m going to take this opportunity to do so before the event in the rose city this weekend.  The gathering in Salt Lake was one of the smaller SQL Saturdays I’ve attended, but that has its advantages because the sessions allow for a much more intimate setting and better interaction among participants.  I had 15 people attend the preview of my PASS Summit session See the Future with Visual Predictive Analytics and I feel like I got to know everyone who was there.  I got to spend a time talking to interesting people like Paul Turley, the Reporting Services author, who I had met before but this time had the chance to get to know personally.  A very gracious than you goes to the organizers, especially Pat Wright, Tjay Belt, and all the others who worked hard to make it happen.

What made this SQL Saturday special was the presence of the continuing participation of the Colorado community that has been so active in these type of gatherings.  People that have become good friends volunteered their time as speakers including Marc Beacom, Jason Horner, Chris Shaw, Mike Fal, and Gabriel Villa.  It’s been a blast to be part of this group and I look forward to many more.  And if you haven’t been to a SQL Server event, your missing out.  There is something unique  about this community and you have to just see for yourself.

Now on to Portland…