The Strata Conference started yesterday (February 28) with a day of Tutorials, Jumpstarts, and Deep Data sessions. I attended two half-day tutorials: "Hadoop Data Warehousing with Hive" by Dean Wampler and Jason Rutherglen (both of Think Big Analytics), and "The Two Most Important Algorithms in Predictive Modeling Today" by Jeremy Howard (Kaggle) and Mike Bowles.
The morning session focused on Hive, the SQL-like language used to perform analysis and create reports on data stored in a Hadoop data warehouse. The session was very informative, although perhaps a little too ambitious. Because it was a tutorial, the speakers prepared a virtual machine that attendees could install on their own laptops to run the same statements as the speakers. Setting it up proved difficult, and like much of the audience, I found it easier and more valuable to simply follow along and watch the presentation. There were also a few demo glitches, and there was more material than the allotted time allowed. Even so, I got a lot out of the session. It clarified the business cases where Hive is appropriate and highlighted its strengths and limitations.
I really liked the afternoon session. Jeremy Howard and Mike Bowles each presented their favorite predictive modeling algorithm. Jeremy talked about how he has used Random Forests to solve many types of problems and did a demo showing how to predict who was likely to live or die on the Titanic. Mike discussed the glmnet algorithm and showed how to apply it to sonar data to find oil under the ocean.
The next two days of the conference are filled with shorter 40-minute sessions. I'm looking forward to learning a lot more about analytics and big data.
Have you voted yet for the sessions you'd like to see at SQL Rally? If not, cast your vote here so you can get the most out of your time this May in Dallas. All it takes is a PASS membership and a few minutes.
I have a session on the ballot called Using Columnstore Indexes in SQL Server 2012, and the session abstract is below for you to review. If you aren't sure whether to vote for my session, or whether you need to attend a session on columnstore indexes at all, here is one really good reason.
The Reason for Learning about Columnstores
Columnstore indexes matter because, with SQL Server 2012, DBAs and BI architects will have to seriously consider them as an alternative to cubes (or even a replacement for them) and as the primary indexing scheme for star schemas, reporting subsystems, data warehouses, and other reporting and analysis scenarios. Making the right decision will require in-depth knowledge of how these indexes work. That's what you'll get if you attend my session.
Session Abstract: Using Columnstore Indexes in SQL Server 2012
Columnstore indexes in SQL Server 2012 will allow you to significantly reduce the processing time of common data warehousing queries without building cubes or aggregate tables or resorting to other techniques normally used to improve performance. This session will show how to implement this new type of index in SQL Server and demonstrate its advantages over traditional solutions. Carlos will also discuss the scenarios in which columnstore indexes should be used to build powerful yet flexible BI solutions.
See this blog post I wrote several weeks ago for more information on columnstore indexes.