Category Archives: SQL Saturday

What to expect in a day of Predictive Analytics

SQL Saturday is coming this weekend in Baton Rouge, LA, and I’m presenting a full-day pre-con on Friday on Predictive Analytics.  It’s a full-day deep dive on how to start using Predictive Analytics in your organization, and it’s ideal for people who are brand new to the field.

What should you expect in a full day session like this, besides singing, dancing and a few jokes?

To give you an idea of what’s covered, here’s an outline of the agenda for the day.  I hope you can make it, and there is still time to register by clicking on this link:

Predictive Analytics in the Enterprise

Early Morning

  • Predictive Modeling Overview
  • SQL Server Data Mining, Analysis Service and Excel
  • Predictive Model by Class

Late Morning

  • Case Studies
  • Tools and Algorithms
  • Demo\Lab 1 – Develop Hiring Model

After Lunch

  • Use Models with DMX – C#, SSIS, SSRS
  • Demo\Lab 2 – Develop College Model

Mid Afternoon

  • Demo – Neural Network Code
  • Exercise 1 – Develop Churn Model
  • Demo 3 – Discuss alternative tools such as R, WEKA, RapidMiner

Late Afternoon

  • Deployment and Monitoring
  • Exercise 2 – Develop Movie Recommendation Model
  • Exercise 3 – Develop Stock Price Prediction Model

SQL Saturday #279 in Phoenix – March 29, 2014

If you haven’t noticed, SQL Saturday #279 in Phoenix this weekend has 3 great BI pre-cons that are scheduled for Friday, March 28.  The pre-cons are held at the Microsoft Office in Tempe, AZ and run from 8:30AM – 4:00PM. There is still time to register!

I’ll be presenting Predictive Analytics in the Enterprise, showing you how to start and be successful with a predictive modeling project in your organization.  I’ll provide an overview of predictive analytics, discuss the concepts and ideas behind the subject, use case studies to review successful projects I’ve worked on, and demonstrate how predictive models are built.   I’ll also have a number of labs for the class to use as exercises so you can leave the session having built your own predictive models.  To participate, bring your laptop or plan to share with someone.  I have a lot of material to go through in just one day.

The other two pre-cons will be led by David Stein and Bill Pearson and are listed below.  Also, don’t forget Saturday itself, a full day of free training.  I spoke at this event in 2012 and 2013, and it’s a top-notch gathering of SQL professionals.  I expect it to be the same this year.

Session 1: Fast Track your Business Intelligence Projects with BIML – David Stein

Session 2: Predictive Analytics in the Enterprise – Carlos Bossy

Session 3: Practical Self-Service BI with PowerPivot for Excel – Bill Pearson

Big Data, BI and SQL Saturday in Silicon Valley

In preparing to speak at SQL Saturday in Silicon Valley this week, I was thinking of ways to explain map/reduce in my session.  One way is to look at the parts of map/reduce that are analogous to a SQL query, which most of the attendees will find familiar. A simple SQL statement like the ones we write for business intelligence purposes often has a filter, an aggregation, and a group by.
Here’s an example of a query that has all 3 of these components to sum sales by region and quarter for the year 2013:

select dg.region, dd.quarter, sum(f.sales_amt) as sum_sales
from fact_sales f
join dim_geography dg on dg.dim_geography_id = f.dim_geography_id
join dim_date dd on dd.dim_date_id = f.dim_date_id
where dd.year = 2013
group by dg.region, dd.quarter

When writing the code in a map/reduce environment, the 3 components are still there, just expressed differently.  Any data not needed for the result is filtered out in the mapper function using code, and the group by values are identified as the key in the mapper function.  In the reducer, the aggregation is explicitly coded to sum by the key value.  All three steps take place in a procedural way in this C# code:

public override void Map(string inputLine, MapperContext context)
{
   // Split the pipe-delimited input into an array
   var aryValues = inputLine.Split('|');

   // This is the filter: position 10 in the array is the year of sale. If it isn't 2013, discard the input.
   if (aryValues[10] == "2013")
   {
      // Write the region and quarter as the key (this is the group by), and the sales amount (offset 22) as the value
      context.EmitKeyValue(aryValues[11].ToUpper() + " " + aryValues[12].ToUpper(), aryValues[22]);
   }
}

// Reducer
public class NamespaceReducer : ReducerCombinerBase
{
   public override void Reduce(string key, IEnumerable<string> values, ReducerCombinerContext context)
   {
      // Initialize the sum of sales for this key (region and quarter)
      decimal dSumSales = 0;

      // Loop through each sales amount and sum it
      foreach (string value in values)
         dSumSales += Convert.ToDecimal(value);

      // Write the sum of the sales to the output
      context.EmitKeyValue(key, dSumSales.ToString());
   }
}

Since we’re working without a schema and the data is semi-structured (or unstructured), the challenge is finding the sales amount, sales quarter, and region in each data row.  The example shown assumes the data is pipe-delimited, which makes those fields easier to identify.
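To see the whole mapper/shuffle/reducer flow end to end, here’s a minimal sketch in plain Python (not the C# SDK above) that simulates the same filter, group-by key, and sum, assuming pipe-delimited rows with the year at index 10, region and quarter at indexes 11–12, and the sales amount at index 22:

```python
from collections import defaultdict
from decimal import Decimal

def mapper(line):
    """Filter to 2013 rows and emit ("REGION QUARTER", sales_amt) pairs."""
    values = line.split("|")
    if values[10] == "2013":
        yield (values[11].upper() + " " + values[12].upper(), values[22])

def reducer(key, values):
    """Sum the sales amounts for one key."""
    return key, sum(Decimal(v) for v in values)

# Sample pipe-delimited rows (hypothetical data, padded with placeholder fields)
rows = [
    "|".join(["x"] * 10 + ["2013", "west", "q1"] + ["x"] * 9 + ["100.50"]),
    "|".join(["x"] * 10 + ["2013", "west", "q1"] + ["x"] * 9 + ["49.50"]),
    "|".join(["x"] * 10 + ["2012", "east", "q2"] + ["x"] * 9 + ["75.00"]),
]

# A tiny in-memory "shuffle" step: group the mapper output by key
grouped = defaultdict(list)
for row in rows:
    for key, value in mapper(row):
        grouped[key].append(value)

for key, values in grouped.items():
    print(reducer(key, values))   # ('WEST Q1', Decimal('150.00'))
```

The 2012 row never reaches the reducer, which is the same effect the WHERE clause has in the SQL version.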

To see this in action, join me at 11am in Mountain View this weekend for my session Getting Started with Big Data and BI.