Starting something new is hard, whether it’s an exercise regimen, a diet, or studying for certification exams. Getting to the finish and attaining the desired goal is even harder. Completing a predictive analytics initiative is no different, and the first one is particularly hard.
You’ve spent time on research, reading and investigation, and you see the upside. You know there can be a high return on investment, and a successful outcome can change the culture of your organization. But you’re a DBA, a BI practitioner, or an analyst, and predictive analytics is new to you and your company. You want it to succeed, but how do you go about making sure it does?
There are characteristics that successful initiatives have in common, and the ones listed here are what I’ve seen lead to success.
Tightly couple the team with BI
The predictive analytics initiative should be part of the BI team, or work very closely with it. Why? The BI team knows where the data is, how to get it, understands its quality, and has already acquired much of it. There will be some data needed that BI people don’t care about, but 80% of it will overlap.
Hire a Data Scientist or assign the role
Make someone the data scientist, either by title or role. You’ll have to decide whether to bring them in from outside the organization, or assign someone with the essential skills, but make sure the role is occupied.
Buy, Open Source, or use what you own
Don’t get hung up on which tools to use. There are many good tools out there and you probably already own some of them, so use what you know and have. Do you have to use R to be successful? No, although much of what you read makes it sound like you do. But also consider which open source tools may help augment your development suite.
Choose a Focused Goal
Focus the first initiative you pursue on solving a single problem. This is true of most BI development. Just as we shouldn’t build a whole data warehouse in one shot, don’t try to solve every predictive problem out there. Choose a problem that management will get behind, such as churn or upselling.
Give it Time
Getting good results requires time. It takes anywhere from 4 to 12 months to put a solution into production.
All the parts that go into a successful initiative will be covered at my pre-con at the PASS Summit titled Predictive Analytics in the Enterprise.
I spent last week at the PASS Summit in Seattle and it was an experience worth writing about. I presented two sessions, which I’ll also talk about, but I want to cover the greater event first. This conference brings people together better than any conference I’ve attended, and it isn’t even close. A great effort is made by the Summit organizers, volunteers, and attendees to keep you busy either attending sessions or meeting with people. I participated by being a first-time attendee mentor and heading up a birds of a feather table at Friday’s lunch, but there were so many activities that I couldn’t be part of them all. Yet it seems I spent the whole week, outside of sessions, talking to old friends and meeting new people from 8 AM until 10 PM every day. The talk ranged from mentoring people new to the SQL Server world to discussing deep technical issues, new business opportunities, and ideas for SQL Saturdays and user groups. This is the true value of the Summit that no other conference provides. When you combine the networking at the Summit with all of the associated local and regional events such as user groups and SQL Saturdays, you really get to know people in the community in a personal way.
I had the privilege of presenting two sessions at the Summit this year. I presented Real-time Data Warehouse and Reporting Solutions on Friday and to my surprise, the room was so full that the moderator had to close the door 10 minutes before the scheduled start time. There were even people sitting on the floor! The attendance and the participation were great to see. There was great interaction with the audience during the session and 30 minutes of follow-up questions afterwards. The demo for this session is really difficult to get working during the presentation, but I’m happy to say that I got it running!
My other session, on Thursday morning, was titled Data Modeling Best Practices for Enterprise Tabular Models. The session also had a full house, although it wasn’t so full that the moderator needed to close the doors. There was also a lot of interaction during this session and questions afterwards. My only regret about this presentation is that one of my DAX queries didn’t work. I had a cheat sheet so I could copy and paste longer code and avoid typing in front of the audience, but even this failed. I must have made a stray keystroke into the cheat sheet when I was reviewing the presentation beforehand in the speaker room. I reviewed it later and it turned out I had removed a parenthesis from the middle of the DAX code. It’s really difficult to debug code during a session, and after a couple of quick attempts and suggestions from the audience I decided to move on.
I should have looked on my own blog! The correct code was right there since I had blogged about the topic a week earlier, but I felt compelled to move on and not hold up the session any longer since the point I wanted to make with the failing DAX code wasn’t foundational to the presentation. However, it’s always disappointing when something like this happens because I spent so much time preparing. It makes me think I should prepare my demos using techniques from the Food channel, where I have a version already baked that I can switch to if the live one fails during the presentation. I’ll blog about the failed code later this week since it reinforces some of the points I was making regarding the BI Architect’s decision-making process for Enterprise Tabular Models.
I’ve written a couple of blog posts this week on Tabular Models, and for a good reason. I’m presenting a session called Data Modeling Best Practices for Enterprise Tabular Models at the PASS Summit next week, so naturally it’s a topic that’s on my mind. But I’m also presenting another session on how to develop a Real-time Data Warehouse, so I decided that I should write a post on that topic too, since it’s also been on my mind.
Real-time DW Tenets
Instead of discussing the technology implementation and associated difficulties of loading a data warehouse in real-time, I thought I would start with the three tenets that I think everyone needs to consider before embarking on a project like this. Everyone develops a real-time solution differently based on their needs, technology, hardware, and other factors, but there are a few basic ideas to keep in mind no matter how you go about it.
Process only the data you must and nothing more. To meet the challenge of real-time, don’t do anything extra, and don’t touch any data you don’t have to touch. You always want to handle as little data as possible while still getting the job done.
Don’t impact the Source Systems. Contention with the source databases will almost certainly cause the real-time process to fail. Make sure your real-time processes don’t use resources that are needed by the applications that generate the source data. Try to be as invisible to them as possible.
Take advantage of what SQL Server does for you. SQL Server is a rich product with a great number of features and tools that can help you with this endeavor, so take advantage of them. In my session I discuss how Replication, CDC, SSIS and other tools can be used as part of the solution. Don’t write code you don’t have to write.
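The first tenet, processing only the data you must, can be sketched with a simple high-water mark. This is an illustration under stated assumptions rather than the approach from my session: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the order_changes table, the fact_orders table, and the load_new_rows function are hypothetical names invented for the example. On SQL Server, a feature such as CDC would typically supply the change rows and the ordering value.

```python
import sqlite3


def load_new_rows(source, warehouse, watermark):
    """Copy only the change rows above the last processed watermark.

    Returns the new watermark and the number of change rows processed.
    """
    rows = source.execute(
        "SELECT change_id, order_id, amount FROM order_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (watermark,),
    ).fetchall()
    if not rows:
        # Nothing changed, so the warehouse is not touched at all.
        return watermark, 0
    # Upsert by order_id so a later change overwrites the earlier value.
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders (order_id, amount) VALUES (?, ?)",
        [(order_id, amount) for _, order_id, amount in rows],
    )
    warehouse.commit()
    return rows[-1][0], len(rows)


def demo():
    # Hypothetical source and warehouse databases, both in memory.
    source = sqlite3.connect(":memory:")
    warehouse = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE order_changes ("
        "change_id INTEGER PRIMARY KEY, order_id INTEGER, amount REAL)"
    )
    warehouse.execute(
        "CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    source.executemany(
        "INSERT INTO order_changes VALUES (?, ?, ?)",
        [(1, 100, 25.0), (2, 101, 40.0)],
    )
    # First pass picks up both existing changes.
    watermark, first_batch = load_new_rows(source, warehouse, 0)
    # A new change arrives for an order that is already in the warehouse.
    source.execute("INSERT INTO order_changes VALUES (3, 100, 30.0)")
    # Second pass reads only the one row above the watermark.
    watermark, second_batch = load_new_rows(source, warehouse, watermark)
    total = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    return first_batch, second_batch, watermark, total
```

Calling demo() runs two passes: the second pass reads a single change row instead of rescanning the whole source table, which is the point of the tenet. Reading from a separate change table rather than the live application tables is also one way to stay out of the source system's way.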
PASS Summit 2012
To see me build a functioning real-time data warehouse in real-time, come to my presentation at the PASS Summit on Friday, November 9. The session is at 9:45 AM and is titled Real-Time Data Warehouse and Reporting Solutions.