I’ve written a couple of blog posts this week on Tabular Models, and for good reason. I’m presenting a session called Data Modeling Best Practices for Enterprise Tabular Models at the PASS Summit next week, so naturally it’s a topic that’s on my mind. But I’m also presenting another session on how to develop a Real-time Data Warehouse, so I decided I should write a post on that topic as well.
Real-time DW Tenets
Instead of discussing the technology implementation and associated difficulties of loading a data warehouse in real-time, I thought I would start with the three tenets that I think everyone needs to consider before embarking on a project like this. Everyone develops a real-time solution differently based on their needs, technology, hardware, and other factors, but there are a few basic ideas to keep in mind no matter how you go about it.
Process only the data you must and nothing more. To meet the challenge of real-time, don’t do anything extra, and don’t touch any data you don’t have to touch. You always want to handle as little data as possible while still getting the job done.
Don’t impact the Source Systems. Contention with the source databases will almost certainly cause the real-time process to fail. Make sure your real-time processes don’t use resources that are needed by the applications that generate the source data. Try to be as invisible to them as possible.
Take advantage of what SQL Server does for you. SQL Server is a rich product with a great number of features and tools that can help you with this endeavor, so take advantage of them. In my session I discuss how Replication, CDC, SSIS and other tools can be used as part of the solution. Don’t write code you don’t have to write.
PASS Summit 2012
To see me build a functioning real-time data warehouse in real-time, come to my presentation at the PASS Summit on Friday, November 9. The session is at 9:45am and is titled Real-Time Data Warehouse and Reporting Solutions.
On Friday I blogged about how to create a composite key to establish a relationship between two tables imported into a Tabular Model from different data sources. My example used a report extract table containing sales data in a relational database and an Excel spreadsheet with sales projections. After making the relationship, the first thing you’ll want to do with it is to compare the actual sales data with the sales projections to see if you are meeting sales targets. Since projections were made by store and year, you’re going to want to compare them to the total sales by store and year.
Sales Difference Calculation
To make the comparison you have to write DAX code that sums the actual sales data and then subtracts the projection to determine by how much the actuals exceeded or under delivered against the projection. To do this we’ll create a new column in the SalesProjections table and call it SalesProjDiff. The column is defined using this DAX code:
=SUMX(FILTER(ReportExtract, ReportExtract[YearStoreType] = SalesProjections[YearStoreType]), ReportExtract[SalesAmount]) - SalesProjections[ProjectedSales]
This new column uses the SUMX function to sum sales by year and store type and then subtracts the sales projection for the same year and store. To ensure that the correct sum is compared with the right projection value, make sure the filter uses the composite key column, YearStoreType, that you created in each table to define the relationship between the two tables. You now have a column called SalesProjDiff that users can access in client visualization and reporting tools. This allows them to see how sales have done versus projections in the past, and also to measure how current sales are tracking against future projections.
I’ll cover more data modeling scenarios like this one using Tabular Models in my session at the PASS Summit titled Data Modeling Best Practices for Enterprise Tabular Models at 10:15 AM on Thursday, November 8.
One of the best features of Tabular Models in SQL Server 2012 is that they can include data from multiple sources such as relational databases, cubes, Excel files, or just about anything else you can think of. The logical data model of the underlying data is not dictated by this new version of Analysis Services, thereby allowing a Tabular Model to be developed with practically any existing data that’s available to you.
Faster Ramp Up
This makes it easier to get started with Tabular, especially if you’re working with a single data source that has a consistent data model. But what about multiple data sources that are modeled differently? For example, most of the detail data you need might be in a report extract table, while some budget data lives in an Excel file. To handle this properly and to avoid having a collection of disparate data in the Model, the data modeling process needs to be done after the import takes place, and one of the ways to do this is to create the right relationships among the various entities.
Here’s the preexisting Report Extract table you’re working with. Notice that it contains sales data by store, product, brand and date.
One of the people in the finance department has this spreadsheet, which has sales projections by store type and year for the past, present, and future. You want report developers who use your Tabular Model to compare the actual to projected sales.
The two sets of data aren’t related, so you can’t do the actual-vs-projected comparisons you need to do. The problem is that neither table has a unique key you can use to create a relationship between the two tables. To solve this, you’ll create a composite key from two columns in each table; that composite key will serve as a unique key in one of the tables, which allows you to create the relationship.
Add a column called YearStoreType to the Report Extract table that’s the concatenation of the year and store type columns:
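The formula itself is a simple concatenation. Here’s a minimal sketch, assuming the source columns are named Year and StoreType (substitute your actual column names):

```dax
=ReportExtract[Year] & "-" & ReportExtract[StoreType]
```

Putting a delimiter between the two values is a good habit: it prevents different year/store-type pairs from accidentally concatenating to the same string.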
Add another column, this time to the Sales Projection table, also called YearStoreType (the column names don’t have to match for the relationship to work), and define it the same way:
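A sketch of the matching expression on the Sales Projections side, again assuming the columns are named Year and StoreType:

```dax
=SalesProjections[Year] & "-" & SalesProjections[StoreType]
```

Whatever names your columns actually have, the important thing is that both expressions produce identical values for the same year and store type, since those values are what the relationship will match on.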
Now create the relationship between the two tables, using the Sales Projections table as the lookup table.
Add measures like sums, averages, and any other calculations you need to have in your model. When you deploy it and load it as a pivot table in Excel, you’ll be able to easily drag and drop the sales projections and compare them with the actual sales data for each year and store type. This illustrates a key point: because Tabular Models don’t require source data to be modeled in any particular form, it’s up to you to define the data model within the Tabular Model itself so that it makes sense to report developers and Excel users.
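As a starting point, a few basic measures might look like this, using the SalesAmount and ProjectedSales columns from the earlier formula (measure names here are just illustrative):

```dax
Total Sales:=SUM(ReportExtract[SalesAmount])
Average Sale:=AVERAGE(ReportExtract[SalesAmount])
Total Projected Sales:=SUM(SalesProjections[ProjectedSales])
```

With Total Sales and Total Projected Sales defined as measures, Excel users can put them side by side in a pivot table and let the year and store type fields slice both at once.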
To learn more about data modeling in Tabular Models, attend my session at the PASS Summit titled Data Modeling Best Practices for Enterprise Tabular Models at 10:15 AM on Thursday, November 8.