In my previous post I introduced a particular type of Data Warehouse, one focused on business processes. I pointed out six differences between a traditional, transaction-based data warehouse and the business process workflow-oriented warehouse. This section will cover the first 3 items in the list:
1. Business processes span long periods of time, some take hours, but others months or even years.
2. Workflows don’t have definite time periods and some may not end.
3. Workflows almost always require accumulating facts that must be updated after they are stored when more data arrives.
These assertions force us to think about fact tables differently than we are accustomed to. Business process workflows modeled as fact tables will often be factless, but with a large number of dimensions. Business processes often don’t have an obvious value that should be measured, so the important value to measure is the ‘fact’ that the business process has occurred or is occurring, and the length of time the workflow has taken to complete. These types of fact tables become the norm, instead of the exception that they are in the transactional data warehouse.
This type of table is known as an accumulating fact table, as termed by the widely used Kimball methodology for developing data warehouses. An accumulating fact table is updated as new data emerges when all of the data is not present at the time the fact row is created. For example, I worked on a data warehouse where children were placed in foster homes by social services. The placement of a child prompts the creation of a fact row in the data warehouse, but several columns of the fact could not be populated, since they had not yet occurred. Data such as the end date of the placement and the placement end reason are not known until the placement comes to an end at some indeterminate date in the future. However, creating the fact at the time the placement begins has value since it can be used to measure the number of children in placement in various ways using a number of dimensions. The process from beginning to end often takes months and can take years.
Don’t worry about having to update this kind of table several times after it’s been stored. This will be a common event in a business process workflow data warehouse. In the example of the child’s placement, several checkpoints along the way generate more data. The dates of the checkpoints and any relevant data about the checkpoints is captured and the fact row is updated with the new data. Before the late arriving data arrives, any dimension pointers for that data will point to a dimensional value called ‘Unknown Value’ or ‘Data not Available’. This provided our users with a good way to know that the placement hasn’t ended yet.