The Business Process Workflow Data Warehouse – Part 2
In part 2 of The Business Process Workflow Data Warehouse I will review some of the problems encountered with data quality when developing this type of warehouse. Data quality is often a sizeable problem the data architect needs to solve before deploying a data warehouse. Poor data quality in any data warehouse can be a showstopper that causes major delays and even outright failure. Be prepared for these problems to be worse in the Business Process Workflow-oriented Data Warehouse. The points numbered 4-6 below are the second of six points I made earlier and they address the quality of your data in different ways:
4. Workflow data can be removed or hidden due to incorrect data entry, legal issues, or duplicate data entry.
5. There are no standard practices for data as there are with financial, accounting and sales applications.
6. Data is often much more poorly entered and verified.
The central theme to these three statements is that workflows are vague and imprecise when compared with their transactional counterparts. Businesses are dependent on their transactional data. An order can’t be processed unless the transaction has been entered into a software application. However, a child can be visited by a social worker without entering any data into their computer system. If the social worker visits the child and enters the wrong date for the visit, nothing is missed, except the measure is reported incorrectly. Two rules of thumb to keep in mind when working with workflows:
o Data entry for workflows can be wrong while the work performed was correct.
o Data entry often occurs after the work has been performed.
To support this ‘flexible’ type of data entry requires a change in the data architect’s way of thinking. Your design must allow for data to be entered after the fact. Your design must allow for data to be corrected. Your design must allow for partial data to be entered for an activity, and then updated later with more data for the same activity.
This approach can be hard to accept if you’ve developed a transactional data warehouse before. You will break rules that you previously thought were untouchable, but accepting these ‘broken’ rules will put more accurate measurable data into your data warehouse.
The Business Process Workflow Data Warehouse – Part 1
The Business Process Workflow Data Warehouse
There are a number of different ‘types’ of data warehouses. I say this with trepidation because a data warehouse is usually not referred to as having a type. Perhaps I should rephrase my statement; there are a number of different types of data marts. A data warehouse can contain multiple data marts, each with a different aim. For the purposes of this discussion, I’m talking about a data warehouse as if it has a primary objective, which in my experience most do.
Read any book on data warehousing, look at a blog with a sample star schema, or take an introductory data warehouse design class, and you will see the same pattern. You will see a fact table representing a well-defined transaction surrounded by several dimensions. The fact table will be easy to understand; it usually reflects a common transaction that has been mapped from an application in a straightforward manner. It will look something like this:
In the diagram above, the fact table is derived from a sales transaction that occurred in an OLTP database. This star schema is a good learning tool and it’s a pattern that can be reused for many transactional processes, but let’s talk about business process workflows.
Examples of Business Process Workflows
Business Process Workflows happen over periods of time, are vague in their scope, and acquire new or changing data as the process moves forward. Also, it can be uncertain when they are actually complete. These are a few examples of the kinds of business processes I’m referring to:
– A financial services company receives a request from a customer to open a new 401k account.
– A social worker places a child in a foster home.
– A laptop in sent in to the manufacturer for warranty repair.
What’s different about the Business Process Workflow data warehouse vs. the Transactional data warehouse? The following are significant distinctions:
1. Workflows span long periods of time, some take hours, but others months or even years.
2. Workflows don’t have definite time periods and some may not end.
3. Workflows almost always require accumulating facts that must be updated after they are stored when more data arrives.
4. Workflow data can be removed or hidden due to incorrect data entry, legal issues, or duplicate data entry.
5. There are no standard practices for data as there are with financial, accounting and sales applications.
6. Data is often much more poorly entered and verified.
In the next several days, I will discuss each of these points in more detail, review why they make the Business Process Workflow data warehouse a true challenge, and discuss solutions for each. There is a paradigm shift that must be grasped and some deep-seated data warehouse modeling principles must be relinquished.