Introduction to Dimensional Modeling for Data Warehousing Part 2, Dimensional Modeling Principles
By Kostis Panayotakis
In part 1 of this article series, we described the general structure of a dimensional model. In the present article we describe the basic design principles of dimensional modeling. Dimensional modeling follows the four steps defined below.
A. Selection of the business process (or processes) whose performance is to be monitored. Priority should go to business processes whose performance is considered critical and for which sufficient relevant data are available (e.g. operational data derived from these processes). The selected business process may relate to a single organizational unit or span more than one organizational unit.
Capturing a single data stream for an ‘end-to-end’ process avoids the capture of overlapping information by different departments, which can lead to many versions of the truth.
B. Determination of the level of detail at which the process shall be monitored (also called the grain statement). The grain statement is the first step in a dimensional model design. Examples of grain statements are:
Each product sold (meaning: an entry shall be created in the fact table for each product sold)
Each new service contract (e.g. insurance contract)
The daily snapshot of the stock in a pharmacy
The accumulated capture of all facts of a transaction that is completed in more than one step (e.g. the lifecycle of a tax transaction: tax statement submission – statement control – tax clearance – payment – final payment)
Based on the grain statement, one can derive the facts which should be stored in the fact table as well as the ‘surrounding’ dimensions.
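As a rough illustration of how a grain statement drives the design, the sketch below builds a tiny star schema for the grain "one fact row per product sold", using Python's built-in sqlite3 module. All table names, column names and sample rows are hypothetical and chosen only for this example; they do not come from a real system.

```python
import sqlite3

# Hypothetical star schema for the grain "one fact row per product sold".
# Table and column names are illustrative assumptions, not a prescribed design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,  -- fact: units sold
    amount       REAL      -- fact: sales amount
);
""")

# Made-up sample data for the dimensions and the fact table.
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, "2006-01-01"), (2, "2006-01-02")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Aspirin"), (2, "Bandage")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Customer A"), (2, "Customer B")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 2, 10.0),
                  (1, 2, 2, 1, 3.0),
                  (2, 1, 2, 1, 5.0)])

# The facts are summed, and the 'surrounding' dimension supplies the grouping.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(sales_by_product)
```

The same fact table could be grouped by date or by customer simply by joining a different dimension, which is the practical benefit of fixing the grain first.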
The level of detail captured should be the lowest possible (atomic level). The lowest level of detail includes the full scope of informational dimensions related to an event.
As soon as a coarser grain is selected by aggregating atomic data, certain event dimensions are lost.
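The loss of dimensions under aggregation can be seen in a small sketch. The rows, names and the helper `total_by` below are hypothetical and serve only to illustrate the point.

```python
from collections import defaultdict

# Hypothetical atomic rows: (date, product, customer, amount).
atomic_sales = [
    ("2006-01-01", "Aspirin", "Customer A", 10.0),
    ("2006-01-01", "Bandage", "Customer B", 3.0),
    ("2006-01-02", "Aspirin", "Customer B", 5.0),
]

def total_by(rows, column_index):
    """Sum the amount (column 3) grouped by the chosen column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column_index]] += row[3]
    return dict(totals)

# Atomic data can be rolled up along any of its dimensions.
by_product = total_by(atomic_sales, 1)
by_customer = total_by(atomic_sales, 2)

# Storing only daily totals keeps the date and drops product and customer.
daily_totals = total_by(atomic_sales, 0)

print(by_product)
print(by_customer)
print(daily_totals)
```

Once only `daily_totals` is stored, there is no way to recover the breakdown by product or by customer, whereas the atomic rows can always be aggregated to daily totals on demand.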
Dimensional models aim to capture measurements according to the way an analyst views data:
Events that took place at a certain moment in time
Periodic measurements which provide a snapshot of the situation at a given moment in time
A complete view of a transaction which had more than one step (it did not start and complete in a single event)
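The three kinds of measurement listed above can be sketched in a few lines. This is a minimal illustration under assumed data: the stock movements, the milestone names and the helper functions are all hypothetical, loosely following the pharmacy and tax examples mentioned earlier.

```python
# 1. Events: hypothetical stock movements in a pharmacy (date, change in units).
movements = [
    ("2006-01-01", 100),
    ("2006-01-01", -20),
    ("2006-01-02", -30),
    ("2006-01-03", 50),
]

def daily_snapshot(events):
    """2. Periodic snapshot: closing stock level for each day with movements."""
    snapshot = {}
    level = 0
    for day, change in sorted(events):
        level += change
        snapshot[day] = level  # the last value written for a day is its closing level
    return snapshot

# 3. Accumulated view: one row per transaction, one date column per milestone.
# Milestone names are assumptions modelled on the tax lifecycle example.
MILESTONES = ("submission", "control", "clearance", "payment", "final_payment")

def new_accumulating_row(transaction_id):
    """Create a row whose milestone dates are all still unknown."""
    row = {"transaction_id": transaction_id}
    for milestone in MILESTONES:
        row[milestone + "_date"] = None
    return row

def record_milestone(row, milestone, date):
    """Update the existing row in place when a step of the lifecycle completes."""
    if milestone not in MILESTONES:
        raise ValueError("unknown milestone: " + milestone)
    row[milestone + "_date"] = date

stock_snapshot = daily_snapshot(movements)
tax_row = new_accumulating_row("T-1")
record_milestone(tax_row, "submission", "2006-02-01")
record_milestone(tax_row, "control", "2006-02-10")

print(stock_snapshot)
print(tax_row)
```

Note the difference in behaviour: event and snapshot rows are only ever added, while the accumulated row is revisited and filled in as the transaction progresses.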
C. Selection of the dimensions which form the event framework within which the measurements were made. Common examples of dimensions are: the date (or time) at which the event took place, customer, product, branch office. A concrete definition of the grain facilitates the selection of dimensions. The lower the level of detail, the richer the set of dimensions which accompany the facts.
D. Preliminary determination of the analysis methods to be implemented. This covers the selection of the key performance indicators (KPIs) for each monitored business process and the identification of the facts needed to derive these indicators. Given that conditions change, additional facts may later be selected for capture. These facts should relate to the same level of detail.
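To make step D concrete, the sketch below derives one KPI from existing facts and then adds a further fact at the same grain to support a second KPI. The fact rows, the `cost` fact and both KPI definitions are assumptions made for illustration only.

```python
# Hypothetical fact rows at the grain "one row per product sold".
fact_sales = [
    {"product": "Aspirin", "quantity": 2, "amount": 10.0},
    {"product": "Bandage", "quantity": 1, "amount": 3.0},
    {"product": "Aspirin", "quantity": 1, "amount": 5.0},
]

def average_unit_revenue(rows):
    """KPI derived from two existing facts: total amount / total quantity."""
    return sum(r["amount"] for r in rows) / sum(r["quantity"] for r in rows)

kpi_unit_revenue = average_unit_revenue(fact_sales)

# Later, a new fact is captured. It is recorded per product sold,
# i.e. at the same level of detail as the existing facts.
costs_per_row = [6.0, 2.0, 3.0]  # assumed cost of each sale
for row, cost in zip(fact_sales, costs_per_row):
    row["cost"] = cost

def gross_margin(rows):
    """KPI that needs the new fact: (revenue - cost) / revenue."""
    revenue = sum(r["amount"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return (revenue - cost) / revenue

kpi_gross_margin = gross_margin(fact_sales)

print(kpi_unit_revenue)
print(round(kpi_gross_margin, 3))
```

Because the new `cost` fact shares the grain of the existing rows, it can simply be added as another column; a fact recorded at a different level of detail (say, monthly overheads) would not fit in the same table.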
The dimensional model should be flexible enough to allow its future enrichment with new facts in the fact table and new dimensional attributes.
Copyright 2006 Κostis Panayotakis
View dimensional model examples from the Healthcare and Taxation sectors.
Κostis Panayotakis - http://www.pleroforea.com