HOLISTIC DATA WAREHOUSING
ON MICROSOFT SQL SERVER 2008
A New Data Warehousing Strategy, Methodology and Guide to the Free Ready Made Template for Full Supply Chain and Sales & Operations Reporting
Gerry Phillips and Jane McCarthy
For-tee Too Sight Publishing Melbourne, Australia
www.42sight.com 42sight.blogspot.com
Published by For-tee Too Sight Publishing 49 Mowbray Drive Wantirna South Melbourne, Australia, 3152
All rights reserved. No part of this book may be reproduced in whole or in part without written permission from the publisher except in the case of brief quotations embodied in reviews. For information address: For-tee Too Sight Publishing, 49 Mowbray Drive, Wantirna South, Melbourne, Australia, 3152
Contents

Acknowledgements ... 1
Imagine ... 2
Introduction ... 4
    Who Are We? ... 4
    Book Overview ... 5
    How this Book is Structured ... 7
What is a Data Warehouse? ... 8
    Our Definition of a Data Warehouse ... 8
    The Data Warehouse Environment ... 9
    A DW System Compared to a Transactional System ... 11
    A Comparison of a Holistic DW to a Conventional DW ... 13
    Full Supply Chain Reporting – “One Page” ... 16
    What Data Did Our First Data Warehouse Have? ... 20
    What is a Data Warehouse? – Summary ... 22
The Holistic Data Warehouse Strategy ... 23
    The Holistic Data Warehouse Vision ... 23
    The ‘Standard’ Data Warehouse Goal ... 23
    What is the Supply Chain? ... 24
    Full Supply Chain Reporting Goals ... 25
    Objectives of All Data Warehouses ... 27
        Minimize Inconsistent Reports and Reconcile Different Views of the Same Data ... 27
        Improve Quality of Data ... 29
        To Consolidated Enterprise Data from Multiple Sources and Time Periods ... 32
        Make the Data Easily Accessible and Provide Transparency ... 33
        To Enable Common and Flexible Calendars ... 35
        To Save Time on Report Preparation and Construction ... 35
        To Address the Weaknesses of Current Reporting Systems ... 36
        To Empower People with Information ... 37
    The Additional Objectives of the Holistic DW ... 38
        To Enable Pre-emptive Reporting of Events that are Expected to Happen ... 38
        To Enable a Single Cross Functional Report ... 39
        To Offer all Supply Chain Related Information ... 42
        To Effortlessly Replace All Your Reporting Systems ... 45
        To Address Deficiencies in the Operational Systems ... 46
        To Be Capable of Unlimited Measures ... 48
        To Allow Unlimited Product Hierarchies ... 53
    In Summary the Benefits of the Holistic DW ... 53
    The Holistic DW Strategy in Summary ... 54
Our First Data Warehouse Project ... 55
    Why the Need for a Data Warehouse? ... 55
    The Consultant’s Health Check ... 56
    The Project Begins with a Simple Objective ... 57
    Our Data Warehouse Design Brainstorm ... 57
        Two Hours Later a New Methodology ... 57
    Other Factors That Influenced this New Methodology ... 61
    Two Weeks Later the Prototype ... 62
    Apparently we are “Doing it the Wrong Way!” ... 63
    We Decide to Research the “Correct Way” ... 64
    The Prototype’s First benefits for the Business ... 65
    Senior Management Excited About the System ... 66
    From Access Prototype to Microsoft SQL Test Server ... 67
    The Final Server and Launched Six Months After Starting ... 68
    Switching Off the Old Reporting Systems ... 68
    SAP Story ... 68
    Summary of Our First DW Project ... 69
Technical Section ... 70
    Introduction to the Technical Section ... 70
Data Warehouse Infrastructure ... 71
    Microsoft SQL & Windows Server 64 bit ... 71
The Holistic DW – Under the Hood ... 73
    The Translated Data Table (TDT) ... 73
    The Holistic Data Warehouse Data Table ... 78
        The Holistic Data Table vs the Star Schema Fact Table ... 79
        The Overall Data Table Structure ... 82
        The Data Table Data Elements ... 93
    The Holistic Data Warehouse Linked Tables ... 103
        Smart Values in the Linked Identity Field ... 105
        Holistic Linked Table Joins ... 106
        The Types of Linked Tables Used ... 107
        “Conforming” in the Holistic DW Model Linked Tables ... 109
        Linked Tables in More Detail ... 110
    The Translated Data Table Linkages ... 140
        The Simple Linkages with Single Links: ... 141
        The Complex Linkages with Multiple Links: ... 146
        The Translated Data Table Conversion Calculation ... 148
    The Translated Data Table Summary ... 148
    The Full Supply Chain Top Down/Bottom Up Reports ... 149
        The Top-Down Reporting ... 150
        The Bottom-Up Reporting ... 153
        Full Supply Chain Reporting in Summary ... 154
    The Holistic Data Warehouse “In a Nutshell” ... 154
The “Rules” of Data Warehousing ... 155
    The “Holistic” View – Are They Applicable and Why? ... 155
    Rules That Are Applicable to the HDW ... 156
    Where the HDW Bends the Rules ... 162
    Where the HDW Breaks the Rules ... 165
    Our Bottom Line on Data Warehouse Rules ... 173
How to Populate the Holistic Data Warehouse ... 174
    The SQL Server “SSIS Import and Export Wizard” ... 175
    The Microsoft Adventure Works Holistic DW Load ... 184
        The Initialisation Phase ... 186
        Loading the Data Table ... 187
        Loading the Item Tables ... 216
        Loading the Entity Tables ... 228
        Loading the Representative (Rep) Table ... 237
        Loading the Reason Table ... 239
        Loading the Conversion Table ... 243
        The Wrap Up Section of the Adventure Works Load ... 248
        In Summary the Adventure Works Load Into the Holistic DW ... 265
    The Data Loading Process (DLP) ... 267
        ETL, ELT, ELTLT, Blah, blah, blah… ... 267
        The Holistic Staging Database in SQL Server ... 273
        Using Microsoft Access as a Staging, Cleaning Tool and Data “Portal” ... 274
        Adding Ancillary Data Back on the Data Source ... 275
        Timing of the Loads ... 275
        The Batch Loading (and Deleting) Approach to DLP ... 275
        There are Many Tools in Your DLP Toolbox ... 277
        The SQL Server Integration Services (SSIS) Wizard is Your Friend ... 278
        Wikipedia Article on SQL Server Integration Services ... 279
        Slowly Changing Linked Tables/Dimensions – Effective Date To and From Links ... 282
    Aggregations – Alternate “Frequency” Loads ... 283
    Summary of How to Populate the Holistic DW ... 284
How to Use the Holistic Template ... 286
Reporting From the Holistic Data Warehouse ... 292
    Reconcile the Reporting ... 292
    Queries Over the Translated Data Table (TDT) ... 293
        Technical – Getting the ODBC Connection to Work ... 293
        Writing the Query Over the TDT ... 294
    Using Prime Report ... 294
    Using Prime Report to Build Reports ... 299
    Reporting Over the Top Down and Bottom Up ... 300
        Template Top Down/Bottom Up Spreadsheet Models ... 302
        Cost Sensitivity Analysis with the Bottom Up ... 303
Variations on the Translated Data Table (TDT) ... 305
    The TDT with Selectable Date Periods ... 305
    The Linked Calculator Table and the TDT ... 309
    Security Tables/Linked Views and the TDT ... 312
How the Full Supply Chain Queries Work ... 316
    The “Top Down” Query (TDQ) Explained ... 316
        The Top Down Views from an SQL Perspective ... 325
        Summary of the Top Down Query (TDQ) ... 339
    The “Bottom Up” Query (BUQ) Explained ... 340
        The Bottom Up Views from an SQL Perspective ... 348
        Summary – Bottom Up Query (BUQ) ... 362
        Summary – The Full Supply Chain Queries ... 363
Appendix ... 365
    Where to Find the Holistic DW Template Download ... 365
    How to Install the Holistic Data Warehouse Template ... 365
    How to Install the Adventure Works Demo Loads ... 369
Bibliography ... 373
Acronyms Used Throughout the Book ... 375
    Abbreviations ... 376
Acknowledgements
Mark McCarthy for help with the wordsmithing, proofreading, nice breakfasts on the weekends and his support
Amanda Phillips for putting up with Gerry while he was preoccupied with the book
Daniel Moorfield for giving us the opportunity and having faith in our capabilities to build our first data warehouse
Mark Phillips for putting us on track with Microsoft SQL Server
Russell Eves for his support and help with proofreading
Ray Phillips for helping with the proofreading
Jane Wong for helping us with book ideas
All our friends and family who kept on asking “is the book done?”
Imagine
“Logic will get you from A to B. Imagination will take you everywhere” – Albert Einstein
Imagine a real silver bullet for data warehousing:
Imagine a single data warehouse that:
  • can be used to store practically any information about what is going on in or outside your business
  • is one central data store for all historical, operational and forward looking information

Imagine a reporting system:
  • that seamlessly provides all the information summarised on one report
  • where the user could open a single spreadsheet and explore all the data
  • that has unlimited potential and infinite applications

Imagine a data warehouse implementation:
  • without a 6+ month planning phase
  • where you do not have to design the model and instead use a standard template
  • where the empty model takes a matter of hours to install
  • that does not tax key resources
  • that does not require a major financial strategy to implement

Imagine a business intelligence environment where:
  • you never need to build another data model and reporting system
  • you do not have to wait months before new types of data are made available for reporting
  • you can adapt it to changing requirements and conditions as you go (e.g. business acquisitions and mergers) without having to re-train and re-do all the reporting
  • a new report requirement for a new type of data can be met within days
  • a forum exists where you are able to share your ideas, ask for help and benefit from ideas input by other users in businesses employing the same template
  • you have time to help users with their advanced reporting and analysis needs

Imagine the improvements for your people and processes where:
  • managers are enlightened as to what factors are actually important, now that all the data and information they were previously lacking is readily available
  • people are no longer wasting time manually preparing reports
  • people are no longer blaming the lack of data for not being able to do their job properly
  • managers rise above the data and are able to focus on ensuring your business’s success

Imagine being able to achieve this with only intermediate skills in reporting and analysis using query language, without an IT qualification.

Imagine being able to do this with a budget of only up to $30,000 for the hardware and software.

Imagine bringing your business out of the information dark ages into a new era of business information transparency and people empowerment within a few months.

Imagine all this... well, you can now give your imagination a break, as we have done it and you can too with the Holistic Data Warehouse.
These and many more outcomes are now possible with a single data warehouse model that can handle all your current and future needs. This model is our free Holistic Data Warehousing template, on Microsoft SQL Server 2008.
* Links to where you can download the free template are via our website www.42sight.com
Introduction
As a company grows it becomes more of a challenge to locate all the necessary information to make calculated decisions. Until now there has been no single system that will give you all the answers you need to proceed without spending a great deal of time and money.

“Holistic Data Warehousing” is a book about a new, simple method for developing a powerful model over all of your business data. We go through the strategy behind the Holistic approach and contrast it with the more traditional and complicated data warehousing methods. When used, this strategy and method will help you foresee your company’s future by providing data from all aspects of the business, not just the traditional historical data and business plans.

This model and methodology takes full advantage of today’s computing power and as such it “achieves complexity through simplicity”. The pre-emptive reporting facet allows reporting on current operational data and plans, giving the user the ability to analyse, project and “foretell”.

Data warehouses using our template, or modelled on our design schema, are extremely flexible and can be loaded with new information in a very short time. When there is a need for new data to come through the reporting process, it can often be made available within hours of the request and does not require many days or months of effort and rigmarole. This provides a highly potent system that significantly empowers its users with information and analysis when they need it.
Who Are We?
In short, we are business people with no IT qualifications who started our careers at the beginning of the PC revolution. At the time we started our first data warehouse project we were in the Finance & IT team at an A$100 million “Fast Moving Consumer Goods” (FMCG) business in Australia, but with quite varied and broad backgrounds. Each of us is multi-skilled and was responsible for functions that would normally be performed by multiple people across different departments.

This business is the Australian operation of a large US$3 billion+ food business with headquarters in the USA and operations all over the world. This food business manufactures most of its products using Australian and imported ingredients and sells to the big grocery retailers, the main fast food chains and other food manufacturers. The business is very diverse, with over 25 production lines, thousands of ingredients and a portfolio of over 1000 products.

Jane, in her 20+ years in the business, has gained vast experience in Customer Service, Sales and Information Technology, using her business system knowledge to provide a whole slew of reporting from many different sources of data. Jane was the business’s E-Commerce and barcode systems expert. Along with this, Jane was the business’s expert in our trade marketing and scan data analysis systems.
Gerry similarly has widespread experience in financial, costing, management accounting and commercial roles over his 20+ years in the Australian operation, more recently in the associated businesses across the Asia Pacific zone, and has been involved with international projects originating from the USA head office. He is an Australian Certified Practicing Accountant (CPA) with a Bachelor of Economics and was the business’s expert in management reporting and financial planning. Up until 2008, Gerry developed and built the Company’s financial, costing and management reporting, budgeting and forecasting systems. The Company’s sales forecasting systems were also developed by Gerry using varied software packages.

As mentioned above, we are both completely devoid of any formal IT training or qualifications, and are both basically self-taught from an IT standpoint. Even so, this Australian business had complete faith in us to deliver a data warehouse, covering the whole supply chain, quickly and on time. We were given $30,000 to spend and six months to complete it from scratch.

We are not professional authors. We are two business people who decided to write, produce and publish a book about a paradigm-shifting approach to data warehousing and business intelligence. Our major goal with this book is to empower businesses that are daunted by the cost and expertise normally required when implementing a world class business intelligence system. With the free Holistic Data Warehouse Template, the ability to implement a business intelligence system is now within reach for all, with or without IT qualifications.

Even for big businesses that are entrenched in their current technology, and where an unconventional model might be difficult to get off the ground, the Holistic Data Warehouse Template will have considerable benefits as a prototyping system in areas where experimental efforts are difficult to cost justify.
All you need is access to data, MSSQL Developer Edition (US$37) and this template, and you too can build a proof of concept for Full Supply Chain Reporting in only a week or two.
Book Overview
This book is targeted at a widespread audience, from those in Senior Management across to those users in the business with self-taught IT skills and know-how who are interested in embarking on an in-house data warehousing project. The primary target for this book is IT savvy business people like we were: starved of IT resources and tools, in a business with a legacy environment, and too small to embark on a fully fledged business intelligence solution with an astronomical cost.

The book is about our business intelligence strategy and our Holistic Data Warehouse model. Throughout the book you will be exposed to our IT philosophy. We are unapologetic about this and realise that in many cases we break with convention.

The plan is for the book to be a part of a bigger solution that includes the website and the online community that we would like to build and grow. Primarily the book serves as the guide to the Holistic Data Warehouse Template and our strategy. This template, if adopted by everyone using this book and method, will provide the means for you and your fellow Holistic Data Warehouse users to share queries and reports. This is especially relevant for those of you in a supply chain business, that is, in the business of buying and/or manufacturing and selling things.

The first chapter presents an introduction to data warehousing as a concept and establishes at a basic level the differences between our Holistic Data Warehouse model and the traditional models. At this early stage we give you a taste of the powerful reporting and analytics that the Holistic DW Model was, from the beginning, designed to provide.

The next chapter covers our philosophy of data warehousing and the Holistic Data Warehouse Strategy in contrast to the normal data warehousing goals and objectives. Our strategy is very ambitious, forward looking and pro-active, where the normal approach is to be subservient to business requirements, backwards looking and re-active.

We then take you through a timeline of our first data warehouse project and describe how we formulated this methodology in the absence of any understanding of the current data warehousing methodologies.
We did not know who Ralph Kimball and Bill Inmon were (two data warehousing pioneers and leading authorities in the field) until after we began our prototype. As users building a system from scratch, we knew what we wanted to be in it: everything!

The Technical Section of the book goes into more detail on the workings of the model and how to use the included SQL Server template. This begins with the chapter The Holistic DW – Under the Hood, where we go through all the essential aspects of our model, beginning with the Translated Data Table and all of its elements. All through the book we demonstrate the power of the Holistic DW Model by showing examples over the Microsoft Adventure Works sample data, including cost sensitivity analysis.

The next chapter of the book, the largest, takes you through the most important topic for our template, which is “How to populate…” the Holistic Data Warehouse model. We go through in detail the demonstration load of the Microsoft sample data into the model, fully documenting every step in the comprehensive load. Almost every scrap of information that would be useful for business reporting was loaded into our template.
The remainder of the book, including the appendixes, shows how to install the Holistic Data Warehouse Template for SQL Server 2008 and how to use it to build a data warehouse for any business data. The use of our template reporting, including our Prime Report spreadsheet, is demonstrated and the installation explained. The template includes two Microsoft SQL Server Integration Services (SSIS) routines, one each for the 2005 and 2008 versions of Microsoft’s Adventure Works sample database.

Our strategy with this book, template and online presence is to foster a new community around the template, with a website where users can exchange ideas and where we can provide support and share improvements, add-ons, reports and models.
How this Book is Structured
This book starts at a very highly summarised level and slowly spirals downward through the subject areas, drilling further and further into more detail. As the book progresses, the concepts, commentary, explanations, documentation and diagrams become more detailed and complex. We have chosen this approach so that the reader can start at the 50,000 foot level, slowly descending through greater degrees of complexity along the learning curve. This means that if you pull out before the end of the book you will still have been through most of the topics relating to the Holistic Data Warehouse and our philosophy on business intelligence, systems and reporting.

Inevitably this spiralling approach results in some areas being repetitive, but we have tried our best to limit this and keep you, our reader, engaged. This book is heavily illustrated to visualise the concepts that we are covering and uses many reporting examples from the Holistic DW Model with the Microsoft Adventure Works demonstration data. Many of the concepts are abstract and much easier to explain with visuals and example reporting.
What is a Data Warehouse?
In this chapter the conventional theory relating to a data warehouse (DW) will be summarised, particularly our view of the “Dimensional theory” of data warehousing as professed by the pioneer Ralph Kimball (a leader of data warehousing theory since the 1990s). We then briefly describe the business intelligence environment that a DW belongs to and its part in it. The DW will then be contrasted with transactional systems, and at a very high level we summarise our “Holistic” approach to data warehousing with a comparison to the conventional dimensional DW methodology. Finally, “Full Supply Chain” reporting is previewed, giving a taste of the powerful reporting and analysis that we had in mind when we first had the idea for the Holistic Data Warehouse.
Our Definition of a Data Warehouse
A data warehouse (DW) is the component of an environment that holds and makes available a large collection of information from disparate systems in a dimensional database structure. The overall DW environment includes the process of extracting data from the source systems and the tools to deliver decision support information.

“A decision is the action an executive must take when he has information so incomplete that the answer does not suggest itself” – Arthur William Radford

Our definition has many elements and we will break them down:

“makes available a large collection of information” This relates to the aspect of a DW where it is used to supply data for querying and analysis. The DW data store is not suitable for supporting other types of systems, such as a transactional system. The data structures of a DW are perfect for analysing data on a mass scale. They support big, potentially complex queries over significant amounts of data, sometimes spanning lengthy periods of time. This is an environment where a few large, sporadic queries are made by users, in contrast to a transactional system with frequent, small and simultaneous updates to many different tables.

In summary, in a daily business cycle a DW is a “write once, multiple reads, few users” system, where a transactional system is a “24 hour, many users, multiple read/write/update” system. The ratio of DW users to transactional system users in a business varies, and it parallels the pyramid organisational structures found in businesses. The people further up the pyramid are the typical DW users, whereas people near the base of the organisation’s pyramid are skewed towards transactional system use.
“dimensional database structure...” Traditionally this refers to the Kimball “Star Schema” for a relational database and to “cubes”, a term often used to describe the data repositories used in business intelligence tools. We have our own version of a dimensional database in the Holistic Data Warehouse model. This is referring to a single place rather than many disparate places. By using dimensional structures to store data we gain the ability to use filtering, aggregations and other techniques to organise the data for reporting, with each data element classified by the associated dimensions.
“the process of extracting data from the source systems” This aspect is the Extract Transform Load (ETL) or, as we call it, the “DLP – Data Loading Process”. This process is common to all data warehouse (DW) loads, as they take data from disparate sources and bring it together into the one place. The chapter called How to Populate the Holistic Data Warehouse, from page 174, is a major part of this book and justifiably so, as the task of populating any DW is a major undertaking.

“to deliver decision support information” As noted above, the typical users of a DW are often the decision makers of the business. The DW system provides data and reporting that supports the people making decisions. Of high value is information that supports people making strategic decisions. Transactional systems can support people making “operational” decisions. However, we argue that a DW can sometimes fill information gaps in a business where the transactional system lacks the capabilities to support operational decision makers at lower business levels. This is one of the objectives of our Holistic Data Warehousing Strategy, “To Address Deficiencies in the Operational Systems”, covered in more detail on page 46.
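To make the elements of this definition a little more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module (not SQL Server). It brings rows from two made-up source systems together into one place, classifies each data element by a product dimension, and then filters and aggregates for reporting. Every table name, column name and sample row is hypothetical and invented for illustration; none of it is taken from the Holistic DW template or from Adventure Works, and the layout shown is a generic dimensional one rather than the Holistic model itself.

```python
import sqlite3

# Hypothetical rows standing in for two disparate source systems.
SALES_SYSTEM = [
    # (order_date, product_code, quantity)
    ("2008-01-15", "BIKE-01", 10),
    ("2008-01-20", "HELM-02", 4),
    ("2008-02-03", "BIKE-01", 6),
]
PURCHASING_SYSTEM = [
    # (receipt_date, item, qty_received) - note the different code casing
    ("2008-01-05", "bike-01", 20),
    ("2008-02-01", "helm-02", 15),
]


def load_warehouse(conn):
    """Extract from both sources, conform product codes, load one data store."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_code TEXT PRIMARY KEY);
        CREATE TABLE fact_movement (
            movement_date TEXT,
            product_code  TEXT REFERENCES dim_product(product_code),
            measure       TEXT,     -- 'sold' or 'received'
            quantity      INTEGER
        );
        """
    )
    # Transform: upper-case the codes so both systems share one product key.
    rows = [(d, p.upper(), "sold", q) for d, p, q in SALES_SYSTEM]
    rows += [(d, p.upper(), "received", q) for d, p, q in PURCHASING_SYSTEM]
    conn.executemany(
        "INSERT OR IGNORE INTO dim_product VALUES (?)",
        [(r[1],) for r in rows],
    )
    conn.executemany("INSERT INTO fact_movement VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def monthly_summary(conn, product_code):
    """Filter on one dimension value, then aggregate by month and measure."""
    return conn.execute(
        """
        SELECT substr(movement_date, 1, 7) AS period, measure, SUM(quantity)
        FROM fact_movement
        WHERE product_code = ?
        GROUP BY period, measure
        ORDER BY period, measure
        """,
        (product_code,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_warehouse(conn)
summary = monthly_summary(conn, "BIKE-01")
```

Here `summary` holds one row per month and measure for the chosen product, so sales and purchase receipts that started life in separate systems can be read side by side from a single query. The load runs once and the store is then only read, which matches the “write once, multiple reads” pattern described above.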
The Data Warehouse Environment
Figure 1 below shows, at a very high level, that a data warehouse (#3) sits above your current computer systems (#1), which are used for transaction processing and recording business information. It is 100% reliant on these source systems for data and rarely has any facility for manual data entry or manual processing. The diagram also shows that with our Holistic methodology the data warehouse (DW) is a “single data store”, whereas most data warehousing environments have multiple data stores (“Data Marts”).
Figure 1 – The data warehousing environment
We call the process that takes this data, processes it and loads it into the DW the “DLP” – “Data Loading Process” (#2). Conventional theory refers to this as “ETL” – “Extract Transform and Load”, a term we consider to be too rigid. Refer to the chapter The Data Loading Process (DLP) on page 267.
The systems that deliver information from the DW to the user are the reporting front-end (#4), sometimes referred to as “BI” – “Business Intelligence”. We include with our template some Excel models that provide the basic reporting.
Figure 2 – legend to Figure 1 – The data warehousing environment
A DW System Compared to a Transactional System
We could use a whole book chapter comparing a data warehouse (DW) system and environment to transactional systems. Instead, this topic will be covered briefly in a few paragraphs. There are differences based on the purpose of each system, and there are differences from a technical and architectural perspective.
Figure 3 below shows, at a very high level, how a transactional system can be designed, using the schema from the Microsoft Adventure Works demonstration system. The diagram depicts the usual arrangement of many interconnected tables fulfilling multiple purposes. It is too large to reproduce in detail in this book, but we wanted to give you an impression of how complex and intertwined a transactional system can be.
DWs are purpose built for reporting and analysis, whereas transactional systems are built primarily to process transactions. In DW literature transactional systems are often referred to by the unusual acronym “OLTP” – “On-Line Transaction Processing”. This is now a very old-fashioned term: these days, are there any “Offline” transaction processing systems left? These would be systems where paperwork is processed in batches by “key punch operators” or with punch cards (from which the term “key punch operator” is derived).
Businesses often have many transactional systems on different platforms, and in global organisations these systems are often spread across many countries and languages. These factors make combined business reporting impossible without a DW.
Figure 3 – The Microsoft Adventure Works schema as an example of a transactional database (found at Microsoft’s www.codeplex.com website)
Transactional systems often have low historical data retention, whereas one of the purposes of a DW is to retain data. However, this will become a less important reason for a DW as hardware capabilities grow, easily enabling years of history to be stored without detriment. Performance is usually a primary concern driving the need for a DW, but as business systems become more powerful this too will become irrelevant. As computers become more powerful, the integration of data from multiple sources should be the primary reason for a DW. The DW will be organised for reporting using business intelligence systems, whereas the transactional system will be optimised for many concurrent users, all processing updates to many tables, and will have a reputation for being notoriously slow at running “large” reports.
A Comparison of a Holistic DW to a Conventional DW
Figure 4 – The Holistic Data Warehouse generic multi-purpose “Link” structure
The Holistic Data Warehouse (HDW) represents a modern methodology of data warehousing. Simply put, it is one multi-purpose data warehouse model for all your business intelligence needs. The template is designed for a simple implementation and adapts to additional requirements as you go, without modification. This is made possible by the HDW having a simple generic “Link” structure, as depicted in Figure 4 above.
Figure 5 – A “Mesh” of conventional and customised data warehouses (Data Marts)
The conventional “dimensional” approach is to use multiple single purpose data warehouse models, each requiring a fresh implementation and addressing a single area of the business. In an optimised environment these models would share some of their structures and dimension tables. When viewed as a whole this would resemble a “Mesh” of models as depicted above in Figure 5. The “Time Dimension” and “Product Dimensions” are typical of dimensions that should be shared to ensure consistency in reporting from these systems.
The conventional “Mesh” approach is used in the Microsoft sample Adventure Works data warehouse, which is based on the transaction system of the simulated business “Adventure Works”. This Adventure Works DW has a mesh of five Fact Tables and 16 Dimension Tables, some of which are shared between models. It is a sample system that perfectly typifies the conventional approach to a DW. We use the Adventure Works data throughout this book and include a full data load into the HDW in the template. The data from Adventure Works is almost perfect for demonstrating and proving the capabilities of the HDW.
With conventional models the design requirements are needed up front, often with strict change governance, which makes changes difficult to apply. Sharing dimensions can reduce the workload, except where modifications are required. Another factor burdening the conventional approach is that each of these models in turn requires its own documentation and people to support it.
The conventional design was required because of the poorly performing hardware of yesteryear, on which a Holistic DW system would have been prohibitively slow. Computers have now become so powerful that, compared with those available 10 years ago, the same performance comes at practically zero cost. The speed available from today’s hardware can more than compensate for this performance difference, and the additional cost of this hardware is insignificant compared to the implementation cost of a conventional system. This is discussed in more detail later in the book.
“Dimensional cube / OLAP” systems are another data store option for business intelligence systems. We consider these to be in a different space to the relational database models that are used as data warehouses. However, “OLAP” systems often need to sit above a relational data storage system, and often these are databases of either the “dimensional” or the “normalised” flavour.
In summary, the conventional approach to data warehousing, used since the 1990s, has proved to be very effective and powerful. Our concerns are with the cost and difficulty of implementation and with the level of expertise required to build, grow and maintain these systems. In these modern times businesses should have higher expectations of their systems; they need an adaptable and agile data warehouse that is quick to implement and modify. They need more powerful reporting such as “Full Supply Chain Reporting”; they need a Holistic Data Warehouse.
Full Supply Chain Reporting – “One Page”
We had “Full Supply Chain Reporting” in mind right at the beginning when we formulated the Holistic Data Warehouse methodology. This will be covered in detail in the Strategy section, but we will give some brief insight into it here in this introduction. The objectives of Full Supply Chain Reporting are to provide pre-emptive, forward looking, cross functional reports and, for manufacturing businesses, full drill through reporting.
A sample of a cross functional report is shown below in Figure 6. These Sales and Operations reports provide a holistic view over the business and give insight into its future business and operational plans. Many businesses have analysts who prepare these cross functional reports manually, a process that requires data from many different sources to be consolidated on the summary reports. The Holistic DW is special in that these reports can be made available automatically, not only for the full business and once a month, but daily and at any level of detail desired. The report is always up to date, which avoids the usual traps of out of sync information.
Our philosophy is that businesses should be demanding from their reporting systems a single view over their whole business. These are the “everything you wanted to know about your business on the one page but were afraid to ask” reports. Kimball refers to this type of reporting as “drill-across” reporting. A normal DW requires a significant investment of time and effort to ensure your separate reporting systems share common dimensions; the Holistic DW Model does this reporting naturally.
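To show what “drill-across” means mechanically, here is a minimal sketch in plain Python: each business area is totalled separately by a shared product code, and the totals are then merged onto one row per product. The three data sets, the product codes and the function names are invented for illustration. This shows the general drill-across idea only, not how the Holistic DW Model produces its reports.

```python
from collections import defaultdict

# Hypothetical facts from three separate areas, all keyed by the same product code.
sales      = [("BK-100", 3), ("HL-200", 9), ("BK-100", 1)]
production = [("BK-100", 5), ("HL-200", 6)]
inventory  = [("BK-100", 12), ("HL-200", 4), ("GL-300", 7)]

def total_by_product(rows):
    """Sum the quantities of one area by product code."""
    totals = defaultdict(float)
    for product, qty in rows:
        totals[product] += qty
    return totals

def drill_across(**measures):
    """Merge each area's totals onto one row per product."""
    summed = {name: total_by_product(rows) for name, rows in measures.items()}
    products = sorted(set().union(*summed.values()))
    return {p: {name: summed[name].get(p, 0.0) for name in measures} for p in products}

report = drill_across(sales=sales, production=production, inventory=inventory)
for product, row in report.items():
    print(product, row)
```

The merge only works because all three areas identify products the same way. That shared key is the “common dimension” a conventional environment has to invest effort in maintaining.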
Figure 6 – A sample Total Business cross functional Sales and Operations report
“Drill-across” is nice to have, but for a manufacturing business the Holistic DW goes one better with “drill across AND drill through” reporting: the “Top-Down” and “Bottom-Up” reports. These reports are only applicable to manufacturing and assembly businesses, as they go through multiple levels of production to explode the entire supply chain and include complete reporting on all components. That is why we refer to this as a Full Supply Chain “Top-Down” report. Figure 7 below depicts the type and structure of the information automatically revealed by these reports, where components are found through the “Production In” data and supply chain information is then reported for them. The partner report is the “Bottom-Up” report, which begins at a component, finds where it is used and provides complete reporting for all end products.
The “Top-Down”/“Bottom-Up” reports are included with the Holistic DW Template and they will work automatically if implemented according to our guidelines (found in the technical section, Reporting from the Holistic Data Warehouse). Additionally, the reports are covered in more detail in the Holistic Data Warehouse – Under the Hood chapter and are documented fully, with details of how they work, in the last chapters of the book.
Systems that just provide siloed reporting should be a thing of the past, as even existing DW systems can be modified to provide Full Supply Chain Reporting, though with varying degrees of difficulty and cost.
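The idea of exploding a product through multiple levels of production, and of tracing a component back up to its end products, can be sketched as two small recursive functions. The production_in records, product names and quantities below are invented for illustration, and the sketch assumes the production structure contains no cycles. It is not the template's actual query logic, which the book documents in its later chapters.

```python
# Hypothetical "Production In" records: (parent product, component, qty per unit of parent).
production_in = [
    ("Bike", "Frame", 1),
    ("Bike", "Wheel", 2),
    ("Wheel", "Rim", 1),
    ("Wheel", "Spoke", 32),
    ("Frame", "Tube", 3),
    ("Trailer", "Wheel", 2),
]

def top_down(product, qty=1, level=0):
    """Explode a product into all components at every production level."""
    rows = []
    for parent, component, per_unit in production_in:
        if parent == product:
            needed = qty * per_unit
            rows.append((level + 1, component, needed))
            rows.extend(top_down(component, needed, level + 1))
    return rows

def bottom_up(component):
    """Find every product, at any level, that uses the component."""
    used_in = set()
    for parent, child, _ in production_in:
        if child == component:
            used_in.add(parent)
            used_in |= bottom_up(parent)
    return used_in

for level, component, qty in top_down("Bike"):
    print("  " * level + f"{component} x {qty}")
print(sorted(bottom_up("Spoke")))
```

With the component list in hand, a report can then attach purchasing, inventory or cost information to each component, which is the “drill through” half of the reporting described above.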
Figure 7 – The information revealed by a Holistic DW “Top-Down” report
What Data Did Our First Data Warehouse Have?
The following diagram, Figure 8 – Data within our first model on page 21, depicts all the different types of data in our first attempt at data warehousing. This business intelligence system was constructed, including front-end reporting, to this level of detail within three months of beginning the final model, after prototyping in Microsoft Access. This was all made possible by the powerful Holistic Data Warehouse model.
The first Holistic DW Model has reporting available in any or all of the following measures:
• $ Value
• Kilograms
• Qty
• Standard Cost
• “Latest” cost
• 15 Std Cost elements. Cost elements like:
  o Raw Material cost
  o Labour Cost
  o Packaging cost
• And 15 “latest” cost Elements by date
This means we can run sales reports valued in packaging cost or labour cost in addition to the normal sales “revenue” measure. Notably, we had most of this information in the Microsoft Access prototype within a month of starting, so this is in no way a significant undertaking, even for people like ourselves with limited IT experience.
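Valuing the same sales in different measures comes down to multiplying the quantity sold by a different per-unit figure. The following sketch shows the arithmetic with invented products, prices, weights and cost elements. The names sales_lines, unit_figures and sales_in are illustrative assumptions, not part of the template.

```python
# Hypothetical sales lines and per-unit figures for each product.
sales_lines = [("BK-100", 4), ("HL-200", 10)]  # (product, qty sold)
unit_figures = {
    "BK-100": {"price": 500.0, "kg": 12.0, "raw_material": 180.0, "labour": 90.0, "packaging": 6.0},
    "HL-200": {"price": 30.0,  "kg": 0.4,  "raw_material": 8.0,   "labour": 3.0,  "packaging": 1.5},
}

def sales_in(measure):
    """Value the same sales quantities in the chosen measure."""
    return sum(qty * unit_figures[product][measure] for product, qty in sales_lines)

for measure in ("price", "kg", "labour", "packaging"):
    print(measure, sales_in(measure))
```

Because every measure hangs off the same quantity, adding another cost element needs only another per-unit figure, not a new report structure.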
Figure 8 – Data within our first model
What is a Data Warehouse? – Summary
In this “What is a Data Warehouse?” chapter we briefly defined a data warehouse and then contrasted the Holistic Data Warehouse with the conventional type of DW. The two contrasting approaches were described as a single Holistic “Link” style versus a conventional “Mesh” arrangement of multiple DW models with shared structures.
We define a Data Warehouse as a central data store on which all your business intelligence reporting is based. The broad user requirements for reporting and the apparently incompatible data normally make a DW a considerable challenge to implement. That is, until our Holistic Data Warehouse model came along.
Finally, we touched on the powerful Full Supply Chain Reporting that is easily enabled by the Holistic Data Warehouse. This reporting was envisioned when we began our first data warehouse, and in a later chapter we document that first project, which gave birth to the Holistic Data Warehousing methodology. It was a journey that took a business with basically no supply chain reporting capabilities and cumbersome business reporting to the leading edge.
In the next chapter of the book we take you through the strategy that underpins the Holistic Data Warehouse methodology and go into greater depth comparing it to the conventional approach. Wikipedia is a great reference on the subject of data warehousing if you are interested in learning more about the current thinking on this topic.
The Holistic Data Warehouse Strategy
“The vision is really about empowering workers, giving them all the information about what’s going on so they can do a lot more than they’ve done in the past.” – Bill Gates
In this chapter we would like to first take you through our Vision, Goals, Objectives and Benefits of the Holistic Data Warehouse. We will contrast these with the “Standard” data warehouse strategy.
The Holistic Data Warehouse Vision
One simple and easy system with minimal limitations, providing the one view over the whole business and its supply chain, in which the user can use filtering techniques to select which data they would like to view on their report.
Right from the beginning, with our first data warehouse, our aim has been a single repository and reporting front end that can be used for anything that could be thrown at it, and that will provide the holistic view over the business and the supply chain. This approach has the bonus of avoiding the time and effort normally associated with planning and developing separate models for each new business requirement. We give some insight into what we define as a “Supply Chain” on the following page.
The ‘Standard’ Data Warehouse Goal
The goal of a data warehouse is to provide business intelligence that is consistent and reconciled based on operational data, decision support data and external data from multiple sources.
This is the primary goal of any data warehouse and addresses the major weakness of business reporting that is based on many different systems. While not impossible, this goal is difficult to achieve with most standard data warehouses. Some will never manage to get there. Working against them are the cost and the time and effort involved in set-up. Making data “conform” is difficult, and many books and reference materials have been written about how to implement data warehouses properly in order to attain this goal of consistency. Interestingly, the Adventure Works Data Warehouse fails to meet this goal. It actually has four separate specialized models for sales reporting, which to us seems ridiculous, and on top of that the data does not fully reconcile, with some key omissions.
What is the Supply Chain?
From a very shallow perspective the “Supply Chain” is often considered to be just those processes and activities between supplier and customer. However, for a manufacturing company, we consider the “Supply Chain” to be much greater than this, and that the chain goes all the way back through production to the suppliers of component materials.
From Wikipedia: “A supply chain... is the system of organizations, people, technology, activities, information and resources involved in moving a product or service from supplier to customer. Supply chain activities transform natural resources, raw materials and components into a finished product that is delivered to the end customer.” (For our view of the supply chain the “end customer” is the end consumer.)
A basic diagram representing the supply chain of a consumer goods manufacturing and distribution company follows:
Figure 9 – Example of a supply chain for a consumer goods manufacturing and distribution company
Full Supply Chain Reporting Goals
1st Goal – (Applicable to all businesses using the Holistic DW Model)
To provide reporting that gives the user the option to see ALL the information available on activities pertaining to the subject matter they are interested in.
2nd Goal – (Applicable to manufacturing and assembly types of businesses)
To provide reporting that enables the user to see ALL information about the activities in the Full Supply Chain that are related to the subject matter in question.
These additional Full Supply Chain Reporting goals underlie the Holistic Data Warehouse (DW). The first relates to the often requested (but not achieved) “report that tells me everything I need to know on the one page”. The types of supply chain questions it answers are shown in Figure 10 on the next page (under “1st Goal”). These reports are usually done manually by users consolidating data from multiple sources in their spreadsheets. This “drill-across” reporting was introduced in the “What is a Data Warehouse?” chapter.
The first Full Supply Chain Reporting goal does not necessarily need our Holistic DW Model. Gerry achieved this goal, with some difficulty, with an SAP BW (Business Warehouse) where he was the architect of the model (using a BW “Multi-provider” Cube). The substantial cube had 400+ reporting fields and 110+ key figures, resulting in complex queries and reports. While the Full Supply Chain Reporting goal was attained, the effort and cost were astronomically higher than for our earlier model over the legacy systems.
The second goal is accomplished with the Holistic DW “Top-Down” and “Bottom-Up” reports built into the template. These are reports that are extremely difficult, if not considered impossible, in normal DW and business intelligence (BI) systems. For instance, with SAP BW we tried and failed to achieve these reports. However, these Holistic DW reports do work automatically if the model is populated according to our guidelines. They only apply to production businesses, as the reporting goes through the supply chain to analyse components.
These powerful reports are fully documented in the How the Full Supply Chain Queries Work chapter at the end of the book. In the largest section of this book we take you through, in detail, a fully documented load of the Microsoft Adventure Works data into the Holistic DW, showing you how to populate the model to ensure the reports work.
Figure 10 – Questions about the supply chain
How often have you asked your own BI resources for this type of reporting in vain? How often have you been told that your report request was not possible, can’t be done, too ambitious and/or too costly? We believe that there is no BI report that is too hard. We have produced reports from the Holistic DW within 45 minutes of hearing the user’s request, where others in larger companies with more resources had failed to deliver the report after months of effort. The Holistic DW can do the hard reports.
One interesting point is that, although we started with these goals in mind when envisioning our data warehousing solution, we didn’t think that they were anything special at the time and just took them as a given. Only afterwards did we realise the ambitious nature of these goals for a normal data warehouse implementation. In a later chapter we tell the full story of our first effort.
After a few years of researching standard data warehouses, our opinion is that these goals are technically very difficult (read expensive) to achieve using traditional data warehouse models and even more modern business intelligence systems. We have been to quite a few sales presentations where the sell job was for systems that provide a single view over the so-called supply chain, costing anywhere from $100,000 to around $1 million, and that was just for their services and software. Even so, these systems appeared to be too limited, encompassing either the distribution “supply chain” without any consideration of the supplier end of the chain or the reverse, focusing on suppliers and not customers.
Objectives of All Data Warehouses
• Minimize inconsistent reports and reconcile different views of the same data
• Improve Quality of Data
• To consolidate enterprise data from multiple sources and time periods
• Make the Data easily accessible and provide transparency
• To enable common and flexible calendars
• To save time on report preparation and construction
• Address the weaknesses of current reporting systems
• To empower people with information
These objectives, while seen as standard data warehouse objectives, are normally very difficult to achieve. However, the Holistic Data Warehouse takes these objectives quite easily into its stride. This is because it has a simple design and structure. Even with only a basic raw prototype we achieved all these objectives, using around two years’ worth of data, within two weeks of starting our project, using the Holistic Data Warehouse framework.
We next take you through each of these standard objectives in some detail to describe our interpretation of what they mean.
Minimize Inconsistent Reports and Reconcile Different Views of the Same Data
Establish one version of the “truth”.
Prior to our first data warehouse, the business had inconsistent reporting of sales, even within the same “systems”. Within the various Cognos Analyst models that reported sales,