Dim Modeling Paper-Revised

Published on July 2016

Successful Dimensional Modeling of Very Large Data Warehouses
By Bert Scalzo, Ph.D.
[email protected]

About the Author
• Oracle DBA from versions 4 through 8i
• Worked for Oracle Education
• Worked for Oracle Consulting
• Holds several Oracle Masters
• BS, MS and PhD in Computer Science
• MBA and insurance industry designations
• Articles in Oracle Magazine, Oracle Informant, and PC Week (now E-Magazine)

About Quest Software

Know Your Application
What type of application are you building?
• On Line Transaction Processing (OLTP)
• Operational Data Store (ODS)
• On Line Analytical Processing (OLAP)
• Data Mart / Data Warehouse (DM/DW)

                OLTP                ODS                    OLAP           DM/DW
Business Focus  Operational         Operational, Tactical  Tactical       Tactical, Strategic
End User Tools  Client Server, Web  Client Server, Web     Client Server  Client Server, Web
DB Technology   Relational          Relational             Cubic          Relational
Trans Count     Large               Medium                 Small          Small
Trans Size      Small               Medium                 Medium         Large
Trans Time      Short               Medium                 Long           Long
Size in Gigs    10 - 200            50 - 400               50 - 400       400 - 4000
Normalization   3NF                 3NF                    N/A            0NF
Data Modeling   Traditional ER      Traditional ER         N/A            Dimensional

Embrace New Concepts
• “Teach Old Dog New Tricks”
• Throw out any OLTP baggage
• Forget OLTP “Golden Rules”

Star Schema Design
The “star schema” approach to dimensional data modeling was pioneered by Ralph Kimball.
Dimensions: smaller, de-normalized tables containing the business-descriptive columns that end users query on.
Facts: very large tables whose primary keys are formed by concatenating the foreign key columns of the related dimension tables, and which hold numerically additive, non-key columns used for calculations during end-user queries.
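The two kinds of table described above can be sketched as follows. This is a minimal illustration, not the author's schema: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and every table and column name other than dw_period, dw_product and levelx (which appear in the Dimension Hierarchies slide) is an assumption made up for the example.

```python
import sqlite3

# Illustrative star schema. SQLite stands in for Oracle here.
# dw_location, dw_sales and all column names except levelx are assumptions.
DDL = """
CREATE TABLE dw_period (
    period_id    INTEGER PRIMARY KEY,  -- surrogate key
    levelx       TEXT NOT NULL,        -- e.g. DAY, MONTH
    period_desc  TEXT NOT NULL
);
CREATE TABLE dw_product (
    product_id   INTEGER PRIMARY KEY,  -- surrogate key
    levelx       TEXT NOT NULL,        -- e.g. ITEM, CATEGORY
    product_desc TEXT NOT NULL
);
CREATE TABLE dw_location (
    location_id  INTEGER PRIMARY KEY,  -- surrogate key
    city         TEXT NOT NULL
);
-- Fact: the key is the concatenation of the dimension keys,
-- the remaining columns are numerically additive measures.
CREATE TABLE dw_sales (
    period_id    INTEGER NOT NULL,
    product_id   INTEGER NOT NULL,
    location_id  INTEGER NOT NULL,
    sales_units  INTEGER NOT NULL,
    sales_amount REAL    NOT NULL,
    PRIMARY KEY (period_id, product_id, location_id)
);
"""


def build_star(conn):
    """Create the three dimensions and the fact table."""
    conn.executescript(DDL)


def fact_key_columns(conn):
    """Return the fact table's key columns in key order."""
    rows = conn.execute("PRAGMA table_info(dw_sales)").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position in the primary key, or 0.
    keyed = sorted((r[5], r[1]) for r in rows if r[5] > 0)
    return [name for _, name in keyed]


conn = sqlite3.connect(":memory:")
build_star(conn)
print(fact_key_columns(conn))
```

The key is declared as a PRIMARY KEY here only to keep the sketch short. The Key Fact Table Issues slide recommends enforcing it with a unique index rather than a constraint on very large Oracle fact tables.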

Facts: 10^8 to 10^10 rows

Dimensions: 10^3 to 10^5 rows

Transform OLTP Model
Fold the OLTP model into itself to form a star:
• De-normalize parent/child relationships
• De-normalize lookup relationships
• Use surrogate or meaningless keys
• Create and populate a time dimension
• Create hierarchies of data in dimensions
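The last three steps (surrogate keys, a populated time dimension, hierarchy levels) can be sketched together. This is an assumed layout, not the author's: it stores one row per period at each level in a dw_period table with a levelx column, again using sqlite3 in place of Oracle. The period_desc column and its formats are hypothetical, and the WEEK level is left out to keep the example short.

```python
import sqlite3
from datetime import date, timedelta


def period_rows(start, end):
    """Yield (levelx, period_desc) once for every YEAR, QUARTER,
    MONTH and DAY between start and end inclusive."""
    seen = set()
    day = start
    while day <= end:
        quarter = (day.month - 1) // 3 + 1
        for row in (
            ("YEAR", f"{day.year}"),
            ("QUARTER", f"{day.year}-Q{quarter}"),
            ("MONTH", f"{day.year}-{day.month:02d}"),
            ("DAY", day.isoformat()),
        ):
            if row not in seen:
                seen.add(row)
                yield row
        day += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dw_period (
           period_id   INTEGER PRIMARY KEY,  -- surrogate key, no business meaning
           levelx      TEXT NOT NULL,
           period_desc TEXT NOT NULL)"""
)
# period_id is omitted so SQLite assigns the surrogate key itself.
conn.executemany(
    "INSERT INTO dw_period (levelx, period_desc) VALUES (?, ?)",
    period_rows(date(1998, 1, 1), date(1998, 12, 31)),
)
counts = dict(conn.execute("SELECT levelx, COUNT(*) FROM dw_period GROUP BY levelx"))
print(counts)
```

Populating the dimension ahead of time means the load programs only have to look up a period_id, never compute calendar logic.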

OLTP Model

Dimensional Model

Dimension Hierarchies
SQL> select distinct levelx from dw_period;

LEVELX
-------------------
DAY
MONTH
QUARTER
WEEK
YEAR

SQL> select distinct levelx from dw_product;

LEVELX
-------------------
ALL PRODUCTS
CATEGORY
ITEM
PSA
SUB_CATEGORY

Avoid Snowflakes

Avoid the natural desire to normalize the model:
• Complicates end-user query construction
• Adds an additional level of “JOIN” complexity
• Database optimizers do not handle it very well
• Saves some space at the cost of longer queries

Snowflake Model

Common Aggregations
Build end-user driven aggregate tables:
• By time (e.g. week, month, quarter, year)
• By geographic regions (e.g. time zones)
• By end-user reporting interests (e.g. beer)
• By dimension hierarchy (e.g. product category)
• Aggregates should be 5 to 10 times smaller
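A time aggregate of the kind listed above can be sketched as a weekly rollup of a daily fact table. The data is synthetic and the table and column names (dw_sales_day, dw_sales_week, week_no) are assumptions for illustration; sqlite3 stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dw_period (period_id INTEGER PRIMARY KEY, week_no INTEGER NOT NULL);
    CREATE TABLE dw_sales_day (period_id INTEGER, product_id INTEGER, sales_amount REAL);
    """
)
# Synthetic data: 364 days (52 whole weeks), 5 products sold every day.
conn.executemany(
    "INSERT INTO dw_period VALUES (?, ?)",
    [(d, (d - 1) // 7 + 1) for d in range(1, 365)],
)
conn.executemany(
    "INSERT INTO dw_sales_day VALUES (?, ?, ?)",
    [(d, p, 10.0) for d in range(1, 365) for p in range(1, 6)],
)

# The aggregate keeps the additive measure and replaces day with week.
conn.execute(
    """
    CREATE TABLE dw_sales_week AS
    SELECT p.week_no, s.product_id, SUM(s.sales_amount) AS sales_amount
    FROM dw_sales_day s
    JOIN dw_period p ON p.period_id = s.period_id
    GROUP BY p.week_no, s.product_id
    """
)


def row_count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


detail_rows = row_count("dw_sales_day")
agg_rows = row_count("dw_sales_week")
print(detail_rows, agg_rows, detail_rows / agg_rows)
```

With this synthetic data the weekly table is 7 times smaller than the detail table, which falls inside the 5 to 10 times guideline. Real ratios depend on how sparse the detail data is.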

Time Aggregates

Non-Time Aggregates

Index Design

All fact table foreign key columns must have individual bitmap indexes on them.

All dimension table non-key columns should have individual bitmap indexes.
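Since the rule is one bitmap index per column, the DDL is easy to generate. The helper below is a sketch that only builds Oracle CREATE BITMAP INDEX statements as strings; it does not connect to a database. The function name, the `_bx` naming convention and the example tables and columns are all assumptions. LOCAL applies only when the table is partitioned, and NOLOGGING follows the advice in the Key Fact Table Issues slide.

```python
def bitmap_index_ddl(table, columns, local=False):
    """Return one Oracle CREATE BITMAP INDEX statement per column.

    local=True adds LOCAL, which is only valid on a partitioned table.
    """
    suffix = " LOCAL" if local else ""
    return [
        f"CREATE BITMAP INDEX {table}_{col}_bx ON {table} ({col}){suffix} NOLOGGING"
        for col in columns
    ]


# Hypothetical fact table: one index per foreign key column.
fact_ddl = bitmap_index_ddl(
    "dw_sales", ["period_id", "product_id", "location_id"], local=True
)
# Hypothetical dimension table: one index per non-key column.
dim_ddl = bitmap_index_ddl("dw_product", ["levelx", "product_desc"])

for stmt in fact_ddl + dim_ddl:
    print(stmt + ";")
```

Generating the statements from a column list keeps the index set complete as columns are added, which matters when the count reaches the dozens.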

10 B-Tree Indexes

48 Bitmap Indexes!!!

Key Fact Table Issues
Fact tables should:
• NOT create or enable foreign key constraints
• NOT create or enable table check constraints
• NOT create or enable primary/unique constraints (use unique indexes, which offer parallel creation)
• NOT create or enable column check constraints (other than simple NOT NULL check constraints)
• NOT create or enable “row” level triggers
• NOT enable logging on tables or their indexes

No PK/UK/FK Constraints

Key Oracle Issues
Trust me: there is no way to build a large DW in Oracle 7.X.
Very brief overview in the next few slides of:
• Partitioning options
• Indexing options
• Comparative timings
• Tuning ad-hoc Star queries
• Serial versus Parallel queries
• Materialized Views …

Oracle Partitioning
• Way beyond the scope of dimensional modeling, but:
• Use Range or List Partitioning on your time dimension
• Fact unique index = local, prefixed b-tree index
• Fact time index = local, prefixed bitmap index
• Fact non-time index = local, non-prefixed bitmap index
• If any non-time dimension provides good locality of reference for typical user queries, then sub-partition on that dimension (i.e., use 8i’s new composite partitioning)
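The prefixed versus non-prefixed distinction in the list above comes down to whether a local index leads with the partitioning column. The sketch below builds a range partitioning clause and labels three indexes accordingly. It only produces Oracle DDL text. The table name, index names, partition names and boundary values are hypothetical, and it assumes the time dimension's surrogate keys increase with calendar time so that ranges of period_id correspond to date ranges.

```python
def range_partition_clause(column, bounds):
    """Build an Oracle PARTITION BY RANGE clause.

    bounds is a list of (partition_name, exclusive_upper_bound)
    pairs in strictly ascending order of bound.
    """
    uppers = [upper for _, upper in bounds]
    if uppers != sorted(set(uppers)):
        raise ValueError("upper bounds must be strictly ascending")
    parts = ",\n".join(
        f"  PARTITION {name} VALUES LESS THAN ({upper})" for name, upper in bounds
    )
    return f"PARTITION BY RANGE ({column}) (\n{parts}\n)"


def is_prefixed(index_columns, partition_key):
    """A local index is prefixed when its leading columns are the partition key."""
    return list(index_columns[: len(partition_key)]) == list(partition_key)


PARTITION_KEY = ["period_id"]
# Hypothetical indexes on a hypothetical dw_sales fact table.
INDEXES = {
    "dw_sales_uk": ("UNIQUE", ["period_id", "product_id", "location_id"]),
    "dw_sales_period_bx": ("BITMAP", ["period_id"]),
    "dw_sales_product_bx": ("BITMAP", ["product_id"]),
}

print(range_partition_clause("period_id", [("p1998q3", 274), ("p1998q4", 366)]))
for name, (kind, cols) in INDEXES.items():
    label = "prefixed" if is_prefixed(cols, PARTITION_KEY) else "non-prefixed"
    print(
        f"CREATE {kind} INDEX {name} ON dw_sales ({', '.join(cols)}) "
        f"LOCAL NOLOGGING;  -- {label}"
    )
```

The unique index and the time index both lead with period_id and so come out prefixed. The product index does not, which matches the three index rules in the list.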

Indexing Options!!!

[Figure: tree of table and index choices. Tables: object or relational; in a cluster or in a tablespace; index-organized or heap-organized; partitioned or non-partitioned. Indexes: partitioned or non-partitioned; global or local. The leaves are 12 numbered options: b-tree for options 1, 2, 4, 6, 7, 9, 10 and 12; bitmap for options 3, 5, 8 and 11.]

Oracle 8i Table Option Timings
Fact Implementation         Timing
Regular “Heap” Table        9,293
Single Column Partition     4,747
Multi Column Partition      4,987
Composite Partition         6,319
Index Organized Table       12,508
Partition Index Organized   14,902

NOTE: specific to my data and user queries

Tuning Star Queries
• Way beyond the scope of dimensional modeling, but:
• Use Oracle 8.X’s Range Partitioning based upon your time dimension (do not try to use hash or composite partitioning)
• Fact unique index uses local, prefixed b-tree index
• Fact time index uses local, prefixed bitmap index
• Fact non-time index uses local, non-prefixed bitmap index

Typical User Query

Query: beer and coffee sales for November of 98 in Dallas
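A query of this shape filters on each dimension and sums an additive measure from the fact table. The sketch below is an assumption about how it might look, run against a tiny made-up data set with sqlite3 standing in for Oracle. All table names, column names and values are hypothetical, and SQLite does not perform Oracle's star transformation, so this shows only the query, not the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dw_period   (period_id INTEGER PRIMARY KEY, year_no INTEGER, month_no INTEGER);
    CREATE TABLE dw_product  (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE dw_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dw_sales    (period_id INTEGER, product_id INTEGER,
                              location_id INTEGER, sales_amount REAL);

    INSERT INTO dw_period   VALUES (1, 1998, 10), (2, 1998, 11);
    INSERT INTO dw_product  VALUES (1, 'BEER'), (2, 'COFFEE'), (3, 'TEA');
    INSERT INTO dw_location VALUES (1, 'DALLAS'), (2, 'AUSTIN');
    INSERT INTO dw_sales VALUES
        (2, 1, 1, 100.0),  -- beer, Nov 98, Dallas
        (2, 2, 1,  40.0),  -- coffee, Nov 98, Dallas
        (2, 3, 1,  25.0),  -- tea: excluded by the product filter
        (1, 1, 1,  70.0),  -- October: excluded by the period filter
        (2, 1, 2,  55.0);  -- Austin: excluded by the location filter
    """
)

# One join per dimension, one filter per dimension, one SUM on the fact.
STAR_QUERY = """
SELECT pr.product_desc, SUM(s.sales_amount) AS sales
FROM dw_sales s
JOIN dw_period   pe ON pe.period_id   = s.period_id
JOIN dw_product  pr ON pr.product_id  = s.product_id
JOIN dw_location lo ON lo.location_id = s.location_id
WHERE pe.year_no = 1998 AND pe.month_no = 11
  AND pr.product_desc IN ('BEER', 'COFFEE')
  AND lo.city = 'DALLAS'
GROUP BY pr.product_desc
ORDER BY pr.product_desc
"""

result = conn.execute(STAR_QUERY).fetchall()
print(result)
```

Every predicate lands on a small dimension table, which is why individual bitmap indexes on the fact table's foreign key columns matter so much for this query pattern.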

Best Explain Plan

Star Transformation

Oracle 8i Query Options
Explain Plan               UNIX     NT
Serial, No Partition       9,688    22,344
Serial, with Partition     5,578    11,625
Parallel, No Partition     ORA600   ORA600
Parallel, with Partition   11,140   25,454

NOTE: specific to my data and user queries

Oracle 8i Materialized Views
• Way beyond the scope of dimensional modeling, but:
• Special form of snapshots (i.e. replication)
• End-users direct all queries against detail table
• Optimizer rewrites queries to use best aggregate
• Optimizer suggests new aggregates based on load
• Eliminates need for numerous aggregation programs
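The "rewrites queries to use best aggregate" idea can be illustrated with a toy model. This is not Oracle's query rewrite algorithm, only a sketch of the principle: among the aggregates that contain every column a query groups by, pick the smallest, and fall back to the detail table otherwise. The function name, aggregate names, column names and row counts are all made up, and the toy ignores hierarchy rollups (for example, deriving category totals from a product-level aggregate).

```python
def pick_aggregate(query_columns, aggregates, detail_table="dw_sales"):
    """Pick the smallest aggregate that covers the query's grouping columns.

    aggregates maps a table name to (grouping_columns, row_count).
    Returns detail_table when no aggregate covers the query.
    """
    needed = set(query_columns)
    candidates = [
        (rows, name)
        for name, (columns, rows) in aggregates.items()
        if needed <= set(columns)
    ]
    return min(candidates)[1] if candidates else detail_table


# Hypothetical aggregates with made-up row counts.
AGGREGATES = {
    "dw_sales_month": ({"month", "product", "city"}, 50_000_000),
    "dw_sales_month_category": ({"month", "category"}, 200_000),
    "dw_sales_year_category": ({"year", "category"}, 20_000),
}

print(pick_aggregate(["month", "category"], AGGREGATES))
print(pick_aggregate(["month"], AGGREGATES))
print(pick_aggregate(["day", "product"], AGGREGATES))
```

The point for end users is the same as in the list above: they always name the detail table, and the choice of aggregate happens behind the query.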

Other DW Presentations
Optimizing Data Warehouse Ad-Hoc Queries against "Star Schemas"
Attendees will learn optimal techniques for designing, monitoring and tuning "Star Schema" Data Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000 foot overview focusing on hardware and software architectures, with some database design. This presentation provides the ground level, detailed recipe for successfully querying tables whose sizes exceed 500 million rows. Issues covered will include table and index designs, partitioning options, statistics and histograms, Oracle initialization parameters and star transformation explain plans. Attendees should be DBAs familiar with "Star Schema" database designs, have at least one year's experience with Oracle 8.0, and some exposure to Oracle 8i.

Optimizing Data Warehouse Loading via Parallelized Pro-C and SQL
Attendees will learn optimal techniques for coding, monitoring and tuning parallel loading of Data Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000 foot overview focusing on hardware and software architectures, with some database design. This presentation provides the ground level, detailed recipe for high speed loading of tables whose sizes exceed 500 million rows. Issues covered will include database instance options, table and index designs, partitioning options, optimizer choices, plus Oracle initialization parameters. Attendees should be DBAs or senior developers familiar with Oracle 8.X, ProC and SMP or MPP UNIX environments.

THANK YOU FOR LISTENING
