Data Warehousing and BA

Published on February 2017 | Categories: Documents | Downloads: 24 | Comments: 0 | Views: 436

Business Analytics

By
Dr. Atanu Rakshit
Email: [email protected]
[email protected]

Business Analytics
• Text Book:
– ‘Business Intelligence: A Managerial Approach’ by
Efraim Turban, Ramesh Sharda, Dursun Delen and
David King, 2/e, Pearson, 2012

• Reference Material:
– ‘Business Analytics for Managers’ by Gert H. N.
Laursen and Jesper Thorlund, Wiley, 2010

Business Analytics
• Reference Material:
– ‘Decision Support and Business Intelligence
Systems’ by Efraim Turban, Ramesh Sharda and
Dursun Delen, 9/e, Pearson, 2012
– ‘Business Intelligence Strategy A Practical Guide
for Achieving BI Excellence’ by John Boyer, Bill
Frank, Brian Green and Tracy Harris, MC Press,
2010

Business Analytics
• Sessions Plan
– Introduction to Business Analytics
– Data Warehousing
– Data Mining for Business Intelligence
– Business Analytics Model
– Business Analytics at the Analytical Level
– Business Analytics at the Strategic Level
– Business Analytics at the Functional Level
– Business Performance Management
– Big Data Analytics
– Project Presentation

Business Analytics

Introduction to Data
Warehousing

Learning Objectives
• Understand the basic definitions and concepts of
data warehouses
• Learn different types of data warehousing
architectures; their comparative advantages and
disadvantages
• Describe the processes used in developing and
managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision
support

Learning Objectives
• Explain data integration and the extraction,
transformation, and load (ETL) processes
• Describe real-time (a.k.a. right-time and/or active)
data warehousing
• Understand data warehouse administration and
security issues

Opening Vignette…
“DirecTV Thrives with Active Data Warehousing”
• Company background
• Problem description
• Proposed solution
• Results
• Answer & discuss the case questions.

Main Data Warehousing Topics
• DW definition
• Characteristics of DW
• Data Marts
• ODS, EDW, Metadata
• DW Framework
• DW Architecture & ETL Process
• DW Development
• DW Issues

What is a Data Warehouse?
• A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format
• “The data warehouse is a collection of integrated,
subject-oriented databases designed to support
DSS functions, where each unit of data is
nonvolatile and relevant to some moment in time”

Characteristics of DW
• Subject oriented
• Integrated
• Time-variant (time series)
• Nonvolatile
• Summarized
• Not normalized
• Metadata
• Web based, relational/multi-dimensional
• Client/server
• Real-time and/or right-time (active)

Data Mart
A departmental data warehouse that stores
only relevant data
– Dependent data mart
A subset that is created directly from a data
warehouse
– Independent data mart
A small data warehouse designed for a
strategic business unit or a department

Data Warehousing Definitions
• Operational data stores (ODS)
A type of database often used as an interim area for a data
warehouse
• Oper marts
An operational data mart
• Enterprise data warehouse (EDW)
A data warehouse for the enterprise
• Metadata
Data about data. In a data warehouse, metadata describe
the contents of a data warehouse and the manner of its
acquisition and use

DW Framework

DW Architecture
• Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access
and analyze data from the warehouse

• Two-tier architecture
– The first two tiers of the three-tier architecture are combined into one

• Sometimes there is only one tier

DW Architectures

OLAP Definition
OLAP is implemented in a multi-user client/server
mode and offers consistently rapid response to queries,
regardless of database size and complexity. OLAP
helps the user synthesize enterprise information
through comparative, personalized viewing, as well as
through analysis of historical and projected data in
various "what-if" data model scenarios. This is
achieved through use of an OLAP Server.

OLAP Server
• An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support
and operate on multi-dimensional data structures.
• A multi-dimensional structure is arranged so that
every data item is located and accessed based on the
intersection of the dimension members which define
that item.
• The design of the server and the structure of the data
are optimized for rapid ad-hoc information retrieval
in any orientation, as well as for fast, flexible
calculation and transformation of raw data based on
formulaic relationships.
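
The idea that every data item is addressed by the intersection of dimension members, and that further values are calculated from formulaic relationships, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `cube`, the `FORMULAS` table, the `cell` function and all sample values are hypothetical and are not taken from any OLAP server product.

```python
# Hypothetical cube: each key is the intersection of one member
# from each dimension (product, region, measure).
cube = {
    ("Juice", "North", "Revenue"): 120.0,
    ("Juice", "North", "Cost"): 90.0,
    ("Cola", "North", "Revenue"): 200.0,
    ("Cola", "North", "Cost"): 140.0,
}

# Hypothetical calculated member defined by a formulaic relationship.
FORMULAS = {
    "Margin": lambda get: get("Revenue") - get("Cost"),
}


def cell(product, region, measure):
    """Return the value at one intersection of dimension members.

    Stored measures are looked up directly; calculated measures are
    derived on the fly from the stored ones.
    """
    if measure in FORMULAS:
        return FORMULAS[measure](lambda m: cell(product, region, m))
    return cube[(product, region, measure)]
```

For example, `cell("Juice", "North", "Margin")` is computed from the stored Revenue and Cost cells rather than being stored itself.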

OLAP Server
• The OLAP Server may either physically stage the
processed multi-dimensional information to deliver
consistent and rapid response times to end users, or it
may populate its data structures in real-time from
relational or other databases, or offer a choice of
both.
• Given the current state of technology and the end
user requirement for consistent and rapid response
times, staging the multi-dimensional data in the
OLAP Server is often the preferred method.

Multi-dimensional Data
• “Hey…I sold $100M worth of goods”
• Dimensions: Product, Region, Time
• Hierarchical summarization paths
– Product: Industry → Category → Product
– Region: Country → Region → City → Office
– Time: Year → Quarter → Month → Day, with Week → Day as an alternative path
[Figure: sales cube with axes Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (N, S, W) and Month (1 to 7)]

A Visual Operation: Pivot (Rotate)
[Figure: a cube face showing Product (Juice, Cola, Milk, Cream) against Date (3/1 to 3/4), with sample cell values 10, 47, 30 and 12]

“Slicing and Dicing”
[Figure: a cube with axes Product (Household, Telecomm, Video, Audio), Region (Europe, Far East, India) and Sales Channel (Retail, Direct, Special), with the Telecomm slice highlighted]

Roll-up and Drill Down
Levels, from the higher level of aggregation down to the low-level details:
• Sales Channel
• Region
• Country
• State
• Location Address
• Sales Representative

Nature of OLAP Analysis
• Aggregation -- (total sales,
percent-to-total)
• Comparison -- Budget vs.
Expenses
• Ranking -- Top 10, quartile
analysis
• Access to detailed and aggregate
data
• Complex criteria specification
• Visualization
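
The first three kinds of analysis in this list (aggregation, comparison and ranking) reduce to small calculations. The sketch below is illustrative only: the function names and the sample figures are assumptions, not data from the course.

```python
# Hypothetical sales totals per product.
sales = {"Juice": 300.0, "Cola": 500.0, "Milk": 150.0, "Cream": 50.0}


def percent_to_total(values):
    """Aggregation: each member's share of the grand total, in percent."""
    grand_total = sum(values.values())
    return {k: 100.0 * v / grand_total for k, v in values.items()}


def variance(budget, actual):
    """Comparison: actual minus budget for every budget line."""
    return {k: actual[k] - budget[k] for k in budget}


def top_n(values, n):
    """Ranking: the n largest members, largest first."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

A call such as `top_n(sales, 2)` returns the two best-selling products, and `percent_to_total(sales)` gives each product's share of overall sales.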

A Web-based DW Architecture
[Figure: Client (Web browser) ↔ Internet/Intranet/Extranet ↔ Web Server (Web pages) ↔ Application Server ↔ Data warehouse]

Data Warehousing Architectures
• Issues to consider when deciding which
architecture to use:
– Which database management system (DBMS)
should be used?
– Will parallel processing and/or partitioning be
used?
– Will data migration tools be used to load the data
warehouse?
– What tools will be used to support data retrieval
and analysis?

Alternative DW Architectures
1. Independent Data Marts
2. Data Mart Bus Architecture
3. Hub-and-Spoke Architecture
4. Centralized Data Warehouse
5. Federated Data Warehouse

• Each has pros and cons!

Teradata Corp. DW Architecture

Data Integration and the Extraction,
Transformation, and Load (ETL) Process
• Data integration
Integration that comprises three major processes: data
access, data federation, and change capture
• Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from
source systems into a data warehouse
• Enterprise information integration (EII)
An evolving tool space that promises real-time data
integration from a variety of sources, such as relational
databases, Web services, and multidimensional databases

Data Integration and the Extraction,
Transformation, and Load (ETL) Process
• Extraction, transformation, and load (ETL)
[Figure: source systems (transient data source, packaged application, legacy system, other internal applications) feed the Extract → Transform → Cleanse → Load steps, which populate the data warehouse and data mart]
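
The Extract, Transform, Cleanse and Load steps can be sketched as four small functions chained together. This is a minimal sketch, assuming a CSV source and a SQLite target; the `RAW` sample data, the `sales` table and every function name are hypothetical, and real ETL tools do considerably more (metadata capture, scheduling, error handling).

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW = """customer,country,amount
 Acme ,no,100.50
Beta,SE,
acme,NO,20
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: bring values into one standardized format."""
    return [
        {
            "customer": r["customer"].strip().title(),
            "country": r["country"].strip().upper(),
            "amount": r["amount"].strip(),
        }
        for r in rows
    ]


def cleanse(rows):
    """Cleanse: drop rows with a missing amount, convert the rest to float."""
    return [{**r, "amount": float(r["amount"])} for r in rows if r["amount"]]


def load(rows, conn):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (:customer, :country, :amount)", rows
    )
    conn.commit()


def run_etl(text, conn):
    """Run the four steps in order."""
    load(cleanse(transform(extract(text))), conn)
```

Keeping each step as a separate function makes it easy to test the cleansing rules on their own before anything is loaded.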

ETL
• Issues affecting the purchase of an ETL tool
– Data transformation tools are expensive
– Data transformation tools may have a long learning curve

• Important criteria in selecting an ETL tool
– Ability to read from and write to an unlimited number of
data sources/architectures
– Automatic capturing and delivery of metadata
– A history of conforming to open standards
– An easy-to-use interface for the developer and the
functional user

Data Warehouse Development
• Data warehouse development approaches
– Inmon Model: EDW approach (top-down)
– Kimball Model: Data mart approach (bottom-up)
– Which model is best?
• There is no one-size-fits-all strategy to DW
– One alternative is the hosted warehouse
• Data warehouse structure:
– The Star Schema vs. Relational
• Real-time data warehousing?

Representation of Data in DW
• Dimensional Modeling – a retrieval-based system that
supports high-volume query access
• Star schema – the most commonly used and the simplest style
of dimensional modeling
– Contains a fact table surrounded by and connected to several
dimension tables
– Fact table contains the attributes (mostly numerical values)
needed to perform decision analysis and query reporting, plus
foreign keys that link to the dimension tables
– Dimension tables contain classification and aggregation information
about the values in the fact table

• Snowflake schema – an extension of star schema where the
diagram resembles a snowflake in shape
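
A star schema can be sketched with Python's built-in `sqlite3` module: one fact table holding the measure and foreign keys, joined to its dimension tables at query time. The table names, column names and sample rows below are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: classification and aggregation information.
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);

    -- Fact table: the numeric measure plus foreign keys to the dimensions.
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER
    );

    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_product VALUES (1, 'BrandA'), (2, 'BrandB');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 1, 3);
    """
)

# A typical star-join query: units sold by quarter and brand.
QUERY = """
SELECT t.quarter, p.brand, SUM(f.units_sold)
FROM fact_sales AS f
JOIN dim_time AS t ON t.time_id = f.time_id
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY t.quarter, p.brand
ORDER BY t.quarter, p.brand
"""
result = conn.execute(QUERY).fetchall()
```

In a snowflake variant, `dim_product` would itself point to a separate brand table instead of storing the brand name directly.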

Multidimensionality
• Multidimensionality
The ability to organize, present, and analyze data by
several dimensions, such as sales by region, by product, by
salesperson, and by time (four dimensions)

• Multidimensional presentation
– Dimensions: products, salespeople, market segments, business units,
geographical locations, distribution channels, country, or industry
– Measures: money, sales volume, head count, inventory profit, actual
versus forecast
– Time: daily, weekly, monthly, quarterly, or yearly

Star vs Snowflake Schema
• Star Schema
– Fact table SALES (UnitsSold, ...)
– Dimension tables: TIME (Quarter, ...), PRODUCT (Brand, ...), PEOPLE (Division, ...), GEOGRAPHY (Country, ...)
• Snowflake Schema
– Fact table SALES (UnitsSold, ...)
– Dimension tables: DATE (Date, ...), MONTH (M_Name, ...), QUARTER (Q_Name, ...), PRODUCT (LineItem, ...), BRAND (Brand, ...), CATEGORY (Category, ...), PEOPLE (Division, ...), STORE (LocID, ...), LOCATION (State, ...)

Analysis of Data in DW
• Online analytical processing (OLAP)
– Data driven activities performed by end users to query the
online system and to conduct analyses
– Data cubes, drill-down / rollup, slice & dice, …

• OLAP Activities
– Generating queries (query tools)
– Requesting ad hoc reports
– Conducting statistical and other analyses
– Developing multimedia-based applications

Analysis of Data Stored in DW
OLTP vs. OLAP
• OLTP (online transaction processing)
– A system that is primarily responsible for capturing and
storing data related to day-to-day business functions
such as ERP, CRM, SCM, and POS
– The main focus is on efficiency of routine tasks

• OLAP (online analytical processing)
– A system designed to address the need for
information extraction by providing effective and
efficient ad hoc analysis of organizational data
– The main focus is on effectiveness

Application-Orientation vs.
Subject-Orientation
• Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
• Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity

OLAP vs. OLTP

OLTP vs Data Warehouse
• OLTP
– Application Oriented
– Used to run business
– Detailed data
– Current up to date
– Isolated Data
– Repetitive access
– Clerical User

• Warehouse (DSS)
– Subject Oriented
– Used to analyze business
– Summarized and refined
– Snapshot data
– Integrated Data
– Ad-hoc access
– Knowledge User (Manager)

OLTP vs Data Warehouse
• OLTP
– Performance Sensitive
– Few Records accessed at
a time (tens)
– Read/Update Access
– No data redundancy
– Database Size 100 MB to 100 GB

• Data Warehouse
– Performance relaxed
– Large volumes accessed
at a time (millions)
– Mostly Read (Batch
Update)
– Redundancy present
– Database Size 100 GB to a few terabytes

OLTP vs Data Warehouse
• OLTP
– Transaction throughput
is the performance
metric
– Thousands of users
– Managed in entirety

• Data Warehouse
– Query throughput is the
performance metric
– Hundreds of users
– Managed by subsets

To summarize ...
• OLTP Systems are
used to “run” a business

• The Data Warehouse
helps to “optimize” the
business

OLAP Operations
• Slice – a subset of a multidimensional array
• Dice – a slice on more than two dimensions
• Drill Down/Up – navigating among levels of data
ranging from the most summarized (up) to the most
detailed (down)
• Roll Up – computing all of the data relationships for
one or more dimensions
• Pivot – used to change the dimensional orientation
of a report or an ad hoc query-page display
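
These operations can be illustrated on a tiny in-memory cube. The sketch below is one possible reading, not a definitive implementation: `dice` is written as the commonly used sub-cube selection (restricting several dimensions at once), `roll_up` aggregates one dimension away, and all names and sample values are hypothetical.

```python
from collections import defaultdict

DIMS = ("product", "region", "month")

# Hypothetical cube: (product, region, month) -> sales volume.
cube = {
    ("Juice", "Europe", "Jan"): 10,
    ("Juice", "India", "Jan"): 4,
    ("Cola", "Europe", "Jan"): 7,
    ("Cola", "Europe", "Feb"): 3,
}


def slice_(cube, dim, member):
    """Slice: fix one dimension to a single member and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}


def dice(cube, **allowed):
    """Dice: keep cells whose members fall in the allowed sets per dimension."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in members for d, members in allowed.items())
    }


def roll_up(cube, dim):
    """Roll up: aggregate one dimension away by summing over its members."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for k, v in cube.items():
        out[k[:i] + k[i + 1:]] += v
    return dict(out)


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional result."""
    return {(b, a): v for (a, b), v in table.items()}
```

Drilling down is the reverse of `roll_up`: it returns to the more detailed cube from which the summary was computed.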

Slicing Operations on a Simple Three-Dimensional Data Cube
[Figure: a 3-dimensional OLAP cube with slicing operations; axes are Time, Geography and Product, and cells are filled with numbers representing sales volumes]
• Sales volumes of a specific Product on variable Time and Region
• Sales volumes of a specific Region on variable Time and Products
• Sales volumes of a specific Time on variable Region and Products

Variations of OLAP
• Multidimensional OLAP (MOLAP)
– OLAP implemented via a specialized
multidimensional database (or data store) that
summarizes transactions into multidimensional
views ahead of time
• Relational OLAP (ROLAP)
– The implementation of an OLAP database on top of
an existing relational database
• Database OLAP and Web OLAP (DOLAP and WOLAP);
Desktop OLAP, …
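
The MOLAP idea of summarizing transactions into multidimensional views ahead of time can be sketched by precomputing one summary per combination of dimensions. This is a simplified illustration: the `precompute` function, the dimension names and the sample transactions are assumptions, and real MOLAP stores use specialized storage rather than Python dictionaries.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "region", "month")

# Hypothetical transaction records.
transactions = [
    {"product": "Juice", "region": "Europe", "month": "Jan", "amount": 10},
    {"product": "Juice", "region": "India", "month": "Jan", "amount": 4},
    {"product": "Cola", "region": "Europe", "month": "Feb", "amount": 3},
    {"product": "Juice", "region": "Europe", "month": "Jan", "amount": 5},
]


def precompute(rows):
    """Build one summary view for every subset of the dimensions."""
    views = {}
    for size in range(len(DIMS) + 1):
        for group in combinations(DIMS, size):
            summary = defaultdict(int)
            for row in rows:
                summary[tuple(row[d] for d in group)] += row["amount"]
            views[group] = dict(summary)
    return views


# Queries are then answered by lookup instead of by scanning transactions.
views = precompute(transactions)
```

With three dimensions this produces eight views; the number doubles with every added dimension, which is why the amount of pre-aggregation is a design trade-off.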

ROLAP / MOLAP Approaches

Relational OLAP: 3 Tier DSS
• Database Layer (Data Warehouse)
– Store atomic data in industry standard RDBMS.
• Application Logic Layer (ROLAP Engine)
– Generate SQL execution plans in the ROLAP engine to obtain OLAP
functionality.
• Presentation Layer (Decision Support Client)
– Obtain multidimensional reports from the DSS Client.
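
The role of the ROLAP engine, turning a multidimensional request into SQL against a relational database, can be sketched as a small query builder. This is an assumption-laden sketch: `build_query`, the `sales` table and its columns are hypothetical, and a real ROLAP engine also handles joins, caching and query optimization.

```python
import sqlite3

# Hypothetical dimension columns of a relational "sales" table.
ALLOWED = {"product", "region", "month"}


def build_query(group_by, filters=None):
    """Translate a request (dimensions to group by, members to filter on)
    into a parameterized SQL statement and its parameter list."""
    filters = filters or {}
    for col in list(group_by) + list(filters):
        if col not in ALLOWED:
            raise ValueError(f"unknown dimension: {col}")
    select = ", ".join(list(group_by) + ["SUM(amount)"])
    sql = f"SELECT {select} FROM sales"
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filters)
    if group_by:
        cols = ", ".join(group_by)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, list(filters.values())
```

The returned statement and parameters can be passed directly to `sqlite3.Connection.execute`; column names are checked against a fixed list because they cannot be bound as parameters.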

MD-OLAP: 2 Tier DSS
• Database Layer and Application Logic Layer (MDDB Engine)
– Store atomic data in a proprietary data structure
(MDDB), pre-calculate as many outcomes as
possible, obtain OLAP functionality via proprietary
algorithms running against this data.
• Presentation Layer (Decision Support Client)
– Obtain multi-dimensional reports from the DSS
Client.

Massive DW and Scalability
• Scalability
– The main issues pertaining to scalability:
• The amount of data in the warehouse
• How quickly the warehouse is expected to grow
• The number of concurrent users
• The complexity of user queries
– Good scalability means that queries and other
data-access functions will grow linearly with the
size of the warehouse

Symmetric Multi Processing
[Figure 6: CPUs, Shared Memory and Disks attached to a Shared Symmetric System Bus, one hop apart. All components are equidistant in classical SMP architectures.]

Real-time/Active DW/BI
• Enabling real-time data updates for real-time
analysis and real-time decision making is
growing rapidly
– Push vs. Pull (of data)

• Concerns about real-time BI
– Not all data should be updated continuously
– Mismatch of reports generated minutes apart
– May be cost prohibitive
– May also be infeasible

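RDW / ADW
• Batch
– Description: Data is loaded in full or incrementally using an off-peak window.
– Latency: Daily or higher
– Capture: Filter Query
– Initialization: Pull
– Target Load: High Impact
– Source Load: High Impact
• Mini-Batch
– Description: Data is loaded incrementally using intra-day loads.
– Latency: Hourly or higher
– Capture: Filter Query
– Initialization: Pull
– Target Load: Low Impact, load frequency is tuneable
– Source Load: Queries at peak times necessary
• Micro-Batch
– Description: Source changes are captured and accumulated to be loaded in intervals.
– Latency: 15 min & higher
– Capture: CDC
– Initialization: Push, then Pull
– Target Load: Low Impact, load frequency is tuneable
– Source Load: Some to none depending on CDC technique
• Real-Time
– Description: Source changes are captured and immediately applied to the DW.
– Latency: Second(s)
– Capture: CDC
– Initialization: Push
– Target Load: Low Impact, load frequency is tuneable
– Source Load: Some to none depending on CDC technique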
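
The micro-batch approach, where captured source changes are accumulated and then loaded together, can be sketched as follows. This is a simplified sketch under stated assumptions: the `MicroBatchLoader` class is hypothetical, a plain dictionary stands in for the warehouse table, and the batch is flushed by change count rather than by a time interval so that the behaviour is easy to follow.

```python
class MicroBatchLoader:
    """Accumulates captured changes and applies them to the target in batches."""

    def __init__(self, warehouse, batch_size):
        self.warehouse = warehouse    # dict standing in for a DW table
        self.batch_size = batch_size  # changes to accumulate before loading
        self.pending = []

    def capture(self, change):
        """Record one captured change: (operation, key, row)."""
        self.pending.append(change)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Apply all accumulated changes to the warehouse, in capture order."""
        for op, key, row in self.pending:
            if op == "delete":
                self.warehouse.pop(key, None)
            else:  # treat anything else as an insert-or-update
                self.warehouse[key] = row
        self.pending.clear()
```

Setting `batch_size` to 1 applies every change as soon as it is captured, which approximates the real-time approach; larger values trade latency for fewer loads on the target.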

RDW / ADW
• Need for real-time data warehousing
– Decision Support has become operational
– Integrated BI requires closed-loop analytics
– The reach and impact of information access for
decision making can affect customer service, SCM,
and beyond.
• Traditional hub-and-spoke architecture is difficult to
keep in sync
• One huge BW so that data is centralized for BI/BA
tools
Real-time/Active DW at Teradata

Enterprise Decision Evolution and DW

Traditional vs Active DW
Environment

DW Administration and Security
• Data warehouse administrator (DWA)
– DWA should…
• have the knowledge of high-performance software, hardware and
networking technologies.
• possess solid business knowledge and insight.
• be familiar with the decision-making processes so as to suitably
design/maintain the data warehouse structure.
• possess excellent communications skills.

• Security and privacy are pressing issues in DW
– Safeguarding the most valuable assets
– Government regulations
– Must be explicitly planned and executed

The Future of DW
• Sourcing…
– Open source software
– SaaS (software as a service)
– Cloud computing
– DW appliances

• Infrastructure…
– Real-time DW
– Data management practices/technologies
– In-memory processing (“super-computing”)
– New DBMS
– Advanced analytics

MDM
Master Data Management (MDM)
Master Data Management
Operational versus Analytical Master Data Management
Demystifying Master Data Management
Would You Like Fries With That? And Does Cross-Selling Justify Master Data Management?
Data management's top eight stories of 2008
Human resources data analytics brings metrics to
workforce management

Master Data Management

© Affecto 2008

Master data – what is it?
• Master data is data shared across computer systems
in the enterprise.
• Master data is the dimension or hierarchy data in
data warehouses and transactional systems
• Master data is core business objects shared by
applications across an enterprise
• Slowly changing Reference data shared across
systems
• Master data is data worth managing

Master data vs. Metadata vs. Transactional
• Transactional
– Company: Affecto; Country: NO; Account: 505050; SubAccount: 500; Date: 20080301; Amount: KR30.000
• Metadata
– Company: Text, nVarchar(50)
– Country: Text, Char(2)
– Account: Integer, Int(6)
– Sub-Account: Integer, Int(3)
– Date: Date, Datetime (YYYYMMDD)
– Amount: Float, Decimal
• Master data
– Products: Software, Hardware, CPU
– Customers: Affecto OY, Affecto AS, Affecto AB
– Country: Europe, Norway, Sweden

Master data applications
• Product master data
– Product Information Management (PIM)
• Customer master data
– Customer Data Integration (CDI)
• Analytical master data
– Hierarchies used for reporting
• Other possible
– Recipe master data
– Vendor master data
– Employee master data

What is Master Data Management
• The processes and technology to produce and
maintain a single clean copy of master data
– The “Golden” record
• An Application for creating and maintaining an
authoritative view of master data including
policies and procedures for access, update,
modification, viewing between systems across
the enterprise
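
Producing a single “golden” record from copies held in several systems can be sketched with a simple survivorship rule. The rule used here (for each field, the most recently updated non-empty value wins) is an assumption made for illustration, as are the `golden_record` function, the record layout and the sample values; real MDM tools support far richer matching and merge policies.

```python
def golden_record(records):
    """Merge per-system copies of one entity into a single record.

    Assumed survivorship rule: for each field, the newest non-empty
    value wins. Dates are ISO strings, so they sort chronologically.
    """
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec["data"].items():
            if value not in (None, ""):
                merged[field] = value
    return merged


# Hypothetical copies of the same customer in two source systems.
records = [
    {
        "system": "ERP",
        "updated": "2008-01-10",
        "data": {"name": "Affecto AS", "country": "NO", "segment": ""},
    },
    {
        "system": "CRM",
        "updated": "2008-03-01",
        "data": {"name": "Affecto A/S", "country": "", "segment": "IT"},
    },
]

golden = golden_record(records)
```

Here the newer CRM name replaces the ERP name, while the country survives from ERP because the CRM copy left it empty.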

Why master data management?
• Different people involved
• Inevitable manual process
• Error-prone, inconsistent
• No way to audit
• No way to roll back changes
• Time and resource consuming
• Updates are “interpreted” by systems experts
[Figure: master data (Accounts, Entity, Project, Product, Location, Channel) is maintained separately in PPS, Essbase, Analysis Services, DW, Dynamics, ERP, SAP and custom systems, passed around via ETL, EAI, admin and review spreadsheets and e-mail between the Business User, PPS Admin and IT Admin]

Master data Management solution
• Single version of truth
• Master data synchronized and validated
• Data maintained by Business Users and domain experts – not systems experts
[Figure: the Business User maintains master data (Accounts, Entity, Project, Product, Location, Channel) in MDM, which connects to PPS, Essbase, Analysis Services, DW, Dynamics, ERP, SAP and custom systems via ETL and EAI]

Governance & Compliance
• Master data governance
– Can you track changes in dimensions?
– Do you know who made the changes?
– Do you know when changes occurred?
– Can you produce a dimension from Q2 last year?

• Compliance
– International accounting standard
– Transparency and auditability
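
The governance questions above (what changed, who changed it, when, and what a dimension looked like at an earlier date) can be answered if every change is stored as a dated version instead of overwriting the old value. The sketch below assumes a hypothetical `DimensionHistory` class with made-up member keys and user names; it is one simple way to keep such history, not a description of any particular MDM product.

```python
from datetime import date


class DimensionHistory:
    """Keeps every version of each dimension member with validity dates."""

    def __init__(self):
        self.versions = []

    def set(self, key, value, on, by):
        """Close the current version of `key` (if any) and open a new one."""
        for v in self.versions:
            if v["key"] == key and v["valid_to"] is None:
                v["valid_to"] = on
        self.versions.append(
            {"key": key, "value": value, "valid_from": on,
             "valid_to": None, "changed_by": by}
        )

    def as_of(self, day):
        """Reproduce the dimension as it looked on a given day."""
        return {
            v["key"]: v["value"]
            for v in self.versions
            if v["valid_from"] <= day
            and (v["valid_to"] is None or day < v["valid_to"])
        }

    def audit(self, key):
        """List when each change to `key` occurred, who made it, and the value."""
        return [
            (v["valid_from"], v["changed_by"], v["value"])
            for v in self.versions
            if v["key"] == key
        ]
```

Calling `as_of` with a date in the second quarter of the previous year returns the dimension as it stood then, and `audit` returns the full change trail for one member.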

DW Implementation Issues
• Tasks for successful DW implementation
– Establishment of service-level agreements and data-refresh
requirements
– Identification of data sources and their governance policies
– Data quality planning
– Data model design
– ETL tool selection
– Relational database software and platform selection
– Data transport
– Data conversion
– End-user support

DW Implementation Guidelines
• Project must fit with corporate strategy & business
objectives
• There must be complete buy-in to the project by executives,
managers, and users
• It is important to manage user expectations about the
completed project
• The data warehouse must be built incrementally
• Build in adaptability, flexibility and scalability
• The project must be managed by both IT and business
professionals
• Only load data that have been cleansed and are of a quality
understood by the organization
• Do not overlook training requirements

Successful DW Implementation
Things to Avoid
• Starting with the wrong sponsorship chain
• Setting expectations that you cannot meet
• Engaging in politically naive behavior
• Loading the data warehouse with information just
because it is available
• Believing that data warehousing database design is
the same as transactional database design
• Choosing a data warehouse manager who is
technology oriented rather than user oriented

Failure Factors in DW Projects
• Lack of executive sponsorship
• Unclear business objectives
• Cultural issues being ignored
– Change management
• Unrealistic expectations
• Inappropriate architecture
• Low data quality / missing information
• Loading data just because it is available

BI / OLAP Portal for Learning
• MicroStrategy, and much more…
• www.TeradataStudentNetwork.com
• Pw: <check with TDUN>

Q&A
