Business Analytics
By
Dr. Atanu Rakshit
Email:
[email protected]
[email protected]
Business Analytics
• Text Book:
– ‘Business Intelligence: A Managerial Approach’ by
Efraim Turban, Ramesh Sharda, Dursun Delen and
David King, 2/e, Pearson, 2012
• Reference Material:
– ‘Business Analytics for Managers’ by Gert H. N.
Laursen and Jesper Thorlund, Wiley, 2010
– ‘Decision Support and Business Intelligence
Systems’ by Efraim Turban, Ramesh Sharda and
Dursun Delen, 9/e, Pearson, 2012
– ‘Business Intelligence Strategy A Practical Guide
for Achieving BI Excellence’ by John Boyer, Bill
Frank, Brian Green and Tracy Harris, MC Press,
2010
Business Analytics
• Sessions Plan
– Introduction to Business Analytics
– Data Warehousing
– Data Mining for Business Intelligence
– Business Analytics Model
– The Business Analytics at the Analytics Level
– Business Analytics at the Strategic Level
– Business Analytics at the Functional Level
– Business Performance Management
– Big Data Analytics
– Project Presentation
Business Analytics
Introduction to Data Warehousing
Learning Objectives
• Understand the basic definitions and concepts of
data warehouses
• Learn different types of data warehousing
architectures; their comparative advantages and
disadvantages
• Describe the processes used in developing and
managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision
support
• Explain data integration and the extraction,
transformation, and load (ETL) processes
• Describe real-time (a.k.a. right-time and/or active)
data warehousing
• Understand data warehouse administration and
security issues
Opening Vignette…
“DirecTV Thrives with Active Data Warehousing”
• Company background
• Problem description
• Proposed solution
• Results
• Answer & discuss the case questions.
Main Data Warehousing Topics
• DW definition
• Characteristics of DW
• Data Marts
• ODS, EDW, Metadata
• DW Framework
• DW Architecture & ETL Process
• DW Development
• DW Issues
What is a Data Warehouse?
• A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format
• “The data warehouse is a collection of integrated,
subject-oriented databases designed to support
DSS functions, where each unit of data is nonvolatile
and relevant to some moment in time”
Characteristics of DW
• Subject oriented
• Integrated
• Time-variant (time series)
• Nonvolatile
• Summarized
• Not normalized
• Metadata
• Web based, relational/multi-dimensional
• Client/server
• Real-time and/or right-time (active)
Data Mart
A departmental data warehouse that stores
only relevant data
– Dependent data mart
A subset that is created directly from a data
warehouse
– Independent data mart
A small data warehouse designed for a
strategic business unit or a department
Data Warehousing Definitions
• Operational data stores (ODS)
A type of database often used as an interim area for a data
warehouse
• Oper marts
An operational data mart
• Enterprise data warehouse (EDW)
A data warehouse for the enterprise
• Metadata
Data about data. In a data warehouse, metadata describe
the contents of a data warehouse and the manner of its
acquisition and use
DW Framework
DW Architecture
• Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access
and analyze data from the warehouse
• Two-tier architecture
The first 2 tiers of the three-tier architecture are combined into one
Sometimes there is only one tier
DW Architectures
OLAP Definition
OLAP is implemented in a multi-user client/server
mode and offers consistently rapid response to queries,
regardless of database size and complexity. OLAP
helps the user synthesize enterprise information
through comparative, personalized viewing, as well as
through analysis of historical and projected data in
various "what-if" data model scenarios. This is
achieved through use of an OLAP Server.
OLAP Server
• An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support
and operate on multi-dimensional data structures.
• A multi-dimensional structure is arranged so that
every data item is located and accessed based on the
intersection of the dimension members which define
that item.
• The design of the server and the structure of the data
are optimized for rapid ad-hoc information retrieval
in any orientation, as well as for fast, flexible
calculation and transformation of raw data based on
formulaic relationships.
OLAP Server
• The OLAP Server may either physically stage the
processed multi-dimensional information to deliver
consistent and rapid response times to end users, or it
may populate its data structures in real-time from
relational or other databases, or offer a choice of
both.
• Given the current state of technology and the end
user requirement for consistent and rapid response
times, staging the multi-dimensional data in the
OLAP Server is often the preferred method.
Multi-dimensional Data
• “Hey…I sold $100M worth of goods”
• Dimensions: Product, Region, Time
• Hierarchical summarization paths
– Product: Industry → Category → Product
– Region: Country → Region → City → Office
– Time: Year → Quarter → Month → Day (with Week → Day as an alternative path)
[Figure: data cube with axes Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (W, S, N), and Month (1 to 7)]
A Visual Operation: Pivot (Rotate)
[Figure: one face of the cube showing Product (Juice, Cola, Milk, Cream) against Date (3/1 to 3/4), with sample cell values 10, 47, 30, and 12]
“Slicing and Dicing”
[Figure: cube with dimensions Product (Household, Telecomm, Video, Audio), Region (Europe, Far East, India), and Sales Channel (Retail, Direct, Special), with the Telecomm slice highlighted]
Roll-up and Drill Down
Higher Level of Aggregation
• Sales Channel
• Region
• Country
• State
• Location Address
• Sales Representative
Low-level Details
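The roll-up and drill-down hierarchy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-level hierarchy (region, country, state), the sample rows, and the function names `roll_up` and `drill_down` are invented for the example and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, country, state, sales). Values are made up.
DETAIL = [
    ("Asia", "India", "Maharashtra", 120),
    ("Asia", "India", "Karnataka", 80),
    ("Asia", "Japan", "Kanto", 150),
    ("Europe", "France", "Normandy", 60),
]

LEVELS = ("region", "country", "state")  # highest to lowest level


def roll_up(rows, level):
    """Aggregate sales up to the given level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for *path, sales in rows:
        totals[tuple(path[:depth])] += sales
    return dict(totals)


def drill_down(rows, parent):
    """Show the next, more detailed level beneath one parent member."""
    depth = len(parent) + 1
    totals = defaultdict(int)
    for *path, sales in rows:
        if tuple(path[:len(parent)]) == parent:
            totals[tuple(path[:depth])] += sales
    return dict(totals)


by_region = roll_up(DETAIL, "region")
asia_by_country = drill_down(DETAIL, ("Asia",))
```

Rolling up moves toward the higher level of aggregation (fewer, larger totals); drilling down on one member exposes the low-level details beneath it.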
Nature of OLAP Analysis
• Aggregation: total sales, percent-to-total
• Comparison: Budget vs. Expenses
• Ranking: Top 10, quartile analysis
• Access to detailed and aggregate data
• Complex criteria specification
• Visualization
A Web-based DW Architecture
[Figure: Client (Web browser) connects over the Internet/Intranet/Extranet to a Web Server that serves Web pages; the Web Server works with an Application Server, which accesses the Data warehouse]
Data Warehousing Architectures
• Issues to consider when deciding which
architecture to use:
– Which database management system (DBMS)
should be used?
– Will parallel processing and/or partitioning be
used?
– Will data migration tools be used to load the data
warehouse?
– What tools will be used to support data retrieval
and analysis?
Alternative DW Architectures
1. Independent Data Marts
2. Data Mart Bus Architecture
3. Hub-and-Spoke Architecture
4. Centralized Data Warehouse
5. Federated Data Warehouse
• Each has pros and cons!
Teradata Corp. DW Architecture
Data Integration and the Extraction,
Transformation, and Load (ETL) Process
• Data integration
Integration that comprises three major processes: data
access, data federation, and change capture
• Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from
source systems into a data warehouse
• Enterprise information integration (EII)
An evolving tool space that promises real-time data
integration from a variety of sources, such as relational
databases, Web services, and multidimensional databases
Data Integration and the Extraction,
Transformation, and Load (ETL) Process
• Extraction, transformation, and load (ETL)
[Figure: ETL flow. Sources (transient data source, packaged application, legacy system, other internal applications) pass through Extract, Transform, Cleanse, and Load into the data warehouse and data mart]
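A minimal sketch of the extract, transform/cleanse, and load steps, using only Python's standard library. The source rows, the `sales` table, and the cleansing rules are assumptions made up for illustration; a real ETL tool would read from actual source systems and capture metadata as well.

```python
import sqlite3

# Hypothetical rows as they might arrive from a legacy source system.
SOURCE_ROWS = [
    {"cust": " alice ", "country": "in", "amount": "100.50"},
    {"cust": "Bob", "country": "IN", "amount": "n/a"},  # fails validation
    {"cust": "Carol", "country": "no", "amount": "75"},
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform and cleanse: standardize formats, set aside invalid rows."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append((row["cust"].strip().title(), row["country"].upper(), amount))
    return clean, rejected


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract())
load(conn, clean_rows)
loaded = conn.execute(
    "SELECT customer, country, amount FROM sales ORDER BY customer"
).fetchall()
```

Keeping rejected rows in a separate list, rather than silently dropping them, is one simple way to make data-quality problems visible before loading.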
ETL
• Issues affecting the purchase of an ETL tool
– Data transformation tools are expensive
– Data transformation tools may have a long learning curve
• Important criteria in selecting an ETL tool
– Ability to read from and write to an unlimited number of
data sources/architectures
– Automatic capturing and delivery of metadata
– A history of conforming to open standards
– An easy-to-use interface for the developer and the
functional user
Data Warehouse Development
• Data warehouse development approaches
– Inmon Model: EDW approach (top-down)
– Kimball Model: Data mart approach (bottom-up)
– Which model is best?
• There is no one-size-fits-all strategy to DW
– One alternative is the hosted warehouse
• Data warehouse structure:
– The Star Schema vs. Relational
• Real-time data warehousing?
Representation of Data in DW
• Dimensional Modeling – a retrieval-based system that
supports high-volume query access
• Star schema – the most commonly used and the simplest style
of dimensional modeling
– Contains a fact table surrounded by and connected to several
dimension tables
– Fact table contains the measures (numerical values)
needed to perform decision analysis and query reporting
– Dimension tables contain classification and aggregation information
about the values in the fact table
• Snowflake schema – an extension of star schema in which dimension
tables are normalized into further tables, so the diagram resembles a
snowflake in shape
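A star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_time`, `dim_product`, `units_sold`) and the sample rows are assumptions chosen for illustration, loosely echoing a sales example with time and product dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification and aggregation information
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);

-- Fact table: numeric measures plus keys pointing at the dimensions
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);

INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_product VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 2, 3), (2, 2, 4);
""")

# A typical analytical query: join the fact table to its dimensions
# and aggregate the measure by dimension attributes.
units_by_quarter_brand = conn.execute("""
    SELECT t.quarter, p.brand, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY t.quarter, p.brand
    ORDER BY t.quarter, p.brand
""").fetchall()
```

In a snowflake variant, `dim_product` would itself point at further tables (for example a separate brand table) instead of storing the brand name directly.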
Multidimensionality
• Multidimensionality
The ability to organize, present, and analyze data by
several dimensions, such as sales by region, by product, by
salesperson, and by time (four dimensions)
• Multidimensional presentation
– Dimensions: products, salespeople, market segments, business units,
geographical locations, distribution channels, country, or industry
– Measures: money, sales volume, head count, inventory profit, actual
versus forecast
– Time: daily, weekly, monthly, quarterly, or yearly
Star vs Snowflake Schema
[Figure: Star schema: fact table SALES (UnitsSold, ...) connected to dimensions TIME (Quarter, ...), PRODUCT (Brand, ...), PEOPLE (Division, ...), and GEOGRAPHY (Country, ...)]
[Figure: Snowflake schema: fact table SALES (UnitsSold, ...) connected to dimensions DATE (Date, ...), MONTH (M_Name, ...), QUARTER (Q_Name, ...), PRODUCT (LineItem, ...), BRAND (Brand, ...), CATEGORY (Category, ...), PEOPLE (Division, ...), STORE (LocID, ...), and LOCATION (State, ...)]
Analysis of Data in DW
• Online analytical processing (OLAP)
– Data driven activities performed by end users to query the
online system and to conduct analyses
– Data cubes, drill-down / rollup, slice & dice, …
• OLAP Activities
– Generating queries (query tools)
– Requesting ad hoc reports
– Conducting statistical and other analyses
– Developing multimedia-based applications
Analysis of Data Stored in DW
OLTP vs. OLAP
• OLTP (online transaction processing)
– A system that is primarily responsible for capturing and
storing data related to day-to-day business functions
such as ERP, CRM, SCM, and POS
– The main focus is on efficiency of routine tasks
• OLAP (online analytical processing)
– A system designed to address the need for
information extraction by providing effective and
efficient ad hoc analysis of organizational data
– The main focus is on effectiveness
Application-Orientation vs. Subject-Orientation
[Figure: Application-orientation (operational database): Loans, Credit Card, Trust, Savings. Subject-orientation (data warehouse): Customer, Vendor, Product, Activity]
OLAP vs. OLTP
OLTP vs Data Warehouse
• OLTP
– Application Oriented
– Used to run business
– Detailed data
– Current up to date
– Isolated Data
– Repetitive access
– Clerical User
• Warehouse (DSS)
– Subject Oriented
– Used to analyze business
– Summarized and refined
– Snapshot data
– Integrated Data
– Ad-hoc access
– Knowledge User (Manager)
OLTP vs Data Warehouse
• OLTP
– Performance Sensitive
– Few Records accessed at a time (tens)
– Read/Update Access
– No data redundancy
– Database Size: 100 MB to 100 GB
• Data Warehouse
– Performance relaxed
– Large volumes accessed at a time (millions)
– Mostly Read (Batch Update)
– Redundancy present
– Database Size: 100 GB to a few terabytes
OLTP vs Data Warehouse
• OLTP
– Transaction throughput is the performance metric
– Thousands of users
– Managed in entirety
• Data Warehouse
– Query throughput is the performance metric
– Hundreds of users
– Managed by subsets
To summarize ...
• OLTP Systems are
used to “run” a business
• The Data Warehouse
helps to “optimize” the
business
OLAP Operations
• Slice – a subset of a multidimensional array
• Dice – a slice on more than two dimensions
• Drill Down/Up – navigating among levels of data
ranging from the most summarized (up) to the most
detailed (down)
• Roll Up – computing all of the data relationships for
one or more dimensions
• Pivot – used to change the dimensional orientation
of a report or an ad hoc query-page display
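The operations above can be illustrated on a toy cube held in a plain Python dictionary. Everything here is an assumption made for the example: the cube contents, the dimension names, and the helper functions `slice_cube`, `dice`, and `pivot`. Here `dice` is taken to mean restricting several dimensions at once.

```python
# A tiny cube stored as {(product, region, month): sales}. All values are made up.
CUBE = {
    ("Juice", "N", 1): 10, ("Juice", "S", 1): 20,
    ("Cola", "N", 1): 30, ("Cola", "S", 2): 40,
    ("Milk", "N", 2): 50, ("Milk", "S", 2): 60,
}
DIMS = ("product", "region", "month")


def slice_cube(cube, dim, member):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == member}


def dice(cube, **members):
    """Dice: restrict several dimensions to sets of allowed members."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in allowed for d, allowed in members.items())
    }


def pivot(cube, row_dim, col_dim):
    """Pivot: lay out two chosen dimensions as rows and columns, summing the rest."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = {}
    for k, v in cube.items():
        row = table.setdefault(k[r], {})
        row[k[c]] = row.get(k[c], 0) + v
    return table


cola_slice = slice_cube(CUBE, "product", "Cola")
north_early = dice(CUBE, region={"N"}, month={1})
by_region_month = pivot(CUBE, "region", "month")
```

Calling `pivot(CUBE, "month", "region")` instead would present the same totals with rows and columns swapped, which is the change of orientation the Pivot bullet describes.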
[Figure: Slicing operations on a simple three-dimensional OLAP data cube with dimensions Time, Geography, and Product. Cells are filled with numbers representing sales volumes. Three slices are shown: sales volumes of a specific Product on variable Time and Region; sales volumes of a specific Region on variable Time and Products; sales volumes of a specific Time on variable Region and Products]
Variations of OLAP
• Multidimensional OLAP (MOLAP)
OLAP implemented via a specialized
multidimensional database (or data store) that
summarizes transactions into multidimensional
views ahead of time
• Relational OLAP (ROLAP)
The implementation of an OLAP database on top of
an existing relational database
• Database OLAP and Web OLAP (DOLAP and WOLAP);
Desktop OLAP,…
ROLAP / MOLAP Approaches
Relational OLAP: 3 Tier DSS
• Database Layer (Data Warehouse): Store atomic data in
industry standard RDBMS.
• Application Logic Layer (ROLAP Engine): Generate SQL execution
plans in the ROLAP engine to obtain OLAP functionality.
• Presentation Layer (Decision Support Client): Obtain
multidimensional reports from the DSS Client.
MD-OLAP: 2 Tier DSS
• Database Layer and Application Logic Layer (MDDB Engine): Store
atomic data in a proprietary data structure (MDDB), pre-calculate as
many outcomes as possible, obtain OLAP functionality via proprietary
algorithms running against this data.
• Presentation Layer (Decision Support Client): Obtain
multi-dimensional reports from the DSS Client.
Massive DW and Scalability
• Scalability
– The main issues pertaining to scalability:
• The amount of data in the warehouse
• How quickly the warehouse is expected to grow
• The number of concurrent users
• The complexity of user queries
– Good scalability means that queries and other
data-access functions will grow linearly with the
size of the warehouse
Symmetric Multi Processing
[Figure 6: CPUs, shared memory, and disks attached to a shared symmetric system bus, each one hop apart. All components are equidistant in classical SMP architectures.]
Real-time/Active DW/BI
• Enabling real-time data updates for real-time
analysis and real-time decision making is
growing rapidly
– Push vs. Pull (of data)
• Concerns about real-time BI
– Not all data should be updated continuously
– Mismatch of reports generated minutes apart
– May be cost prohibitive
– May also be infeasible
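RDW / ADW
• Batch
– Description: Data is loaded in full or incrementally using an off-peak window.
– Latency: Daily or higher
– Capture: Filter Query
– Initialization: Pull
– Target Load: High Impact
– Source Load: High Impact
• Mini-Batch
– Description: Data is loaded incrementally using intra-day loads.
– Latency: Hourly or higher
– Capture: Filter Query
– Initialization: Pull
– Target Load: Low Impact, load frequency is tuneable
– Source Load: Queries at peak times necessary
• Micro-Batch
– Description: Source changes are captured and accumulated to be loaded in intervals.
– Latency: 15 min & higher
– Capture: CDC
– Initialization: Push, then Pull
– Target Load: Low Impact, load frequency is tuneable
– Source Load: Some to none depending on CDC technique
• Real-Time
– Description: Source changes are captured and immediately applied to the DW.
– Latency: Second(s)
– Capture: CDC
– Initialization: Push
– Target Load: Low Impact, load frequency is tuneable
– Source Load: Some to none depending on CDC technique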
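The difference between micro-batch and real-time loading can be sketched as follows. This is a simplified illustration, not a real change data capture (CDC) implementation: the in-memory `pending` list stands in for a captured change log, and all names and sample changes are hypothetical.

```python
# The "warehouse" and the captured change log are plain in-memory structures here.
warehouse = {}
pending = []  # changes captured at the source but not yet applied


def capture(key, value):
    """Source side: record a change instead of re-querying the whole table."""
    pending.append((key, value))


def apply_micro_batch():
    """Warehouse side: apply all accumulated changes in one interval load."""
    applied = len(pending)
    for key, value in pending:
        warehouse[key] = value
    pending.clear()
    return applied


def apply_real_time(key, value):
    """Real-time variant: push each change to the warehouse immediately."""
    warehouse[key] = value


capture("order-1", "placed")
capture("order-2", "placed")
capture("order-1", "shipped")
before_load = dict(warehouse)  # still empty until the interval load runs
applied_count = apply_micro_batch()
apply_real_time("order-3", "placed")
```

With micro-batching, the warehouse lags the source by up to one load interval; the real-time path removes that lag at the cost of touching the warehouse on every change.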
RDW / ADW
• Need for real-time data warehousing
• Decision Support has become operational
• Integrated BI requires closed-loop analytics
• The reach and impact of information access for
decision making can affect customer service, SCM,
and beyond.
• Traditional hub-and-spoke architecture is difficult to
keep in sync
• One huge DW so that data is centralized for BI/BA
tools
Real-time/Active DW at Teradata
Enterprise Decision Evolution and DW
Traditional vs Active DW
Environment
DW Administration and Security
• Data warehouse administrator (DWA)
– DWA should…
• have the knowledge of high-performance software, hardware and
networking technologies.
• possess solid business knowledge and insight.
• be familiar with the decision-making processes so as to suitably
design/maintain the data warehouse structure.
• possess excellent communications skills.
• Security and privacy are pressing issues in DW
– Safeguarding the most valuable assets
– Government regulations
– Must be explicitly planned and executed
The Future of DW
• Sourcing…
– Open source software
– SaaS (software as a service)
– Cloud computing
– DW appliances
• Infrastructure…
– Real-time DW
– Data management practices/technologies
– In-memory processing (“super-computing”)
– New DBMS
– Advanced analytics
MDM
Master Data Management (MDM)
Master Data Management
Operational versus Analytical Master Data Management
Demystifying Master Data Management
Would You Like Fries With That? And Does Cross-Selling Justify Master Data Management?
Data management's top eight stories of 2008
Human resources data analytics brings metrics to
workforce management
Master Data Management
© Affecto 2008
Master data – what is it?
• Master data is data shared across computer systems
in the enterprise.
• Master data is the dimension or hierarchy data in
data warehouses and transactional systems
• Master data is core business objects shared by
applications across an enterprise
• Slowly changing Reference data shared across
systems
• Master data is data worth managing
Master data vs. Metadata vs. Transactional
• Transactional
– Company: Affecto
– Country: NO
– Account: 505050
– SubAccount: 500
– Date: 20080301
– Amount: KR30.000
• Metadata
– Company: Text, nVarchar(50)
– Country: Text, Char(2)
– Account: Integer, Int(6)
– Sub-Account: Integer, Int(3)
– Date: Date, Datetime (YYYYMMDD)
– Amount: Float, Decimal
• Master data
– Products: Software, Hardware, CPU
– Customers: Affecto OY, Affecto AS, Affecto AB
– Country: Europe, Norway, Sweden
Master data applications
• Product master data
– Product Information Management (PIM)
• Customer master data
– Customer Data Integration (CDI)
• Analytical master data
– Hierarchies used for reporting
• Other possible
– Recipe master data
– Vendor master data
– Employee master data
What is Master Data Management
• The processes and technology to produce and
maintain a single clean copy of master data
– The “Golden” record
• An Application for creating and maintaining an
authoritative view of master data including
policies and procedures for access, update,
modification, viewing between systems across
the enterprise
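One way to picture the “Golden” record is as a merge of the copies of one customer held in several systems. The sketch below is a hedged illustration only: the records, the `SOURCE_PRIORITY` trust order, and the rule "take the first non-empty value from the most trusted system" are all assumptions; real MDM tools offer many other matching and survivorship rules.

```python
# Hypothetical copies of the same customer held in three systems.
RECORDS = [
    {"system": "ERP", "name": "Affecto AS", "country": "NO", "city": ""},
    {"system": "CRM", "name": "AFFECTO  as", "country": "", "city": "Oslo"},
    {"system": "DW", "name": "Affecto AS", "country": "NO", "city": ""},
]

SOURCE_PRIORITY = ("ERP", "CRM", "DW")  # assumed trust order, most trusted first
FIELDS = ("name", "country", "city")


def match_key(record):
    """Normalize the name so that variants of one customer share a key."""
    return " ".join(record["name"].lower().split())


def golden_record(records):
    """Per field, keep the first non-empty value from the most trusted system."""
    ranked = sorted(records, key=lambda r: SOURCE_PRIORITY.index(r["system"]))
    return {f: next((r[f] for r in ranked if r[f]), None) for f in FIELDS}


# Group records that refer to the same customer, then merge each group.
groups = {}
for rec in RECORDS:
    groups.setdefault(match_key(rec), []).append(rec)

golden = {key: golden_record(recs) for key, recs in groups.items()}
```

The merged record takes the name and country from the ERP copy and fills the missing city from the CRM copy, giving a single clean version that the other systems could then be synchronized from.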
Why master data management?
• Different people involved
• Inevitable manual process
• Error-prone, inconsistent
• No way to audit
• No way to roll back changes
• Time and resource consuming
• Updates are “interpreted” by systems experts
[Figure: without MDM, the dimensions (Accounts, Entity, Project, Product, Location, Channel) are maintained separately in PPS, Essbase, Analysis Services, DW, ETL, EAI, and ERP systems (Dynamics, SAP, Custom), with changes passed between the Business User, PPS Admin, and IT Admin through admin and review spreadsheets and e-mail]
Master data Management solution
• Single version of truth
• Master data synchronized and validated
• Data maintained by Business Users and domain experts,
not systems experts
[Figure: with MDM, the Business User maintains the dimensions (Accounts, Entity, Project, Product, Location, Channel) in a central MDM system, which feeds PPS, Essbase, Analysis Services, DW, ETL, EAI, and ERP systems (Dynamics, SAP, Custom)]
Governance & Compliance
• Master data governance
– Can you track changes in dimensions?
– Do you know who made the changes?
– Do you know when changes occurred?
– Can you produce a dimension from Q2 last year?
• Compliance
– International accounting standard
– Transparency and auditability
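The governance questions above (who changed what, when, and what the dimension looked like at an earlier date) can be answered if every change is stored as a new version with a validity period, an approach commonly called a type 2 slowly changing dimension. The sketch below is an assumption-laden illustration: the field names, the user names, and the "Nordics" parent are invented for the example.

```python
from datetime import date

# History rows for one dimension: each change closes the old version
# and opens a new one, recording who made the change and when.
history = []


def set_member(key, parent, changed_by, on):
    """Record a new version of a dimension member, keeping the old one."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            row["valid_to"] = on  # close the current version
    history.append({"key": key, "parent": parent, "changed_by": changed_by,
                    "valid_from": on, "valid_to": None})


def dimension_as_of(day):
    """Rebuild the dimension as it looked on a given day."""
    return {
        row["key"]: row["parent"] for row in history
        if row["valid_from"] <= day
        and (row["valid_to"] is None or day < row["valid_to"])
    }


set_member("Affecto AS", "Norway", "analyst_a", date(2008, 1, 1))
set_member("Affecto AS", "Nordics", "analyst_b", date(2008, 9, 1))

q2_view = dimension_as_of(date(2008, 6, 30))
current_view = dimension_as_of(date(2008, 12, 31))
```

Because old versions are closed rather than overwritten, the history list doubles as an audit trail of who made each change and when.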
DW Implementation Issues
• Tasks for successful DW implementation
– Establishment of service-level agreements and data-refresh
requirements
– Identification of data sources and their governance policies
– Data quality planning
– Data model design
– ETL tool selection
– Relational database software and platform selection
– Data transport
– Data conversion
– End-user support
DW Implementation Guidelines
• Project must fit with corporate strategy & business
objectives
• There must be complete buy-in to the project by executives,
managers, and users
• It is important to manage user expectations about the
completed project
• The data warehouse must be built incrementally
• Build in adaptability, flexibility and scalability
• The project must be managed by both IT and business
professionals
• Only load data that have been cleansed and are of a quality
understood by the organization
• Do not overlook training requirements
Successful DW Implementation
Things to Avoid
• Starting with the wrong sponsorship chain
• Setting expectations that you cannot meet
• Engaging in politically naive behavior
• Loading the data warehouse with information just
because it is available
• Believing that data warehousing database design is
the same as transactional database design
• Choosing a data warehouse manager who is
technology oriented rather than user oriented
Failure Factors in DW Projects
• Lack of executive sponsorship
• Unclear business objectives
• Cultural issues being ignored
– Change management
• Unrealistic expectations
• Inappropriate architecture
• Low data quality / missing information
• Loading data just because it is available
BI / OLAP Portal for Learning
• MicroStrategy, and much more…
• www.TeradataStudentNetwork.com
• Pw: <check with TDUN>
Q&A