Business Intelligence & Data Mining-14

Published on January 2017

Lessons & Challenges
from Mining Retail E-Commerce Data
Kohavi et al. (2004)

Motivation
• Important domain of data mining
• Massive amounts of data are collected
• Data collection is automatic and not prone to errors
• Data is ‘rich’: it has a lot of potential for discovering patterns
• Three types of data: clickstream data, transactional data and user profile data
• Combined mining of these 3 types of data is possible

The E-Commerce Data Mining Suite
• E-Commerce data mining suite developed by Blue Martini Software
• Purchased and used by many ‘Brand Name’ retailers: Debenhams, Harley Davidson, Sainsbury’s, Sprint etc.
• System designed specifically for BI
• End-to-end solution:
  - Data Collection
  - Data Warehousing
  - Data Transformations
  - Visualization
  - Data Mining

The Business Intelligence Process
[Figure: process flow from Data Sources, through Data Cleaning and Data Integration, into the Data Warehouse; then Selection and Reduction to Task-relevant Data, followed by Data Mining and Pattern Evaluation]

The Experience Shared
• Business Lessons & Technical Lessons have been shared
• Data Mining projects executed for more than 20 clients
• Clients from different industry verticals with varying
business models
• Clients spread over: US, Europe, Asia & Africa
• Data sizes up to 100 million records
• Diverse data:
  - Clickstream
  - User Profile
  - Demographic
  - Response to Mail Campaigns
  - Orders placed through website / telephone / in-store

Business Lessons

Requirements Gathering is Challenging
• Clients are reluctant to list “business questions”
  - They may not know what questions to ask
  - They do not understand the underlying technology and how much it can do
• Clients present standard reporting-type questions, e.g.
  - What is the gender-wise distribution of customers?
  - What is the region-wise response rate of the mail campaign?
• Instead of asking questions like:
  - What are the characteristics of customers who spend more than $500?
  - What kind of people responded to the mail campaign?
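
The contrast between the two kinds of question can be sketched in a few lines. The customer records and field names below are invented for illustration and are not data from the study: `distribution` answers a reporting-style question, while `characterize` compares a segment (here, customers who spend more than $500) against the whole customer base, which is closer to what a mining question asks for.

```python
from collections import Counter

# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"gender": "F", "region": "North", "spend": 720},
    {"gender": "M", "region": "North", "spend": 150},
    {"gender": "F", "region": "South", "spend": 90},
    {"gender": "F", "region": "North", "spend": 640},
    {"gender": "M", "region": "South", "spend": 510},
    {"gender": "M", "region": "South", "spend": 60},
]

def distribution(rows, attribute):
    """Reporting-style question: share of rows per value of one attribute."""
    counts = Counter(row[attribute] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}

def characterize(rows, is_target, attributes):
    """Mining-style question: how over- or under-represented is each
    attribute value in the target segment, relative to all rows?"""
    target = [row for row in rows if is_target(row)]
    lifts = {}
    for attribute in attributes:
        base = distribution(rows, attribute)
        segment = distribution(target, attribute)
        for value, share in segment.items():
            lifts[(attribute, value)] = share / base[value]
    return lifts

gender_report = distribution(customers, "gender")
big_spender_profile = characterize(
    customers, lambda row: row["spend"] > 500, ["gender", "region"]
)
```

A ratio above 1 in `big_spender_profile` means the value is more common among big spenders than among customers overall; a real analysis would use many more attributes and a proper significance check.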

Educating the Users
• Involving the users is critical for success
  - Understanding the business
  - Uncovering the real needs
• Users will have to be educated
  - What can be achieved by BI
  - Prototypes / Demo Systems
  - Case studies

Business Events
• The architecture records
  - Every customer search and number of results returned: too many rows, no rows
  - Shopping cart events: add to cart, change quantity, delete
  - Registration, log-in, checkout, payment, order confirmation
  - Any failure / crashes
  - User’s timezone
  - Technical capabilities of the user’s computer
• These details are collected particularly because they are useful for ANALYSIS
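
As a rough illustration of what one recorded business event might look like, here is a minimal sketch. The class, field names and the "too many rows" threshold are assumptions made for this example; the slides do not describe Blue Martini's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

TOO_MANY_RESULTS = 200  # assumed threshold; the source gives no number

@dataclass
class BusinessEvent:
    """One application-level event, already tied to a session."""
    session_id: str
    event_type: str      # e.g. "search", "add_to_cart", "checkout", "failure"
    timestamp: datetime
    details: dict = field(default_factory=dict)

def make_search_event(session_id, query, result_count, timestamp):
    """Record a search together with an outcome flag useful for analysis."""
    if result_count == 0:
        outcome = "no_rows"
    elif result_count > TOO_MANY_RESULTS:
        outcome = "too_many_rows"
    else:
        outcome = "ok"
    details = {"query": query, "result_count": result_count, "outcome": outcome}
    return BusinessEvent(session_id, "search", timestamp, details)

now = datetime(2004, 1, 15, 9, 30, tzinfo=timezone.utc)
events = [
    make_search_event("s-1", "leather jacket", 0, now),
    make_search_event("s-1", "jacket", 950, now),
    BusinessEvent("s-1", "add_to_cart", now, {"sku": "J-100", "quantity": 1}),
]

# Searches that returned nothing, or too much, are candidates for follow-up.
failed_searches = [
    e for e in events
    if e.event_type == "search" and e.details["outcome"] != "ok"
]
```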

Data Collection
• Usual methods of data collection:
  - Stateless HTTP requests from multiple web servers
  - Parsing and loading them session-wise and user-wise
  - Difficult: web logs were designed for debugging web servers, not to provide data for BI
• Blue Martini architecture was designed for BI
  - Session & user data collected and linked together at Application Server level
  - Transactions automatically tied to sessions
  - All data automatically recorded in a database
  - Pre-processing and data cleaning is not required
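
To show why the web-log route is laborious, here is a minimal sketch of the sessionization step that application-server logging avoids. The tuple layout and the 30-minute inactivity timeout are assumptions for illustration; real logs would first have to be parsed and merged across servers, and visitors would have to be identified via cookies or similar.

```python
from collections import defaultdict

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed inactivity cut-off, not from the source

def sessionize(requests, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (visitor_id, unix_time, url) tuples into per-visitor sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same visitor exceeds `timeout` seconds.
    """
    by_visitor = defaultdict(list)
    for visitor_id, unix_time, url in requests:
        by_visitor[visitor_id].append((unix_time, url))

    sessions = []
    for visitor_id, hits in by_visitor.items():
        hits.sort()  # requests from different servers arrive out of order
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            if hit[0] - previous[0] > timeout:
                sessions.append((visitor_id, current))
                current = []
            current.append(hit)
        sessions.append((visitor_id, current))
    return sessions

# Requests as they might look after merging logs from two web servers.
merged_log = [
    ("a", 0, "/home"),
    ("b", 10, "/home"),
    ("a", 4000, "/cart"),
    ("a", 60, "/search"),
    ("b", 20, "/item"),
]
sessions = sessionize(merged_log)
```

Even this simplified version has to guess where sessions end; transactions would still need to be matched to sessions afterwards.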

Data Collection Lessons
• Collect the right data upfront
  - All data that could be useful should be collected and integrated
  - Stored in a database / data warehouse
• Integrate with External Events
  - Marketing events like promotions
  - Cannot be captured by the data collection systems
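
Integrating an external event calendar can be as simple as tagging each order with the promotions that were running on its date. The promotion names, dates and field names below are hypothetical; the point is only that this information has to be supplied from outside the collected data.

```python
from datetime import date

# Hypothetical marketing calendar maintained outside the website.
promotions = [
    {"name": "spring_sale", "start": date(2004, 3, 1), "end": date(2004, 3, 14)},
    {"name": "free_shipping", "start": date(2004, 3, 10), "end": date(2004, 3, 20)},
]

# Hypothetical orders as captured by the data collection system.
orders = [
    {"order_id": 1, "placed_on": date(2004, 2, 27)},
    {"order_id": 2, "placed_on": date(2004, 3, 12)},
    {"order_id": 3, "placed_on": date(2004, 3, 18)},
]

def active_promotions(day, calendar):
    """Names of all promotions whose date range includes `day`."""
    return [p["name"] for p in calendar if p["start"] <= day <= p["end"]]

# Each order gains a derived attribute listing the promotions in force.
enriched_orders = [
    dict(order, promotions=active_promotions(order["placed_on"], promotions))
    for order in orders
]
```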

Creating the Data Warehouse
• DW creation requires substantial data transformations
• Can take 80% of the time taken to complete the BI exercise
• Requires integration of several data sources:
  - Website
  - Payment gateway
  - Call center
  - POS terminals / shops’ systems
  - External systems / inputs (e.g. promotions / campaigns data)
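
Much of the transformation effort comes from each source describing the same thing differently. The sketch below maps three hypothetical order feeds onto one common schema before loading; all field names and the cents-versus-currency difference are assumptions chosen to illustrate the problem.

```python
def from_website(row):
    # Assumed: the website stores amounts in cents.
    return {"customer_id": row["user_id"],
            "amount": row["total_cents"] / 100,
            "channel": "website"}

def from_call_center(row):
    # Assumed: the call-center system exports amounts as text.
    return {"customer_id": row["cust"],
            "amount": float(row["amount"]),
            "channel": "call_center"}

def from_pos(row):
    # Assumed: in-store customers are identified by a loyalty card.
    return {"customer_id": row["loyalty_card"],
            "amount": row["paid"],
            "channel": "store"}

def load_orders(website_rows, call_center_rows, pos_rows):
    """Transform every source into the common schema and combine them."""
    orders = [from_website(r) for r in website_rows]
    orders += [from_call_center(r) for r in call_center_rows]
    orders += [from_pos(r) for r in pos_rows]
    return orders

def spend_by_customer(orders):
    """Total spend per customer across all channels."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return totals

warehouse_orders = load_orders(
    [{"user_id": "c1", "total_cents": 2550}],
    [{"cust": "c1", "amount": "40.00"}],
    [{"loyalty_card": "c2", "paid": 12.5}],
)
totals = spend_by_customer(warehouse_orders)
```

The cross-channel total only works if all sources share a customer key, which in practice is one of the harder parts of the integration.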

Logical DW Architecture

Data Warehousing: Challenges
• Loading and Maintaining Consistent Data

• Loading and Storing Large Volumes of Data
• Coping with Changes in Operational Definitions
• Providing Reasonable Response Times
• For an e-commerce site, the website itself sits outside the firewall, so data has to be copied across the firewall

Business Intelligence Tools
• The software provided: Reports, Visualization and Data Mining
• Data Mining algorithms included:
  - Rule Induction
  - Anomaly (outlier) detection
  - Entropy-based statistics
  - Association Rules
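
For readers unfamiliar with association rules, the two standard measures, support and confidence, can be computed as follows. This is a textbook sketch on invented baskets, not the implementation used in the suite.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt"},
    {"belt"},
    {"tie", "belt"},
]

shirt_and_tie_support = support(baskets, {"shirt", "tie"})
shirt_implies_tie = confidence(baskets, {"shirt"}, {"tie"})
```

Here the rule "shirt → tie" holds in two of the three baskets that contain a shirt; a rule miner enumerates many such candidate rules and keeps those above chosen support and confidence thresholds.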

Business Intelligence Lessons (1)
• Operational transactions have higher priority than BI
  - “BI can be taken up after the system stabilizes”
  - Can take several months to get started
• Users are “happy” with basic reports / MIS
  - Unexpectedly insightful findings capture their interest
  - This can start the BI process

Business Intelligence Lessons (2)
• Trained Data Analysts are required
  - Domain knowledge is important
  - Technical know-how is essential
• Terminology needs to be defined
  - Users can misinterpret results
  - Potentially useful findings may be ignored or unrealistic expectations can arise

Business Intelligence: Challenges
• Designing a user-friendly interactive interface

• Automatic Feature Construction
• Building models that users can interpret
• Making users understand that correlation does not
imply causality
• Explaining insights
• Linking ROI to insights

Deployment
• Insights need to be shared
  - Insights obtained by Data Mining need to be shared across the organization
  - Easy-to-use tools for capturing and communicating (e.g. by e-mail) will help
• Taking Action
  - Business users must see the value
  - Acting on the results may be difficult (e.g. designing a campaign for a special segment of customers)
  - A good architecture would help

Technical Lessons

Data Collection and Management Lessons
• Collect data at the right level
  - Data was collected at the Application Server level
  - Reduced pre-processing of weblog data
• Design the GUI with Data Mining in mind
  - All useful data can be captured
  - Default values should be avoided
  - Validate data to reduce cleaning effort
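
One reason default values hurt analysis is that they are indistinguishable from genuine answers once stored. A simple check, sketched here under the assumption that a pre-filled form default shows up as an implausibly frequent value, can flag such fields; the 30% threshold and the sample data are illustrative only.

```python
from collections import Counter

def suspicious_defaults(values, min_share=0.3):
    """Return each value whose share of all entries is at least `min_share`.

    A single value dominating a free-entry field often indicates a form
    default that users left untouched rather than a real answer.
    """
    counts = Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.items() if n / total >= min_share}

# Hypothetical birth years from a registration form pre-filled with 1900.
birth_years = [1900, 1971, 1900, 1985, 1900, 1990, 1900, 1968, 1979, 1900]
flagged = suspicious_defaults(birth_years)
```

Leaving the field empty and validating it on submission avoids having to detect and clean such values later.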

Data Collection and Management Challenges
• Should data be sampled?
  - E-Commerce data is huge in volume
  - Is it necessary to store all the data?
  - Will rare events be missed if sampling is done?
• Slowly changing “dimensions”
  - Customers evolve (e.g. lifetime changes, lifestyle changes)
  - Products evolve (e.g. new lines, new technology)
• Frequency of DW uploads
  - DW uploads take time and processing power
  - Should not disrupt BI analysts’ work
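
The rare-event concern can be quantified with a back-of-the-envelope calculation. The sketch assumes simple Bernoulli sampling, where each record is kept independently with the same probability; this sampling scheme is an assumption made for the example, not something stated in the slides.

```python
def probability_all_missed(occurrences, sampling_rate):
    """Chance that none of `occurrences` records survives sampling,
    assuming each record is kept independently with `sampling_rate`."""
    if not 0 <= sampling_rate <= 1:
        raise ValueError("sampling_rate must be between 0 and 1")
    return (1 - sampling_rate) ** occurrences

def expected_sampled(occurrences, sampling_rate):
    """Expected number of the rare records that end up in the sample."""
    return occurrences * sampling_rate

# A pattern that occurs only 5 times, sampled at 1%.
p_miss = probability_all_missed(5, 0.01)
kept_on_average = expected_sampled(5, 0.01)
```

With these numbers the event is absent from the sample about 95% of the time, which is why sampling is risky when rare events such as fraud or crashes are the target of the analysis.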

Data Cleaning and Pre-processing Lessons & Challenges
• Time-outs, incomplete sessions, crashes
  - Need to be detected
  - What to do with such data?
• Duplicates
  - Same customer with more than one ID
  - Same account used by multiple customers
  - “Guest” log-ins
• Missing, unknown, not applicable or default values
• Hierarchical Attributes
  - Most algorithms cannot handle hierarchical attributes
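
One of the duplicate cases, the same customer holding more than one ID, can sometimes be caught by matching on a normalized contact field. The sketch below uses e-mail address as the matching key, which is an assumption for illustration; real de-duplication usually combines several fields and tolerates small spelling differences.

```python
from collections import defaultdict

def find_shared_emails(customers):
    """Map each normalized e-mail address to the customer IDs using it,
    keeping only addresses that appear under more than one ID."""
    ids_by_email = defaultdict(set)
    for customer in customers:
        email = (customer.get("email") or "").strip().lower()
        if not email:
            # Missing value: it cannot be matched, so it is skipped
            # rather than treated as one big duplicate group.
            continue
        ids_by_email[email].add(customer["customer_id"])
    return {email: sorted(ids) for email, ids in ids_by_email.items() if len(ids) > 1}

# Hypothetical customer records.
customers = [
    {"customer_id": 101, "email": "ana@example.com"},
    {"customer_id": 102, "email": " Ana@Example.com "},
    {"customer_id": 103, "email": "li@example.com"},
    {"customer_id": 104, "email": None},
]
duplicates = find_shared_emails(customers)
```

The opposite case, one account shared by several people, cannot be detected this way and typically needs behavioural signals.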

An Attribute Hierarchy
Level     Example values
all       all
region    Europe, ..., North_America
country   Germany, ..., Spain (Europe); Canada, ..., Mexico (North_America)
city      Frankfurt, ... (Germany); Vancouver, ..., Toronto (Canada)
office    L. Chan, ..., M. Wind
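
Because most algorithms expect flat attributes, a common workaround is to roll each value up to a chosen level of the hierarchy before mining. The sketch below encodes the city, country and region levels from the hierarchy above as a parent map; the order list is invented, and the office level is left out because the slide does not show which city each office belongs to.

```python
from collections import Counter

# Parent of each value, taken from the example hierarchy.
PARENT = {
    "Frankfurt": "Germany",
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Germany": "Europe",
    "Spain": "Europe",
    "Canada": "North_America",
    "Mexico": "North_America",
    "Europe": "all",
    "North_America": "all",
}
LEVELS = ["city", "country", "region", "all"]

def roll_up(city, level):
    """Replace a city by its ancestor at the requested hierarchy level."""
    value = city
    for _ in range(LEVELS.index(level)):
        value = PARENT[value]
    return value

# Hypothetical orders recorded at city level.
order_cities = ["Toronto", "Frankfurt", "Vancouver", "Toronto"]
orders_by_country = Counter(roll_up(city, "country") for city in order_cities)
orders_by_region = Counter(roll_up(city, "region") for city in order_cities)
```

Choosing the level is a modelling decision: too low and the data is sparse, too high and useful distinctions disappear.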

Analysis Lessons & Challenges
• Enriching the Data
  - Add demographic attributes
  - Create derived attributes
  - Calculate weighted averages, moving averages
• Exploration
  - Visualization
  - Domain knowledge can help in gaining insight
  - Customer propensity scoring
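
Two of the derived attributes mentioned above, moving averages and weighted averages, are straightforward to compute. The figures in the example are invented; the functions are a minimal sketch rather than the suite's own transformations.

```python
def moving_average(values, window):
    """Simple moving average: the mean of each run of `window`
    consecutive values, in order."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def weighted_average(values, weights):
    """Average of `values` where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical weekly spend for one customer.
weekly_spend = [10, 20, 30, 40]
smoothed_spend = moving_average(weekly_spend, window=2)

# Hypothetical order values weighted by recency (newer order weighs more).
recency_weighted_order_value = weighted_average([100, 200], [1, 3])
```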

• Building Models
  - Start with simple models (easy to explain to users)
  - Build models at the right level of the attribute hierarchy
  - Address scalability issues (to maintain users’ interest and confidence)
  - Test and validate the models
  - Estimate accuracy levels
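
A minimal way to test a model and estimate its accuracy is to hold out part of the data. The sketch below pairs a holdout split with the simplest possible model, a majority-class predictor, which also serves as the baseline any richer model should beat. The data, the 30% test fraction and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

def fit_majority_class(train_rows):
    """Baseline model: always predict the most frequent training label."""
    labels = Counter(label for _, label in train_rows)
    most_common_label, _ = labels.most_common(1)[0]
    return lambda features: most_common_label

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(features) == label for features, label in rows) / len(rows)

# Hypothetical (features, label) pairs: did the visitor make a purchase?
rows = [({"visits": v}, "no") for v in range(8)]
rows += [({"visits": v}, "yes") for v in (20, 30)]

train, test = holdout_split(rows)
model = fit_majority_class(train)
holdout_accuracy = accuracy(model, test)
```

Accuracy measured on the held-out rows, not on the training rows, is the figure to report to users; with data this skewed, even the baseline looks good, which is worth explaining when presenting results.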
