# Business Intelligence & Data Mining-9


## Association Rules
• Usually applied to market baskets but other
applications are possible
• Useful Rules contain novel and actionable
information: e.g. On Thursdays grocery customers are
likely to buy diapers and beer together
• Trivial Rules contain already known information: e.g.
People who buy maintenance agreements are the ones
who have also bought large appliances
• Some novel rules may not be useful: e.g. New
hardware stores most commonly sell toilet rings

## Association Rules: Basic Concepts
• Given: (1) a set of transactions, (2) each transaction is a
set of items (e.g. purchased by a customer in a visit)
• Find: (all?) rules that correlate the presence of one set
of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories
also get automotive services done

• Applications
– Retailing (what other products should the store stock up on?)
– Attached mailings in direct marketing
– Catalog design (which items should appear next to each other?)

## What Is Association Rule Mining?
• Association rule mining:
– Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in
transaction databases, relational databases, and other
information repositories.

• Example:
– Rule form: “Body → Head [support, confidence]”
– major(x, “CS”) ^ takes(x, “DB”) → grade(x, “CS”, “A”) [1%, 75%]

## Rule Measures: Support and Confidence

Find all rules X & Y ⇒ Z with minimum support and confidence:
– support, s: the probability that a transaction contains {X, Y, Z}
– confidence, c: the conditional probability that a transaction containing {X, Y} also contains Z

| Transaction ID | Items Bought |
|----------------|--------------|
| 2000           | A, B, C      |
| 1000           | A, C         |
| 4000           | A, D         |
| 5000           | B, E, F      |

With minimum support 50% and minimum confidence 50%, we have:
– A ⇒ C (50%, 66.6%)
– C ⇒ A (50%, 100%)

## Mining Association Rules — An Example

Min. support 50%, min. confidence 50%.

| Transaction ID | Items Bought |
|----------------|--------------|
| 2000           | A, B, C      |
| 1000           | A, C         |
| 4000           | A, D         |
| 5000           | B, E, F      |

Frequent itemsets:

| Frequent Itemset | Support |
|------------------|---------|
| {A}              | 75%     |
| {B}              | 50%     |
| {C}              | 50%     |
| {A, C}           | 50%     |

For rule A ⇒ C:
– support = support({A, C}) = 50%
– confidence = support({A, C}) / support({A}) = 66.6%
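As a quick check, these numbers can be reproduced directly from the transaction table. A minimal sketch in Python (the helper names are ours, not from the slides):

```python
# Transactions from the table above.
transactions = [
    {"A", "B", "C"},  # TID 2000
    {"A", "C"},       # TID 1000
    {"A", "D"},       # TID 4000
    {"B", "E", "F"},  # TID 5000
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """support(lhs ∪ rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"A", "C"}, transactions))       # 0.5      -> support of A ⇒ C
print(confidence({"A"}, {"C"}, transactions))  # 0.666... -> confidence of A ⇒ C
```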

The Apriori principle (Agrawal & Srikant, 1994):
any subset of a frequent itemset must also be frequent

## Mining Frequent Itemsets: The Key Step
• Find the frequent itemsets: the sets of items that
have minimum support
– A subset of a frequent itemset must also be a frequent
itemset
• i.e., if {A, B} is a frequent itemset, both {A} and {B} must be
frequent itemsets

– Iteratively find frequent itemsets with cardinality from 1
to k (k-itemset)

• Use the frequent itemsets to generate association
rules.

## The Apriori Algorithm

• Generate C1: all candidate 1-itemsets (one per unique item)
• Generate L1: all 1-itemsets with minimum support
• Join step: Ck is generated by extending the frequent itemsets in
Lk-1 with the items in L1, since any (k–1)-itemset that is not
frequent cannot be a subset of a frequent k-itemset
• Prune step: Lk is generated by selecting from Ck the itemsets
with minimum support
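The level-wise procedure above can be sketched in a few lines of Python. This is an illustrative, unoptimized version (itemsets as frozensets), run on the database of the example that follows:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support count}, level by level."""
    n = len(transactions)
    min_count = min_support * n

    def counted(cands):
        return {c: sum(1 for t in transactions if c <= t) for c in cands}

    # C1 / L1: unique items, filtered by minimum support.
    c1 = {frozenset([i]) for t in transactions for i in t}
    level = {c: s for c, s in counted(c1).items() if s >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        cands = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Prune step (as defined above): keep candidates with minimum support.
        level = {c: s for c, s in counted(cands).items() if s >= min_count}
        frequent.update(level)
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori(db, min_support=0.5))  # includes frozenset({2, 3, 5}): 2
```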

## The Apriori Algorithm — Example

Minimum support count = 2 (i.e. 50% of 4 transactions).

Database D:

| TID | Items      |
|-----|------------|
| 100 | 1, 3, 4    |
| 200 | 2, 3, 5    |
| 300 | 1, 2, 3, 5 |
| 400 | 2, 5       |

Scan D → C1:

| itemset | sup. |
|---------|------|
| {1}     | 2    |
| {2}     | 3    |
| {3}     | 3    |
| {4}     | 1    |
| {5}     | 3    |

L1 (minimum support):

| itemset | sup. |
|---------|------|
| {1}     | 2    |
| {2}     | 3    |
| {3}     | 3    |
| {5}     | 3    |

C2 (join L1 with L1), scan D:

| itemset | sup. |
|---------|------|
| {1 2}   | 1    |
| {1 3}   | 2    |
| {1 5}   | 1    |
| {2 3}   | 2    |
| {2 5}   | 3    |
| {3 5}   | 2    |

L2:

| itemset | sup. |
|---------|------|
| {1 3}   | 2    |
| {2 3}   | 2    |
| {2 5}   | 3    |
| {3 5}   | 2    |

C3: {1 3 5}, {2 3 5}. Note that {1 3 5} can be pruned without a scan, since its subset {1 5} is not in L2 (and indeed it occurs only once in D). Scan D → L3:

| itemset | sup. |
|---------|------|
| {2 3 5} | 2    |

## Is Apriori Fast Enough? — Performance Bottlenecks

• The core of the Apriori algorithm:
– Use frequent (k–1)-itemsets to generate candidate frequent k-itemsets
– Use database scans and pattern matching to collect counts for the
candidate itemsets

• The bottleneck of Apriori: candidate generation
– Huge candidate sets:
• 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
• To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one
needs to generate 2^100 ≈ 10^30 candidates
– Multiple scans of the database:
• Needs (n + 1) scans, where n is the length of the longest pattern

## Methods to Improve Apriori’s Efficiency
• Hash-based itemset counting: A k-itemset whose
corresponding hashing bucket count is below the threshold
cannot be frequent
• Transaction reduction: A transaction that does not contain
any frequent k-itemset is useless in subsequent scans (see the
sketch after this list)
• Sampling: mining on a subset of given data, lower support
threshold + a method to determine the completeness
• Dynamic itemset counting: add new candidate itemsets only
when all of their subsets are estimated to be frequent
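As an illustration of the second method above, transaction reduction is just a filter applied between passes. A minimal sketch (the function name is ours):

```python
def reduce_transactions(transactions, frequent_k):
    """Drop transactions that contain no frequent k-itemset: they cannot
    contribute to any frequent itemset in later (longer) passes."""
    return [t for t in transactions
            if any(itemset <= t for itemset in frequent_k)]
```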

## How to Count Supports of Candidates?
• Why is counting supports of candidates a problem?
– The total number of candidates can be very huge
– One transaction may contain many candidates

• Method:
– Candidate itemsets are stored in a hash-tree
– Leaf node of hash-tree contains a list of itemsets and
counts
– Interior node contains a hash table
– Subset function: finds all the candidates contained in a
transaction
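A real hash-tree stores itemsets in a tree of hash tables; as a simplified stand-in, the sketch below uses a flat dictionary together with the same subset idea — enumerate each transaction's k-subsets and look them up among the candidates:

```python
from itertools import combinations

def count_supports(transactions, candidates, k):
    """Count candidate supports by enumerating each transaction's
    k-subsets (the "subset function" above); a dict lookup stands in
    for the hash-tree traversal."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for sub in combinations(sorted(t), k):
            s = frozenset(sub)
            if s in counts:
                counts[s] += 1
    return counts
```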

## Criticism of Support and Confidence

• Example 1 (Aggarwal & Yu, PODS ’98):
– Among 5000 students:
• 3000 play basketball
• 3750 eat cereal
• 2000 both play basketball and eat cereal
– play basketball ⇒ eat cereal [40%, 66.7%] is misleading, because the
overall percentage of students eating cereal is 75%, which is higher than
66.7%
– not play basketball ⇒ eat cereal [35%, 87.5%] has lower support but
higher confidence!

|            | basketball | not basketball | sum (row) |
|------------|------------|----------------|-----------|
| cereal     | 2000       | 1750           | 3750      |
| not cereal | 1000       | 250            | 1250      |
| sum (col.) | 3000       | 2000           | 5000      |

## Criticism of Support and Confidence (cont.)

• Example 2:
– X and Y: positively correlated
– X and Z, Y and Z: negatively correlated
– yet the support and confidence of X ⇒ Z dominate
• We need a measure of dependent or correlated events
• P(B|A)/P(B) is called the lift of rule A ⇒ B

X: 1 1 1 1 0 0 0 0
Y: 1 1 0 0 0 0 0 0
Z: 0 1 1 1 1 1 1 1

| Rule  | Support | Confidence |
|-------|---------|------------|
| X ⇒ Y | 25%     | 50%        |
| X ⇒ Z | 37.5%   | 75%        |
| Y ⇒ Z | 12.5%   | 50%        |

## Other Interestingness Measures: Lift

• Lift = P(B|A)/P(B) = P(A ∧ B) / (P(A) · P(B))
– takes both P(A) and P(B) into consideration
– P(A ∧ B) = P(A) · P(B) if A and B are independent events (lift = 1)
– if lift < 1, A and B are negatively correlated
– if lift > 1, A and B are positively correlated

X: 1 1 1 1 0 0 0 0
Y: 1 1 0 0 0 0 0 0
Z: 0 1 1 1 1 1 1 1

| Rule  | Support | Lift |
|-------|---------|------|
| X ⇒ Y | 25%     | 2    |
| X ⇒ Z | 37.5%   | 0.86 |
| Y ⇒ Z | 12.5%   | 0.57 |
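The lift column can be reproduced from the eight rows of X/Y/Z data. A minimal sketch:

```python
rows = list(zip(
    (1, 1, 1, 1, 0, 0, 0, 0),  # X
    (1, 1, 0, 0, 0, 0, 0, 0),  # Y
    (0, 1, 1, 1, 1, 1, 1, 1),  # Z
))

def prob(pred):
    """Fraction of rows satisfying `pred`."""
    return sum(1 for r in rows if pred(r)) / len(rows)

def lift(i, j):
    """P(i ∧ j) / (P(i) · P(j)) for column indices i and j."""
    return prob(lambda r: r[i] and r[j]) / (prob(lambda r: r[i]) * prob(lambda r: r[j]))

X, Y, Z = 0, 1, 2
print(round(lift(X, Y), 2))  # 2.0  -> positively correlated
print(round(lift(X, Z), 2))  # 0.86 -> negatively correlated
print(round(lift(Y, Z), 2))  # 0.57 -> negatively correlated
```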

## Extensions

## Multiple-Level Association Rules

• Items often form a hierarchy.
• Items at the lower levels are expected to have lower support.
• Rules on itemsets at the appropriate levels can be quite useful.
• The transaction database can be encoded based on dimensions and levels.

[Figure: concept hierarchy — Food at the top; milk (skim, 2%) and bread (wheat, white) below; brand-level items such as Fraser and Sunset at the bottom]

## Mining Multi-Level Associations

• A top-down, progressive deepening approach:
– First find high-level strong rules (ancestors), e.g.:
milk → bread
– Then find their lower-level “weaker” rules (descendants), e.g.:
2% milk → wheat bread [6%, 50%]

• Variations of mining multiple-level association rules:
– Level-crossed association rules, e.g.:
2% milk → Wonder wheat bread
– Association rules with multiple, alternative hierarchies

## Multi-Level Association: Uniform Support vs. Reduced Support

• Uniform support: the same minimum support for all levels
– Pro: one minimum support threshold; no need to examine itemsets
containing any item whose ancestors do not have minimum support
– Con: lower-level items do not occur as frequently, so if the support
threshold is
• too high ⇒ low-level associations are missed
• too low ⇒ too many high-level associations are generated

• Reduced support: reduced minimum support at lower levels
– Needs modification of the basic algorithm

## Uniform Support

Multi-level mining with uniform support:

– Level 1 (min_sup = 5%): Milk [support = 10%]
– Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]

Here Skim Milk (4%) misses the uniform 5% threshold, so it is not frequent.

## Reduced Support

Multi-level mining with reduced support:

– Level 1 (min_sup = 5%): Milk [support = 10%]
– Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]

With the lower Level-2 threshold, both 2% Milk and Skim Milk are frequent.
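The two settings differ only in which threshold each level is checked against. A toy sketch using the numbers from the two slides above:

```python
supports = {"Milk": 0.10, "2% Milk": 0.06, "Skim Milk": 0.04}
levels   = {"Milk": 1, "2% Milk": 2, "Skim Milk": 2}

def frequent_items(min_sup_by_level):
    """Items whose support meets the threshold of their own level."""
    return [i for i, s in supports.items() if s >= min_sup_by_level[levels[i]]]

print(frequent_items({1: 0.05, 2: 0.05}))  # uniform: Skim Milk (4%) is dropped
print(frequent_items({1: 0.05, 2: 0.03}))  # reduced: all three items survive
```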

## Multi-Level Association: Redundancy Filtering

• Some rules may be redundant due to “ancestor” relationships between items.
• Example:
– milk ⇒ wheat bread [support = 8%, confidence = 70%]
– 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]

• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the “expected” value
derived from the rule’s ancestor: if 2% milk accounts for about a
quarter of milk purchases, the expected support of the second rule is
8% × 1/4 = 2%, exactly what is observed, so the second rule adds
little information.
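To make the “expected” value concrete: if the share of 2% milk within milk is known, the ancestor's support scales by that share. A sketch of the idea (the 25% share is an assumed figure, not from the slides):

```python
ancestor_support = 0.08      # milk => wheat bread
descendant_share = 0.25      # assumed: 2% milk is ~1/4 of milk purchases
actual_support = 0.02        # 2% milk => wheat bread

expected_support = ancestor_support * descendant_share  # 0.02

# The descendant rule is redundant: its support is what the ancestor predicts.
print(abs(actual_support - expected_support) < 0.005)   # True
```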

## Multi-Level Mining: Progressive Deepening

• A top-down, progressive deepening approach:
– First mine the high-level frequent items
– Then mine their lower-level “weaker” frequent itemsets, e.g.:
2% milk (5%), wheat bread (4%)

• Different min-support thresholds across levels:
– If adopting the same min-support across levels,
reject an itemset t if any of t’s ancestors is infrequent.
– If adopting reduced min-support at lower levels,
examine only those descendants whose ancestors are frequent and
whose own support exceeds the reduced min-support.

## Sequence Pattern Mining

• Sequence/event databases:
– consist of sequences of values or events changing with time
– data is recorded at regular intervals
– characteristic behaviors of sequence events: trend, cycle,
seasonal variation, irregular movements

• Examples:
– financial: stock prices, inflation
– biomedical: blood pressure
– meteorological: precipitation

## Two Types of Sequence Data

• Event series
– record events that happen at certain times

• Time series
– record changes of certain (typically numeric) values over time
– e.g. stock price movements, blood pressure

## Event Series

• A series can be represented in two ways:
– As a sequence (string) of events:
• an empty slot if no event occurs at a given time
• hard to represent multiple simultaneous events
– As a set of tuples {(time, event)}:
• allows for multiple events at the same time
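The two representations side by side, as a minimal sketch with made-up events:

```python
# 1) Sequence (string) form: one slot per time tick, "-" = no event.
#    Simultaneous events do not fit this form.
string_form = "A-B--A-"

# 2) Tuple form: a set of (time, event) pairs; simultaneous events are fine.
tuple_form = {(0, "A"), (2, "B"), (5, "A"), (5, "C")}
```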

## Types of Interesting Information

• Which events happen often (not too interesting)
• Which groups of events happen often
– People who rent “Star Wars” also rent “Star Trek”

• Which sequences of events happen often
– Renting “Star Wars”, then “Empire Strikes Back”, then
“Return of the Jedi”, in that order

• Associations of events within a time window
– People who rent “Star Wars” tend to rent “Empire
Strikes Back” within one week

## Similarities/Differences with Association Rules

• Similarities:
– groups of events ↔ frequent itemsets
– associations ↔ association rules

• Differences:
– the notion of (time) windows:
• People who rent “Star Wars” tend to rent “Empire Strikes
Back” within one week
– the ordering of events is important

## Episodes

• An episode is a partially ordered sequence of events.

[Figure: three episode types over events A, B, C]
– Serial: B follows A
– Parallel: B follows A OR A follows B
– General: the order between A and B is unknown or immaterial,
but both A and B precede C

## Sub-Episode / Super-Episode

• If A, B & C occur within a time window:
– A & B is a sub-episode of A, B & C
– A, B & C is a super-episode of A, of B, of C, of A & B,
and of B & C

## Frequent Episodes / Episode Rules

• Frequent episodes:
– find episodes that appear often

• Episode rules:
– used to emphasize the effect of events on episodes
– support/confidence are defined as for association rules

• Example sequence (window size = 11):
AB-C-DEABE-F-A-DFECDAABBCDE

## Episode Rules: Example

[Figure: episode (A, B) on the left; episode (A, B, C) on the right]

Window size 10: support 4%, confidence 80%.

• Meaning: given that the episode on the left appears, the episode
on the right appears 80% of the time.
• In other words, when (A, B) appears, C also appears (within the
given window) 80% of the time.

## Mining Episode Rules

• Apriori principle for episodes:
– an episode can be frequent only if all of its sub-episodes are frequent

• Thus Apriori-based algorithms can be applied
• However, there are a few tricky issues

## Mining Episode Rules (cont.)

• Recognizing episodes in sequences:
– parallel episodes: standard association-rule techniques
– serial/general episodes: finite-state-machine-based recognition
• Alternative: count parallel episodes first, then use them to
generate candidate episodes of the other types

• Counting the number of windows:
– a single event occurrence falls into w windows of size w
– this is fine if the sequence is long, as the ratios even out
– however, when the sequence is short, the “edge” windows can
dominate the count
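A minimal window counter for a parallel episode, including the partially overlapping edge windows discussed above (the sequence-string convention follows the earlier example; names are ours):

```python
def window_frequency(sequence, episode, w):
    """Fraction of width-w windows containing every event of the parallel
    `episode`. Windows may overlap the ends of the sequence, so each event
    occurrence falls into exactly w windows."""
    episode = set(episode)
    n = len(sequence)
    hits = total = 0
    for start in range(-w + 1, n):  # n + w - 1 windows in total
        window = sequence[max(start, 0):min(start + w, n)]
        total += 1
        if episode <= set(window):
            hits += 1
    return hits / total

print(window_frequency("AB-C-DEABE-F-A-DFECDAABBCDE", {"A", "B"}, 11))
```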
