Association Rules
• Usually applied to market baskets but other
applications are possible
• Useful Rules contain novel and actionable
information: e.g. On Thursdays grocery customers are
likely to buy diapers and beer together
• Trivial Rules contain already known information: e.g.
People who buy maintenance agreements are the ones
who have also bought large appliances
• Some novel rules may not be useful: e.g. New
hardware stores most commonly sell toilet rings
Association Rule: Basic Concepts
• Given: (1) a set of transactions, (2) each transaction is a
set of items (e.g. purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set
of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories
also get automotive services done
• Applications
– Market Basket Analysis (what do people buy together?)
– Retailing (what other products should the store stock up on?)
– Attached mailing in direct marketing
– Catalog design (which items should appear next to each other?)
What Is Association Rule Mining?
• Association rule mining:
– Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in
transaction databases, relational databases, and other
information repositories.
Rule Measures: Support and Confidence
[Venn diagram: customers who buy beer, customers who buy diapers, and the overlap who buy both]
Find all the rules X & Y ⇒ Z with
minimum confidence and support
– support, s: the probability that a transaction contains {X, Y, Z}
– confidence, c: the conditional probability that a transaction
containing {X, Y} also contains Z
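The two measures can be sketched in a few lines of Python. The transactions and the {beer, diaper} ⇒ {chips} rule below are made-up toy data:

```python
# Toy transaction database (each transaction is a set of items)
transactions = [
    {"beer", "diaper", "chips"},
    {"beer", "diaper"},
    {"milk", "chips"},
    {"beer", "diaper", "chips", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Conditional probability that a transaction with lhs also has rhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"beer", "diaper", "chips"}, transactions))        # 0.5
print(confidence({"beer", "diaper"}, {"chips"}, transactions))   # ≈ 0.667
```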
The Apriori principle (Agrawal & Srikant, 1994):
Any subset of a frequent itemset must also be frequent
Mining Frequent Itemsets: the Key Step
• Find the frequent itemsets: the sets of items that
have minimum support
– A subset of a frequent itemset must also be a frequent
itemset
• i.e., if {A, B} is a frequent itemset, both {A} and {B} should be
frequent itemsets
– Iteratively find frequent itemsets with cardinality from 1
to k (k-itemset)
• Use the frequent itemsets to generate association
rules.
The Apriori Algorithm
• Generate C1: all unique 1-itemsets
• Generate L1: the 1-itemsets with minimum support
• Join Step: Ck is generated by joining Lk-1 with L1, since
any (k-1)-itemset that is not frequent cannot be a subset
of a frequent k-itemset
• Prune Step: Lk is generated by selecting from Ck the itemsets
with minimum support
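The loop above can be sketched in Python. This is a simplified version on toy data: the join step enumerates item combinations drawn from the previous level rather than a literal Lk-1 join, which yields the same candidate set once the subset prune is applied:

```python
from itertools import combinations

# Toy database; min_support is an absolute transaction count here
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                {"b", "c"}, {"a", "b", "c", "d"}]
min_support = 3

def frequent(candidates, transactions, min_support):
    """Keep the candidate itemsets (tuples) meeting minimum support."""
    counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
    return {c for c, n in counts.items() if n >= min_support}

# C1 -> L1: all unique items, then those with minimum support
items = sorted({i for t in transactions for i in t})
Lk = frequent([(i,) for i in items], transactions, min_support)

all_frequent = set(Lk)
k = 2
while Lk:
    prev_items = sorted({i for c in Lk for i in c})
    # Join + prune: k-item candidates whose (k-1)-subsets are all frequent
    Ck = {c for c in combinations(prev_items, k)
          if all(s in Lk for s in combinations(c, k - 1))}
    Lk = frequent(Ck, transactions, min_support)
    all_frequent |= Lk
    k += 1

print(sorted(all_frequent))
```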
The Apriori Algorithm — Example
[Table: example database D with four transactions, TID 100, 200, 300, 400; the item lists did not survive extraction]
Is Apriori Fast Enough? —
Performance Bottlenecks
• The core of the Apriori algorithm:
– Use frequent (k–1)-itemsets to generate candidate frequent k-itemsets
– Use database scan and matching to collect counts for the candidate
itemsets
• The bottleneck of Apriori: candidate generation
– Huge candidate sets:
• 10^4 frequent 1-itemsets will generate ~10^7 candidate 2-itemsets
• To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one
needs to generate 2^100 ≈ 10^30 candidates.
– Multiple scans of database:
• Needs (n+1) scans, where n is the length of the longest pattern
Methods to Improve Apriori’s Efficiency
• Hash-based itemset counting: A k-itemset whose
corresponding hashing bucket count is below the threshold
cannot be frequent
• Transaction reduction: A transaction that does not contain
any frequent k-itemset is useless in subsequent scans
• Sampling: mining on a subset of given data, lower support
threshold + a method to determine the completeness
• Dynamic itemset counting: add new candidate itemsets only
when all of their subsets are estimated to be frequent
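Transaction reduction, the second improvement above, is easy to sketch: once the frequent k-itemsets are known, any transaction containing none of them cannot contribute to a (k+1)-itemset count. The data and `L2` below are made up:

```python
from itertools import combinations

def reduce_transactions(transactions, frequent_k, k):
    """Keep only transactions containing at least one frequent k-itemset."""
    return [t for t in transactions
            if any(c in frequent_k for c in combinations(sorted(t), k))]

transactions = [{"a", "b"}, {"c"}, {"a", "b", "c"}]
L2 = {("a", "b")}                       # frequent 2-itemsets from a prior pass
reduced = reduce_transactions(transactions, L2, 2)
print(reduced)                          # the {"c"} transaction is dropped
```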
How to Count Supports of Candidates?
• Why is counting supports of candidates a problem?
– The total number of candidates can be very huge
– One transaction may contain many candidates
• Method:
– Candidate itemsets are stored in a hash-tree
– Leaf node of hash-tree contains a list of itemsets and
counts
– Interior node contains a hash table
– Subset function: finds all the candidates contained in a
transaction
Criticism of Support and Confidence
• Example 1: (Aggarwal & Yu, PODS'98)
– Among 5000 students
• 3000 play basketball
• 3750 eat cereal
• 2000 both play basketball and eat cereal
– play basketball ⇒ eat cereal [40%, 66.7%] is misleading because the
overall percentage of students eating cereal is 75%, which is higher than
66.7%.
– not play basketball ⇒ eat cereal [35%, 87.5%] has lower support but
higher confidence!
– play basketball ⇒ not eat cereal [20%, 33.3%] is more informative,
although with lower support and confidence
            basketball   not basketball   sum(row)
cereal         2000           1750          3750
not cereal     1000            250          1250
sum(col.)      3000           2000          5000
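The numbers in the example can be checked directly:

```python
# Contingency-table counts from the example above
n = 5000
basketball, cereal, both = 3000, 3750, 2000

support_rule = both / n                              # 0.40
confidence_rule = both / basketball                  # ≈ 0.667
base_rate = cereal / n                               # 0.75 > 0.667, so misleading

not_bb_and_cereal = cereal - both                    # 1750
conf_not_bb = not_bb_and_cereal / (n - basketball)   # 0.875
print(support_rule, confidence_rule, base_rate, conf_not_bb)
```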
Criticism of Support and Confidence
• Example 2:
– X and Y: positively
correlated,
– X and Z, Y and Z: negatively
correlated
– support and confidence of
X=>Z dominates
• We need a measure of
dependent or correlated events
• P(B|A)/P(B) is called the lift
of rule A => B
Other Interestingness Measures: Lift
• Lift = P(B|A)/P(B) = P(A ∧ B) / (P(A) · P(B))
– takes both P(A) and P(B) into consideration
– P(A ∧ B) = P(A) · P(B), if A and B are independent events
– A and B negatively correlated, if lift < 1
– If lift > 1, A and B positively correlated
X: 1 1 1 1 0 0 0 0
Y: 1 1 0 0 0 0 0 0
Z: 0 1 1 1 1 1 1 1

Rule    Support   Lift
X=>Y    25%       2
X=>Z    37.50%    0.86
Y=>Z    12.50%    0.57
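The support and lift figures in the table can be recomputed from the eight transactions:

```python
# The three attribute columns from the table, one entry per transaction
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]
n = len(X)

def p(*cols):
    """P(all given attributes are 1 in the same transaction)."""
    return sum(all(c[i] for c in cols) for i in range(n)) / n

def lift(a, b):
    return p(a, b) / (p(a) * p(b))

print(p(X, Y), lift(X, Y))             # 0.25 2.0
print(p(X, Z), round(lift(X, Z), 2))   # 0.375 0.86
print(p(Y, Z), round(lift(Y, Z), 2))   # 0.125 0.57
```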
Extensions
Multiple-Level Association Rules
• Items often form a hierarchy.
• Items at the lower level are
expected to have lower
support.
• Rules regarding itemsets at
appropriate levels could be
quite useful.
• Transaction database can be
encoded based on
dimensions and levels
[Item hierarchy: food → {milk, bread}; milk → {skim, 2%}; bread → {wheat, white}; brand level: Fraser, Sunset]
Mining Multi-Level Associations
• A top-down, progressive deepening approach:
– First find high-level strong rules (Ancestors):
milk → bread [20%, 60%].
– Then find their lower-level “weaker” rules (Descendants):
2% milk → wheat bread [6%, 50%].
• Variations of mining multiple-level association rules.
– Level-crossed association rules:
2% milk → Wonder wheat bread
– Association rules with multiple, alternative hierarchies:
2% milk → Wonder bread
Multi-level Association: Uniform
Support vs. Reduced Support
• Uniform Support: the same minimum support for all levels
+ One minimum support threshold. No need to examine itemsets
containing any item whose ancestors do not have minimum
support.
– Lower-level items do not occur as frequently. If the support
threshold is
• too high ⇒ miss low-level associations
• too low ⇒ generate too many high-level associations
• Reduced Support: reduced minimum support at lower
levels
– Needs modification of the basic algorithm
Uniform Support
Multi-level mining with uniform support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]
Reduced Support
Multi-level mining with reduced support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
Multi-level Association:
Redundancy Filtering
• Some rules may be redundant due to “ancestor”
relationships between items.
• Example
– milk ⇒ wheat bread [support = 8%, confidence = 70%]
– 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the
“expected” value, based on the rule’s ancestor.
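A sketch of the redundancy test, assuming (hypothetically) that 2% milk accounts for 25% of milk sales; the tolerance value is also a made-up parameter:

```python
def expected_support(ancestor_support, item_share):
    """Support predicted for the specialized rule if the variant is
    bought in proportion to its share of the parent category."""
    return ancestor_support * item_share

ancestor = 0.08      # milk => wheat bread [support 8%]
share_2pct = 0.25    # assumed share of 2% milk within milk sales
descendant = 0.02    # 2% milk => wheat bread [support 2%]

expected = round(expected_support(ancestor, share_2pct), 4)  # 0.02
redundant = abs(descendant - expected) < 0.005               # within tolerance
print(expected, redundant)  # 0.02 True
```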
Multi-Level Mining: Progressive
Deepening
• A top-down, progressive deepening approach:
– First mine high-level frequent items:
milk (15%), bread (10%)
– Then mine their lower-level “weaker” frequent itemsets:
2% milk (5%), wheat bread (4%)
• Different min-support thresholds across levels
lead to different algorithms:
– If adopting the same min-support across all levels,
then reject an itemset t if any of t's ancestors is infrequent.
– If adopting reduced min-support at lower levels,
then examine only those descendants whose ancestors are
frequent and whose own support is > the reduced min-support
Sequence pattern mining
• Sequence/event databases
– Consist of sequences of values or events changing
with time
– Data is recorded at regular intervals
– Characteristic features of such sequences:
• trend, cycle, seasonal, irregular
Two types of sequence data
• Event series
– Record events that happen at certain times
– E.g. network logins
• Time series
– Record changes of certain (typically numeric)
values over time
– E.g. stock price movements, blood pressure
Event series
• A series can be represented in two ways:
– As a sequence (string) of events
• Empty space if no event occurs at a certain time
• Hard to represent multiple simultaneous events
– As a set of tuples: {(time, event)}
• Allows for multiple events at the same time
Types of interesting info
• Which events happen often (not too interesting)
• Which groups of events happen often
– People who rent “Star Wars” also rent “Star Trek”
• Which sequences of events happen often
– Renting “Star Wars”, then “Empire Strikes Back”, then
“Return of the Jedi”, in that order
• Associations of events within a time window
– People who rent “Star Wars” tend to rent “Empire
Strikes Back” within one week
Similarity/Difference with
Association Rules
• Similarities:
– Groups of events ↔ frequent itemsets
– Associations ↔ association rules
• Differences:
– Notion of (time) windows:
• People who rent “Star Wars” tend to rent “Empire Strikes
Back” within one week
– Ordering of events is important
Episodes
• A partially ordered sequence of events
– Serial episode: A → B (B follows A)
– Parallel episode: A and B in either order (B follows A OR A follows B)
– General episode: order between A & B unknown or immaterial,
but A & B precede C
Sub-episode / super-episode
• If A, B & C occur within a time window:
– A & B is a sub-episode of A, B & C
– A, B & C is a super-episode of A, B, C, A & B,
and B & C
Frequent episodes / Episode Rules
• Frequent episodes
– Find episodes that appear often
• Episode rules
– Used to emphasize the effect of events on
episodes
– Support/confidence as defined in association
rules
• Example (window size = 11)
AB-C-DEABE-F-A-DFECDAABBCDE
Episode Rules : Example
[Diagram: episode (A, B) on the left ⇒ episode (A, B, C) on the right]
Window size 10: Support 4%, Confidence 80%
• Meaning: given that the episode on the left appears, the
episode on the right appears 80% of the time.
• This essentially says that when (A, B) appears,
then C appears (within a given window size)
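Window-based support and confidence for such a rule can be sketched on a made-up event string; this treats the episodes as parallel, so event order inside a window is ignored:

```python
# Made-up event sequence; 'X' stands for irrelevant events
sequence = "ABXCXXABXXCXXXABXC"
w = 5  # window size

def windows(seq, w):
    """All contiguous windows of width w."""
    return [seq[s:s + w] for s in range(len(seq) - w + 1)]

def episode_support(seq, events, w):
    """Fraction of windows containing every event of the episode."""
    wins = windows(seq, w)
    return sum(all(e in win for e in events) for win in wins) / len(wins)

s_ab = episode_support(sequence, {"A", "B"}, w)
s_abc = episode_support(sequence, {"A", "B", "C"}, w)
print(round(s_abc / s_ab, 2))  # confidence of episode rule (A, B) => (A, B, C)
```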
Mining episode rules
• Apriori principle for episodes
– An episode is frequent only if all of its sub-episodes are frequent
• Thus Apriori-based algorithms can be applied
• However, there are a few tricky issues
Mining episode rules
• Recognizing episodes in sequences
– Parallel episodes: standard association-rule techniques
– Serial/general episodes: finite-state-machine-based
construction
• Alternative: count parallel episodes first, then use them to
generate candidate episodes of other types
• Counting the number of windows
– One event occurrence appears in up to w windows for window size w
– O.K. if the sequence is long, as the ratios even out
– However, when the sequence is short, the ‘edges’ can
dominate
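The edge effect can be illustrated by counting, for each position, how many width-w windows lying fully inside the sequence contain it; interior positions sit in w windows, while positions near the ends sit in fewer (some formulations avoid this by letting windows extend past the sequence ends):

```python
def windows_containing(i, n, w):
    """Number of windows [s, s+w), 0 <= s <= n-w, that contain index i."""
    first = max(0, i - w + 1)
    last = min(i, n - w)
    return max(0, last - first + 1)

n, w = 30, 5
counts = [windows_containing(i, n, w) for i in range(n)]
print(counts[:6])   # [1, 2, 3, 4, 5, 5] — edge effect at the start
print(counts[15])   # 5 — interior positions appear in w windows
```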