DATA MINING TOOLS AND
APPLICATIONS
Submitted by:
Tanuj Goyal
Ankit Chourasia
Vinita Singhal
WHAT IS DATA MINING?
Data mining is the process of analysing data from different
perspectives and summarizing it into useful information:
information that can be used to increase revenue, cut costs, or
both.
Data mining software is one of a number of analytical tools for
analysing data. It allows users to analyse data from many
different dimensions or angles, categorize it, and summarize the
relationships identified.
Technically, data mining is the process of finding correlations or
patterns among dozens of fields in large relational databases.
FOR EXAMPLE:
One Midwest grocery chain used the data mining capabilities of
Oracle software to analyse local buying patterns.
They discovered that when men bought diapers on Thursdays and Saturdays, they also
tended to buy beer.
Further analysis showed that these shoppers typically did their weekly grocery shopping on
Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded
that they purchased the beer to have it available for the upcoming weekend. The grocery
chain could use this newly discovered information in various ways to increase revenue. For
example, they could move the beer display closer to the diaper display. They could also
make sure beer and diapers were sold at full price on Thursdays.
DATA MINING TOOLS
Artificial neural networks
Decision trees
Nearest neighbour method
Rule induction
Data visualization
ARTIFICIAL NEURAL NETWORKS
Artificial neural networks are computational models that are capable
of machine learning and pattern recognition. They are usually presented as
systems of interconnected "neurons" that can compute values from inputs by
feeding information through the network.
For example, in a neural network for handwriting recognition, a set of input
neurons may be activated by the pixels of an input image representing a letter or
digit. The activations of these neurons are then passed on, weighted and
transformed by some function determined by the network's designer, to other
neurons, etc., until finally an output neuron is activated that determines which
character was read.
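A minimal sketch of the forward pass described above, in plain Python. Assumptions: the four-pixel "image", the layer sizes and all weights are hypothetical and untrained, and the logistic sigmoid stands in for the transforming function chosen by the designer. Real handwriting recognizers use much larger networks whose weights are learned from data.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic activation: squashes a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed activations through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical 4-pixel input, 3 hidden neurons, 2 output neurons (one per character class).
pixels = [0.0, 1.0, 1.0, 0.0]
hidden = ([[0.5, -0.2, 0.3, 0.1], [-0.4, 0.6, 0.2, -0.1], [0.1, 0.1, -0.5, 0.7]],
          [0.0, 0.1, -0.1])
output = ([[1.2, -0.7, 0.3], [-0.8, 1.1, 0.4]], [0.05, -0.05])

scores = network_forward(pixels, [hidden, output])
# The most strongly activated output neuron decides which character was "read".
predicted_class = max(range(len(scores)), key=lambda i: scores[i])
print(scores, predicted_class)
```

Training (for example by backpropagation) would adjust the weights; this sketch only shows how information flows from input to output.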
DECISION TREES
Decision trees are tree-shaped structures that represent sets of decisions. These
decisions generate rules for the classification of a dataset: the rules can be
applied to a new (unclassified) dataset to predict which records will have a
given outcome.
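Decision Tree for the Credit Card Insurance Database
Dependent variable: Life Insurance Promotion
(N = number of records with No, Y = number of records with Yes)

age <= 43:
    Gender = Female: N 0, Y 6 -> Decision: Yes
    Gender = Male:
        Credit Card Insurance = No: N 4, Y 1 -> Decision: No
        Credit Card Insurance = Yes: N 0, Y 2 -> Decision: Yes
age > 43: N 3, Y 0 -> Decision: No

The critical value of 43 is determined by the algorithm.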
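How might an algorithm arrive at a critical value such as 43? One common approach (assumed here; the slide does not name the algorithm) is to try each observed age as a "<= t" split and keep the threshold with the lowest weighted Gini impurity. The ages and responses below are hypothetical toy data, not the database behind the tree.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    impurity = 1.0
    for cls in set(labels):
        p = labels.count(cls) / n
        impurity -= p * p
    return impurity

def best_threshold(values, labels):
    """Try every observed value t as a 'value <= t' split; return (t, weighted Gini)."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a useful split must send records down both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical ages and life insurance promotion responses.
ages = [29, 35, 38, 41, 43, 45, 50, 55]
responses = ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "No"]
t, score = best_threshold(ages, responses)
print(t, score)
```

Other tree learners use different purity measures (for example information gain), but the idea of searching candidate thresholds is the same.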
A production rule from the tree:
IF (age <= 43) & (Gender = Male) & (Credit Card Insurance = No)
THEN Life Insurance Promotion = No
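Each root-to-leaf path of the tree is one IF-THEN rule, so the whole tree can be applied to new records as nested conditions. This is a hypothetical sketch: the function and argument names are illustrative, and the branch layout follows one reading of the tree (age > 43 leads to No, Female to Yes, Male splits on credit card insurance), which agrees with the production rule quoted above.

```python
def predict_life_ins_promo(age: int, gender: str, credit_card_ins: str) -> str:
    """Apply the tree's rules to one new record (names are illustrative)."""
    if age > 43:
        return "No"                 # IF age > 43 THEN No
    if gender == "Female":
        return "Yes"                # IF age <= 43 & Female THEN Yes
    # age <= 43 and Male: the tree splits on credit card insurance
    if credit_card_ins == "No":
        return "No"                 # the production rule above
    return "Yes"                    # IF age <= 43 & Male & Credit Card Insurance = Yes THEN Yes

# Hypothetical unclassified records: (age, gender, credit card insurance)
new_records = [(50, "Male", "Yes"), (30, "Female", "No"), (30, "Male", "No"), (30, "Male", "Yes")]
predictions = [predict_life_ins_promo(*r) for r in new_records]
print(predictions)
```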
NEAREST NEIGHBOUR METHOD
It is a simple algorithm that stores all available (historical)
cases and classifies or predicts new cases based on a
similarity measure. In other words, it uses old patterns to
predict new ones.
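A minimal k-nearest-neighbour sketch, assuming Euclidean distance as the similarity measure and a majority vote among the k closest stored cases. The historical cases (age, income in thousands, whether the customer responded to an offer) are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(history, query, k=3):
    """history: list of (feature_vector, label). Majority label among the k nearest cases."""
    nearest = sorted(history, key=lambda case: euclidean(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical cases: (age, income in thousands) -> responded?
history = [
    ((25, 30), "No"), ((30, 45), "No"), ((35, 60), "Yes"),
    ((40, 80), "Yes"), ((45, 85), "Yes"), ((22, 25), "No"),
]
print(knn_predict(history, (38, 70), k=3))
```

Note that the features are not rescaled here, so income dominates the distance; in practice attributes are usually normalized first.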
RULE INDUCTION
Rule induction is the extraction of useful if-then rules
from data based on statistical significance.
The diaper-and-beer discovery described earlier is an example of
rule induction.
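One common way to measure how significant a rule such as "IF diapers THEN beer" is (assumed here; the slide does not specify the measures) is to compute its support, confidence and lift over the transactions. The shopping baskets below are hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)      # baskets with the IF part
    c = sum(1 for t in transactions if consequent <= t)      # baskets with the THEN part
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = both / n                                       # how often the rule applies
    confidence = both / a if a else 0.0                      # P(THEN | IF)
    lift = confidence / (c / n) if c else 0.0                # > 1 means positive association
    return support, confidence, lift

# Hypothetical baskets
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "chips"},
]
support, confidence, lift = rule_stats(baskets, {"diapers"}, {"beer"})
print(support, confidence, lift)
```

A rule would typically be kept only if it exceeds chosen minimum support and confidence thresholds.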
DATA VISUALIZATION
Data visualization presents complex relationships in
multidimensional data visually so that they are easy to
understand. Graphics, charts, tables, etc. are used to
illustrate data relationships.
DATA MINING APPLICATIONS
Market Analysis and Management
Target marketing, Customer Relation Management, Cross Selling, Market Segmentation
Risk Analysis and Management
Banks assume a financial risk when they grant loans
Risk models attempt to predict the probability of default, i.e. failure to pay back the borrowed amount
Credit cards
Insurance companies
Fraud detection and management
Other Applications
Text mining (newsgroups, email, documents) and Web analysis.
Intelligent query answering
MARKET ANALYSIS AND MANAGEMENT
Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public)
lifestyle studies, clickstreams
Customer profiling and segmentation
Data mining can tell you what types of customers buy what products (clustering or classification)
Target marketing
Find clusters of “model” customers who share the same characteristics: interest, income level,
spending habits, etc.
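Finding clusters of "model" customers is often done with k-means clustering. The sketch below is a minimal k-means, assuming each customer is described by two hypothetical numeric features (income and annual spending, both in thousands) and that k = 2 clusters are wanted.

```python
import math
import random

def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=20, seed=0):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct customers
    for _ in range(iterations):
        clusters = assign(points, centroids)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical customers: (income, annual spending), both in thousands
customers = [(20, 2), (22, 3), (25, 2.5), (80, 20), (85, 22), (90, 25)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

Each resulting cluster is a candidate segment for targeted marketing; real customer data usually needs feature scaling and a careful choice of k.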
Effectiveness of sales campaigns
Advertisements, coupons, discounts, bonuses
Promote products and attract customers
Can help improve profits
Compare the amount of sales and number of transactions
during the sales period versus before or after the sales campaign
Association analysis
Which items are likely to be purchased together with the items on sale
Customer retention: analysis of customer loyalty
Sequences of purchases of particular customers
Goods purchased at different periods by the same customers can be grouped into
sequences
Changes in customer consumption or loyalty
Suggests adjustments on the pricing and variety of goods
To retain old customers and attract new customers
Cross-selling and up-selling
Associations from sales records
A customer who buys a PC is likely to buy a printer
Purchase Recommendations
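Grouping goods bought by the same customer at different periods into sequences can be sketched as follows. Assumptions: each record is a hypothetical (customer_id, period, item) tuple, and the helper `bought_after` (also hypothetical) counts customers who bought one item and later another, echoing the PC-then-printer example.

```python
from collections import defaultdict

def purchase_sequences(records):
    """Turn (customer_id, period, item) records into per-customer, time-ordered itemsets."""
    by_customer = defaultdict(list)
    for customer_id, period, item in records:
        by_customer[customer_id].append((period, item))
    sequences = {}
    for customer_id, purchases in by_customer.items():
        purchases.sort()  # order by period (then item name)
        seq = []
        for period, item in purchases:
            if seq and seq[-1][0] == period:
                seq[-1][1].add(item)          # same period: same itemset
            else:
                seq.append((period, {item}))  # new period: new itemset
        sequences[customer_id] = [items for _, items in seq]
    return sequences

def bought_after(sequences, first, then):
    """Count customers who bought `first` and, in a later period, `then`."""
    count = 0
    for seq in sequences.values():
        firsts = [i for i, items in enumerate(seq) if first in items]
        if firsts and any(then in items for items in seq[firsts[0] + 1:]):
            count += 1
    return count

# Hypothetical purchase records
records = [("c1", 1, "PC"), ("c1", 2, "printer"), ("c2", 1, "PC"),
           ("c1", 2, "ink"), ("c2", 3, "printer")]
seqs = purchase_sequences(records)
print(seqs)
print(bought_after(seqs, "PC", "printer"))
```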
FRAUD DETECTION AND MANAGEMENT
Applications
Widely used in health care, retail, credit card services, telecommunications (phone card
fraud), etc.
Approach
Use historical data to build models of fraudulent behavior and use data mining to help
identify similar instances
Examples
Credit card transactions: HNC Inc.'s FALCON fraud assessment system signals
possibly fraudulent credit card transactions
Money Laundering: Detect suspicious money transactions (US Treasury's Financial Crimes
Enforcement Network)
Detecting telephone fraud: ASPECT European Research Group
Unsupervised clustering to detect fraud in mobile phone networks
Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an
expected norm.
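Detecting patterns that deviate from an expected norm can be illustrated with a simple z-score test on one feature of the call model, call duration. This is a deliberately simpler technique swapped in for illustration, not the unsupervised clustering used in the ASPECT work; the durations and the threshold of 3 standard deviations are hypothetical choices.

```python
import statistics

def flag_unusual_calls(history_minutes, new_calls, threshold=3.0):
    """Flag new call durations more than `threshold` standard deviations from the customer's norm."""
    mean = statistics.mean(history_minutes)
    stdev = statistics.stdev(history_minutes)  # sample standard deviation
    return [d for d in new_calls if abs(d - mean) / stdev > threshold]

# Hypothetical call durations (minutes) for one subscriber
history_minutes = [5, 7, 6, 4, 8, 6, 5, 7]
new_calls = [6, 9, 45]
print(flag_unusual_calls(history_minutes, new_calls))
```

A fuller model would combine destination, duration and time of day, and build the "norm" per customer or per cluster of similar customers.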
FINANCIAL DATA ANALYSIS
Financial data
Generally complete, reliable and of high quality
Loan payment prediction and customer credit policy analysis
LOAN PAYMENT PREDICTION AND
CUSTOMER CREDIT POLICY ANALYSIS
Factors influencing loan payment performance
Loan-to-value ratio
Term of the loan
Debt ratio (total monthly debt/total monthly income)
Payment-to-income ratio
Income level
Education level
Residence region
Credit history
An analyst may find, for example, that
the payment-to-income ratio is a dominant factor while education level and debt ratio are not
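The ratio-type factors listed above are simple quotients that become input attributes for a prediction model. A minimal sketch follows; the function name and the example applicant are hypothetical, the debt ratio follows the slide's definition, and loan-to-value and payment-to-income are assumed to be loan amount / property value and monthly loan payment / monthly income.

```python
def loan_ratios(loan_amount, property_value, monthly_payment, monthly_debt, monthly_income):
    """Compute ratio features for one loan applicant (hypothetical helper)."""
    return {
        "loan_to_value": loan_amount / property_value,
        "debt_ratio": monthly_debt / monthly_income,             # total monthly debt / total monthly income
        "payment_to_income": monthly_payment / monthly_income,   # this loan's payment / monthly income
    }

# Hypothetical applicant
r = loan_ratios(loan_amount=160_000, property_value=200_000,
                monthly_payment=1_000, monthly_debt=1_500, monthly_income=5_000)
print(r)
```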
RISK MANAGEMENT AND INSURANCE
Determine insurance rates
Manage investment portfolios
Differentiate between companies and/or individuals who are
good and poor credit risks
Farmers Group discovered the following scenario:
Someone who owns a sports car is not a higher accident risk,
provided the sports car is a second car and the family car is a
station wagon or a sedan
DATA MINING FOR THE
TELECOMMUNICATION INDUSTRY
Telecommunication data are multidimensional
Calling-time
Duration
Location of caller
Location of callee
Type of call
Used to Identify and Compare
Data Traffic
System Workload
Resource Usage
User Group Behavior
Profit
Fraudulent pattern analysis and identification of unusual patterns
To achieve customer loyalty
Characteristics of customers affecting line usage
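Because call records are multidimensional, a basic operation is rolling up a measure (here, call minutes) along one or more dimensions to compare traffic and resource usage. A minimal sketch, assuming hypothetical call records stored as dictionaries with the field names shown:

```python
from collections import defaultdict

def aggregate(calls, *dimensions):
    """Total call minutes for each combination of the chosen dimension values."""
    totals = defaultdict(float)
    for call in calls:
        key = tuple(call[d] for d in dimensions)
        totals[key] += call["duration"]
    return dict(totals)

# Hypothetical call records
calls = [
    {"hour": 9, "caller_location": "Zone A", "callee_location": "Zone B",
     "call_type": "long-distance", "duration": 12.0},
    {"hour": 9, "caller_location": "Zone A", "callee_location": "Zone A",
     "call_type": "local", "duration": 3.0},
    {"hour": 21, "caller_location": "Zone B", "callee_location": "Zone A",
     "call_type": "long-distance", "duration": 20.0},
]
print(aggregate(calls, "call_type"))
print(aggregate(calls, "caller_location", "hour"))
```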
OTHER APPLICATIONS
• Sports and Gaming
• Predicting outcome of football games
• Text Mining
• Spam detection
• Educational Data Mining
• Clustering students
• Designing entrance exams and selection policies
• Human Resources
• How to select applicants
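Spam detection is a text-mining classification task. A common baseline (assumed here; the slide does not name a method) is a multinomial naive Bayes classifier over word counts with add-one smoothing. The four training messages are hypothetical and far too few for real use.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Pick the label with the highest log prior + sum of log word likelihoods."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(training)
print(classify(model, "claim your free prize"))
```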