2. Data mining
• Data mining is the process of discovering interesting
patterns (or knowledge) from large amounts of data.
• Data mining is also called knowledge discovery in
databases (KDD).
• Wikipedia definition: “Data mining is the entire process
of applying computer-based methodology, including new
techniques for knowledge discovery, from data.”
– The process of semi-automatically analyzing large databases
to find patterns that are valid, novel, potentially
useful, and understandable.
3. Sources:
• Databases (most obvious)
• Text Documents
• Computer Simulations (web)
• Social Networks
• Images
Data Mining Tasks
1_Classification
2_Regression
3_Deviation detection
4_Clustering
5_Association Rule Discovery
6_Sequential Pattern Discovery
4. • Components/Functionalities:
There are two main components:
• Knowledge Discovery
Concrete information gleaned from known data: information you
may not have known, but which is supported by recorded
facts. Also known as descriptive data mining.
• Knowledge Prediction
Uses known data to forecast future trends, events, etc.
Also known as predictive data mining (e.g., stock market predictions).
5. Classification:
Identify or predict which class a given record belongs to.
E.g., a business can reduce its mailing costs by selecting and mailing only those consumers who
are likely to purchase the product.
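The mailing example can be sketched with a k-nearest-neighbours classifier. This is only one of many possible classifiers, and the slides do not name a method. The features (age, number of past purchases), the training records, and the names `predict_class` and `select_recipients` are all hypothetical, made up for illustration.

```python
from collections import Counter
import math

# Hypothetical training data: (age, past_purchases) -> responded to an earlier mailing?
TRAINING = [
    ((25, 0), "no"),
    ((31, 1), "no"),
    ((42, 5), "yes"),
    ((38, 4), "yes"),
    ((55, 7), "yes"),
    ((23, 1), "no"),
]

def predict_class(point, training=TRAINING, k=3):
    """Return the majority class among the k training points nearest to `point`."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def select_recipients(customers, k=3):
    """Keep only the customers predicted to respond, so fewer letters are mailed."""
    return [c for c in customers if predict_class(c, k=k) == "yes"]

# Only the first of these two made-up customers is selected for the mailing.
recipients = select_recipients([(40, 5), (24, 0)])
```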
Regression:
Predict the value of a continuous variable from the values of other variables,
assuming a linear or nonlinear model of dependency.
E.g., predicting the financial status of a company by mining its data from the last few years.
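As a rough illustration of the linear case, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The yearly figures and the function name `fit_line` are invented for the example; a real forecast would need far more data and a check that a linear model is appropriate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical company revenue for four consecutive years (arbitrary units).
years = [1, 2, 3, 4]
revenue = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(years, revenue)
forecast_year_5 = slope * 5 + intercept  # 18.0 for this perfectly linear toy data
```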
Deviation Detection:
Detect significant deviations from normal behavior. E.g., credit card fraud detection.
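One simple way to make "significant deviation" concrete is a z-score test against past behavior. The sketch below assumes that approach; real fraud detection systems are considerably more elaborate. The amounts, the threshold of 3 standard deviations, and the name `find_deviations` are assumptions for illustration.

```python
import statistics

def find_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` standard deviations
    from the mean of the historical (normal) values.
    Assumes `history` has at least two values that are not all identical."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return [v for v in new_values if abs(v - mean) / stdev > threshold]

# Hypothetical past card purchases (normal behavior) and three new purchases.
history = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0]
suspicious = find_deviations(history, [23.0, 500.0, 18.0])  # [500.0]
```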
6. Clustering:
Given a set of data points, each with a set of attributes, and a similarity measure between them, find
clusters such that data points in the same cluster are more similar to one another and data points in separate
clusters are less similar to one another.
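A minimal sketch of this idea, assuming k-means with Euclidean distance as the similarity measure (the slides do not prescribe an algorithm). The points, the starting centroids, and the function name `kmeans` are made up for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign every point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Two visibly separate groups of hypothetical 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
clusters, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
```

With these inputs the first three points end up in one cluster and the last three in the other. Results of k-means generally depend on the starting centroids, which is why they are passed in explicitly here.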
Association Rule Discovery:
Given a set of records, each containing some items, find rules that capture which items tend to occur together.
E.g., memory card and mobile phone.
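Association rules are usually judged by their support (how often the items appear together) and confidence (how often the rule holds when its left side appears). The sketch below computes both for the rule "mobile implies memory card" over a handful of invented transactions; the function names are illustrative, not from any library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"mobile", "memory card"},
    {"mobile", "memory card", "charger"},
    {"mobile", "charger"},
    {"memory card"},
    {"mobile", "memory card"},
]

rule_support = support(transactions, {"mobile", "memory card"})         # 3 of 5 baskets
rule_confidence = confidence(transactions, {"mobile"}, {"memory card"})  # about 0.75
```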
Sequential Pattern Discovery:
Given a set of objects, each associated with its own timeline of events, find rules that predict strong
sequential dependencies among events. E.g.,
(Intro to C++) (C++ Primer) --> (Tcl, Tk)
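To illustrate what a sequential rule measures, the sketch below checks how often customers whose purchase timeline contains one ordered sequence go on to a later purchase. The timelines and the item names (`intro_cpp`, `cpp_primer`, `tcl_tk`) are hypothetical stand-ins loosely based on the book example, and `contains_in_order` and `rule_confidence` are illustrative helpers, not a full sequence-mining algorithm.

```python
def contains_in_order(timeline, pattern):
    """True if the items of `pattern` occur in `timeline` in the same order
    (other events may occur in between)."""
    events = iter(timeline)
    # `item in events` consumes the iterator, so order is enforced.
    return all(item in events for item in pattern)

def rule_confidence(timelines, antecedent, consequent):
    """Among timelines containing `antecedent` in order, the fraction that
    then continue with `consequent`."""
    matching = [t for t in timelines if contains_in_order(t, antecedent)]
    if not matching:
        return 0.0
    full = list(antecedent) + list(consequent)
    hits = sum(contains_in_order(t, full) for t in matching)
    return hits / len(matching)

# Hypothetical purchase timelines, one list per customer, oldest first.
timelines = [
    ["intro_cpp", "cpp_primer", "tcl_tk"],
    ["intro_cpp", "cookbook", "cpp_primer", "tcl_tk"],
    ["cpp_primer", "intro_cpp"],
    ["intro_cpp", "cpp_primer"],
]

conf = rule_confidence(timelines, ["intro_cpp", "cpp_primer"], ["tcl_tk"])  # 2 of 3
```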
7. Architecture of Data Mining
DATA CLEANING
DATA INTEGRATION
DATA SELECTION
DATA TRANSFORMATION
DATA MINING
PATTERN EVALUATION
KNOWLEDGE REPRESENTATION
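The seven stages above can be pictured as a chain of small functions, each feeding the next. The following is a toy sketch under that assumption: the record layout, the two store data sets, and every function name are invented, and the "mining" step is just a frequency count standing in for a real algorithm.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: normalize text values to one common form."""
    return [{k: v.strip().lower() for k, v in r.items()} for r in records]

def mine(records, field):
    """Data mining: here, simply count how often each value occurs."""
    return Counter(r[field] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns frequent enough to matter."""
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    """Knowledge representation: render the result for a human reader."""
    return [f"{p}: {c}" for p, c in sorted(patterns.items())]

# Hypothetical sales records from two stores.
store_a = [{"customer": "Ann", "item": " Mobile "}, {"customer": "Bob", "item": None}]
store_b = [{"customer": "Cy", "item": "mobile"}, {"customer": "Dee", "item": "Charger"}]

records = integrate(clean(store_a), clean(store_b))
records = transform(select(records, ["item"]))
report = present(evaluate(mine(records, "item"), min_count=2))  # ["mobile: 2"]
```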
8. • Rapid computerization of businesses produces huge amounts of data
• How to make best use of data?
• A growing realization: knowledge discovered from data can be used for competitive advantage.
• Make use of your data assets
• There is a big gap from stored data to knowledge; and the transition won’t occur automatically.
• Many interesting things you want to find cannot be found using database queries alone:
“Find me people likely to buy my products.”
“Who is likely to respond to my promotion?”
9. Uses:
• Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and
purchasing patterns.
• AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies to games, and thus
developing intelligent AI opponents.
• Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict
possible complications
• Scientific Analysis:
10. • User Behavior Validation
Fraud Detection
For cell phones, comparing phone activity with calling records can help detect
calls made on cloned phones.
Similarly, for credit cards, comparing new purchases with
historical purchases can detect activity on stolen cards.
• Extra-Terrestrial Intelligence
Scanning satellite receptions for possible transmissions from
other planets.
11. When/Why we do data mining
• The data is abundant.
• The data is being warehoused.
• The computing power is affordable.
• The competitive pressure is strong.
• Data mining tools have become available.
13. Warning :: Prevalence of Data Mining
• Your data is already being mined, whether you like it or not.
• Many web services require that you allow access to your information [for
data mining] in order to use the service.
• Google mines email data in Gmail accounts to present account owners
with ads.
• Facebook requires users to allow access to info from non-Facebook pages.
Facebook privacy policy:
"We may use information about you that we collect from other sources,
including but not limited to newspapers and Internet sources such as
blogs, instant messaging services and other users of Facebook, to
supplement your profile.