Data-Mining-ppt (1).pptx

www.studymafia.org
Seminar on Data Mining

Contents
 Data Mining
 Data Mining Definition
 Data Mining – Two Main Components
 Data Mining vs. Data Analysis
 What is (not) Data Mining?
 Related Fields
 Data Mining Process
 Major Data Mining Tasks
 Uses of Data Mining
 Sources of Data for Mining
 Challenges of Data Mining
 Advantages
 Conclusion
 Reference

Data Mining
 New buzzword, old idea.
 Inferring new information from already collected
data.
 Traditionally the job of data analysts.
 Computers have changed this.
It is far more efficient to comb through data using a
machine than to eyeball statistical data.

Data Mining Definition
Data mining is the
non-trivial process of identifying
 valid
 novel
 potentially useful
 and ultimately understandable patterns in data.

Data Mining vs. Data Analysis
 In terms of software and its marketing,
Data Mining != Data Analysis.
 Data Mining implies that the software applies some intelligence
beyond simple grouping and partitioning of data to infer
new information.
 Data Analysis is more in line with standard statistical
software (e.g., web stats). Such software usually presents
information about subsets and relations within the
recorded data set (e.g., browser/search engine usage,
average visit time).

What is (not) Data Mining?
What is not Data Mining?
• Look up a phone number in a phone directory
• Query a Web search engine for information about “Amazon”
What is Data Mining?
• Certain names are more prevalent in certain US locations
(O’Brien, O’Rourke, O’Reilly… in the Boston area)
• Group together similar documents returned by a search engine
according to their context (e.g. Amazon rainforest, Amazon.com)
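As a rough illustration of grouping search results by context, the sketch below groups snippets by word overlap (Jaccard similarity). The snippets, the threshold, and the greedy grouping strategy are all assumptions made for this example; real search engines use far richer features.

```python
def tokens(text):
    """Lowercase a snippet and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two word sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_documents(docs, threshold=0.4):
    """Greedy grouping: add each document to the first group whose
    first member is similar enough, otherwise start a new group."""
    groups = []
    for doc in docs:
        for group in groups:
            if jaccard(tokens(doc), tokens(group[0])) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups


# Hypothetical result snippets for the query "Amazon".
results = [
    "amazon rainforest river wildlife",
    "amazon online shopping books",
    "amazon rainforest deforestation wildlife",
    "amazon online shopping electronics",
]
groups = group_documents(results)
for group in groups:
    print(group)
```

With these made-up snippets the rainforest results and the shopping results end up in separate groups, even though all four contain the word "amazon".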

Data Mining Techniques
 Classification
 Clustering
 Regression
 Association Rules
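To make the first technique in this list concrete, here is a minimal classification sketch using a 1-nearest-neighbor rule. The customer features, labels, and function name are hypothetical, and nearest-neighbor is only one of many possible classifiers.

```python
import math


def nearest_neighbor_classify(train, point):
    """Return the label of the training example closest to `point`
    (Euclidean distance)."""
    closest = min(train, key=lambda example: math.dist(example[0], point))
    return closest[1]


# Hypothetical training data:
# (visits per month, average basket value) -> customer segment
train = [
    ((1.0, 20.0), "occasional"),
    ((2.0, 25.0), "occasional"),
    ((8.0, 90.0), "loyal"),
    ((9.0, 110.0), "loyal"),
]

label = nearest_neighbor_classify(train, (7.0, 95.0))
print(label)
```

In practice the features would usually be rescaled first, so that one large-valued feature does not dominate the distance.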

Why Mine Data? Scientific Viewpoint
 Data is collected and stored at enormous speeds (GB/hour):
o remote sensors on a satellite
o telescopes scanning the skies
o microarrays generating gene expression data
o scientific simulations generating terabytes of data
 Traditional techniques are infeasible for raw data.
 Data mining may help scientists:
o in classifying and segmenting data
o in hypothesis formation

Related Fields
Data Mining and Knowledge Discovery overlaps with:
 Statistics
 Machine Learning
 Databases
 Visualization

Data Mining Process
[Figure: process flow. Raw Data → Integration → Data Warehouse → Target Data → Transformed Data → Patterns and Rules → Interpretation & Evaluation → Knowledge → Understanding]

Major Data Mining Tasks
 Classification: predicting an item class
 Associations: e.g. A & B & C occur frequently
 Visualization: to facilitate human discovery
 Estimation: predicting a continuous value
 Deviation Detection: finding changes
 Link Analysis: finding relationships...
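The estimation task above (predicting a continuous value) can be sketched with an ordinary least-squares line fit. The data points and the function name `fit_line` are assumptions chosen for illustration, not part of the original slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # estimate for an unseen x = 5
print(slope, intercept, predicted)
```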

Uses of Data Mining
 AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies in games, and thus
developing intelligent AI opponents (e.g., chess).
 Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and
purchasing patterns.
 Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict
possible complications (read: lawsuits) down the line.
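For the market basket analysis mentioned above, a small sketch of the two usual measures, support and confidence, may help. The baskets and function names are hypothetical; production systems typically use an algorithm such as Apriori or FP-Growth rather than this brute-force pair counting.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def confidence(transactions, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also
    contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in b for b in with_antecedent)
    return hits / len(with_antecedent)


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

counts = pair_support(baskets)
support_bread_milk = counts[("bread", "milk")] / len(baskets)
conf_bread_to_butter = confidence(baskets, "bread", "butter")
print(support_bread_milk, conf_bread_to_butter)
```

Support says how often a pair occurs at all; confidence says how reliably one item accompanies another.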

Uses of Data Mining (Cont.)
 User Behavior Validation
Fraud Detection
For cell phones, comparing current phone activity with
calling records can help detect calls made on cloned phones.
Similarly, for credit cards, comparing new purchases
with historical purchases can detect activity on
stolen cards.
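One simple way to compare a new purchase with historical purchases is a z-score check, sketched below. The threshold of 3 standard deviations, the purchase amounts, and the function name are assumptions for illustration; real fraud detection combines many more signals.

```python
import statistics


def is_suspicious(history, amount, z_threshold=3.0):
    """Flag `amount` if it lies more than `z_threshold` sample
    standard deviations from the mean of past purchases."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # All past purchases identical: anything different stands out.
        return amount != mean
    return abs(amount - mean) / stdev > z_threshold


# Hypothetical purchase history for one card, in dollars.
history = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5]

print(is_suspicious(history, 46.0))   # a typical amount
print(is_suspicious(history, 900.0))  # far outside the usual range
```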

Uses of Data Mining (Cont.)
 Health and Science
Protein Folding
Predicting protein interactions and functionality
within biological cells. Applications of this research
include determining causes and possible cures for
Alzheimer's, Parkinson's, and some cancers (caused
by protein "misfolds").
Extra-Terrestrial Intelligence
Scanning radio telescope data for possible
transmissions from other planets.
 For more information, see Stanford’s Folding@home
and UC Berkeley’s SETI@home projects. Both involve participation
in a widely distributed computer application.

Sources of Data for Mining
 Databases (most obvious)
 Text Documents
 Computer Simulations
 Social Networks

Advantages of Data Mining
 Marketing / Retail
 Finance / Banking
 Manufacturing
 Governments

Challenges of Data Mining
 Scalability
 Dimensionality
 Complex and Heterogeneous Data
 Data Quality
 Data Ownership and Distribution
 Privacy Preservation
 Streaming Data

Conclusion
 Comprehensive data warehouses that integrate operational
data with customer, supplier, and market information have
resulted in an explosion of information.
 Competition requires timely and sophisticated analysis on an
integrated view of the data.
 However, there is a growing gap between more powerful
storage and retrieval systems and the users’ ability to
effectively analyze and act on the information they contain.

