Introduction to Data Mining
Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Introduction to Data Mining Presentation Transcript

  • 1. Introduction to Data Mining
  • 2. What is Data Mining
    Non-trivial extraction of implicit, previously unknown and potentially useful information from data
    Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
    Data mining is the process of automatically discovering useful information in large data repositories
  • 3. Simple Examples for Data Mining
    • Predicting whether a newly arrived customer will spend more than $100 at a department store.
    • Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).
  • 4. Why Data Mining
    Credit ratings/targeted marketing:
    Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
    Identify likely responders to sales promotions
    Fraud detection
    Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
  • 5. Origins of Data Mining
    Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
    Traditional techniques may be unsuitable due to
    Enormity of data
    High dimensionality of data
    Heterogeneous, distributed nature of data
  • 6. Data Mining Tasks
    Prediction Methods
    Use some variables to predict unknown or future values of other variables
    Description Methods
    Find human-interpretable patterns that describe the data.
  • 7. Data Mining Tasks
    Classification [Predictive]
    Clustering [Descriptive]
    Association Rule Discovery [Descriptive]
    Sequential Pattern Discovery [Descriptive]
    Regression [Predictive]
    Deviation Detection [Predictive]
  • 8. Classification: Definition
    Classification is used for discrete target variables.
    Ex: predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
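    As a rough illustration of the online-purchase example, the sketch below classifies a visitor with a k-nearest-neighbor vote. The features (pages viewed, minutes on site), the training rows, and the function name `predict_purchase` are all hypothetical; the slides do not name a specific classifier.

```python
from collections import Counter
import math

# Hypothetical training data: (pages_viewed, minutes_on_site) -> purchased?
TRAIN = [
    ((2, 1.0), False),
    ((3, 2.5), False),
    ((1, 0.5), False),
    ((12, 9.0), True),
    ((9, 7.5), True),
    ((15, 11.0), True),
]


def predict_purchase(visitor, k=3):
    """Predict the binary target by majority vote among the k nearest training rows."""
    nearest = sorted(TRAIN, key=lambda row: math.dist(row[0], visitor))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict_purchase((10, 8.0)))
    print(predict_purchase((2, 2.0)))
```

    The target takes only two values (purchase or no purchase), which is what makes this a classification task rather than a regression task.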
  • 9. Clustering: Definition
    - Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
    - Ex: finding areas of the ocean that have a significant impact on the earth's climate.
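    One common way to form such groups is k-means, sketched below in plain Python. The slides do not prescribe an algorithm, so k-means, the sample points, and the starting centroids are assumptions chosen for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average.

    The returned clusters come from the last assignment step, so they match
    the returned centroids once the procedure has converged.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical 2-D observations forming two visibly separate groups.
    pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    centers, groups = kmeans(pts, centroids=[pts[0], pts[3]])
    print(centers)
    print(groups)
```

    Points within each resulting group are closer to their own centroid than to the other one, which is the similarity property the definition describes.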
  • 10. Association Rule Discovery: Definition
    Given a set of records, each of which contains some number of items from a given collection,
    Produce dependency rules which will predict occurrence of an item based on occurrences of other items.
  • 11. Contd…
    Rules Discovered:
    {Milk} --> {Coke}
    {Diaper, Milk} --> {Beer}
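    How strong such a rule is can be measured with support and confidence, two standard measures the slides do not spell out. The sketch below computes both for the two rules above; the five market-basket transactions are illustrative data, not taken from the slides.

```python
# Illustrative market-basket transactions (an assumption, not from the slides).
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


if __name__ == "__main__":
    print(support({"Milk", "Coke"}), confidence({"Milk"}, {"Coke"}))
    print(support({"Diaper", "Milk", "Beer"}), confidence({"Diaper", "Milk"}, {"Beer"}))
```

    On this data, {Milk} --> {Coke} holds in 3 of the 4 transactions that contain Milk, and {Diaper, Milk} --> {Beer} holds in 2 of the 3 transactions that contain both Diaper and Milk.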
  • 12. Sequential Pattern Discovery: Definition
    Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
    (A B) (C) ---> (D E)
  • 13. Contd…
    Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
    Pattern: (A B) (C) (D E)
    Timing constraints shown on the slide diagram: <= xg, <= ws, <= ms
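    The sketch below checks whether a pattern such as (A B) (C) (D E) occurs in one object's timeline under two of these constraints. It assumes xg is the maximum gap allowed between consecutive pattern elements and ms is the maximum span of the whole occurrence; that reading is not stated in the transcript. The ws constraint is not modeled: each element must occur within a single timestamped event. The function name `matches` and the sample timeline are hypothetical.

```python
def matches(timeline, pattern, max_gap, max_span):
    """Return True if `pattern` occurs in `timeline` within the timing constraints.

    timeline: list of (time, set_of_events), sorted by time.
    pattern:  list of sets, e.g. [{"A", "B"}, {"C"}, {"D", "E"}].
    max_gap:  assumed meaning of xg, the largest allowed time between
              consecutive matched elements.
    max_span: assumed meaning of ms, the largest allowed time between the
              first and last matched elements.
    """
    def search(idx, start_pos, first_time, prev_time):
        if idx == len(pattern):
            return True  # every element has been matched
        for pos in range(start_pos, len(timeline)):
            time, events = timeline[pos]
            # The timeline is sorted, so later entries only widen the gap/span.
            if prev_time is not None and time - prev_time > max_gap:
                break
            if first_time is not None and time - first_time > max_span:
                break
            if pattern[idx] <= events:
                first = time if first_time is None else first_time
                if search(idx + 1, pos + 1, first, time):
                    return True
        return False

    return search(0, 0, None, None)


if __name__ == "__main__":
    timeline = [(1, {"A", "B"}), (3, {"C"}), (4, {"D", "E"})]
    pattern = [{"A", "B"}, {"C"}, {"D", "E"}]
    print(matches(timeline, pattern, max_gap=2, max_span=5))
    print(matches(timeline, pattern, max_gap=1, max_span=5))
```

    Tightening either constraint turns a matching timeline into a non-matching one, which is how the constraints limit which patterns are counted.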
  • 14. Sequential Pattern Discovery: Example
    In telecommunications alarm logs,
    (Rectifier_Alarm) --> (Fire_Alarm)
  • 15. Regression
    Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
    Extensively studied in statistics and in the neural network field.
  • 16. Regression-examples
    Predicting the sales amount of a new product based on advertising expenditure.
    Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
    Time series prediction of stock market indices.
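    For the advertising example, the simplest linear model of dependency is a straight line fitted by ordinary least squares. The sketch below is one possible approach under that assumption; the spend and sales figures and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data: advertising spend and resulting sales, same units.
    ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [12.1, 13.9, 16.2, 17.8, 20.0]
    slope, intercept = fit_line(ad_spend, sales)
    # Predict the continuous target for a spend level not in the data.
    print(slope * 6.0 + intercept)
```

    Unlike classification, the predicted value here is continuous, so the output is a number rather than a class label.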
  • 17. Deviation/Anomaly Detection
    Detect significant deviations from normal behavior
    Credit Card Fraud Detection
    Network Intrusion Detection
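    A very simple way to flag a significant deviation is a z-score test: mark any value that lies more than a chosen number of standard deviations from the mean. This is only one of many possible techniques; the threshold of 2, the card charges, and the function name `flag_anomalies` are assumptions for illustration.

```python
import statistics


def flag_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


if __name__ == "__main__":
    # Hypothetical card charges for one customer; one is far from the rest.
    charges = [25, 30, 27, 22, 31, 28, 950]
    print(flag_anomalies(charges))
```

    A large outlier inflates both the mean and the standard deviation, so in practice a more robust measure (for example one based on the median) may be preferable.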
  • 18. Visit more self help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net