DATA MINING – TECHNIQUES AND APPLICATIONS
  • Data mining, as we all know, is a knowledge discovery process. With data mining, we can extract hidden, predictive information from large data sets.
  • This chart shows the evolution of data mining. From left to right, you have the evolutionary step, the business question that can be answered at that step, the enabling technologies and, finally, the characteristics. You’ll notice that as time progresses, the information gained from the data becomes more useful and more sophisticated. In the 1960s, simple data collection could only answer broad, trivial questions. In the 1980s, data access provided finer granularity, down to the record level. In the 1990s, data warehousing added on-line analytical processing for dynamic data delivery. Finally, with data mining, you get predictive, proactive information delivery.
  • I’m breaking the process of data mining into 3 distinct phases: exploration, model building and validation, and deployment.
  • Cleaning data is the process of minimizing errors in the data set. Data transformation creates a consistent format for the data. Feature selection narrows down the number of variables, which is useful for very large data sets with large numbers of irrelevant variables. Exploratory data analysis is a graphics-based statistical process that helps pre-analyze data by examining things like variable distributions.
  • The model building phase uses three major techniques, all of which have been covered in class by Dr. Lee: decision trees, clustering and association rules.
  • Decision trees are tree shaped structures representing a set of decisions. This figure shows a decision tree that determines whether or not to play golf on a particular day based on weather conditions.
  • There are 2 types of clustering. In hierarchical clustering, clusters are discovered successively using previously established clusters. In partitional clustering, all clusters are discovered at once.
  • There are 2 kinds of hierarchical clustering. Agglomerative clustering works bottom-up: each element starts as its own cluster, and clusters are merged into successively larger clusters. Divisive clustering works top-down: it begins with the entire data set as one cluster and breaks it into successively smaller clusters. When you’ve reached the top or bottom, you are left with a fully clustered data set.
  • Support is the fraction of records in which the events occur together; confidence is the fraction of records containing the first event that also contain the second. Using these two probabilities, you can estimate the probability of an expected outcome.
  • Deployment is the final phase of data mining. In this phase, you select the best model from the previous phase and apply it to new data in order to generate predictions or estimates of the expected outcome. This is the payoff for the work done in phases 1 and 2. If the previous phases were done well, you can use the same model for different data sets with similar attributes and get usable results.
  • An investigation initiated by the federal government found that of the 5 departments investigated, all failed to implement their data mining projects according to federal privacy guidelines. This demonstrates that even when the intentions of data mining are harmless, sloppy implementation can result in a very dangerous situation.
  • In any large data set, there will be interesting patterns or relationships. The danger lies in people’s eagerness to apply these sometimes spurious rules and patterns to data without careful validation.
  • Data mining is a powerful tool with real-world applications. It can be used to harness the mountains of data collected every day. But data mining can be abused, and even when it is used in a positive manner, poor implementation can lead to serious issues.

DATA MINING – TECHNIQUES AND APPLICATIONS Presentation Transcript

  • DATA MINING – TECHNIQUES AND APPLICATIONS Charlie Chough CS157B Spring 2006
  • TOPICS
    • What is Data Mining?
    • How does Data Mining work?
    • What are the applications for Data Mining?
    • What are the issues surrounding Data Mining?
  • What Is Data Mining?
    • Data Mining is the extraction of hidden predictive information from large databases.
    • Data Mining can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
  • What Is Data Mining?
    • The Evolution of Data Mining
      • Data Collection (1960s)
        • Business Question: "What was my total revenue in the last five years?"
        • Enabling Technologies: Computers, tapes, disks
        • Characteristics: Retrospective, static data delivery
      • Data Access (1980s)
        • Business Question: "What were unit sales in New England last March?"
        • Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
        • Characteristics: Retrospective, dynamic data delivery at record level
      • Data Warehousing & Decision Support (1990s)
        • Business Question: "What were unit sales in New England last March? Drill down to Boston."
        • Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
        • Characteristics: Retrospective, dynamic data delivery at multiple levels
      • Data Mining (Emerging Today)
        • Business Question: "What’s likely to happen to Boston unit sales next month? Why?"
        • Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
        • Characteristics: Prospective, proactive information delivery
  • How Does Data Mining Work?
    • 3 Phase Approach
      • 1) Exploration
      • 2) Model Building and Validation
      • 3) Deployment
  • How Does Data Mining Work?
    • Exploration
      • Data Preparation
        • Cleaning Data
        • Data Transformation
        • Feature Selection
        • Exploratory Data Analysis
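The four data preparation steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names and the helper functions (`clean`, `transform`, `select_features`, `summarize`) are all hypothetical, and the exploratory step is reduced to summary statistics rather than the graphics the notes describe.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": "34", "income": "52000", "region": " North ", "country": "US"},
    {"age": "45", "income": None, "region": "south", "country": "US"},
    {"age": "29", "income": "48000", "region": "SOUTH", "country": "US"},
    {"age": "51", "income": "91000", "region": "north", "country": "US"},
]

def clean(rows):
    """Cleaning: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def transform(rows):
    """Transformation: put every field into one consistent format."""
    return [
        {
            "age": float(r["age"]),
            "income": float(r["income"]),
            "region": r["region"].strip().lower(),
            "country": r["country"].strip().upper(),
        }
        for r in rows
    ]

def select_features(rows):
    """Feature selection: drop columns that hold the same value in every row."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

def summarize(rows, column):
    """Exploratory analysis: basic distribution statistics for a numeric column."""
    values = [r[column] for r in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}

prepared = select_features(transform(clean(raw)))
income_summary = summarize(prepared, "income")
```

Dropping constant columns is only one very simple way to select features; real projects usually rank variables by how much they help predict the outcome.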
  • How Does Data Mining Work?
    • Model Building and Validation
      • Techniques
        • Decision Trees
        • Clustering
        • Association Rules
  • How Does Data Mining Work?
    • Model Building and Validation
      • Decision Trees
        • Tree shaped structures that represent sets of decisions.
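As a rough sketch of how such a tree can be represented and used, the snippet below encodes a golf decision tree as nested dictionaries. The figure from the slide is not reproduced in this transcript, so the attributes (`outlook`, `humidity`, `windy`) and the splits are assumptions borrowed from the commonly used textbook version of the example, not the presenter's actual tree.

```python
# A decision tree as nested dicts: internal nodes test one attribute,
# leaves hold the final decision. The splits below are illustrative only.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "don't play", "normal": "play"},
        },
        "overcast": "play",
        "rain": {
            "attribute": "windy",
            "branches": {True: "don't play", False: "play"},
        },
    },
}

def decide(node, day):
    """Walk from the root to a leaf, following the branch that matches the day."""
    while isinstance(node, dict):
        node = node["branches"][day[node["attribute"]]]
    return node

today = {"outlook": "sunny", "humidity": "normal", "windy": False}
decision = decide(tree, today)
```

Each path from the root to a leaf reads as one rule, for example "if the outlook is sunny and the humidity is normal, play".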
  • How Does Data Mining Work?
    • Model Building and Validation
      • Hierarchical Clustering
        • Clusters are discovered successively using previously established clusters.
      • Partitional Clustering
        • All clusters are discovered at once.
  • How Does Data Mining Work?
    • Model Building and Validation
      • Hierarchical Clustering
        • Agglomerative Clustering (bottom-up)
          • Each element starts as its own cluster, and clusters are merged into successively larger clusters.
        • Divisive Clustering (top-down)
          • Begins with the entire data set as one cluster and breaks it into successively smaller clusters.
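A minimal sketch of the agglomerative variant follows, assuming one-dimensional points and single linkage (the distance between two clusters is the distance between their closest members). The slides do not name a linkage rule, so that choice, the function name `agglomerate` and the sample points are assumptions made for illustration.

```python
def agglomerate(points, target_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so deleting j leaves i in place.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

points = [1.0, 1.5, 5.0, 5.2, 9.0]
result = agglomerate(points, 3)
```

Running the loop all the way down to a single cluster and recording each merge would give the full hierarchy; stopping early, as here, cuts the hierarchy at a chosen number of clusters.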
  • How Does Data Mining Work?
    • Model Building and Validation
      • Partitional Clustering
        • K-means clustering
        • QT Clustering
        • Fuzzy C-means Clustering
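Of the partitional methods listed, k-means is the simplest to sketch. The version below works on one-dimensional points and takes the starting centroids as an argument so the result is reproducible; the function name `k_means`, the fixed iteration count and the sample data are assumptions for illustration, not something the slides specify.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            # Assignment step: index of the centroid closest to this point.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, groups = k_means(points, centroids=[0.0, 5.0])
```

Unlike the hierarchical methods, every cluster here exists from the first iteration, and all of them are refined together.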
  • How Does Data Mining Work?
    • Model Building and Validation
      • Association Rules
        • Association Rules describe a correlation of events.
          • Support
          • Confidence
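Support and confidence are not defined on the slide, so the sketch below uses the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is the support of X and Y together divided by the support of X. The transactions are made up for illustration.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

milk_bread_support = support({"milk", "bread"}, transactions)
milk_to_bread = confidence({"milk"}, {"bread"}, transactions)
diapers_to_beer = confidence({"diapers"}, {"beer"}, transactions)
```

A rule is usually kept only if both numbers clear a chosen threshold: high confidence with very low support describes a pattern seen in too few records to trust.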
  • How Does Data Mining Work?
    • Deployment
      • Select the best model from the previous phase and apply it to new data in order to generate predictions or estimates of the expected outcome.
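The selection step can be sketched as scoring each candidate model on held-out, labelled data and then applying the winner to new records. Everything concrete here is hypothetical: the two threshold models, the "high value" and "low value" labels, and the validation set are invented to keep the example small.

```python
def accuracy(model, labelled):
    """Fraction of labelled examples the model predicts correctly."""
    return sum(1 for x, y in labelled if model(x) == y) / len(labelled)

# Two hypothetical candidates from the model-building phase:
# each labels a customer by monthly spend.
candidates = {
    "threshold_50": lambda spend: "high value" if spend >= 50 else "low value",
    "threshold_100": lambda spend: "high value" if spend >= 100 else "low value",
}

# Held-out validation data: (monthly spend, known label).
validation = [
    (20, "low value"),
    (60, "low value"),
    (90, "low value"),
    (120, "high value"),
    (150, "high value"),
]

# Select the candidate with the best validation accuracy.
best_name = max(candidates, key=lambda name: accuracy(candidates[name], validation))
best_model = candidates[best_name]

# Deployment: apply the selected model to new, unlabelled data.
new_customers = [30, 110, 100]
predictions = [best_model(spend) for spend in new_customers]
```

Scoring on data the models were not built from is what makes the comparison meaningful; a model chosen on its own training data may simply have memorized it.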
  • Applications for Data Mining?
    • Retail Market Basket Analysis
    • Business Intelligence
    • Medicine
    • Law Enforcement
  • Applications for Data Mining?
    • Retail Market Basket Analysis
      • Online retailers that suggest other products based on what other customers have purchased
      • Merchandising based on what items customers purchase together
        • Milk and bread
        • Diapers and Beer
  • Applications for Data Mining?
    • Business Intelligence
      • Business Intelligence tools allow businesses to gather, store, access and analyze corporate data to aid in the decision-making process.
        • Customer Profiling
        • Inventory and Distribution Analysis
        • Market Research and Segmentation
  • Applications for Data Mining?
    • Medicine
      • Data mining can be used to find combinations of prescription drugs that can have harmful interaction or side effects.
  • Applications for Data Mining?
    • Law Enforcement
      • Law enforcement agencies are using data mining to help identify terrorists.
  • Issues Surrounding Data Mining
    • Privacy Concerns
    • Data Dredging
  • Issues Surrounding Data Mining
    • Privacy Concerns
      • Multi-state Anti-Terrorism Information Exchange (MATRIX)
        • Massive collection of non-publicly available, personal data managed by a private Florida company.
  • Issues Surrounding Data Mining
    • Privacy Concerns
      • Government agencies failed to properly implement privacy rules for data mining.
        • Lapses by the Dept. of Agriculture, FBI, IRS, Small Business Administration and State Department increased the risk of data exposure.
  • Issues Surrounding Data Mining
    • Data Dredging
      • The practice of imposing patterns on data where none exist.
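The effect is easy to reproduce with pure noise. In the sketch below, which is an illustration and not taken from the presentation, 500 random "features" are searched for the one that correlates best with a random target on 10 samples, and that same feature is then checked on 10 fresh samples. The sample sizes, the seed and the helper `correlation` are arbitrary choices.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_features, n_total, n_search = 500, 20, 10

# Target and features are independent noise: no real relationship exists.
target = [rng.gauss(0, 1) for _ in range(n_total)]
features = [[rng.gauss(0, 1) for _ in range(n_total)] for _ in range(n_features)]

# Dredging: search the first 10 samples for the most correlated feature.
def search_score(feature):
    return abs(correlation(feature[:n_search], target[:n_search]))

best_feature = max(features, key=search_score)
search_correlation = search_score(best_feature)

# Validation: measure the same feature on the 10 samples not used in the search.
holdout_correlation = correlation(best_feature[n_search:], target[n_search:])
```

With this many candidates and so few samples, the best correlation found in the search is strong purely by chance, and nothing guarantees it will carry over to the held-out samples. Comparing `search_correlation` with `holdout_correlation` is the kind of careful validation that guards against dredged patterns.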
  • Conclusions
    • Data Mining is a powerful tool with real-world applications
    • But... Data Mining must be used carefully
  • References
      • Silberschatz, Korth, Sudarshan. 2006. Database System Concepts, 5th ed. New York, NY: McGraw Hill
      • Wikipedia.com. 2006. ( http://en.wikipedia.org/wiki/Data_mining )
      • Thearling.com. 2006. ( http://www.thearling.com )
      • Small Business Computing.com. 2006. ( http://sbc.webopedia.com/TERM/B/Business_Intelligence.html )