Your SlideShare is downloading. ×
Introduction
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Saving this for later?

Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime - even offline.

Text the download link to your phone

Standard text messaging rates apply

Introduction

769
views

Published on


0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
769
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
34
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Chapter 1 Introduction to Data Mining Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
  • 2. Outline
    • What Motivated Data Mining?
    • So, What Is Data Mining?
    • What kind of patterns can we mined?
  • 3. What Motivated Data Mining?
    • Necessity is the mother of invention – Plato
    • The Explosive Growth of Data: from terabytes to petabytes
      • Data collection and data availability
      • Major sources of abundant data
  • 4. What Motivated Data Mining?
      • Data collection and data availability
        • Automated data collection tools, database systems, Web, computerized society
      • Major sources of abundant data
        • Business: Web, e-commerce, transactions, stocks, …
        • Science: Remote sensing, bioinformatics, scientific simulation, …
        • Society and everyone: news, digital cameras, YouTube
  • 5. What Motivated Data Mining?
    • We are drowning in data, but starving for knowledge!
  • 6. Evolution of Database Technology
    • 1960s:
      • Data collection, database creation, IMS and network DBMS
    • 1970s:
      • Relational data model, relational DBMS implementation
    • 1980s:
      • RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
      • Application-oriented DBMS (spatial, scientific, engineering, etc.)
  • 7. Evolution of Database Technology
    • 1990s:
      • Data mining, data warehousing, multimedia databases, and Web databases
    • 2000s
      • Stream data management and mining
      • Data mining and its applications
      • Web technology (XML, data integration) and global information systems
  • 8.  
  • 9. Outline
    • What Motivated Data Mining?
    • So, What Is Data Mining?
    • What kind of patterns can we mined?
  • 10. So, What Is Data Mining?
    • Data mining (knowledge discovery from data)
      • Extraction of interesting ( non-trivial, implicit , previously unknown and potentially useful) patterns or knowledge from huge amount of data
      • Data mining: a misnomer?
  • 11. So, What Is Data Mining?
    • Alternative names
      • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
  • 12. Knowledge Discovery (KDD) Process
      • Data mining—core of knowledge discovery process
    Data Cleaning Data Integration Databases Data Warehouse Knowledge Task-relevant Data Selection Data Mining Pattern Evaluation
  • 13. Knowledge Process
    • Data cleaning – to remove noise and inconsistent data
    • Data integration – to combine multiple source
    • Data selection – to retrieve relevant data for analysis
    • Data transformation – to transform data into appropriate form for data mining
    • Data mining
    • Evaluation
    • Knowledge presentation
  • 14. Knowledge Process
    • Step 1 to 4 are different forms of data preprocessing
    • Although data mining is only one step in the entire process, it is an essential one since it uncovers hidden patterns for evaluation
  • 15. Knowledge Process
    • Based on this view, the architecture of a typical data mining system may have the following major components:
      • Database, data warehouse, world wide web, or other information repository
      • Database or data warehouse server
      • Data mining engine
      • Pattern evaluation model
      • User interface
  • 16.  
  • 17. Data Mining and Business Intelligence Increasing potential to support business decisions End User Business Analyst Data Analyst DBA Decision Making Data Presentation Visualization Techniques Data Mining Information Discovery Data Exploration Statistical Summary, Querying, and Reporting Data Preprocessing/Integration, Data Warehouses Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems
  • 18. Data Mining: Confluence of Multiple Disciplines Data Mining Database Technology Statistics Machine Learning Pattern Recognition Algorithm Other Disciplines Visualization
  • 19. Data Mining – on what kind of data?
    • Relational Database
    • Data Warehouse (is a repository of information collected from multiple sources, stored under a unified schema, and usually resides at a single site)
    • Transactional Database
  • 20.  
  • 21. Data Mining – on what kind of data?
    • Advanced data and information systems
    • Object-oriented database
    • Temporal DB, Sequence DB and Time serious DB
    • Spatial DB
    • Text DB and Multimedia DB
    • … and WWW
  • 22. Outline
    • What Motivated Data Mining?
    • So, What Is Data Mining?
    • What kind of patterns can we mined?
  • 23. What kind of patterns can we mined?
    • In general, data mining tasks can be classified into two categories: descriptive and predictive
      • Descriptive mining tasks characterize the general properties of the data in database
      • Predictive mining tasks performs inference on the current data in order to make predictions
  • 24. Mining frequent patterns, Associations, and Correlations (Ch4)
    • Frequent patterns are patterns that occur frequently in data
    • Association analysis:
      • Example: buys(X,”computer”) => buys(X,”software”) [support = 1%, confidence = 50%]
  • 25. Classification and Prediction (Ch 5)
    • Classification is the process of finding a MODEL that describes and distinguish data classes or concepts
  • 26. Cluster analysis (Ch 6)
    • In general, the class label are not present in the training data simply they are not known to begin with
    • The objects are clustered or grouped based on the principle of maximizing the intra-cluster similarity and minimizing the inter-cluster similarity
  • 27. Cluster analysis
  • 28. Outlier Analysis (Ch 7)
    • Most data mining methods discard outliers as noise or exceptions.
    • However, in some application such as fraud detection, the rare event can be more interesting than regularly occurring ones