Web-Based Data Mining System
Transcript

  • 1. C.-C. Chan, Department of Computer Science, University of Akron, Akron, OH 44325-4003, USA, chan@uakron.edu. UA Faculty Forum 2008 by C.-C. Chan
  • 2. Outline
    • Overview of Data Mining
    • Software Tools
    • A Rule-Based System for Data Mining
    • Concluding Remarks
  • 3. Data Mining (KDD)
    • From Data to Knowledge
    • Process of KDD (Knowledge Discovery in Databases)
    • Related Technologies
    • Comparisons
  • 4. Why KDD?
    • "We are drowning in information, but starving for knowledge" - John Naisbitt
    • Growing Gap between Data Generation and Data Understanding:
    • Automation of business activities:
    • Telephone calls, credit card charges, medical tests, etc.
    • Earth observation satellites:
    • Estimated to generate one terabyte (10^12 bytes) of data per day, at a rate of one picture per second.
    • Biology:
    • The Human Genome database project has collected gigabytes of data on the human genetic code [Fasman, Cuticchia, Kingsbury, 1994].
    • US Census data:
    • NASA databases:
    • World Wide Web:
  • 5. Process of KDD
    • [1] Fayyad, U., Editorial, Int. J. of Data Mining and Knowledge Discovery, Vol. 1, Issue 1, 1997.
    • [2] Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," in Advances in Knowledge Discovery and Data Mining, Fayyad et al. (Eds.), MIT Press, 1996.
  • 6. Process of KDD
    • Selection
        • Learning the application domain
        • Creating a target dataset
    • Pre-Processing
        • Data cleaning and preprocessing
    • Transformation
        • Data reduction and projection
    • Data Mining
        • Choosing the functions and algorithms of data mining
        • Association rules, classification rules, clustering rules
    • Interpretation and Evaluation
        • Validate and verify discovered patterns
    • Using discovered knowledge
  • 7. Typical Data Mining Tasks
    • Finding Association Rules [Rakesh Agrawal et al, 1993]
      • Each transaction is a set of items.
    • Given a set of transactions, an association rule is of the form X ⇒ Y, where X and Y are sets of items.
        • e.g.: 30% of transactions that contain beer also contain diapers;
        • 2% of all transactions contain both of these items.
    • Applications:
      • Market basket analysis and cross-marketing
      • Catalog design
      • Store layout
      • Buying patterns
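The two percentages in the beer-and-diapers example are usually called the confidence of the rule (30%) and its support (2%). The following is a minimal Python sketch of how these two measures can be computed. The toy transactions and the helper names `support` and `confidence` are invented for illustration and are not taken from the slides.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Among transactions containing X, the fraction that also contain Y."""
    return support(transactions, x | y) / support(transactions, x)


# Toy data, made up for this example only.
transactions = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "diapers"}),
]
x = frozenset({"beer"})
y = frozenset({"diapers"})

# Support of the rule X => Y: transactions containing both beer and diapers.
rule_support = support(transactions, x | y)
# Confidence of the rule: share of beer transactions that also hold diapers.
rule_confidence = confidence(transactions, x, y)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, two of the five transactions contain both items, and two of the three beer transactions also contain diapers.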
  • 8.
    • Finding Sequential Patterns
        • Each data sequence is a list of transactions.
        • Find all sequential patterns with a user-specified minimum support.
          • e.g.: Consider a book-club database
          • A sequential pattern might be
            • 5% of customers bought “Harry Potter I”, then “Harry Potter II”, and then “Harry Potter III”.
    • Applications:
      • Add-on sales
      • Customer satisfaction
      • Identify symptoms/diseases that precede certain diseases
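As a rough illustration of "support" for a sequential pattern, the sketch below counts the fraction of customers whose purchase history contains the pattern's steps in order. The customer data and the function names `contains_pattern` and `sequence_support` are hypothetical; this only checks one given pattern and is not a mining algorithm that discovers all frequent patterns.

```python
from typing import List, Set

Sequence = List[Set[str]]  # one customer's transactions, in time order


def contains_pattern(sequence: Sequence, pattern: Sequence) -> bool:
    """True if the pattern's itemsets occur, in order, within the sequence."""
    pos = 0
    for transaction in sequence:
        # Advance to the next pattern step once the current one is matched.
        if pos < len(pattern) and pattern[pos] <= transaction:
            pos += 1
    return pos == len(pattern)


def sequence_support(customers: List[Sequence], pattern: Sequence) -> float:
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(1 for seq in customers if contains_pattern(seq, pattern))
    return hits / len(customers)


# Invented book-club data: one purchase sequence per customer.
customers = [
    [{"Harry Potter I"}, {"Harry Potter II"}, {"Harry Potter III"}],
    [{"Harry Potter I"}, {"Cookbook"}, {"Harry Potter II"},
     {"Harry Potter III", "Atlas"}],
    [{"Harry Potter II"}, {"Harry Potter I"}, {"Harry Potter III"}],
    [{"Harry Potter I"}],
]
pattern = [{"Harry Potter I"}, {"Harry Potter II"}, {"Harry Potter III"}]

pattern_support = sequence_support(customers, pattern)
min_support = 0.05  # user-specified threshold, as on the slide
is_frequent = pattern_support >= min_support
```

Other purchases may appear between the steps of a pattern, which is why the second customer still counts as a match.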
  • 9.
    • Finding Classification Rules
        • Finding discriminant rules for objects of different classes.
      • Approaches:
        • Finding Decision Trees
        • Finding Production Rules
    • Applications:
      • Processing loan and credit card applications
      • Model identification
  • 10.
    • Text Mining
    • Web Usage Mining
    • Etc.
  • 11. Related Technologies
    • Database Systems
      • MS SQL server
        • Transaction databases
        • OLAP (Data Cubes)
        • Data Mining
          • Decision Trees
          • Clustering Tools
    • Machine Learning/Data Mining Systems
      • CART (Classification And Regression Trees)
      • C 5.x (Decision Trees)
      • WEKA (Waikato Environment for Knowledge Analysis)
      • LERS
      • ROSE 2
    • Rule-Based Expert System Development Environments
      • CLIPS, JESS
      • EXSYS
    • Web-based Platforms
      • Java
      • MS .Net
  • 12. Comparisons

    System        | Pre-Processing | Learning / Data Mining                 | Inference Engine | End-User Interface | Web-Based Access | Reasoning with Uncertainties
    --------------|----------------|----------------------------------------|------------------|--------------------|------------------|-----------------------------
    MS SQL Server | N/A            | Decision Trees, Clustering             | N/A              | N/A                | N/A              | N/A
    CART, C 5.x   | N/A            | Decision Trees                         | Built-in         | Embedded           | N/A              | N/A
    WEKA          | Yes            | Trees, Rules, Clustering, Association  | N/A              | Embedded           | Need Programming | N/A
    CLIPS, JESS   | N/A            | N/A                                    | Built-in         | Embedded           | Need Programming | 3rd-party extensions
  • 13. Rule-Based Data Mining System Objectives
    • Develop an integrated rule-based data mining system that provides
      • Synergy of database systems, machine learning, and expert systems
      • Dealing with uncertain rules
      • Delivery of web-based user interface
  • 14. Structure of Rule-Based Systems
  • 15. System Workflow
    • Input Data Set
    • Data Pre-processing
    • Rule Generator
    • User Interface Generator
  • 16.
    • Input Data Set :
      • Text file with comma separated values (CSV)
      • It is assumed that there are N columns of values corresponding to N variables or parameters, which may be real or symbolic values.
      • The first N – 1 variables are considered as inputs and the last one is the output variable.
    • Data Preprocessing :
      • Discretize domains of real variables into a finite number of intervals
      • Discretized data file is then used to generate an attribute information file and a training data file.
    • Rule Generator :
      • A symbolic learning program called BLEM2 is used to generate rules with uncertainty
    • User Interface Generator :
      • Generate a web-based rule-based system from a rule file and corresponding attribute file
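The input format and the discretization step described above can be sketched in Python as follows. The slides do not say which discretization method the system uses, so equal-width binning is an assumption here, and the function names `load_rows` and `equal_width_bins` and the sample data are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def load_rows(text: str) -> Tuple[List[List[str]], List[str]]:
    """Split each CSV row into its first N-1 input values and the last
    (output) value."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    inputs = [r[:-1] for r in rows]
    outputs = [r[-1] for r in rows]
    return inputs, outputs


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Map each real value to an interval index 0..k-1.

    Equal-width binning is an assumed method, chosen only for illustration.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]  # constant column: a single interval
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Made-up sample: one real input, one symbolic input, one output variable.
sample = "5.1,red,yes\n6.4,blue,no\n7.0,red,no\n5.8,blue,yes\n"
inputs, outputs = load_rows(sample)
first_column = [float(row[0]) for row in inputs]
bins = equal_width_bins(first_column, k=2)
```

Symbolic columns such as the second one are left untouched; only real-valued columns need to be mapped to intervals.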
  • 17. Architecture of RBC generator; Workflow of RBC generator [Figure labels: Rule set File, Metadata File, RBC Generator; Client, Requests, Responses, Middle Tier, SQL DB server, Rule Table Definition]
  • 18. Concluding Remarks
    • A system for generating rule-based classifiers from data, with the following benefits:
      • No end-user programming needed
      • Automatic creation of the rule-based system
      • Web-based delivery provides easy access
  • 19. Project Status
    • The current version (1.4) of our system provides fundamental data mining features, including:
      • Data Preprocessing
      • Management of preprocessed data files
      • Machine Learning tool to generate rules from data
      • Rule-Based Classifier system supporting uncertain rules
      • Web-Based access
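To make "a rule-based classifier supporting uncertain rules" concrete, here is a small Python sketch. The slides do not show the rule format or how the system combines uncertainties, so everything here is an assumption: each rule carries a certainty value in [0, 1], and the classifier returns the matching rule with the highest certainty. The `Rule` class, the `classify` function, and the example rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    conditions: Dict[str, str]  # attribute name -> required value
    decision: str               # value predicted for the output variable
    certainty: float            # assumed to lie in [0, 1]


def classify(rules: List[Rule], case: Dict[str, str]) -> Optional[Rule]:
    """Return the matching rule with the highest certainty, or None.

    Picking the single most certain rule is an assumed strategy; other
    ways of combining uncertain rules are possible.
    """
    matching = [
        r for r in rules
        if all(case.get(attr) == val for attr, val in r.conditions.items())
    ]
    return max(matching, key=lambda r: r.certainty, default=None)


# Invented rules for a loan-application example.
rules = [
    Rule({"income": "high"}, "approve", 0.9),
    Rule({"income": "low", "debt": "high"}, "reject", 0.8),
    Rule({"debt": "high"}, "reject", 0.6),
]

best = classify(rules, {"income": "high", "debt": "high"})
no_match = classify(rules, {"income": "medium", "debt": "low"})
```

Here two rules match the first case, and the one with certainty 0.9 wins; the second case matches no rule, so the classifier returns `None`.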
  • 20. Future Work
    • More advanced features in Data Preprocessing such as data cleansing, data transformation, and data statistics
    • Learning from multi-criteria inputs with preferential rankings to support Multiple Criteria Decision Making processes
    • Concept-Oriented information retrieval and search
  • 21.
    • Thank You!