Web-Based Data Mining System
  • 1. C.-C. Chan, Department of Computer Science, University of Akron, Akron, OH 44325-4003, USA. UA Faculty Forum 2008 by C.-C. Chan
  • 2. Outline
    • Overview of Data Mining
    • Software Tools
    • A Rule-Based System for Data Mining
    • Concluding Remarks
  • 3. Data Mining (KDD)
    • From Data to Knowledge
    • Process of KDD (Knowledge Discovery in Databases)
    • Related Technologies
    • Comparisons
  • 4. Why KDD?
    • "We are drowning in information, but starving for knowledge." (John Naisbitt)
    • Growing Gap between Data Generation and Data Understanding:
    • Automation of business activities:
    • Telephone calls, credit card charges, medical tests, etc.
    • Earth observation satellites:
    • Estimated to generate one terabyte (10^12 bytes) of data per day. At a rate of one picture per second.
    • Biology:
    • The Human Genome database project has collected gigabytes of data on the human genetic code [Fasman, Cuticchia, Kingsbury, 1994]
    • US Census data:
    • NASA databases:
    • World Wide Web:
  • 5. Process of KDD
    • [1] Fayyad, U., Editorial, Int. J. of Data Mining and Knowledge Discovery, Vol. 1, Issue 1, 1997.
    • [2] Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," in Advances in Knowledge Discovery and Data Mining, Fayyad et al. (Eds.), MIT Press, 1996.
  • 6. Process of KDD
    • Selection
        • Learning the application domain
        • Creating a target dataset
    • Pre-Processing
        • Data cleaning and preprocessing
    • Transformation
        • Data reduction and projection
    • Data Mining
        • Choosing the functions and algorithms of data mining
        • Association rules, classification rules, clustering rules
    • Interpretation and Evaluation
        • Validate and verify discovered patterns
    • Using discovered knowledge
  • 7. Typical Data Mining Tasks
    • Finding Association Rules [Rakesh Agrawal et al, 1993]
      • Each transaction is a set of items.
    • Given a set of transactions, an association rule is of the form X → Y,
    • where X and Y are sets of items.
        • e.g.: 30% of transactions that contain beer also contain diapers (the rule's confidence);
        • 2% of all transactions contain both of these items (the rule's support).
    • Applications:
      • Market basket analysis and cross-marketing
      • Catalog design
      • Store layout
      • Buying patterns
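The support and confidence figures quoted for the beer-and-diapers rule can be computed directly from a list of transactions. The sketch below is only an illustration under assumed toy data; the function names `support` and `confidence` and the sample baskets are hypothetical and do not come from the slides or from Agrawal et al.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, x, y):
    """Fraction of all transactions that contain both X and Y."""
    return count_containing(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    with_x = count_containing(transactions, x)
    if with_x == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return count_containing(transactions, set(x) | set(y)) / with_x


# Hypothetical market baskets, made up for illustration.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread", "eggs"},
    {"eggs"},
    {"chips"},
    {"milk", "eggs"},
]

rule_support = support(transactions, {"beer"}, {"diapers"})
rule_confidence = confidence(transactions, {"beer"}, {"diapers"})
print(f"beer -> diapers: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, 2 of the 10 baskets contain both items and 4 contain beer, so the rule has support 0.2 and confidence 0.5.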
  • 8. Typical Data Mining Tasks (continued)
    • Finding Sequential Patterns
        • Each data sequence is a list of transactions.
        • Find all sequential patterns with a user-specified minimum support.
          • e.g.: Consider a book-club database
          • A sequential pattern might be
            • 5% of customers bought “Harry Potter I”, then “Harry Potter II”, and then “Harry Potter III”.
    • Applications:
      • Add-on sales
      • Customer satisfaction
      • Identify symptoms/diseases that precede certain diseases
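To make "minimum support" for a sequential pattern concrete, the sketch below counts how many customer sequences contain a pattern in order. It is a minimal illustration, not a mining algorithm: it only checks one given pattern, and the names `contains_pattern` and `sequence_support` and the sample purchase histories are hypothetical.

```python
def contains_pattern(sequence, pattern):
    """True if the pattern's itemsets appear, in order, in distinct transactions."""
    pos = 0  # index of the next pattern element still to be matched
    for transaction in sequence:
        if pos < len(pattern) and set(pattern[pos]) <= set(transaction):
            pos += 1
    return pos == len(pattern)


def sequence_support(sequences, pattern):
    """Fraction of data sequences that contain the pattern."""
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)


# Hypothetical book-club purchase histories: one list of transactions per customer.
customers = [
    [{"Harry Potter I"}, {"Harry Potter II"}, {"Harry Potter III"}],
    [{"Harry Potter I"}, {"Cookbook"}, {"Harry Potter II"}, {"Harry Potter III"}],
    [{"Harry Potter II"}, {"Harry Potter I"}],
    [{"Harry Potter I", "Harry Potter II", "Harry Potter III"}],
]

pattern = [{"Harry Potter I"}, {"Harry Potter II"}, {"Harry Potter III"}]
pattern_support = sequence_support(customers, pattern)
print(f"support = {pattern_support:.2f}")
```

The first two customers bought the three books in order, possibly with other purchases in between. The third bought them out of order and the fourth bought them in a single transaction, so neither counts toward the pattern's support.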
  • 9. Typical Data Mining Tasks (continued)
    • Finding Classification Rules
        • Finding discriminant rules for objects of different classes.
      • Approaches:
        • Finding Decision Trees
        • Finding Production Rules
    • Applications:
      • Processing loan and credit card applications
      • Model identification
  • 10. Typical Data Mining Tasks (continued)
    • Text Mining
    • Web Usage Mining
    • Etc.
  • 11. Related Technologies
    • Database Systems
      • MS SQL server
        • Transaction databases
        • OLAP (Data Cubes)
        • Data Mining
          • Decision Trees
          • Clustering Tools
    • Machine Learning/Data Mining Systems
      • CART (Classification And Regression Trees)
      • C 5.x (Decision Trees)
      • WEKA (Waikato Environment for Knowledge Analysis)
      • LERS
      • ROSE 2
    • Rule-Based Expert System Development Environments
      • CLIPS, JESS
      • EXSYS
    • Web-based Platforms
      • Java
      • MS .Net
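  • 12. Comparisons
    • MS SQL Server
      • Pre-Processing: N/A
      • Learning / Data Mining: Decision Trees, Clustering
      • Inference Engine: N/A
      • End-User Interface: N/A
      • Web-Based Access: N/A
      • Reasoning with Uncertainties: N/A
    • CART, C 5.x
      • Pre-Processing: N/A
      • Learning / Data Mining: Decision Trees
      • Inference Engine: Built-in
      • End-User Interface: Embedded
      • Web-Based Access: N/A
      • Reasoning with Uncertainties: N/A
    • WEKA
      • Pre-Processing: Yes
      • Learning / Data Mining: Trees, Rules, Clustering, Association
      • Inference Engine: N/A
      • End-User Interface: Embedded
      • Web-Based Access: Need Programming
      • Reasoning with Uncertainties: N/A
    • CLIPS, JESS
      • Pre-Processing: N/A
      • Learning / Data Mining: N/A
      • Inference Engine: Built-in
      • End-User Interface: Embedded
      • Web-Based Access: Need Programming
      • Reasoning with Uncertainties: 3rd-party extensions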
  • 13. Rule-Based Data Mining System Objectives
    • Develop an integrated rule-based data mining system that provides
      • Synergy of database systems, machine learning, and expert systems
      • Dealing with uncertain rules
      • Delivery of web-based user interface
  • 14. Structure of Rule-Based Systems
  • 15. System Workflow
    • Input Data Set → Data Pre-processing → Rule Generator → User Interface Generator
  • 16. System Workflow (continued)
    • Input Data Set :
      • Text file with comma separated values (CSV)
      • It is assumed that there are N columns of values corresponding to N variables or parameters, which may be real or symbolic values.
      • The first N – 1 variables are considered as inputs and the last one is the output variable.
    • Data Preprocessing :
      • Discretize domains of real variables into a finite number of intervals
      • The discretized data file is then used to generate an attribute information file and a training data file.
    • Rule Generator :
      • A symbolic learning program called BLEM2 is used to generate rules with uncertainty
    • User Interface Generator :
      • Generate a web-based rule-based system from a rule file and corresponding attribute file
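The input and preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions only: the slides do not say which discretization method the system uses, so equal-width binning is assumed here, and the function names, the `bin0`/`bin1` labels, and the sample rows are hypothetical. BLEM2 itself and the attribute information file format are not reproduced.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text; the first N-1 columns are inputs, the last is the output."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    return [(r[:-1], r[-1]) for r in rows]


def is_real(values):
    """True if every value in the column parses as a number."""
    try:
        for v in values:
            float(v)
        return True
    except ValueError:
        return False


def equal_width_bins(values, k):
    """Map each real value to an interval index in 0..k-1 (assumed method)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        width = 1.0  # constant column: every value falls into interval 0
    return [min(int((v - lo) / width), k - 1) for v in values]


def discretize(rows, k=3):
    """Replace real-valued input columns by interval labels; keep symbolic ones."""
    inputs = [list(x) for x, _ in rows]
    for col in range(len(inputs[0])):
        column = [x[col] for x in inputs]
        if is_real(column):
            bins = equal_width_bins([float(v) for v in column], k)
            for x, b in zip(inputs, bins):
                x[col] = f"bin{b}"
    return [(x, y) for x, (_, y) in zip(inputs, rows)]


# Hypothetical data set: temperature (real), cough (symbolic), diagnosis (output).
sample = "36.6,no,healthy\n38.5,yes,flu\n40.2,yes,flu\n37.0,no,healthy\n"
training = discretize(load_rows(sample), k=3)
for inputs, output in training:
    print(inputs, "->", output)
```

The discretized rows play the role of the training data file that a rule learner would consume. A real system would also record the interval boundaries, which is presumably part of what the attribute information file holds.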
  • 17. Architecture of RBC generator; Workflow of RBC generator
    • [Figure labels: Rule set File, Metadata File, RBC Generator, Rule Table Definition, SQL DB server, Middle Tier, Client, Requests, Responses]
  • 18. Concluding Remarks
    • A system for generating rule-based classifiers from data, with the following benefits:
    • No need for end-user programming
    • Automatic rule-based system creation
    • The delivery system is web-based, which provides easy access
  • 19. Project Status
    • The current version 1.4 of our system provides fundamental data mining features, including:
      • Data Preprocessing
      • Management of preprocessed data files
      • Machine Learning tool to generate rules from data
      • Rule-Based Classifier system supporting uncertain rules
      • Web-Based access
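As a rough picture of what a rule-based classifier supporting uncertain rules might do, the sketch below attaches a certainty value to each rule and returns the decision of the most certain rule that matches a case. The slides do not describe how the system represents or combines rules, so the `Rule` structure, the "most certain matching rule wins" strategy, and the sample rules are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute name -> required (discretized) value
    decision: str     # value predicted for the output variable
    certainty: float  # assumed to lie in [0, 1]


def matches(rule, case):
    """True if the case satisfies every condition of the rule."""
    return all(case.get(attr) == value for attr, value in rule.conditions.items())


def classify(rules, case):
    """Return (decision, certainty) of the most certain matching rule, or None."""
    fired = [r for r in rules if matches(r, case)]
    if not fired:
        return None
    best = max(fired, key=lambda r: r.certainty)
    return best.decision, best.certainty


# Hypothetical rule set, made up for illustration.
rules = [
    Rule({"temperature": "high"}, "flu", 0.8),
    Rule({"temperature": "high", "cough": "no"}, "healthy", 0.6),
    Rule({"temperature": "normal"}, "healthy", 0.9),
]

print(classify(rules, {"temperature": "high", "cough": "no"}))
print(classify(rules, {"temperature": "normal", "cough": "yes"}))
print(classify(rules, {"temperature": "low"}))
```

Other conflict-resolution strategies, such as summing the certainties of all matching rules per decision, would fit the same structure; picking the single most certain rule is simply the shortest one to show.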
  • 20. Future Work
    • More advanced features in Data Preprocessing such as data cleansing, data transformation, and data statistics
    • Learning from multi-criteria inputs with preferential rankings to support Multiple Criteria Decision Making processes
    • Concept-Oriented information retrieval and search
  • 21.
    • Thank You!