Document

Transcript

  • 1. Improving quality of graduate students by data mining
      • Asst. Prof. Kitsana Waiyamai, Ph.D.
      • Dept. of Computer Engineering
      • Faculty of Engineering, Kasetsart University
      • Bangkok, Thailand
  • 2. Content
    • PART I
      • Introduction to data mining
      • Data mining technique: association rule discovery
      • Data mining technique: data classification
    • PART II
      • Improving quality of graduate students by data mining
    • Conclusion
  • 3. What Is Data Mining?
    • Knowledge Discovery from Data (KDD), also called data mining:
    • The nontrivial extraction of patterns from data, where the patterns are:
        • implicit,
        • previously unknown, and
        • potentially useful
    • Patterns must be comprehensible to human users.
  • 4. Knowledge Discovery Process: Iterative & Interactive Process
    • Mining objective
    • Data sources: databases, flat files, complex data, data warehouses
    • Preprocessing data: gathering, cleaning and selecting data
    • Search for patterns (data mining): neural nets, machine learning, statistics and others
    • Analyst reviews output
    • Report findings
    • Take actions based on findings
    • Interpret results
    • Knowledge
  • 5. What kind of data can be mined?
    • Relational databases
    • Data warehouses
    • Transactional databases and Flat files
    • Advanced DB systems and information repositories
      • Object-oriented and object-relational databases
      • Spatial databases
      • Time-series data and temporal data
      • Text databases, multimedia databases
      • Heterogeneous and legacy databases
      • World Wide Web
      • Bioinformatic data
    • [Figure labels: Databases, Data Warehouse]
  • 6. Two modes of data mining
    • Predictive data mining
      • Predict behavior based on historic data
      • Use data with known results to build a model that can be later used to explicitly predict values for different data
      • Methods: classification, prediction, etc.
    • Descriptive data mining
      • Describe patterns in existing data that may be used to guide decisions
      • Methods: association rule discovery, sequence pattern discovery, clustering, etc.
  • 7. Data Mining Techniques
    • Data Clustering
    • Association rule discovery
    • Data Classification
    • Outlier detection
    • Data regression
    • Etc.
  • 8.  
  • 9. Data Classification
    • Classification is the process of assigning new objects to predefined categories or classes
      • Given a set of labeled records
      • Build a model
      • Predict labels for future unlabeled records
    • Example:
      • Age, Educational background, Annual income, Current debts, Housing location => Decision
      • Degree=“Master” and Income=7500 => Credit=“Excellent”
  • 10. Three-Step Process of Classification
    • Model construction: Training Data => Classifier Model
    • Model evaluation: Testing Data => Classifier Model
    • Classification: Unseen Data => Classifier Model
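The three steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool used in the talk: the records, the `income_band` attribute, and the majority-vote "model" are all assumptions made up for the example, loosely following the Degree/Income => Credit example from slide 9.

```python
from collections import Counter, defaultdict

# Hypothetical labeled records: (degree, income_band) -> credit rating.
TRAIN = [
    (("Master", "high"), "Excellent"),
    (("Master", "high"), "Excellent"),
    (("Master", "low"), "Fair"),
    (("Bachelor", "high"), "Fair"),
    (("Bachelor", "low"), "Poor"),
    (("Bachelor", "low"), "Poor"),
]
# Hypothetical held-out records with known labels, used only for evaluation.
TEST = [
    (("Master", "high"), "Excellent"),
    (("Bachelor", "low"), "Poor"),
    (("Bachelor", "high"), "Excellent"),
]


def build_model(records):
    """Step 1, model construction: majority label per attribute combination."""
    votes = defaultdict(Counter)
    overall = Counter()
    for features, label in records:
        votes[features][label] += 1
        overall[label] += 1
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    default = overall.most_common(1)[0][0]  # fallback for unseen combinations
    return table, default


def classify(model, features):
    """Step 3, classification: predict a label for one record."""
    table, default = model
    return table.get(features, default)


def evaluate(model, records):
    """Step 2, model evaluation: fraction of test records predicted correctly."""
    correct = sum(classify(model, f) == y for f, y in records)
    return correct / len(records)


model = build_model(TRAIN)
accuracy = evaluate(model, TEST)                  # measured on testing data
prediction = classify(model, ("Master", "high"))  # applied to unseen data
print(accuracy, prediction)
```

A real project would replace the lookup table with a decision tree or another learned model; the separation into construction, evaluation on held-out data, and use on unseen data stays the same.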
  • 11. Data Mining Tools
    • ANGOSS KnowledgeStudio
    • IBM Intelligent Miner
    • Megaputer PolyAnalyst
    • SAS Enterprise Miner
    • SGI Mineset
    • SPSS Clementine
    • Many others
    • More at http://www.kdnuggets.com/software
  • 12. Data Mining Projects
    • Checklist:
      • Start with well-defined questions
      • Define measures of success and failure
    • Main difficulty: No automation
      • Understanding the problem
      • Data preparation
      • Selection of the right mining methods
      • Interpretation
  • 13. Using Data Mining for Improving Quality of Engineering Graduates
    • Objective:
    • Discover knowledge from large databases of engineering student records.
    • The discovered knowledge is useful for:
    • - Assisting in development of new curricula,
    • - Improvement of existing curricula,
    • - Helping students to select the appropriate major
  • 14. Using a data mining technique to help students in selecting their majors
    • Motivation:
    • - Major selection is a very important factor in a student's success.
    • - Students lack experience and information about each major.
    • Solution:
    • - Find the profiles of good students for each major, using the student profile database and the course enrollment student databases (10 years of data)
    • - Determine the most appropriate major for each student
  • 15. A Data Mining based Approach for Improving Quality of Engineering Graduates
    • [Architecture diagram. Components: course enrollment student databases, student profile database, DB2, SQL Server, Data Mining Tool, Java Servlet, User]
  • 16. Data for Data Mining
    • Student profile database
        Stu_code   Sex    Address   Sch_GPA   .....   GPA
        37058063   male   Bangkok   2.5       .....   2.3
        37058167   male   Songkla   3.4       .....   3.2
        ........   ....   .......   .......   .....   ....
    • Course enrollment student databases
        Stu_code   Sub_code   Term   Year   Grade
        37058063   204111     1      2537   C+
        37058063   403111     1      2537   D
        37058063   208111     1      2537   B+
  • 17. Data preparation for a classification model
    • Student profile database + course enrollment student databases (the two tables from slide 16) are combined into one row per student, with one column per course holding a grade level:
        Stu_code   Sex    204111   403111   …   GPA
        37058063   male   Medium   Low      …   2.3
        37058167   male   High     High     …   3.2
        ........   ....   ......   ......   …   ....
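The preparation step on this slide, joining the two sources and turning letter grades into Low/Medium/High columns, might look like the sketch below. The sample rows are the ones shown on slide 16. The `GRADE_LEVEL` mapping is an assumption: the slides only show that C+ becomes Medium and D becomes Low for student 37058063, so the other cut-offs are guesses chosen to be consistent with that.

```python
# Rows copied from the slide-16 example tables.
PROFILES = {
    "37058063": {"Sex": "male", "Address": "Bangkok", "Sch_GPA": 2.5, "GPA": 2.3},
}
ENROLLMENTS = [
    # (Stu_code, Sub_code, Term, Year, Grade)
    ("37058063", "204111", 1, 2537, "C+"),
    ("37058063", "403111", 1, 2537, "D"),
    ("37058063", "208111", 1, 2537, "B+"),
]

# Assumed discretization; only C+ -> Medium and D -> Low are visible in the slides.
GRADE_LEVEL = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}


def prepare(profiles, enrollments):
    """Build one flat record per student: profile fields plus one column per course."""
    rows = {code: dict(profile, Stu_code=code) for code, profile in profiles.items()}
    for stu_code, sub_code, _term, _year, grade in enrollments:
        if stu_code in rows:  # skip enrollments without a matching profile
            rows[stu_code][sub_code] = GRADE_LEVEL[grade]
    return list(rows.values())


table = prepare(PROFILES, ENROLLMENTS)
print(table[0])
```

Each resulting dictionary corresponds to one row of the combined table and can be fed directly to a classifier that accepts categorical attributes.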
  • 18. Global Classification Model
    • A global decision tree determines which major is appropriate for each student.
    • Each internal node represents a test on the student’s profile.
    • Each leaf node represents an appropriate major to be selected.
  • 19. Drawbacks of Global Classification Model
    • - Low precision (~50%) due to the large number of majors
    • - The number of students differs between departments, so the model cannot correctly predict the best major to be selected.
    • - The model proposes a single major; a set of possible majors ordered by appropriateness score would be preferred.
  • 20. Classification Model for Each Major
    • - A decision tree predicts whether a student is likely to be a good student in a given major.
    • Good students are those who graduate within 4 years and rank in the top 40% of a given major.
    • - Leaf nodes represent two classes: Good and Bad
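The Good/Bad class label defined on this slide can be derived from historical records roughly as follows. This is a sketch under stated assumptions: the slide does not say what the ranking is based on, so ranking by final GPA is assumed here, as is rounding the 40% cut-off up; the student records and field names are hypothetical.

```python
# Hypothetical graduates of one major: (student id, final GPA, years to graduate).
STUDENTS = [
    ("s1", 3.6, 4),
    ("s2", 3.4, 5),
    ("s3", 3.0, 4),
    ("s4", 2.5, 4),
    ("s5", 2.1, 6),
]


def label_students(students, top_percent=40, max_years=4):
    """Label each student Good or Bad within a single major.

    Good = graduated within max_years AND ranked in the top top_percent
    of the major (ranking by GPA is an assumption of this sketch).
    """
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    # Integer ceiling of top_percent % of the class size.
    cutoff = (len(ranked) * top_percent + 99) // 100
    labels = {}
    for position, (sid, _gpa, years) in enumerate(ranked):
        in_top = position < cutoff
        labels[sid] = "Good" if in_top and years <= max_years else "Bad"
    return labels


labels = label_students(STUDENTS)
print(labels)
```

Here s2 ranks second but took five years, so only s1 is labeled Good. The labeled records then serve as training data for that major's decision tree.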
  • 21. Advantages of the Major’s Classification Model
    • Good precision: 80%
    • The model predicts the best major to be selected even if the number of students differs between majors
    • It proposes a set of possible majors, ordered by appropriateness score.
    • Encountered problems:
      • Database size
      • Other factors that could affect a student’s decision: teacher preference, etc.
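One way to obtain the ordered set of majors mentioned on this slide is to run every per-major model on the same student and sort by score. The slides do not define the appropriateness score, so this sketch assumes each model returns an estimated probability that the student would be a Good student in that major. The major names and the rule bodies are hypothetical stand-ins for the learned decision trees.

```python
def computer_model(student):
    # Hypothetical stand-in for the decision tree of one major.
    return 0.9 if student["204111"] == "High" else 0.4


def electrical_model(student):
    return 0.8 if student["403111"] == "High" else 0.3


def civil_model(student):
    return 0.6


# One model per major (names are illustrative only).
MODELS = {
    "Computer": computer_model,
    "Electrical": electrical_model,
    "Civil": civil_model,
}


def recommend(student, models):
    """Score the student with each major's model and sort best first."""
    scored = [(major, model(student)) for major, model in models.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


student = {"204111": "Medium", "403111": "High"}
ranking = recommend(student, MODELS)
print(ranking)
```

Because each major is scored independently, a major with few students is not crowded out by larger ones, which matches the advantage the slide claims over the single global tree.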
  • 22. Presentation of Discovered Knowledge
  • 23. Applying Association rule discovery for Grade prediction
    • Basket analysis applied to education
    • Example basket of (course, grade level) items: 204111 Medium, 403111 High, 417167 Medium, 417168 Medium
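Treating each student's (course, grade level) pairs as a basket, a candidate rule can be scored with the usual support and confidence measures. The sketch below is illustrative only: the five baskets are invented, and the choice of which courses form the antecedent and which the consequent is an assumption, since the slide does not show the discovered rules.

```python
# Hypothetical baskets: one set of (course, grade level) items per student.
BASKETS = [
    frozenset({("204111", "Medium"), ("403111", "High"), ("417167", "Medium"), ("417168", "Medium")}),
    frozenset({("204111", "Medium"), ("403111", "High"), ("417167", "Medium"), ("417168", "High")}),
    frozenset({("204111", "Medium"), ("403111", "High"), ("417167", "High"), ("417168", "High")}),
    frozenset({("204111", "Medium"), ("403111", "Low"), ("417167", "Low"), ("417168", "Low")}),
    frozenset({("204111", "High"), ("403111", "High"), ("417167", "High"), ("417168", "High")}),
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, baskets) / base


# Candidate rule: 204111=Medium and 403111=High => 417167=Medium
antecedent = frozenset({("204111", "Medium"), ("403111", "High")})
consequent = frozenset({("417167", "Medium")})

rule_support = support(antecedent | consequent, BASKETS)
rule_confidence = confidence(antecedent, consequent, BASKETS)
print(rule_support, rule_confidence)
```

A rule with high enough support and confidence could then be used to predict a student's grade level in a course not yet taken from the grades already earned.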
  • 24. Grade Prediction for the Coming Term
  • 25. Presentation of Discovered Knowledge
  • 26. Conclusion & Future Work
    • Application of data mining in Education
    • Use data mining techniques for improving quality of engineering students
    • Apply data mining techniques to several other educational domains.
