Except where otherwise noted, content of this work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Educational Data Mining
An insight into EDM at Muğla Sıtkı Koçman University
Presentation by Steven Strehl, HTW Berlin
E-Mail: steven.strehl@htw-berlin.de
June 30th, 2014
Primary sources
Gürüler, Hüseyin and Istanbullu, Ayhan (2014): Modeling Student Performance in Higher Education Using Data Mining. In: Educational Data Mining, Alejandro Peña-Ayala (Ed.).
Gürüler, Hüseyin (2005): Veritabanları Üzerinde Veri Madenciliği Uygulaması. Muğla Sıtkı Koçman Üniversitesi.
Thank you (teşekkür ederim) to Hüseyin Gürüler for providing support and additional information by mail, and to Sema Karakurt for helping me translate important aspects of Hüseyin’s Master’s thesis from Turkish to English.
Vision and Goals
Student Knowledge Discovery Software
Improve efficiency, quality, and “experience” of studies
Predict failure or success of students
Eliminate factors that lead to failure
Figure 1: SKDS user interface
Context
Student profiling
Figure 2: Female student
Gender
Military service
Grade Point Average
Name
Secondary school
Scholarship
Age
Final mark
Secondary school
Nationality
Study programme
Family income
Marital status
Focus subject
Parents’ professions
Religion
Hometown
…
Context
Figure 3: Individual students
Figure 4: Anonymous students
Basics
Knowledge Discovery in Databases
Available Data
Selected Data
Pre-processed Data
Transformed Data
Data Mining
Interpretation and Evaluation
Figure 5: Stages of the KDD process
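The KDD stages can be sketched as a chain of small functions. This is only a minimal illustration, not the SKDS implementation: the records, the field names (`family_income`, `gpa`), the income cut-off of 1000 and the helper names are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical "available data": invented records, not data from the study.
available = [
    {"id": 1, "gender": "F", "family_income": "1200", "gpa": "3.4"},
    {"id": 2, "gender": "M", "family_income": "", "gpa": "1.8"},
    {"id": 3, "gender": "F", "family_income": "400", "gpa": "2.1"},
    {"id": 4, "gender": "M", "family_income": "2500", "gpa": "3.1"},
]

def select(records, fields):
    """Selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Pre-processing: drop records with missing values."""
    return [r for r in records if all(v != "" for v in r.values())]

def transform(records):
    """Transformation: derive categorical features (assumed cut-offs)."""
    return [
        {
            "income_band": "high" if float(r["family_income"]) >= 1000 else "low",
            "successful": float(r["gpa"]) >= 3.0,
        }
        for r in records
    ]

def mine(records):
    """Data mining (here only a frequency count per band and outcome)."""
    return Counter((r["income_band"], r["successful"]) for r in records)

def interpret(counts):
    """Interpretation: success rate per income band."""
    rates = {}
    for band in ("low", "high"):
        total = counts[(band, True)] + counts[(band, False)]
        rates[band] = counts[(band, True)] / total if total else 0.0
    return rates

selected = select(available, ["family_income", "gpa"])
preprocessed = preprocess(selected)
transformed = transform(preprocessed)
rates = interpret(mine(transformed))
print(rates)
```

Each function corresponds to one stage in the figure, so the output of one stage is the input of the next.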
Basics
Data Mining
Cross Industry Standard Process for Data Mining (CRISP-DM)
Verification-driven DM: aims at verifying assumptions by data queries
Discovery-driven DM: aims at gaining new insights by unveiling patterns
Figure 6: CRISP-DM Process Diagram
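The difference between the two styles can be shown on a toy data set. This is a sketch under stated assumptions: the records, the attribute names (`english_prep`, `scholarship`) and the helper `mean_gpa_gap` are invented for illustration and do not come from the study.

```python
from statistics import mean

# Hypothetical student records; all values are invented.
students = [
    {"english_prep": True, "scholarship": False, "gpa": 3.2},
    {"english_prep": True, "scholarship": True, "gpa": 3.5},
    {"english_prep": False, "scholarship": True, "gpa": 2.4},
    {"english_prep": False, "scholarship": False, "gpa": 2.0},
    {"english_prep": True, "scholarship": False, "gpa": 2.9},
    {"english_prep": False, "scholarship": True, "gpa": 2.6},
]

def mean_gpa_gap(records, attribute):
    """Mean GPA of students with the attribute minus mean GPA of those without."""
    with_attr = [r["gpa"] for r in records if r[attribute]]
    without = [r["gpa"] for r in records if not r[attribute]]
    return mean(with_attr) - mean(without)

# Verification-driven: start from an assumption and check it with a query.
# Assumption: students with English preparation have a higher mean GPA.
hypothesis_holds = mean_gpa_gap(students, "english_prep") > 0

# Discovery-driven: no prior assumption; scan all candidate attributes
# and report the one associated with the largest GPA gap.
gaps = {a: abs(mean_gpa_gap(students, a)) for a in ("english_prep", "scholarship")}
strongest = max(gaps, key=gaps.get)

print(hypothesis_holds, strongest)
```

In the verification-driven case the analyst chooses the attribute in advance; in the discovery-driven case the attribute is an output of the analysis.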
Approach
Data curation
Figure 7: Male student
Gender
Military service
Grade Point Average
Name
Secondary school
Scholarship
Age
Final mark
Secondary school
Nationality
Study programme
Family income
Marital status
Focus subject
Parents’ professions
Religion
Hometown
…
Approach
Data volume and provenance
Available Data: university departments, faculties, central registration system and archives
Selected Data: 13 tables related to the scope of SKDS
Pre-processed Data: 6 tables
Transformed Data: view consisting of 111 columns and 6,470 records
Data Mining: Microsoft Decision Tree Algorithm
Interpretation and Evaluation
Figure 8: Stages of the KDD process
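The transformation step, in which several tables are flattened into a single view for mining, might look roughly like the following. This is a sketch using SQLite from the Python standard library, not the database used for SKDS; the table names (`student`, `enrolment`), the view name `mining_view`, the columns and the rows are all hypothetical.

```python
import sqlite3

# In-memory database with two invented tables standing in for the
# pre-processed tables; the real schema is not given in the slides.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        gender TEXT,
        family_income REAL
    );
    CREATE TABLE enrolment (
        student_id INTEGER,
        programme TEXT,
        gpa REAL
    );
    INSERT INTO student VALUES (1, 'F', 1200), (2, 'M', 400);
    INSERT INTO enrolment VALUES (1, 'Informatics', 3.4), (2, 'Economics', 2.1);

    -- One flat row per student: the shape a mining algorithm expects.
    CREATE VIEW mining_view AS
    SELECT s.student_id, s.gender, s.family_income, e.programme, e.gpa
    FROM student AS s
    JOIN enrolment AS e ON e.student_id = s.student_id;
    """
)

cursor = conn.execute("SELECT * FROM mining_view ORDER BY student_id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(len(columns), "columns,", len(rows), "records")
```

The view here has 5 columns and 2 records; the view described on the slide has the same flat shape at a larger scale.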
Approach
Models and results
Model 1: GPA >= 3.0
- Language prep. (English)
- Registration preference
Model 2: GPA >= 2.0
- Family income
Figure 9: Decision Tree for Model 1
Figure 10: Decision Tree for Model 2
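How a decision tree picks the attribute for its first split can be illustrated with a small information-gain calculation. Note that this is a generic ID3-style sketch, not the Microsoft Decision Tree Algorithm used in the study, and the records, attribute names and values are invented, so the output says nothing about the study's actual findings.

```python
from collections import Counter
from math import log2

# Hypothetical records; values are invented for illustration only.
records = [
    {"english_prep": "yes", "income": "high", "gpa": 3.4},
    {"english_prep": "yes", "income": "low", "gpa": 3.1},
    {"english_prep": "no", "income": "high", "gpa": 2.5},
    {"english_prep": "no", "income": "low", "gpa": 2.2},
    {"english_prep": "yes", "income": "high", "gpa": 3.6},
    {"english_prep": "no", "income": "high", "gpa": 2.9},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, threshold):
    """Entropy reduction for the target 'GPA >= threshold' when splitting on attribute."""
    labels = [r["gpa"] >= threshold for r in rows]
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r["gpa"] >= threshold for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def best_split(rows, attributes, threshold):
    """Attribute with the highest information gain, i.e. the root of the tree."""
    return max(attributes, key=lambda a: information_gain(rows, a, threshold))

# Target in the style of Model 1: GPA >= 3.0
root = best_split(records, ["english_prep", "income"], threshold=3.0)
print(root)
```

Changing `threshold` to 2.0 gives a target in the style of Model 2; a full tree would repeat the split selection recursively on each subset.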
Issues and Outlook
Issues
Data: availability, variety, format
Usability: not yet suitable for everyday use
Transfer: impact of findings
Outlook
Data Mining: new algorithms available
MÜKÜP/SKDS: was an early WEKA
Improvement: new GUI, easier to use
Discussion
Topics
Social background: Family income has been identified as an important factor for success. Does the data scientist’s work end here?
Study conditions: How can we measure and find aspects that counterbalance negative preconditions of the social background?
Health: How do mental and physical states influence success and failure?
Open Data: What if Student Life Cycle Data were open?
List of Figures
[1] Gürüler, Hüseyin and Istanbullu, Ayhan (2014): Modeling Student Performance in Higher Education Using Data Mining. No CC license.
[2] http://openclipart.org/user-detail/ryanlerch Public domain.
[3] http://openclipart.org/user-detail/ryanlerch Public domain.
[4] http://openclipart.org/user-detail/thekua Public domain.
[5] Composed graphic from http://openclipart.org/user-detail/jean_victor_balin, http://openclipart.org/user-detail/buggi, http://openclipart.org/user-detail/gsagri04 Public domain.
[6] Kenneth Jensen, https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#mediaviewer/File:CRISP-DM_Process_Diagram.png
[7] http://openclipart.org/user-detail/ryanlerch Public domain.
[8] Composed graphic from http://openclipart.org/user-detail/jean_victor_balin, http://openclipart.org/user-detail/buggi, http://openclipart.org/user-detail/gsagri04 Public domain.
[9] Gürüler, Hüseyin and Istanbullu, Ayhan (2014): Modeling Student Performance in Higher Education Using Data Mining. No CC license.
[10] Gürüler, Hüseyin and Istanbullu, Ayhan (2014): Modeling Student Performance in Higher Education Using Data Mining. No CC license.

An insight into Educational Data Mining at Muğla Sıtkı Koçman University, Turkey
