CIS 690: Implementation of High-Performance
                    Data Mining Systems
                       Summer, 2000
Credit: 3 hours
Prerequisite: CIS 300 (Algorithms and Data Structures) and instructor permission, or CIS 500 (Analysis of
    Algorithms and Data Structures); basic courses in probability and statistics and in databases are recommended
Textbook: none (course notes)
Venue: Monday-Friday 8:00-10:00am, 236 Nichols Hall (lecture) and 19 Nichols Hall (lab)
Instructor: William H. Hsu, Department of Computing and Information Sciences
    Office: 213 Nichols Hall
    URL: http://www.cis.ksu.edu/~bhsu
    E-mail: bhsu@cis.ksu.edu
    Office phone: 532-6350
    Home phone: 539-7180
Office hours: after class; 1-3pm Monday; by appointment
Class web page: http://ringil.cis.ksu.edu/Courses/Summer-2000/CIS690/

Course Description
          This is an implementation practicum and basic tutorial on knowledge discovery in databases (KDD)
for students interested in applications of pattern recognition and machine learning such as data mining,
classification, expert systems, and planning and design automation. No prior background in artificial
intelligence, machine learning, or knowledge-based systems is assumed or required, but preliminary
coursework in probability and database systems is recommended. The course will introduce the following
basic algorithms and models: decision trees, simple (naïve) Bayes, feedforward artificial neural networks
(specifically, multilayer perceptrons), and the simple genetic algorithm. It will focus on implementing
some basic algorithms and on configuring, modifying, and augmenting existing code for machine learning
and KDD.

         Half of the course will be spent in lecture and discussion (60 minutes per day, 5 days per week);
the other half, in the laboratory.

Course Requirements
Homework: 2 (out of 3) programming and written assignments (15%)
Paper reviews: 2 (out of 3) written reviews (1-2 pages) of research papers (10%)
Examinations: 1 take-home midterm (15%)
Computer language(s): C/C++ and Java (either permitted for term programming project)
Project: programming practicum using Linux (Beowulf) supercluster (60% total)

Selected Reading (on Reserve in K-State CIS Library)

•   Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and
    R. Uthurusamy, eds. MIT Press, 1996. ISBN: 0262560976
•   Machine Learning, T. M. Mitchell. McGraw-Hill, 1997. ISBN: 0070428077

Additional Bibliography (Excerpted in Course Notes and Handouts)

•   Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig. Prentice Hall, 1995.
    ISBN: 0131038052
•   C4.5: Programs for Machine Learning, J. R. Quinlan. Morgan Kaufmann, 1993. ISBN: 1558602380
•   Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg. Addison-
    Wesley, 1989. ISBN: 0201157675
Class Calendar

Lecture  Date     Topic                                       Source
0        May 15   Administrivia; overview of KDD              TMM Chapter 1
                  Lab environment, MLC++
1        May 16   Decision trees                              TMM 3; Quinlan; RN 18
                  ID3 in MLC++; C4.5
2        May 17   Decision trees, overfitting                 TMM 3; Quinlan; RN 18
                  MineSet Tree Visualizer
3        May 18   Wrappers                                    MLC++ manual
                  Wrappers in MLC++
4        May 19   Bagging                                     MLC++ manual
                  Using wrapper inducers in MLC++
5        May 22   Boosting                                    MLC++ manual
                  Implementing wrapper inducers
6        May 23   Simple Bayes (naïve Bayes)                  TMM 6; MLC++ manual
                  Naïve Bayes inducer in MLC++
7        May 24   Improving simple Bayes (naïve Bayes)        TMM 6; paper
                  Improvements to simple Bayes
8        May 25   Using simple Bayes for text mining          TMM 6; paper; D2K manual
                  NCSA Data to Knowledge (D2K)
9        May 26   Introduction to Bayesian networks           TMM 6
                  Hugin
                  Take-Home Midterm (due 6/1/2000)
10       May 30   Learning / building Bayesian networks       TMM 6
                  Bayesian Network Interchange Format (BNIF)
11       May 31   Learning Bayesian network structure         TMM 6; XBN docs
                  MSBN, XML; ODBC and Bayesian networks
12       June 1   Perceptrons and winnow                      TMM 4; RN 19;
                  Perceptrons in MLC++; SNOW                  MLC++ manual
13       June 2   Intro to artificial neural networks (ANNs)  TMM 4; RN 19;
                  SNNS                                        SNNS manual
14       June 5   Learning in ANNs                            TMM 4; RN 19; NS manual
                  NeuroSolutions
15       June 6   Intro to genetic algorithms (GAs)           TMM 9; DEG 1; paper;
                  Genesis                                     D2K Jenesis manual
16       June 7   Designing genetic algorithms (GAs)          TMM 9; DEG 6
                  NCSA Jenesis
17       June 8   Intro to genetic programming (GP)           TMM 9
                  Implementing GP
18       June 9   Conclusions and wrap-up                     TMM 1, 3, 4, 6, 9;
                  KDD developer resources                     RN 18, 19; DEG 1, 6
                  NO FINAL EXAM

TMM: Machine Learning, T. M. Mitchell
RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig
DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg

Lightly-shaded entries denote the (tentative) due dates of paper reviews.
Heavily-shaded entries denote the (tentative) due dates of written or programming assignments.
