INFT 910
Advanced Topics in Artificial Intelligence
DATA MINING and KNOWLEDGE DISCOVERY
Ryszard S. Michalski
Machine Learning and Inference Laboratory
Department of Systems Engineering and Operations Research
Department of Computer Science
George Mason University
Fairfax, USA
and
Institute of Computer Science
Polish Academy of Sciences
Warsaw, Poland
Copyright©1998-2000 by R. S. Michalski
Email: michalski@gmu.edu
Web: http://www.mli.gmu.edu./people/michalski.html
Course description
This course is concerned with the modern methods and systems for deriving user-
oriented knowledge from large databases and other information sources, and applying
this knowledge to support decision making. Information sources can be in numerical,
textual, visual, or multimedia forms. The course covers theoretical and practical aspects
of current methods and selected systems for data mining, knowledge discovery, and
knowledge management, including those for text mining, multimedia mining, and web
mining.
The course is taught using a novel adaptive teaching method, in which the presentation
level and the amount of time spent on different topics are adjusted according to the
interests of the students in the particular class. This teaching method stresses teaching
students how to learn on their own, encourages students' initiative in learning, and
motivates them to study more deeply the topics most interesting to them through projects
and individual reading.
Students will learn the course topics through lectures, through reading of assigned or
self-selected materials, and through individual presentations. In addition, students with
different backgrounds will work on a group project in which they will complement each
other in expertise and background, and learn skills of collaboration. They will also get
hands-on experience with some of the state-of-the-art data mining and knowledge
discovery systems.
Topics
1) Goals of data mining and knowledge discovery
2) Fundamental concepts: data, information, knowledge
3) Databases, information systems, and knowledge bases
4) Statistics-based data mining methods
5) Machine learning-based, and other data mining methods
6) Knowledge application and management
7) Data and knowledge visualization
8) Systems and applications
9) Future directions
Texts:
Lecture Notes of the Instructor
Supplementary Texts:
Michalski, R.S., Bratko, I., Kubat, M., Machine Learning and Data Mining: Methods
and Applications, John Wiley & Sons, 1998.
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., Advances in
Knowledge Discovery and Data Mining, AAAI Press/The MIT Press, 1996.
Additional texts:
Agrawal, R., Stolorz, P., and Piatetsky-Shapiro, G., Proceedings of the Fourth
International Conference on Knowledge Discovery and Data Mining, AAAI Press,
New York, August 27-31, 1998.
Sharma, S., Applied Multivariate Techniques, John Wiley & Sons, Inc., 1996.
Grinstein Georges, Andreas Wierse, Usama Fayyad, Proceedings of the Third
International Conference on Knowledge Discovery and Data Mining, (KDD-97), Newport
Beach, CA, August 14-17, 1997.
Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic
Publishers, 1991.
Simoudis, E., Han, J., and Fayyad, U. (eds.), Proceedings of the Second International
Conference on Knowledge Discovery and Data Mining, Portland, OR, August 2-4, 1996.
Tsumoto, S., Kobayashi, S., Yokomori, T., Tanaka, H., Proceedings of the Fourth
International Workshop on Rough Sets, Fuzzy Sets and Machine Discovery, The University
of Tokyo, November 6-8, 1996.
Ziarko, W. P., Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer Verlag, 1993.
Weiss, S. M., Kulikowski, C., Computers that Learn, Morgan Kaufmann Publishers, 1991.
Bock, H.H. (ed.), Classification and Related Methods of Data Analysis, 1987.
Journals and Proceedings
Machine Learning, Kluwer Academic Publishers
Artificial Intelligence Journal, North Holland
Proceedings of Machine Learning Workshops, No. 2-8, Morgan Kaufmann Publishers (83 -
U. of Illinois; 85 - Rutgers U.; 87 - U. of California at Irvine; 88 - U. of Michigan at Ann Arbor;
89 - Cornell U.; 90 - U. of Texas; 91 - Northwestern University; 92 - University of Aberdeen).
Proceedings of the IJCAI Conferences (87 - Milan, Italy; 89 - Detroit; 91 - Sydney, Australia;
93 - Chambéry, France; 95 - Montreal, Canada; 97 - Nagoya, Japan).
Proceedings of the AAAI Conferences (e.g., 88 - St. Paul, Minnesota; 97 - Providence, RI).
Copyright © Ryszard S. Michalski, 1999-2000
Grading policy:
50% project, 30% presentations and 20% participation in class
discussions
Office Hours:
Wednesdays 3:00-4:15 or by appointment.
Grading policy
Presentations and participation in class discussions count for 20%
Homework (assigned/voluntary) counts for 20%
Experimental project and report count for 60%
Grading on each of the above items will be on the scale 0-10.
The final examination will take the form of the project presentation.
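The weights above can be combined into a single course score as sketched below. This is only an illustration: the syllabus does not say how the three 0-10 item grades are combined, so the weighted average, the function name final_score, and the item names are all assumptions.

```python
# Illustrative sketch only. The syllabus gives the weights and the 0-10 scale,
# but not the combination rule; a weighted average is ASSUMED here.
# All names below are hypothetical.

# Weights in percent, taken from the grading policy above.
WEIGHTS = {
    "presentations_and_participation": 20,
    "homework": 20,
    "project_and_report": 60,
}


def final_score(scores):
    """Return the weighted average of the item grades, on the same 0-10 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the items in WEIGHTS")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be on the 0-10 scale")
    # Integer percent weights keep the arithmetic simple: divide by 100 at the end.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {
    "presentations_and_participation": 8,
    "homework": 7,
    "project_and_report": 9,
}
print(final_score(example))  # (20*8 + 20*7 + 60*9) / 100
```

Under this assumption, the project grade dominates: one point on the project moves the final score three times as much as one point on homework.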
Office Hours
Wednesdays 3:00-4:15 or by appointment
Room 411, SITE 2
Computer Access
• To activate your computer account at GMU, connect to 'mason', type 'accounts'
and press enter at the login prompt, press enter at the password prompt, and
follow the remaining instructions. You can also connect to the Web page
'iso.gmu.edu' for this procedure. Once you have your GMU account, you can get
one on SITE: Log in to 'mason' and type "sitereg". You will be prompted for your
GMU id and will be allowed to create a SITE account online.
• If you will be working on a project requiring the resources of the MLI
Laboratory, you will be able to get an account on the laboratory computers.
Contact Ken Kaufman (kaufman@aic.gmu.edu) for an account.
Groups and collaboration
Early in the course, you will form study/project groups. You should
meet with your study group once a week as part of your normal class work. Another
part of the normal class work is individual reading of the material relevant to the
topics covered in the class. In your study/project group, you should discuss questions that
you may have regarding the material covered in the class, or any other material
relevant to the topic of the class that you have read.
Except when group projects are explicitly declared, you must write your own
individual report for each assignment. You will learn much more working with your
group than you would working alone. It is important to acknowledge the sources of
your information -- name the persons with whom you collaborated, cite sections from
books or articles if you use them, etc. In short, collaborate freely, acknowledge all
help and sources, and write your own individual homework reports. Your study group
will also function as a project team in the projects that you will work on.
PROJECTS
There will be two projects involving experiments in the SITE lab. They will include C
or Java programming. The first project will be a lab assignment in concurrent
programming. The second project will be an experimental study of an operating
system component. For each project, your team will submit:
1. A single, group technical report (there will be a length limit);
2. Individual contribution assessments (one page max) of (a) your own contribution to
the effort, and (b) the contributions of the other members to the effort; and
3. A single joint declaration signed by all group members declaring the fractions of
the report's grade that should be assigned to each group member. In a group of
size N, you will receive (15)(N)(p) points if your percentage was p. (If you cannot
agree, this joint declaration should state "We were unable to agree on the allocation
of effort" and your individual report should state what you think the percentages
should be.) It is important for you to work out in advance how you will divide up the
work on the project among yourselves so that you can aim for equal distribution of
the points. It is important to work out a schedule so that you can get everything done
-- don't put the main work off to the last minute because you can be severely
hampered by computer overloads that so frequently happen in last-minute rushes. It
is also important to keep your promises to your group members because otherwise
they will not sign an equal-distribution statement with you at the end. The project due
dates will not be postponed except for major emergencies (e.g., snow days or
machine unavailability).
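The point allocation rule in item 3 can be worked through with a short sketch. It assumes that p is expressed as a fraction of the whole (so that the values declared by a group sum to 1), which is how the joint declaration describes it; the function name member_points and the example splits are hypothetical.

```python
# Illustrative sketch of the (15)(N)(p) rule from item 3 above.
# ASSUMPTION: p is a fraction between 0 and 1, and the fractions declared
# by one group sum to exactly 1. Names here are hypothetical.
from fractions import Fraction

BASE_POINTS = 15  # the constant in (15)(N)(p)


def member_points(declared_fractions):
    """Return the points for each member of a group, given their fractions."""
    if sum(declared_fractions) != 1:
        raise ValueError("declared fractions must sum to 1")
    group_size = len(declared_fractions)
    return [BASE_POINTS * group_size * p for p in declared_fractions]


# Equal split in a group of three: every member receives 15 points.
equal = member_points([Fraction(1, 3)] * 3)
print(equal)

# Unequal split in a group of four: one member declared at 40%, the rest at 20%.
unequal = member_points([Fraction(2, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)])
print(unequal)
```

Under this reading, an equal split always gives each member 15 points regardless of group size, and the group total is always 15 times N, so any points one member gains above 15 come from the other members.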
If you encounter any breakdowns in the operation of your group, let me know
immediately so that I can help you solve the problem.
1) Goals of data mining and knowledge discovery (1h)
2) Data, information, knowledge, and knowledge operators (3h)
3) Databases, information systems, and knowledge bases (6h)
4) Statistics-based data mining methods (5-10h)
5) Machine learning-based, and other data mining methods (5-10h)
6) Knowledge application and management (4h)
7) Data and knowledge visualization (5-10h)
8) Systems and applications (6h)
9) Future directions (1h)
Grading policy:
50% project, 30% presentations and 20% participation in class
discussions
Office Hours:
Wednesdays 3:00-4:00 or by appointment.