INFT 910
Advanced Topics in Artificial Intelligence
DATA MINING and KNOWLEDGE DISCOVERY
Ryszard S. Michalski
Email: michalski@gmu.edu
Web: http://www.mli.gmu.edu./people/michalski.html
Course description
This course is concerned with the modern methods and systems for deriving user-
oriented knowledge from large databases and other information sources, and applying
this knowledge to support decision making. Information sources can be in numerical,
textual, visual, or multimedia forms. The course covers theoretical and practical aspects
of current methods and selected systems for data mining, knowledge discovery, and
knowledge management, including those for text mining, multimedia mining, and web
mining.
The course is taught using a novel adaptive teaching method, in which the presentation
level and the amount of time spent on different topics are adjusted according to the
interests of the students in the particular class. This method stresses teaching
students how to learn on their own, encourages students' initiative in learning, and
motivates them to study the topics that interest them most in greater depth through
projects and individual reading.
Students will learn the course topics through lectures, through reading materials that
are assigned or that they select themselves, and through individual presentations. In
addition, students with different backgrounds will work on a group project in which they
complement each other in expertise and background and learn collaboration skills. They
will also get hands-on experience with some of the state-of-the-art data mining and
knowledge discovery systems.
Topics
1) Goals of data mining and knowledge discovery
2) Fundamental concepts: data, information, knowledge
3) Databases, information systems, and knowledge bases
4) Statistics-based data mining methods
5) Machine learning-based, and other data mining methods
6) Knowledge application and management
7) Data and knowledge visualization
8) Systems and applications
9) Future directions
Texts:
Lecture Notes of the Instructor
Supplementary Texts:
Michalski, R.S., Bratko, I., Kubat, M., Machine Learning and Data Mining: Methods
and Applications, John Wiley & Sons, 1998.
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., Advances in
Knowledge Discovery and Data Mining, AAAI Press/The MIT Press, 1996.
Grading policy:
50% project, 30% presentations and 20% participation in class
discussions
Office Hours:
Wednesdays 3:00-4:15 or by appointment.
Grading policy
Presentations and participation in class discussions count for 20%
Homeworks (assigned/voluntary) count for 20%
Experimental project and report count for 60%
Each of the above items will be graded on a scale of 0-10.
The final examination will take the form of the project presentation.
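As a rough illustration of how the three graded items above could combine, the following Python sketch computes an overall score on the 0-10 scale. The weights come from the list above. Combining them as a weighted average is an assumption made for illustration, not a rule stated in this syllabus, and the names WEIGHTS and final_score are hypothetical.

```python
# Weights copied from the grading list above. Treating the final result as a
# weighted average of the three 0-10 scores is an ASSUMPTION for illustration.
WEIGHTS = {
    "presentations_and_participation": 0.20,
    "homeworks": 0.20,
    "project_and_report": 0.60,
}


def final_score(scores):
    """Return the weighted score (0-10) for a dict of per-item scores."""
    for name in WEIGHTS:
        value = scores[name]  # raises KeyError if an item is missing
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "presentations_and_participation": 8,
        "homeworks": 7,
        "project_and_report": 9,
    }
    # 0.2*8 + 0.2*7 + 0.6*9
    print(round(final_score(example), 2))
```

Because the project carries 60% of the weight in this list, a one-point change in the project score moves the overall result three times as much as a one-point change in either of the other items.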
Office Hours
Wednesdays 3:00-4:15 or by appointment
Room 411, SITE 2
Computer Access
• To activate your computer account at GMU, connect to 'mason', type 'accounts'
and press enter at the login prompt, press enter at the password prompt, and
follow the remaining instructions. You can also use the Web page
'iso.gmu.edu' for this procedure. Once you have your GMU account, you can get
one on SITE: log in to 'mason' and type "sitereg". You will be prompted for your
GMU id and will be allowed to create a SITE account online.
• If you work on a project requiring the resources of the MLI
Laboratory, you will be able to get an account on the laboratory computers
(contact Ken Kaufman, kaufman@aic.gmu.edu, for an account).
Groups and collaboration
Early in the course, you will form study/project groups. You should
meet with your study group once a week as part of your normal class work. Another
part of the normal class work is individual reading of the material relevant to the
topics covered in the class. In your study/project group, you should discuss any
questions you have about the material covered in class or about other relevant
material you have read.
Except when group projects are explicitly declared, you must write your own
individual report for each assignment. You will learn much more working with your
group than you would working alone. It is important to acknowledge the sources of
your information -- name the persons with whom you collaborated, cite sections from
books or articles if you use them, etc. In short, collaborate freely, acknowledge all
help and sources, and write your own individual homework reports. Your study group
will also function as a project team in the projects that you will work on.
PROJECTS
There will be two projects involving experiments in the SITE lab. They will include C
or Java programming. The first project will be a lab assignment in concurrent
programming. The second project will be an experimental study of an operating
system component. For each project, your team will submit:
1. A single group technical report (there will be a length limit);
2. Individual contribution assessments (one page max) of (a) your own contribution to
the effort, and (b) the contributions of the other members to the effort; and
3. A single joint declaration signed by all group members declaring the fractions of
the report's grade that should be assigned to each group member. In a group of
size N, you will receive (15)(N)(p) points if your percentage was p. (If you cannot
agree, this joint declaration should state "We were unable to agree on the allocation
of effort" and your individual report should state what you think the percentages
should be.)
It is important for you to work out in advance how you will divide up the
work on the project among yourselves so that you can aim for equal distribution of
the points. It is also important to work out a schedule so that you can get everything
done. Don't put the main work off to the last minute, because you can be severely
hampered by the computer overloads that so frequently happen in last-minute rushes.
Keep your promises to your group members, because otherwise
they will not sign an equal-distribution statement with you at the end. The project due
dates will not be postponed except for major emergencies (e.g., snow days or
machine unavailability).
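The point allocation rule (15)(N)(p) stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: p is treated as a fraction between 0 and 1 (so an equal split in a group of N gives p = 1/N), the fractions in a joint declaration are assumed to sum to 1, and the function names member_points and allocate are hypothetical.

```python
def member_points(group_size, fraction, base_points=15):
    """Points for one member: base_points * N * p.

    ASSUMPTION: `fraction` (p) is a share between 0 and 1, not a number
    out of 100. The constant 15 is taken from the rule in the text.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return base_points * group_size * fraction


def allocate(fractions):
    """Map each member name to points, given the declared shares.

    ASSUMPTION: the declared shares must sum to 1.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("declared fractions must sum to 1")
    n = len(fractions)
    return {name: member_points(n, p) for name, p in fractions.items()}


if __name__ == "__main__":
    # Equal split in a group of four: every member receives 15 points.
    print(allocate({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}))
    # Unequal split: members with larger shares receive more than 15,
    # members with smaller shares receive less.
    print(allocate({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}))
```

Under these assumptions the total handed out to a group is always 15 times N, so any share above 1/N for one member comes directly out of the points of the others.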
If you encounter any breakdowns in the operation of your group, let me know
immediately so that I can help you solve the problem.
1) Goals of data mining and knowledge discovery (1h)
2) Data, information, knowledge, and knowledge operators (3h)
3) Databases, information systems, and knowledge bases (6h)
4) Statistics-based data mining methods (5-10h)
5) Machine learning-based, and other data mining methods (5-10h)
6) Knowledge application and management (4h)
7) Data and knowledge visualization (5-10h)
8) Systems and applications (6h)
9) Future directions (1h)
Office Hours:
Wednesdays 3:00-4:00 or by appointment.