Course outline


  1. 1. CS/CMPE 536 – Data Mining Outline
  2. 2. Description <ul><li>A comprehensive introduction to the concepts and techniques in data mining </li></ul><ul><ul><li>data mining process – its need and motivation </li></ul></ul><ul><ul><li>data mining tasks and functionalities </li></ul></ul><ul><ul><li>association mining </li></ul></ul><ul><ul><li>cluster mining </li></ul></ul><ul><ul><li>Web mining </li></ul></ul><ul><ul><li>text mining </li></ul></ul><ul><ul><li>evaluation of DM tools and programming of algorithms in C/C++/Java </li></ul></ul><ul><li>Emphasis on concept building, algorithm evaluation, and applications </li></ul>
  3. 3. Goals <ul><li>To provide a comprehensive introduction to data mining </li></ul><ul><li>To develop conceptual and theoretical understanding of the data mining process </li></ul><ul><li>To provide hands-on experience in the implementation and evaluation of data mining algorithms and tools </li></ul><ul><li>To develop interest in data mining research </li></ul>
  4. 4. After Taking this Course… <ul><li>You should be able to … </li></ul><ul><li>understand the need and motivation for data mining </li></ul><ul><li>understand the characteristics of different data mining tasks </li></ul><ul><li>decide what data mining task and algorithm to use for a given problem/data set </li></ul><ul><li>implement and evaluate data mining solutions </li></ul><ul><li>use commercially available DM tools </li></ul>
  5. 5. Before Taking This Course… <ul><li>You should be comfortable with… </li></ul><ul><li>Data structures and algorithms! </li></ul><ul><ul><li>CS213 is a prerequisite </li></ul></ul><ul><ul><li>You should be comfortable with algorithm descriptions and implementations in a high-level programming language </li></ul></ul><ul><li>Databases </li></ul><ul><ul><li>Understanding of database concepts and familiarity with database terminology </li></ul></ul><ul><ul><li>CS341 is recommended, not required </li></ul></ul><ul><li>Basic math background </li></ul><ul><ul><li>Algebra, calculus, etc. </li></ul></ul><ul><li>Programming in a high-level language </li></ul><ul><ul><li>C/C++ or Java </li></ul></ul>
  6. 6. Grading <ul><li>Points distribution </li></ul><ul><li>Quizzes (~ 6) 10% </li></ul><ul><li>Assignments (hand + computer) 15% </li></ul><ul><li>Project 15% </li></ul><ul><li>Midterm exam 25% </li></ul><ul><li>Final exam (comprehensive) 35% </li></ul>
  7. 7. Policies (1) <ul><li>Quizzes </li></ul><ul><ul><li>Most quizzes will be announced a day or two in advance </li></ul></ul><ul><ul><li>Unannounced quizzes are also possible </li></ul></ul><ul><li>Sharing </li></ul><ul><ul><li>No copying is allowed for assignments. Discussions are encouraged; however, you must submit your own work </li></ul></ul><ul><ul><li>Violators can face mark reduction and/or be reported to the Disciplinary Committee </li></ul></ul><ul><li>Plagiarism </li></ul><ul><ul><li>Do NOT pass off someone else’s work as yours! Write in your own words and cite the reference. This applies to code as well. </li></ul></ul>
  8. 8. Policies (2) <ul><li>Submission policy </li></ul><ul><ul><li>Submissions are due on the day and at the time specified </li></ul></ul><ul><ul><li>Late penalties: 1 day late = 10%; 2 days late = 20%; not accepted after 2 days </li></ul></ul><ul><ul><li>An extension will be granted only if its need is established and it is requested several days in advance. </li></ul></ul><ul><li>Classroom behavior </li></ul><ul><ul><li>Maintain classroom sanctity by remaining quiet and attentive </li></ul></ul><ul><ul><li>If you need to talk and gossip, please leave the classroom so as not to disturb others </li></ul></ul><ul><ul><li>Dozing is allowed provided you do not snore loudly </li></ul></ul>
  9. 9. Project <ul><li>Design, implementation and evaluation of a data mining solution </li></ul><ul><li>You may choose a problem of your liking (after consultation with me) or select one suggested by me </li></ul><ul><li>You may do the project in groups (of 2) </li></ul><ul><li>Start thinking about the project now </li></ul>
  10. 10. Summarized Course Contents <ul><li>Introduction and motivation </li></ul><ul><li>The data mining process – tasks and functionalities </li></ul><ul><li>Data preprocessing for data mining – data cleaning, reduction, summarization, normalization, etc. </li></ul><ul><li>Mining frequent patterns and associations – algorithms and applications </li></ul><ul><li>Mining by clustering – algorithms and applications </li></ul><ul><li>Mining Web data </li></ul><ul><li>Intro to text mining </li></ul>
  11. 11. Course Material <ul><li>Required textbook </li></ul><ul><ul><li>Data Mining: Concepts and Techniques, Han and Kamber, Second Edition, 2006 </li></ul></ul><ul><li>Supplementary material </li></ul><ul><ul><li>Introduction to Data Mining, Tan et al., Addison-Wesley, 2006. </li></ul></ul><ul><ul><li>Web Data Mining, B. Liu, Springer, 2006. </li></ul></ul><ul><ul><li>Handouts (as and when necessary) </li></ul></ul><ul><li>Other resources </li></ul><ul><ul><li>Books in library </li></ul></ul><ul><ul><li>Web (e.g. Wikipedia) </li></ul></ul>
  12. 12. Course Web Site <ul><li>For announcements, lecture slides, handouts, assignments, quiz solutions, web resources: </li></ul><ul><li>http://suraj.lums.edu.pk/~cs536a08/ </li></ul><ul><li>The resource page has links to information available on the Web. It is basically a meta-list for finding further information. </li></ul>
  13. 13. Other Stuff <ul><li>How to contact me? </li></ul><ul><ul><li>Office hours: 12.00 to 13.20 TR (office: 429) </li></ul></ul><ul><ul><li>E-mail: [email_address] </li></ul></ul><ul><ul><li>By appointment: outside office hours, e-mail me for an appointment before coming </li></ul></ul><ul><li>Philosophy </li></ul><ul><ul><li>Knowledge cannot be taught; it is learned. </li></ul></ul><ul><ul><li>Be excited. That is the best way to learn. I cannot teach everything in class. Develop an inquisitive mind, ask questions, and go beyond what is required. </li></ul></ul><ul><ul><li>I don’t believe in strict grading. But… there has to be a way of rewarding performance. </li></ul></ul>
  14. 14. Reference Books in LUMS Library (1) <ul><li>Data Mining: Introductory and Advanced Topics, Dunham, Pearson Education, 2003. </li></ul><ul><li>Data Mining: Concepts, Models, Methods, and Algorithms, Mehmed Kantardzic, 006.3 K167D, 2003. </li></ul><ul><li>Principles of Data Mining, Hand and Mannila, 006.3 H236P, 2001. </li></ul><ul><li>The elements of statistical learning; data mining, inference, and prediction, Trevor Hastie, Robert Tibshirani and Jerome Friedman, 006.31 H356E 2001. </li></ul><ul><li>Data mining and uncertain reasoning; an integrated approach, Zhengxin Chen, 006.321 C518D 2001. </li></ul><ul><li>Graphical models; methods for data analysis and mining, Christian Borgelt and Rudolf Kruse, 006.3 B732G 2001. </li></ul><ul><li>Information visualization in data mining and knowledge discovery, Usama Fayyad (ed.), 006.3 I434 2002. </li></ul><ul><li>Intelligent data warehousing; from data preparation to data mining, Zhengxin Chen, 005.74 C518I 2002. </li></ul><ul><li>Machine learning and data mining; methods and applications, Ryszard S. Michalski, Ivan Bratko and Miroslav Kubat (eds.), 006.31 M149 1999. </li></ul>
  15. 15. Reference Books in LUMS Library (2) <ul><li>Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Witten et al., Morgan Kaufmann, 006.3 W829D, 2005. </li></ul><ul><li>Managing and mining multimedia databases, Bhavani Thuraisingham, 006.7 T536M 2001. </li></ul><ul><li>Mastering data mining; the art and science of customer relationship management, Michael J.A. Berry and Gordon Linoff, 006.3 B534M 2000. </li></ul><ul><li>Data mining explained; a manager's guide to customer-centric business intelligence, Rhonda Delmater and Monte Hancock, 006.3 D359D 2001. </li></ul><ul><li>Data mining solutions; methods and tools for solving real-world problems, Christopher Westphal and Teresa Blaxton, 006.3 W537D 1998. </li></ul>
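
The points distribution on the Grading slide and the late penalties on the Policies (2) slide can be worked through as a small calculation. The following is only a minimal sketch in Python, not the instructor's actual grading procedure. It assumes every component score is a percentage from 0 to 100, and it assumes the late penalty is deducted as a fraction of the earned score (the slides do not say what the 10% and 20% are taken from). The names WEIGHTS, LATE_PENALTY, apply_late_penalty and course_total are hypothetical.

```python
# Weights from the Grading slide (percent of the final grade).
WEIGHTS = {
    "quizzes": 10,
    "assignments": 15,
    "project": 15,
    "midterm": 25,
    "final": 35,
}

# Late penalties from the Policies (2) slide, keyed by days late.
LATE_PENALTY = {0: 0.0, 1: 0.10, 2: 0.20}


def apply_late_penalty(score, days_late):
    """Return the score after the late deduction.

    Assumption: the penalty is a fraction of the earned score.
    A submission more than 2 days late is not accepted and scores 0.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 2:
        return 0.0
    return score * (1.0 - LATE_PENALTY[days_late])


def course_total(scores):
    """Weighted total (0-100) from per-component percentages (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


if __name__ == "__main__":
    example = {
        "quizzes": 80,
        "assignments": 90,
        "project": 70,
        "midterm": 60,
        "final": 75,
    }
    print(course_total(example))
    print(apply_late_penalty(80, 1))
```

The weights sum to 100, so a student who scores full marks on every component gets a total of 100 under these assumptions.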
