Course Policy
CSCI 4370/5370 Data Mining
Computer Science Department
University of Central Arkansas
Fall 2009

Catalog description
This course introduces the basic concepts, principles, and state-of-the-art technologies of Data Mining, including an introduction to Data Mining, Data Preprocessing, Data Warehouses, Association Rules, Classification, and Clustering. Specific applications to financial data and Bioinformatics are included.

Prerequisite: CSCI 3360 Database Systems

Course goal: Introduce the concepts, principles, technologies, and practice of knowledge discovery and data mining.

Objectives: Upon completion of this course, the student will be able to
• Master the key concepts and principles employed in Data Mining.
• Specify the relations between Data Mining, Data Warehouses, and Database Systems.
• Mine frequent patterns, associations, and correlations from different types of data.
• Apply classification techniques for rule extraction and prediction.
• Form meaningful clusters and explain the clustering results.
• For graduate students (CSCI 5370), apply advanced Data Mining techniques to their research topic and generate quality results.
• For graduate students (CSCI 5370), demonstrate an ability to work with undergraduate students as a team leader and guide the team to complete the final project.

Textbook
• Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2nd ed., Morgan Kaufmann, 2005

Reference
• Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Prentice Hall, 2006
• Data Mining: Practical Machine Learning Tools and Techniques, Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005
• Other handouts and selected papers

Course Description
This course includes the following major topics:
• Introduction: The basic architecture of a data mining system is described, and a brief introduction to the concepts of database systems and data warehouses is provided.
• Data Preprocessing: Techniques for preprocessing data prior to mining are described, including methods for data cleaning, data integration, and data transformation.
• Data Warehouse: An introduction to data warehouses and OLAP (Online Analytical Processing) is provided. Topics include the concept of data warehouses and multidimensional databases, the construction of data cubes, and the relationship between data warehousing and data mining.
• Association Rules: An introduction to this topic, including a classification of association rules, a presentation of the basic Apriori algorithm and its variations, techniques for mining multi-level association rules, multi-dimensional association rules, and correlation rules.
• Classification: Methods for data classification and prediction are described, including decision trees, Bayesian classification, neural networks, k-nearest neighbor, genetic algorithms, and fuzzy set approaches.
• Clustering: Methods of cluster analysis are described. This topic first introduces the concept of data clustering and then presents several major clustering approaches.

Course Grade
Undergraduate (CSCI 4370)
• Midterm exam 20%
• Final exam 20%
• Homework assignments 25%
• Semester project 30%
• Class participation 5%

Graduate (CSCI 5370)
• Midterm exam 15%
• Final exam 15%
• Homework assignments 15%
• Semester project 50%
• Class participation 5%

Students are expected to read and present several papers on various topics involving current techniques. The semester project involves researching and writing a 6-8 page paper in IEEE format for each team. Graduate students are responsible for leading and guiding their team to complete the final project. The objective is to become acquainted with reading scientific papers in the area of Data Mining, to practice scientific writing, and to carry out state-of-the-art research on one particular topic. With appropriate further revision, the research papers initiated in this class are expected to be of sufficient quality for submission to professional Data Mining conferences.

Your numeric score will be translated to a letter grade at the end of the semester according to the table below.

Numeric Score    Letter Grade
90 - 100         A
80 - 89          B
70 - 79          C
60 - 69          D
0 - 59           F

Statement on Academic Dishonesty/Plagiarism: Academic misconduct is defined in the Academic Policies section of your Student Handbook. Students who engage in such misconduct will be penalized. You are encouraged to familiarize yourself with all policies listed in the Student Handbook.

The University of Central Arkansas adheres to the requirements of the Americans with Disabilities Act. If you need an accommodation under this Act due to a disability, please contact the UCA Office of Disability Services at 450-3135.

Dr. Bernard Chen, Ph.D.
Assistant Professor
Computer Science Department
University of Central Arkansas
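The course-grade weights and the letter-grade table in this policy combine into a simple weighted sum followed by a threshold lookup. The sketch below only illustrates that arithmetic; it is not an official grading tool. The names (`UNDERGRAD_WEIGHTS`, `GRAD_WEIGHTS`, `weighted_score`, `letter_grade`) are hypothetical, every component score is assumed to be on a 0-100 scale, and fractional totals (for example 89.5) are assumed to be compared against each grade's lower bound without rounding, since the table does not say how they are handled.

```python
# Hypothetical sketch: weighted course score and letter grade for
# CSCI 4370 / CSCI 5370, using the weights listed in the course policy.
# Assumes every component score is on a 0-100 scale.

UNDERGRAD_WEIGHTS = {  # CSCI 4370
    "midterm": 0.20,
    "final": 0.20,
    "homework": 0.25,
    "project": 0.30,
    "participation": 0.05,
}

GRAD_WEIGHTS = {  # CSCI 5370
    "midterm": 0.15,
    "final": 0.15,
    "homework": 0.15,
    "project": 0.50,
    "participation": 0.05,
}

# Lower bound of each letter grade. Assumption: fractional scores are
# compared against these bounds without rounding (the policy does not say).
LETTER_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the sum of each component score multiplied by its weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in weights.items())


def letter_grade(score: float) -> str:
    """Map a 0-100 numeric score to a letter grade via the cutoff table."""
    for cutoff, letter in LETTER_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"  # only reached for negative scores, which should not occur


if __name__ == "__main__":
    # Hypothetical component scores for one student.
    example = {"midterm": 85, "final": 78, "homework": 92,
               "project": 88, "participation": 100}
    for label, weights in [("CSCI 4370", UNDERGRAD_WEIGHTS),
                           ("CSCI 5370", GRAD_WEIGHTS)]:
        total = weighted_score(example, weights)
        print(f"{label}: {total:.2f} -> {letter_grade(total)}")
```

With these sample scores the undergraduate weighting gives 87.00 and the graduate weighting gives 87.25, both a B. Keeping each weight table in a dictionary makes it easy to check that the percentages for each section sum to 100%.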
CSCI 3360: Database Systems (Fall 2009) Class Policy
Instructor: Dr. Bernard Chen
Office: MCST 304
Email:
Class Schedule: Mon, Wed, Fri 4:00 PM - 4:50 PM, MCS 328
Office Hours: 9:00 am - 10:00 am and 11:00 am - 1:00 pm Monday, Wednesday, Friday; 2:00 pm - 4:00 pm Monday, Wednesday; or by appointment

Extra instruction is available and encouraged when your own attempts to understand the subject matter are unsuccessful. Come prepared with specific questions or areas to be discussed.

Attendance and Drop Policy
1. Attendance is mandatory. Attendance will be taken in the form of a short answer related to the class. If you are absent on a day when homework, lab assignments, or programming projects are due, you automatically forfeit any points assigned; the course late-assignment policy does not apply. In addition, missed in-class daily work, quizzes, and exams cannot be made up. If you do not attend class, you automatically forfeit any points given that day. The only exceptions are:
   a. School-related functions such as band, orchestra, sports events, etc. A note from the coach, instructor, supervisor, etc. must be provided. Any homework, lab assignments, or programming projects due during the planned absence must be turned in to the instructor before the missed class, unless prior approval is obtained from your instructor (via written request) to submit the work after you return. Any missed exams must be made up by the first class day following the return from such an excused event.
   b. Medically related absences. For all medically related absences, proper documentation from a physician, including the physician’s name and phone number, must be provided.
   c. Family-related emergencies. Such emergencies must involve an immediate family member (father, mother, brother, sister) or another member identified in advance to the professor.
2. It is the student’s responsibility to obtain any information missed due to an absence.
3. Students are allowed to miss three classes during one semester. Each class missed beyond three results in a one-point reduction from the student’s final score for the class.
4. If a student is absent from class for two consecutive weeks (4 classes for a T/Th class; 6 classes for a M/W/F class), the student will receive a “W” without notice.
5. All computers and cell phones must be turned off during class. If a computer or cell phone is turned on when it is not needed, the student will be considered absent for that class.

Homework Policy
1. Homework shall be submitted on the due date. NO LATE ASSIGNMENTS SHALL BE ACCEPTED.
2. Unless specifically stated otherwise, you may collaborate on homework; however, the work submitted must reflect the individual effort of the person presenting the work.
3. If it is necessary for a student to be absent, it is still his/her responsibility to determine whether there are any changes in assignment due dates, the schedule, etc., and to submit all assignments when due.
4. Save all work on a floppy diskette or USB flash memory device for back-up purposes. (The computers on campus are reloaded periodically, and anything you leave on them will be erased.)
5. In case of a discrepancy in recorded grades, each student is advised to keep a portfolio of his/her graded work.
Exam Policy
1. Missing an exam is a very serious matter. There are only 3 valid reasons for missing an exam (see Attendance and Drop Policy above):
   a. School-related functions such as band, orchestra, sports events, etc. A note from the coach, instructor, supervisor, etc. must be provided.
   b. An illness that requires a doctor's care (you must provide documentation of the absence from your physician, including the physician’s name and phone number).
   c. A documented family emergency such as a death or surgery.
2. Make-up tests will be given at the instructor’s discretion.

Classroom / Lab Conduct
1. No food or drink in the classroom or lab.
2. No cell phone use in the classroom or lab (talking, texting, calculating, etc.).
3. No music or pornography in the classroom or lab.
4. Students must be provided with an environment conducive to learning. Disturbing the class by inappropriate talking, laughing, being loud, displaying inappropriate images on your computer screen, etc. shall result in the student’s dismissal from the class.
5. Class and lab time are to be devoted to learning the material outlined in the course policy and syllabus. This time shall not be used for checking email, visiting Facebook or MySpace, engaging in chat, or any other non-course-related activities. Violation of this policy shall result in the student’s dismissal from the class.

Academic Misconduct
1. The conduct of students in this course is expected to comply with the ethical standards detailed on pages 40-41 of the UCA 2006-2007 Student Handbook in the section entitled “Definition of Academic Misconduct”.
2. Dishonesty in any form (including plagiarism, turning in assignments prepared by others, unauthorized possession of exams, copying assignments from other students’ work or storage media, and allowing other students to copy or view your work) shall result in the student being penalized for the violation; such a penalty may include dismissal from the course and an “F” at the end of the semester. If assignments are copied, both students involved will be penalized equally.

University Policies
It is important that you familiarize yourself with the university policies described in the UCA 2006-2007 Student Handbook.
   a. Computer Use Policy: Refer to the section starting on page 31 of the UCA 2006-2007 Student Handbook.
   b. Sexual Harassment Policy: Refer to the section starting on page 117 of the UCA 2006-2007 Student Handbook.
   c. Academic Policies: Refer to the section starting on page 38 of the UCA 2006-2007 Student Handbook.

Disabilities
The University of Central Arkansas adheres to the requirements of the Americans with Disabilities Act. If you need an accommodation under this Act due to a disability, please contact the UCA Office of Disability Services at 450-3135.
CSCI 4370/5370 Data Mining ’09

Week      Topic
1         Introduction to Data Mining
2         Data Preprocessing
          • Data Cleaning
          • Data Integration and Transformation
          • Data Reduction
3-4       Data Warehouse
          • Multidimensional Data Model
          • Data Warehouse Architecture
          • Data Warehouse Implementation
          • From Data Warehouse to Data Mining
5         Semester Project Discussion
6-8       Mining Frequent Patterns, Associations, and Correlations
          • Basic Concepts and a Road Map
          • Efficient and Scalable Frequent Itemset Mining Methods
          • Mining Various Kinds of Association Rules
          • From Association Mining to Correlation Analysis
          • Constraint-Based Association Mining
9-11      Classification and Prediction
          • Decision Tree
          • Bayesian Classification
          • Rule-Based Classification
          • Classification by Backpropagation
          • Support Vector Machines (SVM)
          • Associative Classification
          • Lazy Learners
12-14     Clustering
          • Cluster Analysis
          • Partitioning Methods
          • Hierarchical Methods
          • Density-Based Methods
          • Grid-Based Methods
          • Model-Based Methods
          • Clustering High-Dimensional Data
15        Semester Project Presentation

NOTE: This syllabus represents a general plan for the course; deviations from this plan may be necessary during the semester.