Course Design Document IS424: Data Mining and Business Analytics

  • Course Design Document IS424: Data Mining and Business Analytics Version 1.2 29 December 2009 IS424 – Data Mining and Business Analytics Page 1
Table of Contents
  1. Revision History
  2. Overview of the Data Mining and Business Analytics Course
  3. Output and Grading Summary
  4. Learning Outcomes, Achievement Methods and Assessment
  5. Classroom Planning
  6. Course Schedule Summary
  7. List of Information Resources and References
  8. Software Tools
  9. Weekly Plan
1. Revision History

Version  Description of Changes  Author                      Date
V1.0                             Ee-Peng Lim and Swapna G.   10-09-2008
V1.1                             Ee-Peng Lim and Swapna G.   26-12-2008
V1.2                             Feida Zhu                   29-12-2009
2. Overview of the Data Mining and Business Analytics Course

2.1 Synopsis

Data mining consists of a wide range of data analysis techniques that can be applied to large datasets to discover patterns, trends and other forms of knowledge embedded in the data. In the commercial world, data mining is often conducted on enterprise data stored in relational databases to help managers make informed decisions so as to keep businesses competitive and attuned to changing market conditions. With recent advances in data generation and collection, new data types such as text, Web, spatial, and temporal data have emerged, creating new opportunities for mining knowledge from data for business intelligence.

This course provides an introduction to the fundamental issues and basic techniques of data mining. The topics covered include the data mining process, data preprocessing, data mining techniques and data mining evaluation. In particular, the use of data mining in support of business intelligence and decision making will be covered through labs, projects and case studies. Students are expected to learn data mining and its use in business intelligence by acquiring the basic data mining concepts and techniques, using them to explore data, and deriving useful knowledge patterns from the data through hands-on programming and experimentation involving industry-strength data mining software packages.

2.2 Prerequisites
• IS202 Data Management or equivalent
• IS201 Object Oriented Application Development (Java programming)

2.3 Objectives
Through this course, students will:
• Gain an understanding of basic data mining applications and techniques.
• Learn how to preprocess data before applying data mining techniques.
• Explore the use of data mining techniques on different datasets using software packages.
• Learn how to visualize the discovered patterns.
• Learn how to evaluate data mining performance.
• Study case studies of applications using data mining.

2.4 Basic Modules
The basic modules are shown in the figure below.
[Figure: basic modules of the course, covering business intelligence and related applications, social network applications, text processing applications, case studies, best practices and innovations, and data mining, text retrieval, text mining and evaluation techniques.]

3. Output and Grading Summary

To evaluate teaching quality and learning results, this course uses several kinds of assessment. The details are given below.

Week  Output Assessment                                   Individual  Group
                                                          Weighting   Weighting
3     Quiz                                                3%
5     Quiz                                                3%
7     Quiz                                                3%
7     Project (Proposal)                                              5%
8     Recess
9     Quiz                                                3%
11    Quiz                                                3%
12    Project (Final Report)                                          15%
13    Project (Presentation)                                          10%
15    Final Exam                                          40%
      Lab Assignments                                     5%
      Class Participation, Contribution to Peer Learning  10%
      Total                                               70%         30%

3.1 Participation (10%)
• In-class discussion and critique of other teams’ paper presentations/projects: 5%
• Contribution to the learning of the class and lecture notes scribing: 5%

3.2 Assignment (20%)
• Quiz: 15%
• Lab Assignment: 5%

3.3 Project (30%)
The project is intended to complement the class materials by getting students to investigate selected topics in greater depth or breadth. The project should be done in teams of 4-5 students.
• Project proposal: 5% (1 to 2 pages)
• Project presentation: 10% (20 mins)
• Final report: 15% (10 to 20 pages)

3.4 Exam (40%)
• The final exam: 40%

4. Learning Outcomes, Achievement Methods and Assessment

For each sub-skill covered by IS424, the entries below give the coverage (Y or YY), the tasks to achieve the outcome, and the method of outcome assessment. (Assessment methods will be developed in the next phase of detailed course design.)

Y: This sub-skill is covered partially by the course.
YY: This sub-skill is a main focus for this course.

1 Integration of Business & Technology in a sector context
  1. Business IT Value Linkage skills (YY)
     Tasks: Understand the business value of data mining and business analytics, and how technology can be used to create this value.
     Assessment: Paper presentations of case studies; Group Project
  2. Cost & Risk Analysis skills
  3. Technology Application skills in a particular Sector (YY)
     Tasks: Examples, exercises and assignments will draw from real problems in specific industries, e.g., Banking/Financial Services, Retail/Hospitality/Entertainment, Telecommunications.
     Assessment: Paper presentations of case studies; Lab assignments

2 IT architecture, design and development skills
  1. System Requirements Specification skills
  2. Software and IT architecture analysis and Design skills (Y)
     Tasks: Students will learn how to architect and design solutions using established “building block” applications and components.
     Assessment: Group Project
  3. Implementation skills (Y)
     Tasks: Students will develop, configure and validate working solutions.
     Assessment: Lab assignments; Group Project
  4. Technology Application skills (YY)
     Tasks: Students will do assignments, labs and projects that are taken from the context of how business analytics are used in selected industry sectors and business functions.
     Assessment: Group Project

3 Project Management skills
  1. Scope & Requirement Management skills
  2. Risks Management skills
  3. Project Integration and Time Management skills
  4. Configuration Management skills
  5. Quality Management skills

4 Learning to Learn skills
  1. Search skills (YY)
     Tasks: Students are given problems where they will have to go beyond the materials and references given in class. They will have to systematically search to find more information that will be required to execute their assignments, labs and projects.
     Assessment: Lab assignments; Group Project
  2. Skills for developing a methodology for learning (YY)
     Tasks: Students are given the opportunity to learn on their own when working on the assignments and class exercises.
     Assessment: Paper presentation; Group Project

5 Collaboration (or Team) skills
  1. Skills to improve the effectiveness of group processes and work products

6 Change management skills for enterprise systems
  1. Skills to diagnose business changes
  2. Skills to implement and sustain business changes

7 Skills for working across countries, cultures and borders
  1. Cross-national Awareness skills
  2. Business across Countries Facilitation skills (Y)
     Tasks: Includes how to distribute the business analytic results throughout a globally distributed enterprise.

8 Communication skills
  1. Presentation skills (YY)
     Tasks: Students will present their solutions and results, and will have their presentations critiqued.
     Assessment: Paper presentation; Group Project
  2. Writing skills (YY)
     Tasks: Students will also submit written summaries of their assignments, labs and projects.
     Assessment: Project proposal; Group Project
5. Classroom Planning

Each week there will be three hours of lectures during which theory, practical demonstrations and case studies will be presented. Each class is split into multiple sessions according to the schedule given in Section 6. In general, the first session is used for lectures, while the second session is for labs and in-class discussions. However, there may be variations from week to week as appropriate.

In addition, a weekly consultation session is available to answer student questions or enquiries about the course during teaching sessions. Before the final examination, extra consultation time will be allocated and details will be announced in class.

All important announcements will be posted to the Course Vista Notice board. Urgent announcements will also be mailed to all members of the class.

6. Course Schedule Summary

Week: 1
Topic: Course Overview (1 hr); Introduction to data mining (2 hrs)
Reading: B1.Ch 1

Week: 2
Topic: What is data? (2 hrs); Project Initiation (1 hr)
Lab: Intro to SAS Enterprise Miner lab
Reading: B1.Ch 2

Week: 3
Topic: Data Exploration (2 hrs)
Lab: Data Sampling techniques lab (1 hr)
Reading: B1.Ch 3

Week: 4
Topic: Classification: Decision tree (3 hrs)
Reading: B1.Ch 4

Week: 5
Topic: Classification: Rule-based, Nearest Neighbor and Bayesian classifiers (2 hrs)
Lab: Data exploration lab (1 hr)
Reading: B1.Ch 5

Week: 6
Topic: Clustering: K-means clustering (1 hr)
Lab: Classification Lab (2 hrs)
Reading: B1.Ch 8

Week: 7
Topic: Clustering: Hierarchical clustering (1 hr); Association Analysis (2 hrs); Project Proposal Due
Reading: B1.Ch 8, B1.Ch 6

Week: 8
Session break

Week: 9
Lab: K-means clustering lab (1 hr); Association Analysis Lab (2 hrs)

Week: 10
Topic: Web Mining: (1 hr)
Reading: B2.Ch10.4, B2.Ch10.5

Week: 11
Topic: Web Mining: Link analysis (1 hr)
Lab: Web classification lab (1 hr)
Reading: [P08, P09]

Week: 12
Topic: Special Topics in Data Mining: (3 hrs); Course Review; Project Report Due

Week: 13
Topic: Project Presentation: Team #1 to #4; Project Presentation: Team #5 to #8

Week: 14
Study Week

Week: 15
Exam

7. List of Information Resources and References

7.1 Core Text Books
• [B1] Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, 2006.
• [B2] Data Mining: Concepts and Techniques (Second Edition), J. Han and M. Kamber, Morgan Kaufmann Publishers, 2006.

7.2 Reference Books
• [B3] Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Ian H. Witten, Eibe Frank, Morgan Kaufmann Publishers, 2006.

Note: Both core text books and reference books are available on reserve at SMU library.

7.3 Reference Papers
• [P01] Agrawal R, Srikant R. “Fast Algorithms for Mining Association Rules”, VLDB. Sep 12-15 1994, Chile, 487-99.
  http://www.sigmod.org/vldb/conf/1994/P487.PDF
• [P02] Agrawal R, Imielinski T, Swami AN. “Mining Association Rules between Sets of Items in Large Databases.” SIGMOD. June 1993, 22(2):207-16.
  http://portal.acm.org/ft_gateway.cfm?id=170072&type=pdf&coll=GUIDE&dl=portal,ACM&CFID=11111111&CFTOKEN=2222222
• [P03] J. Han, J. Pei, Y. Yin and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach”, Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
  http://www-faculty.cs.uiuc.edu/~hanj/pdf/dami04_fptree.pdf
• [P04] Wray Buntine, “Learning Classification Trees”, Proceedings on Conf. on AI and Statistics, 1991.
  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.267&rep=rep1&type=pdf
• [P05] Sreerama K. Murthy, “Automatic construction of decision trees from data: A multi-disciplinary survey.” Data Mining and Knowledge Discovery 2(4):345-389, 1998.
  http://www.itlabs.umn.edu/classes/Spring-2000/csci5980-dm/decision-tree-survey.pdf
• [P06] Anil K. Jain, M. Narasimha Murty, Patrick J. Flynn. “Data Clustering: A Review”.
  ACM Computing Surveys 31(3): 264-323 (1999).
  www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
• [P07] Michael Steinbach, George Karypis and Vipin Kumar. “A Comparison of Document Clustering Techniques”. KDD Workshop on Text Mining, 2000.
  http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf
• [P08] J. Kleinberg. “Authoritative sources in a hyperlinked environment.” In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998.
  http://www.cs.cornell.edu/home/kleinber/auth.pdf
• [P09] Sergey Brin and Lawrence Page. “The anatomy of a large-scale hypertextual web search engine.” pages 107-117, 1998.
  http://infolab.stanford.edu/pub/papers/google.pdf
• [P10] M Kuramochi, G Karypis. “Frequent Subgraph Discovery.” ICDM, 2001.
  http://www-users.cs.umn.edu/~kuram/papers/fsg.pdf

7.4 Best Practices and Case Studies

1. Analytics-driven solutions for customer targeting and sales force allocation
   Authors: R. Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss
   http://www.research.ibm.com/journal/sj/464/lawrence.pdf
2. Closing the gap: automated screening of tax returns to identify egregious tax shelters
   Authors: Dave DeBarr, Zach Eyler-Walker
   http://www.dataminingcasestudies.com/DMCS_WorkshopProceedings25.pdf (page 34)
3. Data mining for improved cardiac care
   Authors: R. Bharat Rao, Sriram Krishnan, Radu Stefan Niculescu
   http://www.dataminingcasestudies.com/DMCS_WorkshopProceedings25.pdf (page 12)
4. Market basket recommendations for the HP SMB store
   Authors: Pramod Singh, A Charles Thomas, Ariel Sepulveda
   http://www.dataminingcasestudies.com/DMCS_WorkshopProceedings25.pdf (page 92)
5. The problem of disguised missing data
   Author: Ronald K. Pearson
   http://www.sigkdd.org/explorations/issues/8-1-2006-06/12-Pearson.pdf
6. Data Quality Models for High Volume Transaction Streams
   Authors: Joseph Bugajski, Chris Curry, Robert Grossman, David Locke, Steve Vejcik
   http://www.opendatagroup.com/grossman-dmcs-07.pdf
7. External Search Term Marketing Program: A Return on Investment Approach
   Authors: Pramod Singh, Laksminarayan Choudur, Alan Benson and Manoj Mathew
   http://www.dataminingcasestudies.com/DMCS_WorkshopProceedings25.pdf (page 60)
8. Crime Data Mining: A General Framework and Some Examples
   Authors: Hsinchun Chen, Wingyan Chung, Jennifer Xu, Gang Wang, Yi Qin, Michael Chau
   http://ai.eller.arizona.edu/COPLINK/publications/CrimeDataMining_Computer.pdf
9. Privacy Preserving Data Mining Systems
   Authors: Nan Zhang, Wei Zhao
   http://ranger.uta.edu/~nzhang/files/Computer07.pdf
8. Software Tools
• SAS Enterprise Miner

9. Weekly Plan

Week: 1    Date: 7 January 2009
Session 1: Course Overview
Session 2: Introduction to data mining & BI
Assignment: HW1
Reading: B1.Ch 1
Project:

Week: 2    Date: 14 January 2009
Session 1: What is Data?
Session 2: Lab: Introduction to SAS Enterprise Miner
Assignment: HW2
Reading: B1.Ch 2
Project:

Week: 3    Date: 21 January 2009
Session 1: Data Exploration
Session 2: Lab: Data Sampling
Assignment: HW3
Reading: B1.Ch 3
Project: Project Description Release

Week: 4    Date: 28 January 2009
Session 1: Classification – Decision Tree
Assignment: HW4
Reading: B1.Ch 4
Project:

Week: 5    Date: 4 February 2009
Session 1: Classification – Nearest Neighbor and Bayesian classifiers
Session 2: Lab – Data Exploration
Assignment: HW5
Reading: B1.Ch 5
Project:

Week: 6    Date: 11 February 2009
Session 1: Clustering – K-means clustering
Session 2: Lab – Decision tree
Assignment: HW6
Reading: B1.Ch 8
Project:

Week: 7    Date: 18 February 2009
Session 1: Clustering – Hierarchical Clustering
Session 2: Association Analysis
Assignment: HW7
Reading: B1.Ch 8, B1.Ch 6
Project:
Week 8 (Date: 23 February to 1 March 2009): Recess

Week: 9    Date: 4 March 2009
Session 1: Lab – K-Means clustering
Session 2: Lab – Association Analysis
Assignment:
Reading:
Project:

Week: 10    Date: 11 March 2009
Session 1: Web Mining
Session 2: Paper presentation
Assignment: HW10
Reading: B2.Ch10.4, B2.Ch10.5
Project:

Week: 11    Date: 18 March 2009
Session 1: Web Mining
Session 2: Lab – Web Classification
Session 3: Paper presentation
Assignment: HW11
Reading:
Project:

Week: 12    Date: 25 March 2009
Session 1: Special Topics
Session 2: Course review
Assignment:
Reading:
Project: Project report due
Week: 13    Date: 1 April 2009
Session 1: Student Presentation
Assignment: Student feedback; Peer assessment
Reading: Handout
Project:

Week 14 (Date: 6 to 12 April): Study Week

Week 15 (Date: Tuesday, 14 April, 5:00pm): Final Exam