
  1. 1. MODULE SPECIFICATION
     1. Module title: Data Mining
     2. Module code: MA2F15N
     3. Module level: I
     4. Module Leader: Dr R. Rigby
     5. Home academic department: CCTM
     6. Teaching location: North
     7. Teaching semester: Spring
     8. Teaching mode: day
     9. Module type: STAN
     10. Credit rating for module: 15
     11. Prerequisites and corequisites: Prerequisites: MA2F10N Foundations of Statistics
     12. Module summary
     MA2F15N Data Mining
     This module investigates methods of uncovering important information or structure in large data sets. In particular, the module introduces techniques for both supervised learning (decision trees, neural networks and logistic regression models) and unsupervised learning (cluster and association analyses). Many areas of application are investigated, including business (e.g. credit scoring and fraud detection) and health (e.g. screening patients to predict future cases of a specific disease). The interpretation of data mining results and their practical implications are discussed. Appropriate software will be used (e.g. SAS Enterprise Miner).
     Semester: Spring
     Prerequisites: MA2F10N Foundations of Statistics
     Assessment: Data analysis coursework with managerial report 40% + Unseen Exam 60%
  2. 2. 13. Module aims
     This module investigates methods of extracting information or structure from large data sets. The principal graduate attribute developed in this module is A2. Students completing this module should specifically be able to:
     * Appreciate the purpose and breadth of areas of application of data mining (A3).
     * Prepare data sets to facilitate effective data mining (A2).
     * Understand and compare the techniques and tools available for solving data mining problems (A2).
     * Explore and solve data mining problems using appropriate software (e.g. SAS Enterprise Miner) (A2).
     * Discuss the intelligent use of data mining and the practical issues involved in its application (A2).
     * Present the results, verbally and graphically, of data mining projects to non-technical managers (A3).
     14. Learning outcomes
     On completing this module, students should be able to:
     1. prepare large data sets for analysis
     2. identify and apply appropriate analyses for large data sets
     3. appreciate the practical implications and limitations of data mining analyses applied to real-life situations
     4. use appropriate software (e.g. SAS Enterprise Miner) to analyse large data sets and communicate the results to non-technical managers
     15. Syllabus
     Data mining process: Definition of data mining, intelligence value chain, data mining cycle.
     Data preparation: Visualising large data sets, data cleaning, outlier detection, variable transformation.
     Supervised learning techniques: Classifying cases into population groups: decision trees, neural networks, logistic regression models.
     Unsupervised learning techniques:
     Identifying population groups: cluster analysis
     Identifying product groups: association analysis (‘market basket analysis’)
     Applications: The methods have wide applications in business, marketing, health etc., including:
     * Basket analysis (for retail companies, e.g. supermarkets)
     * Customer Relations Management: profiling and segmentation (for retail companies, e.g. banks, telecom companies)
     * Credit scoring and fraud detection (e.g. for bank credit cards)
     * Health screening to predict disease (e.g. heart disease, diabetes)
     The course will be taught using a powerful, interactive, menu-driven package (SAS Enterprise Miner).
     16. Assessment strategy
     The assessment will consist of a coursework and an unseen examination. The coursework will be the analysis of a large data set together with a report at a level suitable for
  3. 3. a non-technical manager (L1 - L4). This will enable students to demonstrate that they can apply their knowledge to a practical problem, think critically and produce solutions, seek, handle and interpret information, and communicate their work effectively. The unseen examination will provide an opportunity for students to demonstrate their knowledge of data mining techniques and their ability to apply these techniques appropriately to the solution of problems (L2, L3).
     August reassessment strategy: Individual Coursework (40%), Unseen Examination (60%).
     17. Summary description of assessment items
     Assessment type  Description of item  % Weighting  Qual Mark  Qual Set  Tariff  Week due
     CWK              Coursework           40           -          -                 11
     EXU              Unseen Examination   60           -          -                 13
     18. Learning and teaching
     Lectures (22 hours) will be used to formally introduce the various concepts and ideas underpinning the module and will provide a focal point for the module. Practical sessions (22 hours) will give students 'hands-on' experience of using a computer package to analyse large data sets. Students will be provided with instructions on applying the software. The material in the course will be based on real-life data sets from a variety of different applications. Students will be expected to spend a further 100 hours on self-study (unsupervised homework exercises and further directed reading).
     19. Bibliography
     Adriaans, P. and Zantinge, D. (1996). Data Mining. Addison Wesley: Harlow, England. ISBN 0-201-40380-3
     Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques: for Marketing, Sales and Customer Support. Wiley: New York. ISBN 0-471-17980-9
     Berry, M. J. A. and Linoff, G. (2000). Mastering Data Mining. Wiley: New York. ISBN 0-471-33123-6
     Berson, A., Smith, S. and Thearling, K. (2000). Building Data Mining Applications for CRM (Customer Relations Management). McGraw-Hill: New York. ISBN 0-07-134444-6
     Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press: Oxford. ISBN 0-19-853864-2
     Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Chapman and Hall.
     Delmater, R. and Hancock, M. (2001). Data Mining Explained. Digital Press: Boston. ISBN 1-55558-231-1
     Groth, R. (1997). Data Mining: a hands-on approach for business professionals. Prentice Hall: Englewood Cliffs, N.J.
     Hand, D. J., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. Bradford Book. ISBN 0-262-08290-X
     Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag: New York. ISBN 0-387-95284-5
  4. 4. Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan: New York.
     Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119-127.
     Kohonen, T. (1995). Self-Organizing Maps. Springer: Berlin.
     Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann: San Mateo, California.
     Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
     Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press: Cambridge. ISBN 0-521-46086-7
     SAS Institute Inc. (2000). Getting Started with Enterprise Miner Software, Version 4.0. SAS Institute Inc.: Cary, North Carolina. ISBN 1-58025-723-2
     Westphal, C. and Blaxton, T. (1998). Data Mining Solutions: Methods and Tools for Solving Real-World Problems. Wiley: New York. ISBN 0-471-25384-7
     20. Approved to run from: September 2005
     21. Module multivalency: Designate for: Foundation Degree Computing and Mathematics
     22. Module designation: undergraduate only subject context
     23. Subject Standards Board: Mathematics
     24. Subject Standards External Examiner(s)