1. Module title
2. Module code
3. Module level
4. Module leader
Dr R. Rigby
5. Home academic department
6. Teaching location
7. Teaching semester
8. Teaching mode
9. Module type
10. Credit rating for module
11. Prerequisites and corequisites
Prerequisites: MA2F10N Foundations of Statistics
12. Module summary
This module investigates methods of uncovering important information or structure in large data
sets. In particular, the module introduces techniques for both supervised learning (decision trees,
neural networks and logistic regression models) and unsupervised learning (cluster and
association analyses). Many areas of application are investigated including business (e.g. credit
scoring and fraud detection) and health (e.g. screening patients to predict future cases of a
specific disease). Interpretation of the results of data mining and their practical implications are
Appropriate software (e.g. SAS Enterprise Miner) will be used.
Assessment: data analysis coursework with managerial report (40%) and unseen examination (60%).
13. Module aims
This module investigates methods of extracting information or structure from large data sets.
The principal graduate attribute developed in this module is A2.
Students completing this module should specifically be able to:
* Appreciate the purpose of data mining and the breadth of its areas of application (A3)
* Prepare data sets to facilitate effective data mining (A2)
* Understand and compare the techniques and tools available for solving data mining problems
* Explore and solve data mining problems using appropriate software (e.g. SAS Enterprise Miner)
* Discuss the intelligent use of data mining and the practical issues involved in its application
* Present the results of data mining projects, verbally and graphically, to non-technical managers
14. Learning outcomes
On completing this module, students should be able to:
1. prepare large data sets for analysis
2. identify and apply appropriate analyses for large data sets
3. appreciate the practical implications and limitations of data mining analyses applied to
4. use appropriate software (e.g. SAS Enterprise Miner) to analyse large data sets and
communicate the results to non-technical managers
Data mining process
Definition of data mining, intelligence value chain, data mining cycle.
Visualising large data sets, data cleaning, outlier detection, variable transformation.
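The data-preparation steps above (outlier detection, variable transformation) can be illustrated with a small sketch. It assumes Tukey's interquartile-range fences as the outlier rule and a log(1 + x) transform for a right-skewed variable; these are common choices, not necessarily the methods taught on the module. The function names and spending figures are hypothetical, and the module itself uses SAS Enterprise Miner, so this Python version is for illustration only.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles; default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def log_transform(values):
    """Apply log(1 + x) to reduce right skew in a non-negative variable."""
    return [math.log1p(v) for v in values]


# Hypothetical monthly spend figures for eight customers (invented data)
spend = [120, 135, 150, 110, 140, 125, 130, 2400]
print(iqr_outliers(spend))  # [2400]
print([round(x, 2) for x in log_transform(spend)])
```

Whether a flagged value is removed, capped or kept is a judgement about the application, not something the rule decides.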
Supervised learning techniques:
Classifying cases into population groups: decision trees, neural networks, logistic regression models
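As an illustration of how a decision tree chooses a split, the sketch below searches for the single threshold on one numeric input that minimises the size-weighted Gini impurity of the two child nodes. Gini impurity is the criterion used by CART (Breiman et al., 1984); it is assumed here for illustration, and other criteria (e.g. entropy, or the chi-square test used by CHAID) are equally possible. The function names and the toy credit-scoring data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Find the split x <= t minimising the size-weighted Gini impurity of the two child nodes."""
    best = (None, gini(ys))  # (threshold, impurity); "no split" keeps the parent impurity
    for t in sorted(set(xs))[:-1]:  # splitting at the maximum would leave the right node empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical toy data: debt-to-income ratio (%) and loan outcome
ratio = [10, 15, 20, 35, 40, 50]
outcome = ["repaid", "repaid", "repaid", "default", "default", "repaid"]
print(best_threshold(ratio, outcome))
```

With these data the best split is at a ratio of 20, which isolates three repaid cases from a mixed node; a full tree would repeat the search recursively within each child node and over every input variable.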
Unsupervised learning techniques:
Identifying population groups: cluster analysis
Identifying product groups: association analysis (‘market basket analysis’)
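Association analysis is usually summarised by the support and confidence of rules of the form "if A then B". The sketch below computes both for a handful of invented supermarket baskets and lists one-item-to-one-item rules that pass assumed thresholds. It enumerates pairs by brute force rather than implementing a full rule-mining algorithm such as Apriori; the function names and the thresholds `min_support=0.4` and `min_confidence=0.6` are hypothetical illustrative choices.

```python
from itertools import combinations


def support(baskets, itemset):
    """Fraction of baskets containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Brute-force search over item pairs for rules lhs -> rhs meeting both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        if support(baskets, {a, b}) < min_support:
            continue  # the pair is too rare to be of interest
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence(baskets, {lhs}, {rhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, support(baskets, {lhs, rhs}), conf))
    return rules


# Invented example baskets (each basket is a set of items)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On these baskets, for example, butter -> milk has confidence 1.0 because every basket containing butter also contains milk.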
The methods have wide applications in business, marketing, health, etc., including:
Market basket analysis (for retailers, e.g. supermarkets)
Customer Relationship Management: profiling and segmentation (e.g. for banks and telecom companies)
Credit scoring and fraud detection (e.g. for bank credit cards, etc.)
Health screening to predict disease (e.g. heart disease, diabetes, etc.)
The course will be taught using an interactive, menu-driven package (SAS Enterprise Miner).
16. Assessment strategy
The assessment will consist of a piece of coursework and an unseen examination.
The coursework will be the analysis of a large data set together with a report pitched at a level suitable for
a non-technical manager (L1-L4). This will enable students to demonstrate that they can apply
their knowledge to a practical problem, think critically and produce solutions, seek, handle and
interpret information, and communicate their work effectively.
The unseen examination will provide an opportunity for students to demonstrate their knowledge
of data mining techniques and their ability to apply these techniques appropriately to the solution
of problems (L2, L3).
August reassessment strategy: Individual Coursework (40%), Unseen Examination (60%).
17. Summary description of assessment items
Assessment type | Description of item | % Weighting | Qual Mark | Qual Set | Tariff | Week due
CWK Coursework 40 - - 11
EXU Unseen Examination 60 - - 13
18. Learning and teaching
Lectures (22 hours) will be used to formally introduce the various concepts and ideas
underpinning the module and will provide a focal point for the module. Practical sessions (22
hours) will give students 'hands-on' experience of using a computer package to analyse large data
sets. Students will be provided with instructions on applying the software. The material in the
course will be based on real-life data sets from a variety of different applications.
Students will be expected to spend a further 100 hours on self-study (unsupervised homework
exercises and directed reading).
Adriaans, P. and Zantinge, D. (1996). Data Mining. Addison-Wesley: Harlow, England.
Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques: For Marketing, Sales and Customer Support. Wiley: New York. ISBN 0-471-17980-9.
Berry, M. J. A. and Linoff, G. (2000). Mastering Data Mining. Wiley: New York.
Berson, A., Smith, S. and Thearling, K. (2000). Building Data Mining Applications for CRM (Customer Relations Management). McGraw-Hill: New York. ISBN 0-07-134444-6.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press: Oxford.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Chapman and Hall.
Delmater, R. and Hancock, M. (2001). Data Mining Explained. Digital Press: Boston.
Groth, R. (1997). Data Mining: A Hands-On Approach for Business Professionals. Prentice Hall: Englewood Cliffs, NJ.
Hand, D. J., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. MIT Press (A Bradford Book): Cambridge, MA.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag: New York. ISBN 0-387-95284-5.
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan: New York.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119-127.
Kohonen, T. (1995). Self-Organizing Maps. Springer: Berlin.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann: San Mateo, CA.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press: Cambridge. ISBN 0-521-46086-7.
SAS Institute Inc. (2000). Getting Started with Enterprise Miner Software, Version 4.0. SAS Institute Inc.: Cary, North Carolina. ISBN 1-58025-723-2.
Westphal, C. and Blaxton, T. (1998). Data Mining Solutions: Methods and Tools for Solving Real-World Problems. Wiley: New York. ISBN 0-471-25384-7.
20. Approved to run from
21. Module multivalency
Designate for: Foundation Degree Computing and Mathematics
22. Module designation undergraduate only
23. Subject Standards Board
24. Subject Standards External Examiner(s)