MODULE SPECIFICATION
1. Module title
Data Mining
2. Module code
MA2F15N
3. Module level
I
4. Module Leader
Dr R. Rigby
5. Home academic department
CCTM
6. Teaching location
North
7. Teaching semester
Spring
8. Teaching mode
day
9. Module Type
STAN
10. Credit rating for module
15
11. Prerequisites and corequisites
Prerequisites: MA2F10N Foundations of Statistics
12. Module summary
MA2F15N
Data Mining
This module investigates methods of uncovering important information or structure in large data sets. In particular, the module introduces techniques for both supervised learning (decision trees, neural networks and logistic regression models) and unsupervised learning (cluster and association analyses). Many areas of application are investigated, including business (e.g. credit scoring and fraud detection) and health (e.g. screening patients to predict future cases of a specific disease). The interpretation of data mining results and their practical implications are discussed.
Appropriate software will be used (e.g. SAS Enterprise Miner).
Semester: Spring
Prerequisites: MA2F10N Foundations of Statistics
Assessment: Data analysis coursework with managerial report 40% + Unseen Exam 60%
13. Module aims
This module investigates methods of extracting information or structure from large data sets.
The principal graduate attribute developed in this module is A2.
Students completing this module should specifically be able to:
* Appreciate the purpose of data mining and the breadth of its areas of application (A3).
* Prepare data sets to facilitate effective data mining (A2).
* Understand and compare the techniques and tools available for solving data mining problems (A2).
* Explore and solve data mining problems using appropriate software (e.g. SAS Enterprise Miner) (A2).
* Discuss the intelligent use of data mining and the practical issues involved in its application (A2).
* Present the results of data mining projects, verbally and graphically, to non-technical managers (A3).
14. Learning outcomes
On completing this module, students should be able to:
1. prepare large data sets for analysis;
2. identify and apply appropriate analyses for large data sets;
3. appreciate the practical implications and limitations of data mining analyses applied to real-life situations;
4. use appropriate software (e.g. SAS Enterprise Miner) to analyse large data sets and communicate the results to non-technical managers.
15. Syllabus
Data mining process:
Definition of data mining, intelligence value chain, data mining cycle.
Data preparation:
Visualising large data sets, data cleaning, outlier detection, variable transformation.
Supervised learning techniques:
Classifying cases into population groups: decision trees, neural networks, logistic regression models.
Unsupervised learning techniques:
Identifying population groups: cluster analysis.
Identifying product groups: association analysis (‘market basket analysis’).
Applications:
The methods have wide applications in business, marketing, health and other areas, including:
Market basket analysis (for retailers, e.g. supermarkets)
Customer Relationship Management: profiling and segmentation (e.g. for banks and telecom companies)
Credit scoring and fraud detection (e.g. for bank credit cards)
Health screening to predict disease (e.g. heart disease, diabetes)
The course will be taught using an interactive, menu-driven software package (SAS Enterprise Miner).
16. Assessment strategy
The assessment will consist of a piece of coursework and an unseen examination.
The coursework will be the analysis of a large data set, together with a report written at a level suitable for a non-technical manager (L1-L4). This will enable students to demonstrate that they can apply their knowledge to a practical problem, think critically and produce solutions, seek, handle and interpret information, and communicate their work effectively.
The unseen examination will provide an opportunity for students to demonstrate their knowledge of data mining techniques and their ability to apply these techniques appropriately to the solution of problems (L2, L3).
August reassessment strategy: Individual Coursework (40%), Unseen Examination (60%).
17. Summary description of assessment items
Assessment type   Description of item   % Weighting   Qual Mark   Qual Set   Tariff   Week due
CWK               Coursework            40            -           -                   11
EXU               Unseen Examination    60            -           -                   13
18. Learning and teaching
Lectures (22 hours) will formally introduce the concepts and ideas underpinning the module and will provide its focal point. Practical sessions (22 hours) will give students hands-on experience of using a computer package to analyse large data sets, and students will be given instructions on applying the software. The course material will be based on real-life data sets from a variety of applications.
Students will be expected to spend a further 100 hours on self-study (unsupervised homework exercises and directed reading).
19. Bibliography
Adriaans, P. and Zantinge, D. (1996). Data Mining. Addison Wesley: Harlow, England. ISBN 0-201-40380-3.
Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques: for Marketing, Sales and Customer Support. Wiley: New York. ISBN 0-471-17980-9.
Berry, M. J. A. and Linoff, G. (2000). Mastering Data Mining. Wiley: New York. ISBN 0-471-33123-6.
Berson, A., Smith, S. and Thearling, K. (2000). Building Data Mining Applications for CRM (Customer Relationship Management). McGraw-Hill: New York. ISBN 0-07-134444-6.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press: Oxford. ISBN 0-19-853864-2.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Chapman and Hall.
Delmater, R. and Hancock, M. (2001). Data Mining Explained. Digital Press: Boston. ISBN 1-55558-231-1.
Groth, R. (1997). Data Mining: A Hands-On Approach for Business Professionals. Prentice Hall: Englewood Cliffs, N.J.
Hand, D. J., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. MIT Press (A Bradford Book). ISBN 0-262-08290-X.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag: New York. ISBN 0-387-95284-5.
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan: New York.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119-127.
Kohonen, T. (1995). Self-Organizing Maps. Springer: Berlin.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann: San Mateo, California.
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press: Cambridge. ISBN 0-521-46086-7.
SAS Institute Inc. (2000). Getting Started with Enterprise Miner Software, Version 4.0. SAS Institute Inc.: Cary, North Carolina. ISBN 1-58025-723-2.
Westphal, C. and Blaxton, T. (1998). Data Mining Solutions: Methods and Tools for Solving Real-World Problems. Wiley: New York. ISBN 0-471-25384-7.
20. Approved to run from
September 2005
21. Module multivalency
Designate for: Foundation Degree Computing and Mathematics
22. Module designation undergraduate only
subject context
23. Subject Standards Board
Mathematics
24. Subject Standards External Examiner(s)