# DM_Lab6


1. Data mining exercise with SPSS Clementine, Lab 6. Winnie Lam, Email: [email_address], Website: http://www.comp.polyu.edu.hk/~cswinnie/. Department of Computing, The Hong Kong Polytechnic University. Last update: 22/09/2005.
2. Introduction: Neural Networks (image copyright Dennis Kunkel, www.DennisKunkel.com)
    - Also known as Artificial Neural Networks (ANNs).
    - They can be considered simplified mathematical models of brain-like systems, and they function as parallel distributed computing networks.
    - Their functionality is loosely based on the neuron (the functional unit of the nervous system).
3. Neuron (figure: a neuron with its INPUT and OUTPUT sides labelled)
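4. Neural Networks: a single neuron has five components: (1) input x, (2) weight w, (3) bias b, (4) activation function f, and (5) output y. In the diagram, inputs $x_1, x_2, \ldots, x_n$ enter with weights $w_1, w_2, \ldots, w_n$, and a constant input $x_0 = 1$ with weight $w_0$ supplies the bias b; the weighted inputs are summed (Σ) and passed through f to give the output $y = f\left(\sum_{i=0}^{n} w_i x_i\right) = f\left(b + \sum_{i=1}^{n} w_i x_i\right)$, where $w_0 = b$.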
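As a sketch of the single neuron above, the Python function below computes $y = f(b + \sum_i w_i x_i)$. The names `neuron_output`, `sign` and `sigmoid` are illustrative assumptions, not Clementine functions: `sign` is a threshold activation giving outputs of 1 or -1, and `sigmoid` is one common smooth choice for f.

```python
import math
from typing import Callable, Sequence


def neuron_output(
    x: Sequence[float],
    w: Sequence[float],
    b: float,
    f: Callable[[float], float],
) -> float:
    """Return y = f(b + sum_i w_i * x_i) for a single neuron.

    Treating b as the weight w_0 on a constant input x_0 = 1 gives the same result.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    net = b + sum(wi * xi for wi, xi in zip(w, x))  # weighted sum plus bias
    return f(net)


def sign(net: float) -> float:
    """Threshold activation: outputs 1 or -1."""
    return 1.0 if net >= 0 else -1.0


def sigmoid(net: float) -> float:
    """Logistic activation: outputs a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical inputs and weights, for illustration only.
print(neuron_output([1.0, 2.0], [0.5, -0.25], b=0.1, f=sign))     # prints 1.0
print(neuron_output([1.0, 2.0], [0.5, -0.25], b=0.0, f=sigmoid))  # prints 0.5
```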
5. An illustration
    - X: the set of objects we intend to separate, for example a bunch of faces.
    - x: a single object in X, for example a single face.
    - f(x) = 1 or -1 for each x in X.
6. An illustration (figure: f applied to example images)
7. Neural Networks
    - In Clementine, the neural networks used are feedforward neural networks, also known as multilayer perceptrons.
    - The neurons in such networks (also called units) are arranged in layers.
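A feedforward pass can be sketched as the single-neuron rule applied one layer at a time, each layer feeding the next. This is an illustrative sketch, not Clementine's implementation: it assumes sigmoid activations in every layer and represents each layer as a hypothetical `(weights, biases)` pair with one row of weights per unit.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per unit in the layer
Layer = Tuple[Matrix, Vector]  # (weights, biases)


def sigmoid(net: float) -> float:
    return 1.0 / (1.0 + math.exp(-net))


def layer_forward(inputs: Vector, layer: Layer) -> Vector:
    """Every unit sees all outputs of the previous layer."""
    weights, biases = layer
    return [
        sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs: Vector, layers: List[Layer]) -> Vector:
    """Pass the input through the hidden layer(s) and then the output layer, in order."""
    activations = inputs
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations
```

For example, two inputs feeding three hidden units and one output unit need a hidden layer with a 3-by-2 weight matrix and an output layer with a 1-by-3 weight matrix.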
8. Stage 1: Data Understanding. The data file is located at: http://www.comp.polyu.edu.hk/~cswinnie/lab.html
9. Data Understanding
    - Given: the data file (DRUG1n).
    - Answer by yourself: (1) How many attributes are there? (2) How many records are there? (3) Are there any problems in the data?
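One way to check these answers programmatically, assuming the records have already been read into a list of dictionaries keyed by field name (for example with Python's `csv.DictReader`), is sketched below. `summarize` is a hypothetical helper; it only detects blank values, so other problems such as outliers or inconsistent codes still need inspection by eye.

```python
from typing import Dict, List, Optional


def summarize(rows: List[Dict[str, Optional[str]]]) -> Dict[str, object]:
    """Count attributes and records, and report fields that contain blank values."""
    fields = list(rows[0].keys()) if rows else []
    blanks = {
        field: sum(1 for row in rows if row.get(field) is None or row[field].strip() == "")
        for field in fields
    }
    return {
        "n_attributes": len(fields),
        "n_records": len(rows),
        # Only fields with at least one blank value are listed.
        "blank_values": {field: n for field, n in blanks.items() if n > 0},
    }
```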
10. Data Preparation. Task 1: Import the data into Clementine. Add node: Var. File (in Source Palette). Result: (screenshot)
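Outside Clementine, the import step done by the Var. File node can be sketched with Python's standard `csv` module. This assumes DRUG1n is a comma-delimited text file with a header row of field names; `load_records` is a hypothetical helper that turns numeric-looking values into `float` and keeps everything else as text.

```python
import csv
from typing import Dict, Iterable, List, Union

Value = Union[float, str]


def load_records(lines: Iterable[str], delimiter: str = ",") -> List[Dict[str, Value]]:
    """Read delimited text with a header row into one dict per record.

    Values that parse as numbers become floats; everything else stays as text.
    """
    records: List[Dict[str, Value]] = []
    for row in csv.DictReader(lines, delimiter=delimiter):
        record: Dict[str, Value] = {}
        for field, raw in row.items():
            if field is None:  # extra values beyond the header row are ignored
                continue
            text = (raw or "").strip()
            try:
                record[field] = float(text)
            except ValueError:
                record[field] = text
        records.append(record)
    return records
```

Assuming the file is saved locally as `DRUG1n`: `with open("DRUG1n", newline="") as fh: records = load_records(fh)`.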
11. Stage 2: Data Preparation
12. Data Preparation. Task 2: Derive a new field "Na_to_K" (the ratio of Na to K). Add node: Derive (in Field Ops Palette). Result: (screenshot)
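The Derive step computes a new field from existing ones. A minimal Python sketch, assuming each record is a dictionary with numeric `Na` and `K` values; returning `None` when `K` is 0 is a choice made here to avoid division by zero, not documented Clementine behaviour.

```python
from typing import Dict, List, Optional


def derive_na_to_k(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return copies of the records with a new field Na_to_K = Na / K."""
    derived = []
    for record in records:
        new = dict(record)  # leave the input records unchanged
        na, k = float(record["Na"]), float(record["K"])
        ratio: Optional[float] = na / k if k != 0 else None  # assumption: undefined when K is 0
        new["Na_to_K"] = ratio
        derived.append(new)
    return derived
```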
13. Data Preparation. Task 3: Discard the fields "Na" and "K". Add node: Filter (in Field Ops Palette). Result: (screenshot)
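Filtering out fields amounts to dropping keys from each record. `drop_fields` is a hypothetical helper assuming the same dictionary-per-record representation, with `Na` and `K` as the default fields to discard.

```python
from typing import Dict, Iterable, List


def drop_fields(
    records: List[Dict[str, object]], fields: Iterable[str] = ("Na", "K")
) -> List[Dict[str, object]]:
    """Return copies of the records without the given fields."""
    to_drop = set(fields)
    return [{f: v for f, v in record.items() if f not in to_drop} for record in records]
```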
14. Data Preparation. Task 4: Partition the dataset into a training set and a testing set (50/50). Add node: Partition (in Field Ops Palette).
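A 50/50 partition can be sketched by drawing one random number per record. This is an assumed approach, not a description of the Partition node's internals; `partition`, its `seed` parameter and the `Training`/`Testing` labels are hypothetical. Because each record is assigned independently, the two sets are only approximately equal in size.

```python
import random
from typing import Dict, List


def partition(
    records: List[Dict[str, object]], train_fraction: float = 0.5, seed: int = 1
) -> List[Dict[str, object]]:
    """Return copies of the records with a Partition field set to "Training" or "Testing"."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    tagged = []
    for record in records:
        new = dict(record)
        new["Partition"] = "Training" if rng.random() < train_fraction else "Testing"
        tagged.append(new)
    return tagged
```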
15. Data Preparation. Task 5: Select the training and testing sets. Add node: Select (in Record Ops Palette). In the example run shown, one set contains 95 records and the other 105.
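Selecting one partition is a simple record filter. `select_partition` is a hypothetical helper assuming each record carries a `Partition` field holding a label such as `Training` or `Testing`.

```python
from typing import Dict, List


def select_partition(records: List[Dict[str, object]], label: str) -> List[Dict[str, object]]:
    """Keep only the records whose Partition field equals `label`."""
    return [record for record in records if record.get("Partition") == label]
```

Usage: `train = select_partition(records, "Training")` and `test = select_partition(records, "Testing")`.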
16. Data Preparation. Task 6: (a) define and update the fields' values and types; (b) set the inputs (Age, Sex, BP, Cholesterol, Na_to_K) and the output (Drug). Add node: Type (in Field Ops Palette).
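Setting field roles can be sketched as splitting each record into an input vector and a target label. Neural networks need numeric inputs, so this sketch one-hot encodes text-valued (categorical) fields. The helper names, the encoding scheme, and the `categories` mapping are assumptions for illustration, not what the Type node does internally.

```python
from typing import Dict, List, Optional, Sequence, Tuple

INPUTS = ("Age", "Sex", "BP", "Cholesterol", "Na_to_K")
TARGET = "Drug"
Categories = Dict[str, Optional[List[str]]]  # None marks a numeric field


def infer_categories(records: List[Dict[str, object]], inputs: Sequence[str] = INPUTS) -> Categories:
    """Treat a field as numeric if every value is a number, otherwise as categorical."""
    categories: Categories = {}
    for field in inputs:
        values = [record[field] for record in records]
        if all(isinstance(v, (int, float)) for v in values):
            categories[field] = None
        else:
            categories[field] = sorted({str(v) for v in values})
    return categories


def encode(
    records: List[Dict[str, object]],
    categories: Categories,
    inputs: Sequence[str] = INPUTS,
    target: str = TARGET,
) -> Tuple[List[List[float]], List[str]]:
    """Build one numeric input vector per record (one-hot for categorical fields) plus target labels."""
    X: List[List[float]] = []
    y: List[str] = []
    for record in records:
        row: List[float] = []
        for field in inputs:
            levels = categories[field]
            if levels is None:
                row.append(float(record[field]))
            else:
                row.extend(1.0 if str(record[field]) == level else 0.0 for level in levels)
        X.append(row)
        y.append(str(record[target]))
    return X, y
```

Inferring the categories once from the training set and passing the same mapping when encoding the test set keeps the input columns identical for both sets.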
17. Stage 3: Data Mining with Neural Networks
18. Data Mining – Neural Networks. Goal: classification of the "Drug" attribute. Add node: Neural Net (in Modeling Palette). Result: (screenshot)
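To show what a neural network classifier does in principle, here is a minimal one-hidden-layer network trained by backpropagation with stochastic gradient descent in plain Python. It is a sketch under assumptions (sigmoid hidden units, softmax outputs, cross-entropy loss, small random initial weights), not Clementine's training algorithm; `TinyMLP` and its parameters are hypothetical. Inputs must be numeric (categorical fields encoded as numbers) and are best scaled to similar ranges.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(net: float) -> float:
    net = max(-60.0, min(60.0, net))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-net))


class TinyMLP:
    """One hidden layer of sigmoid units and one softmax output unit per class."""

    def __init__(self, n_inputs: int, n_hidden: int, classes: Sequence[str], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.classes = list(classes)
        # Small random initial weights, zero biases.
        self.w1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in self.classes]
        self.b2 = [0.0] * len(self.classes)

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = [_sigmoid(b + sum(w * xi for w, xi in zip(row, x))) for row, b in zip(self.w1, self.b1)]
        z = [b + sum(w * hj for w, hj in zip(row, h)) for row, b in zip(self.w2, self.b2)]
        top = max(z)
        e = [math.exp(v - top) for v in z]  # numerically stable softmax
        total = sum(e)
        return h, [v / total for v in e]

    def train_epoch(self, X: Sequence[Sequence[float]], y: Sequence[str], learning_rate: float = 0.1) -> None:
        """One pass of stochastic gradient descent over the training records in random order."""
        order = list(range(len(X)))
        self.rng.shuffle(order)
        for n in order:
            x = X[n]
            h, p = self._forward(x)
            t = self.classes.index(y[n])
            # Cross-entropy gradient at the softmax inputs: p - one_hot(target).
            d_out = [pk - (1.0 if k == t else 0.0) for k, pk in enumerate(p)]
            # Back-propagate through the (not yet updated) output weights and the sigmoid derivative.
            d_hid = [
                hj * (1.0 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                for j, hj in enumerate(h)
            ]
            for k, dk in enumerate(d_out):
                for j, hj in enumerate(h):
                    self.w2[k][j] -= learning_rate * dk * hj
                self.b2[k] -= learning_rate * dk
            for j, dj in enumerate(d_hid):
                for i, xi in enumerate(x):
                    self.w1[j][i] -= learning_rate * dj * xi
                self.b1[j] -= learning_rate * dj

    def predict(self, x: Sequence[float]) -> str:
        _, p = self._forward(x)
        return self.classes[p.index(max(p))]
```

Training for a number of epochs on the training set and then calling `predict` on each test-set input gives the predictions to validate.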
19. Data Mining – Neural Networks. Goal: validate the model with the test set. Result: (screenshot)
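Validating against the test set comes down to comparing predicted and actual drug labels. The sketch below computes overall accuracy and a count for each (actual, predicted) pair, which is a confusion matrix in sparse form; `evaluate` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def evaluate(actual: Sequence[str], predicted: Sequence[str]) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Return test-set accuracy and a count for each (actual, predicted) label pair."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need equally long, non-empty label sequences")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    confusion = Counter(zip(actual, predicted))  # confusion matrix as sparse counts
    return correct / len(actual), dict(confusion)
```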
20. Data Mining – Neural Networks. If the targets and inputs have already been set in the Type node, choose "Use type node settings"; otherwise, choose "Use custom settings" and select the targets and inputs yourself.
21. Data Mining – Neural Networks
    - Training methods: there are 6 training methods for building neural network models.
    - Random split: one option randomly splits the data into separate training and test sets for the purposes of model building.
    - Stopping criteria: by default, the network stops training when it appears to have reached its optimally trained state.
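The default stopping behaviour can be approximated by the common early-stopping technique: train on one part of the data, monitor accuracy on the held-out part, and stop once it no longer improves. This is a generic sketch, not Clementine's exact criterion; the `patience` and `max_epochs` defaults are arbitrary assumptions, and `on_improve` is a hypothetical hook where a caller could save a copy of the best weights.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],
    holdout_accuracy: Callable[[], float],
    max_epochs: int = 250,
    patience: int = 10,
    on_improve: Callable[[], None] = lambda: None,
) -> Tuple[int, float]:
    """Train until holdout accuracy has not improved for `patience` epochs in a row.

    Returns the epoch with the best holdout accuracy and that accuracy.
    """
    best_epoch, best_accuracy = 0, float("-inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        accuracy = holdout_accuracy()
        if accuracy > best_accuracy:
            best_epoch, best_accuracy = epoch, accuracy
            on_improve()  # e.g. save a copy of the current weights
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_accuracy
```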
22. Data Mining – Neural Networks. More advanced settings include:
    - the number of hidden layers and the number of nodes in each layer;
    - learning rates.
23. Data Mining. New task:
    1. Discover the rules for classifying drugs with C5.0.
    2. Determine the model's accuracy with the test set.
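A full C5.0 implementation is beyond a short example, but C5.0, like its predecessor C4.5, is generally described as choosing splits with the gain ratio criterion. The sketch below computes only that criterion for one categorical attribute, assuming dictionary records with a `Drug` target; it omits tree growth, pruning, numeric thresholds and rule extraction, and `gain_ratio` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(records: List[Dict[str, object]], attribute: str, target: str = "Drug") -> float:
    """Information gain of splitting on a categorical attribute, divided by the split information."""
    groups: Dict[object, List[str]] = {}
    for record in records:
        groups.setdefault(record[attribute], []).append(str(record[target]))
    n = len(records)
    labels = [label for group in groups.values() for label in group]
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

A tree learner using this criterion would compare candidate attributes such as BP, Sex and Cholesterol at each node and prefer those with a high gain ratio.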
24. Summary. Today, you have learnt to:
    - derive new attributes (revision);
    - discard useless fields;
    - partition the data into training and test sets;
    - build neural network models;
    - validate a model with a test set.