# Introduction to data mining with Weka by OPEN MINER

www.open-miner.com

Published in: Technology

An Introduction to Data Mining with Weka by Open Miner (www.open-miner.com), Part 0: About Us
Instructors: Japan Advanced Institute of Science and Technology; Computer Engineering; Computer Engineering. Email: siriwont@gmail.com, openminer@gmail.com
Course Outline, 1st day: Introduction to data mining; Introduction to Weka; Preprocess; Regression & Classification Techniques (Linear Regression, Decision tree)
Course Outline (cont'd), 2nd day: Regression & Classification Techniques (K-nearest neighbors, Neural Networks, Support Vector Machines (SVM)); Clustering; Association rule discovery; Java + Weka; PHP + Weka; Knowledge Flow
Part I: Introduction to Data Mining
What is data mining? "The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules" (Data Mining Techniques, 2nd Edition). "Extraction of interesting (non-trivial, previously unknown and potentially useful) information from data in large databases" (Data Mining: Concepts and Techniques, 2nd Edition).
Loyalty Cards
Loyalty Cards (2): Tesco Lotus, Club card (08/2552); BigC, BigCard (09/2552); Carrefour, I wish (2550); TOPS, SPOT (~2548). Personal, positioning, shopping.
Summary: social networks (Facebook, Twitter); protein sequences, genes; data mining
Part II: Cross-Industry Standard Process for Data Mining
CRISP-DM: CRoss-Industry Standard Process for Data Mining (CRISP-DM); DaimlerChrysler, SPSS, NCR; data mining workflow in 6 phases
Data Mining Workflow (http://openminer.com/2009/11/03/introduction-datamining/): Business Understanding + Data Understanding + Data Preparation = 80%
CRISP Example (http://www.nectec.or.th/NTJ/No11/No11.php)
CRISP Example (5)

| Stu_code | Sex | Address | GPA |
|---|---|---|---|
| 37058063 | Male | Bangkok | 2.3 |
| 37058167 | Male | Songkla | 3.2 |

2535-2542; 10,000; 476,085

| Stu_code | Sub_code … | Grade |
|---|---|---|
| 37058063 | … | C+ |
| 37058063 | … | D |

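CRISP Example (6)

Old:

| Stu_code | Sex | Address | GPA |
|---|---|---|---|
| 37058063 | Male | Bangkok | 2.3 |
| 37058167 | Male | Songkla | 3.2 |
| … | … | … | … |

| Stu_code | Sub_code … | Grade |
|---|---|---|
| 37058063 | … | C+ |
| 37058063 | … | D |

New:

| Stu_code | Sex | Address | GPA (New) |
|---|---|---|---|
| 37058063 | Male | Bangkok | BAD |
| 37058167 | Male | Songkla | GOOD |
| … | … | … | … |

| Stu_code | Sub_code … | Grade |
|---|---|---|
| 37058063 | … | Medium |
| 37058063 | … | Low |
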
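
The Old/New example replaces exact values with coarse categories before mining. A minimal Java sketch of that kind of recoding follows. The slides only show 2.3 becoming BAD, 3.2 becoming GOOD, C+ becoming Medium and D becoming Low; the 2.5 cutoff, the "High" level and every other grade mapping below are assumptions made for illustration, and the class name `GradeBinning` is hypothetical.

```java
import java.util.Map;

// Sketch of the Old -> New data preparation step: exact values are
// replaced by coarse categories. Cutoffs are illustrative assumptions.
class GradeBinning {
    // ASSUMPTION: 2.5 is a made-up cutoff that merely agrees with the
    // two rows shown on the slide (2.3 -> BAD, 3.2 -> GOOD).
    static String gpaLabel(double gpa) {
        return gpa < 2.5 ? "BAD" : "GOOD";
    }

    // ASSUMPTION: only C+ -> Medium and D -> Low come from the slide;
    // the remaining entries (including the "High" level) are guesses.
    static final Map<String, String> GRADE_LEVEL = Map.of(
        "A", "High", "B+", "High", "B", "High",
        "C+", "Medium", "C", "Medium",
        "D+", "Low", "D", "Low", "F", "Low");

    static String gradeLevel(String letter) {
        String level = GRADE_LEVEL.get(letter);
        if (level == null) {
            throw new IllegalArgumentException("Unknown grade: " + letter);
        }
        return level;
    }

    public static void main(String[] args) {
        System.out.println(gpaLabel(2.3) + " " + gpaLabel(3.2));       // prints "BAD GOOD"
        System.out.println(gradeLevel("C+") + " " + gradeLevel("D"));  // prints "Medium Low"
    }
}
```
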
Data Mining Concepts and Techniques: supervised learning (Classification, Regression); unsupervised learning (Clustering, Association)
Data Mining Software: commercial software (SAS® Enterprise Miner, Microsoft SQL Server 2008, DB2 Intelligent Miner); open source software or freeware (Weka, RapidMiner, KNIME (Konstanz Information Miner))
Data Mining Software (2): Weka
Part III: Introduction to Weka
What is Weka? Weka (Waikato Environment for Knowledge Analysis) is open source data mining software written in Java; it runs on Windows, Linux, and Mac OS. Website: http://www.cs.waikato.ac.nz/ml/weka/
Download Weka: http://www.cs.waikato.ac.nz/ml/weka/ (Java is required)
Weka Explorer (screenshot): tabs for data mining tasks, workspace, status bar, log
Part IV: Preprocess
Agenda: instances and attributes; loading data into Weka (CSV, ARFF); preprocessing data in Weka; outliers
Load data into Weka (2): input to Weka can come from a database, the Internet, or a generated file, as CSV or ARFF data
Lab 4-1: Generate CSV file. Create a CSV file for Weka in Excel. Note: in the sex attribute, 0 = Female, 1 = Male, 2 = Others. File: customers.csv
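
To show how a CSV file like the one in this lab relates to the ARFF format, here is a small Java sketch that turns CSV text into ARFF text. It is an illustration only and not how Weka itself does the conversion. The columns `age` and `income` are hypothetical (the slide only names the `sex` attribute), and the sketch assumes every column is numeric unless a nominal value set is supplied for it.

```java
import java.util.Map;

// Sketch: convert CSV text (header row + data rows) into ARFF text.
class CsvToArff {
    // nominal maps a column name to its ARFF value set, e.g. "sex" -> "{0,1,2}".
    // ASSUMPTION: any column not listed in nominal is treated as numeric.
    static String convert(String relation, String csv, Map<String, String> nominal) {
        String[] lines = csv.trim().split("\\R");
        StringBuilder arff = new StringBuilder("@relation " + relation + "\n\n");
        for (String name : lines[0].split(",")) {
            String type = nominal.getOrDefault(name.trim(), "numeric");
            arff.append("@attribute ").append(name.trim()).append(' ').append(type).append('\n');
        }
        arff.append("\n@data\n");
        for (int i = 1; i < lines.length; i++) {
            arff.append(lines[i].trim()).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        // Hypothetical rows; 0 = Female, 1 = Male, 2 = Others as in the lab note.
        String csv = "age,sex,income\n25,0,18000\n41,1,32000\n";
        System.out.print(convert("customers", csv, Map.of("sex", "{0,1,2}")));
    }
}
```

Declaring `sex` as the nominal set `{0,1,2}` matters because otherwise the codes 0, 1 and 2 would be read as numbers rather than categories.
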
Weka & MySQL (4): URL: jdbc:mysql://localhost:3306/weka_course, i.e. the database server URL & port (localhost:3306) followed by the database name (weka_course). Click User.
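
The structure of the URL on this slide (server, port, database name) can be illustrated with a short Java sketch. The class and method names are hypothetical, and the parsing simply drops the `jdbc:` prefix and reads the remainder as an ordinary URI. That works for this simple form, but is not a general JDBC URL parser.

```java
import java.net.URI;

// Sketch: build and take apart a MySQL JDBC URL of the form used on the slide.
class JdbcUrl {
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Returns {host, port, database}. Assumes the simple form host:port/database.
    static String[] parts(String jdbcUrl) {
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        return new String[] {
            uri.getHost(), String.valueOf(uri.getPort()), uri.getPath().substring(1)
        };
    }

    public static void main(String[] args) {
        String url = mysqlUrl("localhost", 3306, "weka_course");
        System.out.println(url);                             // prints "jdbc:mysql://localhost:3306/weka_course"
        System.out.println(String.join(" | ", parts(url)));  // prints "localhost | 3306 | weka_course"
    }
}
```
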
Replace missing values in Weka: Choose > filters > unsupervised > attribute > ReplaceMissingValues, then Apply
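
Weka's ReplaceMissingValues filter is documented as replacing missing values with the mean (numeric attributes) or mode (nominal attributes) of the data. The Java sketch below illustrates the numeric case only. It is not Weka's code, and marking a missing value with `Double.NaN` is an assumption made for this example.

```java
import java.util.Arrays;

// Sketch: mean imputation for one numeric column.
class MeanImputation {
    // Returns a copy of column in which every NaN (our marker for "missing")
    // is replaced by the mean of the non-missing values.
    static double[] fillWithMean(double[] column) {
        double sum = 0;
        int count = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        double mean = count == 0 ? Double.NaN : sum / count;
        double[] filled = column.clone();
        for (int i = 0; i < filled.length; i++) {
            if (Double.isNaN(filled[i])) {
                filled[i] = mean;
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        double[] age = {20, Double.NaN, 40};  // hypothetical column with one missing value
        System.out.println(Arrays.toString(fillWithMean(age)));  // prints "[20.0, 30.0, 40.0]"
    }
}
```
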
Part V: Regression & Classification
Agenda: regression and data classification; training and testing; Linear Regression; Decision tree; K-nearest neighbors; Neural Network; Support Vector Machines (SVM)
What is classification?
Example: Classification (3): build a model from training data with known classes; Decision tree model (Tree); evaluate the model
Example: Classification (4): unseen data (class unknown): 134.86, 96.01, 158.83, ?
Classification Steps (3): 1. classification model building (training data); 2. evaluation (evaluate data); 3. classifying unseen data
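
The workflow on this slide (build a model from training data, evaluate it on data with known classes, then classify unseen data) can be sketched in plain Java. The example uses a 1-nearest-neighbour classifier, the simplest case of the K-nearest neighbors technique from the course outline; it does not use Weka's own classes. All data points and the class labels A and B are made up for illustration.

```java
// Sketch of the three classification steps with a 1-nearest-neighbour model.
class NearestNeighborDemo {
    private final double[][] trainX;
    private final String[] trainY;

    // Step 1: "building" a 1-nearest-neighbour model just stores the training data.
    NearestNeighborDemo(double[][] trainX, String[] trainY) {
        this.trainX = trainX;
        this.trainY = trainY;
    }

    // Returns the class of the closest training example (squared Euclidean distance).
    String classify(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double dist = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = trainX[i][j] - x[j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return trainY[best];
    }

    // Step 2: fraction of evaluation examples whose predicted class matches the known class.
    double accuracy(double[][] evalX, String[] evalY) {
        int correct = 0;
        for (int i = 0; i < evalX.length; i++) {
            if (classify(evalX[i]).equals(evalY[i])) {
                correct++;
            }
        }
        return (double) correct / evalX.length;
    }

    public static void main(String[] args) {
        // Hypothetical training and evaluation data.
        double[][] trainX = {{1, 1}, {1, 2}, {5, 5}, {6, 5}};
        String[] trainY = {"A", "A", "B", "B"};
        NearestNeighborDemo model = new NearestNeighborDemo(trainX, trainY);

        double[][] evalX = {{2, 1}, {5, 6}};
        String[] evalY = {"A", "B"};
        System.out.println("accuracy = " + model.accuracy(evalX, evalY));

        // Step 3: classify an unseen example whose class is unknown.
        System.out.println("unseen -> " + model.classify(new double[] {1.5, 1.5}));
    }
}
```

Keeping the evaluation data separate from the training data is the point of step 2: accuracy measured on the training examples themselves would always look perfect for this kind of model.
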
Classification in Weka (cont'd): Classify tab
1: Classifier (classification): Bayes (probability); Functions; Lazy
Lab 5-2: German Credit Card. Business Understanding: decision support system. Data Understanding: 600. File: GermanCreditBalance.arff
Part VI: Clustering
Segmentation: geographic; demographic; behavior
Data clustering: clusters, unsupervised learning
Clustering in Weka (cont'd): Cluster tab
Example 1: Clustering bank data (bank)

Columns: id, age, sex, region, income, married, children, save_act, current_act, mortgage, car

ID12101 FEMALE INNER_CITY NO NO NO NO NO
ID12102 MALE TOWN YES YES YES NO NO
ID12103 FEMALE INNER_CITY YES NO NO NO YES
ID12104 FEMALE TOWN YES YES NO NO NO
ID12105 FEMALE RURAL YES YES YES NO YES
ID12106 FEMALE TOWN YES YES YES NO NO
ID12107 MALE RURAL NO NO NO YES NO
ID12108 MALE TOWN YES YES YES NO YES
ID12109 FEMALE SUBURBAN YES YES NO NO NO
ID12110 MALE TOWN YES YES YES NO NO

Part VII: Association Rules
Market Basket Analysis (supermarket)
Data from point-of-sale (4)

POS database:

| Transaction-time | Product |
|---|---|
| 01-13-2009 20:04 | Apple |
| 01-13-2009 20:04 | Beer |
| 01-13-2009 20:04 | Cereal |
| 01-13-2009 20:04 | Diapers |
| 01-14-2009 11:30 | Beer |
| 01-14-2009 11:30 | Diapers |
| 01-14-2009 11:30 | Apple |
| 01-14-2009 11:30 | Eggs |
| 01-15-2009 14:15 | Beer |
| 01-15-2009 14:15 | Eggs |

Transaction database:

| TID | Product |
|---|---|
| 1 | Apple, Beer, Cereal, Diapers |
| 2 | Apple, Beer, Diapers, Eggs |
| 3 | Beer, Eggs |

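
The step from the POS database to the transaction database is a grouping of rows by transaction time. The Java sketch below does that grouping with the rows from this slide and then computes support and confidence, the two standard measures for association rules. The slides do not define these measures at this point, so treat the formulas here as the usual textbook definitions rather than the course's own wording. The class name `MarketBasket` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: POS rows -> transactions -> support / confidence.
class MarketBasket {
    // Groups rows {transaction-time, product} into one sorted item set per time,
    // keeping transactions in first-seen order.
    static List<Set<String>> toTransactions(String[][] posRows) {
        Map<String, Set<String>> byTime = new LinkedHashMap<>();
        for (String[] row : posRows) {
            byTime.computeIfAbsent(row[0], k -> new TreeSet<>()).add(row[1]);
        }
        return new ArrayList<>(byTime.values());
    }

    // support(items) = fraction of transactions containing all the items.
    static double support(List<Set<String>> txs, Set<String> items) {
        long hits = txs.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / txs.size();
    }

    // confidence(lhs => rhs) = support(lhs and rhs together) / support(lhs).
    static double confidence(List<Set<String>> txs, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new TreeSet<>(lhs);
        both.addAll(rhs);
        return support(txs, both) / support(txs, lhs);
    }

    public static void main(String[] args) {
        String[][] pos = {
            {"01-13-2009 20:04", "Apple"}, {"01-13-2009 20:04", "Beer"},
            {"01-13-2009 20:04", "Cereal"}, {"01-13-2009 20:04", "Diapers"},
            {"01-14-2009 11:30", "Beer"}, {"01-14-2009 11:30", "Diapers"},
            {"01-14-2009 11:30", "Apple"}, {"01-14-2009 11:30", "Eggs"},
            {"01-15-2009 14:15", "Beer"}, {"01-15-2009 14:15", "Eggs"}
        };
        List<Set<String>> txs = toTransactions(pos);
        System.out.println(txs);
        System.out.println(support(txs, Set.of("Beer", "Diapers")));
        System.out.println(confidence(txs, Set.of("Diapers"), Set.of("Beer")));
    }
}
```

With the three transactions from the slide, {Beer, Diapers} appears in 2 of 3 baskets (support 2/3), and every basket with Diapers also contains Beer, so the rule Diapers => Beer has confidence 1.
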
Association Rules in Weka (cont'd): Associate tab
Lab 7-1: Market Basket. Business Understanding: CRM. Data Understanding: 1,000; 2 ……; ……. File: supermarket_basket_transactions_2005.arff
Part VIII: Command Line & System Integration
Weka in command line: Run > cmd (DOS prompt)
Lab 8-2: Weka in Java Program (2). Compile: javac -classpath "C:\Program Files\Weka-3-6\weka.jar" testClassifier.java. Run: java -classpath "C:\Program Files\Weka-3-6\weka.jar;." testClassifier
Part IX: Knowledge Flow
Weka KnowledgeFlow: component-based workflows in Weka
Example: Knowledge flow (7): add a TextViewer component (Visualization) to the layout, then connect ClassifierPerformanceEvaluator to TextViewer with a text connection
Contact Us: E-mail: siriwont@gmail.com; Website: http://www.open-miner.com; Google Buzz: http://www.google.com/profiles/openminer