SlideShare a Scribd company logo
1 of 16
Download to read offline
A BRIEFTUTORIAL ON	

DATA MINING
Xiaming Chen, OMNI-Lab	

2014-07
OUTLINE
• Whats Data Mining?	

• A Hands-on Practice
2
WHATS DATA MINING
WHATS DATA MINING
• Science: probability, statistics, graph theory etc.	

• Techniques: clustering, classification, regression,
prediction etc.	

• A way to think about this world.
On textbooks
4
WHATS DATA MINING
• Science? Maybe
In reality
Research on Social Networks
5
WHATS DATA MINING
• Prediction? Yes!
In reality
The Highest Creature	

Intelligence (100%)
Anti-Prediction
6
US Election,	

Bayes Selection!
WHATS DATA MINING
• The world, thinking? Spying!
In reality
7
“Illegal SPYING below!”
WHATS DATA MINING
• You Need, You Learn, You Expert
8
Insights Thinking Programming
HANDS-ON PRACTICE
HANDS-ON PRACTICE
• Tools to Facilitate Your Data Analysis
• Commercial
• SAS
• IBM SPSS
• Matlab etc.
• Free/Open Source
• RapidMiner + Weka
• R (my favor)
• Python + SciPy + scikit-learn
• Hadoop/Spark etc.
10
HANDS-ON PRACTICE
• Example: RapidMiner + StoneFlakes
http://archive.ics.uci.edu/ml/datasets/StoneFlakes
11
HANDS-ON PRACTICE
• RapidMiner (ads-free)
• A Java-based IDE for ML, data mining, text mining etc.
• Modular design, graphic interface, zero-line coding
• Complete Process logic: data ETL, visualization, modeling,
prediction, reports etc.
• Growing extension market
• CLI and API for other programs
• Call functions of Weka and R
Download: http://www.rapidminer.com/12
HANDS-ON PRACTICE
• StoneFlakes
• StoneFlakes.csv: flake
attribute information
• annotation.csv:
inventory properties
Formated: http://io.hsiamin.com/data/StoneFlakes.tar.gz
13
HANDS-ON PRACTICE
• Demo
14
SUMMER COURSE
• Spatial-temporal Data Analysis
• 郑宇,MSR
• 7.1 ~ 31, 2014
• 周⼆二、四下午2:00 ~ 5:40
• 闵⾏行上院316
15
http://www.hsiamin.com
Thanks	

caesar0301@github
16

More Related Content

Viewers also liked

Data Curation @ SpazioDati - NEXA Lunch Seminar
Data Curation @ SpazioDati - NEXA Lunch SeminarData Curation @ SpazioDati - NEXA Lunch Seminar
Data Curation @ SpazioDati - NEXA Lunch SeminarSpazioDati
 
Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.SlideTeam.net
 
Data Mining Scoring Engine development process
Data Mining Scoring Engine development processData Mining Scoring Engine development process
Data Mining Scoring Engine development processDylan Wan
 
Why Python (for Statisticians)
Why Python (for Statisticians)Why Python (for Statisticians)
Why Python (for Statisticians)Matt Harrison
 
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSEYandex
 
Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?Wil van der Aalst
 
Improving Healthcare Operations Using Process Data Mining
Improving Healthcare Operations Using Process Data Mining Improving Healthcare Operations Using Process Data Mining
Improving Healthcare Operations Using Process Data Mining Splunk
 
Data mining process powerpoint presentation templates
Data mining process powerpoint presentation templatesData mining process powerpoint presentation templates
Data mining process powerpoint presentation templatesSlideTeam.net
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
Process Mining: Data Enabling Organisational Change
Process Mining: Data Enabling Organisational ChangeProcess Mining: Data Enabling Organisational Change
Process Mining: Data Enabling Organisational ChangeZbigniew Paszkiewicz
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data miningNeeda Multani
 

Viewers also liked (11)

Data Curation @ SpazioDati - NEXA Lunch Seminar
Data Curation @ SpazioDati - NEXA Lunch SeminarData Curation @ SpazioDati - NEXA Lunch Seminar
Data Curation @ SpazioDati - NEXA Lunch Seminar
 
Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.
 
Data Mining Scoring Engine development process
Data Mining Scoring Engine development processData Mining Scoring Engine development process
Data Mining Scoring Engine development process
 
Why Python (for Statisticians)
Why Python (for Statisticians)Why Python (for Statisticians)
Why Python (for Statisticians)
 
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 
Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?Event Logs: What kind of data does process mining require?
Event Logs: What kind of data does process mining require?
 
Improving Healthcare Operations Using Process Data Mining
Improving Healthcare Operations Using Process Data Mining Improving Healthcare Operations Using Process Data Mining
Improving Healthcare Operations Using Process Data Mining
 
Data mining process powerpoint presentation templates
Data mining process powerpoint presentation templatesData mining process powerpoint presentation templates
Data mining process powerpoint presentation templates
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Process Mining: Data Enabling Organisational Change
Process Mining: Data Enabling Organisational ChangeProcess Mining: Data Enabling Organisational Change
Process Mining: Data Enabling Organisational Change
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data mining
 

Similar to A Brief Tutorial On Data Mining-20140701

datamining_Lecture_1(introduction).pptx
datamining_Lecture_1(introduction).pptxdatamining_Lecture_1(introduction).pptx
datamining_Lecture_1(introduction).pptxHASHEMHASH
 
Mining the Social Web for Fun & Profit Within Your Organization
Mining the Social Web for Fun & Profit Within Your OrganizationMining the Social Web for Fun & Profit Within Your Organization
Mining the Social Web for Fun & Profit Within Your OrganizationDigital Reasoning
 
Privacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social WebPrivacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social WebMatthew Russell
 
Getting started in data science (4:3)
Getting started in data science (4:3)Getting started in data science (4:3)
Getting started in data science (4:3)Thinkful
 
Getting started in data science (4:3)
Getting started in data science (4:3)Getting started in data science (4:3)
Getting started in data science (4:3)Thinkful
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningPier Luca Lanzi
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Onlinesfdatascience
 
Real-World Behavior Analysis through a Social Media Lens
Real-World Behavior Analysis through a Social Media LensReal-World Behavior Analysis through a Social Media Lens
Real-World Behavior Analysis through a Social Media LensAli Abbasi
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkVivian S. Zhang
 
Dia sds2015 web version
Dia sds2015 web versionDia sds2015 web version
Dia sds2015 web versionMichael Brodie
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesUpXAcademy
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Robert Williams
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...jybufgofasfbkpoovh
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data ScienceThinkful
 
Ethics and ux ux sofia nov 2018
Ethics and ux ux sofia nov 2018Ethics and ux ux sofia nov 2018
Ethics and ux ux sofia nov 2018Eric Reiss
 
Data Infrastructure for Your Retail Digital Strategy
Data Infrastructure for Your Retail Digital StrategyData Infrastructure for Your Retail Digital Strategy
Data Infrastructure for Your Retail Digital StrategyAtif Shaikh
 

Similar to A Brief Tutorial On Data Mining-20140701 (20)

Waves keynote2c
Waves keynote2cWaves keynote2c
Waves keynote2c
 
datamining_Lecture_1(introduction).pptx
datamining_Lecture_1(introduction).pptxdatamining_Lecture_1(introduction).pptx
datamining_Lecture_1(introduction).pptx
 
Mining the Social Web for Fun & Profit Within Your Organization
Mining the Social Web for Fun & Profit Within Your OrganizationMining the Social Web for Fun & Profit Within Your Organization
Mining the Social Web for Fun & Profit Within Your Organization
 
Privacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social WebPrivacy, Ethics, and Future Uses of the Social Web
Privacy, Ethics, and Future Uses of the Social Web
 
Getting started in data science (4:3)
Getting started in data science (4:3)Getting started in data science (4:3)
Getting started in data science (4:3)
 
Getting started in data science (4:3)
Getting started in data science (4:3)Getting started in data science (4:3)
Getting started in data science (4:3)
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data Mining
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
 
Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...
Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...
Claudia Bauzer Medeiros - Open Science meets Data Science: Some challenges to...
 
Real-World Behavior Analysis through a Social Media Lens
Real-World Behavior Analysis through a Social Media LensReal-World Behavior Analysis through a Social Media Lens
Real-World Behavior Analysis through a Social Media Lens
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
Dia sds2015 web version
Dia sds2015 web versionDia sds2015 web version
Dia sds2015 web version
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data Science
 
Ethics and ux ux sofia nov 2018
Ethics and ux ux sofia nov 2018Ethics and ux ux sofia nov 2018
Ethics and ux ux sofia nov 2018
 
Data Infrastructure for Your Retail Digital Strategy
Data Infrastructure for Your Retail Digital StrategyData Infrastructure for Your Retail Digital Strategy
Data Infrastructure for Your Retail Digital Strategy
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

A Brief Tutorial On Data Mining-20140701