/faculteit technologie management
Introduction to Data Mining
a.j.m.m. (ton) weijters
(slides are partially based on an introduction by Gregory
Piatetsky-Shapiro)
Overview
• Why data mining (data cascade)
• Application examples
• Data Mining & Knowledge Discovery
• Data Mining versus Process Mining
Why Data Mining
• Cascade of data
– Growth rates differ, but about 30% per year is a
conservative estimate
• The possibility to use computers to analyze data
– 1975: one computer for the whole university (mainframe)
with 1 MB of working memory; now: a PC with 512 MB of
working memory
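To make the 30% figure concrete, here is a minimal sketch of the compound-growth arithmetic. Only the 30% yearly rate comes from the slide; the function names and the 10-year horizon are illustrative assumptions.

```python
import math

def doubling_time(rate: float) -> float:
    """Years needed for a quantity to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)

def growth_factor(rate: float, years: int) -> float:
    """Factor by which a quantity grows after `years` of compound growth."""
    return (1 + rate) ** years

# 30% per year is the slide's estimate; the 10-year horizon is an assumption.
print(f"doubling time at 30%/year: {doubling_time(0.30):.2f} years")
print(f"growth after 10 years: x{growth_factor(0.30, 10):.1f}")
```

At this rate the data volume doubles roughly every 2.6 years and grows almost 14-fold in a decade.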
Cascade of data
• Business and government systems (transaction
systems, ERP systems, Workflow systems, ...)
• Scientific data: astronomy, biology, etc.
• Web, text, and e-commerce (new regularities, about
data storage to prevent attempts)
• Hospitals, internal revenue service
• ...
Examples of large databases
• AT&T handles billions of calls per day
– so much data that it cannot all be stored; analysis has to
be done “on the fly”
• Europe's Very Long Baseline Interferometry
(VLBI) has 16 telescopes, each of which
produces 1 Gigabit/second of astronomical
data over a 25-day observation session
• Google
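The VLBI figures can be turned into a rough total volume. The sketch below assumes every telescope records continuously for the whole session and uses decimal units (1 petabyte = 10^15 bytes); both are assumptions, not statements from the slide.

```python
TELESCOPES = 16
BITS_PER_SECOND = 1e9   # 1 Gigabit/second per telescope
SESSION_DAYS = 25

def session_bytes(telescopes: int, bits_per_second: float, days: int) -> float:
    """Total bytes produced, assuming uninterrupted recording (an assumption)."""
    seconds = days * 24 * 60 * 60
    return telescopes * bits_per_second * seconds / 8

total = session_bytes(TELESCOPES, BITS_PER_SECOND, SESSION_DAYS)
print(f"{total / 1e15:.2f} petabytes per session")
```

Under these assumptions one observation session yields about 4.3 petabytes.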
First conclusion
• Very little data will ever be looked at by a human
• Data Mining algorithms and computers are
NEEDED to make sense of the data and to use it.
Application examples I
• Customer Relationship Management (CRM)
– Based on a database with client information and
behavior, try to select other potential consumers of a
product.
– Euro miles.
• Profiling tax cheaters
– Based on the profile of the taxpayer and some figures
from the (electronic) tax form, try to predict tax
cheating.
Application examples II
• Health care
– Given the patient profile and the diagnoses, try to
predict the number of hospital days. This information is
used in the planning system.
• Industry
– Job shop planning. Based on already accepted jobs,
try to predict the delivery time of a newly offered job.
Types of applications
• Classification (supervised)
– Credit risk: the result of data mining is a set of rules that can be used
to classify new clients as high, normal, or low risk
• Estimation (supervised)
– Credit risk: output is not a classification but a number
between -1 and 1 to indicate risk (-1.0 very low, 0.0 normal,
+1.0 very high)
• Clustering (unsupervised)
• Associations: e.g. Beer & Chips & Peanuts occur
together frequently in one person's shopping list
• Visualization: to facilitate human discovery
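The association example can be sketched as a support count over shopping baskets. The baskets below are made-up toy data, and `frequent_itemsets` is a hypothetical helper that counts every item combination by brute force; real association miners prune this search.

```python
from collections import Counter
from itertools import combinations

# Toy shopping lists, invented for illustration only.
baskets = [
    {"beer", "chips", "peanuts"},
    {"beer", "chips", "peanuts", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "peanuts", "chips", "milk"},
]

def frequent_itemsets(baskets, size, min_support):
    """Return itemsets of `size` items whose support (share of baskets) is high enough."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each itemset one canonical tuple form.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    n = len(baskets)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

result = frequent_itemsets(baskets, size=3, min_support=0.5)
print(result)
```

With this toy data only the beer, chips, and peanuts combination reaches the 50% support threshold (it appears in 3 of 5 baskets).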
Supervised versus unsupervised
• Supervised (Credit risk)
– Starting point is a historical database with each client's
information and financial data, including the known credit
history (the classification). This database is used to
induce credit risk rules.
• Unsupervised (Clustering)
– Try to cluster customers into similar groups (open questions: how
many groups, and similar in which sense?)
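The contrast can be sketched on toy numbers. The supervised part induces a one-threshold rule from labeled clients; the unsupervised part runs a simple one-dimensional k-means that never sees a label. All data, the single `income` attribute, and both function names are illustrative assumptions, not the techniques used in the course.

```python
def learn_threshold(examples):
    """Supervised: induce the rule 'income < threshold -> high risk' from labeled data."""
    best_threshold, best_correct = None, -1
    for threshold, _ in examples:
        # Count the examples this candidate rule classifies correctly.
        correct = sum(
            (label == "high") == (income < threshold) for income, label in examples
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: group values around the given starting centers, using no labels."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Labeled history (income in thousands, known risk class): made-up data.
history = [(15, "high"), (22, "high"), (28, "high"), (35, "low"), (48, "low"), (60, "low")]
threshold = learn_threshold(history)

# Unlabeled yearly spending per customer: made-up data.
spending = [12, 15, 18, 80, 85, 95]
centers, clusters = kmeans_1d(spending, centers=[0, 100])
print(threshold, centers, clusters)
```

Note that the clustering call has to be told how many groups to look for (two starting centers) and what "similar" means (absolute difference), which are exactly the open questions raised on the slide.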
E-commerce – Case Study
• A person buys a book (product) at Amazon.com.
• Task: Recommend other books (products) this
person is likely to buy
• Amazon does clustering based on books bought:
– customers who bought “Advances in Knowledge Discovery and
Data Mining”, also bought “Data Mining: Practical Machine
Learning Tools and Techniques with Java Implementations”
• Recommendation program is quite successful
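A "customers who bought X also bought Y" list can be approximated by counting co-purchases. This is a plain co-occurrence count, swapped in for illustration; it is not Amazon's actual method. The order data, the customer IDs, and the third book are hypothetical.

```python
from collections import Counter

KDD = "Advances in Knowledge Discovery and Data Mining"
ML_TOOLS = "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations"
OTHER = "Some Other Book"  # hypothetical title

# Hypothetical purchase histories per customer.
orders = {
    "customer_1": {KDD, ML_TOOLS, OTHER},
    "customer_2": {KDD, ML_TOOLS},
    "customer_3": {OTHER},
}

def recommend(orders, bought, top_n=3):
    """Rank other items by how many customers bought them together with `bought`."""
    counts = Counter()
    for items in orders.values():
        if bought in items:
            counts.update(item for item in items if item != bought)
    return [item for item, _ in counts.most_common(top_n)]

print(recommend(orders, KDD, top_n=1))
```

In this toy data the top recommendation for a buyer of the first book is the second one, because two customers bought both.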
Hands-on-project I
• Historical consumer data
– Age, education, sex, relationship,
etc.
– Income
• Model to predict income above
50K
• Use the model to select
consumers for direct mailing
Problems Suitable for Data-Mining
• have sub-optimal current methods
• have accessible, sufficient, and relevant data
• provide a high payoff for the right decisions!
• (have a changing environment)
Knowledge Discovery Definition
Knowledge Discovery in Data is the
non-trivial process of identifying
– valid
– novel
– potentially useful
– and ultimately understandable patterns in data.
from Advances in Knowledge Discovery and Data
Mining, Fayyad, Piatetsky-Shapiro, Smyth, and
Uthurusamy, (Chapter 1), AAAI/MIT Press 1996
Related Fields
[Diagram relating Data Mining and Knowledge Discovery to Statistics, Machine Learning, Databases, and Visualization]
Statistics, Machine Learning and
Data Mining
• Statistics:
– more theory-based
– more focused on testing hypotheses
• Machine Learning
– more heuristic than theory-based
– focused on improving the performance of learning algorithms
• Data Mining and Knowledge Discovery
– Data Mining: one step in the Knowledge Discovery process (applying
the Machine Learning algorithm)
– Knowledge Discovery: the whole process, including data cleaning,
learning, and integration and visualization of results
• Distinctions are fuzzy
Knowledge Discovery Process
flow, according to CRISP-DM
• Business Understanding + Data Understanding + Data Preparation: 80% of the time
• Modeling (applying mining algorithm): 20%
• Monitoring
Phases and Tasks
• Business Understanding
– Determine Business Objectives: Background; Business Objectives; Business Success Criteria
– Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
– Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
– Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
• Data Understanding
– Collect Initial Data: Initial Data Collection Report
– Describe Data: Data Description Report
– Explore Data: Data Exploration Report
– Verify Data Quality: Data Quality Report
• Data Preparation
– Phase outputs: Data Set; Data Set Description
– Select Data: Rationale for Inclusion / Exclusion
– Clean Data: Data Cleaning Report
– Construct Data: Derived Attributes; Generated Records
– Integrate Data: Merged Data
– Format Data: Reformatted Data
• Modeling
– Select Modeling Technique: Modeling Technique; Modeling Assumptions
– Generate Test Design: Test Design
– Build Model: Parameter Settings; Models; Model Description
– Assess Model: Model Assessment; Revised Parameter Settings
• Evaluation
– Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
– Review Process: Review of Process
– Determine Next Steps: List of Possible Actions; Decision
• Deployment
– Plan Deployment: Deployment Plan
– Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
– Produce Final Report: Final Report; Final Presentation
– Review Project: Experience Documentation
Other related fields
• Data warehouse
– A data warehouse does not simply contain data accumulated
at a central point; the data is carefully assembled
from a variety of information sources around the
organization, cleaned up, quality assured, and then released
(published).
• Business Intelligence (BI)
– The use of data in the data warehouse to provide
managers with important information
Data Mining versus Process Mining
• Process Mining is data mining but with a strong
business process view.
• Some of the more traditional data mining
techniques can be used in the context of
process mining.
• Some new techniques have been developed to perform
process mining (mining of process models).
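As a hedged illustration of what mining a process model can start from, the sketch below counts the "directly follows" relation in an event log, a basic building block of several process discovery techniques. The event log, the activity names, and the `directly_follows` helper are made up for this example.

```python
from collections import Counter

def directly_follows(event_log):
    """Count how often activity a is immediately followed by activity b within a case."""
    counts = Counter()
    for trace in event_log:
        # Pair each activity with its successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

# One list of activities per case, in the order they were executed (toy data).
event_log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pay"],
]

dfg = directly_follows(event_log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts describe the observed behavior: every case goes from register to check, after which the process splits into approve (twice) or reject (once).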
Why Process Mining
• Traditional As-Is analysis of business processes is strongly
based on the opinions of process experts. The basic idea
is to assemble an appropriate team and to organize
modeling sessions in which the knowledge of the team
members is used to build an adequate As-Is process
model.
• The added value of process mining in As-Is
analysis:
– information based on the real performance of the process
(objective)
– more details
