Data Mining 101

George Tziralis, FOSS Conf, June 19 09, Athens, GR
the facts



The data gap
the promise


understand and take advantage of the world’s information
the name


data mining: statistics at speed, scale and simplicity
what is

Databases, Statistics, Artificial Intelligence
the difference

• statistics: define a hypothesis, then test it
• data mining: test all possible hypotheses
• is it possible? YES!
the tasks

•classification
•association
•clustering
•prediction
the process

•data input & exploration
•preprocessing
•data mining algorithms
•evaluation & interpretation
an example
#    color    size   value   buy
1     blue    5.32    b      no
2    yellow   8.57    a      yes
3    green    1.23     c     no
4    yellow   9.35     c     yes
5     red     5.99    b      yes
6     red     4.43    b      yes
7    green    6.21    b      no
8    white    4.89    a      yes
9    black    5.15    b      no
10   green    5.67    b      no
an example
attribute                          target
#    color    size   value   buy
1     blue    5.32    b      no
2    yellow   8.57    a      yes
3    green    1.23     c     no      instance
4    yellow   9.35     c     yes
5     red     5.99    b      yes
6     red     4.43    b      yes
7    green    6.21    b      no
8    white    4.89    a      yes
9    black    5.15    b      no
10   green    5.67    b      no
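The three labels on this slide (attribute, target, instance) can be made concrete with a small sketch. It assumes the table is held as plain Python tuples; the names ROWS, ATTRIBUTES, TARGET and split_instance are illustrative and not part of the deck.

```python
# The example table from the slide: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]

ATTRIBUTES = ("color", "size", "value")  # inputs describing each row
TARGET = "buy"                           # the column we want to predict


def split_instance(row):
    """Split one instance (row) into its attribute values and its target value."""
    features = dict(zip(ATTRIBUTES, row[:-1]))
    return features, row[-1]


instances = [split_instance(row) for row in ROWS]
print(instances[2])
```

Each element of instances is one instance: a dictionary of attribute values paired with the value of the target.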
so far

[chart: size, axis from 0 to 10.0]
now


• if size = [4.0 - 7.0] & value = {b,c}
  then buy = no
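A rule like this can be checked against the example table by counting how many rows it covers and how many of those it gets right. The sketch below assumes the size interval is inclusive at both ends; rule_fires and the variable names are illustrative.

```python
# The example table from the slides: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def rule_fires(size, value):
    """Condition part of the rule: size in [4.0, 7.0] and value in {b, c}."""
    return 4.0 <= size <= 7.0 and value in {"b", "c"}


# Rows the rule applies to, and the subset where its conclusion (buy = no) holds.
covered = [row for row in ROWS if rule_fires(row[1], row[2])]
correct = [row for row in covered if row[3] == "no"]

print(len(covered), len(correct))
```

On the ten rows as listed, the rule covers six and predicts four of them correctly; the two red rows fall inside its range but have buy = yes.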
now
• If color = yellow then buy = yes
• If color = red then buy = yes
• If color = white then buy = yes
• If color = green then buy = no
• If color = blue then buy = no
• If color = black then buy = no
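One simple way to arrive at rules of this shape is to take, for each value of a single attribute, the most frequent target value, and then count the rows the resulting rules get wrong. This is the idea behind the 1R (OneR) method; it is offered here as an illustration only, not as the procedure used to produce the slide. The function name one_rule is hypothetical.

```python
from collections import Counter, defaultdict

# The example table from the slides: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def one_rule(rows, attr_index):
    """Map each value of one nominal attribute to its majority target.

    Returns (rules, errors), where errors is the number of rows
    that the resulting rules misclassify.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr_index]][row[-1]] += 1
    rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
    errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
    return rules, errors


color_rules, color_errors = one_rule(ROWS, 0)  # column 0 is color
value_rules, value_errors = one_rule(ROWS, 2)  # column 2 is value

print(color_rules, color_errors)
print(value_errors)
```

On this table the color rules match the six rules on the slide with no errors, while rules built on value alone misclassify three rows.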
ok, cool! but how?
the tool




                   Weka
Waikato Environment for Knowledge Analysis
    OSS, written in Java, providing API
start




start -> explorer
explore




open file -> data -> contact-lenses.arff
.arff how-to
% ARFF file of the example’s data
@relation testset
@attribute color {blue, yellow, green, red, white, black}
@attribute size numeric
@attribute value {a, b, c}
@attribute buy {yes, no}
@data
blue, 5.32, b, no
yellow, 8.57, a, yes
green, 1.23, c, no
...
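An ARFF file like this one can also be generated from the table rather than typed by hand. The sketch below is a minimal writer under stated assumptions: it handles only numeric and nominal attributes, does no quoting or missing-value handling, and derives each nominal value set from the data so that every value occurring in a row is declared. The function name to_arff is hypothetical.

```python
# The example table from the slides: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]
NAMES = ("color", "size", "value", "buy")


def to_arff(relation, names, rows):
    """Render rows as ARFF text with numeric and nominal attributes only."""
    lines = ["@relation " + relation]
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        if all(isinstance(v, (int, float)) for v in column):
            lines.append("@attribute %s numeric" % name)
        else:
            # Unique values in first-seen order become the nominal value set.
            values = list(dict.fromkeys(column))
            lines.append("@attribute %s {%s}" % (name, ", ".join(values)))
    lines.append("@data")
    for row in rows:
        lines.append(", ".join(str(v) for v in row))
    return "\n".join(lines)


arff_text = to_arff("testset", NAMES, ROWS)
print(arff_text)
```

Saving arff_text to a file with the .arff extension gives something that can be opened from the explorer's open file dialog.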
preprocess




filter -> ... {tons of filters}
visualize




tab “visualize” (per target/class)
visualize




tab “preprocess” -> visualize all (per class)
select attributes




tab “select attributes” (default settings)
classify




tab “classify” -> rules -> PART -> start!
associate




tab “associate” -> start! (default settings)
pls tell me more!
the book




your data mining & data guide!
thank you
gtziralis@gmail.com
