Data Mining 101

George Tziralis, FOSS Conf, June 19 09, Athens, GR
the facts

the data gap
the promise


understand and take advantage of the world’s information
the name


data mining: statistics at speed, scale and simplicity
what is

Databases, Statistics, Artificial Intelligence
the difference

•statistics: define a hypothesis, then test it
•data mining: test all possible hypotheses
•is it possible? YES!
the tasks

•classification
•association
•clustering
•prediction
the process

•data input & exploration
•preprocessing
•data mining algorithms
•evaluation & interpretation
an example
#    color    size   value   buy
1    blue     5.32   b       no
2    yellow   8.57   a       yes
3    green    1.23   c       no
4    yellow   9.35   c       yes
5    red      5.99   b       yes
6    red      4.43   b       yes
7    green    6.21   b       no
8    white    4.89   a       yes
9    black    5.15   b       no
10   green    5.67   b       no
an example
     attribute               target
#    color    size   value   buy
1    blue     5.32   b       no
2    yellow   8.57   a       yes
3    green    1.23   c       no     <- instance
4    yellow   9.35   c       yes
5    red      5.99   b       yes
6    red      4.43   b       yes
7    green    6.21   b       no
8    white    4.89   a       yes
9    black    5.15   b       no
10   green    5.67   b       no
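
The vocabulary on this slide (attribute, target, instance) can be made concrete with a small sketch. It assumes Python and a hypothetical `Instance` type that is not part of the talk; the ten rows are copied from the table above.

```python
from collections import Counter
from typing import NamedTuple


class Instance(NamedTuple):
    """One row of the example table (hypothetical type, for illustration only)."""
    color: str   # nominal attribute
    size: float  # numeric attribute
    value: str   # nominal attribute
    buy: str     # target (class) attribute


DATA = [
    Instance("blue", 5.32, "b", "no"),
    Instance("yellow", 8.57, "a", "yes"),
    Instance("green", 1.23, "c", "no"),
    Instance("yellow", 9.35, "c", "yes"),
    Instance("red", 5.99, "b", "yes"),
    Instance("red", 4.43, "b", "yes"),
    Instance("green", 6.21, "b", "no"),
    Instance("white", 4.89, "a", "yes"),
    Instance("black", 5.15, "b", "no"),
    Instance("green", 5.67, "b", "no"),
]

# Every field except the last is an input attribute; the last one is the target.
ATTRIBUTES = Instance._fields[:-1]
TARGET = Instance._fields[-1]

# How often each target value occurs across the instances.
class_counts = Counter(row.buy for row in DATA)

print("attributes:", ATTRIBUTES)
print("target:", TARGET)
print("class counts:", dict(class_counts))
```

Keeping the target as the last field mirrors the table layout, where buy is the rightmost column.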
so far

[plot: size, axis from 0 to 10.0]
now


• if size = [4.0 - 7.0] & value = {b,c}
  then buy = no
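
A rule like this can be checked against the ten example rows. The sketch below assumes the interval [4.0 - 7.0] is closed at both ends, which the slide does not state; the helper name `rule_fires` is made up for illustration. Under that assumption the rule covers six rows and is right on four of them.

```python
# Rows copied from the example table: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def rule_fires(size, value):
    """Condition of the rule; the closed interval is an assumption."""
    return 4.0 <= size <= 7.0 and value in {"b", "c"}


# Rows the rule applies to, and the subset where its conclusion (buy = no) holds.
covered = [row for row in ROWS if rule_fires(row[1], row[2])]
correct = [row for row in covered if row[3] == "no"]

print("covered rows:", len(covered))
print("correct rows:", len(correct))
print("accuracy on covered rows: %.2f" % (len(correct) / len(covered)))
```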
now
• If color = yellow then buy = yes
• If color = red then buy = yes
• If color = white then buy = yes
• If color = green then buy = no
• If color = blue then buy = no
• If color = black then buy = no
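
One simple way to arrive at per-color rules like these is to predict, for each color, the most common buy value among the rows with that color. This is a sketch of that idea only (often called a one-attribute or 1R-style rule set); it is not necessarily how the rules on the slide were produced, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Rows copied from the example table: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def one_attribute_rules(rows, attr_index, target_index=-1):
    """Map each value of one nominal attribute to its most frequent target value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr_index]][row[target_index]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


rules = one_attribute_rules(ROWS, attr_index=0)  # index 0 is the color column

# Number of rows whose buy value disagrees with the rule for their color.
errors = sum(1 for row in ROWS if rules[row[0]] != row[-1])

for color, prediction in rules.items():
    print("If color = %s then buy = %s" % (color, prediction))
print("errors on the ten rows:", errors)
```

On the ten rows this reproduces the six rules listed above with no errors.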
ok, cool! but how?
the tool




Weka
Waikato Environment for Knowledge Analysis
open source, written in Java, provides an API
start




start -> explorer
explore




open file -> data -> contact-lenses.arff
.arff how-to
% ARFF file of the example’s data
@relation testset
@attribute color {blue, yellow, green, red, white, black}
@attribute size numeric
@attribute value {a, b, c}
@attribute buy {yes, no}
@data
blue, 5.32, b, no
yellow, 8.57, a, yes
green, 1.23, c, no
...
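
An ARFF file like this can also be generated rather than typed by hand. The sketch below assumes Python and a hypothetical helper `to_arff`; it writes the ten rows of the example and rejects any nominal value that is missing from its declared set, such as a color that occurs in the rows but not in the `@attribute color` line.

```python
# Attribute declarations: a list of allowed values for nominal attributes,
# or the string "numeric". The color list covers every color in the ten rows.
ATTRIBUTES = [
    ("color", ["blue", "yellow", "green", "red", "white", "black"]),
    ("size", "numeric"),
    ("value", ["a", "b", "c"]),
    ("buy", ["yes", "no"]),
]

ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text, checking nominal cells against declared values."""
    lines = ["@relation " + relation]
    for name, domain in attributes:
        if domain == "numeric":
            lines.append("@attribute %s numeric" % name)
        else:
            lines.append("@attribute %s {%s}" % (name, ", ".join(domain)))
    lines.append("@data")
    for row in rows:
        for (name, domain), cell in zip(attributes, row):
            if domain != "numeric" and cell not in domain:
                raise ValueError("%r is not a declared value of %s" % (cell, name))
        lines.append(", ".join(str(cell) for cell in row))
    return "\n".join(lines) + "\n"


arff_text = to_arff("testset", ATTRIBUTES, ROWS)
print(arff_text)
```

Writing `arff_text` to a file with the `.arff` extension should give something the explorer's open file dialog can load.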
preprocess




filter -> ... {tons of filters}
visualize




tab “visualize” (per target/class)
visualize




tab “preprocess” -> visualize all (per class)
select attributes




tab “select attributes” (default settings)
classify




tab “classify” -> rules -> PART -> start!
associate




tab “associate” -> start! (default settings)
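
To get a feel for the kind of numbers an association-rule learner reports, support and confidence can be computed by hand. This is a sketch on the talk's ten-row example rather than on contact-lenses.arff, using only its nominal columns (size is left out as an assumption, since it is numeric); the helper names are hypothetical and this is not Weka's implementation.

```python
# Nominal columns of the example table, one dict per instance.
ROWS = [
    {"color": "blue", "value": "b", "buy": "no"},
    {"color": "yellow", "value": "a", "buy": "yes"},
    {"color": "green", "value": "c", "buy": "no"},
    {"color": "yellow", "value": "c", "buy": "yes"},
    {"color": "red", "value": "b", "buy": "yes"},
    {"color": "red", "value": "b", "buy": "yes"},
    {"color": "green", "value": "b", "buy": "no"},
    {"color": "white", "value": "a", "buy": "yes"},
    {"color": "black", "value": "b", "buy": "no"},
    {"color": "green", "value": "b", "buy": "no"},
]


def matches(row, items):
    """True if the row has every attribute=value pair in items."""
    return all(row[name] == wanted for name, wanted in items.items())


def support(rows, items):
    """Fraction of all rows that contain every pair in items."""
    return sum(1 for row in rows if matches(row, items)) / len(rows)


def confidence(rows, antecedent, consequent):
    """Among rows matching the antecedent, the fraction also matching the consequent."""
    covered = [row for row in rows if matches(row, antecedent)]
    if not covered:
        return 0.0
    return sum(1 for row in covered if matches(row, consequent)) / len(covered)


green_support = support(ROWS, {"color": "green", "buy": "no"})
green_confidence = confidence(ROWS, {"color": "green"}, {"buy": "no"})
b_confidence = confidence(ROWS, {"value": "b"}, {"buy": "no"})

print("color=green => buy=no: support %.2f, confidence %.2f"
      % (green_support, green_confidence))
print("value=b => buy=no: confidence %.2f" % b_confidence)
```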
pls tell me more!
the book




your data mining & data guide!
thank you
gtziralis@gmail.com
