Data Mining 101

George Tziralis, FOSS Conf, 19 June 2009, Athens, GR
the facts

[figure: the data gap]
the promise


understand and take advantage of the world’s information
the name


data mining: statistics at speed, scale and simplicity
what is it

[figure: Databases, Statistics, Artificial Intelligence]
the difference

•statistics: define a hypothesis, then test it
•data mining: test all possible hypotheses
•is it possible? YES!
the tasks

•classification
•association
•clustering
•prediction
the process

•data input & exploration
•preprocessing
•data mining algorithms
•evaluation & interpretation
an example
#    color    size   value   buy
1    blue     5.32   b       no
2    yellow   8.57   a       yes
3    green    1.23   c       no
4    yellow   9.35   c       yes
5    red      5.99   b       yes
6    red      4.43   b       yes
7    green    6.21   b       no
8    white    4.89   a       yes
9    black    5.15   b       no
10   green    5.67   b       no
an example

•attributes: color, size, value
•target: buy
•instance: one row of the table, e.g. #3 (green, 1.23, c, no)
so far

[figure: the instances plotted by size, axis from 0 to 10.0]
now

• if size in [4.0, 7.0] & value in {b, c}
  then buy = no
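A quick way to see how well a rule like this fits is to run it over the ten instances of the example table. The sketch below is plain Python (no Weka involved); the data is copied from the table, and the helper names `rule_fires` and `check_rule` are hypothetical.

```python
# The example table: (id, color, size, value, buy)
DATA = [
    (1, "blue", 5.32, "b", "no"),
    (2, "yellow", 8.57, "a", "yes"),
    (3, "green", 1.23, "c", "no"),
    (4, "yellow", 9.35, "c", "yes"),
    (5, "red", 5.99, "b", "yes"),
    (6, "red", 4.43, "b", "yes"),
    (7, "green", 6.21, "b", "no"),
    (8, "white", 4.89, "a", "yes"),
    (9, "black", 5.15, "b", "no"),
    (10, "green", 5.67, "b", "no"),
]


def rule_fires(size, value):
    """Antecedent of the rule: size in [4.0, 7.0] and value in {b, c}."""
    return 4.0 <= size <= 7.0 and value in {"b", "c"}


def check_rule(data):
    """Return the ids the rule covers, and the covered ids where buy is not 'no'."""
    covered = [row for row in data if rule_fires(row[2], row[3])]
    exceptions = [row[0] for row in covered if row[4] != "no"]
    return [row[0] for row in covered], exceptions


covered, exceptions = check_rule(DATA)
print("covered:", covered)        # instances the rule applies to
print("exceptions:", exceptions)  # covered instances where buy is actually 'yes'
```

On this table the rule covers six instances (#1, #5, #6, #7, #9, #10) and gets the two red ones (#5, #6) wrong, so on its own it is only a partial description of the data; the color-based rules account for the red instances.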
now
• If color = yellow then buy = yes
• If color = red then buy = yes
• If color = white then buy = yes
• If color = green then buy = no
• If color = blue then buy = no
• If color = black then buy = no
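Rules of this shape (one attribute, one predicted class per value) can be derived automatically. The sketch below follows the 1R ("one rule") idea, which Weka ships as the OneR classifier: for each nominal attribute, predict the majority class per value and keep the attribute whose rules make the fewest errors on the training data. It is a minimal plain-Python sketch, not Weka's code; `size` is left out because it is numeric and would first need bucketing, which Weka's OneR handles but this sketch does not.

```python
from collections import Counter, defaultdict

COLUMNS = ("color", "size", "value", "buy")
ROWS = [
    ("blue", 5.32, "b", "no"), ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"), ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"), ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"), ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"), ("green", 5.67, "b", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def one_r(data, attributes, target):
    """1R: for each nominal attribute, predict the majority target per value;
    keep the attribute whose rule set makes the fewest training errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # attribute value -> counts of target values
        for row in data:
            counts[row[attr]][row[target]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


attr, rules, errors = one_r(DATA, ["color", "value"], "buy")
print(f"best attribute: {attr} ({errors} training errors)")
for value, label in rules.items():
    print(f"if {attr} = {value} then buy = {label}")
```

On the example table `color` wins with zero training errors and reproduces exactly the six color rules above; `value` makes three errors.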
ok, cool! but how?
the tool

Weka
Waikato Environment for Knowledge Analysis
open-source software, written in Java, with a Java API
start

start -> Explorer
explore

open file -> data -> contact-lenses.arff
.arff how-to
% ARFF file of the example’s data
@relation testset
@attribute color {blue, yellow, green, red, white, black}
@attribute size numeric
@attribute value {a, b, c}
@attribute buy {yes, no}
@data
blue, 5.32, b, no
yellow, 8.57, a, yes
green, 1.23, c, no
...
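Weka rejects a data value that its nominal @attribute declaration does not list, so the color declaration has to include every color the table uses, white and black among them. To make the format concrete, here is a minimal plain-Python reader for exactly the subset shown: comments, @relation, nominal and numeric @attribute lines, and comma-separated @data rows. It is a hypothetical sketch (`parse_arff` is not a Weka or library function) and does not handle quoted values, missing values (`?`), string/date attributes or sparse rows.

```python
ARFF = """\
% ARFF file of the example's data
@relation testset
@attribute color {blue, yellow, green, red, white, black}
@attribute size numeric
@attribute value {a, b, c}
@attribute buy {yes, no}
@data
blue, 5.32, b, no
yellow, 8.57, a, yes
green, 1.23, c, no
yellow, 9.35, c, yes
red, 5.99, b, yes
red, 4.43, b, yes
green, 6.21, b, no
white, 4.89, a, yes
black, 5.15, b, no
green, 5.67, b, no
"""


def parse_arff(text):
    """Parse a small, well-formed ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, "numeric") or (name, [allowed nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            row = {}
            for (name, spec), value in zip(attributes, values):
                if spec == "numeric":
                    row[name] = float(value)
                elif value in spec:
                    row[name] = value
                else:
                    raise ValueError(f"{value!r} is not declared for attribute {name}")
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                allowed = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, allowed))
            elif spec.lower() in ("numeric", "real", "integer"):
                attributes.append((name, "numeric"))
            else:
                raise ValueError(f"unsupported attribute type: {spec}")
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF)
print(relation, [name for name, _ in attributes], len(rows))
```

Dropping `white, black` from the color declaration makes this reader raise a `ValueError` on row 8, which mirrors the kind of error Weka reports for an undeclared nominal value.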
preprocess

filter -> choose -> ... (tons of filters)
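As one example of what a preprocessing filter does, Weka's unsupervised Discretize filter turns a numeric attribute into intervals, by default of equal width. The sketch below reimplements only that equal-width idea in plain Python for the `size` column; the choice of 3 bins is an assumption for illustration, not Weka's default, and `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(values, k):
    """Map each numeric value to a bin index 0..k-1 using k equal-width
    intervals between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values equal: everything in one bin
    # The maximum would land at index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


SIZES = [5.32, 8.57, 1.23, 9.35, 5.99, 4.43, 6.21, 4.89, 5.15, 5.67]
K = 3
bins = equal_width_bins(SIZES, K)

lo, hi = min(SIZES), max(SIZES)
width = (hi - lo) / K
for i in range(K):
    close = "]" if i == K - 1 else ")"  # last bin includes the maximum
    print(f"bin {i}: [{lo + i * width:.2f}, {lo + (i + 1) * width:.2f}{close}")
print(bins)
```

With three bins the middle interval is roughly [3.94, 6.64) and, on this table, it holds the same seven instances as the condition size in [4.0, 7.0].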
visualize

tab “visualize” (per target/class)
visualize

tab “preprocess” -> visualize all (per class)
select attributes

tab “select attributes” (default settings)
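Attribute selection needs a score for how informative an attribute is about the target. One common score is information gain, available in Weka as InfoGainAttributeEval, though not necessarily the evaluator the tab uses with its default settings. The plain-Python sketch below ranks the two nominal attributes of the example table by information gain; `size` is left out because it is numeric and would first need discretizing.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("color", "size", "value", "buy")
ROWS = [
    ("blue", 5.32, "b", "no"), ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"), ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"), ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"), ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"), ("green", 5.67, "b", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(data, attr, target):
    """Entropy of the target minus its weighted entropy after splitting on attr."""
    groups = defaultdict(list)
    for row in data:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in data]) - remainder


gains = {attr: info_gain(DATA, attr, "buy") for attr in ("color", "value")}
for attr, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attr}: {gain:.3f}")
```

`color` scores 1.000 bit (every color is pure in `buy`, and the classes are split 5/5), while `value` scores about 0.249, which agrees with the color rules being the ones that fit the table perfectly.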
classify

tab “classify” -> rules -> PART -> start!
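Besides the model, the classify tab reports an accuracy estimate, by default from 10-fold cross-validation. With only ten instances each fold holds a single instance, so it amounts to leave-one-out. The plain-Python sketch below applies leave-one-out to a simple color-only rule learner; that learner is a hypothetical stand-in chosen for brevity, not PART, and it falls back to the overall majority class for a color it has not seen.

```python
from collections import Counter, defaultdict

COLUMNS = ("color", "size", "value", "buy")
ROWS = [
    ("blue", 5.32, "b", "no"), ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"), ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"), ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"), ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"), ("green", 5.67, "b", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def train_color_rule(train):
    """Hypothetical stand-in classifier: majority 'buy' per color, plus the
    overall majority class as a fallback for unseen colors."""
    by_color = defaultdict(Counter)
    for row in train:
        by_color[row["color"]][row["buy"]] += 1
    rules = {color: c.most_common(1)[0][0] for color, c in by_color.items()}
    default = Counter(row["buy"] for row in train).most_common(1)[0][0]
    return rules, default


def predict(model, row):
    rules, default = model
    return rules.get(row["color"], default)


def accuracy(model, rows):
    return sum(predict(model, r) == r["buy"] for r in rows) / len(rows)


def leave_one_out(data):
    """Train on all instances but one, test on the held-out one, repeat for each."""
    correct = 0
    for i, held_out in enumerate(data):
        model = train_color_rule(data[:i] + data[i + 1:])
        correct += predict(model, held_out) == held_out["buy"]
    return correct / len(data)


train_acc = accuracy(train_color_rule(DATA), DATA)
loo_acc = leave_one_out(DATA)
print(f"accuracy on the training data: {train_acc:.2f}")
print(f"leave-one-out accuracy:        {loo_acc:.2f}")
```

The rules are perfect on the data they were built from (1.00) but reach 0.70 under leave-one-out: white, blue and black each occur once, so when held out they get the fallback class, which is wrong in all three cases.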
associate

tab “associate” -> start! (default settings)
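Weka's default associator is Apriori, which grows itemsets of any size and keeps rules above support and confidence thresholds. The plain-Python sketch below is much simpler: it brute-forces only rules with a single item on each side over the nominal columns of the example table. The thresholds (support of at least 2 instances, confidence 1.0) are assumptions for illustration, not Weka's defaults, and `one_to_one_rules` is a hypothetical helper.

```python
from collections import Counter
from itertools import permutations

# Nominal columns of the example table: (color, value, buy); size is numeric, left out
ROWS = [
    ("blue", "b", "no"), ("yellow", "a", "yes"), ("green", "c", "no"),
    ("yellow", "c", "yes"), ("red", "b", "yes"), ("red", "b", "yes"),
    ("green", "b", "no"), ("white", "a", "yes"), ("black", "b", "no"),
    ("green", "b", "no"),
]
TRANSACTIONS = [{f"color={c}", f"value={v}", f"buy={b}"} for c, v, b in ROWS]


def one_to_one_rules(transactions, min_support=2, min_confidence=1.0):
    """Brute-force every rule X ==> Y where X and Y are single items.

    support    = number of transactions containing both X and Y
    confidence = support / number of transactions containing X
    """
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(permutations(sorted(t), 2))  # ordered pairs (X, Y)
    rules = []
    for (x, y), support in pair_count.items():
        confidence = support / item_count[x]
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return sorted(rules)


for x, y, support, confidence in one_to_one_rules(TRANSACTIONS):
    print(f"{x} ==> {y}  (support {support}, confidence {confidence:.2f})")
```

Under these thresholds five rules survive, among them color=green ==> buy=no (support 3) and value=a ==> buy=yes; a rule such as value=b ==> buy=no is dropped because it holds in only 4 of its 6 instances.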
pls tell me more!
the book

your data mining & data guide!
thank you
gtziralis@gmail.com
