Data Mining
Milan Mirković
Demystification
Artificial Intelligence
Mimicking human intelligence
Machine Learning
Learning independently, without explicit programming
Deep Learning
Deep neural networks
Machine learning (types)
Supervised
Classification
• Spam filters
• Defect detection
• Object recognition
Regression
• Market price prediction
• Weather forecasting
Unsupervised
Clustering
• Customer segmentation
• Identifying tourist attractions
Association rules
• Market basket analysis
Reinforcement learning
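As a concrete illustration of the association-rules entry above (market basket analysis), the sketch below computes the two basic rule measures, support and confidence, in plain Python. The baskets and the helper names `support` and `confidence` are made up for this example and do not come from the slides; a real analysis would typically use a dedicated tool such as Orange.

```python
# Hypothetical transactions (made-up data for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule "bread -> milk": how often the pair occurs, and how reliable the rule is.
bread_milk_support = support({"bread", "milk"}, baskets)
bread_to_milk_conf = confidence({"bread"}, {"milk"}, baskets)
```

Support says how common the item combination is overall, while confidence says how often the rule holds among the baskets where it applies.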
Data mining is an analytical business process which
applies business knowledge to data in order to
achieve business goals, creating new business
knowledge and often using predictive modelling
algorithms. Predictive modelling algorithms are
also called “data mining algorithms”; most originate
in the fields of machine learning and statistics.1
- Tom Khabaza
1 http://khabaza.codimension.net/index_files/datamining.htm
CRISP-DM
CRoss-Industry Standard Process for Data Mining
• Published in 1999
• Describes the typical phases of a project and the activities within each phase
• Also describes the dependencies between phases and their typical order
• The order is not strict, however, and returning to earlier phases is common
• Iteration and re-examination
Business Understanding
• Understanding the current situation
  • Problems, available resources, solutions currently in use
• Defining goals and success criteria
  • Aligning expectations
  • Constraints
• Project plan
  • Usually subject to change
  • Description of the resources needed to carry out the activities in each phase
Data Understanding
• Available data, collecting new data, external data
  • Transactional systems, purchased data
• Collecting and describing the data
  • Number of records/attributes, types, value distributions, properties of relevant subsets, granularity and frequency
• Checking data quality
  • Missing values, errors
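The description and quality checks listed for this phase can be sketched with the Python standard library alone. The tiny customer extract, its column names, and its values are invented for illustration; in a Jupyter notebook the same checks would more commonly be done with a data-frame library.

```python
import csv
import io
from collections import Counter

# Hypothetical customer extract; every value here is made up for illustration.
RAW = """customer_id,tenure_months,monthly_charge,contract
1,12,29.9,monthly
2,,56.1,yearly
3,40,,monthly
4,5,20.0,monthly
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Describing the data: number of records and attributes.
n_records = len(rows)
n_attributes = len(rows[0])

# Data quality: count empty cells per attribute.
missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}

# Value distribution of a categorical attribute.
contract_counts = Counter(r["contract"] for r in rows)
```

Here `missing` shows which attributes need attention during data preparation, and `contract_counts` gives a first look at how the values of one attribute are distributed.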
Data Preparation
• Selection (choosing the data to be used)
  • Reasons for including/excluding sources/records/attributes
• Data cleaning
  • Handling missing/extreme values, type conversion
• Constructing attributes
  • Combining data from different sources, creating/deriving attributes, normalization/standardization
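Three of the preparation steps named above (type conversion, handling missing values, and standardization) can be sketched on a single attribute as follows. The raw values are hypothetical, and median imputation is only one possible treatment chosen for the example, not a recommendation from the slides.

```python
from statistics import mean, median, pstdev

# Hypothetical raw attribute as read from a text file: strings, one missing.
raw_tenure = ["12", "", "40", "5"]

# Type conversion: strings to floats, keeping None for missing entries.
tenure = [float(v) if v != "" else None for v in raw_tenure]

# Missing-value treatment: impute with the median of the observed values.
observed = [v for v in tenure if v is not None]
fill = median(observed)
tenure_filled = [fill if v is None else v for v in tenure]

# Standardization: subtract the mean and divide by the standard deviation.
mu, sigma = mean(tenure_filled), pstdev(tenure_filled)
tenure_std = [(v - mu) / sigma for v in tenure_filled]
```

In a real project the imputation value, mean, and standard deviation would normally be computed on the training set only and then reused for validation and test data, so that no information leaks from the held-out sets.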
Modeling
• Selecting techniques and documenting the choice
  • Any special assumptions or requirements?
• Modeling plan
  • Splitting the data into training/validation/test sets
• Building the model
  • Choosing parameters
• Assessing performance and ranking the models
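The modeling steps above (splitting the data, building a model, choosing parameters, assessing performance) can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: the synthetic data, the 60/20/20 split, the nearest-neighbour classifier, and the candidate values of `k`.

```python
import random

rng = random.Random(0)

# Synthetic data (an assumption for illustration): the label is 1 when x > 0.5.
data = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(200))]
rng.shuffle(data)

# 60/20/20 split; the proportions are a common convention, not from the slides.
train, valid, test = data[:120], data[120:160], data[160:]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k is odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def accuracy(train, subset, k):
    """Fraction of examples in subset that the k-NN model labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in subset) / len(subset)

# Parameter selection: pick k using the validation set only.
candidates = [1, 3, 5, 7]
best_k = max(candidates, key=lambda k: accuracy(train, valid, k))

# Final performance estimate on the untouched test set.
test_accuracy = accuracy(train, test, best_k)
```

The point of the three-way split is that the test set plays no part in choosing `k`, so `test_accuracy` remains an honest estimate of how the chosen model behaves on unseen data.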
Supervised machine learning
From input data to predictions
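A minimal sketch of the path from labeled input data to a prediction, using least-squares regression on one attribute. The training values are invented (they follow y = 2x + 1 exactly), and `fit_line` and `predict` are hypothetical helper names rather than functions from any library.

```python
from statistics import mean

# Hypothetical labeled examples: inputs x with known targets y (here y = 2x + 1).
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys):
    """Ordinary least squares for one input attribute; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept

model = fit_line(x_train, y_train)   # learning step: labeled data -> model
y_new = predict(model, 5.0)          # prediction step: unseen input -> output
```

The same two-step shape, fit on labeled examples and then predict for new inputs, applies to classification as well as regression.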
Evaluation
• Does the model meet expectations?
• Does the model have any shortcomings?
• What was discovered during the previous phases?
  • Interesting patterns
• Should anything in the previous phases be changed or done differently?
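One way to make "does the model meet expectations?" checkable is to compare a model metric against a success criterion agreed in the Business Understanding phase. The labels, the predictions, and the required recall of 0.70 below are all assumptions made for illustration.

```python
# Hypothetical true labels and model predictions (1 = churned customer).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Confusion-matrix counts.
pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # churners correctly flagged
fp = sum(t == 0 and p == 1 for t, p in pairs)  # loyal customers flagged by mistake
fn = sum(t == 1 and p == 0 for t, p in pairs)  # churners the model missed
tn = sum(t == 0 and p == 0 for t, p in pairs)  # loyal customers correctly passed

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Assumed business criterion, agreed during Business Understanding.
REQUIRED_RECALL = 0.70
meets_expectations = recall >= REQUIRED_RECALL
```

Which metric matters, and what threshold counts as success, is a business decision; the code only makes the comparison explicit.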
Deployment
• Deployment plan for the (predictive) solution
  • Technical (operationalization/integration)
  • Organizational (process changes)
• Monitoring and maintaining the solution
  • Is the model still performing well?
• Project review and conclusions
  • New opportunities/potential questions?
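The monitoring question above ("is the model still performing well?") can be approached by tracking accuracy over a sliding window of recent predictions once their true outcomes become known. `AccuracyMonitor`, its window size, and its threshold are hypothetical choices for this sketch, not part of CRISP-DM or of any library.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions with known outcomes."""

    def __init__(self, window, min_accuracy):
        self.outcomes = deque(maxlen=window)  # old entries drop out automatically
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction turned out to be correct."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        # Only raise the flag once the window is full, to avoid noisy early alerts.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy

# Made-up stream of (predicted, actual) pairs arriving after deployment.
monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
```

When `needs_review()` returns true, the usual response is to go back to earlier phases (data understanding, preparation, modeling), which matches the iterative nature of the process.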
Example (EDA/ML)
• Jupyter
• Python
• Orange Data Mining
• Kaggle
Useful links
https://www.kaggle.com/code/bandiatindra/telecom-churn-prediction (Customer churn prediction)
https://orangedatamining.com/ (Orange Data Mining)

GDSC DataMining
