Machine Learning
Vladimir Alekseichenko
“rocket science” or daily bread?
Changes over time
10 min each
36,500,000 minutes
~70 years
Driver vs. Mechanic
dataworkshop.eu
Bike Sharing Demand
Task: Kaggle
Solution: github.com/dataworkshop
Understand Business & Data
Read and explore the data
Feature Engineering
Create new features based on existing ones
Feature Selection
Select only the useful features
Model Selection
Find the best model(s): model A, model B, model C, model D, model E
Tuning Hyperparameters
Find the best hyperparameters for a given model
Ensemble Modeling
Combine a few models into one better model: 0.6 × model B + 0.4 × model E
datetime             season  temp   count
2011-01-01 08:32:02  1       9.23   5
2012-04-02 12:10:00  2       18.78  32
2012-08-07 15:47:01  3       15.45  15

datetime             season  temp   hour  day  month  …  count  count_log
2011-01-01 08:32:02  1       9.23   8     1    1      …  5      1.609
2012-04-02 12:10:00  2       18.78  12    2    4      …  32     3.466
2012-08-07 15:47:01  3       15.45  15    7    8      …  15     2.708
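The step from the first table to the second can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code from the talk's repository; the function name add_features is made up here, and the natural logarithm is an assumption that happens to reproduce the count_log values on the slide.

```python
import math
from datetime import datetime

# Raw rows copied from the slide: (datetime, season, temp, count).
raw_rows = [
    ("2011-01-01 08:32:02", 1, 9.23, 5),
    ("2012-04-02 12:10:00", 2, 18.78, 32),
    ("2012-08-07 15:47:01", 3, 15.45, 15),
]

def add_features(row):
    """Expand one raw row with hour/day/month and a log-transformed target."""
    stamp, season, temp, count = row
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return {
        "datetime": stamp,
        "season": season,
        "temp": temp,
        "hour": dt.hour,
        "day": dt.day,
        "month": dt.month,
        "count": count,
        # Natural log (assumed); it fails for count == 0, which this toy data avoids.
        "count_log": round(math.log(count), 3),
    }

features = [add_features(r) for r in raw_rows]
for f in features:
    print(f)
```

The log transform compresses large counts, which is presumably why the models in the diagram are trained on count_log rather than on count.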
Understand the Business and the Data
Working days
Weekend
Feature Engineering
• quantitative => bins: 1 to 10, 11 to 20…
• dates => day, month, year, hour, is it a weekend…
• categorical/qualitative (red, green, white)
  • assign a numeric identifier (1, 2, 3)
  • create n binary columns (is it red? etc.)
  • probabilities with respect to the target variable
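The encodings listed above can be sketched with the standard library alone. All function names and the toy data below are hypothetical, chosen only for illustration; the last function interprets "probabilities with respect to the target variable" as the mean of a 0/1 target per category, which is an assumption.

```python
from collections import defaultdict

def bin_index(value, width=10):
    """Map 1..10 -> 0, 11..20 -> 1, and so on (bins of `width` values)."""
    return (value - 1) // width

def label_encode(values):
    """Assign each category an integer id in order of first appearance."""
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids) + 1)
    return [ids[v] for v in values], ids

def one_hot(values):
    """Create one binary column per category ("is it red?" and so on)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

def target_mean_encode(values, targets):
    """Replace each category with the mean of a 0/1 target within it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]

colors = ["red", "green", "white", "red"]   # toy data, not from the talk
bought = [1, 0, 0, 0]                        # hypothetical binary target

print(bin_index(7), bin_index(11))
print(label_encode(colors)[0])
print(one_hot(colors)[0])
print(target_mean_encode(colors, bought))
```

In practice the target-based encoding should be computed on training data only, otherwise information about the target leaks into the features.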
Feature Selection
• The fewer, the better (a simpler model)
• Keep only the most valuable ones (ideally just one :)
• Features are (usually) interdependent, so be careful… (verify empirically)
• Faster
Variance
Univariate
Recursive
xgbfir
https://github.com/limexp/xgbfir
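Two of the approaches named above, variance-based and univariate selection, can be illustrated roughly as follows. This is a sketch with invented data and hypothetical function names; it uses absolute Pearson correlation as the univariate score, which is only one of several possible choices, and it does not cover recursive elimination or xgbfir.

```python
import math
from statistics import mean, pvariance

def drop_low_variance(columns, threshold=0.0):
    """Keep only columns whose population variance exceeds `threshold`."""
    return {name: vals for name, vals in columns.items()
            if pvariance(vals) > threshold}

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_univariate(columns, target):
    """Rank features by absolute correlation with the target, best first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data, invented for illustration.
columns = {
    "temp":     [9.0, 18.0, 15.0, 25.0, 30.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "noise":    [3.0, 1.0, 4.0, 1.0, 5.0],
}
target = [5.0, 30.0, 16.0, 40.0, 55.0]

kept = drop_low_variance(columns)        # removes the constant column
ranking = rank_univariate(kept, target)  # strongest single feature first
print(sorted(kept), ranking)
```

Because features are usually interdependent, a ranking computed one feature at a time is only a starting point and should be checked empirically, as the slide warns.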
Model Selection
• Linear
• Decision Tree
• Random Forest
• Gradient Boosting
• Neural Network
Linear
https://github.com/dataworkshop/model_evaluation/blob/master/step1-regression.ipynb
Decision Tree
http://xgboost.readthedocs.io/en/latest/model.html
Ensemble trees
http://xgboost.readthedocs.io/en/latest/model.html
• Bagging (bootstrap aggregation)
  • Random Forest
  • Extra Trees
• Boosting
  • Gradient Boosting
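The idea behind gradient boosting can be shown with a toy version: each new small tree is fitted to the errors left by the trees before it. The sketch below is a simplified illustration for squared loss with one feature and depth-1 trees (stumps); it is not how XGBoost is implemented, and the data and names are invented.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on x that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:          # every value except the largest
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    """Mean squared error of `model` on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6, 7, 8]             # toy feature
ys = [2, 3, 2, 8, 9, 8, 4, 3]             # toy target, invented
one_round = gradient_boost(xs, ys, rounds=1)
many_rounds = gradient_boost(xs, ys, rounds=20)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

Bagging differs in that the trees are trained independently on resampled data and then averaged, instead of being trained one after another on the remaining error.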
XGBoost (Extreme Gradient Boosting)
“When in doubt, use xgboost” (Owen Zhang)
Choosing a Model
(model selection)
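Model selection boils down to training several candidates on the same data and comparing them on data they have not seen. The sketch below compares a mean baseline with a one-feature linear fit using RMSE on a held-out set; the data, the candidate list, and the function names are assumptions made for illustration.

```python
import math

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    avg = sum(ys) / len(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ≈ a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def rmse(model, xs, ys):
    """Root mean squared error of `model` on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Toy data: temperature vs. rentals; values are invented.
train_x, train_y = [5, 10, 15, 20, 25], [12, 21, 33, 39, 52]
valid_x, valid_y = [8, 18, 28], [18, 37, 57]

candidates = {"mean baseline": fit_mean, "linear": fit_line}
scores = {name: rmse(fit(train_x, train_y), valid_x, valid_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The same loop extends to trees, forests, or boosted models: only the entries in candidates change, while the validation data and the metric stay fixed.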
Hyperparameter Tuning
• Grid Search
• Random Search
• Bayesian
hyperopt
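Grid search and random search differ only in how candidate settings are generated. In the sketch below, validation_error is a hypothetical stand-in for "train the model with these settings and score it on validation data"; the parameter names max_depth and learning_rate and their ranges are assumptions. Bayesian methods, such as those offered by hyperopt, are not shown.

```python
import itertools
import random

def validation_error(max_depth, learning_rate):
    """Stand-in for training and scoring a model (hypothetical).

    A smooth function with its minimum at depth 6 and rate 0.1.
    """
    return (max_depth - 6) ** 2 + 100 * (learning_rate - 0.1) ** 2

def grid_search(depths, rates):
    """Evaluate every combination and return the best (error, params)."""
    return min((validation_error(d, r), (d, r))
               for d, r in itertools.product(depths, rates))

def random_search(n_trials, seed=0):
    """Sample settings at random and return the best (error, params)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        d = rng.randint(2, 10)
        r = rng.uniform(0.01, 0.3)
        trials.append((validation_error(d, r), (d, r)))
    return min(trials)

grid_best = grid_search(depths=[2, 4, 6, 8], rates=[0.01, 0.1, 0.3])
random_best = random_search(n_trials=12)
print(grid_best, random_best)
```

Both searches here spend 12 evaluations. The grid can only land on the values it was given, while random search explores the whole range but offers no guarantee of hitting any particular point.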
Ensemble Modeling
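The simplest ensemble is a weighted average of predictions. The sketch below uses the 0.6 and 0.4 weights for model B and model E shown in the pipeline diagram; the prediction values are invented, and the assumption that both models predict count_log (so the blend is converted back with exp) is mine.

```python
import math

def blend(pred_b, pred_e, w_b=0.6, w_e=0.4):
    """Weighted average of two models' predictions (weights sum to 1)."""
    return [w_b * b + w_e * e for b, e in zip(pred_b, pred_e)]

# Hypothetical count_log predictions from two models.
model_b = [1.50, 3.40, 2.80]
model_e = [1.70, 3.50, 2.60]

blended_log = blend(model_b, model_e)
# Undo the log transform to get back to rental counts.
blended_count = [math.exp(v) for v in blended_log]
print(blended_log, blended_count)
```

Blending tends to help most when the models make different kinds of errors; the weights themselves are usually chosen on validation data.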
Neuron
(Artificial) Neural Network
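A single artificial neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function. The sketch below uses a sigmoid activation, which is one common choice and an assumption here; the inputs and weights are arbitrary.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

out = neuron([0.5, -1.0], [2.0, 1.0], 0.0)   # weighted sum is 0, so sigmoid gives 0.5
hidden = layer([0.5, -1.0], [[2.0, 1.0], [0.0, 0.0]], [0.0, 0.0])
print(out, hidden)
```

A network stacks such layers, and training adjusts the weights and biases to reduce the prediction error.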
MNIST
Data
Neural Network
Error: 1.60%
Source: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
Challenges
Overfitting
http://mlwiki.org/index.php/Overfitting
Cross-validation
http://blog.goldenhelix.com/bchristensen/cross-validation-for-genomic-prediction-in-svs/
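Cross-validation guards against overfitting by scoring the model only on data it was not trained on, rotating which part is held out. A minimal sketch of k-fold index splitting follows; the function name is hypothetical and no shuffling is done.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, valid in folds:
    print("train:", train, "valid:", valid)
```

Each sample lands in the validation part exactly once, and the k validation scores are averaged. For time-ordered data such as bike rentals, a split that respects time order may be more appropriate than contiguous or shuffled folds.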
Creativity is worth a lot
Source: https://techcrunch.com/2016/11/19/how-data-science-and-rocket-science-will-get-humans-to-mars
The wave is already coming…
are you ready?
Thank you
@slon1024
hello@vova.me
dataworkshop.eu

AIMeetup #3: Machine Learning: “rocket science” or daily bread?

  • 1. Uczenie maszynowe Vladimir Alekseichenko „rocket science” czy chleb powszedni?
  • 3.
  • 4.
  • 5.
  • 6. 10min na jeden 36 500 000 minut ~70 lat
  • 7.
  • 10. Bike Sharing Demand Zadnie - kaggle Rozwiązanie - github.com/dataworkshop
  • 11. Understand Business & Data Read and explore data Feature Engineering Create a new ones based on already exists Feature Selection Select only useful features Model Selection Find the best model(s) model A model B model C model D model E Tuning Hyperparameters Find the best hyperparameters for given model Ensemble Modeling Combine few models into one more better x0.6 x0.4+ mode l B mode l E datetime season temp count 2011-01-01 08:32:02 1 9.23 5 2012-04-02 12:10:00 2 18.78 32 2012-08-07 15:47:01 3 15.45 15 datetime season temp hour day month … count count_log 2011-01-01 08:32:02 1 9.23 8 1 1 … 5 1.609 2012-04-02 12:10:00 2 18.78 12 2 4 … 32 3.466 2012-08-07 15:47:01 3 15.45 15 7 8 … 15 2.708 mode l B mode l E datetime season temp hour day month … count count_log 2011-01-01 08:32:02 1 9.23 8 1 1 … 5 1.609 2012-04-02 12:10:00 2 18.78 12 2 4 … 32 3.466 2012-08-07 15:47:01 3 15.45 15 7 8 … 15 2.708
  • 13. Understand the business and the data
  • 18. Feature engineering • quantitative => ranges: 1 to 10, 11 to 20… • dates => day, month, year, hour, is it a weekend… • categorical/qualitative (red, green, white) • assign a numeric identifier (1, 2, 3) • create n binary columns (is it red? etc.) • probabilities with respect to the target variable
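The first few transformations listed on this slide can be sketched with the Python standard library alone. The function names below are hypothetical, chosen for illustration, and the weekend rule (Saturday or Sunday) is an assumption.

```python
from datetime import datetime


def date_features(ts: str) -> dict:
    """Split a timestamp into hour, day, month, year and a weekend flag."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": dt.hour,
        "day": dt.day,
        "month": dt.month,
        "year": dt.year,
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": int(dt.weekday() >= 5),
    }


def bin_index(value: int, width: int = 10) -> int:
    """Map 1..10 to bin 0, 11..20 to bin 1, and so on."""
    return (value - 1) // width


def label_ids(values):
    """Assign a numeric identifier (1, 2, 3, ...) in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping) + 1)
    return [mapping[v] for v in values]


def one_hot(values):
    """Create one binary column per category (is it red? is it green? ...)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


colors = ["red", "green", "white", "red"]
print(date_features("2011-01-01 08:32:02"))
print(bin_index(7), bin_index(11))
print(label_ids(colors))
print(one_hot(colors)[0])
```

Numeric identifiers are compact but impose an artificial order on the categories; binary columns avoid that at the cost of more features.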
  • 21. Feature selection • The fewer the better (simpler model) • Keep the most valuable ones (ideally one :) • Features are (usually) dependent on each other, so be careful… (check empirically) • Faster
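One way to "check empirically" is greedy forward selection: add one feature at a time and keep it only if the validation score improves. The sketch below is an assumption about how this could be done, not the method used in the talk. The scoring function is a toy stand-in for training and validating a model, and the feature temp_copy is invented to show how a redundant feature gets left out.

```python
def forward_select(features, score, min_gain=1e-9):
    """Greedy forward selection: repeatedly add the feature that helps most."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Score every candidate extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda c: c[0])
        if top_score - best <= min_gain:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def toy_score(subset):
    """Hypothetical validation score (higher is better), invented for this sketch.

    'temp' and 'temp_copy' carry the same information, so only one of them
    adds value; every feature costs a small complexity penalty.
    """
    s = set(subset)
    gain = 0.5 * ("hour" in s) + 0.3 * ("temp" in s or "temp_copy" in s)
    return gain - 0.01 * len(s)


selected, best = forward_select(["hour", "temp", "temp_copy"], toy_score)
print(selected, round(best, 2))
```

In practice the score would come from cross-validation, which makes each step slower but guards against keeping features that only help by chance.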
  • 25. Model selection • Linear • Decision Tree • Random Forest • Gradient Boosting • Neural Network
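Choosing between candidate models usually comes down to comparing them on data they were not trained on. The sketch below is an illustration under stated assumptions: it uses k-fold cross-validation with RMSE, two deliberately tiny models (a mean baseline and a one-feature least-squares line) in place of the model families listed on the slide, and made-up data.

```python
import math


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-feature linear model fitted by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cv_rmse(fit, xs, ys, k=3):
    """k-fold cross-validated RMSE; fold i holds out every k-th sample."""
    n = len(xs)
    sq_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        sq_errors += [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Made-up data that is roughly linear, for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 2.9, 5.2, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2]

candidates = {"mean baseline": fit_mean, "linear": fit_line}
scores = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```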
  • 29. Ensemble trees • Bagging (bootstrap aggregation) • Random Forest • Extra Trees • Boosting • Gradient Boosting
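Bagging can be illustrated without any library: train each base model on a bootstrap sample (drawn with replacement) and average the predictions. The sketch below is a simplified assumption, using one-split regression trees ("stumps") on a single feature and made-up step data. Random Forest and Extra Trees add further randomization that is not shown here.

```python
import random


def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    # Excluding the largest value guarantees both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All x values in this sample are identical: fall back to the mean.
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregation: average stumps trained on resampled data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Made-up step-shaped data, for illustration only.
xs = list(range(10))
ys = [1.0] * 5 + [9.0] * 5

predict = bagging(xs, ys)
print(predict(0), predict(9))
```

Boosting differs in that models are trained sequentially, each one correcting the errors of those before it, rather than independently on resampled data.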
  • 30. XGBoost (Extreme Gradient Boosting) “When in doubt, use xgboost” Owen Zhang
  • 33. Hyperparameter tuning • Grid Search • Random Search • Bayesian
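Grid search and random search differ only in how candidate settings are generated, as the sketch below shows. Everything in it is an assumption made for illustration: the objective function stands in for "train a model and return its validation score", and the parameter names max_depth and learning_rate are hypothetical. Bayesian optimization, which uses earlier results to pick the next candidate, is not sketched.

```python
import itertools
import random


def objective(max_depth, learning_rate):
    """Hypothetical validation score (higher is better), invented for this sketch."""
    return -((max_depth - 6) ** 2) - 100 * (learning_rate - 0.1) ** 2


def grid_search(objective, grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(objective, space, n_iter=50, seed=0):
    """Evaluate n_iter randomly sampled settings."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
space = {
    "max_depth": lambda rng: rng.randint(2, 10),
    "learning_rate": lambda rng: rng.uniform(0.01, 0.3),
}

grid_best, grid_score = grid_search(objective, grid)
rand_best, rand_score = random_search(objective, space)
print(grid_best, grid_score)
print(rand_best, rand_score)
```

Grid search costs the product of all list lengths, so it grows quickly with the number of hyperparameters; random search keeps the budget fixed at n_iter evaluations.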
  • 40. MNIST
  • 41. Data
  • 50. The wave is already coming… are you ready?