Blend them all
Outline
● What is ensemble learning?
● Types of ensembles
● Blending
● Blending baking
What is ensemble learning?
Algorithms and methods that combine multiple ML models in order to
improve accuracy on the task at hand. The best-known approaches:
● Boosting
● Bagging
● Stacked generalization
● Blending
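Boosting
A way of composing weak models to obtain a strong one.
The models are trained sequentially, each one focusing on the errors of
the previous models.
● Robustness to outliers depends on the loss function (AdaBoost's exponential loss, for example, is sensitive to them)
● The models cannot be trained in parallel
● All models use the same set of features and data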
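The sequential idea can be sketched in a few lines. This is a minimal illustration, not a production algorithm: it assumes a 1-D regression task, squared error, and decision stumps as the weak models. The names `fit_stump`, `boost`, `n_rounds` and `learning_rate` are made up for the example.

```python
def fit_stump(xs, residuals):
    """Weak model: the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by the previous ones."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    history = []  # training MSE after each round
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        history.append(sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys))
    return predict, history


xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]
predict, history = boost(xs, ys)
```

Each round depends on the residuals of all earlier rounds, which is why the loop cannot be parallelised across models.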
Bagging
Bootstrap aggregation: a method of training several models on samples
drawn independently (with replacement) from the training set.
● The models are trained independently of each other
● The models may use different features
● Training can be run in parallel
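A minimal sketch of bagging, assuming a 1-D regression task with a least-squares line as the base model and plain averaging as the aggregation step. `fit_line`, `bagging`, `n_models` and `seed` are illustrative names, and the data is synthetic.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a * x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:  # degenerate bootstrap sample: fall back to a constant
        return lambda x: mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def bagging(points, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_line(sample))  # each fit is independent of the others
    return lambda x: sum(model(x) for model in models) / len(models)


# Synthetic data: y = 2x + 1 with a deterministic +/-0.5 wobble.
points = [(float(i), 2.0 * i + 1.0 + (0.5 if i % 2 == 0 else -0.5)) for i in range(20)]
model = bagging(points)
prediction = model(10.0)
```

Because no fit depends on another, the loop body could be handed to a process pool without changing the result.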
Stacking
0) Train the base models (generalizers)
1) Train a meta-algorithm (stacker) on their outputs
2) Produce the final result
At the meta-algorithm step you can train several meta-algorithms and
stack further levels of the same kind.
Cross-validation is used during training: each subsequent level is
trained on out-of-fold (OOF) predictions.
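The OOF mechanics can be sketched as follows. This is an illustration under stated assumptions: two toy base models (a constant and a least-squares line), folds assigned by row index, and a meta-algorithm reduced to a single blend weight found by grid search. All function names are hypothetical.

```python
def fit_mean(train):
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in train)
    a = sum((x - mean_x) * (y - mean_y) for x, y in train) / var_x
    return lambda x: a * (x - mean_x) + mean_y


def out_of_fold_predictions(data, fit, n_folds=5):
    """Predict every row with a model that never saw that row during training."""
    oof = [None] * len(data)
    for fold in range(n_folds):
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        model = fit(train)
        for i, (x, _) in enumerate(data):
            if i % n_folds == fold:
                oof[i] = model(x)
    return oof


def fit_blend_weight(preds_a, preds_b, targets, steps=100):
    """Meta-algorithm: the weight w in [0, 1] that minimises MSE of w*a + (1-w)*b."""
    def mse(w):
        return sum((w * a + (1 - w) * b - t) ** 2
                   for a, b, t in zip(preds_a, preds_b, targets)) / len(targets)
    return min((k / steps for k in range(steps + 1)), key=mse)


# Synthetic data: y = 3x + 2 with a deterministic +/-1 wobble.
data = [(float(i), 3.0 * i + 2.0 + (1.0 if i % 2 == 0 else -1.0)) for i in range(20)]
targets = [y for _, y in data]
oof_line = out_of_fold_predictions(data, fit_line)
oof_mean = out_of_fold_predictions(data, fit_mean)
weight = fit_blend_weight(oof_line, oof_mean, targets)  # weight given to the line model
```

The meta-algorithm only ever sees predictions made for rows the base model was not trained on, which is what keeps the second level from learning the base models' training error.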
Blending
The term came into use after the Netflix Prize competition (BigChaos was
one of the teams behind the winning solution). Netflix never deployed the
top model because of its complexity.
Same as stacking, but instead of training the levels on OOF predictions,
a portion of the data (the holdout) is always set aside, and the
meta-algorithm is trained on it.
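A minimal sketch of the holdout variant, assuming two toy base models (a least-squares line and a 1-nearest-neighbour lookup) and a single blend weight as the meta-algorithm. `blend`, `holdout_fraction` and the other names are made up for the example.

```python
import random


def fit_line(train):
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in train)
    a = sum((x - mean_x) * (y - mean_y) for x, y in train) / var_x
    return lambda x: a * (x - mean_x) + mean_y


def fit_nearest(train):
    # 1-nearest-neighbour: return the target of the closest training point.
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def blend(data, holdout_fraction=0.25, seed=0, steps=100):
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_holdout = max(1, int(len(rows) * holdout_fraction))
    holdout, train = rows[:n_holdout], rows[n_holdout:]

    # Level 0: base models see only the training part.
    line, nearest = fit_line(train), fit_nearest(train)

    # Level 1: the blend weight is chosen on the holdout only.
    def holdout_mse(w):
        return sum((w * line(x) + (1 - w) * nearest(x) - y) ** 2
                   for x, y in holdout) / len(holdout)

    w = min((k / steps for k in range(steps + 1)), key=holdout_mse)
    return (lambda x: w * line(x) + (1 - w) * nearest(x)), w


data = [(i / 4, (i / 4) ** 2) for i in range(40)]
predict, weight = blend(data)
```

Compared with the OOF approach there is one split and one fit per base model, at the cost of the meta-algorithm seeing only the holdout rows.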
Blending
Pros:
● Simpler than stacking (holdout vs. OOF)
● No data leakage between the stackers and the generalizers
● No need to reuse the same seed for cross-validation
Cons:
● Overfitting to the holdout
● Less data for training the stackers
Blending baking
● Prefer diverse models only
● Cross-validate the whole pipeline (with different holdouts)
● Rule of thumb, comparing the predictions of two models:
○ Pearson correlation < 0.95
○ Kolmogorov-Smirnov statistic > 0.05
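The rule of thumb could be checked with something like the sketch below. It assumes both statistics are computed between the prediction vectors of two models and that both thresholds must hold at once. The slide does not spell out either point, so treat `diverse_enough` as one possible reading.

```python
import bisect
import math


def pearson(a, b):
    """Pearson correlation between two equally long prediction vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sorted_a, sorted_b = sorted(a), sorted(b)
    gap = 0.0
    for value in sorted_a + sorted_b:
        cdf_a = bisect.bisect_right(sorted_a, value) / len(sorted_a)
        cdf_b = bisect.bisect_right(sorted_b, value) / len(sorted_b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def diverse_enough(a, b, max_corr=0.95, min_ks=0.05):
    # Assumption: a pair is worth blending only if both conditions hold.
    return pearson(a, b) < max_corr and ks_statistic(a, b) > min_ks


preds_a = [0.1, 0.2, 0.3, 0.4]
preds_c = [0.9, 0.1, 0.8, 0.2]
keep_pair = diverse_enough(preds_a, preds_c)
```

Two copies of the same model fail the check (correlation 1, KS statistic 0), which is the case the rule is meant to filter out.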
Thank you for your attention

[Data Science Meetup] Pavel Borobov: Blend them all
