Declan Barry is a highly experienced sales professional with over 30 years of experience in retail sales management roles across Ireland. He has worked up to Group Sales Director level, managing teams of up to 70 people. Currently seeking a new opportunity, his background includes increasing sales turnover from €40m to €70m as Sales Director for Robert Roberts Limited from 2000 to 2015. He has a proven track record of successful brand launches, restructurings, and customer relationship management.
The document is an undergraduate prospectus for the University of Manchester that provides information about the university to prospective students. It discusses reasons to choose Manchester such as its world-leading research, inspiring alumni network, diverse opportunities for students, and status as a top university in the Russell Group. The prospectus provides details on academic programs, campus life, student support services, and the city of Manchester. It aims to give applicants a well-rounded view of why Manchester could be an excellent choice for their undergraduate education.
This document provides a summary of Tarek Abdulbary Ibrahim's personal and professional experience. It includes his contact information, education history obtaining a Bachelor's degree from Alexandria University and certificates from the Arab Academy for Science, Technology and Maritime Transport. It details his current role as Senior Sales Executive and Admin Manager at Life Chemicals Group as well as previous sales experience. His computer skills and extracurricular skills focusing on customer service, communication, and teamwork are also highlighted. References and a statement of availability are provided at the end.
This document contains Atul Dabral's resume summary. It includes his contact information, present address, date of birth, languages known, objective, professional profile, academic qualifications and projects worked on. Atul has over 4 years of experience as a Software Developer working with technologies like C/C++. He currently works at Oracle India Pvt. Ltd. as a Member of Technical Staff and has previously worked at Alcatel Lucent India Pvt. Ltd. and Mascon Global Limited Pvt. Ltd. He has expertise in areas such as Operating Systems, Programming Languages, Scripting and Databases. Some of the projects he has worked on include DataBlitz, a main memory database management system and Exalogic
Great Apes Giving Day - Fundraising Training 2016Jordan Brown
As a Great Ape Sanctuary and as a fundraising ambassador you do not want to miss this training session geared towards supporting your fundraising efforts. We will cover your timeline, how to build your page, communication tips and more.
Didactics is the art and science of teaching. As an art, it refers to the teacher's ability to communicate topics clearly and stimulate learning. As a science, it studies the principles and methods of teaching systematically. General didactics deals with principles applicable to all disciplines, while special didactics focuses on one particular discipline. The objectives of didactics include making the teaching-learning process
Barramundi Cycling Company needs foreign exchange advice and hedging for an upcoming import payment from Australia to the US. The summary recommends:
1. Using a forward contract to lock in exchange rates and minimize risk.
2. Paying the initial 50% payment with a spot rate on 6/2/2016 and the remaining 50% at a forward rate on 12/1/2016 (Payment Scheme Option 1).
3. Obtaining direct guarantees for the advance and remaining payments to further mitigate risk.
The Big Data market is projected to grow significantly between 2014 and 2026, with professional services remaining the largest segment until 2022. While companies are interested in Big Data due to success stories and potential profit increases, it remains an overwhelming subject for many due to technical challenges and a lack of skilled professionals. To boost sales of Big Data solutions, one should transmit a sense of urgency by citing reports on its business benefits, provide direct competitor success stories, and quantify potential revenue and cost savings. It is also advised to customize examples for each client, start with low-cost initial implementations to overcome adoption barriers, and leverage an MBA's business skills for customer development, needs identification, project management, and expanding client relationships.
This document summarizes the key concepts of databases. It explains that a database is a set of data organized systematically to allow fast and efficient access to information. It details the types of databases, database design, and the roles and functions of database management systems, including access control, security and data consistency. It also describes the relational database model, based on relations between data sets organized in tables.
Patrick Asiedu has over 20 years of experience in structural engineering and project management. He holds an MSc in Structural Engineering and an MBA. His experience includes designing and assessing bridges, rail infrastructure, buildings, tunnels and other structures. Currently he is a Principal Engineer at Balfour Beatty Rail where he manages projects and teams of engineers. Some of his past projects include bridge assessments, station upgrades and infrastructure works for various rail clients.
This document analyzes the intensity of UK policy commitments to nuclear power compared to global trends favoring renewable energy. It hypothesizes that UK commitments may be influenced by aims to maintain nuclear submarine capabilities, an aspect unacknowledged in energy policy discussions. The paper tests this hypothesis by examining linkages between UK civil and military nuclear sectors, particularly during 2003-2006 when policy shifted from viewing nuclear power as "unattractive" to promoting a "nuclear renaissance." While many factors likely play a role, understanding the intensity of UK nuclear commitments may require considering commitments to nuclear submarines, which have remained invisible in energy policy debates.
This document provides a case study of Best Buy, the world's largest consumer electronics retailer. It outlines Best Buy's history of expansion through acquisitions like Geek Squad and strategies to gain market share. However, increased competition from online retailers like Amazon led to declining sales in the 2000s. In 2013, new CEO Hubert Joly implemented strategies like optimizing store layouts, increasing services like Geek Squad support, and expanding online and mobile offerings to transform the business.
This document provides instructions for editing the color of a background layer in Photoshop without affecting the foreground model layer. It describes using the quick selection tool to select just the background, then going to Image > Adjustments > Color Balance to add a color tint to the selected area while keeping shadows and highlights intact. The tone balance sliders can also be used to change the colors of shadows and highlights separately from the midtones.
This document is about water and pH. It explains the importance of water for living beings, its properties such as the ability to form hydrogen bonds, and its role as the main biological solvent. It also describes the dissociation of water into hydronium ions and the definition of acids, bases and pH in terms of hydronium ion concentration. Finally, it summarizes the various functions of water in the body.
The document discusses choosing Townsquare Media to distribute a new magazine called "RE-LEASED". Townsquare has experience distributing magazines, including recently acquiring XXL, one of the largest hip-hop magazines. As hip-hop fans often buy multiple related magazines, distributing "RE-LEASED" through Townsquare could increase sales of both it and XXL. While Bauer Media Group also distributes magazines, it does not have experience with hip-hop titles, so Townsquare is the better choice to help the new magazine gain exposure and succeed in the competitive magazine market. The magazine will be sold in supermarkets, online retailers, and WHSmith stores in airports to target its 15-24 year old audience.
Thriller trailers aim to keep audiences on the edge of their seats through suspense and anticipation of danger. They often feature guns, cars, and low-key lighting to set a dark, mysterious atmosphere. Teen thriller subgenres include action, crime, psychological, science fiction, and religious thrillers that incorporate themes of drama, mystery, and the mind. Trailers use techniques like close-ups, dramatic music, and low-key back lighting to convey a character's emotions and build suspense.
DataWeave is a new language for querying and transforming data that contains a data access layer enabling large payloads and random access without costly conversions. An example transforms a JSON file to XML using the DataWeave component in MuleSoft, which has input, DataWeave code, and output sections. The DataWeave code defines the mappings and output format, and changing the output type transforms the data to CSV or Java objects.
Nancy Pearson is a licensed and board certified adult nurse practitioner with over 14 years of experience caring for patients with chronic neurological conditions. She has extensive experience coordinating clinical trials including recruiting subjects, collecting data and specimens, and completing regulatory documents. Her qualifications include certifications in neuroscience nursing and human subjects protection. Her work history shows over 25 years of experience in both inpatient and outpatient settings, most recently as a nurse practitioner for an in-home assessment provider.
The document discusses the UK's Heritage White Paper from 2007 which proposed revisions to the country's heritage protection legislation. It aims to create a unified system for designating, managing and maintaining heritage sites and objects. The current system uses separate processes that the White Paper wants to consolidate. Key points of the new system include a single process for national designation, clearer selection criteria, and more public involvement. However, the document also analyzes some problems and limitations with the White Paper, such as how heritage is defined and its focus on procedural changes rather than addressing social issues.
Machine learning in e-commerce: practical use and hidden... Ontico
HighLoad++ 2017
Hall "Nairobi+Casablanca", 7 November, 16:00
Abstract:
http://www.highload.ru/2017/abstracts/2851.html
Analysis, design, development and operation of predictive analytics models in Bitrix24.
In this talk we describe how we built several high-load models for predicting paying customers, the potential profit from customers, and customers likely to leave the service. We share our experience of choosing algorithms and libraries, fine-tuning models in Spark MLlib, filtering and processing big data on Spark clusters in Amazon Web Services, and everything else needed to turn "predictive" models into a service that works under high load.
The most important part of the talk is the experience of bringing algorithms to applied business use, and the subtleties and techniques of squeezing the most valuable information out of the data.
3. Why do we need decision trees?
Why?
× Simple
× Weak
× Heuristic training (finding the optimal tree is NP-complete)
× → different trees for similar data
DM Labs
4. CART, history
• CART: "Classification and Regression Trees", Breiman, Friedman, Olshen & Stone
• CART was devised for kNN
http://www.youtube.com/watch?v=8hupHmBVvb0
• Alternative algorithms:
ID3
C4.5
ID5
CART
5. What is a tree made of?
Tree
• Nodes (splits): a condition
o Conditions are binary: >, ==
o Each condition involves only one variable
• Branches: True/False
• Leaves: constants
o Regression: a number
o Classification: a label/number
[Figure: example tree. Root node "Drank poison?" (bool), then "Cyanide?" (not polonium, not arsenic), then "More than 1 mg?" (numeric); the leaves read "OK" or "not OK".]
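The structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `predict` function, the variable names `drank_poison` and `dose_mg`, and the exact shape of the tree are assumptions loosely modelled on the poison example, not something given in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """Either a split (one variable, one threshold, two branches) or a leaf (a constant)."""
    feature: Optional[str] = None           # the single variable tested at this node
    threshold: Optional[float] = None       # the condition is row[feature] > threshold
    if_true: Optional["Node"] = None        # branch taken when the condition holds
    if_false: Optional["Node"] = None       # branch taken otherwise
    value: Union[str, float, None] = None   # constant stored in a leaf


def predict(node, row):
    """Walk from the root down to a leaf, testing one variable per node."""
    while node.value is None:
        node = node.if_true if row[node.feature] > node.threshold else node.if_false
    return node.value


# Hypothetical tree: a boolean split followed by a numeric split.
tree = Node(
    feature="drank_poison", threshold=0.5,
    if_true=Node(
        feature="dose_mg", threshold=1.0,
        if_true=Node(value="not ok"),
        if_false=Node(value="ok"),
    ),
    if_false=Node(value="ok"),
)
```

Calling `predict(tree, {"drank_poison": 1, "dose_mg": 2.0})` follows the True branch twice and returns the constant stored in that leaf.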
7. How do we grow a tree?
Learning?
• A tree is not differentiable
• Suppose we have 100 binary variables...
Should we bother?
• Greedy scheme
• Probably not the best tree, but good enough
8. How do we grow a tree?
1. How do we make splits?
2. When do we stop splitting?
~we have reached the leaves
3. What do we write into a leaf?
9. Where do splits come from?
How do we make splits?
• We need to be able to compare nodes to decide which is better to split. Define a measure.
• The measure should reach its maximum when all classes are equally represented in the node (it cannot get worse).
• It should be zero when everything in the node belongs to one class (it cannot get better).
10. Split (un)goodness measures
How do we make splits?
• We need to be able to compare nodes to decide which is better to split. Define an (un)goodness measure.
For class proportions $p = (p_1, p_2, \dots, p_n)$ in a node, with the convention $0 \log 0 = 0$:
1. Misclassification Rate
2. Entropy: $H(p) = -\sum_j p_j \log p_j$
3. Gini Index: $i(p) = \sum_{i \neq j} p_i p_j = 1 - \sum_j p_j^2$
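The three measures can be written directly from their definitions. The sketch below assumes `p` is a list of class proportions that sums to 1; the function names are illustrative, and the misclassification rate is taken to be one minus the share of the majority class.

```python
import math


def misclassification_rate(p):
    """Share of points that a majority-vote leaf would get wrong."""
    return 1.0 - max(p)


def entropy(p):
    """H(p) = -sum_j p_j log p_j, skipping zero terms (0 log 0 = 0)."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)


def gini(p):
    """i(p) = 1 - sum_j p_j^2."""
    return 1.0 - sum(pj * pj for pj in p)
```

All three are zero for a pure node such as `[1.0, 0.0]` and largest for an evenly mixed node such as `[0.5, 0.5]`, which is what the previous slide asks of a measure.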
12. Problems with Misclassification Rate
(Un)goodness measures
• The first one is the most natural, BUT:
o It may be that no split improves it.
[Figure: nodes labelled 60% A / 40% B under Split 1 and Split 2.]
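13. Problems with Misclassification Rate
(Un)goodness measures
• The first one is the most natural, BUT:
o It may be identical for two splits even though one of them is clearly better.
[Figure: a parent node with 400 A / 400 B and two candidate splits. The first gives children (300 A, 100 B) and (100 A, 300 B); the second gives children (200 A, 400 B) and (200 A, 0 B).]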
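A quick check of this example, assuming each child node predicts its majority class and the split's error is the size-weighted average of the child errors. The function names are illustrative; the counts are the ones on the slide.

```python
def node_error(n_a, n_b):
    """Fraction of points in a node that a majority-vote leaf gets wrong."""
    return min(n_a, n_b) / (n_a + n_b)


def split_error(children):
    """Misclassification rate of a split: child errors weighted by child size."""
    total = sum(n_a + n_b for n_a, n_b in children)
    return sum((n_a + n_b) / total * node_error(n_a, n_b) for n_a, n_b in children)


split_1 = [(300, 100), (100, 300)]  # (count of A, count of B) per child
split_2 = [(200, 400), (200, 0)]

print(split_error(split_1), split_error(split_2))
```

Both splits misclassify 200 of the 800 points, so the measure cannot tell them apart, even though the second split produces a pure node.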
14. Goodness of a tree
• The sum of the goodness values of the nodes, each multiplied by the % of points in that node
[Figure: the same two candidate splits of the 400 A / 400 B node: children (300 A, 100 B) and (100 A, 300 B), or children (200 A, 400 B) and (200 A, 0 B).]
16. Goodness of a tree
Gini Index of a node: 1 - p_A^2 - p_B^2. Contribution to the tree: the Gini Index multiplied by the node's share of all points.

Split   # points A   # points B   p_A    p_B    p_A^2    p_B^2    Gini Index   Contribution to tree
1       300          100          0.75   0.25   0.5625   0.0625   0.375        0.1875
1       100          300          0.25   0.75   0.0625   0.5625   0.375        0.1875
1       Total                                                                  0.375
2       200          400          0.33   0.67   0.1111   0.4444   0.4444       0.3333
2       200          0            1      0      1        0        0            0
2       Total                                                                  0.3333
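The totals in this table can be reproduced with a short calculation. This is a sketch for the two-class case only; `gini` and `tree_gini` are illustrative names.

```python
def gini(n_a, n_b):
    """Gini Index of a two-class node: 1 - p_A^2 - p_B^2."""
    total = n_a + n_b
    p_a, p_b = n_a / total, n_b / total
    return 1.0 - p_a ** 2 - p_b ** 2


def tree_gini(leaves):
    """Sum of the leaf Gini values, each weighted by the leaf's share of all points."""
    total = sum(n_a + n_b for n_a, n_b in leaves)
    return sum((n_a + n_b) / total * gini(n_a, n_b) for n_a, n_b in leaves)


split_1 = [(300, 100), (100, 300)]
split_2 = [(200, 400), (200, 0)]

print(tree_gini(split_1), tree_gini(split_2))
```

The first split scores 0.375 and the second about 0.3333, so unlike the misclassification rate, the Gini Index prefers the split that isolates a pure node.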
17. Building splits
• Generate candidate leaves:
o for all splits,
o for every variable,
o recursively
• Choose the split with the largest decrease in the Gini Index.
• Into the leaf goes the class label chosen by majority vote, or the mean
Построение сплитов
Построение сплитов
DM Labs
• Продолжаем процесс пока есть что сплитить
Тоесть, в листья осталось достаточно точек
• Лучше отрастить дерево, а потом сделать из него
аккуратный куст
o ...Если мы сразу не растили пень...
19. Toy example
• We need to choose which variable to split on
• Take both candidates, find the optimal split for each, and compare
• Write the majority vote / mean into the leaf
[Figure: scatter plot "Classifying A or B" of points labelled A and B; x axis from 0 to 8, y axis from 0 to 6.]
20. Toy example
• We get something like this (R package rpart)
• Each split is locally optimal
[Figure: "Plot showing how Tree works": the same scatter plot of A and B points (x from 0 to 8, y from 0 to 6) next to the fitted tree, with splits x < 2.808, y >= 2.343 and y >= 3.442 and leaves labelled A or B.]
23. Simplifying the tree
• The best depth is best checked with cross-validation.
[Figure: "Misclassification Rates": error rate (0 to 1) against the size of the tree (0 to 80), with one curve for the misclassification rate on the training set and one for the test set. Source: CART by Breiman et al.]
24. Simplifying the tree
• Check the best depth with cross-validation.
• A complexity parameter is introduced. Varying it leaves fewer leaves in the tree.
[Figure: cross-validated relative error (X-val Relative Error) against the complexity parameter cp, with the corresponding size of the tree along the top axis. Source: CART by Breiman et al.]
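For reference, the complexity parameter enters through the cost-complexity criterion of the CART book. The notation below is an assumption chosen for this note: $R(T)$ is the misclassification cost of tree $T$ on the training data, $|\tilde{T}|$ is its number of leaves, and $\alpha \ge 0$ is the complexity parameter.

```latex
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|
```

For each $\alpha$ one keeps the subtree that minimizes $R_\alpha(T)$: with $\alpha = 0$ the fully grown tree wins, and as $\alpha$ grows, each extra leaf has to pay for itself with a larger drop in $R(T)$, so fewer leaves survive. Cross-validation is then used to pick $\alpha$.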
25. Summary
• A tree is a good, easy-to-read analysis tool
• It will most likely not be the optimal one
• It is simple, so its predictive power may fall short
• It is simple, so it is fast to build and fast to use
• It is usually grown deep and then cut back according to some criterion
28. Why do we need RF?
Random forest
Random Forest = bagging with trees (Breiman et al.)
• A very simple algorithm
• Easy to parallelize
• Excellent accuracy (a Kaggle recipe)
• Fast (compared to a neural committee)
It implements the "wisdom of the crowd" principle.
29. The idea behind RF
• Build many models. Ideally they are different and trained on different datasets, with high variance and acceptable bias.
• Average them: both end up low.
• If the models' errors are uncorrelated, the error should shrink by a factor of M (relative to the models that make up the ensemble).
• In practice the errors are correlated, but the effect is still there
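The factor of M can be made precise under a simplifying assumption that is not stated on the slide: each of the $M$ models has an error $\varepsilon_m$ with zero mean, the same variance $\sigma^2$, and the same pairwise correlation $\rho$ with every other model. The variance of the averaged error is then

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M}\varepsilon_m\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{M}\,\sigma^2
```

With $\rho = 0$ this is $\sigma^2 / M$, the M-fold reduction for uncorrelated errors. With $\rho > 0$ the first term does not shrink as $M$ grows, which is why correlated trees help less and why making the models as different as possible matters.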
30. The RF algorithm
Random forest
Algorithm:
For i in 1..M:
1. Draw a bootstrap sample of the data.
2. Build a tree on it.
3. Add the resulting tree to the ensemble.
Usage:
1. Get the predictions of all M trees
2. Average them / take a majority vote
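The loop above can be sketched with the Python standard library alone. To keep it short, the "tree" here is a stand-in: a one-split stump on a single numeric variable. The stump, the function names and the fixed `seed` are assumptions made for illustration; they are not the trees used by any real random forest implementation.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Stand-in for 'build a tree': the one-split tree with the fewest training errors."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        # The sample held a single distinct x: always predict the majority label.
        label = Counter(ys).most_common(1)[0][0]
        return (float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    t, left_label, right_label = stump
    return left_label if x < t else right_label


def fit_forest(xs, ys, n_trees, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]                     # 1. bootstrap sample
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])  # 2. build a tree on it
        forest.append(stump)                                           # 3. add it to the ensemble
    return forest


def predict_forest(forest, x):
    """Collect the prediction of every tree and take the majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]
```

For regression the last step would average the predictions instead of voting. Because every tree is fitted independently of the others, the loop in `fit_forest` is the part that parallelizes easily.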
31. A tree in the forest
Random forest
Trees are used in different ways:
• Any algorithm that comes to hand
• ID5, for example, builds a "wider" grid that is not deep. Cubist is RF with ID5
• Grown "all the way": R randomForest
• Grown to a fixed level (without fussing over the depth)
32. History of ML methods
Random forest
State of the art for a wide class of problems:
• 1980s-90s: neural networks
• Early 2000s: SVM
• Late 2000s: Random Forest
• 2010s: Random Forest / GBM
• Now: Deep Learning?
It will soon be taken apart and plugged into RF