MACHINE LEARNING
FROM DISASTER
F#unctional Londoners @ Skills Matter
Phil Trelford 2013 @ptrelford
RMS Titanic
On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.
…there were not enough lifeboats for the passengers and crew.
…some groups of people were more likely to survive than others, such as women, children, and the upper-class.
Kaggle competition
Kaggle Titanic dataset
Files: train.csv, test.csv
First rows of train.csv:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S
12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S
13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S
14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S
15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S
16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55,0,0,248706,16,,S
17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S
19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C
21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S
22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S
23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q
24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S
25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S
26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S
27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C
28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S
DATA ANALYSIS
Titanic: Machine Learning from Disaster
FSharp.Data: CSV Provider
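The CSV type provider reads train.csv and generates a typed row for each passenger, so columns such as Sex, Pclass and Survived become properties. As a rough, dependency-free sketch of what that saves you (this is not the provider itself), the code below parses a few train.csv lines by hand into a record, coping with quoted names that contain commas and doubled quotes. The Passenger record and the parseCsvLine, parseFloat and toPassenger helpers are hypothetical, and only some columns are kept.

```fsharp
open System
open System.Globalization
open System.Text

/// Hypothetical stand-in for the row type the CSV provider would generate
type Passenger =
    { PassengerId: int
      Survived: int
      Pclass: int
      Name: string
      Sex: string
      Age: float          // nan when the field is empty
      Embarked: string }

/// Split one CSV line into fields, honouring quoted fields and "" escapes
let parseCsvLine (line: string) : string[] =
    let fields = ResizeArray<string>()
    let current = StringBuilder()
    let mutable inQuotes = false
    let mutable i = 0
    while i < line.Length do
        let c = line.[i]
        if inQuotes then
            if c = '"' && i + 1 < line.Length && line.[i + 1] = '"' then
                current.Append('"') |> ignore   // "" inside quotes is a literal quote
                i <- i + 1
            elif c = '"' then
                inQuotes <- false                // closing quote
            else
                current.Append(c) |> ignore
        elif c = '"' then
            inQuotes <- true                     // opening quote
        elif c = ',' then
            fields.Add(current.ToString())       // end of a field
            current.Clear() |> ignore
        else
            current.Append(c) |> ignore
        i <- i + 1
    fields.Add(current.ToString())
    fields.ToArray()

/// Empty fields become nan; other values are parsed with the invariant culture
let parseFloat (s: string) =
    if String.IsNullOrWhiteSpace s then nan
    else Double.Parse(s, CultureInfo.InvariantCulture)

/// Column positions follow the train.csv header
let toPassenger (line: string) =
    let f = parseCsvLine line
    { PassengerId = int f.[0]
      Survived = int f.[1]
      Pclass = int f.[2]
      Name = f.[3]
      Sex = f.[4]
      Age = parseFloat f.[5]
      Embarked = f.[11] }

let passengers =
    [| "1,0,3,\"Braund, Mr. Owen Harris\",male,22,1,0,A/5 21171,7.25,,S"
       "6,0,3,\"Moran, Mr. James\",male,,0,0,330877,8.4583,,Q"
       "23,1,3,\"McGowan, Miss. Anna \"\"Annie\"\"\",female,15,0,0,330923,8.0292,,Q" |]
    |> Array.map toPassenger
```

The provider infers these column types from the sample file, which is exactly the boilerplate above that you no longer have to write or keep in sync.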
Counting
let female (passenger:Passenger) = passenger.Sex = "female"
let survived (passenger:Passenger) = passenger.Survived = 1
let females = passengers |> Array.filter female
let femaleSurvivors = females |> tally survived
let femaleSurvivorsPc = females |> percentage survived
Tally Ho!
/// Tally up items that match specified criteria
let tally criteria items =
    items |> Array.filter criteria |> Array.length
/// Percentage of items that match specified criteria
let percentage criteria items =
    let total = items |> Array.length
    let count = items |> tally criteria
    float count * 100.0 / float total
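The counting helpers can be tried without the CSV file. This sketch assumes a cut-down Passenger record (a hypothetical stand-in for the provider's row type) and hard-codes PassengerId, Survived and Sex for the first 15 rows of train.csv; Array.filter selects the females, then tally and percentage count the survivors.

```fsharp
/// Hypothetical cut-down passenger record (the CSV provider generates every column)
type Passenger = { PassengerId: int; Survived: int; Sex: string }

/// Tally up items that match specified criteria
let tally criteria items =
    items |> Array.filter criteria |> Array.length

/// Percentage of items that match specified criteria
let percentage criteria items =
    let total = items |> Array.length
    let count = items |> tally criteria
    float count * 100.0 / float total

// (PassengerId, Survived, Sex) for rows 1-15 of train.csv
let passengers =
    [| 1, 0, "male";    2, 1, "female";  3, 1, "female";  4, 1, "female";  5, 0, "male"
       6, 0, "male";    7, 0, "male";    8, 0, "male";    9, 1, "female"; 10, 1, "female"
       11, 1, "female"; 12, 1, "female"; 13, 0, "male";  14, 0, "male";   15, 0, "female" |]
    |> Array.map (fun (pid, s, sex) -> { PassengerId = pid; Survived = s; Sex = sex })

let female (passenger: Passenger) = passenger.Sex = "female"
let survived (passenger: Passenger) = passenger.Survived = 1

let females = passengers |> Array.filter female
let femaleSurvivors = females |> tally survived
let femaleSurvivorsPc = females |> percentage survived

// prints: 7 of 8 female passengers survived (87.5%)
printfn "%d of %d female passengers survived (%.1f%%)" femaleSurvivors females.Length femaleSurvivorsPc
```

Keeping `tally` and `percentage` generic over the predicate means the same two functions serve every question asked of the data in the slides that follow.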
Survival rate
/// Survival rate (percentage) of each group produced by the criteria
let survivalRate criteria =
    passengers
    |> Array.groupBy criteria
    |> Array.map (fun (key, matching) ->
        key, matching |> percentage survived)
let embarked = survivalRate (fun p -> p.Embarked)
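Here `Array.groupBy` buckets passengers by whatever key the criteria function returns, and `percentage survived` is then applied within each bucket. The sketch below runs it on the Survived and Embarked columns of the 28 rows shown earlier, using a hypothetical cut-down Passenger record in place of the provider's row type.

```fsharp
/// Hypothetical cut-down passenger record
type Passenger = { Survived: int; Embarked: string }

let tally criteria items =
    items |> Array.filter criteria |> Array.length

let percentage criteria items =
    let total = items |> Array.length
    let count = items |> tally criteria
    float count * 100.0 / float total

let survived (passenger: Passenger) = passenger.Survived = 1

// (Survived, Embarked) for rows 1-28 of train.csv
let passengers =
    [| 0, "S"; 1, "C"; 1, "S"; 1, "S"; 0, "S"; 0, "Q"; 0, "S"; 0, "S"; 1, "S"; 1, "C"
       1, "S"; 1, "S"; 0, "S"; 0, "S"; 0, "S"; 1, "S"; 0, "Q"; 1, "S"; 0, "S"; 1, "C"
       0, "S"; 1, "S"; 1, "Q"; 1, "S"; 0, "S"; 1, "S"; 0, "C"; 0, "S" |]
    |> Array.map (fun (s, port) -> { Survived = s; Embarked = port })

/// Survival rate (percentage) of each group produced by the criteria
let survivalRate criteria =
    passengers
    |> Array.groupBy criteria
    |> Array.map (fun (key, matching) -> key, matching |> percentage survived)

let embarked = survivalRate (fun p -> p.Embarked)
embarked |> Array.iter (fun (port, rate) -> printfn "%s: %.1f%%" port rate)
```

On these 28 rows Cherbourg comes out at 75.0%, Southampton at about 47.6% and Queenstown at about 33.3%; a sample this small says little about the full data set.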
Score
let score f = passengers |> percentage (fun p -> f p = survived p)
let rate = score (fun p -> (child p || female p) && not (p.Pclass = 3))
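`score` treats a predicate as a classifier and reports the percentage of passengers whose prediction matches the recorded outcome. The slide does not define `child`, so this sketch assumes "younger than 16" purely for illustration, and evaluates the rule on the first ten rows of train.csv next to a baseline that predicts nobody survives. The Passenger record is a hypothetical cut-down row type.

```fsharp
/// Hypothetical cut-down passenger record
type Passenger = { Survived: int; Pclass: int; Sex: string; Age: float }

let tally criteria items =
    items |> Array.filter criteria |> Array.length

let percentage criteria items =
    let total = items |> Array.length
    let count = items |> tally criteria
    float count * 100.0 / float total

let female (p: Passenger) = p.Sex = "female"
let survived (p: Passenger) = p.Survived = 1

/// Assumed definition (not from the slides): a child is younger than 16.
/// A missing age is nan, and nan < 16.0 is false, so it counts as an adult.
let child (p: Passenger) = p.Age < 16.0

// (Survived, Pclass, Sex, Age) for rows 1-10 of train.csv; row 6 has no age
let passengers =
    [| 0, 3, "male", 22.0;   1, 1, "female", 38.0; 1, 3, "female", 26.0
       1, 1, "female", 35.0; 0, 3, "male", 35.0;   0, 3, "male", nan
       0, 1, "male", 54.0;   0, 3, "male", 2.0;    1, 3, "female", 27.0
       1, 2, "female", 14.0 |]
    |> Array.map (fun (s, c, sex, age) -> { Survived = s; Pclass = c; Sex = sex; Age = age })

/// Percentage of passengers whose predicted outcome matches the recorded one
let score f = passengers |> percentage (fun p -> f p = survived p)

let rate = score (fun p -> (child p || female p) && not (p.Pclass = 3))
let baseline = score (fun _ -> false)   // predict that nobody survives

printfn "rule: %.1f%%, baseline: %.1f%%" rate baseline
```

On these ten rows the rule scores 80% against 50% for the baseline; its two misses are third-class women who survived, which the `not (p.Pclass = 3)` clause rules out.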
MACHINE LEARNING
Titanic: Machine Learning from Disaster
20 Questions
The game suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is at most 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, 20 questions will allow the questioner to distinguish between 2^20 or 1,048,576 objects.
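The arithmetic behind the 20 Questions claim can be checked directly. This sketch (the `log2`, `entropy` and `questionsNeeded` names are illustrative) computes Shannon entropy in bits from outcome counts, and counts how many halvings it takes to narrow N candidates down to one. Shannon entropy is also the impurity measure used by the ID3-style decision trees in Machine Learning in Action.

```fsharp
/// Logarithm to base 2
let log2 x = log x / log 2.0

/// Shannon entropy, in bits, of a distribution given as outcome counts
let entropy (counts: int[]) =
    let total = float (Array.sum counts)
    counts
    |> Array.filter (fun c -> c > 0)
    |> Array.sumBy (fun c ->
        let p = float c / total
        p * log2 (1.0 / p))

/// Yes/no questions needed if every answer rules out half of the remaining candidates
let questionsNeeded (objects: int) =
    let mutable remaining = objects
    let mutable questions = 0
    while remaining > 1 do
        remaining <- (remaining + 1) / 2   // in the worst case the larger half remains
        questions <- questions + 1
    questions

let objects = pown 2 20   // 1,048,576
printfn "%d objects need %d questions" objects (questionsNeeded objects)
```

For N equally likely objects the entropy is log2 N bits, so 2^20 objects carry exactly 20 bits and a single extra object pushes the worst case to 21 questions.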
Decision Trees
A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning.
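Recursive partitioning needs a rule for which attribute to test at each node. The ID3 code in Machine Learning in Action picks the attribute with the highest information gain, the drop in Shannon entropy after the split. The sketch below is an F# rendering of that idea under these assumptions: rows use the slides' obj[] layout with the class label in the last column, the helper names are illustrative, and the toy data is [sex; class; survived] for the first ten rows of train.csv.

```fsharp
let log2 x = log x / log 2.0

/// Shannon entropy (bits) of the class labels, stored in the last column of each row
let entropy (dataSet: obj[][]) =
    let total = float dataSet.Length
    dataSet
    |> Array.countBy (fun row -> row.[row.Length - 1])
    |> Array.sumBy (fun (_, count) ->
        let p = float count / total
        p * log2 (1.0 / p))

/// Entropy before the split minus the size-weighted entropy of the subsets after it
let informationGain (dataSet: obj[][]) (axis: int) =
    let total = float dataSet.Length
    let remainder =
        dataSet
        |> Array.groupBy (fun row -> row.[axis])
        |> Array.sumBy (fun (_, subset) -> float subset.Length / total * entropy subset)
    entropy dataSet - remainder

/// Index of the feature column whose split gives the largest information gain
let bestFeatureToSplit (dataSet: obj[][]) =
    let featureCount = dataSet.[0].Length - 1
    [| 0 .. featureCount - 1 |] |> Array.maxBy (informationGain dataSet)

/// Build one [sex; class; survived] row
let row (sex: string) (pclass: int) (survived: bool) : obj[] =
    [| box sex; box pclass; box survived |]

// Rows 1-10 of train.csv
let dataSet =
    [| row "male" 3 false; row "female" 1 true; row "female" 3 true
       row "female" 1 true; row "male" 3 false; row "male" 3 false
       row "male" 1 false; row "male" 3 false; row "female" 3 true
       row "female" 2 true |]

let sexGain = informationGain dataSet 0
let classGain = informationGain dataSet 1
printfn "gain(sex) = %.3f, gain(class) = %.3f" sexGain classGain
```

In this tiny sample sex separates survivors perfectly (a gain of 1 bit), while class leaves both the first- and third-class groups mixed, so sex becomes the first question the tree asks.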
Split data set (from ML in Action)
Python
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet
F#
let splitDataSet(dataSet, axis, value) =
    [| for featVec in dataSet do
           if featVec.[axis] = value then
               yield featVec |> Array.removeAt axis |]
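`Array.removeAt` is only available in FSharp.Core 6.0 and later. A hypothetical `removeAt` helper built from `Array.take` and `Array.skip` mirrors the Python slicing `featVec[:axis] + featVec[axis+1:]` more literally and also works on older FSharp.Core versions; the sketch below uses it with the same `splitDataSet` shape on three boxed [sex; class; survived] rows.

```fsharp
/// Drop the element at index axis: featVec[:axis] + featVec[axis+1:] in Python terms
let removeAt (axis: int) (featVec: 'T[]) : 'T[] =
    Array.append (Array.take axis featVec) (Array.skip (axis + 1) featVec)

/// Rows whose feature at axis equals value, with that feature column removed
let splitDataSet (dataSet: obj[][], axis: int, value: obj) : obj[][] =
    [| for featVec in dataSet do
           if featVec.[axis] = value then
               yield removeAt axis featVec |]

let dataSet : obj[][] =
    [| [| box "male"; box 3; box false |]
       [| box "female"; box 1; box true |]
       [| box "female"; box 3; box true |] |]

// Keep the female rows and drop the sex column: [| [| 1; true |]; [| 3; true |] |]
let females = splitDataSet (dataSet, 0, box "female")
printfn "%A" females
```

Because the features are boxed, `featVec.[axis] = value` relies on `obj` equality, which calls `Equals` on the underlying string, int or bool.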
Decision Tree
let labels =
    [|"sex"; "class"|]
let features (p:Passenger) : obj[] =
    [|box p.Sex; box p.Pclass|]
let dataSet : obj[][] =
    [|for passenger in passengers ->
        [|yield! features passenger; yield box (passenger.Survived = 1)|] |]
let tree = createTree(dataSet, labels)
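`createTree` itself is not shown on the slide. The sketch below is one plausible ID3-style version, an assumed port of the Machine Learning in Action approach rather than the talk's actual code: stop when every row has the same label, fall back to the majority label when no features remain, otherwise split on the feature with the highest information gain and recurse. The `Tree` type, with `Leaf` and `Branch` cases, is assumed to match the cases that `classify` pattern-matches on.

```fsharp
/// Decision tree: a leaf holds a class label; a branch tests one named feature
type Tree =
    | Leaf of obj
    | Branch of string * (obj * Tree)[]

let log2 x = log x / log 2.0

/// Shannon entropy (bits) of the class labels in the last column
let entropy (dataSet: obj[][]) =
    let total = float dataSet.Length
    dataSet
    |> Array.countBy (fun row -> row.[row.Length - 1])
    |> Array.sumBy (fun (_, count) ->
        let p = float count / total
        p * log2 (1.0 / p))

let informationGain (dataSet: obj[][]) (axis: int) =
    let total = float dataSet.Length
    let remainder =
        dataSet
        |> Array.groupBy (fun row -> row.[axis])
        |> Array.sumBy (fun (_, subset) -> float subset.Length / total * entropy subset)
    entropy dataSet - remainder

let bestFeatureToSplit (dataSet: obj[][]) =
    [| 0 .. dataSet.[0].Length - 2 |] |> Array.maxBy (informationGain dataSet)

let removeAt (axis: int) (xs: 'T[]) =
    Array.append (Array.take axis xs) (Array.skip (axis + 1) xs)

let splitDataSet (dataSet: obj[][], axis: int, value: obj) =
    [| for featVec in dataSet do
           if featVec.[axis] = value then
               yield removeAt axis featVec |]

/// ID3-style recursive partitioning
let rec createTree (dataSet: obj[][], labels: string[]) : Tree =
    let classes = dataSet |> Array.map (fun row -> row.[row.Length - 1])
    if classes |> Array.forall (fun c -> c = classes.[0]) then
        Leaf classes.[0]                                   // every row agrees
    elif dataSet.[0].Length = 1 then
        // no features left to test: use the majority class
        classes |> Array.countBy id |> Array.maxBy snd |> fst |> Leaf
    else
        let best = bestFeatureToSplit dataSet
        let remainingLabels = removeAt best labels
        let branches =
            dataSet
            |> Array.map (fun row -> row.[best])
            |> Array.distinct
            |> Array.map (fun value ->
                value, createTree (splitDataSet (dataSet, best, value), remainingLabels))
        Branch (labels.[best], branches)

let row (sex: string) (pclass: int) (survived: bool) : obj[] =
    [| box sex; box pclass; box survived |]

// [sex; class; survived] for rows 1-10 of train.csv
let dataSet =
    [| row "male" 3 false; row "female" 1 true; row "female" 3 true
       row "female" 1 true; row "male" 3 false; row "male" 3 false
       row "male" 1 false; row "male" 3 false; row "female" 3 true
       row "female" 2 true |]

let tree = createTree (dataSet, [| "sex"; "class" |])
printfn "%A" tree
```

On these ten rows the tree collapses to a single question on sex. With more data the recursion keeps splitting until every leaf is pure, which is how a tree ends up memorising noise in the training set.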
Overfitting
CLASSIFY
Titanic: Machine Learning from Disaster
Decision Tree: Create -> Classify
let rec classify(inputTree, featLabels:string[], testVec:obj[]) =
    match inputTree with
    | Leaf(x) -> x
    | Branch(s,xs) ->
        let featIndex = featLabels |> Array.findIndex ((=) s)
        xs |> Array.pick (fun (value,tree) ->
            if testVec.[featIndex] = value
            then classify(tree, featLabels, testVec) |> Some
            else None)
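To see `classify` walk a tree, the sketch below assumes a `Tree` type with the `Leaf` and `Branch` cases it matches on and a small hand-built, purely hypothetical tree (split on sex, then split females on class). It also shows a caveat: `Array.pick` throws `KeyNotFoundException` when the test vector holds a feature value never seen during training, so a hypothetical `tryClassify` variant built on `Array.tryPick` returns `None` instead.

```fsharp
/// Assumed tree shape matching the Leaf/Branch cases used by classify
type Tree =
    | Leaf of obj
    | Branch of string * (obj * Tree)[]

let rec classify (inputTree: Tree, featLabels: string[], testVec: obj[]) : obj =
    match inputTree with
    | Leaf x -> x
    | Branch (s, xs) ->
        let featIndex = featLabels |> Array.findIndex ((=) s)
        xs |> Array.pick (fun (value, tree) ->
            if testVec.[featIndex] = value
            then classify (tree, featLabels, testVec) |> Some
            else None)

/// Same walk, but returns None instead of throwing on a value with no matching branch
let rec tryClassify (inputTree: Tree, featLabels: string[], testVec: obj[]) : obj option =
    match inputTree with
    | Leaf x -> Some x
    | Branch (s, xs) ->
        let featIndex = featLabels |> Array.findIndex ((=) s)
        xs |> Array.tryPick (fun (value, tree) ->
            if testVec.[featIndex] = value
            then tryClassify (tree, featLabels, testVec)
            else None)

// Hypothetical hand-built tree: split on sex, then split females on class
let femaleTree =
    Branch ("class", [| (box 1, Leaf (box true)); (box 2, Leaf (box true)); (box 3, Leaf (box false)) |])
let tree =
    Branch ("sex", [| (box "male", Leaf (box false)); (box "female", femaleTree) |])
let labels = [| "sex"; "class" |]

let firstClassWoman = classify (tree, labels, [| box "female"; box 1 |])   // true
let thirdClassWoman = classify (tree, labels, [| box "female"; box 3 |])   // false
let unseenClass = tryClassify (tree, labels, [| box "female"; box 4 |])    // None
```

Looking features up by label rather than by position lets the test vector keep its full column order even though each branch in the tree removed a column while it was being built.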
Titanic Data
Variable   Description
survival   Survival (0 = No; 1 = Yes)
pclass     Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name       Name
sex        Sex
age        Age
sibsp      Number of Siblings/Spouses Aboard
parch      Number of Parents/Children Aboard
ticket     Ticket Number
fare       Passenger Fare
cabin      Cabin
embarked   Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
Tips:
* Empty floats are read as Double.NaN
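The empty-floats tip matters in practice, because NaN compares unequal to everything, including itself, and silently poisons aggregates. This sketch uses the ages from the first ten rows of train.csv, where row 6 (Moran, Mr. James) has no age. Filling the gap with the mean of the known ages is an assumption for illustration, not something the talk prescribes.

```fsharp
open System

// Ages from rows 1-10 of train.csv; the empty age in row 6 is represented as nan
let ages = [| 22.0; 38.0; 26.0; 35.0; 35.0; nan; 54.0; 2.0; 27.0; 14.0 |]

// nan never equals anything, so an equality test finds no missing values
let viaEquality = ages |> Array.filter (fun a -> a = nan) |> Array.length      // 0
let missing = ages |> Array.filter (fun a -> Double.IsNaN a) |> Array.length   // 1

// A single nan turns the mean into nan
let naiveMean = Array.average ages

// Assumed fix for illustration: replace missing ages with the mean of the known ones
let knownAges = ages |> Array.filter (fun a -> not (Double.IsNaN a))
let meanAge = Array.average knownAges
let imputed = ages |> Array.map (fun a -> if Double.IsNaN a then meanAge else a)

printfn "missing: %d, naive mean: %f, mean of known ages: %.2f" missing naiveMean meanAge
```

Comparisons such as `age < 16.0` are also false for NaN, so a missing age quietly falls on the "no" side of any threshold test unless it is handled explicitly.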
RESOURCES
Titanic: Machine Learning from Disaster
Special thanks!
◦ Matthias Brandewinder for the Machine Learning samples
◦ http://www.clear-lines.com/blog/
◦ Tomas Petricek & Gustavo Guerra for FSharp.Data library
◦ http://fsharp.github.io/FSharp.Data/
◦ F# Team for Type Providers
◦ http://blogs.msdn.com/b/dsyme/archive/2013/01/30/twelve-type-providers-in-pictures.aspx
◦ Peter Harrington for the Machine Learning in Action code samples
◦ http://www.manning.com/pharrington/
◦ Kaggle for the Titanic data set
◦ http://www.kaggle.com/c/titanic-gettingStarted
Machine Learning Job Trends
(chart; source: indeed.co.uk)
What next?
F# Machine Learning information
◦ http://fsharp.org/machine-learning/
Random Forests
◦ http://tinyurl.com/randomforests
Progressive F# Tutorials
◦ http://skillsmatter.com/event/scala/progressive-f-tutorials-2013

Editor's Notes

  1. http://www.kaggle.com/c/titanic-gettingStarted
  2. http://www.kaggle.com/c/titanic-gettingStarted
  3. http://www.kaggle.com/c/titanic-gettingStarted/data
  4. http://fsharp.github.io/FSharp.Data/library/CsvProvider.html ; http://clear-lines.com/blog/post/Random-Forest-classification-in-F-first-cut.aspx
  5. https://en.wikipedia.org/wiki/Twenty_Questions
  6. http://en.wikipedia.org/wiki/Decision_tree_learning
  7. http://en.wikipedia.org/wiki/Overfitting
  8. http://en.wikipedia.org/wiki/Decision_tree_learning ; http://clear-lines.com/blog/post/Decision-Tree-classification.aspx
  9. http://www.kaggle.com/c/titanic-gettingStarted/data
  10. http://www.indeed.com/jobanalytics/jobtrends?q=machine+learning&l=