2. RMS Titanic
On April 15, 1912, during
her maiden voyage, the
Titanic sank after colliding
with an iceberg, killing
1502 out of 2224
passengers and crew.
…there were not enough
lifeboats for the
passengers and crew.
…some groups of people
were more likely to survive
than others, such as
women, children, and the
upper-class.
4. Kaggle
Titanic
dataset
train.csv
test.csv
PassengerIdSurvived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 0 3 Braund, Mr. Owen Harrismale 22 1 0 A/5 21171 7.25 S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Lainafemale 26 0 0 STON/O2. 3101282 7.925 S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)female 35 1 0 113803 53.1 C123 S
5 0 3 Allen, Mr. William Henrymale 35 0 0 373450 8.05 S
6 0 3 Moran, Mr. Jamesmale 0 0 330877 8.4583 Q
7 0 1 McCarthy, Mr. Timothy Jmale 54 0 0 17463 51.8625 E46 S
8 0 3 Palsson, Master. Gosta Leonardmale 2 3 1 349909 21.075 S
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)female 27 0 2 347742 11.1333 S
10 1 2 Nasser, Mrs. Nicholas (Adele Achem)female 14 1 0 237736 30.0708 C
11 1 3 Sandstrom, Miss. Marguerite Rutfemale 4 1 1 PP 9549 16.7 G6 S
12 1 1 Bonnell, Miss. Elizabethfemale 58 0 0 113783 26.55 C103 S
13 0 3 Saundercock, Mr. William Henrymale 20 0 0 A/5. 2151 8.05 S
14 0 3 Andersson, Mr. Anders Johanmale 39 1 5 347082 31.275 S
15 0 3 Vestrom, Miss. Hulda Amanda Adolfinafemale 14 0 0 350406 7.8542 S
16 1 2 Hewlett, Mrs. (Mary D Kingcome)female 55 0 0 248706 16 S
17 0 3 Rice, Master. Eugenemale 2 4 1 382652 29.125 Q
18 1 2 Williams, Mr. Charles Eugenemale 0 0 244373 13 S
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)female 31 1 0 345763 18 S
20 1 3 Masselmani, Mrs. Fatimafemale 0 0 2649 7.225 C
21 0 2 Fynney, Mr. Joseph Jmale 35 0 0 239865 26 S
22 1 2 Beesley, Mr. Lawrencemale 34 0 0 248698 13 D56 S
23 1 3 McGowan, Miss. Anna "Annie"female 15 0 0 330923 8.0292 Q
24 1 1 Sloper, Mr. William Thompsonmale 28 0 0 113788 35.5 A6 S
25 0 3 Palsson, Miss. Torborg Danirafemale 8 3 1 349909 21.075 S
26 1 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)female 38 1 5 347077 31.3875 S
27 0 3 Emir, Mr. Farred Chehabmale 0 0 2631 7.225 C
28 0 1 Fortune, Mr. Charles Alexandermale 19 3 2 19950 263 C23 C25 C27 S
5. Titanic Data
Variable Description
survival Survival (0 = No; 1 = Yes)
pclass Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name Name
sex Sex
age Age
sibsp Number of Siblings/Spouses Aboard
parch Number of Parents/Children Aboard
ticket Ticket Number
fare Passenger Fare
cabin Cabin
embarked Port of Embarkation
(C = Cherbourg; Q = Queenstown; S =
Southampton)
Tips:
* Empty floats -
Double.Nan
8. Counting
let female (passenger:Passenger) = passenger.Sex = “female”
let survived (passenger:Passenger) = passenger.Survived = 1
let females = passengers |> where female
let femaleSurvivors = females |> tally survived
let femaleSurvivorsPc = females |> percentage survived
9. Tally Ho!
/// Tally up items that match specified criteria
let tally criteria items =
items |> Array.filter criteria |> Array.length
/// Percentage of items that match specified criteria
let percentage criteria items =
let total = items |> Array.length
let count = items |> tally criteria
float count * 100.0 / float total
10. Survival rate
/// Survival rate of a criteria’s group
let survivalRate criteria =
passengers |> Array.groupBy criteria
|> Array.map (fun (key,matching) ->
key, matching |> Array.percentage survived
)
let embarked = survivalRate (fun p -> p.Embarked)
11. Score
let score f = passengers |> Array.percentage (fun p -> f p = p.Survived)
let rate = score (fun p -> (child p || female p) && not (p.Class = 3))
13. 20 Questions
The game suggests that the
information (as measured
by Shannon's entropy statisti
c) required to identify an
arbitrary object is at most
20 bits. The game is often
used as an example when
teaching people
about information theory.
Mathematically, if each
question is structured to
eliminate half the objects,
20 questions will allow the
questioner to distinguish
between 220 or 1,048,576
objects.
14. Decision
Trees
A tree can be "learned"
by splitting the
source set into subsets
based on an attribute
value test. This process is
repeated on each
derived subset in a
recursive manner
called recursive
partitioning.
15. Split data set (from ML in Action)
Python
def splitDataSet(dataSet, axis, value):
retDataSet = []
for featVec in dataSet:
if featVec[axis] == value:
reducedFeatVec = featVec[:axis]
reducedFeatVec.extend(featVec[axis+1:])
retDataSet.append(reducedFeatVec)
return retDataSet
F#
let splitDataSet(dataSet, axis, value) =
[|for featVec in dataSet do
if featVec.[axis] = value then
yield featVec |> Array.removeAt axis|]
16. Decision
Tree
let labels =
[|"sex"; "class"|]
let features (p:Passenger) : obj[] =
[|p.Sex; p.Pclass|]
let dataSet : obj[][] =
[|for passenger in passengers ->
[|yield! features passenger;
yield box (p.Survived = 1)|] |]
let tree = createTree(dataSet, labels)
21. Special thanks!
◦ Matthias Brandewinder for the Machine Learning samples
◦ http://www.clear-lines.com/blog/
◦ Tomas Petricek & Gustavo Guerra for the FSharp.Data library
◦ http://fsharp.github.io/FSharp.Data/
◦ F# Team for Type Providers
◦ http://blogs.msdn.com/b/dsyme/archive/2013/01/30/twelve-type-providers-in-pictures.aspx
◦ Peter Harrington for the Machine Learning in Action code samples
◦ http://www.manning.com/pharrington/
◦ Kaggle for the Titanic data set
◦ http://www.kaggle.com/c/titanic-gettingStarted