Machine learning from disaster

Slide links

  • http://www.kaggle.com/c/titanic-gettingStarted
  • http://www.kaggle.com/c/titanic-gettingStarted
  • http://www.kaggle.com/c/titanic-gettingStarted/data
  • http://fsharp.github.io/FSharp.Data/library/CsvProvider.html
  • http://clear-lines.com/blog/post/Random-Forest-classification-in-F-first-cut.aspx
  • https://en.wikipedia.org/wiki/Twenty_Questions
  • http://en.wikipedia.org/wiki/Decision_tree_learning
  • http://en.wikipedia.org/wiki/Overfitting
  • http://en.wikipedia.org/wiki/Decision_tree_learning
  • http://clear-lines.com/blog/post/Decision-Tree-classification.aspx
  • http://www.kaggle.com/c/titanic-gettingStarted/data
  • http://www.indeed.com/jobanalytics/jobtrends?q=machine+learning&l=

Transcript

  • 1. MACHINE LEARNING FROM DISASTER F#unctional Londoners @ Skills Matter Phil Trelford 2013 @ptrelford
  • 2. RMS Titanic On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. …there were not enough lifeboats for the passengers and crew. …some groups of people were more likely to survive than others, such as women, children, and the upper-class.
  • 3. Kaggle competition
  • 4. Kaggle Titanic dataset: train.csv / test.csv. First rows of train.csv:

    PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
    1  | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | | S
    2  | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
    3  | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S
    4  | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S
    5  | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S
    6  | 0 | 3 | Moran, Mr. James | male | | 0 | 0 | 330877 | 8.4583 | | Q
    7  | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S
    8  | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.075 | | S
    9  | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S
    10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | | C
    11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7 | G6 | S
    12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.55 | C103 | S
    13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.05 | | S
    14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.275 | | S
    15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | | S
    16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16 | | S
    17 | 0 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.125 | | Q
    18 | 1 | 2 | Williams, Mr. Charles Eugene | male | | 0 | 0 | 244373 | 13 | | S
    19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | female | 31 | 1 | 0 | 345763 | 18 | | S
    20 | 1 | 3 | Masselmani, Mrs. Fatima | female | | 0 | 0 | 2649 | 7.225 | | C
    21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35 | 0 | 0 | 239865 | 26 | | S
    22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34 | 0 | 0 | 248698 | 13 | D56 | S
    23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15 | 0 | 0 | 330923 | 8.0292 | | Q
    24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28 | 0 | 0 | 113788 | 35.5 | A6 | S
    25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8 | 3 | 1 | 349909 | 21.075 | | S
    26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) | female | 38 | 1 | 5 | 347077 | 31.3875 | | S
    27 | 0 | 3 | Emir, Mr. Farred Chehab | male | | 0 | 0 | 2631 | 7.225 | | C
    28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263 | C23 C25 C27 | S
  • 5. DATA ANALYSIS Titanic: Machine Learning from Disaster
  • 6. FSharp.Data: CSV Provider
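    Slide 6 only names the CSV type provider; a minimal sketch of how it is typically wired up for this data set (the file path and the Passenger alias are assumptions, not the deck's exact code):

    // Requires a reference to the FSharp.Data package.
    open FSharp.Data

    // The provider reads train.csv at compile time and infers column names and
    // types, so fields like Sex, Age and Survived are statically typed below.
    type Titanic = CsvProvider<"train.csv">
    type Passenger = Titanic.Row

    let passengers : Passenger[] =
        Titanic.GetSample().Rows |> Seq.toArray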
  • 7. Counting
    let female (passenger:Passenger) = passenger.Sex = "female"
    let survived (passenger:Passenger) = passenger.Survived = 1

    let females = passengers |> Array.filter female
    let femaleSurvivors = females |> tally survived
    let femaleSurvivorsPc = females |> percentage survived
  • 8. Tally Ho!
    /// Tally up items that match specified criteria
    let tally criteria items =
        items
        |> Array.filter criteria
        |> Array.length

    /// Percentage of items that match specified criteria
    let percentage criteria items =
        let total = items |> Array.length
        let count = items |> tally criteria
        float count * 100.0 / float total
  • 9. Survival rate
    /// Survival rate of a criteria's group
    let survivalRate criteria =
        passengers
        |> Array.groupBy criteria
        |> Array.map (fun (key, matching) ->
            key, matching |> Array.percentage survived)

    let embarked = survivalRate (fun p -> p.Embarked)
  • 10. Score
    let score f =
        passengers
        |> Array.percentage (fun p -> f p = (p.Survived = 1))

    let rate = score (fun p -> (child p || female p) && not (p.Pclass = 3))
  • 11. MACHINE LEARNING Titanic: Machine Learning from Disaster
  • 12. 20 Questions The game suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is at most 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, 20 questions will allow the questioner to distinguish between 2^20 = 1,048,576 objects.
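    The 20-bit figure is just Shannon entropy: a uniform choice among 2^20 equally likely objects carries 20 bits, and each well-posed yes/no question removes one bit. A quick sketch of the entropy calculation (an illustration, not code from the deck):

    /// Shannon entropy, in bits, of a set of outcomes: -sum(p * log2 p)
    let entropyOf (outcomes: 'a[]) =
        let total = float outcomes.Length
        outcomes
        |> Array.countBy id
        |> Array.sumBy (fun (_, count) ->
            let p = float count / total
            -p * (log p / log 2.0))

    // A fair yes/no question is worth exactly one bit;
    // twenty of them distinguish 2^20 objects.
    let oneBit = entropyOf [| true; false |]   // = 1.0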
  • 13. Decision Trees A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning.
  • 14. Split data set (from ML in Action)
    Python:
    def splitDataSet(dataSet, axis, value):
        retDataSet = []
        for featVec in dataSet:
            if featVec[axis] == value:
                reducedFeatVec = featVec[:axis]
                reducedFeatVec.extend(featVec[axis+1:])
                retDataSet.append(reducedFeatVec)
        return retDataSet

    F#:
    let splitDataSet(dataSet, axis, value) =
        [|for featVec in dataSet do
            if featVec.[axis] = value then
                yield featVec |> Array.removeAt axis|]
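    Array.removeAt was not part of the FSharp.Core the talk would have targeted in 2013 (it only shipped with F# 6), so the deck presumably relied on a small helper. A minimal sketch of such a helper, an assumption rather than the deck's code:

    /// Hypothetical helper: a copy of the array with the element at index i removed.
    /// (Recent FSharp.Core versions provide Array.removeAt with this behaviour built in.)
    module Array =
        let removeAt (i: int) (xs: 'T[]) : 'T[] =
            Array.append xs.[..i-1] xs.[i+1..]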
  • 15. Decision Tree
    let labels = [|"sex"; "class"|]
    let features (p:Passenger) : obj[] = [|box p.Sex; box p.Pclass|]

    let dataSet : obj[][] =
        [|for passenger in passengers ->
            [|yield! features passenger; yield box (passenger.Survived = 1)|] |]

    let tree = createTree(dataSet, labels)
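    createTree itself is not shown in the deck; it comes from the Machine Learning in Action port linked in the slide notes (clear-lines.com). Below is a minimal, self-contained ID3-style sketch under the same conventions, rows as obj[] with the class label in the last column and a Leaf/Branch tree as pattern-matched on slide 18; the names and details are assumptions, not the deck's exact code:

    /// Decision tree shape pattern-matched by the classify function on slide 18.
    type Tree =
        | Leaf of obj
        | Branch of string * (obj * Tree)[]

    /// Shannon entropy (in bits) of the class labels stored in the last column.
    let entropy (rows: obj[][]) =
        let total = float rows.Length
        rows
        |> Array.countBy (fun row -> row.[row.Length - 1])
        |> Array.sumBy (fun (_, n) ->
            let p = float n / total
            -p * (log p / log 2.0))

    /// Rows matching value at axis, with that column removed (as on slide 14).
    let splitDataSet (rows: obj[][]) axis value =
        [| for row in rows do
             if row.[axis] = value then
                 yield Array.append row.[..axis-1] row.[axis+1..] |]

    /// ID3-style recursive partitioning: pick the feature with the best
    /// information gain, split on it, and recurse until the subset is pure.
    let rec createTree (rows: obj[][], labels: string[]) : Tree =
        let classes = rows |> Array.map (fun row -> row.[row.Length - 1])
        if classes |> Array.forall ((=) classes.[0]) then
            Leaf classes.[0]                                              // pure subset
        elif labels.Length = 0 then
            Leaf (classes |> Array.countBy id |> Array.maxBy snd |> fst)  // majority vote
        else
            // weighted entropy of the subsets produced by splitting on each feature
            let weightedEntropy axis =
                rows
                |> Array.countBy (fun row -> row.[axis])
                |> Array.sumBy (fun (value, n) ->
                    float n / float rows.Length * entropy (splitDataSet rows axis value))
            let bestAxis = [| 0 .. labels.Length - 1 |] |> Array.minBy weightedEntropy
            let remaining = Array.append labels.[..bestAxis-1] labels.[bestAxis+1..]
            let branches =
                rows
                |> Array.map (fun row -> row.[bestAxis])
                |> Array.distinct
                |> Array.map (fun value ->
                    value, createTree (splitDataSet rows bestAxis value, remaining))
            Branch(labels.[bestAxis], branches)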
  • 16. Overfitting
  • 17. CLASSIFY Titanic: Machine Learning from Disaster
  • 18. Decision Tree: Create -> Classify
    let rec classify(inputTree, featLabels:string[], testVec:obj[]) =
        match inputTree with
        | Leaf(x) -> x
        | Branch(s,xs) ->
            let featIndex = featLabels |> Array.findIndex ((=) s)
            xs |> Array.pick (fun (value,tree) ->
                if testVec.[featIndex] = value
                then classify(tree, featLabels, testVec) |> Some
                else None)
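    One way to put classify to work on the competition's test set is sketched below: it writes the two-column file Kaggle expects. The file names and the predict wrapper are assumptions, not from the deck:

    // Hypothetical submission writer. Assumes test.csv sits next to the script,
    // requires a reference to the FSharp.Data package, and expects predict to
    // wrap the tree built on slide 15, e.g.
    //   let predict row = classify(tree, labels, features row) |> unbox<bool>
    open System.IO
    open FSharp.Data

    type TestSet = CsvProvider<"test.csv">

    let writeSubmission (predict: TestSet.Row -> bool) =
        let test = TestSet.Load("test.csv")
        let lines =
            [| yield "PassengerId,Survived"
               for row in test.Rows do
                   yield sprintf "%d,%d" row.PassengerId (if predict row then 1 else 0) |]
        File.WriteAllLines("submission.csv", lines)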
  • 19. Titanic Data
    Variable | Description
    survival | Survival (0 = No; 1 = Yes)
    pclass   | Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
    name     | Name
    sex      | Sex
    age      | Age
    sibsp    | Number of Siblings/Spouses Aboard
    parch    | Number of Parents/Children Aboard
    ticket   | Ticket Number
    fare     | Passenger Fare
    cabin    | Cabin
    embarked | Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
    Tip: empty float fields come through as Double.NaN
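    That empty-float tip matters for predicates like the child test used in the slide 10 score: a missing Age surfaces as Double.NaN, and every comparison against NaN evaluates to false, which can silently skew a predicate, so it is worth testing for it explicitly. A small sketch, assuming Age is a float and that "child" means under 13 (both assumptions, not from the deck):

    open System

    /// Treat a passenger as a child only when an age is actually recorded.
    let isChild (age: float) =
        not (Double.IsNaN age) && age < 13.0

    let aChild     = isChild 7.0          // true
    let unknownAge = isChild Double.NaN   // false: a missing age is not assumed to be a child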
  • 20. RESOURCES Titanic: Machine Learning from Disaster
  • 21. Special thanks!
    ◦ Mathias Brandewinder for the Machine Learning samples
    ◦ http://www.clear-lines.com/blog/
    ◦ Tomas Petricek & Gustavo Guerra for the FSharp.Data library
    ◦ http://fsharp.github.io/FSharp.Data/
    ◦ F# Team for Type Providers
    ◦ http://blogs.msdn.com/b/dsyme/archive/2013/01/30/twelve-type-providers-in-pictures.aspx
    ◦ Peter Harrington for the Machine Learning in Action code samples
    ◦ http://www.manning.com/pharrington/
    ◦ Kaggle for the Titanic data set
    ◦ http://www.kaggle.com/c/titanic-gettingStarted
  • 22. Machine Learning Job Trends (source: indeed.co.uk)
  • 23. What next?
    F# Machine Learning information
    ◦ http://fsharp.org/machine-learning/
    Random Forests
    ◦ http://tinyurl.com/randomforests
    Progressive F# Tutorials
    ◦ http://skillsmatter.com/event/scala/progressive-f-tutorials-2013