ALL YOUR TYPES ARE BELONG TO US!
PHILLIP TRELFORD, @PTRELFORD
DDD DUNDEE 2013, #DUNDDD
F#UNCTIONAL LONDONERS
Meetup
• 600 members
• 50 meetups
• Meets every 2 weeks
• Talks & Hands On
Topics
• Finance
• Machine Learning
• Big Data
• Gaming
FSHARP.ORG/GROUPS
F# TESTIMONIALS – MACHINE LEARNING
FSHARP.ORG/TESTIMONIALS
For a machine learning scientist, speed of experimentation is the critical factor to optimize.
Compiling is fast but loading large amounts of data in memory takes a long time.
With F#’s REPL, you only need to load the data once and you can then code and explore in the interactive environment.
Unlike C# and C++, F# was designed for this mode of interaction.
- Patrice Simard, Microsoft
FSHARP.ORG/TESTIMONIALS - AMYRIS BIOTECH
F# has been phenomenally useful.
I would be writing a lot of this in Python otherwise
and F# is more robust, 20x - 100x faster to run and for anything but the most trivial programs, faster to develop.
- Darren Platt, Amyris Biotechnology
CASE STUDIES
F# TOOLS FOR HALO 3
Questions
• Controllable player skill distribution (slow down!)
• Controllable skills distributions (re-ordering)
Simulations
• Large scale simulation of 8,000,000,000 matches
• Distributed computation – 15 machines for 2 weeks
Tools
• Result viewer (Logged results: 52 GB of data)
• Real-time simulator of partial update
ADCENTER
Weeks of data in training:
• 7,000,000,000 impressions
2 weeks of CPU time during sessions
• 2 weeks x 7 days x 86,400 sec/day
Learning algorithm speed requirement:
• 5,787 impression updates/sec
• 172.8 µs per impression update
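The speed requirement follows directly from the figures on this slide: 7,000,000,000 impressions have to be processed within two weeks of wall-clock seconds. A minimal F# sketch of that arithmetic (the value names are chosen here for illustration and do not come from the talk):

```fsharp
// Figures taken from the slide.
let impressions = 7.0e9                      // impressions in the training data
let secondsAvailable = 2.0 * 7.0 * 86400.0   // 2 weeks x 7 days x 86,400 sec/day

// Required throughput and the resulting time budget per update.
let updatesPerSecond = impressions / secondsAvailable
let microsecondsPerUpdate = 1.0e6 / updatesPerSecond

printfn "%.0f updates/sec, %.1f µs per update" updatesPerSecond microsecondsPerUpdate
```

Two weeks is 1,209,600 seconds, which gives roughly 5,787 updates per second and a budget of 172.8 µs for each update.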
LIVE DEMOS
TYPE PROVIDERS: JSON

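open FSharp.Data  // from the FSharp.Data package

// sample.js is a sample document from which the provider infers the types
type Simple = JsonProvider<"sample.js">
let simple = Simple.Parse(""" { "name":"Tomas", "age":4 } """)
simple.Age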
CSV TYPE PROVIDER
SPLIT DATA SET (FROM ML IN ACTION)
Python
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet

F#
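// The type annotation lets the compiler resolve featVec.[axis];
// Array.removeAt requires F# 6 or later.
let splitDataSet (dataSet: 'a[][], axis, value) =
    [| for featVec in dataSet do
           if featVec.[axis] = value then
               yield featVec |> Array.removeAt axis |]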
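To show what the F# version returns, here is a small usage sketch. The data set is made up for illustration, and the function is repeated with a type annotation on dataSet (an addition here, so that the indexer resolves) to keep the block self-contained. It assumes F# 6 or later, where Array.removeAt is available.

```fsharp
// Keep the rows whose column `axis` equals `value`, then drop that column.
let splitDataSet (dataSet: 'a[][], axis, value) =
    [| for featVec in dataSet do
           if featVec.[axis] = value then
               yield featVec |> Array.removeAt axis |]

// Toy data set (hypothetical): two features followed by a label.
let dataSet =
    [| [| "1"; "1"; "yes" |]
       [| "1"; "0"; "no" |]
       [| "0"; "1"; "no" |] |]

// Rows whose first column is "1", with that column removed.
let subset = splitDataSet (dataSet, 0, "1")
printfn "%A" subset
```

Only the first two rows match, so subset holds the two reduced rows "1", "yes" and "0", "no".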
K-MEANS CLUSTERING ALGORITHM
(* K-Means Algorithm *)
// distance, average and Seq.iterate are helpers that are not shown on this slide
// (Seq.iterate is not part of FSharp.Core).

/// Group all the vectors by the nearest center.
let classify centroids vectors =
    vectors |> Array.groupBy (fun v -> centroids |> Array.minBy (distance v))

/// Repeatedly classify the vectors, starting with the seed centroids.
let computeCentroids seed vectors =
    seed |> Seq.iterate (fun centers ->
        classify centers vectors |> Array.map (snd >> average))
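The slide's code relies on distance, average, and Seq.iterate, none of which is defined on the slide, and Seq.iterate is not part of FSharp.Core. The sketch below fills them in under stated assumptions: Euclidean distance, a component-wise mean, and an iterate helper built on Seq.unfold. These three definitions are hypothetical stand-ins, not the code used in the talk.

```fsharp
/// Assumed helper: Euclidean distance between two vectors of equal length.
let distance (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b |> sqrt

/// Assumed helper: component-wise mean of a non-empty array of vectors.
let average (vectors: float[][]) =
    let n = float vectors.Length
    Array.init vectors.[0].Length (fun i ->
        (vectors |> Array.sumBy (fun v -> v.[i])) / n)

/// Assumed helper: the infinite sequence x, f x, f (f x), ...
let iterate f x = Seq.unfold (fun state -> Some (state, f state)) x

/// Group all the vectors by the nearest center.
let classify centroids vectors =
    vectors |> Array.groupBy (fun v -> centroids |> Array.minBy (distance v))

/// Repeatedly classify the vectors, starting with the seed centroids.
let computeCentroids seed vectors =
    seed |> iterate (fun centers ->
        classify centers vectors |> Array.map (snd >> average))

// Toy data (hypothetical): two well-separated groups of 2-D points.
let points = [| [| 1.0; 1.0 |]; [| 1.0; 2.0 |]; [| 9.0; 9.0 |]; [| 10.0; 9.0 |] |]
let seed = [| [| 0.0; 0.0 |]; [| 10.0; 10.0 |] |]

// The sequence is infinite, so pick a fixed number of refinement steps.
let centroids = computeCentroids seed points |> Seq.item 5
printfn "%A" centroids
```

Because computeCentroids yields an infinite sequence, the caller decides when to stop, either after a fixed number of steps as here or once two consecutive results are equal. Note that Array.groupBy only returns non-empty groups, so a centroid that attracts no vectors disappears in the next step.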
R – TYPE PROVIDER
WORLD BANK DATA
RESOURCES
TYPE PROVIDERS
• JSON
• XML
• CSV
• Excel
• SQL
• R
• MATLAB
• Hadoop
• ...
TRYFSHARP.ORG
BUY THE BOOK
GET THE T-SHIRT
MACHINE LEARNING JOB TRENDS

• Source: indeed.co.uk
QUESTIONS

Editor's Notes

  • #4 Fsharp.org map
  • #13 http://fsharp.github.io/FSharp.Data/library/CsvProvider.html http://clear-lines.com/blog/post/Random-Forest-classification-in-F-first-cut.aspx
  • #23 http://www.indeed.com/jobanalytics/jobtrends?q=machine+learning&l=