F# R TYPE PROVIDER
Howard Mansell
6th September 2012
DISCLAIMER
F# R TYPE PROVIDER
• F# vs. R
• What are Type Providers?
• The R Provider
• Challenges
• Type Provider Growing Pains
• Was it worth it?
F# VS R

F#                                   R
Functional + OO                      Functional-ish + Crazy-OO
Compiled                             Interpreted
Statically typed                     Dynamically typed
OK for Exploratory Analysis          Strong for Exploratory Analysis
Well-suited for building systems     Unsuitable for building systems
Weak math/stats libraries            Strong stats libraries
Basic visualization tools            Rich visualization tools
Good for data acquisition            Poor for data acquisition
Good for data processing             Decent for data processing
Scalable                             Not particularly scalable
MIXING THEM
• WHY?
   - Functionality from .NET libraries
   - Data acquisition & transformation in F#
   - Stats/graphics functionality from R libraries
   - Build robust systems

• HOW?
   - RDotNet provides an OO .NET wrapper around R.DLL (in-process)
   - RCOM provides cross-process access to an R session via DCOM
   - Rserve provides client-server access to an R session over sockets
F# TYPE PROVIDERS
• A mechanism for “dynamically” providing types to the IDE and compiler
• Provided at compile/edit time, based on:
   - Static parameters (in code)
   - Access to external resources (database, WSDL, OData)

• Downstream code is then statically typed
• Good IntelliSense experience
• Code fragments are generated at compile time and injected
   - “Schema” baked into client code

• Addresses a significant issue that drives people to dynamic languages
TYPE PROVIDERS VS CODE GEN
• Generally equivalent, except…
• Some problems don't scale with codegen (e.g. the Freebase provider: the schema is too large to generate up front, but a provider can supply types on demand)
• Simpler process-wise (no additional tool to know/run)
• Uniform mechanism for access
• Somewhat simpler/less error-prone to write a type provider
THE R PROVIDER
• Type providers can be used for inter-language interop / meta-programming
• The “external resource”/schema in this case is the R environment
• Make R packages available as .NET namespaces
• Make R functions & values available as .NET members
• Uses RDotNet
   - Lightweight, in-process
   - Results are kept in the R environment unless explicitly marshaled back
   - Objects can be explicitly saved and loaded into a real R session if desired

• Available at
  http://github.com/BlueMountainCapital/FSharpRProvider
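To make "packages as namespaces, functions as members" and the code-gen comparison concrete, here is a minimal code-generation sketch in Python. It emits the F# surface the provider exposes. The shape is assumed from the following slides: one namespace per R package, one static member per R function on a type named R, every argument an optional obj, and every result a SymbolicExpression. The dot-to-underscore renaming (data.frame becomes data_frame) follows the R.data_frame call in the demo notes. `emit_fsharp_stubs`, `fsharp_name`, and the hand-written signature table are hypothetical. A real type provider reads signatures from the live R session and injects types directly rather than writing source text.

```python
def fsharp_name(r_name: str) -> str:
    """Map an R identifier to an F#-friendly one (data.frame -> data_frame)."""
    return r_name.replace(".", "_")


def emit_fsharp_stubs(package: str, functions: dict) -> str:
    """Emit F# source exposing each R function as RProvider.<package>.R.<function>.

    `functions` is a hypothetical, hand-written map from an R function name to its
    formal argument names. R's `...` is skipped: it has no clean optional-argument
    equivalent (see the argument-passing exceptions).
    """
    lines = [f"namespace RProvider.{package}", "", "open RDotNet", "", "type R ="]
    for r_fn, formals in functions.items():
        # Every argument becomes an optional `obj`, i.e. the "one static type".
        params = ", ".join(f"?{fsharp_name(a)}: obj" for a in formals if a != "...")
        lines.append(f"    static member {fsharp_name(r_fn)}({params}) : SymbolicExpression =")
        lines.append('        failwith "generated stub: would call into R"')
    return "\n".join(lines)


stats_signatures = {"sd": ["x", "na.rm"], "var": ["x", "y", "na.rm", "use"]}
print(emit_fsharp_stubs("stats", stats_signatures))
```

Generating this text for every installed package up front is the codegen route. The type provider produces the equivalent types on demand inside the compiler.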
CHALLENGES
• How do we bridge dynamic <-> static typing?
• Dynamic typing basically has just one static type: Any/Obj/…
• .NET-based statically-typed languages still have a dynamic typing system (e.g. C#'s dynamic keyword)
• Dynamic languages have lightweight syntax for dynamic method dispatch
• But R eschews dotted notation for method dispatch (print(df) dispatches to print.data.frame)
• Do the obvious thing: use the “one static type”
   - All arguments are of type object
   - Results are of type RDotNet.SymbolicExpression, which keeps the result inside the R engine
   - Arguments can be native .NET types or SymbolicExpression
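The "one static type" design can be sketched without R at all. In the sketch below, a toy engine keeps every result in its own environment and hands back an opaque handle, which is the role SymbolicExpression plays. Calls accept either native values or handles, so results can be chained (as in the R.log |> R.diff demo) without marshaling intermediate data back to the caller. `ToyEngine` and `Handle` are hypothetical stand-ins and do not reproduce RDotNet's REngine API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Opaque reference to a value living inside the engine (cf. SymbolicExpression)."""
    slot: int


class ToyEngine:
    """Hypothetical stand-in for an in-process R engine."""

    def __init__(self):
        self._env = {}  # slot -> value; values stay here unless explicitly fetched
        self._functions = {
            "log": lambda xs: [math.log(x) for x in xs],
            "diff": lambda xs: [b - a for a, b in zip(xs, xs[1:])],
        }

    def _resolve(self, arg):
        # Arguments may be native values or handles to engine-side values.
        return self._env[arg.slot] if isinstance(arg, Handle) else arg

    def call(self, name, *args):
        result = self._functions[name](*(self._resolve(a) for a in args))
        handle = Handle(len(self._env))
        self._env[handle.slot] = result
        return handle  # the caller only ever sees the handle

    def fetch(self, handle):
        """Explicitly marshal a value back out of the engine (returns a copy)."""
        return list(self._env[handle.slot])


engine = ToyEngine()
prices = [100.0, 110.0, 99.0]
log_returns = engine.call("diff", engine.call("log", prices))  # intermediate log stays in the engine
print(engine.fetch(log_returns))
```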
ARGUMENT PASSING CONVENTIONS
• R has named and positional passing styles
• R has the … argument (params/varargs)
• R allows arguments to have default values
   - A function will be invoked even if an argument has no value supplied and no default (R evaluates arguments lazily, so this only fails if the argument is actually used)
   - So make all arguments optional

• These map pretty well onto F# named/optional arguments
   - Need to expose functions as static members (F# only allows optional arguments on members)
   - Always exposed as RProvider.packagename.R.functionname

• Exceptions:
   - In R, the … argument can come before named arguments (a .NET params array must be the last parameter)
   - In R, … arguments can be passed using an identifying name
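The sketch below shows how optional named arguments might map onto an R call. It builds R source text and drops every argument the caller left unset. R then applies that argument's default, or raises an error only if the function actually uses an argument that has none. Varargs are emitted first, which matches R functions such as paste whose … precedes its named arguments. Varargs can also carry names, as in list(a = 1). `r_literal` and `r_call`, and the whole string-based rendering, are hypothetical. The R provider passes values through RDotNet rather than generating R source.

```python
def r_literal(value):
    """Render a Python value as R source (just enough for this sketch)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (int, float)):
        return repr(value)
    raise TypeError(f"no R literal for {type(value).__name__}")


def r_call(fn, varargs=(), named=None):
    """Build an R call: varargs positionally first, then named arguments.

    Named arguments whose value is None are treated as "not supplied" and omitted,
    so R falls back to its own default for them.
    """
    parts = [r_literal(v) for v in varargs]
    for name, value in (named or {}).items():
        if value is not None:
            parts.append(f"{name} = {r_literal(value)}")
    return f"{fn}({', '.join(parts)})"


print(r_call("paste", ["a", "b"], {"sep": "-", "collapse": None}))
print(r_call("sd", named={"x": 1.5, "na.rm": True}))
```

In the paste call, collapse is left unset and never appears in the generated R. The same omission is what makes exposing every argument as optional safe.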
ARGUMENT CONVERSION
• Obvious basic type conversions are built in:
   - Seq<double> -> numeric vector
   - Double -> numeric vector (R has no scalars, so this is a length-1 vector)
   - Etc.

• Lists can be constructed using R.list()
• How do we support implicit conversion of bespoke classes?
   - E.g. we have our own .NET DataFrame type, which should convert to an R data.frame
   - Avoid forking the open-source project

• Support plug-ins via the Managed Extensibility Framework (MEF)
   - Plug-ins can use the type provider to call R functions, or talk to the REngine directly
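The plug-in idea is essentially a converter registry that the core consults for each argument. The Python sketch below models it as follows:

- A decorator stands in for MEF's export/import discovery.
- Converters produce R source text.
- Lookup walks the argument's class hierarchy, so the most specific registration wins and a later registration for the same type overrides an earlier one.

All names here (`register`, `to_r`, `MyFrame`) are hypothetical. They are not RProvider's or MEF's actual API.

```python
from dataclasses import dataclass

_CONVERTERS = {}  # Python type -> function producing R source text


def register(py_type):
    """Decorator standing in for a MEF plug-in export; re-registering a type overrides it."""
    def wrap(fn):
        _CONVERTERS[py_type] = fn
        return fn
    return wrap


def to_r(value):
    # Walk the MRO so a subclass can reuse its parent's converter.
    for cls in type(value).__mro__:
        if cls in _CONVERTERS:
            return _CONVERTERS[cls](value)
    raise TypeError(f"no converter registered for {type(value).__name__}")


# Built-in conversions: a double becomes a length-1 numeric vector, a list of numbers a longer one.
@register(float)
def _float(x):
    return f"c({x!r})"


@register(list)
def _numeric_list(xs):
    return "c(" + ", ".join(repr(float(x)) for x in xs) + ")"


# A "plug-in" for a bespoke (hypothetical) frame type, added without editing the core above.
@dataclass
class MyFrame:
    columns: dict  # column name -> list of numbers


@register(MyFrame)
def _frame(frame):
    cols = ", ".join(f"{name} = {_numeric_list(vals)}" for name, vals in frame.columns.items())
    return f"data.frame({cols})"


print(to_r(MyFrame({"A": [1, 2, 3], "B": [4, 5, 6]})))
```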
RESULT CONVERSION
• Results always come back as SymbolicExpression
   - RDotNet wrapper around the R C datatype SEXPREC

• RProvider adds a Value property as an extension property
   - Returns the default .NET representation of the SymbolicExpression
   - Obvious default conversions are built in; you can add/override them using an MEF plug-in

• We also add GetValue<'ResType> : unit -> 'ResType
   - Allows the caller to specify the type they want
   - Supports things like NumericVector -> double when the vector has length 1
   - Can also be augmented/overridden using an MEF plug-in
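The difference between Value and GetValue<'ResType> can be modeled as two lookups:

- a default conversion keyed on the R-side type alone;
- a requested conversion keyed on the R-side type plus the caller's target type.

The Python sketch below is a hypothetical model of that design, not RDotNet's classes. `RResult`, `DEFAULTS`, and `REQUESTED` are illustrative names. A length-1 numeric vector converts to a scalar float only when the caller asks for one.

```python
from dataclasses import dataclass


@dataclass
class RResult:
    """Hypothetical stand-in for a SymbolicExpression: an R type tag plus raw data."""
    r_type: str
    data: list

    @property
    def value(self):
        # Default representation, chosen by the R-side type alone.
        return DEFAULTS[self.r_type](self)

    def get_value(self, target):
        # Caller-specified representation, chosen by (R type, requested type).
        try:
            convert = REQUESTED[(self.r_type, target)]
        except KeyError:
            raise TypeError(f"cannot convert {self.r_type} to {target.__name__}") from None
        return convert(self)


def _numeric_to_float(result):
    if len(result.data) != 1:
        raise ValueError(f"expected a length-1 vector, got length {len(result.data)}")
    return float(result.data[0])


# Both tables are plain dicts, so a plug-in could add or override entries.
DEFAULTS = {"numeric": lambda r: [float(x) for x in r.data]}
REQUESTED = {
    ("numeric", list): lambda r: [float(x) for x in r.data],
    ("numeric", float): _numeric_to_float,
}

mean_result = RResult("numeric", [2.5])
print(mean_result.value)             # default representation: a list of floats
print(mean_result.get_value(float))  # requested representation: a scalar
```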
TYPE PROVIDER GROWING PAINS
• Type providers are an awesome idea
• The current implementation has some kinks:
   - Cannot recompile a type provider while its binary is in use by VS (VS keeps it locked)
   - If a type provider depends on other assemblies, they may not get resolved
   - Accessing slow external resources can slow down your machine
   - Buggy type providers can crash the IDE or compiler
   - Builds may fail because of machine configuration:
      - E.g. you don't have R
      - You don't have the same packages installed in R
      - Best to put external resources or schema files in source control
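One way to follow the source-control recommendation: have the provider's schema step read a checked-in snapshot by default. It queries a live R session only when no snapshot exists or when a refresh is requested explicitly. A build machine without R, or with different packages, then sees the same types. This is a hypothetical sketch: `load_schema`, the JSON layout, and the `query_r` callback are assumptions and not part of the R provider.

```python
import json
import tempfile
from pathlib import Path


def load_schema(query_r, snapshot: Path, refresh: bool = False) -> dict:
    """Return {package: [function, ...]}, preferring a checked-in snapshot.

    query_r: callable asking a live R session for the schema; may raise if R is missing.
    refresh: re-query R and rewrite the snapshot (run deliberately, then commit the file).
    """
    if snapshot.exists() and not refresh:
        return json.loads(snapshot.read_text())
    schema = query_r()  # only reached when regenerating, or when no snapshot exists yet
    snapshot.write_text(json.dumps(schema, indent=2, sort_keys=True))
    return schema


def no_r_installed():
    raise RuntimeError("R not found")


snapshot = Path(tempfile.mkdtemp()) / "r-schema.json"  # stands in for a file in the repo
first = load_schema(lambda: {"stats": ["sd", "var"]}, snapshot)  # developer machine with R
second = load_schema(no_r_installed, snapshot)  # build machine without R: reads the snapshot
print(second)
```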
WAS IT WORTH IT?
• Having integrated, slightly type-safe access to R from F# Interactive is extremely powerful
• This problem could also be solved using code generation
• Using the type provider is much more “fluid”: no extra process step or tool to run
• The issues on the previous slide detract from that somewhat
• Type providers have lots of interesting applications outside data access:
   - COM interop
   - WinRT interop
   - Intra-language meta-programming (if static parameters were more flexible)

F# Type Provider for R Statistical Platform

Editor's Notes

  • #5 Demo in RStudio:
      1+2
      x = 3; xs = c(1,2,3) (both are vectors)
      xs + x
      xs > 1
      f = function(x) x + 1
      Look at f
      Look at c
      df = data.frame(A=c(1,2,3), B=c(4,5,6))
      class(df)
      unclass(df)
      print(df)
      print
      print.data.frame
  • #6 Show RDotNet sample program.
  • #7 Examples: SQL provider, XML provider, Regex provider, CSV provider, OData provider, file system provider
  • #9 Demo of the F# type provider:
      #load a script that provides getStockPrices as CSV (save files on disk)
      Call getStockPrices on MSFT and show the results in the visualizer
      Call R.log |> R.diff on the result; show the result
      let data = [for t in tickers -> t, getStockPrices t 255 |> R.log |> R.diff]
      Call R.data_frame on the result
      Show a pairs plot
      Pick a pair; plot them against each other
      Build an lm (linear model)
  • #11 Take a look at the Rprovider.fs code
  • #12 Look at Rinterop.fs
  • #13 Show FactorVector and DataFrameConverter