Upcoming SlideShare
×

# Native interfaces for R

2,758 views

Published on

Presentation on native interfaces for the R programming language given as part of a course in advanced R programming at FHCRC:

https://secure.bioconductor.org/SeattleMay10/

Published in: Technology
2 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
2,758
On SlideShare
0
From Embeds
0
Number of Embeds
21
Actions
Shares
0
45
0
Likes
2
Embeds 0
No embeds

No notes for slide

### Native interfaces for R

1. 1. Native Interfaces for R Seth Falcon Fred Hutchinson Cancer Research Center 20-21 May, 2010
2. 2. Outline The .Call Native Interface R types at the C-level Memory management in R Self-Study Exercises Odds and Ends Resources
3. 3. which as implemented in R prior to R-2.11 which <- function(x) { seq_along(x)[x & !is.na(x)] }
4. 4. which as implemented in R prior to R-2.11 which <- function(x) { seq_along(x)[x & !is.na(x)] } 1. is.na 2. ! 3. & 4. seq_along 5. [
5. 5. which in C 1 SEXP nid_which(SEXP v) { 2 SEXP ans; 3 int i, j = 0, len = Rf_length(v), *buf, *tf; 4 buf = (int *) R_alloc(len, sizeof(int)); 5 tf = LOGICAL(v); 6 for (i = 0; i < len; i++) { 7 if (tf[i] == TRUE) buf[j] = i + 1; j++; 8 } 9 ans = Rf_allocVector(INTSXP, j); 10 memcpy(INTEGER(ans), buf, sizeof(int) * j); 11 return ans; }
6. 6. which is now 10x faster system.time(which(v)) R Version System time Elapsed time R-2.10 0.018 0.485 R-2.11 0.001 0.052 v is a logical vector with 10 million elements
7. 7. Making use of existing code Access algorithms Interface to other systems RBGL RSQLite RCurl netcdf Rsamtools SJava
8. 8. Review of C #include <stdio.h> double _nid_avg(int *data, int len) { int i; double ans = 0.0; for (i = 0; i < len; i++) { ans += data[i]; } return ans / len; } main() { int ex_data[5] = {1, 2, 3, 4, 5}; double m = _nid_avg(ex_data, 5); printf("%fn", m); }
9. 9. The .Call interface
10. 10. Symbolic Expressions (SEXPs) The SEXP is R’s fundamental data type at the C-level SEXP is short for symbolic expression and is borrowed from Lisp History at http://en.wikipedia.org/wiki/S-expression
11. 11. .Call Start to Finish Demo Demo
12. 12. .Call Disection R # R/somefile.R avg <- function(x) .Call(.nid_avg, x) # NAMESPACE useDynLib("yourPackage", .nid_avg = nid_avg) C /* src/some.c */ #include <Rinternals.h> SEXP nid_avg(SEXP data) { ... INTEGER(data) ... }
13. 13. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
14. 14. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
15. 15. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
16. 16. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
17. 17. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
18. 18. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
19. 19. .Call C function #include <Rinternals.h> SEXP nid_avg(SEXP data) { SEXP ans; double v = _nid_avg(INTEGER(data), Rf_length(data)); PROTECT(ans = Rf_allocVector(REALSXP, 1)); REAL(ans)[0] = v; UNPROTECT(1); return ans; /* shortcut: return Rf_ScalarReal(v); */ }
20. 20. Function Registration In NAMESPACE In C via R_CallMethodDef (See ShortRead/src/R_init_ShortRead.c for a nice example.
21. 21. Outline The .Call Native Interface R types at the C-level Memory management in R Self-Study Exercises Odds and Ends Resources
22. 22. Just one thing to learn Everything is a SEXP But there are many diﬀerent types of SEXPs INTSXP, REALSXP, STRSXP, LGLSXP, VECSXP, and more.
23. 23. What’s a SEXP?
24. 24. Common SEXP subtypes R function SEXP subtype Data accessor integer() INTSXP int *INTEGER(x) numeric() REALSXP double *REAL(x) logical() LGLSXP int *LOGICAL(x) character() STRSXP CHARSXP STRING ELT(x, i) list() VECSXP SEXP VECTOR ELT(x, i) NULL NILSXP R NilValue externalptr EXTPTRSXP SEXP (accessor funcs)
25. 25. STRSXP/CHARSXP Exercise Try this: .Internal(inspect(c("abc", "abc", "xyz")))
26. 26. STRSXP/CHARSXP Exercise > .Internal(inspect(c("abc", "abc", "xyz"))) @101d3f280 16 STRSXP g0c3 [] (len=3, tl=1) @101cc6278 09 CHARSXP g0c1 [gp=0x20] "abc" @101cc6278 09 CHARSXP g0c1 [gp=0x20] "abc" @101cc6308 09 CHARSXP g0c1 [gp=0x20] "xyz"
27. 27. character vectors == STRSXPs + CHARSXPs
28. 28. CHARSXPs are special
29. 29. STRSXP/CHARSXP API Example SEXP s = Rf_allocVector(STRSXP, 5); PROTECT(s); SET_STRING_ELT(s, 0, mkChar("hello")); SET_STRING_ELT(s, 1, mkChar("goodbye")); SEXP c = STRING_ELT(s, 0); const char *v = CHAR(c); UNPROTECT(1);
30. 30. STRSXP/CHARSXP API Example SEXP s = Rf_allocVector(STRSXP, 5); PROTECT(s); SET_STRING_ELT(s, 0, mkChar("hello")); SET_STRING_ELT(s, 1, mkChar("goodbye")); SEXP c = STRING_ELT(s, 0); const char *v = CHAR(c); UNPROTECT(1);
31. 31. STRSXP/CHARSXP API Example SEXP s = Rf_allocVector(STRSXP, 5); PROTECT(s); SET_STRING_ELT(s, 0, mkChar("hello")); SET_STRING_ELT(s, 1, mkChar("goodbye")); SEXP c = STRING_ELT(s, 0); const char *v = CHAR(c); UNPROTECT(1);
32. 32. STRSXP/CHARSXP API Example SEXP s = Rf_allocVector(STRSXP, 5); PROTECT(s); SET_STRING_ELT(s, 0, mkChar("hello")); SET_STRING_ELT(s, 1, mkChar("goodbye")); SEXP c = STRING_ELT(s, 0); const char *v = CHAR(c); UNPROTECT(1);
33. 33. STRSXP/CHARSXP API Example SEXP s = Rf_allocVector(STRSXP, 5); PROTECT(s); SET_STRING_ELT(s, 0, mkChar("hello")); SET_STRING_ELT(s, 1, mkChar("goodbye")); SEXP c = STRING_ELT(s, 0); const char *v = CHAR(c); UNPROTECT(1);
34. 34. Outline The .Call Native Interface R types at the C-level Memory management in R Self-Study Exercises Odds and Ends Resources
35. 35. R’s memory model (dramatization) All SEXPs PROTECT reachable stack Precious list DRAMATIZATION
36. 36. R’s memory model R allocates, tracks, and garbage collects SEXPs gc is triggered by R functions SEXPs that are not in-use are recycled. A SEXP is in-use if: it is on the protection stack (PROTECT/UNPROTECT) it is in the precious list (R PreserveObject/R ReleaseObject) it is reachable from a SEXP in the in-use list. All SEXPs PROTECT reachable stack Precious list DRAMATIZATION
37. 37. Preventing gc of SEXPs with the protection stack PROTECT(s) Push s onto the protection stack UNPROTECT(n) Pop top n items oﬀ of the protection stack
38. 38. Generic memory: When everything isn’t a SEXP Allocation Function Lifecycle R_alloc Memory is freed when returning from .Call Calloc/Free Memory persists until Free is called
39. 39. Case Study: C implementation of which Code review of nidemo/src/which.c
40. 40. Outline The .Call Native Interface R types at the C-level Memory management in R Self-Study Exercises Odds and Ends Resources
41. 41. Alphabet frequency of a text ﬁle Yeast Gene YDL143W English Dictionary a e t i g a c o d r 0.00 0.10 0.20 0.30 0.07 0.08 0.09 0.10 frequency frequency
42. 42. Self-Study in nidemo package 1. Alphabet Frequency Explore R and C implementations Make C implementation robust Attach names attribute in C multiple ﬁle, matrix return enhancement 2. External pointer example 3. Calling R from C example and enhancement 4. Debugging: a video how-to
43. 43. Outline The .Call Native Interface R types at the C-level Memory management in R Self-Study Exercises Odds and Ends Resources
44. 44. Odds and Ends 1. Debugging 2. Public API 3. Calling R from C 4. Making sense of the remapped function names 5. Finding C implementations of R functions 6. Using a TAGS ﬁle
45. 45. Debugging Native Code Rprintf("DEBUG: v => %d, x => %sn", v, x); \$ R -d gdb (gdb) run > # now R is running There are two ways to write error-free programs; only the third one works. – Alan Perlis, Epigrams on Programming
46. 46. R’s Public API mkdir R-devel-build; cd R-devel-build ~/src/R-devel-src/configure && make ## hint: configure --help ls include R.h Rdefines.h Rinterface.h S.h R_ext/ Rembedded.h Rinternals.h Details in WRE
47. 47. Calling R functions from C 1. Build a pairlist 2. call Rf_eval(s, rho)
48. 48. Remapped functions Source is unadorned Preprocessor adds Rf_ Object code contains Rf_
49. 49. Finding C implementations cd R-devel/src/main grep '"any"' names.c {"any", do_logic3, 2, ...} grep -l do_logic3 *.c logic.c
50. 50. R CMD rtags cd R-devel/src R CMD rtags --no-Rd # creates index file TAGS
51. 51. Outline The .Call Native Interface R types at the C-level Memory management in R Self-Study Exercises Odds and Ends Resources
52. 52. Writing R Extensions RShowDoc("R-exts")
53. 53. Book: The C Programming Language (K & R) Java in a Nutshell 1264 pages The C++ Programming Language 1030 pages The C Programming Language 274 pages
54. 54. Resources Read WRE and “K & R” Use the sources for R and packages R-devel, bioc-devel mailing lists Patience