R, HTTP, and APIs, with a preview of TopicWatchr
 


Strong, Homer. "R, HTTP, and APIs, with a preview of TopicWatchr." Portland R User Group, 15 November 2011.


    • Application Programming Interfaces
      Why? I want my code to have access to your code or data... from a different computer!
        • we might be using different operating systems!
        • different programming languages!
        • have different compression capabilities!
        • security!
        • etc.
      At least you don't have to install tons of code or download all of the data.
    • The Internet Suggests a Solution
      HyperText Transfer Protocol: HTTP
        • Since the WWW has caught on, HTTP has become a dominant protocol.
        • Pretty much all computers support some kind of HTTP client.
        • Browsers are just fancy HTTP clients.
        • R can be a client too!
      Duncan Temple Lang's RCurl package offers R access to libcurl, a popular HTTP library.
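A GET request is, at bottom, a URL with encoded parameters. The sketch below builds one in base R. The host `api.example.com` and the parameter names `source` and `gram` are placeholders invented for illustration, not part of any real API.

```r
# Build a GET request URL from a base address and named query parameters.
# The host and parameter names used below are placeholders, not a real API.
build_url <- function(base, params) {
  # Percent-encode each value so spaces and symbols are safe inside a URL.
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  paste0(base, "?", query)
}

url <- build_url("http://api.example.com/counts",
                 list(source = "twitter_sample", gram = "Herman Cain"))
print(url)

# With RCurl installed and a real endpoint, the request itself would be:
# body <- RCurl::getURL(url)
```

The final call is left commented out because it needs both the RCurl package and a live server.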
    • But what data will we transfer?
      HTTP gives us a nearly universal way to pass data between machines; now we have to decide what format messages ought to have.
        • Let's choose something lightweight and human readable (so no XML :p)
        • but it should be something easily serializable,
        • and should have some structure.
        • JSON is the popular choice.
    • JSON
      JSON looks like this:
          {
            "hello" : "world",
            "universe" : 42,
            "pizza" : null,
            "cookies" : ["chocolate", "molasses", "oatmeal"],
            "eggs" : {
              "over" : "easy"
            }
          }
      JSON has types, can be nested, and has analogues (e.g. dicts, hashes, or maps) in most major programming languages. It smells like a list in R.
      The RJSONIO package, also by Duncan Temple Lang, takes R lists to and from their JSON representations.
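To make the "list in R" analogy concrete, here is the same document written by hand as a nested R list, using base R only. This is a sketch of the correspondence, not the output of any JSON package. The empty "pizza" entry is represented with `NULL`.

```r
# The JSON document above, written by hand as a nested R list.
doc <- list(
  hello    = "world",                                 # string
  universe = 42,                                      # number
  pizza    = NULL,                                    # empty value
  cookies  = c("chocolate", "molasses", "oatmeal"),   # array
  eggs     = list(over = "easy")                      # nested object
)

# Nested values are reached the same way a JSON path would be followed.
print(doc$eggs$over)
print(doc[["cookies"]][2])
str(doc)

# With RJSONIO installed, conversion in each direction would be:
# json <- RJSONIO::toJSON(doc)
# back <- RJSONIO::fromJSON(json)
```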
    • Numerous Examples
      Computational
        • geocoding: Google, et al.
        • face recognition: face.com
        • prediction: Google
      Data
        • Federal Register
        • Bloomberg
      "Data APIs/feeds available as packages in R" was asked on stats.stackexchange.com a couple of months ago. The list of packages included:
      quantmod, tseries, fImport, WSI, RGoogleTrends, RGoogleDocs, twitteR, Zillow, RNYTimes, UScensus2000, infochimps, rdatamarket, factualR, RDSTK, RBloomberg, LIM, RTAQ, IBrokers, rnpn, RClimate
    • API example: TopicWatch
      TopicWatch is a platform for text analytics and visualization.
      Currently developing 3 interfaces to the API:
        • iPad app
        • web app
        • R package
      We collect streaming data from a variety of sources including Twitter, RSS feeds, government publications, and others.
    • API Outline
      The API is still under development, and is unstable. We're always adding new features and polishing old ones.
      Just a few concrete capabilities that are already running:
        • time series of n-gram frequencies & counts, aggregated at several resolutions
        • n-grams ranked by frequency, also aggregated at several resolutions; can be filtered by sub-grams
        • raw documents that contain a gram
        • topics that contain a gram
        • time series counts of documents that contain co-occurring n-grams
        • ranking grams by usage change between any two times
    • TopicWatchr
      The R package is a thin wrapper for the HTTP API. It (unsurprisingly) works by
        • sending a request to a URL
        • parsing JSON results
        • re-arranging lists into data frames
      But it has some nice functionality to make working with the API a bit smoother:
        • parses timestamps in data
        • paginates large requests automatically
        • handles authentication
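The wrapper pattern described above can be sketched in base R. Everything here is a stand-in invented for illustration: `fake_pages` plays the role of already-parsed JSON responses, `fetch_page` replaces the HTTP call, and the `time` and `count` fields are assumed record names. None of it is TopicWatchr's actual code.

```r
# Hypothetical parsed responses: each page is a list of records,
# shaped like what a JSON parser would return.
fake_pages <- list(
  list(list(time = "2011-11-15 08:00:00", count = 3),
       list(time = "2011-11-15 08:30:00", count = 5)),
  list(list(time = "2011-11-15 09:00:00", count = 2))
)

# Stand-in for the HTTP request: returns page `i`, or an empty list past the end.
fetch_page <- function(i) {
  if (i <= length(fake_pages)) fake_pages[[i]] else list()
}

# Pagination: keep requesting pages until one comes back empty.
fetch_all <- function(fetch) {
  records <- list()
  page <- 1
  repeat {
    chunk <- fetch(page)
    if (length(chunk) == 0) break
    records <- c(records, chunk)
    page <- page + 1
  }
  records
}

# Re-arrange a list of records into a data frame and parse the timestamps.
records_to_df <- function(records) {
  data.frame(
    time  = as.POSIXct(vapply(records, function(r) r$time, character(1)),
                       tz = "UTC"),
    count = vapply(records, function(r) r$count, numeric(1)),
    stringsAsFactors = FALSE
  )
}

counts <- records_to_df(fetch_all(fetch_page))
print(counts)
```

Passing the fetch function in as an argument keeps the pagination loop independent of how a page is actually retrieved, so a real HTTP call could be swapped in later.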
    • Example 1: Presidential Candidates
      Code to get data:
          library(TopicWatchr)
          set_credentials("PRUG", "12345")

          candidates <- c("Herman Cain", "Mitt Romney", "Rick Perry",
                          "Newt Gingrich", "Ron Paul", "Michelle Bachmann",
                          "Jon Huntsman", "Rick Santorum")

          twitter_counts <- wordCounts("twitter_sample", candidates)
          rss_counts <- wordCounts("rss-majorpapers", candidates)
      The wordCounts function constructs the proper API call, makes the call, and arranges the results into a data frame. Each data frame looks like this:
          data.frame: 5 obs. of 9 variables:
          $ times            : POSIXct, format: "2011-11-15 08:00:00" "2011-11-15 08:30:00" ...
          $ Herman Cain      : num 0 0.00148 0 0.00326 0.00274
          $ Mitt Romney      : num 0 0.00148 0 0.00326 0.00548
          $ Rick Perry       : num 0 0.00148 0 0 0
          $ Newt Gingrich    : num 0 0.00148 0 0.00326 0
          $ Ron Paul         : num 0 0 0 0 0
          $ Michelle Bachmann: num 0 0 0 0 0
          $ Jon Huntsman     : num 0 0 0 0 0
          $ Rick Santorum    : num 0 0.00148 0 0 0
      Then we combine data frames and polish with ggplot2 ...
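The slides do not show the combining step. One plausible way to do it in base R is to reshape each wide data frame to long form and stack the results. The two input data frames below are small made-up stand-ins for `wordCounts` output, and the helper `to_long` is hypothetical.

```r
# Illustrative stand-ins for two wordCounts() results (values are made up).
times <- as.POSIXct(c("2011-11-15 08:00:00", "2011-11-15 08:30:00"), tz = "UTC")
twitter_counts <- data.frame(times = times,
                             `Herman Cain` = c(0, 0.002),
                             `Mitt Romney` = c(0.001, 0.003),
                             check.names = FALSE)
rss_counts <- data.frame(times = times,
                         `Herman Cain` = c(0.01, 0),
                         `Mitt Romney` = c(0, 0.02),
                         check.names = FALSE)

# Turn one wide data frame into long form: one row per (time, candidate).
to_long <- function(df, source) {
  grams <- setdiff(names(df), "times")
  do.call(rbind, lapply(grams, function(g) {
    data.frame(times = df$times, candidate = g, frequency = df[[g]],
               source = source, stringsAsFactors = FALSE)
  }))
}

# Stack both sources into a single data frame.
combined <- rbind(to_long(twitter_counts, "twitter"),
                  to_long(rss_counts, "rss"))
print(combined)

# With ggplot2 installed, a plot along these lines would be possible:
# ggplot2::ggplot(combined, ggplot2::aes(times, frequency, colour = candidate)) +
#   ggplot2::geom_line() + ggplot2::facet_wrap(~source)
```

Long form is used because ggplot2 maps one column to each aesthetic, so candidate and source each need their own column.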
    • Final Result
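    • Example 2: Likely Phrase Generator
          # Return the second word of a two-word gram.
          lastGram <- function(g){
            strsplit(g, " ")[[1]][[2]]
          }

          # Assumed values: the slide leaves `first`, `twsrc`, and the loop length undefined.
          twsrc <- "twitter_sample"   # data source (assumed)
          first <- "statistics"       # seed word (assumed)
          n_words <- 10               # number of words to generate (assumed)

          # Top-ranked gram for the seed word.
          vc <- topGrams(twsrc,
                         filter=first, limit=1,
                         m=1, n=2, prefix=TRUE,
                         resolution="daily")$gram

          phrase <- c()

          for (i in seq_len(n_words)){
            # Keep the gram's second word, then use it as the next filter.
            vc <- lastGram(vc)
            phrase <- c(phrase, vc)
            vc <- topGrams(twsrc, filter=vc, limit=1, m=1, n=2,
                           prefix=TRUE, dev_server=TRUE,
                           resolution="daily")$gram
          }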
    • Likely phrases from earlier today:
        • Twitter: "im going back :) lt3 please follow back :) lt3 please"
        • Technology RSS feeds: "user interface displays users click scheme federal trade commission ftc antitrust complaint outside occupy wall street"
        • same source, seeded with the word "statistics": "statistics showing highlights google apps like behavioral advertising refers obliquely suggested session sounded viable business edition"
        • Politics RSS feeds: "washington university battleground poll numbers superfan badge request may become president obama administration asked whether congress approval"
        • Major papers RSS feeds: "percent stake throughout california chapter 11 years ago effectively sealed george w street movement prefers birds early"
        • Federal Register: "revision incorporates provisions related investigative actions could result based upon fresh prunes grown ornamentals ca fip"
    • Feeling Adventurous?
      We're looking for beta testers for the R package! In Shackleton's words, what to expect:
      ...BITTER COLD, LONG MONTHS OF COMPLETE DARKNESS, CONSTANT DANGER, SAFE RETURN DOUBTFUL...
      But it can still be fun! You can talk with me about it, or get in touch later at homer@luckysort.com
    • That's all!
      Thanks for listening. Questions?