Quirrel & R for Dummies

Quirrel is a statistically oriented language designed principally for data analysis. It combines a purely declarative, implicitly parallel design with features needed by data scientists. In this presentation, John A. De Goes (chairman of the Quirrel language committee) introduces Quirrel and shows how it can be used to solve problems across large data sets.

Over the past 5 years, R has enjoyed tremendous success in the data science community, and for good reason: it comes with batteries included and has one of the best communities in the data science world. Although R is not an easy programming language to learn, the basics can be picked up rather quickly. In this talk, John A. De Goes walks through the core syntax and features of R, providing enough training for anyone to do simple analysis.

  1. 1. Introduction to Quirrel & R
     OSCON, July 25
     John A. De Goes, @jdegoes
  2. 2. overview
     Quirrel: Quirrel is an open standard language designed for the analysis of large-scale, heterogeneous data sets.
     R: R is an open source programming language and interactive environment for statistical computing and graphics.
  3. 3. quirrel versus r
     Quirrel (CONS / PROS):
       ● Young language, still evolving
       ● Nascent community
       ● Intentionally limited
       ● Simple, consistent core
       ● Fully parallel
       ● Purely functional
       ● Programmatic or interactive
     R (PROS / CONS):
       ● Mature language, "feature-complete"
       ● Robust community
       ● Turing-complete
       ● Complex core
       ● Mostly parallel
       ● Imperative
       ● Interactive
  4. 4. what's the right tool for the job?
     [Decision flowchart: asks "Small amount of data?" and then "Simple analytics?" on each branch; the YES / NO paths lead to Quirrel, Hive / Pig, SQL, or R.]
  5. 5. sneak peek
     Quirrel:
       pageViews := //pageViews
       avg := mean(pageViews.duration)
       bound := 1.5 * stdDev(pageViews.duration)
       pageViews.userId where pageViews.duration > avg + bound
     R:
       pageViews <- read.csv("pageViews.csv")
       avg <- mean(pageViews$duration)
       bound <- 1.5 * sd(pageViews$duration)
       userIds <- subset(pageViews, duration > avg + bound, select=userId)
  6. 6. data models
     Quirrel: Everything is a random variable.
       true, false
       1, 3.1415
       null, undefined
       "Mary Jane"
       [1, 2, 3]
       [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
       {"name": "John"}
       1 || 2 || 3 || 4 || 5 || 6
       [1, "foo", [1, false]]
     R: Everything is an ordered sequence of values.*
       TRUE, FALSE
       1, 3.1415
       NA, NaN, Inf
       "Mary Jane"
       c(1, 2, 3)
       array(c(1,4,7,2,5,8,3,6,9), dim=c(3,3))
       data.frame(name=c("John"))
       c(1, 2, 3, 4, 5, 6)
       list(1, "foo", list(1, FALSE))
     *Except when it's not.
  7. 7. comments
     Quirrel:
       -- ignore me
       (- ignore me too! -)
     R:
       # ignore me
       # ignore
       # me
       # too!
  8. 8. basic expressions
     Quirrel:
       2 * 4
       (1 + 2) * 3 / 9 > 23
       3 > 2 & (1 != 2)
       2 + 2 = 4
       false & true | !false
       undefined = undefined
     R:
       2 * 4
       (1 + 2) * 3 / 9 > 23
       3 > 2 & (1 != 2)
       2 + 2 == 4
       FALSE & TRUE | !FALSE
       NA == NA
  9. 9. named expressions
     Quirrel:
       x := 2
       square := x * x
     R:
       x <- 2
       square <- x * x
  10. 10. loading data
     Quirrel:
       //pageViews
       load("/pageViews")
       //daily_snapshots/*
     R:
       read.csv("pageViews")
       read.csv("pageViews")
       lapply(Sys.glob("daily_snapshots/*"), read.csv)
  11. 11. drilldown
     Quirrel:
       pageViews := //pageViews
       pageViews.userId
       pageViews.keywords[2]
     R:
       pageViews <- read.csv("pageViews")
       pageViews$userId
       vector[2]
       list[[1]]
  12. 12. reductions
     Quirrel:
       count(purchases)
       sum(purchases.total)
       mean(purchases.total)
       stdDev(purchases.total)
     R:
       nrow(purchases)   # length() on a data frame would count columns, not rows
       sum(purchases$total)
       mean(purchases$total)
       sd(purchases$total)
  13. 13. filtering
     Quirrel:
       views.userId where views.duration > 1000
     R:
       subset(views, duration > 1000, select=userId)
  14. 14. augmentation
     Quirrel:
       clicks with {dow: dayOfWeek(clicks.ts)}
     R:
       clicks$dow <- weekdays(clicks$ts)
  15. 15. libraries
     Quirrel:
       import std::stats::rank
       pageViews := //pageViews
       rank(pageViews.duration)
     R:
       library(data.table)
       pageViews <- read.csv("views.csv")
       rank(pageViews$duration)
  16. 16. user-defined functions
     Quirrel:
       ctr(day) :=
         count(clicks where clicks.day = day) /
         count(impressions where impressions.day = day)
       ctr("Monday")
     R:
       ctr <- function(d) {
         c1 <- subset(clicks, clicks$day == d)
         c2 <- subset(impressions, impressions$day == d)
         length(c1$day) / length(c2$day)
       }
       ctr("Monday")
  17. 17. grouping - implicit constraints
     Quirrel:
       solve 'day
         {day: 'day,
          ctr: count(clicks where clicks.day = 'day) /
               count(impressions where impressions.day = 'day)}
     R:
       clicks$count1 <- 0
       c1 <- aggregate(count1 ~ day, data = clicks, FUN=length)
       impressions$count2 <- 0
       c2 <- aggregate(count2 ~ day, data = impressions, FUN=length)
       r <- merge(c1, c2)
       ctr <- data.frame(day = r$day, ctr = r$count1 / r$count2)
  18. 18. grouping - explicit constraints
     Quirrel:
       solve 'date = purchases.date
         {date: 'date,
          cummTotal: sum(purchases.total where purchases.date < 'date)}
     R:
       purchases2 <- purchases[order(purchases$date), ]
       data.frame(
         date = purchases2$date,
         cummTotal = cumsum(purchases2$total)
       )
  19. 19. Questions?
     Nov - Dec 2012
  20. 20. Quirrel / R Challenge Problems
  21. 21. challenge problem #1
     ■ Using the /london_medals/summer_games data, find the youngest athlete to win a medal
     Download dataset at http://labcoat.precog.com
  22. 22. challenge problem #2
     ■ Using the /london_medals/summer_games data, find the oldest athlete to win a medal
     Download dataset at http://labcoat.precog.com
  23. 23. challenge problem #3
     ■ Using the /london_medals/summer_games data, find the average age at which athletes win medals
     Download dataset at http://labcoat.precog.com
  24. 24. challenge problem #4
     ■ Using the /london_medals/summer_games data, find the most common age to win a medal
     Download dataset at http://labcoat.precog.com
  25. 25. Thank you!
     Follow me on Twitter: @jdegoes
     Learn more about R: r-project.org
     Download R: r-project.org/mirrors.html
     Sign up for a free Precog account: precog.com
     Learn more about Quirrel: quirrel-lang.org
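
The R half of the "sneak peek" slide can be tried without the original CSV file. What follows is a minimal sketch, assuming a small made-up data frame in place of pageViews.csv; the user IDs and durations are hypothetical and do not come from the talk. It flags users whose page-view duration is more than 1.5 standard deviations above the mean, as the slide does.

```r
# Hypothetical sample data standing in for pageViews.csv (not from the slides).
pageViews <- data.frame(
  userId   = c("u1", "u2", "u3", "u4", "u5", "u6"),
  duration = c(10, 12, 11, 9, 10, 100),
  stringsAsFactors = FALSE
)

# Mean duration and the 1.5-standard-deviation margin used on the slide.
avg   <- mean(pageViews$duration)
bound <- 1.5 * sd(pageViews$duration)

# Keep only the userId column for rows whose duration exceeds avg + bound.
userIds <- subset(pageViews, duration > avg + bound, select = userId)
print(userIds)
```

With this sample the threshold is roughly 80, so only the user with the 100-unit visit (u6) is returned. Note that subset() with select= yields a one-column data frame rather than a plain vector; use userIds$userId if a vector is needed.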
