Cascalog at Strange Loop

Presentation of Cascalog at Strange Loop on October 15th, 2010.

http://github.com/nathanmarz/cascalog

Transcript

  • 1. Cascalog Data processing on Hadoop without the hassle Nathan Marz BackType @nathanmarz
  • 2. What is Cascalog? Layers of abstraction, highest first: Cascalog (variables and logic); Cascading (tuples, data workflows); MapReduce (key/value pairs, aggregation)
  • 3. Cascalog’s components Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
  • 4. Clojure • General purpose programming language • Dialect of Lisp that compiles to Java bytecode
  • 5. Clojure • “Programmable programming language”: Easy to build Domain Specific Languages (DSL) in Clojure
  • 6. Clojure examples (Clojure code → result): (+ 1 2 3) → 6; (> 20 18) → true; (defn incr [x] (+ 1 x)) (incr 3) → 4
  • 7. Cascalog basics The “age” dataset
  • 8. Cascalog basics
  • 9. Cascalog basics Define and execute a query
  • 10. Cascalog basics Where to emit results Define and execute a query
  • 11. Cascalog basics Where to emit results Output variables Define and execute a query
  • 12. Cascalog basics Where to emit results Output variables “Predicates”: constrain the output variables Define and execute a query
  • 13. Predicates
  • 14. Predicates Input fields
  • 15. Predicates Input fields Output fields
  • 16. Predicates Fields can be constants or variables
  • 17. Predicates Fields can be constants or variables Variables are prefixed with ? or !
  • 18. Predicates
  • 19. Predicates • Functions • Filters • Aggregators • Generators: finite sources of tuples
  • 20. Example #1 Generator Filter
  • 21. Example #2 Generator Function
  • 22. Example #3 Generator Aggregator Filter
  • 23. Join example
  • 24. Join example Triggers a join
  • 25. Join example
  • 26. Join example Joins are an implementation detail
  • 27. Demo time!
  • 28. Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
  • 29. Accidental complexity Complexity caused by the tool used to solve a problem rather than the problem itself
  • 30. Accidental complexity • Distinct query languages cause accidental complexity • Example: SQL injection
  • 31. Query language • We want: • Ability to abstract • Ability to compose
  • 32. Abstraction Clojure function that returns a subquery
  • 33. Abstraction Defining and using custom operation
  • 34. Composability Dynamic query with parameterized operation
  • 35. Composability “Predicate macro”
  • 36. Composability expands to Using a predicate macro
  • 37. Contrast to Pig “Average” is 300 lines of code in Pig
  • 38. Optimized aggregators in Cascalog Implementation of count and sum
  • 39. Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
  • 40. Composability Value normalization example #1
  • 41. Composability Value normalization example #2
  • 42. Composability For each id: select value with the biggest timestamp Value normalization algorithm
  • 43. Composability Implementing value normalization
  • 44. Composability Using value normalization
  • 45. Try Cascalog yourself! Project page: http://www.github.com/nathanmarz/cascalog Introductory tutorial: http://nathanmarz.com/blog/introducing-cascalog/ 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
  • 46. BackType is hiring Think Cascalog’s cool? Come build amazing software at BackType. http://www.backtype.com/jobs
  • 47. Questions? Follow me on Twitter at @nathanmarz nathan.marz@gmail.com
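
Slides 7 to 22 describe queries whose code appeared only on the slide images. The following is a minimal sketch of what such queries can look like, not the code from the talk. It assumes Cascalog 1.x is on the classpath (namespaces `cascalog.api` and `cascalog.ops`; later releases reorganized some namespaces), and the in-memory `age` and `follows` datasets are hypothetical stand-ins for the talk's data. `??<-` defines and runs a query and returns the resulting tuples instead of writing them to a sink.

```clojure
(ns slides.basics
  ;; Assumption: Cascalog 1.x namespace layout.
  (:require [cascalog.api :refer [??<-]]
            [cascalog.ops :as c]))

;; Hypothetical in-memory datasets; a vector of tuples can act as a generator.
(def age     [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])
(def follows [["alice" "bob"] ["alice" "chris"] ["alice" "david"] ["bob" "david"]])

;; A constant in a field position constrains that field: who is exactly 25?
(def aged-25
  (??<- [?person]
        (age ?person 25)))

;; Generator + filter: people younger than 30.
(def young
  (??<- [?person ?age]
        (age ?person ?age)
        (< ?age 30)))

;; Generator + function: :> names the function's output field.
(def doubled
  (??<- [?person ?double-age]
        (age ?person ?age)
        (* 2 ?age :> ?double-age)))

;; Generator + aggregator + filter: people who follow more than two others.
;; _ ignores a field; the count is grouped by ?person, the only other
;; output variable.
(def heavy-followers
  (??<- [?person ?count]
        (follows ?person _)
        (c/count ?count)
        (> ?count 2)))
```

To emit results to a sink instead, as on slide 10, `?<-` takes the sink as its first argument, for example `(?<- (stdout) [?person] (age ?person 25))`.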
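
Slides 23 to 26 say that a join is triggered implicitly and is an implementation detail. A hedged sketch of that idea, again assuming Cascalog 1.x and using hypothetical in-memory `age` and `gender` datasets: reusing the same variable in two generators is what causes the join.

```clojure
(ns slides.joins
  ;; Assumption: Cascalog 1.x on the classpath.
  (:require [cascalog.api :refer [??<-]]))

;; Hypothetical datasets. "david" has no gender row and "emily" has no age row.
(def age    [["alice" 28] ["bob" 33] ["david" 25]])
(def gender [["alice" "f"] ["bob" "m"] ["emily" "f"]])

;; ?person appears in both generators, so tuples are matched on it.
;; No join keyword is written; the shared variable triggers an inner join.
(def age-and-gender
  (??<- [?person ?age ?gender]
        (age ?person ?age)
        (gender ?person ?gender)))
```

Only people present in both datasets appear in the result, which is the inner-join behavior the shared variable implies.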
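
Slides 31 and 32 ask for the ability to abstract and compose, with a Clojure function that returns a subquery as the example. The sketch below illustrates that pattern under the same assumptions (Cascalog 1.x; hypothetical datasets; `younger-than` is an illustrative name, not one from the talk). `<-` builds a query without running it, and the result can be used as a generator inside another query.

```clojure
(ns slides.compose
  ;; Assumption: Cascalog 1.x namespace layout.
  (:require [cascalog.api :refer [<- ??<-]]
            [cascalog.ops :as c]))

;; Hypothetical in-memory datasets.
(def age     [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])
(def follows [["alice" "bob"] ["alice" "chris"] ["bob" "david"] ["david" "alice"]])

(defn younger-than
  "Returns a subquery (not yet executed) of the people in `src` whose age is
   below `limit`. Both arguments are ordinary Clojure values."
  [src limit]
  (<- [?person]
      (src ?person ?age)
      (< ?age limit)))

;; Compose: use the subquery as a generator, join it with follows on ?person,
;; and count how many people each young person follows.
(def young-follow-counts
  (let [young (younger-than age 30)]
    (??<- [?person ?count]
          (young ?person)
          (follows ?person _)
          (c/count ?count))))
```

Because the query is ordinary Clojure data built by an ordinary function, parameterizing it needs no string templating, which is the contrast with separate query languages drawn on slide 30.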
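
Slide 42 states the value normalization algorithm as "for each id: select value with the biggest timestamp". The sketch below shows only that selection rule in plain Clojure, with no Cascalog or Hadoop involved; it is not the implementation from slide 43. The `[id value timestamp]` record layout and the name `latest-values` are assumptions made for illustration.

```clojure
(ns slides.normalize)

(defn latest-values
  "For each id, keeps the value whose timestamp is largest.
   `records` is a seq of [id value timestamp] tuples (hypothetical layout).
   Returns a map from id to the selected value."
  [records]
  (into {}
        (for [[id rows] (group-by first records)]
          ;; Pick the row with the largest timestamp (third element).
          (let [[_ value _] (apply max-key #(nth % 2) rows)]
            [id value]))))

;; Example: id 1 has two observations; the one at timestamp 20 wins.
(def example
  (latest-values [[1 "a" 10] [1 "b" 20] [2 "c" 5]]))
```

In a distributed setting the same rule would run as a per-id aggregation, which is where a query language with composable aggregators helps.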