Cascalog at Strange Loop

Presentation of Cascalog at Strange Loop on October 15th, 2010.

http://github.com/nathanmarz/cascalog

Published in: Technology

  • Transcript

    • 1. Cascalog: Data processing on Hadoop without the hassle. Nathan Marz, BackType, @nathanmarz
    • 2. What is Cascalog? Layers of abstraction, from highest to lowest: Cascalog (variables and logic); Cascading (tuples, data workflows); MapReduce (key/value pairs, aggregation)
    • 3. Cascalog’s components Cascading (the job execution engine) + Datalog (basis of the API design) + Clojure (the host programming language)
    • 4. Clojure • General purpose programming language • Dialect of Lisp that compiles to Java bytecode
    • 5. Clojure • “Programmable programming language”: Easy to build Domain Specific Languages (DSL) in Clojure
    • 6. Clojure examples (Clojure code → result): (+ 1 2 3) → 6; (> 20 18) → true; (defn incr [x] (+ 1 x)) followed by (incr 3) → 4
    • 7. Cascalog basics The “age” dataset
    • 8. Cascalog basics
    • 9. Cascalog basics Define and execute a query
    • 10. Cascalog basics Where to emit results Define and execute a query
    • 11. Cascalog basics Where to emit results Output variables Define and execute a query
    • 12. Cascalog basics Where to emit results “Predicates”: constrain the output variables Output variables Define and execute a query
    • 13. Predicates
    • 14. Predicates Input fields
    • 15. Predicates Input fields Output fields
    • 16. Predicates Fields can be constants or variables
    • 17. Predicates Fields can be constants or variables Variables are prefixed with ? or !
    • 18. Predicates
    • 19. Predicates • Functions • Filters • Aggregators • Generators: finite sources of tuples
    • 20. Example #1 Generator Filter
    • 21. Example #2 Generator Function
    • 22. Example #3 Generator Aggregator Filter
    • 23. Join example
    • 24. Join example Triggers a join
    • 25. Join example
    • 26. Join example Joins are an implementation detail
    • 27. Demo time!
    • 28. Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
    • 29. Accidental complexity Complexity caused by the tool used to solve a problem rather than the problem itself
    • 30. Accidental complexity • Distinct query languages cause accidental complexity • Example: SQL injection
    • 31. Query language • We want: • Ability to abstract • Ability to compose
    • 32. Abstraction Clojure function that returns a subquery
    • 33. Abstraction Defining and using custom operation
    • 34. Composability Dynamic query with parameterized operation
    • 35. Composability “Predicate macro”
    • 36. Composability expands to Using a predicate macro
    • 37. Contrast to Pig “Average” is 300 lines of code in Pig
    • 38. Optimized aggregators in Cascalog Implementation of count and sum
    • 39. Why another query language for Hadoop? Existing tools cause too much Accidental Complexity
    • 40. Composability Value normalization example #1
    • 41. Composability Value normalization example #2
    • 42. Composability For each id: select value with the biggest timestamp Value normalization algorithm
    • 43. Composability Implementing value normalization
    • 44. Composability Using value normalization
    • 45. Try Cascalog yourself! Project page (http://www.github.com/nathanmarz/cascalog), introductory tutorial (http://nathanmarz.com/blog/introducing-cascalog/). 5 minutes to install Clojure, Hadoop, and Cascalog locally! See the project README
    • 46. BackType is hiring Think Cascalog’s cool? Come build amazing software at BackType. http://www.backtype.com/jobs
    • 47. Questions? Follow me on Twitter at @nathanmarz, or email nathan.marz@gmail.com
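
The transcript does not preserve the code shown on the slides. As a rough illustration of the value normalization algorithm stated on slide 42 ("for each id: select value with the biggest timestamp"), here is a minimal sketch in plain Clojure over an in-memory collection. It is not the Cascalog implementation from the talk: the function name `normalize-values`, the record keys `:id`, `:value`, and `:timestamp`, and the sample data are all assumptions made for this example.

```clojure
;; Hypothetical helper, not taken from the slides: a plain-Clojure sketch of
;; "for each id, select the value with the biggest timestamp".
(defn normalize-values
  "Returns a map from id to the value of that id's record with the largest :timestamp."
  [records]
  (->> records
       (group-by :id)                       ; {id [record ...]}
       (map (fn [[id recs]]
              ;; max-key picks the record whose :timestamp is greatest
              [id (:value (apply max-key :timestamp recs))]))
       (into {})))

;; Made-up sample data for illustration only.
(def sample-records
  [{:id 1 :value "a" :timestamp 10}
   {:id 1 :value "b" :timestamp 30}
   {:id 2 :value "c" :timestamp 5}
   {:id 1 :value "d" :timestamp 20}])

(normalize-values sample-records)
```

In Cascalog the same idea would be expressed as a query with an aggregator rather than an in-memory `group-by`; the sketch only shows the logic being computed.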
