Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data analysis with Cascalog, Nathan Marz, BackType

  • 1. Cascalog
    • Nathan Marz, BackType
    Powerful and easy-to-use data analysis tool for Hadoop
  • 2. About Me
    • Tech Lead at BackType
    • Have been working on multi-terabyte-scale systems for two years
      • ETL workflows
      • Data warehouses
  • 3. Presentation Overview
    • High level introduction to Cascalog
    • Demo
    • Cascalog at BackType
  • 4. What is Cascalog?
    • Query language for Hadoop
    • Queries are written as regular Clojure code
    • Alternative to Pig and Hive
  • 5. What is Clojure?
    • Functional language that compiles to Java bytecode
    • Lisp-based
    • First-class integration with Java
  • 6. Features
    • Inner and outer joins
    • Aggregators
    • Functions
    • Subqueries
    • Sorting
    • Arbitrary inputs and outputs
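Cascalog expresses these features as Clojure queries over tuple sources. The Python sketch below is a language-neutral illustration only. It shows what an inner join, a left outer join, a filter, an aggregator, and sorting compute over small in-memory tuple sets. The data sets `AGE` and `FOLLOWS` and all function names are hypothetical, and this is not how Cascalog executes queries on Hadoop.

```python
from collections import defaultdict

# Hypothetical tuple sets (illustration only): (person, age) and (follower, followee).
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]
FOLLOWS = [("alice", "david"), ("alice", "bob"), ("bob", "chris"), ("david", "alice")]

def _index(rows):
    """Group (key, value) pairs by key, preserving input order."""
    index = defaultdict(list)
    for key, value in rows:
        index[key].append(value)
    return index

def inner_join(left, right):
    """Keep only keys present on both sides; emit (key, left_value, right_value)."""
    index = _index(right)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]

def left_outer_join(left, right):
    """Keep every left tuple; unmatched ones get None as the right value."""
    index = _index(right)
    return [(k, lv, rv) for k, lv in left for rv in (index.get(k) or [None])]

def follow_counts(max_age):
    """Filter, aggregate, then sort: how many people each person under max_age follows."""
    counts = defaultdict(int)
    for person, age, _followee in inner_join(AGE, FOLLOWS):
        if age < max_age:
            counts[person] += 1
    return sorted(counts.items())
```

With this data, `follow_counts(30)` returns `[("alice", 2), ("david", 1)]`, and the outer join keeps `("chris", 40, None)` even though chris follows nobody.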
  • 7. What sets Cascalog apart?
  • 8. What sets Cascalog apart?
    • Fully integrated in a general-purpose programming language
  • 9. What sets Cascalog apart?
    • Full power of Clojure available at all times
  • 10. What sets Cascalog apart?
    • Full power of Clojure available at all times
  • 11. What sets Cascalog apart?
    • Custom operations
      • No UDF interface
      • Just Clojure functions
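The point of this slide is that a custom operation is an ordinary function, not an implementation of a special UDF interface. The Python sketch below is an analogy for that idea, not Cascalog's API, where operations are written in Clojure. `split_words` is a plain function. The hypothetical helper `mapcat` applies any callable that emits zero or more outputs per input tuple.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List

def split_words(sentence: str) -> List[str]:
    """An ordinary function: no base class, no registration step."""
    return re.findall(r"[a-z']+", sentence.lower())

def mapcat(op: Callable[[str], Iterable[str]], rows: Iterable[str]) -> Iterator[str]:
    """Apply any callable that returns zero or more outputs for each input row."""
    for row in rows:
        yield from op(row)

def word_count(sentences: Iterable[str]) -> Dict[str, int]:
    """Word count built from a plain function plus a counting aggregator."""
    return dict(Counter(mapcat(split_words, sentences)))
```

For example, `word_count(["The quick fox", "the lazy dog"])["the"]` is `2`. Swapping in a different tokenizer means passing a different function.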
  • 12. What sets Cascalog apart?
    • Dynamic queries
      • Write functions that return queries
      • Manipulate queries as first-class entities in the language
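To illustrate what "functions that return queries" means, the Python sketch below treats a query as an ordinary value: a function from rows to rows. The builder decides at runtime which predicates to include. `where`, `and_then`, `age_range_query`, and the `(person, age)` rows are hypothetical names for this analogy. They are not Cascalog constructs, which are Clojure forms.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[str, int]                        # hypothetical (person, age) tuple
Query = Callable[[Iterable[Row]], List[Row]]

def where(pred: Callable[[Row], bool]) -> Query:
    """A query that keeps rows satisfying pred."""
    return lambda rows: [r for r in rows if pred(r)]

def and_then(*queries: Query) -> Query:
    """Compose queries into one query that runs them in sequence."""
    def run(rows: Iterable[Row]) -> List[Row]:
        for q in queries:
            rows = q(rows)
        return list(rows)
    return run

def age_range_query(lo: Optional[int] = None, hi: Optional[int] = None) -> Query:
    """Build a query dynamically: only add predicates for the bounds that are given."""
    steps = []
    if lo is not None:
        steps.append(where(lambda r: r[1] >= lo))
    if hi is not None:
        steps.append(where(lambda r: r[1] < hi))
    return and_then(*steps)
```

Because the returned query is just a value, it can be stored, passed to other functions, or composed further before it is run.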
  • 13. What sets Cascalog apart?
    • Use Cascalog side by side with other code
      • Appends and distributed copies
      • Consolidation
      • Application logic
  • 14. Easy Experimentation
    • Ships with test dataset that can be queried locally (the “playground”)
    • About 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see the README)
  • 15. Demo time!
  • 16. Cascalog at BackType
    • BackType collects data about conversations around the web
      • Tweets
      • Blog comments
      • Social news
      • People
  • 17. Cascalog at BackType
    • Cascalog is used to:
      • Identify influencers
      • Determine the number of people exposed to URLs on Twitter
      • Identify “interesting tweets”
      • Study social engagement of domains over time
      • Etc.
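As an illustration of the URL-exposure bullet, the sketch below counts, for each URL, the distinct people who follow at least one user who tweeted it. That definition of exposure, the `TWEETS` and `FOLLOWERS` data, and the `reach` function are assumptions made for illustration. The slide does not say how BackType defines or computes this number. Counting a set rather than summing follower counts matters, because one person can follow several users who tweeted the same URL.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical data: (user, url) tweets and (user, follower) edges.
TWEETS = [("alice", "http://a.example"), ("bob", "http://a.example"), ("carol", "http://b.example")]
FOLLOWERS = [("alice", "x"), ("alice", "y"), ("bob", "y"), ("bob", "z"), ("carol", "x")]

def reach(tweets: Iterable[Tuple[str, str]],
          followers: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct followers exposed to each URL (assumed definition)."""
    followers_of: Dict[str, Set[str]] = defaultdict(set)
    for user, follower in followers:
        followers_of[user].add(follower)
    exposed: Dict[str, Set[str]] = defaultdict(set)
    for user, url in tweets:
        exposed[url] |= followers_of[user]   # set union deduplicates shared followers
    return {url: len(people) for url, people in exposed.items()}
```

Here `y` follows both alice and bob but is counted once, so `http://a.example` reaches 3 people rather than 4.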
  • 18. Cascalog at BackType
    • Input and output
      • Cascalog reads from MySQL databases
      • Cascalog writes to Cassandra
  • 19. Cascalog at BackType
    • Rapid development
      • Local playground dataset for development
      • Develop queries in the REPL
  • 20. Cascalog Roadmap
    • Optimized joins:
      • Replicated joins
      • Bloom joins
    • Negations
    • Recursion
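Replicated joins and bloom joins are listed here as roadmap items, not as available features. In general terms, a replicated join sends a small input to every mapper so the join happens map-side against an in-memory index. A bloom join first builds a compact Bloom filter of one side's join keys, which can cheaply discard most non-matching tuples on the other side before the expensive shuffle. The Python sketch below shows the bloom-join idea on a single machine. `BloomFilter`, `bloom_join`, and the default sizes are hypothetical and are not Cascalog code.

```python
import hashlib
from collections import defaultdict

class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def bloom_join(small, large, num_bits=1024, num_hashes=3):
    """Join (key, value) lists; returns (joined rows, number of large-side candidates)."""
    bloom = BloomFilter(num_bits, num_hashes)
    for key, _ in small:
        bloom.add(key)
    # Step 1: drop large-side tuples whose key is definitely not in `small`.
    candidates = [(k, v) for k, v in large if bloom.might_contain(k)]
    # Step 2: exact hash join on the survivors; this removes any false positives.
    small_index = defaultdict(list)
    for k, v in small:
        small_index[k].append(v)
    joined = [(k, sv, lv) for k, lv in candidates for sv in small_index.get(k, [])]
    return joined, len(candidates)
```

Step 2 is an in-memory hash join against the small side, which is the same shape of work a replicated join does on each mapper.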
  • 21. Questions?
    • Project page: http://www.github.com/nathanmarz/cascalog
    • Tutorial: http://nathanmarz.com/blog/introducing-cascalog
    • Follow me on Twitter: @nathanmarz
  • 22. Clojure and Cascalog
    • Provided by Clojure:
      • Module system
      • Dynamic queries
      • Custom operations
      • Interactive REPL
  • 23. Cascading and Cascalog
    • Provided by Cascading:
      • Tuple abstraction and tuple manipulation
      • Workflow to MapReduce translation
      • Read and write from anywhere with Taps
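Translating a workflow into MapReduce means expressing each operation as map, shuffle, and reduce steps. As one standard example of that kind of translation, the Python sketch below simulates a reduce-side join in memory. Mappers tag each tuple with its source. The shuffle groups tuples by join key. Reducers pair up the left and right values for each key. This is a hedged illustration of the general technique. It is not Cascading's planner, and the function names and tags (`"L"`, `"R"`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

def map_phase(tagged_inputs):
    """Emit (join_key, (source_tag, value)) for every tuple of every input."""
    for tag, rows in tagged_inputs.items():
        for key, value in rows:
            yield key, (tag, value)

def shuffle(pairs):
    """Group mapper output by key and sort by key, as Hadoop's shuffle/sort step does."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return dict(sorted(groups.items()))

def reduce_join(key, tagged_values, left_tag="L", right_tag="R"):
    """Emit the cross product of left and right values that share this key."""
    lefts = [v for t, v in tagged_values if t == left_tag]
    rights = [v for t, v in tagged_values if t == right_tag]
    for lv, rv in product(lefts, rights):
        yield key, lv, rv

def reduce_side_join(left, right):
    """One simulated MapReduce job computing an inner join of two (key, value) lists."""
    out = []
    for key, values in shuffle(map_phase({"L": left, "R": right})).items():
        out.extend(reduce_join(key, values))
    return out
```

Keys that appear on only one side produce an empty cross product, so they drop out, which gives inner-join semantics.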