Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data analysis with Cascalog, Nathan Marz, BackType
Published in: Technology
Transcript

  • 1. Cascalog
    • Nathan Marz, BackType
    • Powerful and easy-to-use data analysis tool for Hadoop
  • 2. About Me
    • Tech Lead at BackType
    • Have been working on many-terabyte scale systems for two years
      • ETL workflows
      • Data warehouses
  • 3. Presentation Overview
    • High level introduction to Cascalog
    • Demo
    • Cascalog at BackType
  • 4. What is Cascalog?
    • Query language for Hadoop
    • Queries are written as regular Clojure code
    • Alternative to Pig and Hive
  • 5. What is Clojure?
    • Functional language that compiles to Java bytecode
    • Lisp-based
    • First-class integration with Java
  • 6. Features
    • Inner and outer joins
    • Aggregators
    • Functions
    • Subqueries
    • Sorting
    • Arbitrary inputs and outputs
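The features listed on slide 6 can be illustrated with a minimal sketch against the playground dataset that ships with Cascalog. It assumes the namespaces `cascalog.api`, `cascalog.playground`, and `cascalog.ops` as used in the Cascalog tutorial of this period (later releases may place them elsewhere), and the playground datasets `age` and `gender`. The namespace name `demo.joins` is hypothetical.

```clojure
(ns demo.joins
  (:use cascalog.api cascalog.playground)
  (:require [cascalog.ops :as c]))

;; Inner join: ?person appears in both generators, so only people
;; present in both the age and gender datasets are emitted.
(?<- (stdout) [?person ?age ?gender]
     (age ?person ?age)
     (gender ?person ?gender))

;; Outer join: a variable prefixed with !! may be left unbound, so
;; people with no gender record are kept with a null gender.
(?<- (stdout) [?person ?age !!gender]
     (age ?person ?age)
     (gender ?person !!gender))

;; Aggregator: count the people younger than 30.
(?<- (stdout) [?count]
     (age _ ?a)
     (< ?a 30)
     (c/count ?count))
```

Each `?<-` form both defines and runs a query. `(stdout)` is an output tap that prints the resulting tuples, which is convenient for experiments at the REPL.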
  • 7. What sets Cascalog apart?
  • 8. What sets Cascalog apart? Fully integrated in a general purpose programming language
  • 9. What sets Cascalog apart? Full power of Clojure available at all times
  • 10. What sets Cascalog apart? Full power of Clojure available at all times
  • 11. What sets Cascalog apart?
    • Custom operations
      • No UDF interface
      • Just Clojure functions
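As a rough sketch of what "just Clojure functions" on slide 11 can look like, the example below uses one plain function and one operation that emits several tuples per input. It assumes the tutorial-era namespaces `cascalog.api`, `cascalog.playground`, and `cascalog.ops`, and the playground datasets `age` and `sentence`. The names `demo.custom-ops`, `age-bracket`, and `split-words` are hypothetical.

```clojure
(ns demo.custom-ops
  (:use cascalog.api cascalog.playground)
  (:require [cascalog.ops :as c]))

;; An ordinary Clojure function; nothing Cascalog-specific here.
(defn age-bracket [a]
  (if (< a 30) "under-30" "30-plus"))

;; defmapcatop defines an operation that may emit zero or more
;; output tuples per input tuple (here, one per word).
(defmapcatop split-words [line]
  (seq (.split line "\\s+")))

;; Use the plain function inside a query; :> names its output variable.
(?<- (stdout) [?person ?bracket]
     (age ?person ?a)
     (age-bracket ?a :> ?bracket))

;; Word count: split each sentence, then count occurrences per word.
(?<- (stdout) [?word ?count]
     (sentence ?s)
     (split-words ?s :> ?word)
     (c/count ?count))
```

Because the operations are regular vars in a regular namespace, they can be tested and reused like any other Clojure code.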
  • 12. What sets Cascalog apart?
    • Dynamic queries
      • Write functions that return queries
      • Manipulate queries as first-class entities in the language
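The idea on slide 12 of functions that return queries can be sketched as follows. It assumes the tutorial-era `cascalog.api` operators `<-` (define a query), `?-` (execute a query), and `?<-` (define and execute), plus the playground datasets `age` and `gender`. The names `demo.dynamic` and `younger-than` are hypothetical.

```clojure
(ns demo.dynamic
  (:use cascalog.api cascalog.playground))

;; A normal function that builds and returns a query.
;; max-age is an ordinary Clojure argument captured by the query.
(defn younger-than [max-age]
  (<- [?person ?a]
      (age ?person ?a)
      (< ?a max-age)))

;; Execute the returned query, printing results to stdout.
(?- (stdout) (younger-than 30))

;; The returned query is a value, so it can be bound to a name and
;; used as a generator (subquery) inside another query.
(let [young (younger-than 30)]
  (?<- (stdout) [?person ?gender]
       (young ?person _)
       (gender ?person ?gender)))
```

Separating query definition (`<-`) from execution (`?-`) is what allows queries to be passed around, parameterized, and composed before anything runs on Hadoop.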
  • 13. What sets Cascalog apart?
    • Use Cascalog side by side with other code
      • Appends and Distributed Copies
      • Consolidation
      • Application logic
  • 14. Easy Experimentation
    • Ships with test dataset that can be queried locally (the “playground”)
    • 5 minutes to set up Hadoop, Clojure, and Cascalog locally - see README
  • 15. Demo time!
  • 16. Cascalog at BackType
    • BackType collects data about conversations around the web
      • Tweets
      • Blog comments
      • Social news
      • People
  • 17. Cascalog at BackType
    • Cascalog is used to:
      • Identify influencers
      • Determine number of people exposed to URLs on Twitter
      • Identify “interesting tweets”
      • Study social engagement of domains over time
      • Etc, etc.
  • 18. Cascalog at BackType
    • Input and output
      • Cascalog reads from MySQL databases
      • Cascalog writes to Cassandra
  • 19. Cascalog at BackType
    • Rapid development
      • Local playground dataset for development
      • Develop queries in the REPL
  • 20. Cascalog Roadmap
    • Optimized joins:
      • Replicated joins
      • Bloom joins
    • Negations
    • Recursion
  • 21. Questions?
    • Project page: http://www.github.com/nathanmarz/cascalog
    • Tutorial: http://nathanmarz.com/blog/introducing-cascalog
    • Follow me on Twitter: @nathanmarz
  • 22. Clojure and Cascalog
    • Provided by Clojure:
      • Module system
      • Dynamic queries
      • Custom operations
      • Interactive REPL
  • 23. Cascading and Cascalog
    • Provided by Cascading:
      • Tuple abstraction and tuple manipulation
      • Workflow to MapReduce translation
      • Read and write from anywhere with Taps
