Presentation I gave at the Bay Area Clojure Meetup Group on May 6th, 2010. I also demoed examples from the introductory tutorial:

http://nathanmarz.com/blog/introducing-cascalog/

Transcript

  • 1. Cascalog. Nathan Marz, BackType. Powerful and easy-to-use data analysis tool for Hadoop
  • 2. About Me: Tech Lead at BackType; have been working on many-terabyte-scale systems for two years; ETL workflows; data warehouses
  • 3. What is Hadoop? Distributed filesystem; MapReduce framework; scales to thousands of machines and petabytes of data
  • 4. What is Cascalog? Clojure-based query language for Hadoop with Datalog-inspired syntax; queries compile to one or more MapReduce jobs; the tool I wish I had two years ago
  • 5. Features: inner and outer joins; aggregators; functions; subqueries; sorting; high performance
  • 6. What sets Cascalog apart? Super simple; full power of Clojure always available; easy to extend with custom operations; dynamic queries; arbitrary inputs and outputs
  • 8. Experiment with Cascalog: ships with a test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see README)
  • 9. News feed generator: ranks events in a social network for each person based on “importance” and recency; 38 lines of code
  • 10. Demo time!
  • 11. News Feed: “Follows” and “Action” data sources; text files on HDFS
  • 12. News Feed
  • 13. News Feed: custom aggregator to produce a news feed in JSON-like form
  • 14. News Feed: custom function to score each item in the feed
  • 15. News Feed: data sources
  • 16. News Feed: subquery to compute follower count for each person
  • 17. News Feed: tie everything together in a single Cascalog query
  • 18. Questions? Project page: http://www.github.com/nathanmarz/cascalog; Tutorial: http://nathanmarz.com/blog/introducing-cascalog; Follow me on Twitter: @nathanmarz
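
The code shown on the news feed slides did not survive in the transcript, so here is a rough plain-Python sketch of the same steps: two data sources, a follower-count subquery, a scoring function, and a per-person aggregation. It is not Cascalog code and not the author's implementation. The record layouts, the function names, and especially the scoring rule (follower count divided by event age) are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical record layouts; the slides do not show the real file formats.
FOLLOWS = [  # (follower, followee)
    ("alice", "bob"),
    ("carol", "bob"),
    ("alice", "carol"),
]
ACTIONS = [  # (actor, description, unix_timestamp)
    ("bob", "posted a photo", 1000),
    ("carol", "wrote a comment", 1900),
    ("bob", "joined a group", 1990),
]


def follower_counts(follows):
    """Analogue of the follower-count subquery: followee -> number of followers."""
    counts = defaultdict(int)
    for _follower, followee in follows:
        counts[followee] += 1
    return dict(counts)


def score(num_followers, timestamp, now):
    """Made-up scoring rule: more followers and newer events score higher."""
    age = max(now - timestamp, 1)  # clamp so the division never hits zero
    return num_followers / age


def news_feeds(follows, actions, now, limit=10):
    """Join follows with actions, score each event, keep the top events per reader."""
    counts = follower_counts(follows)

    # Index actions by the person who performed them.
    actions_by_actor = defaultdict(list)
    for actor, description, ts in actions:
        actions_by_actor[actor].append((description, ts))

    # Each reader sees the actions of the people they follow.
    feeds = defaultdict(list)
    for follower, followee in follows:
        for description, ts in actions_by_actor.get(followee, []):
            s = score(counts.get(followee, 0), ts, now)
            feeds[follower].append(
                {"actor": followee, "action": description, "score": s}
            )

    # Aggregate per reader: highest score first, truncated to `limit` items.
    return {
        person: sorted(items, key=lambda item: item["score"], reverse=True)[:limit]
        for person, items in feeds.items()
    }


feeds = news_feeds(FOLLOWS, ACTIONS, now=2000)
```

In this sketch each step is an ordinary in-memory loop. Per the slides, the Cascalog version expresses the same pipeline as a single query that compiles to one or more MapReduce jobs.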