Cascalog at May Bay Area Hadoop User Group

Presentation about Cascalog, a Clojure-based query language for Hadoop.

  • Cascalog at May Bay Area Hadoop User Group

    1. Cascalog: powerful and easy-to-use data analysis tool for Hadoop. Nathan Marz, BackType
    2. About Me: Tech Lead at BackType; have been working on many-terabyte-scale systems for two years; ETL workflows; data warehouses
    3. Presentation Overview: 1) High-level introduction to Cascalog 2) Demo 3) Cascalog at BackType
    4. What is Cascalog? Query language for Hadoop; queries are written as regular Clojure code; alternative to Pig and Hive
    5. What is Clojure? Functional language that compiles to Java bytecode; Lisp-based; first-class integration with Java
    6. Features: inner and outer joins; aggregators; functions; subqueries; sorting; arbitrary inputs and outputs
    7. What sets Cascalog apart?
    8. What sets Cascalog apart? Fully integrated in a general-purpose programming language
    9. What sets Cascalog apart? Full power of Clojure available at all times
    10. What sets Cascalog apart? Full power of Clojure available at all times
    11. What sets Cascalog apart? Custom operations: no UDF interface, just Clojure functions
    12. What sets Cascalog apart? Dynamic queries: write functions that return queries; manipulate queries as first-class entities in the language
    13. What sets Cascalog apart? Use Cascalog side by side with other code: appends and distributed copies; consolidation; application logic
    14. Easy Experimentation: ships with a test dataset that can be queried locally (the “playground”); 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see README)
    15. Demo time!
    16. Cascalog at BackType: BackType collects data about conversations around the web: tweets; blog comments; social news; people
    17. Cascalog at BackType
    18. Cascalog at BackType: Cascalog is used to:
    19. Cascalog at BackType: Cascalog is used to: identify influencers
    20. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter
    21. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter; identify “interesting tweets”
    22. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter; identify “interesting tweets”; study social engagement of domains over time
    23. Cascalog at BackType: Cascalog is used to: identify influencers; determine number of people exposed to URLs on Twitter; identify “interesting tweets”; study social engagement of domains over time; etc.
    24. Cascalog at BackType: Input and output: Cascalog reads from MySQL databases and HDFS; Cascalog writes to Cassandra and HDFS
    25. Cascalog at BackType: Rapid development: local playground dataset for development; develop queries in the REPL
    26. Cascalog Roadmap: optimized joins (replicated joins, Bloom joins); negations; recursion
    27. Questions? Project page: http://www.github.com/nathanmarz/cascalog Tutorial: http://nathanmarz.com/blog/introducing-cascalog Follow me on Twitter: @nathanmarz
    28. Clojure and Cascalog: Provided by Clojure: module system; dynamic queries; custom operations; interactive REPL
    29. Cascading and Cascalog: Provided by Cascading: tuple abstraction and tuple manipulation; workflow to MapReduce translation; read and write from anywhere with Taps
