Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data analysis with Cascalog, Nathan Marz, BackType


Presentation Transcript

  • Cascalog
    • Nathan Marz, BackType
    Powerful and easy-to-use data analysis tool for Hadoop
  • About Me
    • Tech Lead at BackType
    • Have been working on many-terabyte scale systems for two years
      • ETL workflows
      • Data warehouses
  • Presentation Overview
    • High-level introduction to Cascalog
    • Demo
    • Cascalog at BackType
  • What is Cascalog?
    • Query language for Hadoop
    • Queries are written as regular Clojure code
    • Alternative to Pig and Hive
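
To make "queries are written as regular Clojure code" concrete, here is a minimal sketch in the style of the public Cascalog tutorial. It assumes Cascalog is on the classpath and uses the `cascalog.api` namespace as found in the 1.x releases. The `age` dataset is a hypothetical in-memory stand-in, not data from the talk.

```clojure
;; Assumption: Cascalog is available as a project dependency.
(use 'cascalog.api)

;; Hypothetical in-memory dataset of [person age] tuples.
;; Cascalog accepts a plain vector of tuples as a data source.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])

;; ?<- both defines and runs a query; (stdout) is a sink that prints tuples.
;; Read it as: emit ?person for every [?person ?a] in age where ?a < 30.
(?<- (stdout) [?person]
     (age ?person ?a)
     (< ?a 30))
```

The query is an ordinary Clojure form, so it can be typed at a REPL, wrapped in a function, or stored in a var like any other expression.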
  • What is Clojure?
    • Functional language that compiles to Java bytecode
    • Lisp-based
    • First-class integration with Java
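
As a small illustration of the Java integration mentioned above, the following sketch uses only plain Clojure and the JDK. The function names `words` and `to-array-list` are hypothetical and chosen for this example.

```clojure
;; Plain Clojure; no libraries beyond the JDK are needed.
(import 'java.util.ArrayList)

(defn words
  "Split a sentence on whitespace by calling java.lang.String methods directly."
  [^String sentence]
  (seq (.split (.trim sentence) "\\s+")))

(defn to-array-list
  "Copy a Clojure collection into a mutable java.util.ArrayList."
  [^java.util.Collection coll]
  (ArrayList. coll))

;; Java method calls and constructors use ordinary Clojure call syntax.
(println (words "rapid and robust data analysis"))
(println (.size (to-array-list (words "one two three"))))
```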
  • Features
    • Inner and outer joins
    • Aggregators
    • Functions
    • Subqueries
    • Sorting
    • Arbitrary inputs and outputs
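
A few of the features listed above can be sketched as follows, again assuming the 1.x namespaces (`cascalog.api`, and `cascalog.ops` aliased as `c`; later releases may place the operations namespace elsewhere). The `age` and `gender` datasets are hypothetical.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical in-memory datasets.
(def age    [["alice" 28] ["bob" 33] ["chris" 40]])
(def gender [["alice" "f"] ["bob" "m"]])

;; Inner join: implied by reusing ?person in both predicates.
(?<- (stdout) [?person ?age ?gender]
     (age ?person ?age)
     (gender ?person ?gender))

;; Outer join: a !! variable is nullable, so people without a gender
;; record are kept, with a null in that position.
(?<- (stdout) [?person ?age !!gender]
     (age ?person ?age)
     (gender ?person !!gender))

;; Aggregator: count people per gender. The output variables that are
;; not produced by the aggregator act as the grouping key.
(?<- (stdout) [?gender ?count]
     (gender _ ?gender)
     (c/count ?count))
```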
  • What sets Cascalog apart?
    • Fully integrated in a general-purpose programming language
    • Full power of Clojure available at all times
  • What sets Cascalog apart?
    • Custom operations
      • No UDF interface
      • Just Clojure functions
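
The point about custom operations can be sketched like this. The example assumes the 1.x `defmapcatop` macro from `cascalog.api` (newer releases may prefer a differently named macro); `split-words` and the `sentence` dataset are hypothetical.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical in-memory dataset of one-field tuples.
(def sentence [["the quick brown fox"] ["the lazy dog"]])

;; A custom operation is an ordinary function body. defmapcatop marks it
;; as an operation that may emit zero or more output tuples per input.
(defmapcatop split-words [s]
  (seq (.split s "\\s+")))

;; Word count: :> binds the operation's output to ?word, and c/count
;; aggregates per distinct ?word.
(?<- (stdout) [?word ?count]
     (sentence ?s)
     (split-words ?s :> ?word)
     (c/count ?count))

;; Existing Clojure functions such as * can be used directly as operations.
(?<- (stdout) [?n ?doubled]
     ([[1] [2] [3]] ?n)
     (* 2 ?n :> ?doubled))
```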
  • What sets Cascalog apart?
    • Dynamic queries
      • Write functions that return queries
      • Manipulate queries as first-class entities in the language
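
A minimal sketch of a dynamic query, assuming the 1.x `cascalog.api` operators `<-` (define a query without running it) and `?-` (run a defined query). The function `younger-than` and the `age` dataset are hypothetical.

```clojure
(use 'cascalog.api)

;; Hypothetical in-memory dataset of [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40]])

(defn younger-than
  "Return a query (not yet executed) selecting people below max-age.
   Both the data source and the threshold are ordinary function arguments."
  [people max-age]
  (<- [?person ?a]
      (people ?person ?a)
      (< ?a max-age)))

;; Run the returned query explicitly with ?-.
(?- (stdout) (younger-than age 35))

;; Or use it as a subquery inside another query.
(let [young (younger-than age 35)]
  (?<- (stdout) [?person]
       (young ?person _)))
```

Because the query is just a value returned from a function, it can be parameterized, passed around, and composed without any templating step.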
  • What sets Cascalog apart?
    • Use Cascalog side by side with other code
      • Appends and Distributed Copies
      • Consolidation
      • Application logic
  • Easy Experimentation
    • Ships with test dataset that can be queried locally (the “playground”)
    • 5 minutes to set up Hadoop, Clojure, and Cascalog locally (see the README)
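
A first session against the playground might look like the sketch below, which follows the public Cascalog tutorial. It assumes the `cascalog.playground` namespace and its `bootstrap` macro are available in the installed version, and that the playground includes an `age` dataset of [person age] tuples.

```clojure
;; Load the playground datasets and pull in the Cascalog API.
(use 'cascalog.playground)
(bootstrap)

;; Print every person in the playground's age dataset who is 25.
;; A constant in a predicate acts as an equality filter.
(?<- (stdout) [?person]
     (age ?person 25))
```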
  • Demo time!
  • Cascalog at BackType
    • BackType collects data about conversations around the web
      • Tweets
      • Blog comments
      • Social news
      • People
  • Cascalog at BackType
    • Cascalog is used to:
      • Identify influencers
      • Determine number of people exposed to URLs on Twitter
      • Identify “interesting tweets”
      • Study social engagement of domains over time
      • Etc, etc.
  • Cascalog at BackType
    • Input and output
      • Cascalog reads from MySQL databases
      • Cascalog writes to Cassandra
  • Cascalog at BackType
    • Rapid development
      • Local playground dataset for development
      • Develop queries in the REPL
  • Cascalog Roadmap
    • Optimized joins:
      • Replicated joins
      • Bloom joins
    • Negations
    • Recursion
  • Questions?
    • Project page: http://www.github.com/nathanmarz/cascalog
    • Tutorial: http://nathanmarz.com/blog/introducing-cascalog
    • Follow me on Twitter: @nathanmarz
  • Clojure and Cascalog
    • Provided by Clojure:
      • Module system
      • Dynamic queries
      • Custom operations
      • Interactive REPL
  • Cascading and Cascalog
    • Provided by Cascading:
      • Tuple abstraction and tuple manipulation
      • Workflow to MapReduce translation
      • Read and write from anywhere with Taps
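
As a rough illustration of reading and writing through taps, the sketch below copies lines from one local text file to another location. It assumes the `lfs-textline` tap constructor and its `:sinkmode` option from the 1.x `cascalog.api` (`hfs-textline` is the HDFS counterpart). Both paths are hypothetical scratch locations.

```clojure
(use 'cascalog.api)
(require '[clojure.java.io :as io])

;; Hypothetical scratch paths on the local filesystem.
(def in-path  "/tmp/cascalog-taps-demo/in.txt")
(def out-path "/tmp/cascalog-taps-demo/out")

;; Create a small input file so the example is self-contained.
(io/make-parents in-path)
(spit in-path "first line\nsecond line\n")

;; The source tap is used like any other data source in a predicate;
;; the sink tap takes the place of (stdout). :sinkmode :replace
;; overwrites output left behind by a previous run.
(let [lines (lfs-textline in-path)]
  (?<- (lfs-textline out-path :sinkmode :replace)
       [?line]
       (lines ?line)))
```

Swapping the tap constructors is the only change needed to point the same query at a different storage system, which is the sense in which inputs and outputs are arbitrary.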