Cascading

High level overview of Cascading.

Published in: Technology, Education

  • Transcript

    • 1. Cascading. Nathan Marz, BackType
    • 2. What is Cascading? Cascading is a Java library that makes development of complex Hadoop MapReduce workflows easy
    • 3. Why Hadoop? • Process large amounts of data in a scalable, fault-tolerant way
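    • 4. Why Cascading? [Table with columns “Tool” and “How you feel” and rows “Hadoop MapReduce” and “Cascading”; the images in the “How you feel” column are not preserved in this transcript]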
    • 5. Tuples: Cascading represents all data as “Tuples”: (“the man sat”, 25), (“hello dolly”, 42), (“say hello”, 1), (“the woman sat”, 10)
    • 6. Tuples: Tuples are named, ordered fields. Fields [“sentence”, “value”]: (“the man sat”, 25), (“hello dolly”, 42), (“say hello”, 1), (“the woman sat”, 10)
    • 7. Flow: A flow is a sequence of manipulations on pipes of tuple streams • A flow compiles to one or more MapReduce jobs • Inputs and outputs are called “Taps” • Each Tap produces or receives a pipe of tuples that all share the same format • A flow can have multiple inputs and multiple outputs
    • 8. Example: Input [“sentence”, “value”], output [“word”, “sum”]. Get the sum of the values for each word
    • 9. Example: Start with [“sentence”, “value”]; Split(“sentence”) -> “word” gives [“word”, “value”]; GroupBy(“word”) gives [“word”, list<[“value”]>]; Sum(“value”) -> “sum” gives [“word”, “sum”]
    • 10. Example: Split(“sentence”) -> “word”. Input [“sentence”, “value”]: (“the man sat”, 25), (“hello dolly”, 42), (“say hello”, 1), (“the woman sat”, 10). Output [“word”, “value”]: (“the”, 25), (“man”, 25), (“sat”, 25), (“hello”, 42), (“dolly”, 42), (“say”, 1), (“hello”, 1), (“the”, 10), (“woman”, 10), (“sat”, 10)
    • 11. Example: GroupBy(“word”). Input [“word”, “value”]: (“the”, 25), (“man”, 25), (“sat”, 25), (“hello”, 42), (“dolly”, 42), (“say”, 1), (“hello”, 1), (“the”, 10), (“woman”, 10), (“sat”, 10). Output [“word”, list<[“value”]>]: (“the”, [25, 10]), (“man”, [25]), (“sat”, [25, 10]), (“hello”, [42, 1]), (“dolly”, [42]), (“say”, [1]), (“woman”, [10])
    • 12. Example: Sum(“value”) -> “sum”. Input [“word”, list<[“value”]>]: (“the”, [25, 10]), (“man”, [25]), (“sat”, [25, 10]), (“hello”, [42, 1]), (“dolly”, [42]), (“say”, [1]), (“woman”, [10]). Output [“word”, “sum”]: (“the”, 35), (“man”, 25), (“sat”, 35), (“hello”, 43), (“dolly”, 42), (“say”, 1), (“woman”, 10)
    • 13. More functionality • Inner and outer joins natively supported • Seamlessly branch and merge pipes of tuples • Integrate diverse data sources
    • 14. Why not Pig? • Pig is a custom language for writing MapReduce workflows • Because it’s a custom language, intermixing “plain logic” in between flows is painful • Not nearly as flexible as Cascading for custom needs
    • 15. Learn more • Tutorial: http://blog.rapleaf.com/dev/?p=33 • Website: http://www.cascading.org
    • 16. Questions?
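
Slides 5 and 6 describe tuples as values paired with named, ordered fields. A minimal plain-Java sketch of that idea follows. The class `NamedTuple` and its `get` methods are hypothetical names chosen for illustration; they are not Cascading's own classes, and the real library's API may look quite different.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is NOT a Cascading class.
public class NamedTuple {
    private final List<String> fields; // ordered field names, e.g. ["sentence", "value"]
    private final List<Object> values; // one value per field, in the same order

    public NamedTuple(List<String> fields, Object... values) {
        if (fields.size() != values.length) {
            throw new IllegalArgumentException("expected " + fields.size() + " values");
        }
        this.fields = fields;
        this.values = Arrays.asList(values);
    }

    // Look a value up by field name.
    public Object get(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("no such field: " + field);
        }
        return values.get(i);
    }

    // Look a value up by position, since fields are ordered.
    public Object get(int position) {
        return values.get(position);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("sentence", "value");
        NamedTuple t = new NamedTuple(fields, "the man sat", 25);
        System.out.println(t.get("sentence") + " / " + t.get("value"));
    }
}
```

Because every tuple in a pipe shares the same field list, an operation can refer to “sentence” or “value” by name without knowing where the data came from.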
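
The Split, GroupBy, and Sum steps from slides 8 to 12 can be simulated in memory with ordinary Java collections. The sketch below is an assumption-laden illustration: it does not use the Cascading API or Hadoop, and the class `WordSum` and its methods `sampleInput`, `split`, `groupBy`, and `sum` are hypothetical names. It only reproduces the data transformations shown on the slides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the slide example; NOT the Cascading API.
public class WordSum {

    // The four ("sentence", "value") tuples used on the slides.
    public static List<Map.Entry<String, Integer>> sampleInput() {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<>("the man sat", 25));
        rows.add(new SimpleEntry<>("hello dolly", 42));
        rows.add(new SimpleEntry<>("say hello", 1));
        rows.add(new SimpleEntry<>("the woman sat", 10));
        return rows;
    }

    // Split("sentence") -> "word": emit one ("word", "value") pair per word.
    public static List<Map.Entry<String, Integer>> split(List<Map.Entry<String, Integer>> rows) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> row : rows) {
            for (String word : row.getKey().split("\\s+")) {
                out.add(new SimpleEntry<>(word, row.getValue()));
            }
        }
        return out;
    }

    // GroupBy("word"): collect all values that share the same word.
    public static Map<String, List<Integer>> groupBy(List<Map.Entry<String, Integer>> pairs) {
        // LinkedHashMap keeps first-seen order so the result lines up with the slides.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Sum("value") -> "sum": reduce each group's list of values to a single total.
    public static Map<String, Integer> sum(Map<String, List<Integer>> groups) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int total = 0;
            for (int value : group.getValue()) {
                total += value;
            }
            sums.put(group.getKey(), total);
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(sum(groupBy(split(sampleInput()))));
    }
}
```

Each method takes the output of the previous one, which mirrors how a flow chains operations on a pipe of tuples. In a real flow the same logic would be compiled to one or more MapReduce jobs rather than run over in-memory lists.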
