Apachepig 130726021253-phpapp01

Published in: Technology

Transcript

  • 1. Apache Pig ● What is it ? ● How does it work ? ● Why use it ? ● PigLatin Data Types ● PigLatin Maths ● PigLatin Example www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Pig – What is it ? ● A high level language ● Used to analyse large data sets ● Used to create MapReduce jobs ● Abstracts definition of jobs ● Uses Pig Latin to define jobs ● Less code needed ● Compiles to MapReduce code
  • 3. Pig – How does it work ? ● Three ways to use it – Grunt, Pig's interactive shell – Write Pig Latin in a script file – Embed Pig commands in another language ● Run modes – Local mode: single machine – Hadoop mode: run on a Hadoop/MapReduce cluster ● Creates MapReduce code automatically
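To make "creates MapReduce code automatically" more concrete, the following is a minimal Python sketch of the map, shuffle and reduce phases that a grouped aggregate conceptually breaks down into. It is an illustration only, not code that Pig emits: the names `run_map_reduce`, `map_reading` and `reduce_max`, and the sample data, are hypothetical and chosen for this example.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Run a single-process imitation of one MapReduce job."""
    shuffled = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            shuffled[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in shuffled.items()}


def map_reading(line):
    """Hypothetical mapper: parse 'station,temperature' lines."""
    station, temperature = line.split(",")
    yield station, float(temperature)


def reduce_max(station, temperatures):
    """Hypothetical reducer: keep the highest temperature per station."""
    return max(temperatures)


# Made-up sample input for illustration.
readings = ["akl,18.5", "wlg,14.0", "akl,21.0", "wlg,16.5"]
max_by_station = run_map_reduce(readings, map_reading, reduce_max)
print(max_by_station)
```

In Pig Latin the same job would be expressed roughly as a GROUP by station followed by a FOREACH ... GENERATE using the built-in MAX, with the mapper and reducer plumbing left to Pig.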
  • 4. Pig – Why use it ? ● It is quicker ● It is data omnivorous ● It is easy to learn ● It is widely used ● Minor performance loss – Compared to native code ● It can be extended via user defined functions (UDF)
  • 5. PigLatin Data Types ● Int ● Long ● Float ● Double ● Chararray ● Bytearray ● Tuple ● Bag ● Map
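As a rough aid to reading the type list above, the sketch below models the three complex types (tuple, bag, map) with Python stand-ins and annotates the scalar types. This is an analogy under stated assumptions, not how Pig stores data; the variable names and sample values are invented for illustration.

```python
# Scalar Pig Latin types with a short description of each.
SCALAR_TYPES = {
    "int": "32-bit signed integer",
    "long": "64-bit signed integer",
    "float": "32-bit floating point",
    "double": "64-bit floating point",
    "chararray": "character string",
    "bytearray": "uninterpreted bytes (blob)",
}

# A tuple is an ordered set of fields.
student = ("alice", 21, 3.8)

# A bag is a collection of tuples; duplicates are allowed.
# (A Python list is only a stand-in: a bag carries no guaranteed order.)
enrolments = [("alice", "maths"), ("alice", "physics"), ("alice", "maths")]

# A map is a set of key/value pairs whose keys are chararrays.
attributes = {"city": "Wellington", "year": 2}

# Complex types nest: here a tuple holds a name, a bag and a map.
record = (student[0], enrolments, attributes)
print(len(record), len(enrolments), sorted(attributes))
```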
  • 6. PigLatin Maths Some of the built in maths functions ● ABS ● CEIL ● EXP ● FLOOR ● LOG ● ROUND ● SIN ● TAN
  • 7. PigLatin Example Example borrowed from Wikipedia
      input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

      -- Extract words from each line and put them into a pig bag
      -- datatype, then flatten the bag to get one word on each row
      words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

      -- filter out any words that are just white spaces
      filtered_words = FILTER words BY word MATCHES '\\w+';

      -- create a group for each word
      word_groups = GROUP filtered_words BY word;

      -- count the entries in each group
      word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

      -- order the records by count
      ordered_word_count = ORDER word_count BY count DESC;
      STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
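For readers new to Pig Latin, the word-count script can be followed relation by relation in the Python sketch below. It is an approximation, not a translation Pig would produce: `str.split()` splits on whitespace only and so merely approximates TOKENIZE, and the function name `word_count` and the sample lines are hypothetical.

```python
import re
from collections import Counter


def word_count(lines):
    """Mirror the Pig Latin word-count script one relation at a time."""
    # words: one word per row (str.split() only approximates TOKENIZE).
    words = [word for line in lines for word in line.split()]
    # filtered_words: keep rows made up entirely of word characters,
    # because MATCHES tests the whole string rather than a substring.
    filtered_words = [w for w in words if re.fullmatch(r"\w+", w)]
    # word_groups and word_count: group by word, then count each group.
    counts = Counter(filtered_words)
    # ordered_word_count: (count, word) rows, highest count first.
    rows = ((count, word) for word, count in counts.items())
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Made-up sample input standing in for the loaded file.
sample_lines = ["the pig eats", "the pig sleeps", "the end !"]
ordered_word_count = word_count(sample_lines)
print(ordered_word_count)
```

Each intermediate list corresponds to one named relation in the script, which is the main design idea of Pig Latin: a job is written as a chain of named dataflow steps rather than as explicit mapper and reducer classes.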
  • 8. Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You can just pay for those hours that you need to solve your problems
