  • 1. Apache Pig
    ● What is it?
    ● How does it work?
    ● Why use it?
    ● Pig Latin Data Types
    ● Pig Latin Maths
    ● Pig Latin Example
    www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Pig – What is it?
    ● A high-level language
    ● Used to analyse large data sets
    ● Used to create MapReduce jobs
    ● Abstracts the definition of those jobs
    ● Jobs are written in Pig Latin
    ● Needs less code than hand-written MapReduce
    ● Compiles into MapReduce jobs
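To make "compiles into MapReduce jobs" more concrete, here is a minimal single-process Python sketch of the map, shuffle and reduce phases that a Pig GROUP followed by an aggregate roughly corresponds to. It is a simplified model that assumes all data fits in memory on one machine, not Pig's actual execution plan; the helper name `run_mapreduce`, the Pig snippet in the comment and the log records are all hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy MapReduce (hypothetical helper): map, shuffle (group by key), reduce."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle phase: collect every value that shares a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one call per key, over all of that key's values.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical Pig Latin that this models:
#   logs    = LOAD 'access_log' AS (host:chararray, bytes:long);
#   by_host = GROUP logs BY host;
#   totals  = FOREACH by_host GENERATE group, SUM(logs.bytes);
logs = [("a.example", 100), ("b.example", 50), ("a.example", 25)]
totals = run_mapreduce(
    logs,
    mapper=lambda rec: [(rec[0], rec[1])],          # emit (host, bytes)
    reducer=lambda host, byte_counts: sum(byte_counts),
)
```

The point of Pig is that the three phases above are generated for you from the two Pig Latin statements, rather than written by hand.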
  • 3. Pig – How does it work?
    ● Three ways to use it
      – Write Pig Latin in a script file
      – Grunt: Pig's interactive shell
      – Embed Pig commands in another language
    ● Run modes
      – Local mode: a single machine
      – Hadoop (MapReduce) mode: runs on a Hadoop cluster
    ● Generates the MapReduce code automatically
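As a sketch of the script-file route and the two run modes, the Python helper below only builds the command line for the `pig` launcher; it does not run anything. It assumes `pig` is on the PATH and relies on the launcher's `-x local` / `-x mapreduce` execution-type option and its `-param key=value` parameter substitution. The helper name `pig_command` and the script name `wordcount.pig` are hypothetical.

```python
def pig_command(script_path=None, mode="local", params=None):
    """Build the argv for the `pig` launcher (hypothetical helper).

    mode "local" runs on a single machine; "mapreduce" submits to a Hadoop
    cluster. With no script_path the command starts the Grunt shell instead.
    """
    if mode not in ("local", "mapreduce"):
        raise ValueError("unsupported run mode: %r" % (mode,))
    cmd = ["pig", "-x", mode]
    # Each -param key=value replaces $key inside the script before it runs.
    # Sorting keeps the argument order deterministic.
    for key, value in sorted((params or {}).items()):
        cmd += ["-param", "%s=%s" % (key, value)]
    if script_path is not None:
        cmd.append(script_path)
    return cmd
```

For example, `subprocess.run(pig_command("wordcount.pig", mode="mapreduce"), check=True)` would submit the script to the cluster, while `pig_command()` on its own gives the command that opens Grunt in local mode.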
  • 4. Pig – Why use it?
    ● It is quicker to write than native MapReduce code
    ● It is data omnivorous: it processes structured and unstructured data, with or without a schema
    ● It is easy to learn
    ● It is widely used
    ● Minor performance loss
      – Compared to native code
    ● It can be extended via user-defined functions (UDFs)
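A minimal sketch of a Python UDF for Pig's Jython engine, assuming Pig supplies the `outputSchema` decorator when the file is registered, as its Jython UDF documentation describes. The fallback stub exists only so the same file also runs under ordinary Python for testing. The file name `myudfs.py`, the alias `myfuncs` and the function `normalise_word` are hypothetical.

```python
# Under Pig's Jython engine, `outputSchema` is provided automatically.
# Outside Pig (e.g. in unit tests) this hypothetical stub makes it a no-op.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("word:chararray")
def normalise_word(word):
    """Lower-case a word and strip surrounding punctuation.

    Pig passes null fields to Python UDFs as None, so pass None through.
    """
    if word is None:
        return None
    return word.strip(".,;:!?\"'()").lower()

# Registering and calling it from Pig Latin (assumed usage):
#   REGISTER 'myudfs.py' USING jython AS myfuncs;
#   clean_words = FOREACH words GENERATE myfuncs.normalise_word(word) AS word;
```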
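  • 5. Pig Latin Data Types
    ● Simple types
      – int (32-bit integer)
      – long (64-bit integer)
      – float (32-bit floating point)
      – double (64-bit floating point)
      – chararray (character string)
      – bytearray (raw bytes)
    ● Complex types
      – tuple (ordered set of fields)
      – bag (collection of tuples)
      – map (set of key/value pairs with chararray keys)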
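One way to picture the complex types is to map them onto Python values: a tuple as a Python tuple, a bag as a list of tuples (a bag may hold duplicates and has no guaranteed order), and a map as a dict with string keys. The sketch below uses that assumed mapping to mimic how FLATTEN turns one record holding a bag into one output row per tuple in the bag. The record and the helper name `flatten_bag` are hypothetical.

```python
def flatten_bag(record, bag_index):
    """Mimic `FOREACH rel GENERATE ..., FLATTEN(bag), ...` for one record.

    Each tuple in the bag at `bag_index` is spliced into its own output row.
    An empty bag yields no rows, since FLATTEN of an empty bag drops the record.
    """
    before = record[:bag_index]
    after = record[bag_index + 1:]
    return [before + inner + after for inner in record[bag_index]]


# Hypothetical record with schema
#   (user:chararray, tags:bag{t:(tag:chararray)}, props:map[chararray])
record = ("alice", [("hadoop",), ("pig",), ("pig",)], {"country": "nz"})
rows = flatten_bag(record, 1)
```

Here `rows` holds three rows, one per tag, with the duplicate "pig" tuple kept because bags are multisets.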
  • 6. Pig Latin Maths
    Some of the built-in maths functions:
    ● ABS
    ● CEIL
    ● EXP
    ● FLOOR
    ● LOG
    ● ROUND
    ● SIN
    ● TAN
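Python's `math` module has close equivalents of these functions, which is handy for checking results by hand. One behaviour worth testing is ROUND: assuming it follows Java's `Math.round` (round half up, i.e. `floor(x + 0.5)`), it differs from Python's built-in `round`, which rounds halves to the nearest even number. The mapping below is an illustrative sketch under that assumption, not Pig's implementation; `pig_round` and `PIG_MATH` are hypothetical names.

```python
import math


def pig_round(x):
    """Round half up; assumed to match Pig's ROUND (Java Math.round semantics)."""
    return int(math.floor(x + 0.5))


# Hypothetical mapping from Pig built-ins to Python equivalents.
PIG_MATH = {
    "ABS": abs,
    "CEIL": math.ceil,
    "EXP": math.exp,
    "FLOOR": math.floor,
    "LOG": math.log,   # natural logarithm
    "ROUND": pig_round,
    "SIN": math.sin,   # argument in radians
    "TAN": math.tan,   # argument in radians
}
```

For example, `pig_round(2.5)` gives 3 and `pig_round(-2.5)` gives -2, whereas Python's `round(2.5)` gives 2.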
  • 7. Pig Latin Example
    Example borrowed from Wikipedia:

    input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

    -- Extract words from each line and put them into a pig bag
    -- datatype, then flatten the bag to get one word on each row
    words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

    -- keep only words made entirely of word characters (letters, digits, underscore)
    filtered_words = FILTER words BY word MATCHES '\\w+';

    -- create a group for each word
    word_groups = GROUP filtered_words BY word;

    -- count the entries in each group
    word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

    -- order the records by count
    ordered_word_count = ORDER word_count BY count DESC;

    STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
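To see what each statement in the script does, the Python sketch below runs the same pipeline over an in-memory list of lines, with each step labelled by the Pig statement it mirrors. It approximates TOKENIZE by splitting on whitespace only (Pig's TOKENIZE uses its own delimiter set, which includes some punctuation), and treats MATCHES as a whole-string regular-expression match with an ASCII-only `\w`, as Java's regex engine uses by default. The function name and sample input are hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+", re.ASCII)  # Java's \w is ASCII-only by default


def word_count(lines):
    # words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    # (approximation: split on whitespace only)
    words = [word for line in lines for word in line.split()]

    # filtered_words = FILTER words BY word MATCHES '\\w+';
    # MATCHES must match the whole string, hence fullmatch.
    filtered_words = [w for w in words if WORD.fullmatch(w)]

    # word_groups = GROUP filtered_words BY word;
    # word_count  = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
    counts = Counter(filtered_words)
    rows = [(count, word) for word, count in counts.items()]

    # ordered_word_count = ORDER word_count BY count DESC;
    return sorted(rows, key=lambda row: row[0], reverse=True)
```

For example, `word_count(["pig pig hadoop", "pig hadoop data!"])` returns `[(3, "pig"), (2, "hadoop")]`; "data!" is dropped by the MATCHES filter. Words with equal counts come out in first-seen order here, whereas Pig's ORDER makes no promise about the order of ties.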
  • 8. Contact Us
    ● Feel free to contact us at
      – www.semtech-solutions.co.nz
      – info@semtech-solutions.co.nz
    ● We offer IT project consultancy
    ● We are happy to hear about your problems
    ● You pay only for the hours you need to solve your problems