An Introduction to Apache Pig

An introduction to Apache Pig: what is it used for, how does it work, and why use it instead of
native MapReduce code?

Transcript

  • 1. Apache Pig
    ● What is it?
    ● How does it work?
    ● Why use it?
    ● Pig Latin data types
    ● Pig Latin maths
    ● Pig Latin example
    www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Pig – What is it?
    ● A high-level language
    ● Used to analyse large data sets
    ● Used to create MapReduce jobs
    ● Abstracts away the definition of the jobs
    ● Uses Pig Latin to define jobs
    ● Needs less code than equivalent hand-written MapReduce
    ● Compiles Pig Latin scripts into a sequence of MapReduce jobs
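The claim that Pig "compiles to MapReduce" can be made concrete with a minimal Python sketch. It models the map, shuffle and reduce phases that a grouping with an aggregate roughly corresponds to. The sales records and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical. Pig's real compiler builds logical, physical and MapReduce plans and may add optimisations such as combiners, so treat this as an illustration only.

```python
from collections import defaultdict

# Hypothetical input: (region, amount) records, as a LOAD ... AS (region:chararray, amount:int)
# might produce.
SALES = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]


def map_phase(records):
    """Map: emit one (key, value) pair per record; the GROUP BY column becomes the key."""
    for region, amount in records:
        yield region, amount


def shuffle(pairs):
    """Shuffle/sort: bring all values for the same key together (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: apply the aggregate (here a sum) to each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map_phase(SALES)))
print(totals)  # {'east': 3, 'north': 17, 'south': 6}
```

In Pig Latin the same job takes two statements, `by_region = GROUP sales BY region;` and `FOREACH by_region GENERATE group, SUM(sales.amount);`. That is the "less code" point on this slide.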
  • 3. Pig – How does it work?
    ● Three ways to use it
      – Grunt, Pig's interactive shell
      – Write Pig Latin in a script file
      – Embed Pig commands in another language
    ● Run modes
      – Local mode: runs on a single machine
      – Hadoop (MapReduce) mode: runs on a Hadoop/MapReduce cluster
    ● Creates the MapReduce code automatically
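The run mode is chosen on the `pig` command line with the `-x` option (`local` or `mapreduce`). Starting `pig` without a script opens the Grunt shell. The Python helper below only assembles that command line. It assumes the `pig` launcher is installed and on `PATH`, and the script name `wordcount.pig` is hypothetical.

```python
import shlex
import subprocess

# The two run modes described on the slide.
VALID_MODES = ("local", "mapreduce")


def pig_command(mode, script=None):
    """Build a `pig` command line; with no script, Pig starts the interactive Grunt shell."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown run mode: {mode!r}")
    cmd = ["pig", "-x", mode]
    if script is not None:
        cmd.append(script)
    return cmd


def run_pig(mode, script=None):
    """Run Pig as a subprocess (assumes `pig` is on PATH); raises if Pig exits non-zero."""
    return subprocess.run(pig_command(mode, script), check=True)


print(shlex.join(pig_command("local", "wordcount.pig")))      # pig -x local wordcount.pig
print(shlex.join(pig_command("mapreduce", "wordcount.pig")))  # pig -x mapreduce wordcount.pig
```

A common workflow is to develop against a small sample in local mode and then submit the same script unchanged in MapReduce mode.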
  • 4. Pig – Why use it?
    ● It is quicker to develop with than hand-written MapReduce
    ● It is data omnivorous: it can process data with or without a declared schema
    ● It is easy to learn
    ● It is widely used
    ● There is only a minor performance loss compared to native MapReduce code
    ● It can be extended via user-defined functions (UDFs)
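As an illustration of the UDF point, here is a minimal sketch of a Python UDF. It assumes Pig's `streaming_python` engine, which supplies the `pig_util.outputSchema` decorator. A stand-in decorator is defined so that the file also runs under plain Python. The file name `myudfs.py` and the function `normalise` are hypothetical.

```python
# Hypothetical UDF file: myudfs.py
import string

try:
    # Supplied by Pig when the file is registered USING streaming_python (assumption).
    from pig_util import outputSchema
except ImportError:
    # Stand-in so the module can be tested outside Pig; it just records the schema string.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("word:chararray")
def normalise(word):
    """Lower-case a word and strip leading/trailing punctuation; a Pig null (None) passes through."""
    if word is None:
        return None
    return word.strip(string.punctuation).lower()
```

Inside a Pig script it would then be registered and called along the lines of `REGISTER 'myudfs.py' USING streaming_python AS myudfs;` followed by `FOREACH words GENERATE myudfs.normalise(word);`.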
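  • 5. Pig Latin Data Types
    ● Int
    ● Long
    ● Float
    ● Double
    ● Chararray (a character string)
    ● Bytearray (raw bytes; the default when no type is declared)
    ● Tuple (an ordered set of fields)
    ● Bag (a collection of tuples)
    ● Map (a set of key/value pairs with chararray keys)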
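The three complex types are the least familiar ones. A rough Python analogy: a tuple maps to a Python tuple, a bag to a list of tuples that may contain duplicates and whose order is not significant, and a map to a dict with string keys. The record and the `bag_equal` helper below are hypothetical and illustrate that analogy only. They are not how Pig stores data.

```python
from typing import Any, Dict, List, Tuple

PigTuple = Tuple[Any, ...]   # ordered set of fields, e.g. ("alice", 30)
PigBag = List[PigTuple]      # collection of tuples; duplicates allowed, order not guaranteed
PigMap = Dict[str, Any]      # key/value pairs; keys are chararrays

# Hypothetical record with schema:
# (name:chararray, age:int, scores:bag{t:(score:double)}, props:map[])
record: PigTuple = (
    "alice",
    30,
    [(9.5,), (7.0,), (9.5,)],   # bag: the duplicate (9.5,) is allowed
    {"city": "Auckland"},       # map with a chararray key
)


def bag_equal(a: PigBag, b: PigBag) -> bool:
    """Compare two bags as multisets, since Pig does not guarantee tuple order within a bag."""
    return sorted(a) == sorted(b)
```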
  • 6. Pig Latin Maths
    Some of the built-in maths functions:
    ● ABS
    ● CEIL
    ● EXP
    ● FLOOR
    ● LOG
    ● ROUND
    ● SIN
    ● TAN
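To show what these functions compute, here is a sketch of Python stand-ins built on the `math` module. Two behaviours are stated as assumptions. First, Pig's `LOG` is the natural logarithm, and `SIN`/`TAN` take radians. Second, `ROUND` is assumed to follow Java's `Math.round`, rounding halves towards positive infinity. That differs from Python's built-in `round`, which rounds halves to even. The `PIG_MATH` table is hypothetical.

```python
import math


def pig_round(x):
    """ROUND, assuming Java Math.round semantics: halves round towards positive infinity."""
    return math.floor(x + 0.5)


# Hypothetical Python stand-ins for the Pig built-ins listed on the slide.
# Note: Pig's CEIL and FLOOR return a double, whereas math.ceil/math.floor return an int.
PIG_MATH = {
    "ABS": abs,
    "CEIL": math.ceil,
    "EXP": math.exp,
    "FLOOR": math.floor,
    "LOG": math.log,    # natural logarithm (Pig also offers LOG10)
    "ROUND": pig_round,
    "SIN": math.sin,    # argument in radians
    "TAN": math.tan,    # argument in radians
}

print(PIG_MATH["ROUND"](2.5), PIG_MATH["ROUND"](-2.5))  # 3 -2
```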
  • 7. Pig Latin Example
    Example borrowed from Wikipedia (word count):

    input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

    -- Extract words from each line and put them into a pig bag
    -- datatype, then flatten the bag to get one word on each row
    words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

    -- keep only tokens made up entirely of word characters
    filtered_words = FILTER words BY word MATCHES '\\w+';

    -- create a group for each word
    word_groups = GROUP filtered_words BY word;

    -- count the entries in each group
    word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

    -- order the records by count
    ordered_word_count = ORDER word_count BY count DESC;

    STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
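To trace what each statement in the word-count script does, here is a small in-memory Python sketch that mirrors it operator by operator. Three points are assumptions. First, `tokenize` only approximates Pig's `TOKENIZE` by splitting on whitespace plus the double quote, comma, parentheses and asterisk. Second, `MATCHES` is modelled as a whole-string match with ASCII `\w`, as in Java regular expressions. Third, ties in the final ordering are broken by word so the output is deterministic, which Pig's `ORDER` does not promise.

```python
import re
from collections import Counter


def tokenize(line):
    """Approximation of Pig's TOKENIZE: split on whitespace and the characters " , ( ) *."""
    return [token for token in re.split(r'[\s",()*]+', line) if token]


def word_count(lines):
    # FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word: one word per row
    words = [word for line in lines for word in tokenize(line)]
    # FILTER words BY word MATCHES '\\w+': the whole token must consist of word characters
    filtered = [word for word in words if re.fullmatch(r"\w+", word, re.ASCII)]
    # GROUP filtered_words BY word, then COUNT(filtered_words) for each group
    counts = Counter(filtered)
    # ORDER word_count BY count DESC, emitting (count, word) like the GENERATE clause
    return sorted(((n, w) for w, n in counts.items()), key=lambda cw: (-cw[0], cw[1]))


print(word_count(["the cat and the hat", "the (cat), sat!"]))
# [(3, 'the'), (2, 'cat'), (1, 'and'), (1, 'hat')]
```

The token `sat!` is dropped by the filter because `!` is not a word character. Note that the filter removes any token containing punctuation, not just whitespace.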
  • 8. Contact Us
    ● Feel free to contact us at
      – www.semtech-solutions.co.nz
      – info@semtech-solutions.co.nz
    ● We offer IT project consultancy
    ● We are happy to hear about your problems
    ● You pay only for the hours you need to solve them