Published in: Technology, Business

  1. Apache Pig (아꿈사, 박민규)
  2. Talking Pig: Pig Latin, Data Flow, Hive
  3. Pig Dev Philosophy
       • Pigs Eat Anything.
       • Pigs Live Anywhere.
       • Pigs Are Domestic Animals.
       • Pigs Fly.
     http://pig.apache.org/philosophy.html
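  4. Data Type

     | Category | Name      | Description                                      | Example          |
     |----------|-----------|--------------------------------------------------|------------------|
     | Scalar   | int       | Signed 32-bit integer                            | 10               |
     | Scalar   | long      | Signed 64-bit integer                            | 10L              |
     | Scalar   | float     | 32-bit floating point                            | 10.5F            |
     | Scalar   | double    | 64-bit floating point                            | 10.5             |
     | Arrays   | chararray | Character array (string) in Unicode UTF-8 format | Hello World      |
     | Arrays   | bytearray | Byte array (blob)                                |                  |
     | Complex  | tuple     | An ordered set of fields.                        | (19,2)           |
     | Complex  | bag       | A collection of tuples.                          | {(19,2), (18,1)} |
     | Complex  | map       | A set of key-value pairs.                        | [open#apache]    |

     Map: the key is a chararray and must be unique; the value can be any type.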
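As a rough analogy for the three complex types in the table, here is a plain-Python sketch. The variable names are made up for illustration, and Python lists and dicts are only stand-ins: a real Pig bag is unordered, which a Python list does not capture.

```python
# Plain-Python stand-ins for Pig's complex types (an analogy, not Pig's implementation).
pig_tuple = (19, 2)               # tuple: an ordered set of fields
pig_bag = [(19, 2), (18, 1)]      # bag: a collection of tuples
pig_map = {"open": "apache"}      # map: unique chararray keys, values of any type

# Positional access, comparable to $1 on a Pig tuple.
second_field = pig_tuple[1]

# Key lookup, comparable to the # operator on a Pig map (for example m#'open').
looked_up = pig_map["open"]

print(second_field, looked_up, len(pig_bag))
```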
  5. Run local

         pig -x local
         grunt> a = LOAD '/etc/passwd' USING PigStorage(':');
         grunt> DUMP a;
         grunt> EXPLAIN a;
         grunt> b = FOREACH a GENERATE $0 as id;
         grunt> DUMP b;
         grunt> EXPLAIN b;
         grunt> c = FOREACH a GENERATE $1 as id;
         grunt> DUMP c;
         grunt> EXPLAIN c;
         grunt> STORE b INTO 'id.out';
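To make the positional references in the session above concrete, the following plain-Python sketch mimics what loading with PigStorage(':') and projecting $0 or $1 produces. The helper name `load_delimited` and the two sample lines are hypothetical; the lines only follow the /etc/passwd layout and are not real accounts.

```python
def load_delimited(lines, delimiter=":"):
    """Split each line into a tuple of fields, roughly what PigStorage(':') does."""
    return [tuple(line.rstrip("\n").split(delimiter)) for line in lines]


# Hypothetical sample in /etc/passwd layout.
sample = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
]

a = load_delimited(sample)
b = [(row[0],) for row in a]  # like: FOREACH a GENERATE $0 AS id
c = [(row[1],) for row in a]  # like: FOREACH a GENERATE $1 AS id

print(b)
print(c)
```

Each projection still yields one-field tuples, which is why DUMP shows every value wrapped in parentheses.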
  6. Word Count

         pig -x local
         grunt> a = LOAD './input.txt';
         grunt> b = FOREACH a GENERATE FLATTEN(TOKENIZE((CHARARRAY) $0)) AS word;
         grunt> c = GROUP b BY word;
         grunt> d = FOREACH c GENERATE COUNT(b), group;
         grunt> e = ORDER d BY $0;
         grunt> STORE e INTO './wordcount';
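The word-count data flow above can be read step by step as the following plain-Python sketch. It is an approximation under stated assumptions: `word_count` is a hypothetical helper, `str.split()` breaks on whitespace only whereas Pig's TOKENIZE also splits on a few punctuation characters, and the tie-breaking by word comes from Python's tuple sort rather than from anything Pig guarantees.

```python
from collections import Counter


def word_count(lines):
    # TOKENIZE + FLATTEN: turn each line into individual word records.
    words = [word for line in lines for word in line.split()]
    # GROUP b BY word, then COUNT(b): number of records per word.
    counts = Counter(words)
    # ORDER d BY $0: ascending by count, emitted as (count, word).
    return sorted((n, word) for word, n in counts.items())


result = word_count(["a b a", "b a c"])
print(result)
```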
  7. Schema

         pig -x local
         grunt> records = LOAD 'sample.txt' AS (year:int, temperature:int, quality:int);
         grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
         grunt> grouped_records = GROUP filtered_records BY year;
         grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
         grunt> DUMP max_temp;
         grunt> ILLUSTRATE max_temp;
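As a cross-check on what the FILTER, GROUP and MAX steps above compute, here is a minimal plain-Python sketch. The function name, the constants and the sample rows are invented for illustration; only the 9999 sentinel and the quality codes 0, 1, 4, 5 and 9 come from the slide.

```python
GOOD_QUALITY = {0, 1, 4, 5, 9}  # quality codes accepted by the FILTER
MISSING = 9999                  # sentinel for a missing temperature


def max_temperature_by_year(records):
    """records: iterable of (year, temperature, quality) tuples."""
    result = {}
    for year, temperature, quality in records:
        # FILTER records BY temperature != 9999 AND quality IN (...)
        if temperature != MISSING and quality in GOOD_QUALITY:
            # GROUP BY year, then MAX(temperature) within each group
            result[year] = max(temperature, result.get(year, temperature))
    return sorted(result.items())


# Made-up sample rows in the (year, temperature, quality) layout.
sample = [
    (1950, 0, 1),
    (1950, 22, 1),
    (1950, -11, 1),
    (1949, 111, 1),
    (1949, 78, 1),
    (1949, 9999, 1),  # dropped: missing reading
    (1950, 30, 2),    # dropped: quality code not accepted
]

max_temp = max_temperature_by_year(sample)
print(max_temp)
```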
  8. Functions
       • Eval: AVG, CONCAT, COUNT, COUNT_STAR, DIFF, MAX, MIN, SIZE, SUM, TOKENIZE
       • Filter: IsEmpty
       • Load/Store: PigStorage, BinStorage, TextLoader, PigDump
  9. UDFs
       • Extend EvalFunc, FilterFunc, or LoadFunc
       • Override the required methods (for example, exec for an EvalFunc)
       • Make a jar

         grunt> REGISTER pig-examples.jar;
         grunt> filtered = FILTER records BY temperature != 9999 AND com.hadoopbook.pig.IsGoodQuality(quality);
         grunt> DEFINE isGood com.hadoopbook.pig.IsGoodQuality();
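The slide does not show the body of IsGoodQuality. Assuming it wraps the quality-code test from the Schema slide (0, 1, 4, 5 or 9), the logic of such a filter function might look like the plain-Python sketch below. A real UDF would be a Java class packaged into the registered jar; the function name, the null handling and the sample rows here are all illustrative assumptions.

```python
def is_good_quality(quality):
    """Filter-style predicate. Assumption: a missing (null) quality is not good."""
    if quality is None:
        return False
    return quality in (0, 1, 4, 5, 9)


# Made-up (year, temperature, quality) rows.
records = [
    (1950, 22, 1),
    (1950, 9999, 1),    # missing temperature
    (1949, 78, 2),      # quality code not accepted
    (1949, 111, None),  # no quality value
]

# Comparable to: FILTER records BY temperature != 9999 AND isGood(quality)
filtered = [r for r in records if r[1] != 9999 and is_good_quality(r[2])]
print(filtered)
```

Moving the test into a named function keeps the FILTER statement short and lets the same rule be reused across scripts, which is the usual motivation for a filter UDF.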
  10. Keywords
        and, any, all, arrange, as, asc, AVG
        bag, BinStorage, by, bytearray
        cache, cat, cd, chararray, cogroup, CONCAT, copyFromLocal, copyToLocal, COUNT, cp, cross
        %declare, %default, define, desc, describe, DIFF, distinct, double, du, dump
        e, E, eval, exec, explain
        f, F, filter, flatten, float, foreach, full
        generate, group
        help
        if, illustrate, inner, input, int, into, is
        join
        kill
        l, L, left, limit, load, long, ls
        map, matches, MAX, MIN, mkdir, mv
        not, null
        or, order, outer, output
        parallel, pig, PigDump, PigStorage, pwd
        quit
        register, right, rm, rmf, run
        sample, set, ship, SIZE, split, stderr, stdin, stdout, store, stream, SUM
        TextLoader, TOKENIZE, through, tuple
        union, using
        = = != < > <= >= + - * / % ? $ . # :: ( ) [ ] { }
  11. Thanks