Pig

792 views

Published on

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
792
On SlideShare
0
From Embeds
0
Number of Embeds
61
Actions
Shares
0
Downloads
38
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Pig

  1. 1. Apache Pig 아꿈사 박민규
  2. 2. Talking Pig Pig Latin Data Flow Hive
  3. 3. Pig Dev Philosophy Pigs Eat Anything. Pigs Live Anywhere. Pigs Are Domestic Animals. Pigs Fly. http://pig.apache.org/philosophy.html
  4. 4. Data Type Name Description Example Scalar int Signed 32-bit integer 10 long Signed 64-bit integer 10L float 32-bit floating point 10.5F double 64-bit floating point 10.5 Arrays chararray Character array (string) in Unicode UTF-8 format Hello World bytearray Byte array (blog) Complex tuple An ordered set of fields. (19,2) bag An collection of tuples. {(19,2), (18, 1)} map A set of key value pairs. [open#apache] Key is chararray. Key is unique. Value is any type.
  5. 5. Run local pix -x local grunt> a = LOAD '/etc/passwd' USING PigStorage(':'); grunt> DUMP a; grunt> EXPLAIN a; grunt> b = FOREACH a GENERATE $0 as id; grunt> DUMP b; grunt> EXPLAIN b; grunt> c = FOREACH a GENERATE $1 as id; grunt> DUMP c; grunt> EXPLAIN c; grunt> STORE b INTO ‘id.out’;
  6. 6. Word Count pix -x local grunt> a = LOAD './input.txt'; grunt> b = FOREACH a GENERATE FLATTEN(TOKENIZE((CHARARRAY) $0)) AS word; grunt> c = GROUP b by word; grunt> d = FOREACH c GENERATE COUNT(b), GROUP; grunt> e = ORDER d BY $0; grunt> STORE e INTO './wordcount';
  7. 7. Schema pix -x local grunt> records = LOAD ‘sample.txt’ AS(year:int, temperature:int, quality:int); grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9); grunt> grouped_records = GROUP filtered_records BY year; grunt> max_temp = FOREACH grouped_records GENERATE GROUP, MAX(filterd_records.temperature); grunt> DUMP max_temp; grunt> ILLUSTRATE max_temp;
  8. 8. Functions Eval – avg, concat, count, count_star, diff, max, min, size,sum, tokenize Filter – IsEmpty Load/Store – PigStroage, BinStorage, TextLoader, PigDump
  9. 9. UDFs Extends EvalFunc, FilterFunc, LoadFunc Override Make jar grunt> REGISTER pig-examples.jar; grunt> filtered = FILTER records BY temperature != 9999 AND com.hadoop.pig.IsGoodQuality(quality); grunt> DEFINE isGood com.hadoopbook.pig.IsGoodQuality();
  10. 10. Keywords and, any, all, arrange, as, asc, AVG bag, BinStorage, by, bytearray cache, cat, cd, chararray, cogroup, CONCAT, copyFromLocal, copyToLocal, COUNT, cp, cross %declare, %default, define, desc, describe, DIFF, distinct, double, du, dump e, E, eval, exec, explain f, F, filter, flatten, float, foreach, full generate, group help if, illustrate, inner, input, int, into, is join kill l, L, left, limit, load, long, ls map, matches, MAX, MIN, mkdir, mv not, null or, order, outer, output parallel, pig, PigDump, PigStorage, pwd quit register, right, rm, rmf, run sample, set, ship, SIZE, split, stderr, stdin, stdout, store, stream, SUM TextLoader, TOKENIZE, through, tuple union, using = = != < > <= >= + - * / % ? $ . # :: ( ) [ ] { }
  11. 11. Thanx

×