Upcoming SlideShare
Loading in...5







Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as OpenOffice

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Pig Pig Presentation Transcript

  • Apache Pig 아꿈사 박민규
  • Talking Pig Pig Latin Data Flow Hive
  • Pig Dev Philosophy Pigs Eat Anything. Pigs Live Anywhere. Pigs Are Domestic Animals. Pigs Fly. http://pig.apache.org/philosophy.html View slide
  • Data Type Name Description Example Scalar int Signed 32-bit integer 10 long Signed 64-bit integer 10L float 32-bit floating point 10.5F double 64-bit floating point 10.5 Arrays chararray Character array (string) in Unicode UTF-8 format Hello World bytearray Byte array (blog) Complex tuple An ordered set of fields. (19,2) bag An collection of tuples. {(19,2), (18, 1)} map A set of key value pairs. [open#apache] Key is chararray. Key is unique. Value is any type. View slide
  • Run local pix -x local grunt> a = LOAD '/etc/passwd' USING PigStorage(':'); grunt> DUMP a; grunt> EXPLAIN a; grunt> b = FOREACH a GENERATE $0 as id; grunt> DUMP b; grunt> EXPLAIN b; grunt> c = FOREACH a GENERATE $1 as id; grunt> DUMP c; grunt> EXPLAIN c; grunt> STORE b INTO ‘id.out’;
  • Word Count pix -x local grunt> a = LOAD './input.txt'; grunt> b = FOREACH a GENERATE FLATTEN(TOKENIZE((CHARARRAY) $0)) AS word; grunt> c = GROUP b by word; grunt> d = FOREACH c GENERATE COUNT(b), GROUP; grunt> e = ORDER d BY $0; grunt> STORE e INTO './wordcount';
  • Schema pix -x local grunt> records = LOAD ‘sample.txt’ AS(year:int, temperature:int, quality:int); grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9); grunt> grouped_records = GROUP filtered_records BY year; grunt> max_temp = FOREACH grouped_records GENERATE GROUP, MAX(filterd_records.temperature); grunt> DUMP max_temp; grunt> ILLUSTRATE max_temp;
  • Functions Eval – avg, concat, count, count_star, diff, max, min, size,sum, tokenize Filter – IsEmpty Load/Store – PigStroage, BinStorage, TextLoader, PigDump
  • UDFs Extends EvalFunc, FilterFunc, LoadFunc Override Make jar grunt> REGISTER pig-examples.jar; grunt> filtered = FILTER records BY temperature != 9999 AND com.hadoop.pig.IsGoodQuality(quality); grunt> DEFINE isGood com.hadoopbook.pig.IsGoodQuality();
  • Keywords and, any, all, arrange, as, asc, AVG bag, BinStorage, by, bytearray cache, cat, cd, chararray, cogroup, CONCAT, copyFromLocal, copyToLocal, COUNT, cp, cross %declare, %default, define, desc, describe, DIFF, distinct, double, du, dump e, E, eval, exec, explain f, F, filter, flatten, float, foreach, full generate, group help if, illustrate, inner, input, int, into, is join kill l, L, left, limit, load, long, ls map, matches, MAX, MIN, mkdir, mv not, null or, order, outer, output parallel, pig, PigDump, PigStorage, pwd quit register, right, rm, rmf, run sample, set, ship, SIZE, split, stderr, stdin, stdout, store, stream, SUM TextLoader, TOKENIZE, through, tuple union, using = = != < > <= >= + - * / % ? $ . # :: ( ) [ ] { }
  • Thanx