Pig
 


    Pig Presentation Transcript

    • Apache Pig 아꿈사 박민규
    • Talking Pig: Pig Latin, Data Flow, Hive
    • Pig Dev Philosophy: Pigs Eat Anything. Pigs Live Anywhere. Pigs Are Domestic Animals. Pigs Fly. http://pig.apache.org/philosophy.html
    • Data Type
      Category  Name       Description                                        Example
      Scalar    int        Signed 32-bit integer                              10
      Scalar    long       Signed 64-bit integer                              10L
      Scalar    float      32-bit floating point                              10.5F
      Scalar    double     64-bit floating point                              10.5
      Arrays    chararray  Character array (string) in Unicode UTF-8 format   Hello World
      Arrays    bytearray  Byte array (blob)
      Complex   tuple      An ordered set of fields.                          (19,2)
      Complex   bag        A collection of tuples.                            {(19,2), (18, 1)}
      Complex   map        A set of key value pairs.                          [open#apache]
      Map notes: the key is a chararray and must be unique; the value can be any type.
    • Run local
      pig -x local
      grunt> a = LOAD '/etc/passwd' USING PigStorage(':');
      grunt> DUMP a;
      grunt> EXPLAIN a;
      grunt> b = FOREACH a GENERATE $0 AS id;
      grunt> DUMP b;
      grunt> EXPLAIN b;
      grunt> c = FOREACH a GENERATE $1 AS id;
      grunt> DUMP c;
      grunt> EXPLAIN c;
      grunt> STORE b INTO 'id.out';
    • Word Count
      pig -x local
      grunt> a = LOAD './input.txt';
      grunt> b = FOREACH a GENERATE FLATTEN(TOKENIZE((CHARARRAY) $0)) AS word;
      grunt> c = GROUP b BY word;
      grunt> d = FOREACH c GENERATE COUNT(b), group;
      grunt> e = ORDER d BY $0;
      grunt> STORE e INTO './wordcount';
    • Schema
      pig -x local
      grunt> records = LOAD 'sample.txt' AS (year:int, temperature:int, quality:int);
      grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
      grunt> grouped_records = GROUP filtered_records BY year;
      grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
      grunt> DUMP max_temp;
      grunt> ILLUSTRATE max_temp;
    • Functions
      Eval: AVG, CONCAT, COUNT, COUNT_STAR, DIFF, MAX, MIN, SIZE, SUM, TOKENIZE
      Filter: IsEmpty
      Load/Store: PigStorage, BinStorage, TextLoader, PigDump
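    • UDFs
      Extend EvalFunc, FilterFunc, or LoadFunc
      Override the required methods (for example, exec() for an EvalFunc)
      Make a jar
      grunt> REGISTER pig-examples.jar;
      grunt> filtered = FILTER records BY temperature != 9999 AND com.hadoopbook.pig.IsGoodQuality(quality);
      grunt> DEFINE isGood com.hadoopbook.pig.IsGoodQuality();
      -- after DEFINE, the UDF can be called by its alias: isGood(quality)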
    • Keywords
      and, any, all, arrange, as, asc, AVG
      bag, BinStorage, by, bytearray
      cache, cat, cd, chararray, cogroup, CONCAT, copyFromLocal, copyToLocal, COUNT, cp, cross
      %declare, %default, define, desc, describe, DIFF, distinct, double, du, dump
      e, E, eval, exec, explain
      f, F, filter, flatten, float, foreach, full
      generate, group
      help
      if, illustrate, inner, input, int, into, is
      join
      kill
      l, L, left, limit, load, long, ls
      map, matches, MAX, MIN, mkdir, mv
      not, null
      or, order, outer, output
      parallel, pig, PigDump, PigStorage, pwd
      quit
      register, right, rm, rmf, run
      sample, set, ship, SIZE, split, stderr, stdin, stdout, store, stream, SUM
      TextLoader, TOKENIZE, through, tuple
      union, using
      = = != < > <= >= + - * / % ? $ . # :: ( ) [ ] { }
    • Thanx
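
The word-count data flow from the slides can be traced without a Pig installation using a rough plain-Python analogue. This is only a sketch of the same steps (tokenize and flatten, group by word, count, order by count), not how Pig executes the script. The function name `word_count` and the sample lines are hypothetical. Splitting on whitespace only approximates TOKENIZE, which also splits on a few other delimiters, and the tie-break by word is a choice made here so the output is deterministic.

```python
from collections import Counter


def word_count(lines):
    """Approximate the slide's Pig word count on an iterable of text lines."""
    # b = FOREACH a GENERATE FLATTEN(TOKENIZE($0)) AS word
    # (whitespace splitting is an approximation of TOKENIZE)
    words = [word for line in lines for word in line.split()]

    # c = GROUP b BY word;  d = FOREACH c GENERATE COUNT(b), group
    counts = Counter(words)

    # e = ORDER d BY $0  (ascending count; ties broken by word, an assumption)
    return sorted((n, w) for w, n in counts.items())


if __name__ == "__main__":
    # Hypothetical input standing in for ./input.txt
    sample = ["pigs eat anything", "pigs fly"]
    for count, word in word_count(sample):
        print(count, word)
```

Each returned pair has the count first and the word second, matching the field order of relation `d` in the Pig script.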