An Introduction to Apache Pig


An introduction to Apache Pig: what is it used for, how does it work, and why use it instead of native MapReduce code?

Apache Pig
● What is it ?
● How does it work ?
● Why use it ?
● PigLatin Data Types
● PigLatin Maths
● PigLatin Example
www.semtech-solutions.co.nz info@semtech-solutions.co.nz

Pig – What is it ?
● A high level language
● Used to analyse large data sets
● Used to create MapReduce jobs
● Abstracts definition of jobs
● Uses Pig Latin to define jobs
● Less code needed
● Compiles to MapReduce code
www.semtech-solutions.co.nz info@semtech-solutions.co.nz

Pig – How does it work ?
● Three ways to use it
– Grunt – Pig's interactive shell
– Write Pig Latin in a script file
– Embed Pig commands in another language
● Run modes
– Local mode – runs on a single machine
– Hadoop (MapReduce) mode – runs on a Hadoop/MapReduce cluster
● Creates MapReduce code automatically
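
A minimal sketch of the run modes above, assuming a script file called wordcount.pig and an input file called input.txt (both names are hypothetical). The `-x local` and `-x mapreduce` options of the `pig` command select the run mode; the command lines are shown as comments so that the block stays in Pig Latin.

```pig
-- wordcount.pig (hypothetical file name)
--
-- Local mode, single machine:       pig -x local wordcount.pig
-- Hadoop/MapReduce cluster mode:    pig -x mapreduce wordcount.pig
-- Interactive Grunt shell (local):  pig -x local
--   then type the statements below at the grunt> prompt

-- input.txt is an assumed input file with one line of text per record
lines = LOAD 'input.txt' AS (line:chararray);

-- DUMP prints the relation to the console, which is handy in Grunt
DUMP lines;
```

The same script runs unchanged in either mode; only the `-x` option differs.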

Pig – Why use it ?
● It is quicker to develop with than native MapReduce code
● It is data omnivorous – it handles structured and unstructured data
● It is easy to learn
● It is widely used
● Minor performance loss
– Compared to native code
● It can be extended via user defined functions ( UDF )
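
A sketch of how a UDF is typically wired into a script. The jar name, the Java class com.example.pig.ToUpper and the input path are all hypothetical placeholders; only REGISTER, DEFINE, LOAD and FOREACH ... GENERATE are Pig Latin statements.

```pig
-- Make the (hypothetical) jar containing the UDF visible to Pig
REGISTER myudfs.jar;

-- Give the (hypothetical) UDF class a short alias
DEFINE TO_UPPER com.example.pig.ToUpper();

-- Assumed input: one name per line
names = LOAD '/tmp/names.txt' AS (name:chararray);

-- Call the UDF like a built-in function
upper_names = FOREACH names GENERATE TO_UPPER(name) AS upper_name;

DUMP upper_names;
```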

PigLatin Data Types
● Int
● Long
● Float
● Double
● Chararray
● Bytearray
● Tuple
● Bag
● Map
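
A short sketch of how the types listed above can appear in a LOAD schema. The file path and every field name are assumptions made for illustration.

```pig
-- Hypothetical input file; field names are illustrative only.
-- In Pig's text format a tuple is written (a,b), a bag {(a,b),(c,d)}
-- and a map [key#value].
users = LOAD '/tmp/users.txt' AS (
    id:long,
    name:chararray,
    age:int,
    rating:float,
    balance:double,
    raw:bytearray,
    location:tuple(city:chararray, country:chararray),
    visits:bag{visit:tuple(url:chararray, hits:long)},
    attributes:map[]
);

-- Print the schema Pig has recorded for the relation
DESCRIBE users;
```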

PigLatin Maths
Some of the built-in maths functions:
● ABS
● CEIL
● EXP
● FLOOR
● LOG
● ROUND
● SIN
● TAN
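
A small sketch applying a few of these functions in a FOREACH ... GENERATE statement. The input path and the field names are hypothetical.

```pig
-- Assumed input: an id and a numeric reading per line
readings = LOAD '/tmp/readings.txt' AS (id:int, value:double);

-- Apply several built-in maths functions to each reading
maths = FOREACH readings GENERATE
    id,
    ABS(value)   AS abs_value,
    CEIL(value)  AS ceil_value,
    FLOOR(value) AS floor_value,
    ROUND(value) AS round_value;

DUMP maths;
```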

PigLatin Example
Example borrowed from Wikipedia
input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
-- Extract words from each line and put them into a pig bag
-- datatype, then flatten the bag to get one word on each row
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- filter out any words that are just white spaces
filtered_words = FILTER words BY word MATCHES '\\w+';
-- create a group for each word
word_groups = GROUP filtered_words BY word;
-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
-- order the records by count
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';

Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems
