PIG
Mike Unwin
Twitter: @mjunwin
Why are we talking about Pig?


Originally developed at Yahoo!, now an
Apache project



Engine for executing data flows in parallel
on Hadoop



Includes a language called Pig Latin for
expressing data flows



Easy to learn and extensible



Open source
What is a data flow language?


Allows us to describe how data should be loaded, read,
processed and stored.



Can be simple linear flows e.g. word count



Complex workflows that include joins
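
As a rough illustration of a simple linear flow, word count might be sketched in Pig Latin as below. The file names 'input.txt' and 'wordcount_out' are hypothetical and not from the talk; TOKENIZE, FLATTEN and COUNT are Pig built-ins.

```pig
-- Hypothetical input: a plain text file, one line of text per record.
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split each line into words and flatten the resulting bag into one row per word.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Collect identical words together.
grouped = GROUP words BY word;
-- Count the rows in each group's bag.
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
-- Hypothetical output directory.
STORE counts INTO 'wordcount_out';
```

Each statement names an intermediate relation, so the script reads top to bottom in the same order the data moves.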
Is it like SQL?


Pig Latin does look a bit like SQL, e.g. JOIN, GROUP BY



But SQL is declarative



In Pig you describe how the data flows



In SQL you end up producing an inside-out query, whereas
in Pig you describe a pipeline.
SQL example
SELECT c.CustomerName, t.TotalOrders, c.PostCode
FROM Customers c
INNER JOIN
(
    SELECT CustomerId, COUNT(OrderId) AS TotalOrders
    FROM Orders
    GROUP BY CustomerId
) AS t ON t.CustomerId = c.CustomerId
Same Query in Pig
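orders = load 'Orders' as (CustomerId, OrderId);
grouped = group orders by CustomerId;
-- COUNT takes the bag produced by the group, not a bare field name
total = foreach grouped generate group as CustomerId, COUNT(orders) as TotalOrders;
customer = load 'Customers' as (CustomerId, CustomerName);
-- field names are case sensitive, so both sides use CustomerId
result = join total by CustomerId, customer by CustomerId;
dump result;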
Installing Pig


http://pig.apache.org/docs/r0.11.1/



Requires Java



Requires Hadoop (Pig does bundle a built-in version of Hadoop,
currently v0.20.2)



Requires Cygwin on Windows
What do you get?

Pig

Grunt Shell

Piggy Bank
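
To give a feel for Piggy Bank, here is a minimal sketch of calling one of its contributed functions from a script or the Grunt shell. The jar location and the 'Customers' schema are assumptions for illustration; the UPPER function shown is the string function shipped in the piggybank contrib package.

```pig
-- Assumption: piggybank.jar is in the current directory; adjust the path as needed.
REGISTER piggybank.jar;
-- Give the fully qualified Piggy Bank function a short alias.
DEFINE Upper org.apache.pig.piggybank.evaluation.string.UPPER();

-- Hypothetical input with the same fields as the join example.
customers = LOAD 'Customers' AS (CustomerId:int, CustomerName:chararray);
shouted   = FOREACH customers GENERATE CustomerId, Upper(CustomerName) AS NameUpper;
DUMP shouted;
```

Custom UDFs are registered and called in the same way.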
Basic Pig Operators


FOREACH



FILTER



GROUP BY



ORDER BY



UNION



CROSS
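
The operators above could be combined roughly as follows. This is a sketch only: the relation names, file names and the Amount field are hypothetical and were not part of the talk.

```pig
-- Hypothetical inputs: two order files with identical fields, plus a list of regions.
orders_2012 = LOAD 'orders_2012' AS (CustomerId:int, OrderId:int, Amount:double);
orders_2013 = LOAD 'orders_2013' AS (CustomerId:int, OrderId:int, Amount:double);
regions     = LOAD 'regions' AS (Region:chararray);

-- UNION: concatenate the two relations.
all_orders  = UNION orders_2012, orders_2013;
-- FILTER: keep only rows matching a condition.
big         = FILTER all_orders BY Amount > 100.0;
-- FOREACH: project the fields needed downstream.
slim        = FOREACH big GENERATE CustomerId, Amount;
-- GROUP ... BY: one row per customer, holding a bag of that customer's rows.
by_customer = GROUP slim BY CustomerId;
totals      = FOREACH by_customer GENERATE group AS CustomerId, SUM(slim.Amount) AS Total;
-- ORDER ... BY: sort the result.
ranked      = ORDER totals BY Total DESC;
-- CROSS: every combination of rows from the two relations (expensive on large data).
pairs       = CROSS ranked, regions;

DUMP ranked;
```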
Same Query in Pig
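orders = load 'Orders' as (CustomerId, OrderId);
grouped = group orders by CustomerId;
-- COUNT takes the bag produced by the group, not a bare field name
total = foreach grouped generate group as CustomerId, COUNT(orders) as TotalOrders;
customer = load 'Customers' as (CustomerId, CustomerName);
-- field names are case sensitive, so both sides use CustomerId
result = join total by CustomerId, customer by CustomerId;
dump result;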
Debugging


Describe



Explain
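
A minimal sketch of the two debugging operators, as they might be typed into the Grunt shell. The field types are an assumption added for illustration.

```pig
orders  = LOAD 'Orders' AS (CustomerId:int, OrderId:int);
grouped = GROUP orders BY CustomerId;

-- DESCRIBE prints the schema of a relation: here a 'group' field
-- plus a bag named 'orders' holding the original rows.
DESCRIBE grouped;

-- EXPLAIN prints the logical, physical and MapReduce plans Pig would
-- use to compute the relation, without running the job.
EXPLAIN grouped;
```

DESCRIBE is useful for checking field names after a GROUP or JOIN, which is where the names most often differ from what you expect.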
How does Pig become a MapReduce (MR) job?
Advantages of Pig


Easy to learn



Can achieve a lot with a small amount of code


E.g. Join example



Well-written scripts can be easy to read and easy to
maintain



Has a local mode for testing scripts



Has a unit testing framework
Limitations of Pig


Unit testing



High level: you often need to drop down into custom UDFs



If you are proficient in C# or F#, that route can sometimes be
easier to test, e.g. Streaming unit allows unit testing.



Still doesn't play nicely in a Windows environment
http://elastastorage.blob.core.windows.net/hdinsight/PigOnHDInsight.pdf
