Introduction to Pig
Ravi Mutyala
This is the slide deck that I used for my presentation at the Houston Hadoop Meetup Group.
Technology
1.
Apache Pig: Introduction and Hands-on
Ravi Mutyala, Systems Architect, Hortonworks
Twitter: @rmutyala
© Hortonworks Inc. 2012
2.
Big Data Platforms
[Figure: bubble chart. Labels: Cost per TB, Adoption. Size of bubble = cost effectiveness of solution.]
3.
Topics
• What is Pig?
• Why Pig?
• Language Features
• Labs
• 0.10.0 Features
• Features in the pipeline
• Q&A
4.
What is Pig?
• System for processing large amounts of unstructured data
• Uses HDFS and MapReduce
• Data flow language
• Scripts describe a Directed Acyclic Graph (DAG) of operations
• Started at Yahoo! Research
• Joined the Apache Incubator in 2007
• Graduated to a subproject of Hadoop in 2008
• Top-level Apache project since 2010
5.
Pig Philosophy
• Pigs eat anything
• Pigs live anywhere
• Pigs are domesticated animals
• Pigs can fly
6.
Components
• Pig Engine: parser, optimizer and distributed query execution
• Grunt: CLI shell
• Pig Latin: procedural language
7.
Why Pig?
• High-level language that increases programmer productivity
• Designed for parallel data flow
• Reduces complexity by abstracting away low-level Map and Reduce jobs and the chaining of MapReduce jobs
• Can be run from a client/gateway machine with no configuration on the cluster
• Multiple versions of Pig can co-exist as long as they are compatible with the cluster's Hadoop version
8.
Running Pig
A Pig Latin script can be executed in three modes:
• MapReduce: code executes as MapReduce jobs on a Hadoop cluster
  $ pig myscript.pig
• Local: code executes locally in a single JVM using local data
  $ pig -x local myscript.pig
• Interactive: running pig with no script starts the Grunt shell, where commands can be run interactively
9.
Grunt shell
• fs -ls
• fs -cat filename
• fs -copyFromLocal localfile hdfsfile
10.
Data Types
• Scalar Types
  – int, long, float, double, chararray, bytearray, boolean, datetime
• Complex Types
  – Map: collection of key-value pairs, e.g. [name#alan, age#30]
  – Tuple: ordered set of values, e.g. (alan,40,engineering)
  – Bag: unordered collection of tuples, e.g. {(alan,40,engineering),(bob,45,sales)}
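As a rough analogy only (Python is not part of the deck, and Pig represents these types differently internally), the three complex types above can be pictured with Python built-ins. All variable and function names here are illustrative.

```python
# Rough Python analogy for Pig's complex types.
# This is an illustration of the data model, not Pig's implementation.

# Map: key-value pairs, like [name#alan, age#30]
person_map = {"name": "alan", "age": 30}

# Tuple: ordered set of values, like (alan,40,engineering)
person_tuple = ("alan", 40, "engineering")

# Bag: unordered collection of tuples. A list is used for convenience,
# but the order of its elements carries no meaning in Pig.
people_bag = [("alan", 40, "engineering"), ("bob", 45, "sales")]


def same_bag(left, right):
    """Compare two bags while ignoring element order."""
    return sorted(left) == sorted(right)


print(person_map["name"])
print(person_tuple[1])
print(same_bag(people_bag, list(reversed(people_bag))))
```

The `same_bag` helper is there only to stress that two bags holding the same tuples in a different order should be treated as equal.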
11.
• Relations and a set of operations that work on relations
• Schema for relations is optional
• $0 … $n can be used to refer to the fields of a relation by position
• null means the data is undefined
• Any missing or invalid fields are loaded as null
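The null-handling rule above can be sketched in Python. This is a simplified model under stated assumptions, not Pig's loader: `SCHEMA` and `load_line` are hypothetical names, the sample lines are made up, and `None` stands in for Pig's null.

```python
# Simplified model of schema-on-load: a field that is missing or cannot be
# cast to its declared type becomes None (standing in for Pig's null).
# SCHEMA and load_line are hypothetical names, not Pig APIs.

SCHEMA = [("name", str), ("age", int), ("dept", str)]


def load_line(line, delimiter=","):
    raw = line.rstrip("\n").split(delimiter)
    record = []
    for position, (_, cast) in enumerate(SCHEMA):
        value = raw[position] if position < len(raw) else ""
        if value == "":
            record.append(None)  # missing field
            continue
        try:
            record.append(cast(value))
        except ValueError:
            record.append(None)  # invalid field
    return tuple(record)


sample_lines = ["alan,40,engineering", "bob,forty,sales", "carol,31"]
rows = [load_line(line) for line in sample_lines]
for row in rows:
    # Positional access, comparable to $0 and $1 in Pig Latin
    print(row[0], row[1])
```

In this model the second line loads with a null age because "forty" is not a valid int, and the third loads with a null dept because the field is absent.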
12.
Input and Output
• A = LOAD 'file' USING PigStorage(',') AS (data1:datatype1, data2:datatype2, ...);
• STORE A INTO 'file2' USING PigStorage(',');
• DUMP A;
• DESCRIBE A;
13.
Relational Operations
• GROUP A BY age;
• FOREACH B GENERATE $1 - $3;
• FILTER A BY $1 > 10;
• ORDER A BY $1 DESC, $2;
• JOIN A BY $1, B BY $5;
• JOIN A BY ($1, $5) LEFT OUTER, B BY ($2, $3);
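To make the semantics of these operators concrete, here is a minimal Python sketch of what GROUP, FILTER, ORDER and an inner JOIN produce. It models the results only and says nothing about how Pig executes them on MapReduce. The relations, their fields and the helper names `group_by` and `join` are assumptions made for illustration.

```python
from collections import defaultdict

# Toy relation A with fields (name, age, dept). Sample data is made up.
A = [("alan", 40, "engineering"), ("bob", 45, "sales"), ("carol", 40, "sales")]
# Toy relation B with fields (dept, city). Sample data is made up.
B = [("engineering", "houston"), ("sales", "austin")]


def group_by(relation, key_index):
    """Like GROUP: one output tuple per key, holding (group, bag of tuples)."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key_index]].append(row)
    return [(key, bag) for key, bag in bags.items()]


def join(left, left_index, right, right_index):
    """Like an inner JOIN: concatenate every pair of tuples whose keys match."""
    lookup = defaultdict(list)
    for row in right:
        lookup[row[right_index]].append(row)
    return [l + r for l in left for r in lookup.get(l[left_index], [])]


grouped = group_by(A, 1)                                # GROUP A BY age
filtered = [row for row in A if row[1] > 40]            # FILTER A BY age > 40
ordered = sorted(A, key=lambda row: (-row[1], row[0]))  # ORDER A BY age DESC, name
joined = join(A, 2, B, 0)                               # JOIN A BY dept, B BY dept

for group, bag in grouped:
    print(group, bag)
```

Note that grouping does not aggregate anything by itself: each output tuple carries the key plus a bag of the original tuples, which a later FOREACH can then summarize.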
14.
• LIMIT A 10;
• SAMPLE A 0.1;
• GROUP A BY $1 PARALLEL 10;
• User Defined Functions (UDFs) and Piggybank
  register 'your_path_to_piggybank/piggybank.jar';
  divs = load 'NYSE_dividends';
  backwards = foreach divs generate org.apache.pig.piggybank.evaluation.string.Reverse($1);
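The following Python sketch approximates what LIMIT, SAMPLE and a per-row UDF call do to a relation. It is a model under assumptions: the `divs` rows are invented sample data, `reverse` is a stand-in for the Piggybank Reverse function rather than its actual code, and treating SAMPLE as an independent coin flip per tuple is a simplification.

```python
import random


def limit(relation, n):
    """Like LIMIT: keep at most n tuples."""
    return relation[:n]


def sample(relation, fraction, seed=None):
    """Roughly like SAMPLE: keep each tuple with the given probability,
    so the size of the result varies from run to run."""
    rng = random.Random(seed)
    return [row for row in relation if rng.random() < fraction]


def reverse(text):
    """Stand-in for a string-reversing UDF."""
    return text[::-1]


# Invented sample rows with fields (exchange, symbol, dividend)
divs = [("NYSE", "CPO", 0.14), ("NYSE", "CCS", 0.414)]

# Comparable to: backwards = foreach divs generate Reverse($1);
backwards = [(reverse(row[1]),) for row in divs]
print(backwards)
print(limit(divs, 1))
```

A UDF is applied once per input tuple inside FOREACH, which is why the Python equivalent is a plain list comprehension over the rows.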
15.
• Invoking static Java methods
• FLATTEN
• TOKENIZE
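TOKENIZE and FLATTEN are commonly combined for word counting, and their effect can be sketched in Python as follows. This is a simplified model: the `tokenize` helper here splits on whitespace only, whereas Pig's TOKENIZE recognizes a few more delimiters, and the input lines are made up.

```python
from collections import Counter


def tokenize(text):
    """Like TOKENIZE: turn a string into a bag of single-field tuples."""
    return [(word,) for word in text.split()]


def flatten(bags):
    """Like FLATTEN on a bag: emit one output tuple per tuple in each bag."""
    return [inner for bag in bags for inner in bag]


lines = ["pigs eat anything", "pigs can fly"]
bags = [tokenize(line) for line in lines]  # one bag per input line
words = flatten(bags)                      # one tuple per word
counts = Counter(word for (word,) in words)
print(counts.most_common(1))
```

Without the flatten step each input line would still be a single row holding a bag, so a later grouping by word would have nothing to group on.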
16.
0.10.0 Features
• Ruby UDFs
• PigStorage with schemas
• Additional UDF improvements
• Language Improvements
  – Boolean type
  – OTHERWISE clause for SPLIT
  – Maps, Bags and Tuples can be generated without UDFs
  – Register collection of jars
• Performance Improvements
17.
Current work in progress
• DateTime datatype
• CUBE, ROLLUP and RANK operators
• Native support for Windows
• Lower memory footprint
18.
References
• Labs are from
  – https://github.com/alanfgates/programmingpig
  – https://github.com/michiard/CLOUDS-LAB
• 0.10.0 Features and current WIP
  – http://www.slideshare.net/hortonworks/pig-out-to-hadoop by Alan Gates
19.
Hortonworks Training
The expert source for Apache Hadoop training & certification
Role-based Developer and Administration training
  – Coursework built and maintained by the core Apache Hadoop development team
  – The “right” course, with the most extensive and realistic hands-on materials
  – Provides an immersive experience into real-world Hadoop scenarios
  – Public and private courses available
Comprehensive Apache Hadoop Certification
  – Become a trusted and valuable Apache Hadoop expert
20.
Thank You! Questions & Answers
Ravi Mutyala, Systems Architect, Hortonworks
Twitter: @rmutyala
www.hortonworks.com