SlideShare a Scribd company logo
Pig is a high level scripting language that is used with Apache Hadoop. Pig excels at
describing data analysis problems as data flows. Pig is complete in that you can do all the
required data manipulations in Apache Hadoop with Pig. In addition through the User
Defined Functions(UDF) facility in Pig you can have Pig invoke code in many languages like
JRuby, Jython and Java. Conversely you can execute Pig scripts in other languages. The result
is that you can use Pig as a component to build larger and more complex applications that
tackle real business problems.
Pig scripts are translated into a series of MapReduce jobs that are run on the Apache
Hadoop cluster. As part of the translation the Pig interpreter does perform optimizations to
speed execution on Apache Hadoop. We are going to write a Pig script that will do our data
analysis task.
Lets solve a simple exercise on how to run a pig script on hadoop platform:
Data Downloading
You can download the data file in csv zip form from:
http://hortonassets.s3.amazonaws.com/pig/lahman591-csv.zip
Once you have the file you will need to unzip the file into a directory. We will be uploading
just the master.csv and batting.csv files.
Uploading of the data
Although the link from where we are solving the program is an easy guide. We would be
using the hue to upload the files for execution of the program.
Steps to upload
a) Start VM Box and open SSH Terminal
b) Log on to the address http://127.0.0.1:8000/
c) Click on file browser option followed by view then click on hue
d) Upload 2 files batting.csv and master.csv.
Screenshot is attached below
Write the pig script and execute the program
Click on the pig icon on the header and a new script will open. Name this script as any name.
Here it is T2 and Code is written in the notepad type file.
Code:
load 'Batting.csv' using PigStorage(',');
This line will load the file Batting.Csv.Through pig storage we are passing the comma as the
delimiter
Raw runs = FILTER batting BY $1>0;
To filter the row the fist line would be used
runs = FOREACH raw runs GENERATE $0 as playerID , $1 as year , $8 as runs;
Foreach would iterate through batting data object. Generate pulls out selected fields and
assigns name,
grp_data = GROUP runs by (year);
This line would group runs by year.
max_runs = FOREACH grp_data GENERATE group as grp,MAX(runs.runs) as max_runs;
This line would find out the maximum runs
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as
playerID, $1 as runs;
DUMP join_data;
We are combining and generating the dataset year, player id and maximum runs and
dumping the output to the screen
Click on the save and execute to compile the program and run
You can check the progress in the Job Browser Window.
This would take around 10-15 Minutes to work around.
Output
Pig

More Related Content

What's hot

Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010
Yahoo Developer Network
 
Apache Hive
Apache HiveApache Hive
Apache Hive
Abhishek Gautam
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
vishal choudhary
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Takrim Ul Islam Laskar
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
Gigin Krishnan
 
Apache hive
Apache hiveApache hive
Apache hive
Vaibhav Kadu
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
Padma shree. T
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
Ricardo Varela
 
Advanced topics in hive
Advanced topics in hiveAdvanced topics in hive
Advanced topics in hive
Uday Vakalapudi
 
Hive Hadoop
Hive HadoopHive Hadoop
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
Arvind Kumar
 
Apache Hive
Apache HiveApache Hive
Apache Hive
tusharsinghal58
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
LAB1 Loading Data into HDFS.docx
LAB1 Loading Data into HDFS.docxLAB1 Loading Data into HDFS.docx
LAB1 Loading Data into HDFS.docx
Karim Fathallah
 
Big Data & Hadoop Data Analysis
Big Data & Hadoop Data AnalysisBig Data & Hadoop Data Analysis
Big Data & Hadoop Data Analysis
Koushik Mondal
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010
Yahoo Developer Network
 

What's hot (20)

Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Advanced topics in hive
Advanced topics in hiveAdvanced topics in hive
Advanced topics in hive
 
Hive Hadoop
Hive HadoopHive Hadoop
Hive Hadoop
 
Hive training
Hive trainingHive training
Hive training
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
LAB1 Loading Data into HDFS.docx
LAB1 Loading Data into HDFS.docxLAB1 Loading Data into HDFS.docx
LAB1 Loading Data into HDFS.docx
 
Big Data & Hadoop Data Analysis
Big Data & Hadoop Data AnalysisBig Data & Hadoop Data Analysis
Big Data & Hadoop Data Analysis
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010
 

Similar to Pig

An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
Sachin Vakkund
 
Unit 4 lecture2
Unit 4 lecture2Unit 4 lecture2
Unit 4 lecture2
vishal choudhary
 
Apache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan Gate
Apache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan GateApache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan Gate
Apache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan GateYahoo Developer Network
 
Pig
PigPig
IRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache PigIRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET Journal
 
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.pptlecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
YashJadhav496388
 
Hadoop tutorial hand-outs
Hadoop tutorial hand-outsHadoop tutorial hand-outs
Hadoop tutorial hand-outspardhavi reddy
 
unit-4-apache pig-.pdf
unit-4-apache pig-.pdfunit-4-apache pig-.pdf
unit-4-apache pig-.pdf
ssuser92282c
 
Unit 4-apache pig
Unit 4-apache pigUnit 4-apache pig
Unit 4-apache pig
vishal choudhary
 
Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014rpbrehm
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
Antonio Silveira
 
A intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsA intro to (hosted) Shiny Apps
A intro to (hosted) Shiny Apps
Daniel Koller
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache Pig
Jason Shao
 
R server and spark
R server and sparkR server and spark
R server and spark
BAINIDA
 
Proyecto de microcontroladores
Proyecto de microcontroladoresProyecto de microcontroladores
Proyecto de microcontroladores
Carlos Molina
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
stratapps
 
Apache Pig: A big data processor
Apache Pig: A big data processorApache Pig: A big data processor
Apache Pig: A big data processor
Tushar B Kute
 
An Overview of Hadoop
An Overview of HadoopAn Overview of Hadoop
An Overview of Hadoop
Asif Ali
 

Similar to Pig (20)

An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Unit 4 lecture2
Unit 4 lecture2Unit 4 lecture2
Unit 4 lecture2
 
Apache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan Gate
Apache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan GateApache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan Gate
Apache Hadoop India Summit 2011 talk "Pig - Making Hadoop Easy" by Alan Gate
 
Pig
PigPig
Pig
 
IRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache PigIRJET- Analysis of Boston’s Crime Data using Apache Pig
IRJET- Analysis of Boston’s Crime Data using Apache Pig
 
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.pptlecturte 5. Hgfjhffjyy to the data will be 1.ppt
lecturte 5. Hgfjhffjyy to the data will be 1.ppt
 
Hadoop tutorial hand-outs
Hadoop tutorial hand-outsHadoop tutorial hand-outs
Hadoop tutorial hand-outs
 
unit-4-apache pig-.pdf
unit-4-apache pig-.pdfunit-4-apache pig-.pdf
unit-4-apache pig-.pdf
 
Unit 4-apache pig
Unit 4-apache pigUnit 4-apache pig
Unit 4-apache pig
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
A intro to (hosted) Shiny Apps
A intro to (hosted) Shiny AppsA intro to (hosted) Shiny Apps
A intro to (hosted) Shiny Apps
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache Pig
 
R server and spark
R server and sparkR server and spark
R server and spark
 
Proyecto de microcontroladores
Proyecto de microcontroladoresProyecto de microcontroladores
Proyecto de microcontroladores
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
 
Apache Pig: A big data processor
Apache Pig: A big data processorApache Pig: A big data processor
Apache Pig: A big data processor
 
An Overview of Hadoop
An Overview of HadoopAn Overview of Hadoop
An Overview of Hadoop
 

Recently uploaded

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

Pig

  • 1. Pig is a high level scripting language that is used with Apache Hadoop. Pig excels at describing data analysis problems as data flows. Pig is complete in that you can do all the required data manipulations in Apache Hadoop with Pig. In addition through the User Defined Functions(UDF) facility in Pig you can have Pig invoke code in many languages like JRuby, Jython and Java. Conversely you can execute Pig scripts in other languages. The result is that you can use Pig as a component to build larger and more complex applications that tackle real business problems. Pig scripts are translated into a series of MapReduce jobs that are run on the Apache Hadoop cluster. As part of the translation the Pig interpreter does perform optimizations to speed execution on Apache Hadoop. We are going to write a Pig script that will do our data analysis task. Lets solve a simple exercise on how to run a pig script on hadoop platform: Data Downloading You can download the data file in csv zip form from: http://hortonassets.s3.amazonaws.com/pig/lahman591-csv.zip Once you have the file you will need to unzip the file into a directory. We will be uploading just the master.csv and batting.csv files. Uploading of the data Although the link from where we are solving the program is an easy guide. We would be using the hue to upload the files for execution of the program. Steps to upload a) Start VM Box and open SSH Terminal b) Log on to the address http://127.0.0.1:8000/ c) Click on file browser option followed by view then click on hue d) Upload 2 files batting.csv and master.csv.
  • 2. Screenshot is attached below Write the pig script and execute the program Click on the pig icon on the header and a new script will open. Name this script as any name. Here it is T2 and Code is written in the notepad type file. Code: load 'Batting.csv' using PigStorage(',');
  • 3. This line will load the file Batting.Csv.Through pig storage we are passing the comma as the delimiter Raw runs = FILTER batting BY $1>0; To filter the row the fist line would be used runs = FOREACH raw runs GENERATE $0 as playerID , $1 as year , $8 as runs; Foreach would iterate through batting data object. Generate pulls out selected fields and assigns name, grp_data = GROUP runs by (year); This line would group runs by year. max_runs = FOREACH grp_data GENERATE group as grp,MAX(runs.runs) as max_runs; This line would find out the maximum runs join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs); join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs; DUMP join_data; We are combining and generating the dataset year, player id and maximum runs and dumping the output to the screen Click on the save and execute to compile the program and run You can check the progress in the Job Browser Window. This would take around 10-15 Minutes to work around. Output