SlideShare a Scribd company logo
WordCount Program Using
Apache Pig
 input1 = load '/user/cloudera/pig_wc_input/pig_word.txt' as (line:chararray);
 dump input1;
 words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;
 dump words;
 word_groups = GROUP words by word;
 dump word_groups;
 word_count = foreach word_groups generate group, COUNT(words);
 dump word_count;
 ordered_word_count = order word_count by group desc;
 dump ordered_word_count;
 store ordered_word_count into /user/cloudera/pig_wordcount;
• Input Data:
In MapReduce word count example, we find out the frequency of each word.
Here, the role of Mapper is to map the keys to the existing values and the role of
Reducer is to aggregate the keys of common values.
So, everything is represented in the form of Key-value pair.
• Open the grunt shell by typing in pig command.
• Load the input data from hdfs .
• Following is content of pig_word.txt file as seen from hue browser.
• Use dump command to display the input data.
• Generate list of words from the input data.
• Use dump command to display the list of words.
• Group the list of words i.e same words are grouped together.
• Display the grouped list of words.
• Count the number of times each word is repeated .
• Display the list of words along with their word_count values.
• Sort the words either in ascending or descending order.
• Store the final word count list into hdfs .
• Exit the grunt shell and check whether the file is saved in hdfs.
Thank You
Pig WordCount PPT.pptx

More Related Content

Similar to Pig WordCount PPT.pptx

Perl bhargav
Perl bhargavPerl bhargav
Perl bhargav
Bhargav Reddy
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hive
zahid-mian
 
Introduction-to-Iteration.pdf
Introduction-to-Iteration.pdfIntroduction-to-Iteration.pdf
Introduction-to-Iteration.pdf
KeshavBandil2
 
Golang online course
Golang online courseGolang online course
Golang online course
bestonlinecoursescoupon
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
Vibrant Technologies & Computers
 
unit-4-apache pig-.pdf
unit-4-apache pig-.pdfunit-4-apache pig-.pdf
unit-4-apache pig-.pdf
ssuser92282c
 
Unit 4-apache pig
Unit 4-apache pigUnit 4-apache pig
Unit 4-apache pig
vishal choudhary
 
Php1
Php1Php1
Perl_Part1
Perl_Part1Perl_Part1
Perl_Part1
Frank Booth
 
Shell Scripts
Shell ScriptsShell Scripts
Shell Scripts
Dr.Ravi
 
The Ring programming language version 1.8 book - Part 45 of 202
The Ring programming language version 1.8 book - Part 45 of 202The Ring programming language version 1.8 book - Part 45 of 202
The Ring programming language version 1.8 book - Part 45 of 202
Mahmoud Samir Fayed
 
Introduction to perl_ a scripting language
Introduction to perl_ a scripting languageIntroduction to perl_ a scripting language
Introduction to perl_ a scripting language
Vamshi Santhapuri
 
The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180
Mahmoud Samir Fayed
 
Perl Basics with Examples
Perl Basics with ExamplesPerl Basics with Examples
Perl Basics with Examples
Nithin Kumar Singani
 
Hadoop interview question
Hadoop interview questionHadoop interview question
Hadoop interview question
pappupassindia
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍
Tae Young Lee
 
Perl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReducePerl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReduce
Pedro Figueiredo
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_Pennonsoft
PennonSoft
 
Unit 2
Unit 2Unit 2
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Ran Silberman
 

Similar to Pig WordCount PPT.pptx (20)

Perl bhargav
Perl bhargavPerl bhargav
Perl bhargav
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hive
 
Introduction-to-Iteration.pdf
Introduction-to-Iteration.pdfIntroduction-to-Iteration.pdf
Introduction-to-Iteration.pdf
 
Golang online course
Golang online courseGolang online course
Golang online course
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
unit-4-apache pig-.pdf
unit-4-apache pig-.pdfunit-4-apache pig-.pdf
unit-4-apache pig-.pdf
 
Unit 4-apache pig
Unit 4-apache pigUnit 4-apache pig
Unit 4-apache pig
 
Php1
Php1Php1
Php1
 
Perl_Part1
Perl_Part1Perl_Part1
Perl_Part1
 
Shell Scripts
Shell ScriptsShell Scripts
Shell Scripts
 
The Ring programming language version 1.8 book - Part 45 of 202
The Ring programming language version 1.8 book - Part 45 of 202The Ring programming language version 1.8 book - Part 45 of 202
The Ring programming language version 1.8 book - Part 45 of 202
 
Introduction to perl_ a scripting language
Introduction to perl_ a scripting languageIntroduction to perl_ a scripting language
Introduction to perl_ a scripting language
 
The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180
 
Perl Basics with Examples
Perl Basics with ExamplesPerl Basics with Examples
Perl Basics with Examples
 
Hadoop interview question
Hadoop interview questionHadoop interview question
Hadoop interview question
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍
 
Perl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReducePerl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReduce
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_Pennonsoft
 
Unit 2
Unit 2Unit 2
Unit 2
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 

Recently uploaded

Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 

Recently uploaded (20)

Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 

Pig WordCount PPT.pptx

  • 2.  input1 = load '/user/cloudera/pig_wc_input/pig_word.txt' as (line:chararray);  dump input1;  words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;  dump words;  word_groups = GROUP words by word;  dump word_groups;  word_count = foreach word_groups generate group, COUNT(words);  dump word_count;  ordered_word_count = order word_count by group desc;  dump ordered_word_count;  store ordered_word_count into /user/cloudera/pig_wordcount;
  • 3. • Input Data: In MapReduce word count example, we find out the frequency of each word. Here, the role of Mapper is to map the keys to the existing values and the role of Reducer is to aggregate the keys of common values. So, everything is represented in the form of Key-value pair.
  • 4. • Open the grunt shell by typing in pig command.
  • 5. • Load the input data from hdfs . • Following is content of pig_word.txt file as seen from hue browser.
  • 6. • Use dump command to display the input data.
  • 7.
  • 8. • Generate list of words from the input data.
  • 9. • Use dump command to display the list of words.
  • 10.
  • 11. • Group the list of words i.e same words are grouped together.
  • 12. • Display the grouped list of words.
  • 13.
  • 14. • Count the number of times each word is repeated .
  • 15. • Display the list of words along with their word_count values.
  • 16.
  • 17. • Sort the words either in ascending or descending order.
  • 18.
  • 19. • Store the final word count list into hdfs .
  • 20. • Exit the grunt shell and check whether the file is saved in hdfs.