Pig WordCount PPT.pptx

WordCount Program Using Apache Pig

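 -- Load each line of the input file as a single chararray field.
 input1 = load '/user/cloudera/pig_wc_input/pig_word.txt' as (line:chararray);
 dump input1;
 -- Split each line into words, one word per tuple.
 words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;
 dump words;
 -- Collect identical words into one group.
 word_groups = GROUP words by word;
 dump word_groups;
 -- Count the tuples in each group.
 word_count = foreach word_groups generate group, COUNT(words);
 dump word_count;
 -- Sort by word in descending (reverse alphabetical) order.
 ordered_word_count = order word_count by group desc;
 dump ordered_word_count;
 -- The output path must be a quoted string.
 store ordered_word_count into '/user/cloudera/pig_wordcount';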

• Input Data:
In MapReduce word count example, we find out the frequency of each word.
Here, the role of Mapper is to map the keys to the existing values and the role of
Reducer is to aggregate the keys of common values.
So, everything is represented in the form of Key-value pair.

• Open the Grunt shell by typing the pig command.

• Load the input data from HDFS.
• The following is the content of the pig_word.txt file as seen in the Hue file browser.

• Use the dump command to display the input data.

• Generate a list of words from the input data.

• Use the dump command to display the list of words.
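
With its default delimiters, TOKENIZE does not split on periods or hyphens, so tokens such as "word." and "Key-value" keep their punctuation and are counted separately from "word" and "value". The following is a minimal sketch of one possible cleanup step, assuming that lowercasing and replacing every run of non-alphanumeric characters with a space is acceptable for the input. The alias `cleaned` is hypothetical and not part of the original script.

```pig
-- Sketch only: normalize each line before tokenizing.
-- 'cleaned' is a hypothetical alias introduced for illustration.
input1 = load '/user/cloudera/pig_wc_input/pig_word.txt' as (line:chararray);
-- Lowercase the line and turn each run of non-alphanumeric characters into one space.
cleaned = foreach input1 generate LOWER(REPLACE(line, '[^a-zA-Z0-9]+', ' ')) as line;
-- Tokenize the cleaned line, one word per tuple.
words = foreach cleaned generate FLATTEN(TOKENIZE(line)) as word;
dump words;
```

Note that this choice also splits hyphenated terms, so "Key-value" becomes the two words "key" and "value".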

• Group the list of words, i.e., identical words are grouped together.

• Display the grouped list of words.
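
To see the shape of the grouped relation without running a job, the schema can be inspected with describe. This is a small sketch using the same aliases as the script. Each tuple of word_groups should hold a field named group (the word itself) and a bag named words containing one tuple per occurrence of that word.

```pig
-- Sketch: inspect the schema produced by GROUP.
input1 = load '/user/cloudera/pig_wc_input/pig_word.txt' as (line:chararray);
words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;
word_groups = group words by word;
-- describe prints the schema of a relation; it does not launch a job.
describe word_groups;
```

This is why the next step calls COUNT(words): it counts the tuples inside the bag for each group.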

• Count the number of times each word occurs.

• Display each word along with its count.

• Sort the words in either ascending or descending order.
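
The script sorts by group desc, which orders the words in reverse alphabetical order rather than by how often they occur. If the most frequent words should come first, one possible variant is sketched below. The field names `word` and `total` and the alias `by_frequency` are assumptions introduced here for illustration.

```pig
-- Sketch: name the output fields, then sort by frequency.
input1 = load '/user/cloudera/pig_wc_input/pig_word.txt' as (line:chararray);
words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;
word_groups = group words by word;
-- 'word' and 'total' are hypothetical field names.
word_count = foreach word_groups generate group as word, COUNT(words) as total;
-- Highest count first; ties are broken alphabetically by word.
by_frequency = order word_count by total desc, word asc;
dump by_frequency;
```

Writing asc instead of desc (or omitting the keyword, since ascending is the default) reverses the order of the corresponding field.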

• Store the final word count list in HDFS.
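
The store statement fails if the output directory already exists, which commonly happens when the script is run a second time. A hedged sketch of one way to handle this is to remove the directory first with rmf. This assumes that any earlier output at that path can safely be deleted, and that the alias ordered_word_count has already been defined as in the script.

```pig
-- Sketch: clear any previous output, then store.
-- rmf removes the path if it exists and does nothing if it is missing.
-- Assumes old results under this path are safe to delete.
rmf /user/cloudera/pig_wordcount;
store ordered_word_count into '/user/cloudera/pig_wordcount';
```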

• Exit the Grunt shell and check whether the file is saved in HDFS.
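
As an alternative, the check can be made from inside the Grunt shell before exiting. The commands below are a sketch of that approach. They assume the store step has completed and use the same output path as the script.

```pig
-- Sketch: verify the stored output from within Grunt, then exit.
-- List the output directory created by the store statement.
fs -ls /user/cloudera/pig_wordcount
-- Print the contents of the files in the output directory.
cat /user/cloudera/pig_wordcount
-- Leave the Grunt shell.
quit
```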
