Do the Presidential debates have any impact on election results?
Ashwath V
Avudaiappan R
Harish A
Sudhakaran S
Suganthy E
Monday, January 18, 2016 Active Learning Project 1
Tools and Technologies used
Big Data (Hadoop - MapReduce, Pig, Hive) – for input text file processing
R & Tableau – for visualization
Excel – for output interpretation
Reference
http://www.debates.org/index.php?page=debate-transcripts
Do the Presidential debates have any impact on election results?
Objective:
• Finding the word frequency and distribution for different parts of speech.
• Categorizing the positive and negative words in each candidate's speech and checking whether they have any impact on election results.
Our analysis will be helpful if any of the following applies to you:
• Do you want to run in the next Presidential election?
• Would you like to know which of the two candidates fared better in the debate?
• Are you a politician or do you work in the media?
Sources:
www.debates.org
"Well, my first job as commander in chief, is to keep the American people safe."
"We need strong leadership. I'd like to be that leader with your support. I'll work with you."
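MAP-REDUCE
MAPPER - Code to find word count
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**The mapper reads one line at a time, replaces punctuation with spaces, splits the line into
single words and emits every word to the reducers with the value of 1**/
// Nested inside the job's driver class, hence "static".
public static class TopNMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    // Punctuation to strip: '[', ']' and '"' are escaped, '-' is last so it is a literal.
    private static final String tokens = "[_|$#<>^=\\[\\]*/,;.:()?!\"'-]";
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String cleanLine = value.toString().toLowerCase().replaceAll(tokens, " ");
        StringTokenizer itr = new StringTokenizer(cleanLine); // splits on whitespace
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}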
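To see what the mapper emits without a cluster, the plain-Java sketch below applies the same steps to a single line: lower-casing, replacing punctuation with spaces (using a character class with `[`, `]` and `"` escaped so that it compiles), and splitting with `StringTokenizer`. It collects the (word, 1) pairs in a list instead of calling `context.write`. The class name `MapperSketch` is hypothetical and Hadoop is deliberately left out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-alone sketch of the mapper's logic: no Hadoop, pairs go into a list.
class MapperSketch {
    // Same punctuation class as the mapper; '-' sits last so it is a literal, not a range.
    static final String TOKENS = "[_|$#<>^=\\[\\]*/,;.:()?!\"'-]";

    // Returns one (word, 1) pair per token, mirroring context.write(word, one).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String cleanLine = line.toLowerCase().replaceAll(TOKENS, " ");
        StringTokenizer itr = new StringTokenizer(cleanLine);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map("We need strong leadership. I'd like to be that leader.")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Because the apostrophe is in the punctuation list, a contraction such as "I'd" comes out as two tokens, "i" and "d", so very short tokens in the word-count output may be fragments of contractions.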
REDUCER - Code to find word count
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/** The reducer receives every word together with all the 1s emitted for it, sums them
to get the number of occurrences, and writes the word with its total count**/
// Nested inside the job's driver class, hence "static".
public static class TopNReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        /** computes the number of occurrences of a single word **/
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
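The reducer only works because Hadoop's shuffle first groups every emitted 1 under its word. The in-memory sketch below imitates that: a `TreeMap` stands in for the sorted shuffle, and `reduce` repeats the summing loop once per word. The class and method names are hypothetical; this is an illustration of the data flow, not how Hadoop implements the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of shuffle + TopNReducer.reduce for word count.
class ReducerSketch {
    // Shuffle step: group the mapper's 1s by word (TreeMap keeps keys sorted, like Hadoop's sort).
    static Map<String, List<Integer>> shuffle(List<String> emittedWords) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String word : emittedWords) {
            groups.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return groups;
    }

    // Reduce step: same loop as the reducer, called once per key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> emittedWords) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emittedWords).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("we", "need", "strong", "leadership", "we", "need", "we");
        System.out.println(wordCount(words)); // {leadership=1, need=2, strong=1, we=3}
    }
}
```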
OUTPUT
PIG
Code to find word count
/**Loading data: TextLoader keeps each whole line as a single chararray field**/
debate_dataset = LOAD '/home/cloudera/Desktop/debate.txt' USING TextLoader() AS (line:chararray);
/** Extract the words from each line and put them into a Pig bag datatype,**/
/** then flatten the bag to get one word on each row**/
words = FOREACH debate_dataset GENERATE FLATTEN(TOKENIZE(line)) AS word;
/**keep only tokens made entirely of word characters (letters, digits, underscore)**/
filtered_words = FILTER words BY word MATCHES '\\w+';
/**create a group for each word**/
word_groups = GROUP filtered_words BY word;
/**count the entries in each group**/
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
/**order the records by count**/
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/home/cloudera/Desktop/wordcountoutput';
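For readers more at home in Java, the sketch below maps each Pig statement onto one step of a Java stream pipeline. It is an approximation: splitting on whitespace stands in for `TOKENIZE`, which (to our understanding) also treats a few punctuation characters such as commas, quotes and parentheses as delimiters. The class name `PigEquivalentSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogue of the Pig script, one stream step per Pig statement.
class PigEquivalentSketch {
    static LinkedHashMap<String, Long> orderedWordCount(List<String> lines) {
        Map<String, Long> wordCount = lines.stream()
                // words = FOREACH ... FLATTEN(TOKENIZE(line)): one word per row (whitespace split)
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // filtered_words = FILTER words BY word MATCHES '\\w+' (whole-token match)
                .filter(word -> word.matches("\\w+"))
                // word_groups = GROUP ...; word_count = ... COUNT(filtered_words)
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
        // ordered_word_count = ORDER word_count BY count DESC (ties in no particular order)
        return wordCount.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("We need strong leadership.", "We need a leader and we need support");
        System.out.println(orderedWordCount(lines));
    }
}
```

One consequence worth checking in the Pig output: a token with punctuation still attached, such as "safe." at the end of a sentence, fails the whole-token `\w+` match and is dropped rather than counted as "safe".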
OUTPUT
I wonder who spoke more positively? How can we find out?
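One simple way to answer the question is to count how many of a candidate's words appear in a list of positive words. The slides do not show the positive-word list the project used, so the tiny `POSITIVE` set below is a hypothetical sample for illustration only; a real run would load a full opinion lexicon. The class name `PositiveWordCounter` is also hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: count the words of a speech that appear in a positive-word lexicon.
class PositiveWordCounter {
    // Tiny illustrative lexicon, NOT the list used in the project.
    static final Set<String> POSITIVE = Set.of("strong", "safe", "support", "good", "great", "better");

    static int countPositive(String speech) {
        int count = 0;
        // Lower-case and split on anything that is not a letter, so "safe." counts as "safe".
        for (String token : speech.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String excerpt = "Well, my first job as commander in chief, is to keep the American people safe. "
                + "We need strong leadership. I'd like to be that leader with your support.";
        System.out.println(countPositive(excerpt)); // 3
    }
}
```

Run once on each candidate's lines from the transcript, this gives one positive-word total per candidate, which is the kind of figure compared on the Obama VS Romney slide.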
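Map Reduce code
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompareTwoFiles {
    /** Mapper for the first file: emits (byte offset of the line, line text) unchanged **/
    public static class Map extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }
    /** Mapper for the second file: same logic, registered separately through MultipleInputs **/
    public static class Map2 extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }
    /** Reducer: compares the lines from both files that share the same byte offset **/
    public static class Reduce extends Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        public void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // A list instead of a fixed String[2]: an offset may occur in only one of the files.
            List<String> lines = new ArrayList<>();
            for (Text text : values) {
                lines.add(text.toString());
            }
            // Note: Hadoop does not guarantee which file's line arrives first.
            if (lines.size() == 2 && lines.get(0).equals(lines.get(1))) {
                context.write(key, new Text("same"));
            } else {
                context.write(key, new Text(String.join(" vs ", lines)));
            }
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:8020"); // "fs.default.name" is the deprecated key
        Job job = Job.getInstance(conf); // replaces the deprecated new Job(conf)
        job.setJarByClass(CompareTwoFiles.class);
        job.setJobName("Compare Two Files and Identify the Difference");
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Map.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Map2.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}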
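The job above keys each line by its byte offset (which is what `TextInputFormat` supplies), so two lines are compared only when they start at the same offset in both files. The in-memory sketch below reproduces that behaviour without Hadoop, assuming UTF-8 text with one `\n` per line; the class and method names are hypothetical, and unlike Hadoop it always lists the first file's line first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of CompareTwoFiles: both "mappers" key lines by byte offset,
// and the "reducer" compares the lines that share an offset.
class CompareTwoFilesSketch {
    // Mimics TextInputFormat: key = byte offset where the line starts, value = line text.
    static Map<Long, String> keyByOffset(List<String> lines) {
        Map<Long, String> out = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            out.put(offset, line);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return out;
    }

    static Map<Long, String> compare(List<String> file1, List<String> file2) {
        // Shuffle: gather the values from both inputs under each key.
        Map<Long, List<String>> grouped = new TreeMap<>();
        keyByOffset(file1).forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        keyByOffset(file2).forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        // Reduce: same decision as the Reduce class.
        Map<Long, String> result = new TreeMap<>();
        grouped.forEach((k, lines) -> result.put(k,
                lines.size() == 2 && lines.get(0).equals(lines.get(1)) ? "same" : String.join(" vs ", lines)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compare(List.of("good", "strong"), List.of("good", "weak")));
        // {0=same, 5=strong vs weak}
    }
}
```

If an earlier line differs in length ("good" vs "great"), every later line shifts to a different offset and is reported on its own, so the comparison is only meaningful when the two files are aligned line by line.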
Positive words Output
Chart: Positive word count, Obama VS Romney (Obama: 194, Romney: 153)
Final Interpretation
Our analysis shows that Barack Obama used more positive words than Mitt Romney (194 vs. 153). Obama also went on to win the 2012 election and remain President of the United States of America.
Yes, debates do have an impact on the election results!
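The two counts on the chart can be put in relative terms with a little arithmetic: the difference, the ratio, and each candidate's share of all positive words counted. These are raw counts; the slides do not report how many words each candidate spoke in total, so the sketch below (class name `PositiveShare` is hypothetical) does not normalize for speaking time.

```java
import java.util.Locale;

// Hypothetical helper: difference, ratio and share of the two positive-word counts.
class PositiveShare {
    // Fraction of all counted positive words that belong to "mine".
    static double share(int mine, int other) {
        return (double) mine / (mine + other);
    }

    public static void main(String[] args) {
        int obama = 194, romney = 153; // counts from the Obama VS Romney chart
        System.out.println("Difference: " + (obama - romney));                                  // 41
        System.out.printf(Locale.ROOT, "Ratio: %.2f%n", (double) obama / romney);                // 1.27
        System.out.printf(Locale.ROOT, "Obama share: %.1f%%%n", 100 * share(obama, romney));     // 55.9%
        System.out.printf(Locale.ROOT, "Romney share: %.1f%%%n", 100 * share(romney, obama));    // 44.1%
    }
}
```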
Positive words cloud
THANK YOU
