SlideShare a Scribd company logo
“25th CSI Karnataka Student Convention”


      Map/Reduce Algorithm
Performance Analysis in Computing
          Frequency of
             Tweets

                         Shravanthi U M & Nagashree N
                  Information Science and Engineering
           Bangalore Institute of Technology, Bangalore
AGENDA

   Data
   Big Data
   Twitter and Big Data
   Classical Approach
   Why hadoop Framework
   Map/Reduce
   Our Proposed Approach
   Conclusion
   Q&A
Its all About Data
Big Data

Data sets whose size is beyond the ability of commonly
used software tools to capture, manage, and process the
data within a tolerable elapsed time.
Big data sizes are a constantly moving target currently
ranging from a few dozen terabytes to many petabytes
of data in a single data set.
 Ex : Web logs , Social Network data , Internet Search
Index etc.
“BigData”
Classical Approach
egrep _____ files[0-1000]



     egrep


                                          file0

     egrep                   Remote
                            FileSystem
                                         file1000



     egrep
Hadoop Framework
   Fault tolerance
   Streaming data access - HDFS
    emphasizes high throughput.
   Extreme scalability - HDFS will
    scale to petabytes; Example: at
    Facebook.
   Portability - HDFS is portable
    across operating systems.
   Write once read many times
   Locality of computation -move
    the program near to the data
HDFS
egrep _____ files[0-1000]
Move Computation to Data




       egrep
                                            f0            f_




                            40 nodes/rack
                                                  f_ f_

                                            f_                  file0
                                                          f3
                                                 f1000
       egrep
                                            f2
                                                  f_
                                                               file1000



       egrep
Map/Reduce
            Input                           Input
           Any file              All <key, value> pairs with
     (e.g. documents)              the same key grouped
                                (e.g. all <word, count> pairs
                                    where word = “the”)




          Map()                         Reduce()




                                           Output
          Output
                                          Anything
Stream of <key, value> pairs
                                  (e.g. sum of counts for a
 (e.g. <word, count> pairs)
                                        specific word)
Advantages:
   Fine-grained Map and Reduce tasks
    ◦ Improved load balancing
    ◦ Faster recovery from failed tasks
   Automatic re-execution on failure
    ◦ In a large cluster, some nodes are always slow or flaky
    ◦ Framework re-executes failed tasks
   Locality optimizations
    ◦ Map-Reduce queries HDFS for locations of input data
    ◦ When possible, map tasks are scheduled close to the
      inputs (local access, local rack access, remote rack
      access)
What did we do…
Python code to extract tweets using
“twitter.Search” API
for i in range(10):
      turl=urllib.urlopen("http://search.twitter.com/
      search.atom?lang=en&q="+AnnaHazare+"&rpp=100&
      page="+str(i))

     tweettext=re.findall('<updated>(.*?)</updated>',
                turl.read())
     print "Got the Page No. ",(i+1)
     for i in tweettext:
            tweets.append(i)
                f.write(i+"n")
Extracted DATA
Map/Reduce Impelmentation

               <6/4/11,   1>
               <6/4/11,   1>
               <6/4/11,   1>   Reduce()
               <6/4/11,   1>
               <6/4/11,   1>
                                          Server 1 Final
<6/4/11, 1>                                Result File
<6/4/11, 1>
<6/4/11, 1>
<6/6/11, 1>
               <6/6/11,1>                 6/4/11    85
<6/6/11, 1>
               <6/6/11,1>
               <6/6/11,1>
                               Reduce()   6/6/11    36
<6/6/11, 1>
<15/8/11, 1>
<15/8/11, 1>                              15/8/11   125

               <15/8/11,1>
               <15/8/11,1>
               <15/8/11,1>
                               Reduce()
What’s UNIQUE…

 Business Analytics - Considerable approach to spot
  popularity of “New Product”
 Sentimental Analysis
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequencey of tweets
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequencey of tweets

More Related Content

What's hot

Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
Rupak Roy
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
Rupak Roy
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
ffbase, statistical functions for large datasets
ffbase, statistical functions for large datasetsffbase, statistical functions for large datasets
ffbase, statistical functions for large datasets
Edwin de Jonge
 
Heapify algorithm
Heapify algorithmHeapify algorithm
Heapify algorithm
Sikandar Pandit
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
Rupak Roy
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
Chirag Ahuja
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
Zheng Shao
 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet Format
Yue Chen
 
Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010ragho
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
Rupak Roy
 
One Database To Rule 'em All
One Database To Rule 'em AllOne Database To Rule 'em All
One Database To Rule 'em All
Stefanie Janine Stölting
 
Advanced topics in hive
Advanced topics in hiveAdvanced topics in hive
Advanced topics in hive
Uday Vakalapudi
 
1 Installing & getting started with R
1 Installing & getting started with R1 Installing & getting started with R
1 Installing & getting started with R
naroranisha
 
Apache scoop overview
Apache scoop overviewApache scoop overview
Apache scoop overviewNisanth Simon
 
Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
Fujio Turner
 
Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...
Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...
Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...
CloudxLab
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
Rupak Roy
 
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 

What's hot (20)

Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
ffbase, statistical functions for large datasets
ffbase, statistical functions for large datasetsffbase, statistical functions for large datasets
ffbase, statistical functions for large datasets
 
Heapify algorithm
Heapify algorithmHeapify algorithm
Heapify algorithm
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet Format
 
Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
 
One Database To Rule 'em All
One Database To Rule 'em AllOne Database To Rule 'em All
One Database To Rule 'em All
 
Advanced topics in hive
Advanced topics in hiveAdvanced topics in hive
Advanced topics in hive
 
1 Installing & getting started with R
1 Installing & getting started with R1 Installing & getting started with R
1 Installing & getting started with R
 
Apache scoop overview
Apache scoop overviewApache scoop overview
Apache scoop overview
 
Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
 
Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...
Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...
Apache Spark - Loading & Saving data | Big Data Hadoop Spark Tutorial | Cloud...
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
 

Viewers also liked

2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
anh tuan
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
Adrian Florea
 
Hive
HiveHive
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
VIKAS KATARE
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large Clusters
Ashraf Uddin
 
Introduction to Hive for Hadoop
Introduction to Hive for HadoopIntroduction to Hive for Hadoop
Introduction to Hive for Hadoop
ryanlecompte
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingMohammad Mustaqeem
 

Viewers also liked (7)

2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
Hive
HiveHive
Hive
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large Clusters
 
Introduction to Hive for Hadoop
Introduction to Hive for HadoopIntroduction to Hive for Hadoop
Introduction to Hive for Hadoop
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 

Similar to CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequencey of tweets

Hadoop
HadoopHadoop
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
Chester Chen
 
An introduction to Hadoop for large scale data analysis
An introduction to Hadoop for large scale data analysisAn introduction to Hadoop for large scale data analysis
An introduction to Hadoop for large scale data analysisAbhijit Sharma
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & ZingLong Dao
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
Amund Tveit
 
Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5
The HDF-EOS Tools and Information Center
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Desing Pathshala
 
Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache Hadoop
Svetlin Nakov
 
Zenith it-hadoop-training
Zenith it-hadoop-trainingZenith it-hadoop-training
Zenith it-hadoop-training
Zenith It Solutions
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
Jim Dowling
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zingzingopen
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Euangelos Linardos
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
Scott Leberknight
 
Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)
Cloudera, Inc.
 
Easy R
Easy REasy R
Easy R
Ajay Ohri
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
Wisely chen
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoopjeffturner
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
Gert Drapers
 

Similar to CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequencey of tweets (20)

Hadoop
HadoopHadoop
Hadoop
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
An introduction to Hadoop for large scale data analysis
An introduction to Hadoop for large scale data analysisAn introduction to Hadoop for large scale data analysis
An introduction to Hadoop for large scale data analysis
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
 
Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache Hadoop
 
Zenith it-hadoop-training
Zenith it-hadoop-trainingZenith it-hadoop-training
Zenith it-hadoop-training
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
 
Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)
 
Easy R
Easy REasy R
Easy R
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 

Recently uploaded

CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
gb193092
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 

Recently uploaded (20)

CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 

CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequencey of tweets

  • 1. “25th CSI Karnataka Student Convention” Map/Reduce Algorithm Performance Analysis in Computing Frequency of Tweets Shravanthi U M & Nagashree N Information Science and Engineering Bangalore Institute of Technology, Bangalore
  • 2. AGENDA  Data  Big Data  Twitter and Big Data  Classical Approach  Why hadoop Framework  Map/Reduce  Our Proposed Approach  Conclusion  Q&A
  • 4. Big Data Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set. Ex : Web logs , Social Network data , Internet Search Index etc.
  • 6. Classical Approach egrep _____ files[0-1000] egrep file0 egrep Remote FileSystem file1000 egrep
  • 7. Hadoop Framework  Fault tolerance  Streaming data access - HDFS emphasizes high throughput.  Extreme scalability - HDFS will scale to petabytes; Example: at Facebook.  Portability - HDFS is portable across operating systems.  Write once read many times  Locality of computation -move the program near to the data
  • 8. HDFS egrep _____ files[0-1000] Move Computation to Data egrep f0 f_ 40 nodes/rack f_ f_ f_ file0 f3 f1000 egrep f2 f_ file1000 egrep
  • 9. Map/Reduce Input Input Any file All <key, value> pairs with (e.g. documents) the same key grouped (e.g. all <word, count> pairs where word = “the”) Map() Reduce() Output Output Anything Stream of <key, value> pairs (e.g. sum of counts for a (e.g. <word, count> pairs) specific word)
  • 10. Advantages:  Fine-grained Map and Reduce tasks ◦ Improved load balancing ◦ Faster recovery from failed tasks  Automatic re-execution on failure ◦ In a large cluster, some nodes are always slow or flaky ◦ Framework re-executes failed tasks  Locality optimizations ◦ Map-Reduce queries HDFS for locations of input data ◦ When possible, map tasks are scheduled close to the inputs (local access, local rack access, remote rack access)
  • 11. What did we do… Python code to extract tweets using “twitter.Search” API for i in range(10): turl=urllib.urlopen("http://search.twitter.com/ search.atom?lang=en&q="+AnnaHazare+"&rpp=100& page="+str(i)) tweettext=re.findall('<updated>(.*?)</updated>', turl.read()) print "Got the Page No. ",(i+1) for i in tweettext: tweets.append(i) f.write(i+"n")
  • 13. Map/Reduce Impelmentation <6/4/11, 1> <6/4/11, 1> <6/4/11, 1> Reduce() <6/4/11, 1> <6/4/11, 1> Server 1 Final <6/4/11, 1> Result File <6/4/11, 1> <6/4/11, 1> <6/6/11, 1> <6/6/11,1> 6/4/11 85 <6/6/11, 1> <6/6/11,1> <6/6/11,1> Reduce() 6/6/11 36 <6/6/11, 1> <15/8/11, 1> <15/8/11, 1> 15/8/11 125 <15/8/11,1> <15/8/11,1> <15/8/11,1> Reduce()
  • 14. What’s UNIQUE…  Business Analytics - Considerable approach to spot popularity of “New Product”  Sentimental Analysis