Where does Hadoop come in handy?
  praveensripati@gmail.com

  www.thecloudavenue.com

      @praveensripati
Agenda


What's Big Data?

According to Wikipedia (http://en.wikipedia.org/wiki/Big_data), the definition of Big Data is:

  "In information technology, Big Data is a collection of data sets so large and complex that
  it becomes difficult to process using on-hand database management tools."

Hadoop acting like a kernel
Workload distribution across installations
  Hive and Pig play an important role in the Hadoop ecosystem




http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/
Different Big Data scenarios
Scenario                      Is Hadoop good for it?   What are the alternatives?
Real-time processing          No                       HStreaming, Twitter Storm
Iterative processing          No                       Apache Hama, Apache Giraph, JUNG
Ad hoc interactive querying   No                       Apache Drill, Open Dremel
Batch processing              Yes
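
The batch row above is where the MapReduce model fits. The following is a minimal single-process sketch of that model in plain Python, not Hadoop's actual API. The three-field log format, the sample lines, and every function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical access-log lines in the form "<client> <path> <status>".
LOG_LINES = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
    "malformed",
]


def map_phase(line):
    """Emit one (status, 1) pair per well-formed line; skip anything else."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values collected for one key into a single count."""
    return key, sum(values)


def run_batch(lines):
    """Run map, shuffle and reduce over the whole input in one pass."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_batch(LOG_LINES))
```

The whole input is read before any result is produced, which is why this style suits batch work and not the real-time or interactive rows of the table.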
How have Big Data frameworks evolved?

There has been a 4-5 year gap between Google releasing a paper and us seeing an
implementation of it.

Google Paper                                   Apache Component
The Google File System (October, 2003)         HDFS (became Apache TLP in 2008)
MapReduce: Simplified Data Processing          MapReduce (became Apache TLP in 2008)
on Large Clusters (December, 2004)
Bigtable: A Distributed Storage System for     HBase (became Apache TLP in 2010),
Structured Data (November, 2006)               Cassandra (became Apache TLP in 2010)
Large-scale graph computing at Google          Hama, Giraph (became Apache TLP in 2012)
(June, 2009)
Dremel: Interactive Analysis of Web-Scale      Apache Drill (incubated in August, 2012)
Datasets (2010)
Spanner: Google's Globally-Distributed         ????
Database (September, 2012)
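
The gap can be worked out directly from the years in the table. This is a rough sketch that uses year-level granularity only (months are ignored) and skips the Spanner row, which has no Apache counterpart listed. The names `TIMELINE` and `gap_years` are illustrative.

```python
# (paper as named on the slide, paper year, Apache component, milestone year)
# All years are copied from the table above.
TIMELINE = [
    ("The Google File System", 2003, "HDFS", 2008),
    ("MapReduce", 2004, "MapReduce", 2008),
    ("Bigtable", 2006, "HBase, Cassandra", 2010),
    ("Large-scale graph computing at Google", 2009, "Hama, Giraph", 2012),
    ("Dremel", 2010, "Apache Drill", 2012),
]


def gap_years(timeline):
    """Return (component, years between paper and Apache milestone) pairs."""
    return [
        (component, milestone - published)
        for _paper, published, component, milestone in timeline
    ]


if __name__ == "__main__":
    for component, gap in gap_years(TIMELINE):
        print(f"{component}: {gap} years")
```

On these year-level figures the first three rows fall in the 4-5 year range, while the graph-computing and Dremel rows come out shorter.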
What happens to the data once it
          is stored?
          If you aren’t taking advantage of big data,
                 then you don’t have big data,
                  you have just a pile of data.


Descriptive analytics               Predictive and Prescriptive analytics

       - What happened?                   - Why did it happen?
       - When did it happen?              - When will it happen again?
       - What was its impact?             - What caused it to happen?
                                          - What can be done to avoid it?
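
To make the two columns concrete, here is a small sketch that contrasts them on a made-up series of daily event counts. The descriptive part summarizes what already happened; the predictive part fits a least-squares line and extrapolates one step ahead. The data and all function names are assumptions, and a straight-line fit is only one of many possible predictive techniques.

```python
# Hypothetical daily event counts; index 0 is the first day.
DAILY_COUNTS = [10, 12, 14, 16]


def describe(counts):
    """Descriptive: summarize what happened and when the peak occurred."""
    return {
        "total": sum(counts),
        "mean": sum(counts) / len(counts),
        "peak_day": counts.index(max(counts)),
    }


def fit_line(counts):
    """Ordinary least-squares fit of count = slope * day + intercept."""
    n = len(counts)
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next(counts):
    """Predictive: extrapolate the fitted line to the next day."""
    slope, intercept = fit_line(counts)
    return slope * len(counts) + intercept


if __name__ == "__main__":
    print(describe(DAILY_COUNTS))
    print(predict_next(DAILY_COUNTS))
```

`fit_line` needs at least two data points, since a single point gives a zero denominator.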
Evolution of Big Data use cases
          Hadoop evolved out of Yahoo and Google, Web 2.0 companies
          with massive text processing requirements such as:

                                - log processing
                                 - search index
                              - recommendations
                          - context based advertising

Ads & E-commerce, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation,
                       Spatial Data Processing, Information Extraction and Text Processing,
Artificial Intelligence/Machine Learning/Data Mining, Search Query Analysis, Information Retrieval (Search),
                      Spam & Malware Detection, Image and Video Processing, Networking,
                          Simulation, Statistics, Numerical Mathematics, Sets & Graphs


http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
A few Big Data use cases
  - The World Bank kicked off an initiative to improve sanitation and water that would impact 1B people.
  - Neural networks for breast cancer (prize by Google).
  - Fraud detection in the financial industry.
  - Predictive maintenance scheduling (for example, aircraft engines).
  - Walmart and Sears Holdings use POS information to stock different products in their stores and also for SCM.
  - Customer profiling and segmentation for targeted campaigns.

Follow the competitions on Kaggle for more use cases.
Democratization of Education
    https://www.coursera.org/

    http://www.udacity.com/

    http://www.khanacademy.org/

    http://www.youtube.com/user/nptelhrd/

    https://www.edx.org/




    Machine Learning to Music
Keep Looking Out




There is a lot more than Hadoop; some of these frameworks are mature
                 and some are still evolving!
Q&A