Where does Hadoop
  come in handy?
  praveensripati@gmail.com

  www.thecloudavenue.com

      @praveensripati
Agenda


What's Big Data?

According to Wikipedia (http://en.wikipedia.org/wiki/Big_data), Big Data is defined as follows:

"In information technology, Big Data is a collection of data sets so large and complex that
        it becomes difficult to process using on-hand database management tools."




Hadoop acting like a kernel
Workload distribution across installations

Hive and Pig play an important role in the Hadoop ecosystem




http://www.cloudera.com/blog/2012/09/what-do-real-life-hadoop-workloads-look-like/
Different Big Data scenarios
Scenario                      Is Hadoop good for it?   What are the alternatives?
Real time processing          No                       HStreaming, Twitter Storm
Iterative Processing          No                       Apache Hama, Apache Giraph, Jung
Ad hoc interactive querying   No                       Apache Drill, Open Dremel
Batch Processing              Yes
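
To make the "Batch Processing" row concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce phases that the MapReduce model applies to a whole data set in one pass. It does not use Hadoop itself; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`), the log format, and the sample lines are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical web-server log lines in the form "<ip> <HTTP status> <path>".
LOG_LINES = [
    "10.0.0.1 200 /index.html",
    "10.0.0.2 404 /missing",
    "10.0.0.1 200 /about.html",
    "10.0.0.3 500 /checkout",
    "10.0.0.2 200 /index.html",
]


def map_phase(line):
    """Emit one (status, 1) pair for a single log line."""
    _ip, status, _path = line.split()
    yield status, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over the complete input in one batch."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


status_counts = run_job(LOG_LINES)
print(status_counts)
```

The job only produces an answer after every input line has been read, which is why this model suits batch work and is a poor fit for the real time, iterative, and interactive scenarios in the table.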
How have Big Data frameworks
             evolved?

There has been a 4-5 year gap between Google releasing a paper and us seeing an implementation of it.

Google Paper                                                                  Apache Component
The Google File System (October, 2003)                                        HDFS (2008 became Apache TLP)
MapReduce: Simplified Data Processing on Large Clusters (December, 2004)      MapReduce (2008 became Apache TLP)
Bigtable: A Distributed Storage System for Structured Data (November, 2006)   HBase (2010 became Apache TLP), Cassandra (2010 became Apache TLP)
Large-scale graph computing at Google (June, 2009)                            Hama, Giraph (2012 became Apache TLP)
Dremel: Interactive Analysis of Web-Scale Datasets (2010)                     Apache Drill (Incubated in August, 2012)
Spanner: Google's Globally-Distributed Database (September, 2012)             ????
What happens to the data once it
          is stored?
          If you aren’t taking advantage of big data,
                 then you don’t have big data,
                  you have just a pile of data.


Descriptive analytics               Predictive and Prescriptive analytics

       - What happened?                   - Why did it happen?
       - When did it happen?              - When will it happen again?
       - What was its impact?             - What caused it to happen?
                                          - What can be done to avoid it?
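
As a rough illustration of the two columns above, the sketch below first summarises past data (descriptive) and then fits a straight line to it to estimate a future value (predictive). The daily counts are invented sample data, and the helper names `describe`, `fit_line`, and `predict` are hypothetical; a real analysis would use a far richer model than a least-squares line.

```python
# Hypothetical daily counts of failed transactions (illustrative data only).
daily_failures = [12, 15, 14, 18, 21, 20, 24]


def describe(values):
    """Descriptive analytics: summarise what has already happened."""
    return {
        "total": sum(values),
        "peak_day": values.index(max(values)),  # index of the first worst day
        "mean": sum(values) / len(values),
    }


def fit_line(values):
    """Least-squares fit of y = slope * x + intercept, with x as the day index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(values, day):
    """Predictive analytics: extrapolate the fitted trend to a future day index."""
    slope, intercept = fit_line(values)
    return slope * day + intercept


summary = describe(daily_failures)
forecast = predict(daily_failures, len(daily_failures))  # estimate for the next day
print(summary, forecast)
```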
Evolution of Big Data use cases
          Hadoop evolved out of Yahoo and Google,
         both Web 2.0 companies, to meet their massive
                text processing requirements, such as

                                - log processing
                                 - search index
                              - recommendations
                          - context based advertising

Ads & E-commerce, Astronomy, Social Networks, Bioinformatics/Medical Informatics, Machine Translation,
                       Spatial Data Processing, Information Extraction and Text Processing,
Artificial Intelligence/Machine Learning/Data Mining, Search Query Analysis, Information Retrieval (Search),
                      Spam & Malware Detection, Image and Video Processing, Networking,
                          Simulation, Statistics, Numerical Mathematics, Sets & Graphs


http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/
A few Big Data use cases
  The World Bank kicked off an initiative to improve
  sanitation and water that would impact 1B people.
  Neural Networks for Breast Cancer prize by Google.
  Fraud detection in the financial industry.
  Predictive maintenance scheduling (for example,
  aircraft engines).
  Walmart and Sears Holdings use POS information to
  stock different products in their stores and also for
  SCM.
  Customer profiling and segmentation for targeted
  campaigns.

Follow the competitions on Kaggle for more use cases.
Democratization of Education
    https://www.coursera.org/

    http://www.udacity.com/

    http://www.khanacademy.org/

    http://www.youtube.com/user/nptelhrd/

    https://www.edx.org/




Machine Learning to Music
Keep Looking Out




There is a lot more out there than Hadoop: some frameworks are mature
                 and some are still evolving!
Q&A