What is the Point of Hadoop

The flexibility of Apache Hadoop is one of its biggest assets – enabling businesses to generate value from data that was previously considered too expensive to be stored and processed in traditional databases – but also results in Hadoop meaning different things to different people. In this session 451 Research’s Matt Aslett will explore the impact that Hadoop is having on the traditional data processing landscape, examining the expanding ecosystem of vendors and their relationships with Apache Hadoop, investigating the increasing variety of Hadoop use-cases, and exploring adoption trends around the world.

Published in: Technology

Transcript of "What is the Point of Hadoop"

  1. What is the point of Hadoop? Matthew Aslett, Research Director, 451 Research. © 2013 by The 451 Group. All rights reserved
  2. Matthew Aslett, Research Director, Data Management and Analytics. matthew.aslett@451research.com, www.twitter.com/maslett
     • Responsible for data management and analytics research agenda
     • Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop
     • With 451 Research since 2007
  3. 451 Research:
     • Unique combination of research, analysis & data
     • Emerging tech market segment focus
     • Daily qualitative & quantitative insight
     • Analyst advisory & go-to-market support
     • Global events
  4. Company Overview
     • One company with 3 operating divisions
     • Syndicated research, advisory, professional services, datacenter certification, and events
     • Global focus
     • 200+ staff
     • 1,300+ client organizations: enterprises, vendors, service providers, and investment firms
     • Organic and growth through acquisition
  5. What is the point of Hadoop?
     • Hadoop’s greatest asset is its flexibility: it can be used for multiple roles and use-cases
     • But that is also a challenge, and can lead to confusion and disillusionment
     • Each user and vendor has their own perspective on Hadoop’s role
  6. The Blind Men and the Elephant: “It was six men of Indostan / To learning much inclined, / Who went to see the Elephant / (Though all of them were blind), / That each by observation / Might satisfy his mind.” John Godfrey Saxe (1872)
  7. The Blind Men and the Elephant: “After Hadoop finishes filtering the data, the place you want to put that data is in Oracle Database.” Larry Ellison (2011)
  8. Oracle Big Data Appliance
     • Components: Apache Hadoop, NoSQL Database, Oracle Tools, Data Integrator for Oracle Database, Data Loader, R distribution
     • Paired with: Oracle Database
     • Roles shown: big data processing/integration, big data analytics
  9. What is the point of Hadoop? Three roles: big data storage, big data processing/integration, big data analytics
  10. Big Data: “Big data” - the realization of greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies due to the three Vs:
     • Volume: The volume of data is too large for traditional database software tools to cope with
     • Velocity: The data is being produced at a rate that is beyond the performance limits of traditional systems
     • Variety: The data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses
  11. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
     • Inspired by ‘Total Football’, a new approach to soccer that emerged in the late 1960s, in Amsterdam
     • Total Data is making the most efficient use of existing and new data management resources to deliver value
     • Not another name for Big Data: if your data is big, the way you manage it should be total
  12. Big Data and Total Data
     • Big Data: The growing volume, velocity and variety of data
     • Big Data Technologies: New technologies being adopted to store and process that data
     • Total Data: The user trends driving the adoption of Big Data Technologies to store and process Big Data and the management alongside existing data management technologies
     • [Diagram: Big Data, Big Data Technology, Total Data]
  13. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
     • Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
  14. Totality (roles: big data storage, big data processing/integration)
     • Prior to adopting Hadoop, only had transactional and summarized non-transactional data stored in its EDW
     • The vast majority of its log data was discarded as not valuable enough to be efficiently processed in an enterprise data warehouse
     • Now using Hadoop to process hundreds of GBs of log data produced by the millions of searches and transactions performed on its site each day
     • Creating data exports to R, and aggregating data to its existing data warehouse for analysis
  15. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
     • Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
     • Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
  16. Exploration
     • Traditional data warehouses: schema on write [Diagram: Application, Schema, RDBMS, SQL]
     • Hadoop: schema on read [Diagram: Application, Hadoop, Schema, MapReduce]
  17. Exploration (roles: big data storage, big data processing/integration, big data analytics)
     • The company wanted to perform analysis on customer data in order to create geo-targeted advertising
     • The required data was already present in its data warehouse but was modeled in a way that would not allow Orbitz to efficiently process the query
     • Extracting the data into Hadoop enabled the company to query it in a way the data warehouse was never designed for
  18. Hadoop adoption process (roles: big data storage, big data processing/integration, big data analytics)
     • Google File System research paper published: 2003
     • Google MapReduce research paper published: 2004
     • Google Dremel research paper published: 2010
     • Google Tenzing research paper published: 2011
     • [Diagram: storage, processing and analytics plotted against innovators and early adopters. Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png, licensed under the Creative Commons Attribution 2.5 License.]
  19. Crossing the Chasm
     • Hadoop as (just) a low cost storage option is not fulfilling its potential
     • Processing and integration is not the complete picture
     • Hadoop-based analytics unlocks the value of previously ignored data
     • Attempting to fast forward to analytics, missing out the processing/integration stage, creates silos and will result in disillusionment
     • [Diagram: storage, processing and analytics plotted against innovators, early adopters, early majority, late majority and laggards. Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png, licensed under the Creative Commons Attribution 2.5 License.]
  20. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
     • Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
     • Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
     • Frequency: The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
  21. Frequency
     • Formerly AT&T Advertising Solutions and AT&T Interactive
     • Faced with increasing volume of traffic through distribution network
     • Wanted to provide intra-day reporting, but faced days of report-lag due to loading multiple databases
     • Moved data processing to Hadoop, enabling the creation of a single common data layer for all applications
     • Report-lag reduced to hours, rather than days
     • New insight enabled by more frequent analysis and being able to process all the data
  22. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
     • Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
     • Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
     • Frequency: The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
     • Dependency: The reliance on existing technologies and skills, and the need to balance investment in those existing technologies and skills with the adoption of new techniques.
  23. SQL meets Hadoop
     • SQL on Hadoop: Hive (Project Stinger, Apache Tez (proposed)); Impala (Cloudera Enterprise RTQ); Apache Drill (incubating); Phoenix project (for HBase); Lingual (for Cascading and Hadoop)
     • RDBMS and Hadoop co-processing: Hadapt Adaptive Analytic Platform; Teradata Aster SQL-H; Rainstor Big Data Analytics on Hadoop; EMC Greenplum HAWQ; Microsoft PolyBase; Citus Data CitusDB; IBM Big SQL
     • Operational SQL on Hadoop: Drawn to Scale (Spire); Splice Machine (Splice SQL Engine)
  24. Crossing the Chasm
     • Project maturity
     • Vendor ecosystem
     • Mainstream interest
     • Geographic adoption
     • [Diagram: storage, processing and analytics plotted against innovators, early adopters, early majority, late majority and laggards. Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png, licensed under the Creative Commons Attribution 2.5 License.]
  25. Project maturity [Figure: Feb 2006 compared with Dec 2012]
  26. Vendor ecosystem: contributors by lines of code, by current employer
     • Hadoop core (70+ different companies, 200+ individuals): Hortonworks 37%, Cloudera 15%, Yahoo! 12%, Facebook 7%, the rest 29%
     • All Hadoop projects (120+ different companies, 750+ individuals): Hortonworks 27%, Yahoo 16%, Cloudera 15%, Facebook 11%, the rest 31%
  27. Vendor ecosystem: all Hadoop projects, contributors by lines of code, by current employer and contributor type
     • Hadoop vendors 51%
     • Users 38%
     • Other vendors 6%
     • Unknown/individuals 4%
     • Academia 1%
  28. Mainstream interest. Source: Indeed.com, February 2013
  29. Mainstream interest. Source: Indeed.com, February 2013
  30. Largest employers of Hadoop skills, by % of total LinkedIn profiles mentioning Hadoop (chart axis: 0.0 to 3.5), in chart order: Yahoo, Microsoft, Google, eBay, Amazon, IBM, LinkedIn, Oracle, EMC, Cisco, Cloudera. Source: LinkedIn, August 2012
  31. Largest employers of Hadoop skills, by % of total LinkedIn profiles mentioning Hadoop (chart axis: 0.0 to 3.5), in chart order: Yahoo, Microsoft, Google, Amazon, IBM, eBay, Oracle, LinkedIn, Tata, HP, Cisco. Source: LinkedIn, February 2013
  32. Geographic adoption (LinkedIn search result, December 2011): Bay area 28.2%, India 9.7%, NYC 4.8%, Seattle 3.7%, China 3.6%, DC 3.5%, UK 3.0%, LA 3.0%
  33. Geographic adoption (LinkedIn search result, August 2012): Bay area 24.9%, India 11.2%, NYC 4.7%, China 4.4%, Seattle 3.9%, UK 3.4%, DC 3.1%, LA 2.8%
  34. Geographic adoption (LinkedIn search result, February 2013): Bay area 22.9%, India 13.5%, China 4.8%, NYC 4.6%, Seattle 3.9%, UK 3.4%, DC 3.1%, LA 2.7%
  35. Geographic adoption (LinkedIn search result, USA vs. rest of world)
     • December 2011: total 9,079 (USA 64.4%, ROW 35.6%)
     • August 2012: total 22,178 (USA 60.4%, ROW 39.6%)
     • February 2013: total 38,049 (USA 58.3%, ROW 41.7%)
  36. Conclusions
     • Hadoop’s greatest asset is its flexibility, but that is also a challenge, and can lead to confusion and disillusionment among later adopters
     • Hadoop is enabling greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies
     • Storage, processing, and analyzing of data is a process that has enabled early adopters to understand Hadoop’s role in the wider landscape
     • Attempting to fast forward to analytics, missing out the processing/integration stage, creates silos and will result in disillusionment
     • The Hadoop ecosystem is vibrant, with strength in depth, and breadth
     • Growing mainstream interest and geographic adoption means Hadoop is well-positioned to cross the chasm into mainstream adoption
  37. Questions? Comments?
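
The "Exploration" slide contrasts schema on write (traditional data warehouses) with schema on read (Hadoop). The difference can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop or RDBMS code: the sample events, the field names, and the helpers `write_with_schema` and `read_with_schema` are all hypothetical.

```python
import json

# Hypothetical raw events; the field names are illustrative assumptions.
RAW_EVENTS = [
    '{"user": "a", "city": "Chicago", "clicks": 3}',
    '{"user": "b", "city": "Boston", "clicks": 5, "referrer": "search"}',
    '{"user": "c", "clicks": 2}',
]

# Schema on write: the schema is fixed before any data is loaded.
WAREHOUSE_SCHEMA = ("user", "city", "clicks")


def write_with_schema(raw_lines, schema):
    """Load records into a fixed table; reject records missing a field
    and silently drop any field the schema does not name."""
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            table.append(tuple(record[field] for field in schema))
        else:
            rejected.append(line)
    return table, rejected


def read_with_schema(raw_lines, fields, default=None):
    """Schema on read: leave the raw lines untouched and choose the
    fields only when a query is run."""
    for line in raw_lines:
        record = json.loads(line)
        yield tuple(record.get(field, default) for field in fields)


# The warehouse-style load loses the "referrer" field and one record.
table, rejected = write_with_schema(RAW_EVENTS, WAREHOUSE_SCHEMA)

# A later question about referrers can still be asked of the raw data.
referrers = list(read_with_schema(RAW_EVENTS, ("user", "referrer")))
```

In this sketch the question about referrers was not anticipated when `WAREHOUSE_SCHEMA` was defined, so only the schema-on-read path can answer it, which mirrors the Orbitz example of querying data in a way the data warehouse was never designed for.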

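The "Totality" case study describes using Hadoop to process all of a site's log data and then aggregating the results into an existing data warehouse. A rough sketch of that map, shuffle and reduce pattern follows, again in plain Python rather than on a Hadoop cluster. The log format (`date event item`) and the function names are assumptions made for illustration and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical log lines in an assumed "date event item" format.
LOG_LINES = [
    "2013-02-01 search hotels",
    "2013-02-01 search flights",
    "2013-02-01 booking hotels",
    "2013-02-02 search hotels",
    "malformed line",
]


def map_line(line):
    """Map step: emit ((date, event), 1) for each well-formed line."""
    parts = line.split()
    if len(parts) != 3:
        return []  # skip lines that do not match the assumed format
    date, event, _item = parts
    return [((date, event), 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce over every line, not a sample."""
    pairs = [pair for line in lines for pair in map_line(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


# Small per-day aggregates like these are what would be exported
# to a data warehouse or to R for further analysis.
daily_counts = run_job(LOG_LINES)
```

The point of the sketch is that every line is read and only the compact aggregate leaves the job, which is the division of labour the case study describes between Hadoop and the existing warehouse.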