What is the Point of Hadoop
 

The flexibility of Apache Hadoop is one of its biggest assets – enabling businesses to generate value from data that was previously considered too expensive to be stored and processed in traditional databases – but also results in Hadoop meaning different things to different people. In this session 451 Research’s Matt Aslett will explore the impact that Hadoop is having on the traditional data processing landscape, examining the expanding ecosystem of vendors and their relationships with Apache Hadoop, investigating the increasing variety of Hadoop use-cases, and exploring adoption trends around the world.

    What is the Point of Hadoop: Presentation Transcript

    • What is the point of Hadoop? Matthew Aslett, Research Director, 451 Research. © 2013 by The 451 Group. All rights reserved.
    • Matthew Aslett: Research Director, Data Management and Analytics (matthew.aslett@451research.com, www.twitter.com/maslett). Responsible for the data management and analytics research agenda. Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop. With 451 Research since 2007.
    • A unique combination of research, analysis and data: emerging tech market segment focus; daily qualitative and quantitative insight; analyst advisory and go-to-market support; global events.
    • Company overview: one company with 3 operating divisions, covering syndicated research, advisory, professional services, datacenter certification, and events. 200+ staff. 1,300+ client organizations: enterprises, vendors, service providers, and investment firms. Global focus. Organic growth and growth through acquisition.
    • What is the point of Hadoop? Hadoop’s greatest asset is its flexibility: it can be used for multiple roles and use-cases. But that is also a challenge, and can lead to confusion and disillusionment. Each user and vendor has their own perspective on Hadoop’s role.
    • The Blind Men and the Elephant: “It was six men of Indostan / To learning much inclined, / Who went to see the Elephant / (Though all of them were blind), / That each by observation / Might satisfy his mind.” John Godfrey Saxe (1872)
    • The Blind Men and the Elephant: “After Hadoop finishes filtering the data, the place you want to put that data is in Oracle Database.” Larry Ellison (2011)
    • Oracle Big Data Appliance. Components shown: Apache Hadoop, NoSQL Database, Data Integrator, Data Loader, R distribution, Oracle Tools, and Oracle Database. Roles: big data processing/integration; big data analytics.
    • What is the point of Hadoop? Big data storage; big data processing/integration; big data analytics.
    • Big Data. “Big data”: the realization of greater business intelligence by storing, processing and analyzing data that was previously ignored because of the limitations of traditional data management technologies. Those limitations arise from the three Vs. Volume: the volume of data is too large for traditional database software tools to cope with. Velocity: the data is being produced at a rate that is beyond the performance limits of traditional systems. Variety: the data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses.
    • Total Data. The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Inspired by ‘Total Football’, a new approach to soccer that emerged in Amsterdam in the late 1960s, Total Data is about making the most efficient use of existing and new data management resources to deliver value. It is not another name for Big Data: if your data is big, the way you manage it should be total.
    • Big Data and Total Data. Big Data: the growing volume, velocity and variety of data. Big Data Technologies: new technologies being adopted to store and process that data. Total Data: the user trends driving the adoption of Big Data Technologies to store and process Big Data, and the management of that data alongside existing data management technologies. (Diagram labels: Big Data, Big Data Technology, Total Data, Volume, DQ, Predictive analytics.)
    • Total Data. The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Totality: the desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
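The difference between totality and sampling can be illustrated with a small, contrived Python sketch. The dataset, the rare "fraud" label and the 1-in-100 systematic sample are all hypothetical. The point is only that extrapolating from a sample can miss rare records or badly over-count them, while a full scan counts them exactly.

```python
def full_count(records, predicate):
    """Totality: scan every record and count the matches exactly."""
    return sum(1 for record in records if predicate(record))


def sampled_estimate(records, predicate, step):
    """Sampling: inspect every `step`-th record, then scale the hit count up by `step`."""
    hits = sum(1 for record in records[::step] if predicate(record))
    return hits * step


def is_fraud(record):
    """Hypothetical predicate for a rare event."""
    return record == "fraud"


# Hypothetical dataset: 10,000 events, six of which are rare "fraud" events.
records = ["ok"] * 10_000
for index in (7, 2_013, 3_000, 4_551, 6_089, 9_999):
    records[index] = "fraud"

print(full_count(records, is_fraud))             # 6
print(sampled_estimate(records, is_fraud, 100))  # 100: only index 3,000 falls in the sample
```

With a different sample offset, the same estimate could just as easily be 0. Scanning everything removes that sampling error, which is the motivation the Totality slide describes.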
    • Totality (big data storage; big data processing/integration). Prior to adopting Hadoop, the company stored only transactional and summarized non-transactional data in its EDW, and the vast majority of its log data was discarded as not valuable enough to be processed efficiently in an enterprise data warehouse. It now uses Hadoop to process hundreds of GBs of log data produced each day by the millions of searches and transactions performed on its site, creating data exports to R and aggregating data into its existing data warehouse for analysis.
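The log-processing pattern on this slide follows the MapReduce model: map raw log lines to key/value pairs, group them by key, then reduce each group to a summary row that could be loaded into the warehouse. Below is a minimal in-process Python sketch of that model. It does not use Hadoop's Java API, and the tab-separated log format and event names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit (event_type, 1) for each well-formed log line; skip malformed lines."""
    for line in lines:
        # Hypothetical format: "<timestamp>\t<event_type>\t<detail>"
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:
            yield fields[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the values for each key, giving one summary row per event type."""
    return {key: sum(values) for key, values in groups.items()}


logs = [
    "2013-02-01T10:00:00\tsearch\thotels paris",
    "2013-02-01T10:00:01\tsearch\tflights nyc",
    "2013-02-01T10:00:02\tbooking\thotel 42",
    "garbage line",
    "2013-02-01T10:00:03\tsearch\tcars la",
]

summary = reduce_phase(shuffle(map_phase(logs)))
print(summary)  # {'search': 3, 'booking': 1}
```

In a real cluster, the map and reduce functions run in parallel across many nodes and the framework performs the shuffle. The small `summary` dictionary stands in for the aggregated rows that the slide describes sending on to the existing data warehouse.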
    • Total Data. The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Totality: the desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results. Exploration: the interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
    • Exploration. Traditional data warehouses use schema on write (diagram: Application, Schema, RDBMS, SQL). Hadoop uses schema on read (diagram: Application, Hadoop, Schema, MapReduce).
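The slide contrasts two approaches. With schema on write, the schema is applied as data is loaded into the RDBMS and the data is then queried with SQL. With schema on read, the data is stored as-is in Hadoop and the processing job applies a schema when it reads the data. Below is a minimal plain-Python sketch of that difference. The comma-separated record layout, the `Booking` type and the `parse_device` parser are hypothetical, and the code uses no Hadoop, Hive or MapReduce API.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Optional


@dataclass
class Booking:
    """The fixed schema a warehouse table would enforce (hypothetical)."""
    customer_id: int
    city: str
    amount: float


def load_schema_on_write(raw_lines: Iterable[str]) -> tuple[list[Booking], list[str]]:
    """Schema on write: parse and validate every row at load time.

    Rows that do not fit the fixed schema are rejected before they are stored.
    """
    table: list[Booking] = []
    rejected: list[str] = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) != 3:
            rejected.append(line)
            continue
        try:
            table.append(Booking(int(parts[0]), parts[1], float(parts[2])))
        except ValueError:
            rejected.append(line)
    return table, rejected


def query_schema_on_read(raw_lines: Iterable[str],
                         parse: Callable[[str], Optional[dict]]) -> Iterator[dict]:
    """Schema on read: raw lines are kept as-is; each query brings its own parser."""
    for line in raw_lines:
        record = parse(line)
        if record is not None:
            yield record


def parse_device(line: str) -> Optional[dict]:
    """Hypothetical parser written later for a new question: which bookings came from mobile?"""
    parts = line.split(",")
    if len(parts) >= 4:
        return {"customer_id": parts[0], "device": parts[3]}
    return None


raw = [
    "101,Paris,250.0",
    "102,Boston,99.5",
    "103,Paris,n/a",
    "104,Chicago,80,mobile",
]

table, rejected = load_schema_on_write(raw)
print(len(table), rejected)  # 2 ['103,Paris,n/a', '104,Chicago,80,mobile']

mobile = list(query_schema_on_read(raw, parse_device))
print(mobile)  # [{'customer_id': '104', 'device': 'mobile'}]
```

Under schema on write, the extra field on row 104 never reaches the table, so a later question about devices cannot be answered without reloading. Under schema on read, a new parser answers it directly from the raw lines, which is the kind of query-driven schema the Exploration slide refers to.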
    • Exploration (big data storage; big data processing/integration; big data analytics). Orbitz wanted to perform analysis on customer data in order to create geo-targeted advertising. The required data was already present in its data warehouse, but was modeled in a way that would not allow Orbitz to process the query efficiently. Extracting the data into Hadoop enabled the company to query it in a way the data warehouse was never designed for.
    • Hadoop adoption process: big data storage; big data processing/integration; big data analytics. Google research papers: Google File System (published 2003), Google MapReduce (2004), Google Dremel (2010), Google Tenzing (2011). Chart: storage, processing and analytics plotted on the diffusion-of-innovation curve (innovators, early adopters). Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png, licensed under the Creative Commons Attribution 2.5 License.
    • Crossing the Chasm. Hadoop as (just) a low-cost storage option is not fulfilling its potential. Processing and integration is not the complete picture. Hadoop-based analytics unlocks the value of previously ignored data. Attempting to fast-forward to analytics, skipping the processing/integration stage, creates silos and will result in disillusionment. Chart: storage, processing and analytics plotted on the diffusion-of-innovation curve (innovators, early adopters, early majority, late majority, laggards).
    • Total Data. The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Totality: the desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results. Exploration: the interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query. Frequency: the desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
    • Frequency. The company, formerly AT&T Advertising Solutions and AT&T Interactive, faced an increasing volume of traffic through its distribution network. It wanted to provide intra-day reporting, but faced days of report lag due to loading multiple databases. Moving data processing to Hadoop enabled the creation of a single common data layer for all applications. Report lag was reduced to hours rather than days, and more frequent analysis, together with the ability to process all the data, enabled new insight.
    • Total Data. The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Totality: the desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results. Exploration: the interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query. Frequency: the desire to increase the rate of analysis in order to generate more accurate and timely business intelligence. Dependency: the reliance on existing technologies and skills, and the need to balance investment in those existing technologies and skills with the adoption of new techniques.
    • SQL meets Hadoop. SQL on Hadoop: Hive (Project Stinger, Apache Tez (proposed)); Impala (Cloudera Enterprise RTQ); Apache Drill (incubating); Phoenix project, for HBase; Lingual, for Cascading and Hadoop. RDBMS and Hadoop co-processing: Hadapt Adaptive Analytic Platform; Teradata Aster SQL-H; Rainstor Big Data Analytics on Hadoop; EMC Greenplum HAWQ; Microsoft PolyBase; Citus Data CitusDB; IBM Big SQL. Operational SQL on Hadoop: Drawn to Scale Spire; Splice Machine Splice SQL Engine.
    • Crossing the Chasm: project maturity; vendor ecosystem; mainstream interest; geographic adoption. Chart: storage, processing and analytics plotted on the diffusion-of-innovation curve (innovators, early adopters, early majority, late majority, laggards).
    • Project maturity (timeline, February 2006 to December 2012).
    • Vendor ecosystem. Core Hadoop: 70+ different companies, 200+ individuals. All Hadoop projects: 120+ different companies, 750+ individuals. Charts: contributors by lines of code, by current employer (Hortonworks, Cloudera, Yahoo!, Facebook, and the rest).
    • Vendor ecosystem. All Hadoop projects, contributors by lines of code, by current employer and contributor type: Hadoop vendors 51%, users 38%, other vendors 6%, unknown/individuals 4%, academia 1%.
    • Mainstream interest (chart; source: Indeed.com, February 2013).
    • Mainstream interest (chart; source: Indeed.com, February 2013).
    • Largest employers of Hadoop skills (chart: % of total LinkedIn profiles mentioning Hadoop, by current employer). Employers shown, in order: Yahoo, Microsoft, Google, eBay, Amazon, IBM, LinkedIn, Oracle, EMC, Cisco, Cloudera. Source: LinkedIn, August 2012.
    • Largest employers of Hadoop skills (chart: % of total LinkedIn profiles mentioning Hadoop, by current employer). Employers shown, in order: Yahoo, Microsoft, Google, Amazon, IBM, eBay, Oracle, LinkedIn, Tata, HP, Cisco. Source: LinkedIn, February 2013.
    • Geographic adoption (LinkedIn search results, December 2011): Seattle 3.7%, UK 3.0%, NYC 4.8%, LA 3.0%, DC 3.5%, China 3.6%, India 9.7%, Bay Area 28.2%.
    • Geographic adoption (LinkedIn search results, August 2012): Seattle 3.9%, UK 3.4%, NYC 4.7%, LA 2.8%, DC 3.1%, China 4.4%, India 11.2%, Bay Area 24.9%.
    • Geographic adoption (LinkedIn search results, February 2013): Seattle 3.9%, UK 3.4%, NYC 4.6%, LA 2.7%, DC 3.1%, China 4.8%, India 13.5%, Bay Area 22.9%.
    • Geographic adoption (LinkedIn search results, USA vs rest of world): December 2011, total 9,079 (USA 64.4%, rest of world 35.6%); August 2012, total 22,178 (USA 60.4%, rest of world 39.6%); February 2013, total 38,049 (USA 58.3%, rest of world 41.7%).
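The USA vs rest-of-world slide gives totals of 9,079, 22,178 and 38,049 LinkedIn search results, with one bar segment at 64.4%, 60.4% and 58.3%. This sketch assumes that larger segment is the USA share and turns the percentages into approximate absolute counts. The counts are only approximate because the slide rounds the shares to one decimal place.

```python
def split_counts(total, usa_share):
    """Split a total into (USA, rest of world) counts from a rounded USA share.

    Rest of world is taken as the remainder, so the two parts always sum to the total.
    """
    usa = round(total * usa_share)
    return usa, total - usa


# Totals and the larger bar segment per snapshot, as read from the slide.
# Assumption: the larger segment (64.4%, 60.4%, 58.3%) is the USA share.
snapshots = [
    ("December 2011", 9_079, 0.644),
    ("August 2012", 22_178, 0.604),
    ("February 2013", 38_049, 0.583),
]

counts = {label: split_counts(total, share) for label, total, share in snapshots}
for label, (usa, rest) in counts.items():
    print(f"{label}: USA ~{usa:,}, rest of world ~{rest:,}")

usa_first, rest_first = counts["December 2011"]
usa_last, rest_last = counts["February 2013"]
print(f"Growth: USA ~{usa_last / usa_first:.1f}x, rest of world ~{rest_last / rest_first:.1f}x")
```

Under that assumption, both groups grew in absolute terms between December 2011 and February 2013: the USA roughly 3.8x and the rest of the world roughly 4.9x. The falling USA percentage therefore reflects faster growth elsewhere rather than a decline in the USA.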
    • Conclusions. Hadoop’s greatest asset is its flexibility, but that is also a challenge, and can lead to confusion and disillusionment among later adopters. Hadoop is enabling greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies. Storing, processing, and then analyzing data is a progression that has enabled early adopters to understand Hadoop’s role in the wider landscape. Attempting to fast-forward to analytics, skipping the processing/integration stage, creates silos and will result in disillusionment. The Hadoop ecosystem is vibrant, with strength in both depth and breadth. Growing mainstream interest and geographic adoption mean Hadoop is well-positioned to cross the chasm into mainstream adoption.
    • Questions? Comments?