Hadoop World 2011: Mike Olson Keynote Presentation


Now in its fifth year, Apache Hadoop has firmly established itself as the platform of choice for organizations that need to efficiently store, organize, analyze, and harvest valuable insight from the flood of data that they interact with. Since its inception as an early, promising technology that inspired curiosity, Hadoop has evolved into a widely embraced, proven solution used in production to solve a growing number of business problems that were previously impossible to address. In his opening keynote, Mike will reflect on the growth of the Hadoop platform due to the innovative work of a vibrant developer community and on the rapid adoption of the platform among large enterprises. He will show how enterprises have transformed themselves into data-driven organizations, drawing on compelling use cases across vertical markets. He will also discuss Cloudera’s plans to stay at the forefront of Hadoop innovation and its role as the trusted solution provider for Hadoop in the enterprise. He will share Cloudera’s view of the road ahead for Hadoop and Big Data and discuss the vital roles for the key constituents across the Hadoop community, ecosystem and enterprises.

Published in: Technology, Education, Business

  • We ran a survey. 1,400 people, 580 companies, 27 countries and 40 of the United States. More than 3/4 are first-timers at Hadoop World. Welcome! Nearly 3/4 are using Hadoop today. 2/3 technical, 1/3 business. And the new profession of data science is here in force!
  • One third each: Less than one year, 1-2 years, more than two years. The average user here is more experienced than the average user at Hadoop World 2010 – 9 months.
  • Average cluster size has doubled in a year. More than half of you have pretty big clusters: more than 100 nodes. 202 PB represented on our survey. One company was 10% of that. More of you (12%) are above a petabyte than I would have guessed. But important: About 3/4 of you have less than 100TB in Hadoop.
  • Hadoop needed more: Load and share data. Query tools and ways to schedule and manage jobs. Fast record storage and retrieval. All of that is available from the Apache ecosystem.
  • In 2006 and 2007, all the work was on core Hadoop. In 2008, the ecosystem began to diversify. Today, nearly 70% of all new contribs are to surrounding projects; only 31% to Hadoop itself. What you would expect as the platform has matured.
  • Hadoop in production is just one part of your data center. You need to monitor and manage it like other critical platforms.
  • What’s happening right now? Who’s doing what?
  • How are the services I depend on doing?
  • I need a high-level service view. Take storage. How is it performing? Latency? Throughput? What’s happening?
  • Who’s consuming storage? Am I close to capacity? How do I make sure users get what they need? How do I track their use?
  • Infrastructure is long-lived. I need to add, remove, retire hardware. I can’t shut down the system.
  • Move between high-level view and detail. HDFS is a service, but it runs on lots of servers. I need to see both.
  • That’s just storage. Lots of other services: query tools, analytics and more. Complex, multi-tenant, mission-critical infrastructure. Integrate with data center operations.
  • Hadoop is not an island. It is part of your enterprise IT platform. We were right.
  • Pick your graph: Big data is a big deal. The platform is here today. The next 12 months will be about use cases. About tooling and apps. Let me show you some cool ones. These companies are all here today.
  • WibiData is Odiago’s core product: a platform for developing personalized applications with Hadoop and HBase. WibiData provides both programmatic APIs for application development and an ODBC interface for easy integration with existing BI / reporting / analysis technology, plus libraries that make personalization quick and easy. FoneDoktor is one such application, powered by WibiData. FoneDoktor is free for consumers: learn from your data; share with the community to get more value from your data; available at fonedoktor.com. FoneDoktor is available to partners (carriers and OEMs): lower device return rates; lower support volume; measure device / network performance. WibiData + FoneDoktor deep dive in Aaron and Garrett’s talk. Check it out!
  • Need self-service tools for behavioral analytics. Interactive, visual tools for business users to explore data themselves. Cetas provides real-time, interactive analytics. Automatically discover and highlight clusters and trends in data. Mask complexity, deliver big data analysis to business users.
  • R is a statistical language for developing advanced analytics. With Hadoop, R can explore all the data: No sampling, no subsetting. R language runs under MapReduce. Statistician focuses on analysis, not Hadoop. Fraud and risk analysis. Portfolio optimization. Anything you can model in R.
  • Validated by customers in the US Army and intelligence space. Operates on key enterprise information (financial intelligence, risk, and patents). Combines enterprise data with public sources. Structured, semi-structured and complex. Discovers and shows connections, relationships among entities.
  • Enterprise Performance Management. Key metrics, trends, analyses: Plan, budget, forecast. Hadoop for trending, diverse data sources, external and internal. With drill-down. Aimed at busy execs who need clear insight and overview. iPad, iPhone applications.
  • It’s getting crowded in here! Companies contributing to Hadoop, integrating with it or building on top. Sign of a big, robust market. But these aren’t the only people who have spotted the opportunity in big data. I’d like to bring up Ping Li from Accel Partners with an exciting announcement.
  • Hadoop as the hub. Catch, process, summarize the firehose. Integrate with new and existing platforms for special-purpose workloads. Already happening.
  • Three years talking speeds and feeds. The story for the future is value: Business problems and solutions built on big data.
  • Transcript

    • 1. WELCOME
    • 2. Conference Highlights • Four exciting keynotes • Lots of networking opportunities • Sixty educational sessions. ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
    • 3. Thank You Sponsors: PLATINUM SPONSORS, GOLD SPONSORS, SILVER SPONSORS, BRONZE SPONSORS
    • 4. Housekeeping Items • Connecting to the internet – Wireless network = Sheraton Meeting – Code = Vertica • Hashtag = #hw2011 • Take the surveys – Breakout sessions – Overall survey
    • 5. Mike Olson, Chief Executive Officer, Cloudera
    • 6. Three Years Ago… We said: Hadoop is going to be huge. This year’s conference: • 1,400 people from 580 companies in 27 countries and 40 of the United States • 75.7% attending Hadoop World for the first time • 71.9% using Hadoop • 66.5% engineers, developers and architects, 33.5% non-technical business roles • Just over 50 of you are “data scientists”
    • 7. Three Years Ago… We said: Hadoop is going to be huge. Your Hadoop usage: • Less than one year: 36.8% • One to two years: 32.3% • Two to three years: 16.8% • More than three years: 12% • Average usage is 17.4 months this year, versus 8.76 months at last year’s Hadoop World
    • 8. Three Years Ago… We said: Hadoop is going to be huge. Your clusters: • Average size is 120 nodes, up from 66 last year • 44% between 10 and 100 nodes, 52% between 100 and 1,000 nodes • Total of 202 petabytes under management (60 last year) • Largest cluster bigger than 20PB • 13.1% bigger than 100TB • 12.8% bigger than 1PB
    • 9. Two Years Ago… We said: Hadoop is at the center of a new platform for big data. • Hadoop • HBase • Pig • Zookeeper • Mahout • Hive • Avro • Whirr • Sqoop • HCatalog • MRUnit • Bigtop • Oozie
    • 10. Two Years Ago… We said: Hadoop is at the center of a new platform for big data. Core Hadoop as % of new contribs: 2006: 100% • 2007: 100% • 2008: 58% • 2009: 37% • 2010: 37% • 2011: 31%. Relevant projects: 2006 and 2007: Core Hadoop • 2008: Core Hadoop, HBase, Zookeeper, Mahout • 2009: Core Hadoop, HBase, Pig, Zookeeper, Mahout, Hive • 2010: Core Hadoop, HBase, Pig, Zookeeper, Mahout, Hive, Avro, Whirr, Sqoop • 2011: Core Hadoop, HBase, Pig, Zookeeper, Mahout, Hive, Avro, Whirr, Sqoop, Bigtop, …
    • 11. Last Year… We said: Hadoop must integrate with data center infrastructure and tools. • Enterprises need software and support that de-risk and simplify the operation of Hadoop in production • Must build on the open source platform to deliver all the innovation and value created by the global Apache Hadoop ecosystem. (Graphic label: Hadoop Operations)
    • 19. Last Year… We said: Hadoop must integrate with data center infrastructure and tools. Diagram labels: OPERATORS (Management Tools), ENGINEERS (IDE’s), ANALYSTS (BI / Analytics), BUSINESS USERS (Enterprise Reporting), CUSTOMERS (Web Application); Enterprise Data Warehouse; Logs, Files, Web Data, Relational Databases
    • 20. This Year… We’re talking about the future.
    • 21. Building Applications. Develop personalized applications on Hadoop and HBase. Get it at: http://fonedoktor.com. Battery Analysis: Available Today… Mapping Features: Coming Soon! Learn more: Today, 3:30PM, Architecture Track, Aaron Kimball and Garrett Wu. www.wibidata.com, @wibidata
    • 22. Data Analysis and Visualization. INSTANT INTELLIGENCE. Demand for Online App Analytics • Real-time, interactive & visual analytics • Auto-discover data trends • User behavior analytics with data clustering • Investigative and root cause analytics • Simplify data modeling & custom functions for Hadoop data. Empower business users, data scientists with out-of-the-box analytics. www.cetas.net, @CetasAnalytics
    • 23. Powerful Statistical Tools • Why Hadoop and R? – Need to do more than simple statistics – Analyze all of the data • Integration – Make it easy to write MapReduce programs in R – Keep the statisticians focused on the analysis • Usage – Fraud and Risk Analysis – Portfolio Optimization – Anything you can model in R! www.revolutionanalytics.com, @RevolutionR
    • 24. Complex Data Exploration. Automatic extraction of facts, connections, associations, etc. Diagram labels: Who, What, Where, When; Relationship, Association, Connections, Aliases; Entity : Alias; Location; Time; Synthesys Knowledge Base. Example: connection discovered from AIG to Metlife Equity in Wikipedia. Unstructured data: “AIG sells Allco to Metlife Equity for $6.8B”. Synthesys automatically surfaces critical facts in unstructured data. www.digitalreasoning.com, @dreasoning
    • 25. Business Analytics • Metrics Management and Reporting • Strategic, Financial, and Operational Planning, Budgeting, and Forecasting • Profitability Modeling. USABLE, UNIFIED, ACTIONABLE. Enterprise Performance Management for the Cloud. www.tidemark.net, @TidemarkEPM
    • 26. An Exploding, Diverse Ecosystem
    • 27. BE FIRST. Big Data Fund. Hadoop World, November 2011
    • 28. Big Data Fund • $100MM dedicated to fund entrepreneurs globally in building disruptive, Big Data companies • Funding innovation across every layer of the “Big Data Stack”: Infrastructure (Automation, Data Management, Identity & Access, Security, Storage, …) and Applications (Business Intelligence, Collaboration, Data Analysis/Visualization, Mobile, Vertical Applications, …) • Partnering with thought leaders to foster community and drive innovation: Doug Cutting (Hadoop), Gil Elbaz (Factual), Jeff Hammerbacher (Cloudera), Jeff Heer (Stanford), Hilary Mason (Bit.ly), Jay Parikh (Facebook), Kenny Van Zant (Solarwinds). Accel Partners
    • 29. Who We Are. Three decades of technology investing with over $6B of capital in US, Europe, China and India • Partner with category-defining entrepreneurs • Invest at every stage of technology lifecycle: seed, venture and growth capital • Focus deeply on technology innovations in software, infrastructure and internet. Big Data consistently drives innovation across our portfolio companies today: Data Generators, Data Solutions
    • 30. Time is Now! The Big Data Wave • Data is exploding • “New” data types are breaking legacy data platforms • Big Data platforms such as Hadoop are becoming mainstream • “Native” Big Data applications and services will quickly emerge. [Chart: Data Growth, 1980 to 2010, Traditional Data vs. Big Data] Big Data continues to revolutionize data centers across all industries, opening up a massive market for entrepreneurial activity.
    • 31. Funding the Big Data Ecosystem. Big Data will drive the next-generation of multi-billion dollar software companies. [Diagram comparing 1980 - 2010 with 2010 and beyond. Applications: Analytics, Security, Business Intelligence, Collaboration, Mobile, CRM, Vertical Apps: Fin Tech, Healthcare. Data: Traditional Data Platforms (Relational Database Management Systems), Big Data Platforms. Infrastructure: Traditional Infrastructure Platforms (Mainframe, Client-Server, Web), Private & Public Cloud Platform and Services]
    • 32. Big Data Fund Contact Info. Contact Us: accel.com/bigdata, bigdatafund@accel.com, @bigdatafund. Big Data Conference - Spring 2012. Want to attend or speak? BigData2012@accel.com. Stay on top of the latest big data news from Accel Partners by finding us on facebook.com/Accel, @Accel_Partners
    • 33. The Next-Generation Data Center. Diagram labels (as extracted): Systems Web Logs Real-time Servers Feeds Trading Systems Sensors Enterprise Sales Data Systems Warehouse People Document Repository ERP System CRM
    • 34. The Future: Tackling Critical Business Issues • Financial Services: Better and deeper understanding of risk to avoid credit crisis. • Life Sciences: Better targeted medicines with fewer complications and side effects. • Retail: A personal experience with products and offers that are just what you need. • Telecommunications: More reliable networks where we can predict and prevent failure. • Media: More content that is lined up with your personal preferences. • Government: Government services that are based on hard data, not just gut.
    • 35. Thank You