Hadoop Trends


Published in: Technology

  1. Trends and usage of Apache Hadoop
     Eric Baldeschwieler, CEO Hortonworks
     Twitter: @jeric14, @hortonworks
     January 2012
     © Hortonworks Inc. 2011
  2. Agenda
     • Define terms
       – What is Hadoop? Why does Hadoop matter?
     • What drives Hadoop adoption?
     • Observed Trends
     Architecting the Future of Big Data
  3. Hortonworks Vision
     We believe that by 2015, more than half the world's data will be processed by Apache Hadoop.
     How to achieve that vision? Enable an ecosystem around an enterprise-viable platform.
  4. What is Apache Hadoop?
     • Solution for big data
       – Deals with complexities of high volume, velocity & variety of data
     • Set of open source projects
     • Transforms commodity hardware into a service that:
       – Stores petabytes of data reliably
       – Allows huge distributed computations
     • Key attributes:
       – Redundant and reliable (no data loss)
       – Extremely powerful
       – Batch processing centric
       – Easy to program distributed apps
       – Runs on commodity hardware
     Callout: One of the best examples of open source driving innovation and creating a market
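
The "easy to program distributed apps" attribute rests on the MapReduce programming model: the programmer writes a map function and a reduce function, and the framework handles distribution. The following is a minimal single-process sketch of that model in Python, not the Hadoop API. The names `map_words`, `reduce_counts`, and `run_mapreduce` and the sample input are assumptions made for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts that were emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (group values by key), and reduce in one process.

    A real cluster would spread these phases over many machines; this
    helper is a hypothetical stand-in that only shows the data flow.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical input: two lines of text standing in for a large data set.
lines = ["big data on commodity hardware", "big clusters store big data"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

Only the two small functions are problem-specific; everything in `run_mapreduce` is what a framework would take over, which is the sense in which the model keeps distributed programs simple.
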
  5. Hortonworks Data Platform (HDP)
     Key Components of “Standard Hadoop” Open Source Stack
     [Stack diagram. Legend: Core Apache Hadoop; Related Hadoop Projects]
     • Pig (Data Flow)
     • Hive (SQL)
     • HBase (Columnar NoSQL Store)
     • MapReduce (Distributed Programming Framework)
     • Zookeeper (Coordination)
     • HCatalog (Table & Schema Management)
     • HDFS (Hadoop Distributed File System)
     Open APIs for:
     • Data Integration
     • Data Movement
     • App Job Management
     • System Management
  6. Big Data Trailblazers and Use Cases
     [Word cloud of use cases] analyzing web logs, data analytics, advertising optimization, machine learning, mail anti-spam, text mining, web search, content optimization, customer trend analysis, ad selection, video & audio processing, data mining, user interest prediction, social media
  7. Yahoo!, Apache Hadoop & Hortonworks
     http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop
     • 2006: Yahoo! embraced Apache Hadoop, an open source platform, to crunch epic amounts of data using an army of dirt-cheap servers
     • Hadoop at Yahoo!: 40K+ servers, 170PB storage, 5M+ monthly jobs, 1000+ active users
     • 2011: Yahoo! spun off 22+ engineers into Hortonworks, a company focused on advancing open source Apache Hadoop for the broader market
  8. What drives Hadoop adoption?
  9. Market Drivers for Apache Hadoop
     • Business drivers
       – High-value projects that require use of more data
       – Belief that there is great ROI in mastering big data
     • Financial drivers
       – Growing cost of data systems as percentage of IT spend
       – Cost advantage of commodity hardware + open source
       – Enables departmental-level big data strategies
     • Technical drivers
       – Existing solutions failing under growing requirements
       – 3Vs - Volume, velocity, variety
       – Proliferation of unstructured data
     Callouts: Gartner predicts 800% data growth over next 5 years. 80-90% of data produced today is unstructured.
  10. Every Market has Big Data
      Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially.
      Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  11. Broader Use Case Opportunities
      Financial Services
      • Detect/prevent fraud
      • Model and manage risk
      • Personalize banking/insurance products
      • Compliance, Archival, …
      Healthcare
      • Patient monitoring
      • Predictive modeling
      • Compliance, Archival, text search
      • Data driven research
      Retail
      • Behavior analysis
      • Cross selling, recommendation engines
      • Optimize pricing, placement, design
      • Optimize inventory and distribution
      Web / Social / Mobile
      • Sentiment analysis
      • Web log, image, and video analysis
      • Personalization
      • Billing, Reporting, Network Analysis
      Manufacturing
      • Simulation, Analysis, Design
      • Improve service via product sensor data
      • “Digital factory” for lean manufacturing
      Government
      • Detect/prevent fraud
      • Security & Intelligence
      • Support open data initiatives
  12. Observed Trends
  13. Trend: Agile Data
      • The old way
        – Operational systems keep only current records, short history
        – Analytics systems keep only conformed / cleaned / digested data
        – Unstructured data locked away in operational silos
        – Archives offline
        – Inflexible, new questions require system redesigns
      • The new trend
        – Keep raw data in Hadoop for a long time
        – Able to produce a new analytics view on-demand
        – Keep a new copy of data that was previously only in silos
        – Can directly do new reports, experiments at low incremental cost
        – New products / services can be added very quickly
        – Agile outcome justifies new infrastructure
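
The "keep raw data, produce a new analytics view on-demand" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hadoop code: the records in `RAW_EVENTS`, their field names, and the helper `build_view` are all hypothetical. The point is that nothing is discarded at load time, so a question nobody planned for becomes one more pass over the raw records rather than a system redesign.

```python
import json
from collections import Counter

# Hypothetical raw log lines, stored exactly as they arrived (no cleaning).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/search", "ms": 340}',
    '{"user": "u1", "page": "/search", "ms": 200}',
]


def build_view(raw_lines, key_field):
    """Parse the raw records at read time and count them by any one field."""
    return Counter(json.loads(line)[key_field] for line in raw_lines)


# The view that was planned when the logs were first collected.
views_by_page = build_view(RAW_EVENTS, "page")

# A new question asked later: answered from the same raw data, no reload.
views_by_user = build_view(RAW_EVENTS, "user")

print(views_by_page)
print(views_by_user)
```
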
  14. Traditional Enterprise Data Architecture
      Data Silos
      [Architecture diagram. Serving Applications (Web Serving, NoSQL, RDBMS, …) and Unstructured Systems (Serving Logs, Social Media, Sensor Data, Text Systems, …) feed Traditional ETL & Message buses, which load Traditional Data Warehouses, BI & Analytics (EDW, Data Marts, BI / Analytics).]
  15. Agile Data Architecture w/Hadoop
      Connecting All of Your Big Data
      [Architecture diagram. Serving Applications (Web Serving, NoSQL, RDBMS, …) and Unstructured Systems (Serving Logs, Social Media, Sensor Data, Text Systems, …) connect through Traditional ETL & Message buses and through EsTsL (s = Store) to Traditional Data Warehouses, BI & Analytics (EDW, Data Marts, BI / Analytics) and to Custom Analytics.]
  16. Trend: Data driven development
      • Limited runtime logic driven by huge lookup tables
      • Data computed offline on Hadoop
        – Machine learning, other expensive computation offline
        – Personalization, classification, fraud, value analysis…
      • Application development requires data science
        – Huge amounts of actually observed data key to modern services
        – Hadoop used as the science platform
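
The split between "data computed offline" and "limited runtime logic driven by huge lookup tables" can be sketched as follows. This is a toy Python illustration, not the method used by any system named in the slides: `CLICK_LOG`, `build_interest_table`, and `pick_module` are hypothetical, and a simple most-frequent-category count stands in for the machine learning step.

```python
from collections import Counter, defaultdict

# Hypothetical observed behavior: (user, category the user clicked on).
CLICK_LOG = [
    ("alice", "sports"), ("alice", "sports"), ("alice", "finance"),
    ("bob", "news"),
]


def build_interest_table(click_log):
    """Offline batch step: record each user's most-clicked category."""
    per_user = defaultdict(Counter)
    for user, category in click_log:
        per_user[user][category] += 1
    return {
        user: counts.most_common(1)[0][0]
        for user, counts in per_user.items()
    }


def pick_module(user, table, default="top_stories"):
    """Runtime step: a single dictionary lookup, with a fallback."""
    return table.get(user, default)


# Built offline in batch, then shipped to the serving tier as a lookup table.
INTEREST_TABLE = build_interest_table(CLICK_LOG)

print(pick_module("alice", INTEREST_TABLE))
print(pick_module("carol", INTEREST_TABLE))
```

All of the expensive work sits in `build_interest_table`, which can be rerun on a schedule as new behavior is logged, while the serving path stays a constant-time lookup.
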
  17. CASE STUDY: YAHOO! HOMEPAGE
      [Pipeline diagram]
      • SCIENCE HADOOP CLUSTER: machine learning to build ever better categorization models. Input: USER BEHAVIOR. Output: CATEGORIZATION MODELS (weekly)
      • PRODUCTION HADOOP CLUSTER: identify user interests using categorization models. Input: USER BEHAVIOR. Output: SERVING MAPS (every 5 minutes)
      • SERVING SYSTEMS: build customized home pages with latest data (thousands / second) for ENGAGED USERS
      Sidebar:
      • Serving Maps
        – Users - Interests
      • Five Minute Production
      • Weekly Categorization models
      Copyright Yahoo 2011
  18. CASE STUDY: YAHOO! HOMEPAGE
      Personalized for each visitor
      Result: twice the engagement
      • Recommended links: +79% clicks vs. randomly selected
      • News Interests: +160% clicks vs. one size fits all
      • Top Searches: +43% clicks vs. editor selected
      Copyright Yahoo 2011
  19. Trend: Specialization of Data Systems
      • Hadoop does not replace existing systems
        – It adds new capabilities to the enterprise
        – It can offload things that are not done efficiently in current systems
        – Especially in scale out situations
      • Specialization of traditional data components
        – Use OLTP systems just for transactions
        – Use OLAP systems for interactive analysis
      • Hadoop has LOTS of bandwidth to storage and CPU
        – Pull reporting out of OLTP systems
        – Pull ELT out of OLAP systems
  20. Hadoop and OLTP Systems
      MPP Processing of Online Transactions
      • Mission critical
      • Manages transactions & serves reports
      Hadoop used to Process Reports
      • Free up 50+% processing power for transaction processing system
      • Significant cost savings due to commodity nature of Hadoop
      [Diagram labels: Web Site, Transaction Processing Systems ($$$), Transaction Logs, Reports]
  21. Hadoop and OLAP Systems
      • Hadoop (The Agile Data Zone): fast loading, raw data staging, ELT & long-term archival
      • EDW: allow analysts to use tools they know (take advantage of huge ecosystem of BI and Analytics tooling)
      [Diagram labels: Web, Mobile, Social, Other logs, Hadoop, EDW, Online Archival]
  22. Trend: Instrument Clouds of Things
      Clouds of things logging to Hadoop
      [Diagram. Things (websites, mobile phones, enterprise devices…) log to HDFS + Map-Reduce or HBase, followed by analysis.]
  23. Trend: Many POCs, Few Production Systems
      • The problem
        – Hadoop is still a young technology
        – Hard to find knowledgeable staff
        – Integration with existing systems
      • Hadoop market is maturing at speed
        – Emerging ecosystem of Hadoop platform solutions providers
        – Apache Hadoop continues to get better
        – Hadoop training and support available from several vendors
  24. Growth in Hadoop Ecosystem
      • Hardware vendors, Public Cloud (IaaS, PaaS)
        – Storage, Appliances, Preloaded commodity boxes, cloud
      • Data Systems
        – All the major vendors announced Hadoop plans / products in 2011
      • BI, Analytics and ETL
        – Hadoop integrations emerging
      • Dedicated Hadoop Applications
        – Datameer, Karmasphere, Platfora, …
      • Systems Integrators
        – Regional and Global providers available
  25. Hadoop Continues to Improve
      Apache community, including Hortonworks, investing to improve Hadoop:
      • Make Hadoop an Open, Extensible, and Enterprise Viable Platform
      • Enable More Applications to Run on Apache Hadoop
      Roadmap:
      • “Hadoop.Now” (Hadoop 1.0): Most stable version ever; HBase, security, WebHDFS
      • “Hadoop.Next” (Hadoop 0.23): HA, Next-gen HDFS & MapReduce; Extension & Integration APIs
      • “Hadoop.Beyond”: Platform actively evolving
  26. Hortonworks – Approachable Hadoop
      • Apache Hadoop Leadership
        – Delivered every major release since 0.1
        – Driving innovation across entire stack
        – Experience managing world’s largest deployment
        – Access to Yahoo’s 1,000+ Hadoop users and 40k+ nodes for testing, QA, etc.
      • Business Focus
        – Provide 100% open source product (Hortonworks Data Platform)
        – Help customers and partners overcome Hadoop knowledge gaps (Expert Role-based Training)
        – Help organizations successfully develop and deploy solutions based on Hadoop (Full Lifecycle Support and Services)
      Lifecycle: Evaluate, Pilot, Production
  27. Trend: Finding More Value Over Time
      • Hadoop is usually brought in to solve a specific problem
        – Build search indexes for Yahoo
        – Manage web site logs for Facebook
        – Users using EC2 to do data processing at Amazon
        – Simple reporting when existing tools don’t scale
      • Once your data is in Hadoop more users find value
      • Once you have Hadoop, folks add more data
  28. Thank You! Questions?
      Eric Baldeschwieler
      @jeric14 @hortonworks