Hadoop: A Foundation for Change - Milind Bhandarkar, Chief Scientist, Pivotal

As adoption of Hadoop across enterprises has skyrocketed, a variety of business use cases have emerged. In this talk, Milind highlights a few of them and discusses the emerging use cases that are shaping the future of the Hadoop platform.

Published in: Technology, Education

  1. Hadoop: A Foundation for Change. Milind Bhandarkar, Chief Scientist, Pivotal. Twitter: @techmilind. © Copyright 2013 Pivotal. All rights reserved.
  2. About Me: http://www.linkedin.com/in/milindb; Founding member of Hadoop team at Yahoo! [2005-2010]; Contributor to Apache Hadoop since v0.1; Built and led Grid Solutions Team at Yahoo! [2007-2010]; Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu); Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly EMC-Greenplum)
  3. "First, technology is good. Then it gets bad. Then it gets stable." - Alistair Croll (http://strata.oreilly.com/2013/01/data-warefare.html)
  4. History (2003-2010)
  5. Google Papers
  6. Yahoo! Search +=
  7. W-1-W; WebMap: Graph processing for WWW; Dreadnaught: Infrastructure for WebMap; Juggernaut: Infrastructure for W-1-W; JFS, JMR, Condor: Abandoned for Hadoop
  8. Lucene, Nutch
  9. Kryptonite
  10. Lessons Learned: Multi-tenancy from the ground up; Agility in lieu of performance; Provisioning vs. procurement; "Weird" use cases as learning experience; Academic collaboration
  11. Who Uses Hadoop? (From Hadoop Summit 2010)
  12. Big Data Landscape (June 2012): http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
  13. Hadoop Ecosystem (January 2013): http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html
  17. Hadoop Economics is Game Changer [Chart: Big Data Platform Price/TB, 2008-2013, price axis up to $80,000; series: Big Data DB, Hadoop]
  18. "Typical" Hadoop Use-Case: "User" Modeling. Objective: Determine user interests by mining user activities; Large dimensionality of possible user activities; Typical user has sparse activity vector; Event attributes change over time
  19. Domain: Retail. User = Customer; Activities: Online (purchase, ad click, FB Likes), Offline (brick-and-mortar purchases, returns, coupon clipping, gift cards); Personalized Product Recommendation
  20. Domain: IT Infrastructure. "User" = HW & SW Components; Activities: Log messages, metrics, connectivity, communication events; Goal: Proactive alerting of imminent failures
  21. Domain: Healthcare. User = Patient; Activities: Doctor visits, medicine refills, medical history, 3G/WiFi-enabled pillbox...; Goal: Prevent hospital readmissions
  22. Domain: Telecom. User = Subscriber; Activities: Calls made, duration, calls dropped, locations, ..., "social" graph, status updates; Goal: Reduce customer churn
  23. Domain: Ad-Supported Web. User = User :-); Activities: Clicks on content, Likes, Repost, Search queries, Comments, Participation; Goal: Increase engagement, increase clicks on revenue-generating content (ads/premium content)
  24. User-Modeling Pipeline: Sessionization; Feature and Target Generation; Model Training; Offline Scoring & Evaluation; Batch Scoring & Upload to serving
  25. What's Next?
  26. Trough of Disillusionment?
  27. Or, Hadoop Everywhere?
  28. Storage Wars: HDFS; KosmosFS, LocalFS, Quantcast FS, S3; MapR; GPFS, Isilon, Atmos, Swift, NetApp; Lustre, Gluster, Ceph, PanFS, PVFS; EMC ViPR
  29. NoSQL = Not Yet SQL? Pivotal HAWQ; Cloudera Impala; Apache Drill, Spire (Drawn to Scale); Cascading Lingual, Optiq; Hortonworks Stinger; More to come...
  30. Prepare for Convergence. HPC: Cache coherence, prefetching, zero-copy, low-contention locks; "Big Data": Caching, mirroring, sharding (various flavors), relaxed consistency; Databases: Indexing, MVCC, columnar storage/processing, cost-based optimization
  31. Convergence: Resource Allocation, Scheduling, Lifecycle Management; Compute, Storage, and Communication isolation, Multi-tenancy, Performance SLAs; Auth & Auth, Data/System Provisioning and Management, Monitoring, Metadata Management, Metering
  32. Hadoop As A Service. Hadoop Platform-As-A-Service: EMR competitor proliferation; OpenStack, CloudStack, Joyent... Application-As-A-Service (Hadoop Inside): Cetas, Continuuity, Causata, Claritics, Tresata, Wibidata,… Pivotal One: CloudFoundry, Hadoop, HAWQ, Analytics; Spring, Redis, RabbitMQ
  33. New Hardware Platforms: Mellanox - Hadoop Acceleration through Network Levitated Merge; RoCE - Brocade, Cisco, Extreme, Arista...; ARM - Low power Hadoop servers; SSD - Velobit, Violin, FusionIO, Samsung..; Niche - Compression, Encryption…
  34. IAAS as the new Hardware: AWS, GCE, Azure; vSphere, OpenStack; Easy Provisioning; Scalable; Elastic; Ubiquitous; Needs bundling with Data & Analytics as Services
  35. Big Data Platform of Future? [Diagram: deploy to Public Cloud, Private Cloud, On Premise]
  36. Questions?
  37. A NEW PLATFORM FOR A NEW ERA
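
The user-modeling slides name sessionization and feature generation as the first pipeline stages and note that a typical user has a sparse activity vector. The slides give no implementation, so what follows is only a minimal single-machine sketch in Python of what those two stages might look like. The event layout `(user, timestamp, activity)`, the 30-minute inactivity gap, and the function names `sessionize` and `sparse_activity_vector` are all assumptions made for illustration, not anything from the talk. On Hadoop the same grouping would typically run as a job keyed by user.

```python
from collections import Counter, defaultdict

# Assumed inactivity threshold: a gap longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp, activity) events into per-user sessions.

    A new session begins when two consecutive events of the same user
    are more than `gap` seconds apart.
    """
    by_user = defaultdict(list)
    for user, ts, activity in events:
        by_user[user].append((ts, activity))

    sessions = {}
    for user, user_events in by_user.items():
        user_events.sort()  # order each user's events by timestamp
        user_sessions = []
        current = []
        last_ts = None
        for ts, activity in user_events:
            if last_ts is not None and ts - last_ts > gap:
                user_sessions.append(current)
                current = []
            current.append((ts, activity))
            last_ts = ts
        if current:
            user_sessions.append(current)
        sessions[user] = user_sessions
    return sessions


def sparse_activity_vector(user_sessions):
    """Count activities across sessions, storing only non-zero entries."""
    counts = Counter()
    for session in user_sessions:
        for _, activity in session:
            counts[activity] += 1
    return dict(counts)


# Hypothetical events: (user id, seconds since some epoch, activity name).
events = [
    ("u1", 0, "search"),
    ("u1", 120, "click"),
    ("u1", 4000, "purchase"),
    ("u2", 50, "click"),
]

sessions = sessionize(events)
vectors = {user: sparse_activity_vector(s) for user, s in sessions.items()}
```

Storing only the activities a user actually performed, as the dictionary does here, keeps the representation small even when the space of possible activities is very large.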
