NATC 2013 - Future of Data Intensive Applications by Milind Bhandarkar

1,737 views
1,601 views

Published on

NATC 2013 - Future of Data Intensive Applications by Milind Bhandarkar, Chief Scientist, Pivotal Labs

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,737
On SlideShare
0
From Embeds
0
Number of Embeds
998
Actions
Shares
0
Downloads
44
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

NATC 2013 - Future of Data Intensive Applications by Milind Bhandarkar

  1. 1. Future of Data Intensive Applications Milind Bhandarkar Chief Scientist, Pivotal @techmilind Thursday, December 12, 2013
  2. 2. About Me • • • • • • Thursday, December 12, 2013 http://www.linkedin.com/in/milindb Founding member of Hadoop team at Yahoo! [2005-2010] Contributor to Apache Hadoop since v0.1 Built and led Grid Solutions Team at Yahoo! [2007-2010] Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu) Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)
  3. 3. Thursday, December 12, 2013
  4. 4. Kryptonite: First Hadoop Cluster At Yahoo! Thursday, December 12, 2013
  5. 5. M45 Thursday, December 12, 2013
  6. 6. OpenCirrus Thursday, December 12, 2013
  7. 7. Analytics Workbench Thursday, December 12, 2013
  8. 8. Analytics Workbench Thursday, December 12, 2013
  9. 9. Thursday, December 12, 2013
  10. 10. 70% of data generated by customers 80% of data being stored 3% being prepared for analysis 0.5% being analyzed Average Enterprises The Big Gap Thursday, December 12, 2013 <0.5% being operationalized
  11. 11. Example: Healthcare • In last 5 years • 3,573 studies on hospital readmissions • 9,745 papers on comparative effectiveness • 39,230 studies on drug interactions • 132,241 studies on hospital mortality • Yet, very few models operational Thursday, December 12, 2013
  12. 12. PowerPoint is where Models go to Die. - Hulya Farinas, Principal Data Scientist, Pivotal Thursday, December 12, 2013
  13. 13. Modernization Thursday, December 12, 2013
  14. 14. Building Blocks Thursday, December 12, 2013
  15. 15. Applications BI/Analytics Tools Data Science On-Demand, Self-Service Access (Big) Data Staging Platform Meta-data driven access control Unstructured Analytic Data Warehouse High-Speed Integration Semi-structured In Memory Data Grid Data provisioning, shared security, coordinated transformation Structured Modern Data Architecture Thursday, December 12, 2013
  16. 16. Data Fabric Requirements • Store massive & diverse data sets economically • Integrate and Ingest from legacy & disparate sources • Ability to rapidly analyze massive data sets • Control, Auditing, Manageability • Self-Service Thursday, December 12, 2013
  17. 17. Data Fabric Architecture Thursday, December 12, 2013
  18. 18. Infrastructure-As-AService is the new “Hardware” Thursday, December 12, 2013
  19. 19. IAAS: New Hardware • AWS, GCE, Azure • vSphere, OpenStack • Easy Provisioning • Scalable, Elastic, Ubiquitous • Needs bundling with Data & Analytics as Services Thursday, December 12, 2013
  20. 20. App Fabric Requirements • IAAS Cloud-Agnostic • Rapid provisioning, Elasticity • Open, No-Lock-In, Data As-A-Service • Automation for Application Lifecycle Management • Developer Agility : Eliminate Infrastructure Wiring Thursday, December 12, 2013
  21. 21. Thursday, December 12, 2013
  22. 22. Ecosystem Thursday, December 12, 2013
  23. 23. Broader Ecosystem Thursday, December 12, 2013
  24. 24. Legacy App Deployment Thursday, December 12, 2013
  25. 25. !"#$%&%#'()*+(,-#./0( ((12"341()*+(,-#./0( ((!.&5()*+(2!!0( ((6%'/()*+(&4"$%,4&0( ((&,2-4()*+(2!!0(7899( .!3"2/4()*+(,-#./0( ( Modern App Deployment Thursday, December 12, 2013
  26. 26. App App App App Dev Framework Dev Framework Dev Framework Configurations Configurations Manifests, Automations JVM JVM App Server App Server VM VM Infrastructure One Infrastructure Two Container 1 Container 2 JVM JVM App Server App Server Infrastructure One Infrastructure Two Application As Unit of Deployment Thursday, December 12, 2013
  27. 27. Hadoop’s Role in Data Clouds Thursday, December 12, 2013
  28. 28. Trough of Disillusionment ? Thursday, December 12, 2013
  29. 29. Or, Hadoop Everywhere? Thursday, December 12, 2013
  30. 30. Thursday, December 12, 2013
  31. 31. Thursday, December 12, 2013
  32. 32. Thursday, December 12, 2013
  33. 33. Thursday, December 12, 2013
  34. 34. Thursday, December 12, 2013
  35. 35. Game Changing Hadoop Economics Big Data Platform Price/TB $80,000 $60,000 $40,000 $20,000 $2008 Thursday, December 12, 2013 2009 2010 Big Data DB 2011 Hadoop 2012 2013
  36. 36. Storage Options • HDFS, MapR, Quantcast QFS • EMC Isilon, NetApp, IBM GPFS, PanFS, PVFS, Lustre • Amazon S3, EMC Atmos, OpenStack Swift • GlusterFS, Ceph • EMC ViPR Thursday, December 12, 2013
  37. 37. SQL-on-Hadoop • Pivotal HAWQ • Cloudera Impala, Facebook Presto, Apache Drill, Cascading Lingual, Optiq, Hortonworks Stinger • Hadapt, Jethrodata, IBM BigSQL, Microsoft PolyBase • More to come... Thursday, December 12, 2013
  38. 38. !"#$%&'())' !"#$%&'())' INTERACTIVE ONLINE !"#$%&'())' !"#$%&'())' !"#$%&'())' BATCH BATCH BATCH HDFS HDFS HDFS Hadoop 1.0 (Image Courtesy Arun Murthy, Hortonworks) Thursday, December 12, 2013
  39. 39. MapReduce 1.0 (Image Courtesy Arun Murthy, Hortonworks) Thursday, December 12, 2013
  40. 40. HADOOP 1.0 456% !$'('*8/91* % !57*% 89:*;<% !.:+1* !2'.2'$,&01* * &'()*+,-*% HADOOP 2.0 &)% !-'(2;1* 456% !$'('*8/91* % !57*% 89:*;<% !.:+1* !2'.2'$,&01* % 2*3% )2%% $9;*'=>%$*;75-*<% ?;'(:% -.*/0' !#6#2%7/&*#&0,&#1* /0)1% !2+%.(#"*"#./%"2#*3'&'0#3#&(* *4*$'('*5"/2#..,&01* !2+%.(#"*"#./%"2#*3'&'0#3#&(1* !"#$% !"#$.% !"#$%&$'&()*"#+,'-+#*.(/"'0#1* !"#$%&$'&()*"#+,'-+#*.(/"'0#1* Hadoop 2.0 (Image Courtesy Arun Murthy, Hortonworks) Thursday, December 12, 2013 !"#$%&'' ()$*+,' * *
  41. 41. !""#$%&'()*+,-)+.&'/0#1+2.+3&4(("+ :!;<3+ 2.;@,!<;2A@+ =>&",04-%0?+ =;0B?+ M.O2.@+ =3:&*0?+ 7;,@!>2.C+ =7D(EFG+7HGI?+ C,!J3+ =C$E&"K?+ 2.L>@>M,9+ =7"&EN?+ 3J<+>J2+ =M"0)>J2?+ 9!,.+!3+%4(#0*"#4/%05#*6'&'1#7#&(2*** 35678+!"#$%&$'&()*"#+,'-+#*.(/0'1#2* YARN Platform (Image Courtesy Arun Murthy, Hortonworks) Thursday, December 12, 2013 M;3@,+ =70&E%K?+ =P0&/0I?+
  42. 42. 5$6"7)8$%&'&($)* +4-$',0* 98:$#74$)* !"#$%&'&($)* !"#$%&'&($)* !"#$%&'&($)* !"#$%&'&($)* +"',&-'$)*./.* +"',&-'$)*0/0* +"',&-'$)*0/1* !"#$%&'&($)* !"#$%&'&($)* 3%*.* !"#$%&'&($)* +"',&-'$)*./0* !"#$%&'&($)* !"#$%&'&($)* 3%0* !"#$%&'&($)* +"',&-'$)*./2* !"#$%&'&($)* +"',&-'$)*0/.* !"#$%&'&($)* +"',&-'$)*0/2* YARN Architecture (Image Courtesy Arun Murthy, Hortonworks) Thursday, December 12, 2013
  43. 43. YARN • Yet Another Resource Negotiator • Resource Manager • Node Managers • Application Masters • Specific to paradigm, e.g. MR Application master (aka JobTracker) Thursday, December 12, 2013
  44. 44. Beyond MapReduce • Apache Giraph - BSP & Graph Processing • Storm on Yarn - Streaming Computation • HOYA - HBase on Yarn • Hamster - MPI on Hadoop • More to come ... Thursday, December 12, 2013
  45. 45. Hamster • Hadoop and MPI on the same cluster • OpenMPI Runtime on Hadoop YARN • Hadoop Provides: Resource Scheduling, Process monitoring, Distributed File System • Open MPI Provides: Process launching, Communication, I/O forwarding Thursday, December 12, 2013
  46. 46. Hamster Components • Hamster Application Master • Gang Scheduler,YARN Application Preemption • Resource Isolation (lxc Containers) • ORTE: Hamster Runtime • Process launching, Wireup, Interconnect Thursday, December 12, 2013
  47. 47. Client Client-RM RM-NodeManager Resource Manager Scheduler AMService Proc/ Container RM-AM RMNodeManager MPI AM MPI Scheduler ! Proc/ Container Proc/ Container Aux Srvcs HNP Framework Daemon ! Proc/ Container Aux Srvcs Framework Daemon NS NS AM-NM Node Manager Node Manager ! Node Manager Hamster Architecture Thursday, December 12, 2013
  48. 48. Hamster Scalability • Sufficient for small to medium HPC workloads • Job launch time gated by YARN resource scheduler Launch WireUp Collectives Monitor OpenMPI O(logN) O(logN) O(logN) O(logN) Hamster O(logN) O(logN) O(logN) Thursday, December 12, 2013 O(N)
  49. 49. ! GraphLab + Hamster on Hadoop Thursday, December 12, 2013
  50. 50. About GraphLab • Graph-based, High-Performance distributed computation framework • Started by Prof. Carlos Guestrin in CMU in 2009 • Recently founded Graphlab Inc to commercialize Graphlab.org Thursday, December 12, 2013
  51. 51. GraphLab Features • Topic Modeling (e.g. LDA) • Graph Analytics (Pagerank, Triangle counting) • Clustering (K-Means) • Collaborative Filtering • Linear Solvers • etc... Thursday, December 12, 2013
  52. 52. Only Graphs are not Enough • Full Data processing workflow required ETL/ Postprocessing, Visualization, Data Wrangling, Serving • MapReduce excels at data wrangling • OLTP/NoSQL Row-Based stores excel at Serving • GraphLab should co-exist with other Hadoop frameworks Thursday, December 12, 2013
  53. 53. Call To Action Thursday, December 12, 2013
  54. 54. Prepare for Convergence • HPC: Cache Coherence, Prefetching, Zerocopy, Low-contention locks • “Big Data”: Caching, Mirroring, Sharding (various flavors), relaxed consistency • Databases: Indexing, MVCC, Columnar storage/processing, Cost-based optimization Thursday, December 12, 2013
  55. 55. Convergence • Resource Allocation, Scheduling, Lifecycle Management • Compute, Storage, and Communication isolation, Multi-tenancy, Performance SLAs • Auth & Auth, Data/System Provisioning and Management, Monitoring, Metadata Management, Metering Thursday, December 12, 2013
  56. 56. New Hardware Platforms • Mellanox - Hadoop Acceleration through Network-assisted Merge • RoCE - Brocade, Cisco, Extreme, Arista... • ARM - Low power Hadoop servers • SSD - Velobit, Violin, FusionIO, Samsung.. • Niche - Compression, Encryption... Thursday, December 12, 2013
  57. 57. Data Cloud of Future? deploy Public Cloud On Premise Private Cloud Thursday, December 12, 2013
  58. 58. Questions? Thursday, December 12, 2013

×