
Modern Data Architecture

This is the presentation for the talk I gave at JavaDay Kiev 2015. It covers the evolution of data processing systems, from simple ones with a single DWH to complex approaches such as Data Lake, Lambda Architecture and Pipeline architecture.

Published in: Data & Analytics

  1. Modern Data Architecture, Alexey Grishchenko (Pivotal Confidential–Internal Use Only)
  2. About me • Enterprise Architect @ Pivotal • 7 years in data processing • 5 years with MPP • 4 years with Hadoop • Spark contributor • http://0x0fff.com
  3. How it started… [diagram: Front End]
  4. How it started… [diagram: Front End → Back End]
  5. How it started… [diagram: Front End → Back End → DBMS]
  6. How it started… "What about BI?"
  7. How it started… "Just put it there!"
  8. How it started… [diagram: Front End → Back End → DBMS, with BI added on the same DBMS]
  9. How it started… "Was it fast?"
  10. How it started… [timings: Front End 10ms, Back End 100ms, DBMS 200ms, BI 1-2 min]
  11. How it started… "yes, single server…"
  12. First Issues: "More users got workstations" [timings: FE 10ms, BE 100ms, DBMS 200ms, BI 1-2 min]
  13. First Issues [timings: FE 10ms, BE 400ms, DBMS 800ms, BI 1-2 min]
  14. First Issues: "Split!"
  15. First Issues [timings: FE 10ms, BE 300ms, DBMS 600ms, BI 1-2 min]
  16. First Issues: "Even more users?"
  17. First Issues: "Split!"
  18. First Issues [diagram: three FE/BE pairs sharing one DBMS; timings: FE 10ms, BE 100ms, DBMS 400ms, BI 1-2 min]
  19. First Issues: "What about automated systems?"
  20. First Issues [diagram: five FE/BE pairs sharing one DBMS; timings: FE 10ms, BE 100ms, DBMS 1 sec, BI 5-10 min]
  21. First Issues: "Database, please, live!"
  22. First Issues [same diagram and timings as slide 20]
  23. First Issues [timings: FE 10ms, BE 100ms, DBMS 800ms, BI 15-20 min]
  24. First Issues: "What if “split” didn’t help this time?"
  25. First Issues: "Split more! Eventually it will help…"
  26. First Issues [diagram: the DBMS split into several instances; timings: FE 10ms, BE 100ms, DBMS 300ms, BI 35-40 min]
  27. First Issues [same as slide 26]
  28. First Issues: "Sales went 10% up!"
  29. First Issues: "Sales went 10% up!" and "Sales went 20% down!"
  30. First Issues [timings: FE 10ms, BE 100ms, DBMS 600ms, BI 2-3 hrs]
  31. First Issues: "Stop loading my system with your stupid reports!"
  32. The Era of Data Warehouse [diagram: the FE/BE/DBMS systems feed a DWH through ETL, 1 day late; BI now runs on the DWH; timings: BE 100ms, DBMS 300ms, BI 2 days]
  33. The Era of Data Warehouse: "We need more reports!"
  34. The Era of Data Warehouse [diagram: Data Mining and OLAP… added alongside BI; BI 3-4 days]
  35. The Era of Data Warehouse: "We need secondary site!"
  36. The Era of Data Warehouse [diagram redrawn: FE/BE systems with their DBMS instances, ETL into the DWH (1 day), BI, Data Mining, OLAP…]
  37. The Era of Data Warehouse [diagram: a secondary site mirroring the FE/BE/DBMS systems through WAL Replication, 3-5 minutes late]
  38. The Era of Data Warehouse [same as slide 37]
  39. The Era of Data Warehouse: "Where is our DWH? We need this data now!"
  40. The Era of Data Warehouse [same as slide 37]
  41. The Era of Data Warehouse [diagram: the secondary site gets its own ETL, DWH and BI, Data Mining, OLAP… (5-7 days); its data comes from NAS via Backup / Restore, 3 days late]
  42. The Era of Data Warehouse: "Why is this data so old?"
  43. The Era of Data Warehouse [same as slide 41]
  44. Advanced Architecture – ELT [diagram: two variants of the DWH side by side. ETL: source DBMS instances are transformed straight into the DDS, Data Marts, Reports, Aggregates and OLAP. ELT: sources are first copied into ODS tables and then transformed into the same layers]
  45. Advanced Architecture – ELT [diagram: ETL replaced by ELT; the DWH is still 1 day late]
  46. Advanced Architecture – CDC [diagram: CDC feeds the ODS tables ahead of ELT into the DDS, Data Marts, Reports, Aggregates and OLAP; ELT alone: 1 day, with CDC: 1 hour]
  47. Advanced Architecture – CDC [diagram: CDC and ELT on both sites; primary DWH 3-24 hrs late, BI 1-4 days; secondary site still fed by Backup / Restore (3 days late), BI 4-7 days]
  48. Advanced Architecture – CDC: "Why is our secondary site’s DWH so old?"
  49. Moving Forward [diagram: the architecture from slide 47]
  50. Moving Forward. Our problems are…
  51. Moving Forward. Our problems are • Time to action takes up to 7 days
  52. Moving Forward. Our problems are • Time to action takes up to 7 days • Amount of data is growing
  53. Moving Forward. Our problems are • Time to action takes up to 7 days • Amount of data is growing • DWH MPP storage is expensive
  54. Modern Architectures. Our problems are • Time to action takes up to 7 days • Amount of data is growing • DWH MPP storage is expensive [new label: Data Lake]
  55. Modern Architectures. The problems from slide 54 [labels: Lambda, Data Lake]
  56. Modern Architectures – Data Lake [diagram: a Hadoop cluster added beside the DWH. The DWH keeps its ODS, DDS, Data Marts, Aggregates, Reports and OLAP, loaded via CDC and ELT; Hadoop holds ODS, UDS and Analytical Archives and serves BI, Data Mining and OLAP through SQL-on-Hadoop, plus Data Mining At Scale]
  57. Modern Architectures – Data Lake [diagram: the architecture from slide 47]
  58. Modern Architectures – Data Lake [diagram: Hadoop next to the primary DWH; on the secondary site Backup / Restore is now 2 days late, BI 3-6 days, and its Hadoop cluster is marked "?"]
  59. Modern Architectures – Lambda [diagram: Source Data feeds both a Batch Layer (Master Dataset → Batch Views) and a Speed Layer (Real-time Views); the Serving Layer answers Queries from both]
  60. Modern Architectures – Lambda [diagram: the architecture from slide 58]
  61. Modern Architectures – Lambda [diagram: an In-Memory Data Store added on each site, serving RTDM, BI and Data Mining; primary DWH now 0-24 hrs late, BI 0-4 days]
  62. Modern Architectures. Our problems are…
  63. Modern Architectures. Our problems are • Too many standby systems
  64. Modern Architectures. Our problems are • Too many standby systems • How to replicate the Hadoop cluster?
  65. Modern Architectures. Our problems are • Too many standby systems • How to replicate the Hadoop cluster? • How to sync data in real-time systems?
  66. Modern Architectures. Our problems are • Too many standby systems • How to replicate the Hadoop cluster? • How to sync data in real-time systems? • How to better sync the DWH?
  67. Modern Architectures. The problems from slide 66 [new label: Pipelining]
  68. Modern Data Architecture – Pipelining [diagram: the architecture from slide 61]
  69. Modern Data Architecture – Pipelining [new diagram, built up step by step: FE layer of App, App, App … reached over HTTP]
  70. Modern Data Architecture – Pipelining [adds BE layer of Srv, Srv, Srv … called over SOAP]
  71. Modern Data Architecture – Pipelining [adds the OLTP database: stored procedures (SP) and a Table, accessed over JDBC]
  72. Modern Data Architecture – Pipelining [adds a Log table]
  73. Modern Data Architecture – Pipelining [adds CDC: copy, Parse, Batch]
  74. Modern Data Architecture – Pipelining [adds ETL, cp and Batch ETL steps]
  75. Modern Data Architecture – Pipelining [adds the DWH, with a load step into its ODS]
  76. Modern Data Architecture – Pipelining [adds DDS to the DWH]
  77. Modern Data Architecture – Pipelining [adds DataMart to the DWH]
  78. Modern Data Architecture – Pipelining [adds BI, reading the DataMart over JDBC]
  79. Modern Data Architecture – Pipelining [same pipeline with the SOAP and load labels removed]
  80. Modern Data Architecture – Pipelining [adds an API and a Queue, connected to ETL and Batch ETL steps]
  81. Modern Data Architecture – Pipelining [adds another ETL and load step]
  82. Modern Data Architecture – Pipelining [adds Batch App, ETL Batch and load steps]
  83. Modern Data Architecture – Pipelining [adds Hadoop: STG, Batch App, HDFS, SQL On Hadoop]
  84. Modern Data Architecture – Pipelining [adds an RTI App]
  85. Modern Data Architecture – Pipelining [adds Replicate]
  86. Modern Data Architecture – Pipelining [diagram: the architecture from slide 61]
  87. Modern Data Architecture – Pipelining [resulting architecture: on both sites, FE/BE systems and DBMS instances feed, via CDC and ELT, a DWH, Hadoop and an In-Memory Data Store serving BI, OLAP, Data Mining and RTBI; the sites are kept in sync by a Replication Queue and by WAL Replication, each 3-5 minutes late]
  88. 88. 88Pivotal Confidential–Internal Use Only Pivotal and Modern Data Architecture BI Pivotal Cloud Foundry HTTP FE … App App App Queue BE … App App App Pivotal GemFire App Spring XD Streaming Streaming Data Pivotal HD Pivotal HAWQ ES DDS DataMart Pivotal Greenplum Data MartPostgreSQL SP Table ODS ETL ETL
  89. 89. 89Pivotal Confidential–Internal Use Only Pivotal and Modern Data Architecture BI HTTP Pivotal GemFire App Spring XD Streaming Streaming Data Pivotal HD Pivotal HAWQ ES DDS DataMart Pivotal Greenplum Data MartPostgreSQL SP Table ODS ETL ETL Pivotal Cloud Foundry FE … App App App Queue BE … App App App  Pivotal Labs – agile software development for next-generation applications  Pivotal Cloud Foundry – PaaS for customer applications  RabbitMQ – distributed message queue service on top of PCF  Spring IO – foundation platform for modern applications
  90. Pivotal and Modern Data Architecture [diagram from slide 88]
      - Pivotal GemFire and Apache Geode (incubating) – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
  91. Pivotal and Modern Data Architecture [diagram from slide 88]
      - Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting
  92. Pivotal and Modern Data Architecture [diagram from slide 88]
      - Pivotal HD – leading Hadoop distribution based on ODP (Open Data Platform)
      - Pivotal HAWQ and Apache HAWQ (incubating) – bringing the power of MPP to the Hadoop cluster; best-in-class SQL-on-Hadoop solution
      - Apache Spark – component of the Pivotal HD distribution; a modern framework for distributed data processing
  93. Pivotal and Modern Data Architecture [diagram from slide 88]
      - Pivotal PostgreSQL – open source distribution of PostgreSQL, commercially supported by Pivotal
  94. Pivotal and Modern Data Architecture [diagram from slide 88]
      - Pivotal Greenplum – leading analytical MPP database; foundation for enterprise data warehousing systems and advanced analytics
  95. Pivotal and Modern Data Architecture: Data Lake [diagram from slide 88]
  96. Pivotal and Modern Data Architecture: Lambda Architecture [diagram from slide 88]
  97. Pivotal and Modern Data Architecture: Pipelining [diagram from slide 88]
  98. Pivotal and Modern Data Architecture [diagram from slide 88]
  99. Questions?
  100. BUILT FOR THE SPEED OF BUSINESS
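Slides 85 to 87 (and slide 97) describe pipelining: rather than waiting for nightly ETL, changes captured from the OLTP databases (the log table, CDC copy, parse and batch load path) are pushed both into an in-memory data store for real-time use and into staging for bulk loads into the DWH. The Java sketch below illustrates that fan-out under loudly stated assumptions: the `OP|key|value` change-log format, the classes `ChangeEvent`, `CdcPipeline` and `CdcDemo`, and the use of in-process collections in place of a real queue, in-memory grid and warehouse are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One change parsed from a hypothetical "OP|key|value" change log. */
final class ChangeEvent {
    final String op;    // INSERT, UPDATE or DELETE
    final String key;
    final String value; // null for DELETE

    ChangeEvent(String op, String key, String value) {
        this.op = op;
        this.key = key;
        this.value = value;
    }

    static ChangeEvent parse(String line) {
        String[] parts = line.split("\\|", -1); // -1 keeps a trailing empty value
        if (parts.length != 3) {
            throw new IllegalArgumentException("bad change record: " + line);
        }
        String value = parts[2].isEmpty() ? null : parts[2];
        return new ChangeEvent(parts[0], parts[1], value);
    }
}

/** Fans every captured change out to a real-time path and a batch path. */
final class CdcPipeline {
    final Map<String, String> inMemoryStore = new HashMap<>();        // real-time reads
    final List<ChangeEvent> staging = new ArrayList<>();              // STG awaiting bulk load
    final List<List<ChangeEvent>> warehouseLoads = new ArrayList<>(); // stands in for the DWH
    private final int batchSize;

    CdcPipeline(int batchSize) {
        this.batchSize = batchSize;
    }

    void onLogLine(String line) {
        ChangeEvent e = ChangeEvent.parse(line);
        // Real-time path: apply the change as soon as it is captured.
        if (e.op.equals("DELETE")) {
            inMemoryStore.remove(e.key);
        } else {
            inMemoryStore.put(e.key, e.value);
        }
        // Batch path: buffer changes and load them into the warehouse in bulk.
        staging.add(e);
        if (staging.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (!staging.isEmpty()) {
            warehouseLoads.add(new ArrayList<>(staging));
            staging.clear();
        }
    }
}

class CdcDemo {
    public static void main(String[] args) {
        CdcPipeline pipeline = new CdcPipeline(2);
        pipeline.onLogLine("INSERT|1|alice");
        pipeline.onLogLine("UPDATE|1|alice2"); // second change triggers a bulk load
        pipeline.onLogLine("DELETE|1|");
        System.out.println(pipeline.inMemoryStore);         // {}
        System.out.println(pipeline.warehouseLoads.size()); // 1
        System.out.println(pipeline.staging.size());        // 1
    }
}
```

The design point is that both consumers see the same ordered change stream: the in-memory store is current within one event, while the warehouse trails by at most one batch.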
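Slide 90 presents GemFire/Geode as an in-memory data grid. A common idea behind such grids is partitioning: each key hashes to a bucket and each bucket is owned by one member, so a read or write goes straight to the owning node's memory. The Java sketch below is a generic, single-process illustration of that idea only; it does not use the GemFire or Geode API, and `PartitionedStore` and `GridDemo` are hypothetical names. Real grids typically also keep redundant bucket copies and rebalance buckets when members join or leave, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical partitioned key-value store; each map stands in for one member's heap. */
final class PartitionedStore {
    private final int numBuckets;
    private final List<Map<String, String>> members = new ArrayList<>();

    PartitionedStore(int numMembers, int numBuckets) {
        if (numMembers < 1 || numBuckets < numMembers) {
            throw new IllegalArgumentException("need >= 1 member and >= 1 bucket per member");
        }
        this.numBuckets = numBuckets;
        for (int i = 0; i < numMembers; i++) {
            members.add(new HashMap<>());
        }
    }

    /** Stable key -> bucket mapping (floorMod keeps negative hash codes in range). */
    int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    /** Static round-robin bucket ownership; no redundancy or rebalancing here. */
    int memberOf(String key) {
        return bucketOf(key) % members.size();
    }

    void put(String key, String value) {
        members.get(memberOf(key)).put(key, value);
    }

    String get(String key) {
        return members.get(memberOf(key)).get(key);
    }

    int sizeOnMember(int member) {
        return members.get(member).size();
    }
}

class GridDemo {
    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(3, 12);
        for (int i = 0; i < 1000; i++) {
            store.put("order-" + i, "payload-" + i);
        }
        System.out.println(store.get("order-42")); // payload-42
        for (int m = 0; m < 3; m++) {
            System.out.println("member " + m + ": " + store.sizeOnMember(m) + " keys");
        }
    }
}
```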
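Slides 92 and 94 rely on MPP (massively parallel processing), the execution model behind Greenplum and HAWQ. The heart of MPP aggregation is scatter-gather: every segment computes a partial aggregate over its own shard, and the master merges the small partial results. The Java sketch below computes `AVG(amount) GROUP BY region` that way, with a thread pool standing in for segment hosts; the `MppAvg` and `MppDemo` classes and the (region, amount) row shape are assumptions for illustration, not Greenplum or HAWQ internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical MPP-style AVG(amount) GROUP BY region over sharded rows. */
final class MppAvg {
    /** Runs on one "segment": per-region partial state {sum, count}. */
    static Map<String, long[]> partial(List<Map.Entry<String, Long>> shard) {
        Map<String, long[]> acc = new HashMap<>();
        for (Map.Entry<String, Long> row : shard) {
            long[] s = acc.computeIfAbsent(row.getKey(), k -> new long[2]);
            s[0] += row.getValue();
            s[1] += 1;
        }
        return acc;
    }

    /** Runs on the "master": scatter work to segments, then gather and merge partials. */
    static Map<String, Double> avgByRegion(List<List<Map.Entry<String, Long>>> segments) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, segments.size()));
        try {
            List<Future<Map<String, long[]>>> futures = new ArrayList<>();
            for (List<Map.Entry<String, Long>> shard : segments) {
                futures.add(pool.submit(() -> partial(shard))); // scatter
            }
            Map<String, long[]> merged = new HashMap<>();
            for (Future<Map<String, long[]>> f : futures) { // gather
                for (Map.Entry<String, long[]> e : f.get().entrySet()) {
                    long[] m = merged.computeIfAbsent(e.getKey(), k -> new long[2]);
                    m[0] += e.getValue()[0];
                    m[1] += e.getValue()[1];
                }
            }
            Map<String, Double> result = new HashMap<>();
            merged.forEach((region, s) -> result.put(region, (double) s[0] / s[1]));
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

class MppDemo {
    public static void main(String[] args) {
        List<List<Map.Entry<String, Long>>> segments = List.of(
                List.of(Map.entry("east", 10L), Map.entry("west", 4L)),
                List.of(Map.entry("east", 20L), Map.entry("west", 8L)));
        System.out.println(MppAvg.avgByRegion(segments).get("east")); // 15.0
    }
}
```

Segments return (sum, count) rather than local averages because averaging per-segment averages is wrong whenever shards hold different numbers of rows; only the tiny partial maps cross the "network".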
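Slide 95 maps the Data Lake approach onto these components. A data lake is commonly characterized by schema-on-read: data lands in raw form without upfront modeling, and each consumer applies its own structure when reading. The Java sketch below is a hypothetical illustration of that property: `RawZone` keeps comma-separated event lines exactly as ingested, and two different read-time views are applied to the same raw data. The `ts,user,action[,amount]` layout and the class names `RawZone` and `LakeDemo` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical raw zone of a data lake: lines are stored exactly as they arrive. */
final class RawZone {
    private final List<String> raw = new ArrayList<>();

    /** Ingest performs no parsing or validation (no schema on write). */
    void ingest(String line) {
        raw.add(line);
    }

    /** Read-time view #1: purchase events as (user, amount); other lines are skipped. */
    List<Map.Entry<String, Long>> purchases() {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String line : raw) {
            parsePurchase(line).ifPresent(out::add);
        }
        return out;
    }

    /** Read-time view #2: only the action column matters. */
    long countAction(String action) {
        long n = 0;
        for (String line : raw) {
            String[] f = line.split(",");
            if (f.length >= 3 && f[2].equals(action)) {
                n++;
            }
        }
        return n;
    }

    private static Optional<Map.Entry<String, Long>> parsePurchase(String line) {
        String[] f = line.split(",");
        if (f.length != 4 || !f[2].equals("purchase")) {
            return Optional.empty();
        }
        try {
            return Optional.of(Map.entry(f[1], Long.parseLong(f[3])));
        } catch (NumberFormatException e) {
            return Optional.empty(); // malformed rows are ignored by this view only
        }
    }
}

class LakeDemo {
    public static void main(String[] args) {
        RawZone lake = new RawZone();
        lake.ingest("t1,alice,purchase,500");
        lake.ingest("t2,bob,click");
        lake.ingest("t3,bob,purchase,oops"); // kept raw; skipped by the purchase view
        lake.ingest("t4,bob,purchase,250");
        System.out.println(lake.purchases().size());      // 2
        System.out.println(lake.countAction("purchase")); // 3
    }
}
```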
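Slide 96 maps the Lambda Architecture onto these components. In that pattern a batch layer periodically recomputes views from the complete, append-only master dataset, a speed layer incrementally maintains a view of the events the last batch run has not yet covered, and a serving layer merges both at query time. The single-threaded Java sketch below counts page views this way; `LambdaCounts` and `LambdaDemo` are hypothetical names, and in-memory collections stand in for HDFS, the batch engine and the real-time store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Lambda Architecture computing page-view counts per page. */
final class LambdaCounts {
    private final List<String> masterDataset = new ArrayList<>(); // append-only raw events
    private Map<String, Long> batchView = new HashMap<>();
    private Map<String, Long> realtimeView = new HashMap<>();

    /** New events go to both the master dataset and the speed layer. */
    void onEvent(String page) {
        masterDataset.add(page);
        realtimeView.merge(page, 1L, Long::sum);
    }

    /** Batch layer: recompute from scratch over all data, then reset the speed layer. */
    void runBatch() {
        Map<String, Long> fresh = new HashMap<>();
        for (String page : masterDataset) {
            fresh.merge(page, 1L, Long::sum);
        }
        batchView = fresh;
        realtimeView = new HashMap<>(); // these events are now covered by the batch view
    }

    /** Serving layer: merge batch and real-time views at query time. */
    long query(String page) {
        return batchView.getOrDefault(page, 0L) + realtimeView.getOrDefault(page, 0L);
    }
}

class LambdaDemo {
    public static void main(String[] args) {
        LambdaCounts views = new LambdaCounts();
        views.onEvent("/home");
        views.onEvent("/home");
        views.runBatch();                         // batch view now holds /home=2
        views.onEvent("/home");                   // only in the real-time view so far
        System.out.println(views.query("/home")); // 3
    }
}
```

Resetting the speed layer wholesale is only correct here because nothing arrives while `runBatch` runs; a production system would compute the batch over a snapshot and discard only the speed-layer data that snapshot covers.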
