Successfully reported this slideshow.
Your SlideShare is downloading. ×

s2gx2015 who needs batch

s2gx2015 who needs batch

Download to read offline

Stream processing gets quite a bit of the press these days. Projects like Spark, Samza/Kafka, Google Cloud Dataflow, Apache Flink, and Amazon Kinesis bring the question of can we accomplish what we need without a batch component. This talk will take a hard look at if we still need batch these days, what that might look like in a modern development platform, and where Spring fits into it all.

Stream processing gets quite a bit of the press these days. Projects like Spark, Samza/Kafka, Google Cloud Dataflow, Apache Flink, and Amazon Kinesis bring the question of can we accomplish what we need without a batch component. This talk will take a hard look at if we still need batch these days, what that might look like in a modern development platform, and where Spring fits into it all.

More Related Content

s2gx2015 who needs batch

  1. 1. SPRINGONE2GX WASHINGTON, DC Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Who Needs Batch? By Michael Minella and Gunnar Hillert @michaelminella, @ghillert
  2. 2. Michael Minella Twitter: @michaelminella Podcast: http://javaOffHeap.com or @OffHeap Website: http://spring.io Gunnar Hillert Twitter: @ghillert Website: http://blog.hillert.com Conference: http://devnexus.com
  3. 3. https://github.com/ghillert/s1-2gx2015-who-needs-batch
  4. 4. W E H AV E A N ANNOUNCEMENT
  5. 5. WE HAVE A DISEASE
  6. 6. OBJECTIVIUS SHINIUM SYNDROMUS
  7. 7. Shiny Object Syndrom SHINY OBJECT SYNDROME
  8. 8. IT MAKES US INNOVATE
  9. 9. ROAD TO RECOVERY
  10. 10. COOL KIDS ARE DOING STREAM PROCESSING
  11. 11. Stream Analytics Cloud Dataflow
  12. 12. LET’S TAKE A CLOSER LOOK
  13. 13. BATCH AND STREAM DEFINITIONS
  14. 14. LOOK AT COMMON USE CASES
  15. 15. CONCLUSIONS
  16. 16. DEFINITION: STREAM PROCESSING
  17. 17. A system for processing an unbounded set of data in an asynchronous manor.
  18. 18. DEFINITION: BATCH PROCESSING
  19. 19. DEPENDS ON WHO YOU ASK
  20. 20. “Batch … applications run … as special cases of stream processing applications” - Apache Flink Documentation
  21. 21. Batch processing is the processing of a bounded data set without interruption or interaction
  22. 22. KEY DIFFERENCES
  23. 23. 1 BOUNDED VS UNBOUNDED
  24. 24. 2 SYNCHRONOUS VS ASYNCHRONOUS
  25. 25. 3 STATEFUL VS STATELESS
  26. 26. STATE OF STREAMING
  27. 27. Stream Analytics Cloud Dataflow
  28. 28. Stream Analytics Cloud Dataflow
  29. 29. STATE OF STREAMING
  30. 30. JSR-352
  31. 31. WHY THE END OF BATCH?
  32. 32. LATENCY
  33. 33. WHAT DOES THE END OF BATCH MEAN?
  34. 34. LIMIT LATENCY
  35. 35. “ENTERPRISE GRADE”
  36. 36. JSR-352
  37. 37. LOOK AT COMMON USE CASES
  38. 38. E.T.L.
  39. 39. DATABASE TO HDFS
  40. 40. DEMO
  41. 41. STREAMING CAN BE USEFUL IN INGESTION USE CASES
  42. 42. TWITTER TO HDFS
  43. 43. DATA SCIENCE
  44. 44. BATCH TRAINING PREDICTIVE MODELS
  45. 45. CONNECTED CAR
  46. 46. transformerhttp filter hdfs type-transformer python Gemfire REST Hadoop Gemfire
  47. 47. STREAMING APROXIMIATIONS EXIST ≈
  48. 48. BREADTH AND ACCURACY DON’T MATCH BATCH ≠
  49. 49. WORKFLOW ORCHESTRATION
  50. 50. forEachDir start getDirectoryInfo join dirSubProcss dirSubProcss…
  51. 51. Condition 1 start getDirAgeAndNumberOfFiles Ingest Condition 2 sendReminderEmail ArchiveEnd
  52. 52. DEMO
  53. 53. HOW IS THIS DONE VIA STREAMING?
  54. 54. NON-INTERACTIVE PROCESSING
  55. 55. SAME AS ETL PROCESSING
  56. 56. PORT SCANNER
  57. 57. DEMO
  58. 58. STREAMING DOESN’T MAKE SENSE
  59. 59. STREAMING DOES MAKE SENSE
  60. 60. WHERE LATENCY IS A PRIORITY
  61. 61. DATA LOSS IS ACCEPTABLE
  62. 62. UNBOUNDED DATASET
  63. 63. BUT DOES NOT IN OTHER USE CASES
  64. 64. HIGHLY COMPLEX
  65. 65. ERROR HANDLING NOT AS ROBUST
  66. 66. DATA GUARANTEES NOT AS ROBUST
  67. 67. WHERE BATCH IS BETTER
  68. 68. FINITE DATASET
  69. 69. ROBUST ERROR HANDLING
  70. 70. RESOURCES ARE A CONCERN
  71. 71. NOT EITHER OR
  72. 72. COMBINING THE TWO PROVIDES THE MOST POWER
  73. 73. E.T.L.v2
  74. 74. mailhttp mysql2hdfs Database to HDFS v2
  75. 75. DEMO
  76. 76. BATCH HAS LASTED BECAUSE IT MAKSE SENSE
  77. 77. SPRING XD/DATA FLOW BEST OF BOTH WORLDS
  78. 78. ENTERPRISE GRADE STREAM PROCESSING
  79. 79. BEST BATCH OPTION ON THE JVM
  80. 80. Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ 88 Check out Spring Cloud Data Flow on https://spring.io Apache Spark for Big Data Processing – Salon N-P Up next! High Performance Stream Processing – Salon N-P 8:30AM Tomorrow Spring Integration Extensions Ecosystem – Here! 12:45 Tomorrow Learn More. Stay Connected. @springcentral Spring.io/video

×