
Cloud-Native Patterns for Data-Intensive Applications


Are you interested in learning how to schedule batch jobs in container runtimes?
Maybe you’re wondering how to apply continuous delivery in practice for data-intensive applications? Perhaps you’re looking for an orchestration tool for data pipelines?
Questions like these are common, so rest assured that you’re not alone.

In this webinar, we'll cover the recent feature improvements in Spring Cloud Data Flow. More specifically, we'll discuss data processing use cases and how these improvements simplify the overall orchestration experience in cloud runtimes like Cloud Foundry and Kubernetes.

Please join us and be part of the community discussion!



  1. Cloud-Native Patterns for Data-Intensive Applications. Mark Pollack (@markpollack), Sabby Anandan (@sabbyanandan). August 2018.
  2. Agenda ■ Data-intensive Applications ■ What is Spring Cloud Data Flow? ■ Cloud-Native Patterns for Data ■ Use Cases ■ Q+A
  3. Data in the Enterprise: what we see in the industry. Digital transformation is the new norm. DevOps practices play a critical role in the transition to a data-driven business. Event-driven architectures are on the rise, with data at their core. ETL is not going away, but its development and operating model continues to evolve. Machine learning has brought unprecedented capabilities to the engineering domain, and making it easily accessible is on an upswing.
  4. “We call an application data-intensive if data is its primary challenge—the quantity of data, the complexity of data, or the speed at which it is changing.”
  6. You have data from disparate systems
  7. You have data of different types
  8. You have data of varying speed, size, and shape
  9. You have data that evolves
  10. WHAT IS SPRING CLOUD DATA FLOW? A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
  11. WHAT IS SPRING CLOUD DATA FLOW? A toolkit for building data integration, real-time streaming, and batch data processing pipelines. ■ Data pipelines consist of Spring Boot apps, using Spring Cloud Stream for event streaming or Spring Cloud Task for batch processing. ■ Ready for data integration with more than 60 out-of-the-box streaming and batch apps. ■ DSL, GUI, and REST APIs to build and orchestrate data pipelines onto platforms like Kubernetes and Cloud Foundry. ■ Continuous delivery for streaming data pipelines using Spring Cloud Skipper. ■ Cron-job scheduling for batch data pipelines using Spring Cloud Scheduler.
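To make the "Spring Boot apps" point concrete, here is a minimal sketch of a Spring Cloud Stream processor that could sit in the middle of a pipeline such as http | transform | log. It is illustrative, not code from the webinar; it uses the annotation-based (imperative) programming model of that era, and the class name is made up.

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.annotation.StreamListener;
    import org.springframework.cloud.stream.messaging.Processor;
    import org.springframework.messaging.handler.annotation.SendTo;

    // A self-contained Boot app: the configured binder (Kafka, RabbitMQ, ...)
    // wires the input and output channels to whatever apps sit next to it
    // in the pipeline.
    @SpringBootApplication
    @EnableBinding(Processor.class)
    public class TransformProcessorApplication {

        public static void main(String[] args) {
            SpringApplication.run(TransformProcessorApplication.class, args);
        }

        // Receive a message on the input channel and send the transformed
        // payload to the output channel.
        @StreamListener(Processor.INPUT)
        @SendTo(Processor.OUTPUT)
        public String transform(String payload) {
            return payload.toUpperCase();
        }
    }

Data Flow registers an app like this once and then composes it with sources and sinks through the DSL, GUI, or REST API.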
  16. Common denominator = Spring Boot, across RESTful, streaming, and batch workloads. ● Spring MVC / WebFlux (RESTful): build production-grade RESTful apps on the JVM. Features: separation of concerns to support modular architecture; built-in RESTful components; pluggable view resolvers. ● Spring Cloud Stream (streaming): build highly scalable event-driven microservices connected with shared messaging systems. Features: imperative vs. functional programming styles; partitioning and consumer groups; pluggable message bus abstraction. ● Spring Cloud Task (batch): build short-lived microservices to perform data processing locally or in the cloud. Features: task execution history; pluggable task repository; remote partitioning, checkpointing, and restartability. Opportunities to consolidate: development practices | test infrastructure | CI/CD tooling and automation.
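The "imperative vs. functional programming styles" bullet refers to the two ways a Spring Cloud Stream handler can be written. As a rough sketch, the transform shown earlier can also be expressed as a plain java.util.function.Function bean; first-class support for this functional style arrived in later Spring Cloud Stream releases, and the names here are again illustrative.

    import java.util.function.Function;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    // Functional style: no channel annotations; the framework binds the
    // function's input and output to the messaging middleware based on
    // configuration.
    @SpringBootApplication
    public class FunctionalTransformApplication {

        public static void main(String[] args) {
            SpringApplication.run(FunctionalTransformApplication.class, args);
        }

        @Bean
        public Function<String, String> transform() {
            return payload -> payload.toUpperCase();
        }
    }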
  17. Runtime Abstraction
  18. Platform Implementations: Local / Dev, Cloud Foundry, and Kubernetes.
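Under the hood, the runtime abstraction on the previous two slides is the Spring Cloud Deployer SPI: Data Flow codes against a deployer interface, and the local, Cloud Foundry, and Kubernetes modules implement it. The sketch below shows what a caller of that SPI roughly looks like; the app name, property, and jar path are invented for illustration.

    import java.util.Collections;

    import org.springframework.cloud.deployer.spi.app.AppDeployer;
    import org.springframework.cloud.deployer.spi.core.AppDefinition;
    import org.springframework.cloud.deployer.spi.core.AppDeploymentRequest;
    import org.springframework.core.io.FileSystemResource;
    import org.springframework.core.io.Resource;

    public class DeployerSketch {

        // The same request works against the local, Cloud Foundry, or Kubernetes
        // deployer; only the injected AppDeployer implementation changes.
        public String deployHttpSource(AppDeployer deployer) {
            AppDefinition definition =
                    new AppDefinition("http-source", Collections.singletonMap("server.port", "9000"));
            Resource appArtifact = new FileSystemResource("/tmp/http-source.jar"); // hypothetical path
            AppDeploymentRequest request = new AppDeploymentRequest(definition, appArtifact);
            return deployer.deploy(request); // returns a platform-specific deployment id
        }
    }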
  19. DEMO 1: File Ingest and ETL
  20. DEMO 2: Batch Jobs and Scheduler
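For context on the batch demo: in this model a schedulable batch job is just a Spring Boot app that combines Spring Batch with Spring Cloud Task, so each execution is recorded in the task repository and the app can be launched or scheduled by Data Flow. A minimal sketch using Spring Batch 4 era APIs, with invented job and step names:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;
    import org.springframework.context.annotation.Bean;

    // A short-lived app: runs one job, records the execution, and exits.
    @SpringBootApplication
    @EnableTask
    @EnableBatchProcessing
    public class FileIngestJobApplication {

        public static void main(String[] args) {
            SpringApplication.run(FileIngestJobApplication.class, args);
        }

        @Bean
        public Step ingestStep(StepBuilderFactory steps) {
            return steps.get("ingestStep")
                    .tasklet((contribution, chunkContext) -> {
                        // stand-in for the real file-ingest logic
                        System.out.println("Ingesting files...");
                        return RepeatStatus.FINISHED;
                    })
                    .build();
        }

        @Bean
        public Job ingestJob(JobBuilderFactory jobs, Step ingestStep) {
            return jobs.get("ingestJob").start(ingestStep).build();
        }
    }

Because the app exits once the job completes, it fits the cron-style, schedule-driven model shown on the next slide.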
  21. Scheduling: Spring Cloud Data Flow delegates to a platform scheduling agent, which schedules, unschedules, and executes tasks.
  22. Message Binder Abstraction
  23. Pluggable Binder Implementations: RabbitMQ, Apache Kafka, Google Pub/Sub, Amazon Kinesis, Solace, and Azure Event Hubs. Opportunities: same code, same tests, drop-in replacement across a variety of message brokers.
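The "same code, same tests" point rests on the binder abstraction: application code targets channels, and the broker is chosen by swapping a dependency such as spring-cloud-stream-binder-rabbit for spring-cloud-stream-binder-kafka. Below is a hedged sketch of a binder-agnostic test using the spring-cloud-stream-test-support in-memory binder, written against the illustrative processor from the earlier sketch (JUnit 4 and AssertJ assumed).

    import static org.assertj.core.api.Assertions.assertThat;

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.cloud.stream.messaging.Processor;
    import org.springframework.cloud.stream.test.binder.MessageCollector;
    import org.springframework.integration.support.MessageBuilder;
    import org.springframework.messaging.Message;
    import org.springframework.test.context.junit4.SpringRunner;

    // Runs against the in-memory test binder, so the same test passes no matter
    // which binder implementation is used in production.
    @RunWith(SpringRunner.class)
    @SpringBootTest(classes = TransformProcessorApplication.class)
    public class TransformProcessorApplicationTests {

        @Autowired
        private Processor processor;

        @Autowired
        private MessageCollector collector;

        @Test
        public void transformsPayload() {
            processor.input().send(MessageBuilder.withPayload("hello").build());
            Message<?> out = collector.forChannel(processor.output()).poll();
            assertThat(out.getPayload()).isEqualTo("HELLO");
        }
    }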
  24. DEMO 3: Stateful Streaming
  25. Top-5 states with “new users” over a 30-second window
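For orientation, the 30-second windowed count behind this demo can be approximated with the plain Kafka Streams DSL roughly as below. This is a sketch rather than the webinar's code: the topic and store names are invented, exact method signatures vary a little across Kafka Streams versions, and ranking the top five would be a further aggregation on top of these windowed counts.

    import java.time.Duration;
    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Printed;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.state.WindowStore;

    public class NewUsersPerStateSketch {

        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // "new user" events keyed by state (topic name is an assumption)
            KStream<String, String> newUsers = builder.stream("new-users");

            // Count events per state over tumbling 30-second windows and keep the
            // result in a named, queryable state store for Interactive Queries.
            KTable<Windowed<String>, Long> countsByState = newUsers
                    .groupByKey()
                    .windowedBy(TimeWindows.of(Duration.ofSeconds(30)))
                    .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("new-users-per-state"));

            countsByState.toStream().print(Printed.toSysOut());

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "new-users-per-state-sketch");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            new KafkaStreams(builder.build(), props).start();
        }
    }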
  26. Real-time Streaming Analytics
  27. Cloud-Native Patterns for Data
  28. What do we mean by “cloud-native” patterns for data? ■ Self-contained applications; no app server or external runtime (web server or ESB). ■ Deployment and governance handled by platforms like Cloud Foundry or Kubernetes. ■ Many data-centric use cases can be handled by self-contained apps, leveraging existing knowledge of the runtime platforms and the supporting ecosystem.
  29. ● Maintainability: build, test, and iterate as frequently as needed; CI/CD as first-class thinking for data-centric workloads; data processing guarantees in the event of rolling upgrades. ● Scalability: auto-scale up or down based on throughput demands; linear throughput characteristics as you scale applications; scale back to the desired shape when demand fades away. ● Portability: the same app runs locally on a laptop or on any cloud platform with a JVM. ● Reliability: focus on the business logic and on unit, integration, and acceptance tests; depend on the platform runtime (Kubernetes or Cloud Foundry) for reliability and resiliency guarantees.
  30. DEMO 4: CI/CD for Streaming Apps
  31. Delivery pipeline: Build (package, unit test, integration test) → Candidate (stage, end-to-end test) → Deploy to Prod, with automatic promotion between stages.
  32. Use Cases
  33. Modernize monolithic ETL workloads: move SQL scripts, stored procedures, and in-house bash scripts to a cloud-native architecture. Small, incremental releases; continuous delivery is the focus.
  34. Replatform data integration: migrate legacy integration applications from app servers on bare metal to microservices running on cloud platforms.
  35. From batch to event-driven and streaming architectures: treat events as first-class citizens in the enterprise. Common practices include domain-driven design, event sourcing, and CQRS.
  36. File ingest and data processing: ingest files from NFS, S3, and other volume mounts. Doing this in container-based runtimes brings many operational benefits, including the ability to auto-scale.
  37. Stateful stream processing: stateful applications built with the Kafka Streams API, including KTable and Interactive Queries, for real-time streaming analytics and rapid dashboarding.
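The Interactive Queries mentioned here are what turn a windowed state store, like the one in the earlier Kafka Streams sketch, into a backend for rapid dashboarding: the running application exposes its local state for reads. A rough, version-dependent sketch (the store name carries over from that example, and the store-lookup API has changed in newer Kafka Streams releases):

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.KeyValueIterator;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyWindowStore;

    public class InteractiveQuerySketch {

        // Reads the windowed counts directly from the running streams instance,
        // e.g. from a REST controller that backs a dashboard.
        public void printCounts(KafkaStreams streams) {
            ReadOnlyWindowStore<String, Long> store =
                    streams.store("new-users-per-state", QueryableStoreTypes.windowStore());
            try (KeyValueIterator<Windowed<String>, Long> all = store.all()) {
                while (all.hasNext()) {
                    KeyValue<Windowed<String>, Long> entry = all.next();
                    System.out.println(entry.key.key() + " -> " + entry.value);
                }
            }
        }
    }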
  38. Scheduled batch jobs: whether for predictive model training, massive file movement, or classic data migration, batch jobs are typically schedule-driven.
  39. Real-time predictive analytics: machine learning and model scoring with TensorFlow, PMML, or Python-based models to perform real-time predictions.
  40. Next ■ Function chaining through Spring Cloud Function and Spring Cloud Stream. ■ Deploy apps with multiple input/output channels. ■ Audit trails: who did what, and when? ■ Task-launch and rate limiting. ■ Spring Boot 2.x / Java 9, 10, 11.
  41. Q+A
