
Dataservices: Processing Big Data the Microservice Way


O'Reilly Software Architecture Conference 2018, New York (USA): Talk by Mario-Leander Reimer (@LeanderReimer, Principal Software Architect at QAware)

Big data processing, microservices, and cloud-native technology are a match made in computing heaven: microservices can be used to build a flexible, scalable, distributed system of loosely coupled data processing tasks, called data services.
Mario-Leander Reimer explores key JEE technologies that can be used to build JEE-powered data services and walks you through implementing the individual data processing tasks of a simplified showcase application. You'll then deploy and orchestrate the individual data services using OpenShift, illustrating the scalability of the overall processing pipeline. The context and content are taken from a real-world project for a major German car manufacturer, implementing a microservices-based processing pipeline that uses car-related event data (sensor data, traffic events, and other real-time data) for a traffic information management and route optimization system.
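
To make the idea of loosely coupled data processing tasks concrete, here is a minimal in-process sketch in plain Java. It is not taken from the talk: the class name `PipelineSketch`, the `END` marker, and the use of `BlockingQueue` in place of a real message broker (JMS, MQTT, Kafka) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** In-process stand-in for a source -> processor -> sink chain of data services. */
public class PipelineSketch {

    // Hypothetical end-of-stream marker; a broker-based pipeline would not need one.
    private static final String END = "__END__";

    /** Runs the three stages on separate threads and returns what the sink collected. */
    public static List<String> run(List<String> events) {
        BlockingQueue<String> rawEvents = new LinkedBlockingQueue<>();
        BlockingQueue<String> processedEvents = new LinkedBlockingQueue<>();
        List<String> sinkStore = new ArrayList<>();

        // Source: publishes every incoming event, then signals the end of the stream.
        Thread source = new Thread(() -> {
            events.forEach(rawEvents::add);
            rawEvents.add(END);
        });

        // Processor: knows only its input and output queue, not the source or the sink.
        Thread processor = new Thread(() -> {
            try {
                for (String e = rawEvents.take(); !e.equals(END); e = rawEvents.take()) {
                    processedEvents.add(e.trim().toUpperCase(Locale.ROOT));
                }
                processedEvents.add(END);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        // Sink: stores whatever arrives on the processed queue.
        Thread sink = new Thread(() -> {
            try {
                for (String e = processedEvents.take(); !e.equals(END); e = processedEvents.take()) {
                    sinkStore.add(e);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        processor.start();
        sink.start();
        try {
            source.join();
            processor.join();
            sink.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", ex);
        }
        return sinkStore;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(" speed=87 ", "brake=on")));
    }
}
```

Because each stage touches only its queues, a stage can be replaced or run in several copies without changing the others, which is the property the talk relies on when it scales individual data services.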

Published in: Data & Analytics

  1. Dataservices: Processing Big Data the Microservice Way. Mario-Leander Reimer (@LeanderReimer). New York, Feb 27, 2018
  2. Mario-Leander Reimer, Chief Technologist, QAware GmbH. Contact details: Mail: , Twitter: @LeanderReimer, Github: . Developer && Architect; 20+ years of experience; #CloudNativeNerd; Open Source Enthusiast
  3. We want to go to the cloud …
  4. The System (diagram labels: Device, Traffic Data, Historical Data, Map Data, Vehicle Data)
  5. The system. The data center.
  6. Enter Dataservices. { Big + Fast + Smart } Data Microservices
  7. BIG DATA: All things distributed (distributed processing, distributed databases). FAST DATA: Low latency and high throughput (stream processing, messaging, event-driven). SMART DATA: Data to information (machine (deep) learning, advanced statistics, natural language processing).
  9. Components All Along the Software Lifecycle.
        • DESIGN (Design Components): complexity unit, data integrity unit, coherent and cohesive features unit, decoupled unit.
        • BUILD (Dev Components, 1:1): planning unit, team assignment unit, knowledge unit, development unit, integration unit.
        • RUN (Ops Components, n:1, NEW!): release unit, deployment unit, runtime unit (crash, slow-down, access), scaling unit.
  10. Decomposition Trade-Offs. Dev Components to Ops Components: ?:1. Levels: System, Subsystems, Components, Services. Good starting point. Granularity: Microservices, Nanoservices, Macroservices, Monolith.
        • Pros: more flexible to scale; runtime isolation (crash, slow-down, …); independent releases, deployments, teams; higher utilization possible.
        • Cons: distribution debt (latency); increasing infrastructure complexity; increasing troubleshooting complexity; increasing integration complexity.
  11. We are here. We need to go here.
  12. Decomposing the existing monolith was realistic.
  13. The basic idea: Input – Processing – Output. Data processing using a graph of microservices. Diagram: Sources (I1), Processors (P1 … Pn), Sinks (O1); each node is a microservice (aka dataservice), connected by message queues.
  14. Possible messaging patterns for reliable and flexible communication between dataservices (P = producer, C = consumer, Q = queue): Message Passing (P1, Q1, C1); Work Queue (P1, Q1, C1 … Cn); Publish/Subscribe (P1, Q1 … Qn, C1 … Cn); Remote Procedure Call (P1, C1, Q1, Q2).
  16. Some Open Source Dataservice Platforms.
        • Java EE 7 / 8: Standardized API with several open source implementations. Microservices: Java EE micro container. Messaging: JMS, MQTT, Kafka, SQS. Platforms: Docker, Kubernetes, OpenShift, DC/OS.
        • Kafka Streams: Stream processing tightly integrated with Kafka. Microservices: main(). Messaging: Kafka, Kafka Streams. Platforms: any platform Kafka runs on.
        • Lagom Framework: Open source by Lightbend. Microservices: Lagom, Play. Messaging: Akka. Platforms: ConductR, ???
        • Spring Cloud Data Flow: Open source project based on the Spring stack. Microservices: Spring Boot, Spring Cloud Stream & Task. Messaging: Kafka, RabbitMQ. Platforms: PCF, Kubernetes, YARN, Mesos.
  17. Overview of Java EE 7 APIs suited for Dataservices: CDI Extensions, Web Fragments, Bean Validation 1.1, CDI 1.1, Managed Beans 1.0, JCA 1.7, JPA 2.2, JMS 2.0, JSP 2.3, EL 3.0, EJB 3.2, Batch 1.0, JSF 2.2, Interceptors 1.2, Mail 1.5, Common Annotations 1.3, JTA 1.2, JAX-WS 1.4, JAX-RS 2.0, Concurrency 1.0, JSON-P 1.0, WebSocket 1.1, JASPIC 1.1, JACC 1.5, Servlet 3.1, JCache 1.0
  18. Simple Message Driven Beans to receive messages. This also works for MQTT, Kafka, Amazon SQS, … For other JCA adapters visit

        @MessageDriven(activationConfig = {
            @ActivationConfigProperty(propertyName = "serverURIs", propertyValue = "tcp://eclipse-mosquitto:1883"),
            @ActivationConfigProperty(propertyName = "cleanSession", propertyValue = "false"),
            @ActivationConfigProperty(propertyName = "automaticReconnect", propertyValue = "true"),
            @ActivationConfigProperty(propertyName = "filePersistence", propertyValue = "false"),
            @ActivationConfigProperty(propertyName = "connectionTimeout", propertyValue = "30"),
            @ActivationConfigProperty(propertyName = "maxInflight", propertyValue = "3"),
            @ActivationConfigProperty(propertyName = "keepAliveInterval", propertyValue = "5"),
            @ActivationConfigProperty(propertyName = "topicFilter", propertyValue = "de/qaware/oss/cloud/mqtt"),
            @ActivationConfigProperty(propertyName = "qos", propertyValue = "1")
        })
        public class MqttSourceMDB implements MQTTListener {

            @OnMQTTMessage
            @TransactionAttribute(value = TransactionAttributeType.REQUIRED)
            @Transactional(Transactional.TxType.REQUIRED)
            public void onMQTTMessage(String topic, MqttMessage message) {
                JsonReader reader = Json.createReader(new ByteArrayInputStream(message.getPayload()));
                JsonObject jsonObject = reader.readObject();
                // TODO do stuff with the JSON payload
            }
        }
  19. Use JSON as payload format for loose coupling. Use JSON-P to implement the tolerant reader pattern. Use JSON-P to build your JsonObject and JsonArray instances. Use JSON-P to read JSON payloads and to traverse and access JSON objects and arrays. Upcoming in Java EE 8: JSON Pointer and JSON Patch add even more flexibility. Use MIME-type versioning for your JSON messages if required. Use JMS message selectors to filter on JMS type and content type. Alternatively, use flexible binary protocols like ProtoBuf.

        // Build the JSON payload and send it as a typed JMS text message
        JsonObject currentWeather = Json.createObjectBuilder()
            .add("city", "London")
            .add("weather", "Drizzle")
            .build();

        StringWriter payload = new StringWriter();
        JsonWriter jsonWriter = Json.createWriter(payload);
        jsonWriter.writeObject(currentWeather);
        jsonWriter.close(); // make sure everything is flushed into the StringWriter

        TextMessage msg = session.createTextMessage(payload.toString());
        msg.setJMSType("CurrentWeather");
        msg.setStringProperty("contentType", "application/");

        // Filter on JMS type and content type in the consuming bean
        @ActivationConfigProperty(propertyName = "messageSelector",
            propertyValue = "(JMSType = 'CurrentWeather') AND (contentType = 'application/')")

        // Read the JSON payload on the consumer side
        JsonReader reader = Json.createReader(new StringReader(body));
        JsonObject jsonObject = reader.readObject();
  20. Cloud-ready runtimes suited for Dataservices. … and many more.
  21. Overview of the demo showcase. Components: JDBC Source, REST Source (JAX-RS, JMS), MQTT Source (JSON-P, JMS), Kafka Source (JSON-P, JMS), CSV Source (JBatch, JMS), Weather Processor, Location Processor, Weather File Sink, Weather DB Sink. Messaging and storage: Topic, Queue, In-Memory Datagrid. Further technologies shown in the diagram: JCache, JPA, CSV.
  22. Conceptual View on Kubernetes Building Blocks.
  23. Most important Kubernetes concepts.
        • Services are an abstraction for a logical collection of pods.
        • Pods are the smallest unit of compute in Kubernetes.
        • Deployments are an abstraction used to declare and update pods, RCs, …
        • Replica Sets ensure that the desired number of pod replicas is running.
        • Labels are key/value pairs used to identify Kubernetes resources.
  24. Example K8s Deployment Definition.

        apiVersion: extensions/v1beta1
        kind: Deployment
        metadata:
          name: location-processor
        spec:
          replicas: 2
          strategy:
            type: RollingUpdate
          template:
            metadata:
              labels:
                io.kompose.service: location-processor
            spec:
              containers:
              - name: location-processor
                image: lreimer/location-processor:1.0
                ports:
                - containerPort: 8080
                - containerPort: 5701
  25. Resource Constraints Definition.

        resources:
          # Define resources to help K8S scheduler
          # CPU is specified in units of cores
          # Memory is specified in units of bytes
          # required resources for a Pod to be started
          requests:
            memory: "196Mi"
            cpu: "250m"
          # a container exceeding its memory limit is killed (and restarted per the restart policy);
          # CPU usage above the limit is throttled
          limits:
            memory: "512Mi"
            cpu: "500m"
  26. Liveness and Readiness Probes for Antifragility.

        # container will receive requests if probe succeeds
        readinessProbe:
          httpGet:
            path: /api/application.wadl
            port: 8080
          initialDelaySeconds: 30
          timeoutSeconds: 5
        # container will be killed if probe fails
        livenessProbe:
          httpGet:
            path: /admin/health
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 5
  27. Example K8s Service Definition.

        apiVersion: v1
        kind: Service
        metadata:
          labels:
            io.kompose.service: location-processor
          name: location-processor
        spec:
          type: NodePort
          ports:
          - name: "http"
            port: 8080
            targetPort: 8080
          selector:
            io.kompose.service: location-processor
  28. Java EE powered Dataservices on Kubernetes in Action. Programmable MIDI Controller. Visualizes Deployments and Pods. Scales Deployments. Supports K8s, OpenShift, DC/OS.
  29. Fork me on Github.
  30. Mario-Leander Reimer @LeanderReimer
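
Two of the messaging patterns named on slide 14, Work Queue and Publish/Subscribe, differ only in who receives each message. The following is a minimal plain-Java sketch of that difference, not code from the talk: the names `MessagingPatternsSketch`, `workQueue`, and `publishSubscribe` are hypothetical, and the round-robin dispatch is an assumption, since a real broker decides delivery itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Contrasts work-queue and publish/subscribe delivery using plain in-memory lists. */
public class MessagingPatternsSketch {

    /** Work queue: each message goes to exactly one of the competing consumers. */
    public static List<List<String>> workQueue(List<String> messages, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            // Round-robin is an assumption; brokers may dispatch by load or prefetch instead.
            inboxes.get(i % consumers).add(messages.get(i));
        }
        return inboxes;
    }

    /** Publish/subscribe: every subscriber receives its own copy of every message. */
    public static List<List<String>> publishSubscribe(List<String> messages, int subscribers) {
        List<List<String>> inboxes = emptyInboxes(subscribers);
        for (String message : messages) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
        return inboxes;
    }

    /** Creates one empty inbox per receiver. */
    private static List<List<String>> emptyInboxes(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("at least one receiver is required");
        }
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> events = List.of("a", "b", "c");
        System.out.println("work queue:        " + workQueue(events, 2));
        System.out.println("publish/subscribe: " + publishSubscribe(events, 2));
    }
}
```

In the work queue the load is shared, so adding consumers raises throughput; with publish/subscribe each subscriber sees the full stream, which suits independent sinks such as a file sink and a database sink reading the same topic.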