Data Microservices in the Cloud

SpringOne Platform 2016
Speaker: Mark Pollack; Spring Cloud Data Flow Lead, Pivotal

Spring Cloud Data Flow enables you to create data pipelines for many common use cases, such as data ingestion, real-time analytics, and data import/export. In this session, we will introduce Spring Cloud Data Flow’s architecture and walk through the orchestration of long-running and short-lived data-centric applications on multiple runtime platforms such as Cloud Foundry, Kubernetes, Apache Mesos, and Apache YARN. Spring Cloud Data Flow is the evolution of Spring XD. It retains the DSL for defining data pipelines and the web-based UI designer, but changes the component model from modules that ran inside a container to standard Spring Boot applications built with the Spring Cloud Stream and Spring Cloud Task APIs. We will discuss how to make the transition from Spring XD to Spring Cloud Data Flow and demonstrate creating new applications and deploying them onto multiple runtimes.

  1. Data Microservices in the Cloud, by Mark Pollack
  2. HTTP, JMS, Kafka, File, HDFS, Cassandra, HAWQ, JDBC, Real-Time Analytics
  3. Spring XD Streams: the XD Admin, ZooKeeper, and a message broker coordinate XD Containers running on metal/VMs. The containers host the http, cassandra, jms, and gpfdist modules for the streams stream1 = http | cassandra and stream2 = jms | gpfdist.
  4. Spring XD Limitations • How to scale module instances up/down at runtime? • How to upgrade/downgrade module instances at runtime? • How to specify resources unique to each module, e.g. memory? • The container architecture led to parent/child class loader issues • Too many libraries on the root classpath
  5. Refactoring to a Microservice Architecture • From multiple modules embedded in a container to standalone executable applications • From our own runtime to delegating to existing platforms
  6. Data Microservices • Stand-alone, production-grade applications focused on data processing • Communicating with ‘lightweight mechanisms’: messaging middleware • ‘Event-driven’ microservices. The Unix philosophy: “Write programs that do one thing and do it well.” “Write programs to work together.” “Write programs to handle text streams, because that is a universal interface.” Example: $ cat book.txt | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
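To make the Unix analogy concrete, here is a minimal sketch of the same word-frequency pipeline written with plain Java streams. Each stage is commented with the shell command it stands in for. The class and method names (`WordFrequency`, `topWords`) are hypothetical. Splitting on any whitespace is a slight simplification of the original `tr` step.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical plain-Java rendering of the word-frequency shell pipeline.
class WordFrequency {
    static List<Map.Entry<String, Long>> topWords(String text, int n) {
        Map<String, Long> counts = Arrays.stream(text.split("\\s+"))      // one word per "line"
                .map(w -> w.toLowerCase(Locale.ROOT))                    // tr '[:upper:]' '[:lower:]'
                .map(w -> w.replaceAll("\\p{Punct}", ""))                // tr -d '[:punct:]'
                .filter(w -> w.matches("[a-z]+"))                        // grep -v '[^a-z]' (also drops empties)
                .collect(Collectors.groupingBy(w -> w, Collectors.counting())); // sort | uniq -c
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()    // sort -rn
                        .thenComparing(Map.Entry.<String, Long>comparingByKey())) // stable tie-break
                .limit(n)                                                        // head
                .collect(Collectors.toList());
    }
}
```

Each stage does one thing, and the stages compose. This is the same idea the slide applies to data microservices connected by messaging middleware.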
  7. Application Types • Long-lived stream applications: Spring Cloud Stream • Short-lived task applications: Spring Cloud Task
  8. Spring Cloud Stream • Spring Boot-based event-driven microservice framework • Opinionated primitives for streaming applications: persistent publish/subscribe semantics, consumer groups, partitioning • Pluggable messaging middleware bindings • Programming model focused on input/output objects • Adaptable to different event processing APIs
  9. Spring Cloud Stream Applications: a Twitter source and a Cassandra sink connected through the dataDest destination. • Cassandra sink: java -jar cassandrasink-1.0.0.RELEASE.jar --ingestQuery=<some cql> --spring.cassandra.keyspace=tweetdata --spring.cloud.stream.bindings.input.destination=dataDest • Twitter source: java -jar twittersource-1.0.0.RELEASE.jar --consumerKey=XYZ --consumerSecret=ABC --spring.cloud.stream.bindings.output.destination=dataDest
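The two commands above never reference each other. They connect only because the source's output destination and the sink's input destination share the name `dataDest`. The sketch below assumes a hypothetical in-memory broker (`InMemoryBroker`, `BindingDemo`) in place of RabbitMQ or Kafka to show that wiring rule. It has no persistence and is not the binder API.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical stand-in for messaging middleware: a destination is just a name,
// and every handler subscribed to that name receives what is published to it.
class InMemoryBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String destination, Consumer<String> handler) {
        subscribers.computeIfAbsent(destination, d -> new ArrayList<>()).add(handler);
    }

    void publish(String destination, String payload) {
        // Messages sent to a destination with no subscribers are dropped in this sketch.
        for (Consumer<String> handler : subscribers.getOrDefault(destination, Collections.emptyList())) {
            handler.accept(payload);
        }
    }
}

class BindingDemo {
    // Wires a "source" and a "sink" purely by their configured destination names,
    // mirroring ...bindings.output.destination and ...bindings.input.destination.
    static List<String> run(String sourceOutputDestination, String sinkInputDestination, List<String> payloads) {
        InMemoryBroker broker = new InMemoryBroker();
        List<String> received = new ArrayList<>();
        broker.subscribe(sinkInputDestination, received::add);   // the sink's input binding
        for (String p : payloads) {
            broker.publish(sourceOutputDestination, p);          // the source's output binding
        }
        return received;
    }
}
```

With matching names (`run("dataDest", "dataDest", ...)`), the sink sees every payload. A mismatch silently disconnects the two apps.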
  10. Stream Orchestration in Spring Cloud Data Flow: the stream definition ingest = twitterstream | cassandra is composed of Spring Cloud Stream applications, and Data Flow maps the DSL names to Maven/Docker artifacts. dataflow:>stream create --name ingest --definition "twitterstream | cassandra" --deploy
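Given a definition like `twitterstream | cassandra`, Data Flow must decide which destination sits between each pair of apps. Other slides in this deck show those destinations named `<stream>.<app>` (e.g. `s1.http`, `ingest.twitterstream`). The helper below is a hypothetical sketch, not the Data Flow Server's parser. It handles only pipe-separated definitions, ignores app options, and derives those names.

```java
import java.util.*;

// Hypothetical sketch: split a pipe-only stream definition into app names and
// derive the "<stream>.<app>" destination each app (except the last) writes to.
class StreamDsl {
    static List<String> apps(String definition) {
        List<String> names = new ArrayList<>();
        for (String part : definition.split("\\|")) {
            // Keep only the app name; options such as --fieldName=lang are ignored.
            names.add(part.trim().split("\\s+")[0]);
        }
        return names;
    }

    static List<String> destinations(String streamName, String definition) {
        List<String> apps = apps(definition);
        List<String> dests = new ArrayList<>();
        for (int i = 0; i < apps.size() - 1; i++) {
            dests.add(streamName + "." + apps.get(i));   // output of app i = input of app i+1
        }
        return dests;
    }
}
```

For `ingest = twitterstream | cassandra`, this yields the single destination `ingest.twitterstream`. A tap such as `:ingest.twitterstream` can subscribe to that destination.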
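  11. Features: Persistent Messaging. In s1 = http | log, http publishes to the s1.http destination on the message broker and log consumes from it; messages that cannot be processed can be routed to a DLQ (dead-letter queue). Binders • Production ready: RabbitMQ, Kafka • Experimental: JMS, Google PubSub • Planned: Kinesis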
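The DLQ in the diagram means that a message which repeatedly fails processing is parked rather than lost. The sketch below shows that retry-then-dead-letter logic. The class name `DlqConsumer` and the fixed `maxAttempts` parameter are hypothetical. Real binders configure retries and dead-lettering through consumer and binder properties, not through code like this.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch of retry-then-DLQ handling on the consuming side.
class DlqConsumer {
    final List<String> processed = new ArrayList<>();
    final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;
    private final Consumer<String> handler;

    DlqConsumer(int maxAttempts, Consumer<String> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    void onMessage(String payload) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(payload);
                processed.add(payload);
                return;                  // success: stop retrying
            } catch (RuntimeException e) {
                // failed attempt: loop and retry until attempts are exhausted
            }
        }
        deadLetters.add(payload);        // attempts exhausted: park the message on the DLQ
    }
}
```

A transient failure succeeds on a later attempt. A permanently bad message ends up in `deadLetters`, where it can be inspected or replayed.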
  12. Features: Named Destinations. HTTP, JMS, and S3 sources all publish to the same named destination, myInputDestination: s1 = http > :myInputDestination • s2 = jms > :myInputDestination • s3 = aws-s3 > :myInputDestination
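  13. Features: Consumer Groups. s1 = http | hdfs • s2 = :s1.http > counter. The hdfs instances consume from s1.http as consumer group s1, and counter consumes from the same destination as group s2, so each group receives every message while the instances within a group share the load.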
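Consumer groups combine publish/subscribe across groups with load sharing inside a group. The sketch simulates both. The class name `ConsumerGroups` is hypothetical, and the round-robin assignment is an assumption. Real binders use their own assignment strategies.

```java
import java.util.*;

// Hypothetical simulation of consumer-group delivery on a single destination.
class ConsumerGroups {
    private final Map<String, Integer> groups = new LinkedHashMap<>();   // group -> instance count
    private final Map<String, Integer> nextInstance = new HashMap<>();   // group -> next instance index
    final Map<String, List<String>> delivered = new TreeMap<>();         // "group#instance" -> messages

    void subscribe(String group, int instances) {
        groups.put(group, instances);
        nextInstance.put(group, 0);
    }

    void publish(String payload) {
        // Every group gets the message ...
        for (Map.Entry<String, Integer> g : groups.entrySet()) {
            // ... but only one instance within that group receives it (round-robin here).
            int idx = nextInstance.get(g.getKey());
            nextInstance.put(g.getKey(), (idx + 1) % g.getValue());
            delivered.computeIfAbsent(g.getKey() + "#" + idx, k -> new ArrayList<>()).add(payload);
        }
    }
}
```

With two hdfs instances in group `s1` and one counter in group `s2`, three messages split 2/1 across the hdfs instances. The counter sees all three.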
  14. Simple Real-Time Analytics: tweets = twitterstream | hdfs • analytics = :ingest.twitterstream > field-value-counter --fieldName=lang. The counter values are exposed through the Data Flow Server REST API.
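A field-value counter reads one named field from each record, here `lang` on a tweet, and keeps a running count per value. Below is a minimal sketch of that computation, assuming records arrive as plain maps. The class `FieldValueCounter` is hypothetical and is not the provided app's implementation, which stores its counts in a backing store rather than in memory.

```java
import java.util.*;

// Hypothetical sketch: count occurrences of each value of one field across records.
class FieldValueCounter {
    private final String fieldName;
    private final Map<String, Long> counts = new TreeMap<>();

    FieldValueCounter(String fieldName) {
        this.fieldName = fieldName;
    }

    void accept(Map<String, ?> record) {
        Object value = record.get(fieldName);
        if (value != null) {                          // records without the field are skipped
            counts.merge(value.toString(), 1L, Long::sum);
        }
    }

    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(new TreeMap<>(counts));
    }
}
```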
  15. Spring Cloud Stream Programming Model: @EnableBinding(Processor.class) public class TransformProcessor { @StreamListener("input") @SendTo("output") public String transform(String s) { return s.toUpperCase(); } }
  16. Spring Cloud Stream Programming Model: @EnableBinding(Processor.class) public class TransformProcessor { @StreamListener @Output("output") public Flux<String> transform(@Input("input") Flux<String> input) { return input.map(s -> s.toUpperCase()); } }
  17. Spring Cloud Stream Programming Model: @EnableBinding(Processor.class) public class TransformProcessor { @StreamListener @Output("output") public Flux<WordCount> countWords(@Input("input") Flux<String> words) { return words.window(ofSeconds(5), ofSeconds(1)) .flatMap(window -> window.groupBy(word -> word) .flatMap(group -> group.reduce(0, (counter, word) -> counter + 1) .map(count -> new WordCount(group.key(), count)))); } }
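The `window(ofSeconds(5), ofSeconds(1))` / `groupBy` / `reduce` chain computes per-word counts over 5-second windows opened every second, so each word can fall into up to five overlapping windows. The plain-Java sketch below computes the same thing without Reactor. It assumes each word carries an event timestamp, which is an assumption: the Flux version windows by arrival time instead. `SlidingWordCount` and `Event` are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch: sliding-window word counts over timestamped words.
class SlidingWordCount {
    static final class Event {
        final long timestampMillis;
        final String word;
        Event(long timestampMillis, String word) {
            this.timestampMillis = timestampMillis;
            this.word = word;
        }
    }

    // Returns window start -> (word -> count) for windows [start, start + windowMillis),
    // where window starts are non-negative multiples of stepMillis.
    static SortedMap<Long, Map<String, Integer>> count(List<Event> events, long windowMillis, long stepMillis) {
        SortedMap<Long, Map<String, Integer>> result = new TreeMap<>();
        for (Event e : events) {
            // Latest window that opened at or before the event.
            long lastStart = Math.floorDiv(e.timestampMillis, stepMillis) * stepMillis;
            // Walk back through every earlier window that still covers the event.
            for (long start = lastStart; start >= 0 && start > e.timestampMillis - windowMillis; start -= stepMillis) {
                result.computeIfAbsent(start, s -> new TreeMap<>()).merge(e.word, 1, Integer::sum);
            }
        }
        return result;
    }
}
```

`count(events, 5000, 1000)` mirrors the slide's 5-second/1-second parameters.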
  18. Platform Runtimes: Docker Swarm, Apache YARN, Apache Mesos + Marathon
  19. Spring Cloud Data Flow Deployment Platforms: the SCDF Shell and the SCDF Flo designer talk to the Data Flow Server REST API, and the server deploys applications to each platform through the Deployer SPI.
  20. Spring Cloud Data Flow Streams: stream1 = http | cassandra • stream2 = jms | gpfdist. The Data Flow Server (backed by a DB) deploys the http, cassandra, jms, and gpfdist applications onto the platform runtime, where they communicate through the message broker.
  21. Deployment: Partitioning and Instance Count. A load balancer fronts 2 http instances, which partition data across 3 work instances feeding 4 hdfs instances. • stream create s1 --definition "http | work | hdfs" • stream deploy s1 --propertiesFile ingest.properties, where ingest.properties contains: app.http.count=2, app.work.count=3, app.hdfs.count=4, app.http.producer.partitionKeyExpression=payload.id
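With `partitionKeyExpression=payload.id`, every message with the same id lands on the same downstream `work` instance. This makes per-key state safe. The sketch below is in the spirit of Spring Cloud Stream's default selection: hash the key and reduce it modulo the partition count. It is hypothetical (`PartitionSelector` is not the library class), and here the partition count corresponds to the 3 `work` instances.

```java
// Hypothetical sketch of key-based partition selection.
class PartitionSelector {
    static int partitionFor(Object key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes,
        // which a plain % would not.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

The design choice that matters is determinism. The same key must always map to the same partition for a fixed instance count. Changing `app.work.count` remaps keys.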
  22. Deployment: Resource Management. Resources can be set per application, e.g. the memory for the work instances on Cloud Foundry: app.work.spring.cloud.deployer.cloudfoundry.memory=2048
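Deployment properties follow an `app.<appName>.<property>=<value>` pattern, so one file can carry instance counts, partitioning, and resource settings for every app in the stream. Below is a minimal sketch of grouping such lines by app. `DeploymentProperties.byApp` is a hypothetical helper, not Data Flow's own parser. It skips anything that does not start with `app.`.

```java
import java.util.*;

// Hypothetical sketch: group "app.<appName>.<property>=<value>" lines by app name.
class DeploymentProperties {
    static Map<String, Map<String, String>> byApp(List<String> lines) {
        Map<String, Map<String, String>> result = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;   // blanks and comments
            int eq = trimmed.indexOf('=');
            if (eq < 0 || !trimmed.startsWith("app.")) continue;          // not an app property
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            int dot = key.indexOf('.', "app.".length());
            if (dot < 0) continue;                                        // "app.<name>" with no property
            String app = key.substring("app.".length(), dot);
            result.computeIfAbsent(app, a -> new TreeMap<>()).put(key.substring(dot + 1), value);
        }
        return result;
    }
}
```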
  23. Spring Cloud Task • Spring Boot-based framework for short-lived processes • Auto-configuration provides a task repository and a pluggable data source • The result of each process persists beyond the life of the task for future reporting • Tasks can be any arbitrary short-lived code • Well integrated with Spring Batch
  24. Task Orchestration in Spring Cloud Data Flow: dataflow:>task create jdbchdfs --sql='select * from table' • dataflow:>task launch jdbchdfs. The Data Flow Server records each execution in its DB (Task Name, Start Time, End Time, Exit Code, Exit Message, Last Updated Time, Parameters), and events are published to the message broker: task-event, job-execution-events, step-execution-events, item-read-events, item-process-events, item-write-events, skip-events.
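The execution record above outlives the task process: name, start and end time, exit code, and exit message. The sketch below shows that bookkeeping around a task body. `TaskRepositorySketch` and its fields are hypothetical and modeled on the slide's columns. The exit-code convention (0 on success, 1 on an exception) is an assumption. Storage is an in-memory list rather than a database.

```java
import java.time.Instant;
import java.util.*;
import java.util.function.Supplier;

// Hypothetical sketch of task-execution bookkeeping.
class TaskRepositorySketch {
    static final class TaskExecution {
        final String taskName;
        final Instant startTime;
        Instant endTime;
        Integer exitCode;
        String exitMessage;

        TaskExecution(String taskName, Instant startTime) {
            this.taskName = taskName;
            this.startTime = startTime;
        }
    }

    final List<TaskExecution> executions = new ArrayList<>();

    // Records the start, runs the body, then records end time, exit code and message.
    TaskExecution launch(String taskName, Supplier<String> body) {
        TaskExecution execution = new TaskExecution(taskName, Instant.now());
        executions.add(execution);              // visible before the body finishes
        try {
            execution.exitMessage = body.get();
            execution.exitCode = 0;
        } catch (RuntimeException e) {
            execution.exitMessage = e.getMessage();
            execution.exitCode = 1;
        } finally {
            execution.endTime = Instant.now();
        }
        return execution;
    }
}
```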
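  25. Spring Cloud Data Flow Tasks: a stream such as http | task-launcher launches tasks (e.g. spark, sqoop) on the platform runtime; task-event messages flow through the message broker, and the Data Flow Server keeps execution records in its DB.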
  26. Spring Cloud Task Programming Model: @SpringBootApplication @EnableTask public class ExampleApplication { @Bean public CommandLineRunner commandLineRunner() { return strings -> System.out.println("Executed at :" + new SimpleDateFormat().format(new Date())); } public static void main(String[] args) { SpringApplication.run(ExampleApplication.class, args); } }
  27. Provided Applications • ~60 stream and task apps • https://github.com/spring-cloud/spring-cloud-stream-app-starters • https://github.com/spring-cloud/spring-cloud-task-app-starters/ • Customize provided apps: http://start-scs.cfapps.io/ • Create new stream/task apps: http://start.spring.io/ • Easy import of provided apps/tasks: dataflow:>app import --uri http://bit.ly/1-0-2-GA-stream-applications-kafka-maven
  28. UI: Dashboard with Designer
  29. XD to SCDF Terminology • XD-Admin → Data Flow Server (local, CF, YARN, k8s, Mesos) • XD-Container → N/A • Modules → Applications • Admin UI → Dashboard • Message Bus → Binders • Job → Task
  30. DEMO
  31. Upcoming Features • Some ‘porting’ from XD: Batch Job DSL + Designer; role-based access • Looking forward: Spring Cloud Sleuth; Java DSL; in-place application version upgrades with Spinnaker; application groups; polyglot; expanded analytics with Redis and the Python/R ecosystem; more provided apps/tasks
  32. Related Talks • Building Resilient and Evolutionary Data Microservices (Tuesday 2:00pm) • Cloud Native Java (Tuesday 2:00pm) • Task Madness - Modern On Demand Processing (Tuesday 2:40pm) • Spinnaker – Land of a 1000 Builds (Tuesday 5:00pm) • Spring and Big Data (Tuesday 5:00pm) • Migrating from Spring XD to Spring Cloud Data Flow (Thursday 10:10am) • Orchestrate All the Things! with Spring Cloud Data Flow (Thursday 11:10am) • Cloud Native Streaming and Event-Driven Microservices (Wednesday 4:20pm)
  33. Get Started • http://cloud.spring.io/spring-cloud-dataflow/ • http://cloud.spring.io/spring-cloud-stream/ • http://cloud.spring.io/spring-cloud-task/ • https://github.com/spring-cloud/spring-cloud-deployer
  34. Learn More. Stay Connected. • http://cloud.spring.io/spring-cloud-dataflow/ • @springcentral • spring.io/blog • @pivotal • pivotal.io/blog • @pivotalcf • http://engineering.pivotal.io
  35. Spring XD Log Module • log.xml: <channel id="input"/> <logging-channel-adapter channel="input" level="${level}" logger-name="xd.sink.${name}" expression="${expression}"/> • log.properties: info.shortDescription = Logs the message payload. options_class = o.s.xd.dirt.modules.metadata.LogSinkOptionsMetadata
  36. Spring Cloud Data Flow Log Application (LogSinkConfiguration.java): @EnableBinding(Sink.class) @EnableConfigurationProperties(LogSinkProperties.class) public class LogSinkConfiguration { @Autowired private LogSinkProperties properties; @Bean @ServiceActivator(inputChannel = Sink.INPUT) public LoggingHandler logSinkHandler() { LoggingHandler loggingHandler = new LoggingHandler(this.properties.getLevel().name()); loggingHandler.setExpression(this.properties.getExpression()); loggingHandler.setLoggerName(this.properties.getName()); return loggingHandler; } }
