© 2016 Pivotal
!1
Stream Processing in the Cloud with Data
Microservices
Marius Bogoevici, Software Engineer, Pivotal
@mariusbogoevici
!2
Stream Processing in the Cloud with Data Microservices
• Use Cases
• Predictive maintenance
• Fraud detection
• QoS measurement
• Log analysis
• High throughput/low latency
• Growing quantities of data
• Immediate response is required
• Grouping and ordering of data
• Partitioning
• Windowing
!3
Stream Processing in the Cloud with Data Microservices
• Huge quantities of data to be analyzed efficiently
• Scaling requirements
• Massive storage
• Massive computing power (memory/CPU)
• Massive scalability, from a few machines to data center level
• Reliance on platform’s resource management abilities
• public and private cloud: AWS
• cluster managers: Apache YARN, Apache Mesos,
Kubernetes
• full application platforms: Cloud Foundry
!4
Stream Processing in the Cloud with Data Microservices
• Microservice pattern applied to data processing applications
• Typical benefits of microservices:
• scalability, isolation, agility, continuous deployment,
operational control
• Tuning process-specific resources
• Instance count
• Memory
• CPU
• Event-driven interaction
• communication decoupling, distributed data consistency
• There’s life after REST
!5
dataflow:> stream create demo --definition “http | file”
!6
Spring Cloud Stream
▪ Built on battle-tested code
▪ Spring Boot: full-stack standalone apps
▪ Spring Integration: EAI patterns, Connectors
▪ Opinionated primitives
▪ Persistent Publish Subscribe Semantics
▪ Consumer Groups
▪ Partitioning
▪ Pluggable Middleware binding API
▪ Write applications in a middleware agnostic manner
▪ Kafka, Rabbit, Gemfire, Redis, …
▪ Programming model focuses on input/output channels
!7
Spring Cloud Stream in a nutshell …
!8
… in a 10000 ft nutshell …
!9
Publish-subscribe semantics
▪ Fits both data streaming and event-driven use cases
!10
Consumer groups
▪ Borrowed from Kafka, applied to all binders
▪ Groups of competing consumers within the pub-sub destination
▪ Durable subscriptions
▪ Used in scaling and partitioning
!11
Partitioning
▪ Required for stateful processing scenarios
▪ Outputs specify a data partitioning strategy
▪ Inputs can be bound to a specific partition
!12
Programming model: Spring Integration
package	org.springframework.cloud.stream.messaging;	
public	interface	Source	{	
							
						String	OUTPUT	=	"output";	
							
						@Output(Source.OUTPUT)	
						MessageChannel	output();	
}	
@EnableBinding(Source.class)	
@SpringBootApplication	
public	class	Application	{	
		@InboundChannelAdapter(Source.OUTPUT)	
public String sayHello() {
return “hello” + System.currentTimeMillis();
}
}
!13
Programming model: @StreamListener
package	org.springframework.cloud.stream.messaging;	
public	interface	Sink	{	
							
						String	INPUT	=	"input";	
							
						@Input(Sink.input())	
						SubscribableChannel	input();	
							
}
@EnableBinding(Processor.class)	
public	class	GreetingProcessor	{	
@Autowire	GreetingService	greetingService;	
				@StreamListener(Sink.INPUT)	
				public	void	greet(Greeting	greeting)	{	
								greetingService(greeting);	
				}	
}	
{greeting: “hello world”}
contentType: application/json
!14
@Enable All the Things
▪ @EnableBindings(Source.class)
▪ one output
▪ @EnableBindings(Sink.class)
▪ one input
▪ @EnableBinding(Processor.class)
▪ one input and one output
▪ @EnableBinding(MyOrderHandler.class)
▪ custom interfaces with as many inputs and outputs
▪ @EnableRxJavaProcessor
▪ OOTB support for RxJava with one input and one output
!15
Binder SPI
!16
Binder SPI - mapping on native middleware support
!17
Spring Cloud Data Flow
▪ Portable Orchestration Layer for Stream and Tasks
▪ Stream and Task DSL
http | transform | hdfs
▪ REST API
▪ Shell
▪ UI
▪ OOTB apps for common integration use-cases
!18
Spring Cloud Deployer
▪ SPI for deploying applications to modern runtimes
▪ Local (for testing)
▪ YARN
▪ Cloud Foundry
▪ Kubernetes
▪ Mesos + Marathon
!19
dataflow:> stream create demo --definition “http | file”
▪ Stream definition
▪ Launching Boot applications
▪ Can pass configuration parameters via Spring Boot
▪ Control instance count, resource allocation
!20
Spring Cloud Data Flow
Pla$orm
!21
Data Flow Developer Experience
▪ 1. Create Spring Cloud Stream Microservice App
@EnableBinding(Processor.class)	
public	class	UpperCase	{	
	@Transformer(inputChannel	=	Processor.INPUT,	outputChannel=Processor.OUTPUT)	
	public	String	process(String	message)	{	
	 	return	message.toUpperCase();	
	}	
}
!22
Data Flow Development Experience
dataflow:>	module	register	--name	uppercase	--type	processor	--coordinates	group:ar7fact:version	
dataflow:>	stream	create	demo		--defini7on	
																															"h=p	--server.port=9000	|	uppercase	|	file	--directory=/tmp/demodata"	
2:	Build	and		Install:	
$	mvn	clean	install	
3:	Register	Module	with	Data	Flow:	
4:	Define	Stream	via	DSL:
!23
Taps
dataflow:> stream create tap --definition ":demo.http > counter --store=redis"
dataflow:> stream create demo --definition
"http --server.port=9000 | uppercase | file --directory=/tmp/devnexus"
!24
Roadmap
▪ Spring Cloud Stream 1.0.0.RC1 - March 22, 2016
▪ Spring Cloud Stream 1.0.0.GA - March 2016
▪ Spring Cloud Data Flow - 1.0.0.GA Q2 2016
!25
Links!
h"p://cloud.spring.io/spring-cloud-stream	
h"p://github.com/spring-cloud/spring-cloud-stream-modules	
h"p://github.com/spring-cloud/spring-cloud-stream-samples	
h"ps://github.com/mbogoevici/devnexus2016	
h"p://github.com/spring-cloud/spring-cloud-deployer	*	
h"p://cloud.spring.io/spring-cloud-dataflow	*	
h"p://github.com/spinnaker	
*	Cloud	Foundry,	YARN,	Kubernetes,	Mesos+Marathon	variants	
QuesNons:		
email:	mbogoevici@pivotal.io	
twi"er:	@mariusbogoevici

Stream Processing in the Cloud With Data Microservices