Spring XD 
Pivotal Confidential–Internal Use Only 
Glenn Renfro 
grenfro@pivotal.io
@CPPWFS
Volume 
Velocity 
Variety 
Veracity 
60-100 sensors in each car 
22 Billion sensors by 2020 
420 Million Wearables 
90% of enterprise data is unstructured
500 million tweets each day
2.3 Trillion GBs of data each day
86% suspect data inaccuracy
30% revenue loss due to bad data quality
Data Points: McKinsey, Twitter, Gartner, IBM
Batch and Streaming often handled by multiple platforms
Fragmented Big Data Ecosystem
Not all data is Hadoop-bound
SPRING XD 
EXTREME DATA 
“One stop shop for developing and deploying Big Data Applications”
Spring XD to the Rescue
Batch and Streaming often handled by multiple platforms
Fragmented Big Data Ecosystem
Not all data is Hadoop-bound
• Unified Stream and Batch Operations
• Hadoop Batch Workflow Orchestration
• Predictive Analytics and Model Scoring
• Portable: on-prem, YARN, EC2, PCF, Mesos, Docker, etc.
• Easy to Use, Extend and Integrate with other Technologies
• Built on proven Spring EAI and Batch projects
(Volume, Velocity, Veracity, and Variety)
IO EXECUTION
XD: Stream, Taps, Jobs
BOOT: Bootable, Minimal, Ops-Ready
GRAILS: Full-stack, Web
IO COORDINATION
SPRING CLOUD
IO FOUNDATION
INTEGRATION: Channels, Adapters, Filters, Transformers
BATCH: Jobs, Steps, Readers, Writers
BIG DATA: Ingestion, Export, Orchestration, Hadoop
WEB: Controllers, REST, WebSocket
DATA: Relational Data Access, Non-Relational Data Access
SPRING CORE: FRAMEWORK, SECURITY, GROOVY, REACTOR
Spring XD - 10,000 Foot View 
Streams
Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Groovy, Python, Java
Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters
Create a stream with http as a source and hdfs as a sink. The hdfs --rollover option is set to a small value so that the file is closed quickly and we can read it on HDFS.
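To illustrate why a small rollover value makes the demo file readable right away, here is a minimal Python sketch of a size-based rolling writer. It is not the Spring XD hdfs sink: the class name RollingWriter, the in-memory list of closed files, and the exact rule (close the current file once it holds at least `rollover` bytes) are assumptions made for illustration only.

```python
class RollingWriter:
    """Toy size-based rolling sink (hypothetical; not the XD hdfs sink)."""

    def __init__(self, rollover):
        self.rollover = rollover      # threshold in bytes
        self.current = bytearray()    # the file that is still open for writing
        self.closed = []              # files that readers can safely open

    def write(self, payload: str):
        # Append one line, then roll over if the open file reached the threshold.
        self.current += payload.encode("utf-8") + b"\n"
        if len(self.current) >= self.rollover:
            self.closed.append(bytes(self.current))
            self.current = bytearray()


# With a tiny threshold a single short message closes the file.
small = RollingWriter(rollover=11)
small.write("hello world")

# With a large threshold the same message stays in the open file.
large = RollingWriter(rollover=1_000_000)
large.write("hello world")
```

In this sketch `small.closed` holds one finished file after a single write, while `large` still has the message sitting in its open file, which is the situation the small demo value avoids.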
Spring XD - Distributed Runtime 
XD Shell 
HTTP POST /streams/aStream “M1 | M2” 
XD Admin 
(leader) 
XD Admin XD Admin Container State 
XD Container XD Container 
Message Bus 
ZooKeeper 
Spring App Context 
M1 M2
Spring XD - Analytics
• Counters and Gauges
  • Simple & Field Value Counter (how many tweets for #java)
  • Aggregate Counter (how many tweets for #java in the week/day/hr)
  • Gauge & Rich Gauge (how many requests / minute?)
  • Abstract API implemented in Redis and in-memory
• Predictive Model Evaluation
  • JPMML
  • Is this transaction fraudulent?
  • What group does this user belong to?
  • Interoperable with R, Rattle, KNIME, RapidMiner, MADLib
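The counter types above can be sketched in a few lines of Python. This is an in-memory illustration of the idea, not the Spring XD analytics API: the class names mirror the slide, but the methods, the hourly bucket size, and the sample tweets are all made up for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


class FieldValueCounter:
    """Counts occurrences of each value seen for one field (e.g. a hashtag)."""

    def __init__(self):
        self.counts = Counter()

    def increment(self, value):
        self.counts[value] += 1


class AggregateCounter:
    """Keeps one total per hour bucket so counts can be rolled up by hour or day."""

    def __init__(self):
        self.per_hour = defaultdict(int)

    def increment(self, when: datetime):
        # Truncate the timestamp to the start of its hour to pick the bucket.
        bucket = when.replace(minute=0, second=0, microsecond=0)
        self.per_hour[bucket] += 1

    def total_for_day(self, day):
        return sum(n for hour, n in self.per_hour.items() if hour.date() == day)


# Hypothetical sample data: (hashtag, timestamp) pairs.
tweets = [
    ("java", datetime(2015, 3, 1, 9, 15)),
    ("java", datetime(2015, 3, 1, 9, 45)),
    ("spring", datetime(2015, 3, 1, 10, 5)),
    ("java", datetime(2015, 3, 2, 8, 0)),
]

by_tag = FieldValueCounter()
java_over_time = AggregateCounter()
for tag, ts in tweets:
    by_tag.increment(tag)
    if tag == "java":
        java_over_time.increment(ts)
```

`by_tag.counts["java"]` answers "how many tweets for #java", and `java_over_time.total_for_day(...)` answers the same question restricted to one day.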
Jobs 
CSV to JDBC 
FTP to HDFS 
JDBC to HDFS 
HDFS to JDBC 
HDFS to MongoDB
[Architecture diagram; labels only]
Data sources and clients: SENSORS, SOCIAL, FILES, MOBILE
Layers: SPEED LAYER, BATCH LAYER, SERVING LAYER
Components: Spring XD (x2), Spring BOOT (x3), GemFire XD (x2), XD> shell
Data: MASTER DATASET, REALTIME VIEWS, BATCH VIEWS
Functions: Stream Processing, Analytics, Ingest, Workflow Orchestration, Export, Predictive Modeling
Platform: PCF - BOSH Service, PCF - Apps
Unified runtime for both Real-time and Batch use cases
Scalable, Distributed and Fault Tolerant Runtime
Increased Productivity through out-of-the-box components
Closed Loop Analytics through online (stream) and offline (batch) data
Swiss-army knife of data movement and data pipelines
Repeatable ‘turnkey’ solution for next generation data-centric use cases
Agility: Easy to Set Up and Run
Writing HTTP Data to HDFS
…that simple!
Spring XD on YARN 
Spring XD Running on YARN!
Copies Files to HDFS
Creates manifest.yml
Spring Boot App 
‘xd-yarn start admin’ 
Spring Boot App 
‘xd-yarn start container’ 
Spring Boot App
Even easier with PCF
Natural Fit: Reactive Streaming Pipelines 
Moving Average: ‘collect values every 500ms’
Non-Blocking Backpressure
OLD: “take all these items I have whether you can handle them or not”
NEW: “give me the next N available items”
Microbatching: ‘either 1024b or 350ms; trigger downstream processing’
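The microbatching rule on this slide (flush on 1024 bytes or 350 ms, whichever comes first) can be sketched as follows. This is a plain-Python illustration, not Reactor's API: the MicroBatcher class, its method names, and the choice to pass timestamps in explicitly (so the example is deterministic) are assumptions.

```python
class MicroBatcher:
    """Buffers items and flushes when a size limit or an age limit is reached."""

    def __init__(self, max_bytes, max_age_ms, on_flush):
        self.max_bytes = max_bytes
        self.max_age_ms = max_age_ms
        self.on_flush = on_flush      # callback that receives each batch
        self.items = []
        self.size = 0
        self.started_ms = None        # arrival time of the oldest buffered item

    def offer(self, item: bytes, now_ms: int):
        if self.started_ms is None:
            self.started_ms = now_ms
        self.items.append(item)
        self.size += len(item)
        self._maybe_flush(now_ms)

    def tick(self, now_ms: int):
        """Called periodically so a quiet stream still flushes on time."""
        self._maybe_flush(now_ms)

    def _maybe_flush(self, now_ms):
        if not self.items:
            return
        too_big = self.size >= self.max_bytes
        too_old = now_ms - self.started_ms >= self.max_age_ms
        if too_big or too_old:
            self.on_flush(self.items)
            self.items, self.size, self.started_ms = [], 0, None


batches = []
b = MicroBatcher(max_bytes=1024, max_age_ms=350, on_flush=batches.append)
b.offer(b"x" * 600, now_ms=0)     # 600 bytes buffered, nothing flushed yet
b.offer(b"y" * 500, now_ms=10)    # 1100 bytes >= 1024: size-triggered flush
b.offer(b"z" * 10, now_ms=20)     # small item waits in the buffer
b.tick(now_ms=400)                # 380 ms old >= 350: time-triggered flush
```

The first batch is triggered by size and the second by age, so downstream processing sees two batches rather than three individual items.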
Deployment Manifest – Module Count 
• http | doWork | hdfs 
[Diagram: 2 http, 4 doWork and 3 hdfs module instances]
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3"
Deployment Manifest – Module Placement 
• http | doWork | hdfs 
[Diagram: 2 http instances in the WEB container group, 4 doWork and 3 hdfs module instances]
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3,module.http.criteria=groups.contains('WEB')"
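As a rough model of what the count and criteria properties ask for, the Python sketch below picks the containers that satisfy a criteria predicate and spreads the requested number of module instances over them. The `place` function, the container dictionaries, and the round-robin assignment are assumptions for illustration; the actual Spring XD placement logic may differ.

```python
from itertools import cycle


def place(module, count, containers, criteria=None):
    """Assign `count` instances of `module` to containers matching `criteria`."""
    eligible = [c for c in containers if criteria is None or criteria(c)]
    if not eligible:
        raise ValueError(f"no container matches the criteria for {module}")
    targets = cycle(eligible)   # round-robin over eligible containers (assumed)
    return [(module, next(targets)["name"]) for _ in range(count)]


# Hypothetical cluster: two containers belong to the WEB group.
containers = [
    {"name": "c1", "groups": {"WEB"}},
    {"name": "c2", "groups": {"WEB"}},
    {"name": "c3", "groups": {"HADOOP"}},
    {"name": "c4", "groups": set()},
]

# Python stand-in for module.http.criteria=groups.contains('WEB')
http = place("http", 2, containers, criteria=lambda c: "WEB" in c["groups"])

# No criteria: doWork instances may land on any container.
do_work = place("doWork", 4, containers)
```

Here both http instances end up on WEB containers only, while the four doWork instances are free to use every container.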
Deployment Manifest – Data Partitioning 
• http | doWork | hdfs 
[Diagram: 2 http, 4 doWork and 3 hdfs module instances; WEB container group]
stream deploy --name s1 --properties "...,module.http.producer.partitionKeyExpression=payload.customerId"
Each doWork module instance will always process the same set of customer IDs.
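The reason a partition key pins each customer to one doWork instance can be shown with a small Python sketch: the key is hashed and reduced modulo the number of consumer instances, so equal keys always yield the same index. The `partition_for` function and the use of CRC32 are stand-ins chosen for a deterministic example, not the hashing Spring XD itself applies; the sample messages are invented.

```python
import zlib


def partition_for(key, partition_count):
    """Map a key to a partition index; equal keys always map to the same index."""
    return zlib.crc32(str(key).encode("utf-8")) % partition_count


DO_WORK_INSTANCES = 4   # matches module.doWork.count=4 on the earlier slide

# Hypothetical payloads carrying the customerId used as the partition key.
messages = [
    {"customerId": "alice", "amount": 10},
    {"customerId": "bob", "amount": 5},
    {"customerId": "alice", "amount": 7},
    {"customerId": "carol", "amount": 1},
    {"customerId": "bob", "amount": 3},
]

seen_by = {}   # partition index -> customer IDs routed to that instance
for msg in messages:
    index = partition_for(msg["customerId"], DO_WORK_INSTANCES)
    seen_by.setdefault(index, set()).add(msg["customerId"])
```

Every customer ID appears under exactly one index in `seen_by`, which is what lets a doWork instance keep per-customer state locally.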
Learn More 
• Project: http://projects.spring.io/spring-xd/ 
• GitHub: https://github.com/spring-projects/spring-xd/ 
• Wiki: https://github.com/spring-projects/spring-xd/wiki 
• Samples: https://github.com/spring-projects/spring-xd-samples 
A NEW PLATFORM FOR A NEW ERA

Big Data Applications Made Easy: Fact Or Fiction?


Editor's Notes

  • #3 Big Data Overview: Everything starts with data! Let’s look at the 4 V’s of Big Data. Volume: data generation is at massive scale. Velocity: the need for data agility is mandatory. Veracity: bad data quality poses enormous risk. Variety: heterogeneous data requirements.
  • #4 Flume, Storm, Spark, Oozie. List the top challenges. Hadoop isn’t always the target; it may be Mongo, an RDBMS, Redis, an in-memory data grid, or a stream to a microservice.
  • #5 Pitch Spring XD! Relate to the discussed problem and progress to the next slide for solutions.
  • #6 Let’s see how Spring XD tackles the described challenges. Demo (http client to hdfs):
    stream create foo --definition "http | hdfs --rollover=11" --deploy
    http post --target http://localhost:9000 --data "hello world"
    hadoop fs ls /xd/foo
    hadoop config fs --namenode hdfs://localhost:8020
  • #7 Brief overview on Spring IO platform.
  • #8 Architecture overview.
  • #14 http client, filter, hdfs; http client, filter, rdbms; http client, filter, count; on hdfs, a job to move that data to Mongo.
  • #17 Closer look at Spring XD’s business value proposition: unified runtime, runtime features, productivity, closed-loop analytics, enterprise data pipelines, data-centric use cases.
  • #18 Easy to set up.
  • #19 Even on YARN, it’s that SIMPLE!
  • #20 A
  • #21 Spring Reactor’s NIO and async dispatcher fit the Spring XD model naturally.