Big Data Applications Made Easy: Fact Or Fiction?

Spring XD
Glenn Renfro
grenfro@pivotal.io
@CPPWFS

Volume, Velocity, Variety, Veracity
60-100 sensors in each car
22 billion sensors by 2020
420 million wearables
90% of enterprise data is unstructured
500 million tweets each day
2.3 trillion GBs of data each day
86% suspect data inaccuracy
30% revenue loss due to bad data quality
Data Points: McKinsey, Twitter, Gartner, IBM

Batch and Streaming often handled by multiple platforms
Fragmented Big Data Ecosystem
Not all data is Hadoop bound

SPRING XD
EXTREME DATA
“One stop shop for developing and deploying Big Data Applications”

Spring XD to the Rescue
Batch and Streaming often handled by multiple platforms
Fragmented Big Data Ecosystem
Not all data is Hadoop bound
• Unified Stream and Batch Operations
• Hadoop Batch Workflow Orchestration
• Predictive Analytics and Model Scoring
• Portable: on-prem, YARN, EC2, PCF, Mesos, Docker, etc.
• Easy to Use, Extend and Integrate with other Technologies
• Built on proven Spring EAI and Batch projects
(Volume, Velocity, Veracity, and Variety)

IO EXECUTION
XD: Stream, Taps, Jobs
BOOT: Bootable, Minimal, Ops-Ready
GRAILS: Full-stack, Web
IO COORDINATION
SPRING CLOUD
IO FOUNDATION
INTEGRATION: Channels, Adapters, Filters, Transformers
BATCH: Jobs, Steps, Readers, Writers
BIG DATA: Ingestion, Export, Orchestration, Hadoop
WEB: Controllers, REST, WebSocket
DATA: Relational Data Access, Non-Relational Data Access
SPRING CORE: Framework, Security, Groovy, Reactor

Spring XD - 10,000 Foot View

Streams
Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Groovy, Python, Java
Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters

Create a stream with http as a source and hdfs as a sink. The hdfs --rollover option is set to a small value so that we can read the file on HDFS.

Spring XD - Distributed Runtime
XD Shell: HTTP POST /streams/aStream “M1 | M2”
XD Admin (leader), XD Admin, XD Admin
Container State
XD Container, XD Container
Message Bus
ZooKeeper
Spring App Context: M1, M2

Spring XD - Analytics
• Counters and Gauges
  • Simple & Field Value Counter (how many tweets for #java)
  • Aggregate Counter (how many tweets for #java in the week/day/hr)
  • Gauge & Rich Gauge (how many requests / minute?)
  • Abstract API with Redis and in-memory implementations
• Predictive Model Evaluation
  • JPMML
  • Is this transaction fraudulent?
  • What group does this user belong to?
  • Interoperable with R, Rattle, KNIME, RapidMiner, MADLib
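The counter types listed above can be illustrated with a small in-memory sketch. This is not Spring XD's implementation or API; the class names, the bucket format, and the sample tweets are all hypothetical, chosen only to show the difference between counting distinct field values and counting one event over time buckets.

```python
from collections import Counter, defaultdict
from datetime import datetime


class FieldValueCounter:
    """Counts how often each distinct value of a field has been seen."""

    def __init__(self):
        self.counts = Counter()

    def increment(self, value):
        self.counts[value] += 1


class AggregateCounter:
    """Keeps a running total plus per-hour and per-day buckets."""

    def __init__(self):
        self.total = 0
        self.by_hour = defaultdict(int)
        self.by_day = defaultdict(int)

    def increment(self, when):
        self.total += 1
        self.by_hour[when.strftime("%Y-%m-%d %H")] += 1
        self.by_day[when.strftime("%Y-%m-%d")] += 1


# Hypothetical tweets: (timestamp, list of hashtags)
tweets = [
    (datetime(2015, 4, 1, 9, 15), ["java", "spring"]),
    (datetime(2015, 4, 1, 9, 45), ["java"]),
    (datetime(2015, 4, 2, 17, 5), ["bigdata", "java"]),
]

hashtags = FieldValueCounter()
java_over_time = AggregateCounter()
for when, tags in tweets:
    for tag in tags:
        hashtags.increment(tag)       # "how many tweets for #java"
    if "java" in tags:
        java_over_time.increment(when)  # "... in the day/hr"
```

The field value counter answers "which hashtag is most common", while the aggregate counter answers "how did #java trend per hour or day". A backing store such as Redis would replace the dictionaries here.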

Jobs
CSV to JDBC
FTP to HDFS
JDBC to HDFS
HDFS to JDBC
HDFS to MongoDB

[Architecture diagram]
Data sources: SENSORS, SOCIAL, MOBILE, FILES
Layers: SPEED LAYER, BATCH LAYER, SERVING LAYER
Spring XD functions: Stream Processing, Analytics, Predictive Modeling, Ingest, Workflow Orchestration, Export
Components: Spring XD, Spring BOOT, GemFire XD, MASTER DATASET
Views: REALTIME VIEWS, BATCH VIEWS
Deployment: PCF - BOSH Service, PCF - Apps

Unified runtime for both Real-time and Batch use cases
Scalable, Distributed and Fault Tolerant Runtime
Increased Productivity through out-of-the-box components
Closed Loop Analytics through online (stream) and offline (batch) data
Swiss-army knife of data movement and data pipelines
Repeatable ‘turnkey’ solution for next generation data-centric use cases

Agility: Easy to Set Up and Run
Writing HTTP Data to HDFS
…that simple!

Spring XD on YARN
Spring XD Running on YARN!
Copies Files to
Creates HDFS
manifest.yml
Spring Boot App
‘xd-yarn start admin’
Spring Boot App
‘xd-yarn start container’
Spring Boot App

Even easier with PCF

Natural Fit: Reactive Streaming Pipelines
Moving Average: ‘collect values every 500ms’
Non-Blocking Backpressure
OLD: “take all these items I have whether you can handle them or not”
NEW: “give me the next N available items”
Microbatching: ‘either 1024b or 350ms; trigger downstream processing’
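The microbatching rule on this slide (flush when either a size or an age threshold is reached) can be sketched as follows. This is an illustrative, library-free sketch, not Reactor or Spring XD code: the `MicroBatcher` class and its method names are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic. One simplification to note: the age check only runs when an item arrives, whereas a real implementation would also flush from a timer.

```python
class MicroBatcher:
    """Buffers items and flushes when a byte or an age threshold is hit."""

    def __init__(self, max_bytes=1024, max_age_ms=350):
        self.max_bytes = max_bytes
        self.max_age_ms = max_age_ms
        self.buffer = []
        self.buffered_bytes = 0
        self.opened_at_ms = None  # arrival time of the first buffered item

    def offer(self, item, now_ms):
        """Add one bytes item; return a finished batch, or None if still filling."""
        if self.opened_at_ms is None:
            self.opened_at_ms = now_ms
        self.buffer.append(item)
        self.buffered_bytes += len(item)
        too_big = self.buffered_bytes >= self.max_bytes
        too_old = now_ms - self.opened_at_ms >= self.max_age_ms
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        """Hand the current batch downstream and start an empty one."""
        batch, self.buffer = self.buffer, []
        self.buffered_bytes = 0
        self.opened_at_ms = None
        return batch


batcher = MicroBatcher()
first = batcher.offer(b"x" * 600, now_ms=0)    # 600 bytes buffered, keeps filling
second = batcher.offer(b"y" * 600, now_ms=10)  # 1200 bytes: size trigger fires
third = batcher.offer(b"z", now_ms=20)         # new batch opened at t=20
fourth = batcher.offer(b"z", now_ms=400)       # batch is 380 ms old: time trigger fires
```

Each returned batch is what would be handed to downstream processing in one step, which keeps per-item overhead low without letting slow trickles of data wait indefinitely.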

Deployment Manifest – Module Count
• http | doWork | hdfs
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3"
Deployed instances: http ×2, doWork ×4, hdfs ×3

Deployment Manifest – Module Placement
--properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3,module.http.criteria=groups.contains('WEB')"
Deployed instances: http ×2 (on containers in group WEB), doWork ×4, hdfs ×3
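To make the effect of module counts and placement criteria concrete, here is a small sketch of how instances might be assigned to containers. It is not Spring XD's deployment algorithm: the `place_modules` function, the container names, and the round-robin policy are assumptions for illustration, and the criteria check is reduced to simple group membership rather than a full expression.

```python
from itertools import cycle


def place_modules(containers, counts, required_group=None):
    """Assign module instances to containers round-robin.

    containers: dict of container name -> set of group names
    counts: dict of module name -> number of instances to deploy
    required_group: dict of module name -> group a container must belong to
    """
    required_group = required_group or {}
    placement = {}
    for module, count in counts.items():
        group = required_group.get(module)
        # Only containers that satisfy the module's criteria are eligible.
        eligible = sorted(
            name for name, groups in containers.items()
            if group is None or group in groups
        )
        if not eligible:
            raise ValueError(f"no container matches criteria for {module}")
        targets = cycle(eligible)
        placement[module] = [next(targets) for _ in range(count)]
    return placement


# Hypothetical cluster: two containers in group WEB, two without a group.
containers = {"c1": {"WEB"}, "c2": {"WEB"}, "c3": set(), "c4": set()}
counts = {"http": 2, "doWork": 4, "hdfs": 3}
placement = place_modules(containers, counts, {"http": "WEB"})
```

With these inputs the two http instances land only on the WEB containers, while doWork and hdfs are spread across every container.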

Deployment Manifest – Data Partitioning
--properties
...
module.http.producer.partitionKeyExpression=payload.customerId
Deployed instances: http ×2 (on containers in group WEB), doWork ×4, hdfs ×3
Each doWork instance will always process the same set of customer IDs
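The partitioning guarantee above follows from mapping each key deterministically to one consumer instance. The sketch below shows the general idea with a hash-modulo scheme; the slide does not state which hash Spring XD uses, so CRC32 is a stand-in, and `partition_for` and the sample messages are hypothetical.

```python
import zlib


def partition_for(key, partition_count):
    """Map a partition key to an instance index in [0, partition_count)."""
    # CRC32 is used because it is stable across processes and runs.
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % partition_count


# Hypothetical payloads; customerId plays the role of the partition key.
messages = [
    {"customerId": "alice", "amount": 10},
    {"customerId": "bob", "amount": 25},
    {"customerId": "alice", "amount": 7},
    {"customerId": "carol", "amount": 3},
    {"customerId": "bob", "amount": 1},
]

DO_WORK_INSTANCES = 4  # matches module.doWork.count=4 on the earlier slide
routed = {}
for message in messages:
    index = partition_for(message["customerId"], DO_WORK_INSTANCES)
    routed.setdefault(index, []).append(message["customerId"])
```

Because the index depends only on the key and the instance count, every message for a given customer reaches the same doWork instance, which is what allows that instance to keep per-customer state locally.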

Learn More
• Project: http://projects.spring.io/spring-xd/
• GitHub: https://github.com/spring-projects/spring-xd/
• Wiki: https://github.com/spring-projects/spring-xd/wiki
• Samples: https://github.com/spring-projects/spring-xd-samples

A NEW PLATFORM FOR A NEW ERA
