Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbend Platform

3PAR Streaming Journey
Chris McDermott
January 2020

Agenda
1. The 3PAR use case
2. Batch to Streaming transition
3. Reactive architecture, micro-services and event sourcing

The 3PAR Use case
• 50K 3PAR Storage Arrays (SAs)
• Different data types have different arrival rates
• 500 GB/day on average
• Need to process 10x bursts
• 10K sensors per SA
• ~10 sources of enhancement data
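As a rough sizing aid, the volume figures above can be turned into sustained throughput. This is a back-of-the-envelope sketch only: it assumes decimal gigabytes and an even spread of data across the day, neither of which is stated on the slide.

```python
# Back-of-the-envelope ingest sizing from the figures on this slide.
# Assumptions (not from the slides): 1 GB = 10**9 bytes, and the daily
# volume is spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate_mb_per_s(gb_per_day: float) -> float:
    """Convert a daily volume in GB into an average rate in MB/s."""
    return gb_per_day * 1e9 / SECONDS_PER_DAY / 1e6

avg = average_rate_mb_per_s(500)  # 500 GB/day average, from the slide
burst = 10 * avg                  # 10x burst requirement, from the slide

print(f"average: {avg:.1f} MB/s, 10x burst: {burst:.1f} MB/s")
# prints "average: 5.8 MB/s, 10x burst: 57.9 MB/s"
```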

The 3PAR Use case (continued)
• Joins!
• Analytics
• Statistical aggregations
• Prediction
• Projections
• Automated case management
• Legacy Integrations
• Create event sources from non-eventing services using Reactive facades

Batch Vs Streaming
[Figure: two timelines.
Batch: the input data for each day (Day 1 … Day N) is processed into data after a time lag.
Streaming: the input data at each time (T 1 … Time N) is processed into data with no time lag.]

Batch Processing
• Serial processing of all data on a regular cadence: Gather (1 hour) → Process (4 hours) → Push* (30 hours)
• As the amount of history and the number of systems increase, each stage of the pipeline takes longer to run
• Failures take a long time to recover from (Push failure at the 20-hour mark…)
• Large quanta mean that repeating failed changes takes a long time
• Based on Spark, so monitoring is sub-par
* Push = project data to new Elasticsearch index

Streaming Processing
• Parallel processing of (relatively) small chunks of data (per SA) as soon as the data is received
• Data lag, or data freshness, is always consistent at less than 5 minutes. New data is always processed as soon as it is available.
• Failure recovery is extremely fast
  o When running at 10x line rate, full recovery is roughly 10% of outage time (2-day outage is recovered in ~5 hours)
  o Can dynamically apply more resources to increase processing performance
• Based on Lagom/Akka, which provide a built-in metrics and reporting framework
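The recovery figure quoted above (a 2-day outage cleared in about 5 hours at 10x line rate) can be checked with a simple catch-up model. The model is an assumption, not something stated in the slides: data keeps arriving at the normal rate during recovery, so the backlog shrinks at the processing rate minus the arrival rate.

```python
def catch_up_hours(outage_hours: float, speedup: float) -> float:
    """Hours needed to clear the backlog left by an outage.

    Assumed model (not from the slides): data keeps arriving at the normal
    line rate (1x) while the system processes at `speedup` times that rate,
    so the backlog shrinks by (speedup - 1) hours of data per hour.
    """
    if speedup <= 1:
        raise ValueError("speedup must exceed 1x or the backlog never clears")
    return outage_hours / (speedup - 1)

hours = catch_up_hours(outage_hours=48, speedup=10)
print(f"{hours:.1f} h, {hours / 48:.0%} of the outage")
# prints "5.3 h, 11% of the outage"
```

Under this model the result lands close to the "roughly 10%" and "~5 hours" quoted on the slide.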

Reactive Architecture and Technologies

Reactive Architecture
“Systems built as Reactive Systems are more flexible, loosely-coupled and scalable. This makes them easier to develop
and amenable to change. They are significantly more tolerant of failure and when a failure does occur they meet it with
elegance rather than disaster. Reactive Systems are highly responsive, giving users effective interactive feedback.”
Reactive Systems Are:
• Responsive: The system responds in a timely manner if at all possible.
• Resilient: The system stays responsive in the face of failure. This applies not only to highly-available,
mission-critical systems — any system that is not resilient will be unresponsive after a failure.
• Elastic: The system stays responsive under varying workload. Reactive Systems can react to changes
in the input rate by increasing or decreasing the resources allocated to service these inputs.
• Message Driven: Reactive Systems rely on asynchronous message-passing to establish a boundary
between components that ensures loose coupling, isolation and location transparency.
Summarized from the Reactive Manifesto

3PAR Streaming – Technology Stack
Apache Kafka – durable, elastic, fault-tolerant, log-based message bus
Apache Cassandra – durable, elastic, fault-tolerant NoSQL database
Elasticsearch – durable, elastic, fault-tolerant document store optimized for search
Apache NiFi – dataflow automation with graphical programming interface
Akka – a toolkit for building highly concurrent, distributed, and resilient message-driven applications
Play – web-based (REST) application framework based on Akka

3PAR Streaming – Technology Stack
Lightbend Commercial Components
• Split Brain Resolver
• Telemetry
• Thread Starvation Detector

3PAR Simplified Streaming Component Architecture
[Diagram: component labels recovered from the slide: Ingest, Support Tickets, Entitlement, ML Application, NiFi, Support Lagom, Entitlement Lagom, …, StoreServ Akka, ES Projector, Elasticsearch, StoreServ API, InfoSight UI.]

Akka
StoreServ Akka
• Device shadow model
• Stores raw data in Cassandra (data lake)
• Stores Actor State in Cassandra
• Actors cache most recent data in memory for very low latency
• Actors are rehydrated from State in Cassandra
• Actors are not passivated
• Scale out by adding more instances when running out of heap

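The device shadow idea in the list above can be sketched in a few lines. This is a minimal illustration in plain Python, not HPE's Akka code: the `DeviceShadow` class, the sensor name, and the in-memory `STATE_STORE` dictionary (standing in for the Cassandra state table) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-in for the Cassandra state table: array id -> saved state.
STATE_STORE: Dict[str, Dict[str, float]] = {}

@dataclass
class DeviceShadow:
    """In-memory shadow of one storage array (illustrative only)."""
    array_id: str
    latest: Dict[str, float] = field(default_factory=dict)

    @classmethod
    def rehydrate(cls, array_id: str) -> "DeviceShadow":
        # Rebuild the shadow from persisted state, or start empty.
        return cls(array_id, dict(STATE_STORE.get(array_id, {})))

    def on_reading(self, sensor: str, value: float) -> None:
        # Update the in-memory cache, then persist a copy of the new state.
        self.latest[sensor] = value
        STATE_STORE[self.array_id] = dict(self.latest)

    def read(self, sensor: str) -> Optional[float]:
        # Served from memory: no database round trip on the read path.
        return self.latest.get(sensor)

shadow = DeviceShadow.rehydrate("array-001")
shadow.on_reading("cpu_busy_pct", 41.5)

restarted = DeviceShadow.rehydrate("array-001")  # e.g. after a node restart
print(restarted.read("cpu_busy_pct"))  # prints 41.5
```

Because the shadows stay resident rather than being passivated, memory use grows with the number of arrays per instance, which is consistent with the heap-driven scale-out noted above.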
Akka vs Lagom
Various stateful microservices are written using Lagom
• Lagom makes sense for event-driven microservices
  o If you can store the entire event history and rebuild the read-side from it in a reasonable amount of time (event sourcing)
  o Most use cases fall into this category
• Plain Akka makes more sense if you can’t afford to save the entire event history, if rebuilding the read-side from the event history is too expensive, or if you simply don’t need the entire event history to rebuild the read-side
  o Persisted the entire read-side (Kafka)
  o Still CQRS (but not event-sourced)
ES Projector
• Akka Streams application
• Reads full data model
• Creates “role” based projections of data into Elasticsearch
StoreServ API
• Uses Play Framework
• Basically provides a thin wrapper over Elasticsearch queries
• Modifies client queries to enforce access control (both tenancy and role restrictions)
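One way such query rewriting could work is to wrap the client's query in an Elasticsearch `bool` query and add filter clauses for tenancy and role. The sketch below is an assumption about the approach, not the StoreServ API's actual code, and the field names `tenant_id` and `visible_to_roles` are hypothetical.

```python
import copy

def enforce_access(client_request: dict, tenant_id: str, roles: list) -> dict:
    """Return a copy of the request whose query only matches permitted docs.

    Hypothetical field names: `tenant_id`, `visible_to_roles`.
    """
    secured = copy.deepcopy(client_request)  # never mutate the caller's request
    original = secured.get("query", {"match_all": {}})
    secured["query"] = {
        "bool": {
            # The client's own query still drives matching and scoring.
            "must": [original],
            # Filter clauses restrict results without affecting the score.
            "filter": [
                {"term": {"tenant_id": tenant_id}},
                {"terms": {"visible_to_roles": roles}},
            ],
        }
    }
    return secured

client = {"query": {"match": {"status": "degraded"}}, "size": 10}
secured = enforce_access(client, tenant_id="acme", roles=["storage-admin"])
print(secured["query"]["bool"]["filter"])
```

Putting the restrictions in `filter` rather than `must` keeps them out of relevance scoring, so results are ranked only by what the client asked for.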

What was gained?
• Responsive: InfoSight is updated in near real-time
  o Reduced lag gives customers greater confidence
  o Allows automated support (problems can be remediated sooner: outages are prevented)
• Resilient: InfoSight is more reliable
  o Microservice isolation means the system degrades instead of failing totally
• Elastic: Containerization and scale-out technologies
  o The system can easily be scaled to account for growth and new processing
• Message Driven: Well-defined boundaries and client-managed message consumption
  o Isolated components are more easily understood
  o New components can be added without changing the underlying architecture

Bottom Line
• Customer satisfaction is increased
• HPE costs are decreased

Data Platform
Goals
• Allow the on-boarding of a disparate set of product lines quickly and efficiently
• Data-lake to share data across internal organizations
• Support exploratory analytics – Data democracy
• Access Control and Multitenancy (RBAC)
• Uniform Data Access API
• Support for ML workflows

Data Platform
• Scala
• Kubernetes
• S3
• Delta-Lake
• Spark
• Akka
• Kafka
• Cassandra/Elasticsearch/PostgreSQL

HPE is Hiring
Click on the links below to see job descriptions – note: the URLs are subject to change.
For the latest information, visit https://careers.hpe.com
https://careers.hpe.com/job/Hewlett-Packard-Enterprise-Andover-Massachusetts/91589424
You may apply through the career site or send resumes directly to victor.volpe@hpe.com
