When Streaming Becomes Strategic

© 2016 MapR Technologies

Today’s Presenters
Jack Norris
SVP – Data Applications
@Norrisjack
Robin Bloor
Chief Analyst and Co-founder
@robinbloor

Agenda
• The data movement issue
• Hadoop and the problem of gravity
• Streaming architecture in big data
• The Event-Insight-Outcome framework
• Introduction to MapR Streams
• Customer case studies

The Data Movement Issue
• It has always been necessary to move data because centralized computing does not scale
• Data volumes grow at 50-60% per annum, and hence have increasing inertia (gravity)
• Data is born distributed and its level of distribution will increase with time
• Processing data in flight (stream processing) is becoming both common and necessary for some applications
• Hadoop’s HDFS, the first truly scalable file system, also has scalability limits
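
To make the 50-60% annual growth figure concrete, the sketch below works out what compound growth at those rates implies. The function names are illustrative only, and it assumes the rate stays constant from year to year, which the slide does not claim.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years for a volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def growth_multiple(annual_growth: float, years: int) -> float:
    """Factor by which a volume grows after a number of years."""
    return (1 + annual_growth) ** years

# The two ends of the 50-60% range quoted on the slide.
for rate in (0.50, 0.60):
    print(
        f"{rate:.0%} per annum: doubles every "
        f"{doubling_time_years(rate):.2f} years, "
        f"grows {growth_multiple(rate, 5):.1f}x in 5 years"
    )
```

At these rates the volume doubles in well under two years, which is one way to read the "increasing inertia" point: the longer data sits in one place, the more costly it becomes to move.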

The Streaming Dynamic
• We no longer process transactions; we process events (click-streams, log files, IoT, etc.)
• Batching is a software process used for the sake of efficiency.
• Batches are becoming micro-batches and streams.
• Some analytic applications can only work by processing streams.
• OLTP applications are stream processing of a kind.
• But fast queries on large data collections require data pools and data lakes with powerful query engines above them.
• So the “database/data warehouse” does not vanish.
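
The progression from batches to micro-batches to streams can be sketched as a single knob: the number of events grouped before processing. The helper below is a hypothetical illustration, not part of any product API; with a size of 1 it degenerates into per-event processing.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def micro_batches(events: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group an event sequence into lists of at most `size` events."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # source exhausted
        yield batch

clicks = ["c1", "c2", "c3", "c4", "c5"]  # made-up click-stream events

# Micro-batching: process a couple of events at a time.
for batch in micro_batches(clicks, 2):
    print("micro-batch:", batch)

# Size 1 behaves like event-at-a-time stream processing.
for batch in micro_batches(clicks, 1):
    print("event:", batch[0])
```

Larger sizes trade latency for efficiency, which is the reason batching exists in the first place.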

What is an Event?
An event is an action or occurrence detected by a program. Events can be user actions (such as clicking a link on a web page or selling a stock), sensor actions (such as reading a temperature), or system occurrences (such as a server crash).
Examples:
• Retail: Item sold, item out of stock, payment accepted, payment rejected etc.
• Telco: Call initiated, call ended, call dropped etc.
• IoT: Temperature reading, pressure reading, moisture level reading etc.
• Healthcare: Vital signs unstable, patient released, patient billed, image taken etc.
• IT: System crash, unauthorized access, login failed etc.
• Automobile: Engine error detected, tyre pressure low etc.
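
One way to picture the definition above is as a small record type. The sketch below is an assumption for illustration: the field names (`source`, `kind`, `payload`, `occurred_at`) and the sample values are invented, not taken from the presentation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

@dataclass(frozen=True)
class Event:
    """A detected action or occurrence (hypothetical shape)."""
    source: str   # vertical or system that emitted it, e.g. "iot"
    kind: str     # what happened, e.g. "temperature_reading"
    payload: Dict[str, Any] = field(default_factory=dict)
    # Timestamp defaults to the moment the record is created, in UTC.
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Two of the examples listed above, with made-up payloads.
reading = Event(source="iot", kind="temperature_reading",
                payload={"celsius": 21.5})
dropped = Event(source="telco", kind="call_dropped",
                payload={"call_id": "hypothetical-123"})

print(reading.kind, reading.payload["celsius"])
print(dropped.kind, dropped.occurred_at.isoformat())
```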

The Way We Were
• This is beginning to fail because it doesn’t scale and it is expensive
• The architectural weakness is in the staging and in a central data repository
• Staging became a problem because of unstructured data
[Diagram labels: Transactional Systems, File(s), Data Staging, ETL, Data Warehouse, Data Marts, Queries]

The Data Lake Concept
Data Lake Applications
• ETL – data acquisition
• Data Lineage – for analytic usage
• Metadata Discovery – external data (at least)
• Metadata Management – the data catalog
• Governance – many aspects
• Life Cycle Management – to archive or deletion
• MDM – business glossary or ontology
• ETL – to data engines
• Direct Applications
This is far too much work for a single Hadoop instance
[Diagram labels: Static Data Sources, Data Streams, Collect, Data Prep, Data Lake or Hub, MetaData Discovery, MetaData Management, Data Cleansing, Data Lineage, MDM, ETL, Life Cycle Mgt, Governance, Analytics or BI Apps, Data Warehouse or Big Data DBMS]

The Likely Future
• This is a remarkably flexible architecture allowing for both data distribution and concentration
• It accommodates streaming, normal application latency, and the high concurrency that occurs with databases
• It genuinely scales
[Diagram labels: Data Sources, Database Queries, Data Lake Apps, Streaming Apps, Hadoop Instance (x3), Fast Pipe (x2), Urgent Streams]

Event-Insight-Outcome
Increased computer power reduces latencies, compressing time:
● Event = action or transaction
● Insight = analysis
● Outcome = business result
This can move to:
● Event = action
● Insight = predictive analysis of action and response
● Event = transaction
● Insight = analysis of aggregation
● Outcome = different business result
Analytics is gradually becoming part of the business transaction rather than a later activity.
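
A minimal sketch of analytics sitting inside the transaction: each event is analysed as it arrives and the result decides the outcome immediately. The payment scenario, the threshold rule, and every name here are assumptions made for illustration; a real insight step would use a proper model.

```python
from typing import Iterable, List

def insight(amount: float, history: List[float], factor: float = 3.0) -> bool:
    """Insight: does this amount look anomalous against past amounts?"""
    if not history:
        return False  # nothing to compare against yet
    mean = sum(history) / len(history)
    return amount > factor * mean

def outcome(is_anomalous: bool) -> str:
    """Outcome: the business action taken for this event."""
    return "hold_for_review" if is_anomalous else "approve"

def handle(amounts: Iterable[float]) -> List[str]:
    """Event loop: analyse each payment before it completes."""
    history: List[float] = []
    decisions: List[str] = []
    for amount in amounts:  # Event
        decisions.append(outcome(insight(amount, history)))
        history.append(amount)
    return decisions

print(handle([20.0, 25.0, 22.0, 400.0, 21.0]))
```

The decision is made per event rather than in a later batch report, which is the shift the slide describes.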

The Event-Insight-Outcome Framework

Events, Insights, and Outcomes – A Framework
[Diagram labels: Events, Fast Insights, Historical Perspective, Deep Insights, Real-time Actions, Business Outcomes]

Applying EIO Framework to a Customer or Vertical
Questions to ask:
• What “events” are most relevant?
• What business benefits accrue by spotting trends, anomalies, and patterns in real time?
• What insights can be gleaned by looking at trends, anomalies, and patterns in a deeper, more historical context?
• What business actions need to be the result of these insights?

MapR Streams and the Converged Data Platform

Without a Converged Platform
[Diagram labels: Open Source, Database, Streams, Enterprise Storage, Batch Loads, Real Time Apps, Streaming Sources]

The Converged Big Data Platform
[Diagram labels: Open Source, Streams, Enterprise Storage, Database, MapR Converged Big Data Platform]

Life with a Converged Platform
[Diagram layers]
• Sources/Apps, Bulk Processing, Stream Processing
• Data: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• Enterprise-Grade Platform Services: Global Namespace, High Availability, Data Protection, Self-healing, Unified Security, Real-time, Multi-tenancy

MapR Streams: Global Pub-sub Event Streaming System for Big Data
• “Publish” means writing events to MapR Streams topics
• “Subscribe” means reading events from MapR Streams topics
• Guaranteed, immediate delivery to all consumers.
• Tie together geo-dispersed clusters. Worldwide.
• Standard real-time API (Kafka).
• Integrates with Spark Streaming, Storm, Apex, and Flink
[Diagram labels: Producers, Stream, Topic, Topic Replication, Consumers, Remote sites and consumers, Batch analytics]
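
The publish and subscribe semantics described above can be sketched with a toy in-memory model. This is not the MapR Streams or Kafka API: `ToyStream`, `publish`, and `poll` are hypothetical names, and the model ignores partitioning, persistence, and replication. It only shows that each topic is an append-only log and that every consumer reads all events at its own pace.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class ToyStream:
    """In-memory stand-in for a stream holding named topics."""

    def __init__(self) -> None:
        # topic name -> append-only list of events
        self._topics: Dict[str, List[str]] = defaultdict(list)
        # (consumer, topic) -> index of the next event to read
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: str) -> int:
        """Append an event to a topic and return its offset."""
        log = self._topics[topic]
        log.append(event)
        return len(log) - 1

    def poll(self, consumer: str, topic: str) -> List[str]:
        """Return the events this consumer has not yet read."""
        log = self._topics[topic]
        start = self._offsets[(consumer, topic)]
        self._offsets[(consumer, topic)] = len(log)
        return log[start:]

stream = ToyStream()
stream.publish("sensor-readings", "temp=21.5")
stream.publish("sensor-readings", "temp=22.1")

# Two independent consumers each receive every event.
first_for_alerts = stream.poll("alerts", "sensor-readings")
first_for_batch = stream.poll("batch-analytics", "sensor-readings")

stream.publish("sensor-readings", "temp=35.0")
second_for_alerts = stream.poll("alerts", "sensor-readings")

print(first_for_alerts, first_for_batch, second_for_alerts)
```

Because each consumer tracks its own offset, a slow batch consumer does not hold back a real-time one.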

MapR Streams Benefits
Simpler and Faster Architecture
• A converged platform with file storage and database reduces data movement, data latency, hardware cost, and administration cost
• Event streaming and stream processing in the same cluster enable faster processing
• A unified security framework shared with files and database tables reduces the administration cost of setting up and enforcing security policies
• Multi-tenancy (topic isolation, quotas, data placement control) allows multiple isolated streaming applications to run on the same cluster, reducing hardware cost and data movement

MapR Streams Benefits
• Global data replication enables disaster recovery
• One unified view of all data created and distributed
• Ingest more events to enable faster insights, and hold on to events longer to enable deeper insights

Use Cases

Altitude Digital

Largest Biometric Database in the World
1.2B PEOPLE

Yield Management Optimization
Global Semiconductor Company

National Oilwell Varco (NOV)

Transforming the Health Care Ecosystem
Electronic Medical Records
[Diagram labels: JSON DB (MapR-DB), Graph DB (Titan on MapR-DB), Search Engine (Elastic-Search)]
“The Stream is the System of Record”
– Brad Anderson, VP Big Data Informatics

Q&A
Engage with us!
1. Whitepaper: When Streaming Becomes Strategic
https://www.mapr.com/when-streaming-becomes-strategic
2. Book: Streaming Architecture – Kafka and MapR Streams
https://www.mapr.com/streaming-architecture-using-apache-kafka-mapr-streams
3. Get Answers: MapR Converge Community:
https://community.mapr.com/news
