© 2016 MapR Technologies
When Streaming Becomes Strategic
Today’s Presenters
Jack Norris
SVP – Data Applications
@Norrisjack
Robin Bloor
Chief Analyst and Co-founder
@robinbloor
Agenda
• The data movement issue
• Hadoop and the problem of gravity
• Streaming architecture in big data
• The Event-Insight-Outcome framework
• Introduction to MapR Streams
• Customer case studies
The Data Movement Issue
• It has always been necessary to move data because centralized computing does not scale
• Data volumes grow at 50-60% per annum, so data has increasing inertia (gravity)
• Data is born distributed, and its level of distribution will increase with time
• Processing data in flight (stream processing) is becoming both common and necessary for some applications
• Hadoop’s HDFS, the first truly scalable file system, also has scalability limits
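To put the 50-60% annual growth figure in perspective, the short calculation below works out the implied doubling time. It is only a sketch: it assumes steady compound growth, which the slide does not state, and the function name is illustrative.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a data volume to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


# The two ends of the range quoted on the slide.
for rate in (0.50, 0.60):
    years = doubling_time_years(rate)
    print(f"{rate:.0%} growth per year: volume doubles in about {years:.2f} years")
```

Under that assumption, data volume doubles in well under two years at either end of the range, which is one way to read the "increasing inertia" point.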
The Streaming Dynamic
• We no longer process transactions; we process events (click-streams, log files, IoT, etc.)
• Batching is a software process used for the sake of efficiency.
• Batches are becoming micro-batches and streams.
• Some analytic applications can only work by processing streams.
• OLTP applications are stream processing of a kind.
• But fast queries on large data collections require data pools and data lakes with powerful query engines above them.
• So the “database/data warehouse” does not vanish.
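The shift from batches to micro-batches to streams can be sketched in a few lines. The example below is a minimal illustration, not how any particular engine works: it groups events by count, whereas real stream processors often group by time interval, and the function name and sample events are made up.

```python
from itertools import islice
from typing import Iterable, Iterator


def micro_batches(events: Iterable, batch_size: int) -> Iterator[list]:
    """Group a (possibly unbounded) event iterator into small fixed-size batches."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the source is exhausted
        yield batch


# Hypothetical click-stream events, generated lazily.
clicks = ({"type": "click", "seq": n} for n in range(7))
for batch in micro_batches(clicks, batch_size=3):
    print(len(batch), [event["seq"] for event in batch])
```

Setting `batch_size=1` degenerates into per-event processing, which is the sense in which a stream is the limiting case of an ever smaller batch.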
What is an Event?
An event is an action or occurrence detected by a program. Events can be user actions (such as clicking a link on a web page or selling a stock), sensor actions (such as reading a temperature), or system occurrences (such as a server crash).
Examples:
• Retail: Item sold, item out of stock, payment accepted, payment rejected, etc.
• Telco: Call initiated, call ended, call dropped, etc.
• IoT: Temperature reading, pressure reading, moisture level reading, etc.
• Healthcare: Vital signs unstable, patient released, patient billed, image taken, etc.
• IT: System crash, unauthorized access, login failed, etc.
• Automobile: Engine error detected, tyre pressure low, etc.
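As a rough illustration of the definition above, an event can be modelled as a small immutable record. The field names and sample payloads below are assumptions chosen for the example; the slides do not prescribe a schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """One detected action or occurrence (hypothetical schema)."""
    source: str                      # e.g. "retail", "iot", "it"
    event_type: str                  # what happened
    payload: Dict[str, Any] = field(default_factory=dict)
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Sample events drawn from the verticals listed on the slide.
examples = [
    Event("retail", "payment_rejected", {"order_id": "A-1001", "reason": "card_expired"}),
    Event("iot", "temperature_reading", {"sensor_id": "t-17", "celsius": 21.4}),
    Event("it", "login_failed", {"user": "jdoe", "attempts": 3}),
]

for event in examples:
    print(event.source, event.event_type, event.payload)
```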
The Way We Were
• This is beginning to fail because it doesn’t scale and it is expensive
• The architectural weakness is in the staging and in a central data repository
• Staging became a problem because of unstructured data
[Diagram: Transactional Systems and File(s); Data Staging; Data Warehouse; Data Marts; Queries; three ETL steps]
The Data Lake Concept
Data Lake Applications
• ETL – data acquisition
• Data Lineage – for analytic usage
• Metadata Discovery – external data (at least)
• Metadata management – the data catalog
• Governance – many aspects
• Life Cycle Management – to archive or deletion
• MDM – business glossary or ontology
• ETL – to data engines
• Direct Applications
This is far too much work for a single Hadoop instance
[Diagram: Static Data Sources and Data Streams; Collect; Data Prep; Data Lake or Hub with Metadata Discovery, Metadata Management, Data Cleansing, Data Lineage, MDM, ETL, Life Cycle Mgt, and Governance; ETL; Analytics or BI Apps; Data Warehouse or Big Data DBMS]
The Likely Future
• This is a remarkably flexible architecture allowing for both data distribution and concentration
• It accommodates streaming, normal application latency, and the high concurrency that occurs with databases
• It genuinely scales
[Diagram: Data Sources; Urgent Streams; Streaming Apps; three Hadoop Instances; two Fast Pipes; Data Lake Apps; Database; Queries]
Event-Insight-Outcome
Increased computing power reduces latencies, compressing time
● Event = action or transaction
● Insight = analysis
● Outcome = business result
This can move to
● Event = action
● Insight = predictive analysis of action and response
● Event = transaction
● Insight = analysis of aggregation
● Outcome = different business result
Analytics is gradually becoming part of the business transaction rather than a later activity
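The idea that analytics becomes part of the transaction can be sketched as follows. This is a toy example under stated assumptions: the anomaly rule (an amount more than three times the customer's running average), the outcome labels, and the sample payments are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Per-customer history of past payment amounts (kept in memory for the sketch).
history: Dict[str, List[float]] = defaultdict(list)


def insight(customer: str, amount: float, factor: float = 3.0) -> bool:
    """Insight: is this amount unusually large compared with the customer's past?"""
    past = history[customer]
    unusual = bool(past) and amount > factor * (sum(past) / len(past))
    past.append(amount)
    return unusual


def handle(customer: str, amount: float) -> str:
    """Outcome: decided while the transaction is in flight, not in a later report."""
    return "hold_for_review" if insight(customer, amount) else "approve"


# Events: (customer, payment amount), processed in arrival order.
events = [("c1", 20.0), ("c1", 25.0), ("c1", 400.0), ("c2", 400.0)]
outcomes = [handle(customer, amount) for customer, amount in events]
print(outcomes)
```

The same 400.0 payment leads to a different outcome for each customer because the insight is computed per customer at the moment of the event.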
The Event-Insight-Outcome Framework
Events, Insights, and Outcomes – A Framework
[Diagram: Events; Fast Insights; Historical Perspective; Deep Insights; Real-time Actions; Business Outcomes]
Applying EIO Framework to a Customer or Vertical
Questions to ask:
• What “events” are most relevant?
• What business benefits accrue by spotting trends, anomalies, and patterns in real time?
• What insights can be gleaned by looking at trends, anomalies, and patterns in a deeper, more historical context?
• What business actions need to be the result of these insights?
MapR Streams and the Converged Data Platform
Without a Converged Platform
[Diagram: Open Source; Database; Streams; Enterprise Storage; Batch Loads; Real Time Apps; Streaming Sources]
The Converged Big Data Platform
[Diagram: Open Source; Streams; Enterprise Storage; Database; MapR Converged Big Data Platform]
The Converged Big Data Platform
[Diagram: MapR Converged Big Data Platform]
Life with a Converged Platform
• Processing: Sources/Apps, Bulk Processing, Stream Processing
• Data: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• Enterprise-Grade Platform Services: Global Namespace, High Availability, Data Protection, Self-healing, Unified Security, Real-time, Multi-tenancy
MapR Streams: Global Pub-sub Event Streaming System for Big Data
“Publish” means writing events to MapR Streams topics
“Subscribe” means reading events from MapR Streams topics
Guaranteed, immediate delivery to all consumers.
Tie together geo-dispersed clusters. Worldwide.
Standard real-time API (Kafka).
Integrates with Spark Streaming, Storm, Apex, and Flink
[Diagram: Producers; Stream containing Topics; Consumers; Topic Replication; Remote sites and consumers; Batch analytics]
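The publish/subscribe model described above can be illustrated with a toy in-memory stream. This sketch is not the MapR Streams or Kafka API: `ToyStream`, its methods, and the topic and consumer names are hypothetical, and it ignores partitioning, persistence, and replication. It shows only that each consumer reads the same topic at its own pace.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyStream:
    """In-memory stand-in for a stream of topics (illustration only)."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[Any]] = defaultdict(list)
        # Next unread position for each (consumer, topic) pair.
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: Any) -> None:
        """Publish: append an event to the topic's log."""
        self._topics[topic].append(event)

    def poll(self, consumer: str, topic: str) -> List[Any]:
        """Subscribe side: return events this consumer has not yet read."""
        key = (consumer, topic)
        log = self._topics[topic]
        new_events = log[self._offsets[key]:]
        self._offsets[key] = len(log)
        return new_events


stream = ToyStream()
stream.publish("sensor-readings", {"celsius": 21.4})
stream.publish("sensor-readings", {"celsius": 22.0})
dashboard_first = stream.poll("dashboard", "sensor-readings")

stream.publish("sensor-readings", {"celsius": 35.9})
dashboard_second = stream.poll("dashboard", "sensor-readings")

# A late-arriving batch consumer still sees every event from the start.
batch_job = stream.poll("batch-analytics", "sensor-readings")
print(len(dashboard_first), len(dashboard_second), len(batch_job))
```

Because events stay in the topic after being read, a real-time consumer and a batch consumer can share one stream without interfering with each other.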
MapR Streams Benefits
Simpler and Faster Architecture
• Converged platform with file storage and database reduces data movement, data latency, hardware cost, and administration cost
• Event streaming and stream processing in the same cluster enables faster processing
• Unified security framework with files and database tables reduces the administration cost of setting up and enforcing security policies
• Multi-tenancy (topic isolation, quotas, data placement control) allows multiple isolated streaming applications to run on the same cluster, reducing hardware cost and data movement
MapR Streams Benefits
• Global data replication enables disaster recovery
• One unified view of all data created and distributed
• Ingest more events to enable faster insights
• Hold on to events longer to enable deeper insights
Use Cases
Altitude Digital
Largest Biometric Database in the World
1.2B people
Yield Management Optimization
Global Semiconductor Company
National Oilwell Varco (NOV)
Transforming the Health Care Ecosystem
Electronic Medical Records
[Diagram: JSON DB (MapR-DB); Graph DB (Titan on MapR-DB); Search Engine (Elasticsearch)]
“The Stream is the System of Record”
– Brad Anderson, VP Big Data Informatics
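Treating the stream as the system of record means every downstream store is a view that can be rebuilt by replaying the stream. The sketch below illustrates that idea with plain Python structures standing in for the database table and search index. The record fields and sample updates are invented, and nothing here reflects the customer's actual implementation.

```python
from typing import Dict, List, Set

# The stream: an ordered, append-only list of record updates (hypothetical data).
record_stream: List[dict] = [
    {"patient_id": "p1", "field": "allergy", "value": "penicillin"},
    {"patient_id": "p2", "field": "blood_type", "value": "O+"},
    {"patient_id": "p1", "field": "blood_type", "value": "A-"},
    {"patient_id": "p1", "field": "allergy", "value": "penicillin, latex"},
]


def build_table(stream: List[dict]) -> Dict[str, Dict[str, str]]:
    """Database-style view: the latest value of each field per patient."""
    table: Dict[str, Dict[str, str]] = {}
    for update in stream:
        table.setdefault(update["patient_id"], {})[update["field"]] = update["value"]
    return table


def build_search_index(stream: List[dict]) -> Dict[str, Set[str]]:
    """Search-style view: each word maps to the patients whose updates mention it."""
    index: Dict[str, Set[str]] = {}
    for update in stream:
        for word in update["value"].replace(",", " ").lower().split():
            index.setdefault(word, set()).add(update["patient_id"])
    return index


table = build_table(record_stream)
index = build_search_index(record_stream)
print(table["p1"])
print(sorted(index["latex"]))
```

Each consumer derives the shape it needs from the same ordered updates, so adding a new kind of view does not require changing the producers.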
Q&A
Engage with us!
1. Whitepaper: When Streaming Becomes Strategic
https://www.mapr.com/when-streaming-becomes-strategic
2. Book: Streaming Architecture – Kafka and MapR Streams
https://www.mapr.com/streaming-architecture-using-apache-kafka-mapr-streams
3. Get Answers: MapR Converge Community:
https://community.mapr.com/news


Editor's Notes

  • #13 This is the framework enabled by the MapR Converged Data Platform. The focus is on how to compress the data-to-action cycle, which requires a convergence of capabilities.
  • #16 Contrast this with the situation presented by all other approaches. You have data duplication, more management tasks to coordinate the flow, more latency as data is staged and transferred between systems, and much more data risk because these solutions lack reliability, protection, and DR capabilities.
  • #20 Example producers: sensors, web logs, application logs, credit card transactions. Example subscribers: Spark Streaming, Storm, or even databases.
  • #24 Altitude Digital is one of the fastest-growing video advertising marketplaces, with nearly seven billion transactions per day. Altitude Digital selects in real time the best video advertisement to play at the right time for the right person. The MapR platform provides Altitude Digital with a centralized data repository based on the MapR-DB NoSQL database that serves multiple departments, driving efficiency and competitive advantage. Altitude Digital optimizes and streamlines the connection between buyers and sellers of online video and mobile advertising by providing data-driven insights on daily consumer transactions. “Every event helps us make a more intelligent decision about what advertisement we will deliver for what person, and what will make the most money for the publisher while maintaining return on investment for the advertiser,” explained Manny Puentes, CTO, Altitude Digital. “Our goal is to present the most compelling video ad in real time for every video player in the world. That’s a big challenge.” Since MapR enables Altitude Digital to house all of the data in one place, data is managed holistically instead of in departmental silos. Operations, support, sales, and product use the data repository to solve their business-related objectives. “MapR-DB NoSQL database table replication is a powerful feature that enables real-time, bi-directional updates across data centers at scale. We use table replication for discovery and real-time notifications, giving our business a competitive advantage,” said Puentes. “MapR continues to deliver innovative enterprise solutions that just work.”
  • #27 National Oilwell Varco is a $23B multinational company based in Houston and a leading worldwide provider of oil equipment, components, and services. NOV is using MapR to perform real-time analysis to optimize oil and gas drilling and production.
  • #28 They needed to make it easier for their platform to serve hospitals. Their answer was our converged platform, treating the electronic medical record as a stream. The stream itself is the system of record, and any updates are subscribed to and received by the various players and consumed as their app requires: a search index, a database table. This dramatically simplified the process and flow, with integrated security for privacy and HIPAA requirements.