© 2016 MapR Technologies
Crystal Valentine, PhD
VP of Technology Strategy
The “Next-Gen” is Now
How big data technologies are
revolutionizing the application landscape
(100,000)
(80,000)
(60,000)
(40,000)
(20,000)
-
20,000
40,000
60,000
80,000
100,000
120,000
2013 2014 2015 2016 2017 2018 2019 2020
N.G. Architecture Growth vs. Legacy Shrinkage ($M)
Total $ Growth of IT Mkt N.G. $ Growth Legacy Mkt Growth/Shrink in $
A Technology Inflection Point
Data source: IDC, Gartner; Analysis & Estimates: MapR
The Value of Data
Size
$, ₽
Value
Cost
Legacy Value Model
Net Value
Size
$, ₽
Value
Next-Gen Value Model
Cost
Net Value
OPT OPT
Data as Infrastructure
Big Data Virtuous Cycle
Collect
More
Data
Technological
Improvements to
Platform
Make Better
Decisions
Create
Efficiencies
We are only at the beginning
of a new data-first era.
The winners today are
succeeding at operationalizing
their data best.
The key is the platform.
The Next-Gen is Now
In 4 years 90% of Data will be on next-gen technology
Technological Innovation
1. Data & Compute Convergence
The Secret is to Bring Analytics & Operations Together
Converged Applications
Complete Access to Real-time and
Historical Data in One Platform
Historical Immediate
Before
• Complex and Slow
• Multiple Versions of the Truth
• Difficult Governance
• High TCO
• Real-Time Data to Action
• Single Data Copy
• Easy Governance
• Low TCO – Scales Horizontally
Analytics
Operations
HTAP (Hybrid Transaction/Analytical Processing) – Gartner Group 2015
Data
Data
Data
Converged
The MapR Converged Platform
Open Source Engines & Tools Commercial Engines & Applications
Enterprise-Grade Platform Services
DataProcessing
Web-Scale Storage
MapR-FS MapR-DB
Search
and
Others
Real Time Unified Security Multi-tenancy Disaster Recovery Global NamespaceHigh Availability
MapR Streams
Cloud and
Managed
Services
Search
and
Others
UnifiedManagementandMonitoring
Search
and
Others
Event StreamingDatabase
Custom
Apps
Example: Predictive Analytics at American Express
● AmEx MyOffers App
● Runs on MapR Converged Data
Platform
● Analytics: Mahout machine learning
●Operations: Real-time
recommendations based on location
Predictive Analytics
Type Machine
Learning
Time Series Discrete
Choice
Regression Classification Recommendations
Use
Cases
Customer
Retention
Customer
Sentiment
Targeted
Marketing
Risk
Assessment
Fraud
Detection
Diagnostics
Co-occurrence network
• Cardmember spend
graph
• Merchant data
• Location
• Thumb up/down
feedback
Predictive Analytics
Iteration
A. Input Data B. Seed Point Selection C. Iteration 2 D. Iteration 3 E. Final Clustering
Platform requirements
+
Search engine plus location
data stream for fast
recommendations
Large-scale, offline iterative
ML training algorithm +
Customer RetentionAmerican Express
Offers based on real-time location data
coupled with deep analytical insights
Saved customers over $190M
Technological Innovation
2. Stream Processing
Event-based Data Flows
web events
etc…
machine sensors
Biometrics
Mobile events
Streams Enable Real-Time Applications
Failure
Alerts Real-time
application &
network
monitoring
Trending
now
Web
Personalized
Offers
Real-time Fraud Detection
Ad optimization
Supply Chain
Optimization
Producers ConsumersEvents_Topic
A stream is an unbounded sequence of events carried from a
set of producers to a set of consumers.
Events
What’s a Stream?
Producers and consumers don’t have to be aware of each other, instead they
participate in shared topics. This is called publish/subscribe.
Events are delivered in the order they are received, like a queue.
Unlike with a queue, events are persisted even after they’re delivered.
Streams vs Queues
Buffer &
Save Hourly
Files
Aggregate
by minute
Join
Batch
Load
Batch
Load
Streams Simplify Data Integration Pipelines
Analytics
& BI
Filter
Aggregate
by Minute
Join
Examples of Producers & Consumers
Social Platforms
Servers
(Logs, Metrics)
Sensors
Mobile Apps
Other Apps &
Microservices
Alerting Systems
Stream Processing
Frameworks
Databases &
Search Engines
Dashboards
Other Apps &
Microservices
Flexible Communication Modes
Consumer Group
C
One to One Many to Many
Orders Clicks
P P P
C CC
JSON DB
(MapR-DB)
Graph DB
(Titan on
MapR-DB)
Search Engine
(Elastic-Search)
Transforming the Health Care Ecosystem
Electronic Medical Records
“The Stream is the
System of Record”
–Brad Anderson
VP Big Data Informatics
Streams Enable Event-based Microservices
Clicks Sessions Conversions
Sessionizer Converted?
Microservices Overview
• Microservices is an approach to application
development in which a large application is built
as a suite of modular services.
• Each module supports a specific business goal
and uses a simple, well-defined interface to
communicate with other modules.
Event-Driven Microservices
Microservices combined with Streams
and a Converged Platform provides:
• Application development across file, database,
document and streaming services
• Ultra scale, utility grade performance
• Greater efficiency and simplicity than alternative
architectures
• Integrated data-in-motion and data-at-rest and
continuous and low latency processing
Microservices Architecture
Message Broker
Processing
Services
Processing
Services
Message Broker
Microservices Architecture
Message Broker
Processing
Platform
Processing
Platform
Streaming Data
Platform
Converged Microservices Architecture
MapR Converged Data Platform
Converged
Message and
Processing
Services
Technological Innovation
3. Global Cloud Processing
Single global namespace
Strong global consistency
Distributed processing across clusters,
data centers and public and private cloud
environments
Supports global apps with
active/active replication
Global Cloud Processing Platform
India
1.3 billion residents
640,000 villages
22 official languages
60% live on $2/day
75 million homeless
Largest illiterate population in the world
Indian Subsidy Program
$40 billion per year on subsidies
40% leakage due to fraud, impersonation
Largest Biometric Database in the World
PEOPLE
20 BILLION
BIOMETRICS
Massive NoSQL Database Application
1M registrations (Trillions of matches) / day
5 TB incremental data / day
100M+ authentications / day
200 ms response
0.0001 false positive rate
Billion+ audit records per day
Always-on high availability
● Mirroring and backup
across DCs
● Strong consistency for
update and insert
DC #1
Gurgaon
DC #3
Bangalore
Regional
Backup #2
Regional
Backup #4
Broad Application Support through Open APIs
Enrollment Services (40k enrollment stations)
Authentication Services
Fraud detection frameworks
Administrative Services
Analytics and Reporting Services
Logistics services
Contact Center
Impact
$1B saving per year from decreased fraud
>1 billion Aadhaars issued
95% of adults in India
500K-700K new enrollees everyday
240 million bank accounts
122 million LPG consumers
113 million ration cards
Key Technological Innovations
1. Data & Compute Convergence
2. Stream Processing
3. Global Cloud Processing
ThankYou
@DrCrystalV maprtech
cvalentine@mapr.com
MapR
maprtech
mapr-technologies

Next-Gen уже здесь

  • 1.
    © 2016 MapRTechnologies Crystal Valentine, PhD VP of Technology Strategy The “Next-Gen” is Now How big data technologies are revolutionizing the application landscape
  • 2.
    (100,000) (80,000) (60,000) (40,000) (20,000) - 20,000 40,000 60,000 80,000 100,000 120,000 2013 2014 20152016 2017 2018 2019 2020 N.G. Architecture Growth vs. Legacy Shrinkage ($M) Total $ Growth of IT Mkt N.G. $ Growth Legacy Mkt Growth/Shrink in $ A Technology Inflection Point Data source: IDC, Gartner; Analysis & Estimates: MapR
  • 3.
    The Value ofData Size $, ₽ Value Cost Legacy Value Model Net Value Size $, ₽ Value Next-Gen Value Model Cost Net Value OPT OPT
  • 4.
    Data as Infrastructure BigData Virtuous Cycle Collect More Data Technological Improvements to Platform Make Better Decisions Create Efficiencies We are only at the beginning of a new data-first era. The winners today are succeeding at operationalizing their data best. The key is the platform.
  • 5.
    The Next-Gen isNow In 4 years 90% of Data will be on next-gen technology
  • 6.
    Technological Innovation 1. Data& Compute Convergence
  • 7.
    The Secret isto Bring Analytics & Operations Together Converged Applications Complete Access to Real-time and Historical Data in One Platform Historical Immediate
  • 8.
    Before • Complex andSlow • Multiple Versions of the Truth • Difficult Governance • High TCO • Real-Time Data to Action • Single Data Copy • Easy Governance • Low TCO – Scales Horizontally Analytics Operations HTAP (Hybrid Transaction/Analytical Processing) – Gartner Group 2015 Data Data Data Converged
  • 9.
    The MapR ConvergedPlatform Open Source Engines & Tools Commercial Engines & Applications Enterprise-Grade Platform Services DataProcessing Web-Scale Storage MapR-FS MapR-DB Search and Others Real Time Unified Security Multi-tenancy Disaster Recovery Global NamespaceHigh Availability MapR Streams Cloud and Managed Services Search and Others UnifiedManagementandMonitoring Search and Others Event StreamingDatabase Custom Apps
  • 10.
    Example: Predictive Analyticsat American Express ● AmEx MyOffers App ● Runs on MapR Converged Data Platform ● Analytics: Mahout machine learning ●Operations: Real-time recommendations based on location
  • 11.
    Predictive Analytics Type Machine Learning TimeSeries Discrete Choice Regression Classification Recommendations Use Cases Customer Retention Customer Sentiment Targeted Marketing Risk Assessment Fraud Detection Diagnostics Co-occurrence network • Cardmember spend graph • Merchant data • Location • Thumb up/down feedback
  • 12.
    Predictive Analytics Iteration A. InputData B. Seed Point Selection C. Iteration 2 D. Iteration 3 E. Final Clustering
  • 13.
    Platform requirements + Search engineplus location data stream for fast recommendations Large-scale, offline iterative ML training algorithm +
  • 14.
    Customer RetentionAmerican Express Offersbased on real-time location data coupled with deep analytical insights Saved customers over $190M
  • 15.
  • 16.
    Event-based Data Flows webevents etc… machine sensors Biometrics Mobile events
  • 17.
    Streams Enable Real-TimeApplications Failure Alerts Real-time application & network monitoring Trending now Web Personalized Offers Real-time Fraud Detection Ad optimization Supply Chain Optimization
  • 18.
    Producers ConsumersEvents_Topic A streamis an unbounded sequence of events carried from a set of producers to a set of consumers. Events What’s a Stream? Producers and consumers don’t have to be aware of each other, instead they participate in shared topics. This is called publish/subscribe.
  • 19.
    Events are deliveredin the order they are received, like a queue. Unlike with a queue, events are persisted even after they’re delivered. Streams vs Queues
  • 20.
    Buffer & Save Hourly Files Aggregate byminute Join Batch Load Batch Load Streams Simplify Data Integration Pipelines Analytics & BI Filter Aggregate by Minute Join
  • 21.
    Examples of Producers& Consumers Social Platforms Servers (Logs, Metrics) Sensors Mobile Apps Other Apps & Microservices Alerting Systems Stream Processing Frameworks Databases & Search Engines Dashboards Other Apps & Microservices
  • 22.
    Flexible Communication Modes ConsumerGroup C One to One Many to Many Orders Clicks P P P C CC
  • 23.
    JSON DB (MapR-DB) Graph DB (Titanon MapR-DB) Search Engine (Elastic-Search) Transforming the Health Care Ecosystem Electronic Medical Records “The Stream is the System of Record” –Brad Anderson VP Big Data Informatics
  • 24.
    Streams Enable Event-basedMicroservices Clicks Sessions Conversions Sessionizer Converted?
  • 25.
    Microservices Overview • Microservicesis an approach to application development in which a large application is built as a suite of modular services. • Each module supports a specific business goal and uses a simple, well-defined interface to communicate with other modules.
  • 26.
    Event-Driven Microservices Microservices combinedwith Streams and a Converged Platform provides: • Application development across file, database, document and streaming services • Ultra scale, utility grade performance • Greater efficiency and simplicity than alternative architectures • Integrated data-in-motion and data-at-rest and continuous and low latency processing
  • 27.
  • 28.
  • 29.
    Converged Microservices Architecture MapRConverged Data Platform Converged Message and Processing Services
  • 30.
  • 31.
    Single global namespace Strongglobal consistency Distributed processing across clusters, data centers and public and private cloud environments Supports global apps with active/active replication Global Cloud Processing Platform
  • 32.
    India 1.3 billion residents 640,000villages 22 official languages 60% live on $2/day 75 million homeless Largest illiterate population in the world
  • 33.
    Indian Subsidy Program $40billion per year on subsidies 40% leakage due to fraud, impersonation
  • 34.
    Largest Biometric Databasein the World PEOPLE 20 BILLION BIOMETRICS
  • 35.
    Massive NoSQL DatabaseApplication 1M registrations (Trillions of matches) / day 5 TB incremental data / day 100M+ authentications / day 200 ms response 0.0001 false positive rate Billion+ audit records per day
  • 36.
    Always-on high availability ●Mirroring and backup across DCs ● Strong consistency for update and insert DC #1 Gurgaon DC #3 Bangalore Regional Backup #2 Regional Backup #4
  • 37.
    Broad Application Supportthrough Open APIs Enrollment Services (40k enrollment stations) Authentication Services Fraud detection frameworks Administrative Services Analytics and Reporting Services Logistics services Contact Center
  • 38.
    Impact $1B saving peryear from decreased fraud >1 billion Aadhaars issued 95% of adults in India 500K-700K new enrollees everyday 240 million bank accounts 122 million LPG consumers 113 million ration cards
  • 39.
    Key Technological Innovations 1.Data & Compute Convergence 2. Stream Processing 3. Global Cloud Processing
  • 40.