Unlocking Operational Intelligence from the Data Lake
Mat Keep
Director, Product & Market Analysis
mat.keep@mongodb.com
@matkeep
The World is Changing
Digital Natives & Digital Transformation
• Data: Volume, Velocity, Variety
• Time: Iterative, Agile, Short Cycles
• Risk: Always On, Secure, Global
• Cost: Open-Source, Cloud, Commodity
Creating the “Insight Economy”
Data Warehouse Challenges
The Rise of the Data Lake
“Big Data” is More than Just Hadoop
• 24% CAGR: Hadoop, Spark & Streaming
• 18% CAGR: Databases
• Databases are key components within the big data landscape
Apache Hadoop Data Lake
• Risk modeling
• Retrospective & predictive analytics
• Machine learning & pattern matching
• Customer segmentation & churn analysis
• ETL pipelines
• Active archives
NoSQL Database
“Through 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges.”
Nick Heudecker, Research Director, Data Management & Integration
Source: http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html
How to Avoid Being in the 70%?
1. Unify data lake analytics with the operational applications
2. Create smart, contextually aware, data-driven apps & insights
3. Integrate a database layer with the data lake
MongoDB & Hadoop: What’s Common
Distributed Processing & Analytics
Common Attributes
• Schema-on-read
• Multiple replicas
• Horizontal scale
• High throughput
• Low TCO
MongoDB & Hadoop: What’s Different
Distributed Processing & Analytics
MongoDB
• Random access to subsets of data
• Millisecond latency
• Expressive querying, rich aggregations & flexible indexing
• Update fast-changing data; avoid rewriting / recomputing the entire data set
Hadoop
• Data stored as large files (64MB-128MB blocks); no indexes
• Write-once-read-many, append-only
• Designed for high-throughput scans across TB/PB of data
• Multi-minute latency
Common Attributes
• Schema-on-read
• Multiple replicas
• Horizontal scale
• High throughput
• Low TCO
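To make the access-pattern difference above concrete, here is a minimal Python sketch that counts how many records each approach has to read. It uses hypothetical in-memory data: a full scan stands in for Hadoop's index-less block files, and a sorted list searched with `bisect` stands in for a secondary index. It illustrates the access pattern only; it is not how MongoDB's B-tree indexes or HDFS are actually implemented.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data set: 100,000 (customer_id, city) records, 1% of them in London.
records = [(i, "London" if i % 100 == 0 else "Paris") for i in range(100_000)]


def full_scan(rows, city):
    """Scan model: every record is read to answer the query."""
    hits = [cid for cid, c in rows if c == city]
    return hits, len(rows)  # records read = the whole data set


def build_index(rows):
    """Secondary-index model: sorted (city, position) pairs, built once up front."""
    return sorted((c, pos) for pos, (_, c) in enumerate(rows))


def index_lookup(rows, index, city):
    """Binary-search the index, then fetch only the matching records."""
    lo = bisect_left(index, (city,))
    hi = bisect_right(index, (city, float("inf")))
    hits = [rows[pos][0] for _, pos in index[lo:hi]]
    return hits, hi - lo  # records read = matches only (index probes not counted)


idx = build_index(records)
scan_hits, scan_reads = full_scan(records, "London")
idx_hits, idx_reads = index_lookup(records, idx, "London")
print(scan_reads, idx_reads)  # 100000 1000
```

Both paths return the same customers; the difference is that the scan touches every record, while the indexed lookup touches only the 1,000 matches, which is the property that makes millisecond random access possible.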
Bringing it Together
Online services powered by MongoDB:
• User account & personalization
• Product catalog
• Session management & shopping cart
• Recommendations
Back-end machine learning powered by Hadoop:
• Customer classification & clustering
• Basket analysis
• Brand sentiment
• Price optimization
The two tiers are connected by the MongoDB Connector for Hadoop.
Design Pattern: Operationalized Data Lake
Diagram components:
• Data sources: Sensors, User Data, Clickstreams, Logs, fed through a message queue
• Operational apps: Customer Data Mgmt, Mobile App, IoT App, Live Dashboards
• Real-Time Access (MongoDB): Millisecond latency. Expressive querying & flexible indexing against subsets of data. Updates in place. In-database aggregations & transformations.
• Batch Processing, Batch Views (Hadoop and distributed processing frameworks): Multi-minute latency with scans across TB/PB of data. No indexes. Data stored in 128MB blocks. Write-once-read-many & append-only storage model.
• Data flows: Raw Data, Processed Events
• Analytics outputs: Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics
Configure where to land incoming data.
Raw data is processed to generate analytics models.
MongoDB exposes analytics models to operational apps and handles real-time updates.
Compute new models against MongoDB & HDFS.
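The flow in the callouts above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `RawArchive` stands in for HDFS, `OperationalStore` for a MongoDB collection, and `route()` for the message-queue consumer. All class, function, and field names (including the `segment` model and its purchase threshold) are hypothetical, and no Kafka, Hadoop, or MongoDB API is used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    customer_id: str
    kind: str  # hypothetical event types, e.g. "view" or "purchase"


class RawArchive:
    """Stand-in for the data lake (HDFS): write-once, append-only, read by full scan."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def scan(self):
        return list(self._events)


class OperationalStore:
    """Stand-in for MongoDB: one profile document per customer, updated in place."""

    def __init__(self):
        self._profiles = {}

    def upsert(self, customer_id, fields):
        doc = self._profiles.setdefault(customer_id, {"_id": customer_id})
        doc.update(fields)

    def find_one(self, customer_id):
        return self._profiles.get(customer_id)


def route(event, archive, store):
    """Land the raw event in the lake and apply a real-time update to the operational store."""
    archive.append(event)
    store.upsert(event.customer_id, {"last_event": event.kind})


def batch_segment(archive, store, threshold=2):
    """Batch job: scan all raw events, derive a segment per customer, write it back."""
    events = archive.scan()
    purchases = Counter(e.customer_id for e in events if e.kind == "purchase")
    for cid in {e.customer_id for e in events}:
        segment = "frequent" if purchases[cid] >= threshold else "occasional"
        store.upsert(cid, {"segment": segment})


archive, store = RawArchive(), OperationalStore()
for ev in [Event("c1", "purchase"), Event("c1", "purchase"), Event("c2", "view")]:
    route(ev, archive, store)
batch_segment(archive, store)
print(store.find_one("c1"))  # {'_id': 'c1', 'last_event': 'purchase', 'segment': 'frequent'}
```

In a real deployment the routing and write-back would be handled by the message queue and a connector (such as the MongoDB Connector for Hadoop or for Spark) rather than by application code like this. The sketch only shows the division of labor: raw events land in the append-only lake, while the derived model is upserted into the operational store where apps can query it with low latency.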
Operational Database Requirements
1. “Smart” integration with the data lake
2. Powerful real-time analytics
3. Flexible, governed data model
4. Scale with the data lake
5. Sophisticated management & security
Evaluating your Options
Query and Data Model
Feature | MongoDB | Relational | Column Family (e.g., HBase)
Rich query language & secondary indexes | Yes | Yes | Requires integration with separate Spark / Hadoop cluster
In-database aggregations & search | Yes | Yes | Requires integration with separate Spark / Hadoop cluster
Dynamic schema | Yes | No | Partial
Data validation | Yes | Yes | App-side code
• Why it matters
– Query & Aggregations: Rich, real-time analytics against operational data
– Dynamic Schema: Manage multi-structured data
– Data Validation: Enforce data governance between data lake & operational apps
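As an example of the Data Validation row, MongoDB 3.6 and later can enforce a `$jsonSchema` validator on a collection. The sketch below assumes a hypothetical `customers` collection shaped like the sample customer document used later in this deck; the required fields and type choices are illustrative, not a recommended schema.

```python
# Hypothetical validator for a "customers" collection; field names are illustrative.
customer_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["first_name", "surname", "city"],
        "properties": {
            "first_name": {"bsonType": "string"},
            "surname": {"bsonType": "string"},
            "city": {"bsonType": "string"},
            "cars": {
                "bsonType": "array",
                "items": {  # every element of "cars" must be a sub-document like this
                    "bsonType": "object",
                    "required": ["model", "year"],
                    "properties": {
                        "model": {"bsonType": "string"},
                        "year": {"bsonType": "int"},
                        "value": {"bsonType": ["int", "long", "double"]},
                    },
                },
            },
        },
    }
}

# Options for the "create" command.
create_options = {
    "validator": customer_validator,
    "validationLevel": "strict",  # validate all inserts and updates
    "validationAction": "error",  # reject documents that fail validation
}
```

With PyMongo these options could be passed as `db.create_collection("customers", **create_options)`, since extra keyword arguments are forwarded to the `create` command. The sketch itself only builds the dictionaries and does not connect to a server.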
Data Lake Integration
Feature | MongoDB | Relational | Column Family (e.g., HBase)
Hadoop + secondary indexes | Yes | Yes: Expensive | No secondary indexes
Spark + secondary indexes | Yes | Yes: Expensive | No secondary indexes
Native BI connectivity | Yes | Yes | 3rd-party connectors
Workload isolation | Yes | Yes: Expensive | Load data to separate Spark/Hadoop cluster
• Why it matters
– Hadoop + Spark: Efficient data movement between data lake, processing layer & database
– Native BI Connectivity: Visualizing operational data
– Workload Isolation: Separation between operational and analytical workloads
Operationalizing for Scale & Security
Feature | MongoDB | Relational | Column Family (e.g., HBase)
Robust security controls | Yes | Yes | Yes
Scale-out on commodity hardware | Yes | No | Yes
Sophisticated management platform | Yes | Yes | Monitoring only
• Why it matters
– Security: Data protection for regulatory compliance
– Scale-Out: Grow with the data lake
– Management: Reduce TCO with platform automation, monitoring, disaster recovery
MongoDB Nexus Architecture
Adoption & Skills Availability
Operational Data Lake in Action
UK’s Leading Price Comparison Site
Out-pacing Internet search giants with a continuous delivery pipeline powered by microservices & Docker running MongoDB, Kafka and Hadoop in the cloud
Problem
• Existing EDW with nightly batch loads
• No real-time analytics to personalize user experience
• Application changes broke ETL pipeline
• Unable to scale as services expanded
Solution
• Microservices architecture running on AWS
• All application events written to a Kafka queue, routed to MongoDB and Hadoop
• Events that personalize the real-time experience (e.g., triggering an email send, additional questions, offers) written to MongoDB
• All event data aggregated with other data sources and analyzed in Hadoop; updated customer profiles written back to MongoDB
Results
• 2x faster delivery of new services after migrating to the new architecture
• Enabled continuous delivery: pushing new features every day
• Personalized user experience, plus higher uptime and scalability
Leading Global Airline
Customer Data Management: Single view and real-time analytics with MongoDB, Spark, & Hadoop
Problem
• Customer data scattered across 100+ different systems
• Poor customer experience: no personalization, no consistent experience across brands or devices
• No way to analyze customer behavior to deliver targeted offers
Solution
• Selected MongoDB over HBase for schema flexibility and rich query support
• MongoDB stores all customer profiles, served to web, mobile & call-center apps
• Distributed across multiple regions for DR and data locality
• All customer interactions stored in MongoDB, loaded into Hadoop for customer segmentation
• Unified processing pipeline with Spark running across MongoDB and Hadoop
Results
• Single profile created for each customer, personalizing experience in real time
• Revenue optimization by calculating best ticket prices
• Reduced competitive pressures by identifying gaps in product offerings
World’s Most Sophisticated Traveler Safety Platform
Analyzing PBs of Data with MongoDB, Hadoop, Apache NiFi & SAP HANA
Problem
• Commercialize a national security platform
• Massive volumes of multi-structured data: news, RSS & social feeds, geospatial, geological, health & crime stats
• Requires complex analysis, delivered in real time, always on
Solution
• Apache NiFi for data ingestion, routing & metadata management
• Hadoop for text analytics
• HANA for geospatial analytics
• MongoDB correlates analytics with user profiles & location data to deliver real-time alerts to corporate security teams & individual travelers
Results
• Enables Prescient to uniquely blend big data technology with its security IP developed in government
• Dynamic data model supports indexing 38k data sources, growing at 200 per day
• 24x7 continuous availability
• Scalability to PBs of data
Powering Global Threat Intelligence
Cloud-based real-time analytics with MongoDB & Hadoop
Problem
• Requirement to analyze data over many different dimensions to detect real-time threat profiles
• HBase unable to query data beyond primary key lookups
• Lucene search unable to scale with growth in data
Solution
• MongoDB + Hadoop to collect and analyze data from internet sensors in real time
• MongoDB dynamic schema enables sensor data to be enriched with geospatial tags
• Auto-sharding to scale as data volumes grow
Results
• Run complex, real-time analytics on live data
• Improved query performance by over 3x
• Scale to support doubling of data volume every 24 months
• Deploy across global data centers for low-latency user experience
• Engineering teams have more time to develop new features
Wrapping Up
Conclusion
1. Data lakes enable enterprises to affordably capture & analyze more data
2. Operational and analytical workloads are converging
3. MongoDB is the key technology to operationalize the data lake
MongoDB Enterprise Advanced
• MongoDB Enterprise Server
• MongoDB Ops Manager: Monitoring & Alerting, Query Optimization, Backup & Recovery, Automation & Configuration, REST API
• MongoDB Compass: Schema Visualization, Data Exploration, Ad-Hoc Queries
• MongoDB Connector for BI: Visualization, Analysis, Reporting
• Security: Authentication, Authorization, Auditing, Encryption (In Flight & at Rest)
• 24x7 Support (1-hour SLA), Emergency Patches, Customer Success Program, On-Demand Online Training
• Commercial License (No AGPL Copyleft Restrictions), Platform Certifications, Warranty, Limitation of Liability, Indemnification
About MongoDB, Inc.
• 500+ employees
• 2,000+ customers
• 13 offices worldwide
• $311M in funding
Resources to Learn More
• Guide: Operational Data Lake
• Whitepaper: Real-Time Analytics with Apache Spark & MongoDB
For More Information
Resource | Location
Case Studies | mongodb.com/customers
Presentations | mongodb.com/presentations
Free Online Training | education.mongodb.com
Webinars and Events | mongodb.com/events
Documentation | docs.mongodb.org
MongoDB Downloads | mongodb.com/download
Additional Info | info@mongodb.com
One of the World’s Largest Banks
Creating new customer insights with MongoDB & Spark
Problem
• System failures in online banking systems creating customer satisfaction issues
• No personalized experience across channels
• No enrichment of user data with social media chatter
Solution
• Apache Flume to ingest log data & social media streams, Apache Spark to process log events
• MongoDB to persist log data and KPIs, and to immediately rebuild user sessions when a service fails
• Integration with the MongoDB query language and secondary indexes to selectively filter and query data in real time
Results
• Improved user experience, with more customers using online, self-service channels
• Improved services following a deeper understanding of how users interact with systems
• Greater user insight by adding social media insights
The New Enterprise Stack
Layer | Legacy | Future State
Apps | On-Premise, Monoliths | SaaS, Microservices
Database | Relational (Oracle) | Non-Relational (MongoDB)
EDW | Teradata, Oracle, etc. | Hadoop
Compute | Scale-Up Server | Containers / Commodity Server / Cloud
Storage | SAN | Local Storage & Data Lakes
Network | Routers and Switches | Software-Defined Networks
Workload Isolation for Real-Time Analytics
Diagram components:
• Operational Application: MongoDB Primary
• Analytics Application: querying operational data on two MongoDB Secondaries
• Real-time analytics inform the operational application
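One way to get this isolation is to send analytics reads to replica-set secondaries that carry a dedicated tag, while the operational application keeps the default `primary` read preference. The Python sketch below builds such a connection string. The host names, the replica-set name `rs0`, and the `workload: analytics` member tag are hypothetical; the tag must be configured on the secondaries for the read preference to match them.

```python
from urllib.parse import urlencode


def analytics_uri(hosts, replica_set, tag_sets):
    """Build a connection string that routes reads to tagged secondaries."""
    params = [("replicaSet", replica_set), ("readPreference", "secondary")]
    for tags in tag_sets:
        # Each tag set becomes one readPreferenceTags option, e.g. "workload:analytics".
        params.append(("readPreferenceTags", ",".join(f"{k}:{v}" for k, v in tags.items())))
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"mongodb://{','.join(hosts)}/?{urlencode(params, safe=':,')}"


uri = analytics_uri(
    ["db0.example.net:27017", "db1.example.net:27017", "db2.example.net:27017"],
    "rs0",
    [{"workload": "analytics"}],
)
print(uri)
```

The analytics application would open its client with this URI, while the operational application connects without a read preference override, so the analytical queries never compete with operational reads and writes on the primary.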
Handling Multi-Structured Data from the Data Lake
Flexible, Governed Data Model
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Typed field values (e.g., Number)
• Fields can contain arrays
• Fields can contain an array of sub-documents
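Arrays of sub-documents affect how range queries behave. For “everybody in London with a car between 1970 and 1980”, a dotted-path filter such as `{"cars.year": {"$gte": 1970, "$lte": 1980}}` can be satisfied by two different cars, one meeting each bound, whereas `{"cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}}}` requires a single car to meet both. The plain-Python functions below emulate those two semantics on the sample document and on a second, hypothetical document; they do not call MongoDB.

```python
def dotted_range_match(doc, lo, hi):
    """Emulates {"cars.year": {"$gte": lo, "$lte": hi}} on an array field:
    each bound may be satisfied by a different array element."""
    years = [car["year"] for car in doc.get("cars", [])]
    return any(y >= lo for y in years) and any(y <= hi for y in years)


def elem_match(doc, lo, hi):
    """Emulates {"cars": {"$elemMatch": {"year": {"$gte": lo, "$lte": hi}}}}:
    a single element must satisfy both bounds."""
    return any(lo <= car["year"] <= hi for car in doc.get("cars", []))


paul = {
    "first_name": "Paul",
    "city": "London",
    "cars": [{"model": "Bentley", "year": 1973}, {"model": "Rolls Royce", "year": 1965}],
}
# Hypothetical owner whose cars straddle the range without any car inside it.
other = {"first_name": "Ann", "city": "London", "cars": [{"year": 1965}, {"year": 1985}]}

for person in (paul, other):
    print(person["first_name"], dotted_range_match(person, 1970, 1980), elem_match(person, 1970, 1980))
```

Paul matches under both semantics because his 1973 Bentley is in range, but the hypothetical second owner matches only the dotted-path form, which is why `$elemMatch` is the safer choice for range conditions on arrays of sub-documents.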
Expressive Query Language, Rich Secondary Indexes
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Map Reduce
• What is the ownership pattern of colors by geography over time (is purple trending in China?)
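The queries listed above could be expressed roughly as follows, written as PyMongo-style Python dictionaries. This is a sketch under assumptions: documents like Paul's live in a hypothetical `people` collection; the Trafalgar Square coordinates are approximate; the text query assumes a text index on a hypothetical car description field; and the geospatial filter treats `location` as a `[longitude, latitude]` pair. The Map Reduce example is not covered.

```python
# Rich queries: filter plus projection returning only Paul's cars.
find_pauls_cars = ({"first_name": "Paul", "surname": "Miller"}, {"cars": 1, "_id": 0})

# $elemMatch makes a single car satisfy both year bounds.
london_1970s_owners = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Geospatial: $centerSphere takes [center, radius in radians].
TRAFALGAR_SQ = [-0.1281, 51.5080]  # approximate [longitude, latitude] (assumed)
EARTH_RADIUS_KM = 6378.1
near_trafalgar = {
    "location": {"$geoWithin": {"$centerSphere": [TRAFALGAR_SQ, 5 / EARTH_RADIUS_KM]}}
}

# Text search for the phrase; requires a text index on the (hypothetical) description field.
leather_seats = {"$text": {"$search": "\"leather seats\""}}

# Aggregation: one output document per matching person with the mean car value.
avg_car_value = [
    {"$match": {"first_name": "Paul", "surname": "Miller"}},
    {"$unwind": "$cars"},
    {"$group": {"_id": "$_id", "avg_value": {"$avg": "$cars.value"}}},
]

# Local cross-check of what $avg should return for the sample document.
paul_car_values = [100000, 330000]
print(sum(paul_car_values) / len(paul_car_values))  # 215000.0
```

With a PyMongo `db` handle these would be used as, for example, `db.people.find(*find_pauls_cars)`, `db.people.find(london_1970s_owners)` or `db.people.aggregate(avg_car_value)`. Dividing 5 km by the Earth's radius converts the distance into the radians that `$centerSphere` expects.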
Visualizing Operational Data
MongoDB Connector for BI
Visualize and explore multi-structured data using SQL-based BI platforms.
Between your BI platform and MongoDB, the BI Connector:
• Provides the schema
• Translates queries
• Translates responses
Enterprise-Grade Security
Business Need | Security Features
Authentication | SCRAM, LDAP*, Kerberos*, x.509 Certificates
Authorization | Built-in Roles, User-Defined Roles, Field-Level Redaction
Auditing* | Admin, DML, DDL, Role-based
Encryption | Network: SSL (with FIPS 140-2); Disk: Encrypted Storage Engine* or Partner Solutions
*Included with MongoDB Enterprise Advanced
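The authentication options in the table map onto connection-string settings in MongoDB drivers. The Python sketch below builds a TLS-enabled URI for each mechanism. The host and credentials are placeholders; `tls=true` assumes a 4.2-era or newer driver (older drivers use `ssl=true`); and x.509 additionally needs a client certificate (for example via `tlsCertificateKeyFile`), which is omitted here.

```python
from urllib.parse import quote_plus

# Driver authMechanism values for the mechanisms listed in the table.
AUTH_MECHANISMS = {
    "SCRAM": "SCRAM-SHA-256",  # use SCRAM-SHA-1 for servers older than 4.0
    "LDAP": "PLAIN",           # LDAP proxy authentication (Enterprise)
    "Kerberos": "GSSAPI",      # Enterprise
    "x.509": "MONGODB-X509",
}
# Mechanisms whose users are defined outside MongoDB authenticate against $external.
EXTERNAL = {"LDAP", "Kerberos", "x.509"}


def secure_uri(host, mechanism, user=None, password=None):
    """Build a TLS-enabled connection string for one authentication mechanism."""
    params = [f"authMechanism={AUTH_MECHANISMS[mechanism]}", "tls=true"]
    if mechanism in EXTERNAL:
        params.append("authSource=$external")
    credentials = ""
    if user is not None:
        credentials = quote_plus(user)  # percent-encode reserved characters such as "@"
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"mongodb://{credentials}{host}/?{'&'.join(params)}"


print(secure_uri("db0.example.net:27017", "SCRAM", "app_user", "p@ss"))
print(secure_uri("db0.example.net:27017", "x.509"))
```

Percent-encoding the user name and password matters because characters such as `@` or `:` would otherwise be read as URI delimiters; Kerberos principals like `user@REALM` need the same treatment.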
Scale-Out Across Commodity Hardware & Regions
Management Tooling: MongoDB Ops Manager
• Monitoring & alerting
• Integration with APM platforms
• Prescriptive management with query profiling
• Automated cluster provisioning, scaling and upgrades
• Continuous, point-in-time backup

Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Denodo
 
Microsoft SQL Server - Parallel Data Warehouse Presentation
Microsoft SQL Server - Parallel Data Warehouse PresentationMicrosoft SQL Server - Parallel Data Warehouse Presentation
Microsoft SQL Server - Parallel Data Warehouse PresentationMicrosoft Private Cloud
 
L’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazioneL’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazione
MongoDB
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
Ashnikbiz
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
Amazon Web Services
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
confluent
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 
Key Data Management Requirements for the IoT
Key Data Management Requirements for the IoTKey Data Management Requirements for the IoT
Key Data Management Requirements for the IoTMongoDB
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
Amazon Web Services
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
MongoDB
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
Serhiy (Serge) Haziyev
 

Similar to Unlocking Operational Intelligence from the Data Lake (20)

Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
Accelerating a Path to Digital With a Cloud Data Strategy
Accelerating a Path to Digital With a Cloud Data StrategyAccelerating a Path to Digital With a Cloud Data Strategy
Accelerating a Path to Digital With a Cloud Data Strategy
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
 
Microsoft SQL Server - Parallel Data Warehouse Presentation
Microsoft SQL Server - Parallel Data Warehouse PresentationMicrosoft SQL Server - Parallel Data Warehouse Presentation
Microsoft SQL Server - Parallel Data Warehouse Presentation
 
L’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazioneL’architettura di classe enterprise di nuova generazione
L’architettura di classe enterprise di nuova generazione
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Key Data Management Requirements for the IoT
Key Data Management Requirements for the IoTKey Data Management Requirements for the IoT
Key Data Management Requirements for the IoT
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

Unlocking Operational Intelligence from the Data Lake

1
Unlocking Operational Intelligence from the Data Lake
Mat Keep
Director, Product & Market Analysis
mat.keep@mongodb.com
@matkeep
2
The World is Changing
Digital Natives & Digital Transformation
• Volume, Velocity, Variety
• Iterative, Agile, Short Cycles
• Always On, Secure, Global
• Open-Source, Cloud, Commodity
• Data, Time, Risk, Cost
5
The Rise of the Data Lake
6
“Big Data” is More than Just Hadoop
• 24% CAGR: Hadoop, Spark & Streaming
• 18% CAGR: Databases
• Databases are key components within the big data landscape
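To make the gap between the two growth rates concrete, the sketch below compounds each CAGR over a fixed horizon. The five-year horizon is an assumption chosen purely for illustration; the slide does not state the period its CAGR figures cover.

```python
def compound_growth(cagr: float, years: int) -> float:
    """Growth multiple after `years` at a constant annual growth rate `cagr`."""
    return (1.0 + cagr) ** years


# Illustrative horizon only; not taken from the slide.
HORIZON_YEARS = 5

for label, rate in [("Hadoop, Spark & Streaming", 0.24), ("Databases", 0.18)]:
    multiple = compound_growth(rate, HORIZON_YEARS)
    print(f"{label}: x{multiple:.2f} over {HORIZON_YEARS} years")
```

Under that assumption, a six-point difference in annual growth compounds to roughly a 2.9x versus 2.3x expansion, which is why both segments remain significant parts of the landscape.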
7
Apache Hadoop Data Lake
• Risk modeling
• Retrospective & predictive analytics
• Machine learning & pattern matching
• Customer segmentation & churn analysis
• ETL pipelines
• Active archives
NoSQL Database
  • 8. 8 http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html “Thru 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges.” Nick Heudecker, Research Director, Data Management & Integration
  • 9. 9 How to Avoid Being in the 70%? 1. Unify data lake analytics with the operational applications 2. Create smart, contextually aware, data-driven apps & insights 3. Integrate a database layer with the data lake
  • 10. MongoDB & Hadoop: What’s Common (Distributed Processing & Analytics) • Schema-on-read • Multiple replicas • Horizontal scale • High throughput • Low TCO
  • 11. MongoDB & Hadoop: What’s Different (Distributed Processing & Analytics)
    – Hadoop: Data stored as large files (64MB-128MB blocks), no indexes; write-once-read-many, append-only; designed for high-throughput scans across TB/PB of data; multi-minute latency
    – Common attributes: Schema-on-read; multiple replicas; horizontal scale; high throughput; low TCO
  • 12. MongoDB & Hadoop: What’s Different (Distributed Processing & Analytics)
    – MongoDB: Random access to subsets of data; millisecond latency; expressive querying, rich aggregations & flexible indexing; update fast-changing data, avoiding re-write / re-compute of the entire data set
    – Hadoop: Data stored as large files (64MB-128MB blocks), no indexes; write-once-read-many, append-only; designed for high-throughput scans across TB/PB of data; multi-minute latency
    – Common attributes: Schema-on-read; multiple replicas; horizontal scale; high throughput; low TCO
  • 13. Bringing it Together (linked by the MongoDB Connector for Hadoop)
    – Online services powered by MongoDB: User account & personalization; Product catalog; Session management & shopping cart; Recommendations
    – Back-end machine learning powered by Hadoop: Customer classification & clustering; Basket analysis; Brand sentiment; Price optimization
  • 14. Design Pattern: Operationalized Data Lake
    – Sources (Sensors, User Data, Clickstreams, Logs) publish to a Message Queue.
    – Raw data lands in the data lake, where Distributed Processing Frameworks run batch processing to produce batch views such as Churn Analysis, Enriched Customer Profiles, Risk Modeling, and Predictive Analytics. Storage characteristics: multi-minute latency with scans across TB/PB of data; no indexes; data stored in 128MB blocks; write-once-read-many & append-only storage model.
    – Processed events and batch views are served for real-time access to Customer Data Mgmt, Mobile App, IoT App, and Live Dashboards. Database characteristics: millisecond latency; expressive querying & flexible indexing against subsets of data; updates in place; in-database aggregations & transformations.
  • 15. Design Pattern: Operationalized Data Lake (same diagram): Configure where to land incoming data
  • 16. Design Pattern: Operationalized Data Lake (same diagram): Raw data processed to generate analytics models
  • 17. Design Pattern: Operationalized Data Lake (same diagram): MongoDB exposes analytics models to operational apps and handles real-time updates
  • 18. Design Pattern: Operationalized Data Lake (same diagram): Compute new models against MongoDB & HDFS
  • 19. Operational Database Requirements: 1. “Smart” integration with the data lake 2. Powerful real-time analytics 3. Flexible, governed data model 4. Scale with the data lake 5. Sophisticated management & security
  • 21. Query and Data Model (columns: MongoDB | Relational | Column Family, e.g. HBase)
    – Rich query language & secondary indexes: Yes | Yes | Requires integration with separate Spark / Hadoop cluster
    – In-database aggregations & search: Yes | Yes | Requires integration with separate Spark / Hadoop cluster
    – Dynamic schema: Yes | No | Partial
    – Data validation: Yes | Yes | App-side code
    Why it matters:
    – Query & Aggregations: Rich, real-time analytics against operational data
    – Dynamic Schema: Manage multi-structured data
    – Data Validation: Enforce data governance between data lake & operational apps
  • 22. Data Lake Integration (columns: MongoDB | Relational | Column Family, e.g. HBase)
    – Hadoop + secondary indexes: Yes | Yes: Expensive | No secondary indexes
    – Spark + secondary indexes: Yes | Yes: Expensive | No secondary indexes
    – Native BI connectivity: Yes | Yes | 3rd-party connectors
    – Workload isolation: Yes | Yes: Expensive | Load data to separate Spark/Hadoop cluster
    Why it matters:
    – Hadoop + Spark: Efficient data movement between data lake, processing layer & database
    – Native BI Connectivity: Visualizing operational data
    – Workload isolation: Separation between operational and analytical workloads
  • 23. Operationalizing for Scale & Security (columns: MongoDB | Relational | Column Family, e.g. HBase)
    – Robust security controls: Yes | Yes | Yes
    – Scale-out on commodity hardware: Yes | No | Yes
    – Sophisticated management platform: Yes | Yes | Monitoring only
    Why it matters:
    – Security: Data protection for regulatory compliance
    – Scale-Out: Grow with the data lake
    – Management: Reduce TCO with platform automation, monitoring, disaster recovery
  • 25. Adoption & Skills Availability
  • 27. UK’s Leading Price Comparison Site: Out-pacing Internet search giants with a continuous delivery pipeline powered by microservices & Docker, running MongoDB, Kafka and Hadoop in the cloud
    – Problem: Existing EDW with nightly batch loads; no real-time analytics to personalize the user experience; application changes broke the ETL pipeline; unable to scale as services expanded
    – Solution: Microservices architecture running on AWS; all application events written to a Kafka queue and routed to MongoDB and Hadoop; events that personalize the real-time experience (e.g. triggering an email send, additional questions, offers) written to MongoDB; all event data aggregated with other data sources and analyzed in Hadoop, with updated customer profiles written back to MongoDB
    – Results: 2x faster delivery of new services after migrating to the new architecture; continuous delivery enabled, pushing new features every day; personalized user experience, plus higher uptime and scalability
  • 28. Leading Global Airline: Customer Data Management. Single view and real-time analytics with MongoDB, Spark, & Hadoop
    – Problem: Customer data scattered across 100+ different systems; poor customer experience, with no personalization and no consistent experience across brands or devices; no way to analyze customer behavior to deliver targeted offers
    – Solution: Selected MongoDB over HBase for schema flexibility and rich query support; MongoDB stores all customer profiles, served to web, mobile & call-center apps; distributed across multiple regions for DR and data locality; all customer interactions stored in MongoDB and loaded into Hadoop for customer segmentation; unified processing pipeline with Spark running across MongoDB and Hadoop
    – Results: Single profile created for each customer, personalizing the experience in real time; revenue optimization by calculating the best ticket prices; reduced competitive pressure by identifying gaps in product offerings
  • 29. World’s Most Sophisticated Traveler Safety Platform: Analyzing PBs of Data with MongoDB, Hadoop, Apache NiFi & SAP HANA
    – Problem: Commercialize a national security platform; massive volumes of multi-structured data (news, RSS & social feeds, geospatial, geological, health & crime stats); requires complex analysis, delivered in real time, always on
    – Solution: Apache NiFi for data ingestion, routing & metadata management; Hadoop for text analytics; HANA for geospatial analytics; MongoDB correlates analytics with user profiles & location data to deliver real-time alerts to corporate security teams & individual travelers
    – Results: Enables Prescient to uniquely blend big data technology with its security IP developed in government; dynamic data model supports indexing 38k data sources, growing at 200 per day; 24x7 continuous availability; scalability to PBs of data
  • 30. Powering Global Threat Intelligence: Cloud-based real-time analytics with MongoDB & Hadoop
    – Problem: Requirement to analyze data over many different dimensions to detect real-time threat profiles; HBase unable to query data beyond primary key lookups; Lucene search unable to scale with growth in data
    – Solution: MongoDB + Hadoop to collect and analyze data from internet sensors in real time; MongoDB dynamic schema enables sensor data to be enriched with geospatial tags; auto-sharding to scale as data volumes grow
    – Results: Complex, real-time analytics on live data; query performance improved by over 3x; scale to support a doubling of data volume every 24 months; deployment across global data centers for a low-latency user experience; engineering teams have more time to develop new features
  • 32. Conclusion: 1. Data lakes are enabling enterprises to affordably capture & analyze more data 2. Operational and analytical workloads are converging 3. MongoDB is the key technology to operationalize the data lake
  • 33. MongoDB Enterprise Advanced
    – MongoDB Enterprise Server: Authentication; Authorization; Auditing; Encryption (In Flight & at Rest)
    – MongoDB Ops Manager: Monitoring & Alerting; Query Optimization; Backup & Recovery; Automation & Configuration; REST API
    – MongoDB Compass: Schema Visualization; Data Exploration; Ad-Hoc Queries
    – MongoDB Connector for BI: Visualization; Analysis; Reporting
    – Support and commercial terms: 24x7 Support (1-hour SLA); Emergency Patches; Customer Success Program; On-Demand Online Training; Commercial License (No AGPL Copyleft Restrictions); Platform Certifications; Warranty; Limitation of Liability; Indemnification
  • 34. About MongoDB, Inc.: 500+ employees; 2,000+ customers; 13 offices worldwide; $311M in funding
  • 35. Resources to Learn More • Guide: Operational Data Lake • Whitepaper: Real-Time Analytics with Apache Spark & MongoDB
  • 37. For More Information
    – Case Studies: mongodb.com/customers
    – Presentations: mongodb.com/presentations
    – Free Online Training: education.mongodb.com
    – Webinars and Events: mongodb.com/events
    – Documentation: docs.mongodb.org
    – MongoDB Downloads: mongodb.com/download
    – Additional Info: info@mongodb.com
  • 38. One of World’s Largest Banks: Creating new customer insights with MongoDB & Spark
    – Problem: System failures in online banking systems creating customer satisfaction issues; no personalized experience across channels; no enrichment of user data with social media chatter
    – Solution: Apache Flume to ingest log data & social media streams, Apache Spark to process log events; MongoDB to persist log data and KPIs and immediately rebuild user sessions when a service fails; integration with the MongoDB query language and secondary indexes to selectively filter and query data in real time
    – Results: Improved user experience, with more customers using online, self-service channels; improved services following a deeper understanding of how users interact with systems; greater user insight by adding social media insights
  • 39. The New Enterprise Stack (Legacy → Future State)
    – Apps: On-Premise, Monoliths → SaaS, Microservices
    – Database: Relational (Oracle) → Non-Relational (MongoDB)
    – EDW: Teradata, Oracle, etc. → Hadoop
    – Compute: Scale-Up Server → Containers / Commodity Server / Cloud
    – Storage: SAN → Local Storage & Data Lakes
    – Network: Routers and Switches → Software-Defined Networks
  • 40. Workload Isolation for Real-Time Analytics: an Operational Application and an Analytics Application served by one replica set (MongoDB Primary, two MongoDB Secondaries). Callouts: “Querying operational data”; “Real-time analytics to inform operational application”
  • 41. Handling Multi-Structured Data from the Data Lake: Flexible, Governed Data Model
    – Example document: { first_name: 'Paul', surname: 'Miller', cell: 447557505611, city: 'London', location: [45.123,47.232], Profession: ['banking', 'finance', 'trader'], cars: [ { model: 'Bentley', year: 1973, value: 100000, … }, { model: 'Rolls Royce', year: 1965, value: 330000, … } ] }
    – Callouts: typed field values (e.g. Number); fields can contain arrays; fields can contain an array of sub-documents
  • 42. Expressive Query Language, Rich Secondary Indexes
    – Rich Queries: Find Paul’s cars; find everybody in London with a car between 1970 and 1980
    – Geospatial: Find all of the car owners within 5km of Trafalgar Sq.
    – Text Search: Find all the cars described as having leather seats
    – Aggregation: Calculate the average value of Paul’s car collection
    – MapReduce: What is the ownership pattern of colors by geography over time (is purple trending in China?)
  • 43. Visualizing Operational Data: MongoDB Connector for BI. Visualize and explore multi-structured data using SQL-based BI platforms. Sitting between your BI platform and MongoDB, the BI Connector provides the schema, translates queries, and translates responses.
  • 44. Enterprise-Grade Security (* included with MongoDB Enterprise Advanced)
    – Authentication: SCRAM, LDAP*, Kerberos*, x.509 Certificates
    – Authorization: Built-in Roles, User-Defined Roles, Field-Level Redaction
    – Auditing*: Admin, DML, DDL, Role-based
    – Encryption: Network: SSL (with FIPS 140-2); Disk: Encrypted Storage Engine* or Partner Solutions
  • 46. Management Tooling: MongoDB Ops Manager • Monitoring & alerting • Integration with APM platforms • Prescriptive management with query profiling • Automated cluster provisioning, scaling and upgrades • Continuous, point-in-time backup
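The slide 42 queries against the slide 41 document can be written as MongoDB filter and pipeline documents. The sketch below builds them as plain Python dicts, in the shape a driver such as PyMongo accepts for `collection.find(filter, projection)` and `collection.aggregate(pipeline)`. It is illustrative only: the Trafalgar Square coordinates are approximate, and the indexes it presumes (a `2dsphere` index on `location`, a text index over some car description field) are assumptions, not part of the original deck. Nothing here connects to a server.

```python
# Query documents for the slide 42 examples, built as plain dicts (no driver or server needed).

# Sample document based on slide 41; car fields the slide elides with "..." are omitted.
paul = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": 447557505611,
    "city": "London",
    "location": [45.123, 47.232],  # legacy [longitude, latitude] pair
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Rich query: find Paul's cars (the projection returns only the cars array).
pauls_cars_filter = {"first_name": "Paul", "surname": "Miller"}
pauls_cars_projection = {"cars": 1, "_id": 0}

# Rich query: everybody in London with a car built between 1970 and 1980.
# $elemMatch makes both bounds apply to the same car, not to different array elements.
london_1970s_filter = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Geospatial: owners within 5 km of Trafalgar Square (assumes a 2dsphere index on "location").
TRAFALGAR_SQ = [-0.1281, 51.5080]  # approximate [longitude, latitude]
near_trafalgar_filter = {
    "location": {
        "$nearSphere": {
            "$geometry": {"type": "Point", "coordinates": TRAFALGAR_SQ},
            "$maxDistance": 5000,  # metres
        }
    }
}

# Text search: phrase match; assumes a text index exists on the car description field.
leather_seats_filter = {"$text": {"$search": "\"leather seats\""}}

# Aggregation: average value of Paul's car collection, computed inside the database.
avg_value_pipeline = [
    {"$match": {"first_name": "Paul", "surname": "Miller"}},
    {"$unwind": "$cars"},
    {"$group": {"_id": "$_id", "avg_value": {"$avg": "$cars.value"}}},
]


# Local cross-checks that mirror what the server would compute for the sample document.
def average_car_value(doc):
    values = [car["value"] for car in doc["cars"]]
    return sum(values) / len(values)


def has_car_between(doc, lo, hi):
    # Same semantics as the $elemMatch range: one car must satisfy both bounds.
    return any(lo <= car["year"] <= hi for car in doc["cars"])
```

With PyMongo, for example, `collection.find(london_1970s_filter)` or `collection.aggregate(avg_value_pipeline)` would send these documents unchanged. For the sample document, the pipeline's average is (100000 + 330000) / 2.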
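Slide 40 and the workload isolation notes describe reserving replica set members for analytics. A minimal sketch, assuming a replica set configuration document of the shape passed to `rs.initiate()` / `replSetInitiate`, with hidden members for analytics. The set name and hostnames are hypothetical. Hidden members must have priority 0, and drivers do not route read-preference reads to them, so an analytics application would connect to such a member directly. The local checks cover only a few documented limits (50 members, 7 voting members), not full server-side validation.

```python
# Build a replica set config with operational members plus hidden analytics members.

def replica_set_config(set_name, operational_hosts, analytics_hosts):
    members = []
    for i, host in enumerate(operational_hosts):
        members.append({"_id": i, "host": host, "priority": 1})  # electable as primary
    offset = len(operational_hosts)
    for j, host in enumerate(analytics_hosts):
        members.append({
            "_id": offset + j,
            "host": host,
            "priority": 0,   # hidden members must never be elected primary
            "hidden": True,  # invisible to normal client read routing
        })
    return {"_id": set_name, "members": members}


def validate_config(config):
    """Sanity checks mirroring a few documented replica set limits."""
    members = config["members"]
    if len(members) > 50:
        raise ValueError("a replica set can have at most 50 members")
    if sum(1 for m in members if m.get("votes", 1) > 0) > 7:
        raise ValueError("a replica set can have at most 7 voting members")
    for m in members:
        if m.get("hidden") and m.get("priority", 1) != 0:
            raise ValueError(f"hidden member {m['host']} must have priority 0")
    if not any(m.get("priority", 1) > 0 for m in members):
        raise ValueError("at least one member must be electable as primary")
```

For example, `replica_set_config("rs0", ["db1.example.net:27017", "db2.example.net:27017"], ["analytics1.example.net:27017"])` yields three voting members, one of them hidden. Keeping the analytics member hidden means operational traffic never lands on it, at the cost of the analytics application needing its own direct connection string.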
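The operationalized data lake pattern on slides 14 to 18 can be reduced to a small in-memory sketch. Every raw event from the queue lands in the lake. An event-processing rule also routes "interesting" events to the operational store. A batch job then scans the whole lake to build a per-user model and publishes it back for real-time access. Plain Python lists and dicts stand in for HDFS, the message queue, and MongoDB. The `is_interesting` rule and the event fields are hypothetical examples based on the offer and vehicle-telemetry cases in the notes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Router:
    """Consumes events from a queue and lands them in the lake and/or the operational store."""
    is_interesting: Callable[[dict], bool]
    data_lake: list = field(default_factory=list)       # stands in for raw files in HDFS
    operational_db: list = field(default_factory=list)  # stands in for a MongoDB collection

    def consume(self, events):
        for event in events:
            self.data_lake.append(event)           # all raw data lands in the lake
            if self.is_interesting(event):
                self.operational_db.append(event)  # also available to apps immediately


def is_interesting(event):
    # Hypothetical rule: engine over-temperature alarms, or product-page views that should trigger an offer.
    if event["type"] == "telemetry":
        return event.get("engine_temp_c", 0) > 110
    return event["type"] == "page_view" and event.get("page") == "product"


def build_batch_view(data_lake):
    """Batch layer: scan all raw events, as a Hadoop or Spark job would, and count events per user."""
    return Counter(e["user"] for e in data_lake if "user" in e)


def publish_model(profiles, batch_view):
    """Serving layer: upsert the batch model into per-user profile documents keyed by user id."""
    for user, count in batch_view.items():
        profiles.setdefault(user, {})["event_count"] = count
```

The split mirrors the deck's argument. The lake keeps everything cheaply for full scans, and the operational store holds only what applications must query or update with low latency.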

Editor's Notes

  1. We have seen rapid growth in adoption of the data lake: a centralized repository for the many new data sources organizations are now collecting. But it is not without challenges; the primary one is how to make the analytics generated by the data lake available to our real-time, operational apps. So we are going to cover: the rise of the data lake; the challenges in getting the most business value out of it; the role that databases play, and their requirements; and case studies of organizations that are unlocking insight from the data lake.
  2. As enterprises bring more products and services online as part of digital transformation initiatives, one thing they don't lack today is data: streams of sensor readings, social sentiment, machine logs, mobile apps, and more. Analysts estimate volumes are growing at 40% per annum, with 80% of all data unstructured. At the same time, we see more pressure on time to market, on exposing apps to global audiences, and on reducing the cost of delivering new services. These trends fundamentally change how enterprises build and run modern apps.
  3. With all of this new data available, we are creating an insight economy. Uncovering new insights by collecting and analyzing this data carries the promise of competitive advantage and efficiency savings: better understanding customers by predicting what they might buy based on behavior and demographics, optimizing the supply chain with better or faster routes, or reducing the risk of fraud by identifying suspicious behavior. It's all about the data. Those that don't harness it are at a major disadvantage. The goal is to understand the past, monitor the present, and predict the future.
  4. The traditional home for data from operational apps has been the data warehouse: take all this data in, then create analytics from it. However, the traditional Enterprise Data Warehouse (EDW) is straining under the load, overwhelmed by the sheer volume and variety of data pouring into the business. Costs run to hundreds or thousands of dollars per TB, versus tens to hundreds on commodity systems.
  5. Because of these challenges, many organizations have turned to Hadoop as a centralized repository for this new data, creating what many call a data lake. It is not a replacement for the EDW but an adjunct: it stores all the new data and applies new analytics, which are combined with the traditional reporting coming from the data warehouse. Gartner estimates that around 50% of enterprises have rolled out, or are in the process of rolling out, data lakes.
  6. When we think about data lakes, we think about big data, and big data is often associated with Hadoop. The reality is that it is more than just Hadoop. Wikibon forecasts “big data revenues” growing from $19bn in 2016 to $92bn in 2026, with software outpacing hardware and professional services. IDC forecasts just under $50bn by 2019, a 23% CAGR, with software growing fastest. Hadoop and Spark are leading the charge, closely followed by databases. Databases are a key part of the big data landscape because they operationalize the data lake: they are the link between the back-end data lake and the front-end apps that consume its analytics to become smarter.
  7. Hadoop is well established and celebrates its 10th anniversary this year. It has grown from HDFS and MapReduce into dozens of projects: Gartner identifies 19 common projects supported by the 4 leading distributions, and the average distribution ships many more, spanning processing frameworks, search, provisioning and management, security, file formats, and integration. Each project is developed independently, with its own roadmap and its own dependencies, which creates incredible complexity. HDFS is the common storage layer against which the processing frameworks run to produce the outputs you see on the slide.
  8. While something like 50% of enterprises either have or are evaluating Hadoop to create new classes of app, it is not without its challenges, as a number of Gartner analyses and press reports have noted.
  9. One of the fundamental integration challenges is how to connect the data lake with your operational systems. Operational apps run the business: how do you expose the analytics created in the data lake to better serve customers with more relevant products and offers, or to drive efficiency savings from an IoT-enabled smart factory? Unify data lake analytics with the operational applications. This enables you to create smart, contextually aware, data-driven apps. An integrated database layer operationalizes the data lake.
  10. The obvious question is why we need a database when we have Hadoop. It comes down to how each platform persists and accesses data. HDFS is a file system: it accesses data in batches of 128MB blocks. MongoDB is a database that provides fine-grained access to data at the level of individual records. This gives each system very different properties, which we will talk through. Despite those differences, there are lots of similarities in how we process data, with MapReduce or Spark. These frameworks are unopinionated about the underlying persistence layer, which could be HDFS or MongoDB, so you can unify analytics across the data lake and your database. MongoDB and HDFS share common attributes: schema-on-read, multiple replicas for fault tolerance, horizontal scale, and low TCO. But they differ in how they store and access data, which suits them to different parts of the data lake deployment.
  11. The differences come in how data is stored, accessed, and updated. HDFS is a file system: it stores data in files, split into blocks, and has no knowledge of the underlying data, so it has no indexes. To access a specific record, you scan all the data stored in the file where the record is located, which could be tens of MBs. HDFS is write-once-read-many (WORM): to update customer data, you rewrite all of that customer data, not just the individual customers that changed. Hadoop excels at generating analytics models by scanning and processing large datasets, but it is not designed to provide real-time, random access for operational applications; the time to read the whole dataset matters more than the latency in reading the first record. http://stackoverflow.com/questions/15675312/why-hdfs-is-write-once-and-read-multiple-times/37300268#37300268
  12. But MongoDB is more than just a file system. It is a full database, so it gives you a whole set of things HDFS doesn't: millisecond query latency; random access to indexed subsets of data; expressive querying and flexible indexing, supporting complex queries and aggregations against the data in real time and making online applications smarter and contextual; and updates to fast-changing data in real time as users interact with online applications, without having to rewrite the entire data set. You get fine-grained access with complex filtering logic, and you can still run distributed processing libraries against it: a MongoDB collection or document looks like an input or output in HDFS. Rather than load a file, you load a DataFrame, and Hive sees MongoDB as a table. HDFS, by contrast, suits longer jobs, batch analytics, and append-only files, and is great for scanning all the data, or large subsets of it, in files.
  13. When you bring the database and the data lake together, you can build powerful, data-driven apps. Take a real-life example: the data lake of a large retailer. The online storefront and e-commerce engine are powered by MongoDB, handling customer profiles, sessions, baskets, and product catalogs, and presenting recommendations and offers. As customers browse the site, all of their activity is written back to Hadoop and blended with other data sources (social feeds, demographics, market data, credit scores, currency feeds) to segment and cluster customers. The resulting models are then exposed to MongoDB, so returning customers are presented with a personalized experience based on what they have browsed before and what they are likely to want to purchase next. You could not serve that operational app, which deals with individual customers, from HDFS: it is not real time, there are no indexes to access just the customer details you need, and there is no way to update a single customer record, because everything is rewritten and recomputed. Regression and classification are used for customer clustering.
  14. Let's go deeper and wider. This is a design pattern for the data lake: multiple components that collectively handle ingestion, storage, processing, and analysis of data, then serve it to the consuming operational apps. Step through.
  15. Data ingestion: data streams are ingested into a pub/sub message queue, which routes all raw data into HDFS. Event processing often also runs against the queue to find interesting events that operational apps need to consume immediately, such as displaying an offer to a user browsing a product page, or alarms generated from vehicle telemetry in an IoT app. These events are routed to MongoDB for immediate consumption by operational applications.
  16. Raw data is loaded into the data lake, where Hadoop jobs (MapReduce or Spark) generate analytics models from it; see the examples in the layer above HDFS.
  17. MongoDB exposes these models to the operational processes, serving indexed queries and updates against them with real-time latency
  18. The distributed processing frameworks can re-compute analytics models against data stored in either HDFS or MongoDB, continuously flowing updates from the operational database into the analytics models. We will look at some examples of users who have deployed this type of design pattern a little later.
  19. Beyond low-latency performance, there are specific requirements. You need much more than just a datastore: a fully featured database serving as the System of Record for online applications. Tight integration between MongoDB and the data lake, to minimize data movement between them and fully exploit the native capabilities of each part of the system. The ability to serve operational workloads and run analytics against live operational data, e.g. the top trending articles right now so I know where to place my ads, or how many widgets coming off my production line are failing QA and whether that is up or down against previous trends. Gartner calls this HTAP (Hybrid Transactional and Analytical Processing); Forrester uses the term translytical. To do that, you need a powerful query language, secondary indexes, and aggregations & transformations, all within the database rather than ETL into a warehouse, plus workload isolation so operational and analytical workloads don't contend for the same resources. A flexible schema to handle multi-structured data, while still enforcing governance on that data. Secure access to the data: the operational database is typically accessed by a much broader audience than Hadoop, so security controls are critical, including robust access controls (LDAP, Kerberos, RBAC), auditing of all events for regulatory compliance, and encryption of data in motion and at rest, all built into the database. The ability to scale as the data lake scales, which means scaling out on commodity hardware, often across geographic regions. To simplify the environment, sophisticated management tools that automate database deployment, scaling, monitoring and alerting, and disaster recovery. Tight integration: it is not enough just to move data between the analytics and operational layers; you need to move it efficiently. Connectors should allow selective filtering, using secondary indexes to extract and process only the range of data needed, for example retrieving all customers located in a specific geography. This is very different from other databases that do not support secondary indexes.
In these cases, Spark and Hadoop jobs are limited to extracting all data based on a simple primary key, even if only a subset of that data is required for the query. This means more processing overhead, more hardware, and longer time-to-insight for the user. Workload isolation: provision database clusters with dedicated analytics nodes, allowing users to run real-time analytics and reporting queries against live data simultaneously, without impacting the nodes servicing the operational application. Flexible data model: store data of any structure, and easily evolve the model to capture new attributes, e.g. enriching user profiles with geospatial data. You also need to ensure data quality by enforcing validation rules, so that data is appropriately typed and contains all the attributes the app needs. Expressive queries: let developers build applications that query and analyze the data in multiple ways (single keys, ranges, text search, and geospatial queries through to complex aggregations and MapReduce jobs), returning responses in milliseconds. Complex queries are executed natively in the database without additional analytics frameworks or tools, avoiding the latency that comes from moving data between operational and analytical engines. Secondary indexes let you filter data in any way you need, which is key for low-latency operational queries. Robust security controls: govern access, provide audit trails, and encrypt data in flight and at rest. Scale-out: match the scale-out of the data lake, adding new nodes as it grows to service higher data volumes or user load. Advanced management platform: to reduce data lake TCO and the risk of application downtime, provide powerful tooling to automate database deployment, scaling, monitoring and alerting, and disaster recovery.
  20. While it's important to provide low-latency access to data, simple key-value lookups are not enough. The demand is to get insights from data faster, and that is the role of real-time analytics: tracking where the vehicles in your fleet are right now, measuring social sentiment about an announcement you've just made, or correlating patterns of real-time fraud attempts against specific domains. This is where an expressive query language, secondary indexes, and in-database aggregations are valuable. MongoDB and RDBMS both have strong features here, with RDBMS further ahead; column family stores are little more than key-value. With them, you need to move data out to other query frameworks or analytics nodes to get any intelligence, which adds latency and complexity through more moving parts. RDBMS is good in many areas, but it falls down on the data model flexibility needed to handle rapidly changing, multi-structured data. Column family stores offer more schema flexibility than relational, but you still need to pre-define columns, which restricts how quickly you can evolve apps. Data validation means applying rules to the data structures the operational database stores. Say an app creates a single view of your customer: the data may be spread across many repositories, is loaded into the data lake to create the single view, and is then loaded into MongoDB to serve the operational apps. You need to ensure documents contain mandatory fields, such as unique customer identifiers, typed and formatted in a specific way, e.g. the ID is always an integer and the email address always contains an @. Document validation in MongoDB enables you to do this. RDBMS has full schema validation, so it is a little ahead; in a column family database you have to enforce governance in code. Looking at the aggregated scores, relational and MongoDB are evenly matched, while column family, a much simpler datastore, is a long way behind.
  21. Hadoop and Spark integration: you need to do more than just move vast amounts of data between each layer of the stack. You need intelligent connectors that can push down predicates, filter data with secondary indexes (e.g. access all customers in a specific geography), and pre-aggregate data. Without access to the database's secondary indexes, you move a ton of data backward and forward, which means more processing cycles and longer latency. The MongoDB Connector for Hadoop and the MongoDB Connector for Spark both support these capabilities. Column family stores don't offer secondary indexes or aggregations, so there is nothing to filter the data with. RDBMS offers these capabilities in its connectors, but generally only as expensive add-ons, hence the downgrade. Workload isolation is the ability to perform real-time analytics on live operational data without interfering with operational apps: you don't want an aggregation counting how many deliveries your fleet of trucks has made to slow down how quickly you can detect from sensor data that a vehicle has developed a fault. The key is to distribute queries to dedicated nodes in the database cluster, with some provisioned for operational data and replicating to nodes dedicated to analytics. MongoDB supports up to 50 members in a single replica set; configure the analytics members as hidden so they are never hit by operational queries. Column family stores are restricted to just 3 data replicas, which exist for HA, not for separating different workloads. For RDBMS, it is an expensive add-on. Native BI connectivity may not be relevant in all cases, but many organizations want to create live dashboards reporting the current state of operational systems. MongoDB has a native BI connector that exposes the database as an ODBC data source, so you can visualize data in anything from Tableau to Business Objects to Excel. There is rich tooling in the relational world. For column family stores, 3rd-party connectors exist, but they don't push queries down to the database; instead they extract all the data, which makes powering dashboards more computationally and network intensive.
  22. Security: data from operational databases is exposed to apps and potentially millions of users, so you need robust access controls, which may include integration with LDAP, Kerberos, and PKI environments, plus RBAC to tightly segregate who can do what in the database. You also need to encrypt data in flight and at rest, and maintain a log of database activity for forensic analysis. All solutions do well here: there has been big investment in the Hadoop ecosystem, which is rapidly gaining ground on RDBMS at much lower cost. Scale-out: you need to scale as the data lake scales and as more digital services are opened up to users, which is a core strength of non-relational databases. The fundamental challenge with RDBMS is that it requires scale-up, which has limited headroom and is very expensive on proprietary hardware. Management: Hadoop is complex and its management tools are still primitive. For an operational database, you need a platform with powerful tooling to automate database deployment, scaling, fine-grained monitoring and alerting, and disaster recovery with point-in-time backups and automated restores. There is rich tooling in the relational world, and MongoDB has invested heavily to close that gap.
  23. Left-hand side: the attributes of relational databases that MongoDB has maintained, blended with innovations from NoSQL. This combination uniquely differentiates MongoDB from its peers in the non-relational database market.
  24. Invest in technology that has production-proven deployments and broad skills availability. With the availability of Hadoop skills cited by Gartner analysts as a top challenge, it is essential you choose an operational database with a large available talent pool. This enables you to find staff who can rapidly build differentiated big data applications. Across multiple measures, including the DB-Engines Ranking, the 451 Group NoSQL Skills Index, and the Gartner Magic Quadrant for Operational Databases, MongoDB is the leading non-relational database.
  25. Look at examples in action
  26. CTM, one of the UK's leading price comparison sites, moved from an on-prem, RDBMS-based monolithic app to a microservices architecture powered by MongoDB, with Hadoop at the back end providing analytics. This enabled them to better personalize the customer experience and deepen customer relationships. Read through bullets.
  27. Second example: a leading global airline. Through M&A, it operates multiple brands serving different countries and market sectors, but customer data was spread across 100 different systems. Using Hadoop and Spark, it brought that data together to create a single view, which is loaded into MongoDB. MongoDB powers the online apps (web and mobile) as well as the call center, so users get a consistent experience however they interact. All user data and ticket data is stored in MongoDB, then written back into Hadoop to run advanced analytics for ticket price optimization, identifying offers, and finding gaps in the product portfolio. Read bullets.
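To make "single view" concrete, here is a hypothetical sketch of the merge step. The source-system names, the `customer_id` join key, and the precedence rule (the first system to supply an attribute wins) are all illustrative assumptions. The notes do not describe the airline's actual matching logic, which in practice would run in Hadoop/Spark at far larger scale.

```python
from typing import Dict, List


def build_single_view(sources: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Merge per-system customer records into one document per customer_id.

    Hypothetical rules: the first system (in dict order) to supply a profile
    attribute wins, and bookings from every system are concatenated.
    """
    views: Dict[str, dict] = {}
    for system, records in sources.items():
        for rec in records:
            cid = rec["customer_id"]
            view = views.setdefault(
                cid, {"customer_id": cid, "source_systems": [], "bookings": []}
            )
            if system not in view["source_systems"]:
                view["source_systems"].append(system)
            for attr in ("name", "email", "loyalty_tier"):
                if attr in rec and attr not in view:
                    view[attr] = rec[attr]
            view["bookings"].extend(rec.get("bookings", []))
    return views


if __name__ == "__main__":
    sources = {
        "brand_a_crm": [{"customer_id": "C1", "name": "Ana", "email": "ana@example.com"}],
        "brand_b_booking": [
            {"customer_id": "C1", "loyalty_tier": "gold", "bookings": ["LHR-JFK"]},
            {"customer_id": "C2", "name": "Ben", "bookings": ["MAD-BCN"]},
        ],
    }
    for doc in build_single_view(sources).values():
        print(doc)
```

Each resulting dict is the kind of document that would be loaded into MongoDB to serve web, mobile, and call-center lookups by `customer_id`.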
  28. Provides a traveler safety platform for corporate customers: if a natural disaster or security incident occurs while a traveler is away on business, the platform can send real-time alerts and advise on how to get to safety. The platform was built for national governments and has now launched for commercial use. Analyzing PBs of Data with MongoDB, Hadoop, Apache NiFi & SAP HANA. Read bullets.
  29. McAfee built its cloud-based threat intelligence platform on MongoDB. The platform monitors threat activity for clients in real time: it identifies when attacks are taking place and when users may be interacting with insecure or suspicious sites. All real-time activity is captured in MongoDB, which provides alerting to security teams. The data is also sent to Hadoop for further back-end analytics, and updated threat profiles are written back to MongoDB.
  30. MongoDB is open source. We also provide MongoDB Enterprise Advanced: a collection of software and support to run in production at scale.
  31. The Stratio Apache Spark-certified Big Data (BD) platform is used by an impressive client list including BBVA, Just Eat, Santander, SAP, Sony, and Telefonica. The company has implemented a unified real-time monitoring platform for a multinational banking group operating in 31 countries with 51 million clients all over the world. The bank wanted to ensure a high quality of service and a personalized experience across its online channels, and needed to continuously monitor client activity to check service response times and identify potential issues. The application was built on a modern technology foundation including: Apache Flume to aggregate log data; Apache Spark to process log events in real time; and MongoDB to persist log data, processed events, and Key Performance Indicators (KPIs). The aggregated KPIs stored by MongoDB enable the bank to analyze client and system behavior in real time in order to improve the customer experience. Collecting raw log data allows the bank to immediately rebuild user sessions if a service fails, with analysis generated by MongoDB and Spark providing complete traceability to quickly identify the root cause of any issue. The project required a database that provided always-on availability, high performance, and linear scalability. In addition, a fully dynamic schema was needed to support high volumes of rapidly changing semi-structured and unstructured JSON data being ingested from a variety of logs, clickstreams, and social networks. After evaluating the project's requirements, Stratio concluded MongoDB was the best fit. With MongoDB's query projections and secondary indexes, analytic processes run by the Stratio BD platform avoid the need to scan the entire data set, which is not the case with other databases.
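The KPI step in this pipeline can be sketched in a few lines. This is a pure-Python stand-in for what the Spark job would do at scale, not Stratio's code. The event fields (`service`, `status`, `response_ms`) and the rule "count 5xx responses as errors" are assumptions for illustration. The output documents are the kind of per-service KPIs one might persist to MongoDB.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def compute_kpis(events: Iterable[dict]) -> List[dict]:
    """Aggregate raw log events into one KPI document per service."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "errors": 0, "total_ms": 0.0}
    )
    for event in events:
        agg = totals[event["service"]]
        agg["requests"] += 1
        if event["status"] >= 500:  # hypothetical rule: 5xx responses count as errors
            agg["errors"] += 1
        agg["total_ms"] += event["response_ms"]

    # One KPI document per service, sorted by service name.
    return [
        {
            "service": service,
            "requests": agg["requests"],
            "error_rate": agg["errors"] / agg["requests"],
            "avg_response_ms": agg["total_ms"] / agg["requests"],
        }
        for service, agg in sorted(totals.items())
    ]


if __name__ == "__main__":
    events = [
        {"service": "login", "status": 200, "response_ms": 120},
        {"service": "login", "status": 503, "response_ms": 80},
        {"service": "payments", "status": 200, "response_ms": 200},
    ]
    for kpi in compute_kpis(events):
        print(kpi)
```

Keeping the raw events alongside these KPIs is what makes the session-rebuild and root-cause analysis described above possible.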
  32. Digital transformation is not just impacting the data warehouse and analytics: across the whole stack, we're seeing transformations.
  33. Workload isolation. MongoDB replica sets can be provisioned with dedicated analytic nodes, allowing users to simultaneously run real-time analytics and reporting queries against live data without impacting the nodes servicing the operational application. Because this uses MongoDB's built-in replication, you don't need complex and brittle ETL pipelines to move data between operational and analytical systems.
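A replica set with dedicated analytics members can be described by a configuration document like the one built below. It could be passed to `rs.initiate()` in mongosh or to the `replSetInitiate` admin command. Hostnames are hypothetical. Making the analytics members non-voting is a design choice assumed here, not a requirement. Because hidden members are invisible to drivers, analytics tools would connect directly to the hidden member's host.

```python
from typing import Dict, List

MAX_MEMBERS = 50        # replica set member limit
MAX_VOTING_MEMBERS = 7  # at most 7 members may vote


def replica_set_config(name: str, operational: List[str], analytics: List[str]) -> Dict:
    """Build a replica set configuration document with hidden analytics members."""
    if len(operational) + len(analytics) > MAX_MEMBERS:
        raise ValueError("a replica set can have at most 50 members")
    if len(operational) > MAX_VOTING_MEMBERS:
        raise ValueError("at most 7 voting members are allowed")

    # Operational members: electable, voting (votes defaults to 1).
    members = [{"_id": i, "host": host, "priority": 1} for i, host in enumerate(operational)]
    for offset, host in enumerate(analytics):
        members.append({
            "_id": len(operational) + offset,
            "host": host,
            "priority": 0,   # hidden members must have priority 0, so they never become primary
            "hidden": True,  # invisible to drivers, so application reads never land here
            "votes": 0,      # non-voting, so adding analytics nodes does not affect elections
        })
    return {"_id": name, "members": members}


if __name__ == "__main__":
    config = replica_set_config(
        "rs0",
        ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
        ["analytics1.example.net:27017"],
    )
    for member in config["members"]:
        print(member)
```

The operational members replicate to the hidden member continuously, which is what removes the separate ETL hop.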
  34. MongoDB's document data model makes it easy for users to store and combine data of any structure, without giving up sophisticated validation rules. If new attributes need to be added (for example, enriching user profiles with geo-location data), the schema can be modified without application downtime and without having to update all existing records. You can also enforce structure. Take a user profile, where every record must have a unique ID stored as an int and a valid email address: use document validation to enforce the type and format rules, and a unique index to enforce uniqueness.
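The user-profile rule can be expressed as a `create` command with a `$jsonSchema` validator plus a unique index. Below, both are plain dicts in the shape MongoDB expects. A driver call such as PyMongo's `db.command(...)` could send them, though that is not shown here. The collection name, field names, and the deliberately simple email pattern are hypothetical, and the pattern is not full RFC 5322 validation. Because `additionalProperties` is not set to `false`, new attributes such as geo-location can still be added without touching existing records.

```python
import re

# Hypothetical, deliberately simple email rule: something@something.something, no spaces.
EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

CREATE_USERS = {
    "create": "users",
    "validator": {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["user_id", "email"],
            "properties": {
                "user_id": {"bsonType": "int", "description": "must be a 32-bit integer"},
                "email": {"bsonType": "string", "pattern": EMAIL_PATTERN},
            },
            # No "additionalProperties": False, so new fields (e.g. geo-location) are allowed.
        }
    },
    "validationAction": "error",  # reject writes that fail validation
}

# Validation checks types and formats; uniqueness needs a unique index.
CREATE_USER_ID_INDEX = {
    "createIndexes": "users",
    "indexes": [{"key": {"user_id": 1}, "name": "user_id_unique", "unique": True}],
}


def email_is_valid(email: str) -> bool:
    """Local check using the same pattern the validator applies server-side."""
    return re.fullmatch(EMAIL_PATTERN, email) is not None


if __name__ == "__main__":
    for candidate in ["ana@example.com", "not-an-email", "a b@example.com"]:
        print(candidate, email_is_valid(candidate))
```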
  35. MongoDB's expressive query language enables developers to build applications that query and analyze the data in multiple ways, from single keys, ranges, text search, and geospatial queries through to complex aggregations and MapReduce jobs, returning responses in milliseconds. Complex queries are executed natively in the database without needing additional analytics frameworks or tools. Secondary indexes: MongoDB supports compound, unique, array, partial, TTL, geospatial, sparse, hashed, and text indexes to optimize for multiple query patterns, data types, and application requirements. Indexes are essential when operating across slices of the data, for example updating the churn analysis of a subset of customers without having to scan all customer data.
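The churn example can be sketched as a partial compound index plus an aggregation pipeline that touches only one slice of customers. Both are plain dicts in the shape of MongoDB's `createIndexes` command and aggregation stages. The collection and field names (`customers`, `region`, `churn_score`, `status`, `segment`) are hypothetical. With a driver, they could be sent with something like PyMongo's `db.command(...)` and `collection.aggregate(...)`.

```python
from typing import Dict, List

# Compound index on (region, churn_score), restricted (partial) to active customers.
CREATE_CHURN_INDEX = {
    "createIndexes": "customers",
    "indexes": [
        {
            "key": {"region": 1, "churn_score": -1},
            "name": "region_churn_active",
            "partialFilterExpression": {"status": "active"},
        }
    ],
}


def churn_pipeline(region: str, min_score: float) -> List[Dict]:
    """Aggregation pipeline that analyzes only one slice of customers."""
    return [
        # Repeating status: "active" lets the planner use the partial index;
        # region equality plus a churn_score range matches the index key order.
        {"$match": {"status": "active", "region": region, "churn_score": {"$gte": min_score}}},
        # Count at-risk customers and their average score per segment.
        {"$group": {"_id": "$segment", "at_risk": {"$sum": 1}, "avg_score": {"$avg": "$churn_score"}}},
        {"$sort": {"at_risk": -1}},
    ]


if __name__ == "__main__":
    for stage in churn_pipeline("EMEA", 0.7):
        print(stage)
```

The design point is that the `$match` can be satisfied from the index, so only the relevant slice of customer documents is read.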
  36. We need to visualize data for reporting and analytics, and to drive live dashboards. The MongoDB BI Connector: (1) provides the BI tool with the schema of the MongoDB collection to be visualized; (2) translates SQL statements issued by the BI tool into equivalent MongoDB queries that are sent to MongoDB for processing; and (3) converts the results into the tabular format expected by the BI tool, which can then visualize the data based on user requirements.
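The translate-and-flatten steps can be illustrated with a toy. This is a hypothetical sketch, not how the BI Connector is implemented. `translate_count_by` handles only one query shape, taken as structured arguments rather than parsed SQL: `SELECT <col>, COUNT(*) ... WHERE k = v ... GROUP BY <col>`. `flatten`/`to_table` show one plausible way nested documents can become dotted column names for a tabular tool.

```python
from typing import Any, Dict, List, Tuple


def translate_count_by(collection: str, equals: Dict[str, Any], group_by: str) -> Tuple[str, List[dict]]:
    """Toy translation of:
    SELECT <group_by>, COUNT(*) FROM <collection> WHERE k1 = v1 AND ... GROUP BY <group_by>
    """
    pipeline = [
        {"$match": dict(equals)},                                  # WHERE
        {"$group": {"_id": f"${group_by}", "count": {"$sum": 1}}},  # GROUP BY + COUNT(*)
        {"$sort": {"_id": 1}},
    ]
    return collection, pipeline


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested sub-documents into dotted column names, e.g. address.city."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def to_table(docs: List[Dict[str, Any]]) -> Tuple[List[str], List[List[Any]]]:
    """Convert result documents into (columns, rows); missing fields become None."""
    flat_docs = [flatten(d) for d in docs]
    columns = sorted({c for d in flat_docs for c in d})
    rows = [[d.get(c) for c in columns] for d in flat_docs]
    return columns, rows


if __name__ == "__main__":
    print(translate_count_by("deliveries", {"status": "late"}, "region"))
    print(to_table([{"_id": "EMEA", "count": 3}, {"_id": "AMER", "count": 5}]))
```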
  37. Protect our data. There has been a lot of investment in Hadoop security, but Hadoop is typically locked away so that only a subset of analysts can reach it. The operational DB is typically deployed to a much broader audience, so security controls are critical: robust access controls (LDAP, Kerberos, RBAC), auditing of all events for regulatory compliance, and encryption of data in motion and at rest, all built into the database.
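RBAC in MongoDB is configured with the `createRole` and `createUser` commands. The helpers below build those command documents for a read-only dashboard user. The role, user, database, and collection names are hypothetical, and the password would come from a secret store rather than code. The `is_allowed` function is only a simplified local check of a role's direct privileges. It is not how the server evaluates access, and it ignores inherited roles and wildcard resources.

```python
from typing import Dict, List


def read_only_role(role: str, db: str, collections: List[str]) -> Dict:
    """createRole command granting only the 'find' action on the named collections."""
    return {
        "createRole": role,
        "privileges": [
            {"resource": {"db": db, "collection": name}, "actions": ["find"]}
            for name in collections
        ],
        "roles": [],  # no inherited roles
    }


def create_user(user: str, password: str, role: str, db: str) -> Dict:
    """createUser command assigning a single role."""
    return {"createUser": user, "pwd": password, "roles": [{"role": role, "db": db}]}


def is_allowed(role_doc: Dict, db: str, collection: str, action: str) -> bool:
    """Simplified check of whether a role's direct privileges grant an action."""
    return any(
        p["resource"] == {"db": db, "collection": collection} and action in p["actions"]
        for p in role_doc["privileges"]
    )


if __name__ == "__main__":
    role = read_only_role("kpiReader", "analytics", ["kpis"])
    print(role)
    print(is_allowed(role, "analytics", "kpis", "find"))    # True
    print(is_allowed(role, "analytics", "kpis", "insert"))  # False
```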
  38. We need to be able to scale cost-effectively: as the data lake grows, we need to scale the operational database layer in a way that is economical and doesn't break apps. With auto-sharding, MongoDB can be distributed across multiple nodes, both within and across data centers. Elastic: increase or decrease capacity as you go, with automatic load balancing.
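`SHARD_COMMAND` shows the admin command shape for sharding a collection on a hashed key (namespace and key are hypothetical). The rest is a toy model of why adding a shard is elastic. Keys hash into a fixed set of chunks by hash range, and a balancer moves whole chunks from the busiest shard to the idlest one. MD5, the 64-chunk count, and the greedy balancing rule are all assumptions for illustration. MongoDB manages chunk splitting and migration itself, with its own hash function.

```python
import hashlib
from collections import Counter
from typing import Dict, List

# Admin command shape for sharding a collection on a hashed key (names hypothetical).
SHARD_COMMAND = {"shardCollection": "fleet.telemetry", "key": {"vehicle_id": "hashed"}}

NUM_CHUNKS = 64  # hypothetical fixed chunk count


def chunk_for(key: str) -> int:
    """Map a shard-key value to a chunk by range-partitioning a 64-bit hash."""
    # MD5 is only a convenient stand-in, not MongoDB's internal hash function.
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    return (h * NUM_CHUNKS) >> 64


def initial_assignment(shards: List[str]) -> Dict[int, str]:
    """Spread chunks round-robin across the starting shards."""
    return {chunk: shards[chunk % len(shards)] for chunk in range(NUM_CHUNKS)}


def rebalance(assignment: Dict[int, str], shards: List[str]) -> int:
    """Move chunks from the busiest to the idlest shard until loads differ by at most one."""
    moved = 0
    while True:
        load = Counter({shard: 0 for shard in shards})
        load.update(assignment.values())
        donor = max(shards, key=lambda s: load[s])
        receiver = min(shards, key=lambda s: load[s])
        if load[donor] - load[receiver] <= 1:
            return moved
        chunk = max(c for c, s in assignment.items() if s == donor)
        assignment[chunk] = receiver
        moved += 1


if __name__ == "__main__":
    assignment = initial_assignment(["shard0", "shard1"])
    moved = rebalance(assignment, ["shard0", "shard1", "shard2"])  # add a third shard
    print(moved, Counter(assignment.values()))
    print("vehicle-42 ->", assignment[chunk_for("vehicle-42")])
```

When a third shard is added, only about a third of the chunks move. Every other key stays where it was, so applications keep routing through the same lookup.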
  39. Need sophisticated operational tooling to manage operational database layer