MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB

Wide-Ranging Analytical Solutions on MongoDB
DAWOUD IBRAHIM
Sr. Solutions Architect

IoT Edge Device
Charts
Atlas Data Lake

Criteria for Tools to Use
Operations on Data: read/write, transform, aggregation, algorithm
Speed to Insight: both how up-to-date data is and response times (SLA)
Effort: training, development, management
Processing Model for Analytics: distributed processing, iterative, streaming, etc.
Cost: data duplication, memory, servers, software

MongoDB Capabilities for Analytics, ML and AI

MongoDB Highlights for Analytics
DISTRIBUTED PARALLEL PROCESSING: Sharding & Replication
AGGREGATION FRAMEWORK
Data Lake (beta)
CONNECTORS
➔ Spark
➔ Hadoop
➔ R
VISUALIZATION
➔ Charts
➔ BI Connector

WORKLOAD ISOLATION & DISTRIBUTED PROCESSING

Put data where you need it: Workload Isolation
[Diagram: replica set with PRIMARY, Secondary, Secondary, and Dedicated Analytics nodes]
Analytics workloads: BI & Reporting, Predictive Analytics, Aggregations
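
One way to route analytical reads away from the primary is through read-preference options in the connection string. The sketch below only builds such a string; the host name and database are hypothetical, and the `nodeType:ANALYTICS` tag assumes an Atlas cluster with analytics nodes (a self-managed replica set would need its own tag names).

```python
from urllib.parse import urlencode


def analytics_uri(host: str, database: str, tag: str = "nodeType:ANALYTICS") -> str:
    """Build a connection string that sends reads to tagged secondaries."""
    options = {
        "readPreference": "secondary",  # never read from the primary
        "readPreferenceTags": tag,      # only members carrying this tag
    }
    # safe=":" keeps the "key:value" colon of the tag readable in the URI
    return f"mongodb+srv://{host}/{database}?{urlencode(options, safe=':')}"


# Hypothetical cluster host and database name, for illustration only.
uri = analytics_uri("cluster0.example.mongodb.net", "sales")
print(uri)
```

A BI or reporting client that connects with this string leaves the primary free for transactional traffic.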

Sharding for Highly Parallel Processing
➔ Workload split between shards
➔ Client works through mongos as with any query
[Diagram: the application sends an aggregation pipeline through mongos; each server runs it in parallel on N partitions; data is returned in parallel]

Aggregation Pipelines
Date Manipulation, String Manipulation, Type Conversions
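
As a rough illustration of those three operator families, the pipeline below is written as plain Python dictionaries. The field names (`orderDate`, `customer`, `quantity`) are hypothetical; the operators `$dateToString`, `$toUpper` and `$toInt` are standard aggregation expressions. The sketch only builds and prints the pipeline; with a driver such as PyMongo it could be passed to `collection.aggregate(pipeline)`.

```python
import json

# Hypothetical field names; the operators are standard aggregation expressions.
pipeline = [
    {
        "$project": {
            # Date manipulation: format a date as YYYY-MM-DD
            "day": {"$dateToString": {"format": "%Y-%m-%d", "date": "$orderDate"}},
            # String manipulation: normalize a name to upper case
            "customer": {"$toUpper": "$customer"},
            # Type conversion: turn a string quantity into an integer
            "quantity": {"$toInt": "$quantity"},
        }
    },
    # Aggregate the converted values per day
    {"$group": {"_id": "$day", "units": {"$sum": "$quantity"}}},
]

print(json.dumps(pipeline, indent=2))
```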

Aggregation With a Sharded Database
Workload split between shards
1. Client works through mongos as with any query
2. Shards execute pipeline up to a point
3. A single shard merges cursors and continues processing
4. $lookup & $out performed within Primary shard for the database
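
The split between shard-local work and the merge step can be pictured with a small pure-Python analogy. This is not how mongos is implemented; the documents, the filter and the three in-memory "shards" are all made up for illustration.

```python
from collections import Counter

# Hypothetical documents, already partitioned across three shards.
shards = [
    [{"region": "TX", "amount": 10}, {"region": "CA", "amount": 5}],
    [{"region": "TX", "amount": 7}, {"region": "NY", "amount": 1}],
    [{"region": "CA", "amount": 3}, {"region": "TX", "amount": -2}],
]


def shard_part(docs):
    """Shard-local part: filter (like $match), then partially group (like $group)."""
    partial = Counter()
    for doc in docs:
        if doc["amount"] > 0:
            partial[doc["region"]] += doc["amount"]
    return partial


def merge_part(partials):
    """Merge part: combine per-shard partial sums, then sort (like $sort)."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)


result = merge_part(shard_part(docs) for docs in shards)
print(result)  # [('TX', 17), ('CA', 8), ('NY', 1)]
```

Each shard only ever touches its own documents, so the first part scales with the number of shards; only the small partial results travel to the merging node.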

Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive parallelism
of Spark, its machine learning libraries, and streaming
API
● Process data "in place", avoiding the latency
otherwise required by an incremental ETL task
● Reduced operational complexity and faster time-to-analytics
● Aggregation pre-filtering in combination with
secondary indexing means that an analytics query
draws only the data it requires
● Multiple language APIs

Process data in MongoDB with the massive parallelism of
Spark, its machine learning libraries, and streaming API
● Process data "in place", avoiding the latency
otherwise required by an incremental ETL task
● Reads from secondaries isolate analytics workload
from business-critical operations
● Shard aware for data locality

[Diagram: writes go to the Primary; reads are served from the two secondaries]

Partitionable Distributed Analytics
[Diagram: a Master coordinates Workers, each paired with a mongos; partitions are lined up between workers and shards]
Benefits
• Very parallelizable to scale horizontally
• Intermediate results can be on disk, not necessarily memory
Common Frameworks
• Hadoop
• Spark

Use Cases
Data Lake Analytics, Data Products and Services, Active Archives
➔ explore all of your rich data naturally
➔ get to data as it lands via streams or microservices
➔ democratize access across diverse user groups
➔ monetize data
➔ market research, data- and insight-as-a-service
➔ snapshots, time series analysis, predictive analytics to innovate faster
➔ historical analysis against data assets retained in long term cold storage
➔ cost-effective data strategy

What is MongoDB Charts?
The quickest and easiest way to build visualizations of data stored in MongoDB
The best way to work with data
➔ Create visualizations in seconds
➔ Built for the MongoDB Document Model: work with rich hierarchical data including arrays and subdocuments
Intelligent data distribution
➔ No data movement or duplication
➔ Workload Isolation to separate analytical and transactional workloads
Freedom to run anywhere
➔ Run on Atlas - no infrastructure, installation or upgrades
➔ Or run on premises - access any data, control your environment

Example Scenarios
Make better decisions by analyzing transactional data
➔ Visualize data from operational systems
➔ Identify trends and signals from the noise
➔ Create dashboards monitoring KPIs and business metrics
Solve problems by visualizing log or telemetry data
➔ Make sense of large volumes of technical data through charts
➔ Identify performance problems or outliers
➔ Create system health dashboards
Tell stories with data in blog posts or articles
➔ Use charts to explain what happened or what you should do
➔ Embed charts in context: in documents, internal systems or public blog posts

Charts vs BI Connector vs Compass
When should I use...
Charts
➔ You want to create custom visualizations of MongoDB data
➔ Your team or project is using MongoDB as its main or only database
➔ You do not have existing data visualization tools, or you are unhappy with your current tool
BI Connector
➔ You want to create custom visualizations of MongoDB data
➔ Your team is using multiple different databases
➔ You have existing data visualization tools, and you would like to use them with data from MongoDB
Compass
➔ You want to explore schemas and documents in MongoDB collections
➔ You want to see simple prebuilt visualizations showing the range of values in a collection
➔ You want to author custom aggregation pipelines, for use in custom applications or to pre-process data for Charts

Which Charts is for you?
Charts on MongoDB Atlas
➔ You want to visualize data from MongoDB Atlas
➔ You want to spend your time visualizing data, not setting up and managing servers or software
➔ You want immediate access to the latest Charts features
Charts On-Premises
➔ You want to visualize data from MongoDB Enterprise Server or Atlas
➔ You want to keep all visualizations within your private network
➔ You want control over the infrastructure hosting Charts

Resources
Learn more about MongoDB Charts: https://mongodb.com/charts
MongoDB Connector for Spark: https://docs.mongodb.com/spark-connector/master/
Atlas Data Lake: https://www.mongodb.com/atlas/data-lake
Sign up or sign in to MongoDB Atlas and use Charts on Atlas: https://cloud.mongodb.com
MongoDB Stitch: https://www.mongodb.com/cloud/stitch

Why MongoDB for Analytics
✓ Flexible data model supports every stage of the process
✓ Validation gives control over data formats and structures
✓ Comprehensive queries
✓ Parallelization through aggregation queries
✓ Storage via the WiredTiger engine, with on-disk or in-memory options
✓ Connectors to Python, Scala, Spark and R
✓ Secondary indexes for performant deep learning, even with growing amounts of data
✓ Indexes for text search, graph queries and geospatial queries
✓ Continuous use in lab and production, no technology break
