MongoDB natively provides a rich analytics framework within the database. We will highlight the tools, features and capabilities that MongoDB provides to enable analytics scenarios ranging from AI and machine learning to applications. We will demonstrate a Machine Learning (ML) example using MongoDB and Spark.
5. Criteria for Tools to Use
Operations on Data: read/write, transform, aggregation, algorithm
Speed to Insight: both how up-to-date the data is and response times (SLA)
Effort: training, development, management
Processing Model for Analytics: distributed processing, iterative, streaming, etc.
Cost: data duplication, memory, servers, software
11. Put data where you need it: Workload Isolation
[Diagram: replica set with a PRIMARY and two Secondaries, plus members dedicated to analytics]
Analytics workloads: BI & Reporting, Predictive Analytics, Aggregations
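One way to route analytics reads away from the primary is through the connection string. The following is a minimal sketch, assuming a replica set whose analytics members carry the tag nodeType:ANALYTICS (the tag Atlas assigns to analytics nodes; a self-managed deployment would define its own tags). The host names, database name and the analytics_uri helper are hypothetical.

```python
from urllib.parse import urlencode

def analytics_uri(hosts, database, replica_set, tag="nodeType:ANALYTICS"):
    """Build a MongoDB URI whose reads target tagged secondary members.

    hosts, database and replica_set are placeholders for this sketch.
    """
    options = urlencode(
        {
            "replicaSet": replica_set,
            # Never read from the primary, so operational traffic is unaffected.
            "readPreference": "secondary",
            # Only members carrying this tag are eligible for reads.
            "readPreferenceTags": tag,
        },
        safe=":",  # keep the key:value tag readable instead of percent-encoding it
    )
    return f"mongodb://{','.join(hosts)}/{database}?{options}"

uri = analytics_uri(["db0.example.net:27017", "db1.example.net:27017"], "sales", "rs0")
print(uri)
```

The resulting string can be handed to a driver, for example pymongo.MongoClient(uri).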
12. Sharding for Highly Parallel Processing
[Diagram: Application, Mongos and shards; the aggregation pipeline runs in parallel on N partitions, one on each server, and data is returned in parallel]
Workload split between shards
➔ Client works through mongos as with any query
15. Aggregation With a Sharded Database
Workload split between shards
1. Client works through mongos as with any query
2. Shards execute pipeline up to a point
3. A single shard merges cursors and continues processing
4. $lookup & $out performed within Primary shard for the database
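The split between per-shard work and merge work can be illustrated without a database. The sketch below is a plain-Python simulation, not MongoDB's actual execution engine: the documents, the three "shards" and both function names are hypothetical, and the filter and sums merely stand in for leading $match and $group stages.

```python
from collections import Counter

# Hypothetical documents, already distributed across three shards.
shards = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "EU", "amount": 7}, {"region": "APAC", "amount": 3}],
    [{"region": "US", "amount": 8}, {"region": "US", "amount": -1}],
]

def shard_part(docs):
    """Steps each shard can run on its own: filter, then partial sums."""
    partial = Counter()
    for doc in docs:
        if doc["amount"] > 0:  # stands in for a leading $match
            partial[doc["region"]] += doc["amount"]  # partial $group
    return partial

def merge_part(partials):
    """Steps run once on the merging shard: combine partials, then sort."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)

result = merge_part(shard_part(docs) for docs in shards)
print(result)
```

Only the small per-region partial sums travel to the merging step, which is why pushing filters and grouping to the front of a pipeline tends to pay off on sharded data.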
17. Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive parallelism of Spark, its machine learning libraries, and streaming API
● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task
● Reduced operational complexity and faster time-to-analytics
● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws the data it requires
● Multiple language APIs
18. Business Intelligence, Analytics, Machine Learning
[Diagram: JSON documents]
Process data in MongoDB with the massive parallelism of Spark, its machine learning libraries, and streaming API
● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task
● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws the data it requires
● Reads from secondaries isolate the analytics workload from business-critical operations
● Shard aware for data locality
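Pre-filtering and secondary reads can both be expressed as read options on the Spark connector. The sketch below only builds those options; it assumes the option keys of the 2.x/3.x MongoDB Connector for Spark ("mongo" format, "pipeline", "readPreference.name"), and assumes the "pipeline" option accepts a JSON array of stages. Newer connector releases use different names, so check the version in use. The collection, field names and both helper functions are hypothetical.

```python
import json

def mongo_read_options(uri, database, collection, min_total):
    """Options for a Spark DataFrameReader using the MongoDB connector.

    Keys follow the 2.x/3.x connector's read configuration (an assumption;
    check the connector version in use). Field names are hypothetical.
    """
    pipeline = [
        # Pre-filter inside MongoDB so only matching documents reach Spark.
        {"$match": {"total": {"$gte": min_total}}},
        {"$project": {"_id": 0, "customer_id": 1, "total": 1}},
    ]
    return {
        "uri": uri,
        "database": database,
        "collection": collection,
        "pipeline": json.dumps(pipeline),
        # Prefer secondaries to keep analytics reads off the primary.
        "readPreference.name": "secondaryPreferred",
    }

def load_orders(spark, options):
    """Load the pre-filtered collection as a DataFrame (needs Spark and the connector)."""
    return spark.read.format("mongo").options(**options).load()

opts = mongo_read_options("mongodb://db0.example.net:27017", "sales", "orders", 100)
print(opts["pipeline"])
```

With a running SparkSession that has the connector on its classpath, load_orders(spark, opts) would return a DataFrame ready for Spark's machine learning libraries.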
19. Business Intelligence, Analytics, Machine Learning
[Diagram: WRITE to the Primary; READ from the two secondaries]
20. Partitionable Distributed Analytics
[Diagram: a Master coordinating Workers, each Worker paired with a Mongos; partitions lined up between workers & shards]
Benefits
• Very parallelizable to scale horizontally
• Intermediate results can be on disk, not necessarily memory
Common Frameworks
• Hadoop
• Spark
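The idea of lining partitions up with workers can be sketched in a few lines. This is an illustration under stated assumptions, not how Hadoop or Spark schedule work: the integer key range, the split_range and process_partition helpers, and the stand-in computation are all hypothetical, and threads take the place of cluster workers.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(lo, hi, n):
    """Split the half-open key range [lo, hi) into at most n contiguous partitions."""
    step = max(1, (hi - lo + n - 1) // n)  # ceiling division so no key is left out
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]

def process_partition(bounds):
    """Work done by one worker; a real worker would query its shard through mongos."""
    start, end = bounds
    return sum(key * key for key in range(start, end))  # stand-in computation

partitions = split_range(0, 1000, 4)
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(process_partition, partitions))

total = sum(partials)  # the master combines the per-partition results
print(partitions, total)
```

Because each partition is independent, adding workers and partitions scales the job horizontally, and each partial result could be spilled to disk before the final combine.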
22. Use Cases
Data Lake Analytics
➔ explore all of your rich data naturally
➔ get to data as it lands via streams or microservices
➔ democratize access across diverse user groups
Data Products and Services
➔ monetize data
➔ market research, data- and insight-as-a-service
➔ snapshots, time series analysis, predictive analytics to innovate faster
Active Archives
➔ historical analysis against data assets retained in long term cold storage
➔ cost-effective data strategy
24. What is MongoDB Charts?
The quickest and easiest way to build visualizations of data stored in MongoDB
The best way to work with data
➔ Create visualizations in seconds
➔ Built for the MongoDB Document Model: work with rich hierarchical data including arrays and subdocuments
Intelligent data distribution
➔ No data movement or duplication
➔ Workload Isolation to separate analytical and transactional workloads
Freedom to run anywhere
➔ Run on Atlas: no infrastructure, installation or upgrades
➔ Or run on premises: access any data, control your environment
25. Example Scenarios
Make better decisions by analyzing transactional data
➔ Visualize data from operational systems
➔ Identify trends and signals from the noise
➔ Create dashboards monitoring KPIs and business metrics
Solve problems by visualizing log or telemetry data
➔ Make sense of large volumes of technical data through charts
➔ Identify performance problems or outliers
➔ Create system health dashboards
Tell stories with data in blog posts or articles
➔ Use charts to explain what happened or what you should do
➔ Embed charts in context: in documents, internal systems or public blog posts
26. Charts vs BI Connector vs Compass
When should I use...
Charts
➔ You want to create custom visualizations of MongoDB data
➔ Your team or project is using MongoDB as its main or only database
➔ You do not have existing data visualization tools, or you are unhappy with your current tool
BI Connector
➔ You want to create custom visualizations of MongoDB data
➔ Your team is using multiple different databases
➔ You have existing data visualization tools, and you would like to use them with data from MongoDB
Compass
➔ You want to explore schemas and documents in MongoDB collections
➔ You want to see simple prebuilt visualizations showing the range of values in a collection
➔ You want to author custom aggregation pipelines, for use in custom applications or to pre-process data for Charts
27. Which Charts is for you?
Charts on MongoDB Atlas
➔ You want to visualize data from MongoDB Atlas
➔ You want to spend your time visualizing data, not setting up or managing servers or software
➔ You want immediate access to the latest Charts features
Charts On-Premises
➔ You want to visualize data from MongoDB Enterprise Server or Atlas
➔ You want to keep all visualizations within your private network
➔ You want control over the infrastructure hosting Charts
28. Resources
Learn more about MongoDB Charts: https://mongodb.com/charts
MongoDB Connector for Spark: https://docs.mongodb.com/spark-connector/master/
Atlas Data Lake: https://www.mongodb.com/atlas/data-lake
Sign up or sign in to MongoDB Atlas and use Charts on Atlas: https://cloud.mongodb.com
MongoDB Stitch: https://www.mongodb.com/cloud/stitch
31. Why MongoDB for Analytics
✓ Flexible data model supports every stage of the process
✓ Validation gives control over data formats and structures
✓ Comprehensive queries
✓ Parallelization through aggregation queries
✓ Storage by the WiredTiger engine, either on disk or in memory
✓ Connectors to Python, Scala, Spark and R
✓ Secondary indexes for performant deep learning, even with growing amounts of data
✓ Indexes for text search, graph queries and geospatial queries
✓ Continuous use in lab and production, no technology break
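As an illustration of the validation point, a collection validator is an ordinary document using the $jsonSchema operator. The sketch below is a configuration fragment only; the "readings" collection and its field names are hypothetical.

```python
# Hypothetical validator for a "readings" collection of sensor data.
reading_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        # Documents missing any of these fields are rejected on write.
        "required": ["sensor_id", "ts", "value"],
        "properties": {
            "sensor_id": {"bsonType": "string"},
            "ts": {"bsonType": "date"},
            "value": {"bsonType": "double"},
        },
    }
}

# With PyMongo, the validator would be supplied when creating the collection:
#   db.create_collection("readings", validator=reading_validator)
print(sorted(reading_validator["$jsonSchema"]["properties"]))
```

Fields not named in the schema remain free-form, so the flexible data model and the validation rules can coexist.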