Wide-Ranging Analytical Solutions on MongoDB
DAWOUD IBRAHIM
Sr. Solutions Architect
IoT Edge Device
Charts
Atlas Data Lake
Why Are Analytics Important?
So Many Options…
Criteria for Tools to Use
• Operations on Data: read/write, transform, aggregation, algorithm
• Speed to Insight: both how up-to-date data is and response times (SLA)
• Effort: training, development, management
• Processing Model for Analytics: distributed processing, iterative, streaming, etc.
• Cost: data duplication, memory, servers, software
Quick Demo
Charts
Atlas Data Lake
IoT Edge Device
MongoDB Capabilities for Analytics, ML and AI
MongoDB Highlights for Analytics
DISTRIBUTED PARALLEL PROCESSING: Sharding & Replication
AGGREGATION FRAMEWORK
Data Lake (beta)
CONNECTORS
➔ Spark
➔ Hadoop
➔ R
VISUALIZATION
➔ Charts
➔ BI Connector
WORKLOAD ISOLATION & DISTRIBUTED PROCESSING
Put data where you need it: Workload Isolation
[Diagram: a replica set with a PRIMARY and two Secondary members; a dedicated analytics member serves BI & Reporting, Predictive Analytics and Aggregations]
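As a minimal sketch of how an application might route analytics reads to a dedicated member, the Python function below builds a connection string with a secondary read preference plus a replica-set tag. It assumes an Atlas cluster with analytics nodes, which Atlas tags `nodeType:ANALYTICS`; on a self-managed replica set you would substitute whatever tag you assigned to the analytics member. The host and database names are hypothetical placeholders.

```python
def analytics_uri(host: str, database: str, tag: str = "nodeType:ANALYTICS") -> str:
    """Build a MongoDB connection string that sends reads to tagged secondaries.

    `host` and `database` are hypothetical placeholders; `tag` assumes the
    Atlas analytics-node tag and should match your own replica-set tags.
    """
    options = {
        # Never read from the primary, so operational traffic is not affected.
        "readPreference": "secondary",
        # Restrict eligible secondaries to members carrying this tag.
        "readPreferenceTags": tag,
    }
    query = "&".join(f"{key}={value}" for key, value in options.items())
    return f"mongodb+srv://{host}/{database}?{query}"


uri = analytics_uri("cluster0.example.mongodb.net", "sales")
print(uri)
# With PyMongo installed, the URI would be passed to MongoClient(uri).
```

Using `secondary` rather than `secondaryPreferred` is a deliberate choice in this sketch: if no tagged member is available, reads fail instead of silently falling back to the primary.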
Sharding for Highly Parallel Processing
[Diagram: the Application sends an aggregation pipeline through mongos; it runs in parallel on N partitions, one per server, and data is returned in parallel]
➔ Workload split between shards
➔ Client works through mongos as with any query
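The scatter-gather pattern in the diagram can be modeled in plain Python. This is a conceptual simulation, not how mongos is implemented: each hypothetical "shard" is just a list of documents, the same filter runs on every shard in parallel threads, and the router concatenates the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Doc = dict
# Hypothetical data: each inner list stands in for the documents held by one shard.
SHARDS: list[list[Doc]] = [
    [{"_id": 1, "region": "east", "amount": 40}, {"_id": 2, "region": "west", "amount": 15}],
    [{"_id": 3, "region": "east", "amount": 25}],
    [{"_id": 4, "region": "west", "amount": 60}, {"_id": 5, "region": "east", "amount": 10}],
]


def run_on_shard(docs: list[Doc], predicate: Callable[[Doc], bool]) -> list[Doc]:
    """Shard-local work: apply the filter to this shard's documents only."""
    return [doc for doc in docs if predicate(doc)]


def scatter_gather(shards: list[list[Doc]], predicate: Callable[[Doc], bool]) -> list[Doc]:
    """Router-side work: send the query to every shard at once, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda docs: run_on_shard(docs, predicate), shards)
        # pool.map yields results in shard order, so the merge is deterministic.
        return [doc for partial in partials for doc in partial]


east = scatter_gather(SHARDS, lambda doc: doc["region"] == "east")
print([doc["_id"] for doc in east])  # [1, 3, 5]
```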
AGGREGATION FRAMEWORK
Aggregation Pipelines
• Date Manipulation
• String Manipulation
• Type Conversions
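A short sketch of a pipeline that touches all three categories, written as the list of stage documents that PyMongo's `aggregate()` accepts. The `orders` collection and its fields are hypothetical, and `$toDouble` assumes MongoDB 4.0 or later.

```python
import json

# Hypothetical orders collection: orderDate is a BSON date, customer a string,
# and total a numeric value that a legacy importer stored as a string.
pipeline = [
    {"$project": {
        "_id": 0,
        # Date manipulation: render the order date as a calendar day.
        "day": {"$dateToString": {"format": "%Y-%m-%d", "date": "$orderDate"}},
        # String manipulation: normalize the customer name to upper case.
        "customer": {"$toUpper": "$customer"},
        # Type conversion: turn the string total into a double so it can be summed.
        "total": {"$toDouble": "$total"},
    }},
    # Sum the converted totals per day.
    {"$group": {"_id": "$day", "revenue": {"$sum": "$total"}}},
    {"$sort": {"_id": 1}},
]

print(json.dumps(pipeline, indent=2))
# With PyMongo (not executed here): db.orders.aggregate(pipeline)
```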
Aggregation With a Sharded Database
Workload split between shards
1. Client works through mongos as with any query
2. Shards execute the pipeline up to a point
3. A single shard merges the cursors and continues processing
4. $lookup and $out are performed on the database's primary shard
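To make steps 2 and 3 concrete, here is a deliberately simplified model of how a pipeline might be divided: stages before the first stage that needs a single merged stream run on every shard, and the rest runs on the merging node. The set of merge-point stages is an assumption for illustration. The real planner is more sophisticated (for example, it can push a partial `$group` down to the shards), and `explain()` on a sharded aggregation reports the actual `shardsPart`/`mergerPart` split.

```python
# Stages assumed, for this sketch only, to require a single merged stream.
MERGE_POINT_STAGES = {"$group", "$sort", "$limit", "$skip", "$lookup", "$out"}


def split_pipeline(pipeline: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split a pipeline into (shards_part, merger_part) at the first merge point."""
    for index, stage in enumerate(pipeline):
        stage_name = next(iter(stage))  # each stage document has exactly one key
        if stage_name in MERGE_POINT_STAGES:
            return pipeline[:index], pipeline[index:]
    # No merge point: every shard runs the whole pipeline; the merger only concatenates.
    return pipeline, []


pipeline = [
    {"$match": {"status": "shipped"}},
    {"$project": {"region": 1, "amount": 1}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]
shards_part, merger_part = split_pipeline(pipeline)
print([next(iter(s)) for s in shards_part])  # ['$match', '$project']
print([next(iter(s)) for s in merger_part])  # ['$group', '$sort']
```

Putting selective stages such as `$match` first is what lets the shards discard most documents before anything is sent to the merging node.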
MONGODB SPARK CONNECTOR
Business Intelligence, Analytics, Machine Learning
Process data in MongoDB with the massive parallelism of Spark, its machine learning libraries, and its streaming API
● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task
● Reduced Operational Complexity and Faster Time-to-Analytics
● Aggregation pre-filtering, combined with secondary indexes, means an analytics query draws only the data it requires
● Multiple Language APIs
● Reads from secondaries isolate the analytics workload from business-critical operations
● Shard-aware for data locality
[Diagram: WRITE operations go to the Primary; READ operations for analytics are served by the two secondaries]
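The pre-filtering and secondary-read points above can be sketched as a set of read options for PySpark. This assumes the option names of MongoDB Spark Connector 10.x (`connection.uri`, `database`, `collection`, `aggregation.pipeline`) and its `mongodb` data source name; earlier 2.x/3.x releases used different names, so check the connector documentation for your version. The host, database, collection and field names are hypothetical.

```python
import json


def mongo_read_options(uri: str, database: str, collection: str,
                       pipeline: list[dict]) -> dict[str, str]:
    """Build DataFrameReader options (Spark Connector 10.x names assumed)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
        # The pipeline runs inside MongoDB, so an indexed $match limits what Spark reads.
        "aggregation.pipeline": json.dumps(pipeline),
    }


options = mongo_read_options(
    # readPreference=secondaryPreferred keeps the analytics scan off the primary.
    "mongodb://analytics.example.net:27017/?readPreference=secondaryPreferred",
    "iot",
    "readings",
    [{"$match": {"sensorType": "temperature"}}],
)
print(options["aggregation.pipeline"])

# In PySpark (not executed here):
# df = spark.read.format("mongodb").options(**options).load()
```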
Partitionable Distributed Analytics
[Diagram: a Master coordinates several Workers, each reading through its own mongos; partitions are lined up between workers and shards]
Benefits
• Very parallelizable, so it scales horizontally
• Intermediate results can be kept on disk, not necessarily in memory
Common Frameworks
• Hadoop
• Spark
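Lining partitions up with shards can be sketched as turning shard-key range boundaries into one range filter per partition, so each worker reads one contiguous slice of the key space. The `deviceId` shard key and its split points are hypothetical; a real deployment would take the boundaries from the cluster's chunk metadata, which is roughly what the Spark connector's sharded partitioner does, but this function is an illustration rather than its implementation.

```python
from typing import Any


def range_partitions(shard_key: str, split_points: list[Any]) -> list[dict]:
    """Turn sorted split points into one $match-style filter per contiguous key range.

    None marks an open end: the first range has no lower bound and the last has
    no upper bound, so together the filters cover the whole collection.
    """
    bounds: list[Any] = [None, *split_points, None]
    partitions = []
    for lower, upper in zip(bounds, bounds[1:]):
        condition = {}
        if lower is not None:
            condition["$gte"] = lower  # inclusive lower bound
        if upper is not None:
            condition["$lt"] = upper  # exclusive upper bound
        # With no split points, the single partition is the empty filter {}.
        partitions.append({shard_key: condition} if condition else {})
    return partitions


# Hypothetical split points on a numeric deviceId shard key.
for worker, query in enumerate(range_partitions("deviceId", [100, 200])):
    # Each worker would run {"$match": query} against its own slice.
    print(f"worker {worker}: {query}")
```

Half-open ranges (inclusive lower, exclusive upper) mirror how chunk ranges are defined, so no document is read twice or skipped at a boundary.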
MONGODB Atlas Data Lake
Use Cases
Data Lake Analytics
➔ explore all of your rich data naturally
➔ get to data as it lands via streams or microservices
➔ democratize access across diverse user groups
Data Products and Services
➔ monetize data
➔ market research, data- and insight-as-a-service
➔ snapshots, time series analysis, predictive analytics to innovate faster
Active Archives
➔ historical analysis against data assets retained in long-term cold storage
➔ cost-effective data strategy
MONGODB Charts
What is MongoDB Charts?
The best way to work with data · Intelligent data distribution · Freedom to run anywhere
The quickest and easiest way to build visualizations of data stored in MongoDB
➔ Create visualizations in seconds
➔ Built for the MongoDB document model: work with rich hierarchical data, including arrays and subdocuments
➔ No data movement or duplication
➔ Workload isolation to separate analytical and transactional workloads
➔ Run on Atlas (no infrastructure, installation or upgrades) or run on premises (access any data, control your environment)
Example Scenarios
Make better decisions by analyzing transactional data
➔ Visualize data from operational systems
➔ Identify trends and signals from the noise
➔ Create dashboards monitoring KPIs and business metrics
Solve problems by visualizing log or telemetry data
➔ Make sense of large volumes of technical data through charts
➔ Identify performance problems or outliers
➔ Create system health dashboards
Tell stories with data in blog posts or articles
➔ Use charts to explain what happened or what you should do
➔ Embed charts in context: in documents, internal systems or public blog posts
Charts vs BI Connector vs Compass
When should I use...
Charts
➔ You want to create custom visualizations of MongoDB data
➔ Your team or project is using MongoDB as its main or only database
➔ You do not have existing data visualization tools, or you are unhappy with your current tool
BI Connector
➔ You want to create custom visualizations of MongoDB data
➔ Your team is using multiple different databases
➔ You have existing data visualization tools, and you would like to use them with data from MongoDB
Compass
➔ You want to explore schemas and documents in MongoDB collections
➔ You want to see simple prebuilt visualizations showing the range of values in a collection
➔ You want to author custom aggregation pipelines, for use in custom applications or to pre-process data for Charts
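As an illustration of the last Compass scenario, here is the kind of pipeline one might prototype in Compass and then reuse to pre-process data for Charts: it unwinds an array field so each element becomes its own document, then counts documents per element value. The `tags` field and sample documents are hypothetical, and the pure-Python reference only shows what the pipeline is expected to compute.

```python
from collections import Counter

# Pipeline that could be prototyped in Compass and reused as a Charts data source.
tag_counts_pipeline = [
    {"$unwind": "$tags"},                                # one document per array element
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},  # count documents per tag
    {"$sort": {"count": -1}},                            # largest bars first
]

# Pure-Python reference for the same computation on a few sample documents.
sample = [
    {"_id": 1, "tags": ["network", "latency"]},
    {"_id": 2, "tags": ["latency"]},
    {"_id": 3, "tags": ["latency", "disk", "network"]},
]
expected = Counter(tag for doc in sample for tag in doc["tags"]).most_common()
print(expected)  # [('latency', 3), ('network', 2), ('disk', 1)]
```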
Which Charts is for you?
Charts on MongoDB Atlas
➔ You want to visualize data from MongoDB Atlas
➔ You want to spend your time visualizing data, not setting up and managing servers or software
➔ You want immediate access to the latest Charts features
Charts On-Premises
➔ You want to visualize data from MongoDB Enterprise Server or Atlas
➔ You want to keep all visualizations within your private network
➔ You want control over the infrastructure hosting Charts
Resources
Learn more about MongoDB Charts: https://mongodb.com/charts
MongoDB Connector for Spark: https://docs.mongodb.com/spark-connector/master/
Atlas Data Lake: https://www.mongodb.com/atlas/data-lake
Sign up or sign in to MongoDB Atlas and use Charts on Atlas: https://cloud.mongodb.com
MongoDB Stitch: https://www.mongodb.com/cloud/stitch
Summary
Why MongoDB for Analytics
✓ Flexible data model supports the entire process in all stages
✓ Validation gives control over data formats and structures
✓ Comprehensive queries
✓ Parallelization through aggregation queries
✓ Storage via the WiredTiger engine, either on disk or in memory
✓ Connectors for Python, Scala, Spark and R
✓ Secondary indexes keep data access fast for deep learning, even as data volumes grow
✓ Indexes for text search, graph queries and geospatial queries
✓ Continuous use in lab and production, with no technology break
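Several summary points (validation, secondary indexes, text and geospatial indexes, graph queries) map onto small pieces of configuration. The sketch below expresses them as plain Python documents for a hypothetical `readings` collection with hypothetical field names; the commented PyMongo calls show where each piece would be used, and graph traversal is shown with the `$graphLookup` stage, which can use an index on its `connectToField`.

```python
# Validation: a $jsonSchema validator constraining the shape of each reading.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["deviceId", "ts", "value"],
        "properties": {
            "deviceId": {"bsonType": "string"},
            "ts": {"bsonType": "date"},
            "value": {"bsonType": "double"},
        },
    }
}

# Index key specifications as (field, type) pairs, the form create_index accepts.
secondary_index = [("deviceId", 1), ("ts", -1)]  # compound secondary index
text_index = [("notes", "text")]                 # full-text search
geo_index = [("location", "2dsphere")]           # geospatial queries

# Graph query: walk from a device up through its chain of gateways.
graph_stage = {
    "$graphLookup": {
        "from": "devices",
        "startWith": "$gatewayId",
        "connectFromField": "gatewayId",
        "connectToField": "deviceId",
        "as": "upstream",
    }
}

# With PyMongo (not executed here):
# db.create_collection("readings", validator=validator)
# for keys in (secondary_index, text_index, geo_index):
#     db.readings.create_index(keys)
```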
DEMOS
Q&A

MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB

  • 1. Wide-Ranging Analytical Solutions on MongoDB DAWOUD IBRAHIM Sr. Solutions Architect
  • 3. Why Are Analytics Important?
  • 5. Operations on Data read/write, transform, aggregation, algorithm Speed to Insight both how up-to-date data is and response times (SLA) Effort training, development, management Processing Model for Analytics distributed processing, iterative, streaming, etc. Cost data duplication, memory, servers, software Criteria for Tools to Use
  • 9. MongoDB Highlights for Analytics DISTRIBUTED PARALLEL PROCESSING: Sharding & Replication AGGREGATION FRAMEWORK Data Lake (beta) CONNECTORS Ø Spark Ø Hadoop Ø R VISUALIZATION Ø Charts Ø BI Connector
  • 11. Put data where you need it: Workload Isolation Analytics PRIMARY Secondary Secondary Dedicated Analytics BI & Reporting Predictive Analytics Aggregations
  • 12. Agg pipeline … Mongos Run in parallel on N partitions Data returned In parallel Application Each server Workload split between shards Ø Client works through mongos as with any query Sharding for Highly Parallel Processing
  • 14. Date Manipulation String Manipulation Type Conversions Aggregation Pipelines
  • 15. Aggregation With a Sharded Database Workload split between shards 1. Client works through mongos as with any query 2. Shards execute pipeline up to a point 3. A single shard merges cursors and continues processing 4. $lookup & $out performed within Primary shard for the database
  • 17. Business Intelligence, Analytics, Machine Learning Process data in MongoDB with the massive parallelism of Spark, it's machine learning libraries, and streaming API ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task. ● Reduced Operational Complexity and Faster Time- To-Analytics ● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws that data required ● Multiple Language APIs
  • 18. JSON JSON JSON JSON JSON JSON JSON JSON JSON JSON JSON Business Intelligence, Analytics, Machine Learning Process data in MongoDB with the massive parallelism of Spark, it's machine learning libraries, and streaming API ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task ● Aggregation pre-filtering in combination with secondary indexing means that an analytics query only draws that data required ● Reads from secondaries isolate analytics workload from business critical operations ● Shard aware for data locality
  • 19. [Diagram: writes go to the Primary; reads are served from the two secondaries.] Business Intelligence, Analytics, Machine Learning. Process data in MongoDB with the massive parallelism of Spark, its machine learning libraries, and its streaming API. ● Process data “in place”, avoiding the latency otherwise required by an incremental ETL task. ● Aggregation pre-filtering combined with secondary indexing means that an analytics query draws only the data it requires. ● Reads from secondaries isolate the analytics workload from business-critical operations. ● Shard-aware for data locality.
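To illustrate pre-filtering, the connector can be handed an aggregation pipeline that MongoDB runs before any documents reach Spark, so an indexed `$match` limits what is transferred. The Python sketch below only serializes such a pipeline to a JSON string; the field names are hypothetical, and the read option that carries it differs by connector version (for example `pipeline` in older releases and `aggregation.pipeline` in 10.x), so check the connector documentation for the version you use.

```python
import json


def prefilter_pipeline(site: str, min_reading: float) -> str:
    """Serialize a pipeline that filters and trims documents inside MongoDB."""
    stages = [
        # $match first, so MongoDB can use a secondary index on these (hypothetical) fields
        {"$match": {"site": site, "reading": {"$gte": min_reading}}},
        # Ship only the fields the Spark job actually needs
        {"$project": {"_id": 0, "site": 1, "ts": 1, "reading": 1}},
    ]
    return json.dumps(stages)


pipeline_json = prefilter_pipeline("plant-1", 50.0)
print(pipeline_json)
```

In PySpark, this string would be supplied as the value of the connector's pipeline read option on `spark.read`, leaving Spark to process only the filtered, projected documents.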
  • 20. Partitionable Distributed Analytics. [Diagram: a master dispatches work to several workers, each reading through its own mongos, with partitions lined up between workers and shards.] Benefits: • Very parallelizable, so it scales horizontally • Intermediate results can be kept on disk, not necessarily in memory. Common frameworks: • Hadoop • Spark
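A toy version of this master/worker pattern, including the point about intermediate results living on disk, can be written with the Python standard library. This is an illustration under stated assumptions, not Hadoop or Spark code: in-memory lists stand in for shard-aligned partitions, threads stand in for workers, and each worker writes its partial counts to a temporary JSON file that the master then reduces. All names are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions, each lined up with one shard's data.
PARTITIONS = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]


def worker(idx, records, outdir):
    """Map step: count one partition and spill the partial result to disk."""
    path = os.path.join(outdir, f"part-{idx}.json")
    with open(path, "w") as f:
        json.dump(Counter(records), f)  # Counter is a dict subclass, so it serializes as JSON
    return path


def master(partitions):
    """Dispatch one worker per partition, then reduce the on-disk partial results."""
    with tempfile.TemporaryDirectory() as outdir:
        with ThreadPoolExecutor() as pool:
            paths = list(pool.map(worker, range(len(partitions)), partitions,
                                  [outdir] * len(partitions)))
        total = Counter()
        for path in paths:
            with open(path) as f:
                total.update(json.load(f))  # Counter.update adds counts rather than replacing them
    return dict(total)


print(master(PARTITIONS))
```

Because each partial result is written to a file, the reduce step never needs all intermediate data in memory at once; only the running totals are held.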
  • 22. Use Cases: Data Lake Analytics; Data Products and Services; Active Archives. ➔ Explore all of your rich data naturally ➔ Get to data as it lands via streams or microservices ➔ Democratize access across diverse user groups ➔ Monetize data ➔ Market research, data-as-a-service and insight-as-a-service ➔ Snapshots, time series analysis, and predictive analytics to innovate faster ➔ Historical analysis against data assets retained in long-term cold storage ➔ Cost-effective data strategy
  • 24. What is MongoDB Charts? Create visualizations in seconds: the quickest and easiest way to build visualizations of data stored in MongoDB. The best way to work with data: built for the MongoDB document model, so you can work with rich hierarchical data, including arrays and subdocuments. Intelligent data distribution: no data movement or duplication, and workload isolation to separate analytical and transactional workloads. Freedom to run anywhere: run on Atlas (no infrastructure, installation, or upgrades) or run on premises (access any data, control your environment).
  • 25. Example Scenarios. Make better decisions by analyzing transactional data: ➔ Visualize data from operational systems ➔ Identify trends and signals amid the noise ➔ Create dashboards monitoring KPIs and business metrics. Solve problems by visualizing log or telemetry data: ➔ Make sense of large volumes of technical data through charts ➔ Identify performance problems or outliers ➔ Create system health dashboards. Tell stories with data in blog posts or articles: ➔ Use charts to explain what happened or what you should do ➔ Embed charts in context: in documents, internal systems, or public blog posts.
  • 26. When should I use... Charts vs BI Connector vs Compass? Charts: ➔ You want to create custom visualizations of MongoDB data ➔ Your team or project is using MongoDB as its main or only database ➔ You do not have existing data visualization tools, or you are unhappy with your current tool. BI Connector: ➔ You want to create custom visualizations of MongoDB data ➔ Your team is using multiple different databases ➔ You have existing data visualization tools, and you would like to use them with data from MongoDB. Compass: ➔ You want to explore schemas and documents in MongoDB collections ➔ You want to see simple prebuilt visualizations showing the range of values in a collection ➔ You want to author custom aggregation pipelines, for use in custom applications or to preprocess data for Charts.
  • 27. Which Charts is right for you? Charts on MongoDB Atlas: ➔ You want to visualize data from MongoDB Atlas ➔ You want to spend your time visualizing data, not setting up and managing servers or software ➔ You want immediate access to the latest Charts features. Charts On-Premises: ➔ You want to visualize data from MongoDB Enterprise Server or Atlas ➔ You want to keep all visualizations within your private network ➔ You want control over the infrastructure hosting Charts.
  • 28. Resources. Learn more about MongoDB Charts: https://mongodb.com/charts · MongoDB Connector for Spark: https://docs.mongodb.com/spark-connector/master/ · Atlas Data Lake: https://www.mongodb.com/atlas/data-lake · Sign up or sign in to MongoDB Atlas and use Charts on Atlas: https://cloud.mongodb.com · MongoDB Stitch: https://www.mongodb.com/cloud/stitch
  • 31. Why MongoDB for Analytics. ● A flexible data model supports the entire process at every stage ● Validation gives control over data formats and structures ● Comprehensive queries ● Parallelization through aggregation queries ● Storage by the WiredTiger engine, either on disk or in memory ● Connectors to Python, Scala, Spark, and R ● Secondary indexes keep data access performant for deep learning, even with growing amounts of data ● Indexes for text search, graph queries, and geospatial queries ● Continuous use in the lab and in production, with no technology break
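The validation point can be made concrete with a `$jsonSchema` validator, MongoDB's standard mechanism for enforcing document structure. The sketch builds the validator as plain Python data; the collection and field names are hypothetical.

```python
# Hypothetical schema for sensor readings collected for analytics.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["device_id", "ts", "reading"],
        "properties": {
            "device_id": {"bsonType": "string", "description": "must be a string"},
            "ts": {"bsonType": "date", "description": "must be a BSON date"},
            # Accept either numeric type, but reject negative readings
            "reading": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}
```

With PyMongo, `db.create_collection("readings", validator=validator)` would apply it when the collection is created; with the default validation settings, MongoDB then rejects inserts and updates that do not match the schema.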
  • 32. DEMOS
  • 33. Q&A