BUILDING A MISSION CRITICAL EVENT SYSTEM ON TOP OF MONGODB
by @shahar_kedar
BIGPANDA
SaaS platform that lets companies aggregate alerts
from all their monitoring systems into one place for
faster incident discovery and response.
HOW IT WORKS
Events
• High CPU on prod-srv-1, 18/06/14 16:05, CRITICAL
• High CPU on prod-srv-1, 18/06/14 16:07, WARNING
• Memory usage on prod-srv-1, 18/06/14 16:08, CRITICAL
Entities
• High CPU on prod-srv-1: WARNING
• Memory usage on prod-srv-1: CRITICAL
Incidents
• 2 Alerts on prod-srv-1
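The flow from events to entities to incidents can be sketched in plain JavaScript. This is only an illustration, not BigPanda's implementation: the field names (`check`, `host`), the entity key format, and the rule that one incident groups every entity on the same host are assumptions made for the example.

```javascript
// Hypothetical sketch: fold raw events into entities (latest state per
// check + host), then group entities into incidents per host.
// Field names and the grouping rule are assumptions for illustration.
const events = [
  { check: 'High CPU', host: 'prod-srv-1', timestamp: '2014-06-18T16:05:00Z', status: 'CRITICAL' },
  { check: 'High CPU', host: 'prod-srv-1', timestamp: '2014-06-18T16:07:00Z', status: 'WARNING' },
  { check: 'Memory usage', host: 'prod-srv-1', timestamp: '2014-06-18T16:08:00Z', status: 'CRITICAL' },
];

function toEntities(events) {
  const byKey = new Map();
  // ISO timestamps in the same format sort correctly as strings.
  const ordered = [...events].sort((a, b) => a.timestamp.localeCompare(b.timestamp));
  for (const ev of ordered) {
    const key = `${ev.check}@${ev.host}`;
    const entity = byKey.get(key) || { check: ev.check, host: ev.host, events: [] };
    entity.status = ev.status; // the most recent event wins
    entity.events.push(ev);    // events are kept embedded in the entity
    byKey.set(key, entity);
  }
  return [...byKey.values()];
}

function toIncidents(entities) {
  const byHost = new Map();
  for (const e of entities) {
    const incident = byHost.get(e.host) || { host: e.host, entities: [] };
    incident.entities.push({ check: e.check, status: e.status });
    byHost.set(e.host, incident);
  }
  return [...byHost.values()].map((i) => ({
    ...i,
    description: `${i.entities.length} Alerts on ${i.host}`,
  }));
}

const entities = toEntities(events);
const incidents = toIncidents(entities);
console.log(incidents[0].description);
```

With the three sample events this yields two entities and a single incident for prod-srv-1.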
PRODUCT REQUIREMENTS
• Events need to be processed into incidents and streamed to the user’s browser as fast as possible
• Incidents need to reliably reflect the state as it is in the monitoring system
• The service has to be up and running 24x7
MISSION CRITICAL
• It’s not rocket science, it’s not Google, but:
  • It has to be super fast
  • It has to be extremely reliable
  • It has to always be available
OUR #1 COMPETITOR
WHY MONGO?
BECAUSE IT’S WEB SCALE!
WHY MONGO?
At first:
• NodeJS shop
• Schemaless
• Easy to master
Later on:
• Reliable
• Easy to evolve
• Partial and atomic updates
• Powerful query language
BECAUSE IT’S WEB SCALE!
SUPER FAST
Hardware
Schema Design
Lean & Stream
HARDWARE
• 03/13: 3 x m1.medium
• 02/14: 1 x i2.xlarge + 2 x m1.medium
• 06/14: 2 x i2.xlarge + 1 x m3.xlarge
Instance types:
• m1.medium: 1 vCPU, 3.75GB RAM, EBS drive
• m3.xlarge: 4 vCPUs, 15GB RAM, EBS drive
• i2.xlarge: 4 vCPUs, 30.5GB RAM, 800GB SSD
x3 reads, x4 writes
“Schema design is … the largest factor when it comes to performance and scalability … more important than hardware, how you shard, or anything else, schema is by far the most important thing.”
–Eliot Horowitz
SCHEMA DESIGN
Event
{
  timestamp: Date,
  status: String,
  description: String
}

Entity
{
  start: Date,
  end: Date,
  status: String,
  description: String,
  events: [ <embedded> ],
  source_system: String
}

Incident
{
  start: Date,
  end: Date,
  is_active: Boolean,
  description: String,
  entities: [
    { entityId: ObjectId, status: String }
  ]
}
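One way to read this schema: each incoming event is appended to its entity's embedded `events` array while the entity's own state is refreshed, and the incident keeps only a reference to the entity plus a copy of its status. The sketch below builds the update documents this might produce, using MongoDB's `$push` and `$set` operators and the positional `$` operator. The helper names, and the choice to refresh `description` along with `status`, are assumptions for illustration and are not taken from the talk.

```javascript
// Hypothetical helpers that build MongoDB update documents for the schema
// above. They only construct plain objects; passing them to a driver call
// such as collection.updateOne(filter, update) is left to the caller.

// Append an immutable event to the entity and refresh the entity's state.
function entityUpdateFor(event) {
  return {
    $push: { events: event },
    $set: { status: event.status, description: event.description },
  };
}

// Mirror the entity's new status into the incident's partially embedded copy.
// 'entities.$.status' uses the positional operator, which targets the array
// element matched by the filter.
function incidentUpdateFor(entityId, status) {
  return {
    filter: { 'entities.entityId': entityId },
    update: { $set: { 'entities.$.status': status } },
  };
}

const sampleEvent = {
  timestamp: new Date('2014-06-18T16:07:00Z'),
  status: 'WARNING',
  description: 'High CPU on prod-srv-1',
};
console.log(JSON.stringify(entityUpdateFor(sampleEvent)));
console.log(JSON.stringify(incidentUpdateFor('some-entity-id', 'WARNING')));
```

Both updates touch only the fields they name, so a service that changes other parts of the same document is not overwritten.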
DENORMALIZATION
• Go over the checklist (http://bit.ly/1vUdz2T)
• Incidents => Entities: partially embedded + ref
  • Cardinality: one-to-few
  • Direct access to Entities
  • Entities are frequently updated
• Entities => Events: embedded
  • Events are not directly accessed
  • Events are immutable
  • Cardinality: one-to-many ~ one-to-gazillion
INDEXES
• Optimized indexes: db.collection.find({..}).explain()
• Removed redundant indexes
• Truncated events collections (TTL index)
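A TTL index is an ordinary single-field index on a date field with the `expireAfterSeconds` option; MongoDB then deletes documents whose date is older than that limit. The helper below is a minimal sketch of how the index arguments might be built. The field name `timestamp` and the 30-day retention are assumptions; the talk does not give the actual values.

```javascript
// Hypothetical helper: build the arguments for a TTL index that lets MongoDB
// delete event documents automatically once they are older than `days`.
// The field name and retention period are assumptions for illustration.
function ttlIndexSpec(dateField, days) {
  return {
    keys: { [dateField]: 1 },
    options: { expireAfterSeconds: days * 24 * 60 * 60 },
  };
}

const spec = ttlIndexSpec('timestamp', 30);
console.log(JSON.stringify(spec));
// With the Node.js driver this would be applied roughly as:
//   await db.collection('events').createIndex(spec.keys, spec.options);
```

The indexed field has to hold a BSON date for expiry to take effect.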
LEAN QUERIES
• Use projections to limit fields returned by a query:
  Model.find().select('-events')
• Mongoose users: use .lean() when possible to gain a performance boost of more than 50%:
  Model.find().lean()
• Stream results (later Mongoose versions replace .stream() with .cursor()):
  Model.find().stream().on('data', function(doc){})

RESULTS
• Average latency of all API calls went from 500ms to under 20ms
• Average latency of full pipeline went from 2s to under 500ms
• Peak time latency of full pipeline went down from 5m(!!) to less than 30s
EXTREMELY RELIABLE
Atomic & Partial Updates
ATOMIC & PARTIAL UPDATES
• Several services might try to update the same document at the same time, but:
  • Different systems update different parts of the document
  • Updates to the same document are sharded and ordered at the application level (read our awesome blog post: http://bit.ly/1nQVcbS)
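The idea of sharding and ordering updates at the application level can be illustrated with a small in-process sketch: every update is routed to a queue chosen by hashing the document id, so two updates to the same document never run concurrently and keep their arrival order. The hash function, the number of shards, and the use of promise chains in place of real worker processes or message queues are all assumptions; the linked blog post describes the actual design.

```javascript
// Hypothetical sketch of application-level ordering: updates are routed to a
// "shard" (here, an in-process promise chain) chosen by document id, so two
// updates to the same document never run concurrently and keep arrival order.
const SHARDS = 4; // assumed number of workers/queues

// Simple deterministic string hash (not cryptographic).
function shardFor(docId) {
  let h = 0;
  for (const ch of String(docId)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h % SHARDS;
}

// One promise chain per shard; each new task is appended to its tail.
const tails = Array.from({ length: SHARDS }, () => Promise.resolve());

// Enqueue `task` behind every earlier task on the same shard.
function enqueue(docId, task) {
  const shard = shardFor(docId);
  const result = tails[shard].then(() => task());
  // Keep the chain usable even if a task rejects.
  tails[shard] = result.catch(() => {});
  return result;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Two updates to the same document: the slow one was enqueued first,
// so it still completes first.
async function demo() {
  const log = [];
  await Promise.all([
    enqueue('entity-1', async () => { await sleep(20); log.push('first'); }),
    enqueue('entity-1', async () => { log.push('second'); }),
  ]);
  return log;
}

demo().then((log) => console.log(log));
```

Updates to different documents may land on different shards and run in parallel, which is what keeps this approach from becoming a global lock.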
IMPOSSIBLE TO KILL
Replica Set
Disaster Recovery
REPLICA SET
• 3-node replica set
• Using priorities to enforce master election of stronger nodes
• Deployed across different availability zones
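Election priorities are set per member in the replica set configuration document: a member with a higher `priority` is preferred as primary. The sketch below builds such a document. The set name, host names, and priority values are hypothetical; the talk only states that stronger nodes are given preference.

```javascript
// Hypothetical replica set configuration document. Host names, the set name
// and the priority values are assumptions; a higher priority makes a member
// more likely to be elected primary.
function replicaSetConfig(name, members) {
  return {
    _id: name,
    members: members.map((m, i) => ({ _id: i, host: m.host, priority: m.priority })),
  };
}

const config = replicaSetConfig('rs0', [
  { host: 'mongo-a.example.com:27017', priority: 2 }, // stronger node, zone A
  { host: 'mongo-b.example.com:27017', priority: 2 }, // stronger node, zone B
  { host: 'mongo-c.example.com:27017', priority: 1 }, // weaker node, zone C
]);
// In the mongo shell, a document like this is what rs.initiate(config) takes.
console.log(config.members.map((m) => m.priority));
```

Placing each member in a different availability zone means the loss of one zone still leaves a majority of two members able to elect a primary.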
DISASTER RECOVERY
• Cold backup using MMS Backup
• Full production replication on another EC2 region: using Mongo’s replication mechanism to continuously sync data to the backup region
THANK YOU!

Editor's Notes

  1. For each customer:
     • Aggregate alert notifications from multiple monitoring systems
     • Group together alerts that belong to the same monitored appliance
     • Group together, into “incidents”, alerts that are (topo-)logically related