SlideShare a Scribd company logo
1 of 29
Scaling Rails @ Yottaa Jared Rosoff @forjared jrosoff@yottaa.com September 20th 2010
From zero to humongous 2 About our application  How we chose MongoDB How we use MongoDB
About our application 3 We collect lots of data 6000+ URLs 300 samples per URL per day Some samples are >1MB (firebug)  Missing a sample isn’t a bit deal We visualize data in real-time No delay when showing data “On-Demand” samples  The “check now” button
The Yottaa Network 4
How we chose mongo 5
Requirements Our data set is going to grow very quickly  Scalable by default We have a very small team Focus on application, not infrastructure We are a startup  Requirements change hourly Operations We’re 100% in the cloud 6
Rails default architecture Performance Bottleneck: Too much load Collection Server Data Source MySQL User Reporting Server “Just” a Rails App
Let’s add replication! Performance Bottleneck: Still can’t scale writes MySQL Master Collection Server Data Source Replication MySQL Master User Reporting Server MySQL Master MySQL Master Off the shelf! Scalable Reads!
What about sharding? Development Bottleneck: Need to write custom code Collection Server Data Source Sharding MySQL Master MySQL Master MySQL Master User Reporting Server Sharding Scalable Writes!
Key Value stores to the rescue? Development Bottleneck: Reporting is limited / hard Collection Server Data Source MySQL Master MySQL Master Cassandra or Voldemort User Reporting Server Scalable Writes!
Can I Hadoop my way out of this? 	Development Bottleneck: Too many systems! MySQL Master MySQL Master Cassandra or Voldemort Collection Server Data Source Hadoop MySQL Master Scalable Writes! Flexible Reports! “Just” a Rails App MySQL Master User Reporting Server MySQL Master MySQL Slave
MongoDB!  Collection Server Data Source MySQL Master MySQL Master MongoDB User Reporting Server Scalable Writes! “Just” a rails app Flexible Reporting!
MongoD App Server Data Source Collection MongoD Load Balancer Passenger Nginx Mongos Reporting User MongoD Sharding! High Concurrency Scale-Out
Sharding is critical 14 Distribute write load across servers Decentralize data storage Scale out!
Before Sharding 15 App Server App Server App Server Need higher write volume Buy a bigger database Need more storage volume Buy a bigger database
After Sharding 16 App Server App Server App Server Need higher write volume Add more servers Need more storage volume Add more servers
Scale out is the new scale up 17 App Server App Server App Server
How we’re using MongoDB 18
Our Data Model 19 Document per URL we track  Meta-data Summary Data Most recent measurements Document per URL per Day Detailed metrics Pre-aggregated data
Thinking in rows 20 { url: ‘www.google.com’,   location: “SFO”    connect: 23,  first_byte: 123,  last_byte: 245,    timestamp: 1234	}  { url: ‘www.google.com’,   location: “NYC”    connect: 23,  first_byte: 123,  last_byte: 245,    timestamp: 2345	}
Thinking in rows 21 What was the average connect time for google on friday? From SFO? From NYC? Between 1AM-2AM?
Thinking in rows 22  Up to 100’s of samples per URL per day!! Day 1 AVG Result Day 2 An “average” chart had to hit 600 rows   AVG Day 3 AVG 30 days average query range
Thinking in Documents This document contains all data for www.google.com collected during 9/20/2010 This tells us the average value for this metric for this url / time period Average value from SFO Average value from NYC 23
Storing a sample 24 db.metrics.dailies.update(  	{ url: ‘www.google.com’,         day: ‘9/20/2010’ },  	{ ‘$inc’: {  	  ‘connect.sum’:1234,        ‘connect.count’:1,        ‘connect.sfo.sum’:1234,        ‘connect.sfo.count’:1 } },      { upsert: true }  ); Which document we’re updating Update the aggregate value Update the location specific value Atomically update the document Create the document if it doesn’t already exist
Putting it together 25 Atomically update the daily data 1 { url: ‘www.google.com’,   location: “SFO”    connect: 23,  first_byte: 123,  last_byte: 245,    timestamp: 1234	}  Atomically update the weekly data 2 Atomically update the monthly data 3
Drawing connect time graph 26 db.metrics.dailies.find(  	{ url: ‘www.google.com’,         day: { “$gte”: ‘9/1/2010’,                  “$lte”:’9/20/2010’ },  	{ ‘connect’:true} ); Data for google We just want connect time data Compound index to make this query fast The range of dates for the chart db.metrics.dailies.ensureIndex({url:1,day:-1})
More efficient charts 27 1 Document per URL per Day Day 1 AVG Result Day 2 Average chart hits 30 documents.  AVG 20x fewer Day 3 AVG 30 days == 30 documents
Real Time Updates 28 Single query to fetch all metric data for a URL Fast enough that browser can poll constantly for updated data without impacting server
Final thoughts Mongo has been a great choice  80gb of data and counting Majorly compressed after moving from table to document oriented data model  100’s of updates per second 24x7 Not using Sharding in production yet, but planning on it soon  You are using replication, right?  29

More Related Content

Viewers also liked

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Matthew Russell
 
Benchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneBenchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneTim Callaghan
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXTim Callaghan
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB MongoDB
 
Lightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBLightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBMongoDB
 
Webinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsWebinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsMongoDB
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopAhmedabadJavaMeetup
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceTim Callaghan
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkChris Westin
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishMongoDB
 
[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다NAVER D2
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHPfwso
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDBTony Tam
 

Viewers also liked (18)

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
 
Benchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneBenchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and Fortune
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
 
Lightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBLightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDB
 
Webinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsWebinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation Options
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and Workshop
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB Performance
 
MongoDB - Ekino PHP
MongoDB - Ekino PHPMongoDB - Ekino PHP
MongoDB - Ekino PHP
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation framework
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
 
MongoDB
MongoDBMongoDB
MongoDB
 
[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHP
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
 

More from Jared Rosoff

MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesJared Rosoff
 
Mongosv 2011 - Sharding
Mongosv 2011 - ShardingMongosv 2011 - Sharding
Mongosv 2011 - ShardingJared Rosoff
 
Mongosv 2011 - Replication
Mongosv 2011 - ReplicationMongosv 2011 - Replication
Mongosv 2011 - ReplicationJared Rosoff
 
Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Jared Rosoff
 
MongoDB Deployment Tips
MongoDB Deployment TipsMongoDB Deployment Tips
MongoDB Deployment TipsJared Rosoff
 
Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011Jared Rosoff
 
MongoDB on EC2 and EBS
MongoDB on EC2 and EBSMongoDB on EC2 and EBS
MongoDB on EC2 and EBSJared Rosoff
 
Indexing & query optimization
Indexing & query optimizationIndexing & query optimization
Indexing & query optimizationJared Rosoff
 
Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Jared Rosoff
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsJared Rosoff
 

More from Jared Rosoff (10)

MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - Inboxes
 
Mongosv 2011 - Sharding
Mongosv 2011 - ShardingMongosv 2011 - Sharding
Mongosv 2011 - Sharding
 
Mongosv 2011 - Replication
Mongosv 2011 - ReplicationMongosv 2011 - Replication
Mongosv 2011 - Replication
 
Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2
 
MongoDB Deployment Tips
MongoDB Deployment TipsMongoDB Deployment Tips
MongoDB Deployment Tips
 
Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011
 
MongoDB on EC2 and EBS
MongoDB on EC2 and EBSMongoDB on EC2 and EBS
MongoDB on EC2 and EBS
 
Indexing & query optimization
Indexing & query optimizationIndexing & query optimization
Indexing & query optimization
 
Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on Rails
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Realtime Analytics with MongoDB

  • 1. Scaling Rails @ Yottaa Jared Rosoff @forjared jrosoff@yottaa.com September 20th 2010
  • 2. From zero to humongous 2 About our application How we chose MongoDB How we use MongoDB
  • 3. About our application 3 We collect lots of data 6000+ URLs 300 samples per URL per day Some samples are >1MB (firebug) Missing a sample isn’t a bit deal We visualize data in real-time No delay when showing data “On-Demand” samples The “check now” button
  • 5. How we chose mongo 5
  • 6. Requirements Our data set is going to grow very quickly Scalable by default We have a very small team Focus on application, not infrastructure We are a startup Requirements change hourly Operations We’re 100% in the cloud 6
  • 7. Rails default architecture Performance Bottleneck: Too much load Collection Server Data Source MySQL User Reporting Server “Just” a Rails App
  • 8. Let’s add replication! Performance Bottleneck: Still can’t scale writes MySQL Master Collection Server Data Source Replication MySQL Master User Reporting Server MySQL Master MySQL Master Off the shelf! Scalable Reads!
  • 9. What about sharding? Development Bottleneck: Need to write custom code Collection Server Data Source Sharding MySQL Master MySQL Master MySQL Master User Reporting Server Sharding Scalable Writes!
  • 10. Key Value stores to the rescue? Development Bottleneck: Reporting is limited / hard Collection Server Data Source MySQL Master MySQL Master Cassandra or Voldemort User Reporting Server Scalable Writes!
  • 11. Can I Hadoop my way out of this? Development Bottleneck: Too many systems! MySQL Master MySQL Master Cassandra or Voldemort Collection Server Data Source Hadoop MySQL Master Scalable Writes! Flexible Reports! “Just” a Rails App MySQL Master User Reporting Server MySQL Master MySQL Slave
  • 12. MongoDB! Collection Server Data Source MySQL Master MySQL Master MongoDB User Reporting Server Scalable Writes! “Just” a rails app Flexible Reporting!
  • 13. MongoD App Server Data Source Collection MongoD Load Balancer Passenger Nginx Mongos Reporting User MongoD Sharding! High Concurrency Scale-Out
  • 14. Sharding is critical 14 Distribute write load across servers Decentralize data storage Scale out!
  • 15. Before Sharding 15 App Server App Server App Server Need higher write volume Buy a bigger database Need more storage volume Buy a bigger database
  • 16. After Sharding 16 App Server App Server App Server Need higher write volume Add more servers Need more storage volume Add more servers
  • 17. Scale out is the new scale up 17 App Server App Server App Server
  • 18. How we’re using MongoDB 18
  • 19. Our Data Model 19 Document per URL we track Meta-data Summary Data Most recent measurements Document per URL per Day Detailed metrics Pre-aggregated data
  • 20. Thinking in rows 20 { url: ‘www.google.com’, location: “SFO” connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 } { url: ‘www.google.com’, location: “NYC” connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }
  • 21. Thinking in rows 21 What was the average connect time for google on friday? From SFO? From NYC? Between 1AM-2AM?
  • 22. Thinking in rows 22 Up to 100’s of samples per URL per day!! Day 1 AVG Result Day 2 An “average” chart had to hit 600 rows AVG Day 3 AVG 30 days average query range
  • 23. Thinking in Documents This document contains all data for www.google.com collected during 9/20/2010 This tells us the average value for this metric for this url / time period Average value from SFO Average value from NYC 23
  • 24. Storing a sample 24 db.metrics.dailies.update( { url: ‘www.google.com’, day: ‘9/20/2010’ }, { ‘$inc’: { ‘connect.sum’:1234, ‘connect.count’:1, ‘connect.sfo.sum’:1234, ‘connect.sfo.count’:1 } }, { upsert: true } ); Which document we’re updating Update the aggregate value Update the location specific value Atomically update the document Create the document if it doesn’t already exist
  • 25. Putting it together 25 Atomically update the daily data 1 { url: ‘www.google.com’, location: “SFO” connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 } Atomically update the weekly data 2 Atomically update the monthly data 3
  • 26. Drawing connect time graph 26 db.metrics.dailies.find( { url: ‘www.google.com’, day: { “$gte”: ‘9/1/2010’, “$lte”:’9/20/2010’ }, { ‘connect’:true} ); Data for google We just want connect time data Compound index to make this query fast The range of dates for the chart db.metrics.dailies.ensureIndex({url:1,day:-1})
  • 27. More efficient charts 27 1 Document per URL per Day Day 1 AVG Result Day 2 Average chart hits 30 documents. AVG 20x fewer Day 3 AVG 30 days == 30 documents
  • 28. Real Time Updates 28 Single query to fetch all metric data for a URL Fast enough that browser can poll constantly for updated data without impacting server
  • 29. Final thoughts Mongo has been a great choice 80gb of data and counting Majorly compressed after moving from table to document oriented data model 100’s of updates per second 24x7 Not using Sharding in production yet, but planning on it soon You are using replication, right? 29