SlideShare a Scribd company logo
1 of 29
Scaling Rails @ Yottaa Jared Rosoff @forjared jrosoff@yottaa.com September 20th 2010
From zero to humongous 2 About our application  How we chose MongoDB How we use MongoDB
About our application 3 We collect lots of data 6000+ URLs 300 samples per URL per day Some samples are >1MB (firebug)  Missing a sample isn’t a bit deal We visualize data in real-time No delay when showing data “On-Demand” samples  The “check now” button
The Yottaa Network 4
How we chose mongo 5
Requirements Our data set is going to grow very quickly  Scalable by default We have a very small team Focus on application, not infrastructure We are a startup  Requirements change hourly Operations We’re 100% in the cloud 6
Rails default architecture Performance Bottleneck: Too much load Collection Server Data Source MySQL User Reporting Server “Just” a Rails App
Let’s add replication! Performance Bottleneck: Still can’t scale writes MySQL Master Collection Server Data Source Replication MySQL Master User Reporting Server MySQL Master MySQL Master Off the shelf! Scalable Reads!
What about sharding? Development Bottleneck: Need to write custom code Collection Server Data Source Sharding MySQL Master MySQL Master MySQL Master User Reporting Server Sharding Scalable Writes!
Key Value stores to the rescue? Development Bottleneck: Reporting is limited / hard Collection Server Data Source MySQL Master MySQL Master Cassandra or Voldemort User Reporting Server Scalable Writes!
Can I Hadoop my way out of this? 	Development Bottleneck: Too many systems! MySQL Master MySQL Master Cassandra or Voldemort Collection Server Data Source Hadoop MySQL Master Scalable Writes! Flexible Reports! “Just” a Rails App MySQL Master User Reporting Server MySQL Master MySQL Slave
MongoDB!  Collection Server Data Source MySQL Master MySQL Master MongoDB User Reporting Server Scalable Writes! “Just” a rails app Flexible Reporting!
MongoD App Server Data Source Collection MongoD Load Balancer Passenger Nginx Mongos Reporting User MongoD Sharding! High Concurrency Scale-Out
Sharding is critical 14 Distribute write load across servers Decentralize data storage Scale out!
Before Sharding 15 App Server App Server App Server Need higher write volume Buy a bigger database Need more storage volume Buy a bigger database
After Sharding 16 App Server App Server App Server Need higher write volume Add more servers Need more storage volume Add more servers
Scale out is the new scale up 17 App Server App Server App Server
How we’re using MongoDB 18
Our Data Model 19 Document per URL we track  Meta-data Summary Data Most recent measurements Document per URL per Day Detailed metrics Pre-aggregated data
Thinking in rows 20 { url: ‘www.google.com’,   location: “SFO”    connect: 23,  first_byte: 123,  last_byte: 245,    timestamp: 1234	}  { url: ‘www.google.com’,   location: “NYC”    connect: 23,  first_byte: 123,  last_byte: 245,    timestamp: 2345	}
Thinking in rows 21 What was the average connect time for google on friday? From SFO? From NYC? Between 1AM-2AM?
Thinking in rows 22  Up to 100’s of samples per URL per day!! Day 1 AVG Result Day 2 An “average” chart had to hit 600 rows   AVG Day 3 AVG 30 days average query range
Thinking in Documents This document contains all data for www.google.com collected during 9/20/2010 This tells us the average value for this metric for this url / time period Average value from SFO Average value from NYC 23
Storing a sample 24 db.metrics.dailies.update(  	{ url: ‘www.google.com’,         day: ‘9/20/2010’ },  	{ ‘$inc’: {  	  ‘connect.sum’:1234,        ‘connect.count’:1,        ‘connect.sfo.sum’:1234,        ‘connect.sfo.count’:1 } },      { upsert: true }  ); Which document we’re updating Update the aggregate value Update the location specific value Atomically update the document Create the document if it doesn’t already exist
Putting it together 25 Atomically update the daily data 1 { url: ‘www.google.com’,   location: “SFO”    connect: 23,  first_byte: 123,  last_byte: 245,    timestamp: 1234	}  Atomically update the weekly data 2 Atomically update the monthly data 3
Drawing connect time graph 26 db.metrics.dailies.find(  	{ url: ‘www.google.com’,         day: { “$gte”: ‘9/1/2010’,                  “$lte”:’9/20/2010’ },  	{ ‘connect’:true} ); Data for google We just want connect time data Compound index to make this query fast The range of dates for the chart db.metrics.dailies.ensureIndex({url:1,day:-1})
More efficient charts 27 1 Document per URL per Day Day 1 AVG Result Day 2 Average chart hits 30 documents.  AVG 20x fewer Day 3 AVG 30 days == 30 documents
Real Time Updates 28 Single query to fetch all metric data for a URL Fast enough that browser can poll constantly for updated data without impacting server
Final thoughts Mongo has been a great choice  80gb of data and counting Majorly compressed after moving from table to document oriented data model  100’s of updates per second 24x7 Not using Sharding in production yet, but planning on it soon  You are using replication, right?  29

More Related Content

Viewers also liked

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Matthew Russell
 
Benchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneBenchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneTim Callaghan
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXTim Callaghan
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB MongoDB
 
Lightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBLightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBMongoDB
 
Webinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsWebinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsMongoDB
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopAhmedabadJavaMeetup
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceTim Callaghan
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkChris Westin
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishMongoDB
 
[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다NAVER D2
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHPfwso
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDBTony Tam
 

Viewers also liked (18)

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
Mining Social Web APIs with IPython Notebook - Data Day Texas 2014
 
Benchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and FortuneBenchmarking MongoDB for Fame and Fortune
Benchmarking MongoDB for Fame and Fortune
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
 
Lightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDBLightning Talk: Real-Time Analytics from MongoDB
Lightning Talk: Real-Time Analytics from MongoDB
 
Webinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsWebinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation Options
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and Workshop
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB Performance
 
MongoDB - Ekino PHP
MongoDB - Ekino PHPMongoDB - Ekino PHP
MongoDB - Ekino PHP
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation framework
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
 
MongoDB
MongoDBMongoDB
MongoDB
 
[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다[D2 campus seminar]오픈소스로 날아오르다
[D2 campus seminar]오픈소스로 날아오르다
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHP
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
 

More from Jared Rosoff

MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesJared Rosoff
 
Mongosv 2011 - Sharding
Mongosv 2011 - ShardingMongosv 2011 - Sharding
Mongosv 2011 - ShardingJared Rosoff
 
Mongosv 2011 - Replication
Mongosv 2011 - ReplicationMongosv 2011 - Replication
Mongosv 2011 - ReplicationJared Rosoff
 
Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Jared Rosoff
 
MongoDB Deployment Tips
MongoDB Deployment TipsMongoDB Deployment Tips
MongoDB Deployment TipsJared Rosoff
 
Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011Jared Rosoff
 
MongoDB on EC2 and EBS
MongoDB on EC2 and EBSMongoDB on EC2 and EBS
MongoDB on EC2 and EBSJared Rosoff
 
Indexing & query optimization
Indexing & query optimizationIndexing & query optimization
Indexing & query optimizationJared Rosoff
 
Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Jared Rosoff
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsJared Rosoff
 

More from Jared Rosoff (10)

MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - Inboxes
 
Mongosv 2011 - Sharding
Mongosv 2011 - ShardingMongosv 2011 - Sharding
Mongosv 2011 - Sharding
 
Mongosv 2011 - Replication
Mongosv 2011 - ReplicationMongosv 2011 - Replication
Mongosv 2011 - Replication
 
Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2
 
MongoDB Deployment Tips
MongoDB Deployment TipsMongoDB Deployment Tips
MongoDB Deployment Tips
 
Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011Scaling with mongo db - SF Mongo User Group 7-19-2011
Scaling with mongo db - SF Mongo User Group 7-19-2011
 
MongoDB on EC2 and EBS
MongoDB on EC2 and EBSMongoDB on EC2 and EBS
MongoDB on EC2 and EBS
 
Indexing & query optimization
Indexing & query optimizationIndexing & query optimization
Indexing & query optimization
 
Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on Rails
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Realtime Analytics with MongoDB

  • 1. Scaling Rails @ Yottaa Jared Rosoff @forjared jrosoff@yottaa.com September 20th 2010
  • 2. From zero to humongous 2 About our application How we chose MongoDB How we use MongoDB
  • 3. About our application 3 We collect lots of data 6000+ URLs 300 samples per URL per day Some samples are >1MB (firebug) Missing a sample isn’t a bit deal We visualize data in real-time No delay when showing data “On-Demand” samples The “check now” button
  • 5. How we chose mongo 5
  • 6. Requirements Our data set is going to grow very quickly Scalable by default We have a very small team Focus on application, not infrastructure We are a startup Requirements change hourly Operations We’re 100% in the cloud 6
  • 7. Rails default architecture Performance Bottleneck: Too much load Collection Server Data Source MySQL User Reporting Server “Just” a Rails App
  • 8. Let’s add replication! Performance Bottleneck: Still can’t scale writes MySQL Master Collection Server Data Source Replication MySQL Master User Reporting Server MySQL Master MySQL Master Off the shelf! Scalable Reads!
  • 9. What about sharding? Development Bottleneck: Need to write custom code Collection Server Data Source Sharding MySQL Master MySQL Master MySQL Master User Reporting Server Sharding Scalable Writes!
  • 10. Key Value stores to the rescue? Development Bottleneck: Reporting is limited / hard Collection Server Data Source MySQL Master MySQL Master Cassandra or Voldemort User Reporting Server Scalable Writes!
  • 11. Can I Hadoop my way out of this? Development Bottleneck: Too many systems! MySQL Master MySQL Master Cassandra or Voldemort Collection Server Data Source Hadoop MySQL Master Scalable Writes! Flexible Reports! “Just” a Rails App MySQL Master User Reporting Server MySQL Master MySQL Slave
  • 12. MongoDB! Collection Server Data Source MySQL Master MySQL Master MongoDB User Reporting Server Scalable Writes! “Just” a rails app Flexible Reporting!
  • 13. MongoD App Server Data Source Collection MongoD Load Balancer Passenger Nginx Mongos Reporting User MongoD Sharding! High Concurrency Scale-Out
  • 14. Sharding is critical 14 Distribute write load across servers Decentralize data storage Scale out!
  • 15. Before Sharding 15 App Server App Server App Server Need higher write volume Buy a bigger database Need more storage volume Buy a bigger database
  • 16. After Sharding 16 App Server App Server App Server Need higher write volume Add more servers Need more storage volume Add more servers
  • 17. Scale out is the new scale up 17 App Server App Server App Server
  • 18. How we’re using MongoDB 18
  • 19. Our Data Model 19 Document per URL we track Meta-data Summary Data Most recent measurements Document per URL per Day Detailed metrics Pre-aggregated data
  • 20. Thinking in rows 20 { url: ‘www.google.com’, location: “SFO” connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 } { url: ‘www.google.com’, location: “NYC” connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }
  • 21. Thinking in rows 21 What was the average connect time for google on friday? From SFO? From NYC? Between 1AM-2AM?
  • 22. Thinking in rows 22 Up to 100’s of samples per URL per day!! Day 1 AVG Result Day 2 An “average” chart had to hit 600 rows AVG Day 3 AVG 30 days average query range
  • 23. Thinking in Documents This document contains all data for www.google.com collected during 9/20/2010 This tells us the average value for this metric for this url / time period Average value from SFO Average value from NYC 23
  • 24. Storing a sample 24 db.metrics.dailies.update( { url: ‘www.google.com’, day: ‘9/20/2010’ }, { ‘$inc’: { ‘connect.sum’:1234, ‘connect.count’:1, ‘connect.sfo.sum’:1234, ‘connect.sfo.count’:1 } }, { upsert: true } ); Which document we’re updating Update the aggregate value Update the location specific value Atomically update the document Create the document if it doesn’t already exist
  • 25. Putting it together 25 Atomically update the daily data 1 { url: ‘www.google.com’, location: “SFO” connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 } Atomically update the weekly data 2 Atomically update the monthly data 3
  • 26. Drawing connect time graph 26 db.metrics.dailies.find( { url: ‘www.google.com’, day: { “$gte”: ‘9/1/2010’, “$lte”:’9/20/2010’ }, { ‘connect’:true} ); Data for google We just want connect time data Compound index to make this query fast The range of dates for the chart db.metrics.dailies.ensureIndex({url:1,day:-1})
  • 27. More efficient charts 27 1 Document per URL per Day Day 1 AVG Result Day 2 Average chart hits 30 documents. AVG 20x fewer Day 3 AVG 30 days == 30 documents
  • 28. Real Time Updates 28 Single query to fetch all metric data for a URL Fast enough that browser can poll constantly for updated data without impacting server
  • 29. Final thoughts Mongo has been a great choice 80gb of data and counting Majorly compressed after moving from table to document oriented data model 100’s of updates per second 24x7 Not using Sharding in production yet, but planning on it soon You are using replication, right? 29