Scaling with MongoDB - SF Mongo User Group 7-19-2011

- MongoDB allows scaling by using documents, optimizing indexes, and understanding your working data set size.
- Replica sets can scale reads by adding secondary nodes for load balancing, while sharding scales writes and RAM usage by splitting data across multiple shards.
- Proper disk configuration and replication are important to maximize performance when scaling with MongoDB.

1. Scaling with MongoDB - Jared Rosoff (jsr@10gen.com) - @forjared

3. How do we do it today? We use a relational database, but …
- We don’t use joins
- We don’t use transactions
- We add read-only slaves
- We added a caching layer
- We de-normalized our data
- We implemented custom sharding
- We buy bigger servers

4. How’s that working out for you?

5. Costs go up

6. Productivity goes down

7. By engineers, for engineers

8. The landscape (chart: Scalability & Performance vs. Depth of functionality; plotted: Memcached, Key / Value, RDBMS)

9. Scaling your app
- Use documents
- Indexes make me happy
- Knowing your working set
- Disks are the bottleneck
- Replication makes reading fun
- Sharding for profit

10. Scaling your data model

11. Documents
{
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    {
      author : "Fred",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "Best Movie Ever"
    }
  ]
}

12. Disk Seeks & Data Locality: read = really, really fast; seek = 5+ ms

13. Disk Seeks & Data Locality (diagram labels: Post, Comment, Author)

14. Disk Seeks & Data Locality (diagram labels: Post, Author, and five Comments)
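
The locality argument in slides 12 to 14 can be put into rough numbers. The sketch below is only an illustration under loud assumptions that are not in the slides: every separately stored record costs one worst-case seek, sequential reads are treated as free, and `seekCostMs` is a hypothetical helper. The 5 ms figure is the floor of the slide's "5+ ms".

```javascript
// Assumption (not from the slides): each separately stored record costs one
// disk seek in the worst case, and sequential reads are treated as free.
const SEEK_MS = 5; // floor of "Seek = 5+ ms" from slide 12

// Hypothetical helper: total seek time for a number of scattered records.
function seekCostMs(recordsFetched) {
  return recordsFetched * SEEK_MS;
}

// Separate records: 1 post + 1 author + 5 comments, each fetched on its own.
const normalizedMs = seekCostMs(1 + 1 + 5);

// One document: the post embeds its author and comments, so a single seek.
const embeddedMs = seekCostMs(1);

console.log({ normalizedMs, embeddedMs }); // { normalizedMs: 35, embeddedMs: 5 }
```

Real costs depend on caching and on-disk layout, so the ratio matters more than the absolute values.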

15. Optimized indexes

16. Table scans: find where x equals 7. Values scanned: 1 2 3 4 5 6 7. Looked at 7 objects.

17. Tree Lookup: find where x equals 7. Tree nodes: 4 6 2 7 5 3 1. Looked at 3 objects.
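
The counts on slides 16 and 17 can be reproduced with a small sketch. It assumes the tree on slide 17 is rooted at 4 with children 2 and 6 and leaves 1, 3, 5, 7, which is consistent with the slide's "looked at 3 objects". The names `tableScan`, `treeLookup`, and `node` are hypothetical. MongoDB indexes are B-trees rather than binary trees, so this is a simplification that matches the slide, not the real index structure.

```javascript
// Table scan: examine items in order until the match is found.
function tableScan(items, target) {
  let examined = 0;
  for (const item of items) {
    examined += 1;
    if (item === target) break;
  }
  return examined;
}

// Hypothetical helper that builds a binary search tree node.
function node(value, left = null, right = null) {
  return { value, left, right };
}

// Tree lookup: walk from the root, going left or right at each node.
function treeLookup(root, target) {
  let examined = 0;
  let current = root;
  while (current !== null) {
    examined += 1;
    if (target === current.value) break;
    current = target < current.value ? current.left : current.right;
  }
  return examined;
}

// Assumed shape of the slide's tree: 4 at the root, then 2 and 6, then leaves.
const tree = node(4, node(2, node(1), node(3)), node(6, node(5), node(7)));

const scanCount = tableScan([1, 2, 3, 4, 5, 6, 7], 7);
const treeCount = treeLookup(tree, 7); // visits 4, then 6, then 7

console.log({ scanCount, treeCount }); // { scanCount: 7, treeCount: 3 }
```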

18. Random Index: entire index must fit in RAM

19. Right Aligned: only a small portion in RAM

20. Working set size

21. Working Set = Active Documents + Used Indexes (diagram: RAM, Disk)

22. Page Fault (diagram: App, RAM, Disk, with arrows numbered 1 to 5)
   (1) App requests document
   (2) Document not in memory
   (3) Evict a page from memory
   (4) Read block from disk
   (5) Return document from memory
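
The five steps on slide 22 can be sketched as a tiny page cache. This is a toy model, not how MongoDB or the operating system implements paging: the least-recently-used eviction policy, the `PageCache` class, and the `block-…` strings standing in for disk reads are all assumptions made for illustration.

```javascript
// Toy page cache. Assumption: least-recently-used eviction (the slides do not
// name a policy). A Map keeps insertion order, so its first key is the oldest.
class PageCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.pages = new Map();
    this.faults = 0;
  }

  // Step 1: the app requests a page.
  read(pageId) {
    if (this.pages.has(pageId)) {
      // Hit: move the page to the most-recent end and return it.
      const cached = this.pages.get(pageId);
      this.pages.delete(pageId);
      this.pages.set(pageId, cached);
      return cached;
    }
    this.faults += 1; // Step 2: the page is not in memory.
    if (this.pages.size >= this.capacity) {
      const oldest = this.pages.keys().next().value;
      this.pages.delete(oldest); // Step 3: evict a page from memory.
    }
    const block = `block-${pageId}`; // Step 4: stand-in for a disk read.
    this.pages.set(pageId, block);
    return block; // Step 5: return the data from memory.
  }
}

const cache = new PageCache(2);
for (const id of ["a", "b", "a", "c", "b"]) cache.read(id);

console.log(cache.faults); // 4: only the second read of "a" is a hit
```

With room for two pages and three pages in rotation, almost every read faults, which is the situation the working-set slides warn about.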

23. Figuring out working set
> db.foo.stats()
{
  "ns" : "test.foo",
  "count" : 1338330,
  "size" : 46915928,
  "avgObjSize" : 35.05557523181876,
  "storageSize" : 86092032,
  "numExtents" : 12,
  "nindexes" : 2,
  "lastExtentSize" : 20872960,
  "paddingFactor" : 1,
  "flags" : 0,
  "totalIndexSize" : 99860480,
  "indexSizes" : { "_id_" : 55877632, "x_1" : 43982848 },
  "ok" : 1
}
Callouts: "size" = size of data; "avgObjSize" = average document size; "storageSize" = size on disk (and in memory!); "totalIndexSize" = size of all indexes; "indexSizes" = size of each index
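
As a worked example, the numbers from the `db.foo.stats()` output on slide 23 can be combined into a rough upper bound on the working set. The sketch assumes the worst case where every document and every index is active, so the real working set (active documents plus used indexes) may be smaller. The values are copied from the slide and the variable names are hypothetical.

```javascript
// Values copied from the db.foo.stats() output on slide 23 (bytes).
const stats = {
  size: 46915928,
  storageSize: 86092032,
  totalIndexSize: 99860480,
  indexSizes: { _id_: 55877632, x_1: 43982848 },
};

const MIB = 1024 * 1024;

// Sanity check: the per-index sizes should add up to totalIndexSize.
const indexSum = Object.values(stats.indexSizes).reduce((a, b) => a + b, 0);

// Assumed worst case: all data on disk and all indexes are in active use.
const upperBoundBytes = stats.storageSize + stats.totalIndexSize;

console.log(indexSum === stats.totalIndexSize); // true
console.log((upperBoundBytes / MIB).toFixed(1) + " MiB"); // 177.3 MiB
```

In this collection the indexes are larger than the stored data, so index size is a large part of the RAM this collection needs.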

24. Disk configurations

25. Single Disk ~200 seeks / second

26. RAID0 (diagram: three labels, each ~200 seeks / second)

27. RAID10 (diagram: three labels, each ~400 seeks / second)
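
The ~200 seeks / second figure follows from the 5 ms seek time on slide 12. The sketch below works through that arithmetic. The doubling for a mirrored pair is an assumption about what the RAID10 slide is showing (either copy of the data can serve a read), and `seeksPerSecond` is a hypothetical helper.

```javascript
// Hypothetical helper: how many seeks fit into one second.
function seeksPerSecond(seekMs) {
  return 1000 / seekMs;
}

// Single disk at the slide's 5 ms seek time.
const singleDisk = seeksPerSecond(5);

// Assumption: a mirrored pair can serve reads from either copy,
// so its read seek capacity is roughly double that of one disk.
const mirroredPair = 2 * singleDisk;

// Worst case for a cold working set: every document read is a page fault
// that needs one seek, so read throughput is capped at the seek rate.
const maxFaultingReadsPerSecond = singleDisk;

console.log({ singleDisk, mirroredPair, maxFaultingReadsPerSecond });
// { singleDisk: 200, mirroredPair: 400, maxFaultingReadsPerSecond: 200 }
```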

28. Replication

29. Replica Sets (diagram: Primary handles Read / Write; two Secondaries handle Read)

30. Replica Sets (diagram: Primary handles Read / Write; four Secondaries handle Read)
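
The read scaling on slides 29 and 30 can be sketched as a router that sends writes to the primary and spreads reads across secondaries. This is a toy model only: `makeReplicaSetRouter` and the node names are hypothetical, round-robin is an assumed policy, and real drivers handle this through their own read preference settings. Because replication is asynchronous, reads from a secondary may lag behind the primary.

```javascript
// Hypothetical router: writes go to the primary, reads rotate over secondaries.
function makeReplicaSetRouter(primary, secondaries) {
  let next = 0;
  return {
    write() {
      return primary;
    },
    read() {
      // Assumed policy: simple round-robin over the secondaries.
      const target = secondaries[next % secondaries.length];
      next += 1;
      return target;
    },
  };
}

const router = makeReplicaSetRouter("primary", ["secondary-1", "secondary-2"]);
const readTargets = [router.read(), router.read(), router.read()];
const writeTarget = router.write();

console.log(readTargets); // [ 'secondary-1', 'secondary-2', 'secondary-1' ]
console.log(writeTarget); // primary
```

Adding secondaries raises read capacity in this model, but every write still lands on the one primary, which is the gap sharding addresses.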

31. Sharding

32. Sharding topology (diagram: two MongoS routers in front of four shards; Shard 1 holds 0..10, Shard 2 holds 10..20, Shard 3 holds 20..30, Shard 4 holds 30..40; each shard has one Primary and two Secondaries)

33. 400GB Index?

34. 400GB Index? Split across Shard 1 (0..10), Shard 2 (10..20), Shard 3 (20..30), and Shard 4 (30..40): 100GB Index! on each shard
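
The range split on slides 32 to 34 can be sketched as a lookup from shard key to shard, plus the index arithmetic. The ranges come from the slides; treating each range as lower-bound inclusive and upper-bound exclusive, and assuming the index divides evenly across shards, are assumptions. `routeToShard` is a hypothetical stand-in for the routing that MongoS does.

```javascript
// Shard key ranges from the slides. Assumption: min inclusive, max exclusive.
const shards = [
  { name: "Shard 1", min: 0, max: 10 },
  { name: "Shard 2", min: 10, max: 20 },
  { name: "Shard 3", min: 20, max: 30 },
  { name: "Shard 4", min: 30, max: 40 },
];

// Hypothetical stand-in for the router: find the shard that owns a key.
function routeToShard(key) {
  const shard = shards.find((s) => key >= s.min && key < s.max);
  if (!shard) throw new RangeError(`no shard owns key ${key}`);
  return shard.name;
}

// Assumption: data is evenly distributed, so the index splits evenly too.
const totalIndexGB = 400;
const perShardIndexGB = totalIndexGB / shards.length;

console.log(routeToShard(7)); // Shard 1
console.log(routeToShard(10)); // Shard 2
console.log(perShardIndexGB); // 100
```

Each shard then only needs RAM for its own 100GB slice of the index, which is how sharding scales working RAM as well as writes.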

35. Summary

36. Summary
- Use documents to your advantage!
- Optimize your indexes
- Understand your working set
- Use a sane disk configuration
- Use replicas to scale reads
- Use sharding to scale writes & working RAM

Editor's Notes

Let’s talk about infrastructure costs. You probably started building your application on top of an RDBMS. This is the way we have built enterprise and web applications for years. The problem is that an RDBMS doesn’t have a smooth cost curve as you scale it up. When you start off, you may be running on a smaller server that is totally adequate for your load. When you exceed the capacity of that small server, you need to buy a bigger server; you can’t add a second small server. This process repeats: you exceed the capacity of your new server, and you upgrade your hardware again.

There are two long-term problems with this:

- As you scale up, you end up paying more and more for each transaction that your system processes. A small server may cost you $1,000 per CPU, but when you need 128 processors, you might be paying as much as $100,000 per CPU. Each incremental step up in hardware gets more expensive, not cheaper.
- You reach the end of this scaling approach. Once you have scaled up to the biggest hardware platform available on the market, there is nowhere to go; there is no bigger box to buy. At this point you need to change strategies, even if you can afford those ultra-high-end boxes.

And while we’ve been spending more and more money on hardware, our developer productivity has gone down too. You will hear this story over and over again from CIOs and architects: “Well, we use <insert RDBMS> but we don’t use joins or transactions and we’ve de-normalized our schema.” As our hardware gets more and more expensive, we ask our developers to squeeze more and more performance out of the same box. To achieve this, they go through “herculean efforts” to strip their code of the advanced features that once made them productive. De-normalizing data, eliminating joins and transactions, adding caching and sharding layers… These are risky projects that slow down feature velocity.
