The document discusses several challenges with MongoDB, including:
1. MongoDB uses a global write lock, which can negatively impact write performance.
2. Auto-sharding in MongoDB is not always reliable, as the balancer can get into deadlocks and MongoDB has trouble determining the number of documents after sharding.
3. Being schema-less is overrated, as it means repeating the schema in each document, increasing storage size. Possible solutions discussed include using shorter key names to reduce document sizes.
Challenges with MongoDB
1. Challenges with MongoDB
Stone Gao
MongoDB Beijing 2012
Monday, April 2, 2012
2. About Me
Tech Lead at Umeng.com
3. MongoDB is Awesome
• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS
4. But...
This talk is not Yet Another Talk about its Awesomeness, but about the challenges with MongoDB
5. Outline
1. Global Write Lock Sucks
2. Auto-Sharding is not that Reliable
3. Schema-less is Overrated
4. Community Contribution is Quite Low
5. Attitude Matters
6. 1. Global Write Lock Sucks
http://www.clker.com/cliparts/3/3/5/D/X/b/locked-exclamation-mark-padlock-hi.png
7. 1. Global Write Lock Sucks
single global write lock for the entire server (process)
[Diagram: on the left, a mongod process holding db-1 through db-n, each with collection1 and collection2 containing doc1 and doc2; on the right, a mysqld process holding db-1 through db-n, each with table1 and table2 containing doc1 and doc2]
DB Process Lock VS. Row Lock
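The contrast between one process-wide lock and finer-grained locks can be illustrated with a toy model. This is a minimal sketch and not MongoDB's locking code: `run_writers`, the sleep standing in for a write, and the four collection names are hypothetical, and the finer-grained case uses one lock per collection purely for illustration.

```python
import threading
import time


def run_writers(locks, targets, write_seconds=0.05):
    """Run one writer thread per target and return the elapsed wall time.

    `locks` maps a target name to the lock its writer must hold.
    This is a toy model of lock contention, not MongoDB code.
    """
    def write(target):
        with locks[target]:
            time.sleep(write_seconds)  # stand-in for the actual write

    threads = [threading.Thread(target=write, args=(t,)) for t in targets]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


# Hypothetical collection names, used only to label the writers.
collections = ["collection1", "collection2", "collection3", "collection4"]

# One global lock: every writer waits for the same lock, so writes serialize.
global_lock = threading.Lock()
global_elapsed = run_writers({c: global_lock for c in collections}, collections)

# One lock per collection: writers to different collections do not block each other.
fine_elapsed = run_writers({c: threading.Lock() for c in collections}, collections)

print(f"global lock: {global_elapsed:.2f}s, per-collection locks: {fine_elapsed:.2f}s")
```

With a shared lock the four writes run one after another, so the elapsed time grows with the number of writers; with separate locks they overlap.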
8. 1. Global Write Lock Sucks
Intel SSD 320 RAID10 & mongostat
39.5K Read IOPS / 23K Write IOPS
Nearly all data is in RAM, yet the lock ratio is pretty high and a bunch of writes are queued (qw)
12. Possible Solutions/Workarounds #1
Wait for the lock-related issues on JIRA to be resolved
• SERVER-2563 : When hitting disk, yield lock - phase 1
https://jira.mongodb.org/browse/SERVER-2563 (Fixed in 1.9.1; 25 votes)
Yield the lock any time we actually have to hit disk, i.e. whenever a memory-mapped page is not in RAM. Cases covered: update by _id, remove, long cursor iteration
• SERVER-1240 : Collection level locking
https://jira.mongodb.org/browse/SERVER-1240 (Planning Bucket A; 154 votes)
• SERVER-1241 : Intra collection locking (maybe extent)
https://jira.mongodb.org/browse/SERVER-1241 (Planning Bucket A; 25 votes)
• SERVER-1169 : Record level locking
https://jira.mongodb.org/browse/SERVER-1169 (Rejected; 1 vote)
and more ...
13. Possible Solutions/Workarounds #2
One Collection per DB to Reduce Lock Ratio
But you can go no further
Auto-Sharding to the rescue?
14. 2. Auto-Sharding is not that Reliable
http://www.autoinsurancecompanies.com/wp-content/uploads/2011/11/reliable.jpg
16. Problems with Auto-Sharding
• MongoDB can’t figure out how many docs are in a collection after sharding
• Balancer deadlock
[Balancer] skipping balancing round during ongoing split or move activity.
[Balancer] dist_lock lock failed because taken by....
[Balancer] Assertion failure cm s/balance.cpp...
• Uneven shard load distribution
• ...
(Note: I ran the experiment before 2.0, so some of these issues might be fixed or improved in newer versions of MongoDB, because it is evolving very fast.)
17. Possible Solutions/Workarounds #1
Manual Chunk Pre-Splitting
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
https://groups.google.com/d/msg/mongodb-user/tYBFKSMM3cU/TiYtoOiNMgEJ
http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
0) Turn off the balancer (balancing won't understand your locations, but it shouldn't matter b/c you're using hashed shard keys)
1) Shard the empty collection over the shard key { location : 1, hash : 1 }
2) run db.runCommand({ split : "<coll>", middle : { "location":"DEN", "hash": "8000...0" }})
3) run db.runCommand({ split : "<coll>", middle : { "location":"SC", "hash": "0000...0" }})
4) move those empty chunks to whatever shards you want
- Greg Studer
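The pre-splitting steps above can be scripted by generating the `split` command documents up front. This is a minimal sketch under stated assumptions: the namespace `mydb.checkins`, the 4-digit hex hash width, and the helper names are hypothetical, and the split points are spread evenly rather than taken from the quoted commands (whose hash values are elided in the slide).

```python
def hex_split_points(num_chunks, width=4):
    """Return num_chunks - 1 evenly spaced split points in a fixed-width hex space.

    The width is an assumption; use the real length of your hash values.
    """
    space = 16 ** width
    return [format(space * i // num_chunks, f"0{width}x") for i in range(1, num_chunks)]


def split_commands(namespace, locations, num_chunks, width=4):
    """Build one `split` command document per (location, split point) pair."""
    return [
        {"split": namespace, "middle": {"location": loc, "hash": point}}
        for loc in locations
        for point in hex_split_points(num_chunks, width)
    ]


# "mydb.checkins" is a hypothetical namespace; "DEN" and "SC" come from the slide.
commands = split_commands("mydb.checkins", ["DEN", "SC"], num_chunks=4)
for command in commands:
    print(command)
```

Each generated dictionary has the same shape as the documents passed to `db.runCommand` in steps 2 and 3, so it could be sent to the admin database with whatever driver is in use.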
18. Possible Solutions/Workarounds #2
SERVER-2001 : Option to hash shard key
https://jira.mongodb.org/browse/SERVER-2001 (Unresolved; Fix Version/s: 2.1.1; 27 votes)
“The lack of hashing based read/write distribution amongst available shards is a huge issue for us now. We're actually considering implementing an app-side layer to do this but that obviously has a number of serious drawbacks.”
- Remon van Vliet
“Seems like a good idea : we implemented hashed shard key on client-side : operation rate sky rocked ( x3 and less variability). Balancing is moreover quicker and done during our very heavy insertion process : perfect !”
- Grégoire Seux
https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
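A client-side hashed key, as described in the second quote, might look roughly like the following. This is a sketch and not the implementation either commenter used: the choice of SHA-256, the shard names, and the modulo placement are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def hashed_key(key):
    """Hex SHA-256 digest of the key; the hash function is an assumption."""
    return hashlib.sha256(str(key).encode("utf-8")).hexdigest()


def pick_shard(key, shards):
    """Map a key to a shard by taking its hash modulo the shard count."""
    return shards[int(hashed_key(key), 16) % len(shards)]


# Hypothetical shard names.
shards = ["shard0", "shard1", "shard2"]

# Sequential ids cluster together under a range-based key;
# hashing them first spreads the writes across all shards.
load = Counter(pick_shard(user_id, shards) for user_id in range(30000))
print(load)
```

Modulo placement is the simplest option but remaps most keys when a shard is added. Storing the hash as a document field and sharding on that field avoids doing placement in the application.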
19. Possible Solutions/Workarounds #3
Plain-old Application Level Sharding
https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
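Application-level sharding with a forwarding table, in the spirit of the gizzard diagram referenced above, can be sketched as a sorted list of range boundaries. The host names, the boundaries, and the `shard_for` helper are made up for illustration and are not taken from gizzard or from the talk.

```python
import bisect

# Hypothetical forwarding table: (lowest key in the range, shard host),
# sorted by the lower bound.
FORWARDING_TABLE = [
    (0, "mongo-a.example.com"),
    (1_000_000, "mongo-b.example.com"),
    (2_000_000, "mongo-c.example.com"),
]
LOWER_BOUNDS = [low for low, _ in FORWARDING_TABLE]


def shard_for(key):
    """Return the host whose range contains `key` (the last lower bound <= key)."""
    if key < LOWER_BOUNDS[0]:
        raise KeyError(f"key {key} is below the first range")
    index = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    return FORWARDING_TABLE[index][1]


for key in (42, 1_500_000, 9_999_999):
    print(key, "->", shard_for(key))
```

The application owns the table, so moving a range means copying the data and then editing one entry; nothing is rebalanced automatically.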
20. 3. Schema-less is Overrated
http://images.sodahead.com/polls/001635729/1863780_overrated_answer_2_xlarge.jpeg
21. Schema-less is Overrated
Schema-Free (schema-less) is not free.
It means repeating the schema in every doc (record)!
22. Possible Solutions/Workarounds #1
Use Short Key Names
1.6 billion documents
Long key names: 243 GB
{"sequence":"AHAHSPGPGSAVKLPAPHSVGKSALR",
"location":{
"chromosome":"19",
"strand":"-",
"begin":"51067007",
"end":"51067085"
}}
Short key names: 183 GB
{"s":"AHAHSPGPGSAVKLPAPHSVGKSALR",
"l":{
"c":"19",
"s":"-",
"b":"51067007",
"e":"51067085"
}}
60 GB saved!
ref : http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
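The saving can be checked with simple arithmetic on the key names alone. The sketch below uses the document and the key mapping from the slide; the helper names are hypothetical, and it counts only the characters in the key names, not any other per-document overhead.

```python
import json

long_doc = {
    "sequence": "AHAHSPGPGSAVKLPAPHSVGKSALR",
    "location": {"chromosome": "19", "strand": "-", "begin": "51067007", "end": "51067085"},
}

# Mapping taken from the slide. "s" is reused for "sequence" (top level) and
# "strand" (nested), which is safe because they live in different sub-documents.
TOP_LEVEL = {"sequence": "s", "location": "l"}
NESTED = {"chromosome": "c", "strand": "s", "begin": "b", "end": "e"}


def shorten(doc):
    """Rename the keys of one document using the two mappings above."""
    short = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            value = {NESTED[k]: v for k, v in value.items()}
        short[TOP_LEVEL[key]] = value
    return short


def key_bytes(doc):
    """Total length of all key names in the document, nested ones included."""
    total = 0
    for key, value in doc.items():
        total += len(key)
        if isinstance(value, dict):
            total += key_bytes(value)
    return total


short_doc = shorten(long_doc)
saved_per_doc = key_bytes(long_doc) - key_bytes(short_doc)
total_saved_gb = saved_per_doc * 1.6e9 / 1e9  # 1.6 billion documents, decimal GB

print(json.dumps(short_doc))
print(saved_per_doc, "bytes per document,", total_saved_gb, "GB in total")
```

The key names account for 34 bytes per document, or about 54 GB over 1.6 billion documents, which is in line with the 60 GB reported; the slide does not say where the remainder comes from.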
23. Possible Solutions/Workarounds #2
SERVER-863 : Tokenize the field names
https://jira.mongodb.org/browse/SERVER-863 (planned but not scheduled; 66 votes)
“Most collections, even if they don’t contain the same structure , they contain similar. So it would make a lot of sense and save a lot of space to tokenize the field names.”
“The overall benefit as mentioned by other users is that you reduce the amount of storage/RAM taken up by redundant data in each document (so you can use less resources per request, hence gain more throughput and capacity), while importantly also freeing the developer from having to pick short and hard to read field names as a workaround for a technical limitation.”
- Andrew Armstrong
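The idea behind field-name tokenization can be sketched as a per-collection dictionary that replaces each field name with a small integer. This is only an illustration of the concept discussed in the ticket, not a description of how MongoDB does or would implement it; the `FieldNameTable` class is hypothetical.

```python
class FieldNameTable:
    """Hypothetical per-collection table mapping field names to integer tokens."""

    def __init__(self):
        self.token_of = {}  # field name -> token
        self.name_of = []   # token -> field name

    def token(self, name):
        """Return the token for a field name, assigning a new one on first use."""
        if name not in self.token_of:
            self.token_of[name] = len(self.name_of)
            self.name_of.append(name)
        return self.token_of[name]

    def encode(self, doc):
        """Replace every field name with its token, recursing into sub-documents."""
        encoded = {}
        for name, value in doc.items():
            if isinstance(value, dict):
                value = self.encode(value)
            encoded[self.token(name)] = value
        return encoded

    def decode(self, doc):
        """Restore the readable field names from an encoded document."""
        decoded = {}
        for token, value in doc.items():
            if isinstance(value, dict):
                value = self.decode(value)
            decoded[self.name_of[token]] = value
        return decoded


table = FieldNameTable()
doc = {"sequence": "AHAHSPGPGSAVKLPAPHSVGKSALR", "location": {"chromosome": "19", "strand": "-"}}
stored = table.encode(doc)
print(stored)
print(table.decode(stored))
```

The names are stored once in the table instead of once per document, so developers can keep readable field names without paying for them in every record.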
24. Possible Solutions/Workarounds #3
SERVER-164 : Option to store data compressed
https://jira.mongodb.org/browse/SERVER-164 (planned but not scheduled; 126 votes)
“The way oracle handles this is transparent to the database server at the block engine level. They compress the blocks similar to how SAN store's handle it rather than at a record level. They use zlib type compression and the overhead is less than 5 percent. Due to the IO access reduction in both number of blocks touched, and amount of data transferred, the overall effect is a cumulative speed increase.
Should MongoDB do it this way? Maybe? But at the end of the day, the architecture must make Mongo more scalable, as well as increase the ability limit the storage footprint.”
- Michael D. Joy
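The effect of block-level compression on documents with repeated field names can be illustrated with Python's `zlib` module. This is a rough sketch: the block is synthetic (1,000 variations of the sample document from the short-key-names slide, serialized as JSON rather than BSON), so the ratio it prints says nothing about real workloads.

```python
import json
import zlib

# Synthetic data: the sample document repeated with varying coordinates.
docs = [
    {
        "sequence": "AHAHSPGPGSAVKLPAPHSVGKSALR",
        "location": {
            "chromosome": "19",
            "strand": "-",
            "begin": str(51067007 + i),
            "end": str(51067085 + i),
        },
    }
    for i in range(1000)
]

# Treat the serialized documents as one storage block and compress it as a whole.
block = "\n".join(json.dumps(d) for d in docs).encode("utf-8")
compressed = zlib.compress(block)
ratio = len(compressed) / len(block)

# Decompression must give back exactly the original bytes.
restored = zlib.decompress(compressed)

print(f"raw: {len(block)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.2f}")
```

Because the repeated key names compress away almost entirely, this approach attacks the same redundancy as short key names without touching the schema.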
25. 4. Community Contribution is Quite Low
http://www.thompsoncrg.com/wp-content/themes/zoomtechnic/images/slide/img3.jpg
26. Community Contribution is Quite Low
https://github.com/mongodb/mongo/graphs/impact
https://github.com/mongodb/mongo/contributors
28. 5. Attitude Matters
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
MongoDB already has the sweetest API in the NoSQL world.
I wish more effort were invested in fixing the hard problems: locking, sharding, storage engine...
29. We are hiring
We are doing big data analytics
• Backend Engineer (MongoDB, Hadoop, HBase, Storm, Scala, Java, Ruby, Clojure)
• Data Mining Engineer
• DevOps Engineer
• Front End Engineer
hr@umeng.com