From SQL Server to MongoDB

Presentation given to the MongoDB NYC User Group on 9/27/2012.

My blog: http://architectryan.com
Twitter: @tekmaven

  1. 1. From SQL Server to MongoDB
        Ryan Hoffman, Senior Software Architect
        @tekmaven
        http://architectryan.com
  2. 2. TNTP + TeacherTrack
        • TNTP is a national nonprofit committed to ending the injustice of educational inequality. Founded by teachers in 1997, TNTP works with schools, districts and states to provide excellent teachers to the students who need them most and advance policies and practices that ensure effective teaching in every classroom.
        • TeacherTrack is a web-based applicant tracking and teacher evaluation system. TNTP uses TeacherTrack to recruit teachers for districts nationwide, including New Orleans, Philadelphia, and New York City.
        © TNTP 2012
  3. 3. TeacherTrack Technology
        • .NET 4.0
            o ASP.NET Web Forms
            o ASP.NET MVC
            o WCF
            o WF4
        • NHibernate ORM for SQL Server
        • MongoDB .NET Driver
        • NServiceBus
        • Lucene.NET
        • Much, much more…
  4. 4. Survey Templates
        • TeacherTrack uses a flexible data structure called a Survey to store the majority of its data. Conceptually, a survey works much like a SurveyMonkey survey.
        • A Survey Template is a “master” survey from which blank survey instances are created. A survey template consists of some header data (a string key, an ID, and the site it belongs to) and an array of questions.
        • Each question contains the question text, as well as properties that govern how the question is rendered (for example, whether it is a text box or a drop-down).
  5. 5. Surveys
        • A blank survey is instantiated from the survey template. It contains header data that associates the survey with a user, and an array of responses.
        • Each response contains a full copy of the data from its question.
            o If the original survey template is changed, we will always be able to load the original questions the survey was filled out with.
            o It also allows for rendering a survey without needing to load a template.
  6. 6. TeacherTrack Survey Demo
  7. 7. Storing Surveys and Survey Object
        One table for Surveys and another for Responses.
        • 1 row in the Survey table.
        • 1 row per response in Response table.
        A survey with 20 responses would be stored in 21 rows.

            class Survey {
                Guid Id { get; set; }
                Guid AccountId { get; set; }
                string Title { get; set; }
                List<Response> Responses { get; set; }
            }

            class Response {
                Guid Id { get; set; }
                string Value { get; set; }
                string QuestionText { get; set; }
                string QuestionTitle { get; set; }
                ElementTypes QuestionElementType { get; set; }
                ControlTypes QuestionControlType { get; set; }
                string Watermark { get; set; }
            }
            //Additional fields omitted for brevity
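
The template-to-survey relationship described on slides 4, 5 and 7 could be sketched roughly as follows. This is only an illustration: `Question`, `SurveyTemplate` and `SurveyFactory.Instantiate` are hypothetical names that do not appear in the slides, and `Survey` and `Response` are pared-down versions of the classes above. The point it shows is that each response takes its own copy of the question data at instantiation time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical template-side types; the slides do not show these.
public class Question
{
    public string Text { get; set; }
    public string Title { get; set; }
}

public class SurveyTemplate
{
    public Guid Id { get; set; }
    public string Key { get; set; }
    public List<Question> Questions { get; set; } = new List<Question>();
}

// Pared-down versions of the slide 7 classes.
public class Response
{
    public Guid Id { get; set; }
    public string Value { get; set; }         // stays null until the user answers
    public string QuestionText { get; set; }  // copied from the template
    public string QuestionTitle { get; set; } // copied from the template
}

public class Survey
{
    public Guid Id { get; set; }
    public Guid AccountId { get; set; }
    public string Title { get; set; }
    public List<Response> Responses { get; set; }
}

public static class SurveyFactory
{
    // Creates a blank survey with one response per template question.
    // The question data is copied, so later template edits do not
    // change surveys that already exist.
    public static Survey Instantiate(SurveyTemplate template, Guid accountId, string title)
    {
        return new Survey
        {
            Id = Guid.NewGuid(),
            AccountId = accountId,
            Title = title,
            Responses = template.Questions.Select(q => new Response
            {
                Id = Guid.NewGuid(),
                QuestionText = q.Text,
                QuestionTitle = q.Title
            }).ToList()
        };
    }
}
```

Because the copy happens once, at creation, a survey can be rendered from its own responses without loading the template, which is the property slide 5 relies on.
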
  9. 9. SQL Server Challenges
        • Performance!
            o Joining between the two tables was slow! We had >1 million surveys and >16 million responses before converting to MongoDB.
            o Actual query time in the application could easily be >200ms for one survey.
            o Some existing pages in the application could easily need to load more than 20 surveys. 10 second page load times are not fun to work with.
        • Iterative Development
            o When ALTER TABLE statements take 20 minutes to run, deployment scripts that were not designed with this in mind break and time out.
  11. 11. Why TNTP selected MongoDB
        • Performance, durability, and scaling.
            o Document databases allow for a richer schema.
            o Replica sets are elegant, easy to set up, and reliable.
            o Auto-sharding is a great future option to scale.
        • 10gen rocks.
            o Training. Switching from an RDBMS to a document database is a big paradigm shift. 10gen’s Developer and Administrator training did a great job giving key team members the skills to make this possible.
            o Great support options. TNTP uses MMS to get insight into its MongoDB servers, and we love that 10gen can proactively reach out to us based on server telemetry.
            o Great people. From day one at training, I met many 10gen employees, including people responsible for the Windows version. The value of this type of access and interaction cannot be overstated.
  12. 12. Survey Documents in MongoDB
        • Surveys are a great match for MongoDB.
        • The number of responses never changes after a survey is instantiated, making the responses an ideal candidate for an embedded array in the survey document.
        • <10ms query times!

            {
              "_id" : BinData(3,"vD+ifVfvS0qlk5vN8OPQOQ=="),
              "AccountId" : BinData(3,"B1giiULLskSEG7rYmdqBUA=="),
              "Title" : "Registering",
              "Responses" : [
                {
                  "_id" : BinData(3,"UvqabcPS1UGZipKODPKgGA=="),
                  "Value" : "Ryan",
                  "QuestionText" : "What is your first name?",
                  "QuestionElementType" : 1,
                  "QuestionControlType" : 1
                }
              ]
            }
  13. 13. Conversion
        Query SQL Server → Convert to BSON → Insert into Mongo
  14. 14. Conversion - Multithreading
        • The original proof of concept was single threaded. It took over two days to convert the data. When we refactored to a multithreaded model, conversion took less than 20 hours.
        • Each of the three parts of the conversion runs in its own thread.
        • A queue between each pair of adjacent threads allows them to pass data along.
            o The query thread adds objects to a conversion queue for the conversion thread.
            o Similarly, the conversion thread adds converted objects to the insert queue for the insert thread.
            o System.Collections.Concurrent.BlockingCollection<T> made this very easy.
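
A three-stage pipeline of this shape might look something like the sketch below. It is not the conversion tool's actual code: `ConversionPipeline.Run`, its delegate parameters and the queue capacity are all placeholders, and it uses long-running tasks where the slide speaks of threads. It only shows how two `BlockingCollection<T>` queues can connect a query stage, a conversion stage and an insert stage.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConversionPipeline
{
    // Runs query -> convert -> insert as three concurrent stages.
    // The delegates and the capacity are placeholders for this sketch.
    // Error handling is omitted: if a downstream stage fails, an upstream
    // stage can block forever on a full queue.
    public static void Run<TRow, TDoc>(
        IEnumerable<TRow> query,
        Func<TRow, TDoc> convert,
        Action<TDoc> insert,
        int capacity = 1000)
    {
        using (var toConvert = new BlockingCollection<TRow>(capacity))
        using (var toInsert = new BlockingCollection<TDoc>(capacity))
        {
            // Stage 1: read source rows and hand them to the converter.
            var queryStage = Task.Factory.StartNew(() =>
            {
                try { foreach (var row in query) toConvert.Add(row); }
                finally { toConvert.CompleteAdding(); }
            }, TaskCreationOptions.LongRunning);

            // Stage 2: convert each row and hand the result to the inserter.
            var convertStage = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var row in toConvert.GetConsumingEnumerable())
                        toInsert.Add(convert(row));
                }
                finally { toInsert.CompleteAdding(); }
            }, TaskCreationOptions.LongRunning);

            // Stage 3: write converted documents to the target store.
            var insertStage = Task.Factory.StartNew(() =>
            {
                foreach (var doc in toInsert.GetConsumingEnumerable())
                    insert(doc);
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(queryStage, convertStage, insertStage);
        }
    }
}
```

The bounded capacity is a design choice in this sketch: `Add` blocks when a queue is full, so a fast query stage cannot run arbitrarily far ahead of a slow insert stage and exhaust memory. `CompleteAdding` is what lets each `GetConsumingEnumerable` loop end once its producer is done.
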
  15. 15. Conversion – Auto Batching
        • Returning millions of rows in one query is clearly not going to work well. We need to batch the source queries and iterate until 0 rows are returned in the batch.
        • Querying batches out of SQL Server was very inconsistent. With no other load on the server, a batch could take anywhere from 45 seconds to over 10 minutes.
        • Instead of making each batch a fixed number of rows, we had logic that timed how long the previous batch took. Based on trial and error, a 1 minute batch time became the target. The code would adjust the number of rows based on the previous query’s number of rows and the query’s time.
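
The slides do not give the adjustment formula, so the following is one plausible reading of it rather than the original logic: scale the previous batch's row count by the ratio of target time to actual time, and clamp the result. `BatchSizer`, `NextBatchSize`, `RunBatches` and the `minRows`/`maxRows` bounds are all assumptions made for this sketch.

```csharp
using System;
using System.Diagnostics;

public static class BatchSizer
{
    // Picks the next batch size so that, at the previous batch's throughput,
    // the next batch would take roughly targetTime. The clamp bounds are
    // arbitrary illustrative values.
    public static int NextBatchSize(int previousRows, TimeSpan previousElapsed,
        TimeSpan targetTime, int minRows = 100, int maxRows = 100000)
    {
        if (previousRows <= 0 || previousElapsed <= TimeSpan.Zero)
            return minRows;

        double scale = targetTime.TotalSeconds / previousElapsed.TotalSeconds;
        double next = Math.Round(previousRows * scale);
        return (int)Math.Max(minRows, Math.Min(maxRows, next));
    }

    // Runs batches until one returns 0 rows. runBatch receives the requested
    // batch size and returns the number of rows it actually processed.
    public static long RunBatches(Func<int, int> runBatch, TimeSpan targetTime, int initialRows)
    {
        long total = 0;
        int batchSize = initialRows;
        while (true)
        {
            var timer = Stopwatch.StartNew();
            int rows = runBatch(batchSize);
            timer.Stop();

            if (rows == 0) return total;
            total += rows;
            batchSize = NextBatchSize(rows, timer.Elapsed, targetTime);
        }
    }
}
```

With a 1 minute target, a batch of 10,000 rows that took 2 minutes would be followed by a request for 5,000 rows, and one that took 30 seconds by a request for 20,000.
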
  16. 16. Conversion - Incremental
        • Converting the data is still a time-consuming process. When we deploy code that uses MongoDB, all the data needs to be converted. Deployments generally take less than an hour. The “20 hours of downtime” discussion is not a great conversation to have with stakeholders.
        • The answer: pre-convert the data! When we deploy, convert only the last 24 hours of data, which may only take minutes.
        • Surveys have a ModifiedOn date field. Using this is the key to converting! We did a lot of work and testing to make sure this field was always updated when a change was made.
        • Surveys are never deleted. A delete flips a deleted flag on the row. This allowed us to not worry about incrementally tracking deletes.
        • A command line switch allowed us to specify the start date of the conversion.
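
As a rough illustration of the incremental pass, the sketch below parses a start date from the command line and selects only rows modified on or after it. The switch name `--since=`, the `SurveyRow` type and the `IncrementalConversion` helper are hypothetical; the slides do not say what the real switch was called. In the real tool the filter would presumably be a `WHERE ModifiedOn >= @since` clause in the SQL Server query rather than an in-memory filter.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for a row of the SQL Server Survey table.
public class SurveyRow
{
    public Guid Id { get; set; }
    public DateTime ModifiedOn { get; set; }
    public bool Deleted { get; set; } // soft-delete flag, never a real delete
}

public static class IncrementalConversion
{
    // Parses a hypothetical "--since=2012-09-26" switch.
    // Returns null when the switch is absent, meaning "convert everything".
    public static DateTime? ParseSince(string[] args)
    {
        const string prefix = "--since=";
        string arg = args.FirstOrDefault(a => a.StartsWith(prefix, StringComparison.Ordinal));
        if (arg == null) return null;
        return DateTime.Parse(arg.Substring(prefix.Length), CultureInfo.InvariantCulture);
    }

    // Selects the rows to (re)convert. Soft-deleted rows are included:
    // this sketch assumes flipping the deleted flag also updates ModifiedOn,
    // so deletes arrive as ordinary modifications.
    public static IEnumerable<SurveyRow> ChangedSince(IEnumerable<SurveyRow> rows, DateTime? since)
    {
        return since == null ? rows : rows.Where(r => r.ModifiedOn >= since.Value);
    }
}
```

One further assumption is needed for this to be safe: a survey converted in the earlier full pass and then modified must overwrite its existing document on the second pass, so the insert step would have to replace by `_id` rather than blindly insert.
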
  17. 17. Deployment Lessons
        • Practice makes perfect. We took stories over 3 sprints (each sprint is 3 weeks) to prepare for the conversion.
        • Always explicitly set your oplog size! The defaults created a 40 GB oplog on the production servers. Since MongoDB uses memory mapped files, that 40 GB oplog was loaded into RAM. The servers have 48 GB of RAM. We resized to a more sane 3 GB.
        • If you have profiling turned on, you can’t fsyncLock the server. We didn’t know this, and it immediately broke the backup scripts the first night. I filed a ticket with 10gen for this, and the documentation now reflects this.
  18. 18. Using MongoDB as a .NET Developer
        • Since most users run MongoDB on Linux, I was concerned about reliability and performance running on Windows. I’m happy to say that MongoDB works very well on Windows and we’ve had no issues.
        • The MongoDB .NET Driver is excellent. It allows raw BsonDocument access, or can map documents to your objects. It has very good LINQ support, and is constantly improving its API.
        • Guids are the primary key for most structures. Working with them is very inconvenient in the shell. In fact, without the “UUID” helper from the C# driver’s git repo, it would be nearly impossible to use the shell to work with Guids.
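
To see why Guids are awkward in the shell, it can help to convert between a .NET `Guid` and the `BinData(3,"…")` form shown on slide 12. The sketch below assumes the documents were written with the C# driver's legacy Guid representation, that is, BinData subtype 3 holding the bytes returned by `Guid.ToByteArray()`. `ShellGuid` and its methods are hypothetical helper names, not part of the driver.

```csharp
using System;

public static class ShellGuid
{
    // Formats a Guid as it would appear in the shell, assuming subtype 3
    // with the .NET byte order produced by Guid.ToByteArray().
    public static string ToBinData(Guid id)
    {
        return "BinData(3,\"" + Convert.ToBase64String(id.ToByteArray()) + "\")";
    }

    // Recovers the Guid from a base64 payload copied out of the shell.
    // The payload must decode to exactly 16 bytes.
    public static Guid FromBase64(string base64)
    {
        return new Guid(Convert.FromBase64String(base64));
    }
}
```

For example, passing the `_id` payload from slide 12, `"vD+ifVfvS0qlk5vN8OPQOQ=="`, to `ShellGuid.FromBase64` would yield the Guid the application sees under that assumption. `Guid.ToByteArray()` stores the first three fields in little-endian order, so the bytes do not read left to right like the Guid's string form, which is part of what makes the raw shell output hard to match by eye.
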
  19. 19. Wrap Up
        • MongoDB was a game changer for TeacherTrack. Think. In. Documents.
        • 10gen is a great company to work with. We are depending on MongoDB, and knowing that the people behind MongoDB were available for us was a huge plus.
        • Pre-conversion and incremental conversion are the keys to minimizing deployment time when working with a large set of data.
        • Most importantly, this was all made possible because of very talented team members at TNTP. You guys rock!
  20. 20. Questions
        Slides will be made available on my blog, located at http://architectryan.com/
