
Meetup#2: Building responsive Symbology & Suggest WebService



Published in: Technology


  1. Building responsive Symbology & Suggest web service with MongoDB
     Andrei Palchys, @apalchys
     Alex Kosau, @alexkosau
  2. Introduction
     • Customer: Thomson Reuters
     • Business domain: Financial markets
     • Goal: Implement Next-Gen financial web services
     • The project started: July 2011
     • The project finished: (Dec 2011)
     • Team: 1 team lead, 5+1 developers, 2 QA
  3. Web services
     • Symbology Web Service
       Provides reference data about financial instruments, via symbols, codes or instrument names
     • Suggest Web Service
  4. Architecture
     [Diagram. Old pipeline, box labels: Sources, ETL, Search Engine, Web services, Front End, Desktop. New pipeline, box labels: Sources, ETL, The New Web Services, Desktop.]
  5. Reasons to write the new web services
     • Bad performance
     • Expensive to scale or extend
     • Some types of data are not easy to manage
  6. Requirements for the new web services
     • Performance
       95% of Symbology requests should complete within 50 ms.
       95% of Suggest requests should complete within 25 ms.
     • Use normalized data
     • Use as little memory as possible
     • Fast data loading into DB
     • Windows environment and .NET platform
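A latency target of this kind can be checked against measured response times with a percentile calculation. The sketch below uses the nearest-rank method; the function names and the sample values are hypothetical and are not measurements from the project.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_target(latencies_ms, limit_ms, pct=95):
    """True if pct percent of the requests finished within limit_ms."""
    return percentile(latencies_ms, pct) <= limit_ms


# Hypothetical latency samples in milliseconds (illustration only).
sample_ms = [12, 18, 20, 22, 25, 27, 30, 31, 35, 38,
             40, 41, 42, 44, 45, 46, 47, 48, 49, 120]

symbology_ok = meets_target(sample_ms, 50)  # checked against the 50 ms limit
suggest_ok = meets_target(sample_ms, 25)    # checked against the 25 ms limit
```

A single slow outlier (120 ms here) does not break a 95% target, which is why the requirement is phrased as a percentile rather than a maximum.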
  7. What we considered from commercial databases
     • Microsoft SQL Server
       • 13 ms, too slow
     • Oracle TimesTen
       • Relational
       • Completely in-memory: guaranteed latency but slow startup
       • Expensive
     • McObject’s eXtremeDB
       • Object DB
       • Native C interface: designed for performance
       • Ultra reliability
       • Still expensive
  8. What we considered from free databases
     • Redis
     • HBase
     • Cassandra
     • RavenDB
     Each of these databases fails one of the requirements
  9. MongoDB
     • Document-oriented
     • Simple to use (decent interface for .NET available)
     • Simple maintenance (monitoring, replication, sharding)
     • Data is kept in memory once it has been used
     • 1 ms average response time
     • Cross-platform (native Windows support)
  10. Databases
      • Symbology DB – about 30 GB of data
      • Suggest DB – >22 GB of data
      [Diagram, box labels: Symbology DB, Suggest DB, Symbology WS, Suggest WS, Suggest WS]
  11. Deployment (planned)
      • 6 “clusters” all around the world (TR data centers), in replica set.
      • “cluster” – 3 servers (replica set + sharding) + 1 arbiter
      • 2 of them are also used to load data.
      • 128 GB of memory per server
  12. Symbology DB: challenge
      • Fast search by full key
      • Minimize the space taken by the data, since it needs to fit into RAM
        • Data is text only (no pictures etc.)
        • The full document is always required
        • Only some fields are used to query data, and these fields are short (3 to 10 characters)
        • New fields should be easy to add to the “queryable” list
      • Composite queries are needed sometimes
        • AB and CD and not EF or GH
      • Fast data loading
  13. Symbology DB: solution
      Map the names of the document fields to ints
      RIC -> 1
      Name -> 2
      { "1": "GOOG.O", "2": "Google" }
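The field-name mapping can be sketched as a pair of lookup tables applied when writing and reading a document. This is a minimal illustration: only the `RIC -> 1` and `Name -> 2` pairs come from the slide, and the helper names `shorten` and `expand` are hypothetical.

```python
# Mapping taken from the slide; a real table would list every field.
FIELD_IDS = {"RIC": 1, "Name": 2}
FIELD_NAMES = {field_id: name for name, field_id in FIELD_IDS.items()}


def shorten(doc):
    """Replace readable field names with their short integer ids (as strings,
    since document keys are strings)."""
    return {str(FIELD_IDS[name]): value for name, value in doc.items()}


def expand(stored):
    """Restore the readable field names when a document is read back."""
    return {FIELD_NAMES[int(key)]: value for key, value in stored.items()}


stored = shorten({"RIC": "GOOG.O", "Name": "Google"})
restored = expand(stored)
```

Because every document repeats its field names, shortening the names saves space on every stored document, at the cost of keeping the mapping table in the application.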
  14. Symbology DB: solution
      Unite all queryable fields into arrays
      • Query syntax is the same
      • Single index – less space occupied
      • Easy to add new searchable data
      "s": [ { "k": 1, "v": "MSFT.O" }, { "k": 2, "v": "Microsoft Inc." } ]
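A rough sketch of this layout: queryable fields are moved into one `s` array of key/value pairs, so one index over that array serves every queryable field. The helper names are hypothetical, field id `3` is invented for the example, and the `$elemMatch` query shape is an assumption about how such a document could be queried; the slides do not show the exact query used at this stage.

```python
def to_stored(doc, queryable_ids):
    """Collect the queryable fields (keyed by int id) into one 's' array."""
    return {"s": [{"k": fid, "v": doc[fid]}
                  for fid in sorted(queryable_ids) if fid in doc]}


def elem_match_query(field_id, value):
    """MongoDB-style filter: some element of 's' has both this k and this v."""
    return {"s": {"$elemMatch": {"k": field_id, "v": value}}}


def matches(stored, field_id, value):
    """Local stand-in for the check $elemMatch performs on one document."""
    return any(e["k"] == field_id and e["v"] == value for e in stored["s"])


# Field 3 is a hypothetical non-queryable field and is left out of 's'.
doc = {1: "MSFT.O", 2: "Microsoft Inc.", 3: "NASDAQ"}
stored = to_stored(doc, queryable_ids={1, 2})
query = elem_match_query(1, "MSFT.O")
```

Making another field searchable then only means adding its id to `queryable_ids`; no new index is needed.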
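  15. Symbology DB: solution
      Combine key and value properties
      • Takes less space
      • Use regex /^a../
      • No performance decrease – MongoDB uses the index for a regex that starts with /^
      "s": [ "MSFT.O|1", "Microsoft Inc.|2" ]
      Query: { s: { $regex: "^MSFT\\.O\\|" } }
      (The "." and "|" are escaped so that they match literally. Unescaped, "|" acts as alternation with an empty branch and the pattern matches every value.)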
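The combined "value|key" strings and the anchored prefix pattern can be sketched as follows. The helper names are hypothetical; Python's `re` module is used only to show how the pattern behaves, on the assumption that escaping of "." and "|" works the same way in the regex engine used by the database.

```python
import re


def combine(value, field_id):
    """Store value and field id as a single string, e.g. 'MSFT.O|1'."""
    return f"{value}|{field_id}"


def prefix_pattern(value):
    """Anchored pattern matching entries whose value part equals `value`.
    re.escape makes '.' and '|' literal instead of regex operators."""
    return "^" + re.escape(value + "|")


s = [combine("MSFT.O", 1), combine("Microsoft Inc.", 2)]

pattern = prefix_pattern("MSFT.O")
hits = [entry for entry in s if re.search(pattern, entry)]

# Without escaping, '|' splits the pattern into '^MSFT.O' OR the empty
# pattern, and the empty pattern matches any string.
naive_hits = [entry for entry in s if re.search("^MSFT.O|", entry)]
```

Here `hits` contains only the `MSFT.O|1` entry, while `naive_hits` contains every entry in `s`.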
  16. Symbology DB: solution
      Compress non-queryable data and store it as a single field (binary data)
      • Encode with Protocol Buffers or MsgPack
        – In our case, MsgPack was 2x faster than Protobuf
      • Compress with Snappy
        – Chosen for speed: one of the fastest compression algorithms
      { "b" : BinData(0,"CgcxMDkwMzcwEgZ1cztJQk0xAAAAAAAA8D86A05ZU0IXTmV3IFlvcmsgU3RvY2sgRXhjaGFuZ2VZAAAAAAAA8D9gAXABeAGJAQAAAAAAAPA/ogEFNDc0MU6qAQU0NzQxTrI…") }
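The encode-then-compress step can be illustrated with standard-library stand-ins. Note the substitution: this sketch uses `json` in place of MsgPack and `zlib` in place of Snappy, so it shows only the shape of the technique, not the formats or the speed of the codecs named on the slide. The field names in the sample are hypothetical.

```python
import json
import zlib


def pack_blob(fields):
    """Serialize the non-queryable fields and compress them into one binary
    value. Stand-ins: json instead of MsgPack, zlib instead of Snappy."""
    encoded = json.dumps(fields, separators=(",", ":")).encode("utf-8")
    return zlib.compress(encoded)


def unpack_blob(blob):
    """Reverse of pack_blob: decompress, then decode."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical non-queryable fields of one instrument.
details = {"exchange": "New York Stock Exchange", "lot_size": 1, "currency": "USD"}

stored = {"s": ["IBM.N|1"], "b": pack_blob(details)}
restored = unpack_blob(stored["b"])
```

The queryable strings stay in `s` where the index can reach them, while everything else travels as one opaque `b` field that is decoded only after the document has been found.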
  17. Symbology DB: solution
      Change the ETL output format to JSON and insert directly into MongoDB.
      This reduced loading time from 9 h to 1 h.
  18. Suggest DB: challenge
      • Fast search by partial text
      • Keep only top 50 entities per term
      • Generate Suggest DB from existing Symbology DB
  19. Suggest DB: solution
      Use “Inverted” index for fast search by partial text
      { "term": "g", "references": […] },
      { "term": "go", "references": […] },
      { "term": "goo", "references": […] },
      { "term": "goog", "references": […] },
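A minimal sketch of such a prefix index: every prefix of an entity name becomes a term, and a lookup by partial text is then a single exact-match read. The function names, the lower-casing, and the sample entities are assumptions made for the illustration.

```python
def prefixes(text):
    """All leading substrings of the lower-cased text: g, go, goo, goog, ..."""
    lowered = text.lower()
    return [lowered[:i] for i in range(1, len(lowered) + 1)]


def build_index(entities):
    """Map each prefix term to the ids of the entities whose name starts with it."""
    index = {}
    for entity_id, name in entities:
        for term in prefixes(name):
            index.setdefault(term, []).append(entity_id)
    return index


def suggest(index, typed):
    """Exact lookup of what the user has typed so far."""
    return index.get(typed.lower(), [])


# Hypothetical entities: (id, name).
index = build_index([("GOOG.O", "Goog"), ("GS.N", "Gold")])
documents = [{"term": t, "references": refs} for t, refs in sorted(index.items())]
```

The price of the fast lookup is storage: a name of length n contributes n terms, which is why the number of generated documents grows so quickly.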
  20. Suggest DB: solution
      Generate Suggest DB from existing Symbology DB
      • About 750 million temporary documents
      • MongoDB Map Reduce is too slow
      • All MongoDB-based algorithms take a lot of time
      Use Amazon Elastic MapReduce! 10 h -> 40 min
      Practical usage Amazon Elastic MapReduce (Viktar Basharymau)
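The generation job has the shape of a map step that emits one record per prefix and a reduce step that keeps the best entries per term. The sketch below runs both steps in a single process only to show the data flow; it is not the Elastic MapReduce job from the talk. The `rank` field is a hypothetical ordering criterion, since the slides do not say how the top 50 are chosen.

```python
import heapq
from collections import defaultdict

TOP_N = 50  # "Keep only top 50 entities per term"


def map_entity(entity):
    """Map step: emit (prefix, (rank, id)) for every prefix of the name."""
    name = entity["name"].lower()
    for i in range(1, len(name) + 1):
        yield name[:i], (entity["rank"], entity["id"])


def reduce_term(term, values, top_n=TOP_N):
    """Reduce step: keep the top_n highest-ranked ids for one term."""
    best = heapq.nlargest(top_n, values)
    return {"term": term, "references": [entity_id for _, entity_id in best]}


def run(entities, top_n=TOP_N):
    """Group the mapped records by term, then reduce each group."""
    grouped = defaultdict(list)
    for entity in entities:
        for term, value in map_entity(entity):
            grouped[term].append(value)
    return {term: reduce_term(term, values, top_n)
            for term, values in grouped.items()}


# Hypothetical input; 'rank' is an assumed popularity score.
entities = [
    {"id": "GOOG.O", "name": "Google", "rank": 90},
    {"id": "GS.N", "name": "Goldman", "rank": 80},
    {"id": "GLD.P", "name": "Gold", "rank": 10},
]
suggest_db = run(entities, top_n=2)
```

Each term is reduced independently of the others, which is what lets the work be spread across many machines.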
  21. .NET MongoDB driver
      - Use IBsonSerializer interface instead of BsonElement attributes
      - Driver has good performance – we have not found any bottlenecks.
  22. Questions?