Delivering big content at NBC News with RavenDB



RavenDB is a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API wrapped by clients in a growing number of languages. In this session, we will discuss the experience of developing and maintaining a RavenDB-backed CMS for one of the largest news sites in the US.

We'll cover:
- Supporting rapid evolution of the content/data model.
- Indexing for full-text, map-reduce, geospatial and other types of search.
- Replicating and sharding across servers and data centers for high availability.
- Deploying with no downtime.
- Handling huge traffic spikes.



NoSql NOW! 2013: Delivering big content at NBC News with RavenDB

A quick tour
Raven server
•  Schema-less document database with a RESTful API.
•  Fully ACID, and all writes are saved to disk (ESENT).
•  Indexing/queries executed with Lucene.NET.
•  Easily extended with custom logic using “bundles”.
•  Management UI provided in Silverlight.
•  Host as a Windows service or IIS app, or embed in your app.
Raven client
•  .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.
•  Wraps the HTTP API.
•  Provides client-side caching, change notifications, and LINQ querying.
•  Easily extended with many, many hooks into almost all operations.
Raven licensing
•  Open source:
   •  License is AGPL (free) or commercial (paid).
   •  Exception: your project can use any OSI-approved license and still use Raven for free.
•  Commercial licenses based on max parallelism and RAM.
•  Windows clustering support and storage compression/encryption available with Enterprise license only.
Demo
Why RavenDB?
NBC News Digital network
•  Includes, and more.
•  1.2 billion pageviews/month.
•  140 million video streams/month.
•  58 million unique users/month.
•  Traffic spikes up to 100x normal when big news events happen.
One of the largest US news sites
•  Very fast page load required
•  “Instant” publish time required
•  6 to 8 code deployments each day
•  High availability: zero* downtime allowed
High availability is when the answer to “What’s the longest outage before you wind up in your boss’s office?” is < 5 seconds.
Credit: Mitch Canter @studionashvegas
Some prerequisites for HA
•  Rolling deployments and rollbacks.
•  Apps and services decoupled physically and temporally.
•  Designed for both auto-failover/recovery and manual reconfiguration by ops.
•  Seamless scale-out by adding instances of any process.
•  And more…
HA data: a private data cloud
•  Data schema can evolve rapidly
•  Apps shouldn’t know where data is
•  Apps should talk to the closest data replica
•  Apps should automatically find a new replica if the closest becomes unavailable
•  Ops can add/remove replicas quickly and easily, without affecting any running apps
Why we chose RavenDB
•  Schema-less document database allows rapid change.
•  Fully ACID model fit business needs.
•  Strong replication functionality supported HA needs.
•  Easily customizable on both client and server.
•  Easily deployed and managed.
•  First-class .NET client.
Current* state of Raven usage
•  Raven used behind:
   •  NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, Xbox, Roku.
   •  Growing number of sections of and
•  Raven usage stats:
   •  ~10 million docs, +1000s of new docs/day.
   •  10s of writes/sec.
   •  100s of reads/sec (after 3 layers of caching).
The details
Client-side caching
•  Each doc cached as long as memory available.
•  Requests include If-Modified-Since header.
•  304 Not Modified response saves bandwidth.
•  Aggressive caching avoids the round-trip. Tunable by ops at runtime (custom).
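The caching behavior on this slide can be modeled in a few lines. This is a toy Python model, not the .NET client: `ToyServer`, `CachingClient`, and `aggressive_seconds` are hypothetical names, and a per-document version number stands in for whatever validator the real conditional request carries. A repeat request gets a bodiless 304, and inside the aggressive-caching window no request is made at all.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


class ToyServer:
    """Hypothetical stand-in for a document server that honors conditional GETs."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, dict]] = {}  # id -> (version, body)
        self.requests = 0  # round-trips actually made

    def put(self, doc_id: str, body: dict) -> None:
        version = self.docs.get(doc_id, (0, {}))[0] + 1
        self.docs[doc_id] = (version, body)

    def get(self, doc_id: str, if_version: Optional[int]) -> Tuple[int, int, Optional[dict]]:
        """Return (status, version, body); body is None on a 304."""
        self.requests += 1
        version, body = self.docs[doc_id]
        if if_version == version:
            return 304, version, None  # Not Modified: no body on the wire
        return 200, version, body


@dataclass
class Entry:
    version: int
    body: dict
    fetched_at: float


class CachingClient:
    def __init__(self, server: ToyServer, aggressive_seconds: float = 0.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.server = server
        self.aggressive_seconds = aggressive_seconds  # ops can change this at runtime
        self.clock = clock
        self.cache: Dict[str, Entry] = {}  # unbounded here; a real cache evicts under memory pressure
        self.not_modified = 0

    def load(self, doc_id: str) -> dict:
        now = self.clock()
        entry = self.cache.get(doc_id)
        # Aggressive caching: trust a recent entry and skip the round-trip entirely.
        if entry is not None and now - entry.fetched_at < self.aggressive_seconds:
            return entry.body
        # Otherwise ask the server, sending the cached version as the validator.
        status, version, body = self.server.get(
            doc_id, entry.version if entry is not None else None)
        if status == 304 and entry is not None:
            self.not_modified += 1
            entry.fetched_at = now  # still fresh; restart the aggressive window
            return entry.body
        self.cache[doc_id] = Entry(version, body, now)
        return body
```

Raising `aggressive_seconds` trades freshness for fewer round-trips, which is why being able to tune it at runtime matters when traffic spikes.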
Raven sharding
•  You define the sharding strategy: a method.
•  Raven manages storing each doc on the correct instance, and fanning out and merging queries.
•  No auto-rebalancing of shards if you change the number of instances.
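A sharding strategy is just a function from a document to a shard. The Python model below uses hypothetical names (`ShardedStore`, `by_region`) and is not the RavenDB sharding API: it stores each document on the shard the strategy picks, and answers a query by fanning out to every shard and merging the partial results.

```python
from typing import Any, Callable, Dict, List

ShardStrategy = Callable[[dict], str]


def by_region(doc: dict) -> str:
    """Hypothetical strategy: route each doc by a 'region' field."""
    return "east" if doc.get("region") == "east" else "west"


class ShardedStore:
    def __init__(self, shard_names: List[str], strategy: ShardStrategy) -> None:
        self.shards: Dict[str, Dict[str, dict]] = {name: {} for name in shard_names}
        self.strategy = strategy

    def store(self, doc: dict) -> str:
        shard = self.strategy(doc)  # the strategy alone decides placement
        self.shards[shard][doc["id"]] = doc
        return shard

    def query(self, predicate: Callable[[dict], bool],
              sort_key: Callable[[dict], Any]) -> List[dict]:
        # Fan out: run the same filter against every shard...
        partials = [[d for d in docs.values() if predicate(d)]
                    for docs in self.shards.values()]
        # ...then merge the partial results into a single ordered list.
        return sorted((d for part in partials for d in part), key=sort_key)
```

Placement is computed once, at store time, so changing the strategy or the shard list later leaves existing documents where they are. That is the no-rebalancing caveat in miniature.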
Raven indexing and querying
•  All queries are performed against indexes.
•  Indexes can be predefined or auto-created.
•  Indexing/queries are executed in Lucene.NET.
•  Fielded.
•  Full text with built-in or custom analyzers.
•  Geo-spatial.
•  Map-reduce.
•  Result transformers can load other docs.
•  Query with LINQ or Lucene syntax.
•  Indexes may be stale. Can force wait for non-stale results. (Danger! Primarily for unit tests.)
•  Projections occur on server, reducing data on the wire.
•  Super-cool stuff: eval patching, index scripts.
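Map-reduce indexes are the least familiar item on this list, so here is the idea in miniature. In RavenDB itself the map and reduce are written as LINQ expressions in .NET; this Python sketch only models the shape of a count-by-tag index, with hypothetical names `map_tags` and `reduce_tags`. The reduce output has the same shape as the map output, which is what lets reduce be re-run over already-reduced results as new documents arrive.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def map_tags(docs: Iterable[dict]) -> List[Row]:
    # Map: emit one {"tag", "count"} row per tag on each document.
    return [{"tag": tag, "count": 1} for doc in docs for tag in doc.get("tags", [])]


def reduce_tags(rows: Iterable[Row]) -> List[Row]:
    # Reduce: group by tag and sum the counts. The output rows have the same
    # shape as the map rows, so reduce can be applied again to its own output.
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["tag"]] += row["count"]
    return [{"tag": tag, "count": count} for tag, count in sorted(totals.items())]
```

Reducing two batches separately and then reducing the combined partial results gives the same answer as reducing everything at once, which is what makes incremental index updates possible.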
Indexing catch-22
•  Need indexes up to date before letting a client talk to a replica.
•  Indexes are created by the client app:
   •  Static: CreateIndexes() at startup scans assemblies for index classes.
   •  Dynamic: when client issues a query.
Updating a static index – a pain
•  Define new index, with no code using it.
•  Deploy and allow new index to build.
•  Redeploy with code using the new index.
•  Redeploy after deleting old index definition.
•  Delete old index on each replica.
Consistency
•  If you do it by Id, it is consistent (within a single Raven server):
   •  Load()
   •  Store()
   •  Delete()
•  Queries are only eventually consistent (“eventually” is measured in milliseconds).
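The difference between by-Id operations and queries is easiest to see in a toy model. In this hypothetical Python sketch (`ToyDatabase`, `run_indexing`, and `wait_for_non_stale` are illustrative names, not the client API), a stored document is visible to `load()` at once, while a query reads an index that lags until indexing runs. It also hints at why waiting for non-stale results is best kept to unit tests: the query cannot return until indexing has caught up.

```python
from typing import Dict, List, Set


class ToyDatabase:
    """Hypothetical model: by-Id reads see writes at once; the index catches up later."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.pending: List[str] = []  # ids written but not yet indexed
        self.by_author: Dict[str, Set[str]] = {}  # a simple single-field index

    def store(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc  # visible to load() immediately
        self.pending.append(doc_id)  # the index will see it later

    def load(self, doc_id: str) -> dict:
        return self.docs[doc_id]  # consistent: reads the document itself

    @property
    def is_stale(self) -> bool:
        return bool(self.pending)

    def run_indexing(self) -> None:
        """Work a real server does asynchronously, shortly after each write."""
        for doc_id in self.pending:
            for ids in self.by_author.values():
                ids.discard(doc_id)  # drop entries from an older version of the doc
            author = self.docs[doc_id]["author"]
            self.by_author.setdefault(author, set()).add(doc_id)
        self.pending.clear()

    def query(self, author: str, wait_for_non_stale: bool = False) -> List[str]:
        if wait_for_non_stale:
            # Simplification: run the indexer inline. A real client would instead
            # block (with a timeout) until the server reports the index is current.
            self.run_indexing()
        return sorted(self.by_author.get(author, set()))
```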
Raven replication
•  Eventual consistency: replication is async, in the background.
•  All replication is one-way and managed by the source.
•  Can enable transitive replication, useful for new instances.
•  Set a W value to wait for replication to a minimum number of instances, or until a timeout (v2.5).
•  Client will auto-failover to replication destinations, configurable to reads only or reads and writes.
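Waiting on a W value amounts to counting acknowledgments against a deadline. This is a generic Python sketch of that pattern, not RavenDB's implementation: `wait_for_replication` and `fake_destination` are hypothetical, and each destination is modeled as a callable that pushes the write and returns whether it was accepted. In this model the write has already been committed on the source; a timeout only tells the caller that replication to W instances wasn't confirmed in time.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence

Replicate = Callable[[], bool]  # hypothetical: push the write to one destination


def wait_for_replication(destinations: Sequence[Replicate], w: int, timeout: float) -> bool:
    """Return True once at least `w` destinations acknowledge, False otherwise."""
    if w <= 0:
        return True
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(destinations)))
    futures = [pool.submit(push) for push in destinations]
    acks = 0
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None and fut.result():
                acks += 1
                if acks >= w:
                    return True
    except cf.TimeoutError:
        pass  # not enough acknowledgments in time; the source's write still stands
    finally:
        pool.shutdown(wait=False)  # don't block the caller on slow destinations
    return False


def fake_destination(delay: float, ok: bool = True) -> Replicate:
    """Hypothetical destination for the demo: answers `ok` after `delay` seconds."""
    def push() -> bool:
        time.sleep(delay)
        return ok
    return push
```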
Etags
•  Sequential guids.
•  Unique for every write to a database.
•  Used for caching in client, concurrency control, and replication.
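Sequential etags make optimistic concurrency control straightforward. This Python sketch is hypothetical (`EtagStore`, `ConcurrencyError`, and the `expected` parameter are illustrative). It builds each etag with `uuid.UUID(int=...)` from a counter, which gives unique, ordered 128-bit values but is not RavenDB's actual etag layout. A write that names the etag it last saw is rejected if another write has happened since.

```python
import itertools
import uuid
from typing import Dict, Optional, Tuple


class ConcurrencyError(Exception):
    pass


class EtagStore:
    """Toy store: every write gets the next sequential 128-bit etag."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.docs: Dict[str, Tuple[uuid.UUID, dict]] = {}

    def _next_etag(self) -> uuid.UUID:
        # Unique per write and ordered: a later write always compares greater.
        return uuid.UUID(int=next(self._counter))

    def get(self, doc_id: str) -> Tuple[uuid.UUID, dict]:
        return self.docs[doc_id]

    def put(self, doc_id: str, body: dict, expected: Optional[uuid.UUID] = None) -> uuid.UUID:
        current = self.docs.get(doc_id)
        # Optimistic concurrency: reject the write if someone else wrote first.
        if expected is not None and (current is None or current[0] != expected):
            raise ConcurrencyError(f"{doc_id}: etag mismatch")
        etag = self._next_etag()
        self.docs[doc_id] = (etag, body)
        return etag
```

The same ordering property is what the caching and replication slides rely on: "newer than etag X" is a simple comparison.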
The replication conversation
Source: What’s the last etag I replicated to you?
Destination: 42
Source: I’m up to 49, so here’s a POST with some docs in it.
Destination: Got ’em.
Source: What’s the last etag I replicated to you?
Destination: 49
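The conversation above maps onto a small, source-driven loop. This is a toy Python model with hypothetical names (`Node`, `replicate`) and integer etags as on the slide; it is not RavenDB's replication bundle or wire protocol. The destination remembers the last etag it received from each source, and the source sends only what comes after it.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str, dict]  # (etag, doc_id, body)


class Node:
    """Toy database with an append-only write log and integer etags."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[LogEntry] = []
        self.docs: Dict[str, dict] = {}
        self.last_etag = 0
        self.last_seen_from: Dict[str, int] = {}  # destination side: per-source progress

    def write(self, doc_id: str, body: dict) -> int:
        self.last_etag += 1
        self.log.append((self.last_etag, doc_id, body))
        self.docs[doc_id] = body
        return self.last_etag

    def last_replicated_etag(self, source: str) -> int:
        return self.last_seen_from.get(source, 0)

    def receive(self, source: str, batch: List[LogEntry]) -> None:
        for etag, doc_id, body in batch:
            self.docs[doc_id] = body  # not added to self.log, so never forwarded
            self.last_seen_from[source] = etag


def replicate(source: Node, dest: Node, batch_size: int = 100) -> int:
    """One round, driven by the source (replication is one-way). Returns docs sent."""
    since = dest.last_replicated_etag(source.name)  # "What's the last etag I replicated to you?"
    batch = [e for e in source.log if e[0] > since][:batch_size]  # "I'm up to N, here are some docs"
    if batch:
        dest.receive(source.name, batch)  # the POST; "Got 'em."
    return len(batch)
```

Because `receive()` doesn't append incoming documents to the destination's own log, the destination never passes them on. Transitive replication would mean logging them too, under fresh local etags.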
Multi-master replication
•  Replication from each instance to all other instances.
•  Any instance could receive writes.
•  Reduce replication conflicts by forcing writes to a single “master”.
•  Handle conflicts in your app or with a custom server bundle (in our case, a “last in wins” bundle).
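A “last in wins” policy reduces to picking the newest version deterministically. This hypothetical Python sketch (`Version` and `last_in_wins` are illustrative, and this is not the bundle the team wrote) breaks timestamp ties by origin server name, so every replica picks the same winner whatever order the conflicting versions arrive in.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Version:
    doc_id: str
    body: dict
    last_modified: datetime  # timestamp recorded by the server that accepted the write
    origin: str  # name of that server


def last_in_wins(conflicting: Iterable[Version]) -> Version:
    """Pick the most recently modified version; break ties by origin name."""
    # A total order means every replica resolves the same conflict
    # to the same winner, no matter which version it saw first.
    return max(conflicting, key=lambda v: (v.last_modified, v.origin))
```

The policy is only as good as the servers' clocks: with clock skew, a write that really happened later can lose.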
Id generation
•  Null Id and tag can be extracted: client generates with Hi-Lo.
•  Null Id received at server: guid.
•  Id ending in / received at server: append auto-increment integer.
•  Otherwise: use the value in the object.
•  Server prefix protects against edge-case failures.
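The Id rules on this slide can be sketched as two pieces: a client-side Hi-Lo generator and the server's fallback rules. Both are hypothetical Python models (`HiLoGenerator`, `reserve_block`, `server_assign_id`), not the .NET client's conventions, and the server prefix is left out because the deck doesn't describe its format. Hi-Lo costs one round-trip per block of Ids, and because blocks come from a shared counter, two clients never hand out the same number.

```python
import uuid
from typing import Callable, Dict, Optional


class HiLoGenerator:
    """Client-side Hi-Lo: reserve a block of Ids once, then hand them out locally."""

    def __init__(self, tag: str, reserve_block: Callable[[], int], capacity: int = 32) -> None:
        self.tag = tag  # collection tag, e.g. "articles"
        self.reserve_block = reserve_block  # returns the next unused "hi" value
        self.capacity = capacity
        self._next = 0
        self._limit = 0  # exclusive upper bound of the current block

    def next_id(self) -> str:
        if self._next >= self._limit:
            hi = self.reserve_block()  # the only round-trip, once per block
            self._next = hi * self.capacity + 1
            self._limit = (hi + 1) * self.capacity + 1
        n = self._next
        self._next += 1
        return f"{self.tag}/{n}"


def server_assign_id(requested: Optional[str], counters: Dict[str, int]) -> str:
    """Server-side fallbacks from the slide, applied to the Id the client sent."""
    if requested is None:
        return str(uuid.uuid4())  # no Id at all: use a guid (random uuid4 here)
    if requested.endswith("/"):
        counters[requested] = counters.get(requested, 0) + 1
        return f"{requested}{counters[requested]}"  # "videos/" -> "videos/1", "videos/2", ...
    return requested  # otherwise keep the Id the client chose
```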
Raven operations tasks
•  Control where reads and writes go. Implemented in a custom DocumentStore wrapper.
•  Control aggressive caching time.
•  Deploy new instances with replication.
•  Backup, but probably never restore in production.
•  Copy indexes.
•  Monitor with stats endpoints.
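The deck doesn't show the team's DocumentStore wrapper, so the following is only a hypothetical Python sketch of the idea (`Router`, `ReplicaUnavailable`, `read_order`, and `failover_writes` are illustrative names). Reads try the closest replica first and fail over down the list. Writes go to a single master, which limits replication conflicts, and can optionally fail over too, mirroring the reads-only versus reads-and-writes choice described for client failover.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Replica = Callable[[str, str], Any]  # hypothetical: (operation, doc_id) -> result


class ReplicaUnavailable(Exception):
    pass


class Router:
    def __init__(self, replicas: Dict[str, Replica], read_order: List[str],
                 master: str, failover_writes: bool = False) -> None:
        self.replicas = replicas
        self.read_order = read_order  # closest replica first; ops may reorder at runtime
        self.master = master  # all writes go here to limit replication conflicts
        self.failover_writes = failover_writes  # reads-only vs. reads-and-writes failover

    def _first_available(self, names: List[str], op: str, doc_id: str) -> Tuple[str, Any]:
        last_error: Optional[Exception] = None
        for name in names:
            try:
                return name, self.replicas[name](op, doc_id)
            except ReplicaUnavailable as err:
                last_error = err  # try the next replica in the list
        raise ReplicaUnavailable(f"no replica could {op} {doc_id}") from last_error

    def read(self, doc_id: str) -> Tuple[str, Any]:
        return self._first_available(self.read_order, "read", doc_id)

    def write(self, doc_id: str) -> Tuple[str, Any]:
        names = [self.master]
        if self.failover_writes:
            names += [n for n in self.read_order if n != self.master]
        return self._first_available(names, "write", doc_id)
```

Keeping the routing table in one object means ops can change `read_order` or `master` at runtime without touching the apps that use it.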
Keep in mind…
•  Modeling/versioning
•  Replication
•  Client failover
•  Consistency
•  Concurrency control
•  Indexing and updates
•  Id generation
•  Caching
More info on Raven
•  GitHub:
•  Ayende’s blog:
•  RavenDB Google group
•  @RavenDB on Twitter
•  Me: @jtbennett on Twitter
Questions?
Many thanks to:
•  You.
•  NoSql NOW!
•  Huge.
•  Rhinos: @ayende, @synhershko.
•  Peacocks: @benlakey, @johncoder, @pkdotnet, Colin Hicks, Peter Durham, Bryan Wheeler.
45 Main St. #220, Brooklyn, NY 11201, +1 718 625 4843