Delivering big content at NBC News with RavenDB


RavenDB is a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API wrapped by clients in a growing number of languages. In this session, we will discuss the experience of developing and maintaining a RavenDB-backed CMS for one of the largest news sites in the US.

We'll cover:
- Supporting rapid evolution of the content/data model.
- Indexing for full-text, map-reduce, geospatial and other types of search.
- Replicating and sharding across servers and data centers for high availability.
- Deploying with no downtime.
- Handling huge traffic spikes.



Usage Rights

CC Attribution-NonCommercial-ShareAlike License

Presentation Transcript

  • NoSql NOW! 2013 Delivering big content at NBC News with RavenDB
  • A quick tour
  • Raven server
      • Schema-less document database with RESTful API.
      • Fully ACID and all writes saved to disk (ESENT).
      • Indexing/queries executed with Lucene.NET.
      • Easily extended with custom logic using “bundles”.
      • Management UI provided in Silverlight.
      • Host as Windows Service, IIS app, or embedded in your app.
  • Raven client
      • .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.
      • Wraps HTTP API.
      • Provides client-side caching, change notification, LINQ querying.
      • Easily extended with many, many hooks into almost all operations.
  • Raven licensing
      • Open source:
      • License is AGPL (free) or commercial (paid).
      • Exception: Your project can use any OSI-approved license and still use Raven for free.
      • Commercial licenses based on max parallelism and RAM.
      • Windows clustering support and storage compression/encryption available with Enterprise license only.
  • Demo
  • Why RavenDB?
  • NBC News Digital network
      • Includes, and more.
      • 1.2 billion pageviews/month.
      • 140 million video streams/month.
      • 58 million unique users/month.
      • Traffic spikes up to 100x normal when big news events happen.
  • One of the largest US news sites
      • Very fast page load required
      • “Instant” publish time required
      • 6 to 8 code deployments each day
      • High availability: zero* downtime allowed
  • High availability is when the answer to: “What’s the longest outage before you wind up in your boss’s office?” is < 5 seconds.
  • Credit: Mitch Canter @studionashvegas
  • Some prerequisites for HA
      • Rolling deployments and rollbacks.
      • Apps and services decoupled physically and temporally.
      • Designed for both auto-failover/recovery and manual reconfiguration by ops.
      • Seamless scale out by adding instances of any process.
      • And more…
  • HA data: a private data cloud
      • Data schema can evolve rapidly
      • Apps shouldn’t know where data is
      • Apps should talk to the closest data replica
      • Apps should automatically find a new replica if the closest becomes unavailable
      • Ops can add/remove replicas quickly and easily, without affecting any running apps
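The "closest replica, with automatic fallback" requirement can be illustrated with a minimal sketch. This is not RavenDB client code: the `Replica` type, the latency figures, and the replica names are all hypothetical, and a real client would measure health and distance itself.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    latency_ms: float      # hypothetical measured latency from this app
    healthy: bool = True   # hypothetical health flag maintained by the client


def pick_replica(replicas):
    """Return the lowest-latency healthy replica, or raise if none is left."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available")
    return min(candidates, key=lambda r: r.latency_ms)


replicas = [Replica("dc1-a", 2.0), Replica("dc1-b", 3.5), Replica("dc2-a", 40.0)]
first_choice = pick_replica(replicas).name

# The closest replica goes away; the app transparently moves to the next one.
replicas[0].healthy = False
second_choice = pick_replica(replicas).name
```

Because the app only ever asks for "a replica", ops can add or remove entries from the list without the app knowing where the data lives.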
  • Why we chose RavenDB
      • Schema-less document database allows rapid change.
      • Fully ACID model fit business needs.
      • Strong replication functionality supported HA needs.
      • Easily customizable on both client and server.
      • Easily deployed and managed.
      • First class .NET client.
  • Current* state of Raven usage
      • Raven used behind:
          • NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, XBox, Roku.
          • Growing number of sections of and
      • Raven usage stats:
          • ~10 million docs, +1000s of new docs/day.
          • 10s of writes/sec.
          • 100s of reads/sec (after 3 layers of caching).
  • The details
  • Client-side caching
      • Each doc cached as long as memory available.
      • Requests include If-Modified-Since header.
      • 304 Not Modified response saves bandwidth.
      • Aggressive caching avoids the round-trip. Tunable by ops at runtime (custom).
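The two caching levels on this slide can be sketched as a small simulation. It is an illustrative model only, not the RavenDB client: `FakeServer`, `CachingClient`, and the `aggressive_seconds` setting are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```python
class FakeServer:
    """Stand-in for an HTTP document endpoint that honors a conditional header."""

    def __init__(self):
        self.docs = {}      # doc_id -> (body, modified_at)
        self.requests = 0   # counts round-trips made by clients

    def put(self, doc_id, body, modified_at):
        self.docs[doc_id] = (body, modified_at)

    def get(self, doc_id, if_modified_since=None):
        self.requests += 1
        body, modified_at = self.docs[doc_id]
        if if_modified_since is not None and modified_at <= if_modified_since:
            return 304, None, modified_at   # Not Modified: no body on the wire
        return 200, body, modified_at


class CachingClient:
    def __init__(self, server, aggressive_seconds=0):
        self.server = server
        self.aggressive_seconds = aggressive_seconds
        self.cache = {}   # doc_id -> (body, modified_at, checked_at)

    def load(self, doc_id, now):
        entry = self.cache.get(doc_id)
        # Aggressive caching: trust the cached copy and skip the round-trip.
        if entry and now - entry[2] < self.aggressive_seconds:
            return entry[0]
        since = entry[1] if entry else None
        status, body, modified_at = self.server.get(doc_id, since)
        if status == 304:
            body = entry[0]   # reuse the cached body
        self.cache[doc_id] = (body, modified_at, now)
        return body


server = FakeServer()
server.put("articles/1", {"title": "v1"}, modified_at=100)
client = CachingClient(server, aggressive_seconds=5)

client.load("articles/1", now=200)            # round-trip, 200 with body
client.load("articles/1", now=203)            # served from cache, no round-trip
client.load("articles/1", now=210)            # round-trip, 304 Not Modified
server.put("articles/1", {"title": "v2"}, modified_at=211)
latest = client.load("articles/1", now=220)   # round-trip, 200 with new body
```

The trade-off is visible in the second call: aggressive caching saves a request but would also have hidden an update made inside the five-second window.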
  • Raven sharding
      • You define sharding strategy – a method.
      • Raven manages storing each doc to the correct instance and fanning/merging queries.
      • No auto-rebalancing of shards if you change number of instances.
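A sharding strategy of the kind described here boils down to one function that maps a document to an instance, plus fan-out and merge for queries. The sketch below is a toy model under stated assumptions: `shard_for`, `ShardedStore`, the shard names, and the choice of routing on a `section` field are all hypothetical.

```python
def shard_for(doc, shard_names):
    """Hypothetical strategy: route by a stable hash of the doc's section."""
    index = sum(doc["section"].encode()) % len(shard_names)
    return shard_names[index]


class ShardedStore:
    def __init__(self, shard_names, strategy):
        self.names = list(shard_names)
        self.shards = {name: {} for name in self.names}
        self.strategy = strategy

    def store(self, doc):
        shard = self.strategy(doc, self.names)
        self.shards[shard][doc["id"]] = doc
        return shard

    def query(self, predicate):
        # Fan the query out to every shard, then merge the partial results.
        results = []
        for docs in self.shards.values():
            results.extend(d for d in docs.values() if predicate(d))
        return sorted(results, key=lambda d: d["id"])


store = ShardedStore(["shard-a", "shard-b"], shard_for)
news_shard = store.store({"id": "articles/1", "section": "news"})
world_shard = store.store({"id": "articles/2", "section": "world"})
same_shard = store.store({"id": "articles/3", "section": "news"})
everything = store.query(lambda d: True)
```

Note that `shard_for` depends on `len(shard_names)`, so adding an instance changes where existing documents would be routed. That is the practical meaning of "no auto-rebalancing": moving the already-stored documents is left to you.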
  • Raven indexing and querying
      • All queries are performed against indexes.
      • Indexes can be predefined or auto-created.
      • Indexing/queries are executed in Lucene.NET.
          • Fielded.
          • Full text with built-in or custom analyzers.
          • Geo-spatial.
          • Map-reduce.
      • Result transformers can load other docs.
      • Query with LINQ or Lucene syntax.
      • Indexes may be stale. Can force wait for non-stale results. (Danger! Primarily for unit tests.)
      • Projections occur on server, reducing data on the wire.
      • Super-cool stuff: eval patching, index scripts.
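For readers new to map-reduce indexes, the idea can be shown in a few lines. This is a language-neutral illustration, not a RavenDB index definition (those are written in .NET); the document shape and the function names are assumptions.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["section"], 1


def reduce_counts(mapped):
    """Reduce step: combine all values that share a key."""
    totals = defaultdict(int)
    for key, count in mapped:
        totals[key] += count
    return dict(totals)


docs = [
    {"id": "articles/1", "section": "news"},
    {"id": "articles/2", "section": "sports"},
    {"id": "articles/3", "section": "news"},
]
articles_per_section = reduce_counts(pair for d in docs for pair in map_doc(d))
```

A query against such an index reads the precomputed totals instead of scanning the documents, which is why the results can lag slightly behind the latest writes.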
  • Indexing catch-22
      • Need indexes up to date before letting a client talk to a replica.
      • Indexes are created by the client app:
          • Static: CreateIndexes() at startup scans assemblies for index classes.
          • Dynamic: when client issues a query.
  • Updating a static index – a pain
      • Define new index, with no code using it.
      • Deploy and allow new index to build.
      • Redeploy with code using the new index.
      • Redeploy after deleting old index definition.
      • Delete old index on each replica.
  • Consistency
      • If you do it by Id, it is consistent (within a single Raven server)
          • Load()
          • Store()
          • Delete()
      • Queries are only eventually consistent (“eventually” is measured in milliseconds)
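The difference between by-Id operations and queries can be modeled with a store whose index is updated by a separate step. This is a deliberately simplified sketch: `ToyStore`, `run_indexer`, and the stale flag are hypothetical, and the real indexer runs continuously in the background rather than on demand.

```python
class ToyStore:
    def __init__(self):
        self.docs = {}      # doc_id -> doc, updated synchronously
        self.index = {}     # section -> set of doc ids, updated later
        self.pending = []   # ids written but not yet indexed

    def store(self, doc):
        self.docs[doc["id"]] = doc
        self.pending.append(doc["id"])

    def load(self, doc_id):
        # By-Id reads go straight to the document, so they see the write.
        return self.docs.get(doc_id)

    def query_by_section(self, section):
        # Queries read the index; the second value reports staleness.
        return sorted(self.index.get(section, set())), bool(self.pending)

    def run_indexer(self):
        while self.pending:
            doc = self.docs[self.pending.pop(0)]
            self.index.setdefault(doc["section"], set()).add(doc["id"])


store = ToyStore()
store.store({"id": "articles/1", "section": "news"})

loaded = store.load("articles/1")                    # visible immediately
before = store.query_by_section("news")              # index has not caught up
store.run_indexer()
after = store.query_by_section("news")               # index is now current
```

Code that needs read-your-own-writes behavior should therefore load by Id rather than query.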
  • Raven replication
      • Eventual consistency – replication is async in background.
      • All replication is one-way and managed by source.
      • Can enable transitive replication – useful for new instances.
      • Set W value to ensure replication to minimum number of instances (v2.5). Or timeout.
      • Client will auto-failover to replication destinations, configurable to reads only or reads and writes.
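The "W value or timeout" rule can be stated as a small predicate. The slide does not describe how RavenDB implements it, so this is only a sketch of the decision itself; the function name and the use of simulated acknowledgement latencies are assumptions.

```python
def replicated_to_quorum(ack_latencies_ms, w, timeout_ms):
    """Return True if at least w destinations acknowledged within the timeout.

    ack_latencies_ms holds one entry per destination: the simulated time it
    took to acknowledge the write, or None if the destination never answered.
    """
    acked = sum(
        1 for latency in ack_latencies_ms
        if latency is not None and latency <= timeout_ms
    )
    return acked >= w


two_of_three = replicated_to_quorum([5, 20, None], w=2, timeout_ms=50)
all_three = replicated_to_quorum([5, 20, None], w=3, timeout_ms=50)
slow_replica = replicated_to_quorum([5, 200, None], w=2, timeout_ms=50)
```

A higher W trades write latency for durability: the write is only reported as replicated once more instances have confirmed it.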
  • Etags
      • Sequential guids.
      • Unique for every write to a database.
      • Used for caching in client, concurrency control, and replication.
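One of the uses listed here, concurrency control, can be sketched with a toy database that issues a new etag on every write. Plain integers stand in for sequential guids, and `ToyDatabase`, `put`, and `ConcurrencyError` are hypothetical names rather than RavenDB API.

```python
import itertools


class ConcurrencyError(Exception):
    """Raised when a writer's expected etag no longer matches the stored one."""


class ToyDatabase:
    def __init__(self):
        self._next_etag = itertools.count(1)   # one sequence per database
        self.docs = {}                          # doc_id -> (etag, body)

    def put(self, doc_id, body, expected_etag=None):
        current = self.docs.get(doc_id)
        if expected_etag is not None and (
            current is None or current[0] != expected_etag
        ):
            raise ConcurrencyError(doc_id)
        etag = next(self._next_etag)            # every write gets a fresh etag
        self.docs[doc_id] = (etag, body)
        return etag


db = ToyDatabase()
first = db.put("articles/1", {"title": "v1"})
second = db.put("articles/1", {"title": "v2"}, expected_etag=first)

# A writer still holding the first etag is rejected instead of overwriting v2.
try:
    db.put("articles/1", {"title": "stale edit"}, expected_etag=first)
    rejected = False
except ConcurrencyError:
    rejected = True
```

Because etags only ever increase, the same value also tells a cache or a replication source whether it has already seen a given write.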
  • The replication conversation
      Source: What’s the last etag I replicated to you?
      Destination: 42
      Source: I’m up to 49, so here’s a POST with some docs in it.
      Destination: Got ‘em.
      Source: What’s the last etag I replicated to you?
      Destination: 49
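The conversation above can be replayed with two small classes. This is an in-memory model of the exchange, not the wire protocol: `Source`, `Destination`, and their methods are hypothetical, and integers stand in for etags.

```python
class Destination:
    def __init__(self):
        self.docs = {}
        self.last_etag = {}   # source name -> last etag received from it

    def last_replicated_etag(self, source_name):
        return self.last_etag.get(source_name, 0)

    def receive(self, source_name, batch):
        for etag, doc_id, body in batch:
            self.docs[doc_id] = body
            self.last_etag[source_name] = etag


class Source:
    def __init__(self, name):
        self.name = name
        self.log = []         # (etag, doc_id, body) in etag order

    def write(self, doc_id, body):
        etag = len(self.log) + 1
        self.log.append((etag, doc_id, body))

    def replicate_to(self, destination):
        # "What's the last etag I replicated to you?"
        last = destination.last_replicated_etag(self.name)
        # "I'm up to N, so here's a batch with everything after that."
        batch = [entry for entry in self.log if entry[0] > last]
        if batch:
            destination.receive(self.name, batch)
        return len(batch)


source, destination = Source("dc1"), Destination()
for n in range(1, 43):
    source.write(f"articles/{n}", {"n": n})
source.replicate_to(destination)            # destination is now at etag 42

for n in range(43, 50):
    source.write(f"articles/{n}", {"n": n})
sent = source.replicate_to(destination)     # only etags 43..49 are sent
caught_up = destination.last_replicated_etag("dc1")
```

The source drives the whole exchange and the destination only remembers one number per source, which is what makes replication one-way and cheap to resume after an outage.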
  • Multi-master replication
      • Replication from each instance to all other instances.
      • Any instance could receive writes.
      • Reduce replication conflicts by forcing writes to single “master”.
      • Handle conflicts in your app or with custom server bundle – in our case, “last in wins” bundle.
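A "last in wins" policy amounts to choosing the most recently modified of the conflicting versions. The sketch below shows only that rule; the talk does not show the bundle's code, so the `last_modified` field, the version shape, and the function name are assumptions.

```python
def last_in_wins(conflicting_versions):
    """Resolve a conflict by keeping the version with the latest timestamp."""
    return max(conflicting_versions, key=lambda v: v["last_modified"])


conflict = [
    {"origin": "dc1", "last_modified": 1700000005, "title": "Storm update"},
    {"origin": "dc2", "last_modified": 1700000009, "title": "Storm update 2"},
]
winner = last_in_wins(conflict)
```

The policy is simple and never blocks on a human, but the losing write is discarded silently, which is one reason to route writes to a single master in the first place.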
  • Id generation
      • Null Id and tag can be extracted: client generates with Hi-Lo
      • Null Id received at server: guid
      • Id ending in / received at server: append auto-increment integer.
      • Otherwise: use the value in the object.
      • Server prefix protects against edge-case failures.
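The first four rules can be written down as a short sketch. It is a model of the decision table on the slide, not RavenDB's implementation: `HiLoGenerator`, `ToyServer`, the block size, and the id formats are hypothetical, and the server prefix mentioned in the last bullet is not modeled.

```python
import itertools
import uuid


class HiLoGenerator:
    """Client side: reserve a block of numbers, then hand them out locally."""

    def __init__(self, reserve_block, block_size=32):
        self.reserve_block = reserve_block   # callable returning a block number
        self.block_size = block_size
        self.current = 0
        self.limit = 0

    def next_id(self, tag):
        if self.current >= self.limit:
            block = self.reserve_block()     # the only step needing the server
            self.current = block * self.block_size
            self.limit = self.current + self.block_size
        self.current += 1
        return f"{tag}/{self.current}"


class ToyServer:
    def __init__(self):
        self.counters = {}   # id prefix -> last auto-increment value

    def assign_id(self, doc_id):
        if doc_id is None:
            return str(uuid.uuid4())                  # null id: use a guid
        if doc_id.endswith("/"):
            n = self.counters.get(doc_id, 0) + 1      # trailing slash: append
            self.counters[doc_id] = n                 # an auto-increment value
            return f"{doc_id}{n}"
        return doc_id                                 # otherwise: keep as given


blocks = itertools.count(0)
hilo = HiLoGenerator(lambda: next(blocks), block_size=2)
client_ids = [hilo.next_id("articles") for _ in range(3)]

server = ToyServer()
incremented = [server.assign_id("articles/"), server.assign_id("articles/")]
explicit = server.assign_id("articles/breaking-news")
generated = server.assign_id(None)
```

Hi-Lo keeps ids short and readable while letting the client create many documents per server round-trip, since a new block is reserved only when the current one runs out.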
  • Raven operations tasks
      • Control where reads and writes go. Implemented in a custom DocumentStore wrapper.
      • Control aggressive caching time.
      • Deploy new instances with replication.
      • Backup – but probably never restore in production.
      • Copy indexes.
      • Monitor with stats endpoints.
  • Keep in mind…
      • Modeling/versioning
      • Replication
      • Client failover
      • Consistency
      • Concurrency control
      • Indexing and updates
      • Id generation
      • Caching
  • More info on Raven
      • GitHub:
      • Ayende’s blog:
      • RavenDB Google group
      • @RavenDB on Twitter
      • Me: @jtbennett on Twitter
  • Questions?
  • Many thanks to: You. NoSql NOW! Huge. Rhinos: @ayende, @synhershko. Peacocks: @benlakey, @johncoder, @pkdotnet, Colin Hicks, Peter Durham, Bryan Wheeler.
  • 45 Main St. #220 Brooklyn, NY 11201 +1 718 625 4843