NoSql NOW! 2013
Delivering big content at NBC News
with RavenDB

RavenDB is a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API wrapped by clients in a growing number of languages. In this session, we will discuss the experience of developing and maintaining a RavenDB-backed CMS for one of the largest news sites in the US.

We'll cover:
- Supporting rapid evolution of the content/data model.
- Indexing for full-text, map-reduce, geospatial and other types of search.
- Replicating and sharding across servers and data centers for high-availability.
- Deploying with no downtime.
- Handling huge traffic spikes.

1. NoSql NOW! 2013
Delivering big content at NBC News with RavenDB
2. A quick tour
3. Raven server
•  Schema-less document database with RESTful API.
•  Fully ACID and all writes saved to disk (ESENT).
•  Indexing/queries executed with Lucene.NET.
•  Easily extended with custom logic using “bundles”.
•  Management UI provided in Silverlight.
•  Host as Windows Service, IIS app, or embedded in your app.
4. Raven client
•  .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.
•  Wraps HTTP API.
•  Provides client-side caching, change notification, LINQ querying.
•  Easily extended with many, many hooks into almost all operations.
5. Raven licensing
•  Open source: http://github.com/ravendb/ravendb
•  License is AGPL (free) or commercial (paid).
•  Exception: Your project can use any OSI-approved license and still use Raven for free.
•  Commercial licenses based on max parallelism and RAM.
•  Windows clustering support and storage compression/encryption available with Enterprise license only.
6. Demo
7. Why RavenDB?
8. NBC News Digital network
•  Includes nbcnews.com, today.com and more.
•  1.2 billion pageviews/month.
•  140 million video streams/month.
•  58 million unique users/month.
•  Traffic spikes up to 100x normal when big news events happen.
9. One of the largest US news sites
•  Very fast page load required
•  “Instant” publish time required
•  6 to 8 code deployments each day
•  High availability: zero* downtime allowed
10. High availability is when the answer to: “What’s the longest outage before you wind up in your boss’s office?” is < 5 seconds.
11. Credit: Mitch Canter @studionashvegas http://twitpic.com/z13bw
12. Some prerequisites for HA
•  Rolling deployments and rollbacks.
•  Apps and services decoupled physically and temporally.
•  Designed for both auto-failover/recovery and manual reconfiguration by ops.
•  Seamless scale out by adding instances of any process.
•  And more…
13. HA data: a private data cloud
•  Data schema can evolve rapidly
•  Apps shouldn’t know where data is
•  Apps should talk to the closest data replica
•  Apps should automatically find a new replica if the closest becomes unavailable
•  Ops can add/remove replicas quickly and easily, without affecting any running apps
14. Why we chose RavenDB
•  Schema-less document database allows rapid change.
•  Fully ACID model fit business needs.
•  Strong replication functionality supported HA needs.
•  Easily customizable on both client and server.
•  Easily deployed and managed.
•  First-class .NET client.
15. Current* state of Raven usage
•  Raven used behind:
   •  NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, Xbox, Roku.
   •  Growing number of sections of nbcnews.com and today.com.
•  Raven usage stats:
   •  ~10 million docs, with 1000s of new docs/day.
   •  10s of writes/sec.
   •  100s of reads/sec (after 3 layers of caching).
16. The details
17. Client-side caching
•  Each doc cached as long as memory is available.
•  Requests include If-Modified-Since header.
•  304 Not Modified response saves bandwidth.
•  Aggressive caching avoids the round-trip. Tunable by ops at runtime (custom).
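Here is a minimal Python sketch of the two caching modes on this slide. It uses a toy in-memory server in place of Raven's HTTP API, and numeric timestamps stand in for the Last-Modified/If-Modified-Since headers. `FakeServer`, `CachingClient` and `aggressive_seconds` are hypothetical names. A conditional request still costs a round-trip but carries no body on a 304. The aggressive window skips the request entirely, at the cost of possibly serving stale data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
import time


@dataclass
class FakeServer:
    # Hypothetical stand-in for the document server: id -> (body, last-modified time).
    docs: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    requests: int = 0      # round-trips received
    not_modified: int = 0  # how many of them were answered with 304

    def get(self, doc_id: str, if_modified_since: Optional[float]) -> Tuple[int, Optional[str], float]:
        self.requests += 1
        body, modified = self.docs[doc_id]
        if if_modified_since is not None and modified <= if_modified_since:
            self.not_modified += 1
            return 304, None, modified  # headers only, no body on the wire
        return 200, body, modified


@dataclass
class CachingClient:
    server: FakeServer
    aggressive_seconds: float = 0.0  # 0 disables aggressive caching; ops could change it at runtime
    # id -> (body, last-modified time from the server, local time it was fetched)
    cache: Dict[str, Tuple[str, float, float]] = field(default_factory=dict)

    def load(self, doc_id: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        cached = self.cache.get(doc_id)
        if cached is not None and now - cached[2] < self.aggressive_seconds:
            return cached[0]  # aggressive caching: no request at all
        since = cached[1] if cached is not None else None
        status, body, modified = self.server.get(doc_id, since)
        if status == 304 and cached is not None:
            body = cached[0]  # reuse the cached copy
        self.cache[doc_id] = (body, modified, now)
        return body
```

Raising `aggressive_seconds` during a traffic spike trades freshness for fewer requests. That is the kind of runtime knob the slide describes.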
18. Raven sharding
•  You define sharding strategy – a method.
•  Raven manages storing each doc to the correct instance and fanning/merging queries.
•  No auto-rebalancing of shards if you change number of instances.
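A sharding strategy that is just a method can be sketched in Python as a plain function from document id to shard number. Here that function is a CRC32 hash modulo the shard count. That choice is an assumption for illustration, not the strategy NBC News used. `ShardedStore` is a hypothetical name. Stores and loads are routed by the strategy, and queries fan out to every shard and are merged.

```python
import zlib
from typing import Callable, Dict, List, Optional


def shard_for(doc_id: str, shard_count: int) -> int:
    # Illustrative strategy: a stable hash of the id, modulo the number of shards.
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count


class ShardedStore:
    """Hypothetical in-memory model of strategy-routed shards."""

    def __init__(self, shard_count: int, strategy: Callable[[str, int], int] = shard_for):
        self.shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]
        self.strategy = strategy

    def store(self, doc_id: str, doc: dict) -> None:
        # The store, not the caller, decides which instance holds the doc.
        self.shards[self.strategy(doc_id, len(self.shards))][doc_id] = doc

    def load(self, doc_id: str) -> Optional[dict]:
        return self.shards[self.strategy(doc_id, len(self.shards))].get(doc_id)

    def query(self, predicate: Callable[[dict], bool], sort_key: str) -> List[dict]:
        # Fan out to every shard, then merge the partial results into one ordering.
        hits = [d for shard in self.shards for d in shard.values() if predicate(d)]
        return sorted(hits, key=lambda d: d[sort_key])
```

The last test line shows why missing auto-rebalancing matters. With a hash-modulo strategy, changing the shard count moves most ids to a different shard, so existing documents would have to be migrated by hand.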
19. Raven indexing and querying
•  All queries are performed against indexes.
•  Indexes can be predefined or auto-created.
•  Indexing/queries are executed in Lucene.NET.
•  Fielded.
•  Full text with built-in or custom analyzers.
•  Geo-spatial.
•  Map-reduce.
•  Result transformers can load other docs.
•  Query with LINQ or Lucene syntax.
•  Indexes may be stale. Can force wait for non-stale results. (Danger! Primarily for unit tests.)
•  Projections occur on server, reducing data on the wire.
•  Super-cool stuff: eval patching, index scripts.
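Raven map-reduce indexes are written in C# LINQ. The plain-Python sketch below only illustrates the shape of the technique. It assumes documents carry a hypothetical `tags` list and counts documents per tag. The reduce output has the same shape as its input, which lets the reduce step be re-applied to partial results (per batch or per shard).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def map_docs(docs: Iterable[dict]) -> List[dict]:
    # Map: emit one {tag, count=1} entry per tag on each document.
    return [{"tag": tag, "count": 1} for doc in docs for tag in doc.get("tags", [])]


def reduce_entries(entries: Iterable[dict]) -> List[dict]:
    # Reduce: group by tag and sum counts. Output has the same shape as input,
    # so reduce_entries can be run again over already-reduced results.
    totals: Dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry["tag"]] += entry["count"]
    return [{"tag": tag, "count": count} for tag, count in sorted(totals.items())]
```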
20. Indexing catch-22
•  Need indexes to be up to date before letting a client talk to a replica.
•  Indexes are created by the client app:
   •  Static: CreateIndexes() at startup scans assemblies for index classes.
   •  Dynamic: when client issues a query.
21. Updating a static index – a pain
•  Define new index, with no code using it.
•  Deploy and allow new index to build.
•  Redeploy with code using the new index.
•  Redeploy after deleting old index definition.
•  Delete old index on each replica.
22. Consistency
•  If you do it by Id, it is consistent (within a single Raven server):
   •  Load()
   •  Store()
   •  Delete()
•  Queries are only eventually consistent (“eventually” is measured in milliseconds).
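The split between id-based and query-based consistency fits in a few lines of Python. This is a hypothetical single-server model, not Raven internals. `store()` makes the document visible to `load()` at once, while the index only catches up when a background step (`run_indexer()`) runs. A query can therefore briefly miss a document that `load()` already returns. `wait_for_non_stale=True` mirrors the "force wait for non-stale results" option from slide 19, which the talk flags as mainly for unit tests.

```python
from typing import Dict, List, Optional


class Database:
    """Hypothetical model: documents are written at once, the index catches up later."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.index: Dict[str, str] = {}  # doc id -> tag as last seen by the indexer
        self.pending: List[str] = []     # ids written but not yet indexed

    def store(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc          # visible to load() immediately
        self.pending.append(doc_id)      # indexing happens later, in the background

    def load(self, doc_id: str) -> Optional[dict]:
        return self.docs.get(doc_id)     # by id: consistent

    def is_stale(self) -> bool:
        return bool(self.pending)

    def run_indexer(self) -> None:
        # Stand-in for the background indexing task.
        for doc_id in self.pending:
            self.index[doc_id] = self.docs[doc_id]["tag"]
        self.pending.clear()

    def query(self, tag: str, wait_for_non_stale: bool = False) -> List[str]:
        if wait_for_non_stale:
            self.run_indexer()           # "waiting" modeled as letting the indexer finish
        return sorted(i for i, t in self.index.items() if t == tag)
```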
23. Raven replication
•  Eventual consistency – replication is async in background.
•  All replication is one-way and managed by source.
•  Can enable transitive replication – useful for new instances.
•  Set a W value to ensure replication to a minimum number of instances (v2.5), or until a timeout expires.
•  Client will auto-failover to replication destinations, configurable to reads only or reads and writes.
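The waiting logic behind a W value can be modeled in Python: push a write to every destination in parallel, then return once `w` of them acknowledge or the timeout expires. `replicate_with_w` and `fake_replica` are hypothetical names. In Raven, replication is managed by the source server, so treat this only as a model of the wait-for-W-or-timeout decision, not of how Raven sends the data.

```python
import concurrent.futures as cf
import time
from typing import Callable, List


def fake_replica(delay: float) -> Callable[[], None]:
    # Hypothetical stand-in for "push this write to one replica".
    return lambda: time.sleep(delay)


def replicate_with_w(replicas: List[Callable[[], None]], w: int, timeout: float) -> int:
    """Send one write to every replica in parallel (hypothetical model).

    Returns the number of acknowledgements seen, stopping as soon as `w`
    replicas have acknowledged or `timeout` seconds have passed.
    """
    acked = 0
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(push) for push in replicas]
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None:  # a replica that raised does not count
                acked += 1
                if acked >= w:
                    break
    except cf.TimeoutError:
        pass  # fewer than w acks in time; the caller decides whether that is an error
    finally:
        pool.shutdown(wait=False)  # remaining replicas keep going in the background
    return acked
```

Returning the count instead of raising leaves the policy to the caller. A CMS publish might retry or alert when fewer than `w` replicas confirm.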
24. Etags
•  Sequential guids.
•  Unique for every write to a database.
•  Used for caching in client, concurrency control, and replication.
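The following hypothetical Python sketch shows two of the uses listed on this slide. First, etags are sequential 128-bit values; here they are built from a counter with `uuid.UUID(int=...)`, which is not Raven's actual etag layout. Second, they support optimistic concurrency control: a write carrying a stale etag is rejected. `EtagStore` and `ConcurrencyError` are illustrative names.

```python
import itertools
import uuid
from typing import Dict, Optional, Tuple


class ConcurrencyError(Exception):
    """Raised when a write's expected etag no longer matches the stored one."""


class EtagStore:
    """Hypothetical model: every write gets a new, strictly increasing etag."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.docs: Dict[str, Tuple[dict, uuid.UUID]] = {}

    def _next_etag(self) -> uuid.UUID:
        # Sequential 128-bit value from a counter, so later writes compare greater.
        # (Illustrative only; not Raven's etag layout.)
        return uuid.UUID(int=next(self._counter))

    def put(self, doc_id: str, doc: dict, expected: Optional[uuid.UUID] = None) -> uuid.UUID:
        # Optimistic concurrency: a caller that passes the etag it read is
        # rejected if anyone has written the document since.
        if expected is not None:
            current = self.docs.get(doc_id)
            if current is None or current[1] != expected:
                raise ConcurrencyError(doc_id)
        etag = self._next_etag()
        self.docs[doc_id] = (doc, etag)
        return etag
```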
25. The replication conversation
Source: What’s the last etag I replicated to you?
Destination: 42
Source: I’m up to 49, so here’s a POST with some docs in it.
Destination: Got ’em.
Source: What’s the last etag I replicated to you?
Destination: 49
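The conversation maps onto a small source-driven loop. In this hypothetical in-memory Python model, each `Node` keeps a write log in etag order and remembers, per source, the last etag it received. `replicate()` asks the destination for that etag and then sends only newer entries. Batching is reduced to a size limit, and the transport (an HTTP POST on the slide) is a plain method call.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str, dict]  # (etag, doc id, document)


class Node:
    """Hypothetical database instance for illustrating replication."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Entry] = []               # local writes, in etag order
        self.docs: Dict[str, dict] = {}
        self.last_etag_from: Dict[str, int] = {}  # source name -> last etag received

    def write(self, doc_id: str, doc: dict) -> int:
        etag = len(self.log) + 1                 # sequential per database
        self.log.append((etag, doc_id, doc))
        self.docs[doc_id] = doc
        return etag

    # Destination side of the conversation.
    def last_replicated(self, source: str) -> int:
        return self.last_etag_from.get(source, 0)

    def receive(self, source: str, batch: List[Entry]) -> None:
        for etag, doc_id, doc in batch:
            self.docs[doc_id] = doc
            self.last_etag_from[source] = etag


def replicate(source: Node, dest: Node, batch_size: int = 100) -> int:
    """One-way, source-driven replication step. Returns the number of docs sent."""
    last = dest.last_replicated(source.name)     # "What's the last etag I replicated to you?"
    batch = [e for e in source.log if e[0] > last][:batch_size]
    if batch:
        dest.receive(source.name, batch)         # "here's a POST with some docs in it"
    return len(batch)
```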
26. Multi-master replication
•  Replication from each instance to all other instances.
•  Any instance could receive writes.
•  Reduce replication conflicts by forcing writes to single “master”.
•  Handle conflicts in your app or with custom server bundle – in our case, “last in wins” bundle.
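At its core, a "last in wins" resolver picks the conflicting version with the newest write timestamp. The Python sketch below is a hypothetical stand-in for the custom server bundle mentioned on this slide, not its code. `Version` and `last_in_wins` are illustrative names. Ties are broken by instance name so that every instance resolves the same conflict to the same winner.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    doc: dict
    last_modified: float  # server timestamp of the write
    origin: str           # which instance accepted the write


def last_in_wins(conflicts: List[Version]) -> Version:
    # Newest write wins; exact ties are broken by origin name so the
    # choice is deterministic regardless of the order versions arrive in.
    return max(conflicts, key=lambda v: (v.last_modified, v.origin))
```

This only works as well as the servers' clocks agree, which is one more reason to funnel writes through a single master.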
27. Id generation
•  Null Id and tag can be extracted: client generates with Hi-Lo.
•  Null Id received at server: guid.
•  Id ending in / received at server: append auto-increment integer.
•  Otherwise: use the value in the object.
•  Server prefix protects against edge-case failures.
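Here is a hypothetical Python model of the Id rules on this slide. `HiLoServer.reserve_range()` hands each client a block of numbers per collection tag. `HiLoClient` can then assign ids without a round-trip per document. The remaining branches follow the server-side rules. The block size (`capacity`) is an assumption, and the server prefix from the last bullet is not modeled.

```python
import uuid
from itertools import count
from typing import Dict, Iterator, Optional


class HiLoServer:
    """Hypothetical server side of the Id rules."""

    def __init__(self, capacity: int = 32) -> None:
        self.capacity = capacity                  # ids per reserved block (an assumption)
        self.next_hi: Dict[str, int] = {}         # per collection tag
        self.counters: Dict[str, Iterator[int]] = {}

    def reserve_range(self, tag: str) -> range:
        # Each call hands out a block that no other client will receive.
        hi = self.next_hi.get(tag, 0)
        self.next_hi[tag] = hi + 1
        return range(hi * self.capacity + 1, (hi + 1) * self.capacity + 1)

    def assign_on_server(self, doc_id: Optional[str]) -> str:
        if doc_id is None:
            return str(uuid.uuid4())              # null Id at server: guid
        if doc_id.endswith("/"):
            counter = self.counters.setdefault(doc_id, count(1))
            return f"{doc_id}{next(counter)}"     # trailing '/': append auto-increment integer
        return doc_id                             # otherwise: use the value given


class HiLoClient:
    """Hypothetical client side: assigns ids from a reserved block, no round-trip per doc."""

    def __init__(self, server: HiLoServer) -> None:
        self.server = server
        self.blocks: Dict[str, Iterator[int]] = {}

    def next_id(self, tag: str) -> str:
        n = next(self.blocks.get(tag, iter(())), None)
        if n is None:
            self.blocks[tag] = iter(self.server.reserve_range(tag))  # one round-trip per block
            n = next(self.blocks[tag])
        return f"{tag}/{n}"


def choose_id(client: HiLoClient, doc_id: Optional[str], tag: Optional[str]) -> str:
    if doc_id is None and tag is not None:
        return client.next_id(tag)                # null Id and a tag: client-side Hi-Lo
    return client.server.assign_on_server(doc_id)
```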
28. Raven operations tasks
•  Control where reads and writes go. Implemented in a custom DocumentStore wrapper.
•  Control aggressive caching time.
•  Deploy new instances with replication.
•  Backup – but probably never restore in production.
•  Copy indexes.
•  Monitor with stats endpoints.
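Here is one way a DocumentStore wrapper could decide where requests go, sketched in Python. It is not the RavenDB client API, and the names (`RoutingStore`, and node names such as `nyc-1`) are hypothetical. Reads go to the closest healthy replica. Writes go to a single master. Write failover is a switch ops can flip, matching the reads-only versus reads-and-writes failover option on slide 23.

```python
from typing import List, Set


class RoutingStore:
    """Hypothetical read/write router with failover."""

    def __init__(self, master: str, replicas_by_distance: List[str], failover_writes: bool = False):
        self.master = master
        self.replicas = replicas_by_distance    # closest first
        self.failover_writes = failover_writes  # False: fail over reads only
        self.down: Set[str] = set()             # nodes currently marked unavailable

    def _first_up(self, candidates: List[str]) -> str:
        for node in candidates:
            if node not in self.down:
                return node
        raise RuntimeError("no available node")

    def read_target(self) -> str:
        # Reads go to the closest healthy replica, falling back to the master.
        return self._first_up(self.replicas + [self.master])

    def write_target(self) -> str:
        # Writes go to a single master to reduce replication conflicts.
        if self.master not in self.down:
            return self.master
        if self.failover_writes:
            return self._first_up(self.replicas)
        raise RuntimeError("master unavailable and write failover disabled")
```

Keeping this decision in one wrapper means apps never hard-code where data lives. That matches the "apps shouldn’t know where data is" goal on slide 13.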
29. Keep in mind…
•  Modeling/versioning
•  Replication
•  Client failover
•  Consistency
•  Concurrency control
•  Indexing and updates
•  Id generation
•  Caching
30. More info on Raven
•  http://ravendb.net
•  GitHub: http://github.com/ravendb
•  Ayende’s blog: http://ayende.com
•  RavenDB Google group
•  @RavenDB on Twitter
•  Me: @jtbennett on Twitter
31. Questions?
32. Many thanks to:
You.
NoSql NOW!
Huge.
Rhinos: @ayende, @synhershko.
Peacocks: @benlakey, @johncoder, @pkdotnet, Colin Hicks, Peter Durham, Bryan Wheeler.
33. hugeinc.com
info@hugeinc.com
45 Main St. #220 Brooklyn, NY 11201
+1 718 625 4843