NoSql presentation

Presentation given at NoSql EU conference describing architectures past, present & future for

NoSql presentation Presentation Transcript

  • 1. NoSql at the Guardian • Matthew Wall • Simon Willison
  • 2. !
  • 3. SQL
  • 4. Not only SQL
  • 5. Guardian journalism online: 1995
  • 6. Guardian journalism online: 1999
  • 7. Guardian journalism online: 2000
  • 8. Guardian journalism online: 2010
  • 9. Read all about it!
  • 10. “I bring you NEWS!!!” Architecture diagram: Web server ×3 • App server ×3 • Memcached (20Gb) • Oracle • CMS • Data feeds
  • 11. “I bring you NEWS!!!” Why RDBMS? • 5 years ago, fewer alternatives • Understand operations procedures • Can easily recruit DBAs / devs • Developer/ops tools • Business critical system: a safe choice • Architecture diagram: Web server ×3 • App server ×3 • Memcached • Oracle • CMS • Data feeds
  • 12. Related content from search engine
  • 13. Related content from search engine Introduction of memcached
  • 14. Related content from search engine Big traffic spike Introduction of memcached
  • 15. Distributed memcached • Protects database from peak load • Entities explicitly decached • Queries given TTL • memcached = database supercharger
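The two caching rules on this slide (entities explicitly decached, queries given a TTL) can be sketched roughly as follows. This is a minimal illustration, not the Guardian's code: `TTLCache` is a hypothetical in-memory stand-in for a memcached client, the `db` dictionary stands in for the database, and the key names and 60-second TTL are assumptions.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a memcached client (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expiry time or None)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def delete(self, key):
        self._items.pop(key, None)


def load_article(cache, db, article_id):
    """Entity read: cached without a TTL, removed explicitly on update."""
    key = f"article:{article_id}"
    article = cache.get(key)
    if article is None:
        article = db["articles"][article_id]  # cache miss: go to the database
        cache.set(key, article)
    return article


def save_article(cache, db, article_id, article):
    """Entity write: update the database, then explicitly decache."""
    db["articles"][article_id] = article
    cache.delete(f"article:{article_id}")


def latest_headlines(cache, db, ttl=60):
    """Query read: the result is cached with a TTL rather than invalidated."""
    headlines = cache.get("query:latest")
    if headlines is None:
        headlines = sorted(a["headline"] for a in db["articles"].values())
        cache.set("query:latest", headlines, ttl=ttl)
    return headlines
```

The trade-off in this sketch is that entity reads are never stale after a save, while query results may lag behind the database by up to the TTL.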
  • 16. Now we have a stable “broadcast” platform • We know how to scale it • SQL running effectively at core • We’ve finished, right?
  • 17. Digital journalism is changing • We can’t cover everything • We can’t compete with everyone • Need to be “part of the web” not just “on the web”
  • 18. Mutualise the news!
  • 19. Mutualised news! Mutualisation of journalism • No longer only broadcasting • User engagement & contribution: content, journalism, data, software • Data curation / linked data • Support engaged developers with data and APIs
  • 20. Mutualised news! Be a part of the data fabric of the internet
  • 21. Mutualised news! Platform strategy • Out: Release our data to the world via APIs • In: Rapidly build new functionality outside the core • Write: Ingest, store & present arbitrary data
  • 22. Mutualised news! Data Out Content API
  • 23. Mutualised news! Content API • Delivered using Apache Solr • Document oriented search engine • Loose schema: records, fields, facets • Fields can be multi-value • Supports dynamic field generation • Can apply multiple facets in queries faster than RDBMS
  • 24. Mutualised news!
  • 25. Mutualised news!
  • 26. Mutualised news!
  • 27. Mutualised news! Is Solr a database?
  • 28. Mutualised news! Can perform complex queries, including full text search • Can filter results with facets (WHERE clause) • ANYTHING can be a facet. Very powerful. • On our dataset most queries are of a similar cost • Scales very well horizontally • Handles millions of documents
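As a rough illustration of full-text search combined with facet filtering, the sketch below builds a Solr `/select` URL using Solr's standard `q`, `fq`, `facet` and `facet.field` parameters. The host, the core name `content` and the field names `section`, `tag` and `contributor` are assumptions for illustration and do not come from the Content API itself.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, filters=None, facet_fields=()):
    """Build a Solr /select URL with a full-text query, filters and facets."""
    params = [("q", text), ("wt", "json")]
    # Each fq narrows the result set, much like an extra WHERE condition.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field per field: Solr returns value counts for each one.
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_query(
    "http://localhost:8983/solr/content",  # hypothetical host and core name
    "election",
    filters={"section": "politics"},       # hypothetical field
    facet_fields=["tag", "contributor"],   # hypothetical fields
)
print(url)
```

Values containing double quotes would need escaping before being placed in an `fq` clause; the sketch leaves that out.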
  • 29. Mutualised news! No transactions • Excellent for certain types of queries • Not truly general purpose • Schema design very important • Search index not really persistence
  • 30. Architecture diagram. Core: Web servers • App server • Memcached (20Gb) • rdbms • M/Q • CMS. Api: Solr ×6 (Cloud, EC2)
  • 31. Mutualised news! API • Currently powering iPad app • Site components • External applications • Editors’ tools • More to follow
  • 32. Mutualised news! Data In Application framework
  • 33. Mutualised news! Application framework • Simple REST/HTTP allows lightweight framework development • Applications proxied for performance • Apps generally hosted in the cloud, hot deployment into production • No RDBMS provided for storage • Can develop in news timeline
  • 34. Architecture diagram. Core: Web servers • App server • Memcached (20Gb) • rdbms • M/Q • CMS. Proxy. Apps: App ×6 (external hosting, app engine etc)
  • 35. NoSQL for journalism
  • 36. Some useful characteristics • Scale down as well as up • Support rapid production-ready prototyping: turn projects around in hours or days • Handle massive traffic spikes
  • 37. Desktop analysis • Leaked BNP membership list • Load postcode-to-constituency mapping into Redis • Generate heatmaps by looking up all 12,000 postcodes
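A lookup of this kind might be sketched as below. Everything here is an assumption for illustration: the `postcode:<code>` key layout, the helper names, and `FakeRedis`, which is an in-memory stand-in exposing only the `set` and `get` calls so the example runs without a Redis server. A real redis-py client could be passed in instead, ideally created with `decode_responses=True` so that `get` returns strings.

```python
from collections import Counter


class FakeRedis:
    """In-memory stand-in exposing the two client calls used below."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def normalise(postcode):
    """Uppercase and strip spaces so differently typed postcodes share a key."""
    return postcode.replace(" ", "").upper()


def load_mapping(client, rows):
    """Store one key per postcode: postcode:<code> -> constituency name."""
    for postcode, constituency in rows:
        client.set(f"postcode:{normalise(postcode)}", constituency)


def count_by_constituency(client, postcodes):
    """Look up each postcode and tally the hits per constituency."""
    counts = Counter()
    for postcode in postcodes:
        constituency = client.get(f"postcode:{normalise(postcode)}")
        if constituency is not None:  # skip postcodes missing from the mapping
            counts[constituency] += 1
    return counts
```

The resulting per-constituency counts are the kind of input a heatmap would need; the postcodes and constituency names passed in are left to the caller.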
  • 38. MPs’ expenses
  • 39. MPs’ expenses • SELECT * FROM pages WHERE is_reviewed = 0 ORDER BY RAND()
  • 40. v2 used Redis
  • 41. v2 used Redis • Set difference: pages - reviewed • labour MP pages • SRANDMEMBER
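The Redis approach replaces the `ORDER BY RAND()` query with a set difference followed by a random pick. The sketch below is one plausible reading of the slide, not the actual v2 code: the key names `pages`, `reviewed` and `unreviewed` are assumptions, and `FakeRedisSets` is an in-memory stand-in whose method names mirror the Redis commands SADD, SDIFFSTORE and SRANDMEMBER so the example runs without a server.

```python
import random


class FakeRedisSets:
    """In-memory stand-in for the Redis set commands used below."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        self._sets.setdefault(key, set()).update(members)

    def sdiffstore(self, dest, first, *others):
        result = set(self._sets.get(first, set()))
        for key in others:
            result -= self._sets.get(key, set())
        self._sets[dest] = result
        return len(result)

    def srandmember(self, key):
        members = self._sets.get(key)
        return random.choice(sorted(members)) if members else None


def next_page_to_review(client):
    """Pick a random page that has not been reviewed yet, or None."""
    # unreviewed = pages - reviewed (SDIFFSTORE in Redis).
    client.sdiffstore("unreviewed", "pages", "reviewed")
    # SRANDMEMBER returns one random member of the stored set.
    return client.srandmember("unreviewed")


def mark_reviewed(client, page_id):
    """Record a finished review so the page drops out of the difference."""
    client.sadd("reviewed", page_id)
```

Recomputing the difference on every request is the simplest form; a busier deployment might refresh it less often or maintain the unreviewed set directly.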
  • 42. BigTable: Zeitgeist
  • 43. Zeitgeist stores pre- calculated results in BigTable • Data comes in from stats system, comments system and OneRiot real-time search API • AppEngine cron tasks populate task queues • Task queues recalculate hotness levels • “Live” BigTable queries are simple SELECT / SORT
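The pre-calculation pattern can be sketched as a background step that writes scores and a live step that only sorts them. The slides do not give the hotness formula, so the weights below are invented purely for illustration, and plain dictionaries stand in for the incoming data and for BigTable.

```python
def hotness(page_views, comments, mentions):
    """Hypothetical score: these weights are invented for illustration."""
    return page_views + 5 * comments + 10 * mentions


def recalculate(stats, store):
    """Background task: write one pre-calculated score per article."""
    for article_id, s in stats.items():
        store[article_id] = hotness(s["page_views"], s["comments"], s["mentions"])


def top_articles(store, limit=10):
    """Live query: a plain sort over stored scores, nothing computed on read."""
    return sorted(store, key=store.get, reverse=True)[:limit]
```

Keeping the expensive work in the background step is what lets the live query stay a simple select and sort.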
  • 44. Live debate poll • Over a million votes cast in an hour • Stretched limits of BigTable / AppEngine • Sharded counter pattern to handle writes
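The sharded counter pattern spreads writes for one logical counter across several records so that no single record absorbs every vote. A minimal sketch follows, with a dictionary standing in for datastore entities; the class name, the shard count of 20 and the key format are assumptions.

```python
import random


class ShardedCounter:
    """One logical counter stored as several independently written shards."""

    def __init__(self, name, num_shards=20, rng=random):
        self.name = name
        self.num_shards = num_shards
        self._rng = rng
        self._shards = {}  # stand-in for one datastore entity per shard

    def increment(self, amount=1):
        """Write path: touch one randomly chosen shard, spreading contention."""
        key = f"{self.name}:{self._rng.randrange(self.num_shards)}"
        self._shards[key] = self._shards.get(key, 0) + amount

    def total(self):
        """Read path: sum every shard to recover the logical count."""
        return sum(self._shards.values())


votes = ShardedCounter("poll-option-a")
for _ in range(1000):
    votes.increment()
print(votes.total())
```

More shards raise the sustainable write rate at the cost of a more expensive read, which is why totals in this pattern are often cached.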
  • 45. Spreadsheets are NoSQL too...
  • 46. Google Docs powered infographics
  • 47. The Datablog
  • 48. • Datablog was launched with no development involvement at all - it’s a blog, and a bunch of Google Docs Spreadsheets • Retrieve data as CSV, XLS, JSON, Atom... • “Make a copy” and run your own analysis
  • 49. Mutualised news! Write Arbitrary data
  • 50. Mutualised news! Create schema-free database alongside RDBMS • Index in Solr • Provide access in API • Investigating: CouchDB
  • 51. Architecture diagram. Core: Web servers • App server • Memcached (20Gb) • CMS • Data feeds • M/Q • rdbms • CouchDB? Out: Solr ×6 (Cloud, EC2). In: Proxy • App ×6 (external hosting, app engine etc)