NoSql at guardian.co.uk
        Matthew Wall
        Simon Willison
!
SQL
not only
Guardian journalism online: 1995
Guardian journalism online: 1999
Guardian journalism online: 2000
Guardian journalism online: 2010
Read all about it!
I bring you NEWS!!!

Web server          Web server          Web server

App server          App server          App server

                 Memcached (20Gb)

                     Oracle

         CMS                     Data feeds
            Why RDBMS?

      5 years ago, fewer alternatives

  Understand operations procedures

     Can easily recruit DBAs / devs

             Developer/ops tools

 Business critical system: a safe choice
Related content from search engine

                Big traffic spike

                Introduction of memcached
Distributed memcached

 Protects database from peak load

    Entities explicitly decached

        Queries given TTL

memcached = database supercharger
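
A minimal sketch of the two caching rules on this slide, assuming a cache-aside layout: query results are stored with a TTL, and entities are explicitly removed from the cache when they are saved. `TTLCache` is a hypothetical in-memory stand-in for a memcached client, and `cached_query` and `save_entity` are illustrative names, not the Guardian's code.

```python
import time


class TTLCache:
    """In-memory stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Expired entries behave like a cache miss.
        if expires_at is not None and self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._store[key] = (value, expires_at)

    def delete(self, key):
        self._store.pop(key, None)


def cached_query(cache, key, run_query, ttl):
    """Queries given TTL: serve from cache, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl=ttl)
    return value


def save_entity(cache, db, entity_id, entity):
    """Entities explicitly decached: write to the database, then drop the stale copy."""
    db[entity_id] = entity
    cache.delete(f"entity:{entity_id}")
```

With this split, the database only sees a given query once per TTL window during a traffic spike, while edits to an entity become visible as soon as its cache key is deleted.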
Now we have a stable “broadcast” platform

        We know how to scale it

     SQL running effectively at core

          We’ve finished, right?
Digital journalism is changing

            We can’t cover everything

        We can’t compete with everyone

Need to be “part of the web” not just “on the web”
Mutualise
the news!
Mutualisation of journalism


   content
    No longer only broadcasting

      User engagement & contribution:
                journalism
                   data
                 software

         Data curation / linked data

Support engaged developers with data and APIs

Be a part of the data fabric of the internet
              Platform strategy

   Out: Release our data to the world via APIs

In: Rapidly build new functionality outside the core

  Write: Ingest, store & present arbitrary data

         Data Out

        Content API
Content API

             Delivered using Apache Solr

          Document oriented search engine

                   Loose schema:
                records, fields, facets

               Fields can be multi-value

          Supports dynamic field generation

Can apply multiple facets in queries faster than RDBMS
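
As a rough illustration of applying several facets in one query, the sketch below builds a Solr select URL with the standard `q`, `fq`, `facet` and `facet.field` parameters. The host, the core name `content` and the field names `section` and `tag` are assumptions made for the example, not the Content API's actual schema.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, filters, facet_fields, rows=10):
    """Build a Solr select URL with filter queries and facet counts.

    filters is a list of (field, value) pairs, each sent as its own fq;
    facet_fields lists the fields to return facet counts for.
    """
    params = [("q", text), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("fq", f"{field}:{value}") for field, value in filters]
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = build_facet_query(
    "http://localhost:8983/solr/content",
    "election",
    filters=[("section", "politics")],
    facet_fields=["tag", "section"],
)
```

Each `fq` narrows the result set much like a WHERE clause, and each `facet.field` asks Solr for value counts over the matching documents in the same request.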

       Is Solr a database?
Can perform complex queries, including full text search

    Can filter results with facets (WHERE clause)

      ANYTHING can be a facet. Very powerful.

  On our dataset most queries are of a similar cost

             Scales very well horizontally

            Handles millions of documents
         No transactions

  Excellent for certain types of queries

       Not truly general purpose

     Schema design very important

   Search index not really persistence
Core
                             Api
   Web servers

                             Solr
    App server
                             Solr
Memcached (20Gb)
                             Solr

     rdbms         Solr
                             Solr

      M/Q                    Solr

     CMS                  Cloud, EC2
API
    Currently powering iPad app

         Site components

       External applications

           Editors tools

          More to follow

           Data In

    Application framework
Application framework

   Simple REST/HTTP framework allows lightweight
                   development

         Applications proxied for performance

Apps generally hosted in the cloud, hot deployment into
                      production

           No RDBMS provided for storage

             Can develop in news timeline
Core
   Apps                      Web servers

        App




                   Proxy
                              App server
        App
                           Memcached (20Gb)
        App

        App                   rdbms


        App
                              M/Q
        App
                              CMS
external hosting
 app engine etc
NoSQL for journalism
Some useful
          characteristics
• Scale down as well as up
• Support rapid production-ready prototyping:
  turn projects around in hours or days
• Handle massive traffic spikes
Desktop analysis
• Leaked BNP
  membership list
• Load postcodes to
  constituencies
  mapping in to Redis
• Generate heatmaps
  by looking up all
  12,000 postcodes
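
A small sketch of the lookup step, assuming the mapping is held as simple key/value pairs. A plain dict stands in for Redis here (one SET per row when loading, one GET per postcode when counting), and the function names and sample postcodes are made up for illustration.

```python
from collections import Counter


def normalise(postcode):
    """Strip spaces and upper-case so 'ab1 2cd' and 'AB12CD' match."""
    return postcode.replace(" ", "").upper()


def load_mapping(rows):
    """Build the postcode -> constituency lookup (a Redis SET per row in a real run)."""
    return {normalise(postcode): constituency for postcode, constituency in rows}


def count_by_constituency(lookup, postcodes):
    """Count postcodes per constituency; unknown postcodes are skipped."""
    counts = Counter()
    for postcode in postcodes:
        constituency = lookup.get(normalise(postcode))  # a Redis GET in a real run
        if constituency is not None:
            counts[constituency] += 1
    return counts


# Made-up sample data.
lookup = load_mapping([("AB1 2CD", "Northtown"), ("EF3 4GH", "Southville")])
counts = count_by_constituency(lookup, ["ab1 2cd", "AB12CD", "EF3 4GH", "ZZ9 9ZZ"])
```

The per-constituency counts are the input a heatmap needs; with only thousands of keys the whole job fits comfortably on a desktop machine.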
MP’s expenses

     SELECT * FROM pages WHERE is_reviewed = 0 ORDER BY RAND()
v2 used Redis

     Set difference: labour MP pages - reviewed pages

     SRANDMEMBER
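
The set-based approach can be sketched in plain Python, with built-in sets standing in for Redis sets. The subtraction plays the role of a set difference (SDIFF in Redis) and the random pick plays the role of SRANDMEMBER. The function name and page ids are hypothetical. The likely motivation, not stated on the slide, is that `ORDER BY RAND()` has to assign a random value to every unreviewed row and sort them just to return one.

```python
import random


def next_page_to_review(candidate_pages, reviewed_pages, rng=random):
    """Pick a random page that has not been reviewed yet.

    Mirrors a set difference (candidates minus reviewed) followed by a
    random member pick, as SDIFF and SRANDMEMBER would do on Redis sets.
    Returns None when every candidate has been reviewed.
    """
    remaining = candidate_pages - reviewed_pages
    if not remaining:
        return None
    return rng.choice(sorted(remaining))


# Hypothetical page ids.
pages = {"page:1", "page:2", "page:3"}
reviewed = {"page:1"}
page = next_page_to_review(pages, reviewed)
```

Marking a page as reviewed is then a single add to the reviewed set, so the pool of remaining work shrinks without any table scan.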
BigTable: Zeitgeist
Zeitgeist stores pre-calculated results in BigTable
• Data comes in from stats system,
  comments system and OneRiot real-time
  search API
• AppEngine cron tasks populate task queues
• Task queues recalculate hotness levels
• “Live” BigTable queries are simple
  SELECT / SORT
Live debate poll




• Over a million votes cast in an hour
• Stretched limits of BigTable / AppEngine
• Sharded counter pattern to handle writes
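
A minimal sketch of the sharded counter idea, assuming the usual form of the pattern: each write goes to one of N shard records chosen at random, and a read sums all shards. A dict stands in for datastore entities, and the class name and shard count are illustrative rather than taken from the talk.

```python
import random


class ShardedCounter:
    """Spread increments over several shards so no single record takes every write."""

    def __init__(self, name, num_shards=20):
        self.name = name
        self.num_shards = num_shards
        self._shards = {}  # stand-in for datastore entities, keyed by shard name

    def increment(self, amount=1):
        # Each write touches one randomly chosen shard.
        shard_key = f"{self.name}-shard-{random.randrange(self.num_shards)}"
        self._shards[shard_key] = self._shards.get(shard_key, 0) + amount

    def total(self):
        # Reads pay the cost instead: sum every shard.
        return sum(self._shards.values())


votes = ShardedCounter("poll-option-a", num_shards=5)
for _ in range(1000):
    votes.increment()
```

The trade-off is that writes scale with the number of shards while reads get slightly more expensive, which suits a poll where votes vastly outnumber result views that cannot be cached.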
Spreadsheets are
  NoSQL too...
Google Docs powered
    infographics
The Datablog
• Datablog was launched with no
  development involvement at all - it’s a blog,
  and a bunch of Google Docs Spreadsheets
• Retrieve data as CSV, XLS, JSON, Atom...
• “Make a copy” and run your own analysis

            Write

         Arbitrary data
Create schema free database alongside RDBMS

               Index in Solr

           Provide access in API

           Investigating: CouchDB
Core
                                                           Out
      In                           Web servers

        App                                                Solr

                   Proxy
                                   App server
        App                                                Solr
                             Memcached (20Gb)
        App                                                Solr
        App        CMS         Data feeds        Solr
                                                           Solr
        App
                                   M/Q
                                                           Solr
        App
                           rdbms         CouchDB?
external hosting                                        Cloud, EC2
 app engine etc
