Riak Search
Performance Wins
 How we got > 100x improvement
            in query throughput


            Gary Flake, Founder
           gary@clipboard.com
Demo


       Introduction
Architecture
• Web: web-01, web-02, web-03 (each Node.js + Nginx)
• Riak: riak-01, riak-02, riak-03, riak-04, riak-05
• Cache: cache-01, cache-02, cache-03
• Redis: redis-01, redis-02
• Thumb: thumb-01, thumb-02
• Job: job-01, job-02
• Admin: admin-01
Riak

An awesome NoSQL data store:

• Super easy to scale up AND down
• Fault tolerant – no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the
  Basho folks awesomely take our commits)
Riak – Basics

• Data in Riak is grouped into buckets
  (effectively namespaces)
• Basic operations are:
    •   Get, save, delete, search, map, reduce
• Eventual consistency managed through
  N, R, and W bucket parameters.
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js
  node library by Francisco Treacy
Data Model – Clips
Clip fields: title, ctime, domain, author, mentions, annotation, tags
Data Model - Clips
Clips are the gateway to all of our data

• Clip (key: abc)
• Blob (key: abc): <html> … </html>
• Comment Cache (key: abc): comments on clip ‘abc’
    •   “F1rst”
    •   “Nice clip yo!”
    •   “Saw this on Reddit…”
Other Buckets

• Users
• Blobs
• Comments
• Templates
• Counts
• Search Caches
• Transactions
Riak Search

• Gets many things out of Riak by something
  other than the primary key.
• You specify a schema (the types for the
  fields within a JSON object).
• Works great but with one big gotcha:
  – Index uses term-based partitioning instead
    of document-based partitioning
  – Implication: joins + sort + pagination sucks
  – We know how to work around this
Riak Search – Querying

• Query syntax based on Lucene
• Basic Query
   text:funny
• Compound Query
   login:greg OR (login:gary AND tags:riak)
• Range Query
   ctime:[98685879630026 TO 98686484430026]
Clipboard App Flow
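Participants: Client, node.js, Riak

1. Client to node.js: Go to clipboard.com/home
2. node.js to Riak: Search clips bucket (query = login:greg)
3. Riak to node.js: Top 20 results
4. node.js to Client: Top 20 results (client starts rendering)
5. (For each clip) Client to node.js: API request for blob
6. node.js to Riak: GET from blobs bucket
7. node.js to Client: Return blob to client (client renders blob)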
Clipboard Queries


                 login:greg



               mentions:greg



  ctime:[98685879630026 TO 98686484430026]

                                             (Search)
Clipboard Queries cont.



            login:greg AND tags:riak




  login:greg AND text:node AND text:javascript


                                                 (Search)
Uh oh


               login:greg AND private:false
  login:greg matches only my clips; private:false matches 20% of all clips!




                login:greg AND text:iPhone



                                                              (Search)
Index Partitioning Schemes
Doc Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On Each node:
    1. Perform x AND y
    2. Sort on z
    3. Slice [ 0 .. 1000 ]
    4. Send to aggregator
3. On aggregator
    1. Merge all results (N x 1000)
    2. Slice [ 990 .. 1000 ]
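The steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `nodes` lists, the document shape, and the function name are hypothetical and are not taken from any real search engine.

```python
from heapq import merge

def doc_partition_query(nodes, predicate, sort_key, start, count):
    """Simulate a document-partitioned query: each node filters, sorts, slices."""
    limit = start + count
    partials = []
    for docs in nodes:
        # Each node evaluates the full predicate (x AND y) on its own documents.
        hits = sorted((d for d in docs if predicate(d)), key=sort_key)
        # Only the first start+count rows can ever reach the requested page.
        partials.append(hits[:limit])
    shipped = sum(len(p) for p in partials)
    # The aggregator merges the already-sorted partial lists, then slices.
    merged = list(merge(*partials, key=sort_key))
    return merged[start:limit], shipped

# Hypothetical data: 60 documents spread over 4 nodes.
docs = [{"id": i,
         "login": "greg" if i % 2 == 0 else "bob",
         "tags": "riak" if i % 3 == 0 else "misc",
         "z": i} for i in range(60)]
nodes = [docs[n::4] for n in range(4)]

page, shipped = doc_partition_query(
    nodes,
    lambda d: d["login"] == "greg" and d["tags"] == "riak",
    lambda d: d["z"],
    start=4, count=3)
page_ids = [d["id"] for d in page]
```

Each node ships at most start + count rows, so traffic to the aggregator is bounded by N x (start + count) regardless of how many documents match a single term.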
Term Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On x node: search for x (and send all)
3. On y node: search for y (and send all)
4. On aggregator:
    1. Do x AND y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
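For comparison, here is a minimal sketch of the term-partitioned flow, again as an in-memory simulation. The `postings` dictionary stands in for the per-term nodes and `sort_values` stands in for reading the sort field from each document; both are hypothetical names for illustration.

```python
def term_partition_query(postings, sort_values, term_x, term_y, start, count):
    """Simulate a term-partitioned query: every posting goes to the aggregator."""
    ids_x = postings[term_x]          # the x node sends all of its matches
    ids_y = postings[term_y]          # the y node sends all of its matches
    shipped = len(ids_x) + len(ids_y)
    # The aggregator does the AND, then the sort, then the slice.
    matches = set(ids_x) & set(ids_y)
    ordered = sorted(matches, key=lambda doc_id: sort_values[doc_id])
    return ordered[start:start + count], shipped

# Hypothetical index over 60 documents.
postings = {
    "login:greg": [i for i in range(60) if i % 2 == 0],
    "tags:riak": [i for i in range(60) if i % 3 == 0],
}
sort_values = {i: i for i in range(60)}

page_ids, shipped = term_partition_query(
    postings, sort_values, "login:greg", "tags:riak", start=4, count=3)
```

With this toy data the page is the same as in a document-partitioned layout, but the aggregator receives every posting for both terms (50 ids here) rather than a bounded slice per node.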
Riak Search Issues

1. For any singular term, all results must be
   sent back to aggregator.
2. Incorrectly orders sort and slice (slices
   first, then sorts)
3. ANDs take time O(MAX(|x|, |y|)) instead
   of O(MIN(|x|, |y|)).
4. All matches must be read to get sort field.
Riak Search Fixes

1. Inline fields for short and common
   attributes.
2. Dynamic fields for precomputed ANDs.
3. PRESORT option for sorting without
   document reads.
Inline Fields

Nifty feature added recently to Riak Search


Fields only used to prune result set can be
made inline for a big perf win


Normal query applied first – then results filtered
quickly with inline “filter” query


High storage cost – only viable for small fields!

                                               (Search)
Riak Search – Inline Fields cont.


             login:greg AND private:false

                       becomes
                   Query - login:greg
              Filter Query – private:false

 private:false is efficiently applied only to results of
 login:greg. Hooray!
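A rough sketch of how the split into query and filter query might be sent over HTTP. It assumes the Solr-style `select` endpoint of Riak Search and its `filter` parameter; the host, port, and index name below are hypothetical, and the code only builds the URL without contacting a server.

```python
from urllib.parse import urlencode

def search_url(host, index, query, filter_query=None, start=0, rows=20):
    """Build a Solr-style search URL (endpoint layout is an assumption)."""
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    if filter_query:
        # The inline-field condition goes in the filter, not in the main query.
        params["filter"] = filter_query
    return f"http://{host}/solr/{index}/select?{urlencode(params)}"

# Hypothetical host and index name.
url = search_url("localhost:8098", "clips", "login:greg", "private:false")
```

The main query stays selective (`login:greg`), and the broad condition (`private:false`) is only checked against the rows that query already matched.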
                                                       (Search)
Fixing ANDs

But what about login:greg AND text:iPhone?



text field is too large to inline!



We had to get creative.


                                         (Search)
Dynamic Fields
Our Solution: Create a new field - text_u
   (u for user)


The field name in text_u has the user’s name appended


In greg’s clip
 text:iPhone becomes text_greg:iPhone
In bob’s clip
 text:iPhone becomes text_bob:iPhone
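A minimal sketch of the idea, assuming each clip is a JSON-like dictionary with `login` and `text` fields. The helper names are hypothetical and this is not the actual Clipboard code.

```python
def add_dynamic_fields(clip):
    """Return a copy of the clip with a per-user text field added for indexing."""
    indexed = dict(clip)
    # The per-user field name precomputes "login:<user> AND text:<term>".
    indexed[f"text_{clip['login']}"] = clip["text"]
    return indexed

def rewrite_query(login, term):
    """Turn 'login:<login> AND text:<term>' into a single-term query."""
    return f"text_{login}:{term}"

indexed = add_dynamic_fields({"login": "greg", "text": "iPhone"})
query = rewrite_query("greg", "iPhone")
```

Because the AND is baked into the field name at index time, the query touches a single, small posting list instead of intersecting two large ones at the aggregator.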

                                            (Search)
Presort on Keys

• Our addition to Riak code base.
• Does sort before slice
• If PRESORT=key, then never reads the docs
• Tremendous win (> 100x compared to M/R
  approaches)
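The effect of sorting on keys can be sketched as follows. This is an illustration only: the key format, the `store` dictionary, and `fetch_doc` are hypothetical stand-ins, and the sketch assumes that key order matches the desired sort order.

```python
def page_by_key(matching_keys, fetch_doc, start, count):
    """Sort on keys alone, slice, and only then read the documents on the page."""
    ordered = sorted(matching_keys)              # no document reads needed here
    return [fetch_doc(key) for key in ordered[start:start + count]]

# Hypothetical store: 1000 clips whose keys begin with a zero-padded time.
store = {f"{t:04d}-clip": {"ctime": t} for t in range(1000)}
reads = []

def fetch_doc(key):
    reads.append(key)                            # track how many docs are read
    return store[key]

page = page_by_key(store.keys(), fetch_doc, start=990, count=10)
```

Only the 10 documents on the requested page are read; sorting on a document field would instead require reading all 1000 matches first.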
Clip Keys

<Time (ms)><User (guid)><SHA1 of Value>


• Base-64 encode each component
• Only use first 4 characters of user & content
• Only 16 bytes


Collisions? 1 in 17M if the same thing is clipped
at the same time.
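A sketch of how such a key could be assembled, under several assumptions: the time is packed into 6 bytes (8 Base-64 characters), the URL-safe Base-64 alphabet is used, and the function names are hypothetical. The slides do not say which alphabet was used. Standard Base-64 alphabets are not in ASCII order, so keys built this way do not necessarily sort chronologically as strings; an order-preserving alphabet would be needed for that.

```python
import base64
import hashlib
import uuid

def b64_prefix(data: bytes, length: int) -> str:
    """Base-64 encode the bytes and keep only the first `length` characters."""
    return base64.urlsafe_b64encode(data).decode("ascii")[:length]

def clip_key(time_ms: int, user_guid: uuid.UUID, value: bytes) -> str:
    time_part = b64_prefix(time_ms.to_bytes(6, "big"), 8)     # 48 bits, 8 chars
    user_part = b64_prefix(user_guid.bytes, 4)                # first 4 chars
    hash_part = b64_prefix(hashlib.sha1(value).digest(), 4)   # first 4 chars
    return time_part + user_part + hash_part

# Hypothetical inputs.
key = clip_key(1330000000000,
               uuid.UUID("12345678-1234-5678-1234-567812345678"),
               b"<html>clip</html>")

# Four Base-64 characters carry 24 bits.
prefix_space = 64 ** 4
```

Four Base-64 characters give 64^4 = 16,777,216 possible prefixes, which is consistent with the "1 in 17M" figure above.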
Our Query Processing

1. w AND (x AND y)
   (sort z, start = 990, count = 10)
2. On w_x node: search and send all w_x
3. On w_y node: search and send all w_y
4. On aggregator:
    1. Do w_x AND w_y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
Summary

• Use inline fields for short and common bits
• Use dynamic fields for prebuilt ANDs
• Use keys that imply sort order
• Use same techniques for pagination


• Our approach yields search throughput
  that is 100x better than out of the box (and
  better as you scale outward).
Questions?
We’re hiring!


       www.clipboard.com/register
          Invitation Code: just4u


        www.clipboard.com/jobs
         Or talk to us right now!



                                    Thanks!
