Riak Search
Performance Wins
 How we got > 100x improvement
            in query throughput


            Gary Flake, Founder
           gary@clipboard.com
Demo


       Introduction
Architecture
• web-01, web-02, web-03 (each Node.js + Nginx)
• riak-01, riak-02, riak-03, riak-04, riak-05
• cache-01, cache-02, cache-03
• redis-01, redis-02
• thumb-01, thumb-02
• job-01, job-02
• admin-01
Riak

An awesome NoSQL data store:

• Super easy to scale up AND down
• Fault tolerant – no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the
  Basho folks awesomely take our commits)
Riak – Basics

• Data in Riak is grouped into buckets
  (effectively namespaces)
• Basic operations are:
    •   Get, save, delete, search, map, reduce
• Eventual consistency managed through
  N, R, and W bucket parameters.
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js
  node library by Francisco Treacy
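
As a rough illustration of how N, R, and W interact: with N replicas, a read that waits for R replies and a write that waits for W acknowledgements are guaranteed to share at least one replica only when R + W > N. The helper below is a hypothetical sketch of that rule of thumb, not part of Riak or riak-js, and it ignores failure handling.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum.

    n, r, w mirror the N, R, and W bucket parameters; the function name is
    hypothetical and only checks the arithmetic condition R + W > N.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

For example, N=3 with R=2 and W=2 overlaps, while N=3 with R=1 and W=1 trades that overlap for lower latency.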
Data Model – Clips
Clip fields: title, ctime, domain, author, mentions, annotation, tags
Data Model - Clips
Clips are the gateway to all of our data

• Clip (key: abc)
• Blob (key: abc): <html> … </html>
• Comment Cache (comments on clip ‘abc’): “F1rst”, “Nice clip yo!”, “Saw this on Reddit…”
Other Buckets

• Users
• Blobs
• Comments
• Templates
• Counts
• Search Caches
• Transactions
Riak Search

• Gets many things out of Riak by something
  other than the primary key.
• You specify a schema (the types for the
  fields within a JSON object).
• Works great but with one big gotcha:
  – Index uses term-based partitioning instead
    of document-based partitioning
  – Implication: joins + sort + pagination suck
  – We know how to work around this
Riak Search – Querying

• Query syntax based on Lucene
• Basic Query
   text:funny
• Compound Query
   login:greg OR (login:gary AND tags:riak)
• Range Query
   ctime:[98685879630026 TO 98686484430026]
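
The three query shapes above can be composed from small string helpers. This is a minimal sketch; the helper names are hypothetical, and it does no escaping of special characters.

```python
def term(field: str, value: str) -> str:
    """Basic query: field:value."""
    return f"{field}:{value}"


def all_of(*clauses: str) -> str:
    """Parenthesized AND of the given clauses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Parenthesized OR of the given clauses."""
    return "(" + " OR ".join(clauses) + ")"


def between(field: str, low: int, high: int) -> str:
    """Range query: field:[low TO high]."""
    return f"{field}:[{low} TO {high}]"
```

For example, `any_of(term("login", "greg"), all_of(term("login", "gary"), term("tags", "riak")))` builds the compound query from the slide, wrapped in one extra pair of parentheses.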
Clipboard App Flow
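Participants: Client, node.js, Riak

1. Client → node.js: Go to clipboard.com/home
2. node.js → Riak: Search clips bucket (query = login:greg)
3. Riak → node.js: Top 20 results
4. node.js → Client: Top 20 results (client starts rendering)
5. Client → node.js: API request for blob (for each clip)
6. node.js → Riak: GET from blobs bucket
7. node.js → Client: Return blob to client (client renders blob)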
Clipboard Queries


                 login:greg



               mentions:greg



  ctime:[98685879630026 TO 98686484430026]

                                             (Search)
Clipboard Queries cont.



            login:greg AND tags:riak




  login:greg AND text:node AND text:javascript


Uh oh


               login:greg AND private:false
  login:greg matches only my clips; private:false matches 20% of all clips!




                login:greg AND text:iPhone



Index Partitioning Schemes
Doc Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On Each node:
    1. Perform x AND y
    2. Sort on z
    3. Slice [ 0 .. 1000 ]
    4. Send to aggregator
3. On aggregator
    1. Merge all results (N x 1000)
    2. Slice [ 990 .. 1000 ]
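
The steps above can be simulated in a few lines. This is a toy sketch, not Riak code: each shard is assumed to be a list of dicts with a `terms` set and a numeric sort field `z`, and all names are hypothetical.

```python
import heapq
from typing import List


def doc_partition_query(shards: List[List[dict]], x: str, y: str,
                        start: int, count: int) -> List[dict]:
    """Run `x AND y`, sorted on z, with document-based partitioning."""
    limit = start + count
    partials = []
    for shard in shards:
        # On each node: AND, sort on z, keep only the first start+count hits.
        hits = [d for d in shard if x in d["terms"] and y in d["terms"]]
        hits.sort(key=lambda d: d["z"])
        partials.append(hits[:limit])
    # On the aggregator: merge the already-sorted partial lists, then slice.
    merged = heapq.merge(*partials, key=lambda d: d["z"])
    return list(merged)[start:limit]
```

Each node ships at most start + count results, so the aggregator handles at most N x (start + count) entries regardless of how many documents match.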
Term Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On x node: search for x (and send all)
3. On y node: search for y (and send all)
4. On aggregator:
    1. Do x AND y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
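
For comparison, a toy sketch of the term-partitioned flow. It assumes each term's full posting list (a set of document keys) lives on one node and that the sort value for every key is available to the aggregator; the names and the `shipped` counter are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def term_partition_query(postings: Dict[str, Set[str]], z: Dict[str, int],
                         x: str, y: str, start: int,
                         count: int) -> Tuple[List[str], int]:
    """Run `x AND y`, sorted on z, with term-based partitioning.

    Returns the requested page of keys and the number of postings that had
    to be sent to the aggregator.
    """
    x_hits = postings.get(x, set())  # node holding x sends everything
    y_hits = postings.get(y, set())  # node holding y sends everything
    shipped = len(x_hits) + len(y_hits)
    matches = x_hits & y_hits                    # AND on the aggregator
    ordered = sorted(matches, key=lambda k: z[k])  # sort on z
    return ordered[start:start + count], shipped
```

The amount of data shipped depends on the size of each term's posting list, not on the page size requested.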
Riak Search Issues

1. For any singular term, all results must be
   sent back to aggregator.
2. Performs sort and slice in the wrong order
   (slices, then sorts)
3. ANDs take time O(MAX(|x|, |y|)) instead
   of O(MIN(|x|, |y|)).
4. All matches must be read to get sort field.
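
To make the O(MIN(|x|, |y|)) bound in item 3 concrete: when both result sets support constant-time membership tests, an AND only needs to walk the smaller set. The sketch below assumes in-memory Python sets and a hypothetical function name; it does not model the network cost of moving postings between nodes.

```python
from typing import Set


def and_by_smaller_side(x: Set[str], y: Set[str]) -> Set[str]:
    """Intersect two key sets in time proportional to the smaller one."""
    small, large = (x, y) if len(x) <= len(y) else (y, x)
    # One hash lookup per element of the smaller set.
    return {key for key in small if key in large}
```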
Riak Search Fixes

1. Inline fields for short and common
   attributes.
2. Dynamic fields for precomputed ANDs.
3. PRESORT option for sorting without
   document reads.
Inline Fields

Nifty feature added recently to Riak Search


Fields only used to prune result set can be
made inline for a big perf win


Normal query applied first – then results filtered
quickly with inline “filter” query


High storage cost – only viable for small fields!

Riak Search – Inline Fields cont.


             login:greg AND private:false

                       becomes
                   Query - login:greg
              Filter Query – private:false

 private:false is efficiently applied only to results of
 login:greg. Hooray!
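
A toy model of why this helps, assuming each index entry for the query term carries the inline field values alongside the document key. The `Posting` type and function name are hypothetical and do not reflect Riak Search's actual storage layout.

```python
from typing import Dict, List, NamedTuple


class Posting(NamedTuple):
    key: str                 # document key
    inline: Dict[str, str]   # small inline field values stored with the entry


def query_with_filter(index: Dict[str, List[Posting]], query_term: str,
                      filter_field: str, filter_value: str) -> List[str]:
    """Look up query_term, then filter its postings on an inline field."""
    return [p.key for p in index.get(query_term, [])
            if p.inline.get(filter_field) == filter_value]
```

The large `private:false` posting list is never fetched; only the entries already matched by `login:greg` are inspected.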
Fixing ANDs

But what about login:greg AND text:iPhone?



text field is too large to inline!



We had to get creative.


Dynamic Fields
Our Solution: Create a new field - text_u
   (u for user)


The field name text_u has the user’s name appended in place of u


In greg’s clip
 text:iPhone → text_greg:iPhone
In bob’s clip
 text:iPhone → text_bob:iPhone
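
A minimal sketch of the idea, assuming a clip is a dict with `login` and `text` entries. Both helper names are hypothetical; the point is only that the per-user field is written at index time, so the AND becomes a single-term lookup at query time.

```python
def index_fields(clip: dict) -> dict:
    """Return the fields to index for a clip, including the per-user text field."""
    fields = {"login": clip["login"], "text": clip["text"]}
    # Precomputed AND: same text, indexed under a field named after the user.
    fields[f"text_{clip['login']}"] = clip["text"]
    return fields


def rewrite_query(login: str, word: str) -> str:
    """Rewrite `login:<login> AND text:<word>` into one dynamic-field term."""
    return f"text_{login}:{word}"
```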

Presort on Keys

• Our addition to Riak code base.
• Does sort before slice
• If PRESORT=key, then never reads the docs
• Tremendous win (> 100x compared to M/R
  approaches)
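
To see why the order of sort and slice matters, compare the two on a list of keys. This toy sketch works on keys alone, which mirrors the point that sorting on the key needs no document reads; the function names are hypothetical.

```python
from typing import List


def slice_then_sort(keys: List[str], start: int, count: int) -> List[str]:
    """Take a page first, then sort it: only the page is ordered."""
    return sorted(keys[start:start + count])


def sort_then_slice(keys: List[str], start: int, count: int) -> List[str]:
    """Sort all matches by key first, then take the page."""
    return sorted(keys)[start:start + count]
```

With keys `["c", "a", "d", "b"]` and a page of two, slicing first yields `["a", "c"]`, while sorting first yields the true first page `["a", "b"]`.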
Clip Keys

<Time (ms)><User (guid)><SHA1 of Value>


• Base-64 encode each component
• Only use first 4 characters of user & content
• Only 16 bytes


Collisions? 1 in 17M if two users clip the same
thing at the same time.
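
A sketch of how such a key could be built. Several details are assumptions rather than the scheme from the slides: the time component is given 8 characters (16 minus 4 minus 4), it uses a 64-character alphabet whose ASCII order matches numeric order so that keys sort by time, and the user and content parts use Python's URL-safe Base64. All function names are hypothetical.

```python
import base64
import hashlib
import uuid

# Assumed alphabet: 64 characters in ascending ASCII order, so fixed-width
# encodings compare the same way as the numbers they encode.
SORTABLE = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode_time(ms: int, width: int = 8) -> str:
    """Encode a millisecond timestamp as a fixed-width, sortable string."""
    if not 0 <= ms < 64 ** width:
        raise ValueError("timestamp out of range")
    chars = []
    for _ in range(width):
        ms, digit = divmod(ms, 64)
        chars.append(SORTABLE[digit])
    return "".join(reversed(chars))


def short_b64(raw: bytes, length: int = 4) -> str:
    """Base64-encode raw bytes and keep only the first `length` characters."""
    return base64.urlsafe_b64encode(raw)[:length].decode("ascii")


def clip_key(ms: int, user: uuid.UUID, value: bytes) -> str:
    """Build a 16-character key: <time><user><SHA1 of value>."""
    return (encode_time(ms)
            + short_b64(user.bytes)
            + short_b64(hashlib.sha1(value).digest()))
```

Four Base64 characters carry 24 bits, so `64 ** 4` gives 16,777,216 possible user prefixes, which is where the roughly 1 in 17M figure comes from when time and content are identical.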
Our Query Processing

1. w AND (x AND y)
   (sort z, start = 990, count = 10)
2. On w_x node: search and send all w_x
3. On w_y node: search and send all w_y
4. On aggregator:
    1. Do w_x AND w_y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
Summary

• Use inline fields for short and common bits
• Use dynamic fields for prebuilt ANDs
• Use keys that imply sort order
• Use same techniques for pagination


• Our approach yields search throughput
  that is 100x better than out of the box (and
  better as you scale outward).
Questions?
We’re hiring!


       www.clipboard.com/register
          Invitation Code: just4u


        www.clipboard.com/jobs
         Or talk to us right now!



                                    Thanks!

Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Riak perf wins

  • 1. Riak Search Performance Wins How we got > 100x improvement in query throughput Gary Flake, Founder gary@clipboard.com
  • 2. Demo Introduction
  • 3. Architecture: web-01 to web-03 (Node.js + Nginx); riak-01 to riak-05; cache-01 to cache-03; redis-01 and redis-02; thumb-01 and thumb-02; job-01 and job-02; admin-01
  • 4. Riak: An awesome NoSQL data store • Super easy to scale up AND down • Fault tolerant – no SPoF • Flexible schema • Full-text search out of the box • Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)
  • 5. Riak – Basics • Data in Riak is grouped into buckets (effectively namespaces) • Basic operations are: get, save, delete, search, map, reduce • Eventual consistency is managed through the N, R, and W bucket parameters • Everything we put in Riak is JSON • We talk to Riak through the excellent riak-js node library by Francisco Treacy
  • 6. Data Model – Clips: title, ctime, domain, author, mentions, annotation, tags
  • 7. Data Model – Clips: Clips are the gateway to all of our data. Clip key abc → Blob (<html> … </html>) and Comment Cache (comments on clip ‘abc’: “F1rst”, “Nice clip yo!”, “Saw this on Reddit…”)
  • 8. Other Buckets • Users • Blobs • Comments • Templates • Counts • Search Caches • Transactions
  • 9. Riak Search • Gets many things out of Riak by something other than the primary key. • You specify a schema (the types for the fields within a JSON object). • Works great but with one big gotcha: – The index uses term-based partitioning instead of document-based partitioning – Implication: joins + sort + pagination sucks – We know how to work around this
  • 10. Riak Search – Querying • Query syntax based on Lucene • Basic query: text:funny • Compound query: login:greg OR (login:gary AND tags:riak) • Range query: ctime:[98685879630026 TO 98686484430026]
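
The three query shapes on this slide are plain strings, so they are easy to compose in code. A minimal sketch in Python; the helper names (`term`, `rng`, `all_of`) are hypothetical and not part of Riak or riak-js, and no escaping of special characters is attempted:

```python
def term(field: str, value: str) -> str:
    """Single-term clause, e.g. text:funny."""
    return f"{field}:{value}"

def rng(field: str, low: int, high: int) -> str:
    """Inclusive range clause, e.g. ctime:[1 TO 5]."""
    return f"{field}:[{low} TO {high}]"

def all_of(*clauses: str) -> str:
    """Parenthesised AND of several clauses."""
    return "(" + " AND ".join(clauses) + ")"

basic = term("text", "funny")
compound = f'{term("login", "greg")} OR {all_of(term("login", "gary"), term("tags", "riak"))}'
window = rng("ctime", 98685879630026, 98686484430026)

print(basic)
print(compound)
print(window)
```

The printed strings are the same three queries listed on the slide.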
  • 11. Clipboard App Flow (Client → node.js → Riak) 1. Client: go to clipboard.com/home 2. node.js → Riak: search clips bucket, query = login:greg 3. Riak → node.js → Client: top 20 results 4. Client: start rendering 5. For each clip: Client → node.js: API request for blob; node.js → Riak: GET from blobs bucket; node.js → Client: return blob; Client: render blob
  • 12. Clipboard Queries login:greg mentions:greg ctime:[98685879630026 TO 98686484430026] (Search)
  • 13. Clipboard Queries cont. login:greg AND tags:riak login:greg AND text:node AND text:javascript (Search)
  • 14. Uh oh • login:greg AND private:false: login:greg matches only my clips; private:false matches 20% of all clips! • login:greg AND text:iPhone (Search)
  • 16. Doc Partition Query Processing (1) Query: x AND y (sort z, start = 990, count = 10). (2) On each node: (a) perform x AND y; (b) sort on z; (c) slice [ 0 .. 1000 ]; (d) send to aggregator. (3) On aggregator: (a) merge all results (N x 1000); (b) slice [ 990 .. 1000 ].
  • 17. Term Partition Query Processing (1) Query: x AND y (sort z, start = 990, count = 10). (2) On x node: search for x (and send all). (3) On y node: search for y (and send all). (4) On aggregator: (a) do x AND y; (b) sort on z; (c) slice to [ 990 .. 1000 ].
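
To make the difference between slides 16 and 17 concrete, here is a small in-memory simulation in Python. It is only a sketch under stated assumptions: the document counts, the three-node layout, and the function names are invented for illustration, and nothing here reflects Riak's actual implementation. It counts how many results each scheme has to ship to the aggregator for the same page.

```python
import random

def make_docs(n, seed=0):
    """Synthetic docs: x matches 10%, y matches 25%, z is a unique sort value."""
    rng = random.Random(seed)
    z_values = list(range(n))
    rng.shuffle(z_values)
    return [{"id": i, "x": i % 10 == 0, "y": i % 4 == 0, "z": z_values[i]}
            for i in range(n)]

def doc_partition_query(nodes, start, count):
    """Each node holds whole documents: AND, sort, and slice happen locally."""
    sent = []
    for docs in nodes:
        hits = [d for d in docs if d["x"] and d["y"]]   # local x AND y
        hits.sort(key=lambda d: d["z"])                 # local sort on z
        sent.append(hits[: start + count])              # local slice [0 .. start+count]
    merged = sorted((d for part in sent for d in part), key=lambda d: d["z"])
    shipped = sum(len(part) for part in sent)
    return merged[start : start + count], shipped

def term_partition_query(docs, start, count):
    """Each term lives on one node: full posting lists go to the aggregator."""
    x_postings = [d for d in docs if d["x"]]            # sent in full by the x node
    y_postings = [d for d in docs if d["y"]]            # sent in full by the y node
    y_ids = {d["id"] for d in y_postings}
    hits = sorted((d for d in x_postings if d["id"] in y_ids), key=lambda d: d["z"])
    shipped = len(x_postings) + len(y_postings)
    return hits[start : start + count], shipped

docs = make_docs(20_000)
nodes = [[d for d in docs if d["id"] % 3 == k] for k in range(3)]
page_doc, shipped_doc = doc_partition_query(nodes, start=90, count=10)
page_term, shipped_term = term_partition_query(docs, start=90, count=10)
print(shipped_doc, shipped_term)
```

Both functions return the same page, but in this toy setup the document-partitioned version ships 300 results (3 nodes × 100) while the term-partitioned version ships 7,000 (every posting for x plus every posting for y).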
  • 18. Riak Search Issues 1. For any singular term, all results must be sent back to the aggregator. 2. Incorrectly orders sort and slice (slices first, then sorts). 3. ANDs take time O(MAX(|x|, |y|)) instead of O(MIN(|x|, |y|)). 4. All matches must be read to get the sort field.
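
Issue 3 is about which posting list drives the cost of an AND. As a generic illustration (not Riak Search's code), the Python sketch below intersects two sorted lists by walking the smaller one and binary-searching the larger one. That costs roughly |small| × log|large| comparisons, which is close to the O(MIN) behaviour the slide asks for rather than a full scan of the larger list. The function name is hypothetical.

```python
from bisect import bisect_left

def intersect_small_into_large(small, large):
    """Intersect two sorted lists of doc ids by probing the larger one."""
    out = []
    lo = 0
    for doc_id in small:
        # Only search to the right of the previous probe position.
        lo = bisect_left(large, doc_id, lo)
        if lo == len(large):
            break                      # nothing larger remains to match
        if large[lo] == doc_id:
            out.append(doc_id)
    return out

small = [3, 50, 1000, 99999]           # e.g. postings for login:greg
large = list(range(0, 100_000, 2))     # e.g. postings for a very common term
print(intersect_small_into_large(small, large))
```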
  • 19. Riak Search Fixes 1. Inline fields for short and common attributes. 2. Dynamic fields for precomputed ANDs. 3. PRESORT option for sorting without document reads.
  • 20. Inline Fields • Nifty feature added recently to Riak Search • Fields only used to prune the result set can be made inline for a big perf win • Normal query applied first, then results filtered quickly with an inline “filter” query • High storage cost: only viable for small fields! (Search)
  • 21. Riak Search – Inline Fields cont. • login:greg AND private:false becomes Query: login:greg, Filter Query: private:false • private:false is efficiently applied only to results of login:greg. Hooray! (Search)
  • 22. Fixing ANDs • But what about login:greg AND text:iPhone? • The text field is too large to inline! • We had to get creative. (Search)
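
The query-plus-filter split can be sketched with a toy in-memory index in Python. This is an illustration only: the `index` layout and the `search` function are hypothetical, and the assumption is simply that each posting carries its inline field values, so the filter runs over the results of the main query without touching the large private:false posting list.

```python
# Hypothetical index: term -> list of (doc_key, inline_fields).
index = {
    "login:greg": [
        ("k1", {"private": "false"}),
        ("k2", {"private": "true"}),
        ("k3", {"private": "false"}),
    ],
    # The large posting list we would rather not ship to the aggregator.
    "private:false": [("k1", {}), ("k3", {})]
                     + [(f"other{i}", {}) for i in range(100_000)],
}

def search(query_term, filter_field=None, filter_value=None):
    """Run the main query, then filter its postings on an inline field.

    Returns (matching keys, number of postings touched)."""
    postings = index[query_term]
    touched = len(postings)
    if filter_field is None:
        return [key for key, _ in postings], touched
    keys = [key for key, inline in postings
            if inline.get(filter_field) == filter_value]
    return keys, touched

keys, touched = search("login:greg", "private", "false")
print(keys, touched)
```

Only greg's three postings are touched; the 100,002 postings under private:false are never read.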
  • 23. Dynamic Fields • Our solution: create a new field, text_u (u for user) • Values in text_u have the user’s name appended • In greg’s clip: text:iPhone → text_greg:iPhone • In bob’s clip: text:iPhone → text_bob:iPhone (Search)
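
A minimal Python sketch of the dynamic-field idea, assuming each clip is a dict with `login` and `text` keys. The function names `index_fields` and `rewrite` are hypothetical and only illustrate the naming scheme from the slide:

```python
def index_fields(clip: dict) -> dict:
    """Fields to index for one clip, including the per-user dynamic field."""
    fields = {"login": clip["login"], "text": clip["text"]}
    # Dynamic field: text_<login> holds the same text, so a single-term
    # lookup on it already means "this user AND this word".
    fields[f"text_{clip['login']}"] = clip["text"]
    return fields

def rewrite(login: str, word: str) -> str:
    """Rewrite login:<login> AND text:<word> into one dynamic-field term."""
    return f"text_{login}:{word}"

print(index_fields({"login": "bob", "text": "iPhone rocks"}))
print(rewrite("greg", "iPhone"))
```

The two-term AND becomes a single-term lookup, so only one (small) posting list has to be sent to the aggregator.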
  • 24. Presort on Keys • Our addition to Riak code base. • Does sort before slice • If PRESORT=key, then never reads the docs • Tremendous win (> 100x compared to M/R approaches)
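
Why the order of sort and slice matters can be shown in a few lines of Python. This is a sketch with made-up keys, not Riak code: when the sort field is the key itself, the whole match set can be ordered and paged without reading a single document.

```python
def slice_then_sort(keys, start, count):
    """Cut the page from arrival order, then sort it: wrong rows, right order."""
    return sorted(keys[start : start + count])

def sort_then_slice(keys, start, count):
    """Sort the full match set by key first, then cut the page."""
    return sorted(keys)[start : start + count]

# Hypothetical keys in the order the index happened to return them.
matches = ["k07", "k02", "k09", "k01", "k05", "k03"]

print(slice_then_sort(matches, 0, 3))
print(sort_then_slice(matches, 0, 3))
```

The first call returns k02, k07, k09, which is not the true first page; the second returns k01, k02, k03.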
  • 25. Clip Keys <Time (ms)><User (guid)><SHA1 of Value> • Base-64 encode each component • Only use first 4 characters of user & content • Only 16 bytes Collisions? 1 in 17M if clipped the same thing at same time.
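
One way to build a 16-character key of this shape is sketched below in Python. Several details are assumptions, not taken from the slides: the timestamp is given 8 characters (48 bits) so that 8 + 4 + 4 = 16, the user and content parts use the first 24 bits of the GUID and of the SHA1 digest, and the 64-character alphabet is a custom one in ASCII order so that keys sort the same way as their timestamps (the standard Base64 alphabet does not sort that way). The collision figure follows from the 4-character components: 64^4 = 2^24 = 16,777,216, or about 1 in 17M.

```python
import hashlib
import uuid

# Assumption: 64 symbols in ascending ASCII order, so string comparison of
# encoded values agrees with numeric comparison of the originals.
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

def encode_int(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` base-64 digits."""
    digits = []
    for _ in range(width):
        value, rem = divmod(value, 64)
        digits.append(ALPHABET[rem])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(digits))

def clip_key(time_ms: int, user: uuid.UUID, content: bytes) -> str:
    """<time><user><sha1 of value>: 8 + 4 + 4 = 16 characters."""
    user_bits = int.from_bytes(user.bytes[:3], "big")                     # first 24 bits
    sha_bits = int.from_bytes(hashlib.sha1(content).digest()[:3], "big")  # first 24 bits
    return encode_int(time_ms, 8) + encode_int(user_bits, 4) + encode_int(sha_bits, 4)

user = uuid.UUID("12345678-1234-5678-1234-567812345678")  # made-up GUID
earlier = clip_key(1_000, user, b"<html>...</html>")
later = clip_key(2_000, user, b"<html>...</html>")
print(earlier, later, earlier < later)
```

Because the time component comes first and the alphabet is order-preserving, sorting keys as strings sorts clips by time, which is what lets a key-based presort stand in for a sort on ctime.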
  • 26. Our Query Processing (1) Query: w AND (x AND y) (sort z, start = 990, count = 10). (2) On w_x node: search and send all w_x. (3) On w_y node: search and send all w_y. (4) On aggregator: (a) do w_x AND w_y; (b) sort on z; (c) slice to [ 990 .. 1000 ].
  • 27. Summary • Use inline fields for short and common bits • Use dynamic fields for prebuilt ANDs • Use keys that imply sort order • Use the same techniques for pagination • Our approach yields search throughput that is 100x better than out of the box (and better as you scale outward).
  • 29. We’re hiring! www.clipboard.com/register Invitation Code: just4u www.clipboard.com/jobs Or talk to us right now! Thanks!