Percona Live London 2014: Serve out any page with an HA Sphinx environment

Serve out any page with an HA Sphinx environment
Art van Scheppingen
Head of Database Engineering
2
1. Who is Spil Games?
2. What is Sphinx Search?
3. Make Sphinx highly available
4. How does Spil Games use Sphinx?
5. Sphinx benchmarks
6. Questions?
Overview
Who are we?
Who is Spil Games?
4
• Game publishers & distributors
• Company founded in 2001
• 130+ employees
• 150M+ unique visitors per month
• Over 60M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real time multiplayer games
• Mobile (html5) games
• 40+ MySQL clusters
• 65k queries per second
• 10 Sphinx servers
• 8k queries per second
Facts
5
Geographic Reach
150 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
6
Girls, Teens and Family
spielen.com
juegos.com
gamesgames.com
games.co.uk
Brands
Sphinx Search
Advanced searching
8
• MyISAM / InnoDB (5.6.4 or higher)
CREATE TABLE articles (
id int(11) not null auto_increment,
author varchar(40) not null,
title varchar(50) not null,
body text,
PRIMARY KEY (id),
FULLTEXT idx (title, body)
) ENGINE=InnoDB;
• SELECT id, author FROM articles WHERE MATCH (title,body)
AGAINST ('somephrase');
• Complex queries
• SELECT id, author, MATCH (title,body) AGAINST ('somephrase' IN
BOOLEAN MODE) as score FROM articles ORDER BY score DESC,
id ASC;
• Drawbacks:
• Slow response times
Full text search in MySQL
9
• PostgreSQL tsquery
• Elasticsearch
• Apache Lucene
• Sphinx Search
• Many other alternatives
Alternatives to MySQL full text search
10
• Sphinx
• SELECT author FROM articles WHERE
MATCH('@(title,body) database');
• Complex queries
• SELECT author FROM articles WHERE
MATCH('@(title,body) database') ORDER BY
WEIGHT() DESC, id ASC;
• Drawbacks:
• Not a straightforward swap
• Specialized knowledge is needed
Full text search in Sphinx
11
• Generic (site) search
• Document search
• Log data analysis
• Geo-distance calculation
Sphinx Search typical use cases
12
• Consists of two components
• Indexer
• Indexes (textual) data
• Search daemon
• Searches indexes and returns matched items
• Three types of indexes:
• Disk indexes
• Real Time indexes
• Distributed indexes
Sphinx is a full text search engine
13
• Comparable to archive tables
• Indexer indexes data and updates full index
• Index is “written once”
• Only attributes can be changed at run time
• Use --rotate to reload new indexes
• Fewer resources needed (RAM/CPU)
• Not dependent on a specific database engine
• MySQL
• PostgreSQL
• MSSQL
• ODBC
• XML/TSV pipes
Disk indexes
14
• Comparable to normal tables
• Online indexes
• Will be (eventually) written to disk
• Dynamically alter the indexes
• Insert/replace/delete operations
• Consume more memory
• Changes are generally updated within milliseconds
• Sometimes stalls for seconds, so not guaranteed
• High update rate influences the performance
Real time indexes
15
• Comparable to federated tables in MySQL
• Distribute the search over multiple nodes
• Many smaller indexes
• Sends queries to all defined nodes/indexes
• Aggregates and merges results
• The slowest node determines the response time
• Setting timeouts keeps this bounded
Distributed indexes
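A minimal Python sketch of the scatter/gather behaviour listed on this slide, not Sphinx's own implementation: the query goes to every node, the results are merged, and a timeout keeps a slow node from delaying the response. The node callables, `make_node` and the `weight`/`id` keys are hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_node(matches, delay_s):
    """Hypothetical stand-in for one remote index: answers after delay_s seconds."""
    def node(query):
        time.sleep(delay_s)
        return list(matches)
    return node


def search_distributed(nodes, query, timeout_s):
    """Send the query to all nodes, wait at most timeout_s, merge what arrived."""
    pool = ThreadPoolExecutor(max_workers=len(nodes))
    futures = [pool.submit(node, query) for node in nodes]
    done, late = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on nodes that missed the timeout
    merged = []
    for future in done:
        if future.exception() is None:
            merged.extend(future.result())
    # Best weight first, document id as tie-breaker
    merged.sort(key=lambda match: (-match["weight"], match["id"]))
    return merged, len(late)
```

The second return value counts the nodes that were skipped, which is the trade-off the timeout buys: a bounded response time at the cost of possibly incomplete results.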
16
• Two types of data:
• Fields
• Textual data to be indexed
• Attributes
• Data to sort/filter upon
• Special: unique identifier
• Special: (last update) timestamp
• Example:
+-------+----------------+---------------+-----------------+
| id | author | title | publishing_date |
+-------+----------------+---------------+-----------------+
| 12345 | Linus Torvalds | Just for fun | 2002-06-04 |
+-------+----------------+---------------+-----------------+
Indexing: attributes and fields
17
• Support for stopwords
• Ignore common words like “and”, “the” and “to”
• Ignore specific words like “game” and “juego”
• Still affects the keyword position
• Language and characters
• Morphology
• Similar words
• Lemmatization
• Run/ran/running
• Character folding
• U+FF10..U+FF19->0..9
Indexing: stopwords and stemmers
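Two of the points above, sketched in Python under simplifying assumptions (whitespace tokenization, a hard-coded stopword set): a stopword is dropped from the index but still consumes a keyword position, and full-width digits U+FF10..U+FF19 are folded to 0..9. This illustrates the idea only and is not how Sphinx tokenizes internally.

```python
# Stopwords taken from the slide; the set itself is illustrative.
STOPWORDS = {"and", "the", "to", "game", "juego"}


def fold_fullwidth_digits(text):
    """Map U+FF10..U+FF19 (full-width digits) onto ASCII 0..9."""
    return "".join(
        chr(ord(ch) - 0xFF10 + ord("0")) if 0xFF10 <= ord(ch) <= 0xFF19 else ch
        for ch in text
    )


def index_terms(text):
    """Return (position, keyword) pairs; stopwords are skipped but keep their slot."""
    terms = []
    words = fold_fullwidth_digits(text).lower().split()
    for position, word in enumerate(words, start=1):
        if word not in STOPWORDS:
            terms.append((position, word))
    return terms
```

Because positions are preserved, a phrase query still sees the gap left by the removed word.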
18
• Search daemon has three interfaces:
• SphinxAPI: Native Sphinx binary protocol
• SphinxQL: MySQL protocol
• SphinxSE: MySQL/MariaDB integration
• Example native:
<?php
$s = new SphinxClient;
$s->setServer("localhost", 6712);
$s->setMatchMode(SPH_MATCH_ANY);
$s->setMaxQueryTime(3);
$result = $s->query("somephrase", "articles");
var_dump($result);
?>
• Example SphinxQL:
echo "SELECT author FROM articles WHERE MATCH('@(title,body)
somephrase') ORDER BY WEIGHT() DESC, id ASC;" | mysql -h 127.0.0.1 -P 6713
Searching: the interfaces
19
• Supports various ranking algorithms:
• None
• Any
• Phrase proximity
• Okapi BM25 (probabilistic)
• Wordcount
• Many more
• User weighting
• Boost columns with a multiplier
Searching: Search daemon
20
mysql> SELECT title, id, publication_date FROM articles WHERE
MATCH('@(title,body) database') ORDER BY WEIGHT() DESC, publication_date ASC
LIMIT 0,5 OPTION field_weights=(title=10,body=3);
+-----------------------------+-------+------------------+
| title | id | publication_date |
+-----------------------------+-------+------------------+
| MySQL Cookbook | 75532 | 2014-07-01 |
| High performance MySQL | 94325 | 2012-04-02 |
| MySQL Administrator’s Bible | 63627 | 2009-05-11 |
| MySQL (4th Edition) | 39922 | 2008-09-08 |
| MySQL in a nutshell | 58793 | 2008-04-01 |
+-----------------------------+-------+------------------+
5 rows in set (0.01 sec)
Returned data
Making Sphinx Highly Available
22
• Application handles:
• Connections
• Failovers
• Timeouts
• Distribution scheme
• Random
• Round robin
• Weighted
• Be creative!
Client side HA
Client side HA
[Diagram: application servers Server-1, Server-2 ... Server-n connect directly to Sphinx Node 1, Node 2 ... Node n. A second version of the diagram carries the label "Timeouts".]
25
<?php
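// Try each Sphinx node in turn (SphinxQL speaks the MySQL protocol)
// and return the first connection that succeeds.
function mysql_ha_connect(array $servers) {
    // Report failures through connect_error instead of exceptions
    // (exceptions are the default from PHP 8.1 onwards).
    mysqli_report(MYSQLI_REPORT_OFF);
    foreach ($servers as $server) {
        $mysqli = new mysqli($server, 'user', 'pass', '', 9306);
        if (is_null($mysqli->connect_error)) {
            return $mysqli;
        }
    }
    return false;
}
$servers = array('node1.domain.com', 'node2.domain.com');
shuffle($servers); // random distribution over the nodes
$connection = mysql_ha_connect($servers);
if ($connection === false) {
    die('Could not connect to any node');
}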
…
Client side HA Example
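The PHP example above shuffles the node list, which gives the random distribution scheme. A weighted scheme with failover could look like the Python sketch below. The weights, the `connect` callable and the node names are assumptions for illustration; a real client would open a MySQL-protocol connection inside `connect`.

```python
import random


def weighted_order(weights, rng=random):
    """Return every server once, in a random order biased towards higher weights."""
    remaining = dict(weights)
    order = []
    while remaining:
        servers = list(remaining)
        pick = rng.choices(servers, weights=[remaining[s] for s in servers], k=1)[0]
        order.append(pick)
        del remaining[pick]
    return order


def connect_with_failover(weights, connect, rng=random):
    """Try servers in weighted order; return (server, connection) or None."""
    for server in weighted_order(weights, rng):
        try:
            return server, connect(server)
        except OSError:
            continue  # node unreachable: fail over to the next candidate
    return None
```

Every server appears exactly once in the order, so a failed node is never retried within the same request.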
26
• Application connects to one single host
• LB / Proxy handles:
• Connections
• Failovers
• Timeouts
• Solutions:
• HAProxy
• MySQL Proxy
• MaxScale(?)
• Distribution scheme
• Random
• Round robin
• Weighted
• Least connections
• Fastest response
Load balancer / Proxy
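Of the distribution schemes listed, least connections is the one that needs state. A small Python sketch of the bookkeeping, assuming the balancer learns when a connection opens and closes; the class and node names are hypothetical and not taken from HAProxy or any other product.

```python
class LeastConnections:
    """Pick the node that currently has the fewest open connections."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def acquire(self):
        # Ties go to the node that was registered first
        node = min(self.active, key=lambda n: self.active[n])
        self.active[node] += 1
        return node

    def release(self, node):
        if node in self.active:
            self.active[node] -= 1

    def remove(self, node):
        # A failed node is taken out of rotation
        self.active.pop(node, None)
```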
Load Balancer / Proxy
[Diagram: application servers Server-1, Server-2 ... Server-n connect to a load balancer, which forwards to Sphinx Node 1, Node 2 ... Node n. A second version of the diagram carries the label "Removed from load balancer".]
29
• Application connects to Sphinx on localhost
• Sphinx agent mirroring handles:
• Connections
• Failovers
• Timeouts
• Distribution scheme
• Random
• Round robin
• Nodeads (removes dead mirrors)
• Noerrors (removes worse performing mirrors)
Sphinx agent mirroring
Sphinx agent mirroring
[Diagram: application servers Server-1, Server-2 ... Server-n each have their own Sphinx instance in front of Sphinx Node 1, Node 2 ... Node n. A second version of the diagram carries the label "Removed from Sphinx".]
32
Sphinx agent mirroring example
index dist {
type = distributed
ha_strategy = nodeads
agent_query_timeout = 100
agent = node1:9312|node2:9312|node3:9312:game_index
}
How do we use Sphinx Search?
Not only search
34
• Started using Sphinx in 2009
• Simple game search
• Replaced our MySQL / MyISAM search
• Added search for multiple columns
• Change weight per column
• Distributed mirrored indexes
• Index rebuilds performed per node
• Updates happen more frequently
Game search
35
Distributed mirrored indexes
[Diagram: an application server in front of Sphinx Node 1 (Brand A, Brand B) and Sphinx Node 2 (Brand A+, Brand B+).]
37
• Profile service
• Friends function
• Searches friends on
• username
• firstname / lastname
• Find friends across portals (within brands)
• Distributed partitioned index
Friends search
38
Distributed partitioned index
[Diagram: an application server in front of Sphinx Node 1 and Sphinx Node 2. Node 1 holds four partitions: ">= today", ">= this month, <= today", ">= 3 months, <= this month" and "<= 3 months". Node 2 is shown with a ">= today" partition.]
40
• ROAR is a database abstraction layer
• See Percona Live Santa Clara 2014 presentation
• Sphinx complementary to MySQL and Couchbase
• Translate a title to a gamepage
• Search url parts to fetch the application id
• Translate keywords to lists of games
• Search url parts to fetch a list of application ids
• Filter applications on portal and brand
• Filter applications on browser capabilities
• Sort on publishing date, popularity and rating
ROAR storage layer
41
• Legacy:
• Url without identifiers
• There can only be one game with the same url
• Sphinx does a fast lookup of (existing) game to id
• Example:
http://www.agame.com/game/rig-bmx
Translates into application id 123456
• Future improvements:
• Correct non-existing pages (404)
http://www.agame.com/game/rig-bmxx
with a redirect (301) to:
http://www.agame.com/game/rig-bmx
Translating a title to a gamepage
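The lookup and the proposed 404 correction can be sketched in Python as follows. This is an illustration only: a dictionary and `difflib` from the standard library stand in for the Sphinx index, and the similarity cutoff of 0.8 is an assumption. Only the slug `rig-bmx` and application id 123456 come from the slide.

```python
import difflib
from urllib.parse import urlparse

BASE_URL = "http://www.agame.com/game/"
GAMES = {"rig-bmx": 123456}  # url slug -> application id


def resolve(url):
    """Return (200, application id), (301, corrected url) or (404, None)."""
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    if slug in GAMES:
        return 200, GAMES[slug]
    # Stand-in for a fuzzy match against the index
    close = difflib.get_close_matches(slug, list(GAMES), n=1, cutoff=0.8)
    if close:
        return 301, BASE_URL + close[0]
    return 404, None
```

Since a url maps to at most one game, the exact lookup is unambiguous; only the fuzzy step needs a threshold.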
44
• Filter on url parts
• One or multiple
• Complex filtering on capabilities
• Blacklist incompatible games (Flash/Unity)
Translating keywords to game listings
45
• Example 1 url part:
http://www.agame.com/games/puzzle
Sends this query to Sphinx:
SELECT title, appid FROM game_index WHERE brandid=1 AND portalid=88 AND
MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION
max_matches=10000;
• Example 2 url parts:
http://www.agame.com/games/puzzle/match-3
Sends this query to Sphinx:
SELECT title, appid FROM game_index WHERE brandid=1 AND portalid=88 AND
MATCH('@url "puzzle" && "match-3" ') ORDER BY date_onsite desc LIMIT 0,10
OPTION max_matches=10000;
Search on url parts
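How a url could be turned into the queries above, as a hedged Python sketch. The function name and the whitelist of allowed characters are assumptions; the generated SphinxQL follows the two examples on the slide. Rejecting anything outside the whitelist avoids having to escape quotes inside the MATCH string.

```python
import re
from urllib.parse import urlparse

SAFE_PART = re.compile(r"^[a-z0-9-]+$")  # assumed shape of a url part


def build_listing_query(url, brandid, portalid, limit=10):
    """Turn /games/<part>[/<part>...] into a SphinxQL listing query."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    parts = segments[1:]  # drop the leading "games" segment
    if not parts or not all(SAFE_PART.match(p) for p in parts):
        raise ValueError("unexpected url parts: %r" % parts)
    phrases = " && ".join('"%s"' % p for p in parts)
    return (
        "SELECT title, appid FROM game_index "
        "WHERE brandid=%d AND portalid=%d AND MATCH('@url %s') "
        "ORDER BY date_onsite desc LIMIT 0,%d OPTION max_matches=10000;"
        % (brandid, portalid, phrases, limit)
    )
```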
48
• Blacklisting is performed on a bitmask that encodes capabilities
• Example normal desktop browser (no filter):
http://www.agame.com/games/puzzle
Opening the puzzle category on a desktop sends this query to Sphinx:
SELECT title, appid,(bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter FROM
game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND
bitcheck = 0 ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000;
• Example Chrome on Android 4.4 (filter out 11):
http://www.agame.com/games/puzzle
Opening the puzzle category on a Nexus 7 sends this query to Sphinx:
SELECT title, appid,(bitmask1 & 11) AS bitcheck, (bitmask1 & 11) AS bitfilter
FROM game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND
bitcheck = 0 ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000;
Filter on browser capabilities
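The bitmask check in these queries, sketched in Python. The bit assignments are hypothetical (the slides do not say which capability each bit stands for); they are chosen so that three unsupported capabilities add up to the filter value 11 used in the Android example.

```python
# Hypothetical capability bits; a game's bitmask1 has a bit set for
# every capability the game requires.
FLASH, UNITY, JAVA, KEYBOARD = 1, 2, 4, 8


def device_filter(unsupported):
    """Combine the capabilities a device lacks into one filter mask."""
    mask = 0
    for bit in unsupported:
        mask |= bit
    return mask


def playable(games, filter_mask):
    """Keep games whose requirements do not overlap the filter (bitcheck = 0)."""
    return [title for title, bitmask1 in games if (bitmask1 & filter_mask) == 0]
```

With a filter of 0, as in the desktop example, `bitmask1 & 0` is always 0 and nothing is filtered out.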
51
• Real time indexes decreased performance
• Make the indexing process “nicer”
/bin/taskset 0x00000001 /usr/bin/indexer --all --config /etc/sphinx.conf
• Send statistics to Graphite
http://engineering.spilgames.com/tamed-sphinx-search/
What we encountered
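Sending statistics to Graphite can be as small as the Python sketch below, which uses Graphite's plaintext protocol (one `path value timestamp` line per metric, by default on TCP port 2003). The metric prefix, the counter names and the values are made up for illustration; this does not reproduce the approach from the linked blog post.

```python
import socket


def graphite_lines(prefix, stats, timestamp):
    """Format counters as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return "".join(
        "%s.%s %s %d\n" % (prefix, name, value, timestamp)
        for name, value in sorted(stats.items())
    )


def send_to_graphite(host, payload, port=2003):
    """Ship the formatted lines to a Graphite/carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(payload.encode("ascii"))
```

The counters themselves could be read from the search daemon, for example with `SHOW STATUS` over SphinxQL, and passed in as the `stats` dictionary.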
Benchmarking Sphinx
53
• Sysbench 0.5
• Custom Lua scripts
• Disabled caching
• OpenStack virtual machines:
• Benchmark driver: 4 core CPU, 4GB memory
• Sphinx nodes: 4 core CPU, 16GB memory
• MySQL nodes: 4 core CPU, 16GB memory
• At least 3 runs per test
• The average over the runs is reported
• Tests were repeated when outliers were found
Sphinx Benchmark specifications
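The reporting procedure (95th percentile per run, averaged over at least three runs, repeated when a run is an outlier) could be expressed as in this Python sketch. The nearest-rank percentile and the 25% outlier tolerance are assumptions; sysbench reports its own percentile, and the slides do not state how outliers were judged.

```python
from statistics import mean


def percentile_95(samples):
    """95th percentile by the nearest-rank method (integer arithmetic only)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def has_outlier(per_run, tolerance=0.25):
    """True when any run deviates from the average by more than the tolerance."""
    average = mean(per_run)
    return any(abs(value - average) > tolerance * average for value in per_run)


def summarize(runs):
    """Average the per-run 95th percentiles; None means the test must be repeated."""
    per_run = [percentile_95(run) for run in runs]
    if len(per_run) < 3 or has_outlier(per_run):
        return None
    return mean(per_run)
```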
54
• InnoDB discrete match
SELECT l.url, gd.title, g.appid, bitmask1, date_onsite FROM games g LEFT
JOIN game_capabilities gc ON g.appid=gc.app INNER JOIN game_cat c ON
g.appid = c.appid AND g.portalid = c.portalid AND g.brandid = c.brandid
INNER JOIN cat_data cd ON c.portalid = cd.portalid AND c.brandid =
cd.brandid AND c.catname = cd.catname WHERE g.brandid=1 AND g.portalid=88
AND cd.url='puzzle' ORDER BY date_onsite desc LIMIT 0,10;
• Sphinx single phrase
SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS
bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0
AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION
max_matches=10000;
InnoDB vs Sphinx
55
InnoDB vs Sphinx
[Chart: 95th percentile response time in ms (axis 0 to 300) against number of threads (4, 8, 16, 24, 32, 48, 64). Series: Sphinx single phrase, InnoDB discrete match.]
56
• MyISAM single match-against
Select title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`)
AGAINST('puzzle') AS score FROM game_index WHERE MATCH(`url`)
AGAINST('puzzle') AND portalid=88 AND brandid=1 AND (bitmask1 & 0) = 0
ORDER BY score DESC, date_onsite DESC LIMIT 0,10;
• Sphinx single phrase
SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS
bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0
AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION
max_matches=10000;
MyISAM full text vs Sphinx 1
57
MyISAM full text vs Sphinx 1
[Chart: 95th percentile response time in ms (axis 0 to 2000) against number of threads (4, 8, 16, 24, 32, 48, 64). Series: Sphinx single phrase, MyISAM single match-against.]
58
• MyISAM multiple match-against
SELECT title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`)
AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AS score FROM game_index WHERE
MATCH(`url`) AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AND portalid=88 AND
brandid=1 AND (bitmask1 & 0) = 0 ORDER BY score DESC, date_onsite DESC
LIMIT 0,10;
• Sphinx multiple phrases
SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS
bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0
AND MATCH('@url "puzzle" && "sudoku"') ORDER BY date_onsite desc LIMIT 0,10
OPTION max_matches=10000;
MyISAM full text vs Sphinx 2
59
MyISAM full text vs Sphinx 2
[Chart: 95th percentile response time in ms (axis 0 to 250) against number of threads (4, 8, 16, 24, 32, 48, 64). Series: MyISAM multiple match-against, Sphinx multiple phrases.]
60
MyISAM full text vs Sphinx 2
[Chart: 95th percentile response time in ms (axis 0 to 2000) against number of threads (4, 8, 16, 24, 32, 48, 64). Series: Sphinx single phrase, MyISAM multiple match-against, Sphinx multiple phrases, MyISAM single match-against.]
61
InnoDB vs MyISAM vs Sphinx
[Chart: 95th percentile response time in ms (axis 0 to 4000) against number of threads (4, 8, 16, 24, 32, 48, 64). Series: Sphinx single phrase, InnoDB single match-against, MyISAM single match-against.]
62
• Sphinx on localhost
• Talks MySQL on localhost
• One or two remote agent(s)
• Sphinx behind load balancer
• Proxies MySQL
Sphinx HA solutions
63
Sphinx HA solutions
[Chart: average response time in ms (axis 0 to 160) against number of threads (4, 8, 16, 24, 32, 48, 64). Series: Direct connection single host, Localhost 2 nodes, Localhost 1 node, Load Balancer 2 nodes.]
64
• Sphinx Search is faster than MySQL full text search
• Smaller result sets increase performance
• Due to sorting by relevance
• Smaller temporary tables
• InnoDB performs worse than MyISAM
• Sphinx agent mirroring performs better
• Probably due to Sphinx native protocol
• The load balancer seems to perform better
• Probably due to dedicated (better) hardware
Conclusion
Questions?
66
• This presentation can be found at:
http://spil.com/pluk2014sphinx
• Sphinx Search:
http://www.sphinxsearch.com
• Sending Sphinx Search metrics to Graphite:
http://engineering.spilgames.com/tamed-sphinx-search/
• About the ROAR storage layer:
http://spil.com/plsc2014storage
• If you wish to contact me:
Email: art@spilgames.com
Twitter: @banpei
Blog: http://engineering.spilgames.com
Twitter Spil Engineering: @spilengineering
Thank you!
67
• Google Snail Search: Boomerang Cards
http://data.boomerang.nl/b/boomerang/image/google-classic/s600/3.jpg
• Jean-Claude van Damme: Volvo Trucks
http://www.volvotrucks.com/trucks/UAE-market/en-ae/newsmedia/pressreleases/Pages/pressreleases.aspx?pubid=17613
• Bench mates: Craig Sunter
https://www.flickr.com/photos/16210667@N02/12381776985
Photo sources
Percona Live London 2014: Serve out any page with an HA Sphinx environment

  • 1. Serve out any page with an HA Sphinx environment Art van Scheppingen Head of Database Engineering
  • 2. Overview
      1. Who is Spil Games?
      2. What is Sphinx Search?
      3. Make Sphinx highly available
      4. How does Spil Games use Sphinx?
      5. Sphinx benchmarks
      6. Questions?
  • 3. Who are we? Who is Spil Games?
  • 4. Facts
      • Game publishers & distributors
      • Company founded in 2001
      • 130+ employees
      • 150M+ unique visitors per month
      • Over 60M registered users
      • 45 portals in 19 languages
      • Casual games
      • Social games
      • Real time multiplayer games
      • Mobile (html5) games
      • 40+ MySQL clusters
      • 65k queries per second
      • 10 Sphinx servers
      • 8k queries per second
  • 5. Geographic Reach: 150 Million Monthly Active Users(*). Source: (*) Google Analytics, August 2012
  • 6. Brands: Girls, Teens and Family (spielen.com, juegos.com, gamesgames.com, games.co.uk)
  • 8. Full text search in MySQL
      • MyISAM / InnoDB (5.6.4 or higher)
          CREATE TABLE articles (
            id int(11) not null auto_increment,
            author varchar(40) not null,
            title varchar(50) not null,
            body text,
            PRIMARY KEY (id),
            FULLTEXT idx (title, body)
          ) ENGINE=InnoDB;
      • SELECT id, author FROM articles WHERE MATCH (title,body) AGAINST ('somephrase');
      • Complex queries
          • SELECT id, author, MATCH (title,body) AGAINST ('somephrase' IN BOOLEAN MODE) AS score FROM articles ORDER BY score DESC, id ASC;
      • Drawbacks:
          • Slow response times
  • 9. Alternatives to MySQL full text search
      • PostgreSQL tsquery
      • Elasticsearch
      • Apache Lucene
      • Sphinx Search
      • Many other alternatives
  • 10. Full text search in Sphinx
      • Sphinx
          • SELECT author FROM articles WHERE MATCH('@(title,body) database');
      • Complex queries
          • SELECT author FROM articles WHERE MATCH('@(title,body) database') ORDER BY WEIGHT(), id ASC;
      • Drawbacks:
          • Not a straightforward swap
          • Specialized knowledge is needed
  • 11. Sphinx Search typical use cases
      • Generic (site) search
      • Document search
      • Log data analysis
      • Geo-distance calculation
  • 12. Sphinx is a full text search engine
      • Consists of two components
          • Indexer
              • Indexes (textual) data
          • Search daemon
              • Searches indexes and returns matched items
      • Three types of indexes:
          • Disk indexes
          • Real time indexes
          • Distributed indexes
  • 13. Disk indexes
      • Comparable to archive tables
      • Indexer indexes data and updates the full index
      • Index is “written once”
      • Only attributes can be changed (at run time)
      • Use --rotate to reload new indexes
      • Fewer resources needed (RAM/CPU)
      • Not dependent on a specific database engine
          • MySQL
          • PostgreSQL
          • MSSQL
          • ODBC
          • XML/TSV pipes
  • 14. Real time indexes
      • Comparable to normal tables
      • Online indexes
      • Will be (eventually) written to disk
      • Dynamically alter the indexes
          • Insert/replace/delete operations
      • Consume more memory
      • Changes are generally updated within milliseconds
          • Sometimes stalls for seconds, so not guaranteed
      • High update rate influences the performance
  • 15. Distributed indexes
      • Comparable to federated tables in MySQL
      • Distribute the search over multiple nodes
          • Many smaller indexes
      • Sends queries to all defined nodes/indexes
      • Aggregates and merges results
      • Slowest node slows down responses
          • Setting timeouts can keep this lower
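The fan-out, merge and timeout behaviour described on the distributed indexes slide can be sketched in a few lines. This is a minimal Python illustration, not how Sphinx implements it: each "node" is assumed to be a plain callable returning (id, weight) pairs, and `search_all` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def search_all(nodes, query, timeout_s):
    """Query every node in parallel and merge what arrives before the deadline.

    nodes: list of callables taking the query and returning (doc_id, weight)
    pairs. This is an assumption made for the sketch, not a Sphinx API.
    """
    pool = ThreadPoolExecutor(max_workers=len(nodes))
    futures = [pool.submit(node, query) for node in nodes]
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block the response on slow nodes

    merged = []
    for future in done:
        if future.exception() is None:
            merged.extend(future.result())
    # Highest weight first, then lowest id, as a stand-in for relevance order.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged, len(not_done)
```

Without the timeout the call would take as long as the slowest node; with it, late nodes are simply left out of the merged result and counted.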
  • 16. Indexing: attributes and fields
      • Two types of data:
          • Fields
              • Textual data to be indexed
          • Attributes
              • Data to sort/filter upon
              • Special: unique identifier
              • Special: (last update) timestamp
      • Example:
          +-------+----------------+--------------+-----------------+
          | id    | author         | title        | publishing_date |
          +-------+----------------+--------------+-----------------+
          | 12345 | Linus Torvalds | Just for fun | 2002-06-04      |
          +-------+----------------+--------------+-----------------+
  • 17. Indexing: stopwords and stemmers
      • Support for stopwords
          • Ignore common words like “and”, “the” and “to”
          • Ignore specific words like “game” and “juego”
          • Still affects the keyword position
      • Language and characters
          • Morphology
              • Similar words
          • Lemmatization
              • Run/ran/running
          • Character folding
              • U+FF10..U+FF19->0..9
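The remark that a stopword "still affects the keyword position" is easy to miss. A small Python sketch, assuming whitespace tokenization and a hypothetical `tokenize` helper, shows the idea: stopwords are dropped from the output but still consume a position, so phrase and proximity distances are preserved.

```python
def tokenize(text, stopwords):
    """Return (position, keyword) pairs.

    Stopwords are not emitted, but they still advance the position counter,
    so the remaining keywords keep their original distance from each other.
    """
    tokens = []
    for position, word in enumerate(text.lower().split(), start=1):
        if word not in stopwords:
            tokens.append((position, word))
    return tokens
```

For example, tokenizing "Lord of the Rings" with the stopwords "of" and "the" keeps "rings" at position 4 rather than moving it up to position 2.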
  • 18. Searching: the interfaces
      • Search daemon has three interfaces:
          • SphinxAPI: Native Sphinx binary protocol
          • SphinxQL: MySQL protocol
          • SphinxSE: MySQL/MariaDB integration
      • Example native:
          <?php
          $s = new SphinxClient;
          $s->setServer("localhost", 6712);
          $s->setMatchMode(SPH_MATCH_ANY);
          $s->setMaxQueryTime(3);
          $result = $s->query("somephrase", "articles");
          var_dump($result);
          ?>
      • Example SphinxQL:
          echo "SELECT author FROM articles WHERE MATCH('@(title,body) somephrase') ORDER BY WEIGHT(), id ASC;" | mysql -h 127.0.0.1 -P 6713
  • 19. Searching: Search daemon
      • Supports various ranking algorithms:
          • None
          • Any
          • Phrase proximity
          • Okapi BM25 (probabilistic)
          • Wordcount
          • Many more
      • User weighting
          • Boost columns with a multiplier
  • 20. Returned data
          mysql> SELECT title, id, publication_date FROM articles
                 WHERE MATCH('@(title,body) database')
                 ORDER BY WEIGHT(), publication_date ASC LIMIT 0,5
                 OPTION field_weights=(title=10,body=3);
          +-----------------------------+-------+------------------+
          | title                       | id    | publication_date |
          +-----------------------------+-------+------------------+
          | MySQL Cookbook              | 75532 | 2014-07-01       |
          | High performance MySQL      | 94325 | 2012-04-02       |
          | MySQL Administrator’s Bible | 63627 | 2009-05-11       |
          | MySQL (4th Edition)         | 39922 | 2008-09-08       |
          | MySQL in a nutshell         | 58793 | 2008-04-01       |
          +-----------------------------+-------+------------------+
          5 rows in set (0.01 sec)
  • 22. Client side HA
      • Application handles:
          • Connections
          • Failovers
          • Timeouts
      • Distribution scheme
          • Random
          • Round robin
          • Weighted
          • Be creative!
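The three distribution schemes named on the client side HA slide differ only in the order in which nodes are tried. A minimal Python sketch, assuming hypothetical helper names and a caller-supplied `connect` function that raises `OSError` on failure:

```python
import itertools
import random


def random_order(nodes, rng=random):
    """Random scheme: try the nodes in a shuffled order."""
    order = list(nodes)
    rng.shuffle(order)
    return order


def round_robin(nodes):
    """Round robin scheme: each request starts one node further along."""
    for start in itertools.cycle(range(len(nodes))):
        yield nodes[start:] + nodes[:start]


def weighted_order(weights, rng=random):
    """Weighted scheme: heavier nodes are more likely to be tried first.

    weights: dict mapping node name to a positive weight.
    """
    remaining = dict(weights)
    order = []
    while remaining:
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names])[0]
        order.append(pick)
        del remaining[pick]
    return order


def connect_first(order, connect):
    """Failover: return the first node that accepts a connection."""
    for node in order:
        try:
            return node, connect(node)
        except OSError:
            continue
    return None, None
```

Keeping the ordering separate from the failover loop means a scheme can be swapped without touching the connection handling.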
  • 23. Client side HA (diagram; labels: Server-1, Server-2, Server-n, Sphinx Node 1, Sphinx Node 2, Sphinx Node n)
  • 24. Client side HA (diagram; same labels, annotated “Timeouts”)
  • 25. Client side HA Example
          <?php
          // Try each node in order and return the first connection that succeeds.
          function mysql_ha_connect(array $servers) {
              foreach ($servers as $server) {
                  $mysqli = new mysqli($server, 'user', 'pass', '', 9306);
                  if (is_null($mysqli->connect_error)) {
                      return $mysqli;
                  }
              }
              return false;
          }
          // Shuffle so that each request starts at a random node.
          $servers = array('node1.domain.com', 'node2.domain.com');
          shuffle($servers);
          $connection = mysql_ha_connect($servers);
          if ($connection === false) {
              die('Could not connect to any node');
          }
          …
  • 26. Load balancer / Proxy
      • Application connects to one single host
      • LB / Proxy handles:
          • Connections
          • Failovers
          • Timeouts
      • Solutions:
          • HAProxy
          • MySQL Proxy
          • MaxScale(?)
      • Distribution scheme
          • Random
          • Round robin
          • Weighted
          • Least connections
          • Fastest response
  • 27. Load Balancer / Proxy (diagram; labels: Server-1, Server-2, Server-n, Load balancer, Sphinx Node 1, Sphinx Node 2, Sphinx Node n)
  • 28. Load Balancer / Proxy (diagram; same labels, annotated “Removed from load balancer”)
  • 29. Sphinx agent mirroring
      • Application connects to Sphinx on localhost
      • Sphinx agent mirroring handles:
          • Connections
          • Failovers
          • Timeouts
      • Distribution scheme
          • Random
          • Round robin
          • Nodeads (removes dead mirrors)
          • Noerrors (removes worse performing mirrors)
  • 30. Sphinx agent mirroring (diagram; labels: Server-1, Server-2, Server-n, each with a local Sphinx; Sphinx Node 1, Sphinx Node 2, Sphinx Node n)
  • 31. Sphinx agent mirroring (diagram; same labels, annotated “Removed from Sphinx”)
  • 32. Sphinx agent mirroring example
          index dist
          {
              type = distributed
              ha_strategy = nodeads
              agent_query_timeout = 100
              agent = node1:9312|node2:9312|node3:9312:game_index
          }
  • 33. How do we use Sphinx Search? Not only search
  • 34. Game search
      • Started using Sphinx in 2009
      • Simple game search
          • Replaced our MySQL / MyISAM search
          • Added search for multiple columns
          • Change weight per column
      • Distributed mirrored indexes
          • Index rebuilds performed per node
          • Updates happen more frequently
  • 35. Distributed mirrored indexes (diagram; labels: Application Server, Sphinx Node 1, Brand A, Brand B, Sphinx Node 2, Brand A+, Brand B+)
  • 37. Friends search
      • Profile service
      • Friends function
      • Searches friends on
          • username
          • firstname / lastname
      • Find friends across portals (within brands)
      • Distributed partitioned index
  • 38. Distributed partitioned index (diagram; labels: Application Server, Sphinx Node 1, Sphinx Node 2; partitions: “>= today”, “>= this month <= today”, “>= 3 months <= this month”, “<= 3 months”)
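The partition labels in the diagram suggest that profiles are split by how recently they were updated. The sketch below is one possible reading of those labels, not Spil Games' actual rule: the partition names, the 90-day approximation of "3 months" and the `partition_for` helper are all assumptions.

```python
from datetime import date, timedelta


def partition_for(last_update, today):
    """Pick a partition name for a record based on its last update date."""
    if last_update >= today:
        return "today"
    if last_update >= today.replace(day=1):
        return "this_month"
    if last_update >= today - timedelta(days=90):  # "3 months", approximated
        return "last_3_months"
    return "older"
```

With a split like this, only the small, recent partitions need frequent rebuilds, while the large "older" partition can be reindexed rarely.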
  • 40. ROAR storage layer
      • ROAR is a database abstraction layer
          • See Percona Live Santa Clara 2014 presentation
      • Sphinx complementary to MySQL and Couchbase
      • Translate a title to a gamepage
          • Search url parts to fetch the application id
      • Translate keywords to lists of games
          • Search url parts to fetch a list of application ids
          • Filter applications on portal and brand
          • Filter applications on browser capabilities
          • Sort on publishing date, popularity and rating
  • 41. Translating a title to a gamepage
      • Legacy:
          • Url without identifiers
          • There can only be one game with the same url
      • Sphinx does a fast lookup of (existing) game to id
      • Example:
          http://www.agame.com/game/rig-bmx
          translates into application id 123456
      • Future improvements:
          • Correct non-existing pages (404)
              http://www.agame.com/game/rig-bmxx
            with a redirect (301) to:
              http://www.agame.com/game/rig-bmx
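The "future improvement" of turning a 404 into a 301 amounts to finding the closest existing url part. The slides do not say how this would be done; the Python sketch below swaps in the standard library's `difflib` purely to illustrate the idea, and `resolve_slug`, its cutoff and the lookup table are hypothetical.

```python
import difflib


def resolve_slug(slug, slug_to_appid, cutoff=0.8):
    """Return (http_status, slug): 200 for a hit, 301 for a near miss, else 404."""
    if slug in slug_to_appid:
        return 200, slug
    close = difflib.get_close_matches(slug, list(slug_to_appid), n=1, cutoff=cutoff)
    if close:
        return 301, close[0]
    return 404, None
```

A cutoff that is too low would redirect unrelated typos to the wrong game, so the threshold would need tuning against real url data.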
  • 42. Translating a title to a gamepage
  • 43. Translating a title to a gamepage
  • 44. Translating keywords to game listings
      • Filter on url parts
          • One or multiple
      • Complex filtering on capabilities
          • Blacklist incompatible games (Flash/Unity)
  • 45. Search on url parts
      • Example 1 url part:
          http://www.agame.com/games/puzzle
        Sends this query to Sphinx:
          SELECT title, appid FROM game_index
          WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"')
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
      • Example 2 url parts:
          http://www.agame.com/games/puzzle/match-3
        Sends this query to Sphinx:
          SELECT title, appid FROM game_index
          WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle" && "match-3"')
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
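The translation from a listing url to the query above can be sketched as follows. This is a Python illustration only: `build_listing_query` is a hypothetical helper, the assumption that the first path segment ("games") is a fixed prefix comes from the two example urls, and url parts are restricted to lowercase letters, digits and hyphens so that nothing needs escaping.

```python
import re
from urllib.parse import urlparse

# Assumption: url parts only contain these characters; anything else is rejected.
SAFE_PART = re.compile(r"^[a-z0-9-]+$")


def build_listing_query(url, brandid, portalid, limit=10):
    """Build the listing query in the shape shown on the slide."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    url_parts = segments[1:]  # drop the leading "games" prefix
    if not url_parts or not all(SAFE_PART.match(p) for p in url_parts):
        raise ValueError("unsupported url: %r" % url)
    phrases = " && ".join('"%s"' % part for part in url_parts)
    return (
        "SELECT title, appid FROM game_index "
        "WHERE brandid=%d AND portalid=%d AND MATCH('@url %s') "
        "ORDER BY date_onsite DESC LIMIT 0,%d OPTION max_matches=10000;"
        % (int(brandid), int(portalid), phrases, int(limit))
    )
```

Rejecting unexpected characters up front is a deliberate choice here: it keeps the sketch from having to make claims about query escaping rules.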
  • 48. Filter on browser capabilities
      • Blacklisting performed on capabilities encoded bitmask
      • Example normal desktop browser (no filter):
          http://www.agame.com/games/puzzle
        Opening the puzzle category on a desktop sends this query to Sphinx:
          SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter
          FROM game_index
          WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND bitcheck = 0
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
      • Example Chrome on Android 4.4 (filter out 11):
          http://www.agame.com/games/puzzle
        Opening the puzzle category on a Nexus 7 sends this query to Sphinx:
          SELECT title, appid, (bitmask1 & 11) AS bitcheck, (bitmask1 & 11) AS bitfilter
          FROM game_index
          WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND bitcheck = 0
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
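The `bitmask1 & 11 = 0` check reads more easily once the mask is spelled out. In the Python sketch below the bit assignments are hypothetical (the slides only give the combined value 11 and mention Flash and Unity); the point is that a game is listed only when it requires none of the blacklisted capabilities.

```python
# Hypothetical bit assignments; the slides only show the combined mask (11).
FLASH = 1
UNITY = 2
KEYBOARD = 8


def blacklist_mask(unsupported):
    """OR together the capabilities the visiting browser lacks."""
    mask = 0
    for bit in unsupported:
        mask |= bit
    return mask


def is_listed(game_bitmask, mask):
    """Same test as the query: keep the game when (bitmask1 & mask) = 0."""
    return (game_bitmask & mask) == 0
```

A desktop browser uses mask 0, so every game passes; a device lacking all three capabilities above uses mask 11 and only sees games that need none of them.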
  • 49. Filter on browser capabilities
  • 50. Filter on browser capabilities
  • 51. What we encountered
      • Real time indexes decreased performance
      • Make the indexing process “nicer”
          /bin/taskset 0x00000001 /usr/bin/indexer --all --config /etc/sphinx.conf
      • Send statistics to Graphite
          http://engineering.spilgames.com/tamed-sphinx-search/
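Sending statistics to Graphite can be as simple as writing lines in Carbon's plaintext format ("path value timestamp") to its listener, which defaults to port 2003. The Python sketch below is a generic illustration and not the approach from the linked blog post; the metric path and host are made up.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "%s %s %d\n" % (path, value, ts)


def send_metrics(lines, host, port=2003):
    """Send pre-formatted metric lines to a Carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

For example, `send_metrics([graphite_line("sphinx.node1.queries", 8000)], "graphite.example.com")` would push a single data point, assuming such a host exists.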
  • 53. Sphinx Benchmark specifications
      • Sysbench 0.5
          • Custom lua scripts
      • Disabled caching
      • Openstack virtuals:
          • Benchmark driver: 4 core CPU, 4GB memory
          • Sphinx nodes: 4 core CPU, 16GB memory
          • MySQL nodes: 4 core CPU, 16GB memory
      • At least 3 runs per test
          • Average of tests counts
          • Repeat tests when outliers were found
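The benchmark method (several runs per test, averaged, repeated on outliers) combined with the 95th percentile used in the charts can be sketched as follows. Sysbench reports its own percentile; this Python version uses the nearest-rank definition, and the 50% outlier tolerance is an arbitrary assumption for illustration.

```python
import statistics


def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of response times."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def summarize(runs, tolerance=0.5):
    """Average the per-run 95th percentiles and flag runs to repeat.

    A run is flagged when its percentile deviates from the mean by more than
    tolerance * mean (a hypothetical rule, not taken from the slides).
    """
    p95s = [percentile_95(run) for run in runs]
    mean = statistics.mean(p95s)
    outliers = [i for i, v in enumerate(p95s) if abs(v - mean) > tolerance * mean]
    return mean, outliers
```

Averaging per-run percentiles instead of pooling all samples keeps a single bad run visible as an outlier rather than hiding it in the tail.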
  • 54. InnoDB vs Sphinx
      • InnoDB discrete match
          SELECT l.url, gd.title, g.appid, bitmask1, date_onsite
          FROM games g
          LEFT JOIN game_capabilities gc ON g.appid=gc.app
          INNER JOIN game_cat c ON g.appid = c.appid AND g.portalid = c.portalid AND g.brandid = c.brandid
          INNER JOIN cat_data cd ON c.portalid = cd.portalid AND c.brandid = cd.brandid AND c.catname = cd.catname
          WHERE g.brandid=1 AND g.portalid=88 AND cd.url='puzzle'
          ORDER BY date_onsite desc LIMIT 0,10;
      • Sphinx single phrase
          SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter
          FROM game_index
          WHERE brandid=1 AND portalid=88 AND bitcheck = 0 AND MATCH('@url "puzzle"')
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
  • 55. InnoDB vs Sphinx (chart: 95th percentile response time in ms at 4, 8, 16, 24, 32, 48 and 64 threads; series: Sphinx single phrase, InnoDB discrete match)
  • 56. MyISAM full text vs Sphinx 1
      • MyISAM single match-against
          SELECT title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`) AGAINST('puzzle') AS score
          FROM game_index
          WHERE MATCH(`url`) AGAINST('puzzle') AND portalid=88 AND brandid=1 AND (bitmask1 & 0) = 0
          ORDER BY score DESC, date_onsite DESC LIMIT 0,10;
      • Sphinx single phrase
          SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter
          FROM game_index
          WHERE brandid=1 AND portalid=88 AND bitcheck = 0 AND MATCH('@url "puzzle"')
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
  • 57. MyISAM full text vs Sphinx 1 (chart: 95th percentile response time in ms at 4, 8, 16, 24, 32, 48 and 64 threads; series: Sphinx single phrase, MyISAM single match-against)
  • 58. MyISAM full text vs Sphinx 2
      • MyISAM multiple match-against
          SELECT title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`) AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AS score
          FROM game_index
          WHERE MATCH(`url`) AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AND portalid=88 AND brandid=1 AND (bitmask1 & 0) = 0
          ORDER BY score DESC, date_onsite DESC LIMIT 0,10;
      • Sphinx multiple phrases
          SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter
          FROM game_index
          WHERE brandid=1 AND portalid=88 AND bitcheck = 0 AND MATCH('@url "puzzle" && "sudoku"')
          ORDER BY date_onsite desc LIMIT 0,10
          OPTION max_matches=10000;
  • 59. MyISAM full text vs Sphinx 2 (chart: 95th percentile response time in ms at 4, 8, 16, 24, 32, 48 and 64 threads; series: MyISAM multiple match-against, Sphinx multiple phrases)
  • 60. MyISAM full text vs Sphinx 2 (chart: 95th percentile response time in ms at 4, 8, 16, 24, 32, 48 and 64 threads; series: Sphinx single phrase, MyISAM multiple match-against, Sphinx multiple phrases, MyISAM single match-against)
  • 61. InnoDB vs MyISAM vs Sphinx (chart: 95th percentile response time in ms at 4, 8, 16, 24, 32, 48 and 64 threads; series: Sphinx single phrase, InnoDB single match-against, MyISAM single match-against)
  • 62. Sphinx HA solutions
      • Sphinx on localhost
          • Talks MySQL on localhost
          • One or two remote agent(s)
      • Sphinx behind load balancer
          • Proxies MySQL
  • 63. Sphinx HA solutions (chart: average response time in ms at 4, 8, 16, 24, 32, 48 and 64 threads; series: Direct connection single host, Localhost 2 nodes, Localhost 1 node, Load Balancer 2 nodes)
  • 64. Conclusion
      • Sphinx Search is faster than MySQL full text search
      • Smaller result sets increase performance
          • Due to sorting by relevance
          • Smaller temporary tables
      • InnoDB performs worse than MyISAM
      • Sphinx agent mirroring performs better
          • Probably due to Sphinx native protocol
      • Load balancer seems to perform better
          • Probably due to dedicated (better) hardware
  • 66. Thank you!
      • This presentation can be found at: http://spil.com/pluk2014sphinx
      • Sphinx Search: http://www.sphinxsearch.com
      • Sending Sphinx Search metrics to Graphite: http://engineering.spilgames.com/tamed-sphinx-search/
      • About the ROAR storage layer: http://spil.com/plsc2014storage
      • If you wish to contact me:
          Email: art@spilgames.com
          Twitter: @banpei
          Blog: http://engineering.spilgames.com
          Twitter Spil Engineering: @spilengineering
  • 67. Photo sources
      • Google Snail Search: Boomerang Cards
          http://data.boomerang.nl/b/boomerang/image/google-classic/s600/3.jpg
      • Jean-Claude van Damme: Volvo Trucks
          http://www.volvotrucks.com/trucks/UAE-market/en-ae/newsmedia/pressreleases/Pages/pressreleases.aspx?pubid=17613
      • Bench mates: Craig Sunter
          https://www.flickr.com/photos/16210667@N02/12381776985

Editor's Notes

  1. The three main brands: Girls, aimed at girls aged 8 to 12; Teens, aimed at boys and girls aged 10 to 15; and Family, basically mothers playing with their children. Strong domains localized in 19 different languages: spielen.com, juegos.com, gamesgames.com, games.co.uk, oyunonya.com. All content is localized.