Percona Live London 2014: Serve out any page with an HA Sphinx environment

At Spil Games we have been using Sphinx for over five years. At first we used it to offload full-text search queries from our MySQL databases, but a year ago we started using it in a different way: we now serve any page to any of our 26M daily active users dynamically. This means that every pageview by every visitor results in at least one call to Sphinx Search. This session describes how we pick the right content for our users dynamically, based on their browser capabilities (Flash, HTML5, WebGL, etc.), within milliseconds using Sphinx's attribute filtering.

Using Sphinx as one of the main building blocks of our architecture means we have to make it as highly available as possible and be smart about data loading. The background indexing method we relied on for years makes our Sphinx instances respond more slowly during indexing and graceful index reloading. To overcome this, we now update the real-time indexes whenever we update our content database. I will cover the various HA and index population scenarios we went through, with their pros and cons.

  1. 1. Serve out any page with an HA Sphinx environment Art van Scheppingen Head of Database Engineering
  2. 2. 2 1. Who is Spil Games? 2. What is Sphinx Search? 3. Make Sphinx highly available 4. How does Spil Games use Sphinx? 5. Sphinx benchmarks 6. Questions? Overview
  3. 3. Who are we? Who is Spil Games?
  4. 4. 4 • Game publishers & distributors • Company founded in 2001 • 130+ employees • 150M+ unique visitors per month • Over 60M registered users • 45 portals in 19 languages • Casual games • Social games • Real time multiplayer games • Mobile (html5) games • 40+ MySQL clusters • 65k queries per second • 10 Sphinx servers • 8k queries per second Facts
  5. 5. 5 Geographic Reach 150 Million Monthly Active Users(*) Source: (*) Google Analytics, August 2012
  6. 6. 6 Girls, Teens and Family spielen.com juegos.com gamesgames.com games.co.uk Brands
  7. 7. Sphinx Search Advanced searching
  8. 8. 8 • MyISAM / InnoDB (5.6.4 or higher) CREATE TABLE articles ( id int(11) not null auto_increment, author varchar(40) not null, title varchar(50) not null, body text, PRIMARY KEY (id), FULLTEXT idx (title, body) ) ENGINE=InnoDB; • SELECT id, author FROM articles WHERE MATCH (title,body) AGAINST ('somephrase'); • Complex queries • SELECT id, author, MATCH (title,body) AGAINST ('somephrase' IN BOOLEAN MODE) as score FROM articles ORDER BY score DESC, id ASC; • Drawbacks: • Slow response times Full text search in MySQL
  9. 9. 9 • PostgreSQL tsquery • Elasticsearch • Apache Lucene • Sphinx Search • Many other alternatives Alternatives to MySQL full text search
  10. 10. 10 • Sphinx • SELECT author FROM articles WHERE MATCH('@(title,body) database'); • Complex queries • SELECT author FROM articles WHERE MATCH('@(title,body) database') ORDER BY WEIGHT() DESC, id ASC; • Drawbacks: • Not a straightforward swap • Specialized knowledge is needed Full text search in Sphinx
  11. 11. 11 • Generic (site) search • Document search • Logdata analysis • Geo-distance calculation Sphinx Search typical use cases
  12. 12. 12 • Consists of two components • Indexer • Index (textual) data • Search daemon • Search indexes and return matched items • Three types of indexes: • Disk indexes • Real Time indexes • Distributed indexes Sphinx is a full text search engine
  13. 13. 13 • Comparable to archive tables • Indexer indexes data and updates full index • Index is “written once” • Only attributes can be changed (at run time) • Use --rotate to reload new indexes • Fewer resources needed (RAM/CPU) • Not dependent on a specific database engine • MySQL • PostgreSQL • MSSQL • ODBC • XML/TSV pipes Disk indexes
  14. 14. 14 • Comparable to normal tables • Online indexes • Will be (eventually) written to disk • Dynamically alter the indexes • Insert/replace/delete operations • Consume more memory • Changes are generally updated within milliseconds • Sometimes stalls for seconds, so not guaranteed • High update rate influences the performance Real time indexes
  15. 15. 15 • Comparable to federated tables in MySQL • Distribute the search over multiple nodes • Many smaller indexes • Sends queries to all defined nodes/indexes • Aggregates and merges results • Slowest node slows down responses • Setting timeouts can keep this lower Distributed indexes
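
As a rough illustration of the fan-out described on this slide, the following Python sketch sends a query to several nodes in parallel, merges whatever comes back within a timeout, and reports the nodes it gave up on. The node names, the `weight` field and the `search_distributed` helper are assumptions made for illustration only; in a real setup searchd does this itself through a distributed index.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_distributed(agents, query, timeout_s, limit=10):
    """Query every agent in parallel and merge the answers that arrive in time.

    `agents` maps a (hypothetical) node name to a callable that takes the
    query and returns a list of rows shaped like {"id": ..., "weight": ...}.
    """
    pool = ThreadPoolExecutor(max_workers=len(agents))
    futures = {pool.submit(fn, query): name for name, fn in agents.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on the slow nodes

    rows = []
    for future in done:
        if future.exception() is None:
            rows.extend(future.result())
    # Merge step: best matches first, then cut to the requested page size.
    rows.sort(key=lambda row: row["weight"], reverse=True)
    skipped = sorted(futures[future] for future in not_done)
    return rows[:limit], skipped


# Stand-ins for real nodes: two answer immediately, one is too slow.
def fast_node_1(query):
    return [{"id": 1, "weight": 30}, {"id": 2, "weight": 10}]


def fast_node_2(query):
    return [{"id": 3, "weight": 20}]


def slow_node(query):
    time.sleep(1.0)
    return [{"id": 4, "weight": 99}]


results, skipped = search_distributed(
    {"node1": fast_node_1, "node2": fast_node_2, "node3": slow_node},
    "puzzle",
    timeout_s=0.2,
)
print([row["id"] for row in results], skipped)
```

Without the timeout the caller would wait for the slowest node every time, which is the trade-off the slide points at: a timeout keeps latency bounded at the cost of occasionally incomplete results.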
  16. 16. 16 • Two types of data: • Fields • Textual data to be indexed • Attributes • Data to sort/filter upon • Special: unique identifier • Special: (last update) timestamp • Example: +-------+----------------+---------------+-----------------+ | id | author | title | publishing_date | +-------+----------------+---------------+-----------------+ | 12345 | Linus Torvalds | Just for fun | 2002-06-04 | +-------+----------------+---------------+-----------------+ Indexing: attributes and fields
  17. 17. 17 • Support for stopwords • Ignore common words like “and”, “the” and “to” • Ignore specific words like “game” and “juego” • Still affects the keyword position • Language and characters • Morphology • Similar words • Lemmatization • Run/ran/running • Character folding • U+FF10..U+FF19->0..9 Indexing: stopwords and stemmers
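
The remark that stopwords still affect the keyword position, and the character folding rule, can be pictured with a small Python sketch. This is not how Sphinx is implemented; the `fold` and `index_positions` helpers and the tiny stopword set are assumptions for illustration, built only from the examples on the slide.

```python
# Stopwords taken from the slide's examples (common and site-specific words).
STOPWORDS = {"and", "the", "to", "game", "juego"}

# Character folding from the slide: fullwidth digits U+FF10..U+FF19 -> 0..9.
FOLD_TABLE = {0xFF10 + i: ord("0") + i for i in range(10)}


def fold(text):
    """Lowercase the text and fold fullwidth digits to ASCII digits."""
    return text.lower().translate(FOLD_TABLE)


def index_positions(text):
    """Map each indexed keyword to its positions in the text.

    Stopwords are not indexed, but the position counter still advances
    for them, so phrase and proximity distances stay intact.
    """
    positions = {}
    for pos, word in enumerate(fold(text).split(), start=1):
        if word in STOPWORDS:
            continue
        positions.setdefault(word, []).append(pos)
    return positions


index = index_positions("The Legend of the \uff15 Rings game")
print(index)
```

Here "legend" lands on position 2 rather than 1 because the leading stopword still occupies a slot.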
  18. 18. 18 • Search daemon has three interfaces: • SphinxAPI: Native Sphinx binary protocol • SphinxQL: MySQL protocol • SphinxSE: MySQL/MariaDB integration • Example native: <?php $s = new SphinxClient(); $s->setServer("localhost", 6712); $s->setMatchMode(SPH_MATCH_ANY); $s->setMaxQueryTime(3); $result = $s->query("somephrase", "articles"); var_dump($result); ?> • Example SphinxQL: echo "SELECT author FROM articles WHERE MATCH('@(title,body) somephrase') ORDER BY WEIGHT() DESC, id ASC;" | mysql -P 6713 Searching: the interfaces
  19. 19. 19 • Supports various ranking algorithms: • None • Any • Phrase proximity • Okapi BM25 (probabilistic) • Wordcount • Many more • User weighting • Boost columns with a multiplier Searching: Search daemon
  20. 20. 20 mysql> SELECT title, id, publication_date FROM articles WHERE MATCH('@(title,body) database') ORDER BY WEIGHT() DESC, publication_date ASC LIMIT 0,5 OPTION field_weights=(title=10,body=3); +-----------------------------+-------+------------------+ | title | id | publication_date | +-----------------------------+-------+------------------+ | MySQL Cookbook | 75532 | 2014-07-01 | | High performance MySQL | 94325 | 2012-04-02 | | MySQL Administrator's Bible | 63627 | 2009-05-11 | | MySQL (4th Edition) | 39922 | 2008-09-08 | | MySQL in a nutshell | 58793 | 2008-04-01 | +-----------------------------+-------+------------------+ 5 rows in set (0.01 sec) Returned data
  21. 21. Making Sphinx Highly Available
  22. 22. 22 • Application handles: • Connections • Failovers • Timeouts • Distribution scheme • Random • Round robin • Weighted • Be creative! Client side HA
  23. 23. Client side HA Server-1 Server-2 Server-n Sphinx Node 1 Sphinx Node 2 Sphinx Node n
  24. 24. Client side HA Server-1 Server-2 Server-n Sphinx Node 1 Sphinx Node 2 Sphinx Node n Timeouts
  25. 25. 25 <?php function mysql_ha_connect(array $servers) { foreach ($servers as $server) { $mysqli = new mysqli($server, 'user', 'pass', '', 9306); if (is_null($mysqli->connect_error)) { return $mysqli; } } return false; } $servers = array('node1.domain.com', 'node2.domain.com'); shuffle($servers); $connection = mysql_ha_connect($servers); if ($connection === false) { die('Could not connect to any node'); } … Client side HA Example
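
The PHP example above uses the random scheme. A weighted variant of the same client-side failover idea might look like the Python sketch below. The host names, the weights and the injected `connect` callable are assumptions for illustration; a real client would pass in a function that opens a MySQL-protocol connection to the node.

```python
import random


def weighted_order(nodes, rng):
    """Return every host once, in a random order that favours high weights.

    `nodes` is a list of (host, weight) pairs; both are hypothetical here.
    """
    remaining = list(nodes)
    order = []
    while remaining:
        weights = [weight for _, weight in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pick)[0])
    return order


def connect_with_failover(hosts, connect):
    """Try hosts in order and return (host, connection) for the first success."""
    for host in hosts:
        try:
            return host, connect(host)
        except OSError:
            continue  # node is down or timed out: fail over to the next one
    return None


def fake_connect(host):
    """Stand-in for a real connection attempt; node1 is pretended to be down."""
    if host == "node1.domain.com":
        raise ConnectionRefusedError(host)
    return "connection to " + host


nodes = [("node1.domain.com", 5), ("node2.domain.com", 3), ("node3.domain.com", 1)]
order = weighted_order(nodes, random.Random())
chosen = connect_with_failover(order, fake_connect)
print(order, chosen)
```

Keeping the ordering separate from the connection loop means the distribution scheme (random, round robin, weighted) can be swapped without touching the failover logic.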
  26. 26. 26 • Application connects to one single host • LB / Proxy handles: • Connections • Failovers • Timeouts • Solutions: • HAProxy • MySQL Proxy • MaxScale(?) • Distribution scheme • Random • Round robin • Weighted • Least connections • Fastest response Load balancer / Proxy
  27. 27. Load Balancer / Proxy Server-1 Server-2 Server-n Load balancer Sphinx Node 1 Sphinx Node 2 Sphinx Node n
  28. 28. Load Balancer / Proxy Server-1 Server-2 Server-n Load balancer Sphinx Node 1 Sphinx Node 2 Sphinx Node n Removed from load balancer
  29. 29. 29 • Application connects to Sphinx on localhost • Sphinx agent mirroring handles: • Connections • Failovers • Timeouts • Distribution scheme • Random • Round robin • Nodeads (removes dead mirrors) • Noerrors (removes worse performing mirrors) Sphinx agent mirroring
  30. 30. Sphinx agent mirroring Server-1 Server-2 Server-n Sphinx Sphinx Node 1 Sphinx Node 2 Sphinx Node n Sphinx Sphinx
  31. 31. Sphinx agent mirroring Server-1 Server-2 Server-n Sphinx Sphinx Node 1 Sphinx Node 2 Sphinx Node n Sphinx Sphinx Removed from Sphinx
  32. 32. 32 Sphinx agent mirroring example index dist { type = distributed ha_strategy = nodeads agent_query_timeout = 100 agent = node1:9312|node2:9312|node3:9312:game_index }
  33. 33. How do we use Sphinx Search? Not only search
  34. 34. 34 • Started using Sphinx in 2009 • Simple game search • Replaced our MySQL / MyISAM search • Added search for multiple columns • Change weight per column • Distributed mirrored indexes • Index rebuilds performed per node • Updates happen more frequently Game search
  35. 35. 35 Distributed mirrored indexes Sphinx Node 1 Brand A Brand B Sphinx Node 2 Brand A+ Application Server Brand B+
  36. 36. 36 Game Search
  37. 37. 37 • Profile service • Friends function • Searches friends on • username • firstname / lastname • Find friends across portals (within brands) • Distributed partitioned index Friends search
  38. 38. 38 Distributed partitioned index Sphinx Node 1 Partition >= today Partition >= this month <= today Partition >= 3 months <= this month Partition <= 3 months Sphinx Node 2 Partition >= today Application Server
  39. 39. 39 Friends search
  40. 40. 40 • ROAR is a database abstraction layer • See Percona Live Santa Clara 2014 presentation • Sphinx complementary to MySQL and Couchbase • Translate a title to a gamepage • Search url parts to fetch the application id • Translate keywords to lists of games • Search url parts to fetch a list of application ids • Filter applications on portal and brand • Filter applications on browser capabilities • Sort on publishing date, popularity and rating ROAR storage layer
  41. 41. 41 • Legacy: • Url without identifiers • There can only be one game with the same url • Sphinx does a fast lookup of (existing) game to id • Example: http://www.agame.com/game/rig-bmx Translates into application id 123456 • Future improvements: • Correct non-existing pages (404) http://www.agame.com/game/rig-bmxx with a redirect (301) to: http://www.agame.com/game/rig-bmx Translating a title to a gamepage
  42. 42. 42 Translating a title to a gamepage
  43. 43. 43 Translating a title to a gamepage
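
The title lookup and the planned 404-to-301 correction can be sketched in a few lines of Python. The deck does not say how the correction would be implemented, so the fuzzy match through `difflib`, the in-memory `GAME_IDS` table (a stand-in for the Sphinx lookup), the second game entry and the cutoff value are all assumptions.

```python
import difflib

# Stand-in for the Sphinx index: url part -> application id.
# "rig-bmx" and 123456 come from the slide; the second entry is made up.
GAME_IDS = {"rig-bmx": 123456, "bike-mania": 654321}
BASE_URL = "http://www.agame.com/game/"


def resolve(slug, cutoff=0.8):
    """Return (status, payload) for a requested game url part.

    200 with the application id for an exact hit, 301 with the corrected
    URL for a near miss, 404 when nothing is close enough.
    """
    if slug in GAME_IDS:
        return 200, GAME_IDS[slug]
    close = difflib.get_close_matches(slug, list(GAME_IDS), n=1, cutoff=cutoff)
    if close:
        return 301, BASE_URL + close[0]
    return 404, None


exact = resolve("rig-bmx")
typo = resolve("rig-bmxx")
unknown = resolve("no-such-game")
print(exact, typo, unknown)
```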
  44. 44. 44 • Filter on url parts • One or multiple • Complex filtering on capabilities • Blacklist incompatible games (Flash/Unity) Translating keywords to game listings
  45. 45. 45 • Example 1 url part: http://www.agame.com/games/puzzle Sends this query to Sphinx: SELECT title, appid FROM game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; • Example 2 url parts: http://www.agame.com/games/puzzle/match-3 Sends this query to Sphinx: SELECT title, appid FROM game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle" && "match-3" ') ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; Search on url parts
  46. 46. 46 Search on url parts
  47. 47. 47 Search on url parts
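
To make the mapping from URL to query concrete, here is a Python sketch that rebuilds the two example queries from their URLs. The `url_parts` and `build_listing_query` helpers, the `/games/` prefix handling and the character whitelist are assumptions; only the resulting query text is taken from the slides.

```python
import re
from urllib.parse import urlparse


def url_parts(url, prefix="games"):
    """Split the URL path into parts and drop the leading /games/ segment."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if segments and segments[0] == prefix:
        segments = segments[1:]
    return segments


def build_listing_query(url, brandid, portalid):
    """Build the SphinxQL listing query for one or more url parts."""
    parts = url_parts(url)
    if not parts:
        raise ValueError("no url parts to search on")
    for part in parts:
        # Only allow simple slugs so nothing can break out of the MATCH string.
        if not re.fullmatch(r"[a-z0-9-]+", part):
            raise ValueError("unexpected url part: %r" % part)
    match_expr = "@url " + " && ".join('"%s"' % part for part in parts)
    return (
        "SELECT title, appid FROM game_index "
        "WHERE brandid=%d AND portalid=%d AND MATCH('%s') "
        "ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000;"
        % (int(brandid), int(portalid), match_expr)
    )


one_part = build_listing_query("http://www.agame.com/games/puzzle", 1, 88)
two_parts = build_listing_query("http://www.agame.com/games/puzzle/match-3", 1, 88)
print(one_part)
print(two_parts)
```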
  48. 48. 48 • Blacklisting performed on capabilities encoded bitmask • Example normal desktop browser (no filter): http://www.agame.com/games/puzzle Opening the puzzle category on a desktop sends this query to Sphinx: SELECT title, appid,(bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND bitcheck = 0 ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; • Example Chrome on Android 4.4 (filter out 11): http://www.agame.com/games/puzzle Opening the puzzle category on a Nexus 7 sends this query to Sphinx: SELECT title, appid,(bitmask1 & 11) AS bitcheck, (bitmask1 & 11) AS bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND MATCH('@url "puzzle"') AND bitcheck = 0 ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; Filter on browser capabilities
  49. 49. 49 Filter on browser capabilities
  50. 50. 50 Filter on browser capabilities
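
The bitmask check used in these queries is easy to reproduce outside Sphinx. In the Python sketch below the meaning of each bit is an assumption (the deck only shows the values 0 and 11), as are the sample games; the filter itself mirrors the `(bitmask1 & mask) = 0` condition from the queries.

```python
# Hypothetical bit assignments: each bit marks a technology a game requires.
REQUIRES_FLASH = 1
REQUIRES_UNITY = 2
REQUIRES_WEBGL = 4
REQUIRES_JAVA = 8


def blacklist_mask(unsupported):
    """Combine the requirements a browser cannot satisfy into one mask."""
    mask = 0
    for bit in unsupported:
        mask |= bit
    return mask


def playable(games, mask):
    """Keep games that need none of the blacklisted technologies."""
    return [title for title, bitmask1 in games if (bitmask1 & mask) == 0]


games = [
    ("Flash puzzle", REQUIRES_FLASH),
    ("HTML5 puzzle", 0),
    ("WebGL racer", REQUIRES_WEBGL),
    ("Unity shooter", REQUIRES_UNITY),
]

desktop_mask = blacklist_mask([])  # nothing filtered out
android_mask = blacklist_mask([REQUIRES_FLASH, REQUIRES_UNITY, REQUIRES_JAVA])

desktop_games = playable(games, desktop_mask)
android_games = playable(games, android_mask)
print(android_mask, desktop_games, android_games)
```

With these assumed bits the mobile mask works out to 11, matching the value in the second example query, and a single integer comparison per game is enough to decide compatibility.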
  51. 51. 51 • Real time indexes decreased performance • Make the indexing process “nicer” /bin/taskset 0x00000001 /usr/bin/indexer --all --config /etc/sphinx.conf • Send statistics to Graphite http://engineering.spilgames.com/tamed-sphinx-search/ What we encountered
  52. 52. Benchmarking Sphinx
  53. 53. 53 • Sysbench 0.5 • Custom lua scripts • Disabled caching • Openstack virtuals: • Benchmark driver: 4 core CPU, 4GB memory • Sphinx nodes: 4 core CPU, 16GB memory • MySQL nodes: 4 core CPU, 16GB memory • At least 3 runs per test • Average of tests counts • Repeat tests when outliers were found Sphinx Benchmark specifications
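
The run policy on this slide (at least three runs, average the results, repeat when outliers show up) can be expressed as a short Python sketch. Sysbench reports its own percentiles, so the nearest-rank `percentile` helper, the 20% tolerance and the sample numbers below are assumptions for illustration rather than the method used for the charts.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(runs, tolerance=0.2):
    """Average the 95th percentile over runs and flag runs to repeat.

    A run counts as an outlier when its 95th percentile deviates from the
    median of all runs by more than `tolerance` (a made-up threshold).
    """
    if len(runs) < 3:
        raise ValueError("at least 3 runs per test are required")
    p95s = [percentile(run, 95) for run in runs]
    median = statistics.median(p95s)
    outliers = [i for i, value in enumerate(p95s)
                if abs(value - median) > tolerance * median]
    return {"p95_per_run": p95s,
            "mean_p95": statistics.mean(p95s),
            "outlier_runs": outliers}


# Three synthetic runs of 100 response times (ms); the third is much slower.
runs = [
    list(range(1, 101)),
    list(range(2, 102)),
    [2 * ms for ms in range(1, 101)],
]
summary = summarise(runs)
print(summary)
```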
  54. 54. 54 • InnoDB discrete match SELECT l.url, gd.title, g.appid, bitmask1, date_onsite FROM games g LEFT JOIN game_capabilities gc ON g.appid=gc.app INNER JOIN game_cat c ON g.appid = c.appid AND g.portalid = c.portalid AND g.brandid = c.brandid INNER JOIN cat_data cd ON c.portalid = cd.portalid AND c.brandid = cd.brandid AND c.catname = cd.catname WHERE g.brandid=1 AND g.portalid=88 AND cd.url='puzzle' ORDER BY date_onsite desc LIMIT 0,10; • Sphinx single phrase SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0 AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; InnoDB vs Sphinx
  55. 55. 55 InnoDB vs Sphinx [Chart: 95th percentile response time in ms (y-axis, 0 to 300) by number of threads (x-axis, 4 to 64). Series: Sphinx single phrase, InnoDB discrete match]
  56. 56. 56 • MyISAM single match-against Select title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`) AGAINST('puzzle') AS score FROM game_index WHERE MATCH(`url`) AGAINST('puzzle') AND portalid=88 AND brandid=1 AND (bitmask1 & 0) = 0 ORDER BY score DESC, date_onsite DESC LIMIT 0,10; • Sphinx single phrase SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0 AND MATCH('@url "puzzle"') ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; MyISAM full text vs Sphinx 1
  57. 57. 57 MyISAM full text vs Sphinx 1 [Chart: 95th percentile response time in ms (y-axis, 0 to 2000) by number of threads (x-axis, 4 to 64). Series: Sphinx single phrase, MyISAM single match-against]
  58. 58. 58 • MyISAM multiple match-against SELECT title, appid, (bitmask1 & 0) AS bitfilter, MATCH(`url`) AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AS score FROM game_index WHERE MATCH(`url`) AGAINST('+puzzle +sudoku' IN BOOLEAN MODE) AND portalid=88 AND brandid=1 AND (bitmask1 & 0) = 0 ORDER BY score DESC, date_onsite DESC LIMIT 0,10; • Sphinx multiple phrases SELECT title, appid, (bitmask1 & 0) AS bitcheck, (bitmask1 & 0) AS bitfilter FROM game_index WHERE brandid=1 AND portalid=88 AND bitcheck = 0 AND MATCH('@url "puzzle" && "sudoku"') ORDER BY date_onsite desc LIMIT 0,10 OPTION max_matches=10000; MyISAM full text vs Sphinx 2
  59. 59. 59 MyISAM full text vs Sphinx 2 [Chart: 95th percentile response time in ms (y-axis, 0 to 250) by number of threads (x-axis, 4 to 64). Series: MyISAM multiple match-against, Sphinx multiple phrases]
  60. 60. 60 MyISAM full text vs Sphinx 2 [Chart: 95th percentile response time in ms (y-axis, 0 to 2000) by number of threads (x-axis, 4 to 64). Series: Sphinx single phrase, MyISAM multiple match-against, Sphinx multiple phrases, MyISAM single match-against]
  61. 61. 61 InnoDB vs MyISAM vs Sphinx [Chart: 95th percentile response time in ms (y-axis, 0 to 4000) by number of threads (x-axis, 4 to 64). Series: Sphinx single phrase, InnoDB single match-against, MyISAM single match-against]
  62. 62. 62 • Sphinx on localhost • Talks MySQL on localhost • One or two remote agent(s) • Sphinx behind loadbalancer • Proxies MySQL Sphinx HA solutions
  63. 63. 63 Sphinx HA solutions [Chart: average response time in ms (y-axis, 0 to 160) by number of threads (x-axis, 4 to 64). Series: Direct connection single host, Localhost 2 nodes, Localhost 1 node, Load balancer 2 nodes]
  64. 64. 64 • Sphinx Search is faster than MySQL full text search • Smaller result sets increase performance • Due to sorting by relevance • Smaller temporary tables • InnoDB performs worse than MyISAM • Sphinx agent mirroring performs better • Probably due to Sphinx native protocol • Load balancer seems to perform better • Probably due to dedicated (better) hardware Conclusion
  65. 65. Questions?
  66. 66. 66 • This presentation can be found at: http://spil.com/pluk2014sphinx • Sphinx Search: http://www.sphinxsearch.com • Sending Sphinx Search metrics to Graphite: http://engineering.spilgames.com/tamed-sphinx-search/ • About the ROAR storage layer: http://spil.com/plsc2014storage • If you wish to contact me: Email: art@spilgames.com Twitter: @banpei Blog: http://engineering.spilgames.com Twitter Spil Engineering: @spilengineering Thank you!
  67. 67. 67 Google Snail Search: Boomerang Cards http://data.boomerang.nl/b/boomerang/image/google-classic/s600/3.jpg Jean-Claude van Damme: Volvo Trucks http://www.volvotrucks.com/trucks/UAE-market/en-ae/newsmedia/pressreleases/Pages/pressreleases.aspx?pubid=17613 Bench mates: Craig Sunter https://www.flickr.com/photos/16210667@N02/12381776985 Photo sources

Editor's Notes

  • The three main brands:
    Girls, aimed at girls aged 8 to 12
    Teens, aimed at boys and girls aged 10 to 15
    Family, basically mothers playing with their children
    Strong domains localized in 19 different languages:
    spielen.com, juegos.com, gamesgames.com, games.co.uk, oyunonya.com
    All content is localized
