Fulltext engine for non fulltext searches

1,013 views
989 views

Published on

Or better said when Sphinx can help MySQL on queries that at first look they don’t involve any fulltext searching.
Sphinx was build in mind to help the DB on fulltext queries. But it can also help on where there is no text search. That is everyday used queries with combined filtering,grouping and sorting used for various analytics, reporting of simply general usage.
In Sphinx, the fulltext query is executed first, creating a result set that is passed to the remaining operations ( filters, groups, sorts). By reducing the size of the set that is interogated, the whole query will not be only faster, but it will consume less resources.
Because of design for speed, Sphinx can group and sort a lot faster and can do easy segmentations or getting top-N best group matches in a single query.
The result will be offloading heavy work done by database nodes to even a single Sphinx server.
Slides were presented at PerconaLive London 2013

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,013
On SlideShare
0
From Embeds
0
Number of Embeds
416
Actions
Shares
0
Downloads
7
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Fulltext engine for non fulltext searches

  1. 1. Fulltext engine for Non-Fulltext Queries Adrian Nuta // Sphinxsearch // 2013
  2. 2. • • • • Introduction Non-fulltext queries Special data columns Fulltext for speed-up non-fulltext
  3. 3. Introduction
  4. 4. What is Sphinx • • • • free, open-source, search server fast 700 qps /core / 1M docs flexible 100+ features scalable o o 300 mil. q / day 50 TB data, 100+ boxes
  5. 5. Sphinx document Doc ID Fulltext Fields Attributes ... Integer, Float, Bool, Timestamp, MVA, String, JSON ● ● Inverted index indexed, not stored ● ● stored, not indexed held in memory or on disk
  6. 6. Meet SphinxQL MySQL language MySQL MySQL connector MySQL protocol Application SELECT * FROM mytable WHERE ... SphinxQL language MySQL connector MySQL protocol Sphinx
  7. 7. Non-fulltext queries
  8. 8. What Sphinx can do beside fulltext? • • • • • usual WHERE, ORDER, GROUP BY GROUP BY custom extensions: o WITHIN GROUP ORDER BY o GROUP <N> BY Aggregation, timestamp,math functions Comparasion functions: IF(), INTERVAL(), IN() Geo spatial: GEODIST(), GEOPOLY2D()
  9. 9. WITHIN GROUP ORDER BY mysql> SELECT *,DAY(added) as today FROM facetdemo WHERE property2 = 160 AND today =26 GROUP BY brand_id WITHIN GROUP ORDER BY price ASC ORDER BY brand_id ASC; +---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+ | id | price | brand_id | property2 | added | title | brand_name | property | today | +---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+ | 520157 | 10 | 1 | 160 | 1382745486 | Product Nine Seven | brand1 | Three | 26 | | 1726473 | 10 | 2 | 160 | 1382796463 | Product Two Three | brand2 | Eight | 26 | | 1588875 | 11 | 3 | 160 | 1382762264 | Product Three Six | brand3 | Five | 26 | | 1556197 | 10 | 4 | 160 | 1382754018 | Product Eight Six | brand4 | Seven | 26 | | 751443 | 11 | 5 | 160 | 1382803444 | Product Six Three | brand5 | One | 26 | | 512776 | 11 | 6 | 160 | 1382743642 | Product Ten Five | brand6 | Six | 26 | mysql> SELECT *,DAY(added) as today FROM facetdemo WHERE property2 = 160 AND today =26 GROUP BY brand_id WITHIN GROUP ORDER BY price DESC ORDER BY brand_id ASC; +---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+ | id | price | brand_id | property2 | added | title | brand_name | property | today | +---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+ | 815154 | 998 | 1 | 160 | 1382819286 | Product Two Nine | brand1 | Eight | 26 | | 2793903 | 999 | 2 | 160 | 1382813601 | Product Eight Five | brand2 | Two | 26 | | 699831 | 1000 | 3 | 160 | 1382790589 | Product One Six | brand3 | Eight | 26 | | 714052 | 1000 | 4 | 160 | 1382794137 | Product One Ten | brand4 | Three | 26 | | 2791902 | 999 | 5 | 160 | 1382813140 | Product Five Three | brand5 | Four | 26 | | 2753725 | 1000 | 6 | 160 | 1382803662 | Product Seven Three | brand6 | Two | 26 |
  10. 10. Using GROUP <N> BY mysql> SELECT * FROM facetdemo GROUP 3 BY brand_id WITHIN GROUP ORDER BY added DESC ORDER BY brand_id ASC; +---------+-------+----------+------------+---------------------+------------+----------+ | id | price | brand_id | added | title | brand_name | property | +---------+-------+----------+------------+---------------------+------------+----------+ | 1479848 | 938 | 1 | 1382735889 | Product Ten Seven | brand1 | Four | 2479064 | 398 | 1 | 1382734998 | Product Ten Five | brand1 | Eight | | | 1480553 | 687 | 1 | 1382734048 | Product Four Two | brand1 | One | | 1479580 | 62 | 2 | 1382734834 | Product Nine Seven | brand2 | Ten | | 1479585 | 357 | 2 | 1382734834 | Product Six Two | brand2 | Five | | 477383 | 908 | 2 | 1382733871 | Product Ten Three | brand2 | Eight | | 2478429 | 425 | 3 | 1382734839 | Product Three Ten | brand3 | Five | | 477456 | 519 | 3 | 1382734818 | Product Ten One | brand3 | Six | | 477521 | 190 | 3 | 1382734403 | Product Three Two | brand3 | Five | | 2478459 | 931 | 4 | 1382734850 | Product One Two | brand4 | Five | | 1479718 | 891 | 4 | 1382734065 | Product Two One | brand4 | Three | | 2478514 | 106 | 4 | 1382733868 | Product Six Seven | brand4 | One | | 477297 | 991 | 5 | 1382734844 | Product Five Eight | brand5 | Four | | 2479053 | 648 | 5 | 1382733994 | Product Six One | brand5 | Nine | | 1480798 | 250 | 5 | 1382732121 | Product One Seven | brand5 | Eight |
  11. 11. Using HAVING mysql> SELECT *,COUNT(*) FROM facetdemo where property2 = 190 and price>900 GROUP BY brand_id HAVING COUNT(*)>1000; +-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+ | id | price | brand_id | property2 | added | title | brand_name | property | count(*) | +-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+ | 2566 | 934 | 24 | 190 | 1382615816 | Product One Three | brand24 | Six | 1023 | | 4807 | 905 | 11 | 190 | 1382616392 | Product Five Six | brand11 | Eight | 1023 | | 5539 | 985 | 44 | 190 | 1382616552 | Product Ten Four | brand44 | Three | 1009 | | 7655 | 912 | 10 | 190 | 1382617104 | Product Four Five | brand10 | Ten | 1028 | | 16837 | 968 | 20 | 190 | 1382619365 | Product One Nine | Five | 1015 | | brand20 +-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+ 5 rows in set (0.17 sec)
  12. 12. Comparing simple queries Operation Example MySQL Sphinx difference Filter by integer, group by integer WHERE property_int =190 GROUP BY brand_id 0.32 0.14 2.2x Group by integer, order by count(*) GROUP BY brand_id ORDER BY COUNT(*) DESC 1.76 0.53 3.3x Filter by integer, order by timestamp WHERE brand_id=20 ORDER BY added ASC 0.00 0.14 0 Filter by integer, order by timestamp and integer column WHERE brand_id=20 ORDER BY added DESC, property_int ASC 0.31 0.19 1.5x
  13. 13. Using IF comparasion mysql> SELECT COUNT(*), IF( property2=270 OR price<80, 1, IF(property2=280 OR price> 900,2,3) ) AS expr FROM facetdemo GROUP BY expr; +----------+------+ | count(*) | expr | +----------+------+ | 7494455 | 3 | | 1357178 | 2 | | 1148366 | 1 | +----------+------+ 3 rows in set (1.04 sec)
  14. 14. Using INTERVAL for segmentation mysql> SELECT id, price, INTERVAL(price,0,300,600,900) AS pricerange, COUNT(*) FROM facetdemo WHERE brand_id=27 GROUP BY pricerange ORDER BY pricerange ASC; +------+-------+------------+----------+ | id | price | pricerange | count(*) | +------+-------+------------+----------+ | 219 | 196 | 1 | 58283 | | 46 | 467 | 2 | 60535 | | 109 | 667 | 3 | 60789 | | 5 | 962 | 4 | 20285 | +------+-------+------------+----------+ 4 rows in set (0.19 sec)
  15. 15. Geo spatial in Sphinx GEODIST(lat1, lon1, lat2, lon2, { option=value, ... }) o in { deg | degrees | rad | radians} o out {m | meters | km | ft | mi | miles } o method {haversine | adaptive}  haversine - high precision, expensive  adaptive - good precision, cheaper (Polar flat-Earth algorithm )
  16. 16. • POLY2D(x1,y1,x2,y2,x3,y3, …) • GEOPOLY2D (lat1,lng1,lat2,lng2,lat3,lng3,...) • lat/lng in degrees • CONTAINTS( polygon, x, y ) mysql> SELECT *, CONTAINS(GEOPOLY2D(40.95164274496,-76.88583678218,41.188446201688,73.203723511772,39.900666261352,-74.171833538046,40.059260979044,76.301076056469),latitude_deg,longitude_deg) AS inside FROM geodemo WHERE inside=1 LIMIT 0,100 ;
  17. 17. Special data columns
  18. 18. Multi value attribute (MVA) • set of integers column Price Categories 24 ... 199.99 ... 128 300 float MVA ...
  19. 19. MVA with multiple selection mysql> SELECT id,price,brand_id,categories FROM facetdemo WHERE categories IN (13,14); +------+-------+----------+------------+ | id | price | brand_id | categories | +------+-------+----------+------------+ | 1 | 874 | 47 | 13 | | 2 | 712 | 38 | 11,14 | | 9 | 113 | 25 | 12,14 | | 17 | 440 | 46 | 13,15 | | 19 | 206 | 50 | 13,17 | | 21 | 76 | 28 | 7,10,13 | | 22 | 363 | 21 | 13,17,20 | ...
  20. 20. Grouping on MVA mysql> SELECT id,price,brand_id,categories,GROUPBY(),COUNT(*) FROM facetdemo GROUP BY categories; +------+-------+----------+------------+-----------+----------+ | id | price | brand_id | categories | groupby() | count(*) | +------+-------+----------+------------+-----------+----------+ | 1 | 874 | 47 | 13 | 13 | 362931 | | 2 | 712 | 38 | 11,14 | 14 | 185023 | | 2 | 712 | 38 | 11,14 | 11 | 329874 | | 3 | 773 | 7 | 12,16 | 16 | 143837 | | 3 | 773 | 7 | 12,16 | 12 | 349446 | | 4 | 803 | 31 | 6,9 | 9 | 267583 | | 4 | 803 | 31 | 6,9 | 6 | 184772 | ...
  21. 21. Going further: JSON • • starting with 2.1 Sphinx supports JSON documents useful for { o o "id": 1, "gid": 2, "title": "some title", "tags": [ "tag1", "tag2", "tag3" ], "property": [ { "name": "color", "value": "blue" }, { "name": "weight", "value": 2.56 } ] unstructured data complex one to many relations }
  22. 22. JSON attributes • • • filter, sort and group JSON/MVA array functions: LENGTH(), LEAST(), GREATEST() Advanced JSON search in array of objects: ANY(), ALL(), INDEXOF()
  23. 23. Advanced searching in JSON document : SELECT *,ANY ( id : 1011 title : Hotel Sky myjson: { … offers: { ( item.type = 1 AND item.start > my_start_timestamp AND item.end < my_end_timestamp ) { ‘type’ : 3, ‘start’ : start_timestamp, ‘end’: end_timestamp }, { ‘type’ : 1, ‘start’ : start_timestamp, ‘end’: end_timestamp } … } } FOR item IN myjson.offers ) AS condition FROM index WHERE condition =1
  24. 24. • • • ANY ( cond FOR var IN json.array) true if one element match condition ALL ( cond FOR var IN json.array) o true if all elements match condition INDEXOF ( cond FOR var IN json.array) o returns index key of first element that match condition o
  25. 25. Fulltext for speed up non-fulltext
  26. 26. SELECT *,(...) as heavy_expr WHERE attr=x AND heavy_expr =1 No fulltext match, query does fullscan, computes for whole collection the heavy expression SELECT *,(...) as heavy_expr WHERE MATCH(‘attrx’) AND heavy_expr =1 Fulltext match, heavy expression is computed only on result set returned by fulltext match
  27. 27. Sphinx with FT filter Operation Example MySQL Sphinx w/o FT Filter by integer, order by timestamp and integer column WHERE brand_id=20 ORDER BY added DESC, property_int ASC 0.31 0.19 Fulltext filter, order by timestamp and integer column WHERE MATCH(‘brand20’) ORDER BY added DESC, property_int ASC Sphinx with FT 0.13
  28. 28. Speed up geo spatial with fulltext • example: find items around a point in New York city in a 10km radius. Speed-up: search only items belonging to New York states mysql> SELECT *, GEODIST(0.710011075352, 1.2918035709982,latitude,longitude,{in=rad,out=km,method=adaptive}) as distance FROM geodemo WHERE distance < 10 ORDER BY distance ASC LIMIT 0,10;10 rows in set (0.17 sec) mysql> SELECT *, GEODIST(0.710011075352, 1.2918035709982,latitude,longitude,{in=rad,out=km,method=adaptive}) as distance FROM geodemo WHERE MATCH('@state_code NY') AND distance < 10 ORDER BY distance ASC LIMIT 0,10;10 rows in set (0.03 sec)
  29. 29. Questions? adrian.nuta@sphinxsearch.com http://www.sphinxsearch.com

×