Fulltext engine for
Non-Fulltext Queries
Adrian Nuta // Sphinxsearch // 2013

• Introduction
• Non-fulltext queries
• Special data columns
• Fulltext to speed up non-fulltext queries

Introduction

What is Sphinx
• free, open-source search server
• fast: 700 qps / core / 1M docs
• flexible: 100+ features
• scalable
  o 300 mil. queries / day
  o 50 TB data, 100+ boxes

Sphinx document

Doc ID | Fulltext fields ... | Attributes ...

• Fulltext fields
  ● go into the inverted index
  ● indexed, not stored
• Attributes: Integer, Float, Bool, Timestamp, MVA, String, JSON
  ● stored, not indexed
  ● held in memory or on disk
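
Meet SphinxQL

Application --[MySQL connector, MySQL protocol]--> MySQL   (MySQL language)
Application --[MySQL connector, MySQL protocol]--> Sphinx  (SphinxQL language)

The application talks to Sphinx through the same MySQL connector and protocol it uses for MySQL, sending queries such as SELECT * FROM mytable WHERE ...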

Non-fulltext queries

What can Sphinx do besides fulltext search?
• the usual WHERE, ORDER BY, GROUP BY
• GROUP BY custom extensions:
  o WITHIN GROUP ORDER BY
  o GROUP <N> BY
• aggregation, timestamp and math functions
• comparison functions: IF(), INTERVAL(), IN()
• geospatial functions: GEODIST(), GEOPOLY2D()

WITHIN GROUP ORDER BY
mysql> SELECT *, DAY(added) AS today FROM facetdemo WHERE property2 = 160 AND today = 26 GROUP BY brand_id WITHIN GROUP ORDER BY price ASC ORDER BY brand_id ASC;
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| id      | price | brand_id | property2 | added      | title               | brand_name | property | today |
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
|  520157 |    10 |        1 |       160 | 1382745486 | Product Nine Seven  | brand1     | Three    |    26 |
| 1726473 |    10 |        2 |       160 | 1382796463 | Product Two Three   | brand2     | Eight    |    26 |
| 1588875 |    11 |        3 |       160 | 1382762264 | Product Three Six   | brand3     | Five     |    26 |
| 1556197 |    10 |        4 |       160 | 1382754018 | Product Eight Six   | brand4     | Seven    |    26 |
|  751443 |    11 |        5 |       160 | 1382803444 | Product Six Three   | brand5     | One      |    26 |
|  512776 |    11 |        6 |       160 | 1382743642 | Product Ten Five    | brand6     | Six      |    26 |
...

mysql> SELECT *, DAY(added) AS today FROM facetdemo WHERE property2 = 160 AND today = 26 GROUP BY brand_id WITHIN GROUP ORDER BY price DESC ORDER BY brand_id ASC;
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| id      | price | brand_id | property2 | added      | title               | brand_name | property | today |
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
|  815154 |   998 |        1 |       160 | 1382819286 | Product Two Nine    | brand1     | Eight    |    26 |
| 2793903 |   999 |        2 |       160 | 1382813601 | Product Eight Five  | brand2     | Two      |    26 |
|  699831 |  1000 |        3 |       160 | 1382790589 | Product One Six     | brand3     | Eight    |    26 |
|  714052 |  1000 |        4 |       160 | 1382794137 | Product One Ten     | brand4     | Three    |    26 |
| 2791902 |   999 |        5 |       160 | 1382813140 | Product Five Three  | brand5     | Four     |    26 |
| 2753725 |  1000 |        6 |       160 | 1382803662 | Product Seven Three | brand6     | Two      |    26 |
...
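
The semantics of WITHIN GROUP ORDER BY can be modeled in a few lines of Python: each group keeps the row that sorts first under the in-group ordering, and the surviving rows are then sorted by the outer ORDER BY. This is a simplified sketch on hypothetical toy rows, not Sphinx's implementation; in particular, the tie-breaking rule (the first row seen wins on equal keys) is an assumption.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

def group_within_order(rows: List[Row], group_key: str,
                       within_key: Callable[[Row], int],
                       outer_key: Callable[[Row], int]) -> List[Row]:
    """Keep the best row per group by within_key, then sort groups by outer_key."""
    best: Dict[int, Row] = {}
    for row in rows:
        g = row[group_key]
        # replace the group's representative only if this row sorts strictly earlier
        if g not in best or within_key(row) < within_key(best[g]):
            best[g] = row
    return sorted(best.values(), key=outer_key)

# hypothetical toy rows
rows: List[Row] = [
    {"id": 1, "brand_id": 2, "price": 999},
    {"id": 2, "brand_id": 1, "price": 10},
    {"id": 3, "brand_id": 1, "price": 998},
    {"id": 4, "brand_id": 2, "price": 10},
]

# GROUP BY brand_id WITHIN GROUP ORDER BY price ASC ORDER BY brand_id ASC
cheapest = group_within_order(rows, "brand_id", lambda r: r["price"], lambda r: r["brand_id"])
# WITHIN GROUP ORDER BY price DESC is the same with a negated key
priciest = group_within_order(rows, "brand_id", lambda r: -r["price"], lambda r: r["brand_id"])
print([r["id"] for r in cheapest])  # [2, 4]
print([r["id"] for r in priciest])  # [3, 1]
```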

Using GROUP <N> BY
mysql> SELECT * FROM facetdemo GROUP 3 BY brand_id WITHIN GROUP ORDER BY added DESC ORDER BY brand_id ASC;
+---------+-------+----------+------------+---------------------+------------+----------+
| id      | price | brand_id | added      | title               | brand_name | property |
+---------+-------+----------+------------+---------------------+------------+----------+
| 1479848 |   938 |        1 | 1382735889 | Product Ten Seven   | brand1     | Four     |
| 2479064 |   398 |        1 | 1382734998 | Product Ten Five    | brand1     | Eight    |
| 1480553 |   687 |        1 | 1382734048 | Product Four Two    | brand1     | One      |
| 1479580 |    62 |        2 | 1382734834 | Product Nine Seven  | brand2     | Ten      |
| 1479585 |   357 |        2 | 1382734834 | Product Six Two     | brand2     | Five     |
|  477383 |   908 |        2 | 1382733871 | Product Ten Three   | brand2     | Eight    |
| 2478429 |   425 |        3 | 1382734839 | Product Three Ten   | brand3     | Five     |
|  477456 |   519 |        3 | 1382734818 | Product Ten One     | brand3     | Six      |
|  477521 |   190 |        3 | 1382734403 | Product Three Two   | brand3     | Five     |
| 2478459 |   931 |        4 | 1382734850 | Product One Two     | brand4     | Five     |
| 1479718 |   891 |        4 | 1382734065 | Product Two One     | brand4     | Three    |
| 2478514 |   106 |        4 | 1382733868 | Product Six Seven   | brand4     | One      |
|  477297 |   991 |        5 | 1382734844 | Product Five Eight  | brand5     | Four     |
| 2479053 |   648 |        5 | 1382733994 | Product Six One     | brand5     | Nine     |
| 1480798 |   250 |        5 | 1382732121 | Product One Seven   | brand5     | Eight    |
...
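
GROUP <N> BY generalizes the previous idea to keeping up to N rows per group. A minimal Python sketch, assuming hypothetical toy rows and an outer order by group key ascending; unlike a real engine, it materializes every group in memory before trimming it with heapq.nsmallest.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, int]

def group_n_by(rows: List[Row], n: int, group_key: str,
               within_key: Callable[[Row], int]) -> List[Row]:
    """Return up to n rows per group, best-first by within_key; groups ordered by key ASC."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row)
    result: List[Row] = []
    for g in sorted(buckets):  # outer ORDER BY group key ASC
        # nsmallest returns the n best rows of this group, already sorted
        result.extend(heapq.nsmallest(n, buckets[g], key=within_key))
    return result

# hypothetical toy rows
rows: List[Row] = [
    {"id": 10, "brand_id": 1, "added": 100},
    {"id": 11, "brand_id": 1, "added": 300},
    {"id": 12, "brand_id": 1, "added": 200},
    {"id": 13, "brand_id": 1, "added": 50},
    {"id": 20, "brand_id": 2, "added": 400},
]

# GROUP 3 BY brand_id WITHIN GROUP ORDER BY added DESC ORDER BY brand_id ASC
top3 = group_n_by(rows, 3, "brand_id", lambda r: -r["added"])
print([r["id"] for r in top3])  # [11, 12, 10, 20]
```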

Using HAVING
mysql> SELECT *, COUNT(*) FROM facetdemo WHERE property2 = 190 AND price > 900 GROUP BY brand_id HAVING COUNT(*) > 1000;
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
| id    | price | brand_id | property2 | added      | title             | brand_name | property | count(*) |
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
|  2566 |   934 |       24 |       190 | 1382615816 | Product One Three | brand24    | Six      |     1023 |
|  4807 |   905 |       11 |       190 | 1382616392 | Product Five Six  | brand11    | Eight    |     1023 |
|  5539 |   985 |       44 |       190 | 1382616552 | Product Ten Four  | brand44    | Three    |     1009 |
|  7655 |   912 |       10 |       190 | 1382617104 | Product Four Five | brand10    | Ten      |     1028 |
| 16837 |   968 |       20 |       190 | 1382619365 | Product One Nine  | brand20    | Five     |     1015 |
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
5 rows in set (0.17 sec)
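
What matters here is the order of evaluation: WHERE filters individual documents before grouping, while HAVING filters whole groups after COUNT(*) is known. A toy Python sketch of that two-stage pipeline; the rows and the lowered threshold of 1 are made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, int]

def group_count_having(rows: List[Row], group_key: str,
                       where: Callable[[Row], bool], min_count: int) -> Dict[int, int]:
    # WHERE runs per row, before grouping
    counts = Counter(row[group_key] for row in rows if where(row))
    # HAVING runs per group, after COUNT(*) is known
    return {g: c for g, c in sorted(counts.items()) if c > min_count}

# hypothetical toy rows
rows: List[Row] = [
    {"brand_id": 10, "property2": 190, "price": 950},
    {"brand_id": 10, "property2": 190, "price": 910},
    {"brand_id": 10, "property2": 190, "price": 100},  # dropped by WHERE
    {"brand_id": 20, "property2": 190, "price": 990},
    {"brand_id": 30, "property2": 150, "price": 999},  # dropped by WHERE
]

# WHERE property2 = 190 AND price > 900 GROUP BY brand_id HAVING COUNT(*) > 1
result = group_count_having(rows, "brand_id",
                            lambda r: r["property2"] == 190 and r["price"] > 900, 1)
print(result)  # {10: 2}
```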

Comparing simple queries (query time in seconds)

Filter by integer, group by integer
  Example: WHERE property_int = 190 GROUP BY brand_id
  MySQL: 0.32 | Sphinx: 0.14 | Sphinx speed-up: 2.2x

Group by integer, order by COUNT(*)
  Example: GROUP BY brand_id ORDER BY COUNT(*) DESC
  MySQL: 1.76 | Sphinx: 0.53 | Sphinx speed-up: 3.3x

Filter by integer, order by timestamp
  Example: WHERE brand_id = 20 ORDER BY added ASC
  MySQL: 0.00 | Sphinx: 0.14 | Sphinx speed-up: none (MySQL is faster here)

Filter by integer, order by timestamp and integer column
  Example: WHERE brand_id = 20 ORDER BY added DESC, property_int ASC
  MySQL: 0.31 | Sphinx: 0.19 | Sphinx speed-up: 1.6x

Using IF() for comparisons
mysql> SELECT COUNT(*), IF(property2 = 270 OR price < 80, 1,
           IF(property2 = 280 OR price > 900, 2, 3)
       ) AS expr
       FROM facetdemo GROUP BY expr;
+----------+------+
| count(*) | expr |
+----------+------+
|  7494455 |    3 |
|  1357178 |    2 |
|  1148366 |    1 |
+----------+------+
3 rows in set (1.04 sec)
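
Nested IF() calls put each document into the first bucket whose condition holds, so a document matching both conditions (for example property2 = 280 with price < 80) lands in bucket 1. A small Python model of the same expression, run on hypothetical rows:

```python
from collections import Counter
from typing import Dict

def bucket(row: Dict[str, int]) -> int:
    # mirrors IF(property2=270 OR price<80, 1, IF(property2=280 OR price>900, 2, 3))
    if row["property2"] == 270 or row["price"] < 80:
        return 1
    if row["property2"] == 280 or row["price"] > 900:
        return 2
    return 3

# hypothetical rows
rows = [
    {"property2": 270, "price": 500},  # bucket 1
    {"property2": 280, "price": 50},   # bucket 1: the outer IF is checked first
    {"property2": 280, "price": 500},  # bucket 2
    {"property2": 100, "price": 950},  # bucket 2
    {"property2": 100, "price": 500},  # bucket 3
]

# GROUP BY expr, COUNT(*)
counts = Counter(bucket(r) for r in rows)
print(sorted(counts.items()))  # [(1, 2), (2, 2), (3, 1)]
```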

Using INTERVAL() for segmentation
mysql> SELECT id, price, INTERVAL(price,0,300,600,900) AS pricerange, COUNT(*) FROM facetdemo WHERE brand_id=27 GROUP BY pricerange ORDER BY pricerange ASC;
+------+-------+------------+----------+
| id   | price | pricerange | count(*) |
+------+-------+------------+----------+
|  219 |   196 |          1 |    58283 |
|   46 |   467 |          2 |    60535 |
|  109 |   667 |          3 |    60789 |
|    5 |   962 |          4 |    20285 |
+------+-------+------------+----------+
4 rows in set (0.19 sec)
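
INTERVAL(expr, point1, point2, ...) returns 0 when expr < point1, 1 when point1 <= expr < point2, and so on, assuming the points are sorted ascending. In Python the same bucketing is a binary search with bisect.bisect_right. The first four prices below come from the result above; the others are hypothetical boundary cases.

```python
import bisect
from collections import Counter

def interval(value: float, *points: float) -> int:
    """Mimic INTERVAL(): the number of points <= value (points must be ascending)."""
    return bisect.bisect_right(points, value)

# 196, 467, 667, 962 are the sample prices above; 300, 299, 0 test the boundaries
prices = [196, 467, 667, 962, 300, 299, 0]
counts = Counter(interval(p, 0, 300, 600, 900) for p in prices)
print(sorted(counts.items()))  # [(1, 3), (2, 2), (3, 1), (4, 1)]
```

Because bisect_right places equal values to the right, a price of exactly 300 falls into range 2, matching the half-open intervals described above.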

Geo spatial in Sphinx
• GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })
  o in: { deg | degrees | rad | radians }
  o out: { m | meters | km | ft | mi | miles }
  o method: { haversine | adaptive }
     - haversine: high precision, expensive
     - adaptive: good precision, cheaper (polar flat-Earth algorithm)
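
For reference, the haversine method computes great-circle distance as d = 2R·asin(√(sin²(Δφ/2) + cos φ1·cos φ2·sin²(Δλ/2))). Below is a Python sketch with inputs in degrees and output in kilometers. The 6371 km Earth radius is an assumption, and neither Sphinx's exact constants nor its adaptive (flat-Earth) method are reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; Sphinx's constant may differ

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# one degree of longitude along the equator
print(round(haversine_km(0, 0, 0, 1), 3))  # 111.195
```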
• POLY2D(x1,y1,x2,y2,x3,y3, …)
• GEOPOLY2D(lat1,lng1,lat2,lng2,lat3,lng3, ...)
  o lat/lng in degrees
• CONTAINS(polygon, x, y)

mysql> SELECT *, CONTAINS(GEOPOLY2D(40.95164274496,-76.88583678218,41.188446201688,73.203723511772,39.900666261352,-74.171833538046,40.059260979044,76.301076056469),latitude_deg,longitude_deg) AS inside FROM geodemo WHERE inside=1 LIMIT 0,100;
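
CONTAINS(polygon, x, y) is a point-in-polygon test. A common way to implement one is the even-odd (ray casting) rule, sketched below in Python. The sketch treats coordinates as a flat plane, which is the POLY2D case. It illustrates the technique and is not necessarily the algorithm Sphinx uses; any Earth-curvature handling that GEOPOLY2D may apply is not modeled.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def contains(polygon: List[Point], x: float, y: float) -> bool:
    """Even-odd rule: toggle on every polygon edge crossed by a ray from (x, y) toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(contains(square, 2.0, 2.0))  # True
print(contains(square, 5.0, 2.0))  # False
```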

Special data columns

Multi value attribute (MVA)
• a column holding a set of integers

  Price (float)   Categories (MVA)
  ...             ...
  199.99          24, 128, 300
  ...             ...

MVA with multiple selection
mysql> SELECT id,price,brand_id,categories FROM facetdemo WHERE categories IN (13,14);
+------+-------+----------+------------+
| id   | price | brand_id | categories |
+------+-------+----------+------------+
|    1 |   874 |       47 | 13         |
|    2 |   712 |       38 | 11,14      |
|    9 |   113 |       25 | 12,14      |
|   17 |   440 |       46 | 13,15      |
|   19 |   206 |       50 | 13,17      |
|   21 |    76 |       28 | 7,10,13    |
|   22 |   363 |       21 | 13,17,20   |
...
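
As the output shows, categories IN (13,14) matches a document when its MVA shares at least one value with the list (for example 11,14 or 7,10,13). In Python terms that is a non-empty set intersection. The documents below are a small subset modeled on the results in this section.

```python
from typing import Dict, Set

def mva_in(mva: Set[int], values: Set[int]) -> bool:
    """True if the MVA shares at least one value with the IN (...) list."""
    return not mva.isdisjoint(values)

# doc id -> categories MVA, taken from the sample rows in this section
docs: Dict[int, Set[int]] = {1: {13}, 2: {11, 14}, 3: {12, 16}, 21: {7, 10, 13}}
matches = sorted(doc_id for doc_id, cats in docs.items() if mva_in(cats, {13, 14}))
print(matches)  # [1, 2, 21]
```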

Grouping on MVA
mysql> SELECT id,price,brand_id,categories,GROUPBY(),COUNT(*) FROM facetdemo GROUP BY categories;
+------+-------+----------+------------+-----------+----------+
| id   | price | brand_id | categories | groupby() | count(*) |
+------+-------+----------+------------+-----------+----------+
|    1 |   874 |       47 | 13         |        13 |   362931 |
|    2 |   712 |       38 | 11,14      |        14 |   185023 |
|    2 |   712 |       38 | 11,14      |        11 |   329874 |
|    3 |   773 |        7 | 12,16      |        16 |   143837 |
|    3 |   773 |        7 | 12,16      |        12 |   349446 |
|    4 |   803 |       31 | 6,9        |         9 |   267583 |
|    4 |   803 |       31 | 6,9        |         6 |   184772 |
...
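
Grouping on an MVA puts a document into one group per value it holds. That is why document 2 (categories 11,14) appears twice above, once for group 14 and once for group 11. Below is a Python sketch of the counting on a handful of documents, where document 5 is hypothetical. Note that the group counts add up to more than the number of documents.

```python
from collections import Counter
from typing import Dict, Set

# doc id -> categories MVA (docs 1-4 as above, doc 5 hypothetical)
docs: Dict[int, Set[int]] = {1: {13}, 2: {11, 14}, 3: {12, 16}, 4: {6, 9}, 5: {13, 14}}

# each document adds one to the count of every group named by one of its MVA values
counts = Counter(value for cats in docs.values() for value in cats)
print(sorted(counts.items()))  # [(6, 1), (9, 1), (11, 1), (12, 1), (13, 2), (14, 2), (16, 1)]
print(sum(counts.values()), len(docs))  # 9 5
```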

Going further: JSON
• starting with version 2.1, Sphinx supports JSON documents
• useful for
  o unstructured data
  o complex one-to-many relations

{
  "id": 1,
  "gid": 2,
  "title": "some title",
  "tags": [ "tag1", "tag2", "tag3" ],
  "property": [
    { "name": "color", "value": "blue" },
    { "name": "weight", "value": 2.56 }
  ]
}

JSON attributes
• filter, sort and group
• JSON/MVA array functions: LENGTH(), LEAST(), GREATEST()
• advanced JSON search in arrays of objects: ANY(), ALL(), INDEXOF()

Advanced searching in JSON

Document:
id : 1011
title : Hotel Sky
myjson: {
  …
  offers: [
    { "type": 3, "start": start_timestamp, "end": end_timestamp },
    { "type": 1, "start": start_timestamp, "end": end_timestamp },
    …
  ]
}

Query:
SELECT *, ANY(item.type = 1 AND item.start > my_start_timestamp AND item.end < my_end_timestamp FOR item IN myjson.offers) AS condition
FROM index
WHERE condition = 1

• ANY(cond FOR var IN json.array)
  o true if at least one element matches the condition
• ALL(cond FOR var IN json.array)
  o true if all elements match the condition
• INDEXOF(cond FOR var IN json.array)
  o returns the index of the first element that matches the condition
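
The three operators behave like Python's any(), all(), and a first-match search over the array. Below is a sketch on hypothetical offer data that mirrors the hotel example. Two details are assumptions of this sketch, not documented Sphinx behavior: the -1 that index_of returns when nothing matches, and the concrete timestamps.

```python
from typing import Callable, Dict, List

Offer = Dict[str, int]
Cond = Callable[[Offer], bool]

def any_of(cond: Cond, items: List[Offer]) -> bool:
    return any(cond(item) for item in items)

def all_of(cond: Cond, items: List[Offer]) -> bool:
    return all(cond(item) for item in items)

def index_of(cond: Cond, items: List[Offer]) -> int:
    for i, item in enumerate(items):
        if cond(item):
            return i
    return -1  # assumed "no match" value for this sketch

# hypothetical offers and search window (timestamps are made up)
offers: List[Offer] = [
    {"type": 3, "start": 100, "end": 200},
    {"type": 1, "start": 150, "end": 180},
]
my_start_timestamp, my_end_timestamp = 120, 190

def in_window(item: Offer) -> bool:
    # item.type = 1 AND item.start > my_start_timestamp AND item.end < my_end_timestamp
    return item["type"] == 1 and item["start"] > my_start_timestamp and item["end"] < my_end_timestamp

print(any_of(in_window, offers), all_of(in_window, offers), index_of(in_window, offers))  # True False 1
```

Python's all() returns True for an empty array. This sketch does not establish whether Sphinx's ALL() behaves the same way.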
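
Fulltext for speeding up non-fulltext queries

SELECT *, (...) AS heavy_expr WHERE attr = x AND heavy_expr = 1
  No fulltext match: the query does a full scan and computes the heavy expression for the whole collection.

SELECT *, (...) AS heavy_expr WHERE MATCH('attrx') AND heavy_expr = 1
  Fulltext match: the heavy expression is computed only on the result set returned by the fulltext match. This requires the attribute value to also be indexed as a keyword in a fulltext field (here the token attrx stands for attr = x).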
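
The effect can be modeled by counting how often the expensive expression runs. This Python sketch uses toy data and a hypothetical heavy_expr. brand_id is stored as an attribute and is also indexed as the keyword brand<N>. As the slide describes, the full scan evaluates the expression for every document. The MATCH('brand20') path evaluates it only for the documents returned by the inverted-index lookup.

```python
from collections import defaultdict
from typing import Dict, List

# toy collection: 10,000 docs; brand_id stored as an attribute AND indexed as keyword "brand<N>"
docs: Dict[int, Dict[str, int]] = {i: {"brand_id": i % 50} for i in range(1, 10_001)}
inverted: Dict[str, List[int]] = defaultdict(list)
for doc_id, doc in docs.items():
    inverted[f"brand{doc['brand_id']}"].append(doc_id)

class HeavyExpr:
    """Stand-in for an expensive per-document expression; counts its evaluations."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, doc: Dict[str, int]) -> bool:
        self.calls += 1
        return doc["brand_id"] % 2 == 0

# WHERE attr = x AND heavy_expr = 1: full scan, expression computed for every document
scan_expr = HeavyExpr()
scan = [i for i, d in docs.items() if scan_expr(d) and d["brand_id"] == 20]

# WHERE MATCH('brand20') AND heavy_expr = 1: expression computed only on matched documents
ft_expr = HeavyExpr()
ft = [i for i in inverted["brand20"] if ft_expr(docs[i])]

print(scan == ft, scan_expr.calls, ft_expr.calls)  # True 10000 200
```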

Sphinx with FT filter (query time in seconds)

Filter by integer, order by timestamp and integer column
  Example: WHERE brand_id = 20 ORDER BY added DESC, property_int ASC
  MySQL: 0.31 | Sphinx w/o FT: 0.19

Fulltext filter, order by timestamp and integer column
  Example: WHERE MATCH('brand20') ORDER BY added DESC, property_int ASC
  Sphinx with FT: 0.13

Speed up geo spatial with fulltext
• Example: find items within a 10 km radius of a point in New York City. Speed-up: search only items belonging to New York State.

mysql> SELECT *, GEODIST(0.710011075352, 1.2918035709982, latitude, longitude, {in=rad,out=km,method=adaptive}) AS distance FROM geodemo WHERE distance < 10 ORDER BY distance ASC LIMIT 0,10;
10 rows in set (0.17 sec)

mysql> SELECT *, GEODIST(0.710011075352, 1.2918035709982, latitude, longitude, {in=rad,out=km,method=adaptive}) AS distance FROM geodemo WHERE MATCH('@state_code NY') AND distance < 10 ORDER BY distance ASC LIMIT 0,10;
10 rows in set (0.03 sec)
Questions?
adrian.nuta@sphinxsearch.com
http://www.sphinxsearch.com
