Multi-Terabyte Sphinx HA cluster
Vyacheslav Kryukov
vkrukov@ivinco.com
Vyacheslav Kryukov, Ivinco

HighLoad++ 2013

  1. Multi-Terabyte Sphinx HA cluster. Vyacheslav Kryukov, vkrukov@ivinco.com
  2. Sphinx cluster
  3. Sphinx cluster
  4. Sphinx cluster
  5. Sphinx cluster
  6. Sphinx cluster
  7. Sphinx cluster
  8. Sphinx HA cluster, requirements
     ● Incident tolerance and availability level
     ● Adaptive balancing
     ● Utilisation of redundant resources
     ● Easy deployment of new resources
  9. Sphinx HA cluster architecture
  10. Sphinx HA cluster, architecture #1
  11. Sphinx HA cluster, architecture #2
  12. Sphinx HA cluster, ha_strategy
      ● Simple balancing
        ● random
        ● roundrobin
      ● Adaptive balancing
        ● nodeads
        ● noerrors
      http://sphinxsearch.com/docs/current.html#conf-ha-strategy
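
The two simple strategies named on this slide can be illustrated with a minimal Python sketch. It only shows the selection idea (uniform random choice versus cycling in order); it is not Sphinx's implementation, and the helper names `pick_random` and `make_roundrobin` are hypothetical. The mirror names are borrowed from the configuration slide.

```python
import itertools
import random

# Mirror names as they appear on the configuration slide.
MIRRORS = ["se01-1:3312", "se01-2:3312"]


def pick_random(mirrors, rng=random):
    """Random strategy: every mirror is equally likely on every query."""
    return rng.choice(mirrors)


def make_roundrobin(mirrors):
    """Round-robin strategy: hand out the mirrors in order, then wrap around."""
    cycle = itertools.cycle(mirrors)
    return lambda: next(cycle)


next_mirror = make_roundrobin(MIRRORS)
first_four = [next_mirror() for _ in range(4)]
```

Neither strategy looks at mirror health, which is the gap the adaptive strategies are meant to close.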
  13. Sphinx HA cluster, adaptive balancing
      ● Latency
      ● Query timeouts
      ● Connect timeouts
      ● Connect failures
      ● Network errors
      ● Wrong replies
      ● Unexpected closings
      ● Warnings
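
How such signals might feed a balancing decision can be sketched as follows. This is an illustration under stated assumptions, not Sphinx's algorithm: mirrors with too many consecutive errors are excluded, and the rest are picked with probability inversely proportional to their average latency. The `DEAD_AFTER` threshold, the `MirrorStats` fields, and the host `se02-3:3312` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class MirrorStats:
    host: str
    avg_latency_ms: float  # assumed input: average query latency for the mirror
    errors_in_a_row: int   # assumed input: consecutive hard errors


DEAD_AFTER = 3  # hypothetical threshold, not a Sphinx setting


def selection_weights(mirrors, dead_after=DEAD_AFTER):
    """Return {host: probability}, skipping mirrors that look dead."""
    alive = [m for m in mirrors if m.errors_in_a_row < dead_after]
    if not alive:
        alive = list(mirrors)  # nothing healthy left: fall back to all mirrors
    inverse = {m.host: 1.0 / max(m.avg_latency_ms, 1.0) for m in alive}
    total = sum(inverse.values())
    return {host: weight / total for host, weight in inverse.items()}


def pick_mirror(mirrors, rng=random):
    """Draw one mirror according to the latency-based weights."""
    weights = selection_weights(mirrors)
    hosts = list(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]


stats = [
    MirrorStats("se02-1:3312", avg_latency_ms=50.0, errors_in_a_row=0),
    MirrorStats("se02-2:3312", avg_latency_ms=100.0, errors_in_a_row=0),
    MirrorStats("se02-3:3312", avg_latency_ms=10.0, errors_in_a_row=5),
]
weights = selection_weights(stats)
```

In this sample the third mirror is dropped despite its low latency, and the remaining two share traffic roughly 2:1 in favour of the faster one.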
  14. Sphinx HA cluster, configuration

      index some_index
      {
          type = distributed
          agent = se01-1:3312|se01-2:3312:some_index_se01
          agent = se02-1:3312|se02-2:3312:some_index_se02
          agent = se03-1:3312|se03-2:3312:some_index_se03
          agent = se04-1:3312|se04-2:3312:some_index_se04
          ha_strategy = nodeads
      }

      searchd
      {
          ...
          ha_ping_interval = 1000
          ha_period_karma = 60
          ...
      }

      http://sphinxsearch.com/docs/current.html#conf-ha-ping-interval
      http://sphinxsearch.com/docs/current.html#conf-ha-period-karma
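
To make the `agent` syntax on this slide concrete, here is a small Python sketch that splits one value into its mirrors and the remote index name. It handles only the form shown on the slide (mirrors separated by `|`, followed by `:index`); the function name `parse_agent` is hypothetical, and other agent forms that Sphinx may accept are not covered.

```python
def parse_agent(value):
    """Split 'host:port|host:port:index' into (mirrors, remote_index)."""
    # The remote index name follows the last colon.
    mirrors_part, _, remote_index = value.rpartition(":")
    mirrors = []
    for spec in mirrors_part.split("|"):
        host, _, port = spec.partition(":")
        mirrors.append((host, int(port)))
    return mirrors, remote_index


mirrors, remote_index = parse_agent("se01-1:3312|se01-2:3312:some_index_se01")
```

Each `agent` line therefore describes one shard (`some_index_se01`) served by two interchangeable mirrors, and `ha_strategy` decides which mirror receives a given query.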
  15. Sphinx HA cluster, SHOW AGENT STATUS

      mysql> SHOW AGENT STATUS;
      +-------------------------------------+--------------------+
      | Key                                 | Value              |
      +-------------------------------------+--------------------+
      | status_period_seconds               | 60                 |
      | status_stored_periods               | 15                 |
      ...
      | ag_19_hostname                      | se02-1:3312        |
      | ag_19_references                    | 13                 |
      | ag_19_lastquery                     | 1.91               |
      | ag_19_lastanswer                    | 1.86               |
      | ag_19_lastperiodmsec                | 51                 |
      | ag_19_errorsarow                    | 0                  |
      | ag_19_1periods_query_timeouts       | 0                  |
      | ag_19_1periods_connect_timeouts     | 0                  |
      | ag_19_1periods_connect_failures     | 0                  |
      | ag_19_1periods_network_errors       | 0                  |
      | ag_19_1periods_wrong_replies        | 0                  |
      | ag_19_1periods_unexpected_closings  | 0                  |
      | ag_19_1periods_warnings             | 0                  |
      | ag_19_1periods_succeeded_queries    | 101                |
      | ag_19_1periods_msecsperqueryy       | 83.92              |
      (the same for 5periods_ and 15periods_)
      | ag_20_hostname                      | se02-2:3312        |
      | ag_20_references                    | 13                 |
      | ag_20_lastquery                     | 0.55               |
      | ag_20_lastanswer                    | 0.49               |
      | ag_20_lastperiodmsec                | 55                 |
      | ag_20_errorsarow                    | 0                  |
      | ag_20_1periods_query_timeouts       | 0                  |
      | ag_20_1periods_connect_timeouts     | 0                  |
      | ag_20_1periods_connect_failures     | 0                  |
      | ag_20_1periods_network_errors       | 0                  |
      | ag_20_1periods_wrong_replies        | 0                  |
      | ag_20_1periods_unexpected_closings  | 0                  |
      | ag_20_1periods_warnings             | 0                  |
      | ag_20_1periods_succeeded_queries    | 55                 |
      | ag_20_1periods_msecsperqueryy       | 86.08              |
      (the same for 5periods_ and 15periods_)
      ...
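
The flat key/value output above is easier to monitor once it is grouped per agent. The following Python sketch does that for rows already fetched as `(key, value)` pairs; how the rows are fetched is left out, and the helper names `group_agent_status` and `query_share` are hypothetical. The sample rows are copied from the slide.

```python
import re
from collections import defaultdict

# Keys look like ag_<number>_<counter>, e.g. ag_19_hostname.
AGENT_KEY = re.compile(r"^ag_(\d+)_(.+)$")


def group_agent_status(rows):
    """Group (key, value) rows into {agent_number: {counter: value}}."""
    agents = defaultdict(dict)
    for key, value in rows:
        match = AGENT_KEY.match(key)
        if match:  # global keys such as status_period_seconds are skipped
            agents[int(match.group(1))][match.group(2)] = value
    return dict(agents)


def query_share(agents, counter="1periods_succeeded_queries"):
    """Fraction of succeeded queries served by each host in the last period."""
    counts = {a["hostname"]: int(a[counter]) for a in agents.values()}
    total = sum(counts.values())
    return {host: count / total for host, count in counts.items()} if total else {}


rows = [
    ("status_period_seconds", "60"),
    ("ag_19_hostname", "se02-1:3312"),
    ("ag_19_1periods_succeeded_queries", "101"),
    ("ag_20_hostname", "se02-2:3312"),
    ("ag_20_1periods_succeeded_queries", "55"),
]
agents = group_agent_status(rows)
share = query_share(agents)
```

With the slide's numbers, `se02-1:3312` served 101 of 156 succeeded queries in the last period, which makes the uneven split between the two mirrors easy to see.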
  16. Sphinx HA cluster, balancing in real time
  17. Sphinx HA cluster, balancing in real time

      # cd /mnt/data
      # iozone -i0 -i2 -s16g -r32k -f iozone.tmp

  18. Sphinx HA cluster, balancing in real time
  19. Sphinx HA cluster, balancing in real time
  20. Sphinx HA cluster, data processing
      ● Data loading to permanent store
      ● Data indexing
      ● Index validation and synchronization (Rsync and NetCat)
      ● Updating indexes from the application
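
The slide does not say how indexes are validated after they are synchronized. One common approach, shown here purely as an assumption, is to compare a checksum of the file on the indexing host with a checksum of the delivered copy. The function names and the demo file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large index files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_valid(source, copy):
    """A delivered copy is accepted only if its checksum matches the source."""
    return sha256_of(source) == sha256_of(copy)


# Small demonstration with temporary files standing in for index files.
with tempfile.TemporaryDirectory() as tmp:
    built = Path(tmp) / "some_index.built"
    delivered = Path(tmp) / "some_index.delivered"
    built.write_bytes(b"index data")
    delivered.write_bytes(b"index data")
    intact = copy_is_valid(built, delivered)
    delivered.write_bytes(b"index dat")  # simulate a truncated transfer
    truncated_ok = copy_is_valid(built, delivered)
```

In a real deployment the two checksums would be computed on different hosts and compared before the new index is put into service.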
  21. Sphinx HA cluster, performance and availability
      ● Provide performance with headroom (spare bandwidth)
      ● What to monitor
        ● SHOW AGENT STATUS, node performance, disk space, I/O and CPU usage
        ● Errors, warnings, crashes
        ● Index synchronization, validity, freshness
  22. Sphinx HA cluster, distributed indexer
  23. Sphinx HA cluster, distributed indexer
      ● Automated
        ● Distributed indexing
        ● Index validation
        ● Index delivery
      ● Failover
      ● Centralised management of Sphinx index configuration
      ● Index rebalancing
  24. Resource consumption accounting
      ● io ops
      ● io size
      ● fetched_docs
      ● fetched_hits
      ● fetched_skips
      ● total_found
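
A minimal sketch of how such per-query counters could be accumulated per consumer is given below. The slide does not describe the accounting mechanism, so everything here is an assumption: the `account` helper, the consumer labels `app_a` and `app_b`, and the keys `io_ops` and `io_size`, which are placeholder names for the slide's "io ops" and "io size".

```python
from collections import Counter, defaultdict

# Counter names follow the slide; io_ops and io_size are placeholder keys.
COUNTERS = (
    "io_ops",
    "io_size",
    "fetched_docs",
    "fetched_hits",
    "fetched_skips",
    "total_found",
)


def account(usage, consumer, query_stats):
    """Add one query's counters to the running totals of a consumer."""
    for name in COUNTERS:
        usage[consumer][name] += int(query_stats.get(name, 0))


usage = defaultdict(Counter)
account(usage, "app_a", {"fetched_docs": 120, "fetched_hits": 300, "total_found": 40})
account(usage, "app_a", {"fetched_docs": 30, "fetched_hits": 50, "total_found": 5, "io_ops": 7})
account(usage, "app_b", {"fetched_docs": 10})
```

Totals kept this way show which consumer is responsible for most of the disk and index work, not just for most of the queries.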
  25. Rosette Linguistics Platform
      ● Used for analysis of unstructured text in CJK languages
      ● Better quality than using the ngram options
      ● Slow indexer performance
      http://www.basistech.com/text-analytics/rosette/
  26. Questions? vkrukov@ivinco.com
  27. Sphinx cluster