Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Side  by  Side  with
Elasticsearch &  Solr
Part  2  -­ Performance  &  Scalability
Radu  Gheorghe
Rafał  Kuć
Who  are  we?
Radu Rafał
Logsene
cftweia
Last  year
Last  year  conclusions
One  year  progress
Facet  by  function
https://issues.apache.org/jira/browse/SOLR-­1581
Analytics  component
https://issu...
That's  not  all
JSON  facets
https://issues.apache.org/jira/browse/SOLR-­7214
Backup  +  restore
https://issues.apache.or...
This year’s agenda
Horizontal scaling
Products use-­case
Logs use-­case
Horizontal scaling in  theory
Node
shard
Horizontal scaling  in  theory
Node
shard
Node
shard
Horizontal scaling  in  theory
Node
shard
Node
shard
Node
shard
Horizontal scaling  in  theory
Node
shard shard
shard shard
Node
shard shard
shard shard
Node
shard shard
shard shard
Horizontal scaling  in  theory
Node
shard shard
shard shard
replica
replica
replica
replica
Node
shard shard
shard shard
r...
Horizontal scaling  -­ the  API
Create /  remove replicas  on  the  fly  -­
Collections  API
Moving  shards  around the  c...
The  products  -­ assumptions
Steady  data  growth
Common,  known  
data  structure
Large  QPS
Spikes  in  traffic
Hardware  and  data
2  x  EC2  c3.large  instances
(2vCPU,  3.5GB  RAM,
2x16GB  SSD  in  RAID0)
vs
Wikipedia
Test  requests
One,  common query
https://github.com/sematext/berlin-­buzzwords-­samples/tree/master/2015
Dictionary  of  ...
Product  search  @  5 threads
The  products  -­ summary
prepare
for  high
QPS
plan  for  
data  
growth
Both are fast,  but  with  wikipedia
<
The  products  -­ summary
prepare
for  high
QPS
plan  for  
data  
growth
Both are fast,  but  with  wikipedia
<
Hardware  and  data (2nd  try)
2  x  EC2  c3.large  instances
(2vCPU,  3.5GB  RAM,
2x16GB  SSD  in  RAID0)
vs
Video
search...
Real product search  @  20 threads
Real product search  @  20 threads
The  products  – real summary
prepare
for  high
QPS
plan  for  
data  
growth
With  video  both are fast and  
configurati...
The  logs  -­ assumptions
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs Logs
L...
Hardware  and  data
2  x  EC2  c3.large  instances
(2vCPU,  3.5GB  RAM,
2x16GB  SSD  in  RAID0)
vs
Logs
Logs
Logs
Logs
Log...
Tuning
refresh_interval:  5s
doc_values:  truedocValues:  true
soft  autocommit: 5s
catch  all  field:  on _all:  enabled
...
Test  requests
Filters Aggregations/Facets
filter  by  client    IP date  histogram
filter  by  word  in  user  
agent
top...
Test  runs
1. Write  throughput
2. Capacity of  a  single  index
3. Capacity with  time-­based indices on  
hot/cold setup
Single  index  write  throughput
Single  index  @  max EPS
Single  index  @  400 EPS
Time-­based indices
smaller  indices
lighter  indexing
easier  to  isolate  hot  data  from  cold  data
easier  to  reloca...
Hot  /  Cold in  practice
cron job does everything
creates indices in  advance,  on  the  hot  
nodes (createNodeSet prope...
Time-­based:  2  hot and  2  cold
Before
AfterAfter
Before
Time-­based:  2  hot and  2  cold
The  logs  -­ summary
Faceting
use  time  
based  indices
use  hot  /  cold  
nodes  policy
Filtering
One  summary to  rule them all
Differences in  configuration often matter
more than differences in  products
Before the  t...
We  are  hiring
Dig  Search?
Dig  Analytics?
Dig  Big  Data?
Dig  Performance?
Dig  Logging?
Dig  working  with  and  in  ...
Call  me,  maybe?
Radu Gheorghe
@radu0gheorghe
radu.gheorghe@sematext.com
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Semat...
Upcoming SlideShare
Loading in …5
×

Side by Side with Elasticsearch & Solr, Part 2

16,601 views

Published on

Elasticsearch & Solr side by side comparison with focus on performance and scalability.

Side by Side with Elasticsearch & Solr, Part 2

  1. 1. Side  by  Side  with Elasticsearch &  Solr Part  2  -­ Performance  &  Scalability Radu  Gheorghe Rafał  Kuć
  2. 2. Who  are  we? Radu Rafał Logsene cftweia
  3. 3. Last  year
  4. 4. Last  year  conclusions
  5. 5. One  year  progress Facet  by  function https://issues.apache.org/jira/browse/SOLR-­1581 Analytics  component https://issues.apache.org/jira/browse/SOLR-­5302 Solr  as  standalone  application https://issues.apache.org/jira/browse/SOLR-­4792 top_hits  aggregation https://github.com/elastic/elasticsearch/pull/6124 minimum_should_match on  has_child https://github.com/elastic/elasticsearch/pull/6019 filters  aggregation https://github.com/elastic/elasticsearch/pull/6118
  6. 6. That's  not  all JSON  facets https://issues.apache.org/jira/browse/SOLR-­7214 Backup  +  restore https://issues.apache.org/jira/browse/SOLR-­5750 Streaming  aggregations https://issues.apache.org/jira/browse/SOLR-­7082 Moving  averages  aggregation https://github.com/elastic/elasticsearch/pull/10002 Computation  on  aggregations https://github.com/elastic/elasticsearch/pull/9876 Cluster  state  diff  support https://github.com/elastic/elasticsearch/pull/6295 Cross  data  center  replication https://issues.apache.org/jira/browse/SOLR-­6273 Shadow  replicas https://github.com/elastic/elasticsearch/pull/8976
  7. 7. This year’s agenda Horizontal scaling Products use-­case Logs use-­case
  8. 8. Horizontal scaling in  theory Node shard
  9. 9. Horizontal scaling  in  theory Node shard Node shard
  10. 10. Horizontal scaling  in  theory Node shard Node shard Node shard
  11. 11. Horizontal scaling  in  theory Node shard shard shard shard Node shard shard shard shard Node shard shard shard shard
  12. 12. Horizontal scaling  in  theory Node shard shard shard shard replica replica replica replica Node shard shard shard shard replica replica replica replica Node shard shard shard shard replica replica replica replica
  13. 13. Horizontal scaling  -­ the  API Create /  remove replicas  on  the  fly  -­ Collections  API Moving  shards  around the  cluster  using   add  /  delete  replica Create /  remove replicas  on  the  fly  -­ Update  Indices  Settings Moving  shards  around the  cluster  using   Cluster  Reroute  API Shard  splitting using   Collections  API Automatic  shard  balancing by  default Migrating  data with  a  given  routing  key to  another  collection  using  API Shard  allocation  awareness &  rule   based  shard  placement
  14. 14. The  products  -­ assumptions Steady  data  growth Common,  known   data  structure Large  QPS Spikes  in  traffic
  15. 15. Hardware  and  data 2  x  EC2  c3.large  instances (2vCPU,  3.5GB  RAM, 2x16GB  SSD  in  RAID0) vs Wikipedia
  16. 16. Test  requests One,  common query https://github.com/sematext/berlin-­buzzwords-­samples/tree/master/2015 Dictionary  of  common and   uncommon terms JMeter hammering
  17. 17. Product  search  @  5 threads
  18. 18. The  products  -­ summary prepare for  high QPS plan  for   data   growth Both are fast,  but  with  wikipedia <
  19. 19. The  products  -­ summary prepare for  high QPS plan  for   data   growth Both are fast,  but  with  wikipedia <
  20. 20. Hardware  and  data (2nd  try) 2  x  EC2  c3.large  instances (2vCPU,  3.5GB  RAM, 2x16GB  SSD  in  RAID0) vs Video search video video video video video video video video
  21. 21. Real product search  @  20 threads
  22. 22. Real product search  @  20 threads
  23. 23. The  products  – real summary prepare for  high QPS plan  for   data   growth With  video  both are fast and   configuration matters
  24. 24. The  logs  -­ assumptions Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Lots  of  data No  updates Low  query  count Time  oriented   data
  25. 25. Hardware  and  data 2  x  EC2  c3.large  instances (2vCPU,  3.5GB  RAM, 2x16GB  SSD  in  RAID0) vs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Logs Apache  logs
  26. 26. Tuning refresh_interval:  5s doc_values:  truedocValues:  true soft  autocommit: 5s catch  all  field:  on _all:  enabled hard autocommit:  200mb flush_threshold_size:  200mb
  27. 27. Test  requests Filters Aggregations/Facets filter  by  client    IP date  histogram filter  by  word  in  user   agent top  10  response  codes wildcard  filter  on  domain #  of  unique  IPs top  IPs  per  response  per  time https://github.com/sematext/berlin-­buzzwords-­samples/tree/master/2015
  28. 28. Test  runs 1. Write  throughput 2. Capacity of  a  single  index 3. Capacity with  time-­based indices on   hot/cold setup
  29. 29. Single  index  write  throughput
  30. 30. Single  index  @  max EPS
  31. 31. Single  index  @  400 EPS
  32. 32. Time-­based indices smaller  indices lighter  indexing easier  to  isolate  hot  data  from  cold  data easier  to  relocate bigger  indices less  RAM less  management  overhead smaller  cluster  state
  33. 33. Hot  /  Cold in  practice cron job does everything creates indices in  advance,  on  the  hot   nodes (createNodeSet property) Uses node properties (e.g.  node.tag) index templates +  shard allocation awareness =  new indices go  to  hot   nodes automagically Optimizes old indices,  creates N  new replicas of  each shard on  the  cold nodes Removes all the  replicas from  the  hot   nodes Cron job optimizes old indices and   changes shard allocation attributes =>   shards get moved and  Do on  the  cold nodes
  34. 34. Time-­based:  2  hot and  2  cold Before AfterAfter Before
  35. 35. Time-­based:  2  hot and  2  cold
  36. 36. The  logs  -­ summary Faceting use  time   based  indices use  hot  /  cold   nodes  policy Filtering
  37. 37. One  summary to  rule them all Differences in  configuration often matter more than differences in  products Before the  tests we  expected results to  be   totally opposite in  both use cases Do  your own tests with  your data and   queries before jumping  into conclusions
  38. 38. We  are  hiring Dig  Search? Dig  Analytics? Dig  Big  Data? Dig  Performance? Dig  Logging? Dig  working  with  and  in  open  – source? We’re  hiring  world  – wide! http://sematext.com/about/jobs.html
  39. 39. Call  me,  maybe? Radu Gheorghe @radu0gheorghe radu.gheorghe@sematext.com Rafał Kuć @kucrafal rafal.kuc@sematext.com Sematext @sematext http://sematext.com

×