Held at Apache Lucene Eurocon in Barcelona in October 2011

Refactoring a Solr based API application

  1. Architectural lessons learned from refactoring a Solr-based API application. Torsten Bøgh Köster (Shopping24), Apache Lucene Eurocon, 19.10.2011
  2. Contents: Shopping24 and its API; technical scaling solutions (sharding, caching, Solr cores, "elastic" infrastructure); business requirements as the key factor
  3. @tboeghk: software and systems architect, 2 years of experience with Solr, 3 years with Lucene, team of 7 Java developers, currently at Shopping24
  4. shopping24 internet group
  5. 1 portal became n portals
  6. 30 partner shops became 700
  7. 500k to 7m documents
  8. Index facts at the time: 16 GB of data, single-core layout, up to 17s response time, limited by machine size, stalled at Solr version 1.4, API designed for small tools
  9. Scaling goal: 15-50m documents
  10. Ask the nerds: "Shard!" That'll be fun! "Use spare compute cores at Amazon?" Breathe load into the cloud. "Reduce that index size." "Get rid of those long-running queries!"
  11. Data sharding ...
  12. ... is highly effective. [chart: response time vs. 1-20 concurrent requests, dropping from ~500ms to ~125ms as the index is split across 1, 2, 3, 4, 6, and 8 shards]
  13. Sharding: size matters. The bigger your index, the more complex your queries, and the more concurrent requests you serve, the more sharding you need.
  14. But wait ...
  15. Why do we have such a big index?
  16. 7m documents vs. 2m active products
  17. Fashion product lifecycle meets SEO (Bastografie / photocase.com)
  18. Separation of duties! Remove unsearchable data from your index.
  19. Why do we have complex queries?
  20. A Solr index designed for 1 portal ...
  21. ... grown into a multi-portal index
  22. Let "sharding" follow your data ...
  23. ... and build separate cores for every client.
  24. Duplicate data as long as access is fast. (andybahn / photocase.com)
  25. Streamline your index provisioning process.
  26. A thousand splendid cores at your fingertips.
  27. Throwing hardware at problems. Automated.
  28. Evil traps: latency, $$
  29. Mirror your complete system to solve load balancer problems. (froodmat / photocase.com)
  30. I said faster!
  31. Use a cache layer like Varnish.
  32. What about those complex queries? Why do we have them? And how do we get rid of them?
  33. Lost in encapsulation: the Solr API exposed to the world.
  34. What's the key factor?
  35. Look at your business requirements.
  36. Decrease complexity.
  37. Questions? Comments? Ideas? Twitter: @tboeghk, GitHub: @tboeghk, Email: torsten.koester@s24.com, Web: http://www.s24.com. Images: sxc.hu (unless noted otherwise)
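Slides 22-23 argue that "sharding" should follow the data, so that each client's queries hit only that client's documents. A minimal sketch of that routing idea, assuming a hypothetical `ShardRouter` helper with a fixed shard count (the class, portal names, and shard count are illustrative, not from the talk):

```java
// Hypothetical sketch: pin each client portal to one shard by hashing
// its id, so indexing and querying for a portal touch a single core.
public class ShardRouter {

    // numShards is assumed fixed up front; changing it means re-indexing.
    public static int shardFor(String clientId, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(clientId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        for (String portal : new String[] {"portal-a", "portal-b", "portal-c"}) {
            System.out.println(portal + " -> shard " + shardFor(portal, 4));
        }
    }
}
```

The point of the slide is exactly this determinism: because a client always maps to the same shard, cross-shard fan-out queries disappear for per-client traffic.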
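Slide 23's "separate cores for every client" maps onto Solr's multi-core configuration. A sketch of a legacy `solr.xml` declaring two per-portal cores, in the pre-SolrCloud format that matched the Solr 1.4-3.x era of the talk (core names and paths are placeholders):

```
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per client portal; each carries its own config and schema -->
    <core name="portal-a" instanceDir="cores/portal-a" />
    <core name="portal-b" instanceDir="cores/portal-b" />
  </cores>
</solr>
```

With `adminPath` enabled, cores can also be created and swapped at runtime via the CoreAdmin API, which is what makes the "thousand splendid cores" provisioning pipeline of slides 25-26 automatable.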
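Slide 31 recommends putting a cache layer such as Varnish in front of the API. A minimal VCL sketch in the Varnish 2/3 syntax current at the time of the talk; the five-minute TTL is an illustrative choice, not a value from the talk:

```
sub vcl_fetch {
  # cache every backend response for five minutes,
  # overriding whatever Cache-Control headers the API sends
  set beresp.ttl = 300s;
  return (deliver);
}
```

Because search traffic is dominated by repeated queries (category pages, SEO landing pages), even a short TTL can absorb a large share of load before it ever reaches Solr.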
