
AOEcon17: Searchperience - The journey from PHP and Solr to Scala and Elasticsearch



Timo Litzius and Mick Klapper take you on a journey through Searchperience, AOE's e-commerce search solution. In their AOEconf17 talk, Timo and Mick show how it all started, how it progressed, and where we are now. Over the last three years the Searchperience project has gone through numerous changes to the team, our stack, and the whole infrastructure, as well as to performance and automation. The move to Scala and Elasticsearch has had the biggest impact so far. Mick and Timo share several lessons the team learned along the way.

Published in: Software


  1. 1. Agenda • What is Searchperience? • How did it start? • Changing the stack • Overall summary / Comparison
  2. 2. What is Searchperience? • Officially: AOE’s Enterprise search solution • Simply explained: „We’re Google for your website“
  3. 3. Clients
  4. 4. Our components in the beginning (diagram labels) • Server • Indexer • TYPO3 Frontend • Solr • Javascript • Cockpit
  5. 5. Terminologies we use • Document: Refers to a searchable item stored in the search engine. • Facet: A UI element to apply filters to a search, such as „colors“ or „sizes“.
  6. 6. How did it start?
  7. 7. History (timeline) • 2010 • 2012 • 2013 • 2014
  8. 8. What did we use? • Self hosted, Cloud hosted • Partly automated using Puppet or Chef • PHP implementation of all our components • TYPO3, Symfony • Apache Solr (master/slave)
  9. 9. Challenges • In the end we were facing performance issues with our TYPO3 implementation • Heavy TYPO3 bootstrapping • Template rendering • The update from TYPO3 4.5 to 6.2 also negatively affected performance • We introduced complex Varnish caching to gain performance
  10. 10. Changing the stack from PHP and Solr to Scala and Elasticsearch
  11. 11. Reasons • Decouple us from TYPO3 • Frontend performance needed to be increased • Upgrading Solr would also require a huge effort • Switching to Elasticsearch • Easier scaling • Modern and flexible API • Improved configuration management
  12. 12. Stockmann • Founded in 1862 • The biggest department store chain in Finland • Over 14,000 employees • €1.8 billion in revenue (2014) • Biannual „Crazy days“ (special campaign running 2 weeks)
  13. 13. Technical challenges • Custom sorting • Product variants • Elasticsearch 2 -> Elasticsearch 5: Suggestions
  14. 14. Custom Sorting • Size Facet • Brand Facet (Finnish sorting) • Custom menu sorting
  15. 15. Custom Sorting • Of all the requirements, we thought Finnish ordering would be the easiest • It wasn’t, because Elasticsearch output weird characters after we applied the „Finnish sorting“ filter • Real custom sorting is not possible within Elasticsearch • Solution: We provide that kind of sorting in the Frontend
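As an illustration of sorting facet values in the frontend rather than in the search engine, here is a minimal Python sketch. It is not the team's implementation: the function names, the simplified Finnish alphabet (a to z followed by å, ä, ö), the size order, and the sample values are all assumptions made for this example.

```python
# Hypothetical frontend-side sort helpers; all names are illustrative only.

# Simplified Finnish alphabetical order: a-z followed by å, ä, ö (assumption).
FINNISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
FINNISH_RANK = {ch: i for i, ch in enumerate(FINNISH_ALPHABET)}

# Assumed business order for clothing sizes.
SIZE_ORDER = ["XS", "S", "M", "L", "XL", "XXL"]
SIZE_RANK = {size: i for i, size in enumerate(SIZE_ORDER)}


def finnish_key(value: str) -> list[int]:
    """Map each character to its rank; unknown characters sort last."""
    return [FINNISH_RANK.get(ch, len(FINNISH_ALPHABET)) for ch in value.lower()]


def sort_brand_facet(brands: list[str]) -> list[str]:
    """Sort brand facet values by the simplified Finnish alphabet."""
    return sorted(brands, key=finnish_key)


def sort_size_facet(sizes: list[str]) -> list[str]:
    """Sort sizes by business order; unknown sizes go after the known ones."""
    return sorted(sizes, key=lambda s: SIZE_RANK.get(s, len(SIZE_ORDER)))


print(sort_brand_facet(["Öljy", "Zeta", "Alfa", "Åke", "Äiti"]))
print(sort_size_facet(["XL", "S", "L", "M"]))
```

A plain code-point sort would put „Zeta“ after „Åke“ only by accident of encoding and would order sizes alphabetically, which is why an explicit rank table is used for both facets.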
  16. 16. Product variants
  17. 17. A customer walks into a store to buy a shirt
  18. 18. I want a shirt! It should be • color: red • size: L • Shirts shown: S, XL, L • What we expected to get: the only shirt where both criteria match (it’s red and large) • What we actually got: more than that, because one criterion was enough
  19. 19. Product variants • Variants were not treated as separate documents • The fields of the variants (like color) were flattened into one big array, destroying all associations • Luckily Elasticsearch supports treating these as separate documents internally • These are called nested documents • Diagram: a Product with the variants color: blue, color: red, color: green becomes variants.color: [blue, red, green]
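The false match described on this slide can be reproduced in a few lines. The following Python sketch simulates both behaviours in memory and also shows what a nested mapping and nested query look like in Elasticsearch's JSON DSL. The product data and the field names `variants`, `color`, and `size` are assumptions for illustration, not the project's actual schema.

```python
# A product with three variants; the data is made up for illustration.
product = {
    "name": "shirt",
    "variants": [
        {"color": "red", "size": "S"},
        {"color": "blue", "size": "L"},
        {"color": "green", "size": "XL"},
    ],
}


def flatten(doc: dict) -> dict:
    """Mimic flattening: one value list per variant field, associations lost."""
    flat: dict[str, list[str]] = {}
    for variant in doc["variants"]:
        for field, value in variant.items():
            flat.setdefault(f"variants.{field}", []).append(value)
    return flat


def matches_flattened(doc: dict, color: str, size: str) -> bool:
    """Each criterion is checked against its own list, independently."""
    flat = flatten(doc)
    return color in flat["variants.color"] and size in flat["variants.size"]


def matches_nested(doc: dict, color: str, size: str) -> bool:
    """Nested semantics: both criteria must hold on the same variant."""
    return any(v["color"] == color and v["size"] == size for v in doc["variants"])


# In Elasticsearch, nested behaviour is requested in the mapping and the query.
nested_mapping = {"properties": {"variants": {"type": "nested"}}}
nested_query = {
    "nested": {
        "path": "variants",
        "query": {"bool": {"filter": [
            {"term": {"variants.color": "red"}},
            {"term": {"variants.size": "L"}},
        ]}},
    }
}

print(matches_flattened(product, "red", "L"))  # True: a false positive
print(matches_nested(product, "red", "L"))     # False: no red shirt in L
```

The flattened check succeeds because „red“ and „L“ each appear somewhere in the product, even though no single variant is both.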
  20. 20. Issues • We did not notice this misbehavior in the beginning • We had to rebuild our whole filter logic • Increased complexity • Filter combinations of the main product and a variant are difficult • Filters on nested documents have their own scope • Edge cases where this becomes a problem
  21. 21. But this was only one part • We only want to show variants (colors) which match our current filter criteria • Only these colors are available for the selected size
  22. 22. But this was only one part • Problem: Elasticsearch always returns the whole document even if a filter on variants is applied • Solution: Filter matching variants during rendering
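Filtering the matching variants at render time could look roughly like the Python sketch below. The function name `visible_variants`, the shape of the search hit, and the filter dictionary are hypothetical; the slides do not show how the frontend actually does this.

```python
def visible_variants(doc: dict, filters: dict[str, str]) -> list[dict]:
    """Keep only the variants that satisfy every active variant filter."""
    return [
        variant
        for variant in doc["variants"]
        if all(variant.get(field) == value for field, value in filters.items())
    ]


# Made-up search hit: the search engine returned the whole product document.
hit = {
    "name": "shirt",
    "variants": [
        {"color": "red", "size": "L"},
        {"color": "blue", "size": "L"},
        {"color": "green", "size": "S"},
    ],
}

# Only these colors should be rendered for the selected size.
colors_for_size_l = [v["color"] for v in visible_variants(hit, {"size": "L"})]
print(colors_for_size_l)  # ['red', 'blue']
```

With no active filters the function returns every variant, so the same rendering path works for filtered and unfiltered result pages.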
  23. 23. Suggestions • Based on what you typed, we suggest words you might want to search for; these are known as suggestions
  24. 24. a suggestion in the wild
  25. 25. We once upgraded Elasticsearch from 2 to 5 • One remaining test needed to be fixed
  26. 26. Suggestion implementation • Let’s get suggestions for „Adi“ • Elasticsearch 2: Adidas, Adidas Originals, Adidas Performance • Elasticsearch 5: Adidas, Adidas, Adidas
  27. 27. „Suggestions are document based now“ • This means we get all matching documents back, with the suggestion attached to each returned document, which leads to duplicate completions. • Suggestion result in Elasticsearch 2: Adidas, Adidas Originals, Adidas Performance • Suggestion result in Elasticsearch 5: Adidas, Adidas, Adidas
  28. 28. Solution • Maintain suggestions in a separate index • Generate suggestions during indexing pipeline processing • Managing de-duplication • We process documents continuously and never know which suggestions are already in the index without checking, which is expensive • -> Hash the suggestion text and use it as an ID in Elasticsearch • No, it was not easy!
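The hash-as-ID idea can be sketched as follows. This Python example uses an in-memory dictionary as a stand-in for the separate suggestions index; the choice of SHA-1, the normalization step (lowercasing and collapsing whitespace), and the class and field names are assumptions, since the slides only state that the suggestion text is hashed and used as the ID.

```python
import hashlib


def suggestion_id(text: str) -> str:
    """Derive a stable document ID from the normalized suggestion text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SuggestionIndex:
    """In-memory stand-in for a separate suggestions index (hypothetical)."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def index(self, text: str) -> None:
        # Writing under the same ID overwrites instead of duplicating,
        # so no lookup is needed before indexing.
        self.docs[suggestion_id(text)] = {"suggest": text}


index = SuggestionIndex()
for brand in ["Adidas", "Adidas Originals", "Adidas", "Adidas Performance", "Adidas"]:
    index.index(brand)

print(sorted(doc["suggest"] for doc in index.docs.values()))
```

Because the ID is derived from the text itself, indexing the same suggestion again is idempotent, which is what removes the need for the expensive existence check.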
  29. 29. What we gained from Scala + Play • Scala’s functional style allowed us to solve hard problems much more simply • Scala is compiled • Twirl templates as well • Many errors are caught quite early • Much higher performance
  30. 30. What we gained from Elasticsearch • More readable queries and a nicer API • Easier development and configuration • Less complex index and schema management • Results in increased development speed • Parallel searches • Cheap searches to cover certain business logic • Redirect to brand page if brand name matches • Runs embedded in our Frontend application for automated tests
  31. 31. Lessons learned • Developing for one use case and generalizing later helped us to… • … get one project finished much quicker • … find the right abstractions later • … focus on relevant things
  32. 32. OM3 Facts • Search is the single source of truth for many more systems • Frontend Pipeline • Transforms Elasticsearch documents to match business requirements • Self-contained environments for development and testing • User driven search result ranking to improve conversion rate
  33. 33. Infrastructure (diagram; the tools were shown as logos) • building • develop with • containers in • deploying to • provisioned on • via • pushed to
  34. 34. Self-contained environments (diagram labels) • Kubernetes Cluster • Frontend • Indexer • Tracker • SQS (Indexer) • S3 (Cockpit) • Cockpit • API-Console • Elasticsearch • MySQL (Indexer) • MySQL (Cockpit) • Redis (Indexer) • Frontend Integration Demo • S3 (Tracker) • Recommender • Recommendation processing • User Manual
  35. 35. Overall summary
  36. 36. Business value • Single source of truth • Consistent data state over all systems • Influence search ranking • Provide individual business logic • Fast and easy to scale • Omnichannel • web, mobile, third party applications • Realtime search results without any cache in between
  37. 37. Technologies back then
  38. 38. Technologies now
  39. 39. Our components when we started (diagram labels) • Server • Indexer • TYPO3 Frontend • Solr • Javascript • Cockpit
  40. 40. Our components: second generation (diagram labels) • Server (×4) • Indexer • TYPO3 Frontend • Solr • Javascript • Cockpit • Widgets
  41. 41. Our components: third generation (diagram labels) • Server (×4) • Indexer • Scala Frontend • Elasticsearch • Javascript • Cockpit • Widgets
  42. 42. Our components: fourth generation (diagram labels) • Container (×3) • Container/Server • Indexer • Scala Frontend • Elasticsearch • Javascript • Cockpit • Widgets • Container: Recommender • Container: Tracking • Container: Recommendation processing • Tracking Javascript
  43. 43. Performance improvements • Chart 1: reqs/sec, TYPO3 vs. Scala (bar values 320 and 30, axis 0 to 400) • Chart 2: mean response time, TYPO3 vs. Scala (bar values 60 and 478, axis 0 to 500) • TYPO3: Search for random search terms, Varnish enabled, APC cache enabled • Scala: Search for random terms, no optimizations done, no cache at all!
  44. 44. The core team
  45. 45. More than just a search