Implementing Search with Solr at 7digital

Presented by James Atherton, Search Team Lead, 7digital

A usage/case study describing our journey as we implemented Lucene/Solr, the lessons we learned along the way, and where we hope to go in the future. It covers how we implemented our instant search/search suggest, and how we handle indexing 400 million tracks and metadata for over 40 countries, comprising over 300GB of data and about 70GB of indexes.

  1. Implementing Search with Solr at 7digital
     James Atherton
     Content Discovery Team Lead
  2. Implementing Search with Solr
     James Atherton
     Content Discovery Lead
     @mr_road
  3. Who is 7digital?
     Online digital content provider
     Covering over 47 territories
     Online music store: www.7digital.com
     API: api.7digital.com
     We power a number of music services: Samsung, Blackberry, Turntable.fm, Pure
  4. Where we came from...
     SQL searches:
     SELECT * FROM <table> WHERE name LIKE <search_term>%;
     This was SLOW and BAD!!
  5. Wrapped Solr in an API
  6. Old Architecture
     (diagram labels: API, DB)
  7. Domain Objects
     Artist Documents
     Release Documents (e.g. album or single)
     Track Documents
  8. First Attempt - 2011
     • Artists and Releases
     • Solr 1.4
     • 17 stores
     • ~40GB
     • Dropped DIH as it had issues
  9. 2011 Architecture
     (diagram labels: HTTP, API, Search API, Solr, DB, Solr, Tracks, Artists, Releases)
  10. 2012
     • Added Tracks Core
     • Solr 3.5
     • 47 stores
     • ~400GB
     • More than 430 M docs
     • Didn't revisit DIH
  11. Current Architecture
     (diagram labels: HTTP, API, Search API, Artist/Release Solrs, Track Solrs, Track Solrs, Track Solrs, Track Solrs, Artist/Release Solrs)
  12. Things Learnt
     We should have split by <X>; for us Shops.
  13. Beware Inflection Points
     Data size: 400GB != 40GB * 10
     Throughput: 600 rpm IS NOT 4 * 150 rpm
  14. What we want in our servers?
     RAM? Fast Disks? CPUs? Virtual? Bare Metal?
  15. Optimize really...?
  16. Cache Warming/First search?
  17. Testing
     Test ingestion/data import, then test again
     Your data is not as clean as you think
     Load test early and often
     We need to be better at this still
  18. Logs
     Logging is worth its weight in gold
     But don't get weighed down
  19. Monitoring
     We use statsd/graphite and NewRelic:
  20. Visualise Indexing
     Which territory's data has been indexed?
  21. Instant Search
  22. Magic Deploys
     We recently adopted CFEngine, it is awesome!!
  23. The Future
     (diagram labels: HTTP, API, Search API, Artist Solrs, Track Solrs, Track Solrs, Track Solrs, Release Solrs, Track Solrs)
     Solr Cloud, in the Cloud??
  24. Questions?
  25. Resources
     https://github.com/etsy/statsd/
     https://github.com/7digital
     http://d3js.org/
  26. James Atherton
     @mr_road
     @7digital
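
The Monitoring slide mentions statsd/graphite but the transcript does not show how metrics are emitted. As a rough sketch only, and not 7digital's actual code, the following shows how a search wrapper could report a request counter and a latency timer using the plain-text statsd datagram format (`<name>:<value>|<type>`) over UDP. The metric names `search.requests` and `search.latency`, the helper names, and the host are assumptions made for illustration; port 8125 is the usual statsd default.

```python
import socket
import time


def format_metric(name, value, metric_type):
    """Build a statsd datagram of the form <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget a datagram to a statsd daemon over UDP."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(packet, (host, port))
        finally:
            sock.close()
    except OSError:
        # Monitoring should never break a search request, so errors are dropped.
        pass


def timed_search(search_fn, query):
    """Run search_fn(query) and report a counter and a timer (hypothetical names)."""
    start = time.monotonic()
    try:
        return search_fn(query)
    finally:
        elapsed_ms = int((time.monotonic() - start) * 1000)
        send_metric(format_metric("search.requests", 1, "c"))          # counter
        send_metric(format_metric("search.latency", elapsed_ms, "ms"))  # timer
```

Any callable that takes a query can be wrapped this way, for example `timed_search(lambda q: q.upper(), "abba")`. Because UDP is connectionless, the wrapper adds very little latency and keeps working when no statsd daemon is listening.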