2013 11-07 lsr-dublin_m_hausenblas_when solr is best


Presented by Michael Hausenblas, Chief Data Engineer, MapR Technologies

This session will present an overview of common big data use cases in the form of a set of questions that help determine what kind of problem you really have. From the answers to these questions, you can quickly find out which technologies are likely to be most productive, useful and easy to apply. This analysis will also allow you to discern cases where Solr alone is not a good fit, but where augmenting it with other big data systems such as HBase leads to feasible architectures. Conversely, you will see cases where Solr can be the hero by filling gaps where big data systems alone are destined to fail.

Published in: Technology

  1. USE CASE DIAGNOSIS: WHEN IS SOLR REALLY THE BEST TOOL?
       Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies
       Twitter: @mhausenblas
  2. Agenda
       • Solr in the Big Data ecosystem
       • Polyglot Persistence
       • Common (Big Data) use cases
       • A checklist
       • When not to use Solr …
  3. [Diagram; recoverable labels: processing, storage, Apache Pig, Apache ZooKeeper]
  4. Polyglot Persistence
  5. Tool box / one-size-fits-all. Shell examples shown on the slide:
       $ ls -al
       $ tail -f some.log
       $ nc localhost 80
       $ awk 'BEGIN { FS = "," } /2013-[[:digit:]]+-[[:digit:]]+/ { print $3 }' sample.csv
  6. Polyglot Persistence: Backdrop
       • Michael Stonebraker and Ugur Çetintemel (2005): "One Size Fits All": An Idea Whose Time Has Come and Gone
       • Martin Fowler (2011): Polyglot Persistence [1]
       • Eric Brewer (2012): Ricon Keynote, Advancing Distributed Systems [2]
       [1] http://martinfowler.com/bliki/PolyglotPersistence.html
       [2] https://speakerdeck.com/eric_brewer/ricon-2012-keynote
  7. Polyglot Persistence: Key Points
       • Use different datastores for different needs
       • Can apply within an application or cross-enterprise
       • Encapsulating data access yields loosely coupled components
       • Find the sweet spot between dev/op complexity and flexibility
  8. Common (Big Data) use cases
  9. Where are we coming from?
       • Keyword search
       • Spellcheck & autosuggest
       • Ranking
       • Faceted search
       • Spatial search
  10. Use case: search-based recommendation
  11. Search-based recommendation (credit card issuer)
       • Given
           – customer purchase history
           – merchant designations
           – merchant special offers
       • Goal
           – Improve existing recommender system
           – Throughput important
  12. Analyze with MapReduce
       [Diagram labels: complete history; co-occurrence (Mahout); item meta-data; Solr indexers; indexing; index shards]
  13. Deploy with search system
       [Diagram labels: user history; web tier; item meta-data; Solr indexers; search; index shards]
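The flow on slides 11 to 13 (analyze purchase histories offline, index the result together with item meta-data, then query with a user's history) can be sketched roughly as follows. This is a minimal illustration and not the speaker's implementation: it uses raw co-occurrence counts in plain Python where the slides name Mahout on MapReduce, and the field name `indicators`, the function names, and the sample merchant IDs are all hypothetical. It also assumes item IDs are simple tokens that need no query escaping.

```python
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlencode


def cooccurrence_indicators(histories, top_n=2):
    """For each item, list the top_n items that most often share a history with it.

    Raw counts are a simplification; ties are broken alphabetically.
    """
    counts = defaultdict(Counter)
    for history in histories:
        for a, b in combinations(sorted(set(history)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        item: [other for other, _ in
               sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, c in counts.items()
    }


def indicator_documents(indicators, metadata):
    """Merge indicators with item meta-data into documents ready for indexing."""
    docs = []
    for item_id in sorted(indicators):
        doc = {"id": item_id, "indicators": indicators[item_id]}
        doc.update(metadata.get(item_id, {}))
        docs.append(doc)
    return docs


def recommendation_query(user_history, rows=10):
    """Build query parameters that search the (hypothetical) indicators field
    with the user's own history as the query terms."""
    terms = " ".join(sorted(set(user_history)))
    return urlencode({"q": f"indicators:({terms})", "rows": rows})


histories = [
    ["coffee", "books"],
    ["coffee", "books", "fuel"],
    ["coffee", "fuel"],
]
docs = indicator_documents(
    cooccurrence_indicators(histories, top_n=1),
    {"books": {"name": "Book shop"}},
)
params = recommendation_query(["coffee", "books"])
```

The point of the design is that the expensive analysis happens offline, while serving a recommendation is an ordinary search request, which fits the throughput goal on slide 11.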
  14. Use case: log analysis
  15. Log analysis
       • Given
           – Receive 200,000+ log lines per second
       • Goal
           – Want to do multi-field search
           – Want to search on log lines with <30 second delay before search
  16. Data Ingestion and Indexing
       [Diagram labels: incoming data; Kafka; text analysis; Solr indexers; real-time; raw documents; older index shards; live index shard; time-sharded Solr indexes]
  17. Search
       [Diagram labels: query; Solr search; web tier; Solr indexers; search; raw documents; older index shards; live index shard]
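The time-sharded indexes on slides 16 and 17 suggest a simple routing rule: each log line goes to the shard covering its timestamp, and a search only needs the shards that overlap the requested time range. The sketch below illustrates that rule under stated assumptions: the one-hour shard width, the `logs_` naming scheme, and the function names are hypothetical and do not come from the slides, and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone

SHARD_WIDTH = timedelta(hours=1)  # assumed width; the slides do not state one
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_start(ts):
    """Floor a timezone-aware timestamp to the start of its shard window."""
    width = int(SHARD_WIDTH.total_seconds())
    seconds = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % width)


def shard_name(ts):
    """Name of the shard that should index a log line with this timestamp."""
    return "logs_" + shard_start(ts).strftime("%Y%m%d_%H%M")


def shards_for_range(start, end):
    """Names of all shards whose window overlaps [start, end]."""
    names = []
    current = shard_start(start)
    while current <= end:
        names.append(shard_name(current))
        current += SHARD_WIDTH
    return names


line_time = datetime(2013, 11, 7, 10, 42, tzinfo=timezone.utc)
live_shard = shard_name(line_time)
searched = shards_for_range(
    line_time, datetime(2013, 11, 7, 12, 5, tzinfo=timezone.utc)
)
```

Only the shard for the current window receives writes, so older shards can stay read-only. For scale: at 200,000 lines per second, a 30 second indexing delay corresponds to as many as 6,000,000 lines that are not yet searchable.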
  18. A checklist
  19. Questions you may want to ask …
       • What is the volume of your data* (few GB? up to PB?)
       • What are your query characteristics?
           – full scans
           – look-ups
           – multiple passes over large parts
           – continuous queries
       • What’s (more) important: throughput or latency?
       *) Note: as long as Moore's law still holds, these figures obviously change on a yearly if not monthly basis.
  20. Key qualifiers
       • Want exploratory interface rather than aggregates in a dashboard
       • Data are sparse symbol sets like words or recommendation indicators
       • Small-ish return sets are OK, especially if facets are good enough
       • Near-real-time is good enough
  21. When not to use Solr …
  22. Red Flags
       • You need strong consistency?
       • JOINS, anyone?
       • Want (complex) transactions?
       • OLTP, streaming (but: near-real-time)
       • Graphs?
       Remember: one size does not fit all; tool belt approach!
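The key qualifiers (slide 20) and red flags (slide 22) can be read as a two-sided checklist. The sketch below encodes them as plain data so that a set of requirements can be compared against both lists. The short labels and the `diagnose` function are hypothetical paraphrases of the slide bullets, and the function deliberately returns the matching items without a verdict, since the slides do not give a scoring rule.

```python
# Short labels paraphrasing the bullets on slides 20 and 22 (hypothetical names).
KEY_QUALIFIERS = (
    "exploratory interface",
    "sparse symbol data",
    "small result sets ok",
    "near-real-time ok",
)
RED_FLAGS = (
    "strong consistency",
    "joins",
    "complex transactions",
    "oltp",
    "streaming",
    "graphs",
)


def diagnose(requirements):
    """Split a collection of requirement labels into (qualifiers met, red flags raised).

    Raises ValueError for labels that appear on neither list, so typos do not
    silently vanish from the result.
    """
    requirements = set(requirements)
    unknown = requirements - set(KEY_QUALIFIERS) - set(RED_FLAGS)
    if unknown:
        raise ValueError(f"unknown requirement labels: {sorted(unknown)}")
    met = [q for q in KEY_QUALIFIERS if q in requirements]
    raised = [f for f in RED_FLAGS if f in requirements]
    return met, raised


met, raised = diagnose({"joins", "exploratory interface", "near-real-time ok"})
```

A raised red flag does not rule Solr out of the architecture. In the spirit of the tool belt approach, it points to pairing Solr with another datastore for that particular need.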
  23. Let’s stay in touch …
       • Twitter: @mhausenblas @MapR
       • We’re hiring!
       [Office labels: MapR HQ (San Jose, US); MapR UK; MapR Nordics; MapR DACH; MapR SE & Benelux; MapR Hyderabad; MapR Japan; MapR Korea]