2013 11-07 lsr-dublin_m_hausenblas_when solr is best

Presented by Michael Hausenblas, Chief Data Engineer, MapR Technologies

This session presents an overview of common big data use cases in the form of a set of questions that help determine what kind of problem you really have. From the answers to these questions, you can quickly find out which technologies are likely to be most productive, useful and easy to apply. This analysis also lets you discern cases where Solr is not a good fit, but where augmentation with other big data systems like HBase leads to feasible architectures. Conversely, you will see cases where Solr can be the hero by filling the gaps where big data systems alone are destined to fail.

Transcript

  • 1. USE CASE DIAGNOSIS: WHEN IS SOLR REALLY THE BEST TOOL? •  Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies •  Twitter: @mhausenblas
  • 2. Agenda •  Solr in the Big Data ecosystem •  Polyglot Persistence •  Common (Big Data) use cases •  A checklist •  When not to use Solr …
  • 3. [Diagram labels: processing; storage; Apache Pig; Apache Zookeeper]
  • 4. Polyglot Persistence
  • 5. Tool box versus one-size-fits-all •  $ ls -al •  $ tail -f some.log •  $ nc localhost 80 •  awk 'BEGIN { FS = "," } /2013-[[:digit:]]+-[[:digit:]]+/ { print $3 }' sample.csv
  • 6. Polyglot Persistence: Backdrop •  Michael Stonebraker and Ugur Çetintemel (2005): "One Size Fits All": An Idea Whose Time Has Come and Gone •  Martin Fowler (2011): Polyglot Persistence [1] •  Eric Brewer (2012): Ricon Keynote, Advancing Distributed Systems [2] •  [1] http://martinfowler.com/bliki/PolyglotPersistence.html •  [2] https://speakerdeck.com/eric_brewer/ricon-2012-keynote
  • 7. Polyglot Persistence—Key Points •  Use different datastores for different needs •  Can apply within an application or cross-enterprise •  Encapsulating data access yields loosely coupled components •  Find sweet spot between dev/op complexity and flexibility
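The point about encapsulating data access can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `KeyValueStore` and `SearchIndex` are in-memory stand-ins for real datastores (for example HBase and Solr), and `ProductRepository` is an assumed wrapper, not something shown in the talk.

```python
from typing import Dict, List


class KeyValueStore:
    """In-memory stand-in for a key-value datastore (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        self._rows[key] = row

    def get(self, key: str) -> dict:
        return self._rows[key]


class SearchIndex:
    """In-memory stand-in for a full-text index (hypothetical)."""

    def __init__(self) -> None:
        self._postings: Dict[str, set] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Naive whitespace tokenisation, lower-cased.
        for token in text.lower().split():
            self._postings.setdefault(token, set()).add(doc_id)

    def search(self, term: str) -> List[str]:
        return sorted(self._postings.get(term.lower(), set()))


class ProductRepository:
    """Single access point: callers never see which store answers."""

    def __init__(self, store: KeyValueStore, index: SearchIndex) -> None:
        self._store = store
        self._index = index

    def save(self, product_id: str, product: dict) -> None:
        self._store.put(product_id, product)            # system of record
        self._index.index(product_id, product["name"])  # searchable copy

    def by_id(self, product_id: str) -> dict:
        return self._store.get(product_id)              # look-up path

    def find(self, term: str) -> List[dict]:
        # Search yields ids; the full rows come from the key-value store.
        return [self._store.get(i) for i in self._index.search(term)]


repo = ProductRepository(KeyValueStore(), SearchIndex())
repo.save("p1", {"name": "Red running shoes"})
repo.save("p2", {"name": "Blue running jacket"})
print([p["name"] for p in repo.find("running")])
```

Because callers only depend on `ProductRepository`, either backing store could be swapped without touching application code, which is the loose coupling the slide refers to.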
  • 8. Common (Big Data) use cases
  • 9. Where are we coming from? •  Keyword search •  Spellcheck & autosuggest •  Ranking •  Faceted search •  Spatial search
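Three of the search features listed on slide 9 (keyword, faceted and spatial search) map onto standard Solr request parameters. The sketch below only builds the request URLs and sends nothing. The host, port, core name (`products`) and field names (`name`, `brand`, `category`, `store`) are assumptions made for illustration; they do not come from the talk.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def select_url(params: dict) -> str:
    """Build a Solr select URL; doseq allows repeated parameters."""
    return SOLR_SELECT + "?" + urlencode(params, doseq=True)


# Keyword search, ranked by relevance score (Solr's default ordering).
keyword = select_url({"q": "name:laptop", "rows": 10})

# Faceted search: value counts for two (hypothetical) fields.
faceted = select_url({
    "q": "name:laptop",
    "facet": "true",
    "facet.field": ["brand", "category"],
})

# Spatial search: documents within 5 km of a point, assuming the
# schema defines a location field called "store".
spatial = select_url({
    "q": "*:*",
    "fq": "{!geofilt}",
    "sfield": "store",
    "pt": "53.3498,-6.2603",
    "d": 5,
})

for url in (keyword, faceted, spatial):
    print(url)
```

Spellcheck and autosuggest are left out because they depend on how the request handlers are configured on the server.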
  • 10. Use case: search-based recommendation
  • 11. Search-based recommendation (credit card issuer) •  Given –  customer purchase history –  merchant designations –  merchant special offers •  Goal –  Improve existing recommender system –  Throughput important
  • 12. Analyze with MapReduce [Diagram labels: complete history; co-occurrence (Mahout); item meta-data; Solr indexing; index shards]
  • 13. Deploy with search system [Diagram labels: user history; Web tier; item meta-data; Solr search; index shards]
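The two stages of the recommendation use case (offline co-occurrence analysis, then an online search against the resulting indicators) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the purchase histories are invented, a plain count threshold stands in for the scoring Mahout would perform, and an in-memory overlap score stands in for a Solr query against an indicator field.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: customer -> merchants visited.
histories = {
    "alice": ["cafe", "bookshop", "cinema"],
    "bob": ["cafe", "bookshop"],
    "carol": ["bookshop", "cinema"],
    "dave": ["cafe", "gym"],
}

# Offline step: count how often two merchants share a customer.
cooccurrence = defaultdict(Counter)
for merchants in histories.values():
    for a, b in permutations(set(merchants), 2):
        cooccurrence[a][b] += 1

# Keep merchants that co-occur at least twice as "indicators".
# This is a plain count threshold, not Mahout's actual scoring.
MIN_COUNT = 2
indicator_docs = {
    merchant: {other for other, n in counts.items() if n >= MIN_COUNT}
    for merchant, counts in cooccurrence.items()
}


def recommend(user_history, docs, limit=3):
    """Online step: rank merchants by the overlap between the user's
    history and each merchant's indicator set (a stand-in for a search
    engine scoring a query against an indexed field)."""
    seen = set(user_history)
    scores = {
        merchant: len(indicators & seen)
        for merchant, indicators in docs.items()
        if merchant not in seen
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [m for m, score in ranked[:limit] if score > 0]


print(recommend(["cafe"], indicator_docs))
```

The design point is that the expensive analysis runs in batch, while serving a recommendation reduces to an ordinary search request whose query is the user's recent history.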
  • 14. Use case: log analysis
  • 15. Log analysis •  Given –  Receive 200,000+ log lines per second •  Goal –  Want to do multi-field search –  Want log lines to be searchable less than 30 seconds after they arrive
  • 16. Data Ingestion and Indexing [Diagram labels: incoming data; Kafka; text analysis; Solr indexer; real-time; raw documents; older index shards; live index shard; time-sharded Solr indexes]
  • 17. Search [Diagram labels: query; Web tier; Solr search; raw documents; older index shards; live index shard]
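The time-sharded layout on slides 16 and 17 (one live shard receiving new documents, older shards only queried) can be sketched as a routing function plus a shard selector. The hourly shard width and the `logs_YYYYMMDD_HH` naming scheme are assumptions for illustration; the talk does not state either.

```python
from datetime import datetime, timedelta, timezone

# Assumed shard width; shard_start() below rounds to the hour to match.
SHARD_SPAN = timedelta(hours=1)


def shard_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its hourly shard."""
    return ts.replace(minute=0, second=0, microsecond=0)


def shard_name(ts: datetime) -> str:
    """Hypothetical naming scheme: one index per hour."""
    return "logs_" + shard_start(ts).strftime("%Y%m%d_%H")


def shards_for_range(start: datetime, end: datetime) -> list:
    """List every shard a query over [start, end] has to search."""
    names = []
    cursor = shard_start(start)
    while cursor <= end:
        names.append(shard_name(cursor))
        cursor += SHARD_SPAN
    return names


now = datetime(2013, 11, 7, 14, 25, tzinfo=timezone.utc)
live = shard_name(now)                      # receives new log lines
wanted = shards_for_range(now - timedelta(hours=2), now)
older = [s for s in wanted if s != live]    # searched but no longer written

print("live shard:", live)
print("older shards:", older)
```

Only the live shard takes writes, so older shards can be kept read-only and a query touches just the shards that overlap its time window.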
  • 18. A checklist
  • 19. Questions you may want to ask … •  What is the volume of your data* (few GB? up to PB?) •  What are your query characteristics? –  full scans –  look-ups –  multiple passes over large parts –  continuous queries •  What’s (more) important: throughput or latency? *) Note: as long as Moore's law still holds, these figures obviously change on a yearly if not monthly basis.
  • 20. Key qualifiers •  Want exploratory interface rather than aggregates in a dashboard •  Data are sparse symbol sets like words or recommendation indicators •  Small-ish return sets are OK, especially if facets are good enough •  Near-real-time is good enough
  • 21. When not to use Solr …
  • 22. Red Flags •  You need strong consistency? •  JOINS, anyone? •  Want (complex) transactions? •  OLTP, streaming (but: near-real-time) •  Graphs? •  Remember: one size does not fit all. Tool belt approach!
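The key qualifiers (slide 20) and red flags (slide 22) can be kept side by side as a simple checklist. The sketch below is one possible encoding, not a decision procedure from the talk: the trait keys are hypothetical labels, the descriptions paraphrase the slides, and the function reports both lists without computing a verdict because the slides give no weighting.

```python
# Trait keys are hypothetical labels; descriptions paraphrase the slides.
KEY_QUALIFIERS = {
    "exploratory_ui": "exploratory interface rather than dashboard aggregates",
    "sparse_symbols": "data are sparse symbol sets (words, indicators)",
    "small_results": "small-ish return sets are OK",
    "near_real_time": "near-real-time is good enough",
}
RED_FLAGS = {
    "strong_consistency": "strong consistency required",
    "joins": "JOINs required",
    "transactions": "(complex) transactions required",
    "oltp_streaming": "OLTP or true streaming workload",
    "graphs": "graph-shaped data or queries",
}


def diagnose(traits):
    """Split a workload's traits into qualifiers and red flags.

    Deliberately returns both lists instead of a verdict: the slides
    give no weighting, so the final call stays with the reader.
    """
    traits = set(traits)
    unknown = traits - set(KEY_QUALIFIERS) - set(RED_FLAGS)
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    return {
        "qualifiers": sorted(traits & set(KEY_QUALIFIERS)),
        "red_flags": sorted(traits & set(RED_FLAGS)),
    }


report = diagnose({"exploratory_ui", "near_real_time", "joins"})
for kind, names in report.items():
    print(kind, "->", names)
```

A workload that shows both qualifiers and red flags, as in this usage example, is the kind of case where the talk suggests combining Solr with another datastore rather than relying on it alone.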
  • 23. Let’s stay in touch … •  Twitter: @mhausenblas @MapR •  We’re hiring! •  [Map labels: MapR HQ, San Jose, US; MapR UK; MapR Nordics; MapR DACH; MapR SE & Benelux; MapR Japan; MapR Korea; MapR Hyderabad]