2013 11-07 lsr-dublin_m_hausenblas_when solr is best

Presented by Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies

This session presents an overview of common big data use cases as a set of questions that help determine what kind of problem you really have. From the answers, you can quickly work out which technologies are likely to be most productive, useful, and easy to apply. The analysis also lets you identify cases where Solr alone is not a good fit, but where augmenting it with other big data systems such as HBase leads to feasible architectures. Conversely, you will see cases where Solr can be the hero by filling gaps that big data systems alone cannot close.

Usage Rights

© All Rights Reserved

2013 11-07 lsr-dublin_m_hausenblas_when solr is best Presentation Transcript

  • 1. USE CASE DIAGNOSIS: WHEN IS SOLR REALLY THE BEST TOOL?
    Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies
    Twitter: @mhausenblas
  • 2. Agenda
    • Solr in the Big Data ecosystem
    • Polyglot Persistence
    • Common (Big Data) use cases
    • A checklist
    • When not to use Solr …
  • 3. [Diagram; legible labels: processing, storage, Apache Pig, Apache ZooKeeper]
  • 4. Polyglot Persistence
  • 5. [Figure labels: "tool box", "one-size-fits-all"]
    $ ls -al
    $ tail -f some.log
    $ nc localhost 80
    $ awk 'BEGIN { FS = "," } /2013-[[:digit:]]+-[[:digit:]]+/ { print $3 }' sample.csv
  • 6. Polyglot Persistence: Backdrop
    • Michael Stonebraker and Ugur Çetintemel (2005): "One Size Fits All": An Idea Whose Time Has Come and Gone
    • Martin Fowler (2011): Polyglot Persistence [1]
    • Eric Brewer (2012): Ricon Keynote, Advancing Distributed Systems [2]
    [1] http://martinfowler.com/bliki/PolyglotPersistence.html
    [2] https://speakerdeck.com/eric_brewer/ricon-2012-keynote
  • 7. Polyglot Persistence: Key Points
    • Use different datastores for different needs
    • Can apply within an application or across the enterprise
    • Encapsulating data access yields loosely coupled components
    • Find the sweet spot between dev/ops complexity and flexibility
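The point about encapsulating data access can be illustrated with a minimal Python sketch. Everything here is hypothetical and not from the talk: the `ProductLookup` and `ProductSearch` interfaces, the in-memory stand-ins, and the sample rows. In a real deployment the look-up side might be backed by a key-value store and the search side by Solr; the application class would not need to change.

```python
from typing import Protocol


class ProductLookup(Protocol):
    """Point look-ups by key (hypothetical interface)."""

    def get(self, product_id: str) -> dict: ...


class ProductSearch(Protocol):
    """Keyword search (hypothetical interface)."""

    def search(self, keyword: str) -> list[str]: ...


class InMemoryLookup:
    """Stand-in for a key-value store."""

    def __init__(self, rows: dict[str, dict]) -> None:
        self._rows = rows

    def get(self, product_id: str) -> dict:
        return self._rows[product_id]


class InMemorySearch:
    """Stand-in for a search engine: a tiny inverted index over titles."""

    def __init__(self, rows: dict[str, dict]) -> None:
        self._index: dict[str, set[str]] = {}
        for product_id, row in rows.items():
            for word in row["title"].lower().split():
                self._index.setdefault(word, set()).add(product_id)

    def search(self, keyword: str) -> list[str]:
        return sorted(self._index.get(keyword.lower(), set()))


class Catalog:
    """Application code depends only on the two interfaces above."""

    def __init__(self, lookup: ProductLookup, search: ProductSearch) -> None:
        self._lookup = lookup
        self._search = search

    def find_titles(self, keyword: str) -> list[str]:
        ids = self._search.search(keyword)
        return [self._lookup.get(pid)["title"] for pid in ids]


# Made-up sample data, for illustration only.
ROWS = {
    "p1": {"title": "Red running shoes"},
    "p2": {"title": "Blue running jacket"},
    "p3": {"title": "Red scarf"},
}
catalog = Catalog(InMemoryLookup(ROWS), InMemorySearch(ROWS))
```

Because `Catalog` sees only the interfaces, either backend can be swapped independently, which is one way to read "loosely coupled components".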
  • 8. Common (Big Data) use cases
  • 9. Where are we coming from?
    • Keyword search
    • Spellcheck & autosuggest
    • Ranking
    • Faceted search
    • Spatial search
  • 10. Use case: search-based recommendation
  • 11. Search-based recommendation (credit card issuer)
    • Given
        – customer purchase history
        – merchant designations
        – merchant special offers
    • Goal
        – Improve existing recommender system
        – Throughput important
  • 12. Analyze with MapReduce [Diagram labels: complete history; co-occurrence (Mahout); item meta-data; Solr indexers (indexing); index shards]
  • 13. Deploy with search system [Diagram labels: user history; web tier; item meta-data; Solr search; index shards]
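Slides 12 and 13 can be read as a two-step pattern: an offline co-occurrence analysis produces per-item indicator data that is indexed, and at request time the user's history becomes a search query. The Python sketch below is only an approximation under stated assumptions: plain co-occurrence counts stand in for the Mahout analysis, the `indicators` field name is hypothetical, the merchant names are made up, and `recommend` is an in-memory stand-in for the actual Solr search.

```python
from collections import Counter, defaultdict
from itertools import permutations


def indicator_docs(histories, top_n=2):
    """Offline step: for each item, keep the items it most often co-occurs with."""
    cooc = defaultdict(Counter)
    for history in histories:
        for a, b in permutations(set(history), 2):
            cooc[a][b] += 1
    docs = {}
    for item, counts in cooc.items():
        # Sort by descending count, then by name, so the result is deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        docs[item] = [other for other, _ in ranked[:top_n]]
    return docs


def solr_query(user_history, field="indicators"):
    """Online step: turn the user's history into one query string.

    The field name is an assumption; the syntax is ordinary field:(a OR b).
    """
    terms = " OR ".join(sorted(set(user_history)))
    return f"{field}:({terms})"


def recommend(docs, user_history):
    """In-memory stand-in for the search: rank unseen items by indicator overlap."""
    seen = set(user_history)
    scores = {
        item: len(seen & set(indicators))
        for item, indicators in docs.items()
        if item not in seen
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0]


# Made-up purchase histories, for illustration only.
HISTORIES = [
    ["cafe", "bookshop", "cinema"],
    ["cafe", "bookshop"],
    ["cafe", "cinema"],
    ["gym", "juice_bar"],
]
DOCS = indicator_docs(HISTORIES)
```

With this layout the expensive analysis runs as a batch job, while serving a recommendation costs one search request, which fits the slide's emphasis on throughput.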
  • 14. Use case: log analysis
  • 15. Log analysis
    • Given
        – Receive 200,000+ log lines per second
    • Goal
        – Want to do multi-field search
        – Want log lines to be searchable with <30 seconds delay after arrival
  • 16. Data Ingestion and Indexing [Diagram labels: incoming data; Kafka; text analysis; Solr indexer; real-time; raw documents; older index shards; live index shard; time-sharded Solr indexes]
  • 17. Search [Diagram labels: query; web tier; Solr search; raw documents; older index shards; live index shard]
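The time-sharded layout on slides 16 and 17 can be sketched in a few lines of Python. The one-hour shard width and the `logs_YYYYMMDDHH` naming scheme are assumptions made for illustration; the slides only say that the indexes are time-sharded, with one live shard and older shards. The last helper does the arithmetic implied by slide 15: how many lines can arrive within the allowed delay.

```python
from datetime import datetime, timedelta, timezone

SHARD_SPAN = timedelta(hours=1)  # assumed shard width, not from the slides


def shard_for(ts: datetime) -> str:
    """Name of the time shard that indexes an event with this timestamp."""
    return ts.astimezone(timezone.utc).strftime("logs_%Y%m%d%H")


def shards_for_range(start: datetime, end: datetime) -> list[str]:
    """All shards a query over [start, end] has to search, oldest first."""
    cursor = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    names = []
    while cursor <= end:
        names.append(shard_for(cursor))
        cursor += SHARD_SPAN
    return names


def lines_within_delay(lines_per_second: int, max_delay_seconds: int) -> int:
    """Lines that arrive during the allowed indexing delay."""
    return lines_per_second * max_delay_seconds
```

Only the newest shard receives writes, so older shards can be treated as read-only, and a query over a narrow time window touches only a few shards. At 200,000 lines per second and a 30 second budget, `lines_within_delay(200_000, 30)` gives 6,000,000 lines that the live shard has to absorb before they are expected to be searchable.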
  • 18. A checklist
  • 19. Questions you may want to ask …
    • What is the volume of your data* (a few GB? up to PB?)
    • What are your query characteristics?
        – full scans
        – look-ups
        – multiple passes over large parts
        – continuous queries
    • What's (more) important: throughput or latency?
    *) Note: as long as Moore's law still holds, these figures obviously change on a yearly if not monthly basis.
  • 20. Key qualifiers
    • Want exploratory interface rather than aggregates in a dashboard
    • Data are sparse symbol sets like words or recommendation indicators
    • Small-ish return sets are OK, especially if facets are good enough
    • Near-real-time is good enough
  • 21. When not to use Solr …
  • 22. Red Flags
    • You need strong consistency?
    • JOINS, anyone?
    • Want (complex) transactions?
    • OLTP, streaming (but: near-real-time)
    • Graphs?
    [Callout] Remember: one size does not fit all. Tool belt approach!
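The key qualifiers (slide 20) and red flags (slide 22) can be written down as a small checklist function. This Python sketch is one possible encoding, not something presented in the talk: the field names, the verdict strings, and the rule that any red flag outweighs all qualifiers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Key qualifiers (slide 20)
    exploratory_interface: bool = False
    sparse_symbol_data: bool = False
    small_result_sets_ok: bool = False
    near_real_time_ok: bool = False
    # Red flags (slide 22)
    strong_consistency: bool = False
    joins: bool = False
    complex_transactions: bool = False
    oltp_or_true_streaming: bool = False
    graph_queries: bool = False


QUALIFIERS = (
    "exploratory_interface",
    "sparse_symbol_data",
    "small_result_sets_ok",
    "near_real_time_ok",
)
RED_FLAGS = (
    "strong_consistency",
    "joins",
    "complex_transactions",
    "oltp_or_true_streaming",
    "graph_queries",
)


def diagnose(w: Workload) -> tuple[str, list[str]]:
    """Return a verdict plus the checklist items that led to it.

    Assumed rule: any red flag wins; otherwise all qualifiers must hold.
    """
    flags = [name for name in RED_FLAGS if getattr(w, name)]
    if flags:
        return "look beyond Solr alone", flags
    hits = [name for name in QUALIFIERS if getattr(w, name)]
    if len(hits) == len(QUALIFIERS):
        return "Solr looks like a good fit", hits
    return "undecided", hits
```

For example, a workload that meets all four qualifiers but also needs joins comes back as `("look beyond Solr alone", ["joins"])`, in line with the talk's suggestion to augment Solr with other systems rather than force one tool to fit.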
  • 23. Let's stay in touch …
    • Twitter: @mhausenblas @MapR
    • We're hiring!
    [Map labels: MapR HQ San Jose, US; MapR UK; MapR Nordics; MapR DACH; MapR SE & Benelux; MapR Japan; MapR Korea; MapR Hyderabad]