Search with Solr

With Google constantly pushing the customer expectations of searching, is it time to move away from our database full-text search in pursuit of a more targeted platform? Can implementing Solr offer more than an answer to a search? Implementing a search platform isn’t always suitable for all applications, but in this talk we’ll look at identifying the right search solution, choosing the best way to integrate it into our application and exploring all the benefits a search server can offer.

Usage Rights

© All Rights Reserved

  • Twitter: @paulmatthews86
    Personal blog: 86p (technical and non-tech)
    Software Engineer at Ibuildings
    Techportal: MongoDB, Solr (May 2011)
    Solr projects: travel company, media company
  • This talk
    What is Solr?
    When: when is the right time?
    Why: why search?
    How: start the journey (investigate), explain to the business, integrate
    Who is this talk aimed at? Developers toying with search, using DB search, or starting with search
  • This talk
    When: identifying the right time
    Why: search benefits, the dark horse
    How: start the journey (investigate), explain to the business, integrate
    Who is this talk aimed at? Developers toying with search, using DB search, or starting with search
  • What is search?
    Text-based navigation to content / products
    Customers describing something: capture queries
    Sorting: organising content
    Examples: quick search, category listing, advanced search
  • The Power of Search: from LIKE to Solr
  • First up: DB LIKE
  • Pros: little effort to use or understand
    Cons: not good with user data of more than one word
  • Full-text: lots of people use it
  • Pros: some power, convenient, in the DB
    Cons: feature poor, slow
  • Basic / easy-to-use proper search
  • Pros: can be very fast, often simple to set up
    Cons: feature poor, less accurate, more application code?
    Google Custom Search Engine: crawls the site
    Xapian: simple search solution
  • Pros: powerful, feature rich, relatively simple, lots of plugins
    Cons: could be overkill, different language
  • Solr: runs on Java, stand-alone
    Requires a servlet container: Tomcat, or Jetty stand-alone
    Lucene: search library; offers full-text and high performance; Java, other implementations available
  • Who?
    Traffic: not for Facebook; works for the average site
    Features: it has many; no need to use them all
    When?
    Designed in from the beginning: easily used to enrich site navigation
    Implementation as a post-live project
    Implementation into existing open source software: Drupal, Magento
  • Spending time / effort / money on the search box: fixing bugs, endless tuning, adding functionality
    Customers complaining: not finding content, high bounce rates, site is slow, not finding the *right* content
  • Large data sets: 10000 records; speed of LIKE queries
    MySQL full-text: site performance (slow log?); results inaccurate or missing
    Graceful degradation: important for quality, low cost
  • Is Solr right for me? Before answering:
    Terms: find materials, communicate to people
    Functionality: most use comes from knowing the functionality; avoid re-inventing the wheel
  • The main two:
    Database tables: Data Import Handler; easy, just config
    API: anything can publish to the API; hooked into content
    CSV & XML
    Solr Cell: rich docs (PDF, MS Office)
  • Parse: text to generate the index; removes junk, improves matches
    Half now, half later: reduces time spent searching
  • Analyzer: groups the actions of parsing; important to do the same / similar when searching
  • Tokenizer: strings to tokens
    Example ones:
    Whitespace: splits on whitespace
    Keyword: treats the whole input as a single token
    Standard: general purpose, adds context
  • Token filters: transform tokens
    Lower case
    Stop: filters out stop words (a, if, to, and)
    Standard: removes dots and 's (context only)
    Synonym
  • Hit highlighting: remember to set the delimiter; not everything is a web page.
  • Spell checking: configure spellings; names (Flickr), keywords
  • Autocomplete: common queries
  • Phrase queries: "search for a phrase"
    Wildcard queries: match with wildcards (? single character, * multiple characters)
    Fuzzy queries: Levenshtein distance; matches terms similar to the word, e.g. word~
    Proximity queries: words close together, e.g. "two words"~12
    Range queries: between two values; started:[20110101 TO 20120101] is inclusive, name:{Paul TO Jeff} is exclusive
  • Fields: a single field for a targeted search; multiple fields to build queries
  • Faceted: set counts, filter data, multiple classifications
  • Ordered results based on best match, or order by any field
  • Simultaneous update and search
  • Blog post to explain
    Configure: container, Solr
    Index: documents from any source
    Search: default search, advanced search
  • Container setup: choose, configure, make accessible
  • Define the data: define what is indexed, define what is stored
    Integral to returning relevant search responses
    Requires tweaking to get right
    Be conscious of space: the size of the index affects speed
  • Docs to schema spec; indexing by database or API
  • Partial words: analyzing?
    Search all fields? Possibly just the main ones
    Response: less data; stay clear of additional queries; consider caching
  • Consider using stemming analyzers to return more results
    Increase matching columns
    Use session data to affect results; consider caching effects
    More response data required
  • Users modify their search: specify fields
    For enriching the results, consider bloated storage: trade-off with additional queries; tweak later?
    Advanced options for returning more / fewer results: search more of the document, filter on property

Search with Solr: Presentation Transcript

  • Searching with SolrWhen, Why and How?
    By Paul Matthews
  • 86p
    @paulmatthews86
    86p.paul-matthews.co.uk
    pmatthews@ibuildings.com
    techportal.ibuildings.com
    Projects:
    Travel companies
    Media corporations
    1
  • Searching…
    What?
    When?
    Why?
    How?
    2
  • Searching…
    What?
    When?
    Why?
    How?
    3
  • What is search?
    Text navigation
    Customers describing
    Sorting
    Examples
    Quick search
    Category listings
    4
  • The power of search
    5
  • Database Like
    6
  • Database Like
    Very little effort
    A very basic search
    Poor at: > 1 word
    7
  • Database Full-Text
    8
  • Database Full-Text
    Some power
    Convenient
    Feature poor
    Often very slow
    9
  • Basic Search Systems
    10
  • Basic Search Systems
    Rapid search
    Simple to setup
    Feature poor
    Accuracy
    Require more application code
    11
  • Solr Search
    12
  • Solr Search
    Very powerful
    Feature rich
    Relatively simple
    Lots of plugins (community)
    Overkill?
    Java
    13
  • Things you need to know
    14
  • Searching…
    What?
    When?
    Why?
    How?
    15
  • Applicable to me?
    Who is Solr designed for?
    Traffic
    Features
    When is a good time to implement it?
    Creation
    Post-live
    Open Source projects
    16
  • Business indicators
    Money / Time / Effort spent
    Bugs
    Tuning
    Features
    Customers
    17
  • Development indicators
    Data
    MySQL Full Text
    Degradation
    18
  • Searching…
    What?
    When?
    Why?
    How?
    19
  • Is Solr right for me?
    Know your enemy
    With great functionality comes great responsibility
    20
  • Data sources
    Database
    Easy
    API
    Features
    CSV & XML
    Solr Cell - Rich Documents
    PDF
    MS Office
    21
  • Indexing
    Parsing
    Half now, half later
    22
  • Analyzer
    Process documents
    The query gets analyzed too
    23
  • Tokenizer
    24
  • Token Filter
    Synonym
    25
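The analyzer, tokenizer and filter slides above describe a pipeline: a tokenizer splits a string into tokens, then filters transform them. The sketch below imitates that flow in plain Python purely for illustration. It is not how Solr implements or configures analysis (Solr declares these steps in its schema), and the stop-word list, synonym table and function names are assumptions made up for this example.

```python
import re

# Hypothetical stop-word and synonym tables, chosen only for this example.
STOP_WORDS = {"a", "if", "to", "and"}
SYNONYMS = {"tv": "television"}


def whitespace_tokenize(text):
    """Tokenizer step: split the raw string into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Filter step: lower-case every token."""
    return [token.lower() for token in tokens]


def strip_punctuation_filter(tokens):
    """Filter step: drop non-word characters and discard empty tokens."""
    cleaned = [re.sub(r"[^\w]", "", token) for token in tokens]
    return [token for token in cleaned if token]


def stop_filter(tokens):
    """Filter step: remove stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def synonym_filter(tokens):
    """Filter step: map each token to its canonical synonym, if it has one."""
    return [SYNONYMS.get(token, token) for token in tokens]


def analyze(text):
    """Analyzer: one tokenizer followed by a fixed chain of filters."""
    tokens = whitespace_tokenize(text)
    for token_filter in (lowercase_filter, strip_punctuation_filter,
                         stop_filter, synonym_filter):
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("A TV and a Radio."))
```

Running the same `analyze` function over both the document text and the query text is what lets a query for "TV" match a document that says "television", which is the point of the note that the query gets analyzed too.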
  • Stemming
    Matching similar words
    Reduce to Stem
    Searching, Search, Searches, Searched, Searchers ==> Search
    26
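To make "reduce to stem" concrete, here is a deliberately naive suffix stripper in Python. It is only a sketch under stated assumptions: the suffix list and the minimum stem length are invented for this example, and it is not the algorithm used by Solr's stemming filters, which are considerably more careful.

```python
# Hypothetical suffix list, longest suffixes first so "ers" wins over "s".
SUFFIXES = ("ers", "ing", "es", "ed", "er", "s")

# Assumed minimum stem length, to avoid reducing short words to nothing.
MIN_STEM_LENGTH = 3


def naive_stem(word):
    """Strip the first matching suffix, keeping at least MIN_STEM_LENGTH letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for form in ("Searching", "Search", "Searches", "Searched", "Searchers"):
        print(form, "->", naive_stem(form))
```

All five word forms from the slide reduce to the same stem, so a query for any one of them can match documents containing the others.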
  • Hit Highlighting
    “Hit” ==> “This is a <em>Hit</em> test.”
    27
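The highlighting slide shows a matched term wrapped in delimiters. A minimal Python sketch of that idea follows; it is an illustration only, not Solr's highlighter, and the function name and default delimiters are assumptions. The delimiters are parameters because, as the speaker notes point out, not every consumer of the results is a web page.

```python
import re


def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive match of term in pre/post."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda match: pre + match.group(0) + post, text)


if __name__ == "__main__":
    print(highlight("This is a Hit test.", "hit"))
    # Plain-text consumers can pass their own delimiters instead of HTML tags.
    print(highlight("This is a Hit test.", "hit", pre="*", post="*"))
```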
  • Spell Check
    Spelchk
    Did you mean …?
    “flickr”
    28
  • 29
  • By the power of Queries!
    Phrase: “Search for a phrase”
    Wildcards: Look*familiar?
    Fuzzy: fuzzy~
    Proximity: “two words”~12
    Range: name:{Paul TO Jeff}
    30
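The speaker notes tie fuzzy queries to Levenshtein distance: the number of single-character insertions, deletions or substitutions needed to turn one word into another. The Python sketch below computes that distance with the standard dynamic-programming approach. It shows the measure itself, not how Solr evaluates a fuzzy query internally.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("flickr", "flicker"))
```

A small distance between the typed word and an indexed term is what makes the indexed term a candidate match for a fuzzy query, or a candidate for a "Did you mean ...?" suggestion.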
  • name:paul AND location:uk
    A single field
    Multiple Fields
    31
  • Faceting (21)
    Pre-fetching (11)
    Results (37)
    32
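Fielded queries, range filters and facets are all sent to Solr as HTTP request parameters. The Python sketch below assembles such a request URL using only the standard library. The base URL, the field names (`name`, `location`, `started`, `category`) and the helper name are assumptions for illustration; `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters, but check the path and handler against your own installation.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, facet_fields=(), filters=(), rows=10):
    """Build a search URL with optional filter queries and facet fields."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each filter query narrows the result set without affecting ranking.
    for filter_query in filters:
        params.append(("fq", filter_query))
    # Facet counts are requested per field.
    if facet_fields:
        params.append(("facet", "true"))
        for field in facet_fields:
            params.append(("facet.field", field))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_select_url(
        "http://localhost:8983/solr/select",  # assumed host, port and path
        "name:paul AND location:uk",
        facet_fields=["category"],
        filters=["started:[20110101 TO 20120101]"],
    )
    print(url)
```

Passing the parameters as a list of pairs rather than a dictionary allows repeated keys, which is needed when more than one `fq` or `facet.field` is sent in the same request.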
  • Ranked Search
    Ordered
    Any field
    33
  • Simultaneous update & search
    Hold on a minute!
    Actually, I don’t have to…
    34
  • Searching…
    What?
    When?
    Why?
    How?
    35
  • Flow
    36
  • Container
    Choose container
    Make accessible
    http://<host>:<port>/solr/admin
    37
  • SolrConfig
    Cores ~ Database Schema
    schema.xml ~ Schema definition
    38
  • Fields
    Define the data
    Indexed
    Stored
    Important to model accurately
    Tweak to achieve functionality
    Conscious of space and index
    39
  • Index
    Create documents to Schema Spec
    40
  • Search
    Quick Search
    Default Search
    Advanced Search
    41
  • Quick Search
    Partial words
    Search all fields?
    Required response data
    42
  • Default Search
    Consider useful Analyzers
    Potentially match on more fields
    Enrich or refine results with personal data
    More in depth results
    43
  • Advanced Search
    Offer user control
    Consider search storage
    Data size vs Additional queries
    To return more / less results
    “Search entire document”
    “Filter by Colour”
    44
  • Searching…
    What?
    When?
    Why?
    How?
    45
  • Questions?
    46
  • We’re Hiring
    NL
    Vlissingen
    Utrecht
    UK
    London
    Sheffield
    Liverpool
    Speak to me at the end…
    pmatthews@ibuildings.com
    47
  • Thank you
    Resources Links:
    http://www.delicious.com/paulm86/solr
    This talk:
    http://joind.in/3221
    Contact Me:
    @paulmatthews86
    http://about.me/paul.matthews
    48