Using Solr In Online Travel To Improve User Experience
 


    • Using Solr in Online Travel to Improve User Experience
      Sudhakar Karegowdra, Esteban Donato
      Travelocity, May 25th, 2011
      {sudhakar.karegowdra, esteban.donato}@travelocity.com
    • What We Will Cover
      Travelocity
      Speakers Background
      Merchandising & Solr
        Challenges
        Solution
        Sizing and performance data
        Take Away
      Location Resolution & Solr
        Challenges
        Solution
        Sizing and performance data
        Take Away
      Q&A
    • First online travel agency (OTA), launched in 1996
      Has grown to 3,000 employees and is one of the largest travel agencies worldwide
      Headquartered in Dallas/Fort Worth with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few
      In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
      Owned by Sabre Holdings - sister companies include Travelocity Business, IgoUgo.com, lastminute.com, Zuji among others
    • Speakers Background
      Sudhakar Karegowdra
      Principal Architect, Travelocity.com
      Experience: 13+ years
      Solr/Lucene: 3 years
      Implementing Hadoop, Pig and Hive for the data warehouse
      Topic: Merchandising
      Esteban Donato
      Lead Architect, Travelocity.com
      Experience: 10+ years
      Solr: 2 years
      Analyzing Mahout and Carrot2 for a document clustering engine
      Topic: Location Resolution
    • Merchandising
      By Sudhakar Karegowdra
    • The Challenge
      Market Drivers
      Build Landing Pages with Faceted Navigation
      Enable Content Segmentation and delivery
      Support Roll out of Promotions
      Roll up Data to a higher level
      E.g., "all 5-star hotels in California" should bring back the 5-star hotels from SFO, LAX, SAN, etc.
      Faster time to market for new ideas
      Rapidly scale to accommodate global brands with disparate data sources
    • The Challenge
      Traditional Database approach
      Longer time to market
      Specialized skill set needed to design and optimize database structures and queries
      Aggregating data and changing structures is quite complex
      Building faceted navigation requires complex logic, leading to high maintenance cost
    • Solution - Overview
      Data from various sources is aggregated and ingested into Solr
      One core per locale and product type
      A wrapper service combines some data across product cores and manages configuration rules
      Solr's built-in search and faceting power the navigation
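The core-per-locale layout and the faceted navigation described above can be sketched as a small query builder. This is only an illustration under stated assumptions: the core naming scheme (`hotel_en_US`) and the field names (`state`, `star_rating`, `city`, `amenity`) are hypothetical and do not come from the talk. The request parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `rows`, `wt`) are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_landing_page_query(locale, product, filters, facet_fields, rows=20):
    """Return (core_name, query_string) for a faceted landing-page search."""
    # Hypothetical naming scheme for "one core per locale and product type".
    core = f"{product}_{locale}"
    params = [
        ("q", "*:*"),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # Each restriction is sent as its own fq so Solr can cache it separately.
    for field, value in sorted(filters.items()):
        params.append(("fq", f'{field}:"{value}"'))
    # One facet.field per navigation dimension shown on the page.
    for field in facet_fields:
        params.append(("facet.field", field))
    return core, urlencode(params)


core, qs = build_landing_page_query(
    "en_US", "hotel",
    {"state": "California", "star_rating": "5"},
    ["city", "amenity"],
)
print(f"/solr/{core}/select?{qs}")
```

A wrapper service like the one mentioned on the slide could issue one such request per product core and merge the results.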
    • Solution – Architecture View
      [Architecture diagram; components shown:]
      UI, Widgets, Mobile
      Services/Business Logic
      Solr Slaves (Multi Core)
      Solr Master (Multi Core)
      Offer Management Tool
      Oracle
      ETL
      Products, Deals, …
    • Solution - Achievements
      Millions of unique Long Tail Landing Pages
      E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green
      Faster search across products
      E.g., Beach Deals under $500
      Segmented Content delivery through tagging
      Scaled well to distribute the content to different brands, partners and advertisers
      Opened the door to other innovative applications
      E.g., Deals on Map, Deals on Mobile, Wizards, etc.
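The long-tail landing page URL above suggests how a page address could map onto tag filters. The sketch below is a guess at that mapping, not Travelocity's actual routing: it assumes the last path segment is a base slug followed by underscore-separated facet tags, and that tags live in a hypothetical multi-valued field named `tags`.

```python
def split_landing_slug(slug):
    """Split a landing-page slug into its base part and its facet tags.

    Assumption: tags are appended to the base slug with underscores.
    """
    base, *tags = slug.split("_")
    return base, tags


def tags_to_filter_queries(tags, field="tags"):
    """Turn each tag into one Solr filter query on a hypothetical tag field."""
    return [f'{field}:"{tag}"' for tag in tags]


base, tags = split_landing_slug(
    "hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green"
)
print(base)
print(tags_to_filter_queries(tags))
```

Under that assumption, every combination of tags yields a distinct URL, which is one way a site could reach millions of unique landing pages from a modest set of facets.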
    • Solution – Road Ahead
      Migration to Solr 3.1
      Geo spatial search
      CSV output format
      Query boosting by search pattern
      Near real-time updates
      Deal and user behavior mining in Hadoop – MapReduce and Solr to Serve the Content
      Move Slaves to Cloud
    • Sizing & Performance
      Index stats
      Number of cores: 25
      Number of documents: ~1 million records
      Response
      Requests: 70 TPS
      Average response time: 0.005 seconds (5 ms)
      Software versions
      Solr version: 1.4.0
      filterCache size: 30000
      Tomcat: 5.5.9
      JDK: 1.6
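To put the throughput and latency figures in perspective, Little's law (mean requests in flight = arrival rate × mean response time) gives a rough concurrency estimate. The sketch below is back-of-the-envelope arithmetic only. The documents-per-core figure additionally assumes that the ~1 million records are the total across all 25 cores and are spread evenly, which the slide does not state.

```python
def average_in_flight(requests_per_second, avg_response_seconds):
    """Little's law: mean concurrent requests = arrival rate x mean latency."""
    return requests_per_second * avg_response_seconds


def docs_per_core(total_docs, cores):
    """Average index size per core, assuming an even spread (an assumption)."""
    return total_docs / cores


# Figures taken from the slide: 70 TPS at 5 ms average response time.
print(average_in_flight(70, 0.005))
# Assumes ~1 million documents in total over 25 cores.
print(docs_per_core(1_000_000, 25))
```

At these figures there is, on average, well under one request in flight at any moment.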
    • Take Away
      Semi Structured Storage in Solr helps aggregate disparate sources easily
      Remember Dynamic fields
      Multiple Cores to manage multiple locale data
      Solr is a great enabler of “Innovations”
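The point about dynamic fields can be illustrated with a small mapper that turns records from different sources into Solr documents without a schema change per attribute. This is a sketch under assumptions: it presumes the schema declares suffix-based dynamic fields such as `*_s`, `*_i`, `*_f` and `*_b` (as Solr's example schema does), and the record contents are invented for illustration.

```python
def to_solr_doc(doc_id, record):
    """Map an arbitrary source record onto suffix-typed dynamic field names."""
    doc = {"id": doc_id}
    for key, value in record.items():
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            suffix = "_b"
        elif isinstance(value, int):
            suffix = "_i"
        elif isinstance(value, float):
            suffix = "_f"
        else:
            suffix, value = "_s", str(value)
        doc[key + suffix] = value
    return doc


# Two hypothetical sources with different attributes share one schema.
print(to_solr_doc("hotel-1", {"name": "Example Inn", "stars": 5, "green": True}))
print(to_solr_doc("deal-7", {"title": "Beach Deal", "price": 499.0}))
```

New attributes from a new data source then need no schema edit, only a consistent naming rule in the ingestion step.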
    • Location Resolution
      By Esteban Donato
    • The Challenge
      How to develop a global location resolution service?
      Flexible to change
      General enough to cover everyone's needs
      Multi language
      Performance and scalability
      Configurable by site
    • Architecture of the solution
      [Architecture diagram; components shown:]
      Auto-complete, Resolution
      Solr Slave
      Solr Master
      Location DB
      Batch Job
      Management Tool
      [Notes on the diagram:]
      Master/slave architecture
      Multi-core: each core represents a language
      Remote streaming indexing
      CSV format
      SolrJ client binary format
      Solr response cache
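The remote-streaming CSV indexing noted above can be sketched as the request a batch job might send to the master. This is an assumption-laden illustration: the host name, the language core names and the file paths are hypothetical. The `/update/csv` handler and the `stream.file` and `commit` parameters are standard in Solr 1.4, and remote streaming has to be enabled in `solrconfig.xml` for `stream.file` to be honored.

```python
from urllib.parse import urlencode


def csv_index_url(master_base, language_core, csv_path):
    """Build the URL asking a Solr core to load a CSV file local to the server."""
    params = {
        "stream.file": csv_path,  # path as seen by the Solr master, not the client
        "commit": "true",         # make the new documents searchable after loading
    }
    return f"{master_base}/{language_core}/update/csv?{urlencode(params)}"


# One request per language core (core names and paths are hypothetical).
for lang in ("en", "es"):
    print(csv_index_url("http://solr-master:8983/solr", lang,
                        f"/data/locations_{lang}.csv"))
```

Because the master reads the file itself, the batch job only sends a short request instead of uploading the whole export.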
    • Auto-complete
      The system has to suggest options as users type their desired location
      Examples: "san" => San Francisco, "veg" => Las Vegas
      Relevancy: not all locations are equally important. "par" => "Paris, France"; "Parana, Argentina"
      Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name.
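The auto-complete behavior described above (prefix match on any searchable field, ordered by importance) can be sketched in a few lines. This is an in-memory illustration of the idea rather than the Solr implementation: the sample locations, their rank values and the rule that a higher rank means a more important location are all assumptions.

```python
def suggest(prefix, locations, limit=5):
    """Return labels of locations with a token starting with the typed prefix,
    most important first."""
    p = prefix.strip().lower()
    matches = [
        loc for loc in locations
        if any(token.startswith(p)
               for field in loc["fields"]
               for token in field.lower().split())
    ]
    # Higher rank first (assumed convention); label breaks ties deterministically.
    matches.sort(key=lambda loc: (-loc["rank"], loc["label"]))
    return [loc["label"] for loc in matches[:limit]]


# Illustrative data only; ranks are invented.
LOCATIONS = [
    {"label": "Parana, Argentina", "rank": 40,
     "fields": ["Parana", "Argentina", "AR"]},
    {"label": "Paris, France", "rank": 100,
     "fields": ["Paris", "PAR", "France", "FR"]},
    {"label": "San Francisco, United States", "rank": 90,
     "fields": ["San Francisco", "SFO", "California", "CA", "United States", "US"]},
    {"label": "Las Vegas, United States", "rank": 85,
     "fields": ["Las Vegas", "LAS", "Nevada", "NV", "United States", "US"]},
]

print(suggest("par", LOCATIONS))
print(suggest("veg", LOCATIONS))
```

Matching on individual tokens is what lets "veg" find Las Vegas even though the name does not start with those letters.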
    • Solr schema
      <dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
      <field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true"/>
      <fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.ISOLatin1AccentFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
          <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
        </analyzer>
      </fieldType>
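To see what the analyzer chain above does to a location string, the following sketch approximates it outside Solr. It is an approximation under stated assumptions: the tokenizer is assumed to split on slashes, hyphens, tabs and spaces (`[/\-\t ]+`), accent stripping uses Unicode decomposition, which is not identical to `ISOLatin1AccentFilterFactory` for every character, and `RemoveDuplicatesTokenFilterFactory` is omitted because it only affects tokens that share a position.

```python
import re
import unicodedata

TOKEN_SPLIT = re.compile(r"[/\-\t ]+")  # assumed delimiter set for the tokenizer
PUNCTUATION = re.compile(r"[,.]")       # same pattern as the PatternReplace filter


def strip_accents(text):
    """Drop combining marks after decomposition, e.g. 'ã' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value):
    """Approximate the chain: split, lowercase, trim, strip accents, drop , and ."""
    tokens = []
    for raw in TOKEN_SPLIT.split(value):
        token = PUNCTUATION.sub("", strip_accents(raw.lower().strip()))
        if token:  # skip empty tokens left by leading or trailing delimiters
            tokens.append(token)
    return tokens


print(analyze("São Paulo, Brazil"))
print(analyze("Dallas/Fort-Worth"))
```

Since the same analysis applies to both indexed values and user input, a query typed without accents or punctuation still matches the indexed location.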
    • Resolution
      The system has to resolve the location requested by the user.
      Handles aliases: Big Apple => New York
      Handles ambiguities.
      Handles misspellings: Lomdon => London
      Uses the NGramDistance algorithm.
      Open question: how to combine distance with relevancy
      Known issue: errors suggesting the correct location when the input is a prefix, e.g. Lond => London
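The resolution steps above (alias lookup, misspelling tolerance, and the question of combining string distance with relevancy) can be sketched as follows. Note that this does not reproduce Lucene's `NGramDistance`: it substitutes a simpler character-bigram Dice coefficient. The alias table, candidate ranks and the 0.8 weighting are hypothetical values chosen for illustration.

```python
def bigrams(text):
    """Character bigrams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def dice_similarity(a, b):
    """Dice coefficient over bigram multisets: 1.0 means identical strings."""
    grams_a, remaining = bigrams(a), bigrams(b)
    total = len(grams_a) + len(remaining)
    shared = 0
    for gram in grams_a:
        if gram in remaining:
            remaining.remove(gram)  # each bigram of b can be matched only once
            shared += 1
    return 2.0 * shared / total


ALIASES = {"big apple": "New York"}  # hypothetical alias table


def resolve(query, candidates, similarity_weight=0.8):
    """Pick the candidate with the best blend of similarity and normalized rank."""
    q = query.strip().lower()
    if q in ALIASES:
        return ALIASES[q]
    max_rank = max(candidates.values())

    def score(name):
        relevancy = candidates[name] / max_rank
        return (similarity_weight * dice_similarity(q, name)
                + (1 - similarity_weight) * relevancy)

    return max(candidates, key=score)


CANDIDATES = {"London": 100, "Londrina": 40, "Lome": 30, "New York": 100}
print(resolve("Big Apple", CANDIDATES))
print(resolve("Lomdon", CANDIDATES))
```

The weight controls the trade-off the slide raises: a higher value trusts the spelling more, a lower value favors well-known locations.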
    • Spellchecker configuration
      <fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.ISOLatin1AccentFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
          <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
        </analyzer>
      </fieldType>
    • Sizing & Performance
      4 cores with ~ 500,000 documents indexed each
      Response times
      Auto-complete: 15ms, 20 TPS
      Resolution: 10ms, 2 TPS
      Cache configuration
      queryResultCache: maxSize=1024
      documentCache: maxSize=1024
      fieldValueCache and filterCache: disabled
    • Wrap Up
      Performance is always the top priority
      Develop simple but robust services
      Provide a simple API
    • Q&A
    • Contact
      Esteban Donato
      Esteban.donato@travelocity.com
      Twitter: @eddonato
      Sudhakar Karegowdra
      Sudhakar.karegowdra@travelocity.com
      Twitter: @skaregowdra
      https://www.facebook.com/travelocity
      Twitter: @travelocity and @RoamingGnome