Using Solr in Online Travel Shopping to Improve User Experience
    Presentation Transcript

    • Using Solr in Online Travel to Improve User Experience
        Sudhakar Karegowdra, Esteban Donato
        Travelocity, May 25th, 2011
        {sudhakar.karegowdra, esteban.donato}@travelocity.com
    • What We Will Cover
        • Travelocity
        • Speakers Background
        • Merchandising & Solr
            • Challenges
            • Solution
            • Sizing and performance data
            • Take Away
        • Location Resolution & Solr
            • Challenges
            • Solution
            • Sizing and performance data
            • Take Away
        • Q&A
    • Travelocity
        • First online travel agency (OTA), launched in 1996
        • Has grown to 3,000 employees and is one of the largest travel agencies worldwide
        • Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few
        • In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has since become an international pop icon
        • Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com, and Zuji, among others
    • Speakers Background
        • Sudhakar Karegowdra
            • Principal Architect, Travelocity.com
            • Experience: 13+ years
            • Solr/Lucene: 3 years
            • Implementing Hadoop, Pig and Hive for data warehouse.
            • Topic: Merchandising
        • Esteban Donato
            • Lead Architect, Travelocity.com
            • Experience: 10+ years
            • Solr: 2 years
            • Analyzing Mahout and Carrot2 for document clustering engine.
            • Topic: Location Resolution
    • Merchandising
        By Sudhakar Karegowdra
    • The Challenge
        • Market Drivers
            • Build landing pages with faceted navigation
            • Enable content segmentation and delivery
            • Support rollout of promotions
            • Roll up data to a higher level
                • E.g., "all 5-star hotels in California" should bring up the 5-star hotels from SFO, LAX, SAN, etc.
            • Faster time to market for new ideas
            • Rapidly scale to accommodate global brands with disparate data sources
    • The Challenge
        • Traditional database approach
            • Longer time to market
            • Requires a specialized skill set to design and optimize database structures and queries
            • Aggregating data and changing structures is quite complex
            • Building faceted navigation requires complex logic, leading to high maintenance cost
    • Solution - Overview
        • Data from various sources is aggregated and ingested into Solr
            • One core per locale and product type
        • Wrapper service to combine some data across product cores and manage configuration rules
        • Solr's built-in search and faceting power the navigation
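As a rough illustration of how built-in faceting can drive landing-page navigation, the sketch below assembles a Solr select URL with filter queries and field facets. The parameters `q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `rows` and `wt` are standard Solr request parameters; the host, the core name `hotels_en_US`, and the field names `state`, `star_rating`, `city` and `amenity` are hypothetical and do not come from the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_facet_query(base_url, core, filters, facet_fields, rows=10):
    """Build a Solr select URL that filters on `filters` and facets on `facet_fields`."""
    params = [
        ("q", "*:*"),              # match everything; narrowing is done by fq
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),   # hide facet values with no matching documents
    ]
    for field, value in filters.items():
        params.append(("fq", f'{field}:"{value}"'))
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr",
    "hotels_en_US",
    {"state": "California", "star_rating": "5"},
    ["city", "amenity"],
)
params = parse_qs(urlsplit(url).query)
print(url)
print(params["fq"], params["facet.field"])
```

Each facet value returned for `city` or `amenity` could then become a navigation link that adds one more `fq` to the request.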
    • Solution – Architecture View
        • [Diagram labels: UI, Widgets, Mobile; Services/Business Logic; Solr Slaves (Multi Core); Solr Master (Multi Core); Offer Management, Oracle, ETL Tool; Deals, Products, ……]
    • Solution - Achievements
        • Millions of unique long-tail landing pages
            • E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green
        • Faster search across products
            • E.g., Beach Deals under $500
        • Segmented content delivery through tagging
        • Scaled well to distribute the content to different brands, partners and advertisers
        • Opened up for other innovative applications
            • Deals on Map, Deals on Mobile, Wizards, etc.
    • Solution – Road Ahead
        • Migration to Solr 3.1
            • Geospatial search
            • CSV output format
        • Query boosting by search pattern
        • Near-real-time updates
        • Deal and user behavior mining in Hadoop (MapReduce), with Solr serving the content
        • Move slaves to the cloud
    • Sizing & Performance
        • Index Stats
            • Number of cores: 25
            • Number of documents: ~1 million records
        • Response
            • Requests: 70 TPS
            • Average response time: 0.005 seconds (5 ms)
        • Software Versions
            • Solr 1.4.0 (filterCache size: 30000)
            • Tomcat 5.5.9
            • JDK 1.6
    • Take Away
        • Semi-structured storage in Solr helps aggregate disparate sources easily. Remember dynamic fields.
        • Multiple cores to manage data for multiple locales
        • Solr is a great enabler of "innovations"
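One way to picture the dynamic-fields advice: records from different sources can be flattened into documents whose field names carry a type suffix, so a single schema accepts all of them. The sketch below assumes the `*_s`, `*_i`, `*_f` and `*_b` dynamic-field patterns from Solr's example schema; the source names, record layouts and the helper `to_solr_doc` are hypothetical.

```python
def to_solr_doc(source, record, id_field):
    """Map an arbitrary source record to a flat document using typed field suffixes."""
    doc = {"id": f"{source}-{record[id_field]}", "source_s": source}
    for key, value in record.items():
        if key == id_field:
            continue
        # bool is checked before int because bool is a subclass of int in Python
        if isinstance(value, bool):
            suffix = "_b"
        elif isinstance(value, int):
            suffix = "_i"
        elif isinstance(value, float):
            suffix = "_f"
        else:
            suffix, value = "_s", str(value)
        doc[key + suffix] = value
    return doc


# Two hypothetical sources with different shapes end up in one document model.
deal = to_solr_doc("deals", {"deal_id": 17, "title": "Beach Deals", "price": 499.0}, "deal_id")
hotel = to_solr_doc("hotels", {"hotel_id": 4980, "stars": 5, "green": True}, "hotel_id")
print(deal)
print(hotel)
```

With this convention, a new attribute from a new source needs no schema change as long as its suffix matches an existing dynamic-field pattern.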
    • Location Resolution
        By Esteban Donato
    • The Challenge
        • How to develop a global location resolution service?
        • Flexible in the face of change
        • General enough to cover everyone's needs
        • Multi-language
        • Performance and scalability
        • Configurable by site
    • Architecture of the solution
        • Master/Slave architecture
        • Multi-core: each core represents a language
        • SolrJ client, binary format
        • Solr response cache
        • Remote Streaming indexing
        • CSV format
        • [Diagram labels: Auto-complete, Resolution, Solr Slave, Solr Master, Management Tool, Batch Job, Location DB]
    • Auto-complete
        • System has to suggest options as users type their desired location
        • Examples: "san" => San Francisco, "veg" => Las Vegas
        • Relevancy: not all locations are equally important. "par" => "Paris, France"; "Parana, Argentina"
        • Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name.
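The behaviour described on this slide can be approximated without Solr to make the idea concrete: normalise every searchable value into tokens, keep the locations that have a token starting with the typed prefix, and order them by rank. This is only a pure-Python sketch, not the Solr implementation from the talk; the sample records, the codes, the rank values, and the convention that a lower rank means a more important location are all assumptions.

```python
import re
import unicodedata


def analyze(text):
    """Split on slash/hyphen/tab/space, lowercase, fold accents, drop ',' and '.', dedupe."""
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    tokens = []
    for raw in re.split(r"[/\-\t ]+", folded.lower()):
        token = re.sub(r"[,.]", "", raw).strip()
        if token and token not in tokens:
            tokens.append(token)
    return tokens


# Hypothetical sample data: lower rank is treated as more important.
LOCATIONS = [
    {"name": "Paraná, Argentina", "code": "PRA", "rank": 50},
    {"name": "Paris, France", "code": "PAR", "rank": 1},
    {"name": "San Francisco, United States", "code": "SFO", "rank": 2},
    {"name": "Las Vegas, United States", "code": "LAS", "rank": 3},
]


def suggest(prefix, locations, limit=5):
    """Return names of locations with any token starting with the typed prefix, best rank first."""
    typed = analyze(prefix)
    if not typed:
        return []
    needle = typed[0]
    hits = [
        loc for loc in locations
        if any(tok.startswith(needle) for tok in analyze(f'{loc["name"]} {loc["code"]}'))
    ]
    hits.sort(key=lambda loc: loc["rank"])
    return [loc["name"] for loc in hits[:limit]]


print(suggest("par", LOCATIONS))
print(suggest("veg", LOCATIONS))
```

Searching the name and the code through the same token list mirrors the idea of a single multi-valued search field fed from several source fields.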
    • Solr schema
        <!-- RANK* fields hold per-location ranking values used for relevancy -->
        <dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
        <field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
        <fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
            <!-- Split on slash, hyphen, tab, or space; the backslash escapes are assumed to have been lost from the slide text -->
            <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <!-- Fold accented Latin-1 characters to their ASCII equivalents -->
            <filter class="solr.ISOLatin1AccentFilterFactory" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            <!-- Strip commas and periods from tokens -->
            <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all" />
          </analyzer>
        </fieldType>
    • Resolution
        • System has to resolve the location requested by the user.
        • Accounts for aliases: Big Apple => New York
        • Accounts for ambiguities.
        • Accounts for misspellings: Lomdon => London
            • NGramDistance algorithm.
            • How to combine distance with relevancy?
            • Errors when suggesting the correct location if the input is a prefix: Lond => London
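To make the resolution steps concrete, here is a small sketch that checks aliases first, then exact names, and finally falls back to an n-gram similarity for misspellings. The similarity used is a Dice coefficient over padded character bigrams, a simple stand-in rather than Lucene's NGramDistance. Breaking ties by rank is just one possible answer to the "distance plus relevancy" question raised on the slide, not the authors' method. The alias table, location list, rank values and threshold are hypothetical.

```python
from collections import Counter

# Hypothetical data: alias table and location name -> rank (lower = more important).
ALIASES = {"big apple": "New York"}
LOCATIONS = {"London": 1, "New York": 2, "Lisbon": 20, "Lyon": 30}


def bigrams(word):
    """Character bigrams of the lowercased word, padded to mark its start and end."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def similarity(a, b):
    """Dice coefficient over bigram multisets: 1.0 for identical strings, 0.0 for no overlap."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    overlap = sum((ca & cb).values())
    total = sum(ca.values()) + sum(cb.values())
    return 2.0 * overlap / total if total else 0.0


def resolve(query, min_similarity=0.6):
    """Resolve a user-entered location: alias, then exact match, then fuzzy match."""
    key = query.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    for name in LOCATIONS:
        if name.lower() == key:
            return name
    candidates = [(similarity(key, name), name) for name in LOCATIONS]
    candidates = [(score, name) for score, name in candidates if score >= min_similarity]
    if not candidates:
        return None
    # Highest similarity first; among equal scores prefer the better (lower) rank.
    candidates.sort(key=lambda item: (-item[0], LOCATIONS[item[1]]))
    return candidates[0][1]


print(resolve("Big Apple"))
print(resolve("Lomdon"))
print(resolve("Lond"))
```

The `min_similarity` threshold decides when the service gives up rather than guessing, so in practice it would need tuning against real query logs.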
    • Spellchecker configuration
        <fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
            <!-- Keep the whole value as a single token so entire location names are compared -->
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <!-- Fold accented Latin-1 characters to their ASCII equivalents -->
            <filter class="solr.ISOLatin1AccentFilterFactory" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            <!-- Strip commas and periods -->
            <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all" />
          </analyzer>
        </fieldType>
    • Sizing & Performance
        • 4 cores with ~500,000 documents indexed each
        • Response times
            • Auto-complete: 15 ms, 20 TPS
            • Resolution: 10 ms, 2 TPS
        • Cache configuration
            • queryResultCache: maxSize=1024
            • documentCache: maxSize=1024
            • fieldValueCache & filterCache disabled
    • Wrap Up
        • Performance always as top priority
        • Develop simple but robust services
        • Provide a simple API
    • Q&A
    • Contact
        • Esteban Donato
            • Esteban.donato@travelocity.com
            • Twitter: @eddonato
        • Sudhakar Karegowdra
            • Sudhakar.karegowdra@travelocity.com
            • Twitter: @skaregowdra
        • https://www.facebook.com/travelocity
        • Twitter: @travelocity and @RoamingGnome