Using Solr in Online Travel Shopping to Improve User Experience

Published in: Technology
Transcript of "Using Solr in Online Travel Shopping to Improve User Experience"

  1. Using Solr in Online Travel to Improve User Experience
     Sudhakar Karegowdra, Esteban Donato
     Travelocity, May 25th, 2011
     {sudhakar.karegowdra, esteban.donato}@travelocity.com
  2. What We Will Cover
     • Travelocity
     • Speakers Background
     • Merchandising & Solr
        – Challenges
        – Solution
        – Sizing and performance data
        – Take Away
     • Location Resolution & Solr
        – Challenges
        – Solution
        – Sizing and performance data
        – Take Away
     • Q&A
  3. Travelocity
     • First Online Travel Agency (OTA) launched in 1996
     • Grown to 3,000 employees and is one of the largest travel agencies worldwide
     • Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few
     • In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
     • Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com, and Zuji, among others
  4. Speakers Background
     • Sudhakar Karegowdra
        – Principal Architect, Travelocity.com
        – Experience: 13+ years
        – Solr/Lucene: 3 years
        – Implementing Hadoop, Pig and Hive for the data warehouse
        – Topic: Merchandising
     • Esteban Donato
        – Lead Architect, Travelocity.com
        – Experience: 10+ years
        – Solr: 2 years
        – Analyzing Mahout and Carrot2 for a document clustering engine
        – Topic: Location Resolution
  5. Merchandising
     By Sudhakar Karegowdra
  6. The Challenge
     • Market Drivers
        – Build landing pages with faceted navigation
        – Enable content segmentation and delivery
        – Support rollout of promotions
        – Roll up data to a higher level. E.g., "All 5-star hotels in California" brings together the 5-star hotels from SFO, LAX, SAN, etc.
        – Faster time to market for new ideas
        – Rapidly scale to accommodate global brands with disparate data sources
  7. The Challenge
     • Traditional database approach
        – Longer time to market
        – Specialized skill set needed to design and optimize database structures and queries
        – Aggregating data and changing structures is quite complex
        – Building faceted navigation needs complex logic, leading to high maintenance cost
  8. Solution - Overview
     • Data from various sources aggregated and ingested into Solr
        – Core per locale and product type
     • Wrapper service to combine some data across product cores and manage configuration rules
     • Solr's built-in search and faceting to power the navigation
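The "core per locale and product type" layout combined with Solr's faceting can be sketched as a small URL builder. This is only an illustration: the host, the core naming scheme (`en_US_hotel`), and the field names (`state`, `star_rating`, `city`, `amenity`) are assumptions, not taken from the slides. The query parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`) are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical host; the slides do not name one.
SOLR_BASE = "http://localhost:8983/solr"


def core_name(locale: str, product: str) -> str:
    """Assumed naming scheme: one core per locale and product type."""
    return f"{locale}_{product}"


def facet_query_url(locale, product, filters, facet_fields, rows=10):
    """Build a select URL that filters documents and requests facet counts."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # Each filter becomes its own fq so Solr can cache it independently.
    params += [("fq", f"{field}:{value}") for field, value in filters]
    params += [("facet.field", field) for field in facet_fields]
    return f"{SOLR_BASE}/{core_name(locale, product)}/select?{urlencode(params)}"


url = facet_query_url(
    "en_US",
    "hotel",
    filters=[("state", "CA"), ("star_rating", "5")],
    facet_fields=["city", "amenity"],
)
print(url)
```

Keeping each constraint in a separate `fq` is one reason a large `filterCache` pays off for faceted landing pages: the same filters recur across many page variants.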
  9. Solution – Architecture View (diagram, top to bottom)
     • Clients: UI, Widgets, Mobile
     • Services/Business Logic
     • Solr Slaves (Multi Core)
     • Solr Master (Multi Core)
     • Data sources and tooling: Offer Management, Oracle, ETL Tool, Deals, Products, ……
  10. Solution - Achievements
     • Millions of unique long-tail landing pages
        – E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green
     • Faster search across products
        – E.g., Beach Deals under $500
     • Segmented content delivery through tagging
     • Scaled well to distribute the content to different brands, partners and advertisers
     • Opened up for other innovative applications
        – Deals on Map, Deals on Mobile, Wizards, etc.
  11. Solution – Road Ahead
     • Migration to Solr 3.1
        – Geospatial search
        – CSV output format
     • Query boosting by search pattern
     • Near-real-time updates
     • Deal and user behavior mining in Hadoop – MapReduce and Solr to serve the content
     • Move slaves to the cloud
  12. Sizing & Performance
     • Index stats
        – Number of cores: 25
        – Number of documents: ~1 million records
     • Response
        – Requests: 70 TPS
        – Average response time: 0.005 seconds (5 ms)
     • Software versions
        – Solr version 1.4.0 (filterCache size: 30000)
        – Tomcat 5.5.9
        – JDK 1.6
  13. Take Away
     • Semi-structured storage in Solr helps aggregate disparate sources easily (remember dynamic fields)
     • Multiple cores to manage multiple locale data
     • Solr is a great enabler of "Innovations"
  14. Location Resolution
     By Esteban Donato
  15. The Challenge
     • How to develop a global location resolution service?
     • Flexibility to changes
     • General enough to cover everyone's needs
     • Multi-language
     • Performance and scalability
     • Configurable by site
  16. Architecture of the solution
     • Diagram components: Auto-complete and Resolution clients, Solr Slave, Solr Master, Management Tool, Batch Job, Location DB
     • Master/Slave architecture
     • Multi-core: each core represents a language
     • SolrJ client, binary format
     • Solr response cache
     • Remote Streaming indexing
     • CSV format
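Indexing a CSV export through remote streaming could look roughly like the request built below. This is a sketch under assumptions: the host, the core name (`es`, one core per language), and the file path are hypothetical, and the slides do not show the actual batch job. The `/update/csv` handler and the `stream.file` and `commit` parameters are standard Solr features; `stream.file` only works when remote streaming is enabled in `solrconfig.xml`.

```python
from urllib.parse import urlencode


def csv_update_url(base: str, core: str, csv_path: str, commit: bool = True) -> str:
    """Build a URL asking Solr to read and index a CSV file local to the master."""
    params = {
        # Path on the Solr master's own filesystem (hypothetical here).
        "stream.file": csv_path,
        "stream.contentType": "text/plain;charset=utf-8",
        "commit": "true" if commit else "false",
    }
    return f"{base}/{core}/update/csv?{urlencode(params)}"


update_url = csv_update_url(
    "http://localhost:8983/solr", "es", "/data/locations_es.csv"
)
print(update_url)
```

Pointing Solr at a file instead of posting the rows over HTTP keeps the batch job small: it only has to export the location database to CSV and issue one request per language core.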
  17. Auto-complete
     • System has to suggest options as the users type their desired location
     • Examples: "san" => San Francisco, "veg" => Las Vegas
     • Relevancy: not all the locations are equally important. "par" => "Paris, France"; "Parana, Argentina"
     • Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
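The behavior described above (prefix match on several fields, ordered by importance) can be illustrated with a small in-memory sketch. It is not the Solr implementation from the talk: the `Location` records, the rank values, and the 0 to 100 rank scale are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    code: str
    name: str
    country: str
    rank: int  # hypothetical importance score, higher is more important


# Illustrative data only; ranks are invented.
LOCATIONS = [
    Location("PAR", "Paris", "France", 95),
    Location("PRA", "Parana", "Argentina", 40),
    Location("SFO", "San Francisco", "United States", 90),
    Location("SAN", "San Diego", "United States", 80),
    Location("LAS", "Las Vegas", "United States", 85),
]


def suggest(prefix: str, limit: int = 5) -> list:
    """Return locations where any word of any field starts with the prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    hits = [
        loc
        for loc in LOCATIONS
        if any(
            token.startswith(p)
            for field in (loc.code, loc.name, loc.country)
            for token in field.lower().split()
        )
    ]
    # More important locations first, mirroring the relevancy requirement.
    hits.sort(key=lambda loc: -loc.rank)
    return [f"{loc.name}, {loc.country}" for loc in hits[:limit]]


print(suggest("par"))
print(suggest("veg"))
```

Matching on word starts rather than on the whole string is what lets "veg" find Las Vegas.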
  18. Solr schema
     <dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
     <field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
     <fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
         <filter class="solr.ISOLatin1AccentFilterFactory" />
         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
         <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all" />
       </analyzer>
     </fieldType>
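To see what this analyzer chain does to a location string, the following sketch approximates it in plain Python. It assumes the tokenizer pattern is meant to split on slash, hyphen, tab and space, approximates the accent filter with Unicode decomposition, and leaves out the duplicate-removal filter (which only drops identical tokens at the same position). It is an emulation for illustration, not Solr's own code.

```python
import re
import unicodedata

# Assumed reading of the tokenizer pattern: split on '/', '-', tab or space.
SPLIT = re.compile(r"[/\-\t ]+")
# Same character class as the PatternReplaceFilterFactory in the schema.
PUNCTUATION = re.compile(r"[,.]")


def strip_accents(text: str) -> str:
    """Rough stand-in for the accent filter: drop combining marks after NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list:
    """Apply tokenize, lowercase, trim, accent folding and punctuation removal."""
    tokens = []
    for raw in SPLIT.split(text):
        token = strip_accents(raw.lower().strip())
        token = PUNCTUATION.sub("", token)
        if token:  # skip empty strings produced by leading/trailing separators
            tokens.append(token)
    return tokens


print(analyze("Paraná, Argentina"))
print(analyze("Dallas/Fort-Worth"))
```

Because the same chain runs at index and query time, a user typing "parana" without the accent or the comma still matches the indexed value.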
  19. Resolution
     • System has to resolve the location requested by the users
     • Handles aliases. Big Apple => New York
     • Handles ambiguities
     • Handles misspellings. Lomdon => London
        – NGramDistance algorithm
        – How to combine distance with relevancy
        – Errors suggesting the correct location when the input is a prefix. Lond => London
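One way to think about combining string distance with relevancy is a weighted score, sketched below. Note the substitutions: the similarity here is a Dice coefficient over character bigrams, a simpler stand-in for Lucene's `NGramDistance`, and the candidate names, ranks, weight and threshold are all hypothetical values chosen for the example.

```python
def bigrams(word: str) -> set:
    """Character bigrams of the lowercased word, padded to mark start and end."""
    padded = f" {word.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def similarity(a: str, b: str) -> float:
    """Dice coefficient over bigram sets: 1.0 for identical, 0.0 for disjoint."""
    ga, gb = bigrams(a), bigrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical candidates with an invented 0-100 importance rank.
CANDIDATES = {"London": 95, "Londrina": 40, "Lombok": 50}


def resolve(query: str, weight: float = 0.8, threshold: float = 0.5):
    """Pick the candidate with the best blend of similarity and rank."""
    best, best_score = None, 0.0
    for name, rank in CANDIDATES.items():
        sim = similarity(query, name)
        if sim < threshold:
            continue  # too different to be a plausible correction
        score = weight * sim + (1 - weight) * rank / 100
        if score > best_score:
            best, best_score = name, score
    return best


print(resolve("Lomdon"))
print(resolve("Lond"))
```

The threshold keeps a very popular but unrelated location from winning on rank alone, while the rank term breaks ties between candidates that are similarly close to the input, as in the prefix case "Lond".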
  20. Spellchecker configuration
     <fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.KeywordTokenizerFactory" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
         <filter class="solr.ISOLatin1AccentFilterFactory" />
         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
         <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all" />
       </analyzer>
     </fieldType>
  21. Sizing & Performance
     • 4 cores with ~500,000 documents indexed each
     • Response times
        – Auto-complete: 15 ms, 20 TPS
        – Resolution: 10 ms, 2 TPS
     • Cache configuration
        – queryResultCache: maxSize=1024
        – documentCache: maxSize=1024
        – fieldValueCache & filterCache disabled
  22. Wrap Up
     • Performance always as top priority
     • Develop simple but robust services
     • Provide a simple API
  23. Q&A
  24. Contact
     • Esteban Donato
        – Esteban.donato@travelocity.com
        – Twitter: @eddonato
     • Sudhakar Karegowdra
        – Sudhakar.karegowdra@travelocity.com
        – Twitter: @skaregowdra
     • https://www.facebook.com/travelocity
     • Twitter: @travelocity and @RoamingGnome