Using Solr in Online Travel to Improve User Experience - by Sudhakar Karegowdra and Esteban Donato

  1. Using Solr in Online Travel to Improve User Experience
     Sudhakar Karegowdra, Esteban Donato
     Travelocity, May 25th, 2011
     {sudhakar.karegowdra, esteban.donato}@travelocity.com
  2. What We Will Cover
     • Travelocity
     • Speakers Background
     • Merchandising & Solr
       • Challenges
       • Solution
       • Sizing and performance data
       • Take Away
     • Location Resolution & Solr
       • Challenges
       • Solution
       • Sizing and performance data
       • Take Away
     • Q&A
  3. Travelocity
     • First online travel agency (OTA), launched in 1996
     • Grown to 3,000 employees and is one of the largest travel agencies worldwide
     • Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore and Buenos Aires, to name a few
     • In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
     • Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com and Zuji, among others
  4. Speakers Background
     • Sudhakar Karegowdra, Principal Architect, Travelocity.com
       • Experience: 13+ years
       • Solr/Lucene: 3 years
       • Implementing Hadoop, Pig and Hive for the data warehouse
       • Topic: Merchandising
     • Esteban Donato, Lead Architect, Travelocity.com
       • Experience: 10+ years
       • Solr: 2 years
       • Analyzing Mahout and Carrot2 for a document clustering engine
       • Topic: Location Resolution
  10. Merchandising, by Sudhakar Karegowdra
  11. The Challenge: Market Drivers
      • Build landing pages with faceted navigation
      • Enable content segmentation and delivery
      • Support rollout of promotions
      • Roll up data to a higher level
        • E.g., "all 5-star hotels in California" brings together the 5-star hotels from SFO, LAX, SAN, etc.
      • Faster time to market for new ideas
      • Rapidly scale to accommodate global brands with disparate data sources
  12. The Challenge: Traditional Database Approach
      • Longer time to market
      • Specialized skill set needed to design and optimize database structures and queries
      • Aggregating data and changing structures is quite complex
      • Building faceted navigation needs complex logic, leading to high maintenance cost
  13. Solution - Overview
      • Data from various sources aggregated and ingested into Solr
      • Core per locale and product type
      • Wrapper service to combine some data across product cores and manage configuration rules
      • Solr's built-in search and faceting to power the navigation
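As a rough illustration of "core per locale and product type" combined with Solr's built-in faceting, the sketch below builds a select URL for a roll-up such as all 5-star hotels in California. The host, the core naming scheme (`hotels_en_US`) and the field names (`state`, `star_rating`, `city`, `amenities`) are assumptions made for this example and do not come from the talk; `q`, `fq`, `rows`, `wt`, `facet`, `facet.field` and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_BASE = "http://localhost:8983/solr"  # assumed host, not from the talk


def core_name(product, locale):
    """Hypothetical naming scheme: one core per product type and locale."""
    return f"{product}_{locale}"


def faceted_query_url(product, locale, filters, facet_fields, rows=10):
    """Build a Solr select URL with filter queries and field facets."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", "1")]
    # One fq per filter, so each filter can be cached on its own.
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    params += [("facet.field", field) for field in facet_fields]
    return f"{SOLR_BASE}/{core_name(product, locale)}/select?{urlencode(params)}"


url = faceted_query_url(
    "hotels", "en_US",
    filters={"state": "California", "star_rating": "5"},
    facet_fields=["city", "amenities"],
)
print(url)
# Decode the query string again to show the filters that were sent.
print(parse_qs(urlsplit(url).query)["fq"])
```

Faceting on `city` is what would let a California-level page list SFO, LAX and SAN hotels with counts without a second query.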
  14. Solution - Architecture View
      Diagram components, in slide order:
      • UI, Widgets, Mobile
      • Services/Business Logic
      • Solr Slaves (Multi Core)
      • Solr Master (Multi Core)
      • Offer Management Tool
      • Oracle
      • ETL
      • Products, Deals, ……
  15. Solution - Achievements
      • Millions of unique long-tail landing pages
        • E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green
      • Faster search across products
        • E.g., beach deals under $500
      • Segmented content delivery through tagging
      • Scaled well to distribute the content to different brands, partners and advertisers
      • Opened up for other innovative applications
        • Deals on Map, Deals on Mobile, Wizards, etc.
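The example landing-page URL suggests how a long-tail page could map onto filter queries. The sketch below is a guess at that mapping, not the production logic: it assumes the slug layout `<product>-d<id>-<destination words>_<tag>_<tag>...`, and the field names `destination_id` and `tags` are hypothetical.

```python
import re
from urllib.parse import urlsplit

EXAMPLE = ("http://www.travelocity.com/"
           "hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green")


def parse_landing_url(url):
    """Split a landing-page slug into product, destination id and tags.

    Assumed layout: <product>-d<id>-<destination words>_<tag>_<tag>...
    """
    slug = urlsplit(url).path.strip("/")
    base, *tags = slug.split("_")
    match = re.fullmatch(r"([a-z]+)-d(\d+)-(.+)", base)
    if match is None:
        raise ValueError(f"unrecognized landing-page slug: {slug!r}")
    product, dest_id, words = match.groups()
    return {"product": product, "destination_id": int(dest_id),
            "destination": words, "tags": tags}


def to_filter_queries(parsed):
    """Turn the parsed slug into Solr fq clauses (field names are hypothetical)."""
    fqs = [f'destination_id:{parsed["destination_id"]}']
    fqs += [f'tags:"{tag}"' for tag in parsed["tags"]]
    return fqs


parsed = parse_landing_url(EXAMPLE)
print(parsed)
print(to_filter_queries(parsed))
```

Because every combination of destination and tags is just a different set of filter queries, the number of distinct pages grows without any per-page authoring.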
  16. Solution - Road Ahead
      • Migration to Solr 3.1
      • Geospatial search
      • CSV output format
      • Query boosting by search pattern
      • Near-real-time updates
      • Deal and user behavior mining in Hadoop (MapReduce), with Solr serving the content
      • Move slaves to the cloud
  17. Sizing & Performance
      • Index stats
        • Number of cores: 25
        • Number of documents: ~1 million records
      • Response
        • Requests: 70 TPS
        • Average response time: 0.005 seconds (5 ms)
      • Software versions
        • Solr 1.4.0
        • filterCache size: 30000
        • Tomcat 5.5.9
        • JDK 1.6
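One way to read the response figures is through Little's Law, which relates throughput and latency to the average number of requests in flight. This is a back-of-envelope sketch that assumes steady state and treats the quoted numbers as totals; the slide does not say how many slaves share the load.

```python
def average_in_flight(requests_per_second, avg_response_seconds):
    """Little's Law: mean concurrent requests = arrival rate * mean time in system."""
    return requests_per_second * avg_response_seconds


concurrency = average_in_flight(70, 0.005)
print(f"{concurrency:.2f} requests in flight on average")
```

At well under one concurrent request on average, the quoted load leaves a lot of headroom, which fits the talk's point that Solr handled this workload comfortably.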
  18. Take Away
      • Semi-structured storage in Solr helps aggregate disparate sources easily
      • Remember dynamic fields
      • Multiple cores to manage multiple locale data
      • Solr is a great enabler of "innovations"
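The dynamic-fields point can be made concrete with a small sketch of how records from unrelated sources might be flattened into Solr documents without changing the schema for every new attribute. It assumes the schema declares the `*_s`, `*_i`, `*_f` and `*_b` patterns found in Solr's example schema; the talk does not show the patterns actually used, and the sample records are invented.

```python
def to_solr_doc(doc_id, source, record):
    """Flatten a source record into a Solr document using dynamic-field suffixes.

    Assumes dynamicField patterns *_s, *_i, *_f and *_b exist in the schema.
    """
    suffix_for = {bool: "_b", int: "_i", float: "_f", str: "_s"}
    doc = {"id": doc_id, "source_s": source}
    for key, value in record.items():
        suffix = suffix_for.get(type(value))
        if suffix is None:
            continue  # nested or unsupported values are skipped in this sketch
        doc[f"{key}{suffix}"] = value
    return doc


hotel = to_solr_doc("hotel-1", "hotel_feed",
                    {"name": "Example Inn", "star_rating": 5, "green": True})
deal = to_solr_doc("deal-7", "deals_db",
                   {"title": "Beach Deal", "price": 499.0})
print(hotel)
print(deal)
```

The two documents share no fields beyond `id` and `source_s`, yet both can live in the same index and be faceted on whatever attributes they carry.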
  19. Location Resolution, by Esteban Donato
  20. The Challenge
      • How to develop a global location resolution service?
      • Flexible to change
      • General enough to cover everyone's needs
      • Multi-language
      • Performance and scalability
      • Configurable by site
  21. Architecture of the Solution
      Diagram components: Solr Slave (Auto-complete, Resolution), Solr Master, Location DB, Batch Job, Management Tool
      • Master/slave architecture
      • Multi-core: each core represents a language
      • Remote streaming indexing
      • CSV format
      • SolrJ client, binary format
      • Solr response cache
  25. Auto-complete
      • The system has to suggest options as users type their desired location
        • Examples: "san" => San Francisco, "veg" => Las Vegas
      • Relevancy: not all locations are equally important. "par" => "Paris, France"; "Parana, Argentina"
      • Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
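The auto-complete behavior described above (prefix matching across several fields, with more important locations first) can be sketched in memory as follows. This is a stand-in for what Solr would do with a prefix query and a sort on a rank field, not the service's actual code; the sample locations and their rank values are made up.

```python
LOCATIONS = [  # made-up sample data; rank values are illustrative only
    {"name": "Paris", "country": "France", "code": "PAR", "rank": 100},
    {"name": "Parana", "country": "Argentina", "code": "PRA", "rank": 10},
    {"name": "San Francisco", "country": "United States", "code": "SFO", "rank": 90},
    {"name": "Las Vegas", "country": "United States", "code": "LAS", "rank": 85},
]


def suggest(prefix, locations=LOCATIONS, limit=5):
    """Return display names whose name words or code start with the prefix,
    highest rank first."""
    needle = prefix.strip().lower()
    hits = []
    for loc in locations:
        tokens = loc["name"].lower().split() + [loc["code"].lower()]
        if any(token.startswith(needle) for token in tokens):
            hits.append(loc)
    hits.sort(key=lambda loc: loc["rank"], reverse=True)
    return [f'{loc["name"]}, {loc["country"]}' for loc in hits[:limit]]


print(suggest("par"))
print(suggest("veg"))
print(suggest("san"))
```

Matching on every word of the name is what lets "veg" find Las Vegas, and sorting by rank is what puts Paris ahead of Parana for "par".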
  26. Solr schema
      <dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
      <field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true"/>
      <fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.PatternTokenizerFactory" pattern="[/-t ]+" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.ISOLatin1AccentFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
          <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
        </analyzer>
      </fieldType>
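To see what the `glsSearchField` analyzer chain does to a location string, the sketch below approximates it in plain Python. Two assumptions are worth flagging: the slide's tokenizer pattern `[/-t ]+` appears to have lost its backslashes, so the code assumes it was meant to split on slash, hyphen, tab and space; and Unicode decomposition is used as an approximation of `ISOLatin1AccentFilterFactory`, which it does not reproduce exactly. `RemoveDuplicatesTokenFilterFactory` is left out because it only drops duplicate tokens at the same position, which this chain does not produce.

```python
import re
import unicodedata

# Assumed intent of the slide's pattern: split on "/", "-", tab and space.
SPLIT_PATTERN = re.compile(r"[/\-\t ]+")


def fold_accents(text):
    """Approximate accent folding by dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value):
    """Approximate the chain: split, lowercase, trim, fold accents,
    then strip commas and periods. Empty tokens are dropped."""
    tokens = []
    for raw in SPLIT_PATTERN.split(value):
        token = fold_accents(raw.lower().strip())
        token = re.sub(r"[,.]", "", token)
        if token:
            tokens.append(token)
    return tokens


print(analyze("Paran\u00e1, Argentina"))
print(analyze("Rio-de-Janeiro/Brazil"))
```

Since the same chain would normally run on the user's input, "parana" typed without the accent still matches the indexed "Paraná".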
  27. Resolution
      • The system has to resolve the location requested by the user
      • Handles aliases: Big Apple => New York
      • Handles ambiguities
      • Handles misspellings: Lomdon => London
        • NGramDistance algorithm
        • How to combine distance with relevancy
        • Error when suggesting the correct location for a prefix: Lond => London
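The resolution steps above (aliases, then misspellings scored by string distance and blended with relevancy) can be sketched as below. This is not Lucene's `NGramDistance`: a simpler Dice coefficient over character bigrams is swapped in so the example stays self-contained. The alias table, location ranks, similarity threshold and rank weight are all invented for illustration.

```python
ALIASES = {"big apple": "New York"}          # sample alias table
LOCATIONS = {"New York": 95, "London": 90,   # name -> made-up rank (0-100)
             "Londonderry": 20, "Lisbon": 60}


def bigrams(text):
    """Set of adjacent character pairs in the lowercased text."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a, b):
    """Dice coefficient over character bigrams, between 0.0 and 1.0."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def resolve(query, min_similarity=0.5, rank_weight=0.2):
    """Resolve via alias, then exact name, then fuzzy match blended with rank."""
    key = query.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    for name in LOCATIONS:
        if name.lower() == key:
            return name
    scored = []
    for name, rank in LOCATIONS.items():
        sim = similarity(key, name)
        if sim >= min_similarity:
            # Blend string similarity with a normalized relevancy rank.
            scored.append((sim + rank_weight * rank / 100, name))
    return max(scored)[1] if scored else None


print(resolve("Big Apple"))
print(resolve("Lomdon"))
print(resolve("Lond"))
```

Blending in the rank is one possible answer to "how to combine distance with relevancy": for the prefix "Lond", both London and Londonderry pass the threshold, and the higher similarity plus higher rank favors London.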
  28. Spellchecker configuration
      <fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.ISOLatin1AccentFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
          <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
        </analyzer>
      </fieldType>
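A request against a spellchecker configured with this field type might be built as in the sketch below. The core URL (one core per language, here `en`) and the `/spell` handler path are assumptions borrowed from Solr's example configuration; `spellcheck`, `spellcheck.q` and `spellcheck.count` are standard SpellCheckComponent parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def spellcheck_url(core_url, user_input, count=5):
    """Build a spellcheck request URL for one language core."""
    params = {
        "q": user_input,
        "spellcheck": "true",
        "spellcheck.q": user_input,      # text to check, analyzed as one token
        "spellcheck.count": str(count),  # maximum number of suggestions
        "rows": "0",                     # only suggestions are needed here
        "wt": "json",
    }
    return f"{core_url}/spell?{urlencode(params)}"


url = spellcheck_url("http://localhost:8983/solr/en", "lomdon")
print(url)
print(parse_qs(urlsplit(url).query)["spellcheck.q"])
```

Because the field type uses `KeywordTokenizerFactory`, a multi-word name such as "new york" is kept as a single token, so suggestions are whole location names and not individual words.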
  29. Sizing & Performance
      • 4 cores with ~500,000 documents indexed each
      • Response times
        • Auto-complete: 15 ms, 20 TPS
        • Resolution: 10 ms, 2 TPS
      • Cache configuration
        • queryResultCache: maxSize=1024
        • documentCache: maxSize=1024
        • fieldValueCache and filterCache disabled
  30. Wrap Up
      • Performance is always the top priority
      • Develop simple but robust services
      • Provide a simple API
  31. Q&A
  32. Contact
      • Esteban Donato: Esteban.donato@travelocity.com, Twitter: @eddonato
      • Sudhakar Karegowdra: Sudhakar.karegowdra@travelocity.com, Twitter: @skaregowdra
      • Travelocity: https://www.facebook.com/travelocity, Twitter: @travelocity and @RoamingGnome