Amazon CloudSearch Session With Elsevier: re:Invent 2013

Session SVC302 from re:Invent 2013
Today's applications work across many different data assets - documents stored in Amazon S3, metadata stored in NoSQL data stores, catalogs and orders stored in relational database systems, raw files in filesystems, etc. Building a great search experience across all these disparate datasets and contexts can be daunting. Amazon CloudSearch provides simple, low-cost search, enabling your users to find the information they are looking for. In this session, we will show you how to integrate search with your application, including key areas such as data preparation, domain creation and configuration, data upload, integration of search UI, search performance and relevance tuning. We will cover search applications that are deployed for both desktop and mobile devices. Peter Simpkin from Elsevier provides a summary of their use of CloudSearch.

Published in: Technology, Business

  1. 1. Enrich Search User Experience For Different Parts of Your Application Using Amazon CloudSearch Jon Handler, CloudSearch Solution Architect November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. 2. Agenda
     • Sourcing your documents
     • Retrieval and ranking
     • Search user interface
     • Performance and Scale
     • Developer example: Peter Simpkin, Solution Architect, Elsevier
  3. 3. Architecting with CloudSearch
  4. 4. Hands-Off Operation [Diagram: a grid of search instances. One axis, "Document Quantity and Size", runs across index partitions 1 to n; the other, "Search Request Volume and Complexity", runs across copies 1 to n of each partition.]
  5. 5. MovieMate Application
  6. 6. Multiple Sources Multiple Functions
  7. 7. Mobile Experience [Screenshot: mobile search for "Iron Man"]
     • Iron Man 3 (2013): When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.
     • Iron Man 2 (2010): Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...
     • Iron Man (2008): When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
     • The Man With The Iron Fists (2012): On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...
     Tab bar: Movies, Search, Social, Nearby, Account
  8. 8. Agenda
     • Sourcing your documents
     • Retrieval and ranking
     • Search user interface
     • Performance and Scale
     • Developer example: Peter Simpkin, Elsevier Oxford
  9. 9. CloudSearch Documents
     • Unique identifier
     • Version
     • Fields
       – Indexed according to configuration
       – Source of matches
  10. 10. Application Content [Diagram: content spread across Amazon RDS, DynamoDB, and Amazon S3: user actions; help files; movie data; theater data; media (clips, images); user reviews, lists, etc.; articles]
  11. 11. Bootstrap Strategy [Diagram components: Source System, Processing Script (Amazon EC2), Queuing (Amazon SQS), Batching (Amazon EC2), Amazon CloudSearch]
  12. 12. Document Construction
     • One source will be the master
     for each record
       determine doc id and version
       create fields
       for each auxiliary source
         gather additional data
       send or queue the document
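The pseudocode on this slide can be sketched in Python roughly as follows. The record layout, the `review_count` auxiliary source, and the timestamp-based version are assumptions made for illustration; the `type`/`id`/`version`/`lang`/`fields` shape is assumed to follow the 2011-02-01 search data format (SDF).

```python
import json
import time


def build_document(record, aux_sources, version=None):
    """Build one 'add' document from a master record plus auxiliary data."""
    # Fields from the master source; skip the id and any empty values.
    fields = {k: v for k, v in record.items() if k != "id" and v is not None}
    # Each auxiliary source contributes extra fields for the same record.
    for fetch in aux_sources:
        fields.update(fetch(record))
    return {
        "type": "add",
        "id": str(record["id"]).lower(),
        # A timestamp is one simple way to get increasing versions (assumption).
        "version": int(time.time()) if version is None else version,
        "lang": "en",
        "fields": fields,
    }


def review_count(record):
    """Hypothetical auxiliary source: look up extra data for a movie."""
    return {"review_count": 2}


movie = {"id": "m1", "title": "Iron Man", "description": None}
doc = build_document(movie, [review_count], version=1)
# A batch is sent (or queued) as a JSON array of documents.
print(json.dumps([doc], sort_keys=True))
```

Keeping the auxiliary lookups as plain callables makes it easy to add or drop a source without touching the loop.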
  13. 13. Relational DB [Schema diagram: tables Addresses, Movie, Theater, Showtimes; columns shown include Street, City, State, Title, Name, Description, AddressesID, TheaterID, ShowtimesID, Date, Time]
  14. 14. S3
     • Clips, images, reviews
     • Apache Tika to extract content
     • S3 Metadata for additional fields
  15. 15. DynamoDB [Diagram: mapping from DynamoDB to CloudSearch: Table → Domain, Item → Document, Attribute → Field]
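A minimal sketch of the item-to-document mapping, assuming items arrive in DynamoDB's low-level typed format (`{"S": ...}`, `{"N": ...}`, `{"SS": [...]}`). The attribute names are made up, and lower-casing field names and skipping other attribute types are choices made for this sketch only.

```python
def item_to_fields(item):
    """Map a DynamoDB item (low-level typed format) to CloudSearch fields."""
    fields = {}
    for name, typed_value in item.items():
        # Each attribute looks like {"S": "text"}, {"N": "42"} or {"SS": [...]}.
        (kind, value), = typed_value.items()
        if kind == "N":
            value = int(float(value))   # numbers arrive as strings
        elif kind not in ("S", "SS"):
            continue                    # skip types this sketch does not handle
        fields[name.lower()] = value
    return fields


item = {"Title": {"S": "Iron Man"}, "Year": {"N": "2008"},
        "Genres": {"SS": ["Action", "Sci-Fi"]}, "Poster": {"B": "..."}}
fields = item_to_fields(item)
print(fields)
```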
  16. 16. [Screenshot repeated from slide 7: mobile search results for "Iron Man"]
  17. 17. Searching Show Times
     id  title     description  t_name  t_street  date   time
     1   Iron Man  ...          Galaxy  Main      11/11  12:30pm
     2   Iron Man  ...          Galaxy  Main      11/11  1:15pm
     3   Iron Man  ...          Galaxy  Main      11/11  2:45pm
     4   Iron Man  ...          Galaxy  Main      11/11  6:00pm
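The table above is a denormalized join: movie and theater columns are repeated on every show time so that each row can be indexed as its own document. A small sketch of that flattening, assuming the movie, theater, and show-time rows have already been read from the relational database (the dictionary keys on the input side are hypothetical):

```python
def showtime_documents(movie, theater, showtimes):
    """Return one flat document per show time (denormalized join)."""
    docs = []
    for doc_id, (date, time_of_day) in enumerate(showtimes, start=1):
        docs.append({
            "id": str(doc_id),
            "title": movie["title"],
            "description": movie["description"],
            "t_name": theater["name"],
            "t_street": theater["street"],
            "date": date,
            "time": time_of_day,
        })
    return docs


movie = {"title": "Iron Man", "description": "..."}
theater = {"name": "Galaxy", "street": "Main"}
showtimes = [("11/11", "12:30pm"), ("11/11", "1:15pm"),
             ("11/11", "2:45pm"), ("11/11", "6:00pm")]
docs = showtime_documents(movie, theater, showtimes)
for d in docs:
    print(d["id"], d["title"], d["t_name"], d["date"], d["time"])
```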
  18. 18. Heterogeneous Data
  19. 19. Multi Domain
  20. 20. Updating CloudSearch [Diagram components: Users, Web Server (Amazon EC2), Amazon SQS, Update Processor (Amazon EC2), DynamoDB, Amazon RDS, Amazon S3, Amazon CloudSearch]
  21. 21. Section Summary
     • Multiple sources
     • Bootstrap / Update
     • Heterogeneous data
  22. 22. Agenda
     • Sourcing your documents
     • Retrieval and ranking
     • Search user interface
     • Performance and Scale
     • Developer example: Peter Simpkin, Elsevier Oxford
  23. 23. Good Matches [Screenshot repeated from slide 7: search results for "Iron Man", labeled "Good Matches"]
  24. 24. The Search Algorithm
     • Locate documents that satisfy Boolean constraints
       – Usually intersection
     • Relevance rank those documents
       – Differentiates from databases by relevance
  25. 25. Document Structure: Movie (title, description, user_rating, likes, release_date, latitude, longitude)
  26. 26. Configuring for Search
     • Text fields for individual word search
       – User-generated and external text: titles, descriptions
     • Literal fields for exact matches
       – Application-generated text like facets
     • Integer fields for range searching and ranking
  27. 27. Searching Text
     http(s)://<endpoint>/2011-02-01/search?
     • Simple searches
       – q=<text>
     • Filtering
       – bq=(or title:'iron' (and description:'iron' description:'man'))
     • Filtering with integer ranges
       – bq=(and 'iron man' year:..2010)
     • Geo filtering
       – bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
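The filters on this slide contain spaces, quotes, and parentheses, so they need to be percent-encoded before they go into a URL. A short sketch using only the Python standard library; the endpoint name is hypothetical, and the path follows the `2011-02-01` URL shown on the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_VERSION = "2011-02-01"


def search_url(endpoint, params):
    """Return a search URL with every parameter percent-encoded."""
    return f"https://{endpoint}/{API_VERSION}/search?{urlencode(params)}"


# Hypothetical endpoint; a real one comes from the search domain itself.
ENDPOINT = "search-movies-example.us-east-1.cloudsearch.amazonaws.com"

simple = search_url(ENDPOINT, {"q": "iron man"})
ranged = search_url(ENDPOINT, {"bq": "(and 'iron man' year:..2010)"})
print(simple)
print(ranged)

# Round-trip check: decoding the query string gives back the original filter.
decoded = parse_qs(urlsplit(ranged).query)["bq"][0]
print(decoded)
```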
  28. 28. Search Results
     {"rank": "-text_relevance",
      "match-expr": "(label 'iron man')",
      "hits": { "found": 204, "start": 0,
                "hit": [ { "id": "sontsst12cf5f88b42" },
                         { "id": "sopvopr12ab017f082" },
                         { "id": "sorzrpw12ac468a13b" } ] },
      ...
     }
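A response like the one on this slide can be read with the standard `json` module. The sketch below uses the slide's sample values, with the elided part of the response left out, and pulls out the total match count and the document ids for the current page.

```python
import json

RESPONSE = """
{"rank": "-text_relevance",
 "match-expr": "(label 'iron man')",
 "hits": {"found": 204, "start": 0,
          "hit": [{"id": "sontsst12cf5f88b42"},
                  {"id": "sopvopr12ab017f082"},
                  {"id": "sorzrpw12ac468a13b"}]}}
"""


def hit_ids(response_text):
    """Return (total matches, ids on this page) from a search response."""
    hits = json.loads(response_text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]


found, ids = hit_ids(RESPONSE)
print(found, ids)
```

Note that `found` counts every match, while `hit` holds only the page that was requested.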
  29. 29. Relevant Results [Screenshot repeated from slide 7: search results for "Iron Man", labeled "Relevant Results"]
  30. 30. Customizing Ranking
     • text_relevance and cs.text_relevance
     • Rank expressions
       – Compute a score for each document
       – &rank=<function>
     • Defined in the console
     • Defined at query-time
       – &q='iron-man'&rank-recency=text_relevance + year&rank=recency
  31. 31. Field Weighting
  32. 32. Field Weighting
     • Adjust relative importance of fields
     • &rank-title=cs.text_relevance({"weights":{"title":4.0}, "default_weight":1})
  33. 33. Popularity
  34. 34. Popularity
     • Convert floating point to integer
     • Weight by the number of ranks
     • rank-pop=text_relevance + log10(user-rating * number-user-ranks) * 10 + metascore * 3
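To get a feel for how the popularity expression behaves, it can be evaluated offline. This is a plain Python rendering of the slide's formula, not CloudSearch code; the sample inputs, the rating scaling (7.9 stored as 79), and the guard against `log10(0)` are assumptions added for the sketch.

```python
import math


def popularity_rank(text_relevance, user_rating, number_user_ranks, metascore):
    """Python equivalent of the rank-pop expression on the slide."""
    # Guard against log10(0) for unrated movies (an added assumption).
    weighted = max(1, user_rating * number_user_ranks)
    return text_relevance + math.log10(weighted) * 10 + metascore * 3


# user_rating is stored as an integer, e.g. 7.9 stars -> 79 (assumed scaling).
many_votes = popularity_rank(200, 79, 1000, 60)
few_votes = popularity_rank(200, 79, 10, 60)
print(many_votes, few_votes)
```

Because of the logarithm, a hundredfold increase in the number of ratings adds only 20 points, so popularity nudges the order without swamping text relevance.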
  35. 35. Freshness
  36. 36. Freshness
     • Exponential decay function: $r = c\,e^{-\lambda t}$
     • &rank-decay=text_relevance + 200*Math.exp(-0.1*days_ago)
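With the slide's constants ($c = 200$, $\lambda = 0.1$ per day), the freshness bonus halves roughly every $\ln 2 / \lambda \approx 6.9$ days. A small sketch to check this; the function and parameter names are illustrative only.

```python
import math


def freshness(days_ago, c=200.0, decay=0.1):
    """Exponential decay bonus: r = c * e^(-decay * t)."""
    return c * math.exp(-decay * days_ago)


half_life = math.log(2) / 0.1   # days until the bonus drops to c / 2
for days in (0, 7, 30):
    print(days, round(freshness(days), 1))
print("half-life in days:", round(half_life, 2))
```

Choosing $\lambda$ is therefore a matter of deciding how many days it should take for a new title to lose half of its boost.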
  37. 37. Location Sort [Screenshot: mobile search for "Iron Man"; tab bar: Movies, Search, Social, Nearby, Account]
  38. 38. Location Sort
     Document: Movie (title, description, user_rating, likes, release_date, latitude, longitude)
     • Latitude and longitude expressed as integers
     • Denormalized for particular theaters with locations
  39. 39. Location Sort
     • Cartesian distance function: $\sqrt{(lat - lat_{user})^2 + (lon - lon_{user})^2}$
     • &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))
     • &rank=-geo
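A sketch of the two pieces involved: turning coordinates into the non-negative integers the index stores, and substituting the user's position into the rank expression. The encoding used here (shift by 90 or 180 degrees, then multiply by 100) is an assumption; the slides only say that latitude and longitude are expressed as integers.

```python
import math


def to_uint(lat_deg, lon_deg, scale=100):
    """Encode degrees as non-negative integers (assumed encoding)."""
    return round((lat_deg + 90) * scale), round((lon_deg + 180) * scale)


def distance(doc_lat, doc_lon, user_lat, user_lon):
    """Cartesian distance between two encoded points."""
    return math.sqrt((doc_lat - user_lat) ** 2 + (doc_lon - user_lon) ** 2)


def rank_geo(user_lat, user_lon):
    """Fill the user's encoded position into the rank-geo expression."""
    return (f"sqrt(pow(latitude - {user_lat}, 2) "
            f"+ pow(longitude - {user_lon}, 2))")


user = to_uint(37.0, -123.0)
print(user)
print(rank_geo(*user))
print(distance(12703, 5704, *user))
```

Cartesian distance on raw degrees ignores the curvature of the earth, which is usually acceptable for ordering nearby theaters but not for measuring real distances.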
  40. 40. Rank Expressions: Combined
     • &rank-combined=text_relevance + 2.0 * geo + 0.5 * popularity + 0.3 * freshness
     • &rank=combined
  41. 41. Section Summary
     • Search API basics
     • Customizing ranking
       – Field weighting, popularity, freshness, geo, combined
     • Rank expression comparison tool
  42. 42. Agenda
     • Sourcing your documents
     • Retrieval and ranking
     • Search user interface
     • Performance and Scale
     • Developer example: Peter Simpkin, Elsevier Oxford
  43. 43. Facets
  44. 44. Facets
  45. 45. Simple Faceting: Document: Movie (title, description, genre)
  46. 46. Simple Faceting: Configuration
  47. 47. Simple Faceting: Query
     q=iron+man&facet=genre
     {"rank": "-text_relevance",
      "match-expr": "(label 'star wars')",
      "hits": {"found": 7, "start": 0, "hit": []},
      "facets": {
        "genre": {
          "constraints": [
            {"value": "Family", "count": 62},
            {"value": "Action/Adventure", "count": 21},
            {"value": "Drama", "count": 5},
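On the client side, the facet block of the response reduces to a list of value/count pairs. The sketch below parses a response shaped like the one on this slide, closed off after the three constraints shown so that it is valid JSON, and sorts the pairs for display.

```python
import json

RESPONSE = """
{"hits": {"found": 7, "start": 0, "hit": []},
 "facets": {"genre": {"constraints": [
     {"value": "Family", "count": 62},
     {"value": "Action/Adventure", "count": 21},
     {"value": "Drama", "count": 5}]}}}
"""


def facet_counts(response_text, facet):
    """Return [(value, count), ...] for one facet, largest count first."""
    constraints = json.loads(response_text)["facets"][facet]["constraints"]
    pairs = [(c["value"], c["count"]) for c in constraints]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


genres = facet_counts(RESPONSE, "genre")
for value, count in genres:
    print(f"{value} ({count})")
```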
  48. 48. Simple Faceting: UI
     <div class='facet'>
       <ul class='facet_list'>
         <?php
           // Each constraint carries a facet value and its document count.
           $genres = $resultsObj->facets->genre->constraints;
           for ($i = 0; $i < count($genres); $i++) {
             $curGenre = $genres[$i]->value; $curCount = $genres[$i]->count;
         ?>
         <li class='facet_item'>
           <div class='facet_name'><?=$curGenre?></div>
           <div class='facet_count'><?=$curCount?></div>
         </li>
         <?php } ?>
       </ul>
     </div>
  49. 49. Facets
  50. 50. Document: Movie (title, description, oscar1, oscar2, oscar3)
     • title: Lincoln
     • description: ...
     • oscar1: Awards
     • oscar2: Awards/Best Actor
     • oscar3: Awards/Best Actor/Daniel Day Lewis
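Each level of the hierarchy is stored in its own field, and each value repeats the full path down to that level. Generating those fields from a path is mechanical; a sketch, with the `oscar` prefix taken from the slide and the function name made up:

```python
def hierarchy_fields(path, prefix="oscar"):
    """Turn ['Awards', 'Best Actor', ...] into oscar1, oscar2, ... fields."""
    return {
        f"{prefix}{depth}": "/".join(path[:depth])
        for depth in range(1, len(path) + 1)
    }


fields = hierarchy_fields(["Awards", "Best Actor", "Daniel Day Lewis"])
for name, value in fields.items():
    print(f"{name}: {value}")
```

Storing the full path at every level keeps values unique across branches, so "Best Actor" under Awards and under Nominations are counted separately.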
  51. 51. Query
     &q=lincoln&facet=oscar1,oscar2,oscar3
     {"rank": "-text_relevance", "hits": {...},
      "facets": {
        "oscar1": {
          "constraints": [
            {"value": "Awards", "count": 23},
            {"value": "Nominations", "count": 124}]},
        "oscar2": {
          "constraints": [
            {"value": "Awards/Best Actor", "count": 6},
            {"value": "Awards/Best Actress", "count": 3}...]},
        "oscar3": {
          "constraints": [
            {"value": "Awards/Best Actor/Daniel Day Lewis", "count": 1},
            {"value": "Awards/Best Actor/Denzel Washington", "count": 2}...]},
  52. 52. Drilldown
     • bq=oscar1:'Awards'
     • bq=oscar2:'Awards/Best Actor'
     • bq=oscar3:'Awards/Best Actor/Daniel Day Lewis'
     • bq=(and 'star' oscar2:'Awards/Best Actor')
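When the user clicks a facet value, the application turns that click into one of the `bq` filters above. A minimal sketch of that step; backslash-escaping embedded single quotes is an assumption about the query syntax and should be checked against the API documentation.

```python
def quote(value):
    """Wrap a value in single quotes, backslash-escaping embedded quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def drilldown(field, value, text=None):
    """Build a bq filter for one facet value, optionally AND-ed with text."""
    term = f"{field}:{quote(value)}"
    return term if text is None else f"(and {quote(text)} {term})"


top_level = drilldown("oscar1", "Awards")
with_text = drilldown("oscar2", "Awards/Best Actor", text="star")
print(top_level)
print(with_text)
```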
  53. 53. Section Summary
     • Simple faceting
     • Hierarchical faceting
     • Hierarchical data handling
  54. 54. Agenda
     • Sourcing your documents
     • Retrieval and ranking
     • Search user interface
     • Performance and Scale
     • Developer example: Peter Simpkin, Elsevier Oxford
  55. 55. The Search Algorithm
     • Locate documents that satisfy Boolean constraints
       – Usually intersection
     • Relevance rank those documents
       – Differentiates from databases by relevance
  56. 56. Performance Best Practices
     • Match set size
     • Text queries perform better than integer queries
     • Complex relevance functions
  57. 57. Optimizing Index Size
     • Trade off literal and uint for cost/performance
     • Result fields matter most
     • Enabling faceting increases size
  58. 58. Wrap Up
     • Sourcing documents from various locations
     • Building queries and ranking
     • UI Components for faceting
     • Getting the most out of your index
  59. 59. Agenda
     • Sourcing your documents
     • Retrieval and ranking
     • Search user interface
     • Performance and Scale
     • Developer example: Peter Simpkin, Elsevier Oxford
  60. 60. Agenda
     • Elsevier Intro
     • Search Problem Statement
     • Enterprise Content Search
     • Hints and Tips
     • CloudSearch Observations
  61. 61. Elsevier
     • 7,000+ employees in 26 countries
     • 2,200 journals / article market share 25%
     • $3B revenue
     • Scientific, Technical & Medical
  62. 62. Customers
     • Academic Research Institutions
     • Government & Health
     • Corporate Research Labs
     • Individual Researchers
     Products
  63. 63. Content Challenges:
     • No central place for consumers to discover content
     • It is not currently possible to search and retrieve atomic assets
     • Assets are not reusable across products
     [Diagram: Content Systems, Consumer Platforms]
  64. 64. Empower our product development partners
     Search Opportunities:
     • Create a comprehensive inventory to easily discover content Elsevier owns
     • Provide access to granular / modular content they want at will
     • Assets must be uniquely addressable
     [Diagram: Enterprise Content Search Engine]
  65. 65. Enterprise Content Search eco-system [Diagram components: Amazon SWF, SDF metadata, E.U. Corporate Data center, Amazon S3, U.S. Corporate Data center, Amazon CloudSearch, DynamoDB, Federated Content Warehouse, Product Platform Data center]
  66. 66. Simple Search UI
  67. 67. Elsevier Technical Drivers & Approach •  Fully-managed, full featured search service in the cloud •  Automatically scales for data & traffic •  Easy to set up and use •  PoC created in days •  Search Engine as a Service •  Pay-as-you-go pricing model
  68. 68. Hints & Tips
     (and issn:'0022-1694'
       (and type:'1.2'
         (and (not action:'D')
           (or (and pubstartdate:..2013176 pubenddate:2005002..)
             (or (and pubstartdate:2005001 (and pubstarttime:0.. pubstarttime:..235959))
               (or (and pubstartdate:2013177 pubstarttime:..235959)
                 (or (and pubenddate:2005001 pubendtime:0..)
                   (and pubenddate:2013177 (and pubendtime:..235959 pubendtime:0..)))))))))
     • Query Response Time = 5 seconds
  69. 69. Optimising Nested Queries
     (and issn:'0022-1694' type:'1.2' (not action:'D')
       (or (and pubstartdate:..2013176 pubenddate:2005002..)
           (and pubstartdate:2005001 pubstarttime:0..235959)
           (and pubstartdate:2013177 pubstarttime:0..235959)
           (and pubenddate:2005001 pubendtime:0..)
           (and pubenddate:2013177 pubendtime:0..235959)))
     • Response Time = 2.5 seconds
  70. 70. Optimised Nested Query
     (and (not action:'D')
       (or (and issn:'0022-1694' type:'1.2' pubstartdate:..2013176 pubenddate:2005002..)
           (and issn:'0022-1694' type:'1.2' pubstartdate:2005001 pubstarttime:0..235959)
           (and issn:'0022-1694' type:'1.2' pubstartdate:2013177 pubstarttime:0..235959)
           (and issn:'0022-1694' type:'1.2' pubenddate:2005001 pubendtime:0..)
           (and issn:'0022-1694' type:'1.2' pubenddate:2013177 pubendtime:0..235959)))
     • Response Time = 0.17ms
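The rewrite on this slide copies the shared `issn` and `type` terms into every branch of the `or`, so each branch is a small, selective intersection. Generating that form by hand is error-prone; a sketch of a helper that does the distribution is shown here. The helper name and the explicit outer `and` are assumptions made for the sketch, and only two of the five branches are included to keep it short.

```python
def distribute(common, branches, exclude=None):
    """Copy the shared terms into every OR branch of a bq expression."""
    prefix = " ".join(common)
    ors = " ".join(f"(and {prefix} {branch})" for branch in branches)
    query = f"(or {ors})"
    return f"(and {exclude} {query})" if exclude else query


common = ["issn:'0022-1694'", "type:'1.2'"]
branches = [
    "pubstartdate:..2013176 pubenddate:2005002..",
    "pubstartdate:2005001 pubstarttime:0..235959",
]
bq = distribute(common, branches, exclude="(not action:'D')")
print(bq)
```

Whether the flattened form is faster for a given domain is an empirical question, so it is worth timing both versions against real data, as the presenters did.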
  71. 71. CloudSearch Observations
     • Facilitates knowledge sharing on content matters across Elsevier's product platforms
     • Ability to leverage content infrastructure and capabilities across Elsevier's divisions
     • Easy to integrate with existing on-premise Content Systems
     • Speed to market; allows developers to focus on building other core Content Strategy components
     • Need to spend time optimising queries to maximise performance
  72. 72. Please give us your feedback on this presentation SVC302 As a thank you, we will select prize winners daily for completed surveys! Thank You
