
AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch


Amazon CloudSearch is a fully managed service that makes it easy to set up, operate, and scale a search solution for your website or application. Traditional search solutions require significant time and resources to maintain and operate. In addition to the complexity involved, administration of a search system is also expensive. Amazon CloudSearch not only significantly lowers the cost of a search solution, but it also makes it easy to set up a search system that can change with the needs of the business.

During this session we will provide an overview of Amazon CloudSearch including recently launched powerful search and admin features, discuss popular use cases for CloudSearch, and share best practices that will help you fully leverage CloudSearch to build scalable search solutions for your websites and applications.


  1. 1. Build a Scalable Search Engine With the New Amazon CloudSearch
  2. 2. Agenda • What Search Engines Do • Amazon CloudSearch Introduction • Building With CloudSearch
  3. 3. What Search Engines Do
  4. 4. Search Engines Connect Us To Data
  5. 5. Documents
  6. 6. Representation of a Document (Field: Value) id: tt0371746; title: Iron Man; description: When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.; director: Jon Favreau; actors: Robert Downey Jr., Gwyneth Paltrow, Terrence Howard ...; rating: 7.9; release_date: 2008-05-02T00:00:00Z
  7. 7. Data Types Doubles Dates Signed Integers Text Literal
  8. 8. Geo • Latlon data type • Region search • Distance sort • Supports mobile
  9. 9. Text Processing (Normalization) • Tokenization (parsing) • Downcasing • Stemming • Stopword removal • Synonym Addition When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil. when wealth industrial tony stark force build armor suit after life threaten incident ultimate decide use technology fight against evil
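
The normalization steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not CloudSearch's actual analysis chain: the stopword list and the suffix-stripping `naive_stem` helper are assumptions made for the example, synonym addition is left out, and a real stemmer will produce different stems for some words.

```python
import re

# Illustrative stopword list (an assumption; real engines ship larger ones).
STOPWORDS = {"is", "to", "an", "a", "he", "its", "the"}


def naive_stem(token):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, downcase, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # tokenization + downcasing
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens]  # stemming


print(normalize("Tony Stark is forced to build an armored suit"))
```

The same `normalize` function would be applied to both documents and queries, so that both sides of a match use the same terms.
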
  10. 10. Indexing Term Documents (Posting List) Iron The Man in the Iron Mask Iron Man 2 Iron Man The Iron Giant The Iron Lady ... Man Rain Man The Man in the Moon Iron Man 2 The Lawnmower Man The Third Man Iron Man ...
  11. 11. Matching The Man in the Iron Mask Iron Man 2 Iron Man The Iron Giant The Iron Lady Rain Man The Man in the Moon Iron Man 2 The Lawnmower Man The Third Man Iron Man Iron Man 2 Iron Man
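
The indexing and matching slides describe posting lists and their intersection. A minimal sketch follows, assuming whitespace tokenization and made-up integer document ids; it is meant to show the data structure, not how CloudSearch stores its index.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it (a posting list)."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for term in title.lower().split():
            index[term].add(doc_id)
    return index


def match_all(index, query):
    """Return ids of documents containing every query term (an AND match)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical ids for a few of the titles shown on the slides.
titles = {
    1: "The Man in the Iron Mask",
    2: "Iron Man 2",
    3: "Iron Man",
    4: "The Iron Giant",
    5: "Rain Man",
}
index = build_index(titles)
print(sorted(match_all(index, "iron man")))
```

Note that a plain AND match also returns "The Man in the Iron Mask", since it contains both terms; deciding which match comes first is the job of ranking.
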
  12. 12. Ranking and Relevance • The meat of the search engine • TF-IDF – uniqueness and presence • Additional Criteria – Measures of document value (e.g. rating) – Observed user behavior – Freshness
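
As a rough illustration of the TF-IDF idea (a term counts for more when it is frequent in a document and rare across the collection), here is a small sketch. It uses one common textbook variant, term frequency times log(N / document frequency); the exact formula CloudSearch uses is not given in the slides, so treat the weighting as an assumption.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query_terms):
    """Score each document by summing tf * idf over the query terms."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        if not tokens:
            scores[doc_id] = 0.0
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term]:
                idf = math.log(n_docs / df[term])  # rarer terms weigh more
                score += (tf[term] / len(tokens)) * idf
        scores[doc_id] = score
    return scores


docs = {"a": "iron man", "b": "rain man", "c": "the iron giant"}
print(tf_idf_scores(docs, ["iron", "man"]))
```
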
  13. 13. Summary • Search makes data accessible • Search documents gather information about one search target • Inverted indices provide the basis of text-to-text matching • Relevance brings the best matches
  14. 14. Amazon CloudSearch
  15. 15. Building a Search service • Build your own – Extend datastores and build custom relevance engine • Open Source – Apache Solr, ElasticSearch • Legacy Enterprise Search – FAST, Autonomy, Endeca
  16. 16. Challenges with building a Search service • COMPLEX: Requires extensive search expertise • COSTLY: High upfront expenditure • SLOW: Long time to market. Slows innovation • UNDIFFERENTIATED: Operational overhead that doesn’t add value to core product
  17. 17. Where CloudSearch fits in the picture Amazon CloudSearch is a fully managed search service in the cloud that makes it easy to set up, operate, and scale a search solution for your website or application. Benefits similar to other AWS managed services • Easy to set up and operate (Console, SDK, CLT) • Pay as you go • No need to guess capacity • Experiment fast with low risk • Go global in minutes
  18. 18. Building With CloudSearch
  19. 19. Create a Domain
  20. 20. Upload Data
  21. 21. Document Upload http(s)://<document service endpoint>/2013-01-01/documents/batch Accept: application/json Content-Length: 1176 Content-Type: application/json Host: doc.imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com [ { "type" : "add", "id" : "tt0371746", "fields" : { "directors" : [ "Jon Favreau" ], "release_date" : "2008-04-14T00:00:00Z", "rating" : 7.9, "genres" : [ "Action", "Adventure", "Sci-Fi" ], "image_url" : "http://ia.media-imdb.com/images/M/MV5BMTczNTI2ODUwOF5BMl5BanBnXkFtZTcwMTU0NTIzMw@@._V1_SX400_.jpg", "plot" : "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.", "title" : "Iron Man", "rank" : 171, "running_time_secs" : 7560, "actors" : [ "Robert Downey Jr.", "Gwyneth Paltrow", "Terrence Howard" ], "year" : 2008 }}, { ..., "id" : "tt0434409" } ]
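
A document batch like the one on this slide is a JSON array of operations. The sketch below builds such a body with the standard library only; the helper names (`add_op`, `delete_op`, `build_batch`) are made up for this example, and sending the body to the document endpoint (for instance with an HTTP client or an AWS SDK) is left out.

```python
import json


def add_op(doc_id, fields):
    """An 'add' operation carries the document id and its field values."""
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_op(doc_id):
    """A 'delete' operation needs only the document id."""
    return {"type": "delete", "id": doc_id}


def build_batch(operations):
    """Serialize a list of operations into a UTF-8 JSON request body."""
    return json.dumps(operations).encode("utf-8")


body = build_batch([
    add_op("tt0371746", {
        "title": "Iron Man",
        "directors": ["Jon Favreau"],
        "year": 2008,
        "rating": 7.9,
    }),
])
print(len(body), "bytes to POST with Content-Type: application/json")
```
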
  22. 22. Simple Queries Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"
  23. 23. Simple Queries http(s)://<search endpoint>/2013-01-01/search?q=iron+man {"id": "tt0371746", "highlights": { "plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.", "title": "Iron Man"} }, {"id": "tt1866249", "highlights": { "plot": "A man in an iron lung who wishes to lose his virginity contacts a professional sex surrogate with the help of his therapist and priest.", "title": "The Sessions" } },
  24. 24. Complex Queries Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"
  25. 25. Complex Queries /search?q=(and 'iron' genres:'Sci-Fi/Fantasy' actors:'downey' year:[2008,2010] category:'Movies')&q.parser=structured& q.options={fields:['title^2','plot^0.5']} {"id": "tt0371746", "fields": { "title": "Iron Man", "year": "2008" }}, {"id": "tt1228705", "fields": { "title": "Iron Man 2", "year": "2010" }}
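
Both the simple and the structured query are just URL parameters, and the structured form contains characters (parentheses, quotes, brackets) that need percent-encoding. A minimal sketch of building such URLs follows; the `search_url` helper and the `search-example.invalid` host are placeholders, not part of the API.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2013-01-01"


def search_url(endpoint, q, params=None):
    """Build a search URL, percent-encoding the query and any extra parameters."""
    query = {"q": q}
    query.update(params or {})
    return f"https://{endpoint}/{API_VERSION}/search?" + urlencode(query, quote_via=quote)


# Simple query: plain text, default parser.
print(search_url("search-example.invalid", "iron man"))

# Structured query: boolean and range operators, selected with q.parser.
structured = "(and 'iron' actors:'downey' year:[2008,2010])"
print(search_url("search-example.invalid", structured, {"q.parser": "structured"}))
```
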
  26. 26. Faceting Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"
  27. 27. Feature Detail: Faceting /search?q=iron man&facet.genres={} {"status": {...},"hits": {...}, "facets": {"genres": { "buckets": [ {"value": "Action", "count": 62}, {"value": "Sci-Fi/Fantasy", "count": 25}, {"value": "Comedy", "count": 2}, {"value": "History", "count": 1},...
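
On the client side, facet buckets are typically turned into a list of filter links with counts. The sketch below reads the `facets` part of a response shaped like the one on this slide; the sample dictionary copies the slide's bucket values, and the `top_facet_values` helper is a name invented for this example.

```python
def top_facet_values(response, field, limit=10):
    """Return (value, count) pairs for one facet field, largest count first."""
    buckets = response.get("facets", {}).get(field, {}).get("buckets", [])
    pairs = [(bucket["value"], bucket["count"]) for bucket in buckets]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:limit]


# Only the facets portion of the response is reproduced here.
sample_response = {
    "facets": {
        "genres": {
            "buckets": [
                {"value": "Action", "count": 62},
                {"value": "Sci-Fi/Fantasy", "count": 25},
                {"value": "Comedy", "count": 2},
                {"value": "History", "count": 1},
            ]
        }
    }
}

for value, count in top_facet_values(sample_response, "genres", limit=3):
    print(f"{value} ({count})")
```
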
  28. 28. Adjustable Ranking Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"
  29. 29. Expressions • Baseline TF-IDF function provides textual relevance • Expressions use field sources or other expressions • Allows customization per-user or per-query
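
To make the idea of an expression concrete, here is a sketch of blending a text relevance score with a numeric field such as rating. The formula, the `rating_weight` parameter, and the sample scores are all assumptions for illustration; in CloudSearch the expression would be defined on the domain or passed with the query rather than computed client-side.

```python
def blended_score(text_score, rating, rating_weight=0.1):
    """Boost the text relevance score in proportion to the document's rating."""
    return text_score * (1.0 + rating_weight * rating)


def rank_hits(hits, rating_weight=0.1):
    """Order hits by blended score; the weight can differ per query or per user."""
    return sorted(
        hits,
        key=lambda hit: blended_score(hit["text_score"], hit["rating"], rating_weight),
        reverse=True,
    )


# Hypothetical hits with made-up scores.
hits = [
    {"id": "doc-a", "text_score": 1.0, "rating": 9.0},
    {"id": "doc-b", "text_score": 1.2, "rating": 2.0},
]
print([hit["id"] for hit in rank_hits(hits)])
print([hit["id"] for hit in rank_hits(hits, rating_weight=0.0)])
```

With the default weight the highly rated document wins; with a weight of zero the order falls back to pure text relevance.
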
  30. 30. Highlighting Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"
  31. 31. Feature Detail: Highlighting /search?q=iron+man&highlight.plot={"format":"text"} {"status": {"rid": "8Pq/88woCwrstGQ=","time-ms": 48}, "hits": {"found": 9,"start": 0, "hit": [{ "id": "tt1228705", "fields": { "title": "Iron Man 2" }, "highlights": { "plot": "With the world now aware of his identity as *Iron* *Man*, Tony Stark must contend..." } }, . . .
  32. 32. Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"
  33. 33. Feature Detail: Suggestions http://<endpoint>/2013-01-01/suggest?q=ir&suggester=title_sug {"status": {"rid": "t7mti80oAQrstGQ=","time-ms": 3}, "suggest": {"query": "ir", "found": 5, "suggestions": [ {"suggestion":"Iron Man Three","score": 0, "id": "tt0371746"}, { "suggestion": "Iron Man", "score": 0, "id": "tt1228705"},
  34. 34. Feature Detail: Availability Options
  35. 35. Feature Detail: Scaling Options
  36. 36. Feature Detail: IAM Integration Configuration API Only { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["cloudsearch:*"], "Resource": "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies" }, { "Effect": "Deny", "Action": ["cloudsearch:DeleteDomain"], "Resource": "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies" } ] }
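
The policy on this slide allows every CloudSearch configuration action on the domain and then denies `DeleteDomain`. In IAM an explicit Deny overrides any Allow, and anything not allowed is denied by default. The toy evaluator below illustrates only that rule; it is a heavy simplification (no principals, conditions, or multiple policies, and `fnmatchcase` wildcards are a rough approximation of IAM's), and the `is_allowed` helper is invented for this example.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy, action, resource):
    """Toy evaluation: explicit Deny wins, otherwise any matching Allow permits."""
    allowed = False
    for stmt in policy["Statement"]:
        # Action names are compared case-insensitively; ARNs case-sensitively.
        action_hit = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(stmt["Action"])
        )
        resource_hit = any(
            fnmatchcase(resource, pattern) for pattern in _as_list(stmt["Resource"])
        )
        if action_hit and resource_hit:
            if stmt["Effect"] == "Deny":
                return False  # explicit deny overrides everything else
            allowed = True
    return allowed  # False here means implicit deny


DOMAIN_ARN = "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies"
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["cloudsearch:*"], "Resource": DOMAIN_ARN},
        {"Effect": "Deny", "Action": ["cloudsearch:DeleteDomain"], "Resource": DOMAIN_ARN},
    ],
}
print(is_allowed(policy, "cloudsearch:DescribeDomains", DOMAIN_ARN))
print(is_allowed(policy, "cloudsearch:DeleteDomain", DOMAIN_ARN))
```
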
  37. 37. Closing Thoughts • Content discovery goes hand in hand with content. Search is everywhere! • CloudSearch is a fully managed, easy-to-use, cost-effective search service • Get the powerful search features found in open source engines (Apache Solr) combined with value-added AWS features (easy setup, on-demand pricing, auto scaling, Multi-AZ, global availability)
  38. 38. Questions? Jon Handler (handler@amazon.com) Pravin Muthukumar (pravinm@amazon.com)
