Amazon CloudSearch is a fully managed service that makes it easy to set up, operate, and scale a search solution for your website or application. Traditional search solutions take significant time and resources to operate and maintain, and administering a search system is both complex and expensive. Amazon CloudSearch significantly lowers the cost of a search solution and makes it easy to set up a search system that can change with the needs of the business.
In this session we give an overview of Amazon CloudSearch, including recently launched search and administration features, discuss popular use cases for CloudSearch, and share best practices that will help you build scalable search solutions for your websites and applications.
6. Representation of a Document
Field         Value
id            tt0371746
title         Iron Man
description   When wealthy industrialist Tony Stark is forced to build
              an armored suit after a life-threatening incident, he
              ultimately decides to use its technology to fight against
              evil.
director      Jon Favreau
actors        Robert Downey Jr., Gwyneth Paltrow, Terrence Howard
...
rating        7.9
release_date  2008-05-02T00:00:00Z
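As a rough illustration, the document above could be expressed as a JSON "add" operation before upload. This is a minimal sketch that assumes the JSON batch shape of the 2013-01-01 document service (a list of operations with `type`, `id`, and `fields`). The helper name `to_add_operation` is hypothetical, and only a subset of the slide's fields is included.

```python
import json

# A subset of the field values shown on the slide; elided fields ("...") are left out.
movie_fields = {
    "title": "Iron Man",
    "actors": ["Robert Downey Jr.", "Gwyneth Paltrow", "Terrence Howard"],
    "rating": 7.9,
    "release_date": "2008-05-02T00:00:00Z",
}


def to_add_operation(doc_id, fields):
    """Wrap one document in an 'add' operation (hypothetical helper)."""
    return {"type": "add", "id": doc_id, "fields": fields}


# A batch is a list of operations; here it holds a single document.
batch = [to_add_operation("tt0371746", movie_fields)]
payload = json.dumps(batch, indent=2)
print(payload)
```

Keeping `actors` as a list reflects that one field can carry several values for the same document.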
8. Geo
• Latlon data type
• Region search
• Distance sort
• Supports mobile
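To make region search and distance sort concrete, here is a small sketch in plain Python. It does not call CloudSearch. The place list, the bounding box, and the function names are assumptions chosen for illustration, and the great-circle (haversine) formula stands in for whatever distance function the service applies.

```python
import math

# Hypothetical (name, latitude, longitude) records.
PLACES = [
    ("San Francisco", 37.7749, -122.4194),
    ("Seattle", 47.6062, -122.3321),
    ("Portland", 45.5152, -122.6784),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def in_region(lat, lon, top_left, bottom_right):
    """Region search: is the point inside the bounding box?"""
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


# Region search over a box covering parts of Washington and Oregon.
nearby = [name for name, lat, lon in PLACES
          if in_region(lat, lon, (48.0, -123.0), (45.0, -122.0))]

# Distance sort from a hypothetical user position.
user_lat, user_lon = 47.0, -122.0
by_distance = sorted(PLACES, key=lambda p: haversine_km(user_lat, user_lon, p[1], p[2]))
```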
9. Text Processing (Normalization)
• Tokenization (parsing)
• Downcasing
• Stemming
• Stopword removal
• Synonym Addition
When wealthy industrialist Tony Stark is forced to
build an armored suit after a life-threatening
incident, he ultimately decides to use its
technology to fight against evil.
when wealth industrial tony stark force build
armor suit after life threaten incident ultimate
decide use technology fight against evil
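The normalization steps can be sketched as a toy pipeline. The stopword list, the synonym map, and the suffix-stripping rule below are all assumptions made for illustration. A real analyzer uses a proper stemming algorithm and language-specific dictionaries, so this sketch will not reproduce the slide's output exactly.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "to", "he", "its"}  # hypothetical list
SYNONYMS = {"film": ["movie"]}                            # hypothetical map


def stem(word):
    """Very naive stemmer: strip one common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Downcase, tokenize, drop stopwords, stem, then add synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if token in STOPWORDS:
            continue
        term = stem(token)
        terms.append(term)
        terms.extend(SYNONYMS.get(term, []))
    return terms


print(normalize("The Armored Suits"))
```

The same function should be applied to both documents and queries, so that a query for "suits" and a document containing "suit" end up with the same term.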
10. Indexing
Term   Documents (Posting List)
Iron   The Man in the Iron Mask
       Iron Man 2
       Iron Man
       The Iron Giant
       The Iron Lady
       ...
Man    Rain Man
       The Man in the Moon
       Iron Man 2
       The Lawnmower Man
       The Third Man
       Iron Man
       ...
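A term-to-documents index like the one above can be built in a few lines. This is a minimal in-memory sketch with titles taken from the slide. The integer document IDs and the `build_index` name are assumptions, and production engines store posting lists in compressed on-disk structures.

```python
import re
from collections import defaultdict

TITLES = [
    "The Man in the Iron Mask",
    "Iron Man 2",
    "Iron Man",
    "The Iron Giant",
    "The Iron Lady",
    "Rain Man",
    "The Man in the Moon",
    "The Lawnmower Man",
    "The Third Man",
]


def build_index(titles):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(list)
    for doc_id, title in enumerate(titles):
        # set() ensures a document appears once per term even if the term repeats.
        for term in set(re.findall(r"[a-z0-9]+", title.lower())):
            index[term].append(doc_id)
    return index


index = build_index(TITLES)
iron_titles = [TITLES[i] for i in index["iron"]]
```

Because documents are visited in ascending ID order, every posting list comes out sorted, which is what makes fast list intersection possible at query time.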
11. Matching
The Man in the Iron Mask
Iron Man 2
Iron Man
The Iron Giant
The Iron Lady
Rain Man
The Man in the Moon
Iron Man 2
The Lawnmower Man
The Third Man
Iron Man
Iron Man 2
Iron Man
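Matching a query such as "iron man" amounts to intersecting the two posting lists. The sketch below uses the lists exactly as shown on the slides, with hypothetical integer IDs assigned to each title, and a standard two-pointer merge over sorted lists.

```python
# Hypothetical document IDs for the titles on the slide.
TITLES = {
    0: "The Man in the Iron Mask",
    1: "Iron Man 2",
    2: "Iron Man",
    3: "The Iron Giant",
    4: "The Iron Lady",
    5: "Rain Man",
    6: "The Man in the Moon",
    7: "The Lawnmower Man",
    8: "The Third Man",
}

# Posting lists as listed on the slide, kept sorted by document ID.
IRON = [0, 1, 2, 3, 4]
MAN = [1, 2, 5, 6, 7, 8]


def intersect(a, b):
    """Return IDs present in both sorted lists using a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


matches = [TITLES[doc_id] for doc_id in intersect(IRON, MAN)]
```

The merge touches each list entry at most once, so the cost grows with the combined length of the two lists rather than their product.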
12. Ranking and Relevance
• The meat of the search engine
• TF-IDF – uniqueness and presence
• Additional Criteria
– Measures of document value (e.g. rating)
– Observed user behavior
– Freshness
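TF-IDF has many variants. The sketch below uses one simple form (term frequency normalized by document length, multiplied by the natural log of inverse document frequency) on a tiny hypothetical corpus. It is meant to show the idea of "uniqueness and presence", not the exact formula any particular engine uses.

```python
import math
from collections import Counter

# Hypothetical corpus: document ID -> text.
DOCS = {
    "a": "iron man",
    "b": "the iron giant",
    "c": "rain man",
    "d": "the third man",
}


def tf_idf_scores(query_terms, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                presence = tf[term] / len(tokens)         # how much of the doc is this term
                uniqueness = math.log(n_docs / df[term])  # rarer terms weigh more
                score += presence * uniqueness
        scores[doc_id] = score
    return scores


scores = tf_idf_scores(["iron", "man"], DOCS)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this corpus "iron" is rarer than "man", so a document matching only "iron" outranks one matching only "man".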
13. Summary
• Search makes data accessible
• Search documents gather information about one search target
• Inverted (reverse) indices provide the basis of text-to-text matching
• Relevance brings the best matches
15. Building a Search service
• Build your own
– Extend datastores and build custom relevance engine
• Open Source
– Apache Solr, ElasticSearch
• Legacy Enterprise Search
– FAST, Autonomy, Endeca
16. Challenges with building a Search service
• COMPLEX: Requires extensive search expertise
• COSTLY: High upfront expenditure
• SLOW: Long time to market. Slows innovation
• UNDIFFERENTIATED: Operational overhead that doesn’t add value to core product
17. Where CloudSearch fits in the picture
Amazon CloudSearch is a fully managed search service in the cloud that
makes it easy to set up, operate, and scale a search solution for your
website or application.
Same benefits as other AWS managed services:
• Easy to set up and operate (Console, SDK, CLI)
• Pay as you go
• No need to guess capacity
• Experiment fast with low risk
• Go Global in minutes
23. Simple Queries
http(s)://<search endpoint>/2013-01-01/search?q=iron+man
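A request URL of this shape can be assembled with the Python standard library. In this sketch the endpoint hostname is a made-up placeholder, `build_search_url` is a hypothetical helper, and `size` is included on the assumption that the search API accepts it as a page-size parameter.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, size=10):
    """Build a search URL for the 2013-01-01 API version (hypothetical helper)."""
    # urlencode escapes the query string; spaces become '+'.
    params = urlencode({"q": query, "size": size})
    return f"https://{endpoint}/2013-01-01/search?{params}"


# Placeholder endpoint; substitute your domain's search endpoint.
url = build_search_url("search-movies-example.us-east-1.cloudsearch.amazonaws.com", "iron man")
print(url)
```

Letting `urlencode` do the escaping avoids hand-building query strings that break on spaces or punctuation in user input.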
{ "id": "tt0371746",
  "highlights": {
    "plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.",
    "title": "Iron Man" } },
{ "id": "tt1866249",
  "highlights": {
    "plot": "A man in an iron lung who wishes to lose his virginity contacts a professional sex surrogate with the help of his therapist and priest.",
    "title": "The Sessions" } },
29. Expressions
• Baseline TF-IDF function provides textual relevance
• Expressions use field sources or other expressions
• Allows customization per-user or per-query
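The idea behind expressions can be modeled in plain Python: start from the textual relevance score, blend in a document field, and let the caller choose the weighting per query. The formula, the `weight` parameter, and the sample hits below are assumptions for illustration, not CloudSearch's expression syntax.

```python
# Hypothetical search hits: a text relevance score plus a 0-10 rating field.
HITS = [
    {"id": "x", "text_score": 1.0, "rating": 5.0},
    {"id": "y", "text_score": 0.9, "rating": 9.0},
]


def blended_score(hit, weight):
    """Boost the text score by the rating; weight=0 leaves the text score unchanged."""
    return hit["text_score"] * (1.0 + weight * hit["rating"] / 10.0)


def rank(hits, weight):
    """Return hit IDs ordered by blended score, best first."""
    ordered = sorted(hits, key=lambda h: blended_score(h, weight), reverse=True)
    return [h["id"] for h in ordered]


text_only = rank(HITS, weight=0.0)      # pure textual relevance
rating_aware = rank(HITS, weight=1.0)   # same query, rating-sensitive ordering
```

Passing a different `weight` per request is one way to picture per-user or per-query customization: the index stays the same and only the ranking formula changes.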
36. Feature Detail: IAM Integration
Configuration API Only
{
"Version":"2012-10-17",
"Statement": [
{ "Effect": "Allow",
"Action": ["cloudsearch:*"],
"Resource": "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies" },
{ "Effect": "Deny",
"Action": ["cloudsearch:DeleteDomain"],
"Resource": "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies" }
]
}
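The policy above allows every CloudSearch configuration action on the domain except DeleteDomain, because an explicit Deny overrides any Allow. The sketch below is a heavily simplified model of that rule. Real IAM evaluation also handles principals, conditions, resource wildcards, and multiple policies, and `is_allowed` is a hypothetical helper, not an AWS API.

```python
from fnmatch import fnmatchcase

DOMAIN_ARN = "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies"

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["cloudsearch:*"], "Resource": DOMAIN_ARN},
        {"Effect": "Deny", "Action": ["cloudsearch:DeleteDomain"], "Resource": DOMAIN_ARN},
    ],
}


def is_allowed(policy, action, resource):
    """Simplified check: default deny, explicit Deny beats any matching Allow."""
    allowed = False
    for statement in policy["Statement"]:
        if statement["Resource"] != resource:
            continue  # this sketch only supports exact resource matches
        if not any(fnmatchcase(action, pattern) for pattern in statement["Action"]):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed
```

With this policy, `is_allowed(POLICY, "cloudsearch:DescribeDomains", DOMAIN_ARN)` is true while the same call for `cloudsearch:DeleteDomain` is false.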
37. Closing Thoughts
• Content Discovery goes hand in hand with Content. Search is
everywhere!
• CloudSearch is a fully managed, easy-to-use, cost-effective search
service
• Get the powerful search features found in open source engines
(Apache Solr) combined with value-added AWS features (easy setup,
on-demand pricing, auto scaling, Multi-AZ, global availability)