Getting Started with Amazon CloudSearch

Introduction to Amazon CloudSearch

  • Hello everyone, and welcome to today's webinar, an Introduction to Amazon CloudSearch. Before I get started, I wanted to say that you have joined the webinar muted, but throughout the webinar you can submit questions at any time using the Question Panel of your GoToMeeting control panel. We'll be answering as many questions as we can at the end of the presentation. GO TO NEXT SLIDE
  • So what is Amazon CloudSearch? Here's a summary. It's a fully managed, full-featured search service that runs in the AWS cloud. It scales automatically as your data and traffic fluctuate. It can handle both structured and unstructured data, and it supports near real-time indexing of your documents. It's designed to be up and running in less than an hour. It was built by Amazon's A9.com and leverages more than 10 years of R&D in search.
  • For audience members who are not already familiar with search: search engines automate finding a particular item (a document) in a collection of documents. What goes on under the covers?
  • Search engines let you retrieve information from a large collection by matching some terms you're curious about against each of the items in the collection. An encyclopedia is a good example of a large collection of items: its articles. Let's say you're looking for references to US presidents in an encyclopedia. Each article in the encyclopedia is a possible match. You could read from the beginning and examine each one, but you would examine many articles that are not related to what you are looking for. Chapters or volumes are one way to limit what you look at, but there's an even faster way.
  • Another way is to create an index, like the one in a book. The index contains the list of all important terms that appear in all the articles, and for each term it lists all of the articles that contain that term. For instance, the index entry for “president” would contain the page numbers of the articles for Washington, Jefferson, Adams, and so on. Now you can look up presidents much more efficiently: just go to the entry for “president” and get the list of all the page numbers. Let's say that you wanted to find presidents who had lived in Virginia. You could use the index to find “president” and then read all those pages looking for the word “Virginia”. A better way would be to look up “president” and write down its list of pages, then look up “Virginia” and write down that list of pages. If a page number appears on both lists (“president” AND “Virginia”), then that is the place to look. When you send a query to a search engine, it does much the same thing.
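The index lookup just described can be sketched in a few lines of Python. This is a minimal in-memory illustration of an inverted index and an AND query, not how CloudSearch itself is implemented; the article texts and the `build_index` / `search_and` names are invented for the example.

```python
from collections import defaultdict

def build_index(articles):
    """Map each lowercased term to the set of article IDs that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for term in text.lower().split():
            index[term].add(article_id)
    return index

def search_and(index, *terms):
    """Return the IDs of articles containing every term: the intersection of the posting lists."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)

# Made-up sample "encyclopedia" articles, keyed by page number.
articles = {
    1: "Washington was the first president and lived in Virginia",
    2: "Jefferson was president and lived in Virginia",
    3: "Adams was president and lived in Massachusetts",
    4: "Virginia is a state on the Atlantic coast",
}
index = build_index(articles)
print(sorted(search_and(index, "president", "virginia")))  # [1, 2]
```

Intersecting two short posting lists is exactly the "write down both lists and compare" step from the notes; no article text is re-read at query time.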
  • Of course, we’ve all come to expect more from our search experience than strict matching. Let’s have a look at Amazon.com to see the kinds of features we expect.
  • We need to order the match set in a way that the most useful results are at the top. We need to sort by relevance.
  • Each document has metadata attributes (facets) that help users navigate to that document. On Amazon.com, you’ll find counts for these facet values in the left rail of the page. For instance, searching for “us president” gives us a range of genres we can use to narrow the results.
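Facet counts are tallies of each attribute value across the whole match set. A minimal Python sketch, assuming documents are plain dicts and that a field may hold either one value or a list of values; the sample records and the `facet_counts` helper are hypothetical.

```python
from collections import Counter

def facet_counts(results, field):
    """Tally each value of `field` across the match set, counting a document once per value."""
    counts = Counter()
    for doc in results:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))  # set() avoids double-counting a repeated value
    return counts

# Invented match set for a query such as "us president".
results = [
    {"id": "a", "genre": ["biographies", "politics"]},
    {"id": "b", "genre": ["politics", "history"]},
    {"id": "c", "genre": "biographies"},
]
print(dict(facet_counts(results, "genre")))
```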
  • In addition, it’s useful to be able to search ranges of integers, such as prices or user ratings.
  • Fielded searching allows us to narrow our results based on the values of particular document attributes.
  • We can combine different field values and use Boolean logic to build complex queries.
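Range, fielded, and Boolean searching compose naturally as predicates over document attributes. The sketch below filters an in-memory list to illustrate the idea only; it is not CloudSearch's query syntax, and every helper name and record here is hypothetical.

```python
def field_is(field, value):
    """Fielded match: the field equals value (or contains it, if multi-valued)."""
    def pred(doc):
        v = doc.get(field)
        return value in v if isinstance(v, list) else v == value
    return pred

def in_range(field, low=None, high=None):
    """Inclusive integer range; pass None to leave that end open."""
    def pred(doc):
        v = doc.get(field)
        if not isinstance(v, int):
            return False
        return (low is None or v >= low) and (high is None or v <= high)
    return pred

def and_(*preds):
    return lambda doc: all(p(doc) for p in preds)

def or_(*preds):
    return lambda doc: any(p(doc) for p in preds)

def not_(pred):
    return lambda doc: not pred(doc)

books = [
    {"id": "a", "genre": ["history", "politics"], "year": 1975},
    {"id": "b", "genre": ["biographies"], "year": 2007},
    {"id": "c", "genre": ["history"], "year": 1999},
]
# History books NOT published in the 1970s.
query = and_(field_is("genre", "history"), not_(in_range("year", 1970, 1980)))
print([d["id"] for d in books if query(d)])  # ['c']
```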
  • Finally, documents are sorted based on more complex aspects of the query and of user behavior: values such as popularity or click counts are folded into the relevance computation. All of these features give users a rich search experience that gets them to the items they want quickly.
  • At this point we’ll dive into Amazon CloudSearch and walk through the service and how to use it. What you’ll see is a full-featured search engine in action that’s easy to set up and use. I’ve listed some of the search features we’ll cover. We’ll also cover how to interact with CloudSearch, including your HTTP endpoints, the console, the command-line tools, and the search and document service APIs.
  • When you sign up for CloudSearch, you set up a search domain. A domain wraps a search engine in three RESTful endpoints, shown at the bottom of the diagram: you send document batches to the document service; you send search requests to the search service; and you use the configuration service to configure the indexing of your documents, access to your domain, text processing options, and more. The endpoints are stable DNS entries that stay the same as your domain scales. As a search developer, you will interact with all of these endpoints in different ways: CloudSearch has its own console within the AWS console, you can work with it from the command line, and you can call the APIs directly. Your application will primarily interact with the search service, and it may also send updates directly to the document service.
  • We’re going to walk through an example in some depth. We’ll create a search domain, upload documents, configure the domain, and run some searches.
  • As we work through the process, we’ll primarily be showcasing the console. The CloudSearch console is designed to simplify interacting with your domain by hiding complexity behind a convenient UI. It is also the main portal for managing and monitoring the status of your domain: you’ll find your endpoints here, as well as your domain’s sizing information. The first time you log into the console, you’ll be greeted with a welcome screen inviting you to create your domain. Selecting Create Domain brings up a wizard that lets you name your domain, preconfigure it using sample documents from disk or from S3, and upload those documents.
  • CloudSearch will take several minutes to create the domain. Here we can see a snapshot of the CloudSearch dashboard while it’s processing the domain creation. The dashboard shows you the current number of searchable documents, how many fields you have configured, the number of instances and partitions you are currently using, and the endpoints for the document and search services.
  • When your domain finishes initializing, you will upload documents for searching. CloudSearch uses a standard syntax for encoding your documents; more on that in a second. There are three main methods for sending documents to CloudSearch. You can upload a small batch of documents through the console. You can use the cs-post-sdf command-line tool, specifying a source location; the source can be an S3 object (an SDF file) or a local file on disk. Finally, you can post your documents directly to your document endpoint, either with a utility like curl or from your application using standard HTTP libraries.
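Posting directly to the endpoint is an ordinary HTTP POST of a JSON batch to the `/2011-02-01/documents/batch` path shown on the upload slide. The Python sketch below only builds the request so it can be inspected offline; the endpoint hostname is a made-up placeholder, and the `application/json` content type is assumed because the batch is JSON SDF.

```python
import json
import urllib.request

# Hypothetical document endpoint; copy the real one from your domain's dashboard.
DOC_ENDPOINT = "doc-example-domain-abc123.us-east-1.cloudsearch.amazonaws.com"

def build_batch_request(endpoint, batch):
    """Build (but do not send) a POST of an SDF batch to the 2011-02-01 batch path."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{endpoint}/2011-02-01/documents/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

batch = [{"type": "delete", "id": "sombzze12a8c134960", "version": 6}]
req = build_batch_request(DOC_ENDPOINT, batch)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs network access
# and a domain whose access policies allow requests from your IP address.
```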
  • Search documents are the heart of search: you send documents to your document service, and you retrieve documents from your search service. SDF (Search Data Format) is CloudSearch’s own format for representing search documents, in either XML or JSON. Here’s a JSON example. If you don’t know JSON syntax: the square brackets specify an array (we’ll come back to that in a minute), and the curly braces specify an object. The outermost braces enclose a single search document. The object contains a set of properties, each with a name and a value separated by a colon. Let’s look at the properties. The id and version identify this search document. The type specifies whether to add or delete the document. The lang specifies the language of the document (English only). The fields property has a value that is itself an object with a set of named properties; the fields contain the data for your search problem. Here we have a title, an author, a year, and so on. SDF files that you send to CloudSearch are batches, signified by the enclosing array brackets. A guide to SDF is coming.
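Because an SDF file is a JSON array of add and delete operations, a client typically builds operations and packs them into batches. A minimal Python sketch, assuming the 5 MB per-batch limit quoted on the pricing slide means 5 × 1024 × 1024 bytes; the helper names and the `old_book_1` id are hypothetical.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed reading of the "5 MB per batch" limit

def add_op(doc_id, version, fields, lang="en"):
    """An SDF add operation: id and version identify the document; fields hold its data."""
    return {"type": "add", "id": doc_id, "version": version, "lang": lang, "fields": fields}

def delete_op(doc_id, version):
    """An SDF delete operation for a previously added document."""
    return {"type": "delete", "id": doc_id, "version": version}

def to_sdf_batches(ops, max_bytes=MAX_BATCH_BYTES):
    """Pack operations into JSON arrays (SDF batches) of at most max_bytes each.

    An operation larger than max_bytes on its own still gets a batch to itself.
    """
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for op in ops:
        encoded = json.dumps(op)  # ASCII-only by default, so len() is the byte count
        extra = len(encoded) + (2 if current else 0)  # ", " separator between items
        if current and size + extra > max_bytes:
            batches.append("[" + ", ".join(current) + "]")
            current, size = [], 2
            extra = len(encoded)
        current.append(encoded)
        size += extra
    if current:
        batches.append("[" + ", ".join(current) + "]")
    return batches

ops = [
    add_op("sombzze12a8c134960", 5, {
        "title": "The History Buff’s Guide to the Presidents",
        "author": "Thomas R. Flagel",
        "year": "2007",
        "popularity": 449425,
        "genre": ["biographies", "politics", "social science"],
    }),
    delete_op("old_book_1", 2),
]
print(len(to_sdf_batches(ops)))  # 1
```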
  • As you send data, it’s reflected in the search results in near real time with no additional effort, and CloudSearch automatically scales to handle that data. Search instances are individual hosts, each with the capacity to store a certain number of documents for retrieval. CloudSearch keeps the index in RAM to provide low latency. As your data grows, CloudSearch automatically adds search instances and partitions your index across them, so you don’t have to do partitioning yourself. CloudSearch will add up to 10 partitions, handling tens of millions of documents, and it can scale larger than that if you have a bigger need; please contact us.
  • Once your SDF is defined, you can configure your domain with the indexing options you want for your fields. Using the console or the command-line tools, you can use a sample of (or all of) your documents to auto-configure your domain, and you’ll get a proposed configuration. Once you have sent an initial configuration, you can easily update it in the domain’s dashboard on the AWS console, with the command-line tools, or directly with an API call to the configuration service. CloudSearch supports customization of each field, as we’ll see now.
  • Here you can see a snapshot of the dashboard’s Indexing Options panel. Down the left side you can see the fields that are configured, and across each row the configuration options set for that field. Each field shows a status; these fields are all Active, which means that the configuration shown is the one deployed to your domain. When you change a field’s configuration, its status moves to “Needs Indexing”. This lets you know that CloudSearch must build your changes into your domain before they become active. If any fields need indexing, the console shows an additional “Run Indexing” link. Each field has a type, and there are three types available. Text: text fields are processed to extract tokens from the data for individual matching. In the US presidents example, we would process the contents of each article as a text field, so that terms in the query could match within the body of the article. Text processing also applies stemming, stopwords, and synonyms to each token in the field’s data. Literal: literal fields are not processed as text; instead, the entire contents are put in the index as-is for exact matching. UINT: uint fields are processed as 32-bit unsigned integers. You can use uint fields for range searching and as a source for custom relevance calculations. In addition to its type, each field has three options you can set, depending on type. Search: literal fields can be search-enabled so that they can be searched directly. Facet: fields can be facet-enabled to allow retrieval of value counts for the field across the entire result set. Result: enabling result on a field means that CloudSearch will store up to 2 KB of the field’s value in the index, and queries can then request that value be returned in the results. Be careful, though: enabling result can dramatically grow your index size.
Any field can have a default value; any document that does not contain a value for that field receives the default value instead. You can also set field sources, which let you build fields out of sets of other fields for custom searching.
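The difference between the three field types comes down to how a raw value becomes index terms. This Python sketch mimics that idea with a tiny, invented stopword list, a dictionary-based stemmer, and a synonym map; CloudSearch’s actual text processing and defaults differ, so treat every constant here as hypothetical.

```python
import re

# Illustrative settings only; these are not CloudSearch's built-in lists.
STOPWORDS = {"the", "of", "a", "an", "and", "in"}
STEMS = {"presidents": "president", "lived": "live"}  # hypothetical stemming dictionary
SYNONYMS = {"potus": "president"}                      # hypothetical synonym mapping

def process_text(value):
    """Text field: tokenize, drop stopwords, then map synonyms and stems for each token."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    out = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        tok = SYNONYMS.get(tok, tok)
        out.append(STEMS.get(tok, tok))
    return out

def process_literal(value):
    """Literal field: the whole value is indexed as a single term, for exact matching."""
    return [value]

def process_uint(value):
    """uint field: the value must fit in a 32-bit unsigned integer."""
    n = int(value)
    if not 0 <= n <= 2**32 - 1:
        raise ValueError(f"{n} does not fit in a 32-bit unsigned integer")
    return n

print(process_text("The History of the Presidents"))  # ['history', 'president']
print(process_literal("social science"))               # ['social science']
print(process_uint("2007"))                            # 2007
```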
  • In addition to field configurations, you can set domain-wide indexing options. CloudSearch lets you create a custom relevance function using a simple expression syntax that supports arithmetic operators and lets you pull values from each document’s uint fields. In this way, you can mix fields like popularity with the text relevance score for each document. CloudSearch also lets you configure processing of tokens in your text fields: you can upload a custom stemming dictionary and synonym dictionary, and define your own stopwords.
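To see how mixing a uint field into relevance changes the order of results, here is a Python sketch of one possible blend: text relevance plus a log-damped popularity boost. The scores, the weight, and the `custom_rank` function are invented for illustration; this is not CloudSearch’s expression syntax, only the arithmetic such an expression might perform.

```python
import math

# Invented per-document scores: text relevance for some query, plus a uint popularity field.
docs = [
    {"id": "a", "text_relevance": 820, "popularity": 12},
    {"id": "b", "text_relevance": 760, "popularity": 449425},
    {"id": "c", "text_relevance": 790, "popularity": 3000},
]

def custom_rank(doc, popularity_weight=20.0):
    """Hypothetical blend: text relevance plus a log-damped popularity boost."""
    return doc["text_relevance"] + popularity_weight * math.log10(doc["popularity"] + 1)

by_text = sorted(docs, key=lambda d: d["text_relevance"], reverse=True)
by_custom = sorted(docs, key=custom_rank, reverse=True)
print([d["id"] for d in by_text])    # ['a', 'c', 'b']
print([d["id"] for d in by_custom])  # ['b', 'c', 'a']
```

The logarithm keeps a very popular document from swamping text relevance entirely; a plain linear weight on popularity would let "b" win for almost any query that matches it.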
  • It’s easy to integrate CloudSearch searching into your application. CloudSearch offers its functionality as RESTful services, giving you an easy way to send searches and get results. With the full-featured query language, you can perform simple to complex queries, specify ranking and sorting, control pagination, and retrieve facets and result fields. The query itself is specified in the URL parameters.
  • The console offers a simplified way to run test searches against your data. You can perform text searches, change your sorting criteria using text relevance or your documents’ fields, and view facet counts for, and filter on, all of the fields you have facet-enabled.
  • We won’t detail the full query API, but here are some examples of common search tasks. The simplest searches are full-text searches, specified by the q= parameter. Here I’ve written out the full URL, including the endpoint and path; for the rest of the examples I won’t show that part. You can perform more complex queries with the bq= syntax. In this example we search for “us presidents” in the title field, with a genre attribute of history. CloudSearch supports and, or, and not, as well as further nesting of expressions with parentheses. To retrieve counts for a facet-enabled field, you specify the facet= parameter with a comma-separated list of fields. CloudSearch also lets you control which facet values are returned with counts and how the returned facets are sorted. You can specify the ranking function with the rank= parameter: use the text_relevance function to sort by relevance, a rank expression you’ve defined, or field values on your documents to get alphabetic or numeric sorting.
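Boolean queries like the bq= example use a prefix, parenthesized form. A small Python builder keeps the quoting consistent, which is easy to get wrong by hand. Hedges: the helper names are hypothetical, this sketch simply rejects values containing a single quote rather than guessing at an escaping rule, and the open-ended `year:..1980` form is assumed from the “open-ended” range support mentioned in the notes.

```python
from urllib.parse import urlencode

def quote(value):
    """Single-quote a string term (this sketch refuses embedded quotes instead of escaping)."""
    if "'" in value:
        raise ValueError("embedded single quotes are not handled in this sketch")
    return f"'{value}'"

def term(field, value):
    return f"{field}:{quote(value)}"

def range_term(field, low=None, high=None):
    """uint range with the .. syntax; None leaves that end open (assumed form)."""
    lo = "" if low is None else str(low)
    hi = "" if high is None else str(high)
    return f"{field}:{lo}..{hi}"

def bq_and(*clauses):
    return "(and " + " ".join(clauses) + ")"

def bq_or(*clauses):
    return "(or " + " ".join(clauses) + ")"

def bq_not(clause):
    return "(not " + clause + ")"

expr = bq_and(term("title", "us presidents"), term("genre", "history"))
print(expr)                     # (and title:'us presidents' genre:'history')
print(urlencode({"bq": expr}))  # URL-encoded form for the query string
```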
  • To retrieve the source data from your documents’ fields, you specify the return-fields= parameter. In this example, CloudSearch will include the values of title, actor, and director for each document returned. You can paginate your results by specifying start and size parameters; in this case we will get 20 results starting at offset 200. You can search for an integer range, either open-ended or between two values, using the .. syntax. There are many more features that we haven’t covered, but this should give you a feel for some of the most common uses.
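Putting the parameters together, an application mostly needs to turn its own page numbers into start/size and URL-encode the rest. A Python sketch, assuming start is a zero-based offset into the hit list; the search endpoint hostname is a placeholder and `search_url` / `page_params` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; use the one shown on your domain's dashboard.
SEARCH_ENDPOINT = "search-example-domain-abc123.us-east-1.cloudsearch.amazonaws.com"

def search_url(endpoint, params):
    """Build a 2011-02-01 search URL; a dict is used because names like return-fields contain '-'."""
    return f"http://{endpoint}/2011-02-01/search?" + urlencode(params)

def page_params(page, page_size):
    """Translate a 1-based page number into start (assumed zero-based offset) and size."""
    return {"start": (page - 1) * page_size, "size": page_size}

params = {"q": "us presidents", "return-fields": "title,actor,director"}
params.update(page_params(page=11, page_size=20))  # page 11 of 20 results: start=200
print(search_url(SEARCH_ENDPOINT, params))
```

Keeping paging arithmetic in one helper avoids off-by-one errors between the UI’s page numbers and the API’s offsets.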
  • We have already discussed how CloudSearch scales for data; now let’s look at how it scales for traffic. More traffic requires more CPU to handle it, so CloudSearch adds search instances (additional rows of index copies) to supply that CPU, and removes rows to scale back excess capacity. The maximum scale is 50 instances, with a maximum of 10 partitions wide on XLarge instances. There’s a contact-us link for going larger, for example for customers with over 1 billion documents.
  • This diagram shows how CloudSearch scales in two directions, for both traffic and data. However, none of this scaling requires intervention on your part. CloudSearch adds partitions by reindexing on a parallel fleet that is swapped in with no downtime, and it adds instances as needed, again with no downtime. This concludes our walkthrough of creating a search domain. I’ll turn it back over to Puneet to discuss pricing.
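The two scaling directions form a grid: partitions (columns) grow with data, and copies (rows) grow with traffic, so the instance count is their product. CloudSearch makes these decisions itself; the Python sketch below only illustrates the shape of that reasoning. The per-instance and per-copy capacities are invented numbers, and only the 10-partition and 50-instance limits come from the notes.

```python
import math
from dataclasses import dataclass

MAX_PARTITIONS = 10  # "up to 10 partitions", per the webinar
MAX_INSTANCES = 50   # "maximum scale: 50 instances", per the webinar

@dataclass
class Layout:
    partitions: int  # grid columns: slices of the index, driven by data volume
    copies: int      # grid rows: replicas of every partition, driven by traffic

    @property
    def instances(self) -> int:
        return self.partitions * self.copies

def plan_layout(num_docs, docs_per_instance, peak_qps, qps_per_copy):
    """Hypothetical sizing rule: partitions from data volume, copies from query load."""
    partitions = max(1, math.ceil(num_docs / docs_per_instance))
    copies = max(1, math.ceil(peak_qps / qps_per_copy))
    layout = Layout(partitions, copies)
    if partitions > MAX_PARTITIONS or layout.instances > MAX_INSTANCES:
        raise ValueError("beyond the standard limits; the webinar suggests contacting AWS")
    return layout

# Made-up capacities: 8M documents per instance, 100 queries/second per copy.
print(plan_layout(25_000_000, 8_000_000, peak_qps=300, qps_per_copy=100))
# Layout(partitions=4, copies=3)
```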
  • Explain the pricing model and why it is the way it is. CloudSearch makes it easy to try out your configuration. Go to the control panel to see your resources.
  • Who is using Amazon CloudSearch in production now? There’s a wide range of use cases: SmugMug for photographic images, Sage for bioinformatics and medical research, and NewsRight for news licensing. Our partner Search Technologies has built a cool integration of CloudSearch for Wikipedia search.

Getting Started with Amazon CloudSearch: Presentation Transcript

  • Introduction to Amazon CloudSearch. © 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • What You Can Expect to Learn in This Webinar: Amazon CloudSearch details • How search works • How to set up and configure your search domain • CloudSearch pricing • Where to find additional resources
  • Introduction to Amazon CloudSearch: Fully managed, full-featured search service • Automatically scales for data & traffic • Handles both structured and unstructured data • Near real-time indexing • Up and running in less than 1 hour
  • How Search Works
  • Introduction to Search
  • Inverted Index: US President
  • Search on the Web: Relevance/Ranking • Faceting • Range Searching • Fielded Searching • Boolean Queries • Complex Relevance
  • Features Overview. Full-Featured Search: Free text, structured data, and Boolean search • Faceting • Customizable relevance ranking • Fielded and range search • Result sorting • Text processing options • Near real-time indexing. Easy to Set Up and Use: HTTP endpoints (configuration, document upload, search) • Web console • Command line tools • APIs
  • Amazon CloudSearch Architecture (diagram): A search client (www.example.com) sends search requests and receives results. A search developer uses the search tester, sends documents, and creates and manages domains. Three endpoints, each behind access control: Search endpoint (console, Search API) to the search service: search documents • Document service endpoint (console, Document Service API, command line tools) to the document service: add, update, and delete documents • Configuration service endpoint (console, Configuration API, command line tools) to the configuration service: create, configure, and delete domains
  • Creating an Amazon CloudSearch Domain: 1. Create a search domain 2. Upload documents 3. Configure search fields and text processing options 4. Integrate CloudSearch into your application
  • Create Search Domain: Amazon CloudSearch Console • 3 easy steps • Hides complexity • Management dashboard
  • Create Search Domain
  • Upload Documents: Console (good for small data sets) • Command line: cs-post-sdf --source <file> [other options]; curl -d @<file> [other options] • Direct to API: http://<endpoint>/2011-02-01/documents/batch
  • Upload Documents: Search Data Format (SDF)
    [
      {"type": "add",
       "id": "sombzze12a8c134960",
       "version": 5,
       "lang": "en",
       "fields": {
         "title": "The History Buff’s Guide to the Presidents",
         "author": "Thomas R. Flagel",
         "year": "2007",
         "book_id": "sombzze12a8c134960",
         "popularity": 449425,
         "genre": ["biographies", "politics", "social science"]
       }
      },
      ...
    ]
  • Upload Documents, Automatic Scaling: Data. Amazon CloudSearch adds capacity • automatically • seamlessly. (Diagram: as document quantity and size grow, one search instance holding index partition 1, copy 1, becomes several search instances holding index partitions 1 through n, copy 1 of each.)
  • Configure Search Domain: Automatic configuration detection • Easy to update • Fully customizable
  • Configure Search Domain, Configuration: Field types: text, literal, uint • Options: search, facet, result • Defaults and sources
  • Configure Search Domain, Custom Ranking: Simple syntax • Use integer fields
  • Search Integration: Easy to integrate • HTTP endpoint: http(s)://<endpoint>/2011-02-01/search • Full-featured query language • Queries are specified as URL parameters
  • Search Integration, Console: Full-text search • Text relevance • Facets
  • Search Integration, APIs: Full-text search: http://<endpoint>/2011-02-01/search?q=us+presidents • Complex and fielded search: bq=(and title:'us presidents' genre:'history') • Retrieving facet counts: q=us+presidents&facet=genre • Custom ranking: q=us+presidents&rank=custom,text_relevance
  • Search Integration, APIs: Retrieving data: q=us+presidents&return-fields=title,actor,director • Pagination: q=us+presidents&size=20&start=200 • Integer range search: bq=year:1970..1980
  • Automatic Scaling: Data & Traffic. (Diagram: search instances form a grid. Growth in data, meaning document quantity and size, adds columns: index partitions 1 through n. Growth in traffic, meaning search request volume and complexity, adds rows: copies 1 through n of every partition.)
  • Pricing
  • Pricing Model Parameters: Search instances • Document uploads • Index documents requests
  • Pricing Model
    1. Search instance types* (cost per hour): Small $0.12 • Large $0.48 • XLarge $0.68
    2. Document upload charge: $0.10 per 1,000 batch uploads (1 batch has a 5 MB limit)
    3. Index documents requests charge: $0.98 per GB of data in the search domain
  • Pricing Example: 1 million documents • Average document size 1 KB • 80K updates per day • 1 million queries per day • 1 index documents request per month • Cost: $97/month (1 Small search instance)
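The three pricing parameters combine additively, so a rough monthly estimate is easy to script. This Python sketch uses the per-unit prices from the pricing slide; the 730 hours per month, the 30-day month, sending the 80K daily updates as 1,000-document batches, and treating 1 million 1 KB documents as about 1 GB are all assumptions of this sketch, not figures from the slides.

```python
# Per-unit prices from the pricing slide (2012).
SMALL_PER_HOUR = 0.12           # $ per Small search instance hour
UPLOAD_PER_1000_BATCHES = 0.10  # $ per 1,000 batch uploads
INDEX_PER_GB = 0.98             # $ per GB in the domain, per index documents request

def monthly_cost(instances, batches_per_month, index_requests, domain_gb,
                 hourly=SMALL_PER_HOUR, hours_per_month=730):
    """Return (instance, upload, index) charges in dollars; 730 hours/month is assumed."""
    instance = instances * hourly * hours_per_month
    upload = batches_per_month / 1000 * UPLOAD_PER_1000_BATCHES
    index = index_requests * domain_gb * INDEX_PER_GB
    return round(instance, 2), round(upload, 2), round(index, 2)

# Slide scenario under assumed batching: 80K updates/day sent as 1,000-document batches.
batches = 80_000 // 1_000 * 30       # 2,400 batches in a 30-day month
domain_gb = 1_000_000 * 1_000 / 1e9  # 1M documents x ~1 KB each, about 1 GB
parts = monthly_cost(1, batches, 1, domain_gb)
print(parts, round(sum(parts), 2))
```

Under these assumptions the estimate comes to roughly $89 per month, almost all of it instance hours. That is below the slide’s $97; the slide does not state the assumptions behind its figure, so this sketch should not be read as reproducing it.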
  • CloudSearch Users
  • Top Requested Features: Multi-region • Multi-AZ • Languages • Auto-complete • Highlights
  • Resources: Amazon CloudSearch overview page: http://aws.amazon.com/cloudsearch/ • FAQs • Community Forum • Documentation & Getting Started Tutorial (IMDb) • Demos and Tutorials: What Is Amazon CloudSearch • Introducing Amazon CloudSearch (Features) • Building a Search Application Using Amazon CloudSearch