Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913


Presentation on using Amazon CloudSearch with databases. What to use when? How can you use CloudSearch with a database? Tom Hill, Solutions Architect, Amazon CloudSearch


  • It's all about time. Who here is currently using search?
  • Yes, column-oriented databases can be relational. There are lots of ways to classify databases, as there are MANY ways to organize data. Technically, the "database" is the data, not the program; the program is the database management system (DBMS).
  • Case folding, stemming, stopword removal, synonyms (wizard/philosopher). Also accent normalization, UTF-8 normalization, etc. These are generally based on an inverted index, a data structure that is like the index at the back of a book. An inverted index is good for the type of queries that are common with text.
  • Designed to Search with words
  • I hate the term "denormalized". Things frequently come into the system as a document, then get "normalized" and put into a database. Then they get "denormalized" back into a document. Sometimes it is better to skip the middle step and put the document directly into CloudSearch.
  • Can talk about proximity
  • You're launching a new site, where do you start? Most people start with relational databases.
  • "Handling words": to do anything LIKE what CloudSearch does, you'd have to make a table of words that maps each word to the documents that contain it. This is going to be rather inefficient in most relational databases.
  • Relational databases are great at what they do. If you use a wrench as a wrench, it's great. But it doesn't make a very good hammer! You will frequently use only a relational database if people aren't doing free-text search. You might use only a text search engine if all you do is search (e.g., a blog search). But this isn't common.
  • H.L. Mencken said, "For every problem, there is a solution that is clear, simple, obvious, and WRONG." The "LIKE" does a linear scan. It's like a database without an index. There's no relevance, it doesn't support multiple words, etc. This is a non-starter.
  • Depending on relational databases to do text is like depending on the join in a text search engine to do your relational activities.
  • Hammers and scalpels are both good tools. But you don't want to confuse them. "If all you have is a hammer, every problem looks like a nail"
  • Now that we've established that you want to use a text search database.
  • Amazon CloudSearch is a service that allows you to add text search to your application in a minimum amount of time.
  • Here's how it scales
  • Here's how CloudSearch works.
  • What do you do with all of those features? You build something like SmugMug.
  • For deletes, if all you do is delete the record from the relational database, there is no record of the record existing, so you won't know to delete it from CloudSearch.
  • Not a good model. What if one is offline for a while?
  • Simple, and works. But this relies on being able to detect all of your updates in just the database. You might need another table to keep track of things. In which case it looks like the next slide.
  • Now we record the records that have changed (not usually their contents, just their ID, and whether it was a delete or an add). The contents are fetched by the CloudSearch loader.
  • You may delete data from one table, and record it in another table for applying to CloudSearch.
  • The AWS Java SDK is actually only used for its JSON classes. You can get those classes from JSON.org as well, but then they might conflict with the AWS SDK, which you might want to use later.
  • This is stripped down, but it contains the essential items: we execute the SQL, we iterate through the result set, we build a document, and we build a batch. The batcher posts when it is full. Don't forget to call "flush". There is more code in the example for command-line args than for actual work! Use "select as" to get the right field names.
  • The changes to the code to make this handle s3 are pretty trivial. You have to change the ResultSet loop to a
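
The "log updates" idea in the notes above (record only the ID of each changed record and whether it was an add or a delete, then let the loader fetch the contents) can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the class `ChangeLogSync`, the `Change` type, the `toOperations` method, and the plain "add"/"delete" strings are all hypothetical, and an in-memory map stands in for a fresh read of the relational database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "log updates" pattern; all names are illustrative. */
final class ChangeLogSync {

    /** One row of an assumed change-log table: which record changed, and how. */
    static final class Change {
        final String id;
        final boolean deleted;

        Change(String id, boolean deleted) {
            this.id = id;
            this.deleted = deleted;
        }
    }

    /**
     * Collapses the log so each ID appears once with its latest state, then
     * turns each entry into an "add" or "delete" operation. Contents of added
     * records come from currentRows, a stand-in for a fresh database read.
     */
    static List<String> toOperations(List<Change> log, Map<String, String> currentRows) {
        Map<String, Boolean> latest = new LinkedHashMap<>();
        for (Change c : log) {
            latest.remove(c.id);            // re-insert so order follows the last change
            latest.put(c.id, c.deleted);
        }
        List<String> ops = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : latest.entrySet()) {
            String id = e.getKey();
            if (e.getValue() || !currentRows.containsKey(id)) {
                ops.add("delete " + id);    // the row is gone; only the log remembers it
            } else {
                ops.add("add " + id + " " + currentRows.get(id));
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        List<Change> log = Arrays.asList(
            new Change("1", false), new Change("2", false), new Change("2", true));
        Map<String, String> rows = Collections.singletonMap("1", "Star Wars");
        System.out.println(toOperations(log, rows)); // prints [add 1 Star Wars, delete 2]
    }
}
```

The point of the design is that a delete survives in the log even after the row itself has disappeared from the database, which is exactly the case the notes warn about.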

Transcript

  • 1. Searching for Success
    Amazon CloudSearch and Relational Databases
    © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Agenda
    Finding things
      • Types of Databases
    Making Choices
    What is CloudSearch?
    Combining CloudSearch with Relational
    Sample Code
  • 3. Finding Things
    So Many Databases
  • 4. Finding Your Information
    Your users need to find things
      • What do you use?
    A Database!
      • What Kind?
  • 5. It's a Big World Out There!
    "Database" != "Relational Database"
    Tons of relational databases
      • Amazon RDS
      • MySQL
      • MSSQL
      • Oracle
    but…
  • 6. Many Other Types
    NoSQL databases
      • Dynamo, Cassandra, CouchDB…
    Graph databases
      • Neo4J, Titan, …
    Column-oriented databases
      • Redshift, Bigtable…
    Text Search Engine
      • CloudSearch, Lucene, Autonomy...
  • 7. Text Search Engine
    Good at text queries
      • "Harry Potter and the Philosophers Stone"
      • Harry Potter and the Philosophers Stone
      • harry potter and the philosophers stone
      • harry potter and the philosopher stone
      • harry potter philosopher stone
  • 8. Text Search Engine
    Basic element is the document
    Documents are made of fields
      • "title" => "star wars"
    Fields can be
      • Missing
      • Multi-valued
      • Variable length
  • 9. Text Search Engine
    Documents are not "normalized"
      • In a relational database
          • A movie table
          • A director table
          • An actor table
      • In CloudSearch
          • One document per movie
  • 10. Relational
        ID  Title        ID  Actor            ID  Director
        1   Star Wars    1   Zachary Quinto   1   J.J. Abrams
        2   Star Trek    2   Chris Pine       2   George Lucas
        3   Dark Star    3   Zoë Saldana      3   John Carpenter
    Text Search Engine
        ID  Document
        1   title: star trek
            actor: chris pine zachary quinto zoe saldana
            director: j j abrams
  • 11. Relevance
    Key differentiator for text search
    Not "does this match?"
      • "how WELL does this match?"
    Includes multiple factors
      • Term Frequency, Document Frequency, Proximity
    Users can customize this
      • Distance
      • Popularity
      • Field Weighting
  • 12. Text is more than "War & Peace"
    It's not just books & blog posts
    Meta-data
      • Author, Title, Category, Tags
      • Can include numbers: counts, dates, latitude,…
  • 13. Making Choices
    Relational? CloudSearch?
  • 14. Relational Database
    Good at
      • Exact matches
      • Joins
      • Atomic Transactions
    Not so good at
      • Relevance
          • How well does this match?
      • Handling words
  • 15. Text Search Engines
    Good at finding
      • Words, Phrases
      • Relevance
    Not so good at
      • Joins
      • Transactions
  • 16. Options for Search
    Can I just use a relational database?
      • Yes.
    Do I want to just use a relational database?
      • Probably not
  • 17. Simple Approach
    Widely supported, easy
        SELECT id, title FROM books WHERE title LIKE '%amazon%'
    Does not perform well
    Doesn't deal with multiple words
  • 18. Text Extensions for Relational Databases
    Vendor specific
        SELECT id, title FROM books
        WHERE MATCH(title) AGAINST('Harry Potter' IN NATURAL LANGUAGE MODE)
      • Use different index structures
      • Typically MUCH less mature than relational code
      • More manual processes
          • Scaling (if possible)
          • Managing
      • Minimal relevance, no control
  • 19. Appropriate Tools
    VS
  • 20. Options
    Relational database
      • Weak relevance
      • Scaling & performance limits
    Text Search Engine
      • No transactions & locking
      • No Joins
    Both
      • Some extra effort, then best of both worlds
  • 21. What is Amazon CloudSearch?
  • 22. CloudSearch
    Fully-managed text search engine
    High Performance
    Automatically Scaling
    Reliable, Resilient
    Based on Amazon Product Search
  • 23. Search Features
    Faceting
    Complex queries
      • (and potter harry (not author:rowling))
    Configurable synonyms, stemming & stopwords
    Custom Sorting/Ranking
  • 24. Scaling
    CloudSearch scales automatically
      • Handle your spikes
      • Plan for success, but don't spend until you need it
      • Handle more data
      • Scaling is seamless – no downtime
  • 25. Automatic Scaling
    [Diagram: a grid of search instances, each holding one index partition (1 to n) and one copy (1 to n). One axis is labeled "DATA: Document Quantity and Size", the other "TRAFFIC: Search Request Volume and Complexity".]
  • 26. Easy to Use
    REST API
    Simple to add
      • HTTP POST
    Simple to query
      • q=star trek
    Simple to integrate
      • JSON
    [Diagram labels: Documents, HTTP, CloudSearch, HTTP, Queries]
  • 27. Amazon CloudSearch Architecture
    [Diagram labels: DNS / Load Balancing; AWS Query; Search API, Console; Config API, Command Line Tools, Console; Doc Svc API, Command Line Tools, Console; SEARCH SERVICE, DOCUMENT SERVICE, CONFIG SERVICE; Search Domain]
  • 28. What Can You Search For With CloudSearch?
    Wine, your college buddies, curly hair products, Downton Abbey episodes, news in Bermuda, playoff tickets, online courses, cat memes, furniture, doctor reviews, take out food, vacation rentals, trademarks, African safaris, kids arts & crafts, French dating/marriage, online videos, recipes, weather insurance, fashion news, Bollywood music, stock art
    And more!
  • 30. Combining CloudSearch + Relational Database
  • 31. Combining the Two
    Best of both worlds
      • Relational queries run on relational database
      • Text queries run on CloudSearch
    Downside: Complexity
      • More moving parts
      • Synchronization
  • 32. Synchronization
    Which one is the master?
      • Usually the relational database
    Updates
      • All at once
      • At regular intervals
      • When data is available
    Deletes
  • 33. Dataflow
    One source
    Simultaneous updates
    [Diagram labels: Source, Loader, RDBMS, CloudSearch]
  • 34. Dataflow
    One source
    Two loaders
    [Diagram labels: Source, Loader, RDBMS, Loader, CloudSearch]
  • 35. Dataflow
    One source
    Log updates
    Two loaders
    [Diagram labels: Source, Loader, RDBMS, Log, Loader, CloudSearch]
  • 36. Dataflow
    [Diagram labels: Source, Source, Source, Loader, RDBMS, Log, Loader, CloudSearch]
  • 37. Sample Code
  • 38. Dataflow
    One source
    Two loaders
    [Diagram labels: Source, Loader, RDBMS, Loader, CloudSearch]
  • 39. Java Example
    Read from MySQL
      • JDBC – Nothing special
    Post to CloudSearch
      • Apache HTTP Client
  • 40. Libraries
    Apache
      • HTTP Client
      • HTTP Core
      • Commons Logging
    AWS Java SDK
    MySQL connector
  • 41. Source Files
    CloudSearchRDS
      • Just does the setup for the demo
    ExtractAndUpload
      • Does the main work
    Batcher
      • Groups documents into batches
    PosterHttp
      • Posts to CloudSearch
  • 42. Main Loop
        // stmt, names, batcher, and lastModified are set up elsewhere in the full example
        ResultSet rs = stmt.executeQuery("select * from movies");
        ResultSetMetaData meta = rs.getMetaData();
        for (int col = 1; col <= meta.getColumnCount(); col++)
            names.add(meta.getColumnLabel(col)); // the label is the "select ... as" alias
        while (rs.next()) {
            int version = (int) (lastModified.getTime() / 1000);
            JSONObject doc = new JSONObject();
            for (String name : names) {
                doc.put(name, rs.getString(name));
            }
            String id = rs.getString("id");
            if (batcher != null) {
                batcher.addDocument(doc, version, id);
            }
        }
        // after the loop, flush the batcher so the last partial batch is posted
  • 43. SQL
        select * from movies;
        select `key` as id, title as name from movies
    Denormalizing may require multiple queries
  • 44. Demo
  • 45. Search: It's not just for Relational Data
    You can pull data from
      • S3
      • Redshift
      • Web
      • Internal Documents
      • And more…
    And make it searchable
  • 46. Indexing S3
        ListObjectsRequest listObjectsRequest =
            new ListObjectsRequest().withBucketName(bucketName);
        ObjectListing objectListing;
        do {
            objectListing = s3client.listObjects(listObjectsRequest);
            for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
                processObject(objectSummary);
            }
            listObjectsRequest.setMarker(objectListing.getNextMarker());
        } while (objectListing.isTruncated());
  • 47. Summary
    Use the right tool!
      • Text Search for Searching Text
    CloudSearch is fully managed text search
    Easy to get data from relational DB
    Easy to load data into CloudSearch
  • 48. Next Step: Free Trial
    One month (750 hours) free.
    Set up an account
    Give it a try!
    Questions?
      • TomHill@amazon.com
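
The deck describes a Batcher that groups documents into batches, posts when it is full, and needs a final "flush". The sketch below shows one way such a class might work; it is not the Batcher from the talk's sample code. The name `SimpleBatcher`, the byte limit passed to the constructor, and the `Consumer<String>` standing in for the HTTP poster are all assumptions made for illustration, and documents are taken as already-serialized JSON strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batcher sketch; not the Batcher class from the talk's sample code. */
final class SimpleBatcher {
    private final int maxBytes;              // assumed per-batch size limit
    private final Consumer<String> poster;   // stand-in for the HTTP poster
    private final List<String> docs = new ArrayList<>();
    private int pendingBytes = 2;            // the enclosing "[" and "]"

    SimpleBatcher(int maxBytes, Consumer<String> poster) {
        this.maxBytes = maxBytes;
        this.poster = poster;
    }

    /** Adds one serialized JSON document, posting the pending batch first if it would not fit. */
    void addDocument(String json) {
        // +1 for a separating comma; this slightly overestimates the batch size
        int size = json.getBytes(StandardCharsets.UTF_8).length + 1;
        if (!docs.isEmpty() && pendingBytes + size > maxBytes) {
            flush();
        }
        docs.add(json);
        pendingBytes += size;
    }

    /** Posts whatever is pending as one JSON array; call once after the last document. */
    void flush() {
        if (docs.isEmpty()) {
            return;
        }
        poster.accept("[" + String.join(",", docs) + "]");
        docs.clear();
        pendingBytes = 2;
    }

    public static void main(String[] args) {
        SimpleBatcher batcher = new SimpleBatcher(20, System.out::println);
        batcher.addDocument("{\"a\":1}");
        batcher.addDocument("{\"b\":2}");
        batcher.addDocument("{\"c\":3}"); // does not fit, so the first two are posted
        batcher.flush();                  // posts the remaining document
    }
}
```

Without the final `flush()` call, the last partial batch would never be posted. A document larger than the limit is still sent, alone in its own batch; a real loader would probably want to reject or log that case.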