Share what you know




   Sam Kimbrel                               sam@snapguide.com
   Software Engineer

Monday, April 1, 13
What is Snapguide?
                               • 1.5 million uniques/month
                               • ~2000 reqs/min across app
                                 and web

                               • Python (Pyramid/uWSGI/nginx)

                               • MySQL/Redis
                               • Built primarily on AWS: EC2,
                                 RDS, S3, SQS, SNS,
                                 CloudSearch, CloudFront




                                            daniel@snapguide.com • confidential do not distribute



Snapguide on CloudSearch
           • Became beta trial users after mentioning Solr on the phone
                 (seriously!)

           • Primary data set: guides
           • Facets: guide topic, “featured” boolean, visibility/ACL
                 flags

           • “autocomplete” search (more later)




{
    "lang": "en",
    "fields": {
        "step_count": "14",
        "author_external_id": "qS878yliQ4mxg_9uHt2AZg",
        "author": "Claire Hesseltine",
        "items": [
            "Preheat oven to 325 degrees Fahrenheit.",
            ...
        ],
        "title": "Make Brown Butter Sea Salt Cookies",
        "featured": 1,
        "summary": "The brown butter adds a nutty, caramel-like taste to these delicious cookies.",
        "topic": [
            "desserts"
        ],
        "main_image_uuid": "43d201c8fd4b4833b83d3f95d112f1c1",
        "like_count": 761,
        "public": "true"
    },
    "version": 1364333310,
    "type": "add",
    "id": "9eabff97e32c4244a8205da3fba442e9"
}
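The document above is one operation in a CloudSearch Search Data Format (SDF) batch, which is a JSON array of "add" and "delete" operations. Below is a minimal sketch of how such a document could be assembled from a guide record. It assumes a hypothetical guide dict that uses the field names shown above. It also assumes the version is a Unix timestamp, because the example value 1364333310 appears to be one. As we understand the 2011-02-01 API, an update takes effect only if its version is higher than the stored one.

```python
import json
import time


def guide_to_sdf_add(guide, version=None):
    """Build one SDF "add" operation from a (hypothetical) guide record."""
    if version is None:
        # Assumption: a Unix timestamp keeps versions increasing between updates.
        version = int(time.time())
    return {
        "type": "add",
        "id": guide["id"],
        "version": version,
        "lang": "en",
        "fields": {
            "title": guide["title"],
            "summary": guide["summary"],
            "author": guide["author"],
            "author_external_id": guide["author_external_id"],
            "items": guide["items"],       # supplies and step text
            "topic": guide["topics"],      # multi-valued facet field
            "step_count": str(guide["step_count"]),
            "like_count": guide["like_count"],
            "featured": 1 if guide["featured"] else 0,
            "public": "true" if guide["public"] else "false",
            "main_image_uuid": guide["main_image_uuid"],
        },
    }


def sdf_delete(guide_id, version):
    """Build an SDF "delete" operation, e.g. when a guide is removed."""
    return {"type": "delete", "id": guide_id, "version": version}


def to_sdf_batch(operations):
    """Serialize operations as one SDF batch: a JSON array."""
    return json.dumps(operations)
```

Value types mirror the example document: step_count is a string, while like_count and featured are numbers. Batching several operations into one upload keeps the number of document-service requests down.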



Monday, April 1, 13
Queries
           • Guide text search:
           q=cookies
           • Guide search with topic:
           q=cookies&facet=topic&bq=topic:'desserts'
           • “Typeahead”/suggestion search:
           bq=(or 'paper flower' 'paper flower*')
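The three query shapes above can be built programmatically. Below is a minimal sketch that uses only the standard library. The endpoint hostname is a hypothetical placeholder, because each CloudSearch domain is assigned its own. The quoting rule is an assumption: single quotes and backslashes inside a bq literal are escaped with a backslash.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; a real one is assigned per CloudSearch domain.
SEARCH_ENDPOINT = (
    "http://search-guides-example.us-west-1.cloudsearch.amazonaws.com"
    "/2011-02-01/search"
)


def quote_term(text):
    """Wrap text in single quotes for a boolean query (bq) expression.

    Assumption: backslashes and single quotes are escaped with a backslash.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def text_search(q):
    """Plain guide text search."""
    return {"q": q}


def topic_search(q, topic):
    """Text search restricted to one topic, with topic facet counts returned."""
    return {"q": q, "facet": "topic", "bq": "topic:" + quote_term(topic)}


def typeahead_search(prefix):
    """Match the typed phrase exactly, or with its last word as a prefix."""
    prefix = prefix.strip()
    return {"bq": "(or {} {})".format(quote_term(prefix), quote_term(prefix + "*"))}


def build_url(params):
    """URL-encode the parameters onto the search endpoint."""
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

For example, typeahead_search("paper flower") produces the same bq expression as the slide. build_url then percent-encodes the spaces and quotes.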




Result Ranking
           • Use “Compare Rank Expressions”
           • text_relevance is your friend
           • Goals:
                • Boost popular/featured guides
                • Make title/summary matches worth more than item
                      (supplies, step text) matches




min(
  cs.text_relevance({
    "weights": {"title": 2.5, "author": 1.5, "items": 0.1, "summary": 1.5},
    "default_weight": 1
  }),
  1000)
+ min(200, like_count / 10)
+ 100 * featured
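To see how the three terms trade off, you can evaluate the expression offline for a few candidate guides. Below is a minimal sketch that assumes the text_relevance value is already known. CloudSearch computes it server-side from the field weights, so the relevance numbers here are made up. How CloudSearch performs like_count / 10 is also an assumption; the sketch uses Python 3 true division.

```python
def guide_score(text_relevance, like_count, featured):
    """Evaluate the rank expression above for one guide, outside CloudSearch."""
    # Cap text relevance at 1000, as the expression does.
    relevance = min(text_relevance, 1000)
    # Popularity: one point per 10 likes, capped at 200.
    # Assumption: true division, as in Python 3; CloudSearch may differ.
    popularity = min(200, like_count / 10)
    # Flat bonus for featured guides (featured is 0 or 1).
    return relevance + popularity + 100 * featured


if __name__ == "__main__":
    # Hypothetical relevance values: a featured guide with a weaker text
    # match can still outrank an unfeatured guide with a stronger one.
    print(guide_score(text_relevance=400, like_count=761, featured=1))
    print(guide_score(text_relevance=550, like_count=120, featured=0))
```

Because the popularity and featured terms are capped at 300 points combined, a strong text match on a title or summary can still beat a popular but less relevant guide.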


Offline index updates
           • Extracting guide data to build the update document is slow
           • Remove updates from the online web request process
           • Internal-only API endpoints
           • SQS
           • queue_consumer daemon




Offline index updates

                       [Diagram components: Web server; SQS; Queue consumer; Snapguide DB/Redis; Web server (dedicated to queues); CloudSearch]
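The offline update flow could look roughly like the following sketch. It illustrates the idea under stated assumptions and is not Snapguide's queue_consumer. Python's queue.Queue stands in for SQS. fetch_guide and upload_batch are hypothetical callables: one represents the internal-only API endpoint that extracts guide data, the other the CloudSearch document upload.

```python
import json
import queue
import time


def enqueue_reindex(q, guide_id):
    """Web request side: note that a guide changed, then return immediately."""
    q.put(json.dumps({"action": "reindex", "guide_id": guide_id}))


def drain_batch(q, max_messages=10, wait_seconds=1.0):
    """Collect up to max_messages, waiting briefly only for the first one."""
    messages = []
    try:
        messages.append(q.get(timeout=wait_seconds))
        while len(messages) < max_messages:
            messages.append(q.get_nowait())
    except queue.Empty:
        pass
    return messages


def consume_once(q, fetch_guide, upload_batch, wait_seconds=1.0):
    """One pass of a hypothetical queue consumer; returns documents uploaded."""
    messages = drain_batch(q, wait_seconds=wait_seconds)
    # A guide edited several times in quick succession needs only one update;
    # dict.fromkeys drops duplicates while keeping first-seen order.
    guide_ids = list(dict.fromkeys(json.loads(m)["guide_id"] for m in messages))
    if not guide_ids:
        return 0
    # Assumption: Unix-timestamp versions (the example document's looks like one).
    version = int(time.time())
    docs = []
    for guide_id in guide_ids:
        # The slow extraction happens here, off the user-facing request path.
        fields = fetch_guide(guide_id)
        docs.append({"type": "add", "id": guide_id, "version": version,
                     "lang": "en", "fields": fields})
    upload_batch(json.dumps(docs))
    # With real SQS, each message would be deleted only after a successful upload.
    return len(docs)
```

A daemon would call consume_once in a loop. Batching and de-duplication are design choices of this sketch, intended to keep slow extraction work and upload requests off the web servers.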




Performance
                      SSL is painful




Performance




         but physical proximity (us-west-1) is
                       awesome



Future work
           • Add more domains (users, new features)
           • Search-based suggestion engine
           • Improved ranking/scoring: crawl our social graph




Questions?




    www.snapguide.com
