Revolutionizing Search Advertising with ElasticSearch at Swoop


Search advertising is the only type of online advertising that consistently provides value to users. Swoop is a search advertising company that uses ElasticSearch at the core of its offering. This presentation is from a talk Swoop founder & CTO Sim Simeonov gave at the Boston ElasticSearch meetup.

Published in: Technology, Design

  1. 1. Revolutionizing Search Advertising with ElasticSearch
  2. 2. Hi, I’m @simeons. I build startups.
  3. 3. This hangs at the Swoop office
  4. 4. Super brief history of advertising on the Web
  5. 5. October 27, 1994
  6. 6. Traditional advertising makes the Web suck
  7. 7. October, 2000
  8. 8. Google AdWords
  9. 9. Display Advertising: high volume, low quality, does not optimize for users, low engagement (16% of users click; 1 in 1,200 ads clicked). Search Advertising: low volume, high quality, optimizes for users, high engagement (80% of users click; 1 in 40 ads clicked).
  10. 10. Search advertising is a real, useful Web service
  11. 11. The Battle of the Web
  12. 12. Display Advertising: $18 billion, 200 companies. Search Advertising: $20 billion.
  13. 13. Display Advertising: $18 billion → $13 billion, 200 companies. Search Advertising: $20 billion → $25 billion.
  14. 14. Join us: work with people who care, solve insanely hard problems, make the Web better.
  15. 15. query AdWords on SERP ads
  16. 16. What’s in the index?
  17. 17. Data model: Advertisers, Campaigns, Ad Groups, Creatives (Ads), Keywords
  18. 18. Creatives don’t match queries. Keywords match queries.
  19. 19. What’s in the index? Keyword documents.
  20. 20. What is a keyword? A string (e.g., canon d70). A type, which specifies when a keyword matches (e.g., positive phrase); there are 9 types, each with its own analysis pipeline. Inherited filtering criteria (e.g., US-only traffic), and also negative keywords.
  21. 21. Keyword Types
  22. 22. Keyword doc schema: many possible schemas; query dependent; one type vs. many types; the query depends on the matching model.
  23. 23. Matching models: two main approaches, Boolean matching and IR matching. No time to discuss this; it gets very geeky/math-y very quickly.
  24. 24. Boolean query pattern: for all keyword document fields i, AND together ( “does not have field i” OR ( “has field i” AND “field i satisfies the user query” ) )
  25. 25. Keyword ranking: generalized second-price auctions with revenue ordering, minimum prices and user value feedback, tuned for locally envy-free equilibria. P.S. Tends to work best when the moon is full.
  26. 26. Search relevance is not enough: “Terrorism: Pursue a certificate in terrorism 100% online. Enroll today. Ads by Google.”
  27. 27. Custom ranking algorithm: balance expected “value” trade-offs. User: engagement w/o WTF moments. Advertiser: performance. Publisher/network: revenue. Needs external data (CTRs, bounce rates, share of budget, …) and frequent updates to this data.
  28. 28. Problem: Lucene is not suited for external data access. It is expensive to add data to indexes: update == delete + add.
  29. 29. Superheroes to the rescue: @antirez, @imotov, elasticsearch-facet-script
  30. 30. General map/reduce with ES (elasticsearch-facet-script). On each shard node: init_script runs once, map_script runs per result, combine_script runs w/ shard results. On the aggregation node: reduce_script sees all results.
  31. 31. Congrats! You built nano-AdW0rdz. Deploy to your search portal! What do you mean, you don’t have a search portal???
  32. 32. Search advertising for content: Google AdWords for GDN, a.k.a. Google AdSense (GDN == Google Display Network); Bing ContentAds
  33. 33. Search ads Search ads
  34. 34. Search ads Where is the query???
  35. 35. Build a “query” from the page. Same two models as before: phrase extraction (boolean) and IR matching. Common tools: text analysis/summarization, language modeling. Often involves indexing the pages.
  36. 36. There is a catch: AdWords on GDN performs 3-10x worse than AdWords on SERP.
  37. 37. Problems: poor targeting accuracy, poor placement locality.
  38. 38. Swoop solves these problems: unique real-time extraction & placement (browser/app, Web/mobile); 100+ patent claims; a single page can generate 50+ queries; pixel-perfect placement in content. If there is nothing to say, we say nothing.
  39. 39. Some metrics: 3 x 3 x 3 ES deployment (data, master, client nodes); 5,000+ rps; < 5ms query execution time. ElasticSearch, Lucene & Redis are fast!
  40. 40. Rewards for solving problems: a big sense of accomplishment; business doubling Q-Q; users getting better content; bigger, harder, more important problems.
  41. 41. Swoop’s future with ES: deeper into Lucene; more machine learning in ES map/reduce; better query rewriting engine; better content enhancement engine; probabilistic synchronized sharding; much bigger clusters.
  42. 42. Thanks! Sim Simeonov, @simeons. Join us & make the Web better.
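
As one reading of the Boolean query pattern on slide 24, the minimal Python sketch below treats a keyword document as a dict and each filtering criterion as a predicate on one field. The names `matches` and `to_es_filter`, the sample fields, and the choice of the Elasticsearch `bool`/`exists` query syntax are assumptions made for illustration; the talk does not show Swoop's actual queries or schema.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
Predicate = Callable[[Any], bool]


def matches(doc: Doc, predicates: Dict[str, Predicate]) -> bool:
    """AND over fields of: (field absent) OR (field present AND predicate holds)."""
    return all(
        field not in doc or predicate(doc[field])
        for field, predicate in predicates.items()
    )


def to_es_filter(conditions: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Express the same pattern as a query-DSL dict (field names are hypothetical)."""
    clauses = []
    for field, condition in conditions.items():
        has_field = {"exists": {"field": field}}
        clauses.append({
            "bool": {
                "should": [
                    # "does not have field i"
                    {"bool": {"must_not": [has_field]}},
                    # "has field i" AND "field i satisfies the user query"
                    {"bool": {"filter": [has_field, condition]}},
                ],
                "minimum_should_match": 1,
            }
        })
    # AND together the per-field clauses.
    return {"bool": {"filter": clauses}}


# Hypothetical keyword documents: the second carries no country restriction.
us_only = {"keyword": "canon d70", "country": "US"}
unrestricted = {"keyword": "canon d70"}
request_from_canada = {"country": lambda value: value == "CA"}

print(matches(us_only, request_from_canada))       # False
print(matches(unrestricted, request_from_canada))  # True
```

The point of the "does not have field" branch is that a keyword with no restriction on a field stays eligible for every request, so restrictions only need to be stored on the keywords that actually have them.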
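
Slide 25 names generalized second-price (GSP) auctions with revenue ordering and minimum prices. The sketch below is the textbook form of that idea, not Swoop's ranking algorithm: candidates are ordered by expected revenue per impression (bid times estimated CTR), and each winner pays the smallest cost per click that keeps its position, floored at a reserve price. `Bid`, `run_gsp`, and all numbers are hypothetical, and the user value feedback mentioned on the slide is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Bid:
    keyword_id: str
    max_cpc: float  # advertiser's maximum cost per click
    ctr: float      # estimated click-through rate, 0 < ctr <= 1


def run_gsp(bids: List[Bid], slots: int, reserve_cpc: float) -> List[Tuple[str, float]]:
    """Return (keyword_id, price per click) for each filled slot, best slot first."""
    # Minimum price: bids below the reserve never enter the auction.
    eligible = [b for b in bids if b.max_cpc >= reserve_cpc and b.ctr > 0]
    # Revenue ordering: rank by expected revenue per impression.
    ranked = sorted(eligible, key=lambda b: b.max_cpc * b.ctr, reverse=True)

    results = []
    for position, bid in enumerate(ranked[:slots]):
        if position + 1 < len(ranked):
            runner_up = ranked[position + 1]
            # Smallest CPC at which this bid still ties the next ranked bid.
            price = runner_up.max_cpc * runner_up.ctr / bid.ctr
        else:
            price = reserve_cpc
        results.append((bid.keyword_id, max(price, reserve_cpc)))
    return results


bids = [
    Bid("a", max_cpc=2.0, ctr=0.05),
    Bid("b", max_cpc=3.0, ctr=0.02),
    Bid("c", max_cpc=1.0, ctr=0.04),
    Bid("d", max_cpc=0.1, ctr=0.50),  # below the reserve, so excluded
]
for keyword_id, price in run_gsp(bids, slots=2, reserve_cpc=0.25):
    print(keyword_id, round(price, 2))
```

In this formulation a winner never pays more than its own `max_cpc`, because its ranking score is at least as high as the runner-up's.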
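
The four script phases on slide 30 can be simulated in plain Python to show how data flows between them. This is not the elasticsearch-facet-script API: the function names only mirror the phase names from the slide, and the shard contents and the click-through-rate aggregation are invented for the example.

```python
from typing import Dict, List

Hit = Dict[str, int]


def init_script() -> Dict[str, int]:
    # Runs once per shard: create the shard-local accumulator.
    return {"clicks": 0, "impressions": 0}


def map_script(state: Dict[str, int], hit: Hit) -> None:
    # Runs once per matching result on the shard.
    state["clicks"] += hit["clicks"]
    state["impressions"] += hit["impressions"]


def combine_script(state: Dict[str, int]) -> Dict[str, int]:
    # Runs once per shard after its hits are mapped: emit a partial result.
    return dict(state)


def reduce_script(shard_results: List[Dict[str, int]]) -> float:
    # Runs on the aggregation node and sees every shard's partial result.
    clicks = sum(r["clicks"] for r in shard_results)
    impressions = sum(r["impressions"] for r in shard_results)
    return clicks / impressions if impressions else 0.0


def run_facet(shards: List[List[Hit]]) -> float:
    """Drive the four phases over a list of shards (each a list of hits)."""
    partials = []
    for hits in shards:
        state = init_script()
        for hit in hits:
            map_script(state, hit)
        partials.append(combine_script(state))
    return reduce_script(partials)


# Two hypothetical shards holding per-keyword click statistics.
shards = [
    [{"clicks": 1, "impressions": 40}, {"clicks": 2, "impressions": 80}],
    [{"clicks": 1, "impressions": 40}],
]
print(run_facet(shards))  # overall CTR across both shards
```

Only the small per-shard partial results cross the network to the aggregation node, which is what makes this shape attractive for ranking inputs that change often.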