Coursera Amazon CloudSearch presentation



  1. Coursera + AWS CloudSearch
     Frank Chen, Software Engineer
  2. About
     •  Ed-Tech startup providing MOOCs
        o  Massive Open Online Courses
     •  New company -- launched 4/18/12
        o  Less than a year old.
     •  215 free courses from 33 top universities
        o  Princeton, Stanford, Penn, Duke, etc...
        o  From Cryptography to Modern and Contemporary American Poetry
     •  2.5+ million users
        o  We reached a million users faster than Facebook and Pinterest.
     •  ~9 million course enrollments
  3. Platform Scale
     •  Moderate-sized (>10,000 concurrent users)
     •  65 concurrent courses running now, each with tens of thousands of enrollments
     •  >600 "pretty heavy" PHP/Python dynamic pages served per second, sustained
        o  Pages might make backend calls to services (e.g. CloudSearch or SES --> want low latencies)
     •  Various other services (70+ instances on EC2 running at the moment)
     •  Spiky traffic
        o  People procrastinate on deadlines, so traffic spikes on the weekends
  4. Stack
     •  PHP / Python / Scala backed by MySQL
     •  Runs entirely on AWS
     •  Utilizes lots of AWS services
        o  EC2 / ELB for servers
        o  MySQL RDS for databases
        o  S3 for video and static hosting
        o  CloudFront for video / asset hosting
        o  SES for emails (>1 million emails every day)
        o  SQS for long-running tasks (video encoding, gradebook generation, etc...)
        o  SNS for notification services
        o  Route 53 for DNS
        o  CloudSearch for forum search
  5. Why CloudSearch?
     •  Search was a big issue for us back in March / April. The solution we had then didn't work
        o  MySQL Full Text Search
           §  LIKE %x% AS NATURAL LANGUAGE?
           §  Really terrible results
           §  MyISAM (eww...)
     •  Requirements:
        o  Fast searches (we call backend APIs - don't want to keep the users waiting too long)
        o  Good results (need to be relevant - don't waste the students' time)
        o  Low/no maintenance (we have enough instances to manage as is)
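The "really terrible results" complaint on this slide can be illustrated with a minimal sketch. It is not Coursera's code and does not touch MySQL. It only contrasts LIKE-style substring matching, which matches inside unrelated words and returns unranked rows, with a crude term-count ranking. The sample posts and the scoring rule are assumptions invented for illustration.

```python
import re
from collections import Counter

# Hypothetical forum posts, invented for illustration.
POSTS = {
    1: "Is the deadline for assignment 2 on Sunday?",
    2: "Deadlines: assignment deadline extended, new deadline posted",
    3: "My headline font looks wrong in the lecture notes",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def substring_search(query):
    """LIKE '%query%' style matching: no tokenization, no ranking."""
    needle = query.lower()
    return [pid for pid, text in POSTS.items() if needle in text.lower()]


def ranked_search(query):
    """Score each post by how often the query terms occur as whole tokens."""
    terms = tokenize(query)
    scored = []
    for pid, text in POSTS.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, pid))
    # Highest score first; ties broken by post id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored]


print(substring_search("line"))      # matches inside "deadline" and "headline"
print(ranked_search("deadline"))     # post 2 mentions the term most often
```

A real search service does far more (stemming, field weights, relevance tuning), but the gap between "contains these characters" and "is about this term" is the core of the requirement for good results.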
  6. Why CloudSearch?
     •  Alternatives we looked at:
        o  Apache Solr, Sphinx, fiddling with MySQL
     •  Then CloudSearch was announced...
     •  Early general adopter - we started using CloudSearch ~10 days after the announcement
        o  We didn't get any heads-up about CloudSearch before the public announcement
        o  Wrote the code to use CloudSearch and import our existing forum posts / comments in 2 or 3 days.
           §  From decision to production!
           §  Easy to use and great documentation
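The import step mentioned on this slide can be sketched as follows. The slides do not show the actual import code, so this is only an assumption about its general shape: forum rows are converted into JSON "add" operations with `type`, `id`, and `fields` keys, which is the general form of a CloudSearch document batch. The field names (`course_id`, `author_id`, `body`), the `post_` id prefix, and the chunk size are hypothetical. The API version available at the time of the talk may have required additional keys, and the upload call itself is deliberately left out.

```python
import json


def to_batch(posts):
    """Turn forum rows into a list of 'add' operations for a document batch."""
    batch = []
    for post in posts:
        batch.append({
            "type": "add",
            "id": f"post_{post['id']}",          # hypothetical id scheme
            "fields": {                          # hypothetical index fields
                "course_id": post["course_id"],
                "author_id": post["author_id"],
                "body": post["body"],
            },
        })
    return batch


def chunked(items, size):
    """Yield successive slices so each upload stays below a size limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical rows as they might come out of the forum database.
rows = [
    {"id": 1, "course_id": "crypto-001", "author_id": 7, "body": "When is HW1 due?"},
    {"id": 2, "course_id": "crypto-001", "author_id": 9, "body": "Sunday night."},
    {"id": 3, "course_id": "modpo-001", "author_id": 7, "body": "Loved this poem."},
]

for part in chunked(to_batch(rows), 2):
    payload = json.dumps(part)   # this string would be sent to the document endpoint
    print(len(part), "documents,", len(payload), "bytes")
```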
  7. CloudSearch Uses
     •  User-facing forum search
  8. CloudSearch Uses
     •  Analytics
        o  Most frequent searches and other statistics about their courses
           §  Informing instructors about this so they can clarify information
        o  Finding posts across forums
           §  Easy for CloudSearch, normally hard because of scatter-gather problems across shards
              •  Old way: Querying 600 databases on 4 RDS servers? Not fun
           §  Usage analysis
           §  Unexpected use: Instructors often want to find all their own posts so they can save / archive common answers
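The scatter-gather problem named on this slide can be sketched as below. This is an illustration of the general pattern, not the "old way" code itself: the same query is sent to every shard (scatter), then the sorted partial results are merged and the top rows kept (gather). The in-memory `SHARDS` dictionary stands in for the per-course databases, and all names and rows are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical shards: rows are (timestamp, post_id, author_id).
SHARDS = {
    "course_a": [(105, "a3", 7), (101, "a1", 7), (100, "a0", 2)],
    "course_b": [(110, "b9", 7), (90, "b1", 3)],
    "course_c": [(95, "c4", 7)],
}


def query_shard(name, author_id, limit):
    """Stand-in for one per-database query; returns newest-first rows."""
    rows = [row for row in SHARDS[name] if row[2] == author_id]
    rows.sort(reverse=True)
    return rows[:limit]


def scatter_gather(author_id, limit):
    """Query every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(
            pool.map(lambda name: query_shard(name, author_id, limit), SHARDS)
        )
    # Each partial list is already newest-first, so a k-way merge is enough.
    merged = heapq.merge(*partials, reverse=True)
    return [post_id for _, post_id, _ in islice(merged, limit)]


print(scatter_gather(author_id=7, limit=3))
```

With hundreds of shards, every cross-forum query pays for the slowest shard plus the merge. A single search index over all posts avoids that fan-out, which is the point the slide makes.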
  9. CloudSearch Scale
     •  Moderate scale
     •  ~1.5 million documents indexed
        o  All forum posts and comments
     •  50,000+ searches a day
        o  Spiky! Depends on when homework is due.
  10. Experience
     •  GREAT!
  11. We Want...
     •  "Did you mean..."
        o  Lots of typos from non-native speakers
     •  Multilingual Tokenization / Search
        o  We are starting to run courses in other languages...
     •  Find Similar Documents
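To make the "Did you mean..." request concrete, here is a minimal client-side sketch. The slide lists it as a wished-for feature, so this is not a CloudSearch feature. It uses Python's standard `difflib` to suggest the closest known term for each misspelled word. The vocabulary and the similarity cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical vocabulary, e.g. terms harvested from indexed forum posts.
VOCABULARY = ["cryptography", "gradient", "regression", "poetry", "algorithm"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query string, or None if nothing would change."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Best single match whose similarity ratio is at least `cutoff`.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


print(did_you_mean("cryptografy algoritm"))
print(did_you_mean("poetry"))
```

The cutoff trades missed suggestions against wrong ones. It would need tuning against real queries, especially for the non-native-speaker typos the slide mentions.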
  12. Thank You! Questions?