Coursera Amazon CloudSearch presentation



  1. Coursera + AWS CloudSearch: Frank Chen, Software Engineer
  2. About
     • Ed-tech startup providing MOOCs
        o Massive Open Online Courses
     • New company, launched 4/18/12
        o Less than a year old
     • 215 free courses from 33 top universities
        o Princeton, Stanford, Penn, Duke, etc.
        o From Cryptography to Modern and Contemporary American Poetry
     • 2.5+ million users
        o We reached a million users faster than Facebook and Pinterest
     • ~9 million course enrollments
  3. Platform Scale
     • Moderate-sized (>10,000 concurrent users)
     • 65 concurrent courses running now, each with tens of thousands of enrollments
     • >600 "pretty heavy" PHP/Python dynamic pages served per second, sustained
        o Pages may make backend calls to services (e.g., CloudSearch or SES), so we want low latencies
     • Various other services (70+ EC2 instances running at the moment)
     • Spiky traffic
        o People procrastinate until deadlines, so traffic spikes on weekends
  4. Stack
     • PHP / Python / Scala backed by MySQL
     • Runs completely on AWS
     • Uses many AWS services
        o EC2 / ELB for servers
        o MySQL RDS for databases
        o S3 for video and static hosting
        o CloudFront for video / asset hosting
        o SES for email (>1 million emails every day)
        o SQS for long-running tasks (video encoding, gradebook generation, etc.)
        o SNS for notification services
        o Route 53 for DNS
        o CloudSearch for forum search
  5. Why CloudSearch?
     • Search was a big issue for us back in March / April; the solution we had then didn't work
        o MySQL full-text search
           § LIKE %x% AS NATURAL LANGUAGE?
           § Really terrible results
           § MyISAM (eww...)
     • Requirements:
        o Fast searches (we call backend APIs and don't want to keep users waiting too long)
        o Good results (they need to be relevant; don't waste the students' time)
        o Low/no maintenance (we have enough instances to manage as is)
  6. Why CloudSearch?
     • Alternatives we looked at:
        o Apache Solr, Sphinx, fiddling with MySQL
     • Then CloudSearch was announced...
     • Early general adopter: we started using CloudSearch ~10 days after the announcement
        o We didn't get any heads-up about CloudSearch before the public announcement
        o Wrote the code to use CloudSearch and import our existing forum posts / comments in 2 or 3 days
           § From decision to production!
           § Easy to use and great documentation
  7. CloudSearch Uses: user-facing forum search
  8. CloudSearch Uses
     • Analytics
        o Most frequent searches and other statistics about their courses
           § Informing instructors about these so they can clarify information
        o Finding posts across forums
           § Easy with CloudSearch; normally hard because of the scatter-gather problem across shards
              • Old way: querying 600 databases on 4 RDS servers? Not fun
           § Usage analysis
           § Unexpected use: instructors often want to find all their own posts so they can save / archive common answers
  9. CloudSearch Scale
     • Moderate scale
     • ~1.5 million documents indexed
        o All forum posts and comments
     • 50,000+ searches a day
        o Spiky! Depends on when homework is due.
  10. Experience: GREAT!
  11. We Want...
     • "Did you mean..."
        o Lots of typos from non-native speakers
     • Multilingual tokenization / search
        o We are starting to run courses in other languages...
     • Find similar documents
  12. Thank You! Questions?
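Slide 6 notes that the code to import the existing forum posts and comments took only 2 or 3 days to write. The talk doesn't show that code, so the Python below is only a minimal sketch of the kind of batching such an import needs, under stated assumptions. The post schema, the `post_<id>` document IDs, the field names, and the helper names are all hypothetical. The operation shape (`type`, `id`, `fields`) and the 5 MB batch cap follow AWS's documentation for the current (2013-01-01) CloudSearch API, which differs somewhat from the API version available in 2012. Uploading each batch to the domain's document endpoint is left out.

```python
import json
from typing import Iterable, Iterator

# Assumed limit: AWS documents a 5 MB maximum per document batch for the
# 2013-01-01 CloudSearch API. Check current service limits before relying on it.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def to_add_op(post: dict) -> dict:
    """Convert one forum post (hypothetical schema) into an 'add' operation."""
    return {
        "type": "add",
        "id": f"post_{post['id']}",  # hypothetical ID scheme
        "fields": {  # hypothetical index fields
            "title": post["title"],
            "body": post["body"],
            "course_id": post["course_id"],
        },
    }


def batch_documents(posts: Iterable[dict], max_bytes: int = MAX_BATCH_BYTES) -> Iterator[str]:
    """Yield JSON arrays of add operations, each at most max_bytes when UTF-8 encoded."""
    batch: list[str] = []
    size = 2  # bytes for the enclosing "[" and "]"
    for post in posts:
        op = json.dumps(to_add_op(post))
        op_bytes = len(op.encode("utf-8"))
        cost = op_bytes + (1 if batch else 0)  # +1 for the "," separator
        if batch and size + cost > max_bytes:
            # Current batch is full: emit it and start a new one with this op.
            yield "[" + ",".join(batch) + "]"
            batch, size, cost = [], 2, op_bytes
        if size + cost > max_bytes:
            raise ValueError(f"post {post['id']} alone exceeds the batch size limit")
        batch.append(op)
        size += cost
    if batch:
        yield "[" + ",".join(batch) + "]"
```

Because `batch_documents` accepts any iterable and holds at most one batch in memory, posts can be streamed from the database rather than loaded all at once.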
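Slide 8 contrasts cross-forum search in CloudSearch with the "old way" of querying 600 databases. Here is a minimal Python sketch of that scatter-gather pattern, assuming each shard can return its matches newest-first: every shard gets its own query, and the partial results are then merged. The in-memory shards, the `(timestamp, forum_id, text)` tuples, and the substring matching are hypothetical stand-ins for per-shard SQL queries. With a single search index, the same request is one query.

```python
import heapq
from itertools import islice

# Hypothetical post record: (timestamp, forum_id, text).
Post = tuple[int, str, str]


def search_shard(shard: list[Post], term: str) -> list[Post]:
    """Stand-in for one per-shard query: case-insensitive substring match, newest first."""
    hits = [p for p in shard if term.lower() in p[2].lower()]
    return sorted(hits, key=lambda p: p[0], reverse=True)


def scatter_gather(shards: list[list[Post]], term: str, limit: int = 10) -> list[Post]:
    """Scatter the query to every shard, then gather by merging the sorted partial results."""
    partials = [search_shard(shard, term) for shard in shards]  # one round trip per shard
    merged = heapq.merge(*partials, key=lambda p: p[0], reverse=True)
    return list(islice(merged, limit))
```

In this sequential sketch, total latency is the sum of every shard's round trip; running the queries in parallel reduces that to the slowest shard. Either way, each shard has to return its own top `limit` rows for the merged top `limit` to be correct.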
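A quick back-of-the-envelope check on slide 9's "moderate scale": 50,000 searches spread evenly over a day average out to well under one search per second. Because the slide notes that traffic clusters around homework deadlines, actual peak rates would be some multiple of this average. The slides don't give a peak figure.

```latex
\frac{50{,}000\ \text{searches/day}}{86{,}400\ \text{s/day}} \approx 0.58\ \text{searches/s}
```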