Coursera amazon cloudsearch presentation

  1. Coursera + AWS CloudSearch
     Frank Chen, Software Engineer
  2. About
     • Ed-Tech startup providing MOOCs
       o Massive Open Online Courses
     • New company -- launched 4/18/12
       o Less than a year old.
     • 215 free courses from 33 top universities
       o Princeton, Stanford, Penn, Duke, etc.
       o From Cryptography to Modern and Contemporary American Poetry
     • 2.5+ million users
       o We reached a million users faster than Facebook and Pinterest.
     • ~9 million course enrollments
  3. Platform Scale
     • Moderate-sized (>10,000 concurrent users)
     • 65 concurrent courses running now, each with tens of thousands of enrollments
     • >600 "pretty heavy" PHP/Python dynamic pages served per second, sustained
       o Pages might make backend calls to services (e.g. CloudSearch or SES), so we want low latencies
     • Various other services (70+ instances on EC2 running at the moment)
     • Spiky traffic
       o People procrastinate on deadlines, so traffic spikes on the weekends
  4. Stack
     • PHP / Python / Scala backed by MySQL
     • Runs completely on AWS
     • Utilizes lots of AWS services
       o EC2 / ELB for servers
       o MySQL RDS for databases
       o S3 for video and static hosting
       o CloudFront for video / asset hosting
       o SES for emails (>1 million emails every day)
       o SQS for long-running tasks (video encoding, gradebook generation, etc.)
       o SNS for notification services
       o Route 53 for DNS
       o CloudSearch for forum search
  5. Why CloudSearch?
     • Big issue for us back in March / April. The solution we had then didn't work
       o MySQL Full Text Search
         § LIKE %x% AS NATURAL LANGUAGE?
         § Really terrible results
         § MyISAM (eww...)
     • Requirements:
       o Fast searches (we call backend APIs - don't want to keep the users waiting too long)
       o Good results (need to be relevant - don't waste the students' time)
       o Low/no maintenance (we have enough instances to manage as is)
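
To illustrate why substring matching of the `LIKE %x%` kind tends to give poor results, here is a minimal sketch. It uses Python's built-in SQLite as a stand-in for MySQL, and the `posts` table and sample rows are hypothetical, not Coursera's schema.

```python
import sqlite3

# Hypothetical forum posts; SQLite stands in for MySQL in this sketch.
POSTS = [
    (1, "How is the late penalty for homework 2 calculated?"),
    (2, "Can someone translate the lecture 3 slides?"),
    (3, "Plate tectonics reading list"),
    (4, "Office hours moved to Friday"),
]


def like_search(conn, term):
    """Return ids of posts whose body contains `term` as a raw substring."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE body LIKE '%' || ? || '%' ORDER BY id",
        (term,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", POSTS)

# "late" also matches "translate" and "Plate", and nothing ranks the hits.
print(like_search(conn, "late"))
```

Searching for "late" returns posts 1, 2 and 3: only the first is about late penalties, and there is no relevance ordering to push it to the top.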
  6. Why CloudSearch?
     • Alternatives we looked at:
       o Apache Solr, Sphinx, fiddling with MySQL
     • Then CloudSearch was announced...
     • Early general adopter - we started using CloudSearch ~10 days after the announcement
       o We didn't get any heads-up about CloudSearch before the public announcement
       o Wrote the code to use CloudSearch and import our existing forum posts / comments in 2 or 3 days.
         § From decision to production!
         § Easy to use and great documentation
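
The import step on this slide could look roughly like the sketch below, which turns existing forum rows into a JSON document batch. The batch layout (`type`, `id`, `version`, `lang`, `fields`) is modeled on the document batch format CloudSearch accepted around the time of this talk; treat the exact keys as an assumption and check the CloudSearch documentation. The field names (`course_id`, `author`, `body`) and the helper `to_document_batch` are hypothetical, and the upload itself is left out.

```python
import json


def to_document_batch(posts, version):
    """Build a JSON batch of "add" operations from forum post rows.

    The key layout is an assumption based on the 2011-era CloudSearch
    document batch format; the field names are hypothetical.
    """
    batch = []
    for post in posts:
        batch.append({
            "type": "add",
            "id": "post_{}".format(post["id"]),
            "version": version,
            "lang": "en",
            "fields": {
                "course_id": post["course_id"],
                "author": post["author"],
                "body": post["body"],
            },
        })
    return json.dumps(batch)


# Hypothetical rows, as they might come out of the forum database.
rows = [
    {"id": 101, "course_id": "crypto", "author": "student_x", "body": "Question on RSA"},
    {"id": 102, "course_id": "crypto", "author": "prof_a", "body": "Hint for problem 2"},
]
print(to_document_batch(rows, version=1))
```

Keeping the batch builder separate from the upload makes it easy to test the conversion against the existing posts before sending anything to the search domain.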
  7. CloudSearch Uses
     User-facing forum search
  8. CloudSearch Uses
     • Analytics
       o Most frequent searches and other statistics about their courses
         § Informing instructors about this so they can clarify information
       o Finding posts across forums
         § Easy for CloudSearch, hard normally because of sharded scatter-gather problems
           • Old way: Querying 600 databases on 4 RDS servers? Not fun
         § Usage analysis
         § Unexpected use: Instructors often want to find all their own posts so they can save / archive common answers
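
The "old way" on this slide is a scatter-gather query: ask every shard, then merge the partial results. A minimal sketch of that pattern follows, with in-memory lists standing in for the sharded databases. The shard names, row layout, and function names are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical shards: one list of (timestamp, author, text) rows per forum.
SHARDS = {
    "crypto": [
        (3, "prof_a", "Hint for problem 2"),
        (1, "student_x", "Question on RSA"),
        (7, "prof_a", "Deadline extended"),
    ],
    "poetry": [
        (5, "prof_a", "Reading list posted"),
        (4, "student_y", "Favorite poem?"),
    ],
}


def search_shard(rows, author):
    """Scatter step: return one shard's posts by `author`, newest first."""
    hits = [row for row in rows if row[1] == author]
    return sorted(hits, key=lambda row: row[0], reverse=True)


def scatter_gather(shards, author, limit):
    """Gather step: merge the per-shard results and keep the newest `limit`."""
    partials = [search_shard(rows, author) for rows in shards.values()]
    merged = heapq.merge(*partials, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))


for row in scatter_gather(SHARDS, "prof_a", limit=2):
    print(row)
```

With hundreds of databases, every query pays for the slowest shard plus the merge, which is the cost a single search index avoids.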
  9. CloudSearch Scale
     • Moderate scale
     • ~1.5 million documents indexed
       o All forum posts and comments
     • 50,000+ searches a day
       o Spiky! Depends on when homework is due.
  10. Experience
      GREAT!
  11. We Want...
      • "Did you mean..."
        o Lots of typos from non-native speakers
      • Multilingual Tokenization / Search
        o We are starting to run courses in other languages...
      • Find Similar Documents
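
As a rough idea of what a "Did you mean..." feature does, the sketch below suggests corrections on the client side using Python's standard `difflib`. This is not a CloudSearch feature; the vocabulary, the `did_you_mean` helper, and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical vocabulary, e.g. terms collected from indexed forum posts.
VOCABULARY = ["gradient", "regression", "cryptography", "deadline", "assignment"]


def did_you_mean(query, vocabulary, cutoff=0.8):
    """Return a corrected query, or None if no word needed correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest vocabulary term by difflib's similarity ratio, if any.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


print(did_you_mean("gradiant regresion", VOCABULARY))
```

A production version would need a much larger vocabulary and some weighting by term frequency, so that common words win over rare near-matches.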
  12. Thank You!
      Questions? frank@coursera.org
