Coursera Amazon CloudSearch presentation

Transcript

  • 1. Coursera + AWS CloudSearch
      Frank Chen, Software Engineer
  • 2. About
      • Ed-Tech startup providing MOOCs
          o Massive Open Online Courses
      • New company -- launched 4/18/12
          o Less than a year old.
      • 215 free courses from 33 top universities
          o Princeton, Stanford, Penn, Duke, etc...
          o From Cryptography to Modern and Contemporary American Poetry
      • 2.5+ million users
          o We reached a million users faster than Facebook and Pinterest.
      • ~9 million course enrollments
  • 3. Platform Scale
      • Moderate-sized (>10,000 concurrent users)
      • 65 concurrent courses running now, each with tens of thousands of enrollments
      • >600 "pretty heavy" PHP/Python dynamic pages served per second, sustained
          o Pages might make backend calls to services (e.g. CloudSearch or SES), so we want low latencies
      • Various other services (70+ instances on EC2 running at the moment)
      • Spiky traffic
          o People procrastinate on deadlines, so traffic spikes on the weekends
  • 4. Stack
      • PHP / Python / Scala backed by MySQL
      • Runs completely on AWS
      • Utilizes lots of AWS services
          o EC2 / ELB for servers
          o MySQL RDS for databases
          o S3 for video and static hosting
          o CloudFront for video / asset hosting
          o SES for emails (>1 million emails every day)
          o SQS for long-running tasks (video encoding, gradebook generation, etc...)
          o SNS for notification services
          o Route 53 for DNS
          o CloudSearch for forum search
  • 5. Why CloudSearch?
      • Forum search was a big issue for us back in March / April. The solution we had then didn't work
          o MySQL Full Text Search
              § LIKE %x% AS NATURAL LANGUAGE?
              § Really terrible results
              § MyISAM (eww...)
      • Requirements:
          o Fast searches (we call backend APIs - don't want to keep the users waiting too long)
          o Good results (need to be relevant - don't waste the students' time)
          o Low/no maintenance (we have enough instances to manage as is)
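
The complaint about substring matching can be made concrete with a minimal sketch. It uses SQLite as a stand-in for MySQL purely so that it runs anywhere, and the `posts` table and its contents are invented for illustration; nothing here comes from Coursera's actual schema. A `LIKE '%x%'` filter matches any substring and returns rows with no relevance ordering, so a search for "art" also pulls in "party" and "start".

```python
import sqlite3


def like_search(conn, term):
    """Return post bodies containing `term` as a raw substring, oldest first."""
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT body FROM posts WHERE body LIKE ? ORDER BY id", (pattern,)
    ).fetchall()
    return [body for (body,) in rows]


# Hypothetical data: an in-memory table standing in for one forum database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (body) VALUES (?)",
    [
        ("When does the party start?",),
        ("Question about the art history essay",),
        ("Where is the art syllabus for the art course?",),
    ],
)

# All three rows match, and the irrelevant one comes back first.
results = like_search(conn, "art")
```

A search engine tokenizes text and ranks matches, which is the behavior the requirements on this slide ask for.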
  • 6. Why CloudSearch?
      • Alternatives we looked at:
          o Apache Solr, Sphinx, fiddling with MySQL
      • Then CloudSearch was announced...
      • Early general adopter - we started using CloudSearch ~10 days after the announcement
          o We didn't get any heads-up about CloudSearch before the public announcement
          o Wrote the code to use CloudSearch and import our existing forum posts / comments in 2 or 3 days.
              § From decision to production!
              § Easy to use and great documentation
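
The import step described above can be sketched roughly as follows. This is not Coursera's code: the post structure, the field names (`course`, `author`, `body`) and the `post_` ID prefix are all hypothetical, and the upload itself is left out. The sketch only assumes that CloudSearch accepts JSON document batches made of "add" operations with an `id` and a `fields` object; the exact batch schema depends on the API version, so check the documentation for the version in use.

```python
import json


def build_batch(posts):
    """Turn forum posts into a list of CloudSearch-style "add" operations."""
    batch = []
    for post in posts:
        batch.append({
            "type": "add",
            # Hypothetical ID scheme: lowercase letters, digits and underscores only.
            "id": "post_{}".format(post["id"]),
            "fields": {
                "course": post["course"],
                "author": post["author"],
                "body": post["body"],
            },
        })
    return batch


# Hypothetical rows, as they might come out of a forum database.
posts = [
    {"id": 101, "course": "crypto", "author": "alice", "body": "Is homework 2 due Sunday?"},
    {"id": 102, "course": "crypto", "author": "bob", "body": "Yes, at midnight."},
]

batch = build_batch(posts)
payload = json.dumps(batch)  # This string is what would be uploaded to the domain.
```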
  • 7. CloudSearch Uses
      • User-facing forum search
  • 8. CloudSearch Uses
      • Analytics
          o Most frequent searches and other statistics about their courses
              § Informing instructors about this so they can clarify information
          o Finding posts across forums
              § Easy for CloudSearch, normally hard because of sharded scatter-gather problems
                  • Old way: Querying 600 databases on 4 RDS servers? Not fun
              § Usage analysis
              § Unexpected use: Instructors often want to find all their own posts so they can save / archive common answers
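
The scatter-gather problem mentioned above can be illustrated with a small sketch. It is an assumption-laden toy: each "shard" is just an in-memory list of post dicts standing in for one forum database, and the function names and fields are invented. The point is the shape of the work: the same query fans out to every shard, and the partial results then have to be merged and re-sorted by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, author):
    """Scan one shard (a list of post dicts here) for posts by `author`."""
    return [post for post in shard if post["author"] == author]


def scatter_gather(shards, author, limit=10):
    """Query every shard in parallel, then merge and sort the partial results."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, author), shards))
    merged = [post for partial in partials for post in partial]
    merged.sort(key=lambda post: post["created"], reverse=True)  # newest first
    return merged[:limit]


# Hypothetical shards: three forum databases with a few posts each.
shards = [
    [{"author": "prof_a", "created": 3, "body": "See lecture 4."},
     {"author": "student", "created": 1, "body": "Where is lecture 4?"}],
    [{"author": "prof_a", "created": 7, "body": "Deadline extended."}],
    [{"author": "student", "created": 5, "body": "Thanks!"}],
]

own_posts = scatter_gather(shards, "prof_a")
```

With a single search index holding every post, the same lookup becomes one query instead of one per database.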
  • 9. CloudSearch Scale
      • Moderate scale
      • ~1.5 million documents indexed
          o All forum posts and comments
      • 50,000+ searches a day
          o Spiky! Depends on when homework is due.
  • 10. Experience
      • GREAT!
  • 11. We Want...
      • "Did you mean..."
          o Lots of typos from non-native speakers
      • Multilingual Tokenization / Search
          o We are starting to run courses in other languages...
      • Find Similar Documents
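
As a rough idea of what a "Did you mean..." feature does, here is a minimal client-side sketch. It is not a CloudSearch feature and not Coursera's approach: it simply compares each query word against a hypothetical vocabulary using Python's standard `difflib` module, and the `cutoff` value is an arbitrary assumption.

```python
import difflib


def did_you_mean(query, vocabulary, cutoff=0.8):
    """Suggest a corrected query, or return None if nothing needs correcting."""
    words = query.lower().split()
    corrected = []
    for word in words:
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest known word by similarity ratio, if any is above the cutoff.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(close[0] if close else word)
    return " ".join(corrected) if corrected != words else None


# Hypothetical vocabulary, e.g. terms that already appear in indexed posts.
vocabulary = ["cryptography", "poetry", "modern", "american", "homework", "deadline"]

suggestion = did_you_mean("cryptografy homework", vocabulary)
```

A production version would more likely draw its vocabulary from the indexed documents and weight suggestions by how often each term occurs.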
  • 12. Thank You!
      Questions? frank@coursera.org