Coursera + AWS CloudSearch
Frank Chen, Software Engineer
About
• Ed-tech startup providing MOOCs
  o Massive Open Online Courses
• New company: launched 4/18/12
  o Less than a year old
• 215 free courses from 33 top universities
  o Princeton, Stanford, Penn, Duke, etc.
  o From Cryptography to Modern and Contemporary American Poetry
• 2.5+ million users
  o We reached a million users faster than Facebook and Pinterest
• ~9 million course enrollments
Platform Scale
• Moderate-sized (>10,000 concurrent users)
• 65 courses running concurrently, each with tens of thousands of enrollments
• >600 "pretty heavy" PHP/Python dynamic pages served per second, sustained
  o Pages may make backend calls to services (e.g. CloudSearch or SES), so we want low latencies
• Various other services (70+ instances running on EC2 at the moment)
• Spiky traffic
  o People procrastinate on deadlines, so traffic spikes on the weekends
Stack
• PHP / Python / Scala, backed by MySQL
• Runs entirely on AWS
• Uses lots of AWS services
  o EC2 / ELB for servers
  o MySQL on RDS for databases
  o S3 for video and static hosting
  o CloudFront for video / asset delivery
  o SES for email (>1 million emails every day)
  o SQS for long-running tasks (video encoding, gradebook generation, etc.)
  o SNS for notifications
  o Route 53 for DNS
  o CloudSearch for forum search
Why CloudSearch?
• Search was a big issue for us back in March / April; the solution we had then didn't work
  o MySQL full-text search
    § LIKE '%x%'? IN NATURAL LANGUAGE MODE?
    § Really terrible results
    § MyISAM (eww...)
• Requirements:
  o Fast searches (we call backend APIs and don't want to keep users waiting too long)
  o Good results (they need to be relevant; don't waste the students' time)
  o Low/no maintenance (we have enough instances to manage as it is)
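The "really terrible results" come partly from how a `LIKE '%x%'` filter behaves. It matches substrings anywhere and has no notion of relevance. The toy sketch below contrasts that with whole-word matching plus a simple TF-IDF score. The sample posts are made up, and TF-IDF is only a stand-in for "relevance ranking". It is not a description of what CloudSearch does internally.

```python
import math
import re
from collections import Counter


def like_search(posts, term):
    """Roughly what WHERE body LIKE '%term%' does: case-insensitive substring match, no ranking."""
    needle = term.lower()
    return [p for p in posts if needle in p.lower()]


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ranked_search(posts, query):
    """Match whole words only and order results by a simple TF-IDF score."""
    docs = [Counter(tokenize(p)) for p in posts]
    n = len(docs)
    terms = tokenize(query)
    # Document frequency: how many posts contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    scored = []
    for post, counts in zip(posts, docs):
        # Rarer terms get a higher weight; Counter returns 0 for absent words.
        score = sum(counts[t] * math.log(1 + n / df[t]) for t in terms if df[t])
        if score > 0:
            scored.append((score, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored]


# Hypothetical forum posts.
posts = [
    "Is the cat example in lecture 3 correct?",
    "Certificate of accomplishment question",
    "Education research links",
    "Cat and dog classifier: cat images mislabeled",
]
print(like_search(posts, "cat"))    # all four posts, in table order
print(ranked_search(posts, "cat"))  # only the two posts about cats, best match first
```

The substring filter also returns "Certificate" and "Education" because both contain the letters "cat". Nothing in its output says which hit is most useful to a student.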
Why CloudSearch?
• Alternatives we looked at:
  o Apache Solr, Sphinx, fiddling with MySQL
• Then CloudSearch was announced...
• Early adopter from the general public: we started using CloudSearch ~10 days after the announcement
  o We didn't get any heads-up about CloudSearch before the public announcement
  o Wrote the code to use CloudSearch and import our existing forum posts / comments in 2 or 3 days
    § From decision to production!
    § Easy to use, with great documentation
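Importing existing posts mostly means converting database rows into document batches that CloudSearch can ingest. The Python sketch below shows one way to do that conversion. Several details are assumptions, not Coursera's actual code:

- the column and field names (`course_id`, `title`, `body`);
- the hypothetical `post_` id prefix;
- the 5 MB per-batch limit.

Required fields also differ between CloudSearch API versions (the original 2011 API also expected `version` and `lang` on each add), so check the documentation for the version you target.

```python
import json

# Assumed per-request batch size limit; check the current CloudSearch limits.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def to_add_op(post):
    """Convert one forum post row (hypothetical column names) into an "add" operation."""
    return {
        "type": "add",
        "id": f"post_{post['id']}",
        "fields": {
            "course_id": post["course_id"],
            "title": post["title"],
            "body": post["body"],
        },
    }


def batch_documents(posts, max_bytes=MAX_BATCH_BYTES):
    """Yield JSON-array batches whose UTF-8 size stays within max_bytes.

    A single post that is larger than max_bytes on its own still gets its own batch.
    """
    batch, size = [], 2  # an empty batch is just "[]"
    for post in posts:
        op = json.dumps(to_add_op(post))
        op_bytes = len(op.encode("utf-8"))
        # +1 accounts for the comma that would separate op from the previous entry.
        if batch and size + 1 + op_bytes > max_bytes:
            yield "[" + ",".join(batch) + "]"
            batch, size = [], 2
        size += op_bytes + (1 if batch else 0)
        batch.append(op)
    if batch:
        yield "[" + ",".join(batch) + "]"


# Hypothetical rows as they might come back from MySQL.
rows = [{"id": i, "course_id": "crypto", "title": f"Question {i}", "body": "..."} for i in range(3)]
for payload in batch_documents(rows, max_bytes=200):
    print(payload)
```

Each payload would then be sent to the search domain's document endpoint. With today's boto3 SDK, for example, the `cloudsearchdomain` client's `upload_documents` method does this (it targets the newer 2013 API).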
CloudSearch Uses
• Analytics
  o Most frequent searches and other statistics about each course
    § We pass these on to instructors so they can clarify information
  o Finding posts across forums
    § Easy for CloudSearch; normally hard because of sharded scatter-gather problems
      • Old way: querying 600 databases on 4 RDS servers? Not fun
    § Usage analysis
    § Unexpected use: instructors often want to find all their own posts so they can save / archive common answers
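The "sharded scatter-gather problem" works like this: when forum data is spread across many databases, every cross-forum query must be sent to each shard and the partial results merged. The in-memory Python sketch below contrasts that with asking one central index. The shard data and function names are hypothetical. The lists stand in for per-database SQL queries and for the search index itself.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Stand-in for one per-database query: matching (timestamp, text) rows, newest first."""
    hits = [(ts, text) for ts, text in shard if term in text.lower()]
    return sorted(hits, reverse=True)


def scatter_gather(shards, term, limit=10):
    """Scatter the query to every shard in parallel, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        per_shard = list(pool.map(lambda shard: search_shard(shard, term), shards))
    # Each per-shard list is already sorted newest-first, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, reverse=True)
    return [text for _, text in itertools.islice(merged, limit)]


def single_index_search(index, term, limit=10):
    """With one central index holding every forum, the same question is a single lookup."""
    return [text for _, text in search_shard(index, term)[:limit]]


# Hypothetical forum data split across three "databases".
shards = [
    [(1, "hw1 due date?"), (5, "hw2 hint please")],
    [(3, "lecture notes"), (4, "typo in hw1")],
    [(2, "hw3 posted")],
]
print(scatter_gather(shards, "hw", limit=3))
central = [row for shard in shards for row in shard]
print(single_index_search(central, "hw", limit=3))
```

Both calls return the same three posts. With 600 databases, however, the scatter-gather version needs 600 round trips plus a merge for every search. The single index needs one request.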
CloudSearch Scale
• Moderate scale
• ~1.5 million documents indexed
  o All forum posts and comments
• 50,000+ searches a day
  o Spiky! Depends on when homework is due.
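Averaged over a whole day, the search volume is well under one search per second. This suggests the homework-deadline spikes, not the mean, are what matter for sizing:

$$
\frac{50{,}000\ \text{searches/day}}{86{,}400\ \text{s/day}} \approx 0.58\ \text{searches/s}
$$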