Cornell University, Central Syllabi - A look at the ingredients for a major new Class Roster feature.

Using Amazon S3 & CloudFront for syllabus storage & delivery; SNS & SQS for a message bus; and Elasticsearch on ECS for content search.

Eric Grysko (Software Engineer, Student Services IT)

Presented on Aug 30 2017 to Software Development - Special Interest Group at Cornell University

Published in: Engineering

  1. Student Services IT: Central Syllabi. A look at the ingredients for a major new Class Roster feature. classes.cornell.edu. ECS, S3, SNS, SQS, CloudFront, SES
  2. Student Services IT: Supported Customers
     ‣ Division of Student and Campus Life (SCL): Athletics & Physical Education, Dean of Students, Career Services, Cornell Health, Chimes, Public Service Center, Campus Life Enterprise Services (Cornell Store, Housing, Dining...), Student Disability Services, Campus & Community Engagement, ... and more
     ‣ Vice Provost for Enrollment: University Registrar, Student Employment, Financial Aid, Undergraduate Admissions, Graduate School, ... and more
  3. Central Syllabi: Class Roster
     ‣ Official schedule of classes: classes.cornell.edu
     ‣ Office of the University Registrar
     ‣ October 2014: Launched on-premises. Symfony Framework (PHP), MongoDB, MySQL
     ‣ February 2016: AWS Migration. AWS (EC2, RDS, ELB, ASG), Jenkins, Ansible
     ‣ March 2016: Released "Scheduler". AngularJS
     ‣ June 2017: Released "Central Syllabi"
  4. Central Syllabi: New Functionality - June 2017
     ‣ Expanded API: Internal & Academic Units
     ‣ Syllabus Upload: Instructors & Admin. Asst.
     ‣ Syllabus Download: All
     ‣ Syllabus Content Search: All
     ‣ Email Alerts: All
     ‣ Reports: Instructors, Admin. Asst., and Staff
  5. Central Syllabi: New Ingredients
     ‣ Expanded API: Internal & Academic Units
     ‣ Syllabus Upload: Instructors & Admin. Asst.
     ‣ Syllabus Download: All
     ‣ Syllabus Content Search: All
     ‣ Email Alerts: All
     ‣ Reports: Instructors, Admin. Asst., and Staff
     Ingredients:
     ‣ Message Bus: Amazon SNS + SQS
     ‣ Object Storage: Amazon S3
     ‣ Content Delivery: Amazon CloudFront
     ‣ Full Text Search: Elasticsearch on Amazon ECS
     ‣ Notifications: Amazon SES (API, not SMTP)
  6. Central Syllabi: Expanded API
     ‣ Class Roster introduced "API 4.0" to add support for the "Central Syllabi" AngularJS app
     ‣ Stumbling into an event-driven model and a message bus
     ‣ Amazon SNS - Simple Notification Service
       ‣ Publish/Subscribe (Pub/Sub) messaging service
       ‣ Publish messages to an SNS Topic using the AWS SDK/API
       ‣ Subscribe endpoints (SQS, HTTPS, AWS Lambda, SMS, etc.)
     ‣ Amazon SQS - Simple Queue Service
       ‣ SQS Queue subscribed to SNS Topic
       ‣ ReceiveMessage from SQS using the AWS SDK/API
       ‣ At-least-once delivery; no guarantee on the order of messages received
     ‣ Easy entry point to pub/sub
       ‣ AWS SDK; serialize/deserialize objects; push to a topic and receive from a queue
     ‣ Pricing: less than $1 per million messages
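The "serialize/deserialize objects; push to a topic and receive from a queue" step can be sketched as follows. This is a Python illustration using only the standard library, not the PHP code of Class Roster. It assumes raw message delivery is not enabled on the subscription, in which case SNS wraps the published text in a JSON envelope whose `Message` field holds the original string. The `wrap_like_sns` helper and the topic ARN are hypothetical and exist only so the example runs locally.

```python
import json


def wrap_like_sns(event: dict, topic_arn: str) -> str:
    """Build a body shaped like an SNS notification (local stand-in only)."""
    return json.dumps({
        "Type": "Notification",
        "TopicArn": topic_arn,
        # SNS carries the published payload as a string, so it is JSON inside JSON.
        "Message": json.dumps(event),
    })


def unwrap_sns_envelope(sqs_body: str) -> dict:
    """Return the application event carried inside an SNS notification."""
    envelope = json.loads(sqs_body)
    if envelope.get("Type") != "Notification":
        raise ValueError("not an SNS notification")
    return json.loads(envelope["Message"])


# Hypothetical event and topic ARN, for illustration only.
event = {"type": "SyllabusPublish", "rosterId": 13}
body = wrap_like_sns(event, "arn:aws:sns:us-east-1:000000000000:example-topic")
received = unwrap_sns_envelope(body)
```

In a real worker the body would come from an SQS ReceiveMessage call and not from `wrap_like_sns`.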
  7. Central Syllabi: Events Published to SNS Topic (JSON)
     SyllabusCreate, SyllabusFileUpload, SyllabusAttach, ReportGenerate, SyllabusUpdate, SyllabusFileDelete, SyllabusDetach, ReportRun, SyllabusReplace, SyllabusFileUnsafe, SyllabusPublish, RosterRefresh, SyllabusDelete, SyllabusUnpublish, SyllabusSearchIngest
  8. Central Syllabi: Event Processing
     ‣ Publish hydrated messages to SNS
     ‣ SQS Queue subscribed to SNS Topic
     ‣ Worker Process (Run Loop)
       ‣ SQS Long Polling (20s) sqs:ReceiveMessage
       ‣ Processing of a message may trigger additional sns:Publish calls
       ‣ On success: sqs:DeleteMessage
     ‣ Advantages
       ‣ Decouples components
       ‣ Async/non-blocking; can parallelize
       ‣ "Maximum Receives"; redrive policy to DLQ (Dead Letter Queue)
       ‣ CloudWatch monitors message age (alarmed on failure)
       ‣ Facilitates future integrations; e.g. publishing messages not used internally (RosterRefresh)
     ‣ Challenges
       ‣ Processing must be idempotent ("at least once", order not guaranteed); not using FIFO
       ‣ Fast (async)
     Example message:
       {
         "type": "SyllabusPublish",
         "syllabusId": "...",
         "rosterId": 13,
         "roster": { "strm": 2678, "rosterSlug": "FA17", ... },
         "syllabus": { "viewPermission": ..., "publishedDttm": ..., "resourceId": ..., ... }
       }
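The idempotency challenge above can be illustrated with a small simulation. This is a Python sketch under stated assumptions, not the actual worker: a `deque` stands in for the SQS queue, the `published` list stands in for sns:Publish, and the deduplication key (type, syllabusId, publishedDttm) is a hypothetical choice made only for this example.

```python
import json
from collections import deque

processed_keys = set()   # hypothetical record of events already handled
published = []           # stand-in for follow-up sns:Publish calls


def handle(event: dict) -> bool:
    """Process one event idempotently; return True only if work was done."""
    key = (event["type"], event["syllabusId"], event.get("publishedDttm"))
    if key in processed_keys:
        return False                      # duplicate delivery: safe to ignore
    processed_keys.add(key)
    if event["type"] == "SyllabusPublish":
        # Processing one event may trigger another, as on the slide.
        published.append({"type": "SyllabusSearchIngest",
                          "syllabusId": event["syllabusId"]})
    return True


def run_once(queue: deque) -> int:
    """Drain the queue, deleting each message only after it was handled."""
    handled = 0
    while queue:
        body = queue[0]                   # receive: message stays on the queue
        if handle(json.loads(body)):
            handled += 1
        queue.popleft()                   # delete after successful processing
    return handled


# Simulate at-least-once delivery: the same message arrives twice.
message = json.dumps({"type": "SyllabusPublish", "syllabusId": "abc",
                      "publishedDttm": "2017-08-28T23:19:49Z"})
handled_count = run_once(deque([message, message]))
```

Because the second delivery is recognised as a duplicate, only one SyllabusSearchIngest follow-up is recorded.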
  9. Central Syllabi: Syllabus Upload
     ‣ Amazon S3 (Simple Storage Service)
       ‣ Unlimited object storage (Key/Value); not a file system
       ‣ Durability: 99.999999999%
       ‣ Pricing: about 2 cents per gigabyte-month
     ‣ S3 Bucket "cu-classroster-prod"
       ‣ Access is decided by evaluating several permission layers (IAM, Bucket Policy, ACL)
       ‣ Bucket policy: requires encryption (AES256) and grants CloudFront OAI (user) access
       ‣ CORS Policy
       ‣ IAM Role w/policy: GetObject, PutObject only. No DeleteObject
     ‣ Class Roster provides clients temporary credentials to enable direct upload to S3
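A bucket policy of the kind described above (require AES256 encryption, grant read access to a CloudFront OAI) might look roughly like the following. This is a hedged Python sketch that only builds the policy document. The statement IDs and the OAI identifier are hypothetical, and the actual policy on "cu-classroster-prod" is not shown in the slides.

```python
import json


def build_bucket_policy(bucket: str, oai_id: str) -> dict:
    """Return an illustrative S3 bucket policy document."""
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject uploads that do not ask for AES256 server-side encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            },
            {
                # Let the CloudFront origin access identity read objects.
                "Sid": "AllowCloudFrontRead",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/"
                           f"CloudFront Origin Access Identity {oai_id}"
                },
                "Action": "s3:GetObject",
                "Resource": objects,
            },
        ],
    }


# "EXAMPLEOAIID" is a placeholder, not a real identity.
policy = build_bucket_policy("cu-classroster-prod", "EXAMPLEOAIID")
policy_json = json.dumps(policy, indent=2)
```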
  10. Central Syllabi: Create a POST Policy
     ‣ AngularJS app w/API 4.0
     ‣ Client given API Token
     ‣ Client issues $http request
     ‣ Client request includes an Authorization: header
     ‣ Class Roster API uses the AWS SDK to generate and return a security policy and signature to the client
     ‣ Client uploads to S3
       ‣ PutObject (V4): http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html
     ‣ Client notifies API on completion
     ‣ SyllabusFileUpload event published to SNS
       {
         "s3Key": "prod/syllabus-file/2678/ae51683709939c0f2ab250e9511606803ab0020e6024e03",
         "formInputs": {
           "acl": "private",
           "X-Amz-Security-Token": "...",
           "key": "${filename}",
           "X-Amz-Credential": "ASIAJKOTJGRPYKUDMUHA/20170828/us-east-1/s3/aws4_request",
           "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
           "X-Amz-Date": "20170828T231949Z",
           "Policy": "...",
           "X-Amz-Signature": "..."
         },
         "formAttributes": {
           "action": "https://cu-classroster-prod.s3.amazonaws.com",
           "method": "POST",
           "enctype": "multipart/form-data"
         },
         "fileId": "4c06f8ee0830e747eb791dd01b9a0b7796"
       }
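The slides say the AWS SDK generates the policy and signature. The sketch below shows, in standard-library Python, the Signature Version 4 steps that such a call performs for a browser POST: base64-encode the policy, derive a signing key with chained HMAC-SHA256, and sign the encoded policy. The policy conditions, expiration, and secret key are made-up placeholders; this is not the Class Roster implementation.

```python
import base64
import hashlib
import hmac
import json
from typing import Tuple


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_post_policy(policy: dict, secret_key: str,
                     date: str, region: str) -> Tuple[str, str]:
    """Return (base64 policy, hex signature) for an S3 POST upload form."""
    encoded = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    # SigV4 key derivation: date -> region -> service -> "aws4_request".
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")
    # For POST uploads the string to sign is the base64-encoded policy itself.
    signature = hmac.new(k_signing, encoded.encode("ascii"),
                         hashlib.sha256).hexdigest()
    return encoded, signature


# Placeholder policy and secret; values are illustrative only.
example_policy = {
    "expiration": "2017-08-29T00:00:00Z",
    "conditions": [
        {"bucket": "cu-classroster-prod"},
        {"acl": "private"},
        ["starts-with", "$key", "prod/syllabus-file/"],
        {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
        {"x-amz-date": "20170828T231949Z"},
    ],
}
encoded_policy, signature = sign_post_policy(
    example_policy, "EXAMPLE-SECRET-KEY", "20170828", "us-east-1")
```

The two returned values correspond to the "Policy" and "X-Amz-Signature" form inputs shown above.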
  11. Central Syllabi: Syllabus Download
     ‣ Amazon CloudFront (Content Delivery Network)
     ‣ CloudFront vs S3
       ‣ Or ... why are we not using Amazon S3 for delivery?
     ‣ Distribution
       ‣ Vanity domain: files.classes.cornell.edu + SSL Certificate (AWS Certificate Manager)
       ‣ Behaviors (path routing to origin)
       ‣ Origin (S3)
       ‣ Allow CloudFront distribution to access origin via bucket policy w/OAI (Origin Access Identity)
     ‣ Class Roster authorizes client to access private content
  12. Central Syllabi: CloudFront Signed URLs
     ‣ Signed URL vs Signed Cookie
     ‣ Signed URL: similar in concept to S3 signed URLs, but uses a key pair (root credentials needed to create it)
       ‣ Short expiration
       ‣ content-disposition to rename
     ‣ Client requests syllabus from Class Roster
       ‣ https://classes.cornell.edu/download/syllabus-simple/FA17/AEM/1220/1/001
     ‣ Server returns 302 Redirect to Signed URL:
       ‣ https://files.classes.cornell.edu/prod/syllabus-file/2678/16d90dc10961b1f4aaf758691fc0ca82b8ca5accc487c85f477a020dc65ffbd
         ?response-content-disposition=inline;filename="Syllabus-FA17-AEM1220-LEC001-PRIOR-TERM.pdf"
         &Expires=<expireUnixTime>
         &Signature=<base64Text>
         &Key-Pair-Id=<access key>
     ‣ http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-urls.html
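How the Expires, Signature, and Key-Pair-Id parameters fit together can be sketched as follows for a canned policy. This is standard-library Python, so the RSA-SHA1 signing step that CloudFront requires is passed in as a callable. The `fake_signer` used here is only a stand-in and does NOT produce a valid signature, and the URL and key pair ID are placeholders.

```python
import base64
import hashlib
import json
from typing import Callable


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then replace the characters that are invalid in URLs."""
    text = base64.b64encode(data).decode("ascii")
    return text.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(url: str, expires: int) -> str:
    """Canned policy: one resource, one expiry, serialized without whitespace."""
    policy = {"Statement": [{
        "Resource": url,
        "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
    }]}
    return json.dumps(policy, separators=(",", ":"))


def signed_url(url: str, expires: int, key_pair_id: str,
               rsa_sha1_sign: Callable[[bytes], bytes]) -> str:
    """Append Expires, Signature, and Key-Pair-Id to the URL."""
    signature = rsa_sha1_sign(canned_policy(url, expires).encode("utf-8"))
    separator = "&" if "?" in url else "?"
    return (f"{url}{separator}Expires={expires}"
            f"&Signature={cloudfront_b64(signature)}"
            f"&Key-Pair-Id={key_pair_id}")


def fake_signer(message: bytes) -> bytes:
    # Stand-in only: a real signer applies RSA-SHA1 with the private key.
    return hashlib.sha1(message).digest()


example_url = "https://files.classes.cornell.edu/prod/syllabus-file/example"
result = signed_url(example_url, 1504051200, "EXAMPLEKEYPAIRID", fake_signer)
```

A short `expires` value limits how long a leaked link remains usable, which matches the "short expiration" point on the slide.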
  13. Central Syllabi: Syllabus Content Search
     ‣ Full text search
       ‣ "Options": RDBMS (MySQL) vs NoSQL (MongoDB) vs Elasticsearch
     ‣ What is Elasticsearch?
       ‣ Full text search engine based on Apache Lucene, supported by elastic.co
       ‣ Open source; massively scalable; RESTful; "E" in ELK Stack
       ‣ Not for the faint of heart
     ‣ Considered AWS Elasticsearch
       ‣ Pros: AWS managed service (backups, scaling)
       ‣ Cons: limited plugins; out of date; costly; not in VPC
       ‣ ** At time of development, lacked attachment pipeline plugin
     ‣ Decision: Elasticsearch on EC2 Container Service
       ‣ Utilize official Elastic Docker images for base:
         docker pull docker.elastic.co/elasticsearch/elasticsearch
       ‣ Elasticsearch is I/O intensive; ECS Task Definition w/placement constraint set to the container instance that has an attribute to identify our persistent EBS volume
       ‣ Ansible + Jenkins; lay down task definition, service, IAM roles, etc.
  14. Central Syllabi: Elasticsearch at a Glance
     ‣ Index - Collection of Types: Syllabi
     ‣ Type - Consistent Fields within Type: Syllabus
     ‣ Plugins: ingest-attachment (Apache Tika for text extraction)
     ‣ APIs (RESTful and JSON) - Indices, Document, Search, etc.
         PUT /syllabi
         PUT /syllabi/syllabus/<syllabusId>
         GET /_search
  15. Central Syllabi: Elasticsearch Ingestion
     ‣ Worker processing of SyllabusPublish results in new event: SyllabusSearchIngest
     ‣ Worker receives SyllabusSearchIngest
       ‣ GetObject from S3
       ‣ Document API
       ‣ PUT syllabi/syllabus/<id>
     ‣ Backup vs DR Plan
       ‣ Re-ingest all docs
       // Fetch the uploaded syllabus file from S3.
       $key = $syllabus->getGeneralFile()->getS3Key();
       try {
           $s3Object = $s3Client->getObject(
               ['Key' => $key, 'Bucket' => $this->getS3Bucket()]);
       } catch (Exception $e) {
           throw new FileNotFoundException('File Not Found');
       }
       // The attachment pipeline expects the file content as base64 text.
       $body = base64_encode((string) $s3Object['Body']);
       $esParams = [
           'index' => 'syllabi',
           'type' => 'syllabus',
           'id' => $syllabus->getId(),
           'pipeline' => 'attachment',
           'body' => [
               'rosterId' => $roster->getId(),
               'strm' => $roster->getStrm(),
               'syllabusId' => $syllabus->getId(),
               'updatedDttm' => $syllabus->getUpdatedDttm()->format(DATE_ATOM),
               'data' => $body
           ]
       ];
       // Index the document; the pipeline extracts text into attachment.content.
       $result = $this->getElasticsearchClient()->index($esParams);
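The ingestion code refers to a pipeline named 'attachment' and a base64 field named 'data', but the slides do not show the pipeline definition. A minimal definition for the ingest-attachment plugin might look like the sketch below, written in Python only to build the JSON bodies. The description text and the sample field values are assumptions.

```python
import base64
import json


def attachment_pipeline(source_field: str = "data") -> dict:
    """Body for PUT _ingest/pipeline/attachment (illustrative)."""
    return {
        "description": "Extract text from base64-encoded syllabus files",
        # The attachment processor reads base64 content from source_field and
        # writes extracted text to attachment.content by default.
        "processors": [{"attachment": {"field": source_field}}],
    }


def syllabus_document(file_bytes: bytes, syllabus_id: str, roster_id: int) -> dict:
    """Body for PUT syllabi/syllabus/<id>?pipeline=attachment (illustrative)."""
    return {
        "rosterId": roster_id,
        "syllabusId": syllabus_id,
        "data": base64.b64encode(file_bytes).decode("ascii"),
    }


pipeline_body = json.dumps(attachment_pipeline())
# Placeholder file content and IDs, for illustration only.
document_body = syllabus_document(b"%PDF-1.4 example", "example-id", 13)
```

With a pipeline like this in place, the search on 'attachment.content' in the query slide reads the text that the processor extracted at index time.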
  16. Central Syllabi: Elasticsearch Query
     ‣ Search API
       ‣ Example query
       ‣ GET /_search
     ‣ Room for improvement
       ‣ Fuzzy matches
       ‣ Phonetic matching
       $params = [
           'index' => 'syllabi',
           'type' => 'syllabus',
           'size' => 50,
           'body' => [
               // Return only the syllabus ID for each hit.
               '_source' => ['syllabusId'],
               'query' => [
                   'bool' => [
                       // Both clauses must match: the roster and the extracted text.
                       'must' => [
                           ['match' => ['rosterId' => $roster->getId()]],
                           ['match' => ['attachment.content' => $searchQuery]]
                       ]
                   ]
               ]
           ],
           // Fail fast if Elasticsearch is slow or unreachable.
           'client' => [
               'timeout' => 5,
               'connect_timeout' => 5
           ]
       ];
       $esResults = $this->getElasticsearchClient()->search($params);
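One of the improvements listed above, fuzzy matching, could be approached with the `fuzziness` option of the `match` query. The sketch below builds the request body as a Python dict to show the shape of the change; it is an illustration under that assumption, not something the slides say was implemented.

```python
def build_search_body(roster_id: int, search_query: str,
                      fuzzy: bool = False) -> dict:
    """Return a search body like the one on the slide, optionally fuzzy."""
    content_match = {"query": search_query}
    if fuzzy:
        # "AUTO" lets Elasticsearch choose the allowed edit distance by term length.
        content_match["fuzziness"] = "AUTO"
    return {
        "_source": ["syllabusId"],
        "query": {
            "bool": {
                "must": [
                    {"match": {"rosterId": roster_id}},
                    {"match": {"attachment.content": content_match}},
                ]
            }
        },
    }


exact_body = build_search_body(13, "microeconomics")
fuzzy_body = build_search_body(13, "microeconmics", fuzzy=True)
```

Phonetic matching, the other item on the list, would need an additional analysis plugin and index mapping changes, so it is not sketched here.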
  17. Central Syllabi: Thank You. Questions?
