Apereo OAE - Architectural overview


The Apereo Open Academic Environment is a platform that focusses on group collaboration between researchers, students and lecturers, and strongly embraces openness, creation, re-use, re-mixing and discovery of content, people and groups.

How does Apereo OAE work? OAE targets large-scale, multi-tenant, cloud-compatible deployments, where a single installation can host multiple institutions at the same time.

This presentation provides an overview of the different components and technologies that are being used, as well as details around deploying and configuring OAE and its associated running costs.

A summary of the approach used for continuous nightly performance testing, and of how we are validating the desired (horizontal) scalability, is provided. Details around back-end and UI unit testing, code coverage and security testing are shared, and contribution models for service development and UI development are discussed as well.

Published in: Technology

Apereo OAE - Architectural overview

  1. Apereo OAE
     Architectural Overview, San Diego 2013
     Wednesday, 12 June 13
  2. http://oae.oaeproject.org
  3. Topics
     1. Project Goals
     2. Hilary System Architecture
     3. Performance Testing
     4. Deployment and Automation
     5. UI Architecture
     6. Customization and Configuration
     7. Questions?
  4. Project goals
     • Multi-tenant platform
     • Cloud-ready
     • SaaS
     • Used at large scale
  5. Project goals
     • Maintainable
     • Extendable
     • Integrate-able
  6. Solid foundation
     Modern, not exotic
  7. Multi-tenancy
     • Market is heading
     • Support multiple institutions at same time
     • Multi-tenancy+
     • Easily created, maintained and configured
  8. Multi-tenancy
  9. Multi-tenancy
  10. Multi-tenancy
  11. Multi-tenancy
  12. Performance!
     • Ability to scale horizontally
     • Evidence based
     • Continuous
  13. Topics
     1. Project Goals
     2. Hilary System Architecture
     3. Performance Testing
     4. Deployment and Automation
     5. UI Architecture
     6. Customization and Configuration
     7. Questions?
  14. OAE Architecture
     The Apereo OAE project is made up of 2 distinct source code platforms:
     • “Hilary”
        • Server-side RESTful web platform that exposes the OAE services
        • Written entirely using server-side JavaScript in Node.js
     • “3akai-ux”
        • A client-side / browser platform that provides the HTML, JavaScript and CSS that make up the browser UI of the application
  15. OAE Architecture
  16. Hilary System Architecture
  17. Application Servers
     • Written in server-side JavaScript, run in Node.js
     • Node.js used by: eBay, LinkedIn, Storify, Trello
     • Light-weight (80 MB memory) single-threaded platform that processes I/O asynchronously / non-blocking
     • App servers can be configured into functional specialization:
        • User Request Processor
        • Activity Processor
        • Search Indexer
        • Preview Processor
     • Specializing app servers allows for clustering different types of application processing in distinct ways
  18. Apache Cassandra
     • Canonical data source
     • Provides high-availability and fault-tolerance without trading away performance
     • Gives flexibility with incremental scalability in a cloud environment
     • Helps overcome unpredictable growth of multi-tenant systems
     • Option for multi-datacenter deployments to localize reads and writes in geographical regions
     • Can trade off consistency for availability at the query level
     • Scales linearly by sharding and balancing rows across nodes, with configurable replication levels
     • Used by: Netflix, eBay, Twitter
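The query-level trade-off between consistency and availability can be illustrated with the usual replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (N). The following is a minimal sketch of that arithmetic only; the function names and the replication factor of 3 are assumptions for illustration and are not taken from the OAE configuration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas (the size of a QUORUM read or write)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


N = 3  # assumed replication factor, for illustration only

# Quorum reads and quorum writes always overlap on at least one replica.
print(is_strongly_consistent(quorum(N), quorum(N), N))

# Reading and writing a single replica favours availability and latency,
# but a read may miss the most recent write.
print(is_strongly_consistent(1, 1, N))
```

Lower read and write counts keep a query answerable while more replicas are down, which is the availability side of the trade-off.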
  19. ElasticSearch
     • Lucene-backed search platform
     • Built for cloud-friendly incremental scaling and high-availability
     • Exposes HTTP RESTful APIs for indexing and querying documents
     • RESTful query interface uses JSON-based Query DSL
     • Scales linearly by distributing a pre-determined number of shards among nodes and automatically rebalancing when necessary
     • Used by: GitHub, FourSquare, StackOverflow, WordPress
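To give a feel for the JSON-based Query DSL, the sketch below builds a request body that combines a full-text `match` clause with an exact `term` clause inside a `bool` query. The field names (`displayName`, `tenantAlias`) and the example values are assumptions made up for illustration; the slides do not describe the actual OAE index mapping, and no HTTP request is sent here.

```python
import json


def build_search_body(text: str, tenant_alias: str, size: int = 10) -> str:
    """Return a Query DSL request body as a JSON string.

    Field names are hypothetical; only the overall query shape is the point.
    """
    body = {
        "query": {
            "bool": {
                "must": [
                    # Full-text match on a (hypothetical) display name field.
                    {"match": {"displayName": text}},
                    # Exact match restricting results to one tenant.
                    {"term": {"tenantAlias": tenant_alias}},
                ]
            }
        },
        "size": size,
    }
    return json.dumps(body, indent=2)


print(build_search_body("physics", "cam"))
```

A body like this would be sent to a search endpoint over HTTP; which index and endpoint OAE uses is not stated in the slides.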
  20. RabbitMQ
     • Message queue platform written in Erlang
     • Used for distributing tasks to specialized application server instances
     • Supports active-active queue mirroring to nodes for high availability
     • Used by: Joyent
  21. Redis
     • Commonly known as a cache server
     • Fills a variety of functionality:
        • Caching of basic user profiles
        • Broadcast messaging (can move to RabbitMQ)
        • Locking
        • Holds volatile activity aggregation data
     • Comes with no managed clustering solution (yet), but has slave replication for active fail-over
     • Some clients manage master-slave switching, and distributed reads for you
     • Used by: Twitter, Instagram, StackOverflow, Flickr
  22. Etherpad
     • Open Source collaborative editing application written in Node.js
     • Originally developed by Google and Mozilla
     • Licensed under Apache License v2
     • Powers collaborative document editing in OAE
     • Doesn’t cluster, but we shard for performance
     • If an etherpad server goes down, active sessions on that server are lost
     • But document data is flushed to Cassandra on the fly so large volumes of progress are not lost as a result
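One common way to shard a service that does not cluster is to hash each document identifier onto a fixed list of servers, so that every request for the same pad reaches the same instance. The sketch below shows that idea under stated assumptions: the slides do not say how OAE actually picks an etherpad server, and the host names and the `shard_for` helper are hypothetical.

```python
import hashlib
from typing import Sequence

# Hypothetical host names; the slides only say that three etherpad machines exist.
ETHERPAD_SERVERS = ["etherpad0", "etherpad1", "etherpad2"]


def shard_for(pad_id: str, servers: Sequence[str]) -> str:
    """Map a pad id to one server, deterministically.

    A stable digest is used instead of the built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.md5(pad_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


for pad in ("pad-alpha", "pad-beta", "pad-gamma"):
    print(pad, "->", shard_for(pad, ETHERPAD_SERVERS))
```

With plain modulo hashing, changing the number of servers remaps most pads, so a scheme like this suits a fixed pool better than one that is resized often.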
  23. Nginx
     • HTTP and reverse-proxy server
     • Used to distribute load to application servers, etherpad servers and stream file downloads
     • Useful rate-limiting features based on source IP
     • Used by: Netflix, WordPress.com
  24. Topics
     1. Project Goals
     2. Hilary System Architecture
     3. Performance Testing
     4. Deployment and Automation
     5. UI Architecture
     6. Customization and Configuration
  25. Performance Testing: Workflow
     1. Generate data with Model Loader
     2. Load data into the system with Model Loader
     3. Generate Tsung Test with custom framework
     4. Run Tsung Test
     5. Analysis
  26. Performance Testing: Setup
     • 1 nginx load balancer (0.5GB / 1 CPU)
     • 2 app nodes (0.5GB / 1 CPU)
     • 3 db nodes (8GB / 2 CPU)
     • 1 redis node (0.5GB / 1 CPU)
     • 1 search node (0.5GB / 1 CPU)
  27. Transactions
  28. Request latency
  29. Transactions / sec
  30. Arrival rate of new users
  31. Simultaneous users
  32. HTTP Requests / sec
  33. Histogram latency - POST /api/*


  34. So, does it scale?
     • Yes. We can scale the application horizontally by adding more nodes
     • Doubling the hardware roughly doubles the throughput
  35. Simultaneous users
  36. Requests / sec
  37. Topics
     1. Project Goals
     2. Hilary System Architecture
     3. Performance Testing
     4. Deployment and Automation
     5. UI Architecture
     6. Customization and Configuration
     7. Questions?
  38. Deployment and Automation
     • As you can imagine, many machines to manage. Current inventory:
        • 3x Cassandra
        • 2x Redis
        • 2x RabbitMQ
        • 4x Application + Indexer
        • 3x Preview Processor
        • 1x Activity Processor
        • 1x Nginx
        • 3x Etherpad
     • Performance testing with a cluster of 21 virtual machines
     • Additional scalability testing and verification with ~30 virtual machines
  39. Puppet
     • Use puppet to centralize machine configuration and prevent configuration drift
     • Collection of “Manifests” that define the state that the machine should be in based on its hostname / role:
        • What files should exist? What should their contents be?
        • What packages should be installed?
        • What services should be running, or stopped?
     • http://github.com/sakaiproject/puppet-hilary
     • All 20+ machines in the cluster have Puppet installed and ask for “catalog” info (expected configuration state) from a single puppet master machine
     • Puppet Master knows how to determine the machine state from the manifests based on its host (e.g., db0 is a cassandra node, it should have cassandra, java, etc...)
     • Use puppetdb with “External Resources” to share machine-specific information with each other node in the cluster
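The hostname-to-role lookup described on this slide can be pictured as a small table keyed by the host name prefix. The sketch below is written in Python rather than the Puppet manifest language, and is only an illustration of the idea: the `db` entry mirrors the slide's `db0` example, while the `cache` entry and the `expected_state` helper are hypothetical.

```python
import re
from typing import List, Tuple

# Role and expected packages per host name prefix.
ROLES = {
    "db": ("cassandra", ["cassandra", "java"]),  # from the slide's db0 example
    "cache": ("redis", ["redis"]),               # hypothetical entry
}


def expected_state(hostname: str) -> Tuple[str, List[str]]:
    """Return (role, packages) for host names shaped like '<prefix><number>'."""
    match = re.fullmatch(r"([a-z]+)(\d+)", hostname)
    if match is None or match.group(1) not in ROLES:
        raise ValueError(f"no role known for host {hostname!r}")
    role, packages = ROLES[match.group(1)]
    return role, list(packages)


print(expected_state("db0"))
```

In the real setup this decision is made by the Puppet master when it compiles a catalog for a node; the table above only mimics the shape of that decision.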
  40. MCollective
     • Provides parallel execution over a number of machines at one time
        • Start / Stop / Check status of services
        • Install / Remove / Check version of packages
        • Use puppet resource syntax to check adhoc machine facts
        • Apply puppet manifests
     • Each cluster node subscribes to an ActiveMQ server to receive commands. Central machine (the “client”) publishes the command and waits for reply
  41. Slapchop
     • Missing piece: We need to create 21 machines of different specs in a cloud service, and somehow get MCollective on them
     • A tool we lovingly call slapchop
     • Define a JSON manifest that holds machine configs and instances
     • Run slapchop to create the machines in Joyent cloud, start them, get mcollective installed
     • Well, kind of...
     • Now you can log in to the MCollective client and run mco puppet apply
     • Well, kind of...
     • Go from empty cloud to working 21 machine cluster in ~15 minutes
  42. Monitoring
     • Nagios
     • Munin
  43. Security
     • Infrastructure penetration tests by University of Murcia
     • No major issues found
     • UI vulnerability testing performed by SCIRT group
     • Followed up on all known XSS issues
     • Using OWASP jQuery plugin for XSS filtering user-created data
  44. Topics
     1. Project Goals
     2. Hilary System Architecture
     3. Performance Testing
     4. Deployment and Automation
     5. UI Architecture
     6. Customization and Configuration
     7. Questions?
  45. UI Architecture
     • Hilary
     • 3akai-ux
     • Mobile UI
     • 3rd party integrations
  46. Core UI Architecture
     • JS frameworks
     • CSS framework
     • 3rd party plugins
     • OAE UI API
     • OAE CSS Components
  47. Core frameworks
     • RequireJS
     • jQuery
     • underscore.js
  48. RequireJS
     • File and module loader
     • Necessity to keep things modular
     • Optimisation built-in
  49. [jQuery]
     • DOM manipulation
     • Cross-browser abstraction
     • Events
     • Pretty much everything
  50. [underscore.js]
     • Utility toolbelt
     • Manipulate objects, arrays, etc.
  51. CSS frameworks
     • Twitter Bootstrap
     • Font Awesome
  52. 3rd party plug-ins
     • Autosuggest
     • History.js
     • Fileupload
     • Validate
     • Templates
     • etc.
  53. OAE UI API
     • Wrapper for REST requests
        • Users
        • Profile
        • Groups
        • Content
        • Discussions
        • Search
        • Config
  54. OAE UI API
     • Utilities
        • i18n
        • l10n
        • Widget loading
        • Template rendering
        • Notifications
        • XSS escaping
        • etc.
  55. OAE CSS Components
     • Re-usable HTML fragments
     • OAE specific elements
     • Consistency
     • Design guidelines
  56. Toolbox
     • JS frameworks
     • CSS framework
     • 3rd party plugins
     • OAE UI API
     • OAE CSS Components
     WIDGET SDK
  57. Putting it together
  58. Widgets
     • Modular components
        • HTML Fragment
        • JavaScript
        • CSS
        • Config file
        • Language bundles
     • Loaded into DOM
  59. Topics
     1. Project Goals
     2. Hilary System Architecture
     3. Performance Testing
     4. Deployment and Automation
     5. UI Architecture
     6. Customization and Configuration
     7. Questions?
  60. Tenant Administration
     • Create, start & stop tenants
     • Tenants are an application-level concept, not infrastructure
     • Uses HTTP Host header to identify tenant
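Because tenants are resolved at the application level from the HTTP Host header, one installation can answer for many host names. The sketch below shows the general technique with a plain dictionary lookup; the host names, tenant aliases and the `tenant_for_host` helper are hypothetical and do not reflect Hilary's actual API.

```python
from typing import Optional

# Hypothetical mapping of host names to tenant aliases.
TENANTS = {
    "cam.oae.example.org": "cam",
    "gt.oae.example.org": "gt",
}


def tenant_for_host(host_header: str) -> Optional[str]:
    """Resolve a tenant alias from a Host header value.

    Host names are case-insensitive and may carry a port, so the value is
    lower-cased and the port is dropped before the lookup. IPv6 literals
    are not handled in this sketch.
    """
    hostname = host_header.strip().lower().partition(":")[0]
    return TENANTS.get(hostname)


print(tenant_for_host("CAM.oae.example.org:8080"))
print(tenant_for_host("unknown.example.org"))
```

Creating or stopping a tenant then amounts to changing an entry in a lookup like this rather than provisioning new infrastructure, which matches the slide's point that tenants are an application-level concept.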
  61. Tenant Skinning
     • Tenant skinning
     • Uses LESS CSS framework with dynamic values
  62. Tenant Configuration
     • Global or tenant-specific configuration
     • Tenant configuration
        • Single-Sign On (CAS, Shibboleth, Social Media)
        • Default privacy settings
        • etc...
     • Changes happen on-the-fly
        • Uses Redis to broadcast changes across cluster
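Global versus tenant-specific configuration can be modelled as a layered lookup in which a tenant's own values shadow the global defaults. The sketch below uses `collections.ChainMap` to show that layering; the keys, values and the `config_for` helper are hypothetical, and the Redis broadcast that propagates changes across the cluster is deliberately left out.

```python
from collections import ChainMap

# Hypothetical global defaults and per-tenant overrides.
GLOBAL_CONFIG = {"default_privacy": "public", "sso_cas_enabled": False}
TENANT_CONFIG = {"cam": {"sso_cas_enabled": True}}


def config_for(tenant_alias: str) -> ChainMap:
    """Return a view where tenant values take precedence over global ones."""
    return ChainMap(TENANT_CONFIG.get(tenant_alias, {}), GLOBAL_CONFIG)


print(config_for("cam")["sso_cas_enabled"])  # tenant override
print(config_for("cam")["default_privacy"])  # falls back to the global value

# An on-the-fly change: later lookups see the new value without a restart.
TENANT_CONFIG.setdefault("gt", {})["default_privacy"] = "private"
print(config_for("gt")["default_privacy"])
```

In a cluster, each application server would hold its own copy of these dictionaries, which is why a broadcast mechanism is needed to keep them in step.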
  63. July 1, 2013: 1st production release