EQR Reporting: Rails + Amazon EC2



  1. Platform 3: To Infinity and Beyond. January 2009, Summit XI
  2. Gartner’s Hype Cycle
  3. Overview <ul><li>Architecture </li></ul><ul><li>Video </li></ul><ul><li>Reporting </li></ul>
  4. Architecture: What’s that? <ul><li>The structures of the system </li></ul><ul><ul><li>The externally visible parts and the relationships between them </li></ul></ul>
  5. Architecture: Goals <ul><li>Performance </li></ul><ul><ul><li>Every page needs to yield a response within 5 seconds </li></ul></ul><ul><li>Availability/Reliability </li></ul><ul><ul><li>Always there! </li></ul></ul><ul><li>Scalability </li></ul><ul><ul><li>Dynamically add RAM/CPU </li></ul></ul><ul><ul><li>Dynamically add more servers </li></ul></ul><ul><li>Agile/Flexible </li></ul><ul><ul><li>Can easily be adapted </li></ul></ul><ul><ul><li>Follow best practices </li></ul></ul><ul><li>Accuracy </li></ul><ul><ul><li>No response left behind </li></ul></ul><ul><ul><li>Quality Assurance </li></ul></ul>
  6. Architecture: Performance <ul><li>How do we achieve great performance? </li></ul><ul><ul><li>Using the right software </li></ul></ul><ul><ul><ul><li>Ruby on Rails </li></ul></ul></ul><ul><ul><ul><ul><li>Twitter, LinkedIn, Hulu </li></ul></ul></ul></ul><ul><ul><li>Good application design </li></ul></ul><ul><ul><ul><li>Reporting has different needs than Authoring/Runtime </li></ul></ul></ul><ul><ul><li>Testing / Benchmarking / Tuning </li></ul></ul><ul><ul><ul><li>Rails has lots of good built-in utilities to make these easy </li></ul></ul></ul><ul><ul><ul><li>We’re writing test code, right? </li></ul></ul></ul><ul><ul><li>Dedicating time for maintenance / new features </li></ul></ul><ul><ul><ul><li>As data grows </li></ul></ul></ul><ul><ul><ul><li>As more complexity is brought into the application environment </li></ul></ul></ul><ul><ul><ul><li>As we get smarter </li></ul></ul></ul>
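The 5-second response goal and the benchmarking point above can be checked mechanically. The following is a minimal sketch in plain Ruby rather than any Rails utility; the helper name `time_against_budget` and the constant `RESPONSE_BUDGET_SECONDS` are assumptions made for illustration, and the workload is a stand-in for a real page request.

```ruby
# Response-time goal from the Architecture: Goals slide (5 seconds per page).
RESPONSE_BUDGET_SECONDS = 5.0

# Hypothetical helper: runs the block once, measures elapsed wall-clock time
# with a monotonic clock, and reports whether it stayed within the budget.
def time_against_budget(budget = RESPONSE_BUDGET_SECONDS)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { elapsed: elapsed, ok: elapsed <= budget }
end

# Stand-in workload; in practice this would be a request to a report page.
result = time_against_budget { (1..10_000).reduce(:+) }
puts format("elapsed %.4fs, within budget: %s", result[:elapsed], result[:ok])
```

A single run is only a smoke test; a real benchmark would repeat the measurement and look at the slowest runs, not the average.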
  7. Architecture: Performance <ul><li>Good Application Design – Separation of Concerns </li></ul><ul><li>Separating databases for Runtime and Reporting is a good thing! </li></ul><ul><ul><li>Runtime is OLTP </li></ul></ul><ul><ul><ul><li>OLTP refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. It has also been used to refer to processing in which the system responds immediately to user requests. - Wikipedia </li></ul></ul></ul><ul><ul><li>Reporting is OLAP </li></ul></ul><ul><ul><ul><li>OLAP is an approach to quickly provide answers to analytical queries that are multi-dimensional in nature. Databases configured for OLAP employ a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. - Wikipedia </li></ul></ul></ul><ul><ul><li>Analytical processing on Reporting doesn’t impact performance on Runtime (i.e., active surveys in the field) because they are physically different systems. </li></ul></ul>
  8. Architecture: Availability/Reliability <ul><li>Co-location </li></ul><ul><ul><li>Uptime </li></ul></ul><ul><ul><ul><li>eApps </li></ul></ul></ul><ul><ul><ul><ul><li>99.98% over past 1000 days </li></ul></ul></ul></ul><ul><ul><ul><li>Colo4Dallas </li></ul></ul></ul><ul><ul><ul><ul><li>Guarantees 100%; in reality, 99%+ </li></ul></ul></ul></ul><ul><ul><ul><li>Amazon Web Services </li></ul></ul></ul><ul><ul><ul><ul><li>99.95% </li></ul></ul></ul></ul><ul><li>Redundancy </li></ul><ul><ul><li>Servers have different profiles for different services </li></ul></ul><ul><ul><ul><li>Databases </li></ul></ul></ul><ul><ul><ul><li>Web / Application servers </li></ul></ul></ul><ul><ul><ul><li>Proxy / Load balancing </li></ul></ul></ul><ul><ul><li>Server profiles are duplicated and online for… </li></ul></ul><ul><ul><ul><li>Hardware failures </li></ul></ul></ul><ul><ul><ul><li>Load balancing during peak demand </li></ul></ul></ul>
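To make the uptime percentages above easier to compare, they can be converted into hours of downtime. This is a small arithmetic sketch; the helper name `downtime_hours` is an assumption, and the one-year periods used for the AWS and 99% figures are chosen here for illustration (the slide does not state them).

```ruby
HOURS_PER_DAY = 24

# Expected downtime, in hours, for a given uptime percentage over a period.
def downtime_hours(uptime_percent, days)
  (100.0 - uptime_percent) / 100.0 * days * HOURS_PER_DAY
end

puts downtime_hours(99.98, 1000).round(2)  # eApps figure over its 1000 days
puts downtime_hours(99.95, 365).round(2)   # AWS figure, assumed over one year
puts downtime_hours(99.0, 365).round(2)    # a 99% year
```

The eApps figure works out to roughly 4.8 hours over 1000 days, while a 99% year allows about 87.6 hours, which is why the gap between "guarantees 100%" and "99%+" matters.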
  9. Architecture: Scalability <ul><li>Reporting </li></ul><ul><ul><li>www.eqrtools.com hosted at eApps </li></ul></ul><ul><ul><ul><li>Runs on a $70/month plan (1.2 GB RAM Virtual Private Server) </li></ul></ul></ul><ul><ul><ul><li>Pre-packaged with Java, Rails, MySQL, mail server, etc. </li></ul></ul></ul><ul><ul><ul><li>Can upgrade package in minutes and add servers via web interface </li></ul></ul></ul><ul><ul><ul><li>Cancel anytime </li></ul></ul></ul><ul><ul><li>Amazon Web Services </li></ul></ul><ul><ul><ul><li>S3 = Simple Storage Service </li></ul></ul></ul><ul><ul><ul><li>EC2 = Elastic Compute Cloud </li></ul></ul></ul><ul><ul><ul><li>CloudFront = Content Delivery Network </li></ul></ul></ul><ul><li>Authoring/Runtime </li></ul><ul><ul><li>Hosted at Colo4Dallas </li></ul></ul><ul><ul><ul><li>n Front End Web/Application servers </li></ul></ul></ul><ul><ul><ul><li>n Database servers </li></ul></ul></ul><ul><ul><li>Wowza </li></ul></ul><ul><ul><ul><li>Streaming Video Service via Amazon EC2 </li></ul></ul></ul>
  10. Architecture: Amazon Web Services <ul><li>Simple Storage Service (S3) </li></ul><ul><ul><li>In use at Equation with JTS for 2+ years </li></ul></ul><ul><ul><li>Expanding use to store more content </li></ul></ul><ul><ul><ul><li>Images – plain, rollover, etc. </li></ul></ul></ul><ul><ul><ul><li>Documents – PDF reports </li></ul></ul></ul><ul><ul><ul><li>Videos </li></ul></ul></ul><ul><ul><ul><li>EC2 Machine Images </li></ul></ul></ul><ul><li>Elastic Compute Cloud (EC2) </li></ul><ul><ul><li>Provides ability to add servers (Linux/Windows flavors) for specific services </li></ul></ul><ul><ul><ul><li>e.g., Wowza Video Streaming </li></ul></ul></ul><ul><ul><ul><li>Grabs content from S3 </li></ul></ul></ul><ul><ul><ul><li>Can be expanded to other uses – Rails application hosting/database </li></ul></ul></ul><ul><li>CloudFront </li></ul><ul><ul><li>Provides Content Delivery Network (CDN) to push to edge </li></ul></ul><ul><ul><ul><li>Content that we move into S3 </li></ul></ul></ul><ul><ul><ul><li>Moves content closer to clients, reducing network latency </li></ul></ul></ul>
  11. Architecture: EC2 Simplified <ul><li>Virtual Machines/Servers </li></ul><ul><ul><li>Scalability in two dimensions </li></ul></ul><ul><ul><ul><li>Use as many machines as you need </li></ul></ul></ul><ul><ul><ul><li>Various machine sizes available </li></ul></ul></ul><ul><li>High availability </li></ul><ul><li>High bandwidth </li></ul>
  12. Architecture: EC2 Instance Types <ul><li>EC2 supports different instance types </li></ul><ul><ul><li>Small Instance </li></ul></ul><ul><ul><ul><li>1.7 GB memory, 32-bit platform, I/O Performance: Moderate </li></ul></ul></ul><ul><ul><ul><li>1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit) </li></ul></ul></ul><ul><ul><ul><li>160 GB instance storage (150 GB plus 10 GB root partition) </li></ul></ul></ul><ul><ul><ul><li>Price: $0.10 per instance hour </li></ul></ul></ul><ul><ul><li>Large Instance </li></ul></ul><ul><ul><ul><li>7.5 GB memory, 64-bit platform, I/O Performance: High </li></ul></ul></ul><ul><ul><ul><li>4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) </li></ul></ul></ul><ul><ul><ul><li>850 GB instance storage (2 x 420 GB plus 10 GB root partition) </li></ul></ul></ul><ul><ul><ul><li>Price: $0.40 per instance hour </li></ul></ul></ul><ul><ul><li>Extra Large Instance </li></ul></ul><ul><ul><ul><li>15 GB memory, 64-bit platform, I/O Performance: High </li></ul></ul></ul><ul><ul><ul><li>8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each) </li></ul></ul></ul><ul><ul><ul><li>1,690 GB instance storage (4 x 420 GB plus 10 GB root partition) </li></ul></ul></ul><ul><ul><ul><li>Price: $0.80 per instance hour </li></ul></ul></ul>
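The hourly prices above are easier to weigh against a flat monthly plan once they are multiplied out. A rough sketch, assuming an instance runs around the clock for a 30-day month; the helper `running_cost` is hypothetical, and storage and bandwidth charges are deliberately left out.

```ruby
# Hourly prices quoted on the EC2 Instance Types slide (USD per instance hour).
HOURLY_PRICE = {
  small: 0.10,
  large: 0.40,
  extra_large: 0.80
}.freeze

# Cost of running `count` instances of one type around the clock for `days` days.
def running_cost(type, count: 1, days: 30)
  HOURLY_PRICE.fetch(type) * 24 * days * count
end

HOURLY_PRICE.each_key do |type|
  puts format("%-12s $%.2f per 30 days", type, running_cost(type))
end
```

On these numbers a Small Instance left running all month comes to about $72, in the same range as the $70/month eApps plan mentioned earlier; the pay-per-hour model pays off mainly when machines are added and removed with demand.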
  13. CloudFront: Content Delivery Network <ul><li>and how it works… </li></ul>
  14. Amazon – CloudFront CDN <ul><li>Copies of files in an S3 bucket are cached at, and served from, edge servers around the world. </li></ul>
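The edge-caching idea can be sketched as a toy model. This is not the CloudFront API: the `EdgeCache` class, the Hash standing in for an S3 bucket, and the file key are all assumptions for illustration, and real edge servers also expire cached copies, which is omitted here.

```ruby
# Toy model of one CDN edge location: it serves from its local cache and
# falls back to the origin (a Hash standing in for an S3 bucket) on a miss.
class EdgeCache
  attr_reader :origin_fetches

  def initialize(origin)
    @origin = origin       # Hash of key => content
    @cache = {}
    @origin_fetches = 0
  end

  def get(key)
    return @cache[key] if @cache.key?(key)  # cache hit: no trip to the origin

    @origin_fetches += 1
    @cache[key] = @origin.fetch(key)        # cache miss: pull and keep a copy
  end
end

origin = { "videos/ad.flv" => "flv-bytes" }
edge = EdgeCache.new(origin)
2.times { edge.get("videos/ad.flv") }
puts edge.origin_fetches  # only the first request reached the origin
```

Only the first request for a file travels back to the origin; later requests near that edge are answered locally, which is where the latency saving comes from.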
  15. Architecture: Amazon <ul><li>Benefits </li></ul><ul><ul><li>No upfront investments </li></ul></ul><ul><ul><ul><li>No contract </li></ul></ul></ul><ul><ul><ul><li>No hardware to purchase, install/fit, maintain </li></ul></ul></ul><ul><ul><ul><li>Pay for what we use </li></ul></ul></ul><ul><ul><li>Offers a variety of uses – content hosting, machine hosting, streaming video </li></ul></ul><ul><ul><li>Competitors often charge upfront and monthly fees and don’t offer a one-stop service </li></ul></ul><ul><ul><li>We can dynamically add/remove machines as we need them </li></ul></ul><ul><li>Additional applications built on EC2 are also available… </li></ul><ul><ul><li>Wowza Video Streaming </li></ul></ul><ul><ul><li>Jungle Disk (backup/recovery) </li></ul></ul><ul><ul><li>GigaVox Media (Podcast hosting) </li></ul></ul><ul><ul><li>Morph (Application hosting) </li></ul></ul><ul><ul><li>RightScale (Application hosting/monitoring) </li></ul></ul><ul><ul><li>Scalr (Load Balancing/farm) </li></ul></ul>
  16. Architecture: Quality Assurance <ul><li>Code Coverage </li></ul>
  17. Architecture: Quality Assurance <ul><li>Example – Question controller </li></ul>
  18. <ul><li>Rich Media: Audio, Images and Video </li></ul>
  19. Video – The learning curve… <ul><li>Grok: &quot;to understand intuitively or by empathy; to establish rapport with&quot; and &quot;to empathize or communicate sympathetically (with); also, to experience enjoyment.&quot; (source: Oxford English Dictionary) </li></ul>
  20. Serving Video is like… TV <ul><li>Content </li></ul><ul><ul><li>(i.e., the ad) </li></ul></ul><ul><li>Delivery </li></ul><ul><ul><li>(e.g., cable, satellite, rabbit ears) </li></ul></ul><ul><li>Viewer </li></ul><ul><ul><li>(i.e., the television set) </li></ul></ul>
  21. The Content: Preparation <ul><li>Video comes in many source formats </li></ul><ul><ul><li>AVI (early Windows format), QuickTime (.mov), Windows Media, MPEG, Flash </li></ul></ul><ul><li>Files are large and not optimized for web delivery </li></ul><ul><ul><li>Encoded for other media </li></ul></ul>
  22. Content conversion <ul><li>The Old Way </li></ul><ul><ul><li>Sorenson Squeeze </li></ul></ul><ul><ul><ul><li>A desktop tool where we manually took a file and converted it into multiple Flash files at varying bitrates </li></ul></ul></ul><ul><ul><li>Uploaded file(s) to a third-party hosted Flash Video service </li></ul></ul><ul><li>The New Way </li></ul><ul><ul><li>File uploader </li></ul></ul><ul><ul><li>ffmpeg (under the covers) </li></ul></ul><ul><ul><ul><li>An open source utility that has been wrapped with Ruby packages to provide compression in the P3 Application </li></ul></ul></ul><ul><ul><ul><li>Media is compressed for optimal playback experience </li></ul></ul></ul><ul><ul><ul><li>Media is still encoded as Flash </li></ul></ul></ul><ul><ul><ul><ul><li>Most commonly served format on Internet (> 92%) </li></ul></ul></ul></ul><ul><ul><li>Converted file uploaded to Amazon </li></ul></ul><ul><ul><ul><li>File resides in S3 folder </li></ul></ul></ul><ul><ul><ul><li>Streamed via Wowza server hosted on EC2 instance </li></ul></ul></ul>
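The slides do not show how ffmpeg is wrapped in the P3 Application, so the following is only a sketch of what such a wrapper might assemble. The helper `ffmpeg_flv_args`, the file names, and the default bitrate and frame size are assumptions; the flags themselves (`-i`, `-f`, `-b:v`, `-s`, `-ar`, `-y`) are standard ffmpeg options in current releases.

```ruby
# Hypothetical helper: builds (does not run) an ffmpeg argument list that
# transcodes a source video to FLV at a target video bitrate.
def ffmpeg_flv_args(input, output, video_kbps: 400, size: "320x240")
  [
    "ffmpeg",
    "-i", input,               # source file in any format ffmpeg can read
    "-f", "flv",               # force the FLV container
    "-b:v", "#{video_kbps}k",  # target video bitrate
    "-s", size,                # output frame size
    "-ar", "22050",            # audio sample rate accepted by FLV
    "-y",                      # overwrite the output file if it exists
    output
  ]
end

args = ffmpeg_flv_args("ad.mov", "ad.flv", video_kbps: 300)
puts args.join(" ")
# To run it for real (requires ffmpeg on the PATH): system(*args)
```

Passing the arguments as an array to `system` avoids shell quoting problems with uploaded file names, which matters when the input comes from a file uploader.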
  23. Video: ffmpeg <ul><li>Still a bit of magic involved… </li></ul><ul><ul><li>Reduce this, increase that… </li></ul></ul>
  24. Video: ffmpeg conversion <ul><li>But at least we’ve built tools! </li></ul>
  25. Video: Delivery <ul><li>Progressive Download </li></ul><ul><ul><li>A copy of the video is saved to your local temp drive and played back through the player as it downloads </li></ul></ul><ul><ul><ul><li>Lacks IP protection </li></ul></ul></ul><ul><ul><li>ESPN </li></ul></ul><ul><ul><li>Video is sent to the player over HTTP from the file system on the host server </li></ul></ul><ul><ul><li>Some companies will block content </li></ul></ul><ul><ul><ul><li>by MIME type </li></ul></ul></ul><ul><ul><ul><li>Video over HTTP on port 80 is the easiest way to get past security </li></ul></ul></ul><ul><li>Streaming </li></ul><ul><ul><li>Video is streamed in real time from a streaming video server </li></ul></ul><ul><ul><ul><li>No local copy made </li></ul></ul></ul><ul><ul><li>Near instantaneous playback </li></ul></ul><ul><ul><li>Uses the RTMP protocol </li></ul></ul><ul><ul><li>Important to size/compress correctly for intended audience </li></ul></ul>
  26. Video: Delivery <ul><li>Factors impacting client reception </li></ul><ul><ul><li>Other programs running </li></ul></ul><ul><ul><ul><li>How much available CPU/RAM does the respondent’s web-enabled device have? </li></ul></ul></ul><ul><ul><li>Bandwidth </li></ul></ul><ul><ul><ul><li>DSL, cable, dial-up? </li></ul></ul></ul><ul><ul><ul><li>Bandwidth varies during a video session (e.g., a 30-second ad) </li></ul></ul></ul>
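Sizing video for the intended audience usually means choosing among several encodings of the same clip. The sketch below picks one by measured bandwidth; the rendition bitrates, the 80% headroom factor, and the helper `pick_rendition` are all assumptions for illustration and do not come from the slides.

```ruby
# Hypothetical renditions of one video, in kilobits per second.
RENDITIONS_KBPS = [150, 300, 600, 1200].freeze

# Picks the highest rendition that fits within a fraction of the measured
# bandwidth, leaving headroom because bandwidth varies during playback.
# Falls back to the smallest rendition when nothing fits.
def pick_rendition(measured_kbps, headroom: 0.8)
  usable = measured_kbps * headroom
  RENDITIONS_KBPS.select { |kbps| kbps <= usable }.max || RENDITIONS_KBPS.min
end

puts pick_rendition(56)    # dial-up: nothing fits, so the smallest is used
puts pick_rendition(2000)  # broadband: the largest rendition fits
```

The headroom factor is the knob to tune: a lower value trades picture quality for fewer stalls when bandwidth dips mid-session.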
  27. Video: The Player <ul><li>The SWF file </li></ul><ul><ul><li>Hosted on server, embedded in page </li></ul></ul><ul><ul><li>Skinnable </li></ul></ul><ul><ul><ul><li>Remove controls </li></ul></ul></ul><ul><ul><li>Plays either progressive or streaming </li></ul></ul><ul><ul><li>JW Player is the most ubiquitous </li></ul></ul>
  28. P3 Reporting
  29. Reporting: Online Analytical Processing (OLAP)
  30. Reporting: The Update Algorithm <ul><li>Scheduled Batch </li></ul><ul><ul><li>Go update all the surveys every x minutes… </li></ul></ul><ul><ul><ul><li>Open and recently closed </li></ul></ul></ul><ul><li>On Demand </li></ul><ul><ul><li>Update this survey now </li></ul></ul><ul><li>Real-time </li></ul><ul><ul><li>Asynchronously grab queued responses from a message queue (MQ) fed with updates from the Runtime </li></ul></ul>
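The real-time mode above can be sketched with Ruby's built-in `Queue` standing in for the message queue. The slides do not name the MQ product or the reporting schema, so the `ReportingCounts` class, the `drain` helper, and the shape of a response are all hypothetical.

```ruby
# Hypothetical reporting store: survey => question => choice => count.
class ReportingCounts
  def initialize
    @counts = Hash.new { |h, s| h[s] = Hash.new { |q, k| q[k] = Hash.new(0) } }
  end

  def apply(response)
    @counts[response[:survey]][response[:question]][response[:choice]] += 1
  end

  def count(survey, question, choice)
    @counts[survey][question][choice]
  end
end

# Drains whatever is currently queued without blocking, applies each
# response to the counts, and returns how many were processed.
def drain(queue, counts)
  processed = 0
  until queue.empty?
    counts.apply(queue.pop(true))  # non-blocking pop
    processed += 1
  end
  processed
end

queue = Queue.new  # in-process stand-in for the MQ fed by the Runtime
queue << { survey: 1, question: "q1", choice: "yes" }
queue << { survey: 1, question: "q1", choice: "yes" }
queue << { survey: 1, question: "q1", choice: "no" }

counts = ReportingCounts.new
puts drain(queue, counts)           # number of responses processed
puts counts.count(1, "q1", "yes")   # running count for one choice
```

The same `drain` call could serve the scheduled-batch and on-demand modes as well: the difference between the three is only what triggers it.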
  31. Reporting: On demand
  32. Reporting: Key features <ul><li>View results by Question </li></ul><ul><li>Filtering </li></ul><ul><ul><li>By status </li></ul></ul><ul><ul><li>Compound filters based on question/choice sets </li></ul></ul><ul><li>Crosstabs </li></ul><ul><ul><li>Question vs. Question crosstabs </li></ul></ul><ul><ul><li>Filter by status </li></ul></ul><ul><li>Quotas / Segments </li></ul><ul><ul><li>View current / total counts </li></ul></ul><ul><li>Monitor survey progress </li></ul><ul><ul><li>Total, Last day, Last hour… </li></ul></ul>
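A question-versus-question crosstab with a status filter can be sketched in a few lines. The response structure, question ids, and the `crosstab` helper below are assumptions for illustration; the actual reporting application presumably computes this in the database.

```ruby
# Hypothetical responses: a status plus a map of question id => chosen answer.
responses = [
  { status: :complete, answers: { "gender" => "F", "likes_ad" => "yes" } },
  { status: :complete, answers: { "gender" => "F", "likes_ad" => "no" } },
  { status: :complete, answers: { "gender" => "M", "likes_ad" => "yes" } },
  { status: :partial,  answers: { "gender" => "M", "likes_ad" => "yes" } }
]

# Counts responses for every (row answer, column answer) pair, optionally
# keeping only responses with the given status.
def crosstab(responses, row_q, col_q, status: nil)
  table = Hash.new { |h, row| h[row] = Hash.new(0) }
  responses.each do |r|
    next if status && r[:status] != status

    table[r[:answers][row_q]][r[:answers][col_q]] += 1
  end
  table
end

crosstab(responses, "gender", "likes_ad", status: :complete).each do |row, cols|
  puts "#{row}: #{cols}"
end
```

Dropping the `status:` argument counts every response, which makes the effect of the filter easy to see side by side.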
  33. Reporting: What’s left? <ul><li>More testing… </li></ul><ul><li>Report generation </li></ul><ul><ul><li>PDF </li></ul></ul><ul><ul><li>Other formats </li></ul></ul><ul><li>Email notification </li></ul><ul><li>More slicing/dicing tools </li></ul><ul><li>Migration to Scalr??? </li></ul><ul><li>Beta with select clients </li></ul><ul><li>User feedback </li></ul><ul><ul><li>Incorporate into future releases </li></ul></ul>