AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerating Innovation in the Public Sector

The cloud not only helps organizations do things better, cheaper, and faster; it also drives breakthroughs that transform mission delivery. This session will feature a panel of international government and university leaders who are using the cloud to take on big data challenges, and innovating in the “white space” between data silos to deliver impact.



  1. AWS Government, Education, & Nonprofits Symposium | Canberra, Australia | May 20, 2014 | Big Data in the Cloud: Accelerating Innovation in the Public Sector | Russell Nash, Solutions Architect, Amazon Web Services, APAC
  2. BIG DATA: Technologies and techniques for working productively with data, at any scale
  3. VOLUME VELOCITY VARIETY
  4. Unconstrained Data Growth: GB TB PB EB ZB YB
  5. Large Hadron Collider – CERN: 27 TB per day
  6. STORE │ ANALYZE
  7. STORE │ ANALYZE
  8. AMAZON S3
  9. Objects in S3: Q4 2006 through Q2 2013, reaching 2 trillion objects and 1.1 M peak transactions per second
  10. 99.999999999% durability
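Eleven nines of durability is easier to grasp as an expected-loss figure. A minimal sketch of the arithmetic (the 10-million-object archive is a hypothetical figure for illustration, not from the deck):

```python
# Illustrative math (not an AWS formula): what 99.999999999%
# ("eleven nines") annual durability implies for expected object loss.
annual_durability = 0.99999999999
annual_loss_rate = 1 - annual_durability           # 1e-11

objects_stored = 10_000_000                        # hypothetical archive size
expected_losses_per_year = objects_stored * annual_loss_rate

print(f"Expected objects lost per year: {expected_losses_per_year:.4f}")
# For 10 million objects that is ~0.0001 objects/year,
# i.e. on average one lost object every ~10,000 years.
```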
  11. STORE │ ANALYZE
  12. HPC Hadoop MPP Database
  13. High Performance Computing
  14. Amazon EC2
  15. Instance Types (1–256 GB memory vs. 1–128 EC2 Compute Units), across the Standard, 2nd Gen Standard, Micro, High-Memory, High-CPU, Cluster Compute, Cluster GPU, High I/O, High-Storage and High-Memory Cluster families:
      • t1.micro: 613 MB memory, up to 2 ECU, EBS storage only, 32- or 64-bit
      • m1.small: 1.7 GB memory, 1 ECU, 160 GB instance storage, 32- or 64-bit
      • m1.medium: 3.75 GB memory, 2 ECU, 410 GB instance storage, 32- or 64-bit
      • m1.large (EBS-optimizable): 7.5 GB memory, 4 ECU, 850 GB instance storage, 64-bit
      • m1.xlarge (EBS-optimizable): 15 GB memory, 8 ECU, 1,690 GB instance storage, 64-bit
      • c1.medium: 1.7 GB memory, 5 ECU, 350 GB instance storage, 32- or 64-bit
      • c1.xlarge: 7 GB memory, 20 ECU, 1,690 GB instance storage, 64-bit
      • m2.xlarge: 17.1 GB memory, 6.5 ECU, 420 GB instance storage, 64-bit
      • m2.2xlarge: 34.2 GB memory, 13 ECU, 850 GB instance storage, 64-bit
      • m2.4xlarge (EBS-optimizable): 68.4 GB memory, 26 ECU, 1,690 GB instance storage, 64-bit
      • m3.xlarge: 15 GB memory, 13 ECU
      • m3.2xlarge (EBS-optimizable): 30 GB memory, 26 ECU
      • cc1.4xlarge: 23 GB memory, 33.5 ECU, 1,690 GB instance storage, 64-bit
      • cc2.8xlarge: 60.5 GB memory, 88 ECU, 3,370 GB instance storage, 64-bit
      • cg1.4xlarge: 22 GB memory, 33.5 ECU, 2× NVIDIA Tesla “Fermi” M2050 GPUs, 1,690 GB instance storage, 64-bit
      • hi1.4xlarge: 60.5 GB memory, 35 ECU, 2×1,024 GB SSD instance storage, 64-bit
      • hs1.8xlarge: 117 GB memory, 35 ECU, 24×2 TB instance storage, 64-bit
      • cr1.8xlarge: 244 GB memory, 88 ECU, 2×120 GB SSD instance storage, 64-bit
  16. ON A SINGLE INSTANCE: COMPUTE TIME 4 h, COST 4 h × $2.10 = $8.40
  17. ON MULTIPLE INSTANCES: COMPUTE TIME 2 h, COST 2 × 2 h × $2.10 = $8.40
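The trade-off on slides 16–17 (same total cost, shorter wall-clock time) can be checked with quick arithmetic; the $2.10/hr rate is the deck's example figure:

```python
# Same total cost, shorter wall-clock time: the deck's example rates.
hourly_rate = 2.10          # $/hr per instance (example figure from the slide)
total_compute_hours = 4     # the job needs 4 instance-hours either way

def run(instances):
    """Return (wall-clock hours, total cost) for a given fleet size."""
    wall_clock = total_compute_hours / instances
    cost = instances * wall_clock * hourly_rate
    return wall_clock, cost

print(run(1))   # one instance for 4 hours
print(run(2))   # two instances for 2 hours, same $8.40 total
```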
  18. Using Cycle Computing and Amazon Web Services:
      • Compute hours of work: 109,927 hours
      • Compute days of work: 4,580 days
      • Compute years of work: 12.55 years
      • Ligand count: ~21 million ligands
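The metrics above are internally consistent, which a two-line unit conversion confirms:

```python
# Sanity check on the slide's metrics: hours -> days -> years.
compute_hours = 109_927
compute_days = compute_hours / 24          # ~4,580 days
compute_years = compute_days / 365         # ~12.55 years

print(f"{compute_days:,.0f} days, {compute_years:.2f} years")
# -> 4,580 days, 12.55 years (matching the slide)
```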
  19. 3 hours, for $4,829/hr, on $20+ million of infrastructure
  20. HPC Hadoop MPP Database
  21. HADOOP
  22. Amazon Elastic MapReduce
  23. Deploying a Hadoop cluster is hard
  24. Elasticity
  25. HPC Hadoop MPP Database
  26. MPP Database
  27. Amazon Redshift
  28. Scalability
  29. What are Spot Instances? Unused EC2 capacity in the Availability Zones of a Region, sold at a discount (50%, 54%, 56%, 59%, 63% and 66% in this example)
  30. ON MULTIPLE INSTANCES: COMPUTE TIME 2 h, COST 2 × 2 h × $2.10 = $8.40
  31. ON MULTIPLE SPOT INSTANCES: COMPUTE TIME 1 h, COST 4 × 1 h × $0.35 = $1.40
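Putting slides 30 and 31 side by side shows the spot saving; both rates are the deck's example figures:

```python
# Spot pricing example from the slides: the same 4 instance-hours of work,
# run on spot capacity instead of on-demand.
on_demand_rate = 2.10   # $/hr (example figure from the slides)
spot_rate = 0.35        # $/hr (example spot price from the slides)

on_demand_cost = 2 * 2 * on_demand_rate   # 2 instances x 2 h
spot_cost = 4 * 1 * spot_rate             # 4 spot instances x 1 h

print(f"on-demand: ${on_demand_cost:.2f}, spot: ${spot_cost:.2f}")
# -> on-demand: $8.40, spot: $1.40

savings = 1 - spot_cost / on_demand_cost
print(f"savings: {savings:.0%}")          # -> savings: 83%
```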
  32. SEC MIDAS & Tradeworx: real-time analysis of 20 billion messages/day; reconstruct any market, any day in history
  33. “For the growing team of quant types now employed at the SEC, MIDAS is becoming the world’s greatest data sandbox. The staff is planning to use it to make the SEC a leader in its use of market data” – Elisse B. Walter, Chairman of the SEC
  34. PUBLIC DATA SETS http://aws.amazon.com/publicdatasets
  35. 20th May 2014 | Dr Nick Tate, Director, RDSI
  36. The Research Data Storage Infrastructure (RDSI) Project, an Australian Government initiative, is funded from the Education Investment Fund under the Super Science (Future Industries) initiative. RDSI is a $50m federally funded project, for which UQ is the lead agent and was awarded up to $10m in NCRIS 2013. (26 May 2014)
  37. Andrew Goodchild, QCIF • Andrew Reay, AWS • Paul Campbell, RDSI • Shane Youl, Intersect • Mark Terrill, O2
  38. Researchers will be able to use and manipulate significant collections of data that were previously either unavailable or difficult to access
  39. Data Stores Across Australia http://www.rdsi.uq.edu.au/node-statuses
  40. • Evaluation and testing facility • Working with partners • Implemented through two nodes (initially): QCIF (Brisbane) and Intersect (Sydney) • Act as gateways for public cloud access • Will host a series of projects
  41. • Gateway to the Public Cloud through nodes • Peering through AARNet (Sydney/Oregon) • Volume aggregation through CAUDIT/test platform • Removing barriers to use, such as egress charges for researchers
  42. RDSI use cases are designed to test boundary conditions involving performance, capability and cost-effectiveness
  43. RDSI is partnering with AWS, QCIF, Intersect and O2 Networks to undertake testing of: • Integration • Data movement • Data storage • Services over data
  44. • AWS Identity and Access Management (IAM) for billing aggregation • Australian Access Federation (AAF) based authentication of researchers • O2 Networks integration of IAM and AAF via SAML • Extending authentication and authorization through reX
  45. QBI needs to visualize neural tracts for ~1000 MRI image sets. Each image set: input 17 TB; compute 300–900 cores for 1 week; output 50 TB. Status: QCIF is working with QBI on the first run
  46. Compute Services over Data. Challenge: 900 cores for a week becomes expensive (~$15K). Solution: MIT StarCluster + AWS Spot. Why: StarCluster elastically expands based on spot price and queue size
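A rough cost sketch of why spot capacity matters for this workload. The $15K figure is the slide's on-demand estimate; the instance core count and the 70% spot discount below are assumed illustrative figures, not quoted AWS prices:

```python
# Back-of-envelope for the QBI run on spot capacity.
cores_needed = 900
cores_per_instance = 16          # assumption: a 16-core instance type
hours = 7 * 24                   # one week of wall-clock time

instances = -(-cores_needed // cores_per_instance)   # ceil(900/16) = 57
on_demand_total = 15_000                             # slide's estimate, $

assumed_spot_discount = 0.70     # assumed; spot often ran well below on-demand
spot_total = on_demand_total * (1 - assumed_spot_discount)

print(f"{instances} instances for {hours} h, "
      f"~${spot_total:,.0f} at an assumed 70% spot discount")
```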
  47. Storage. Challenge: need 67 TB of volume storage, but AWS provides 1 TB (EBS) volumes. Solution: GlusterFS + hs1.8xlarge servers. Why: GlusterFS gets faster with more servers, and hs1.8xlarge costs less per TB than SSD-backed instances (which are hard to saturate)
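The capacity arithmetic behind that choice is simple: each hs1.8xlarge exposes 24 × 2 TB of instance storage, so very few servers cover the 67 TB target. This sketch ignores replication and filesystem overhead, which RDSI's real layout would have to account for:

```python
# Raw-capacity check for the GlusterFS approach (assumed layout,
# not RDSI's actual configuration; no replication overhead counted).
needed_tb = 67
tb_per_server = 24 * 2           # hs1.8xlarge: 24 x 2 TB instance storage

servers = -(-needed_tb // tb_per_server)   # ceiling division
print(f"{servers} servers provide {servers * tb_per_server} TB raw "
      f"for a {needed_tb} TB volume")
```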
  48. Network. Challenge: moving 17 TB through the eye of a needle (the campus network). Solution: NAS sneaker net + Aspera. Why: tried compression, but it adds significant time overheads. Would have been better if the campus had a “data transfer network”
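Why a sneaker net beats the wire here: even on an uncontended link the transfer time is substantial. The 1 Gb/s link speed is an assumed figure for illustration, and real campus throughput would be lower:

```python
# Best-case transfer time for 17 TB over an assumed 1 Gb/s link,
# ignoring protocol overhead and contention.
tb_to_move = 17
link_gbps = 1.0                  # assumption, for illustration

bits = tb_to_move * 1e12 * 8
seconds = bits / (link_gbps * 1e9)
days = seconds / 86_400
print(f"~{days:.1f} days at {link_gbps} Gb/s")   # best case; reality is slower
```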
  49. Effective use of the cloud, whether public or private, requires thought and planning. The public cloud provides a significant opportunity to extend the capability and capacity of existing research infrastructure. The ability to scale very rapidly, with few constraints, is attractive for data-intensive research. The public cloud offers access to significant infrastructure at a time when large upfront capital investments are becoming scarce
  50. RDSI is establishing a blog to document the journey, rather than waiting until the end to write a report: https://www.rdsi.edu.au/aws-test
  51. THANK YOU. Please give us your feedback by filling out the feedback forms. AWS Government, Education, & Nonprofits Symposium | Canberra, Australia | May 20, 2014
