AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerating Innovation in the Public Sector
The cloud not only helps organizations do things better, cheaper, and faster; it also drives breakthroughs that transform mission delivery. This session will feature a panel of international government and university leaders who are using the cloud to take on big data challenges, and innovating in the “white space” between data silos to deliver impact.

Presentation Transcript

  • 1. AWS Government, Education, & Nonprofits Symposium | Canberra, Australia | May 20, 2014 | Big Data in the Cloud: Accelerating Innovation in the Public Sector | Russell Nash, Solutions Architect, Amazon Web Services, APAC
  • 2. BIG DATA: Technologies and techniques for working productively with data, at any scale
  • 3. VOLUME VELOCITY VARIETY
  • 4. Unconstrained Data Growth: GB, TB, PB, EB, ZB, YB
  • 5. 27 TB per day Large Hadron Collider – CERN
  • 6. STORE │ ANALYZE
  • 7. STORE │ ANALYZE
  • 8. AMAZON S3
  • 9. Objects in S3 (chart, Q4 2006 to Q2 2013): 2 Trillion objects; 1.1 M peak transactions per second
  • 10. 99.999999999%
  • 11. STORE │ ANALYZE
  • 12. HPC Hadoop MPP Database
  • 13. High Performance Computing
  • 14. Amazon EC2
  • 15. Instance Types (chart axes: Memory (GB) vs. EC2 Compute Units). Families: Standard, 2nd Gen Standard, Micro, High-Memory, High-CPU, Cluster Compute, Cluster GPU, High I/O, High-Storage, Cluster High-Mem
      • t1.micro: 613 MB memory; up to 2 EC2 Compute Units; EBS storage only; 32-bit or 64-bit platform
      • m1.small: 1.7 GB memory; 1 EC2 Compute Unit; 160 GB instance storage; 32-bit or 64-bit
      • m1.medium: 3.75 GB memory; 2 EC2 Compute Units; 410 GB instance storage; 32-bit or 64-bit platform
      • m1.large (EBS Optimizable): 7.5 GB memory; 4 EC2 Compute Units; 850 GB instance storage; 64-bit platform
      • m1.xlarge (EBS Optimizable): 15 GB memory; 8 EC2 Compute Units; 1,690 GB instance storage; 64-bit platform
      • m3.xlarge: 15 GB memory; 13 EC2 Compute Units
      • m3.2xlarge (EBS Optimizable): 30 GB memory; 26 EC2 Compute Units
      • m2.xlarge: 17.1 GB memory; 6.5 EC2 Compute Units; 420 GB instance storage; 64-bit platform
      • m2.2xlarge: 34.2 GB memory; 13 EC2 Compute Units; 850 GB instance storage; 64-bit platform
      • m2.4xlarge (EBS Optimizable): 68.4 GB memory; 26 EC2 Compute Units; 1,690 GB instance storage; 64-bit platform
      • c1.medium: 1.7 GB memory; 5 EC2 Compute Units; 350 GB instance storage; 32-bit or 64-bit platform
      • c1.xlarge: 7 GB memory; 20 EC2 Compute Units; 1,690 GB instance storage; 64-bit platform
      • cc1.4xlarge: 23 GB memory; 33.5 EC2 Compute Units; 1,690 GB instance storage; 64-bit platform
      • cc2.8xlarge: 60.5 GB memory; 88 EC2 Compute Units; 3,370 GB instance storage; 64-bit platform
      • cg1.4xlarge: 22 GB memory; 33.5 EC2 Compute Units; 2 x NVIDIA Tesla “Fermi” M2050 GPUs; 1,690 GB instance storage; 64-bit platform
      • hi1.4xlarge: 60.5 GB memory; 35 EC2 Compute Units; 2 x 1,024 GB SSD instance storage; 64-bit platform
      • hs1.8xlarge: 117 GB memory; 35 EC2 Compute Units; 24 x 2 TB instance storage; 64-bit platform
      • cr1.8xlarge: 244 GB memory; 88 EC2 Compute Units; 2 x 120 GB SSD instance storage; 64-bit platform
  • 16. •  ON A SINGLE INSTANCE COST: 4h x $2.1 = $8.4 COMPUTE TIME: 4h
  • 17. •  ON MULTIPLE INSTANCES COST: 2 x 2h x $2.1 = $8.4 COMPUTE TIME:
  • 18. Using Cycle Computing and Amazon Web Services. Compute Hours of Work: 109,927 hours; Compute Days of Work: 4,580 days; Compute Years of Work: 12.55 years; Ligand Count: ~21 million ligands
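The day and year figures on slide 18 follow from the hour count by simple unit conversion. A minimal sketch of that arithmetic is below; the 365-day year is an assumption (it is the value that reproduces the 12.55 on the slide), and the function names are illustrative only.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: a 365-day year reproduces the slide's 12.55 figure


def compute_days(hours: float) -> float:
    """Convert aggregate compute hours into compute days."""
    return hours / HOURS_PER_DAY


def compute_years(hours: float) -> float:
    """Convert aggregate compute hours into compute years."""
    return compute_days(hours) / DAYS_PER_YEAR


if __name__ == "__main__":
    hours = 109_927  # compute hours of work, from slide 18
    print(f"{compute_days(hours):,.0f} days")   # 4,580 days
    print(f"{compute_years(hours):.2f} years")  # 12.55 years
```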
  • 19. 3 Hours for $4829/hr │ $20+ Million
  • 20. HPC Hadoop MPP Database
  • 21. HADOOP
  • 22. Amazon Elastic MapReduce
  • 23. Deploying a Hadoop cluster is hard
  • 24. Elasticity
  • 25. HPC Hadoop MPP Database
  • 26. MPP Database
  • 27. Amazon Redshift
  • 28. Scalability
  • 29. What are Spot Instances? Diagram: a Region with two Availability Zones; unused capacity in each is sold at a discount (50%, 56%, 66%, 59%, 54%, 63%)
  • 30. •  ON MULTIPLE INSTANCES COST: 2 x 2h x $2.1 = $8.4 COMPUTE TIME:
  • 31. •  ON MULTIPLE SPOT INSTANCES COST: 4 x 1h x $0.35 = $1.4 COMPUTE TIME:
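The three cost slides (single instance, multiple instances, multiple Spot instances) all apply the same formula: instances x hours per instance x hourly price. A minimal sketch follows, using the prices shown on the slides. The wall-clock estimate assumes the job parallelizes perfectly across instances, which is an assumption of this sketch and not a figure stated on the slides; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str
    instances: int
    hours_each: float
    price_per_hour: float  # USD per instance-hour, as shown on the slides

    @property
    def cost(self) -> float:
        """Total cost = instances x hours per instance x hourly price."""
        return self.instances * self.hours_each * self.price_per_hour

    @property
    def wall_clock_hours(self) -> float:
        """Elapsed time, assuming all instances run concurrently (assumption)."""
        return self.hours_each


RUNS = [
    Run("single on-demand instance", instances=1, hours_each=4, price_per_hour=2.10),
    Run("two on-demand instances", instances=2, hours_each=2, price_per_hour=2.10),
    Run("four Spot instances", instances=4, hours_each=1, price_per_hour=0.35),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.label}: ${run.cost:.2f} over {run.wall_clock_hours:g} h")
```

Under these assumptions, splitting the work across on-demand instances shortens the elapsed time at the same cost, while moving to Spot pricing is what lowers the cost.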
  • 32. SEC MIDAS & Tradeworx: Real-time analysis of 20 billion messages/day. Reconstruct any market, any day in history
  • 33. “For the growing team of quant types now employed at the SEC, MIDAS is becoming the world’s greatest data sandbox. The staff is planning to use it to make the SEC a leader in its use of market data” Elisse B. Walter, Chairman of the SEC
  • 34. PUBLIC DATA SETS http://aws.amazon.com/publicdatasets
  • 35. 20th May 2014 Dr Nick Tate Director, RDSI
  • 36. The Research Data Storage Infrastructure (RDSI) Project, an Australian Government initiative, is funded from the Education Investment Fund under the Super Science (Future Industries) initiative. RDSI is a $50m federally funded project, for which UQ is the lead agent and was awarded up to $10m in NCRIS 2013. 26 May 2014
  • 37. Andrew Goodchild, QCIF; Andrew Reay, AWS; Paul Campbell, RDSI; Shane Youl, Intersect; Mark Terrill, O2
  • 38. Researchers will be able to use and manipulate significant collections of data that were previously either unavailable or difficult to access
  • 39. Data Stores Across Australia: http://www.rdsi.uq.edu.au/node-statuses
  • 40. Evaluation and testing facility; working with partners; implemented through two nodes (initially): QCIF (Brisbane) and Intersect (Sydney); act as gateways for public cloud access; will host a series of projects
  • 41. Gateway to the Public Cloud through nodes; Peering through AARNet (Sydney/Oregon); Volume Aggregation through CAUDIT/Test Platform; Removing barriers to use, such as egress charges for researchers
  • 42. RDSI use cases are designed to test boundary conditions involving performance, capability and cost/effectiveness
  • 43. RDSI is partnering with AWS, QCIF, Intersect and O2 networks to undertake testing of: integration; data movement; data storage; services over data
  • 44. AWS Identity and Access Management (IAM) for billing aggregation; Australian Access Federation (AAF) based authentication of researchers; O2 networks integration of IAM and AAF via SAML; extending authentication and authorization through reX
  • 45. QBI need to visualize neural tracks for ~1000 MRI image sets. Each image set: input 17 TB; compute 300 - 900 cores for 1 week; output 50 TB. Status: QCIF are working with QBI on the first run
  • 46. Compute Services over Data. Challenge: 900 cores for a week becomes expensive ($15K). Solution: MIT StarCluster + AWS Spot. Why: StarCluster elastically expands based on spot price and queue size
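To make the slide 46 numbers concrete, the sketch below works out the core-hours in "900 cores for a week" and the per-core-hour rate implied by the $15K figure. It then shows a toy scale-out rule driven by spot price and queue size. The rule is a hypothetical illustration of the idea on the slide, not StarCluster's actual load-balancing logic; every name, parameter, and threshold in it is an assumption.

```python
import math

HOURS_PER_WEEK = 7 * 24


def implied_core_hour_rate(total_cost: float, cores: int, hours: float) -> float:
    """Price per core-hour implied by a total cost, core count, and duration."""
    return total_cost / (cores * hours)


def nodes_to_add(queued_jobs: int, jobs_per_node: int, current_nodes: int,
                 max_nodes: int, spot_price: float, max_price: float) -> int:
    """Hypothetical scale-out rule: grow only while spot capacity is cheap enough.

    Adds enough nodes to absorb the waiting jobs, capped by the cluster limit.
    Returns 0 when the spot price exceeds the price we are willing to pay.
    """
    if spot_price > max_price or queued_jobs <= 0:
        return 0
    wanted = math.ceil(queued_jobs / jobs_per_node)
    headroom = max(0, max_nodes - current_nodes)
    return min(wanted, headroom)


if __name__ == "__main__":
    core_hours = 900 * HOURS_PER_WEEK
    rate = implied_core_hour_rate(15_000, cores=900, hours=HOURS_PER_WEEK)
    print(f"{core_hours:,} core-hours at about ${rate:.3f} per core-hour")
    print(nodes_to_add(queued_jobs=10, jobs_per_node=4, current_nodes=2,
                       max_nodes=4, spot_price=0.05, max_price=0.10))
```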
  • 47. Storage. Challenge: need 67 TB of volume storage, but AWS provides 1 TB volumes. Solution: GlusterFS + hs1.8xlarge servers. Why: GlusterFS gets faster with more servers, and hs1.8xlarge instances cost less per TB than SSDs (which are hard to saturate)
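A rough sizing sketch for the storage challenge on slide 47: how many 1 TB volumes, or how many hs1.8xlarge servers, would cover 67 TB. The per-server capacity of 24 x 2 TB comes from the instance-type slide. The sketch assumes raw capacity with no GlusterFS replication and no filesystem overhead, so real deployments would need more; the `replicas` parameter is an illustrative knob, not a setting taken from the talk.

```python
import math

REQUIRED_TB = 67                 # volume storage needed (slide 47)
VOLUME_TB = 1                    # per-volume size cited on slide 47
HS1_8XLARGE_RAW_TB = 24 * 2      # 24 x 2 TB instance storage (slide 15)


def units_needed(required_tb: float, unit_tb: float, replicas: int = 1) -> int:
    """Smallest number of units whose raw capacity covers the requirement.

    `replicas` multiplies the requirement to model keeping several copies
    of the data (assumption; 1 means no replication).
    """
    return math.ceil(required_tb * replicas / unit_tb)


if __name__ == "__main__":
    print(units_needed(REQUIRED_TB, VOLUME_TB))                      # 1 TB volumes
    print(units_needed(REQUIRED_TB, HS1_8XLARGE_RAW_TB))             # servers, no replication
    print(units_needed(REQUIRED_TB, HS1_8XLARGE_RAW_TB, replicas=2)) # servers, two copies
```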
  • 48. Network. Challenge: moving 17 TB through the eye of a needle (the campus network). Solution: NAS sneakernet + Aspera. Why: tried compression, but it adds significant time overheads. Would have been better if the campus had a “data transfer network”
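To show why 17 TB strains a campus network, the sketch below estimates transfer time at a few link speeds. The link speeds and the efficiency factor are assumptions chosen for illustration; the talk does not state the actual campus bandwidth. Terabytes are treated as decimal (10^12 bytes).

```python
BYTES_PER_TB = 1e12   # decimal terabytes (assumption)
BITS_PER_BYTE = 8


def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_tb` over a link of `link_gbps`.

    `efficiency` is the fraction of nominal bandwidth actually achieved
    (assumption; 1.0 means the link is fully and exclusively used).
    """
    bits = size_tb * BYTES_PER_TB * BITS_PER_BYTE
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for gbps in (0.1, 1.0, 10.0):  # hypothetical link speeds
        hours = transfer_hours(17, gbps)
        print(f"{gbps:>4} Gbps: {hours:7.1f} h ({hours / 24:.1f} days)")
```

Even on a dedicated 1 Gbps link the 17 TB input takes over a day and a half per image set under these assumptions, which is consistent with the slide's choice to ship disks physically.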
  • 49. Effective use of the Cloud, whether public or private, requires thought and planning. The Public Cloud provides a significant opportunity to extend the capability and capacity of existing research infrastructure. The ability to scale very rapidly with few constraints is attractive for Data Intensive Research. Public Cloud offers the capability of allowing access to significant infrastructure as large upfront capital investments become scarce.
  • 50. RDSI is establishing a blog to document the journey rather than wait until the end to write a report: https://www.rdsi.edu.au/aws-test
  • 51. THANK YOU Please give us your feedback by filling out the Feedback Forms AWS Government, Education, & Nonprofits Symposium Canberra, Australia | May 20, 2014