Empowering Congress with Data-Driven Analytics
Mathew Chase and Sri Vasireddy
November 13, 2013

© 2013 Amazon.com, Inc. a...
• A small federal legislative branch agency
• Newly established in late 2010
• Going beyond the “Cloud First” goal
to “Clo...
Hello
• Mathew Chase
• Federal CIO
• Over 20 years experience in the
public and private sectors leading
technology operati...
Who are you?
• Government
• Health care industry
• Cloud newbies
• AWS ninjas
• Whoops… wrong session
Question?

How many of you are using
AWS as your primary
computing datacenter?
MACPAC’s AWS Datacenter
• AWS to replace an onsite or hosted
datacenter
• Single primary region with cold recovery on
the ...
MACPAC: the “perfect” cloud customer

• Predictable work cycles
• Two intense work periods (annual)
• Growing with an un...
What we achieved in the cloud
• > 40% reduction in capital expenses
– With additional savings in rent, utilities, and labo...
Core focus

Recommendations to Congress on
Medicaid and the Children’s Health
Insurance Program
Reports to the Congress
Reports due by:
• March 15th &
• June 15th

www.MACPAC.gov/reports
Research backed by analytics
• Analyze Medicaid program data
• Find intersections with Medicare
• Evaluate Medicaid survey...
Tools
• SAS Office Analytics enterprise platform
• Red Hat Enterprise Linux x64
• Amazon EC2
Concerns

1. Security
2. Performance
Security
Security Requirements
• Multi-user controlled environment
• Isolated environment with strong controls
• No sensitive and p...
Access Protection Challenge

• Twenty Instances
• Twenty Ports for AD
• 20 x 20 = 400 Rules
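The 20 x 20 count above can be compared with rules that name a source security group, which is the approach the deck takes with its 'Infra' group. The sketch below is arithmetic only, not AWS API code; the function names are hypothetical, and the assumption is one ingress rule per port.

```python
# Hypothetical illustration of the rule-count arithmetic; not AWS API code.


def per_instance_rules(instances: int, ports: int) -> int:
    """One ingress rule per (client instance, port) pair."""
    return instances * ports


def group_referenced_rules(ports: int) -> int:
    """One ingress rule per port, with a client security group as the source."""
    return ports


if __name__ == "__main__":
    # Each client listed individually on the AD security group.
    print(per_instance_rules(20, 20))
    # The 'Infra' security group named once per port as the source.
    print(group_referenced_rules(20))
```

Under these assumptions, adding a client instance to the source group adds no rules on the AD side.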
Access Control Using Security Groups
AD-1

AD-2

Accept AD related requests from ‘Infra’ group

AD Security Group
Client I...
Encrypted
Data flow
Cloud
Security
Design
Performance
SAS Requirements
• Very IO intensive
• Sequential reads and writes
o 35-70 MB/sec per core of IO desired
o GOAL: 4 core syst...
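The per-core figure can be scaled to the four-core system in the goal. The sketch below computes the range implied by 35-70 MB/sec per core and checks that the deck's ~200 MB/sec target falls inside it. The function name is hypothetical.

```python
def io_target_range(cores: int, low_per_core: int = 35, high_per_core: int = 70):
    """Return the (low, high) aggregate IO target in MB/sec for a core count."""
    return cores * low_per_core, cores * high_per_core


if __name__ == "__main__":
    low, high = io_target_range(4)
    goal = 200  # the deck's stated goal for a 4 core system, in MB/sec
    print(low, high, low <= goal <= high)
```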
Base AWS Structure
• M3 extra large running RHEL x64 for cluster
o 1 TB EBS RAID 10 for primary data (4, 500gb drives)
o 1...
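The "1 TB" figures for these arrays follow from the drive counts and sizes given in the full slide text: four 500 GB drives in RAID 10 and four 256 GB drives in RAID 0. The sketch below assumes the standard usable-capacity rules for those two RAID levels (striping keeps all capacity, mirrored stripes keep half). The function name is hypothetical.

```python
def usable_gb(level: str, drives: int, drive_gb: int) -> int:
    """Usable capacity in GB for a RAID 0 or RAID 10 array of equal drives."""
    if level == "raid0":
        # Striping only: every drive contributes its full capacity.
        return drives * drive_gb
    if level == "raid10":
        # Mirrored pairs that are then striped: half the raw capacity is usable.
        if drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives")
        return drives * drive_gb // 2
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(usable_gb("raid10", 4, 500))  # primary data array
    print(usable_gb("raid0", 4, 256))   # temp work space and ETL arrays
```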
Can AWS yield the necessary performance?
In the immortal words of
Spinal Tap:

“These go to eleven!”
Turning up the AWS dial
Volume @ 3
Specifications
M3 extra large
4 – 256gb EBS Disks
RAID 0 Stripe
fio Sequential Read @ 3
[ec2-user]# fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
f...
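The deck does not show the contents of sastest.fio. As a rough sketch, a job file consistent with the parameters echoed in the output header (rw=read, bs=4K, ioengine=sync, iodepth=1) could be generated as below. The size of 100m is inferred from the io=102400KB total in the reported results, and the target filename is purely hypothetical; neither is confirmed by the source.

```python
def build_fio_job(filename: str, size: str = "100m") -> str:
    """Build the text of a fio job file for a 4K sequential read test.

    The rw, bs, ioengine and iodepth values mirror the output header shown
    in the deck; size and filename are assumptions made for illustration.
    """
    lines = [
        "[job1]",
        "rw=read",
        "bs=4k",
        "ioengine=sync",
        "iodepth=1",
        f"size={size}",
        f"filename={filename}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical path on the array under test.
    print(build_fio_job("/mnt/raid0/fio-testfile"), end="")
```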
Volume @ 10
Specifications
M3 extra large
4 – 256gb EBS Disks
4000 iops per drive
RAID 0 Stripe
fio Sequential Read @ 10
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
“If we need that extra push over the cliff.
You know what we do?”

“11! Exactly.”

— Nigel
fio Sequential Read @ 11
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
Volume @ 11
Specifications
4 – 256gb EBS Disks
4000 iops per drive
RAID 0 Stripe
cg1.4xlarge (10gb io channel)
fio Sequential Read @ 11
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
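The three fio runs can be compared directly from their summary lines. The sketch below parses the "read :" lines as reported in the deck, recomputes bandwidth from the bytes read and the runtime as a consistency check, and expresses each run relative to the first. The helper names are hypothetical, and the regular expression is written only for the fio-2.1.2 line format shown here.

```python
import re

# Summary lines copied from the three fio runs reported in the deck.
FIO_READ_LINES = {
    "volume @ 3": "read : io=102400KB, bw=77167KB/s, iops=19291, runt= 1327msec",
    "volume @ 10": "read : io=102400KB, bw=191402KB/s, iops=47850, runt= 535msec",
    "volume @ 11": "read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec",
}

_READ_RE = re.compile(r"io=(\d+)KB, bw=(\d+)KB/s, iops=(\d+), runt=\s*(\d+)msec")


def parse_read_line(line: str) -> dict:
    """Extract io (KB), bandwidth (KB/s), iops and runtime (ms) from a summary line."""
    match = _READ_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised fio summary line: {line!r}")
    io_kb, bw_kb_s, iops, runt_ms = (int(group) for group in match.groups())
    return {"io_kb": io_kb, "bw_kb_s": bw_kb_s, "iops": iops, "runt_ms": runt_ms}


def recomputed_bw_kb_s(result: dict) -> float:
    """Bandwidth implied by total KB read divided by runtime in seconds."""
    return result["io_kb"] * 1000 / result["runt_ms"]


def speedup(results: dict, baseline: str, label: str) -> float:
    """Reported bandwidth of one run relative to the baseline run."""
    return results[label]["bw_kb_s"] / results[baseline]["bw_kb_s"]


if __name__ == "__main__":
    parsed = {label: parse_read_line(line) for label, line in FIO_READ_LINES.items()}
    for label, result in parsed.items():
        print(
            label,
            result["bw_kb_s"],
            round(recomputed_bw_kb_s(result)),
            round(speedup(parsed, "volume @ 3", label), 2),
        )
```

Each run reads the same 102400 KB, so the differences in reported bandwidth come entirely from the runtime.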
I am pretty sure I can make the dial go higher

Ram Disks
Block sizes
Larger stripes
Application tuning
Etc…
WARNING!
• Be sure to touch all sectors of a new disk per
AWS guidance prior to testing and production
Command for Unix en...
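The dd command on this slide reads the whole device and discards the data. A rough Python equivalent, shown only to make the read-and-discard loop explicit, is sketched below. The function name and chunk size are assumptions, and the demo reads a temporary file rather than a real block device such as /dev/md0.

```python
import os
import tempfile


def read_all_blocks(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read a file or device from start to end, discarding the data.

    Returns the number of bytes read, so the caller can confirm that the
    whole target was touched.
    """
    total = 0
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Demo against a small temporary file instead of a block device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096)
    try:
        print(read_all_blocks(tmp.name))
    finally:
        os.remove(tmp.name)
```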
You are not alone…
• Guidance from software vendors
• AWS professional services
• Use an iterative process (Fail quickl...
What did we learn?
• Make a decision
• Start at zero…
• Spend time really thinking about security
• And then crank it up ...
References
• Amazon EBS Volume Performance
– http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html

• AW...
Special Thanks to: 8kMiles, AWS, and SAS

And thank you for your time today.
Contact Information
mathew.chase@macpac.gov
www.macpac.gov
Please give us your feedback on this
presentation

BDT304
As a thank you, we will select prize
winners daily for completed...

Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013


MACPAC is a federal legislative branch agency tasked with reviewing state and federal Medicaid and Children's Health Insurance Program (CHIP) access and payment policies and making recommendations to Congress. By March 15 and again by June 15 each year, the agency produces a comprehensive report for Congress that compiles results from Medicaid and CHIP data sources for the 50 states and territories. The CIO of MACPAC wanted a secure, cost-effective, high performance platform that met their needs to crunch this large amount of health data. In this session, learn how MACPAC and 8KMiles helped set up the agency’s Big Data/HPC analytics platform on AWS using SAS analytics software.

Published in: Technology, Education
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013

  1. 1. Empowering Congress with Data-Driven Analytics Mathew Chase, November 13, 2013 Sri Vasireddy, © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. 2. • A small federal legislative branch agency • Newly established in late 2010 • Going beyond the “Cloud First” goal to “Cloud Only”
  3. 3. Hello • Mathew Chase • Federal CIO • Over 20 years experience in the public and private sectors leading technology operations
  4. 4. Who are you? • Government • Health care industry • Cloud newbies • AWS ninjas • Whoops… wrong session
  5. 5. Question? How many of you are using AWS as your primary computing datacenter?
  6. 6. MACPAC’s AWS Datacenter • AWS to replace an onsite or hosted datacenter • Single primary region with cold recovery on the other coast • Multiple AZs for redundancy • Separate VPCs for security “air gaps”
  7. 7. MACPAC: the “perfect” cloud customer • Predictable work cycles • Two intense work periods (annual) • Growing with an undefined future • Potential need for more computing resources • Very cost conscious • No legacy infrastructure
  8. 8. What we achieved in the cloud • > 40% reduction in capital expenses – With additional savings in rent, utilities, and labor • Cost spread over typical equipment lifespan • On demand storage and archiving • Zero over provisioning • Ability to expand and contract resources at will
  9. 9. Core focus Recommendations to Congress on Medicaid and the Children’s Health Insurance Program
  10. 10. Reports to the Congress Reports due by: • March 15th & • June 15th www.MACPAC.gov/reports
  11. 11. Research backed by analytics • Analyze Medicaid program data • Find intersections with Medicare • Evaluate Medicaid survey information
  12. 12. Tools • SAS Office Analytics enterprise platform • Red Hat Enterprise Linux x64 • Amazon EC2
  13. 13. Concerns 1. Security 2. Performance
  14. 14. Security
  15. 15. Security Requirements • Multi-user controlled environment • Isolated environment with strong controls • No sensitive and personal data sitting at periphery • Data encrypted at rest and in transit
  16. 16. Access Protection Challenge • Twenty Instances • Twenty Ports for AD • 20 x 20 = 400 Rules
  17. 17. Access Control Using Security Groups AD-1 AD-2 Accept AD related requests from ‘Infra’ group AD Security Group Client Instances Accept DNS queries from AD group Infra Security Group DNS-1 DNS-2 DNS SecurityGroup Accept DNS queries from ‘Infra’ group
  18. 18. Encrypted Data flow
  19. 19. Cloud Security Design
  20. 20. Performance
  21. 21. SAS Requirements • Very IO intensive • Sequential reads and writes o 35-70 MB/sec per core of IO desired o GOAL: 4 core system = ~200 MB/sec IO
  22. 22. Base AWS Structure • M3 extra large running RHEL x64 for cluster o 1 TB EBS RAID 10 for primary data (4, 500gb drives) o 1 TB EBS RAID 0 for temp work space (4, 256gb drives) o 1 TB EBS LUKS encrypted RAID 0 for ETL (4, 256gb drives)
  23. 23. Can AWS yield the necessary performance?
  24. 24. In the immortal words of Spinal Tap: “These go to eleven!”
  25. 25. Turning up the AWS dial
  26. 26. Volume @ 3 Specifications M3 extra large 4 – 256gb EBS Disks RAID 0 Stripe
  27. 27. fio Sequential Read @ 3 (77,166 KB/s)
      [ec2-user]# fio sastest.fio
      job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
      fio-2.1.2
      Starting 1 process
      Jobs: 1 (f=1)
      job1: (groupid=0, jobs=1): err= 0: pid=31661: Sun Oct 27 23:07:18 2013
        read : io=102400KB, bw=77167KB/s, iops=19291, runt= 1327msec
          clat (usec): min=3, max=25911, avg=44.70, stdev=572.02
           lat (usec): min=5, max=25913, avg=46.86, stdev=572.02
      Run status group 0 (all jobs):
         READ: io=102400KB, aggrb=77166KB/s, minb=77166KB/s, maxb=77166KB/s, mint=1327msec, maxt=1327msec
  28. 28. Volume @ 10 Specifications M3 extra large 4 – 256gb EBS Disks 4000 iops per drive RAID 0 Stripe
  29. 29. fio Sequential Read @ 10 (191,401 KB/s)
      [ec2-user]$ fio sastest.fio
      job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
      fio-2.1.2
      Starting 1 process
      job1: (groupid=0, jobs=1): err= 0: pid=2731: Tue Nov 5 22:55:33 2013
        read : io=102400KB, bw=191402KB/s, iops=47850, runt= 535msec
          clat (usec): min=3, max=51820, avg=13.29, stdev=337.22
           lat (usec): min=4, max=51821, avg=15.52, stdev=337.21
      Run status group 0 (all jobs):
         READ: io=102400KB, aggrb=191401KB/s, minb=191401KB/s, maxb=191401KB/s, mint=535msec, maxt=535msec
  30. 30. “If we need that extra push over the cliff. You know what we do?” “11! Exactly.” — Nigel
  31. 31. fio Sequential Read @ 11 (432,067 KB/s)
      [ec2-user]$ fio sastest.fio
      job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
      fio-2.1.2
      Starting 1 process
      job1: (groupid=0, jobs=1): err= 0: pid=3133: Tue Nov 5 23:13:13 2013
        read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec
          clat (usec): min=0, max=1594, avg= 8.26, stdev=42.59
           lat (usec): min=0, max=1594, avg= 8.38, stdev=42.59
      Run status group 0 (all jobs):
         READ: io=102400KB, aggrb=432067KB/s, minb=432067KB/s, maxb=432067KB/s, mint=237msec, maxt=237msec
  32. 32. Volume @ 11 Specifications 4 – 256gb EBS Disks 4000 iops per drive RAID 0 Stripe cg1.4xlarge (10gb io channel)
  33. 33. fio Sequential Read @ 11 (432,067 KB/s)
      [ec2-user]$ fio sastest.fio
      job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
      fio-2.1.2
      Starting 1 process
      job1: (groupid=0, jobs=1): err= 0: pid=3133: Tue Nov 5 23:13:13 2013
        read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec
          clat (usec): min=0, max=1594, avg= 8.26, stdev=42.59
           lat (usec): min=0, max=1594, avg= 8.38, stdev=42.59
      Run status group 0 (all jobs):
         READ: io=102400KB, aggrb=432067KB/s, minb=432067KB/s, maxb=432067KB/s, mint=237msec, maxt=237msec
  34. 34. I am pretty sure I can make the dial go higher Ram Disks Block sizes Larger stripes Application tuning Etc…
  35. 35. WARNING! • Be sure to touch all sectors of a new disk per AWS guidance prior to testing and production. Command for Unix environments: $ dd if=/dev/md0 of=/dev/null
  36. 36. You are not alone… • Guidance from software vendors • AWS professional services • Use an iterative process (Fail quickly) • Third party partners (8kMiles) so get going!
  37. 37. What did we learn? • Make a decision • Start at zero… • Spend time really thinking about security • And then crank it up where you need it “Try again. Fail again. Fail better.” Samuel Beckett, Worstward Ho (1983)
  38. 38. References • Amazon EBS Volume Performance – http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html • AWS Microsoft Platform Security – http://media.amazonwebservices.com/AWS_Microsoft_Platform_Security.pdf • Benchmarking SAS I/O: Verifying I/O Performance Using fio – http://support.sas.com/resources/papers/proceedings13/4792013.pdf • This is Spinal Tap (Movie, 1984, Rob Reiner - Director)
  39. 39. Special Thanks to: 8kMiles, AWS, and SAS And thank you for your time today.
  40. 40. Contact Information mathew.chase@macpac.gov www.macpac.gov
  41. 41. Please give us your feedback on this presentation BDT304 As a thank you, we will select prize winners daily for completed surveys!
