There is a constant tension between empowering teams to be agile through autonomy and enforcing governance policies to maintain regulatory compliance. Hear from Nathan Scott, Senior Consultant at AWS, and James Martin, Automation Engineering Manager at 3M, on how they have achieved both autonomy and governance through self-service automation tools on AWS. Learn how to avoid pitfalls with building the CI/CD team, right sizing and how to address. This session will also feature a demo from Casey Lee, Chief Architect at Stelligent, on the tools used to accomplish this for 3M, including AWS Service Catalog, AWS CloudFormation, AWS CodePipeline, and Cloud Custodian, an open source tool for managing AWS accounts.
5. 5
Historical business
Our legacy 1983–2011
Helping healthcare organizations
get complete and accurate
reimbursement and mitigate
compliance risks
Streamlining and simplifying the
process of documenting the
patient’s encounter
in a hospital
Working with hospitals
to efficiently access, compile,
code, classify, report, store,
and exchange health information
6. 6
Leading in a changing landscape
Our present course and future
Analyzing the cost, quality, and
outcomes data of both
patients and populations
over time and across the
healthcare continuum
Ensuring providers capture the
full burden of illness of their
patients to deliver effective
care management and receive
accurate and complete
payment
Measuring performance and
effectiveness among payer and
provider networks to deliver
higher quality outcomes at
lower total costs
7. 7 3M Confidential.
3M HIS grouper applications
22 states (27 grouper adoptions) through 1983–2006
11 additional states (37 grouper adoptions) 2007–2010
6 additional states (33 grouper adoptions) 2011–Q3 2012
• Industry-recognized expertise
in payment methodologies and
patient classification
• 24 states have adopted APR
DRGs for payment, including
the eight largest Medicaid
programs in the country
• The APR DRG adoption by
payers typically yields over
75% downstream penetration
with providers
• Lays a foundation for further
payment products
87%
of the US
population is
covered by 3M
patient
classification
systems
8. 8
Not moving fast enough
Lift and shift got us out of the traditional data center, but…
Lots of software is getting built with nowhere to go, so it’s time to evolve
again.
11. Deployment pipeline
Feedback loop
plan monitor
build test release
Developers Customers
Based on slideshare.net/AmazonWebServices/dvo202-devops-at-amazon-a-look-at-our-tools-processes
Continuous delivery
13. 13
Building the automation team
Automation engineering team
• Deep knowledge of AWS services
• Comfortable talking to other development
teams
• Understands the complete development
lifecycle—from commit to deploy
14. 14
Choosing the right technology
• Focus on the problem at hand
• Don’t try to predict the future
• Use native AWS services/AWS
Lambda/software as a service
(SaaS) services
15. 15
Working with security
• Gain buy-in early
• Security from the start
• Security as consumers
• Freedom (with guard rails)
• Sensitive data
16. 16
• Find a simple application
• Just enough to prove your pipeline
• Rinse, repeat
The right services and teams
17. 17
The right services and teams
Find the hungry team that
• Wants the power
• Is willing to do the work
• Has a champion
• Has the business need
18. 18
Embed with the AppDev team
• Establish success criteria
• Works closely with application team
• Participates in the team’s sprint cycle
• Helps AppDev team consume the pipeline process and tools
AppDev
team
Automation
engineering
19. 19
Establishing a CI/CD process at scale
Problems
• Complex components
• Special snowflakes
• Limited governance
Been in business for 30+ years
Develop products and services that help our customers produce accurate documentation and medical coding to improve quality of care and reduce cost.
The US is moving from fee-for-service medical care to big-data-driven population health
Measuring performance and effectiveness of care
Determining actions to take on that performance for improvement
24 states have adopted our systems
87% of the population is covered by our systems
1% of the Gross Domestic Product is being risk adjusted with 3M HIS methodologies (products and services)
Lots of records
Lots of dollars
Bottleneck = the amount of time it takes to do the action, plus waiting on the availability of the team.
How long it took to get to production on some of our deployments
Get software into the hands of customers as fast as possible.
Rob Brigham
Building the CI/CD platform team
Choosing the right technology
Security
Find the right service
Find a hungry team
Embed with the team
Establishing a Feedback loop
Needed a balance of engineer types and consulting engineer types
If you don’t have it in house, bring in consultants and rotate FTEs into the team
Don’t try to over-engineer to solve all types of delivery
Don’t try to figure out what you are going to need; figure out what you do need
Know that your CI/CD platform is iterative; like any product, it will get better over time
Use native AWS services/Lambda/SaaS over instance-based infrastructure when possible
Security involved in the cu
CI/CD needs to have security baked into the process
Start building the platform with the Security team to gain buy-in early
Help the security team become consumers of the platform so they can be champions
Regulated Data
Development with Guardrails
Sensitive Data requires unique control frameworks that must be implemented.
Find an easy to deploy service
Small, stateless, a web app?
Get that thing to production; don’t worry about containers or microservices just yet.
Keep trying new services, wait for patterns to develop, iterate
Find an easy to deploy service
Find a team that is eager
Some teams want in just because it’s the hot new thing
Explain the teams on the graphs
Explain the bullet points
Onboarding team works closely with the Automation Engineering Team
Communicates App Team challenges to Automation Engineering Team
Acts as champion for App Team issues to make sure they are captured for future Teams and Pipeline Factory enhancements
Hands over the steering wheel when the app team is ready
Consistent CI/CD pipelines and process at scale
James covered the challenges and the approach to addressing them
3 parts to solution
Pipelines – every commit can make its way to production with minimal human intervention (SPEED/AGILITY)
Self service – teams can create and manage their own pipelines (AUTONOMY)
Monitor – guardrails to keep people from hurting themselves (REPUTATION & COMPLIANCE)
Restate problem – manual handoff
Some automation, but still requires support from a centralized team
Use CodePipeline for automating deployment workflow
### All deployments must be done via pipeline
### Triggered by commit
Single pipeline per deployable application/service
### Only yes/no input
All infrastructure defined as CFN by developer
### Everything in code
2 repos – one for app, one for IaC…allows separation of roles inside a team
Pipeline is triggered when either one changes
### Define all Jenkins jobs as JobDSL in the IaC repository
Every pipeline execution runs the DSL
Source is built, unit tested and packaged
We’ll come back to cfn-nag later…
3 stages…one per environment (automated testing, manual testing, production)
### Only manual step is between each env...approve/reject
Launch infrastructure via CloudFormation templates defined in the IaC repo
ASGs, ELBs, DBs
Deploy app that was built previously to new infrastructure
* Run end to end tests…selenium, resteasy, postman/newman
Blue green switch at the ELB to the new ASG
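The stage layout described in these notes can be sketched as plain data. This is a minimal illustration, not the pipeline 3M actually runs: the stage and action names, and the use of a custom Jenkins provider name, are assumptions. The `actionTypeId` fields follow the general shape CodePipeline uses for action types.

```python
# Sketch of a CodePipeline-style stage list: two commit-triggered sources,
# a build stage, then one stage per environment with a manual approval gate
# between environments. All names here are hypothetical.

def action(name, category, provider, owner="AWS", run_order=1):
    """Build one action entry in the shape CodePipeline uses for action types."""
    return {
        "name": name,
        "actionTypeId": {
            "category": category,
            "owner": owner,
            "provider": provider,
            "version": "1",
        },
        "runOrder": run_order,
    }


def environment_stage(env, needs_approval):
    """One stage per environment: optional approval, launch infrastructure, test."""
    actions = []
    order = 1
    if needs_approval:
        # The only manual step: approve/reject before entering this environment.
        actions.append(action(f"Approve-{env}", "Approval", "Manual", run_order=order))
        order += 1
    actions.append(action(f"Launch-{env}", "Deploy", "CloudFormation", run_order=order))
    # Deploy-and-test job run by Jenkins as a custom action (provider name assumed).
    actions.append(action(f"Test-{env}", "Test", "Jenkins", owner="Custom", run_order=order + 1))
    return {"name": env, "actions": actions}


def build_stages(environments):
    """Source (app + IaC repos), commit stage, then the environment stages."""
    stages = [
        {"name": "Source", "actions": [
            action("AppRepo", "Source", "CodeCommit"),
            action("IaCRepo", "Source", "CodeCommit"),
        ]},
        {"name": "Commit", "actions": [
            action("BuildAndUnitTest", "Build", "Jenkins", owner="Custom"),
        ]},
    ]
    for index, env in enumerate(environments):
        # No approval before the first (automated testing) environment.
        stages.append(environment_stage(env, needs_approval=index > 0))
    return stages


if __name__ == "__main__":
    for stage in build_stages(["AutomatedTesting", "ManualTesting", "Production"]):
        print(stage["name"], [a["name"] for a in stage["actions"]])
```

Keeping the approval action as the first step of each later environment means the pipeline never needs input other than yes/no.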
New problem…how to allow self-service to provision pipelines?
Don’t want to allow folks to create manually
Needed a pipeline factory!
Least privilege - Control who can create pipelines via IAM.
Govern – The pipeline is created exactly as intended, because users can only create what’s in the approved template.
Versioned - Changes can be versioned allowing users to consume changes to pipelines at their own pace
Declarative > Imperative - Easier to manage as CloudFormation does a great job of converging incremental changes. Simply declare the desired state of your resources and CFN will make it happen…rather than you having to write the code to do that hard stuff
### CloudFormation is king – easier to version and apply incremental changes
### CloudFormation service role – a role that is used only by Service Catalog/CloudFormation and has all the access…can’t be assumed by users
### ServiceCatalog to provide self service with governance
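One way to picture the service role idea is a trust policy that names only service principals. The sketch below is an assumption about how such a role could be expressed, not the policy used in the talk. The helper `assumable_by_users` is hypothetical and only covers the simple cases shown.

```python
# Sketch: a trust policy that only AWS services can use, plus a small check
# that no account, user, role, or federated principal is trusted.

SERVICE_PRINCIPALS = ["servicecatalog.amazonaws.com", "cloudformation.amazonaws.com"]


def service_only_trust_policy(services=SERVICE_PRINCIPALS):
    """Trust policy document allowing only the listed services to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": sorted(services)},
            "Action": "sts:AssumeRole",
        }],
    }


def assumable_by_users(trust_policy):
    """True if any Allow statement trusts a wildcard, AWS, or federated principal."""
    for statement in trust_policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            return True
        if isinstance(principal, dict) and ("AWS" in principal or "Federated" in principal):
            return True
    return False


if __name__ == "__main__":
    policy = service_only_trust_policy()
    print("assumable by users:", assumable_by_users(policy))
```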
Demo script…(to be recorded)
Create team via SC
Login to Jenkins
View list of created stacks (cross account)
Create pipeline via SC
View CodePipeline
View Jenkins
View CodeCommit
Watch pipeline succeed
Service Catalog creates top level stack
Custom resource backed by a Lambda function creates nested stacks in other accounts using an IAM role
Can reattach to existing stack, useful for KMS keys and S3 buckets
### Retain important resources – buckets, keys, databases
Custom resource
One per account, uses AssumeRole to jump accounts
Shared template for all accounts, versioned
DeletionPolicy…retained and reattached
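The retain part can be illustrated with a small template transform. This is a sketch under assumptions: the set of resource types treated as important is chosen here to match the buckets, keys, and databases mentioned in the notes, and the reattach step is not shown.

```python
import copy

# Resource types treated as "important" in this sketch (an assumption).
RETAINED_TYPES = {"AWS::S3::Bucket", "AWS::KMS::Key", "AWS::RDS::DBInstance"}


def retain_important_resources(template, retained_types=RETAINED_TYPES):
    """Return a copy of a CloudFormation template dict with DeletionPolicy: Retain
    set on every resource whose type is in retained_types."""
    result = copy.deepcopy(template)
    for resource in result.get("Resources", {}).values():
        if resource.get("Type") in retained_types:
            resource["DeletionPolicy"] = "Retain"
    return result


if __name__ == "__main__":
    example = {
        "Resources": {
            "ArtifactBucket": {"Type": "AWS::S3::Bucket"},
            "AppGroup": {"Type": "AWS::AutoScaling::AutoScalingGroup"},
        }
    }
    print(retain_important_resources(example))
```

With `DeletionPolicy: Retain`, deleting the stack leaves the resource in place, which is what makes reattaching possible later.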
Self service documentation
How to get started
How to solve common problems
Changelog and migration documentation
Teams create the CFN for their ELBs, ASGs, Route53, RDS
The cfn-nag tool looks for patterns in CloudFormation templates that may indicate insecure infrastructure.
ELBs that are open to outside
Security group rules that are too permissive (wildcards)
Access logs that aren't enabled
Encryption that isn't enabled
### static analysis before deployment
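To make the idea of static analysis concrete, here is a toy check in the same spirit. It is not cfn-nag and does not reproduce its rules; it only flags security groups whose inline ingress rules are open to the whole internet, and the function name is hypothetical.

```python
# Toy static check over a CloudFormation template dict: report security groups
# with an inline ingress rule open to 0.0.0.0/0 or ::/0.

def find_open_ingress(template):
    """Return logical IDs of AWS::EC2::SecurityGroup resources with world-open ingress."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        if isinstance(rules, dict):  # a single rule may be given without a list
            rules = [rules]
        for rule in rules:
            if rule.get("CidrIp") == "0.0.0.0/0" or rule.get("CidrIpv6") == "::/0":
                findings.append(logical_id)
                break
    return findings


if __name__ == "__main__":
    template = {
        "Resources": {
            "WebSg": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {"SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"},
                ]},
            }
        }
    }
    # A non-empty result would fail the build before anything is deployed.
    print(find_open_ingress(template))
```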
Rules defined via custodian DSL
Deployed as lambda functions
Perform notification and remediation
Look for public buckets
Automatically remove grants and website hosting
Notify the resource owner
### setup processes to assess and enforce policy compliance
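The public bucket rule can be pictured as a filter over a bucket ACL. This sketch works on a dict shaped like an S3 bucket ACL response and is not the Cloud Custodian implementation; the helper names are hypothetical.

```python
# Group URIs that S3 uses for "everyone" and "any authenticated AWS user".
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return the grants in an ACL dict whose grantee is a public group."""
    return [
        grant for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") in PUBLIC_GROUPS
    ]


def without_public_grants(acl):
    """Return a copy of the ACL with public grants removed (the remediation step)."""
    cleaned = dict(acl)
    cleaned["Grants"] = [
        grant for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") not in PUBLIC_GROUPS
    ]
    return cleaned


if __name__ == "__main__":
    acl = {"Grants": [
        {"Grantee": {"Type": "Group", "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
    ]}
    print(len(public_grants(acl)), "public grant(s) found")
```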
Look for instances missing “Cost Center” or “Team” tags
Stop the instance
Notify resource owner
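A tag rule like this could be written as a Cloud Custodian policy along the following lines. This is a sketch and not the policy shown in the talk: the policy name is made up, and the `mode` block (which controls deployment as a Lambda function) and the notify action are omitted because they depend on account-specific settings. The policy is built as a Python dict and printed as JSON so its structure is easy to inspect.

```python
import json


def missing_tag_policy(required_tags):
    """Build a Custodian-style policy: stop running EC2 instances missing any required tag."""
    return {
        "policies": [{
            "name": "ec2-stop-missing-required-tags",  # hypothetical policy name
            "resource": "ec2",
            "filters": [
                {"State.Name": "running"},
                # Match when at least one of the required tags is absent.
                {"or": [{f"tag:{tag}": "absent"} for tag in required_tags]},
            ],
            "actions": ["stop"],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(missing_tag_policy(["Cost Center", "Team"]), indent=2))
```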
Teams can define their own tests (functional or non-functional) as lambda functions
Modify S3 bucket ACL -> failed build
IAM role trust policy with non-HIS account -> failed build
Permissive security groups
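A team-defined test of the trust policy kind might look like the sketch below. The event shape, the allowed account ID, and the return format are all assumptions made for illustration; the framework's real contract is not described in these notes.

```python
import re

ALLOWED_ACCOUNTS = {"111111111111"}  # hypothetical list of approved account IDs


def _principal_accounts(principal):
    """Extract account IDs (or raw values such as '*') from a Principal element."""
    aws = principal.get("AWS", []) if isinstance(principal, dict) else [principal]
    if isinstance(aws, str):
        aws = [aws]
    accounts = set()
    for value in aws:
        match = re.fullmatch(r"(\d{12})|arn:aws:iam::(\d{12}):.+", value)
        accounts.add((match.group(1) or match.group(2)) if match else value)
    return accounts


def untrusted_accounts(trust_policy, allowed=ALLOWED_ACCOUNTS):
    """Return principals trusted by Allow statements that are not in the allowed set."""
    found = set()
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") == "Allow":
            found |= _principal_accounts(statement.get("Principal", {}))
    return found - set(allowed)


def handler(event, context):
    """Lambda-style entry point; expects event['trust_policy'] (assumed shape)."""
    bad = untrusted_accounts(event["trust_policy"])
    return {"passed": not bad, "untrusted": sorted(bad)}


if __name__ == "__main__":
    policy = {"Statement": [{"Effect": "Allow",
                             "Principal": {"AWS": "arn:aws:iam::999999999999:root"},
                             "Action": "sts:AssumeRole"}]}
    print(handler({"trust_policy": policy}, None))
```

A result with `passed` set to false would be reported back to the pipeline as a failed build.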
Dynamic testing framework for infrastructure and application level functional and non functional tests
Verify Infrastructure aligns with AWS Best Practices (AWS Security Epics) and your own organizational governance
Application Level Functional Tests (Call my endpoints and assert the response)
Non-Functional Tests (Terminate instances in auto-scaling group, verify resiliency )
Framework allows for dynamically testing AWS best practices (such as the AWS Security Epics)
Framework capable of running cross account tests, in multiple accounts
Security Tests (Organizational / BU Level) are run in SecOps, but test infra in other accounts
Application Tests (Product Level) Created by the app team are executed in the deployment account(s)
Framework that can be directly integrated with the pipeline or used independently with minor changes
Embraces DevSecOps allowing the security team and the application teams to build security into the development process
Organization Level Test – Test defined to verify enterprise or business unit requirements
Product Level Test – Test written by the product team to verify security, functional, and non-functional requirements
Single CW dashboard showing metrics for each pipeline
SuccessCount
FailureCount
CycleTime
RedTime
GreenTime
### monitor health of pipelines
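The notes name RedTime and GreenTime without defining them. The sketch below assumes one plausible reading: red time is the time a pipeline spends with its latest execution failed, and green time is the time it spends with its latest execution succeeded. Both the definitions and the input format are assumptions.

```python
def red_green_time(events):
    """Compute (red_seconds, green_seconds) from chronological (timestamp, state) pairs.

    Each interval between two consecutive events is attributed to the state
    of the earlier event. Time after the last event is not counted.
    """
    red = 0.0
    green = 0.0
    for (start, state), (end, _next_state) in zip(events, events[1:]):
        if state == "SUCCEEDED":
            green += end - start
        elif state == "FAILED":
            red += end - start
    return red, green


if __name__ == "__main__":
    history = [(0, "SUCCEEDED"), (100, "FAILED"), (130, "SUCCEEDED"), (200, "SUCCEEDED")]
    red, green = red_green_time(history)
    print(f"red={red}s green={green}s")
```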
Triggered by each CW event
Recorded as CW metric, pipeline/stage/action as dimensions
Dashboard, built nightly via lambda that queries CW metrics
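Turning a pipeline event into a metric could look roughly like this. The `detail` keys follow the shape of CodePipeline state-change events; the metric and dimension names are assumptions, and the actual call that publishes the data to CloudWatch is left out so the sketch runs without AWS access.

```python
def metric_data_from_event(event):
    """Map a CodePipeline state-change event to CloudWatch-style metric data.

    Returns a list of metric datum dicts (empty for states that are not
    terminal successes or failures).
    """
    detail = event["detail"]
    state = detail["state"]
    if state not in ("SUCCEEDED", "FAILED"):
        return []
    metric_name = "SuccessCount" if state == "SUCCEEDED" else "FailureCount"
    # Dimension names are hypothetical; pipeline/stage/action come from the event.
    dimensions = [{"Name": "PipelineName", "Value": detail["pipeline"]}]
    for key, dimension_name in (("stage", "StageName"), ("action", "ActionName")):
        if key in detail:
            dimensions.append({"Name": dimension_name, "Value": detail[key]})
    return [{
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "Value": 1,
        "Unit": "Count",
    }]


if __name__ == "__main__":
    sample = {"detail": {"pipeline": "orders-service", "stage": "Production", "state": "FAILED"}}
    print(metric_data_from_event(sample))
```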
SAM
Defines both the function and the event rule
SAM
Runs nightly
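A SAM template that defines a function together with its nightly schedule could be shaped like the dict below. The resource name, handler, code location, runtime, and the exact cron expression are placeholders chosen for illustration; the real schedule is not given in these notes.

```python
import json


def nightly_function_template(handler, code_uri, schedule="cron(0 0 * * ? *)"):
    """Build a SAM template dict with one function and a Schedule event.

    The default schedule (midnight UTC daily) is an assumption.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": {
            "DashboardGenerator": {  # hypothetical logical ID
                "Type": "AWS::Serverless::Function",
                "Properties": {
                    "Handler": handler,
                    "Runtime": "python3.12",
                    "CodeUri": code_uri,
                    "Events": {
                        "Nightly": {
                            "Type": "Schedule",
                            "Properties": {"Schedule": schedule},
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(nightly_function_template("dashboard.handler", "./src"), indent=2))
```

Declaring the event under `Events` is what lets one SAM resource produce both the function and the event rule that invokes it.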
Continuous Delivery
### Everything in code
### Deployed via pipeline
### Triggered by commit
### Only manual step is between each env...approve/reject
Self Service
### ServiceCatalog to provide self service with governance
Self Service
### CloudFormation is king – easier to version and apply incremental changes
### CloudFormation service role – a role that is used only by Service Catalog/CloudFormation and has all the access…can’t be assumed by users
### ServiceCatalog to provide self service with governance
### Retain important resources – buckets, keys, databases
Monitor
### static analysis before deployment
### setup process as guardrails that assess and enforce policy compliance
### monitor pipeline health