Data Con LA 2020
Description
One of the challenges our team faces in delivering software to production is a lack of consistency and excessive manual labor.
Building a continuous integration and continuous delivery (CI/CD) pipeline helps us deliver software to the production environment with speed, safety, and reliability.
The concept of automation and tooling is not new, but finding the tools that best fit our needs is harder now that so many are available.
This talk looks at how we use AWS resources to support the software lifecycle by automating a series of steps.
Speakers
Babu Repaka, California State University, Office of the Chancellor, Cloud DataOps Admin Engineer
Maria Fung, California State University, Office of the Chancellor, Cloud DataOps Admin Engineer
CICD Pipeline and delivery of Apache Spark Applications on the cloud using AWS
1. CICD Pipeline and delivery of Apache Spark Applications on the cloud using AWS
Maria Fung
Data Warehouse Development Lead
California State University, Office of the Chancellor
mfung@calstate.edu
Babu Repaka
Business Intelligence Solution Architect
California State University, Office of the Chancellor
brepaka@calstate.edu
October 25, 2020
2. • The largest and most diverse system of 4-year higher education in the U.S.
• 23 campuses
• ~50K employees
• Nearly 500K enrolled students this year
3. CICD Overview
CSU Past technologies
CSU Present & Future Cloud with Agile methodology
- DevOps
- DataOps
- CICD
Lifecycle stages: Develop, Build, Package, Test, Deploy, Operate
5. Current CICD Process Overview
[Flow diagram]
Pipeline stages: Source, Test, Build, Deploy
Steps shown:
• Developer A checks in code to Git (CodeCommit)
• Create a pull request and resolve conflicts
• Notify peers (Developers B, C, D) to review the pull request
• Approve? Yes / No (No: review comments and changes)
• Code testing (pytest) and code coverage (Coverage.py) in CodeBuild
• Submit EMR steps to the Dev EMR cluster
• Test result? Passed / Failed
• Automatically merge the pull request
• Build and package dependencies from Git (CodeBuild, CodePipeline)
• Production deployment (CloudFormation)
Supporting services: S3 bucket, Lambda, CloudWatch Event Rule and Alarm, SNS Topic (email), EMR
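The "Submit EMR steps to Dev EMR cluster" box can be illustrated with a minimal Python sketch. It assumes the build job uses boto3 and that each Spark application is launched through spark-submit; the slides do not show the actual step definitions, so the step name, script location, and arguments below are hypothetical. The EMR client is passed in so the functions can be exercised without AWS access.

```python
from typing import Any, Dict, List


def build_spark_step(name: str, script_s3_uri: str, args: List[str]) -> Dict[str, Any]:
    """Return one EMR step definition that runs spark-submit on the cluster."""
    return {
        "Name": name,
        # Cancel the remaining steps if this one fails, so the failure is visible to the build.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            # command-runner.jar is the EMR helper that runs a command as a step.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri, *args],
        },
    }


def submit_steps(emr_client: Any, cluster_id: str, steps: List[Dict[str, Any]]) -> List[str]:
    """Submit steps to a running cluster and return the new step IDs."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]
```

In a real build, `emr_client` would be `boto3.client("emr")` and `cluster_id` the ID of the Dev EMR cluster; the returned step IDs can then be polled to decide whether the test stage passed.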
6. Security for the CI/CD Pipeline
Stages: Develop, Build, Package, Test, Deploy, Operate
Security (shown across all stages)
7. Security Checklist
• Multi-factor authentication (MFA)
• Role assignments and segregation of duties
• Test cases and acceptable outcomes
• Secured code repositories
• Secured build and package environment
• Sensitive information
• Weekly security assessment review using AWS Trusted Advisor, Amazon Inspector, and Amazon GuardDuty
8. AWS Trusted Advisor is being used to review the following checklists:
• Cost
• Fault tolerance
• Performance
• Security
• Management and governance
Any exceptions raised by the tool are investigated and addressed right away.
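A review like the one on this slide could also be scripted. The sketch below is an assumption, not the team's documented process: it uses the AWS Support API operations `describe_trusted_advisor_checks` and `describe_trusted_advisor_check_summaries`, which are only available on support plans that include API access. The function name `flagged_checks` is hypothetical, and the client is passed in so the logic can be tested without AWS access.

```python
from typing import Any, Dict, List


def flagged_checks(support: Any) -> List[Dict[str, str]]:
    """Return Trusted Advisor checks whose status is 'warning' or 'error'."""
    checks = support.describe_trusted_advisor_checks(language="en")["checks"]
    # Index check metadata by ID so each summary can be labeled with name and category.
    by_id = {check["id"]: check for check in checks}
    summaries = support.describe_trusted_advisor_check_summaries(
        checkIds=list(by_id)
    )["summaries"]
    return [
        {
            "category": by_id[summary["checkId"]]["category"],
            "name": by_id[summary["checkId"]]["name"],
            "status": summary["status"],
        }
        for summary in summaries
        if summary["status"] in ("warning", "error")
    ]
```

With boto3, the client would be `boto3.client("support", region_name="us-east-1")`; the returned list is what a weekly review would need to investigate.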
10. Development Pipeline
[Flow diagram]
Steps shown:
• Developer creates a pull request for changes from the developer’s branch to the dev branch
• Run unit testing (pytest) and code coverage
• Submit EMR steps on the dev EMR cluster
• Test result? Success / Failed
• Notify approvers to approve the pull request
• Pull request approved? Yes / No (No: developer reviews comments and commits changes)
• Automatically update the pull request comment and merge the developer’s changes to the dev branch
Services: CodeCommit, CodeBuild, Lambda, CloudWatch Event Rule, SNS Topic (email), S3, EMR
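The "update pull request comment and merge" step can be sketched as a small Python function of the kind a Lambda might call once the test result is known. This is an illustration under assumptions: the slides do not show the Lambda code, the function name and comment text are hypothetical, a fast-forward merge is only one of the merge strategies CodeCommit offers, and approval is assumed to have been checked before this function runs.

```python
from typing import Any


def report_and_merge(
    codecommit: Any,
    repository: str,
    pull_request_id: str,
    before_commit: str,
    after_commit: str,
    tests_passed: bool,
) -> bool:
    """Comment on the pull request with the test result; merge only on success."""
    status = "passed" if tests_passed else "failed"
    codecommit.post_comment_for_pull_request(
        pullRequestId=pull_request_id,
        repositoryName=repository,
        beforeCommitId=before_commit,
        afterCommitId=after_commit,
        content=f"Unit tests and EMR steps {status}.",
    )
    if not tests_passed:
        # Leave the pull request open so the developer can commit fixes.
        return False
    codecommit.merge_pull_request_by_fast_forward(
        pullRequestId=pull_request_id,
        repositoryName=repository,
        sourceCommitId=after_commit,
    )
    return True
```

Here `codecommit` would be `boto3.client("codecommit")`, and the commit IDs would come from the pull request event that triggered the build.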
13. Production Deployment
[Flow diagram]
Steps shown:
• DevOps admin submits a pull request to merge from the dev branch to the master branch
• Notify manager for approval
• Pull request approved? Yes / No (No: developer reviews comments and suggested changes)
• Update the pull request comment and merge to the master branch
• Build all dependencies and package all files to the production S3 bucket
• Deploy the CloudFormation stack to provision the production EMR cluster and run steps
Accounts: Dev account, Prod account
Cross-account access: IAM role permissions, cross-account role permissions, STS AssumeRole, KMS key
PROD stack: create change set, execute change set
Services: CodeCommit, CodeBuild, CodePipeline, Lambda, CloudWatch Event Rule, SNS Topic (email), S3, CloudFormation, EMR
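The cross-account part of this slide (STS AssumeRole, then create and execute a change set on the PROD stack) can be sketched in Python. This is a minimal illustration, not the pipeline's actual configuration: in the talk these actions are driven by CodePipeline, while the sketch calls the same APIs directly. The function names, the capability list, and the default change set type are assumptions.

```python
from typing import Any, Dict


def assume_role_credentials(sts: Any, role_arn: str, session_name: str) -> Dict[str, str]:
    """Assume the cross-account role and return keyword arguments for a boto3 client."""
    creds = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def deploy_change_set(
    cfn: Any,
    stack_name: str,
    change_set_name: str,
    template_url: str,
    change_set_type: str = "UPDATE",  # use "CREATE" when the stack does not exist yet
) -> None:
    """Create a change set, wait until it is ready, then execute it."""
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        TemplateURL=template_url,
        ChangeSetType=change_set_type,
        # Assumed capability; only needed if the template creates named IAM resources.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    cfn.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
```

With boto3, the Prod-account client would be built as `boto3.client("cloudformation", **assume_role_credentials(boto3.client("sts"), role_arn, "prod-deploy"))`, where `role_arn` is the cross-account role in the Prod account. Splitting creation from execution leaves room for a review of the change set before it is applied.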
14. Lessons Learned
Continuous learning and exploration
Challenges
• Lack of job orchestration in Amazon EMR
• AWS CodePipeline cross-account roles are not transparent in the AWS console
Next Steps
• Explore better job orchestration
• Analyze and plan the implementation of Amazon EKS
15. Thank you for watching
Maria Fung
mfung@calstate.edu
Babu Repaka
brepaka@calstate.edu
maria-fung-7931001b4
babu-repaka-3167534
Editor's Notes
The CSU awards nearly half of the state’s baccalaureate degrees.
1 in 10 employed graduates came from CSU… representing 1 in 20 college degree holders nationwide!
We produce in the neighborhood of 120,000 graduates per year, and our more than 3.4 million living alumni are employed in every field across the world.