How is the cloud changing data storage options for development teams at Thomson Reuters? Come hear how projects are changing the way they work with data in the cloud, and what role a centralized cloud team can play in helping your business get products to market more quickly without worrying about ending up on the front page of the news as the latest data breach. Any storage medium is up for discussion, but we’ll stick primarily to relational databases, Elasticsearch, NoSQL, and object storage. This will be useful both to teams that are just getting started in AWS and to teams that already have production workloads there. Although it assumes a basic knowledge of the relational database, Elasticsearch, and NoSQL options in AWS, you will still be able to get value if you haven’t used those technologies before.
1. Moving Quickly With Data Services In The Cloud
Matt Dimich
Cloud Architect
Thomson Reuters
@JobsWithUs
#WorkingAtTR
#HappyAtTR
2. What do you mean by Data Services?
Technologies
• Aurora MySQL
• Amazon S3
• Elasticsearch
• Aurora PostgreSQL
• DMS/SCT
• DynamoDB
• ElastiCache
• Big Data
Areas of Concern
• Automate Creation/Provisioning
• Automate “Schema” Changes
• Data Security – Network
• Data Security – Access
• Data Security – Encryption
• High Availability
• Cost Monitoring & Reporting
• Logging, Monitoring & Alerting
• Connection Management
• Backup & Restore
• Scale & Limits
• Retry Logic, Error Handling & Transactional Integrity
• Disaster Recovery
• Audit Infrastructure
• Internationalization
• Data Consistency
• Maintenance
• Support Team
3. Data Services Before Cloud…
• Provision huge, costly database servers
• Elasticsearch teams would provision monster nodes so they could scale
• Lots of human hands would touch each change to the database
• Leads to high cost for experiments if you need a database
• Large effort to size storage and compute appropriately
• (and a big time penalty if you get it wrong)
• Focused on High Availability
4. What’s changing…
• Provisioning takes minutes
• Scaling is easier than ever
• Startups suddenly have the power to get an app out at relatively low cost and high scale
• Focus has shifted from high availability to time to market.
6. Automated Provisioning
• How does it work?
• AWS CloudFormation creates our database clusters for us.
• Consistent
• Repeatable
• Embedded Standards (e.g. default to encryption at rest)
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-howdoesitwork.html
AWS CloudFormation can update our database clusters as well
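To make the "embedded standards" idea concrete, here is a minimal sketch of generating a CloudFormation template for an Aurora cluster from Python. The function name, the cluster identifier, and the 7-day backup retention are illustrative assumptions, not the templates used at Thomson Reuters. The point is only that encryption at rest is baked into the template rather than left to each team.

```python
import json


def aurora_cluster_template(cluster_id: str, engine: str = "aurora-mysql") -> dict:
    """Build a minimal CloudFormation template for an Aurora cluster.

    Hypothetical helper: the standard (encryption at rest) is hard-coded so
    that every cluster created from this template gets it by default.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Credentials are passed in at deploy time, never stored in the template.
            "MasterUsername": {"Type": "String"},
            "MasterUserPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "DatabaseCluster": {
                "Type": "AWS::RDS::DBCluster",
                "Properties": {
                    "DBClusterIdentifier": cluster_id,
                    "Engine": engine,
                    "MasterUsername": {"Ref": "MasterUsername"},
                    "MasterUserPassword": {"Ref": "MasterUserPassword"},
                    "StorageEncrypted": True,     # embedded standard
                    "BackupRetentionPeriod": 7,   # assumed value, in days
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(aurora_cluster_template("orders-db"), indent=2))
```

Because the template is generated the same way every time, creating or updating a stack from it is consistent and repeatable.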
11. Automated Schema Change Pipeline
• Reduce human error
• Repeatable
• Consistent
• Same tooling as App Deployment
• Lower time to deployment
• Build in safeguards
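One possible safeguard, sketched here as an assumption rather than a description of the actual pipeline, is a check that flags destructive statements in a migration script before it is deployed. The function name and the rule set are hypothetical, and the statement splitting is deliberately naive (it ignores semicolons inside string literals and comments).

```python
import re


def find_risky_statements(sql: str) -> list[tuple[str, str]]:
    """Return (reason, statement) pairs for statements that deserve a human look."""
    findings = []
    for raw in sql.split(";"):  # naive split; fine for a sketch
        statement = raw.strip()
        if not statement:
            continue
        if re.search(r"\bDROP\s+TABLE\b", statement, re.IGNORECASE):
            findings.append(("DROP TABLE", statement))
        if re.search(r"\bTRUNCATE\b", statement, re.IGNORECASE):
            findings.append(("TRUNCATE", statement))
        is_delete = re.match(r"DELETE\s+FROM\b", statement, re.IGNORECASE)
        has_where = re.search(r"\bWHERE\b", statement, re.IGNORECASE)
        if is_delete and not has_where:
            findings.append(("DELETE without WHERE", statement))
    return findings


if __name__ == "__main__":
    script = "CREATE TABLE t (id INT); DELETE FROM t; drop table old_t;"
    for reason, statement in find_risky_statements(script):
        print(f"{reason}: {statement}")
```

A pipeline stage could fail the build, or require a manual approval, whenever this list is non-empty.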
12. Multiple Iterations
1. Flyway with Jenkins on-prem
2. Flyway with Spinnaker and Jenkins in AWS
3. Flyway with AWS CodePipeline in AWS
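All three iterations rely on Flyway's versioned migrations, which are files named like `V1__init.sql` and applied in version order. The sketch below illustrates that ordering idea in plain Python; it is not Flyway's implementation, and the helper names and example filenames are hypothetical. It also simplifies version handling (for example, it treats `1` and `1.0` as different versions).

```python
import re
from typing import NamedTuple, Optional

# Versioned migration naming convention: V<version>__<description>.sql
MIGRATION_NAME = re.compile(
    r"^V(?P<version>\d+(?:[._]\d+)*)__(?P<description>.+)\.sql$"
)


class Migration(NamedTuple):
    version: tuple      # e.g. (1, 1) for "V1.1__..."
    description: str
    filename: str


def parse_migration(filename: str) -> Optional[Migration]:
    """Parse a versioned migration filename, or return None if it is not one."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in re.split(r"[._]", match.group("version")))
    description = match.group("description").replace("_", " ")
    return Migration(version, description, filename)


def pending_migrations(filenames, applied_versions):
    """Return not-yet-applied migrations, sorted by numeric version."""
    parsed = (parse_migration(name) for name in filenames)
    return sorted(m for m in parsed if m is not None and m.version not in applied_versions)


if __name__ == "__main__":
    files = ["V2__add_orders.sql", "V1__init.sql", "V10__cleanup.sql", "V1.1__add_index.sql"]
    for migration in pending_migrations(files, applied_versions={(1,)}):
        print(migration.filename)
```

Comparing versions as integer tuples rather than strings is what keeps `V10` after `V2`.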
15. S3
• Simple Storage Service
• Object Storage
• Allows a hierarchical structure
• Has powerful lifecycle rules to expire items or move items to cheaper storage classes, etc.
HTTP
AWS Command Line
AWS Web Console
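As an illustration of a lifecycle rule, the sketch below builds a configuration that moves objects under a prefix to a cheaper storage class and later expires them. The function name, prefix, and day counts are assumptions chosen for the example. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the call itself is left as a comment so the sketch runs without AWS credentials.

```python
import json


def lifecycle_configuration(prefix: str, transition_days: int, expire_days: int) -> dict:
    """Build a rule: move to STANDARD_IA after transition_days, delete after expire_days."""
    if expire_days <= transition_days:
        raise ValueError("objects must expire after they transition")
    rule_name = prefix.strip("/") or "all"
    return {
        "Rules": [
            {
                "ID": f"tier-then-expire-{rule_name}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": transition_days, "StorageClass": "STANDARD_IA"}
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


if __name__ == "__main__":
    config = lifecycle_configuration("logs/", transition_days=30, expire_days=365)
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, this could be applied as:
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-bucket", LifecycleConfiguration=config)
```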
16. Simple Right?
It turns out this simple storage service is actually quite complex to set up…correctly.
29. Why is this so hard?
• How do you lock down a bucket?
• IAM Policy
• AWS-managed policy attached to a user, group or role
• Self-managed policy attached to a user, group or role
• Inline policy for a user, group, or role
• Bucket Policy
• Bucket ACL
• Object ACL
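To show what one of these layers looks like, here is a sketch of a bucket policy that rejects any request not made over TLS. The function and bucket names are hypothetical. An explicit `Deny` in a bucket policy takes precedence over `Allow` statements in IAM policies, which is part of why the layers are hard to reason about together.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies all S3 actions over plain HTTP."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2))
```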
31. How to avoid the front page of the news
• Spend time creating a secure CloudFormation template, then use it everywhere!
• At TR we often use predefined bucket types with built-in standards/safeguards
• Resources – For your microservice and only your microservice
• Infrastructure – What it sounds like: stuff outside the app.
• Website – For public-facing static content
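A predefined bucket type could be expressed as a function that returns a CloudFormation `AWS::S3::Bucket` resource with the safeguards already set. The slides do not say which standards each TR bucket type carries, so the choices below (default encryption, versioning, public access blocked except for website buckets, a `bucket-type` tag) are assumptions for illustration only.

```python
BUCKET_TYPES = {"resources", "infrastructure", "website"}


def bucket_resource(bucket_type: str) -> dict:
    """Return a CloudFormation S3 bucket resource for a predefined bucket type."""
    if bucket_type not in BUCKET_TYPES:
        raise ValueError(f"unknown bucket type: {bucket_type!r}")
    # Assumption: only website buckets may ever be exposed through a bucket policy.
    allow_public_policy = bucket_type == "website"
    return {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "BucketEncryption": {
                "ServerSideEncryptionConfiguration": [
                    {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            },
            "VersioningConfiguration": {"Status": "Enabled"},
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": not allow_public_policy,
                "RestrictPublicBuckets": not allow_public_policy,
            },
            "Tags": [{"Key": "bucket-type", "Value": bucket_type}],
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(bucket_resource("resources"), indent=2))
```

Teams then pick a type instead of hand-writing bucket settings, so a misconfigured public bucket requires deliberately stepping outside the template.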
35. How do I restore to a point-in-time with S3?
• S3 has versioning for each object
• Everything is available on the API
• Enter the PIT restore tool
• Dry run available
• Estimate size
• Loops through a bucket, path or just one file and reverts it to the version that was current at the entered point in time
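The core of such a tool is choosing, for each object, the version that was current at the requested time. The sketch below shows that selection logic as a dry run over in-memory data; it is not the TR tool, and the function names are hypothetical. It assumes each version is a dict with `VersionId` and `LastModified`, plus an `IsDeleteMarker` flag that the caller would set when merging the separate version and delete-marker lists returned by the S3 API.

```python
from datetime import datetime, timezone


def version_at(versions, point_in_time):
    """Return the version that was current at point_in_time.

    Returns None if the object did not exist yet, or was deleted, at that time.
    """
    candidates = [v for v in versions if v["LastModified"] <= point_in_time]
    if not candidates:
        return None
    latest = max(candidates, key=lambda v: v["LastModified"])
    return None if latest.get("IsDeleteMarker") else latest


def plan_restore(versions_by_key, point_in_time):
    """Dry run: map each key to the VersionId to restore (None means 'absent')."""
    plan = {}
    for key, versions in versions_by_key.items():
        target = version_at(versions, point_in_time)
        plan[key] = target["VersionId"] if target else None
    return plan


if __name__ == "__main__":
    def day(d):
        return datetime(2018, 1, d, tzinfo=timezone.utc)

    history = {
        "a.txt": [
            {"VersionId": "v1", "LastModified": day(1)},
            {"VersionId": "v2", "LastModified": day(5)},
        ],
    }
    print(plan_restore(history, day(4)))
```

Keeping the planning step separate from any write makes the dry run cheap: the plan can be reviewed, and its size estimated, before anything in the bucket changes.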
37. Elasticsearch
• It’s not as easy as saying AWS every time
• Platform Elasticsearch has automated their Elasticsearch distribution on AWS and Azure.
• Gives us custom plugins and encryption at rest
• A step toward a managed service
38. But I Have Experience Running This
• Same ES distribution as our Data Centers
• Different hardware profile
• The team started smaller
• But not small enough
• Force our applications to “tip over” the resources before upgrading