Gustavo Veloso, Solutions Architect
16-07-19
AWS databases and analytics
Broad and deep portfolio, purpose-built for builders
Data Lake
S3/Glacier
Glue
(ETL & Data Catalog)
Data Movement
Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams
Non-Relational Databases
DynamoDB
ElastiCache
(Redis, Memcached)
Neptune
(Graph)
Analytics
DW | Big Data Processing | Interactive
Redshift EMR Athena
Kinesis
Analytics
Elasticsearch
Service
Real-time
Relational Databases
RDS
Aurora
Business Intelligence & Machine Learning
QuickSight SageMaker Comprehend
What is your database
Strategy?
Two fundamental areas of focus
“Lift and shift” existing
apps to the cloud
Quickly build new
apps in the cloud
Two fundamental areas of focus
“Lift and shift” existing
apps to the cloud
Quickly build new
apps in the cloud
Old-guard commercial databases
Very
expensive
Proprietary Lock-in Punitive
licensing
You’ve
got mail
Relational data
• Divide data among tables
• Highly structured
• Relationships established via
keys enforced by the system
• Data accuracy and consistency
Patient
* Patient ID
First Name
Last Name
Gender
DOB
* Doctor ID
Visit
* Visit ID
* Patient ID
* Hospital ID
Date
* Treatment ID
Medical Treatment
* Treatment ID
Procedure
How Performed
Adverse Outcome
Contraindication
Doctor
* Doctor ID
First Name
Last Name
Medical Specialty
* Hospital Affiliation
Hospital
* Hospital ID
Name
Address
Rating
Relational data
// Doctors affiliated with Mercy hospital
Patient
* Patient ID
First Name
Last Name
Gender
DOB
* Doctor ID
Visit
* Visit ID
* Patient ID
* Hospital ID
Date
* Treatment ID
Medical Treatment
* Treatment ID
Procedure
How Performed
Adverse Outcome
Contraindication
Doctor
* Doctor ID
First Name
Last Name
Medical Specialty
* Hospital Affiliation
Hospital
* Hospital ID
Name
Address
Rating
SELECT
d.first_name, d.last_name
FROM
doctor as d,
hospital as h
WHERE
d.hospital = h.hospital_id
AND h.name = ‘Mercy';
// Number of patient visits each doctor
completed last week
SELECT
d.first_name, d.last_name, count(*)
FROM
visit as v,
hospital as h,
doctor as d
WHERE
v.hospital_id = h.hospital_id
AND h.hospital_id = d.hospital
AND v.t_date > date_trunc('week’,
CURRENT_TIMESTAMP - interval '1 week')
GROUP BY
d.first_name, d.last_name;
Managed services transform operations
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB software patches
Database backups
High Availability
DB software installs
OS installation
Scaling
Operating
Databases
in AWS
App optimization
you
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB software patches
Database backups
Scaling
High Availability
DB software installs
OS installation
you
App optimization
Operating
Databases
in the Old World
Moving to open source
database engines
+
Commercial-grade performance and reliability?
Scale compute
and storage with a
few clicks; minimal
downtime for your
application
Automatic Multi-AZ
data replication;
automated backup,
snapshots, and
failover
Data encryption at
rest and in transit;
industry compliance
and assurance
programs
Running many databases with Amazon RDS
Managed Relational Database Service with choice
Managed & Automated
Deploy and maintain
hardware, OS, and DB
software; built-in
monitoring
Performant & scalable Available & durable Secure & compliant
Key Amazon RDS Features
Managed Relational Database Service with choice
Amazon RDS
Configuration
Improve
Availability
Increase
Throughput
Reduce
Latency
Push-Button Scaling
Multi AZ
Read Replicas
Provisioned IOPS
Read ReplicasPush-Button Scaling
Provisioned IOPS
Region
Multi-AZ
availability
zone
availability
zone
Fully compatible with PostgreSQL
and MySQL,
with 3x – 5x the throughput
Storage volume striped across
hundreds of storage nodes
distributed over 3 different
availability zones
Six copies of data on SSD, two
copies in each availability zone, to
protect against AZ+1 failures
Continuous backup to Amazon S3
(built for 99.999999999%
durability)
Master Replica Replica Replica
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Large relational databases with Amazon Aurora
Scale-out, distributed, multi-tenant architecture
Large relational databases with Amazon Aurora
Scale-out, distributed, multi-tenant architecture
• Your data is replicated 6 ways
across 3 AZs
• Storage grows up to
64 TB* seamlessly
• Up to 15 Aurora Replicas with
instant crash recovery
AZ 1 AZ 2 AZ 3
Virtualized, cross-AZ storage layer
Aurora
Aurora Aurora Aurora Aurora
Size for the peak load
-or-
Continuously monitor and
manually scale up/down
Aurora Serverless . . .
Responds to your application load
automatically
• Scale capacity with no downtime
• Multi-tenant proxy is highly
available
• Scale target has warm buffer
pool
• Shuts down when not in use
Aurora is used by ¾ of the top 100 AWS customers
Amazon Aurora customer adoption
Fastest growing service in AWS history
AWS Database Migration Service
Migrating
Databases to AWS
150,000+
Databases migrated
Migrate between on-premises and AWS
Migrate between databases
Data replication for zero-downtime migration
Automated schema conversion
Two fundamental areas of focus
“Lift and shift” existing
apps to the cloud
Quickly build new
apps in the cloud
A one size fits all database doesn’t fit anyone
Modern Applications Need Purpose-Built Databases
Users: 1M+
Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request Rate: Millions
Access: Mobile, IoT, devices
Scale: Up-out-in
Economics: Pay as you go
Developer Access: Instant API access
Relational Key-value Document
In-memory Graph Search
AWS purpose-built strategy
The right tool for the right job
Relational
Non-Relational
Aurora RDS
ElastiCacheDynamoDB
Key-value Document
Neptune
Graph
Data models and common use cases
Relational Key-value Document In-memory Graph Search
Referential
integrity, ACID
transactions,
schema-on-write
Low-latency,
key look-ups with
high throughput
and fast ingestion
of data
Indexing and
storing
documents with
support
for query on
any attribute
Microseconds
latency,
key-based
queries, and
specialized
data structures
Creating and
navigating
relations
between data
easily
and quickly
Indexing and
searching
semistructured
logs and data
ERP, medical records,
CRM, finance
Real-time bidding,
shopping cart, IoT device
tracking
Content management,
personalization, mobile
Leaderboards, real-time
analytics, caching
Fraud detection, social
networking,
recommendation engine
Product catalog,
help/FAQs, full-text
Amazon Aurora
Amazon RDS
Amazon Redshift
Amazon
DynamoDB
Amazon
DynamoDB
Amazon
DocumentDB
Amazon
ElastiCache for
Redis &
Memcached
Amazon Neptune Amazon
Elasticsearch
Service
NoSQL vs. SQL for a new app: how to choose?
• Want simplest possible DB
management?
• Want app to manage DB integrity?
• Need joins, transactions, frequent
table scans?
• Want DB engine to manage DB
integrity?
• Team has SQL skills?
Amazon DynamoDB Amazon RDS
Let’s take a closer look at…
Key-value Graph
Key-value data
• Simple key value
pairs
• Partitioned by
keys
• Resilient to failure
• High throughput,
low-latency reads
and writes
• Consistent
performance at
scale
Gamers
Primary Key Attributes
GamerTag Level Points High Score Plays
Hammer57 21 4050 483610 1722
FluffyDuffy 5 1123 10863 43
Lol777313 14 3075 380500 1307
Jam22Jam 20 3986 478658 1694
ButterZZ_55 7 1530 12547 66
… … … … …
PUT {
TableName:"Gamers",
Item: {
"GamerTag":"Hammer57",
"Level":21,
"Points":4050,
"Score":483610,
"Plays":1722
} }
GET {
TableName:"Gamers",
Key: {
"GamerTag":"Hammer57“,
“ProjectionExpression“:”Points”
} }
Gamers
Primary Key
Attributes
Gamer Tag Type
Hammer57
Rank
Level Points Tier
87 4050 Elite
Status
Health Progress
90 30
Weapon
Class Damage Range
Taser 87% 50
FluffyDuffy
Rank
Level Points Tier
5 1072 Trainee
Status
Health Progress
37 8
Key-value data
// Status of Hammer57
GET {
TableName:"Gamers",
Key: {
"GamerTag":"Hammer57",
"Type":"Status” } }
// Return all Hammer57
QUERY {
TableName:“Gamers”,
KeyConditionExpression:"GamerTag = :a",
ExpressionAttributeValues: {
":a”:”Hammer57” } }
Amazon DynamoDB
Fully-managed nonrelational database for any scale
Secure
Encryption at rest and transit
Fine-grained access control
PCI, HIPAA, FIPS140-2 eligible
High performance
Fast, consistent performance
Virtually unlimited throughput
Virtually unlimited storage
Fully managed
Maintenance-free
Serverless
Auto scaling
Backup and restore
Global tables Global Tables
High-performance, globally
distributed applications
Multi-region redundancy
and resiliency
Easy to set up and no application
rewrites required
Amazon DocumentDB
Fast, scalable, highly available, fully managed MongoDB-compatible database service
Secure and compliant
Simple
and fully managed
Same code, drivers, and tools
you use with MongoDB
Millions of requests per
second, millisecond latency
2x throughput of
managed MongoDB services
Deeply integrated with
AWS services
Managed services for open source software
Redis, Memcached, Elasticsearch, Apache Hadoop, etc.
Fully managed
AWS manages all hardware
and software setup,
configuration, monitoring
Extreme performance
In-memory data store and cache
for sub-millisecond response times
Easily scalable
Non-disruptive scaling
up and down to
meet changing
demands
Amazon ElastiCache
Open and Secure
Direct access to open-source APIs
Secure access with VPC
Amazon Elasticsearch Service
Apache Hadoop Ecosystem
19 open-source frameworks
Low costs with S3 storage and Spot
Amazon EMR
Graph data
• Relationships are first-class objects
• Vertices connected by Edges
Vertex
PURCHASED PURCHASED
FOLLOWS
PURCHASED
KNOWS
PRODUCT
SPORT
FOLLOWS
Edge
Amit
Graph use case
Do you know…
Customers who also follow
sports purchased…
gremlin> g.V().has(‘name’,’sara’).as(‘customer’).out(‘follows’).in(‘follows’).out(‘purchased’)
where(neq(‘customer’)).dedup().by(‘name’).properties('name')
KNOW
S
PURCHASED PURCHASED
FOLLOWS
PURCHASED
KNOWS
PRODUCT
SPORT
FOLLOWS
Bill
FOLLOWS
Sara
// Identify a friend in common and
make a recommendation
gremlin> g.V().has('name','mary').as(‘start’).
both('knows').both('knows’).
where(neq(‘start’)).
dedup().by('name').properties('name')
Kevin
Mary
Highly connected data best represented in
a graph
Relational model
Foreign keys used to represent relationships
Queries can involve nesting & complex joins
Performance can degrade as datasets grow
Graph model
Relationships are first-order citizens
Write queries that navigate the graph
Results returned quickly, even on large datasets
Amazon Neptune
Fully managed graph database
Fast & Scalable ReliableFlexible
Store billions of relationships;
query with millisecond
latency
Six replicas of your
data across three AZs
with full backup and
restore
Build powerful
queries with
Gremlin and SPARQL
Supports Apache
TinkerPop & W3C
RDF graph models
Gremlin
SPARQL
Open Standards
Builders Day' - Databases on AWS: The Right Tool for The Right Job

Builders Day' - Databases on AWS: The Right Tool for The Right Job

  • 1.
    Gustavo Veloso, SolutionsArchitect 16-07-19
  • 2.
    AWS databases andanalytics Broad and deep portfolio, purpose-built for builders Data Lake S3/Glacier Glue (ETL & Data Catalog) Data Movement Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams Non-Relational Databases DynamoDB ElastiCache (Redis, Memcached) Neptune (Graph) Analytics DW | Big Data Processing | Interactive Redshift EMR Athena Kinesis Analytics Elasticsearch Service Real-time Relational Databases RDS Aurora Business Intelligence & Machine Learning QuickSight SageMaker Comprehend
  • 3.
    What is yourdatabase Strategy?
  • 4.
    Two fundamental areasof focus “Lift and shift” existing apps to the cloud Quickly build new apps in the cloud
  • 5.
    Two fundamental areasof focus “Lift and shift” existing apps to the cloud Quickly build new apps in the cloud
  • 6.
    Old-guard commercial databases Very expensive ProprietaryLock-in Punitive licensing You’ve got mail
  • 7.
    Relational data • Dividedata among tables • Highly structured • Relationships established via keys enforced by the system • Data accuracy and consistency Patient * Patient ID First Name Last Name Gender DOB * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating
  • 8.
    Relational data // Doctorsaffiliated with Mercy hospital Patient * Patient ID First Name Last Name Gender DOB * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating SELECT d.first_name, d.last_name FROM doctor as d, hospital as h WHERE d.hospital = h.hospital_id AND h.name = ‘Mercy'; // Number of patient visits each doctor completed last week SELECT d.first_name, d.last_name, count(*) FROM visit as v, hospital as h, doctor as d WHERE v.hospital_id = h.hospital_id AND h.hospital_id = d.hospital AND v.t_date > date_trunc('week’, CURRENT_TIMESTAMP - interval '1 week') GROUP BY d.first_name, d.last_name;
  • 9.
    Managed services transformoperations Power, HVAC, net Rack & stack Server maintenance OS patches DB software patches Database backups High Availability DB software installs OS installation Scaling Operating Databases in AWS App optimization you Power, HVAC, net Rack & stack Server maintenance OS patches DB software patches Database backups Scaling High Availability DB software installs OS installation you App optimization Operating Databases in the Old World
  • 10.
    Moving to opensource database engines + Commercial-grade performance and reliability?
  • 11.
    Scale compute and storagewith a few clicks; minimal downtime for your application Automatic Multi-AZ data replication; automated backup, snapshots, and failover Data encryption at rest and in transit; industry compliance and assurance programs Running many databases with Amazon RDS Managed Relational Database Service with choice Managed & Automated Deploy and maintain hardware, OS, and DB software; built-in monitoring Performant & scalable Available & durable Secure & compliant
  • 12.
    Key Amazon RDSFeatures Managed Relational Database Service with choice Amazon RDS Configuration Improve Availability Increase Throughput Reduce Latency Push-Button Scaling Multi AZ Read Replicas Provisioned IOPS Read ReplicasPush-Button Scaling Provisioned IOPS Region Multi-AZ availability zone availability zone
  • 13.
    Fully compatible withPostgreSQL and MySQL, with 3x – 5x the throughput Storage volume striped across hundreds of storage nodes distributed over 3 different availability zones Six copies of data on SSD, two copies in each availability zone, to protect against AZ+1 failures Continuous backup to Amazon S3 (built for 99.999999999% durability) Master Replica Replica Replica Availability Zone 1 Availability Zone 2 Availability Zone 3 Large relational databases with Amazon Aurora Scale-out, distributed, multi-tenant architecture
  • 14.
    Large relational databaseswith Amazon Aurora Scale-out, distributed, multi-tenant architecture • Your data is replicated 6 ways across 3 AZs • Storage grows up to 64 TB* seamlessly • Up to 15 Aurora Replicas with instant crash recovery AZ 1 AZ 2 AZ 3 Virtualized, cross-AZ storage layer Aurora Aurora Aurora Aurora Aurora Size for the peak load -or- Continuously monitor and manually scale up/down
  • 15.
    Aurora Serverless .. . Responds to your application load automatically • Scale capacity with no downtime • Multi-tenant proxy is highly available • Scale target has warm buffer pool • Shuts down when not in use
  • 16.
    Aurora is usedby ¾ of the top 100 AWS customers Amazon Aurora customer adoption Fastest growing service in AWS history
  • 17.
    AWS Database MigrationService Migrating Databases to AWS 150,000+ Databases migrated Migrate between on-premises and AWS Migrate between databases Data replication for zero-downtime migration Automated schema conversion
  • 18.
    Two fundamental areasof focus “Lift and shift” existing apps to the cloud Quickly build new apps in the cloud
  • 19.
    A one sizefits all database doesn’t fit anyone Modern Applications Need Purpose-Built Databases Users: 1M+ Data volume: TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request Rate: Millions Access: Mobile, IoT, devices Scale: Up-out-in Economics: Pay as you go Developer Access: Instant API access Relational Key-value Document In-memory Graph Search
  • 20.
    AWS purpose-built strategy Theright tool for the right job Relational Non-Relational Aurora RDS ElastiCacheDynamoDB Key-value Document Neptune Graph
  • 21.
    Data models andcommon use cases Relational Key-value Document In-memory Graph Search Referential integrity, ACID transactions, schema-on-write Low-latency, key look-ups with high throughput and fast ingestion of data Indexing and storing documents with support for query on any attribute Microseconds latency, key-based queries, and specialized data structures Creating and navigating relations between data easily and quickly Indexing and searching semistructured logs and data ERP, medical records, CRM, finance Real-time bidding, shopping cart, IoT device tracking Content management, personalization, mobile Leaderboards, real-time analytics, caching Fraud detection, social networking, recommendation engine Product catalog, help/FAQs, full-text Amazon Aurora Amazon RDS Amazon Redshift Amazon DynamoDB Amazon DynamoDB Amazon DocumentDB Amazon ElastiCache for Redis & Memcached Amazon Neptune Amazon Elasticsearch Service
  • 22.
    NoSQL vs. SQLfor a new app: how to choose? • Want simplest possible DB management? • Want app to manage DB integrity? • Need joins, transactions, frequent table scans? • Want DB engine to manage DB integrity? • Team has SQL skills? Amazon DynamoDB Amazon RDS
  • 23.
    Let’s take acloser look at… Key-value Graph
  • 24.
    Key-value data • Simplekey value pairs • Partitioned by keys • Resilient to failure • High throughput, low-latency reads and writes • Consistent performance at scale Gamers Primary Key Attributes GamerTag Level Points High Score Plays Hammer57 21 4050 483610 1722 FluffyDuffy 5 1123 10863 43 Lol777313 14 3075 380500 1307 Jam22Jam 20 3986 478658 1694 ButterZZ_55 7 1530 12547 66 … … … … … PUT { TableName:"Gamers", Item: { "GamerTag":"Hammer57", "Level":21, "Points":4050, "Score":483610, "Plays":1722 } } GET { TableName:"Gamers", Key: { "GamerTag":"Hammer57“, “ProjectionExpression“:”Points” } }
  • 25.
    Gamers Primary Key Attributes Gamer TagType Hammer57 Rank Level Points Tier 87 4050 Elite Status Health Progress 90 30 Weapon Class Damage Range Taser 87% 50 FluffyDuffy Rank Level Points Tier 5 1072 Trainee Status Health Progress 37 8 Key-value data // Status of Hammer57 GET { TableName:"Gamers", Key: { "GamerTag":"Hammer57", "Type":"Status” } } // Return all Hammer57 QUERY { TableName:“Gamers”, KeyConditionExpression:"GamerTag = :a", ExpressionAttributeValues: { ":a”:”Hammer57” } }
  • 26.
    Amazon DynamoDB Fully-managed nonrelationaldatabase for any scale Secure Encryption at rest and transit Fine-grained access control PCI, HIPAA, FIPS140-2 eligible High performance Fast, consistent performance Virtually unlimited throughput Virtually unlimited storage Fully managed Maintenance-free Serverless Auto scaling Backup and restore Global tables Global Tables High-performance, globally distributed applications Multi-region redundancy and resiliency Easy to set up and no application rewrites required
  • 27.
    Amazon DocumentDB Fast, scalable,highly available, fully managed MongoDB-compatible database service Secure and compliant Simple and fully managed Same code, drivers, and tools you use with MongoDB Millions of requests per second, millisecond latency 2x throughput of managed MongoDB services Deeply integrated with AWS services
  • 28.
    Managed services foropen source software Redis, Memcached, Elasticsearch, Apache Hadoop, etc. Fully managed AWS manages all hardware and software setup, configuration, monitoring Extreme performance In-memory data store and cache for sub-millisecond response times Easily scalable Non-disruptive scaling up and down to meet changing demands Amazon ElastiCache Open and Secure Direct access to open-source APIs Secure access with VPC Amazon Elasticsearch Service Apache Hadoop Ecosystem 19 open-source frameworks Low costs with S3 storage and Spot Amazon EMR
  • 29.
    Graph data • Relationshipsare first-class objects • Vertices connected by Edges Vertex PURCHASED PURCHASED FOLLOWS PURCHASED KNOWS PRODUCT SPORT FOLLOWS Edge
  • 30.
    Amit Graph use case Doyou know… Customers who also follow sports purchased… gremlin> g.V().has(‘name’,’sara’).as(‘customer’).out(‘follows’).in(‘follows’).out(‘purchased’) where(neq(‘customer’)).dedup().by(‘name’).properties('name') KNOW S PURCHASED PURCHASED FOLLOWS PURCHASED KNOWS PRODUCT SPORT FOLLOWS Bill FOLLOWS Sara // Identify a friend in common and make a recommendation gremlin> g.V().has('name','mary').as(‘start’). both('knows').both('knows’). where(neq(‘start’)). dedup().by('name').properties('name') Kevin Mary
  • 31.
    Highly connected databest represented in a graph Relational model Foreign keys used to represent relationships Queries can involve nesting & complex joins Performance can degrade as datasets grow Graph model Relationships are first-order citizens Write queries that navigate the graph Results returned quickly, even on large datasets
  • 32.
    Amazon Neptune Fully managedgraph database Fast & Scalable ReliableFlexible Store billions of relationships; query with millisecond latency Six replicas of your data across three AZs with full backup and restore Build powerful queries with Gremlin and SPARQL Supports Apache TinkerPop & W3C RDF graph models Gremlin SPARQL Open Standards