© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
DAT311: Technology Trends in Data Processing
Anurag Gupta, Vice President, Amazon Web Services
awgupta@amazon.com
Agenda
> Managing explosion of data
> Serverless, API-centric computing
> Global users, local access experience
Managing Data Explosion with Data Lakes
Traditionally, analytics used to look like this
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
Relational data
TBs-PBs scale
Schema defined prior to data load
Operational reporting and ad hoc
Large initial capex + $10k-$50k / TB
Explosion of machine-generated data
Transition from IT to DevOps increases rate of change
Network connected smart devices drive variety and volume of data
Micro-services architecture increases need for real-time monitoring and analytics
Machine-generated data is growing 10x faster than business data
Source: insideBigData - The Exponential Growth of Data, February 16, 2017
Data lakes extend the traditional approach
Relational and non-relational data
TBs-EBs scale
Schema defined during analysis
Diverse analytical engines to gain insights
Designed for low-cost storage and analytics
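"Schema defined during analysis" (schema-on-read) can be sketched in a few lines of Python. This is only an illustration: the records, field names, and the `read_with_schema` helper are hypothetical and not part of any AWS service.

```python
import json

# Hypothetical raw events as they might land in a data lake. No schema is
# enforced at load time, so records can differ in shape and typing.
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_c": "19.0", "battery": 0.8}',
    '{"device": "sensor-3"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the requested fields,
    cast each value, and fill missing fields with None."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows

# The analysis decides which fields and types it needs, not the loader.
temperature_view = read_with_schema(RAW_LINES, {"device": str, "temp_c": float})
```

A different analysis could read the same raw lines with a schema that includes `battery`, without reloading or rewriting the data.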
[Diagram: OLTP, ERP, CRM, and LOB systems feed the data warehouse and business intelligence. Devices, web, sensors, and social sources feed the data lake, which has a catalog and serves machine learning, DW queries, big data processing, interactive, and real-time analytics.]
Data lakes on AWS
Ingest: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams
Storage: S3
Analytics: Redshift, EMR, Athena, Kinesis, Elasticsearch Service
Most ways to bring data in
Unmatched durability and availability at EB scale
Best security, compliance, and audit capabilities
Run any analytics on the same data without movement
Scale storage and compute independently
Store at $0.027 / GB-month; Query for $0.05/GB scanned
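The two rates quoted on this slide can be combined into a rough monthly estimate. The sketch below uses only those two figures; the data volumes are made-up inputs, and real bills include other items (requests, transfer) that are ignored here.

```python
STORAGE_USD_PER_GB_MONTH = 0.027  # storage rate quoted on the slide
QUERY_USD_PER_GB_SCANNED = 0.05   # query rate quoted on the slide

def monthly_cost(stored_gb, scanned_gb):
    """Return (storage, query, total) in USD for one month."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    query = scanned_gb * QUERY_USD_PER_GB_SCANNED
    return storage, query, storage + query

# Hypothetical workload: 10,000 GB stored, 2,000 GB scanned by queries.
storage, query, total = monthly_cost(stored_gb=10_000, scanned_gb=2_000)
print(f"storage ${storage:.2f} + query ${query:.2f} = ${total:.2f}")
```

Because storage and query are billed separately, halving the data scanned (for example through compression or partitioning) lowers the query term without touching the storage term.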
Layers of a data lake
INGEST
DISCOVER
ANALYZE
INFER
CRAWL, CATALOG, INDEX, SECURE
AWS Glue – Serverless data catalog & ETL service
Data Catalog
Discover data and extract schema
ETL Job authoring
Auto-generates customizable ETL code in Python and Spark
Automatically discovers data and stores schema
Data searchable, and available for ETL
Generates customizable code
Schedules and runs your ETL jobs
Serverless
Crawlers: Automatic schema inference
enumerate S3 objects (file 1, file 2, … file N)
identify file type and parse files
custom classifiers: Apache log parser
built-in classifiers: JSON parser, CSV parser, Parquet parser, …
semi-structured per-file schema
semi-structured unified schema
[Diagram: each file's schema is drawn as a tree of int, char, bool, array, and struct fields; the per-file trees are combined into one unified schema.]
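The per-file and unified schema steps can be sketched as follows. This is a toy illustration of the idea, not how Glue crawlers are implemented: the helper names are hypothetical, the type names simply mirror the labels in the diagram, and nested fields are not expanded.

```python
def type_name(value):
    """Map a parsed value to one of the type labels used in the diagram."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "char"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "unknown"

def infer_schema(record):
    """Per-file schema: field name -> type label for one parsed record."""
    return {field: type_name(value) for field, value in record.items()}

def unify(schemas):
    """Unified schema: union of all fields; a field seen with several
    types keeps every alternative, sorted for stable output."""
    unified = {}
    for schema in schemas:
        for field, typ in schema.items():
            unified.setdefault(field, set()).add(typ)
    return {field: sorted(types) for field, types in unified.items()}

# Hypothetical parsed records, one per file.
file_1 = {"id": 1, "name": "a"}
file_2 = {"id": 2, "tags": ["x"], "active": True}
unified_schema = unify([infer_schema(file_1), infer_schema(file_2)])
```

Fields that appear in only some files (`name`, `tags`, `active`) survive in the unified schema, which is what lets semi-structured data with differing per-file shapes be queried as one table.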
Crawlers: Automatic partition detection
Estimate schema similarity among files at each level to handle semi-structured logs, schema evolution…
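One way to picture "schema similarity among files at each level" is a set-overlap score between the field names found in sibling folders. The slide does not say which measure the crawler uses; the Jaccard score, the 0.7 threshold, and the function names below are assumptions made for illustration only.

```python
def jaccard(fields_a, fields_b):
    """Overlap of two field-name sets: |intersection| / |union|."""
    a, b = set(fields_a), set(fields_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_like_partitions(folder_schemas, threshold=0.7):
    """folder_schemas maps sibling folder name -> set of field names.
    Treat the folders as partitions of one table when every sibling's
    schema is close enough to the first one (threshold is an assumption)."""
    schemas = list(folder_schemas.values())
    if len(schemas) < 2:
        return True
    first = schemas[0]
    return all(jaccard(first, other) >= threshold for other in schemas[1:])

# Hypothetical folder listings. The second month gained a "bytes" field
# (schema evolution) but still overlaps heavily with the first.
monthly_logs = {
    "month=01": {"ts", "host", "status"},
    "month=02": {"ts", "host", "status", "bytes"},
}
unrelated = {
    "logs": {"ts", "host", "status"},
    "users": {"id", "email"},
}
```

Here `monthly_logs` scores 3/4 and passes the threshold, while `unrelated` shares no fields and would be treated as two separate tables.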
Amazon Redshift – Data Warehousing
Fast, powerful, simple, and fully managed data warehouse at 1/10 the cost
Massively parallel, scale from gigabytes to petabytes
Fast at scale
Columnar storage technology to improve I/O efficiency and scale query performance
Inexpensive
As low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; start at $0.25 per hour
Open file formats
Analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3
Secure
Audit everything; encrypt data end-to-end; extensive certification and compliance
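Why columnar storage improves I/O efficiency can be shown with a toy comparison. The table and helper names below are hypothetical, and counting "values read" is a deliberately simplified stand-in for disk I/O; it is not a description of Redshift internals.

```python
# Hypothetical order table in row-oriented form.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def values_read_row_store(rows):
    """A row store touches every field of every row, even for a one-column query."""
    return sum(len(row) for row in rows)

def values_read_column_store(columns, needed):
    """A column store touches only the columns the query references."""
    return sum(len(columns[name]) for name in needed)

COLUMNS = to_columns(ROWS)
total_amount = sum(COLUMNS["amount"])  # equivalent of SELECT SUM(amount)
```

For this aggregate the row layout touches 9 values and the column layout touches 3; the gap widens as tables get wider, which is the effect the slide attributes to columnar storage.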
Amazon EMR – Big data processing
Analytics and ML at scale
19 open-source projects: Apache Hadoop, Spark, HBase, Presto, and more
Enterprise-grade security
Latest versions
Updated with the latest open source frameworks within 30 days of release
Low cost
Flexible billing with per-second billing, EC2 spot, reserved instances, and auto-scaling to reduce costs 50-80%
Use S3 storage
Process data directly in the S3 data lake securely with high performance using the EMRFS connector
Easy
Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning
Amazon Elasticsearch Service
Easy to deploy, secure, operate, and scale Elasticsearch
Customers use Elasticsearch for log analytics, full-text search, and application monitoring
Easy to Use
Fully managed. Deploy production-ready clusters in minutes.
Open
Direct access to Elasticsearch open-source APIs. Supports Logstash and Kibana.
Secure
Secure access with VPC to keep all traffic within AWS network.
Available
Zone awareness replicates data between two AZs; automatically monitors and replaces failed nodes.
Amazon Kinesis – Real time
Easily collect, process, and analyze video and data streams in real time
Kinesis Video Streams (new)
Capture, process, and store video streams for analytics
Kinesis Data Streams
Build custom applications that analyze data streams
Kinesis Data Firehose
Load data streams into AWS data stores
Kinesis Data Analytics
Analyze data streams with SQL
Highly connected data, best represented in a graph
Relational model
Foreign keys used to represent relationships
Queries can involve nesting & complex joins
Performance can degrade as datasets grow
Graph model
Relationships are first-order citizens
Easy to write queries that navigate the graph
Results returned quickly, even on large datasets
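The contrast between the two models can be made concrete with a small traversal. In a relational design each extra hop is typically another self-join on an edge table; in a graph model the query just follows stored relationships. The sketch below uses a hypothetical "follows" graph held as in-memory adjacency lists; it is not Gremlin or SPARQL.

```python
from collections import deque

# Hypothetical "follows" graph. Each vertex points directly at its
# neighbours, so moving along a relationship needs no join.
FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search: vertices reachable from start in at most
    max_hops edges, excluding start itself."""
    seen = {start}
    reached = set()
    frontier = deque([(start, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return reached
```

Asking for two hops instead of one only changes the `max_hops` argument, whereas the equivalent SQL would usually grow by one join per hop.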
Building apps with highly connected data
Relational databases
Inefficient graph processing
Unnatural for querying graph
Rigid schema, inflexible for changing graphs
Existing graph databases
Too expensive
Difficult to maintain high availability
Difficult to scale
Limited support for open standards
Amazon Neptune
Fully managed graph database
Fast
Query billions of relationships with millisecond latency
Reliable
Six replicas of your data across three AZs, with full backup and restore
Easy
Build powerful queries easily with Gremlin and SPARQL
Open
Supports Apache TinkerPop & W3C RDF graph models
Serverless, API-Centric Computing
Serverless Analytics
Deliver cost-effective analytic solutions faster
[Diagram: devices, web, sensors, and social sources, together with AWS IoT, feed the Amazon S3 data lake; AWS Glue (ETL & Data Catalog), Amazon Athena, and Amazon QuickSight work on top of it.]
Serverless. Zero Infrastructure. Zero Administration.
Never pay for idle resources
Availability and fault tolerance built in
Automatically scales resources with usage
Amazon Athena—interactive analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
Query Instantly
Zero setup cost; just point to S3 and start querying
Pay per query
Pay only for queries run; save 30-90% on per-query costs through compression
Open
ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
Easy
Serverless: zero infrastructure, zero administration
Integrated with QuickSight
Redshift Spectrum
Extend the data warehouse to your S3 data lake
[Diagram labels: Redshift data, Redshift Spectrum query engine, S3 data lake]
Exabyte-scale Redshift SQL queries against S3
Join data across Redshift and S3
Scale compute and storage separately
Stable query performance and unlimited concurrency
CSV, ORC, Grok, Avro, and Parquet data formats
Pay only for the amount of data scanned
Aurora Serverless
• Starts up on demand, shuts down when not in use
• Automatically scales with no instances to manage
• No impact to applications during scaling events
• Pay per second for the database capacity you use
[Diagram: the application connects to a database end-point; a request router sits in front of scalable DB capacity drawn from a warm pool of instances, on top of database storage.]
Instance provisioning and scaling
• First request triggers provisioning of a database instance. It typically takes about 5-10 secs.
• Instances scale up and scale down automatically in response to changes in workloads. Instance scaling takes about 1-3 secs.
• Instances are hibernated after a user-defined period of inactivity
• Scaling operations are transparent to the application – user sessions are not terminated
• Database storage is persisted until explicitly deleted by user
[Diagram: the application talks to a request router, which moves traffic from the current instance to a new instance taken from the warm pool; both sit on the same database storage.]
Global Users, Local Processing
DynamoDB Global Tables (GA)
First fully managed, multi-master, multi-region database
Build high performance, globally distributed applications
Low latency reads & writes to locally available tables
Disaster proof with multi-region redundancy
Easy to set up, and no application rewrites required
Globally dispersed users
Global Table
Distributed Lock Manager
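[Diagram: shared-disk cluster. The application talks to database nodes, each running its own SQL, transactions, caching, and logging layers; the nodes exchange locking protocol messages through a global resource manager and share one storage layer. Blocks labeled M1, M2, and M3 appear in shared storage and in the node caches.]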
Cons
Heavyweight cache coherency traffic, on per-lock basis
Networking can be expensive
Negative scaling when there are hot blocks
Pros
All data available to all nodes
Easy-to-build applications
Similar cache coherency as in multi-processors
Consensus with two phase or Paxos commit
[Diagram: shared-nothing cluster. The application talks to database nodes, each running SQL, transactions, caching, and logging over its own storage; data is split into ranges #1 to #5, each marked with an L.]
Cons
Heavyweight commit and membership change protocols
Range partitioning can result in hot partitions, not just hot blocks. Re-partitioning is expensive.
Cross-partition operations are expensive. Better at small requests.
Pros
Query broken up and sent to data node
Less coherence traffic – just for commits
Can scale to many nodes
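The trade-offs listed for the shared-nothing design follow from how keys are routed to data ranges. A minimal sketch, assuming string keys and four hypothetical split points that yield five ranges like the diagram; the function names are illustrative, not from any product.

```python
import bisect
from collections import Counter

# Hypothetical split points. Keys below "e" belong to range 1, keys from
# "e" up to (not including) "j" belong to range 2, and so on up to range 5.
SPLIT_POINTS = ["e", "j", "o", "t"]

def range_for(key):
    """Return the 1-based data range that owns key."""
    return bisect.bisect_right(SPLIT_POINTS, key) + 1

def load_per_range(keys):
    """Count requests per range; a skewed result indicates a hot partition."""
    return Counter(range_for(key) for key in keys)

def ranges_touched(keys):
    """Ranges one operation touches; more than one means the commit
    has to be coordinated across data nodes."""
    return {range_for(key) for key in keys}
```

A workload whose keys all start with "a" lands entirely on range 1 (a hot partition), and an operation on `"apple"` and `"zebra"` touches ranges 1 and 5, which is where the heavier cross-partition commit comes in.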
Conflict resolution using distributed ledgers
There are many “oases” of consistency in Aurora
The database nodes know the order of transactions from that node
The storage nodes know the order of transactions applied at that node
Conflicts arise only when data is changed at both multiple database nodes AND multiple storage nodes
Much less coordination required
[Diagram: two masters. One issues transactions BT1 to BT4 against Page 1 and the other issues OT1 to OT4 against Page 2; each page is kept on six storage nodes (1 to 6), and each master reaches quorum on its own page.]
Hierarchical conflict resolution
Both masters are writing to two pages P1 and P2
BLUE master wins the quorum at page P1; ORANGE master wins quorum at P2
Both masters recognize the conflict and have two choices: (1) roll back the transactions or (2) escalate to regional resolver
The regional resolver decides who wins the tie breaker.
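[Diagram: BLUE transaction BT1 and ORANGE transaction OT1 both write Page 1 and Page 2 across six storage nodes (1 to 6); each master fails to get quorum on one page (marked X), and the conflict between BT1 and OT1 goes to the regional resolver.]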
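The scenario on this slide can be sketched as a two-level decision: a per-page quorum, then escalation when the pages disagree. This is a toy model under stated assumptions: six copies per page as in the diagram, a write quorum of four (an assumption, the slide does not state the quorum size), each storage node accepting whichever write reaches it first, and a hypothetical `regional_resolver` callable standing in for the tie breaker.

```python
from collections import Counter

NODES_PER_PAGE = 6  # six storage nodes per page, as drawn in the diagram
WRITE_QUORUM = 4    # assumption: 4 of 6 copies must accept a write

def quorum_winner(first_arrivals):
    """first_arrivals lists, per storage node, the master whose write
    arrived first. Returns the master that reached quorum, or None."""
    votes = Counter(first_arrivals)
    for master, count in votes.items():
        if count >= WRITE_QUORUM:
            return master
    return None

def resolve(page_winners, regional_resolver):
    """page_winners maps page -> master that won quorum there.
    One master winning every page needs no escalation; a split
    decision is handed to the regional resolver."""
    winners = set(page_winners.values())
    if len(winners) == 1:
        return winners.pop()
    return regional_resolver(page_winners)

# The slide's scenario: BLUE wins quorum at P1, ORANGE wins at P2.
p1 = quorum_winner(["BLUE"] * 5 + ["ORANGE"])
p2 = quorum_winner(["ORANGE"] * 4 + ["BLUE"] * 2)
# Hypothetical tie-break rule, chosen only to make the example run.
outcome = resolve({"P1": p1, "P2": p2}, regional_resolver=lambda winners: "BLUE")
```

The alternative named on the slide, rolling back both transactions, would replace the call to the resolver with an abort.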
Crash recovery in multi-master
[Diagram: crash recovery timelines for single master and multi-master. Labels: log records, gaps, Volume Complete LSN (VCL), Consistency Point LSN (CPL), at crash, immediately after recovery, gap filled, Master 1 crashes, Master 1 recovery point, Master 2, new LSNs and gaps.]
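One reading of the VCL and CPL labels can be sketched as follows. The sketch assumes that LSNs are dense integers starting at 1, that the VCL is the highest LSN with no gaps below it, and that recovery falls back to the highest CPL at or below the VCL. These are simplifying assumptions for illustration; the slide itself does not spell out the rules, and the function names are hypothetical.

```python
def volume_complete_lsn(received_lsns, first_lsn=1):
    """Highest LSN such that every LSN from first_lsn up to it has been
    received, i.e. the point below which the log has no gaps."""
    received = set(received_lsns)
    vcl = first_lsn - 1
    while vcl + 1 in received:
        vcl += 1
    return vcl

def recovery_point(received_lsns, consistency_points):
    """Highest consistency point LSN (CPL) at or below the VCL, or None
    if no consistency point is covered by the gap-free prefix."""
    vcl = volume_complete_lsn(received_lsns)
    eligible = [cpl for cpl in consistency_points if cpl <= vcl]
    return max(eligible) if eligible else None

# Hypothetical log at crash time: LSNs 5 and 8 never arrived.
LOG_AT_CRASH = [1, 2, 3, 4, 6, 7, 9]
CONSISTENCY_POINTS = [3, 7]
```

With this log the gap-free prefix ends at LSN 4, so the consistency point at 7 cannot be used and recovery settles on 3. In the multi-master picture each master has its own recovery point, while the surviving master keeps generating new LSNs.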
Multi-region Multi-master
Write accepted locally
Optimistic concurrency control – no distributed lock manager, no chatty lock management protocol
Conflicts handled hierarchically – at head nodes, at storage nodes, at AZ and region level arbitrators
Near-linear performance scaling when there are no or low levels of conflicts
[Diagram: Region 1 and Region 2 each have head nodes over a multi-AZ storage volume that holds a local partition and a remote partition.]
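Optimistic concurrency control in general can be illustrated with a version check at commit time: writers take no locks up front, and a conflict is detected only if someone else committed in between. The `VersionedStore` class below is a generic, hypothetical sketch of that idea, not Aurora's implementation.

```python
class VersionedStore:
    """Toy key-value store in which every key carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unseen keys read as version 0."""
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        """Apply the write only if nobody committed since the caller's read.
        Returns False on conflict so the caller can retry or escalate."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False
        self._data[key] = (current_version + 1, new_value)
        return True

store = VersionedStore()
version_seen, _ = store.read("page-1")  # both writers read the same version
first_ok = store.commit("page-1", version_seen, "written by master A")
second_ok = store.commit("page-1", version_seen, "written by master B")
```

The first commit succeeds and the second is rejected without any lock messages having been exchanged, which is why throughput can scale well when conflicts are rare.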
Thank you!

More Related Content

What's hot

Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Amazon Web Services
 
FSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory ReportingFSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory ReportingAmazon Web Services
 
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...Amazon Web Services
 
GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...
GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...
GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...Amazon Web Services
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Amazon Web Services
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...Amazon Web Services
 
How to Choose The Right Database on AWS - Berlin Summit - 2019
How to Choose The Right Database on AWS - Berlin Summit - 2019How to Choose The Right Database on AWS - Berlin Summit - 2019
How to Choose The Right Database on AWS - Berlin Summit - 2019Randall Hunt
 
DynamoDB - What's new - DAT304 - re:Invent 2017
DynamoDB - What's new - DAT304 - re:Invent 2017DynamoDB - What's new - DAT304 - re:Invent 2017
DynamoDB - What's new - DAT304 - re:Invent 2017Amazon Web Services
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataAmazon Web Services
 
ABD215_Serverless Data Prep with AWS Glue
ABD215_Serverless Data Prep with AWS GlueABD215_Serverless Data Prep with AWS Glue
ABD215_Serverless Data Prep with AWS GlueAmazon Web Services
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSAmazon Web Services
 
Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...
Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...
Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...Amazon Web Services
 
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Amazon Web Services
 
How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017
How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017
How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017Amazon Web Services
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeAmazon Web Services
 
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...Amazon Web Services
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...Amazon Web Services
 

What's hot (20)

Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
 
FSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory ReportingFSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory Reporting
 
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
 
GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...
GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...
GAM401_Designing for the Future Building a Flexible Event-based Analytics Arc...
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
Deep Dive on the Amazon Aurora PostgreSQL-compatible Edition - DAT402 - re:In...
 
How to Choose The Right Database on AWS - Berlin Summit - 2019
How to Choose The Right Database on AWS - Berlin Summit - 2019How to Choose The Right Database on AWS - Berlin Summit - 2019
How to Choose The Right Database on AWS - Berlin Summit - 2019
 
Using Graph Databases
Using Graph DatabasesUsing Graph Databases
Using Graph Databases
 
DynamoDB - What's new - DAT304 - re:Invent 2017
DynamoDB - What's new - DAT304 - re:Invent 2017DynamoDB - What's new - DAT304 - re:Invent 2017
DynamoDB - What's new - DAT304 - re:Invent 2017
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory Data
 
ABD215_Serverless Data Prep with AWS Glue
ABD215_Serverless Data Prep with AWS GlueABD215_Serverless Data Prep with AWS Glue
ABD215_Serverless Data Prep with AWS Glue
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWS
 
Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...
Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...
Case Study: Learn how to Choose and Optimize Storage for Media and Entertainm...
 
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
 
How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017
How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017
How DynamoDB Powered Amazon Prime Day 2017 - DAT326 - re:Invent 2017
 
Graph and Amazon Neptune
Graph and Amazon NeptuneGraph and Amazon Neptune
Graph and Amazon Neptune
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
ABD208_Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores...
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
 

Similar to Technology Trends in Data Processing - DAT311 - re:Invent 2017

AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAmazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 
Building with Purpose - Built Databases: Match Your Workloads to the Right Da...
Building with Purpose - Built Databases: Match Your Workloads to the Right Da...Building with Purpose - Built Databases: Match Your Workloads to the Right Da...
Building with Purpose - Built Databases: Match Your Workloads to the Right Da...Amazon Web Services
 
AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with ZopaAmazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightAmazon Web Services
 
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...Amazon Web Services
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Amazon Web Services
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...Amazon Web Services
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftAmazon Web Services
 
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS SummitApplying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS SummitAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017Amazon Web Services
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Applying AWS Purpose-Built Database Strategy - SRV307 - Anaheim AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Anaheim AWS SummitApplying AWS Purpose-Built Database Strategy - SRV307 - Anaheim AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Anaheim AWS SummitAmazon Web Services
 

Similar to Technology Trends in Data Processing - DAT311 - re:Invent 2017 (20)

AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the Union
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 
Building with Purpose - Built Databases: Match Your Workloads to the Right Da...
Building with Purpose - Built Databases: Match Your Workloads to the Right Da...Building with Purpose - Built Databases: Match Your Workloads to the Right Da...
Building with Purpose - Built Databases: Match Your Workloads to the Right Da...
 
AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the Union
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf



Technology Trends in Data Processing - DAT311 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT. DAT311: Technology Trends in Data Processing. Anurag Gupta, Vice President, Amazon Web Services. awgupta@amazon.com
  • 2. Agenda: Managing explosion of data; Serverless, API-centric computing; Global users, local access experience
  • 3. Managing Data Explosion with Data Lakes
  • 4. Traditionally, analytics used to look like this. Diagram: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence. Relational data. TBs-PBs scale. Schema defined prior to data load. Operational reporting and ad hoc. Large initial capex + $10k-$50k / TB.
  • 5. Explosion of machine-generated data. Transition from IT to DevOps increases rate of change. Network-connected smart devices drive variety and volume of data. Micro-services architecture increases need for real-time monitoring and analytics. Machine-generated data is growing 10x faster than business data. Source: insideBigData - The Exponential Growth of Data, February 16, 2017
  • 6. Data lakes extend the traditional approach. Relational and non-relational data. TBs-EBs scale. Schema defined during analysis. Diverse analytical engines to gain insights. Designed for low-cost storage and analytics. Diagram: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Devices, Web, Sensors, Social; Data Lake; Catalog; Machine Learning; DW Queries; Big data processing; Interactive; Real-time.
  • 7. Data lakes on AWS. Diagram: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams; S3; Redshift, EMR, Athena, Kinesis, Elasticsearch Service. Most ways to bring data in. Unmatched durability and availability at EB scale. Best security, compliance, and audit capabilities. Run any analytics on the same data without movement. Scale storage and compute independently. Store at $0.027 / GB-month; query for $0.05/GB scanned.
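The two prices quoted on slide 7 can be combined into a rough monthly estimate. The following is a minimal sketch that only multiplies the slide's figures; the function name and the example volumes are hypothetical, and real bills depend on region, storage class, and request charges that the slide does not cover.

```python
# Prices as quoted on slide 7 (2017 figures; not current list prices).
STORAGE_USD_PER_GB_MONTH = 0.027
QUERY_USD_PER_GB_SCANNED = 0.05


def monthly_cost(stored_gb: float, scanned_gb: float) -> float:
    """Estimate one month of data lake cost: storage plus data scanned by queries."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    queries = scanned_gb * QUERY_USD_PER_GB_SCANNED
    return storage + queries


if __name__ == "__main__":
    # Hypothetical workload: 10 TB stored, 2 TB scanned per month.
    print(f"${monthly_cost(10_000, 2_000):.2f} per month")
```

Because storage and query charges are separate terms, the sketch also reflects the slide's point that storage and compute scale independently.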
  • 8. Layers of a data lake: INGEST, DISCOVER, ANALYZE, INFER; CRAWL, CATALOG, INDEX, SECURE
  • 9. AWS Glue – Serverless Data Catalog & ETL service. Data Catalog: discover data and extract schema. ETL job authoring: auto-generates customizable ETL code in Python and Spark. Automatically discovers data and stores schema. Data searchable, and available for ETL. Generates customizable code. Schedules and runs your ETL jobs. Serverless.
  • 10. Crawlers: Automatic schema inference. Diagram: enumerate S3 objects (file 1, file 2, … file N); identify file type and parse files; custom classifiers (Apache log parser); built-in classifiers (JSON parser, CSV parser, Parquet parser, …); semi-structured per-file schema; semi-structured unified schema (schema trees of int, char, bool, array, and struct fields).
  • 11. Crawlers: Automatic partitions detection. Estimate schema similarity among files at each level to handle semi-structured logs, schema evolution…
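Slides 10 and 11 describe per-file schemas being merged into a unified schema, with schema similarity used to decide which files belong together. The following is an illustrative sketch of that idea only. It is not the AWS Glue implementation; the function names, the Jaccard similarity measure, and the fallback-to-string rule are all assumptions made for the example.

```python
from typing import Any


def infer_schema(record: dict[str, Any]) -> dict[str, str]:
    """Per-file schema: map each top-level field to a coarse Python type name."""
    return {name: type(value).__name__ for name, value in record.items()}


def similarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Jaccard similarity over (field, type) pairs; 1.0 means identical schemas."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 1.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def unify(schemas: list[dict[str, str]]) -> dict[str, str]:
    """Unified schema: union of all fields; conflicting types fall back to 'str'."""
    unified: dict[str, str] = {}
    for schema in schemas:
        for name, type_name in schema.items():
            if unified.setdefault(name, type_name) != type_name:
                unified[name] = "str"  # assumed widening rule for this sketch
    return unified


if __name__ == "__main__":
    file_1 = infer_schema({"id": 1, "name": "a", "tags": ["x"]})
    file_2 = infer_schema({"id": 2, "name": "b", "active": True})
    print(similarity(file_1, file_2))
    print(unify([file_1, file_2]))
```

A crawler-style tool could group files whose similarity exceeds some threshold into one table and treat the rest as separate tables; the threshold itself would be a tuning choice.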
  • 12. Amazon Redshift – Data Warehousing. Fast, powerful, simple, and fully managed data warehouse at 1/10 the cost. Massively parallel, scale from gigabytes to petabytes. Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance. Inexpensive: as low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; start at $0.25 per hour. Open file formats: analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3. Secure: audit everything; encrypt data end-to-end; extensive certification and compliance.
  • 13. Amazon EMR – Big data processing. Analytics and ML at scale: 19 open-source projects: Apache Hadoop, Spark, HBase, Presto, and more. Enterprise-grade security. Latest versions: updated with the latest open source frameworks within 30 days of release. Low cost: flexible billing with per-second billing, EC2 spot, reserved instances, and auto-scaling to reduce costs 50-80%. Use S3 storage: process data directly in the S3 data lake securely with high performance using the EMRFS connector. Easy: launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning.
  • 14. Amazon Elasticsearch Service. Easy to deploy, secure, operate, and scale Elasticsearch. Easy to use: fully managed; deploy production-ready clusters in minutes. Open: direct access to Elasticsearch open-source APIs; supports Logstash and Kibana. Secure: secure access with VPC to keep all traffic within AWS network. Available: zone awareness replicates data between two AZs; automatically monitors and replaces failed nodes. Customers use Elasticsearch for log analytics, full-text search, and application monitoring.
  • 15. Amazon Kinesis – Real time. Easily collect, process, and analyze video and data streams in real time. Kinesis Video Streams (new): capture, process, and store video streams for analytics. Kinesis Data Streams: build custom applications that analyze data streams. Kinesis Data Firehose: load data streams into AWS data stores. Kinesis Data Analytics: analyze data streams with SQL.
  • 16. Highly connected data, best represented in a graph. Relational model: foreign keys used to represent relationships; queries can involve nesting & complex joins; performance can degrade as datasets grow. Graph model: relationships are first-order citizens; easy to write queries that navigate the graph; results returned quickly, even on large datasets.
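To illustrate the graph-model claim on slide 16, here is a minimal sketch of a "who is within two hops" query over an in-memory adjacency list. The data and the function name are hypothetical, and this is plain Python, not Gremlin or SPARQL; in a relational model the same question would typically need one self-join per hop.

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], start: str, max_hops: int) -> set[str]:
    """Breadth-first traversal returning every node reachable in at most max_hops edges."""
    seen = {start}
    reached: set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return reached


if __name__ == "__main__":
    # Hypothetical "knows" relationships.
    knows = {
        "alice": ["bob"],
        "bob": ["alice", "carol"],
        "carol": ["bob", "dave"],
        "dave": ["carol"],
    }
    print(sorted(within_hops(knows, "alice", 2)))
```

The traversal follows relationships directly, so the work grows with the neighbourhood visited rather than with the total size of the dataset.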
  • 17. Building apps with highly connected data. Relational databases: inefficient graph processing; unnatural for querying graph; rigid schema, inflexible for changing graphs. Existing graph databases: too expensive; difficult to maintain high availability; difficult to scale; limited support for open standards.
  • 18. Amazon Neptune: Fully managed graph database. Fast: query billions of relationships with millisecond latency. Reliable: six replicas of your data across three AZs, with full backup and restore. Easy: build powerful queries easily with Gremlin and SPARQL. Open: supports Apache TinkerPop & W3C RDF graph models.
  • 19. Serverless, API-Centric Computing
  • 20. Serverless Analytics. Deliver cost-effective analytic solutions faster. Diagram: Devices, Web, Sensors, Social; AWS IoT; Amazon S3 Data Lake; AWS Glue (ETL & Data Catalog); Amazon Athena; Amazon QuickSight. Serverless: zero infrastructure, zero administration. Never pay for idle resources. Availability and fault tolerance built in. Automatically scales resources with usage.
  • 21. Amazon Athena: interactive analysis. Interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage and no data to load. Ability to run SQL queries on data archived in Amazon Glacier (coming soon). Query instantly: zero setup cost; just point to S3 and start querying. Pay per query: pay only for queries run; save 30-90% on per-query costs through compression. Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types. Easy: serverless, zero infrastructure, zero administration; integrated with QuickSight.
  • 22. Redshift Spectrum: Extend the data warehouse to your S3 data lake. Diagram: Redshift data; S3 data lake; Redshift Spectrum query engine. Exabyte Redshift SQL queries against S3. Join data across Redshift and S3. Scale compute and storage separately. Stable query performance and unlimited concurrency. CSV, ORC, Grok, Avro, and Parquet data formats. Pay only for the amount of data scanned.
  • 23. Aurora Serverless. Starts up on demand, shuts down when not in use. Automatically scales with no instances to manage. No impact to applications during scaling events. Pay per second for the database capacity you use. Diagram: Application; Database end-point; Request Router; Scalable DB capacity; Warm pool of instances; Database Storage.
  • 24. Instance provisioning and scaling. First request triggers provisioning of a database instance; it typically takes about 5-10 secs. Instances scale up and scale down automatically in response to changes in workloads; instance scaling takes about 1-3 secs. Instances are hibernated after a user-defined period of inactivity. Scaling operations are transparent to the application: user sessions are not terminated. Database storage is persisted until explicitly deleted by user. Diagram: Application; Request Router; Current Instance; New Instance; Warm Pool; Database Storage.
  • 25. Global Users, Local Processing
  • 26. DynamoDB Global Tables (GA). First fully managed, multi-master, multi-region database. Build high performance, globally distributed applications. Low latency reads & writes to locally available tables. Disaster proof with multi-region redundancy. Easy to set up, and no application rewrites required. Diagram: globally dispersed users; Global Table.
  • 27. Distributed Lock Manager. Diagram: APPLICATION; SQL, TRANSACTIONS, CACHING, LOGGING (on each node); GLOBAL RESOURCE MANAGER; LOCKING PROTOCOL MESSAGES; SHARED DISK CLUSTER; SHARED STORAGE. Pros: all data available to all nodes; easy-to-build applications; similar cache coherency as in multi-processors. Cons: heavyweight cache coherency traffic, on per-lock basis; networking can be expensive; negative scaling when hot blocks.
  • 28. Consensus with two phase or Paxos commit. Diagram: APPLICATION; SQL, TRANSACTIONS, CACHING, LOGGING (on each node); SHARED NOTHING; STORAGE; DATA RANGE #1 to DATA RANGE #5, each marked L. Pros: query broken up and sent to data node; less coherence traffic, just for commits; can scale to many nodes. Cons: heavyweight commit and membership change protocols; range partitioning can result in hot partitions, not just hot blocks; re-partitioning expensive; cross partition operations expensive; better at small requests.
  • 29. Conflict resolution using distributed ledgers. There are many “oases” of consistency in Aurora. The database nodes know transaction orders from that node. The storage nodes know transaction orders applied at that node. Only have conflicts when data changed at both multiple database nodes AND multiple storage nodes. Much less coordination required. Diagram: two masters; transactions BT1 to BT4 on Page 1 (P1) and OT1 to OT4 on Page 2 (P2); storage nodes 1 to 6 for each page; quorum.
  • 30. Hierarchical conflict resolution. Both masters are writing to two pages, P1 and P2. BLUE master wins the quorum at page P1; ORANGE master wins quorum at P2. Both masters recognize the conflict and have two choices: (1) roll back the transactions or (2) escalate to regional resolver. Regional arbitrator decides who wins the tie breaker. Diagram: two masters; transactions BT1 and OT1 on both Page 1 and Page 2; storage nodes 1 to 6 for each page; quorum; regional resolver.
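The decision flow on slide 30 can be sketched as a small function. This is an illustrative model, not Aurora's implementation: the names are hypothetical, and the write quorum of four out of the six storage nodes drawn per page is an assumption made for the example.

```python
from typing import Optional

STORAGE_NODES_PER_PAGE = 6  # the slide's diagram shows six storage nodes per page
WRITE_QUORUM = 4            # assumption for this sketch: a majority of the six


def quorum_winner(acks: dict[str, int]) -> Optional[str]:
    """Return the master whose write was accepted by a quorum of nodes, if any."""
    for master, count in acks.items():
        if count >= WRITE_QUORUM:
            return master
    return None


def resolve(page_acks: dict[str, dict[str, int]],
            regional_pick: Optional[str] = None) -> tuple[str, Optional[str]]:
    """Decide the outcome for two transactions that touch the same pages.

    page_acks maps page name -> {master name: storage nodes that accepted its write}.
    regional_pick is the regional resolver's choice when the masters escalate;
    None means they choose to roll back instead.
    """
    winners = {quorum_winner(acks) for acks in page_acks.values()}
    if len(winners) == 1 and None not in winners:
        return ("commit", winners.pop())  # one master won every page: no conflict
    if regional_pick is None:
        return ("rollback", None)         # choice (1): roll back the transactions
    return ("commit", regional_pick)      # choice (2): escalate to the resolver


if __name__ == "__main__":
    # The slide's scenario: BLUE wins the quorum at P1, ORANGE wins at P2.
    conflict = {"P1": {"blue": 4, "orange": 2}, "P2": {"blue": 2, "orange": 4}}
    print(resolve(conflict))
    print(resolve(conflict, regional_pick="blue"))
```

When one master wins the quorum on every page it touches, no escalation is needed, which matches the deck's point that coordination is only required when writes actually conflict.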
  • 31. Crash recovery in multi-master. Diagram, single master: log records, gaps, Volume Complete LSN (VCL), and Consistency Point LSN (CPL), shown at crash and immediately after recovery. Diagram, multi-master: master 1 crashes; gaps at crash; immediately after recovery the gap is filled, with new LSNs and gaps, the master 1 recovery point, and the Consistency Point LSN (CPL) for master 1 and master 2.
  • 32. Multi-region Multi-master. Write accepted locally. Optimistic concurrency control: no distributed lock manager, no chatty lock management protocol. Conflicts handled hierarchically: at head nodes, at storage nodes, at AZ and region level arbitrators. Near-linear performance scaling when there are no or low levels of conflict. Diagram: REGION 1 and REGION 2, each with head nodes, a multi-AZ storage volume, a local partition, and a remote partition.
  • 33. Thank you!