This session covers the most recent Big Data & IoT announcements at re:Invent. Learn about trends and use cases for understanding your data and implementing an Internet of Things (IoT) project. Hear about how AWS customers are using AWS IoT to connect their devices to the cloud and solve business challenges with IoT.
3. The diminishing value of data
• Recent data is highly valuable
• Old + Recent data is more valuable
4. Relational Database Service (RDS)
AMAZON AURORA
MySQL and PostgreSQL compatible
Several times faster than EC2/RDS
Highly available and durable
1/10th the cost of commercial-grade databases
re:Invent 2015: Thousands of customers
re:Invent 2016: 3.5X more customers
Today: Tens of thousands of customers
5. Aurora is the fastest growing service in the
history of AWS
6. Why AWS built Amazon Aurora
Speed and availability of high-end commercial databases
Simplicity and cost-effectiveness of open source databases
Drop-in compatibility with MySQL and PostgreSQL
Simple pay as you go pricing
Delivered as a managed service
7. Database architectures in the last 30 years
Even when you scale it out, you’re still replicating the same stack
[Diagram: multiple application stacks, each repeating the same SQL, Transactions, Caching, and Logging layers on top of Storage]
8. A service-oriented architecture applied to the database
• Moved the logging and storage layer into a multitenant, scaled-out, database-optimized storage service
• Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations
• Integrated with Amazon S3 for continuous backup with 99.999999999% durability
[Diagram: SQL, Transactions, and Caching remain in the database node; the Logging + Storage data plane is a shared service backed by Amazon S3]
9. Aurora Design
Seamless recovery from read replica failures
Auto-scale new read replicas
Up to 15 read replicas across 3 availability zones
[Diagram: the application reads from Read Replicas 1 and 2 and writes to the master node, all sharing a distributed storage volume spanning Availability Zones 1–3]
10. Aurora Multi-Master (Preview Today)
First relational database service with scale-out of both reads and writes across multiple datacenters
• Zero application downtime from ANY node failure
• Zero application downtime from ANY AZ failure
• Faster write performance
• Multi-region coming in 2018
[Diagram: the application issues reads and writes to Read/Write Masters 1–3 on a shared distributed storage volume across Availability Zones 1–3]
12. Aurora Serverless (Preview Today)
On-demand, auto-scaling database for applications with unpredictable or cyclical workloads
• Automatically scales capacity up and down
• Starts up on demand and shuts down when not in use
• Pay per second, and only for the database capacity you use
• No need to provision instances
13. Aurora Serverless
On-demand, auto-scaling database for applications with variable workloads
• Starts up on demand, shuts down when not in use
• Automatically scales with no instances to manage
• Pay per second for the database capacity you use
[Diagram: the application connects through a warm capacity pool]
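The pay-per-second model on this slide is easy to see with a little arithmetic. Aurora Serverless meters capacity units over time; the unit price and the workload below are hypothetical numbers, not real pricing:

```python
# Back-of-the-envelope for "pay per second, only for capacity used".
def serverless_cost(usage, price_per_unit_second):
    """usage: list of (seconds, capacity_units) segments while the DB is active.
    Paused periods contribute nothing to compute cost."""
    return sum(seconds * units for seconds, units in usage) * price_per_unit_second

# A bursty day: 2 hours at 4 units, a 10-minute spike at 16 units, idle otherwise.
usage = [(2 * 3600, 4), (10 * 60, 16)]
cost = serverless_cost(usage, price_per_unit_second=0.06 / 3600)  # $0.06/unit-hour, hypothetical
```

With a provisioned instance you would pay for all 24 hours at peak-ish capacity; here the idle hours simply drop out of the sum.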
15. Relational vs. non-relational databases
Traditional SQL: primary/secondary databases that scale up
NoSQL: a distributed cluster of database nodes that scales out
[Diagram: one primary/secondary pair growing bigger vs. many small DB nodes added horizontally]
16. SQL vs. NoSQL schema design
NoSQL design optimizes for
compute instead of storage
17. High availability and durability
WRITES: replicated continuously to 3 Availability Zones; persisted to disk (custom SSD)
READS: strongly or eventually consistent; no latency trade-off
• Designed to support 99.99% availability
• Built for high durability
19. Prime Day 2017 Metrics
Block Storage – Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with
aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests
rising to 835 million (a 30% increase).
NoSQL Database – Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the
Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second.
Stack Creation – Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources online.
API Usage – AWS CloudTrail processed over 50 billion events and tracked more than 419 billion
calls to various AWS APIs, all in support of Prime Day.
Configuration Tracking – AWS Config generated over 14 million Configuration items for AWS
resources.
20. DYNAMODB GLOBAL TABLES (GA)
First fully managed, multi-master, multi-region database
• Build high-performance, globally distributed applications
• Low-latency reads and writes to locally available tables
• Disaster proof with multi-region redundancy
• Easy to set up, and no application re-writes required
29. DynamoDB Backup and Restore
First NoSQL database to automate on-demand and continuous backups
• On-demand backups for long-term data archival and compliance (GA)
• Point-in-time restore for short-term retention and data corruption protection (Coming soon)
• Back up hundreds of TB instantaneously with NO performance impact
36. CHALLENGES BUILDING APPS WITH HIGHLY CONNECTED DATA
Relational databases:
• Unnatural for querying graphs
• Inefficient graph processing
• Rigid schema, inflexible for changing graphs
Existing graph databases:
• Difficult to maintain high availability
• Difficult to scale
• Limited support for open standards
• Too expensive
37. Amazon Neptune (Available in preview today)
Fully managed graph database
• FAST AND SCALABLE: store billions of relationships and query with millisecond latency
• EASY: build powerful queries easily with Gremlin and SPARQL
• RELIABLE: 6 replicas of your data across 3 AZs, with full backup and restore
• OPEN: supports Apache TinkerPop™ and W3C RDF graph models
39. Data Lake
What Big Data workloads look like:
Collect → Store → Analyze → Visualize
40. Data Lake on AWS
An S3 DATA LAKE, surrounded by:
• Amazon Redshift + Redshift Spectrum
• Amazon QuickSight
• Amazon EMR (Hadoop, Spark, Presto, Pig, Hive… 19 total)
• Amazon Athena
• Amazon Kinesis
• Amazon Elasticsearch Service
• AWS Glue
41. Amazon S3: Data Lake on AWS
• Most ways to bring data in
• Best security, compliance, and audit capabilities
• Object-level controls
• Unmatched durability, availability, and scalability
• Twice as many partner integrations
• Business insights into your data
43. The diminishing value of data
• Recent data is highly valuable
• Old + Recent data is more valuable
44. S3 SELECT (Preview Today)
Powerful new S3 capability to pull out only the object data you need using standard SQL expressions
• New API to select and retrieve data within objects
• Accelerate any application that processes a subset of object data in S3
• Improve data access performance by up to 400%
COMPLEX PRESTO QUERY against a standard TPC-DS dataset (6 sub-queries, each containing 1 table, 3 aggregations, and 4 filters):
• Without S3 Select: 8 seconds
• With S3 Select: 1.8 seconds (4.5x faster)
45. Glacier SELECT (GA)
Run queries directly on data stored in Amazon Glacier
• Run queries on data stored at rest in Amazon Glacier
• Any application can query Glacier data
• Retrieve only what you need
• Makes Glacier part of your data lake
49. Challenges Of Devices Living On The Edge
• Round-trip latency
• Intermittent connectivity
• Expensive bandwidth
• Programming and updating embedded software needs specialized skills
• Limited to what is on the device unless you rewrite or reprogram the device
52. AWS Greengrass ML Inference (Preview Today)
Run Machine Learning at the edge
• Build and train models in the cloud
• Use the AWS Greengrass console to transfer models to your devices
• Inference on the device
• Devices take action quickly – even when disconnected
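The "act quickly, even when disconnected" behavior on this slide boils down to inferring locally and syncing opportunistically. A minimal sketch with stand-in model and cloud calls (this is not the Greengrass API, just the pattern):

```python
from collections import deque

class EdgeInference:
    """Toy edge loop: always infer on-device; push results to the cloud
    when connectivity allows, otherwise buffer them locally."""

    def __init__(self, model, cloud_send):
        self.model = model          # stand-in for a trained ML model
        self.cloud_send = cloud_send
        self.backlog = deque()      # results waiting for connectivity

    def handle(self, sample, connected):
        result = self.model(sample)     # inference happens locally, always
        self.backlog.append(result)
        if connected:
            while self.backlog:         # flush everything buffered so far
                self.cloud_send(self.backlog.popleft())
        return result

sent = []
device = EdgeInference(model=lambda x: x > 0.5, cloud_send=sent.append)
device.handle(0.9, connected=False)   # acts on the result, sends nothing
device.handle(0.2, connected=True)    # reconnects and flushes both results
```

The device never blocks on the round-trip latency or intermittent connectivity listed on slide 49; the cloud is only on the reporting path.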
Customers have also found tremendous value in being able to mine this data to make better medicine, tailor purchasing recommendations, detect fraudulent financial transactions in real time, provide on-demand digital content such as movies and songs, and forecast the weather; the list goes on and on.
The core job of analytics is to help companies gain insight into their customers. Then, the companies can optimize their marketing and deliver a better product.
Data driven -> Netflix use case.
So how does Netflix use analytics?
“There are 33 million different versions of Netflix.”
– Joris Evers, Director of Global Communications
Netflix Uses Analytics To Select Movies, Create Content, and Make Multimillion Dollar Decisions
Narrative: So how much is this data worth? Well, it depends…
Recent data is highly valuable
If you act on it in time
Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
If you have the means to combine them
Narrative: Processing real-time data as it arrives can let you make decisions much faster and get the most value from your data. But building your own custom applications to process streaming data is complicated and resource intensive. You need to train or hire developers with the right skillsets, then wait months for the applications to be built and fine-tuned, and then operate and scale the application as the business grows.
All of this takes lots of time and money, and, at the end of the day, lots of companies just never get there, settle for the status-quo, and live with information that is hours or days old.
This is why enterprises have been moving, as fast as they can, as many of their databases to open source database engines like MySQL, MariaDB, and Postgres. However, to try and get the same performance from those open source engines that you get in the commercial grade databases is hard. It's possible. We've done a lot of it at Amazon. But it takes a lot of tuning.
Customers want to move from proprietary databases over to open source and want a fully managed environment with automated provisioning, configuration, tuning, patching and backups, all with lower cost and simple pay-as-you-go pricing.
We also heard from customers that in addition to the familiarity of open source databases, and the time saving benefits of managed database services in the cloud, they want the enterprise-grade performance and reliability that old-world databases offer, which is why we built Amazon Aurora.
1/ Grown by 2.5X again…tens of thousands of customers
2/ FINRA
3/ Expedia
4/ Verizon
5/ CBS Interactive
6/ Dow Jones
7/ Hulu
8/ TRANSITION: There are a lot of things people love about Aurora…
Before we started working on Aurora earlier this decade…
…because if you look at database architectures in last 30 years..
We radically changed this architecture with Amazon Aurora. We delivered a MySQL 5.6 compatible engine where we used distributed infrastructure of AWS to create a purpose built logging and storage system that sits completely outside the database box.
1/ Customers love the high performance and high availability they get from Aurora… and the scale-out architecture we’ve built is a big part of this.
2/ Customers can scale-OUT database read capacity by seamlessly adding up to 15 copies of your data through read replicas. This allows customers to scale to millions of database read statements per second.
3/ We also recently added the ability for Aurora to automatically add new read replicas (up to 15) as your application load on the database grows.
4/ This architecture also gives customers high levels of availability as Aurora will immediately route reads to an alternate replica if one of the read replicas fail.
5/ Customers love this architecture, but they’ve been asking us for even more scalability and availability…. In particular, they are asking us to scale out and provide seamless recovery not just for database reads, but also for database writes and to do this across multiple datacenters and multiple regions.
6/ Today Aurora databases run with a single master instance which processes all database write requests. If the master fails, Aurora will promote a read replica to become the new master in under 30 seconds. While this is considerably less than other databases for recovery of a master node, we asked ourselves if we could provide the same seamless recovery and scale-out for writes as we do for reads.
7/ And I am very excited to announce…
1/ As we’ve been discussing, customers love the performance, open source capability, and price of Aurora
2/ But, they have some workloads that don’t require databases to run often
3/ Could be bursty workloads like development and test, flash sales or blogs
4/ Could be unpredictable workloads like weather disaster sites
5/ Could be workloads that just get action at a couple times a day
6/ Yet, these customers have no option in the relational database market anywhere that doesn’t force them to buy the software and hardware and pay for it full time
7/ Customers have said, "Hey, I know this is hard, but can you fix it?"
8/ Introducing….
...Aurora Serverless, an on-demand version of Aurora.
1/ Aurora Serverless has virtually all the same benefits as Aurora but…
2/ Doesn’t require you to provision instances
3/ Automatically scales capacity up and down when needed
4/ Starts up on demand and shuts down when not in use
5/ Pay per second and only for the database capacity you consume
1/ Another type of non-relational database that has gained popularity is in-memory data stores, which are commonly used as distributed data caches.
2/ Distributed caches are often used with relational and non-relational databases to speed-up data access.
3/ This is useful for when the application needs microsecond latency when millisecond latency is just not fast enough.
4/ Distributed caches also allow you to deliver massive throughput by keeping frequently stored information in memory, and reducing the load on your existing database.
5/ And, that is why we built ElastiCache.
6/ ElastiCache offers managed in-memory stores for both Redis and Memcached
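The speed-up described in these notes usually comes from the cache-aside pattern: check the cache first, fall back to the database, then populate the cache. A minimal sketch where an in-process dict stands in for Redis/Memcached, with an illustrative TTL and loader:

```python
import time

class CacheAside:
    """Minimal cache-aside: serve from cache when fresh, otherwise load
    from the backing store and cache the result with a TTL."""

    def __init__(self, loader, ttl=60.0):
        self.loader = loader    # e.g. a function that runs a SQL query
        self.ttl = ttl
        self.store = {}         # stand-in for Redis/Memcached

    def get(self, key):
        hit = self.store.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0], "cache"
        value = self.loader(key)                              # slow path
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value, "db"

db_calls = []
def slow_db(key):
    db_calls.append(key)        # track how often the database is actually hit
    return key.upper()

cache = CacheAside(slow_db)
cache.get("user:1")   # miss: goes to the database
cache.get("user:1")   # hit: served from memory
```

The second lookup never touches the database, which is exactly the load reduction and latency win the notes describe; a real deployment would swap the dict for a Redis or Memcached client.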
In addition to relational, key-value and in-memory stores, customers today are also building applications that rely on storing and navigating highly connected data.
Relational - Data is normalized. To enable joins, you are tied to a single partition and a single system, and performance depends on the hardware specs of the primary server. To improve performance, you optimize: move to a bigger box. You may still run out of headroom. Create read replicas. You will still run out. Scale UP.
NoSQL -- NoSQL databases were designed specifically to overcome scalability issues. Scale “out” data using distributed clusters, low-cost hardware, throughput + low latency
Therefore, using NoSQL, businesses can scale virtually without limit.
Generic product catalog. Table relationships are normalized.
A product could be a book – say the Harry Potter Series. There’s a 1:1 relationship. Or it could be a movie..
You can imagine the types of queries you'd have to execute: 1. Show me all the movies starring a given actor. 2. Show me the entire product catalog. This is resource intensive – it performs complex joins.
** With NoSQL you have to ask: how will the application access the data?
Optimize for the costlier asset. No joins, just a select. Hierarchical structures, designed with access patterns in mind.
Via duplication of data (storage), optimized for compute, it is fast.
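The duplication-for-compute idea above can be sketched as a single-table, access-pattern-driven layout: items are duplicated so every query is a single key lookup, never a join. The keys and attributes below are illustrative, not a real catalog schema:

```python
# One "table" keyed by partition key; related items live under the same key
# and carry a sort key, in the style of a DynamoDB single-table design.
catalog = {
    "PRODUCT#1": [
        {"sk": "META", "title": "Harry Potter", "type": "book"},
        {"sk": "AUTHOR#rowling", "name": "J.K. Rowling"},
    ],
    "PRODUCT#2": [
        {"sk": "META", "title": "Some Movie", "type": "movie"},
        {"sk": "ACTOR#smith", "name": "A. Smith"},  # duplicated per product
    ],
}

def query(pk, sk_prefix=""):
    """Single-partition read, like a Query with a begins_with(sk) condition."""
    return [item for item in catalog.get(pk, []) if item["sk"].startswith(sk_prefix)]

whole_product = query("PRODUCT#1")            # entire product, one lookup, no join
authors = query("PRODUCT#1", "AUTHOR#")       # just the author facet
```

In the normalized relational version, the same two reads would be multi-table joins; here the storage cost of duplicating actor/author rows buys constant-shape, single-key reads.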
13/35. 4 more regions. DynamoDB is highly durable. AWS has a concept of regions and Availability Zones. An AWS region is a geographic area. Each region has multiple Availability Zones. Each AZ has 1 or more physical DCs. They have redundant power and cooling, and are interconnected via high-speed, low-latency fiber. Take for example the AWS region in N. Virginia. It has 4 AZs.
When you create a DynamoDB table in N. Virginia, we will replicate the data to 3 AZs. All the data is stored on SSDs.
A lot of value built into DynamoDB– a few clicks.
1/ Can see with how many companies are using DynamoDB today…Snap, Lyft, Tinder, Redfin, Comcast, Under Armour, BMW, and Toyota
2/ As an increasing number of customers build geographically distributed applications with high performance and availability needs, these applications require the data in DynamoDB tables to be replicated, and locally available in multiple regions.
3/ Today, DynamoDB already provides intra-region data replication across three availability zones.
4/ In addition, customers can replicate their DynamoDB table data across multiple regions with an open source command line tool.
5/ However, this approach can be time consuming and complex to manage, and many of our customers have asked us whether we can automate this process for them.
6/ So today, we are pleased to announce…
1/ With Global Tables, DynamoDB is the first fully managed, multi-master, multi-region database.
2/ Since your data is now replicated across multiple regions, your globally distributed applications benefit from low latency reads and writes from the locally available tables.
3/ This is especially important for customers who have users all over the world and cannot afford to have any delay or lag in their application experience. For example, an Expedia customer using their mobile application in North America, should have the same responsive user experience when receiving personalized recommendations or updating their user profile when they travel to Europe or Asia. The application should be able to read from and write to a locally available database closest to the user no matter where the user is travelling in the world.
4/ Global Tables also ensures data redundancy across multiple regions and allows the database to stay available even in the event of a complete regional outage.
5/ Global Tables is easy to setup with a few clicks. Customers simply select the regions where data should be replicated, and DynamoDB handles the rest. This frees developers to focus on building their applications and business rather than worry about database administration tasks.
6/ DynamoDB Global Tables is Generally Available today, we hope you will check it out.
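Multi-master replication of this kind has to resolve concurrent writes to the same item from different regions; Global Tables uses a last-writer-wins approach. A toy merge makes that concrete, with timestamps simplified to plain numbers:

```python
def merge(local, remote):
    """Last-writer-wins reconciliation of the same item written in two
    regions, in the style of Global Tables-like multi-master replication.
    Ties go to the local copy; real systems break ties deterministically."""
    return local if local["ts"] >= remote["ts"] else remote

# The same user profile updated concurrently in two regions:
us_east = {"pk": "user#1", "city": "NYC", "ts": 100}
eu_west = {"pk": "user#1", "city": "Paris", "ts": 105}

winner = merge(us_east, eu_west)   # the newer (EU) write survives everywhere
```

Every replica applies the same rule, so all regions converge on the same item; the price is that the older concurrent write is silently discarded, which applications need to design around.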
1/ To meet this need, today, we are adding On-Demand Backups and Point In Time Restore to DynamoDB
2/ With these new capabilities, DynamoDB is the first NoSQL database to automate on demand and continuous backups.
3/ On-Demand Backups allow customers to create FULL backups of their data instantaneously for long term data archival and to comply with corporate and governmental regulatory requirements.
4/ Point In Time Restore (PITR) allows customers to restore their data up to the minute for the past 35 days. When enabled, DynamoDB will continuously backup all customer data to protect from short-term loss due to application errors.
5/ Customers with data volumes that are 100s of TBs large, serving single digit millisecond latency workloads, can now back up their data instantaneously with NO performance impact to their production applications.
6/ No other cloud database provides this capability today.
7/ On Demand Backup is Generally Available today and Point in Time Restore is coming in early 2018…
1/ A lot of applications being built today need to understand and navigate relationships between highly connected data to enable use cases like social applications, recommendation engines, fraud detection, etc
2/ If you are building a restaurant recommendation app, you want the app to provide recommendations of restaurants of a certain cuisine like Sushi in a city like New York that at least two of the users friends also like.
3/ In all of these use cases, because the data is highly connected, it is easily represented as a graph shown here.
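The restaurant example above is naturally a graph traversal. In Neptune it would be written in Gremlin or SPARQL; here is the same idea ("restaurants at least two of my friends like") over a toy adjacency list in plain Python, with made-up names:

```python
# Toy social graph: friendship edges and "likes" edges.
friends = {"me": ["ana", "bo", "cy"]}
likes = {
    "ana": ["sushi_ko"],
    "bo": ["sushi_ko", "taco_town"],
    "cy": ["taco_town"],
}

def recommend(user, min_friends=2):
    """Restaurants liked by at least `min_friends` of the user's friends:
    a two-hop traversal (user -> friends -> liked restaurants) with a count."""
    counts = {}
    for friend in friends.get(user, []):
        for restaurant in likes.get(friend, []):
            counts[restaurant] = counts.get(restaurant, 0) + 1
    return sorted(r for r, c in counts.items() if c >= min_friends)

recs = recommend("me")
```

The same query in a relational model would join a users table, a friendships table, and a likes table; as a graph it is just two hops and a count, which is the point the next slide makes about purpose-built graph engines.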
4/ TRANSITION: You can perform this song with a trumpet even though it calls for a saxophone, but a trumpet isn’t a saxophone…
1/ Today, customers build application on highly connected data with either their existing relational databases or with purpose-built graph databases. Both approaches today are sub-optimal.
2/ If you were to try and represent this data in a relational model, you would end up with multiple tables with multiple foreign keys and your queries would quickly become unwieldy involving nested queries and complex joins that won’t perform as your data size grows over time.
3/ Today’s graph database options are typically open source or commercially licensed.
4/ Open source editions are hard to scale and lack enterprise capabilities such as high availability and management.
5/ Commercial options are expensive and force customers to choose between either the Property Graph (e.g. Apache TinkerPop™) and RDF graph models regardless of their application needs. Support for Open APIs in such solutions tend to be bolt-ons and users often need to use proprietary APIs for best performance.
6/ What customers really want is a graph database service that is compatible with leading graph models, features open APIs, and is also fully managed, fast, scalable and cost effective…
7/ Introducing…
…Amazon Neptune, a fast, reliable, fully-managed graph database that makes it easy to build and run applications that need to work with highly connected datasets.
1/ Neptune gives developers flexibility by supporting both Tinkerpop and RDF graph models
2a/ It’s really fast and scalable – can create sophisticated, interactive graph applications storing billions of relationships and querying the graph with milliseconds latency
2b/ Neptune’s core is a purpose built high performance database engine optimized for graphs
2c/ Enables 15 low latency read replicas allowing 100K queries/second
3/ Very reliable – Four 9s of availability, fault tolerant and self healing storage built for cloud that replicates your data across 3 AZs and continuously backs up your data to S3
4/ Easy – Query processing engine optimized for both Gremlin and SPARQL (SPARKLE)
5/ Available today in preview
1/ So you can see in the new world of cloud born applications, a one-size-fits-all database model no longer works.
2/ All modern organizations will use multiple DB types, some multiple in same app
3/ At AWS, our goal is to provide you with the right tool for the job, and nobody has the breadth of DB capabilities available for you that AWS does
A foundation of highly durable data storage and streaming of any type of data
A metadata index and workflow which helps us categorise and govern data stored in the data lake
A search index and workflow which enables data discovery
A robust set of security controls – governance through technology, not policy
An API and user interface that expose these features to internal and external users
1/ For unstructured ad hoc queries on things like logs, raw event files, and click-stream data, Athena is a great solution
2/ For processing vast amounts of unstructured data across dynamically scalable clusters using popular distributed frameworks like Spark, Hadoop, Presto, Pig, Hive, Yarn (16 in all), EMR is a great solution (we have)
3/ For complex queries on large collections of unstructured data with super-fast performance, you can use Redshift as your data warehouse solution…and if you want to extend your queries beyond the optimized, local Redshift cluster, you can use Redshift Spectrum to extend these Redshift queries to run directly on your S3 data
4/ For customers wanting to run real-time operational intelligence and document search analysis, can run our managed Elasticsearch service
5/ For real-time processing of streaming data, customers are using Kinesis (especially for streaming data from edge connected devices to and from the cloud)
6/ For business intelligence and visualization, QuickSight
7/ And, to do ETL (extract, transform, and load) as well as move data around in the cloud and across data stores and analytics services, AWS has Glue…this is an unmatched set of analytics services that give customers the right tool for the right analytics need
The thing is that with a lot of the querying they don’t need all the data in the objects.
Today the way they do it-- whether you are querying in place or using your own applications,-- the analytics application has to take all the data out before it can process it, which adds cost and impacts performance.
People really just want to pinpoint that exact data they want to query and pull that out instead of the whole object.
For example, let’s say you’re running analytics on web site log file data related to iOS 10 users, and those users only represent 10% of the log file data you store in S3 objects. In this case, your analytics application really only needs to process a small subset of the data within a bunch of your S3 objects.
The analytics application needs to do too much hard work here.
1/ It pulls all the relevant objects out of S3.
2/ It finds and extracts the iOS 10 data from inside the S3 objects used to store web site log file data.
3/ Then it can finally process the data.
All of this means you are moving around and processing a lot of data that isn’t relevant to the query you want to run, so it slows things down and cost more than it should.
So we decided to give you a new way for your applications to run these queries.
With much more operating experience and scale, and a much broader set of features and capability than available anywhere else, S3 is the clear number one choice for a data lake. There are a few reasons why Amazon S3 is the world’s most popular cloud storage platform for data lakes.
1/ S3 has unmatched durability, availability, and scalability.
Only S3 replicates your data in three availability zones within a single region. This gives you unmatched resilience to single data center issues like power failures.
Only S3 lets customers do cross-region replication seamlessly without having to use a separate storage class.
Only S3 lets you choose which regions you want to replicate into (and as many as you want to replicate to).
2/ S3 has the best security, compliance, and audit capability.
Only S3 lets you replicate data from one region to another with cross-region replication using company-specific keys stored by Amazon’s key management service for the encryption between regions.
S3 Cross-region replication also lets you use separate accounts for the source and destination regions, protecting against malicious insider deletions of backup data.
S3 also has the most depth in security and compliance controls, offering capabilities you just can’t get in other options like the ability to audit how, when, and who is accessing individual objects in S3 through CloudTrail Data Events.
Amazon Macie, an AI-powered security service, automatically monitors CloudTrail S3 audit trails, detecting and alarming on anomalies that might indicate early stages of an attack like an outsider trying to enumerate role privileges for your storage.
S3 also offers a daily inventory report listing all the objects in a bucket, including important details like encryption status, for security report-outs on storage status.
3/ S3 offers object-level control at any scale.
Other providers force you to set policies broadly across all objects in a bucket, which customers find frustrating and too coarse for enterprise needs.
With S3, lifecycle policies can automatically delete or tier groups of objects that share a common tag or a prefix like a department code.
Only S3 gives you the option of setting multiple editable tags on an individual object, letting you label an object by project ID, compliance requirements, or other business taxonomy. This lets you set up lifecycle policies to delete or tier storage based on tag, or use the tags to restrict access to the object.
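A tag-scoped lifecycle rule of the kind described here looks roughly like the following S3 lifecycle configuration; the tag key/value, rule ID, and day counts are examples:

```json
{
  "Rules": [
    {
      "ID": "tier-finance-objects",
      "Status": "Enabled",
      "Filter": { "Tag": { "Key": "department", "Value": "finance" } },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

Every object carrying that tag moves to infrequent-access storage after 30 days and to Glacier after a year, regardless of which prefix it lives under; prefix-based filters work the same way for department-code paths.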
4/ S3 gives you business insight into your data.
Only S3 has Storage Class Analysis that analyzes storage request patterns and provides recommendations for setting up tiering to lower cost storage classes.
Export data that Storage Class Analysis uses for recommendations to a CSV file and use your favorite BI tool, like Quicksight, to generate custom reports like heat maps for groups of objects. Amazon Macie, which I mentioned earlier, automatically classifies your storage by content type so you understand how your storage changes over time.
5/ One more thing that is really important with a data lake is making it easy to ingest data, and AWS offers more ways to bring data into S3 than anyone else – by far.
AWS Snowball, which are physical devices that let you move petabytes of data into S3,
AWS Snowmobile for exabytes
Direct Connect, which is like your own private data pipe, and
S3 Transfer Acceleration, a unique way to make the Internet go up to 500% faster.
6/ As I said before, we have more than twice as many integrations with storage partners compared to any other cloud platform.
This means that it’s easy to use S3 with what you already have from folks like NetApp, EMC, Veritas and Cloudera,
Use cases like Primary Storage, Backup and Restore, Archive, Disaster Recovery and Analytics.
In addition to integration with most AWS services, the Amazon S3 ecosystem includes tens of thousands of consulting, systems integrator, and ISV partners,
AWS Marketplace offers 35 categories and more than 3,500 software listings from over 1,100 Independent Software Vendors that are pre-configured to deploy on the AWS Cloud.
No other cloud provider has more partners with solutions that are pre-integrated to work with Amazon S3.
1/ Pinterest
2/ Philips
3/ #m
4/ NTT Docomo
5/ GE
6/ TRANSITION: One of our analytics customers is Goldman Sachs…people sometime don’t realize how sophisticated and technically strong Fin Services cos are, but to share how they’re using AWS, it’s my pleasure to introduce Managing Director, Roy Joseph from Goldman Sachs
…S3 Select, a powerful new Amazon S3 capability to pull out only the object data you need using standard SQL expressions <PAUSE FOR CLAPPING>
1/ S3 Select dramatically improves the performance and reduces the cost of applications that need to query data in S3.
2/ Applications only retrieve a subset of data from an S3 object instead of retrieving the entire object. You filter the data using standard SQL expressions like SELECT, FROM, or WHERE.
3/ S3 Select can improve the performance of most applications that frequently access data from S3 by up to 400%.
4/ Example: suppose a retailer needs to analyze the weekly sales data from just one store, but the data for all 200 stores is saved in a new object every day. Without S3 Select, the retailer’s analytics application would have to retrieve the complete set of S3 objects and then filter out just the required store data before being able to perform the analysis. With S3 Select, you can offload the heavy lifting of filtering data inside objects to the Amazon S3 service.
5/ Now, your analytics application just calls the S3 Select API to retrieve only the data from the one store you are interested in.
Your analytics application then processes just the data for that store, greatly increasing performance and reducing the processing cost for your application.
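To make the retailer example concrete, here is a minimal sketch of the filtering S3 Select performs server-side. The store IDs, columns, and sample data are hypothetical; the real call from Python is boto3’s `select_object_content` with a SQL expression, but the sketch reproduces the filtering logic locally so it is self-contained.

```python
import csv
import io

# Hypothetical daily sales object: one CSV row per store, all 200 stores
# stored together in a single S3 object (only 3 rows shown here).
DAILY_SALES_CSV = """store_id,region,weekly_sales
store_001,us-west,120000
store_002,us-east,98000
store_117,eu-west,143000
"""

def s3_select(object_body, store_id):
    """Local stand-in for the filtering S3 Select does inside the service.
    The real API is boto3's s3.select_object_content, with an Expression
    such as: SELECT * FROM S3Object s WHERE s.store_id = 'store_117'."""
    reader = csv.DictReader(io.StringIO(object_body))
    return [row for row in reader if row["store_id"] == store_id]

# The application receives only the rows it asked for, instead of
# downloading and scanning the entire object client-side.
rows = s3_select(DAILY_SALES_CSV, "store_117")
print(rows)  # one row, for store_117 only
```

The point of the design is where the filter runs: the WHERE clause is evaluated inside S3, so only the matching bytes cross the network.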
1/ With S3 Select the same query time was reduced to 1.8 seconds.
2/ This reduced the query response time by 78%.
3/ That’s 4.5X faster performance.
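As a sanity check, the 78% reduction and the 4.5X figure describe the same improvement, since a reduction in response time translates directly into a speedup factor:

```python
# new_time = old_time * (1 - reduction), so
# speedup  = old_time / new_time = 1 / (1 - reduction)
reduction = 0.78
speedup = 1 / (1 - reduction)
print(round(speedup, 1))  # 4.5
```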
4/ S3 Select can be used to dramatically accelerate any application that queries data in Amazon S3.
5/ No other cloud object storage service can even come close to this kind of performance.
6/ Almost all AWS customers use S3 in some way, and so we are really excited about this new feature that makes the world’s best cloud storage even better – more cost effective, able to encompass more data, and optimized for analytics.
Customers have also found tremendous value in being able to mine this data to
develop better medicines,
deliver tailored purchasing recommendations,
detect fraudulent financial transactions in real time,
provide on-demand digital content such as movies and songs,
improve weather forecasts,
and the list goes on and on.
The core job of analytics is to help companies gain insight into their customers. Then, the companies can optimize their marketing and deliver a better product.
Data driven -> Netflix use case.
So how does Netflix use analytics?
“There are 33 million different versions of Netflix.”
– Joris Evers, Director of Global Communications
Netflix Uses Analytics To Select Movies, Create Content, and Make Multimillion Dollar Decisions
Today there are billions of devices everywhere. They are in homes, factories, oil wells, agricultural fields, hospitals, cars, machinery, and thousands of other places. With the proliferation of these IoT devices, enterprises are increasingly having to manage infrastructure that is not located in a data center. In fact, when companies think about their on-premises footprint in 10 years, servers will have moved to the cloud, and connected devices will be on-premises - literally everywhere.
The number of devices out there has been exploding because companies are finding that the closer to the source they can collect and respond to data, the better decisions they can make.
For example, rainfall and weather sensors can make irrigation more efficient and save water. Light sensors can help make lighting more efficient. Sensors on jet engines help with operational efficiencies. And transportation sensors can help with route management to optimize travel time and avoid accidents or weather-related issues.
But the thing is, when you look at these devices, they tend to be relatively limited in their capabilities. They have a very small amount of CPU and a very small amount of disk. This is why the cloud is disproportionately important to these IoT devices.
Illumina
John Deere
BamTech/Statcast
Enel/Engie
BMW (collects sensor data from cars to give dynamically updated map info)
Under Armour Connected Fitness Platform (used by 180M customers WW)
I’m excited to announce Greengrass ML Inference, a new feature of Greengrass that brings machine learning to the edge. <PAUSE FOR CLAPPING>
Today, the most common way ML gets done is that first you go through the compute-intensive process of building and training the model. Then you can use the model to recognize patterns in new data and do inference for your applications. Usually, all of this is done in the cloud.
But with devices at the edge, there is a big advantage if you can do inference right on the device itself, so you can avoid the time and cost of sending the device data up to the cloud to get your prediction and then waiting for the result to come back down to the device to take action.
Greengrass ML Inference is different. With ML at the edge, you build and train your models in the cloud, using a service like Amazon SageMaker or whatever you want. Then you use the Greengrass console to transfer the models down to your devices so inference can be done right on the device itself. This lets your devices make smart decisions quickly even when they are disconnected.
With Greengrass ML Inference, application developers can add machine learning to their devices without having any special machine learning skills.
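A minimal sketch of what on-device inference looks like, assuming the model has already been trained in the cloud and deployed to the device as a local file. The file path, model format, and threshold logic here are hypothetical illustrations, not the Greengrass SDK API:

```python
import json
import tempfile

def load_model(path):
    # In a real deployment, Greengrass places the trained model file on
    # the device; here we simply read it from local disk.
    with open(path) as f:
        return json.load(f)

def infer(model, sensor_reading):
    # Inference happens locally: no round trip to the cloud, so the
    # device can act on its data even while disconnected.
    return "alert" if sensor_reading > model["threshold"] else "ok"

# Simulate the model file a cloud training job would produce and
# Greengrass would push down to the device (threshold is made up).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"threshold": 80.0}, f)
    model_path = f.name

model = load_model(model_path)
print(infer(model, 95.0))  # alert
print(infer(model, 42.0))  # ok
```

The split mirrors the talk’s point: training stays in the cloud where the compute is, while the lightweight inference step runs on the constrained device.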
This is Greengrass ML Inference, and it changes what is possible for IoT, machine learning, and the edge. It’s super cool, and we think people will be excited to try it.