© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data Analytics and Machine Learning on AWS
WHAT IS BIG DATA?
Big Data and the 3Vs
Variety
Velocity
Volume
The Cloud Advantage
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
BIG DATA ANALYTICS
Examples of Business Outcomes and Insights
• Security threat detection
• User behavior analysis
• Enhanced customer experience
• Business intelligence
• Spending optimization
• Real-time alerting
Examples of Big Data Sources
• Relational databases
• NoSQL databases
• Web servers
• Mobile phones/tablets
• 3rd party feeds
• IoT
• Clickstream
Examples of AWS Services for Big Data Analytics
Amazon EMR, Amazon EC2, Amazon Glacier, Amazon S3, AWS Import/Export, Amazon Kinesis, AWS Direct Connect, Amazon Machine Learning, Amazon Redshift, Amazon DynamoDB, AWS Database Migration Service, AWS Lambda, AWS IoT, AWS Data Pipeline, Amazon Kinesis Analytics, Amazon SNS, AWS Snowball, Amazon SWF, Amazon Athena, Amazon QuickSight, Amazon Aurora, AWS Glue
Amazon S3—Object Storage
Security and Compliance: Three different forms of encryption; encrypts data in transit when replicating across regions; log and monitor with CloudTrail; use ML to discover and protect sensitive data with Macie
Flexible Management: Classify, report, and visualize data usage trends; tag objects to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention
Durability, Availability & Scalability: Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS Region; can be replicated automatically to any other AWS Region
Query in Place: Run analytics & ML on the data lake without data movement; S3 Select can retrieve a subset of data, improving analytics performance by up to 400%
Amazon Redshift—Data Warehousing
Fast at scale: Columnar storage technology to improve I/O efficiency and scale query performance
Secure: Audit everything; encrypt data end-to-end; extensive certification and compliance
Open file formats: Analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3
Inexpensive: As low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; start at $0.25 per hour
Amazon Redshift Spectrum
Extend the data warehouse to exabytes of data in S3 data lake
(Diagram: Redshift data, the Redshift Spectrum query engine, and the S3 data lake.)
• Exabyte-scale Redshift SQL queries against S3
• Join data across Redshift and S3
• Scale compute and storage separately
• Stable query performance and unlimited concurrency
• CSV, ORC, Grok, Avro, & Parquet data formats
• Pay only for the amount of data scanned
Amazon EMR—Big Data Processing
Low cost: Flexible billing with per-second billing, EC2 Spot, Reserved Instances, and Auto Scaling to reduce costs by 50–80%
Easy: Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, or cluster tuning
Latest versions: Updated with the latest open source frameworks within 30 days of release
Use S3 storage: Process data directly in the S3 data lake securely and with high performance using the EMRFS connector
Amazon Elasticsearch Service
Easy to Use: Fully managed; deploy production-ready clusters in minutes
Secure: Secure access with VPC to keep all traffic within the AWS network
Open: Direct access to Elasticsearch open-source APIs; supports Logstash and Kibana
Available: Zone awareness replicates data between two AZs; automatically monitors & replaces failed nodes
Amazon Kinesis—Real Time
Kinesis Data Firehose: Load data streams into AWS data stores
Kinesis Data Streams: Build custom applications that analyze data streams
Kinesis Video Streams (New): Capture, process, and store video streams for analytics
Kinesis Data Analytics: Analyze data streams with SQL
Amazon Athena—Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
Query Instantly: Zero setup cost; just point to S3 and start querying
Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
Easy: Serverless, with zero infrastructure and zero administration; integrated with QuickSight
Pay per query: Pay only for queries run; save 30–90% on per-query costs through compression
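Athena bills by the amount of data each query scans, which is why compression lowers per-query cost. A minimal sketch of that arithmetic, assuming a price of $5 per TB scanned (an illustrative figure; check current Athena pricing for your Region), billing rounded up to the nearest MB with a 10 MB minimum per query, and binary units for MB and TB. The 200 GiB dataset and the 4:1 compression ratio are hypothetical.

```python
import math

MB = 1024 ** 2        # assumption: binary megabyte
TB = 1024 ** 4        # assumption: binary terabyte
MIN_BILLED_MB = 10    # minimum data billed per query


def athena_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate one query's charge: scanned bytes rounded up to whole MB, 10 MB minimum."""
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * price_per_tb


if __name__ == "__main__":
    raw_bytes = 200 * 1024 ** 3        # hypothetical 200 GiB of uncompressed CSV
    compressed_bytes = raw_bytes // 4  # hypothetical 4:1 compression ratio
    for label, size in (("uncompressed", raw_bytes), ("compressed", compressed_bytes)):
        print(f"{label}: ${athena_query_cost(size):.4f} per query")
```

Under these assumptions the compressed scan costs a quarter as much, a 75% saving. Columnar formats such as Parquet or ORC reduce the scanned bytes further, because a query reads only the columns it needs.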
Amazon QuickSight
• Easy
• Empower everyone
• Seamless connectivity
• Fast analysis
• Serverless
Modern data architecture
(Diagram: data moves from the ORIGIN to the DESTINATION: data sources → ingest → speed (real-time) and scale (batch) paths → serving → insight consumers. Goal: insights to enhance business applications and new digital services.)
• Data sources: transactions; web logs / cookies; ERP; connected devices; social media
• Ingest: AWS Database Migration Service; AWS Direct Connect; internet interfaces; Amazon Kinesis
• Scale (batch): Amazon S3 (raw data) → Amazon EMR (ETL) → Amazon S3 (staged data, the data lake); advanced analytics with MLlib on Amazon EMR
• Speed (real-time): event capture with Amazon Kinesis; stream analysis with Amazon EMR; event scoring with Amazon AI; event handler and response handler with AWS Lambda
• Serving: data warehouse (Amazon Redshift); legacy apps (Amazon RDS); schemaless (Amazon Elasticsearch); direct query (Amazon Athena); near-zero latency (Amazon DynamoDB); semi/unstructured (Amazon EMR)
• Insight consumers: data analysts; data scientists; business users; engagement platforms; automation / events
• Manage & secure: AWS CloudTrail; AWS IAM; Amazon CloudWatch; AWS KMS
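The speed (real-time) and scale (batch) paths in this architecture follow a pattern often called a Lambda architecture (the name is unrelated to AWS Lambda). The batch layer periodically recomputes complete views from the data lake, the speed layer covers only events that arrived since the last batch run, and the serving layer merges the two. A minimal in-memory sketch, assuming hypothetical clickstream events of the form (page, timestamp) and page-view counts as the view. In the architecture above, the batch view would come from EMR over S3 and the speed view from Kinesis stream analysis.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical clickstream event: (page_id, event_time)
Event = Tuple[str, datetime]


def batch_view(events: Iterable[Event], cutoff: datetime) -> Counter:
    """Batch layer: recompute page-view counts for every event before the cutoff."""
    return Counter(page for page, ts in events if ts < cutoff)


def speed_view(events: Iterable[Event], cutoff: datetime) -> Counter:
    """Speed layer: count only the events at or after the last batch cutoff."""
    return Counter(page for page, ts in events if ts >= cutoff)


def serve(batch: Counter, speed: Counter) -> Counter:
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch + speed


if __name__ == "__main__":
    cutoff = datetime(2019, 7, 16, 14, 0)  # time of the last completed batch run
    events = [
        ("home", datetime(2019, 7, 16, 13, 0)),
        ("cart", datetime(2019, 7, 16, 13, 30)),
        ("home", datetime(2019, 7, 16, 14, 5)),  # not yet covered by the batch view
    ]
    print(serve(batch_view(events, cutoff), speed_view(events, cutoff)))
```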
A Sample Batch Analytics Pipeline
Ad-hoc access to data using Athena
Athena can query
aggregated datasets as well
Smart Applications | Machine Learning
Clickstream Analysis
Customer Success.
Powered by AWS.
Sysco is the leader in selling, marketing, and distributing food.
Challenge:
Large volumes of data in multiple systems, plus high costs from maintaining an on-premises EDW deployment.
Solution:
• Migrated their on-premises solution to the cloud with Redshift, S3, EMR, and Athena
Analytics on the Data Lake
• Sysco is the leader in selling, marketing, & distributing food
• Challenge: large volumes of data in multiple systems
• Consolidated data into a single S3 data lake
• Data scientists use EMR notebooks; business users use Athena & Amazon Redshift Spectrum for reporting
(Diagram components: marketing data source and other source systems; raw data ingested from multiple sources into S3; data preparation; Redshift ETL process; transformed data in S3; Redshift, Redshift Spectrum, Athena, and EMR.)
FINRA oversees more than 3,000 securities firms doing business in the United States.
Challenge:
FINRA’s legacy system did not scale well:
• Up to 75 billion events per day
• Complex surveillance queries run over 20+ PB of data
Solution:
• Migrated their big data appliance to an S3 data lake and used EMR for ingestion and processing
• Migrated to RDS and is testing Aurora
FINRA uses S3 to Build Data Lake with EMR
• Required fast access across trillions of trade records (20PB+)
• Migrated from an on-premises system
• Uses Apache HBase on Amazon EMR to store and serve this data
• Uses EMR engines (Spark, Presto, and Hive) to process data
• Lowered costs by 60% compared with the on-premises system
(Diagram: Spark, Presto, Hive, and HBase on EMR over S3, with the Herd metastore.)
Nasdaq operates financial exchanges around the world and processes large volumes of data.
Challenge:
Nasdaq wanted to make their large historical data footprint available to analyze as a single dataset.
Solution:
• Use Amazon Redshift for interactive querying
• Use Amazon S3 as a data lake, and Presto on EMR to process historical data
Nasdaq Uses AWS to Build a Data Lake
• Migrated legacy on-premises warehouse to Amazon Redshift
• 4.8B rows inserted per trading day (orders, trades, quotes)
• Ingests data from multiple sources, validates it, and stages it in S3
• Redshift reads data out of S3 for fast queries
• Presto on EMR and S3 used for analysis of the massive historical data set
(Diagram components: data from all 7 exchanges operated by Nasdaq (orders, quotes, trade executions); flat files; operational databases; EMR; Redshift; S3; SQL clients.)
Data Lake Overview
What is a Data Lake?
• A centralized repository for both structured and unstructured data
• Store data as-is in open-source file formats to enable direct analytics
Why a Data Lake?
• Decouple storage from compute, allowing you to scale each independently
• Enable advanced analytics across all of
your data sources
• Reduce complexity in ETL and
operational overhead
• Future extensibility as new database and
analytics technologies are invented
Traditionally, Analytics Looked Like This
OLTP, ERP, CRM, LOB → Data Warehouse → Business Intelligence
• Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational and ad hoc reporting
• Large initial capex + $$K / TB / year
Data Lakes Extend the Traditional Approach
(Diagram: OLTP, ERP, CRM, and LOB systems, plus web, sensors, social, and devices, feed a data lake with a catalog; DW queries, big data processing, interactive, and real-time engines serve business intelligence and machine learning.)
• Relational and non-relational data
• TBs–EBs scale
• All data in one place, a single source of truth
• Decouples (low-cost) storage and compute
• Schema on read
• Diverse analytical engines
Benefits of a Data Lake – All Data in One Place
Store and analyze all of your data,
from all of your sources, in one
centralized location.
“Why is the data distributed in
many locations? Where is the
single source of truth?”
Benefits of a Data Lake – Quick Ingest
Quickly ingest data
without needing to force it into a
pre-defined schema.
“How can I collect data quickly
from various sources and store
it efficiently?”
Benefits of a Data Lake – Storage vs Compute
Separating your storage and compute
allows you to scale each component as
required
“How can I scale up with the
volume of data being generated?”
Benefits of a Data Lake – Schema on Read
“Is there a way I can apply multiple
analytics and processing frameworks
to the same data?”
A Data Lake enables ad-hoc
analysis by applying schemas
on read, not write.
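Schema on read means raw records are stored untouched, and each consumer decides at query time which fields to project and how to type them. A minimal sketch in plain Python, assuming JSON-lines records. The field names, the two consumer schemas, and the `read_with_schema` helper are hypothetical. Engines such as Athena apply the same idea through table definitions kept in a catalog.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

# Raw records land in the lake untouched (JSON lines); nothing is enforced on write.
RAW_LINES: List[str] = [
    '{"user": "u1", "page": "/home", "ms": "120", "ref": "ad"}',
    '{"user": "u2", "page": "/cart", "ms": "340"}',
    '{"user": "u1", "page": "/checkout", "ms": "95", "device": "mobile"}',
]

Schema = Dict[str, Callable[[Any], Any]]  # column name -> type converter


def read_with_schema(lines: List[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema at read time: project the named columns and cast their values.

    Fields missing from a record become None; fields not in the schema are ignored.
    """
    for line in lines:
        record = json.loads(line)
        yield {col: (cast(record[col]) if col in record else None)
               for col, cast in schema.items()}


# Two hypothetical consumers read the same raw data through different schemas.
latency_schema: Schema = {"page": str, "ms": int}
attribution_schema: Schema = {"user": str, "ref": str}

if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, latency_schema):
        print(row)
    for row in read_with_schema(RAW_LINES, attribution_schema):
        print(row)
```

Adding a new consumer only means defining another schema. The stored data never has to be rewritten.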
Building a Data Lake on AWS
Why AWS?
Implementing a Data Lake architecture requires a broad
set of tools and technologies to serve an increasingly
diverse set of applications and use cases.
Data Lake on AWS
• Central storage: Amazon S3 (scalable, secure, cost-effective)
• Data ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service
• Catalog & search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
• Access & user interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito
• Analytics & serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS
• Manage & secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
Why Amazon S3 for a Data Lake?
• Durable: designed for 11 9s of durability
• Available: designed for 99.99% availability
• High performance: multipart upload; range GET
• Scalable: store as much as you need; scale storage and compute independently; no minimum usage commitments
• Integrated: Amazon EMR, Amazon Redshift, Amazon DynamoDB
• Easy to use: simple REST API; AWS SDKs; read-after-create consistency; event notification; lifecycle policies
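Eleven nines (99.999999999%) of annual durability can be turned into an expected loss rate. This back-of-the-envelope sketch assumes that each object independently has a 1 − 0.99999999999 = 10⁻¹¹ chance of being lost per year. Under that reading, 10 million objects lose one object on average every 10,000 years, which matches the illustration given in the Amazon S3 FAQ. The figure is a design target, not a guarantee, and it does not cover deletes or overwrites made by users or applications.

```python
def expected_objects_lost_per_year(num_objects: int, nines: int = 11) -> float:
    """Expected objects lost per year if each has a 10**-nines annual loss probability."""
    annual_loss_probability = 10.0 ** -nines  # 11 nines of durability -> 1e-11
    return num_objects * annual_loss_probability


def years_per_lost_object(num_objects: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_objects_lost_per_year(num_objects, nines)


if __name__ == "__main__":
    print(round(years_per_lost_object(10_000_000)))  # 10000
```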
What can you do with a Data Lake?
Query Directly with Amazon Athena
Analyze with Hadoop on Amazon EMR
Create Visualizations with Amazon QuickSight
Train ML Models with Amazon SageMaker
Create a Central Data Catalog with AWS Glue
Load into Downstream Services
• Amazon Redshift: Run complex analytic queries against petabytes of structured data
• Amazon DynamoDB: A NoSQL database service that delivers consistent, single-digit millisecond latency at any scale
• Amazon Aurora: A MySQL- and PostgreSQL-compatible relational database built for the cloud
• Amazon Elasticsearch: Delivers Elasticsearch’s real-time analytics capabilities alongside the availability, scalability, and security that production workloads require
Data Movement into the Data Lake
Data Sources
Files, Logs, Streams, Databases
Data Sources - Databases
Databases → Amazon S3
Change Data Capture
Techniques to Capture Changes
• Timestamp
• Diff Comparison
• Triggers
• Transaction Log
Change Data Capture – Timestamp
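(Figure: each row carries a last-modified timestamp. Before an update the rows read 4/18/18 300, 3/12/18 800, 9/25/17 230, 2/04/18 100; afterwards they read 4/18/18 300, 7/16/19 1600, 9/25/17 230, 2/04/18 100. The last run was at 7/16/19 1400, so only the row now stamped 7/16/19 1600 is later than the last run. That row is captured and delivered through Kinesis Data Firehose to Amazon S3.)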
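Timestamp-based capture keeps a watermark (the last run time) and selects only rows whose last-modified time is later. A minimal sketch, assuming the source table maintains a reliable last-modified column. The `Row` type and the `capture_changes` helper are hypothetical, and sending the captured rows to Kinesis Data Firehose is left out. Against a real database, the filter would be a `WHERE last_modified > :last_run` query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Row:
    row_id: int
    last_modified: datetime  # hypothetical column maintained by the source application
    value: int


def capture_changes(rows: List[Row], last_run: datetime) -> Tuple[List[Row], datetime]:
    """Return the rows modified after the previous run, plus the new watermark."""
    changed = [r for r in rows if r.last_modified > last_run]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((r.last_modified for r in changed), default=last_run)
    return changed, new_watermark


if __name__ == "__main__":
    rows = [
        Row(1, datetime(2018, 4, 18, 3, 0), 10),
        Row(2, datetime(2019, 7, 16, 16, 0), 20),  # updated after the last run
        Row(3, datetime(2017, 9, 25, 2, 30), 30),
    ]
    changed, watermark = capture_changes(rows, last_run=datetime(2019, 7, 16, 14, 0))
    print([r.row_id for r in changed], watermark)
```

The approach has two limitations. A deleted row leaves no timestamp to find, and a transaction that commits late with an earlier timestamp can fall behind the watermark.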
Change Data Capture – Diff Compare
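(Figure: full snapshots taken at 6/15/18 0300 and 6/16/18 0300 are saved as 20180615T0300 and 20180616T0300. A diff compare of the two snapshots yields the changed rows, which are delivered through Kinesis Data Firehose to Amazon S3.)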
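Diff comparison needs no cooperation from the source: it keeps full snapshots and compares them by primary key. A minimal sketch, assuming each snapshot fits in memory as a dict from primary key to row values. The `diff_snapshots` helper and the sample rows are hypothetical. Unlike timestamp capture, this approach detects deletes, but it must read and store complete copies of the table.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[int, Tuple[str, ...]]      # primary key -> row values
Change = Tuple[str, int, Tuple[str, ...]]  # (operation, primary key, row values)


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Change]:
    """Compare two full snapshots and emit insert, update, and delete events."""
    changes: List[Change] = []
    for key in sorted(new.keys() - old.keys()):
        changes.append(("insert", key, new[key]))
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            changes.append(("update", key, new[key]))
    for key in sorted(old.keys() - new.keys()):
        changes.append(("delete", key, old[key]))  # last known values of the removed row
    return changes


if __name__ == "__main__":
    snapshot_0615 = {1: ("alice", "gold"), 2: ("bob", "silver"), 3: ("carol", "bronze")}
    snapshot_0616 = {1: ("alice", "gold"), 2: ("bob", "gold"), 4: ("dave", "bronze")}
    for change in diff_snapshots(snapshot_0615, snapshot_0616):
        print(change)
```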
Change Data Capture – Triggers
(Figure: an update to the Roster row with Id 20982358 (Name: Jean-Luc Picard, Rank: Captain, State: Agitated) fires a trigger. The trigger records Table: Roster, Id: 20982358, Operation: Update in a ChangeData table. A job (ag8afh8) SELECTs a ChangeDataBatch from that table and writes the operations to Kinesis Data Firehose, which delivers them to Amazon S3.)
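With triggers, the database itself records each change in a side table, and a batch job reads that table later. A minimal sketch that uses SQLite from Python's standard library as a stand-in for the source database. The `roster` and `change_data` tables, the trigger names, the initial 'Calm' state, and `read_change_batch` are all hypothetical, and trigger syntax varies between database engines. Sending each batch on to Kinesis Data Firehose is not shown.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE roster (id INTEGER PRIMARY KEY, name TEXT, rank TEXT, state TEXT);
CREATE TABLE change_data (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- ordering for the batch job
    table_name TEXT NOT NULL,
    row_id     INTEGER NOT NULL,
    operation  TEXT NOT NULL
);
CREATE TRIGGER roster_insert AFTER INSERT ON roster BEGIN
    INSERT INTO change_data (table_name, row_id, operation) VALUES ('roster', NEW.id, 'INSERT');
END;
CREATE TRIGGER roster_update AFTER UPDATE ON roster BEGIN
    INSERT INTO change_data (table_name, row_id, operation) VALUES ('roster', NEW.id, 'UPDATE');
END;
CREATE TRIGGER roster_delete AFTER DELETE ON roster BEGIN
    INSERT INTO change_data (table_name, row_id, operation) VALUES ('roster', OLD.id, 'DELETE');
END;
"""


def open_source_db() -> sqlite3.Connection:
    """Create an in-memory source database with change-capture triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def read_change_batch(conn: sqlite3.Connection, after_seq: int, limit: int = 500) -> List[Tuple]:
    """Batch job: SELECT the change records after the last sequence number processed."""
    return conn.execute(
        "SELECT seq, table_name, row_id, operation FROM change_data "
        "WHERE seq > ? ORDER BY seq LIMIT ?",
        (after_seq, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = open_source_db()
    # Hypothetical initial state; the update then produces the figure's "Agitated" row.
    conn.execute("INSERT INTO roster VALUES (20982358, 'Jean-Luc Picard', 'Captain', 'Calm')")
    conn.execute("UPDATE roster SET state = 'Agitated' WHERE id = 20982358")
    print(read_change_batch(conn, after_seq=0))
```

Triggers capture inserts, updates, and deletes precisely. The cost is extra work on every write to the source database.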
Change Data Capture – Database Logs
(Figure: a database transaction log such as Tx001.log. Its engine-specific binary layout (file header and block sizes, format, checksum, checkpoints, and log blocks with their own headers) is difficult to read directly.)
Database Migration Service
AWS Database Migration Service (AWS DMS): easily and securely migrate and/or replicate your databases and data warehouses to AWS
AWS Schema Conversion Tool (AWS SCT): convert your commercial database and data warehouse schemas to open-source engines or AWS-native services, such as Amazon Aurora and Redshift
When to use DMS and SCT?
Modernize
• Modernize your database tier: commercial to open-source; commercial to Amazon Aurora
• Modernize your data warehouse: commercial to Redshift
Migrate
• Migrate business-critical applications
• Migrate from Classic to VPC
• Migrate data warehouse to Redshift
• Upgrade to a minor version
Replicate
• Create cross-region read replicas
• Run your analytics in the cloud
• Keep your dev/test and production environments in sync
Data Sources - Files
Files → Amazon S3
Files
Optimizing Transfers Available Services
• S3 Multi-Part Upload
• S3 Transfer Acceleration
• AWS Direct Connect
• AWS DataSync
• AWS Transfer - SFTP
• AWS Snowball/Snowmobile
Uploading to Amazon S3
• Amazon S3 supports both a single-part upload
and a multi-part upload API
• The single-part upload supports objects up to 5
GB in size
• The multi-part upload supports objects up to 5
TB in size
• The multi-part upload also enables you to
maximize your throughput by using parallel
threads
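With the multipart API, the client splits an object into numbered parts that can be uploaded in parallel. A minimal planning sketch, assuming the documented S3 limits: 5 MiB to 5 GiB per part (the last part may be smaller), at most 10,000 parts, and at most 5 TiB per object. `plan_parts` and the 64 MiB preferred part size are hypothetical choices. The upload itself would use the SDK's multipart calls (for example, boto3's `create_multipart_upload`, `upload_part`, and `complete_multipart_upload`), with one `upload_part` call per planned range, spread across threads.

```python
import math
from typing import List, Tuple

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MIN_PART = 5 * MiB    # every part except the last must be at least this large
MAX_PART = 5 * GiB
MAX_PARTS = 10_000
MAX_OBJECT = 5 * TiB


def plan_parts(object_size: int, preferred_part: int = 64 * MiB) -> List[Tuple[int, int, int]]:
    """Split an object into (part_number, offset, length) ranges for a multipart upload."""
    if object_size <= 0 or object_size > MAX_OBJECT:
        raise ValueError("object size out of range for a multipart upload")
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(MIN_PART, preferred_part, math.ceil(object_size / MAX_PARTS))
    part_size = min(part_size, MAX_PART)
    parts = []
    offset, number = 0, 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


if __name__ == "__main__":
    for number, offset, length in plan_parts(200 * MiB):
        print(number, offset, length)
```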
S3 Transfer Acceleration
• PUT requests go through the nearest AWS edge location
• Data transits over the AWS private network rather than the public internet
• The AWS private network optimizes throughput and latency to the AWS Region
• Data is not stored in the edge cache
(Diagram: uploader → AWS edge location → S3 bucket.)
AWS Direct Connect
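(Diagram components: a corporate data center with a customer gateway; a Direct Connect location containing a customer/partner cage with the customer/partner router and an AWS cage with the AWS Direct Connect endpoint; an AWS Region containing a virtual private cloud with EC2, plus Amazon S3 and a VPC endpoint. The connections use a private virtual interface and a public virtual interface.)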
AWS DataSync
Online transfer service that simplifies, automates, and accelerates moving data between on-premises storage and AWS
Combines the speed and reliability of network acceleration software with the cost-effectiveness of open source tools
• Fast data transfer
• Cost-effective
• Easy to use
• Secure and reliable
• Cloud integrated
AWS Transfer for SFTP
Fully managed SFTP service for Amazon S3
• Native integration with AWS services
• Simple to use
• Cost-effective
• Fully managed in AWS
• Secure and compliant
• Seamless migration of existing workflows
AWS Snowball/Snowmobile
Use case → AWS solution
• Cloud migration, disaster recovery → AWS Snowball
• Internet of Things (IoT), remote locations → AWS Snowball Edge
• Migrating exabytes of data → AWS Snowmobile
Data Sources - Streams
Streams → Amazon S3
Streams
Collecting and Analyzing
• Amazon Kinesis
• Amazon Managed Streaming for Apache Kafka (MSK)
• Example: Clickstream Analytics
Amazon Kinesis - Stream Processing on AWS
Firehose
• Buffer records in a stream into a
single output for more efficient
storage
• Automatic flushing of buffer to S3,
Elasticsearch, Redshift, or Splunk
Analytics
• Create time windows over streams
and perform aggregate operations
using SQL
• Join together multiple streams and
output to new streams
Streams
• Capture streaming data for
downstream processing
• Allow multiple processors to read
streams at their own rate
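Firehose delivers a buffered batch as soon as either a buffer-size threshold or a buffer-interval threshold is reached, whichever comes first. A minimal sketch of that flushing rule. The class name, the 5 MiB and 300-second defaults, and the flush callback are illustrative assumptions, and the injectable clock exists only to make the sketch testable. Because the sketch has no background timer, the caller must invoke `maybe_flush` periodically to honor the interval.

```python
import time
from typing import Callable, List, Optional


class BufferedDelivery:
    """Accumulate records and deliver them as one object once a threshold is hit.

    Follows the rule of a buffer size and a buffer interval, whichever comes first.
    """

    def __init__(
        self,
        flush: Callable[[bytes], None],
        max_bytes: int = 5 * 1024 * 1024,
        max_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush            # e.g. a function that writes one object to S3
        self._max_bytes = max_bytes
        self._max_seconds = max_seconds
        self._clock = clock
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None  # arrival time of the first buffered record

    def put(self, record: bytes) -> None:
        """Add one record, then deliver the buffer if a threshold is now met."""
        if self._opened_at is None:
            self._opened_at = self._clock()
        self._records.append(record)
        self._size += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Deliver the buffer if it is large enough or old enough; call this periodically."""
        if self._opened_at is None:
            return
        full = self._size >= self._max_bytes
        expired = self._clock() - self._opened_at >= self._max_seconds
        if full or expired:
            self._flush(b"".join(self._records))
            self._records, self._size, self._opened_at = [], 0, None


if __name__ == "__main__":
    delivered: List[bytes] = []
    buffer = BufferedDelivery(delivered.append, max_bytes=16)
    for event in (b'{"click":1}\n', b'{"click":2}\n'):
        buffer.put(event)
    print(len(delivered))
```

Batching many small records into one object is what makes the output more efficient to store and to scan later.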
Summary - Ingestion
Files, streams, logs, and databases are ingested into Amazon S3 with AWS DataSync, API Gateway, the Kinesis Agent, Kinesis Data Firehose, and DMS, and organized under a common prefix layout:
s3://datalake/
    /vendorfeeds
        /vendorA
        /vendorB
    /clickstream
        /orders
        /vendors
        /customers
    /app_logs
        /instance1
        /instance2
    /syslogs
        /instance1
        /instance2
    /databases
        /customers
        /orders
        /vendors
Consuming Data from the Data Lake
Anti-Pattern
Everything
Query
Also an Anti-Pattern
Everything
Query
One tool to
rule them all
Where do I start?
• Understand your data
• Data Structure, Access patterns & characteristics,
Temperature, Cost, Size
• Know your audience
• Business Users, Data Scientists, Developers
• Select the right service
Understand your Data
(Diagram: data ranges from hot to warm to cold, plotted against data structure, latency, data volume, request rate, and cost per GB. Hot data needs low latency and high request rates at a higher cost per GB; cold data is larger in volume, tolerates higher latency, receives fewer requests, and costs less per GB.)
Matching data stores, with Amazon ElastiCache at the hot end and Amazon Glacier at the cold end:
• In-memory: Amazon ElastiCache
• NoSQL: Amazon DynamoDB
• Search: Amazon ES
• Warehouse: Amazon Redshift
• Object: Amazon S3
• Archival: Amazon Glacier
Role · Priorities · Needs
• Data Visualizer. Priorities: creating engaging visual and narrative journeys for analytical solutions. Needs: visualization, dashboards.
• Data Product Manager. Priorities: manages data as a product; ensures freshness and consistency of data; understands lineage and compliance needs; treats data scientists as customers. Needs: reporting, reports on data quality and errors.
• DevOps Engineer. Priorities: monitoring for reliability; quickly diagnosing deployment or availability issues. Needs: ad hoc querying, dashboards.
• Data Scientist. Priorities: makes sense of data; generates and communicates insights to improve or create business processes; creates predictive ML models to support them. Needs: ad hoc querying, robust ML tools.
• Data Engineer. Priorities: builds scalable pipelines; transforms and loads data into structures, complete with metadata, that data scientists can readily consume. Needs: ad hoc querying, quick visualization.
• Business Sponsor. Priorities: vetting prioritization and ROI, funding projects, providing ongoing feedback. Needs: reporting, dashboards.
Overview of AI/ML: AI vs Machine Learning vs Deep Learning
• Artificial Intelligence: machines or programs exhibiting intelligence
• Machine Learning: learning without being explicitly programmed
• Deep Learning: learning based on deep neural networks
Closer Look at Machine Learning and When to Use It
43,252,003,274,489,856,000
43 QUINTILLION
UNIQUE COMBINATIONS
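The 43 quintillion figure comes from counting cube states. Corner and edge pieces can be permuted and oriented freely except for three constraints: the last corner's twist, the last edge's flip, and matching permutation parity. A short sketch of that standard counting argument:

```python
from math import factorial


def rubiks_cube_states() -> int:
    """Count reachable 3x3x3 cube states with the standard counting argument."""
    corner_permutations = factorial(8)   # 8 corner pieces in any order
    corner_orientations = 3 ** 7         # the 8th corner's twist is fixed by the other 7
    edge_permutations = factorial(12)    # 12 edge pieces in any order
    edge_orientations = 2 ** 11          # the 12th edge's flip is fixed by the other 11
    parity_constraint = 2                # corner and edge permutation parities must match
    return (corner_permutations * corner_orientations
            * edge_permutations * edge_orientations) // parity_constraint


if __name__ == "__main__":
    print(f"{rubiks_cube_states():,}")  # 43,252,003,274,489,856,000
```

A state space this large is why hand-coding a response for every pattern is hopeless.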
(Figure: a learning function proposes solving sequences for a scrambled cube. Early guesses such as F2 U' R' L F2 R L' U' reach 1% accuracy, and R U r U R U2 r U2 reaches 2%. With more training, accuracy climbs to 20%, 40%, 60%, 80%, and finally 95%. The trained function, at 95% accuracy, is then given a new scramble, F2 R F R′ B′ D F D′ B D F, to solve.)
Don’t code the patterns; let
the system learn through data
Supervised Learning – How Machines Learn
Human intervention and validation required (e.g., photo classification and tagging)
• Train a model with positive/negative reinforcement: training data (input plus label) goes into a machine learning algorithm, its prediction is compared with the label, and the feedback adjusts the model. For example, the model predicts "It is a cat." and the label answers "No, it’s a dog."
• Infer from a model to obtain a prediction for new, unlabeled input
Unsupervised Learning
No human intervention required (e.g., customer segmentation)
• Input goes into a machine learning algorithm, which produces a prediction without any labels
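The supervised loop (predict, compare with the label, adjust the model) appears in its smallest form in the classic perceptron. A minimal sketch that illustrates the loop only; it is not how Amazon SageMaker trains models. The two features, the cat/dog labels, and the training points are hypothetical. The unsupervised case is not shown.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label) with label in {-1, +1}


def train_perceptron(data: List[Example], epochs: int = 20,
                     lr: float = 1.0) -> Tuple[List[float], float]:
    """Supervised learning loop: predict, compare with the label, adjust on mistakes."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:
                # Feedback: the prediction disagreed with the label, so nudge the model.
                weights = [w + lr * label * x for w, x in zip(weights, features)]
                bias += lr * label
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


def predict(weights: List[float], bias: float, features: Sequence[float]) -> int:
    """Inference: apply the trained model to a new, unlabeled input."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else -1


if __name__ == "__main__":
    # Hypothetical features: (ear pointiness, snout length); -1 = cat, +1 = dog.
    training = [((0.9, 0.2), -1), ((0.8, 0.3), -1), ((0.95, 0.1), -1),
                ((0.3, 0.8), 1), ((0.2, 0.9), 1), ((0.4, 0.7), 1)]
    w, b = train_perceptron(training)
    print("dog" if predict(w, b, (0.25, 0.85)) == 1 else "cat")
```

The perceptron is guaranteed to stop making mistakes only when the classes can be separated by a straight line, as they can in this toy data.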
Machine Learning at Amazon.com
• Retail: demand forecasting; vendor lead time prediction; pricing; packaging; substitute prediction
• Customers: recommendation; product search; product ads; shopping advice; customer problem detection
• Catalogue: browse-node classification; metadata validation; review analysis; product matching
• Text: in-book search; named-entity extraction; summarisation/X-Ray; plagiarism detection
• Seller: fraud detection; predictive help; seller search & crawling
• Images: visual search; product image enhancement; brand tracking
Personalized recommendation
Alexa, Hello!
AmazonFresh
Our Mission at AWS
Put AI and ML in the hands of every developer and data scientist
AI services
• Vision: Rekognition Image, Rekognition Video
• Speech: Polly, Transcribe
• Language: Translate, Comprehend
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize
• Textract
ML services
• Amazon SageMaker (build, train, deploy): pre-built algorithms & notebooks; data labeling (Ground Truth); one-click model training & tuning; optimization (Neo); one-click deployment & hosting; reinforcement learning; algorithms & models (AWS Marketplace for Machine Learning)
ML frameworks & infrastructure
• Frameworks, interfaces, infrastructure: EC2 P3 & P3N, EC2 C5, FPGAs, Greengrass, Elastic Inference
AWS AI Services: AI without worrying about ML
Vision: Amazon Rekognition
Key Features
Object & Scene Detection
Image Moderation
Facial Analysis
Facial Comparison
Facial Recognition
Celebrity Recognition
Rekognition Demo: Selfie Analyzer
Vision: Amazon Rekognition Video
Video Analysis
• Object and activity detection
• Person tracking
• Face recognition
• Real-time live stream
• Content moderation
• Celebrity recognition
Speech: Amazon Polly
Key Features
• 50 Voices
• 24 Languages
• Lip-Syncing & Text Highlighting
• Fine-grained Voice Control
• Custom Vocabularies
Language: Amazon Lex
Conversational interfaces for your applications, powered by the same Natural Language Understanding (NLU) and Automatic Speech Recognition (ASR) models as Alexa
Amazon Connect contact centers can use Amazon Lex for natural conversations
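A sketch of sending one user utterance to a Lex bot from Python, assuming boto3 and the Lex (V1) runtime client, `boto3.client("lex-runtime")`. Lex tracks the conversation per `userId`, so reusing the same ID across calls lets the bot elicit missing slots turn by turn. The bot name, alias, and user ID in the usage line are hypothetical.

```python
from typing import Any, Dict


def send_to_bot(client: Any, bot_name: str, bot_alias: str,
                user_id: str, text: str) -> Dict[str, Any]:
    """Send one text turn to a Lex bot and summarize the reply."""
    response = client.post_text(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,  # identifies this user's conversation with the bot
        inputText=text,
    )
    return {
        "intent": response.get("intentName"),
        "slots": response.get("slots") or {},
        # e.g. "ElicitSlot" while Lex still needs information, "Fulfilled" when done
        "state": response.get("dialogState"),
        "reply": response.get("message"),
    }
```

Usage: `send_to_bot(boto3.client("lex-runtime"), "OrderFlowers", "prod", "user-42", "I want to order roses")`.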
AWS ML Services
Democratizing Machine Learning
ML can be very complicated
Amazon SageMaker: build, train, and deploy ML at scale
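One way to see the "train" step concretely is the request behind a single SageMaker training job. The sketch below builds the keyword arguments for boto3's `create_training_job`; the container image URI, IAM role ARN, S3 paths, and instance type are placeholders, and the higher-level SageMaker Python SDK can issue an equivalent call for you.

```python
from typing import Any, Dict


def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         train_s3_uri: str, output_s3_uri: str,
                         instance_type: str = "ml.m5.xlarge",
                         instance_count: int = 1,
                         max_runtime_s: int = 3600) -> Dict[str, Any]:
    """Arguments for boto3.client("sagemaker").create_training_job(**request)."""
    return {
        "TrainingJobName": job_name,
        # Training container (placeholder URI) that reads data copied to local disk.
        "AlgorithmSpecification": {"TrainingImage": image_uri,
                                   "TrainingInputMode": "File"},
        "RoleArn": role_arn,  # role SageMaker assumes to read S3 (placeholder)
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",       # every object under this prefix
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # model artifacts land here
        "ResourceConfig": {"InstanceType": instance_type,
                           "InstanceCount": instance_count,
                           "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }
```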
How do you make it easier to obtain high-quality labeled data?
Amazon SageMaker: Build, train, and deploy ML
Successful models require high-quality data
Build highly accurate training datasets and reduce data labeling costs by up to 70% using machine learning
Amazon SageMaker Ground Truth
Label machine learning training data easily and accurately
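Ground Truth labeling jobs read their input from a manifest file in JSON Lines format, where each line names one object to label through a `source-ref` key. A minimal sketch of generating such a manifest for images already in S3; the bucket and keys are placeholders, and the resulting file would itself be uploaded to S3 and referenced when the labeling job is created.

```python
import json
from typing import Iterable


def input_manifest(bucket: str, keys: Iterable[str]) -> str:
    """Build a Ground Truth input manifest: one JSON object per line."""
    lines = [json.dumps({"source-ref": f"s3://{bucket}/{key}"}) for key in keys]
    # Terminate every line (including the last); an empty key list yields "".
    return "".join(line + "\n" for line in lines)
```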
Thank You


Data_Analytics_and_AI_ML

  • 1. Big Data Analytics and Machine Learning on AWS
  • 2. WHAT IS BIG DATA?
  • 3. Big Data and the 3Vs: Variety, Velocity, Volume
  • 4. The Cloud Advantage: elastic and highly scalable; no upfront capital expense; pay only for what you use; available on demand
  • 5. BIG DATA ANALYTICS
  • 6. Examples of Business Outcomes and Insights: security threat detection; user behavior analysis; enhanced customer experience; business intelligence; spending optimization; real-time alerting
  • 7. Examples of Big Data Sources: relational databases; NoSQL databases; web servers; mobile phones/tablets; third-party feeds; IoT; clickstream
  • 8. Examples of AWS Services for Big Data Analytics: EMR, EC2, Glacier, S3, Import/Export, Kinesis, Direct Connect, Machine Learning, Redshift, DynamoDB, AWS Database Migration Service, AWS Lambda, AWS IoT, AWS Data Pipeline, Amazon Kinesis Analytics, Amazon SNS, AWS Snowball, Amazon SWF, Amazon Athena, Amazon QuickSight, Amazon Aurora, AWS Glue
  • 9. Amazon S3: Object Storage. Security and Compliance: three different forms of encryption; encrypts data in transit when replicating across regions; log and monitor with CloudTrail; use ML to discover and protect sensitive data with Macie. Flexible Management: classify, report, and visualize data usage trends; tag objects to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention. Durability, Availability & Scalability: built for eleven nines of durability; data distributed across 3 physical facilities in an AWS region; can be automatically replicated to any other AWS region. Query in Place: run analytics and ML on the data lake without moving data; S3 Select can retrieve a subset of data, improving analytics performance by up to 400%.
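S3 Select pushes a SQL filter down to S3 so that only matching rows leave the bucket. A sketch assuming boto3's S3 client and a CSV object with a header row (so columns can be referenced by name); the results arrive as an event stream in which only the `Records` events carry data. Bucket, key, and column names are placeholders.

```python
from typing import Any


def s3_select_csv(client: Any, bucket: str, key: str, sql: str) -> str:
    """Run an S3 Select query against a CSV object and return CSV text."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=sql,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"},  # first row = column names
                            "CompressionType": "NONE"},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    for event in response["Payload"]:  # Records, Stats, Progress, End, ...
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join bytes before decoding: a record may be split across events.
    return b"".join(chunks).decode("utf-8")
```

Usage, with hypothetical names: `s3_select_csv(boto3.client("s3"), "my-bucket", "sales.csv", "SELECT s.user_id, s.amount FROM S3Object s WHERE CAST(s.amount AS FLOAT) > 100")`.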
  • 10. Amazon Redshift: Data Warehousing. Fast at scale: columnar storage technology improves I/O efficiency and scales query performance. Secure: audit everything; encrypt data end to end; extensive certifications and compliance. Open file formats: analyze optimized data formats on the latest SSDs, and all open data formats in Amazon S3. Inexpensive: as low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; start at $0.25 per hour.
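To make the "columnar storage improves I/O efficiency" point concrete, here is a toy model of how many bytes a single-column aggregate reads from a row store versus a column store. It is a sketch under stated assumptions: the table size, column count, and fixed 8-byte values are all hypothetical, and it ignores the per-column compression and block skipping that a real warehouse such as Redshift also applies.

```python
# Toy I/O model for "SELECT SUM(one_column) FROM t" on a hypothetical table.
VALUE_BYTES = 8       # assumption: every value is a fixed-width 8-byte field
ROWS = 1_000_000      # hypothetical row count
COLUMNS = 20          # hypothetical column count


def bytes_scanned_row_store(rows: int, columns: int) -> int:
    """A row store reads whole rows, so every column is scanned."""
    return rows * columns * VALUE_BYTES


def bytes_scanned_column_store(rows: int, columns_needed: int) -> int:
    """A column store reads only the columns the query references."""
    return rows * columns_needed * VALUE_BYTES


row_bytes = bytes_scanned_row_store(ROWS, COLUMNS)
col_bytes = bytes_scanned_column_store(ROWS, columns_needed=1)
print(f"row store: {row_bytes:,} bytes; column store: {col_bytes:,} bytes "
      f"({row_bytes // col_bytes}x less I/O)")
```

The saving scales with the ratio of total columns to referenced columns, which is why wide analytic tables benefit most.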
  • 11. Amazon Redshift Spectrum: extend the data warehouse to exabytes of data in the S3 data lake; the Redshift Spectrum query engine queries both Redshift data and the S3 data lake. Features: exabyte-scale Redshift SQL queries against S3; join data across Redshift and S3; scale compute and storage separately; stable query performance and unlimited concurrency; CSV, ORC, Grok, Avro, and Parquet data formats; pay only for the amount of data scanned.
  • 12. Amazon EMR: Big Data Processing. Low cost: per-second billing, EC2 Spot, Reserved Instances, and auto scaling reduce costs by 50–80%. Easy: launch fully managed Hadoop and Spark in minutes, with no cluster setup, node provisioning, or cluster tuning. Latest versions: updated with the latest open-source frameworks within 30 days of release. Use S3 storage: process data directly in the S3 data lake, securely and with high performance, using the EMRFS connector.
  • 13. Amazon Elasticsearch Service. Easy to use: fully managed; deploy production-ready clusters in minutes. Secure: access through a VPC keeps all traffic within the AWS network. Open: direct access to the open-source Elasticsearch APIs; supports Logstash and Kibana. Available: zone awareness replicates data between two Availability Zones; failed nodes are monitored and replaced automatically.
  • 14. Amazon Kinesis: Real Time. Kinesis Data Firehose: load data streams into AWS data stores. Kinesis Data Streams: build custom applications that analyze data streams. Kinesis Video Streams (new): capture, process, and store video streams for analytics. Kinesis Data Analytics: analyze data streams with SQL.
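Custom Kinesis Data Streams applications depend on how records are spread across shards. Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains that value. The sketch below reproduces that routing locally without calling AWS. It assumes a hypothetical 4-shard stream whose shards split the hash space evenly, and it assumes the digest is read as a big-endian unsigned integer.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Evenly split [0, 2**128 - 1] into contiguous, inclusive hash-key ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = shard_ranges(4)  # hypothetical 4-shard stream
for key in ["user-1", "user-2", "user-3"]:
    print(key, "-> shard", shard_for_key(key, ranges))
```

Because the mapping is deterministic, all records sharing a partition key land on the same shard and stay ordered there. A skewed key distribution therefore produces a "hot" shard.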
  • 15. Amazon Athena: Interactive Analysis. An interactive query service to analyze data in Amazon S3 using standard SQL, with no infrastructure to set up or manage and no data to load; the ability to run SQL queries on data archived in Amazon Glacier is coming soon. Query instantly: zero setup cost; just point to S3 and start querying. Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats and compression types, complex joins and data types. Easy: serverless, with zero infrastructure and zero administration; integrated with QuickSight. Pay per query: pay only for the queries you run; save 30–90% on per-query costs through compression.
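Because Athena bills on the bytes each query scans, compression lowers cost in direct proportion to the compression ratio. The worked estimate below is a sketch with labeled assumptions: the $5.00 per decimal terabyte price and the 10 MB per-query minimum are assumptions to verify against current Athena pricing for your Region, and the 4:1 compression ratio is hypothetical.

```python
DECIMAL_TB = 10 ** 12
PRICE_PER_TB_USD = 5.00           # assumption: check current Athena pricing for your Region
MIN_BILLED_BYTES = 10 * 10 ** 6   # assumption: 10 MB minimum charge per query


def athena_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated cost of one query, billed on bytes scanned with a per-query minimum."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / DECIMAL_TB * price_per_tb


raw_bytes = 1 * DECIMAL_TB          # hypothetical 1 TB of uncompressed data
compressed_bytes = raw_bytes // 4   # hypothetical 4:1 compression ratio

raw_cost = athena_query_cost(raw_bytes)
compressed_cost = athena_query_cost(compressed_bytes)
saving = 1 - compressed_cost / raw_cost
print(f"raw: ${raw_cost:.2f}, compressed: ${compressed_cost:.2f}, saving: {saving:.0%}")
```

With these assumptions, a full scan costs $5.00 uncompressed and $1.25 compressed, a 75% saving. That falls within the 30–90% range quoted on the slide.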
  • 16. Amazon QuickSight: easy; empowers everyone; seamless connectivity; fast analysis; serverless.
  • 17. Modern data architecture: data sources (origin) → ingest → speed (real-time) and scale (batch) layers → serving → insight consumers (destination).
  • 18. Modern data architecture (continued): insight consumers are data analysts, data scientists, business users, engagement platforms, and automation/events.
  • 19. Modern data architecture (continued): data sources are transactions, web logs/cookies, ERP, connected devices, and social media.
  • 20. Modern data architecture (continued): the serving layer delivers insights to enhance business applications and new digital services through data warehouse (Amazon Redshift), legacy apps (Amazon RDS), schemaless (Amazon Elasticsearch Service), direct query (Amazon Athena), near-zero latency (Amazon DynamoDB), and semi-structured/unstructured (Amazon EMR) stores.
  • 21. Modern data architecture (continued): ingestion uses AWS Database Migration Service, AWS Direct Connect, internet interfaces, and Amazon Kinesis; data is staged in Amazon S3 (the data lake) before serving.
  • 22. Modern data architecture (continued): in the scale (batch) layer, raw data lands in Amazon S3, Amazon EMR performs ETL into the staged data, and advanced analytics use MLlib; AWS CloudTrail, AWS IAM, Amazon CloudWatch, and AWS KMS span the pipeline for auditing, access control, monitoring, and encryption.
  • 23. Modern data architecture (continued): in the speed (real-time) layer, Amazon Kinesis captures events and Amazon EMR performs stream analysis.
  • 24. Modern data architecture (complete): combines all of the above and adds event scoring with Amazon AI, plus AWS Lambda event and response handlers that drive automation/events.
  • 25. A Sample Batch Analytics Pipeline: ad hoc access to data using Athena; Athena can also query aggregated datasets.
  • 26. Smart Applications | Machine Learning
  • 27. Clickstream Analysis
  • 28. Customer Success. Powered by AWS.
  • 29. Sysco is the leader in selling, marketing, and distributing food. Challenge: large volumes of data in multiple systems, plus high costs from maintaining an on-premises EDW deployment. Solution: migrated the on-premises solution to the cloud with Redshift, S3, EMR, and Athena.
  • 30. Analytics on the Data Lake (Sysco): consolidated data from multiple systems into a single S3 data lake. Raw data from marketing and other source systems is ingested into S3, prepared, and transformed; an ETL process loads the transformed data from S3 into Redshift. Data scientists use EMR notebooks, and business users report with Athena and Amazon Redshift Spectrum.
  • 31. FINRA oversees more than 3,000 securities firms doing business in the United States. Challenge: FINRA’s legacy system did not scale well, with up to 75 billion events per day and complex surveillance queries over 20+ PB of data. Solution: migrated their big data appliance to an S3 data lake and used EMR for ingestion and processing; migrated to RDS and is testing Aurora.
  • 32. FINRA Uses S3 to Build a Data Lake with EMR: required fast access across trillions of trade records (20+ PB); migrated from an on-premises system; uses Apache HBase on Amazon EMR to store and serve this data; uses the EMR engines Spark, Presto, and Hive to process data; lowered costs by 60% compared with the on-premises system. Components: Spark, Presto, Hive, and HBase on EMR; S3; Herd metastore.
  • 33. Nasdaq operates financial exchanges around the world and processes large volumes of data. Challenge: Nasdaq wanted to make its large historical data footprint available for analysis as a single dataset. Solution: use Amazon Redshift for interactive querying, and Amazon S3 as a data lake with Presto on EMR to process historical data.
  • 34. Nasdaq Uses AWS to Build a Data Lake: migrated its legacy on-premises warehouse to Amazon Redshift; 4.8B rows inserted per trading day (orders, trades, quotes); ingests data from multiple sources, validates it, and stages it in S3; Redshift reads data out of S3 for fast queries; Presto on EMR and S3 are used to analyze the massive historical dataset. Inputs: data from all 7 exchanges operated by Nasdaq (orders, quotes, trade executions) as flat files and from operational databases; SQL clients query the results.
  • 35. Data Lake Overview
  • 36. What is a Data Lake? A centralized repository for both structured and unstructured data that stores data as-is in open-source file formats to enable direct analytics.
  • 37. Why a Data Lake? Decouple storage from compute so each can scale; enable advanced analytics across all of your data sources; reduce ETL complexity and operational overhead; extend the platform as new database and analytics technologies are invented.
  • 38. Traditionally, Analytics Looked Like This: relational data from OLTP, ERP, CRM, and LOB systems is loaded into a data warehouse for business intelligence. Characteristics: TB to PB scale; schema defined prior to data load; operational and ad hoc reporting; large initial capex plus $$K per TB per year.
  • 39. Data Lakes Extend the Traditional Approach: besides OLTP, ERP, CRM, and LOB systems, the data lake takes in web, sensor, social, and device data, and a catalog serves data warehouse queries, big data processing, interactive and real-time analytics, business intelligence, and machine learning. Characteristics: TB to EB scale; all data in one place, a single source of truth; relational and non-relational data; decoupled (low-cost) storage and compute; schema on read; diverse analytical engines.
  • 40. Benefits of a Data Lake: All Data in One Place. Store and analyze all of your data, from all of your sources, in one centralized location. “Why is the data distributed in many locations? Where is the single source of truth?”
  • 41. Benefits of a Data Lake: Quick Ingest. Quickly ingest data without forcing it into a predefined schema. “How can I collect data quickly from various sources and store it efficiently?”
  • 42. Benefits of a Data Lake: Storage vs. Compute. Separating storage and compute allows you to scale each component as required. “How can I scale up with the volume of data being generated?”
  • 43. Benefits of a Data Lake: Schema on Read. “Is there a way I can apply multiple analytics and processing frameworks to the same data?” A data lake enables ad hoc analysis by applying schemas on read, not on write.
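Schema on read means records are stored exactly as they arrived, and each consumer applies its own schema only when it reads them. Two frameworks can therefore project different structures over the same bytes. In this minimal sketch, in-memory JSON Lines stand in for objects in S3. The field names and both schemas are hypothetical.

```python
import json
from typing import Any, Callable

# Raw events stored as-is: no schema is enforced at write time (hypothetical data).
RAW_LINES = [
    '{"user": "a", "page": "/home", "ms": "120"}',
    '{"user": "b", "page": "/cart", "ms": "340", "coupon": "SPRING"}',
    '{"user": "a", "page": "/cart"}',
]

Schema = dict[str, Callable[[Any], Any]]  # column name -> type converter


def read_with_schema(lines: list[str], schema: Schema) -> list[dict[str, Any]]:
    """Parse each raw line, then project and cast it to the schema; missing fields become None."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            column: (cast(record[column]) if column in record else None)
            for column, cast in schema.items()
        })
    return rows


# Two consumers apply different schemas to the same stored data.
latency_schema: Schema = {"page": str, "ms": int}
marketing_schema: Schema = {"user": str, "coupon": str}

print(read_with_schema(RAW_LINES, latency_schema))
print(read_with_schema(RAW_LINES, marketing_schema))
```

Nothing was rejected at ingest, even though one record lacks `ms` and only one has `coupon`. Each reader decides how to handle those gaps.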
  • 44. Building a Data Lake on AWS
  • 45. Why AWS? Implementing a data lake architecture requires a broad set of tools and technologies to serve an increasingly diverse set of applications and use cases.
  • 46. Data Lake on AWS. Central storage: Amazon S3 (scalable, secure, cost-effective). Catalog and search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service. Access and user interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito. Manage and secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch. Data ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service. Analytics and serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS.
  • 47. Why Amazon S3 for a Data Lake? Durable: designed for 11 9s of durability. Available: designed for 99.99% availability. High performance: multipart upload; range GET. Scalable: store as much as you need; scale storage and compute independently; no minimum usage commitments. Integrated: Amazon EMR, Amazon Redshift, Amazon DynamoDB. Easy to use: simple REST API; AWS SDKs; read-after-create consistency; event notification; lifecycle policies.
  • 48. What can you do with a Data Lake?
  • 49. Query Directly with Amazon Athena
  • 50. Analyze with Hadoop on Amazon EMR
  • 51. Create Visualizations with Amazon QuickSight
  • 52. Train ML Models with Amazon SageMaker
  • 53. Create a Central Data Catalog with AWS Glue
  • 54. Load into Downstream Services. Amazon Redshift: run complex analytic queries against petabytes of structured data. Amazon DynamoDB: a NoSQL database service that delivers consistent, single-digit millisecond latency at any scale. Amazon Aurora: a MySQL- and PostgreSQL-compatible relational database built for the cloud. Amazon Elasticsearch Service: delivers Elasticsearch’s real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.
  • 55. Data Movement into the Data Lake
  • 56. Data Sources: files, logs, streams, and databases.
  • 57. Data Sources: Databases (into Amazon S3)
  • 58. Change Data Capture: techniques to capture changes are timestamps, diff comparison, triggers, and the transaction log.
  • 59. Change Data Capture: Timestamp. Source rows before the change are stamped 4/18/18 03:00, 3/12/18 08:00, 9/25/17 02:30, and 2/04/18 01:00. After an update, one row is stamped 7/16/19 16:00, and the last run was 7/16/19 14:00. Rows modified since the last run are sent through Kinesis Data Firehose to Amazon S3.
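Timestamp-based capture keeps a high-water mark, the time of the last run, and selects only rows whose modification time is later than that mark. The sketch below replays the slide's example. Several details are assumptions: the row ids are hypothetical, each row is assumed to carry a last-modified timestamp, and the Firehose delivery step is replaced by simply returning the changed row ids.

```python
from datetime import datetime

# Hypothetical source table after the update: row id -> last-modified timestamp.
rows = {
    1: datetime(2018, 4, 18, 3, 0),
    2: datetime(2019, 7, 16, 16, 0),   # updated since the last run
    3: datetime(2017, 9, 25, 2, 30),
    4: datetime(2018, 2, 4, 1, 0),
}


def capture_changes(table: dict[int, datetime], last_run: datetime) -> tuple[list[int], datetime]:
    """Return ids modified after last_run, plus the new high-water mark."""
    changed = sorted(row_id for row_id, modified in table.items() if modified > last_run)
    new_mark = max([last_run, *(table[row_id] for row_id in changed)])
    return changed, new_mark


last_run = datetime(2019, 7, 16, 14, 0)
changed, last_run = capture_changes(rows, last_run)
print(changed, last_run)  # only row 2 is captured; the mark advances to 16:00
```

This approach is simple, but it cannot see deleted rows, and it relies on every writer updating the timestamp column.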
  • 60. Change Data Capture: Diff Compare. Snapshots taken on 6/15/18 at 03:00 (20180615T0300) and 6/16/18 at 03:00 (20180616T0300) are compared, and the differences are sent through Kinesis Data Firehose to Amazon S3.
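A diff compare takes two full snapshots keyed by primary key and derives inserts, updates, and deletes from set differences, with no cooperation from the source database. This minimal sketch uses dictionaries in place of the two daily snapshots. The keys and row contents are hypothetical.

```python
from typing import Any

Snapshot = dict[int, dict[str, Any]]  # primary key -> row contents


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[int]]:
    """Classify primary keys as inserted, updated, or deleted between two snapshots."""
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"insert": inserted, "update": updated, "delete": deleted}


snapshot_0615 = {1: {"amount": 300}, 2: {"amount": 800}, 3: {"amount": 230}}
snapshot_0616 = {1: {"amount": 300}, 2: {"amount": 1600}, 4: {"amount": 100}}
print(diff_snapshots(snapshot_0615, snapshot_0616))
```

Unlike the timestamp technique, this method does detect deletes. The trade-off is that it reads and compares the full dataset on every run.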
  • 61. Change Data Capture: Triggers. An update to a row in the Roster table (Id: 20982358, Name: Jean-Luc Picard, Rank: Captain, State: Agitated) fires a trigger that writes a record to the ChangeData table (Table: Roster, Id: 20982358, Operation: Update). A batch job (Job: ag8afh8) selects the pending records into ChangeDataBatch and writes the operations to Kinesis Data Firehose, which delivers them to Amazon S3.
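The trigger pattern can be tried locally with SQLite from Python's standard library. In this sketch, an `AFTER UPDATE` trigger on `Roster` appends a row to `ChangeData`, and a batch step reads and then clears the pending records, which a real pipeline would send to Firehose. The table layouts follow the slide loosely. The `Seq` column and the prior "Calm" state are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Roster (Id INTEGER PRIMARY KEY, Name TEXT, Rank TEXT, State TEXT);
CREATE TABLE ChangeData (Seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         TableName TEXT, Id INTEGER, Operation TEXT);

-- Every UPDATE on Roster records the change in ChangeData.
CREATE TRIGGER roster_update AFTER UPDATE ON Roster
BEGIN
    INSERT INTO ChangeData (TableName, Id, Operation) VALUES ('Roster', NEW.Id, 'Update');
END;
""")

conn.execute("INSERT INTO Roster VALUES (20982358, 'Jean-Luc Picard', 'Captain', 'Calm')")
conn.execute("UPDATE Roster SET State = 'Agitated' WHERE Id = 20982358")

# Batch job: read pending changes (to be sent downstream), then delete only what was read.
with conn:
    batch = conn.execute(
        "SELECT Seq, TableName, Id, Operation FROM ChangeData ORDER BY Seq").fetchall()
    if batch:
        conn.execute("DELETE FROM ChangeData WHERE Seq <= ?", (batch[-1][0],))

print(batch)
```

The batch step deletes only up to the last sequence number it read. Changes that arrive between the `SELECT` and the `DELETE` are therefore kept for the next run.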
  • 62. Change Data Capture: Database Logs. Changes are read from the database transaction log (e.g., Tx001.log). Log structure: a file header (LOG_FILE_HDR_SIZE, OS_FILE_LOG_BLOCK_SIZE, FORMAT, CHECKSUM, LOG_CHECKPOINT_1, LOG_CHECKPOINT_2, with Checkpoint_lsn, Checkpoint_no, and Log.buf_size) followed by log blocks (LOG_BLOCK_HDR_SIZE, Hdr_no, […]); the rest of the format is marked “???”.
  • 63. Database Migration Service. AWS Database Migration Service (AWS DMS): easily and securely migrate and/or replicate your databases and data warehouses to AWS. AWS Schema Conversion Tool (AWS SCT): convert your commercial database and data warehouse schemas to open-source engines or AWS-native services, such as Amazon Aurora and Redshift.
  • 64. When to use DMS and SCT? Modernize: move your database tier from commercial engines to open source or to Amazon Aurora, and your data warehouse from commercial engines to Redshift. Migrate: business-critical applications; from EC2-Classic to VPC; your data warehouse to Redshift; upgrade to a minor version. Replicate: create cross-Region read replicas; run your analytics in the cloud; keep your dev/test and production environments in sync.
  • 65. Data Sources: Files (into Amazon S3)
  • 66. Files: Optimizing Transfers. Available services: S3 multipart upload; S3 Transfer Acceleration; AWS Direct Connect; AWS DataSync; AWS Transfer for SFTP; AWS Snowball/Snowmobile.
  • 67. Uploading to Amazon S3: Amazon S3 supports both a single-part upload API and a multipart upload API. A single-part upload supports objects up to 5 GB in size; a multipart upload supports objects up to 5 TB. Multipart upload also lets you maximize throughput by uploading parts on parallel threads.
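A multipart upload starts by splitting the object into byte ranges, one per part. The planning step below is a sketch that assumes the commonly documented S3 limits: parts of at least 5 MiB (except the last) and at most 10,000 parts per upload. The 8 MiB preferred part size is a hypothetical default. The code only plans the ranges and does not upload anything.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB    # assumption: S3 minimum part size for every part except the last
MAX_PARTS = 10_000    # assumption: S3 maximum number of parts per upload


def plan_parts(object_size: int, preferred_part: int = 8 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) ranges, growing the part size if needed to stay within MAX_PARTS."""
    part_size = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    return [(offset, min(part_size, object_size - offset))
            for offset in range(0, object_size, part_size)]


parts = plan_parts(100 * MIB)
print(len(parts), "parts; last part:", parts[-1])
```

Each range would then become one part request, and the requests can run on parallel threads. That parallelism is where the throughput gain on the slide comes from. In practice, boto3's managed `upload_file` performs this splitting and parallelism itself, tuned through `boto3.s3.transfer.TransferConfig` (`multipart_threshold`, `multipart_chunksize`, `max_concurrency`).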
  • 68. S3 Transfer Acceleration: PUT requests from the uploader go through the nearest AWS edge location. From there, data transits the AWS private network rather than the internet to the S3 bucket, and the private network optimizes throughput and latency to the AWS Region. Data is not stored in the edge cache.
  • 69. AWS Direct Connect: the customer gateway in the corporate data center connects through a customer/partner router in the customer/partner cage at a Direct Connect location to the AWS Direct Connect endpoint in the AWS cage. A private virtual interface reaches EC2 in a virtual private cloud in the AWS Region, and a public virtual interface reaches Amazon S3; an Amazon S3 VPC endpoint is also shown.
  • 70. AWS DataSync: an online transfer service that simplifies, automates, and accelerates moving data between on-premises storage and AWS. It combines the speed and reliability of network acceleration software with the cost-effectiveness of open-source tools. Benefits: fast data transfer; cost-effective; easy to use; secure and reliable; cloud integrated.
  • 71. AWS Transfer for SFTP: a fully managed SFTP service for Amazon S3. Benefits: native integration with AWS services; simple to use; cost-effective; fully managed in AWS; secure and compliant; seamless migration of existing workflows.
  • 72. AWS Snowball/Snowmobile, by use case: cloud migration and disaster recovery use AWS Snowball; Internet of Things (IoT) and remote locations use AWS Snowball Edge; migrating exabytes of data uses AWS Snowmobile.
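Whether a physical device beats the network comes down to simple arithmetic: how many days the data would take to cross the available link. The helper below is a sketch. It uses decimal terabytes, and its 80% average link utilization is a hypothetical planning assumption rather than an AWS figure.

```python
def transfer_days(terabytes: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Days needed to move `terabytes` (decimal TB) over a link at the given average utilization."""
    bits = terabytes * 10 ** 12 * 8
    seconds = bits / (link_gbps * 10 ** 9 * utilization)
    return seconds / 86_400


# Hypothetical scenario: 100 TB over a 1 Gbps link at 80% utilization.
print(f"{transfer_days(100, 1.0):.1f} days")
```

At about 11.6 days for 100 TB on a dedicated 1 Gbps link, shipping a device can be faster for bulk migrations. Each tenfold increase in data volume multiplies that time by ten.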
  • 73. Data Sources: Streams (into Amazon S3)
  • 74. Streams: Collecting and Analyzing. Amazon Kinesis; Amazon Managed Streaming for Apache Kafka (Amazon MSK); example: clickstream analytics.
  • 75. Amazon Kinesis: Stream Processing on AWS. Kinesis Data Firehose: buffers the records in a stream into a single output for more efficient storage, and automatically flushes the buffer to S3, Amazon Elasticsearch Service, Redshift, or Splunk. Kinesis Data Analytics: creates time windows over streams and performs aggregate operations using SQL; joins multiple streams and outputs to new streams. Kinesis Data Streams: captures streaming data for downstream processing and allows multiple processors to read a stream at their own rate.
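The "time windows over streams" idea is easiest to see with a tumbling window: fixed, non-overlapping intervals, each aggregated independently. Kinesis Data Analytics expresses this in SQL with a GROUP BY over fixed time intervals. This Python sketch only illustrates the windowing logic. The click events and the 60-second window are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events: list[tuple[int, str]],
                           window_seconds: int) -> dict[int, dict[str, int]]:
    """Group (epoch_seconds, key) events into fixed, non-overlapping windows and count per key."""
    windows: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        window_start = timestamp - timestamp % window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical clickstream: (seconds since start, page).
clicks = [(0, "/home"), (12, "/cart"), (59, "/home"), (60, "/home"), (95, "/checkout")]
print(tumbling_window_counts(clicks, 60))
```

Each event belongs to exactly one window, so windows can be emitted and discarded as soon as they close. A sliding window would instead let events count toward several overlapping windows.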
  • 76. Summary: Ingestion. Files, streams, logs, and databases are ingested into Amazon S3 using Amazon API Gateway, the Kinesis Agent, AWS DMS, Kinesis Data Firehose, and AWS DataSync. Example layout of s3://datalake/: /vendorfeeds (/vendorA, /vendorB); /clickstream (/orders, /vendors, /customers); /app_logs (/instance1, /instance2); /syslogs (/instance1, /instance2); /databases (/customers, /orders, /vendors).
  • 77. Consuming Data from the Data Lake
  • 78. Anti-Pattern: Everything → Query
  • 79. Also an Anti-Pattern: Everything → Query
  • 80. One tool to rule them all
  • 81. Where do I start? Understand your data: data structure, access patterns and characteristics, temperature, cost, size. Know your audience: business users, data scientists, developers. Select the right service.
  • 82. Understand your Data. Data temperature runs from hot through warm to cold: as data gets colder, latency and data volume go from low to high, while request rate and cost per GB go from high to low. Storage categories (in-memory, NoSQL, search, warehouse, object, archival) also differ in how much data structure they require, from low to high.
  • 83. Understand your Data, mapped to services: in-memory: Amazon ElastiCache; NoSQL: Amazon DynamoDB; search: Amazon ES; warehouse: Amazon Redshift; object: Amazon S3; archival: Amazon Glacier. The same hot-to-cold axes apply: latency, data volume, request rate, cost per GB, and data structure.
  • 84. Roles, priorities, and needs. Data Visualizer: creates engaging visual and narrative journeys for analytical solutions; needs visualization, dashboards, and reporting. Data Product Manager: manages data as a product, ensures freshness and consistency of data, understands lineage and compliance needs, and treats data scientists as customers; needs reports on data quality and errors. DevOps Engineer: monitors for reliability and quickly diagnoses deployment or availability issues; needs ad hoc querying and dashboards. Data Scientist: makes sense of data, generates and communicates insights to improve or create business processes, and creates predictive ML models to support them; needs ad hoc querying and robust ML tools. Data Engineer: builds scalable pipelines, and transforms and loads data into structures, complete with metadata, that data scientists can readily consume; needs ad hoc querying and quick visualization. Business Sponsor: vets prioritization and ROI, funds projects, and provides ongoing feedback; needs reporting and dashboards.
  • 85. Overview of AI/ML
  • 86. AI vs Machine Learning vs Deep Learning. Artificial intelligence: machines or programs exhibiting intelligence. Machine learning: learning without being explicitly programmed. Deep learning: learning based on deep neural networks.
  • 87. A Closer Look at Machine Learning, and When to Use It
  • 89. 43,252,003,274,489,856,000: 43 quintillion unique combinations of a Rubik's Cube.
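The 43 quintillion figure comes from a standard counting argument. Corners and edges can be permuted and oriented independently, except for three constraints: the last corner's twist is fixed, the last edge's flip is fixed, and overall permutation parity must be even. The factor of 2 lost to parity is conventionally written as 12!/2. The short computation below reproduces the slide's number.

```python
from math import factorial

corner_permutations = factorial(8)      # 8 corner pieces can be arranged in 8! ways
corner_orientations = 3 ** 7            # 7 corners twist freely; the 8th is determined
edge_permutations = factorial(12) // 2  # parity constraint halves the reachable permutations
edge_orientations = 2 ** 11             # 11 edges flip freely; the 12th is determined

total = corner_permutations * corner_orientations * edge_permutations * edge_orientations
print(f"{total:,}")  # 43,252,003,274,489,856,000
```

A search space this large is why the slides that follow contrast hand-coding solutions with learning a function from data.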
  • 90. Learning function: F2 U' R' L F2 R L' U'
  • 91. Learning function: F2 U' R' L F2 R L' U' gives 1% accuracy; R U r U R U2 r U gives 2% accuracy.
  • 92. Learning function: accuracy improves from 2% to 20%, 40%, 60%, 80%, and 95%.
  • 93. Learning function at 95% accuracy: ? → F2 R F R′ B′ D F D′ B D F
  • 95. Don’t code the patterns; let the system learn through data.
  • 96. Train a model with positive/negative reinforcement; infer from the model to obtain a prediction; feed the results back. Cycle: data → model → feedback.
  • 97. Supervised Learning: “It is a cat.” “No, it’s a dog.”
  • 98. Supervised Learning: How Machines Learn. Human intervention and validation are required (e.g., photo classification and tagging). Training data (an input with the label “Dog”) goes into a machine learning algorithm; when the prediction (“Cat”) does not match the label, the model is adjusted.
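The predict, compare with the label, adjust loop on the slide is exactly what a perceptron does, so one makes a compact illustration. The model changes only when its prediction disagrees with the human-provided label. This is a toy, not how production image classifiers are built: the two numeric features per photo and all data values are hypothetical.

```python
# Hypothetical labelled training data: two numeric features extracted per photo -> label.
TRAINING_DATA = [
    ((0.0, 5.0), "cat"), ((1.0, 4.0), "cat"), ((0.0, 3.0), "cat"),
    ((5.0, 0.0), "dog"), ((4.0, 1.0), "dog"), ((6.0, 1.0), "dog"),
]
TARGETS = {"cat": -1, "dog": 1}


def predict(weights: list[float], bias: float, features: tuple[float, float]) -> str:
    score = bias + sum(w * x for w, x in zip(weights, features))
    return "dog" if score > 0 else "cat"


def train_perceptron(data, epochs: int = 100, learning_rate: float = 0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            if predict(weights, bias, features) != label:  # prediction disagrees with the label
                mistakes += 1
                target = TARGETS[label]
                # Adjust the model toward the correct label.
                weights = [w + learning_rate * target * x for w, x in zip(weights, features)]
                bias += learning_rate * target
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(TRAINING_DATA)
print(predict(weights, bias, (5.0, 2.0)), predict(weights, bias, (1.0, 6.0)))  # dog cat
```

Training stops once a full pass over the labelled data produces no mistakes. The learned weights then classify new, unlabelled inputs.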
  • 99. Unsupervised Learning: no human intervention required (e.g., customer segmentation). Input goes into a machine learning algorithm, which produces a prediction.
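Customer segmentation is typically done with clustering, and k-means is the classic example: it groups points without ever seeing a label. The sketch below is plain Lloyd's algorithm. The customer data is hypothetical, the naive "first k points" initialization is an assumption chosen for determinism, and features are left unscaled for simplicity.

```python
import math

# Hypothetical customers: (annual_spend, visits_per_year).
CUSTOMERS = [(100, 2), (120, 3), (90, 1), (900, 20), (950, 25), (1000, 22)]


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); no labels are used anywhere."""
    centroids = [tuple(map(float, p)) for p in points[:k]]  # naive init: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


centroids, segments = kmeans(CUSTOMERS, k=2)
for centroid, members in zip(centroids, segments):
    print([round(c, 1) for c in centroid], members)
```

The two segments that emerge (low spend and high spend) were never labelled by a human. In real use, features would normally be scaled first, so that spend does not dominate the distance.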
  • 100. Machine Learning at Amazon.com. Retail: demand forecasting; vendor lead time prediction; pricing; packaging; substitute prediction. Customers: recommendations; product search; product ads; shopping advice; customer problem detection. Catalogue: browse-node classification; meta-data validation; review analysis; product matching. Text: in-book search; named-entity extraction; summarisation/X-Ray; plagiarism detection. Seller: fraud detection; predictive help; seller search and crawling. Images: visual search; product image enhancement; brand tracking.
  • 102. Alexa, Hello!
  • 108. Our Mission at AWS: put AI and ML in the hands of every developer and data scientist.
  • 109. The AWS ML stack. AI services: Vision (Amazon Rekognition Image, Amazon Rekognition Video, Amazon Textract); Speech (Amazon Polly, Amazon Transcribe); Language (Amazon Translate, Amazon Comprehend); Chatbots (Amazon Lex); Forecasting (Amazon Forecast); Recommendations (Amazon Personalize). ML services: Amazon SageMaker (build, train, deploy), with pre-built algorithms and notebooks, data labeling (Ground Truth), one-click model training and tuning, optimization (Neo), one-click deployment and hosting, reinforcement learning, and algorithms and models (AWS Marketplace for Machine Learning). ML frameworks and infrastructure: frameworks; interfaces; infrastructure (EC2 P3 & P3N, EC2 C5, FPGAs, Greengrass, Elastic Inference).
  • 111. Vision: Amazon Rekognition. Key features: object and scene detection; image moderation; facial analysis; facial comparison; facial recognition; celebrity recognition.
  • 113. Vision: Amazon Rekognition Video. Video analysis: object and activity detection; person tracking; face recognition; real-time live stream; content moderation; celebrity recognition.
  • 114. Speech: Amazon Polly. Key features: 50 voices; 24 languages; lip-syncing and text highlighting; fine-grained voice control; custom vocabularies.
  • 115. Language: Amazon Lex. Conversational interfaces for your applications, powered by the same natural language understanding (NLU) and automatic speech recognition (ASR) models as Alexa.
  • 116. Amazon Connect contact centers can use Amazon Lex for natural conversations.
  • 118. ML can be very complicated.
  • 119. Amazon SageMaker: build, train, and deploy ML at scale.
  • 125. How do you make it easier to obtain high-quality labeled data?
  • 126. Amazon SageMaker: Build, train, and deploy ML
  • 127. Successful models require high-quality data.
  • 128. Build highly accurate training datasets and reduce data labeling costs by up to 70% using machine learning.
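Machine learning cuts labeling cost mainly by letting a model label the items it is confident about and sending only the uncertain ones to human annotators. That is the general idea behind Ground Truth's automated data labeling. The sketch below illustrates only that routing decision, not the service's actual algorithm. The item names, model confidences, and the 0.9 threshold are hypothetical.

```python
def route_for_labeling(predictions, threshold=0.9):
    """Split (item, label, confidence) predictions into auto-labelled items and a human queue."""
    auto_labeled, human_queue = {}, []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled[item] = label     # confident enough to accept the model's label
        else:
            human_queue.append(item)       # uncertain: send to a human annotator
    return auto_labeled, human_queue


predictions = [("img1", "dog", 0.98), ("img2", "cat", 0.55),
               ("img3", "cat", 0.93), ("img4", "dog", 0.71)]
auto, humans = route_for_labeling(predictions)
share_automated = len(auto) / len(predictions)
print(auto, humans, f"{share_automated:.0%} labelled without a human")
```

The human-labelled items can then be fed back to retrain the model, so that a larger share clears the threshold on the next pass. Raising the threshold trades labeling cost for label quality.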
  • 129. Amazon SageMaker Ground Truth: label machine learning training data easily and accurately.
  • 130. Thank You