AWS Partner: Data Analytics on
AWS – Technical
Amey Birje
Sr AWS Partner Trainer
Module 1: Course Introduction
Course objectives
In this course, you will learn how to:
• Identify Amazon Web Services (AWS) services in the AWS analytics stack
• Describe decision points and technology selections for data analytics architectures
• Discuss the AWS Data Pipeline and the customer data analytics journey using the Data Flywheel
• Describe five AWS data analytics technical solutions:
• Modernizing a data warehouse with Amazon Redshift
• Data lakes
• Streaming data
• Data governance
• Machine learning (ML)
• Locate and use AWS Partner Network (APN) Partner resources for opportunities and training
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. 3
About this course
• This course is for technical professionals at APN Consulting Partner organizations
who are engaged in pre-sales discussions with customers to help architect data
analytic solutions on AWS and answer technical questions about using AWS
data analytics services.
• This 1-day course is focused on educating technical professionals with sufficient
technical knowledge on AWS data analytics services and solutions to
successfully engage with and help customers.
• This course is not designed to be a technical deep dive into AWS data analytics
services and solutions. It provides the necessary resources and learning path
towards gaining deeper knowledge into the services.
Module 2: AWS Data Analytics
Portfolio
Objectives
In this module, you will learn how to:
• Understand customer challenges related to data analytics in their business
• Provide a technical overview of AWS data analytics portfolio
• Discuss technical advantages and position of data analytics solutions on AWS
• Explain how to build a data analytics pipeline
• Explain the Data Flywheel
Customer challenges and
opportunities for APN Partners
New realities
https://assets.ey.com/content/dam/ey-sites/ey-com/en_gl/topics/workforce/Seagate-WP-DataAge2025-March-2017.pdf
Data is ever-growing
• Data grows >10x every 5 years
• Data platforms must live for 15 years
• Data platforms must scale 1,000x
There is more data than people think
New realities
By making 10% more data accessible, a typical Fortune 1000
company will see a $65 million increase in net income.*
• Explosion of data: connected devices, apps, and systems generate more data than ever before.
• Pay-as-you-go pricing allows organizations to analyze data to gain insights.
• Demand is growing for faster decision making on real-time data.
*Source: Forbes Online; New Vantage Partners - Big Data Executive Survey
https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#5062b36b578b
Customers need your help
85% of businesses want
to be data driven,
but only 37% have
been successful.
https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#51efb027578b
http://newvantage.com/wp-content/uploads/2017/01/Big-Data-Executive-Survey-2017-Executive-Summary.pdf
Common data analytics challenges
Top four challenges
involve knowledge, skill,
security, and privacy
This is your opportunity
Data security (unauthorized access to company data)
Data privacy issues (safety of personal data)
What challenges do you see when using big data
analytics/technologies? (n=545)
Inadequate technical know-how in our company
53%
49%
48%
48%
Inadequate analytical know-how in our company
https://bi-survey.com/challenges-big-data-analytics
AWS data analytics portfolio
overview
Secure infrastructure for analytics
Customers need multiple levels of security, identity and access
management, encryption, and compliance to secure their data lake.
Compliance
AWS Artifact
Amazon Inspector
AWS CloudHSM
Amazon Cognito
AWS CloudTrail
Security
Amazon GuardDuty
AWS Shield
AWS Well-Architected Tool
Amazon Macie
Amazon Virtual Private
Cloud (Amazon VPC)
Encryption
AWS Certificate Manager
Private Certificate Authority
(ACM Private CA)
AWS Key Management Service
(AWS KMS)
Encryption at rest
Encryption in transit
Bring your own keys,
hardware security module
(HSM) support
Identity
AWS Identity and Access
Management (IAM)
AWS Single Sign-On
Amazon Cloud Directory
AWS Directory Service
AWS Organizations
AWS data analytics portfolio
AWS Database Migration Service (AWS DMS) | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams | Amazon Managed Streaming for Apache Kafka
Data movement
Amazon
QuickSight
Amazon
SageMaker
Amazon
Comprehend
Amazon
Lex
Amazon
Polly
Amazon
Rekognition
Amazon
Translate
Amazon
Pinpoint
AWS Data
Exchange
Data visualization, engagement, and machine learning
Amazon
Redshift
Amazon EMR
(Spark and Presto)
Amazon
Athena
Amazon OpenSearch
Service
Amazon Kinesis
Data Analytics
AWS Glue
(Spark and Python)
Analytics
Amazon Simple Storage Service (Amazon S3)
Amazon S3 Glacier
AWS Glue
AWS Lake Formation
Data lake infrastructure and management
Data movement services
Help customers move data from on premises to the cloud
AWS DMS AWS Snowball AWS Snowmobile
Amazon Managed
Streaming for
Kafka
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Firehose
Data lake services
Customers are constrained by volume, variety, veracity, and velocity of
on-premises data, and data silos pose a major challenge.
Amazon S3 Amazon S3 Glacier AWS Lake Formation AWS Glue
Analytics services
Help customers extract value out of their data
Amazon Redshift Amazon EMR AWS Glue
Amazon
OpenSearch
Amazon Athena Amazon Kinesis
Data Analytics
Data visualization, engagement, and
machine learning services
Help customers understand and visualize their data, and use
machine learning (ML) for advanced analytics and predictions
Amazon QuickSight Amazon SageMaker
AWS Data Exchange
AWS value proposition
Standards, formats, and open source
• Apache Flink
• Ganglia
• Apache HBase
• HCatalog
• Hadoop Distributed
File System (HDFS)
• Apache Hive
• Hudi
• Java
• JupyterHub
• Apache Kafka
• Apache Livy
• Apache Mahout
• MapReduce
• Apache MXNet
• MySQL
• Apache Oozie
• Apache ORC
• Apache Parquet
• Phoenix
• Apache Pig
• Presto
• Python
• PyTorch
• R
• Scala
• Apache Spark
• Sqoop
• SQL
• TensorFlow
• Tez
• Yarn
• Apache Zeppelin
• Apache Zookeeper
…and many more
AWS alternatives to open source
Amazon EMR
Amazon
OpenSearch Service
Managed Streaming
for Apache Kafka
Real-time
analytics
Kafka
Operational
analytics
Elasticsearch
Logstash
Kibana
Spark, Hive, Presto,
Flink, HBase
Hadoop
Spark
Data analytics pipeline
Data management challenges
How can customers:
• Collect a variety of data types accumulating at varying velocities?
• Collect data from numerous sources accumulating at differing velocities?
• Store massive amounts of data without running out of space?
• Cleanse data and improve its quality before it is analyzed?
Can they automate these steps?
Data analytics pipeline
Collect
Store
Process and
analyze
Visualize
Insights
Time-to-answer (latency)
Balance of throughput and cost
Data Insights
https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf?did=wp_card&trk=wp_card
Data pipeline challenges
Building a data pipeline is challenging. Customers must:
• Manage updates, patches, and software integrations
• Handle increased overhead costs plus need for support
• Maintain focus on the core task of building applications that lead to data insights
AWS data analytics pipeline services
Collect Store Process and analyze Visualize
Automate
Amazon Kinesis
Data Firehose
AWS Direct
Connect
Amazon Kinesis
Data Streams
AWS
Snowball
Amazon
S3 Glacier
Amazon S3
Amazon DynamoDB Amazon RDS
Amazon Aurora
Amazon OpenSearch
Amazon EMR
Amazon Kinesis
Data Analytics
Amazon
QuickSight
Amazon Redshift
Amazon Athena
AWS Database Migration Service
Amazon
SageMaker
AWS Glue
Amazon Managed
Streaming for
Kafka
Data Flywheel
Data Flywheel and customer journey
Build data-driven
applications
Modernize data
warehouse and
build a data lake
Migrate data and
workloads to the cloud
✓ Save time
✓ Save costs
Store and
manage data
✓ Agility
✓ Global distribution
✓ Scale and performance
✓ New and faster insights
✓ Broader access to analytics
Innovate with
machine learning
✓ Better experiences
✓ Deeper engagement
✓ Efficient processes
Attract new customers
Generate more data
Data
https://pages.awscloud.com/data-flywheel.html
Summary
In this module, you learned about:
• Customer challenges related to data analytics
• AWS data analytics portfolio
• Technical benefits of AWS data analytics solutions
• Data analytics pipeline
• Data Flywheel
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission
from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at:
aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
Thank you
Module 3: Data Analytics
Solutions on AWS – Part I
Objectives
In this module, you will learn how to:
• Explain data migration options from on premises to the AWS Cloud
• Describe two AWS data analytics technical solutions
• Modernizing a data warehouse with Amazon Redshift
• Data lakes
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
Data migration options
Journey to a modern data architecture
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Data
warehouse
modernization
Types of data
Data
governance
Machine
learning
Real-time analytics
with streaming data
AWS data migration options
AWS Snowball
AWS Storage
Gateway
Amazon S3 Transfer
Acceleration
AWS Direct
Connect
AWS Database
Migration Service
Amazon Kinesis
Data Firehose
• File gateway
• Tape gateway
• Volume gateway
• Snowball Edge storage
optimized
• AWS Snowmobile
Solution 1: Modernizing a data
warehouse with Amazon Redshift
Data warehouses
Traditional architecture and on-premises
data warehouse challenges
• Difficult to scale
• Long lead times for hardware procurement
• Complex upgrades are the norm
• High overhead costs for administration
• Expensive licensing and support costs
• Proprietary formats do not support newer open data formats, which results in data silos
• Data not cataloged, unreliable quality
• Licensing cost limits number of users and how much data can be accommodated
• Difficult to integrate with services and tools
Amazon Redshift
Amazon Redshift
A fully managed data warehouse that is highly integrated with
other AWS services. Features include:
• Optimized for high performance
• Support for open file formats
• Petabyte-scale capability
• Support for complex queries and analytics, with data visualization
tools
• Secure end-to-end encryption and certified compliance
• Service Level Agreement (SLA) of 99.9 percent
• Based on the open source PostgreSQL database
• Cost efficient
https://aws.amazon.com/redshift/pricing/
Amazon Redshift
Secure data warehouse that extends seamlessly to a data lake
Amazon Redshift performance features
• Massively parallel processing (MPP): Breaks a large job into smaller tasks, then distributes the tasks to multiple compute nodes. Result: Faster processing time.
• Columnar storage: Data from each column is stored together so the data can be accessed faster, without scanning and sorting all other columns. Result: Compression of stored data improves performance.
• Shared-nothing architecture: Independent and resilient nodes without any dependencies. Result: Improves scalability.
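The MPP and columnar storage ideas on this slide can be illustrated with a toy sketch. This is not how Amazon Redshift is implemented; the sample table, `to_columns`, and `parallel_sum` are hypothetical names chosen for illustration. It only shows that a column-oriented layout lets an aggregate read one column, and that the work can be split into chunks that are processed in parallel and then combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample table, stored row by row.
rows = [
    {"order_id": 1, "region": "east", "amount": 10.0},
    {"order_id": 2, "region": "west", "amount": 25.5},
    {"order_id": 3, "region": "east", "amount": 4.5},
    {"order_id": 4, "region": "west", "amount": 60.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def parallel_sum(values, workers=2):
    """Split a column into chunks, sum each chunk on a worker, then combine."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


columns = to_columns(rows)
# Only the "amount" column is touched; the other columns are never read.
total_amount = parallel_sum(columns["amount"])
```

Threads stand in for compute nodes here purely to keep the sketch self-contained.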
Amazon Redshift architecture
Client applications
Leader node
Compute Node 1 Compute Node 2
Data warehouse cluster
Java Database
Connectivity
(JDBC)
Open Database
Connectivity
(ODBC)
https://docs.aws.amazon.com/redshift/index.html
Node slices Node slices
Leader node
Responsible for communication with the client application and
compute nodes
Amazon Redshift leader
node:
• SQL endpoint
• Metadata
• Query compilation and
optimization
• Coordinates parallel SQL
processing
• Machine learning (ML)
optimizations
Leader node
Compute node 1 Compute node 2
Data warehouse cluster
Node slices Node slices
Compute node
• SQL running powerhouses
• Compute node can load, unload, backup, and
restore data to and from Amazon S3.
• Clusters range from 1 to 128 compute nodes.
Runs queries in parallel and returns the result to the leader node
Leader node
Compute node 1 Compute node 2
Data warehouse cluster
Node slices Node slices
Compute node slices
Slices are a symmetric multiprocessing (SMP) mechanism.
Slice 1 | Slice 2
Local disk Local disk
Virtual core Virtual core
7.5 GB
RAM
7.5 GB
RAM
• Partitioned into slices.
• Slices work in parallel to
complete operations.
• Virtual processors contained in
each compute node.
• Each slice is allocated an equal
amount of memory, compute
allowance, and disk space.
• Each slice operates in parallel
but can request data from
other slices.
Compute node 1 Compute node 2
Data warehouse cluster
Node slices Node slices
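A minimal sketch of how rows might be spread across slices so that each slice can work on its own share in parallel. The hash function (`zlib.crc32`), the `customer_id` key, and the count of four slices (two nodes with two slices each) are assumptions for illustration; the slide does not describe the hashing that Amazon Redshift actually uses.

```python
import zlib
from collections import defaultdict


def slice_for(key, num_slices):
    """Map a distribution-key value to a slice using a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key, num_slices):
    """Assign every row to exactly one slice based on its key value."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key], num_slices)].append(row)
    return slices


# Hypothetical rows; 2 compute nodes x 2 slices each = 4 slices.
NUM_SLICES = 4
rows = [
    {"customer_id": cid, "amount": amount}
    for cid, amount in [(101, 5.0), (102, 7.5), (101, 2.5), (103, 1.0), (104, 9.0)]
]
slices = distribute(rows, "customer_id", NUM_SLICES)

# Each slice aggregates its own rows; the partial results are then combined.
slice_totals = {sid: sum(r["amount"] for r in part) for sid, part in slices.items()}
grand_total = sum(slice_totals.values())
```

Rows that share a key value always land on the same slice, which is why a per-key aggregate can be computed without moving data between slices in this sketch.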
Amazon Redshift cluster resizing:
Two approaches
Elastic resize
Existing cluster is modified to add or remove nodes in
two stages.
Stage 1
• Cluster is temporarily unavailable while elastic
resize migrates cluster metadata.
• Typically completes in minutes.
• Amazon Redshift holds session connections
while queries remain queued.
Stage 2
• Session connections are reinstated and queries
resume.
• Redistributes data to node slices in the
background.
• Cluster is available for read and write
operations.
Classic resize
Can be reconfigured to different node count and
instance type.
• Might take one or more hours to complete,
depending on data size.
• Involves streaming all data from original
cluster to newly configured cluster.
• During the resize, original cluster is in read-
only mode.
• Customer charged for only one cluster.
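As a hedged sketch, the two resize approaches can be expressed as request parameters for the boto3 Redshift `resize_cluster` call, where the `Classic` flag selects the approach. The helper `resize_request`, the cluster name, and the node counts are hypothetical; the actual call is left commented out because it needs AWS credentials and a real cluster.

```python
def resize_request(cluster_id, number_of_nodes, node_type=None, classic=False):
    """Build keyword arguments for a Redshift resize request."""
    params = {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": number_of_nodes,
        "Classic": classic,  # False requests an elastic resize
    }
    if node_type is not None:
        params["NodeType"] = node_type
    return params


# Elastic resize: add nodes to the existing cluster.
elastic = resize_request("example-cluster", 4)

# Classic resize: change both node count and node type.
classic = resize_request("example-cluster", 8, node_type="ra3.4xlarge", classic=True)

# With credentials configured, a request could be sent like this:
#   import boto3
#   boto3.client("redshift").resize_cluster(**elastic)
```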
Amazon Redshift instance types
https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
Management interfaces
https://us-west-2.console.aws.amazon.com/redshiftv2/home?region=us-west-2#query-editor
Amazon Redshift
differentiating features
Federated query
Amazon Redshift
lake house architecture
Federated query
Data warehouse Amazon Aurora
OLTP ERP CRM LOB
Integrate queries on live data in Amazon RDS for
PostgreSQL and Amazon Aurora PostgreSQL with
queries on Amazon Redshift and Amazon data lake
Reduce data moved over the network with
Amazon Redshift’s intelligent optimizer, which pushes
and distributes portions of the computation directly
into remote operational databases
Benefits
• Incorporate live data into business intelligence
(BI) and reporting applications
• Ingest data into Amazon Redshift
• Query operational databases directly
• Apply transformations on the fly
• Load data into target tables without
complex ETL pipelines
Amazon Redshift
lake house architecture
With Amazon Redshift lake house
architecture, customers can:
• Query data in the data lake and
write data back in open formats
• Use familiar SQL statements to
combine and process data across
data stores
• Run queries on live data in
operational databases without
requiring data loading and ETL
pipelines
Amazon Redshift lake house queries are run by a fleet of nodes that are
owned and maintained by AWS.
https://aws.amazon.com/redshift/lake-house-architecture/
SQL clients, business intelligence tools
Leader node
Compute node 1
Node slices
JDBC/ODBC
Compute node 2
Node slices
Amazon S3 AWS Glue Data
Catalog
Amazon Redshift
lake house
Amazon Redshift
lake house fleet
1 Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
2 Query is optimized and compiled using ML at the leader node. Determine what is run locally and what goes to Amazon Redshift lake house.
3 Query plan sent
to all compute
nodes.
4 Compute nodes obtain partition information from the Data Catalog and dynamically prune partitions.
5 Each compute node issues
multiple requests to Amazon
Redshift lake house layers.
6 Amazon Redshift lake house
nodes scan Amazon S3 data.
7 Amazon Redshift lake
house projects, filters, joins,
and aggregates.
8 Final aggregations and joins
with local Amazon Redshift
tables done in-cluster.
9 Result is sent to client.
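The flow in steps 4 through 8 can be sketched in plain Python: prune partitions first, compute partial aggregates per partition, then merge them. The partition names, rows, and function names are hypothetical and stand in for the Data Catalog, the lake house layer, and the cluster; the sketch shows the shape of the technique, not the service's internals.

```python
from collections import Counter

# Hypothetical catalog: partition name -> rows stored under that prefix.
partitions = {
    "year=2019": [{"region": "east"}, {"region": "west"}],
    "year=2020": [{"region": "east"}, {"region": "east"}, {"region": "west"}],
    "year=2021": [{"region": "west"}],
}


def prune(partitions, wanted):
    """Step 4: keep only the partitions that the query filter can match."""
    return {name: rows for name, rows in partitions.items() if name in wanted}


def scan_and_aggregate(rows):
    """Steps 6-7: scan one partition and return a partial COUNT(*) per group."""
    return Counter(row["region"] for row in rows)


def final_aggregate(partials):
    """Step 8: merge the partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


selected = prune(partitions, {"year=2020", "year=2021"})
result = final_aggregate(scan_and_aggregate(rows) for rows in selected.values())
```

Pruning before scanning means the `year=2019` rows are never read, which is the point of step 4.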
Advanced Query Accelerator (AQUA)
A new distributed and hardware-accelerated cache that makes Amazon Redshift
faster than other cloud data warehouses, without increasing cost
Minimizes data movement over the network
by pushing operations to Advanced Query
Accelerator (AQUA) nodes
AQUA nodes with custom AWS designed
analytics processors to make operations
(compression, encryption, filtering, and
aggregations) faster than traditional CPUs
RA3
cluster
AQUA node
Custom
AWS designed
processor
Running in parallel
Amazon Redshift managed storage
RA3
cluster
RA3
cluster
AQUA node
Custom
AWS designed
processor
AQUA node
Custom
AWS designed
processor
AQUA node
Custom
AWS designed
processor
Migration to Amazon Redshift
AWS SCT data extractors
AWS SCT extracts data through local migration agents.
Data is optimized for Amazon Redshift and saved in local files.
Files are loaded to an Amazon S3 bucket (through the network or AWS Snowball) and then to Amazon Redshift.
Amazon
Redshift
AWS SCT S3 Bucket
Amazon S3 Amazon
Redshift
AWS SCT
Legacy data
warehouse
Use case: Equinox
Challenge
Their data warehouse had limited
integration, was very expensive,
and required a lot of platform-
specific domain knowledge.
They needed to reduce
administration and costs, blend
structured and semi-structured
data for analytics, and evolve into
a data lake strategy.
Solution
Equinox migrated from a legacy
data warehouse to Amazon
Redshift to combine data from
disparate sources like clickstream
data, cycling log data, club
management software, and
more.
They land data directly in an
Amazon S3 data lake and
perform analytics using Amazon
Redshift, Redshift Spectrum, and
Amazon EMR.
Benefits
Their monthly Amazon Redshift
bill is now 20% of prior yearly
maintenance of their legacy data
warehouse.
AWS data lake and analytics
reduced report delivery time
from months to days.
Equinox sees faster reports, 80% cost savings with Amazon Redshift.
https://www.youtube.com/watch?v=EvDicFx9StE
Solution 2: Data lakes
Data lakes defined
• Stores all structured, semi-structured,
unstructured, and binary data at unlimited scale
• Holds curated and raw data
• Uses AWS data analytics tools for analytics
• Increases pace of innovation by extracting insights
from data
• Enables more organizational agility
• Reduces cost and delivers results with predictive
analytics and ML
Architectural approach for a centralized
enterprise data repository stored on
Amazon S3
Machine
learning
Business
intelligence
and
analytics
Data
warehousing
Data lake
Open formats
central catalog
Secure data lake on Amazon S3
70
Amazon S3
Access Points
Amazon S3
object lock
Amazon S3
object tags
Amazon S3
Block Public Access
Amazon FSx
for Lustre
• Multi-tenant bucket
• Dedicated access points
• Customer permissions
from an Amazon Virtual
Private Cloud (Amazon
VPC)
• Across AWS accounts and
Amazon S3 bucket level
• Specify public permissions
using Access Control List
(ACL) or policy
• Four settings:
• BlockPublicAcls
• IgnorePublicAcls
• BlockPublicPolicy
• RestrictPublicBuckets
• Access control, lifecycle
policies, and analysis
• Classify data with
metadata
• Use tags to filter objects
• Define replication policies
• Populate tags with AWS
Lambda functions or S3
Batch Operations
• Immutable Amazon S3
objects
• Retention management
controls
• Data protection and
compliance
https://aws.amazon.com/compliance/services-in-scope
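The four Amazon S3 Block Public Access settings listed above map onto the configuration passed to the boto3 S3 `put_public_access_block` call. The sketch below only builds that configuration; the helper name and the bucket name in the comment are hypothetical, and the call itself is commented out because it needs AWS credentials.

```python
def block_public_access_config(enabled=True):
    """Return a Block Public Access configuration with all four settings."""
    return {
        "BlockPublicAcls": enabled,        # reject new public ACLs
        "IgnorePublicAcls": enabled,       # ignore existing public ACLs
        "BlockPublicPolicy": enabled,      # reject new public bucket policies
        "RestrictPublicBuckets": enabled,  # restrict access to buckets with public policies
    }


config = block_public_access_config()

# With credentials configured, the settings could be applied to a bucket:
#   import boto3
#   boto3.client("s3").put_public_access_block(
#       Bucket="example-bucket",
#       PublicAccessBlockConfiguration=config,
#   )
```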
71
IAM
Amazon CloudWatch AWS STS AWS CloudTrail
AWS KMS
Protect and secure
Machine
learning
Amazon QuickSight Amazon EMR
Amazon
Redshift
Amazon Athena
Processing and analytics
Amazon Kinesis
AWS
Direct Connect AWS Snowball
AWS DMS
AWS Data Exchange
Data ingestion
AWS Glue Amazon OpenSearch Service
Amazon DynamoDB
Catalog and search
Amazon API Gateway IAM Amazon Cognito
Access and user interface
Amazon S3
Central storage
Reference architecture:
Data lake on AWS
Data services – AWS Glue
Cleansing data
After migration, data still presents challenges:
Data is increasingly diverse
• Volume
• Variety
• Velocity
• Veracity
It accumulates rapidly
• Missing or incorrect data
• Wrong data format
• Partial missing data
Avoid unsearchable data
It must be cleansed before
analyzed by many applications
How can customers provide access to users to gain insights?
AWS Glue
AWS Glue Data
Catalog
Job authoring
Job running
Job workflow
• Hive metastore compatible with enhanced functionality
• Crawlers automatically extract metadata and create tables
• Integrates with Amazon Athena, Amazon EMR, and many more
• Run jobs on a serverless Spark platform
• Use flexible scheduling, job monitoring, and alerting
• Generates ETL code
• Build on open frameworks – Python, Scala, and Apache Spark
• Developer-centric – editing, debugging, sharing
• Orchestrate triggers, crawlers, and jobs
• Author and monitor entire flows and integrated alerting
AWS Glue crawlers
Amazon Redshift
Amazon DynamoDB
Amazon S3
Databases
AWS IAM role
AWS Glue crawler
JDBC
connection
NoSQL
connection
Object
connection
Built-in classifiers
MySQL
MariaDB
PostgreSQL
Amazon Aurora
Oracle
Amazon Redshift
Apache Avro
Parquet
ORC
XML
JSON and JSONPaths
AWS CloudTrail
Binary JSON (BSON)
Logs
Delimited
… growing
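To give a feel for what a crawler does when it classifies data and creates a table definition, here is a toy schema-inference sketch for JSON lines. It is not the AWS Glue classifier logic; the type names, the fall-back-to-string rule, and the sample records are assumptions made for illustration.

```python
import json


def infer_type(value):
    """Map a parsed JSON value to a simple column type name."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(json_lines):
    """Scan records and build a column -> type mapping."""
    schema = {}
    for line in json_lines:
        for column, value in json.loads(line).items():
            seen = infer_type(value)
            previous = schema.setdefault(column, seen)
            if previous != seen:
                schema[column] = "string"  # fall back when types conflict
    return schema


# Hypothetical sample records.
sample = [
    '{"id": 1, "price": 9.5, "active": true}',
    '{"id": "A2", "price": 12.0, "name": "widget"}',
]
schema = infer_schema(sample)
```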
AWS Glue Data Catalog services
AWS Glue Data
Catalog
Amazon Redshift
lake house
Amazon Athena
AWS Glue ETL
Amazon EMR
Use case: Log aggregation with ETL
AWS service logs
Web application logs
Server logs
Amazon S3
bucket
AWS Glue
crawler
Update table partition
Create partition
on Amazon S3
Query data
AWS Glue ETL
Amazon S3
bucket
AWS Glue Data
Catalog
Amazon Athena
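The "create partition on Amazon S3" step in this use case is often done by writing logs under date-based key prefixes. The sketch below derives a Hive-style prefix from an event timestamp; the `year=/month=/day=` layout and the `logs` base prefix are assumptions for illustration, not something the slide specifies.

```python
from datetime import datetime, timezone


def partition_prefix(base, epoch_seconds):
    """Build a Hive-style date partition prefix from a UTC epoch timestamp."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


# 1577836800 is 2020-01-01 00:00:00 UTC.
prefix = partition_prefix("logs", 1577836800)
```

With a layout like this, a query that filters on year, month, or day only needs to read the matching prefixes.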
Data services – AWS Data
Exchange and Amazon Athena
AWS Data Exchange
Find diverse data in one place Analyze data Access third-party data
Find and subscribe to third-party data in the cloud
• More than 1,000 data products
• More than 80 data providers
• Download or copy data to
Amazon S3
• Combine, analyze, and model with
existing data
• Streamlined access to data
• Minimize legal reviews and
negotiations
Amazon Athena
No setup costs Streamlined
Open
Pay per query
Interactive query service to analyze data in Amazon S3 using standard SQL
SQL
$
Zero setup costs,
point to Amazon S3
and start querying
Pay only for queries run,
save 30%–90% on
per-query costs through
compression
ANSI SQL interface,
JDBC/ODBC drivers, multiple
formats, compression types,
and complex joins and data
types
Serverless, zero
infrastructure, zero
administration,
integrated with Amazon
QuickSight
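A worked example of why compression lowers per-query cost when pricing depends on data scanned. The price of 5 USD per TB scanned and the 1024^4 bytes-per-TB convention are assumptions for illustration and are not stated on the slide; check the current Amazon Athena pricing page before quoting figures to a customer.

```python
PRICE_PER_TB_USD = 5.00   # assumed price, not taken from the slide
BYTES_PER_TB = 1024 ** 4  # assumed convention


def query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost of one query from the bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


uncompressed_bytes = 2 * BYTES_PER_TB       # query scans 2 TB of raw data
compressed_bytes = uncompressed_bytes // 4  # same data compressed to 25% of its size

raw_cost = query_cost(uncompressed_bytes)
compressed_cost = query_cost(compressed_bytes)
savings = 1 - compressed_cost / raw_cost
```

Under these assumptions the query cost drops from 10.00 USD to 2.50 USD, a 75% saving, which falls inside the 30%–90% range quoted on the slide.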
AWS Lake Formation
Challenges of building a secure data lake
Typical steps to build a secure data lake
1 Set up storage
2 Move data
3 Cleanse, prepare, and catalog data
4 Configure and enforce security and compliance policies
5 Make data available for analytics
Data engineer Data security officer Data analyst
Ingestion and cleaning Security
Analytics and machine learning
AWS Lake Formation for a secure data lake
Secure and control Collaborate and use Monitor and audit
Ingest and organize
Automates creating data
lake and data ingestion.
Sets up fine-grained
access control and data
governance.
Search and data
discovery using Data
Catalog metadata.
To protect data, all
access is checked against
set policies.
Based on data access
and governance policies,
alert notifications are
raised on policy violation
and logged.
2 3 4
1
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. 83
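The idea that all access is checked against set policies, with violations logged, can be sketched as a small column-level permission check. The grant structure, principal names, and `check_access` function are hypothetical; this is not the AWS Lake Formation API, only an illustration of fine-grained access control with an audit trail.

```python
# Hypothetical grants: principal -> table -> columns the principal may read.
grants = {
    "analyst": {"sales": {"order_id", "amount"}},
    "security_officer": {"sales": {"order_id", "amount", "customer_email"}},
}

audit_log = []


def check_access(principal, table, columns):
    """Allow the request only if every requested column is granted; log the outcome."""
    allowed = grants.get(principal, {}).get(table, set())
    denied = sorted(set(columns) - allowed)
    audit_log.append({"principal": principal, "table": table, "denied": denied})
    return not denied


ok = check_access("analyst", "sales", ["order_id", "amount"])
violation = check_access("analyst", "sales", ["amount", "customer_email"])
```

The second request is refused because `customer_email` is not granted to the analyst, and the denied column is recorded in `audit_log` for later review.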
AWS Lake Formation builds on AWS Glue
Blueprints
AWS Glue ETL jobs
Workflow
AWS Glue crawlers
AWS Glue Data Catalog
Connections,
databases, tables
Monitoring
Security, search, collaboration
AWS Glue
AWS Lake Formation
AWS Lake Formation benefits
Amazon Redshift
Amazon Athena
AWS Glue
Amazon EMR
Amazon
QuickSight
Amazon
SageMaker
AWS Lake
Formation
Blueprints ML
Transforms
Data Catalog Access
control
Amazon S3
data lake storage
Cost effective, durable storage
includes global replication
capabilities.
Simplified ingest and cleaning
enables data engineers to build
faster.
Centralized management of
fine-grained permissions
empowers security officers.
Comprehensive set of integrated
tools enables every user equally.
Data visualization with Amazon
QuickSight
Amazon QuickSight
BI service built for the cloud with pay-per-session pricing and ML insights
• Scalable: Automatically scales with use and activity, with no additional infrastructure requirements. Seamlessly grows with customers.
• Pay for use: Pay monthly or annually. With pay-per-session pricing, customers only pay when they access their reports and dashboards, with no upfront costs.
• Serverless and fully managed: Fully managed cloud application, meaning there's no upfront cost, software to deploy, capacity planning, maintenance, upgrades, or migrations.
• Fully integrated: Deeply integrated with data sources and other AWS services like Amazon Redshift, Amazon S3, Amazon Athena, Amazon Aurora, Amazon RDS, IAM, AWS CloudTrail, and Amazon Cloud Directory, providing customers with everything they need for an end-to-end cloud BI solution.
Serverless data lakes and analytics
Amazon S3
AWS Glue
crawler
AWS Glue Data
Catalog
Amazon Athena
Amazon EMR
Amazon Redshift
Spectrum
Amazon
QuickSight
Amazon RDS
Web app data
Other databases
On-premises data
Streaming data
Different users solving different problems
AWS Glue Data
Catalog
Amazon Athena
Amazon EMR
Amazon Redshift
Spectrum
Amazon Redshift
Amazon
QuickSight
Data lake
Amazon
SageMaker
Machine learning
Amazon Redshift
Spectrum
Kibana
Apache Zeppelin
Jupyter
Tableau
MicroStrategy
Data scientists
Data engineers
Business
reporting
Use case: COVID-19
Use case: COVID-19 pandemic
Challenge
The COVID-19 pandemic has
stressed healthcare systems,
businesses, and economies. It has
disrupted the daily lives of
people around the world.
People need a solution to
capture data (diagnosis,
mortality, and recovery rates)
globally in real time, and turn the
data into insights they can share
and respond to with confidence.
Solution
Amazon worked with APN
Partners Salesforce, Tableau, and
MuleSoft to create a secure data
lake using AWS Data Exchange,
AWS Glue, Amazon Athena, and
Amazon S3 as a store of trusted
data from open source COVID-19
data providers.
Benefits
Health workers, scientists, and
decision makers can access and
compare international data to
their local data, enabling
understanding and visualization
of the impact of COVID-19
locally and globally.
This solution enables decision
making and deeper insights to
help manage and flatten the
COVID-19 curve until a vaccine is
available.
Summary
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
Amazon Redshift
• Amazon S3
• AWS Glue
• AWS Data Exchange
• Amazon Athena
• AWS Lake Formation
• Amazon QuickSight
AWS data
migration
options
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.
For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
Thank you
Module 4: AWS Data Analytics
Solutions – Part II
Objectives
In this module, you will learn about three key types of data analytics
technical solutions on AWS:
• Streaming and real-time analytics with Amazon Kinesis
• Data governance
• Extended solution: Insights and monetization with machine learning (ML)
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
Solution 3: Streaming and
real-time analytics with
Amazon Kinesis
Journey to a modern data architecture
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
Types of data used
Streaming data defined
Data that is generated continuously from thousands of
data sources, sent simultaneously
Player-game interactions
Geolocation of
cars and devices
Music downloads
Website clicks
Social media streams
Common use cases: Real-time analytics
Milliseconds Seconds Minutes Hours
• Messaging between
microservices
• Response analytics
(web and mobile
application
notifications)
• Log ingestion
• Internet of Things (IoT)
device maintenance
• Change data capture (CDC)
• Streaming ETL into
data lakes and
data warehouse
The value of data diminishes over time
Enabling real-time analytics
Data streaming technology enables a customer to ingest, process, and analyze high
volumes of high-velocity data from a variety of sources, in real time.
Data streaming solution challenges
Challenges of building on-premises, real-time streaming solutions:
Difficult to set up
Difficult to achieve high availability
Error prone and complex to manage
Tricky to scale
Integration requires development
Expensive to maintain
AWS streaming data solutions
Efficiently collect, process, and analyze data streams in real time
Amazon Kinesis Data Streams
Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics
Data generators: Simple streaming
data patterns
Data producers Streaming services Data consumers
Amazon Kinesis
Data Firehose
Amazon Kinesis
Data Analytics
Amazon Kinesis
Data Streams
Mobile and applications
Amazon Kinesis Agent
Amazon Kinesis Data Streams
Amazon CloudWatch Logs
Amazon CloudWatch Events
AWS IoT
Apache Kafka
Amazon Kinesis Producer
Library (KPL)
Amazon EMR
Amazon Redshift
Amazon Simple
Storage Service (S3)
Amazon EC2
Amazon Kinesis
Connector Library
Amazon Kinesis Data Streams
Amazon Kinesis Data Streams
Massively scalable, highly durable data ingestion and processing service optimized for real-time data streaming
• No upfront cost; low, pay-as-you-go pricing
• Data collected is available within 70 milliseconds
• Real-time analytics: dashboards, anomaly detection, dynamic pricing
• Data is synchronously replicated across 3 Availability Zones in a Region
• Data can be stored up to 7 days
• Serverless; can scale dynamically to handle MB to TB each hour and thousands to millions of PutRecords each second
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
How Kinesis Data Streams works
Amazon Kinesis
Data Analytics
Amazon EC2
AWS Lambda
Input
Output
Spark on Amazon EMR
Amazon Kinesis
Data Streams
Capture and send data
Ingest and store data streams for processing
Build custom, real-time applications
Analyze streaming data using BI tools
Kinesis Data Streams architecture
Amazon EC2 instances
Client
Mobile client
Traditional
server
Data producers
Shard 1
Shard 2
Shard N
Amazon Kinesis
Data Stream
EC2 instance
EC2 instance
Data consumers
Amazon Redshift
Amazon S3
Amazon Kinesis
Data Firehose
Amazon EMR
Amazon DynamoDB
Shard 1
Data record
• Sequence #
• Partition Key
• Data blob
Data stream
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
Amazon Kinesis
Data Firehose
Amazon Kinesis
Data Analytics
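The architecture above shows data records, each carrying a partition key, landing in one of N shards. As a rough illustration of how a partition key selects a shard, the sketch below hashes the key with MD5 into a 128-bit key space and looks up which shard range contains the result. The helper names (`split_hash_key_space`, `shard_for_partition_key`) and the even split of the key space are assumptions made for this example; a real stream reports its own hash key range for each shard.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # MD5 produces a 128-bit value


def split_hash_key_space(shard_count):
    """Divide the 128-bit hash key space into equal, contiguous shard ranges.

    Assumption for illustration: shards split the key space evenly.
    """
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_partition_key(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside the key space")


ranges = split_hash_key_space(3)
for key in ["player-1", "player-2", "player-3"]:
    print(key, "-> shard", shard_for_partition_key(key, ranges) + 1)
```

Records that share a partition key always land in the same shard, which is why a well-distributed key (for example, a player or device ID) helps spread load across shards.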
Kinesis Data Streams provisioning
Amazon Kinesis Data Firehose
How Kinesis Data Firehose works
Amazon Kinesis
Data Firehose
Input
Output
Splunk
Amazon Redshift
Amazon S3
Amazon
Elasticsearch Service
Capture and send data
Prepares and loads data continuously to the selected destinations
Durably store the data for analytics
Analyze streaming data using analytics tools
Kinesis Data Streams and
Kinesis Data Firehose
Characteristics: Amazon Kinesis Data Streams compared with Amazon Kinesis Data Firehose
Processing time
• Kinesis Data Streams: As fast as 70 milliseconds after ingestion
• Kinesis Data Firehose: Between 60–900 seconds
Stream storage and duration
• Kinesis Data Streams: In shards, default 24 hours and up to 7 days
• Kinesis Data Firehose: Max buffer size 128 MB and max time 900 seconds
Data transformation and conversion
• Kinesis Data Streams: None
• Kinesis Data Firehose: Uses AWS Lambda and AWS Glue
Data producer
• Both services: Amazon Kinesis Agent, applications using Amazon Kinesis Producer Library (KPL), AWS SDK for Java, Amazon CloudWatch Logs and CloudWatch Events, AWS IoT
Data consumer
• Kinesis Data Streams: AWS Lambda, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, applications using the Kinesis Client Library (KCL) and SDK for Java
• Kinesis Data Firehose: AWS Lambda, Amazon Kinesis Data Analytics, Kinesis Data Firehose, apps using the KCL and SDK for Java, Amazon S3, Amazon Redshift, Amazon ES, and Splunk
Data compression
• Kinesis Data Streams: None
• Kinesis Data Firehose: gzip, Snappy, Zip, or no data compression
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
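The Firehose column of the comparison describes buffering by size and by time, plus optional compression. The sketch below is a toy model of that behavior, not the service's implementation: the `BufferedDelivery` class, its thresholds, and the in-memory `delivered` list (standing in for a destination such as Amazon S3) are all assumptions for illustration.

```python
import gzip
import time


class BufferedDelivery:
    """Toy buffer that flushes when a size or time threshold is reached."""

    def __init__(self, max_bytes, max_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self.records = []
        self.size = 0
        self.opened_at = None
        self.delivered = []  # hypothetical stand-in for a destination

    def put(self, record: bytes):
        if self.opened_at is None:
            self.opened_at = self.clock()  # the buffer interval starts at the first record
        self.records.append(record)
        self.size += len(record)
        self._flush_if_due()

    def tick(self):
        """Call periodically so that a quiet stream still flushes on time."""
        self._flush_if_due()

    def _flush_if_due(self):
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.opened_at >= self.max_seconds
        if too_big or too_old:
            batch = b"\n".join(self.records)
            self.delivered.append(gzip.compress(batch))  # gzip is one of the listed options
            self.records, self.size, self.opened_at = [], 0, None


# Usage with a fake clock so the example does not need to wait.
now = [0.0]
buffer = BufferedDelivery(max_bytes=32, max_seconds=60, clock=lambda: now[0])
buffer.put(b"click:home")
buffer.put(b"click:cart")
now[0] = 61.0
buffer.tick()  # the time threshold has passed, so the batch is delivered
print(len(buffer.delivered), "batch delivered")
```

Whichever threshold is reached first triggers delivery, which is why the processing time in the table is expressed as a range in seconds rather than milliseconds.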
When to use Kinesis Data Streams and
Kinesis Data Firehose
Amazon Kinesis Data Streams
For data streaming applications with massive ingestion requirements
• Requires data to be sent to consumer analytics services for millisecond response time
• Massively scalable
• Data retention time ranging from hours to days
• Example: Real-time gaming
Amazon Kinesis Data Firehose
For data streaming applications that require near real-time responses in seconds
• Need for data augmentation, data transformation, or data compression
• Need to save data to Amazon S3, Amazon Redshift, Amazon ES, Splunk, or send data to Amazon Kinesis Data Analytics for analytics
• Example: Log analytics
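The guidance above can be condensed into a rule of thumb. The sketch below is one possible encoding, assuming the 60-second lower bound on Firehose delivery from the comparison table; the `StreamingRequirements` fields and the `recommend_service` function are hypothetical names, and a real selection would weigh cost, throughput, and operational factors as well.

```python
from dataclasses import dataclass


@dataclass
class StreamingRequirements:
    max_latency_seconds: float      # how quickly consumers must see a record
    retention_hours: int = 0        # replay window needed by consumers, if any
    managed_delivery: bool = False  # load into S3, Redshift, Amazon ES, or Splunk without custom code


def recommend_service(req):
    """Rule-of-thumb mapping from requirements to a Kinesis service."""
    # Sub-minute latency or a replay window points to Data Streams,
    # because Firehose buffers for at least 60 seconds in the comparison above.
    needs_streams = req.max_latency_seconds < 60 or req.retention_hours > 0
    if needs_streams and req.managed_delivery:
        return "Kinesis Data Streams with Kinesis Data Firehose as a consumer"
    if needs_streams:
        return "Kinesis Data Streams"
    return "Kinesis Data Firehose"


print(recommend_service(StreamingRequirements(max_latency_seconds=0.2, retention_hours=24)))
print(recommend_service(StreamingRequirements(max_latency_seconds=300, managed_delivery=True)))
```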
Amazon Kinesis Data Analytics
Amazon Kinesis Data Analytics
Input
Amazon Kinesis
Data Analytics Output
Capture streaming data
with Amazon MSK,
Amazon Kinesis Data
Streams, Amazon Kinesis
Data Firehose, or other
data sources
Query and analyze
streaming data
Send processed data
to analytics tools to
create alerts and
respond in real time
Use case: Clickstream analytics
Amazon Kinesis
Data Firehose
Input Output
Amazon Kinesis
Data Firehose
Amazon Kinesis
Data Analytics
Amazon Redshift
Evolve from batch processing to real-time analytics
Websites send
clickstream data
Collects the data and
sends to Kinesis Data
Analytics
Processes data in
near-real time
Loads processed
data into
Amazon Redshift
Runs analytics
models to identify
content
recommendations
Readers see
personalized content
suggestions and
increase
engagement
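The "processes data in near-real time" step in this pipeline typically means aggregating events over short time windows. As a minimal sketch of that idea in plain Python (not the Kinesis Data Analytics API), the example below counts page views per page in fixed, non-overlapping windows. The event format `(timestamp_seconds, page)` and the function name `tumbling_window_counts` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count page views per page within fixed, non-overlapping time windows."""
    windows = defaultdict(Counter)
    for timestamp, page in events:
        # Every event belongs to exactly one window, identified by its start time.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][page] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [
    (0, "/home"), (12, "/news"), (45, "/news"),
    (61, "/home"), (75, "/sports"), (119, "/home"),
]
for start, counts in tumbling_window_counts(events, 60).items():
    top_page = max(counts, key=counts.get)
    print(f"window starting at {start}s: {counts}, most viewed: {top_page}")
```

Per-window aggregates like these are the kind of processed output that could then be loaded into Amazon Redshift for the recommendation models described above.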
Case study: Epic Games
Challenge
Needed a way to process and analyze
over 100 PB of data (125 million
events each minute) ingested from
game clients and game servers to
understand and adapt to player
engagement.
Solution
Epic Games turned to AWS for an
Amazon S3 data lake in combination
with Amazon EMR, Amazon EC2, and
Amazon Kinesis.
Benefits
The data provides a constant
feedback loop for designers, and an
up-to-the-minute analysis of gamer
satisfaction to drive gamer
engagement.
Continually improves Fortnite for 250+ million players globally
https://aws.amazon.com/solutions/case-studies/EPICGames/
Solution 4: Data governance
Journey to a modern data architecture
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
Types of data used
Challenges of data in data lakes
• Securing data
• Auditing data usage
• Managing data access
• Safeguarding sensitive data and PII
• Maintaining regulations and mandates
Data security and governance
© ENTERPRISE STRATEGY GROUP, 2019.
With big data comes big responsibility.
More than one in three companies cite data privacy and governance as
a hurdle to both digital transformation and IoT initiatives
34% of IT decision makers cite ensuring data governance/privacy as one of their organization’s biggest digital transformation challenges
37% of IT decision makers cite ensuring security/compliance upon movement of data as one of their most important IoT priorities over the next 18–24 months
https://www.esg-global.com/hubfs/ESG-Infographic-IT-Spending-Intentions-2019.pdf
Resolving PII dangers
Personally
identifiable
information
(PII)
Consumer
consent
violation
Data
breach
Spyware
Unsecured
devices
Rogue
agents
Second-party
misuse
Espionage
External
hacking
• Do these issues need to be
resolved?
• Is there a solution architecture
that solves all PII issues?
• What best practices can be
used to mitigate PII dangers?
Amazon Macie
Amazon Macie
Continually evaluate
Amazon S3
environment
Discover sensitive
data
Take action
Enable Amazon Macie with one click in the AWS Management Console or with a single API call
Automatically generates an inventory of Amazon S3 buckets and details on the bucket-level security and access controls
Analyzes buckets using ML and pattern matching to discover sensitive data, like PII
Generates findings and sends them to Amazon CloudWatch Events for integration into workflows and remediation actions
• Financial
• Personal
• National
• Medical
• Credentials and secrets
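To make the "pattern matching" part of sensitive-data discovery concrete, the sketch below scans text objects with a few regular expressions and emits one finding per object and category. It is only an illustration of the idea, not Amazon Macie's detection logic or API: the patterns, the `find_sensitive_data` function, and the finding format are assumptions, and the object keys are made up.

```python
import re

# Illustrative patterns only; production detectors are far more thorough.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_sensitive_data(objects):
    """Return one finding per (object key, category) with the match count."""
    findings = []
    for key, text in objects.items():
        for category, pattern in PATTERNS.items():
            matches = pattern.findall(text)
            if matches:
                findings.append({"object": key, "category": category, "count": len(matches)})
    return findings


objects = {
    "notes/call-1.txt": "Mary's SSN is 000000000; reach her at mary@example.com",
    "config/settings.env": "aws_access_key_id=AKIAIOSFODNN7EXAMPLE",
    "notes/weather.txt": "It is expected to rain tomorrow",
}
for finding in find_sensitive_data(objects):
    print(finding)
```

In a workflow like the one on this slide, each finding would be routed to an event bus so that remediation (for example, tightening a bucket policy) can be automated.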
De-identified data lake (DIDL) on AWS
A de-identified data lake (DIDL) is an architectural approach that reduces the risks
associated with managing data, particularly personally identifiable information (PII).
Benefits
Reduce risk
• Remove PII before it enters a data lake
Understand all the data
• Create a Data Catalog of an entire data lake
Reduce compliance costs
• Automate the discovery, classification, de-identification,
and ongoing monitoring of data across an organization
Turn data into an asset, not a liability
• Enable a broader set of governed analytic and machine learning use cases
Masking PII data
Original data
Email | Customer ID | Transcript
csalazar@example.com | 19664 | Just talked to Carlos Salazar
mary@example.com | 23423 | Mary’s SSN is 000000000
mateo@example.com | 99644 | Mateo is moving to Nevada
NA | 02945 | It is expected to rain tomorrow
Masked data (the email, the customer ID, and the name, SSN, and state in each transcript are replaced)
Email | Customer ID | Transcript
4t34gttt | 7462391 | Just talked to Jane Roe
44e5325 | 1239474 | Jorge’s SSN is 666666666
0we&yrw | 9983487 | Sofia is moving to Texas
NA | 3344325 | It is expected to rain tomorrow
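One way to approximate the masking shown here is to tokenize identifiers with a keyed hash and redact sensitive values inside free text. The sketch below makes several assumptions: `SECRET_KEY`, `tokenize`, and `mask_record` are hypothetical names, the list of known names is supplied by the caller, and sensitive values are replaced with placeholders such as `[NAME]` rather than the realistic substitute values shown on the slide.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; keep real keys in a secrets store
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def tokenize(value, length=8):
    """Map a value to a stable, non-reversible token using a keyed hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record, known_names):
    """Tokenize identifiers and redact names and SSNs in the transcript."""
    transcript = SSN_PATTERN.sub("[SSN]", record["transcript"])
    for name in known_names:
        transcript = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", transcript)
    email = record["email"]
    return {
        "email": email if email == "NA" else tokenize(email),
        "customer_id": tokenize(record["customer_id"]),
        "transcript": transcript,
    }


record = {
    "email": "mary@example.com",
    "customer_id": "23423",
    "transcript": "Mary's SSN is 000000000",
}
masked = mask_record(record, known_names=["Mary"])
print(masked["transcript"])
```

Because the tokens are deterministic, the same customer still joins across tables after masking, which keeps the de-identified data useful for analytics.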
Extended solution 5: Insights and
monetization with ML on AWS
Journey to a modern data architecture
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
Types of data used
Data lakes and machine learning
Machine learning requires:
• More data: Collect all types of data
• Flexibility: Define schema during analysis
• Scalability: Scale storage and compute (CPU or
GPU) independently
• Data transformation and processing: Run a broad
set of processing and analytics on the
same data without movement
• Security: Networking, identity, encryption, and
compliance
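"Define schema during analysis" is often called schema-on-read: raw records are stored as-is, and each analysis projects them onto the fields and types it needs. The sketch below shows the idea with JSON lines in memory; the sample events, field names, and the `read_with_schema` helper are assumptions for illustration rather than any AWS API.

```python
import json

# Raw events are stored without a fixed schema; fields vary between records.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": 1}',
    '{"device": "sensor-2", "humidity": 40, "ts": 2}',
    '{"device": "sensor-1", "temp_c": "22.0", "ts": 3, "firmware": "1.2"}',
]


def read_with_schema(lines, schema):
    """Project raw JSON lines onto a schema of {field: type}, chosen at read time."""
    for line in lines:
        raw = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = raw.get(field)
            # Missing fields become None instead of failing the whole read.
            row[field] = cast(value) if value is not None else None
        yield row


# Two analyses read the same raw data with different schemas.
temperature_view = list(read_with_schema(RAW_EVENTS, {"device": str, "temp_c": float}))
humidity_view = list(read_with_schema(RAW_EVENTS, {"device": str, "humidity": int}))
print(temperature_view)
print(humidity_view)
```

The same stored data serves both views without being moved or rewritten, which is the flexibility this slide attributes to data lakes.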
OLTP ERP CRM LOB
Data warehouse
Business analytics
Data lake
Devices Web Sensors Social
Data Catalog
AI and
machine learning
Data warehouse
queries
Big data
processing
Interactive Real time
Amazon SageMaker
Machine learning at enterprise scale
Build
Train and tune
Deploy and manage
Notebooks for
common problems
High-performance
algorithms
• Managed Jupyter for enterprise data science
• Sample notebooks for most common use cases
• Single-pass, streaming training algorithms
One-click training Hyperparameter
optimization
One-click
deployment
Fully managed
elastic hosting
• Training models at scale without DevOps
assistance
• ML on ML to optimize hyperparameters
• Deploy to production with a single call
• Fully managed, production-grade inferences
https://aws.amazon.com/machine-learning/?nc2=h_ql_prod_ml
Machine learning resources
AWS Foundations: How Amazon SageMaker Can Help
• Fundamental digital course on how SageMaker mitigates the core challenges of implementing an ML pipeline
• Duration: 30 minutes
• https://www.aws.training/Details/Video?id=49646
The Machine Learning Pipeline on AWS
• Explore how to use the machine learning pipeline to solve a real business problem (intermediate)
• Duration: 4 days
• https://www.aws.training/SessionSearch?pageNumber=1&courseId=38910
Practical Data Science with Amazon SageMaker
• Learn to solve real-world use cases with machine learning (intermediate)
• Duration: 1 day
• https://www.aws.training/SessionSearch?pageNumber=1&courseId=40748
AWS STP: Machine Learning (ML) on AWS for ML Practitioners - Technical
• https://partnercentral.awspartner.com/LmsSsoRedirect?RelayState=%2flearningobject%2fcurriculum%3fid%3d25521
Use case: Next Caller
https://www.youtube.com/watch?v=K27WjYwyqw8&list=PLhr1KZpdzukdeX8mQ2qO73bg6UKQHYsHb&index=1&did=ta_card&trk=ta_card
2022 – re:Invent – What’s new in Analytics?
© 2022 Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s new
Amazon OpenSearch
• Serverless (Preview)
Redshift
• Multi-AZ for RA3 clusters (Preview)
• Auto-copy from S3 (Preview)
• Integration for Apache Spark
• Amazon Aurora zero-ETL
• Dynamic Data Masking (Preview)
• AWS Backup support
• Streaming Ingestion for Kinesis
Data Streams (Preview)
• Centralized access control with
AWS Lake Formation (Preview)
AWS Glue
• Data Quality (Preview)
• AWS Glue 4.0
• Custom visual transforms
• AWS Glue for Ray (Preview)
Amazon Athena
• Spark support
QuickSight
• Automated data preparation
• Paginated Reports
Summary
Evolution of data architecture
Traditional
data warehousing
Data lakes
on AWS
Real-time
analytics with
streaming data
Data warehouse
modernization
Data
governance
Machine
learning
• Kinesis Data Streams
• Kinesis Data Firehose
• Kinesis Data Analytics
Amazon Macie Amazon SageMaker
Thank you
Module 5: APN Partner
Opportunities and Resources
Objectives
In this module, you will learn how to:
• Describe how to collaborate with AWS for data analytics
• Describe AWS Data and Analytics resources for APN Partners:
• Competency categories
• AWS Immersion Days
• AWS Certified Data Analytics and learning resources
• Access the AWS Marketplace
• Perform the calls to action
APN Partners and
AWS for Data Analytics
Discounting and funding programs
Enterprise
Discount
Program
(EDP)
Migration
programs
POC funding
AWS Data and Analytics Competency
categories
Data Analytics Platforms
Provide a set of integrated tools to solve data analytics challenges within a standard framework
NoSQL/New SQL
Provide highly scalable databases that organize data into a structure
Data Integration and Preparation
Enable customers to move and consolidate data from disparate sources, transform it, and prepare it for analytics
Business Intelligence (BI) and Data Visualization
Help customers turn raw data into actionable business information, such as reporting, dashboards, and data visualization
Data Governance and Security
Help customers discover, categorize, and control their data
AWS data analytics solutions and
Immersion Days
AWS Data Lab program
• The AWS Data Lab program offers accelerated joint engineering
engagements between a team of customer builders and AWS
technical resources to create tangible deliverables that accelerate data
and analytics modernization initiatives.
• Two offerings:
Design Lab
Focus on real-world
architectural design
Build Lab
Focus on providing
guidance with building
a functioning
prototype with a
customer team
Duration
Half day to 5 days
Location
Virtual or AWS Data Lab hub – Seattle, NYC,
Herndon (VA), London, Bangalore
Cost
Free. Reach out to your APN support
team for more information.
https://aws.amazon.com/aws-data-lab/
AWS Immersion Days
Designed to help APN Advanced and Premier Consulting Partners deliver technical data analytics
workshops to their customers and help grow their businesses
Data Engineering
Immersion Day
Build a serverless data lake
solution on AWS including
modules focusing on
ingestion, hydration,
exploration, and consumption
https://aws.amazon.com/partners/immersion-days/
Amazon EMR
Immersion Day
Focus on unique facets of
Amazon EMR for big data
workloads
Database Migration
Immersion Day
Give your customers a head
start with the AWS Database
Migration Service and the
Schema Conversion Tool
… and many more.
Benefits: Access to technical workshop content, AWS usage credits, Market Development Funds (MDF)
opportunities, and support from AWS teams
Data solutions in AWS
Marketplace
AWS Marketplace
https://aws.amazon.com/marketplace/search/results?searchTerms=data+and+analytics
Use case: Modern data warehouse
AWS Certified data analytics and
learning resources
AWS Technical Professional Learning Path
AWS Certified Data Analytics – Specialty
https://aws.amazon.com/certification/certified-data-analytics-specialty/
Why I got AWS certified
Customers talk about the value of AWS certification
Partner Cast: Analytics
Call to action
Build a data analytics practice on AWS
Build packaged
solutions
Know your Partner
Solutions Architect
Ask for customer
references
Engage with AWS
service teams
Develop customer
workshops
Achieve an APN
competency
Call to action
Use the Data Flywheel
to perform
assessments
Work with your
Partner team to
schedule an
Immersion Day for
your customers
View the analytics
customer case studies
https://aws.amazon.com/big-data/datalakes-and-analytics/
Create a specialized
service around one of
the analytics services
Participate in the
AWS Data Lab
https://aws.amazon.com/aws-data-lab/
Prepare for the AWS
Data Analytics –
Specialty certification
Build relationships
with APN teams for
funding opportunities
for your marketing
and sales efforts
Thank You

More Related Content

What's hot

Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Amazon Web Services
 
HSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsHSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsAmazon Web Services
 
AWS Core Services Overview, Immersion Day Huntsville 2019
AWS Core Services Overview, Immersion Day Huntsville 2019AWS Core Services Overview, Immersion Day Huntsville 2019
AWS Core Services Overview, Immersion Day Huntsville 2019Amazon Web Services
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceAmazon Web Services
 
Amazon RDS Proxy 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
Amazon RDS Proxy 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon RDS Proxy 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
Amazon RDS Proxy 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
Encryption and Key Management in AWS
Encryption and Key Management in AWSEncryption and Key Management in AWS
Encryption and Key Management in AWSAmazon Web Services
 
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018Amazon Web Services
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Amazon Web Services
 
Introducing AWS Elastic Beanstalk
Introducing AWS Elastic BeanstalkIntroducing AWS Elastic Beanstalk
Introducing AWS Elastic BeanstalkAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...Amazon Web Services Korea
 

What's hot (20)

Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
 
ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
HSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsHSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundations
 
AWS Core Services Overview, Immersion Day Huntsville 2019
AWS Core Services Overview, Immersion Day Huntsville 2019AWS Core Services Overview, Immersion Day Huntsville 2019
AWS Core Services Overview, Immersion Day Huntsville 2019
 
Deep dive into AWS IAM
Deep dive into AWS IAMDeep dive into AWS IAM
Deep dive into AWS IAM
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Introduction to AWS Security
Introduction to AWS SecurityIntroduction to AWS Security
Introduction to AWS Security
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
AWS Data Analytics on AWS

  • 1. AWS Partner: Data Analytics on AWS – Technical Amey Birje Sr AWS Partner Trainer
  • 2. Module 1: Course Introduction
  • 3. Course objectives In this course, you will learn how to: • Identify Amazon Web Services (AWS) services in the AWS analytics stack • Describe decision points and technology selections for data analytics architectures • Discuss the AWS Data Pipeline and the customer data analytics journey using the Data Flywheel • Describe five AWS data analytics technical solutions: • Modernizing a data warehouse with Amazon Redshift • Data lakes • Streaming data • Data governance • Machine learning (ML) • Locate and use AWS Partner Network (APN) Partner resources for opportunities and training © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. 3
  • 4. About this course • This course is for technical professionals at APN Consulting Partner organizations who are engaged in pre-sales discussions with customers to help architect data analytics solutions on AWS and answer technical questions about using AWS data analytics services. • This 1-day course gives technical professionals sufficient technical knowledge of AWS data analytics services and solutions to successfully engage with and help customers. • This course is not designed to be a technical deep dive into AWS data analytics services and solutions. It provides the necessary resources and a learning path toward gaining deeper knowledge of the services.
  • 5. Module 2: AWS Data Analytics Portfolio
  • 6. Objectives In this module, you will learn how to: • Understand customer challenges related to data analytics in their business • Provide a technical overview of the AWS data analytics portfolio • Discuss technical advantages and positioning of data analytics solutions on AWS • Explain how to build a data analytics pipeline • Explain the Data Flywheel
  • 7. Customer challenges and opportunities for APN Partners
  • 9. Data is ever-growing There is more data than people think. • Data grows >10x every 5 years • Data platforms must live for 15 years • Data platforms must scale 1,000x Source: https://assets.ey.com/content/dam/ey-sites/ey-com/en_gl/topics/workforce/Seagate-WP-DataAge2025-March-2017.pdf
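The three figures on this slide fit together arithmetically. The sketch below assumes simple compounding (a modeling assumption, not something the slide states) and uses hypothetical names such as `required_scale`. It shows why a platform that lives for 15 years while data grows 10x every 5 years has to scale about 1,000x.

```python
# Figures come from the slide; the compounding model is an assumption.
GROWTH_PER_PERIOD = 10        # data grows >10x ...
PERIOD_YEARS = 5              # ... every 5 years
PLATFORM_LIFETIME_YEARS = 15  # data platforms must live for 15 years


def required_scale(lifetime_years, period_years, growth_per_period):
    """Return the growth factor compounded over the platform lifetime."""
    periods = lifetime_years // period_years  # 15 // 5 = 3 growth periods
    return growth_per_period ** periods       # 10 ** 3


scale = required_scale(PLATFORM_LIFETIME_YEARS, PERIOD_YEARS, GROWTH_PER_PERIOD)
print(f"{scale:,}x")  # prints "1,000x"
```

Because the slide says ">10x", 1,000x is best read as a lower bound under this model.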
  • 10. New realities • By making 10% more data accessible, a typical Fortune 1000 company will see a $65 million increase in net income.* • Explosion of data: connected devices, apps, and systems generate more data than ever before. • Pay-as-you-go pricing allows organizations to analyze data to gain insights. • Demand is growing for faster decision making on real-time data. *Source: Forbes Online; New Vantage Partners - Big Data Executive Survey https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#5062b36b578b
  • 11. Customers need your help 85% of businesses want to be data driven, but only 37% have been successful. https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#51efb027578b http://newvantage.com/wp-content/uploads/2017/01/Big-Data-Executive-Survey-2017-Executive-Summary.pdf
  • 12. Common data analytics challenges Top four challenges involve knowledge, skill, security, and privacy. This is your opportunity. What challenges do you see when using big data analytics/technologies? (n=545) • Data security (unauthorized access to company data) • Data privacy issues (safety of personal data) • Inadequate technical know-how in our company • Inadequate analytical know-how in our company Percentages shown on the slide: 53%, 49%, 48%, 48%. https://bi-survey.com/challenges-big-data-analytics
  • 13. AWS data analytics portfolio overview
  • 14. Secure infrastructure for analytics Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake. • Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail • Security: Amazon GuardDuty, AWS Shield, AWS Well-Architected Tool, Amazon Macie, Amazon Virtual Private Cloud (Amazon VPC) • Encryption: AWS Certificate Manager Private Certificate Authority (ACM Private CA), AWS Key Management Service (AWS KMS), encryption at rest, encryption in transit, bring your own keys, hardware security module (HSM) support • Identity: AWS Identity and Access Management (IAM), AWS Single Sign-On, Amazon Cloud Directory, AWS Directory Service, AWS Organizations
  • 15. AWS data analytics portfolio • Data movement: AWS Database Migration Service (AWS DMS), AWS Snowball, AWS Snowmobile, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka • Data lake infrastructure and management: Amazon Simple Storage Service (Amazon S3), Amazon S3 Glacier, AWS Glue, AWS Lake Formation • Analytics: Amazon Redshift, Amazon EMR (Spark and Presto), Amazon Athena, Amazon OpenSearch Service, Amazon Kinesis Data Analytics, AWS Glue (Spark and Python) • Data visualization, engagement, and machine learning: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Translate, Amazon Pinpoint, AWS Data Exchange
  • 16. Data movement services Help customers move data from on premises to the cloud: AWS DMS, AWS Snowball, AWS Snowmobile, Amazon Managed Streaming for Apache Kafka, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose
  • 17. Data lake services Customers are constrained by the volume, variety, veracity, and velocity of on-premises data, and data silos pose a major challenge. Services: Amazon S3, Amazon S3 Glacier, AWS Lake Formation, AWS Glue
  • 18. Analytics services Help customers extract value out of their data: Amazon Redshift, Amazon EMR, AWS Glue, Amazon OpenSearch Service, Amazon Athena, Amazon Kinesis Data Analytics
  • 19. Data visualization, engagement, and machine learning services Help customers understand and visualize their data, and use machine learning (ML) for advanced analytics and predictions: Amazon QuickSight, Amazon SageMaker, AWS Data Exchange
  • 20. AWS value proposition
  • 21. Standards, formats, and open source • Apache Flink • Ganglia • Apache HBase • HCatalog • Hadoop Distributed File System (HDFS) • Apache Hive • Hudi • Java • JupyterHub • Apache Kafka • Apache Livy • Apache Mahout • MapReduce • Apache MXNet • MySQL • Apache Oozie • Apache ORC • Apache Parquet • Phoenix • Apache Pig • Presto • Python • PyTorch • R • Scala • Apache Spark • Sqoop • SQL • TensorFlow • Tez • Yarn • Apache Zeppelin • Apache Zookeeper …and many more
  • 22. AWS alternatives to open source • Amazon EMR: Hadoop and Spark (Spark, Hive, Presto, Flink, HBase) • Amazon OpenSearch Service: operational analytics (Elasticsearch, Logstash, Kibana) • Amazon Managed Streaming for Apache Kafka: real-time analytics (Kafka)
  • 23. Data analytics pipeline
  • 24. Data management challenges How can customers: • Collect a variety of data types accumulating at varying velocities? • Collect data from numerous sources accumulating at differing velocities? • Store massive amounts of data without running out of space? • Cleanse data and improve its quality before it is analyzed? Can they automate these steps?
  • 25. Data analytics pipeline Data → Collect → Store → Process and analyze → Visualize → Insights. Design considerations: time-to-answer (latency) and the balance of throughput and cost. https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf?did=wp_card&trk=wp_card
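The four pipeline stages can be sketched as plain functions chained together. This is a minimal in-memory illustration only: the event data, function names, and text "chart" are all hypothetical, and no AWS service is called. In a real pipeline each stage would be backed by one of the services listed in this module.

```python
from collections import defaultdict


def collect():
    # Hypothetical click events standing in for any ingested source data.
    return [
        {"user": "a", "page": "home"},
        {"user": "b", "page": "home"},
        {"user": "a", "page": "pricing"},
    ]


def store(events, storage):
    # Keep the raw events unchanged so later stages can reprocess them.
    storage.extend(events)
    return storage


def process(storage):
    # Analyze: count views per page.
    counts = defaultdict(int)
    for event in storage:
        counts[event["page"]] += 1
    return dict(counts)


def visualize(counts):
    # Render one text bar per page, sorted by page name for stable output.
    return [f"{page:<8} {'#' * views}" for page, views in sorted(counts.items())]


storage = []
report = visualize(process(store(collect(), storage)))
for line in report:
    print(line)
```

Keeping the stages as separate functions mirrors the slide's point that each stage can be tuned independently for latency, throughput, and cost.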
  • 26. Data pipeline challenges Building a data pipeline is challenging. Customers must: • Manage updates, patches, and software integrations • Handle increased overhead costs plus the need for support • Maintain focus on the core task of building applications that lead to data insights
  • 27. AWS data analytics pipeline services Stages: Collect, Store, Process and analyze, Visualize, Automate. Services shown: Amazon Kinesis Data Firehose, AWS Direct Connect, Amazon Kinesis Data Streams, AWS Snowball, Amazon S3 Glacier, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Aurora, Amazon OpenSearch Service, Amazon EMR, Amazon Kinesis Data Analytics, Amazon QuickSight, Amazon Redshift, Amazon Athena, AWS Database Migration Service, Amazon SageMaker, AWS Glue, Amazon Managed Streaming for Apache Kafka
  • 28. Data Flywheel
  • 29. Data Flywheel and customer journey Build data-driven applications Modernize data warehouse and build a data lake Migrate data and workloads to the cloud ✓ Save time ✓ Save costs Store and manage data ✓ Agility ✓ Global distribution ✓ Scale and performance ✓ New and faster insights ✓ Broader access to analytics Innovate with machine learning ✓ Better experiences ✓ Deeper engagement ✓ Efficient processes Attract new customers Generate more data https://pages.awscloud.com/data-flywheel.html
  • 30. Summary In this module, you learned about: • Customer challenges related to data analytics • AWS data analytics portfolio • Technical benefits of AWS data analytics solutions • Data analytics pipeline • Data Flywheel
  • 31. Thank you © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
  • 32. Module 3: Data Analytics Solutions on AWS – Part I
  • 33. Objectives In this module, you will learn how to: • Explain data migration options from on premises to the AWS Cloud • Describe two AWS data analytics technical solutions: • Modernizing a data warehouse with Amazon Redshift • Data lakes
  • 34. Data migration options
  • 35. Journey to a modern data architecture Evolution of data architecture: • Traditional data warehousing • Data lakes on AWS • Data warehouse modernization • Types of data • Data governance • Machine learning • Real-time analytics with streaming data
  • 36. AWS data migration options • AWS Snowball (Snowball Edge storage optimized, AWS Snowmobile) • AWS Storage Gateway (file gateway, tape gateway, volume gateway) • Amazon S3 Transfer Acceleration • AWS Direct Connect • AWS Database Migration Service • Amazon Kinesis Data Firehose
  • 37. Solution 1: Modernizing a data warehouse with Amazon Redshift © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. 39
  • 38. Journey to a modern data architecture Evolution of data architecture Traditional data warehousing Data lakes on AWS Data warehouse modernization 100110000100 101011100101 010111001010 100001011111 011010 001111001011 0010110 010001100001 0 Types of data Data governance Machine learning Real-time analytics with streaming data © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. 40
  • 39. Data warehouses 41 © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 40. 44 © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. Traditional architecture and on-premises data warehouse challenges • Difficult to scale • Long lead times for hardware procurement • Complex upgrades are the norm • High overhead costs for administration • Expensive licensing and support costs • Proprietary formats do not support newer open data formats, which results in data silos • Data not cataloged, unreliable quality • Licensing cost limits number of users and how much data can be accommodated • Difficult to integrate with services and tools
  • 41. Amazon Redshift
  • 42. Amazon Redshift: a secure data warehouse that extends seamlessly to a data lake. A fully managed data warehouse that is highly integrated with other AWS services. Features include: • Optimized for high performance • Support for open file formats • Petabyte-scale capability • Support for complex queries and analytics, with data visualization tools • Secure end-to-end encryption and certified compliance • Service Level Agreement (SLA) of 99.9 percent • Based on the open source PostgreSQL database • Cost efficient. Pricing: https://aws.amazon.com/redshift/pricing/
  • 43. Amazon Redshift performance features • Massively parallel processing (MPP): breaks a large job into smaller tasks, then distributes the tasks to multiple compute nodes. Result: faster processing time. • Columnar storage: data from each column is stored together so the data can be accessed faster, without scanning and sorting all other columns. Result: compression of stored data improves performance. • Shared-nothing architecture: independent and resilient nodes without any dependencies. Result: improved scalability.
  • 44. Amazon Redshift architecture. Diagram: client applications connect through Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) to a data warehouse cluster made up of a leader node, compute node 1, and compute node 2, each compute node divided into node slices. https://docs.aws.amazon.com/redshift/index.html
  • 45. Leader node: responsible for communication with the client application and compute nodes. Amazon Redshift leader node: • SQL endpoint • Metadata • Query compilation and optimization • Coordinates parallel SQL processing • Machine learning (ML) optimizations
  • 46. Compute node: runs queries in parallel and returns the result to the leader node • SQL running powerhouses • A compute node can load, unload, back up, and restore data to and from Amazon S3. • Clusters range from 1 to 128 compute nodes.
  • 47. Compute node slices. Slices are a symmetric multiprocessing (SMP) mechanism. • Each compute node is partitioned into slices. • Slices work in parallel to complete operations. • Virtual processors are contained in each compute node. • Each slice is allocated an equal amount of memory, compute allowance, and disk space. • Each slice operates in parallel but can request data from other slices. Diagram example: slice 1 and slice 2 each have a local disk, a virtual core, and 7.5 GB RAM.
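The slice description above can be illustrated with a small simulation. This is only a sketch under stated assumptions: rows are spread over slices by hashing a distribution key, `zlib.crc32` stands in for whatever hash the service uses internally, and the names `assign_slice` and `distribute` are hypothetical helpers, not part of any AWS API.

```python
import zlib
from collections import defaultdict
from typing import Tuple


def assign_slice(dist_key: str, num_nodes: int, slices_per_node: int) -> Tuple[int, int]:
    """Map a distribution key to a (node, slice) pair using a stable hash."""
    total_slices = num_nodes * slices_per_node
    # crc32 is an assumption for illustration; it only needs to be deterministic.
    slice_index = zlib.crc32(dist_key.encode("utf-8")) % total_slices
    # divmod splits the global slice index into (node number, slice within node).
    return divmod(slice_index, slices_per_node)


def distribute(rows, key, num_nodes=2, slices_per_node=2):
    """Group rows by the slice that would own them."""
    placement = defaultdict(list)
    for row in rows:
        target = assign_slice(str(row[key]), num_nodes, slices_per_node)
        placement[target].append(row)
    return dict(placement)


rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
placement = distribute(rows, "customer_id")
for (node, node_slice), owned in sorted(placement.items()):
    print(f"node {node}, slice {node_slice}: {len(owned)} rows")
```

Because every row with the same key lands on the same slice, each slice can work on its share in parallel, which is the behavior the slide describes.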
  • 48. Amazon Redshift cluster resizing: Two approaches. Elastic resize: the existing cluster is modified to add or remove nodes in two stages. Stage 1 • Cluster is temporarily unavailable while elastic resize migrates cluster metadata. • Typically completes in minutes. • Amazon Redshift holds session connections while queries remain queued. Stage 2 • Session connections are reinstated and queries resume. • Data is redistributed to node slices in the background. • Cluster is available for read and write operations. Classic resize: the cluster can be reconfigured to a different node count and instance type. • Might take one or more hours to complete, depending on data size. • Involves streaming all data from the original cluster to the newly configured cluster. • During the resize, the original cluster is in read-only mode. • Customer is charged for only one cluster.
  • 49. Amazon Redshift instance types https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
  • 50. Management interfaces https://us-west-2.console.aws.amazon.com/redshiftv2/home?region=us-west-2#query-editor
  • 51. Amazon Redshift differentiating features
  • 52. Amazon Redshift differentiating features: federated query and Amazon Redshift lake house architecture
  • 53. Federated query. Integrate queries on live data in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL with queries on Amazon Redshift and the data lake. Reduce data moved over the network with Amazon Redshift’s intelligent optimizer, which pushes and distributes portions of computation directly into remote operational databases. Diagram labels: data warehouse, Amazon Aurora, OLTP, ERP, CRM, LOB. Benefits • Incorporate live data into business intelligence (BI) and reporting applications • Ingest data into Amazon Redshift • Query operational databases directly • Apply transformations on the fly • Load data into target tables without complex ETL pipelines
  • 54. Amazon Redshift lake house architecture. With Amazon Redshift lake house architecture, customers can: • Query data in the data lake and write data back in open formats • Use familiar SQL statements to combine and process data across data stores • Run queries on live data in operational databases without requiring data loading and ETL pipelines. Amazon Redshift lake house queries are run by a fleet of nodes that are owned and maintained by AWS. https://aws.amazon.com/redshift/lake-house-architecture/
  • 55. Amazon Redshift lake house: query flow. Components: SQL clients and business intelligence tools (JDBC/ODBC), leader node, compute nodes 1 and 2 with node slices, Amazon Redshift lake house fleet, AWS Glue Data Catalog, and Amazon S3. 1 Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY… 2 Query is optimized and compiled using ML at the leader node, which determines what is run locally and what goes to Amazon Redshift lake house. 3 Query plan is sent to all compute nodes. 4 Compute nodes obtain partition information from the Data Catalog and dynamically prune partitions. 5 Each compute node issues multiple requests to the Amazon Redshift lake house layer. 6 Amazon Redshift lake house nodes scan Amazon S3 data. 7 Amazon Redshift lake house projects, filters, joins, and aggregates. 8 Final aggregations and joins with local Amazon Redshift tables are done in-cluster. 9 Result is sent to client.
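Step 4 of the flow above (pruning partitions using catalog metadata) can be sketched in a few lines. This is a simplified illustration, not the service's implementation: the `Partition` class, the `prune_partitions` helper, and the `s3://example-bucket/...` paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One catalog entry: partition values plus the S3 prefix holding the data."""
    year: int
    month: int
    location: str


def prune_partitions(partitions, year=None, month=None):
    """Keep only the partitions that can contain rows matching the filter."""
    kept = []
    for part in partitions:
        if year is not None and part.year != year:
            continue
        if month is not None and part.month != month:
            continue
        kept.append(part)
    return kept


# Hypothetical catalog: two years of monthly partitions.
catalog = [
    Partition(y, m, f"s3://example-bucket/sales/year={y}/month={m:02d}/")
    for y in (2019, 2020)
    for m in range(1, 13)
]

# A query filtered on year = 2020 and month = 3 needs to scan one prefix, not 24.
to_scan = prune_partitions(catalog, year=2020, month=3)
print([p.location for p in to_scan])
```

The point of the sketch is that the filter is applied to metadata first, so most of the data in Amazon S3 is never read.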
  • 56. Advanced Query Accelerator (AQUA). A new distributed and hardware-accelerated cache that makes Amazon Redshift faster than other cloud data warehouses, without increasing cost. • Minimizes data movement over the network by pushing operations to AQUA nodes • AQUA nodes use custom AWS-designed analytics processors to make operations (compression, encryption, filtering, and aggregations) faster than on traditional CPUs. Diagram: RA3 clusters running in parallel, AQUA nodes each with a custom AWS-designed processor, and Amazon Redshift managed storage.
  • 57. Migration to Amazon Redshift
  • 58. AWS SCT data extractors. AWS SCT extracts data from the legacy data warehouse through local migration agents. Data is optimized for Amazon Redshift and saved in local files. Files are loaded to an Amazon S3 bucket (through the network or AWS Snowball) and then to Amazon Redshift.
  • 59. Use case: Equinox. Equinox sees faster reports, 80% cost savings with Amazon Redshift. Challenge: Their data warehouse had limited integration, was very expensive, and required a lot of platform-specific domain knowledge. They needed to reduce administration and costs, blend structured and semi-structured data for analytics, and evolve into a data lake strategy. Solution: Equinox migrated from a legacy data warehouse to Amazon Redshift to combine data from disparate sources like clickstream data, cycling log data, club management software, and more. They land data directly in an Amazon S3 data lake and perform analytics using Amazon Redshift, Redshift Spectrum, and Amazon EMR. Benefits: Their monthly Amazon Redshift bill is now 20% of prior yearly maintenance of their legacy data warehouse. AWS data lake and analytics reduced report delivery time from months to days. https://www.youtube.com/watch?v=EvDicFx9StE
  • 60. Solution 2: Data lakes
  • 61. Journey to a modern data architecture (roadmap figure repeated)
  • 62. Data lakes defined. Architectural approach for a centralized enterprise data repository stored on Amazon S3 • Stores all structured, semi-structured, unstructured, and binary data at unlimited scale • Holds curated and raw data • Uses AWS data analytics tools for analytics • Increases pace of innovation by extracting insights from data • Enables more organizational agility • Reduces cost and delivers results with predictive analytics and ML. Diagram: a data lake with open formats and a central catalog, serving machine learning, business intelligence and analytics, and data warehousing.
  • 63. Secure data lake on Amazon S3 • Amazon S3 Access Points: multi-tenant bucket; dedicated access points; customer permissions from an Amazon Virtual Private Cloud (Amazon VPC) • Amazon S3 Block Public Access: applies across AWS accounts and at the Amazon S3 bucket level; specify public permissions using an Access Control List (ACL) or policy; four settings: BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets • Amazon S3 object tags: access control, lifecycle policies, and analysis; classify data with metadata; use tags to filter objects; define replication policies; populate tags with AWS Lambda functions or S3 Batch Operations • Amazon S3 object lock: immutable Amazon S3 objects; retention management controls; data protection and compliance • Amazon FSx for Lustre. https://aws.amazon.com/compliance/services-in-scope
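The four Block Public Access settings listed above map onto a small configuration document. The sketch below only builds that document in plain Python; the helper name `block_all_public_access` and the bucket name in the comment are hypothetical, and the commented-out call assumes boto3 is installed and credentials are configured.

```python
def block_all_public_access() -> dict:
    """Return a configuration with all four Block Public Access settings enabled."""
    return {
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore any public ACLs already present
        "BlockPublicPolicy": True,      # reject bucket policies that grant public access
        "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
    }


config = block_all_public_access()
print(config)

# With boto3 available, this dictionary could be applied to a bucket like so
# (not executed here; "example-bucket" is a placeholder name):
#
#   import boto3
#   boto3.client("s3").put_public_access_block(
#       Bucket="example-bucket",
#       PublicAccessBlockConfiguration=config,
#   )
```

Turning on all four settings is a common default for a data lake bucket; individual settings can be relaxed only when a workload genuinely needs public access.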
  • 64. Reference architecture: Data lake on AWS • Central storage: Amazon S3 • Data ingestion: Amazon Kinesis, AWS Direct Connect, AWS Snowball, AWS DMS, AWS Data Exchange • Catalog and search: AWS Glue, Amazon ES, Amazon DynamoDB • Processing and analytics: machine learning, Amazon QuickSight, Amazon EMR, Amazon Redshift, Amazon Athena • Protect and secure: IAM, Amazon CloudWatch, AWS STS, AWS CloudTrail, AWS KMS • Access and user interface: Amazon API Gateway, IAM, Amazon Cognito
  • 65. Data services – AWS Glue
  • 66. Cleansing data. After migration, data still presents challenges: • Data is increasingly diverse: volume, variety, velocity, veracity • It accumulates rapidly • It can contain missing or incorrect data, wrong data formats, and partially missing data • Avoid unsearchable data • It must be cleansed before it is analyzed by many applications. How can customers provide access to users to gain insights?
  • 67. AWS Glue • AWS Glue Data Catalog: Hive metastore compatible with enhanced functionality; crawlers automatically extract metadata and create tables; integrates with Amazon Athena, Amazon EMR, and many more • Job authoring: generates ETL code; builds on open frameworks (Python, Scala, and Apache Spark); developer-centric (editing, debugging, sharing) • Job running: runs jobs on a serverless Spark platform; flexible scheduling, job monitoring, and alerting • Job workflow: orchestrates triggers, crawlers, and jobs; author and monitor entire flows with integrated alerting
  • 68. AWS Glue crawlers. Diagram: an AWS Glue crawler with an AWS IAM role. Data stores: Amazon Redshift, Amazon DynamoDB, Amazon S3, databases. Connection types: JDBC connection, NoSQL connection, object connection. Built-in classifiers (growing): MySQL, MariaDB, PostgreSQL, Amazon Aurora, Oracle, Amazon Redshift, Apache Avro, Parquet, ORC, XML, JSON and JSONPaths, AWS CloudTrail, Binary JSON (BSON), logs, delimited …
  • 69. AWS Glue Data Catalog services: Amazon Redshift lake house, Amazon Athena, AWS Glue ETL, Amazon EMR
  • 70. Use case: Log aggregation with ETL. Sources: AWS service logs, web application logs, server logs. Components: Amazon S3 bucket, AWS Glue crawler, AWS Glue ETL, Amazon S3 bucket, AWS Glue Data Catalog, Amazon Athena. Actions: update table partition, create partition on Amazon S3, query data.
  • 71. Data services – AWS Data Exchange and Amazon Athena
  • 72. AWS Data Exchange: find and subscribe to third-party data in the cloud • Find diverse data in one place: more than 1,000 data products; more than 80 data providers • Access third-party data: streamlined access to data; minimize legal reviews and negotiations • Analyze data: download or copy data to Amazon S3; combine, analyze, and model with existing data
  • 73. Amazon Athena: interactive query service to analyze data in Amazon S3 using standard SQL • No setup costs: zero setup costs, point to Amazon S3 and start querying • Pay per query: pay only for queries run, save 30%–90% on per-query costs through compression • Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types • Streamlined: serverless, zero infrastructure, zero administration, integrated with Amazon QuickSight
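The pay-per-query point above can be made concrete with a little arithmetic. The sketch assumes pricing proportional to bytes scanned; the 5.00 USD per TB rate and the 4:1 compression ratio are illustrative assumptions, not figures from the slides, so current pricing should be checked before relying on the numbers.

```python
TB = 1024 ** 4  # bytes in one tebibyte, used here as the billing unit


def query_cost(bytes_scanned: float, price_per_tb: float = 5.0) -> float:
    """Estimate the cost of one query from the amount of data it scans."""
    return bytes_scanned / TB * price_per_tb


raw_bytes = 1 * TB               # uncompressed data scanned by the query
compressed_bytes = raw_bytes / 4  # assumed 4:1 compression ratio

raw_cost = query_cost(raw_bytes)
compressed_cost = query_cost(compressed_bytes)
savings = 1 - compressed_cost / raw_cost

print(f"raw: ${raw_cost:.2f}, compressed: ${compressed_cost:.2f}, savings: {savings:.0%}")
```

Under these assumptions the saving is 75%, which falls inside the 30%–90% range quoted on the slide; the actual figure depends on how well a given dataset compresses.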
  • 74. AWS Lake Formation
  • 75. Challenges of building a secure data lake. Typical steps to build a secure data lake: 1 Set up storage 2 Move data 3 Cleanse, prepare, and catalog data 4 Configure and enforce security and compliance policies 5 Make data available for analytics. Roles: data engineer (ingestion and cleaning), data security officer (security), data analyst (analytics and machine learning)
  • 76. AWS Lake Formation for a secure data lake • 1 Ingest and organize: automates creating the data lake and data ingestion. • 2 Secure and control: sets up fine-grained access control and data governance. • 3 Collaborate and use: search and data discovery using Data Catalog metadata. To protect data, all access is checked against set policies. • 4 Monitor and audit: based on data access and governance policies, alert notifications are raised on policy violation and logged.
  • 77. AWS Lake Formation builds on AWS Glue. Diagram labels: blueprints; AWS Glue ETL jobs; workflow; AWS Glue crawlers; AWS Glue Data Catalog; connections, databases, tables; monitoring; security, search, collaboration.
  • 78. AWS Lake Formation benefits • Cost effective, durable storage includes global replication capabilities. • Simplified ingest and cleaning enables data engineers to build faster. • Centralized management of fine-grained permissions empowers security officers. • Comprehensive set of integrated tools enables every user equally. Diagram: AWS Lake Formation (blueprints, ML Transforms, Data Catalog, access control) with Amazon S3 data lake storage, alongside Amazon Redshift, Amazon Athena, AWS Glue, Amazon EMR, Amazon QuickSight, and Amazon SageMaker.
  • 79. Data visualization with Amazon QuickSight
  • 80. Amazon QuickSight: BI service built for the cloud with pay-per-session pricing and ML insights • Scalable: automatically scales with use and activity, with no additional infrastructure requirements. Seamlessly grows with customers. • Pay for use: pay monthly or annually. With pay-per-session pricing, customers only pay when they access their reports and dashboards, with no upfront costs. • Serverless and fully managed: fully managed cloud application, meaning there's no upfront cost, software to deploy, capacity planning, maintenance, upgrades, or migrations. • Fully integrated: deeply integrated with data sources and other AWS services like Amazon Redshift, Amazon S3, Athena, Amazon Aurora, Amazon RDS, IAM, AWS CloudTrail, and Amazon Cloud Directory, providing customers with everything they need for an end-to-end cloud BI solution.
  • 81. Serverless data lakes and analytics. Sources: Amazon RDS, web app data, other databases, on-premises data, streaming data. Services: Amazon S3, AWS Glue crawler, AWS Glue Data Catalog, Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, Amazon QuickSight
  • 82. Different users solving different problems. Users: data scientists, data engineers, business reporting. Services and tools: data lake, AWS Glue Data Catalog, Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, Amazon Redshift, Amazon QuickSight, Amazon SageMaker (machine learning), Kibana, Apache Zeppelin, Jupyter, Tableau, MicroStrategy
  • 83. Use case: COVID-19
  • 84. Use case: COVID-19 pandemic. Challenge: The COVID-19 pandemic has stressed healthcare systems, businesses, and economies. It has disrupted the daily lives of people around the world. People need a solution to capture data (diagnosis, mortality, and recovery rates) globally in real time, and turn the data into insights they can share and respond to with confidence. Solution: Amazon worked with APN Partners Salesforce, Tableau, and MuleSoft to create a secure data lake using AWS Data Exchange, AWS Glue, Amazon Athena, and Amazon S3 as a store of trusted data from open source COVID-19 data providers. Benefits: Health workers, scientists, and decision makers can access and compare international data to their local data, enabling understanding and visualization of the impact of COVID-19 locally and globally. This solution enables decision making and deeper insights to help manage and flatten the COVID-19 curve until a vaccine is available.
  • 85. Summary • Evolution of data architecture: traditional data warehousing, data warehouse modernization, data lakes on AWS, real-time analytics with streaming data, data governance, machine learning • AWS data migration options • Amazon Redshift • Amazon S3 • AWS Glue • AWS Data Exchange • Amazon Athena • AWS Lake Formation • Amazon QuickSight
  • 86. Thank you. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
  • 87. Module 4: AWS Data Analytics Solutions – Part II
  • 88. Objectives. In this module, you will learn about three key types of data analytics technical solutions on AWS: • Streaming and real-time analytics with Amazon Kinesis • Data governance • Extended solution: Insights and monetization with machine learning (ML)
  • 89. Solution 3: Streaming and real-time analytics with Amazon Kinesis
  • 90. Journey to a modern data architecture (roadmap figure repeated)
  • 91. Streaming data defined: data that is generated continuously from thousands of data sources, sent simultaneously • Player-game interactions • Geolocation of cars and devices • Music downloads • Website clicks • Social media streams
  • 92. Common use cases: Real-time analytics. The value of data diminishes over time (timeline: milliseconds, seconds, minutes, hours). • Messaging between microservices • Response analytics (web and mobile application notifications) • Log ingestion • Internet of Things (IoT) device maintenance • Change data capture (CDC) • Streaming ETL into data lakes and data warehouses
  • 93. Enabling real-time analytics. Data streaming technology enables a customer to ingest, process, and analyze high volumes of high-velocity data from a variety of sources, in real time.
  • 94. Data streaming solution challenges. Challenges of building on-premises, real-time streaming solutions: • Difficult to set up • Difficult to achieve high availability • Error prone and complex to manage • Tricky to scale • Integration requires development • Expensive to maintain
  • 95. AWS streaming data solutions: efficiently collect, process, and analyze data streams in real time • Amazon Kinesis Data Streams • Amazon Kinesis Data Firehose • Amazon Kinesis Data Analytics
  • 96. Data generators: Simple streaming data patterns • Data producers: mobile and applications, Amazon Kinesis Agent, Amazon Kinesis Data Streams, Amazon CloudWatch Logs, Amazon CloudWatch Events, AWS IoT, Apache Kafka, Amazon Kinesis Producer Library (KPL) • Streaming services: Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, Amazon Kinesis Data Streams • Data consumers: Amazon EMR, Amazon Redshift, Amazon Simple Storage Service (S3), Amazon EC2, Amazon Kinesis Connector Library
  • 97. Amazon Kinesis Data Streams
  • 98. Amazon Kinesis Data Streams: massively scalable, highly durable data ingestion and processing service optimized for real-time data streaming • No upfront cost; low, pay-as-you-go pricing • Data collected is available within 70 milliseconds • Real-time analytics: dashboards, anomaly detection, dynamic pricing • Data is synchronously replicated across 3 Availability Zones in a Region • Data can be stored up to 7 days • Serverless; can scale dynamically to handle MB to TB each hour and thousands to millions of PutRecords each second. https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
  • 99. How Kinesis Data Streams works. Input: capture and send data. Amazon Kinesis Data Streams: ingest and store data streams for processing. Build custom, real-time applications with Amazon Kinesis Data Analytics, Spark on Amazon EMR, Amazon EC2, or AWS Lambda. Output: analyze streaming data using BI tools.
  • 100. Kinesis Data Streams architecture • Data producers: Amazon EC2 instances, client, mobile client, traditional server • Amazon Kinesis data stream: shard 1, shard 2, through shard N • Data record: sequence number, partition key, data blob • Data consumers: EC2 instances, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics • Downstream services: Amazon S3, Amazon DynamoDB, Amazon Redshift, Amazon EMR, Amazon Kinesis Data Firehose. https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
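The record fields above (partition key, sequence number, data blob) hint at how records are routed to shards. The sketch below assumes the commonly documented scheme in which the partition key is hashed with MD5 to a 128-bit integer and each shard owns a contiguous range of that space; splitting the space into equal ranges and the helper name `shard_for_key` are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Assume the hash space is divided into equal, contiguous ranges.
    range_size = HASH_SPACE // shard_count
    # Clamp so the few leftover values at the top fall into the last shard.
    return min(hash_key // range_size, shard_count - 1)


for key in ("player-1", "player-2", "player-3"):
    print(key, "-> shard", shard_for_key(key, shard_count=4))
```

Records that share a partition key always land on the same shard, which is what preserves per-key ordering; a poorly chosen key (for example, one constant value) would send all traffic to a single shard.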
  • 101. Kinesis Data Streams provisioning
  • 102. Amazon Kinesis Data Firehose
  • 103. How Kinesis Data Firehose works. Input: capture and send data. Amazon Kinesis Data Firehose: prepares and loads data continuously to the selected destinations. Durably store the data for analytics in Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, or Splunk. Output: analyze streaming data using analytics tools.
  • 104. Kinesis Data Streams and Kinesis Data Firehose (comparison by characteristic)
    • Processing time. Kinesis Data Streams: as fast as 70 milliseconds after ingestion. Kinesis Data Firehose: between 60 and 900 seconds.
    • Stream storage and duration. Kinesis Data Streams: in shards, default 24 hours and up to 7 days. Kinesis Data Firehose: max buffer size 128 MB and max time 900 seconds.
    • Data transformation and conversion. Kinesis Data Streams: none. Kinesis Data Firehose: uses AWS Lambda and AWS Glue.
    • Data producer: Amazon Kinesis Agent, applications using Amazon Kinesis Producer Library (KPL), AWS SDK for Java, Amazon CloudWatch Logs and CloudWatch Events, AWS IoT.
    • Data consumer. Kinesis Data Streams: AWS Lambda, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, applications using the Kinesis Client Library (KCL) and SDK for Java. Kinesis Data Firehose: Amazon S3, Amazon Redshift, Amazon ES, Splunk, and Amazon Kinesis Data Analytics.
    • Data compression. Kinesis Data Streams: none. Kinesis Data Firehose: gzip, Snappy, Zip, or no data compression.
    https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5 https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
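The buffer limits in the comparison above (128 MB or 900 seconds) suggest a size-or-time flush rule. The sketch below models that rule in plain Python; the `DeliveryBuffer` class is hypothetical, and as a simplification the time limit is only checked when a record arrives, whereas a real delivery service would also flush on a timer.

```python
class DeliveryBuffer:
    """Collect records and hand back a batch when a size or age limit is hit."""

    def __init__(self, max_bytes=128 * 1024 * 1024, max_seconds=900):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # arrival time of the first record in the batch

    def add(self, record: bytes, now: float):
        """Add a record; return the flushed batch, or None if still buffering."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        return self.flush() if too_big or too_old else None

    def flush(self):
        """Return the buffered records and start a new, empty batch."""
        batch = self.records
        self.records, self.size, self.opened_at = [], 0, None
        return batch


# Tiny limits so the behavior is visible: 10 bytes or 60 seconds.
buf = DeliveryBuffer(max_bytes=10, max_seconds=60)
print(buf.add(b"abcd", now=0))   # still buffering
print(buf.add(b"efgh", now=1))   # still buffering
print(buf.add(b"ij", now=2))     # size limit reached, batch is delivered
```

Larger buffers mean fewer, bigger objects at the destination but higher delivery latency, which is why the processing time for this style of delivery is measured in seconds rather than milliseconds.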
  • 105. When to use Kinesis Data Streams and Kinesis Data Firehose. Amazon Kinesis Data Streams: for data streaming applications with massive ingestion requirements • Requires data to be sent to consumer analytics services for millisecond response time • Massively scalable • Data retention time ranging from hours to days • Example: Real-time gaming. Amazon Kinesis Data Firehose: for data streaming applications that require near real-time responses in seconds • Need for data augmentation, data transformation, or data compression • Need to save data to Amazon S3, Amazon Redshift, Amazon ES, Splunk, or send data to Amazon Kinesis Data Analytics for analytics • Example: Log analytics
  • 106. Amazon Kinesis Data Analytics
  • 107. Amazon Kinesis Data Analytics. Input: capture streaming data with Amazon MSK, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, or other data sources. Amazon Kinesis Data Analytics: query and analyze streaming data. Output: send processed data to analytics tools to create alerts and respond in real time.
  • 108. Use case: Clickstream analytics. Evolve from batch processing to real-time analytics. Input: websites send clickstream data. Amazon Kinesis Data Firehose collects the data and sends it to Kinesis Data Analytics. Amazon Kinesis Data Analytics processes data in near-real time. Amazon Kinesis Data Firehose loads processed data into Amazon Redshift. Amazon Redshift runs analytics models to identify content recommendations. Output: readers see personalized content suggestions and increase engagement.
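The near-real-time processing step in this use case is often a windowed aggregation. The sketch below counts page views per fixed (tumbling) window in plain Python; the function name, the 60-second window, and the sample events are assumptions chosen for illustration, and a managed streaming service would express the same idea in SQL or a stream processing framework.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count page views per page within fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, page) tuples.
    """
    windows = defaultdict(Counter)
    for timestamp, page in events:
        # Every timestamp belongs to exactly one window, identified by its start.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][page] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [
    (0, "/home"), (12, "/news"), (59, "/home"),
    (60, "/news"), (61, "/news"),
    (130, "/home"),
]
result = tumbling_window_counts(events)
for start, counts in result.items():
    print(f"window starting at {start}s: {counts}")
```

Per-window counts like these are the kind of processed output that could then be loaded into a data warehouse to drive content recommendations.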
  • 109. Case study: Epic Games. Continually improves Fortnite for 250+ million players globally. Challenge: Needed a way to process and analyze over 100 PB of data (125 million events each minute) ingested from game clients and game servers to understand and adapt to player engagement. Solution: Epic Games turned to AWS for an Amazon S3 data lake in combination with Amazon EMR, Amazon EC2, and Amazon Kinesis. Benefits: The data provides a constant feedback loop for designers, and an up-to-the-minute analysis of gamer satisfaction to drive gamer engagement. https://aws.amazon.com/solutions/case-studies/EPICGames/
  • 110. Solution 4: Data governance
  • 111. Journey to a modern data architecture (roadmap figure repeated)
  • 112. Challenges of data in data lakes • Securing data • Auditing data usage • Managing data access • Safeguarding sensitive data and PII • Maintaining compliance with regulations and mandates
  • 113. Data security and governance. With big data comes big responsibility. More than one in three companies cite data privacy and governance as a hurdle to both digital transformation and IoT initiatives. • 34% of IT decision makers cite ensuring data governance/privacy as one of their organization’s biggest digital transformation challenges • 37% of IT decision makers cite ensuring security/compliance upon movement of data as one of their most important IoT priorities over the next 18–24 months. Source: Enterprise Strategy Group, 2019. https://www.esg-global.com/hubfs/ESG-Infographic-IT-Spending-Intentions-2019.pdf
  • 114. Resolving PII dangers. Personally identifiable information (PII) dangers: Consumer consent violation; Data breach; Spyware; Unsecured devices; Rogue agents; Second-party misuse; Espionage; External hacking. • Do these issues need to be resolved? • Is there a solution architecture that solves all PII issues? • What best practices can be used to mitigate PII dangers?
  • 115. Amazon Macie. Enable Amazon Macie with one click in the AWS Management Console or with a single API call. Continually evaluate the Amazon S3 environment: automatically generates an inventory of Amazon S3 buckets and details on bucket-level security and access controls. Discover sensitive data: analyzes buckets using ML and pattern matching to discover sensitive data, such as PII. Take action: generates findings and sends them to Amazon CloudWatch Events for integration into workflows and remediation actions. Sensitive data types: • Financial • Personal • National • Medical • Credentials and secrets
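The pattern-matching half of sensitive data discovery described on the Amazon Macie slide can be illustrated with a minimal sketch. This is not how Macie is implemented: the two regular expressions, the `PATTERNS` table, and the `find_sensitive` function are hypothetical names chosen for illustration, and a managed service covers far more data types than this.

```python
import re

# Hypothetical patterns for illustration only; a managed service such as
# Amazon Macie covers many more sensitive data types than these two.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b"),
}


def find_sensitive(text):
    """Return a sorted list of (label, matched_text) pairs found in text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return sorted(findings)


if __name__ == "__main__":
    for line in ["Mary's SSN is 000000000", "It is expected to rain tomorrow"]:
        print(line, "->", find_sensitive(line))
```

Pattern matching alone misses free-text items such as names, which is one reason the slide pairs it with ML-based analysis.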
  • 116. De-identified data lake (DIDL) on AWS. A de-identified data lake (DIDL) is an architectural approach that reduces the risks associated with managing data, particularly personally identifiable information (PII). Benefits: • Reduce risk: Remove PII before it enters a data lake • Understand all the data: Create a Data Catalog of an entire data lake • Reduce compliance costs: Automate the discovery, classification, de-identification, and ongoing monitoring of data across an organization • Turn data into an asset, not a liability: Enable a broader set of governed analytic and machine learning use cases
  • 117. Masking PII data
    Before masking:
      Email | Customer ID | Transcript
      csalazar@example.com | 19664 | Just talked to Carlos Salazar
      mary@example.com | 23423 | Mary’s SSN is 000000000
      mateo@example.com | 99644 | Mateo is moving to Nevada
      NA | 02945 | It is expected to rain tomorrow
    After masking:
      Email | Customer ID | Transcript
      4t34gttt | 7462391 | Just talked to Jane Roe
      44e5325 | 1239474 | Jorge’s SSN is 666666666
      0we&yrw | 9983487 | Sofia is moving to Texas
      NA | 3344325 | It is expected to rain tomorrow
    Masked fields: Email; ID; Name, SSN, State
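A minimal sketch of the masking step shown on the slide above, under stated assumptions. The slide replaces names, SSNs, and states with substitute values; this sketch uses a simpler technique instead: keyed hashing (HMAC-SHA256) to tokenize the email and customer ID, and regex redaction for SSNs in the transcript. Names and states are left untouched because detecting them requires entity recognition, which is not shown. `SECRET_KEY`, `tokenize`, and `mask_record` are hypothetical names.

```python
import hashlib
import hmac
import re

# Hypothetical demo key; in practice a key would come from a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def tokenize(value, length=12):
    """Replace a value with a deterministic keyed token; 'NA' passes through."""
    if value == "NA":
        return value
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record):
    """Return a de-identified copy of a record with email, customer_id, transcript."""
    return {
        "email": tokenize(record["email"]),
        "customer_id": tokenize(record["customer_id"]),
        "transcript": SSN_PATTERN.sub("#########", record["transcript"]),
    }


if __name__ == "__main__":
    row = {
        "email": "mary@example.com",
        "customer_id": "23423",
        "transcript": "Mary's SSN is 000000000",
    }
    print(mask_record(row))
```

Because the tokens are deterministic, the same email always maps to the same token, so masked records can still be joined or counted without exposing the original value.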
  • 118. Extended solution 5: Insights and monetization with ML on AWS
  • 119. Journey to a modern data architecture. Evolution of data architecture: Traditional data warehousing; Data lakes on AWS; Real-time analytics with streaming data; Data warehouse modernization; Data governance; Machine learning. [Figure: types of data used]
  • 120. Data lakes and machine learning. Machine learning requires: • More data: Collect all types of data • Flexibility: Define schema during analysis • Scalability: Scale storage and compute (CPU or GPU) independently • Data transformation and processing: Run a broad set of processing and analytics on the same data without movement • Security: Networking, identity, encryption, and compliance. [Diagram labels: OLTP, ERP, CRM, LOB; Data warehouse; Business analytics; Data lake; Devices, Web, Sensors, Social; Data Catalog; AI and machine learning; Data warehouse queries; Big data processing; Interactive; Real time]
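The "Flexibility: Define schema during analysis" requirement above is often called schema-on-read: raw records are stored as-is, and each analysis applies its own schema when it reads them. The sketch below illustrates the idea with in-memory JSON lines. `RAW_EVENTS`, `TEMPERATURE_SCHEMA`, and `read_with_schema` are hypothetical names and sample data, not part of any AWS service.

```python
import json

# Hypothetical raw events, stored without a fixed schema.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": 1}',
    '{"device": "sensor-2", "humidity": 40, "ts": 2}',
]

# Schema chosen by the analyst at read time: field name -> converter.
TEMPERATURE_SCHEMA = {"device": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping only records that have every schema field."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


if __name__ == "__main__":
    print(read_with_schema(RAW_EVENTS, TEMPERATURE_SCHEMA))
```

A second analysis could read the same raw events with a different schema, for example `{"device": str, "ts": int}`, without rewriting the stored data.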
  • 121. Amazon SageMaker: Machine learning at enterprise scale. Build: Notebooks for common problems (• Managed Jupyter for enterprise data science • Sample notebooks for most common use cases); High-performance algorithms (• Single-pass, streaming training algorithms). Train and tune: One-click training (• Training models at scale without DevOps assistance); Hyperparameter optimization (• ML on ML to optimize hyperparameters). Deploy and manage: One-click deployment (• Deploy to production with a single call); Fully managed elastic hosting (• Fully managed, production-grade inferences). https://aws.amazon.com/machine-learning/?nc2=h_ql_prod_ml
  • 122. Machine learning resources. AWS Foundations: How Amazon SageMaker Can Help • Fundamental digital course on how SageMaker mitigates the core challenges of implementing an ML pipeline • Duration: 30 minutes • https://www.aws.training/Details/Video?id=49646. The Machine Learning Pipeline on AWS • Explore how to use the machine learning pipeline to solve a real business problem (intermediate) • Duration: 4 days • https://www.aws.training/SessionSearch?pageNumber=1&courseId=38910. Practical Data Science with Amazon SageMaker • Learn to solve real-world use cases with machine learning (intermediate) • Duration: 1 day • https://www.aws.training/SessionSearch?pageNumber=1&courseId=40748. AWS STP: Machine Learning (ML) on AWS for ML Practitioners - Technical • https://partnercentral.awspartner.com/LmsSsoRedirect?RelayState=%2flearningobject%2fcurriculum%3fid%3d25521
  • 123. Use case: Next Caller https://www.youtube.com/watch?v=K27WjYwyqw8&list=PLhr1KZpdzukdeX8mQ2qO73bg6UKQHYsHb&index=1&did=ta_card&trk=ta_card
  • 124. 2022 – re:Invent – What’s new in Analytics? Amazon OpenSearch: • Serverless (Preview). Redshift: • Multi-AZ for RA3 clusters (Preview) • Auto-copy from S3 (Preview) • Integration for Apache Spark • Amazon Aurora zero-ETL • Dynamic Data Masking (Preview) • AWS Backup support • Streaming Ingestion for Kinesis Data Streams (Preview) • Centralized access control with AWS Lake Formation (Preview). AWS Glue: • Data Quality (Preview) • AWS Glue 4.0 • Custom visual transforms • AWS Glue for Ray (Preview). Amazon Athena: • Spark support. QuickSight: • Automated data preparation • Paginated Reports. © 2022 Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 125. Summary. Evolution of data architecture: Traditional data warehousing; Data lakes on AWS; Real-time analytics with streaming data; Data warehouse modernization; Data governance; Machine learning. Services highlighted: • Kinesis Data Streams • Kinesis Data Firehose • Kinesis Data Analytics; Amazon Macie; Amazon SageMaker
  • 126. © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners. Thank you
  • 127. Module 5: APN Partner Opportunities and Resources
  • 128. Objectives. In this module, you will learn how to: • Describe how to collaborate with AWS for data analytics • Describe AWS Data and Analytics resources for APN Partners: • Competency categories • AWS Immersion Days • AWS Certified Data Analytics and learning resources • Access the AWS Marketplace • Perform the calls to action
  • 129. APN Partners and AWS for Data Analytics
  • 130. Discounting and funding programs • Enterprise Discount Program (EDP) • Migration programs • POC funding
  • 131. AWS Data and Analytics Competency categories • Data Analytics Platforms: Provide a set of integrated tools to solve data analytics challenges within a standard framework • NoSQL/New SQL: Provide highly scalable databases that organize data into a structure • Data Integration and Preparation: Enable customers to move and consolidate data from disparate sources, transform it, and prepare it for analytics • Business Intelligence (BI) and Data Visualization: Help customers turn raw data into actionable business information, such as reporting, dashboards, and data visualization • Data Governance and Security: Help customers discover, categorize, and control their data
  • 132. AWS data analytics solutions and Immersion Days
  • 133. AWS Data Lab program • The AWS Data Lab program offers accelerated joint engineering engagements between a team of customer builders and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives. • Two offerings: Design Lab (focus on real-world architectural design) and Build Lab (focus on providing guidance with building a functioning prototype with a customer team). Duration: Half day to 5 days. Location: Virtual or AWS Data Lab hub (Seattle, NYC, Herndon (VA), London, Bangalore). Cost: Free. Reach out to your APN support team for more information. https://aws.amazon.com/aws-data-lab/
  • 134. AWS Immersion Days. Designed to help APN Advanced and Premier Consulting Partners deliver technical data analytics workshops to their customers and help grow their businesses. • Data Engineering Immersion Day: Build a serverless data lake solution on AWS including modules focusing on ingestion, hydration, exploration, and consumption • Amazon EMR Immersion Day: Focus on unique facets of Amazon EMR for big data workloads • Database Migration Immersion Day: Give your customers a head start with the AWS Database Migration Service and the Schema Conversion Tool • … and many more. Benefits: Access to technical workshop content, AWS usage credits, Market Development Funds (MDF) opportunities, and support from AWS teams. https://aws.amazon.com/partners/immersion-days/
  • 135. Data solutions in AWS Marketplace
  • 136. AWS Marketplace https://aws.amazon.com/marketplace/search/results?searchTerms=data+and+analytics
  • 137. Use case: Modern data warehouse
  • 138. AWS Certified data analytics and learning resources
  • 139. AWS Technical Professional Learning Path
  • 140. AWS Certified Data Analytics – Specialty https://aws.amazon.com/certification/certified-data-analytics-specialty/
  • 141. Why I got AWS certified. Customers talk about the value of AWS certification
  • 142. Partner Cast: Analytics
  • 143. Call to action
  • 144. Build a data analytics practice on AWS • Build packaged solutions • Know your Partner Solutions Architect • Ask for customer references • Engage with AWS service teams • Develop customer workshops • Achieve an APN competency
  • 145. Call to action • Use the Data Flywheel to perform assessments • Work with your Partner team to schedule an Immersion Day for your customers • View the analytics customer case studies: https://aws.amazon.com/big-data/datalakes-and-analytics/ • Create a specialized service around one of the analytics services • Participate in the AWS Data Lab: https://aws.amazon.com/aws-data-lab/ • Prepare for the AWS Data Analytics – Specialty certification • Build relationships with APN teams for funding opportunities for your marketing and sales efforts
  • 146. Thank You