3. AWS Big Data and Analytics Services
Any analytic workload, any scale, at the lowest possible cost
Insights: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend
Analytics: Amazon Redshift + Spectrum (DW), Amazon EMR (big data processing), Amazon Athena (interactive), Amazon Elasticsearch Service, Amazon Kinesis Analytics (real-time)
Data Lake: AWS Glue (ETL & Data Catalog), Amazon S3/Amazon Glacier (storage)
Data Movement: AWS Database Migration Service | AWS Snowball | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams
4. Big Data on AWS
Immediate Availability. Deploy instantly. No hardware to procure, no infrastructure to maintain & scale.
Trusted & Secure. Designed to meet the strictest requirements. Continuously audited, including certifications such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS.
Broad & Deep Capabilities. Over 100 services and hundreds of features to support virtually any big data application & workload.
Hundreds of Partners & Solutions. Get help from a consulting partner or choose from hundreds of tools and applications across the entire data management stack.
6. Traditionally, Analytics Used to Look Like This
OLTP, ERP, CRM, LOB → Data Warehouse → Business Intelligence
• Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
• Large initial capex + $10K–$50K / TB / year
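To make the per-terabyte figure concrete, here is a small Python sketch that turns the slide's $10K–$50K / TB / year range into a yearly cost band. The 100 TB warehouse size is a hypothetical example, and the sketch deliberately leaves out the large initial capex, which the slide does not quantify.

```python
def annual_dw_cost_range(terabytes: float,
                         low_per_tb: float = 10_000,
                         high_per_tb: float = 50_000) -> tuple:
    """Return the (low, high) yearly running cost in dollars.

    Uses the $10K-$50K per TB per year range from the slide;
    the initial capital expenditure is NOT included.
    """
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * low_per_tb, terabytes * high_per_tb


# Hypothetical 100 TB warehouse.
low, high = annual_dw_cost_range(100)
print(f"100 TB: ${low:,.0f} to ${high:,.0f} per year")
```

Because the cost scales linearly with volume, growing from TBs toward PBs multiplies the yearly bill by the same factor, which is the pressure the following slides respond to.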
7. New requirements break the traditional approach
Customers need to:
• Capture and store new non-relational data at EB scale
• Secure and combine data from new and existing sources
• Do new types of analysis (ML, big data & real-time)
Challenges with the traditional approach:
• The DW is optimized for relational data at PB scale
• Data exists in silos; ETL does not scale at EB data volumes
• Analysis is limited to operational and ad hoc reporting on relational data
8. Data Lakes Extend the Traditional Approach
• Relational and non-relational data
• TBs–EBs scale
• Schema defined during analysis
• Diverse analytical engines to gain insights
• Designed for low-cost storage and analytics
Existing sources: OLTP, ERP, CRM, LOB → Data Warehouse → Business Intelligence
New sources: Devices, Web, Sensors, Social → Data Lake (with Catalog)
Analytical engines: Machine Learning, DW Queries, Big data processing, Interactive, Real-time
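The "schema defined during analysis" idea (often called schema-on-read) can be illustrated with a small, self-contained Python sketch. It is not tied to any AWS service, and the sample events and field names are hypothetical. Raw records are stored unchanged, and each analysis applies its own schema only when it reads them.

```python
import json

# Hypothetical raw events landed in the lake as-is; no schema is enforced on write.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2018-10-29T09:00:00"}',
    '{"device": "sensor-2", "temp_c": "19.0", "ts": "2018-10-29T09:00:05", "battery": 88}',
    '{"user": "alice", "page": "/home", "ts": "2018-10-29T09:00:07"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    Keeps only records that contain every field in `schema` and casts
    each field with the callable given for it (e.g. float, str).
    """
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}


# Two analyses interpret the same raw data with different schemas.
sensor_schema = {"device": str, "temp_c": float}
clickstream_schema = {"user": str, "page": str}

sensor_rows = list(read_with_schema(RAW_EVENTS, sensor_schema))
click_rows = list(read_with_schema(RAW_EVENTS, clickstream_schema))
print(sensor_rows)
print(click_rows)
```

Since nothing was rejected or reshaped at load time, sensor and clickstream data coexist in one store, and each consumer decides which fields and types it needs.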
9. Storing is not enough; data needs to be discoverable
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” (Gartner)
Data sources: CRM, ERP, data warehouse, mainframe data, web, social, log files, machine data (structured, semi-structured, and unstructured)
10. Key Components of a Successful Data Strategy
Attributes of a Modern Data Architecture:
1. Automated and Reliable Data Ingestion
2. Preservation of Original Source Data
3. Lifecycle Management and Cold Storage
4. Metadata Capture
5. Managing Governance, Security and Privacy
6. Self-Service Discovery, Search, and Access
7. Managing Data Quality
8. Preparing for Analytics
9. Orchestration and Job Scheduling
10. Capturing Data Change
Key Pillars of a Data Lake:
• Storage & Streams
• Catalogue & Search
• Entitlements
• API & UI
11. Building a Data Lake on AWS
Services shown: Amazon Athena (query service), AWS Batch, AWS Glue, AWS IoT, AWS Lambda, Amazon SageMaker, Amazon QuickSight, Amazon Redshift, Amazon EMR
12. Building a Data Lake on AWS
Automated and reliable data ingestion
13. Building a Data Lake on AWS
Preservation of Original Source Data
Lifecycle Management and Cold Storage
Capturing Data Change
AWS Glue, Amazon Glacier
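Lifecycle management and cold storage on S3 are usually expressed as a lifecycle configuration. The Python sketch below only builds that configuration as a dictionary in the shape S3 expects (Rules with ID, Filter, Status, Transitions); it makes no AWS calls. The `raw/` prefix and the 90-day threshold are hypothetical choices. The rule has no Expiration action, so objects move to the GLACIER storage class but are never deleted, which keeps the original source data preserved.

```python
import json


def cold_storage_lifecycle(prefix: str, days_to_glacier: int) -> dict:
    """Build an S3 lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class after `days_to_glacier` days.

    No Expiration action is included, so original source data is retained.
    """
    if days_to_glacier < 0:
        raise ValueError("days_to_glacier must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical layout: raw source data lives under the "raw/" prefix.
config = cold_storage_lifecycle("raw/", 90)
print(json.dumps(config, indent=2))
```

With boto3, a dictionary of this shape can be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call, along with the target `Bucket` name.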
14. Building a Data Lake on AWS
Metadata Capture
AWS Glue, Amazon Elasticsearch Service, Amazon DynamoDB
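Metadata capture means recording, for each dataset, where it lives and what it contains. The Python sketch below mimics in a very simplified way what a crawler does: it scans sample records, infers column types, and produces a catalog entry. It is a hypothetical illustration, not the AWS Glue Data Catalog API; the dataset name, the `s3://example-data-lake/...` location, and the entry's field names are all made up for the example.

```python
from collections import defaultdict


def infer_schema(records):
    """Map each column name to the sorted list of Python type names observed."""
    seen = defaultdict(set)
    for record in records:
        for column, value in record.items():
            seen[column].add(type(value).__name__)
    return {column: sorted(types) for column, types in sorted(seen.items())}


def catalog_entry(dataset_name, location, records):
    """Hypothetical catalog entry: where the data lives plus its inferred schema."""
    records = list(records)
    return {
        "name": dataset_name,
        "location": location,
        "record_count": len(records),
        "schema": infer_schema(records),
    }


entry = catalog_entry(
    "sensor_readings",
    "s3://example-data-lake/raw/sensors/",  # hypothetical location
    [
        {"device": "sensor-1", "temp_c": 21.5},
        {"device": "sensor-2", "temp_c": 19, "battery": 88},
    ],
)
print(entry)
```

Reporting every observed type (here `temp_c` shows both `float` and `int`) surfaces inconsistencies early, and a searchable index such as Elasticsearch or a key-value store such as DynamoDB can then hold these entries for discovery.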
15. Building a Data Lake on AWS
Managing Governance, Security, Privacy
AWS Glue
16. Building a Data Lake on AWS
Self-Service Discovery, Search, Access
AWS Glue, Amazon Cognito, AWS Identity & Access Management, Amazon API Gateway
17. Building a Data Lake on AWS
Managing Data Quality
AWS Lambda, AWS Glue
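Data quality checks are often written as named rules applied to each record, with failing records quarantined rather than silently dropped. The plain-Python sketch below shows that pattern; logic like this could run inside a Lambda function or a Glue job, but the code is not tied to either, and the rules and sample records are hypothetical.

```python
def check_record(record, rules):
    """Return the names of the rules the record violates (empty if it passes)."""
    return [name for name, predicate in rules.items() if not predicate(record)]


def split_by_quality(records, rules):
    """Split records into (passed, quarantined); quarantined items keep their reasons."""
    passed, quarantined = [], []
    for record in records:
        failures = check_record(record, rules)
        if failures:
            quarantined.append((record, failures))
        else:
            passed.append(record)
    return passed, quarantined


def _is_number(value):
    return isinstance(value, (int, float))


# Hypothetical rules for sensor readings.
RULES = {
    "has_device": lambda r: bool(r.get("device")),
    "temp_is_number": lambda r: _is_number(r.get("temp_c")),
    "temp_in_range": lambda r: _is_number(r.get("temp_c")) and -50 <= r["temp_c"] <= 60,
}

SAMPLE = [
    {"device": "sensor-1", "temp_c": 21.5},
    {"device": "", "temp_c": 19.0},
    {"device": "sensor-3", "temp_c": "hot"},
    {"device": "sensor-4", "temp_c": 999},
]

passed, quarantined = split_by_quality(SAMPLE, RULES)
print(passed)
for record, reasons in quarantined:
    print(record, reasons)
```

Keeping the failure reasons next to each quarantined record makes it possible to report on quality trends and to reprocess records once the upstream issue is fixed.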
18. Building a Data Lake on AWS
Preparing for Analytics
Lambda
AWS Glue
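One common step when preparing data for analytics is writing cleaned records into Hive-style partitions (`key=value` path segments), so that query engines that understand this layout, such as Amazon Athena over partitioned tables, can skip partitions a query does not need. The Python sketch below only computes the target partition path for each record and groups records accordingly; the `curated/sensors/` prefix, the `ts` field, and the year/month/day scheme are hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime


def partition_path(prefix, ts):
    """Hive-style path, e.g. curated/sensors/year=2018/month=10/day=29/."""
    dt = datetime.fromisoformat(ts)
    return f"{prefix}year={dt.year}/month={dt.month:02d}/day={dt.day:02d}/"


def partition_records(records, prefix):
    """Group cleaned records by the partition path they would be written to."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_path(prefix, record["ts"])].append(record)
    return dict(partitions)


SAMPLE = [
    {"device": "sensor-1", "temp_c": 21.5, "ts": "2018-10-29T09:00:00"},
    {"device": "sensor-2", "temp_c": 19.0, "ts": "2018-10-29T17:30:00"},
    {"device": "sensor-1", "temp_c": 20.1, "ts": "2018-10-30T08:15:00"},
]

parts = partition_records(SAMPLE, "curated/sensors/")
for path, rows in sorted(parts.items()):
    print(path, len(rows))
```

Partitioning by date suits time-bounded queries; a different access pattern (for example, per device) would call for a different partition key.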
19. Building a Data Lake on AWS
Orchestration and Job Scheduling
AWS Glue
Lambda
Step Functions
CloudWatch
Simple Workflow Service
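At its core, orchestration means running jobs in an order that respects their dependencies and running independent jobs in parallel. Services such as Step Functions or Glue workflows manage this on AWS; the sketch below is only a local illustration of the dependency logic using Python's standard `graphlib` module (Python 3.9+), not the API of any AWS service. The job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
PIPELINE = {
    "ingest_raw": set(),
    "crawl_catalog": {"ingest_raw"},
    "quality_checks": {"ingest_raw"},
    "prepare_partitions": {"quality_checks", "crawl_catalog"},
    "refresh_dashboard": {"prepare_partitions"},
}


def run_order(pipeline):
    """Return one valid sequential order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(pipeline).static_order())


def parallel_stages(pipeline):
    """Group jobs into stages whose members have no dependencies on each other."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


order = run_order(PIPELINE)
print(order)
print(parallel_stages(PIPELINE))
```

Here `crawl_catalog` and `quality_checks` land in the same stage because both depend only on `ingest_raw`, so a scheduler could run them concurrently.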
23. Serverless Analytics
Deliver cost-effective analytic solutions faster
Sources (Devices, Web, Sensors, Social; AWS IoT) → S3 Data Lake → AWS Glue (ETL & Data Catalog) → Amazon Athena → Amazon QuickSight
• Serverless; zero infrastructure; zero administration
• Never pay for idle resources
• Availability and fault tolerance built in
• Automatically scales resources with usage
24. What do customers say?
https://aws.amazon.com/solutions/case-studies/analytics/
https://aws.amazon.com/solutions/case-studies/big-data/
25. FINRA Analyzes Billions of Transactions Daily
To respond to rapidly changing market dynamics, FINRA moved 75% of its operations to Amazon Web Services, using AWS to analyze 75B records a day.
26. FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion trading events per day and securely store over 5 petabytes of data, attaining savings of $10–20 million per year.
Fraud Detection
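The figure of up to 75 billion trading events per day is easier to grasp as a per-second rate. The Python sketch below is back-of-the-envelope arithmetic only: it gives a daily average, real market traffic is bursty so peak rates would be higher, and it says nothing about how FINRA's system is actually built.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: float) -> float:
    """Average events per second if the daily volume were spread evenly."""
    return events_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(75e9)
print(f"{rate:,.0f} events/second on average")
```

Even as a flat average, this is roughly 868,000 events every second, which illustrates why elastic processing on Amazon EMR over data in Amazon S3 fits this workload.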
27. • AWS enables you to build sophisticated data strategies and related analytics applications
• Retrospective, Real-time, Predictive
• You can build incrementally, adding use cases and increasing scale as you go
• AWS provides a broad range of security and auditing features to enable you to meet your security requirements
https://aws.amazon.com/big-data/
28. Resources
• Prescriptive guidance and rapidly deployable solutions to help you store, analyze, and process big data on the AWS Cloud
• Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
• Deploying a Data Lake on AWS - March 2017 AWS Online Tech Talks
• Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS with AWS Glue, Amazon Athena, and Amazon QuickSight
• From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum
• Implement Continuous Integration and Delivery of Apache Spark Applications using AWS
Links:
http://amzn.to/2vHIwBq
http://amzn.to/2i9gqZn
http://bit.ly/2qipA8h
http://amzn.to/2qpiFaK
http://amzn.to/2lpbc8p
http://amzn.to/2gIJcj8
29. Summary
Cloud and big data are a perfect match: agility and new opportunities
AWS provides comprehensive Analytics, Security, Compliance capabilities
Data Lake requirements and use cases vary
Use AWS Partner Network (APN) and Open Source tools for specific needs
30. Join us for our first-ever Amazon Web Services Summit
in Ottawa on October 29, 2018
15 sessions featuring various
management and technical
topics
Connect with AWS & our
Canadian public sector partners
in the Solutions Expo
Meet and mingle with other public
sector customers from government,
education, and nonprofits
Register today!
aws.amazon.com/summits/ottawa-public-sector
31. We value your feedback!
Please share your feedback on the
AWS Public Sector Summit survey.
Survey will be emailed 24-48 hours following event.