© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Fabrizio Napolitano, Senior Solutions Architect
AWS Public Sector Canada
November 5, 2019
Building data lakes for analytics on
AWS
Paul Save, Product Manager, Data Science
Central 1
125+ million players
Data provides a constant feedback loop
for game designers
Up-to-the-minute analysis of gamer
satisfaction to drive gamer engagement
Resulting in the most popular
game played in the world
Fortnite
How to extract more value from your data?
Growing
exponentially
From new
sources
Increasingly
diverse
Used by
many people
Analyzed by
many applications
How to extract more value from your data?
Complications
Siloed approaches don’t work anymore
It’s too expensive and limiting
to store data on-premises
Implication
A new approach is needed to
extract insights and value
Cloud data lakes are the future
Data lake
Customers want:
To move to a single store, i.e., a data lake in the cloud
To store data securely in standard formats
To grow to any scale with low costs
To analyze their data in a variety of ways
To easily access and analyze data
You need a broad and deep portfolio
Migration & streaming services
Infrastructure Data catalog
& ETL
Security &
management
Dashboards Predictive analytics
Data
warehousing
Big data
processing
Interactive
query
Operational
analytics
Real-time
analytics
Serverless
data processing
Visualization & machine learning
Data movement
Analytics
Data lake infrastructure & management
Data movement
Analytics
Broadest and deepest portfolio,
purpose-built for builders
+ 10 more
Amazon
Redshift
Amazon EMR
(Spark &
Hadoop)
Amazon
Athena
Amazon
Elasticsearch
Service
Amazon
Kinesis Data
Analytics
AWS Glue (Spark &
Python)
Amazon S3 &
Amazon S3 Glacier
AWS
Glue
AWS Lake
Formation
Visualization & machine learning
Amazon
QuickSight
Amazon
SageMaker
Amazon
Comprehend
Amazon
Lex
Amazon
Polly
Amazon
Rekognition
Amazon
Translate
Amazon
Transcribe
Deep learning
AMIs
AWS Database Migration Service | AWS Snowball | AWS Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
Data lake infrastructure & management
Amazon SageMaker
Frameworks Interfaces
Amazon EC2 P3
& P3dn
Amazon
EC2 C5
FPGAs AWS IoT
Greengrass
Amazon Elastic
Inference
The Amazon ML stack
Broadest & deepest set of capabilities
AI services
ML frameworks & infrastructure
Amazon
Rekognition
Image
Amazon Polly
Amazon Transcribe
Amazon
Translate
Amazon
Comprehend
& Amazon Comprehend
Medical
Amazon
Rekognition
Video
Amazon
Textract
Amazon
Forecast
Amazon
Personalize
Amazon
Lex
Vision Speech Language Chatbots Forecasting Recommendations
Infrastructure
Pre-built algorithms & notebooks
Data labeling (Amazon SageMaker Ground Truth)
One-click model training & tuning
Optimization (NEO)
One-click deployment & hosting
Reinforcement learning
Algorithms & models (AWS Marketplace for ML)
Build Train Deploy
ML services
Services for security and governance
Compliance
AWS Artifact
Amazon Inspector
AWS CloudHSM
Amazon Cognito
AWS CloudTrail
Security
Amazon GuardDuty
AWS Shield
AWS WAF
Amazon Macie
Amazon VPC
Encryption
AWS Certificate Manager
AWS Key Management
Service
Encryption at rest
Encryption in transit
Bring your own keys,
HSM support
Identity
AWS IAM
AWS SSO
Amazon Cloud Directory
AWS Directory Service
AWS Organizations
Customers need to have multiple levels of security, identity and access management,
encryption, and compliance to secure their data lake
Decouple compute and storage,
choice of PAYG analytics services
Storage
Amazon S3 tiers &
intelligent tiering
From $0.023 per
GB/mo. to as low as
$0.004 per GB/mo.
Compute
Spot & Reserved
Instances
Save up to 90%
off On-Demand
prices
Amazon
EMR
Automatic scaling
57% less than
on-premises
per IDC report
Amazon Redshift
Less than a tenth
of the cost of
traditional solutions
Athena & Amazon
QuickSight
Serverless pay
only for what you use
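A rough worked example of the storage figures above, using only the two per-GB prices quoted on the slide. The 100 TB data lake size and the 1 TB = 1,024 GB conversion are assumptions for illustration, and the prices are the 2019 figures from the slide, not current pricing.

```python
# Per-GB monthly prices quoted on the slide (USD).
PRICE_HIGH = 0.023  # the slide's "from" storage price
PRICE_LOW = 0.004   # the slide's "as low as" storage price

GB_PER_TB = 1024  # assumption: binary terabytes


def monthly_storage_cost(terabytes: float, price_per_gb: float) -> float:
    """Return the monthly storage cost in USD, rounded to cents."""
    return round(terabytes * GB_PER_TB * price_per_gb, 2)


if __name__ == "__main__":
    size_tb = 100  # hypothetical data lake size
    high = monthly_storage_cost(size_tb, PRICE_HIGH)
    low = monthly_storage_cost(size_tb, PRICE_LOW)
    print(f"{size_tb} TB at ${PRICE_HIGH}/GB: ${high:,.2f} per month")
    print(f"{size_tb} TB at ${PRICE_LOW}/GB: ${low:,.2f} per month")
    print(f"Reduction from tiering: {1 - low / high:.0%}")
```

The comparison covers storage only; request, retrieval, and transfer charges are left out.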
More data lakes and analytics than anywhere else
More than 10,000 data lakes on AWS
Most partners to complement AWS offerings
Data lake infrastructure
& management solutions
Infrastructure Data catalog
& ETL
Security &
management
Data lake infrastructure & management
S3
Lake Formation & AWS Glue
Snowball Kinesis
Data Streams
Snowmobile Kinesis
Data Firehose
Amazon
Redshift
Amazon EMR
Athena
Kinesis
Amazon ES
Robust data lake infrastructure
Amazon
SageMaker
Comprehend
Amazon
Rekognition
Durable and available; exabyte scale
Secure, compliant, auditable
Object-level controls for fine-grained access
Fast performance by retrieving subsets of data
Decoupling of compute and storage
On-demand resources, tiering, cost choices
Data lake infrastructure
& management
Build on robust data lake infrastructure
with Amazon S3
✔ 99.999999999% durability
✔ Global replication capabilities
✔ Management features
✔ Cost-effective storage classes
✔ Most partner integrations
Data lake infrastructure
& management
MPAC Valuation Engine Runs 5,000% Faster Using AWS
MPAC provides a property assessment system for
Canada. It is based in Pickering, Ontario.
• Public sector company evaluates properties across
Canada for use in establishing taxes
• Migrated off traditional IT architecture to AWS for
greater speed and agility
• Main valuation engine now runs 5,000 percent faster at
one-tenth the cost of previous architecture
• Developers release new features every one to two
weeks instead of three to six months in the past
“AWS has had a transformational
effect on our business, enabling
us to serve our business clients
better and faster than we ever
have before.”
Nicole McNeill, Chief Financial Officer
Data lake infrastructure
& management
Set up a catalog, ETL, and data prep
with AWS Glue
Serverless provisioning, configuration,
and scaling to run your ETL jobs on
Apache Spark
Pay only for the resources used for jobs
Crawl your data sources, identify data
formats, and suggest schemas and
transformations
Automates the effort in building,
maintaining and running ETL jobs
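To make "identify data formats and suggest schemas" concrete, here is a toy sketch of schema inference over sample records. It is not how AWS Glue crawlers work internally; the type names and the widening rule are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def infer_type(value: Any) -> str:
    """Map a Python value to a coarse column type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def suggest_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Suggest one type per column; fall back to string when values disagree."""
    schema: Dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            seen = infer_type(value)
            if seen == "null":
                schema.setdefault(column, "null")
                continue
            current = schema.get(column, "null")
            if current in ("null", seen):
                schema[column] = seen
            elif {current, seen} == {"bigint", "double"}:
                schema[column] = "double"  # widen integers to floating point
            else:
                schema[column] = "string"
    # Columns that only ever held nulls default to string.
    return {col: ("string" if typ == "null" else typ) for col, typ in schema.items()}
```

A suggested schema like this is only a starting point; in practice it would be reviewed before ETL jobs depend on it.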
Data lake infrastructure
& management
Challenges to making a secure data lake
Data lake infrastructure
& management
Build a secure data lake in days
with AWS Lake Formation
Move, store, catalog,
and clean your data faster
with machine learning
Enforce security policies
across multiple services
Empower analysts and data
scientists to gain and
manage new insights
Data lake infrastructure
& management
Data lake infrastructure
& management
“With an enterprise-ready
option like Lake Formation,
we will be able to spend
more time deriving value
from our data rather than
doing the heavy lifting
involved in manually
setting up and managing
our data lake.”
Joshua Couch, VP engineering,
Fender Digital
Analytics solutions
Data
warehousing
Big data
processing
Interactive
query
Operational
analytics
Real-time
analytics
Serverless
data processing
Big data processing with Apache Spark & Hadoop
with Amazon EMR
Easy to use notebooks
Low cost vs on-premises
Elastic automatic scaling
Reliable 99.9% SLA
Secure with encryption and keys
Flexible, open source choice
Analytics
Enterprise-grade Easy Lowest cost
Analytics
FINRA’s legacy system did not
scale to handle 75 billion events
per day. It needed to run
complex surveillance queries
over 20+ PB of data.
FINRA migrated its big data
appliance to an Amazon S3 data
lake, and it uses Amazon EMR
for ingestion and processing.
National Bank of Canada Uses AWS to Generate New Revenue
National Bank of Canada is a leading Canadian
financial services organization.
• Wanted to more easily scale its data analysis platform
• Runs data analysis using the TickVault platform on
the AWS Cloud
• Scales to process and analyze hundreds of terabytes
of financial data
• Conducts data manipulations in one minute instead of
days
• Optimizes its trading operations and generates more
revenue
“The speed and performance
of AWS are impressive.
Data-manipulation processes
that took days are now down
to one minute.”
Pascal Bergeron
Director of Algorithmic Trading
Analytics
The Forrester Wave
Cloud Hadoop/Spark Platforms
Q1 2019
The 11 providers that matter most
and how they stack up
by Noel Yuhanna and Mike Gualtieri
February 13, 2019
The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and
Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave
is a graphical representation of Forrester's call on a market and is plotted using a
detailed spreadsheet with exposed scores, weightings, and comments. Forrester
does not endorse any vendor, product, or service depicted in the Forrester Wave.
Information is based on best available resources. Opinions reflect judgment at the
time and are subject to change.
Data warehouse for business reporting
with Amazon Redshift
Fast: Up to 10x faster than traditional
data warehouses
Easy to set up, deploy, and manage
Cost-effective
Scale on-demand for large data
volume and high query concurrency
Query data in open formats directly
from the data lake
Analytics
CHALLENGE
Needed to analyze data to find
insights, identify opportunities, and
evaluate business performance.
The Oracle DW did not scale and was
difficult and costly to maintain.
SOLUTION
Deployed a data lake with Amazon S3,
and run analytics with Amazon
Redshift, Amazon Redshift Spectrum,
and Amazon EMR.
Result: They doubled the data stored
(100PB), lowered costs, and were able
to gain insights faster.
50 PB of data
600,000 analytics jobs/day
Analytics
Real-time analytics for timely insights
with Amazon Kinesis
Make streaming data available to
multiple real-time analytics applications
Run streaming applications without
managing any infrastructure
Durable to reduce the probability
of data loss
Scalable to process data from hundreds
of thousands of sources with low latencies
Analytics
Operational analytics for logs and search
with Amazon Elasticsearch Service
Fully managed; deploy
production-ready cluster
in minutes
Direct access to Elasticsearch
open-source APIs, Logstash,
and Kibana
VPC support; at-rest and
in-transit encryption
Scale up and down easily
Analytics
Interactive analysis
with Amazon Athena
Interactive query service to analyze data in
Amazon S3 using standard SQL
No infrastructure to set up or manage, and
no data to load
Ability to run SQL queries on data archived
in Amazon S3 Glacier
(coming soon)
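A minimal sketch of what "standard SQL on data in Amazon S3" looks like from code. The helper below only builds the request for the Athena StartQueryExecution API; the table, database, and bucket names are hypothetical, and the boto3 call is left as a comment because it needs AWS credentials.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for the Athena StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table, database, and bucket names.
request = build_athena_request(
    sql="SELECT event_type, COUNT(*) AS events FROM clickstream GROUP BY event_type",
    database="example_lake",
    output_location="s3://example-athena-results/",
)

# With credentials configured, the request could be submitted like this:
#   import boto3
#   athena = boto3.client("athena")
#   query_id = athena.start_query_execution(**request)["QueryExecutionId"]
```

Query results land in the S3 output location, so there is no cluster to size or data to load first.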
Analytics
Serverless analytics
Deliver on-demand analytics on the data lake
Amazon
S3
Data lake
AWS Glue
(ETL &
Data Catalog)
Athena
Amazon
QuickSight
Serverless. Zero
infrastructure. Zero
administration
Never pay for
idle resources
Availability and
fault tolerance
built in
Automatically
scales resources
with usage
AWS IoT
Core
AI/ML
Devices Web Sensors Social
Analytics
Visualization & machine
learning solutions
Dashboards Predictive analytics
Visualization & machine learning
Visual insights for everyone
with Amazon QuickSight
Pay only for what you use
Scale to tens of thousands of users
Embedded analytics
Build end-to-end BI solutions
Visualization &
machine learning
Advanced insights for everyone
with Amazon ML & AI services
Frameworks and interfaces for
machine learning practitioners
Platform services that make it
easy for any developer to get
started and get deep with ML
Application services that enable
developers to plug pre-built
AI functionality into their apps
Visualization &
machine learning
Amazon S3
raw data Initial training data
is annotated by
human labelers
Active learning model
is trained from human
labeled data
Ambiguous data is sent to human
labelers for annotation
Human labeled data is then sent
back to retrain and improve the
machine learning model
Training data the
model understands is
automatically labeled
An accurate training dataset
is ready for use in
Amazon SageMaker
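The labeling loop above can be sketched as a confidence-based routing step: predictions the model is sure about are labeled automatically, and the rest go to human labelers. The 0.9 threshold and the tuple layout are assumptions for illustration; the slides do not state how Amazon SageMaker Ground Truth decides which data is ambiguous.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, str, float]  # (item id, predicted label, confidence)


def route_predictions(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[Dict[str, str], List[str]]:
    """Split predictions into auto-labeled items and items for human review."""
    auto_labeled: Dict[str, str] = {}
    needs_human: List[str] = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled[item_id] = label
        else:
            needs_human.append(item_id)
    return auto_labeled, needs_human
```

Items returned in the second list would be annotated by people and fed back as training data, which is the retraining step the slide describes.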
Central 1 & AWS
2019-Nov-05
Paul Save, Product Manager, Data Science
Central 1
What is a Credit Union?
Full service
financial
institutions
Use a
cooperative
business
model
Safe and
stable –
provincially
regulated &
insured
Owned by
their
members –
people who
bank with
them
Profits have a
purpose – to
benefit the
people they were
built to serve
Boards are
elected
members from
local
community
As of May 2018
Credit unions in Canada
My Team
Data Engineers
Data Scientists
Product Manager, Data Science
The need: to provide the right services and
products to the right customers at the right time
Why AWS?
Model Acceleration
Solutions/Product
Selection &
Integration
Governance, Quality
& Data Protection
Scale & Cost
Data & Integration
Workforce Capabilities
Value, Strategy &
Adoption
Data Ecosystem
Output
Engage the right people at the right time
Precarious
Active Member
Optimal Retainment
Period
Asset loss
Churn
Feature
Feature
Feature
Next best
action
Probability * Cost
Jane
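One plausible reading of "Probability * Cost" is an expected-loss score: the predicted churn probability multiplied by the value lost if the member leaves, used to decide who to contact first. The sketch below assumes that reading; the field names and all figures are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    churn_probability: float  # model output, between 0 and 1
    cost_if_lost: float       # estimated value lost if the member leaves


def expected_loss(member: Member) -> float:
    """Probability * Cost, as on the slide."""
    return member.churn_probability * member.cost_if_lost


def rank_for_outreach(members: List[Member]) -> List[Member]:
    """Order members from highest to lowest expected loss."""
    return sorted(members, key=expected_loss, reverse=True)


# Made-up figures for illustration only.
members = [
    Member("Jane", churn_probability=0.30, cost_if_lost=20_000.0),
    Member("Member B", churn_probability=0.80, cost_if_lost=2_000.0),
    Member("Member C", churn_probability=0.05, cost_if_lost=50_000.0),
]
ranked = rank_for_outreach(members)
```

Ranking by the product rather than by probability alone means a likely churner with little at stake can rank below a less likely churner with a lot at stake.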
Complexity & Value of the models
~600 features created 15 models explored to determine
appropriateness for retention
10,000+ models trained
for tuning
>90% accuracy
on predictions
Continuous Improvement of the
model with marketing results data
How our system
works
Overall flow
Credit
Union C1
CRM
1
2
3
Credit Union AWS Encryption CLI
AWS KMS
AWS Transfer for
SFTP
Step Functions Workflow
Secure Landing
Decrypt Lambda
Function
EC2 Instance
Raw
Refinement
Lambda Function
Glue Job
Refined
Credit Union Side Secure VPC
VPC Endpoint
VPC Endpoint
1. Invoke
2. GenerateDataKey
3. Encrypt and transfer
4. CreateObject
5. Trigger
6. Launch
7. Decrypt
8. Security Stuff
9. CreateObject
10. Trigger
11. Start
12. CreateObject
1
Consumption
Prepare
& Send
Receive,
Decrypt +
Security
Process
& Refine
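The "CreateObject" then "Trigger" steps in the flow above can be sketched as a Lambda function reacting to an S3 event notification. This is an illustrative handler, not Central 1's code: it only extracts the bucket and key of each new object, and the decryption or workflow start that would follow is left as a comment.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def objects_from_s3_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: list the newly created objects to process."""
    objects = objects_from_s3_event(event)
    # A real function would decrypt the object or start the workflow here.
    return {"received": [f"s3://{b}/{k}" for b, k in objects]}
```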
Step Functions Workflow
3. Data Validations
1. Crawler Lambda 2. Raw Crawler
Date Validation Null Validation Duplicate Record
3. Data Refinement
4. Crawler Lambda 5. Refined Crawler
Glue Data Catalog
S3 Metadata Bucket
Event Topic
Simple Notification Service
Lambda Event Subscriber
Dynamo Event Table
SQS Message
Feed Metadata JSON
…
Other Validations
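The three validations named above (date, null, duplicate record) can be sketched in plain Python. The field names and the date format are assumptions; the slides do not show how the actual checks are implemented.

```python
from datetime import datetime
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def invalid_dates(records: List[Record], field: str, fmt: str = "%Y-%m-%d") -> List[int]:
    """Return indexes of records whose date field is missing or does not parse."""
    bad = []
    for i, record in enumerate(records):
        try:
            datetime.strptime(record.get(field) or "", fmt)
        except ValueError:
            bad.append(i)
    return bad


def null_fields(records: List[Record], required: List[str]) -> List[int]:
    """Return indexes of records missing any required field."""
    return [i for i, r in enumerate(records)
            if any(r.get(f) in (None, "") for f in required)]


def duplicate_records(records: List[Record], key: str) -> List[int]:
    """Return indexes of records whose key value was already seen."""
    seen, dupes = set(), []
    for i, record in enumerate(records):
        value = record.get(key)
        if value in seen:
            dupes.append(i)
        seen.add(value)
    return dupes
```

Each check returns row indexes rather than raising, so a workflow step can report every failing row in one pass.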
Questions?
Paul Save
Product Manager, Data Science
psave@central1.com
Enhancing the
fan experience
One week of NFL games now creates 3TB
of data. NFL uses Amazon SageMaker to
analyze telemetry data to predict plays.
Computations that could take months to
refine now take only weeks or days.
Visualization &
machine learning
“Developing for the cutting
edge of medical education
means you need to have
access to cutting edge
features and services.
With AWS and Amazon Polly,
we have been able to
develop a new standard in
healthcare education.”
Sarah Third
IT & Communications Manager
Visualization &
machine learning
Thank you

 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building Data Lakes for Analytics on AWS

  • 2. Building data lakes for analytics on AWS. Fabrizio Napolitano, Senior Solutions Architect, AWS Public Sector Canada. Paul Save, Product Manager, Data Science, Central 1. November 5, 2019. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 3. Fortnite: 125+ million players. Data provides a constant feedback loop for game designers, with up-to-the-minute analysis of gamer satisfaction to drive gamer engagement, resulting in the most popular game played in the world.
  • 4. How to extract more value from your data? Data is growing exponentially, comes from new sources, is increasingly diverse, is used by many people, and is analyzed by many applications.
  • 5. How to extract more value from your data? Complications: siloed approaches don't work anymore; it's too expensive and limiting to store data on-premises. Implication: a new approach is needed to extract insights and value.
  • 6. Cloud data lakes are the future. Customers want: to move to a single store, i.e., a data lake in the cloud; to store data securely in standard formats; to grow to any scale with low costs; to analyze their data in a variety of ways; to easily access and analyze data.
  • 7. You need a broad and deep portfolio. Data movement: migration & streaming services. Data lake infrastructure & management: infrastructure; data catalog & ETL; security & management. Analytics: data warehousing; big data processing; interactive query; operational analytics; real-time analytics; serverless data processing. Visualization & machine learning: dashboards; predictive analytics.
  • 8. Broadest and deepest portfolio, purpose-built for builders (+ 10 more). Data movement: AWS Database Migration Service | AWS Snowball | AWS Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka. Data lake infrastructure & management: Amazon S3 & Amazon S3 Glacier; AWS Glue; AWS Lake Formation. Analytics: Amazon Redshift; Amazon EMR (Spark & Hadoop); Amazon Athena; Amazon Elasticsearch Service; Amazon Kinesis Data Analytics; AWS Glue (Spark & Python). Visualization & machine learning: Amazon QuickSight; Amazon SageMaker; Amazon Comprehend; Amazon Lex; Amazon Polly; Amazon Rekognition; Amazon Translate; Amazon Transcribe; Deep learning AMIs.
  • 9. The Amazon ML stack: broadest & deepest set of capabilities. AI services: Vision (Amazon Rekognition Image, Amazon Rekognition Video, Amazon Textract); Speech (Amazon Polly, Amazon Transcribe); Language (Amazon Translate, Amazon Comprehend & Amazon Comprehend Medical); Chatbots (Amazon Lex); Forecasting (Amazon Forecast); Recommendations (Amazon Personalize). ML services: Amazon SageMaker (Build, Train, Deploy): pre-built algorithms & notebooks; data labeling (Amazon SageMaker Ground Truth); one-click model training & tuning; optimization (Neo); one-click deployment & hosting; reinforcement learning; algorithms & models (AWS Marketplace for ML). ML frameworks & infrastructure: frameworks; interfaces; infrastructure (Amazon EC2 P3 & P3dn, Amazon EC2 C5, FPGAs, AWS IoT Greengrass, Amazon Elastic Inference).
  • 10. Services for security and governance. Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake. Compliance: AWS Artifact; Amazon Inspector; AWS CloudHSM; Amazon Cognito; AWS CloudTrail. Security: Amazon GuardDuty; AWS Shield; AWS WAF; Amazon Macie; Amazon VPC. Encryption: AWS Certificate Manager; AWS Key Management Service; encryption at rest; encryption in transit; bring your own keys, HSM support. Identity: AWS IAM; AWS SSO; Amazon Cloud Directory; AWS Directory Service; AWS Organizations.
  • 11. Decouple compute and storage, with a choice of pay-as-you-go (PAYG) analytics services. Storage: Amazon S3 tiers & intelligent tiering, from $0.023 per GB/mo. to as low as $0.004 per GB/mo. Compute: Spot & Reserved Instances, save up to 90% off On-Demand prices. Amazon EMR: automatic scaling, 57% less than on-premises per IDC report. Amazon Redshift: less than a tenth of the cost of traditional solutions. Athena & Amazon QuickSight: serverless, pay only for what you use.
  • 12. More data lakes and analytics than anywhere else: more than 10,000 data lakes on AWS.
  • 13. Most partners to complement AWS offerings.
  • 14. Data lake infrastructure & management solutions: infrastructure; data catalog & ETL; security & management.
  • 15. Robust data lake infrastructure. Durable and available; exabyte scale. Secure, compliant, auditable. Object-level controls for fine-grained access. Fast performance by retrieving subsets of data. Decoupling of compute and storage. On-demand resources, tiering, cost choices. Services shown: S3; Lake Formation & AWS Glue; Snowball; Snowmobile; Kinesis Data Streams; Kinesis Data Firehose; Amazon Redshift; Amazon EMR; Athena; Kinesis; Amazon ES; Amazon SageMaker; Comprehend; Amazon Rekognition.
  • 16. Build on robust data lake infrastructure with Amazon S3: ✔ 99.999999999% (11 nines) durability ✔ Global replication capabilities ✔ Management features ✔ Cost-effective storage classes ✔ Most partner integrations
  • 17. MPAC Valuation Engine Runs 5,000% Faster Using AWS. MPAC provides a property assessment system for Canada. It is based in Pickering, Ontario. Public sector company evaluates properties across Canada for use in establishing taxes; migrated off traditional IT architecture to AWS for greater speed and agility; main valuation engine now runs 5,000 percent faster at one-tenth the cost of previous architecture; developers release new features every one to two weeks instead of three to six months in the past. “AWS has had a transformational effect on our business, enabling us to serve our business clients better and faster than we ever have before.” Nicole McNeill, Chief Financial Officer
  • 18. Set up a catalog, ETL, and data prep with AWS Glue. Serverless provisioning, configuration, and scaling to run your ETL jobs on Apache Spark. Pay only for the resources used for jobs. Crawl your data sources, identify data formats, and suggest schemas and transformations. Automates the effort in building, maintaining, and running ETL jobs.
  • 19. Challenges to making a secure data lake.
  • 20. Build a secure data lake in days with AWS Lake Formation. Move, store, catalog, and clean your data faster with machine learning. Enforce security policies across multiple services. Empower analysts and data scientists to gain and manage new insights.
  • 21. “With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.” Joshua Couch, VP engineering, Fender Digital
  • 22. Analytics solutions: data warehousing; big data processing; interactive query; operational analytics; real-time analytics; serverless data processing.
  • 23. Big data processing with Apache Spark & Hadoop with Amazon EMR. Easy-to-use notebooks; low cost vs. on-premises; elastic automatic scaling; reliable, with a 99.9% SLA; secure with encryption and keys; flexible, open source choice. Enterprise-grade; easy; lowest cost.
  • 24. FINRA’s legacy system did not scale to handle 75 billion events per day. It needed to run complex surveillance queries over 20+ PB of data. FINRA migrated its big data appliance to an Amazon S3 data lake, and it uses Amazon EMR for ingestion and processing.
  • 25. National Bank of Canada Uses AWS to Generate New Revenue. National Bank of Canada is a leading Canadian financial services organization. Wanted to more easily scale its data analysis platform; runs data analysis using the TickVault platform on the AWS Cloud; scales to process and analyze hundreds of terabytes of financial data; conducts data manipulations in one minute instead of days; optimizes its trading operations and generates more revenue. “The speed and performance of AWS are impressive. Data-manipulation processes that took days are now down to one minute.” Pascal Bergeron, Director of Algorithmic Trading
  • 26. The Forrester Wave: Cloud Hadoop/Spark Platforms, Q1 2019. The 11 providers that matter most and how they stack up, by Noel Yuhanna and Mike Gualtieri, February 13, 2019. The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
  • 27. Data warehouse for business reporting with Amazon Redshift. Fast: up to 10x faster than traditional data warehouses. Easy to set up, deploy, and manage. Cost-effective. Scale on demand for large data volume and high query concurrency. Query data in open formats directly from the data lake.
  • 28. Challenge: needed to analyze data to find insights, identify opportunities, and evaluate business performance. The Oracle data warehouse did not scale, was difficult to maintain, and was costly. Solution: deployed a data lake with Amazon S3, and runs analytics with Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR. Result: they doubled the data stored (100 PB), lowered costs, and were able to gain insights faster. 50 PB of data; 600,000 analytics jobs/day.
  • 29. Real-time analytics for timely insights with Amazon Kinesis. Make streaming data available to multiple real-time analytics applications. Run streaming applications without managing any infrastructure. Durable to reduce the probability of data loss. Scalable to process data from hundreds of thousands of sources with low latencies.
  • 30. Operational analytics for logs and search with Amazon Elasticsearch. Fully managed; deploy a production-ready cluster in minutes. Direct access to Elasticsearch open-source APIs, Logstash, and Kibana. VPC support; at-rest and in-transit encryption. Scale up and down easily.
  • 31. Interactive analysis with Amazon Athena. Interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage, and no data to load. Ability to run SQL queries on data archived in Amazon S3 Glacier (coming soon).
  • 32. Serverless analytics: deliver on-demand analytics on the data lake. Sources: devices; web; sensors; social. Services shown: AWS IoT Core; Amazon S3 data lake; AWS Glue (ETL & Data Catalog); Athena; Amazon QuickSight; AI/ML. Serverless: zero infrastructure, zero administration. Never pay for idle resources. Availability and fault tolerance built in. Automatically scales resources with usage.
  • 33. Visualization & machine learning solutions: dashboards; predictive analytics.
  • 34. Visual insights for everyone with Amazon QuickSight. Pay only for what you use. Scale to tens of thousands of users. Embedded analytics. Build end-to-end BI solutions.
  • 35. Advanced insights for everyone with Amazon ML & AI services. Frameworks and interfaces for machine learning practitioners. Platform services that make it easy for any developer to get started and get deep with ML. Application services that enable developers to plug pre-built AI functionality into their apps. Data labeling workflow: Amazon S3 raw data; initial training data is annotated by human labelers; an active learning model is trained from human-labeled data; ambiguous data is sent to human labelers for annotation; human-labeled data is then sent back to retrain and improve the machine learning model; training data the model understands is automatically labeled; an accurate training dataset is ready for use in Amazon SageMaker.
  • 36. Central 1 & AWS 2019-Nov-05 Paul Save, Product Manager, Data Science
  • 38. What is a credit union? Full-service financial institutions. Use a cooperative business model. Safe and stable: provincially regulated & insured. Owned by their members: people who bank with them. Profits have a purpose: to benefit the people they were built to serve. Boards are elected members from the local community.
  • 39. Credit unions in Canada (as of May 2018).
  • 40. My team: data engineers; data scientists; product manager, data science.
  • 41. The need: to provide the right services and products to the right customers at the right time
  • 42. Why AWS? Model Acceleration; Solutions/Product Selection & Integration; Governance, Quality & Data Protection; Scale & Cost; Data & Integration; Workforce Capabilities; Value, Strategy & Adoption; Data Ecosystem.
  • 44. Engage the right people at the right time. Diagram labels: Active Member; Precarious; Optimal Retainment Period; Asset loss; Churn. Example member (Jane): Feature, Feature, Feature; next best action; Probability * Cost.
  • 45. Complexity & value of the models: ~600 features created; 15 models explored to determine appropriateness for retention; 10,000+ models trained for tuning; >90% accuracy on predictions; continuous improvement of the model with marketing results data.
  • 48. Architecture diagram. Phases: Prepare & Send (credit union side); Receive, Decrypt + Security (secure VPC); Process & Refine; Consumption. Components: Credit Union; AWS Encryption CLI; AWS KMS; AWS Transfer for SFTP; VPC endpoints; Step Functions workflow; Secure Landing; Decrypt Lambda function; EC2 instance; Raw; Refinement Lambda function; Glue job; Refined. Steps: 1. Invoke; 2. GenerateDataKey; 3. Encrypt and transfer; 4. CreateObject; 5. Trigger; 6. Launch; 7. Decrypt; 8. Security Stuff; 9. CreateObject; 10. Trigger; 11. Start; 12. CreateObject.
  • 49. Step Functions workflow: 1. Crawler Lambda; 2. Raw Crawler; 3. Data Validations (date validation; null validation; duplicate record; … other validations); 3. Data Refinement; 4. Crawler Lambda; 5. Refined Crawler. Supporting components: Glue Data Catalog; S3 Metadata Bucket (metadata JSON); Event Topic (Simple Notification Service); Lambda Event Subscriber; Dynamo Event Table; SQS Message Feed.
  • 50. Questions? Paul Save Product Manager, Data Science psave@central1.com
  • 51. Enhancing the fan experience. One week of NFL games now creates 3 TB of data. The NFL uses Amazon SageMaker to analyze telemetry data to predict plays. Computations that could take months to refine now take only weeks or days.
  • 52. “Developing for the cutting edge of medical education means you need to have access to cutting edge features and services. With AWS and Amazon Polly, we have been able to develop a new standard in healthcare education.” Sarah Third, IT & Communications Manager. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
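
Slide 11 quotes Amazon S3 storage prices ranging from $0.023 per GB/mo. down to $0.004 per GB/mo. As a rough worked example of what that range means, the sketch below compares the two quoted prices for a hypothetical 100 TB data lake. The data volume, the 1 TB = 1,000 GB convention, and the function name are assumptions for illustration; real bills also depend on region, storage class, and request or retrieval charges.

```python
# Per-GB monthly prices as quoted on slide 11 (2019 figures).
PRICE_HIGH_PER_GB = 0.023
PRICE_LOW_PER_GB = 0.004


def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Return the monthly storage cost in dollars for a given volume."""
    return size_gb * price_per_gb


# Hypothetical data lake size: 100 TB, taking 1 TB = 1,000 GB.
size_gb = 100_000

high = monthly_storage_cost(size_gb, PRICE_HIGH_PER_GB)
low = monthly_storage_cost(size_gb, PRICE_LOW_PER_GB)
saving = 1 - low / high  # fraction saved at the lower price

print(f"higher price: ${high:,.2f}/mo")
print(f"lower price:  ${low:,.2f}/mo")
print(f"difference:   {saving:.1%}")
```

The comparison covers storage only; it says nothing about how quickly or cheaply data in the lower-priced tier can be read back.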
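
Slide 44 summarizes next-best-action scoring as "Probability * Cost". One way to read that, sketched below, is to multiply a member's predicted churn probability by the assets that would be lost if they left, then contact the members with the largest expected loss first. This is an interpretation, not Central 1's actual model: the `Member` fields, the sample figures, and the members other than Jane are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    churn_probability: float  # hypothetical model output in [0, 1]
    assets_at_risk: float     # hypothetical dollar value lost on churn


def expected_loss(member: Member) -> float:
    """Probability * Cost: the expected asset loss for one member."""
    return member.churn_probability * member.assets_at_risk


def rank_for_outreach(members: list[Member]) -> list[Member]:
    """Order members so the largest expected loss comes first."""
    return sorted(members, key=expected_loss, reverse=True)


members = [
    Member("Jane", churn_probability=0.8, assets_at_risk=10_000.0),
    Member("Ali", churn_probability=0.3, assets_at_risk=50_000.0),
    Member("Sam", churn_probability=0.9, assets_at_risk=1_000.0),
]

ranked = rank_for_outreach(members)
for m in ranked:
    print(m.name, round(expected_loss(m), 2))
```

Note that the member most likely to churn (Sam) ranks last here, because the amount at risk is small; weighting probability by cost is what changes the ordering.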
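
Slide 49 lists three data validations in the Step Functions workflow: date validation, null validation, and duplicate record. The slides do not show how these checks are implemented, so the following is only a minimal plain-Python sketch of what such checks could look like on a batch of records. The field names, the date format, and the error tuple layout are all assumptions.

```python
from datetime import datetime


def validate_records(records, required_fields, date_field, key_field,
                     date_format="%Y-%m-%d"):
    """Return a list of (row_index, check_name, field_name) problems."""
    errors = []
    seen_keys = set()
    for index, record in enumerate(records):
        # Null validation: every required field must be present and non-empty.
        for field in required_fields:
            if record.get(field) in (None, ""):
                errors.append((index, "null", field))

        # Date validation: the date field must parse with the expected format.
        value = record.get(date_field)
        if value not in (None, ""):
            try:
                datetime.strptime(value, date_format)
            except (ValueError, TypeError):
                errors.append((index, "date", date_field))

        # Duplicate record: the key must not have appeared in an earlier row.
        key = record.get(key_field)
        if key is not None:
            if key in seen_keys:
                errors.append((index, "duplicate", key_field))
            else:
                seen_keys.add(key)
    return errors


# Hypothetical sample batch.
records = [
    {"member_id": "m1", "opened": "2019-11-05", "balance": 10},
    {"member_id": "m2", "opened": "2019-13-40", "balance": 5},
    {"member_id": "m1", "opened": "2019-10-01", "balance": None},
]

errors = validate_records(
    records,
    required_fields=["member_id", "opened", "balance"],
    date_field="opened",
    key_field="member_id",
)
for error in errors:
    print(error)
```

Returning all problems rather than failing on the first one lets a workflow step decide whether to reject the whole batch or pass only the clean rows on to refinement.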