©2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
BI & Analytics - Datalakes on AWS
Johan Broman
Manager, Solutions Architecture
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Data Drives Better Decision Making
Business Outcomes on a Modern Data Architecture
Outcome 1: Modernize and consolidate
• Insights to enhance business applications and create new digital services
Outcome 2: Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3: Real-time engagement
• Interactive customer experience, event-driven automation, fraud detection
Outcome 4: Automate for expansive reach
• Automation of business processes and physical infrastructure
Legacy Data Architectures Exist as Isolated Data Silos
• Hadoop Cluster
• SQL Database
• Data Warehouse Appliance
Enter Data Lake Architectures
A Data Lake is a new and increasingly popular architecture to store and analyze massive volumes and heterogeneous types of data.
Benefits of a Data Lake
• Store and analyse all of your data, from all of your sources, in one centralised location.
• Quickly ingest data without needing to force it into a pre-defined schema.
• Separating your storage and compute allows you to scale each component as required.
• A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
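As a rough illustration of applying a schema on read rather than on write, the sketch below keeps raw JSON lines untouched and projects them onto a schema only when they are queried. The sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are hypothetical and are not part of any AWS service.

```python
import json

# Raw events are stored exactly as produced; no schema is enforced on write.
RAW_LINES = [
    '{"user": "ana", "amount": "12.50", "ts": "2018-05-01"}',
    '{"user": "bo", "amount": "7", "extra": "ignored"}',
]

# Hypothetical schema chosen by the analyst at query time: column -> caster.
SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the schema, casting as we go."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Missing fields become None; fields outside the schema are dropped.
        rows.append({
            col: cast(record[col]) if col in record else None
            for col, cast in schema.items()
        })
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
total = sum(r["amount"] for r in rows)
```

A different analysis could read the same raw lines with a different schema, for example one that also includes `ts`, without rewriting the stored data.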
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Expanding access requirements
Data scientists, Automation / events, Business users, Data analysts, Engagement platforms
1. More personas need access to data, through appropriate tools
2. More systems need to link to data for decision and process automation
3. Users need to be able to find information, and access it securely
Exponential growth of business data
Transactions, ERP, Connected devices, Social media, Web logs / cookies
1. Data must be captured from diverse sources at speed and scale
2. Data needs to be pulled together, breaking down traditional silos
3. Benefits need to far outweigh the costs of collection and analysis
Important Components of a Data Lake
• Catalogue & Search
• Protect & Secure
• Access & User Interface
• Ingest & Store
AWS Approach to Data Lakes
Data Lakes Extend the Traditional Approach
• Relational and non-relational data
• TBs-EBs scale
• Schema defined during analysis
• Diverse analytical engines to gain insights
• Designed for low-cost storage and analytics
[Diagram labels: OLTP, ERP, CRM, LOB; Data warehouse; Business intelligence; Data lake; Devices, Web, Sensors, Social; Catalog; Machine learning; DW queries; Big data processing; Interactive; Real-time]
S3 is key in the Data Lake
Building a Data Lake on AWS
[Diagram labels: Kinesis Firehose; Athena Query Service]
Why Amazon S3 for the Data Lake?
Durable
§ Designed for 11 9s of durability
Available
§ Designed for 99.99% availability
High performance
§ Multipart upload
§ Range GET
Scalable
§ Store as much as you need
§ Scale storage and compute independently
§ No minimum usage commitments
Integrated
§ Amazon Redshift / Spectrum
§ Amazon EMR
§ Amazon Athena
§ Amazon DynamoDB
Easy to use
§ Simple REST API
§ AWS SDKs
§ Read-after-create consistency
§ Event notification
§ Lifecycle policies
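Multipart upload and Range GET both rely on splitting an object into byte ranges that can be transferred in parallel. The following is a minimal sketch of that splitting only; `byte_ranges` and `range_header` are hypothetical helpers, the sizes are toy values, and no S3 call is made.

```python
def byte_ranges(object_size, part_size):
    """Split an object of object_size bytes into inclusive (start, end) ranges."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < object_size:
        # The last range is shortened so it never runs past the object's end.
        end = min(start + part_size, object_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Format an HTTP Range header value for one inclusive byte range."""
    return f"bytes={start}-{end}"


# Toy sizes for readability; real transfers would use parts of several megabytes.
parts = byte_ranges(object_size=25, part_size=10)
headers = [range_header(s, e) for s, e in parts]
```

Each header value could accompany a separate GET request for the same object, so a reader fetches only the slice it needs or downloads slices concurrently.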
Implement the right cloud security controls
Security
§ Identity and Access Management (IAM) policies
§ Bucket policies
§ Access Control Lists (ACLs)
§ Private VPC endpoints to Amazon S3
§ Pre-signed S3 URLs
Encryption
§ SSL endpoints
§ Server Side Encryption (SSE-S3)
§ S3 Server Side Encryption with provided keys (SSE-C, SSE-KMS)
§ Client-side Encryption
Audit & Compliance
§ Bucket access logs
§ Lifecycle Management Policies
§ Versioning & MFA Delete
§ Certifications – HIPAA, PCI, SOC 1/2/3 etc.
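To make the bucket policy and SSL endpoint items concrete, here is a small sketch that builds a bucket policy document denying any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name and the `deny_insecure_transport_policy` helper are hypothetical; the sketch only produces JSON and does not attach the policy to a bucket.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that denies any request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-data-lake-bucket" is a placeholder name, not a real bucket.
policy = deny_insecure_transport_policy("example-data-lake-bucket")
policy_json = json.dumps(policy, indent=2)
```

Because the statement is an explicit deny, it would take precedence over any allow granted through IAM policies or ACLs for plain HTTP requests.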
Data Ingestion into S3
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• AWS Storage Gateway
• S3 Transfer Acceleration
Storing is not enough. Data needs to be discoverable.
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”
Gartner
CRM, ERP, Data warehouse, Mainframe data, Web, Social, Log files, Machine data, Semi-structured, Unstructured
AWS Glue: Data Catalog
Make data discoverable
Automatically discovers data and stores schema
Catalog makes data searchable and available for ETL
Catalog contains table and job definitions
Computes statistics to make queries efficient
[Diagram labels: Compliance; AWS Glue Data Catalog; Discover data and extract schema]
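The idea of discovering data and extracting a schema can be sketched with a toy inference pass over sample records. This is an illustration of the general technique only, not how AWS Glue classifies data; the type names, the widening rules, and the `infer_type` and `infer_schema` helpers are assumptions made for the example.

```python
def infer_type(value):
    """Map a Python value to a coarse column type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records):
    """Union the fields seen across records; widen types when they conflict."""
    schema = {}
    for record in records:
        for field, value in record.items():
            seen = schema.get(field)
            current = infer_type(value)
            if seen is None or seen == current:
                schema[field] = current
            elif {seen, current} == {"bigint", "double"}:
                schema[field] = "double"  # integers widen safely to doubles
            else:
                schema[field] = "string"  # fall back to the most general type
    return schema


# Hypothetical sample of semi-structured records with differing fields.
sample = [
    {"id": 1, "price": 9.5, "active": True},
    {"id": 2, "price": 10, "note": "promo"},
]
schema = infer_schema(sample)
```

A catalog entry built from such a result would let query engines and ETL jobs look up the table definition instead of re-scanning the raw files.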
Data preparation accounts for ~80% of the work.
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
AWS Glue: ETL Service
Make ETL scripting and deployment easy
Automatically generates ETL code
Code is customizable with Python and Spark
Endpoints provided to edit, debug, & test code
Jobs are scheduled or event-based
Serverless
Amazon Athena: Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
Query Instantly
Zero setup cost; just point to Amazon S3 and start querying.
Pay per query
Pay only for queries run; save 30–90% on per-query costs through compression.
Open
ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types.
Easy
Serverless: zero infrastructure, zero administration. Integrated with Amazon QuickSight.
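The compression saving quoted above follows from simple arithmetic if per-query cost is assumed to be proportional to the bytes scanned. The sketch below works through one case; the 5.0 USD per terabyte default is a placeholder rather than a quoted price, and the table size and 4:1 compression ratio are hypothetical.

```python
def query_cost_usd(bytes_scanned, price_per_tb_usd=5.0):
    """Cost of one query when billing is proportional to bytes scanned."""
    terabyte = 10 ** 12  # assumption: decimal terabyte
    return bytes_scanned / terabyte * price_per_tb_usd


def savings_fraction(raw_bytes, compression_ratio):
    """Fraction of per-query cost saved when data shrinks by compression_ratio."""
    compressed_bytes = raw_bytes / compression_ratio
    return 1 - query_cost_usd(compressed_bytes) / query_cost_usd(raw_bytes)


raw = 2 * 10 ** 12                         # hypothetical table: 2 TB uncompressed
cost_raw = query_cost_usd(raw)             # full scan of the uncompressed data
cost_compressed = query_cost_usd(raw / 4)  # same scan after 4:1 compression
saved = savings_fraction(raw, compression_ratio=4)
```

Under these assumptions a 4:1 ratio saves 75% per full scan, which sits inside the 30–90% band; the saving depends only on the ratio, not on the price per terabyte.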
QuickSight Overview
Integrated with AWS - Redshift, RDS, Athena, S3, IAM, Roles, CloudTrail and more
Cloud Native - Fully managed, serverless analytics at scale
Super Fast and Easy to Use - Backed by SPICE and a beautiful UI
Cost Effective - Starts at $9 per user per month
Putting it all together…
Summary of AWS Analytics, Database & AI Tools
Amazon Redshift: Enterprise Data Warehouse
Amazon EMR: Hadoop/Spark
Amazon Athena: Clusterless SQL
AWS Glue: Clusterless ETL
Amazon Aurora: Managed Relational Database
Amazon Machine Learning: Predictive Analytics
Amazon QuickSight: Business Intelligence/Visualization
Amazon Elasticsearch Service: Elasticsearch
Amazon ElastiCache: Redis In-memory Datastore
Amazon DynamoDB: Managed NoSQL Database
Amazon Rekognition: Deep Learning-based Image Recognition
Amazon Lex: Voice or Text Chatbots
Queries Against an Amazon S3 Data Lake
Event-driven ETL Pipelines
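One common way to drive such a pipeline is to react to S3 event notifications as new objects arrive. The sketch below parses a notification payload and selects objects for processing; the `raw/` prefix convention, the bucket name, and the `handler` function are hypothetical, and a real pipeline would start a transform job where this sketch only returns a list.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context=None):
    """Hypothetical Lambda-style entry point: pick raw objects for an ETL step."""
    to_process = [
        (bucket, key)
        for bucket, key in objects_from_s3_event(event)
        if key.startswith("raw/")  # assumed prefix convention for new data
    ]
    # A real pipeline would start a transform job here; this sketch only reports.
    return {"queued": to_process}


# Trimmed-down sample payload containing only the fields the sketch reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-lake"},
                "object": {"key": "raw/2018/05/web+logs.json"}}},
        {"s3": {"bucket": {"name": "example-lake"},
                "object": {"key": "curated/report.parquet"}}},
    ]
}
result = handler(sample_event)
```

Filtering on a prefix keeps the job from re-triggering on its own output when raw and curated data share a bucket.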
AWS Solution Builder - Data Lake on AWS
• Reference Architecture deployment via CloudFormation
• Configures core services to tag, search and catalogue datasets
• Deploys a console to search and browse available datasets
http://amzn.to/2nTVjcp
Processing & Analytics
Categories: Real-time; Batch; AI & Predictive; BI & Data Visualization; Transactional & RDBMS
Services shown:
• AWS Lambda
• Apache Storm on EMR
• Apache Flink on EMR
• Spark Streaming on EMR
• Elasticsearch Service
• Kinesis Analytics, Kinesis Streams
• DynamoDB (NoSQL DB)
• Aurora (Relational Database)
• EMR (Hadoop, Spark, Presto)
• Redshift (Data Warehouse)
• Athena (Query Service)
• Amazon Lex (Speech recognition)
• Amazon Rekognition
• Amazon Polly (Text to speech)
• Machine Learning (Predictive analytics)
• Kinesis Streams & Firehose
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Case Study: Re-architecting Compliance
“For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren’t able to do before, and that is priceless.”
- Steve Randich, CIO
What FINRA needed
• Infrastructure for its market surveillance platform
• Support of analysis and storage of approximately 75 billion market events every day
Why they chose AWS
• Fulfillment of FINRA’s security requirements
• Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3
Benefits realized
• Increased agility, speed, and cost savings
• Estimated savings of $10-20m annually by using AWS
AWS Solution Builder - Data Lake on AWS
• Reference Architecture deployment via CloudFormation
• Configures core services to tag, search and catalogue datasets
• Deploys a console to search and browse available datasets
http://amzn.to/2nTVjcp
Thank you!

OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

BI & Analytics - A Datalake on AWS

  • 1. BI & Analytics - Datalakes on AWS. Johan Broman, Manager, Solutions Architecture. ©2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Today's conversation: Business drivers for a Data Lake; Designing and building; Production use cases.
  • 3. Data Drives Better Decision Making
  • 4. Business Outcomes on a Modern Data Architecture. Outcome 1: Modernize and consolidate (insights to enhance business applications and create new digital services). Outcome 2: Innovate for new revenues (personalization, demand forecasting, risk analysis). Outcome 3: Real-time engagement (interactive customer experience, event-driven automation, fraud detection). Outcome 4: Automate for expansive reach (automation of business processes and physical infrastructure).
  • 7. Legacy Data Architectures Exist as Isolated Data Silos: Hadoop Cluster; SQL Database; Data Warehouse Appliance.
  • 8. Enter Data Lake Architectures. A Data Lake is a new and increasingly popular architecture to store and analyze massive volumes and heterogeneous types of data.
  • 9. Benefits of a Data Lake. Store and analyse all of your data, from all of your sources, in one centralised location. Quickly ingest data without needing to force it into a pre-defined schema. Separating your storage and compute allows you to scale each component as required. A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
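To make "schemas on read, not write" concrete, here is a minimal sketch in plain Python. It is an illustration only, not something shown in the deck: the sample events, the column names, and the `read_with_schema` helper are all hypothetical. Raw records are kept exactly as they arrived, and a schema is applied only when a particular analysis reads them.

```python
import json

# Raw events are stored exactly as they arrive; no schema is enforced on write.
# (Hypothetical sample data.)
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.90", "country": "SE"}',
    '{"user": "b2", "amount": "5", "device": "mobile"}',
]

# A schema chosen at analysis time: column name -> (type caster, default value).
# The column names and defaults are assumptions made for this example.
PURCHASE_SCHEMA = {
    "user": (str, None),
    "amount": (float, 0.0),
    "country": (str, "unknown"),
}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project each record onto the given schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, (cast, default) in schema.items():
            value = record.get(column)
            # Missing fields fall back to the default; extra fields are ignored.
            row[column] = cast(value) if value is not None else default
        rows.append(row)
    return rows


rows = read_with_schema(RAW_EVENTS, PURCHASE_SCHEMA)
```

A different analysis could read the same raw lines with a different schema (for example one that keeps `device`), without rewriting what is stored.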
  • 10. Today's conversation: Business drivers for a Data Lake; Designing and building; Production use cases.
  • 11. Expanding access requirements. Consumers: data scientists; automation / events; business users; data analysts; engagement platforms. (1) More personas need access to data, through appropriate tools. (2) More systems need to link to data for decision and process automation. (3) Users need to be able to find information, and access it securely.
  • 12. Exponential growth of business data. Sources: transactions; ERP; connected devices; social media; web logs / cookies. (1) Data must be captured from diverse sources at speed and scale. (2) Data needs to be pulled together, breaking down traditional silos. (3) Benefits need to far outweigh the costs of collection and analysis.
  • 13. Important Components of a Data Lake: Catalogue & Search; Protect & Secure; Access & User Interface; Ingest & Store.
  • 14. AWS Approach to Data Lakes
  • 15. Data Lakes Extend the Traditional Approach. Relational and non-relational data. TBs-EBs scale. Schema defined during analysis. Diverse analytical engines to gain insights. Designed for low-cost storage and analytics. Diagram labels: OLTP, ERP, CRM, LOB; Data warehouse; Business intelligence; Data lake; Devices, Web, Sensors, Social; Catalog; Machine learning; DW queries; Big data processing; Interactive; Real-time.
  • 16. S3 is key in the Data Lake
  • 17. Building a Data Lake on AWS. Diagram labels: Kinesis Firehose; Athena Query Service.
  • 18. Why Amazon S3 for the Data Lake? Durable: designed for 11 9s of durability. Available: designed for 99.99% availability. High performance: multipart upload; Range GET. Scalable: store as much as you need; scale storage and compute independently; no minimum usage commitments. Integrated: Amazon Redshift / Spectrum; Amazon EMR; Amazon Athena; Amazon DynamoDB. Easy to use: simple REST API; AWS SDKs; read-after-create consistency; event notification; lifecycle policies.
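The "11 9s" and "99.99%" design targets are easier to reason about as arithmetic. The sketch below assumes the usual reading of those figures, namely that durability is an annual per-object figure and availability is the fraction of the year the service responds; the function names are hypothetical, and these are design targets rather than guarantees.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60


def failure_fraction(nines):
    """Return the complement of an 'N nines' target, e.g. 4 nines -> 1/10000."""
    return Fraction(1, 10 ** nines)


def expected_objects_lost_per_year(object_count, durability_nines=11):
    """Expected number of objects lost per year at the given durability target."""
    return object_count * failure_fraction(durability_nines)


def expected_downtime_minutes_per_year(availability_nines=4):
    """Expected minutes of unavailability per year at the given availability target."""
    return MINUTES_PER_YEAR * failure_fraction(availability_nines)


# Ten million stored objects at eleven 9s of durability.
loss = expected_objects_lost_per_year(10_000_000)
years_per_lost_object = 1 / loss          # 10000 years per expected lost object
downtime = expected_downtime_minutes_per_year()  # 52.56 minutes per year
```

Exact fractions are used instead of floats so that values such as 1e-11 do not pick up rounding noise.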
  • 19. Implement the right cloud security controls. Security: Identity and Access Management (IAM) policies; bucket policies; Access Control Lists (ACLs); private VPC endpoints to Amazon S3; pre-signed S3 URLs. Encryption: SSL endpoints; Server Side Encryption (SSE-S3); S3 Server Side Encryption with provided keys (SSE-C, SSE-KMS); client-side encryption. Audit & Compliance: bucket access logs; lifecycle management policies; versioning & MFA deletes; certifications: HIPAA, PCI, SOC 1/2/3 etc.
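As one way to combine two of the controls listed above (bucket policies and encryption), the following sketch builds a bucket policy document as a Python dictionary. It is an illustration under assumptions, not a policy from the deck: the bucket name and the `build_bucket_policy` helper are hypothetical, and the choice to require SSE-KMS rather than SSE-S3 is arbitrary. The first statement denies requests that do not use SSL; the second denies uploads that do not request SSE-KMS.

```python
import json


def build_bucket_policy(bucket_name):
    """Return a bucket policy that requires SSL and SSE-KMS on uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that is not made over an SSL endpoint.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that do not ask for SSE-KMS encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
        ],
    }


# "example-data-lake-raw" is a placeholder bucket name.
policy_json = json.dumps(build_bucket_policy("example-data-lake-raw"), indent=2)
```

The resulting JSON string is what would be attached to the bucket; it should be reviewed against the account's own IAM setup before use.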
  • 20. Data Ingestion into S3: AWS Direct Connect; AWS Snowball; ISV Connectors; Amazon Kinesis Firehose; AWS Storage Gateway; S3 Transfer Acceleration.
  • 21. Storing is not enough. Data needs to be discoverable. "Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing)." (Gartner) Sources shown: CRM; ERP; data warehouse; mainframe data; web; social; log files; machine data; semi-structured; unstructured.
  • 22. AWS Glue: Data Catalog. Make data discoverable. Automatically discovers data and stores schema. Catalog makes data searchable and available for ETL. Catalog contains table and job definitions. Computes statistics to make queries efficient. Diagram labels: Compliance; AWS Glue Data Catalog; Discover data and extract schema.
  • 23. Data preparation accounts for ~80% of the work. Chart categories: building training sets; cleaning and organizing data; collecting data sets; mining data for patterns; refining algorithms; other.
  • 24. AWS Glue: ETL Service. Make ETL scripting and deployment easy. Automatically generates ETL code. Code is customizable with Python and Spark. Endpoints provided to edit, debug, & test code. Jobs are scheduled or event-based. Serverless.
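One possible wiring for an "event-based" job is an S3 event notification that invokes a small function, which in turn requests an ETL job run for each new object. The sketch below is an assumption about how that could look, not the deck's architecture: the job name `raw-to-parquet`, the `--source_path` argument, and the sample event are hypothetical. It only plans the runs, so it can be executed without AWS credentials.

```python
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "raw-to-parquet"  # hypothetical job name


def plan_job_runs(event):
    """Turn an S3 event notification into a list of ETL job run requests."""
    runs = []
    for record in event.get("Records", []):
        # Only newly created objects should trigger a run.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        runs.append({
            "JobName": GLUE_JOB_NAME,
            "Arguments": {"--source_path": f"s3://{bucket}/{key}"},
        })
    return runs


def handler(event, context):
    """Lambda-style entry point that reports which job runs it would start."""
    runs = plan_job_runs(event)
    # In a deployed function, each entry could be passed to the Glue API, e.g.
    # boto3.client("glue").start_job_run(**run). That call is left out here
    # so the sketch runs offline.
    return {"planned_runs": runs}


# Hypothetical sample notification: one upload and one delete.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-data-lake-raw"},
                "object": {"key": "logs/2018/web+log+1.json"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-data-lake-raw"},
                "object": {"key": "logs/2018/old.json"},
            },
        },
    ]
}

result = handler(sample_event, None)
```

Keeping the planning step separate from the API call makes the routing logic easy to test with recorded events.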
  • 25. Amazon Athena: Interactive Analysis. Interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage and no data to load. Ability to run SQL queries on data archived in Amazon Glacier (coming soon). Query instantly: zero setup cost; just point to Amazon S3 and start querying. Pay per query: pay only for queries run; save 30–90% on per-query costs through compression. Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types. Easy: serverless, zero infrastructure, zero administration; integrated with Amazon QuickSight.
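The link between "pay per query" and the compression savings can be sketched as simple arithmetic, assuming the charge is proportional to the bytes a query scans. Everything numeric below is an assumption for illustration: the price per terabyte is a placeholder parameter (check current pricing), the terabyte is taken as 2^40 bytes, and the data sizes are invented.

```python
TB = 1024 ** 4  # bytes per terabyte as used in this sketch (an assumption)


def query_cost(bytes_scanned, price_per_tb):
    """Cost of one query when billing is proportional to bytes scanned."""
    return bytes_scanned / TB * price_per_tb


def savings_fraction(raw_bytes, compressed_bytes):
    """Fraction of the per-query cost saved by scanning compressed data."""
    return 1.0 - compressed_bytes / raw_bytes


PRICE_PER_TB = 5.0           # placeholder price, not taken from the deck
raw_scan = 2 * TB            # hypothetical: query scans 2 TB of raw text
compressed_scan = TB // 2    # hypothetical: same data compressed to 0.5 TB

cost_raw = query_cost(raw_scan, PRICE_PER_TB)
cost_compressed = query_cost(compressed_scan, PRICE_PER_TB)
saved = savings_fraction(raw_scan, compressed_scan)
```

With these invented sizes the saving is 75%, which falls inside the 30–90% range quoted on the slide; the actual figure depends on the data and the compression format.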
  • 26. QuickSight Overview. Integrated with AWS: Redshift, RDS, Athena, S3, IAM, Roles, CloudTrail and more. Cloud Native: fully managed, serverless analytics at scale. Super Fast and Easy to Use: backed by SPICE and a beautiful UI. Cost Effective: starts at $9 per user per month.
  • 27. Putting it all together…
  • 28. Summary of AWS Analytics, Database & AI Tools: Amazon Redshift (Enterprise Data Warehouse); Amazon EMR (Hadoop/Spark); Amazon Athena (Clusterless SQL); AWS Glue (Clusterless ETL); Amazon Aurora (Managed Relational Database); Amazon Machine Learning (Predictive Analytics); Amazon QuickSight (Business Intelligence/Visualization); Amazon Elasticsearch Service (Elasticsearch); Amazon ElastiCache (Redis In-memory Datastore); Amazon DynamoDB (Managed NoSQL Database); Amazon Rekognition (Deep Learning-based Image Recognition); Amazon Lex (Voice or Text Chatbots).
  • 29. Queries Against an Amazon S3 Data Lake
  • 30. Event-driven ETL Pipelines
  • 31. AWS Solution Builder - Data Lake on AWS. Reference Architecture deployment via CloudFormation. Configures core services to tag, search and catalogue datasets. Deploys a console to search and browse available datasets. http://amzn.to/2nTVjcp
  • 32. Processing & Analytics. Categories: Real-time; Batch; AI & Predictive; BI & Data Visualization; Transactional & RDBMS. Services shown: AWS Lambda; Apache Storm on EMR; Apache Flink on EMR; Spark Streaming on EMR; Elasticsearch Service; Kinesis Analytics, Kinesis Streams; DynamoDB (NoSQL DB); Aurora (Relational Database); EMR (Hadoop, Spark, Presto); Redshift (Data Warehouse); Athena (Query Service); Amazon Lex (Speech recognition); Amazon Rekognition; Amazon Polly (Text to speech); Machine Learning (Predictive analytics); Kinesis Streams & Firehose.
  • 33. Today's conversation: Business drivers for a Data Lake; Designing and building; Production use cases.
  • 34. Case Study: Re-architecting Compliance. “For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren’t able to do before, and that is priceless.” (Steve Randich, CIO) What FINRA needed: infrastructure for its market surveillance platform; support of analysis and storage of approximately 75 billion market events every day. Why they chose AWS: fulfillment of FINRA’s security requirements; ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3. Benefits realized: increased agility, speed, and cost savings; estimated savings of $10-20m annually by using AWS.
  • 35. AWS Solution Builder - Data Lake on AWS. Reference Architecture deployment via CloudFormation. Configures core services to tag, search and catalogue datasets. Deploys a console to search and browse available datasets. http://amzn.to/2nTVjcp
  • 36. Thank you!