© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Data Lake for Your Enterprise, ft. Sysco
STG309
Greg Nelson, Director, BI and Analytics Platforms, Sysco
Varun Kumar, Sr. Manager, Data & Analytics Platforms, Sysco
Laith Al-Saadoon, Sr. Solutions Architect, AWS
Related breakouts and chalk talks
Tuesday, November 27
STG311 – Lessons Learned from a Large-Scale Legacy Migration with Sysco
4:45 PM – 5:45 PM | MGM, Level 1, Grand Ballroom 122
Thursday, November 29
STG340 – Customizing Data Lakes to Work for Your Enterprise with Sysco
4:00 PM – 5:00 PM | Venetian, Level 4, Lando 4305
Agenda
Sysco Corporation overview and data lake goals
Sysco’s enterprise data lake architecture
Data lake ingestion and storage patterns on Amazon S3
Data lake best practices
Sysco at a glance
Sysco is the global leader in selling, marketing, and distributing food products
to restaurants, healthcare and educational facilities, lodging establishments,
and other customers who prepare meals away from home. Its family of
products also includes equipment and supplies for the foodservice and
hospitality industries.
To be our customers' most valued
and trusted business partner.
Integrity, Teamwork, Excellence,
Inclusiveness, Responsibility.
Sysco is a growing, global company with a strong presence in a
roughly $400B, large and fragmented foodservice market
Sysco currently operates in the U.S., Canada, Mexico, Costa Rica, Panama, Bahamas, U.K., France, Sweden,
Spain, Belgium, Luxembourg, and Ireland and services customers in an additional 81 countries via the IFG
exporting business.
Why the data lake?
Capital and capacity constraints limit analytic use cases
EDW
Wall of Business
Constraint
M&A data
Revenue management
Machine learning
Unstructured data
Social media
Clickstream
Introducing SEED
The journey
• Start small
• Pay for what you use
• Fail fast and pivot
Proof of concept –
use case
Sales subject area
Supply chain
All data sources
Impediment – course correct
Optimize and pivot
Outcomes and use cases
[Figure: analytic capability plotted against approach (reactive/past to proactive/future), grouped into operational management, decision support, and data science.]
Capabilities: formatted reporting; parameterized reporting; guided ad hoc; exploratory analysis & self service; predictive & prescriptive analytics; AI / ML
Use cases: operations data insights; category management insights; revenue management insights; customer red alert; personalized recommendations; insights-driven assortment
Callout: Our growth focus
Note: Size of circles correlates to the scale of the capability’s utilization across Sysco
Convergence of analytics within SEED Data Repository
[Diagram labels: transactional; pricing; attribution; third-party syndicated data; customer risk; personalization; price migration & engine; assortment optimization]
SEED data repository architecture
[Architecture diagram components: user interface; API; Amazon Cognito authentication; custom authorizer; data lake microservices (AWS Lambda); Amazon S3; AWS Glue Crawler; Amazon Athena; IAM roles; upload; Amazon Redshift; Amazon EMR; Amazon SageMaker; Amazon DynamoDB; Amazon Elasticsearch; external data sources; Gravity to ingest data at scale]
Data integrity checks & data quality report
• Advanced analytics
• Scheduled reporting
• Interactive analysis
• Prototyping reports
• Data analysis
• Custom data extracts
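The architecture places a custom authorizer between the API and the data lake microservices. A minimal sketch of what such a Lambda token authorizer could look like is shown below. The token table and principal names are hypothetical and exist only for illustration; the deck does not describe Sysco's actual authorization logic, and a real authorizer would validate a signed token (for example, one issued by Amazon Cognito) rather than look it up in a dictionary.

```python
"""Sketch of an API Gateway Lambda (custom) token authorizer."""

# Hypothetical token-to-user table, an assumption made for illustration only.
KNOWN_TOKENS = {"token-analyst": "analyst", "token-admin": "admin"}


def build_policy(principal_id, effect, resource):
    """Return the response document API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
    }


def handler(event, context=None):
    """Allow the call when the bearer token is known, otherwise deny it."""
    raw = event.get("authorizationToken", "")
    # Strip an optional "Bearer " prefix before the lookup.
    token = raw[len("Bearer "):] if raw.startswith("Bearer ") else raw
    user = KNOWN_TOKENS.get(token)
    effect = "Allow" if user else "Deny"
    return build_policy(user or "anonymous", effect, event["methodArn"])
```

The function returns an IAM policy scoped to the method ARN of the incoming request, so the allow or deny decision applies only to the API method being called.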
Enabling Data as a Service (DaaS) using SDR
[Diagram components: API consumer; API Gateway; serverless Lambda; Amazon Redshift Spectrum; S3 bucket; ingest using Hive; DynamoDB; Amazon ES; Amazon EMR]
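The DaaS slide pairs API Gateway with a serverless Lambda function in front of the data stores. As a rough sketch of that pattern, the handler below answers a dataset lookup using the Lambda proxy integration response shape. The in-memory `CATALOG`, the dataset names, the bucket name, and the `dataset` path parameter are all assumptions standing in for whatever metadata store the real service queries.

```python
import json

# Hypothetical in-memory stand-in for the metadata store behind the API.
CATALOG = {
    "sales": {"location": "s3://example-bucket/sales/", "format": "parquet"},
    "supply_chain": {"location": "s3://example-bucket/supply_chain/", "format": "parquet"},
}


def respond(status, payload):
    """Build the response shape used by API Gateway Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context=None):
    """Return metadata for the dataset named in the request path."""
    # pathParameters is None when the route has no path variables.
    params = event.get("pathParameters") or {}
    name = params.get("dataset")
    if name is None:
        return respond(400, {"error": "dataset path parameter is required"})
    entry = CATALOG.get(name)
    if entry is None:
        return respond(404, {"error": f"unknown dataset: {name}"})
    return respond(200, {"dataset": name, **entry})
```

Keeping the handler stateless and returning plain JSON means the same function can sit behind any route that supplies a `dataset` path parameter.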
Enabling data science and engineering at scale
• Higher conversion rate enabled by recommendation engine
• Driver compliance by IoT live streams
• Item search and match capability to identify similar, available, and profitable items across various product and business teams
Lessons learned
Centralized data repository
• What numbers mean as opposed to what numbers are
Data as a Service
• Thousands of reports to a publish-subscribe architecture
Rapid experimentation platform
• AI/ML limited to certain processes and adopted with caution
Accountable continuous process
• Manual intervention/reprocessing for operational issues to operational accountability
Road ahead
SPROUT SEEDLING SAPLING TREE
Legacy data architectures exist as isolated data silos
• Hadoop cluster
• OLTP databases
• Data warehouse appliance
Challenges with legacy data architectures
1. Data movement and ETL
2. Data types and formats
3. Real-time processing
4. Data governance across data silos
Reasons to build a data lake
Characteristics of a data lake
Benefits of a data lake—Separation of storage and
compute
Scale storage and compute
independently
“How can I scale up with the
volume of data being generated?”
Building a data lake on AWS
Amazon S3: center of the data lake
[Diagram: AWS services surrounding Amazon S3]
• Amazon DynamoDB, Amazon Elasticsearch Service
• AWS AppSync, Amazon API Gateway, Amazon Cognito
• AWS KMS, AWS IAM, AWS CloudTrail, Amazon CloudWatch
• AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service
• Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS
Why Amazon S3 for data lake?
Durable, available, high performance, scalable, integrated, easy to use
Secure your data lake
Encrypt
SSE-S3 or CSE using AWS KMS
HTTPS endpoints
Authorize and authenticate
IAM policies
Amazon S3 bucket policies
AWS Glue Data Catalog resource-based policies
Amazon S3 VPC endpoints
Audit and comply
AWS CloudTrail and bucket access logs
Lifecycle management policies
Versioning and MFA Delete
Certifications—HIPAA, PCI, SOC 1, 2, 3, etc.
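Two of the controls listed above, HTTPS endpoints and server-side encryption, can be enforced with an Amazon S3 bucket policy. The sketch below builds one such policy as a plain dictionary. The bucket name is a placeholder and the two statements are one common way to express these rules, not necessarily the policy used in the talk.

```python
import json


def secure_bucket_policy(bucket):
    """Build a bucket policy that rejects plain-HTTP access and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any request that does not arrive over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Deny uploads that omit the server-side encryption header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    # "example-data-lake" is a placeholder bucket name.
    print(json.dumps(secure_bucket_policy("example-data-lake"), indent=2))
```

Note that the second statement rejects any upload that does not send the encryption header, even when default bucket encryption is turned on, so clients must set the header explicitly.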
Use your data lake
Data lake
• Query data in place
• Load curated data for applications
• Collect train and test datasets for ML models
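For the last use above, collecting train and test datasets, one simple approach is to assign each record to a split by hashing a stable identifier, so the split stays the same every time the data lake is re-read. This is an illustrative sketch only: the record IDs and the 20% test fraction are assumptions, and the talk does not say how Sysco builds its datasets.

```python
import hashlib


def assign_split(record_id, test_fraction=0.2):
    """Map a record ID to 'train' or 'test' using a stable hash."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    threshold = int(test_fraction * 2**64)
    return "test" if value < threshold else "train"


def split_records(record_ids, test_fraction=0.2):
    """Group record IDs into train and test lists, preserving input order."""
    splits = {"train": [], "test": []}
    for record_id in record_ids:
        splits[assign_split(record_id, test_fraction)].append(record_id)
    return splits
```

Because the assignment depends only on the ID, a record never moves between splits when new data arrives, which avoids leaking test records into training on later runs.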
Data lake storage best practices
s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
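The example path above combines a columnar format (Snappy-compressed Parquet) with a Hive-style `key=value` partition folder. A small sketch of how such prefixes can be built and parsed follows. The helper names `partition_prefix` and `parse_partitions` are hypothetical, introduced here only for illustration.

```python
from urllib.parse import urlparse


def partition_prefix(bucket, table_prefix, **partitions):
    """Build an S3 prefix with one key=value folder per partition column."""
    parts = [f"{key}={value}" for key, value in partitions.items()]
    return "s3://" + "/".join([bucket, table_prefix.strip("/"), *parts]) + "/"


def parse_partitions(s3_uri):
    """Extract the key=value partition folders from an S3 URI."""
    path = urlparse(s3_uri).path
    pairs = (segment.split("=", 1) for segment in path.split("/") if "=" in segment)
    return {key: value for key, value in pairs}
```

Query engines that understand this layout can skip whole folders when a query filters on the partition column, which reduces the amount of data scanned.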
Data lake storage best practices
• Use lifecycle rules
• Use Amazon S3 Select and Amazon Glacier Select
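As an illustration of the lifecycle rules recommendation, the sketch below builds a lifecycle configuration that moves objects under a prefix to cheaper storage classes as they age. The prefix, the 30-day and 90-day thresholds, and the optional expiration are assumptions chosen for the example, not values from the talk.

```python
def lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90,
                            expire_after_days=None):
    """Build an S3 lifecycle configuration that tiers objects under a prefix."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    rule = {
        "ID": f"tier-{prefix.strip('/').replace('/', '-')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Infrequent access first, then archive storage.
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}
```

The returned dictionary follows the shape of the S3 lifecycle configuration document, so it could be passed to an SDK call or serialized for review before being applied.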
Data lake compute best practices
Spot Instance
AWS answers—Data lake on AWS
https://amzn.to/2k4FQX1
Summary
Thank you!
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 


Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a Data Lake for Your Enterprise, ft. Sysco (STG309). Greg Nelson, Director, BI, and Analytics Platforms, Sysco; Varun Kumar, Sr. Manager, Platforms Data & Analytics, Sysco; Laith Al-Saadoon, Sr. Solutions Architect, AWS
  • 3. Related breakouts and chalk talks. Tuesday, November 27: STG311 – Lessons Learned from a Large-Scale Legacy Migration with Sysco, 4:45 PM – 5:45 PM | MGM, Level 1, Grand Ballroom 122. Thursday, November 29: STG340 – Customizing Data Lakes to Work for Your Enterprise with Sysco, 4:00 PM – 5:00 PM | Venetian, Level 4, Lando 4305
  • 4. Agenda: Sysco Corporation overview and data lake goals; Sysco’s enterprise data lake architecture; data lake ingestion and storage patterns on Amazon S3; data lake best practices
  • 5.
  • 6. Sysco at a glance: Sysco is the global leader in selling, marketing, and distributing food products to restaurants, healthcare and educational facilities, lodging establishments, and other customers who prepare meals away from home. Its family of products also includes equipment and supplies for the foodservice and hospitality industries. To be our customers' most valued and trusted business partner. Integrity, Teamwork, Excellence, Inclusiveness, Responsibility.
  • 7. Sysco is a growing, global company with a strong presence in a roughly $400B, large and fragmented foodservice market. Sysco currently operates in the U.S., Canada, Mexico, Costa Rica, Panama, Bahamas, U.K., France, Sweden, Spain, Belgium, Luxembourg, and Ireland and services customers in an additional 81 countries via the IFG exporting business.
  • 8.
  • 9. Why the data lake? Capital and capacity constraints limit analytic use cases. EDW; Wall of Business Constraint; M&A data; Revenue management; Machine learning; Unstructured data; Social media; Clickstream
  • 10.
  • 11. Introducing SEED
  • 12. The journey: start small; pay for what you use; fail fast and pivot. Proof of concept – use case; Sales subject area; Supply chain; All data sources; Impediment – course correct; Optimize and pivot
  • 13. Outcomes and use cases. Data science; Decision support; Operational management; Analytic capability; Reactive; Proactive; Approach; Past; Future. Note: Size of circles correlates to the scale of the capability’s utilization across Sysco. Operations data insights; Category management insights; Revenue management insights; Customer red alert; Personalized recommendations; Insights-driven assortment; Formatted reporting; Parameterized reporting; Guided ad hoc; Exploratory analysis & self service; AI / ML; Predictive & prescriptive analytics; Our growth focus
  • 14. Convergence of analytics within SEED. Transactional; Pricing; Attribution; third-party syndicated data; Data Repository; Customer risk; Personalization; Price migration & engine; Assortment optimization
  • 15. SEED data repository architecture: Amazon Redshift; User interface; API; Amazon Cognito authentication; Custom authorizer; Data lake microservices; AWS Lambda; Amazon S3; AWS Glue Crawler; Amazon Athena; IAM roles; Upload; Amazon Redshift; Amazon S3; External data sources; Gravity to ingest data at scale; Amazon EMR
  • 16. SEED data repository architecture: Amazon Redshift; User interface; API; Amazon Cognito authentication; Custom authorizer; Data lake microservices; AWS Lambda; Amazon S3; AWS Glue Crawler; Amazon Athena; IAM roles; Upload; Amazon DynamoDB; Amazon Elasticsearch; Amazon Redshift; Amazon S3; External data sources; Gravity to ingest data at scale; Amazon EMR
  • 17. SEED data repository architecture: Amazon EMR; Amazon Redshift; User interface; API; Amazon Cognito authentication; Custom authorizer; Data lake microservices; AWS Lambda; Amazon S3; AWS Glue Crawler; Amazon Athena; IAM roles; Upload; Amazon Redshift; Amazon S3; External data sources; Gravity to ingest data at scale; Data integrity checks & data quality report; Advanced analytics; Scheduled reporting; Interactive analysis; Prototyping reports; Data analysis; Custom data extracts; Amazon EMR; Amazon SageMaker; Amazon DynamoDB; Amazon Elasticsearch
  • 18. Enabling Data as a Service (DaaS) using SDR: API Gateway; Serverless Lambda; API consumer; Amazon Redshift Spectrum; S3 bucket; Ingest using Hive; DynamoDB; Amazon ES; Amazon EMR
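The DaaS slide pairs API Gateway with a serverless Lambda function in front of the data stores. The deck does not show the function itself, so the following is only a minimal sketch of what such a handler could look like under an API Gateway Lambda proxy integration. The `item_id` path parameter, the `FAKE_ITEM_TABLE` dictionary (standing in for a store such as DynamoDB), and the record fields are all hypothetical.

```python
import json

# Hypothetical in-memory stand-in for a lookup store such as DynamoDB.
# The keys and fields are illustrative only, not Sysco data.
FAKE_ITEM_TABLE = {
    "1001": {"item_id": "1001", "description": "example item A"},
    "1002": {"item_id": "1002", "description": "example item B"},
}


def handler(event, context):
    """Lambda entry point for an API Gateway proxy integration."""
    # Proxy events carry path parameters under "pathParameters" (may be None).
    params = event.get("pathParameters") or {}
    item_id = params.get("item_id")  # hypothetical parameter name

    record = FAKE_ITEM_TABLE.get(item_id)
    if record is None:
        status, payload = 404, {"error": "item not found"}
    else:
        status, payload = 200, record

    # The proxy response format expects the body as a string.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

In a real deployment the dictionary lookup would be replaced by a query against whichever store backs the API, and authorization would sit in front of the handler as the architecture slides suggest.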
  • 19. Enabling data science and engineering at scale: higher conversion rate enabled by recommendation engine; driver compliance by IoT live streams; item search and match capability to identify similar, available, and profitable items across various product and business teams
  • 20. Lessons learned. Centralized data repository: what numbers mean as opposed to what numbers are. Data as a Service: thousands of reports to a publish-subscribe architecture. Rapid experimentation platform: AI/ML limited to certain processes and adopted with caution. Accountable continuous process: manual intervention/reprocessing for operational issues to operational accountability.
  • 21. Road ahead: SPROUT; SEEDLING; SAPLING; TREE
  • 22.
  • 23. Legacy data architectures exist as isolated data silos: Hadoop cluster; OLTP databases; Data warehouse appliance S
  • 24. Challenges with legacy data architectures: 1. Data movement and ETL
  • 25. Challenges with legacy data architectures: 1. Data movement and ETL; 2. Data types and formats
  • 26. Challenges with legacy data architectures: 1. Data movement and ETL; 2. Data types and formats; 3. Real-time processing
  • 27. Challenges with legacy data architectures: 1. Data movement and ETL; 2. Data types and formats; 3. Real-time processing; 4. Data governance across data silos
  • 28. Reasons to build a data lake
  • 29. Reasons to build a data lake
  • 30. Reasons to build a data lake
  • 31. Characteristics of a data lake
  • 32. Benefits of a data lake: separation of storage and compute. “How can I scale up with the volume of data being generated?”
  • 33. Benefits of a data lake: separation of storage and compute. Scale storage and compute independently. “How can I scale up with the volume of data being generated?”
  • 34. Building a data lake on AWS
  • 35. Amazon S3: center of the data lake
  • 36. Amazon S3: center of the data lake
  • 37. Amazon S3: center of the data lake. AWS Snowball; AWS Storage Gateway; Amazon Kinesis Data Firehose; AWS Direct Connect; AWS Database Migration Service
  • 38. Amazon S3: center of the data lake. Amazon DynamoDB; Amazon Elasticsearch Service; AWS Snowball; AWS Storage Gateway; Amazon Kinesis Data Firehose; AWS Direct Connect; AWS Database Migration Service
  • 39. Amazon S3: center of the data lake. Amazon DynamoDB; Amazon Elasticsearch Service; AWS KMS; AWS IAM; AWS CloudTrail; Amazon CloudWatch; AWS Snowball; AWS Storage Gateway; Amazon Kinesis Data Firehose; AWS Direct Connect; AWS Database Migration Service
  • 40. Amazon S3: center of the data lake. Amazon DynamoDB; Amazon Elasticsearch Service; AWS AppSync; Amazon API Gateway; Amazon Cognito; AWS KMS; AWS IAM; AWS CloudTrail; Amazon CloudWatch; AWS Snowball; AWS Storage Gateway; Amazon Kinesis Data Firehose; AWS Direct Connect; AWS Database Migration Service
  • 41. Amazon S3: center of the data lake. Amazon DynamoDB; Amazon Elasticsearch Service; AWS AppSync; Amazon API Gateway; Amazon Cognito; AWS KMS; AWS IAM; AWS CloudTrail; Amazon CloudWatch; AWS Snowball; AWS Storage Gateway; Amazon Kinesis Data Firehose; AWS Direct Connect; AWS Database Migration Service; Amazon Athena; Amazon EMR; AWS Glue; Amazon Redshift; Amazon DynamoDB; Amazon QuickSight; Amazon Kinesis; Amazon Elasticsearch Service; Amazon Neptune; Amazon RDS
  • 42. Why Amazon S3 for data lake? Durable; Available; High performance; Scalable; Integrated; Easy to use
  • 43. Secure your data lake. Encrypt: SSE-S3 or CSE using AWS KMS; HTTPS endpoints. Authorize and authenticate: IAM policies; Amazon S3 bucket policies; AWS Glue Data Catalog resource-based policies; Amazon S3 VPC endpoints. Audit and comply: AWS CloudTrail and bucket access logs; lifecycle management policies; versioning and MFA Delete; certifications (HIPAA, PCI, SOC 1, 2, 3, etc.)
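The security slide lists HTTPS endpoints, server-side encryption, and S3 bucket policies together. As a rough illustration of how two of those items can be combined, the sketch below builds a bucket policy document that denies requests made without TLS and denies uploads that do not request SSE-S3 (the `AES256` header value). The bucket name and statement IDs are hypothetical, and the deck does not say that Sysco uses this exact policy.

```python
import json


def build_bucket_policy(bucket_name):
    """Return a policy dict that denies plain-HTTP access and non-SSE-S3 uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that did not arrive over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that do not ask for SSE-S3 encryption.
                "Sid": "DenyUploadsWithoutSSE",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
        ],
    }


if __name__ == "__main__":
    # "example-data-lake-bucket" is a placeholder name.
    print(json.dumps(build_bucket_policy("example-data-lake-bucket"), indent=2))
```

The resulting JSON is the kind of document that can be attached to a bucket as its bucket policy; a deployment that encrypts with AWS KMS keys instead would check for a different header value.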
  • 44. Use your data lake. Data lake: query data in place; load curated data for applications; collect train and test datasets for ML models
  • 45.
  • 46. Data lake storage best practices: s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
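The example object on slide 46 uses a Hive-style `key=value` path segment (`product_category=Apparel`), and its file name indicates Snappy-compressed Parquet. A small sketch, using only the Python standard library, of how such a URI breaks down into bucket, key, and partition values follows. The URI is the one from the slide with its line-wrap break joined; the function name `parse_partitions` is an illustrative choice.

```python
from urllib.parse import urlparse


def parse_partitions(s3_uri):
    """Split an S3 URI into bucket, key, and Hive-style key=value partitions."""
    parsed = urlparse(s3_uri)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    partitions = {}
    # Every path segment except the final object name may carry a partition.
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return bucket, key, partitions


uri = ("s3://amazon-reviews-pds/parquet/product_category=Apparel/"
       "part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet")
bucket, key, partitions = parse_partitions(uri)
print(bucket)      # amazon-reviews-pds
print(partitions)  # {'product_category': 'Apparel'}
```

Query engines that understand this layout can skip whole prefixes when a query filters on the partition column, which is why the path structure counts as a storage best practice.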
  • 47. Data lake storage best practices: use lifecycle rules; use Amazon S3 Select and Amazon Glacier Select
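Slide 47 recommends lifecycle rules without giving one. As a hedged illustration, the sketch below builds a lifecycle configuration that moves objects under a prefix to cheaper storage classes as they age. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument; the prefix, rule ID format, and day thresholds are assumptions, not values from the talk.

```python
def build_lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Return a lifecycle configuration dict for objects under `prefix`."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    # Hypothetical rule ID derived from the prefix, e.g. "raw/sales/" -> "tier-down-raw-sales".
    rule_id = "tier-down-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Step down to infrequent access first, then to Glacier.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # "raw/" is a placeholder prefix for rarely re-read source data.
    print(build_lifecycle_configuration("raw/"))
```

Keeping the builder free of AWS calls makes the rule easy to review and test before it is applied to a bucket.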
  • 48. Data lake compute best practices: Spot Instance
  • 49. AWS answers: Data lake on AWS https://amzn.to/2k4FQX1
  • 50. Summary
  • 51. Thank you!
  • 52.