SlideShare a Scribd company logo
Architecting a Data Lake on AWS
November 3, 2016
Today’s speakers
Agenda
 What is a Data Lake?
 Benefits of a Data Lake
 Why deploy your Data Lake on AWS?
 Deploying your Data Lake with featured APN Big Data Competency Partners
– Cloudwick
– NorthBay
– 47Lining
 Getting Started
What is a Data Lake?
Data Lake is a new and increasingly
popular way to store and analyze
massive volumes and heterogenous
types of data in a centralized repository.
Solving big data challenges with a Data Lake
The volume, variety, and velocity at which data is being generated are leaving
organizations with new questions to answer, such as:
Benefits of a Data Lake
“How can I collect data quickly
from various sources and store
it efficiently?”
Benefits of a Data Lake
Quickly ingest data
without needing to force it into a
pre-defined schema.
“How can I collect data quickly
from various sources and store
it efficiently?”
Benefits of a Data Lake
Benefits of a Data Lake
Store and analyze all of your data,
from all of your sources, in one
centralized location.
Benefits of a Data Lake
Benefits of a Data Lake
Separating your storage and compute
allows you to scale each component as
required
Benefits of a Data Lake
Benefits of a Data Lake
A Data Lake enables ad-hoc
analysis by applying schemas
on read, not write.
Important components of a Data Lake
Building a Data Lake on AWS
Why a Data Lake on AWS?
Why a Data Lake on AWS?
AWS provides a flexible platform that enables you
to get the most out of your Data Lake:
 Store all of your data cost-effectively using
Amazon Simple Storage Service (Amazon S3)
 Quickly and securely move data to AWS
 Provision and scale storage, compute, and
networking capacity on-demand
 Eliminate challenges associated with formatting
data and converting it to a pre-defined schema
Cloudwick
Mark Schreiber, General Manager
www.cloudwick.com
Architecting Data Lakes on AWS with Cloudwick
Powering the Digital Enterprise
Cloudwick is an AWS Advanced Consulting Partner with Big Data Competency designation
that is one of the largest enterprise Big Data-as-a-Service providers, managing dozens of
Hadoop, Cassandra, and Spark services for Global 1000 companies on AWS.
Cloudwick leverages a repeatable 3-phase approach to architecting a Data Lake and
performing Big Data migrations:
1 2 3
How Cloudwick helped a large, well-known healthcare
company improve flexibility
About the Healthcare Company:
This large and well-known healthcare company leverages a unique business model
employing doctors to develop software to make it easier to maintain Electronic Health
Records (EHR).
The Challenge:
 The company’s data ingestion process took 48 hours to complete
 Data ingestion needed to wait until weekends
 Business decisions were being made on week-old data
How Cloudwick helped a large, well-known healthcare
company improve flexibility
The Cloudwick Solution:
 Architect a Data Lake capable of reducing ingestion time to 3 hours
 Develop a new Data Governance process
 Integrate the company’s predictive analytics applications with their Data Lake on AWS
The Benefits:
 Improved metadata management
 Reduced data ingestion time
 HIPAA and PHI Compliance
 Near real-time analytics
Why Data Lakes on AWS?
Security & Compliance
AWS enables organizations like yours to improve their
security posture in the cloud:
 Run on top of the secure AWS data center infrastructure
 Encrypt data at rest and in-transit using 256-bit
encryption
 Meet compliance standards including PCI DSS, HIPAA,
FedRAMP, and more
 Manage user/group access with AWS Identity and
Access Management (AWS IAM)
NorthBay
Raza Shaikh, CTO
www.northbaysolutions.com
Architecting Data Lakes on AWS with NorthBay
Teaching Old Data New Tricks™
NorthBay is an AWS Advanced Consulting Partner that has been implementing Data
Lakes on AWS since 2013 and was among the first in the world to achieve the AWS Big
Data and Mobile competencies.
When NorthBay works with an organization like yours to architect a Data Lake, they:
How NorthBay Helped Eliza Corporation Deploy a Data
Lake on AWS
Eliza Corporation develops healthcare consumer engagement solutions to address some of the industry’s
greatest challenges – from adherence, to prevention, to condition management, to brand loyalty and retention.
Eliza Corporation
analyzes more than
300 million interactions
per year
Outreach questions and
responses form a decision
tree, and each question and
response are captured as a
pair
Diverse downstream
consumption requirements
Challenging to process
and analyze data
The challenges
How NorthBay helped Eliza Corporation
Deploy a Data Lake on AWS
Create next generation
data architecture
Decouple Storage and
Compute
Ability to process old &
new data streams
Achieve HIPAA
compliance
Ingest & store original
datasets
Allow both real-time &
batch processing
Enable access through
entitlements and
governance
Increase self-service for
end-users
• Streamlined data load process by enabling schema on read
• Improved business agility
• Improved cost management by more clearly separating
costs
• Provided ability to provision and scale resources on-demand
• Reduced end-to-end client analytics time
Benefits of the Data Lake on AWS
Why Data Lakes on AWS?
The Most Complete Platform for Big Data
AWS offers the most complete platform for
Big Data
 Support any workload regardless of
volume, velocity, or variety of data
 Use a variety of descriptive, predictive, and
prescriptive analytics for business insights
 Immediately available and easy to scale
47Lining
Mick Bass, CEO
www.47lining.com
Architecting Data Lakes on AWS with 47Lining
Breathtaking Big Data
47Lining is an AWS Advanced Consulting Partner with Big Data Competency designation
that develops Big Data solutions and delivers Big Data managed services on AWS.
Example use cases include:
How 47Lining helped The Howard Hughes Corporation
leverage the most complete platform
for big data
 The Howard Hughes Corporation (HHC) owns, manages and develops commercial,
residential and mixed-use real estate throughout the country.


Benefits from The Howard Hughes Corporation Data Lake
The 47Lining Solution:
 Architect a Data Lake that ingests third party and on-premises data on AWS
 Implement a lead-scoring model using Amazon Machine Learning
 Turn the customer’s Big Data practice into an AWS-based managed service
The Benefits:
400% 10x
How partnering with 47Lining helped The Howard
Hughes Corporation reduce time to value
“Our Data Lake is enabling us to
answer previously unanswerable
questions through on-demand data
fusion and analytics. We are
leveraging data for strategic
advantage in real estate in ways that
are groundbreaking for our industry.”
– Daryan Dehghanpisheh,
Senior Vice President, The Howard Hughes Corporation
BI Tools
Data
Contributors
Managed
Enterprise Data Lake
External
Systems
Data Lake
Governors
(Governance,
En tlements)
Data Consumers
B2E | B2B | B2C
Direct Users
Business Processes
Raw Submissions
Untransformed
Batch | Stream
Managed Datasets
Data Managed by Lake
Suppor ng Schema-on-Read Usage
Data | Metadata
Published Data
Indexed, consumable via
HA DataLake API
Indices, History
ContributeManage
ConsumeSearch
Rules, Policies & En tlements
Contribute | Manage | Transform | Access
Rule-Driven Incremental Loads, Transforms, Cataloging/Indexing, Publishing
Iden ty & Security Indexing & Search
Ingest
Workers,
Loaders
DataLake API
Agile Lakeshore Analy cs
Data Mgmt &
Orchestra on
DataLake UI
Owned | On-prem
3rd Party
Partners | Vendors
Customers
Directory
Roles
IAM Perms Key
Mgmt
Monitoring
DataLake
Web Uis
Elas c Beanstalk
Search
Manage
Consume
Search
Manage
Consume
Elas c Beanstalk
Single Sign-On
Unified Policy-Based En tlements
S3 | Submissions
Kinesis | Submissions
S3 | Content
S3 | Work in Progress
SQS
Queue
Lambda
Worker
Tier
RDS
UI, App & API State
DynamoDB
Discovery Views
CloudSearch
Facets | Indices | Views
DynamoDB
HA Published Results
R Studio Amazon
Machine
Learning
Hadoop/Spark
On-demand
Elas c
MapReduce,
Qubole
Redshi
On-Demand Warehouses
BI /Visualiza on
AWS
Data
Pipeline
airflo
w
Tableau
Server
Amazon
Quicksight
CloudWatch CloudCheckr
CloudTrail DataDog
Data
Ecosystem
API Users
Summary
What is a Data Lake?
 A new and increasingly popular way to store and analyze all of your data, regardless of
source or format, in a centralized repository for use with analytics.
What are the benefits?
 Accelerated time to value and reduced TCO
 Increased storage flexibility and agility
 Ad-hoc analysis that delivers business insights more quickly
Summary
Why AWS for a Data Lake?
 Flexibility and agility
 Security and compliance
 Most complete platform for Big Data
Who can help me implement a Data Lake on AWS?
 Featured APN Partners
 Cloudwick - http://www.cloudwick.com/
 NorthBay - http://northbaysolutions.com/
 47Lining - http://www.47lining.com/
Questions?

More Related Content

What's hot

Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
Amazon Web Services
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
sampath439572
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
Amazon Web Services
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Gary Stafford
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
Amazon Web Services
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Amazon Web Services
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
Amazon Web Services
 
Hybrid Cloud Solutions to Transform Your Organization
Hybrid Cloud Solutions to Transform Your OrganizationHybrid Cloud Solutions to Transform Your Organization
Hybrid Cloud Solutions to Transform Your Organization
Amazon Web Services
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
Amazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
Amazon Web Services
 
Designing security & governance via AWS Control Tower & Organizations - SEC30...
Designing security & governance via AWS Control Tower & Organizations - SEC30...Designing security & governance via AWS Control Tower & Organizations - SEC30...
Designing security & governance via AWS Control Tower & Organizations - SEC30...
Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
SwathiPonugumati
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
Amazon Web Services
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Introduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud ComputingIntroduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud Computing
Amazon Web Services
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
Amazon Web Services
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
confluent
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
Amazon Web Services
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
Amazon Web Services
 

What's hot (20)

Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Hybrid Cloud Solutions to Transform Your Organization
Hybrid Cloud Solutions to Transform Your OrganizationHybrid Cloud Solutions to Transform Your Organization
Hybrid Cloud Solutions to Transform Your Organization
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Designing security & governance via AWS Control Tower & Organizations - SEC30...
Designing security & governance via AWS Control Tower & Organizations - SEC30...Designing security & governance via AWS Control Tower & Organizations - SEC30...
Designing security & governance via AWS Control Tower & Organizations - SEC30...
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Introduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud ComputingIntroduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud Computing
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 

Similar to Building a Data Lake on AWS

Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
Amazon Web Services
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web Services
Jisc
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
Amazon Web Services
 
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
Amazon Web Services
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料
Amazon Web Services
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptx
ArunPandiyan890855
 
Data Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data StrategyData Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data Strategy
Data Con LA
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data
IBM
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
Amazon Web Services
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
Abhimanyu Singhal
 
Delivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data servicesDelivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data services
Bhuvaneshwaran R
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
Amazon Web Services
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
Amazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
Amazon Web Services
 
State of the Union: Database & Analytics
State of the Union: Database & AnalyticsState of the Union: Database & Analytics
State of the Union: Database & AnalyticsAmazon Web Services
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
Amazon Web Services
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
Amazon Web Services
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
Amazon Web Services
 

Similar to Building a Data Lake on AWS (20)

Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web Services
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
 
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptx
 
Data Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data StrategyData Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data Strategy
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Delivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data servicesDelivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data services
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
State of the Union: Database & Analytics
State of the Union: Database & AnalyticsState of the Union: Database & Analytics
State of the Union: Database & Analytics
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 

Building a Data Lake on AWS

  • 1. Architecting a Data Lake on AWS November 3, 2016
  • 3. Agenda  What is a Data Lake?  Benefits of a Data Lake  Why deploy your Data Lake on AWS?  Deploying your Data Lake with featured APN Big Data Competency Partners – Cloudwick – NorthBay – 47Lining  Getting Started
  • 4. What is a Data Lake? Data Lake is a new and increasingly popular way to store and analyze massive volumes and heterogenous types of data in a centralized repository.
  • 5. Solving big data challenges with a Data Lake The volume, variety, and velocity at which data is being generated are leaving organizations with new questions to answer, such as:
  • 6. Benefits of a Data Lake “How can I collect data quickly from various sources and store it efficiently?”
  • 7. Benefits of a Data Lake Quickly ingest data without needing to force it into a pre-defined schema. “How can I collect data quickly from various sources and store it efficiently?”
  • 8. Benefits of a Data Lake
  • 9. Benefits of a Data Lake Store and analyze all of your data, from all of your sources, in one centralized location.
  • 10. Benefits of a Data Lake
  • 11. Benefits of a Data Lake Separating your storage and compute allows you to scale each component as required
  • 12. Benefits of a Data Lake
  • 13. Benefits of a Data Lake A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
  • 14. Important components of a Data Lake
  • 15. Building a Data Lake on AWS
  • 16. Why a Data Lake on AWS?
  • 17. Why a Data Lake on AWS? AWS provides a flexible platform that enables you to get the most out of your Data Lake:  Store all of your data cost-effectively using Amazon Simple Storage Service (Amazon S3)  Quickly and securely move data to AWS  Provision and scale storage, compute, and networking capacity on-demand  Eliminate challenges associated with formatting data and converting it to a pre-defined schema
  • 18. Cloudwick Mark Schreiber, General Manager www.cloudwick.com
  • 19. Architecting Data Lakes on AWS with Cloudwick Powering the Digital Enterprise Cloudwick is an AWS Advanced Consulting Partner with Big Data Competency designation that is one of the largest enterprise Big Data-as-a-Service providers, managing dozens of Hadoop, Cassandra, and Spark services for Global 1000 companies on AWS. Cloudwick leverages a repeatable 3-phase approach to architecting a Data Lake and performing Big Data migrations: 1 2 3
  • 20. How Cloudwick helped a large, well-known healthcare company improve flexibility About the Healthcare Company: This large and well-known healthcare company leverages a unique business model employing doctors to develop software to make it easier to maintain Electronic Health Records (EHR). The Challenge:  The company’s data ingestion process took 48 hours to complete  Data ingestion needed to wait until weekends  Business decisions were being made on week-old data
  • 21. How Cloudwick helped a large, well-known healthcare company improve flexibility The Cloudwick Solution:  Architect a Data Lake capable of reducing ingestion time to 3 hours  Develop a new Data Governance process  Integrate the company’s predictive analytics applications with their Data Lake on AWS The Benefits:  Improved metadata management  Reduced data ingestion time  HIPAA and PHI Compliance  Near real-time analytics
  • 22.
  • 23. Why Data Lakes on AWS? Security & Compliance AWS enables organizations like yours to improve their security posture in the cloud:  Run on top of the secure AWS data center infrastructure  Encrypt data at rest and in-transit using 256-bit encryption  Meet compliance standards including PCI DSS, HIPAA, FedRAMP, and more  Manage user/group access with AWS Identity and Access Management (AWS IAM)
  • 25. Architecting Data Lakes on AWS with NorthBay Teaching Old Data New Tricks™ NorthBay is an AWS Advanced Consulting Partner that has been implementing Data Lakes on AWS since 2013 and was among the first in the world to achieve the AWS Big Data and Mobile competencies. When NorthBay works with an organization like yours to architect a Data Lake, they:
  • 26. How NorthBay Helped Eliza Corporation Deploy a Data Lake on AWS Eliza Corporation develops healthcare consumer engagement solutions to address some of the industry’s greatest challenges – from adherence, to prevention, to condition management, to brand loyalty and retention. Eliza Corporation analyzes more than 300 million interactions per year Outreach questions and responses form a decision tree, and each question and response are captured as a pair Diverse downstream consumption requirements Challenging to process and analyze data The challenges
  • 27. How NorthBay helped Eliza Corporation Deploy a Data Lake on AWS Create next generation data architecture Decouple Storage and Compute Ability to process old & new data streams Achieve HIPAA compliance Ingest & store original datasets Allow both real-time & batch processing Enable access through entitlements and governance Increase self-service for end-users
  • 28. • Streamlined data load process by enabling schema on read • Improved business agility • Improved cost management by more clearly separating costs • Provided ability to provision and scale resources on-demand • Reduced end-to-end client analytics time Benefits of the Data Lake on AWS
  • 29.
  • 30. Why Data Lakes on AWS? The Most Complete Platform for Big Data AWS offers the most complete platform for Big Data  Support any workload regardless of volume, velocity, or variety of data  Use a variety of descriptive, predictive, and prescriptive analytics for business insights  Immediately available and easy to scale
  • 32. Architecting Data Lakes on AWS with 47Lining Breathtaking Big Data 47Lining is an AWS Advanced Consulting Partner with Big Data Competency designation that develops Big Data solutions and delivers Big Data managed services on AWS. Example use cases include:
  • 33. How 47Lining helped The Howard Hughes Corporation leverage the most complete platform for big data  The Howard Hughes Corporation (HHC) owns, manages and develops commercial, residential and mixed-use real estate throughout the country.  
  • 34. Benefits from The Howard Hughes Corporation Data Lake The 47Lining Solution:  Architect a Data Lake that ingests third party and on-premises data on AWS  Implement a lead-scoring model using Amazon Machine Learning  Turn the customer’s Big Data practice into an AWS-based managed service The Benefits: 400% 10x
  • 35. How partnering with 47Lining helped The Howard Hughes Corporation reduce time to value “Our Data Lake is enabling us to answer previously unanswerable questions through on-demand data fusion and analytics. We are leveraging data for strategic advantage in real estate in ways that are groundbreaking for our industry.” – Daryan Dehghanpisheh, Senior Vice President, The Howard Hughes Corporation
  • 36. BI Tools Data Contributors Managed Enterprise Data Lake External Systems Data Lake Governors (Governance, En tlements) Data Consumers B2E | B2B | B2C Direct Users Business Processes Raw Submissions Untransformed Batch | Stream Managed Datasets Data Managed by Lake Suppor ng Schema-on-Read Usage Data | Metadata Published Data Indexed, consumable via HA DataLake API Indices, History ContributeManage ConsumeSearch Rules, Policies & En tlements Contribute | Manage | Transform | Access Rule-Driven Incremental Loads, Transforms, Cataloging/Indexing, Publishing Iden ty & Security Indexing & Search Ingest Workers, Loaders DataLake API Agile Lakeshore Analy cs Data Mgmt & Orchestra on DataLake UI Owned | On-prem 3rd Party Partners | Vendors Customers Directory Roles IAM Perms Key Mgmt Monitoring DataLake Web Uis Elas c Beanstalk Search Manage Consume Search Manage Consume Elas c Beanstalk Single Sign-On Unified Policy-Based En tlements S3 | Submissions Kinesis | Submissions S3 | Content S3 | Work in Progress SQS Queue Lambda Worker Tier RDS UI, App & API State DynamoDB Discovery Views CloudSearch Facets | Indices | Views DynamoDB HA Published Results R Studio Amazon Machine Learning Hadoop/Spark On-demand Elas c MapReduce, Qubole Redshi On-Demand Warehouses BI /Visualiza on AWS Data Pipeline airflo w Tableau Server Amazon Quicksight CloudWatch CloudCheckr CloudTrail DataDog Data Ecosystem API Users
  • 37. Summary What is a Data Lake?  A new and increasingly popular way to store and analyze all of your data, regardless of source or format, in a centralized repository for use with analytics. What are the benefits?  Accelerated time to value and reduced TCO  Increased storage flexibility and agility  Ad-hoc analysis that delivers business insights more quickly
  • 38. Summary Why AWS for a Data Lake?  Flexibility and agility  Security and compliance  Most complete platform for Big Data Who can help me implement a Data Lake on AWS?  Featured APN Partners  Cloudwick - http://www.cloudwick.com/  NorthBay - http://northbaysolutions.com/  47Lining - http://www.47lining.com/

Editor's Notes

  1. Flexible storage = you can store all data types
  2. A Data Lake can enhance Time to Value by quickly ingesting data without forcing it into a pre-defined schema and can be used as a staging point for a Data Warehouse
  3. The flexibility of a Data Lake enables you to store and analyze all of your data, from all data sources, in one centralized location to help eliminate this challenge.
  4. The flexibility of a Data Lake enables you to store and analyze all of your data, from all data sources, in one centralized location to help eliminate this challenge.
  5. Amazon S3 provides a flexible storage solution that can scale to store nearly any volume of data.
  6. Amazon S3 provides a flexible storage solution that can scale to store nearly any volume of data.
  7. A Data Lake enables you to apply schemas on read, not write, for ad hoc data analysis.
  8. A Data Lake enables you to apply schemas on read, not write, for ad hoc data analysis.
  9. Third bullet - Design the Data Lake to send the right data to each user, whether they be in Finance, Marketing, IT, or any other department
  10. E.G.: <question, response> = <“Did you visit your physician in the last 30 days?”, “Yes”>
  11. Create a next generation data architecture to support continued growth and functionality Allow for both real-time and batch processing with little human intervention Provide metadata, catalog and data discovery for content in the data lake Enable access to data through entitlements and governance Have the ability to accept, transform and process old & new data streams Increase the level of self-service enablement for end-users
  12. Robust architecture to support data at any scale Business agility improved by allowing schema on read vs traditional shaping of data in ETL process Ability to manage costs improved by allowing separation of costs associated with compute and storage resources Flexibility improved by providing elasticity enabling resources to dynamically be scaled up or down based on demand Reduction in touchpoints meant elimination of about 30% manual labor costs Time for end-to-end Client Analytics reduced to 4-8 hours Reduction in touchpoints meant elimination of about 30% manual labor costs Customer Portal gets near real-time updates Ad-hoc reporting tools are self-service with Presto Machine Learning workloads transformed from SAS to Spark ML
  13. Making more timely business decisions could include things like content recommendations on what content to show to which customers. Risk management includes such things as making decisions on customer credit.