© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Neeraj Verma – AWS Solutions Architect
Saurav Mahanti – Senior Manager – Information Systems
Dario Rivera – AWS Solutions Architect
November 28, 2016
How to Build a Big Data Analytics
Data Lake
What to expect from this short talk
• Data Lake concept
• Data Lake - Important Capabilities
• AMGEN’s Data Lake initiative
• How to Build a Data Lake in your AWS account
Data Lake Concept
What is a Data Lake?
A Data Lake is a new and increasingly
popular way to store and analyze
massive volumes and heterogeneous
types of data in a centralized repository.
Benefits of a Data Lake – Quick Ingest
Quickly ingest data
without needing to force it into a
pre-defined schema.
“How can I collect data quickly
from various sources and store
it efficiently?”
Benefits of a Data Lake – All Data in One Place
“Why is the data distributed in
many locations? Where is the
single source of truth?”
Store and analyze all of your data,
from all of your sources, in one
centralized location.
Benefits of a Data Lake – Storage vs Compute
Separating your storage and compute
allows you to scale each component as
required
“How can I scale up with the
volume of data being generated?”
Benefits of a Data Lake – Schema on Read
“Is there a way I can apply multiple
analytics and processing frameworks
to the same data?”
A Data Lake enables ad-hoc
analysis by applying schemas
on read, not write.
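As a minimal sketch of schema on read, the snippet below stores raw JSON lines untouched and lets two consumers apply different schemas when they read. The sample records, field names, and the `read_with_schema` helper are hypothetical and only illustrate the idea; they are not part of the talk.

```python
import json

# Raw events stored as-is, one JSON document per line; nothing is enforced at write time.
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2016-11-28T10:00:00"}',
    '{"device": "sensor-2", "temp_c": "19.0", "ts": "2016-11-28T10:05:00", "site": "lab"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> type converter) while reading raw records."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record become None; fields outside the schema are ignored.
        rows.append({
            name: convert(record[name]) if name in record else None
            for name, convert in schema.items()
        })
    return rows


# Two consumers read the same raw data with different schemas.
temperature_view = read_with_schema(RAW_LINES, {"device": str, "temp_c": float})
location_view = read_with_schema(RAW_LINES, {"device": str, "site": str})
```

Because the schema lives with the reader, adding a new view of the data does not require rewriting what is already stored.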
Important Capabilities of a
“Data Lake”
Important components of a Data Lake
• Ingest and Store
• Catalogue & Search
• Protect & Secure
• Access & User Interface
[Diagram: AWS services arranged by stage: COLLECT, INGEST/STORE, PROCESS/ANALYZE, CONSUME]
• Collect: mobile apps, web apps, devices, sensors and IoT platforms (AWS IoT), data centers (AWS Direct Connect, AWS Import/Export Snowball), logging (Amazon CloudWatch, AWS CloudTrail), messaging. Source categories: logging, IoT, applications, transport, messaging. Data arrives as records, documents, files, messages and streams.
• Ingest/Store: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon ElastiCache, Amazon Elasticsearch Service, Amazon SQS, Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams. Store types: search, SQL, NoSQL, cache, file, queue, stream.
• Process/Analyze: Amazon EMR, Presto, Amazon Redshift, Amazon Machine Learning, AWS Lambda, Amazon Kinesis Analytics, Amazon KCL apps, Amazon SQS apps, streaming, Amazon EC2, ETL. Processing types: batch, message, interactive, stream, ML.
• Consume: Amazon QuickSight, apps & services. Interfaces: analysis & visualization, notebooks, IDE, API.
Many tools to Support the Data Analytics LifeCycle
Ingest and Store
Ingest real time and batch data
Support for any type of data at scale
Durable
Low cost
Use S3 as Data Substrate – Apply to Compute as Needed
[Diagram: Amazon S3 at the center, surrounded by EMR, Kinesis, Redshift, DynamoDB, RDS, Data Pipeline, Spark Streaming, Storm, and Import/Export Snowball]
• Highly Durable
• Low Cost
• Scalable Storage
Amazon S3 as your cluster’s persistent data store
Amazon S3
Separate compute and storage
Resize and shut down Analytics
Compute Environments with no data
loss
Point multiple compute clusters at
same data in Amazon S3
Data Ingestion into Amazon S3
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway
Metadata lake
Used for summary statistics and data
Classification management
Simplified model for data discovery &
governance
Catalogue & Search
Catalogue & Search Architecture
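• Data Collectors (EC2, ECS) put objects into the S3 bucket
• Object Created and Object Deleted events from the S3 bucket trigger AWS Lambda
• AWS Lambda extracts the search fields and puts an item into the Metadata Index (DynamoDB)
• The DynamoDB update stream triggers a second AWS Lambda function
• That function updates the index in Amazon Elasticsearch Service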
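The catalogue flow above could be sketched as a Lambda-style handler that turns S3 event notifications into metadata index actions. This is an illustrative sketch, not the solution's actual code: the chosen search fields (`extension`, `prefix`), the bucket name in the usage note, and the `index_writer` callback (a stand-in for the DynamoDB put and delete calls) are all assumptions.

```python
from urllib.parse import unquote_plus


def to_index_actions(event):
    """Turn an S3 event notification into (action, item) pairs for a metadata index."""
    actions = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        item = {"bucket": bucket, "key": key}
        if record["eventName"].startswith("ObjectCreated"):
            item["size_bytes"] = record["s3"]["object"].get("size", 0)
            # Hypothetical search fields: file extension and top-level prefix.
            item["extension"] = key.rsplit(".", 1)[-1].lower() if "." in key else ""
            item["prefix"] = key.split("/", 1)[0] if "/" in key else ""
            actions.append(("put", item))
        elif record["eventName"].startswith("ObjectRemoved"):
            actions.append(("delete", item))
    return actions


def handler(event, context=None, index_writer=None):
    """Lambda-style entry point; index_writer is a hypothetical stand-in for the
    put/delete calls a real deployment would make against the metadata index."""
    actions = to_index_actions(event)
    if index_writer is not None:
        for action, item in actions:
            index_writer(action, item)
    return actions
```

Calling `handler(event, None, my_writer)` with a test event makes it possible to check the extracted fields locally before wiring the function to a real bucket.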
Access Control - Authentication &
Authorization
Data protection – Encryption
Logging and Monitoring
Protect and Secure
Security
• Identity and Access Management (IAM) policies
• Bucket policies
• Access Control Lists (ACLs)
• Query string authentication
• Private VPC endpoints to Amazon S3
Encryption
• SSL endpoints
• Server-Side Encryption with Amazon S3-managed keys (SSE-S3)
• Server-Side Encryption with customer-provided keys (SSE-C) or AWS KMS-managed keys (SSE-KMS)
• Client-side Encryption
Compliance
• Bucket access logs
• Lifecycle Management Policies
• Access Control Lists (ACLs)
• Versioning & MFA Delete
• Certifications – HIPAA, PCI, SOC 1/2/3, etc.
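Two of the controls listed above, SSL endpoints and server-side encryption, can be enforced with a bucket policy. The sketch below builds such a policy as a Python dictionary; the bucket name `example-data-lake` and the helper name `build_bucket_policy` are hypothetical, and the policy is an illustration of the pattern, not the policy shipped with any AWS solution.

```python
import json


def build_bucket_policy(bucket_name):
    """Build a bucket policy that denies non-SSL access and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not arrive over SSL/TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that omit the server-side encryption header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


policy = build_bucket_policy("example-data-lake")  # hypothetical bucket name
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string is what would be attached to the bucket; IAM policies and ACLs from the Security list would still govern who may read or write in the first place.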
Implement the right controls
Exposes the data lake to customers
Programmatically query catalogue
Expose search API
Ensures that entitlements are respected
API & User Interface
API & UI Architecture
Components: Users, Static Website, API Gateway, AWS Lambda, Metadata Index, User Management
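One way to picture a search API that respects entitlements is a query function that filters catalogue entries by the caller's groups before returning them. The sketch below is a simplified, in-memory illustration; the entry fields (`name`, `description`, `allowed_groups`), the group names, and the `search_catalogue` function are assumptions, not the solution's API.

```python
def search_catalogue(index, query, user_groups):
    """Return names of catalogue entries that match the query and that the
    caller is entitled to see (at least one group in common)."""
    query = query.lower()
    results = []
    for entry in index:
        matches = query in entry["name"].lower() or query in entry["description"].lower()
        entitled = bool(set(entry["allowed_groups"]) & set(user_groups))
        if matches and entitled:
            results.append(entry["name"])
    return results


# Hypothetical metadata index contents.
CATALOGUE = [
    {"name": "claims-2016", "description": "Insurance claims data", "allowed_groups": ["rwd"]},
    {"name": "sales-forecast", "description": "Global forecasting data", "allowed_groups": ["commercial"]},
    {"name": "batch-genealogy", "description": "Manufacturing batch data", "allowed_groups": ["mfg", "rwd"]},
]

visible = search_catalogue(CATALOGUE, "data", user_groups=["rwd"])
```

Filtering inside the API layer, instead of in the user interface, means every client of the search API gets the same entitlement checks.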
Putting It All Together
Common Add-On Capability to Data Lake
• Backend ETL
• Organizational Control Gates for Dataset Access
• Data Correlation Identification via Machine Learning
• Dataset Updates to Dependent Compute Environments
Data Lake is a Journey
There are Multiple Implementation Methods
for Building a Data Lake
• Using a Combination of Specialized/Open Source Tools from
Various Providers to build a Customized Solution
• Using various managed services from AWS, such as S3, Amazon
Elasticsearch Service, DynamoDB, Cognito, Lambda, etc., as a Core
Data Lake Solution
• Using a Data-Lake-as-a-Service Solution
Data Lake Evolution @ Amgen
SAURAV MAHANTI
Senior Manager - Information Systems
One Data Lake
One code-base
Multiple Locations
(Cloud / On-Premise)
Common scalable
infrastructure
Multiple Business
Functions
Shared Features
A CONCEPTUAL VIEW OF THE DATA LAKE
DATA LAKE PLATFORM
Common Tools and Capabilities
Innovative new tools and
capabilities are reused
across all functions (e.g.
search, data processing &
storage, visualization tools)
Functions can manage
their own data, while
contributing to the
common data layer
Business applications are
built to meet specific
information needs, from
simple data access, data
visualization, to complex
statistical/predictive models
Manufacturing Data
• Batch Genealogy Visualization
• PD Self-Service Analytics
• Instrument Bus Data Search
Real World Data
• FDA Sentinel Analytics
• Epi Programmer’s Workbench
• Patient Population Analytics
Commercial Data
• US Field Sales Reporting
• Global Forecasting
• US Marketing Analytics
Bio-IT World Best Practices Awards 2016 Winner
HIGH LEVEL COMPONENT ARCHITECTURE of the data lake
Procure/Ingest: a collection of tools and technologies to easily move data to and from the Data Lake
• Data Ingestion Adapter
• Managed Infrastructure
• Self-service Adapter
• Application Accelerator
Curate and Enrichment Layer: enhances the value and usability of the data through automation, cataloging and linking
• Self-Service Data Integration Tools
• IS Owned Automated Integration
• Data Catalog
• Reference Data Linkage
Storage and Processing: the data processing & storage layer is the core of the Enterprise Data Lake that enables its scale and flexibility
• User Data Workspace
• Raw, Mastered, Apps
• Elastic Compute Capability
Centralized Analytics Toolkit: provides different reporting and analytics solutions for end-users to access the data on the Data Lake
• Analytical Toolkit
• Data Science Toolkit
• BI Reporting
• User Specific Apps
• Access Portal
DATA PROCESSING AND STORAGE LAYER
The combination of Hadoop HDFS and Amazon S3 provides the right cost/performance balance with unlimited scalability while maintaining security and encryption at the data file level
Powerful execution engines like YARN, MapReduce and Spark bring the “compute to the data”
Hive and Impala provide SQL over structured data
HBase is used for NoSQL/transactional jobs
Solr is used for search capabilities over documents
MarkLogic and Neo4j provide semantic and graph capabilities
Amazon Redshift and Amazon EMR provide low-cost elastic computing with the ability to spin up clusters for data processing and metrics calculations
PROCURE AND INGEST - Pre-built and configurable common components to load any type
of data into the Lake
Cloud Data Integration tools like SnapLogic enable data analysts and end users to build data pipelines into the Data Lake using pre-built connectors to various cloud-hosted services like Box, and make it easy to move data sets between HDFS, S3, SFTP and fileshares
Structured Data Ingestion - A common component for scheduled production data loads of incremental or full data. It uses Python and native Hadoop tools like Sqoop and Hive for efficiency and speed.
Real-Time Data Ingestion - A common component for real-time or streaming data. It uses the Kafka messaging queue, Spark Streaming, HBase and Java.
Unstructured Data Pipeline - A Morphlines document pipeline for document ingestion, text processing and indexing. It uses Morphlines, Lily indexer and HBase.
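The difference between full and incremental structured loads can be illustrated with a watermark: a full load takes every source row, while an incremental load takes only rows changed since the last run. This is a generic sketch in plain Python, not Amgen's component; the `updated_at` column, the sample rows, and the `select_rows` helper are hypothetical.

```python
def select_rows(rows, mode, last_watermark=None):
    """Pick the rows to load and return them with the new watermark.

    mode="full" loads everything; mode="incremental" loads only rows whose
    updated_at value is later than the previous watermark.
    """
    if mode == "full":
        selected = list(rows)
    elif mode == "incremental":
        selected = [
            row for row in rows
            if last_watermark is None or row["updated_at"] > last_watermark
        ]
    else:
        raise ValueError(f"unknown load mode: {mode}")
    # ISO-8601 timestamps in the same format compare correctly as strings.
    new_watermark = max((row["updated_at"] for row in selected), default=last_watermark)
    return selected, new_watermark


SOURCE_ROWS = [
    {"id": 1, "updated_at": "2016-11-01T00:00:00"},
    {"id": 2, "updated_at": "2016-11-15T00:00:00"},
    {"id": 3, "updated_at": "2016-11-28T00:00:00"},
]

full_rows, watermark = select_rows(SOURCE_ROWS, "full")
new_rows, next_watermark = select_rows(SOURCE_ROWS, "incremental", "2016-11-10T00:00:00")
```

A scheduler would persist the returned watermark between runs so that each incremental load picks up where the previous one stopped.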
CURATE AND ENRICHMENT
Enhance the value of the data by linking and cataloging
Reference Data
linkage – Connect
Datasets to
Ontologies and
Vocabularies to get
more relevant and
better results
Data Catalog is an enterprise-
wide metadata catalog that stores,
describes, indexes, and shows
how to access any registered data
asset
Components: Self-Service Data Integration Tools, Analytical Subject Areas, Data Catalog, Reference Data Linkage
Analytical Subject Areas – Build
targeted applications using Data
Integration tools like SnapLogic or
deploy packaged applications or
data marts that transform the data
in the Lake for consumption by
end user tools
Examples: PD Self-Service Analytics, FDA Sentinel Analytics, US Marketing Analytics
CENTRALIZED ANALYTICS TOOLKIT - provides reporting and analytics solutions for end-
users to access the data in the Lake
Components: Analytical Toolkit, Data Science Toolkit, BI Reporting, User Specific Apps, Access Portal
Data Science toolkit enables
analysts and data scientists to use
tools like SAS, R, Jupyter (Python
notebooks) to connect to their
analytics sandbox within the Lake
and submit distributed computing
jobs
Reporting and Business
Intelligence tools like
MicroStrategy and Cognos can be
deployed either directly on the Lake
or on derived Analytical Subject
Areas
Analytics and
Visualization tools are
the most common
methods used to query
and analyze data in the
Lake
Focused applications that target
a specific use-case can be built
using open source products and
deployed to a specific user
community through the portal
Data Lake Portal
provides a user-friendly,
mobile-enabled and
secure portal to all the
end-user applications
Business Impact
Marketed product
defense
Clinical trial speed &
outcomes
Design & Analytic Toolbox
Pre-calculated Cohorts for All Pipeline & Marketed Medicines
Real World Data Platform: Shared capability used by multiple business functions from R&D
and Commercial
RWD Data Lake – Claims, EMR+
Superior processing speed enables simultaneous processing of terabytes of data
Datasets Converted to Common OMOP Data Model
Asia US EU ROW
Descriptive
Rapid RWD Query
Targeted Demand
Therapy areas
Study Design
Advanced Analytics
Spotfire, R, SAS, Achilles
Product value
Benefit:Risk
Evidence-based
research
That’s a lot of Work! –
Where do I even Start?!
Introducing: Data Lake Solution
Presented by: Dario Rivera – AWS Solutions Architect
Built by: Sean Senior – AWS Solutions Builder
Data Lake Solution
Package of Code –
Deployed via CloudFormation
into your AWS Account
Architecture Overview
Data Lake Serverless Composition
• Amazon S3 (2 buckets)
  • Primary data lake content storage
  • Static website hosting
• Amazon API Gateway (1 RESTful API)
• Amazon DynamoDB (7 tables)
• Amazon Cognito Your User Pools (1 user pool)
• AWS Lambda
  • 5 microservice functions
  • 1 Amazon API Gateway custom authorizer function
  • 1 Amazon Cognito User Pool event trigger function (pre sign-up, post confirmation)
• Amazon Elasticsearch Service (1 cluster)
• AWS IAM (7 policies, 7 roles)
Demo of
AWS Data Lake Solution
http://tinyurl.com/DataLakeSolution
All of this, and the Data Lake Solution costs
less than $1 per hour to run
* Excluding Storage and Analytics Environment Costs
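To put the hourly figure in monthly terms, the short calculation below treats $1 per hour as an upper bound and assumes a 30-day month of continuous operation. Both the ceiling and the month length are assumptions made for illustration; storage and analytics environment costs remain excluded, as stated above.

```python
HOURLY_COST_CEILING_USD = 1.00  # "less than $1 per hour", taken as an upper bound
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month, running continuously

monthly_ceiling_usd = HOURLY_COST_CEILING_USD * HOURS_PER_DAY * DAYS_PER_MONTH
print(f"Upper bound: under ${monthly_ceiling_usd:.0f} per 30-day month")
```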
Data Lake Solution will be available at the end of Q4.
Available via AWS Answers
https://aws.amazon.com/answers/
Thank you!
Neeraj Verma – rajverma@amazon.com
Saurav Mahanti – smahanti@amgen.com
Dario Rivera – darior@amazon.com

More Related Content

What's hot

Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceAmazon Web Services
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)Amazon Web Services Korea
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security Tom Laszewski
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!Chris Taylor
 
AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017
AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017
AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017AWSKRUG - AWS한국사용자모임
 
AWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and WorkshopsAWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and WorkshopsTom Laszewski
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...Amazon Web Services Korea
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Amazon Web Services
 
데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저
데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저
데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저Amazon Web Services Korea
 
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...Amazon Web Services Korea
 

What's hot (20)

Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration Service
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
 
AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017
AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017
AWS 기반 5천만 모바일 앱서비스 확장하기 - 이영진 (강남SE 모임) :: AWS Community Day 2017
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
AWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and WorkshopsAWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and Workshops
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저
데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저
데이터 마이그레이션 및 전송을 위한 AWS 스토리지 서비스 활용방안 - 박용선, 메가존 클라우드 매니저
 
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
 

Similar to AWS Big Data Analytics Data Lake

Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxArunPandiyan890855
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Amazon Web Services LATAM
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Amazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 

Similar to AWS Big Data Analytics Data Lake (20)

AWS Big Data Landscape
AWS Big Data LandscapeAWS Big Data Landscape
AWS Big Data Landscape
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptx
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
AWS Big Data Analytics Data Lake

  • 1. How to Build a Big Data Analytics Data Lake. Neeraj Verma – AWS Solutions Architect; Saurav Mahanti – Senior Manager – Information Systems; Dario Rivera – AWS Solutions Architect. November 28, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. What to expect from this short talk • Data Lake concept • Data Lake - Important Capabilities • AMGEN’s Data Lake initiative • How to Build a Data Lake in your AWS account
  • 4. What is a Data Lake? A Data Lake is a new and increasingly popular way to store and analyze massive volumes and heterogeneous types of data in a centralized repository.
  • 5. Benefits of a Data Lake – Quick Ingest Quickly ingest data without needing to force it into a pre-defined schema. “How can I collect data quickly from various sources and store it efficiently?”
  • 6. Benefits of a Data Lake – All Data in One Place “Why is the data distributed in many locations? Where is the single source of truth?” Store and analyze all of your data, from all of your sources, in one centralized location.
  • 7. Benefits of a Data Lake – Storage vs Compute Separating your storage and compute allows you to scale each component as required. “How can I scale up with the volume of data being generated?”
  • 8. Benefits of a Data Lake – Schema on Read “Is there a way I can apply multiple analytics and processing frameworks to the same data?” A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
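Slide 8's schema-on-read idea can be illustrated with a minimal sketch. The record layout, field names and the two schemas below are hypothetical and not taken from the talk; the point is only that the same raw data, stored without an enforced schema, can be projected differently by each reader.

```python
import json

# Raw records are stored exactly as they arrived; no schema was enforced on write.
# (Hypothetical sample data for illustration.)
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2016-11-28T10:00:00"}',
    '{"device": "sensor-2", "temp_c": "19.0", "ts": "2016-11-28T10:01:00", "site": "lab"}',
]

# Two hypothetical read-time schemas: field name -> converter function.
TEMPERATURE_SCHEMA = {"device": str, "temp_c": float}
INVENTORY_SCHEMA = {"device": str, "site": str}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the schema chosen by the reader."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # Fields missing from a raw record become None instead of failing the read.
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# The same raw data serves two different analyses.
temperatures = read_with_schema(RAW_LINES, TEMPERATURE_SCHEMA)
inventory = read_with_schema(RAW_LINES, INVENTORY_SCHEMA)
```

A new analysis needs only a new schema dictionary; the stored data is left untouched.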
  • 9. Important Capabilities of a “Data Lake”
  • 10. Important components of a Data Lake: Ingest and Store; Catalogue & Search; Protect & Secure; Access & User Interface
  • 11. Many Tools to Support the Data Analytics Lifecycle (diagram). Stages: Collect, Ingest/Store, Process/Analyze, Consume. Category labels: Logging, IoT, Applications, Transport, Messaging; Search, SQL, NoSQL, Cache, File, Queue, Stream; Batch, Message, Interactive, Stream, ML; Analysis & visualization, Notebooks, IDE, API; ETL. Data types: records, documents, files, messages, streams. Sources shown: Amazon EC2, mobile apps, web apps, devices, sensors and IoT platforms, AWS IoT, data centers, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail. Services shown: Apache Kafka, Amazon SQS, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB, Amazon DynamoDB Streams, Amazon S3, Amazon ElastiCache, Amazon RDS, Amazon Elasticsearch Service, Amazon EMR, Presto, Amazon Redshift, Amazon Machine Learning, Amazon Kinesis Analytics, Amazon KCL apps, Amazon SQS apps, Streaming, AWS Lambda, Amazon QuickSight, Apps & Services.
  • 12. Ingest and Store: ingest real-time and batch data; support for any type of data at scale; durable; low cost
  • 13. Use S3 as Data Substrate – Apply to Compute as Needed. Amazon S3: highly durable, low-cost, scalable storage. Services shown around it: EMR, Kinesis, Redshift, DynamoDB, RDS, Data Pipeline, Spark Streaming, Storm, Import/Export Snowball
  • 14. Amazon S3 as your cluster’s persistent data store: separate compute and storage; resize and shut down analytics compute environments with no data loss; point multiple compute clusters at the same data in Amazon S3
  • 15. Data Ingestion into Amazon S3: AWS Direct Connect, AWS Snowball, ISV Connectors, Amazon Kinesis Firehose, S3 Transfer Acceleration, AWS Storage Gateway
  • 16. Catalogue & Search: metadata lake; used for summary statistics and data classification management; simplified model for data discovery & governance
  • 17. Catalogue & Search Architecture (diagram): Data Collectors (EC2, ECS) → Put Object → S3 Bucket → Object Created, Object Deleted → AWS Lambda → Put Item → Metadata Index (DynamoDB) → Update Stream → AWS Lambda → Extract Search Fields, Update Index → Amazon Elasticsearch Service
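The catalogue flow on slide 17 can be sketched as a single handler that reacts to S3 object events and keeps a metadata index in step with the bucket. This is a simplified illustration, not the solution's code: a plain dictionary stands in for the DynamoDB metadata index, the bucket and key names are hypothetical, and the search-field extraction that the slide shows as a separate step is folded into the same function. The event shape (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) follows the S3 event notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def apply_s3_event(event, metadata_index):
    """Update the metadata index for every record in an S3 event notification.

    metadata_index is a plain dict standing in for the DynamoDB table.
    """
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 notifications are URL-encoded, so decode before use.
        key = unquote_plus(record["s3"]["object"]["key"])
        item_id = f"{bucket}/{key}"
        event_name = record["eventName"]

        if event_name.startswith("ObjectCreated"):
            metadata_index[item_id] = {
                "bucket": bucket,
                "key": key,
                "size_bytes": record["s3"]["object"].get("size", 0),
                # Naive search fields: the path segments of the key, lowercased.
                "search_fields": key.lower().split("/"),
            }
        elif event_name.startswith("ObjectRemoved"):
            metadata_index.pop(item_id, None)
    return metadata_index


# Hypothetical events for a bucket named "example-lake".
index = {}
created = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "example-lake"},
           "object": {"key": "raw/Sales/report.csv", "size": 1024}},
}]}
removed = {"Records": [{
    "eventName": "ObjectRemoved:Delete",
    "s3": {"bucket": {"name": "example-lake"},
           "object": {"key": "raw/Sales/report.csv"}},
}]}

apply_s3_event(created, index)
snapshot_after_create = dict(index)
apply_s3_event(removed, index)
```

In a deployed version the dictionary writes would become put-item and delete-item calls against the metadata table, and the search index would be updated from the table's update stream.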
  • 18. Protect and Secure: access control (authentication & authorization); data protection (encryption); logging and monitoring
  • 19. Implement the right controls. Security: Identity and Access Management (IAM) policies; bucket policies; Access Control Lists (ACLs); query string authentication; private VPC endpoints to Amazon S3; SSL endpoints. Encryption: Server Side Encryption (SSE-S3); S3 Server Side Encryption with provided keys (SSE-C, SSE-KMS); client-side encryption. Compliance: bucket access logs; lifecycle management policies; Access Control Lists (ACLs); versioning & MFA deletes; certifications – HIPAA, PCI, SOC 1/2/3 etc.
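Two of the controls on slide 19, SSL endpoints and server-side encryption, are commonly enforced with a bucket policy. The sketch below only builds the policy document; the bucket name is hypothetical and nothing is sent to AWS. It relies on the `aws:SecureTransport` and `s3:x-amz-server-side-encryption` condition keys, and is meant as a starting point rather than the policy used in the talk.

```python
import json


def build_bucket_policy(bucket_name):
    """Return a bucket policy that denies non-TLS access and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not arrive over SSL/TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that omit the server-side encryption header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


# "example-data-lake" is a placeholder bucket name.
policy = build_bucket_policy("example-data-lake")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON can be attached to a bucket through the console, the CLI or an SDK; IAM policies and ACLs from the same slide would be layered on top of it.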
  • 20. API & User Interface: exposes the data lake to customers; programmatically query the catalogue; expose a search API; ensures that entitlements are respected
  • 21. API & UI Architecture (diagram): Users, Static Website, API Gateway, AWS Lambda, Metadata Index, User Management
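Slides 20 and 21 describe a search API that respects entitlements. A minimal sketch of that idea follows, assuming each catalogue entry records which group may see it; the catalogue contents, group names and function name are all hypothetical, and in the architecture shown the equivalent logic would sit in a Lambda function behind API Gateway.

```python
# Hypothetical catalogue entries; a real deployment would read these from the metadata index.
CATALOGUE = [
    {"dataset": "sales-2016", "owner_group": "commercial", "tags": ["sales", "forecast"]},
    {"dataset": "batch-genealogy", "owner_group": "manufacturing", "tags": ["batch", "quality"]},
    {"dataset": "field-sales-report", "owner_group": "commercial", "tags": ["sales", "reporting"]},
]


def search_catalogue(term, user_groups, catalogue=CATALOGUE):
    """Return names of datasets matching the term that the caller is entitled to see."""
    term = term.lower()
    results = []
    for entry in catalogue:
        # Entitlement check first: skip datasets outside the caller's groups.
        if entry["owner_group"] not in user_groups:
            continue
        if any(term in tag for tag in entry["tags"]):
            results.append(entry["dataset"])
    return results


commercial_hits = search_catalogue("sales", {"commercial"})
manufacturing_hits = search_catalogue("sales", {"manufacturing"})
```

Filtering on entitlements before matching means a caller never learns that a restricted dataset exists, even when the search term would have matched it.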
  • 22. Putting It All Together
  • 23.
  • 24. Common Add-On Capabilities to a Data Lake: backend ETL; organizational control gates for dataset access; data correlation identification via machine learning; dataset updates to dependent compute environments (diagram labels: ETL, JSON)
  • 25. Data Lake is a Journey. There are multiple implementation methods for building a Data Lake: • using a combination of specialized/open source tools from various providers to build a customized solution • using various managed services from AWS, such as S3, Amazon Elasticsearch Service, DynamoDB, Cognito, Lambda, etc., as a core Data Lake solution • using a DataLake-as-a-Service solution
  • 26. Data Lake Evolution @ Amgen SAURAV MAHANTI Senior Manager - Information Systems
  • 27. One Data Lake: one code-base; multiple locations (cloud / on-premise); common scalable infrastructure; multiple business functions; shared features
  • 28. A Conceptual View of the Data Lake. Data Lake Platform, common tools and capabilities: innovative new tools and capabilities are reused across all functions (e.g. search, data processing & storage, visualization tools). Functions can manage their own data, while contributing to the common data layer. Business applications are built to meet specific information needs, from simple data access and data visualization to complex statistical/predictive models. Manufacturing Data: Batch Genealogy Visualization, PD Self-Service Analytics, Instrument Bus Data Search. Real World Data: FDA Sentinel Analytics, Epi Programmer’s Workbench, Patient Population Analytics. Commercial Data: US Field Sales Reporting, Global Forecasting, US Marketing Analytics. Best Practices Awards, Bio IT World 2016 Winner
  • 29. High-Level Component Architecture of the Data Lake. Procure/Ingest (Data Ingestion Adapter, Managed Infrastructure, Self-service Adapter, Application Accelerator): a collection of tools and technologies to easily move data to and from the Data Lake. Curate and Enrichment (Self-Service Data Integration Tools, IS Owned Automated Integration, Data Catalog, Reference Data Linkage): enhances the value and usability of the data through automation, cataloging and linking. Storage and Processing (User Data Workspace, Raw, Mastered, Apps, Elastic Compute Capability): the data processing & storage layer is the core of the Enterprise Data Lake that enables its scale and flexibility. Centralized Analytics Toolkit (Analytical Toolkit, Data Science Toolkit, BI Reporting, User Specific Apps, Access Portal): provides different reporting and analytics solutions for end-users to access the data on the Data Lake
  • 30. Data Processing and Storage Layer (the component diagram from slide 29 is repeated). The combination of Hadoop HDFS and Amazon S3 provides the right cost/performance balance with unlimited scalability while maintaining security and encryption at the data file level. Powerful execution engines like YARN, MapReduce and Spark bring the “compute to the data”. Hive and Impala provide SQL over structured data; HBase is used for NoSQL/transactional jobs. Solr is used for search capabilities over documents. MarkLogic and Neo4j provide semantic and graph capabilities. Amazon Redshift and Amazon EMR provide low-cost elastic computing with the ability to spin up clusters for data processing and metrics calculations
  • 31. Procure and Ingest: pre-built and configurable common components to load any type of data into the Lake (the component diagram from slide 29 is repeated). Cloud data integration tools like SnapLogic enable data analysts and end users to build data pipelines into the Data Lake using pre-built connectors to various cloud-hosted services like Box, and make it easy to move data sets between HDFS, S3, sFTP and fileshares. Structured Data Ingestion: a common component for scheduled production data loads of incremental or full data. It uses Python and native Hadoop tools like Sqoop and Hive for efficiency and speed. Real-Time Data Ingestion: for real-time or streaming data. It uses the Kafka messaging queue, Spark Streaming, HBase and Java. Unstructured Data Pipeline (Morphlines Document Pipeline): for document ingestion, text processing and indexing. It uses Morphlines, Lily indexer and HBase
  • 32. Curate and Enrichment: enhance the value of the data by linking and cataloging. Components: Self-Service Data Integration Tools, Analytical Subject Areas, Data Catalog, Reference Data Linkage. Reference Data Linkage: connect datasets to ontologies and vocabularies to get more relevant and better results. Data Catalog: an enterprise-wide metadata catalog that stores, describes, indexes, and shows how to access any registered data asset. Analytical Subject Areas: build targeted applications using data integration tools like SnapLogic, or deploy packaged applications or data marts that transform the data in the Lake for consumption by end-user tools. Examples shown: PD Self-Service Analytics, FDA Sentinel Analytics, US Marketing Analytics
  • 33. Centralized Analytics Toolkit: provides reporting and analytics solutions for end-users to access the data in the Lake. Components: Analytical Toolkit, Data Science Toolkit, BI Reporting, User Specific Apps, Access Portal. The Data Science toolkit enables analysts and data scientists to use tools like SAS, R and Jupyter (Python notebooks) to connect to their analytics sandbox within the Lake and submit distributed computing jobs. Reporting and Business Intelligence tools like MicroStrategy and Cognos can be deployed either directly on the Lake or on derived Analytical Subject Areas. Analytics and visualization tools are the most common methods used to query and analyze data in the Lake. Focused applications that target a specific use-case can be built using open source products and deployed to a specific user community through the portal. The Data Lake Portal provides a user-friendly, mobile-enabled and secure portal to all the end-user applications
  • 34. Business Impact (diagram). Real World Data Platform: shared capability used by multiple business functions from R&D and Commercial. RWD Data Lake (Claims, EMR+): superior processing speed enables simultaneous processing of terabytes of data; datasets converted to Common OMOP Data Model; regions: Asia, US, EU, ROW. Design & Analytic Toolbox: pre-calculated cohorts for all pipeline & marketed medicines; Descriptive; Rapid RWD Query; Advanced Analytics; Spotfire, R, SAS, Achilles. Other labels shown: Marketed product defense; Clinical trial speed & outcomes; Targeted; Demand; Therapy areas; Study Design; Product value; Benefit:Risk; Evidence-based research
  • 35. That’s a lot of Work! – Where do I even Start?!
  • 36. Introducing: Data Lake Solution. Presented by: Dario Rivera – AWS Solutions Architect. Built by: Sean Senior – AWS Solutions Builder
  • 37. Data Lake Solution: package of code, deployed via CloudFormation into your AWS account
  • 39. Data Lake Server-less Composition • Amazon S3 (2 buckets): primary data lake content storage; static website hosting • Amazon API Gateway (1 RESTful API) • Amazon DynamoDB (7 tables) • Amazon Cognito Your User Pools (1 User Pool) • AWS Lambda: 5 microservice functions; 1 Amazon API Gateway custom authorizer function; 1 Amazon Cognito User Pool event trigger function [pre sign-up, post confirmation] • Amazon Elasticsearch Service (1 cluster) • AWS IAM (7 policies, 7 roles)
  • 40. Demo of AWS Data Lake Solution http://tinyurl.com/DataLakeSolution
  • 41.
  • 42. All of this, and it costs less than $1 / hour to run the Data Lake Solution* (* excluding storage and analytics environment costs)
  • 43. Data Lake Solution will be available at the end of Q4, via AWS Answers: https://aws.amazon.com/answers/
  • 44. Thank you! Neeraj Verma – rajverma@amazon.com Saurav Mahanti – smahanti@amgen.com Dario Rivera – darior@amazon.com