This document discusses building a data lake on AWS. It argues that organizations that successfully generate value from their data will outperform competitors, and outlines the challenges of data visibility, multiple access mechanisms, and giving analysts access at scale. AWS is presented as well suited to the problem, with storage, analytics, and security capabilities at scale, and case studies of Celgene and IEP, which built their data lakes on AWS. Traditional analytics centered on the data warehouse; data lakes extend this approach by accommodating diverse data types and analytical engines at larger scale and lower cost. The AWS portfolio for data lakes, analytics, and IoT is presented as the most complete toolset, and the document closes with ways to build value from the data lake through machine learning, analytics, data movement, and visualization.
The document discusses building data lakes with AWS. It recommends using Amazon S3 as the storage layer for the data lake due to its scalability, durability and integration with other AWS analytics services. It also recommends using AWS Glue to catalog and ingest data into the data lake through automated crawlers. This allows for easy discovery, querying and analysis of data in the lake.
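As a rough mental model of what a Glue crawler produces, the sketch below samples a few JSON records, infers a column-to-type mapping, and emits a Data Catalog-style table entry. This is a simplified, hypothetical illustration (the bucket path and table name are made up), not Glue's actual classifier logic:

```python
import json

def infer_table(records, table_name, location):
    """Toy illustration of what a Glue crawler does: sample records,
    infer a column -> type mapping, and emit a catalog-style table entry.
    Greatly simplified; real crawlers use classifiers and handle schema drift."""
    types = {}
    for rec in records:
        for col, val in rec.items():
            t = {int: "bigint", float: "double", bool: "boolean"}.get(type(val), "string")
            # Widen to string when the same column has conflicting types
            types[col] = t if types.get(col, t) == t else "string"
    return {
        "Name": table_name,
        "StorageDescriptor": {
            "Location": location,  # hypothetical bucket/prefix
            "Columns": [{"Name": c, "Type": t} for c, t in types.items()],
        },
    }

rows = [json.loads(s) for s in (
    '{"user_id": 42, "event": "click", "score": 0.7}',
    '{"user_id": 43, "event": "view", "score": 1}',
)]
table = infer_table(rows, "clickstream", "s3://my-datalake/raw/clickstream/")
```

In practice you would create and start a real crawler from the console or the AWS SDK and let it keep the catalog in sync as new data lands; the point here is only the shape of the output.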
Today’s organisations require a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. A data lake is an increasingly popular way to store all of your data, structured and unstructured, in one centralised repository. Because data can be stored as-is, there is no need to convert it to a predefined schema, and you no longer need to know in advance what questions you want to ask of your data.
In this webinar, you will discover how AWS gives you fast access to flexible, low-cost IT resources, so you can rapidly build and scale a data lake that can power any kind of analytics, such as data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and Internet of Things processing, regardless of the volume, velocity, and variety of your data.
Learning Objectives:
• Discover how you can rapidly scale and build your data lake with AWS.
• Explore the key pillars behind a successful data lake implementation.
• Learn how to use the Amazon Simple Storage Service (S3) as the basis for your data lake.
• Learn about Amazon Athena and Amazon Redshift Spectrum, recently launched AWS services that help customers directly query the data lake.
The document discusses Amazon SageMaker, a fully managed machine learning platform. It introduces several new Amazon SageMaker capabilities: Amazon SageMaker Studio, which provides an integrated development environment for machine learning; Amazon SageMaker Notebooks for easier collaboration; Amazon SageMaker Processing for automated data processing and model evaluation; Amazon SageMaker Experiments for organizing and comparing training experiments; Amazon SageMaker Debugger for automated debugging of machine learning models; Amazon SageMaker Model Monitor for continuous monitoring of models in production; and Amazon SageMaker Autopilot for automated machine learning without writing code. It also discusses how Amazon SageMaker addresses challenges in deploying and managing machine learning models at scale.
Apache Spark is a fast and general engine for large-scale data processing. It was created at UC Berkeley and is now the dominant framework in big data. Spark can run programs over 100x faster than Hadoop MapReduce in memory, or more than 10x faster on disk. It supports Scala, Java, Python, and R. Databricks provides a Spark platform on Azure that is optimized for performance and integrates tightly with other Azure services. Key benefits of Databricks on Azure include security, ease of use, data access, high performance, and the ability to solve complex analytics problems.
ABD318: Architecting a Data Lake with Amazon S3, Amazon Kinesis, AWS Glue and ... - Amazon Web Services
Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users, from developers to business analysts to data scientists. Each of these user groups employs different tools, has different data needs, and accesses data in different ways.
In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration Lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users.
The document discusses Azure Data Factory v2. It provides an agenda that includes topics like triggers, control flow, and executing SSIS packages in ADFv2. It then introduces the speaker, Stefan Kirner, who has over 15 years of experience with Microsoft BI tools. The rest of the document consists of slides on ADFv2 topics like the pipeline model, triggers, activities, integration runtimes, scaling SSIS packages, and notes from the field on using SSIS packages in ADFv2.
This document provides an overview of AWS Lake Formation and related services for building a secure data lake. It discusses how Lake Formation provides a centralized management layer for data ingestion, cleaning, security and access. It also describes how Lake Formation integrates with services like AWS Glue, Amazon S3 and ML transforms to simplify and automate many data lake tasks. Finally, it provides an example workflow for using Lake Formation to deduplicate data from various sources and grant secure access for analysis.
The document discusses building a data lake on AWS. It describes various AWS services that can be used to ingest, store, transform, analyze and visualize data in the data lake. These services include Amazon S3 for storage, AWS Glue for ETL/data cataloging, AWS Lake Formation for governance, Amazon Athena/EMR for analytics and Amazon QuickSight for visualization. The document also covers data movement options from on-premises to the data lake and real-time streaming of data using services like Kinesis. Machine learning workloads can leverage Amazon SageMaker for training and deployment.
Unified Big Data Processing with Apache Spark (QCON 2014) - Databricks
This document discusses Apache Spark, a fast and general engine for big data processing. It describes how Spark generalizes the MapReduce model through its Resilient Distributed Datasets (RDDs) abstraction, which allows efficient sharing of data across parallel operations. This unified approach allows Spark to support multiple types of processing, like SQL queries, streaming, and machine learning, within a single framework. The document also outlines ongoing developments like Spark SQL and improved machine learning capabilities.
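The RDD idea described above (transformations recorded lazily, executed only when an action runs) can be illustrated with a toy single-machine stand-in. This is purely illustrative; real RDDs are partitioned, distributed, and fault tolerant:

```python
class ToyRDD:
    """A tiny, single-machine stand-in for Spark's RDD abstraction:
    transformations (map, filter) are recorded lazily and only run
    when an action (collect, count) is called."""
    def __init__(self, data, ops=()):
        self._data = data
        self._ops = ops          # deferred pipeline of transformations

    def map(self, f):
        return ToyRDD(self._data, self._ops + (("map", f),))

    def filter(self, p):
        return ToyRDD(self._data, self._ops + (("filter", p),))

    def collect(self):           # action: materializes the whole pipeline
        out = self._data
        for kind, f in self._ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

    def count(self):             # another action, built on collect
        return len(self.collect())

squares = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# Nothing has executed yet; collect() triggers the pipeline end to end.
result = squares.collect()  # [0, 4, 16, 36, 64]
```

The design point this mimics is that deferring execution lets the engine see the whole pipeline before running it, which is what enables Spark's in-memory sharing and optimization across SQL, streaming, and ML workloads.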
A closer look at the MySQL and PostgreSQL compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. We’ll explore how Aurora uses the AWS cloud to provide high reliability, high durability, and high throughput.
Speakers:
Steve Abraham - Principal Database Specialist Solutions Architect, AWS
Peter Dachnowicz - Sr. Technical Account Manager, AWS
AWS Glue is a fully managed ETL (extract, transform, and load) service that helps customers easily prepare and load data for analytics. You can create and run ETL jobs with a few clicks in the AWS Management Console. When preprocessing data from diverse sources for big data analytics, there is no need to manage separate data-processing servers or infrastructure. This session gives a detailed introduction to the Glue service, launched in the Seoul Region last May, along with a variety of practical tips and demos.
Power BI for Big Data and the New Look of Big Data Solutions - James Serra
New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregations tables, dataflow) as well as Azure Data Lake Store Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes still should be part of an enterprise solution, and how a data lake should be organized.
Data Saturday Oslo: Azure Purview - Erwin de Kreuk
Azure Purview provides unified data governance capabilities including automated data discovery, classification, and lineage visualization. It helps organizations overcome data governance silos, comply with regulations, and increase data agility. The key components of Azure Purview include the Data Map for automated metadata extraction and lineage, the Data Catalog for data discovery and governance, and Insights for monitoring data usage. It supports governance of data across cloud and on-premises environments in a serverless and fully managed platform.
Learning Objectives:
- Learn the common use cases for Athena, AWS's interactive query service on S3
- Learn best practices for creating tables and partitions, and for optimizing performance
- Learn how Athena handles security, authorization, and authentication
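To make the partitioning point concrete, here is a toy model of Hive-style partition pruning: partition values live in the S3 key paths, and a predicate on the partition columns lets the engine skip whole prefixes. The bucket layout is hypothetical and the pruning is simulated in plain Python; in Athena it happens inside the query engine:

```python
def prune(keys, **wanted):
    """Toy model of Hive-style partition pruning: keys carry partition
    values in their paths (e.g. year=2019), and a WHERE clause on those
    columns lets the engine skip every non-matching prefix."""
    def parts(key):
        # Extract partition column=value pairs from the key path
        return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)
    return [k for k in keys if all(parts(k).get(c) == v for c, v in wanted.items())]

keys = [
    "logs/year=2018/month=12/part-000.parquet",
    "logs/year=2019/month=01/part-000.parquet",
    "logs/year=2019/month=02/part-000.parquet",
]
# WHERE year = '2019' AND month = '01' touches one object instead of three
hit = prune(keys, year="2019", month="01")
```

Since Athena bills by data scanned, this is why partitioning on commonly filtered columns is one of the headline best practices the session covers.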
Recommendation is one of the most popular applications in machine learning (ML). In this workshop, we’ll show you how to build a movie recommendation model based on factorization machines — one of the built-in algorithms of Amazon SageMaker — and the popular MovieLens dataset.
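A second-order factorization machine scores an interaction as a global bias, plus linear weights, plus pairwise terms computed from latent factor vectors. The sketch below implements that formula in plain Python with made-up weights; nothing here is trained on MovieLens, and in the workshop SageMaker's built-in algorithm learns these parameters for you:

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score:
        y = w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j
    computed with the O(n*k) identity
        sum_{i<j} <V_i,V_j> x_i x_j = 0.5 * sum_f ((sum_i V_if x_i)^2 - sum_i (V_if x_i)^2)
    Pure-Python sketch with untrained, made-up weights."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - sq)
    return w0 + linear + pairwise

# One-hot encoding with user 0 and item 2 active, as in a
# MovieLens-style user/item rating problem (features are hypothetical)
x = [1, 0, 1]
w0, w = 0.1, [0.2, 0.0, 0.3]
V = [[0.5, 1.0], [0.0, 0.0], [2.0, 0.5]]
score = fm_predict(x, w0, w, V)  # 0.1 + (0.2 + 0.3) + <V_0, V_2> = 2.1
```

The latent vectors are what make the model practical for recommendations: every user-item pair gets a pairwise term even if that exact pair never appeared in training.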
Amazon SageMaker is a unified platform for machine learning projects. Among its features, Amazon SageMaker Studio provides an integrated development environment for ML, covering every step from preparing data to building, training, and deploying models. Amazon EMR is a big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and ML applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto. In this session, we demonstrate the integration between the two services, which lets data scientists and ML engineers easily use distributed big data frameworks in their ML workflows.
AWS Glue is a fully managed, serverless extract, transform, and load (ETL) service that makes it easy to move data between data stores. AWS Glue simplifies and automates the difficult and time consuming tasks of data discovery, conversion mapping, and job scheduling so you can focus more of your time querying and analyzing your data using Amazon Redshift Spectrum and Amazon Athena. In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
How to Build a Data Lake with the AWS Glue Data Catalog (ABD213-R), re:Invent 2017 - Amazon Web Services
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including, connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including, Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- Amazon Aurora, the latest relational database engine: MySQL-compatible and highly available, delivering up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Demystify Streaming on AWS - Speaker: 이종혁, Sr Analytics Specialist, WWSO, AWS - Amazon Web Services Korea
Real-time analytics is a growing use case among AWS customers. Join this session to learn how streaming data technologies let you analyze data immediately, move data between systems in real time, and derive actionable insights faster. We cover common streaming data use cases, the steps to easily enable real-time analytics in your business, and how AWS helps you adopt AWS streaming data services such as Amazon Kinesis.
Many serverless applications need a way to manage end user identities and support sign-ups and sign-ins. Join this session to learn real-world design patterns for implementing authentication and authorization for your serverless application—such as how to integrate with social identity providers (such as Google and Facebook) and existing corporate directories. We cover how to use Amazon Cognito identity pools and user pools with API Gateway, Lambda, and IAM.
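To show what a backend typically does with a Cognito-issued token, the sketch below decodes the payload segment of a JWT and reads its claims. The token is hand-built for illustration, and a real service must also verify the RS256 signature against the user pool's published JWKS, which this sketch deliberately omits:

```python
import base64
import json
import time

def decode_claims(jwt):
    """Decode the payload segment of a JWT such as those issued by a
    Cognito user pool. Illustration only: a real backend MUST also
    verify the signature against the user pool's JWKS before trusting
    any claim in here."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def b64url(obj):
    """Base64url-encode a dict without padding, as JWTs do."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()

# Hand-built sample token with Cognito-style claims; the client id
# and subject are invented, and the signature segment is fake.
token = ".".join([
    b64url({"alg": "RS256", "typ": "JWT"}),
    b64url({"sub": "abc-123", "token_use": "id", "aud": "my-app-client",
            "exp": int(time.time()) + 3600}),
    "fake-signature",
])
claims = decode_claims(token)
assert claims["token_use"] == "id" and claims["exp"] > time.time()
```

In the pattern the session describes, API Gateway can do this validation for you via a Cognito authorizer, so your Lambda function receives only requests whose tokens already checked out.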
This session will introduce you to the features of Amazon SageMaker, including a one-click training environment, highly optimized machine learning algorithms with built-in model tuning, and deployment without engineering effort. With zero setup required, Amazon SageMaker significantly decreases your training time and the overall cost of building production machine learning systems. You'll also hear how and why Intuit is using Amazon SageMaker on AWS for real-time fraud detection.
Building Advanced Analytics Pipelines with Azure Databricks - Lace Lofranco
Participants will get a deep dive into one of Azure's newest offerings: Azure Databricks, a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. In this session, we start with a technical overview of Spark and quickly jump into Azure Databricks' key collaboration features, cluster management, and tight integration with Azure data sources. Concepts are made concrete via a detailed walkthrough of an advanced analytics pipeline built using Spark and Azure Databricks.
Full video of the presentation: https://www.youtube.com/watch?v=14D9VzI152o
Presentation demo: https://github.com/devlace/azure-databricks-anomaly
Leveraging Cloud Analytics to Support Data-Driven Decisions - Amazon Web Services
Learn about AWS business intelligence (BI) analytics, visualization, artificial intelligence, and machine learning services that can transform data into insights.
The document discusses building data lakes on AWS. It describes how data lakes extend the traditional data warehouse approach by allowing storage of both structured and unstructured data at massive scales. Amazon S3 provides durable, available, scalable, and easy-to-use storage for the data lake. AWS Glue crawls data to create a data catalog and can automate ETL processes. Amazon Athena and Amazon EMR enable interactive analysis and big data processing through SQL and Spark. The data lake architecture on AWS supports a variety of analytical use cases.
The document discusses big data and machine learning solutions on AWS. It covers why organizations use big data, challenges they face, and how AWS solutions like S3 data lakes, Glue, Athena, Redshift, Kinesis, Elasticsearch, SageMaker, and QuickSight can help overcome these challenges. It also discusses how big data drives machine learning and how AWS machine learning services work. Core tenets discussed include building decoupled systems, using the right tool for the job, and leveraging serverless services.
This document discusses how organizations can leverage big data and artificial intelligence (AI) to drive insights and add intelligence to their solutions. It covers common big data challenges, AWS big data solutions like Amazon S3, Glue, Athena, Redshift, Kinesis, and SageMaker, and how big data can power machine learning. Some key tenets for building big data architectures are using the right tools, leveraging managed services, adopting event-driven design patterns, and enabling ML applications.
AWS has a broad, purpose-built set of AI/ML services for your business. Learn how to quickly build, train, and deploy machine learning models, or easily add intelligence to your applications.
Build, Train, and Deploy Machine Learning Models - Amazon Web Services
The document discusses Amazon SageMaker, an AWS managed service for building, training, and deploying machine learning models. It provides an overview of the key capabilities of SageMaker such as using built-in algorithms, bringing your own algorithms/containers, hyperparameter tuning, hosting models for inference, and batch transforms. It also discusses how SageMaker integrates with other AWS services like S3, EC2, and Marketplace.
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie... - Amazon Web Services
In this code-level session, we show you how to integrate your Apache Spark application with Amazon SageMaker. We'll include details on how networking, architecture, and code execute across the components of the solution. We will also dive deep into how to perform data exploration and feature engineering on the Spark cluster, starting training jobs from Spark, integrating training jobs in Spark pipelines, and more. Amazon SageMaker, our fully managed machine learning platform, comes with pre-built algorithms and popular deep learning frameworks. Amazon SageMaker also includes an Apache Spark library that you can use to easily train models from your Spark clusters.
Machine Learning with Kubernetes - AWS Container Day 2019 Barcelona - Amazon Web Services
This document discusses machine learning workflows using Kubernetes and Kubeflow. It begins by explaining why Kubernetes is useful for machine learning, citing its composability, portability, and scalability. It then discusses using Jupyter notebooks in machine learning development workflows, and provides an overview of running machine learning on Kubernetes both with and without Kubeflow. Key components such as Kubeflow Pipelines are explained, and the advantages of running Kubeflow on AWS are highlighted. It also covers TensorFlow and how AWS optimizes it, and concludes by discussing the characteristics of autonomous vehicle workloads and how Kubeflow on AWS can help.
AWS Data-Driven Insights Learning Series, ANZ, Sep 2019, Part 2 - Amazon Web Services
AWS has been supporting companies across Australia and New Zealand to put their most innovative tools and technologies to work to achieve their business needs and goals. AWS and our ecosystem of partners has helped the likes of CP Mining, IntelliHQ, WesCEF, Oz Minerals, Woodside and many more to modernise their analytics and data architecture in order to successfully generate business value from their data.
This event series aimed to educate customers with a broader understanding of how to build next-gen data lakes and analytics platforms and make connections with AWS.
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo... - Amazon Web Services
This document discusses Amazon Neptune, a fully managed graph database service. It provides an overview of graph databases and their advantages over traditional databases for modeling connected data. It then describes Amazon Neptune's key features, like automatic scaling, high availability across Availability Zones, integration with open standards like Gremlin and SPARQL, and ease of use on AWS. Examples are given showing how to model and query graph data using Gremlin and SPARQL. Finally, it discusses Amazon Neptune's architecture and roadmap for general availability later in 2018.
This document discusses how big data and artificial intelligence (AI) can drive insights and add intelligence to solutions. It covers common big data challenges, AWS big data solutions like data lakes, data cataloging, extract-transform-load (ETL) tools, and analytics services. It also discusses how big data can enable machine learning by providing large datasets for model training. Key takeaways include using the right tools for different analytics jobs, leveraging managed services, and focusing on extracting business value from data.
Learn about data lifecycle best practices in the AWS Cloud, so you can optimize performance and lower the costs of data ingestion, staging, storage, cleansing, analytics and visualization, and archiving.
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve... - Amazon Web Services
FINRA faced challenges with their on-premises data infrastructure, including difficulty tracking data, limited scalability, and high costs. They migrated to a managed data lake on AWS to address these issues. This provided centralized data management with a catalog, separation of storage and compute, encryption, and cost optimization. It enabled faster analytics through Presto querying, machine learning model development, and reduced TCO by 30% compared to their on-premises environment. Lessons learned included embracing disruption, automating infrastructure, and treating infrastructure as code. FINRA is exploring additional AWS services like Athena, Lambda, and Step Functions to continue improving their analytics capabilities.
The document discusses mapping biomass in the Amazon forest using Amazon analytics services. It describes how biomass maps are traditionally created using satellite data and LiDAR flights, but the process is computationally intensive. The author aims to generate thousands of maps to capture uncertainty using random forest algorithms, but notes their traditional methods cannot scale to that level. They are exploring using Amazon analytics services to generate the large number of maps needed.
Value of Data Beyond Analytics by Darin Briskman - Sameer Kenkare
The document discusses analytics capabilities provided by Amazon Web Services (AWS). It describes how AWS offers a variety of services for building data lakes, loading and querying data, and performing analytics. These services include Amazon S3, Amazon Redshift, Amazon Athena, Amazon EMR, and Amazon QuickSight. It also provides examples of how customers like Epic Games and a large media company use these AWS analytics services.
The document discusses Amazon SageMaker, a fully managed service that allows users to build, train, and deploy machine learning models at scale. It provides an overview of SageMaker's key features like notebooks for preprocessing data and building models, built-in algorithms for common tasks, one-click training of models, hyperparameter tuning, and deployment of trained models onto managed hosting infrastructure. SageMaker aims to make machine learning accessible to every developer by handling the complexities of training and deploying models.
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019) - Julien Simon
The document discusses Amazon Web Services (AWS) machine learning capabilities including frameworks like TensorFlow and MXNet. It highlights how AWS provides optimized infrastructure for deep learning workloads and services like Amazon SageMaker for building, training and deploying machine learning models at scale.
See how Public Sector organisations and AWS Partners are leveraging Smart Devices and Artificial Intelligence to create flexible, secure and cost-effective solutions. By applying learning models to live video/audio, cameras can be transformed into flexible IoT devices that perform critical functions around public safety, security, property management, smart parking and environmental management. Observe how to architect these solutions using AWS services such as AWS IoT Core, AWS Greengrass, AWS DeepLens, Amazon SageMaker and Amazon Alexa.
Speaker: Craig Lawton, Smart Australia & IoT Specialist, AWS
This document discusses Disaster Recovery options in the AWS cloud, including Backup and Restore, Pilot Light, Warm Standby, and Multi-Site. AWS offers several solutions to meet different RTO and RPO requirements at variable cost. The cloud enables easy testing and flexible sizing of disaster recovery resources.
This document describes several AWS cloud security solutions, including tools for identity and access management, detection, infrastructure security, incident response, and data protection. AWS offers 203 security certifications and more than 2,600 annually audited controls to help customers maintain compliance and security in the cloud.
In this webinar, you will learn how companies can leverage the AWS cloud to automate their software development pipelines. This approach allows your team to be more agile, improving its ability to deliver applications and services quickly.
Technologies such as containers and Kubernetes can make your software delivery processes easier and faster. In this webinar, we discuss how to use Amazon Elastic Kubernetes Service (EKS) to build modern applications with fully managed Kubernetes clusters.
Ransomware is one of the fastest-growing threats to any organization. No company, large or small, is immune to attacks by cybercriminals. In this session, we show how you can leverage AWS cloud services and capabilities to protect your most valuable data from cyberattacks and accelerate the restoration of operations.
Ransomware is a malicious practice that has become widespread in recent years. In this session, we show how Amazon Web Services customers can develop a proactive ransomware mitigation strategy, both in on-premises scenarios and when operating in the cloud.
When moving data to the cloud, customers need to understand the optimal methods for different use cases, the types of data they are moving, and the network resources available, among other factors. AWS migration and transfer solutions cover everything from data migration with limited connectivity, hybrid cloud storage, and frequent B2B file transfers to online and offline data transfers. In this session, we show how you can accelerate and simplify data migration and transfer to and from the AWS cloud.
This document discusses strategies for migrating data to AWS, including services such as AWS Transfer Family for file transfers, AWS DataSync for moving data between on-premises environments and AWS, and the AWS Snow Family for offline transfer of large amounts of data.
File storage has many use cases, such as user directories, application data, media files, and shared storage for high-performance workloads. Managing file storage on premises is typically heavy, undifferentiated work, with high acquisition costs and the operational burden of setup and administration, which leads to scalability challenges. In this session, we show how you can take advantage of AWS's fully managed file solutions and stop worrying about the administrative overhead of configuring, securing, maintaining, and backing up your file infrastructure.
Data visualization is a challenge that many organizations face today. Building dashboards and alerts, adding predictions to your data, and acting quickly on those insights is a need for every modern business. Join our architects to learn how Amazon QuickSight lets you add business intelligence to your applications and generate forecasts from your data. Amazon QuickSight is a scalable, serverless business intelligence service built for the cloud. With it, you can explore your business data, turn it into insights, and make informed decisions, without worrying about managing, scaling, or keeping your compute infrastructure available.
1) The document discusses the benefits of migrating Big Data workloads to AWS, including making it easier to build data lakes and analytics, offering a broader range of services, and providing more secure and scalable infrastructure.
2) Amazon EMR is presented as a platform for running Big Data applications in a managed way on AWS, delivering better performance at lower cost compared to on-premises clusters.
3) The separation of compute and storage in Amazon EMR allows each to be scaled independently.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
6. "We chose AWS for its leadership. They are pioneers, and today there is nothing better on the market. In addition, AWS hosts in its infrastructure all the data from the 1000 Genomes project, which makes access easier and reduces processing time."
Dr. Pedro Galante
Researcher at IEP
https://aws.amazon.com/pt/solutions/case-studies/sirio-libanes/
27. "The pace of change in healthcare is moving extremely fast. AWS is going to help us achieve the scale that we need to be able to deliver innovations to our clients."
David Cohen
VP, Intelligence Department
HealtheIntent
Aberdeen research found that organizations that implement data lakes outperform similar companies by 9% in organic revenue growth.
1. Agility – Fail fast, and try more before going all-in with a Big Data solution.
2. Broadest & Deepest Capabilities – Build or support virtually any Big Data workload regardless of volume, velocity, and variety of data. AWS offers deep and rapidly expanding functionality for: data warehousing, distributed analytics (supporting Hadoop, Spark, HBase, Hive, Pig and Yarn), Machine Learning and Business Intelligence.
3. Computational Power Second to None – AWS offers roughly twice as many instance types as any other cloud provider; each EC2 instance is optimized for the CPU, memory, storage, and networking capacity needed to satisfy the computational requirements of any big data use case.
4. Low-Cost Analytics - Now petabyte-scale analytics are affordable for everyone. Big Data storage is as low as $28.16/TB; data archiving as low as $0.007/GB/month; data warehousing and BI is 1/10th the cost of traditional enterprise software solutions; real-time streaming data loads for only $0.35/GB; managed Hadoop, Spark, Presto clusters for as little as $0.15 per hour.
5. Trusted & Secure – AWS environments are continuously audited and certified for compliance with 20+ standards: HIPAA, FedRAMP, CESG, and more. AWS offers efficient and scalable encryption for data at rest and in transit, Key Management Services, and Cloud HSM so customers always have 100% control over which country their data resides in.
6. Data Migrations Made Easy - AWS makes data migration fast, low cost, secure and easy with Amazon S3 Transfer Acceleration (a simple web API that you can call directly to load data and improve data upload speeds by 300%), Amazon Kinesis Firehose for streaming data, AWS Snowball Import/Export appliances (100TB data migrations in 1 day vs. 100s of days), Direct Connect (a dedicated private network connection for low-latency connectivity to the cloud) and AWS Database Migration Service to ensure zero downtime.
7. Largest Partner Ecosystem – AWS offers ISVs and integrators across the data management stack, and a catalog of 290 AWS Marketplace products pre-integrated with the AWS Cloud, so customers do not have to guess capacity needs and can support high-velocity use cases.
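To make the low-cost-analytics point above concrete, here is a back-of-the-envelope sketch that combines the list prices quoted in item 4 for a hypothetical workload. The workload sizes are invented for illustration, and the figures are the deck's quoted list prices, not current AWS pricing:

```python
# Back-of-the-envelope Big Data cost sketch using the list prices quoted
# above. Workload sizes are hypothetical; real AWS pricing varies by region
# and changes over time.

STORAGE_PER_TB = 28.16      # S3 storage, $/TB/month
ARCHIVE_PER_GB = 0.007      # archiving, $/GB/month
STREAM_PER_GB = 0.35        # streaming data load, $/GB
EMR_NODE_PER_HOUR = 0.15    # managed Hadoop/Spark cluster, $/node-hour

def monthly_cost(storage_tb, archive_gb, stream_gb, emr_nodes, hours=730):
    """Sum the four line items for one month (~730 hours)."""
    return (storage_tb * STORAGE_PER_TB
            + archive_gb * ARCHIVE_PER_GB
            + stream_gb * STREAM_PER_GB
            + emr_nodes * hours * EMR_NODE_PER_HOUR)

# Hypothetical workload: 100 TB stored, 50 TB archived,
# 2 TB/month streamed in, a 3-node EMR cluster running 24/7.
cost = monthly_cost(storage_tb=100, archive_gb=50_000,
                    stream_gb=2_000, emr_nodes=3)
print(f"${cost:,.2f}/month")  # → $4,194.50/month
```

Even this generously sized sketch lands in the low thousands of dollars per month, which is the scale of the "petabyte-scale analytics for everyone" argument.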
Pharmaceutical
In order to help customers with their data management strategy, we’ve developed solutions to help store, protect, and optimize healthcare data. In doing so, your data can be centralized and downstream use cases are unlocked. Celgene, for example, deployed their data lake on AWS to drive analytics across their global business units.
[If this resonates with customer, refer to SPO industry solution as followup]
Open Data -> TCGA, ICGC
"Each individual carries about 3 GB of DNA information, which demands a great deal of processing capacity to find small differences within that enormous amount of data."
In oncology, for example, it is no longer rare to see requests to sequence part of a patient's DNA in order to understand the origin of certain tumors or to predict the response to a particular drug. When this work is done, the information can, with the patient's consent, be made available in databases open to the entire medical community for consultation and benchmarking.
You can run crawlers on a schedule, on-demand, or trigger them based on an event to ensure that your metadata is up-to-date.
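As a sketch of what a scheduled crawler looks like in practice, the following builds the parameters that the boto3 Glue `create_crawler` call accepts. The crawler name, IAM role ARN, database, and S3 path are hypothetical placeholders:

```python
# Sketch of a scheduled AWS Glue crawler definition (boto3-style parameters).
# The name, role ARN, database, and S3 path below are hypothetical.

def crawler_definition(name, role_arn, database, s3_path, cron):
    """Build the keyword arguments for glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue schedules use a cron() expression; this one runs daily at 02:00 UTC.
        "Schedule": f"cron({cron})",
    }

params = crawler_definition(
    name="raw-zone-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="datalake_raw",
    s3_path="s3://example-datalake/raw/",
    cron="0 2 * * ? *",
)

# With AWS credentials in place, the call itself would be:
#   import boto3
#   boto3.client("glue").create_crawler(**params)
print(params["Schedule"])  # → cron(0 2 * * ? *)
```

Omitting `Schedule` gives an on-demand crawler; event-driven runs are typically triggered externally (for example, starting the crawler from a Lambda function).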
Once your model is trained and tuned, SageMaker makes it easy to deploy in production so you can start generating predictions on new data (a process called inference). Amazon SageMaker deploys your model on an auto-scaling cluster of Amazon EC2 instances that are spread across multiple availability zones to deliver both high performance and high availability. It also includes built-in A/B testing capabilities to help you test your model and experiment with different versions to achieve the best results.
For maximum versatility, we designed Amazon SageMaker in three modules – Build, Train, and Deploy – that can be used together or independently as part of any existing ML workflow you might already have in place.
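The A/B testing mentioned above is expressed through production variants on an endpoint: each variant points at a model version and carries a traffic weight. A minimal sketch of the endpoint-config parameters (model names, config name, and instance type are hypothetical placeholders) might look like:

```python
# Sketch of a SageMaker endpoint config that splits traffic between two
# model versions (A/B test) via production variants. Model names and
# instance type are hypothetical placeholders.

def ab_endpoint_config(config_name, model_a, model_b, weight_a=9, weight_b=1):
    """Build create_endpoint_config() kwargs; traffic is split in
    proportion to the variant weights (here 90/10)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {"VariantName": "variant-a", "ModelName": model_a,
             "InitialInstanceCount": 1, "InstanceType": "ml.m5.large",
             "InitialVariantWeight": float(weight_a)},
            {"VariantName": "variant-b", "ModelName": model_b,
             "InitialInstanceCount": 1, "InstanceType": "ml.m5.large",
             "InitialVariantWeight": float(weight_b)},
        ],
    }

cfg = ab_endpoint_config("churn-ab-test", "churn-model-v1", "churn-model-v2")
weights = [v["InitialVariantWeight"] for v in cfg["ProductionVariants"]]
shares = [w / sum(weights) for w in weights]
print(shares)  # → [0.9, 0.1]

# With AWS credentials, the config would be created with:
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**cfg)
```

Because each variant's traffic share is its weight divided by the sum of all weights, shifting traffic during an experiment is just an update to the weights, with no redeployment of the models.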
The Philips HealthSuite digital platform analyzes and stores 15 PB of patient data gathered from 390 million imaging studies, medical records, and patient inputs to provide healthcare providers with actionable data, which they can use to directly impact patient care. Running on AWS provides the reliability, performance and scalability that Philips needs to help protect patient data as its global digital platform grows at the rate of one petabyte per month.
They plan to grow by 1 PB/month.
They loaded 35 million records into Redshift in 90 minutes, 870 times faster than their previous on-premises solution. The environment was up and running in less than a day.
AI/ML
Big Data
Data from 150 million people, 10 PB, 1,700 processing nodes.
HealtheIntent is a population health management platform that aggregates longitudinal healthcare data, enabling providers to manage populations and community health.
Healthy Data Lab
Cerner uses AWS and big data to gain actionable, real-time insights, simplifying healthcare delivery while reducing costs for payers, providers, and patients. Cerner, one of the leading suppliers of health information technology (HIT) solutions, chose AWS for its global reach and breadth of services, including machine learning and artificial intelligence.
Why Cerner chose AWS
Data from 150 million people, 10 PB, 1,700 processing nodes.