© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Your Data Lake on AWS
August 21, 10:30 a.m.
Melissa Ravanini
AWS Solutions Architect focused on Healthcare
ravanini@amazon.com
Generating value from data
Organizations that successfully generate value from data will outperform their competitors.
Organic revenue growth: leaders 24%, followers 15%
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
Challenges
• Data visibility: generated data vs. data available for analysis (chart, 1990 to 2020)
• Multiple consumers and requirements: analysts, applications, data scientists, business users
• Multiple access mechanisms: API access, BI tools, notebooks
AWS is the perfect solution for your data lake
AWS for Data Lakes
Storage, analytics, and security at scale for data sharing
• Fast data ingestion
• Centralized data
• Separate compute from storage
Celgene is a global biopharmaceutical company that develops drug therapies for cancer and inflammatory diseases:
“The speed is important, but equally important is the additional intellectual curiosity this enables for researchers. Even small gains in research staff productivity can have a significant impact on cost and time to market.”
Lance Smith
IT Director, Celgene
https://aws.amazon.com/pt/solutions/case-studies/celgene/
“We chose AWS for its leadership. They are pioneers, and today there is nothing better on the market. In addition, AWS hosts all the data from the 1000 Genomes project in its infrastructure, which makes access easier and reduces processing time.”
Dr. Pedro Galante
Researcher at IEP
https://aws.amazon.com/pt/solutions/case-studies/sirio-libanes/
Traditionally, analytics looked like this
Data sources (OLTP, ERP, CRM, LOB) feed a data warehouse, which serves business intelligence.
• Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
• Large initial CAPEX + $10K–$50K/TB/year
Data lakes extend the traditional model
Data sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social
Data warehouse: business intelligence
Data lake: big data processing, real-time, machine learning
• Relational and nonrelational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
AWS portfolio for data lakes, analytics, and IoT
The broadest and most complete set of tools
Machine learning: Managed ML Service, Deep Learning AMIs, Video and Image Recognition, Conversational Interfaces, Deep-Learning Video Camera, Natural Language Processing, Language Translation, Speech Recognition, Text-to-Speech
Analytics: Interactive Analysis, Hadoop & Spark, Data Warehousing, Full-text Search, Real-time Analytics, Dashboards & Visualizations
On-premises data movement: Dedicated Network Connection, Secure Appliances, Ruggedized Shipping Container, Database Migration
Real-time data movement: Connect Devices to AWS, Real-time Data Streams, Real-time Video Streams
Data Lake on AWS: Storage | Archival Storage | Data Catalog
AWS portfolio for data lakes, analytics, and IoT
The broadest and most complete set of tools
Machine learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly
Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight
On-premises data movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service
Real-time data movement: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
Data Lake on AWS: Storage | Archival Storage | Data Catalog
Reference architecture
[Architecture diagram; labeled components: Athena, Glue]
Because data is NEVER perfect
• Clean
• Transform
• Concatenate
• Convert to better formats
• Schedule transformations
• Event-driven transformations
• Transformations expressed as code
Amazon EMR: Spark and Hive running on EMR
AWS Glue: event-based serverless ETL engine
AWS Lambda: trigger-based code execution
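The event-driven, transformations-as-code idea on this slide can be sketched in Python. This is a minimal illustration, assuming the trigger is an S3 "object created" notification delivered to a Lambda-style function. The names `handler` and `clean_row` and the sample column names are hypothetical, and reading or writing the object itself is deliberately left out.

```python
from urllib.parse import unquote_plus


def clean_row(row):
    """Clean one record: trim whitespace, lowercase keys, drop empty values."""
    cleaned = {}
    for key, value in row.items():
        value = value.strip() if isinstance(value, str) else value
        if value not in ("", None):
            cleaned[key.strip().lower()] = value
    return cleaned


def handler(event, context=None):
    """Lambda-style entry point: list the objects named in an S3 notification.

    S3 notifications URL-encode object keys, so they are decoded here.
    """
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append((bucket, key))
    return targets
```

In a real function, each `(bucket, key)` pair would be read, every record passed through something like `clean_row`, and the result written back to the data lake in a better format.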
AWS Glue
• Serverless ETL
• Scales automatically
• Uses crawlers to automatically discover your data
• Metadata (table definitions and schema) stored in a centralized Data Catalog
• Automatically generates code to extract, transform, and load, in Scala or Python written for Apache Spark. Use your IDE or notebooks to develop and debug. Share code through GitHub
• Start multiple jobs in parallel or with dependencies. Start by schedule, on demand, or event-based
• Pay only when the service runs and for metadata stored. Free tier on the storage layer.
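To make the crawler idea concrete, here is a toy sketch of schema discovery: it reads sample CSV data and produces a catalog-style table definition. It is not the AWS Glue API and does not reproduce how Glue classifiers work. The function names, the dictionary layout, and the sample columns are all assumptions made for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest Hive-style type name that fits every non-empty value."""
    non_empty = [value for value in values if value not in ("", None)]
    if not non_empty:
        return "string"
    for type_name, cast in (("bigint", int), ("double", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def crawl_csv(text, table_name):
    """Build a catalog-style table definition from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = [
        {"Name": name, "Type": infer_type([row[name] for row in rows])}
        for name in (reader.fieldnames or [])
    ]
    return {"Name": table_name, "Columns": columns}
```

Keeping the discovered schema in one central place is what lets several engines query the same files without each redefining the table.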
How do you extract value?
[AWS portfolio diagram repeated: Data Lake on AWS (Storage | Archival Storage | Data Catalog) with the analytics, machine learning, on-premises data movement, and real-time data movement services]
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Schema-on-read
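Schema-on-read means the data is stored as-is and a schema is applied only when it is queried. The sketch below illustrates the idea in plain Python under stated assumptions: `RAW_OBJECT` stands in for a file in the data lake, and `SCHEMA`, `read_with_schema`, and `average` are hypothetical names. It does not call Athena.

```python
import csv
import io
from datetime import date

# Raw data is stored untouched; nothing is validated at write time.
RAW_OBJECT = "2019-08-21,ward-a,37.2\n2019-08-21,ward-b,n/a\n"

# The schema lives outside the data and is applied only when reading.
SCHEMA = [("day", date.fromisoformat), ("ward", str), ("temp_c", float)]


def read_with_schema(raw_text, schema):
    """Yield one dict per line, casting fields at read time; bad values become None."""
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (name, cast), value in zip(schema, fields):
            try:
                row[name] = cast(value)
            except ValueError:
                row[name] = None
        yield row


def average(rows, column):
    """Mean of a numeric column, ignoring missing values."""
    values = [row[column] for row in rows if row[column] is not None]
    return sum(values) / len(values) if values else None
```

Because the schema is separate from the stored bytes, it can be changed later (for example, reading `temp_c` as text) without rewriting the data.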
Exploring the data with Amazon Athena
[Architecture diagram. Data sources: on-premises data, web app data, Amazon RDS, other databases, streaming data. Also shown: Amazon QuickSight, Amazon SageMaker]
Hadoop/Spark Analytics
• Distributed processing
• Diverse analytics
  • Batch/Script (Hive/Pig)
  • Interactive (Spark, Presto)
  • Real-time (Spark)
  • Machine Learning (Spark)
  • NoSQL (HBase)
• For many use cases
  • Log and clickstream analysis
  • Machine learning
  • Real-time analytics
  • Large-scale analytics
  • Genomics
  • ETL
[Diagram: Batch, Script, Interactive, Real-time, Machine learning, and NoSQL engines on YARN (Hadoop Resource Manager), over the Data Lake on AWS]
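As a rough illustration of the distributed-processing model behind the log analysis use case, the sketch below runs the map, shuffle, and reduce steps in a single Python process. On a cluster, the partitions would live on different nodes and an engine such as Spark or Hive would do this work. The log format (status code as the last field) and all function names are assumptions.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Emit (status_code, 1) for each access-log line; the status is the last field."""
    for line in log_lines:
        parts = line.split()
        if parts:
            yield parts[-1], 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(partitions):
    """Run map on each partition independently, then shuffle and reduce."""
    pairs = []
    for partition in partitions:
        pairs.extend(map_phase(partition))
    return reduce_phase(shuffle_phase(pairs))
```

Each partition is mapped without looking at the others, which is the property that lets the work spread across many nodes.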
Hadoop/Spark Analytics on AWS
Amazon EMR: Managed Hadoop/Spark
Amazon S3: Object Storage
[Diagram: Batch, Script, Interactive, Real-time, Machine learning, and NoSQL engines on YARN (Hadoop Resource Manager), over the Data Lake on AWS]
Reprocess data with Amazon EMR (Spark)
[Architecture diagram. Data sources: on-premises data, web app data, Amazon RDS, other databases, streaming data. Also shown: Amazon QuickSight, Amazon SageMaker]
Machine Learning on Your Data Lake
[AWS portfolio diagram repeated: Data Lake on AWS (Storage | Archival Storage | Data Catalog) with the analytics, machine learning, on-premises data movement, and real-time data movement services]
The Amazon Machine Learning stack
AI SERVICES
Categories: Vision, Speech, Language, Chatbots, Forecasting, Recommendations
Services: Amazon Rekognition Image, Amazon Rekognition Video, Amazon Textract, Amazon Polly, Amazon Transcribe, Amazon Translate, Amazon Comprehend, Amazon Lex, Amazon Forecast (Forecasting), Amazon Personalize (Recommendations)
ML SERVICES
Amazon SageMaker: Ground Truth, Notebooks, Algorithms, Marketplace, Unsupervised Learning, Supervised Learning, Reinforcement Learning, Optimization (Neo), Training, Hosting, Deployment
ML FRAMEWORKS & INFRASTRUCTURE
Frameworks, Interfaces, Infrastructure
Amazon EC2 P3 & P3DN, Amazon EC2 C5, FPGAs, AWS Greengrass, Amazon Elastic Inference, Amazon Inferentia
Amazon SageMaker
BUILD: Pre-built notebooks for common problems; built-in, high-performance algorithms
TRAIN: One-click training; hyperparameter optimization
DEPLOY: One-click deployment; fully managed hosting with auto-scaling
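For readers new to hyperparameter optimization, the sketch below shows the basic loop: train with several parameter combinations and keep the best-scoring one. Exhaustive grid search is used here only because it is easy to follow. It is not a description of how SageMaker tunes models, and `grid_search`, `toy_objective`, and the parameter names are hypothetical.

```python
from itertools import product


def grid_search(train_and_score, grid):
    """Try every combination in the grid and return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(learning_rate, depth):
    """Stand-in for a training job; peaks at learning_rate=0.1, depth=4."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 4) ** 2
```

In practice, `train_and_score` would launch a real training job and return a validation metric, so each call is expensive and smarter search strategies matter.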
Machine learning with Amazon SageMaker
[Architecture diagram. Data sources: on-premises data, web app data, Amazon RDS, other databases, streaming data. Also shown: Amazon QuickSight, Amazon SageMaker]
HealthSuite
https://aws.amazon.com/pt/solutions/case-studies/philips-redshift/
https://aws.amazon.com/solutions/case-studies/philips/?trk=hcls_case-studies_card
“The pace of change in healthcare is moving extremely fast. AWS is going to help us achieve the scale that we need to be able to deliver innovations to our clients.”
David Cohen
VP of the Intelligence Department
HealtheIntent
https://aws.amazon.com/solutions/case-studies/Cerner/
• Data on more than 150 million people
• More than 10 PB
• More than 1,700 processing nodes
André Almeida
andre.almeida@numb3rs.com.br
Complete the satisfaction survey by August 28 and receive US$30 to use on any AWS service
https://amazonmr.au1.qualtrics.com/jfe/form/SV_eJvEaZL7EZYyssd
Thank you!
Melissa Ravanini
ravanini@amazon.com
André Almeida
Numb3rs Analytics
andre.almeida@numb3rs.com.br

Data Lake in Healthcare - AWS

Editor's Notes

  • #3 An Aberdeen survey found that organizations that implement data lakes outperform similar companies by 9% in organic revenue growth.
  • #5
    1. Agility: Fail fast, and try more before going all-in with a Big Data solution.
    2. Broadest & Deepest Capabilities: Build or support virtually any Big Data workload regardless of volume, velocity, and variety of data. AWS offers deep and rapidly expanding functionality for data warehousing, distributed analytics (supporting Hadoop, Spark, HBase, Hive, Pig, and YARN), Machine Learning, and Business Intelligence.
    3. Computational Power Second to None: AWS offers twice as many as any other cloud provider; each EC2 instance is optimized for CPU, memory, storage, and networking capacity to satisfy the computational requirements of any big data use case.
    4. Low-Cost Analytics: Now petabyte-scale analytics are affordable for everyone. Big Data storage is as low as $28.16/TB; data archiving as low as $0.007/GB/month; data warehousing and BI is 1/10th the cost of traditional enterprise software solutions; real-time streaming data loads for only $0.35/GB; managed Hadoop, Spark, and Presto clusters for as little as $0.15 per hour.
    5. Trusted & Secure: AWS environments are continuously audited and certified for compliance with 20+ standards: HIPAA, FedRAMP, CESG, and more. AWS offers efficient and scalable encryption for data at rest and in transit, Key Management Services, and Cloud HSM so customers always have 100% control over which country their data resides in.
    6. Data Migrations Made Easy: AWS makes data migration fast, low cost, secure, and easy with Amazon S3 Transfer Acceleration (a simple web API that you can call directly to load data and improve data upload speeds by 300%), Amazon Kinesis Firehose for streaming data, AWS Snowball Import/Export appliances (100TB data migrations in 1 day vs. 100s of days), Direct Connect (a dedicated private network connection for low-latency connectivity to the cloud), and AWS Database Migration Service to ensure zero downtime.
    7. Largest Partner Ecosystem: AWS offers ISVs and integrators across the data management stack, and a catalog of 290 AWS Marketplace products pre-integrated with the AWS Cloud. [...] do not have to guess capacity needs and can support high velocity use cases.
  • #6 Pharmaceutical. In order to help customers with their data management strategy, we’ve developed solutions to help store, protect, and optimize healthcare data. In doing so, your data can be centralized and downstream use cases are unlocked. Celgene, for example, deployed their data lake on AWS to drive analytics across their global business units. [If this resonates with customer, refer to SPO industry solution as followup]
  • #7 Open Data -> TCGA, ICGC. “Each individual has about 3 GB of DNA information, which demands a lot of processing capacity to find small differences within that enormous amount of information.” In oncology, for example, requests to sequence part of a patient's DNA are no longer rare, with the goal of understanding the origin of certain tumors or predicting the response to a given drug. When this work is done, the information can, with the patient's consent, be made available in databases open to the entire medical community for reference and benchmarking.
  • #15 You can run crawlers on a schedule, on-demand, or trigger them based on an event to ensure that your metadata is up-to-date.
  • #26 Once your model is trained and tuned, SageMaker makes it easy to deploy in production so you can start generating predictions on new data (a process called inference). Amazon SageMaker deploys your model on an auto-scaling cluster of Amazon EC2 instances that are spread across multiple availability zones to deliver both high performance and high availability. It also includes built-in A/B testing capabilities to help you test your model and experiment with different versions to achieve the best results.   For maximum versatility, we designed Amazon SageMaker in three modules – Build, Train, and Deploy – that can be used together or independently as part of any existing ML workflow you might already have in place.
  • #29 The Philips HealthSuite digital platform analyzes and stores 15 PB of patient data gathered from 390 million imaging studies, medical records, and patient inputs to provide healthcare providers with actionable data, which they can use to directly impact patient care. Running on AWS provides the reliability, performance and scalability that Philips needs to help protect patient data as its global digital platform grows at the rate of one petabyte per month.
  • #30 They expect to grow by 1 PB per month. They loaded 35 million records into Redshift in 90 minutes, 870 times better than the performance of the previous on-premises solution. They had the environment set up in less than one day.
  • #31 AI/ML, big data. Data on 150 million people, 10 PB, 1,700 processing nodes. HealtheIntent is a population health management platform that aggregates longitudinal healthcare data and lets providers manage populations and community health. Healthy Data Lab. Cerner uses AWS and big data to gain actionable, real-time insights, simplifying healthcare delivery while reducing costs for payers, providers, and patients. Cerner, one of the leading suppliers of health information technology (HIT) solutions, chose AWS for its global reach and breadth of services, including machine learning and artificial intelligence.
  • #32 Why Cerner chose AWS: data on 150 million people, 10 PB, 1,700 processing nodes.