Data Lakes
a practical view
Marco Garcia
CTO, Founder – Cetax, TutorPro
mgarcia@cetax.com.br
https://www.linkedin.com/in/mgarciacetax/
With more than 20 years of IT experience, 18 of them devoted exclusively to Business
Intelligence, Data Warehouse and Big Data, Marco Garcia is certified by the Kimball University
in the USA, where he was taught in person by Ralph Kimball, one of the leading Data Warehouse gurus.
First Hortonworks Certified Instructor in LATAM
Data Architect and Instructor at Cetax Consultoria.
Introduction
Data Lake?
What is intelligence?
1. The ability to learn or understand or to deal with new or trying situations: reason; also: the skilled use of reason.
2. The ability to apply knowledge to manipulate one's environment or to think abstractly as measured by objective criteria (as tests).
Data Lake?
First mention of the term "Data Lake": October 2010
Data Warehouse vs. Data Lake
https://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
Bottled water:
- Clean
- Treated
- Packaged
- Ready to drink
Data lake:
- Raw
- Untreated
- Must be processed before it can be consumed
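The analogy above is often described as "schema-on-read": the lake stores data exactly as it arrives, and types and validation are applied only when someone reads it. The sketch below illustrates that idea with the Python standard library; the CSV content and column names are hypothetical.

```python
import csv
import io

# Hypothetical raw file as it might land in the lake: stored untouched,
# with no types enforced at write time.
RAW_CSV = """order_id,amount,country
1,19.90,BR
2,not_a_number,US
3,5.00,BR
"""

def read_with_schema(raw_text, schema):
    """Apply a schema only when reading (schema-on-read).

    Rows whose fields cannot be cast are set aside instead of
    having been rejected at load time.
    """
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            good.append({col: cast(row[col]) for col, cast in schema.items()})
        except ValueError:
            bad.append(row)
    return good, bad

schema = {"order_id": int, "amount": float, "country": str}
rows, rejected = read_with_schema(RAW_CSV, schema)
print(len(rows), len(rejected))  # 2 1
```

A warehouse would typically reject the second row at load time (schema-on-write); here it is kept in raw form and only filtered out by this particular reader.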
“Data is the new oil”
In 2012 …
Like oil, data needs to be refined!
DATA IS THE NEW OIL!
DATA FOR BIG DATA
DATA BY VALIDITY PERIOD FOR BIG DATA
TOOLS FOR BIG DATA
A COMPLETE ARCHITECTURE FOR BIG DATA? Hadoop!
Hadoop
What is Apache Hadoop?
The Apache Hadoop project describes the technology as a software framework that:
- Allows for the distributed processing of large data sets across clusters of computers using simple programming models
- Is designed to scale up from single servers to thousands of machines, each offering local computation and storage
- Does not rely on hardware to deliver high-availability, but rather the library itself is designed to detect and handle failures at the application layer
- Delivers a highly-available service on top of a cluster of computers, each of which may be prone to failures
Source: http://hadoop.apache.org
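The classic "simple programming model" on Hadoop is MapReduce. The sketch below runs the map, shuffle and reduce phases of a word count inside a single Python process, only to show how data flows between the phases. It does not use any Hadoop API; on a real cluster each phase runs in parallel across worker nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: combine all values for one key.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_count(["data lake", "Data warehouse", "lake"]))
# {'data': 2, 'lake': 2, 'warehouse': 1}
```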
Hadoop Core = Storage + Compute
- Storage: Hadoop Distributed File System (HDFS)
- Compute (CPU, RAM): Yet Another Resource Negotiator (YARN)
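On the storage side, HDFS splits each file into fixed-size blocks and keeps several replicas of every block on different DataNodes. The sketch below uses the HDFS defaults (128 MB `dfs.blocksize`, `dfs.replication` of 3) but a simplified round-robin placement; real HDFS uses a rack-aware placement policy, and the worker names are hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default dfs.blocksize (Hadoop 2.x and later)
REPLICATION = 3                 # HDFS default dfs.replication

def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` distinct DataNodes.

    Simplified round-robin placement for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n_blocks = max(1, math.ceil(file_size / block_size))
    plan = []
    for i in range(n_blocks):
        size = min(block_size, file_size - i * block_size)  # last block may be smaller
        nodes = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, size, nodes))
    return plan

workers = [f"worker{i}" for i in range(1, 5)]
for block in plan_blocks(300 * 1024 * 1024, workers):
    print(block)
```

A 300 MB file becomes three blocks (128 MB, 128 MB, 44 MB), and with three replicas it consumes about 900 MB of raw disk across the cluster.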
Hadoop Distribution
Distinct Masters and Scale-Out Workers
- master node 1: ZooKeeper, NameNode
- master node 2: ZooKeeper, ResourceManager
- master node 3: ZooKeeper, HiveServer2
- utility node 1: Client Gateway, Knox
- utility node 2: Client Gateway, Ambari Server
- worker nodes (12 shown): NodeManager and DataNode on each
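The layout places ZooKeeper on three master nodes. A ZooKeeper ensemble keeps serving only while a majority of its servers is up, which is why odd ensemble sizes are usual. The small calculation below shows how many failures a given ensemble size tolerates.

```python
def zk_quorum(ensemble_size):
    """Majority of servers a ZooKeeper ensemble needs to keep serving."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - zk_quorum(ensemble_size)

for n in (3, 4, 5):
    print(n, zk_quorum(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from three to four servers adds no fault tolerance, so the three-master layout above survives the loss of one ZooKeeper server.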
How would the Data Lake look on Hadoop?
Cluster services: YARN, KNOX, AMBARI; scale-out compute & storage nodes (NN, AppMaster, Streaming)
Source data: App/System Logs, Customer/Inventory Data, Transaction/Sales Data, Flat Files, Twitter/Facebook Streams
Step 1: Extract & Load (SQOOP, FLUME, NIFI, KAFKA; via DB, File, JMS, REST, HTTP, Streaming)
Step 2: Model/Apply Metadata (HCATALOG table metadata)
Step 3: Transform, Aggregate & Materialize (data processing with HIVE, PIG)
Step 4a: Publish/Exchange (LOAD with SQOOP/Hive, WebHDFS) to data sources: RDBMS, No/New SQL Store (Oracle, Hana); EDW (SAP BW)
Step 4b: Explore/Visualize (INTERACTIVE: HIVE Server; Query/Visualization/Reporting Tools: SAP BO, Tableau/Excel, any JDBC-compliant tool)
Step 4c: Analyze (ANALYTICAL: SAS, Python, R, MATLAB)
Manage Steps 1-3: Data Lifecycle with FALCON (data lifecycle)
Steps for the Data Lake
Step 1 - Extract and load
Step 2 - Model and apply metadata
Step 3 - Transform, aggregate and materialize the data
Step 4a - Publish or send data
Step 4b - Explore and visualize
Step 4c - Analyze, do Data Science
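The steps can be sketched end to end in plain Python. This is only an illustration of how each step hands its output to the next; the sales records, the `CATALOG` dictionary (standing in for a table catalog such as HCatalog) and the function names are all hypothetical.

```python
from collections import defaultdict

# Step 1 - Extract & load: raw records are copied as-is (hypothetical sales lines).
def extract_and_load(source_lines):
    return list(source_lines)

# Step 2 - Model / apply metadata: column names and types registered for the raw data.
CATALOG = {"sales": [("store", str), ("amount", float)]}

def apply_metadata(table, raw_lines):
    columns = CATALOG[table]
    rows = []
    for line in raw_lines:
        values = line.split(";")
        rows.append({name: cast(v) for (name, cast), v in zip(columns, values)})
    return rows

# Step 3 - Transform, aggregate & materialize: total sales per store.
def aggregate(rows):
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)

# Step 4 - Publish / explore / analyze: here just a report sorted by total.
def publish(totals):
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

raw = extract_and_load(["SP;100.0", "RJ;40.5", "SP;20.0"])
report = publish(aggregate(apply_metadata("sales", raw)))
print(report)  # [('SP', 120.0), ('RJ', 40.5)]
```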
How to Structure and Build
the Data Lake
Key Points
- Align the Data Lake with the organizational structure
- Create zones in the Data Lake (ingest zone, transformation zone, presentation zone)
- Data ingestion processes
- Security
- Data lineage
- Understand the requirements
- Integrations will be needed!
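One way to make the zones concrete is a fixed directory convention in HDFS. The sketch below builds such paths; the zone names come from the list above, but the `/data/<zone>/<source>/<dataset>` layout and the Hive-style `ingest_date=` partition directory are a hypothetical convention, not one prescribed by the presentation.

```python
from datetime import date

# Zone names from the slide; the path layout itself is a hypothetical convention.
ZONES = ("ingest", "transformation", "presentation")

def zone_path(zone, source, dataset, ingest_date, root="/data"):
    """Build a partitioned, HDFS-style path for a dataset inside a zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return f"{root}/{zone}/{source}/{dataset}/ingest_date={ingest_date.isoformat()}"

print(zone_path("ingest", "erp", "orders", date(2018, 5, 3)))
# /data/ingest/erp/orders/ingest_date=2018-05-03
```

Keeping one predictable layout per zone makes security rules (who may read which zone) and retention jobs much easier to express.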
Logical Structure of the Organization
- Align the structure by function rather than by department or team: organizations change, but functions usually stay similar.
- Think of it as a long-term investment.
- Always stay aware of regulations and of internal, or even external, controls.
- Think of the Data Lake in layers.
What to store? EVERYTHING!
Landing zone
HDFS layer
- Data is written into the landing zone
  - SQOOP, HDF, Flume, …
  - RAW format
- Security
  - Contains PII information
  - Landing zone uses HDFS TDE for data protection
  - Only ETL tools access this layer
  - Access by data wrangler only
  - Data retention is limited (< 1 month)
Diagram: RDBMS, Landing, SQOOP, NiFi
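The limited retention of the landing zone can be enforced by a scheduled job that finds partitions older than the window. The sketch below only computes which date partitions are due for deletion; it assumes date-partitioned data and reads "< 1 month" as 30 days. The actual removal (for example with `hdfs dfs -rm -r`) is left out.

```python
from datetime import date, timedelta

def expired_partitions(partition_dates, today, retention_days=30):
    """Return landing-zone partitions older than the retention window.

    retention_days=30 is an assumed reading of the slide's "< 1 month".
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)

today = date(2018, 6, 15)
partitions = [date(2018, 5, 1), date(2018, 5, 16), date(2018, 6, 14)]
print(expired_partitions(partitions, today))
# [datetime.date(2018, 5, 1)]
```

The partition dated exactly 30 days ago (2018-05-16) is kept; only data strictly older than the cutoff is flagged.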
Archival layer
HDFS layer
- Data is compressed in large files
  - Hadoop archive (HAR)
  - Solves the small file problem
- Data is automatically removed
  - Retention policy managed via Falcon
- Security
  - Archive zone uses HDFS TDE for data protection
  - Only a limited set of users can access it
- HDFS tiering
Diagram: Landing, Archive
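The small file problem comes from the NameNode having to track every file and block in memory, so many tiny files cost far more than a few large ones. Hadoop archives address this with the `hadoop archive` tool; the sketch below does not produce a HAR and is not its algorithm. It only illustrates the underlying idea of grouping many small files into bundles of a target size, using a first-fit-decreasing heuristic on hypothetical file sizes.

```python
def pack_small_files(files, target_bytes):
    """Group (name, size) pairs into bundles of at most target_bytes.

    First-fit-decreasing heuristic; a file larger than the target gets its own bundle.
    """
    bundles = []  # each bundle: [total_size, [file names]]
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        for bundle in bundles:
            if bundle[0] + size <= target_bytes:
                bundle[0] += size
                bundle[1].append(name)
                break
        else:
            bundles.append([size, [name]])
    return [(total, names) for total, names in bundles]

files = [("a", 60), ("b", 50), ("c", 40), ("d", 30), ("e", 10)]
print(pack_small_files(files, target_bytes=100))
# [(100, ['a', 'c']), (90, ['b', 'd', 'e'])]
```

Five objects become two, which is the kind of reduction in NameNode bookkeeping that archiving aims for.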
Presentation layer
HDFS layer
- Data is moving from Landing to Speed
  - Data is cleaned as part of ETL
  - Optimized file format: ORC, Parquet, Avro, …
- Multiple copies of the same dataset depending on use cases
  - RAW data stored in optimized file format
  - Tokenised, normalised, datamarts, ...
- Security
  - Sensitive data is tokenised
  - Business users access this layer
Diagram: Landing, Archive, Presentation
Development and test layer
Multi-tenant environment
- Third-party tools move data from landing into the dev & test zone
  - PII information is encrypted using a third-party solution
  - One-way tokenisation
  - Data is consistently tokenised
  - Enables joins between different datasets
- Benefits
  - Development is done against a realistic dataset (volume & format)
  - Gives the data scientist team access
Diagram: Landing, Dev / Test / …
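"One-way" plus "consistent" tokenisation means the original value cannot be recovered, yet the same input always yields the same token, so joins still work. The deck relies on a third-party product for this; a keyed hash (HMAC-SHA256 from the Python standard library) is one common way to get the same two properties and is shown below as a sketch. The key and the records are hypothetical; in practice the key would be kept in a key management system.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management system.
SECRET_KEY = b"replace-with-a-managed-secret"

def tokenise(value: str, key: bytes = SECRET_KEY) -> str:
    """One-way, deterministic token: same input and key -> same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

customers = [{"email": "ana@example.com", "segment": "retail"}]
orders = [{"email": "ana@example.com", "amount": 50.0}]

# Tokenise the PII column in each dataset independently...
tok_customers = [{**c, "email": tokenise(c["email"])} for c in customers]
tok_orders = [{**o, "email": tokenise(o["email"])} for o in orders]

# ...and the join key still matches, because the tokenisation is consistent.
joined = [
    {**c, **o}
    for c in tok_customers
    for o in tok_orders
    if c["email"] == o["email"]
]
print(len(joined))  # 1
```

Using a secret key, rather than a plain hash, prevents anyone without the key from re-creating tokens for guessed values.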
Data exploration layer
Multi-tenant environment
- Data
  - Accessed from the presentation layer
- Benefits
  - Gives data scientist teams access to a version of production data
  - Allows the data science team to acquire ad-hoc external datasets
Diagram: Landing, Dev / Test / …, Data exploration
Production layer
Multi-tenant environment
- Third-party tools move data from landing into the dev & test zone
  - PII information is encrypted using a third-party solution
  - Reversible tokenisation
  - Data is consistently tokenised
  - Enables joins between different datasets
Diagram: Landing, Dev / Test / …, Prod, Data exploration
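Reversible tokenisation differs from the one-way variant in that an authorised component can map a token back to the original value. A common design for this is a token vault, sketched below with an in-memory dictionary; a real deployment (for example the third-party solution the deck mentions) would use a secured, persistent store, so the `TokenVault` class here is purely illustrative.

```python
import secrets

class TokenVault:
    """Reversible tokenisation backed by a lookup table (a 'token vault').

    Tokens are random, so they reveal nothing about the value; the same value
    always maps to the same token, which keeps joins across datasets working.
    Only whoever holds the vault can reverse a token.
    """

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def tokenise(self, value: str) -> str:
        token = self._by_value.get(value)
        if token is None:
            token = secrets.token_hex(16)
            while token in self._by_token:  # guard against an unlikely collision
                token = secrets.token_hex(16)
            self._by_value[value] = token
            self._by_token[token] = value
        return token

    def detokenise(self, token: str) -> str:
        return self._by_token[token]

vault = TokenVault()
t1 = vault.tokenise("ana@example.com")
t2 = vault.tokenise("ana@example.com")
print(t1 == t2, vault.detokenise(t1))  # True ana@example.com
```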
Best practices
- Create a catalogue of datasets in Atlas
  - Data owner
  - Source system
  - Project using it
- Keep multiple copies of the same data
  - Raw
  - Optimized
  - Tokenized
- Disaster Recovery
  - Dev / Test / Data Exploration run on the DR cluster
  - Define workload priorities
- Create dataset structures based upon projects
  - Datasets will be reused across projects
- No write access to business users
Do's / Don'ts
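The catalogue fields listed above (owner, source system, projects using the dataset) and the "multiple copies" practice can be captured in one record per dataset. The sketch below models such a record in plain Python; it does not use Apache Atlas or its API, and the class, field names and paths are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """Minimal catalogue record with the fields the slide asks for (hypothetical model)."""
    name: str
    data_owner: str
    source_system: str
    projects: list = field(default_factory=list)
    copies: dict = field(default_factory=dict)  # e.g. {"raw": path, "tokenized": path}

catalogue = {}

def register(entry: DatasetEntry):
    if entry.name in catalogue:
        raise ValueError(f"dataset already registered: {entry.name}")
    catalogue[entry.name] = entry

def datasets_used_by(project):
    return sorted(name for name, e in catalogue.items() if project in e.projects)

register(DatasetEntry("orders", "sales-team", "ERP", ["churn", "forecast"],
                      {"raw": "/data/raw/erp/orders", "tokenized": "/data/tok/erp/orders"}))
register(DatasetEntry("customers", "crm-team", "CRM", ["churn"]))
print(datasets_used_by("churn"))  # ['customers', 'orders']
```

Because one dataset entry lists several projects, the same data is registered once and reused, rather than being copied into a separate structure per project.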
Thank you!
Visit us:
www.cetax.com.br
We're hiring!

The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Data Lakes, a practical view: structuring and building

  • 1. Data Lakes: a practical view. Marco Garcia, CTO and Founder, Cetax, TutorPro. mgarcia@cetax.com.br https://www.linkedin.com/in/mgarciacetax/
  • 2. About the presenter: with more than 20 years of experience in IT, 18 of them exclusively in Business Intelligence, Data Warehouse and Big Data, Marco Garcia is certified by the Kimball University in the USA, where he was taught in person by Ralph Kimball, one of the leading Data Warehouse gurus. First Hortonworks Certified Instructor in LATAM. Data Architect and Instructor at Cetax Consultoria.
  • 5. What is intelligence? "The ability to learn or understand or to deal with new or trying situations: reason; also: the skilled use of reason. The ability to apply knowledge to manipulate one's environment or to think abstractly as measured by objective criteria (as tests)." Data Lake?
  • 7. Data Warehouse vs. Data Lake (https://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html). Bottled water: clean, treated, packaged, ready for consumption. Data lake: raw, untreated, needs to be worked on before it can be consumed.
  • 8. “Data is the new oil.” In 2012 a … Like oil, data needs to be refined! DATA IS THE NEW OIL!
  • 10. DATA BY SHELF LIFE (VALIDITY) FOR BIG DATA
  • 12. A COMPLETE ARCHITECTURE FOR BIG DATA? Hadoop!
  • 13. What is Apache Hadoop? The Apache Hadoop project describes the technology as a software framework that: allows for the distributed processing of large data sets across clusters of computers using simple programming models; is designed to scale up from single servers to thousands of machines, each offering local computation and storage; does not rely on hardware to deliver high-availability, but rather the library itself is designed to detect and handle failures at the application layer; delivers a highly-available service on top of a cluster of computers, each of which may be prone to failures. Source: http://hadoop.apache.org
  • 14. Hadoop core = storage + compute: the Hadoop Distributed File System (HDFS) provides storage, and Yet Another Resource Negotiator (YARN) manages compute (CPU and RAM) across the nodes.
  • 16. Distinct masters and scale-out workers. Example cluster: master node 1 (ZooKeeper, NameNode), master node 2 (ZooKeeper, ResourceManager), master node 3 (ZooKeeper, HiveServer2); utility node 1 (client gateway, Knox), utility node 2 (client gateway, Ambari Server); 12 worker nodes, each running NodeManager and DataNode.
  • 17. What would a data lake on Hadoop look like?
  • 18. Reference data lake architecture on Hadoop (compute & storage nodes managed by YARN, with Knox and Ambari). Step 1: Extract & Load. Source data (app/system logs, customer/inventory data, transaction/sales data, flat files, Twitter/Facebook streams) arrives via DB, file, JMS, REST, HTTP and streaming interfaces and is loaded with Sqoop, Flume, NiFi and Kafka. Step 2: Model/Apply Metadata with HCatalog (table metadata). Step 3: Transform, Aggregate & Materialize with Hive and Pig. Step 4a: Publish/Exchange to systems such as RDBMS, NoSQL/NewSQL stores (Oracle, HANA) and the EDW (SAP BW), via Sqoop/Hive and WebHDFS. Step 4b: Explore/Visualize through the interactive Hive Server with query, visualization and reporting tools (SAP BO, Tableau/Excel, any JDBC-compliant tool). Step 4c: Analyze with analytical tools (SAS, Python, R, Matlab). Falcon (data lifecycle) manages steps 1 to 3.
  • 19. Steps for the data lake: Step 1: extract and load. Step 2: model and apply metadata. Step 3: transform, aggregate and materialize the data. Step 4a: publish or send data. Step 4b: explore and visualize. Step 4c: analyze and do data science.
  • 20. How to structure and build the data lake
  • 21. Key points: align the data lake with the organizational structure; create areas (zones) in the data lake (ingest zone, transformation zone, presentation zone); data ingestion processes; security; data lineage; understand the requirements; integrations will be needed!
  • 22. Logical structure of the organization: align the structure by function, not by department or team; organizations change, but functions almost always stay similar. Think of it as a long-term investment. Always pay attention to internal, or even external, regulations and controls. Think of the data lake in layers.
  • 24. HDFS layer: landing zone. Data is written into the landing zone in RAW format by Sqoop, HDF, Flume, … (for example, from an RDBMS via Sqoop or NiFi). Security: the zone contains PII, so it uses HDFS TDE (transparent data encryption) for data protection; only ETL tools access this layer, and human access is limited to data wranglers. Data retention is limited (< 1 month).
  • 25. HDFS layer: archival layer (landing, archive). Data is packed into large files with Hadoop Archives (HAR), which solves the small-file problem. Data is removed automatically, with the retention policy managed via Falcon. Security: the archive zone uses HDFS TDE for data protection, and only a limited set of users can access it. Storage: HDFS tiering.
  • 26. HDFS layer: presentation layer (landing, archive, presentation). Data moves from the landing zone into this layer and is cleaned as part of ETL, then stored in optimized file formats (ORC, Parquet, Avro, …). There are multiple copies of the same dataset depending on the use case: raw data stored in an optimized file format, plus derived versions (tokenised, normalised, data marts, …). Security: sensitive data is tokenised; business users access this layer.
  • 27. Multi-tenant environment: development and test layer (landing, dev/test/…). Third-party tools move data from the landing zone into the dev and test zone. PII is encrypted using a third-party solution with one-way tokenisation; data is tokenised consistently, which enables joins between different datasets. Benefits: development is done against a realistic dataset (volume and format), and the data scientist team can be given access.
  • 28. Multi-tenant environment: data exploration layer (landing, dev/test/…, data exploration). Data is accessed from the presentation layer. Benefits: data scientist teams get access to a version of production data and can acquire ad hoc external datasets.
  • 29. Multi-tenant environment: production layer (landing, dev/test/…, prod, data exploration). Third-party tools move data from the landing zone into the dev and test zone. PII is encrypted using a third-party solution with reversible tokenisation; data is tokenised consistently, which enables joins between different datasets.
  • 30. Best practices. Do's: create a catalogue of datasets in Atlas (data owner, source system, projects using it); keep multiple copies of the same data (raw, optimized, tokenized); plan for disaster recovery, running dev/test/data exploration on the DR cluster and defining workload priorities; give business users no write access. Don'ts: create dataset structures based on projects, since datasets will be reused across projects.
  • 31. Thank you! Visit us: www.cetax.com.br. We're hiring!
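
Slides 24 and 25 give the landing zone a short retention window (< 1 month) and leave archive expiry to a separate policy managed via Falcon. Below is a minimal Python sketch of that kind of age-based expiry check. The zone names, the /data/<zone>/<source>/<YYYY-MM-DD> path layout, the 30-day value and the function names are all assumptions for illustration; they are not part of the original architecture or of any Hadoop or Falcon API.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

# Hypothetical per-zone retention windows in days. The deck only says the
# landing zone keeps data for "< 1 month"; 30 days is an assumed value.
# None means "no automatic expiry here" (archive expiry is a separate policy).
RETENTION_DAYS: Dict[str, Optional[int]] = {
    "landing": 30,
    "archive": None,
    "presentation": None,
}


def partition_date(path: str) -> date:
    """Parse the trailing YYYY-MM-DD component of a hypothetical partition
    path such as /data/landing/crm/2024-05-01."""
    return date.fromisoformat(path.rstrip("/").rsplit("/", 1)[-1])


def expired_partitions(zone: str, paths: Iterable[str], today: date) -> List[str]:
    """Return the partitions of `zone` that fall outside its retention window."""
    days = RETENTION_DAYS[zone]
    if days is None:
        return []  # nothing expires automatically in this zone
    cutoff = today - timedelta(days=days)
    return [p for p in paths if partition_date(p) < cutoff]


if __name__ == "__main__":
    landing = ["/data/landing/crm/2024-04-01", "/data/landing/crm/2024-05-20"]
    # Only the April partition is older than 30 days on 2024-05-31.
    print(expired_partitions("landing", landing, date(2024, 5, 31)))
```

The check only lists candidates and deletes nothing; keeping it side-effect free makes it easy to review what would be removed before a separate, explicit step (for example `hdfs dfs -rm -r`) actually deletes it.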

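Slides 27 and 29 contrast one-way and reversible tokenisation, both applied consistently so that tokenised datasets can still be joined. The deck uses a third-party product for this; the Python sketch below swaps in two simple stand-ins, assumed purely for illustration: a keyed HMAC-SHA256 hash for one-way tokens and an in-memory token vault for reversible ones. `one_way_token`, `TokenVault`, the key and the sample records are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def one_way_token(value: str, key: bytes) -> str:
    """One-way token: HMAC-SHA256 of the value under a secret key.
    Deterministic, so the same value always maps to the same token and
    tokenised columns from different datasets can still be joined."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Toy reversible tokenisation: random tokens kept in an in-memory map.
    A real vault would be a hardened, access-controlled service."""

    def __init__(self) -> None:
        self._token_by_value: Dict[str, str] = {}
        self._value_by_token: Dict[str, str] = {}

    def tokenise(self, value: str) -> str:
        # Reuse any existing token so tokenisation stays consistent.
        if value not in self._token_by_value:
            token = secrets.token_hex(16)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenise(self, token: str) -> str:
        # Only possible because the vault keeps the value-to-token mapping.
        return self._value_by_token[token]


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical; use a managed secret in practice
    customers = [{"customer_id": "123", "segment": "retail"}]
    orders = [{"customer_id": "123", "total": 50.0}]
    # Tokenise the join key in both datasets, then join on the token.
    segment_by_token = {one_way_token(c["customer_id"], key): c["segment"] for c in customers}
    joined = [(segment_by_token[one_way_token(o["customer_id"], key)], o["total"]) for o in orders]
    print(joined)  # [('retail', 50.0)]
```

The design trade-off mirrors the slides: one-way tokens need no lookup store and cannot be reversed, which suits dev/test; reversible tokens allow authorised recovery of the original value in production, at the cost of protecting the vault itself.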
Editor's Notes

  1. Data may be the new oil: the new race companies will face to multiply their profits! Collecting, processing and analyzing data correctly can be a competitive advantage for any business. Of course, like oil, data also needs to be refined to deliver better results.
  2. This list is an example of possible sources, but we should expect many more. New tools can connect to and capture data from many categories of software, and even from electronic devices capable of capturing data, in addition, of course, to the traditional data we already pull today from other systems, databases and text files.
  3. Reference: http://voltdb.com/blog/big-data/big-data-value-continuum/
  4. Too many tools? Please stay calm, we will talk about this a bit later.
  5. Too many tools? Please stay calm, we will talk about this a bit later.
  6. This “wordy” slide comes straight from the project’s self-description and deserves some unpacking before we go much further. So what is Apache Hadoop? It is a scalable, fault-tolerant, open source framework for the distributed storage and processing of large data sets on commodity hardware. But what does all that mean? First of all, it is scalable: Hadoop clusters can range from as few as one machine to literally thousands of machines. That is scalability! It is also fault tolerant. Hadoop services become fault tolerant through redundancy. For example, the Hadoop Distributed File System, called HDFS, automatically replicates data blocks to three separate machines, assuming that your cluster has at least three machines in it. Many other Hadoop services are replicated, too, in order to avoid any single point of failure. Hadoop is also open source. Hadoop development is a community effort governed by the Apache Software Foundation and released under the Apache License. Anyone can help to improve Hadoop by adding features, fixing software bugs, or improving performance and scalability. Hadoop also uses distributed storage and processing. Large datasets are automatically split into smaller chunks, called blocks, and distributed across the cluster machines. Not only that, but each machine processes its local blocks of data. This means that processing is distributed too, potentially across hundreds of CPUs and hundreds of gigabytes of memory. All of this runs on commodity hardware, which reduces not only the original purchase price but potentially also support costs.
  7. At the most granular level, Hadoop is an engine that provides storage through HDFS and compute through YARN. The “ecosystem” tools wrap around this core.
  8. Hadoop is not a monolithic piece of software. It is a collection of architectural pillars that contain software frameworks. Most of the frameworks are part of the Apache software ecosystem. The picture illustrates the Apache frameworks that are part of the Hortonworks Hadoop distribution. So why does Hadoop have so many frameworks and tools? Because each tool is designed for a specific purpose. The functionality of some tools overlaps, but typically one tool is better than the others at certain tasks. For example, both Apache Storm and Apache Flume handle streaming data, but Flume focuses on ingestion, while Storm has more functionality and is more powerful for real-time data analysis.
  9. Here is an example cluster with three master nodes, 12 worker nodes, and two utility nodes. The cluster is running various services, like YARN and HDFS. Services can be implemented by one or more service components. The three master nodes are running service master components. The 12 worker nodes are running service worker components, sometimes called slave components. The two utility nodes are running service components that provide access, security, and management services for the cluster. This page does not show every service or every master and worker component. More detail is provided in other lessons.
  10. Break Glass?
  11. If data needs to be reprocessed, copy it from the archive back into the landing zone. HAR files are tracked by Atlas.
  12. ISO 27001: data and processing should be separated, which does not necessarily mean separate environments. Separate dev and test environments are used for upgrade and patch testing; they can be smaller, virtualised, …
  13. ISO 27001: data and processing should be separated, which does not necessarily mean separate environments.
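
Editor's note 6 explains that HDFS splits large datasets into blocks and replicates each block to three separate machines. The toy Python model below illustrates that idea under stated assumptions: a 128 MB block size and a replication factor of 3 (common defaults for `dfs.blocksize` and `dfs.replication`), and a simple round-robin placement standing in for the real, rack-aware HDFS policy. The function names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common dfs.blocksize default
REPLICATION = 3                 # the default dfs.replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last one may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy placement: block b goes to `replication` consecutive nodes
    starting at node b (round-robin), so its copies are on distinct machines.
    Real HDFS placement is rack-aware, which this model ignores."""
    # With fewer nodes than the replication factor, a block can only have as
    # many replicas as there are nodes (HDFS would report it as under-replicated).
    copies = min(replication, len(nodes))
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(copies)]
        for b in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

For a 300 MB file on four DataNodes, this yields two 128 MB blocks and one 44 MB block, placed on dn1/dn2/dn3, dn2/dn3/dn4 and dn3/dn4/dn1 respectively, so losing any single node still leaves two copies of every block.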