SlideShare a Scribd company logo
1 of 19
How-to implement a GDPR
solution in a Cloudera
architecture
Cloudera + StreamSets Data Collector
Introduction
Since the implementation of GDPR regulation, all data processors across the world have
been struggling to be GDPR compliant and also deal with the new reality in Big Data, that
data is constantly drifting and mutating.
In this presentation, the approach will be:
● Cloudera architecture
● No additional financial cost
● Masking & Encrypting
Architecture
This architecture enables the following:
● Discover Data - ETL
● Data Governance - Policies
● Protect - Anonymization
Pre-Assumptions
1. Cluster hostname: cm1.localdomain
2. StreamSets Data Collector Linux user: sdc Kerberos Principal: sdc/cm1.localdomain/@DOMAIN.COM
3. Cluster Cloudera Manager: Cloudera Manager 5.12.2
4. StreamSets Data Collector: Installed
5. Cluster Authentication Pre-Installed: Kerberos
a. Kerberos Realm DOMAIN.COM
6. StreamSets Data Collector version: 3.5.0
7. StreamSets Data Collector Authentication: Kerberos
Base of Knowledge
With GDPR regulation it’s difficult to find the right balance
between protecting data and utilizing it, but with the solution
on this presentation you will not only do the ETL on your data
lake as you will also protect all the sensitive data.
Knowing this, you must know your data, the purpose of it, as
also the architecture of your System Environment and
therefore choose one or more ways to handle it.
JDBC Connections
The JDBC connections on StreamSets Origins require additional drivers.
For Hive and Impala JDBC Origin connection you need the Cloudera drivers (HiveJDBC41.jar, ImpalaJDBC41.jar), as for a Oracle connection the
Ojdbc8.jar.
Base of Knowledge
JDBC Conn Security URL
Hive
Metadata
None jdbc:hive2://server:10000/db_name
Hive
Metadata
Kerberos
(static)
jdbc:hive2://server:10000/db_name;principal=hive/server@REALM.COM
Hive
(Origin)
Kerberos
(user)
jdbc:impala://server:10000;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=hive
Impala
(Origin)
None jdbc:impala://server:21050;authMech=0
Impala
(Origin)
Kerberos jdbc:impala://server:21050;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=impala
Oracle
(Origin)
Static jdbc:oracle:thin:@server:listerner_port:db
Use Case
Ingest and Mask Data from Local File System to Hive Metadata.
Data Description
Sample Data
Synopsis
● Dataset title.basics.tsv from IMDB dataset https://www.imdb.com/interfaces/
● Read Tabular Data File from Local File System,
● Mask the field primaryTitle with a regular expression
● Create a Hive External Table and Write to HDFS
Read data from File System
Mask Field
Hive Metadata
Hadoop File System
Hive Metastore
Pipeline Finisher
Data Result
Sample Data
Synopsis
● Regular expression: (.*)(.{4})
● Irreversible
Use Case
Ingest and Encrypt Data from Local File System to Hive Metadata.
Encrypt Field
Data Result
Sample Data
Synopsis
● Regular expression: (.*)(.{4})
● Reversible
Thanks
Big Data Engineer
Tiago Simões

More Related Content

What's hot

Cloudera cluster setup and configuration
Cloudera cluster setup and configurationCloudera cluster setup and configuration
Cloudera cluster setup and configurationSudheer Kondla
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingCloudera, Inc.
 
IT Infrastructure Through The Public Network Challenges And Solutions
IT Infrastructure Through The Public Network   Challenges And SolutionsIT Infrastructure Through The Public Network   Challenges And Solutions
IT Infrastructure Through The Public Network Challenges And SolutionsMartin Jackson
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)YoungHeon (Roy) Kim
 
Mysql Fun
Mysql FunMysql Fun
Mysql FunSHC
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
MySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKMySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKYoungHeon (Roy) Kim
 
Connect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux InstanceConnect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux InstanceVCP Muthukrishna
 
How Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, WorksHow Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, WorksMatthew Farina
 
Squid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress GuideSquid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress GuideAbhishek Kumar
 
Multinode kubernetes-cluster
Multinode kubernetes-clusterMultinode kubernetes-cluster
Multinode kubernetes-clusterRam Nath
 
Guide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormationGuide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormationRob Linton
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Clouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production DeploymentsClouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production DeploymentsAhmed Mekawy
 
How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7VCP Muthukrishna
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackSean Dague
 
Zookeeper In Action
Zookeeper In ActionZookeeper In Action
Zookeeper In Actionjuvenxu
 
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...DataStax
 

What's hot (20)

Cloudera cluster setup and configuration
Cloudera cluster setup and configurationCloudera cluster setup and configuration
Cloudera cluster setup and configuration
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is Failing
 
IT Infrastructure Through The Public Network Challenges And Solutions
IT Infrastructure Through The Public Network   Challenges And SolutionsIT Infrastructure Through The Public Network   Challenges And Solutions
IT Infrastructure Through The Public Network Challenges And Solutions
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
 
Build Automation 101
Build Automation 101Build Automation 101
Build Automation 101
 
Mysql Fun
Mysql FunMysql Fun
Mysql Fun
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
MySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKMySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELK
 
Connect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux InstanceConnect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux Instance
 
How Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, WorksHow Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, Works
 
Squid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress GuideSquid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
 
Multinode kubernetes-cluster
Multinode kubernetes-clusterMultinode kubernetes-cluster
Multinode kubernetes-cluster
 
Guide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormationGuide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormation
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Clouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production DeploymentsClouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production Deployments
 
How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with Devstack
 
Zookeeper In Action
Zookeeper In ActionZookeeper In Action
Zookeeper In Action
 
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
 

Similar to Implement a GDPR-compliant data solution using Cloudera and StreamSets

Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Big Data Spain
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
Data Access Technologies
Data Access TechnologiesData Access Technologies
Data Access TechnologiesDimara Hakim
 
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...NomanKhalid56
 
Oracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureOracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureSinanPetrusToma
 
Introduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard BrokerIntroduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard BrokerZohar Elkayam
 
Oracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast SlidesOracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast SlidesLudovico Caldara
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021Frederic Descamps
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfseo18
 
Database Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David IzahkDatabase Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David Izahksqlserver.co.il
 
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...Michael Noel
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Introduction to firebidSQL 3.x
Introduction to firebidSQL 3.xIntroduction to firebidSQL 3.x
Introduction to firebidSQL 3.xFabio Codebue
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax
 
From Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security EnhancementsFrom Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security EnhancementsAna-Maria Mihalceanu
 
ADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and SynchingADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and SynchingSteven Davelaar
 
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...MongoDB
 

Similar to Implement a GDPR-compliant data solution using Cloudera and StreamSets (20)

Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
Data Access Technologies
Data Access TechnologiesData Access Technologies
Data Access Technologies
 
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
 
Oracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureOracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud Infrastructure
 
JDBC.ppt
JDBC.pptJDBC.ppt
JDBC.ppt
 
Introduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard BrokerIntroduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard Broker
 
Oracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast SlidesOracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast Slides
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
 
Database Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David IzahkDatabase Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David Izahk
 
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Introduction to firebidSQL 3.x
Introduction to firebidSQL 3.xIntroduction to firebidSQL 3.x
Introduction to firebidSQL 3.x
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
 
From Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security EnhancementsFrom Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security Enhancements
 
Oracle Database 12c : Multitenant
Oracle Database 12c : MultitenantOracle Database 12c : Multitenant
Oracle Database 12c : Multitenant
 
ADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and SynchingADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and Synching
 
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

Implement a GDPR-compliant data solution using Cloudera and StreamSets

  • 1. How-to implement a GDPR solution in a Cloudera architecture Cloudera + StreamSets Data Collector
  • 2. Introduction Since the implementation of GDPR regulation, all data processors across the world have been struggling to be GDPR compliant and also deal with the new reality in Big Data, that data is constantly drifting and mutating. In this presentation, the approach will be: ● Cloudera architecture ● No additional financial cost ● Masking & Encrypting
  • 3. Architecture This architecture enables the following: ● Discover Data - ETL ● Data Governance - Policies ● Protect - Anonymization
  • 4. Pre-Assumptions 1. Cluster hostname: cm1.localdomain 2. StreamSets Data Collector Linux user: sdc Kerberos Principal: sdc/cm1.localdomain/@DOMAIN.COM 3. Cluster Cloudera Manager: Cloudera Manager 5.12.2 4. StreamSets Data Collector: Installed 5. Cluster Authentication Pre-Installed: Kerberos a. Kerberos Realm DOMAIN.COM 6. StreamSets Data Collector version: 3.5.0 7. StreamSets Data Collector Authentication: Kerberos
  • 5. Base of Knowledge With GDPR regulation it’s difficult to find the right balance between protecting data and utilizing it, but with the solution on this presentation you will not only do the ETL on your data lake as you will also protect all the sensitive data. Knowing this, you must know your data, the purpose of it, as also the architecture of your System Environment and therefore choose one or more ways to handle it.
  • 6. JDBC Connections The JDBC connections on StreamSets Origins require additional drivers. For Hive and Impala JDBC Origin connection you need the Cloudera drivers (HiveJDBC41.jar, ImpalaJDBC41.jar), as for a Oracle connection the Ojdbc8.jar. Base of Knowledge JDBC Conn Security URL Hive Metadata None jdbc:hive2://server:10000/db_name Hive Metadata Kerberos (static) jdbc:hive2://server:10000/db_name;principal=hive/server@REALM.COM Hive (Origin) Kerberos (user) jdbc:impala://server:10000;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=hive Impala (Origin) None jdbc:impala://server:21050;authMech=0 Impala (Origin) Kerberos jdbc:impala://server:21050;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=impala Oracle (Origin) Static jdbc:oracle:thin:@server:listerner_port:db
  • 7. Use Case Ingest and Mask Data from Local File System to Hive Metadata.
  • 8. Data Description Sample Data Synopsis ● Dataset title.basics.tsv from IMDB dataset https://www.imdb.com/interfaces/ ● Read Tabular Data File from Local File System, ● Mask the field primaryTitle with a regular expression ● Create a Hive External Table and Write to HDFS
  • 9. Read data from File System
  • 15. Data Result Sample Data Synopsis ● Regular expression: (.*)(.{4}) ● Irreversible
  • 16. Use Case Ingest and Encrypt Data from Local File System to Hive Metadata.
  • 18. Data Result Sample Data Synopsis ● Regular expression: (.*)(.{4}) ● Reversible