Securing and Governing a Multi-Tenant Data Lake within the Financial Industry
Presenters: Bradley Smith, Ian Pillay
Data Lake
Security, governance, multi-tenancy
Who are we?
Bradley Smith
• Hadoop Administrator – Jack of all trades
• Bachelor of Information Science with Honors in Big Data
• Interests: track racing
Ian Pillay
• Hadoop Administrator – Open source connoisseur
• Bachelor of Information & Computer Science
• Interests: game development (Unity 3D, Blender)
“Transparent security focused on user experience.”
Security
Enterprise security model:
• Proactive: Prevent, Predict
• Reactive: Detect, Respond and contain
• POPI – Protection of Personal Information Act
• GDPR – General Data Protection Regulation
• PCI DSS – Payment Card Industry Data Security Standard
Defining requirements - compliance
5 pillars of enterprise security:
Pillar          | Intent                  | Tool
Administration  | How do I set policy?    | Apache Ambari / Apache Ranger; Password vault
Authentication  | Who am I?               | Kerberos / LDAP
Authorization   | What can I do?          | Apache Ranger / LDAP client; Apache Knox
Audit           | What did I do?          | Apache Ranger / Lucene indexer; LDAP client
Data Protection | How can I encrypt data? | FPE / Tokenization / SSL
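The authorization and audit pillars above ("What can I do?" and "What did I do?") can be illustrated with a minimal Python sketch. It is not Apache Ranger's API or policy format: the `Policy` and `PolicyEngine` classes, the group name, and the `/lake/bu1/` path are all hypothetical, chosen only to show a policy check that writes an audit record for every decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a group may perform some actions under a path prefix."""
    group: str
    resource_prefix: str
    actions: frozenset


@dataclass
class PolicyEngine:
    """Toy engine combining authorization (decide) and audit (record)."""
    policies: list
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, groups, resource, action):
        # Authorization: allow only if some policy matches group, path and action.
        allowed = any(
            p.group in groups
            and resource.startswith(p.resource_prefix)
            and action in p.actions
            for p in self.policies
        )
        # Audit: record every decision, whether allowed or denied.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "resource": resource,
            "action": action,
            "allowed": allowed,
        })
        return allowed


# Illustrative tenant policy (names are assumptions, not from the slides).
engine = PolicyEngine(
    policies=[Policy("bu1_analysts", "/lake/bu1/", frozenset({"read"}))]
)
```

In this sketch a denied request is audited just like a granted one, which is what lets the audit pillar answer "What did I do?" for failed attempts as well.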
Data lake security model:
Security challenges:
• Kerberos
• Siloed security teams
• Data encryption & tokenization
• SSL
• Integration conflicts
• Elevated accounts
“Responsible approach to an autonomous experience.”
Governance
How we govern:
• Policies
• Data
• Resources
• Compliance
• POPI – Protection of Personal Information Act
• GDPR – General Data Protection Regulation
• PCI DSS – Payment Card Industry Data Security Standard
Chinese Wall
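The slide names the Chinese Wall only by title. Assuming it refers to the classic conflict-of-interest model (Brewer-Nash), the rule can be sketched in Python as follows. The `ChineseWall` class, the company names, and the conflict classes are hypothetical and say nothing about how the presenters enforce the wall in their data lake.

```python
from collections import defaultdict


class ChineseWall:
    """Toy Brewer-Nash check over companies grouped into conflict classes."""

    def __init__(self, conflict_class_of):
        # Maps company name -> conflict-of-interest class (hypothetical data).
        self.conflict_class_of = conflict_class_of
        # Maps user -> set of companies whose data the user has accessed.
        self.history = defaultdict(set)

    def can_access(self, user, company):
        target_class = self.conflict_class_of[company]
        for seen in self.history[user]:
            # Deny if the user already touched a different company
            # in the same conflict class.
            if seen != company and self.conflict_class_of[seen] == target_class:
                return False
        return True

    def access(self, user, company):
        if not self.can_access(user, company):
            return False
        self.history[user].add(company)
        return True


wall = ChineseWall({"bank_a": "banking", "bank_b": "banking", "oil_x": "energy"})
```

Under this rule a user who has read `bank_a` data is blocked from `bank_b` but can still read `oil_x`, because the restriction applies only within one conflict class.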
Governance challenges:
• Internal enterprise policies
• Master Data Management
• Homegrown metadata solution (OPMD – Operational Metadata)
• Global data ownership
• Auditing
• Alerting
• Reporting
“All for one, and one for all”
Multi-tenancy
[Diagram: Data Lake Analytics Platform. Labels: Resources and Queues for Business Unit 1 and Business Unit 2; Enterprise Lake; Managed Queues; Managed Storage; Proprietary Data Science Workbench; KVM (Active/Passive); Load Balanced Virtual Machines; Application Development / Test and Production workbenches (Common OS, Apps); Repo (in DMZ), e.g. IDE.]
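The diagram above splits resources and queues between business units. As a rough illustration, assuming each tenant queue is given a percentage share of the cluster, the guaranteed capacity per tenant could be computed as in the Python sketch below. The function name, the 60/40 split, and the 1000 GB cluster size are made up for the example and are not taken from the slides or from any scheduler's configuration format.

```python
def guaranteed_memory_gb(cluster_memory_gb, queue_shares):
    """Return the memory guaranteed to each queue from percentage shares.

    queue_shares maps a queue name to its share in percent; shares must
    total 100 so the cluster is neither over- nor under-committed.
    """
    total = sum(queue_shares.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue shares must sum to 100, got {total}")
    return {
        queue: cluster_memory_gb * share / 100.0
        for queue, share in queue_shares.items()
    }


# Hypothetical split between the two business units shown in the diagram.
shares = {"business_unit_1": 60.0, "business_unit_2": 40.0}
allocation = guaranteed_memory_gb(1000, shares)
```

Rejecting share sets that do not total 100 is a deliberate choice here: it surfaces a misconfigured tenant split at definition time rather than as a queue performance problem later.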
Multi-tenancy challenges:
• Cloud computing
• Queue performance
• Transparent real time performance dashboards
• Data science workbenches
• Distributed application management
• Legacy users
• API framework
• Web services
“We don’t know what we don’t know.”
Disruptive technologies
Disruptive technologies:
• Quantum computing
• Adversarial machine learning
• Cryptocurrencies
• Blockchain
• Mixed reality
• Cloud computing
• Emerging API threats
• IoT
• Cybersecurity wars
• Autonomous cars
Q & A
You now have 100% elasticity on your Q
Speaker Information
• Bradley Smith
• Twitter: @Tryxster
• LinkedIn: https://www.linkedin.com/in/bradgsmith/
• Email Address: bradley.g.smith@outlook.com
• Ian Pillay
• Twitter: @IanJamie26
• LinkedIn: https://www.linkedin.com/in/ian-pillay-development/
• Email Address: miraj26@gmail.com

More Related Content

What's hot

Ford
FordFord
Synchronicity of a distributed financial system
Synchronicity of a distributed financial systemSynchronicity of a distributed financial system
Synchronicity of a distributed financial system
DataWorks Summit
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
DataWorks Summit
 
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
DataWorks Summit
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
DataWorks Summit
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Big Data Spain
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
DataWorks Summit
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Diego Alberto Tamayo
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Precisely
 
Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
DataWorks Summit
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
DataWorks Summit
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
DataWorks Summit
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Microsoft
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use case
DataWorks Summit
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
DataWorks Summit/Hadoop Summit
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
Jeffrey T. Pollock
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
DataWorks Summit
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
SoftServe
 
Oracle Stream Analytics - Developer Introduction
Oracle Stream Analytics - Developer IntroductionOracle Stream Analytics - Developer Introduction
Oracle Stream Analytics - Developer Introduction
Jeffrey T. Pollock
 

What's hot (20)

Ford
FordFord
Ford
 
Synchronicity of a distributed financial system
Synchronicity of a distributed financial systemSynchronicity of a distributed financial system
Synchronicity of a distributed financial system
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
 
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
O2’s Financial Data Hub: going beyond IFRS compliance to support digital tran...
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use case
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
 
Oracle Stream Analytics - Developer Introduction
Oracle Stream Analytics - Developer IntroductionOracle Stream Analytics - Developer Introduction
Oracle Stream Analytics - Developer Introduction
 

Similar to Securing and governing a multi-tenant data lake within the financial industry

A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
 
Improve IT Security and Compliance with Mainframe Data in Splunk
Improve IT Security and Compliance with Mainframe Data in SplunkImprove IT Security and Compliance with Mainframe Data in Splunk
Improve IT Security and Compliance with Mainframe Data in Splunk
Precisely
 
Cybersecurity Legos - We're all part of something bigger
Cybersecurity Legos - We're all part of something biggerCybersecurity Legos - We're all part of something bigger
Cybersecurity Legos - We're all part of something bigger
Ben Boyd
 
Aligning Application Security to Compliance
Aligning Application Security to ComplianceAligning Application Security to Compliance
Aligning Application Security to Compliance
Security Innovation
 
Understanding Zero Trust Security for IBM i
Understanding Zero Trust Security for IBM iUnderstanding Zero Trust Security for IBM i
Understanding Zero Trust Security for IBM i
Precisely
 
CASE STUDY: UK NATIONAL HEALTH SERVICE
CASE STUDY: UK NATIONAL HEALTH SERVICECASE STUDY: UK NATIONAL HEALTH SERVICE
CASE STUDY: UK NATIONAL HEALTH SERVICE
ForgeRock
 
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
BigDataEverywhere
 
Securing Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH ComplianceSecuring Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH Compliance
Marie-Michelle Strah, PhD
 
Ciso executive forum 2013
Ciso executive forum 2013Ciso executive forum 2013
Ciso executive forum 2013
Bill Burns
 
Guardium Data Activiy Monitor For C- Level Executives
Guardium Data Activiy Monitor For C- Level ExecutivesGuardium Data Activiy Monitor For C- Level Executives
Guardium Data Activiy Monitor For C- Level Executives
Camilo Fandiño Gómez
 
Security data deluge
Security data delugeSecurity data deluge
Security data deluge
DataWorks Summit
 
Protect your Database with Data Masking & Enforced Version Control
Protect your Database with Data Masking & Enforced Version Control	Protect your Database with Data Masking & Enforced Version Control
Protect your Database with Data Masking & Enforced Version Control
DBmaestro - Database DevOps
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data Science
Karan Sachdeva
 
Securing Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH ComplianceSecuring Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH Compliance
Marie-Michelle Strah, PhD
 
Securing your esi_piedmont
Securing your esi_piedmontSecuring your esi_piedmont
Securing your esi_piedmont
scm24
 
Beyond the Scan: The Value Proposition of Vulnerability Assessment
Beyond the Scan: The Value Proposition of Vulnerability AssessmentBeyond the Scan: The Value Proposition of Vulnerability Assessment
Beyond the Scan: The Value Proposition of Vulnerability Assessment
Damon Small
 
Turn Big Data into Big Value on Informatica and AWS
Turn Big Data into Big Value on Informatica and AWSTurn Big Data into Big Value on Informatica and AWS
Turn Big Data into Big Value on Informatica and AWS
Amazon Web Services
 
A Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterA Study in Borderless Over Perimeter
A Study in Borderless Over Perimeter
ForgeRock
 
Who will guard the guards
Who will guard the guardsWho will guard the guards
Who will guard the guards
Network Intelligence India
 
Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)
Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)
Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)
Precisely
 

Similar to Securing and governing a multi-tenant data lake within the financial industry (20)

A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Improve IT Security and Compliance with Mainframe Data in Splunk
Improve IT Security and Compliance with Mainframe Data in SplunkImprove IT Security and Compliance with Mainframe Data in Splunk
Improve IT Security and Compliance with Mainframe Data in Splunk
 
Cybersecurity Legos - We're all part of something bigger
Cybersecurity Legos - We're all part of something biggerCybersecurity Legos - We're all part of something bigger
Cybersecurity Legos - We're all part of something bigger
 
Aligning Application Security to Compliance
Aligning Application Security to ComplianceAligning Application Security to Compliance
Aligning Application Security to Compliance
 
Understanding Zero Trust Security for IBM i
Understanding Zero Trust Security for IBM iUnderstanding Zero Trust Security for IBM i
Understanding Zero Trust Security for IBM i
 
CASE STUDY: UK NATIONAL HEALTH SERVICE
CASE STUDY: UK NATIONAL HEALTH SERVICECASE STUDY: UK NATIONAL HEALTH SERVICE
CASE STUDY: UK NATIONAL HEALTH SERVICE
 
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
 
Securing Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH ComplianceSecuring Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH Compliance
 
Ciso executive forum 2013
Ciso executive forum 2013Ciso executive forum 2013
Ciso executive forum 2013
 
Guardium Data Activiy Monitor For C- Level Executives
Guardium Data Activiy Monitor For C- Level ExecutivesGuardium Data Activiy Monitor For C- Level Executives
Guardium Data Activiy Monitor For C- Level Executives
 
Security data deluge
Security data delugeSecurity data deluge
Security data deluge
 
Protect your Database with Data Masking & Enforced Version Control
Protect your Database with Data Masking & Enforced Version Control	Protect your Database with Data Masking & Enforced Version Control
Protect your Database with Data Masking & Enforced Version Control
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data Science
 
Securing Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH ComplianceSecuring Microsoft Technologies for HITECH Compliance
Securing Microsoft Technologies for HITECH Compliance
 
Securing your esi_piedmont
Securing your esi_piedmontSecuring your esi_piedmont
Securing your esi_piedmont
 
Beyond the Scan: The Value Proposition of Vulnerability Assessment
Beyond the Scan: The Value Proposition of Vulnerability AssessmentBeyond the Scan: The Value Proposition of Vulnerability Assessment
Beyond the Scan: The Value Proposition of Vulnerability Assessment
 
Turn Big Data into Big Value on Informatica and AWS
Turn Big Data into Big Value on Informatica and AWSTurn Big Data into Big Value on Informatica and AWS
Turn Big Data into Big Value on Informatica and AWS
 
A Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterA Study in Borderless Over Perimeter
A Study in Borderless Over Perimeter
 
Who will guard the guards
Who will guard the guardsWho will guard the guards
Who will guard the guards
 
Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)
Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)
Protect Sensitive Data on Your IBM i (Social Distance Your IBM i/AS400)
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Securing and governing a multi-tenant data lake within the financial industry

  • 1. Securing and Governing a Multi-Tenant Data Lake within the Financial Industry Presenters: Bradley Smith Ian Pillay
  • 3. Who are we?
      Bradley Smith: Hadoop Administrator – Jack of all trades; Bachelor of Information Science with Honors in Big Data; Interests: track racing
      Ian Pillay: Hadoop Administrator – Open source connoisseur; Bachelor of Information & Computer Science; Interests: game development (Unity 3D, Blender)
  • 4. 155
  • 5. “Transparent security focused on user experience.” Security
  • 6. Enterprise security model: Proactive Reactive Prevent Predict Detect Respond and contain • POPI – Protection of Personal Information Act • GDPR – General Data Protection Regulation • PCI DSS – Payment Card Industry Data Security Standard Defining requirements - compliance
  • 7. 5 pillars of enterprise security (Pillar | Intent | Tool):
      Administration | How do I set policy? | Apache Ambari / Apache Ranger, Password vault
      Authentication | Who am I? | Kerberos / LDAP
      Authorization | What can I do? | Apache Ranger / LDAP client, Apache Knox
      Audit | What did I do? | Apache Ranger / Lucene indexer, LDAP client
      Data Protection | How can I encrypt data? | FPE / Tokenization / SSL
  • 8.
  • 10. Security challenges: • Kerberos • Siloed security teams • Data encryption & tokenization • SSL • Integration conflicts • Elevated accounts
  • 11. “Responsible approach to an autonomous experience.” Governance
  • 12.
  • 13.
  • 14. How we govern: • Policies • Data • Resources • Compliance • POPI – Protection of Personal Information Act • GDPR – General Data Protection Regulation • PCI DSS – Payment Card Industry Data Security Standard
  • 16. Governance challenges: • Internal enterprise policies • Master Data Management • Homegrown metadata solution (OPMD – Operational Metadata) • Global data ownership • Auditing • Alerting • Reporting
  • 17. “All for one, and one for all” Multi-tenancy
  • 20. Business Unit 1 Business Unit 2 Enterprise Lake Proprietary Data Science Workbench KVM (Active/Passive) Load Balanced Virtual Machines Application Development Test Application Development Test Repo (in DMZ) e.g. IDE Managed Queues Managed Storage Common OS Apps Production Workbench Common OS Apps
  • 21. Multi-tenancy challenges: • Cloud computing • Queue performance • Transparent real time performance dashboards • Data science workbenches • Distributed application management • Legacy users • API framework • Web services
  • 22. “We don’t know what we don’t know.” Disruptive technologies
  • 23. Disruptive technologies: • Quantum computing • Adversarial machine learning • Cryptocurrencies • Blockchain • Mixed reality • Cloud computing • Emerging API threats • IoT • Cybersecurity wars • Autonomous cars
  • 24. Q & A You now have 100% elasticity on your Q
  • 25. Speaker Information • Bradley Smith • Twitter: @Tryxster • LinkedIn: https://www.linkedin.com/in/bradgsmith/ • Email Address: bradley.g.smith@outlook.com • Ian Pillay • Twitter: @IanJamie26 • LinkedIn: https://www.linkedin.com/in/ian-pillay-development/ • Email Address: miraj26@gmail.com

Editor's Notes

  1. Philosophy: transparent security focused on user experience
  2. Bank-related security requirements. Proactive vs. reactive approaches to meeting those requirements.
  3. The pillars and how we are achieving them
  4. Security architecture: high-level view of what's happening in our environment with the pillars in place.
  5. Integrated security model: shows how the types of security initiatives are spread throughout the pillars of big data security and, in more depth, how we are pulling it off.
  6. Philosophy: responsible approach to an autonomous experience
  7. What does it mean to govern? What can we govern? Focus on can, not can't: we don't want to limit usability, flexibility, or adaptability. Data (can and can't do): metadata (Lucene indexer, Atlas), access, rest, motion (EL). Users (can and can't do): access, data, platform. Users, data.
  8. Users and data in harmony. Governance that doesn't prohibit data scientists and other users from doing their jobs. How we are achieving this, and how we plan to improve. It also comes down to AD integration throughout the bank.
  9. Go over what we can and can’t govern, and the types of regulations in place for governance
  10. The concept of the Chinese wall: no data leaves, and if it does, it is useless. This also brings up encryption for the regulations mentioned in the previous slide.
  11. Philosophy: All for one, and one for all
  12. What is a data lake? What is a data analytics platform? What do they mean to each other, and what purpose do they serve? Lastly, what we are doing, and why.
  13. Balancing users with queues and resource management. Go in depth on how we are using queue management with YARN QM with elasticity, and on how we are handling resources with Dynamic Resource Allocation with Spark/YARN etc.
  14. Edge node architecture overview. How we split users while maintaining confidentiality and security, while still allowing multi-tenancy on the cluster: sharing responsibly.
  15. Be open to change
  16. Prepare, it is inevitable
  17. Question time! With a nerdy pun to accompany it
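
Note 13 refers to YARN queue management "with elasticity". The slides do not show the actual queue configuration, so what follows is only a minimal sketch of the general idea, under stated assumptions: each tenant queue has a guaranteed share of the cluster, and idle capacity can be lent to busy queues up to a per-queue ceiling. The `Queue` class, the `allocate` function, the queue names, and all percentages are hypothetical, and the loop is a toy model rather than the YARN scheduler's real algorithm. In the Capacity Scheduler the corresponding knobs are the per-queue `capacity` and `maximum-capacity` properties.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    guaranteed: float  # guaranteed share of the cluster, in percent
    maximum: float     # elastic ceiling for this queue, in percent
    demand: float      # what the tenant currently asks for, in percent


def allocate(queues: list[Queue]) -> dict[str, float]:
    """Toy elastic allocation: honour guarantees, then lend idle capacity."""
    # Step 1: each queue receives its demand, capped at its guarantee.
    alloc = {q.name: min(q.demand, q.guaranteed) for q in queues}
    idle = 100.0 - sum(alloc.values())

    # Step 2: lend idle capacity to queues that still want more,
    # weighted by guarantee and never above the queue's maximum.
    while idle > 1e-9:
        hungry = [
            q for q in queues
            if alloc[q.name] < min(q.demand, q.maximum) - 1e-9
        ]
        weight = sum(q.guaranteed for q in hungry)
        if not hungry or weight == 0:
            break
        granted = 0.0
        for q in hungry:
            share = idle * q.guaranteed / weight
            room = min(q.demand, q.maximum) - alloc[q.name]
            give = min(share, room)
            alloc[q.name] += give
            granted += give
        idle -= granted
    return alloc


# Hypothetical tenants: business unit 1 is mostly idle, business unit 2 is busy.
tenants = [
    Queue("business_unit_1", guaranteed=60, maximum=80, demand=20),
    Queue("business_unit_2", guaranteed=40, maximum=90, demand=100),
]
result = allocate(tenants)
```

In this hypothetical run the busy queue borrows the capacity its neighbour is not using, but stops at its own ceiling. When both queues are saturated, each falls back to its guaranteed share, which is the "sharing responsibly" behaviour the notes describe.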