SlideShare a Scribd company logo
1 of 33
Download to read offline
Protecting the Data Lake
A bit about Ash Narkar !
@ashtalk
● Data Lake Overview
● Open Policy Agent
○ Community
○ Features
○ Use Cases
● Use case deep dive
○ Ceph Data Protection
Agenda
Data is King !
Data is King !
● Pervasive
● Abundant
● Customer Experience
● Revenue Growth
Data is King !
● Pervasive
● Abundant
● Customer Experience
● Revenue Growth
● Cyber Attacks
● Breaches
● Fines
● Loss of Customer Trust
What Is A Data Lake?
Data Lake Features
● Centralized Content
● Scalability
● Multiple data type support
● Resource optimization
Data Lake Platform
Sources Consumers
Data Lake Platform: Kafka
Features
● Distributed streaming platform
● Building real-time streaming data pipelines and applications
Security Challenges
● Authorization using Access Control Lists(ACLs)
● How to authorize requests based on context, like user, IP, common name in
certificate
Security Policies
● Consumers of topics containing PII must be whitelisted
● Producers to topics with high fanout must be whitelisted
Data Lake Platform: Ceph
Features
● Unified distributed storage system
● Delivers object, block, and file storage
Security Challenges
● Security protocol handles only Ceph clients and servers. NO human
users or applications 😔
Security Policies
● Users can access only those buckets belonging to the same
geographical region as them
● Access based on a user’s Business Unit, Department etc.
Data Lake Platform: Elasticsearch
Features
● Full-text search and analytics engine
● Store, search and analyze
Security Challenges
● Authorization is not considered as part of job
● User responsible for implementing access control
Security Policies
● Access control policies for a patient’s PHI
Security Challenge Overview
● Distinct systems
● Changing security requirements
❌ Hardcoding policy
❌ Tight coupling
✅ Expressiveness
✅ Speed and performance
✅ Unified Solution
Who can solve the Security Challenge ?
What Is OPA?
OPA: Community
Inception
Project started in 2016 at
Styra.
Goal
Unify policy enforcement
across the stack.
Use Cases
Admission control
Authorization
ACLs
RBAC
IAM
ABAC
Risk management
Data Protection
Data Filtering
Users
Netflix
Medallia
Chef
Cloudflare
State Street
Pinterest
Intuit
Capital One
...and many more.
Today
CNCF project (Incubating)
59 contributors
800+ slack members
2000+ stars
20+ integrations
Service
OPA
Policy
(Rego)
Data
(JSON)
Request
DecisionQuery
OPA: General-purpose policy engine
● Declarative Policy Language (Rego)
○ Can user X do operation Y on resource Z?
○ What invariants does workload W violate?
○ Which records should bob be allowed to see?
● Library, sidecar, host-level daemon
○ Policy and data are kept in-memory
○ Zero decision-time dependencies
● Management APIs for control & observability
○ Bundle service API for sending policy & data to OPA
○ Status service API for receiving status from OPA
○ Log service API for receiving audit log from OPA
● Tooling to build, test, and debug policy
○ opa run, opa test, opa fmt, opa deps, opa check, etc.
○ VS Code plugin, Tracing, Profiling, etc.
OPA: Features
Service
OPA
Policy
(Rego)
Data
(JSON)
Request
DecisionQuery
How does OPA work?
How does OPA work?
Salary Service V1
OPA
Policy
(Rego)
Data
(JSON)
Request
DecisionQuery
Example policy
"Employees can read their own salary
and the salary of anyone they
manage."
How does OPA work?
Example policy
Employees can read their own salary and the salary of
anyone they manage.
Input Data
method: "GET"
path: ["salary", "bob"]
user: "bob"
3 Steps to OPA
Step 1: Clone OPA Repo
3 Steps to OPA
Step 1: Clone OPA Repo
Step 2: Build OPA binary
3 Steps to OPA
Step 1: Clone OPA Repo
Step 2: Build OPA binary
Step 3: Execute OPA binary
Use Cases
CLOUD
Host
DB
Host
sshd
App
Container
HTTP API
Microservice APIs
Orchestrator
Admission Control
Container Execution,
SSH, sudo
Linux
Risk
Management
Data Protection and
Data Filtering
OPA Use Case:
Ceph Data Protection
Ceph Architecture
CEPH STORAGE CLUSTER (RADOS)
LIBRADOS
RBD CEPH FSRADOSGW
Ceph Data Protection: Setup
OPA
Node Port
Service
Incoming
Request
HTTP
S3 Api
RADOSGW RADOS
user, method,
bucket name
Allow / Deny
Policy
(Rego)
Data
(JSON)
Example policy
"Users can access only those buckets belonging
to the same geographical region as them."
Demo: Ceph Data Protection
https://katacoda.com/styra
Data is King !
● Pervasive
● Abundant
● Customer Experience
● Revenue Growth
● Cyber Attacks
● Breaches
● Fines
● Loss of Customer Trust
Thank You!
github.com/open-policy-agent/opa
openpolicyagent.org
slack.openpolicyagent.org
Booth S20

More Related Content

What's hot

Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingHemant Kumar
 
Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...
Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...
Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...Nagios
 
APIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & ManagementAPIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & ManagementNGINX, Inc.
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
NGINX Plus R19 : EMEA
NGINX Plus R19 : EMEANGINX Plus R19 : EMEA
NGINX Plus R19 : EMEANGINX, Inc.
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021StreamNative
 
Digital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The CloudDigital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The CloudVelocidex Enterprises
 
Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017Gianluca Arbezzano
 
Scale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWSScale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWSNGINX, Inc.
 
Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...StreamNative
 
What's new in NGINX Plus R19
What's new in NGINX Plus R19What's new in NGINX Plus R19
What's new in NGINX Plus R19NGINX, Inc.
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElasticsearch
 
An approach for migrating enterprise apps into open stack
An approach for migrating enterprise apps into open stackAn approach for migrating enterprise apps into open stack
An approach for migrating enterprise apps into open stackArthur Berezin
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Analyzing NGINX Logs with Datadog
Analyzing NGINX Logs with DatadogAnalyzing NGINX Logs with Datadog
Analyzing NGINX Logs with DatadogNGINX, Inc.
 
6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production 6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production Hung Lin
 
Distributed Tracing with OpenTracing, ZipKin and Kubernetes
Distributed Tracing with OpenTracing, ZipKin and KubernetesDistributed Tracing with OpenTracing, ZipKin and Kubernetes
Distributed Tracing with OpenTracing, ZipKin and KubernetesContainer Solutions
 

What's hot (20)

Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
 
Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...
Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...
Nagios Conference 2014 - Scott Wilkerson - Getting Started with Nagios Networ...
 
APIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & ManagementAPIs: Intelligent Routing, Security, & Management
APIs: Intelligent Routing, Security, & Management
 
Elastic{ON} 2017 Recap
Elastic{ON} 2017 RecapElastic{ON} 2017 Recap
Elastic{ON} 2017 Recap
 
Next-Gen DDoS Detection
Next-Gen DDoS DetectionNext-Gen DDoS Detection
Next-Gen DDoS Detection
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
NGINX Plus R19 : EMEA
NGINX Plus R19 : EMEANGINX Plus R19 : EMEA
NGINX Plus R19 : EMEA
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
 
Digital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The CloudDigital Forensics and Incident Response in The Cloud
Digital Forensics and Incident Response in The Cloud
 
Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017
 
Scale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWSScale your application to new heights with NGINX and AWS
Scale your application to new heights with NGINX and AWS
 
Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...
 
What's new in NGINX Plus R19
What's new in NGINX Plus R19What's new in NGINX Plus R19
What's new in NGINX Plus R19
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
An approach for migrating enterprise apps into open stack
An approach for migrating enterprise apps into open stackAn approach for migrating enterprise apps into open stack
An approach for migrating enterprise apps into open stack
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Analyzing NGINX Logs with Datadog
Analyzing NGINX Logs with DatadogAnalyzing NGINX Logs with Datadog
Analyzing NGINX Logs with Datadog
 
6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production 6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production
 
Vault Agent and Vault 0.11 features
Vault Agent and Vault 0.11 featuresVault Agent and Vault 0.11 features
Vault Agent and Vault 0.11 features
 
Distributed Tracing with OpenTracing, ZipKin and Kubernetes
Distributed Tracing with OpenTracing, ZipKin and KubernetesDistributed Tracing with OpenTracing, ZipKin and Kubernetes
Distributed Tracing with OpenTracing, ZipKin and Kubernetes
 

Similar to Protecting the Data Lake

Dynamic Authorization & Policy Control for Docker Environments
Dynamic Authorization & Policy Control for Docker EnvironmentsDynamic Authorization & Policy Control for Docker Environments
Dynamic Authorization & Policy Control for Docker EnvironmentsTorin Sandall
 
Dynamic Policy Enforcement for Microservice Environments
Dynamic Policy Enforcement for Microservice EnvironmentsDynamic Policy Enforcement for Microservice Environments
Dynamic Policy Enforcement for Microservice EnvironmentsNebulaworks
 
OPA APIs and Use Case Survey
OPA APIs and Use Case SurveyOPA APIs and Use Case Survey
OPA APIs and Use Case SurveyTorin Sandall
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsClaudiu Barbura
 
Hive on Spark, production experience @Uber
 Hive on Spark, production experience @Uber Hive on Spark, production experience @Uber
Hive on Spark, production experience @UberFuture of Data Meetup
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the CloudAmihay Zer-Kavod
 
#RADC4L16: An API-First Archives Approach at NPR
#RADC4L16: An API-First Archives Approach at NPR#RADC4L16: An API-First Archives Approach at NPR
#RADC4L16: An API-First Archives Approach at NPRCamille Salas
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles Sonigo
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...Amazon Web Services
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Nelson Calero
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandraVinay Kumar Chella
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managabilityGaurav Bahrani
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane
 

Similar to Protecting the Data Lake (20)

Open Policy Agent
Open Policy AgentOpen Policy Agent
Open Policy Agent
 
Dynamic Authorization & Policy Control for Docker Environments
Dynamic Authorization & Policy Control for Docker EnvironmentsDynamic Authorization & Policy Control for Docker Environments
Dynamic Authorization & Policy Control for Docker Environments
 
Dynamic Policy Enforcement for Microservice Environments
Dynamic Policy Enforcement for Microservice EnvironmentsDynamic Policy Enforcement for Microservice Environments
Dynamic Policy Enforcement for Microservice Environments
 
OPA APIs and Use Case Survey
OPA APIs and Use Case SurveyOPA APIs and Use Case Survey
OPA APIs and Use Case Survey
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatterns
 
Hive on Spark, production experience @Uber
 Hive on Spark, production experience @Uber Hive on Spark, production experience @Uber
Hive on Spark, production experience @Uber
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 
#RADC4L16: An API-First Archives Approach at NPR
#RADC4L16: An API-First Archives Approach at NPR#RADC4L16: An API-First Archives Approach at NPR
#RADC4L16: An API-First Archives Approach at NPR
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

Protecting the Data Lake

  • 2. A bit about Ash Narkar ! @ashtalk
  • 3. ● Data Lake Overview ● Open Policy Agent ○ Community ○ Features ○ Use Cases ● Use case deep dive ○ Ceph Data Protection Agenda
  • 5. Data is King ! ● Pervasive ● Abundant ● Customer Experience ● Revenue Growth
  • 6. Data is King ! ● Pervasive ● Abundant ● Customer Experience ● Revenue Growth ● Cyber Attacks ● Breaches ● Fines ● Loss of Customer Trust
  • 7. What Is A Data Lake?
  • 8. Data Lake Features ● Centralized Content ● Scalability ● Multiple data type support ● Resource optimization
  • 10. Data Lake Platform: Kafka Features ● Distributed streaming platform ● Building real-time streaming data pipelines and applications Security Challenges ● Authorization using Access Control Lists(ACLs) ● How to authorize requests based on context, like user, IP, common name in certificate Security Policies ● Consumers of topics containing PII must be whitelisted ● Producers to topics with high fanout must be whitelisted
  • 11. Data Lake Platform: Ceph Features ● Unified distributed storage system ● Delivers object, block, and file storage Security Challenges ● Security protocol handles only Ceph clients and servers. NO human users or applications 😔 Security Policies ● Users can access only those buckets belonging to the same geographical region as them ● Access based on a user’s Business Unit, Department etc.
  • 12. Data Lake Platform: Elasticsearch Features ● Full-text search and analytics engine ● Store, search and analyze Security Challenges ● Authorization is not considered as part of job ● User responsible for implementing access control Security Policies ● Access control policies for a patient’s PHI
  • 13. Security Challenge Overview ● Distinct systems ● Changing security requirements ❌ Hardcoding policy ❌ Tight coupling ✅ Expressiveness ✅ Speed and performance ✅ Unified Solution
  • 14. Who can solve the Security Challenge ?
  • 16. OPA: Community Inception Project started in 2016 at Styra. Goal Unify policy enforcement across the stack. Use Cases Admission control Authorization ACLs RBAC IAM ABAC Risk management Data Protection Data Filtering Users Netflix Medallia Chef Cloudflare State Street Pinterest Intuit Capital One ...and many more. Today CNCF project (Incubating) 59 contributors 800+ slack members 2000+ stars 20+ integrations
  • 18. ● Declarative Policy Language (Rego) ○ Can user X do operation Y on resource Z? ○ What invariants does workload W violate? ○ Which records should bob be allowed to see? ● Library, sidecar, host-level daemon ○ Policy and data are kept in-memory ○ Zero decision-time dependencies ● Management APIs for control & observability ○ Bundle service API for sending policy & data to OPA ○ Status service API for receiving status from OPA ○ Log service API for receiving audit log from OPA ● Tooling to build, test, and debug policy ○ opa run, opa test, opa fmt, opa deps, opa check, etc. ○ VS Code plugin, Tracing, Profiling, etc. OPA: Features Service OPA Policy (Rego) Data (JSON) Request DecisionQuery
  • 19. How does OPA work?
  • 20. How does OPA work? Salary Service V1 OPA Policy (Rego) Data (JSON) Request DecisionQuery Example policy "Employees can read their own salary and the salary of anyone they manage."
  • 21. How does OPA work? Example policy Employees can read their own salary and the salary of anyone they manage. Input Data method: "GET" path: ["salary", "bob"] user: "bob"
  • 22. 3 Steps to OPA Step 1: Clone OPA Repo
  • 23. 3 Steps to OPA Step 1: Clone OPA Repo Step 2: Build OPA binary
  • 24. 3 Steps to OPA Step 1: Clone OPA Repo Step 2: Build OPA binary Step 3: Execute OPA binary
  • 25.
  • 26. Use Cases CLOUD Host DB Host sshd App Container HTTP API Microservice APIs Orchestrator Admission Control Container Execution, SSH, sudo Linux Risk Management Data Protection and Data Filtering
  • 27. OPA Use Case: Ceph Data Protection
  • 28. Ceph Architecture CEPH STORAGE CLUSTER (RADOS) LIBRADOS RBD CEPH FSRADOSGW
  • 29. Ceph Data Protection: Setup OPA Node Port Service Incoming Request HTTP S3 Api RADOSGW RADOS user, method, bucket name Allow / Deny Policy (Rego) Data (JSON)
  • 30. Example policy "Users can access only those buckets belonging to the same geographical region as them."
  • 31. Demo: Ceph Data Protection https://katacoda.com/styra
  • 32. Data is King ! ● Pervasive ● Abundant ● Customer Experience ● Revenue Growth ● Cyber Attacks ● Breaches ● Fines ● Loss of Customer Trust