1 ©	Hortonworks	Inc.	2011–2018.	All	rights	reserved.
How to Secure Your Data Lake
Omar Ascofare – Hortonworks Solution Engineer
Simon Thibault – Account Executive
Hortonworks Company Profile
IPO 4Q14 (NASDAQ: HDP)
Barriers to Real-Time Connected Enterprises
Inability to Leverage “New” and “Traditional” Data Sources
• State of my customers?
• State of my operations?
• State of products in field?
Incomplete View of Enterprise
(Diagram: siloed enterprise transaction data and siloed real-time data across consumers & customers, manufacturing & supply chain, and connected products & services)
Traditional Systems Don’t Address Data Diversity Trends
Unstructured data, machine data, systems data
Capture streaming data
Deliver perishable insights
Combine new & old data
Store data forever
Access a multi-tenant data lake
Model with artificial intelligence
Data at rest, data in motion
Actionable intelligence
Perishable insights, historical insights
Hortonworks Influences the Apache Community
We Employ the Committers: one third of all committers to the Apache® Hadoop™ project, and a majority in other important projects
Our Committers Innovate: expanding both Open Enterprise Hadoop and Apache NiFi
We Influence the Hadoop Roadmap: by communicating important requirements to the community through our leaders
(Chart: Apache Hadoop committers)
Hortonworks Strategy - 100% Open Source Data Platform
(Chart: “The Innovation Advantage”, innovation over time for the open community vs. proprietary Hadoop, labeled “Maximum Community Innovation”)
Eliminates Risk of vendor lock-in by delivering 100% Apache open source technology
Maximizes Community Innovation with hundreds of developers across dozens of companies
Integrates Seamlessly through committed co-engineering partnerships with other leading technologies
Global Data Management With Hortonworks
Globally Manage, Secure, Govern, Consume
DataPlane Service (DPS): manage, govern, secure
Extensible services: Data Lifecycle Manager, Data Steward Studio*, Data Analytics Studio*, IBM DSX*, Cloudbreak*, ISV Services
*not yet available, coming soon
Connected data platforms: Hortonworks Data Platform (HDP®) for data at rest; Hortonworks DataFlow (HDF™) for data in motion
Modern data use cases: EDW Optimization, Cybersecurity, Data Science, Advanced Analytics
Hortonworks Connection: Enterprise Support, Premier Support, Educational Services, Professional Services, Community Connection
Hortonworks Platform Services: Operational Services, SmartSense™
Data sources: data center, cloud, edge
Examples: exception monitoring, 360 view of operations, cyber security, telemetry – connected devices, time series, sensors, control systems
What is an Enterprise Data Lake?
Purpose-Built Clusters
(Diagram: Cluster 1 (batch), Cluster 2 (interactive), … Cluster N (real-time), each with its own application, security, governance, storage, operations and dedicated resource management)
An architectural pattern in the data center that uses Hadoop to deliver insight across a large, broad, diverse set of data at efficient scale. But what is it?
Enterprise Grade
Security, operations, governance.
Integration with higher-level services (SAS, SAP, Microsoft, etc.)
Platform for All Data
Store anything, structured or unstructured.
Store everything, retain all attributes.
Land all data in a single place.
Multi-Purpose
Data in a single open platform. Interact in many ways.
Multi-tenancy in one shared cluster.
Enterprise Data Lake
(Diagram: batch, interactive, real-time, in-memory and search applications running on YARN resource management over the HDFS storage layer, with shared operations, governance and security)
Use Case Component Mapping
Data platform core capabilities: resource management / SLA, operations, authentication / SSO, infrastructure, authorization, scheduling, data encryption / data masking, governance / data lineage, backup, business taxonomy / metadata
Use cases: single view of entity, predictive analytics, active archive, data discovery, ETL onboard
Data management layer: relational data model, data transformation, metadata, NoSQL, in-memory data processing, custom application
End user tools: business intelligence, data science notebook, discovery tool, machine learning, application integration, search
Ingestion: streaming, extract / transform / load, command line, batch data loading, end user upload, data enrichment
Use Case Component Mapping
Data platform core capabilities: YARN (resource management / SLA), Ambari (operations), Kerberos (authentication / SSO), Cloudbreak / Ambari (infrastructure), Ranger (authorization), Oozie (scheduling), Ranger (data encryption / masking), Atlas (governance / lineage), DLM (backup), Atlas (business taxonomy / metadata)
Use cases: single view of entity, predictive analytics, active archive, data discovery, ETL onboard
Data management layer: Hive, Pig, HCat, HBase, Spark, Java
End user layer: ODBC / JDBC, Zeppelin, partner offerings, MLlib, web services, Solr
Ingestion: HDF, ETL vendors, HDFS, Sqoop, Ambari, data enrichment
Data Lake Reference Architecture
Source data: click-stream, social data, IoT, product data, sales data, enterprise data
Ingestion / dataflow: HDF (NiFi, Kafka, Storm), Sqoop, WebHDFS
Core: HDFS storage layer, YARN resource management
Data processing: MR2, Tez, Spark, HBase, Solr, Hive, Pig, Phoenix, Storm
Data security: Knox Gateway (perimeter security), Ranger (authorization / audit), Kerberos authentication, HDFS encryption
Ops: Cloudbreak, Ambari
Data management & governance: Atlas metadata framework, Falcon data lifecycle / pipeline, HCatalog centralized metadata management
Scheduling: Oozie batch scheduler
Consumption: Hive (Tez), Spark, HBase / Phoenix
Tools: Zeppelin, Ambari Views
Analytic tools: Tableau / Excel, SAP / others
Target systems: OLTP system, EDW
Users: system administrator, data scientist, data engineer
Cross-cutting: operations, governance, security
02/03/2018
SG DATA OFFER
Big Data as a Service
• 53 nodes in production
• 6.88T storage capacity
• 1164 cores
SG DATA
DATALAKE
(Diagram: sources 1 to 3 feed the data lake, which serves applications 1 to 3)
- Each source has ownership of its data
- Applications require access
- The data lake is ruled by conventions: application name, HDFS path, Hive table names, …
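The slide only says the data lake is ruled by conventions. As an illustration, a minimal sketch of how such conventions could be checked automatically, assuming a hypothetical layout of `/data/<source>/<application>/...` for HDFS paths and `<source>_<application>_<name>` for Hive tables; neither pattern comes from the source.

```python
import re

# Hypothetical naming conventions, for illustration only:
#   HDFS path:  /data/<source>/<application>[/...]
#   Hive table: <source>_<application>_<name>
HDFS_PATH_RE = re.compile(r"^/data/(?P<source>[a-z0-9]+)/(?P<app>[a-z0-9]+)(?:/.*)?$")
HIVE_TABLE_RE = re.compile(r"^(?P<source>[a-z0-9]+)_(?P<app>[a-z0-9]+)_(?P<name>[a-z0-9_]+)$")


def check_hdfs_path(path: str, source: str, app: str) -> bool:
    """True if the path follows the layout and sits under the owning source and application."""
    m = HDFS_PATH_RE.match(path)
    return m is not None and m["source"] == source and m["app"] == app


def check_hive_table(table: str, source: str, app: str) -> bool:
    """True if the table name is prefixed with its owning source and application."""
    m = HIVE_TABLE_RE.match(table)
    return m is not None and m["source"] == source and m["app"] == app


print(check_hdfs_path("/data/crm/churn/2018/03", "crm", "churn"))  # True
print(check_hive_table("sales_churn_scores", "crm", "churn"))      # False: wrong source prefix
```

A check like this could run when an application is onboarded, before any Ranger policy is written for it, so that path- and table-based policies stay predictable.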
What’s the Worst Case Scenario?
Yet we had a solid roadmap for a data lake!
No worries boss, I got this!
How Do We Secure the Zoo?
Start with Authentication
Background: Kerberos
⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
⬢ Users need to be able to reliably “identify” themselves and have their identity propagated throughout the Hadoop cluster
⬢ Why Kerberos?
  ⬢ Establishes identity for clients, hosts and services
  ⬢ Prevents impersonation; passwords are never sent over the wire
  ⬢ Integrates with enterprise identity management tools such as LDAP & Active Directory
  ⬢ More granular auditing of data access/job execution
Kerberos Primer
Actors: the client (with its Kerberos ticket cache), the KDC, the NameNode (NN) and a DataNode (DN)
1. kinit: log in and get a Ticket Granting Ticket (TGT)
2. Client stores the TGT in its ticket cache
3. Get a NameNode service ticket (NN-ST)
4. Client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; if access is permitted, the NameNode returns block locations, block IDs and Block Access Tokens
6. Read/write a block given the Block Access Token and block ID
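To make the six steps concrete, here is a toy Python model of the message flow. It is not Kerberos and not Hadoop code: real tickets are encrypted ASN.1 structures and block tokens use Hadoop's own format. Here each "ticket" is a payload signed with HMAC-SHA256 under the key of the service it is meant for, so only that service can validate it. The class names, principal names and keys are all hypothetical.

```python
import hashlib
import hmac
import json
import secrets
import time


def seal(key: bytes, payload: dict) -> dict:
    """Sign a payload so only holders of `key` can validate it (toy stand-in for ticket encryption)."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}


def unseal(key: bytes, ticket: dict) -> dict:
    body = json.dumps(ticket["payload"], sort_keys=True).encode()
    if not hmac.compare_digest(hmac.new(key, body, hashlib.sha256).hexdigest(), ticket["mac"]):
        raise PermissionError("ticket not issued for this service")
    if ticket["payload"]["expires"] < time.time():
        raise PermissionError("ticket expired")
    return ticket["payload"]


class KDC:
    def __init__(self) -> None:
        self.tgs_key = secrets.token_bytes(32)  # ticket-granting service key
        self.service_keys = {}                  # service principal -> key (what a keytab holds)
        self.passwords = {}                     # user principal -> password

    def kinit(self, user: str, password: str, lifetime: float = 36000) -> dict:
        # Step 1: log in and receive a Ticket Granting Ticket (TGT)
        if not hmac.compare_digest(self.passwords.get(user, "").encode(), password.encode()):
            raise PermissionError("bad credentials")
        return seal(self.tgs_key, {"client": user, "service": "krbtgt", "expires": time.time() + lifetime})

    def service_ticket(self, tgt: dict, service: str) -> dict:
        # Step 3: present the TGT, receive a service ticket (e.g. the NN-ST)
        tgt_payload = unseal(self.tgs_key, tgt)
        return seal(self.service_keys[service],
                    {"client": tgt_payload["client"], "service": service, "expires": tgt_payload["expires"]})


class NameNode:
    def __init__(self, key: bytes, block_key: bytes, files: dict, acl: dict) -> None:
        self.key, self.block_key, self.files, self.acl = key, block_key, files, acl

    def open(self, nn_st: dict, path: str) -> tuple:
        # Step 5: check the NN-ST and permissions, return block ID + Block Access Token
        client = unseal(self.key, nn_st)["client"]
        if client not in self.acl.get(path, set()):
            raise PermissionError(f"{client} may not access {path}")
        block_id = self.files[path]
        token = seal(self.block_key, {"client": client, "block": block_id, "expires": time.time() + 600})
        return block_id, token


class DataNode:
    def __init__(self, block_key: bytes, blocks: dict) -> None:
        self.block_key, self.blocks = block_key, blocks

    def read(self, block_id: str, token: dict) -> bytes:
        # Step 6: the DataNode only checks the token; in this toy it never talks to the KDC
        if unseal(self.block_key, token)["block"] != block_id:
            raise PermissionError("token is for another block")
        return self.blocks[block_id]


kdc = KDC()
kdc.passwords["smith@EXAMPLE.COM"] = "s3cret"
nn_key = kdc.service_keys["nn/host1@EXAMPLE.COM"] = secrets.token_bytes(32)
block_key = secrets.token_bytes(32)  # shared by the NameNode and DataNodes
nn = NameNode(nn_key, block_key, {"/data/a.csv": "blk_1"}, {"/data/a.csv": {"smith@EXAMPLE.COM"}})
dn = DataNode(block_key, {"blk_1": b"hello"})

cache = {"tgt": kdc.kinit("smith@EXAMPLE.COM", "s3cret")}               # step 2: cache the TGT
cache["nn"] = kdc.service_ticket(cache["tgt"], "nn/host1@EXAMPLE.COM")  # step 4: cache the NN-ST
block_id, token = nn.open(cache["nn"], "/data/a.csv")
print(dn.read(block_id, token))  # b'hello'
```

Note that the password is only used in step 1; everything afterwards relies on tickets and tokens, which is the property the Kerberos slide highlights.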
Background: HDP + Kerberos
(Diagram: an HDP cluster of service components A, B, C, D and X, each host holding a keytab, next to a KDC)
Kerberos is used to secure the components in the cluster. Kerberos identities are managed via “keytabs” on the component hosts.
Principals for the cluster are managed in the KDC.
Kerberos + Active Directory
(Diagram: a client, AD / LDAP and the Hadoop cluster KDC, linked by a cross-realm trust)
Users: smith@EXAMPLE.COM
Hosts: host1@HADOOP.EXAMPLE.COM
Services: hdfs/host1@HADOOP.EXAMPLE.COM
User store: use existing directory tools to manage users
Authentication: use Kerberos tools to manage host + service principals
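The three principal shapes above follow the `primary[/instance]@REALM` form. A minimal sketch of parsing them and mapping them to a short local user name, roughly in the spirit of Hadoop's `hadoop.security.auth_to_local` rules. This is a simplified stand-in, not Hadoop's rule engine, and the set of trusted realms is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "smith" or "hdfs"
    instance: Optional[str]  # usually a host name for service principals
    realm: str               # e.g. "EXAMPLE.COM" (AD) or "HADOOP.EXAMPLE.COM" (cluster KDC)


PRINCIPAL_RE = re.compile(r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^/@]+))?@(?P<realm>[^/@]+)$")


def parse_principal(name: str) -> Principal:
    m = PRINCIPAL_RE.match(name)
    if m is None:
        raise ValueError(f"not a Kerberos principal: {name!r}")
    return Principal(m["primary"], m["instance"], m["realm"])


def short_name(name: str, trusted_realms: set) -> str:
    """Map a principal from a trusted realm to its first component.

    Simplified illustration: real clusters express this with auth_to_local rules,
    which can also rewrite or reject names.
    """
    p = parse_principal(name)
    if p.realm not in trusted_realms:
        raise ValueError(f"realm {p.realm} is not trusted")
    return p.primary


REALMS = {"EXAMPLE.COM", "HADOOP.EXAMPLE.COM"}  # AD realm + cluster realm joined by cross-realm trust
print(short_name("smith@EXAMPLE.COM", REALMS))              # smith
print(short_name("hdfs/host1@HADOOP.EXAMPLE.COM", REALMS))  # hdfs
```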
Background: Principal and Keytab Generation & Distribution
1. User provides AD admin account credentials to Ambari
2. Ambari connects to AD and creates the principals (service and Ambari) needed for the cluster
3. Ambari generates keytabs for the principals
4. Ambari distributes the keytabs to the Ambari Server and cluster hosts
5. Ambari discards the AD admin account credentials (optional)
(Diagram: Ambari Server, AD and the HDP cluster, annotated with steps 1 to 5)
Automated Kerberos Setup with Ambari
→ Wizard-driven, automated Kerberos support
→ Removes cumbersome, time-consuming and error-prone administration of Kerberos
→ Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks and remove the burden from the operator:
  • Add/Delete Host
  • Add Service
  • Add/Delete Component
  • Regenerate Keytabs
  • Disable Kerberos
Combine Authentication with Perimeter Security
Authentication: API Security with Knox
Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST API without Kerberos complexities:
• Integrated with existing IdM systems
• Single, simple point of access for a cluster
• Centralized and consistent secure API across one or more clusters
Features:
• Eliminates the SSH “edge node”
• Central API management
• Central audit control
• Service-level authorization
• SSO: SAMLv2, SiteMinder and OAM
• LDAP and AD integration
• SSO for Hadoop UIs (Ranger, Ambari, …)
• Kerberos encapsulation
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
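From a client's point of view, Knox exposes each cluster service under a gateway URL. A minimal sketch, assuming a gateway at the hypothetical `knox.example.com:8443` and a topology named `default` (host, port and topology names depend on the deployment), of building a WebHDFS call routed through Knox with HTTP Basic credentials that Knox checks against LDAP/AD. The request is prepared but not sent.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_url(gateway: str, topology: str, hdfs_path: str, op: str, **params: str) -> str:
    """WebHDFS call proxied by Knox: https://<gateway>/gateway/<topology>/webhdfs/v1<path>?op=..."""
    query = urllib.parse.urlencode({"op": op, **params})
    return f"https://{gateway}/gateway/{topology}/webhdfs/v1{urllib.parse.quote(hdfs_path)}?{query}"


def knox_request(url: str, user: str, password: str) -> urllib.request.Request:
    # The client only presents LDAP/AD credentials over TLS; Knox deals with Kerberos towards the cluster.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


url = knox_webhdfs_url("knox.example.com:8443", "default", "/data/crm", "LISTSTATUS")
req = knox_request(url, "smith", "s3cret")
# To send it (needs a reachable gateway whose certificate is trusted):
#   import ssl
#   with urllib.request.urlopen(req, context=ssl.create_default_context()) as resp:
#       print(resp.read())
print(url)  # https://knox.example.com:8443/gateway/default/webhdfs/v1/data/crm?op=LISTSTATUS
```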
What does “Perimeter Security” really mean?
(Diagram: user → firewall → Knox Gateway (REST API) → firewall → Hadoop services (REST API))
• Firewall required at the perimeter (today)
• The firewall only allows connections through specific hosts & ports, forcing requests through Knox
• Knox Gateway controls Hadoop REST API access through the firewall
• Hadoop cluster mostly unaffected
• Knox doesn’t see or control what happens inside the cluster
SG DATA – DATA ACCESS
(Diagram: applications inside and outside the cluster accessing SWEBHDFS, Solr, Kafka and Oozie over SSL)
• Authentication methods shown: user / password, and keytab / JAAS configuration
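For an application that authenticates with a keytab and a JAAS configuration, a Java client such as a Kafka producer is typically pointed at a JAAS file through the JVM option `-Djava.security.auth.login.config`. A minimal sketch that renders such a file plus matching Kafka client properties; the principal, keytab path and truststore path are hypothetical placeholders.

```python
def kafka_client_jaas(principal: str, keytab_path: str) -> str:
    """Render a JAAS "KafkaClient" section that logs in from a keytab (no password prompt)."""
    return (
        "KafkaClient {\n"
        "  com.sun.security.auth.module.Krb5LoginModule required\n"
        "  useKeyTab=true\n"
        "  storeKey=true\n"
        f'  keyTab="{keytab_path}"\n'
        f'  principal="{principal}";\n'
        "};\n"
    )


def kafka_client_properties(truststore: str) -> dict:
    # SASL (Kerberos) for authentication, SSL for encryption on the wire
    return {
        "security.protocol": "SASL_SSL",
        "sasl.kerberos.service.name": "kafka",
        "ssl.truststore.location": truststore,
    }


print(kafka_client_jaas("svc_app@EXAMPLE.COM", "/etc/security/keytabs/svc_app.keytab"))
for key, value in kafka_client_properties("/etc/security/truststore.jks").items():
    print(f"{key}={value}")
```

The keytab file then becomes the secret to protect: it should be readable only by the application's service account.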
Centralize Administration, Authorization & Auditing
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
  • HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible architecture
  • Custom policy conditions, user context enrichers
  • Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support for multiple destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support for HDFS Transparent Data Encryption
• Integration with HSM
  • SafeNet Luna
Ranger Architecture
(Diagram: enterprise users and legacy tools / data governance connect to the Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server and integration API; a Ranger plugin runs inside each Hadoop component: HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas; HDFS and Solr appear as audit stores)
Ranger – ABAC Model
⬢ ABAC model
  ⬢ Combination of the subject, action, resource, and environment
  ⬢ Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
⬢ Ranger’s approach is consistent with NIST SP 800-162
⬢ Avoids role proliferation and manageability issues
Ref: https://nvlpubs.nist.gov/nistpubs/specialpublications/NIST.sp.800-162.pdf
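The ABAC idea reduces to "a decision is a function of subject, action, resource and environment attributes". A minimal, hypothetical sketch with deny-overrides semantics and default deny; Ranger's own evaluation (deny items, exclusions, policy priorities) is richer than this, and the groups, tags and country rule below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Request:
    user: str
    groups: FrozenSet[str]                   # subject attributes, e.g. AD groups
    action: str                              # e.g. "select", "update"
    resource_tags: FrozenSet[str]            # resource attributes, e.g. Atlas classifications
    env: dict = field(default_factory=dict)  # environment attributes, e.g. {"country": "FR"}


@dataclass
class Rule:
    effect: str                         # "allow" or "deny"
    matches: Callable[[Request], bool]


def decide(rules: List[Rule], req: Request) -> bool:
    effects = {r.effect for r in rules if r.matches(req)}
    if "deny" in effects:      # deny overrides allow
        return False
    return "allow" in effects  # nothing matched: default deny


RULES = [
    Rule("allow", lambda r: "analysts" in r.groups and r.action == "select"),
    Rule("deny", lambda r: "PII" in r.resource_tags and r.env.get("country") != "FR"),
]

req = Request("smith", frozenset({"analysts"}), "select", frozenset({"PII"}), {"country": "US"})
print(decide(RULES, req))  # False: PII accessed from outside the allowed country
```

Because rules test attributes rather than named roles, adding a new PII table or a new analyst does not require a new rule, which is the "avoid role proliferation" point above.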
Apache Ranger: Comprehensive Extensible Authorization
⬢ Comprehensive coverage across Hadoop ecosystem components
⬢ Plugins are resident with the component they protect
⬢ Extensible plugin model: plugins for authorizing other sources can be built
Apache Ranger - Intuitive and Granular Policy Management
⬢ Simple, intuitive UI for policy editing and setup
⬢ Fine-grained specificity by resource type, user context, tags, and operation
⬢ Supports access, tag-based, dynamic data masking, and row filtering policy types
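Policies created in the UI can also be managed through Ranger Admin's public REST API (`/service/public/v2/api/policy`). A minimal sketch that builds a Hive access policy allowing `select` on a few columns and prepares, but does not send, the POST request. The Ranger URL, the service name `cl1_hive`, the database, columns, groups and credentials are hypothetical; check the field names against the Ranger version in use.

```python
import base64
import json
import urllib.request


def hive_select_policy(service, name, database, tables, columns, groups):
    """Access policy: members of `groups` may run SELECT on the listed columns."""
    return {
        "service": service,  # name of the Hive service defined in Ranger
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": tables},
            "column": {"values": columns},
        },
        "policyItems": [
            {"groups": groups, "accesses": [{"type": "select", "isAllowed": True}]}
        ],
    }


def create_policy_request(ranger_url, policy, user, password):
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{ranger_url}/service/public/v2/api/policy",
        data=json.dumps(policy).encode(),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {auth}"},
    )


policy = hive_select_policy("cl1_hive", "crm_analysts_select", "crm",
                            ["customers"], ["id", "segment"], ["crm_analysts"])
req = create_policy_request("http://ranger.example.com:6080", policy, "admin", "changeme")
# urllib.request.urlopen(req) would submit it to a running Ranger Admin.
```

Keeping policies as code like this makes them reviewable and reproducible across environments, while the UI remains the place to inspect them.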
Apache Ranger Audits - Data Access
⬢ Comprehensive, scalable audit logging
⬢ Audits for:
  ⬢ Resource access events with user context
  ⬢ Policy edits/creation/deletion
  ⬢ User session information
  ⬢ Component plugin policy sync operations
Row Level Security in Hive
Control Access to Rows in Hive Tables Based on Context!
Goal: Improve reliability and robustness of HDP by providing row-level security for Hive tables and reducing the surface area of the security system
⬢ Capabilities
  – Restrict data row access based on:
    – user characteristics (e.g. group membership) AND
    – runtime context
⬢ Use Cases:
  • A hospital can create a security policy that allows doctors to view data rows only for their own patients
  • A bank can create a policy to restrict access to rows of financial data based on the employee’s business division, locale, or role
  • A multi-tenant application can create logical separation of each tenant’s data, so that each tenant can see only its own data rows
⬢ Core Technologies: Ranger, Hive
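Conceptually, a row-filter policy attaches a predicate to a table for a given user or group, and Hive applies it to every query, much like an extra WHERE clause (in Ranger the filter itself is written as a SQL expression). The hospital use case can be sketched in Python as follows. The table, groups and predicates are hypothetical; the sketch applies the first matching group's filter and returns no rows when no filter applies, which is a choice made here rather than documented Ranger behavior.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
RowFilter = Callable[[Row, str], bool]

# Hypothetical filters for a "patients" table, checked in this order.
# A Ranger row-filter policy would express the "doctors" filter as a SQL condition.
PATIENT_FILTERS: List[Tuple[str, RowFilter]] = [
    ("auditors", lambda row, user: True),                     # auditors see every row
    ("doctors", lambda row, user: row["doctor_id"] == user),  # doctors see their own patients
]


def visible_rows(rows: Iterable[Row], user: str, groups: set) -> List[Row]:
    for group, predicate in PATIENT_FILTERS:
        if group in groups:
            return [row for row in rows if predicate(row, user)]
    return []  # no matching filter: this sketch hides everything


PATIENTS = [
    {"patient": "p1", "doctor_id": "dr_grey"},
    {"patient": "p2", "doctor_id": "dr_house"},
    {"patient": "p3", "doctor_id": "dr_grey"},
]
print([r["patient"] for r in visible_rows(PATIENTS, "dr_grey", {"doctors"})])  # ['p1', 'p3']
```

The point of doing this in the engine rather than in each application is that every client (Beeline, BI tools, notebooks) gets the same filtered view.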
Dynamic Data Masking of Hive Columns
Protect Sensitive Data in Real Time with Dynamic Data Masking/Obfuscation!
Goal: Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) in Hive query output
⬢ Benefits
  – Sensitive information never leaves the database
  – No changes are required at the application or Hive layer
  – No need to produce additional protected duplicate versions of datasets
  – Simple & easy to set up masking policies
⬢ Core Technologies: Ranger, Hive
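To see what a masked column looks like, here is a minimal Python re-implementation modeled on Hive's built-in `mask`, `mask_show_first_n` and `mask_show_last_n` functions, which, to our understanding, back several of Ranger's Hive masking options. By default those functions turn upper-case letters into `X`, lower-case letters into `x` and digits into `n`, and leave other characters alone. This is an illustration of the output format, not the Hive code.

```python
def mask(value: str, upper: str = "X", lower: str = "x", digit: str = "n") -> str:
    """Replace letters and digits; keep every other character (separators, spaces, ...)."""
    out = []
    for ch in value:
        if "A" <= ch <= "Z":
            out.append(upper)
        elif "a" <= ch <= "z":
            out.append(lower)
        elif "0" <= ch <= "9":
            out.append(digit)
        else:
            out.append(ch)
    return "".join(out)


def mask_show_last_n(value: str, n: int = 4) -> str:
    """Mask everything except the last n characters."""
    if n <= 0:
        return mask(value)
    return mask(value[:-n]) + value[-n:]


def mask_show_first_n(value: str, n: int = 4) -> str:
    """Mask everything except the first n characters."""
    if n <= 0:
        return mask(value)
    return value[:n] + mask(value[n:])


card = "4111-1111-1111-1234"
print(mask("John.Smith"))       # Xxxx.Xxxxx
print(mask_show_last_n(card))   # nnnn-nnnn-nnnn-1234
print(mask_show_first_n(card))  # 4111-nnnn-nnnn-nnnn
```

Because the masking happens at query time, the stored value is unchanged and another user with a different policy can still see it in clear.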
SG DATA – RANGER USAGE
• 400 users in the dev environment
• 10s of technical accounts in production
• 2000+ Ranger rules
Protect Your Most Valuable Asset
Data Protection
Data protection in Hadoop can be applied at three different layers in HDP:
• Storage: encrypt data while it is at rest
  Transparent Data Encryption in HDFS, Ranger KMS + HSM, partner products (HPE Voltage, Protegrity, Dataguise)
• Transmission: encrypt data over the wire when it leaves the cluster
  SASL (RPC) and TLS (HTTP) for intracluster communication
• Upon access: apply restrictions when data is accessed
  Ranger (dynamic column masking + row filtering), partner masking + encryption
Data Protection – Layered Approach
• Encryption of Data at Rest
  – OS-level encryption (LUKS)
  – Certified partners for volume encryption (e.g. Vormetric (Thales), Protegrity, HPE Voltage Security)
  – HDFS TDE file/folder-level encryption with keys managed by Ranger KMS; external HSM integration
• Encryption of Data on the Wire
  – All wire protocols can be encrypted by the HDP platform
  – Wire-level encryption enhancements (SSL)
• Granular Data Protection
  – Dynamic masking + row filtering for Hive with Ranger
  – Classification-based security with Ranger + Atlas
  – Element-level encryption/masking from certified partners (HPE Voltage, Protegrity)
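In an Ambari-managed HDP cluster these settings are normally changed through Ambari, but underneath they are ordinary Hadoop properties. A minimal sketch that renders properties commonly associated with wire encryption in the Hadoop `*-site.xml` format: `hadoop.rpc.protection=privacy` (SASL-encrypted RPC), `dfs.encrypt.data.transfer=true` (encrypted DataNode data transfer) and `HTTPS_ONLY` HTTP policies. It is not a complete configuration: HTTPS also needs keystores and truststores (e.g. in `ssl-server.xml`), which are omitted here.

```python
import xml.etree.ElementTree as ET

WIRE_ENCRYPTION = {
    "core-site.xml": {"hadoop.rpc.protection": "privacy"},  # SASL: authenticate + encrypt RPC
    "hdfs-site.xml": {
        "dfs.encrypt.data.transfer": "true",                # DataNode block transfer
        "dfs.http.policy": "HTTPS_ONLY",                    # NameNode/DataNode web endpoints
    },
    "yarn-site.xml": {"yarn.http.policy": "HTTPS_ONLY"},    # ResourceManager/NodeManager web endpoints
}


def to_hadoop_xml(props: dict) -> str:
    """Render properties in the <configuration><property><name/><value/></property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


for filename, props in WIRE_ENCRYPTION.items():
    print(f"<!-- {filename} -->")
    print(to_hadoop_xml(props))
```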
Ranger KMS: Transparent Data Encryption in HDFS
(Diagram: an HDFS client, the NameNode and three DataNodes holding blocks A, B, C, D; Ranger KMS backed by a SafeNet Luna HSM)
Benefits
• Selective encryption of relevant files/folders
• Prevents rogue admin access to sensitive data
• Fine-grained access controls
• Transparent to the end application, without changes
• Ranger KMS integrates with an external HSM (SafeNet Luna), adding to the reliability/security of the KMS
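HDFS TDE uses envelope encryption: each encryption zone has a zone key held by the KMS, each file gets its own data encryption key (DEK), and the NameNode only stores the DEK encrypted under the zone key (the EDEK). Clients ask the KMS to decrypt the EDEK and then encrypt or decrypt the data themselves, which is why an HDFS administrator without KMS permissions cannot read the plaintext. The toy model below shows that key flow with the Python standard library only. HDFS uses AES-CTR; the HMAC-SHA256 counter-mode keystream here is a stand-in that provides no integrity protection, and the class, key and user names are hypothetical. Do not use it to protect real data.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Counter-mode keystream from HMAC-SHA256 used as a PRF (toy stand-in for AES-CTR)
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))


class ToyKMS:
    """Keeps zone keys to itself; only wraps and unwraps per-file DEKs."""

    def __init__(self) -> None:
        self._zone_keys = {}
        self._decrypt_acl = {}

    def create_zone_key(self, name: str, allowed_users: set) -> None:
        self._zone_keys[name] = os.urandom(32)
        self._decrypt_acl[name] = set(allowed_users)

    def generate_edek(self, name: str) -> bytes:
        # A fresh DEK per file; only its encrypted form (EDEK) leaves the KMS
        return encrypt(self._zone_keys[name], os.urandom(32))

    def decrypt_edek(self, name: str, edek: bytes, caller: str) -> bytes:
        if caller not in self._decrypt_acl[name]:
            raise PermissionError(f"{caller} may not use key {name}")
        return decrypt(self._zone_keys[name], edek)


kms = ToyKMS()
kms.create_zone_key("finance_key", allowed_users={"alice"})
edek = kms.generate_edek("finance_key")               # stored by the NameNode with the file
dek = kms.decrypt_edek("finance_key", edek, "alice")  # done by alice's HDFS client
stored = encrypt(dek, b"salary data")                 # what the DataNodes hold
print(decrypt(dek, stored))                           # b'salary data'
```

Keeping the zone keys inside the KMS (and, with an HSM, inside hardware) is what separates the people who administer storage from the people who can read the data.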
Integrate Governance & Security
Atlas: Open Metadata & Governance Services
Provides metadata-driven core foundational governance services for Hadoop and the enterprise data ecosystem
(Diagram: Atlas metadata at the center, connected to Kafka, Storm, Sqoop, Hive, HBase, HDFS, Ranger and partners, and to structured, traditional metadata sources such as RDBMS and MPP appliances)
Data Lineage/Provenance
• Captures data lineage across components
Data Classification
• Supports classification of data assets using tags: PII, PHI, PCI, EXPIRES_ON, CLAIMS, LIFE_INSURANCE
Metadata Catalog Search
• Free-text search on metadata
• Advanced search using DSL
Integrations
• Out-of-the-box real-time metadata and lineage ingestion with Hive, Sqoop, Storm/Kafka
• APIs for custom metadata ingestion
• Apache Ranger integration for classification-based security
Metadata Repository
• Flexible metamodel to capture technical, business and operational metadata
• Out-of-the-box models for Hive, Storm, Sqoop, HDFS, Kafka, HBase
• APIs to register custom models
Background: DGI Community Becomes Apache Atlas
• Dec 2014: DGI group kickoff (Global Financial Company)
• May 2015: Apache Atlas incubation
• Aug 2016: HDP 2.5 / Apache Atlas 0.7 foundation release
• Apr 2017: HDP 2.6 / Apache Atlas 0.8 release
* DGI: Data Governance Initiative
Apache Atlas 0.8 / HDP 2.6
• Simplified search UI
• Simplified APIs
• Classification-based security for HDFS, Kafka, HBase
• Knox SSO
• Performance/scalability improvements
Apache Atlas 0.7.1 / HDP 2.5.3
• High availability support
• LDAP authentication/authorization
• Classification-based security for Hive
• UI redesign
Committers: 35; code contributors from Hortonworks, IBM, Aetna, Merck, Target
High Level Architecture: 4 Key Points
(Diagram: type system, repository, search DSL, graph DB, search, REST API, messaging framework, and bridges/connectors for Hive, Storm, Sqoop, Kafka, Solr and custom sources)
1. Data Lineage: the only product that captures lineage across Hadoop components at platform level
2. Agile Data Modeling: the type system allows custom metadata structures in a hierarchical taxonomy
3. REST API: modern, flexible access to Atlas services, HDP components, UI & external tools
4. Exchange: leverage existing metadata / models by importing them from current tools; export metadata to downstream systems
Scalable: Dynamic Tag-based Access Policy
Key Benefits:
• New, scalable, metadata-based security paradigm
• Dynamic, real-time policy
• Active protection: fast updates to changes
• Centralized and simple-to-manage policy
Tag-based Policies in Ranger
Control Access to Resources Based on Classification (Tagging)
Goal: Separation of resource classification from access authorization
⬢ Capabilities
  – Authorize data access based on resource classification (e.g. sensitive data such as PII, PCI) rather than on the resource type itself
⬢ Benefits:
  • After tagging, authorization for the tag is automatically enforced, so there is no need to create or update policies for the resource
  • Single authorization policy for a tag across various Hadoop components
⬢ Core Technologies: Ranger + Atlas (for tagging)
⬢ Available for: Hive, HDFS, YARN, HBase, Kafka, Storm, Solr, Knox
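The key point is that the policy is attached to the tag, not to each resource. A minimal sketch with a hypothetical resource-to-tag map standing in for classifications synchronized from Atlas, and one policy per tag that applies whatever the component. In this simplification an untagged resource is simply not restricted by the tag layer; resource-based policies would still be evaluated separately.

```python
# Hypothetical (component, resource) -> tags, standing in for Atlas classifications
RESOURCE_TAGS = {
    ("hive", "crm.customers.email"): {"PII"},
    ("hdfs", "/data/crm/exports"): {"PII"},
    ("kafka", "payments"): {"PCI"},
    ("hive", "crm.orders.amount"): set(),
}

# One policy per tag, whatever the component: which groups may access tagged data
TAG_POLICIES = {
    "PII": {"privacy_officers"},
    "PCI": {"payments_team"},
}


def tag_allows(component: str, resource: str, groups: set) -> bool:
    for tag in RESOURCE_TAGS.get((component, resource), set()):
        allowed = TAG_POLICIES.get(tag)
        if allowed is not None and not (groups & allowed):
            return False
    return True  # no tag-based restriction; resource-based policies still apply


print(tag_allows("hive", "crm.customers.email", {"analysts"}))  # False
# Tagging a new resource is enough to protect it; no policy change is needed
RESOURCE_TAGS[("hbase", "crm:contacts")] = {"PII"}
print(tag_allows("hbase", "crm:contacts", {"analysts"}))        # False
```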
Apache Atlas: Connectors and Ecosystem
(Diagram: Apache Atlas with connectors for Apache Kafka, RDBMS, partner tools and custom integrations; some connectors marked “Pending”)
Apache Atlas: Lineage
Lineage
• Where does this data originate from (source/provenance)?
• Upstream path: path through all data assets and processes leading up to the current data asset
Impact
• How is this data being used?
• What other data assets (derivative/dependent) does this impact?
• Downstream path: path through all data assets and processes leading out of the current data asset
Used for forensics
• Impact analysis
• Auditing and compliance
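Atlas records lineage as data assets connected through process entities that have inputs and outputs, so upstream (provenance) and downstream (impact) questions become graph traversals. A minimal sketch over a hypothetical lineage graph, using breadth-first search:

```python
from collections import defaultdict, deque

# Hypothetical lineage: each edge goes from an input to a process, or from a process to an output
EDGES = [
    ("hdfs:/raw/clicks", "process:ingest_clicks"),
    ("process:ingest_clicks", "hive:raw.clicks"),
    ("hive:raw.clicks", "process:daily_agg"),
    ("hive:ref.countries", "process:daily_agg"),
    ("process:daily_agg", "hive:mart.daily_clicks"),
    ("hive:mart.daily_clicks", "process:export"),
    ("process:export", "hdfs:/exports/daily_clicks"),
]

DOWN = defaultdict(list)
UP = defaultdict(list)
for src, dst in EDGES:
    DOWN[src].append(dst)
    UP[dst].append(src)


def _reachable(start: str, graph) -> set:
    """Breadth-first search: every node reachable from start by following graph edges."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream(asset: str) -> set:    # provenance: where does this data come from?
    return _reachable(asset, UP)


def downstream(asset: str) -> set:  # impact: what depends on this data?
    return _reachable(asset, DOWN)


print(sorted(downstream("hive:raw.clicks")))
```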
Apache Atlas: Lineage and Impact
Apache Atlas Classification: Use Case – Access Expiry
Data expiration
• EXPIRES_ON classification with attribute expiry_date
• tax_2009 table tagged with EXPIRES_ON(expiry_date=2016/12/31)
• tax_2010 table tagged with EXPIRES_ON(expiry_date=2017/12/31)
• Apache Ranger policies use the attribute to block access after the expiry date
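The expiry check itself is a date comparison on the classification attribute. A minimal sketch of that logic; in Ranger it is configured as a condition on a tag-based policy, and this sketch only mirrors the comparison. Storing classifications as a dictionary, and treating access on the expiry date itself as still allowed, are assumptions.

```python
from datetime import date

# Classifications from the slide; the dictionary layout is an assumption of this sketch
CLASSIFICATIONS = {
    "tax_2009": {"EXPIRES_ON": {"expiry_date": date(2016, 12, 31)}},
    "tax_2010": {"EXPIRES_ON": {"expiry_date": date(2017, 12, 31)}},
}


def access_allowed(table: str, today: date) -> bool:
    """Deny access once the table's EXPIRES_ON.expiry_date has passed."""
    tag = CLASSIFICATIONS.get(table, {}).get("EXPIRES_ON")
    return tag is None or today <= tag["expiry_date"]


today = date(2017, 6, 1)
print(access_allowed("tax_2009", today), access_allowed("tax_2010", today))  # False True
```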
Apache Atlas Classification: Use Case – REFERENCE_DATA
Security policy enforcement for denying updates on immutable data assets
• REFERENCE_DATA classification associated with the immutable hive_table eu_countries
• Apache Ranger policies block updates on the table for all users except admins
Apache Atlas Classification: Use Case – Attribute-based Authorization
Data quality
• Deny access to the analysts group based on a data quality threshold
Apache Atlas Classification: Use Case – Hierarchy
Security policy enforcement based on classification hierarchy
• Data assets classified as PII will be denied for all contractors
• Data assets classified as FinancePII will be denied for anyone not in the Finance group
• Data assets classified as VendorPII will be denied for anyone not in the Partner Manager group
• Hence contractors will be denied access to data assets classified as PII, FinancePII, and VendorPII
(Diagram: PII as the parent of FinancePII and VendorPII)
  PII: deny contractors
  FinancePII: deny public, except Finance
  VendorPII: deny public, except PartnerManager
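The hierarchy works because a policy on a parent classification also applies to its children. A minimal sketch of that inheritance, using the classifications and rules from the slide; "public" stands for every user, and the data structures are assumptions of the sketch.

```python
# Classification hierarchy from the slide: child -> parent
PARENT = {"PII": None, "FinancePII": "PII", "VendorPII": "PII"}

# Deny rules per classification: (denied group, exempted group or None); "public" = everyone
DENY_RULES = {
    "PII": [("contractors", None)],
    "FinancePII": [("public", "Finance")],
    "VendorPII": [("public", "PartnerManager")],
}


def self_and_ancestors(tag):
    while tag is not None:
        yield tag
        tag = PARENT.get(tag)


def is_denied(tag: str, user_groups: set) -> bool:
    for t in self_and_ancestors(tag):  # rules on PII also apply to its children
        for denied_group, exempt_group in DENY_RULES.get(t, []):
            applies = denied_group == "public" or denied_group in user_groups
            exempt = exempt_group is not None and exempt_group in user_groups
            if applies and not exempt:
                return True
    return False


print(is_denied("FinancePII", {"Finance"}))                 # False
print(is_denied("FinancePII", {"Finance", "contractors"}))  # True, inherited from PII
```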
Metadata Catalog Search: Free Text
Search for a hive_table classified as ‘PII’ with a name starting with ‘emp’
• Filter by data asset type
• Filter by classification
• Search text
  – Wildcards: emp*, *dept*
  – Logical expressions: emp* AND *dept*
Metadata Catalog Search: Advanced
Search for a hive_table named ‘employees’ with owner ‘hive’
• Filter by data asset type
• DSL search with SQL-like syntax
Example: select the columns of the impressions table in the raw database
DSL query string: hive_column where table.name='impressions' and table.db.name = 'raw'
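Both kinds of search are also exposed by the Atlas v2 REST API, under `/api/atlas/v2/search/basic` and `/api/atlas/v2/search/dsl`. A minimal sketch that builds request URLs for the two examples on these slides, assuming an Atlas server at the hypothetical `http://atlas.example.com:21000`; the calls are not sent, and authentication is left out.

```python
import urllib.parse
from typing import Optional

ATLAS = "http://atlas.example.com:21000"  # hypothetical Atlas server


def basic_search_url(type_name: str, classification: Optional[str] = None,
                     query: Optional[str] = None, limit: int = 25) -> str:
    params = {"typeName": type_name, "limit": limit}
    if classification:
        params["classification"] = classification
    if query:
        params["query"] = query
    return f"{ATLAS}/api/atlas/v2/search/basic?{urllib.parse.urlencode(params)}"


def dsl_search_url(dsl: str, limit: int = 25) -> str:
    return f"{ATLAS}/api/atlas/v2/search/dsl?{urllib.parse.urlencode({'query': dsl, 'limit': limit})}"


# Free text: hive_table classified as PII whose name starts with "emp"
print(basic_search_url("hive_table", classification="PII", query="emp*"))
# DSL: columns of the impressions table in the raw database
print(dsl_search_url("hive_column where table.name='impressions' and table.db.name = 'raw'"))
```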
SG DATA – JOURNEY TO DATA GOVERNANCE TOOLING
Timeline: 2015, SG | Data Catalog; 2017, vendors POC; 2018, SG | Data Catalog as a central piece connected with every tool
Capabilities: data quality, data lineage, fetching metastore metadata, exposing datasets in a single dashboard
Wrap-up
Protecting the Elephant in the Castle…
(Diagram: layers of protection: network segmentation and firewalls; Apache Knox; LDAP/AD; Kerberos and wire encryption; Apache Ranger; HDFS encryption)
Typical Flow – SQL Access through Beeline Client
(Diagram: Beeline client → HiveServer2 → HDFS data blocks A, B, C)
Typical Flow – Authenticate through Kerberos
• Log in to Hive using the AD password
• Client gets a service ticket for Hive
• Hive gets a NameNode (NN) service ticket
• Hive creates a MapReduce job using the NN ST
(Components: Beeline client, Active Directory, KDC, HiveServer2, HDFS)
Typical Flow – Add Authorization through Ranger
• Ranger imports users/groups from LDAP (Active Directory)
• Log in to Hive using the AD password
• Hive gets a NameNode (NN) service ticket
• Ranger enforces column-level access control and auditing in HiveServer2, and file-level access control in HDFS
(Components: Beeline client, Active Directory, KDC, HiveServer2, Ranger, HDFS)
Typical Flow – Firewall, Route through Knox Gateway
• Original request with user id/password goes to Apache Knox
• Knox gets a service ticket for Hive
• Knox runs as a proxy user using the Hive ST
• Using the Hive ST, Knox submits the query
• Hive gets a NameNode (NN) service ticket
• Hive creates a MapReduce job using the NN ST
• Client gets the query result
(Components: Beeline client, Apache Knox, Active Directory, KDC, HiveServer2, Ranger, HDFS)
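The same Beeline client connects differently in the direct Kerberos flow and in the Knox flow. A minimal sketch that builds the two JDBC URLs in the forms documented for HiveServer2 and for Knox's Hive endpoint; host names, realm, truststore path, password and topology are hypothetical placeholders. The direct form relies on a valid Kerberos ticket from `kinit`; the Knox form sends LDAP/AD credentials over SSL to the gateway.

```python
import shlex


def hs2_kerberos_url(host: str, realm: str, port: int = 10000, db: str = "default") -> str:
    # The Hive JDBC driver typically substitutes _HOST with the host name from the URL
    return f"jdbc:hive2://{host}:{port}/{db};principal=hive/_HOST@{realm}"


def hs2_knox_url(knox_host: str, truststore: str, truststore_password: str,
                 topology: str = "default", port: int = 8443) -> str:
    # HTTP transport through the gateway, TLS between the client and Knox
    options = [
        "ssl=true",
        f"sslTrustStore={truststore}",
        f"trustStorePassword={truststore_password}",
        "transportMode=http",
        f"httpPath=gateway/{topology}/hive",
    ]
    return f"jdbc:hive2://{knox_host}:{port}/;" + ";".join(options)


direct = hs2_kerberos_url("hs2.example.com", "EXAMPLE.COM")
via_knox = hs2_knox_url("knox.example.com", "/etc/security/gateway.jks", "changeit")
print("beeline -u " + shlex.quote(direct))
print("beeline -u " + shlex.quote(via_knox) + " -n smith -p <password>")
```

Passing `-p` on the command line exposes the password in the process list; letting Beeline prompt for it is safer.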
Typical Flow – Add Wire and File Encryption
(Diagram: the Knox flow above, with SSL on the client, Knox and HiveServer2 connections, SASL on intra-cluster RPC, and HDFS file encryption)
HDP Security: Comprehensive, Complete, Extensible
• Administration: central management and consistent security. Single administrative console to set policy across the entire cluster: Apache Ranger
• Authentication: authenticate users and systems. Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions: Kerberos | Apache Knox
• Authorization: provision access to data. Consistent authorization controls across all Apache components within HDP: Apache Ranger
• Audit: maintain a record of data access. Record of data access events across all components that is consistent and accessible: Apache Ranger
• Data Protection: protect data at rest and in motion. HDFS TDE with Ranger KMS + HSM, Ranger data masking + row filtering, wire encryption + partner solutions
What’s next!?
Hortonworks Platform Security – What Do We Do Today?
• We closely follow and track CVEs via the Apache CVE process, e.g.:
  • https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-33597/Apache-Ranger.html
  • https://cwiki.apache.org/confluence/display/RANGER/Vulnerabilities+found+in+Ranger
• Our security SME team monitors CVEs and files appropriate JIRAs against specific components for remediation via the internal HDPSEC project (not publicly exposed due to the sensitive nature of the information)
• We run Coverity code scans on Apache projects frequently and remediate any critical vulnerabilities found
• We have a process for customers to report critical security vulnerabilities; we create security patches as needed
• Release notes in our public documentation contain a section on CVEs or vulnerabilities fixed in a release
• Tech Alerts are sent to customers upon CVE fixes, with patch instructions, upgrade vehicles and/or maintenance releases
• HP Fortify scans are built into the release engineering process for each release
• Dynamic application testing such as pen testing is done via various tools for each release (Burp Suite, Nikto, nmap, ZAP scanning)
SG DATA – CUSTOM MIDDLEWARE
SG DATA CONSOLE
Questions?
Thank you

More Related Content

What's hot

Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL PipelinesAmazon Web Services
 
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...Amazon Web Services Korea
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
실시간 스트리밍 분석 Kinesis Data Analytics Deep Dive
실시간 스트리밍 분석  Kinesis Data Analytics Deep Dive실시간 스트리밍 분석  Kinesis Data Analytics Deep Dive
실시간 스트리밍 분석 Kinesis Data Analytics Deep DiveAmazon Web Services Korea
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
롯데이커머스의 마이크로 서비스 아키텍처 진화와 비용 관점의 운영 노하우-나현길, 롯데이커머스 클라우드플랫폼 팀장::AWS 마이그레이션 A ...