Metadata Store: Generalized
Entity Database for Intelligence
Services and Machine Learning
Trend	Micro	/	SPN	
@	DataCon.tw	2018
SPN and the Speakers
•  Trend Micro Smart Protection Network (SPN) Team
–  Nearly 10 years of production experience on Hadoop
–  Big data, cloud (AWS), and big data + cloud
Jeff Hung
@github
Scott Miao
@github
The Journey to the Cloud
•  Big-data paradigm shift to AWS for 5 years
[Diagram labels, in slide order: DataLake; Data Ingestion; EMR, Athena, Glue, Batch; Batch Processing; S3, Lambda, Step Functions; Ad-hoc Processing; SQS, SNS, Kinesis; Streaming Processing; Serving Layer; Data Access; API Server]
Serving Layer as-a Service = Metadata Store
Principles	of	Serving	Layer	Design	
•  Storage	Requirements:	
–  Query	to	get	result	instantly	
–  Query	to	get	result	at	once	
•  Define Schema by Needs
–  Row = all info to show on page
–  Optimize by serving criteria
[Diagram labels: Serving Layer, Data Access; View 1, View 2; Storage 1, Storage 2]
Challenge: Fulfill different performance and schema requirements for integrated services
Data	Model	in	API	World	
•  RESTful	API	Design	123	
1.  Use	Nouns	in	URI	
2.  Use	HTTP	Verbs	
3.  Return	Object	
4.  Advanced	Query	
[Diagram: An Object = one Entity («Type», Key) with 1..* Attributes («Name», Value)]
Query by specific attribute?
Query by range? Regex?
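The object model in the diagram can be sketched in a few lines of Python. This is only an illustration: the class and method names (`Entity`, `Attribute`, `get`) are assumptions made for the example, not names from the Metadata Store code.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Attribute:
    name: str   # the «Name» of the attribute, e.g. "detection"
    value: Any  # the attribute Value


@dataclass
class Entity:
    type: str   # the «Type» of the entity, e.g. "file" or "url"
    key: str    # the entity Key
    # One entity owns any number of attributes (1 : *).
    attributes: List[Attribute] = field(default_factory=list)

    def get(self, name: str) -> List[Any]:
        """Return every value stored under the given attribute name."""
        return [a.value for a in self.attributes if a.name == name]


entity = Entity(type="file", key="abc",
                attributes=[Attribute("detection", "Ransomware")])
```

Looking an entity up by key fits this model directly. Finding entities by attribute value, range, or regex needs an index over the attributes, which is the question the slide raises.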
Common Architectural Pattern
[Diagram labels: Application 1, Application 2, Application 3, Application 4; Dimension Tables; Fact Tables; engines shown: Elasticsearch, Aurora, HDFS, HBase, DynamoDB, CloudSearch, SolrCloud, MySQL]
Star Schema
•  A schema design widely used in data warehousing
–  Fact tables: transactional data, i.e. measurements or metrics for a specific event
–  Dimension tables: descriptive attributes, i.e. characteristics to describe and select the fact data
Fact Table vs. Dimension Table
Fact Table                 | Dimension Table
Single Key Query           | Allow Complex Query
Fast & Instant Retrieval   | Range Query may be Slower
Generic but Fixed          | Specific and Optimized
Peta-byte Scale            | Limited to hundreds of GBs
Single Source of Truth     | Multiple Different Indexes
[Diagram: Fact Table -> Propagation -> Dimension Table]
Connect the two storage paradigms by auto propagation to achieve Serving Layer as-a Service
High Level Architecture
[Diagram: Random Writes and Bulk Writes -> Fact Tables (DynamoDB) -> DynamoDB Streams -> Propagator -> Dimension Tables (eventually consistent)]
•  Dimension Table engines: MySQL (RDS), Elasticsearch, CSV on S3
•  Configurations: Dimension Table Schema, Propagation Rule
Metadata Store Applications
•  Single Source of Truth
–  Centrally stores all threat intelligence
–  Ensures retrieval of the latest entity attributes
•  Feature Store and Extractor
–  Flexible feature selection and preparation
–  Feature extraction by auto propagation
•  Fast Service Construction
–  Configurable serving layer storage
–  Standardized data access API
[Diagram labels: Data, ML, Service]
#TrendInsight	
Internals	design	briefing	
FACT	TABLE	DESIGN	
PROPAGATOR	DESIGN	
DIMENSION	TABLE	DESIGN	
RE-IMPORT	DESIGN	
#TrendInsight
Internals design briefing
FACT TABLE DESIGN (DYNAMODB + DYNAMODB STREAMS)
Requirements for Fact table
•  A flexible data schema design for the fact table
–  To store metadata for various types of entities
–  To store values for various types of entities
•  Efficient writes of attributes for a specific entity into petabytes of data
•  Users can also specify a timestamp for attributes (versioning)
•  Efficient reads of entities by given keys from petabytes of data
–  Other criteria include timestamp, attribute name, and attribute value
•  Reliability, availability, operations, etc.
–  An AWS managed service is better
–  AWS takes care of these "-ilities" for us
Fact table	
http://www.slideshare.net/AmazonWebServices/bdt310-big-data-architectural-patterns-and-best-practices-on-aws
	
DynamoDB stream for propagations	
RDS triggers for propagations
The world's shortest intro to DynamoDB
•  DynamoDB is built to support workloads of any scale with predictable, low-latency response times.
•  Yet another distributed NoSQL database
–  Key-value and document store
–  AP in terms of the CAP theorem, like Cassandra (HBase is CP)
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html
http://wikibon.org/wiki/v/21_NoSQL_Innovators_to_Look_for_in_2020
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions
A flexible data schema design on fact table
•  To store metadata for various types of entities
•  To store values for various types of entities
[Diagram: Entity «File», key aaa…, with attributes:]
–  «Filename»: source Src_1, timestamp 2018/9/12, Value
–  «Detection»: source Src_1, timestamp 2018/9/12, Value
–  «FirstSeen»: source Src_2, timestamp 2018/8/17, Value
[Diagram labels: Metadata of an entity: File; Data of an entity: File]
[Diagram: the DynamoDB fact table holds many such entities, e.g. Entity «Url» with key abc…]
A flexible data schema design on fact table
Key (partition key) | Create_ts (sort key)            | Filename                         | Detection                        | FirstSeen                        | Others…
aaa…:file           | 2018/9/12 (10-digit timestamp)  | {"src": "src_1", "value": "…"}   | {"src": "src_1", "value": "…"}   |                                  |
aaa…:file           | 2018/8/17 (10-digit timestamp)  |                                  |                                  | {"src": "src_2", "value": "…"}   |
abc…:url            | …                               | …                                |                                  |                                  |
efg…:email          | …                               | …                                |                                  |                                  |
Fact Table Content: example data in DynamoDB
Efficient writes of attributes for a specific entity into petabytes of data
// InputAttributes API
POST /metadatastore/v1/entities/aaa…/attributes/src_1
{
  "entity_type": "file",
  "attributes": [
    {
      "name": "original_filename",
      "value": "wd.sys"            # no timestamp given: the current timestamp is used
    },
    {
      "name": "download_from",
      "value": "https://www.microsoft.com/zh-tw/",
      "timestamp": 1470280448      # explicitly set 10-digit timestamp
    }
  ]
}
[Diagram: fact table partitions p1, p2, …, pn]
InputAttributes API: POST /metadatastore/v1/entities/abc…/attributes/src_1
1. Calculate hash based on entity ID (partition key)
2. Split items based on timestamp (sort key)
3. Insert item #1
4. Insert item #2
5. Items in one partition are sorted by timestamp (sort key)
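As a rough illustration of the write steps, the sketch below turns one InputAttributes request into fact-table items, one item per timestamp. The function name `build_fact_items` and the item field names `key` and `create_ts` are assumptions based on the example table, not the actual implementation. The hashing in step 1 is done by DynamoDB itself and is not modeled here.

```python
import time
from collections import defaultdict
from typing import Any, Dict, List, Optional


def build_fact_items(entity_key: str, source: str, request: Dict[str, Any],
                     now: Optional[int] = None) -> List[Dict[str, Any]]:
    """Turn one InputAttributes request into fact-table items (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    # Assumed partition key layout "<entity key>:<entity type>", as in the example table.
    partition_key = f"{entity_key}:{request['entity_type']}"
    by_ts: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for attr in request["attributes"]:
        ts = attr.get("timestamp", now)  # fall back to the current time
        by_ts[ts][attr["name"]] = {"src": source, "value": attr["value"]}
    # One item per (partition key, sort key) pair, ordered by sort key.
    return [{"key": partition_key, "create_ts": ts, **attrs}
            for ts, attrs in sorted(by_ts.items())]


request = {
    "entity_type": "file",
    "attributes": [
        {"name": "original_filename", "value": "wd.sys"},
        {"name": "download_from", "value": "https://www.microsoft.com/zh-tw/",
         "timestamp": 1470280448},
    ],
}
items = build_fact_items("aaa", "src_1", request, now=1536710400)
```

The request carries two different timestamps, so it is split into two items that share a partition key and differ in sort key.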
DynamoDB	Streams
•  Captures data modification events in DynamoDB tables
•  In near real time (within seconds), and in the order that the events occurred
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-aws-integrations.html#es-aws-integrations-dynamodb-es
Efficient reads of entities by given keys from petabytes of data
// GetEntityList API
GET /metadatastore/v1/entities
{
  "entities": [
    {
      "key": "aaa...",
      "type": "file"
    },
    {
      "key": "abc…",
      "type": "file"
    }
  ],
  "attributes": [
    {
      "source": "src_1",
      "name": "original_filename"
    },
    {
      "source": "src_1",
      "name": "download_from"
    }
  ]
}
// pseudo code for DynamoDB query
resultSet = // …
for entityKey in entityKeys:
    neededFields = [original_filename, download_from]
    SELECT original_filename, download_from
    FROM fact_table
    WHERE                                   # find items in range, by keys
        partitionKey = entityKey
        AND sortKey <= timestamp
    FILTER                                  # filter by source
        original_filename.src = "src_1"
        AND download_from.src = "src_1"
    // …
    rs_tmp = // items returned by the DynamoDB SDK
    for r_tmp in rs_tmp:
        for f in neededFields:
            if r_tmp.exists(f):
                resultSet.add(r_tmp[f])
                neededFields.remove(f)      # this field is resolved, stop looking for it
return resultSet
[Diagram: partition pn of the fact table and its items]
1. Transform API to DynamoDB queries
2. Get items by partitionKey and sortKey
3. Project attributes from items
4. Filter items by source
5. Arrange DynamoDB results to API results
6. Return API results
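The read steps can be sketched in plain Python over an in-memory list of items. This is a minimal sketch, not the service code: the function name `latest_attributes`, the `as_of` parameter, and the item field names are assumptions, and a real implementation would issue a DynamoDB query instead of sorting a list.

```python
from typing import Any, Dict, List


def latest_attributes(items: List[Dict[str, Any]], wanted: Dict[str, str],
                      as_of: int) -> Dict[str, Any]:
    """Pick the newest value of each wanted attribute from one entity's items.

    items  : fact-table items that share one partition key
    wanted : attribute name -> required source
    as_of  : upper bound for the sort key (create_ts)
    """
    result: Dict[str, Any] = {}
    needed = dict(wanted)
    # Walk from newest to oldest so the first hit per attribute is the latest one.
    for item in sorted(items, key=lambda i: i["create_ts"], reverse=True):
        if item["create_ts"] > as_of:
            continue
        for name in list(needed):
            cell = item.get(name)
            if cell is not None and cell["src"] == needed[name]:
                result[name] = cell["value"]
                del needed[name]  # resolved, stop looking for this attribute
        if not needed:
            break
    return result


items = [
    {"key": "aaa:file", "create_ts": 100,
     "original_filename": {"src": "src_1", "value": "old.sys"}},
    {"key": "aaa:file", "create_ts": 200,
     "original_filename": {"src": "src_1", "value": "wd.sys"}},
    {"key": "aaa:file", "create_ts": 300,
     "download_from": {"src": "src_2", "value": "ignored"}},
]
found = latest_attributes(
    items, {"original_filename": "src_1", "download_from": "src_1"}, as_of=250)
```

Lowering `as_of` returns an older version of the same attribute, which is how the timestamp sort key provides versioning.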
#TrendInsight	
Internals	design	briefing	
PROPAGATORS	DESIGN		
(DYNAMODB	STREAMS	+	AWS	KINESIS	CLIENT	LIBRARY)	
Write entities' coordinates to the dimension table
[Diagram: user -> Metadata Store -> Propagators -> a dimension table]
1. The user writes an attribute, e.g. Type: file, Key: abc…, Attr.: detection, Value: "Ransomware", …
2. DynamoDB streams the change to the Propagators
3. The Propagators transform it and write to a dimension table
Propagation Input from DynamoDB Streams
•  Top-level elements: origin/change/edited, key/type
•  2nd-level elements: those as listed in inputs section
[Diagram annotation: Replace these attributes by those in the "change" section.]
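One way to picture the propagation input is as a reshaped DynamoDB Streams record. The sketch below assumes that `origin` is the item before the write, `edited` is the item after it, and `change` holds the attributes that differ. The slide does not spell this out, so treat that reading, the helper names, and the `key` attribute name as assumptions. The `OldImage`/`NewImage` layout and the type tags (`S`, `N`, `M`, ...) follow the DynamoDB Streams record format.

```python
from typing import Any, Dict


def plain(av: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (tag, val), = av.items()
    if tag in ("S", "BOOL"):
        return val
    if tag == "N":
        try:
            return int(val)
        except ValueError:
            return float(val)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: plain(v) for k, v in val.items()}
    if tag == "L":
        return [plain(v) for v in val]
    raise ValueError(f"unsupported DynamoDB type tag: {tag}")


def to_propagation_input(record: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape a stream record into the assumed origin/change/edited layout."""
    data = record["dynamodb"]
    old = {k: plain(v) for k, v in data.get("OldImage", {}).items()}
    new = {k: plain(v) for k, v in data.get("NewImage", {}).items()}
    # Assumed partition key layout "<entity key>:<entity type>".
    entity_key, _, entity_type = new.get("key", old.get("key", "")).rpartition(":")
    change = {k: v for k, v in new.items() if old.get(k) != v}
    return {"key": entity_key, "type": entity_type,
            "origin": old, "change": change, "edited": new}


record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "OldImage": {"key": {"S": "abc:file"}, "create_ts": {"N": "1536710400"}},
        "NewImage": {"key": {"S": "abc:file"}, "create_ts": {"N": "1536710400"},
                     "detection": {"M": {"src": {"S": "src_1"},
                                         "value": {"S": "Ransomware"}}}},
    },
}
result = to_propagation_input(record)
```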
Propagation Output to Dimension table, e.g. Elasticsearch
•  For input into ES
–  ES document (JSON)
–  For searching entities by extra coordinates
•  Primary column
–  To identify the same document in ES for updates
–  _id in Elasticsearch
{
  "_id": "b8e76297d0bfbf889c38cff3e80c1e14de9f7a18",
  "rescan_decision": "Friday Intel...",
  "dump_mip2scan": "NEW VERISIGN DDOS ...",
  "file_census_external": 91232,
  "received_timestamp": "2016-09-23T16:19:35"
}
output:
  engine: ...
  columns:
    - name: _id
      rule: .key
      primary: yes        # marks this column as the primary column
      entity_type: file
      examples: ...
      summary: File SHA1
jq – like sed for JSON
•  Lightweight/flexible command-line JSON processor
jq '.foo'
  Input:  {"foo": 42, "bar": "less interesting data"}
  Output: 42
jq '{user, title: .titles[]}'
  Input:  {"user":"stedolan","titles":["JQ Primer", "More JQ"]}
  Output: {"user":"stedolan", "title": "JQ Primer"}
          {"user":"stedolan", "title": "More JQ"}
jq 'map_values(.+1)'
  Input:  {"a": 1, "b": 2, "c": 3}
  Output: {"a": 2, "b": 3, "c": 4}
$ cat input.json | jq '.foo'
42
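For readers less familiar with jq, the three filters above behave roughly like the following Python functions. The function names are made up for this comparison and nothing here calls jq itself.

```python
from typing import Any, Dict, List


def field_foo(doc: Dict[str, Any]) -> Any:
    """Equivalent of jq '.foo': read one field, None if it is absent."""
    return doc.get("foo")


def user_titles(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Equivalent of jq '{user, title: .titles[]}': one output per title."""
    return [{"user": doc["user"], "title": t} for t in doc["titles"]]


def increment_values(doc: Dict[str, int]) -> Dict[str, int]:
    """Equivalent of jq 'map_values(.+1)': add 1 to every value."""
    return {k: v + 1 for k, v in doc.items()}


foo = field_foo({"foo": 42, "bar": "less interesting data"})
titles = user_titles({"user": "stedolan", "titles": ["JQ Primer", "More JQ"]})
bumped = increment_values({"a": 1, "b": 2, "c": 3})
```

Each jq filter is a single short string, while the Python version needs a function per case. That compactness is what makes jq a good fit for configuration files.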
Use jq in Metadata Store
•  Metadata Store needs a small language – jq
–  One or a few lines which are easy to configure
–  Like how regular expressions are used everywhere
•  As propagation rules:
–  Extract value: from complex attribute input
–  Transform data: shape it to fit dimension table
•  As input validators:
–  Invalid if transformed to (empty, null, or false)
–  Loose assertions instead of strict schema check
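The two roles of a rule can be sketched as follows. To keep the example self-contained it does not call jq. Instead, `eval_path` is a stand-in that understands only dotted paths such as `.key`, a tiny subset of what a real jq rule can express. All function names and the sample input are assumptions for illustration.

```python
from typing import Any, Dict, List


def eval_path(rule: str, doc: Any) -> Any:
    """Evaluate a dotted path like '.change.detection.value' (stand-in for jq)."""
    value = doc
    for part in filter(None, rule.split(".")):
        if not isinstance(value, dict) or part not in value:
            return None  # None plays the role of jq's null here
        value = value[part]
    return value


def propagate(columns: List[Dict[str, Any]], doc: Dict[str, Any]) -> Dict[str, Any]:
    """Propagation rule role: build one dimension-table row from the input."""
    return {col["name"]: eval_path(col["rule"], doc) for col in columns}


def is_valid(rule: str, doc: Dict[str, Any]) -> bool:
    """Validator role: the input is invalid if the rule yields null or false.

    jq's 'empty' (no output at all) cannot occur with this stand-in evaluator.
    """
    result = eval_path(rule, doc)
    return result is not None and result is not False


doc = {"key": "b8e7", "type": "file",
       "change": {"detection": {"src": "src_1", "value": "Ransomware"}}}
columns = [
    {"name": "_id", "rule": ".key", "primary": True},
    {"name": "detection", "rule": ".change.detection.value"},
]
row = propagate(columns, doc)
```

A validator written this way only asserts that something usable comes out of the rule, which is looser than checking the whole input against a schema.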
Internals among DynamoDB -> DynamoDB Streams -> Propagators
[Diagram: DynamoDB partitions p1, p2, p3 -> DynamoDB Streams shards s1, s2, s3 -> Propagator workers -> Dimension table; the workers use the AWS Kinesis Client Library and load configs from S3]
–  ec2_1: Worker_1 (App: file; t_1 for s1, t_2 for s3), Worker_2 (App: url; t_1 for s2, t_2 for s3)
–  ec2_2: Worker_3 (App: file; t_1 for s2), Worker_4 (App: url; t_1 for s1)
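The property the diagram shows is that, for each application, every stream shard is handled by exactly one worker thread. The sketch below reproduces that property with a static round-robin assignment. This is a simplification made for illustration: the Kinesis Client Library balances shards between workers dynamically, and `assign_shards` is a hypothetical helper, not a KCL call.

```python
from typing import Dict, List


def assign_shards(shards: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Spread the shards over the workers of one application, round-robin."""
    assignment: Dict[str, List[str]] = {w: [] for w in workers}
    for i, shard in enumerate(shards):
        assignment[workers[i % len(workers)]].append(shard)
    return assignment


shards = ["s1", "s2", "s3"]
apps = {"file": ["Worker_1", "Worker_3"], "url": ["Worker_2", "Worker_4"]}
# Each application consumes the whole stream independently of the others.
plan = {app: assign_shards(shards, workers) for app, workers in apps.items()}
```

Adding a worker to an application spreads its shards more thinly, which is how the propagators scale out.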
#TrendInsight	
Internals	design	briefing	
DIMENSION	TABLE	DESIGN	(AWS	ELASTICSEARCH)	
Requirements for Dimension table
•  Users can choose various technologies
–  To fulfill their different needs
–  Currently only AWS Elasticsearch is supported
•  Why does a search service come first?
–  A search service helps users find the entities of interest by various coordinates, not only by entity keys
•  Reliability, availability, operations, etc.
–  Efficient reads from terabytes of data
–  AWS Elasticsearch is a managed service
–  Its support is better than that of AWS CloudSearch (Apache Solr)
–  Kibana built in
Dimension table	
http://www.slideshare.net/AmazonWebServices/bdt310-big-data-architectural-patterns-and-best-practices-on-aws
•  Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
https://en.wikipedia.org/wiki/Elasticsearch
Write entities' coordinates to ES
[Diagram: user -> Metadata Store -> Propagators -> AWS Elasticsearch]
1. The user writes an attribute, e.g. Type: file, Key: abc…, Attr.: detection, Value: "Ransomware", …
2. DynamoDB streams the change to the Propagators
3. The Propagators transform it to ES docs and upsert them to ES, e.g. Doc id: abc… with { detection: "Ransomware", … }
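Step 3 can be sketched as building the body of an Elasticsearch update request. `doc_as_upsert` is a standard option of the Elasticsearch Update API. The helper name `to_upsert` and the choice to drop empty columns are assumptions, and actually sending the request is left out.

```python
from typing import Any, Dict, Tuple


def to_upsert(row: Dict[str, Any], primary: str = "_id") -> Tuple[str, Dict[str, Any]]:
    """Split a propagated row into a document id and an update request body."""
    # Assumption: columns without a value are simply left out of the partial document.
    doc = {k: v for k, v in row.items() if k != primary and v is not None}
    # "doc_as_upsert" asks Elasticsearch to create the document when it is missing
    # and to merge the partial document into the existing one otherwise.
    return row[primary], {"doc": doc, "doc_as_upsert": True}


doc_id, body = to_upsert({"_id": "abc", "detection": "Ransomware", "first_seen": None})
```

Because the primary column becomes the document id, repeated writes for the same entity update one document instead of creating duplicates.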
Read entities based on coordinates from ES
[Diagram: user, Metadata Store, AWS Elasticsearch]
1. Query file entities with detection "Ransomware"
2. Return key "abc…" for the file entity
3. Get the file entity by key "abc…"
4. Return the file entity with key "abc…"
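The four steps form a two-stage lookup: the dimension table turns coordinates into keys, and the fact table turns keys into entities. The sketch below shows that flow with in-memory stand-ins for both stores. `find_entities`, `fake_search`, and `fake_get_entity` are hypothetical names, and the `match` query is only one possible way to express the search.

```python
from typing import Any, Callable, Dict, List

SearchFn = Callable[[str, Dict[str, Any]], List[str]]
GetFn = Callable[[str, str], Dict[str, Any]]


def find_entities(search: SearchFn, get_entity: GetFn,
                  entity_type: str, field: str, value: Any) -> List[Dict[str, Any]]:
    """Steps 1-2: search the dimension table; steps 3-4: fetch from the fact table."""
    query = {"query": {"match": {field: value}}}
    keys = search(entity_type, query)
    return [get_entity(entity_type, key) for key in keys]


# In-memory stand-ins for Elasticsearch and the DynamoDB fact table.
ES_DOCS = {"abc": {"detection": "Ransomware"}, "def": {"detection": "Adware"}}
FACTS = {("file", "abc"): {"key": "abc", "type": "file", "detection": "Ransomware"}}


def fake_search(entity_type: str, query: Dict[str, Any]) -> List[str]:
    (field, value), = query["query"]["match"].items()
    return [key for key, doc in ES_DOCS.items() if doc.get(field) == value]


def fake_get_entity(entity_type: str, key: str) -> Dict[str, Any]:
    return FACTS[(entity_type, key)]


entities = find_entities(fake_search, fake_get_entity, "file", "detection", "Ransomware")
```

The fact table stays the single source of truth in this flow, since the dimension table only supplies keys.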
Study metrics visually in Kibana
[Diagram: user, Metadata Store, AWS Elasticsearch]
Example questions: What kinds of sources are there for files? How many of them? etc.
#TrendInsight	
Internals	design	briefing	
REIMPORT	(JOURNAL	LOGS	+	AWS	EMR	+	PROPAGATORS)	
Reimport
•  Blue/green strategy to alter dimension schema
–  Works in all scenarios
–  Other alternatives are more lightweight but more restrictive
•  such as the Elasticsearch reindex API
•  Two-phase reimport process
–  Phase 1: reimport via Map/Reduce
–  Phase 2: reimport via 2nd propagator
•  New components to enable re-import
–  Logstash: keeps the DynamoDB Streams journal log
–  Map/Reduce: runs propagation logic in batch
Overview for re-import
[Diagram: Admin, Metadata Store, Logstash, AWS EMR, AWS Elasticsearch]
1. Data keeps being input
2. Streams also connect to Logstash
3. Logstash dumps journal logs to S3
4. Admin starts re-import
5. Launch AWS EMR for re-import MR jobs
6. Run MR jobs (input: journal logs; output: ES docs)
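Reimport Process
[Diagram, along a time axis t: Input Attributes -> Fact Table (DynamoDB) -> DynamoDB Stream -> Propagator -> D.Table (Elasticsearch); DynamoDB Stream -> LogStash -> Journal Log (S3) -> Map/Reduce after "Start Re-import"; further propagators labeled Propagator' and Propagator'']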
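The two-phase idea can be sketched as an in-memory simulation: a batch replay of the journal builds the new ("green") table up to a cut-off time, and a second propagator catches up with everything written after it. The record shape (`key`, `ts`), the `cutoff` parameter, and the function name `reimport` are assumptions made for this example. The real process runs on EMR and a second propagator as described above.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]
Rule = Callable[[Record], Dict[str, Any]]


def reimport(journal: Iterable[Record], live: Iterable[Record],
             rule: Rule, cutoff: int) -> Dict[str, Dict[str, Any]]:
    """Build a fresh dimension table by applying a new propagation rule."""
    green: Dict[str, Dict[str, Any]] = {}

    def upsert(record: Record) -> None:
        green.setdefault(record["key"], {}).update(rule(record))

    # Phase 1: batch replay of journal records written up to the cut-off time.
    for record in sorted(journal, key=lambda r: r["ts"]):
        if record["ts"] <= cutoff:
            upsert(record)
    # Phase 2: a second propagator catches up with records after the cut-off.
    for record in live:
        if record["ts"] > cutoff:
            upsert(record)
    return green


journal = [
    {"key": "abc", "ts": 1, "detection": "Adware"},
    {"key": "abc", "ts": 2, "detection": "Ransomware"},
]
live = [
    {"key": "abc", "ts": 2, "detection": "Ransomware"},  # already covered by phase 1
    {"key": "def", "ts": 3, "detection": "Trojan"},
]


def new_rule(record: Record) -> Dict[str, Any]:
    """A changed propagation rule, standing in for an altered dimension schema."""
    return {"threat": record["detection"].lower()}


green = reimport(journal, live, new_rule, cutoff=2)
```

The old ("blue") table keeps serving reads while the green one is built, so the switch can happen once the second phase has caught up.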
#TrendInsight	
Wrap	Up	
•  Entity model storage design on DynamoDB
•  Optimized for fast read by keys with versioning
•  Configurable propagation via jq as ETL language
•  Scalable implementation with Kinesis Client Library
•  Elasticsearch as pilot dimension table engine
•  Support Alter Table for development needs
•  Common serving layer architectural pattern
•  Fact → dimension tables by auto propagation
Metadata Store
We are hiring!!
•  Data Engineer
•  Data Scientist
•  Cloud Engineer
We have...
•  Lots of real world data to process
•  Lots of real world ML challenges
http://j.mp/trendmicro-tw-jobs

More Related Content

What's hot

Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
What’s new in Apache Spark 2.3
What’s new in Apache Spark 2.3What’s new in Apache Spark 2.3
What’s new in Apache Spark 2.3DataWorks Summit
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationDataWorks Summit
 
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...confluent
 
Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Brendan Bouffler
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Spark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteSpark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteDatabricks
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data ArchitecturesLynn Langit
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionDataWorks Summit
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server ProLynn Langit
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4Sid Anand
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataDataWorks Summit/Hadoop Summit
 
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Databricks
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnDatabricks
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 

What's hot (20)

Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
What’s new in Apache Spark 2.3
What’s new in Apache Spark 2.3What’s new in Apache Spark 2.3
What’s new in Apache Spark 2.3
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
 
Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Spark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteSpark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynote
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Hyun joong
Hyun joongHyun joong
Hyun joong
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server Pro
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDB
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 

Similar to [DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligence Services and Machine Learning

IoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for InteroperabilityIoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for InteroperabilityMichael Koster
 
Iot Toolkit and the Smart Object API - Architecture for Interoperability
Iot Toolkit and the Smart Object API - Architecture for InteroperabilityIot Toolkit and the Smart Object API - Architecture for Interoperability
Iot Toolkit and the Smart Object API - Architecture for InteroperabilityMichael Koster
 
IoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for InteroperabilityIoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for InteroperabilityMichael Koster
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
Spark Kafka summit 2017
Spark Kafka summit 2017Spark Kafka summit 2017
Spark Kafka summit 2017ajay_ei
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
Spring data presentation
Spring data presentationSpring data presentation
Spring data presentationOleksii Usyk
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
SPCA2013 - SharePoint Nightmares - Coding Patterns and Practices
SPCA2013 - SharePoint Nightmares - Coding Patterns and PracticesSPCA2013 - SharePoint Nightmares - Coding Patterns and Practices
SPCA2013 - SharePoint Nightmares - Coding Patterns and PracticesNCCOMMS
 
Yaetos_Meetup_SparkBCN_v1.pdf
Yaetos_Meetup_SparkBCN_v1.pdfYaetos_Meetup_SparkBCN_v1.pdf
Yaetos_Meetup_SparkBCN_v1.pdfprevota
 
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4GraphAware
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
Data analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenueData analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenueKris Peeters
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
Scaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudScaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudChangshu Liu
 
Punta Dreaming by Luciano Straga #pd17 - Punta del Este, Uruguay
Punta Dreaming by Luciano Straga #pd17 - Punta del Este, UruguayPunta Dreaming by Luciano Straga #pd17 - Punta del Este, Uruguay
Punta Dreaming by Luciano Straga #pd17 - Punta del Este, UruguayLuciano Straga
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformEva Tse
 
High quality ap is with api platform
High quality ap is with api platformHigh quality ap is with api platform
High quality ap is with api platformNelson Kopliku
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 

Similar to [DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligence Services and Machine Learning (20)

IoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for InteroperabilityIoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for Interoperability
 
Iot Toolkit and the Smart Object API - Architecture for Interoperability
Iot Toolkit and the Smart Object API - Architecture for InteroperabilityIot Toolkit and the Smart Object API - Architecture for Interoperability
Iot Toolkit and the Smart Object API - Architecture for Interoperability
 
IoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for InteroperabilityIoT Toolkit and the Smart Object API - Architecture for Interoperability
IoT Toolkit and the Smart Object API - Architecture for Interoperability
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
Spark Kafka summit 2017
Spark Kafka summit 2017Spark Kafka summit 2017
Spark Kafka summit 2017
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Spring data presentation
Spring data presentationSpring data presentation
Spring data presentation
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
SPCA2013 - SharePoint Nightmares - Coding Patterns and Practices
SPCA2013 - SharePoint Nightmares - Coding Patterns and PracticesSPCA2013 - SharePoint Nightmares - Coding Patterns and Practices
SPCA2013 - SharePoint Nightmares - Coding Patterns and Practices
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Yaetos_Meetup_SparkBCN_v1.pdf
Yaetos_Meetup_SparkBCN_v1.pdfYaetos_Meetup_SparkBCN_v1.pdf
Yaetos_Meetup_SparkBCN_v1.pdf
 
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Data analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenueData analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenue
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
Scaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudScaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in Cloud
 
Punta Dreaming by Luciano Straga #pd17 - Punta del Este, Uruguay
Punta Dreaming by Luciano Straga #pd17 - Punta del Este, UruguayPunta Dreaming by Luciano Straga #pd17 - Punta del Este, Uruguay
Punta Dreaming by Luciano Straga #pd17 - Punta del Este, Uruguay
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
High quality ap is with api platform
High quality ap is with api platformHigh quality ap is with api platform
High quality ap is with api platform
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 

[DataCon.TW 2018] Metadata Store: Generalized Entity Database for Intelligence Services and Machine Learning