Let’s think about how we approach projects
Don’t worry
Crossdata
o A new technology that:
• Is not limited by the underlying datastore capabilities
• Leverages Spark to perform operations the datastore does not support natively
• Supports batch and streaming queries
• Supports multiple clusters and technologies
#BDS14
Learn to use Stratio Crossdata 
Developing your first connector 
Daniel Higuero 
dhiguero@stratio.com 
@dhiguero 
Alvaro Agea 
alvaro@stratio.com 
@alvaroagea 
Agenda
• Crossdata architecture
• Crossdata Connectors
• Contest
Our architecture 
Connecting to the outside world
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support:
• Different datastores
• Different processing engines
• Different versions
o Each connector declares its own capabilities
Our planner will choose the best connector for each query
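The capability-driven choice can be pictured with a minimal sketch. It is not Crossdata's planner: the `Operation` enum, the `ConnectorInfo` record and the first-match policy in `ConnectorPlanner.choose` are hypothetical, assuming a connector is eligible when it can reach the target cluster and declares every operation the query needs.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical operation kinds a connector could declare in its manifest.
enum Operation { PROJECT, FILTER, WINDOW, JOIN, ORDER_BY, LIMIT }

// Hypothetical summary of a connector: name, reachable clusters, declared capabilities.
record ConnectorInfo(String name, Set<String> clusters, Set<Operation> capabilities) {}

class ConnectorPlanner {
    // First-match policy: the first connector that reaches the cluster and declares
    // every required operation wins.
    static Optional<ConnectorInfo> choose(List<ConnectorInfo> connectors,
                                          String cluster,
                                          Set<Operation> required) {
        return connectors.stream()
                .filter(c -> c.clusters().contains(cluster))
                .filter(c -> c.capabilities().containsAll(required))
                .findFirst();
    }

    public static void main(String[] args) {
        List<ConnectorInfo> connectors = List.of(
                new ConnectorInfo("native", Set.of("production"),
                        EnumSet.of(Operation.PROJECT, Operation.FILTER, Operation.LIMIT)),
                new ConnectorInfo("spark-based", Set.of("production", "test"),
                        EnumSet.allOf(Operation.class)));

        // A simple filtered read can stay on the native connector...
        System.out.println(choose(connectors, "production",
                EnumSet.of(Operation.PROJECT, Operation.FILTER))
                .map(ConnectorInfo::name).orElse("none"));
        // ...while a join needs a connector that declares JOIN.
        System.out.println(choose(connectors, "production",
                EnumSet.of(Operation.PROJECT, Operation.JOIN))
                .map(ConnectorInfo::name).orElse("none"));
    }
}
```

A real planner would presumably rank the eligible connectors rather than take the first one; the first-match rule only keeps the sketch short.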
Query execution
Parsing → Validation → Planning → Execution → Connector1 | Connector2 | Connector3 → datastore
Our planner will choose the best connector for each query
Multi-cluster support
o Stratio Crossdata offers access to a single catalog across a set of datastores.
• Multiple clusters can coexist to optimize platform performance
§ E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
• Each table is stored in exactly one datastore
Logical and physical mapping
SELECT * FROM app.users;
Logical view: the App catalog contains the Users, Test and old_users tables.
Physical view: each table is stored in one of the clusters: C* production, M development, or other datastores.
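The logical/physical split can be modelled as a catalog that records, for every qualified table name, the single cluster that stores it. This sketch assumes exactly that; `CatalogMapping`, `register` and `clusterFor` are hypothetical names, and the cluster names and table placement in `main` are placeholders, since the slide does not say which table lives where.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical logical-to-physical catalog: each "catalog.table" maps to exactly one cluster.
class CatalogMapping {
    private final Map<String, String> tableToCluster = new HashMap<>();

    // Registers a table; refuses to place the same table in a second cluster.
    void register(String qualifiedTable, String cluster) {
        String previous = tableToCluster.putIfAbsent(qualifiedTable, cluster);
        if (previous != null && !previous.equals(cluster)) {
            throw new IllegalStateException(qualifiedTable + " is already stored in " + previous);
        }
    }

    // Resolves the cluster that physically stores a logical table, if it is known.
    Optional<String> clusterFor(String qualifiedTable) {
        return Optional.ofNullable(tableToCluster.get(qualifiedTable));
    }

    public static void main(String[] args) {
        CatalogMapping catalog = new CatalogMapping();
        // Example placement only.
        catalog.register("app.users", "production");
        catalog.register("app.test", "development");
        catalog.register("app.old_users", "other");

        // SELECT * FROM app.users; is routed to the cluster that owns app.users.
        System.out.println(catalog.clusterFor("app.users").orElse("unknown"));
    }
}
```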
Metadata Management
Metadata in the era of schemaless NoSQL datastores
o Some datastores are schemaless, but our applications are not!
• Flexible schemas vs. schemaless
• Crossdata provides a Metadata Manager that stores schemas for any datasource
§ Remember ODBC and those BI tools?
Metadata management
The Metadata Manager records schemas in the Metadata Store (Infinispan) and forwards metadata operations to the connector of the target cluster (here, C* production).
Updated metadata information is maintained among Crossdata servers using Infinispan.
If the connector does not support metadata operations, those operations are skipped.
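The skip rule can be sketched as follows. Only the idea that a connector's `getMetadataEngine()` may throw `UnsupportedException` comes from the IConnector interface shown later in this deck; everything else is a hypothetical simplification, assuming a plain `Map` in place of the Infinispan-backed Metadata Store and a one-method `MetadataEngine` in place of `IMetadataEngine`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the connector-side exception named in the IConnector interface.
class UnsupportedException extends Exception {
    UnsupportedException(String message) { super(message); }
}

// Hypothetical, heavily simplified metadata engine: only table creation.
interface MetadataEngine {
    void createTable(String cluster, String table, List<String> columns);
}

// Hypothetical connector view: it either exposes a metadata engine or throws.
interface MetadataCapableConnector {
    MetadataEngine getMetadataEngine() throws UnsupportedException;
}

class MetadataManager {
    // Stand-in for the shared metadata store (Infinispan in Crossdata).
    final Map<String, List<String>> store = new HashMap<>();

    // Always records the schema; forwards it to the connector only if supported.
    boolean createTable(MetadataCapableConnector connector, String cluster,
                        String table, List<String> columns) {
        store.put(table, List.copyOf(columns));
        try {
            connector.getMetadataEngine().createTable(cluster, table, columns);
            return true;   // the datastore was updated as well
        } catch (UnsupportedException e) {
            return false;  // operation skipped; the schema still lives in the store
        }
    }

    public static void main(String[] args) {
        MetadataManager manager = new MetadataManager();
        // A connector for a schemaless datastore that exposes no metadata engine.
        MetadataCapableConnector schemaless = () -> {
            throw new UnsupportedException("metadata operations not supported");
        };
        boolean forwarded = manager.createTable(schemaless, "production", "app.users",
                List.of("id", "name"));
        System.out.println(forwarded + " " + manager.store.get("app.users"));
    }
}
```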
ODBC/JDBC
Stratio Crossdata ODBC/JDBC
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it using the Simba SDK
o It opens the full potential of Stratio Crossdata to the outside world
o Currently tested with Tableau, QlikView and MS Excel
One ODBC/JDBC for all datastores!
Connectors
Crossdata Connectors
o The Crossdata core is abstracted from the inner workings of each connector:
• Common IConnector interface
• XML manifests define datastore and connector capabilities
o Each connector may access different datastores
o Each connector supports many clusters of the same datastore technology
Crossdata connectors
Crossdata Server ↔ ConnectorApp (Connector0) → Datastore 0
Crossdata Server ↔ ConnectorApp (Connector1) → Datastore 1
Crossdata Server ↔ ConnectorApp (Connector2) → Datastore 2
Connector interface
o The ConnectorApp abstracts the communication with the server using Akka.
• No worries about transferring data, status, etc.
• It takes an implementation of IConnector and launches the required actors
ConnectorApp → IConnector (IConnectorImpl) → Datastore
IConnector
public interface IConnector {

    /**
     * Get the name of the connector.
     * @return A name.
     */
    String getConnectorName();

    /**
     * Get the names of the datastores supported by the connector.
     * Several connectors may declare the same datastore name.
     * @return The names.
     */
    String[] getDatastoreName();
IConnector (II)
    /**
     * Initialize the connector service.
     * @param configuration The configuration.
     * @throws InitializationException If the connector initialization fails.
     */
    void init(IConfiguration configuration) throws InitializationException;

    /**
     * Connect to a datastore using a set of options.
     * @param credentials The required credentials.
     * @param config The cluster configuration.
     * @throws ConnectionException If the connection could not be established.
     */
    void connect(ICredentials credentials, ConnectorClusterConfig config) throws ConnectionException;
IConnector (III)
    /**
     * Get the storage engine.
     * ...
     */
    IStorageEngine getStorageEngine() throws UnsupportedException;

    /**
     * Get the query engine.
     * ...
     */
    IQueryEngine getQueryEngine() throws UnsupportedException;

    /**
     * Get the metadata engine.
     * ...
     */
    IMetadataEngine getMetadataEngine() throws UnsupportedException;
IMetadataEngine
o Defines operations related to metadata management
createCatalog(ClusterName, CatalogMetadata)
dropCatalog(ClusterName, CatalogName)
createTable(ClusterName, TableMetadata)
alterTable(ClusterName, TableName, AlterOptions)
dropTable(ClusterName, TableName)
createIndex(ClusterName, IndexMetadata)
dropIndex(ClusterName, IndexName)
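To get a feel for this contract, here is a toy in-memory engine for the catalog and table operations. It is a hypothetical sketch: plain strings and column lists replace `ClusterName`, `CatalogMetadata`, `TableMetadata` and the other Crossdata types, the alter and index operations are omitted, and `tableExists` is an extra helper for inspection.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy metadata engine: cluster -> catalog -> table -> column names.
class InMemoryMetadataEngine {
    private final Map<String, Map<String, Map<String, List<String>>>> clusters = new HashMap<>();

    private Map<String, Map<String, List<String>>> catalogs(String cluster) {
        return clusters.computeIfAbsent(cluster, c -> new HashMap<>());
    }

    void createCatalog(String cluster, String catalog) {
        catalogs(cluster).putIfAbsent(catalog, new HashMap<>());
    }

    void dropCatalog(String cluster, String catalog) {
        catalogs(cluster).remove(catalog);
    }

    // A table belongs to a catalog, which must already exist in that cluster.
    void createTable(String cluster, String catalog, String table, List<String> columns) {
        Map<String, List<String>> tables = catalogs(cluster).get(catalog);
        if (tables == null) {
            throw new IllegalArgumentException("Unknown catalog: " + catalog);
        }
        tables.put(table, List.copyOf(columns));
    }

    void dropTable(String cluster, String catalog, String table) {
        Map<String, List<String>> tables = catalogs(cluster).get(catalog);
        if (tables != null) {
            tables.remove(table);
        }
    }

    boolean tableExists(String cluster, String catalog, String table) {
        Map<String, List<String>> tables = catalogs(cluster).get(catalog);
        return tables != null && tables.containsKey(table);
    }
}
```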
IStorageEngine
o Defines operations related to writing data
insert(ClusterName, TableMetadata, Row)
insert(ClusterName, TableMetadata, Collection<Row>)
delete(ClusterName, TableName, Collection<Filter>)
update(ClusterName, TableName, Collection<Relation>, Collection<Filter>)
truncate(ClusterName, TableName)
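A toy implementation makes the expected effect of each write operation concrete. This is a hypothetical sketch, not Crossdata code: rows are `Map<String, Object>`, a `Filter` becomes a Java `Predicate`, the `Relation` collection becomes a map of column assignments, tables are addressed by plain strings, and `count` is an extra helper.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy storage engine: each table is a list of rows, each row a column -> value map.
class InMemoryStorageEngine {
    private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

    private List<Map<String, Object>> rows(String cluster, String table) {
        return tables.computeIfAbsent(cluster + "/" + table, k -> new ArrayList<>());
    }

    void insert(String cluster, String table, Map<String, Object> row) {
        rows(cluster, table).add(new HashMap<>(row));  // store a mutable copy
    }

    void insert(String cluster, String table, Collection<Map<String, Object>> batch) {
        batch.forEach(row -> insert(cluster, table, row));
    }

    // Deletes rows matching every filter (filters are ANDed; an empty collection matches all).
    void delete(String cluster, String table, Collection<Predicate<Map<String, Object>>> filters) {
        rows(cluster, table).removeIf(row -> filters.stream().allMatch(f -> f.test(row)));
    }

    // Applies the assignments to every row that matches all filters.
    void update(String cluster, String table, Map<String, Object> assignments,
                Collection<Predicate<Map<String, Object>>> filters) {
        for (Map<String, Object> row : rows(cluster, table)) {
            if (filters.stream().allMatch(f -> f.test(row))) {
                row.putAll(assignments);
            }
        }
    }

    void truncate(String cluster, String table) {
        rows(cluster, table).clear();
    }

    int count(String cluster, String table) {
        return rows(cluster, table).size();
    }
}
```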
IQueryEngine
o Defines operations related to querying data
execute(LogicalWorkflow)
asyncExecute(String, LogicalWorkflow, IResultHandler)
stop(String)
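The difference between `execute`, `asyncExecute` and `stop` can be sketched with standard `java.util.concurrent` classes. Everything here is hypothetical: a workflow is reduced to a `Supplier` of rows, `ResultHandler` is a made-up stand-in for `IResultHandler`, and `stop` simply cancels (and interrupts) the task registered under the query identifier.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.function.Supplier;

// Hypothetical callback; it plays the role the slide gives to IResultHandler.
interface ResultHandler {
    void processResult(String queryId, List<String> rows);
    void processException(String queryId, Exception error);
}

// Toy query engine: in this sketch a "workflow" is just a Supplier of result rows.
class AsyncQueryEngine {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final Map<String, FutureTask<Void>> running = new ConcurrentHashMap<>();

    // Synchronous execution: the caller blocks until the rows are available.
    List<String> execute(Supplier<List<String>> workflow) {
        return workflow.get();
    }

    // Asynchronous execution: rows or errors are pushed to the handler later.
    void asyncExecute(String queryId, Supplier<List<String>> workflow, ResultHandler handler) {
        FutureTask<Void> task = new FutureTask<Void>(() -> {
            try {
                handler.processResult(queryId, workflow.get());
            } catch (RuntimeException e) {
                handler.processException(queryId, e);
            }
            return null;
        }) {
            @Override
            protected void done() {
                // Forget the query once it finishes or is cancelled.
                running.remove(queryId, this);
            }
        };
        running.put(queryId, task);  // registered before it can run, so done() always finds it
        executor.execute(task);
    }

    // Cancels a running query and interrupts its thread; false if unknown or already finished.
    boolean stop(String queryId) {
        FutureTask<Void> task = running.get(queryId);
        return task != null && task.cancel(true);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Registering the task before submitting it, and unregistering it from `done()`, avoids leaving stale entries behind when a query finishes very quickly.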
Logical workflows
o Graph representation of a query, composed of two types of logical steps:
§ Transformation step: one input, one output
§ Union step: n inputs, one output
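The two kinds of step map naturally onto a small class hierarchy. This is a hypothetical sketch of the graph shape only, not Crossdata's `LogicalWorkflow` classes: `LogicalStep`, `TransformationStep`, `UnionStep` and the `then` helper are illustrative names, and steps carry just a text label. Initial Project steps read directly from a table, so here they simply have no upstream input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical logical step: a labelled node with exactly one output.
abstract class LogicalStep {
    private final String label;
    private LogicalStep next;

    LogicalStep(String label) { this.label = label; }

    String label() { return label; }

    LogicalStep next() { return next; }

    // Connects this step's single output to a downstream step and returns it for chaining.
    LogicalStep then(LogicalStep downstream) {
        if (next != null) {
            throw new IllegalStateException(label + " already has an output");
        }
        downstream.addInput(this);  // may throw; then nothing has been linked yet
        next = downstream;
        return downstream;
    }

    abstract void addInput(LogicalStep input);

    abstract List<LogicalStep> inputs();
}

// Transformation step: at most one input, one output (Project, Filter, Window, Limit...).
class TransformationStep extends LogicalStep {
    private LogicalStep input;

    TransformationStep(String label) { super(label); }

    @Override
    void addInput(LogicalStep in) {
        if (input != null) {
            throw new IllegalStateException(label() + " accepts a single input");
        }
        input = in;
    }

    @Override
    List<LogicalStep> inputs() { return input == null ? List.of() : List.of(input); }
}

// Union step: n inputs, one output (for example, a Join of two branches).
class UnionStep extends LogicalStep {
    private final List<LogicalStep> inputs = new ArrayList<>();

    UnionStep(String label) { super(label); }

    @Override
    void addInput(LogicalStep in) { inputs.add(in); }

    @Override
    List<LogicalStep> inputs() { return List.copyOf(inputs); }
}

class WorkflowDemo {
    public static void main(String[] args) {
        LogicalStep join = new UnionStep("Join (m.user = t.user)");
        new TransformationStep("Project TWEET").then(new TransformationStep("Filter (hashtag)")).then(join);
        new TransformationStep("Project MENTIONS").then(new TransformationStep("Filter (counter)")).then(join);
        join.then(new TransformationStep("Select (id, user)"));
        System.out.println(join.inputs().size());
    }
}
```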
Building a Logical Workflow
o Consider the following query:
SELECT tweet.id, tweet.user
FROM tweet WITH WINDOW 2 minutes
JOIN mentions ON mentions.user = tweet.user
WHERE mentions.counter > 100 AND tweet.hashtag = '#bds14'
ORDER BY mentions.counter
LIMIT 100
Parsing → Validation → Planning
Building a Logical Workflow - Project 
Identify tables and required fields
Building a Logical Workflow - Project 
For each table, retrieve all columns that are involved in the query
Building a Logical Workflow - Project 
Project TWEET(id, user, hashtag)
Build a Project logical step per table
Building a Logical Workflow - Project 
Project TWEET(id, user, hashtag)
Project MENTIONS(user, counter)
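Gathering the columns per table is a small grouping problem. The helper below is a hypothetical sketch, assuming the parser has already produced every column reference in the query as a `table.column` string; `ProjectBuilder` and `projectsPerTable` are illustrative names, not Crossdata API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ProjectBuilder {
    // Groups every "table.column" reference by table, keeping first-seen order and
    // dropping duplicates. Each map entry corresponds to one Project step.
    static Map<String, Set<String>> projectsPerTable(List<String> columnRefs) {
        Map<String, Set<String>> projects = new LinkedHashMap<>();
        for (String ref : columnRefs) {
            int dot = ref.indexOf('.');
            if (dot < 0) {
                throw new IllegalArgumentException("Unqualified column: " + ref);
            }
            String table = ref.substring(0, dot).toUpperCase(Locale.ROOT);
            String column = ref.substring(dot + 1);
            projects.computeIfAbsent(table, t -> new LinkedHashSet<>()).add(column);
        }
        return projects;
    }

    public static void main(String[] args) {
        // Column references of the example query, in order of appearance
        // (SELECT, JOIN ... ON, WHERE, ORDER BY).
        List<String> refs = List.of("tweet.id", "tweet.user", "mentions.user", "tweet.user",
                "mentions.counter", "tweet.hashtag", "mentions.counter");
        projectsPerTable(refs).forEach((table, columns) ->
                System.out.println("Project " + table + columns));
    }
}
```

Run on the example query, it yields the same two steps as the slide: `Project TWEET[id, user, hashtag]` and `Project MENTIONS[user, counter]`.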
Building a Logical Workflow - Filters 
Next, we add filtering steps as early as possible
Building a Logical Workflow - Filters 
Project TWEET(id, user, hashtag) → Filter (hashtag = '#bds14')
Project MENTIONS(user, counter)
Building a Logical Workflow - Filters 
Project TWEET(id, user, hashtag) → Filter (hashtag = '#bds14')
Project MENTIONS(user, counter) → Filter (counter > 100)
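Adding filters "as early as possible" amounts to classic predicate pushdown: each ANDed condition that touches a single table is attached right after that table's Project step. The sketch below is a hypothetical illustration of that rule, assuming conditions arrive already split on AND and annotated with the tables they reference; `Condition` and `FilterPushdown` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical representation of one ANDed WHERE condition and the tables it references.
record Condition(String text, Set<String> tables) {}

class FilterPushdown {
    final Map<String, List<String>> filtersPerTable = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    // Single-table conditions go right after that table's Project step;
    // conditions spanning several tables must wait until those tables are joined.
    FilterPushdown(List<Condition> conditions) {
        for (Condition c : conditions) {
            if (c.tables().size() == 1) {
                String table = c.tables().iterator().next();
                filtersPerTable.computeIfAbsent(table, t -> new ArrayList<>()).add(c.text());
            } else {
                remaining.add(c.text());
            }
        }
    }

    public static void main(String[] args) {
        FilterPushdown plan = new FilterPushdown(List.of(
                new Condition("counter > 100", Set.of("MENTIONS")),
                new Condition("hashtag = '#bds14'", Set.of("TWEET"))));
        System.out.println(plan.filtersPerTable);
        System.out.println(plan.remaining);
    }
}
```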
Building a Logical Workflow - Window 
Project TWEET → Filter (hashtag = '#bds14') → Window (2 min)
Project MENTIONS → Filter (counter > 100)
Building a Logical Workflow - Join 
Project TWEET → Filter (hashtag = '#bds14') → Window (2 min)
Project MENTIONS → Filter (counter > 100)
Both branches → Join (m.user = t.user)
Building a Logical Workflow – Order By 
Project TWEET → Filter (hashtag = '#bds14') → Window (2 min)
Project MENTIONS → Filter (counter > 100)
Both branches → Join (m.user = t.user) → OrderBy (m.counter)
Building a Logical Workflow – Limit 
Project TWEET → Filter (hashtag = '#bds14') → Window (2 min)
Project MENTIONS → Filter (counter > 100)
Both branches → Join (m.user = t.user) → OrderBy (m.counter) → Limit (100)
Building a Logical Workflow – Select 
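Project TWEET → Filter (hashtag = '#bds14') → Window (2 min)
Project MENTIONS → Filter (counter > 100)
Both branches → Join (m.user = t.user) → OrderBy (m.counter) → Limit (100) → Select (id, user)
The resulting workflow is passed to IQueryEngine.execute().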
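To make the semantics of the final workflow concrete, the sketch below evaluates the same steps over in-memory Java collections. It only illustrates what a connector's `IQueryEngine.execute()` would have to compute; it is not how any Crossdata connector works. The `Tweet`, `Mention`, `Joined` and `Result` records are hypothetical, and the two-minute window is assumed to have been applied already to the tweet list.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row types for the two tables of the example query.
record Tweet(long id, String user, String hashtag) {}
record Mention(String user, long counter) {}
// A joined (tweet, mention) pair and the final projected row.
record Joined(Tweet tweet, Mention mention) {}
record Result(long id, String user) {}

class WorkflowExecution {
    // `tweets` is assumed to hold only the tweets of the current 2-minute window.
    static List<Result> run(List<Tweet> tweets, List<Mention> mentions) {
        // MENTIONS branch: Project -> Filter (counter > 100), indexed by user for a hash join.
        Map<String, List<Mention>> mentionsByUser = mentions.stream()
                .filter(m -> m.counter() > 100)
                .collect(Collectors.groupingBy(Mention::user));

        return tweets.stream()
                // TWEET branch: Project -> Filter (hashtag = '#bds14').
                .filter(t -> "#bds14".equals(t.hashtag()))
                // Join (m.user = t.user): pair each tweet with every matching mention.
                .flatMap(t -> mentionsByUser.getOrDefault(t.user(), List.of()).stream()
                        .map(m -> new Joined(t, m)))
                // OrderBy (m.counter), ascending as in the SQL default.
                .sorted(Comparator.comparingLong((Joined j) -> j.mention().counter()))
                // Limit (100).
                .limit(100)
                // Select (id, user).
                .map(j -> new Result(j.tweet().id(), j.tweet().user()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
                new Tweet(1, "ana", "#bds14"),
                new Tweet(2, "bob", "#bds14"),
                new Tweet(3, "ana", "#other"));
        List<Mention> mentions = List.of(
                new Mention("ana", 250),
                new Mention("bob", 120),
                new Mention("carl", 900));
        run(tweets, mentions).forEach(System.out::println);
    }
}
```

With the sample data, only the two `#bds14` tweets survive the filter, both find a matching mention, and bob's row comes first because its counter (120) is lower than ana's (250).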
Existing connectors
o Native
• Cassandra
• MongoDB
• Aerospike
• Elasticsearch
• Stratio Streaming
o based
• Cassandra
• MongoDB
• Aerospike
• HDFS
Demo - IRC
Crossdata Connector Challenge
Deadline: February 2nd, 2015
More information
stratio.github.io/crossdata
stratio.github.io/crossdata/contest
crossdata.atlassian.net

More Related Content

Similar to Learn to use Stratio Crossdata

Lessons from running AppSync in prod
Lessons from running AppSync in prodLessons from running AppSync in prod
Lessons from running AppSync in prod
Yan Cui
 
ql.io at NodePDX
ql.io at NodePDXql.io at NodePDX
ql.io at NodePDX
Subbu Allamaraju
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)Implementing Your Full Stack App with MongoDB Stitch (Tutorial)
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)
MongoDB
 
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...
Sparkhound Inc.
 
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage ServiceNetwork Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service
Cloudian
 
SH 2 - SES 1 - Stitch_Workshop_TLV.pptx
SH 2 - SES 1 - Stitch_Workshop_TLV.pptxSH 2 - SES 1 - Stitch_Workshop_TLV.pptx
SH 2 - SES 1 - Stitch_Workshop_TLV.pptx
MongoDB
 
MongoDB Stitch Introduction
MongoDB Stitch IntroductionMongoDB Stitch Introduction
MongoDB Stitch Introduction
MongoDB
 
Building Your First App with MongoDB Stitch
Building Your First App with MongoDB StitchBuilding Your First App with MongoDB Stitch
Building Your First App with MongoDB Stitch
MongoDB
 
Virtual training intro to InfluxDB - June 2021
Virtual training  intro to InfluxDB  - June 2021Virtual training  intro to InfluxDB  - June 2021
Virtual training intro to InfluxDB - June 2021
InfluxData
 
MongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB
 
Professional Services Insights into Improving Sitecore XP
Professional Services Insights into Improving Sitecore XPProfessional Services Insights into Improving Sitecore XP
Professional Services Insights into Improving Sitecore XP
SeanHolmesby1
 
Tutorial: Building Your First App with MongoDB Stitch
Tutorial: Building Your First App with MongoDB StitchTutorial: Building Your First App with MongoDB Stitch
Tutorial: Building Your First App with MongoDB Stitch
MongoDB
 
ProjectReport_Subhayu
ProjectReport_SubhayuProjectReport_Subhayu
ProjectReport_Subhayu
Subhayu Chakravorty
 
Moscow MuleSoft meetup May 2021
Moscow MuleSoft meetup May 2021Moscow MuleSoft meetup May 2021
Moscow MuleSoft meetup May 2021
Leadex Systems
 
2Ring Dashboards & Wallboards v8.1
2Ring Dashboards & Wallboards v8.12Ring Dashboards & Wallboards v8.1
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022
InfluxData
 
N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.
Keshav Murthy
 
Discover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 AnalyticsDiscover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 Analytics
Sriskandarajah Suhothayan
 
Introducing Stitch
Introducing Stitch Introducing Stitch
Introducing Stitch
MongoDB
 

Similar to Learn to use Stratio Crossdata (20)

Lessons from running AppSync in prod
Lessons from running AppSync in prodLessons from running AppSync in prod
Lessons from running AppSync in prod
 
ql.io at NodePDX
ql.io at NodePDXql.io at NodePDX
ql.io at NodePDX
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)Implementing Your Full Stack App with MongoDB Stitch (Tutorial)
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)
 
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...
 
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage ServiceNetwork Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service
 
SH 2 - SES 1 - Stitch_Workshop_TLV.pptx
SH 2 - SES 1 - Stitch_Workshop_TLV.pptxSH 2 - SES 1 - Stitch_Workshop_TLV.pptx
SH 2 - SES 1 - Stitch_Workshop_TLV.pptx
 
MongoDB Stitch Introduction
MongoDB Stitch IntroductionMongoDB Stitch Introduction
MongoDB Stitch Introduction
 
Building Your First App with MongoDB Stitch
Building Your First App with MongoDB StitchBuilding Your First App with MongoDB Stitch
Building Your First App with MongoDB Stitch
 
Virtual training intro to InfluxDB - June 2021
Virtual training  intro to InfluxDB  - June 2021Virtual training  intro to InfluxDB  - June 2021
Virtual training intro to InfluxDB - June 2021
 
MongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDB
 
Professional Services Insights into Improving Sitecore XP
Professional Services Insights into Improving Sitecore XPProfessional Services Insights into Improving Sitecore XP
Professional Services Insights into Improving Sitecore XP
 
Tutorial: Building Your First App with MongoDB Stitch
Tutorial: Building Your First App with MongoDB StitchTutorial: Building Your First App with MongoDB Stitch
Tutorial: Building Your First App with MongoDB Stitch
 
ProjectReport_Subhayu
ProjectReport_SubhayuProjectReport_Subhayu
ProjectReport_Subhayu
 
Moscow MuleSoft meetup May 2021
Moscow MuleSoft meetup May 2021Moscow MuleSoft meetup May 2021
Moscow MuleSoft meetup May 2021
 
2Ring Dashboards & Wallboards v8.1
2Ring Dashboards & Wallboards v8.12Ring Dashboards & Wallboards v8.1
2Ring Dashboards & Wallboards v8.1
 
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022
 
N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.
 
Discover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 AnalyticsDiscover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 Analytics
 
Introducing Stitch
Introducing Stitch Introducing Stitch
Introducing Stitch
 

Recently uploaded

Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 

Recently uploaded (20)

Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 

Learn to use Stratio Crossdata

  • 1. Let’s think about how we approach projects
  • 2.
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8
  • 9. 9
  • 10. 10
  • 12. 12
  • 13. Crossdata o A new technology that: • Is not limited by the underlying datastore capabili9es • Leverages Spark to perform non-­‐na9vely supported opera9ons • Supports batch and streaming queries • Supports mul9ple clusters and technologies #BDS1413
  • 14. Learn to use Stratio Crossdata Developing your first connector 14 Daniel Higuero dhiguero@stratio.com @dhiguero Alvaro Agea alvaro@stratio.com @alvaroagea #BDS14
  • 15. • Crossdata architecture • Crossdata Connectors • Contest Agenda #BDS1415
  • 17. Connecting to the outside world o Crossdata defines an IConnector extension interface o User can easily add new connectors to support • Different datastores • Different processing engines • Different versions o Where each connector defines its capabili9es 17 Our planner will choose the best connector for each query #BDS14
  • 18. Query execution o Parsing → Validation → Planning → Execution, where one of several connectors (Connector1, Connector2, Connector3) runs the plan against the datastore. Our planner will choose the best connector for each query.
  • 19. Multi-cluster support o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores. • Multiple clusters can coexist to optimize platform performance § E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc. • A table is saved in a unique datastore
  • 20. Logical and physical mapping o The query SELECT * FROM app.users; is resolved through the App catalog (users, test and old_users tables), whose tables are stored in different datastores: a C* production cluster, an M development cluster and other datastores
  • 22. Metadata in the era of schemaless NoSQL datastores o Some datastores are schemaless, but our applications are not! • Flexible schemas vs. schemaless • Crossdata provides a Metadata Manager that stores schemas for any datasource § Remember ODBC and those BI tools
  • 23. Metadata management o Diagram: Connector, C* production cluster, Metadata Store (Infinispan), Metadata Manager o Updated metadata information is maintained among Crossdata servers using Infinispan o If the connector does not support metadata operations, those are skipped
  • 25. Stratio Crossdata ODBC/JDBC o Well-known interface standard (for BI tools, external apps, …) o We have implemented it using the Simba SDK o It opens the full potential of Stratio Crossdata to the external world o Currently tested with Tableau, Qlikview and MS Excel o One ODBC/JDBC for all datastores!
  • 27. Crossdata Connectors o The Crossdata core is abstracted from the inner workings of each connector. • Common IConnector interface • Use of an XML manifest to define datastore and connector capabilities o Each connector may access different datastores o Each connector supports many clusters of the same datastore technology
  • 28. Crossdata connectors o Diagram: a Crossdata Server connected to three ConnectorApps, running Connector0, Connector1 and Connector2, which access Datastore 0, Datastore 1 and Datastore 2
  • 29. Connector interface o The ConnectorApp abstracts the communication with the server using Akka. • No worries about transferring data, status, etc. • It takes an implementation of IConnector and launches the required actors o Diagram: ConnectorApp → IConnector (implemented by IConnectorImpl) → Datastore
  • 30. IConnector
      public interface IConnector {
          /**
           * Get the name of the connector.
           * @return A name.
           */
          String getConnectorName();
          /**
           * Get the names of the datastores supported by the connector.
           * Several connectors may declare the same datastore name.
           * @return The names.
           */
          String[] getDatastoreName();
  • 31. IConnector (II)
          /**
           * Initialize the connector service.
           * @param configuration The configuration.
           * @throws InitializationException If the connector initialization fails.
           */
          void init(IConfiguration configuration) throws InitializationException;
          /**
           * Connect to a datastore using a set of options.
           * @param credentials The required credentials.
           * @param config The cluster configuration.
           * @throws ConnectionException If the connection could not be established.
           */
          void connect(ICredentials credentials, ConnectorClusterConfig config) throws ConnectionException;
  • 32. IConnector (III)
          /**
           * Get the storage engine.
           * ...
           */
          IStorageEngine getStorageEngine() throws UnsupportedException;
          /**
           * Get the query engine.
           * ...
           */
          IQueryEngine getQueryEngine() throws UnsupportedException;
          /**
           * Get the metadata engine ...
           */
          IMetadataEngine getMetadataEngine() throws UnsupportedException;
  • 33. IMetadataEngine o Defines operations related to metadata management:
      createCatalog(ClusterName, CatalogMetadata)
      dropCatalog(ClusterName, CatalogName)
      createTable(ClusterName, TableMetadata)
      alterTable(ClusterName, TableName, AlterOptions)
      dropTable(ClusterName, TableName)
      createIndex(ClusterName, IndexMetadata)
      dropIndex(ClusterName, IndexName)
  • 34. IStorageEngine o Defines operations related to writing data:
      insert(ClusterName, TableMetadata, Row)
      insert(ClusterName, TableMetadata, Collection<Row>)
      delete(ClusterName, TableName, Collection<Filter>)
      update(ClusterName, TableName, Collection<Relation>, Collection<Filter>)
      truncate(ClusterName, TableName)
  • 35. IQueryEngine o Defines operations related to querying data:
      execute(LogicalWorkflow)
      asyncExecute(String, LogicalWorkflow, IResultHandler)
      stop(String)
  • 36. Logical workflows o Graph representation of a query, composed of two types of logical steps § Transformation step: one input, one output § Union step: n inputs, one output
  • 37. Building a Logical Workflow o Consider the following query, which goes through parsing, validation and planning:
      SELECT tweet.id, tweet.user
      FROM tweet WITH WINDOW 2 minutes
      JOIN mentions ON mentions.user = tweet.user
      WHERE mentions.counter > 100 AND tweet.hashtag = '#bds14'
      ORDER BY mentions.counter
      LIMIT 100
  • 38. Building a Logical Workflow - Project o Identify tables and required fields
  • 39. Building a Logical Workflow - Project o For each table, retrieve all columns that are involved in the query
  • 41. Building a Logical Workflow - Project o Build a Project logical step per table: Project TWEET(id, user, hashtag)
  • 42. Building a Logical Workflow - Project o Project TWEET(id, user, hashtag)
  • 43. Building a Logical Workflow - Project o Project TWEET(id, user, hashtag) o Project MENTIONS(user, counter)
  • 44. Building a Logical Workflow - Filters o Workflow: P TWEET(id, user, hashtag); P MENTIONS(user, counter) o Next, we add filtering steps as soon as possible
  • 45. Building a Logical Workflow - Filters o Workflow: P TWEET(id, user, hashtag) → Filter (hashtag = '#bds14'); P MENTIONS(user, counter)
  • 46. Building a Logical Workflow - Filters o Workflow: P TWEET(id, user, hashtag) → Filter (hashtag = '#bds14'); P MENTIONS(user, counter) → Filter (counter > 100)
  • 47. Building a Logical Workflow - Window o Workflow: P TWEET → Filter (hashtag = '#bds14'); P MENTIONS → Filter (counter > 100)
  • 48. Building a Logical Workflow - Window o Workflow: P TWEET → Filter (hashtag = '#bds14') → Window (2 min); P MENTIONS → Filter (counter > 100)
  • 49. Building a Logical Workflow - Join o Workflow: P TWEET → Filter → Window (2 min); P MENTIONS → Filter (counter > 100)
  • 50. Building a Logical Workflow - Join o Workflow: [P TWEET → Filter → Window (2 min)] and [P MENTIONS → Filter (counter > 100)] → Join (m.user = t.user)
  • 51. Building a Logical Workflow - Order By o Workflow: [P TWEET → Filter → Window (2 min)] and [P MENTIONS → Filter (counter > 100)] → Join
  • 52. Building a Logical Workflow - Order By o Workflow: [P TWEET → Filter → Window (2 min)] and [P MENTIONS → Filter (counter > 100)] → Join → OrderBy (m.counter)
  • 53. Building a Logical Workflow - Limit o Workflow: [P TWEET → Filter → Window (2 min)] and [P MENTIONS → Filter (counter > 100)] → Join → OrderBy (m.counter)
  • 54. Building a Logical Workflow - Limit o Workflow: [P TWEET → Filter → Window (2 min)] and [P MENTIONS → Filter (counter > 100)] → Join → OrderBy (m.counter) → Limit 100
  • 55. Building a Logical Workflow - Select o Workflow: [P TWEET → Filter → Window] and [P MENTIONS → Filter] → Join → OrderBy → Limit 100
  • 56. Building a Logical Workflow - Select o Workflow: [P TWEET → Filter → Window] and [P MENTIONS → Filter] → Join → OrderBy → Limit 100 → Select (id, user) o The complete workflow is passed to IQueryEngine.execute()
  • 57. Existing connectors o Native • Cassandra • MongoDB • Aerospike • Elasticsearch • Stratio Streaming o based • Cassandra • MongoDB • Aerospike • HDFS
  • 60. Crossdata Connector Challenge o Deadline: February 2, 2015
  • 63. More information o stratio.github.io/crossdata o stratio.github.io/crossdata/contest o crossdata.atlassian.net
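The slides say that each connector declares its capabilities (in an XML manifest) and that the planner picks the best connector for each query, but they do not spell out the selection rule. The following Java sketch shows one plausible rule under stated assumptions: the `Operation` enum, the `ConnectorManifest` record, the connector names and the numeric priority are all hypothetical and are not Crossdata's API. A connector qualifies if it serves the target datastore and supports every operation the query needs; among qualifying connectors, the lowest priority value wins.

```java
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical operations a query plan may require (not Crossdata's real enum).
enum Operation { PROJECT, FILTER_INDEXED, FILTER_NON_INDEXED, WINDOW, JOIN, ORDER_BY, LIMIT }

// Hypothetical in-memory form of the capabilities a connector's manifest declares.
record ConnectorManifest(String name, String datastore, Set<Operation> supported, int priority) {}

final class ConnectorSelector {
    private ConnectorSelector() {}

    /**
     * Returns a connector that serves the datastore and supports every required
     * operation; when several qualify, the lowest priority value wins (an assumption).
     */
    static Optional<ConnectorManifest> choose(List<ConnectorManifest> connectors,
                                              String datastore,
                                              Set<Operation> required) {
        return connectors.stream()
                .filter(c -> c.datastore().equals(datastore))
                .filter(c -> c.supported().containsAll(required))
                .min(Comparator.comparingInt(ConnectorManifest::priority));
    }
}

class ConnectorSelectorDemo {
    public static void main(String[] args) {
        List<ConnectorManifest> connectors = List.of(
                // Hypothetical native connector: preferred, but with limited capabilities.
                new ConnectorManifest("cassandra-native", "Cassandra",
                        EnumSet.of(Operation.PROJECT, Operation.FILTER_INDEXED, Operation.LIMIT), 1),
                // Hypothetical Spark-backed connector: supports everything, used as a fallback.
                new ConnectorManifest("cassandra-spark", "Cassandra",
                        EnumSet.allOf(Operation.class), 2));

        Set<Operation> lookup = EnumSet.of(Operation.PROJECT, Operation.FILTER_INDEXED);
        Set<Operation> analytic = EnumSet.of(Operation.PROJECT, Operation.FILTER_NON_INDEXED, Operation.JOIN);

        System.out.println(ConnectorSelector.choose(connectors, "Cassandra", lookup)
                .map(ConnectorManifest::name).orElse("no connector"));   // cassandra-native
        System.out.println(ConnectorSelector.choose(connectors, "Cassandra", analytic)
                .map(ConnectorManifest::name).orElse("no connector"));   // cassandra-spark
    }
}
```

Falling back to a more capable connector only when the preferred one cannot run the query reflects the idea, stated in the editor's notes, that answering a non-optimal query is better than rejecting it.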
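The IConnector, IStorageEngine and IQueryEngine slides describe the contract a connector developer implements: name the connector and its datastores, connect to a cluster, and expose the engines it supports, reporting unsupported engines with an exception. The Java sketch below mirrors that shape with deliberately simplified, hypothetical types (`SimpleConnector`, `SimpleStorageEngine`, `SimpleQueryEngine`, plain `String` cluster names and `Map` rows) so that it compiles on its own. The real interfaces use Crossdata types such as `IConfiguration`, `ClusterName`, `TableMetadata` and `Row`, and a real connector would implement those instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-ins for IStorageEngine, IQueryEngine and IConnector.
interface SimpleStorageEngine {
    void insert(String cluster, String table, Map<String, Object> row);
    void truncate(String cluster, String table);
}

interface SimpleQueryEngine {
    List<Map<String, Object>> scan(String cluster, String table);
}

interface SimpleConnector {
    String getConnectorName();
    String[] getDatastoreName();
    void connect(String cluster, Map<String, String> options);
    SimpleStorageEngine getStorageEngine();

    // Engines a connector does not implement are reported with an exception,
    // in the spirit of the UnsupportedException declared on the IConnector slides.
    default SimpleQueryEngine getQueryEngine() {
        throw new UnsupportedOperationException(getConnectorName() + " has no query engine");
    }
}

// A toy connector that keeps rows in memory and only supports writes.
final class InMemoryConnector implements SimpleConnector, SimpleStorageEngine {
    // cluster -> table -> rows
    private final Map<String, Map<String, List<Map<String, Object>>>> clusters = new HashMap<>();

    @Override public String getConnectorName() { return "InMemoryConnector"; }
    @Override public String[] getDatastoreName() { return new String[] {"InMemory"}; }

    @Override public void connect(String cluster, Map<String, String> options) {
        clusters.putIfAbsent(cluster, new HashMap<>());
    }

    @Override public SimpleStorageEngine getStorageEngine() { return this; }

    @Override public void insert(String cluster, String table, Map<String, Object> row) {
        Map<String, List<Map<String, Object>>> tables = clusters.get(cluster);
        if (tables == null) {
            throw new IllegalStateException("Not connected to cluster " + cluster);
        }
        // Copy the row so later changes made by the caller do not leak into the store.
        tables.computeIfAbsent(table, t -> new ArrayList<>()).add(new HashMap<>(row));
    }

    @Override public void truncate(String cluster, String table) {
        Map<String, List<Map<String, Object>>> tables = clusters.get(cluster);
        List<Map<String, Object>> rows = (tables == null) ? null : tables.get(table);
        if (rows != null) {
            rows.clear(); // drop all rows but keep the table
        }
    }

    /** Helper: number of rows currently stored in a table. */
    int rowCount(String cluster, String table) {
        Map<String, List<Map<String, Object>>> tables = clusters.getOrDefault(cluster, Map.of());
        return tables.getOrDefault(table, List.of()).size();
    }
}
```

Because this toy connector inherits the default `getQueryEngine()`, a planner built on the same idea would never route a read to it, which matches the slides' point that each connector declares only the capabilities it really has.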
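The "Building a Logical Workflow" slides construct the workflow for the tweet/mentions query step by step. The Java sketch below encodes the resulting graph with hypothetical, simplified classes; the accessor names (`getOperation`, `getNextStep`, `getPreviousSteps`) follow the class diagram in the editor's notes, but step labels are plain strings rather than Crossdata's real operation types. Placing the window after the hashtag filter follows the order in which the slides add those steps and is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical logical steps: a transformation step accepts at most one
// input, a union step accepts n inputs; both produce a single output.
abstract class LogicalStep {
    private final String operation;
    protected final List<LogicalStep> previous = new ArrayList<>();
    private LogicalStep next;

    LogicalStep(String operation) { this.operation = operation; }

    String getOperation() { return operation; }
    LogicalStep getNextStep() { return next; }
    List<LogicalStep> getPreviousSteps() { return Collections.unmodifiableList(previous); }

    /** Connects this step's output to {@code consumer} and returns the consumer for chaining. */
    <T extends LogicalStep> T then(T consumer) {
        consumer.addInput(this); // validate first, so a rejected link leaves no trace
        this.next = consumer;
        return consumer;
    }

    abstract void addInput(LogicalStep input);
}

final class TransformationStep extends LogicalStep {
    TransformationStep(String operation) { super(operation); }

    @Override void addInput(LogicalStep input) {
        if (!previous.isEmpty()) {
            throw new IllegalStateException(getOperation() + " accepts a single input");
        }
        previous.add(input);
    }
}

final class UnionStep extends LogicalStep {
    UnionStep(String operation) { super(operation); }

    @Override void addInput(LogicalStep input) { previous.add(input); }
}

final class ExampleWorkflow {
    private ExampleWorkflow() {}

    /** Builds the workflow for the tweet/mentions query and returns its last step. */
    static LogicalStep build() {
        UnionStep join = new UnionStep("Join (m.user = t.user)");
        // Window placed after the filter, following the slides' build order (assumption).
        new TransformationStep("Project TWEET(id, user, hashtag)")
                .then(new TransformationStep("Filter (hashtag = '#bds14')"))
                .then(new TransformationStep("Window (2 min)"))
                .then(join);
        new TransformationStep("Project MENTIONS(user, counter)")
                .then(new TransformationStep("Filter (counter > 100)"))
                .then(join);
        return join.then(new TransformationStep("OrderBy (m.counter)")) // ORDER BY mentions.counter
                .then(new TransformationStep("Limit 100"))
                .then(new TransformationStep("Select (id, user)"));
    }

    /** Prints the graph from the last step back to the Project steps. */
    static void print(LogicalStep step, String indent) {
        System.out.println(indent + step.getOperation());
        for (LogicalStep input : step.getPreviousSteps()) {
            print(input, indent + "  ");
        }
    }

    public static void main(String[] args) {
        print(build(), "");
    }
}
```

Walking backwards from the final Select step reaches the single union step (the join), which is the only node with two inputs; every other node enforces the one-input rule of transformation steps.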

Editor's Notes

  1. Crossdata is a new technology that has been in development for the last few months. It is currently open source under the Apache license, and it has several new features that we think will change how we interact with big data systems. Crossdata avoids the limitations imposed by the underlying datastore. For instance, if a column has not been indexed, we need to think about other methods of retrieving that column. It is important to highlight that our focus is on users and performance. If the user requests some data, it is important to provide that result. We think it is better to answer a non-optimal query than to have a system that forbids it. We will always have time to add a new index if that query becomes the norm. As an example, think about users who were not involved in the database design and interact with the system through business intelligence tools. We use Spark to perform non-natively supported queries in those cases, and we also use it when we need to mix data coming from streaming sources with batch data. Moreover, our design allows us to access several clusters and technologies at the same time.
  2. From the architectural point of view, we can define three main layers. On the top, we find the driver, which is used to build custom applications through its Java/Scala API. We also have an ODBC connector for external tools, and we provide a REST API. In the middle, we have our server component, which contains the core logic and the distributed capabilities. On the bottom, since the architecture is generic, we employ a connector-based approach that allows the system to be extended to communicate with any datastore. Communication between layers is accomplished using a Scala actor framework.
  3. With respect to the connector approach, we have defined a Java interface that contains the set of operations that an ideal connector should provide. Notice that our design simplifies connector development, so users can easily add their datastore to the Crossdata ecosystem. Each developer defines the capabilities of their connector, and the Crossdata planner chooses the best connector depending on the query. Several connectors to the same datastore can coexist at the same time.
  4. To clarify this, our query execution path is pretty similar to existing ones, with the addition of a connector selection step after the query plan is determined.
  5. Another nice feature is multi-cluster support. Many times we have seen an application with two completely different types of access. It would be nice to tune its database for each of them, but if both types of access target the same cluster this is usually impossible, as you need to decide, for instance, whether your workload is write-oriented or read-oriented. With Crossdata, a common logical view is provided to the user independently of which cluster stores a particular table. We only impose the limitation that a table name can be found in a single cluster.
  6. To illustrate this, let us imagine a scenario with three clusters: one production Cassandra cluster, one development cluster, and another cluster with an old version of the database or other technologies. In this case, if we submit the following query, Crossdata will determine which datastore persists the users table, and it will retrieve a result set without requiring the user to know anything about the physical deployment.
  7. Now let’s focus on metadata and how we can solve its management problem when we have different technologies.
  8. The first question that arises when we talk about metadata is how we are going to be compatible with schemaless approaches. It is true that some datastores are schemaless, but it is also true that our applications need to know which fields should be queried and shown to the users. For us, the key difference is not whether a datastore is schemaless; it is about providing the user with a flexible way of updating existing schemas. Crossdata stores schemas for any datasource, whether or not it is schemaless. Remember that other applications, like business intelligence tools, do exist, and they require schemas.
  9. To show how it is done: even though Cassandra already has a schema (… thanks for that …), Crossdata stores the schema itself and shares it among the existing Crossdata servers. This is also important to reduce metadata-related queries to the underlying datastore (for instance, during query validation).
  10. And finally, it is time for the ODBC.
  11. We have developed an ODBC driver that retrieves data from Crossdata, which allows us to integrate with different applications and business intelligence tools. To mention some of them, we currently support Tableau, Qlikview and Excel. It is important to highlight that, given the generic nature of Crossdata, this ODBC driver opens the possibility of connecting any datastore through ODBC just by writing a new connector. And believe us when we tell you that it is easy to do so.
  12. And finally, it is time for the ODBC.
  13. Class diagram of the logical steps, in yUML notation (http://yuml.me/diagram/scruffy/class/draw): [LogicalStep|getOperation();getNextStep();getPreviousSteps()]<>-orders*> [TransformationStep] [UnionStep] [LogicalStep]^-[TransformationStep] [LogicalStep]^-[UnionStep] [TransformationStep]^-[GroupBy] [UnionStep]^-[Join] [TransformationStep]^-[PartialResults] [TransformationStep]^-[Window] [TransformationStep]^-[Limit] [TransformationStep]^-[Filter] [TransformationStep]^-[Select] [TransformationStep]^-[Project]
  14. What about our future research and development lines?
  15. What about our future research and development lines?
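Notes 5 and 6 describe one logical catalog spread over several clusters, with a single restriction: a table name can live in only one cluster. A minimal Java sketch of that lookup, assuming a hypothetical `LogicalCatalog` class keyed by qualified table names; the cluster names are illustrative, loosely following the production/development example, and the real Crossdata catalog is certainly richer than a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical logical catalog: each qualified table name is bound to exactly one cluster.
final class LogicalCatalog {
    private final Map<String, String> tableToCluster = new HashMap<>();

    /** Registers a table; rejects a name that is already stored in any cluster. */
    void register(String qualifiedTable, String cluster) {
        String existing = tableToCluster.putIfAbsent(qualifiedTable, cluster);
        if (existing != null) {
            throw new IllegalStateException(qualifiedTable + " is already stored in " + existing);
        }
    }

    /** Resolves which cluster must serve a query on the given table, if the table is known. */
    Optional<String> clusterFor(String qualifiedTable) {
        return Optional.ofNullable(tableToCluster.get(qualifiedTable));
    }
}

class LogicalCatalogDemo {
    public static void main(String[] args) {
        LogicalCatalog catalog = new LogicalCatalog();
        catalog.register("app.users", "cassandra-production");
        catalog.register("app.test", "development");
        catalog.register("app.old_users", "legacy");
        // SELECT * FROM app.users; the user never names the cluster.
        System.out.println(catalog.clusterFor("app.users").orElse("unknown table")); // cassandra-production
    }
}
```

Rejecting duplicate names at registration time is what keeps the lookup unambiguous: a query that only says `app.users` can always be routed to exactly one cluster.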
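Notes 8 and 9, together with the metadata management slide, say that Crossdata keeps a schema for every table (even on schemaless datastores), shares it among servers through Infinispan, and skips metadata operations on connectors that do not support them. The Java sketch below shows only that control flow; `TableSchema`, `MetadataCapableConnector` and `MetadataManager` are hypothetical names, and a local `ConcurrentHashMap` stands in for the replicated Infinispan store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical schema description: column name -> type name.
record TableSchema(String table, Map<String, String> columns) {}

// Hypothetical hook implemented only by connectors whose datastore accepts metadata operations.
interface MetadataCapableConnector {
    void createTable(TableSchema schema);
}

final class MetadataManager {
    // Local stand-in for the shared store; in Crossdata, Infinispan keeps it in sync across servers.
    private final Map<String, TableSchema> store = new ConcurrentHashMap<>();

    /**
     * Always records the schema, then forwards the operation to the connector only when
     * it supports metadata operations. Returns true if the datastore was also updated.
     */
    boolean createTable(TableSchema schema, Object connector) {
        store.put(schema.table(), schema);
        if (connector instanceof MetadataCapableConnector capable) {
            capable.createTable(schema);
            return true;
        }
        return false; // e.g. a schemaless datastore: the operation is skipped
    }

    /** Lets validation read the schema without querying the underlying datastore. */
    Optional<TableSchema> schemaOf(String table) {
        return Optional.ofNullable(store.get(table));
    }
}

class MetadataManagerDemo {
    public static void main(String[] args) {
        MetadataManager manager = new MetadataManager();
        TableSchema users = new TableSchema("app.users", Map.of("id", "int", "name", "text"));
        boolean forwarded = manager.createTable(users, new Object()); // connector without metadata support
        System.out.println(forwarded + " " + manager.schemaOf("app.users").isPresent()); // false true
    }
}
```

Keeping the schema in Crossdata even when the connector skips the operation is what lets BI tools and query validation see a schema for schemaless datastores.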