This is achieved thanks to its generic architecture and the definition of a custom SQL-like language. Our language augments the classical SQL data manipulation language to add support for streaming queries. From the user's point of view, a common logical view of the existing catalogs and datastores is presented, independently of which cluster or technology stores a particular table.
Supporting multiple architectures imposes two main challenges: how to normalize access to the datastores, and how to cope with datastore limitations. To be able to access multiple datastore technologies, Crossdata defines a common unifying interface containing a basic set of operations that a datastore may support. New connectors can easily be added to Crossdata to increase its connectivity.
The document discusses several topics related to SQL:
1) SQLNet compression - How ordering data in a query can significantly reduce the amount of data sent over the network by compressing repeated values. Ordering by additional columns further improves compression.
2) NULLs and indexes - There is a misconception that indexes cannot be used with queries involving NULL values, but indexes can support queries searching for NULL values.
3) Subquery caching - Repeated scalar subqueries are cached and evaluated only once to improve performance of queries containing subqueries.
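The ordering effect described in point 1 can be demonstrated with a quick experiment: rows with many repeated values compress far better once they are sorted, because identical values end up adjacent. This is a sketch using Python's zlib on synthetic data, standing in for network-level compression of a query result set:

```python
import random
import zlib

random.seed(0)
# Simulate a result set with a low-cardinality column: many repeated values.
rows = [(random.choice(["LONDON", "PARIS", "MADRID"]), random.randint(1, 5))
        for _ in range(10_000)]

unordered = "\n".join(f"{city},{n}" for city, n in rows).encode()
ordered = "\n".join(f"{city},{n}" for city, n in sorted(rows)).encode()

# Sorting groups repeated values into long runs, which compress much better.
print(len(zlib.compress(unordered)), len(zlib.compress(ordered)))
```

Sorting by the second column as well (which `sorted` does here, since it sorts full tuples) further lengthens the runs, mirroring the claim that ordering by additional columns improves compression.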
Spark is an alternative to Hadoop MapReduce for processing large datasets in parallel across a cluster, but it is not an alternative to Hadoop itself. While Spark can handle very large datasets up to 200 PB, it does not require that much memory and can work with mutable data. Spark supports both Scala and Java APIs and can also be used alongside existing Hadoop technologies like Hive, Pig, Impala, Tez and Drill.
StratioDeep: an Integration Layer Between Spark and Cassandra - Spark Summit ...Álvaro Agea Herradón
We present StratioDeep, an integration layer between the Spark distributed computing framework and Cassandra, a NoSQL distributed database.
Cassandra brings together the distributed systems technology of Dynamo and the data model of Google's BigTable. Like Dynamo, Cassandra is eventually consistent and based on a P2P model with no single point of failure. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than that of typical key/value systems. For these reasons, C* is one of the most popular NoSQL databases, but one of its handicaps is that the schema must be modeled around the queries that will be executed. This is because C* is oriented toward search by key.
Integrating C* and Spark gives us a system that combines the best of both worlds.
Existing integrations between the two systems are not satisfactory: they basically provide an HDFS abstraction layer over C*. We believe this solution is inefficient because it introduces significant overhead between the two systems.
The purpose of our work has been to provide a much lower-level integration that not only performs better but also opens up to Cassandra the possibility of solving a wide range of new use cases, thanks to the power of the Spark distributed computing framework.
We’ve already deployed this solution in real applications with diverse clients: pattern detection, log mining, fraud detection, sentiment analysis and financial transaction analysis.
In addition this integration is the building block for our challenging and novel Lambda architecture completely based on Cassandra.
To complete the integration, we provide a seamless extension to the Cassandra Query Language. CQL is oriented toward key-based search; as such, it is not a good choice for queries that move a huge amount of data. We have extended CQL to provide a user-friendly interface. This is a new approach for batch processing over C*: an abstraction layer that translates custom CQL queries into Spark jobs and delegates to Spark the complexity of distributing the query over the underlying cluster of commodity machines.
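The translate-and-delegate idea can be sketched in miniature: a SQL-like query is parsed once, and the resulting filter/project job runs independently on each data partition (standing in for distributed tasks), with partial results merged afterwards. Everything here (the grammar, the table layout) is invented for illustration and is not StratioDeep's actual implementation:

```python
import re

def parse(query):
    # Toy grammar: SELECT col FROM table WHERE col = 'value'
    m = re.match(r"SELECT (\w+) FROM (\w+) WHERE (\w+) = '(\w+)'", query)
    proj, table, col, val = m.groups()
    return proj, table, col, val

def run(query, partitions):
    proj, _table, col, val = parse(query)
    # Each partition is processed independently (a stand-in for Spark tasks),
    # then partial results are concatenated on the "driver".
    results = []
    for part in partitions:
        results.extend(row[proj] for row in part if row[col] == val)
    return results

partitions = [
    [{"id": "1", "city": "NYC"}, {"id": "2", "city": "SFO"}],
    [{"id": "3", "city": "NYC"}],
]
print(run("SELECT id FROM users WHERE city = 'NYC'", partitions))  # ['1', '3']
```

The user writes only the query; the per-partition mechanics stay hidden behind the abstraction layer, which is the point of the extension.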
This document discusses efficient data mining solutions using Hadoop, Cassandra, and Spark. It describes Cassandra as a fast, robust, and efficient key-value database but notes it has limitations for certain queries. Spark is presented as an alternative to Hadoop MapReduce that can be 100 times faster for interactive algorithms and data mining. The document demonstrates how Spark can integrate with Cassandra to allow distributed data processing over Cassandra data without needing to clone the data or use other databases. Future extensions are proposed to directly access Cassandra's SSTable files from Spark and extend CQL3 to leverage Spark.
Crossdata: an efficient distributed datahub with batch and streaming query ca...Álvaro Agea Herradón
Big Data analysis is commonly associated with batch processing of data stored in distributed file systems. The advent of streaming data is exposing the shortcomings of traditional data analysis. Users aiming to combine both worlds - batch processing and streaming - had to turn to unreliable in-house developments. We propose Stratio META to meet this new need. META is a technology based on a structured NoSQL datastore with advanced indexing capabilities. META includes an efficient query planner designed from scratch. The planner determines the optimal path for executing a query and which components should be involved.
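A query planner of this kind can be thought of as a cost comparison across candidate execution paths. The following is a deliberately simplified sketch; the path names and cost figures are invented and bear no relation to META's actual planner:

```python
# Pick the cheapest execution path for a query shape. Costs are illustrative:
# an indexed filter makes an index scan cheap, and a streaming path is only
# applicable (finite cost) when the query is a streaming query.
def plan(query):
    costs = {
        "index_scan": 1 if query.get("indexed_filter") else 100,
        "full_scan": 10,
        "streaming": 5 if query.get("streaming") else float("inf"),
    }
    return min(costs, key=costs.get)

print(plan({"indexed_filter": True}))  # index_scan
print(plan({"streaming": True}))       # streaming
```

Real planners estimate costs from statistics rather than fixed constants, but the selection step - choose the path and the components it implies - has this shape.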
Primeros pasos con Apache Spark - Madrid Meetupdhiguero
First steps with Spark, presented at the Apache Spark Meetup group in Madrid (http://www.meetup.com/Madrid-Apache-Spark-meetup/events/198362002/)
Contents:
- Introduction
- Basic concepts
- The Spark ecosystem
- Environment setup
- Common errors
Tutorial en Apache Spark - Clasificando tweets en realtimeSocialmetrix
Apache Spark [1] is a new distributed processing framework for big data, written in Scala with wrappers for Python, which is attracting a great deal of community attention for its power, ease of use, and processing speed. It is already being called the replacement for Apache Hadoop.
Socialmetrix builds solutions on this framework to generate reports and dashboards from data extracted from social networks.
Participants in this tutorial will learn to collect Twitter data using Spark Streaming, develop batch-processing algorithms to compute the most frequent hashtags and the most active users, and apply them in real time to new tweets arriving through the stream.
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...MongoDB
MongoDB Stitch is a serverless platform designed to help you easily and securely build an application on top of MongoDB Atlas. It lets developers focus on building applications rather than on managing data manipulation code, service integration, or backend infrastructure. MongoDB Stitch also makes it simple to respond to backend changes immediately, allowing you to simplify client side code and build complex flows more easily. This talk will cover ways that MongoDB Stitch helps you respond to changes in your database and take your applications to the next level.
This document provides lessons learned from running three AppSync projects in production over 18 months. It discusses preferences for using VTL over Lambda resolvers for CRUD operations due to VTL being faster, cheaper and simpler. It also recommends per-resolver caching over full request caching. Other tips include not leaving logging on full in production, handling user errors gracefully, planning for nested CloudFormation stacks for large projects, and modeling multi-tenancy using Cognito groups and attributes.
This document discusses the ql.io open source project, which provides a domain specific language (DSL) for making HTTP requests. The DSL allows HTTP resources to be treated like database tables, enabling CRUD operations on those resources with a SQL-like syntax. Ql.io can be used as an HTTP gateway and allows parallelizing and joining requests. It aims to simplify writing code for making API calls. The document provides examples of using the ql.io DSL and discusses how it can be used as a Node.js module.
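The core idea - an HTTP resource registered as a "table" that a SQL-like select is routed to - can be sketched in a few lines. The fetch function is injected so the sketch stays self-contained; the endpoint, syntax, and engine here are invented and differ from ql.io's real DSL:

```python
# Hypothetical table registry mapping a "table" name to a URL template.
tables = {"users": "https://api.example.com/users/{id}"}

def select(table, fetch, **where):
    # Treat the WHERE clause as parameters of the URL template, then
    # delegate the actual HTTP GET to the injected fetch function.
    url = tables[table].format(**where)
    return fetch(url)

# Stub standing in for an HTTP GET, so the example runs offline.
def fake_fetch(url):
    return {"url": url, "name": "ada"}

row = select("users", fake_fetch, id="42")
print(row["url"])  # https://api.example.com/users/42
```

The real project adds the pieces this sketch omits: a parsed SQL-like grammar, parallel execution of independent requests, and joins across the results of several calls.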
This is a run-through at a 200 level of the Microsoft Azure Big Data Analytics for the Cloud data platform based on the Cortana Intelligence Suite offerings.
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)MongoDB
The document provides an agenda and prerequisites for a MongoDB Stitch workshop. The agenda includes an introduction to Stitch and Atlas, creating a simple API, building a dashboard, and adding authentication. The prerequisites are a computer, Node.js 6.0+, and links to example files and documentation. Various aspects of the workshop will cover connecting data sources through Stitch, building functions to access and manipulate data, creating a real-time dashboard, and securing access with authentication.
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...Sparkhound Inc.
Whether you are developing a small customization or a large enterprise solution, one goal is to minimize redundancy in code. In this presentation, Sparkhound Consultant Ted Wagner shows how the MVP design pattern is used in SharePoint to create business models that can easily be reused across other ASP or C# applications.
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage ServiceCloudian
This document helps a new user set up the network when deploying a 3-node Cloudian storage cluster in a data center, for use with the Cloudian HyperStore Hybrid Cloud Service from AWS Marketplace.
The document provides information for a MongoDB Stitch workshop, including:
- The prerequisites needed for the workshop including a computer, MongoDB Atlas cluster, Node.js, and important files and documentation.
- The agenda for the workshop, which will cover an introduction to Stitch and Atlas, creating a simple API, building a dashboard with D3, and adding authentication.
- Instructions for getting started with Stitch and Atlas, including signing up for an Atlas account and whitelisting IP addresses.
1. MongoDB Stitch is a backend as a service that allows developers to easily work with data and integrate their apps with key services.
2. It provides integrated rules, pipelines, and services to handle complex workflows between databases and third party services.
3. Requests made to Stitch are parsed, rules are applied, databases and services are orchestrated, results are aggregated and returned to the client.
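The three-step flow above (parse, apply rules, orchestrate and aggregate) can be sketched as a tiny request handler. The names and data shapes are illustrative only, not the Stitch API:

```python
def handle(request, rules, services):
    action, payload = request["action"], request["payload"]  # 1. parse request
    if not rules.get(action, lambda p: False)(payload):      # 2. apply rules
        return {"error": "forbidden"}
    # 3. orchestrate every service registered for this action,
    #    then aggregate the partial results for the client.
    results = [svc(payload) for svc in services[action]]
    return {"results": results}

# A rule gating inserts, plus two "services": a database write and a webhook.
rules = {"insert": lambda p: "owner" in p}
services = {"insert": [lambda p: f"db:stored {p['owner']}",
                       lambda p: "webhook:notified"]}

print(handle({"action": "insert", "payload": {"owner": "ana"}}, rules, services))
```

A request that fails the rule check never reaches the services, which is the point of putting declarative rules in front of the orchestration step.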
Building Your First App with MongoDB StitchMongoDB
MongoDB Stitch is a platform that allows developers to easily access MongoDB databases and integrate with key services. It provides native SDKs, integrated rules and functions to build scalable backends. Requests made through Stitch are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch handles authentication, authorization and access controls through user profiles and declarative rules. It is a unified solution for building complete applications that connect to MongoDB and external services securely.
Virtual training intro to InfluxDB - June 2021InfluxData
In this training webinar, we will walk you through the basics of InfluxDB – the purpose-built time series database. InfluxDB has everything you need from a time series platform in a single binary – a multi-tenanted time series database, UI and dashboarding tools, background processing and monitoring agent. This one-hour session will include the training and time for live Q&A.
What you will learn
Core concepts of time series databases
An overview of the InfluxDB platform
How to ingest and query data in InfluxDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB
Serverless development with MongoDB Stitch allows developers to build applications without managing infrastructure. Stitch provides four main services - QueryAnywhere for data access, Functions for server-side logic, Triggers for real-time notifications, and Mobile Sync for offline data synchronization. These services integrate with MongoDB and other data sources through a unified API, and apply access controls and filters to queries. Functions can be used to build applications or enable data services, and are integrated with application context including user information, services, and values. This allows developers to write code without dealing with deployment or scaling.
Professional Services Insights into Improving Sitecore XPSeanHolmesby1
This presentation was delivered at SUGCON ANZ 2022 by Sean Holmesby and James Barrow from the Sitecore Professional Services team.
'So you're on XP, and it's not performing the way you want it to. What can you do about it?
In this session we'll go over the common pitfalls and issues that the Sitecore Professional Services team have come across in XP implementations, and how to fix them.
Poor site performance? Struggling xDB analytics? Log error messages that don't make any sense?
We've seen it all.... now let's help you fix them up.'
Tutorial: Building Your First App with MongoDB StitchMongoDB
MongoDB Stitch allows developers to easily access and integrate MongoDB databases with key services. It provides integrated rules, functions and SDKs to handle complex connection logic and orchestrate databases and third party services. Requests made through Stitch applications are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch offers scalable hosted JavaScript functions and declarative access controls to securely manage data and service access.
This document describes a web-based monitoring system project for caching solutions submitted by Subhayu Chakravorty for his Bachelor of Technology internship. The project involves developing a GUI using PHP that allows users and administrators to monitor caching servers. Key features include graphs of server metrics generated by Cacti, troubleshooting tools, and an admin panel to manage users and payments. The system was tested using servers provided by Data Consultancy Corps.
The document provides an agenda for a MuleSoft Meetup Group meeting in Moscow on May 13, 2021. The agenda includes introductions, MuleSoft updates, a demo and discussion on building secure financial APIs, a networking break, and a demo and discussion on revealing OData capabilities with Mulesoft and connecting it to Salesforce and mobile apps.
A fully web-based solution for calculating & displaying real-time data on large screens in contact centers (wallboards) and also directly on computer screens of supervisors, agents and even on mobile devices of executives (dashboards). Schedule a demo at 2Ring.com/Demo.
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022InfluxData
This document summarizes Samantha Wang's presentation on InfluxDB data collection options. She discussed three main options: native data collection, client libraries, and Telegraf. Native collection allows ingesting data directly from sources without transformation. Client libraries are available for many languages. Telegraf has over 300 plugins and supports ingesting from many sources. Future plans for Telegraf include reducing its binary size, improving the CLI, and adding new plugins.
This document provides an agenda and overview for a N1QL workshop on indexing and query tuning in Couchbase 4.0. The agenda includes sections on view index, global secondary index (GSI), multi-index scan, hands-on N1QL, query tuning, index selection hints, key-value access, joins, and more hands-on N1QL. The overview sections explain indexing in Couchbase including the primary index, secondary indexes, composite indexes, index intersection for multi-index scans, and the query execution flow involving parsing, planning, scanning indexes, and fetching documents.
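Index intersection for a multi-index scan works by taking the document-id sets produced by each secondary index and intersecting them before any documents are fetched. A minimal sketch, with invented documents and field names:

```python
docs = {
    "d1": {"city": "Austin", "type": "cafe"},
    "d2": {"city": "Austin", "type": "bar"},
    "d3": {"city": "Boston", "type": "cafe"},
}

def build_index(field):
    # A secondary index: field value -> set of document ids holding it.
    idx = {}
    for doc_id, doc in docs.items():
        idx.setdefault(doc[field], set()).add(doc_id)
    return idx

city_idx, type_idx = build_index("city"), build_index("type")

# AND predicate: intersect the id sets, then fetch only the survivors.
hits = city_idx["Austin"] & type_idx["cafe"]
print([docs[i] for i in sorted(hits)])  # [{'city': 'Austin', 'type': 'cafe'}]
```

The win is that the expensive step (fetching full documents) happens only for ids that satisfy every indexed predicate, which is why intersecting two narrow indexes can beat one broad scan.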
Speaker: Drew DiPalma, Product Manager, Cloud, MongoDB
Level: 100 (Beginner)
Track: Developer
Come learn more about MongoDB Stitch – Our new Backend as a Service (BaaS) that makes it easy for developers to create and launch applications across mobile and web platforms. Stitch provides a REST API on top of MongoDB with read, write, and validation rules built-in and full integration with the services you love. This talk will cover the what, why, and how of MongoDB Stitch. We’ll discuss everything from features to the architecture. You’ll walk away knowing how Stitch can kickstart your new project or take your existing application to the next level.
What You Will Learn:
- The basics of MongoDB Stitch and how to use it to kickstart new projects and implement new features in existing projects.
- How to integrate your favorite services with your MongoDB application without writing any code.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows seven steps methods for delivering their services to their customers. They called it the Software development life cycle process (SDLC).
Requirement — Collecting the Requirements is the first Phase in the SSLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after completing the software development, customers start using the software.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
E-commerce Application Development Company.pdfHornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppGoogle
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-fusion-buddy-review
AI Fusion Buddy Review: Key Features
✅Create Stunning AI App Suite Fully Powered By Google's Latest AI technology, Gemini
✅Use Gemini to Build high-converting Converting Sales Video Scripts, ad copies, Trending Articles, blogs, etc.100% unique!
✅Create Ultra-HD graphics with a single keyword or phrase that commands 10x eyeballs!
✅Fully automated AI articles bulk generation!
✅Auto-post or schedule stunning AI content across all your accounts at once—WordPress, Facebook, LinkedIn, Blogger, and more.
✅With one keyword or URL, generate complete websites, landing pages, and more…
✅Automatically create & sell AI content, graphics, websites, landing pages, & all that gets you paid non-stop 24*7.
✅Pre-built High-Converting 100+ website Templates and 2000+ graphic templates logos, banners, and thumbnail images in Trending Niches.
✅Say goodbye to wasting time logging into multiple Chat GPT & AI Apps once & for all!
✅Save over $5000 per year and kick out dependency on third parties completely!
✅Brand New App: Not available anywhere else!
✅ Beginner-friendly!
✅ZERO upfront cost or any extra expenses
✅Risk-Free: 30-Day Money-Back Guarantee!
✅Commercial License included!
See My Other Reviews Article:
(1) AI Genie Review: https://sumonreview.com/ai-genie-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
#AIFusionBuddyReview,
#AIFusionBuddyFeatures,
#AIFusionBuddyPricing,
#AIFusionBuddyProsandCons,
#AIFusionBuddyTutorial,
#AIFusionBuddyUserExperience
#AIFusionBuddyforBeginners,
#AIFusionBuddyBenefits,
#AIFusionBuddyComparison,
#AIFusionBuddyInstallation,
#AIFusionBuddyRefundPolicy,
#AIFusionBuddyDemo,
#AIFusionBuddyMaintenanceFees,
#AIFusionBuddyNewbieFriendly,
#WhatIsAIFusionBuddy?,
#HowDoesAIFusionBuddyWorks
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
13. Crossdata
o A new technology that:
• Is not limited by the underlying datastore capabilities
• Leverages Spark to perform non-natively supported operations
• Supports batch and streaming queries
• Supports multiple clusters and technologies
#BDS14
14. Learn to use Stratio Crossdata
Developing your first connector
Daniel Higuero (dhiguero@stratio.com, @dhiguero)
Alvaro Agea (alvaro@stratio.com, @alvaroagea)
17. Connecting to the outside world
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support
• Different datastores
• Different processing engines
• Different versions
o Each connector defines its capabilities
Our planner will choose the best connector for each query
18. Query execution
[Diagram: Parsing → Validation → Planning → Execution; the execution stage dispatches through one of several connectors (Connector1, Connector2, Connector3) to the datastore]
Our planner will choose the best connector for each query
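The connector-selection step described above can be sketched as a simple capability match: each connector advertises the logical operations it supports, and the planner picks a connector that covers everything the query needs. This is only an illustrative toy (the connector names, operation names, and `choose` method are invented, not the real Crossdata planner API):

```java
import java.util.*;

// Toy sketch of capability-based connector selection.
// All names here are illustrative, not the actual Crossdata classes.
class ConnectorPlanner {

    // Each connector declares the logical operations it supports natively.
    static final Map<String, Set<String>> CAPABILITIES = Map.of(
        "cassandra-native", Set.of("PROJECT", "FILTER", "LIMIT"),
        "spark-cassandra",  Set.of("PROJECT", "FILTER", "JOIN", "GROUP_BY", "LIMIT"),
        "streaming",        Set.of("PROJECT", "FILTER", "WINDOW"));

    // Pick a connector whose capabilities cover the whole query; a real
    // planner would also rank candidates (e.g. prefer native connectors).
    static Optional<String> choose(Set<String> required) {
        return CAPABILITIES.entrySet().stream()
            .filter(e -> e.getValue().containsAll(required))
            .map(Map.Entry::getKey)
            .sorted()                      // deterministic choice for the sketch
            .findFirst();
    }

    public static void main(String[] args) {
        // A plain projection + filter can run natively on Cassandra...
        System.out.println(choose(Set.of("PROJECT", "FILTER", "LIMIT")));
        // ...but a JOIN falls back to the Spark-based connector.
        System.out.println(choose(Set.of("PROJECT", "JOIN")));
    }
}
```

If no connector covers the required operations, `choose` returns an empty Optional; in the real system this is where Spark steps in for non-natively supported operations.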
19. Multi-cluster support
o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores.
• Multiple clusters can coexist to optimize platform performance
§ E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
• A table is stored in a single datastore
20. Logical and physical mapping

    SELECT * FROM app.users;

[Diagram: the App catalog maps its logical tables (Users, Test, old_users) onto physical clusters: C* production, M development, and other datastores]
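The mapping above amounts to a catalog lookup: the logical table name resolves to whichever cluster persists it, invisibly to the user. A minimal sketch under assumed names (the catalog contents and the `resolve` helper are hypothetical, not the real Crossdata catalog API):

```java
import java.util.Map;

// Sketch of logical-to-physical table mapping (illustrative names only).
class CatalogMapping {

    // One catalog: each logical table lives in exactly one cluster.
    static final Map<String, String> APP_CATALOG = Map.of(
        "users",     "cassandra-production",
        "test",      "mongo-development",
        "old_users", "other-datastore");

    // Resolve "catalog.table" to the cluster that persists the table.
    static String resolve(String qualifiedName) {
        String table = qualifiedName.substring(qualifiedName.indexOf('.') + 1);
        String cluster = APP_CATALOG.get(table);
        if (cluster == null) throw new IllegalArgumentException("Unknown table: " + table);
        return cluster;
    }

    public static void main(String[] args) {
        // SELECT * FROM app.users; is routed with no user knowledge of the deployment.
        System.out.println(resolve("app.users"));
    }
}
```

The design constraint from the slide shows up here naturally: because a table name maps to exactly one cluster, resolution is a single lookup.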
22. Metadata in the era of schemaless NoSQL datastores
o Some datastores are schemaless, but our applications are not!
• Flexible schemas vs. schemaless
• Crossdata provides a Metadata Manager that stores schemas for any datasource
§ Remember ODBC and those BI tools?
23. Metadata management
[Diagram: a connector to a C* production cluster, a Metadata Store backed by Infinispan, and the Metadata Manager]
1. Updated metadata information is maintained among Crossdata servers using Infinispan.
2. If the connector does not support metadata operations, those are skipped.
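The two numbered steps above can be sketched with a shared map standing in for the Infinispan-replicated store: the Metadata Manager always records the schema, and forwards the operation to the connector only when the connector supports metadata. All names below are illustrative stand-ins, not the real Crossdata classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the Metadata Manager flow. A ConcurrentHashMap stands in for the
// Infinispan store shared among Crossdata servers (illustrative names only).
class MetadataManagerSketch {

    // Shared metadata store: table name -> schema description.
    static final Map<String, String> STORE = new ConcurrentHashMap<>();

    interface Connector {
        boolean supportsMetadata();
        void createTable(String name, String schema);
    }

    static void createTable(Connector c, String name, String schema) {
        STORE.put(name, schema);            // step 1: update shared metadata
        if (c.supportsMetadata()) {
            c.createTable(name, schema);    // step 2: forward only when supported
        }
    }

    public static void main(String[] args) {
        Connector schemaless = new Connector() {
            public boolean supportsMetadata() { return false; }
            public void createTable(String n, String s) { throw new AssertionError("skipped"); }
        };
        createTable(schemaless, "app.users", "(id INT, name TEXT)");
        // The schema is kept even though the datastore itself is schemaless.
        System.out.println(STORE.get("app.users"));
    }
}
```

Keeping the schema server-side is also what lets Crossdata answer metadata queries (e.g. during validation) without hitting the underlying datastore.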
25. Stratio Crossdata ODBC/JDBC
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it using the Simba SDK
o It opens the full potential of Stratio Crossdata to the external world
o Currently tested with Tableau, QlikView and MS Excel
One ODBC/JDBC for all datastores!
27. Crossdata Connectors
o The Crossdata core is abstracted from the inner workings of each connector.
• Common IConnector interface.
• An XML manifest defines datastore and connector capabilities.
o Each connector may access different datastores.
o Each connector supports many clusters of the same datastore technology.
29. Connector interface
o The ConnectorApp abstracts the communication with the server using Akka.
• No worries about transferring data, status, etc.
• It takes an implementation of IConnector and launches the required actors.
[Diagram: ConnectorApp wraps an IConnector implementation, which talks to the datastore]
30. IConnector

public interface IConnector {

    /**
     * Get the name of the connector.
     * @return A name.
     */
    String getConnectorName();

    /**
     * Get the names of the datastores supported by the connector.
     * Several connectors may declare the same datastore name.
     * @return The names.
     */
    String[] getDatastoreName();
31. IConnector (II)

    /**
     * Initialize the connector service.
     * @param configuration The configuration.
     * @throws InitializationException If the connector initialization fails.
     */
    void init(IConfiguration configuration) throws InitializationException;

    /**
     * Connect to a datastore using a set of options.
     * @param credentials The required credentials.
     * @param config The cluster configuration.
     * @throws ConnectionException If the connection could not be established.
     */
    void connect(ICredentials credentials, ConnectorClusterConfig config) throws ConnectionException;
32. IConnector (III)

    /**
     * Get the storage engine.
     * ...
     */
    IStorageEngine getStorageEngine() throws UnsupportedException;

    /**
     * Get the query engine.
     * ...
     */
    IQueryEngine getQueryEngine() throws UnsupportedException;

    /**
     * Get the metadata engine.
     * ...
     */
    IMetadataEngine getMetadataEngine() throws UnsupportedException;
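Putting these three slides together, a new connector is mostly a thin shell around these methods. The skeleton below is a compilable sketch: the stub types at the top stand in for the real Crossdata interfaces so the snippet is self-contained, and the engine behavior is invented for illustration:

```java
// Stand-ins for the real Crossdata types, so this sketch compiles on its own.
interface IConfiguration {}
interface ICredentials {}
class ConnectorClusterConfig {}
interface IQueryEngine {}
class UnsupportedException extends Exception {}

// Minimal illustrative connector skeleton (not a real Crossdata connector).
class InMemoryConnector {

    private boolean connected = false;

    public String getConnectorName() { return "InMemoryConnector"; }

    // Several connectors may declare the same datastore name.
    public String[] getDatastoreName() { return new String[] {"InMemoryDatastore"}; }

    public void init(IConfiguration configuration) {
        // Read connector-wide settings here.
    }

    public void connect(ICredentials credentials, ConnectorClusterConfig config) {
        connected = true;               // open per-cluster client sessions here
    }

    public boolean isConnected() { return connected; }

    // A connector only implements the engines it can support; the rest throw
    // UnsupportedException so the planner routes those queries elsewhere.
    public IQueryEngine getQueryEngine() throws UnsupportedException {
        throw new UnsupportedException();   // this toy connector is storage-only
    }
}
```

Declaring an unsupported engine by throwing, rather than returning null, lets the planner discover missing capabilities explicitly.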
33. IMetadataEngine
o Defines operations related to metadata management

    createCatalog(ClusterName, CatalogMetadata)
    dropCatalog(ClusterName, CatalogName)
    createTable(ClusterName, TableMetadata)
    alterTable(ClusterName, TableName, AlterOptions)
    dropTable(ClusterName, TableName)
    createIndex(ClusterName, IndexMetadata)
    dropIndex(ClusterName, IndexName)
34. IStorageEngine
o Defines operations related to writing data

    insert(ClusterName, TableMetadata, Row)
    insert(ClusterName, TableMetadata, Collection<Row>)
    delete(ClusterName, TableName, Collection<Filter>)
    update(ClusterName, TableName, Collection<Relation>, Collection<Filter>)
    truncate(ClusterName, TableName)
35. IQueryEngine
o Defines operations related to querying data

    execute(LogicalWorkflow)
    asyncExecute(String, LogicalWorkflow, IResultHandler)
    stop(String)
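For streaming queries, asyncExecute delivers result batches through a handler as they arrive. The sketch below shows that callback pattern only; the `ResultHandler` interface and the simulated batches are illustrative stand-ins for IResultHandler, not the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the asyncExecute callback pattern (illustrative names only).
class AsyncQuerySketch {

    // Stand-in for IResultHandler: receives partial result batches.
    interface ResultHandler extends Consumer<List<String>> {}

    // Simulate a streaming query that emits two batches to the handler;
    // a real engine would keep emitting until stop(queryId) is called.
    static void asyncExecute(String queryId, ResultHandler handler) {
        handler.accept(List.of(queryId + ": row1", queryId + ": row2"));
        handler.accept(List.of(queryId + ": row3"));
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        asyncExecute("q1", received::addAll);
        System.out.println(received.size() + " rows received");
    }
}
```

The query id string ties the three operations together: it names the running query so stop(String) can cancel exactly that stream.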
36. Logical workflows
o Graph representation of a query, composed of two types of logical steps
§ Transformation step: one input, one output
§ Union step: n inputs, one output
37. Building a Logical Workflow
o Consider the following query

    SELECT tweet.id, tweet.user
    FROM transactions WITH WINDOW 2 minutes
    JOIN mentions ON mentions.user = tweet.user
    WHERE mentions.counter > 100
      AND tweet.hashtag = ‘#bds14’
    ORDER BY mentions.counter
    LIMIT 100

Parsing → Validation → Planning
38. Building a Logical Workflow – Project
Identify tables and required fields.
39–40. Building a Logical Workflow – Project
For each table, retrieve all columns that are involved in the query.
41–43. Building a Logical Workflow – Project
Build a Project logical step per table.

    Project TWEET(id, user, hashtag)
    Project MENTIONS(user, counter)
44–46. Building a Logical Workflow – Filters
Next, we add filtering steps as early as possible.

    Project TWEET(id, user, hashtag) → Filter (hashtag = ‘bds14’)
    Project MENTIONS(user, counter) → Filter (counter > 100)
47–48. Building a Logical Workflow – Window
A Window step covers the streaming WITH WINDOW clause.

    Project TWEET → Filter (hashtag = ‘bds14’) → Window (2 min)
    Project MENTIONS → Filter (counter > 100)
49–50. Building a Logical Workflow – Join
Both branches are merged with a Join step.

    [TWEET branch, MENTIONS branch] → Join (m.user = t.user)
51–52. Building a Logical Workflow – Order By
An OrderBy step on mentions.counter is appended after the Join.

    … → Join → OrderBy (m.counter)
53–54. Building a Logical Workflow – Limit
A Limit step caps the result at 100 rows.

    … → Join → OrderBy (m.counter) → Limit 100
55–56. Building a Logical Workflow – Select
A final Select step keeps the requested columns; the completed workflow is handed to IQueryEngine.execute().

    Project TWEET → Filter → Window
    Project MENTIONS → Filter
    → Join → OrderBy → Limit 100 → Select (id, user)
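The finished workflow can be represented as a small step graph. The sketch below builds the two branches and the shared tail for the example query, recording only step names in execution order (the list-of-strings structure is an illustration, not the real LogicalWorkflow classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the example query's logical workflow: two Project branches
// (transformation steps) merged by a Join (union step), then a shared tail.
class LogicalWorkflowSketch {

    static List<String> build() {
        List<String> tweetBranch = List.of(
            "Project TWEET(id, user, hashtag)",
            "Filter hashtag = '#bds14'",
            "Window 2 min");                    // streaming side of the query
        List<String> mentionsBranch = List.of(
            "Project MENTIONS(user, counter)",
            "Filter counter > 100");
        List<String> tail = List.of(
            "Join mentions.user = tweet.user",  // union step: 2 inputs, 1 output
            "OrderBy mentions.counter",
            "Limit 100",
            "Select id, user");

        List<String> steps = new ArrayList<>(tweetBranch);
        steps.addAll(mentionsBranch);
        steps.addAll(tail);
        return steps;
    }

    public static void main(String[] args) {
        build().forEach(System.out::println);
    }
}
```

Note how the slide-by-slide construction is visible in the order: per-table Projects first, filters pushed as early as possible, the Window on the streaming branch, then the Join and the query tail.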
57. Existing connectors
o Native
• Cassandra
• MongoDB
• Aerospike
• ElasticSearch
• Stratio Streaming
o Spark-based
• Cassandra
• MongoDB
• Aerospike
• HDFS
64. Learn to use Stratio Crossdata
Developing your first connector
Daniel Higuero (dhiguero@stratio.com, @dhiguero)
Alvaro Agea (alvaro@stratio.com, @alvaroagea)
Editor's Notes
Crossdata is a new technology that has been in development for the last few months. It is currently open source under the Apache license.
And it has several new features that we think will change the way we interact with big data systems.
Crossdata avoids the limitations imposed by the underlying datastore. For instance, if a column has not been indexed, we need to think about methods of retrieving that column.
It is important to highlight that our focus is on users and performance. If the user requests some data, it is important to provide that result. We think it is better to answer a non-optimal query than to forbid it. We will always have time to add a new index if that query becomes the norm. As an example, think about users not involved in the database design who interact with the system through business intelligence tools.
We use Spark to perform non-natively supported queries in those cases, and we also use it when we need to mix data coming from streaming sources with batch data.
Moreover, our design allows us to access several clusters and technologies at the same time.
From the architectural point of view, we can define three main layers.
On the top we find the driver, which is used to build custom applications through its Java/Scala API. We also have an ODBC connector for external tools, and we provide a REST API.
In the middle, we have our server component, which contains the core logic and the distributed capabilities.
On the bottom, being a generic architecture, we employ a connector-based approach that permits extending the system to communicate with any datastore.
Communication between layers is accomplished using the Scala actor framework.
With respect to the connector approach, we have defined a Java interface that contains the set of operations that an ideal connector should provide.
Notice that our design simplifies connector development, so users can easily add their datastore to the Crossdata ecosystem. Each developer defines the connector's capabilities, and Crossdata's planner will choose the best one depending on the query. Several connectors to the same datastore can coexist at the same time.
To clarify this, our query execution path is pretty similar to existing ones, with the addition of the connector selection step after the query plan is determined.
Another nice feature is multi-cluster support. Many times we have seen an application with two completely different types of access. Wouldn't it be nice if we could easily tune its database? If both types of access target the same cluster, this is usually impossible, as you need to decide, for instance, whether your workload is write- or read-oriented.
With Crossdata, a common logical view is provided to the user independently of which cluster stores a particular table.
We only impose the limitation that a table name can only be found in a single cluster.
To illustrate this, let us imagine a scenario with three clusters: one production Cassandra cluster, one development cluster, and another cluster with an old version of the database or other technologies. In this case, if we submit the following query, Crossdata will determine which datastore persists the table users, and it will retrieve a result set without requiring any knowledge of the physical deployment from the user's point of view.
Now let’s focus on metadata and how we can solve its management problem when we have different technologies.
The first question that arises when we talk about metadata is how we are going to be compatible with schemaless approaches. It is true that some datastores are schemaless, but it is also true that our applications need to know which fields should be queried and shown to the users.
The difference for us is not whether a datastore is schemaless; it is more about providing the user with a flexible way of updating existing schemas.
Crossdata stores schemas for any datasource independently of whether it is schemaless. Remember that other applications, like business intelligence tools, do exist, and they require schemas.
To show how it is done: even though in Cassandra we already have a schema (… thanks for that …), Crossdata stores and shares the schema itself among the existing Crossdata servers. This is also important to reduce metadata-related queries to the underlying datastore (for instance, during query validation).
And finally, it is time for the ODBC.
We have developed an ODBC driver that retrieves data from Crossdata and allows us to integrate with different applications and business intelligence tools. To mention some of them, we currently support Tableau, QlikView and Excel. It is important to highlight that, given the generic nature of Crossdata, the existence of this ODBC driver opens the possibility of connecting any datastore through ODBC by just writing a new connector. And believe us when we tell you that it is easy to do so.