NoSQL databases are often limited in the types of queries they can support because of the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines. Specifically, we will demonstrate a combination of key/value, SQL-like, document-model and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projections. We will also demonstrate how to create a mash-up between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through a SQL query.
8. Christos Erotocritou www.gigaspaces.com
Key/Value Technologies
• Query: Key, Value
• Semantics:
• Mostly Read
• No Aggregation
• No Projection
• No Partial update
• Performance: millions of operations/sec
• Consistency: Atomic
• Scaling: Mostly Scale-Out
• Availability: Limited
9. Stream Processing Technologies
• Semantics:
• Event-driven data processing
• Performance: tens of millions of events/sec
• Machine learning
• Real-time analytics
• Depends on external persistence for maintaining state
[Diagram: a topology of one Spout and four Bolts]
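The spout/bolt model can be sketched in plain Java. This is not Storm's API: `StreamSketch`, `NORMALISE` and `run` are hypothetical names, and the counting step keeps its state in an in-process map, which is the kind of state a real topology would hand to an external store.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout; a plain-Java analogue, not Storm's API.
class StreamSketch {
    // A stateless "bolt" that normalises each event.
    static final Function<String, String> NORMALISE = s -> s.trim().toLowerCase();

    // Pulls events from the "spout" and pushes them through two "bolts".
    // The counts map stands in for the external persistence a real
    // topology would need in order to keep state across restarts.
    static Map<String, Integer> run(Iterator<String> spout) {
        Map<String, Integer> counts = new HashMap<>();
        while (spout.hasNext()) {
            String event = NORMALISE.apply(spout.next()); // first bolt
            counts.merge(event, 1, Integer::sum);          // second, stateful bolt
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> events = List.of("Play", "pause ", "play", "PLAY");
        System.out.println(run(events.iterator()));
    }
}
```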
10. SSD Technology is quickly shaping the Big Data Landscape
Great for read-heavy workloads and initial loads
Store indexes in memory and payload on SSD
SSD-extended in-memory products can provide great performance with increased capacity and persistence
Big data, but also fast data
12. How can we integrate such technologies and provide a common access API?
13. A typical Big Data App logical architecture can look like this
[Diagram: Front End, Application / Service, Batch Processing, RT Analytics, Service, Storage, Back End]
• Front-end users access a distributed, multi-faceted service
• Back-end users access business insights and system maintenance metrics
14. We need a High-Speed Data Store…
• Key / Value
• Document
• Graph
• Map / Reduce
• Transactional
• Stream based
But we’re not there just yet…
Common Data Store serving Multiple Semantics/APIs
Disk becomes the new tape
[Diagram: Front End, High-speed Data Store, Back End]
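As a rough illustration of one store serving more than one API, the sketch below writes through a key/value interface and then runs a predicate query over the same data. `MultiApiStore` and its methods are hypothetical, documents are plain maps, and the query is a full scan rather than anything indexed or distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical in-memory store; documents are plain maps.
class MultiApiStore {
    private final Map<String, Map<String, Object>> data = new ConcurrentHashMap<>();

    // Key/value semantics: write and read by key.
    void put(String key, Map<String, Object> doc) { data.put(key, doc); }
    Map<String, Object> get(String key) { return data.get(key); }

    // Query semantics over the same data: a full scan with a predicate,
    // standing in for a SQL-like WHERE clause.
    List<Map<String, Object>> query(Predicate<Map<String, Object>> where) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : data.values()) {
            if (where.test(doc)) out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        MultiApiStore store = new MultiApiStore();
        store.put("u1", Map.of("name", "Ann", "country", "UK"));
        store.put("u2", Map.of("name", "Bob", "country", "USA"));
        store.put("u3", Map.of("name", "Cid", "country", "UK"));
        // Roughly: SELECT * FROM users WHERE country = 'UK'
        System.out.println(store.query(d -> "UK".equals(d.get("country"))).size()); // prints 2
    }
}
```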
15. We can use IMDG technologies to integrate all our Data Sources
[Diagram: Front End, Back End, High-speed Data Bus (IMDG), MySQL, MongoDB, Hadoop, Storm; labels: Mongo Sync, RDBMS Sync, Hadoop Sync, RT Streaming, Direct Access, RT Transactional Data Access; layers: Batch Layer, Speed Layer, Web Storage Layer]
Data bus:
• Resilient, FT & HA
• Transactional
• High-throughput
16. Online consumer media service real-world use-case
[Diagram: High-speed Data Bus (IMDG) with Storm, Hadoop and a Downstream System; connections labelled Hadoop Sync, RT Streaming and Direct Access (if needed)]
• Purchase order (Transactional)
• Sync new data available to end user
• Long-term analytics and storage
• Real-time business analytics on user activity
• State persistency
• Fast media search
• Business analytics
17. What really goes on in the grid…
[Diagram: Client, New Data, Proc. Data, Polling Container, Notify Container, Mirror, MongoDB, DB; layers: Service, Application, Storage]
• Client writes new data to the space
• Polling container writes to MongoDB
• New data automatically synced to ES
• Space mirroring
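The write, poll, process and mirror flow can be sketched as a simplified, single-threaded analogue. Nothing here is the GigaSpaces API: `GridFlowSketch`, its fields and the `processed:` prefix are hypothetical, and the mirror is a plain list updated synchronously rather than an asynchronous service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, single-threaded analogue of the flow; not the GigaSpaces API.
class GridFlowSketch {
    final BlockingQueue<String> newData = new LinkedBlockingQueue<>(); // unprocessed entries in the "space"
    final List<String> processed = new ArrayList<>();                  // processed entries in the "space"
    final List<String> mirrorDb = new ArrayList<>();                   // stand-in for the backing database

    // Client writes new data to the space.
    void clientWrite(String entry) { newData.add(entry); }

    // Polling container: take every pending entry, process it, write the
    // result back, and replicate the result to the backing store.
    int pollOnce() {
        int handled = 0;
        String entry;
        while ((entry = newData.poll()) != null) {
            String result = "processed:" + entry;
            processed.add(result);
            mirrorDb.add(result); // a real mirror would do this asynchronously
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        GridFlowSketch grid = new GridFlowSketch();
        grid.clientWrite("order-1");
        grid.clientWrite("order-2");
        System.out.println(grid.pollOnce() + " entries processed, mirror holds " + grid.mirrorDb);
    }
}
```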
21. Nested Queries & Projections
A. Query for a Person who lives in New York:
… = new SQLQuery<Person>(Person.class, "address.city = 'New York'");
B. Query for a Dealer that sells a Honda:
… = new SQLQuery<Dealer>(Dealer.class, "cars[*] = 'Honda'");
C. Query for a Person with projections on first and last names:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, new Long[]{id1, id2})
    .setProjections("firstName", "lastName");
Person[] result = space.readByIds(idsQuery).getResultsArray();
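To show what a nested-path match and a projection do to the data, here is a plain-Java analogue over map-based documents. It is not the GigaSpaces API: `NestedQuerySketch`, `resolve`, `project` and `livesIn` are hypothetical helpers, and the sample people are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helpers over map-based documents; not the GigaSpaces API.
class NestedQuerySketch {
    // Resolve a dotted path such as "address.city" against nested maps.
    static Object resolve(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    // Projection: copy only the requested top-level fields.
    static Map<String, Object> project(Map<String, Object> doc, String... fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String f : fields) out.put(f, doc.get(f));
        return out;
    }

    // Nested match on address.city, then project first and last names.
    static List<Map<String, Object>> livesIn(List<Map<String, Object>> people, String city) {
        return people.stream()
                .filter(p -> city.equals(resolve(p, "address.city")))
                .map(p -> project(p, "firstName", "lastName"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> ann = Map.of("firstName", "Ann", "lastName", "Lee",
                "address", Map.of("city", "New York"));
        Map<String, Object> bob = Map.of("firstName", "Bob", "lastName", "Ray",
                "address", Map.of("city", "London"));
        System.out.println(livesIn(List.of(ann, bob), "New York"));
    }
}
```

The projected results carry only the two name fields, so the nested address is never copied to the caller.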
22. Basic Aggregations
A. Create a query that yields a result set:
// import static org.openspaces.extensions.QueryExtension.*;
SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
query.setParameter(1, "UK");
query.setParameter(2, "USA");
B. Perform aggregations for that result set:
Integer maxAgeInSpace = max(space, query, "age");
Integer minAgeInSpace = min(space, query, "age");
Integer combinedAgeInSpace = sum(space, query, "age");
Double averageAge = average(space, query, "age");
Employee oldestPersonInSpace = maxEntry(space, query, "age");
Employee youngestPersonInSpace = minEntry(space, query, "age");
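For readers without a grid at hand, the same aggregations can be reproduced over an in-memory list with Java streams. This is an analogue only: `AggregationSketch`, its `Employee` type and the sample data are hypothetical, and everything runs in a single process rather than across partitions.

```java
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical Employee type and data; a plain-Java analogue, not the GigaSpaces API.
class AggregationSketch {
    static final class Employee {
        final String name; final String country; final int age;
        Employee(String name, String country, int age) {
            this.name = name; this.country = country; this.age = age;
        }
    }

    // Equivalent of the "country=? OR country=?" filter.
    static List<Employee> select(List<Employee> all, String c1, String c2) {
        return all.stream()
                .filter(e -> e.country.equals(c1) || e.country.equals(c2))
                .collect(Collectors.toList());
    }

    // max, min, sum and average of "age" in one pass.
    static IntSummaryStatistics ageStats(List<Employee> result) {
        return result.stream().mapToInt(e -> e.age).summaryStatistics();
    }

    // Equivalent of maxEntry: the whole entry holding the maximum age.
    static Optional<Employee> oldest(List<Employee> result) {
        return result.stream().max(Comparator.comparingInt(e -> e.age));
    }

    public static void main(String[] args) {
        List<Employee> all = List.of(new Employee("Ann", "UK", 41),
                new Employee("Bob", "USA", 29), new Employee("Cid", "FR", 55));
        List<Employee> result = select(all, "UK", "USA");
        IntSummaryStatistics s = ageStats(result);
        System.out.println(s.getMax() + " " + s.getMin() + " " + s.getSum() + " " + s.getAverage());
        System.out.println(oldest(result).map(e -> e.name).orElse("none"));
    }
}
```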
23. Complex & Compound Aggregations
A. Create a query that yields a result set:
// import static org.openspaces.extensions.QueryExtension.*;
SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
query.setParameter(1, "UK");
query.setParameter(2, "USA");
B. Perform group and filtering aggregations for that result set:
… = groupBy(space, query, new GroupByAggregator()
    .select(average("salary"), min("salary"), max("salary"))
    .groupBy("department", "gender")
    .having(new GroupByFilter() {
        public boolean process(GroupByValue group) {
            return group.getDouble("avg(salary)") > 18000;
        }
    }));
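The group-then-filter logic can be sketched in plain Java to make the semantics concrete. This is an assumption-laden analogue, not the GigaSpaces API: `GroupBySketch`, its `Employee` type, the `department/gender` key format and the sample salaries are all hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Employee type; a plain-Java analogue of group-by with a having filter.
class GroupBySketch {
    static final class Employee {
        final String department; final String gender; final double salary;
        Employee(String department, String gender, double salary) {
            this.department = department; this.gender = gender; this.salary = salary;
        }
    }

    // Group by (department, gender), collect avg/min/max salary per group,
    // then keep only the groups whose average salary exceeds minAvg.
    static Map<String, DoubleSummaryStatistics> groupHaving(List<Employee> employees, double minAvg) {
        Map<String, DoubleSummaryStatistics> groups = new TreeMap<>();
        for (Employee e : employees) {
            groups.computeIfAbsent(e.department + "/" + e.gender, k -> new DoubleSummaryStatistics())
                  .accept(e.salary);
        }
        groups.values().removeIf(stats -> stats.getAverage() <= minAvg); // the "having" step
        return groups;
    }

    public static void main(String[] args) {
        List<Employee> staff = List.of(
                new Employee("eng", "F", 30000), new Employee("eng", "F", 20000),
                new Employee("ops", "M", 15000));
        System.out.println(groupHaving(staff, 18000).keySet()); // prints [eng/F]
    }
}
```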
24. Fast Update & Change API
A. Performing changes at the data-store level:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, id, routing);
space.change(idsQuery, new ChangeSet()
    .increment("balance.euro", 5.2D));
B. Performing a series of changes at the data-store level:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, id, routing);
space.change(idsQuery, new ChangeSet()
    .increment("someIntProperty", 1)
    .set("someStringProperty", "newValue")
    .putInMap("someNestedProperty.someMapProperty", "myKey", 2));
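The idea of changing an entry where it lives, instead of reading it, modifying it and writing it back, can be sketched with a `ConcurrentHashMap`. This is a single-process analogue, not the GigaSpaces API: `ChangeSketch`, `increment` and `set` are hypothetical, and only top-level fields are supported.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical store; a plain-Java analogue of in-place partial updates.
class ChangeSketch {
    private final Map<String, Map<String, Object>> entries = new ConcurrentHashMap<>();

    void write(String id, Map<String, Object> entry) { entries.put(id, new HashMap<>(entry)); }

    Object read(String id, String field) {
        Map<String, Object> entry = entries.get(id);
        return entry == null ? null : entry.get(field);
    }

    // Apply a set of changes to one entry atomically, inside the store, so the
    // client never reads the whole entry and writes it back. Unknown ids are ignored.
    void change(String id, Consumer<Map<String, Object>> changeSet) {
        entries.computeIfPresent(id, (key, entry) -> {
            Map<String, Object> copy = new HashMap<>(entry);
            changeSet.accept(copy);
            return copy;
        });
    }

    // A change that adds delta to a numeric field (a missing field starts at delta).
    static Consumer<Map<String, Object>> increment(String field, double delta) {
        return entry -> entry.merge(field, delta,
                (a, b) -> ((Number) a).doubleValue() + ((Number) b).doubleValue());
    }

    // A change that overwrites a single field.
    static Consumer<Map<String, Object>> set(String field, Object value) {
        return entry -> entry.put(field, value);
    }

    public static void main(String[] args) {
        ChangeSketch store = new ChangeSketch();
        store.write("p1", Map.of("balance", 10.0));
        store.change("p1", increment("balance", 2.5).andThen(set("status", "active")));
        System.out.println(store.read("p1", "balance") + " " + store.read("p1", "status"));
    }
}
```

Chaining with `andThen` plays the role of a series of changes: both modifications land in one atomic step on the stored entry.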
27. Create a Standardised Blueprint of the Application Topology
[Diagram: five nodes of types App, Server, DB and Container (two), linked by "Contained In" and "Connected To" relationships]
28. Using TOSCA & YAML to Describe the Application Topology
...
host:
    type: cloudify.nodes.libcloud.Compute
    ...
##################################################################################
# Tomcat server
##################################################################################
tomcat_server:
    type: cloudify.nodes.TomcatServer
    relationships:
        - type: cloudify.relationships.contained_in
          target: host
##################################################################################
# MongoDB node as a backend data-store for the example Tomcat application
##################################################################################
mongodb:
    type: cloudify.nodes.MongoDB
    relationships:
        - type: cloudify.relationships.contained_in
          target: host
...
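To see how contained-in relationships form a graph that tooling can walk, the blueprint's three nodes can be modelled in a few lines of Java. This is a hypothetical model, not Cloudify's API: `TopologySketch`, `node`, `containedIn` and `rootOf` are illustrative names only.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the blueprint; not Cloudify's API.
class TopologySketch {
    private final Map<String, String> types = new LinkedHashMap<>(); // node name -> node type
    private final Map<String, String> hosts = new LinkedHashMap<>(); // node name -> node it is contained in

    void node(String name, String type) { types.put(name, type); }

    // Record a contained_in relationship between two declared nodes.
    void containedIn(String source, String target) {
        if (!types.containsKey(source) || !types.containsKey(target)) {
            throw new IllegalArgumentException("unknown node in relationship");
        }
        hosts.put(source, target);
    }

    // Follow contained_in relationships up to the outermost node.
    String rootOf(String name) {
        Set<String> seen = new HashSet<>();
        String current = name;
        while (hosts.containsKey(current)) {
            if (!seen.add(current)) throw new IllegalStateException("cycle at " + current);
            current = hosts.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        TopologySketch t = new TopologySketch();
        t.node("host", "cloudify.nodes.libcloud.Compute");
        t.node("tomcat_server", "cloudify.nodes.TomcatServer");
        t.node("mongodb", "cloudify.nodes.MongoDB");
        t.containedIn("tomcat_server", "host");
        t.containedIn("mongodb", "host");
        System.out.println(t.rootOf("tomcat_server") + " " + t.rootOf("mongodb")); // prints host host
    }
}
```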