Big Data Expo 2015 - Gigaspaces Making Sense of it all

BIG DATA|MAKING SENSE OF IT ALL
Author: Christos Erotocritou
christos@gigaspaces.com

Christos Erotocritou www.gigaspaces.com
Agenda
1. High-level view of the big data technology landscape
2. Big Data architecture & integration patterns
3. Complex compound queries
4. Orchestration in Big Data

Making Sense of the Exploding Big Data World
[Diagram labels: SQL, NoSQL, IMDG, Key / Value, Stream Processing, SSD]

Let's look at some tools & technologies

SQL Technologies
• Query: ANSI 92
• Semantics:
  • CRUD
  • Aggregation
  • Projection
  • Partial update
• Performance: 100s/sec
• Consistency: Transactional
• Scaling: Mostly scale-up
• Availability: Disk based

NoSQL Technologies
• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Map/Reduce
  • No projection
  • No partial update
• Performance: 1000s/sec
• Consistency: Eventual
• Scaling: Mostly scale-out
• Availability: Replication based

IMDG Technologies
• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Aggregation API + Map/Reduce
  • Projection
  • Partial update
• Performance: 100k/sec
• Consistency: Transactional
• Availability: Replication based

Key/Value Technologies
• Query: Key, Value
• Semantics:
  • Mostly read
  • No aggregation
  • No projection
  • No partial update
• Performance: millions/sec
• Consistency: Atomic
• Availability: Limited

Stream Processing Technologies
• Semantics:
  • Event-driven data processing
• Performance: tens of millions/sec
• Machine learning
• Real-time analytics
• Depends on external persistency for maintaining state
[Diagram labels: one Spout, four Bolts]

SSD Technology is quickly shaping the Big Data Landscape
Great for read-heavy workloads and initial loads
Store indexes in memory and payload on SSD
SSD-extended in-memory products can provide great performance with increased capacity and persistence
Big data, but also fast data
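
The "indexes in memory, payload on SSD" idea can be sketched in plain Java. This is only an illustration under stated assumptions, not how any SSD-extended product is implemented: the class name IndexedPayloadStore and its methods are hypothetical, and an ordinary temporary file stands in for the SSD.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: keys and offsets stay in memory, payloads go to a file.
class IndexedPayloadStore implements AutoCloseable {
    // In-memory index: key -> {offset in file, payload length in bytes}
    private final Map<String, long[]> index = new HashMap<>();
    // Stand-in for the SSD tier
    private final RandomAccessFile file;

    IndexedPayloadStore(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    // Appends the payload to the file and records where it landed.
    // Rewriting a key appends a new copy; the old bytes are simply orphaned.
    void put(String key, String payload) throws IOException {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);
        file.write(bytes);
        index.put(key, new long[] {offset, bytes.length});
    }

    // A lookup touches memory first and reads from the file only on a hit.
    String get(String key) throws IOException {
        long[] entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] bytes = new byte[(int) entry[1]];
        file.seek(entry[0]);
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    // Small self-contained demo that wraps checked exceptions.
    static String demo() {
        try {
            Path path = Files.createTempFile("payload", ".dat");
            try (IndexedPayloadStore store = new IndexedPayloadStore(path)) {
                store.put("k1", "payload-1");
                store.put("k2", "payload-2");
                return store.get("k2");
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Misses are answered from memory alone, and a hit costs one positioned read, which is the access pattern that suits SSDs.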

Summary
Many APIs, same data
Use-case requirements cut across tools
SSD is shaping the landscape
Can we create a mashup of such technologies?

How can we integrate such technologies and provide a common access API?

A typical Big Data app logical architecture can look like this
[Diagram labels: Front End, Application / Service, Batch Processing, RT Analytics, Service, Storage, Back End]
• Front-end users access a distributed multi-facet service
• Back-end users access business insights and system maintenance metrics

We need a High-Speed Data Store…
Common data store serving multiple semantics/APIs:
• Key / Value
• Document
• Graph
• Map / Reduce
• Transactional
• Stream based
Disk becomes the new tape
But we're not there just yet…
[Diagram labels: Front End, High-speed Data Store, Back End]

We can use IMDG technologies to integrate all our data sources
[Diagram labels: Front End, Back End, High-speed Data Bus (IMDG), MySQL, MongoDB, Hadoop, Storm]
[Connections: Mongo Sync, RDBMS Sync, Hadoop Sync, RT Streaming, Direct Access, RT Transactional Data Access]
[Layers: Batch Layer, Speed Layer, Web Storage Layer]
Data bus:
• Resilient, FT & HA
• Transactional
• High-throughput

Online consumer media service: a real-world use case
[Diagram labels: High-speed Data Bus (IMDG), Storm, Hadoop, Downstream System]
[Connections: Hadoop Sync, RT Streaming, Direct Access (if needed)]
• Purchase order (transactional)
• Fast media search
• State persistency
• Sync new data available to end user
• Real-time business analytics on user activity
• Long-term analytics and storage
• Business analytics

What really goes on in the grid…
• Client writes new data to the space
• Polling container writes to MongoDB
• New data automatically synced to ES
[Diagram labels: Client, New Data, Polling Container, Notify Container, Proc. Data, MongoDB, Mirror DB, Space mirroring, Service, Application, Storage]
Stream Processing Integration
[Diagram labels: Stream Producer, Data Grid, Data Stream (FIFO), Spout, Storm Stream Processing]

Summary
How to create a common data API
Using a high-speed data bus to integrate data sources
SSD can be used for bigger and faster data

Using multiple query semantics

Nested Queries & Projections
A. Query for a Person who lives in New York:
… = new SQLQuery<Person>(Person.class, "address.city = 'New York'");
B. Query for a Dealer that sells a Honda:
… = new SQLQuery<Dealer>(Dealer.class, "cars[*] = 'Honda'");
C. Query for a Person with projections on first and last names:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, new Long[]{id1, id2})
    .setProjections("firstName", "lastName");
Person[] result = space.readByIds(idsQuery).getResultsArray();

Basic Aggregations
A. Create a query that yields a result set:
SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
query.setParameter(1, "UK");
query.setParameter(2, "USA");
B. Perform aggregations on that result set (the functions are static imports from org.openspaces.extensions.QueryExtension):
Integer maxAgeInSpace = max(space, query, "age");
Integer minAgeInSpace = min(space, query, "age");
Integer combinedAgeInSpace = sum(space, query, "age");
Double averageAge = average(space, query, "age");
Employee oldestEmployeeInSpace = maxEntry(space, query, "age");
Employee youngestEmployeeInSpace = minEntry(space, query, "age");
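
To make clear what these aggregations compute, here is a sketch of the same operations over an in-memory list using java.util.stream. It is an analogy only, not the grid API: the class AggregationSketch, its Employee type and the sample data are all hypothetical, and a real grid would evaluate the aggregation next to the data rather than on the client.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class AggregationSketch {
    // Hypothetical entry type with just the fields the example needs.
    static class Employee {
        final String name;
        final String country;
        final int age;

        Employee(String name, String country, int age) {
            this.name = name;
            this.country = country;
            this.age = age;
        }
    }

    // Made-up sample data.
    static List<Employee> sample() {
        return Arrays.asList(
                new Employee("Ann", "UK", 41),
                new Employee("Bob", "USA", 29),
                new Employee("Cid", "FR", 55),
                new Employee("Dee", "UK", 35));
    }

    // Counterpart of the "country=? OR country=?" query.
    static List<Employee> query(List<Employee> all, String c1, String c2) {
        return all.stream()
                .filter((Employee e) -> e.country.equals(c1) || e.country.equals(c2))
                .collect(Collectors.toList());
    }

    // max, min, sum and average of "age" in a single pass.
    static IntSummaryStatistics ageStats(List<Employee> resultSet) {
        return resultSet.stream().mapToInt((Employee e) -> e.age).summaryStatistics();
    }

    // Counterpart of maxEntry: the whole entry with the largest age, or null if empty.
    static Employee oldest(List<Employee> resultSet) {
        return resultSet.stream()
                .max(Comparator.comparingInt((Employee e) -> e.age))
                .orElse(null);
    }
}
```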

Complex & Compound Aggregations
A. Create a query that yields a result set:
SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
query.setParameter(1, "UK");
query.setParameter(2, "USA");
B. Perform grouping and filtering aggregations on that result set:
… = groupBy(space, query, new GroupByAggregator()
    .select(average("salary"), min("salary"), max("salary"))
    .groupBy("department", "gender")
    .having(new GroupByFilter() {
        public boolean process(GroupByValue group) {
            return group.getDouble("avg(salary)") > 18000;
        }
    }));
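
The group-then-filter semantics can be sketched with plain Java collectors. This is an analogy under stated assumptions, not the GroupByAggregator API: the class GroupBySketch, its Employee type and the "department/gender" string key are hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySketch {
    // Hypothetical entry type with just the fields the example needs.
    static class Employee {
        final String department;
        final String gender;
        final double salary;

        Employee(String department, String gender, double salary) {
            this.department = department;
            this.gender = gender;
            this.salary = salary;
        }
    }

    // Made-up sample data.
    static List<Employee> sample() {
        return Arrays.asList(
                new Employee("eng", "F", 20000),
                new Employee("eng", "F", 22000),
                new Employee("eng", "M", 17000),
                new Employee("ops", "M", 19000),
                new Employee("ops", "M", 18000));
    }

    // Groups by department and gender, averages the salary per group,
    // then keeps only groups whose average exceeds the threshold (the "having" step).
    static Map<String, Double> averageSalaryHaving(List<Employee> employees, double minAverage) {
        Map<String, Double> averages = employees.stream()
                .collect(Collectors.groupingBy(
                        (Employee e) -> e.department + "/" + e.gender,
                        Collectors.averagingDouble((Employee e) -> e.salary)));
        return averages.entrySet().stream()
                .filter(entry -> entry.getValue() > minAverage)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

With the sample data and a threshold of 18000, the "eng/M" group (average 17000) is dropped while "eng/F" and "ops/M" remain.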

Fast Update & Change API
A. Performing a change at the data-store level:
IdQuery<Person> idQuery = new IdQuery<Person>(Person.class, id, routing);
space.change(idQuery, new ChangeSet()
    .increment("balance.euro", 5.2D));
B. Performing a series of changes at the data-store level:
IdQuery<Person> idQuery = new IdQuery<Person>(Person.class, id, routing);
space.change(idQuery, new ChangeSet()
    .increment("someIntProperty", 1)
    .set("someStringProperty", "newValue")
    .putInMap("someNestedProperty.someMapProperty", "myKey", 2));
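
The point of a change applied at the data-store level is that the client sends a delta and the store applies it atomically, instead of the client reading the object, modifying it and writing it back. The sketch below illustrates that idea with a ConcurrentHashMap; the class ChangeSketch is hypothetical and is not how the grid implements its Change API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical store holding one numeric balance per id.
class ChangeSketch {
    private final ConcurrentMap<Long, Double> balances = new ConcurrentHashMap<>();

    void write(long id, double balance) {
        balances.put(id, balance);
    }

    // The caller sends only the delta; the store applies it atomically,
    // so concurrent increments are never lost.
    double increment(long id, double delta) {
        return balances.merge(id, delta, Double::sum);
    }

    double read(long id) {
        return balances.getOrDefault(id, 0.0);
    }

    // Four threads each apply 1000 increments of 1.0 to the same entry.
    static double demo() {
        ChangeSketch store = new ChangeSketch();
        store.write(1L, 10.0);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    store.increment(1L, 1.0);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.read(1L);
    }
}
```

A client-side read-modify-write under the same load could overwrite other writers' updates unless it added its own locking or retries.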

The Application Deployment Lifecycle
[Diagram labels: Deploy, Install, Configure, Monitor, Manage, Provision]

Create a Standardised Blueprint of the Application Topology
[Diagram: five nodes of Type: Container, Type: Server, Type: Container, Type: DB and Type: App, linked by "Contained In" and "Connected To" relationships]

Using TOSCA & YAML to Describe the Application Topology
...
host:
  type: cloudify.nodes.libcloud.Compute
  ...
##################################################################################
# Tomcat server
##################################################################################
tomcat_server:
  type: cloudify.nodes.TomcatServer
  relationships:
    - type: cloudify.relationships.contained_in
      target: host
##################################################################################
# MongoDB node as a backend data-store for the example Tomcat application
##################################################################################
mongodb:
  type: cloudify.nodes.MongoDB
  relationships:
    - type: cloudify.relationships.contained_in
      target: host
...

Post-deployment Management & Monitoring

Thanks For Attending
Author: Christos Erotocritou
christos@gigaspaces.com
