Connecting the dots
Operationalizing Big Data
© Stratio 2017. Confidential, All Rights Reserved.
Agenda
1. Where are we now?
2. SAGA and CQRS
3. Usage example
4. Why multi-datacenter?
5. Multi-DC technical dive
6. Questions
Where are we now?
© Stratio 2017. All Rights Reserved.
Once upon a time...
An application handles its own processes and relies on a database for data persistence.
[Diagram: App → DB, with ACID guarantees]
Welcome to Strong consistency systems!
Data and systems grew...
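[Diagram: four Apps and a four-node DB cluster (DB node 1 to DB node 4). Sequence: 1. write; 2. replica; 3. replica; 4. OK; 5. OK; 6. OK. The App receives its OK (step 6) only after the replicas have acknowledged.]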
And we grew more...
Welcome to Eventual consistency systems!
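[Diagram: four Apps and a four-node DB cluster (DB node 1 to DB node 4). Sequence: 1. write; 2. OK; 3. replica; 4. replica; 5. OK; 6. OK. The App receives its OK (step 2) before the replicas are updated.]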
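The two diagrams differ only in when the writer receives its OK. The following is a minimal in-memory sketch of that difference, not how any particular database implements it; `Node`, `write_strong`, `write_eventual` and `drain` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory DB node, used only for illustration."""
    name: str
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        self.data[key] = value


def write_strong(primary, replicas, key, value):
    """Return OK only after every replica has applied the write."""
    primary.apply(key, value)
    for replica in replicas:
        replica.apply(key, value)  # replication happens before the client OK
    return "OK"


def write_eventual(primary, replicas, key, value, pending):
    """Return OK as soon as the first node accepts the write.

    Replication is only queued in `pending`; it is performed later by `drain`.
    """
    primary.apply(key, value)
    for replica in replicas:
        pending.append((replica, key, value))
    return "OK"


def drain(pending):
    """Apply the queued replications (what happens 'eventually')."""
    while pending:
        replica, key, value = pending.pop(0)
        replica.apply(key, value)
```

With `write_eventual`, a read from a replica between the OK and the call to `drain` can still return the old value; that window is what "eventual" refers to.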
Now, a company looks like...
[Diagram: operational applications (Apps on a three-node DB cluster), used by operations personnel, content managers, ...; and, separately, informational applications (Big Data!) on their own three-node DB cluster, used by data science, business intelligence, ...]
Integration is (in most cases) done by people, not by systems!
The situation is...
- The company does not fully exploit the benefits of Big Data/informational systems
  - Data analysis depends on dumps/historical data
- Integrating analytics results with operational applications is a nightmare
  - There are no built-in mechanisms to facilitate it
  - Custom integrations are required and are usually expensive
- Governance and security are split in two
  - This makes maintenance much harder
USAGE SCENARIO
A user manages the prices of the articles in a hypermarket's portfolio. In daily operations the user queries article prices and retrieves the values stored at that time.
The user can modify a price. If the new figure meets the established business rules, the new value is stored; if it does not, the previous value is kept.
Business rules:
● Number
● Greater than zero
● Maximum variation less than or equal to 15%
[Figure: Manual process. Two price changes on an article priced at €3.00: one passes all three rules (→ OK), the other fails the maximum-variation rule (→ KO). Values shown: €3.00, €2.10, final price €3.20.]
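The three business rules can be sketched as a small validation function. This is an illustration under stated assumptions, not Stratio Datacentric's implementation: `validate_price` and `apply_price_change` are hypothetical names, prices are handled as `Decimal`, and the current price is assumed to be a valid positive number.

```python
from decimal import Decimal, InvalidOperation

MAX_VARIATION = Decimal("0.15")  # 15 %, taken from the business rules above


def validate_price(current, proposed):
    """Return (accepted, reason) for a proposed price change.

    Assumes `current` is already a valid, positive price.
    """
    try:
        new = Decimal(str(proposed))
    except (InvalidOperation, ValueError):
        return False, "not a number"
    if not new.is_finite():
        return False, "not a number"
    if new <= 0:
        return False, "not greater than zero"
    old = Decimal(str(current))
    variation = abs(new - old) / old
    if variation > MAX_VARIATION:
        return False, "variation above 15%"
    return True, "ok"


def apply_price_change(current, proposed):
    """Store the new value if it passes the rules, otherwise keep the previous one."""
    accepted, _ = validate_price(current, proposed)
    return Decimal(str(proposed)) if accepted else Decimal(str(current))
```

With the figures from the slide, a change from €3.00 to €3.20 is a variation of about 6.7% and is accepted, while a change from €3.00 to €2.10 is a 30% variation and is rejected, so the stored price stays at €3.00.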
If two or more users work on the price of the same article simultaneously, the system guarantees correct isolation and versioning of the data item: each modification is applied to the latest version, so only one user can apply changes made against a given version.
[Figure: two manual processes edit the same article, whose current price is €3.20, proposing new prices of €3.15 and €3.18. One user is told: "The current price has changed".]
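One common way to obtain this behaviour is optimistic versioning: every change states which version it was based on and is rejected if that version is no longer the latest. The sketch below assumes this technique; the slides do not say how Stratio Datacentric implements it, and `PriceRecord`, `update_price` and `StaleVersionError` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


class StaleVersionError(Exception):
    """Raised when a change was based on an outdated version of the price."""


@dataclass
class PriceRecord:
    price: Decimal
    version: int = 0


def update_price(record, new_price, expected_version):
    """Apply the change only if the caller read the latest version."""
    if record.version != expected_version:
        raise StaleVersionError("The current price has changed")
    record.price = Decimal(new_price)
    record.version += 1
    return record
```

If two users both read version 0 of a €3.20 price, the first update succeeds and moves the record to version 1; the second, still based on version 0, is rejected and that user has to reload the current price. In a real datastore the check and the write would have to be a single atomic operation.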
Through a compensation mechanism, Stratio Datacentric can carry out operations as distributed transactions and later compensate those that did not complete correctly. For example, during price validation of a family of articles, when the prices do not meet the family coherence rule:

Size   Price    Price per litre
1 L    €4.20    €4.20/l
3 L    €11.70   €3.90/l
5 L    €18.25   €3.65/l

The family coherence validation is performed after the prices have been modified (manual process). The compensation mechanism then launches the operations needed to roll the changes back.
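The "modify first, validate afterwards, compensate on failure" flow can be sketched as follows. The slides do not define the family coherence rule, so the rule used here (the price per litre must not increase as the pack gets larger) is an assumption, as are the names `is_family_coherent` and `change_family_prices`.

```python
from decimal import Decimal


def price_per_litre(price, litres):
    return Decimal(price) / Decimal(litres)


def is_family_coherent(family):
    """`family` maps pack size in litres to price.

    Assumed rule (not stated in the slides): the price per litre must not
    increase as the pack gets larger.
    """
    per_litre = [price_per_litre(p, size) for size, p in sorted(family.items())]
    return all(a >= b for a, b in zip(per_litre, per_litre[1:]))


def change_family_prices(stored, changes):
    """Apply each change, validate afterwards, compensate on failure."""
    previous = {}
    for size, new_price in changes.items():
        previous[size] = stored[size]  # remember what to restore
        stored[size] = new_price       # the change is already applied here
    if is_family_coherent(stored):
        return True
    for size, old_price in previous.items():
        stored[size] = old_price       # compensating operation
    return False
```

Under that assumed rule, the family in the table (€4.20/l, €3.90/l, €3.65/l) is coherent. Raising the 5 L price to €22.00 would give €4.40/l, so the change would be applied, fail validation, and be compensated back to €18.25.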
Through a compensation mechanism, Stratio Datacentric either completes the transaction or aborts it; if it is aborted, compensating operations restore the initial state. This happens, for example, during price validation of a family of articles, when the prices do not meet the family coherence rule.
[Figure: Manual process. Current and new prices for the 1 L, 3 L and 5 L articles, shown twice, the second time with the final price. New prices per litre: €1.00/l, €0.85/l, €0.87/l. Other values shown: €5.40, €9.60, €5.10, €10.44.]
In daily operations, many users work concurrently in Stratio Datacentric, and their user experience must not be affected.
When many price modifications have to be applied, the changes are made through an automatic batch process. The file contains numerous records (article, price, date), which Stratio Datacentric must process through the transactional or the operational flow, depending on the established rule.
Batch process, for each record:
- Is the date of the new price newer than the one stored?
  - YES: validate the business rules
    ● Number → OK
    ● Greater than zero → OK
    ● Maximum variation less than or equal to 15% → OK
  - NO: save it with the past records
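The batch decision flow can be sketched as below. This is an illustration only: the dictionary `operational` and the list `analytical` are hypothetical stand-ins for the operational DB and the analytical datastore, the function names are invented, and sending past records straight to the analytical store is an assumption based on the later "Automated process" slide.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def parse_price(raw):
    """Return a finite Decimal, or None when `raw` is not a number."""
    try:
        value = Decimal(raw)
    except (InvalidOperation, TypeError, ValueError):
        return None
    return value if value.is_finite() else None


def passes_rules(current, new):
    """Number, greater than zero, variation of at most 15 %."""
    if new is None or new <= 0:
        return False
    return abs(new - current) / current <= Decimal("0.15")


def process_batch(records, operational, analytical):
    """Route each (article, price, date) record and return the rejected ones.

    `operational` maps article -> (price, date).
    """
    rejected = []
    for article, raw_price, day in records:
        price = parse_price(raw_price)
        stored_price, stored_day = operational[article]
        if day <= stored_day:
            analytical.append((article, raw_price, day))  # past record, kept as received
        elif passes_rules(stored_price, price):
            operational[article] = (price, day)           # transactional flow
            analytical.append((article, price, day))      # replicated copy
        else:
            rejected.append((article, raw_price, day))    # stored value is kept
    return rejected
```

Records are compared against whatever is stored at the moment they are processed, so the order of records within the file matters in this sketch.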
Finally, the two aforementioned operations coexist in Stratio Datacentric:
● Manual (online price changes by users)
● Automatic (price changes through a batch file)
All changes are consolidated in the operational DB and replicated to the analytical datastore, so the user can query and analyse all past information held in Stratio Datacentric.
Manual process
Validate that manual processes (user actions) coexist with automatic processes (batch) in Stratio Datacentric, while also allowing the information entered by both to be queried and analysed.
Data entered manually by the user is sent to Stratio Datacentric, which validates that it complies with the established business rules before storing it. This information is also replicated to the analytical datastore for query and analysis.
[Architecture diagram labels: Manual process, Microservices, Business rule validation, HDFS, Analytics]
Automated process
Rule-based information division: Sparta routes the batch-processed information, based on defined rules, to:
· the Operational DB through the transactional flow, from where it is also replicated to the Analytical datastore
· the Analytical datastore directly
[Architecture diagram labels: Batch process, Microservices, Business rule validation, HDFS, Analytics]
Operational and Analytics coexistence
Finally, the three operations (transactional, batch and analytical) must be validated to coexist in Stratio Datacentric.
[Architecture diagram labels: Manual process, Batch process, Rule-based information division, Microservices, Business rule validation, HDFS, Analytics]
[Architecture diagram, shown twice: User Interface; Micro-services framework with Command Service, Query Service and SAGA Service, connected through a Command Bus (commands) and an Event Bus (events); Streaming; Elastic and HDFS datastores; Security and Governance; Data-Centric deployed across several DCs]
SAGA and CQRS
CQRS PATTERN
MICROSERVICES -> MICROTRANSACTIONS
SAGA Pattern
SAGA compensation
Product
Invoicing
Emailing
...
An error occurred while preparing the invoice for our customer… The email was sent in parallel… We need to trigger a compensation mechanism for the email:
- Send another email asking for the missing data
- Cancel the purchase and send an apology email
- ...
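The idea of pairing each step with a compensating action can be sketched as follows. This is a simplified, sequential illustration (the slide describes steps running in parallel), and `run_saga` is a hypothetical helper, not part of any Stratio API.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

For the purchase on the slide, the email step would be registered with a compensation such as "send an apology email"; when invoicing then fails, that compensation runs because the email itself cannot be unsent.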
ROLL-OUT ARCHITECTURE
ROLL-OUT ARCHITECTURE - CONCURRENCE
ROLL-OUT ARCHITECTURE - ONLINE
ROLL-OUT ARCHITECTURE - BATCH
simplicity
What happens with Critical apps?
“An application important to keep the business running”
What is Multi-DC?
[Diagram: two locations, Location 1 and Location 2, each running its own Apps and DBs]
Applications on Multi-DC
[Diagram: Apps running in Location 1 and Location 2, with coordination between the two locations]
Depending on the application we can guarantee different execution scenarios:
- Active-Active: the application is replicated and executed in parallel
- Active-Passive: there is an active part and a stand-by part that keeps the same state in case of failover
- Active with failover mechanisms: applications can self-heal and achieve resiliency
Multi-DC datastores
[Diagram: DBs in Location 1 and Location 2, with data replication between the two locations]
A multi-DC setup needs to guarantee a copy of our data in both datacenters.
Example: Cassandra
[Diagram: a client write, followed by replication to other nodes]
Its configuration can be tuned to achieve different consistency levels.
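The arithmetic behind tunable consistency can be sketched in a few lines. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The function names and the replication factors used in the note below are illustrative assumptions, not values from the slides.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas, read_replicas, replication_factor):
    """True when every read set overlaps every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor
```

For example, assuming two datacenters with three replicas each, a quorum over all six replicas needs four acknowledgements and therefore has to cross datacenters, while a quorum local to one datacenter needs only two. Cassandra exposes this choice through consistency levels such as `QUORUM` and `LOCAL_QUORUM`.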
Stratio EOS (PaaS)
Multi-DC PaaS
[Diagram: Location 1 and Location 2, each running Apps and DBs]
Multi-DC PaaS - Traffics
[Diagram: Masters 1 to 3, Gosec 1 to 3 and Agents 1 to N+M spread across Location 1 and Location 2, with three kinds of traffic between them: Management & Control, Security, and Data & Application traffic]
Logical network setup - Real scenario
So… Multi-DC is exactly the same as one data center?
Reality...
[Diagram: Location 1 and Location 2, each running Apps and DBs; the link between them raises the question: bandwidth/latency?]
Distributed Applications and latency
[Diagram: Masters 1 to 3, Gosec 1 to 3 and Agents 1 to N+M across Location 1 and Location 2, with Management & Control, Security, and Data & Application traffic crossing between the locations]
Considerations
Data locality
Failover on disaster:
- Container failure
- Node failure
- Datacenter failure
[Diagram: Masters 1 to 3, Gosec 1 to 3 and Agents 1 to N+M across Location 1 and Location 2, with Management & Control, Security, and Data & Application traffic]
Advantages of using a PaaS with Multi-DC
- Simplicity
- Recovery mechanisms
  - Application, node and DC
- Network-defined infrastructure
  - ACLs to define the network
- Upgrades and new deployments are easy to set up
- Security:
  - Encrypted communication
  - Centralized definition of access policies
  - Enforcement of access rules
- Multi-tenant with resource isolation
Conclusions
- Multi-datacenter is complex
  - Requires coordination
  - Requires awareness of the topology in the PaaS and, in some cases, in the applications
- A PaaS that manages the deployment and failure logic is necessary for easier operations
- By understanding and applying different consistency levels, we can use the right one for each need
  - Strong consistency
  - Eventual consistency
- We can combine operational and informational processes, and do it with Multi-DC