Operationalizing Big Data

Our data lake is full of data, our business intelligence is squeezing every byte of information, and our operational applications are just great… so why do I still feel I can do better? Having big data gives you a competitive advantage, but using big data in your daily operations gives you much more. Taking the best of both worlds, we aim for systems in which big data analysis is performed on operational data in real time and our applications embed the extracted intelligence in their everyday operations. The good news is that combining both is perfectly possible using a data-centric approach together with well-known industry patterns and a few good practices.
By: Nacho Mulas

Published in: Technology
Operationalizing Big Data

  1. Connecting the dots: Operationalizing Big Data
  2. Agenda: 1. Where are we now? 2. SAGA and CQRS 3. Usage example 4. Why multi-datacenter? 5. Multi-DC technical dive 6. Questions (© Stratio 2017. Confidential, All Rights Reserved.)
  3. Where are we now?
  4. Once upon a time... An application handled its processes and relied on a database for data persistence. [Diagram: App, DB, ACID] Welcome to strong consistency systems!
  5. Data and systems grew... [Diagram: several Apps writing to a four-node DB cluster. Sequence: 1. write, 2. replica, 3. replica, 4. OK, 5. OK, 6. OK]
  6. And we grew more... Welcome to eventual consistency systems! [Diagram: several Apps writing to a four-node DB cluster. Sequence: 1. write, 2. OK, 3. replica, 4. replica, 5. OK, 6. OK]
  7. Now, a company looks like... [Diagram labels: Operational applications (Apps on a three-node DB cluster); Informational applications (Big Data!) (Apps on a three-node DB cluster); Data Science, Business Intelligence, ...; Operations personnel, Content managers, ...] Integration is (in most cases) done by people, not by systems!
  8. The situation is... - The company does not exploit the benefits of Big Data/informational systems - Data analysis depends on dumps/historical data - Integrating analytics results with operational applications is a nightmare - There are no built-in mechanisms to facilitate it - Custom integrations are required and are usually expensive - Governance and security are split in two parts - This makes maintenance much harder
  9. USAGE SCENARIO
  10. USAGE SCENARIO (manual process). A user manages the prices of articles in the portfolio of a hypermarket. In daily operations the user queries article prices, retrieving the information stored at that time. The user can modify a price; once the new figure has been validated against the established business rules, the new value is stored. If the business rules are not met, the previous value is kept. Business rules: ● Number ● Greater than zero ● Maximum variation less than or equal to 15%. [Figure: two price-change examples with columns Price, New Price and Final price, one passing all three rules (OK) and one failing the maximum-variation rule (KO); values shown: €3.00, €3.00, €2.10, €3.20]
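The three business rules on this slide can be sketched as a small validator. This is a minimal sketch, not Stratio Datacentric code: the names `validate_new_price` and `apply_price_change` are hypothetical, the variation is assumed to be measured relative to the current price, and the sample values read the slide's flattened figure as a €3.00 price changed to €3.20 (accepted) or to €2.10 (rejected), which is an interpretation.

```python
from decimal import Decimal, InvalidOperation

# Rule 3 on the slide: maximum variation less than or equal to 15%.
MAX_VARIATION = Decimal("0.15")


def validate_new_price(current, proposed):
    """Return (ok, reason) for a proposed price.

    Assumption: `current` is an already-stored, valid, positive price.
    """
    # Rule 1: the proposed value must be a number.
    try:
        new = Decimal(str(proposed))
    except InvalidOperation:
        return False, "not a number"
    if not new.is_finite():
        return False, "not a number"
    # Rule 2: the proposed value must be greater than zero.
    if new <= 0:
        return False, "not greater than zero"
    # Rule 3: relative variation against the current price (assumed basis).
    cur = Decimal(str(current))
    if abs(new - cur) / cur > MAX_VARIATION:
        return False, "variation above 15%"
    return True, "ok"


def apply_price_change(current, proposed):
    """Store the new value if the rules pass, otherwise keep the previous one."""
    ok, _reason = validate_new_price(current, proposed)
    return Decimal(str(proposed)) if ok else Decimal(str(current))
```

Under these assumptions, a change from 3.00 to 2.10 is a 30% variation, so `apply_price_change("3.00", "2.10")` keeps the previous value.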
  11. USAGE SCENARIO (concurrency). If two or more users work simultaneously on the price of the same article, the system guarantees correct isolation and versioning of the data item: a modification is applied only to the latest version, so only one user can apply changes. [Figure: two manual processes read a current price of €3.20 and submit new prices of €3.15 and €3.18; one of them receives the message "The current price has changed"]
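The versioning behaviour described here resembles optimistic concurrency control: each update states which version it was based on and is rejected if the stored version has moved on. The sketch below is an in-memory illustration under that assumption; `PriceStore`, `VersionedPrice` and `StaleVersionError` are hypothetical names, and the slides do not say how Stratio Datacentric implements the check.

```python
import threading
from dataclasses import dataclass


class StaleVersionError(Exception):
    """Raised when an update is based on an outdated version."""


@dataclass(frozen=True)
class VersionedPrice:
    price: str
    version: int


class PriceStore:
    """In-memory store that accepts an update only against the latest version."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_initial(self, article, price):
        with self._lock:
            self._items[article] = VersionedPrice(price, 1)

    def read(self, article):
        with self._lock:
            return self._items[article]

    def update(self, article, new_price, expected_version):
        # Compare-and-set: the check and the write happen under one lock.
        with self._lock:
            current = self._items[article]
            if current.version != expected_version:
                raise StaleVersionError("The current price has changed")
            updated = VersionedPrice(new_price, current.version + 1)
            self._items[article] = updated
            return updated
```

If two users both read version 1 at €3.20, the first `update` succeeds and bumps the version; the second, still quoting version 1, raises `StaleVersionError` and the user has to re-read the price.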
  12. USAGE SCENARIO (compensation). Through the compensation mechanism of Stratio Datacentric it is possible to carry out operations as distributed transactions and later compensate those that did not complete correctly, for example when, during price validation of a family of articles, the prices do not meet the family coherence. The family coherence validation is performed after the prices have been modified; the compensation mechanism then launches the operations needed to roll back. [Figure: manual process on an article family: 1 L at €4.20 (€4.20/l), 3 L at €11.70 (€3.90/l), 5 L at €18.25 (€3.65/l)]
  13. USAGE SCENARIO (compensation). Through the compensation mechanism of Stratio Datacentric the transaction is either completed or aborted; if it is aborted, it is compensated with the operations needed to restore the initial state, for example when, during price validation of a family of articles, the prices do not meet the family coherence. [Figure: manual process on the 1 L, 3 L and 5 L articles with columns Price, New Price, New Price per Litre and Final price; values shown: €5.40, €9.60, €1.00, €1.00/l, €0.85/l, €0.87/l, €5.10, €10.44]
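The "modify first, validate afterwards, compensate on failure" flow can be sketched as follows. The slides do not define family coherence, so the rule used here (the price per litre must not increase as the pack size grows) is an assumption, as are the names `is_coherent` and `update_family_prices`; the starting prices are the ones shown for the 1 L, 3 L and 5 L articles.

```python
from decimal import Decimal


def is_coherent(family):
    """family maps pack size in litres to price.

    Assumed rule: the per-litre price must not rise as the pack size grows.
    """
    per_litre = [family[size] / size for size in sorted(family)]
    return all(a >= b for a, b in zip(per_litre, per_litre[1:]))


def update_family_prices(store, new_prices):
    """Apply each price change, validate afterwards, compensate on failure.

    Returns True if the changes were kept, False if they were rolled back.
    """
    compensations = []
    for size, price in new_prices.items():
        compensations.append((size, store[size]))  # remember how to undo
        store[size] = price
    if is_coherent(store):
        return True
    # Compensate in reverse order to restore the initial state.
    for size, old_price in reversed(compensations):
        store[size] = old_price
    return False
```

Starting from 4.20, 11.70 and 18.25, raising the 5 L article to 21.00 gives €4.20/l for the largest pack, which breaks the assumed rule, so every change in that request is undone.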
  14. USAGE SCENARIO (concurrency). In daily operations many users work concurrently in Stratio Datacentric, and their user experience must not be affected.
  15. USAGE SCENARIO (batch process). Multiple price modifications are applied through an automatic batch process. The file contains numerous records (Article, Price, Date) that Stratio Datacentric must process through the transactional or the operational flow, depending on the established rule. [Flow: Is the date of the new price newer than the one stored? YES: validate business rules (● Number ● Greater than zero ● Maximum variation less than or equal to 15%). NO: save as past records.]
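The batch decision on this slide can be sketched as a loop over the file's records. This is an illustration only: `process_batch` is a hypothetical name, the record layout `(article, price, date)` follows the slide's column labels, the rule check is passed in as a callable, and treating a record with the same date as the stored one as a past record is an assumption.

```python
from datetime import date


def process_batch(stored, records, validate):
    """Route each batch record by date, then by business rules.

    stored:   dict mapping article -> (price, date of that price)
    records:  iterable of (article, price, date) tuples from the batch file
    validate: callable(current_price, new_price) -> bool
    Returns the records set aside as past (historical) data.
    """
    past_records = []
    for article, price, day in records:
        current_price, current_day = stored[article]
        if day <= current_day:
            # NO branch: not newer than the stored price, keep as history.
            past_records.append((article, price, day))
        elif validate(current_price, price):
            # YES branch and rules pass: the new price becomes current.
            stored[article] = (price, day)
        # YES branch but rules fail: the previous value is kept.
    return past_records
```

A rejected record leaves the stored price untouched, matching the manual flow where the previous value is kept when the rules are not met.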
  16. USAGE SCENARIO. Finally, the two aforementioned operations coexist in Stratio Datacentric: ● Manual (online price change by users) ● Automatic (price changes through a batch file). All changes are consolidated in the Operational DB and replicated to the Analytical Datastore, so the user can query and analyse all past information held in Stratio Datacentric. [Diagram labels: Batch process, Manual process]
  17. Manual process. Goal: validate the coexistence of manual processes (user action) with automatic processes (batch) in Stratio Datacentric, while also facilitating querying and analysis of the information entered by both. The data entered manually by the user is sent to Stratio Datacentric, which validates that it fulfils the established business rules before storing it. This information is also replicated to the analytical datastore for query and analysis. [Diagram labels: Manual process, Business rule validation, Microservices architecture, HDFS, Analytics]
  18. Automated process: rule-based information division. The batch-processed information is routed in Sparta, based on defined rules, to: · the Operational DB through the transactional flow, from where it is also replicated to the Analytical datastore · the Analytical datastore directly. [Diagram labels: Batch process, Business rule validation, Microservices architecture, HDFS, Analytics]
  19. Operational and analytics coexistence. Finally, the three operations (transactional, batch and analytical) must be validated to coexist in Stratio Datacentric. [Diagram labels: Manual process, Batch process, Rule-based information division, Business rule validation, Microservices architecture, HDFS, Analytics]
  20. [Architecture diagram labels: User Interface, Streaming, EVENT, MICRO-SERVICES FRAMEWORK, Event Bus, Elastic, SAGA Service, Query Service, Command Service, Command Bus, HDFS, COMMAND, SECURITY AND GOVERNANCE, DATA-CENTRIC (DC)]
  21. [Architecture diagram labels: User Interface, MICRO-SERVICES FRAMEWORK, Event Bus, Elastic, SAGA Service, Query Service, Command Service, Command Bus, HDFS, SECURITY AND GOVERNANCE, DC]
  22. SAGA and CQRS
  23. CQRS PATTERN
  24. MICROSERVICES -> MICROTRANSACTIONS
  25. SAGA Pattern
  26. SAGA compensation [Diagram: Product, Invoicing, Emailing, ...]
  27. SAGA compensation [Diagram: Product, Invoicing, Emailing, ...] An error occurred while preparing the invoice for our customer… The email was sent in parallel… We need to trigger a compensation mechanism for the email: - Send another email asking for the missing data - Cancel the purchase and send an apology email - ...
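The invoice and email example can be sketched as a generic saga runner: each step carries its own compensation, and when a step fails the compensations of the steps already completed run in reverse order. This is a simplified sketch with a hypothetical `run_saga` function; it executes steps sequentially, whereas the slide describes the email being sent in parallel, and it is not the SAGA Service shown in the architecture diagrams.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, the compensations of all completed steps are run
    in reverse order and False is returned; otherwise True is returned.
    """
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo what already happened, most recent step first.
            for _done_name, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensation))
    return True
```

A step's compensation does not have to be a literal undo. An email cannot be unsent, so its compensation would be a follow-up action such as the apology email mentioned on the slide.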
  28. ROLL-OUT ARCHITECTURE
  29. ROLL-OUT ARCHITECTURE - CONCURRENCE
  30. ROLL-OUT ARCHITECTURE - ONLINE
  31. ROLL-OUT ARCHITECTURE - BATCH
  32. simplicity
  33. What happens with critical apps?
  34. “An application important to keep the business running”
  35. What is Multi-DC? [Diagram: Location 1 and Location 2, each with Apps and DBs]
  36. Applications on Multi-DC. Depending on the application, we can guarantee different execution scenarios: - Active-Active: the application is replicated and executed in parallel - Active-Passive: there is an active part and a stand-by part that keeps the same state in case of failover - Active with failover mechanisms: applications can self-heal and achieve resiliency. [Diagram: Apps in Location 1 and Location 2, linked by Coordination]
  37. Multi-DC datastores. A multi-DC setup needs to guarantee a copy of our data in both datacenters. [Diagram: DBs in Location 1 and Location 2, linked by Data replication]
  38. Example: Cassandra. Its configuration can be changed to achieve different consistency levels. [Diagram: Client, Write, replication, replication]
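One way to reason about those consistency levels is the replica-overlap rule: a read is guaranteed to reach at least one replica holding the latest acknowledged write when the number of replicas that acknowledged the write plus the number of replicas read exceeds the replication factor. The sketch below is plain arithmetic with hypothetical function names, not Cassandra driver code.

```python
def quorum(replication_factor):
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_acks, read_replicas, replication_factor):
    """True when every read set must intersect every acknowledged write set."""
    return write_acks + read_replicas > replication_factor
```

With a replication factor of 3, QUORUM writes combined with QUORUM reads (2 + 2 > 3) overlap and behave as strongly consistent, while ONE combined with ONE (1 + 1) does not, which is the eventual consistency case.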
  39. Stratio EOS (PaaS)
  40. Multi-DC PaaS [Diagram: Location 1 and Location 2, each with Apps and DBs]
  41. Multi-DC PaaS - Traffic [Diagram: Location 1 and Location 2; Master 1, Master 2, Master 3; Gosec 1, Gosec 2, Gosec 3; Agent 1, Agent 2, ... Agent N, Agent N+1, Agent N+2, ... Agent N+M; traffic types: Management & Control, Security, Data & Application traffic]
  42. Logical network setup - Real scenario
  43. So… Multi-DC is exactly the same as one data center?
  44. Reality... [Diagram: Location 1 and Location 2, each with Apps and DBs; the link between them is labelled Bandwidth/latency?]
  45. Distributed applications and latency [Diagram: same topology and traffic types as slide 41]
  46. Considerations: data locality; failover on disaster: - Container failure - Node failure - Datacenter failure [Diagram: same topology and traffic types as slide 41]
  47. Advantages of using a PaaS with Multi-DC - Simplicity - Recovery mechanisms - Application, node and DC - Network-defined infrastructure - ACLs to define the network - Upgrades and new deployments are easy to set up - Security: - Encrypted communication - Centralized definition of access policies - Enforcement of access rules - Multi-tenancy with resource isolation
  48. Conclusions - Multi-datacenter is complex - Requires coordination - Requires awareness of the topology in the PaaS and, in some cases, in the applications - Having a PaaS to manage the deployment and failure logic is necessary for easier operations - By understanding the different consistency levels, we can apply the right one for each need - Strong consistency - Eventual consistency - We can combine operational and informational processes, and do it with Multi-DC
