Multi datastores - CLOSER'14


As part of the MODAClouds and JUNIPER FP7 EU projects, we discuss our ongoing work on modelling big data stores.

Published in: Internet, Technology

  1. Multi-cloud and multi-data stores: The challenges behind heterogeneous data models
     Marcos Almeida, Andrey Sadovykh, SOFTEAM | ModelioSoft
     CLOSER’14
  2. SOFTEAM – We are a French IT services / Software vendor
     • SOFTEAM, a growing company
       o 20 years’ experience
       o 700 experts
       o Regular growth
     • Specialist in OO technologies, new architectures, methodologies
     • Banking, Defense, Telecom, …
     Map: Paris, Rennes, Nantes, Sophia
     Growth chart: 17,5 ME (2005), 20 ME (2006), 23 ME (2008), 60 ME (2012)
  3. Modelio is a modelling tool for Software and Systems Engineering
     • UML editor with 20 years’ history
       o CloudML
       o SysML
       o MARTE
       o Code generation
       o Documentation
       o Teamwork
     • Available under open source at
  4. Our problem? Heterogeneity
     • Multi-cloud applications
     • Different providers = different data stores
     • Heterogeneous data models!
     • Practical example
       o Modelio SaaS = Modelling as a Service
       o Traditional relational data
         • Users, roles, projects, services, billing…
       o Challenge: How to store models?
         • Current version is SVN
  5. Context: Two projects researching (multi-)clouds and big data
     • MODAClouds - 318484
     • JUNIPER - 318763
  6. MODAClouds: MDE to avoid vendor lock-in
     • Problem:
       o The main keyword: Multi-clouds
       o Multiplication of cloud providers
       o Threats: Multiplication of platforms, leading to vendor lock-in
       o Opportunities: MODACloudsML to reduce vendor lock-in
     • Our role
       o Case study provider: Modelio as a Service
       o Technology provider: Modelling applications independently from the cloud
     Diagram (model layers):
       o Service Oriented Architecture based model: Service A, Service B, Interface I (<<required>>, <<provided>>)
       o Cloud specific concepts model: Service A (Deployment: PaaS), NoSQL Store, Task Queue
       o Cloud provider specific parameters model: Service A (Deployment: Google App Engine), BigTable Store, Google Task Queue
       o Deployable source code: .WAR file, scripts, configuration
  7. JUNIPER: MDE for real-time applications
     • Problem
       o The main keyword: Big Data
         • Multiple streams of data + Multiple data types + Real-time constraints
       o Current state of the art: NoSQL
         • Pros
           – Optimized for non-relational data
           – Optimized for answering simple queries as fast as possible!
         • Cons
           – The code is “the model”
           – Multiplication of NoSQL databases, paradigms and approaches
     • Our role
       o Technology provider: Modelling real-time big data applications
     Diagram: Business Objects (UML); Big Data Structure Models (e.g. Document based Data Model); Code (Deployment scripts, Data Access code)
  8. The main problem is FRAGMENTATION!
     • Many different database management systems
       o Ex:
         • MySQL
         • Big Table
         • SimpleDB
         • Memcached
         • …
     • Many underlying data representation paradigms
       o Ex:
         • Relational Databases
         • Key-value Stores
         • Object-oriented Databases
         • Big Tables
  9. The basis of our solution is MDE… Why?
     • Separating the problem from the solution
       o In MODAClouds we model the problem
       o In JUNIPER we model the solution
     • Fostering automation
       o Analysis
       o Code generation
     Diagram: Abstract Models (Business Objects) are mapped by Transformations to Specific Models / code (HDFS, MySQL, MongoDB)
  10. What do we get from MDE?
      • Pros
        o Design data once, store everywhere!
        o Write your transformation once, transform anything!
      • Cons
        o Transformations are hard to write…
        o How to make sure they are CORRECT? i.e.
          – Is there any data/semantic loss?
  11. Understanding the problem… Why is it so HARD? (1/2)
      • Target technologies are based on different paradigms
      • Example (diagram: A, B):
        o JPA: @Entity public class A { @Basic public B getB() { … } … }
        o SQL: create table A (…) create table B (…) create table A_B (…)
  12. Understanding the problem… Why is it so HARD? (2/2)
      • Target structure is variable
      • Example (diagram: A, B):
        o ER: Here A and B are independent entities
        o NoSQL: Here, for performance reasons, B is embedded in A
  13. Before modelling we need to understand what to model!
      • That’s the objective of this work!
        o Existing databases
        o Supported concepts
        o What are the trade-offs?
  14. How? Identifying the main concepts and the related expressiveness trade-offs
      • Concepts
      • Trade-offs
        – Expressiveness:
          • What can or cannot one “say” in each database?
        – Performance:
          • What kinds of query are usually cheaper in each database?
        – What’s the cost of going from a database that supports concept A to one that supports concept B?
  15. What? Identify the differences in the data models supported by different data stores
  16. Why? To propose a cloud-independent model of the application data
      Example from Modelio SaaS
  17. Why? To support mapping cloud-independent data types into specific ones
      • Current situation
        o EXML
        o HTTP
        o RAMC
      • Future
        o NoSQL database
  18. Conclusion
      • Context: Multiplication of …
        o cloud providers, cloud data stores, data representation paradigms
      • If you are a developer:
        o How to design my application in a cloud provider independent way?
          • OK, this doesn’t exist…
        o What do I lose or gain when going from provider A to provider B?
          • Expressiveness
          • Performance
  19. Future Work
      • MODAClouds:
        o Cloud independent Data Model
      • JUNIPER:
        o Business Object Model
          • Targets Java 8
        o Persistence Management
  20. Thank you for your attention!
      Marcos Almeida, SOFTEAM | ModelioSoft
      SOFTEAM R&D Web Site: http://
      ModelioSoft Web Site:
  21. Modeling solutions.
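
Slides 9, 11 and 12 describe one abstract model being transformed into store-specific structures: separate tables plus a join table in SQL, or B embedded in A in a document store. The sketch below illustrates that idea only; it is not the MODAClouds or JUNIPER tooling. The names `Entity`, `Association`, `to_relational` and `to_document` are hypothetical, the column types are placeholders, and it assumes associations contain no cycles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """A business object in the abstract, store-independent model (hypothetical)."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Association:
    """A directed link: each `source` refers to `target` objects (hypothetical)."""
    source: Entity
    target: Entity


def to_relational(entities, associations):
    """Relational mapping: one table per entity, one join table per association."""
    statements = []
    for entity in entities:
        columns = ["id INTEGER PRIMARY KEY"] + [f"{a} TEXT" for a in entity.attributes]
        statements.append(f"CREATE TABLE {entity.name} ({', '.join(columns)});")
    for assoc in associations:
        s, t = assoc.source.name, assoc.target.name
        statements.append(f"CREATE TABLE {s}_{t} ({s}_id INTEGER, {t}_id INTEGER);")
    return statements


def to_document(entity, associations):
    """Document mapping: associated entities are embedded in their owner.

    Assumes the association graph has no cycles; a cycle would recurse forever.
    """
    layout = {attr: "string" for attr in entity.attributes}
    for assoc in associations:
        if assoc.source is entity:  # identity check: same model element
            layout[assoc.target.name] = [to_document(assoc.target, associations)]
    return layout


# The same abstract model, rendered for two different kinds of store.
a = Entity("A", ["title"])
b = Entity("B", ["label"])
links = [Association(a, b)]

for statement in to_relational([a, b], links):
    print(statement)
print(to_document(a, links))
```

The abstract model is written once and each target gets its own transformation, which matches the "design data once, store everywhere" claim on slide 10. Checking that no information is lost between the two outputs (the correctness question on the same slide) is not addressed by this sketch.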