Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics

My presentation at the MODELS 2016 conference on characterizing engineering models using network theory.

Published in: Engineering

  1. Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics
     Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró
     Budapest University of Technology and Economics, Department of Measurement and Information Systems, Fault Tolerant Systems Research Group
     MTA-BME Lendület Research Group on Cyber-Physical Systems
  3. Motivation
     • Research community
       o Problems of experimental evaluation of MDE papers
       o Difficult to find real industrial models
     • Tool providers
       o Test generation for modeling tools
       o Scalability evaluation and stress testing of MDE tools
     • Smart CPS
       o Synthesis of prototypical test context/environment
       o Testing of autonomous robots (R3COP project)
     How to automatically synthesize graph models…?
  5. Research Question and Objectives
     How to automatically synthesize graph models which are...
     • Consistent
       o All well-formedness constraints satisfied
       o Designated seed fragments included
     • Realistic
       o How to characterize realistic models?
       o How to distinguish real and generated models?
     • Diverse
       o Guaranteed test coverage
       o Required for tool qualification
     • Scalable
       o Performance benchmarks
       o Stress testing of tools and control algorithms
  13. Performance Experiments
      • “I would like to benchmark my tool on real models.”
        o Industrial models are difficult to obtain.
      • Workaround #1: “Never mind, my tool has very good performance for the TTC 2038 case.”
        o Great, but what does that imply for real use cases?
      • Workaround #2: Implement a custom benchmark.
        o Again, what does that imply for real use cases?
      • A qualitative description of models is required.
  14. How to Obtain Models for Benchmarking?
      • Industrial
        o Difficult to obtain
        o Obfuscated models
      • Student work
        o Quality of models?
      • Tutorial
        o Good quality models
        o Small in size
      • Generated
        o How realistic are these models?
  15. What Makes a Model Realistic?
      How to decide if a model is realistic without domain-specific knowledge?
  17. Statecharts with Attributes
      [Figure: two statecharts, each with the states Red, Red & Orange, Green, Orange]
  19. Statecharts
      [Figure: two statecharts, each with the states S1, S2, S3, S4]
  21. Typed Graphs of the Models
      [Figure: two typed graphs, each with the nodes S1–S4, T1–T5, and E]
      Which is the graph of a real model?
  22. Graph Metrics
      Use graph metrics to characterize the graph of the model.
  34. Graph Metrics (One-Dimensional Graph Metrics)
      [Figure: one axis per metric]
      • Number of vertices (axis 0 to 25)
      • Number of edges (axis 0 to 40)
      • Average shortest path (axis 0 to 6)
      • Clusteredness (axis 0 to 1)
      • Centrality (axis 0 to 1)
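The one-dimensional metrics named on this slide can be computed on a small untyped graph as a rough illustration. The sketch below is not the authors' tooling: it treats the graph as undirected, reads "clusteredness" as the average local clustering coefficient, and reads "centrality" as degree centrality. Both readings, and all function names, are assumptions made for this example.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected, untyped adjacency map; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def simple_metrics(edges):
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2

    # Average shortest path over all reachable ordered pairs (s, t), s != t.
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    avg_path = total / pairs if pairs else 0.0

    def local_clustering(v):
        # Fraction of neighbor pairs of v that are themselves connected.
        nb = list(adj[v])
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        return 2 * links / (k * (k - 1))

    avg_clustering = sum(local_clustering(v) for v in adj) / n if n else 0.0
    # Degree centrality: degree normalized by the maximum possible degree.
    centrality = {v: len(adj[v]) / (n - 1) for v in adj} if n > 1 else {}
    return {
        "vertices": n,
        "edges": m,
        "avg_shortest_path": avg_path,
        "avg_clustering": avg_clustering,
        "degree_centrality": centrality,
    }


# A triangle a-b-c with one pendant vertex d attached to c.
metrics = simple_metrics([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

Because every one of these values depends only on the shape of the untyped graph, two isomorphic graphs always receive identical scores.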
  38. Graph Metrics
      [Figure: two typed graphs, each with the nodes S1–S4, T1–T5, and E]
      Which is the graph of a real model?
      • They are isomorphic.
      • Related finding: simple graph metrics are unable to predict query performance.
  41. Network Theory
      • Mid ’90s, Albert-László Barabási et al.
        o Preferential attachment: “the rich get richer”
      • Scale-free networks (web, power grid, etc.)
      • Most approaches only consider untyped graphs.
      [Figure: two graphs, each with the nodes S1–S4, T1–T5, and E]
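Preferential attachment can be illustrated with a short generator in the style of the Barabási–Albert model. This is a minimal sketch, not a generator used in the presentation: the function name, the seed clique, and the parameters `n`, `m`, and `seed` are choices made for this example. Each new node connects to `m` distinct existing nodes, picked with probability proportional to their current degree.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Return the edge list of an untyped graph with n nodes (0..n-1)."""
    if m < 1 or n <= m:
        raise ValueError("need m >= 1 and n > m")
    rng = random.Random(seed)
    # Seed graph (assumption): a clique on the first m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in the pool once per incident edge, so a uniform
    # draw from the pool is a degree-proportional draw over nodes.
    pool = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in sorted(targets):
            edges.append((t, new))
            pool.extend((t, new))
    return edges


edges = preferential_attachment(50, 2, seed=1)
degrees = Counter(node for edge in edges for node in edge)
```

The generator produces an untyped graph only, which is the limitation the slide points out: edge types play no role in how the graph grows.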
  42. “Evaluation of Multidisciplinary Graph Metrics”
      • Typed graph (computer science)
      • Multi-layered networks (social network analysis)
      • Multidimensional networks (network theory)
      • Multiplex networks (physics)
      Source: Wikipedia, Multidimensional network
  43. Multidimensional Metrics
      • Dimensional degree distributions
      • Node dimension connectivity
        o the ratio of nodes in the graph that belong to a dimension
      • Multiplex participation coefficient
        o measures how uniformly the connections of a node v are distributed among the dimensions D
      • Node activity & pairwise multiplexity
        o the ratio of nodes that are active in both dimensions d1 and d2
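A small worked example may make these definitions concrete. The sketch below represents a typed graph as (source, target, dimension) triples and counts degrees as if edges were undirected. The participation coefficient uses the formula common in the multiplex network literature, P(v) = D/(D-1) · (1 - Σ_d (k_d(v)/k(v))²), which is assumed here to be the one the slide refers to. The function names and the example graph are hypothetical.

```python
from collections import defaultdict


def dimensional_degrees(edges):
    """Degree of every node per dimension (edge type)."""
    deg = defaultdict(lambda: defaultdict(int))
    for u, v, d in edges:
        deg[u][d] += 1
        deg[v][d] += 1
    return deg


def active_nodes(edges, dim):
    """Nodes with at least one edge in the given dimension."""
    return {n for u, v, d in edges if d == dim for n in (u, v)}


def node_dimension_connectivity(edges, nodes, dim):
    """Ratio of nodes that belong to (are active in) a dimension."""
    return len(active_nodes(edges, dim)) / len(nodes)


def pairwise_multiplexity(edges, nodes, d1, d2):
    """Ratio of nodes that are active in both d1 and d2."""
    both = active_nodes(edges, d1) & active_nodes(edges, d2)
    return len(both) / len(nodes)


def multiplex_participation(deg_v, dims):
    """1.0 if the edges of a node are spread evenly, 0.0 if all in one dimension."""
    D = len(dims)
    total = sum(deg_v.get(d, 0) for d in dims)
    if D < 2 or total == 0:
        return 0.0
    return D / (D - 1) * (1 - sum((deg_v.get(d, 0) / total) ** 2 for d in dims))


# Hypothetical statechart-like graph: region R contains two states and one
# transition; the transition T1 has a source state and a target state.
nodes = {"R", "S1", "S2", "T1"}
dims = ["contains", "source", "target"]
edges = [
    ("R", "S1", "contains"),
    ("R", "S2", "contains"),
    ("R", "T1", "contains"),
    ("T1", "S1", "source"),
    ("T1", "S2", "target"),
]
deg = dimensional_degrees(edges)
```

In this example `R` only has containment edges, so its participation coefficient is 0, while `T1` has one edge in each dimension and scores 1.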
  55. Methodology
      1. Collect models
      2. Data cleansing: remove
         o layout information
         o attributes
         o object types
         o small models
         o derived references
      3. Calculate graph metrics
      4. Analyze results
         o Statistical + exploratory
      [Figure: example graph with the nodes Red, Red-Orange, Green, Orange, T1–T5, and Entry]
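The data cleansing step can be sketched on a toy in-memory representation. Everything about the representation is an assumption for illustration: the dictionary layout, the `derived` flag on edges, and the `min_nodes` threshold for "small models" are hypothetical and do not come from the presentation or from the model-analyzer tool.

```python
def cleanse(model, min_nodes=5):
    """Reduce a model to the bare typed graph used for metric calculation.

    Assumed input layout (hypothetical):
      model["nodes"]: {node_id: {"type": ..., "attributes": ..., "layout": ...}}
      model["edges"]: [{"source": ..., "target": ..., "type": ..., "derived": bool}]
    Returns None for models below the size threshold.
    """
    if len(model["nodes"]) < min_nodes:
        return None  # small models are removed from the data set
    # Keeping only node identifiers drops layout, attributes and object types.
    nodes = set(model["nodes"])
    # Derived references are skipped; edge types (dimensions) are kept.
    edges = [
        (e["source"], e["target"], e["type"])
        for e in model["edges"]
        if not e.get("derived", False)
    ]
    return {"nodes": nodes, "edges": edges}


model = {
    "nodes": {
        "Entry": {"type": "Entry", "attributes": {}, "layout": {"x": 0, "y": 0}},
        "Red": {"type": "State", "attributes": {"name": "Red"}, "layout": {"x": 1, "y": 0}},
        "T1": {"type": "Transition", "attributes": {}, "layout": {"x": 2, "y": 0}},
    },
    "edges": [
        {"source": "T1", "target": "Entry", "type": "source"},
        {"source": "T1", "target": "Red", "type": "target"},
        {"source": "Red", "target": "T1", "type": "incoming", "derived": True},
    ],
}
graph = cleanse(model, min_nodes=3)
```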
  56. Domains
      • AutoFOCUS, Building Information Model, Capella, JaMoPP, Train Benchmark, Yakindu
      • Model origin labels (in slide order, not aligned to the domains above): real, real, tutorial, synthetic, tutorial, tutorial
  60. Statistical Analysis
      [Figure: distribution plots for Domain 1 and Domain 2]
  67. Statistical Analysis
      [Figure: distribution plots]
      • Homogeneity
      • Distinctiveness
      • Kolmogorov-Smirnov distance
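The Kolmogorov-Smirnov distance between two samples is the largest gap between their empirical cumulative distribution functions. The sketch below computes it with the standard library only. Reading homogeneity as a small distance between models of the same domain and distinctiveness as a large distance between models of different domains is an assumption made here, and the sample values are invented for illustration.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking those values is sufficient.
    for x in set(a) | set(b):
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        best = max(best, abs(f_a - f_b))
    return best


# Invented metric values (for example, node degrees) of three models.
model_1 = [1, 2, 3, 4]
model_2 = [1, 2, 3, 4]   # same domain as model_1 in this toy setup
model_3 = [3, 4, 5, 6]   # a different domain in this toy setup

within_domain = ks_distance(model_1, model_2)
across_domains = ks_distance(model_1, model_3)
```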
  69. Dimensional Clustering Coefficients
      [Figure: KS distance]
  73. Findings
      1. Metamodel-level information is insufficient
         o Ratio of containment edge types in the Capella metamodels: 75%
         o Ratio of containment edges in the Capella models: 42–50%
      2. Containment edges dominate distributions
      3. Many edges follow the locality principle
  74. Future Directions
      • Use metrics for
        o Instance model generators
        o Query optimization
      • Improve performance of calculating metrics: incremental calculation
        o https://github.com/ftsrg/model-analyzer
        o Works for both EMF and RDF models
      • All analysis results & code are available online:
        o http://docs.inf.mit.bme.hu/model-metrics/
  75. The Train Benchmark
      • SoSyM paper: “The Train Benchmark: Cross-Technology Performance Evaluation of Continuous Model Validation”
        o 6 queries, 12 transformations
        o EMF, property graphs, RDF, SQL
        o 12+ tools
        o Automated visualization & reporting
      • http://github.com/ftsrg/trainbenchmark