Markus Scheidgen
Model representations for
large meta-model based data-sets
■ Introduction: Technological spaces and model representations
■ Comparison of representations
■ Implementation
■ Application
1
Introduction:
Technological Spaces
2
Software Models
Code
reverse engineering
code generation
XML
persistence / exchange
databases
persistence/versioning
processing
(via ORMs: e.g. JPA)
Objects
(e.g. POJOs)
debugging/profiling
reflection
runtime modeling
processing (e.g. DOM/JAXB)
exchange (e.g. in web-services)
XSLT/XSL/XQuery/XPath
model transformation/constraints/queries
static analysis/compilation/refactoring
SQL
running programs
other data
Introduction:
State of the Art
3
Meta-Models
Models
Schemas
XML
Grammars
Code
Classes
Objects
ER-Schemas
Relational Data
*
visualization and editing
by human users
processing in computer programs
exchange
large data-sets/
persistence and querying
Introduction:
New Class of DBMS
4
Meta-Models
Models
Schemas
XML
Grammars
Code
Classes
Objects
ER-Schemas
Relational Data
*
-
Big Data
+
-
Graphs
ER-Schemas
Big Relational Data
?
Representation: Strategies
5
                             Objects: object-by-object   Objects: fragments
References: part-of-source   Morsa, (Java)               XMI, EMF-Frag
References: relations        CDO                         ?
Representation: Object-by-object vs. Fragmentation
(considering traversal, theoretical results)
6
[Figure: execution time t (in ms) over fragment size f and number of loaded objects l, logarithmic axes; curves: no fragmentation (f=m), optimal fragmentation, total fragmentation (f=1)]
Representation: Object-by-object vs. Fragmentation
(considering traversal, theoretical results vs. implementation)
7
[Figure, two panels: execution time t (in ms) over fragment size f and number of loaded objects l, logarithmic axes. First panel: no fragmentation (f=m), optimal fragmentation, total fragmentation (f=1). Second panel: optimal fragmentation only]
Representation: Object-by-object vs. Fragmentation
(considering traversal, implementation with actual model)
■ Model traversal of Grabats models with four different sizes
and different characteristics
8
[Figure: objects per second (10^4) for set0 to set4; series: XMI, CDO, Morsa, EMFFrag coarse, EMFFrag fine; two values not measured but extrapolated]
[Figure: number of fragments (logarithmic, 10^3 to 10^7) for set0 to set4; series: CDO/Morsa, EMFFrag coarse, EMFFrag fine]
Representation: Object-by-object vs. Fragmentation
(considering query, implementation with actual model)
■ Query of Grabats models with four different sizes and
different characteristics
9
[Figure: number of fragments (logarithmic, 10^3 to 10^7) for set0 to set4; series: CDO/Morsa, EMFFrag coarse, EMFFrag fine]
[Figure: execution time (in s) for set0 to set4; series: XMI, CDO w/o SQL, CDO, Morsa w/o index, Morsa, EMFFrag coarse, EMFFrag fine; four values not measured but extrapolated]
Representation: Part-of-source vs. Relations
(real implementation, artificial model)
10
[Figure, two panels: execution time (in ms) over number of outgoing references, logarithmic axes. Panels: part-of-source implementation; relation implementation with individual access. Curves in each panel: access of one outgoing reference; traversal of all outgoing references]
Representation: Part-of-source vs. Relations
(real implementation, artificial model)
11
[Figure, two panels: execution time (in ms) over number of outgoing references, logarithmic axes. Panels: part-of-source implementation; relation implementation with scanning. Curves in each panel: access of one outgoing reference; traversal of all outgoing references]
Implementation: EMF-Fragments
12
map/reduce
(hadoop)
“Share Nothing” Nodes
(cluster, adhoc-network)
DFS
(HDFS)
key-value-store
(hbase)
structured data
data-sets
applications
meta-model
structured data
model transformations
Implementation: Datastore mapping
13
regular containment
metamodel
0
1
part of source fragmentation
relation based fragmentation
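The difference between the two reference representations can be sketched with a plain dictionary standing in for the key-value store. This is a minimal illustration under assumptions, not the EMF-Fragments implementation: the class names `PartOfSourceStore` and `RelationStore`, the key layout, and the counters are all hypothetical. It only shows why accessing a single outgoing reference may cost more when all references live inside the source entry, while a full traversal may cost more when every reference is its own entry.

```python
from typing import Dict, List, Tuple


class PartOfSourceStore:
    """Hypothetical store: outgoing references live inside the source entry."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[str]] = {}
        self.values_read = 0  # reference values pulled from the store

    def put(self, source: str, targets: List[str]) -> None:
        self.entries[source] = list(targets)

    def _load(self, source: str) -> List[str]:
        targets = self.entries[source]
        self.values_read += len(targets)  # the whole entry is always read
        return targets

    def get_one(self, source: str, index: int) -> str:
        return self._load(source)[index]

    def get_all(self, source: str) -> List[str]:
        return list(self._load(source))


class RelationStore:
    """Hypothetical store: each reference is an entry keyed by (source, index)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, int], str] = {}
        self.sizes: Dict[str, int] = {}
        self.lookups = 0  # individual key lookups issued against the store

    def put(self, source: str, targets: List[str]) -> None:
        self.sizes[source] = len(targets)
        for i, target in enumerate(targets):
            self.entries[(source, i)] = target

    def get_one(self, source: str, index: int) -> str:
        self.lookups += 1
        return self.entries[(source, index)]

    def get_all(self, source: str) -> List[str]:
        # One lookup per reference (individual access, no range scan).
        return [self.get_one(source, i) for i in range(self.sizes[source])]


targets = [f"t{i}" for i in range(1000)]
part_of_source = PartOfSourceStore()
part_of_source.put("a", targets)
relations = RelationStore()
relations.put("a", targets)
```

In this sketch, `part_of_source.get_one("a", 5)` reads all 1000 reference values to return one, whereas `relations.get_one("a", 5)` issues a single lookup; a full traversal reverses the picture, with one entry read versus 1000 lookups.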
Implementation: Meta-model-based
declaration of representations
14
Project
Package
CompilationUnit
Field
Method
Class
«fragments»
«fragments»
«fragments»
*
*
*
*
*
*
Call
«relation»
Implementation: Architecture
15
FragmentedModel extends Resource
ResourceSet
FObject extends EObject
FStore extends EStore
ResourceSet
Fragment extends Resource
FInternalObject
extends DynamicEObject
URIHandler
DataStore
*
*
1
*
*
1
11
1
visible API
EMF-Fragments Classes
Regular EMF Classes
1
EList
EObjectEList FValueSetList
*
1
*
Applications: Mining and Analyzing Software
Repositories
■ Software repositories contain more information than the current
software code:
■ “developers who changed class/method/statement X also changed class/
method/statement Y”
■ this information leads to knowledge about dependencies that cannot be
determined through static or even dynamic analysis
■ this can be used to
• predict/find bugs
• understand/improve the code-base
■ dependency information should be stored as relational data
■ When a piece of software evolves, its metrics change. Such
dynamic metrics describe software better than static code metrics.
Could lead to a better assessment of methodologies or
understanding of software engineering in general.
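As a rough illustration of the "developers who changed X also changed Y" idea, the following sketch counts how often two elements change in the same revision. It is an assumption-laden toy, not the tooling used in the talk: the function names `co_change_counts` and `also_changed` are hypothetical, and each revision is reduced to the set of names of the elements it changed.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def co_change_counts(revisions: Iterable[Set[str]]) -> Counter:
    """Count, for every unordered pair, the revisions that changed both elements."""
    counts: Counter = Counter()
    for changed in revisions:
        # sorted() gives each pair one canonical orientation (a < b).
        for a, b in combinations(sorted(changed), 2):
            counts[(a, b)] += 1
    return counts


def also_changed(counts: Dict[Tuple[str, str], int], element: str) -> Dict[str, int]:
    """Elements that changed together with `element`, with co-change counts."""
    result: Dict[str, int] = {}
    for (a, b), n in counts.items():
        if a == element:
            result[b] = n
        elif b == element:
            result[a] = n
    return result


# Hypothetical history: each set lists the elements changed by one revision.
history = [{"A", "B", "C"}, {"A"}, {"A", "B"}, {"A", "D"}]
counts = co_change_counts(history)
partners_of_a = also_changed(counts, "A")
```

The resulting pair counts form exactly the kind of relational data the slide asks for: one row per pair of elements, with the number of shared revisions as an attribute.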
16
Applications: Mining and Analyzing Software
Repositories
■ JGit: Java implementation of the Git version control system
■ MoDisco: Reverse engineering framework for Eclipse Java
projects based on EMF
■ EMF-Compare: Determines matches and differences between
models
■ EMF-Fragments: My own framework for large models
■ over 300 Git repositories with Eclipse plug-ins that
constitute the whole Eclipse Foundation source base as
“example” data-set
17
Applications: Model of a Software Repository
18
A B C
A
A B
A D
PB1.R1
B1.R2
B1.R3
B1.R4
B2.R1
B2.R2
A
A B
Repository
Revision Diff
Compilation
Unit
Model
Package Class
...
* * * *
*
1
prev / next
JGit MoDisco
model / metamodel
usageIn
Package
Access
*
package1
«relation,
fragmentation»
«fragmentation» «relation,
fragmentation»
«relation»
«fragmentation»
* *
extends1
Summary
■ Choosing the right representation makes a difference
■ Meta-model-based declaration of representations works
(might not be good enough)
■ There are applications that can benefit from different
representations
19
                             Objects: object-by-object   Objects: fragments
References: part-of-source   Morsa, (Java)               XMI, EMF-Frag
References: relations        CDO                         ?
Backup
20
Possible Approaches: Different Target Platforms
21
Schemas
XML
*
-
Big Data
-
Graphs
BASE
CAP-Theorem1
1 Eric A. Brewer: Towards robust distributed systems; 19th ACM Symposium on Principles of Distributed Computing, 2000
2 K. Barmpis and D. S. Kolovos: Comparative Analysis of Data Persistence Technologies for Large-Scale Models; XM 2012
ORM
XMI
XMI + Resources
ER-Schemas
Relational Data
ACID,
structured data
ER-Schemas
Big Relational Data
BASE,
structured data
BASE,
structured data
Big
*
ORM?
2
Possible Approaches: Different Types of
Mapping
22
*
1 Javier Espinazo-Pagán, Jesús Sánchez Cuadrado, Jesús García Molina: Morsa, A Scalable Approach for Persisting and Accessing Large Models; MoDELS 2011
per object mapping
fragmentation
ER-Schemas
Relational Data
fast query,
slow traversal,
slow entry,
(fine transactions)
fast query,
slow traversal,
slow entry,
(fine transactions)1
Big
*
per object mapping
slow query,
fast traversal,
fast entry,
(coarse trans.)
Big
*ER-Schemas
Big Relational Data
/
Fragmentation: Types of references
■ organizing large artifacts in different resources is already
implemented in EMF
■ resources are loaded if necessary, objects in unloaded
resources are represented by proxy objects
■ objects in different resources (as all related objects) are
related through references, therefore models are
fragmented along references
■ EMF-Fragments automatically fragments large models based
on annotations in the meta-model
■ resources are identified via URIs and can be serialized (e.g.
XMI), therefore resources can be stored in a key-value store
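The fragmentation idea in the bullets above can be sketched as follows. This is a simplified stand-in under stated assumptions, not EMF or EMF-Fragments code: `Node`, `fragment`, and `LazyModel` are hypothetical names, a Python dict plays the role of the key-value store, dotted strings play the role of resource URIs, and nested dicts play the role of a serialization such as XMI. Children reached through a fragmenting reference are written to their own entry; a stored key takes the place of a proxy and is resolved only on demand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    # Regular containment: children stay in the parent's fragment.
    children: List["Node"] = field(default_factory=list)
    # Containment marked as fragmenting: each child starts a new fragment.
    fragment_children: List["Node"] = field(default_factory=list)


def fragment(root: Node, store: Dict[str, dict], key: str = "0") -> None:
    """Write the fragment rooted at `root` to store[key], recursing into sub-fragments."""
    next_id = [0]  # numbers the sub-fragments created below this fragment

    def serialize(node: Node) -> dict:
        refs = []
        for child in node.fragment_children:
            child_key = f"{key}.{next_id[0]}"
            next_id[0] += 1
            fragment(child, store, child_key)
            refs.append(child_key)  # only the key is kept, like a proxy
        return {
            "name": node.name,
            "children": [serialize(c) for c in node.children],
            "fragment_refs": refs,
        }

    store[key] = serialize(root)


class LazyModel:
    """Loads fragments from the store only when they are first requested."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self.store = store
        self.loaded: Dict[str, dict] = {}

    def get(self, key: str) -> dict:
        if key not in self.loaded:
            self.loaded[key] = self.store[key]
        return self.loaded[key]


project = Node("project", fragment_children=[
    Node("pkg1", children=[Node("ClassA")]),
    Node("pkg2"),
])
store: Dict[str, dict] = {}
fragment(project, store)
model = LazyModel(store)
root_fragment = model.get("0")
```

After `model.get("0")` only the root fragment is loaded; `pkg1` and `pkg2` are touched only when one of the keys in `root_fragment["fragment_refs"]` is resolved.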
23
Fragmentation: Types of references
24
*
normal
references
*
«fragments»fragmenting
references
large value
sets *
Applications
■ HWL sensor and network operation data (or experiment data in general)
■ realtime persistence required ➜ fast data entry
■ hierarchical structured data (different sensors and other data sources) ➜ meta-modeling
■ queries for experiments, sensors, specific time periods ➜ only coarse simple queries
■ traversal of larger sub-trees, mostly applications based on data aggregation
■ actual demand for big-data depends on size of sensor network ➜ scalability
■ CityGML models (or geo-spatial data in general)
■ standardized as XML-schemas ➜ XML based data
■ special proprietary indexes (e.g. spatial indexes like R-trees) and corresponding queries
■ rather query intense applications
■ actual demand for big-data depends on LOD of the models ➜ scalability
■ Software Engineering
■ Code/Model Version Control
■ Mining Software Repositories (MSR)
■ revisions of AST-trees and differences between AST-trees ➜ existing meta-model based frameworks (e.g. designed
for reverse engineering purposes)
■ large number of revisions causes many large value sets
■ queries for revisions, compilation-units ➜ rather coarse queries
■ aggregations and statistics ➜ can be expressed in an OCL-like language
■ immediate demand for processing in (at least smaller) clusters
■ has to be mixed with relational data for some applications
25
Applications: Scientific Data
26
WSN
<xml?...>
<xml?...>
click
*
*
xml-to-model
text-to-model*
Applications: CityGML
■ XML-based standard ➜ meta-models can be generated (1-
to-1 mapping)
■ different standards define XML-schemas that extend each
other: GML⇽CityGML⇽extensions
■ transparent use of spatial indexes
■ map onto existing platforms (e.g. SpatialHadoop)
■ use existing implementations and persist into the key-value
store
■ extensions to CityGML can be used to reference
CityGML-models as spatial context for sensor data
27
backup
28
Research Overview
29
WIRELESS SENSOR NETWORKS
DATA ANALYSIS FRAMEWORK
GEO INFORMATION SYSTEMS
sensor data
heterogeneous networks
mesh networks
cellular networks
spatial data
regular databases
spatial databases
distributed data stores
distributed analysis
data homogenisation
domain-specific analysis languages
HWL: Commodity Hardware
30
31
‣120+ Nodes
‣indoor and outdoor
‣dense and sparse
‣short and long links
‣stationary and mobile nodes
[Figure: map of the test site with nodes 1 to 4 and 6 to 9 on stone ground, 10 m scale; directions: toward Groß-Berliner Damm, toward the institute]
Markus Scheidgen: HWL - A High-Performance Wireless Sensor Research Network
35
Experiments: The Test Site
■ simplest case: two-lane, newly paved road
■ nodes spatially equally distributed on both sides of the road
■ 2x5 nodes
■ homogeneous test-bed: same nodes, equally calibrated, same stone ground
■ one camera to record control data
[Figure: single-sided amplitude spectrum |Y(f)| over frequency (0 to 200 Hz) for channels X, Y, Z]
[Figure: time signal of all 3 channels, acceleration value (-2 to 2) over time sample (1/400 sec, 0 to 3000) for channels X, Y, Z]
Experiments: Example Data
36
Amplitudes Frequencies
Experiment: Algorithm
■ Similar to earthquake detection: comparison of
short and long moving averages (S=0.2s, L=4s)
38
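$$ s_x = x\text{-th acceleration value} \tag{1} $$
$$ \mathrm{mavg}(s_x, W) = \frac{\sum_{i=x-W}^{x} s_i}{W} \tag{2} $$
$$ \hat{s}_x = \left| s_x - \mathrm{mavg}(s_x, L) \right| \tag{3} $$
$$ w^S_x = \mathrm{mavg}(\hat{s}_x, S) \tag{4} $$
$$ w^L_x = \mathrm{mavg}(\hat{s}_x, L) \tag{5} $$
$$ w = w^S_x - w^L_x \tag{6} $$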
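Equations (1) to (6) can be turned into a small sketch. Several points are assumptions rather than facts from the slides: the operator in Eq. (6) is read as a difference between the short and the long average, the `avg` in Eq. (3) is read as `mavg`, windows are truncated at the start of the signal, and the 400 Hz default comes from the "1/400 sec" axis label of the example data. The function names `mavg` and `detect` are hypothetical.

```python
from typing import List, Sequence


def mavg(values: Sequence[float], x: int, window: int) -> float:
    """Trailing moving average as in Eq. (2): sum of values[x-W .. x] divided by W.

    As printed, the sum covers W + 1 samples; the window is truncated at index 0.
    """
    start = max(0, x - window)
    return sum(values[start:x + 1]) / window


def detect(samples: Sequence[float], rate_hz: int = 400,
           short_s: float = 0.2, long_s: float = 4.0) -> List[float]:
    """Return w for every sample: short minus long moving average of |s - mavg|."""
    short_w = round(short_s * rate_hz)  # S expressed in samples
    long_w = round(long_s * rate_hz)    # L expressed in samples
    # Eq. (3): remove the slowly varying offset, keep the magnitude.
    s_hat = [abs(s - mavg(samples, x, long_w)) for x, s in enumerate(samples)]
    # Eq. (4) to (6): compare short-term and long-term activity.
    return [mavg(s_hat, x, short_w) - mavg(s_hat, x, long_w)
            for x in range(len(samples))]


# Synthetic signal: silence, a short burst, silence (10 Hz keeps the example small).
signal = [0.0] * 100 + [1.0, -1.0] * 5 + [0.0] * 100
w = detect(signal, rate_hz=10)
```

On this synthetic signal `w` stays at zero during the leading silence and rises clearly above zero inside the burst, which is the behaviour a threshold-based event detector would rely on. The direct summation is quadratic in the window length; running sums would be the obvious optimisation for long recordings.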
Data Management
39
41
internet
cellular
cellular
wifi
zigbee
zigbee
Technological Infrastructure
Logical Infrastructure
actions
visualization
sensors
information
43
internet
cellular
cellular
wifi
zigbee
zigbee
information/knowledge
distributed programming
models
data bases
data representation
algorithms
processes
programming languages
CPUs
machine code radios
network protocols
hard drives
generic / domain specific
software engineering
algorithms
processes
programming languages
information/knowledge
distributed programming
models
data bases
data representation
DSL
Complex Data Types
44
➡ complex data structures
➡ lots of links between data objects
➡ evolving structures
➡ requires a type-safe programming
environment that promotes reuse
Large Amounts of Data
45
➡ a certain amount of data needs to be
stored per second (HWL: 120 nodes)
~140 × 10^3 data objects per second
~7MB/s serialized
➡ a certain amount of data needs to be
stored all together (24h)
~12 × 10^9 data objects
~600GB serialized
➡ Data analysis must complete in
reasonable time. For live
applications in real time.
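The daily totals follow directly from the per-second rates, as this short calculation shows. The constants are the figures quoted on the slide; the variable names and the derived bytes-per-object value are only illustrative.

```python
# Rates quoted for HWL with 120 nodes.
OBJECTS_PER_SECOND = 140_000      # ~140 × 10^3 data objects per second
BYTES_PER_SECOND = 7_000_000      # ~7 MB/s serialized
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 s

objects_per_day = OBJECTS_PER_SECOND * SECONDS_PER_DAY  # about 12 × 10^9
bytes_per_day = BYTES_PER_SECOND * SECONDS_PER_DAY      # about 600 GB
bytes_per_object = BYTES_PER_SECOND / OBJECTS_PER_SECOND

print(f"{objects_per_day / 1e9:.1f} x 10^9 objects per day")
print(f"{bytes_per_day / 1e9:.0f} GB per day")
print(f"{bytes_per_object:.0f} bytes per serialized object on average")
```

The implied average of about 50 bytes per serialized object suggests many small objects, which is the regime where per-object storage overhead matters most.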
From Click to ClickWatch
46
Click API software
Element
Element
Element
Compound
Handler
Handler
NetworkInterface
Complex Data Types: Meta-Modeling
47
This [ ] happens all the time in software modeling
state charts, class diagrams, MSCs, OCL
context Foo
inv: self.properties->forAll(a | a.x <> a.y)
eclipse modeling framework (EMF)
➡ Distributed storage and links between different types of data are only a simple
extension of existing technology: multi-resource persistence is already implemented
“Share Nothing” Nodes
(cluster, adhoc-network)
DFS
(HDFS)
key-value-store [1]
(hbase)
Large Amounts of Data:
Problem Statement
48
1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar
Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data (awarded
best paper!). In Brian N. Bershad and Jeffrey C. Mogul, editors, OSDI, pages 205–218. USENIX Association, 2006.
2. Jeffrey Dean and Sanjay Ghemawat. Map/reduce: Simplified data processing on large clusters. In OSDI, pages 137–
150. USENIX Association, 2004.
map/reduce [2]
(hadoop)
hierarchical data
(XML, OGC standards)
data series
(sensor data)
signal analysis, statistics, sensor-fusion
domain specific / generic
1
2
3
4
Large Amounts of Data: Approach
49
map/reduce
(hadoop)
“Share Nothing” Nodes
(cluster, adhoc-network)
DFS
(HDFS)
key-value-store
(hbase)
hierarchical data
(XML, OGC standards)
data series
(sensor data)
signal analysis, statistics, sensor-fusion
meta-model
structured data
model transformations

Tobias Schneck
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 

Recently uploaded (20)

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 

Reference Representation in Large Metamodel-based Datasets

  • 1. Markus Scheidgen: Model representations for large meta-model based data-sets ■ Introduction: Technological spaces and model representations ■ Comparison of representations ■ Implementation ■ Application
  • 2. Introduction: Technological Spaces [Diagram: software models and the technological spaces around them, with these labels: Code (reverse engineering, code generation, static analysis/compilation/refactoring); XML (persistence/exchange, processing, e.g. DOM/JAXB, exchange, e.g. in web services, XSLT/XSL/XQuery/XPath); databases (persistence/versioning, processing via ORMs, e.g. JPA, SQL); Objects, e.g. POJOs (debugging/profiling, reflection, runtime modeling, running programs); model transformation/constraints/queries; other data.]
  • 3. Introduction: State of the Art [Diagram: Meta-Models/Models, Schemas/XML, Grammars/Code, Classes/Objects, and ER-Schemas/Relational Data, set against their uses: visualization and editing by human users, processing in computer programs, exchange, large data-sets/persistence and querying.]
  • 4. Introduction: New Class of DBMS [Diagram: the spaces from the previous slide (Meta-Models/Models, Schemas/XML, Grammars/Code, Classes/Objects, ER-Schemas/Relational Data), extended by Big Data, Graphs, and ER-Schemas/Big Relational Data, with an open question mark (?).]
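  • 5. Representation: Strategies [Table: objects are stored object-by-object or in fragments; references are stored as part of the source or as relations. Part-of-source, object-by-object: Morsa, (Java). Part-of-source, fragments: XMI, EMF-Frag. Relations, object-by-object: CDO. Relations, fragments: ?]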
  • 6. Representation: Object-by-object vs. Fragmentation (considering traversal, theoretical results) [Plot: execution time t (in ms) over number of loaded objects l and fragment size f, with curves for no fragmentation (f=m), optimal fragmentation, and total fragmentation (f=1).]
  • 7. Representation: Object-by-object vs. Fragmentation (considering traversal, theoretical results vs. implementation) [Two plots of execution time t (in ms) over number of loaded objects l and fragment size f: theoretical results with curves for no fragmentation (f=m), optimal fragmentation, and total fragmentation (f=1); measured implementation results with the optimal fragmentation marked.]
  • 8. Representation: Object-by-object vs. Fragmentation (considering traversal, implementation with actual model) ■ Model traversal of Grabats models with four different sizes and different characteristics [Two plots over set0 to set4: objects per second (10^4) for XMI, CDO, Morsa, EMFFrag coarse, and EMFFrag fine (some values not measured but extrapolated); number of fragments (10^3 to 10^7) for CDO/Morsa, EMFFrag coarse, and EMFFrag fine.]
  • 9. Representation: Object-by-object vs. Fragmentation (considering query, implementation with actual model) ■ Query of Grabats models with four different sizes and different characteristics [Two plots over set0 to set4: number of fragments (10^3 to 10^7) for CDO/Morsa, EMFFrag coarse, and EMFFrag fine; execution time (in s) for XMI, CDO w/o SQL, CDO, Morsa w/o index, Morsa, EMFFrag coarse, and EMFFrag fine (some values not measured but extrapolated).]
  • 10. Representation: Part-of-source vs. Relations (real implementation, artificial model) [Two plots of execution time (in ms) over number of outgoing references, each with curves for access of one outgoing reference and traversal of all outgoing references: part-of-source implementation; relation implementation with individual access.]
  • 11. Representation: Part-of-source vs. Relations (real implementation, artificial model) [Two plots of execution time (in ms) over number of outgoing references, each with curves for access of one outgoing reference and traversal of all outgoing references: part-of-source implementation; relation implementation with scanning.]
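The two reference representations compared on slides 10 and 11 can be illustrated with a toy in-memory store. This is a minimal sketch and not the measured implementation: the class names, the dictionary-backed store, and the `lookups`/`entries_read` counters are assumptions used only as a rough cost proxy.

```python
class PartOfSource:
    """References live inside the source object's own record."""

    def __init__(self):
        self.records = {}      # source id -> list of target ids
        self.lookups = 0       # number of store accesses
        self.entries_read = 0  # number of references read from the store

    def add(self, source, target):
        self.records.setdefault(source, []).append(target)

    def _load(self, source):
        # Loading the source always brings in all of its references.
        self.lookups += 1
        record = self.records[source]
        self.entries_read += len(record)
        return record

    def get_one(self, source, index):
        return self._load(source)[index]

    def get_all(self, source):
        return list(self._load(source))


class Relations:
    """Each reference is a separate store entry keyed by (source, index)."""

    def __init__(self):
        self.entries = {}      # (source id, index) -> target id
        self.sizes = {}        # source id -> number of references
        self.lookups = 0
        self.entries_read = 0

    def add(self, source, target):
        index = self.sizes.get(source, 0)
        self.entries[(source, index)] = target
        self.sizes[source] = index + 1

    def get_one(self, source, index):
        # Individual access: exactly one entry is fetched.
        self.lookups += 1
        self.entries_read += 1
        return self.entries[(source, index)]

    def get_all(self, source):
        # Traversal with individual access costs one lookup per reference.
        return [self.get_one(source, i) for i in range(self.sizes.get(source, 0))]


pos, rel = PartOfSource(), Relations()
for target in range(1000):
    pos.add("src", target)
    rel.add("src", target)

pos.get_one("src", 500)
rel.get_one("src", 500)
```

In this toy model, accessing one reference reads the whole record in the part-of-source variant but a single entry in the relation variant, while traversing all references needs one lookup in the former and one lookup per reference in the latter.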
  • 12. Implementation: EMF-Fragments [Layer diagram with steps 1 to 4 and these labels: "share nothing" nodes (cluster, ad-hoc network); DFS (HDFS); key-value store (HBase); map/reduce (Hadoop); structured data; data-sets; applications; meta-model structured data; model transformations.]
  • 13. Implementation: Datastore mapping [Diagram: a regular containment meta-model mapped to the data store by part-of-source fragmentation and by relation-based fragmentation.]
  • 14. Implementation: Meta-model-based declaration of representations [Class diagram: Project, Package, CompilationUnit, Class, Field, Method, and Call; three references are annotated «fragments» and one is annotated «relation».]
  • 15. Implementation: Architecture [Class diagram separating the visible API, EMF-Fragments classes, and regular EMF classes: FragmentedModel extends Resource; FObject extends EObject; FStore extends EStore; Fragment extends Resource; FInternalObject extends DynamicEObject; further classes: ResourceSet, URIHandler, DataStore, EList, EObjectEList, FValueSetList.]
  • 16. Applications: Mining and Analyzing Software Repositories ■ Software repositories contain more information than the current software code: ■ “developers who changed class/method/statement X also changed class/method/statement Y” ■ this information leads to knowledge about dependencies that cannot be determined through static or even dynamic analysis ■ this can be used to • predict/find bugs • understand/improve the code-base ■ dependency information should be stored as relational data ■ When a piece of software evolves, its metrics change. Such dynamic metrics describe software better than static code metrics. This could lead to a better assessment of methodologies or a better understanding of software engineering in general.
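The "developers who changed X also changed Y" relation from slide 16 can be sketched as a co-change count over revisions. This is only an illustration under assumptions: the function name `co_change_counts` and the sample revision history are hypothetical, and a real analysis would read the changed elements from the repository model.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(revisions):
    """Count how often each unordered pair of elements changed in the same revision."""
    counts = Counter()
    for changed in revisions:
        # set() drops duplicates, sorted() gives every pair one canonical order.
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] += 1
    return counts


# Hypothetical history: each entry lists the classes touched by one revision.
revisions = [
    ["A", "B"],
    ["A", "B", "C"],
    ["C"],
    ["A", "B"],
]
counts = co_change_counts(revisions)
```

Pairs with a high count, here `("A", "B")`, hint at a dependency even when no static reference connects the two elements. The result is naturally a set of (source, target, weight) tuples, which fits the slide's remark that dependency information should be stored as relational data.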
  • 17. Applications: Mining and Analyzing Software Repositories ■ JGit: Java implementation of the Git version control system ■ MoDisco: Reverse engineering framework for Eclipse Java projects based on EMF ■ EMF-Compare: Determines matches and differences between models ■ EMF-Fragments: My own framework for large models ■ over 300 Git repositories with Eclipse plug-ins that constitute the whole Eclipse Foundation source base as “example” data-set
  • 18. Applications: Model of a Software Repository [Diagram: an example revision graph with revisions B1.R1 to B1.R4 and B2.R1 to B2.R2, each holding compilation units such as A, B, C, and D; next to it the meta-model with Repository, Revision, and Diff (from JGit) and Compilation Unit Model, Package, Class, ... (from MoDisco), connected by references such as prev/next, usageIn, Package Access, and extends, which are annotated «relation», «fragmentation», or both.]
  • 19. Summary ■ Choosing the right representation makes a difference ■ Meta-model-based declaration of representations works (might not be good enough) ■ There are applications that can benefit from different representations [Table: objects are stored object-by-object or in fragments; references are stored as part of the source or as relations. Part-of-source, object-by-object: Morsa, (Java). Part-of-source, fragments: XMI, EMF-Frag. Relations, object-by-object: CDO. Relations, fragments: ?]
  • 21. Possible Approaches: Different Target Platforms [Diagram with these labels: Schemas/XML (XMI, XMI + Resources); ER-Schemas/Relational Data (ORM; ACID, structured data); ER-Schemas/Big Relational Data (BASE, structured data); Big Data and Graphs (BASE, CAP theorem [1]); Big * (BASE, structured data; ORM? [2]).] References: [1] Eric A. Brewer: Towards robust distributed systems; 19th ACM Symposium on Principles of Distributed Computing, 2000. [2] K. Barmpis and D.S. Kolovos: Comparative Analysis of Data Persistence Technologies for Large-Scale Models; XM 2012.
  • 22. Possible Approaches: Different Types of Mapping [Diagram: per-object mapping onto ER-Schemas/Relational Data: fast query, slow traversal, slow entry (fine transactions); per-object mapping onto Big * [1]: fast query, slow traversal, slow entry (fine transactions); fragmentation onto Big * / ER-Schemas/Big Relational Data: slow query, fast traversal, fast entry (coarse transactions).] Reference: [1] Javier Espinazo-Pagán, Jesús Sánchez Cuadrado, Jesús García Molina: Morsa, A Scalable Approach for Persisting and Accessing Large Models; MoDELS 2011.
  • 23. Fragmentation: Types of references ■ organizing large artifacts in different resources is already implemented in EMF ■ resources are loaded if necessary; objects in unloaded resources are represented by proxy objects ■ objects in different resources (like all related objects) are related through references; therefore models are fragmented along references ■ EMF-Fragments automatically fragments large models based on annotations in the meta-model ■ resources are identified via URIs and can be serialized (e.g. XMI); therefore resources can be stored in a key-value store
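The idea on slide 23 (fragment a containment tree along annotated references, store each fragment under a URI in a key-value store, and resolve proxies on demand) can be sketched as follows. This is a toy illustration, not EMF or EMF-Fragments code: the nested-dict model, the `frag:/N` URI scheme, JSON as serialization, and the names `fragment` and `Loader` are all assumptions.

```python
import json


def fragment(root, store):
    """Write a containment tree into store, one entry per fragment; return the root URI."""
    counter = [0]

    def serialize(node):
        children = []
        for child, fragmenting in node.get("children", []):
            if fragmenting:
                # Child behind a fragmenting reference: store it separately,
                # keep only a proxy (its URI) in the parent.
                counter[0] += 1
                child_uri = f"frag:/{counter[0]}"
                store[child_uri] = json.dumps(serialize(child))
                children.append({"proxy": child_uri})
            else:
                # Regular containment: the child stays inside the parent's fragment.
                children.append(serialize(child))
        return {"name": node["name"], "children": children}

    root_uri = "frag:/0"
    store[root_uri] = json.dumps(serialize(root))
    return root_uri


class Loader:
    """Loads fragments lazily and records which URIs were fetched."""

    def __init__(self, store):
        self.store = store
        self.loaded = []

    def load(self, uri):
        self.loaded.append(uri)
        return json.loads(self.store[uri])

    def resolve(self, child):
        # Only proxies trigger a fetch from the key-value store.
        return self.load(child["proxy"]) if "proxy" in child else child


# Hypothetical model: project -> package -> class are fragmenting, class -> method is not.
method = {"name": "methodA", "children": []}
cls = {"name": "ClassA", "children": [(method, False)]}
pkg = {"name": "pkg", "children": [(cls, True)]}
project = {"name": "project", "children": [(pkg, True)]}

store = {}
root_uri = fragment(project, store)
loader = Loader(store)
root = loader.load(root_uri)
```

Only the root fragment is fetched at this point; the package and class fragments are fetched when their proxies are resolved, and the method arrives together with its class because it sits behind a regular containment reference.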
  • 24. Fragmentation: Types of references [Diagram: three types of references: normal references, fragmenting references («fragments»), and large value sets.]
  • 25. Applications
      ■ HWL sensor and network operation data (or experiment data in general) • realtime persistence required ➜ fast data entry • hierarchically structured data (different sensors and other data sources) ➜ meta-modeling • queries for experiments, sensors, specific time periods ➜ only coarse, simple queries • traversal of larger sub-trees, mostly applications based on data aggregation • actual demand for big data depends on the size of the sensor network ➜ scalability
      ■ CityGML models (or geo-spatial data in general) • standardized as XML schemas ➜ XML-based data • special proprietary indexes (e.g. spatial indexes like R-trees) and corresponding queries • rather query-intensive applications • actual demand for big data depends on the LOD of the models ➜ scalability
      ■ Software Engineering: Code/Model Version Control, Mining Software Repositories (MSR) • revisions of AST-trees and differences between AST-trees ➜ existing meta-model based frameworks (e.g. designed for reverse engineering purposes) • large number of revisions causes many large value sets • queries for revisions, compilation units ➜ rather coarse queries • aggregations and statistics ➜ can be expressed in an OCL-like language • immediate demand for processing in (at least smaller) clusters • has to be mixed with relational data for some applications
  • 27. Applications: CityGML ■ XML-based standard ➜ meta-models can be generated (1-to-1 mapping) ■ different standards define XML schemas that extend each other: GML⇽CityGML⇽extensions ■ transparent use of spatial indexes ■ map onto existing platforms (e.g. SpatialHadoop) ■ use existing implementations and persist into the key-value store ■ extensions to CityGML can be used to reference CityGML models as spatial context for sensor data
  • 29. Research Overview [Diagram with three areas: wireless sensor networks (sensor data, heterogeneous networks, mesh networks, cellular networks); data analysis framework (distributed data stores, distributed analysis, data homogenisation, domain-specific analysis languages); geo information systems (spatial data, regular databases, spatial databases).]
  • 32. ‣120+ nodes ‣indoor and outdoor ‣dense and sparse ‣short and long links ‣stationary and mobile nodes
  • 35. Experiments: The Test Site (Markus Scheidgen: HWL – A High-Performance Wireless Sensor Research Network) § simplest case: two-lane, newly paved road § nodes spatially equally distributed on both sides of the road § 2x5 nodes § homogeneous test-bed: same nodes, equally calibrated, same stone ground § one camera to record control data [Site sketch: nodes 1 to 10 along the road, with the labels "stone", "? m", and "10 m"; one direction leads towards Groß-Berliner Damm, the other towards the institute.]
  • 36. Experiments: Example Data [Two plots for channels X, Y, and Z. Amplitudes: time signal of all 3 channels, acceleration value over time sample (1/400 sec). Frequencies: single-sided amplitude spectrum |Y(f)| over frequency (Hz).]
  • 38. Experiment: Algorithm § Similar to earthquake detection: comparison of short and long moving averages (S = 0.2 s, L = 4 s)
      $s_x = x\text{th acceleration value}$ (1)
      $\operatorname{mavg}(s_x, W) = \frac{\sum_{i=x-W}^{x} s_i}{W}$ (2)
      $\hat{s}_x = |s_x - \operatorname{avg}(s_x, L)|$ (3)
      $w^S_x = \operatorname{mavg}(\hat{s}_x, S)$ (4)
      $w^L_x = \operatorname{mavg}(\hat{s}_x, L)$ (5)
      $w = w^S_x / w^L_x$ (6)
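The comparison of short and long moving averages on slide 38 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the algorithm as implemented for the experiment: it takes the final comparison to be the ratio of the short to the long average (as in seismic STA/LTA triggers), uses a moving average for the offset removal in step (3), averages over exactly W samples, and introduces a hypothetical `threshold` parameter and a synthetic test signal. The 400 Hz sample rate follows the "1/400 sec" time axis of the example data.

```python
def mavg(values, x, window):
    """Mean of the `window` samples ending at index x (fewer near the start)."""
    start = max(0, x - window + 1)
    chunk = values[start:x + 1]
    return sum(chunk) / len(chunk)


def detect(samples, rate_hz=400, short_s=0.2, long_s=4.0, threshold=3.0):
    """Return the sample indices where the short/long average ratio exceeds threshold."""
    short_w = int(short_s * rate_hz)
    long_w = int(long_s * rate_hz)
    # Step (3): magnitude of the deviation from the long-term average.
    deviation = [abs(s - mavg(samples, x, long_w)) for x, s in enumerate(samples)]
    events = []
    for x in range(len(deviation)):
        w_short = mavg(deviation, x, short_w)  # step (4)
        w_long = mavg(deviation, x, long_w)    # step (5)
        # Step (6), assumed here to be a ratio; skip windows without any signal.
        if w_long > 0 and w_short / w_long > threshold:
            events.append(x)
    return events


# Synthetic signal: low-level alternating noise with a strong burst at samples 2000..2099.
samples = [0.01 if i % 2 == 0 else -0.01 for i in range(3000)]
for i in range(2000, 2100):
    samples[i] = 1.0 if i % 2 == 0 else -1.0

events = detect(samples)
```

The short average reacts quickly to the burst while the long average changes slowly, so the ratio rises shortly after the burst begins and falls back once the short window has left it.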
  • 40. Research Overview [Diagram with three areas: wireless sensor networks (sensor data, heterogeneous networks, mesh networks, cellular networks); data analysis framework (distributed data stores, distributed analysis, data homogenisation, domain-specific analysis languages); geo information systems (spatial data, regular databases, spatial databases).]
  • 43. [Diagram: a network with internet, cellular, wifi, and zigbee links, next to a stack ranging from generic to domain-specific: radios, network protocols, hard drives, CPUs, machine code, programming languages, software engineering, algorithms, processes, data representation, data bases, distributed programming models, information/knowledge, DSL.]
  • 44. Complex Data Types ➡ complex data structures ➡ lots of links between data objects ➡ evolving structures ➡ requires a type-safe programming environment that promotes re-use
  • 45. Large Amounts of Data ➡ a certain amount of data needs to be stored per second (HWL: 120 nodes): ~140×10^3 data objects per second, ~7 MB/s serialized ➡ a certain amount of data needs to be stored altogether (24 h): ~12×10^9 data objects, ~600 GB serialized ➡ Data analysis must complete in reasonable time, and for live applications in real time.
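The daily totals on slide 45 follow from the per-second rates. The short calculation below reproduces them from the slide's rounded figures; the variable names are illustrative, and the per-object size is a derived average rather than a number given on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

objects_per_second = 140e3  # slide: ~140x10^3 data objects per second
bytes_per_second = 7e6      # slide: ~7 MB/s serialized

objects_per_day = objects_per_second * SECONDS_PER_DAY
gigabytes_per_day = bytes_per_second * SECONDS_PER_DAY / 1e9
bytes_per_object = bytes_per_second / objects_per_second

print(f"{objects_per_day:.3e} objects per day")    # about 12x10^9
print(f"{gigabytes_per_day:.1f} GB per day")       # about 600 GB
print(f"{bytes_per_object:.0f} bytes per object")  # derived average
```

Both daily figures agree with the slide after rounding, and the rates imply roughly 50 bytes of serialized data per object.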
  • 46. From Click to ClickWatch [Diagram: the Click API and software mapped to model elements: NetworkInterface, Element, Compound, Handler.]
  • 47. Complex Data Types: Meta-Modeling. This [ ] happens all the time in software modeling: state charts, class diagrams, MSCs, OCL (context Foo self.properties->foreach(a|a.x != a.y)), Eclipse Modeling Framework (EMF) ➡ Distributed storage and links between different types of data are only a simple extension of existing technology: multi-resource persistence is already implemented
  • 48. Large Amounts of Data: Problem Statement [Layer diagram from generic to domain-specific: "share nothing" nodes (cluster, ad-hoc network); DFS (HDFS); key-value store [1] (HBase); map/reduce [2] (Hadoop); hierarchical data (XML, OGC standards); data series (sensor data); signal analysis, statistics, sensor fusion.] References: [1] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data (awarded best paper!). In Brian N. Bershad and Jeffrey C. Mogul, editors, OSDI, pages 205–218. USENIX Association, 2006. [2] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150. USENIX Association, 2004.
  • 49. Large Amounts of Data: Approach [Layer diagram with steps 1 to 4 and these labels: "share nothing" nodes (cluster, ad-hoc network); DFS (HDFS); key-value store (HBase); map/reduce (Hadoop); hierarchical data (XML, OGC standards); data series (sensor data); signal analysis, statistics, sensor fusion; meta-model structured data; model transformations.]