Benchmarks like LSBench, SRBench, CSRBench and, more recently, CityBench satisfy the growing need for shared datasets, ontologies and queries to evaluate window-based RDF Stream Processing (RSP) engines. However, no clear winner emerges from these evaluations. In this paper, we claim that the RSP community needs to adopt a Systematic Comparative Research Approach (SCRA) if it wants to move a step forward. To this end, we propose a framework that enables SCRA for window-based RSP engines. The contributions of this paper are: (i) the requirements that tools aiming to enable SCRA must satisfy; (ii) the architecture of a facility to design and execute experiments while guaranteeing repeatability, reproducibility and comparability; (iii) Heaven, a proof-of-concept implementation of this architecture that we released as open source; (iv) two RSP engine implementations, also open source, that we propose as baselines for comparative research (i.e., they can serve as terms of comparison in future work). We demonstrate Heaven's effectiveness using the baselines by: (i) showing that top-down hypothesis verification is not straightforward even under controlled conditions and (ii) providing examples of bottom-up comparative analysis.
Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines
1. DEIB - Politecnico di Milano
Riccardo Tommasini, Emanuele Della Valle, Marco Balduini and Daniele Dell’Aglio
Heaven: a framework for systematic comparative research approach for RSP engines
2. ESWC - 2016 - Riccardo Tommasini - @rictomm - DEIB Polimi
Agenda
• Introduction
• Motivation
• Heaven [Contribution]
• Requirements Analysis
• Test Stand Architecture
• Baselines
• Conclusion and Future Works
3. It’s a Streaming World
5. Stream Reasoning
Logical real time reasoning on multiple, heterogeneous, gigantic and inevitably noisy data streams.
-- E. Della Valle, S. Ceri, F. van Harmelen and H. Stuckenschmidt, 2010
10. State of the art RSP Benchmarking
Benchmark   Data Streams & Ontologies   Queries   Metrics
SRBench     ✔                           ✔         Feasibility
LSBench     ✔                           ✔         Feasibility, Throughput
CSRBench    ✔                           ✔         Feasibility, Throughput, Correctness
CityBench   ✔                           ✔         Feasibility, Throughput, Memory
No absolute winner
11. Domain Specific Benchmark
The goal of a domain specific benchmark is to foster technological progress by guaranteeing a fair assessment.
- Jim Gray, The Benchmark Handbook for Database and Transaction Systems, 1993
12. A Well-Known Hypothesis
The incremental maintenance of the materialisation is faster than full re-materialisation of ontological entailment when content changes are small enough (e.g. less than 10%).
15. Analysis
A. Qualitatively, is there a solution that always outperforms the others?
B. If no dominant solution can be found, when does a solution work better than another one?
C. Quantitatively, is there a solution that distinguishes itself from the others?
D. Why does a solution perform better than another solution under a certain experimental condition?
16. Comparative Research
• It is natively case driven:
  • It considers cases as a combination of known properties
  • It defines analysis guidelines through baselines
• It is extensively used to analyse complex systems
• It provides layered frameworks to
  • systematically examine cases
  • identify similarities/differences, enabling us to gain more insights.
17. Research Question
Can we enable a systematic comparative research approach (SCRA) for RSP engines?
18. Heaven
• A set of requirements to satisfy.
• An architecture for an RSP engine Test Stand.
• Two baseline RSP engine architectures.
• A proof-of-concept implementation (open source).
19. Requirement Analysis
An Experimental Environment guarantees
• Comparability
• Repeatability
• Reproducibility
From their definitions we elicit the requirements our framework has to satisfy.
20. Comparability related requirements
[R1] RSP engine agnostic, i.e. independent of the tested RSP engine.
[R2] Independent of the measured key performance indicators (KPIs), i.e., the KPI set has to be extensible.
[R3] Identify baseline RSP engines, i.e., the minimal meaningful approaches to realise an RSP engine.
21. Reproducibility related requirements
[R4] Data independent, i.e. allowing the usage of any data stream and any static data.
[R5] Query independent, i.e. allowing the usage of any query from users’ domains of interest.
22. Repeatability related requirements
[R6] Minimise the experimental error, i.e., the framework has to affect the RSP engine evaluation as little as possible and in a predictable way.
23. RSP Experiment Design
E is the RSP engine used as subject in the experiment;
T is an ontology and any data not subject to change during the experiment;
D is the description of the input data streams;
Q is the set of continuous queries registered into the engine;
K is the set of key performance indicators (KPIs) to collect.
The result of the execution of an experiment is a Report R that captures the engine dynamics.
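The experiment design above can be sketched as a small data structure. This is only an illustration in Python, not Heaven's actual code: the class names `Experiment` and `Report`, the field names and the `record` method are all assumptions introduced here; only the symbols E, T, D, Q, K and R come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Experiment:
    """Hypothetical container for the experiment components E, T, D, Q, K."""
    engine: str                  # E: identifier of the RSP engine under test
    static_data: str             # T: ontology and data that do not change
    streams: Tuple[str, ...]     # D: description of the input data streams
    queries: Tuple[str, ...]     # Q: continuous queries registered into the engine
    kpis: Tuple[str, ...]        # K: names of the KPIs to collect


@dataclass
class Report:
    """Hypothetical report R: one KPI sample per observation of the engine."""
    experiment: Experiment
    samples: List[Dict[str, float]] = field(default_factory=list)

    def record(self, **values: float) -> None:
        # Every KPI declared in K must be measured; undeclared values are ignored.
        missing = set(self.experiment.kpis) - set(values)
        if missing:
            raise ValueError(f"missing KPI values: {sorted(missing)}")
        self.samples.append({name: values[name] for name in self.experiment.kpis})
```

Keeping K as a plain list of names is one possible way to leave the KPI set extensible, in the spirit of requirement R2.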
24. Test Stands (from aerospace engineering)
• Experimental environment
• Systematic evaluation of complex systems
• Black Box evaluation of complex systems
25. Heaven Test Stand Architecture
[Figure: test stand diagram. Components: Streamer, RSPEngine, Receiver, ResultCollector, input interface, output interface. Labels on the diagram: E, D, T, Q, K.]
26. Heaven Test Stand Architecture
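A test stand of this kind can be sketched as a loop that streams events into an engine treated as a black box and collects KPIs around each call. The sketch below is a simplified assumption, not Heaven's implementation: `run_test_stand`, the `Event`, `Engine` and `Kpi` types and the KPI function signature are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Event = Tuple[int, str]                       # (application timestamp, payload)
Engine = Callable[[Event], List[str]]         # black box: event in, answers out
Kpi = Callable[[Event, List[str], float], float]


def run_test_stand(
    stream: Iterable[Event],
    engine: Engine,
    kpis: Dict[str, Kpi],
) -> List[Dict[str, float]]:
    """Stream events into the engine and collect one KPI sample per event."""
    report: List[Dict[str, float]] = []
    for event in stream:                      # Streamer: push one event at a time
        start = time.perf_counter()
        answers = engine(event)               # engine is only seen through its interface
        elapsed = time.perf_counter() - start # Receiver: answers are back
        # ResultCollector: evaluate every registered KPI for this event
        report.append({name: kpi(event, answers, elapsed) for name, kpi in kpis.items()})
    return report
```

Because the engine and the KPIs are passed in as plain callables, the loop itself does not depend on a specific engine (R1) or on a fixed KPI set (R2).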
27. RSP Baselines
Simplified RSP engine cases that combine known properties, i.e. minimal meaningful approaches to realise an RSP engine.
Pipeline of a Data Stream Management System (DSMS) and a Reasoner.
30. RSP Baselines
• ρDF entailment regime
• They exploit absolute time, i.e. their internal clock can be externally controlled.
• Ensures result correctness even when overloaded
• Allows calculating the latency of query responses (responsiveness)
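To illustrate why an externally controlled clock keeps results correct under load, the sketch below drives a tumbling window from event timestamps instead of wall-clock time, so the window contents do not depend on how fast the engine runs. It is an assumption-laden illustration: `ExternalClock`, `TumblingWindow` and `replay` are hypothetical names, and the actual baselines may implement windows differently.

```python
from typing import Iterable, List, Optional, Tuple


class ExternalClock:
    """Clock whose time only moves when the caller says so (hypothetical helper)."""

    def __init__(self) -> None:
        self.now = 0

    def advance_to(self, timestamp: int) -> None:
        if timestamp < self.now:
            raise ValueError("time cannot go backwards")
        self.now = timestamp


class TumblingWindow:
    """Non-overlapping window of fixed width driven by an external clock."""

    def __init__(self, clock: ExternalClock, width: int) -> None:
        self.clock = clock
        self.width = width
        self.end = width                      # first window covers [0, width)
        self.content: List[str] = []

    def push(self, item: str) -> Optional[List[str]]:
        """Add an item at the current clock time; return a window if one closed."""
        closed = None
        if self.clock.now >= self.end:
            closed = self.content
            self.content = []
            while self.clock.now >= self.end: # skip over windows with no events
                self.end += self.width
        self.content.append(item)
        return closed


def replay(events: Iterable[Tuple[int, str]], width: int) -> List[List[str]]:
    """Replay timestamped events and return the windows closed along the way."""
    clock = ExternalClock()
    window = TumblingWindow(clock, width)
    closed_windows = []
    for timestamp, item in events:
        clock.advance_to(timestamp)           # the test stand, not the OS, sets the time
        closed = window.push(item)
        if closed is not None:
            closed_windows.append(closed)
    return closed_windows
```

Replaying the same events always yields the same windows, however long each step takes, which is what makes a separate wall-clock measurement of response latency meaningful.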
31. Example of Dynamics Comparison
Incremental Baseline
32. Conclusion
Top-down hypothesis verification, even when an RSP engine is extremely simple (i.e. the baselines), is not straightforward.
There is a growing need for comparative analysis.
Heaven enables the systematic execution of experiments, paving the road to comparative investigations.
33. Future Works
Systematic analysis of existing solutions
A web-based environment where users can:
• choose one of the existing benchmarks (datasets, queries)
• design experiments
• run them and consult the results online
• compare the results against the baselines or existing integrated RSP engines.