This document describes the methodology for organizing semantic evaluation campaigns. It outlines the key phases of an evaluation campaign: initiation, involvement, preparation and execution, and dissemination. Actors involved include evaluation campaign organizers and participants. The methodology provides guidelines for each phase and recommendations based on lessons from previous campaigns.
SEALS Methodology for Evaluation Campaigns

Document Information
IST Project Number: FP7 – 238975
Acronym: SEALS
Full Title: Semantic Evaluation at Large Scale
Project URL: http://www.seals-project.eu/
RaĀ“l GarcĀ“
u ıa-Castro (Universidad PolitĀ“cnica de Madrid) and Stuart
e
Authors
Wrigley (University of Sheļ¬eld)
Name RaĀ“l GarcĀ“
u ıa-Castro E-mail rgarcia@ļ¬.upm.es
Contact Author
Inst. UPM Phone +34 91 336 36 70
Abstract: This document describes the second version of the SEALS methodology for evaluation campaigns that is being used in the organization of the different SEALS Evaluation Campaigns.
Keywords: evaluation campaign, methodology, guidelines
List of Figures
2.1 The evaluation campaign process
2.2 Initiation phase of the evaluation campaign process
2.3 Involvement phase of the evaluation campaign process
2.4 Preparation and execution phase of the evaluation campaign process
2.5 Dissemination phase of the evaluation campaign process
1. Introduction
In the SEALS project, we have carried out two evaluation campaigns for each of the different types of semantic technologies covered by the project. In these evaluation campaigns, different tools were evaluated and compared according to a common set of evaluations and test data.

This document describes the second version of the methodology to be followed to implement these evaluation campaigns. The first version of the methodology [1] was defined after analysing previous successful evaluation campaigns and has been improved with the feedback and lessons learnt during the first SEALS Evaluation Campaigns.

This methodology is intended to be general enough to be applicable to evaluation campaigns that cover any type of technology (semantic or not) and that could be defined in the future. Because of this, we have described the methodology independently of the concrete details of the SEALS evaluation campaigns to increase its usability. Nevertheless, we also highlight the benefits of following a "SEALS" approach for evaluation campaigns (e.g., automatic execution of evaluations, evaluation resource storage and availability).

The methodology should not be considered a strict step-by-step process; it should be adapted to each particular case if needed and should be treated as a set of guidelines to support evaluation campaigns rather than as a normative reference.

This document first describes, in chapter 2, a generic process for carrying out evaluation campaigns, presenting the actors that participate in it as well as the detailed sequence of tasks to be performed. Then, chapter 3 includes the agreements that govern the SEALS evaluation campaigns and that can be reused or adapted for other campaigns.
2. Evaluation campaign process
This chapter presents a process to guide the organization and execution of evaluation campaigns over software technologies. An evaluation campaign is an activity where several software technologies are compared along one or several evaluation scenarios, i.e., evaluations where technologies are evaluated according to a certain evaluation procedure and using common test data.

A first version of this process appeared in [1] and was followed in the first round of evaluation campaigns performed in the SEALS project. This document updates that previous version with feedback from the people organising and participating in those evaluation campaigns.

First, the chapter presents the different actors that participate in the evaluation campaign process; then, a detailed description of the process is included, covering the tasks to be performed and the different alternatives and recommendations for carrying them out.
2.1 Actors
The tasks of the evaluation process are carried out by different actors according to the kinds of roles that must be performed in each task. This section presents the different kinds of actors involved in the evaluation campaign process.

• Evaluation Campaign Organizers (Organizers from now on). The Organizers are in charge of the general organization and monitoring of the evaluation campaign and of the organization of the evaluation scenarios that are performed in the evaluation campaign. Depending on the size or complexity of the evaluation campaign, the Organizers group could be divided into smaller groups (e.g., groups that include the people in charge of individual evaluation scenarios).

• Evaluation Campaign Participants (Participants from now on). The Participants are tool providers, or people with the permission of tool providers, who participate with a tool in the evaluation campaign.
SEALS value-added features
In SEALS there is a committee, the Evaluation Campaign Advisory Committee (formerly named the Evaluation Campaign Organizing Committee), in charge of supervising all the evaluation campaigns performed in the project to ensure that they align with the SEALS community goals. This committee is composed of the SEALS Executive Project Management Board, the SEALS work package leaders and other prominent external people. Each evaluation campaign is led by a different group of Evaluation Campaign Organizers (formerly named the Evaluation Campaign Executing Committee) who coordinate with the Evaluation Campaign Advisory Committee.
2.2 Process
This section describes the process to follow for carrying out evaluation campaigns. Since this process must be general enough to accommodate different types of evaluation campaigns, the methodology only suggests a set of general tasks to follow and purposely does not impose any restrictions or specific details.

Some tasks have alternative paths that can be followed. In these cases, the methodology does not impose any alternative but presents all the possibilities so the people carrying out the task can decide which path to follow.

The description of the tasks is completed with a set of recommendations extracted from the analysis of other evaluation campaigns, as presented in [1]. These recommendations are not compulsory to follow, but they support specific aspects of the evaluation campaign.
Figure 2.1 shows the main phases of the evaluation campaign process.
Figure 2.1: The evaluation campaign process (Initiation → Involvement → Preparation and Execution → Dissemination).
The evaluation campaign process is composed of four phases, namely, Initiation, Involvement, Preparation and Execution, and Dissemination. The main goals of these phases are the following:

• Initiation phase. It comprises the set of tasks where the different people involved in the organization of the evaluation campaign are identified and where the different evaluation scenarios are defined.

• Involvement phase. It comprises the set of tasks in which the evaluation campaign is announced and participants show their interest in participating by registering for the evaluation campaign.

• Preparation and Execution phase. It comprises the set of tasks that must be performed to insert the participating tools into the evaluation infrastructure and to execute each of the evaluation scenarios and analyse their results.

• Dissemination phase. It comprises the set of tasks that must be performed to disseminate the evaluation campaign results and to make all the evaluation campaign results and resources available.

These four phases of the evaluation campaign process are described in the following sections, where a definition of the tasks that constitute them, the actors that perform these tasks, their inputs, and their outputs are provided.
2.2.1 Initiation phase
The Initiation phase comprises the set of tasks where the different people involved in the organization of the evaluation campaign and the evaluation scenarios are identified and where the different evaluation scenarios are defined. These tasks and their interdependencies, shown in figure 2.2, are the following:
1. Identify organizers.
2. Define evaluation scenarios.
Figure 2.2: Initiation phase of the evaluation campaign process (Identify organizers → Define evaluation scenarios).
Identify organizers
Actors: E. C. Organizers
Inputs: (none)
Outputs: E. C. Organizers
The goal of this task is to define the group of people who will be in charge of the general organization and monitoring of the evaluation campaign, as well as of organizing the evaluation scenarios to be performed in the evaluation campaign and of taking them to a successful end.
Recommendations for the evaluation campaign
Requirement       Recommendation
+ objectiveness   The evaluation campaign does not favor one participant over another.
+ consensus       The decisions and outcomes in the evaluation campaign are consensual.
+ transparency    The actual state of the evaluation campaign can be known by anyone.
+ participation   The organization overhead of the evaluation campaign is minimal.
Recommendations for the evaluation campaign organization
Requirement       Recommendation
+ credibility     The evaluation campaign is organized by several organizations.
+ openness        Organizing the evaluation campaign is open to anyone interested.
                  Organizing an evaluation scenario is open to anyone interested.
+ relevance       Community experts are involved in the organization of the evaluation campaign.
SEALS value-added features
Different people in the SEALS community have expertise in organizing evaluation campaigns for different types of semantic technologies. Don't hesitate to ask for advice or collaboration when organizing an evaluation campaign!
Define evaluation scenarios
Actors: E. C. Organizers
Inputs: E. C. Organizers
Outputs: Evaluation Scenarios, Evaluation Campaign Schedule
In this task, the Organizers must define the different evaluation scenarios that will take place in the evaluation campaign and the schedule to follow for the rest of the evaluation campaign.

For each of these evaluation scenarios, the Organizers must provide a complete description of the evaluation scenario; we encourage following the conventions proposed by the ISO/IEC 14598 standard on software evaluation [2].
Evaluation scenario descriptions should at least include:
• Evaluation goals, quality characteristics covered, and applicability (i.e., which requirements tools must satisfy to be evaluated).
• Test data (i.e., evaluation inputs), evaluation outputs, metrics, and interpretations.
• Result interpretation (i.e., how to interpret and visualize evaluation results).
• Evaluation procedure and the resources needed in this procedure (e.g., software, hardware, human).
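To make these elements concrete, the following minimal sketch (in Python) shows one way an evaluation scenario description could be captured as a data structure. The class and field names are illustrative assumptions, not part of the SEALS Platform or of the ISO/IEC 14598 standard.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EvaluationScenario:
        # Hypothetical container mirroring the description elements listed above.
        name: str
        goals: List[str]                    # evaluation goals
        quality_characteristics: List[str]  # quality characteristics covered
        applicability: List[str]            # requirements tools must satisfy
        test_data: List[str]                # identifiers of the evaluation inputs
        metrics: List[str]                  # metrics computed over the outputs
        interpretation: str                 # how to interpret and visualize results
        procedure: List[str] = field(default_factory=list)  # ordered evaluation steps
        resources: List[str] = field(default_factory=list)  # software/hardware/human

    # Example instance for a hypothetical ontology-matching scenario.
    scenario = EvaluationScenario(
        name="ontology-matching-benchmark",
        goals=["Compare alignment quality across tools"],
        quality_characteristics=["functional suitability"],
        applicability=["tool accepts two ontologies and returns an alignment"],
        test_data=["benchmark-suite"],
        metrics=["precision", "recall", "f-measure"],
        interpretation="Higher f-measure indicates better alignment quality.",
    )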
Three different types of test data can be used in an evaluation scenario:
• Development test data are test data that can be used when developing a tool.
• Training test data are test data that can be used for training a tool before performing an evaluation.
• Evaluation test data are the test data that are used for performing an evaluation.
Recommendations for evaluation scenarios
Requirement       Recommendation
+ credibility     Evaluation scenarios are organized by several organizations.
+ relevance       Evaluation scenarios are relevant to real-world tasks.
+ consensus       Evaluation scenarios are defined by consensus.
+ objectiveness   Evaluation scenarios can be executed with different test data.
                  Evaluation scenarios do not favor one participant over another.
+ participation   Evaluation scenarios are automated as much as possible.
Recommendations for evaluation descriptions
Requirement       Recommendation
+ transparency    Evaluation descriptions are publicly available.
+ participation   Evaluation descriptions are documented.
Recommendations for test data
Requirement       Recommendation
+ objectiveness   Development, training and evaluation test data are disjoint.
                  Test data have the same characteristics.
                  Test data do not favor one tool over another.
+ consensus       Test data are defined by consensus.
+ participation   Test data are defined using a common format.
                  Test data are annotated in a simple way.
                  Test data are documented.
+ transparency    Test data are reusable.
                  Test data are publicly available.
+ relevance       Test data are novel.
                  The quality of test data is higher than that of existing test data.
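The disjointness recommendation above can be checked mechanically before a campaign starts. The following is a minimal sketch of such a check; the test-data identifiers are hypothetical.

    def splits_are_disjoint(development, training, evaluation):
        """Return True if no test-data item appears in more than one split."""
        dev, train, evl = set(development), set(training), set(evaluation)
        return not (dev & train or dev & evl or train & evl)

    assert splits_are_disjoint(["d1", "d2"], ["t1"], ["e1", "e2"])
    assert not splits_are_disjoint(["d1"], ["d1"], ["e1"])  # "d1" appears in two splits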
SEALS value-added features
The SEALS Platform already contains different resources for the evaluation of semantic technologies (i.e., evaluation workflows, services and test data). These resources are publicly available so they can be reused to avoid starting your evaluation from scratch.
2.2.2 Involvement phase
The Involvement phase comprises the set of tasks in which the evaluation campaign is announced and participants show their interest in participating by registering for the evaluation campaign. These tasks and their interdependencies, shown in figure 2.3, are the following:
1. Announce evaluation campaign.
2. Provide registration mechanisms.
3. Participant registration.
Figure 2.3: Involvement phase of the evaluation campaign process (Announce evaluation campaign and Provide registration mechanisms → Participant registration).
Announce evaluation campaign
Actors: E. C. Organizers
Inputs: Evaluation Scenarios, Evaluation Campaign Schedule
Outputs: Evaluation Campaign Announcement
Once the different evaluation scenarios are defined, the Organizers must announce the evaluation campaign using any mechanism available (e.g., mailing lists, blogs, etc.) with the goal of reaching the developers of the tools that are targeted by the evaluation scenarios.
Recommendations for evaluation campaign announcement
Requirement       Recommendation
+ participation   The evaluation campaign is announced internationally.
                  Tool developers are directly contacted.
Recommendations for evaluation campaign participation
Requirement       Recommendation
+ openness        Participation in the evaluation campaign is open to any organization.
+ participation   The effort needed for participating in the evaluation campaign is minimal.
                  People can participate in the evaluation campaign regardless of their location.
                  Participation in the evaluation campaign does not require attendance at any location.
                  The requirements for tool participation are minimal.
                  Participants have time to participate in several evaluation scenarios.
+ relevance       Community experts participate in the evaluation campaign.
                  Tools participating in the evaluation campaign include the most relevant tools.
SEALS value-added features
The SEALS community dissemination services (e.g., SEALS Portal, mailing lists, blog, etc.) can support spreading your evaluation campaign announcements to a whole community of users and providers interested in semantic technology evaluation.
Provide registration mechanisms
Actors: E. C. Organizers
Inputs: Evaluation Scenarios
Outputs: Registration Mechanisms
In this task, the Organizers must provide the mechanisms needed to allow potential
participants to register and to provide detailed information about themselves and their
tools.
Recommendations for registration mechanisms
Requirement       Recommendation
+ participation   Participants can register in the evaluation campaign regardless of their location.
SEALS value-added features
If you don't want to implement your own registration mechanisms, the SEALS Portal can take care of user registration for your evaluation campaign.
Participant registration
Actors: E. C. Participants
Inputs: Evaluation Campaign Announcement, Registration Mechanisms
Outputs: Participant List
In this task, any individual or organization willing to participate in any of the evaluation scenarios of the evaluation campaign must register to indicate their willingness to do so.
SEALS value-added features
If you are using the SEALS Platform for executing your evaluation campaign, participant registration through the SEALS Portal can be coupled with different evaluation services (e.g., uploading the participant tool into the platform).
2.2.3 Preparation and execution phase
The Preparation and execution phase comprises the set of tasks that must be performed to insert the participating tools into the evaluation infrastructure, to execute each of the evaluation scenarios, and to analyse the evaluation results. The tasks that compose this phase can be performed either independently for each evaluation scenario or covering all the evaluation scenarios in each task. These tasks and their interdependencies, shown in figure 2.4, are the following:
1. Provide evaluation materials.
2. Insert tools.
3. Perform evaluation.
4. Analyse results.
Figure 2.4: Preparation and execution phase of the evaluation campaign process (Provide evaluation materials → Insert tools → Perform evaluation → Analyse results).
Provide evaluation materials
Actors: E. C. Organizers
Inputs: Evaluation Scenarios
Outputs: Evaluation Materials
In this task, the Organizers must provide the registered participants with all the evaluation materials needed in the evaluation, including:
• Instructions on how to participate.
• Evaluation description.
• Evaluation test data.
• Evaluation infrastructure.
• Any software needed for the evaluation.
Alternatives:
– Evaluation test data availability
  a. Evaluation test data are available to the participants in this task.
  b. Evaluation test data are sequestered and are not available to the participants.
– Reference tool
  a. Participants are provided with a reference tool and with the results for this tool. This tool could be made of modules so participants don't have to implement a whole tool to participate.
Recommendations for evaluation material provision
Requirement       Recommendation
+ objectiveness   All the participants are provided with the same evaluation materials.
+ participation   Participants are provided with all the necessary documentation.
Recommendations for evaluation software
Requirement       Recommendation
+ openness        The software used in the evaluation is open source.
+ transparency    The software used in the evaluation is publicly available.
+ participation   The software used in the evaluation is robust and efficient.
SEALS value-added features
Using the SEALS Platform, participants can access all the evaluation materials stored in the platform repositories either manually through the SEALS Portal or automatically using the SEALS Platform services.
Insert tools
Actors: E. C. Participants
Inputs: Evaluation Materials
Outputs: Tools Inserted
Once the Participants have all the evaluation materials, they must insert their
tools into the evaluation infrastructure and ensure that these tools are ready for the
evaluation execution.
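As an illustration of what "ready for the evaluation execution" can mean in practice, the sketch below shows a minimal adapter interface that a participating tool might implement so that an evaluation infrastructure can drive it uniformly. The interface and method names are hypothetical assumptions, not the actual SEALS Platform API.

    from abc import ABC, abstractmethod

    class ToolAdapter(ABC):
        # Hypothetical contract between a participating tool and the
        # evaluation infrastructure.
        @abstractmethod
        def prepare(self) -> None:
            """Initialize the tool (load models, allocate resources)."""

        @abstractmethod
        def run(self, test_case: dict) -> dict:
            """Execute the tool on one test case and return its raw output."""

        def provides_required_outputs(self, sample_case: dict, required: set) -> bool:
            """Let participants check that their tool produces the required outputs."""
            return required.issubset(self.run(sample_case))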
Recommendations for tool insertion
Requirement       Recommendation
+ participation   Participants can use test data to prepare their tools.
                  Participants can check that their tool provides the outputs required.
                  Participants can test the insertion of their tools.
                  The insertion of tools is simple.
SEALS value-added features
Similarly to participant registration, tool insertion can be managed through the SEALS Portal. Furthermore, the SEALS Platform can check that tools are correctly inserted so they are ready for execution.
Perform evaluation
Actors: E. C. Participants / E. C. Organizers
Inputs: Tools Inserted
Outputs: Evaluation Results
In this task, the evaluation is executed over all the participating tools and the
evaluation results of all the tools are collected.
Alternatives:
– Evaluation execution
  a. The execution of the evaluation is performed by the Organizers.
  b. The execution of the evaluation is performed by the Participants.
  c. The execution of the evaluation is performed by the Organizers and the Participants.
– Time for executing evaluations
  a. Participants have a limited period of time to return their evaluation results.
– Use of auxiliary resources
  a. Participants cannot use any auxiliary resource in the evaluation (development and training test data, external resources, etc.).
Recommendations for evaluation execution
Requirement       Recommendation
+ participation   Evaluation execution requires few resources.
                  Evaluation execution can be performed regardless of location or time.
+ objectiveness   All the tools are evaluated following the same evaluation description.
                  All the tools are evaluated using the same test data.
+ credibility     The results of the evaluation execution are validated.
SEALS value-added features
If all the resources needed in an evaluation (i.e., evaluation workflow, test data and tools) are stored inside the SEALS Platform, the platform can automatically execute your evaluation and store all the produced results.
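The following sketch illustrates this kind of automated execution: every tool is run on the same test data and the raw results are stored for later analysis. It is an illustration only; the function names and the result format are assumptions, not the SEALS Platform services.

    def run_evaluation(tools, test_data, evaluate, store):
        """tools: name -> callable; evaluate: scores one output; store: persists results."""
        results = {}
        for name, tool in tools.items():
            tool_results = []
            for case in test_data:  # the same test data for every tool
                output = tool(case["input"])
                score = evaluate(output, case["expected"])
                tool_results.append({"case": case["id"], "score": score})
            results[name] = tool_results
            store(name, tool_results)  # persist raw results for later analysis
        return results

    # Example with stub components:
    results = run_evaluation(
        tools={"tool-a": str.upper},
        test_data=[{"id": 1, "input": "abc", "expected": "ABC"}],
        evaluate=lambda out, exp: 1.0 if out == exp else 0.0,
        store=lambda name, rs: None,  # stand-in for a results repository
    )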
Analyse results
Actors: E. C. Participants / E. C. Organizers
Inputs: Evaluation Results
Outputs: Result Analysis
Once the evaluation results of all the tools are collected, they are analysed both individually for each tool and globally across all the tools.

This result analysis must be reviewed in order to reach agreed conclusions. Therefore, if the results are analysed by the Organizers then this analysis must be reviewed by the Participants, and vice versa; that is, if the results are analysed by the Participants they must be reviewed by the Organizers.
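A minimal sketch of this two-level analysis, assuming results shaped as in the previous (hypothetical) sketch:

    from statistics import mean

    def analyse(results):
        """Average scores per tool, plus a global average across all tools."""
        per_tool = {name: mean(r["score"] for r in rs) for name, rs in results.items()}
        overall = mean(per_tool.values())
        return per_tool, overall

    per_tool, overall = analyse({
        "tool-a": [{"case": 1, "score": 0.9}, {"case": 2, "score": 0.7}],
        "tool-b": [{"case": 1, "score": 0.6}, {"case": 2, "score": 0.8}],
    })
    # per_tool is approximately {"tool-a": 0.8, "tool-b": 0.7}; overall is 0.75.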
Alternatives:
– Analysis of evaluation results
  a. The analysis of the evaluation results is performed by the Organizers.
  b. The analysis of the evaluation results is performed by the Participants.
  c. The analysis of the evaluation results is performed by the Organizers and the Participants.
– Anonymised results
  a. The results of the evaluation campaign are anonymised so the name of the tool that produced them is not known.
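If the anonymisation alternative is chosen, it can be as simple as replacing tool names with stable anonymous identifiers before results are shared, with the mapping kept private by the Organizers. A tiny illustrative sketch, under those assumptions:

    def anonymise(results):
        """Replace tool names with anonymous identifiers; return the private mapping."""
        mapping = {name: f"tool-{i + 1}" for i, name in enumerate(sorted(results))}
        return {mapping[name]: scores for name, scores in results.items()}, mapping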
Recommendations for evaluation results
Requirement       Recommendation
+ objectiveness   Evaluation results are analysed identically for all the tools.
                  Participants can comment on the evaluation results of their tools before making them public.
                  There is enough time for analysing the evaluation results.
+ transparency    Evaluation results are publicly available at the end of the evaluation campaign.
                  Evaluation results can be replicated by anyone.
+ sustainability  Evaluation results are compiled, documented, and expressed in a common format.
SEALS value-added features
Evaluation results stored in the SEALS Platform can be accessed either through the SEALS Portal by means of visualization services or automatically through the platform services so they can be used in your own calculations. Besides, you can easily combine the results of different evaluations to suit your goals.
2.2.4 Dissemination phase
The Dissemination phase comprises the set of tasks that must be performed to disseminate the evaluation campaign results and to make all the evaluation campaign results and resources available. The tasks that compose this phase can be performed either independently for each evaluation scenario or covering all the evaluation scenarios in each task. These tasks and their interdependencies, shown in figure 2.5, are the following:
1. Disseminate results.
2. Publish outcomes.
Figure 2.5: Dissemination phase of the evaluation campaign process (Disseminate results → Publish outcomes).
Disseminate results
Actors: E. C. Organizers, E. C. Participants
Inputs: Result Analysis
Outputs: Feedback
In this task, the Organizers and the Participants must disseminate the results of the evaluation campaign. The preferred way of dissemination is one that maximizes discussion to obtain feedback about the evaluation campaign (e.g., a face-to-face meeting with participants or a workshop).
Recommendations for result presentation
Requirement       Recommendation
+ consensus       Evaluation results are presented and discussed in a meeting.
+ transparency    Participants write the description of their tools and their results.
+ objectiveness   The presentation of results is identical for all the tools.
                  The presentation of results includes the reasons for obtaining these results.
Recommendations for the dissemination meeting
Requirement       Recommendation
+ openness        The dissemination meeting is public.
+ relevance       The dissemination meeting is collocated with a relevant event.
+ participation   Participants present the details and results of their tool.
+ credibility     The difficulties faced during the evaluation campaign are presented.
+ consensus       Feedback about the evaluation campaign is obtained.
+ sustainability  The dissemination meeting includes a discussion of the next steps to follow.
                  A survey is conducted among organizers, participants, and attendees.
SEALS value-added features
The SEALS community dissemination services (e.g., SEALS Portal, mailing lists, blog, etc.) can support disseminating your evaluation campaign results and attracting people to meetings where they can be discussed.
Publish outcomes
Actors: E. C. Organizers, E. C. Participants
Inputs: Feedback
Outputs: Results and Resources Published
In this task, the Organizers and the Participants are encouraged to document the results of the evaluation campaign and of each of the tools, including not only the results obtained but also improvement recommendations based on the feedback obtained and the lessons learnt during the evaluation campaign.

Finally, the Organizers must make publicly available all the evaluation resources used in the evaluation campaign, including any resources that were not made available to participants (e.g., sequestered evaluation test data).
Recommendations for result publication
Requirement       Recommendation
+ participation   Results are jointly published by all participants (e.g., as workshop proceedings, journal special issues, etc.).
Recommendations for resource publication
Requirement       Recommendation
+ transparency    All the evaluation resources are made publicly available.
+ sustainability  Test data are maintained by an association.
SEALS value-added features
Making evaluation resources public once the evaluation campaign is over is straightforward since all those resources are already available from the SEALS Platform. Furthermore, the SEALS Portal is the best place to upload or link to any report resulting from your evaluation campaign.
3. Evaluation campaign agreements
This chapter presents the general terms for participation in the SEALS evaluation campaigns and the policies for using the resources and results produced in these evaluation campaigns. They are based on the data policies of the Ontology Alignment Evaluation Initiative (http://oaei.ontologymatching.org/doc/oaei-deontology.html).

These terms for participation and policies are only a suggestion. They are being used in the SEALS evaluation campaigns but can be adapted as seen fit for other campaigns.
3.1 Terms of participation
By submitting a tool and/or its results to a SEALS evaluation campaign, the participants grant their permission for the publication of the tool results on the SEALS web site and for their use for scientific purposes (e.g., as a basis for experiments).

In return, it is expected that the provenance of these results is correctly and duly acknowledged.
3.2 Use rights
In order to avoid any inadequate use of the data provided by the SEALS evaluation campaigns, we make clear the following rules for using these data.

It is the responsibility of the user of the data to ensure that the authors of the results are properly acknowledged, unless these data are used in an anonymous aggregated way. In the case of participant results, an appropriate acknowledgement is the mention of the participant and a citation of a paper from the participants (e.g., the paper detailing their participation). The specific conditions under which the results have been produced should not be misrepresented (an explicit link to their source on the SEALS web site should be made).

These rules apply to any publication mentioning these results. In addition, the specific rules below also apply to particular types of use of the data.
Rule applying to the non-public use of the data
Anyone can freely use the evaluations, test data and evaluation results for evaluating
and improving their tools and methods.
Rules applying to evaluation campaign participants
The participants in an evaluation campaign can publish their results as long as they cite the source of the evaluations and the evaluation campaign in which they were obtained.

Participants can compare their results with other results published on the SEALS web site as long as they also:
• compare with the results of all the participants of the same evaluation scenario; and
• compare with all the test data of this evaluation scenario.
Of course, participants can mention their participation in the evaluation campaign.
Rules applying to people who did not participate in an evaluation campaign
People who did not participate in an evaluation campaign can publish their results as long as they cite the sources of the evaluations and the evaluation campaign in which they were obtained, and they must make clear that they did not participate in the official evaluation campaign.
They can compare their results with other results published on the SEALS web site as long as they:
• cite the source of the evaluations and the evaluation campaign in which they were obtained;
• compare with the results of all the participants of the same evaluation scenario; and
• compare with all the test data of this evaluation scenario.
They cannot claim to have executed the evaluation under the same conditions as the participants. Furthermore, given that evaluation results change over time, it is not ethical to compare one tool against old results; one should always make comparisons with the state of the art. In case such a comparison is not possible, results can be compared with older results, but the age of the results and the version of the tool that produced them must be made clear.
Rules applying to other cases
Anyone can mention the evaluations and evaluation campaigns in order to discuss them. Any other use of these evaluations and their results is not authorized (permission can, however, be requested from the contact point), and failing to comply with the requirements above is considered unethical.
References
[1] R. García-Castro and F. Martín-Recuerda. D3.1 SEALS Methodology for Evaluation Campaigns v1. Technical report, SEALS Project, October 2009.
[2] ISO/IEC. ISO/IEC 14598-6: Software product evaluation - Part 6: Documentation of evaluation modules. 2001.