SEALS
Methodology for
Evaluation
Campaigns


Raúl García-Castro and Stuart N. Wrigley




                          September 2011


Document Information

  IST Project Number   FP7 – 238975                                   Acronym   SEALS
  Full Title           Semantic Evaluation at Large Scale
  Project URL          http://www.seals-project.eu/
  Authors              Raúl García-Castro (Universidad Politécnica de Madrid) and
                       Stuart Wrigley (University of Sheffield)
  Contact Author       Name: Raúl García-Castro         E-mail: rgarcia@fi.upm.es
                       Inst.: UPM                       Phone: +34 91 336 36 70
  Abstract             This document describes the second version of the SEALS
                       methodology for evaluation campaigns that is being used in the
                       organization of the different SEALS Evaluation Campaigns.
  Keywords             evaluation campaign, methodology, guidelines






Table of Contents

List of Figures

1 Introduction

2 Evaluation campaign process
  2.1 Actors
  2.2 Process
      2.2.1 Initiation phase
      2.2.2 Involvement phase
      2.2.3 Preparation and execution phase
      2.2.4 Dissemination phase

3 Evaluation campaign agreements
  3.1 Terms of participation
  3.2 Use rights

References






List of Figures

   2.1   The evaluation campaign process.
   2.2   Initiation phase of the evaluation campaign process.
   2.3   Involvement phase of the evaluation campaign process.
   2.4   Preparation and execution phase of the evaluation campaign process.
   2.5   Dissemination phase of the evaluation campaign process.






1. Introduction
In the SEALS project, we have organized two evaluation campaigns for each of the different
types of semantic technologies covered by the project. In these evaluation campaigns,
different tools were evaluated and compared according to a common set of evaluations
and test data.
    This document describes the second version of the methodology to be followed
to implement these evaluation campaigns. The first version of the methodology [1]
was defined after analysing previous successful evaluation campaigns and has been
improved with the feedback and lessons learnt during the first SEALS Evaluation
Campaigns.
    This methodology is intended to be general enough to be applicable to evaluation
campaigns that cover any type of technology (semantic or not) and that
could be defined in the future. Because of this, we have described the methodology
independently of the concrete details of the SEALS evaluation campaigns to increase
its usability. Nevertheless, we also highlight the benefits of following a "SEALS"
approach for evaluation campaigns (e.g., automatic execution of evaluations, evaluation
resource storage and availability).
    The methodology should not be considered a strict step-by-step process; it should
be adapted to each particular case if needed and should be treated as a set of guidelines
to support evaluation campaigns rather than as a normative reference.
    This document first describes, in chapter 2, a generic process for carrying out
evaluation campaigns, presenting the actors that participate in it as well as the detailed
sequence of tasks to be performed. Then, chapter 3 includes the agreements that
rule the SEALS evaluation campaigns and that can be reused or adapted for other
campaigns.






2. Evaluation campaign process
This chapter presents a process to guide the organization and execution of evaluation
campaigns over software technologies. An evaluation campaign is an activity where
several software technologies are compared along one or several evaluation scenarios,
i.e., evaluations where technologies are evaluated according to a certain evaluation
procedure and using common test data.
     A first version of this process appeared in [1] and was followed in the
first round of evaluation campaigns performed in the SEALS project.
This document updates that previous version, incorporating feedback from the people
who organised and participated in those evaluation campaigns.
     First, the chapter presents the different actors that participate in the evaluation
campaign process; then, a detailed description of that process is included, covering
the tasks to be performed and the different alternatives and recommendations for
carrying them out.


2.1 Actors
The tasks of the evaluation process are carried out by different actors according to the
roles that must be performed in each task. This section presents the different
kinds of actors involved in the evaluation campaign process.

   • Evaluation Campaign Organizers (Organizers from now on). The Organizers
     are in charge of the general organization and monitoring of the evaluation
     campaign and of the organization of the evaluation scenarios that are performed
     in the evaluation campaign. Depending on the size or the complexity of the
     evaluation campaign, the Organizers group could be divided into smaller groups
     (e.g., groups that include the people in charge of individual evaluation scenarios).

   • Evaluation Campaign Participants (Participants from now on). The Participants
     are tool providers, or people with the permission of tool providers, who
     participate with a tool in the evaluation campaign.


  SEALS value-added features

  In SEALS there is a committee, the Evaluation Campaign Advisory Committee
  (formerly named the Evaluation Campaign Organizing Committee), in charge of
  supervising all the evaluation campaigns performed in the project to ensure that
  they align with the SEALS community goals. This committee is composed of the
  SEALS Executive Project Management Board, the SEALS work package leaders
  and other prominent external people. Each evaluation campaign is led by a different
  group of Evaluation Campaign Organizers (formerly named the Evaluation
  Campaign Executing Committee) who coordinate with the Evaluation Campaign
  Advisory Committee.




2.2 Process
This section describes the process to follow for carrying out evaluation campaigns.
Since this process must be general enough to accommodate different types of evaluation
campaigns, the methodology only suggests a set of general tasks to follow, deliberately
not imposing any restrictions or specific details.
   Some tasks have alternative paths that can be followed. In these cases, the methodology
does not impose any alternative but presents all the possibilities so the people
carrying out the task can decide which path to follow.
   The description of the tasks is completed with a set of recommendations extracted
from the analysis of other evaluation campaigns, as presented in [1]. These recommendations
are not compulsory, but they support specific aspects of the evaluation
campaign.
   Figure 2.1 shows the main phases of the evaluation campaign process.


                      Figure 2.1: The evaluation campaign process
            (Initiation, Involvement, Preparation and Execution, Dissemination).


    The evaluation campaign process is composed of four phases, namely, Initiation,
Involvement, Preparation and Execution, and Dissemination. The main goals of these
phases are the following:

   • Initiation phase. It comprises the set of tasks where the different people involved
     in the organization of the evaluation campaign are identified and where
     the different evaluation scenarios are defined.

   • Involvement phase. It comprises the set of tasks in which the evaluation
     campaign is announced and participants show their interest in participating by
     registering for the evaluation campaign.

   • Preparation and Execution phase. It comprises the set of tasks that must
     be performed to insert the participating tools into the evaluation infrastructure
     and to execute each of the evaluation scenarios and analyse their results.

   • Dissemination phase. It comprises the set of tasks that must be performed
     to disseminate the evaluation campaign results and to make all the evaluation
     campaign results and resources available.

    These four phases of the evaluation campaign process are described in the following
sections, where a definition of the tasks that constitute them, the actors that perform
these tasks, and their inputs and outputs are provided.





2.2.1 Initiation phase

The Initiation phase comprises the set of tasks where the different people involved in
the organization of the evaluation campaign and the evaluation scenarios are identified
and where the different evaluation scenarios are defined. These tasks and their
interdependencies, shown in figure 2.2, are the following:

  1. Identify organizers.

  2. Define evaluation scenarios.

           Figure 2.2: Initiation phase of the evaluation campaign process
                  (Identify organizers, then Define evaluation scenarios).




Identify organizers
  Actors:   E. C. Organizers
  Inputs:   none
  Outputs:  E. C. Organizers

    The goal of this task is to define the group of people who will be in charge of
the general organization and monitoring of the evaluation campaign, as well as of
organizing the evaluation scenarios to be performed in the evaluation campaign and
of taking them to a successful end.
     Recommendations for the evaluation campaign
     + objectiveness:  The evaluation campaign does not favor one participant over another.
     + consensus:      The decisions and outcomes in the evaluation campaign are consensual.
     + transparency:   The actual state of the evaluation campaign can be known by anyone.
     + participation:  The organization overhead of the evaluation campaign is minimal.






     Recommendations for the evaluation campaign organization
     + credibility:  The evaluation campaign is organized by several organizations.
     + openness:     Organizing the evaluation campaign is open to anyone interested.
                     Organizing an evaluation scenario is open to anyone interested.
     + relevance:    Community experts are involved in the organization of the
                     evaluation campaign.


 SEALS value-added features

  Different people in the SEALS community have expertise in organizing evaluation
  campaigns for different types of semantic technologies. Don't hesitate to ask for
  advice or collaboration when organizing an evaluation campaign!


Deļ¬ne evaluation scenarios
 Actors
 E. C. Organizers
 Inputs                                      Outputs
 E. C. Organizers                            Evaluation Scenarios
                                             Evaluation Campaign Schedule

   In this task, the Organizers must define the different evaluation scenarios that will
take place in the evaluation campaign and the schedule to follow in the rest of the
evaluation campaign.
   For each of these evaluation scenarios, the Organizers must provide a complete
description of the evaluation scenario; we encourage organizers to follow the conventions
proposed by the ISO/IEC 14598 standard on software evaluation [2].
   Evaluation scenario descriptions should at least include:

   • Evaluation goals, quality characteristics covered and applicability (i.e., which
     requirements tools must satisfy to be evaluated).

   • Test data (i.e., evaluation inputs), evaluation outputs, metrics, and interpretations.

   • Result interpretation (i.e., how to interpret and visualize evaluation results).

   • Evaluation procedure and the resources needed in this procedure (e.g., software,
     hardware, human).
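
   As an illustration only, the sketch below shows one possible way to capture such a
description as a structured record so that it can be stored and compared across scenarios.
The ScenarioDescription type, its field names and the example values are invented for
this document and are not part of the SEALS Platform.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ScenarioDescription:
    """Hypothetical record collecting the minimal elements of an evaluation scenario."""
    name: str
    goals: List[str]                    # evaluation goals and quality characteristics covered
    applicability: List[str]            # requirements a tool must satisfy to be evaluated
    test_data: List[str]                # identifiers of the evaluation inputs
    outputs: List[str]                  # expected evaluation outputs
    metrics: List[str]                  # metrics computed from the outputs
    interpretation: str                 # how to interpret and visualize the results
    procedure: str                      # evaluation procedure to follow
    resources: List[str] = field(default_factory=list)  # software, hardware, human resources

# Example instance for an invented ontology matching scenario
scenario = ScenarioDescription(
    name="example-matching-scenario",
    goals=["conformance", "efficiency"],
    applicability=["tool accepts two ontologies and returns an alignment"],
    test_data=["benchmark-suite-v1"],
    outputs=["alignment file per test case"],
    metrics=["precision", "recall", "execution time"],
    interpretation="precision/recall plots per test case",
    procedure="run the tool on every test case and compare against reference alignments",
    resources=["evaluation workflow", "one evaluation server"],
)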

   Three different types of test data can be used in an evaluation scenario:

   • Development test data are test data that can be used when developing a tool.

   • Training test data are test data that can be used for training a tool before
     performing an evaluation.

   • Evaluation test data are the test data that are used for performing an evaluation.

     Recommendations for evaluation scenarios
     + credibility:    Evaluation scenarios are organized by several organizations.
     + relevance:      Evaluation scenarios are relevant to real-world tasks.
     + consensus:      Evaluation scenarios are defined by consensus.
     + objectiveness:  Evaluation scenarios can be executed with different test data.
                       Evaluation scenarios do not favor one participant over another.
     + participation:  Evaluation scenarios are automated as much as possible.

     Recommendations for evaluation descriptions
     + transparency:   Evaluation descriptions are publicly available.
     + participation:  Evaluation descriptions are documented.

     Recommendations for test data
     + objectiveness:  Development, training and evaluation test data are disjoint.
                       Test data have the same characteristics.
                       Test data do not favor one tool over another.
     + consensus:      Test data are defined by consensus.
     + participation:  Test data are defined using a common format.
                       Test data are annotated in a simple way.
                       Test data are documented.
     + transparency:   Test data are reusable.
                       Test data are publicly available.
     + relevance:      Test data are novel.
                       The quality of test data is higher than that of existing test data.
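
     As a small illustration of the objectiveness recommendation above (disjoint development,
     training and evaluation test data), the following sketch performs a sanity check over
     three sets of test case identifiers before a scenario is published. The function name and
     the identifiers are invented; this is only one possible way to implement such a check.

from typing import Iterable, Set

def assert_disjoint_test_data(development: Iterable[str],
                              training: Iterable[str],
                              evaluation: Iterable[str]) -> None:
    """Raise an error if any test case identifier appears in more than one test data set."""
    dev: Set[str] = set(development)
    train: Set[str] = set(training)
    eva: Set[str] = set(evaluation)
    overlap = (dev & train) | (dev & eva) | (train & eva)
    if overlap:
        raise ValueError(f"Test data sets are not disjoint; shared test cases: {sorted(overlap)}")

# Hypothetical identifiers of test cases in each data set
assert_disjoint_test_data(
    development=["case-001", "case-002"],
    training=["case-101", "case-102"],
    evaluation=["case-201", "case-202"],
)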


 SEALS value-added features

  The SEALS Platform already contains different resources for the evaluation of
  semantic technologies (i.e., evaluation workflows, services and test data). These
  resources are publicly available so they can be reused to avoid starting your
  evaluation from scratch.


2.2.2 Involvement phase

The Involvement phase comprises the set of tasks in which the evaluation campaign is
announced and participants show their interest in participating by registering for the




evaluation campaign. These tasks and their interdependencies, shown in figure 2.3,
are the following:

  1. Announce evaluation campaign.

  2. Provide registration mechanisms.

  3. Participant registration.


          Figure 2.3: Involvement phase of the evaluation campaign process
  (Announce evaluation campaign and Provide registration mechanisms, then Participant registration).



Announce evaluation campaign
  Actors:   E. C. Organizers
  Inputs:   Evaluation Scenarios
            Evaluation Campaign Schedule
  Outputs:  Evaluation Campaign Announcement

   Once the different evaluation scenarios are defined, the Organizers must announce
the evaluation campaign using any mechanism available (e.g., mailing lists, blogs, etc.)
with the goal of reaching the developers of the tools that are targeted by the evaluation
scenarios.
     Recommendations for evaluation campaign announcement
     + participation:  The evaluation campaign is announced internationally.
                       Tool developers are directly contacted.






     Recommendations for evaluation campaign participation
     + openness:       Participation in the evaluation campaign is open to any organization.
     + participation:  The effort needed for participating in the evaluation campaign
                       is minimal.
                       People can participate in the evaluation campaign regardless
                       of their location.
                       Participation in the evaluation campaign does not require being
                       present at any particular location.
                       The requirements for tool participation are minimal.
                       Participants have time to participate in several evaluation scenarios.
     + relevance:      Community experts participate in the evaluation campaign.
                       Tools participating in the evaluation campaign include the
                       most relevant tools.


 SEALS value-added features

  The SEALS community dissemination services (e.g., SEALS Portal, mailing lists,
  blog, etc.) can support spreading your evaluation campaign announcements to a
  whole community of users and providers interested in semantic technology evaluation.


Provide registration mechanisms
  Actors:   E. C. Organizers
  Inputs:   Evaluation Scenarios
  Outputs:  Registration Mechanisms

   In this task, the Organizers must provide the mechanisms needed to allow potential
participants to register and to provide detailed information about themselves and their
tools.
     Recommendations for registration mechanisms
     + participation:  Participants can register in the evaluation campaign regardless
                       of their location.
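
     Purely as an illustration of the information such a mechanism might collect, the
     sketch below defines a hypothetical registration record and a helper that rejects
     duplicate tool registrations. Neither corresponds to the actual SEALS Portal
     registration form.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Registration:
    """Hypothetical registration record for one participant and tool."""
    participant_name: str
    organization: str
    email: str
    tool_name: str
    tool_version: str
    scenarios: List[str]   # evaluation scenarios the participant wants to enter

def register_participant(registry: Dict[str, Registration], reg: Registration) -> None:
    """Add a registration to an in-memory participant list, rejecting duplicate tools."""
    key = f"{reg.tool_name}:{reg.tool_version}"
    if key in registry:
        raise ValueError(f"{key} is already registered")
    registry[key] = reg

# Toy usage with invented participant details
participants: Dict[str, Registration] = {}
register_participant(participants, Registration(
    participant_name="Jane Doe",
    organization="Example University",
    email="jane.doe@example.org",
    tool_name="ExampleMatcher",
    tool_version="1.0",
    scenarios=["example-matching-scenario"],
))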


 SEALS value-added features

  If you don't want to implement your own registration mechanisms, the SEALS
  Portal can take care of user registration for your evaluation campaign.





Participant registration
  Actors:   E. C. Participants
  Inputs:   Evaluation Campaign Announcement
            Registration Mechanisms
  Outputs:  Participant List

    In this task, any individual or organization willing to participate in any of the
evaluation scenarios of the evaluation campaign must register to indicate their
willingness to do so.

  SEALS value-added features

  If you are using the SEALS Platform for executing your evaluation campaign,
  participant registration through the SEALS Portal can be coupled with different
  evaluation services (e.g., uploading the participant tool into the platform).


2.2.3 Preparation and execution phase

The Preparation and execution phase comprises the set of tasks that must be performed
to insert the participating tools into the evaluation infrastructure, to execute
each of the evaluation scenarios, and to analyse the evaluation results. The tasks
that compose this phase can be performed either independently for each evaluation
scenario or covering all the evaluation scenarios in each task. These tasks and their
interdependencies, shown in figure 2.4, are the following:

  1. Provide evaluation materials.

  2. Insert tools.

  3. Perform evaluation.

  4. Analyse results.


                    Evaluation                     Tools                      Evaluation                   Result
     Provide	
 Ā     Materials                    Inserted                      Results                    Analysis
                                    Insert	
 Ā                  Perform	
 Ā                   Analyse	
 Ā 
    evalua,on	
 Ā 
                                     tools	
 Ā                 evalua,on	
 Ā                  results	
 Ā 
    materials	
 Ā 

   Organizers                    Participants           Participants/Organizers       Participants/Organizers


  Figure 2.4: Preparation and execution phase of the evaluation campaign process.






Provide evaluation materials
  Actors:   E. C. Organizers
  Inputs:   Evaluation Scenarios
  Outputs:  Evaluation Materials

   In this task, the Organizers must provide the registered participants with all the
evaluation materials needed in the evaluation, including:

   • Instructions on how to participate.

   • Evaluation description.

   • Evaluation test data.

   • Evaluation infrastructure.

   • Any software needed for the evaluation.


    Alternatives:

      ⊗ Evaluation test data availability

           a. Evaluation test data are available to the participants in this task.
           b. Evaluation test data are sequestered and are not available to the
              participants.

      ⊗ Reference tool

           a. Participants are provided with a reference tool and with the results
              for this tool. This tool could be made of modules so participants don't
              have to implement a whole tool to participate.

     Recommendations for evaluation material provision
     + objectiveness:  All the participants are provided with the same evaluation materials.
     + participation:  Participants are provided with all the necessary documentation.

     Recommendations for evaluation software
     + openness:       The software used in the evaluation is open source.
     + transparency:   The software used in the evaluation is publicly available.
     + participation:  The software used in the evaluation is robust and efficient.








 SEALS value-added features

 Using the SEALS Platform participants can access all the evaluation materials
 stored in the platform repositories either manually through the SEALS Portal or
 automatically using the SEALS Platform services.


Insert tools
  Actors:   E. C. Participants
  Inputs:   Evaluation Materials
  Outputs:  Tools Inserted

   Once the Participants have all the evaluation materials, they must insert their
tools into the evaluation infrastructure and ensure that these tools are ready for the
evaluation execution.
     Recommendations for tool insertion
     + participation:  Participants can use test data to prepare their tools.
                       Participants can check that their tool provides the outputs required.
                       Participants can test the insertion of their tools.
                       The insertion of tools is simple.


 SEALS value-added features

  Similarly to participant registration, tool insertion can be managed through the
  SEALS Portal. Furthermore, the SEALS Platform can check that tools are
  correctly inserted so they are ready for execution.


Perform evaluation
  Actors:   E. C. Participants / E. C. Organizers
  Inputs:   Tools Inserted
  Outputs:  Evaluation Results

   In this task, the evaluation is executed over all the participating tools and the
evaluation results of all the tools are collected.
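
   The following minimal sketch illustrates what this step amounts to when automated:
every registered tool is run over the same evaluation test data and the raw results are
collected per tool. The Tool interface and the example tools are invented for illustration
and do not represent the SEALS Platform execution services.

from typing import Callable, Dict, List

# Hypothetical interface: each tool is a callable that maps one test case to an output.
Tool = Callable[[str], str]

def perform_evaluation(tools: Dict[str, Tool], test_cases: List[str]) -> Dict[str, Dict[str, str]]:
    """Run every participating tool over the same evaluation test data and collect the raw results."""
    results: Dict[str, Dict[str, str]] = {}
    for tool_name, tool in tools.items():
        tool_results: Dict[str, str] = {}
        for case in test_cases:
            try:
                tool_results[case] = tool(case)
            except Exception as error:  # a failing test case should not abort the whole campaign
                tool_results[case] = f"ERROR: {error}"
        results[tool_name] = tool_results
    return results

# Toy usage with two invented tools and two invented test cases
tools: Dict[str, Tool] = {
    "ExampleMatcher": lambda case: f"alignment for {case}",
    "OtherMatcher": lambda case: f"alignment for {case}",
}
all_results = perform_evaluation(tools, ["case-201", "case-202"])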






    Alternatives:

      ⊗ Evaluation execution

           a. The execution of the evaluation is performed by the Organizers.
           b. The execution of the evaluation is performed by the Participants.
           c. The execution of the evaluation is performed by the Organizers and
              the Participants.

      ⊗ Time for executing evaluations

           a. Participants have a limited period of time to return their evaluation
              results.

      ⊗ Use of auxiliary resources

           a. Participants cannot use any auxiliary resource in the evaluation
              (development and training test data, external resources, etc.).

     Recommendations for evaluation execution
     + participation:  Evaluation execution requires few resources.
                       Evaluation execution can be done regardless of location or time.
     + objectiveness:  All the tools are evaluated following the same evaluation description.
                       All the tools are evaluated using the same test data.
     + credibility:    The results of the evaluation execution are validated.


 SEALS value-added features

  If all the resources needed in an evaluation (i.e., evaluation workflow, test data
  and tools) are stored inside the SEALS Platform, the platform can automatically
  execute your evaluation and store all the produced results.


Analyse results
  Actors:   E. C. Participants / E. C. Organizers
  Inputs:   Evaluation Results
  Outputs:  Result Analysis

    Once the evaluation results of all the tools are collected, they are analysed both
individually for each tool and globally across all the tools.
    This analysis of the results must be reviewed in order to reach agreed conclusions.
Therefore, if the results are analysed by the Organizers then this analysis must be reviewed


by the Participants and vice versa; that is, if the results are analysed by the Participants,
they must be reviewed by the Organizers.
    Alternatives:

      ⊗ Analysis of evaluation results

           a. The analysis of the evaluation results is performed by the Organizers.
           b. The analysis of the evaluation results is performed by the Participants.
           c. The analysis of the evaluation results is performed by the Organizers
              and the Participants.

      ⊗ Anonymised results

           a. The results of the evaluation campaign are anonymised so the name
              of the tool that produced them is not known.

     Recommendations for evaluation results
     + objectiveness:   Evaluation results are analysed identically for all the tools.
                        Participants can comment on the evaluation results of their tools
                        before making them public.
                        There is enough time for analysing the evaluation results.
     + transparency:    Evaluation results are publicly available at the end of the
                        evaluation campaign.
                        Evaluation results can be replicated by anyone.
     + sustainability:  Evaluation results are compiled, documented, and expressed
                        in a common format.


  SEALS value-added features

  Evaluation results stored in the SEALS Platform can be accessed either through
  the SEALS Portal by means of visualization services or automatically through the
  platform services so they can be used in your own calculations. Besides, you can
  easily combine the results of different evaluations to suit your goals.
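
  To illustrate the anonymisation alternative and the recommendation to express results
  in a common format, the sketch below flattens per-tool metric values into one record
  layout and replaces tool names with neutral identifiers. The record layout and the
  metric values are invented and are not the format used by the SEALS Platform.

import json
from typing import Dict, List

def anonymise_results(results: Dict[str, Dict[str, float]]) -> List[dict]:
    """Replace tool names with neutral identifiers and flatten the results into one common format."""
    common_format: List[dict] = []
    for index, (tool_name, metrics) in enumerate(sorted(results.items()), start=1):
        anonymous_id = f"tool-{index:02d}"  # the mapping to real tool names is kept private
        for metric, value in metrics.items():
            common_format.append({"tool": anonymous_id, "metric": metric, "value": value})
    return common_format

# Invented per-tool metric values
raw_results = {
    "ExampleMatcher": {"precision": 0.82, "recall": 0.75},
    "OtherMatcher": {"precision": 0.78, "recall": 0.80},
}
print(json.dumps(anonymise_results(raw_results), indent=2))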


2.2.4 Dissemination phase

The Dissemination phase comprises the set of tasks that must be performed to disseminate
the evaluation campaign results and to make all the evaluation campaign
results and resources available. The tasks that compose this phase can be performed
either independently for each evaluation scenario or covering all the evaluation scenarios
in each task. These tasks and their interdependencies, shown in figure 2.5, are the
following:
  1. Disseminate results.



  2. Publish outcomes.


         Figure 2.5: Dissemination phase of the evaluation campaign process
                     (Disseminate results, then Publish outcomes).




Disseminate results
  Actors:   E. C. Organizers
            E. C. Participants
  Inputs:   Result Analysis
  Outputs:  Feedback

   In this task, the Organizers and the Participants must disseminate the results of the
evaluation campaign. The preferred way of dissemination is one that maximizes
discussion to obtain feedback about the evaluation campaign (e.g., a face-to-face meeting
with participants or a workshop).
     Recommendations for result presentation
     + consensus:      Evaluation results are presented and discussed in a meeting.
     + transparency:   Participants write the description of their tools and their results.
     + objectiveness:  The presentation of results is identical for all the tools.
                       The presentation of results includes the reasons for obtaining
                       these results.

     Recommendations for the dissemination meeting
     + openness:        The dissemination meeting is public.
     + relevance:       The dissemination meeting is collocated with a relevant event.
     + participation:   Participants present the details and results of their tool.
     + credibility:     The difficulties faced during the evaluation campaign are presented.
     + consensus:       Feedback about the evaluation campaign is obtained.
     + sustainability:  The dissemination meeting includes the discussion of the next
                        steps to follow.
                        A survey is carried out among organizers, participants and attendees.







 SEALS value-added features

  The SEALS community dissemination services (e.g., SEALS Portal, mailing lists,
  blog, etc.) can support disseminating your evaluation campaign results and
  attracting people to meetings where they can be discussed.


Publish outcomes
  Actors:   E. C. Organizers
            E. C. Participants
  Inputs:   Feedback
  Outputs:  Results and Resources Published

    In this task, the Organizers and the Participants are encouraged to document
the results of the evaluation campaign and of each of the tools, including not only
the results obtained but also improvement recommendations based on the feedback
obtained and the lessons learnt during the evaluation campaign.
    Finally, the Organizers must make publicly available all the evaluation resources
used in the evaluation campaign, including any resources that were not made available
to participants (e.g., sequestered evaluation test data).
     Recommendations for result publication
     + participation:  Results are jointly published by all participants (e.g., as
                       workshop proceedings, journal special issues, etc.).

     Recommendations for resource publication
     + transparency:    All the evaluation resources are made publicly available.
     + sustainability:  Test data are maintained by an association.


 SEALS value-added features

  Making evaluation resources public once the evaluation campaign is over is
  straightforward, since all those resources are already available from the SEALS
  Platform. Furthermore, the SEALS Portal is the best place to upload or link to
  any report resulting from your evaluation campaign.






3. Evaluation campaign agreements
This chapter presents the general terms for participation in the SEALS evaluation
campaigns and the policies for using the resources and results produced in these evaluation
campaigns. They are based on the data policies of the Ontology Alignment
Evaluation Initiative (http://oaei.ontologymatching.org/doc/oaei-deontology.html).
   These terms for participation and policies are only a suggestion. They are being
used in the SEALS evaluation campaigns but can be adapted as seen fit for other
campaigns.


3.1 Terms of participation
By submitting a tool and/or its results to a SEALS evaluation campaign, the participants
grant their permission for the publication of the tool results on the SEALS web
site and for their use for scientific purposes (e.g., as a basis for experiments).
    In return, it is expected that the provenance of these results is correctly and duly
acknowledged.


3.2 Use rights
In order to avoid any inappropriate use of the data provided by the SEALS evaluation
campaigns, we make clear the following rules for the use of these data.
    It is the responsibility of the user of the data to ensure that the authors of the results
are properly acknowledged, unless these data are used in an anonymous, aggregated
way. In the case of participant results, an appropriate acknowledgement is to mention
the participant and to cite a paper from the participants (e.g., the paper
detailing their participation). The specific conditions under which the results have
been produced should not be misrepresented (an explicit link to their source on the
SEALS web site should be made).
    These rules apply to any publication mentioning these results. In addition, the specific
rules below also apply to particular types of use of the data.

Rule applying to the non-public use of the data

Anyone can freely use the evaluations, test data and evaluation results for evaluating
and improving their tools and methods.

Rules applying to evaluation campaign participants

The participants in an evaluation campaign can publish their results as long as they cite
the source of the evaluations and the evaluation campaign in which they were obtained.
    Participants can compare their results with other published results on the SEALS
web site as long as they also:




   • compare with the results of all the participants of the same evaluation scenario;
     and

   • compare with all the test data of this evaluation scenario.

   Of course, participants can mention their participation in the evaluation campaign.

Rules applying to people who did not participate in an evaluation campaign

People who did not participate in an evaluation campaign can publish their results as
long as they cite the sources of the evaluations and the evaluation campaign in which
they were obtained, and they must make clear that they did not participate in the
official evaluation campaign.
    They can compare their results with other published results on the SEALS web
site as long as they:

   • cite the source of the evaluations and the evaluation campaign in which they
     were obtained;

   • compare with the results of all the participants of the same evaluation scenario;
     and

   • compare with all the test data of this evaluation scenario.

   They cannot claim to have executed the evaluation in the same conditions as the
participants. Furthermore, given that evaluation results change over time, it is not
ethical to compare one tool against old results; one should always make comparisons
with the state of the art. If such a comparison is not possible, results can
be compared with older results, but the age of those results and the version of the tool
that produced them must be made clear.

Rules applying to other cases

Anyone can mention the evaluations and evaluation campaigns for the purpose of discussing them.
    Any other use of these evaluations and their results is not authorized (however,
permission may be requested from the contact point), and failing to comply with the
requirements above is considered unethical.






References
[1] R. García-Castro and F. Martín-Recuerda. D3.1 SEALS Methodology for Evaluation
    Campaigns v1. Technical report, SEALS Project, October 2009.

[2] ISO/IEC. ISO/IEC 14598-6: Software product evaluation - Part 6: Documentation
    of evaluation modules. 2001.





More Related Content

Similar to SEALS 2nd Whitepaper

BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­
BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­
BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­
MEYS, MÅ MT in Czech
Ā 
Dynamic Stress Test diffusion model and scoring performance
Dynamic Stress Test diffusion model and scoring performanceDynamic Stress Test diffusion model and scoring performance
Dynamic Stress Test diffusion model and scoring performance
Ziad Fares
Ā 
Ch cie gra - stress-test-diffusion-model-and-scoring-performance
Ch cie   gra - stress-test-diffusion-model-and-scoring-performanceCh cie   gra - stress-test-diffusion-model-and-scoring-performance
Ch cie gra - stress-test-diffusion-model-and-scoring-performance
C Louiza
Ā 
61506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-4
61506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-461506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-4
61506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-4Alexander Hamilton, PhD
Ā 
Utilization focused evaluation: an introduction (Part 1 - ROER4D)
Utilization focused evaluation: an introduction (Part 1 - ROER4D) Utilization focused evaluation: an introduction (Part 1 - ROER4D)
Utilization focused evaluation: an introduction (Part 1 - ROER4D)
SarahG_SS
Ā 
UFE step-by-step: an introduction
UFE step-by-step: an introductionUFE step-by-step: an introduction
UFE step-by-step: an introduction
ROER4D
Ā 
Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...
Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...
Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...
Dr. Mustafa Değerli
Ā 
Measurement-Process-Effectiveness_paper_updated210
Measurement-Process-Effectiveness_paper_updated210Measurement-Process-Effectiveness_paper_updated210
Measurement-Process-Effectiveness_paper_updated210pbaxter
Ā 
Quality improvement paradigm (QIP)
Quality improvement paradigm (QIP)Quality improvement paradigm (QIP)
Quality improvement paradigm (QIP)
Chandan Thakur
Ā 
Multicriteria methodology for kpi identification in outsourced projects
Multicriteria methodology for kpi identification in outsourced projectsMulticriteria methodology for kpi identification in outsourced projects
Multicriteria methodology for kpi identification in outsourced projects
Edilson Giffhorn
Ā 
ISO TC 176
ISO TC 176ISO TC 176
ISO TC 176
kellary1
Ā 
capdev toolkit copy.pdf
capdev toolkit copy.pdfcapdev toolkit copy.pdf
capdev toolkit copy.pdf
jehanne rodrigo
Ā 
Software Quality Assurance Model for Software Excellence with Its Requirements
Software Quality Assurance Model for Software Excellence with Its RequirementsSoftware Quality Assurance Model for Software Excellence with Its Requirements
Software Quality Assurance Model for Software Excellence with Its Requirements
United International Journal for Research & Technology
Ā 
Pmb
PmbPmb
Pmb
soumyarm
Ā 
MONITORING & EVALUATION OF EXTENSION PROGRAMMES
MONITORING & EVALUATION OF EXTENSION PROGRAMMESMONITORING & EVALUATION OF EXTENSION PROGRAMMES
MONITORING & EVALUATION OF EXTENSION PROGRAMMES
Ayush Mishra
Ā 
Outline model Impact Evaluation toolkit
Outline model Impact Evaluation toolkitOutline model Impact Evaluation toolkit
Outline model Impact Evaluation toolkit
NIDA-Net
Ā 
Mb0049 project management
Mb0049 project managementMb0049 project management
Mb0049 project management
consult4solutions
Ā 
Software testing services growth report oct 11
Software testing services growth report oct 11Software testing services growth report oct 11
Software testing services growth report oct 11
Transition Consulting Limited, India
Ā 
Scaling better, together
Scaling better, togetherScaling better, together
Scaling better, together
ILRI
Ā 

Similar to SEALS 2nd Whitepaper (20)

BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­
BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­
BR 4 / PodrobnĆ½ nĆ”kladovĆ½ rĆ”mec hodnocenĆ­
Ā 
Dynamic Stress Test diffusion model and scoring performance
Dynamic Stress Test diffusion model and scoring performanceDynamic Stress Test diffusion model and scoring performance
Dynamic Stress Test diffusion model and scoring performance
Ā 
Ch cie gra - stress-test-diffusion-model-and-scoring-performance
Ch cie   gra - stress-test-diffusion-model-and-scoring-performanceCh cie   gra - stress-test-diffusion-model-and-scoring-performance
Ch cie gra - stress-test-diffusion-model-and-scoring-performance
Ā 
61506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-4
61506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-461506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-4
61506_Capstone_Report_DFID_FINAL_Quantifying_Governance__Indicators-4
Ā 
Utilization focused evaluation: an introduction (Part 1 - ROER4D)
Utilization focused evaluation: an introduction (Part 1 - ROER4D) Utilization focused evaluation: an introduction (Part 1 - ROER4D)
Utilization focused evaluation: an introduction (Part 1 - ROER4D)
Ā 
UFE step-by-step: an introduction
UFE step-by-step: an introductionUFE step-by-step: an introduction
UFE step-by-step: an introduction
Ā 
Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...
Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...
Crafting a CMMI V2 Compliant Process for Governance Practice Area: An Experie...
Ā 
Measurement-Process-Effectiveness_paper_updated210
Measurement-Process-Effectiveness_paper_updated210Measurement-Process-Effectiveness_paper_updated210
Measurement-Process-Effectiveness_paper_updated210
Ā 
Quality improvement paradigm (QIP)
Quality improvement paradigm (QIP)Quality improvement paradigm (QIP)
Quality improvement paradigm (QIP)
Ā 
Multicriteria methodology for kpi identification in outsourced projects
Multicriteria methodology for kpi identification in outsourced projectsMulticriteria methodology for kpi identification in outsourced projects
Multicriteria methodology for kpi identification in outsourced projects
Ā 
csc 510 Project
csc 510 Projectcsc 510 Project
csc 510 Project
Ā 
ISO TC 176
ISO TC 176ISO TC 176
ISO TC 176
Ā 
capdev toolkit copy.pdf
capdev toolkit copy.pdfcapdev toolkit copy.pdf
capdev toolkit copy.pdf
Ā 
Software Quality Assurance Model for Software Excellence with Its Requirements
Software Quality Assurance Model for Software Excellence with Its RequirementsSoftware Quality Assurance Model for Software Excellence with Its Requirements
Software Quality Assurance Model for Software Excellence with Its Requirements
Ā 
Pmb
PmbPmb
Pmb
Ā 
MONITORING & EVALUATION OF EXTENSION PROGRAMMES
MONITORING & EVALUATION OF EXTENSION PROGRAMMESMONITORING & EVALUATION OF EXTENSION PROGRAMMES
MONITORING & EVALUATION OF EXTENSION PROGRAMMES
Ā 
Outline model Impact Evaluation toolkit
Outline model Impact Evaluation toolkitOutline model Impact Evaluation toolkit
Outline model Impact Evaluation toolkit
Ā 
Mb0049 project management
Mb0049 project managementMb0049 project management
Mb0049 project management
Ā 
Software testing services growth report oct 11
Software testing services growth report oct 11Software testing services growth report oct 11
Software testing services growth report oct 11
Ā 
Scaling better, together
Scaling better, togetherScaling better, together
Scaling better, together
Ā 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
Ā 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
Ā 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
Ā 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
Ā 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
Ā 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Ā 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
Ā 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
Ā 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
Ā 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
Ā 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
Ā 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
Ā 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
Ā 
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
UiPathCommunity
Ā 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
Ā 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
Ā 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
Ā 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
Ā 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
Ā 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
Ā 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Ā 
types of semantic technologies covered by the project. In these evaluation campaigns, different tools were evaluated and compared according to a common set of evaluations and test data.

This document describes the second version of the methodology to be followed to implement these evaluation campaigns. The first version of the methodology [1] was defined after analysing previous successful evaluation campaigns and has been improved with the feedback and lessons learnt during the first SEALS Evaluation Campaigns.

This methodology is intended to be general enough to be applicable to evaluation campaigns that cover any type of technology (semantic or not) and that may be defined in the future. For this reason, we have described the methodology independently of the concrete details of the SEALS evaluation campaigns to increase its usability. Nevertheless, we also highlight the benefits of following a "SEALS" approach for evaluation campaigns (e.g., automatic execution of evaluations, evaluation resource storage and availability).

The methodology should not be considered a strict step-by-step process; it should be adapted to each particular case if needed and treated as a set of guidelines to support evaluation campaigns rather than as a normative reference.

This document first describes, in chapter 2, a generic process for carrying out evaluation campaigns, presenting the actors that participate in it as well as the detailed sequence of tasks to be performed. Then, chapter 3 includes the agreements that rule the SEALS evaluation campaigns and that can be reused or adapted for other campaigns.
2. Evaluation campaign process

This chapter presents a process to guide the organization and execution of evaluation campaigns over software technologies. An evaluation campaign is an activity in which several software technologies are compared along one or several evaluation scenarios, i.e., evaluations in which technologies are assessed according to a certain evaluation procedure and using common test data.

A first version of this process appeared in [1] and was followed in the first round of evaluation campaigns performed in the SEALS project. This document updates that version with feedback from the people who organised and participated in those evaluation campaigns.

First, the chapter presents the different actors that participate in the evaluation campaign process; then, a detailed description of the process is given, including the tasks to be performed and the different alternatives and recommendations for carrying them out.

2.1 Actors

The tasks of the evaluation process are carried out by different actors according to the kinds of roles that must be performed in each task. This section presents the different kinds of actors involved in the evaluation campaign process.

• Evaluation Campaign Organizers (Organizers from now on). The Organizers are in charge of the general organization and monitoring of the evaluation campaign and of the organization of the evaluation scenarios that are performed in the evaluation campaign. Depending on the size or complexity of the evaluation campaign, the Organizers group may be divided into smaller groups (e.g., groups that include the people in charge of individual evaluation scenarios).

• Evaluation Campaign Participants (Participants from now on). The Participants are tool providers, or people with the permission of tool providers, who participate with a tool in the evaluation campaign.

SEALS value-added features: In SEALS there is a committee, the Evaluation Campaign Advisory Committee (formerly named the Evaluation Campaign Organizing Committee), in charge of supervising all the evaluation campaigns performed in the project to ensure that they align with the SEALS community goals. This committee is composed of the SEALS Executive Project Management Board, the SEALS work package leaders and other prominent external people. Each evaluation campaign is led by a different group of Evaluation Campaign Organizers (formerly named the Evaluation Campaign Executing Committee) who coordinate with the Evaluation Campaign Advisory Committee.
2.2 Process

This section describes the process to follow for carrying out evaluation campaigns. Since this process must be general enough to accommodate different types of evaluation campaigns, the methodology only suggests a set of general tasks to follow and purposely does not impose any restrictions or specific details.

Some tasks have alternative paths that can be followed. In these cases, the methodology does not impose any alternative but presents all the possibilities, so the people carrying out the task can decide which path to follow.

The description of the tasks is completed with a set of recommendations extracted from the analysis of other evaluation campaigns, as presented in [1]. These recommendations are not compulsory, but they support specific aspects of the evaluation campaign.

Figure 2.1 shows the main phases of the evaluation campaign process.

Figure 2.1: The evaluation campaign process.

The evaluation campaign process is composed of four phases, namely Initiation, Involvement, Preparation and Execution, and Dissemination. The main goals of these phases are the following:

• Initiation phase. It comprises the set of tasks in which the different people involved in the organization of the evaluation campaign are identified and in which the different evaluation scenarios are defined.

• Involvement phase. It comprises the set of tasks in which the evaluation campaign is announced and participants show their interest by registering for the evaluation campaign.

• Preparation and Execution phase. It comprises the set of tasks that must be performed to insert the participating tools into the evaluation infrastructure, to execute each of the evaluation scenarios, and to analyse their results.

• Dissemination phase. It comprises the set of tasks that must be performed to disseminate the evaluation campaign results and to make all the evaluation campaign results and resources available.

These four phases of the evaluation campaign process are described in the following sections, where the tasks that constitute them, the actors that perform these tasks, and their inputs and outputs are defined.
2.2.1 Initiation phase

The Initiation phase comprises the set of tasks where the different people involved in the organization of the evaluation campaign and the evaluation scenarios are identified and where the different evaluation scenarios are defined. These tasks and their interdependencies, shown in figure 2.2, are the following:

1. Identify organizers.
2. Define evaluation scenarios.

Figure 2.2: Initiation phase of the evaluation campaign process.

Identify organizers

  Actors:  E. C. Organizers
  Inputs:  (none)
  Outputs: E. C. Organizers

The goal of this task is to define the group of people who will be in charge of the general organization and monitoring of the evaluation campaign as well as of organizing the evaluation scenarios to be performed in the evaluation campaign and of taking them to a successful end.

Recommendations for the evaluation campaign
  + objectiveness: The evaluation campaign does not favor one participant over another.
  + consensus: The decisions and outcomes in the evaluation campaign are consensual.
  + transparency: The actual state of the evaluation campaign can be known by anyone.
  + participation: The organization overhead of the evaluation campaign is minimal.
Recommendations for the evaluation campaign organization
  + credibility: The evaluation campaign is organized by several organizations.
  + openness: Organizing the evaluation campaign is open to anyone interested. Organizing an evaluation scenario is open to anyone interested.
  + relevance: Community experts are involved in the organization of the evaluation campaign.

SEALS value-added features: Different people in the SEALS community have expertise in organizing evaluation campaigns for different types of semantic technologies. Don't hesitate to ask for advice or collaboration when organizing an evaluation campaign!

Define evaluation scenarios

  Actors:  E. C. Organizers
  Inputs:  E. C. Organizers
  Outputs: Evaluation Scenarios, Evaluation Campaign Schedule

In this task, the Organizers must define the different evaluation scenarios that will take place in the evaluation campaign and the schedule to follow in the rest of the evaluation campaign.

For each of these evaluation scenarios, the Organizers must provide a complete description of the evaluation scenario; we encourage following the conventions proposed by the ISO/IEC 14598 standard on software evaluation [2]. Evaluation scenario descriptions should at least include:

• Evaluation goals, quality characteristics covered, and applicability (i.e., which requirements tools must satisfy to be evaluated).
• Test data (i.e., evaluation inputs), evaluation outputs, metrics, and interpretations.
• Result interpretation (i.e., how to interpret and visualize evaluation results).
• Evaluation procedure and the resources needed in this procedure (e.g., software, hardware, human).

Three different types of test data can be used in an evaluation scenario:

• Development test data are test data that can be used when developing a tool.
• Training test data are test data that can be used for training a tool before performing an evaluation.
• Evaluation test data are the test data that are used for performing an evaluation.
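To make the scenario description elements and test data types above concrete, the following sketch captures a scenario description as a structured record. It is only an illustration under assumed field names and example values; it is not part of the SEALS specification or platform.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EvaluationScenario:
    # Illustrative (hypothetical) structure for an evaluation scenario description.
    name: str
    goals: List[str]           # evaluation goals / quality characteristics covered
    applicability: List[str]   # requirements a tool must satisfy to be evaluated
    test_data: Dict[str, str]  # development / training / evaluation test data sets
    outputs: List[str]         # expected evaluation outputs
    metrics: List[str]         # metrics computed from the outputs
    interpretation: str        # how to interpret and visualize the results
    procedure: List[str]       # ordered evaluation steps
    resources: List[str] = field(default_factory=list)  # software, hardware, human

# A hypothetical ontology-matching scenario, for illustration only.
alignment_scenario = EvaluationScenario(
    name="ontology-alignment-conformance",
    goals=["alignment quality"],
    applicability=["tool accepts two ontologies and returns an alignment"],
    test_data={"development": "dev-set", "training": "train-set", "evaluation": "eval-set"},
    outputs=["one alignment file per test case"],
    metrics=["precision", "recall", "f-measure"],
    interpretation="a higher f-measure indicates better alignment quality",
    procedure=["load ontologies", "run tool", "compare output with reference alignment"],
    resources=["evaluation workflow", "reference alignments"],
)

A record of this kind also makes it easy to check the applicability requirements and to keep the three test data sets clearly separated, in line with the recommendations that follow.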
Recommendations for evaluation scenarios
  + credibility: Evaluation scenarios are organized by several organizations.
  + relevance: Evaluation scenarios are relevant to real-world tasks.
  + consensus: Evaluation scenarios are defined by consensus.
  + objectiveness: Evaluation scenarios can be executed with different test data. Evaluation scenarios do not favor one participant over another.
  + participation: Evaluation scenarios are automated as much as possible.

Recommendations for evaluation descriptions
  + transparency: Evaluation descriptions are publicly available.
  + participation: Evaluation descriptions are documented.

Recommendations for test data
  + objectiveness: Development, training and evaluation test data are disjoint. Test data have the same characteristics. Test data do not favor one tool over another.
  + consensus: Test data are defined by consensus.
  + participation: Test data are defined using a common format. Test data are annotated in a simple way. Test data are documented.
  + transparency: Test data are reusable. Test data are publicly available.
  + relevance: Test data are novel. The quality of test data is higher than that of existing test data.

SEALS value-added features: The SEALS Platform already contains different resources for the evaluation of semantic technologies (i.e., evaluation workflows, services and test data). These resources are publicly available so they can be reused to avoid starting your evaluation from scratch.

2.2.2 Involvement phase

The Involvement phase comprises the set of tasks in which the evaluation campaign is announced and participants show their interest in participating by registering for the evaluation campaign.
These tasks and their interdependencies, shown in figure 2.3, are the following:

1. Announce evaluation campaign.
2. Provide registration mechanisms.
3. Participant registration.

Figure 2.3: Involvement phase of the evaluation campaign process.

Announce evaluation campaign

  Actors:  E. C. Organizers
  Inputs:  Evaluation Scenarios, Evaluation Campaign Schedule
  Outputs: Evaluation Campaign Announcement

Once the different evaluation scenarios are defined, the Organizers must announce the evaluation campaign using any mechanism available (e.g., mailing lists, blogs, etc.) with the goal of reaching the developers of the tools that are targeted by the evaluation scenarios.

Recommendations for evaluation campaign announcement
  + participation: The evaluation campaign is announced internationally. Tool developers are directly contacted.
Recommendations for evaluation campaign participation
  + openness: Participation in the evaluation campaign is open to any organization.
  + participation: The effort needed for participating in the evaluation campaign is minimal. People can participate in the evaluation campaign regardless of their location. Participation in the evaluation campaign does not require travelling to any particular location. The requirements for tool participation are minimal. Participants have time to participate in several evaluation scenarios.
  + relevance: Community experts participate in the evaluation campaign. The tools participating in the evaluation campaign include the most relevant tools.

SEALS value-added features: The SEALS community dissemination services (e.g., SEALS Portal, mailing lists, blog, etc.) can support spreading your evaluation campaign announcements to a whole community of users and providers interested in semantic technology evaluation.

Provide registration mechanisms

  Actors:  E. C. Organizers
  Inputs:  Evaluation Scenarios
  Outputs: Registration Mechanisms

In this task, the Organizers must provide the mechanisms needed to allow potential participants to register and to provide detailed information about themselves and their tools.

Recommendations for registration mechanisms
  + participation: Participants can register in the evaluation campaign regardless of their location.

SEALS value-added features: If you don't want to implement your own registration mechanisms, the SEALS Portal can take care of user registration for your evaluation campaign.
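Whatever mechanism is used, registration essentially collects a small structured record per participant and tool. The example below is a hypothetical sketch of such a record; the fields are assumptions made for illustration and do not reflect the SEALS Portal registration form.

from dataclasses import dataclass
from typing import List

@dataclass
class Registration:
    # Hypothetical registration record collected from each participant.
    participant_name: str
    organization: str
    contact_email: str
    tool_name: str
    tool_version: str
    scenarios: List[str]  # evaluation scenarios the participant wants to enter

# Example entry (illustrative only).
entry = Registration(
    participant_name="Jane Doe",
    organization="Example University",
    contact_email="jane.doe@example.org",
    tool_name="MyMatcher",
    tool_version="1.2",
    scenarios=["ontology-alignment-conformance"],
)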
Participant registration

  Actors:  E. C. Participants
  Inputs:  Evaluation Campaign Announcement, Registration Mechanisms
  Outputs: Participant List

In this task, any individual or organization willing to participate in any of the evaluation scenarios of the evaluation campaign must register to indicate their willingness to do so.

SEALS value-added features: If you are using the SEALS Platform for executing your evaluation campaign, participant registration through the SEALS Portal can be coupled with different evaluation services (e.g., uploading the participant tool into the platform).

2.2.3 Preparation and execution phase

The Preparation and execution phase comprises the set of tasks that must be performed to insert the participating tools into the evaluation infrastructure, to execute each of the evaluation scenarios, and to analyse the evaluation results. The tasks that compose this phase can be performed either independently for each evaluation scenario or covering all the evaluation scenarios in each task. These tasks and their interdependencies, shown in figure 2.4, are the following:

1. Provide evaluation materials.
2. Insert tools.
3. Perform evaluation.
4. Analyse results.

Figure 2.4: Preparation and execution phase of the evaluation campaign process.
Provide evaluation materials

  Actors:  E. C. Organizers
  Inputs:  Evaluation Scenarios
  Outputs: Evaluation Materials

In this task, the Organizers must provide the registered participants with all the evaluation materials needed in the evaluation, including:

• Instructions on how to participate.
• Evaluation description.
• Evaluation test data.
• Evaluation infrastructure.
• Any software needed for the evaluation.

Alternatives:

⊗ Evaluation test data availability
  a. Evaluation test data are available to the participants in this task.
  b. Evaluation test data are sequestered and are not available to the participants.

⊗ Reference tool
  a. Participants are provided with a reference tool and with the results for this tool. This tool could be made of modules so participants don't have to implement a whole tool to participate.

Recommendations for evaluation material provision
  + objectiveness: All the participants are provided with the same evaluation materials.
  + participation: Participants are provided with all the necessary documentation.

Recommendations for evaluation software
  + openness: The software used in the evaluation is open source.
  + transparency: The software used in the evaluation is publicly available.
  + participation: The software used in the evaluation is robust and efficient.
SEALS value-added features: Using the SEALS Platform, participants can access all the evaluation materials stored in the platform repositories, either manually through the SEALS Portal or automatically using the SEALS Platform services.

Insert tools

  Actors:  E. C. Participants
  Inputs:  Evaluation Materials
  Outputs: Tools Inserted

Once the Participants have all the evaluation materials, they must insert their tools into the evaluation infrastructure and ensure that these tools are ready for the evaluation execution.

Recommendations for tool insertion
  + participation: Participants can use test data to prepare their tools. Participants can check that their tool provides the required outputs. Participants can test the insertion of their tools. The insertion of tools is simple.

SEALS value-added features: Similarly to participant registration, tool insertion can be managed through the SEALS Portal. Furthermore, the SEALS Platform can check that tools are correctly inserted so they are ready for execution.
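In practice, inserting a tool amounts to exposing it behind a small, uniform interface that the evaluation infrastructure can invoke in the same way for every participant. The sketch below illustrates this idea with a hypothetical wrapper; the interface and method names are assumptions made for illustration and do not correspond to the actual SEALS Platform API.

from abc import ABC, abstractmethod

class ToolWrapper(ABC):
    # Hypothetical wrapper interface a participant might implement so the
    # evaluation infrastructure can drive every tool in the same way.

    @abstractmethod
    def initialize(self, config: dict) -> None:
        """Prepare the tool (load models, resources, configuration, etc.)."""

    @abstractmethod
    def run(self, test_case_input: str) -> str:
        """Run the tool on one test case and return its raw output."""

class MyAlignmentTool(ToolWrapper):
    # Toy participant tool used only to illustrate the wrapping idea.

    def initialize(self, config: dict) -> None:
        self.threshold = config.get("threshold", 0.5)

    def run(self, test_case_input: str) -> str:
        # A real tool would compute an alignment here; we only echo the input.
        return f"alignment for {test_case_input} (threshold={self.threshold})"

Keeping the wrapper this small is one way to satisfy the recommendation that the insertion of tools is simple and that participants can test it before the evaluation is executed.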
Perform evaluation

  Actors:  E. C. Participants / E. C. Organizers
  Inputs:  Tools Inserted
  Outputs: Evaluation Results

In this task, the evaluation is executed over all the participating tools and the evaluation results of all the tools are collected.

Alternatives:

⊗ Evaluation execution
  a. The execution of the evaluation is performed by the Organizers.
  b. The execution of the evaluation is performed by the Participants.
  c. The execution of the evaluation is performed by the Organizers and the Participants.

⊗ Time for executing evaluations
  a. Participants have a limited period of time to return their evaluation results.

⊗ Use of auxiliary resources
  a. Participants cannot use any auxiliary resource in the evaluation (development and training test data, external resources, etc.).

Recommendations for evaluation execution
  + participation: Evaluation execution requires few resources. Evaluation execution can be performed regardless of location or time.
  + objectiveness: All the tools are evaluated following the same evaluation description. All the tools are evaluated using the same test data.
  + credibility: The results of the evaluation execution are validated.

SEALS value-added features: If all the resources needed in an evaluation (i.e., evaluation workflow, test data and tools) are stored inside the SEALS Platform, the platform can automatically execute your evaluation and store all the produced results.
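As a minimal sketch of what automated execution could look like under the objectiveness recommendations above (same evaluation description and same test data for every tool), the hypothetical function below runs each inserted tool over a shared set of evaluation test cases and collects the raw outputs. It reuses the illustrative ToolWrapper sketched earlier and is not the SEALS Platform workflow engine.

def perform_evaluation(tools: dict, evaluation_test_data: list) -> dict:
    # Run every participating tool on the same evaluation test data and
    # collect its raw outputs (illustrative sketch only).
    results = {}
    for tool_name, tool in tools.items():
        tool.initialize({"threshold": 0.5})
        results[tool_name] = {
            case_id: tool.run(case_input)
            for case_id, case_input in evaluation_test_data
        }
    return results

# Example usage with the toy tool defined above:
# results = perform_evaluation(
#     {"MyAlignmentTool": MyAlignmentTool()},
#     [("case-1", "ontology-pair-1"), ("case-2", "ontology-pair-2")],
# )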
Analyse results

  Actors:  E. C. Participants / E. C. Organizers
  Inputs:  Evaluation Results
  Outputs: Result Analysis

Once the evaluation results of all the tools are collected, they are analysed both individually for each tool and globally across all the tools.

This results analysis must be reviewed in order to reach agreed conclusions. Therefore, if the results are analysed by the Organizers, the analysis must be reviewed by the Participants; conversely, if the results are analysed by the Participants, they must be reviewed by the Organizers.

Alternatives:

⊗ Analysis of evaluation results
  a. The analysis of the evaluation results is performed by the Organizers.
  b. The analysis of the evaluation results is performed by the Participants.
  c. The analysis of the evaluation results is performed by the Organizers and the Participants.

⊗ Anonymised results
  a. The results of the evaluation campaign are anonymised so that the name of the tool that produced them is not known.

Recommendations for evaluation results
  + objectiveness: Evaluation results are analysed identically for all the tools. Participants can comment on the evaluation results of their tools before making them public. There is enough time for analysing the evaluation results.
  + transparency: Evaluation results are publicly available at the end of the evaluation campaign. Evaluation results can be replicated by anyone.
  + sustainability: Evaluation results are compiled, documented, and expressed in a common format.

SEALS value-added features: Evaluation results stored in the SEALS Platform can be accessed either through the SEALS Portal by means of visualization services or automatically through the platform services, so they can be used in your own calculations. Besides, you can easily combine the results of different evaluations to suit your goals.
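The sketch below illustrates one way the collected results could be analysed per tool and globally, including the optional anonymisation alternative; the metrics and the anonymisation scheme are illustrative assumptions, not the analysis actually used in the SEALS campaigns.

import statistics

def analyse_results(scores: dict, anonymise: bool = False) -> dict:
    # Per-tool and global analysis of evaluation scores (illustrative sketch).
    if anonymise:
        # Replace tool names with neutral identifiers before publication.
        scores = {f"tool-{i + 1}": vals for i, vals in enumerate(scores.values())}

    per_tool = {
        name: {"mean": statistics.mean(vals), "min": min(vals), "max": max(vals)}
        for name, vals in scores.items()
    }
    global_view = {
        "best_mean": max(per_tool, key=lambda name: per_tool[name]["mean"]),
        "overall_mean": statistics.mean(v["mean"] for v in per_tool.values()),
    }
    return {"per_tool": per_tool, "global": global_view}

# Example: f-measure per test case for two hypothetical tools.
report = analyse_results({"ToolA": [0.71, 0.64], "ToolB": [0.58, 0.66]}, anonymise=True)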
2.2.4 Dissemination phase

The Dissemination phase comprises the set of tasks that must be performed to disseminate the evaluation campaign results and to make all the evaluation campaign results and resources available. The tasks that compose this phase can be performed either independently for each evaluation scenario or covering all the evaluation scenarios in each task. These tasks and their interdependencies, shown in figure 2.5, are the following:

1. Disseminate results.
2. Publish outcomes.

Figure 2.5: Dissemination phase of the evaluation campaign process.

Disseminate results

  Actors:  E. C. Organizers, E. C. Participants
  Inputs:  Result Analysis
  Outputs: Feedback

In this task, the Organizers and the Participants must disseminate the results of the evaluation campaign. The preferred way of dissemination is one that maximizes discussion in order to obtain feedback about the evaluation campaign (e.g., a face-to-face meeting with participants or a workshop).

Recommendations for result presentation
  + consensus: Evaluation results are presented and discussed in a meeting.
  + transparency: Participants write the description of their tools and their results.
  + objectiveness: The presentation of results is identical for all the tools. The presentation of results includes the reasons for obtaining these results.

Recommendations for the dissemination meeting
  + openness: The dissemination meeting is public.
  + relevance: The dissemination meeting is collocated with a relevant event.
  + participation: Participants present the details and results of their tools.
  + credibility: The difficulties faced during the evaluation campaign are presented.
  + consensus: Feedback about the evaluation campaign is obtained.
  + sustainability: The dissemination meeting includes a discussion of the next steps to follow. A survey is conducted among organizers, participants and attendees.
SEALS value-added features: The SEALS community dissemination services (e.g., SEALS Portal, mailing lists, blog, etc.) can support disseminating your evaluation campaign results and attracting people to meetings where they can be discussed.

Publish outcomes

  Actors:  E. C. Organizers, E. C. Participants
  Inputs:  Feedback
  Outputs: Results and Resources Published

In this task, the Organizers and the Participants are encouraged to document the results of the evaluation campaign and of each of the tools, including not only the results obtained but also improvement recommendations based on the feedback obtained and the lessons learnt during the evaluation campaign.

Finally, the Organizers must make publicly available all the evaluation resources used in the evaluation campaign, including any resources that were not made available to participants (e.g., sequestered evaluation test data).

Recommendations for result publication
  + participation: Results are jointly published by all participants (e.g., as workshop proceedings, journal special issues, etc.).

Recommendations for resource publication
  + transparency: All the evaluation resources are made publicly available.
  + sustainability: Test data are maintained by an association.

SEALS value-added features: Making evaluation resources public once the evaluation campaign is over is straightforward, since all those resources are already available from the SEALS Platform. Furthermore, the SEALS Portal is the best place to upload or link to any report resulting from your evaluation campaign.
3. Evaluation campaign agreements

This chapter presents the general terms for participation in the SEALS evaluation campaigns and the policies for using the resources and results produced in these evaluation campaigns. They are based on the data policies of the Ontology Alignment Evaluation Initiative (http://oaei.ontologymatching.org/doc/oaei-deontology.html).

These terms of participation and policies are only a suggestion. They are being used in the SEALS evaluation campaigns but can be adapted as seen fit for other campaigns.

3.1 Terms of participation

By submitting a tool and/or its results to a SEALS evaluation campaign, the participants grant their permission for the publication of the tool results on the SEALS web site and for their use for scientific purposes (e.g., as a basis for experiments). In return, it is expected that the provenance of these results is correctly and duly acknowledged.

3.2 Use rights

In order to avoid any inappropriate use of the data provided by the SEALS evaluation campaigns, we make clear the following rules for the use of these data.

It is the responsibility of the user of the data to ensure that the authors of the results are properly acknowledged, unless these data are used in an anonymous, aggregated way. In the case of participant results, an appropriate acknowledgement is a mention of the participant and a citation of one of the participant's papers (e.g., the paper detailing their participation). The specific conditions under which the results have been produced should not be misrepresented (an explicit link to their source on the SEALS web site should be made).

These rules apply to any publication mentioning these results. In addition, the specific rules below apply to particular types of use of the data.

Rule applying to the non-public use of the data

Anyone can freely use the evaluations, test data and evaluation results for evaluating and improving their tools and methods.

Rules applying to evaluation campaign participants

The participants of an evaluation campaign can publish their results as long as they cite the source of the evaluations and the evaluation campaign in which they were obtained. Participants can compare their results with other results published on the SEALS web site as long as they also:
• compare with the results of all the participants of the same evaluation scenario; and
• compare with all the test data of this evaluation scenario.

Of course, participants can mention their participation in the evaluation campaign.

Rules applying to people who did not participate in an evaluation campaign

People who did not participate in an evaluation campaign can publish their results as long as they cite the source of the evaluations and the evaluation campaign in which they were obtained, and they must make clear that they did not participate in the official evaluation campaign. They can compare their results with other results published on the SEALS web site as long as they:

• cite the source of the evaluations and the evaluation campaign in which they were obtained;
• compare with the results of all the participants of the same evaluation scenario; and
• compare with all the test data of this evaluation scenario.

They cannot claim to have executed the evaluation under the same conditions as the participants. Furthermore, given that evaluation results change over time, it is not ethical to compare one tool against old results; one should always make comparisons with the state of the art. If such a comparison is not possible, results can be compared with older results, but the age of those results and the version of the tool that produced them must be made clear.

Rules applying to other cases

Anyone can mention the evaluations and evaluation campaigns in order to discuss them. Any other use of these evaluations and their results is not authorized (although permission can be requested from the contact point), and failing to comply with the requirements above is considered unethical.
References

[1] R. García-Castro and F. Martín-Recuerda. D3.1 SEALS Methodology for Evaluation Campaigns v1. Technical report, SEALS Project, October 2009.

[2] ISO/IEC. ISO/IEC 14598-6: Software product evaluation - Part 6: Documentation of evaluation modules. 2001.