
OpenAIRE Text notes of the Tutorial on Automatic Inference Of Links



  1. Once your repository platform has been made OpenAIRE compliant, researchers from your institution will be able to deposit their publications by providing the relevant files and bibliographic metadata, including license information and the list of EC projects that funded the publications. Integrating your repository with the OpenAIRE infrastructure is an important step towards helping your researchers comply with the EC Open Access mandate. However, while this is a clear benefit for future deposits, what happens to all the publications deposited in the past, whose metadata did not include EC project information? You can approach the problem in two ways. The so-called manual approach consists of asking your researchers to revise and complete all past depositions through the newly provided user interfaces. Since this may be a tedious job, the OpenAIRE infrastructure also offers an automatic inference approach, in which special services infer from the PDF files of the publications the list of EC projects that have likely funded them. To this end, repository managers must make the PDF files of the publications available to the OpenAIRE infrastructure. This can be done through standard protocols, such as FTP, to be agreed with the OpenAIRE technical team. Most importantly, the name of each PDF file must include the OAI-PMH identifier provided with the corresponding metadata record. This implicit link makes it possible to complete the metadata with the EC project information extracted by OpenAIRE. The inference process returns to repository managers the list of file names for which at least one EC project could be inferred, each followed by the corresponding list of grant agreement numbers. The list can be provided in several formats, including plain-text (txt) or Excel files, to be agreed with the OpenAIRE technical team.
Repository managers must write scripts that process this list and add the missing associations between publications and EC projects to the local database. At this stage, repository managers may ask researchers to confirm the result of the inference process, which gives a simplified and faster version of the manual approach. The automatic inference service requires considerable CPU time to parse large sets of PDF files and identify references to EC project grant agreement numbers. To this end, OpenAIRE exploits the grid computing power provided by the D4Science infrastructure, which is in turn powered by the gCube software system. For further information, please visit the highlighted URLs.
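
The script mentioned above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not an OpenAIRE-provided tool: the actual list format and file-naming scheme are to be agreed with the OpenAIRE technical team. Here the list is assumed to be a tab-separated text file with the PDF file name in the first column and one grant agreement number in each following column, and each PDF is assumed to be named after its percent-encoded OAI-PMH identifier. The function names, the example identifiers and the grant numbers are all hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath
from urllib.parse import quote, unquote


def pdf_name_for(oai_identifier: str) -> str:
    """Build a file-system-safe PDF name that embeds the OAI-PMH identifier.

    Assumed convention: percent-encode the whole identifier, so that
    characters such as ':' and '/' cannot be misread as path separators.
    """
    return quote(oai_identifier, safe="") + ".pdf"


def identifier_from(pdf_name: str) -> str:
    """Recover the OAI-PMH identifier from a name built by pdf_name_for."""
    return unquote(PurePosixPath(pdf_name).stem)


def parse_inference_list(text: str) -> dict[str, list[str]]:
    """Map each OAI-PMH identifier to its inferred grant agreement numbers.

    Assumed format: one publication per line, tab-separated, with the PDF
    file name first and the grant agreement numbers after it.
    """
    links: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank lines and lines without any grant number
        pdf_name, grants = row[0], row[1:]
        links.setdefault(identifier_from(pdf_name), []).extend(
            g.strip() for g in grants if g.strip()
        )
    return links


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = (
        "oai%3Arepo.example.org%3A1234.pdf\t123456\t654321\n"
        "oai%3Arepo.example.org%3A5678.pdf\t246810\n"
    )
    for identifier, grants in parse_inference_list(sample).items():
        print(identifier, grants)
```

The resulting mapping from identifiers to grant agreement numbers can then be written to the local database, or shown to researchers for confirmation before it is stored.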