twitter.com/openminted_eu
(penny@ilsp.gr)
OpenMinTeD sets out to create an open, service-oriented
e-Infrastructure for TDM of scientific and scholarly
content. Researchers can collaboratively create, discover,
share and re-use knowledge from a wide range of text-based
scientific sources in a seamless way.
"Achieving interoperability between resources involved in TDM
at the level of metadata"
• Text and Data Mining: “the discovery by computer of new,
previously unknown information, by automatically extracting
and relating information from different (…) resources to reveal
otherwise hidden meanings” [Hearst 1999]
• Interoperability: Relating to systems, especially of computers
or telecommunications, that are capable of working together
without being specially configured to do so. [American Heritage®
Dictionary of the English Language, Fifth Edition. (2011)]
• Language Resource: It encompasses both data sets (textual,
multimodal/multimedia and lexical data, grammars, language
models etc.) and tools/technologies/services used for their
processing [WG1 - Wiki glossary]
• Metadata: contains descriptive, contextual and provenance
assertions about the properties of a Digital Object [RDA - DFT Core
terms]
• content to be mined, i.e. in OpenMinTeD, scientific & scholarly publications (assembled
into "corpora")
• ancillary/reference resources, e.g. typesystems, linguistic tagsets,
terminological lexica, ontologies, machine learning models, reference
corpora, training corpora etc.
• "components" in the form of
• downloadable and locally executable tools
• web services
• workflows composed of the above
• registry service: to register and, later on, search and find content and
s/w components that can process this content - targeting end users
including TDM experts
• workflow service: to search and find s/w components & ancillary
resources that are (or can be made) compatible (hence,
interoperable) in order to compose workflows - targeting TDM service
developers

• document the properties that users employ in their queries to discover
resources, but also
• document the properties that will support the automatic discovery of
compatibility (a) between s/w components and (b) between content and
s/w components (i.e. find interoperable resources)
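Discovering compatibility between s/w components boils down to comparing what one component produces with what the next one consumes. The sketch below is a minimal illustration in Python, assuming a simplified profile with only language, media type, typesystem and tagset; `ResourceProfile` and `can_chain` are hypothetical names, not part of OMTD-SHARE. The tagger values are taken from the sample component record in these slides; the downstream "parser" is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical summary of what a component consumes or produces."""
    language: str                     # e.g. "el"
    mime_type: str                    # e.g. "text/xml"
    typesystem: Optional[str] = None  # None on the input side means "any"
    tagset: Optional[str] = None


def can_chain(producer_output: ResourceProfile, consumer_input: ResourceProfile) -> List[str]:
    """Return the properties that prevent chaining; an empty list means compatible."""
    problems = []
    # Language and format must always agree
    if producer_output.language != consumer_input.language:
        problems.append("language")
    if producer_output.mime_type != consumer_input.mime_type:
        problems.append("mime_type")
    # Annotation-related properties are checked only if the consumer requires them
    for prop in ("typesystem", "tagset"):
        required = getattr(consumer_input, prop)
        if required is not None and getattr(producer_output, prop) != required:
            problems.append(prop)
    return problems


tagger_output = ResourceProfile("el", "text/xml", "ILSP-typesystem", "ILSP-POStagset")
parser_input = ResourceProfile("el", "text/xml", "ILSP-typesystem")  # hypothetical next step
print(can_chain(tagger_output, parser_input))  # []
```

Returning the list of mismatches, rather than a plain boolean, lets a workflow service explain why two components cannot be combined.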
[Figure: resources flowing into OMTD: publications, ancillary content resources and s/w components. Sources shown: OpenAIRE, CORE, publishers, etc. (publications); MetaShare, discipline portals, Maven, Docker, component collections (ancillary resources and components). Resource types shown: typesystems, tagsets, ML models, lexica, ontologies, corpora]
Goal: achieve interoperability
• per language resource type
• across language resource types
Problems
• various metadata schemas
• various communities
→ semantics!!
→ crosswalks/mappings/semantic links
• Need to define a common core vocabulary for the description of
the resource properties, e.g.
• language of a publication/corpus & language of the input of a s/w
component
• domain/subject of a publication/corpus & domain/subject of an
ontology that can be used to annotate it
but
• how can we select the "common denominator" from all the
schemas?
• gaps in original metadata records for information deemed important for TDM
• wealth of detail in original records vs. loss of information when mapping
• mismatches between metadata elements/values
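The "common denominator" question can be made concrete once each source schema has a crosswalk to core concepts. The following Python sketch uses invented crosswalks (the schema names and most element names are hypothetical) and computes the concepts that every schema covers plus each schema's gaps; even in this toy case the intersection is small, which is the trade-off the slide raises.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical crosswalks: source-schema element -> common core concept
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "schema_a": {"dc:title": "title", "dc:language": "language", "dc:subject": "subject"},
    "schema_b": {"resourceName": "title", "languageId": "language", "licence": "licence"},
    "schema_c": {"name": "title", "inLanguage": "language", "mediaType": "format"},
}


def common_denominator(crosswalks: Dict[str, Dict[str, str]]) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Return the concepts every schema covers, plus the concepts each schema lacks."""
    if not crosswalks:
        return set(), {}
    covered = {schema: set(mapping.values()) for schema, mapping in crosswalks.items()}
    common = set.intersection(*covered.values())
    everything = set.union(*covered.values())
    gaps = {schema: sorted(everything - concepts) for schema, concepts in covered.items()}
    return common, gaps


common, gaps = common_denominator(CROSSWALKS)
print(sorted(common))    # ['language', 'title']
print(gaps["schema_a"])  # ['format', 'licence']
```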
• organize the schema elements and accommodate common vs. particular features of resources
• be flexible enough to support varying degrees of documentation completeness
• cover documentation needs of all resource types involved in TDM
• cover needs of resource discoverability and TDM processing
• reuse what is available vs. create and recommend new elements and values
• document processing procedure and outputs
• standardize/normalize user input vs. allow for free user input
• OMTD Deliverable D5.2 - Interoperability Requirements Specification
[soon to be made publicly available]
• scenarios & use cases targeted by OMTD in the areas of scholarly
communication, life sciences, agriculture & biodiversity, and social
sciences
• overview of relevant metadata schemas (e.g. OpenAIRE, CORE,
RIOXX guidelines, CrossRef, MetaShare, DataCite, DCAT, CMDI
relevant metadata profiles etc.) – cf. OMTD Deliverable D5.1 -
Interoperability Landscaping Report
<corpusMetadataRecord>
  <metadataHeaderInfo>
    <metadataRecordIdentifier metadataIdentifierSchemeName="hdl">PIDtest</metadataRecordIdentifier>
    <metadataCreationDate>2016-07-29</metadataCreationDate>
    <collectedFrom>
      <repositoryName lang="en-us">OpenAIRE</repositoryName>
      <repositoryName lang="en-us">CORE</repositoryName>
    </collectedFrom>
  </metadataHeaderInfo>
  <corpusInfo>
    <resourceType>corpus</resourceType>
    <identificationInfo>
      <resourceName>Corpus of English articles in biomedicine from OpenAIRE and CORE</resourceName>
      <description lang="en-us">A corpus created automatically by the corpus building process in OpenMinTeD, consisting of 17987 articles related to biomedicine</description>
      <identifier resourceIdentifierSchemeName="hdl">temporary identifier</identifier>
    </identificationInfo>
    <contactEmail>user@omtd.com</contactEmail>
    <resourceCreationInfo>
      <resourceCreator>Smith, John</resourceCreator>
    </resourceCreationInfo>
    <datasetDistributionInfo>
      <rightsInfo>
        <licence>CC-BY-NC</licence>
        <version>4.0</version>
      </rightsInfo>
    </datasetDistributionInfo>
    <corpusSubtype>rawCorpus</corpusSubtype>
    <languageTag>en</languageTag>
    <domain classificationSchemeName="PAROLE_topicClassification">science</domain>
    <textFormats>
      <mimeType>text/plain</mimeType>
      <mimeType>application/pdf</mimeType>
    </textFormats>
    <characterEncoding>UTF-8</characterEncoding>
    <sizeInfo>
      <size>17987</size>
      <sizeUnit>articles</sizeUnit>
    </sizeInfo>
  </corpusInfo>
</corpusMetadataRecord>
<componentMetadataRecord>
  <metadataHeaderInfo>…</metadataHeaderInfo>
  <componentInfo>
    <resourceType>component</resourceType>
    <identificationInfo>
      <resourceName lang="en-us">ILSP Feature-based multi-tiered POS Tagger</resourceName>
      <description lang="en-us">FBT part-of-speech tagger for Greek texts.</description>
      <resourceShortName lang="en-us">ilsp_fbt</resourceShortName>
    </identificationInfo>
    <contactEmail>test@omtd.com</contactEmail>
    <version>v1.0.0</version>
    <resourceCreator>ILSP team</resourceCreator>
    <componentType>morphologicalTagger</componentType>
    <componentDistributionMedium>webService</componentDistributionMedium>
    <accessURL>http://access.com</accessURL>
    <webServiceType>SOAP</webServiceType>
    <rightsInfo>
      <licence>nonStandardLicenceTerms</licence>
      <nonStandardLicenceName lang="en-us">terms of service</nonStandardLicenceName>
      <nonStandaradLicenceTermsURL>http://example.com</nonStandaradLicenceTermsURL>
    </rightsInfo>
    <inputContentResourceInfo>
      <resourceType>corpus</resourceType>
      <resourceType>document</resourceType>
      <languageTag>el</languageTag>
      <characterEncoding>UTF-8</characterEncoding>
      <mimeType>text/plain</mimeType>
    </inputContentResourceInfo>
    <outputResourceInfo>
      <resourceType>corpus</resourceType>
      <resourceType>document</resourceType>
      <languageTag>el</languageTag>
      <characterEncoding>UTF-8</characterEncoding>
      <mimeType>text/xml</mimeType>
      <dataFormatSpecific>xces; format-variant=ilsp</dataFormatSpecific>
      <typesystem>ILSP-typesystem</typesystem>
      <tagset>ILSP-POStagset</tagset>
      <annotationLevel>morphosyntacticAnnotation-bPosTagging</annotationLevel>
    </outputResourceInfo>
    <componentDependencies>
      <typesystem>ILSP-typesystem</typesystem>
      <tagset>ILSP-POStagset</tagset>
    </componentDependencies>
  </componentInfo>
</componentMetadataRecord>
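Taken together, the two sample records show how metadata alone can tell a platform whether a corpus can be fed to a component. A minimal Python sketch using the standard-library `xml.etree.ElementTree`, on trimmed copies of the two records (only the compared fields are kept); `values` and `check_input_match` are hypothetical helpers, and treating an unspecified property as a mismatch is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Trimmed copies of the two sample records (only the fields compared here)
CORPUS_XML = """<corpusMetadataRecord><corpusInfo>
  <languageTag>en</languageTag>
  <textFormats><mimeType>text/plain</mimeType><mimeType>application/pdf</mimeType></textFormats>
  <characterEncoding>UTF-8</characterEncoding>
</corpusInfo></corpusMetadataRecord>"""

COMPONENT_XML = """<componentMetadataRecord><componentInfo>
  <inputContentResourceInfo>
    <languageTag>el</languageTag>
    <characterEncoding>UTF-8</characterEncoding>
    <mimeType>text/plain</mimeType>
  </inputContentResourceInfo>
</componentInfo></componentMetadataRecord>"""


def values(root: ET.Element, path: str) -> Set[str]:
    """Collect the stripped text of every element matching an ElementTree path."""
    return {el.text.strip() for el in root.findall(path) if el.text}


def check_input_match(corpus_xml: str, component_xml: str) -> Dict[str, bool]:
    corpus = ET.fromstring(corpus_xml)
    component = ET.fromstring(component_xml)
    wanted = "./componentInfo/inputContentResourceInfo/"
    offered = {
        "language": values(corpus, "./corpusInfo/languageTag"),
        "mimeType": values(corpus, "./corpusInfo/textFormats/mimeType"),
        "encoding": values(corpus, "./corpusInfo/characterEncoding"),
    }
    accepted = {
        "language": values(component, wanted + "languageTag"),
        "mimeType": values(component, wanted + "mimeType"),
        "encoding": values(component, wanted + "characterEncoding"),
    }
    # A property matches when the corpus offers at least one accepted value;
    # in this sketch an unspecified property counts as a mismatch
    return {prop: bool(offered[prop] & accepted[prop]) for prop in offered}


print(check_input_match(CORPUS_XML, COMPONENT_XML))
# {'language': False, 'mimeType': True, 'encoding': True}
```

With these samples the English corpus is rejected by the Greek tagger on language alone, even though format and encoding agree.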
• obligatory: balance what is necessary for the intended purposes against
ease of documentation,
• e.g. language for scholarly publications; but what about title and author? format and subject
of a document?
• recommended: features that can help the user or future uses or that
users find useful but providers have not yet standardized,
• e.g. documentation / help files, attribution, citation papers
• optional: all remaining information related to the lifecycle of a
resource
• e.g. funding information (still, funding agencies are increasingly
interested in it!), projects in which the resources have been created or used, and
related outputs
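The three levels of optionality translate naturally into validation severities. A minimal Python sketch, assuming a flat dictionary with hypothetical field names (the real schema nests these inside element groups): missing obligatory fields are errors, missing recommended ones are warnings, and optional fields are never checked.

```python
from typing import Dict, List

# Hypothetical flat field names; the real schema nests these inside element groups
OBLIGATORY = {"resourceName", "languageTag", "licence", "contact"}
RECOMMENDED = {"documentation", "attribution", "citation"}


def check_completeness(record: Dict[str, object]) -> Dict[str, List[str]]:
    """Missing obligatory fields are errors; missing recommended ones are warnings."""
    # Empty strings, empty lists and None count as "not documented"
    present = {key for key, value in record.items() if value not in (None, "", [])}
    return {
        "errors": sorted(OBLIGATORY - present),
        "warnings": sorted(RECOMMENDED - present),
    }


report = check_completeness({
    "resourceName": "Corpus of English articles in biomedicine from OpenAIRE and CORE",
    "languageTag": "en",
    "licence": "CC-BY-NC",
    "contact": "",
    "citation": "hypothetical citation text",
})
print(report)  # {'errors': ['contact'], 'warnings': ['attribution', 'documentation']}
```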
• organize the schema into semantically coherent groups of elements
• common to all types of resources (e.g. identification, licensing etc.)
• per resource type
• re-usable for more than one resource type but not globally applicable (e.g. subject
classification) and
• strictly applied to specific resource types (e.g. evaluation for s/w components)
• identification & provenance of the metadata record
• metadata record identifier
• metadata creation date
• identification of the resource
• identifiers with identificationScheme (name/URI)
• title & description (multilingual; English should be present, but this is still an open question)
• distribution & licensing/access
• distribution medium/format (e.g. executable code, downloadable text etc.)
• licence and/or rightsStatement (ongoing work)
• licence text or URL (provided by system for standard licences)
• contact information
• either email or landing page
• resource type (& subtype)
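The elements common to all resource types can be pictured as one shared record type. The Python dataclass below is a hypothetical model, not the OMTD-SHARE XSD: field names are invented, and `"downloadable"` as a corpus distribution medium is an assumed value. It enforces two rules stated on the slide: contact information is either an email or a landing page, and the metadata creation date must be a valid date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Identifier:
    value: str
    scheme_name: str                  # e.g. "hdl", "DOI"
    scheme_uri: Optional[str] = None


@dataclass
class CommonCore:
    """Hypothetical model of the elements common to all resource types."""
    record_identifier: Identifier             # identification of the metadata record
    metadata_creation_date: str               # ISO 8601, e.g. "2016-07-29"
    resource_identifiers: List[Identifier]    # identification of the resource
    titles: Dict[str, str]                    # language tag -> title (multilingual)
    resource_type: str                        # e.g. "corpus", "component"
    distribution_medium: str                  # e.g. "webService"
    licence: str                              # e.g. "CC-BY-NC"
    descriptions: Dict[str, str] = field(default_factory=dict)
    resource_subtype: Optional[str] = None
    contact_email: Optional[str] = None
    landing_page: Optional[str] = None

    def __post_init__(self) -> None:
        # Contact information: either an email or a landing page is required
        if not (self.contact_email or self.landing_page):
            raise ValueError("contact requires an email or a landing page")
        # Raises ValueError if the string is not a valid ISO 8601 calendar date
        date.fromisoformat(self.metadata_creation_date)
        if not self.titles:
            raise ValueError("at least one title is required")


record = CommonCore(
    record_identifier=Identifier("PIDtest", "hdl"),
    metadata_creation_date="2016-07-29",
    resource_identifiers=[Identifier("temporary identifier", "hdl")],
    titles={"en": "Corpus of English articles in biomedicine from OpenAIRE and CORE"},
    resource_type="corpus",
    distribution_medium="downloadable",  # assumed value
    licence="CC-BY-NC",
    contact_email="user@omtd.com",
)
```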
[Figure: element groups specific to each resource type. Elements shown include title, authors, publisher/journal, abstract/full text, classification, language, format, character encoding, size, typesystem, tagset, annotation level, annotation resource, algorithm, dependencies and input/output content resource]
• relations between resources can be encoded
• inside each metadata record (e.g. between publication & authors)
• separately, from both metadata records (e.g. between component & model)
• implementation issues for optionality and restrictions: uniformity of
metadata records across sources vs. better treatment of restrictions
via the registry service → which restrictions should be in the schema
and which in a system built on top of the schema?
• recommend and link to authority lists for properties
• format: IANA list of media types, BUT extensions are needed!
• language: ISO 639-3 vs. IETF BCP47
• subject classification: DDC, LCSH, EUROVOC, discipline-specific lists… → we
cannot enforce one scheme, so we recommend their use and ask for a reference to
the scheme used; this is currently encoded as enumerations, but a link to the
external source would be a better solution
• create elements & values in attested gaps & where considered best
for OMTD purposes
• classification of components, lexical/conceptual resources etc.
• annotation set of elements and values [ to be included in the output resources
automatically via the platform]
•  links to be provided to elements in other metadata schemas (DataCite,
CrossRef, DCAT, etc.) (ongoing work)
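Linking to authority lists still leaves the platform to check that values have the right shape. A minimal Python sketch with deliberately simplified patterns: a BCP 47-like `language[-Script][-Region]` tag (variants, extensions and private-use subtags are not handled; three-letter ISO 639-3 codes fit the language subtag) and a `type/subtype` media type using the RFC 6838 restricted-name characters over a list of common top-level types. Shape checks cannot capture format variants such as the record's `xces; format-variant=ilsp`, which is one reason extensions are needed.

```python
import re

# Simplified BCP 47 shape: language[-Script][-Region]; no variants or extensions
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(-[a-z]{4})?(-(?:[a-z]{2}|[0-9]{3}))?", re.IGNORECASE)

# Common top-level media types (not necessarily the complete IANA list)
TOP_LEVEL_TYPES = {"application", "audio", "font", "image", "message",
                   "model", "multipart", "text", "video"}
# type/subtype built from RFC 6838 restricted-name characters
MEDIA_TYPE = re.compile(r"([a-z0-9][a-z0-9!#$&^_.+-]*)/([a-z0-9][a-z0-9!#$&^_.+-]*)",
                        re.IGNORECASE)


def valid_language_tag(tag: str) -> bool:
    return LANGUAGE_TAG.fullmatch(tag) is not None


def valid_media_type(media_type: str) -> bool:
    match = MEDIA_TYPE.fullmatch(media_type)
    return match is not None and match.group(1).lower() in TOP_LEVEL_TYPES


print(valid_language_tag("en-us"), valid_media_type("application/pdf"))  # True True
```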
• adopt entire metadata schemas and registries for satellite entities
• repositories & registries → OpenDOAR & re3data
• journals → DOAJ
BUT
• persons → ORCID & Scopus ID
• organisations → ISNI & FundRef
& also covered by OMTD-SHARE's own metadata elements
• link to other resources or satellite entities via identifier (PID) or
descriptive elements: recommend but allow for backup solutions when
the identifier is not there
• identifier preferably from an authority source, with a reference to it: DOI for
publications, DataCite DOIs for datasets & services, ORCID for persons, ISNI or FundRef
for organizations etc.
• but allow for other identifiers too: "identifierSchemeName" &
"identifierSchemeURL"
• descriptive elements: title, full name, etc.
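The "identifier first, descriptive backup" rule can be sketched as a matching key. In the hypothetical Python class below, the field names only mirror `identifierSchemeName` and `identifierSchemeURL`. A PID with its scheme wins; otherwise a normalised descriptive element is used, which is weaker because different people can share a name. The ORCID iD shown is ORCID's published example iD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityLink:
    """Hypothetical link from a record to a satellite entity (person, organisation, ...)."""
    identifier: Optional[str] = None
    identifier_scheme_name: Optional[str] = None  # mirrors "identifierSchemeName"
    identifier_scheme_url: Optional[str] = None   # mirrors "identifierSchemeURL"
    descriptive_name: Optional[str] = None        # backup, e.g. a person's full name

    def match_key(self) -> str:
        """Prefer the PID; fall back to a normalised descriptive element."""
        if self.identifier and self.identifier_scheme_name:
            return f"{self.identifier_scheme_name.lower()}:{self.identifier.strip()}"
        if self.descriptive_name:
            # Collapse whitespace and case so trivially different spellings compare equal
            return "name:" + " ".join(self.descriptive_name.split()).lower()
        raise ValueError("a link needs an identifier with its scheme, or a descriptive element")


print(EntityLink("0000-0002-1825-0097", "ORCID", "https://orcid.org").match_key())
# orcid:0000-0002-1825-0097
```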
• value system for elements
• e.g. free text vs. controlled vocabularies
• represented as enumeration
• semantics of closed & open vocabularies
• open vocabularies with the additional value "other" but … how can one add values
and yet curate the vocabularies??
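One way to reconcile an open vocabulary with curation is to store `"other"` in the controlled field while keeping the free-text proposal for curators. A minimal Python sketch of both kinds of vocabulary: the closed one uses `Enum` (`webService` appears in the sample record; `downloadable` is an illustrative value), and the open one is a hypothetical class with a pending queue that curators can promote into the vocabulary.

```python
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class DistributionMedium(Enum):
    """Closed vocabulary (illustrative subset): anything else is rejected."""
    WEB_SERVICE = "webService"
    DOWNLOADABLE = "downloadable"


def parse_closed(value: str) -> DistributionMedium:
    # Enum lookup by value raises ValueError for unknown values
    return DistributionMedium(value)


class OpenVocabulary:
    """Open vocabulary: unknown values map to "other" and are queued for curation."""

    def __init__(self, values: Iterable[str]) -> None:
        self.values = set(values)
        self.pending: List[str] = []

    def normalize(self, value: str) -> Tuple[str, Optional[str]]:
        """Return (controlled value, free-text value kept for curators)."""
        if value in self.values:
            return value, None
        if value not in self.pending:
            self.pending.append(value)
        return "other", value

    def accept(self, value: str) -> None:
        """A curator promotes a pending proposal into the vocabulary."""
        self.pending.remove(value)
        self.values.add(value)


component_types = OpenVocabulary({"morphologicalTagger"})
print(component_types.normalize("sentimentAnalyzer"))  # ('other', 'sentimentAnalyzer')
```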
• annotation set of elements:
• set of elements and values that can be added independently as a block to each
resource following the processing
• information on s/w component(s), type of annotation, tagsets, annotation
resources, annotators, format etc.
• covering provenance requirements but also to be used as input for further
processing
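Because the annotation set is a self-contained block, the platform can append it to a record after each processing step without touching the rest of the record. A minimal Python sketch using `xml.etree.ElementTree`: `<annotationInfo>` and `<processingComponent>` are hypothetical element names, while the values are taken from the sample tagger record.

```python
import xml.etree.ElementTree as ET


def add_annotation_block(record: ET.Element, component: str, annotation_level: str,
                         tagset: str, typesystem: str) -> ET.Element:
    """Append an annotation block to a corpus record after a processing step.

    <annotationInfo> and <processingComponent> are hypothetical element names.
    """
    corpus_info = record.find("corpusInfo")
    if corpus_info is None:
        raise ValueError("record has no corpusInfo element")
    block = ET.SubElement(corpus_info, "annotationInfo")
    for tag, text in (("processingComponent", component),
                      ("annotationLevel", annotation_level),
                      ("tagset", tagset),
                      ("typesystem", typesystem)):
        ET.SubElement(block, tag).text = text
    return block


record = ET.fromstring("<corpusMetadataRecord><corpusInfo/></corpusMetadataRecord>")
add_annotation_block(record, "ilsp_fbt", "morphosyntacticAnnotation-bPosTagging",
                     "ILSP-POStagset", "ILSP-typesystem")
print(record.findtext("corpusInfo/annotationInfo/tagset"))  # ILSP-POStagset
```

Each further processing step adds another block, so the record accumulates provenance that a later component can also read as input.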
• XSD schemas v1.0.0 & documentation:
https://openminted.github.io/openminted-site/releases/omtd-share/1.0.0/html/index.html
• Guidelines: on the way!
• Conversions from existing descriptors (ongoing work)
• not all information is available (e.g. licence, direct link to publication
contents, language of metadata fields, subject etc.)
• different approaches across schemas (element vs. attribute)
• lack of a common API approach (such as OAI-PMH across repositories)
• different mechanisms for flagging OA content
• inconsistent provision of full text links (incl. in CrossRef TDM)
• legal and technical issues around systematic full text aggregation
from publishers (including via CrossRef TDM)
• full text harvesting/crawling limits in place on publisher endpoints
• lack of support for discovery of new content
• lack of documentation on publisher systems
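Conversion gaps such as a missing licence or subject become visible as soon as a source record is mapped field by field. A minimal Python sketch converting an OAI-PMH `oai_dc` record with `xml.etree.ElementTree` (the two namespace URIs are the standard `oai_dc` and Dublin Core ones); the target field names and the `dc:rights` to licence mapping are assumptions, and the sample record is invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical mapping from target fields to Dublin Core elements
DC_TO_TARGET = {
    "title": "dc:title",
    "language": "dc:language",
    "subject": "dc:subject",
    "licence": "dc:rights",   # assumption: dc:rights is often free text, not a licence
    "format": "dc:format",
}


def convert_oai_dc(xml_text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return the converted values and the target fields the source did not provide."""
    root = ET.fromstring(xml_text)
    converted: Dict[str, List[str]] = {}
    missing: List[str] = []
    for target, dc_element in DC_TO_TARGET.items():
        found = [el.text.strip() for el in root.findall(f".//{dc_element}", NS)
                 if el.text and el.text.strip()]
        if found:
            converted[target] = found
        else:
            missing.append(target)
    return converted, missing


SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A sample article</dc:title>
  <dc:language>eng</dc:language>
  <dc:format>application/pdf</dc:format>
</oai_dc:dc>"""

converted, missing = convert_oai_dc(SAMPLE)
print(missing)  # ['subject', 'licence']
```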
• largely technical information
• some non-technical information possible but seldom used (e.g.
developer information in Maven) - why?
• technical elements present but in many cases possible values not
restricted (e.g. media-type or language)
• "persistent identifier", e.g. in Maven, is self-assigned, and global
uniqueness is not enforced but governed by best practice, in contrast
to e.g. centrally assigned DOIs: good, bad or tolerable?
• the closest to the OMTD-SHARE schema (for obvious reasons)
• resource types converted: corpora, components, lexical/conceptual
resources & models
• main problems were the lack of persistent identifiers and the
decisions taken for further standardization/normalisation
www.openminted.eu
twitter.com/openminted_eu
penny@ilsp.gr

OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
 

Recently uploaded

GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 

Recently uploaded (20)

GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 

Webinar slides: Interoperability between resources involved in TDM at the level of metadata

  • 3. OpenMinTeD sets out to create an open, service-oriented e-Infrastructure for TDM of scientific and scholarly content. Researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific sources in a seamless way. "Achieving interoperability between resources involved in TDM at the level of metadata"
  • 4.
    • Text and Data Mining: “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources to reveal otherwise hidden meanings” [Hearst 1999]
    • Interoperability: Relating to systems, especially of computers or telecommunications, that are capable of working together without being specially configured to do so. [American Heritage® Dictionary of the English Language, Fifth Edition. (2011)]
  • 5.
    • Language Resource: It encompasses both data sets (textual, multimodal/multimedia and lexical data, grammars, language models etc.) and tools/technologies/services used for their processing [WG1 - Wiki glossary]
    • Metadata: contains descriptive, contextual and provenance assertions about the properties of a Digital Object [RDA - DFT Core terms]
  • 6.
    • to be mined, i.e. in OpenMinTeD, scientific & scholarly publications (built as "corpora")
    • ancillary/reference resources, e.g. typesystems, linguistic tagsets, terminological lexica, ontologies, machine learning models, reference corpora, training corpora etc.
    • "components" in the form of
      • downloadable and locally executable tools
      • web services
      • workflows composed of the above
  • 7.
    • registry service: to register and, later on, search and find content and s/w components that can process this content, targeting end users, including TDM experts
    • workflow service: to search and find s/w components & ancillary resources that are (or can be made) compatible (hence, interoperable) in order to compose workflows, targeting TDM service developers
    • document properties that users employ in their queries to discover the resources, but also
    • document properties that will support the automatic discovery of compatibility (a) between s/w components & (b) between content & s/w components (i.e., find interoperable resources)
  • 8. [Diagram: OMTD and its three resource categories (publications, ancillary content resources, s/w components), drawn from sources such as OpenAIRE, CORE, publishers, MetaShare, discipline portals, Maven, Docker and component collections; resource types shown include typesystems, tagsets, lexica, ontologies, ML models and corpora.]
  • 9. Goal: achieve interoperability
    • per language resource type
    • across language resource types
    Problems:
    • various metadata schemas
    • various communities
    → semantics!!
    → crosswalks/mappings/semantic links
  • 10.
    • Need to define a common core vocabulary for the description of resource properties, e.g.
      • language of a publication/corpus & language of the input of a s/w component
      • domain/subject of a publication/corpus & domain/subject of an ontology that can be used to annotate it
    but
    • how can we select the "common denominator" from all the schemas?
    • gaps in the original metadata records for information deemed important for TDM
    • wealth of original records & loss of information
    • mismatches between metadata elements/values
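The crosswalk problem on slide 10 can be made concrete as a field mapping onto a common core vocabulary. The sketch below is a minimal illustration, not the OMTD mapping: the Dublin Core-style source fields and the core element names are assumptions chosen for the example. It keeps unmapped fields aside instead of dropping them, which makes the "loss of information" visible.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk from a Dublin Core-style source schema onto a common
# core vocabulary; the names on both sides are illustrative only.
CROSSWALK: Dict[str, str] = {
    "dc:title": "resourceName",
    "dc:language": "language",
    "dc:subject": "subject",
}


def apply_crosswalk(
    record: Dict[str, List[str]], crosswalk: Dict[str, str]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Map source fields onto core elements.

    Returns (core_record, unmapped). Fields without a core counterpart are
    returned separately, so the information lost by the mapping stays visible.
    """
    core: Dict[str, List[str]] = {}
    unmapped: Dict[str, List[str]] = {}
    for field, values in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = list(values)
        else:
            # Several source fields may feed the same core element.
            core.setdefault(target, []).extend(values)
    return core, unmapped


source = {
    "dc:title": ["Corpus of English articles in biomedicine"],
    "dc:language": ["en"],
    "dc:coverage": ["2016"],  # no counterpart in this toy crosswalk
}
core, lost = apply_crosswalk(source, CROSSWALK)
print(core)
print(lost)  # {'dc:coverage': ['2016']}
```

Returning the unmapped fields is a deliberate choice. Real crosswalks also face value mismatches, such as different language code lists, and a plain key mapping does not address those.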
  • 11.
    • organize the schema elements and accommodate common vs. particular features of resources
    • be flexible enough to support varying degrees of documentation completeness
    • cover documentation needs of all resource types involved in TDM
    • cover needs of resource discoverability and TDM processing
    • reuse what is available vs. create and recommend new elements and values
    • document processing procedure and outputs
    • standardize/normalize user input vs. allow for free user input
  • 12.
    • OMTD Deliverable D5.2 - Interoperability Requirements Specification [soon to be made publicly available]
    • scenarios & use cases targeted by OMTD in the areas of scholarly communication, life sciences, agriculture & biodiversity, and social sciences
    • overview of relevant metadata schemas (e.g. OpenAIRE, CORE, RIOXX guidelines, CrossRef, MetaShare, DataCite, DCAT, relevant CMDI metadata profiles etc.); cf. OMTD Deliverable D5.1 - Interoperability Landscaping Report
  • 15.
    <corpusMetadataRecord>
      <metadataHeaderInfo>
        <metadataRecordIdentifier metadataIdentifierSchemeName="hdl">PIDtest</metadataRecordIdentifier>
        <metadataCreationDate>2016-07-29</metadataCreationDate>
        <collectedFrom>
          <repositoryName lang="en-us">OpenAIRE</repositoryName>
          <repositoryName lang="en-us">CORE</repositoryName>
        </collectedFrom>
      </metadataHeaderInfo>
      <corpusInfo>
        <resourceType>corpus</resourceType>
        <identificationInfo>
          <resourceName>Corpus of English articles in biomedicine from OpenAIRE and CORE</resourceName>
          <description lang="en-us">A corpus created automatically by the corpus building process in OpenMinTeD, consisting of 17987 articles related to biomedicine</description>
          <identifier resourceIdentifierSchemeName="hdl">temporary identifier</identifier>
        </identificationInfo>
        <contactEmail>user@omtd.com</contactEmail>
        <resourceCreationInfo>
          <resourceCreator>Smith, John</resourceCreator>
        </resourceCreationInfo>
  • 17.
    <componentMetadataRecord>
      <metadataHeaderInfo>…</metadataHeaderInfo>
      <componentInfo>
        <resourceType>component</resourceType>
        <identificationInfo>
          <resourceName lang="en-us">ILSP Feature-based multi-tiered POS Tagger</resourceName>
          <description lang="en-us">FBT part-of-speech tagger for Greek texts. </description>
          <resourceShortName lang="en-us">ilsp_fbt</resourceShortName>
        </identificationInfo>
        <contactEmail>test@omtd.com</contactEmail>
        <version>v1.0.0</version>
        <resourceCreator>ILSP team</resourceCreator>
        <componentType>morphologicalTagger</componentType>
        <componentDistributionMedium>webService</componentDistributionMedium>
        <accessURL>http://access.com</accessURL>
        <webServiceType>SOAP</webServiceType>
        <rightsInfo>
          <licence>nonStandardLicenceTerms</licence>
          <nonStandardLicenceName lang="en-us">terms of service</nonStandardLicenceName>
          <nonStandaradLicenceTermsURL>http://example.com</nonStandaradLicenceTermsURL>
        </rightsInfo>
  • 18.
        <inputContentResourceInfo>
          <resourceType>corpus</resourceType>
          <resourceType>document</resourceType>
          <languageTag>el</languageTag>
          <characterEncoding>UTF-8</characterEncoding>
          <mimeType>text/plain</mimeType>
        </inputContentResourceInfo>
        <outputResourceInfo>
          <resourceType>corpus</resourceType>
          <resourceType>document</resourceType>
          <languageTag>el</languageTag>
          <characterEncoding>UTF-8</characterEncoding>
          <mimeType>text/xml</mimeType>
          <dataFormatSpecific>xces; format-variant=ilsp</dataFormatSpecific>
          <typesystem>ILSP-typesystem</typesystem>
          <tagset>ILSP-POStagset</tagset>
          <annotationLevel>morphosyntacticAnnotation-bPosTagging</annotationLevel>
        </outputResourceInfo>
        <componentDependencies>
          <typesystem>ILSP-typesystem</typesystem>
          <tagset>ILSP-POStagset</tagset>
        </componentDependencies>
    </componentMetadataRecord>
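Records like the ILSP tagger record on slides 17-18 are what make automatic compatibility checks possible. The following is a minimal sketch, not the OMTD platform's matching logic. It parses a trimmed copy of the `<inputContentResourceInfo>` block and wraps it in `<componentInfo>` only to make the fragment well-formed. It then compares the block with a corpus description given as a plain dictionary whose keys are an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Trimmed copy of the input requirements from the ILSP tagger record.
COMPONENT_XML = """
<componentInfo>
  <inputContentResourceInfo>
    <resourceType>corpus</resourceType>
    <resourceType>document</resourceType>
    <languageTag>el</languageTag>
    <characterEncoding>UTF-8</characterEncoding>
    <mimeType>text/plain</mimeType>
  </inputContentResourceInfo>
</componentInfo>
"""


def input_requirements(xml_text: str) -> Dict[str, List[str]]:
    """Collect the accepted values per property from <inputContentResourceInfo>."""
    root = ET.fromstring(xml_text.strip())
    info = root.find("inputContentResourceInfo")
    if info is None:
        raise ValueError("no <inputContentResourceInfo> element found")
    reqs: Dict[str, List[str]] = {}
    for child in info:
        reqs.setdefault(child.tag, []).append((child.text or "").strip())
    return reqs


def incompatibilities(corpus: Dict[str, str], reqs: Dict[str, List[str]]) -> List[str]:
    """Return the properties whose corpus value is not among the accepted values.

    A property the corpus does not document is reported too, since
    compatibility cannot be established automatically without it.
    """
    problems = []
    for prop, accepted in reqs.items():
        value = corpus.get(prop)
        if value is None or value not in accepted:
            problems.append(prop)
    return problems


reqs = input_requirements(COMPONENT_XML)
greek_corpus = {"resourceType": "corpus", "languageTag": "el",
                "characterEncoding": "UTF-8", "mimeType": "text/plain"}
english_pdfs = {"resourceType": "corpus", "languageTag": "en",
                "characterEncoding": "UTF-8", "mimeType": "application/pdf"}
print(incompatibilities(greek_corpus, reqs))  # []
print(incompatibilities(english_pdfs, reqs))  # ['languageTag', 'mimeType']
```

Exact string matching is the simplest possible rule. A matcher working across sources would also have to reconcile equivalent codes, for example a BCP 47 `el` against an ISO 639-3 `ell`.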
  • 19.
    • obligatory: record what is necessary for the intended purposes vs. ease of documentation
      • e.g. language for scholarly publications, but title and author?? format and subject of a document??
    • recommended: features that can help the user or future uses, or that users find useful but providers have not yet standardized
      • e.g. documentation / help files, attribution, citation papers
    • optional: all remaining information related to the lifecycle of a resource
      • e.g. funding information (still: funding agencies are becoming more and more interested in it!), projects where the resources have been used and created outputs
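The three obligation levels on slide 19 can be turned into a simple record check. The check rejects a record when an obligatory element is missing and only warns when a recommended one is missing. The assignment of elements to levels below is a hypothetical placeholder based on the slide's examples, not the actual OMTD-SHARE assignment.

```python
from enum import Enum
from typing import Dict, List


class Obligation(Enum):
    OBLIGATORY = "obligatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# Hypothetical assignment of elements to obligation levels, for illustration only.
LEVELS: Dict[str, Obligation] = {
    "language": Obligation.OBLIGATORY,
    "documentation": Obligation.RECOMMENDED,
    "citation": Obligation.RECOMMENDED,
    "fundingInfo": Obligation.OPTIONAL,
}


def check_record(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the missing elements of a record by obligation level.

    Missing obligatory elements are errors, missing recommended elements are
    warnings, and missing optional elements are not reported at all.
    """
    report: Dict[str, List[str]] = {"errors": [], "warnings": []}
    for element, level in LEVELS.items():
        if record.get(element):
            continue
        if level is Obligation.OBLIGATORY:
            report["errors"].append(element)
        elif level is Obligation.RECOMMENDED:
            report["warnings"].append(element)
    return report


print(check_record({"language": "en", "citation": "Smith 2016"}))
# {'errors': [], 'warnings': ['documentation']}
```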
  • 20.
    • organize the schema into semantically coherent groups of elements
    • common to all types of resources (e.g. identification, licensing etc.)
    • per resource type
    • re-usable for more than one resource type but not globally applicable (e.g. subject classification), and
    • strictly applied to specific resource types (e.g. evaluation for s/w components)
  • 22.
    • identification & provenance of the metadata record
      • metadata record identifier
      • metadata creation date
    • identification of the resource
      • identifiers with identificationScheme (name/URI)
      • title & description (multilingual; English should be there but ?)
    • distribution & licensing/access
      • distribution medium/format (e.g. executable code, downloadable text etc.)
      • licence and/or rightsStatement (ongoing work)
      • licence text or URL (provided by the system for standard licences)
    • contact information
      • either email or landing page
    • resource type (& subtype)
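The minimal set on slide 22 contains two alternative requirements: licence and/or rightsStatement, and email or landing page. It also contains one value the system supplies, the licence URL for standard licences. A minimal sketch of both follows. It assumes a flat dictionary record whose keys are partly hypothetical (`distributionMedium`, `landingPage`, `rightsStatement`, `licenceURL`) and a one-entry licence table that is an assumption as well.

```python
from typing import Dict, List, Optional, Tuple

# Each entry lists alternative keys; at least one of them must be filled in.
# Keys are a flattened, partly hypothetical rendering of the slide's elements.
MINIMAL: List[Tuple[str, ...]] = [
    ("metadataRecordIdentifier",),
    ("metadataCreationDate",),
    ("identifier",),
    ("resourceName",),
    ("description",),
    ("distributionMedium",),
    ("licence", "rightsStatement"),
    ("contactEmail", "landingPage"),
    ("resourceType",),
]

# Hypothetical lookup the system would use to supply URLs for standard licences.
STANDARD_LICENCES: Dict[str, str] = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def missing_minimal(record: Dict[str, str]) -> List[str]:
    """Return the unmet minimal requirements, alternatives joined by '|'."""
    return ["|".join(alternatives) for alternatives in MINIMAL
            if not any(record.get(key) for key in alternatives)]


def licence_url(record: Dict[str, str]) -> Optional[str]:
    """Use the system-supplied URL for a standard licence, else the record's own."""
    licence = record.get("licence", "")
    return STANDARD_LICENCES.get(licence, record.get("licenceURL"))


record = {
    "metadataRecordIdentifier": "PIDtest",
    "metadataCreationDate": "2016-07-29",
    "identifier": "temporary identifier",
    "resourceName": "Example corpus",
    "distributionMedium": "downloadable",
    "licence": "CC-BY-4.0",
    "resourceType": "corpus",
}
print(missing_minimal(record))  # ['description', 'contactEmail|landingPage']
print(licence_url(record))      # https://creativecommons.org/licenses/by/4.0/
```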
  • 23. [Diagram: overlapping properties of the resource types (publications, s/w components, annotation resources, corpora), including language, character encoding, format, size, classification, typesystem, tagset and annotation level; type-specific properties include title, authors, publisher/journal and abstract/full text, as well as input/output content resource, dependencies and algorithm.]
  • 24.
    • relations between resources can be encoded
      • inside each metadata record (e.g. between publication & authors)
      • separately from both metadata records (e.g. between component & model)
    • implementation issues for optionality and restrictions: uniformity of metadata records across sources vs. better treatment of restrictions via the registry service
    → which restrictions should be in the schema and which restrictions should be in a system built on top of the schema?
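Slide 24 mentions encoding a relation separately from both records. One way to picture this is a small relation store kept next to the records, which can be queried from either side. A minimal sketch follows. The relation names (`usesModel`, `isModelOf`) and the model identifier `ilsp_pos_model` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical inverse pairs, so a relation stated once can be read from both ends.
INVERSE: Dict[str, str] = {"usesModel": "isModelOf", "isModelOf": "usesModel"}


class RelationStore:
    """Relations kept outside the metadata records, indexed from both sides."""

    def __init__(self) -> None:
        self._by_source: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        # Store the stated direction and, if an inverse is known, the reverse one.
        self._by_source[source].add((relation, target))
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._by_source[target].add((inverse, source))

    def related(self, resource: str) -> List[Tuple[str, str]]:
        """Return (relation, other resource) pairs for a resource, sorted."""
        return sorted(self._by_source.get(resource, set()))


store = RelationStore()
store.add("ilsp_fbt", "usesModel", "ilsp_pos_model")
print(store.related("ilsp_fbt"))        # [('usesModel', 'ilsp_pos_model')]
print(store.related("ilsp_pos_model"))  # [('isModelOf', 'ilsp_fbt')]
```

Storing the relation once keeps the two records from contradicting each other. The cost is that neither record on its own shows the link.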
  • 25.
    • recommend and link to authority lists for properties
      • format: IANA list of media types BUT need for extensions!
      • language: ISO 639-3 vs. IETF BCP 47
      • subject classification: DDC, LCSH, EUROVOC, discipline-specific lists…
      → we cannot enforce one scheme, so we recommend their use and ask for a reference to the scheme used; this is currently encoded as enumerations, but linking to the external source is a better solution
    • create elements & values in attested gaps & where considered best for OMTD purposes
      • classification of components, lexical/conceptual resources etc.
      • annotation set of elements and values [to be included in the output resources automatically via the platform]
    → links to be provided to elements in other metadata schemas (DataCite, CrossRef, DCAT, etc.) (ongoing work)
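The ISO 639-3 vs. BCP 47 choice on slide 25 has practical consequences because the two can encode the same language differently. BCP 47 uses the shortest available ISO 639 code, so Greek is `el` in BCP 47 but `ell` in ISO 639-3. The normalization sketch below uses a small hand-written excerpt of the code table. A real implementation would load the full ISO 639-3 table instead.

```python
from typing import Dict

# Small excerpt: ISO 639-1 two-letter codes and their ISO 639-3 equivalents.
ISO639_1_TO_3: Dict[str, str] = {
    "el": "ell",  # Modern Greek
    "en": "eng",  # English
    "de": "deu",  # German
    "fr": "fra",  # French
}


def to_iso639_3(tag: str) -> str:
    """Reduce a BCP 47 tag such as 'en-us' to an ISO 639-3 code.

    Only the primary language subtag is kept; region and script subtags are
    dropped, which is itself a loss of information a crosswalk must accept.
    """
    primary = tag.split("-")[0].lower()
    if len(primary) == 3:
        return primary  # assumed to be an ISO 639-3 code already
    try:
        return ISO639_1_TO_3[primary]
    except KeyError:
        raise ValueError(f"no ISO 639-3 mapping for {tag!r}") from None


print(to_iso639_3("el"))     # ell
print(to_iso639_3("en-us"))  # eng
```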
  • 26.
    • adopt entire metadata schemas and registries for satellite entities
      • repositories & registries → OpenDOAR & re3data
      • journals → DOAJ
    BUT
      • persons → ORCID & Scopus ID
      • organisations → ISNI & FundRef
      & covered with own metadata elements
  • 27.
    • link to other resources or satellite entities via identifier (PID) or descriptive elements: recommend the identifier, but allow for backup solutions when it is not available
      • identifier preferably from an authority source, with a reference to it: DOI for publications, DataCite for datasets & services, ORCID for persons, ISNI or FundRef for organizations etc.
      • but allow for other identifiers too: "identifierSchemeName" & "identifierSchemeURL"
      • descriptive elements: title, full name, etc.
    • value system for elements
      • e.g. free text vs. controlled vocabularies
      • represented as enumeration
      • semantics of closed & open vocabularies
      • open vocabularies with the additional value "other", but … how can one add values and yet curate the vocabularies??
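The "identifier first, descriptive elements as backup" policy on slide 27 can be sketched for persons. ORCID iDs carry an ISO 7064 MOD 11-2 check character, so a malformed iD can be rejected before it is used as a link. The `identifierSchemeName`/`identifierSchemeURL` keys follow the names on the slide. The shape of the returned dictionary and the `"ORCID"` scheme value are hypothetical simplifications.

```python
import re
from typing import Dict, Optional

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD format and its ISO 7064 MOD 11-2 check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def person_reference(full_name: str, orcid: Optional[str] = None) -> Dict[str, str]:
    """Always keep the descriptive element; add the PID only when it validates."""
    ref = {"fullName": full_name}
    if orcid and orcid_is_valid(orcid):
        ref.update({"identifier": orcid,
                    "identifierSchemeName": "ORCID",
                    "identifierSchemeURL": "https://orcid.org"})
    return ref


print(person_reference("Carberry, Josiah", "0000-0002-1825-0097"))
print(person_reference("Smith, John"))  # {'fullName': 'Smith, John'}
```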
  • 28.
    • annotation set of elements:
      • set of elements and values that can be added independently as a block to each resource following the processing
      • information on s/w component(s), type of annotation, tagsets, annotation resources, annotators, format etc.
      • covering provenance requirements but also to be used as input for further processing
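The annotation set on slide 28 is a self-contained block added after processing. A platform could therefore append it to a copy of the input description as a single unit. A minimal sketch with `xml.etree.ElementTree` follows. The `annotationInfo` wrapper and the `component` child are hypothetical names. The other child names are borrowed from the component record on slides 17-18.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict


def add_annotation_block(record: ET.Element, info: Dict[str, str]) -> ET.Element:
    """Return a copy of `record` with one annotation block appended.

    The input element is left unchanged, so the description of the
    unannotated resource remains available for provenance.
    """
    annotated = copy.deepcopy(record)
    block = ET.SubElement(annotated, "annotationInfo")  # hypothetical element
    for name, value in info.items():
        ET.SubElement(block, name).text = value
    return annotated


corpus = ET.fromstring("<corpusInfo><resourceType>corpus</resourceType></corpusInfo>")
annotated = add_annotation_block(corpus, {
    "component": "ilsp_fbt",
    "typesystem": "ILSP-typesystem",
    "tagset": "ILSP-POStagset",
    "annotationLevel": "morphosyntacticAnnotation-bPosTagging",
})
print(ET.tostring(annotated, encoding="unicode"))
```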
  • 29.
    • XSD schemas v1.0.0 & documentation: https://openminted.github.io/openminted-site/releases/omtd-share/1.0.0/html/index.html
    • Guidelines: on the way!
    • Conversions from existing descriptors (ongoing work)
  • 30.
    • not all information is available (e.g. licence, direct link to publication contents, language of metadata fields, subject etc.)
    • different approaches between schemas (element vs. attribute)
    • lack of a common API approach (such as OAI-PMH across repositories)
    • different mechanisms for flagging OA content
    • inconsistent provision of full-text links (incl. in CrossRef TDM)
    • legal and technical issues around systematic full-text aggregation from publishers (including via CrossRef TDM)
    • full-text harvesting/crawling limits in place on publisher endpoints
    • lack of support for discovery of new content
    • lack of documentation on publisher systems
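Slide 30 contrasts publishers with repositories, where OAI-PMH gives every endpoint the same request shape. The minimal sketch below builds the request URLs a harvester issues. The base URL is a placeholder. Only the protocol's own parameters are used: `verb`, `metadataPrefix` and `resumptionToken`.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: follow-up requests
    carry only the verb and the token returned by the previous response.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("https://repository.example.org/oai"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```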
  • 31.
    • largely technical information
    • some non-technical information possible but seldom used (e.g. developer information in Maven): why?
    • technical elements present, but in many cases possible values not restricted (e.g. media type or language)
    • "persistent identifier", e.g. in Maven, is self-assigned, and global uniqueness is not enforced but governed by best practice, in contrast to e.g. centrally assigned DOIs: good, bad or tolerable?
  • 32.
    • the closest to the OMTD-SHARE schema (for obvious reasons)
    • resource types converted: corpora, components, lexical/conceptual resources & models
    • main problems were the lack of persistent identifiers and the decisions taken for further standardization/normalisation