The EASTER project aims to test and evaluate a range of current automated subject metadata generation tools. It will do so using the Intute digital collection as a testbed, evaluating the tools' usefulness for both cataloguers and end-users. The project will develop a methodology for evaluating such tools and create an enhanced "gold standard" test collection. Initial candidate tools include Temis Categorizer, KEA, TextGarden, TerMine, and others. The tools will be evaluated on their ability to generate subject metadata for veterinary, visual arts and politics domains. Related projects on information extraction from archaeology reports will also inform EASTER's work.
FAIR data and model management for systems biology. | FAIRDOM
Written and presented by Carole Goble (University of Manchester) as part of Intelligent Systems for Molecular Biology (ISMB), Dublin. July 10th - 14th 2015.
Being Reproducible: SSBSS Summer School 2017 | Carole Goble
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments depend on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as different types and as packages of all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
FAIR Data, Operations and Model management for Systems Biology and Systems Me... | Carole Goble
FAIR Data, Operations and Model management for Systems Biology and Systems Medicine Projects, given at the 1st Conference of the European Association of Systems Medicine, 26-28 October 2016, Berlin. The FAIRDOM project is described.
Short talk on Research Objects and their use for reproducibility and publishing in the Systems Biology Commons Platform FAIRDOMHub, and the underlying software SEEK.
Reproducible Research: how could Research Objects help | Carole Goble
Reproducible Research: how could Research Objects help, given at 21st Genomic Standards Consortium Meeting
Dates: May 20-23, 2019
https://press3.mcs.anl.gov/gensc/meetings/gsc21/
Being FAIR: FAIR data and model management SSBSS 2017 Summer School | Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
FAIRy stories: tales from building the FAIR Research Commons | Carole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable Accessible Interoperable Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object is a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
No specimen left behind: Collections digitisation at the NHM, London* | Vince Smith
Presentation on the Natural History Museum, London Digitisation Programme, given at the "Collections for the 21st Century" meeting in Gainesville, Florida, 5-6 May 2014
Improving the Management of Computational Models -- Invited talk at the EBI | Martin Scharm
Improving the Management of Computational Models:
storage – retrieval & ranking – version control
More information and slides to download at http://sems.uni-rostock.de/2013/12/martin-visits-the-ebi/
Reproducible and citable data and models: an introduction. | FAIRDOM
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS), Dagmar Waltemath (University of Rostock), at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany. September 14th - 16th 2015.
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe... | Carole Goble
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying-cry for EU and USA Research Infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure ranging across European programmes (SysMO and EraSysAPP ERANets), national initiatives (de.NBI, German Virtual Liver Network, UK SynBio centres) and PI's labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in affecting sharing using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
FAIR Data and Model Management for Systems Biology (and SOPs too!) | Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimizing burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and EraSysAPP ERANets and the ISBE ESFRI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable Accessible Interoperable Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, is now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations and the portability and sustainability of processing methods to enable transparent and reproducible results. All this is within the context of a bottom up society of collaborating (or burdened?) scientists, a top down collective of compliance-focused funders and policy makers and an in-the-middle posse of e-infrastructure providers.
Making the FAIR principles a reality is tricky. They are aspirations not standards. They are multi-dimensional and dependent on context such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile” – “FAIRifying” destination community archive repositories and measuring their “compliance” to FAIR metrics (or less controversially “indicators”). But what about FAIR at the first mile, at source and how do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? What about FAIR beyond just data – like exchanging and reusing pipelines for precision medicine?
Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and the development of a FAIR asset Commons for multi-partner researcher projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of HTS precision medicine pipelines.
So, using our FAIRDOM and BioCompute Object binoculars let’s go on a FAIR safari! Let’s peruse the ecosystem, observe the different herds and reflect on where we are for FAIR personalised medicine.
References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org
Crediting informatics and data folks in life science teams | Carole Goble
Science Europe LEGS Committee: Career Pathways in Multidisciplinary Research: How to Assess the Contributions of Single Authors in Large Teams, 1-2 Dec 2015, Brussels
The People Behind Research Software: crediting from the informatics and technical point of view
Comparing and matching archaeological excavation data for integration in onto... | ariadnenetwork
Presentation by Anja Masur and Keith May
OAW (Austrian Academy of Sciences).
OREA (Institute for Oriental and European Archaeology).
English Heritage;
University of South Wales
Full-day session on archaeological infrastructures and services at the 18th Cultural Heritage and New Technologies (CHNT) conference
Vienna, Austria
11th - 13th November 2013
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Being FAIR: Enabling Reproducible Data Science | Carole Goble
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
Trust and Accountability: experiences from the FAIRDOM Commons Initiative. | Carole Goble
Presented at Digital Life 2018, Bergen, March 2018. In the Trust and Accountability session.
In recent years we have seen a change in expectations for the management and availability of all the outcomes of research (models, data, SOPs, software etc) and for greater transparency and reproducibility in the method of research. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for stewardship [1] have proved to be an effective rallying-cry for community groups and for policy makers.
The FAIRDOM Initiative (FAIR Data Models Operations, http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards and sensitivity to asset sharing and credit anxiety. Our aim is a FAIR Research Commons that blends together the doing of research with the communication of research. The Platform has been installed by over 30 labs/projects and our public, centrally hosted FAIRDOMHub [2] supports the outcomes of 90+ projects. We are proud to support projects in Norway’s Digital Life programme.
2018 is our 10th anniversary. Over the past decade we learned a lot about trust between researchers, between researchers and platform developers and curators and between both these groups and funders. We have experienced the Tragedy of the Commons but also seen shifts in attitudes.
In this talk we will use our experiences in FAIRDOM to explore the political, economic, social and technical practicalities of Trust.
[1] Wilkinson et al (2016) The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
[2] Wolstencroft, et al (2016) FAIRDOMHub: a repository and collaboration environment for sharing systems biology research Nucleic Acids Research, 45(D1): D404-D407. DOI: 10.1093/nar/gkw1032
Jana Parvanova, Vladimir Alexiev and Stanislav Kostadinov. In workshop Collaborative Annotations in Shared Environments: metadata, vocabularies and techniques in the Digital Humanities (DH-CASE 2013). Collocated with DocEng 2013. Florence, Italy, Sep 2013.
Short talk on Research Object and their use for reproducibility and publishing in the Systems Biology Commons Platform FAIRDOMHub, and the underlying software SEEK.
Reproducible Research: how could Research Objects helpCarole Goble
Reproducible Research: how could Research Objects help, given at 21st Genomic Standards Consortium Meeting
Dates: May 20-23, 2019
https://press3.mcs.anl.gov/gensc/meetings/gsc21/
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will show explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http:// http://www.elixir-europe.org/) the European Research Infrastructure of 21 national nodes and a hub funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
FAIRy stories: tales from building the FAIR Research CommonsCarole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable Accessable Interoperable Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object is a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
No specimen left behind: Collections digitisation at the NHM, London*Vince Smith
Presentation on the Natural History Museum, London Digitisation Programme, given at the "Collections for the 21st Century" meeting in Gainesville, Florida, 5-6 May 2014
Improving the Management of Computational Models -- Invited talk at the EBIMartin Scharm
Improving the Management of Computational Models:
storage – retrieval & ranking – version control
More information and slides to download at http://sems.uni-rostock.de/2013/12/martin-visits-the-ebi/
Reproducible and citable data and models: an introduction.FAIRDOM
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS), Dagmar Waltermath (University of Rostock), at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany. September 14th - 16th 2015.
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording
of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) has been an effective rallying-cry for EU and USA Research Infrastructures. FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure ranging across European programmes (SysMO and EraSysAPP ERANets), national initiatives (de.NBI, German Virtual Liver Network, UK SynBio centres) and PI's labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in affecting sharing using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
FAIR Data and Model Management for Systems Biology(and SOPs too!)Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimizing burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and EraSysAPP ERANets and the ISBE ESFRI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, are now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations and the portability and sustainability of processing methods to enable transparent and reproducible results. All this is within the context of a bottom up society of collaborating (or burdened?) scientists, a top down collective of compliance-focused funders and policy makers and an in-the-middle posse of e-infrastructure providers.
Making the FAIR principles a reality is tricky. They are aspirations not standards. They are multi-dimensional and dependent on context such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile” – “FAIRifying” destination community archive repositories and measuring their “compliance” to FAIR metrics (or less controversially “indicators”). But what about FAIR at the first mile, at source and how do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? What about FAIR beyond just data – like exchanging and reusing pipelines for precision medicine?
Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and the development of a FAIR asset Commons for multi-partner researcher projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of HTS precision medicine pipelines.
So, using our FAIRDOM and BioCompute Object binoculars let’s go on a FAIR safari! Let’s peruse the ecosystem, observe the different herds and reflect on where we are for FAIR personalised medicine.
References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org
Crediting informatics and data folks in life science teamsCarole Goble
Science Europe LEGS Committee: Career Pathways in Multidisciplinary Research: How to Assess the Contributions of Single Authors in Large Teams, 1-2 Dec 2015, Brussels
The People Behind Research Software crediting from the informatics, technical point of view
Comparing and matching archaeological excavation data for integration in onto...ariadnenetwork
Presentation by Anja Masur and Keith May
OAW ( Austrian Academy of Sciences).
OREA (Institute for Oriental and European Archaeology).
English Heritage;
University of South Wales
Full-day session on archaeological infrastructures and services at the 18th Cultural Heritage and New Technologies (CHNT) conference
Vienna, Austria
11th -13th November 2013
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
Presented at Digital Life 2018, Bergen, March 2018. In the Trust and Accountability session.
In recent years we have seen a change in expectations for the management and availability of all the outcomes of research (models, data, SOPs, software etc) and for greater transparency and reproducibility in the method of research. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for stewardship [1] have proved to be an effective rallying-cry for community groups and for policy makers.
The FAIRDOM Initiative (FAIR Data Models Operations, http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards and sensitivity to asset sharing and credit anxiety. Our aim is a FAIR Research Commons that blends together the doing of research with the communication of research. The Platform has been installed by over 30 labs/projects and our public, centrally hosted FAIRDOMHub [2] supports the outcomes of 90+ projects. We are proud to support projects in Norway’s Digital Life programme.
2018 is our 10th anniversary. Over the past decade we learned a lot about trust between researchers, between researchers and platform developers and curators and between both these groups and funders. We have experienced the Tragedy of the Commons but also seen shifts in attitudes.
In this talk we will use our experiences in FAIRDOM to explore the political, economic, social and technical practicalities of Trust.
[1] Wilkinson et al (2016) The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
[2] Wolstencroft, et al (2016) FAIRDOMHub: a repository and collaboration environment for sharing systems biology research Nucleic Acids Research, 45(D1): D404-D407. DOI: 10.1093/nar/gkw1032
Jana Parvanova, Vladimir Alexiev and Stanislav Kostadinov. In workshop Collaborative Annotations in Shared Environments: metadata, vocabularies and techniques in the Digital Humanities (DH-CASE 2013). Collocated with DocEng 2013. Florence, Italy, Sep 2013.
Qualitative AI : Hoo-ha or Step-Change? CAQDAS webinarChristina Silver
Slides from the CAQDAS Networking Project's webinar on 1st September 2023: Artificial Intelligence in Qualitative Data Analysis - Hoo-ha or Step-Change?
During 2023 there’s been increasing discussion about the use of artificial intelligence (AI) in qualitative research, spurred by widespread access to generative-AI technologies such as ChatGPT developed by OpenAI.
In this webinar Christina first recounts the history of AI in qualitative data analysis, outlining developments that far pre-date the current upsurge; including Qualrus, Discovertext, WordStat and QDA Miner, and Leximancer.
She’ll then outline how generative-AI is being used in qualitative data analysis at the moment, discussing three uses: chat bots alongside other analytic tools; integrations of OpenAI technology into already established Qualitative Software; and the rise of new generative-AI applications designed specifically for qualitative data analysis tasks.
Christina will open discussion about the implications of these developments for the practice of qualitative research. When are these tools appropriate? What do we need to know about them? What are the ethics of using them? What should we be cautious and excited about? How can the qualitative community shape their development?
Whether you’re an advocate of the use of AI in qualitative data analysis or a sceptic, these technologies are here, they have already impacted the field of qualitative research and they will continue to do so. Join Christina to be part of the conversation, find out what’s happening, share your experiences and experimentations, your fears and hopes. Let the developers know how you want to see these technologies harnessed.
OAIS and Its Applicability for Libraries, Archives, and Digital Repositories...faflrt
ALA/FAFLRT Workshop on Open Archival Information Service (OAIS). Presented by Robin Dale, RLG. Sponsored by ALA Federal and Armed Forces Libraries Roundtable (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
During the last decade several projects with respect to digital preservation have been funded in Europe by the European Commission and have delivered interesting results. Such projects include community building projects or coordination actions such as ERPANET, Delos2, and Digital Preservation Europe (DPE), but also research projects such as Planets, CASPAR, Shaman, Protage. In December 2009 a new call for digital preservation will be closed, so new projects may start in 2010.
One result of all these projects and all the work done is that there is a growing community involved, more organizations and people are aware of the issues, collaboration amongst institutions and universities in Europe has definitely been enhanced, and with the last research projects some potential practical solutions are emerging that could be applied by institutions. How it all will work out in the end is still one of the big questions. For one thing it may have helped to create a good foundation for further collaboration, perhaps even without funding from the European Commission.
This presentation will provide a brief overview of the main results of some of these projects, especially Planets, and what issues they try to resolve, and a brief outlook on possible future developments.
Presentation about the collaboration between ADAPT and the Ordnance Survey Ireland at Linked Data Seminar -- Culture, Base Registries & Visualisations held in Amsterdam, The Netherlands on the 2nd of December 2016
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume related challenges, including the use of high performance computing. In this talk, we will mainly focus on other challenges from the perspective of collaborative sharing and reuse of broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state of the art work in supporting physicists (including astrophysicists) [1], life sciences [2], material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice oriented talk will complement more vision oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
Integrating a Domain Ontology Development Environment and an Ontology Search ...Takeshi Morita
In order to reduce the cost of building domain ontologies manually, in this paper, we propose a method and a tool named DODDLE-OWL for domain ontology construction reusing texts and existing ontologies extracted by an ontology search engine: Swoogle. In the experimental evaluation, we applied the method to a particular field of law and evaluated the acquired ontologies.
The project re3data.org–Registry of Research Data Repositories–began indexing research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape.
Semantic Web technologies, both those envisaged and those already realised, have the potential to benefit domains where issues such as volume, complexity and heterogeneity can overcome traditional techniques. Sensor networks are one such area where the application of semantics is indicated by scale, complexity, and the need to integrate over heterogeneous standards, sensors and systems for multiple purposes and multiple disciplines.
The Semantic Sensor Networks W3C Incubator is an international initiative to develop standards for sharing information collected by sensors and sensor networks over the Web, including an ontology for different types of sensing devices and their observations, and new approaches for the semantic markup of sensor descriptions and services that support sensor data exchange and sensor network management.
Kerry will describe the ongoing effort to increase the quality and reduce the cost of capturing environmental data, to address the growing demand for information about the environmental systems that support Australia’s agricultural, resource and process-based industries.
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerFrancesco Osborne
The process of classifying scholarly outputs is crucial to ensure timely access to knowledge. However, this process is typically carried out manually by expert editors, leading to high costs and slow throughput. In this paper we present Smart Topic Miner (STM), a novel solution which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of topics and a number of Springer Nature classification tags, which best characterise the given input. In this paper we present the architecture of the system and report on an evaluation study conducted with a team of Springer Nature editors. The results of the evaluation, which showed that STM classifies publications with a high degree of accuracy, are very encouraging and as a result we are currently discussing the required next steps to ensure large-scale deployment within the company.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. Synthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Operation “Blue Star” is the only event in the history of Independent India where the state went to war with its own people. Even after about 40 years it is not clear whether it was a culmination of the state's anger towards the people of the region, a political game of power, or the start of a dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from the mainstream due to the denial of their just demands during a long democratic struggle since independence. As happens all over the world, this led to a militant struggle with great loss of life among military, police and civilian personnel. The killing of Indira Gandhi and the massacre of innocent Sikhs in Delhi and other Indian cities were also associated with this movement.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
How to Make a Field invisible in Odoo 17Celine George
Fields in Odoo can be hidden. This is commonly done by setting the “invisible” attribute in the field definition. This slide will show how to make a field invisible in Odoo 17.
2024.06.01 Introducing a competency framework for language learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
1. EASTER
Evaluating Automated Subject Tools
for Enhancing Retrieval
Douglas Tudhope
Hypermedia Research Unit
University of Glamorgan
JISC Automatic Metadata Generation Meeting, London, May 25, 2010
2. Background
• EASTER is an 18-month JISC project funded under the Information
Environment Programme 2009-11.
• Started April 2009 and involves eight institutional partners
• Aim is to test and evaluate a range of current tools for automated
subject metadata generation
• Anticipated outcomes:
– better understanding of limitations and what is possible
– recommendations for services employing subject metadata in JISC community
3. Rationale – problems, issues, relevance
• EASTER investigates the creation and enrichment of subject
metadata using existing automated tools.
• Subject metadata are the most important in resource discovery, yet
most expensive to produce manually. In addition, they are more
difficult to generate automatically compared to formal metadata
such as file type, title, etc. Wide uses in retrieval and NLP tools.
• Due to the high cost of evaluation, automated subject metadata
tools are rarely tested in live environments of use.
• Challenge facing digital collections, institutional repositories, and
aggregators of how to provide high quality subject metadata at
reasonable costs.
4. Intute testbed
• Test-bed is Intute http://www.intute.ac.uk
- a collection of websites (mostly)
However results intended to be generally applicable
• Tools for automated subject metadata generation
will be tested in two contexts:
Intute cataloguers in the cataloguing workflow;
end-users of Intute who search for information
• Task-based end-user retrieval study will examine contribution of
automatically assigned terms and manually assigned terms
5. Methodology
• A methodology for evaluating such tools is intended as a significant
project outcome/contribution
• Low reliability rates between cataloguers, and between different times of
indexing, are a recognised problem
• EASTER methodology includes creating an enhanced ‘gold
standard’ test collection by careful manual cataloguing and expert
review by cataloguers and users. Provision for consideration of
automatic indexing output within enhanced gold standard in
methodology.
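A minimal sketch of how automatically assigned terms might be scored against a gold standard, and how agreement between two cataloguers might be quantified. The slides do not name the measures EASTER uses; precision, recall, F1 and a Jaccard-style overlap are assumptions chosen for illustration, and the example terms are invented rather than taken from Intute.

```python
def precision_recall_f1(assigned, gold):
    """Compare automatically assigned terms with gold-standard terms."""
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1


def consistency(terms_a, terms_b):
    """Overlap between two indexings of the same resource (Jaccard index)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical example data, not taken from the Intute collection.
gold = {"elections", "political parties", "voting"}
auto = {"elections", "voting", "government"}

print(precision_recall_f1(auto, gold))
print(consistency(gold, auto))
```

The same `consistency` function can be applied to two cataloguers, or to one cataloguer indexing the same resource at different times.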
6. Candidate Tools
Initial candidate tools (a subset will be selected after review)
1) Temis Categorizer (French SME – inhouse)
2) KEA -- new version Maui (Waikato)
3) TextGarden
4) TerMine (NACTEM)
5) KnowLib’s automated classifier (Lund)
6) Scorpion (OCLC)
7) iVia project’s libiViaClassification (UC Riverside)
7. Candidate Tools
Initial candidate tools (a subset will be selected after review)
1) Temis Categorizer (machine learning, classification)
2) KEA (http://www.nzdl.org/Kea/) -- new version Maui (indexing)
3) TextGarden (http://kt.ijs.si/Dunja/textgarden/)
4) TerMine (http://www.nactem.ac.uk/software/termine/) (noun phrase)
5) KnowLib’s automated classifier (classification)
(http://www.it.lth.se/knowlib/auto.htm)
6) Scorpion
(http://www.oclc.org/research/software/scorpion/default.htm)
7) iVia project’s libiViaClassification
(http://ivia.ucr.edu/manuals/stable/libiViaClassification/5.4.0/)
8. Progress
• Distinguish 3 subject domains associated with different thesauri
• VETERINARY - CAB Thesaurus
• VISUAL ARTS - AAT
• POLITICS - HASSET, (IBSS?)
• KEA/Maui thesauri and training set
• AutoClass thesauri – need to consider main classes to classify
• TERMINE none
• TEMIS thesauri and training set depending on mode
(IPR of thesauri for commercial use an issue)
• Conversion of thesauri to SKOS format underway
• Web crawler for EASTER purposes implemented
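The thesaurus-to-SKOS conversion mentioned above could look roughly like the following sketch. It is not the project's converter: the input dictionary layout, the `slug` helper and the `http://example.org/thesaurus/` base URI are assumptions for illustration, and labels are assumed to contain no quote characters (no escaping is done). Only the SKOS namespace and the `skos:Concept`, `skos:prefLabel`, `skos:altLabel` and `skos:broader` terms come from the SKOS vocabulary itself.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def slug(label):
    """Hypothetical helper: turn a label into a URI-friendly local name."""
    return label.strip().lower().replace(" ", "-")


def to_skos_turtle(thesaurus, base_uri):
    """Serialise {label: {"broader": [...], "alt": [...]}} as SKOS Turtle."""
    lines = [SKOS_PREFIX, ""]
    for label in sorted(thesaurus):
        entry = thesaurus[label]
        subject = f"<{base_uri}{slug(label)}>"
        statements = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
        statements += [f'skos:altLabel "{alt}"@en' for alt in entry.get("alt", [])]
        statements += [
            f"skos:broader <{base_uri}{slug(b)}>" for b in entry.get("broader", [])
        ]
        lines.append(subject + " " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


# Invented two-term thesaurus fragment, for illustration only.
thesaurus = {
    "painting": {"broader": ["visual arts"], "alt": ["paintings"]},
    "visual arts": {},
}
turtle = to_skos_turtle(thesaurus, "http://example.org/thesaurus/")
print(turtle)
```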
9. Lessons learned
Preliminary stages – provisional general observations
• Subject metadata generation tools typically complex layered
software. Require maintenance to stay current. Installation may not
be trivial. Resource implications.
• General subject metadata generation tools often require tuning and
adaptation for different contexts and subject domains?
Resource implications.
• Subject metadata generation for what purpose? Classification,
indexing, annotation associated with different use cases.
Eg browsing and search require different metadata for best results.
An individual tool may not deliver all use cases.
• Possibilities for pipelining different approaches (tools) in sequence
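The idea of pipelining tools in sequence can be sketched as follows: a candidate-phrase extractor feeds a controlled-vocabulary matcher. Both stages are deliberately naive stand-ins written for this example; they do not reproduce the behaviour of TerMine, KEA/Maui or any other candidate tool, and the vocabulary and sample text are invented.

```python
import re
from functools import reduce


def extract_candidates(text, max_words=2):
    """Stage 1 (stand-in for a term extractor): all 1..max_words word n-grams."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            candidates.add(" ".join(words[i:i + n]))
    return candidates


def match_vocabulary(vocabulary):
    """Stage 2 (stand-in for thesaurus-based indexing): keep controlled terms."""
    vocab = {term.lower() for term in vocabulary}

    def stage(candidates):
        return sorted(c for c in candidates if c in vocab)

    return stage


def pipeline(*stages):
    """Compose stages so the output of each is the input of the next."""
    def run(data):
        return reduce(lambda value, stage: stage(value), stages, data)

    return run


index = pipeline(
    extract_candidates,
    match_vocabulary(["oil painting", "sculpture", "fresco"]),
)
terms = index("A survey of oil painting and sculpture in Florence.")
print(terms)
```

Because each stage only needs to accept the previous stage's output, a stage can be swapped for a different tool without touching the rest of the chain.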
10. STAR/STELLAR Projects also relevant
Information Extraction from archaeology grey literature (AHRC)
‘Rich’, semantic indexing of Archaeology fieldwork reports (ADS
OASIS Grey Literature) with respect to the English Heritage
extension of the CRM Conceptual Reference Model (Ontology),
making use of EH thesauri/glossaries and the GATE NLP tool.
Transforms GATE XML annotations to RDF triples conformant to
conceptual model, allowing cross search with datasets.
In progress
Web service interface planned to NLP semantic indexing
STAR terminology services (based on SKOS vocabularies)
JavaScript widgets browser neutral
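As a rough illustration of turning annotations into RDF triples, the sketch below reads a simplified XML document and emits N-Triples lines. The XML layout is a made-up stand-in, not GATE's actual serialisation, and the `http://example.org/...` URIs are placeholders rather than the CRM-EH namespace; only the `rdf:type` and `rdfs:label` property URIs are real.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical annotation document (NOT the real GATE XML schema).
SIMPLIFIED_XML = """
<Document id="report-1">
  <Annotation id="a1" type="Find" text="pottery sherd"/>
  <Annotation id="a2" type="Period" text="Roman"/>
</Document>
"""

EX = "http://example.org/annotation/"   # placeholder namespace for annotations
MODEL = "http://example.org/model#"     # placeholder for the conceptual model
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def annotations_to_ntriples(xml_text):
    """Emit one type triple and one label triple per annotation."""
    root = ET.fromstring(xml_text)
    doc_id = root.get("id")
    triples = []
    for ann in root.iter("Annotation"):
        subject = f"<{EX}{doc_id}/{ann.get('id')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{MODEL}{ann.get('type')}> .")
        triples.append(f'{subject} <{RDFS_LABEL}> "{ann.get("text")}" .')
    return triples


triples = annotations_to_ntriples(SIMPLIFIED_XML)
print("\n".join(triples))
```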
11. STAR/STELLAR Projects also relevant
Information Extraction from archaeology grey literature (AHRC)
Archaeology domain specific but investigating generalisation to
cultural heritage more generally
eg classical art history domain (with OUCS)
STELLAR (AHRC) investigates generalising data mapping tool
and producing linked data (with ADS)
http://hypermedia.research.glam.ac.uk/kos/star/
http://hypermedia.research.glam.ac.uk/kos/stellar
12. Grey Literature Information Extraction
(Andreas Vlachidis)
• Looking to extract CRM-EH period, context, find, sample entities
• Aim to cross search with archaeology datasets