Triples for the People (Scientists): Liberating biological knowledge with t... – Michel Dumontier
The Semantic Web is an emerging web of knowledge. It provides the basis upon which we can publish, share and link data and, perhaps more saliently, use computers to reason about increasingly complex information using background knowledge. From the dream to using triples as a currency to pay for it, this talk will illustrate the application of Semantic Web technologies for biological knowledge discovery while touching on issues in knowledge representation, RDFizing, large-scale data integration and convergence with semantic web services.
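To make "reasoning with background knowledge" concrete, here is a minimal sketch using the rdflib and owlrl Python packages (illustrative tool choices, not ones named in the talk): a small RDFS class hierarchy lets a reasoner infer a type that was never asserted.

```python
# A minimal sketch of machine reasoning with background knowledge,
# using the rdflib and owlrl packages (illustrative choices, not
# tools named in the talk). The class hierarchy is hypothetical.
from rdflib import Graph, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics

EX = Namespace("http://example.org/bio/")

g = Graph()
# Background knowledge: every kinase is an enzyme; every enzyme is a protein.
g.add((EX.Kinase, RDFS.subClassOf, EX.Enzyme))
g.add((EX.Enzyme, RDFS.subClassOf, EX.Protein))
# One asserted fact about a gene product.
g.add((EX.CDK2, RDF.type, EX.Kinase))

# Compute the RDFS deductive closure: the reasoner adds inferred triples.
DeductiveClosure(RDFS_Semantics).expand(g)

# The type "Protein" was never stated, yet it now holds.
print((EX.CDK2, RDF.type, EX.Protein) in g)  # True
```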
Reproducible and citable data and models: an introduction – FAIRDOM
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS) and Dagmar Waltemath (University of Rostock) at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany, September 14–16, 2015.
Metadata and Semantics Research Conference, Manchester, UK, 2015
Research Objects: why, what and how
In practice the exchange, reuse and reproduction of scientific experiments is hard, as it depends on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Neither should they be viewed as second-class artifacts tethered to publications; they are the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects, and the term has become widespread. However: what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship and sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples?
I’ll present our practical experiences of the why, what and how of Research Objects.
The document discusses ways for libraries to engage academic scientists through technology and collaboration. It proposes several ideas like using Web 2.0 tools, customizing library catalogs and search tools, bringing library resources to external websites, and supporting scientists in their workspaces both online and in physical libraries. The goal is to adapt libraries as information needs change, meet scientists where they are, and encourage trust and opportunities for collaboration.
New ways to communicate in science: perspectives from biodiversity research – Vince Smith
A presentation given at the co-ordination workshop on Open Access to Scientific Information on Wednesday 4th May 2011 at the EU DG Information Society & Media, Avenue de Beaulieu 25, Brussels.
Specimen-level mining: bringing knowledge back 'home' to the Natural History ... – Ross Mounce
A talk given at the Geological Society of London, UK on 2016/03/09 as part of the Lyell meeting on Palaeoinformatics. http://www.geolsoc.org.uk/lyell16 #lyell16
Reproducibility, Research Objects and Reality, Leiden 2016 – Carole Goble
Presented at the Leiden Bioscience Lecture, 24 November 2016, Reproducibility, Research Objects and Reality
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the "assets" of data, models, codes, SOPs and workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. It all sounds very laudable and straightforward. BUT…
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transfer between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments depend on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in data-driven computational life sciences through examples and stories from initiatives that I am involved in, and in which Leiden is involved too, including:
· FAIRDOM, which has built a Commons for Systems and Synthetic Biology projects, with an emphasis on standards smuggled in by stealth and efforts to affect sharing practices using behavioural interventions
· ELIXIR, the EU Research Data Infrastructure, and its efforts to exchange workflows
· Bioschemas.org, an ELIXIR-NIH-Google effort to support the finding of assets by embedding schema.org markup in life-science web pages (a sketch of such markup follows below).
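As a rough illustration of the Bioschemas idea (the markup details here are assumed for illustration, not taken from the talk), a dataset page can embed a schema.org-style JSON-LD description so that crawlers can find the asset; sketched in Python:

```python
# Sketch of Bioschemas-style findability markup: a schema.org Dataset
# description serialized as JSON-LD for embedding in a web page inside
# <script type="application/ld+json">. All names and URLs are invented.
import json

dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example proteomics dataset",
    "description": "Mass spectrometry runs for an example study.",
    "url": "https://example.org/datasets/42",
    "identifier": "https://doi.org/10.0000/example.42",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}

print(json.dumps(dataset_markup, indent=2))
```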
Communicating Use and Reuse in the Digital Collection Interface by L. Kelly F... – Europeana
The document discusses how digital collections identify open access content through open content identifiers. It analyzes case studies of four collections - the British Library, J. Paul Getty Museum, Walters Art Museum, and Metropolitan Museum of Art - to see where and how they convey licensing terms and what information the identifiers lead to. The identifiers were typically shown below or above image descriptions and always linked to supplemental reuse information. Open content identifiers connect policy, infrastructure, and users regarding openly licensed digital collection content.
Citing data in research articles: principles, implementation, challenges - an... – FAIRDOM
Prepared and presented by Jo McEntyre (EMBL-EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany, September 14–16, 2015.
Open Research Data: Licensing | Standards | Future – Ross Mounce
This document provides an overview of open research data, including definitions, licensing, standards, and history. It defines open data as data that anyone can freely access, use, modify, and share with few restrictions. For data to be truly open, it recommends using a CC0 public domain waiver or an attribution-only license. It discusses issues with non-commercial and no-derivatives restrictions. The document also provides guidance on technical aspects like recommended file formats and standards. It briefly summarizes the history of data sharing, from centralized data centers to online supplementary data to emerging data paper journals. The key messages are that data should be FAIR (Findable, Accessible, Interoperable, Reusable) and that open data benefits both…
This document discusses using Wikidata as a central repository for chemistry data currently found in Wikipedia infoboxes. It notes issues with the current approach and outlines Wikidata's data model and features that make it suitable for this purpose. As an example, it describes how Gene Wiki infoboxes have been migrated to Wikidata. It provides guidance on resolving issues with isomers and outlines efforts to improve data quality for chemical compounds in Wikidata.
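To illustrate how such centralized chemistry data becomes queryable (an illustrative query of my own, not one from the document), Wikidata's public SPARQL endpoint can be asked for compounds and their InChIKeys:

```python
# Illustrative query (not taken from the document) against Wikidata's
# public SPARQL endpoint: chemical compounds (Q11173) and their
# InChIKeys (P235). Requires the SPARQLWrapper package.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="example-sketch/0.1")  # polite user agent
sparql.setQuery("""
SELECT ?compound ?compoundLabel ?inchikey WHERE {
  ?compound wdt:P31 wd:Q11173 ;     # instance of: chemical compound
            wdt:P235 ?inchikey .    # InChIKey
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["compoundLabel"]["value"], row["inchikey"]["value"])
```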
Presentation at the Online Information Conference, London 20th November 2013. Taking a look at the drivers behind the emerging Web of Data and how libraries need to be and can be part of it in the future.
1) Linked data is a set of best practices for publishing structured data on the web so that both humans and machines can access and link related data across different sources. It realizes Tim Berners-Lee's vision of a Semantic Web.
2) The key principles of linked data are using URIs to identify things, providing HTTP URIs so that URIs can be looked up, and including links to other URIs to allow for discovery of related data on the web.
3) By following these principles, data sources on the web have been connected into a large Web of Data, with over 31 billion RDF triples organized into different domains such as media, geography, life sciences, and libraries. This enables new applications for data; a minimal sketch of the principles follows below.
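A minimal sketch of these three principles in Python with rdflib (an illustrative tool choice; all URIs below are invented): things get HTTP URIs, statements become triples, and links out to other datasets make related data discoverable.

```python
# Minimal linked-data sketch with rdflib: HTTP URIs name things,
# triples state facts, and links connect datasets. URIs are invented.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import FOAF, OWL

EX = Namespace("http://example.org/id/")

g = Graph()
g.bind("foaf", FOAF)

author = EX["person/jane-smith"]                   # 1) a URI names the thing
g.add((author, RDF.type, FOAF.Person))             # 2) an HTTP URI, so it can be looked up
g.add((author, FOAF.name, Literal("Jane Smith")))
# 3) a link out to another dataset, so related data can be discovered
g.add((author, OWL.sameAs, URIRef("http://dbpedia.org/resource/Jane_Smith")))

print(g.serialize(format="turtle"))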
OpenMinTeD: Making Sense of Large Volumes of Data – openminted_eu
The document discusses making scientific content more accessible and useful through text and data mining. It notes that the global research community generates over 1.5 million new articles per year but many are never read or cited. Emerging solutions like machine reading, understanding and prediction can help structure and mine textual data to extract meaningful insights. The OpenMinTeD project aims to establish an open text and data mining platform and infrastructure for researchers to collaboratively work with scientific sources. It outlines challenges around content, services and processing, as well as the main routes to make content more accessible through metadata, transfer protocols and licensing. The project involves various partners and use cases across domains like scholarly communication, life sciences, agriculture and social sciences.
This document summarizes a session on contributions of Web 2.0 technologies. It discusses tagging, social bookmarking sites like Delicious and LibraryThing, adding tag clouds to library catalogs, and topic maps. Specific topics covered include how tagging differs from controlled subject headings, how Delicious and LibraryThing allow saving and sharing bookmarks and book lists, using tag clouds to represent tags visually, and how topic maps standardize representing and sharing knowledge through topics, associations, and occurrences.
This document summarizes the state of open research data by outlining its evolution over time. It begins with centralized data centers in the 1960s and progresses to more collaborative models of data sharing through community agreements and online supplementary materials. The benefits of open data are discussed, including increased reproducibility and citation advantages for authors who share. While open data is the ideal, achieving 3-star openness according to the 5-star scheme is currently realistic. The future may bring stricter funding and publishing requirements to encourage more widespread data sharing.
The document discusses recommendations for research data and the European Open Science Cloud (EOSC). It promotes making data FAIR (Findable, Accessible, Interoperable, and Reusable) according to the FAIR guiding principles. The EOSC aims to provide a single access point for managing and analyzing research data across disciplines through three layers - a data layer, service layer, and governance layer. The EOSC seeks to enable high performance computing, data fusion across disciplines, big data analytics, and privacy protection by leveraging Member State investments and ensuring the legacy and sustainability of data through bottom-up governance.
The document discusses updates to the PRIDE Cluster project. PRIDE Cluster analyzes mass spectrometry proteomics data stored in the PRIDE database by clustering peptide spectra. The latest implementation clustered over 256 million spectra using Apache Hadoop. This resulted in 28 million clusters, including clusters with inconsistent identifications, clusters linking identified and unidentified spectra, and large clusters of consistently unidentified spectra that could help identify new peptides and post-translational modifications. The PRIDE Cluster provides a public resource for data mining the large collection of proteomics datasets in PRIDE.
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi... – Juan Antonio Vizcaino
The document discusses PRIDE and ProteomeXchange, which are resources that support the deposition of proteomics data to public repositories. PRIDE stores mass spectrometry-based proteomics data, and is one of the repositories that is part of ProteomeXchange, a framework that allows standard submission of proteomics data between major repositories. The document outlines the cultural change in proteomics towards public data sharing, and provides information on submitting proteomics data to PRIDE and accessing data deposited in PRIDE and ProteomeXchange.
Presentation given at the JISC Identity Management: Future Directions Day (http://www.jisc.ac.uk/whatwedo/themes/access_management/federation/federation_events/programmtgjune08.aspx), 30 June 2008.
The document discusses the Names Project, which aims to create a name authority service for UK institutional repositories. It provides background on institutional repositories in the UK and the scope and goals of the Names Project prototype. The prototype involves building a database based on the Functional Requirements for Authority Records (FRAD) data model and creating records for UK institutions and individuals. It will allow individuals to claim and update their data and provide interfaces for repositories and other services to query the database and help users enter consistent metadata across repositories.
FAIR Data and Model Management for Systems Biology (and SOPs too!) – Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and Model Management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the "assets" of data, models, codes, SOPs and so forth. Don't stop reading. Yes, data management isn't likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is multi-faceted, including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimizing burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (the SysMO and ERASysAPP ERA-Nets and the ISBE ESFRI) and national initiatives (de.NBI, the German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Sharing re-usable phylogenetic data: we're not there yet – Ross Mounce
Ross Mounce discusses challenges with sharing phylogenetic data from published studies. Only a small percentage of studies archive their data, and researchers are often unwilling to share data upon request. Mounce developed tools to extract and reformat phylogenetic data from PDFs to make it more accessible and reusable. He received funding to continue this work and develop software to unlock and open phylogenetic literature data.
Linked Open Data in Libraries, Archives & Museums – Jon Voss
The document discusses the growing Linked Open Data (LOD) movement in libraries, archives, and museums (LODLAM). It notes that LODLAM allows these institutions to explore data interoperability both within the cultural sector and more broadly on the web. The document outlines several outcomes of a LODLAM summit, including outreach, education, developing use cases, and examining issues around copyright and licensing of open data. Examples are provided of institutions that have published bibliographic and other cultural data using open licenses.
Challenges in Enabling Mixed Media Scholarly Research with Multi-Media Data i... – roelandordelman.nl
Presentation at the Digital Humanities 2018 Conference, Mexico City, on the development of the Media Suite, an online research environment that facilitates scholarly research using large multimedia collections maintained at archives, libraries and knowledge institutions. The Media Suite unlocks the data on the collection level, item level, and segment level, provides tools that are aligned with the scholarly primitives (discovery, annotation, comparison, linking), and has a 'workspace' for storing personal mixed media collections and annotations, and to do advanced analysis using Jupyter Notebooks and NLP tools.
See the notes for the narrative that goes with the slides.
The document outlines an agenda for a training session on Scratchpads, a website platform for taxonomists. The agenda includes introductions, an overview presentation of Scratchpads and its features, and training course options on basic and advanced use of the platform. The document also provides background on the goals of Scratchpads to enable taxonomy research and publication and to help inventory the world's species.
The document discusses Scratchpads, which are websites for taxonomists to publish and share their research. It describes how Scratchpads allow taxonomists to manage taxonomic data, reference bibliographies, images, phylogenies, character matrices, distribution maps, and specimen records. Over 200 Scratchpad communities have been created, with over 2,500 users publishing over 300,000 pages of content. The ViBRANT project aims to further develop and support Scratchpads as a virtual research environment for taxonomists.
ViBRANT—Virtual Biodiversity Research and Access Network for Taxonomy – Vince Smith
Presented by Dave Roberts and coauthored by Vince Smith at BioIdentify 2010, the Muséum national d'Histoire naturelle (MNHN), Paris, France, 20–22 September 2010.
Laboratories around the world continue to generate immense amounts of data that are non-proprietary and of value to the community. If available, these data could dramatically reduce costs by minimizing rework and ultimately facilitate faster research. High-quality reference data collections of chemical compound dictionaries, properties and spectra have been generated over many decades. With the advent of social networking tools and platforms such as Wikipedia, the community has an opportunity to contribute. The ChemSpider platform hosted by the Royal Society of Chemistry is a compound-centric database with associated data. Already populated with almost 25 million unique compounds, it lets the community deposit and host their own data, and curate and annotate existing data, including data generated in Open Notebook Science efforts. This presentation will provide an overview of progress to date and outline the vision of this community platform for chemistry and of ensuring the longevity of chemistry reference data.
The ChemSpider database is a resource hosted by the Royal Society of Chemistry. With over 28 million unique chemicals in the database linked out to over 400 data sources, the platform provides access to experimental and predicted data (properties, spectra etc.), links to publications, patents and a myriad of other resources. The ChemSpider database has been used as the foundation of a number of other resources for chemists including ChemSpider SyntheticPages, the Learn Chemistry Wiki and the Spectral Game. This presentation will provide an overview of ChemSpider and discuss how chemists can both derive value from and contribute to the content available from the database and its related resources. We will also discuss our view of a future platform for managing personal, institutional and public chemistry in a shared environment.
ContentMine: Open Data and Social Machines – TheContentMine
Published on Nov 13, 2014 by PMR
Scientific information is often hidden or not published properly. The ContentMine is a Social Machine consisting of semantic software and communities of domain expertise; it aims to liberate all scientific facts from the published literature on a daily basis.
The talk, delivered to the Computational Institute, was followed by a hands-on workshop learning how to use the technology and work as a community.
The swings and roundabouts of a decade of fun and games with Research Objects – Carole Goble
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications, delivered virtually to the 17th Italian Research Conference on Digital Libraries.
Towards interoperable archives: the Universal Preprint Service initiative – Herbert Van de Sompel
The document discusses the Universal Preprint Service initiative which aims to promote interoperability between preprint archives. It provides background on existing preprint models and services. The initiative is supported by several organizations and held its first meeting in 1999 to discuss technical recommendations for achieving interoperability between archives.
This document summarizes a presentation on using open-source tools to provide access to scientific literature on climate change and migration. It describes how ContentMine has built tools called "Open Climate Knowledge" to mine scientific articles on climate change from publishers' websites and other open sources. However, most of this literature (50-90%) is currently behind paywalls. The tools allow querying across open-access sources to provide summaries of available literature on topics like the relationship between climate change and human migration. Examples of results from initial queries on this topic are also provided.
The document discusses building an online open database of spectral data to serve as a teaching resource and reduce duplicative work. It describes ChemSpider, a database of chemical structures and properties that also links to some spectral data. The document calls for adding more spectral data from various sources and formats to ChemSpider to build the most comprehensive open resource for spectral data online. It also describes some interactive features like spectral games to help teach and identify spectra.
GeoChronos: An On-line Collaborative Platform for Earth Observation Scientists – GeoChronos
Presentation given by John Gamon at the AGU Fall Meeting in San Francisco on Dec. 14, 2009. The presentation highlights features and supporting technologies of the GeoChronos Platform
Infrastructure for Open Science in Europe - OpenAIRE: the power of reposi... – Pedro Príncipe
This document discusses the power of repositories as infrastructure for open science. It notes that individual repositories have value for their institutions, but that their true value lies in their potential for interconnection to create a unified network providing access to research results. This network requires open access content and interoperability between repositories. OpenAIRE is presented as working to realize this potential through services that support content enrichment, notifications to repositories of relevant research, and usage statistics. Funders are also integrating with OpenAIRE to help monitor open access compliance and the impact of research funding.
Ingredients for Semantic Sensor Networks – Oscar Corcho
The document discusses ingredients for creating a Semantic Sensor Web including an ontology model, URI definition practices, semantic technologies like SPARQL, and mappings to integrate sensor data. It provides an overview of the SSN ontology for describing sensors and observations. Examples are given of querying sensor data streams using SPARQL extensions and translating queries to sensor network APIs using mappings. Lessons on publishing and consuming linked stream data are also discussed.
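As a rough sketch of what querying sensor observations looks like (an invented in-memory example using the newer W3C SOSA/SSN vocabulary, rather than the streaming SPARQL extensions the talk covers):

```python
# Sketch: querying sensor observations with SPARQL over an in-memory
# rdflib graph, using the W3C SOSA/SSN vocabulary (the talk predates
# SOSA; the sensor, values and threshold are invented for illustration).
from rdflib import Graph, Literal, Namespace, RDF

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/sensors/")

g = Graph()
g.bind("sosa", SOSA)
for i, temp in enumerate([18.5, 19.1, 22.4]):
    obs = EX[f"obs/{i}"]
    g.add((obs, RDF.type, SOSA.Observation))
    g.add((obs, SOSA.madeBySensor, EX.thermometer1))
    g.add((obs, SOSA.hasSimpleResult, Literal(temp)))

# Find observations above a threshold.
query = """
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?obs ?value WHERE {
  ?obs a sosa:Observation ;
       sosa:hasSimpleResult ?value .
  FILTER (?value > 19.0)
}
"""
for row in g.query(query):
    print(row.obs, row.value)
```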
The document summarizes the findings of a study on the impact of digitized scholarly resources. It describes various quantitative and qualitative methods used in the study, including webometrics, analytics, log file analysis, interviews, focus groups, and surveys. The study analyzed five digitization projects and found they had positive impacts like improving research and enabling new types of quantitative analysis. Usage varied by project, with some seeing more impact through teaching resources while others saw more impact through computational analysis of materials.
Quo vadis, provenancer? Cui prodest? Our own trajectory: provenance of data... – Paolo Missier
The document discusses provenance in the context of data science and artificial intelligence. It provides bibliometric data on publications related to data/workflow provenance from 2000 to the present. Recent trends include increased focus on applications in computing and engineering fields. Blockchain is discussed as a method for capturing fine-grained provenance. The document also outlines challenges around explainability, transparency and accountability for high-risk AI systems according to new EU regulations, and argues that provenance techniques may help address these challenges by providing traceability of system functioning and operation monitoring.
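To ground the idea of capturing provenance for traceability (a generic W3C PROV-O sketch of my own, not the document's method), a derivation chain can be recorded as triples:

```python
# Sketch: recording provenance as triples with the W3C PROV-O
# vocabulary in rdflib. The entities, activity and agent are
# hypothetical; only the PROV terms themselves are standard.
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/prov/")

g = Graph()
g.bind("prov", PROV)

g.add((EX.raw_data, RDF.type, PROV.Entity))
g.add((EX.clean_data, RDF.type, PROV.Entity))
g.add((EX.training_run, RDF.type, PROV.Activity))

g.add((EX.clean_data, PROV.wasDerivedFrom, EX.raw_data))      # data lineage
g.add((EX.clean_data, PROV.wasGeneratedBy, EX.training_run))  # producing step
g.add((EX.training_run, PROV.used, EX.raw_data))
g.add((EX.training_run, PROV.wasAssociatedWith, EX.data_scientist))

print(g.serialize(format="turtle"))
```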
In this presentation I present the plan to make rare disease data resources findable, accessible, interoperable, and reusable for humans and computers (FAIR). The presentation was made for the IRDiRC conference 2017 in Paris.
In this tutorial we explain the basics of a 'Linked Data and Ontology' approach for combining data, in particular for the study of rare diseases. The approach is motivated by a case study provided by health care researcher Ulrike Braisch. The main take-home lesson is that with this approach the effort for data integration can be substantially lowered, i.e., it can lead to a shorter path to new treatments for (rare) diseases.
The presentation is based on a tutorial given at the RD-Connect/NeurOmics/EURenOmics plenary meeting in Heidelberg, Germany, February 26, 2014. It was made possible by RD-Connect, a European project to support Rare Disease research (http://www.rd-connect.eu).
This document discusses different levels of semantics that can be used when making assertions in nanopublications. Weaker semantics include minted URIs which are machine readable but not machine interpretable. Stronger semantics involve linking concepts to existing ontologies to make assertions more machine interpretable. The document outlines approaches ranging from weakest to strongest semantics, noting tradeoffs between interpretability and difficulty.
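To make the weak-versus-strong contrast concrete (an invented rdflib example, not one from the document): a minted URI is merely a stable name, while mapping the same concepts to existing identifiers and ontology terms makes the assertion machine interpretable.

```python
# Sketch of the weak-vs-strong semantics contrast for a nanopublication
# assertion, in rdflib. The gene/disease URIs and mappings are
# illustrative examples, not taken from the document.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL, SKOS

EX = Namespace("http://example.org/np/")

weak = Graph()
# Weak: freshly minted URIs. The triple parses (machine readable),
# but nothing tells a machine what the terms or the predicate mean.
weak.add((EX.gene_X, EX.related_to, EX.disease_Y))

strong = Graph()
# Stronger: keep the assertion, but map its terms to existing
# identifiers and ontology concepts so machines can interpret them.
strong.add((EX.gene_X, EX.related_to, EX.disease_Y))
strong.add((EX.gene_X, OWL.sameAs,
            URIRef("http://identifiers.org/hgnc/1100")))          # e.g. a gene record
strong.add((EX.disease_Y, SKOS.exactMatch,
            URIRef("http://purl.obolibrary.org/obo/DOID_1612")))  # an ontology term

print(strong.serialize(format="turtle"))
```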
Slides for the Technology Track of ISMB/ECCB 2013 in Berlin on digital publishing, highlighting the Research Object model, Nanopublications, and ISA as means to capture methods and results when research is carried out digitally. This work was supported by the EU Wf4Ever ("workflow forever") project (http://wf4ever-project.org).
This document discusses how workflows can help biologists by allowing them to combine various computational tools and databases. It notes that individual biologists have limited time and computational skills, but can use workflows to access various expertises and resources. Workflows allow biologists to design complex computational experiments and analyze large amounts of data by connecting different services and applications in an automated, repeatable process.
Extended presentation from the enabling technology track of the BBMRI 'BioBanking for Science' conference in Amsterdam, September 2010. Feedback from the audience has been added.
This document introduces Marco Roos and discusses his transition from traditional molecular biology and bioinformatics work to e-science. It describes how e-science approaches can help address challenges in biology by enabling greater data and knowledge sharing, reuse of tools and workflows, and integrated analysis across multiple data types and sources. Examples discussed include semantic web technologies, workflow systems, and proposed e-laboratory platforms to empower scientists with virtual collaborative environments and intelligent assistance. The goal is to help biologists better exploit computational resources and expertise through enhanced and standardized e-science frameworks.
This document summarizes a presentation about using the Taverna workflow system and myExperiment repository for collaborative bioinformatics research. It discusses how Taverna allows researchers to combine multiple computational methods and online data sources into reproducible workflows. The presenter describes their own experiences with early "spaghetti code" approaches to bioinformatics and how e-Science tools now enable more insightful experiments through collaboration and sharing of workflows.
Presentation in support of the AIDA demonstration at the ISMB/ECCB conference in Vienna, 2007. We demonstrated the application of AIDA web services for mining associations of proteins and diseases with an input query, through a text mining workflow implemented in Taverna. The AIDA toolkit combines services for information retrieval, information extraction, and Semantic Web modelling and storage. The services are created by experts in different fields collaborating under the name of 'Adaptive Information Disclosure' in the VL-e project (http://www.vl-e.nl).
The document summarizes the experience of a biologist in adopting an e-science approach to their work. It describes how before e-science, the biologist took an uncoordinated "spaghetti" approach using various tools without a unified strategy. The biologist then explains how adopting e-science principles like collaboration, reusable workflows, and web services helped enhance their work by allowing experts from different domains to combine their expertise. The biologist also reflects on outreach efforts to promote e-science to other researchers.
1. The document discusses how a biologist, Marco Roos, became interested in e-science through his work in molecular and cellular biology, bioinformatics, and data integration projects.
2. Roos describes how e-science allows for collaboration between different experts and disciplines through technologies like workflows, semantic web, and virtual laboratories.
3. Roos emphasizes that e-science should empower scientists by making tools and resources easy to use, share, and build upon so that scientists can focus on scientific problems rather than technical challenges.
The document discusses developments in e-Science and online tools for scientific communities. It describes how electronic lab notebooks, wikis, blogs and workflows can enable collaboration and knowledge sharing. Computational experiments using web services allow combining various experts' tools and data. E-science approaches leverage many minds to generate hypotheses, publish results and enable virtual laboratories.
CWA & SWAT4LS Pitch at DILS2009
1. SWAT4LS 2009: Semantic Web Applications and Tools for Life Sciences. http://www.swat4ls.org/ Amsterdam, Science Park, Friday, 20th of November 2009
2. Topics of interest: Standards, Technologies, Tools for the Semantic Web; Systems for a Semantic Web for Bioinformatics; Existing and prospective applications of the Semantic Web for Bioinformatics
3. Deadlines: Tue 1st September: Submission opens; Mon 28th September: Paper submission; Fri 23rd October: Communication of acceptance; Fri 6th November: Revised paper; Fri 20th November: SWAT4LS Workshop
8. What is CWA? A Forum: to unite stakeholders to share complex Life Science data in a new way, through triples. A Facilitator: to promote the development of triple content services and of triple professional services. A Facility: a "warehouse", distributor and agent for triples on behalf of contributors and users. Develop our own services.
10. Start Small. Incremental Value. One thing well. Triples. Triples. Triples. Jam today. More Jam Tomorrow.
11. Some initial ideas: Focus on what we talk about; Focus on factoids; Machine readable for use in our applications; Bottom-up standardization / certification
12. Where significant value is added… suggestions: …triples represent an economic value and can be charged for. Curated triples: charges at the discretion of the curator. Inferred triples: charges at the discretion of the inferrer. Disambiguated triples: charges at the discretion of the disambiguator. Redundancy-removed triples: charges at the discretion of the redundancy remover. Observed triples can be charged for if they are taken from proprietary sources – by or on behalf of the proprietors; peer-reviewed literature: charges at the discretion of the rights-holder.
24. Participation on the journey… The organisation governs community-driven creation of the concept web. Think about: Is the CWA useful to you / your organisation? Would you subscribe? Why / why not? How would you like to participate? Would you help us develop the Alliance? You rule!