Connections that work: Linked Open Data demystified, by Jakob
Keynote given 2014-10-22 at the National Library of Finland at Kirjastoverkkopäivät 2014 (https://www.kiwi.fi/pages/viewpage.action?pageId=16767828) #kivepa2014
This document summarizes a presentation given by Naomi Dushay to programmers about collaborations between catalogers and programmers. The presentation discusses how catalogers and programmers often talk past each other due to differences in perspectives and priorities. It suggests that finding catalogers who are frustrated with current MARC-based systems and interested in new approaches could be a good starting point for collaboration. The presentation also acknowledges challenges in managing library data in a more distributed environment with data from diverse sources and suggests that demonstrating the costs of no change will be greater than the costs of change is important. It concludes by asking if catalogers and programmers can do more than just get along and get to know one another better.
The document discusses open data and how it can be used to understand the world. It describes how open government data and open data in general can increase transparency and create value when more people can access and utilize data. The document outlines possible paths for organizations to publish open data, factors for success, and how to ensure quality. It also discusses combining data from different sources and using standards to better integrate and link open data.
This document discusses the evolution of the web from a web of documents to a web of linked data. It outlines the principles of linked data, which involve using URIs to identify things and linking those URIs to other URIs so that machines can discover more data. RDF is introduced as a standard data model for publishing linked data on the web using triples. Examples of linked data applications and datasets are provided to illustrate how linked data allows the web to function as a global database.
This document discusses the evolution of the web from a network of documents to a network of linked data. It begins by describing the original web of documents, which organized information in silos and had implicit semantics. The document then introduces the concept of the semantic web and linked data, which structures information as interconnected data using explicit semantics. It provides examples of how linked data can be represented using RDF triples and describes the principles of linked data for publishing and connecting data on the web. Finally, it discusses characteristics and examples of linked data applications.
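To make the triple idea above concrete, here is a minimal sketch using Python's rdflib library. The organization URI and the properties chosen are illustrative assumptions, not taken from any of the presentations summarized here; only the overall pattern (things get URIs, statements are triples, links point to URIs in other datasets) follows the principles described above.

```python
# Minimal linked-data sketch with rdflib (illustrative URIs and properties).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
org = URIRef("http://example.org/id/national-library")      # hypothetical URI for a "thing"
helsinki = URIRef("http://dbpedia.org/resource/Helsinki")    # URI maintained by another dataset

g.add((org, RDF.type, FOAF.Organization))                    # one triple: subject, predicate, object
g.add((org, FOAF.name, Literal("National Library of Finland")))
g.add((org, FOAF.based_near, helsinki))                      # link out so machines can discover more data

print(g.serialize(format="turtle"))                          # serialize the graph as Turtle
```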
Linked Open Europeana aims to leverage Europe's cultural heritage through linked open data. It presents Europeana, an online library of European culture. Europeana started in 2005 and now contains over 15 million objects. Linked open data extends the web by adding semantics through RDF triples and linking disparate data sources. Europeana's data model EDM aims to publish cultural heritage data as linked open data to enable novel uses like knowledge generation by combining data with the rest of the semantic web. For Europeana to fully realize its potential, its data needs to be openly available as linked open data under an open license.
Open Data Mashups: linking fragments into mosaics, by phduchesne
This document discusses open data mashups and linking data fragments into mosaics. It covers linked data standards and representation formats like RDF and JSON-LD. It also discusses formalizing URI fragments for different media types and dimensions like text, time, spatial coordinates and more. The presentation demonstrates a mosaics model for building documents from heterogeneous data fragments and embedding them while maintaining their original contexts. Potential use cases include disaster management, open science, fact checking and data journalism.
Choices, modelling and Frankenstein Ontologies, by benosteen
This document discusses an ontology project at the University of Bristol. It addresses issues with representing research information, which changes frequently. The project uses a combination of ontologies like FOAF, Bio, and Dcterms to model "Things" like people and publications. Context about these Things, like time periods of validity, is represented using named graphs. The current implementation stores this information in a Fedora object store with RDF serialization. The project aims to gather relevant domain taxonomies and provide APIs for researchers to maintain them, taking a "Frankenstein" approach of combining relevant standards. It notes some design flaws of the CERIF interchange format compared to the linked data approach taken.
Introducción a la web semántica - Linkatu - irekia 2012, by Alberto Labarga
This document provides an introduction to linked data journalism and the use of semantic web technologies like RDF, SPARQL, and linked open data. It explains key concepts such as using RDF triples to represent knowledge, querying linked data with SPARQL, and examples of data sources like DBpedia and data marketplaces.
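As a concrete illustration of querying linked data with SPARQL, here is a small sketch that queries DBpedia's public endpoint. It assumes the SPARQLWrapper Python package; the specific query is just an example, not one drawn from the presentation.

```python
# Query DBpedia's public SPARQL endpoint for the capital of Finland (illustrative query).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?capital WHERE {
        <http://dbpedia.org/resource/Finland> <http://dbpedia.org/ontology/capital> ?capital .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["capital"]["value"])   # e.g. http://dbpedia.org/resource/Helsinki
```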
This document discusses how linking open data can make data more valuable and useful. It recommends following semantic web and linked data practices like publishing data using RDF, linking entities to related datasets, and maintaining and improving links over time. Linking data allows queries across datasets, facilitates data integration, and enables new applications by connecting related information. The key is to link data in a way that answers questions and benefits both data publishers and users, and to iteratively enhance link quality and coverage.
The document discusses how linked data provides an improved way of publishing and connecting data on the web compared to traditional methods. It explains the basic principles of linked data, including using URIs to identify things and linking those URIs to other related data. This allows data to be integrated and explored more easily across different sources. As an example, it describes how linked data allows research information to be browsed and mapped in a live, interconnected manner rather than separate static datasets.
DBpedia: A Public Data Infrastructure for the Web of Data, by Sebastian Hellmann
The document discusses the DBpedia project, which extracts structured data from Wikipedia to build a multilingual knowledge graph. It describes DBpedia's goals of making this data openly available and supporting its community. The DBpedia Association is being formed as a non-profit to oversee the infrastructure and support contributors. Funding will come from donations and sponsorships. Upcoming events include the DBpedia Community Meeting coinciding with the SEMANTiCS conference in September.
This document summarizes a presentation about transforming the web together through open data and standards. It discusses the World Wide Web Consortium (W3C) and its role in developing open web standards. It provides examples of linked open data projects including data.gov and mashups of government data. Specific open data portals for cities like Chicago are highlighted. Semantic web technologies like RDF, RDFa, and SPARQL are referenced as working groups at W3C. Links are included to further resources on linked open data basics and news portals. The presentation concludes with mentioning Peter Mika from Yahoo discussing open data.
1) BIBFRAME is a new linked data model being developed by the Library of Congress to replace MARC as the bibliographic framework for libraries in a post-MARC world.
2) It aims to make networked data central to libraries and increase interconnectivity between cultural heritage organizations.
3) The initial phase involved developing the BIBFRAME model and mappings between it and MARC/RDA, as well as prototypes for translation and interfaces. Early experimenters are now helping refine and test the model.
Biblissima's prototype on Medieval Manuscripts Illuminations and their Context, by Equipex Biblissima
Presentation given at the "SW4SH Workshop" organised as part of ESWC 2015 (Portoroz, Slovenia, 1 June 2015) by Stefanie Gehrke (Biblissima metadata coordinator), Eduard Frunzeanu, Pauline Charbonnier and Marie Muffat.
The document discusses the lack of discoverability of datasets compared to other scholarly works like journal articles and books. It notes that specialist bibliographic databases are better than Google for finding journal articles. To improve dataset discoverability, the OECD has given datasets equal bibliographic status as other scholarly works by adding DOIs, inclusion in aggregation platforms, alerts, and MARC records. This has resulted in over 700 datasets and 10,000 tables being discoverable through the same systems as ebooks, ejournals, and other works.
Linked Data in Production: Moving Beyond Ontologies, by David Newbury
Presented at the Coalition of Networked Information (CNI) Spring 2024 Project Briefings.
Over the past six years, Getty has been engaged in a project to transform and unify its complex digital infrastructure for cultural heritage information. One of the project’s core goals was to provide validation of the impact and value of the use of linked data throughout this process. With museum, archival, media, and vocabulary data in production and others underway, this session shares some of the practical implications (and pitfalls) of this work—particularly as it relates to interoperability, discovery, staffing, stakeholder engagement, and complexity management. The session will also share examples of how other organizations can streamline their own, similar work going forward.
Portland Common Data Model (PCDM): Creating and Sharing Complex Digital Objects, by Karen Estlund
Interoperability has long been a goal of digital repositories, as demonstrated by efforts ranging from OAI-PMH, to attempts to create common APIs such as IIIF, to community based metadata standards such as Dublin Core. As repositories have matured and the desire to work more collaboratively and reuse source code has grown, the need for a common understanding of how digital objects are conceived and represented is essential. The Portland Common Data Model (PCDM) is an effort to create a shared, linked data-based model for representing complex digital objects. Starting in the Hydra community but quickly expanding to include contributors from Islandora, Fedora, the Digital Public Library of America, and other repository-related service communities, PCDM is the result of over sixty practitioners’ contributions to a shared model for structuring digital objects. The process was holistic and rooted in concrete use-cases. An initial in-person meeting in Portland, Oregon in fall 2014 resulted in the release of the first draft of the data model for which it is named. With this shared model, we intend to further the goal of interoperability across repositories and related technologies. This presentation will review the origins of PCDM, provide a general technical overview, update on current status, and forecast future work.
This document provides an overview of an introductory course on Web Science. It discusses key topics including:
1. What is Web Science and why it matters as an area of scientific study.
2. Key aspects of web architecture like URIs, URLs, HTTP, and file formats.
3. Methods of measuring the web through network analysis and studying structures like the blogosphere and social networks.
4. The Web Science Method which takes an iterative, mixed methods approach of engineering, measuring, and analyzing the web.
5. The social aspects of the web and challenges of incorporating human behavior.
6. Issues of web governance, security, and standards setting.
From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life ..., by Ari Berman
Talk given at the Leverage Big Data '14 Event in May 2014.
Big data has arrived in the life science research domain, and it has caught researchers and IT professionals alike off-guard. Workstations, Excel, and even small clusters under people's desks are no longer sufficient to meet the data storage and processing needs of modern biological research techniques. Data is being produced cheaply and rapidly at unprecedented rates in academic, commercial and clinical laboratories while budgets in those spaces continue to be slashed. Despite the reduced budgets, it is predicted that 25% of all researchers will require HPC to analyze their data in the coming year. Research organizations are starting to realize they have to run to catch up, or face failure in the wake of old-school IT infrastructures and policies. IT organizations have been forced to get creative and build amazing infrastructures for pennies, or fail in the wake of the user pressure being generated from the laboratories. Converged infrastructure is the present and the future for biomedical, clinical, and life sciences research. In this talk, I'll cover the IT challenges in life sciences, how and where they are being met, and talk about the near-future trends in IT infrastructure, services, and informatics and how they will affect medical discoveries in the next 5-10 years.
Keynote talk to LEARN (LERU/H2020 project) on research data management. Emphasizes that the problems are cultural, not technical; promotes modern approaches such as Git and continuous integration; announces DAT; asserts that the Right to Read is the Right to Mine; and calls for widespread development of content mining (TDM).
Published on Jan 29, 2016 by PMR
The Culture of Research Data, by Peter Murray-Rust (LEARN Project)
1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Peter Murray-Rust, ContentMine.org and University of Cambridge
A presentation I gave at the 2018 Molecular Med Tri-Con in San Francisco, February 2018. This addresses the general challenge of biomedical data management, some of the things to consider when evaluating solutions in this space, and concludes with a brief summary of some of the tools and platforms in this space.
Defrosting the Digital Library: A survey of bibliographic tools for the next ..., by Duncan Hull
After centuries with little change, scientific libraries have recently experienced massive upheaval. From being almost entirely paper-based, most libraries are now almost completely digital. This information revolution has all happened in less than 20 years and has created many novel opportunities and threats for scientists, publishers and libraries.
Today, we are struggling with an embarrassing wealth of digital knowledge on the Web. Most scientists access this knowledge through some kind of digital library; however, these places can be cold, impersonal, isolated, and inaccessible. Many libraries are still clinging to obsolete models of identity, attribution, contribution, citation and publication.
Based on a review published in PLoS Computational Biology, http://pubmed.gov/18974831 this talk will discuss the current chilly state of digital libraries for biologists, chemists and informaticians, including PubMed and Google Scholar. We highlight problems and solutions to the coupling and decoupling of publication data and metadata, with a tool called http://www.citeulike.org. This software tool exploits the Web to make digital libraries “warmer”: more personal, sociable, integrated, and accessible places.
Finally issues that will help or hinder the continued warming of libraries in the future, particularly the accurate identity of authors and their publications, are briefly introduced. These are discussed in the context of the BBSRC funded REFINE project, at the National Centre for Text Mining (NaCTeM.ac.uk), which is linking biochemical pathway data with evidence for pathways from the PubMed database.
From the Benchtop to the Datacenter: HPC Requirements in Life Science Research, by Ari Berman
Ari Berman discussed the growing requirements for high performance computing (HPC) in life science research. Large amounts of data are being generated through techniques like next generation sequencing, imaging, and medical scanning. However, infrastructure is struggling to keep up with the rapid changes in instrumentation and data volumes. Proper HPC design requires understanding specific use cases to solve problems rather than just achieve performance. Life scientists can be tough on systems and want everything immediately with limited budgets.
1. The document discusses the history and future of semantic web technologies, including lessons learned and trends. It notes that the semantic web's strength is in data aggregation rather than data management.
2. Two scenarios involving expressing claims in RDFa and linking from a homepage are presented, showing how trust can come from linked information.
3. Recent and emerging trends in user interfaces, search engines, and services are moving towards a more machine-readable web where pages make claims and datasets are interconnected.
This presentation was provided by Sebastian Kohlmeier of The Allen Institute for AI (AI2), during the NISO event "Transforming Search: What the Information Community Can and Should Build." The virtual conference was held on August 26, 2020.
This presentation was provided by Huajin Wang of Carnegie Mellon University, during the NISO Hot Topic Virtual Conference "Text and Data Mining." The event was held on May 25, 2022.
The document discusses moving from open access to open data in scientific publishing. It outlines the social contract of science which involves validation, dissemination and further development of research. When these principles are not followed, it can constitute scientific malpractice by various stakeholders. The presentation advocates for data journals as an incentive that can help recognize data as a valid research output and encourage data sharing by providing metrics like citations. It provides details on what constitutes a data paper and reviews factors like peer review that are important for data journals to be successful.
I want to know more about computerized text analysis, by Luke Czarnecki
This document provides an overview of computerized text analysis and discusses ethical considerations related to using social media data for social science research. It begins with an introduction to the speaker's research analyzing ideological differences through language use. It then covers the history and current capabilities of computerized text analysis. A major theme is the need for rigorous ethics applications and approval processes as technology has outpaced philosophy. The document concludes with a demonstration of basic functions and capabilities in R for collecting, preprocessing, and analyzing text data.
Facilitating Collaborative Life Science Research in Commercial & Enterprise E..., by Chris Dagdigian
This is a talk I put together for a http://www.neren.org/ seminar called "Bridging the Gap: Research Facilitation". Tried to give a biotech/pharma view for a mostly academic audience.
Similar to ACS CINF Luncheon talk (Boston 2018)
Mixtures QSAR: modelling collections of chemicals, by Alex Clark
This document discusses representing and modeling chemical mixtures. It proposes a new data format called Mixfile or MInChI to hierarchically define mixtures and their components, including concentrations. This format aims to support cheminformatics applications like property prediction. Examples are given modeling theophylline solubility and gas absorption using mixture data. The document also describes applying similar methods to model polymer entropy of mixing using a spreadsheet dataset converted to the mixtures format. It concludes that defining mixtures in digital formats will enable greater analysis, modeling and use of mixture data.
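The actual Mixfile/MInChI schema is not reproduced in the summary above; the sketch below is only a hypothetical illustration of the hierarchical idea it describes (components, optionally nested, each with a structure and a concentration), written as a plain Python dictionary rather than the real format.

```python
# Hypothetical nested-dictionary illustration of a hierarchical mixture description.
# This is NOT the real Mixfile/MInChI schema, just the general idea of components
# with structures and concentrations nested inside a mixture.
saline_buffer = {
    "name": "phosphate-buffered saline (illustration)",
    "contents": [
        {"name": "water", "smiles": "O", "concentration": {"value": 97, "unit": "%w/w"}},
        {"name": "sodium chloride", "smiles": "[Na+].[Cl-]",
         "concentration": {"value": 0.9, "unit": "%w/v"}},
        {
            "name": "phosphate components",
            "contents": [
                {"name": "disodium hydrogen phosphate", "concentration": {"value": 10, "unit": "mM"}},
                {"name": "potassium dihydrogen phosphate", "concentration": {"value": 1.8, "unit": "mM"}},
            ],
        },
    ],
}

def walk(component, depth=0):
    """Print the mixture hierarchy with indentation."""
    print("  " * depth + component["name"])
    for child in component.get("contents", []):
        walk(child, depth + 1)

walk(saline_buffer)
```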
Mixtures InChI: a story of how standards drive upstream products, by Alex Clark
This document discusses the development of Mixtures InChI (MInChI), a standard for representing chemical mixtures in a machine-readable format. MInChI was developed to address the lack of standards for mixture informatics and interoperability. The document outlines the development of open source tools to generate and edit MInChI notation, as well as efforts to build a community and integrate MInChI into commercial products and databases to enable widespread use and generation of mixture data. Future work discussed includes finalizing the MInChI specification, extending it to additional chemical entities, developing associated properties and metadata, and implementing MInChI at large scale.
Mixtures as first class citizens in the realm of informatics, by Alex Clark
Presented at Cambridge (UK) cheminformatics meeting, February 2021. Mixtures of chemicals are underutilised from an informatics point of view, and this presentation shows some of the work done by Collaborative Drug Discovery, IUPAC and InChI Trust to remedy this.
See recording: https://www.youtube.com/watch?v=0ILc0owuEzQ&list=PLfj_gc4RCduuwv9p8lh2xS1EhQ3p_Nd9S&index=1 ... my part starts at 1:05:00
Mixtures: informatics for formulations and consumer products, by Alex Clark
The document proposes standards for representing mixtures in a machine-readable format. It introduces Mixfile and MInChI (Mixtures InChI) as hierarchical and concise formats for describing mixtures. Examples of formulations are provided to demonstrate how components, concentrations, and metadata can be encoded. Potential applications of the standards are discussed, such as enabling sophisticated searches of mixture data from publications and vendors to facilitate properties prediction and hazards assessment. Adoption of the standards could help ensure the longevity and sharing of mixture data.
Chemical mixtures: File format, open source tools, example data, and mixtures..., by Alex Clark
This document discusses representing chemical mixtures using an open format called Mixfile. It proposes Mixfile as a standard format for mixtures, analogous to Molfile for individual molecules. Tools were created to edit and manipulate Mixfiles. Over 5,600 real-world mixture examples were extracted from text and represented in the Mixfile format. A MInChI notation was also defined as a condensed representation of mixtures. Future work is proposed to integrate mixture definitions and lookups into electronic lab notebooks and improve automated extraction of mixture information from text.
Bringing bioassay protocols to the world of informatics, using semantic annot..., by Alex Clark
This document discusses bringing bioassay protocols into the world of informatics by using semantic annotations. It describes how measurements from bioassays contain many details that are usually only available as text, and outlines an approach using ontologies, natural language processing, and machine learning to extract this information and make it accessible for searching, comparing datasets, and identifying trends. The goal is to make all bioassay protocol data machine readable by developing common templates and annotation standards that can be applied to existing and new assay data sources.
Autonomous model building with a preponderance of well annotated assay protocols, by Alex Clark
Combining large amounts of publicly available structure-activity data with assays that have carefully curated annotations opens the door to a number of ways to analyze the data behind the scenes. Combining fully machine readable input for a diverse variety of projects with modelling techniques that can be used without fussy parametrization allows models to be created and updated whenever new data arrives. Predictions from these models can be integrated into normal searching and visualization workflows, without any need for the user to opt-in or make extra decisions. This approach is novel and different from the way structure-activity models are normally deployed: useful predictions can be presented ubiquitously with literally zero additional work on behalf of the user. We will present our efforts to date regarding ways to both passively and actively draw attention to important drug discovery trends while exploring compounds and assays.
Representing molecules with minimalism: A solution to the entropy of informatics, by Alex Clark
Cheminformatics as we know it is possible because so many molecular structures can be represented with datastructures and rules that are at first glance quite trivial. This first impression is highly misleading, since even within supposedly well behaved domains, edge cases arising from issues such as resonance, tautomerization, symmetry and stereochemistry - to name but a few - quickly add up. To supplement these genuine challenges, there is a whole additional class of problems caused by the mismatch between chemists' understanding of molecules and the datatypes that are necessary to capture a structure for informatics purposes. This line is blurred by the convenience of representing structures in a form that is very closely related to the diagram styles that have been in use since the dawn of chemistry. There are currently four major approaches to structure representation: connection tables (e.g. MDL Molfile), sketches (e.g. ChemDraw), canonical strings (e.g. SMILES and InChI) and atomic models (numerous 3D formats). Not only do all of these approaches have valid use cases, but they are deceptively incompatible with each other, even when addressing identical needs. Almost without exception, format conversions are not commutative, and every translation involves losing some amount of data. Given that recording chemical structures in machine readable form has become such a critical part of scientific research, it is essential to define a fundamental representation that captures the key structural definition asserted by the experimental chemist, for a broad and useful range of molecules, and ideally in a way that is closely related to visual drawing mnemonics. The number of data concepts needed to satisfy these conditions is quite small, and is mostly satisfied by the most commonly used subset of the venerable MDL Molfile format. This presentation will discuss how this subset, with a few minor corrections and clarifications, can and should be used as the reference standard for molecules, and how the informatics community can benefit from having well defined standards.
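As a small illustration of the representation families mentioned above, the sketch below round-trips one structure between a canonical string (SMILES), a connection table (MDL Molfile), and InChI. It assumes the RDKit toolkit is installed; InChI output additionally requires RDKit's InChI support. This is a generic example, not code from the presentation.

```python
# Round-trip one structure between representation families with RDKit:
# canonical string (SMILES) -> connection table (MDL Molfile) -> InChI.
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, parsed from a SMILES string

print(Chem.MolToSmiles(mol))      # re-canonicalized SMILES
print(Chem.MolToMolBlock(mol))    # MDL Molfile-style connection table (V2000)
print(Chem.MolToInchi(mol))       # InChI (available when RDKit is built with InChI support)
```

As the paragraph above notes, each of these conversions can lose information (drawing layout, sketch-level intent, query features), which is why a well defined reference subset matters.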
Presentation to the EPA (August 2016) about the BioAssay Express project, from Collaborative Drug Discovery. Describes the history and potential of the project, with the intention of opening a dialog about incorporating EPA toxicity data.
SLAS2016: Why have one model when you could have thousands? by Alex Clark
Society for Laboratory Automation & Screening, San Diego, January 2016. Presented by Dr. Alex M. Clark. Describes the use of open data resources (ChEMBL) to build target-activity models for drug discovery and toxicity prediction, on a massive scale, using a fully automated process. Concludes with a demo of the PolyPharma app, which shows how these models can be used for prospective drug discovery.
The anatomy of a chemical reaction: Dissection by machine learning algorithms, by Alex Clark
This document discusses using machine learning algorithms to analyze chemical reaction data. It describes how current reaction reporting formats are not well-suited for computational analysis. A more structured reporting format is proposed to fully describe reactions in a digitally friendly way, including specifying reactants, products, quantities, yields, and metrics like atom efficiency. This structured data would allow modeling of reaction substitutability and enable large-scale machine learning of chemical transformations.
Compact models for compact devices: Visualisation of SAR using mobile apps, by Alex Clark
Presented at American Chemical Society meeting, Boston, 2015. Describes how cheminformatics algorithms and visualisation interfaces have advanced on mobile apps to cover a diverse variety of functionality, increasingly calculated on the device itself rather than deferring to a web service. Culminates in a demo of the PolyPharma app prototype (see http://cheminf20.org/2015/08/06/the-polypharma-app-a-mash-up-of-ideas-and-technology)
Green chemistry in chemical reactions: informatics by design, by Alex Clark
Chemical informatics technology can be of assistance to chemists for describing reactions in numerous ways, including calculating green chemistry metrics such as process mass intensity, E-factor and atom economy. To facilitate this, chemical reactions have to be described in more precise detail than is the norm for most chemists. There are also numerous practical ways to add more green chemistry functionality to lab notebooks, such as enumerating searchable reaction transforms for environmentally favourable reactions, automatically looking up toxicity and hazard information, and others which are mentioned in the slides.
This presentation was given at the Green Chemistry & Engineering conference in 2015 (American Chemical Society Green Chemistry Institute).
Green chemistry is an important subject that needs to be a part of every chemist's education, as well as a part of the daily routine of the professional synthetic chemist. This talk describes how a new app can be used to bring green chemistry metrics to reaction descriptions, once they are captured in a proper cheminformatics format. It also describes some of the additional data resources that can be incorporated into the user experience, and how this helps both students and professionals.
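The metrics named in the two descriptions above have simple definitions, so a short worked sketch may help: atom economy compares the product's molecular weight to the combined molecular weight of the reactants, while E-factor and process mass intensity compare waste and total material input, respectively, to the mass of isolated product. The numbers below are hypothetical, not taken from the presentation.

```python
# Green chemistry metrics with hypothetical numbers (not from the presentation).

def atom_economy(product_mw: float, reactant_mws: list[float]) -> float:
    """Atom economy (%) = MW(product) / sum of MW(reactants) * 100."""
    return 100.0 * product_mw / sum(reactant_mws)

def e_factor(total_mass_in_g: float, product_mass_g: float) -> float:
    """E-factor = mass of waste / mass of product."""
    return (total_mass_in_g - product_mass_g) / product_mass_g

def process_mass_intensity(total_mass_in_g: float, product_mass_g: float) -> float:
    """PMI = total mass of all inputs (including solvents) / mass of product."""
    return total_mass_in_g / product_mass_g

# Hypothetical esterification: acetic acid (60.05) + ethanol (46.07) -> ethyl acetate (88.11) + water
print(round(atom_economy(88.11, [60.05, 46.07]), 1))   # ~83.0 %
print(round(e_factor(250.0, 44.0), 1))                  # 4.7, assuming 250 g total input, 44 g product
print(round(process_mass_intensity(250.0, 44.0), 1))    # 5.7 under the same assumptions
```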
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014), by Alex Clark
Mobile apps for cheminformatics are quite powerful on their own, but can be significantly boosted by connecting them with cloud-hosted functionality. This talk explores the range of functionality that can be covered simply by making use of apps with stateless webservices, i.e. anonymous access without persistent data.
Building a mobile reaction lab notebook (ACS Dallas 2014), by Alex Clark
This document discusses building a mobile electronic lab notebook focused on chemical reactions called the Green Lab Notebook. It would allow users to draw chemical structures, balance reactions, and calculate quantities, yields, and green metrics. Key features include digitally capturing reaction data, prioritizing computer-friendly data structures and intuitive workflows, and linking to external databases for solvent data, sustainable feedstocks, and curated green reaction transforms. The goal is to facilitate recording, analyzing, and promoting the reuse of experimental reaction data in a sustainable chemistry context.
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013, by Alex Clark
Presented at the German Chemoinformatics Conference in Fulda, 2013, entitled "Putting together the pieces: building a reaction-centric electronic lab notebook for mobile devices".
Cheminformatics workflows using the mobile + cloud platform
Presentation by Dr. Alex M. Clark of Molecular Materials Informatics at the NETTAB 2013 meeting in Venice, Italy. The presentation introduces the significance of mobile apps in science, and the scope of their capabilities in chemical structure informatics. The bulk of the talk describes an account of a preliminary workflow using open science data to search for viable leads for a cure for tuberculosis. The workflow described makes use of a combination of mobile, cloud and conventional desktop-based technology, all stitched together by facile communication, sharing and collaboration features.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' on April 22, 2024.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste..., by Sérgio Sacani
Context. With a mass exceeding several 10^4 M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10^-8 photons cm^-2 s^-1. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
ESR spectroscopy in liquid food and beverages, by PRIYANKA PATEL
With an increasing population, people rely more on packaged foodstuffs, and packaging requires that the food be preserved. Various treatment methods exist for preserving food, and irradiation is one of them; it is among the most common and least harmful preservation methods because it does not alter the food's essential micronutrients. Although irradiated food does not harm human health, quality assessment is still required to give consumers the necessary information about the food. ESR spectroscopy is a sophisticated way to investigate food quality and the free radicals induced during food processing. The ESR spin-trapping technique is useful for detecting highly unstable radicals in food, and the antioxidant capacity of liquid foods and beverages is mainly measured using this technique.
The debris of the ‘last major merger’ is dynamically young, by Sérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Authoring a personal GPT for your research and practice: How we created the Q..., by Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
This MS Word-generated PowerPoint presentation covers the major details of the micronucleus test: its significance and the assays used to conduct it. The test is used to detect micronucleus formation inside the cells of nearly every multicellular organism. Micronuclei form during chromosome separation at metaphase.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu... - Scintica Instrumentation
Targeting Hsp90 and its pathogen orthologs with tethered inhibitors as a diagnostic and therapeutic strategy for cancer and infectious diseases, with Dr. Timothy Haystead.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub... - Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free-text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersive learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
The technology uses reclaimed CO₂ as the dyeing medium in a closed-loop process. When pressurized and heated above its critical point, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has very high solvent power, allowing the dye to dissolve easily.
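The 'supercritical' condition is simply a threshold on temperature and pressure: CO₂'s critical point is roughly 31 °C and 74 bar. A minimal Python sketch of that check follows; the function name and example conditions are illustrative, not vendor figures.

# Minimal sketch (illustrative, not vendor code): check whether process
# conditions put CO2 above its critical point (~31.0 degC, ~73.8 bar).

CO2_TCRIT_C = 31.0     # critical temperature of CO2, degrees Celsius
CO2_PCRIT_BAR = 73.8   # critical pressure of CO2, bar

def is_supercritical_co2(temp_c: float, pressure_bar: float) -> bool:
    """True when both temperature and pressure exceed the critical point."""
    return temp_c > CO2_TCRIT_C and pressure_bar > CO2_PCRIT_BAR

# Hypothetical dyeing conditions, chosen only to exercise the check
print(is_supercritical_co2(temp_c=120.0, pressure_bar=250.0))  # True
print(is_supercritical_co2(temp_c=25.0, pressure_bar=60.0))    # False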
2. chemical informatics
Background
◈ Big Data: more like Large Data (+ inconvenience)
◈ We can have aspirations
◈ For any established subset of chemistry: vast amounts of data exist...
◈ ... but not in a way computers can use it
3. chemical informatics
History
◈ Early 20th century: science was a small town, everyone knew each other
◈ The dead-tree model of publishing scaled nicely
◈ 1990s: my grad advisor (W.R. Roper) knew all the players in organometallics, read everything
◈ 2000s: my postdoc advisor (C.A. Reed) opined we were drowning in peer-reviewed literature
4. chemical informatics
Grains of sand on a beach
◈ From my graduate research...
◈ ... and a hundred more much like it.
◈ Pure academic research: to see if we can
◈ Published in respectable journals: a few citations, mostly forgotten in dusty tomes & PDF files
◈ All that time & effort & funding... for what?
5. chemical informatics
To be a datapoint
◈ We need to rethink the target audience: it's not humans... it's machines
◈ They can handle the scale...
◈ ... but they cannot interpret the input data.
◈ Every unit of science has potential value, but no way to be sure when or to whom
◈ Nobody can read it all, but the solution is not to publish less
6. chemical informatics
Digital trees
◈ Computers for writing & reading since 1990s
◈ Yet we use tech to pretend we're with Newton & friends in early days of the Royal Society
◈ Each step is digital, but the process is modelled on typewriters and wood carvings
◈ What little data existed in the intermediate documents is destroyed during the final production
7. chemical informatics
Progress?
◈ Most chemical data is siloed and opaque
◈ Drug discovery is highly motivated: there are structure-activity datasets that are free, large, useful and accurate (previously: pick zero)
◈ Now: ChEMBL, PubChem & others (SAR data; see the retrieval sketch below)
◈ Improved accessibility:
▷ open access is growing (still not data, though)
▷ APIs to the patent literature
▷ FAIR data movement: building steam
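To make 'free, large, useful' concrete, here is a minimal retrieval sketch using PubChem's PUG REST API via the requests library; the compound name is an arbitrary example, and ChEMBL exposes a comparable REST API and Python client. A sketch under those assumptions, not part of the original slides:

# Minimal sketch: pull machine-readable properties from PubChem PUG REST.
# The compound name is an arbitrary example; error handling is deliberately thin.
import requests

name = "imatinib"
url = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
    f"{name}/property/CanonicalSMILES,MolecularFormula,MolecularWeight/JSON"
)
resp = requests.get(url, timeout=30)
resp.raise_for_status()
props = resp.json()["PropertyTable"]["Properties"][0]
print(props["CanonicalSMILES"], props["MolecularFormula"], props["MolecularWeight"])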
8. chemical informatics
Dark data
◈ Almost all published chemical research is visible only to humans, and effectively dark to machines
◈ This is almost as true now as it was in 1700
◈ Solutions fall into two distinct categories:
▷ fix the past: curate old publications into descriptive data formats - machine learning and text mining can help, but expensive expert humans are still needed
▷ fix the present (+ future): publish new data in file formats that target both machines and humans - needs better tools and different attitudes (see the sketch below)
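'Fix the present' mostly means capturing a structure in machine-parsable form at the moment it is written down. A minimal sketch with the open-source RDKit toolkit (the SMILES string is an arbitrary example, not data from the talk):

# Minimal sketch: one structure, several machine-readable serializations.
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as an example
print(Chem.MolToSmiles(mol))    # canonical SMILES
print(Chem.MolToInchi(mol))     # InChI
print(Chem.MolToInchiKey(mol))  # InChIKey, a convenient lookup key
print(Chem.MolToMolBlock(mol))  # molfile (connection table) text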
9. http://molnote.com
Case 1: Figures
◈ Chemists painstakingly draw molecules & reactions
◈ Interpretability is high for other chemists...
◈ ... and surprisingly low for machines.
10. http://molnote.com
Cheminformatics first
◈ Similar amount of effort to draw fully defined content
◈ e.g. Molecular Notebook
◈ Draw structures & reaction components for computers
◈ Let algorithms do the rendering... (see the sketch below)
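'Let algorithms do the rendering' can be as simple as storing the connection table and generating depictions on demand. A minimal sketch with RDKit; the SMILES and file name are illustrative, and this is not Molecular Notebook's own code:

# Minimal sketch: keep the structure as data, render a depiction on demand.
from rdkit import Chem
from rdkit.Chem import Draw

mol = Chem.MolFromSmiles("c1ccccc1O")               # phenol, as an example
Draw.MolToFile(mol, "phenol.png", size=(300, 300))  # algorithmic rendering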
11. http://molpress.com
Internet-era manuscripts
◈ Using a standard word processor is not good for data
◈ WordPress plugin MolPress:
▷ insert & retain raw data
▷ plugin handles rendering
▷ open source
◈ Use blogging software as an electronic lab notebook
12. http://assaycentral.org
Case 2: Assay Central
◈ Rare & neglected diseases: long tail, few resources
◈ Premise: the data is out there
◈ Pragmatism: grab & fix, by any means necessary
◈ Sources:
▷ ChEMBL, PubChem (parts thereof)
▷ curation, collaboration, importing, scraping
◈ Goal: seed all targets with only good data
13. http://assaycentral.org
Provenance
◈ Some data is good: it just has to be labelled (e.g. with the target); other data must be sampled and validated by flagging
◈ Merging: standardise, group by, build models (see the sketch below)
◈ Project ➔ GitHub repository: a molecular build system
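A minimal sketch of the 'standardise, group by' step, assuming activities arrive as a pandas DataFrame with a SMILES column; column names and values are illustrative, not Assay Central's actual pipeline:

# Minimal sketch: standardise structures to a canonical key, then merge
# replicate measurements per compound/target before model building.
import pandas as pd
from rdkit import Chem

df = pd.DataFrame({
    "smiles": ["OC(=O)c1ccccc1", "c1ccccc1C(O)=O", "CCO"],  # first two are the same compound
    "target": ["T1", "T1", "T1"],
    "pIC50":  [6.1, 6.3, 4.0],
})

def canonical_key(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToInchiKey(mol) if mol else None

df["inchikey"] = df["smiles"].map(canonical_key)
merged = df.groupby(["inchikey", "target"], as_index=False)["pIC50"].mean()
print(merged)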
15. http://assaycentral.org
Virtuous cycle
◈ Initial data integration is labour intensive
◈ Build models & other decision support
◈ Can generate prescribed molecules (see the toy sketch below)
▷ high predictions for desired target(s)
▷ low predictions for undesirable off-targets
◈ Collaborators can order & test these candidates, and return the data in model-ready format
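In code terms, 'prescribed molecules' are simply candidates ranked by predicted on-target activity minus predicted off-target activity. A toy sketch with placeholder prediction functions (the functions, values and SMILES below are hypothetical, not Assay Central's models):

# Toy sketch: rank candidates by on-target vs off-target predictions.
# predict_target / predict_offtarget stand in for whatever trained models exist.

def predict_target(smiles):
    # stand-in for a trained on-target model; here just a dummy heuristic
    return min(1.0, 0.05 * len(smiles))

def predict_offtarget(smiles):
    # stand-in for a trained off-target model; constant placeholder
    return 0.1

candidates = ["CCOC(=O)c1ccccc1", "CCN(CC)CC", "c1ccncc1"]
scored = [(s, predict_target(s) - predict_offtarget(s)) for s in candidates]
for smiles, score in sorted(scored, key=lambda x: x[1], reverse=True):
    print(f"{smiles}\t{score:.2f}")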
18. http://collaborativedrug.com
Case 4: Mixtures
◈ Mixtures of compounds: ad hoc is the way, e.g.
▷ osmium tetroxide, 2.5 wt. % in tert-butanol
▷ n-butyllithium, 2.5 M in hexanes
◈ Long overdue for a well-defined open format
▷ Mixfile (an illustrative sketch follows below)
◈ High priority: a text-extraction importer
◈ Working with IUPAC: data feedstock
◈ Molfile ➔ InChI; Mixfile ➔ MInChI
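To make the idea concrete, here is an illustrative structured record for 'n-butyllithium 2.5 M in hexanes', built as a Python dict and serialized to JSON. The field names are chosen for readability only and are not necessarily the official Mixfile schema:

# Illustrative sketch only: a machine-readable mixture record as JSON.
# Field names are for readability; this is not the official Mixfile spec.
import json

mixture = {
    "name": "n-butyllithium solution",
    "components": [
        {"name": "n-butyllithium", "smiles": "CCCC[Li]", "quantity": 2.5, "units": "mol/L"},
        {"name": "hexanes", "role": "solvent"},
    ],
}
print(json.dumps(mixture, indent=2))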
20. http://bioassayexpress.com
Case 5: Bioassay protocols
◈ Assay protocols are usually just text: enrich them by describing them with semantic web terms (e.g. BAO, DTO, CLO...), as sketched below
◈ Curated assays are machine readable
◈ Legacy: ML support
◈ Current: ELN-style
◈ Began 2014
◈ See Wednesday
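In data terms, 'enrich with semantic web terms' means attaching ontology URIs to the assay record. A minimal sketch with the rdflib library; the assay URI and the BAO term identifiers are placeholders for illustration, not curated annotations:

# Minimal sketch: attach ontology terms to an assay record as RDF triples.
# The assay URI and BAO term identifiers below are placeholders only.
from rdflib import Graph, Namespace

BAO = Namespace("http://www.bioassayontology.org/bao#")
EX = Namespace("http://example.org/assays/")

g = Graph()
g.bind("bao", BAO)
g.add((EX["assay42"], BAO["BAO_0000205"], BAO["BAO_0000219"]))  # predicate/object IDs are illustrative
print(g.serialize(format="turtle"))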
21. chemical informatics
Case 6: Data are valuable
◈ The only way forward: machine readability at source
◈ Scientists must be “persuaded” to get on board
◈ Better tools can help, but mostly social engineering
◈ Who controls much of the culture of science?
22. chemical informatics
Journal publishers
◈ Only the human-readable parts of a manuscript are peer reviewed (text & figures)
◈ What if the data were part of the formal process...
◈ ... if an algorithm can't understand it, send it back.
◈ Supplementary information becomes the most important part of the article
◈ Open it to everyone's model
23. chemical informatics
Example
◈ Can start small, e.g. with organic/drug discovery
◈ Author provides the files, middleware does basic validation, reviewer opens and verifies (a validation sketch follows below)
◈ Example submission: structure files 1.mol, 2.mol, 3.mol plus data.csv:
ID,ALK2IC50,LE,LiPE,TPSA
-,uM,uM,uM,uM
1,8.20,0.45,4.79,70.14
2,18.2,0.39,4.43,59.28
3,2.30,0.46,5.38,70.14
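A minimal sketch of what 'middleware does basic validation' could look like: parse each referenced structure file with RDKit and check that the numeric columns of data.csv actually parse. The file layout follows the slide; the function and messages are illustrative, not an actual publisher's pipeline:

# Minimal sketch: validate a submission of 1.mol..N.mol plus data.csv.
import csv
import os
from rdkit import Chem

def validate_submission(csv_path):
    problems = []
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    for row in rows:
        if not row["ID"].isdigit():
            continue  # skip the units row
        molfile = f"{row['ID']}.mol"
        if not os.path.exists(molfile):
            problems.append(f"{molfile}: file missing")
        elif Chem.MolFromMolFile(molfile) is None:
            problems.append(f"{molfile}: structure does not parse")
        for col in ("ALK2IC50", "LE", "LiPE", "TPSA"):
            try:
                float(row[col])
            except ValueError:
                problems.append(f"row {row['ID']}: {col} is not numeric")
    return problems

print(validate_submission("data.csv") or "all checks passed")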
24. chemical informatics
What are you waiting for?
◈ A whole industry & community exists for support
◈ Best of breed tools have open source options
◈ One publisher has to go first...
◈ ... who?