The document discusses building a structure-centric community for chemists by leveraging crowdsourcing and text-mining of open chemistry data on the internet. It describes ChemSpider's capabilities to search and aggregate chemical data from various sources by structure and property and its efforts to curate and link open access literature and patents to chemical structures. Challenges around data quality and ambiguity in chemical names are also covered. The goal is to enable new ways of searching chemistry information centered around chemical structures.
There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with the increasing availability of Open Source software tools, we are in the middle of a revolution in data availability and in tools to manipulate these data. However, freedom costs, and in many cases the cost is quality. ChemSpider is a free access website for chemists built with the intention of providing a structure-centric community for chemists. As an aggregator of chemistry-related information from many sources, at present over 21.5 million unique chemical entities from over 150 separate data sources, ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of the issue of quality in many chemistry-related databases, approaches to cleaning up the data, and how a curated platform can become the centralized hub for resourcing information about chemical entities. This includes experimental and predicted properties, analytical data, publications, suppliers and integrated databases. I will detail three efforts: 1) the curation of chemistry on Wikipedia; 2) an examination of structure integrity on the FDA DailyMed website, a site of medication content and labeling as found in medication package inserts; 3) recognizing chemical names in documents and providing a platform for structure-based searching of Open Access chemistry literature.
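The name-recognition effort mentioned in 3) can be illustrated with a deliberately minimal sketch: dictionary lookup of chemical names against a name-to-structure table. This is not ChemSpider's actual pipeline (real systems combine large dictionaries with grammar-based name-to-structure conversion); the name-to-SMILES table below is a small, illustrative assumption.

```python
# Minimal sketch of dictionary-based chemical name recognition.
# Hypothetical name -> SMILES table; real systems use far larger
# dictionaries plus algorithmic name-to-structure parsing.
import re

NAME_TO_SMILES = {
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
}

def find_chemical_names(text):
    """Return (name, smiles) pairs for dictionary terms found in text."""
    hits = []
    for name, smiles in NAME_TO_SMILES.items():
        # Word-boundary, case-insensitive match so "Aspirin," still matches.
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            hits.append((name, smiles))
    return hits

doc = "The patients received aspirin; caffeine intake was not controlled."
print(find_chemical_names(doc))
```

Once a name resolves to a structure, the document can be indexed by that structure, which is what makes structure-based searching of the literature possible.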
The internet has provided access to unprecedented quantities of data. In the domain of chemistry specifically, over the past decade the web has become populated with tens of millions of chemical structures and related properties and assays, together with tens of thousands of spectra and syntheses. The data have, to a large extent, remained disparate and disconnected. In recent years, with the wave of Web 2.0 participation, any chemist can contribute to both the sharing and validation of chemistry-related data, whether via Wikipedia, the online encyclopedia, or one of the multiple public compound databases. The presentation will offer a perspective on what is available today, our experiences of building a public compound database to link together the internet, and a suggested path forward for enabling even greater integration and connectivity of chemistry data for the masses to both use and participate in developing.
These are the slides I will be giving at the Science Commons Symposium Pacific Northwest at the Microsoft Campus in Redmond in about 5 minutes' time.
A Talk delivered at both UNC Chapel Hill and Drexel University
This is a presentation I gave at the Library of Congress as part of a NFAIS/FLICC/CENDI meeting as outlined here: http://www.chemspider.com/blog/making-the-web-work-for-science-presentation-at-the-library-of-congress.html
The presentation provides an overview of some of the challenges the publishers face moving forward, how they are responding to it, how InChI is an enabling technology, how quality is important.
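One concrete way InChI acts as an enabling technology is record deduplication: aggregators can merge entries from different sources by keying them on InChIKeys. The sketch below assumes the records are invented, though the two keys shown are the standard InChIKeys for ethanol and benzene; in practice keys are generated with the IUPAC InChI software rather than typed in.

```python
# Sketch: using InChIKeys as canonical keys to merge compound records
# from different sources, regardless of what name each source uses.
from collections import defaultdict

records = [
    {"source": "A", "name": "ethanol",       "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source": "B", "name": "ethyl alcohol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source": "C", "name": "benzene",       "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N"},
]

def merge_by_inchikey(records):
    """Group records that describe the same structure under one key."""
    merged = defaultdict(list)
    for rec in records:
        merged[rec["inchikey"]].append(rec)
    return dict(merged)

merged = merge_by_inchikey(records)
print(len(merged))  # 2 unique structures from 3 source records
```

The same keying makes a structure linkable across publishers and databases without any party having to agree on a preferred name.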
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus.
Three slides have embedded movies - these do not show in SlideShare; a first pass of this can be seen as a single file at https://vimeo.com/154705161
Can machines understand the scientific literature? (petermurrayrust)
With over 5000 scientific articles per day we need machines to help us understand the content. This material is to be used at an interactive session for the Science Society at Trinity College Cambridge UK
There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with the increasing availability of Open Source software tools we are in the middle of a revolution in data availability and tools to manipulate these data. ChemSpider is a free access website for chemists built with the intention of providing a structure centric community for chemists. As an aggregator of chemistry related information from many sources, at present over 21.5 million unique chemical entities from over 200 separate data sources, ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of the ChemSpider platform and how it is fast becoming the centralized hub for resourcing information about chemical entities.
High throughput mining of the scholarly literature; talk at NIH (petermurrayrust)
The scientific and medical literature contains huge amounts of valuable unused information. This talk shows how to discover it, extract it, re-use it and interpret it. Wikidata is presented as a key new tool and infrastructure. Everyone can become involved. However, some of the barriers to use are sociopolitical, and these are identified and discussed.
Online databases containing high throughput screening and other property data continue to proliferate in number. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases for providing data to be used to repurpose drugs using cheminformatics-based approaches (e.g. docking, ligand-based machine learning methods). This work will also discuss the potentially related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project, that is utilizing semantic web based approaches to integrate large scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when utilizing data from online databases and how their careful curation can provide high quality data that can be used to underpin the delivery of molecular models that can in turn identify new uses for old drugs.
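The ligand-based approaches mentioned above ultimately rest on comparing molecular fingerprints. A minimal sketch of the core similarity computation, using hypothetical fingerprints represented as sets of on-bit positions (real fingerprints would come from a cheminformatics toolkit, not be written by hand):

```python
# Tanimoto similarity between binary fingerprints, the workhorse of
# ligand-based virtual screening. Fingerprints are modeled as Python
# sets of "on" bit positions; the bit values below are illustrative.
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: |intersection| / |union| of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical fingerprints: a query drug and two database compounds.
query = {1, 4, 9, 17, 33}
cand1 = {1, 4, 9, 17, 40}   # shares 4 of 6 distinct bits with query
cand2 = {2, 5, 21}          # shares nothing with query

ranked = sorted([("cand1", tanimoto(query, cand1)),
                 ("cand2", tanimoto(query, cand2))],
                key=lambda t: t[1], reverse=True)
print(ranked)
```

Ranking database compounds by similarity to a known active is the first step in this kind of repurposing workflow; the quality of the ranking is only as good as the curation of the underlying structures.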
The Royal Society of Chemistry provides open access to data associated with tens of millions of chemical compounds. The richness and complexity of the data has continued to expand dramatically, and the original vision of providing an integrated hub for structure-centric data has been delivered across the world to hundreds of thousands of users. With the intention of expanding the reach to cover more diverse aspects of chemistry-related data, including compounds, reactions and analytical data, to name just a few data types, we are in the process of delivering a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various levels of required security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications, and specifically supplementary information. This presentation will report on the challenges of managing "Big Data" for chemists around the world and providing access to tools for structure dereplication, spectral database searching and the crowdsourcing of the world's largest spectral database.
Asking the scientific literature to tell us about metabolism (petermurrayrust)
Talk at Lhasa (https://www.lhasalimited.org/), a leading organization for "in silico prediction and database systems for use in metabolism, toxicology and related sciences". ContentMine software can extract data from papers on compound metabolism in reusable semantic form, including metabolic pathways and pharmacokinetic data.
When we look at the rapid growth of scientific databases on the Internet in the past decade, we tend to take the accessibility and provenance of the data for granted. As we see a future of increased database integration, the licensing of the data may be a hurdle that hampers progress and usability. We have formulated four rules for licensing data for open drug discovery, which we propose as a starting point for consideration by databases and for their ultimate adoption. This work could also be extended to the computational models derived from such data. We suggest that scientists in the future will need to consider data licensing before they embark upon re-using such content in databases they construct themselves.
Internet-based public domain databases containing chemical compounds have grown in number, capability and content in recent years. There are now many databases containing millions of chemical compounds associated with different types of data including chemical names, properties, analytical data, and with associated mapping to proteins, assay data, clinical information and so on. These disparate data sources suffer from one common issue – quality of data. This presentation will provide an overview of our efforts to source the appropriate structural representations for 200 top-selling drugs from public domain sources. This intra- and inter-laboratory comparison of approaches, processes and necessary agreements exposed the challenges associated with aggregating structure-based data. The project also provided data regarding the distribution of quality issues associated with many of the community’s popular databases.
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals - discover, scrape, normalize, facet/index, analyze, publish - are exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
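The discover/scrape/normalize/index steps can be caricatured in a few lines over in-memory stand-ins for papers. This is only a toy under stated assumptions - the real ContentMine toolchain operates on repositories such as EuropePMC, and the two "papers" below are invented:

```python
# Toy sketch of the discover -> scrape -> normalize -> facet/index
# stages of a content-mining pipeline, over invented documents.
import re

CORPUS = {
    "paper1": "<p>Zika virus was detected in Aedes aegypti.</p>",
    "paper2": "<p>No Zika RNA was found in these samples.</p>",
}

def discover(query):
    """Find matching papers (here: naive substring search)."""
    return [pid for pid, html in CORPUS.items() if query.lower() in html.lower()]

def scrape(pid):
    """Fetch the raw document (here: a dictionary lookup)."""
    return CORPUS[pid]

def normalize(html):
    """Strip markup to plain text (crudely, with a regex)."""
    return re.sub(r"<[^>]+>", "", html)

def facet_index(texts, terms):
    """Facet: which normalized papers mention which terms."""
    return {t: [pid for pid, txt in texts.items() if t in txt] for t in terms}

papers = {pid: normalize(scrape(pid)) for pid in discover("zika")}
facets = facet_index(papers, ["Aedes", "RNA"])
print(facets)
```

Each stage in a real pipeline is a separate, swappable tool; the point of the sketch is only the data flow from discovery through to a faceted index that an analysis step can consume.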
A Global Commons for Scientific Data: Molecules and Wikidata (petermurrayrust)
Methods for extracting facts from the scientific literature, and linking them to Wikidata IDs. Wikidata is introduced by an architectural example and bioscience. Then we explore how data can be extracted from text and from images
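Linking an extracted molecule to a Wikidata item can go through Wikidata's InChIKey property (P235) and the public SPARQL endpoint (https://query.wikidata.org/sparql). The sketch below only constructs the query string, keeping the example offline; the InChIKey shown is the standard key for ethanol.

```python
# Sketch: building a SPARQL query that resolves a chemical's Wikidata
# item from its InChIKey. Wikidata stores InChIKeys under property P235.
# The query is constructed but not sent, to keep this example offline.
def wikidata_query_for_inchikey(inchikey):
    return (
        "SELECT ?item WHERE { "
        f'?item wdt:P235 "{inchikey}" . '
        "}"
    )

q = wikidata_query_for_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N")  # ethanol
print(q)
```

Sending such a query to the endpoint returns the item's Q-identifier, which then anchors every other fact extracted about that molecule to a shared, citable node in the commons.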
High throughput mining of the plant-science literature (petermurrayrust)
We can now mine the plant science literature for facts, especially species (both plants and others), chemicals, diseases and other agricultural terms. This presentation gives a number of examples and links on how you can do this on the Open Access literature
Talk to OpenForum Academy (Open Forum Europe) about Text and Data Mining. Four use cases selected for non-scientists. Also a discussion of the latest on European copyright reform and TDM exceptions.
Towards Responsible Content Mining: A Cambridge perspective (petermurrayrust)
ContentMining (Text and Data Mining) is now legal in the UK for non-commercial research. Cambridge UK is a natural centre, with several components:
* a world-class University and Library
* many publishers, both Open Access and conventional
* a digital culture
* ContentMine - a leading proponent and practitioner of mining
Cambridge University Press welcomes content mining and invited PMR to give a talk there. He showed the technology and protocols and proposed a practical way forward in 2017
ChemSpider is being built with the intention of being a chemical structure-centric community for chemists. With over 16 million chemical structures as of August 2007, and with data deposition and curation mechanisms in place for text, structure and spectra, ChemSpider intends to be a meeting place and collaborative environment for chemists to work together.
This was a presentation I gave to an audience at Nature Publishing Group in New York on May 7th 2009. It's a long presentation, over an hour in length. Not much new here relative to other presentations...just a knitting together of many of the others on here.
There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with an increasing number of Open Source software programs we are in the middle of a revolution in data availability and tools to manipulate these data. ChemSpider is a free access website built with the intention of providing a structure centric community for chemists. As an aggregator of chemistry related information from many sources, at present over 21.5 million unique chemical entities from over 190 separate data sources, ChemSpider has taken on the task of both robotically and manually integrating and curating publicly available data sources. ChemSpider has also provided an environment for users to deposit, curate and annotate chemistry-related information. This has allowed the community to enhance ChemSpider by adding analytical data, associating synthetic pathways and publications and connecting to social networking resources. I will discuss how ChemSpider is fast becoming the premier curated platform and centralized hub for resourcing information about chemical entities and how the platform provides the foundation data for services allowing the analysis of analytical data and collaborative science.
This is a presentation I gave at the FDA on December 1st 2009 in Washington DC as part of a symposium involving PubChem, ChemIDPlus, PillBox, DailyMed and other related systems. The focus was, as usual, on the quality of data online and how to clean up the information, with a specific focus on the quality of data on the FDA's DailyMed and our efforts to apply semantic markup to the DailyMed articles.
This is a presentation given to the Royal Society General Assembly in Birmingham on November 20th 2009. This covers the present status and future vision for ChemSpider
ChemSpider is a free access website for chemists built with the intention of providing a structure-centric community for chemists. It was developed to index available sources of chemical structures and their associated data into a single searchable repository and to make it available to everybody, at no charge. While there are a large number of databases containing chemical compounds and data available online, their inherent quality, accuracy and completeness are severely lacking. ChemSpider has provided a platform so that the chemistry community can contribute to improving the quality of data online and expanding the information to include data such as reaction syntheses, analytical data, experimental properties and linkages to other valuable resources. It has grown into a resource containing over 21 million unique chemical structures from over 200 data sources.
This presentation will provide an overview of ChemSpider and its value to chemists as a search tool, as a public repository of information and how it can become one of the primary foundations of internet-based chemistry. I will also discuss the vision for ChemSpider and some of the lofty goals we are setting for the system moving forward.
The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. The Royal Society of Chemistry hosts ChemSpider, a free access website for chemists built with the intention of building a community for chemists (http://www.chemspider.com/).
ChemSpider is an aggregator of chemistry-related information, at present holding over 20 million unique chemical entities linked out to over 300 separate data sources, and it has taken on the task of both robotically and manually curating publicly available data sources. It is also a public deposition platform where chemists can deposit their own data, including novel structures, analytical data and synthesis procedures, and it hosts data associated with the growing activities of Open Notebook Science.
This presentation will examine chemistry on the internet, the dubious quality of what is available and how the ChemSpider crowdsourced curation platform is fast becoming one of the centralized hubs for resourcing information about chemical entities.
We will also review our efforts to provide free resources for synthesis procedures, spectral data and structure-based searching of the chemistry literature and how chemists can contribute directly to each of these projects.
ChemSpider is a free access website for chemists built with the vision of providing a structure-centric community for chemists. Vision is great… execution is better. ChemSpider is now one of the internet’s primary portals for chemistry, offering access to over 23 million unique chemical structures from over 200 data sources and expanding daily. Even though there are tens if not hundreds of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and more, there has been no single way to search across them. Despite the large number of databases containing chemical compounds and data available online, their inherent quality, accuracy and completeness remain lacking in many regards. With ChemSpider we have provided a platform whereby the chemistry community can contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties, and to linking to other valuable resources.
This presentation will provide an overview of ChemSpider and its value to chemists as a search tool, as a public repository of information and how it can become one of the primary foundations of internet-based chemistry. I will also discuss the vision for ChemSpider and some of the exciting goals we are setting for the system moving forward.
The presentation of ChemSpider was to a group of science librarians, specifically chemistry librarians, and was meant to provide an overview of the platform and answer the question posed: what is the difference between ChemSpider, CAS SciFinder and Reaxys?
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody at no charge. There are many tens of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and more, and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness are lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of over 20 million chemical substances integrated with over 300 disparate data sources, many of them directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry and to provide access to a set of online tools and services supporting access to these data. I will also discuss how ChemSpider is being used to enhance Semantic Publishing in Chemistry at the RSC.
I am an adjunct professor at the University of North Carolina at Chapel Hill, so when I stopped by yesterday for a business meeting I was informed that I had been lined up to give a talk to the students at 1pm. I had 20 minutes to prepare and assembled a mish-mash of information that might be of value to Citizen Chemists, those who might want to contribute to chemistry on the internet.
With the intention of providing a free internet resource of chemistry-related data for the community, ChemSpider provides an online database of chemical compounds, reaction syntheses and related data. Members of the community can contribute to the database via the deposition of chemical structures, synthesis procedures and analytical data. Data are also aggregated from many other depositors, at present over 400 data sources. The aggregation of data associated with over 25 million chemical compounds does not come without data quality issues. By engaging the community to curate the data, the quality continues to improve on a daily basis. The presentation will provide an overview of our ongoing efforts to expand and curate the database. Using a combination of game-based and recognition systems, as well as our reliance on contributions freely given by the community, ChemSpider continues on its path to becoming a high-quality resource and a foundation for the semantic web for chemistry.
This is a general presentation about our efforts to build an internet-based community for chemists using ChemSpider: a general overview of data quality online, crowdsourced deposition and curation, and our progress toward delivering a solution for the community for resourcing data.
The internet now offers access to a myriad of online resources that can be of value to chemists working in the Life Sciences. While finding information online is, in many cases, a simple search away, the accuracy and validity of the associated data and information should be questioned. As more databases and resources are introduced online, commonly without integration with other resources, a scientist must perform multiple searches and then undertake the task of meshing and merging data. ChemSpider is a freely accessible online database that has taken on the challenge of meshing together distributed resources across the internet to provide a structure-based hub. It is a crowdsourcing environment hosting over 26 million unique compounds linked out to over 400 data sources. With well-defined programming interfaces for integration, ChemSpider has been integrated with many commercial and open software packages and is presently serving as the chemistry foundation for the IMI Open PHACTS project.
This is a presentation given in Track 4, Open Access and Cheminformatics, at the Bio-IT Meeting in Boston on April 21st 2010. It is a general overview of ChemSpider activities to link together the internet for chemists and to validate and curate data. We also won the Bio-IT Best Practices Community Service Award that evening.
The ChemSpider database is a resource hosted by the Royal Society of Chemistry. With over 28 million unique chemicals in the database linked out to over 400 data sources, the platform provides access to experimental and predicted data (properties, spectra etc.), links to publications, patents and a myriad of other resources. The ChemSpider database has been used as the foundation of a number of other resources for chemists, including ChemSpider SyntheticPages, the Learn Chemistry Wiki and the Spectral Game. This presentation will provide an overview of ChemSpider and discuss how chemists can both derive value from and contribute to the content available from the database and its related resources. We will also discuss our view of a future platform for managing personal, institutional and public chemistry in a shared environment.
This is a presentation given at the European Bioinformatics Institute (EBI) in Cambridge on December 1st 2010, at an EMBL-EBI Industry Programme Workshop on "Chemical Structure Resources". This is where I unveiled details of the intra/inter-validation studies validating drug structures across multiple public domain chemistry databases. I also unveiled early results from the SurveyMonkey study of the "trust" that the community has in public domain chemistry resources.
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody at no charge. There are many tens of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and more, and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness are lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of well over 20 million chemical substances integrated with over 300 disparate data sources, many of them directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry and to provide access to a set of online tools and services supporting access to these data. I will also discuss how ChemSpider is being used to enhance Semantic Publishing in Chemistry at the RSC.
2. Imagine a time when…
The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
Chemistry articles are indexed and searchable by a free online service
The web is linked together through the “language of chemistry”
Publicly funded research data can be shared and discussed in the Open, maybe as Open Notebook Science (ONS)?
Cheminformatics has as much of a public face as bioinformatics
Building a Structure Centric Community for Chemists
3. ChemSpider - A Search Engine for Chemists
Questions a chemist might ask…
What is the melting point of n-butanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for Ketoconazole?
What is the NMR spectrum of Aspirin?
What are the safety handling issues for Thymol Blue?
ChemSpider can answer all of these questions
4. What is a Structure?
Ask a computer…ask a chemist
5. Tell Me About Glutathione
6. Tell Me About Glutathione
7. Tell Me About Glutathione
8. Tell Me About Glutathione
9. Tell Me About Glutathione
10. Tell Me About Glutathione
12. Links out to KEGG
Kyoto Encyclopedia of Genes and Genomes
13. How many names does a compound have?
14. ChemSpider Data Content
Over 21.5 million unique chemical structures from ca. 150 data sources
Online Databases –PubChem, Drugbank, KEGG, Wikipedia
Literature – PubMed, J Het Chem, Nature, RSC, Open Access
Chemical Vendors – over 40 different vendors and growing
Personal Depositions – individual contributions
Content database vendors
Analytical data collections
Patents
Web scraping
Content is linked back to the original data sources
15. Other Searches
What compounds have a mass of 300 ± 0.001?
…or search a combination of intrinsic/predicted properties
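The mass-window query above amounts to filtering compounds by monoisotopic mass within a tolerance. A minimal sketch in Python, using invented compound records rather than ChemSpider data:

```python
# A sketch of the mass-window search described above: return compounds whose
# monoisotopic mass falls within target ± tolerance. Compound records are
# invented placeholders, not ChemSpider data.

def search_by_mass(compounds, target, tolerance):
    """Return compounds with |mass - target| <= tolerance."""
    return [c for c in compounds if abs(c["mass"] - target) <= tolerance]

compounds = [
    {"name": "compound A", "mass": 300.0005},
    {"name": "compound B", "mass": 299.9985},  # outside the 0.001 window
    {"name": "compound C", "mass": 300.0020},  # outside the 0.001 window
]

hits = search_by_mass(compounds, target=300.0, tolerance=0.001)
```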
18. The Quality of Data Online…
Aggregating data opens up quality issues
Structure-identifier associations are “dirty”
Structures are COMMONLY incorrect
Manual curation of small databases is enough work – what about millions of structures?
Structures are far from perfect. What is a “correct structure”?
Full stereochemistry?
Historical timeline of structure?
Who is the authority?
19. Who holds THE Quality Authority?
Chemical Abstracts Service is the structural authority today: 1400 employees, the world standard in chemistry information
101 years of knowledge, process and expertise.
How can an online, free access system peacefully co-exist with the authority?
20. Quality is a Major Issue – Search Butanol
OLD EXAMPLE… now fixed
21. Wikipedia Chemistry Curation project
Only ca. 5000 organic structures, 7000 total structures
Almost a year of work so far for a team of 6 people
Many errors removed in the process. The curation process is a daily event for users/depositors
Slow and torturous process
http://en.wikipedia.org/wiki/Talk:Tacrolimus#IUPAC_Name_and_structure
22. Wikipedia Curation
Looking for self-consistency across a Wikipedia page
Primary key is the article TITLE
The chemical shown needs to match the title
Cyclic self-consistency – and decisions must get made
28. Thymol Blue on ChemSpider
Data online includes:
UV-vis spectrum
Measured experimental properties
Link to Wikipedia article
Links to chromatography details
Multiple identifiers/trade names etc.
Links to vendors/suppliers/other databases
Safety information
http://www.chemspider.com/q/thymol%20blue
29. Differences between ChemSpider/Wikipedia
ChemSpider | Wikipedia
>21 million unique structures | ~5000 organics, 2000 others
Complex queries – properties, text, structure/substructure, OA publishers, data sources, … | Text
Prediction of properties | No
Analytical data | No, but links
Active depositors/curators – 30 | Active editors > 50 (?)
6000 people/day; 1900 registered | ????
Compound monographs linked | Detailed compound monographs
30. Differences between Wikipedia/ChemSpider
Wikipedia | ChemSpider
Supported by the tried and tested MediaWiki platform | Primarily Microsoft .NET technologies with OS components
Established infrastructure and Wikipedia Foundation team | “Out of a basement” on three servers and 5 volunteers
Chemistry is a subset of the ‘Pedia | Chemistry is the focus of ‘Spider
GFDL licensing for everything | Mixed “licensing”
Strong team of WP:Chem advocates, curators and admins | Growing team of advocates, curators and users
Worldwide reputation as a quality source – good and bad | Growing reputation as focused on quality
31. Crowd-sourcing Curation
How to curate data for millions of structures?
Robot processes can clean up depositions
Search for Chloride and check the molecular formula for Cl
Check for stereochemistry and remove names with stereo
Provide a simple-to-use platform to curate, annotate and tag data
Provide curator administration to prevent vandalism (Veropedia)
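One of the robotic checks above can be as simple as cross-validating synonyms against molecular formulae. A minimal sketch of the chloride rule, with invented records; a real pipeline would fully parse the formula:

```python
# Sketch of one robotic curation rule from the slide: a synonym containing
# "chloride" should be consistent with a formula containing Cl. Records are
# invented; a real check would parse the molecular formula properly.
import re

def flag_chloride_mismatch(record):
    """True if a 'chloride' synonym is inconsistent with the formula."""
    names_mention_cl = any("chloride" in n.lower() for n in record["synonyms"])
    # 'Cl' as an element symbol: 'Cl' not followed by another lowercase letter.
    formula_has_cl = re.search(r"Cl(?![a-z])", record["formula"]) is not None
    return names_mention_cl and not formula_has_cl

good = {"synonyms": ["Sodium chloride"], "formula": "NaCl"}
bad = {"synonyms": ["Potassium chloride"], "formula": "KBr"}  # mis-deposited
```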
32. Post Comments
Anyone can “Post Comments” associated with a structure. To curate data we require login, so that contributions can be tracked
34. Crowd-sourcing Chemistry
Crowd-sourced curation: identify and tag errors, edit names and synonyms, identify records for deprecation
ALSO
Crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data)
38. Structure-Centric
We want to search “information” by structure, substructure and similarity of structure
Specific focus on Open Chemistry at present
Standard approaches would be:
Identify chemical names (“entity extraction”)
Convert chemical names to structures and index
ChemSpider has a validated dictionary of structure-name pairs
Use name extraction, name conversion and dictionary look-up. THEN curate.
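The dictionary look-up step can be sketched as follows. The entries are illustrative (the aspirin value is its widely published InChIKey, the xylene value is a placeholder), and unresolved names would be handed to a name-to-structure converter:

```python
# Sketch of the dictionary look-up step described above: map extracted
# chemical names onto structure identifiers. Entries are illustrative;
# unresolved names would go to a name-to-structure conversion step.

name_to_structure = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "xylene": "illustrative-key-for-xylene",
}

def look_up(names):
    """Split names into (resolved, unresolved) using the dictionary."""
    found, unresolved = {}, []
    for name in names:
        key = name.lower()
        if key in name_to_structure:
            found[name] = name_to_structure[key]
        else:
            unresolved.append(name)  # candidates for name-to-structure conversion
    return found, unresolved

found, unresolved = look_up(["Aspirin", "benzene"])
```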
39. “Entity Extraction”
Rule-based recognition of systematic names:
Use a lexeme of name fragments
Rules for identifying the bounds of a name
Look-up dictionary:
Drug names
Trivial names
Numbers: registry IDs, EINECS/ELINCS
Massive look-up dictionary of validated identifiers on ChemSpider
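A toy sketch of the lexeme-based idea: a token is a candidate systematic name if it can be fully assembled from known name fragments. Real systems use far larger fragment grammars plus boundary rules; the fragment list below is invented for illustration:

```python
# Toy illustration of rule-based recognition: a token is a candidate
# systematic name if a tiny fragment lexeme can consume it entirely.
# Real systems use large fragment grammars and name-boundary rules.

FRAGMENTS = ["meth", "eth", "prop", "but", "an", "en", "yn", "ol", "al", "e"]

def looks_systematic(token):
    """Greedily consume the token with name fragments; True if fully consumed."""
    t = token.lower()
    i = 0
    while i < len(t):
        for frag in sorted(FRAGMENTS, key=len, reverse=True):
            if t.startswith(frag, i):
                i += len(frag)
                break
        else:
            return False  # no fragment fits at position i
    return True
```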
41. Name Recognition
Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol). The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystallized from ethanol 95% (1.28 g, 91%)
42. Name Recognition
Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol). The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystallized from ethanol 95% (1.28 g, 91%)
43. How Many Chemical Names?
“She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip, causing him to recoil. He went home and took an aspirin after the beating.”
44. How Many Chemical Names?
“She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip, causing him to recoil. He went home and took an aspirin after the beating.”
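The passage contains exactly one genuine chemical name, aspirin, yet naive dictionary matching also fires on element symbols that double as ordinary English words ("He", "In"). A sketch using a shortened excerpt of the passage:

```python
# One genuine chemical name ("aspirin") versus false positives from element
# symbols that are also English words. Excerpt shortened for illustration.
import re

text = ("She had the drive to derive success in any venture. "
        "He went home and took an aspirin after the beating.")

def word_matches(term, text):
    """Case-insensitive whole-word occurrences of term in text."""
    return re.findall(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)

# 'He' (helium) and 'In' (indium) match as ordinary English words:
false_hits = word_matches("He", text) + word_matches("In", text)
real_hits = word_matches("aspirin", text)
```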
45. ChemMantis
Chemical Markup And Nomenclature Transformation Integrated System
46. Making Open Access Articles Searchable
Proof of Concept
Can we HOST Chemistry Open Access articles on ChemSpider and add value?
Can we identify chemical names in Open Access articles in a user-friendly manner?
Can we convert names to structures in Open Access articles, expand ChemSpider and provide structure searching of Open Access chemistry articles?
Can we provide an environment for chemists to mark up their own articles and crowd-source markup of an archive?
47. Document markup
ChemSpider is now hosting Open Access articles from
MDPI (Molecular Diversity Preservation International),
currently the Molbank collection
48. A Standard for Document Markup?
NLM-DTD: the National Library of Medicine Document
Type Definition
Approved markup definitions to apply to journal
articles – extended as necessary for our purposes
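As a sketch of what such an extension might look like, here is a hypothetical fragment using NLM-DTD-style body markup with an invented `<chem>` inline element; the element name, its attributes, and the identifier value are assumptions for illustration, not part of the published DTD:

```xml
<p>The product was identified as
  <chem csid="2157" name="aspirin">acetylsalicylic acid</chem>,
  confirmed by comparison with an authentic sample.</p>
```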
59. A Platform for Markup
Can we provide a platform for document markup for
chemists?
Workflow:
Upload Word documents or RTF files, or point to HTML and load
Apply entity extraction, convert names to structures, mark up
automatically, and ask for user participation
Publish the final version with NLM-DTD markup
Deposit all structures on ChemSpider under embargo and
wait for the article DOI to release
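The workflow above can be sketched as a small pipeline. Every function name and the one-entry name dictionary below are hypothetical stand-ins, not ChemSpider's actual API:

```python
# Hypothetical sketch of the document-markup workflow; names are
# illustrative only. The SMILES is the standard one for aspirin.
NAME_TO_SMILES = {"aspirin": "CC(=O)Oc1ccccc1C(=O)O"}  # sample entry

def extract_entities(text):
    # Entity-extraction step: dictionary-based lookup of known names.
    words = [w.strip(".,;").lower() for w in text.split()]
    return [w for w in words if w in NAME_TO_SMILES]

def names_to_structures(names):
    # Name-to-structure conversion step.
    return {n: NAME_TO_SMILES[n] for n in names}

def markup_document(text):
    # The later steps (emit NLM-DTD markup, deposit structures under
    # embargo, await the DOI) are represented here only by returning
    # the structures found.
    return names_to_structures(extract_entities(text))

structures = markup_document("He went home and took an aspirin.")
print(structures)  # {'aspirin': 'CC(=O)Oc1ccccc1C(=O)O'}
```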
60. Challenges
Computer software can generate chemical names better
than the majority of chemists
The majority of chemical names are generated by
humans and are incorrect (converting to the wrong
structure) or ambiguous
One name, multiple structures
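One concrete form of "one name, multiple structures": a substituted-ring name missing its locant maps to several positional isomers. A small illustration (the SMILES strings are written out for the three isomers):

```python
# "One name, multiple structures": without a locant, the name below
# is ambiguous among three positional isomers.
candidates = {
    "fluorobenzaldehyde": [
        "O=Cc1ccccc1F",    # 2-fluorobenzaldehyde (ortho)
        "O=Cc1cccc(F)c1",  # 3-fluorobenzaldehyde (meta)
        "O=Cc1ccc(F)cc1",  # 4-fluorobenzaldehyde (para)
    ],
}
print(len(candidates["fluorobenzaldehyde"]))  # 3 possible structures
```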
72. A Single Configuration File Defines Entities
for Markup
Algorithms can be built for certain
entities, but the majority are dictionaries
– vendors, physical properties, analytical techniques
We can extend our system to support
your needs based on dictionaries – what
does NPG need or not need?
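A configuration of the sort described might look like the sketch below; the entity types, method labels, and file names are invented for illustration:

```python
# Hypothetical single-configuration-file layout: each entity type is
# either algorithm-backed or dictionary-backed. All names are
# illustrative, not ChemSpider's actual configuration.
MARKUP_CONFIG = {
    "chemical_name": {"method": "algorithm+dictionary"},
    "vendor":        {"method": "dictionary", "source": "vendors.txt"},
    "phys_property": {"method": "dictionary", "source": "properties.txt"},
    "analytical":    {"method": "dictionary", "source": "analytical.txt"},
}

# Extending the system for a publisher means adding dictionary entries.
dictionary_backed = [k for k, v in MARKUP_CONFIG.items()
                     if v["method"] == "dictionary"]
print(dictionary_backed)  # ['vendor', 'phys_property', 'analytical']
```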
74. Entity Balloons
Structures are the language of chemistry
Show structures to chemists and search/link from there
75. Other Dictionaries - Species
We are considering
Bacteria
Fungi
Enzymes
Viruses
PDB codes….
76. Integrations Out to Other Sources
77. Integrations Out to Other Sources
79. Manual Curation is Always Necessary
80. Text-Indexing and ChemSpider?
ChemSpider text-indexes almost 500,000 Open Access
and Free Access articles
The collection is growing, and more publishers have already
agreed; theses will be included in the future
82. Conclusions
The quality of structure-based data online should
always be questioned – and that includes ChemSpider
Data on ChemSpider are being added and curated on a
daily basis, but we always need more eyeballs to help
ChemSpider has a large validated structure–name
dictionary
Chemical name extraction and document markup are
very enabling