This document provides information about text mining full-text articles to identify molecular targets. It begins with an introduction to text mining and discusses the value of mining full text versus abstracts alone: full texts provide richer result sets because they include more keywords, facts, and relationships, and concepts that are underrepresented in abstracts are often discussed in full-text sections. The document then illustrates full-text mining with an example and outlines common text mining steps. It describes Elsevier text mining solutions and services that aggregate, structure, normalize and integrate content to extract useful facts and support applications.
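The abstract-versus-full-text point can be illustrated with a toy comparison. This is a minimal sketch under invented assumptions: the snippets and the keyword list are made up for illustration and are not drawn from the document's actual example or from Elsevier's pipeline.

```python
# Sketch: why full text yields richer results than the abstract alone.
# The texts and keyword list below are invented for illustration.
import re
from collections import Counter

KEYWORDS = {"egfr", "kinase", "inhibitor", "apoptosis"}

def keyword_hits(text):
    """Count occurrences of target-related keywords in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in KEYWORDS)

abstract = "We report a novel EGFR inhibitor with antitumor activity."
full_text = (
    "We report a novel EGFR inhibitor with antitumor activity. "
    "Methods: kinase assays showed selective EGFR inhibition. "
    "Results: the inhibitor induced apoptosis in treated cells, "
    "consistent with kinase-dependent apoptosis pathways."
)

print(keyword_hits(abstract))   # fewer distinct concepts
print(keyword_hits(full_text))  # 'kinase' and 'apoptosis' surface only here
```

In this toy example, two of the four target-related concepts never appear in the abstract at all, which is exactly the underrepresentation the document describes.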
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...) - Pistoia Alliance
Text-mining of journal articles and other publications has long been a subject of interest. It already has applications across R&D and beyond into health care, for instance by analysing electronic health records. The technology has value but also has its limits. With new sources of text to mine becoming mainstream, such as Twitter feeds or Facebook posts that might reference a company’s brand or a drug’s efficacy or adverse events, existing technology needs to be adapted to keep pace. Not only that, but whole new compliance questions arise: does a fleeting mention on Twitter require the same response as a formal notification of an adverse event?
Enabling reuse of arguments and opinions in open collaboration systems PhD vi... - jodischneider
This document summarizes a PhD thesis on enabling the reuse of arguments and opinions in open collaboration systems. It discusses three research questions: 1) opportunities and requirements for argumentation support, 2) common arguments used in these systems, and 3) structuring arguments to support reuse. The methodology involved analyzing discussions from Wikipedia and open collaboration projects using argumentation theories like Walton's schemes and factors analysis. The goal is to develop semantic structures and visualizations to help people understand diverse opinions and make collaborative decisions. A prototype system tested with users found structuring discussions by key factors helped people evaluate arguments more effectively.
II-SDV 2012 Making Knowledge Discoverable: The Role of Agile Text Mining - Dr. Haxel Consult
This document discusses using agile text mining to make knowledge more discoverable. It describes how text mining can be combined with search to both filter documents relevant to a context and discover new information within those documents. The document outlines Linguamatics' text mining platform, which allows flexible querying across multiple data sources using natural language processing and ontologies. It provides examples of how the platform has been used to extract relationships and numerical data, summarize results from multiple documents, and integrate with workflows.
Using the Micropublications ontology and the Open Annotation Data Model to re... - jodischneider
This document discusses a project to construct a knowledge base linking drug interaction assertions to evidence from source documents. It will use the Micropublications Ontology to represent each assertion's support graph of claims and evidence, and the Open Annotation model to dynamically link support graph elements to quoted text excerpts from sources. The knowledge base will help answer competency questions about assertions, evidence, and their provenance. Challenges include representing both structured and unstructured text claims and efficiently querying the evidence base at scale.
10th International Conference Compound Libraries 2014 - Torben Haagh
VISIT THE CONFERENCE WEBSITE HERE:
http://bit.ly/CompoundLibrariesSlideshare
Maximizing information in early-phase R&D for an optimal library design and target selection
We are excited to hold the 10th annual meeting of the conference formerly known as Compound Libraries! Over the last decade we have provided the pharmaceutical R&D community with a valuable platform for exchanging knowledge and ideas about how best to optimize the qualification of drug candidates.
We have hosted almost all major pharmaceutical companies and heard dozens of case studies on important and pressing issues. Looking back at the programs from previous years, it is interesting to trace the timeline of changing approaches, trends and market developments. Our topical spectrum has ranged from compound management and acquisition to collaboration frameworks, open access, library design, screening and analysis.
This year we bring you 15 case studies on the most burning issues in early-stage discovery today, along with valuable trend analysis and networking with peers and colleagues from pharmaceutical companies, biotechs, CROs and academic research institutes.
Don’t miss our 10th anniversary: join us in Berlin to take part in our legacy conference!
Benefit from participating in discussions about the following topics:
-A 10-year perspective on synthesizing and designing compound libraries
-What is the role of ligand efficiency metrics in drug discovery? Have your say in this controversial debate!
-Next generation library design - working towards better PPI and epigenetic libraries
-Exploration of bioactive and novel chemical space by application of privileged structure concept design
-Learn from Janssen’s experience with the assembly of the IMI European Lead Factory (ELF) library
-What is the real potential of macrocycles and are they the drugs of the future?
Journal Club - Best Practices for Scientific Computing - Bram Zandbelt
This document discusses the importance of best practices in scientific computing. It notes that scientists rely heavily on software for research, with many writing their own code. However, most scientists are self-taught in software skills and may be unaware of best practices that could help them write more reliable and maintainable code. The document advocates treating software like a scientific instrument and following practices such as version control, testing, and automation. Adopting these practices could help reduce errors and make software easier to reuse.
SocialCite makes its debut at the HighWire Press meeting - Kent Anderson
A new service designed to allow readers and researchers to comment on the appropriateness, quality, and type of citations made in the literature made its debut at the HighWire Press Publishers Meeting yesterday.
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla... - Catherine Canevet
This document discusses enhancing data integration to identify proteins implicated in plant stress response. It introduces using text mining to combine structured data from databases with unstructured text from literature. A text mining plugin was developed for the Ondex data integration framework. It was applied to generate a knowledge base combining Arabidopsis proteins, stresses, and publications. Protein-stress association networks were visualized and metrics like interaction potential (IP) were used to filter associations and highlight key relationships validated from literature, improving the signal-to-noise ratio for identifying candidate genes.
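The association-filtering idea can be sketched with a simple sentence-level co-occurrence score. This is an illustrative stand-in, not the Ondex plugin's actual interaction potential (IP) metric; the sentences, proteins, and stress terms below are invented for the example.

```python
# Illustrative co-occurrence scoring for protein-stress associations.
# Not the Ondex "interaction potential" formula; a simple normalised
# sentence-level co-occurrence score used as a stand-in.
import math
from itertools import product

sentences = [
    "DREB2A expression increases under drought stress in Arabidopsis.",
    "RD29A is induced by drought and cold stress.",
    "Heat stress had no effect on DREB2A levels in this assay.",
]
proteins = ["DREB2A", "RD29A"]
stresses = ["drought", "cold", "heat"]

def score(term_a, term_b, sents):
    """Co-occurrences normalised by each term's individual frequency."""
    in_a = [term_a.lower() in s.lower() for s in sents]
    in_b = [term_b.lower() in s.lower() for s in sents]
    both = sum(a and b for a, b in zip(in_a, in_b))
    if both == 0:
        return 0.0
    return both / math.sqrt(sum(in_a) * sum(in_b))

# Rank every protein-stress pair; thresholding such a score is one way
# to improve the signal-to-noise ratio the summary mentions.
for p, s in product(proteins, stresses):
    print(p, s, round(score(p, s, sentences), 2))
```

Pairs that never co-occur score zero and can be filtered out, which is the same signal-to-noise idea the summary describes, just with a much cruder statistic.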
Open science and the individual researcher - Bram Zandbelt
Slides for the Feb 8, 2017 lab meeting of Roshan Cools' Motivation & Cognitive Control group (Donders Institute), discussing the following paper:
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., … Yarkoni, T. (2016). How open science helps researchers succeed. eLife, 5, e16800. https://doi.org/10.7554/eLife.16800.
This document provides an overview and introduction to the statistical software R. It describes how R can be obtained and installed. R is a free and open-source software environment for statistical analysis and graphics. The document outlines the basic features of the R environment, including how to work with data and packages in R. It provides a conceptual overview of the organization of the book, which uses R and biological examples to teach statistics concepts ranging from basic to advanced topics.
Annotation examples. This is an overview of some of the software I have used for annotation (and a few extra features some of this software has.) This was presented in the SwissUniversities Doctoral Programme, Language & Cognition, in the Module: Linguistic and corpus perspectives on argumentative discourse.
Screenshots are given of GATE, UAM Corpus Tool, Excel, BRAT, EPPI Reviewer, and a custom tool. In most cases there are references to one of my papers for further details.
I briefly describe a typical annotation process:
Find text of interest
Find phenomena of interest
Draft an annotation manual
Iteratively test annotation & revise manual
Find questionable annotations, check disagreements.
Revise the manual.
Iterate.
Annotate
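The "check disagreements" step in the process above is typically quantified with an inter-annotator agreement statistic. A minimal Cohen's kappa computation might look like the following; the labels are toy examples, not data from the talk.

```python
# Cohen's kappa for two annotators labelling the same items.
# Toy labels for illustration; any categorical scheme works the same way.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement if both annotators labelled at random
    # according to their own marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n)
        for c in freq_a.keys() | freq_b.keys()
    )
    return (observed - expected) / (1 - expected)

a = ["claim", "claim", "evidence", "other", "claim", "evidence"]
b = ["claim", "evidence", "evidence", "other", "claim", "evidence"]
print(round(cohens_kappa(a, b), 3))
```

A low kappa after a test round is the usual trigger for the "revise the manual, iterate" loop described above.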
Quertle is a biomedical big data analytics company that provides a platform using artificial intelligence and other advanced techniques to analyze over 40 million biomedical documents. Their platform allows for more comprehensive and precise searches compared to keyword-based searches, and can discover relationships and make connections that other tools cannot. The platform also provides predictive visual analytics and concept-oriented exploration of the data to provide actionable insights. Quertle aims to help address issues with the growing volume of biomedical literature and information that is missed with current approaches.
Using Bioinformatics Data to inform Therapeutics discovery and development - Eleanor Howe
Diamond Age Data Science and Zafgen, Inc, co-present on their work in using bioinformatics data effectively in the context of a small therapeutics company.
Eleanor Howe, PhD, CEO of Diamond Age, presents on the different types of computational biologist, the characteristics of a good bioinformatics team, and the pluses and minuses of using deep learning/AI in a discovery biology context.
Huseyin Mehmet, VP of Discovery Research at Zafgen, describes his team's work with Diamond Age and how Zafgen uses Diamond Age's capabilities to inform its drug development. He discusses biotech companies' need for a diverse, experienced bioinformatics team.
This document summarizes a presentation by Christina Pikas on how librarians at special libraries like the Johns Hopkins Applied Physics Laboratory (APL) can provide bibliometric analysis services. Pikas discusses how librarians' domain knowledge, access to data, and understanding of ethics uniquely positions them to analyze research output and collaboration in a reliable way. She provides examples of bibliometric questions answered at APL and the tools used. Pikas concludes that librarians should leverage their skills and study bibliometrics to support research assessment activities.
This document summarizes a presentation by Timothy Hoctor, VP of Professional Services at Elsevier, about Elsevier's strategic vision and professional services. The key points are:
1) Elsevier aims to increase R&D productivity by linking data across the development spectrum and increase return on information through enhanced search and visualization tools.
2) Elsevier's Professional Services team leverages Elsevier's capabilities to provide customized data management and analysis solutions.
3) Elsevier's strategic objective is to become a leading collaborator in R&D data management through services like data mapping, gap analysis, data governance, and integrated data management.
Publishing and citing presentation for VLAG graduate school Baarlo - Hugo Besemer
This document discusses publishing and impact metrics for PhD students. It covers motivations for publishing, different types of metrics including article, author, journal, and research group metrics. It also discusses citation databases, journal choice factors like impact factor and acceptance rate, and ways to increase citations like networking and claiming publications. Key metrics covered include the h-index, journal impact factor, and relative impact. The document provides examples and interpretations for bibliometric analysis.
Increasing transparency in Medical Education through Open Data - Rebecca Grant
Slides presented at the AMEE Virtual Conference 2021, introducing the MedEdPublish platform and data policies. Approaches to sharing sensitive human data, and particularly qualitative data, are discussed.
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ... - GigaScience, BGI Hong Kong
Scott Edmunds' talk at CODATA2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment, 19 September 2019 in Beijing.
Recommending Scientific Papers: Investigating the User Curriculum - Jonathas Magalhães
In this paper, we propose a Personalized Paper Recommender System, a new user-paper based approach that takes into consideration the user's academic curriculum vitae. To build the user profiles, we use a Brazilian academic platform called CV-Lattes. Furthermore, we examine some issues related to user profiling: (i) we define and compare different strategies to build and represent the user profiles, using terms and using concepts; (ii) we verify how much past information about a user is required to provide good recommendations; (iii) we compare our approaches with the state of the art in paper recommendation using CV-Lattes. To validate our strategies, we conduct a user study involving 30 users in the Computer Science domain. Our results show that (i) our approaches outperform the state of the art on CV-Lattes; (ii) concept profiles are comparable with term profiles; (iii) analyzing the content of the past four years for term profiles and five years for concept profiles achieved the best results; and (iv) term profiles provide better results but are slower than concept profiles, so if the system needs real-time recommendations, concept profiles are better.
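The term-profile idea in this abstract can be sketched as a bag-of-words profile plus cosine similarity. The titles below are invented examples; this is not the paper's actual CV-Lattes pipeline, just the general shape of a terms-based user profile.

```python
# Sketch of a terms-based user profile for paper recommendation:
# aggregate a term-frequency vector from a user's past papers, then
# rank candidate papers by cosine similarity. Titles are invented.
import math
import re
from collections import Counter

def terms(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

# User profile: aggregate terms from papers listed in the CV.
past_papers = [
    "Ontology-based text mining of biomedical literature",
    "Named entity recognition for gene and protein names",
]
profile = Counter()
for title in past_papers:
    profile += terms(title)

candidates = [
    "Text mining approaches to drug target discovery",
    "Low-power embedded systems scheduling",
]
ranked = sorted(candidates, key=lambda c: cosine(profile, terms(c)),
                reverse=True)
print(ranked[0])
```

A concept profile would replace the raw term counts with counts over normalized concepts (e.g. from an ontology), trading some speed for robustness to vocabulary variation, which matches the trade-off the abstract reports.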
Towards knowledge maintenance in scientific digital libraries with the keysto... - jodischneider
JCDL2020 full paper.
Abstract:
Scientific digital libraries speed dissemination of scientific publications, but also the propagation of invalid or unreliable knowledge. Although many papers with known validity problems are highly cited, no auditing process is currently available to determine whether a citing paper’s findings fundamentally depend on invalid or unreliable knowledge. To address this, we introduce a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper, using argumentation theory and citation context analysis. Through two pilot case studies, we demonstrate how the keystone framework can be applied to knowledge maintenance tasks for digital libraries, including addressing citations of a non-reproducible paper and identifying statements most needing validation in a high-impact paper. We identify roles for librarians, database maintainers, knowledge base curators, and research software engineers in applying the framework to scientific digital libraries.
doi:10.1145/3383583.3398514
Preprint: http://jodischneider.com/pubs/jcdl2020.pdf
Research in the time of Covid: Surveying impacts on Early Career Researchers - Rebecca Grant
The document summarizes key findings from a survey of nearly 5,000 researchers conducted between March and July 2020 regarding the impact of COVID-19 on research. Some of the main findings include:
- Early career researchers reported being more impacted by COVID-19 than other career stages, with 33% saying their research was extremely or very impacted
- Half of respondents expect to reuse open data from other labs during lockdown and 65% expect to reuse their own data
- Data reuse is seen as important for allowing research to continue given restrictions, with a 10-15% increase in intention to reuse data during and after the pandemic
- While early career researchers were generally more supportive of data sharing than other career stages, concerns around mis
Drug Discovery Data Insights with Andrew Leach (ChEMBL), Evan Bolton (PubChem... - Lixin Liu
The emergence of freely available PubChem and ChEMBL resources for chemical and biological Structure-Activity-Relationship (SAR) data has radically changed the global drug discovery informatics landscape. The heterogeneity of low throughput and high throughput chemical and biological data do nevertheless present some unique challenges – and opportunities – when creating large-scale community resources – for sophisticated purveyors of drug discovery data.
Do Open data badges influence author behaviour? A case study at Springer Nature - Rebecca Grant
Digital badges have previously been shown to incentivise journal authors to share their data openly. In this paper we introduce an Open data badging project at the Springer Nature journal BMC Microbiology. The development of the Open data badge is described, as well as the challenges of developing standard badging criteria and ensuring authors’ awareness of the badges. Next steps for the badging project are outlined, which are based on the experiences of the team assessing the badges, the number of badges awarded at the journal to date, and the results of an author survey.
Slides from CDD's March 22 Webinar - Penetrating Gram Negative Bacteria. Hosted by Brad Sherborne (Merck) featuring Derek Tan (Memorial Sloan Kettering Cancer Center) and Helen Zgurskaya (University of Oklahoma).
A poster presented at the 2016 Annual Meeting of the Medical Library Association on a strategy for identifying emerging technologies through PubMed searching. This is an outcome of the MLA systematic review project, part of the association's research initiative.
Text mining and summarization technologies can help researchers in 3 key ways:
1) By systematically screening the large volume of literature in their field to quickly assess relevance and quality of papers.
2) By providing quick, informative bullet-point overviews and summaries of academic papers that highlight limitations, saving researchers time.
3) By extracting references, figures, tables and datasets to allow researchers to analyze information in more depth and follow citation trails more efficiently.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
The class outline covers an introduction to unstructured data analysis; word-level analysis using the vector space model and TF-IDF; going beyond word-level analysis with natural language processing; and a text mining demonstration in R using Twitter data. The document provides background on text mining, defining what it is and its tasks, and discusses features of text data and methods for acquiring texts. It then covers word-level analysis methods such as the vector space model and TF-IDF and their applications, discusses the limitations of word-level analysis and how natural language processing can help, and finally demonstrates Twitter mining in R.
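The word-level analysis the class describes can be sketched as follows. The class demo used R, but the same idea is shown here in Python with a common TF-IDF variant (term frequency times log inverse document frequency); this is one standard weighting, not necessarily the exact formula used in the class, and the documents are invented.

```python
# Minimal TF-IDF in a vector space model: each document becomes a vector
# of term weights; document-specific terms are boosted, terms shared
# across documents are downweighted. Documents are toy examples.
import math
import re
from collections import Counter

docs = [
    "text mining finds patterns in text",
    "mining companies extract minerals",
    "visualization techniques reveal patterns",
]

tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
df = Counter()                      # document frequency per term
for tokens in tokenized:
    df.update(set(tokens))
n_docs = len(docs)

def tfidf(tokens):
    """TF-IDF weights for one tokenized document."""
    tf = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

vectors = [tfidf(tokens) for tokens in tokenized]
# 'mining' appears in two documents, so it weighs less than a
# document-specific term like 'text' in the first document.
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
```

Documents represented as such vectors can then be compared with cosine similarity, which is the core of the vector space model the class covers before moving beyond word-level analysis.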
Open science and the individual researcherBram Zandbelt
Slides for the Feb 8, 2017 lab meeting of Roshan Cools' Motivation & Cognitive Control group (Donders Institute), discussing the following paper:
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., … Yarkoni, T. (2016). How open science helps researchers succeed. eLife, 5, e16800. https://doi.org/10.7554/eLife.16800.
This document provides an overview and introduction to the statistical software R. It describes how R can be obtained and installed. R is a free and open-source software environment for statistical analysis and graphics. The document outlines the basic features of the R environment, including how to work with data and packages in R. It provides a conceptual overview of the organization of the book, which uses R and biological examples to teach statistics concepts ranging from basic to advanced topics.
Annotation examples. This is an overview of some of the software I have used for annotation (and a few extra features some of this software has.) This was presented in the SwissUniversities Doctoral Programme, Language & Cognition, in the Module: Linguistic and corpus perspectives on argumentative discourse.
Screenshots are given of GATE, UAM Corpus Tool, Excel, BRAT, EPPI Reviewer, and a custom tool. In most cases there are references to one of my papers for further details.
I briefly describe a typical annotation process:
Find text of interest
Find phenomena of interest
Draft an annotation manual
Iteratively test annotation & revise manual
Find questionable annotations, check disagreements.
Revise the manual.
Iterate.
Annotate
Quertle is a biomedical big data analytics company that provides a platform using artificial intelligence and other advanced techniques to analyze over 40 million biomedical documents. Their platform allows for more comprehensive and precise searches compared to keyword-based searches, and can discover relationships and make connections that other tools cannot. The platform also provides predictive visual analytics and concept-oriented exploration of the data to provide actionable insights. Quertle aims to help address issues with the growing volume of biomedical literature and information that is missed with current approaches.
Using Bioinformatics Data to inform Therapeutics discovery and developmentEleanor Howe
Diamond Age Data Science and Zafgen, Inc, co-present on their work in using bioinformatics data effectively in the context of a small therapeutics company.
Eleanor Howe, PhD, CEO of Diamond Age, presents on the different types of computational biologist, the characteristics of a good bioinformatics team, and the pluses and minuses of using deep learning/AI in a discovery biology context.
Huseyin Mehmet, VP of Discovery Research at Zafgen, describes his team's work with Diamond Age and uses their capabilities to inform Zafgen's drug development. He discusses the needs of biotech companies for a diverse, experience bioinformatics team.
This document summarizes a presentation by Christina Pikas on how librarians at special libraries like the Johns Hopkins Applied Physics Laboratory (APL) can provide bibliometric analysis services. Pikas discusses how librarians' domain knowledge, access to data, and understanding of ethics uniquely positions them to analyze research output and collaboration in a reliable way. She provides examples of bibliometric questions answered at APL and the tools used. Pikas concludes that librarians should leverage their skills and study bibliometrics to support research assessment activities.
This document summarizes a presentation by Timothy Hoctor, VP of Professional Services at Elsevier, about Elsevier's strategic vision and professional services. The key points are:
1) Elsevier aims to increase R&D productivity by linking data across the development spectrum and increase return on information through enhanced search and visualization tools.
2) Elsevier's Professional Services team leverages Elsevier's capabilities to provide customized data management and analysis solutions.
3) Elsevier's strategic objective is to become a leading collaborator in R&D data management through services like data mapping, gap analysis, data governance, and integrated data management.
Publishing and citing presentation for VLAG graduate school BaarloHugo Besemer
This document discusses publishing and impact metrics for PhD students. It covers motivations for publishing, different types of metrics including article, author, journal, and research group metrics. It also discusses citation databases, journal choice factors like impact factor and acceptance rate, and ways to increase citations like networking and claiming publications. Key metrics covered include the h-index, journal impact factor, and relative impact. The document provides examples and interpretations for bibliometric analysis.
Increasing transparency in Medical Education through Open Data Rebecca Grant
Slides presented at the AMEE Virtual Conference 2021, introducing the MedEdPublish platform and data policies. Approaches to sharing sensitive human data, and particulary qualitative data, are discussed.
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
Scot Edmunds talk at CODATA2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment. 19th September 2019 in Beijing
Recommending Scientific Papers: Investigating the User CurriculumJonathas Magalhães
In this paper, we propose a Personalized Paper Recommender System, a new user-paper based approach that takes into consideration the user academic curriculum vitae. To build the user profiles, we use a Brazilian academic platform called CV-Lattes. Furthermore, we examine some issues related to user profiling, such as (i) we define and compare different strategies to build and represent the user profiles, using terms and using concepts; (ii) we verify how much past information of a user is required to provide good recommendations; (iii) we compare our approaches with the state-of-art in paper recommendation using the CV-Lattes. To validate our strategies, we conduct a user study experiment involving 30 users in the Computer Science domain. Our results show that (i) our approaches outperform the state-of-art in CV-Lattes; (ii) concepts profiles are comparable with the terms profiles; (iii) analyzing the content of the past four years for terms profiles and five years for concepts profiles achieved the best results; and (iv) terms profiles provide better results but they are slower than concepts profiles, thus, if the system needs real time recommendations, concepts profiles are better.
Towards knowledge maintenance in scientific digital libraries with the keysto...jodischneider
JCDL2020 full paper.
Abstract:
Scientific digital libraries speed dissemination of scientific publications, but also the propagation of invalid or unreliable knowledge. Although many papers with known validity problems are highly cited, no auditing process is currently available to determine whether a citing paper’s findings fundamentally depend on invalid or unreliable knowledge. To address this, we introduce a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper, using argumentation theory and citation context analysis. Through two pilot case studies, we demonstrate how the keystone framework can be applied to knowledge maintenance tasks for digital libraries, including addressing citations of a non-reproducible paper and identifying statements most needing validation in a high-impact paper. We identify roles for librarians, database maintainers, knowledge base curators, and research software engineers in applying the framework to scientific digital libraries.
doi:10.1145/3383583.3398514
Preprint: http://jodischneider.com/pubs/jcdl2020.pdf
Research in the time of Covid: Surveying impacts on Early Career ResearchersRebecca Grant
The document summarizes key findings from a survey of nearly 5,000 researchers conducted between March and July 2020 regarding the impact of COVID-19 on research. Some of the main findings include:
- Early career researchers reported being more impacted by COVID-19 than other career stages, with 33% saying their research was extremely or very impacted
- Half of respondents expect to reuse open data from other labs during lockdown and 65% expect to reuse their own data
- Data reuse is seen as important for allowing research to continue given restrictions, with a 10-15% increase in intention to reuse data during and after the pandemic
- While early career researchers were generally more supportive of data sharing than other career stages, concerns around mis
Drug Discovery Data Insights with Andrew Leach (ChEMBL), Evan Bolton (PubChem...Lixin Liu
The emergence of freely available PubChem and ChEMBL resources for chemical and biological Structure-Activity-Relationship (SAR) data has radically changed the global drug discovery informatics landscape. The heterogeneity of low-throughput and high-throughput chemical and biological data nevertheless presents some unique challenges, and opportunities, for sophisticated purveyors of drug discovery data when creating large-scale community resources.
Do Open data badges influence author behaviour? A case study at Springer NatureRebecca Grant
Digital badges have previously been shown to incentivise journal authors to share their data openly. In this paper we introduce an Open data badging project at the Springer Nature journal BMC Microbiology. The development of the Open data badge is described, as well as the challenges of developing standard badging criteria and ensuring authors’ awareness of the badges. Next steps for the badging project are outlined, which are based on the experiences of the team assessing the badges, the number of badges awarded at the journal to date, and the results of an author survey.
Slides from CDD's March 22 Webinar - Penetrating Gram Negative Bacteria. Hosted by Brad Sherborne (Merck) featuring Derek Tan (Memorial Sloan Kettering Cancer Center) and Helen Zgurskaya (University of Oklahoma).
A poster presented at the 2016 Annual Meeting of the Medical Library Association on a strategy for identifying emerging technologies through Pubmed searching. This is an outcome from the MLA systematic review project from the association's research initiative.
Text mining and summarization technologies can help researchers in 3 key ways:
1) By systematically screening the large volume of literature in their field to quickly assess relevance and quality of papers.
2) By providing quick informative overviews and summaries of academic papers in bullet points highlighting limitations to save researchers time.
3) By extracting references, figures, tables and datasets to allow researchers to analyze information in more depth and follow citation trails more efficiently.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
The class outline covers introduction to unstructured data analysis, word-level analysis using vector space model and TF-IDF, beyond word-level analysis using natural language processing, and a text mining demonstration in R mining Twitter data. The document provides background on text mining, defines what text mining is and its tasks. It discusses features of text data and methods for acquiring texts. It also covers word-level analysis methods like vector space model and TF-IDF, and applications. It discusses limitations of word-level analysis and how natural language processing can help. Finally, it demonstrates Twitter mining in R.
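The TF-IDF weighting the class covers can be sketched in a few lines. The toy corpus and function below are my own illustration, not the class materials, and are in Python rather than the R used in the class demonstration:

```python
import math
from collections import Counter

# Toy corpus: each document is a bag of words (vector space model).
docs = [
    "text mining extracts knowledge from text".split(),
    "mining twitter data with r".split(),
    "knowledge discovery in databases".split(),
]

def tf_idf(term, doc, corpus):
    """TF-IDF: term frequency in the document, scaled by inverse
    document frequency across the corpus. Assumes the term occurs
    in at least one corpus document."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)  # rare terms get a higher weight
    return tf * idf

print(round(tf_idf("mining", docs[0], docs), 4))
```

A term that occurs in every document gets an IDF of zero, which is why very common words contribute nothing to the vector space representation.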
Visualizing Text: Seth Redmore at the 2015 Smart Data Conferencesredmore
Seth Redmore talks about text and data visualization at this year's Smart Data Conference.
He covers:
-Common software packages for visualization
-Structured plots for unstructured text: Lines vs. bars vs. box plots vs. pie charts vs. bubble charts
-Less structured plots: word clouds vs. treemaps vs. clusters vs. graphs
-Moving plots: animations over time
The document discusses text mining and provides examples. It defines text mining as the extraction of implicit knowledge from large amounts of textual data. It discusses applications such as marketing, industry research, and job seeking. Key text mining methods covered include information retrieval, information extraction, web mining, and clustering. The document outlines the text mining process and discusses text characteristics, learning methods such as classification and clustering, and evaluation metrics. Examples are provided to illustrate classification using decision trees and k-nearest neighbors on structured and unstructured text data.
This document outlines a seminar on text mining by examples presented by Hadi Mohammadzadeh. The seminar covers new terminologies related to text mining, WordNet as a lexical database, the Reuters-21578 text collection, CMU text learning group data archives, text mine software algorithms, and useful websites. The seminar is divided into seven parts covering these topics in detail with examples.
This document provides a tutorial on text mining and text stream mining techniques. It covers common text mining processes like transforming text into vector space models using bag-of-words representations, computing term weights, and applying machine learning algorithms. Specifically, it discusses vector space models, term weighting using TF-IDF, cosine similarity as a distance measure, and machine learning algorithms for classification like k-Nearest Neighbors, nearest centroid classification, and support vector machines. The tutorial is intended to provide an overview of fundamental text mining and text stream mining concepts.
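The cosine similarity and nearest-centroid classification steps the tutorial lists can be sketched as follows. This is a minimal Python illustration with made-up term-weight vectors, not the tutorial's own code:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors:
    dot(a, b) / (|a| * |b|). 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_centroid(query, classes):
    """Assign the query vector to the class whose mean (centroid)
    vector it is most similar to."""
    def centroid(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    return max(classes, key=lambda c: cosine_similarity(query, centroid(classes[c])))

# Hypothetical 3-term weight vectors for two tiny document classes.
classes = {
    "sports": [[2.0, 0.0, 1.0], [3.0, 0.5, 0.0]],
    "finance": [[0.0, 2.0, 0.5], [0.5, 3.0, 0.0]],
}
print(nearest_centroid([0.1, 2.5, 0.2], classes))  # → finance
```

Cosine similarity ignores document length, which is why it is preferred over Euclidean distance for comparing documents of very different sizes.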
This document introduces an online course on data warehousing from Edureka. It provides an overview of key topics that will be covered in the course, including what a data warehouse is, its architecture, the ETL process, and modeling dimensions and facts. It also shows examples of using PostgreSQL to create tables and Talend to populate them as part of a hands-on project in the course. The course modules will cover data warehousing introduction, dimensions and facts, normalization, modeling, ETL concepts, and a project building a data warehouse using Talend.
Big Data & Text Mining: Finding Nuggets in Mountains of Textual Data
A large amount of information is available in textual form in databases and online sources, and for many enterprise functions (marketing, maintenance, finance, etc.) it represents a huge opportunity to improve business knowledge. For example, text mining is starting to be used in marketing, more specifically in analytical customer relationship management, in order to achieve the holy grail of a 360° view of the customer (integrating elements from inbound mails, web comments, surveys, internal notes, etc.).
Facing this new domain, I carried out some personal research and produced a synthesis, which helped me clarify some ideas. The presentation below does not intend to be exhaustive on the subject, but may bring you some useful insights.
The document is a chapter from a textbook on data mining written by Akannsha A. Totewar, a professor at YCCE in Nagpur, India. It provides an introduction to data mining, including definitions of data mining, the motivation and evolution of the field, common data mining tasks, and major issues in data mining such as methodology, performance, and privacy.
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
This document discusses analyzing Twitter data using text mining techniques in R. It outlines extracting tweets from Twitter and cleaning the text by removing punctuation, numbers, URLs, and stopwords. It then analyzes the cleaned text by finding frequent words, word associations, and creating a word cloud visualization. It performs text clustering on the tweets using hierarchical and k-means clustering. Finally, it models topics in the tweets using partitioning around medoids clustering. The overall goal is to demonstrate various text mining and natural language processing techniques for analyzing Twitter data in R.
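The deck works in R; the cleaning steps it describes (removing punctuation, numbers, URLs, and stopwords before counting frequent words) can be sketched in Python as a rough equivalent. The stopword list and regexes below are illustrative assumptions, not the deck's code:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "to", "of", "and", "in", "is", "for", "on", "with"}

def clean_tweet(text):
    """Mirror the cleaning steps applied in the deck:
    strip URLs, punctuation, and numbers; lowercase; drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)              # remove punctuation/numbers
    return [w for w in text.split() if w not in STOPWORDS]

tweets = [
    "Text mining in #rstats is fun! http://example.com",
    "Mining Twitter data with R: 101 tips",
]
tokens = [w for t in tweets for w in clean_tweet(t)]
print(Counter(tokens).most_common(1))  # → [('mining', 2)]
```

The resulting token frequencies feed directly into the word-frequency, association, and word-cloud analyses the deck then performs.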
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GrahamSmith646206
Supporting research data across Springer Nature: joining up policy and practice. Slides from Graham Smith (Research Data Manager, Springer Nature) at HKU Open Data and Data Publishing Seminar, 25th October 2021.
This document provides an overview and syllabus for a course on bioinformatics. It discusses the goals of learning about available bioinformatics programs and tools, and interpreting their outputs. The course will cover topics like sequence alignment, phylogenetics, genome comparison and using databases. Assessment will include homework, exams, a report, and participation. The document contrasts the "old" and "new" biology, noting how the new biology generates large datasets that require computational analysis to make sense of the data. It emphasizes that bioinformatics uses algorithms and databases to organize, analyze and interpret biological data at large scales.
Developing core common outcomes for tropical peatland research and managementMark Reed
Presentation by Prof Mark Reed at CIFOR Indonesian to open UN Global Peatland Initiative workshop to identify key variables that should be measured in tropical peatland research and monitoring. Workshop co-facilitated by Mark Reed and Dylan Young, with slides adapted from a presentation by Gav Stewart, Newcastle University.
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
This document discusses topic modeling on 350 million documents from Mendeley. It describes how topic modeling can be used to categorize documents into topics and subcategories, though categorization is imperfect and topics change over time. It also discusses how topic modeling and metrics can help with fact discovery and reproducibility of research to build more robust datasets.
Practical applications for altmetrics in a changing metrics landscapeDigital Science
"Practical applications for altmetrics in a changing metrics landscape" - Sara Rouhi, Altmetric product specialist, and Anirvan Chatterjee, Director Data Strategy for CTSI at UCSF
Gather evidence to demonstrate the impact of your researchIUPUI
This workshop is the 3rd in a series of 4 titled "Maximize your impact" offered by the IUPUI University Library Center for Digital Scholarship. Faculty must provide strong evidence of impact in order to achieve promotion and tenure. Having strong evidence in year 5 is made easier by strategic dissemination early in your tenure track. In this hands-on workshop, we will introduce key sources of evidence to support your case, demonstrate strategies for gathering this evidence, and provide a variety of examples. These sources include citation metrics, article level metrics, and altmetrics as indicators of impact to support your narrative of excellence.
The document discusses research challenges and commercialization challenges. It provides definitions of basic research and applied research. It explains the differences between research and development approaches. It outlines typical activity details and timeframes for research processes like establishing context, selecting and designing methods, undertaking research, analysis and validation, and review and evaluation. It also discusses managing researchers, choosing good scientific problems, and MIMOS' role in supporting industry and market creation through technology creation, research, and commercialization.
Managing 'Big Data' in the social sciences: the contribution of an analytico-...CILIP MDG
This document discusses managing large datasets in the social sciences. It describes how the UK Data Service curates and provides access to large survey and census data. It explores how classification schemes could help organize and provide subject access to these growing datasets. A pilot project classified datasets using the Universal Decimal Classification scheme and found it efficient and helped visualize subject categories. Overall, carefully chosen knowledge organization tools can help provide multidimensional subject access needed to analyze complex datasets.
Crowdsourcing platforms are revolutionizing research by providing a way to collect clinical and behavioral data with unprecedented speed and efficiency. This seminar explores another digital platform called TurkPrime that is designed to support research participant recruitment. TurkPrime is a relatively new panel service that allows researchers to target specific demographic groups. If you watched our previous webinar on Amazon’s Mechanical Turk, also known as MTurk, you may find it interesting that TurkPrime offers a proportional matching sampling approach rather than MTurk’s opt-in, convenience sampling approach. Tasks that can be implemented with TurkPrime include: excluding participants on the basis of previous participation, longitudinal studies, making changes to a study while it is running, automating the approval process, increasing the speed of data collection, sending bulk e-mails and bonuses, enhancing communication with participants, monitoring dropout and engagement rates, providing enhanced sampling options, and many others.
The document discusses optimizing content findability. It emphasizes the importance of governance, organization, user involvement, and metadata to improve search and findability. Successful organizations allocate resources to analyze search usage and improve information architecture through taxonomy and metadata. User testing, feedback loops, and search analytics are also recommended to enhance findability.
Not just for STEM: Open and reproducible research in the social sciencesUoLResearchSupport
On Thursday 22nd April 2021, Dr Viktoria Spaiser spoke about how open and reproducible research is currently practiced in the social sciences, how it varies in quantitative, computational, and qualitative social research and how these practices are currently changing. She also discussed what the specific barriers for open and reproducible research in social science are and how at least some of them could be addressed in the future.
Viktoria Spaiser is an Associate Professor in Sustainability Research and Computational Social Sciences at the School of Politics and International Studies, University of Leeds. Viktoria is interested in sustainability research and specifically in how societies can make a rapid, fair and empowering transition to zero-emissions / zero-pollution. She applies mathematical and computational approaches to these and other social and political science research questions.
The document provides information on various aspects of research methodology. It defines key terms such as research, theory, data types, data collection methods, research design and sampling. It discusses primary and secondary data sources and the advantages and limitations of each. Various data collection techniques for qualitative and quantitative data are also outlined.
Open from beginning to end: addressing barriers to open research - a personal...UoLResearchSupport
Open and reproducible research practises are increasingly recognised as important to scientific integrity. However, there are numerous barriers including research culture - whether as a sector, institution or discipline - lack of training and professional incentives and funding of infrastructure.
On 26 May 2021 Dr Marlene Mengoni was one of two speakers at an event exploring barriers to open research.
Dr Marlene Mengoni is a member of the Institute of Medical & Biological Engineering (IMBE) at the University of Leeds and is interested in theoretical aspects of musculoskeletal tissues biomechanics with a fundamental computational engineering approach.
Speaking from an engineering perspective, Dr Mengoni discussed how the research culture at the University of Leeds can help to foster open research practices, throughout the research cycle, including embedding "open" in research and training.
Let's Talk Research 2015 -Juliet Goldbart - Introduction To Qualitative Metho...NHSNWRD
Introduction To Qualitative Methods: Different Approaches For Different Contexts
Jois Stansfield, Maxine Holt, Nigel Cox, Suzanne Gough, Juliet Goldbart, MMU
Why is Test Driven Development so hard to implement in an analytics platform?Phil Watt
Test Driven Development (TDD) is a common pattern in software engineering that helps reduce cycle time, improve code quality and reduce production defects. Within data engineering and analytics projects, TDD is held up as best practice in development and maintenance lifecycle phases. Anecdotally, many organisations do not see the promised benefits of TDD in an analytics context, prompting the question:
Why is it so hard to effectively implement Test Driven Development in an analytics platform?
This talk outlines Phil's research so far in his master's thesis on the topic of test automation in data and analytics projects. He presents seven key challenges revealed in academic studies, and the next steps in the research process.
Research Data Management Services at UWA (November 2015)Katina Toufexis
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
The Simulacrum, a Synthetic Cancer DatasetCongChen35
This presentation describes the applications of synthetic data to cancer registries's efforts to support understanding of and research based on cancer while reducing privacy risks to cancer patients.
The Simulacrum imitates some of the data held securely by Public Health England’s National Cancer Registration and Analysis Service.
The data in the Simulacrum is entirely artificial. It does not contain data about real patients, so users can never identify a real person. It is free to use and allows anyone who wants to use record-level cancer data to do so, safe in the knowledge that while the data feels like the real thing, there is no danger of breaching patient confidentiality.
- The document discusses open science and various techniques used in the Data4Impact project such as text analysis, social media data collection from Twitter, and linked open data.
- It provides an overview of science norms and compares traditional CUDOS norms to more open PLACE norms.
- Data4Impact aims to build a knowledge graph linking different data sources to analyze the impact of research and innovation funding through new metrics and indicators. Machine learning and linked open data techniques are applied.
From "A National Approach to Open Research Data in Ireland", a workshop held on 8 September 2017 in National Library of Ireland, organised by The National Library of Ireland, the Digital Repository of Ireland, the Research Data Alliance and Open Research Ireland.
Similar to Text mining full text for molecular targets
How predictive models help Medicinal Chemists design better drugs_webinarAnn-Marie Roche
All scientific disciplines, including medicinal chemistry, are experiencing a revolution in unprecedented rates of data being generated and the subsequent analysis and exploitation of this data is increasingly fundamental to innovation. Using data to design better compounds is a challenge for Medicinal and Computational chemists.
The design of small-molecule drug candidates, encompassing characteristics such as potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) is a key factor in the success of clinical trials and computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based.
In this webinar our expert Dr. Olivier Barberan will discuss ligand-based methods and he will cover the following:
- How to use ligand information alone to predict activity based on its similarity or dissimilarity to previously known active ligands.
- Ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships, as well as important tools such as target/ligand databases necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign.
Webinar: New RMC - Your lead_optimization Solution June082017Ann-Marie Roche
The document discusses Reaxys Medicinal Chemistry and how it supports hit-to-lead and lead optimization processes. It provides high quality data on topics like efficacy, ADMET properties, and animal models to help computational and medicinal chemists. The pX concept normalizes bioactivity measurements like IC50, Ki, and % inhibition into a single comparable metric, making it possible to compare compound affinity regardless of the metric reported. This allows researchers to more easily search for and analyze active compounds.
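For concentration-based readouts such as IC50 and Ki, the usual way to put activities on one comparable scale is a negative-log transform of the molar value. The sketch below illustrates that idea; it is my own approximation of the concept, not Reaxys's actual pX implementation, and percent-inhibition values would need separate handling:

```python
import math

def px(value_nm):
    """Convert an activity value given in nM to pX = -log10(value in mol/L).
    An IC50 of 10 nM gives pIC50 = 8; more potent (lower) concentrations
    score higher, so values from different assays sit on one scale."""
    return -math.log10(value_nm * 1e-9)

print(round(px(10), 3))    # → 8.0
print(round(px(1000), 3))  # 1 µM → 6.0
```

Because the scale is logarithmic, a difference of one pX unit corresponds to a tenfold difference in potency.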
Oil&Gas Thought Leader Webinar - New Plays for Old Ideas - Dr.Gabor TariAnn-Marie Roche
In our April 2017 webinar, three industry experts shared their research and demonstrated the importance of focusing on fundamental geologic and geophysical research approaches that integrate variety of data, information and concepts from disparate sources and related disciplines.
This back-to-fundamentals research can both inspire and accelerate exploration teams’ thinking about petroleum systems and lead to a path to success.
Dr Gabor Tari is currently the Group Chief Geologist at OMV. He has over 20 years’ experience working in upstream oil & gas and has worked for Amoco, BP, and Vanco, before joining OMV in 2007. Gabor has worked on exploration projects in basins around the globe, including Romania, Angola, North Africa, and the Middle East. He has authored over 50 scientific publications, presented papers at dozens of conferences, and most recently co-authored the book Permo-Triassic Salt Provinces of Europe, North Africa and the Atlantic Margins with Dr Joan Flinch (Repsol) and Juan Soto, Professor of Geodynamics at the University of Granada and at the Instituto Andaluz de Ciencias de la Tierra, Spain; the book is currently available from Elsevier for pre-order online.
Gabor discussed and shared some examples of how new plays can be built on a solid foundation of petroleum system development and research, and how new ideas can be garnered from building on published research of oil & gas companies, academia, service providers and consultants.
Oil&Gas Thought-Leader Webinar - New Plays for Old Ideas - Dr. Rob ForknerAnn-Marie Roche
In our April 2017 webinar, three industry experts shared their research and demonstrated the importance of focusing on fundamental geologic and geophysical research approaches that integrate variety of data, information and concepts from disparate sources and related disciplines. This back-to-fundamentals research can both inspire and accelerate exploration teams’ thinking about petroleum systems and lead to a path to success.
Dr Rob Forkner is a carbonate geologist at Statoil, working in the carbonate plays and reservoirs research group in Austin, Texas, focusing on carbonate play prediction in Atlantic margin systems. Prior to Statoil, Rob worked at Maersk and Shell in onshore and offshore in well planning, geosteering, high-resolution sequence stratigraphy and facies prediction, carbonate sedimentology in unconventional assets, evaporite classification and prediction, rock typing, and more recently, carbonate system suppression and recovery during Oceanic Anoxic Events.
Oil&Gas Thought-Leader Webinar - New Plays for Old Ideas - Dr. Sander HoubenAnn-Marie Roche
Dr. Sander Houben presented on combining paleoceanographic and exploration tools to study Early Jurassic anoxic events. He discussed how carbon isotopes can be used as a stratigraphic tool to analyze perturbations to the carbon cycle during these events. Palynological analysis of indicators of photic zone anoxia and chemocline migration provided insight into changes in water column ecology. A case study of the Toarcian OAE and Posidonia Shale Formation showed how isotopic analyses revealed a major increase in export of hydrogen-rich organic matter due to intensified primary productivity by diazotrophs under low oxygen conditions. Paleoceanographic observations combined with an exploration geology perspective provided understanding of the formation of
Embase for pharmacovigilance: Search and validation March 22 2017Ann-Marie Roche
Scientific literature plays a critical role in Pharmacovigilance and Drug Safety workflows. Monitoring literature for mentions of adverse drug reactions (ADRs) is mandated by regulatory bodies, and marketing authorization holders (MAHs) that do not properly report ADRs can be subject to heavy fines. With an increasing volume of unstructured content to cover, along with rising labor costs, MAHs are looking for ways to make their literature monitoring more effective and efficient.
Abstract and indexing (A&I) databases play an important role in literature monitoring – due to the vast amount of scientific literature published daily – in order for MAHs to locate specific articles or conference presentations that may be relevant for their products (for both benefit/risk analysis and ADR detection). Rather than reading all the literature, MAHs create search strategies that identify the relevant records in A&I databases and execute the searches regularly. GVP module VI mandates that searches are done at least weekly, but many companies maintain a daily monitoring and review cycle.
In this webinar, Senior Product Development Manager for Embase, Dr. Ivan Krstic, discussed best practices for saving time, staying current, validating search strategies and mitigating risk in the face of these increasingly complex literature monitoring processes.
Literature Management for Pharmacovigilance: Outsource or in-house solution? ...Ann-Marie Roche
Pharmaceutical companies are required to screen scientific literature on a regular basis, and this comes with many challenges, such as handling large amounts of data, building search strings and integrating EMA MLM results. Outsourcing literature screening to service providers reduces the workload for the PV team, but how does it impact the literature management process overall? Does it result in decreased oversight and additional activities like audits and reconciliation? And what about building the search strategy?
During this webinar our PV expert, Dr. Joyce De Langen spoke about the following:
• The importance of literature management in Pharmacovigilance and the challenges.
• An evaluation of the benefits and risks of outsourcing literature management versus alternative solutions.
About the speaker:
Joyce de Langen, Ph.D has more than 10 years of experience in the domain of pharmacovigilance and drug safety. Through her work in the pharmaceutical industry, academia and regulatory authorities, Joyce has developed a broad perspective and knowledge in pharmacovigilance and drug safety.
Finding the right medical device information in embase 11 2016Ann-Marie Roche
The document discusses guidelines for systematic reviews of biomedical literature in Clinical Evaluation Reports (CERs) for medical devices, highlighting how Embase addresses the requirements through its comprehensive indexing of devices, manufacturers, and adverse effects, as well as features for building sensitive searches. It also provides examples of searches in Embase to find information on device clinical performance, comparisons, and safety for a case study on an everolimus eluting coronary stent.
The document discusses medical device adverse event reporting requirements, including definitions of reportable events and timelines for submitting reports to regulatory agencies. It provides an overview of the classification system for medical devices and regulations around reporting malfunctions, deaths and serious injuries caused by devices. Reporting requirements and challenges involving software as a medical device are also reviewed.
The All-New 2016 Engineering Academic Challenge - developed by students for students
The Engineering Academic Challenge (formerly the Knovel Academic Challenge) is an immersive, 5-week interactive problem-set competition, featuring weekly thematic engineering challenges built around five transdisciplinary themes inspired by the National Academy of Engineering Grand Challenges.
Literature monitoring for pv what are we doing at galderma elsevier webinarAnn-Marie Roche
The document discusses literature monitoring for pharmacovigilance. It describes weekly monitoring of individual case safety reports and periodic monitoring through development safety update reports and periodic benefit-risk evaluation reports. Key databases for literature searches are Medline and Embase. While Embase has more extensive drug coverage, searches on Medline via PubMed are more reliable due to the potential for loss of MeSH subheadings when mapping to Emtree and the risk of false negatives and positives when searching Embase alone. Literature searches support signal detection and periodic evaluation of a product's safety profile.
This document discusses how drug analytics based on manually extracted semantic relationships in Embase can support drug development, repurposing, and safety. Manually indexed relationships between drugs, diseases, and adverse reactions provide valuable information for these workflows. Specific examples show how the semantic relationships can guide drug repositioning strategies, investigate new combination drugs, identify drug-drug interactions, collect drug comparison data, and help improve risk management.
This document discusses Lean Six Sigma and resources available through Knovel to support Lean Six Sigma implementation. It provides an overview of the Lean Six Sigma implementation process including strategic leadership and vision, deployment planning, and execution and results. It describes Knovel's Lean Six Sigma resources such as handbooks, case studies, templates, and guides covering tools like DMAIC, DOE, SPC etc. that can help with the different belts and project phases from Define to Control. Other resources discussed include those for Design for Six Sigma and practical applications/case studies.
Reaxys provides a unified information portal that integrates data from multiple chemistry sources through a single interface. It links chemistry data, structures, citations, and full-text articles. Reaxys also integrates in-house data from sources like electronic lab notebooks through its API and can be used for activities like compound screening, literature searching, and patent analysis to support drug discovery.
Phil Lorenzi discusses pathway analysis approaches and their uses in biomedical research and drug development. He compares strategies for analyzing the autophagy and apoptosis pathways, finding that integrating multiple methods provides the most comprehensive understanding. Lorenzi also provides examples of how pathway analysis could have predicted problems with COX-2 inhibitors and helped explain past failures of AKT inhibitors. He concludes that pathway analysis is consistent with approvals of EGFR, MEK, RANKL and PARP inhibitors and may support development of GLS inhibitors.
Searching literature databases for post authorisation safety studies (pass)Ann-Marie Roche
This document discusses using literature databases like Embase to conduct post-authorization safety studies (PASS) through systematic literature reviews and meta-analyses. It provides an example PASS on the drug brentuximab vedotin that identified adverse events like peripheral neuropathy and infections. The document reviews how to structure a literature search using the PICO framework and Embase's in-depth indexing of concepts, relationships, and causality to comprehensively identify safety outcomes reported for a drug.
Julie glanville embase sunrise seminar may 2016Ann-Marie Roche
Simple text mining tools can help Embase users in several ways:
- Frequency analysis of terms in records can identify useful search terms and concepts to explore. Tools like EndNote and Voyant allow viewing frequencies of words in titles, abstracts, and subject headings.
- Phrase analysis identifies common word combinations or concepts in the text, beyond single words. Voyant and TERMINE are useful for this.
- Word collocation analysis shows which words frequently occur near each other, suggesting relationships between ideas. The Voyant collocates tool supports this.
- Cluster and network visualizations identify major themes or concepts within a set of records. VOSviewer creates visual maps of related terms.
Exploring records
Ian crowlesmith embase retrospective mla 2016Ann-Marie Roche
Embase began in 1946 as Excerpta Medica, founded to provide medical abstracts. It was acquired by Elsevier in 1971 and became available online in 1978. Key developments included introducing a controlled vocabulary called Emtree in 1987 and adding item types and check tags for evidence-based medicine in 1990. Currently, Embase indexes articles in great depth using natural language and extensively covers drugs and devices. The taxonomy Emtree is regularly updated to reflect new terms.
The document provides an update on new features and enhancements to Embase.com. Key points include:
- The addition of a new PICO search page that allows users to build clinical searches by splitting questions into Patient, Intervention, Comparison, and Outcome elements.
- Other enhancements include improved search tips, the ability to add synonyms and view all abstracts, as well as analytics capabilities for drug safety and repurposing based on triple indexing of content.
- Future plans include improvements to content, taxonomy, and indexing as well as a revamp of the search platform interface and functionality.
This document discusses upcoming changes to process safety management (PSM) regulations and standards. It notes several major industrial accidents in recent decades that prompted reforms. New PSM requirements in California will likely be adopted more widely and require more prescriptive tasks, reporting, and accountability. To ensure future PSM success, the document recommends: making no distinction between internal/external compliance; expanding the definition of mechanical integrity; understanding "double jeopardy"; not replacing investigations with management of change; knowing what the operations team is doing; and clarifying teamwork expectations regarding stop work authorizations.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...Scintica Instrumentation
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Current Ms word generated power point presentation covers major details about the micronuclei test. It's significance and assays to conduct it. It is used to detect the micronuclei formation inside the cells of nearly every multicellular organism. It's formation takes place during chromosomal sepration at metaphase.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
1. Text Mining Full Text for Molecular Targets
with George Jiang, Ph.D., M.B.A.
Our webinar will begin in a few minutes.
TO USE YOUR COMPUTER'S AUDIO: When the webinar begins, you will be connected to audio using your computer's microphone and speakers (VoIP). A headset is recommended.
--OR--
TO USE YOUR TELEPHONE: If you prefer to use your phone, you must select "Use Telephone" after joining the webinar and call in using the numbers below. Dial your country's number and then use Access Code: 655-028-479.
Country / Long Distance
Australia +61 3 8488 8993
Austria +43 (0) 7 2088 2171
Belgium +32 (0) 42 68 0164
Canada +1 (647) 497-9386
Denmark +45 (0) 89 88 04 43
Finland +358 (0) 931 58 4587
France +33 (0) 182 880 933
Germany +49 (0) 692 5736 7304
Ireland +353 (0) 19 036 186
Italy +39 0 294 75 15 36
Netherlands +31 (0) 108 080 115
New Zealand +64 (0) 9 801 0293
Norway +47 21 03 72 89
Spain +34 911 23 4247
Sweden +46 (0) 852 500 292
Switzerland +41 (0) 435 0824 40
United Kingdom +44 (0) 330 221 9921
United States +1 (646) 307-1726
2. Text Mining Full Text for Molecular Targets
George Jiang, PhD, MBA
Product Manager, Text Mining
g.jiang@elsevier.com
March 31, 2015
3. George Jiang
Product Manager
Text Mining
• Trained scientist with several years of experience in text analytics, data integration, and scientific software development
• Currently, Product Manager with Elsevier working on text mining projects and semantic search products, based out of Rockville, MD
• Previously, worked at the US National Center for Biotechnology Information (NCBI) on the Discovery Initiative, seeking to understand users' needs and to crosslink and expose data so that research information is more discoverable
Text Mining Full Text for Molecular Targets – March 31, 2015 – George Jiang
4. World Leader in Digital Information Solutions
Founded over 130 years ago. Works with over 30 million scientists, students, health & information professionals.
CONTENT: Published over 330,000 articles in 2013; received over 1 million submissions in 2013; publishes over 2,200 online journals & over 26,000 books (e + print); over 53 million items indexed by Scopus.
PLATFORMS: Elsevier eBooks, Online Journals, Databases
SOLUTIONS:
• Elsevier R+D Solutions – helps corporate researchers, R+D professionals, and engineers improve how they interact with, share, and apply information to solve problems, using our digital workflow tools, analytics, and data.
• Elsevier Research Intelligence – provides universities, governments, and research institutions with the resources and insights to improve institutional research strategy, management, and performance.
• Elsevier Clinical Solutions – helps medical professionals apply trusted data and sophisticated tools to make better clinical decisions, deliver better care, and produce better healthcare outcomes.
• Elsevier Education – helps educate highly skilled, effective healthcare professionals, using the most advanced pedagogical tools and reference works.
5. Working With Text is a Big Data Challenge
Text is everywhere! We've already covered hundreds of terms in this presentation.
• Twitter – 58M tweets/day × 14.98 words/tweet ⇒ ~868M words/day (~6B words/week)
• Average journal article ≈ 10 words in the title, 150 in the abstract, and 6,000 in the full text
• Abstracts – 2.4B words (24M abstracts at PubMed × 100 words/abstract)
• Full text – 144B words (for a comparable set, 24M articles × 6,000 words/article)
The information deluge of scientific content, and how to manage and/or leverage it, is a big data challenge. Information-seeking challenges can be addressed with automation assistance and text mining for greater insight.
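The back-of-envelope word counts above can be checked in a few lines. This is only a sketch: the slide's figures are taken as given, and the 7-day multiplier for a weekly Twitter total is an assumption.

```python
# Rough word-count arithmetic from the slide (assumption: 7-day week
# for the weekly Twitter total; all other figures are the slide's own).
tweets_per_day = 58e6
words_per_tweet = 14.98
twitter_words_per_day = tweets_per_day * words_per_tweet   # ~868M words/day
twitter_words_per_week = twitter_words_per_day * 7         # ~6B words/week

abstract_words = 24e6 * 100     # 2.4B words across PubMed abstracts
full_text_words = 24e6 * 6000   # 144B words for a comparable full-text set

print(f"{twitter_words_per_day / 1e6:.0f}M words/day on Twitter; "
      f"full text has {full_text_words / abstract_words:.0f}x the words of abstracts")
```

The last ratio is the point of the slide: mining full text means working with roughly 60 times more text than mining abstracts alone.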
6. Summary
• Text mining can help to sift through large amounts of scientific literature and other textual content
• Text mining can help to increase project team efficiency to find precise statements and relationships
• Full text articles provide richer result sets that can be useful in finding additional insights that cannot be garnered just using abstracts
• Several hurdles still exist to implement text mining, but the value can outweigh the costs
Text mining full text can be used to help find molecular targets of interest quickly that may be missed if relying on abstracts and keyword searching.
7. Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
8. What is Text Mining?
Text Mining
• Refers to the process of deriving high-quality structured information (facts) from unstructured documents, e.g. simple relationship statements such as "A Does B", "X Inhibits Y", "G Stops D", "I Drink T"
Why Text Mining?
• Text mining can yield better results and increase team efficiency
• The application of text mining techniques can be used to solve business problems
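As a toy illustration of turning sentences into structured facts, a few lines of pattern matching can pull out simple subject-verb-object statements. The verb list and pattern here are hypothetical examples, not Elsevier's extraction engine.

```python
import re

# Toy fact extractor: find "X <verb> Y" statements in raw sentences.
# RELATION_VERBS is a hypothetical example vocabulary, not a real ontology.
RELATION_VERBS = r"(inhibits|activates|stops|lacks|resists|binds)"
PATTERN = re.compile(r"([A-Za-z0-9\-]+)\s+" + RELATION_VERBS + r"\s+([A-Za-z0-9\-]+)")

def extract_triples(sentence: str):
    """Return (subject, relation, object) triples found in one sentence."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in PATTERN.finditer(sentence)]

print(extract_triples("Drug-X inhibits kinase-Y and compound-Z activates FOXO3a."))
```

Real systems replace the regular expression with linguistic parsing and entity recognition, but the output shape, a list of relationship triples, is the same idea the slide describes.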
9. Example of Getting Structured Information (Facts)
From documents to paragraphs, sentences, tokens, and finally fact(s):
Paragraph (excerpt from Taylor et al., "Evaluating the evidence for targeting FOXO3a in breast cancer: a systematic review"):
Tumour cells show greater dependency on glycolysis so providing a sufficient and rapid energy supply for fast growth. In many breast cancers, estrogen, progesterone and epidermal growth factor receptor-positive cells proliferate in response to growth factors and growth factor antagonists are a mainstay of treatment. However, triple negative breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition. Downstream of growth factor receptors, signal transduction proceeds via phosphatidylinositol 3-kinase (PI3k), Akt and FOXO3a inhibition, the latter being partly responsible for coordinated increases in glycolysis and apoptosis resistance. FOXO3a may be an attractive therapeutic target for TNBC. Therefore we have undertaken a systematic review of FOXO3a as a target for breast cancer therapeutics.
Sentence:
Triple negative breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition.
Fact(s):
• TNBC cells lack receptor expression
• TNBC cells are more aggressive
• TNBC cells resist growth factor inhibition
(Word cloud plotted with Wordle.net)
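The document-to-paragraph-to-sentence-to-token breakdown shown on the slide can be sketched with naive rules. This assumes simple punctuation-based splitting, not a production NLP stack.

```python
import re

# Toy breakdown: paragraph -> sentences -> tokens (naive rules only).
paragraph = ("Triple negative breast cancer (TNBC) cells lack receptor expression. "
             "FOXO3a may be an attractive therapeutic target for TNBC.")

# Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

# Naive tokenizer: keep runs of word characters and hyphens.
tokens = [re.findall(r"[\w\-]+", s) for s in sentences]

print(len(sentences))   # 2 sentences
print(tokens[0][:4])    # first four tokens of the first sentence
```

Abbreviations, decimal numbers, and chemical names break naive splitters like this one, which is part of why specialized biomedical tokenizers exist.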
Text Analytics and Visualizations
10. What is Text Mining Being Used For?
Text mining can be used to support several research and development areas, across basic and applied research and across the pipeline from discovery to pre-clinical, clinical, and post-launch.
Use cases include:
• Target identification and prioritization
• Biomarker discovery
• Drug repurposing
• Drug safety and finding adverse events
• Clinical study design and site selection
• Competitive intelligence
Examples along the pipeline:
• Information retrieval and analysis of biomedical literature for target identification, systematic reviews, etc.
• Searching clinical trial data or electronic health records to find signals in patient populations
• Triage of news and papers for literature curation and regulatory reporting
• Identifying relevant items for meta-analysis of specific research results
• Text mining article submissions for curation assistance in publishing
11. How to Text Mine?
Several pieces are often needed to get results from text mining:
• Content
• Ontology (default or custom)
• Software solution(s)
• Expertise
And several steps:
1. Aggregate – gather the corpus, e.g. PDF -> XML conversion (XML quality differs)
2. Structure – make the XML uniform, e.g. dealing with different sources, types, etc.
3. Normalize – map the text to a default or custom ontology
4. Integrate – text mine the corpus, balancing expectations of precision and recall
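The four steps can be sketched as a minimal pipeline. All function bodies below are illustrative placeholders, and the ontology mapping is a hypothetical example, not Elsevier's implementation.

```python
# Minimal aggregate/structure/normalize/integrate pipeline sketch.

def aggregate(raw_docs):
    """Step 1: collect documents (here: already-extracted plain text)."""
    return [d for d in raw_docs if d]           # drop empty/failed conversions

def structure(docs):
    """Step 2: impose a uniform record shape on heterogeneous sources."""
    return [{"body": d} for d in docs]

def normalize(records, ontology):
    """Step 3: map surface forms to preferred ontology terms."""
    for r in records:
        for surface, preferred in ontology.items():
            r["body"] = r["body"].replace(surface, preferred)
    return records

def integrate(records, query):
    """Step 4: mine the normalized corpus (here: trivial matching)."""
    return [r for r in records if query in r["body"]]

ontology = {"TNBC": "triple negative breast cancer"}   # hypothetical mapping
docs = ["TNBC cells lack receptor expression", "", "FOXO3a is a target"]
hits = integrate(normalize(structure(aggregate(docs)), ontology),
                 "triple negative breast cancer")
print(len(hits))  # 1
```

Note how normalization is what lets the query match a record that never contained the query string verbatim; that is the practical payoff of the ontology step.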
Text Mining Solutions & Professional Services
12. Elsevier Offers Several Text Mining Solutions
Software solutions and Professional Services are available for text mining and semantic searching.
Content from journals and books, patents, public data sources, internal content, and other sources, together with user questions, flows into the software solution (UI / API) through the four steps – 1. Aggregate, 2. Structure, 3. Normalize, 4. Integrate – to get facts and data out that support downstream applications and activities.
13. Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
14. Abstracts vs Full Text
Summary of main differences:
• Abstracts – concise summaries; readily accessible; relatively uniform
• Full text – complete documents; may not be as accessible; the information within can vary
15. Benefits of Using Full Text
Full text provides richer result sets:
• Distribution of keywords, facts and relations – more keywords, facts and relations are found in full text
• Concept under-representation in abstracts – specific entities, e.g. biological functions, may not be mentioned in abstracts but primarily in full text sections
• Missing negative data – negative results or non-significant data are often missing from abstracts
• Citations per article – full text sections are cited more than abstracts
• Timeliness – relevant facts and relationships can be found in full text before any mention in abstracts
16. Additional Reading
Articles highlighting the differences between abstracts and full text:
• Information extraction from full text scientific articles: where are the keywords? BMC Bioinformatics. 2003 May 29;4:20.
• Beyond genes, proteins, and abstracts: identifying scientific claims from full-text biomedical articles. J Biomed Inform. 2010 Apr;43(2):173-89. doi: 10.1016/j.jbi.2009.11.001.
• Do peers see more in a paper than its authors? Adv Bioinformatics. 2012;2012:750214. doi: 10.1155/2012/750214.
• Is searching full text more effective than searching abstracts? BMC Bioinformatics. 2009 Feb 3;10:46. doi: 10.1186/1471-2105-10-46.
• Challenges for automatically extracting molecular interactions from full-text articles. BMC Bioinformatics. 2009 Sep 24;10:311. doi: 10.1186/1471-2105-10-311.
• Semi-automatic indexing of full text biomedical articles. AMIA Annu Symp Proc. 2005:271-5.
• Discovering implicit associations between genes and hereditary diseases. Pac Symp Biocomput. 2007:316-27.
• The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics. 2010 Sep 29;11:492. doi: 10.1186/1471-2105-11-492.
• Abstracts in high profile journals often fail to report harm. BMC Med Res Methodol. 2008 Mar 27;8:14. doi: 10.1186/1471-2288-8-14.
• Quality of abstracts of original research articles in CMAJ in 1989. CMAJ. 1991 Feb 15;144(4):449-53.
• Accuracy of data in abstracts of published research articles. JAMA. 1999 Mar 24-31;281(12):1110-1.
17. Abstract vs Full Text Example
Concise abstracts cannot contain all details, whereas full text will contain all the relevant information.
Abstract:
Significant advances have been made in the treatment of human immunodeficiency virus (HIV) infection over the past two decades. Improved therapy has prolonged survival and improved clinical outcome for HIV-infected children and adults. Sixteen antiretroviral (ART) medications have been approved for use in pediatric HIV infection. The Department of Health and Human Services (DHHS) has issued “Guidelines for the Use of Antiretroviral Agents in Pediatric HIV Infection”, which provide detailed information on currently recommended antiretroviral therapies (ART). However, consultation with an HIV specialist is recommended as the current therapy of pediatric HIV is complex and rapidly evolving.
Full text:
Elvitegravir is a once daily integrase inhibitor being studied in adults.
Children with treatment failure should be evaluated for medication adherence, drug intolerance, and possible drug interactions which may lessen the efficacy of the therapeutic regimen.
Challenges:
• Sifting through more information!
• Finding the right results
18. Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
19. Methods
• Use the Elsevier Text Mining solution to search against a corpus of biomedical literature
  • Abstracts – MEDLINE/PubMed (24M)
  • Full text – PubMed Central, Elsevier and partner publishers (4M)
• Refine the results corpus; redefine the search / text mining output
• Review and analyze the data
• Create visual data reports using other tools available
20. Text Mining Abstracts vs Full Text
Search against the scientific literature corpus for sentences related to efficacy. Word clouds suggest insight differences between abstracts and full text (Abstracts Only vs Full Text). If looking for details, one really needs to look at the full text results.
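The term-frequency counts behind such word clouds can be sketched in a few lines. The two-document mini-corpus and the V600E mutation below are hypothetical examples, not the webinar's data.

```python
from collections import Counter
import re

# Hypothetical mini-corpus: the same article seen as abstract vs full text.
abstracts = ["Drug A shows efficacy in TNBC."]
full_texts = ["Drug A shows efficacy in TNBC. "
              "The V600E mutation reduced efficacy in vitro."]

def term_freq(texts):
    """Lowercased term frequencies across a list of documents."""
    tokens = (t.lower() for doc in texts for t in re.findall(r"[A-Za-z0-9]+", doc))
    return Counter(tokens)

abstract_tf, fulltext_tf = term_freq(abstracts), term_freq(full_texts)

# Terms that only surface in full text (e.g. the specific mutation):
only_full = set(fulltext_tf) - set(abstract_tf)
print(sorted(only_full))
```

Feeding `fulltext_tf` minus `abstract_tf` into a word-cloud tool is one way to visualize exactly the abstract-vs-full-text gap the slide describes.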
21. Finding Molecular Targets in Full Text
Word clouds illustrating differences in the point mutations mentioned (Abstracts Only vs Full Text). Full text provides insights into the specific mutations implicated in the differential enzymatic efficacy of a particular drug class, and into the mutations implicated in changes in efficacy; no mutations are mentioned in the abstracts of a comparable document set.
22. Finding Molecular Targets in Full Text
Example searching for cancer immunity checkpoint proteins (Abstracts Only vs Full Text). Full text provides insights into additional protein targets that may be of interest for cancer immunology research in cancer checkpoints.
23. Text Mining Results Can Then Be Used For Analyses
Text mining results can be used to improve scientific research and to address business problems:
• Review results – not just keyword matching anymore: identifying more relevant documents for review, identifying relationships and precise statements, and identifying other targets/content of interest
• Link data to other items of interest
• Analytics, visualization and system/network analysis, e.g. Pathway Studio, Cytoscape
• Integrate text mining data and processes into different workflows for project quality and efficiency
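For the Cytoscape step, extracted relationship triples can be written out in the simple interaction format (SIF, one `source relation target` line per edge) that Cytoscape imports as a network. The triples below are illustrative, not mined results.

```python
# Sketch: serialize (subject, relation, object) triples to SIF for Cytoscape.
# These example triples are hypothetical, not output of a real mining run.
triples = [
    ("FOXO3a", "regulates", "glycolysis"),
    ("TNBC_cells", "resist", "growth_factor_inhibition"),
]

def to_sif(triples):
    """One tab-separated 'source relation target' line per edge."""
    return "\n".join(f"{s}\t{rel}\t{o}" for s, rel, o in triples)

sif = to_sif(triples)
print(sif.splitlines()[0])
```

Writing `sif` to a `.sif` file and opening it in Cytoscape yields a node-and-edge graph, which is the kind of relationship map shown later in the deck.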
24. Text Mining Finds Answers Faster & Increases Efficiency
An example project comparison: writing a comprehensive state-of-the-science review article on the chemical toxicity of a particular substance.
Keyword searching:
• Finds 1,408 articles, many of them not relevant
• 176 person-days to review @ 20 min/article
VS
Text mining:
• Identifies 142 relevant articles
• 5 person-days to review @ 20 min/article
Savings:
• Text mining robustly identifies the relevant articles
• Savings of 171 person-days per project
• Allows more projects / higher quality with the same staff
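The review-effort figures above come down to simple arithmetic. Below is a generic back-of-envelope estimator, assuming 8-hour working days; it is a sketch of the calculation, not the slide's exact accounting.

```python
# Back-of-envelope review-effort estimator (assumption: 8-hour working days).
def person_days(n_articles, minutes_per_article=20, hours_per_day=8):
    """Total reviewing effort in person-days for a set of articles."""
    return n_articles * minutes_per_article / 60 / hours_per_day

# The text-mined set of 142 articles at 20 min each:
print(round(person_days(142), 1))
```

Changing `minutes_per_article` or `hours_per_day` shows how sensitive the savings estimate is to the assumed reading speed.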
25. Example of Visual Insights of Text Mining Results
Relationship map of the intersecting adverse events between two anti-TNF drugs, built by loading Elsevier Text Mining (NLP) results into a Cytoscape visualization.
26. Summary
• Text mining helps sift through large amounts of scientific literature and other textual content
• Text mining increases project team efficiency in finding precise statements and relationships
• Full text articles provide richer result sets, yielding insights that cannot be garnered from abstracts alone
• Several hurdles remain in implementing text mining, but the value can outweigh the costs
Text mining full text can help find molecular targets of interest quickly that might be missed when relying on abstracts and keyword searching alone.
27. Text Mining Full Text for Molecular Targets – March 31, 2015 – George Jiang
Thank you for joining our webinar today:
Text Mining Full Text for Molecular Targets
with George Jiang, Ph.D., M.B.A.
If you have any questions for our speaker, please type them into
the CHAT window.
If you would like more information you can contact:
George Jiang
g.jiang@elsevier.com
Editor's Notes
Who are we?
Elsevier is a digital information solutions company with roots in publishing world class, peer-reviewed scientific, medical, and technical literature, going back 130 years.
What is our Mission?
Elsevier’s abiding purpose, its brand essence, is to empower knowledge and to empower its clients through knowledge: to perpetuate knowledge as a vital, organic set of discoveries oriented toward truth and, ultimately, solutions to fundamental human challenges. This mission of empowerment is accomplished by applying sophisticated digital technology and analytics to some of the world’s greatest scientific, technical, and medical content, which Elsevier has helped to produce, under the peer review system, for over 130 years.
Empowered Knowledge. The Knowledge that Empowers.
What do we do?
We help professionals advance knowledge by expanding it as a body of confirmed facts and ideas, and by getting it to yield positive, measurable, sometimes ground-breaking outcomes in these disciplines (example: more Nobel Laureate authors published in the last half-century than by any other publisher).
What is ‘the product?’
We continue to produce intellectual content, largely in the form of digitized books, journals, and proprietary databases, delivering access via the internet and offline digital channels (examples: journals such as The Lancet, imprints such as Cell and MK, ScienceDirect, Mendeley, Scopus).
In addition to, and layered over, this content are technology and analytics (tools and solutions) that allow clients and end-users to do more with information: to produce it, interact with it, manipulate it, and share it with greater facility, efficiency, and creativity (examples: ClinicalKey, Reaxys, SciVal, SimChart).
What are the Benefits of working with us?
Elsevier empowers knowledge professionals to be more collaborative and competitive, efficient and effective; to perform better, and to create knowledge with impact.
Today's focus is on the scientific literature space and full text articles.
Sources of data for the various statistics: Oxford University Press; the Cognition article; manual copy-and-paste and word counting.