This document provides information about text mining full-text articles to identify molecular targets. It begins with an introduction to text mining and discusses the value of mining full text versus abstracts alone: full texts provide richer result sets because they include more keywords, facts, and relationships, and concepts that are underrepresented in abstracts are often discussed in full-text sections. The document then illustrates full-text mining with an example and outlines common text mining steps. It describes Elsevier text mining solutions and services that aggregate, structure, normalize and integrate content to extract useful facts and support applications.
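The abstract-versus-full-text point can be illustrated with a toy comparison. This is a minimal sketch under invented assumptions: the snippets and the keyword list are made up for illustration and are not drawn from the document's actual example or from Elsevier's pipeline.

```python
# Sketch: why full text yields richer results than the abstract alone.
# The texts and keyword list below are invented for illustration.
import re
from collections import Counter

KEYWORDS = {"egfr", "kinase", "inhibitor", "apoptosis"}

def keyword_hits(text):
    """Count occurrences of target-related keywords in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in KEYWORDS)

abstract = "We report a novel EGFR inhibitor with antitumor activity."
full_text = (
    "We report a novel EGFR inhibitor with antitumor activity. "
    "Methods: kinase assays showed selective EGFR inhibition. "
    "Results: the inhibitor induced apoptosis in treated cells, "
    "consistent with kinase-dependent apoptosis pathways."
)

print(keyword_hits(abstract))   # fewer distinct concepts
print(keyword_hits(full_text))  # 'kinase' and 'apoptosis' surface only here
```

In this toy example, two of the four target-related concepts never appear in the abstract at all, which is exactly the underrepresentation the document describes.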
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...) - Pistoia Alliance
Text-mining of journal articles and other publications has long been a subject of interest. It already has applications across R&D and beyond into health care, for instance by analysing electronic health records. The technology has value but also has its limits. With new sources of text to mine becoming mainstream, such as Twitter feeds or Facebook posts that might reference a company’s brand or a drug’s efficacy or adverse events, existing technology needs to be adapted to keep pace. Not only that, but whole new compliance questions arise: does a fleeting mention on Twitter require the same response as a formal notification of an adverse event?
Enabling reuse of arguments and opinions in open collaboration systems PhD vi... - jodischneider
This document summarizes a PhD thesis on enabling the reuse of arguments and opinions in open collaboration systems. It discusses three research questions: 1) opportunities and requirements for argumentation support, 2) common arguments used in these systems, and 3) structuring arguments to support reuse. The methodology involved analyzing discussions from Wikipedia and open collaboration projects using argumentation theories like Walton's schemes and factors analysis. The goal is to develop semantic structures and visualizations to help people understand diverse opinions and make collaborative decisions. A prototype system tested with users found structuring discussions by key factors helped people evaluate arguments more effectively.
II-SDV 2012 Making Knowledge Discoverable: The Role of Agile Text Mining - Dr. Haxel Consult
This document discusses using agile text mining to make knowledge more discoverable. It describes how text mining can be combined with search to both filter documents relevant to a context and discover new information within those documents. The document outlines Linguamatics' text mining platform, which allows flexible querying across multiple data sources using natural language processing and ontologies. It provides examples of how the platform has been used to extract relationships and numerical data, summarize results from multiple documents, and integrate with workflows.
Using the Micropublications ontology and the Open Annotation Data Model to re... - jodischneider
This document discusses a project to construct a knowledge base linking drug interaction assertions to evidence from source documents. It will use the Micropublications Ontology to represent each assertion's support graph of claims and evidence, and the Open Annotation model to dynamically link support graph elements to quoted text excerpts from sources. The knowledge base will help answer competency questions about assertions, evidence, and their provenance. Challenges include representing both structured and unstructured text claims and efficiently querying the evidence base at scale.
10th International Conference Compound Libraries 2014 - Torben Haagh
VISIT THE CONFERENCE WEBSITE HERE:
http://bit.ly/CompoundLibrariesSlideshare
Maximizing information in early-phase R&D for an optimal library design and target selection
We are excited to hold the 10th annual meeting of the conference formerly known as Compound Libraries! Over the last decade we have provided the pharmaceutical R&D community with a valuable platform for exchanging knowledge and ideas about how best to optimize the qualification of drug candidates.
We have hosted almost all major pharmaceutical companies and heard dozens of case studies on important and pressing issues. Looking back at the programs from previous years, it is interesting to trace the timeline of changing approaches, trends and market developments. Our topical spectrum has ranged from compound management and acquisition to collaboration frameworks, open access, library design, screening and analysis.
This year we bring you 15 case studies on the most burning issues in early-stage discovery today, along with valuable trend analysis and networking with peers and colleagues from pharmaceutical companies, biotechs, CROs and academic research institutes.
Don’t miss our 10th anniversary: join us in Berlin to take part in our legacy conference!
Benefit from participating in discussions about the following topics:
-A 10-year perspective on synthesizing and designing compound libraries
-What is the role of ligand efficiency metrics in drug discovery? Have your say in this controversial debate!
-Next generation library design - working towards better PPI and epigenetic libraries
-Exploration of bioactive and novel chemical space by application of privileged structure concept design
-Learn from Janssen’s experience with the assembly of the IMI European Lead Factory (ELF) library
-What is the real potential of macrocycles and are they the drugs of the future?
Journal Club - Best Practices for Scientific Computing - Bram Zandbelt
This document discusses the importance of best practices in scientific computing. It notes that scientists rely heavily on software for research, with many writing their own code. However, most scientists are self-taught in software skills and may be unaware of best practices that could help them write more reliable and maintainable code. The document advocates treating software like a scientific instrument and following practices such as version control, testing, and automation. Adopting these practices could help reduce errors and make software easier to reuse.
SocialCite makes its debut at the HighWire Press meeting - Kent Anderson
A new service designed to allow readers and researchers to comment on the appropriateness, quality, and type of citations made in the literature made its debut at the HighWire Press Publishers Meeting yesterday.
Enhancing Data Integration with Text Analysis to Find Genes Implicated in Pla... - Catherine Canevet
This document discusses enhancing data integration to identify proteins implicated in plant stress response. It introduces using text mining to combine structured data from databases with unstructured text from literature. A text mining plugin was developed for the Ondex data integration framework. It was applied to generate a knowledge base combining Arabidopsis proteins, stresses, and publications. Protein-stress association networks were visualized and metrics like interaction potential (IP) were used to filter associations and highlight key relationships validated from literature, improving the signal-to-noise ratio for identifying candidate genes.
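The association-filtering idea can be sketched with a simple sentence-level co-occurrence score. This is an illustrative stand-in, not the Ondex plugin's actual interaction potential (IP) metric; the sentences, proteins, and stress terms below are invented for the example.

```python
# Illustrative co-occurrence scoring for protein-stress associations.
# Not the Ondex "interaction potential" formula; a simple normalised
# sentence-level co-occurrence score used as a stand-in.
import math
from itertools import product

sentences = [
    "DREB2A expression increases under drought stress in Arabidopsis.",
    "RD29A is induced by drought and cold stress.",
    "Heat stress had no effect on DREB2A levels in this assay.",
]
proteins = ["DREB2A", "RD29A"]
stresses = ["drought", "cold", "heat"]

def score(term_a, term_b, sents):
    """Co-occurrences normalised by each term's individual frequency."""
    in_a = [term_a.lower() in s.lower() for s in sents]
    in_b = [term_b.lower() in s.lower() for s in sents]
    both = sum(a and b for a, b in zip(in_a, in_b))
    if both == 0:
        return 0.0
    return both / math.sqrt(sum(in_a) * sum(in_b))

# Rank every protein-stress pair; thresholding such a score is one way
# to improve the signal-to-noise ratio the summary mentions.
for p, s in product(proteins, stresses):
    print(p, s, round(score(p, s, sentences), 2))
```

Pairs that never co-occur score zero and can be filtered out, which is the same signal-to-noise idea the summary describes, just with a much cruder statistic.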
Open science and the individual researcher - Bram Zandbelt
Slides for the Feb 8, 2017 lab meeting of Roshan Cools' Motivation & Cognitive Control group (Donders Institute), discussing the following paper:
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., … Yarkoni, T. (2016). How open science helps researchers succeed. eLife, 5, e16800. https://doi.org/10.7554/eLife.16800.
This document provides an overview and introduction to the statistical software R. It describes how R can be obtained and installed. R is a free and open-source software environment for statistical analysis and graphics. The document outlines the basic features of the R environment, including how to work with data and packages in R. It provides a conceptual overview of the organization of the book, which uses R and biological examples to teach statistics concepts ranging from basic to advanced topics.
Annotation examples. This is an overview of some of the software I have used for annotation (and a few extra features some of this software has.) This was presented in the SwissUniversities Doctoral Programme, Language & Cognition, in the Module: Linguistic and corpus perspectives on argumentative discourse.
Screenshots are given of GATE, UAM Corpus Tool, Excel, BRAT, EPPI Reviewer, and a custom tool. In most cases there are references to one of my papers for further details.
I briefly describe a typical annotation process:
Find text of interest
Find phenomena of interest
Draft an annotation manual
Iteratively test annotation & revise manual
Find questionable annotations, check disagreements.
Revise the manual.
Iterate.
Annotate
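The "check disagreements" step in the process above is typically quantified with an inter-annotator agreement statistic. A minimal Cohen's kappa computation might look like the following; the labels are toy examples, not data from the talk.

```python
# Cohen's kappa for two annotators labelling the same items.
# Toy labels for illustration; any categorical scheme works the same way.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement if both annotators labelled at random
    # according to their own marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n)
        for c in freq_a.keys() | freq_b.keys()
    )
    return (observed - expected) / (1 - expected)

a = ["claim", "claim", "evidence", "other", "claim", "evidence"]
b = ["claim", "evidence", "evidence", "other", "claim", "evidence"]
print(round(cohens_kappa(a, b), 3))
```

A low kappa after a test round is the usual trigger for the "revise the manual, iterate" loop described above.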
Quertle is a biomedical big data analytics company that provides a platform using artificial intelligence and other advanced techniques to analyze over 40 million biomedical documents. Their platform allows for more comprehensive and precise searches compared to keyword-based searches, and can discover relationships and make connections that other tools cannot. The platform also provides predictive visual analytics and concept-oriented exploration of the data to provide actionable insights. Quertle aims to help address issues with the growing volume of biomedical literature and information that is missed with current approaches.
Using Bioinformatics Data to inform Therapeutics discovery and development - Eleanor Howe
Diamond Age Data Science and Zafgen, Inc, co-present on their work in using bioinformatics data effectively in the context of a small therapeutics company.
Eleanor Howe, PhD, CEO of Diamond Age, presents on the different types of computational biologist, the characteristics of a good bioinformatics team, and the pluses and minuses of using deep learning/AI in a discovery biology context.
Huseyin Mehmet, VP of Discovery Research at Zafgen, describes his team's work with Diamond Age and how Zafgen uses Diamond Age's capabilities to inform its drug development. He discusses biotech companies' need for a diverse, experienced bioinformatics team.
This document summarizes a presentation by Christina Pikas on how librarians at special libraries like the Johns Hopkins Applied Physics Laboratory (APL) can provide bibliometric analysis services. Pikas discusses how librarians' domain knowledge, access to data, and understanding of ethics uniquely positions them to analyze research output and collaboration in a reliable way. She provides examples of bibliometric questions answered at APL and the tools used. Pikas concludes that librarians should leverage their skills and study bibliometrics to support research assessment activities.
This document summarizes a presentation by Timothy Hoctor, VP of Professional Services at Elsevier, about Elsevier's strategic vision and professional services. The key points are:
1) Elsevier aims to increase R&D productivity by linking data across the development spectrum and increase return on information through enhanced search and visualization tools.
2) Elsevier's Professional Services team leverages Elsevier's capabilities to provide customized data management and analysis solutions.
3) Elsevier's strategic objective is to become a leading collaborator in R&D data management through services like data mapping, gap analysis, data governance, and integrated data management.
Publishing and citing presentation for VLAG graduate school Baarlo - Hugo Besemer
This document discusses publishing and impact metrics for PhD students. It covers motivations for publishing, different types of metrics including article, author, journal, and research group metrics. It also discusses citation databases, journal choice factors like impact factor and acceptance rate, and ways to increase citations like networking and claiming publications. Key metrics covered include the h-index, journal impact factor, and relative impact. The document provides examples and interpretations for bibliometric analysis.
Increasing transparency in Medical Education through Open Data - Rebecca Grant
Slides presented at the AMEE Virtual Conference 2021, introducing the MedEdPublish platform and data policies. Approaches to sharing sensitive human data, and particularly qualitative data, are discussed.
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ... - GigaScience, BGI Hong Kong
Scott Edmunds' talk at CODATA2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment, 19 September 2019 in Beijing.
Recommending Scientific Papers: Investigating the User Curriculum - Jonathas Magalhães
In this paper, we propose a Personalized Paper Recommender System, a new user-paper based approach that takes into consideration the user's academic curriculum vitae. To build the user profiles, we use a Brazilian academic platform called CV-Lattes. Furthermore, we examine some issues related to user profiling: (i) we define and compare different strategies to build and represent the user profiles, using terms and using concepts; (ii) we verify how much past information about a user is required to provide good recommendations; (iii) we compare our approaches with the state of the art in paper recommendation using CV-Lattes. To validate our strategies, we conduct a user study involving 30 users in the Computer Science domain. Our results show that (i) our approaches outperform the state of the art on CV-Lattes; (ii) concept profiles are comparable with term profiles; (iii) analyzing the content of the past four years for term profiles and five years for concept profiles achieved the best results; and (iv) term profiles provide better results but are slower than concept profiles, so if the system needs real-time recommendations, concept profiles are better.
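The term-profile idea in this abstract can be sketched as a bag-of-words profile plus cosine similarity. The titles below are invented examples; this is not the paper's actual CV-Lattes pipeline, just the general shape of a terms-based user profile.

```python
# Sketch of a terms-based user profile for paper recommendation:
# aggregate a term-frequency vector from a user's past papers, then
# rank candidate papers by cosine similarity. Titles are invented.
import math
import re
from collections import Counter

def terms(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

# User profile: aggregate terms from papers listed in the CV.
past_papers = [
    "Ontology-based text mining of biomedical literature",
    "Named entity recognition for gene and protein names",
]
profile = Counter()
for title in past_papers:
    profile += terms(title)

candidates = [
    "Text mining approaches to drug target discovery",
    "Low-power embedded systems scheduling",
]
ranked = sorted(candidates, key=lambda c: cosine(profile, terms(c)),
                reverse=True)
print(ranked[0])
```

A concept profile would replace the raw term counts with counts over normalized concepts (e.g. from an ontology), trading some speed for robustness to vocabulary variation, which matches the trade-off the abstract reports.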
Towards knowledge maintenance in scientific digital libraries with the keysto... - jodischneider
JCDL2020 full paper.
Abstract:
Scientific digital libraries speed dissemination of scientific publications, but also the propagation of invalid or unreliable knowledge. Although many papers with known validity problems are highly cited, no auditing process is currently available to determine whether a citing paper’s findings fundamentally depend on invalid or unreliable knowledge. To address this, we introduce a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper, using argumentation theory and citation context analysis. Through two pilot case studies, we demonstrate how the keystone framework can be applied to knowledge maintenance tasks for digital libraries, including addressing citations of a non-reproducible paper and identifying statements most needing validation in a high-impact paper. We identify roles for librarians, database maintainers, knowledge base curators, and research software engineers in applying the framework to scientific digital libraries.
doi:10.1145/3383583.3398514
Preprint: http://jodischneider.com/pubs/jcdl2020.pdf
Research in the time of Covid: Surveying impacts on Early Career Researchers - Rebecca Grant
The document summarizes key findings from a survey of nearly 5,000 researchers conducted between March and July 2020 regarding the impact of COVID-19 on research. Some of the main findings include:
- Early career researchers reported being more impacted by COVID-19 than other career stages, with 33% saying their research was extremely or very impacted
- Half of respondents expect to reuse open data from other labs during lockdown and 65% expect to reuse their own data
- Data reuse is seen as important for allowing research to continue given restrictions, with a 10-15% increase in intention to reuse data during and after the pandemic
- While early career researchers were generally more supportive of data sharing than other career stages, concerns around mis
Drug Discovery Data Insights with Andrew Leach (ChEMBL), Evan Bolton (PubChem... - Lixin Liu
The emergence of freely available PubChem and ChEMBL resources for chemical and biological Structure-Activity-Relationship (SAR) data has radically changed the global drug discovery informatics landscape. The heterogeneity of low throughput and high throughput chemical and biological data do nevertheless present some unique challenges – and opportunities – when creating large-scale community resources – for sophisticated purveyors of drug discovery data.
Do Open data badges influence author behaviour? A case study at Springer Nature - Rebecca Grant
Digital badges have previously been shown to incentivise journal authors to share their data openly. In this paper we introduce an Open data badging project at the Springer Nature journal BMC Microbiology. The development of the Open data badge is described, as well as the challenges of developing standard badging criteria and ensuring authors’ awareness of the badges. Next steps for the badging project are outlined, which are based on the experiences of the team assessing the badges, the number of badges awarded at the journal to date, and the results of an author survey.
Slides from CDD's March 22 Webinar - Penetrating Gram Negative Bacteria. Hosted by Brad Sherborne (Merck) featuring Derek Tan (Memorial Sloan Kettering Cancer Center) and Helen Zgurskaya (University of Oklahoma).
A poster presented at the 2016 Annual Meeting of the Medical Library Association on a strategy for identifying emerging technologies through PubMed searching. This is an outcome of the MLA systematic review project, part of the association's research initiative.
Text mining and summarization technologies can help researchers in 3 key ways:
1) By systematically screening the large volume of literature in their field to quickly assess relevance and quality of papers.
2) By providing quick, informative bullet-point overviews and summaries of academic papers that highlight limitations, saving researchers time.
3) By extracting references, figures, tables and datasets to allow researchers to analyze information in more depth and follow citation trails more efficiently.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
The class outline covers an introduction to unstructured data analysis; word-level analysis using the vector space model and TF-IDF; going beyond word-level analysis with natural language processing; and a text mining demonstration in R using Twitter data. The document provides background on text mining, defining what it is and its tasks, and discusses features of text data and methods for acquiring texts. It then covers word-level analysis methods such as the vector space model and TF-IDF and their applications, discusses the limitations of word-level analysis and how natural language processing can help, and finally demonstrates Twitter mining in R.
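The word-level analysis the class describes can be sketched as follows. The class demo used R, but the same idea is shown here in Python with a common TF-IDF variant (term frequency times log inverse document frequency); this is one standard weighting, not necessarily the exact formula used in the class, and the documents are invented.

```python
# Minimal TF-IDF in a vector space model: each document becomes a vector
# of term weights; document-specific terms are boosted, terms shared
# across documents are downweighted. Documents are toy examples.
import math
import re
from collections import Counter

docs = [
    "text mining finds patterns in text",
    "mining companies extract minerals",
    "visualization techniques reveal patterns",
]

tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
df = Counter()                      # document frequency per term
for tokens in tokenized:
    df.update(set(tokens))
n_docs = len(docs)

def tfidf(tokens):
    """TF-IDF weights for one tokenized document."""
    tf = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

vectors = [tfidf(tokens) for tokens in tokenized]
# 'mining' appears in two documents, so it weighs less than a
# document-specific term like 'text' in the first document.
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
```

Documents represented as such vectors can then be compared with cosine similarity, which is the core of the vector space model the class covers before moving beyond word-level analysis.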
Open science and the individual researcherBram Zandbelt
Slides for the Feb 8, 2017 lab meeting of Roshan Cools' Motivation & Cognitive Control group (Donders Institute), discussing the following paper:
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., … Yarkoni, T. (2016). How open science helps researchers succeed. eLife, 5, e16800. https://doi.org/10.7554/eLife.16800.
This document provides an overview and introduction to the statistical software R. It describes how R can be obtained and installed. R is a free and open-source software environment for statistical analysis and graphics. The document outlines the basic features of the R environment, including how to work with data and packages in R. It provides a conceptual overview of the organization of the book, which uses R and biological examples to teach statistics concepts ranging from basic to advanced topics.
Annotation examples. This is an overview of some of the software I have used for annotation (and a few extra features some of this software has.) This was presented in the SwissUniversities Doctoral Programme, Language & Cognition, in the Module: Linguistic and corpus perspectives on argumentative discourse.
Screenshots are given of GATE, UAM Corpus Tool, Excel, BRAT, EPPI Reviewer, and a custom tool. In most cases there are references to one of my papers for further details.
I briefly describe a typical annotation process:
Find text of interest
Find phenomena of interest
Draft an annotation manual
Iteratively test annotation & revise manual
Find questionable annotations, check disagreements.
Revise the manual.
Iterate.
Annotate
Quertle is a biomedical big data analytics company that provides a platform using artificial intelligence and other advanced techniques to analyze over 40 million biomedical documents. Their platform allows for more comprehensive and precise searches compared to keyword-based searches, and can discover relationships and make connections that other tools cannot. The platform also provides predictive visual analytics and concept-oriented exploration of the data to provide actionable insights. Quertle aims to help address issues with the growing volume of biomedical literature and information that is missed with current approaches.
Using Bioinformatics Data to inform Therapeutics discovery and developmentEleanor Howe
Diamond Age Data Science and Zafgen, Inc, co-present on their work in using bioinformatics data effectively in the context of a small therapeutics company.
Eleanor Howe, PhD, CEO of Diamond Age, presents on the different types of computational biologist, the characteristics of a good bioinformatics team, and the pluses and minuses of using deep learning/AI in a discovery biology context.
Huseyin Mehmet, VP of Discovery Research at Zafgen, describes his team's work with Diamond Age and uses their capabilities to inform Zafgen's drug development. He discusses the needs of biotech companies for a diverse, experience bioinformatics team.
This document summarizes a presentation by Christina Pikas on how librarians at special libraries like the Johns Hopkins Applied Physics Laboratory (APL) can provide bibliometric analysis services. Pikas discusses how librarians' domain knowledge, access to data, and understanding of ethics uniquely positions them to analyze research output and collaboration in a reliable way. She provides examples of bibliometric questions answered at APL and the tools used. Pikas concludes that librarians should leverage their skills and study bibliometrics to support research assessment activities.
This document summarizes a presentation by Timothy Hoctor, VP of Professional Services at Elsevier, about Elsevier's strategic vision and professional services. The key points are:
1) Elsevier aims to increase R&D productivity by linking data across the development spectrum and increase return on information through enhanced search and visualization tools.
2) Elsevier's Professional Services team leverages Elsevier's capabilities to provide customized data management and analysis solutions.
3) Elsevier's strategic objective is to become a leading collaborator in R&D data management through services like data mapping, gap analysis, data governance, and integrated data management.
Publishing and citing presentation for VLAG graduate school BaarloHugo Besemer
This document discusses publishing and impact metrics for PhD students. It covers motivations for publishing, different types of metrics including article, author, journal, and research group metrics. It also discusses citation databases, journal choice factors like impact factor and acceptance rate, and ways to increase citations like networking and claiming publications. Key metrics covered include the h-index, journal impact factor, and relative impact. The document provides examples and interpretations for bibliometric analysis.
Increasing transparency in Medical Education through Open Data Rebecca Grant
Slides presented at the AMEE Virtual Conference 2021, introducing the MedEdPublish platform and data policies. Approaches to sharing sensitive human data, and particulary qualitative data, are discussed.
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
Scot Edmunds talk at CODATA2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment. 19th September 2019 in Beijing
Recommending Scientific Papers: Investigating the User CurriculumJonathas Magalhães
In this paper, we propose a Personalized Paper Recommender System, a new user-paper based approach that takes into consideration the user academic curriculum vitae. To build the user profiles, we use a Brazilian academic platform called CV-Lattes. Furthermore, we examine some issues related to user profiling, such as (i) we define and compare different strategies to build and represent the user profiles, using terms and using concepts; (ii) we verify how much past information of a user is required to provide good recommendations; (iii) we compare our approaches with the state-of-art in paper recommendation using the CV-Lattes. To validate our strategies, we conduct a user study experiment involving 30 users in the Computer Science domain. Our results show that (i) our approaches outperform the state-of-art in CV-Lattes; (ii) concepts profiles are comparable with the terms profiles; (iii) analyzing the content of the past four years for terms profiles and five years for concepts profiles achieved the best results; and (iv) terms profiles provide better results but they are slower than concepts profiles, thus, if the system needs real time recommendations, concepts profiles are better.
Towards knowledge maintenance in scientific digital libraries with the keysto...jodischneider
JCDL2020 full paper.
Abstract:
Scientific digital libraries speed dissemination of scientific publications, but also the propagation of invalid or unreliable knowledge. Although many papers with known validity problems are highly cited, no auditing process is currently available to determine whether a citing paper’s findings fundamentally depend on invalid or unreliable knowledge. To address this, we introduce a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper, using argumentation theory and citation context analysis. Through two pilot case studies, we demonstrate how the keystone framework can be applied to knowledge maintenance tasks for digital libraries, including addressing citations of a non-reproducible paper and identifying statements most needing validation in a high-impact paper. We identify roles for librarians, database maintainers, knowledge base curators, and research software engineers in applying the framework to scientific digital libraries.
doi:10.1145/3383583.3398514
Preprint: http://jodischneider.com/pubs/jcdl2020.pdf
Research in the time of Covid: Surveying impacts on Early Career ResearchersRebecca Grant
The document summarizes key findings from a survey of nearly 5,000 researchers conducted between March and July 2020 regarding the impact of COVID-19 on research. Some of the main findings include:
- Early career researchers reported being more impacted by COVID-19 than other career stages, with 33% saying their research was extremely or very impacted
- Half of respondents expect to reuse open data from other labs during lockdown and 65% expect to reuse their own data
- Data reuse is seen as important for allowing research to continue given restrictions, with a 10-15% increase in intention to reuse data during and after the pandemic
- While early career researchers were generally more supportive of data sharing than other career stages, concerns around mis
Drug Discovery Data Insights with Andrew Leach (ChEMBL), Evan Bolton (PubChem...Lixin Liu
The emergence of freely available PubChem and ChEMBL resources for chemical and biological Structure-Activity-Relationship (SAR) data has radically changed the global drug discovery informatics landscape. The heterogeneity of low-throughput and high-throughput chemical and biological data nevertheless presents some unique challenges, and opportunities, for sophisticated purveyors of drug discovery data when creating large-scale community resources.
Do Open data badges influence author behaviour? A case study at Springer NatureRebecca Grant
Digital badges have previously been shown to incentivise journal authors to share their data openly. In this paper we introduce an Open data badging project at the Springer Nature journal BMC Microbiology. The development of the Open data badge is described, as well as the challenges of developing standard badging criteria and ensuring authors’ awareness of the badges. Next steps for the badging project are outlined, which are based on the experiences of the team assessing the badges, the number of badges awarded at the journal to date, and the results of an author survey.
Slides from CDD's March 22 Webinar - Penetrating Gram Negative Bacteria. Hosted by Brad Sherborne (Merck) featuring Derek Tan (Memorial Sloan Kettering Cancer Center) and Helen Zgurskaya (University of Oklahoma).
A poster presented at the 2016 Annual Meeting of the Medical Library Association on a strategy for identifying emerging technologies through Pubmed searching. This is an outcome from the MLA systematic review project from the association's research initiative.
Text mining and summarization technologies can help researchers in 3 key ways:
1) By systematically screening the large volume of literature in their field to quickly assess relevance and quality of papers.
2) By providing quick informative overviews and summaries of academic papers in bullet points highlighting limitations to save researchers time.
3) By extracting references, figures, tables and datasets to allow researchers to analyze information in more depth and follow citation trails more efficiently.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
The class outline covers introduction to unstructured data analysis, word-level analysis using vector space model and TF-IDF, beyond word-level analysis using natural language processing, and a text mining demonstration in R mining Twitter data. The document provides background on text mining, defines what text mining is and its tasks. It discusses features of text data and methods for acquiring texts. It also covers word-level analysis methods like vector space model and TF-IDF, and applications. It discusses limitations of word-level analysis and how natural language processing can help. Finally, it demonstrates Twitter mining in R.
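The TF-IDF weighting the class covers can be sketched in a few lines. The toy corpus and function below are my own illustration, not the class materials, and are in Python rather than the R used in the class demonstration:

```python
import math
from collections import Counter

# Toy corpus: each document is a bag of words (vector space model).
docs = [
    "text mining extracts knowledge from text".split(),
    "mining twitter data with r".split(),
    "knowledge discovery in databases".split(),
]

def tf_idf(term, doc, corpus):
    """TF-IDF: term frequency in the document, scaled by inverse
    document frequency across the corpus. Assumes the term occurs
    in at least one corpus document."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)  # rare terms get a higher weight
    return tf * idf

print(round(tf_idf("mining", docs[0], docs), 4))
```

A term that occurs in every document gets an IDF of zero, which is why very common words contribute nothing to the vector space representation.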
Visualizing Text: Seth Redmore at the 2015 Smart Data Conferencesredmore
Seth Redmore talks about text and data visualization at this year's Smart Data Conference.
He covers:
-Common software packages for visualization
-Structured plots for unstructured text: Lines vs. bars vs. box plots vs. pie charts vs. bubble charts
-Less structured plots: word clouds vs. treemaps vs. clusters vs. graphs
-Moving plots: animations over time
The document discusses text mining and provides examples. It defines text mining as the extraction of implicit knowledge from large amounts of textual data. It discusses applications such as marketing, industry research, and job seeking. Key text mining methods covered include information retrieval, information extraction, web mining, and clustering. The document outlines the text mining process and discusses text characteristics, learning methods such as classification and clustering, and evaluation metrics. Examples are provided to illustrate classification using decision trees and k-nearest neighbors on structured and unstructured text data.
This document outlines a seminar on text mining by examples presented by Hadi Mohammadzadeh. The seminar covers new terminologies related to text mining, WordNet as a lexical database, the Reuters-21578 text collection, CMU text learning group data archives, text mine software algorithms, and useful websites. The seminar is divided into seven parts covering these topics in detail with examples.
This document provides a tutorial on text mining and text stream mining techniques. It covers common text mining processes like transforming text into vector space models using bag-of-words representations, computing term weights, and applying machine learning algorithms. Specifically, it discusses vector space models, term weighting using TF-IDF, cosine similarity as a distance measure, and machine learning algorithms for classification like k-Nearest Neighbors, nearest centroid classification, and support vector machines. The tutorial is intended to provide an overview of fundamental text mining and text stream mining concepts.
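The cosine similarity and nearest-centroid classification steps the tutorial lists can be sketched as follows. This is a minimal Python illustration with made-up term-weight vectors, not the tutorial's own code:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors:
    dot(a, b) / (|a| * |b|). 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_centroid(query, classes):
    """Assign the query vector to the class whose mean (centroid)
    vector it is most similar to."""
    def centroid(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    return max(classes, key=lambda c: cosine_similarity(query, centroid(classes[c])))

# Hypothetical 3-term weight vectors for two tiny document classes.
classes = {
    "sports": [[2.0, 0.0, 1.0], [3.0, 0.5, 0.0]],
    "finance": [[0.0, 2.0, 0.5], [0.5, 3.0, 0.0]],
}
print(nearest_centroid([0.1, 2.5, 0.2], classes))  # → finance
```

Cosine similarity ignores document length, which is why it is preferred over Euclidean distance for comparing documents of very different sizes.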
This document introduces an online course on data warehousing from Edureka. It provides an overview of key topics that will be covered in the course, including what a data warehouse is, its architecture, the ETL process, and modeling dimensions and facts. It also shows examples of using PostgreSQL to create tables and Talend to populate them as part of a hands-on project in the course. The course modules will cover data warehousing introduction, dimensions and facts, normalization, modeling, ETL concepts, and a project building a data warehouse using Talend.
Big Data & Text Mining: Finding Nuggets in Mountains of Textual Data
A large amount of information is available in textual form in databases and online sources, and for many enterprise functions (marketing, maintenance, finance, etc.) it represents a huge opportunity to improve business knowledge. For example, text mining is starting to be used in marketing, more specifically in analytical customer relationship management, in order to achieve the holy grail of a 360° view of the customer (integrating elements from inbound mails, web comments, surveys, internal notes, etc.).
Facing this new domain, I carried out some personal research and produced a synthesis, which helped me clarify some ideas. The presentation below does not intend to be exhaustive on the subject, but may bring you some useful insights.
The document is a chapter from a textbook on data mining written by Akannsha A. Totewar, a professor at YCCE in Nagpur, India. It provides an introduction to data mining, including definitions of data mining, the motivation and evolution of the field, common data mining tasks, and major issues in data mining such as methodology, performance, and privacy.
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
This document discusses analyzing Twitter data using text mining techniques in R. It outlines extracting tweets from Twitter and cleaning the text by removing punctuation, numbers, URLs, and stopwords. It then analyzes the cleaned text by finding frequent words, word associations, and creating a word cloud visualization. It performs text clustering on the tweets using hierarchical and k-means clustering. Finally, it models topics in the tweets using partitioning around medoids clustering. The overall goal is to demonstrate various text mining and natural language processing techniques for analyzing Twitter data in R.
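The deck works in R; the cleaning steps it describes (removing punctuation, numbers, URLs, and stopwords before counting frequent words) can be sketched in Python as a rough equivalent. The stopword list and regexes below are illustrative assumptions, not the deck's code:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "to", "of", "and", "in", "is", "for", "on", "with"}

def clean_tweet(text):
    """Mirror the cleaning steps applied in the deck:
    strip URLs, punctuation, and numbers; lowercase; drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)              # remove punctuation/numbers
    return [w for w in text.split() if w not in STOPWORDS]

tweets = [
    "Text mining in #rstats is fun! http://example.com",
    "Mining Twitter data with R: 101 tips",
]
tokens = [w for t in tweets for w in clean_tweet(t)]
print(Counter(tokens).most_common(1))  # → [('mining', 2)]
```

The resulting token frequencies feed directly into the word-frequency, association, and word-cloud analyses the deck then performs.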
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GrahamSmith646206
Supporting research data across Springer Nature: joining up policy and practice. Slides from Graham Smith (Research Data Manager, Springer Nature) at HKU Open Data and Data Publishing Seminar, 25th October 2021.
This document provides an overview and syllabus for a course on bioinformatics. It discusses the goals of learning about available bioinformatics programs and tools, and interpreting their outputs. The course will cover topics like sequence alignment, phylogenetics, genome comparison and using databases. Assessment will include homework, exams, a report, and participation. The document contrasts the "old" and "new" biology, noting how the new biology generates large datasets that require computational analysis to make sense of the data. It emphasizes that bioinformatics uses algorithms and databases to organize, analyze and interpret biological data at large scales.
Developing core common outcomes for tropical peatland research and managementMark Reed
Presentation by Prof Mark Reed at CIFOR Indonesian to open UN Global Peatland Initiative workshop to identify key variables that should be measured in tropical peatland research and monitoring. Workshop co-facilitated by Mark Reed and Dylan Young, with slides adapted from a presentation by Gav Stewart, Newcastle University.
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
This document discusses topic modeling on 350 million documents from Mendeley. It describes how topic modeling can be used to categorize documents into topics and subcategories, though categorization is imperfect and topics change over time. It also discusses how topic modeling and metrics can help with fact discovery and reproducibility of research to build more robust datasets.
Practical applications for altmetrics in a changing metrics landscapeDigital Science
"Practical applications for altmetrics in a changing metrics landscape" - Sara Rouhi, Altmetric product specialist, and Anirvan Chatterjee, Director Data Strategy for CTSI at UCSF
Gather evidence to demonstrate the impact of your researchIUPUI
This workshop is the 3rd in a series of 4 titled "Maximize your impact" offered by the IUPUI University Library Center for Digital Scholarship. Faculty must provide strong evidence of impact in order to achieve promotion and tenure. Having strong evidence in year 5 is made easier by strategic dissemination early in your tenure track. In this hands-on workshop, we will introduce key sources of evidence to support your case, demonstrate strategies for gathering this evidence, and provide a variety of examples. These sources include citation metrics, article level metrics, and altmetrics as indicators of impact to support your narrative of excellence.
The document discusses research challenges and commercialization challenges. It provides definitions of basic research and applied research. It explains the differences between research and development approaches. It outlines typical activity details and timeframes for research processes like establishing context, selecting and designing methods, undertaking research, analysis and validation, and review and evaluation. It also discusses managing researchers, choosing good scientific problems, and MIMOS' role in supporting industry and market creation through technology creation, research, and commercialization.
Managing 'Big Data' in the social sciences: the contribution of an analytico-...CILIP MDG
This document discusses managing large datasets in the social sciences. It describes how the UK Data Service curates and provides access to large survey and census data. It explores how classification schemes could help organize and provide subject access to these growing datasets. A pilot project classified datasets using the Universal Decimal Classification scheme and found it efficient and helped visualize subject categories. Overall, carefully chosen knowledge organization tools can help provide multidimensional subject access needed to analyze complex datasets.
Crowdsourcing platforms are revolutionizing research by providing a way to collect clinical and behavioral data with unprecedented speed and efficiency. This seminar explores another digital platform called TurkPrime that is designed to support research participant recruitment. TurkPrime is a relatively new panel service that allows researchers to target specific demographic groups. If you watched our previous webinar on Amazon’s Mechanical Turk, also known as MTurk, you may find it interesting that TurkPrime offers a proportional matching sampling approach rather than MTurk’s opt-in, convenience sampling approach. Tasks that can be implemented with TurkPrime include: excluding participants on the basis of previous participation, longitudinal studies, making changes to a study while it is running, automating the approval process, increasing the speed of data collection, sending bulk e-mails and bonuses, enhancing communication with participants, monitoring dropout and engagement rates, providing enhanced sampling options, and many others.
The document discusses optimizing content findability. It emphasizes the importance of governance, organization, user involvement, and metadata to improve search and findability. Successful organizations allocate resources to analyze search usage and improve information architecture through taxonomy and metadata. User testing, feedback loops, and search analytics are also recommended to enhance findability.
Not just for STEM: Open and reproducible research in the social sciencesUoLResearchSupport
On Thursday 22nd April 2021, Dr Viktoria Spaiser spoke about how open and reproducible research is currently practiced in the social sciences, how it varies in quantitative, computational, and qualitative social research and how these practices are currently changing. She also discussed what the specific barriers for open and reproducible research in social science are and how at least some of them could be addressed in the future.
Viktoria Spaiser is an Associate Professor in Sustainability Research and Computational Social Sciences at the School of Politics and International Studies, University of Leeds. Viktoria is interested in sustainability research and specifically in how societies can make a rapid, fair and empowering transition to zero-emissions / zero-pollution. She applies mathematical and computational approaches to these and other social and political science research questions.
The document provides information on various aspects of research methodology. It defines key terms such as research, theory, data types, data collection methods, research design and sampling. It discusses primary and secondary data sources and the advantages and limitations of each. Various data collection techniques for qualitative and quantitative data are also outlined.
Open from beginning to end: addressing barriers to open research - a personal...UoLResearchSupport
Open and reproducible research practises are increasingly recognised as important to scientific integrity. However, there are numerous barriers including research culture - whether as a sector, institution or discipline - lack of training and professional incentives and funding of infrastructure.
On 26 May 2021 Dr Marlene Mengoni was one of two speakers at an event exploring barriers to open research.
Dr Marlene Mengoni is a member of the Institute of Medical & Biological Engineering (IMBE) at the University of Leeds and is interested in theoretical aspects of musculoskeletal tissues biomechanics with a fundamental computational engineering approach.
Speaking from an engineering perspective, Dr Mengoni discussed how the research culture at the University of Leeds can help to foster open research practices, throughout the research cycle, including embedding "open" in research and training.
Let's Talk Research 2015 -Juliet Goldbart - Introduction To Qualitative Metho...NHSNWRD
Introduction To Qualitative Methods: Different Approaches For Different Contexts
Jois Stansfield, Maxine Holt, Nigel Cox, Suzanne Gough, Juliet Goldbart, MMU
Why is Test Driven Development so hard to implement in an analytics platform?Phil Watt
Test Driven Development (TDD) is a common pattern in software engineering that helps reduce cycle time, improve code quality and reduce production defects. Within data engineering and analytics projects, TDD is held up as best practice in development and maintenance lifecycle phases. Anecdotally, many organisations do not see the promised benefits of TDD in an analytics context, prompting the question:
Why is it so hard to effectively implement Test Driven Development in an analytics platform?
This talk outlines Phil's research so far in his master's thesis on the topic of test automation in data and analytics projects. He presents seven key challenges revealed in academic studies, and the next steps in the research process.
Research Data Management Services at UWA (November 2015)Katina Toufexis
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
The Simulacrum, a Synthetic Cancer DatasetCongChen35
This presentation describes the applications of synthetic data to cancer registries's efforts to support understanding of and research based on cancer while reducing privacy risks to cancer patients.
The Simulacrum imitates some of the data held securely by Public Health England’s National Cancer Registration and Analysis Service.
The data in the Simulacrum is entirely artificial. It does not contain data about real patients, so users can never identify a real person. It is free to use and allows anyone who wants to use record-level cancer data to do so, safe in the knowledge that while the data feels like the real thing, there is no danger of breaching patient confidentiality.
- The document discusses open science and various techniques used in the Data4Impact project such as text analysis, social media data collection from Twitter, and linked open data.
- It provides an overview of science norms and compares traditional CUDOS norms to more open PLACE norms.
- Data4Impact aims to build a knowledge graph linking different data sources to analyze the impact of research and innovation funding through new metrics and indicators. Machine learning and linked open data techniques are applied.
From "A National Approach to Open Research Data in Ireland", a workshop held on 8 September 2017 in National Library of Ireland, organised by The National Library of Ireland, the Digital Repository of Ireland, the Research Data Alliance and Open Research Ireland.
Similar to Text mining full text for molecular targets
How predictive models help Medicinal Chemists design better drugs_webinarAnn-Marie Roche
All scientific disciplines, including medicinal chemistry, are experiencing a revolution in unprecedented rates of data being generated and the subsequent analysis and exploitation of this data is increasingly fundamental to innovation. Using data to design better compounds is a challenge for Medicinal and Computational chemists.
The design of small-molecule drug candidates, encompassing characteristics such as potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) is a key factor in the success of clinical trials and computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based.
In this webinar our expert Dr. Olivier Barberan will discuss ligand-based methods and he will cover the following:
- How to use ligand information alone to predict activity based on its similarity or dissimilarity to previously known active ligands.
- Ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships, as well as important tools such as target/ligand databases necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign.
Webinar: New RMC - Your lead_optimization Solution June082017Ann-Marie Roche
The document discusses Reaxys Medicinal Chemistry and how it supports hit-to-lead and lead optimization processes. It provides high quality data on topics like efficacy, ADMET properties, and animal models to help computational and medicinal chemists. The pX concept normalizes bioactivity measurements like IC50, Ki, and % inhibition into a single comparable metric, making it possible to compare compound affinity regardless of the metric reported. This allows researchers to more easily search for and analyze active compounds.
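For concentration-based readouts such as IC50 and Ki, the usual way to put activities on one comparable scale is a negative-log transform of the molar value. The sketch below illustrates that idea; it is my own approximation of the concept, not Reaxys's actual pX implementation, and percent-inhibition values would need separate handling:

```python
import math

def px(value_nm):
    """Convert an activity value given in nM to pX = -log10(value in mol/L).
    An IC50 of 10 nM gives pIC50 = 8; more potent (lower) concentrations
    score higher, so values from different assays sit on one scale."""
    return -math.log10(value_nm * 1e-9)

print(round(px(10), 3))    # → 8.0
print(round(px(1000), 3))  # 1 µM → 6.0
```

Because the scale is logarithmic, a difference of one pX unit corresponds to a tenfold difference in potency.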
Oil&Gas Thought Leader Webinar - New Plays for Old Ideas - Dr.Gabor TariAnn-Marie Roche
In our April 2017 webinar, three industry experts shared their research and demonstrated the importance of focusing on fundamental geologic and geophysical research approaches that integrate variety of data, information and concepts from disparate sources and related disciplines.
This back-to-fundamentals research can both inspire and accelerate exploration teams’ thinking about petroleum systems and lead to a path to success.
Dr Gabor Tari is currently the Group Chief Geologist at OMV. He has over 20 years’ experience working in upstream oil & gas and has worked for Amoco, BP, and Vanco, before joining OMV in 2007. Gabor has worked on exploration projects in basins around the globe, including Romania, Angola, North Africa, and the Middle East. He has authored over 50 scientific publications, presented papers at dozens of conferences, and most recently co-authored the book Permo-Triassic Salt Provinces of Europe, North Africa and the Atlantic Margins with Dr Joan Flinch (Repsol) and Juan Soto, Professor of Geodynamics at the University of Granada and at the Instituto Andaluz de Ciencias de la Tierra, Spain; the book is currently available from Elsevier for pre-order online.
Gabor discussed and shared some examples of how new plays can be built on a solid foundation of petroleum system development and research, and how new ideas can be garnered from building on published research of oil & gas companies, academia, service providers and consultants.
Oil&Gas Thought-Leader Webinar - New Plays for Old Ideas - Dr. Rob ForknerAnn-Marie Roche
In our April 2017 webinar, three industry experts shared their research and demonstrated the importance of focusing on fundamental geologic and geophysical research approaches that integrate variety of data, information and concepts from disparate sources and related disciplines. This back-to-fundamentals research can both inspire and accelerate exploration teams’ thinking about petroleum systems and lead to a path to success.
Dr Rob Forkner is a carbonate geologist at Statoil, working in the carbonate plays and reservoirs research group in Austin, Texas, focusing on carbonate play prediction in Atlantic margin systems. Prior to Statoil, Rob worked at Maersk and Shell in onshore and offshore in well planning, geosteering, high-resolution sequence stratigraphy and facies prediction, carbonate sedimentology in unconventional assets, evaporite classification and prediction, rock typing, and more recently, carbonate system suppression and recovery during Oceanic Anoxic Events.
Oil&Gas Thought-Leader Webinar - New Plays for Old Ideas - Dr. Sander HoubenAnn-Marie Roche
Dr. Sander Houben presented on combining paleoceanographic and exploration tools to study Early Jurassic anoxic events. He discussed how carbon isotopes can be used as a stratigraphic tool to analyze perturbations to the carbon cycle during these events. Palynological analysis of indicators of photic zone anoxia and chemocline migration provided insight into changes in water column ecology. A case study of the Toarcian OAE and Posidonia Shale Formation showed how isotopic analyses revealed a major increase in export of hydrogen-rich organic matter due to intensified primary productivity by diazotrophs under low oxygen conditions. Paleoceanographic observations combined with an exploration geology perspective provided understanding of the formation of
Embase for pharmacovigilance: Search and validation March 22 2017Ann-Marie Roche
Scientific literature plays a critical role in Pharmacovigilance and Drug Safety workflows. Monitoring literature for mentions of adverse drug reactions (ADRs) is mandated by regulatory bodies, and marketing authorization holders (MAHs) that do not properly report ADRs can be subject to heavy fines. With an increasing volume of unstructured content to cover, along with rising labor costs, MAHs are looking for ways to make their literature monitoring more effective and efficient.
Abstract and indexing (A&I) databases play an important role in literature monitoring – due to the vast amount of scientific literature published daily – in order for MAHs to locate specific articles or conference presentations that may be relevant for their products (for both benefit/risk analysis and ADR detection). Rather than reading all the literature, MAHs create search strategies that identify the relevant records in A&I databases and execute the searches regularly. GVP module VI mandates that searches are done at least weekly, but many companies maintain a daily monitoring and review cycle.
In this webinar, Senior Product Development Manager for Embase, Dr. Ivan Krstic, discussed best practices for saving time, staying current, validating search strategies and mitigating risk in the face of these increasingly complex literature monitoring processes.
Literature Management for Pharmacovigilance: Outsource or in-house solution? ...Ann-Marie Roche
Pharmaceutical companies are required to screen scientific literature on a regular basis, and this comes with many challenges, such as handling large amounts of data, building search strings and integrating EMA MLM results. Outsourcing literature screening to service providers reduces the workload for the PV team, but how does it impact the literature management process overall? Does it result in decreased oversight and additional activities like audits and reconciliation? And what about building the search strategy?
During this webinar our PV expert, Dr. Joyce De Langen spoke about the following:
• The importance of literature management in Pharmacovigilance and the challenges.
• An evaluation of the benefits and risks of outsourcing literature management versus alternative solutions.
About the speaker:
Joyce de Langen, Ph.D has more than 10 years of experience in the domain of pharmacovigilance and drug safety. Through her work in the pharmaceutical industry, academia and regulatory authorities, Joyce has developed a broad perspective and knowledge in pharmacovigilance and drug safety.
Finding the right medical device information in embase 11 2016Ann-Marie Roche
The document discusses guidelines for systematic reviews of biomedical literature in Clinical Evaluation Reports (CERs) for medical devices, highlighting how Embase addresses the requirements through its comprehensive indexing of devices, manufacturers, and adverse effects, as well as features for building sensitive searches. It also provides examples of searches in Embase to find information on device clinical performance, comparisons, and safety for a case study on an everolimus eluting coronary stent.
The document discusses medical device adverse event reporting requirements, including definitions of reportable events and timelines for submitting reports to regulatory agencies. It provides an overview of the classification system for medical devices and regulations around reporting malfunctions, deaths and serious injuries caused by devices. Reporting requirements and challenges involving software as a medical device are also reviewed.
The All-New 2016 Engineering Academic Challenge - developed by students for students
The Engineering Academic Challenge (formerly the Knovel Academic Challenge) is an immersive, 5-week interactive problem-set competition, featuring weekly thematic engineering challenges built around five transdisciplinary themes inspired by the National Academy of Engineering Grand Challenges.
Literature monitoring for pv what are we doing at galderma elsevier webinarAnn-Marie Roche
The document discusses literature monitoring for pharmacovigilance. It describes weekly monitoring of individual case safety reports and periodic monitoring through development safety update reports and periodic benefit-risk evaluation reports. Key databases for literature searches are Medline and Embase. While Embase has more extensive drug coverage, searches on Medline via PubMed are more reliable due to the potential for loss of MeSH subheadings when mapping to Emtree and the risk of false negatives and positives when searching Embase alone. Literature searches support signal detection and periodic evaluation of a product's safety profile.
This document discusses how drug analytics based on manually extracted semantic relationships in Embase can support drug development, repurposing, and safety. Manually indexed relationships between drugs, diseases, and adverse reactions provide valuable information for these workflows. Specific examples show how the semantic relationships can guide drug repositioning strategies, investigate new combination drugs, identify drug-drug interactions, collect drug comparison data, and help improve risk management.
This document discusses Lean Six Sigma and resources available through Knovel to support Lean Six Sigma implementation. It provides an overview of the Lean Six Sigma implementation process including strategic leadership and vision, deployment planning, and execution and results. It describes Knovel's Lean Six Sigma resources such as handbooks, case studies, templates, and guides covering tools like DMAIC, DOE, SPC etc. that can help with the different belts and project phases from Define to Control. Other resources discussed include those for Design for Six Sigma and practical applications/case studies.
Reaxys provides a unified information portal that integrates data from multiple chemistry sources through a single interface. It links chemistry data, structures, citations, and full-text articles. Reaxys also integrates in-house data from sources like electronic lab notebooks through its API and can be used for activities like compound screening, literature searching, and patent analysis to support drug discovery.
Phil Lorenzi discusses pathway analysis approaches and their uses in biomedical research and drug development. He compares strategies for analyzing the autophagy and apoptosis pathways, finding that integrating multiple methods provides the most comprehensive understanding. Lorenzi also provides examples of how pathway analysis could have predicted problems with COX-2 inhibitors and helped explain past failures of AKT inhibitors. He concludes that pathway analysis is consistent with approvals of EGFR, MEK, RANKL and PARP inhibitors and may support development of GLS inhibitors.
Searching literature databases for post authorisation safety studies (pass)Ann-Marie Roche
This document discusses using literature databases like Embase to conduct post-authorization safety studies (PASS) through systematic literature reviews and meta-analyses. It provides an example PASS on the drug brentuximab vedotin that identified adverse events like peripheral neuropathy and infections. The document reviews how to structure a literature search using the PICO framework and Embase's in-depth indexing of concepts, relationships, and causality to comprehensively identify safety outcomes reported for a drug.
Julie glanville embase sunrise seminar may 2016Ann-Marie Roche
Simple text mining tools can help Embase users in several ways:
- Frequency analysis of terms in records can identify useful search terms and concepts to explore. Tools like EndNote and Voyant allow viewing frequencies of words in titles, abstracts, and subject headings.
- Phrase analysis identifies common word combinations or concepts in the text, beyond single words. Voyant and TERMINE are useful for this.
- Word collocation analysis shows which words frequently occur near each other, suggesting relationships between ideas. The Voyant collocates tool supports this.
- Cluster and network visualizations identify major themes or concepts within a set of records. VOSviewer creates visual maps of related terms.
Exploring records
Ian crowlesmith embase retrospective mla 2016Ann-Marie Roche
Embase began in 1946 as Excerpta Medica, founded to provide medical abstracts. It was acquired by Elsevier in 1971 and became available online in 1978. Key developments included introducing a controlled vocabulary called Emtree in 1987 and adding item types and check tags for evidence-based medicine in 1990. Currently, Embase indexes articles in great depth using natural language and extensively covers drugs and devices. The taxonomy Emtree is regularly updated to reflect new terms.
The document provides an update on new features and enhancements to Embase.com. Key points include:
- The addition of a new PICO search page that allows users to build clinical searches by splitting questions into Patient, Intervention, Comparison, and Outcome elements.
- Other enhancements include improved search tips, the ability to add synonyms and view all abstracts, as well as analytics capabilities for drug safety and repurposing based on triple indexing of content.
- Future plans include improvements to content, taxonomy, and indexing as well as a revamp of the search platform interface and functionality.
This document discusses upcoming changes to process safety management (PSM) regulations and standards. It notes several major industrial accidents in recent decades that prompted reforms. New PSM requirements in California will likely be adopted more widely and require more prescriptive tasks, reporting, and accountability. To ensure future PSM success, the document recommends: making no distinction between internal/external compliance; expanding the definition of mechanical integrity; understanding "double jeopardy"; not replacing investigations with management of change; knowing what the operations team is doing; and clarifying teamwork expectations regarding stop work authorizations.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...Scintica Instrumentation
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Current Ms word generated power point presentation covers major details about the micronuclei test. It's significance and assays to conduct it. It is used to detect the micronuclei formation inside the cells of nearly every multicellular organism. It's formation takes place during chromosomal sepration at metaphase.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
1. Text Mining Full Text for Molecular Targets
with George Jiang, Ph.D., M.B.A.
Our webinar will begin in a few minutes.
TO USE YOUR COMPUTER'S AUDIO: When the webinar begins, you will be connected to audio using your computer's microphone and speakers (VoIP). A headset is recommended.
--OR--
TO USE YOUR TELEPHONE: If you prefer to use your phone, you must select "Use Telephone" after joining the webinar and call in using the numbers below. Dial your country's number and then use Access Code: 655-028-479.
Country / Long Distance
Australia +61 3 8488 8993
Austria +43 (0) 7 2088 2171
Belgium +32 (0) 42 68 0164
Canada +1 (647) 497-9386
Denmark +45 (0) 89 88 04 43
Finland +358 (0) 931 58 4587
France +33 (0) 182 880 933
Germany +49 (0) 692 5736 7304
Ireland +353 (0) 19 036 186
Italy +39 0 294 75 15 36
Netherlands +31 (0) 108 080 115
New Zealand +64 (0) 9 801 0293
Norway +47 21 03 72 89
Spain +34 911 23 4247
Sweden +46 (0) 852 500 292
Switzerland +41 (0) 435 0824 40
United Kingdom +44 (0) 330 221 9921
United States +1 (646) 307-1726
2. Text Mining Full Text for Molecular Targets
George Jiang, PhD, MBA
Product Manager, Text Mining
g.jiang@elsevier.com
March 31, 2015
3. George Jiang
Product Manager
Text Mining
• Trained scientist with several years of experience in text analytics, data integration, and scientific software development
• Currently, Product Manager with Elsevier working on text mining projects and semantic search products, based out of Rockville, MD
• Previously, worked at the US National Center for Biotechnology Information (NCBI) on the Discovery Initiative, seeking to understand users' needs and to crosslink and expose data so that research information is more discoverable
Text Mining Full Text for Molecular Targets – March 31, 2015 – George Jiang
4. World Leader in Digital Information Solutions
Founded over 130 years ago. Works with over 30 million scientists, students, health & information professionals.
CONTENT: Published over 330,000 articles in 2013; received over 1 million submissions in 2013; publishes over 2,200 online journals & over 26,000 books (e + print); over 53 million items indexed by Scopus.
PLATFORMS: Elsevier eBooks, Online Journals, Databases
SOLUTIONS:
• Elsevier R+D Solutions – helps corporate researchers, R+D professionals, and engineers improve how they interact with, share, and apply information to solve problems, using our digital workflow tools, analytics, and data.
• Elsevier Research Intelligence – provides universities, governments, and research institutions with the resources and insights to improve institutional research strategy, management, and performance.
• Elsevier Clinical Solutions – helps medical professionals apply trusted data and sophisticated tools to make better clinical decisions, deliver better care, and produce better healthcare outcomes.
• Elsevier Education – helps educate highly skilled, effective healthcare professionals, using the most advanced pedagogical tools and reference works.
5. Working With Text is a Big Data Challenge
Text is everywhere! We've already covered hundreds of terms in this presentation.
• Twitter – 58M tweets/day × 14.98 words/tweet ⇒ ~868M words/day (~6B words/week)
• Average journal article ≈ 10 words in the title, 150 in the abstract, and 6,000 in the full text
• Abstracts – 2.4B words (24M abstracts at PubMed × 100 words/abstract)
• Full text – 144B words (for a comparable set, 24M articles × 6,000 words/article)
The information deluge of scientific content, and how to manage and/or leverage it, is a big data challenge. Information-seeking challenges can be addressed with automation assistance and text mining for greater insight.
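The back-of-envelope word counts above can be checked in a few lines. This is only a sketch: the slide's figures are taken as given, and the 7-day multiplier for a weekly Twitter total is an assumption.

```python
# Rough word-count arithmetic from the slide (assumption: 7-day week
# for the weekly Twitter total; all other figures are the slide's own).
tweets_per_day = 58e6
words_per_tweet = 14.98
twitter_words_per_day = tweets_per_day * words_per_tweet   # ~868M words/day
twitter_words_per_week = twitter_words_per_day * 7         # ~6B words/week

abstract_words = 24e6 * 100     # 2.4B words across PubMed abstracts
full_text_words = 24e6 * 6000   # 144B words for a comparable full-text set

print(f"{twitter_words_per_day / 1e6:.0f}M words/day on Twitter; "
      f"full text has {full_text_words / abstract_words:.0f}x the words of abstracts")
```

The last ratio is the point of the slide: mining full text means working with roughly 60 times more text than mining abstracts alone.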
6. Summary
• Text mining can help to sift through large amounts of scientific literature and other textual content
• Text mining can help to increase project team efficiency to find precise statements and relationships
• Full text articles provide richer result sets that can be useful in finding additional insights that cannot be garnered just using abstracts
• Several hurdles still exist to implement text mining, but the value can outweigh the costs
Text mining full text can be used to help find molecular targets of interest quickly that may be missed if relying on abstracts and keyword searching.
7. Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
8. What is Text Mining?
Text Mining
• Refers to the process of deriving high-quality structured information (facts) from unstructured documents, e.g. simple relationship statements such as "A Does B", "X Inhibits Y", "G Stops D", "I Drink T"
Why Text Mining?
• Text mining can yield better results and increase team efficiency
• The application of text mining techniques can be used to solve business problems
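As a toy illustration of turning sentences into structured facts, a few lines of pattern matching can pull out simple subject-verb-object statements. The verb list and pattern here are hypothetical examples, not Elsevier's extraction engine.

```python
import re

# Toy fact extractor: find "X <verb> Y" statements in raw sentences.
# RELATION_VERBS is a hypothetical example vocabulary, not a real ontology.
RELATION_VERBS = r"(inhibits|activates|stops|lacks|resists|binds)"
PATTERN = re.compile(r"([A-Za-z0-9\-]+)\s+" + RELATION_VERBS + r"\s+([A-Za-z0-9\-]+)")

def extract_triples(sentence: str):
    """Return (subject, relation, object) triples found in one sentence."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in PATTERN.finditer(sentence)]

print(extract_triples("Drug-X inhibits kinase-Y and compound-Z activates FOXO3a."))
```

Real systems replace the regular expression with linguistic parsing and entity recognition, but the output shape, a list of relationship triples, is the same idea the slide describes.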
9. Example of Getting Structured Information (Facts)
From documents to paragraphs, sentences, tokens, and finally fact(s):
Paragraph (excerpt from Taylor et al., "Evaluating the evidence for targeting FOXO3a in breast cancer: a systematic review"):
Tumour cells show greater dependency on glycolysis so providing a sufficient and rapid energy supply for fast growth. In many breast cancers, estrogen, progesterone and epidermal growth factor receptor-positive cells proliferate in response to growth factors and growth factor antagonists are a mainstay of treatment. However, triple negative breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition. Downstream of growth factor receptors, signal transduction proceeds via phosphatidylinositol 3-kinase (PI3k), Akt and FOXO3a inhibition, the latter being partly responsible for coordinated increases in glycolysis and apoptosis resistance. FOXO3a may be an attractive therapeutic target for TNBC. Therefore we have undertaken a systematic review of FOXO3a as a target for breast cancer therapeutics.
Sentence:
Triple negative breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition.
Fact(s):
• TNBC cells lack receptor expression
• TNBC cells are more aggressive
• TNBC cells resist growth factor inhibition
(Word cloud plotted with Wordle.net)
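The document-to-paragraph-to-sentence-to-token breakdown shown on the slide can be sketched with naive rules. This assumes simple punctuation-based splitting, not a production NLP stack.

```python
import re

# Toy breakdown: paragraph -> sentences -> tokens (naive rules only).
paragraph = ("Triple negative breast cancer (TNBC) cells lack receptor expression. "
             "FOXO3a may be an attractive therapeutic target for TNBC.")

# Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

# Naive tokenizer: keep runs of word characters and hyphens.
tokens = [re.findall(r"[\w\-]+", s) for s in sentences]

print(len(sentences))   # 2 sentences
print(tokens[0][:4])    # first four tokens of the first sentence
```

Abbreviations, decimal numbers, and chemical names break naive splitters like this one, which is part of why specialized biomedical tokenizers exist.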
Text Analytics and Visualizations
10. What is Text Mining Being Used For?
Text mining can be used to support several research and development areas, across basic and applied research and across the pipeline from discovery to pre-clinical, clinical, and post-launch.
Use cases include:
• Target identification and prioritization
• Biomarker discovery
• Drug repurposing
• Drug safety and finding adverse events
• Clinical study design and site selection
• Competitive intelligence
Examples along the pipeline:
• Information retrieval and analysis of biomedical literature for target identification, systematic reviews, etc.
• Searching clinical trial data or electronic health records to find signals in patient populations
• Triage of news and papers for literature curation and regulatory reporting
• Identifying relevant items for meta-analysis of specific research results
• Text mining article submissions for curation assistance in publishing
11. How to Text Mine?
Several pieces are often needed to get results from text mining:
• Content
• Ontology (default or custom)
• Software solution(s)
• Expertise
And several steps:
1. Aggregate – gather the corpus, e.g. PDF -> XML conversion (XML quality differs)
2. Structure – make the XML uniform, e.g. dealing with different sources, types, etc.
3. Normalize – map the text to a default or custom ontology
4. Integrate – text mine the corpus, balancing expectations of precision and recall
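The four steps can be sketched as a minimal pipeline. All function bodies below are illustrative placeholders, and the ontology mapping is a hypothetical example, not Elsevier's implementation.

```python
# Minimal aggregate/structure/normalize/integrate pipeline sketch.

def aggregate(raw_docs):
    """Step 1: collect documents (here: already-extracted plain text)."""
    return [d for d in raw_docs if d]           # drop empty/failed conversions

def structure(docs):
    """Step 2: impose a uniform record shape on heterogeneous sources."""
    return [{"body": d} for d in docs]

def normalize(records, ontology):
    """Step 3: map surface forms to preferred ontology terms."""
    for r in records:
        for surface, preferred in ontology.items():
            r["body"] = r["body"].replace(surface, preferred)
    return records

def integrate(records, query):
    """Step 4: mine the normalized corpus (here: trivial matching)."""
    return [r for r in records if query in r["body"]]

ontology = {"TNBC": "triple negative breast cancer"}   # hypothetical mapping
docs = ["TNBC cells lack receptor expression", "", "FOXO3a is a target"]
hits = integrate(normalize(structure(aggregate(docs)), ontology),
                 "triple negative breast cancer")
print(len(hits))  # 1
```

Note how normalization is what lets the query match a record that never contained the query string verbatim; that is the practical payoff of the ontology step.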
Text Mining Solutions & Professional Services
12. Elsevier Offers Several Text Mining Solutions
Software solutions and Professional Services are available for text mining and semantic searching.
Content from journals and books, patents, public data sources, internal content, and other sources, together with user questions, flows into the software solution (UI / API) through the four steps – 1. Aggregate, 2. Structure, 3. Normalize, 4. Integrate – to get facts and data out that support downstream applications and activities.
13. Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
14. Abstracts vs Full Text
Summary of main differences:
• Abstracts – concise summaries; readily accessible; relatively uniform
• Full text – complete documents; may not be as accessible; the information within can vary
15. Benefits of Using Full Text
Full text provides richer result sets:
• Distribution of keywords, facts and relations – more keywords, facts and relations are found in full text
• Concept under-representation in abstracts – specific entities, e.g. biological functions, may not be mentioned in abstracts but primarily in full text sections
• Missing negative data – negative results or non-significant data are often missing from abstracts
• Citations per article – full text sections are cited more than abstracts
• Timeliness – relevant facts and relationships can be found in full text before any mention in abstracts
16. Additional Reading
Articles highlighting the differences between abstracts and full text:
• Information extraction from full text scientific articles: where are the keywords? BMC Bioinformatics. 2003 May 29;4:20.
• Beyond genes, proteins, and abstracts: identifying scientific claims from full-text biomedical articles. J Biomed Inform. 2010 Apr;43(2):173-89. doi: 10.1016/j.jbi.2009.11.001.
• Do peers see more in a paper than its authors? Adv Bioinformatics. 2012;2012:750214. doi: 10.1155/2012/750214.
• Is searching full text more effective than searching abstracts? BMC Bioinformatics. 2009 Feb 3;10:46. doi: 10.1186/1471-2105-10-46.
• Challenges for automatically extracting molecular interactions from full-text articles. BMC Bioinformatics. 2009 Sep 24;10:311. doi: 10.1186/1471-2105-10-311.
• Semi-automatic indexing of full text biomedical articles. AMIA Annu Symp Proc. 2005:271-5.
• Discovering implicit associations between genes and hereditary diseases. Pac Symp Biocomput. 2007:316-27.
• The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics. 2010 Sep 29;11:492. doi: 10.1186/1471-2105-11-492.
• Abstracts in high profile journals often fail to report harm. BMC Med Res Methodol. 2008 Mar 27;8:14. doi: 10.1186/1471-2288-8-14.
• Quality of abstracts of original research articles in CMAJ in 1989. CMAJ. 1991 Feb 15;144(4):449-53.
• Accuracy of data in abstracts of published research articles. JAMA. 1999 Mar 24-31;281(12):1110-1.
17. Abstract vs Full Text Example
Concise abstracts cannot contain all details, whereas full text will contain all the relevant information.
Abstract:
Significant advances have been made in the treatment of human immunodeficiency virus (HIV) infection over the past two decades. Improved therapy has prolonged survival and improved clinical outcome for HIV-infected children and adults. Sixteen antiretroviral (ART) medications have been approved for use in pediatric HIV infection. The Department of Health and Human Services (DHHS) has issued “Guidelines for the Use of Antiretroviral Agents in Pediatric HIV Infection”, which provide detailed information on currently recommended antiretroviral therapies (ART). However, consultation with an HIV specialist is recommended as the current therapy of pediatric HIV is complex and rapidly evolving.
Full text:
Elvitegravir is a once daily integrase inhibitor being studied in adults.
Children with treatment failure should be evaluated for medication adherence, drug intolerance, and possible drug interactions which may lessen the efficacy of the therapeutic regimen.
Challenges:
• Sifting through more information!
• Finding the right results
18. Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
19. Methods
• Use the Elsevier Text Mining solution to search against a corpus of biomedical literature
  • Abstracts – MEDLINE/PubMed (24M)
  • Full text – PubMed Central, Elsevier and partner publishers (4M)
• Refine the results corpus; redefine the search / text mining output
• Review and analyze the data
• Create visual data reports using other tools available
20. Text Mining Abstracts vs Full Text
Search against the scientific literature corpus for sentences related to efficacy. Word clouds suggest insight differences between abstracts and full text (Abstracts Only vs Full Text). If looking for details, one really needs to look at the full text results.
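The term-frequency counts behind such word clouds can be sketched in a few lines. The two-document mini-corpus and the V600E mutation below are hypothetical examples, not the webinar's data.

```python
from collections import Counter
import re

# Hypothetical mini-corpus: the same article seen as abstract vs full text.
abstracts = ["Drug A shows efficacy in TNBC."]
full_texts = ["Drug A shows efficacy in TNBC. "
              "The V600E mutation reduced efficacy in vitro."]

def term_freq(texts):
    """Lowercased term frequencies across a list of documents."""
    tokens = (t.lower() for doc in texts for t in re.findall(r"[A-Za-z0-9]+", doc))
    return Counter(tokens)

abstract_tf, fulltext_tf = term_freq(abstracts), term_freq(full_texts)

# Terms that only surface in full text (e.g. the specific mutation):
only_full = set(fulltext_tf) - set(abstract_tf)
print(sorted(only_full))
```

Feeding `fulltext_tf` minus `abstract_tf` into a word-cloud tool is one way to visualize exactly the abstract-vs-full-text gap the slide describes.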
21. Finding Molecular Targets in Full Text
Word clouds illustrating differences in the point mutations mentioned (Abstracts Only vs Full Text). Full text provides insights into the specific mutations implicated in the differential enzymatic efficacy of a particular drug class, and into the mutations implicated in changes in efficacy; no mutations are mentioned in the abstracts of a comparable document set.
22. Finding Molecular Targets in Full Text
Example searching for cancer immunity checkpoint proteins (Abstracts Only vs Full Text). Full text provides insights into additional protein targets that may be of interest for cancer immunology research in cancer checkpoints.
23. Text Mining Results Can Then Be Used For Analyses
Text mining results can be used to improve scientific research and to address business problems:
• Review results – not just keyword matching anymore: identifying more relevant documents for review, identifying relationships and precise statements, and identifying other targets/content of interest
• Link data to other items of interest
• Analytics, visualization and system/network analysis, e.g. Pathway Studio, Cytoscape
• Integrate text mining data and processes into different workflows for project quality and efficiency
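For the Cytoscape step, extracted relationship triples can be written out in the simple interaction format (SIF, one `source relation target` line per edge) that Cytoscape imports as a network. The triples below are illustrative, not mined results.

```python
# Sketch: serialize (subject, relation, object) triples to SIF for Cytoscape.
# These example triples are hypothetical, not output of a real mining run.
triples = [
    ("FOXO3a", "regulates", "glycolysis"),
    ("TNBC_cells", "resist", "growth_factor_inhibition"),
]

def to_sif(triples):
    """One tab-separated 'source relation target' line per edge."""
    return "\n".join(f"{s}\t{rel}\t{o}" for s, rel, o in triples)

sif = to_sif(triples)
print(sif.splitlines()[0])
```

Writing `sif` to a `.sif` file and opening it in Cytoscape yields a node-and-edge graph, which is the kind of relationship map shown later in the deck.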
24. Text Mining Finds Answers Faster & Increases Efficiency
An example project comparison: writing a comprehensive state-of-the-science review article on the chemical toxicity of a particular substance.
Keyword searching:
• Finds 1,408 articles, many of them not relevant
• 176 person-days to review @ 20 min/article
VS
Text mining:
• Identifies 142 relevant articles
• 5 person-days to review @ 20 min/article
Savings:
• Text mining robustly identifies the relevant articles
• Savings of 171 person-days per project
• Allows more projects / higher quality with the same staff
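The review-effort figures above come down to simple arithmetic. Below is a generic back-of-envelope estimator, assuming 8-hour working days; it is a sketch of the calculation, not the slide's exact accounting.

```python
# Back-of-envelope review-effort estimator (assumption: 8-hour working days).
def person_days(n_articles, minutes_per_article=20, hours_per_day=8):
    """Total reviewing effort in person-days for a set of articles."""
    return n_articles * minutes_per_article / 60 / hours_per_day

# The text-mined set of 142 articles at 20 min each:
print(round(person_days(142), 1))
```

Changing `minutes_per_article` or `hours_per_day` shows how sensitive the savings estimate is to the assumed reading speed.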
25. Example of Visual Insights of Text Mining Results
Relationship map of the intersecting adverse events between two anti-TNF drugs, built by loading Elsevier Text Mining (NLP) results into a Cytoscape visualization.
26. Summary
• Text mining helps sift through large amounts of scientific literature and other textual content
• Text mining increases project team efficiency in finding precise statements and relationships
• Full text articles provide richer result sets, yielding insights that cannot be garnered from abstracts alone
• Several hurdles remain in implementing text mining, but the value can outweigh the costs
Text mining full text can help find molecular targets of interest quickly that might be missed when relying on abstracts and keyword searching alone.
27. Text Mining Full Text for Molecular Targets – March 31, 2015 – George Jiang
Thank you for joining our webinar today:
Text Mining Full Text for Molecular Targets
with George Jiang, Ph.D., M.B.A.
If you have any questions for our speaker, please type them into
the CHAT window.
If you would like more information you can contact:
George Jiang
g.jiang@elsevier.com
Editor's Notes
Who are we?
Elsevier is a digital information solutions company with roots in publishing world class, peer-reviewed scientific, medical, and technical literature, going back 130 years.
What is our Mission?
Elsevier’s abiding purpose, its brand essence, is to empower knowledge and to empower its clients through knowledge: to perpetuate knowledge as a vital, organic set of discoveries oriented toward truth and, ultimately, solutions to fundamental human challenges. This mission of empowerment is accomplished by applying sophisticated digital technology and analytics to some of the world’s greatest scientific, technical, and medical content, which Elsevier has helped to produce, under the peer review system, for over 130 years.
Empowered Knowledge. The Knowledge that Empowers.
What do we do?
We help professionals advance knowledge by expanding it as a body of confirmed facts and ideas, and by getting it to yield positive, measurable, sometimes ground-breaking outcomes in these disciplines (example: more Nobel Laureate authors published in the last half-century than by any other publisher).
What is ‘the product?’
We continue to produce intellectual content, largely in the form of digitized books, journals, and proprietary databases, delivering access via the internet and offline digital channels (examples: journals such as The Lancet, imprints such as Cell and MK, ScienceDirect, Mendeley, Scopus).
In addition to, and layered over, this content are technology and analytics (tools and solutions) that allow clients and end-users to do more with information: to produce it, interact with it, manipulate it, and share it with greater facility, efficiency, and creativity (examples: ClinicalKey, Reaxys, SciVal, SimChart).
What are the Benefits of working with us?
Elsevier empowers knowledge professionals to be more collaborative and competitive, efficient and effective; to perform better, and to create knowledge with impact.
Today's focus is on the scientific literature space and full text articles.
Sources of data for the various statistics: Oxford University Press; the Cognition article; manual copy-and-paste and word counting.