Detection of Emerging Research Trends by Biomedical Text Mining Algorithm
Majid Mirzai, Tyler Chia (MPH), Reid Orenstein, Anish Patel, Sachin Devi (Ph.D)
LECOM School of Pharmacy, 5000 Lakewood Ranch Blvd. Bradenton, Florida 34211
Objective
The objective was to demonstrate how a newly developed
text mining tool can be used to identify emerging research
trends.
Background
Methods
Article titles containing the word “obesity” were downloaded
from PubMed. A primary text mining algorithm, developed in
Visual Basic 6.0 (VB6), calculated word frequency. All words
from titles published since 1880 were rendered as a “word
cloud,” a data visualization in which words that appear more
frequently are drawn larger than words that appear less
frequently. The average percentage increase in word frequency
was then calculated for every word over a five-year period
(2011-2015) to identify emerging trends in obesity research. A
secondary text mining algorithm filtered for words that
appeared alongside “obesity” for the very first time in 2015,
in order to identify the most recent scientific trends.
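The word-frequency step can be illustrated with a short sketch. This is written in Python rather than the VB6 used for the poster, and it is not the authors' program: the function names, the tokenization rule (lowercase alphabetic words, hyphenated words kept whole), and the sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and return its alphabetic words.

    Assumption: hyphenated words such as "obesity-related" count as one word.
    """
    return re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())


def word_frequencies(titles):
    """Count how often each word occurs across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts


# Invented sample titles, not PubMed data.
titles = [
    "Childhood obesity and treatment",
    "Obesity treatment in adults",
    "Risk of diabetes in childhood obesity",
]
freq = word_frequencies(titles)
```

Calling `freq.most_common(25)` on such a counter would give a ranked list of the kind shown in the frequency panels.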
Article titles containing the word “obesity” were downloaded 
from PubMed and imported into a custom text analytics 
program written in Visual Basic 6.0. PubMed article titles were 
analyzed to identify emerging research trends and novel 
scientific breakthroughs.
A word cloud of all the words that appeared along with the word 
“obesity” in the titles of all the articles published since 1880. 
The font size of each word is directly proportional to its 
frequency. The most prominent themes in obesity research 
were treatment, childhood, overweight, diabetes, insulin, 
metabolic, etc.
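The proportional sizing rule can be sketched as follows. The function name and the 100-point maximum are assumptions; the four counts are the ones reported in the Results section. Strict proportionality makes rare words vanishingly small, so a real renderer would probably also enforce a minimum size, which this sketch omits.

```python
def font_sizes(counts, max_pt=100.0):
    """Map each word to a font size directly proportional to its count.

    The most frequent word is drawn at max_pt (an assumed value).
    """
    if not counts:
        return {}
    peak = max(counts.values())
    return {word: max_pt * n / peak for word, n in counts.items()}


# Counts taken from the Results section of the poster.
counts = {"obesity": 39830, "treatment": 3758, "risk": 3382, "childhood": 3052}
sizes = font_sizes(counts)
```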
A representative list of the words that appeared for the very first 
time in the year 2015. Over 100 obesity-related therapeutic 
targets were identified. In addition to therapeutic targets, 
several biomarkers, genes, proteins, etc. associated with 
obesity were identified.
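One way to find words that first appear in a given year is to record the earliest year for every word and then keep those whose earliest year matches. The sketch below assumes the input is a list of (year, title) pairs; the function names and the sample records are invented for illustration and do not reproduce the secondary algorithm itself.

```python
import re


def first_year_by_word(records):
    """Return a dict mapping each word to the earliest year it appears in."""
    first = {}
    for year, title in records:
        words = set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower()))
        for word in words:
            if word not in first or year < first[word]:
                first[word] = year
    return first


def novel_words(records, target_year):
    """Words whose first appearance in any title falls in target_year."""
    first = first_year_by_word(records)
    return sorted(w for w, y in first.items() if y == target_year)


# Invented (year, title) records, not PubMed data.
records = [
    (2013, "Obesity and insulin resistance"),
    (2014, "Microbiome and obesity"),
    (2015, "D-allulose and obesity"),
    (2015, "Microbiome shifts in obesity"),
]
new_in_2015 = novel_words(records, 2015)
```

In this toy data the result includes the function word "in", which shows why such a list needs manual review or stop-word filtering before it is read as a set of candidate targets.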
Publication trends for the PubMed articles containing the words 
“nonalcoholic” and “obesity” in their titles. The word 
“nonalcoholic” first appeared along with the word “obesity” 
in 1986. A second title containing both words followed in 1987, 
after which there was a 12-year period with no such 
publications at all. Interestingly, in the last 15 years there has 
been a continuous stream of publications containing both 
“nonalcoholic” and “obesity”. These data indicate a growing 
research trend in the “nonalcoholic” + “obesity” research area.
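A yearly co-occurrence series like the one in this panel can be sketched as below. The function names and the records are assumptions: the sample titles are invented and merely shaped like the pattern described (two early years, then a gap), so they are not the underlying PubMed data.

```python
import re
from collections import Counter


def yearly_cooccurrence(records, term_a, term_b, start, end):
    """Count titles per year that contain both terms, with zeros for empty years."""
    counts = Counter()
    for year, title in records:
        words = set(re.findall(r"[a-z]+", title.lower()))
        if term_a in words and term_b in words:
            counts[year] += 1
    return {year: counts[year] for year in range(start, end + 1)}


def longest_gap(series):
    """Length of the longest run of consecutive years with zero titles."""
    longest = current = 0
    for year in sorted(series):
        current = current + 1 if series[year] == 0 else 0
        longest = max(longest, current)
    return longest


# Invented records for illustration only.
records = [
    (1986, "Nonalcoholic steatohepatitis and obesity"),
    (1987, "Obesity in nonalcoholic liver disease"),
    (1990, "Obesity and diet"),
    (2000, "Nonalcoholic fatty liver and obesity"),
]
series = yearly_cooccurrence(records, "nonalcoholic", "obesity", 1986, 2000)
gap = longest_gap(series)
```

Note that `longest_gap` also counts empty years before the first hit, so the start year should be the year of the first co-occurrence.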
Average percentage change (APC) in word frequency was 
calculated to identify emerging trends in obesity research over a 
five-year period (2011-2015). This is a representative list of 
interesting emerging trends in the past five years.
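The poster does not spell out how APC is computed. A minimal sketch, assuming it is the mean of the year-over-year percentage changes in a word's yearly count, might look like this. The function name, the handling of zero counts, and the sample counts are all assumptions.

```python
def average_percentage_change(yearly_counts):
    """Mean of year-over-year percentage changes in a word's yearly counts.

    Assumption: a step that starts from a count of zero is skipped, because
    the percentage change from zero is undefined. Returns None if no step
    can be computed.
    """
    changes = []
    for prev, curr in zip(yearly_counts, yearly_counts[1:]):
        if prev == 0:
            continue
        changes.append((curr - prev) / prev * 100.0)
    return sum(changes) / len(changes) if changes else None


# Invented counts for one word over five consecutive years.
apc = average_percentage_change([2, 4, 6, 9, 18])
```

Here the four yearly changes are 100%, 50%, 50%, and 100%, so the mean is 75%. Under this definition, words with very small counts can show very large percentage changes, so raw counts are worth reporting alongside APC.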
Results
A total of 58,215 articles containing the word “obesity” were
downloaded from PubMed. The primary text mining algorithm
found that the most frequent words were “obesity”
(n=39,830), “treatment” (n=3,758), “risk” (n=3,382), and
“childhood” (n=3,052). Over the five years from 2011 to 2015,
the average percentage change showed increases for the terms
“nonalcoholic” (750%), “placental” (680%), “microbiome”
(680%), and “dopamine” (580%). In 2015 alone, the
secondary algorithm uncovered over 100 terms that had not
previously appeared. Biomarkers, genes, proteins, etc. were
among them. These terms represent potential novel
obesity-related therapeutic targets for future research.
Conclusion
This study demonstrates the ability of a text mining algorithm
to uncover emerging research trends that would normally be
buried under the vast number of publications.
Background
PubMed, the largest database of biomedical literature,
contains over 25 million citations. Analyzing this vast number
of articles, coupled with the rapid rate of publication, presents
a challenge to the scientific community. There is therefore a
need for a high-performing, scalable tool to identify emerging
scientific trends. We hypothesized that analyzing the
titles of scientific articles can assist in identifying
emerging research trends. In the present study, text mining
algorithms were used to unearth novel and emerging
scientific trends, using “obesity” as the research interest
for a case study.
Top 25 most frequently appearing words along with the word “obesity” in the titles of the PubMed articles.
Top 25 most frequently appearing words along with the word “obesity” in the titles of PubMed articles after ignoring articles, prepositions, conjunctions, etc.
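Dropping articles, prepositions, and conjunctions before ranking can be sketched as a filter over an existing word-count table. The stop-word list below is a small illustrative assumption, not the list used for the poster, and the counts are invented.

```python
from collections import Counter

# Illustrative stop words only; the poster's actual list is not given.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "for", "with",
    "and", "or", "to", "by", "at", "from",
}


def top_content_words(counts, n=25, stop_words=STOP_WORDS):
    """Return the n most frequent words after removing stop words."""
    kept = Counter({w: c for w, c in counts.items() if w not in stop_words})
    return kept.most_common(n)


# Invented counts for illustration.
counts = Counter({"obesity": 5, "of": 4, "the": 3, "treatment": 2, "risk": 1})
top = top_content_words(counts, n=2)
```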
Overview of Methodology
Most Frequently Appearing Words
Overall Analysis - Word Cloud
5 Year Analysis - Emerging Trends in Obesity Research
Publication Trends of “nonalcoholic”
1 Year Analysis - Novel Scientific Breakthroughs in 2015

FSHP Poster
