MACHINE LEARNING ALGORITHMS FOR NAME ENTITY RECOGNITION
Neha Gupta, Thomas Hahn, Prasanna Balakrishnan, Dr. Richard Segall
University of Arkansas at Little Rock, University of Arkansas for Medical Sciences, Arkansas State University, Boston Children’s Hospital,
Sathyabama Deemed University
Supported by grants from NCRR (P20RR016460) and NIGMS (P20GM103429) at NIH.
CONTACT
University of Arkansas at Little Rock
Email: thomas.f.hahn@gmail.com, nneha2ggupta@gmail.com, balkiprasanna1984@gmail.com, rssegall@aol.com
ABSTRACT
Existing libraries such as ABNER and GATE are used to extract information from the
wealth of available biomedical literature. Their disadvantage is that they run only
on a single machine. A MapReduce implementation, in contrast, runs on multiple
machines and was found to be more efficient than these frameworks in our analysis.
The objective of this project is to develop a statistical model for biomedical text
mining that addresses current limitations with large-scale datasets. The model is
based on the Hidden Markov Model (HMM) and can be used to identify biological
entities in the literature, which in turn can be used to identify and build networks
crucial for human diseases. An HMM was generated and the Viterbi algorithm was
applied to the list of observations.
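The decoding step named in the abstract can be illustrated with a minimal Python sketch of the Viterbi algorithm in log space. This is not the project's code, which used R and Perl. The two states, the three tokens, and every probability below are hypothetical placeholders chosen only to make the example runnable.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best log-probability, most likely state path) for obs."""
    # log_delta[s]: best log-probability of any state path ending in s
    log_delta = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
                 for s in states}
    backpointers = []
    for token in obs[1:]:
        new_delta, pointer = {}, {}
        for s in states:
            # Choose the predecessor state that maximizes the path score
            prev = max(states,
                       key=lambda p: log_delta[p] + math.log(trans_p[p][s]))
            new_delta[s] = (log_delta[prev] + math.log(trans_p[prev][s])
                            + math.log(emit_p[s][token]))
            pointer[s] = prev
        log_delta = new_delta
        backpointers.append(pointer)
    # Trace the best path backwards from the best final state
    last = max(states, key=lambda s: log_delta[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return log_delta[last], path

# Hypothetical toy model: "O" = outside any entity, "PROTEIN" = protein mention
states = ["O", "PROTEIN"]
start_p = {"O": 0.8, "PROTEIN": 0.2}
trans_p = {"O": {"O": 0.7, "PROTEIN": 0.3},
           "PROTEIN": {"O": 0.4, "PROTEIN": 0.6}}
emit_p = {"O": {"the": 0.6, "IL-2": 0.1, "receptor": 0.3},
          "PROTEIN": {"the": 0.1, "IL-2": 0.6, "receptor": 0.3}}

score, path = viterbi(["the", "IL-2", "receptor"],
                      states, start_p, trans_p, emit_p)
print(path, score)
```

Working in log space avoids numerical underflow on long observation lists. The returned score is the same kind of quantity as a log Viterbi score, although its value here reflects only the toy parameters.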
REFERENCES
1. Lantz, B. (2013). Machine Learning with R. Birmingham, UK: Packt Publishing.
2. Kim, J., Ohta, T., Tateisi, Y., & Tsujii, J. (2003). GENIA corpus - a semantically annotated corpus for bio-textmining. Retrieved 02/19, 2014, from http://0-bioinformatics.oxfordjournals.org.iii-server.ualr.edu/content/19/suppl_1/i180.abstract
3. Zhang, J., Shen, D., Zhou, G. D., Su, J., & Tan, C. L. (2004). Enhancing HMM-based biomedical named entity recognition by studying special phenomena. Journal of Biomedical Informatics, 37(6), 411-422. doi:10.1016/j.jbi.2004.08.005. Retrieved 02/21, 2014, from http://download.journals.elsevierhealth.com/pdfs/journals/1532-0464/PIIS1532046404000838.pdf
RESULTS
- The HMM generated in R had a logViterbiScore of -16.57684.
- The HMM was successfully trained with MapReduce to generate the expected counts of initial states, emissions, and transitions used to generate the sequence.
- The maximum likelihood score L of our training set S was 151 using the Baum-Welch implementation in Perl.
- The training data was classified by kNN, Naïve Bayes, Logistic Regression, and SVM, and the results were compared.
- With a linear kernel in the SVM, 50 of the model's predictions did not agree with the actual test dataset.
- With a Radial Basis Function kernel, 192 of the model's predictions did not agree.
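The SVM results above are reported as counts of predictions that disagree with the test labels. As a small illustration of that measure, the following Python sketch counts disagreements for two sets of predictions. The labels and predictions are hypothetical. The poster does not state the size of the test set, so no error rate is derived for the reported figures.

```python
def count_disagreements(predicted, actual):
    """Count positions where a predicted label differs from the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p != a for p, a in zip(predicted, actual))

# Hypothetical entity labels for six test words
actual = ["protein", "DNA", "other", "protein", "RNA", "other"]
# Hypothetical predictions from two differently configured models
linear_pred = ["protein", "DNA", "other", "other", "RNA", "other"]
rbf_pred = ["protein", "other", "other", "other", "DNA", "other"]

linear_errors = count_disagreements(linear_pred, actual)
rbf_errors = count_disagreements(rbf_pred, actual)
print(linear_errors, rbf_errors)
```

Dividing a count by the number of test items gives the misclassification rate, which makes models comparable across test sets of different sizes.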
METHODS
- Clustered seven entities from the GENIA corpus of 2000 MEDLINE abstracts.
- Aligned the seven clusters to the words of the PubMed abstracts and tagged them.
- Generated the transition probability matrix and the emission probability matrix for the 500 tagged training words.
- Generated the HMM using the RHmm package.
- Parallelized HMM training using a MapReduce algorithm in Perl.
- Inferred the maximum likelihood estimates (MLE) of the hidden HMM parameters using a Baum-Welch implementation in Perl.
- Implemented the machine learning classifiers kNN, Naïve Bayes, Logistic Regression, and SVM.
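The step of generating transition and emission matrices from tagged words, and the idea of distributing the counting, can be sketched in Python as a map step and a reduce step. This is a simplified illustration under stated assumptions, not the project's Perl implementation. It uses hard counts from already tagged sentences, whereas Baum-Welch training would aggregate expected counts instead. The sentences, tags, and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

def map_counts(tagged_sentence):
    """Map step: count transitions and emissions in one tagged sentence."""
    trans, emit = Counter(), Counter()
    tags = [tag for _, tag in tagged_sentence]
    for word, tag in tagged_sentence:
        emit[(tag, word)] += 1
    for prev_tag, next_tag in zip(tags, tags[1:]):
        trans[(prev_tag, next_tag)] += 1
    return trans, emit

def reduce_counts(left, right):
    """Reduce step: sum the partial counts produced by two map calls."""
    return left[0] + right[0], left[1] + right[1]

def normalize(counts):
    """Convert (row, column) counts into probabilities that sum to 1 per row."""
    totals = Counter()
    for (row, _), n in counts.items():
        totals[row] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

# Hypothetical tagged sentences; "O" marks words outside any entity
sentences = [
    [("IL-2", "PROTEIN"), ("binds", "O"), ("DNA", "DNA")],
    [("the", "O"), ("IL-2", "PROTEIN"), ("gene", "O")],
]

# Each sentence could be mapped on a different machine before reducing
trans_counts, emit_counts = reduce(reduce_counts, map(map_counts, sentences))
trans_p = normalize(trans_counts)
emit_p = normalize(emit_counts)
print(trans_p)
print(emit_p)
```

Because the reduce step only adds counters, partial results can be combined in any order, which is what makes this kind of counting straightforward to spread over several machines.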
This project was supported by the Arkansas INBRE program, with grants from the National Center for Research Resources - NCRR
(P20RR016460) and the National Institute of General Medical Sciences - NIGMS (P20 GM103429) from the National Institutes of Health.
