Manual Curation and Extraction
of miRNAs using miRTex miRNA
recognition NER.
Seerat Sidhu
Supervisors: Jean-Marc Schwartz, Goran Nenadic
Faculty of Life Sciences
The University of Manchester
AIMS
• Using miRTex for analysis.
• Automating the process of information retrieval.
• Manually curating and evaluating the results.
• Studying the distribution of errors.
INTRODUCTION
• MicroRNAs (miRNAs) are endogenous non-coding RNAs approximately 22 nucleotides long.
• Automated tools help to cope with the rapidly growing volume of biomedical literature.
• Named Entity Recognition (NER) is a text-mining technique used to identify mentions of key biological entities in text.
• The miRTex NER system uses a rule-based approach to extract mentions of miRNAs from free text.
METHODS
miRNA Extraction Pipeline:
Corpus Selection → Pre-Processing (continuous data) → Data Input → miRNA-Mention Recognition (rule-based, miRNA nomenclature-based) → miRNA Extraction
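The pipeline stages above can be sketched as a chain of small functions. This is a minimal Python illustration, not the miRTex implementation: the function names (`preprocess`, `recognise_mentions`, `extract_mirnas`, `run_pipeline`) and the single simplified miRNA pattern are assumptions, and corpus selection is assumed to have happened before the documents are passed in.

```python
import re

# Hypothetical, simplified nomenclature rule (an assumption, not miRTex's rules):
# a miRNA prefix, a dash, digits and an optional letter suffix.
MIRNA_PATTERN = re.compile(r"\b(?:miR|mir|miRNA|microRNA)-\d+[a-z]?\b")


def preprocess(raw_text: str) -> str:
    """Pre-processing: collapse line and paragraph breaks into continuous text."""
    return " ".join(raw_text.split())


def recognise_mentions(text: str) -> list[str]:
    """Rule-based mention recognition: every pattern match, in order of appearance."""
    return MIRNA_PATTERN.findall(text)


def extract_mirnas(mentions: list[str]) -> list[str]:
    """Extraction: de-duplicate mentions while keeping first-seen order."""
    return list(dict.fromkeys(mentions))


def run_pipeline(documents: list[str]) -> list[list[str]]:
    """Data input -> pre-processing -> recognition -> extraction, per document."""
    return [extract_mirnas(recognise_mentions(preprocess(doc))) for doc in documents]


if __name__ == "__main__":
    docs = ["Levels of miR-21 rose,\nwhile miR-155 and\n\nmiR-21 fell."]
    print(run_pipeline(docs))  # [['miR-21', 'miR-155']]
```

Keeping each stage as a separate function makes it easy to swap in a richer recognition rule without touching pre-processing or extraction.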
METHODS
Nomenclature of MicroRNAs
• “mir” (or “miRNA”, “microRNA”, “miR”) is the prefix for microRNAs; it is usually followed by a dash and a unique identifying number (e.g. miR-21).
• The performance of the NER tool was evaluated on two corpora:
  • the miRTex corpus, consisting of 150 abstracts;
  • an in-house corpus of 13 full-length articles.
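The nomenclature rule in the first bullet above lends itself to a regular expression. The sketch below is an assumption, not miRTex's actual rule set: it accepts the listed prefixes followed by a dash and an identifier number, plus two miRBase naming conventions the slide does not mention (an optional species prefix such as `hsa-` and an optional `-5p`/`-3p` arm suffix). It normalises every match to a `miR-` form, which deliberately collapses the precursor (`mir`) versus mature (`miR`) distinction; the names `MIRNA_MENTION`, `normalise` and `find_mirnas` are hypothetical.

```python
import re

# Assumed rule set, not miRTex's actual rules: optional species prefix ("hsa-"),
# one of the listed prefixes, a dash, an identifier number with an optional
# letter suffix ("125b"), and an optional arm suffix ("-5p" / "-3p").
MIRNA_MENTION = re.compile(
    r"\b(?:(?P<species>[a-z]{3})-)?"
    r"(?P<prefix>microRNA|miRNA|miR|mir)-"
    r"(?P<number>\d+[a-z]?)"
    r"(?P<arm>-[35]p)?\b"
)


def normalise(match: re.Match) -> str:
    """Map every prefix variant to a canonical "miR-<number>[arm]" form.

    Dropping the species prefix and merging mir/miR are simplifying assumptions.
    """
    return f"miR-{match.group('number')}{match.group('arm') or ''}"


def find_mirnas(text: str) -> list[str]:
    """Return normalised miRNA mentions in order of appearance."""
    return [normalise(m) for m in MIRNA_MENTION.finditer(text)]


if __name__ == "__main__":
    sentence = "Both hsa-miR-21-5p and microRNA-155 rose; mir-125b did not."
    print(find_mirnas(sentence))  # ['miR-21-5p', 'miR-155', 'miR-125b']
```

Listing the longer prefixes first in the alternation means "miRNA-21" is matched as a whole rather than failing on "miR" followed by "N".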
METHODS
• Gaps between lines and paragraphs were removed to create continuous text.
• The data were stored in text files, which were then used to construct dictionaries.
• The results obtained with the NER tool were manually curated.
• The results were further evaluated by calculating precision, recall and F-score.
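Precision, recall and F-score follow directly from the counts of true positives (TP), false positives (FP) and false negatives (FN) obtained by comparing the tool's output with the manually curated annotations. A minimal sketch, assuming the F-score is the balanced F1 and that matching is done on sets of unique mentions (the slides do not state the matching level); the function names are hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), plus their F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


def evaluate(curated: set[str], predicted: set[str]) -> tuple[float, float, float]:
    """Score tool output against a manually curated set (assumed set-level matching)."""
    tp = len(curated & predicted)  # found by the tool and kept in curation
    fp = len(predicted - curated)  # found by the tool but rejected in curation
    fn = len(curated - predicted)  # curated mentions the tool missed
    return precision_recall_f1(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical example sets, not data from this study.
    print(evaluate({"miR-21", "miR-155", "miR-125b"}, {"miR-21", "miR-155", "miR-200"}))
    print(round(f_score(1.0, 0.99), 2))  # 0.99
```

As a consistency check, `f_score(1.0, 0.99)` gives 1.98/1.99 ≈ 0.99497, which rounds to the F-score of 0.99 reported for the miRTex corpus.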
RESULTS
Evaluation
• miRTex corpus: F-score of 0.99, with a recall of 0.99 and a precision of 1.
• In-house corpus:
Distribution of errors across article sections (FP = false positive, FN = false negative).
PubMed ID Introduction Discussion Procedures Results Abstracts
22435726 1(FN)
23496142 2(FP) 3(FN)
25081906 1(FP) 1(FN) 1(FP)
24431276 3(FN)
26026730 1(FP) 1(FN)
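The error counts in the table can be tallied per document and across the corpus. The cell values below are transcribed from the table; this sketch totals false positives and false negatives per PubMed ID only and does not attribute them to individual sections. The `ERRORS` layout and the `tally` function are illustrative assumptions.

```python
from collections import Counter

# (count, label) cells per PubMed ID, transcribed from the table above.
# FP = false positive, FN = false negative; section columns are not recorded here.
ERRORS: dict[str, list[tuple[int, str]]] = {
    "22435726": [(1, "FN")],
    "23496142": [(2, "FP"), (3, "FN")],
    "25081906": [(1, "FP"), (1, "FN"), (1, "FP")],
    "24431276": [(3, "FN")],
    "26026730": [(1, "FP"), (1, "FN")],
}


def tally(errors: dict[str, list[tuple[int, str]]]) -> tuple[dict[str, Counter], Counter]:
    """Sum FP and FN counts per document, then across all documents."""
    per_doc: dict[str, Counter] = {}
    for pmid, cells in errors.items():
        counts: Counter = Counter()
        for count, label in cells:
            counts[label] += count
        per_doc[pmid] = counts
    total = sum(per_doc.values(), Counter())
    return per_doc, total


if __name__ == "__main__":
    per_doc, total = tally(ERRORS)
    for pmid, counts in per_doc.items():
        print(pmid, dict(counts))
    print("total", dict(total))  # total {'FN': 9, 'FP': 5}
```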
CONCLUSION
• Up to 100 documents can be processed at a time.
• Retrieval of the text files containing the data was automated.
• The F-score was ~0.98 for the majority of the results.
• Predictions were reliable and accurate.
• Precision and recall values were good.
• Errors were randomly distributed.
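The first two conclusions (batches of up to 100 documents and automated retrieval of text files) can be combined with the dictionary-construction step described in the Methods. A minimal sketch, assuming the documents are `.txt` files in a single folder and interpreting a "dictionary" as a map from each miRNA mention to the documents it occurs in; the folder layout, the function names and the simplified `miR-` pattern are all hypothetical.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Iterator

MAX_BATCH = 100  # batch limit stated in the conclusions
MIRNA = re.compile(r"\bmiR-\d+[a-z]?\b")  # simplified, assumed miRNA pattern


def collect_text_files(folder: Path) -> list[Path]:
    """Retrieve every .txt file in `folder`, sorted for a reproducible order."""
    return sorted(folder.glob("*.txt"))


def batches(paths: list[Path], size: int = MAX_BATCH) -> Iterator[list[Path]]:
    """Yield consecutive slices of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def build_dictionary(folder: Path) -> dict[str, set[str]]:
    """Map each miRNA mention to the set of documents (file stems) it occurs in."""
    index: dict[str, set[str]] = defaultdict(set)
    for batch in batches(collect_text_files(folder)):
        for path in batch:
            # Collapse line and paragraph gaps into continuous text before matching.
            text = " ".join(path.read_text(encoding="utf-8").split())
            for mention in MIRNA.findall(text):
                index[mention].add(path.stem)
    return dict(index)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "doc1.txt").write_text("miR-21 was\n\nup-regulated.", encoding="utf-8")
        Path(tmp, "doc2.txt").write_text("miR-21 and miR-155 were studied.", encoding="utf-8")
        print(build_dictionary(Path(tmp)))
```

Batching only bounds how many files are handled per pass; the resulting dictionary is the same whatever the batch size.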
FUTURE RESEARCH
• Focus on a particular set of miRNAs.
• Integrate with curation pipelines to analyse the relationship between miRNAs and diseases.
• Identify potential miRNA targets through a systematic investigation.
THANK YOU!
