The application of artificial intelligence techniques in bioinformatics
Presented by: Pallavi Vashistha
Outline
• Bioinformatics Today
• Artificial Intelligence application
• Examples:
   - Symbolic machine learning
   - Nearest neighbour approach
   - Clustering
   - Identification trees
• Major Challenges and Research Issues
History of Bioinformatics
    Year    Genome sequenced           Size (Mbp, millions of base pairs)
    1995    Haemophilus influenzae          1.8
    1996    Baker's yeast                  12.1
    1997    E. coli                         4.7
    2000    Pseudomonas aeruginosa          6.3
    2000    A. thaliana                   100
    2000    D. melanogaster               180
    2001    Human genome                3,000
    2002    House mouse                 2,500
Bioinformatics Today
• Sequence analysis
   - Sequence alignment
   - Structure and function prediction
   - Gene finding
• Structure analysis
   - Protein structure comparison
   - Protein structure prediction
   - RNA structure modeling
• Expression analysis
   - Gene expression analysis
   - Gene clustering
• Pathway analysis
   - Metabolic pathway
   - Regulatory networks
Artificial Intelligence application
There are several important problems where AI
approaches are particularly promising:

  • Prediction of protein structure
  • Semiautomatic drug design
  • Knowledge acquisition from genetic data
How to classify biological sequences:
• SVM (support vector machine), neural nets,
  decision trees, rules
How to cluster biological entities:
• Biclustering, k-means, hierarchical clustering
How to select features:
• LDA (linear discriminant analysis), PCA (principal
  component analysis), SVM-RFE (SVM-based recursive
  feature elimination)
Nearest neighbour approach
• Decision tree: a representation in which
  - each node is connected to a set of possible answers,
  - each non-leaf node is connected to a test that splits
    its set of possible answers into subsets corresponding
    to different test results, and
  - each branch carries a particular test result's subset
    to another node.
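The definition above maps naturally onto a small recursive data structure. The sketch below is an illustration, not taken from the slides: the `Node` class, the `descend` helper, the colour sets and the width threshold of 3.0 are all hypothetical. Every node keeps a set of possible answers; non-leaf nodes also hold a test whose result selects which branch to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Node:
    """One node of a decision tree, following the definition above."""
    answers: Set[str]                             # possible answers at this node
    test: Optional[Callable[[dict], str]] = None  # only non-leaf nodes carry a test
    branches: Dict[str, "Node"] = field(default_factory=dict)  # test result -> child


def descend(node: Node, case: dict) -> Set[str]:
    """Apply each node's test and follow the matching branch until a leaf."""
    while node.test is not None:
        node = node.branches[node.test(case)]
    return node.answers


# Hypothetical two-level tree over block colours; the threshold is made up.
narrow = Node(answers={"violet", "red"})
wide = Node(answers={"yellow", "blue"})
root = Node(
    answers=narrow.answers | wide.answers,  # a parent holds the union of its children
    test=lambda case: "narrow" if case["width"] < 3.0 else "wide",
    branches={"narrow": narrow, "wide": wide},
)

print(sorted(descend(root, {"width": 2.0})))  # ['red', 'violet']
```

Each test narrows the set of candidate answers, which is what makes the tree useful for speeding up nearest neighbour searches.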
• Example: To see how decision trees are useful for nearest
  neighbour calculations, consider 8 blocks of known width,
  height and colour (Winston, 1992). A new block then appears
  of known size but unknown colour. On the basis of existing
  information, can we make an informed guess as to what the
  colour of the new block is?
• Solution: To answer this question, we need to assume a
  consistency heuristic, as follows. Find the most similar
  case, as measured by known properties, for which the
  property is known; then guess that the unknown property is
  the same as the known property. This is the basis of all
  nearest neighbour calculations.
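The consistency heuristic can be sketched as a brute-force nearest neighbour search. The eight blocks below are hypothetical (Winston's actual measurements are not reproduced here), similarity is assumed to be Euclidean distance over width and height, and `nearest_colour` is an illustrative name. A decision tree, as described on the previous slide, would narrow the search instead of scanning every block. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical cases: (width, height, colour). These are NOT Winston's
# measurements; they only illustrate the consistency heuristic.
BLOCKS = [
    (1.0, 4.0, "violet"), (1.5, 5.0, "violet"),
    (2.0, 1.0, "red"),    (2.5, 1.5, "red"),
    (4.0, 4.5, "blue"),   (4.5, 5.5, "blue"),
    (5.0, 1.0, "yellow"), (5.5, 2.0, "yellow"),
]


def nearest_colour(width: float, height: float) -> str:
    """Guess the unknown colour from the most similar known block.

    Similarity is Euclidean distance over the known properties
    (width, height); every block is scanned, no tree is used.
    """
    closest = min(BLOCKS, key=lambda b: math.dist((width, height), b[:2]))
    return closest[2]


print(nearest_colour(2.2, 1.2))  # red
```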
Clustering
• Clustering follows the principles of nearest neighbour
  calculations but looks at all the attributes (positions)
  of biosequences, rather than just one attribute (position),
  when identifying similarities.
• This is typically achieved by averaging the amount of
  similarity between two biosequences across all
  attributes.
• For example, imagine that we have a table of
  information concerning four organisms with five
  characteristics.
• Given this table, can we calculate how similar each organism is to every other
  organism?

• The nearest neighbour approach described earlier would work through the
  attributes ('characteristics') one at a time. For short biosequences this may be
  feasible, but for biosequences with hundreds of attributes (e.g. DNA bases) it is
  not desirable, since all the samples could probably be classified using just the
  first few attributes, and the remaining attributes would be ignored.
Clustering can be demonstrated in the following way:

• The first step is to calculate a simple matching coefficient for
  every pair of organisms in the table across all attributes.
• For instance, the matching coefficient for A and B is the
  number of identical characteristics divided by the total
  number of characteristics:
  A and B = 0.8 (1+0+1+1+1 = 4/5 = 0.8). Similarly,
  A and C = 0.4 (0+0+0+1+1 = 2/5 = 0.4)
  A and D = 0.2 (0+0+0+0+1 = 1/5 = 0.2)
  B and C = 0.6 (0+1+0+1+1 = 3/5 = 0.6)
  B and D = 0.4 (0+1+0+0+1 = 2/5 = 0.4)
  C and D = 0.8 (1+1+1+0+1 = 4/5 = 0.8)
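The pairwise step can be checked with a short Python sketch. The slide's table is not reproduced in this text, so the presence (1) / absence (0) values in `TABLE` below are hypothetical, chosen only so that every coefficient matches the values listed above; `simple_matching` is an illustrative name.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) table for five characteristics.
# Not the slide's data: rows were chosen only to reproduce the listed values.
TABLE = {
    "A": (1, 1, 1, 1, 1),
    "B": (1, 0, 1, 1, 1),
    "C": (0, 0, 0, 1, 1),
    "D": (0, 0, 0, 0, 1),
}


def simple_matching(x, y):
    """Number of identical characteristics divided by the total number."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Print the coefficient for every pair of organisms.
for p, q in combinations(sorted(TABLE), 2):
    print(p, q, simple_matching(TABLE[p], TABLE[q]))
```

Running it prints the six coefficients listed above, from `A B 0.8` through `C D 0.8`.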
• We then find the highest matching coefficient to form the first 'cluster' of
  similar organisms. Since we have two candidates here (AB and CD both have
  0.8), we arbitrarily choose one to start the process: AB.

• The steps are then repeated, using AB as one 'organism' and taking partial
  matches into account.

• The average matching coefficient for
   AB and C = 0.5 (0+0.5+0+1+1 = 2.5/5 = 0.5),
   where the 0.5 for the second characteristic reflects that C shares this
   feature with B but not with A.

• The matching coefficient for AB and D = 0.3 (0+0.5+0+0+1 = 1.5/5 = 0.3),
   and for C and D = 0.8 (as before).

• Since C and D have the highest coefficient, they form the second cluster.
Finally, we calculate the average matching coefficient between the two clusters,
taking AB as one organism and CD as another: 0.4
(0+0.5+0+0.5+1 = 2/5 = 0.4),
again taking partial matches into account. We can then construct a similarity tree
(dendrogram) using these coefficients.
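The repeated merge-and-recompute procedure is a form of average-linkage agglomerative clustering on similarities. The sketch below redefines the hypothetical `TABLE` from the matching-coefficient step so the block stands alone; `cluster_similarity` and `agglomerate` are illustrative names. It breaks the AB/CD tie by taking the first pair found, rather than choosing randomly as the slide does.

```python
from itertools import combinations

# Hypothetical table (not the slide's data), chosen to reproduce its coefficients.
TABLE = {
    "A": (1, 1, 1, 1, 1),
    "B": (1, 0, 1, 1, 1),
    "C": (0, 0, 0, 1, 1),
    "D": (0, 0, 0, 0, 1),
}


def cluster_similarity(xs, ys):
    """Average matching coefficient between two clusters, counting partial matches.

    For each characteristic, score the fraction of cross-cluster pairs that agree
    (e.g. 0.5 when C matches B but not A), then average over characteristics.
    """
    n_chars = len(TABLE[xs[0]])
    total = 0.0
    for k in range(n_chars):
        agree = sum(TABLE[x][k] == TABLE[y][k] for x in xs for y in ys)
        total += agree / (len(xs) * len(ys))
    return total / n_chars


def agglomerate(names):
    """Repeatedly merge the most similar pair of clusters, recording each merge."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # max() keeps the first maximal pair, so AB is merged before CD here.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_similarity(clusters[ij[0]], clusters[ij[1]]))
        score = cluster_similarity(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], round(score, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for left, right, score in agglomerate(sorted(TABLE)):
    print("".join(left), "+", "".join(right), "->", score)
```

On this table it prints `A + B -> 0.8`, `C + D -> 0.8` and `AB + CD -> 0.4`, the same sequence of merges and coefficients as the worked example. The recorded merges are exactly what is needed to draw the similarity tree.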
Identification tree

The task now is to determine which of the attributes contribute towards someone
being sunburned or not. First, we need to introduce a disorder formula and
associated log values to rank attributes in terms of their influence on who is and
who isn't sunburned:

$$\text{Average disorder} = \sum_b \frac{n_b}{n_t} \left( \sum_c -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)$$

where n_b is the number of samples in branch b, n_t is the total number of samples in all
branches, and n_bc is the number of samples in branch b of class c.

• The idea is to divide the samples into subsets in which as many samples as possible
  have the same classification (that is, subsets that are as homogeneous as possible).
  The higher the disorder value, the less homogeneous the classification.

• We now work through each attribute in turn, identifying which samples fall
  within each branch (attribute value) of that attribute, and noting which
  class each sample falls into.
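The attribute ranking can be sketched as follows. The eight samples are illustrative and stand in for the slide's table, which is not reproduced here; the column layout and the `average_disorder` name are assumptions. The function implements the average disorder formula above with base-2 logarithms.

```python
from collections import Counter, defaultdict
from math import log2

# Illustrative samples, NOT the slide's table: treat every value as an assumption.
SAMPLES = [
    # name,    hair,    height,    weight,    lotion, result
    ("Sarah", "blond", "average", "light",   "no",  "sunburned"),
    ("Dana",  "blond", "tall",    "average", "yes", "not sunburned"),
    ("Alex",  "brown", "short",   "average", "yes", "not sunburned"),
    ("Annie", "blond", "short",   "average", "no",  "sunburned"),
    ("Emily", "red",   "average", "heavy",   "no",  "sunburned"),
    ("Pete",  "brown", "tall",    "heavy",   "no",  "not sunburned"),
    ("John",  "brown", "average", "heavy",   "no",  "not sunburned"),
    ("Katie", "blond", "short",   "light",   "yes", "not sunburned"),
]
ATTRIBUTES = {"hair": 1, "height": 2, "weight": 3, "lotion": 4}  # column index
RESULT = 5


def average_disorder(samples, column):
    """Sum over branches b of (n_b/n_t) * sum over classes c of -(n_bc/n_b) log2(n_bc/n_b)."""
    branches = defaultdict(list)          # attribute value -> classes in that branch
    for s in samples:
        branches[s[column]].append(s[RESULT])
    n_t = len(samples)
    total = 0.0
    for classes in branches.values():
        n_b = len(classes)
        disorder = -sum((n_bc / n_b) * log2(n_bc / n_b)
                        for n_bc in Counter(classes).values())
        total += (n_b / n_t) * disorder
    return total


# Rank attributes from most to least useful (lowest disorder first).
for name, col in sorted(ATTRIBUTES.items(), key=lambda kv: average_disorder(SAMPLES, kv[1])):
    print(f"{name:7s} {average_disorder(SAMPLES, col):.2f}")
```

On these samples hair colour has the lowest average disorder (0.50), followed by lotion (about 0.61), height (about 0.69) and weight (about 0.94), so hair colour would be chosen as the first test.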
Given the full identification tree, we can then derive rules by following
all paths from the root to the leaf nodes, as follows:

(a) If a person’s hair colour is brown, then the person is not
 sunburned.

(b) If a person’s hair colour is red, then the person is
 sunburned.

(c) If a person’s hair colour is blond and that person has used
 suntan lotion, then the person is not sunburned.

(d) If a person’s hair colour is blond and that person has not
 used suntan lotion, then the person is sunburned.
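Growing the whole tree and reading off one rule per root-to-leaf path can be sketched recursively. The samples are the same illustrative ones as in the ranking sketch (not the slide's data), and splitting on the lowest-disorder attribute at every node is an assumption about how the tree was built; `build_rules` is a hypothetical name.

```python
from collections import Counter, defaultdict
from math import log2

# Illustrative samples, NOT the slide's table: treat every value as an assumption.
SAMPLES = [
    ("Sarah", "blond", "average", "light",   "no",  "sunburned"),
    ("Dana",  "blond", "tall",    "average", "yes", "not sunburned"),
    ("Alex",  "brown", "short",   "average", "yes", "not sunburned"),
    ("Annie", "blond", "short",   "average", "no",  "sunburned"),
    ("Emily", "red",   "average", "heavy",   "no",  "sunburned"),
    ("Pete",  "brown", "tall",    "heavy",   "no",  "not sunburned"),
    ("John",  "brown", "average", "heavy",   "no",  "not sunburned"),
    ("Katie", "blond", "short",   "light",   "yes", "not sunburned"),
]
ATTRIBUTES = {"hair": 1, "height": 2, "weight": 3, "lotion": 4}  # column index
RESULT = 5


def average_disorder(samples, column):
    """Average disorder of splitting `samples` on the attribute in `column`."""
    branches = defaultdict(list)
    for s in samples:
        branches[s[column]].append(s[RESULT])
    total = 0.0
    for classes in branches.values():
        n_b = len(classes)
        total += (n_b / len(samples)) * -sum(
            (n / n_b) * log2(n / n_b) for n in Counter(classes).values())
    return total


def build_rules(samples, attributes, conditions=()):
    """Split on the lowest-disorder attribute until branches are homogeneous,
    returning one (conditions, class) rule per root-to-leaf path."""
    if len({s[RESULT] for s in samples}) == 1 or not attributes:
        # Leaf: homogeneous, or no attributes left (fall back to the majority class).
        majority = Counter(s[RESULT] for s in samples).most_common(1)[0][0]
        return [(conditions, majority)]
    best = min(attributes, key=lambda a: average_disorder(samples, attributes[a]))
    remaining = {a: c for a, c in attributes.items() if a != best}
    rules = []
    for value in sorted({s[attributes[best]] for s in samples}):
        subset = [s for s in samples if s[attributes[best]] == value]
        rules += build_rules(subset, remaining, conditions + ((best, value),))
    return rules


for conds, result in build_rules(SAMPLES, ATTRIBUTES):
    test = " and ".join(f"{a} is {v}" for a, v in conds)
    print(f"If {test}, then {result}")
```

On these samples the recursion splits on hair colour, then on lotion for the blond branch only, and prints four rules with the same content as (a) to (d) above, in a different order.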
Major Challenges and Research Issues

•   Requires individuals with knowledge of both
    disciplines
•   Requires collaboration of individuals from diverse
    disciplines
•   Data generation in biology/bioinformatics is
    outpacing methods of data analysis
•   Data interpretation and hypothesis generation
    require intelligence
•   AI offers established methods for knowledge
    representation and “intelligent” data
    interpretation
