Bioinformatics                                            TCSS 588A             Winter 2007


                                 TCSS 588 A Winter 2007
                                         Bioinformatics

Instructor: Isabelle Bichindaritz, Ph.D.
Class: PNK 104           M/W 7:30 P.M. – 9:45 P.M.
         CP 206M         for assignments
Email: ibichind@u.washington.edu
Office: Cherry Parkes 216
Office hours: M/W 2:20 P.M. – 4:20 P.M.
                always by email
                by appointment
Class Web-site: http://courses.washington.edu/tcss588

DESCRIPTION
           The bioinformatics course explains how to apply computer science to biology,
           medicine, genomics, and proteomics. After a detailed presentation of selected
           life sciences problems, the course examines and compares specific computational
           methods and systems for solving them, focusing on machine learning, concept
           learning, statistical learning, hidden Markov models, case-based reasoning, neural
           networks, evolutionary computing, knowledge-based systems and ontologies,
           stochastic grammars and linguistics, grid computing, and the Semantic Web. Several
           applications and projects (such as the Human Genome Project) are covered in detail,
           and tools for building new applications are provided.

OBJECTIVES
Some of the objectives for this course include:
           o   Understand medical and biological concepts and the associated set of problems.
           o   Understand the scientific framework for bioinformatics in statistics, complexity,
               and information theory.
           o   Understand machine learning algorithms and methods for biomedical informatics.
           o   Become familiar with specific machine learning methods and algorithms such as
               statistical learning, concept learning, hidden Markov models, case-based
               reasoning, neural networks, knowledge-based systems and ontologies, genetic
               algorithms, stochastic grammars and linguistics, grid computing, and the
               Semantic Web.
           o   Compare different machine learning methods and learn how to select the one best
               suited to solving a particular problem in biomedical informatics.
           o   Program using available biomedical informatics tools (SPSS, Clementine, Weka,
               Phylip, …).
           o   Design and develop new computer systems for bioinformatics.

TOPICS
           1. Biological and medical foundations.
           2. Machine learning algorithms and applications to biology/life sciences.
           3. Bayesian models.
           4. Regression.
           5. Multivariate methods.
           6. Neural networks.
           7. Hidden Markov Models.
           8. Clustering.
           9. Nonparametric methods.
           10. Decision trees.
           11. Phylogenetic tree induction.
           12. Combining several learners.

A detailed tentative schedule for each class, together with the assignments and project
deliverables, can be found on the class home page at http://courses.washington.edu/tcss588 .

PREREQUISITE
    Graduate students: TCSS 343
    Undergraduate students: Core completed.

TEXTBOOKS
Introduction to Machine Learning, Ethem Alpaydin, The MIT Press, 2004, ISBN:
0-262-01211-1.

Medical Informatics: Knowledge Management and Data Mining in Biomedicine, Hsinchun
Chen (Editor), Sherrilynne S. Fuller (Editor), Springer-Verlag, 2005, ISBN:
038724381X.

RECOMMENDED
  o The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, Jerome
    Friedman, Springer-Verlag, 2001, ISBN: 0-387-95284-5.
  o Bioinformatics: The Machine Learning Approach (second edition), Pierre Baldi, Søren
    Brunak, The MIT Press, 2001, ISBN: 026202506X.
  o An Introduction to Bioinformatics Algorithms, Neil C. Jones, Pavel A. Pevzner, The MIT
    Press, 2004, ISBN: 0-262-10106-8.
  o Inferring Phylogenies, Joseph Felsenstein, Sinauer Associates, 2004, ISBN:
    0-87893-177-5.
  o Data Mining on Multimedia Data, Petra Perner, Springer-Verlag, Series: Lecture Notes in
    Computer Science, Vol. 2558, 2003, ISBN: 3-540-00317-7.

CLASS WORK AND EVALUATION

There will be three individual assignments, a project (group or individual), and two exams.
Assignments are due by midnight on the due date and must be submitted electronically. The
deliverables will be either individual assignments (three in total) or group project
deliverables. Assignments will be worked on in the lab. Assignments and the project may
involve programming, but design, writing, critical thinking, and analysis will predominate.
If the team option is chosen, project programming will be a team effort, with peer
evaluations due at the end of the quarter. Homework assignments and project deliverables are
posted on the class Web site. Incomplete assignments will be accepted. No late assignments
will be accepted.

PROJECT WORK
The project will be developed in teams of no more than two students. The teams will be formed
during the first two weeks of class. Although grades for the different components of the project
will be assigned at the team level, a peer evaluation will be performed at the end of the project
and may raise or lower a student’s final project grade.

PARTICIPATION
Participation will be evaluated by attendance and active involvement in in-class discussions.
To participate effectively in class, students must complete the pre-readings and think
critically about them beforehand.

GRADING
    Assignments:       25%    (individual)
    Project:           30%    (individual or team)
    Participation:     15%
    Midterm:           15%
    Final:             15%
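
The weights above combine arithmetically into a course total. The following is a minimal
sketch of that calculation, not the instructor's grading procedure: the 0-100 score scale,
the function name weighted_total, and the example scores are all assumptions made for
illustration, and the sketch ignores the peer-evaluation adjustment to the project grade.

```python
# Weights copied from the GRADING table; everything else here is hypothetical.
WEIGHTS = {
    "assignments": 0.25,
    "project": 0.30,
    "participation": 0.15,
    "midterm": 0.15,
    "final": 0.15,
}


def weighted_total(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Each component contributes its score multiplied by its weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example = {
    "assignments": 90,
    "project": 80,
    "participation": 100,
    "midterm": 70,
    "final": 85,
}
print(f"{weighted_total(example):.2f}")
```

Because the five weights sum to 100%, a student scoring the same value on every component
would receive that value as the total.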

CODE OF CONDUCT
The assignments, quizzes, and exams must be done individually. Copying another student's
work or code, even if changes are subsequently made, is inappropriate, and such work or
code will not be accepted. The University has very clear guidelines for academic misconduct,
and they will be enforced in this class.

COURSE CHANGES
The schedule and procedures for this course are subject to change. Changes will be announced in
class, and it is the student's responsibility to learn of and adjust to them.

IMPORTANT
If you would like to request academic accommodations due to a temporary or permanent
disability, please contact Lisa Tice, Manager of Disability Support Services (DSS) in the
Mattress Factory Bldg, Suite 206. An appointment can be made through the front desk of
Student Affairs (692-4400), through Student Services (692-4501), by phoning Lisa directly at
692-4493 (voice), 692-4413 (TTY), or by e-mail (ltice@u.washington.edu). Appropriate
accommodations are arranged after you've conferred with the DSS Manager and presented the
required documentation of your disability to DSS.

                                 TENTATIVE SCHEDULE


             Week     Topic                                 Assignment
             1        Introduction to class
                      What is machine learning

             2        Supervised learning                   Project proposal
                      Bayesian decision theory
                      Data mining in medical informatics

             3        Classification / regression
                      Data mining in bioinformatics

             4        Multivariate classification /         Project bibliography
                      regression
                      Identification of biological
                      relationships

             5        Dimensionality reduction              Assignment #1
                      Clustering
                      Learning metabolic networks

             6        Nonparametric methods                 Evaluation plan
                      Decision trees
                      Learning gene pathways
                      MIDTERM

             7        Neural networks                       Assignment #2
                      Multilayer perceptrons
                      The genomic data mine

             8        Hidden Markov Models                  Draft of article
                      Exploratory genomic data analysis

             9        Local models                          Assignment #3
                      Classification
                      Joint learning in genomics

             10       Multiple learners                     Oral presentation
                      Reinforcement learning
                      Infectious disease informatics

             Finals   FINAL (M 3/12)                        Final article
             week
