George Mason University – Graduate Council Graduate Course Approval Form
All courses numbered 500 or above must be submitted to the Graduate Council for final approval after approval by the
sponsoring College, School or Institute.
Graduate Council requires submission of this form for a new course or any change to existing courses. For a new course,
please attach a copy of the syllabus and catalog description (with catalog credit format, e.g. 3:2:1). The designated
representative of the College, School or Institute should forward the form along with the syllabus and catalog description,
if required, as an email attachment (in one file) to the secretary of the Graduate Council. A printed copy of the form with
signatures and the attachments should be brought to the Graduate Council meeting. Please complete the Graduate Course
Coordinator Form if the proposed changes will affect other units.
Note: Colleges, Schools or Institutes are responsible for submitting new or modified catalog descriptions (35 words or
less, using catalog format) to Creative Services by deadlines outlined in the yearly Catalog production calendar.
Please indicate: New___X____ Modify_______ Delete_______
Department/Unit: BCB/COS Course Subject/Number: BINF 760
Submitted by: Prof. John J. Grefenstette Ext: 8398 Email: jgrefens@gmu.edu
Course Title: Machine Learning for Bioinformatics
Effective Term (New/Modified Courses only): Spring 2009 Final Term (deleted courses only):____________
Credit Hours: (Fixed) _3 (Var.) ______ to ______
Grade Type (check one): ___X__ Regular graduate (A, B, C, etc.)
_____ Satisfactory/No Credit only
_____ Special graduate (A, B, C, etc. + IP)
Repeat Status*(check one): ___ NR-Not repeatable _X___ RD-Repeatable within degree ____ RT-Repeatable within term
*Note: Used only for special topics, independent study, or internships courses Total Number of Hours Allowed: ______
Schedule Type Code(s): 1.LEC 2.____ (second code used only for courses with Lab or Rct component)
LEC=Lecture SEM=Seminar STU=Studio INT=Internship IND=Independent Study LAB=Lab RCT=Recitation
Prereq _X_ Coreq ___ (Check one): BINF630, BINF631 and BINF634, or permission of instructor.
Note: Modified courses - review prereq or coreq for necessary changes; Deleted courses - review other courses to correct prereqs that list the deleted course.
Description of Modification (for modified courses): ____________________________________________________________________
Special Instructions (major/college/class code restrictions, if needed): __________________________________________
Department/Unit Approval Signature: _________________________________________ Date: ___________
College/School Committee Approval Signature: __________________________________ Date: ___________
Graduate Council Approval Date: ____________ Provost Office Signature: ________________________________
George Mason University Graduate Course Coordination Form
Approval from other units:
Please list those units outside of your own who may be affected by this new, modified, or deleted course. Each of these units must
approve this change prior to its being submitted to the Graduate Council for approval.
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Graduate Council approval: ______________________________________________ Date: ____________
Graduate Council representative: __________________________________________ Date: ____________
Provost Office representative: ____________________________________________ Date: ____________
Course Proposal Submitted to the Graduate Council
by
The College of Science
1. COURSE NUMBER AND TITLE: BINF 760 Machine Learning for Bioinformatics
Course Prerequisites: Familiarity with bioinformatics methods and databases (e.g., BINF630), molecular
cell biology (e.g., BINF631), bioinformatics programming (e.g., BINF634), or permission of the instructor.
Catalog Description: (3:3:0) Machine learning and data mining methods relevant to problems
in computational biology. Methods include decision trees, random forests, rule learning methods, support
vector machines, neural networks, genetic algorithms, instance-based learning, Bayesian networks, and
evaluation metrics for learning systems. Applications include cancer prediction, gene finding, protein
function classification, gene regulation network inference, and other recent bioinformatics applications
selected from the literature. In addition to lectures from the instructor, students will present papers from the
literature and complete a machine learning project.
2. COURSE JUSTIFICATION:
Course Objectives:
Students taking this course will:
1. Learn a number of common machine learning and data mining methods;
2. Learn how these methods are being applied to many problems in bioinformatics, including cancer
prediction, gene finding, protein function classification, and gene regulation network inference;
3. Present and critically analyze recent research articles from the literature;
4. Design and execute a machine learning project in bioinformatics;
5. Prepare and present both an oral report and a written report in a format suitable for publication.
Course Necessity:
The methods covered in this course are widely used in current bioinformatics research, and are likely to be
useful to graduate students in bioinformatics as part of their research projects and dissertations.
Course Relationship to Existing Programs:
This course is an elective course in the Bioinformatics and Computational Biology M.S. and Ph.D. programs.
It will also serve as an elective course in the Bioinformatics Certificate program. This course would also be
appropriate as an elective for graduate students in BioSciences, Neuroscience, or Computer Science.
Course Relationship to Existing Courses:
No existing course covers both the relevant machine learning methods and the specific issues involved in
bioinformatics applications.
3. APPROVAL HISTORY:
4. SCHEDULING AND PROPOSED INSTRUCTORS:
Semester of Initial Offering: Spring 2009.
(This course was offered as BINF739 (Topics in Bioinformatics) during Spring 2008.)
Proposed Instructors: Prof. John Grefenstette or Prof. Jeff Solka
5. TENTATIVE SYLLABUS: See attached.
BINF 760
Machine Learning for Bioinformatics
-- SYLLABUS --
Prerequisites: Familiarity with bioinformatics methods and databases (e.g., BINF630), molecular cell biology
(e.g., BINF631), bioinformatics programming (e.g., BINF634), or permission of the instructor.
Credits: 3
Instructor: Prof. John J. Grefenstette
Office Hours: M 4-5 pm.
Course Description:
Machine learning and data mining methods have been applied to many problems in genomics, including cancer
prediction, gene finding, protein function classification, and gene regulation network inference. This course
provides an intensive introduction to machine learning methods relevant to problems in
computational biology. Methods include decision trees, random forests, rule learning methods, support vector
machines, neural networks, genetic algorithms, instance-based learning, Bayesian networks, and evaluation
metrics for learning systems. Applications include selecting predictive genes from microarray studies,
classification of protein function based on structure, haplotype inference, and other recent bioinformatics
applications selected from the literature. In addition to lectures from the instructor, students will present papers
from the literature and complete a machine learning project.
Students take a very active role in this course. The instructor will provide background lectures and lead
discussions on machine learning techniques, methodology and applications. Students will present and discuss
articles focusing on specific applications. Students will also spend significant time presenting and discussing
course projects in class.
Lecture Content:
• Week 1: Overview of Machine Learning. Concepts, instances, attributes. Knowledge representations.
Survey of applications to bioinformatics.
• Week 2: Evaluation of learning methods. Performance measures, cross-validation, ROC curves, evaluating
numeric predictions. Hands-on exercises using WEKA.
• Week 3: Decision trees, random forests, classification rules. Selected bioinformatics applications.
• Week 4: Support vector machines, radial basis functions, neural networks. Student presentations of
research articles.
• Week 5: Instance-based learning, genetic algorithms. Student presentations of research articles.
• Week 6: Unsupervised learning: Clustering. Student presentations of research articles. Written project
proposals due.
• Week 7: Oral presentations of project proposals.
• Week 8: Bayesian networks. Selected bioinformatics applications.
• Week 9: Attribute selection and transformations, PCA. Selected bioinformatics applications.
• Week 10: Combining multiple models: Bagging, boosting, stacking.
• Week 11: Project preparation and working session.
• Week 12: Current trends in machine learning. Recent applications.
• Week 13: Student project presentations.
• Week 14: Student project presentations.
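The Week 2 evaluation topics can be illustrated outside WEKA. The following is a minimal sketch in plain Python, not part of the course materials: it estimates classifier accuracy by k-fold cross-validation, using a 1-nearest-neighbor classifier as a simple case of the instance-based learning listed for Week 5. The function names and the toy data are hypothetical, chosen only for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train, query):
    """Return the label of the training instance closest to `query` (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda inst: sq_dist(inst[0], query))
    return label


def cross_validated_accuracy(data, k=5, seed=0):
    """Estimate accuracy: each instance is predicted by a model that never saw it."""
    folds = k_fold_indices(len(data), k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        for i in held_out:
            features, label = data[i]
            if nearest_neighbor_predict(train, features) == label:
                correct += 1
    return correct / len(data)


# Made-up data: two well-separated classes of (features, label) pairs.
toy = [((0.1 * i, 0.2 * i), "low") for i in range(10)]
toy += [((10 + 0.1 * i, 10 + 0.2 * i), "high") for i in range(10)]

print(cross_validated_accuracy(toy, k=5))  # 1.0 on this trivially separable data
```

On real bioinformatics data the estimate would fall below 1.0, and the choice of k and of the performance measure (accuracy here, rather than an ROC-based measure) would itself be part of the experimental design.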
Homework: Students will be assigned homework using the WEKA machine learning software
toolkit. Assignments will cover the operation of the software, data preparation, data cleansing, visualization,
use of classifiers, and experimental design.
Project: A significant portion of your grade depends on your participation in classroom discussions and oral
presentations. A course project will be defined by each student, subject to the approval of the instructor. Each
student will give two classroom presentations about the project: one describing its plan and rationale, and one
describing the results. A written final report in publishable format is also required.
Exams: There will be no exams for the course. The final written report on the course project will be due on the
Final Exam date.
Grading Criteria:
• Class discussion participation: 25%
• Presentation of research article: 25%
• Project presentations: 25%
• Final project report: 25%
Required Texts:
Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), by Ian H. Witten and Eibe
Frank.
Other readings will be assigned from the bioinformatics research literature.