Slideshare.net (beta)

 
Post to TwitterPost to Twitter
Post: 
Myspace Hi5 Friendster Xanga LiveJournal Facebook Blogger Tagged Typepad Freewebs BlackPlanet gigya icons

All comments

Add a comment on Slide 1

If you have a SlideShare account, login to comment; else you can comment as a guest


Showing 1-50 of 1 (more)

An Open Source Framework for Teaching BIoinformatics

From bosc, 2 years ago

Title: An Open Source Framework for Teaching Bioinformatics Author more

1316 views  |  0 comments  |  1 favorite  |  58 downloads
 

Categories

Add Category
 
 

Groups / Events

 

 
Embed
options

More Info

This slideshow is Public
Total Views: 1316
on Slideshare: 1316
from embeds: 0

Slideshow transcript

Slide 1: An Open Source Framework for Teaching Bioinformatics Kam D. Dahlquist Department of Biology John David N. Dionisio Department of Electrical Engineering & Computer Science Loyola Marymount University BOSC Vienna, Austria July 20, 2007

Slide 2: Outline • Motivation • Open source culture • Implementation --Computer science curriculum --Bioinformatics course

Slide 3: Scientific Computing and the Digital Divide Wilson GV (2006) Where’s the real bottleneck in scientific computing? American Scientist 94:5–6. Scientists who come to computer science after being trained in a different primary discipline often have to rediscover, relearn, or keep up with work in the computer science and software development realms in order to get the most out of their work. This causses unecessary and unknowing repetitions of past discoveries and errors. Tools or paradigms that are out-of-date in computer science and software engineering remain in place. At worst, software flaws slow or impede research. Baxter SM, Day SW, Fetrow JS, and Reisinger SJ (2006) Scientific software development is not an oxymoron. PLoS Computational Biology 2:e87.

Slide 4: The Disconnect Between Undergraduate Computer Science Training and Expectations and Skill Sets Required for Industry and Research Undergraduate Industry Training Expectation Work alone Work in a team “Toy” programs Large, modular and algorithms project Code longevity Throwaway (for better or code worse)

Slide 5: http://recourse.cs.lmu.edu/ inroads – The SIGCSE Bulletin, Volume 39, Number 2, 2007 June, pp. 70-74

Slide 6: Official Open Source Definition (version 1.9) Free redistribution No discrimination against fields of endeavor Source code Distribution of license Derived works License must not be specific to a product Integrity of the author’s License must not source code restrict other software No discrimination against License must be persons or groups technology-neutral

Slide 7: Open Source Values • Source code is available, modifiable, and long-lived • Accountability implies community --explicit “paper trail” for how a particular program changes over time --communication and answerability • Responsibilities accompany rights --explicit consideration of the license --giving credit where it is due --appropriate access to software, documentation, and communities --acknowledging security, privacy, legal issues

Slide 8: Open Source Teaching Framework Source Code: • All code resides in a centralized, public repository • As much as possible, everyone’s code is visible to everyone else for code review or team fixing • No code is thrown away, it remains available to future “generations”

Slide 9: Open Source Teaching Framework Source Code: • All code resides in a centralized, public repository • As much as possible, everyone’s code is visible to everyone else for code review or team fixing • No code is thrown away, it remains available to future “generations” Quality & Community: • Documentation, inline and online • Automated tests • Constructive code review, beyond “does it work?” • Long-term projects release early, release often • Form collaborative communities among faculty, students, classes, and projects

Slide 10: Progression through Computer Science Curriculum • Sample code bazaar --the creation and maintenance of live, organized, searchable, student accessible sample code libraries --students check out, modify, and check in code • Test infection (Erich Gamma) --test suite vs. implementation matrix • Life cycle of code --as juniors and seniors, students revisit code written the first year --direct experience with need for documentation • Release early, release often --applies to both students and faculty

Slide 11: “CourseForge” A Hardware + Software Infrastructure for Supporting the Teaching Framework • Certain teaching elements are impractical without some degree of automation • Support platform for management of student work --revision control --test frameworks --communication --issue tracking • Derived from open source software, delivered as open source software — the system will interoperate with existing open source tools

Slide 12: CMSI 698: Special Studies in Bioinformatics • Team-taught by a biologist and a computer scientist

Slide 13: CMSI 698: Special Studies in Bioinformatics • Team-taught by a biologist and a computer scientist • Enrollment in Spring 2006: -- eight students from Master’s degree program in Computer Science -- several coming from aerospace industry -- none with more than college-level introductory biology

Slide 14: CMSI 698: Special Studies in Bioinformatics • Team-taught by a biologist and a computer scientist • Enrollment in Spring 2006: -- eight students from Master’s degree program in Computer Science -- several coming from aerospace industry -- none with more than college-level introductory biology • Project-based class began development of XMLPipeDB

Slide 15: CMSI 698: Special Studies in Bioinformatics • Team-taught by a biologist and a computer scientist • Enrollment in Spring 2006: -- eight students from Master’s degree program in Computer Science -- several coming from aerospace industry -- none with more than college-level introductory biology • Project-based class began development of XMLPipeDB • XMLPipeDB development continued by four students in summer session course entitled Open Source Software Development Workshop

Slide 16: CMSI 698: Special Studies in Bioinformatics • Team-taught by a biologist and a computer scientist • Enrollment in Spring 2006: -- eight students from Master’s degree program in Computer Science -- several coming from aerospace industry -- none with more than college-level introductory biology • Project-based class began development of XMLPipeDB --authentic bioinformatics problem to solve • XMLPipeDB development continued by four students in summer session course entitled Open Source Software Development Workshop • One student continued development for Master’s thesis

Slide 17: XMLPipeDB Project Management: Lessons Learned • Students on the project had varying levels of maturity, knowledge, and skill coming into the project -- some naturally took on a leadership role -- some hung back or did the minimum required to get by

Slide 18: XMLPipeDB Project Management: Lessons Learned • Students on the project had varying levels of maturity, knowledge, and skill coming into the project -- some naturally took on a leadership role -- some hung back or did the minimum required to get by • Needed to increase communication and sense of team -- students preferred to interact with faculty for questions, rather than each other -- bug trackers and developer’s forum used only sporadically -- implemented weekly reports on Wiki to increase accountability

Slide 19: XMLPipeDB Project Management: Lessons Learned • Students on the project had varying levels of maturity, knowledge, and skill coming into the project -- some naturally took on a leadership role -- some hung back or did the minimum required to get by • Needed to increase communication and sense of team -- students preferred to interact with faculty for questions, rather than each other -- bug trackers and developer’s forum used only sporadically -- implemented weekly reports on Wiki to increase accountability • [SourceForge servers were frequently down during class]

Slide 20: XMLPipeDB Project Management: Lessons Learned • Students on the project had varying levels of maturity, knowledge, and skill coming into the project -- some naturally took on a leadership role -- some hung back or did the minimum required to get by • Needed to increase communication and sense of team -- students preferred to interact with faculty for questions, rather than each other -- bug trackers and developer’s forum used only sporadically -- implemented weekly reports on Wiki to increase accountability • [SourceForge servers were frequently down during class] • 6 months from conception to product

Slide 21: XMLPipeDB Project Management: Lessons Learned • Students on the project had varying levels of maturity, knowledge, and skill coming into the project -- some naturally took on a leadership role -- some hung back or did the minimum required to get by • Needed to increase communication and sense of team -- students preferred to interact with faculty for questions, rather than each other -- bug trackers and developer’s forum used only sporadically -- implemented weekly reports on Wiki to increase accountability • [SourceForge servers were frequently down during class] • 6 months from conception to product • Even the weakest student contributed useable code

Slide 22: Take-home Messages • Catch them early • Making open source values and software development best practices explicit will produce better software and better computer scientists

Slide 23: LMU Bioinformatics Group http://xmlpipedb.cs.lmu.edu Kam D. Dahlquist http://myweb.lmu.edu/kdahqui XSD-to-DB UniProtDB kdahlquist@lmu.edu Adam Carasso Joe Boyle Jeffrey Nicholas Joey Barrett Scott Spicer John David N. Dionisio http://myweb.lmu.edu/dondi GODB XMLPipeDBUtils dondi@lmu.edu Scott Spicer David Hoffman Roberto Ruiz Babak Naffas Jeffrey Nicholas GenMAPP Builder Ryan Nakamoto Joey Barrett Jeffrey Nicholas Scott Spicer Special Thanks GenMAPP.org Development Group Caskey L. Dickson, Wesley T. Citti NSF CCLI Program (http://recourse.cs.lmu.edu)