Applying a new software development paradigm to biology


May 7-11, 2003: Giddings, M. C. and Long, J. “Applying a New Software Development Paradigm to Biology: Developing applications that handle complexity and stand the test of time”. Poster session presented with Dr. M. C. Giddings, of the University of North Carolina, Chapel Hill, at the Genome Informatics Conference, sponsored by Cold Spring Harbor Laboratory.

Published in: Technology, Business

Applying a new software development paradigm to biology

  1. 1. Cover Page
       Applying a New Software Development Paradigm to Biology
       Authors: M. C. Giddings and Jeffrey G. Long
       Date: May 7, 2003
       Forum: Poster session presented at the Genome Informatics Conference, sponsored by Cold Spring Harbor Laboratory.
       Contents: Page 1: Abstract. Pages 2‐20: Slides (but no text) for presentation.
       License: This work is licensed under the Creative Commons Attribution‐NonCommercial 3.0 Unported License. To view a copy of this license, visit‐nc/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
       Uploaded June 26, 2011
  2. 2. Genome Informatics. Long. Preference: Oral presentation.
       APPLYING A NEW SOFTWARE DEVELOPMENT PARADIGM TO BIOLOGY
       M.C. Giddings, University of North Carolina; J. Long
       Rules are typically hard-coded into software applications, and the maintenance of these rules as they change, due to updated domain knowledge or user requirements, results in a significant time and cost expenditure. Subject experts must communicate the rules they wish to see automated to programmers who often are not experts in the subject matter of the application; much can be lost in the translation. As this process continues through time, software systems become large and unwieldy, such that no one involved in a project can comprehend or manage it as a whole. There have been numerous initiatives directed at solving these problems, but the solutions have been only partially useful because the problems they address are actually secondary and symptomatic rather than primary.
       The premise of Ultra-Structure theory is that these issues can be addressed by removing most rules and all knowledge of the world from software and instead representing them the same way we represent data, i.e. as tables in a relational database. This approach combines key features of the normally disparate areas of management information systems, expert systems, and simulations, borrowing the strengths of each and potentially eliminating some of the known problems of each.
       Ultra-Structure has been applied to a variety of rule-based systems, and we are investigating its utility for biology. In particular, we’ve been building a multi-function prototype that can be used to store, in an integrated and manageable way, laboratory results, simulations, and general biological knowledge pertaining to microbial genomics and proteomics research efforts. Based on results thus far we believe the approach warrants further investigation.
       The presentation is intended to introduce Ultra-Structure theory, discuss the prototype biological system being developed, and generate discussion with our peers about the benefits and pitfalls of this approach.
  3. 3. Applying a New Software Development Paradigm to Biology: Developing applications that handle complexity and stand the test of time
       Morgan Giddings and Jeff Long
       Genome Informatics Conference
  4. 4. Fundamental Hypothesis of Notational Engineering
       Many problems in government, science, business, the arts, and engineering exist solely because of the way we currently represent them. These problems present an apparent “complexity barrier” and cannot be resolved with more computing power or more money. Their resolution requires a new abstraction, which becomes the basis of a notational revolution and solves a whole class of previously-intractable problems.
       May 2003
  5. 5. A New Notational System Often Requires a Change of Paradigm
       • A way of looking at a subject
       • An example, pattern, archetype, or model
       • A set of unconscious assumptions we have about a subject
  6. 6. Current Paradigm Assumption 1
       • Computer applications are defined in terms of algorithms and data
       • Algorithms are the rules which are used to manipulate the data; data and rules are distinct
       • The model for this is the abacus
       • When using computer systems, algorithms are implemented as software
       • But all knowledge should be stored in a formal (executable), “public”, and readily updateable format
  7. 7. Current Paradigm Assumption 2
       • Software can be designed using the same approaches as other engineering fields
         – e.g. civil, electrical, or aeronautical engineering, using the “waterfall” development methodology
         – but it’s not the same: in addition to being complex, software and the requirements it supports are dynamic and change greatly over short periods of time
       • A new design approach is required that can handle both complexity and changing requirements
  8. 8. Current Paradigm Assumption 3
       • Subject experts can communicate their requirements to programmers
         – but their expertise took many years to acquire
         – their own understanding will evolve
       • But subject experts must see working prototypes, not paper representations (e.g. flowcharts, OO diagrams), in order to truly understand what they will be getting
       • Subject experts must be able to directly and continuously update an application’s rules as needed
  9. 9. Ultra-Structure Addresses These Issues
       • Remove 99% of all rules from the software
       • Represent them in a standard If/Then form (multiple ‘Ifs’, multiple ‘Thens’)
       • Represent them as records of data within a very small set of tables
       • Distinction between rules and data largely disappears!
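The idea of holding If/Then rules as records rather than as code can be illustrated with a minimal Python sketch. The record layout, the field names (`ifs`, `thens`), and the example rules are assumptions made for illustration; they are not taken from the authors' prototype.

```python
# Hypothetical rule records: each rule is plain data with several "if"
# conditions and several "then" items. None of this comes from the slides.
RULES = [
    {"ifs": {"molecule": "RNA", "process": "translation"},
     "thens": ["read codons", "emit polypeptide"]},
    {"ifs": {"molecule": "DNA", "process": "transcription"},
     "thens": ["emit RNA"]},
]


def considerations(facts, rules):
    """Return the 'then' items of every rule whose 'if' conditions all hold."""
    matched = []
    for rule in rules:
        if all(facts.get(key) == value for key, value in rule["ifs"].items()):
            matched.extend(rule["thens"])
    return matched


result = considerations({"molecule": "RNA", "process": "translation"}, RULES)
```

The function knows nothing about RNA or DNA. Changing what the program does means editing `RULES`, not `considerations`.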
  10. 10. We Need a More Insightful Way to Look at Complex Systems and Processes
       [Diagram labels: observables (surface structure); generates; rules (middle structure); constrains; groups of rules (deep structure)]
  11. 11. The Ruleform Hypothesis
       Complex system structures are created by not-necessarily complex processes; and these processes are created by the animation of competency rules. Competency rules can be grouped into a small number of classes whose form is prescribed by "ruleforms". While the competency rules of a system change over time, the ruleforms remain constant. A well-designed collection of ruleforms can anticipate all logically possible competency rules that might apply to the system, and constitutes the deep structure of the system.
  12. 12. How are Rules Best Represented?
       • Statement of rules and device for executing them can be different; need not be software for both
       • Rules can be reformulated into a canonical form of “If a and b and c... then consider x and y and z”
       • Thousands or millions of rules can be grouped into 10-50 ruleforms (classes of rules) based on their syntax and semantics
       • These ruleforms can be implemented as tables in a RDBMS and managed easily by standard RDBMS tools; the application essentially becomes an Expert System using a RDBMS
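A ruleform stored as a relational table might look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table name `ruleform`, its column names, and the sample rows are hypothetical; the slides do not give a schema. The columns mirror the canonical form "If a and b and c, then consider x and y and z".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One hypothetical ruleform: three "if" factors and three "consider" columns.
conn.execute("""
    CREATE TABLE ruleform (
        factor_a TEXT, factor_b TEXT, factor_c TEXT,
        consider_x TEXT, consider_y TEXT, consider_z TEXT
    )
""")

# Each row is one rule. These example rules are invented for illustration.
conn.executemany(
    "INSERT INTO ruleform VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("sample", "protein", "received",
         "log arrival", "assign storage", "notify analyst"),
        ("sample", "RNA", "received",
         "log arrival", "assign storage", "schedule sequencing"),
    ],
)


def consider(a, b, c):
    """Look up what to consider for the given factors; returns a list of rows."""
    rows = conn.execute(
        "SELECT consider_x, consider_y, consider_z FROM ruleform "
        "WHERE factor_a = ? AND factor_b = ? AND factor_c = ?",
        (a, b, c),
    ).fetchall()
    return [list(row) for row in rows]


# Adding a rule is an INSERT, not a code change.
conn.execute(
    "INSERT INTO ruleform VALUES (?, ?, ?, ?, ?, ?)",
    ("sample", "RNA", "expired", "log disposal", "free storage slot", None),
)
```

Because rules are rows, ordinary database tools (queries, backups, access control) apply to them in the same way they apply to any other data.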
  13. 13. What is the Design Process?
       • Design proceeds by iterative prototype with monthly feedback from users; small prototypes can easily evolve to any necessary level of complexity
       • Basic design process is to:
         – define what exists (existential rules)
         – define relations between these (network & authorization rules)
         – define processes (protocol & meta-protocol rules)
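The three design steps can be sketched in Python as three small collections of records. This is only one possible reading of the slide: the tuple layouts, names, and example entries are assumptions, and authorization and meta-protocol rules are left out for brevity.

```python
# Step 1: existential rules say what exists (hypothetical entries).
EXISTENTIAL = {"RNA", "ribosome", "polypeptide"}

# Step 2: network rules relate things that exist: (subject, relation, object).
NETWORK = [
    ("ribosome", "reads", "RNA"),
    ("ribosome", "produces", "polypeptide"),
]

# Step 3: protocol rules describe a process as ordered steps:
# (process, step number, subject, relation, object).
PROTOCOL = [
    ("translation", 2, "ribosome", "produces", "polypeptide"),
    ("translation", 1, "ribosome", "reads", "RNA"),
]


def undefined_names(existential, network):
    """Names used by network rules that no existential rule defines."""
    used = {name for subject, _, obj in network for name in (subject, obj)}
    return sorted(used - existential)


def steps(process, protocol):
    """Steps of one process, ordered by step number."""
    rows = sorted(row for row in protocol if row[0] == process)
    return [f"{subject} {relation} {obj}" for _, _, subject, relation, obj in rows]
```

Checking that later layers only mention names defined by earlier ones is one way the ordering of the steps (what exists, then relations, then processes) could be enforced.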
  14. 14. Ultra-Structure Benefits
       • Software size is reduced by 2+ orders of magnitude
         – simpler to create, manage, understand, test, document, and teach
         – remaining software has no knowledge of the world; it provides basic control logic that knows what tables to check in what order, how to resolve conflicts, etc.
       • The development team is very small (e.g. <10 people) and is therefore much more manageable than a large team of dozens or hundreds of developers, and it does a better job by any metric
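The "basic control logic" described above could take a shape like the following Python sketch: a loop that checks rule tables in a fixed order and resolves conflicts between matching rules. The wildcard convention, the "most specific rule wins" policy, and every table and rule shown are assumptions for illustration; the slides do not specify how conflicts are resolved.

```python
ANY = "*"  # wildcard value; an assumption of this sketch

# Hypothetical rule tables. Each rule is (entity kind, event, action).
TABLES = {
    "safety": [
        (ANY, "spill", "stop work"),
    ],
    "handling": [
        (ANY, "received", "store on shelf"),
        ("RNA", "received", "store in freezer"),
    ],
}

# The order in which tables are consulted is itself just data.
CHECK_ORDER = ["safety", "handling"]


def specificity(rule):
    """Count the non-wildcard condition fields of a rule."""
    return sum(1 for field in rule[:2] if field != ANY)


def decide(kind, event):
    """Return the action of the most specific rule in the first matching table."""
    for table in CHECK_ORDER:
        matches = [
            rule for rule in TABLES[table]
            if rule[0] in (ANY, kind) and rule[1] in (ANY, event)
        ]
        if matches:
            return max(matches, key=specificity)[2]
    return None
```

Here `decide` contains no domain knowledge: it would work unchanged if the tables described payroll instead of lab samples.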
  15. 15. Ultra-Structure Benefits (cont’d)
       • Most knowledge is externalized and is in a form anyone can see and understand
       • Subject experts can enter, change, and otherwise manage rules (knowledge) directly, without going to programmers for assistance
       • Knowledge is actionable not only by subject experts (e.g. as an encyclopedia) but also by the computer, for reasoning, simulations, decision support, etc.
  16. 16. Ultra-Structure Benefits (cont’d)
       • Programmers do not need to know or understand all rules, just enough to determine the classes of rules and the proper animation procedures
       • Serious prototyping becomes feasible; communication with users improves
       • Testing & QA can be far more rigorous
       • Documentation can be more complete
  17. 17. Early Prototype of Biology Model
       • An integrated prototype has been developed to:
         – simulate simple RNA->polypeptide process
         – store and analyze laboratory results
         – store general biological and chemical knowledge
         – compare simulated and actual lab results
         – track sources of knowledge
       • Key conceptual components of model include:
         – BioEntities (chemical elements and compounds, biological objects such as amino acids and RNA, lab techs)
         – BioEvents (activities engaged in by BioEntities)
         – resources (people, books, lab equipment that provided information used in model)
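As a rough illustration of simulating an RNA->polypeptide step from rules held as data, and then comparing the simulated result with a lab result, consider this Python sketch. It is not the authors' prototype: the function names, the tiny codon subset, and the "observed" sequence are invented for the example. The codon assignments themselves are from the standard genetic code.

```python
from itertools import zip_longest

# A small subset of the standard genetic code, held as data rather than code.
CODON_RULES = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "UAA": "STOP",
}


def translate(rna, codon_rules):
    """Read the RNA three bases at a time until a stop codon or the end.

    Raises KeyError for a codon that has no rule in codon_rules.
    """
    peptide = []
    for i in range(0, len(rna) - 2, 3):
        residue = codon_rules[rna[i:i + 3]]
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


def differences(simulated, observed):
    """Positions where simulated and observed residues disagree."""
    pairs = zip_longest(simulated, observed)
    return [i for i, (s, o) in enumerate(pairs) if s != o]


simulated = translate("AUGUUUGGCUAA", CODON_RULES)
observed = ["Met", "Phe", "Ala"]  # made-up lab result for illustration
mismatches = differences(simulated, observed)
```

In this sketch the codon table plays the role of stored biological knowledge, and `differences` stands in for the comparison of simulated and actual lab results listed on the slide.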
  18. 18. Examples of BioEntities
  19. 19. Possible Relations between BioEntities and/or BioEvents
  20. 20. Hopefully, this model can be generalized (The CoRE Hypothesis)
       We can create “Competency Rule Engines”, or CoREs, consisting of <50 ruleforms, that are sufficient to represent all rules found among systems sharing broad family resemblances, e.g. all corporations. Their definitive deep structure will be permanent, unchanging, and robust for all members of the family, whose differences in manifest structures and behaviors will be represented entirely as differences in competency rules. The animation procedures for each engine will be relatively simple compared to current applications, requiring less than 100,000 lines of code in a third generation language.
  21. 21. References
       • Long, J., and Denning, D., “Ultra-Structure: A design theory for complex systems and processes.” In Communications of the ACM (January 1995)
       • Long, J., “A new notation for representing business and other rules.” In Long, J. (guest editor), Semiotica Special Issue on Notational Engineering, Volume 125-1/3 (1999)
       • Long, J., “How could the notation be the limitation?” In Long, J. (guest editor), Semiotica Special Issue on Notational Engineering, Volume 125-1/3 (1999)
       • Long, J., “Automated Identification of Sensitive Information in Documents Using Ultra-Structure.” In Proceedings of the 20th Annual ASEM Conference, American Society for Engineering Management (October 1999)