Associative methods in Systems Biology

Hugh Shanahan

Department of Computer Science
Royal Holloway, University of London

September 22, 2009

Outline

1   Outline

2   Gene Ontologies
      Over-representation
      Semantic similarity

3   Associative Measures
      Hypotheses
      Linear Correlation
      Partial Correlation
      Non-linear measures

4   Validation
      DREAM

Gene Ontologies

Before finding interactions, we need to be able

    to systematically annotate all genes,
    to determine which functional groupings are over-represented,
    to measure objectively the “functional similarity” of two genes.

The Gene Ontology (GO) is a means to do this.

Ontologies

    An abstract method for expressing structured data.
    The annotation of any gene can be expressed in terms of increasingly specific descriptions, e.g.
        This gene is involved in transport.
        This gene is involved in vesicle-mediated transport.
        This gene is involved in vesicle fusion.
    Genes may not have a specific annotation, so the description can stop higher up in this hierarchy.

More complexity required

    Annotation is not a simple chain.
    A single gene can have a very specific annotation which descends from two (or more) more general descriptions.
    There are different types of annotation as well: location, biochemistry, the part of the organism the gene is expressed in, and so on.
    An ontology is a Directed Acyclic Graph (DAG), not a tree (a distinction that means a lot to graph theorists).
    Each node in the DAG is an annotation term.
    Each “child” node can have more than one “parent” node.

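The difference between a chain and a DAG can be made concrete with a small sketch. The term names below are borrowed from the biological process ontology, but the edge set is an assumption made for illustration: it is reduced to six terms, relationship types are ignored, and the second-level terms are linked directly to the root although the real ontology has intermediate terms. The names `PARENTS` and `ancestors` are hypothetical.

```python
# Toy fragment of an ontology stored as child -> list of parents.
# Assumption: a hand-picked subset of terms, not the real GO edge list.
PARENTS = {
    "biological process": [],
    "transport": ["biological process"],
    "membrane organization and biogenesis": ["biological process"],
    "vesicle-mediated transport": ["transport"],
    "membrane fusion": ["membrane organization and biogenesis"],
    # A node with two parents: this is what makes the graph a DAG, not a tree.
    "vesicle fusion": ["vesicle-mediated transport", "membrane fusion"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("vesicle fusion")):
        print(name)
```

Because "vesicle fusion" has two parents, its ancestor set contains both the transport branch and the membrane branch, which a simple tree could not represent.
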
GO’s visualised

[Figure with panels a, b and c. Panel c shows a fragment of the biological process ontology, with specificity (or granularity) increasing from top to bottom:
    Biological process (root) ... Transport; Vesicle-mediated transport is_a Transport; Vesicle fusion part_of Vesicle-mediated transport.
    Biological process (root) ... Membrane organization and biogenesis; Membrane fusion is_a Membrane organization and biogenesis; Vesicle fusion is_a Membrane fusion.]

Figure 1 | Simple trees versus directed acyclic graphs. Boxes represent nodes and arrows represent edges. a | An example of a simple tree, in which each child has only one parent and the edges are directed, that is, there is a source (parent) and a destination (child) for each edge. b | A directed acyclic graph (DAG), in which each child can have one or more parents. The node with multiple parents is coloured red and the additional edge is coloured grey. c | An example of a node, vesicle fusion, in the biological process ontology with multiple parentage. The dashed edges indicate that there are other nodes not shown between the nodes and the root node (biological process). A root is a node with no incoming edges, and at least one leaf (also called a sink). A leaf node is a node with no outgoing edges, that is, a terminal node with no children (vesicle fusion). Similar to a simple tree, a DAG has directed edges and does not have cycles, that is, no path starts and ends at the same node, and will always have at least one root node. The depth of a node is the length of the longest path from the root to that node, whereas the height is the length of the longest path from that node to a leaf (ref. 41 of Rhee et al.). is_a and part_of are types of relationships that link the terms in the GO ontology. More information about the relationships between GO terms are found online (An Introduction to the Gene Ontology).

Rhee et al., Nature Reviews Genetics (2008)

GO’s visualised

http://amigo.geneontology.org/

Different types of Annotation

    Typically, three distinct ontologies are used (overwhelmingly so):
        Cellular Component
        Biological Process
        Molecular Function
    Many other ontologies have been constructed, e.g. Plant Organ for Arabidopsis.

Caveat

The annotation of most genes (90%) has been carried out computationally. These annotations usually work well, though they tend to be less accurate than those obtained by direct assay.

All annotated genes have an evidence code (IED) associated with them to indicate how much we can rely on the annotation.

Increasingly, co-expression is being used as a means to annotate genes, so one should take care that this information has not been used in constructing the annotations one relies on!

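One way to act on this caveat is to filter annotations by evidence code before using them. The sketch below is a minimal illustration under stated assumptions: the codes IDA (inferred from direct assay), IEA (inferred from electronic annotation) and IEP (inferred from expression pattern) are standard GO evidence codes, but the gene names, the annotations and the choice of which codes to exclude are invented for the example. The names `Annotation`, `EXCLUDED` and `trusted` are hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene: str       # hypothetical gene identifier
    term: str       # GO term name
    evidence: str   # GO evidence code


# Assumption: exclude purely computational (IEA) and expression-derived (IEP)
# annotations; which codes to drop depends on the analysis at hand.
EXCLUDED = frozenset({"IEA", "IEP"})


def trusted(annotations, excluded=EXCLUDED):
    """Keep only annotations whose evidence code is not in `excluded`."""
    return [a for a in annotations if a.evidence not in excluded]


if __name__ == "__main__":
    # Invented records, for illustration only.
    records = [
        Annotation("geneA", "vesicle fusion", "IDA"),
        Annotation("geneB", "transport", "IEA"),
        Annotation("geneC", "membrane fusion", "IEP"),
    ]
    for a in trusted(records):
        print(a.gene, a.term, a.evidence)
```

Dropping expression-derived annotations matters most when the annotations are later compared against interactions that were themselves inferred from expression data.
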
Over-representation

One of the most useful questions to ask when analysing microarray data is which functional groupings occur more often than one would expect by chance.

    Notation
        N: number of genes in the genome.
        n: number of genes which have been found to be differentially expressed.
        m: number of genes in the genome with a specific annotation.
        k: number of differentially expressed genes with the same annotation.

Probabilities

One can derive the probability $P_k$ that k genes with a given annotation would be found by chance amongst the n differentially expressed genes, using the hypergeometric probability distribution and the above parameters.

For the record,

\[
P_k = \frac{{}^{m}C_{k}\;{}^{N-m}C_{n-k}}{{}^{N}C_{n}}, \tag{1}
\]

\[
{}^{N}C_{n} = \frac{N!}{(N-n)!\,n!}. \tag{2}
\]

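The hypergeometric probability can be evaluated directly with exact integer binomial coefficients. In the sketch below, N is the number of genes in the genome, m the number carrying a given annotation, n the number of differentially expressed genes and k the number of those that carry the annotation. The function names are hypothetical, and the upper-tail sum in `enrichment_p_value` is an assumption about how the single probability would typically be turned into an over-representation p-value; the slides themselves give only the probability of exactly k genes.

```python
from math import comb


def hypergeom_pmf(k, N, m, n):
    """Probability of exactly k annotated genes among n genes drawn from N,
    of which m carry the annotation."""
    # Outside this range the event is impossible.
    if k < max(0, n - (N - m)) or k > min(m, n):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def enrichment_p_value(k, N, m, n):
    """Probability of seeing k or more annotated genes by chance (upper tail)."""
    return sum(hypergeom_pmf(i, N, m, n) for i in range(k, min(m, n) + 1))


if __name__ == "__main__":
    # Toy numbers: 10 genes, 4 annotated, 3 differentially expressed, 2 of them annotated.
    print(hypergeom_pmf(2, N=10, m=4, n=3))
    print(enrichment_p_value(2, N=10, m=4, n=3))
```

Python integers have arbitrary precision, so the binomial coefficients stay exact for genome-sized N and only the final ratio is rounded to a float.
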
Difficulties

    There are thousands of possible GO terms, and one should adjust the probabilities to account for multiple hypothesis testing.
    Applying a Bonferroni correction over all GO terms makes a p-value of $10^{-7}$ equivalent to 1% significance.
    Because of the structure of the GO terms, these probabilities are highly correlated with each other.
    In all these cases, focussing on as short a list of possible biological processes as possible will minimise the above difficulties.

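A Bonferroni correction divides the desired significance level by the number of hypotheses tested, or equivalently multiplies each p-value by that number. The sketch below shows the arithmetic; the function names are hypothetical, and the count of 100,000 tests is an assumption chosen only because it reproduces a threshold of $10^{-7}$ at 1% significance, not a count of actual GO terms.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


if __name__ == "__main__":
    # Assumed number of tests, for illustration only.
    print(bonferroni_threshold(0.01, 100_000))
    # A short list of candidate terms gives a far less severe correction.
    print(bonferroni_threshold(0.01, 20))
```

The second call illustrates why restricting attention to a short list of candidate processes helps: the threshold relaxes in direct proportion to the number of hypotheses removed.
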
What genes match

When benchmarking methods that infer interactions between gene products, we expect genes that interact to have similar GO terms, though perhaps not exactly the same ones.

Semantic similarity is a means to measure how similar the annotations of two genes are (0 meaning no similarity, 1 meaning total similarity).

GO provides us with a means to do this in a quantitative fashion.

Simple implementation

Determine the ratio of the number of nodes two genes share with the total number of nodes they have between them.

\[ \mathrm{GOsim}_{UI} = \frac{\#\{N(G_1) \cap N(G_2)\}}{\#\{N(G_1) \cup N(G_2)\}} \tag{3} \]

$N(G_1)$ being the set of nodes associated with $G_1$'s annotation.

More elaborate methods are available.

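As an illustration of equation (3), here is a minimal Python sketch. It assumes the set of nodes associated with each gene's annotation has already been collected; the function name `gosim_ui` and the placeholder node labels are hypothetical, chosen only for illustration.

```python
def gosim_ui(nodes_g1, nodes_g2):
    """Ratio of shared GO nodes to all GO nodes of two genes (equation 3)."""
    nodes_g1, nodes_g2 = set(nodes_g1), set(nodes_g2)
    union = nodes_g1 | nodes_g2
    if not union:
        # Neither gene has any annotation; returning 0 here is an assumption.
        return 0.0
    return len(nodes_g1 & nodes_g2) / len(union)


# Placeholder node labels (not real GO identifiers), used only to exercise the function.
n_g1 = {"node_a", "node_b", "node_c"}
n_g2 = {"node_b", "node_c", "node_d", "node_e"}

# Two shared nodes out of five nodes in total.
score = gosim_ui(n_g1, n_g2)
print(score)
```

Identical annotations give 1 and disjoint annotations give 0, matching the range quoted for semantic similarity.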
Motivation

    Yesterday we encountered clustering.
    Hypothesis 1 (weak) :- coexpression implies involvement in the same process.
    Expand to many different experiments.
    Hypothesis 2 (strong) :- the greater the level of association, the greater the chance of interaction.
    Hypothesis 2 is often referred to as “guilt by association”.
    Association may tell us about interactions between gene products. It does not tell us anything about the regulation mechanism.

[Figure]
http://www.arabidopsis.leeds.ac.uk/act/index.php

266841_at AT2G26150
heat shock transcription factor family protein; contains Pfam profile PF00447 (HSF-type DNA-binding domain)
260978_at AT1G53540
17.6 kDa class I small heat shock protein

What do we mean by association?

Knowing something about the expression level of one gene (over many different experiments) means we know something about the expression level of the other.
Replotting the above:
[Figure]

Linear Correlation
coexpression

    Simplest form of association.
    Assume that there is a linear relationship between genes.
    Formally :-

    \[ y_1 = a_{12} + c_{12} y_2 + \eta_{12}, \tag{4} \]

        $y_1$, $y_2$ are (log) expression levels.
        $\eta_{12}$ is a noise term.
        $a_{12}$, $c_{12}$ are parameters to be determined.

    But we’re not interested in that!
    We are interested in asking how good a model this is for this pair of genes.

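To make equation (4) concrete, the sketch below fits a12 and c12 and then reports the fraction of the variance of y1 that the line explains, which is one way of asking how good the model is. Ordinary least squares is an assumption here (the slide does not say how the parameters would be determined), and the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(y1, y2):
    """Least-squares fit of y1 = a12 + c12 * y2.

    Returns (a12, c12, r_squared). Assumes y1 and y2 have the same length
    and that neither is constant (otherwise a division by zero occurs).
    """
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    s12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2))
    s11 = sum((u - m1) ** 2 for u in y1)
    s22 = sum((v - m2) ** 2 for v in y2)
    c12 = s12 / s22
    a12 = m1 - c12 * m2
    # Residual sum of squares: what the noise term has to absorb.
    rss = sum((u - (a12 + c12 * v)) ** 2 for u, v in zip(y1, y2))
    r_squared = 1.0 - rss / s11
    return a12, c12, r_squared


# Made-up (log) expression levels over five experiments.
y2 = [0.0, 1.0, 2.0, 3.0, 4.0]
y1 = [1.0 + 2.0 * v for v in y2]  # exactly linear, no noise

a12, c12, r2 = fit_line(y1, y2)
print(a12, c12, r2)
```

With noisy data the fitted a12 and c12 still exist, but the explained fraction drops below 1; it is that drop, not the parameter values, that measures the association.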
Covariance

Can estimate how good the linear model is by computing

\[ E\big((y_1 - \bar{y}_1)(y_2 - \bar{y}_2)\big), \]

where $\bar{y}_1 = E(y_1)$ and $\bar{y}_2 = E(y_2)$ are the means of $y_1$ and $y_2$.

    E means the expectation value of the above (think of it for now as taking the average over all the points in the previous figure).
    Can prove to oneself (exercise) that, for given variances of y1 and y2, the magnitude of the covariance is largest when y1 can be perfectly expressed as a linear function of y2.
    The covariance is zero when there is no relationship at all between y1 and y2.

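The expectation in the covariance can be estimated by averaging over the experiments. The sketch below does that and checks the exercise numerically on a small made-up data set: for an exact linear relationship the magnitude of the covariance reaches its upper bound, the square root of the product of the two variances. The function names and data are illustrative assumptions, and the plain average (divide by the number of points) is used rather than the n - 1 form.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(y1, y2):
    """Average of (y1 - mean(y1)) * (y2 - mean(y2)) over all points."""
    m1, m2 = mean(y1), mean(y2)
    return mean([(u - m1) * (v - m2) for u, v in zip(y1, y2)])


def covariance_bound(y1, y2):
    """Largest possible |covariance| for the variances of y1 and y2."""
    return math.sqrt(covariance(y1, y1) * covariance(y2, y2))


# Made-up values, for illustration only.
y2 = [-1.0, 0.0, 1.0, 2.0]
linear = [0.5 - 2.0 * v for v in y2]   # y1 is an exact linear function of y2
no_trend = [1.0, -1.0, -1.0, 1.0]      # values with no linear trend against y2

cov_linear = covariance(linear, y2)
cov_no_trend = covariance(no_trend, y2)
print(cov_linear, covariance_bound(linear, y2), cov_no_trend)
```

The sign of the covariance follows the sign of the slope, so it is the magnitude that is compared with the bound.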
[Figure: two scatter plots with y2 on the vertical axis, ticks from −1 to 2]
                q                                                                                                                                                     q

            q                                                                                                                                                                                                                                                       q
                                                                                                                                                                                                                                                                                             Hypotheses
          0.6           0.8                1.0                   1.2                     1.4             1.6           1.8                                   −1               0                                1                                    2                                3       Linear Correlation
                                                                     y1                                                                                                                                            y1
                                                                                                                                                                                                                                                                                             Partial Correlation
                                                                                                                                                                                                                                                                                             Non-linear measures

Maximum covariance                                                                                                                         Zero covariance                                                                                                                                   Validation
                                                                                                                                                                                                                                                                                             DREAM




                                                                                                                                 Hugh Shanahan                    Associative methods in Systems Biology
Correlation

We want to compare every possible pair of genes, so the covariance is not very practical: the maximum covariance varies from one pair of genes to the next. However,

    \rho_{12} = \frac{E((y_1 - \bar{y}_1)(y_2 - \bar{y}_2))}{\sqrt{E((y_1 - \bar{y}_1)^2) \, E((y_2 - \bar{y}_2)^2)}},    (5)

is bounded: −1 ≤ ρ12 ≤ 1.
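
As a minimal sketch of comparing every pair of genes with the correlation in (5), the following uses only the Python standard library. The gene names and expression values are invented for illustration, and `pearson` is a hypothetical helper, not something from the original slides.

```python
import math
from itertools import combinations


def pearson(y1, y2):
    """Correlation of two equal-length profiles: covariance divided by
    the product of the standard deviations.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    var1 = sum((a - m1) ** 2 for a in y1) / n
    var2 = sum((b - m2) ** 2 for b in y2) / n
    return cov / math.sqrt(var1 * var2)


# Hypothetical expression profiles: one list of measurements per gene.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # geneA scaled by 2
    "geneC": [4.0, 3.0, 2.0, 1.0],  # geneA reversed
}

# Correlation for every unordered pair of genes.
pairwise = {
    (g1, g2): pearson(expression[g1], expression[g2])
    for g1, g2 in combinations(sorted(expression), 2)
}

for pair, rho in pairwise.items():
    print(pair, round(rho, 3))
```

In this toy data the covariance of geneA with geneB differs in size from that of geneA with geneC, while the correlations sit at the two ends of the bounded range, which is why the normalised quantity is easier to compare across pairs.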

How well does it work?

    There are a number of examples of improved functional annotation.

    An unannotated gene that is highly correlated with a gene in a known response is likely to be in the same response.
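
The annotation argument above can be sketched as a simple lookup: correlate an unannotated profile against annotated ones and borrow the response of the best match. Everything here is an assumption made for illustration: the gene names, the response labels, the expression values, the `suggest_response` helper and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(y1, y2):
    """Correlation of two equal-length, non-constant profiles."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    ss1 = sum((a - m1) ** 2 for a in y1)
    ss2 = sum((b - m2) ** 2 for b in y2)
    return cov / math.sqrt(ss1 * ss2)


# Hypothetical annotated genes: name -> (known response, expression profile).
annotated = {
    "heat1": ("heat shock", [0.1, 0.9, 2.1, 3.0, 3.9]),
    "cold1": ("cold stress", [3.0, 2.2, 0.8, 1.1, 0.2]),
}

# Hypothetical profile of a gene with no annotation.
unannotated_profile = [0.0, 1.1, 1.9, 3.2, 4.1]


def suggest_response(profile, annotated, threshold=0.9):
    """Return the response of the most correlated annotated gene,
    or None if no correlation reaches the (arbitrary) threshold."""
    best_gene, best_r = None, threshold
    for gene, (_response, reference) in annotated.items():
        r = pearson(profile, reference)
        if r >= best_r:
            best_gene, best_r = gene, r
    return None if best_gene is None else annotated[best_gene][0]


suggestion = suggest_response(unannotated_profile, annotated)
print(suggestion)
```

A suggestion produced this way is only a candidate annotation; the threshold controls how readily a response is proposed and would need to be chosen for the data at hand.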

Difficulty: genes correlate with many other genes, not just a few.

Why?

Suggestion: correlations do not distinguish between potential direct interactions and indirect interactions between gene products.

Example

[Figure: interaction network with genes A, B, C, D, E and F, and a label "Other interactions".]

   B directly interacts with three other genes, but could be highly correlated with others.

   C and D would be highly correlated with each other even though they are not directly interacting.

          Artificial example: create randomised data to represent the expression of B.

          Generate two other data sets (C and D) that are, by construction, highly correlated with the original data set, but are not constructed to have a relationship with each other.
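
One way such an artificial data set could be built is sketched below. The slides do not say how the data were generated, so the Gaussian model, the sample size, the noise level and the seed are all assumptions chosen for illustration.

```python
import math
import random


def pearson(y1, y2):
    """Correlation of two equal-length, non-constant profiles."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    ss1 = sum((a - m1) ** 2 for a in y1)
    ss2 = sum((b - m2) ** 2 for b in y2)
    return cov / math.sqrt(ss1 * ss2)


rng = random.Random(0)  # fixed seed so the sketch is repeatable (arbitrary)
n_samples = 200         # assumed number of measurements
noise_sd = 0.3          # assumed noise level relative to unit variance of B

# Randomised "expression" of B.
b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# C and D are each B plus their own independent noise. Neither is
# generated from the other; they are related only through B.
c = [x + rng.gauss(0.0, noise_sd) for x in b]
d = [x + rng.gauss(0.0, noise_sd) for x in b]

r_bc = pearson(b, c)
r_bd = pearson(b, d)
r_cd = pearson(c, d)

print("corr(B, C) =", round(r_bc, 3))
print("corr(B, D) =", round(r_bd, 3))
print("corr(C, D) =", round(r_cd, 3))
```

Under these assumptions C and D should come out strongly correlated with each other even though they share nothing except their dependence on B, which is the indirect association the example is meant to illustrate.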

[Figure: scatter plots of the generated data in three panels; axis labels not recoverable.]
                                                                                                                                                                                                                                                                                                  q
                                                                                                                                                                                                                                                                                         q qq qqqq qqq q
                                                                                                                                                                                                                                                                                                 qq q q                         Associative
    1




                                                                qqqqqqq q q
                                                                 qqq qq q
                                                                  q qqqq q                                                                                                  qq qq qqq qq
                                                                                                                                                                                  q
                                                                                                                                                                             qqq qqq q                                                                                                q qqqq qqqq qqq q
                                                                                                                                                                                                                                                                                         q q
                                                                      qq
                                                                 qqqqq q                                                                                                  q qqqqqq q
                                                                                                                                                                           qqqqqq q                                                                                                  q q qqqq qqqq




                                                                                                                                                                                                                     1
                                                                                                           1




                                                                                                                                                                               qqq                                                                                                          q
                                                                                                                                                                                                                                                                                       q qq q q
                                                                q qqqq q
                                                              qqqqqqqq q q
                                                                     q
                                                                q qqq qq
                                                                qqqqqqq
                                                                   qq
                                                                   qqq                                                                                             q    qq q qqqqq
                                                                                                                                                                                q
                                                                                                                                                                         qqqqqqqqq
                                                                                                                                                                           qq qq qq
                                                                                                                                                                             q qq                                                                                               q           qq q
                                                                                                                                                                                                                                                                                            qq q
                                                                                                                                                                                                                                                                                      q q qqq qq
                                                                                                                                                                                                                                                                                            q q
                                                                                                                                                                                                                                                                                      q q qqq qq qq q
                                                                                                                                                                                                                                                                                      qqq q qqqq q
                                                                qq q
                                                              q qq qqq
                                                                    q                                                                                                       q qqq
                                                                                                                                                                        q qqqqq q
                                                                                                                                                                      qqqqqq q                                                                                                         qq qqqqq q
                                                              q qqq
                                                              qqqqq q q
                                                             qqqqqqq                                                                                                    q qqq
                                                                                                                                                                         q qq
                                                                                                                                                                         q q
                                                                                                                                                                     q q qqqq q
                                                                                                                                                                 qqqqqqqqqqq
                                                                                                                                                                                                                                                                                   q qq qq q q q
                                                                                                                                                                                                                                                                                         q q
                                                                                                                                                                                                                                                                                         q q
                                                                                                                                                                                                                                                                                    q qqq qqq q q
                                                                                                                                                                                                                                                                                         q
                                                              q qqqq
                                                             qqqqqqq q
                                                             qqqq q
                                                                qq
                                                                                                                                                                         q
                                                                                                                                                                      q qqqqqq
                                                                                                                                                                         q
                                                                                                                                                                        qq q q                                                                                                   qq qq qqqq qq q
                                                                                                                                                                                                                                                                             q q qqqqqq q qqq
                                                                                                                                                                                                                                                                                      q q qqq
                                                                                                                                                                                                                                                                                 q qqqqqqq q
                                                            qqqqq q                                                                                                     qq
                                                                                                                                                                q qqqqqqq q
                                                                                                                                                                               q                                                                                                 q qqqqqqqq q
                                                                                                                                                                                                                                                                                        q q

                                                                                                                                                                                                                                                                                                                                Measures
                                                                 q
                                                         q qq qqq q q                                                                                                qqqqqq q
                                                         q q qq qq
                                                      q q qqqqq q
                                                               q q
                                                         q qqq q
                                                                                                                                                                      qqq q
                                                                                                                                                                q qqqqq q
                                                                                                                                                                     qqq q q
                                                                                                                                                                         q                                                                                                            qq
                                                                                                                                                                                                                                                                                    q q q qq
                                                                                                                                                                                                                                                                                q qqq qqqq q q
                                                      q qqqqqqqq q q
                                                      q qqqqqqqq
                                                             qq q
                                                       qq qqq qq
                                                                                                                                                                       q
                                                                                                                                                                    qqqqq
                                                                                                                                                                q qqqqqq q
                                                                                                                                                                       qqq
                                                                                                                                                                qqqqqqqq q
                                                                                                                                                                     qqq q
                                                                                                                                                                       qq                                                                                                  qqq qqqqqqqqqq
                                                                                                                                                                                                                                                                                    qq qq
                                                                                                                                                                                                                                                                                     q q
                                                                                                                                                                                                                                                                                 q qqqqq
                                                        qq qqq
                                                             qq
                                                     q qq qqq q
                                                        q qqq
                                                    q qqq qqqq q                                                                                                 q qqq
                                                                                                                                                               q qqqq qq
                                                                                                                                                                qqqqq q
                                                                                                                                                                     q
                                                                                                                                                            q q qq qqqqq                                                                                                         qqqq q
                                                                                                                                                                                                                                                                                q qqqqq qq q
                                                                                                                                                                                                                                                                          q q qq qqqqq q qq
                                                                                                                                                                                                                                                                                    qq
                                                     q qqq q q
                                                          q
                                                    q qqqq q
                                                   qqqqqq qq q
                                                     qq q q q                                                                                                   q qq q
                                                                                                                                                             qqqqqqq
                                                                                                                                                               qq qqqq                                                                                                     qq q q qq qqq q
                                                                                                                                                                                                                                                                                   qqq
                                                                                                                                                                                                                                                                             q q qqqqqq
                                                                                                                                                                                                                                                                              qq q q q
                                                                                                                                                                                                                                                                                    qq
                                                                                                                                                                                                                                                                                                q
                                                  qqqqqq qq q
                                                     q qq
                                             q qqqqqqq qq qq                                                                                                     q qq
                                                                                                                                                                 q q
                                                                                                                                                            q q qqq                                                                                                    q qqqqqqqqqq q
                                                     q q
                                                      q q q
                                                    qq qq q                                                                                            q qqqqqqqq qq
                                                                                                                                                           q qqq q
                                                                                                                                                          qqqqqqqq q
                                                                                                                                                            qqq q                                                                                                           q q q qqq
                                                                                                                                                                                                                                                                         qqq qqqqqq
                                                                                                                                                                                                                                                                        qqqqqqqqqq qq q
                                                                                                                                                                                                                                                                               q
    0




                                               qq qqqqqq q
                                                   qqqqq q
                                                     q q
                                               q qqqqqq q
                                                     q q                                                                                                   q qqqqq q
                                                                                                                                                           q qq q q
                                                                                                                                                                 q
                                                                                                                                                         qq q qq q
                                                                                                                                                        qqqqqqqq q                                                                                                           q qq qq
                                                                                                                                                                                                                                                                           qq q q qqq
                                                                                                                                                                                                                                                                          qqq qq q
                                                                                                                                                          qqq q q                                                                                                         qqq qq q
                                                                                                                                                                                                                                                                                 q




                                                                                                                                                                                                                     0
                                                                                                           0




                                                                                                                                                                                                                                                                   q
                                              q qqqqqqqq
                                                   qq q q
                                                 qqqqq q
                                                 qqqqq
                                              qqqqqqqqq q
                                               qqqqqq q
                                                  qq
                                                                                                                                                          qqqqq qq
                                                                                                                                                            q qq
                                                                                                                                                         qqqqq q
                                                                                                                                                    q qqqqq qq
                                                                                                                                                               qq
[Figure: three scatter plots, axes running from about −4 to 3]

    C against B:   ρ = 0.98
    D against B:   ρ = 0.98
    D against C:   ρ = 0.96 (!!!)


                                                                                                                     Hugh Shanahan                                                                  Associative methods in Systems Biology
Extending correlations :- partial correlations

    Correlations only take pairs of genes into consideration.

    Partial correlations extend the initial pairwise regression model introduced in equation 4.

    \[ y_1 = a_1 + b_{12} y_2 + b_{13} y_3 + \cdots + b_{1p} y_p + \eta_1 . \tag{6} \]

    Again, we are not interested in solving this explicitly.

    We want to understand the correlation that each one of the genes y_2 . . . y_p has with y_1 once we have removed the effect of all the other genes.

    We will use the notation PC_ij to refer to this partial correlation.

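    One standard way to make "removing the effect of all the other genes" concrete is to regress both genes of interest on the remaining genes and correlate what is left over. The sketch below is only an illustration on simulated data: the driver gene D, the noise level and the function name `partial_corr_by_residuals` are all assumptions, not part of the lecture.

```python
import numpy as np

def partial_corr_by_residuals(Y, i, j):
    """Partial correlation of columns i and j of Y (n samples x p genes).

    Both columns are regressed on all remaining columns (plus an intercept)
    and the correlation of the two residual vectors is returned.
    """
    n, p = Y.shape
    others = [k for k in range(p) if k not in (i, j)]
    # Design matrix: intercept plus the remaining genes.
    X = np.column_stack([np.ones(n), Y[:, others]])
    # Residuals are what is left once the other genes are accounted for.
    res_i = Y[:, i] - X @ np.linalg.lstsq(X, Y[:, i], rcond=None)[0]
    res_j = Y[:, j] - X @ np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]

# Simulated (hypothetical) data: B and C are both driven by D and nothing else.
rng = np.random.default_rng(0)
n = 500
d = rng.normal(size=n)
b = d + 0.2 * rng.normal(size=n)
c = d + 0.2 * rng.normal(size=n)
Y = np.column_stack([b, c, d])

plain_bc = np.corrcoef(b, c)[0, 1]               # ordinary correlation of B and C
partial_bc = partial_corr_by_residuals(Y, 0, 1)  # correlation once D is removed
print(f"correlation(B, C) = {plain_bc:.2f}")
print(f"partial correlation(B, C | D) = {partial_bc:.2f}")
```

    B and C look strongly correlated, but almost nothing remains once D is accounted for, which is the behaviour we want a partial correlation to capture.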
Derivation

    Computed easily once you have all the correlations between all the genes.

    \[ R = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots \\ \rho_{12} & 1 & \rho_{23} & \cdots \\ \rho_{13} & \rho_{23} & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} , \tag{7} \]

    Covariance matrix C is defined similarly.

    ρ_ij is the correlation between gene i and gene j.

    \[ PC_{ij} = - \frac{R^{-1}_{ij}}{\sqrt{R^{-1}_{ii} \, R^{-1}_{jj}}} \tag{8} \]

Questions

    ρ_ij = ρ_ji - why ?

    Diagonals are 1 - why ?

    Exercise :- compute PC using the covariance matrix.

Artificial example

    \[ R_{BCD} = \begin{pmatrix} 1.0 & 0.96 & 0.98 \\ 0.96 & 1.0 & 0.98 \\ 0.98 & 0.98 & 1.0 \end{pmatrix} , \tag{9} \]

    \[ PC_{BCD} = \begin{pmatrix} -1.0 & -0.01 & 0.70 \\ -0.01 & -1.0 & 0.70 \\ 0.70 & 0.70 & -1.0 \end{pmatrix} . \tag{10} \]

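    The numbers in equation (10) can be checked directly from equation (9). The sketch below inverts the correlation matrix and normalises each off-diagonal entry by the square root of the product of the two corresponding diagonal entries. The function name `partial_correlations` is just an illustrative choice.

```python
import numpy as np

def partial_correlations(R):
    """Partial correlation matrix from a correlation matrix R.

    P is the inverse of R; each entry is -P_ij / sqrt(P_ii * P_jj),
    so the diagonal comes out as -1 by construction.
    """
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    return -P / np.outer(d, d)

# Correlation matrix of genes B, C and D, as in equation (9).
R_BCD = np.array([[1.00, 0.96, 0.98],
                  [0.96, 1.00, 0.98],
                  [0.98, 0.98, 1.00]])

PC_BCD = partial_correlations(R_BCD)
print(np.round(PC_BCD, 2))
```

    The B-C entry drops from 0.96 to about -0.01, while the B-D and C-D entries stay large (about 0.70): the apparent B-C correlation is accounted for by D.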
Disadvantages of using Partial correlations

    High partial correlations no longer tend to go to 1 (or -1).

    Dependent on ranking.

    How large should/can p (the number of genes examined) be ?

    Taking inverses of matrices should make us jumpy - especially when there is limited data.

    The problem already arises when computing the correlations themselves.

“Large p, small n”

Notation

    p :- number of variables (in this case, expression of a gene)

    n :- number of measurements (total number of affy slides)

    R has of the order p^2 (p(p − 1)/2 to be exact) potentially interesting correlations.

    Could be dealing with of the order 10,000^2 of them.

    Have at best a few thousand measurements per gene :- n ∼ 1000.

    If p ≪ n, then the definition of equation (5) gives a robust estimate of all those correlations, but that is not where we are !

Artificial example

    [Figure: ordered eigenvalues (y-axis: eigenvalue; x-axis ticks 0 to 100) in four panels, p/n = 0.1, p/n = 0.5, p/n = 2 and p/n = 10.]

    Figure 1: Ordered eigenvalues of the sample covariance matrix S (thin black line) and that of an alternative estimator S* (fat green line, for definition see Tab. 1), calculated from simulated data with underlying p-variate normal distribution, for p = 100 and various ratios p/n.

    Schäfer and Strimmer, Statistical Applications in Genetics and Molecular Biology, 4, 1, (2005)

Explanation

    Spectrum of eigenvalues.

    Any eigenvalue equal to zero means the matrix is non-invertible.

    Dashed lines - actual eigenvalues.

    Thin black lines - estimated eigenvalues using equation (5).

    Green line - improved estimator.

    In general, if n < p then the estimated correlation/covariance matrix is non-invertible.

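    The loss of invertibility is easy to see numerically. The sketch below is not the simulation behind the figure: it assumes independent standard normal data (so the true covariance is the identity), keeps p = 100 and uses the same four p/n ratios. It then reports the rank and the extreme eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 100                                   # number of variables (genes)
ranks = {}

for n in (1000, 200, 50, 10):             # p/n = 0.1, 0.5, 2, 10
    X = rng.normal(size=(n, p))           # assumed data: true covariance = identity
    S = np.cov(X, rowvar=False)           # p x p sample covariance matrix
    eig = np.linalg.eigvalsh(S)           # eigenvalues in ascending order
    ranks[n] = int(np.linalg.matrix_rank(S))
    print(f"p/n = {p / n:4.1f}   rank = {ranks[n]:3d}   "
          f"smallest eigenvalue = {eig[0]:9.2e}   largest = {eig[-1]:6.2f}")
```

    Every true eigenvalue here is 1, yet the estimated spectrum spreads out as p/n grows, and once n < p the rank falls below p (at most n − 1 after subtracting the mean), so the matrix cannot be inverted.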
Strategies

    Reduce p - cluster the data initially, then perform the analysis
    on each cluster (Toh and Horimoto, 2002).

    Compute lower order partial correlations - for example, only first
    order partial correlations (Magwene and Kim, 2004).

    Employ an improved estimator of correlations (Schäfer
    and Strimmer, 2005).

    These options are not mutually exclusive.

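As an illustration of the second strategy, here is a minimal sketch of a first order partial correlation, using the standard textbook formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). It is not taken from the cited paper; the function names and the toy profiles are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_order_partial(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical profiles: x and y are both driven by z, plus small noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise_x = [0.1, -0.2, 0.1, 0.2, -0.1, -0.1]
noise_y = [0.2, 0.1, -0.2, -0.1, 0.1, -0.1]
x = [2 * zi + e for zi, e in zip(z, noise_x)]
y = [-zi + e for zi, e in zip(z, noise_y)]

r_xy = pearson(x, y)
r_xy_given_z = first_order_partial(x, y, z)
print("correlation:", r_xy, "partial correlation given z:", r_xy_given_z)
```

In this toy case the plain correlation between x and y is close to -1 because both follow z, while the partial correlation given z is considerably weaker. Only correlations between pairs of variables are needed, so no p x p matrix has to be inverted.
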
Shrinkage estimate - wordy explanation

What is computed in equation (5) is an estimate of the
correlation based on the available data, not the actual
correlation we would obtain if we knew the underlying multivariate
distribution. The two would coincide in the limit of many more
samples. That said, we can use other estimators of
correlation. Statisticians have pointed out that many other
possible estimators work better in the
regime we are in (large p, small n).

Shrinkage estimates attempt to combine different naive
estimates to get an improved estimate. The principle has
been around for some time (Stein, 1956), though its use has
increased significantly in the last ten years.

Shrinkage estimates - the details

Notation:

    $C$ is the actual covariance matrix.

    $\hat{C}$ is an estimate of the covariance matrix.

    In computing $\hat{C}$ we could either attempt to compute it
    using the standard definition (“full”) or assume (for
    example) that all the off-diagonal entries are zero
    (“reduced”).

    $\hat{C}_F$ denotes the “full” estimate of the covariance matrix.

    $\hat{C}_R$ denotes the “reduced” estimate of the covariance matrix.

Mean Square Error

Defining

\begin{align}
\mathrm{MSE}(\hat{C}) &= E\big((\hat{C} - C)^2\big) , \tag{11} \\
                      &= \mathrm{Var}(\hat{C}) + \mathrm{Bias}^2(\hat{C}) . \tag{12} \\
\mathrm{Var}(\hat{C}) &= E\big((\hat{C} - E(\hat{C}))^2\big) , \tag{13} \\
\mathrm{Bias}(\hat{C}) &= E(\hat{C}) - C . \tag{14}
\end{align}

(The expectation operator is over the data that we have.)

    $\mathrm{Bias}(\hat{C}_F)$ is small but $\mathrm{Var}(\hat{C}_F)$ will be large.

    $\mathrm{Var}(\hat{C}_R)$ will be small but $\mathrm{Bias}(\hat{C}_R)$ will be large.

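The decomposition MSE = Var + Bias^2 and the contrast between the two estimates can be illustrated with a small simulation. This is a sketch under stated assumptions, not the lecture's own calculation: a single covariance entry between two unit-variance Gaussian variables, a hypothetical true value of 0.5, n = 5 samples per data set, and expectations approximated by averaging over repeated simulated data sets. The "reduced" estimate sets the off-diagonal entry to zero.

```python
import random
import statistics

random.seed(0)
TRUE_COV = 0.5    # hypothetical true covariance of two unit-variance variables
N_SAMPLES = 5     # small n, as in the regime discussed above
N_REPEATS = 2000  # number of simulated data sets

def draw_pair():
    """One (x, y) draw with Var(x) = Var(y) = 1 and Cov(x, y) = TRUE_COV."""
    x = random.gauss(0, 1)
    y = TRUE_COV * x + (1 - TRUE_COV ** 2) ** 0.5 * random.gauss(0, 1)
    return x, y

def sample_cov(pairs):
    """'Full' estimate: unbiased sample covariance of the pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    return sum((p[0] - mx) * (p[1] - my) for p in pairs) / (n - 1)

def summarise(estimates, truth):
    """Empirical MSE, variance and bias of a list of estimates."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    var = statistics.pvariance(estimates)
    bias = sum(estimates) / len(estimates) - truth
    return mse, var, bias

full = [sample_cov([draw_pair() for _ in range(N_SAMPLES)])
        for _ in range(N_REPEATS)]
reduced = [0.0] * N_REPEATS  # 'reduced' estimate: off-diagonal entry set to zero

mse_f, var_f, bias_f = summarise(full, TRUE_COV)
mse_r, var_r, bias_r = summarise(reduced, TRUE_COV)
print("full:    MSE, Var, Bias =", mse_f, var_f, bias_f)
print("reduced: MSE, Var, Bias =", mse_r, var_r, bias_r)
```

The full estimate shows a bias near zero with a sizeable variance, while the reduced estimate has zero variance and a bias equal to minus the true covariance.
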
The problem

Depending on the assumptions we use to estimate the
correlation/covariance matrix

    we can either compute a very poor estimate of the
    parameters in a very accurate model,

    or compute a good estimate of the parameters for a
    very inaccurate model (!)

    But maybe we can reconcile the two...

Combining the two

One can combine these two estimates:

\begin{align}
C^{*} &= \lambda \hat{C}_R + (1 - \lambda) \hat{C}_F , \tag{15} \\
0 &\le \lambda \le 1 , \tag{16}
\end{align}

choosing a $\lambda$ such that $\mathrm{MSE}(C^{*})$ is minimised.

    Computing $\lambda$ is normally very expensive.

    Ledoit and Wolf (2003) came up with a short analytical
    way of computing $\lambda$; Schäfer and Strimmer modified
    this for genomic data.

    We have an R package to do this.

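The combination in equation (15) can be sketched as follows. This is an illustration only: the toy data, the helper names and the hand-picked value of λ are assumptions, and the analytical choice of λ due to Ledoit and Wolf is not reproduced here. The reduced estimate is taken to be the diagonal of the full estimate (all off-diagonal entries zero), and exact rational arithmetic is used so that the determinants are not affected by rounding.

```python
from fractions import Fraction

def sample_covariance(rows):
    """Full estimate C_F: unbiased sample covariance (rows are samples)."""
    n, p = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def shrink(full, lam):
    """C* = lam * C_R + (1 - lam) * C_F, with C_R the diagonal part of C_F."""
    p = len(full)
    reduced = [[full[i][j] if i == j else Fraction(0) for j in range(p)]
               for i in range(p)]
    return [[lam * reduced[i][j] + (1 - lam) * full[i][j] for j in range(p)]
            for i in range(p)]

def determinant(matrix):
    """Determinant by exact Gaussian elimination on Fraction entries."""
    m = [row[:] for row in matrix]
    det = Fraction(1)
    for col in range(len(m)):
        pivot = next((r for r in range(col, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, len(m)):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det

# Hypothetical toy data: n = 3 samples (rows) of p = 4 variables (columns).
data = [[1, 2, 0, 5],
        [3, 1, 4, 2],
        [2, 7, 1, 1]]
lam = Fraction(1, 5)  # hypothetical shrinkage intensity, chosen by hand

c_full = sample_covariance(data)
c_star = shrink(c_full, lam)
det_full = determinant(c_full)
det_star = determinant(c_star)
print("det(C_F) =", det_full, " det(C*) =", det_star)
```

With n < p the full estimate is singular, whereas for any λ > 0 the combined estimate keeps the variances on the diagonal, scales the off-diagonal entries by (1 - λ), and becomes invertible.
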
Final Note

The Schäfer and Strimmer estimate uses the
“zero-covariance” low dimensional model for its reduced estimate,
but this isn’t necessarily the best choice.

Notably, while shrinkage estimates make much of
incorporating information, they don’t explicitly include
biological information.

Results

    Using E. coli time series data (8 time slices), Schäfer and
    Strimmer examined 102 genes using correlations and
    partial correlations.

    [Figure: "GM network" inferred for these genes, with nodes labelled by
    gene name (e.g. lacA, lacY, lacZ, sucA, nuoA). Source: Schäfer and
    Strimmer, "Large-Scale Covariance Matrix Estimation".]
                                                            yhgI                                    yhdM      yheI                                         aceA                                                                yaeM                                                             yfiA
                                                                                                                                                                                                                                                                                                                                     hupB                                                                 Validation
                                                                                                                                                                                                                                                                                                                                                                                                            DREAM
                                                                                                                                                                                                                                        (a) Shrinkage GGM network
               (c) Relevance network
    Correlations                                                                                                           Partial Correlations (Graphical                                                       gatD                  gatB
                                                                                                                                                            aceB                                                                                     nuoL



    (Relevance Network)
                                                                                                                                                                                               yjbO




                                                                                                                           Gaussian Network)
                                                                                                                                manY
                                                                                                                                                                                                                                                                                                                                                                                       yjbO
                                                                                                                                                                                                                             ahpC
                                                                                                                                                                                                                                                                                                       sucA             yfiA       yfaD      sodA              yedE                                pspD
                                                                                                                                                   asnA ygcE                                                                                                                                                                                                           nuoM nmpC

 coli data by (a) the shrinkage GGM ap-                                                                                                         sodA
                                                                                                                                                          atpB



                                                                                                                                                                     ynaF
                                                                                                                                                                         icdA
                                                                                                                                                                                          cspA



                                                                                                                                                                                       yfiA
                                                                                                                                                                                                               gatZ                 artJ



                                                                                                                                                                                                                                             pckA
                                                                                                                                                                                                                                                     cstA

                                                                                                                                                                                                                                                                                                          ilvC
                                                                                                                                                                                                                                                                                                                 grpE
                                                                                                                                                                                                                                                                                                                     lacA
                                                                                                                                                                                                                                                                                                                            gltA

                                                                                                                                                                                                                                                                                                                               lacY
                                                                                                                                                                                                                                                                                                                                     gatC
                                                                                                                                                                                                                                                                                                                                              cspA

                                                                                                                                                                                                                                                                                                                                            ptsG
                                                                                                                                                                                                                                                                                                                                                      glnP

                                                                                                                                                                                                                                                                                                                                                       aldA
                                                                                                                                                                                                                                                                                                                                                                fixC

                                                                                                                                                                                                                                                                                                                                                                b0725
                                                                                                                                                                                                                                                                                                                                                                          dnaJ
                                                                                                                                                                                                                                                                                                                                                                                    atpB
                                                                                                                                                                                                                                                                                                                                                                                            gatA

                                                                                                                                                                                                                                                                                                                                                                                             manY
                                                                                                                                                                                                                                                                                                                                                                                                     yhfV

                                                                                                                                                                                                                                                                                                                                                                                                         yjbE
                                                                                                                                                                                                                      nmpC                                                                                                                                                   yrfH

 asso GGM approach by Meinshausen and
                                                                                                                         yrfH             b1583                                               hupA
                                                                                                                                                                                                        manX                                                                                                  pyrI            aceB                           ompF                    cchB     mopB
                                                                                                                                                             gatC           hupB                                                                                                                                                                     manZ
                                                                                                                                                                                                                                 flgC        ybjZ ibpB
                                                                                                                                                                                                 hns                                                                                                                          pyrB          ompC                              atpF         ibpB
                                                                                                                                                yjbE                    ompF                                   pyrI                         nuoH                                        yecO
                                                                                                                                                                                                                                                                                                       ompT degS                                                                                   yecO
                                                                                                                                 ibpA                                                  degS
                                                                                                                                                                                                                        fixC                             cchB                                                                                                  sucC
                                                                                                                                                                                                                                                                                 cspG                                                              pckA                      ybjZ


k with abs(r) > 0.8. Black and grey edges Shanahan
                                                                                                                                                          manZ                                 pyrB                                                                                                              dnaK
                                                                                                                                                  ompC              gltA    sucC
                                                                                                                                                                                                                 nuoB                                                     yaeM                                                         atpE                                                       nuoI
                                                                                                                                                                                                                                           folK    atpD b1963                                          yjcH                                            b1057                          cyoD
                                                                                                                                        b1057                                      glnH               yjcH                                                       ygbD                                                       artJ
                                                                                                                                                                     atpF
                                                                                                                                                                                          sucD                                                                                                                                                                     ibpA                           nuoC
                                                                                                                                                                                                                          atpH                                                                                                               cstA
                                                                                                                                                                                flgD
                                                                                                                                                                                                                                                                                                                                                      nuoL                            flgC
                                     Hugh
                                                                                                                         atpE

                                                                                                                                                                            Associative methods in Systems Biology
                                                                                                                                                  ptsG           pspC                                 ilvC                                                         yhfV                                                            nuoH
                                                                                                                                                                                        aceA                   nuoF                               hisJ
                                                                                                                                                                                                                                                                tnaA
                                                                                                                                                                                                                                                                          b1191                          lacZ        nuoB                                                                     manX
                                                                                                                                                                            cyoD
                                                                                                                                                         ompT
                                                                                                                                                                  pspD                        glnP     nuoC             dnaJ
                                                                                                                                                                                                                                                                                                                                              gatB
                                                                                                                                        dnaK                                                                                     eutG                                                                                                                                 aceA                 eutG
                                                                                                                                                                                                                                                                                                                                   ahpC                     flgD
Results of comparison

    Recover centrality of sucA gene.
    lacA, lacZ and lacY genes have the largest absolute partial correlations.
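To illustrate why a partial correlation network can single out a central gene where a relevance network cannot, here is a minimal sketch on a toy chain x -> y -> z. The toy data, the variable names and the 0.2 cut-off on partial correlations are assumptions made for illustration. It uses the first-order partial correlation formula for three variables, not the shrinkage estimator named in the figure, and it is not a re-analysis of the E. coli data.

```python
import math
import random

def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation of a and b after removing the linear effect of c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Toy chain x -> y -> z (hypothetical data, fixed seed for reproducibility).
rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + 0.3 * rng.gauss(0, 1) for xi in x]
z = [yi + 0.3 * rng.gauss(0, 1) for yi in y]

r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
p_xy = partial_correlation(r_xy, r_xz, r_yz)  # x-y given z
p_xz = partial_correlation(r_xz, r_xy, r_yz)  # x-z given y
p_yz = partial_correlation(r_yz, r_xy, r_xz)  # y-z given x

# Relevance network: keep edges with abs(r) > 0.8, as in the figure.
relevance_edges = [name for name, r in
                   [("x-y", r_xy), ("x-z", r_xz), ("y-z", r_yz)] if abs(r) > 0.8]
# Partial correlation network: 0.2 is an arbitrary illustrative cut-off.
ggm_edges = [name for name, p in
             [("x-y", p_xy), ("x-z", p_xz), ("y-z", p_yz)] if abs(p) > 0.2]

print("relevance network edges:", relevance_edges)
print("partial correlation edges:", ggm_edges)
```

In this toy case the relevance network joins all three variables, while the partial correlation network drops the indirect x-z edge and leaves y as the only node with two neighbours.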
Outline

1   Outline
2   Gene Ontologies
      Over-representation
      Semantic similarity
3   Associative Measures
      Hypotheses
      Linear Correlation
      Partial Correlation
      Non-linear measures
4   Validation
      DREAM
So far, we have concerned ourselves with linear relationships.

However, such an approximation may not be valid.

Naively, one expects the relationship between gene products to be non-linear.

For example, Transcription Factor - target interactions are typically modelled using Michaelis-Menten kinetics.

Furthermore, expression levels are derived after a number of transformations.
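A minimal sketch of the point about Michaelis-Menten kinetics: a saturating response is perfectly monotonic, yet its linear correlation with the input is well below 1. The function name, the parameter values (v_max = 1.0, k_m = 0.5) and the range of transcription factor levels are assumptions chosen for illustration.

```python
import math

def michaelis_menten(tf, v_max=1.0, k_m=0.5):
    """Saturating response v_max * tf / (k_m + tf); parameters are illustrative."""
    return v_max * tf / (k_m + tf)

def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical transcription factor levels, evenly spaced on [0, 10].
tf_levels = [10.0 * i / 199 for i in range(200)]
target = [michaelis_menten(t) for t in tf_levels]

r = pearson(tf_levels, target)
print("Pearson correlation for a noise-free saturating response:", round(r, 3))
```

The target is a deterministic function of the transcription factor level here, so any shortfall from 1 comes from the non-linearity alone, not from noise.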
One approach: Spearman correlation

   Basic idea: use ranks rather than raw data.
   Apply the same definition as linear (Pearson) correlation, but to the ranks.
   Must be careful about ties, i.e. raw data having precisely the same value (unlikely for floating point data).
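The idea can be sketched as follows: rank each variable, give tied values the mean of the ranks they occupy, then apply the Pearson formula to the ranks. The helper names are hypothetical, and averaging tied ranks is one common convention, assumed here.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# A monotonic but non-linear relationship (illustrative values).
x = [0.1, 2.0, 0.5, 3.0, 1.5]
y = [math.exp(v) for v in x]

print("Pearson: ", pearson(x, y))
print("Spearman:", spearman(x, y))
print("Ranks with a tie:", average_ranks([10, 20, 20, 30]))
```

Because the ranks of x and exp(x) coincide, the Spearman value is 1 up to rounding, while the Pearson value on the raw data is lower.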
Comparison

Comparison of different measures.

Many other methods for non-linear measures are possible, the best known being Mutual Information.
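As an illustration of Mutual Information, here is a rough sketch of a histogram-based estimate. The equal-width binning and the default of four bins are assumptions made for the example; real expression data usually calls for a more careful estimator, and `mutual_information` is a hypothetical helper name.

```python
from collections import Counter
from math import log


def mutual_information(x, y, bins=4):
    """Histogram estimate of mutual information (in nats) between x and y."""

    def discretise(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant data
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretise(x), discretise(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum over occupied cells of p(i,j) * log( p(i,j) / (p(i) p(j)) ).
    return sum((c / n) * log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


# y = x**2 is symmetric about zero: the linear correlation vanishes,
# yet knowing x still tells us a great deal about y.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
mi = mutual_information(x, y)
```

The estimate is zero when the binned variables are independent and grows as one variable becomes more predictable from the other, regardless of whether the relationship is linear or even monotone.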


Modelling of gene interactions

We have only just touched upon methods for inferring interactions between gene products using transcriptomic data. Some of the others include the use of

    Mutual Information/Spearman Correlations - address non-linearities.
    Kinetic models - attempt to infer interactions.
    Boolean Networks - model interactions as circuitry.
    Petri Nets - Prof. Ming Chen.
    Bayesian Networks - Dr. Chris Needham.
    Machine Learning methods - Unsupervised/semi-supervised/supervised learning.
    Integration of other data sources.
    ...

Explosion of methods

FIGURE 1. The number of publications retrieved from a PubMed search of “Pathway Inference” or “Reverse Engineering.” On average, this number has been doubling every two years since about 1995.

Stolovitzky et al., Ann. N.Y. Acad. Sci. (2009)

Validation - DREAM

While there are a colossal number of methods out there, the validation of them is very much in its infancy.
DREAM (Dialogue for Reverse Engineering Assessments and Methods) is an attempt to deal with this question.
Features: Unseen experimental data of (for example)

    Transcription Factor binding sites,
    artificial data (in silico),
    genome-wide interactions,

is gathered and groups are invited to reproduce the interactions. Different groups' results are then compared against the data to determine how well they did. New challenges are presented on an annual basis.
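One simple way to compare a group's predictions against held-back data is precision and recall over the top of a ranked list. The sketch below is only an illustration under that assumption; it is not claimed to be the scoring scheme DREAM itself uses, and the gene and factor names are made up.

```python
def precision_recall_at_k(ranked_predictions, gold_standard, k):
    """Precision and recall of the top-k predictions against a gold-standard set."""
    top_k = ranked_predictions[:k]
    hits = sum(1 for item in top_k if item in gold_standard)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_standard) if gold_standard else 0.0
    return precision, recall


# Hypothetical submission: interactions ranked from most to least confident.
submission = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3"), ("tfB", "g4")]
# Hypothetical held-back experimental interactions.
gold = {("tfA", "g1"), ("tfB", "g3"), ("tfB", "g9")}

precision, recall = precision_recall_at_k(submission, gold, k=2)
```

Sweeping `k` from 1 to the length of the submission traces out a precision-recall curve, which makes it possible to compare methods that return lists of very different lengths.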

Some Results from DREAM 2 (2007)

Challenge 1:
Identify targets of transcription factor BCL6.

    53 genuine targets of BCL6 inferred from unpublished ChIP-chip data.
    147 decoys added.
    Task: identify the genuine targets from decoys by picking genes with similar expression patterns to BCL6.
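The task above can be sketched as ranking candidate genes by how strongly their expression profile correlates with that of BCL6. The profiles and gene names below are invented toy values, and ranking by signed Pearson correlation is one reading of "similar expression patterns", not the method used by any particular DREAM entrant.

```python
from math import sqrt


def pearson(x, y):
    """Linear correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(target_profile, candidate_profiles):
    """Candidate names sorted from most to least correlated with the target."""
    scores = {
        gene: pearson(target_profile, profile)
        for gene, profile in candidate_profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy (log) expression levels over four experiments; values are made up.
bcl6 = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "geneA": [2.0, 4.1, 5.9, 8.0],  # rises with BCL6
    "geneB": [3.0, 1.0, 4.0, 2.0],  # unrelated
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as BCL6 rises
}
ranking = rank_by_correlation(bcl6, candidates)

# With 53 genuine targets among 53 + 147 candidates, picking at random
# would be right about a quarter of the time.
baseline = 53 / (53 + 147)
```

The baseline of 26.5% is the figure any method has to beat: a ranking is only informative if the genuine targets are enriched near the top beyond that rate.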




Results

    Best approaches were selective in the data sets employed. (Data sets where BCL6 was highly expressed or not expressed at all were used.)
    Semi-supervised learning was employed - using known targets of BCL6 to train the best method.
    Used correlations.
    1st-order Partial correlations did badly.
    Basic correlations came close to the most sophisticated approaches.
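For reference, a short sketch of what a 1st-order partial correlation computes: the correlation between two genes x and y once the linear influence of a third gene z has been removed. This is the standard textbook formula, stated here as a reminder rather than as the exact procedure used in the challenge; the function name is illustrative.

```python
from math import sqrt


def first_order_partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z.

    Undefined (division by zero) when |r_xz| or |r_yz| equals 1.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# If x and y are each correlated 0.8 with z, a pairwise correlation of
# 0.64 between them is fully accounted for by z.
explained_by_z = first_order_partial_correlation(0.64, 0.8, 0.8)

# If z is unrelated to both, the partial correlation is just r_xy.
unaffected = first_order_partial_correlation(0.5, 0.0, 0.0)
```

In network inference the usual choice is to take, for each pair of genes, the weakest partial correlation over all possible third genes, so that an edge survives only if no single gene explains it away.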




More results from DREAM

Challenge 2: E. coli network.
RegulonDB provides good evidence for the targets of assorted Transcription Factors.
Task: identify targets.
Results

    Best method used Mutual Information and ideas behind 1st order partial correlation.
    Correlations and Partial Correlations were not too far behind.
    Level of identification of targets was low - perhaps 5%.




Conclusions

   GO terms allow us to handle large amounts of annotation in a structured fashion.
   Associative measures are a first attempt at using the huge amounts of expression data that is out there.
   Very simple ideas such as correlation work surprisingly well (or rather more complicated methods of association don’t give orders of magnitude better performance).
   A long way to go nonetheless.
   The type of expression data we select, and a clear understanding of the microarray/RNA-seq/... technology, may be even more important.

Hugh Shanahan Association Talk

  • 1.
    Associative methods in Systems Biology Hugh Shanahan Associative methods in Systems Biology Outline Gene Ontologies Hugh Shanahan Over-representation Semantic similarity Associative Measures Department of Computer Science Hypotheses Royal Holloway, University of London Linear Correlation Partial Correlation Non-linear measures September 22, 2009 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 2.
    Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 3.
    Gene Ontologies Associative methods in Systems Biology Hugh Before finding interactions, need to be able Shanahan to systematically annotate all genes Outline Gene to determine which functional groupings are Ontologies Over-representation over-represented Semantic similarity measure objectively the “functional similarity” of two Associative Measures genes. Hypotheses Linear Correlation Partial Correlation Gene Ontology (GO) is a means to do this. Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 4.
    Ontologies Associative methods in Systems Biology Abstract method for expressing structured data. Hugh Shanahan Annotation of any gene can be expressed in terms of Outline incresingly accurate description, e.g. Gene This gene is involved in transport. Ontologies Over-representation This gene is involved in vesicle mediated Semantic similarity Associative transport. Measures Hypotheses This gene is involved in vesicle fusion. Linear Correlation Partial Correlation Genes may not have an accurate annotation, so Non-linear measures Validation definition can stop higher up in this hierarchy. DREAM Hugh Shanahan Associative methods in Systems Biology
  • 5.
    More complexity required Associative methods in Systems Biology Annotation is not a simple chain. Hugh A single gene can have have a very specific annotation, Shanahan which comes from two (or more) more general Outline descriptions. Gene Ontologies Different types of annotation as well: location, Over-representation Semantic similarity biochemistry, part of organism expressed in, and so on. Associative Measures An Ontology is a Directed Acyclic Graph (DAG), not a Hypotheses Linear Correlation Tree (this means a lot to Graph Theorists). Partial Correlation Non-linear measures Each node in the DAG is an annotation term. Validation DREAM Each “child” node can more than one “parent” nodes. Hugh Shanahan Associative methods in Systems Biology
  • 6.
    GO’s visualised Associative IEWS methods in Systems Biology a b c Hugh Biological process (root) Shanahan Outline Transport Membrane organization and biogenesis Gene asing ficity Ontologies is_a is_a or Over-representation larity Vesicle-mediated Semantic similarity Membrane fusion transport Associative part_of is_a Measures Hypotheses Vesicle fusion Linear Correlation Partial Correlation Figure 1 | Simple trees versus directed acyclic graphs. Boxes represent nodes and arrows represent edges. a | An Nature Reviews | Genetics example of a simple tree, in which each child has only one parent and the edges are directed, that is, there is a source Non-linear measures (parent) and a destination (child) for each edge. b | A directed acyclic graph (DAG), in which each child can have one or Validation Rhee et al., Nature Reviews Genetics, (2008) more parents. The node with multiple parents is coloured red and the additional edge is coloured grey.c | An example of a node, vesicle fusion, in the biological process ontology with multiple parentage. The dashed edges indicate that there DREAM are other nodes not shown between the nodes and the root node (biological process). A root is a node with no incoming edges, and at least one leaf (also called a sink). A leaf node is a node with no outgoing edges, that is, a terminal node with no children (vesicle fusion). Similar to a simple tree, A DAG has directed edges and does not have cycles, that is, no path starts and ends at the same node, and will always have at least one root node. The depth of a node is the length of the longest path from the root to that node, whereas the height is the length of the longest path from that node to a leaf41. is_a and part_of are types of relationships that link the terms in the GO ontology. More information about the relationships between GO terms are found online (An Introduction to the Gene Ontology). Hugh Shanahan Associative methods in Systems Biology
  • 7.
    GO’s visualised Associative methods in Systems Biology Hugh Shanahan Outline Gene Ontologies Over-representation Semantic similarity Associative Measures Hypotheses Linear Correlation Partial Correlation http://amigo.geneontology.org/ Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 8.
    Different types ofAnnotation Associative methods in Systems Biology Hugh Typically, there are three distinct ontologies Shanahan (overwhelmingly used). Outline Cellular Compartment Gene Ontologies Over-representation Biological Process Semantic similarity Molecular Function Associative Measures Hypotheses Many other ontologies have been constructed, e.g. Linear Correlation Partial Correlation Plant Organ for Arabidopsis. Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 9.
    Caveat Associative methods in Systems Biology The annotation of most genes (90%) have been carried out Hugh Shanahan computationally. The annotations usually work pretty well, though they have a tendency not to be as accurate as those Outline obtained by direct assay. Gene Ontologies All annotated genes have an evidence code (IED) Over-representation Semantic similarity associated with them in order to demonstrate how much we Associative can rely on it. Measures Hypotheses Increasingly, co-expression is being used as a means to Linear Correlation Partial Correlation annotate genes, so one should be careful in not using this Non-linear measures information in constructing annotations ! Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 10.
    Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 11.
    Over-representation Associative methods in Systems One of the most useful tools to hand when one analyses Biology micro-array data is to ask what functional groupings occur Hugh Shanahan more often than one expects. Outline Notation Gene N number of genes in the genome. Ontologies Over-representation Semantic similarity n number of genes which have been found to be Associative differentially expressed. Measures Hypotheses m number of genes in the genome with a specific Linear Correlation Partial Correlation annotation. Non-linear measures Validation k number of genes which are differentially expressed DREAM with the same annotation. Hugh Shanahan Associative methods in Systems Biology
  • 12.
    Probabilities Associative methods in Systems One can derive the probability Pk that k genes would be Biology found by chance amongst n genes using the Hugh Shanahan hypergeometric probability distribution and the above Outline parameters. Gene For the record Ontologies Over-representation Semantic similarity Associative m C N−m C Measures k n−k Pk = NC , (1) Hypotheses Linear Correlation n Partial Correlation N N! Non-linear measures Cm = . (2) Validation (N − n)!n! DREAM Hugh Shanahan Associative methods in Systems Biology
  • 13.
    Difficulties Associative methods in Systems Biology There are thousand’s of possible GO terms and one Hugh should adjust the probabilities to deal with multiple Shanahan hypotheses. Outline Applying Bonferroni correction using all GO terms gives Gene Ontologies a p-value of 10−7 equivalent to 1% significence. Over-representation Semantic similarity Because of the structure of the GO terms these Associative Measures probabilities are highly correlated with each other. Hypotheses Linear Correlation In all these cases focussing on as short a list of Partial Correlation Non-linear measures possible biological processes as possible will minimise Validation the above difficulties. DREAM Hugh Shanahan Associative methods in Systems Biology
  • 14.
    Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 15.
    What genes match Inbenchmarking methods to infer interactions between Associative methods in gene products, we expect genes that interact to have similar Systems Biology GO terms, though perhaps not entirely the same. Hugh Semantic Similarity is a means to measure how similar the Shanahan annotations of two genes are (0 being no similarity, 1 Outline meaning total similarity). Gene Ontologies GO provides us with a means to do this in a quantitative Over-representation Semantic similarity fashion. Associative Measures Hypotheses Linear Correlation Partial Correlation Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 16.
    Simple implementation Determine theratio of the number of nodes two genes share Associative methods in with the total number of nodes they have between them. Systems Biology Hugh #{N(G1 ) ∩ N(G2 )} Shanahan GOsimUI = (3) #{N(G1 ) ∪ N(G2 )} Outline N(G1 ) being the set of nodes associated with G1 ’s Gene Ontologies annotation. Over-representation Semantic similarity Associative Measures Hypotheses Linear Correlation Partial Correlation Non-linear measures Validation DREAM More elaborate methods are available. Hugh Shanahan Associative methods in Systems Biology
  • 17.
    Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 18.
    Motivation Associative methods in Systems Yesterday, encountered clustering. Biology Hugh Hypothesis 1 (weak) :- coexpression implies involvment Shanahan in the same process. Outline Expand to many different experiments. Gene Ontologies Hypothesis 2 (strong) :- greater a level of association, Over-representation Semantic similarity greater the chance of interaction. Associative Measures Hypothesis 2 is often referred to as “guilt by Hypotheses association”. Linear Correlation Partial Correlation Non-linear measures Association may tell us about interactions between Validation gene products. It does not tell us anything about the DREAM regulation mechanism. Hugh Shanahan Associative methods in Systems Biology
  • 19.
    Associative methods in Systems Biology Hugh Shanahan Outline Gene Ontologies Over-representation Semantic similarity Associative Measures Hypotheses Linear Correlation http://www.arabidopsis.leeds.ac.uk/act/index.php Partial Correlation Non-linear measures 266841_at AT2G26150 Validation heat shock transcription factor family protein contains Pfam profile: DREAM PF00447 HSF-type DNA-binding domain 260978_at AT1G53540 17.6 kDa class I small heat shock protein Hugh Shanahan Associative methods in Systems Biology
  • 20.
    What do wemean by association ? Associative methods in Systems Knowing something about the expression level of one gene Biology (over many different experiments) means we know Hugh Shanahan something about the expression level of the other. Replotting the above Outline Gene Ontologies Over-representation Semantic similarity Associative Measures Hypotheses Linear Correlation Partial Correlation Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 21.
    Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 22.
    Linear Correlation coexpression Associative methods in Simplest form of association. Systems Biology Assume that there is a linear relationship between Hugh Shanahan genes. Outline Formally :- Gene y1 = a12 + c12 y2 + η12 , (4) Ontologies Over-representation Semantic similarity Associative y1 , y2 are (log) expression levels Measures η12 noise term. Hypotheses Linear Correlation a12 , c12 parameters to be determined. Partial Correlation Non-linear measures But we’re not interested in that ! Validation DREAM We are interested in asking how good a model is this for this pair of genes ? Hugh Shanahan Associative methods in Systems Biology
  • 23.
    Covariance
    Can estimate how good the linear model is by computing
    $E((y_1 - \bar{y}_1)(y_2 - \bar{y}_2))$ ,
    where $\bar{y}_1 = E(y_1)$ and $\bar{y}_2 = E(y_2)$ are the means of $y_1$ and $y_2$.
    E means the expectation value of the above (think of it for now as taking the average over all the points in the previous figure).
    Can prove to oneself (exercise) that, for given variances of $y_1$ and $y_2$, the magnitude of the covariance is largest when $y_1$ can be perfectly expressed as a linear function of $y_2$.
    The covariance is zero when there is no relationship at all between $y_1$ and $y_2$.
  • 24.
    [Figure: two scatter plots of y2 against y1. Panel titles: "Maximum covariance" and "Zero covariance".]
  • 25.
    Correlation
    We want to compare every possible pair of genes, so using the covariance is not very practical, since the maximum covariance will vary from one pair of genes to another.
    However,
    $\rho_{12} = \frac{E((y_1 - \bar{y}_1)(y_2 - \bar{y}_2))}{\sqrt{E((y_1 - \bar{y}_1)^2)\,E((y_2 - \bar{y}_2)^2)}}$ , (5)
    is bounded: $-1 \le \rho_{12} \le 1$.
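    The covariance and the correlation of equation (5) can be sketched directly from their definitions. This is a minimal illustration, assuming NumPy is available; the function names and the six expression values are hypothetical and do not come from the slides.

```python
import numpy as np

def covariance(y1, y2):
    """Sample estimate of E((y1 - mean(y1)) * (y2 - mean(y2)))."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    return np.mean((y1 - y1.mean()) * (y2 - y2.mean()))

def correlation(y1, y2):
    """Equation (5): covariance rescaled so that it lies in [-1, 1]."""
    return covariance(y1, y2) / np.sqrt(covariance(y1, y1) * covariance(y2, y2))

# Hypothetical log-expression values for two genes over six experiments.
gene1 = np.array([0.6, 0.9, 1.1, 1.3, 1.5, 1.8])
gene2 = 2.0 * gene1 - 1.0   # exactly linear in gene1, so |rho| should be 1

rho = correlation(gene1, gene2)
rho_flipped = correlation(gene1, -gene2)
```

    For the exactly linear pair the correlation sits at the upper bound, and flipping the sign of one gene moves it to the lower bound, whatever the slope of the line.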
  • 26.
    How well does it work?
    There are a number of examples of improved functional annotation.
    An unannotated gene that is highly correlated with a gene in a known response is likely to be in the same response.
  • 27.
    Outline
    1 Outline
    2 Gene Ontologies: Over-representation; Semantic similarity
    3 Associative Measures: Hypotheses; Linear Correlation; Partial Correlation; Non-linear measures
    4 Validation: DREAM
  • 28.
    Difficulty: genes correlate with many other genes, not just a few.
    Why?
    Suggestion: correlations do not distinguish between potential direct interactions and indirect interactions between gene products.
  • 29.
    Example
    [Figure: network diagram of genes A, B, C, D, E and F, with further links labelled "Other interactions".]
    B directly interacts with three other genes, but could be highly correlated with others.
    C and D would be highly correlated with each other even though they are not directly interacting.
  • 30.
    Artificial example: create randomised data to represent expression of B.
    Generate two other sets of data (C and D) that are by construction highly correlated to the original data set, but are not set out to have a relationship with each other.
    [Figure: three scatter plots. D against B: ρ = 0.98. C against B: ρ = 0.98. D against C: ρ = 0.96 (!!!)]
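    The artificial example can be imitated in a few lines. This is a sketch under stated assumptions, not the construction actually used for the slide: the noise level 0.2, the sample size and the random seed are all chosen here for illustration. With unit-variance B and independent noise of standard deviation 0.2, the expected correlations are 1/sqrt(1.04) ≈ 0.98 for B with C or D, and 1/1.04 ≈ 0.96 for C with D.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n = 1000                         # number of simulated experiments (assumption)

b = rng.normal(size=n)               # randomised expression of B
c = b + 0.2 * rng.normal(size=n)     # C: B plus its own independent noise
d = b + 0.2 * rng.normal(size=n)     # D: B plus its own independent noise

r = np.corrcoef([b, c, d])           # 3 x 3 correlation matrix, order B, C, D
rho_bc, rho_bd, rho_cd = r[0, 1], r[0, 2], r[1, 2]
```

    C and D never refer to each other in the code, yet their correlation is almost as high as their correlation with B.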
  • 31.
    Extending correlations: partial correlations
    Correlations only take pairs of genes into consideration.
    Partial correlations extend the initial pairwise regression model introduced in equation (4):
    $y_1 = a_1 + b_{12} y_2 + b_{13} y_3 + \cdots + b_{1p} y_p + \eta_1$ . (6)
    Again, we are not interested in solving this explicitly.
    We want to understand the correlation that each one of the genes $y_2 \ldots y_p$ has with $y_1$ once we have removed the effect of all the other genes.
    We will use the notation $PC_{ij}$ to refer to this partial correlation.
  • 32.
    Derivation
    Computed easily once you have all the correlations between all the genes.
    $R = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots \\ \rho_{12} & 1 & \rho_{23} & \cdots \\ \rho_{13} & \rho_{23} & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$ , (7)
    Covariance matrix C is defined similarly.
    $\rho_{ij}$ is the correlation between gene i and gene j.
    $PC_{ij} = -\frac{R^{-1}_{ij}}{\sqrt{R^{-1}_{ii} R^{-1}_{jj}}}$ (8)
  • 33.
    Questions
    $\rho_{ij} = \rho_{ji}$: why?
    Diagonals are 1: why?
    Exercise: compute PC using the covariance matrix.
  • 34.
    Artificial example
    $R_{BCD} = \begin{pmatrix} 1.0 & 0.96 & 0.98 \\ 0.96 & 1.0 & 0.98 \\ 0.98 & 0.98 & 1.0 \end{pmatrix}$ , (9)
    $PC_{BCD} = \begin{pmatrix} -1.0 & -0.01 & 0.70 \\ -0.01 & -1.0 & 0.70 \\ 0.70 & 0.70 & -1.0 \end{pmatrix}$ . (10)
    (Rows and columns are ordered so that B is the third gene: the 0.96 entry is the correlation between C and D.)
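    The partial correlations in (10) follow from the correlation matrix in (9) through equation (8). A minimal sketch, assuming NumPy; the function name is hypothetical.

```python
import numpy as np

def partial_correlations(r):
    """Equation (8): PC_ij = -P_ij / sqrt(P_ii * P_jj), with P the inverse of R."""
    p = np.linalg.inv(np.asarray(r, dtype=float))
    scale = np.sqrt(np.diag(p))
    return -p / np.outer(scale, scale)

# Correlation matrix (9) from the artificial example.
r_bcd = np.array([[1.00, 0.96, 0.98],
                  [0.96, 1.00, 0.98],
                  [0.98, 0.98, 1.00]])

pc_bcd = partial_correlations(r_bcd)
print(np.round(pc_bcd, 2))
```

    The pair with correlation 0.96 drops to a partial correlation of about -0.01 once the third gene is accounted for, while the two 0.98 correlations remain at about 0.70. The diagonal is -1 only as a by-product of the sign convention in (8).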
  • 35.
    Disadvantages of using partial correlations
    High partial correlations no longer tend to go to 1 (or -1).
    Dependent on ranking.
    How large should/can p (the number of genes examined) be?
    Taking inverses of matrices should make us jumpy, especially when there is limited data.
    The problem goes back to computing the correlations themselves.
  • 36.
    “Large p, small n”
    Notation
    p: number of variables (in this case, expression of a gene).
    n: number of measurements (total number of Affy slides).
    R has of the order of p² (p(p − 1)/2 to be exact) potentially interesting correlations.
    Could be dealing with of the order of 10,000² correlations (p ∼ 10,000 genes gives about 5 × 10⁷ distinct pairs).
    Have at best a few thousand measurements per gene: n ∼ 1000.
    If p ≪ n, then the definition of equation (5) gives a robust estimate of all those correlations, but that is not where we are!
  • 37.
    Artificial example
    [Figure: ordered eigenvalues (eigenvalue against index, 0 to 100) in four panels: p/n = 0.1, p/n = 0.5, p/n = 2, p/n = 10.]
    Figure 1: Ordered eigenvalues of the sample covariance matrix S (thin black line) and that of an alternative estimator (fat green line, for definition see Tab. 1), calculated from simulated data with underlying p-variate normal distribution, for p = 100 and various ratios p/n.
    Schäfer and Strimmer, Statistical Applications in Genetics and Molecular Biology, 4, 1, (2005).
  • 38.
    Explanation
    Spectrum of eigenvalues.
    Any eigenvalue equal to zero means the matrix is non-invertible.
    Dashed lines: actual eigenvalues.
    Thin black lines: estimated eigenvalues using equation (5).
    Green line: improved estimator.
    In general, if n < p then the estimated correlation/covariance matrix is non-invertible.
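    The claim that n < p leaves the estimated matrix non-invertible can be checked numerically. This sketch uses simulated standard-normal data; the sizes p = 50, n = 20, the seed and the 1e-10 threshold for "zero" are assumptions made here, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 50, 20                          # more variables than measurements
data = rng.normal(size=(n, p))         # rows: measurements, columns: genes

s = np.cov(data, rowvar=False)         # p x p sample covariance matrix
eigenvalues = np.linalg.eigvalsh(s)    # real eigenvalues in ascending order
rank = np.linalg.matrix_rank(s)
n_zero = int(np.sum(eigenvalues < 1e-10))   # eigenvalues that are numerically zero
```

    Subtracting the mean uses up one degree of freedom, so the rank is at most n − 1 and at least p − n + 1 eigenvalues vanish. Inverting such a matrix, as equation (8) requires, is not possible without a better estimator.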
  • 39.
    Strategies
    Reduce p: cluster data initially, then perform analysis on each cluster (Toh and Horimoto, 2002).
    Compute lower order partial correlations: compute first order partial correlations (Magwene and Kim, 2004).
    Employ improved estimator of correlations (Schäfer and Strimmer, 2005).
    These options are not necessarily exclusive.
  • 40.
    Shrinkage estimate: wordy explanation
    What is computed in equation (5) is an estimate of the correlation based on the available data, not the actual correlation we would obtain if we knew the underlying multivariate distribution. The two would coincide if we had much greater statistics (far more data). That said, we can use other estimators of correlation. Statisticians have pointed out that many other estimators are possible, some of which work better in the regime we are in (large p, small n).
    Shrinkage estimates attempt to combine different naive estimates to get an improved estimate. The principle has been around for some time (Stein, 1956), though its use has increased significantly in the last ten years.
  • 41.
    Shrinkage estimates: the details
    Notation:
    $C$ is the actual covariance matrix.
    $\hat{C}$ is an estimate of the covariance matrix.
    In computing $\hat{C}$ we could either attempt to compute it using the standard definition (“full”) or assume (for example) that all the off-diagonal entries are zero (“reduced”).
    $\hat{C}_F$ for the “full” estimate of the covariance matrix.
    $\hat{C}_R$ for the “reduced” estimate of the covariance matrix.
  • 42.
    Mean Square Error
    Defining
    $\mathrm{MSE}(\hat{C}) = E((\hat{C} - C)^2)$ , (11)
    $\phantom{\mathrm{MSE}(\hat{C})} = \mathrm{Var}(\hat{C}) + \mathrm{Bias}^2(\hat{C})$ . (12)
    $\mathrm{Var}(\hat{C}) = E((\hat{C} - E(\hat{C}))^2)$ , (13)
    $\mathrm{Bias}(\hat{C}) = E(\hat{C}) - C$ . (14)
    (Expectation operator is over the data that we have.)
    $\mathrm{Bias}(\hat{C}_F)$ is small but $\mathrm{Var}(\hat{C}_F)$ will be large.
    $\mathrm{Var}(\hat{C}_R)$ will be small but $\mathrm{Bias}(\hat{C}_R)$ will be large.
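    The step from (11) to (12) follows from adding and subtracting $E(\hat{C})$ inside the square. The sketch below treats a single matrix element, so that $\hat{C}$ and $C$ are scalars; that simplification is an assumption made here for brevity.

```latex
\begin{align*}
\mathrm{MSE}(\hat{C})
  &= E\big((\hat{C} - E(\hat{C}) + E(\hat{C}) - C)^2\big) \\
  &= E\big((\hat{C} - E(\hat{C}))^2\big)
     + 2\,(E(\hat{C}) - C)\,E\big(\hat{C} - E(\hat{C})\big)
     + (E(\hat{C}) - C)^2 \\
  &= \mathrm{Var}(\hat{C}) + \mathrm{Bias}^2(\hat{C}) .
\end{align*}
```

    The cross term vanishes because $E(\hat{C} - E(\hat{C})) = 0$, and $E(\hat{C}) - C$ is a constant that can be taken outside the expectation.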
  • 43.
    The problem
    Depending on the assumptions we use to estimate the correlation/covariance matrix,
    we can either compute a very poor estimate of the parameters in a very accurate model,
    or compute a good estimate of the parameters for a very inaccurate model (!)
    But maybe we can reconcile the two...
  • 46.
    Combining the two
    One can combine these two estimates:
    $C^* = \lambda \hat{C}_R + (1 - \lambda) \hat{C}_F$ , (15)
    $0 \le \lambda \le 1$ , (16)
    choosing a λ such that $\mathrm{MSE}(C^*)$ is minimised.
    Computing λ is normally very expensive.
    Ledoit and Wolf (2003) came up with a short analytical way of computing λ; Schäfer and Strimmer modified this for genomic data.
    We have an R package to do this.
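    The combination in equation (15) can be sketched with a hand-picked λ. This is only an illustration of the mixing step: the analytical choice of λ due to Ledoit and Wolf is not implemented here, the value λ = 0.3 is arbitrary, and the function name and simulated data are hypothetical.

```python
import numpy as np

def shrink_covariance(data, lam):
    """Equation (15): lam * C_R + (1 - lam) * C_F for a fixed, user-chosen lam."""
    c_full = np.cov(data, rowvar=False)        # "full" estimate
    c_reduced = np.diag(np.diag(c_full))       # "reduced": off-diagonal entries set to zero
    return lam * c_reduced + (1.0 - lam) * c_full

rng = np.random.default_rng(2)
data = rng.normal(size=(20, 50))               # n = 20 measurements, p = 50 genes

c_full = shrink_covariance(data, 0.0)          # lam = 0 recovers the full estimate
c_star = shrink_covariance(data, 0.3)          # lam = 0.3 is an arbitrary illustration

min_eig_full = np.linalg.eigvalsh(c_full).min()   # numerically zero: singular
min_eig_star = np.linalg.eigvalsh(c_star).min()   # strictly positive: invertible
```

    Even a modest weight on the reduced estimate lifts the smallest eigenvalue away from zero, so the combined matrix can be inverted although n < p. How much weight to use is exactly the question the MSE criterion is meant to answer.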
  • 47.
    Final Note
    The Schäfer and Strimmer estimate uses the “zero-covariance” low dimensional model, but this isn’t necessarily the best choice.
    Notably, while shrinkage estimates make much of incorporating information, they don’t explicitly include biological information.
  • 48.
    Results
    Using E. coli time series data (8 time slices), Schäfer and Strimmer examined 102 genes using correlations and partial correlations.
    [Figure: two gene networks from Schäfer and Strimmer, "Large-Scale Covariance Matrix Estimation". Panels: Correlations (Relevance Network, edges drawn for abs(r) > 0.8) and Partial Correlations (Graphical Gaussian Network, shrinkage GGM network).]
  • 49.
    Results of comparison
    Recover centrality of the sucA gene.
    lacA, lacZ and lacY genes have the largest absolute partial correlations.
  • 50.
    Outline
    1 Outline
    2 Gene Ontologies: Over-representation; Semantic similarity
    3 Associative Measures: Hypotheses; Linear Correlation; Partial Correlation; Non-linear measures
    4 Validation: DREAM
  • 51.
    So far, we have concerned ourselves with linear relationships.
    However, such an approximation may not be valid.
    Naively, one expects a more non-linear relationship between gene products.
    For example, Transcription Factor - target interactions are typically modelled using Michaelis-Menten kinetics.
    Furthermore, expression levels are derived after a number of transformations.
  • 52.
    One approach: Spearman correlation
    Basic idea: use ranks rather than raw data.
    Use nearly the same definition as for linear (Pearson) correlation.
    Must be careful about ties, i.e. raw data having precisely the same value (unlikely for floating point data).
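    The rank-based idea can be sketched by ranking the data and then applying the Pearson definition to the ranks. This minimal version assumes there are no ties; the function names and the simulated data are hypothetical.

```python
import numpy as np

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    values = np.asarray(values)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman(y1, y2):
    """Pearson correlation applied to the ranks of the data."""
    return np.corrcoef(ranks(y1), ranks(y2))[0, 1]

rng = np.random.default_rng(3)
y1 = rng.uniform(0.0, 5.0, size=200)
y2 = np.exp(y1)                        # monotonic but strongly non-linear in y1

pearson_rho = np.corrcoef(y1, y2)[0, 1]
spearman_rho = spearman(y1, y2)
```

    Because the relationship is monotonic, the ranks of the two variables agree exactly and the rank correlation reaches 1, while the Pearson correlation of the raw values stays noticeably below 1.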
  • 53.
    Comparison
    Comparison of different measures.
    Many other methods for non-linear measures are possible, the best known being Mutual Information.
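    As an illustration of Mutual Information as an association measure, the sketch below uses a simple histogram (plug-in) estimate. The choice of 10 bins, the simulated data and the function name are assumptions made here; binned estimates are biased upwards for small samples, and more careful estimators exist.

```python
import numpy as np

def mutual_information(y1, y2, bins=10):
    """Histogram (plug-in) estimate of the mutual information, in nats."""
    counts, _, _ = np.histogram2d(y1, y2, bins=bins)
    p_joint = counts / counts.sum()
    p1 = p_joint.sum(axis=1, keepdims=True)    # marginal distribution of y1
    p2 = p_joint.sum(axis=0, keepdims=True)    # marginal distribution of y2
    independent = p1 @ p2                      # joint distribution if y1, y2 were independent
    nonzero = p_joint > 0                      # empty cells contribute nothing
    return float(np.sum(p_joint[nonzero] * np.log(p_joint[nonzero] / independent[nonzero])))

rng = np.random.default_rng(4)
y1 = rng.uniform(-1.0, 1.0, size=2000)
y2 = y1 ** 2                                   # fully determined by y1, but not monotonic
unrelated = rng.uniform(-1.0, 1.0, size=2000)

pearson_rho = np.corrcoef(y1, y2)[0, 1]
mi_dependent = mutual_information(y1, y2)
mi_unrelated = mutual_information(y1, unrelated)
```

    Here the Pearson correlation is close to zero although y2 is a function of y1, whereas the mutual information is clearly larger for the dependent pair than for the unrelated pair. A rank correlation would also miss this relationship, since it is not monotonic.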
  • 54.
    Outline
    1 Outline
    2 Gene Ontologies: Over-representation; Semantic similarity
    3 Associative Measures: Hypotheses; Linear Correlation; Partial Correlation; Non-linear measures
    4 Validation: DREAM
  • 55.
    Modelling of gene interactions
    We have only just touched upon methods for inferring interactions between gene products using transcriptomic data. Some of the others include the use of
    Mutual Information/Spearman Correlations: address non-linearities.
    Kinetic models: attempt to infer interactions.
    Boolean Networks: model interactions as circuitry.
    Petri Nets: Prof. Ming Chen.
    Bayesian Networks: Dr. Chris Needham.
    Machine Learning methods: unsupervised/semi-supervised/supervised learning.
    Integration of other data sources.
    ...
  • 56.
    Explosion of methods
    FIGURE 1. The number of publications retrieved from a PubMed search of “Pathway Inference” or “Reverse Engineering.” On average, this number has been doubling every two years since about 1995. (Color is shown in online version.)
    Stolovitzky et al., Ann. N.Y. Acad. Sci. (2009)
  • 57.
    Validation - DREAM
    While there are a colossal number of methods out there, the validation of them is very much in its infancy.
    DREAM (Dialogue for Reverse Engineering Assessments and Methods) is an attempt to deal with this question.
    Features: unseen experimental data of (for example)
    Transcription Factor binding sites,
    artificial data (in silico),
    genome-wide interactions,
    is gathered and groups are invited to reproduce the interactions. The different groups' results are then compared against the data to determine how well they did.
    New challenges are presented on an annual basis.
  • 58.
    Some Results from DREAM 2 (2007)
    Challenge 1:
    Identify targets of transcription factor BCL6.
    53 genuine targets of BCL6 inferred from unpublished ChIP-chip data.
    147 decoys added.
    Task: identify the genuine targets from decoys by picking genes with similar expression patterns to BCL6.
  • 59.
    Results
    Best approaches were selective in the data sets employed. (Data sets where BCL6 was highly expressed or not expressed at all were used.)
    Semi-supervised learning was employed: known targets of BCL6 were used to train the best method.
    Used correlations.
    1st-order partial correlations did badly.
    Basic correlations came close to the most sophisticated approaches.
  • 60.
    More results from DREAM
    Challenge 2: E. coli network.
    From RegulonDB there is good evidence for the targets of assorted Transcription Factors.
    Task: identify targets.
    Results
    Best method used Mutual Information and ideas behind 1st-order partial correlation.
    Correlations and Partial Correlations were not too far behind.
    Level of identification of targets was low, perhaps 5%.
  • 61.
    Conclusions
    GO terms allow us to handle large amounts of annotation in a structured fashion.
    Associative measures are a first attempt at using the huge amounts of expression data that are out there.
    Very simple ideas such as correlation work surprisingly well (or rather, more complicated methods of association don’t give orders of magnitude better performance).
    A long way to go nonetheless.
    The type of expression data we select;
    a clear understanding of the microarray/RNA-seq/... technology;
    may be even more important.