Big Data goes 3D: BioLayout Express3D
Prof Tom Freeman
University of Edinburgh
Network Graphs of (Biological) Relationships
Many types of data, biological or otherwise, are best viewed and interrogated as networks, visualised as so-called network graphs. In biology these may include:
• Social interactions between individuals
• Transmission of disease
• Relationships (evolutionary, homology) between genes and proteins
• Interactions between proteins (data, co-citation, pathway models)
• ‘omics data
[Figure panels: Spread of TB via contact tracing; Protein homology; Pathways; Protein interaction]
Example: Microarray Gene Expression Data
• Can sequence and measure tissue-specific activity of 23,000 genes in the human body
• Microarrays comprise 1000s/millions of DNA probes and are routinely used to measure activity across the genome
• Produce highly complex data: analysis/visualisation is challenging
• BioLayout Express3D was originally developed to analyse this kind of data through the use of 3D network graphs
[Figure panels: Microarrays; Display of statistical hits; Display of clusters]
Example (cont.): Steps Involved in Analyzing Gene Expression Data
• Microarray data (many measurements over many samples) imported
• Co-expression defined using a correlation measure (read: is gene A upregulated in the same samples as gene B?)
• Genes (nodes) are connected to each other in a network based on their level of co-expression (edges) (read: pretty graphs!)
[Figure: 50,000 × 50,000 correlation matrix, about 1.25 billion calculations, filtered by a threshold on r]
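
The figure's 1.25 billion can be checked with a short worked calculation. This assumes the 50,000 refers to the number of probes and that each unordered pair of probes is correlated once:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2}
             = \frac{50{,}000 \times 49{,}999}{2}
             = 1{,}249{,}975{,}000 \approx 1.25 \times 10^{9}
\]
```
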
Example (cont.): The program’s workflow in detail
1. Data quality control, normalisation and annotation
2. Gene-to-gene Pearson correlation calculated for every probe set on the array
3. Correlations file filtered on a user-defined threshold (0 - 1.0), i.e. weak correlations are excluded
4. Edges drawn between nodes (genes) whose correlation is above the selected threshold
5. 2D or 3D visualisation
6. Clustering and visual exploration
CPU or GPU parallelization is used for all computationally intensive algorithms
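
The correlation and thresholding steps of the workflow can be illustrated with a minimal sketch. This is not BioLayout's own code: the function names (`pearson`, `coexpression_edges`), the toy gene names and the expression values are all hypothetical, and the loop is a plain single-threaded version of what the tool runs in parallel on CPU or GPU.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: treated as uncorrelated (an assumption)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold):
    """Return (gene_a, gene_b, r) for every gene pair with r above threshold."""
    edges = []
    for (gene_a, prof_a), (gene_b, prof_b) in combinations(expression.items(), 2):
        r = pearson(prof_a, prof_b)
        if r > threshold:  # weak (and negative) correlations are excluded
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical toy data: expression of four genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 5.0, 4.0],
}
edges = coexpression_edges(expression, threshold=0.9)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b}  r = {r:.3f}")
```

With a threshold of 0.9 only the geneA/geneB pair survives as an edge. Lowering the threshold adds weaker edges and produces a denser graph.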
Advantages of Graph-based Analyses of Complex Data using BioLayout Express3D
• Rapid calculation of networks from primary data
• Support for the visualization of large network graphs (10s of thousands of nodes, millions of edges)
• Rendering of the networks in 3D space with real-time interactive navigation
• Full range of tools for network visualization, inspection, querying and analysis
• Rapid calculations, as both CPU and GPU are used for parallel computation
• Can in principle be used to visualise data from all kinds of fields, and to link to primary data manipulation programs such as Excel
Modelling and Visualization of Stochastic Flow through Large Network Systems – e.g. biological pathways
• Standardized graphical notation system depicts the complex network of relationships within e.g. biological pathways
• Previously no way of using these models as a basis for the computational modelling of pathway function
• BioLayout can dynamically model the stochastic flow of ‘activity’ through large networks/pathways
• Can represent this flow visually
Basically: can model and animate how components of a complex network influence each other over time, and compare to real data to test the model.
Modelling and Visualization of Stochastic Flow through Large Network Systems
1. Pathway models drawn in yEd graph editor, parameterized and saved as .graphml files
2. Models imported into BioLayout and used to calculate time-dependent stochastic flow through the network
3. The results of flow simulations can be visualised as graphs (mouse-over function) or viewed as real-time animations where the size and colour of nodes are used to represent their activity
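
To give a feel for what "stochastic flow" through a network means, here is a minimal token-passing sketch. It is an illustrative assumption, not the algorithm BioLayout implements: the function `simulate_flow`, the toy pathway and the rule "each edge moves a random share of its source node's tokens per time step" are all hypothetical.

```python
import random


def simulate_flow(edges, initial_tokens, steps, seed=0):
    """Pass tokens along directed edges at random; return per-step token counts."""
    rng = random.Random(seed)  # fixed seed so a run is reproducible
    nodes = set(initial_tokens)
    for src, dst in edges:
        nodes.update((src, dst))
    tokens = {node: initial_tokens.get(node, 0) for node in nodes}
    history = [dict(tokens)]
    for _ in range(steps):
        arriving = {node: 0 for node in nodes}
        order = list(edges)
        rng.shuffle(order)  # random edge order so no branch is always favoured
        for src, dst in order:
            amount = rng.randint(0, tokens[src])  # random share of what is left
            tokens[src] -= amount
            arriving[dst] += amount
        for node in nodes:
            tokens[node] += arriving[node]  # arrivals become usable next step
        history.append(dict(tokens))
    return history


# Hypothetical toy pathway: one input, two parallel branches, one output.
edges = [
    ("input", "A"),
    ("A", "B"),
    ("A", "C"),
    ("B", "output"),
    ("C", "output"),
]
history = simulate_flow(edges, {"input": 100}, steps=10)
for t, state in enumerate(history):
    print(t, {node: state[node] for node in ("input", "A", "B", "C", "output")})
```

In an animation along the lines described above, `history[t][node]` could drive the size and colour of each node at time step `t`. Averaging many runs with different seeds would smooth out the randomness of any single run.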
What we’re looking for
The code is open source for non-commercial use. We’d love for you to use it in your research, be it in biology or anywhere else.
• Where do you see this tool making an impact in a research setting?
Maybe you’re a programmer who’d like to get involved in:
• Adapting the software to new applications
• Integrating it with other tools
• Exploring the visualisation capabilities of the tool in new settings
We’re also looking to develop the technology commercially.
• Can you think of any great market opportunities for BioLayout?
• Who should we be partnering with to develop the tool for this application? Who might want to license it?
Either way, we’d love to hear from you!
BioLayout Express3D Team
The Roslin Institute: Tim Angus, Derek Wright, Tom Freeman
EMBL-EBI: Anton Enright, Stijn van Dongen
Thanks to the challenge sponsors