
Bioinformatics foam 2013 program and abstracts



Bioinformatics Focus On Analytical Methods (FOAM) 2013 was run as part of CSIRO’s Computational and Simulation Sciences and eResearch Annual Conference and Workshops, and sponsored by the CSIRO Bioinformatics Core and The Australian Bioinformatics Network (ABN).
The first half of FOAM 2013 was aimed at CSIRO bioinformaticians, computational biologists and quantitative bioscientists, recognising that this is a once-a-year opportunity for staff across Australia to get together to discuss CSIRO-specific issues.
The second half of the meeting was aimed at bioinformaticians, computational biologists and quantitative bioscientists in general. Feedback to the ABN indicated a preference to hold bioinformatics-oriented meetings in conjunction with other events, rather than initiating a standalone conference (at least for the time being). CSIRO’s CSS conference gives us a great opportunity to hold a very affordable (i.e., free to members) ABN event at a great location in a city with a high concentration of Australian life-science research.
We saw a diverse and engaging agenda of presentations, reflecting the breadth of research that falls under the heading “bioinformatics”. We encourage you to get a sense of the event by checking out those presentations uploaded to the Australian Bioinformatics Network Slideshare:



  1. 1. Bioinformatics Focus On Analytical Methods 2013
  2. 2. irony, n. pron: /aɪərənɪ/, as in “This page was intentionally blank until we put this footer in”
  3. 3. Welcome
     Dear Colleagues,
     Welcome to Bioinformatics Focus On Analytical Methods (FOAM) 2013, run as part of CSIRO’s Computational and Simulation Sciences and eResearch Annual Conference and Workshops, and sponsored by the CSIRO Bioinformatics Core and The Australian Bioinformatics Network (ABN).
     The first half of FOAM 2013 is aimed at CSIRO bioinformaticians, computational biologists and quantitative bioscientists, recognising that this is a once-a-year opportunity for staff across Australia to get together to discuss CSIRO-specific issues.
     The second half of the meeting is aimed at bioinformaticians, computational biologists and quantitative bioscientists in general. Feedback to the ABN indicated a preference to hold bioinformatics-oriented meetings in conjunction with other events, rather than initiating a standalone conference (at least for the time being). CSIRO’s CSS conference gives us a great opportunity to hold a very affordable (i.e., free to members) ABN event at a great location in a city with a high concentration of Australian life-science research.
     We have a diverse and engaging agenda of presentations, reflecting the breadth of research that falls under the heading “bioinformatics”. We also want to encourage you to use this opportunity to meet new colleagues and catch up with old friends and will again be holding a special Bioinformatics FOAM dinner. We will be privileged to hear from Graham Cameron, Director of the Bioinformatics Resource Australia – EMBL (BRAEMBL), at that event.
     We hope you enjoy Bioinformatics FOAM 2013 and welcome your feedback and ideas about how to make future events even better.
     With best wishes from the Bioinformatics FOAM 2013 Organising Committee:
       • Annette McGrath (CSIRO Bioinformatics Core Leader)
       • David Lovell (Australian Bioinformatics Network Director)
       • Lars Jermiin (OCE Science Leader in Genomics)
  4. 4. Bioinformatics FOAM 2013: Program and Abstracts
     Start | Speaker | Running title

     Day 1: Wednesday 20 March
     13:30 | Annette McGrath | Welcome to Day 1
     13:35 | Ross Crowhurst | A draft genome sequence of European pear (Pyrus communis L. ‘Bartlett’)
     13:55 | Jason Ross | “Stop_gap, measure”. Tools for handling deep bisulphite sequencing data.
     14:15 | Tim Peters | Identifying differentially methylated regions in human genome
     14:35 | Denis Bauer | Cancer from every angle
     15:00 | Afternoon tea
     15:30 | Annette McGrath | An update on Bioinformatics Core activities
     15:50 | Alec Zwart | Reproducible Research and R
     16:10 | Neil Saunders | Version control in bioinformatics: our experience using Git
     16:25 | Steve McMahon and Philippe Moncuquet | The CSIRO Galaxy Pilot
     16:45 | Sean Li | Annotation of the Helicoverpa genome

     Day 2: Thursday 21 March
     9:00 | Shared keynote session with CSS conference
     10:00 | Morning tea
     10:30 | Group Discussion | Bioinformatics at CSIRO
     12:00 | Lunch
     13:00 | Lars Jermiin | A revised phylogenetic protocol
     13:15 | Paul Greenfield | Error correction in primary sequence reads
     13:30 | Rob Lanfear | Identifying optimal partition schemes and models for molecular phylogenetic data
     13:45 | David Yeates | Phylogenetics in the context of collections-based research
     14:00 | Stuart Denman | The role of phylogenetics in the context of ecogenomics
     14:15 | Peter Grewe | Examining population/gene phylogenies: can old school allozyme techniques help guide Next Gen research?
     14:30 | Afternoon tea
     14:45 | David Lovell | Welcome to ABN members
     14:50 | Graham Cameron | Introducing Bioinformatics Resource Australia EMBL (BRAEMBL)
     15:30 | Roy Storey | Ensembl for non-model organisms
     15:50 | Bruno Gaeta | Characterising the human immunoglobulin heavy chain locus by ultra-deep sequencing of rearranged immunoglobulin genes
     16:15 | CSS conference close
     18:30 | Bioinformatics Dinner | Dinner speaker: Graham Cameron on "The Genesis of EBI"
  5. 5. Day 3: Friday 22 March
     9:00 | Joint session with Visualisation in Science workshop
     9:00 | Ajay Limaye | Tools for Effective Volume Exploration
     9:30 | Felice Frankel | Communicating Science Visually (Live streaming of keynote from VIZBI conference Boston)
     10:30 | Morning Tea
     11:00 | Cecilia Deng | Integration of WGS, RNA-Seq, and comparative genomics reveals the candidate effector repertoire of closely related Venturia pathogens of the Maloideae
     11:20 | Mani Grover | Association of detailed Drug data with predicted candidate genes in Gentrepid
     11:40 | Melissa Davis | Rewiring the dynamic interactome: alternative splicing alters protein interactions across human tissues
     12:00 | Paul Berkman | GAME: modelling a genes-eye view of evolution
     12:15 | Vidana Epa | Accurate Structural Modelling of the Interaction of a Designed Ankyrin Repeat Protein with the Human Epidermal Growth Factor Receptor 2
     12:30 | Lunch
     13:30 | Andrew Lonie | Progress on the Genomics Virtual Laboratory
     13:50 | Ross Lazarus | Transmuting dark script matter into reproducible tools
     14:10 | Tamsyn Crowley | Milking the pigeon
     14:30 | Nathan Hall | Bioinformatician – more than just a number cruncher (or bridging the gap between computer scientists and biologists)
     15:00 | Ross Lazarus | Afternoon Tea
     15:30 | Lauren Bragg | Shining a light on dark sequencing: Characterising errors in Ion Torrent PGM data
     15:50 | Ken Doig | PipeCleaner: Sanitation for your NGS pipeline
     16:10 | Tony Papenfuss | Making sense of tumours in man, mouse and devils
     16:30 | David Lovell | Australian Bioinformatics Network: update to members
     17:00 | Workshop Close
  6. 6. Authors (Speaker): Denis Bauer
     Title: Cancer from every angle
     Abstract: In order to understand disease states or cancer progression, we need to gain a better insight into the interplay of different regulatory mechanisms in the cell. However, modern high-throughput data generation allows us to capture only a discrete snapshot of cellular regulation, e.g. RNA, DNA, methylation. Our goal is hence to build predictive models from these layers of discrete omics data that capture the continuous regulatory interplay to inform medical genomics research. To achieve this, we generated matched genetics, transcriptomics, epigenomics as well as microbiomics data from lean and obese colorectal cancer patients. We employ statistical and machine learning methods that integrate information from the different omics data sources at single base resolution to identify regions with functional relevance for cancer development and prognosis.

     Authors (Speaker): Paul Berkman
     Title: GAME: modelling a genes-eye view of evolution
     Abstract: It is now nearly half a century since the establishment of game theory as a mechanism for studying evolution. While the primary application of this work has been at the population and species level, the genes-eye view of evolution was postulated only shortly after evolutionary game theory itself. However, an experimental or empirical approach to the genes-eye view has not been well developed, primarily due to the challenges associated with measuring how genes act as agents over the course of evolution, with the first mathematical theory describing this perspective only published in 2011. Major advances in our understanding of the core tenets of genetics and biochemistry over the last few decades are providing the data needed to calibrate the genes-eye approach, and high-throughput sequencing technologies promise to provide even more such data.
     In this talk I will present GAME (Gene-Agent Modelling of Evolution), a software package designed for agent-based modelling of evolution from the gene perspective. This model provides a simulation of changes to the value and fitness of individual genes in a population of organisms over time. I will present preliminary results regarding the impacts of mutation and allelic diversity over time, testing the hypothesis that greater allelic diversity at a locus results in greater fitness for that locus.

     Authors (Speaker): Lauren Bragg
     Title: Shining a light on dark sequencing: Characterising errors in Ion Torrent PGM data
     Abstract: The highly anticipated Ion Torrent Personal Genome Machine (PGM) debuted on the sequencing market in 2011. Novel platform design, most notably the measurement of pH changes to detect polymerisation events, yielded the first sequencing platform under $100K. The long read lengths (now 400bp) and marketed high base-accuracy suggested that the PGM would supersede the Roche 454 platform for most applications. To identify potential applications for this new technology, I analysed a number of re-sequencing datasets generated using the PGM, investigating the errors and biases introduced by the PGM library preparation and sequencing process. In this presentation I will be discussing these results and how errors/biases introduced by the PGM platform may compromise specific studies.

     Authors (Speaker): Graham Cameron
     Title: Bioinformatics Resource Australia/EMBL (BRAEMBL)
     Abstract: The EMBL Australia Bioinformatics Resource developed out of the EBI Mirror Project, whose goal was to create a “mirror” of EBI databases and services at UQ to serve Australia. This was motivated in large part by a desire to remove perceived disadvantages in the exploitation of bioinformatics data and tools due to Australia’s geographical remoteness and its network connectivity to the rest of the world.
  7. 7. It turns out that mirroring the EBI in its entirety is impossible and probably not even desirable. Many of the services of the EBI depend on a complex and extensive IT context, and even for “mirrorable” services there is a real difficulty in keeping up with the data releases and updates from the original source. This has caused us to re-examine our goals and to cast the mission in less specific terms. It is to:
       • enable optimal exploitation of the tools and data of bioinformatics by Australian scientists
       • contribute to the global biomolecular information infrastructure in a way which showcases Australian science.
     This mission is entirely compatible with the underlying motivation for the mirror project, but is agnostic about the solution. Alongside the Mirror project, a related project at UQ, the Specialised Facility in Bioinformatics (SFB), provides compute capability to Australian bioinformatics. Our respecified mission is as fitting for the SFB as for the Mirror, and the two projects have been unified under this mission as a single project, the “Bioinformatics Resource Australia/EMBL” (BRAEMBL). We are now working out what is required in practical terms in pursuit of the BRAEMBL mission. The first stage of this was a data gathering exercise – a survey of bioinformatics activities and needs in Australia.
     I will:
       • present the key findings of this survey as indicators of the activities, mood and desires of our scientific constituency
       • give my opinion about the global trends in IT in the life sciences
       • present some emerging ideas about how we might marry modern IT and Australian bioinformatics
       • give some thoughts about components beyond BRAEMBL necessary to a healthy Australian bioinformatics ecosystem.

     Authors (Speaker): Ross Crowhurst, Chagné D, Pindo M, Thrimawithana A, Deng C, Ireland H, Fiers M, Dzierzon H, Cestaro A, Lu A, Storey R, Knaebel M, Saeed M, Montanari S, Kim YK, Nicolini D, Larger S, Stefani E, Allan AC, Bowen J, Johnston J, Malnoy M, Troggio M, Perchepied L, Sawyer G, Wiedow C, Won KH, Viola R, Hellens R, Brewer L, Bus VGM, Schaffer R, Gardiner SE, Velasco R
     Title: A draft genome sequence of European pear (Pyrus communis L. ‘Bartlett’)
     Abstract: We have sequenced the genome of European pear, Pyrus communis cultivar ‘Bartlett’/‘Williams’ Bon Chrétien’ using second generation sequencing technology (Roche 454). A draft assembly was produced from single end reads, 2 kb, and 8 kb insert paired end reads using Newbler (version 2.7). The assembly contained 142,083 scaffolds greater than 499 bases (maximum scaffold length of 1.29 Mb) covering a total of 577.3 Mb and representing 96.1% of the expected 600 Mb Pyrus genome. Gene prediction using Augustus (version 2.6.1) predicted 50,703 models, of which 5339 proteins are unique to European pear. Preliminary analysis indicated that 2279 SNP markers anchored 171 Mb of the assembled genome. Further analysis is in progress to improve anchoring. This preliminary ‘Bartlett’ genome sequence is a unique tool for identifying the genetic control of key horticultural traits and for developing better pear cultivars, enabling wide application of marker-assisted and genomic selection.

  8. 8. Authors (Speaker): Tamsyn Crowley
     Title: Milking the Pigeon
     Abstract: The pigeon is one of only a few birds that produce a nutrient substance, ‘crop milk’, to feed their young. This nutrient substance is produced in the crop by both male and female birds and has been shown to have functional similarities with mammalian milk. As with mammalian milk, crop milk is essential for squab growth, providing both nutritional and immune benefits. We have spent the last few years studying this interesting biological phenomenon employing many different tools, including bioinformatics. Until recently there was little genomic information available, hence we have utilised bioinformatics and experimental biology in order to gain an insight into the production and benefits of pigeon crop milk.

     Authors (Speaker): David LA Wood, Mark A Ragan, Nicole Cloonan, Sean M Grimmond and Melissa J Davis
     Title: Rewiring the dynamic interactome: alternative splicing alters protein interactions across human tissues
     Abstract: Transcriptomics continues to provide ever-more evidence that in morphologically complex eukaryotes, each protein-coding genetic locus can give rise to multiple transcripts that differ in length, exon content and/or other sequence features. In humans, the majority of loci give rise to multiple transcripts in this way. Motifs that mediate protein-protein interactions can be present or absent in these transcripts. Analysis of protein interaction networks has been a valuable development in systems biology. Interactions are typically recorded for representative proteins or even genes, although exploratory transcriptomics has revealed great spatiotemporal diversity in the output of genes at both the transcript and protein-isoform levels.
     The increasing availability of high-resolution protein structures has made it possible to identify the domain-domain interactions that underpin many protein interactions. Thus we are able to identify protein isoforms that gain or lose the ability to interact with other proteins by identifying the interaction domains present or absent in the set of isoforms produced from a given gene. Here we explore the impact of transcript and isoform diversity on protein interactions in 16 phenotypically normal human tissues. We use the sequenced transcriptomes of these tissues to interrogate the protein-coding transcriptional output of genes, identifying tissue-specific variation in the inclusion of protein interaction domains. We map these data to a set of high-quality protein interactions, and characterise the variation in network connectivity likely to result from tissue-specific alternative splicing. We find strong evidence for altered interaction potential in many genes, suggesting that transcriptional variation can significantly rewire the human interactome. We further identify interactions that are widespread and supported at the transcript level across most human tissues, as well as interactions that are restricted to single, or a small number of tissues. Our work highlights the rewiring of interaction networks resulting from alternate transcriptional events and underpinning the unique molecular interaction systems of each tissue.

  9. 9. Authors (Speaker): Stuart Denman
     Title: The Role of Phylogenetics in the Context of Ecogenomics
     Abstract: Culture-based methods focused on isolating and describing specific populations from within an environment are time consuming and heavily biased by the selected isolation media and methods employed. Culture-independent methods were devised to overcome these shortfalls. By far, the majority of these studies use molecular markers (DNA based) to identify and describe the microbes present and their changing abundance within these ecosystems. These range from methods that produce a high level/low resolution “fingerprint” of the community through to high resolution phylogenetic targeted methods and metagenomic phylogenetic assignment.

     Authors (Speaker): Cecilia Deng, Daniel Jones, Bruno Le Cam, Kim Plummer, Carl Mesarich, Matthew Templeton and Joanna Bowen
     Title: Integration of WGS, RNA-Seq, and comparative genomics reveals the candidate effector repertoire of closely related Venturia pathogens of the Maloideae
     Abstract: Host specificity is exhibited by different species and races of Venturia, a fungus that infects members of the Maloideae. V. inaequalis causes the economically important disease apple scab; however, certain isolates classified as V. inaequalis infect loquat but not Malus. V. pirina infects the related woody host European pear. The genetics of the interaction between apple and V. inaequalis follows the gene-for-gene model. Effectors (small pathogen proteins required for infection) are secreted into the plant/pathogen interface to suppress defence/enhance infection. A subset of effectors can be recognised by plant resistance gene (R) products to induce resistance. Seventeen gene-for-gene pairings between effector and R genes have been identified to date. The effector repertoire of Venturia isolates determines their cultivar specificity and probably host specificity. Draft genomes of three V. inaequalis isolates (two from apple, one from loquat) and an isolate of V. pirina have been assembled using second generation sequencing data. Additionally, RNA sequencing data have been obtained from samples taken at two time points after inoculation both in planta and in vitro. Comparative analysis of the predicted proteomes of the four Venturia isolates, coupled with detection of differential gene expression levels, has enabled the identification of candidate effectors determining host range. Eighty-four effectors are unique to V. pirina, six are unique to the V. inaequalis isolate specific to loquat, and 145 are specific to the apple-infecting isolates. These effector candidates are currently being characterised with respect to functionality.

     Authors (Speaker): Ken Doig and Jason Ellul
     Title: PipeCleaner: Sanitation for your NGS pipeline
     Abstract: Increasingly affordable sequencing platforms have led to their widespread adoption beyond research groups. The infiltration of desktop sequencers into the clinic has meant many institutions have had to beg, borrow or steal analysis pipelines that can crunch locally generated data piles. These pipelines are needed to refine the voluminous data generated by next generation sequencing (NGS) platforms. Ideally, they transform raw sequencing reads into meaningful biological data suitable for clinical reporting or research analysis. Unfortunately, there is little consensus across labs on how this should be done and indeed, there is such a vast range of software components with varying attributes that there is unlikely to be any standardisation in the near future. Here we present PipeCleaner as implemented at the Peter MacCallum Cancer Centre and describe its operation and utility in developing robust clinical pipelines.
     We will also present a number of sequencing scenarios where its application has enhanced our internal amplicon somatic mutation

     Authors (Speaker): Vidana C. Epa, Olan Dolezal, Larissa Doughty, Xiaowen Xiao, and Timothy E. Adams
     Title: Accurate Structural Modelling of the Interaction of a Designed Ankyrin Repeat Protein with the Human Epidermal Growth Factor Receptor 2
     Abstract: The human epidermal growth factor receptor 2 (HER2) is over-expressed in a significant proportion of breast cancers and is a target for therapeutic intervention with monoclonal antibodies and small molecule inhibitors. The novel binding proteins called Designed Ankyrin Repeat Proteins (DARPins) can be selected to be high-affinity binders to targets. The DARPin H10-2-G3 has been evolved to bind with picomolar affinity to HER2. In this work, we modelled the structure of the complex between the DARPin H10-2-G3 and HER2 using computational macromolecular docking. After analyzing the structural interface between the two proteins, we validated the structural model by showing that HER2 mutations at the putative interface significantly reduce binding to the DARPin but have no effect on binding to Herceptin, a HER2-specific monoclonal antibody. Very recently the X-ray crystal structure of this complex was solved and showed that the backbone RMSD between the computational model and the X-ray structure was better than 1 Angstrom. This work illustrates the utility of computational structural biology methodologies in elucidating the details of protein-protein interactions.

  10. 10. Authors (Speaker): Bruno Gaeta
     Title: Characterising the human immunoglobulin heavy chain locus by ultra-deep sequencing of rearranged immunoglobulin genes
     Abstract: The study of inherited variation in the immunoglobulin heavy chain (IGH) locus has lagged behind that of other loci. This locus undergoes recombination during B-lymphocyte differentiation, as well as somatic hypermutation after antigen challenge, and the resulting variation is difficult to distinguish from inherited polymorphisms. In addition, most large-scale human genomics projects (including the Human Genome Project and the 1000 Genomes Project) have ignored the IGH locus as they are based on sequencing DNA from lymphoblastoid cells in which the IGH locus has been recombined.
     As an alternative, our group has pioneered the use of ultra-deep sequencing of rearranged immunoglobulin genes to understand inherited variation in the germline locus. By sampling and comparing tens of thousands of rearranged sequences from an individual it is possible to identify the patterns of variation that are consistent with inherited polymorphisms instead of resulting from somatic mutation. It is also possible to genotype, and in some cases haplotype, the IGH loci for this individual. This approach has required the development of a whole new range of bioinformatics algorithms tailored to immunoglobulin genes, and has resulted in the discovery of several new polymorphisms as well as providing the basis for in-depth population analysis of the IGH locus. In this presentation I will outline the difficulties in applying standard genomic techniques to immunoglobulin genes and describe the bioinformatics methods we developed to study this unusual locus.

     Authors (Speaker): Paul Greenfield
     Title: Error correction in primary sequence reads
     Abstract: Sequence data is now cheap, and becoming cheaper all the time. The only problem with all this data is that it isn’t perfect and contains errors, some random, some more systematic. Commonly used tools, such as aligners and assemblers, know about these errors and deal with them in various ways, such as looking for consensus or doing error-tolerant string matching. Another way of dealing with sequencing errors is to correct them and there have been a number of published error-correction algorithms. This presentation looks at a number of these algorithms and discusses their effectiveness and performance. Do they actually work and correct the errors present in typical sequencing data? Are they sufficiently practical to be at all useful? How do you measure the effectiveness of a correction algorithm? Are these programs even worth running or are existing aligners and assemblers already handling errors well enough?
     The results presented here come from a comparison of published algorithms done for the paper describing Blue (a fast correction algorithm based on consensus and context).

     Authors (Speaker): Peter Grewe
     Title: Examining population/gene phylogenies: can old school allozyme techniques guide Next Gen research?
     Abstract: Determining phylogenetic relationships between/among populations can lead to understanding population genetic relatedness and differentiation in a way that is useful for management. In management of marine fish populations, demonstrating stock delineation has been difficult due to low levels of differentiation between areas, even when separated by large distances. This low level of differentiation has been attributed to many factors including very large population sizes, lack of sufficient barriers to migration/immigration, and even homoplasy in the data where mutation rates may be equal or greater than the rates of genetic drift that promote differentiation. However, genetic analyses in the past have been limited to small snapshots of the genome that have limited resolution and capability to examine populations in sufficient detail to address these issues. Next gen sequencing techniques are now opening the way forward to examine genetic data in ways never before thought possible. Our lab is now examining a protein polymorphism revealed by cellulose acetate electrophoresis and shown to have spatially different allozyme frequencies that appear to be temporally stable. We are unravelling the nucleotide variation responsible for the protein phenotypes in an effort to examine the phylogeny of these polymorphisms with increased resolution afforded at the nucleotide level. Examination of the phylogenies of allele variants should also give us an understanding of relationships among these populations. By mapping these relationships in a geographic context we hope to reveal important regions that can be used to define fish stocks useful for management purposes. We also hope to uncover subtle variation that would indicate finer relationships and further substantiate that these two populations are indeed reproductively isolated.

  11. 11. Authors (Speaker): Mani P. Grover, K. A. Mohanasundaram, Sara Ballouz, R. A. George, C. D. H. Sherman, M. A. Wouters
     Title: Association of detailed Drug data with predicted candidate genes in Gentrepid
     Abstract: Candidate gene prediction systems identify genes likely to be of functional relevance to a phenotype from associated genetic loci. Gentrepid, a human candidate gene discovery platform, utilizes two algorithms (Common Module Profiling and Common Pathway Scanning) to prioritize candidate genes for human inherited disorders. Recently, several protocols were developed to apply Gentrepid to the analysis of data from Genome Wide Association Studies (GWAS) using the Wellcome Trust Case Control Consortium (WTCCC) data set on seven complex diseases as an example (Ballouz et al, 2011). We are integrating drug databases now to enable researchers to immediately associate potential therapeutics with candidate genes. In the work presented here, we associated drugs with seven WTCCC phenotypes. For instance, Gentrepid predicted Peroxisome proliferator activated receptor delta (PPARD) as a candidate gene for Type II diabetes. Using the reference drug databases, we identified a dozen drugs that target PPARD. Drug Bank (Wishart et al, 2006) suggested 10 drugs used to treat lipid and glucose metabolic diseases, the Therapeutic Target Database (TTD) (Chen et al, 2002) indicated two drugs currently used to treat obesity and hyperlipidemia, and Pharm-GKB database (Hernandez et al, 2008) suggested two drugs used to treat prostatic neoplasms. For Carbohydrate (chondroitin 6) sulfotransferase 3 (CHST3), another Gentrepid candidate gene for Type II diabetes, Pharm-GKB suggested the same two drugs to treat prostatic neoplasms as identified for the PPARD gene. Thus, these drugs can be immediately utilized in further laboratory studies and in phase III clinical trials.

     Authors (Speaker): Nathan Hall
     Title: Bioinformatician – more than just a number cruncher (or bridging the gap between computer scientists and biologists)
     Abstract: What is a bioinformatician? What does a bioinformatician do? Biologist? Computer scientist? Statistician? All of the above? Every bioinformatician is different, but the one thing for sure is that a bioinformatician is much more than just a number cruncher and a critical role is to bridge the gap between biologists and computer scientists. Bioinformatics, especially in the area of next-generation sequencing, is growing at tremendous speed and will need to continue to do so in the future. This leads to the questions: “What are the best ways to go about teaching biologists to be bioinformaticians?”, and “Who should call themselves a bioinformatician, should this be an inclusive or exclusive club?”. I will relate my experiences in working in bioinformatics at the interface between biology and computer science and discuss the benefits and downfalls of becoming a generalist bioinformatician, and the fun of getting to think about a huge range of interesting problems.

  12. 12. Authors (Speaker): Lars Jermiin
     Title: A revised phylogenetic protocol
     Abstract: Molecular phylogenetics has acquired an increasingly central role in studies of genomes and genomics data. In this context, a sequential set of procedures (the phylogenetic protocol) is commonly applied to extract information from these types of data. The phylogenetic protocol, however, is flawed as it contains several illogical feedback loops. In addition, the assumptions of many of the phylogenetic methods used are often not considered in sufficient detail. In this seminar, I present a revised phylogenetic protocol with a sound set of feedback loops. I also present some of the phylogenetic tools that we have developed or are in the process of developing. Finally, I demonstrate the value of the revised phylogenetic protocol using insect and yeast genome data.

     Authors (Speaker): Rob Lanfear
     Title: Finding Good Models of Molecular Evolution in Phylogenetics
     Abstract: As phylogenetic datasets increase in size, it becomes more and more important to use an appropriate model of molecular evolution. Incorrect models can lead to incorrect inferences, and this problem is exacerbated with larger datasets. I will present some new methods and associated software, PartitionFinder, which simplify and automate model selection in phylogenetics.
These methods can be applied to datasets of any size - from a single locus to genome-scale datasets of many thousands of loci, and can be efficiently parallelised. Ill show how these methods can lead to huge improvements in the models of molecular evolution that are used, and discuss how this can improve the inferences we make from DNA sequence data.Sean Li Annotation of the Helicoverpa The high-throughput next-generation sequencing techniques and nowadays computational power have greatly genome facilitated the genome sequencing, assembly and annotation process in terms of data resources, cost and time. Yet each of three tasks has its own challenges to overcome. Especially in genome annotation, though a number of automatic pipelines have been proposed and shown some promising results, the approach of constructing a reasonably accurate gene set remains unclear. In this talk, we will present an overview of works we have done so far for annotating the Helicoverpa Genome, including annotation tools that have been applied, such as Maker, CEGMA, PASA and Blast2GO, methods to produce a consensus gene set from multiple annotation runs, experience that we have learnt from multiple approaches, questions raised from the annotation quality assessment, as well as the future plan towards the completion of Helicoverpa genome annotation.Andrew Lonie Progress on the Genomics The Genomics Virtual Laboratory (GVL) project, funded by NeCTAR, is building scalable infrastructure, workflow Virtual Laboratory platforms and community resources for Australian genomics researchers. At this stage, the GVL comprises: a prototype workflow management system based on the Galaxy framework, a bioinformatics toolkit (for command-line users), and a visualisation service based on the UCSC Genome Browser, all implemented on the Page 9 of 12
  13. 13. Bioinformatics FOAM 2013: Program and Abstracts NeCTAR Research Cloud; and a developing set of tutorials and exemplar workflows targeted at common high-throughput genomics tasks. In this talk I will demonstrate GVL capabilities and discuss progress and the GVL roadmap.
David Lovell: Australian Bioinformatics Network: update to members (and those yet to join!)
The Australian Bioinformatics Network aims to connect people to
  • people
  • resources
  • opportunities
to increase the benefits Australian bioinformatics can deliver. The ABN now has over 250 members. This presentation provides an update and gathers feedback about how the ABN can serve these members and those yet to join.
Annette McGrath: Bioinformatics Core update
I will present an update on the activities of the CSIRO Bioinformatics Core since our last meeting. In particular, I will update you on projects that are already underway and highlight upcoming projects for the CSIRO bioinformatics community, for your input.
Steve McMahon & Philippe Moncuquet: The CSIRO Galaxy pilot project
A Galaxy service pilot has been set up in CSIRO for the benefit of biologists and bioinformaticians within the organisation. The service pilot is implemented as a collaboration between CSIRO’s Information Management and Technology (IM&T) staff and the CSIRO Bioinformatics Core. This makes best use of the IT infrastructure and service delivery expertise of the IM&T staff and the bioinformatics domain expertise of the bioinformatics staff. This presentation outlines Galaxy, the way it has been implemented in CSIRO as a service pilot, and some of the outcomes and related experiences, as well as how to use it and how it can benefit both bioinformaticians and biologists.
This presentation encourages the bioinformatics community to show demand for a full production Galaxy service.
Tony Papenfuss: Making sense of tumour sequence data in man, mouse and devil
Analysis of next-generation sequencing data from tumour genomes requires pipelines built around specialised tools for SNV calling, copy number analysis and genomic rearrangement prediction. These tools must deal with many challenges. Some are intrinsic to the biology, such as contaminating normal cells, aneuploidy and intra-tumour heterogeneity, and some are extrinsic, for example sample quality and experimental design (or mis-design). Our work has focused on two areas: methods for predicting somatic structural variation, and going from pipeline results to biological insight. With examples from human, mouse and Tassie devil tumours, I’ll discuss how identifying genomic rearrangements works; how, motivated by different datasets, our approach has developed; and how we made sense of insanely complex genomic rearrangements.
Tim Peters: Identifying differentially methylated regions in human genome
The Illumina® HM450K array interrogates the human methylome by measuring methylation signals at approximately half a million CpG sites of biological interest. However, identifying the most differentially methylated (DM) probes alone, even with annotation, is of fairly limited use. What is more useful is identifying regions of DM: clusters of probes whose DM signals correspond with loci of particular biological functionality. A principled agglomeration of DM probes, informed by consecutivity, annotation and relative genomic position, along with a robust measure of differential methylation itself, is needed to properly extract these regions. Methods such as bump hunting (Jaffe et al. 2012) attempt to do this, but suffer from unnecessary parameterisation and operational
  14. 14. Bioinformatics FOAM 2013: Program and Abstracts issues. We present a less parameterised method that fits probes of interest to a weighted probability density function with kernel estimation, and is able to rank the most differentially methylated regions based on the density of the DM signal at any given point in the genome. This method is also able to detect regions of high variability of methylation in unlabelled data, and has scope for integration into existing visualisation tools and statistical analysis software packages.
Jason Ross: “Stop_gap, measure”. Tools for handling deep bisulphite sequencing data.
The current iteration of Ion Torrent instruments offers high throughput, long read lengths and relatively low costs, making them attractive platforms for deep sequencing. However, Ion Torrent sequencing (like 454 sequencing) has an error mode where the number of nucleotides in longer homopolymers is often incorrectly estimated. Bisulphite-treated DNA often has long runs of thymines, and the resulting Ion Torrent read errors introduce misalignments, making the estimation of cytosine methylation particularly difficult. “Stop_gap” is a software tool that reads BAM files, implements approaches to correct for such misalignments and writes corrected BAM files. “Measure” is software that can walk through a BAM file from any deep sequencing platform and calculate methylation rates at CpG or CpN sites. Output can be either a CSV or an Excel file. Both software tools can be executed from the command line as part of a pipeline, or alternatively are callable as Python classes.
Neil Saunders: Version control in bioinformatics: our experience using Git
Version control is an important aspect of reproducible bioinformatics research. However, it is still not employed as widely as we would like. In this presentation I aim to: (1) provide a basic introduction to Git, a popular open-source distributed version control system; and (2) illustrate how we use Git to manage projects in the CMIS Bioinformatics & Biostatistics group.
Roy Storey: Ensembl for non-model organisms
EnsEMBL started as a data and visualisation framework for the release of the Human genome (doi:10.1038/35057062). Since then, EnsEMBL has been accumulating and hosting genomes. Initially this was confined to vertebrate genomes, such as Human, Mouse and Zebrafish, but now the EnsEMBL Genomes project includes over 6000 genomes spanning all the biotic phyla. The EnsEMBL framework serves as a resource with which to warehouse and access "omics" data in a genomic context, in an extensible and reproducible manner. We present lessons learnt from running EnsEMBL as a local instance, incorporating mirrors of public genomes and genomes that we have sequenced, assembled and annotated. This provides insight into the challenges faced and how we have extended the application programming interface (API), website visuals and functionality to provide integration into other local services.
David Yeates: Phylogenetics in the context of collections-based research
CSIRO manages and develops four major biological collections in the Australian National Biological Collections Facility: the National Herbarium and the National Collections of Insects, Wildlife and Fish. Together these collections manage millions of specimens and are a significant resource for studying Australia’s biodiversity. Research scientists in the collections use phylogenetic research results to illuminate the tempo and mode of evolution in Australia’s biodiversity in an effort to understand the processes that have shaped biological evolution here.
Increasingly, these results have important implications for conservation and natural resource management, in particular helping us predict the impacts of threatening processes such as climate change. I will focus on the use of
  15. 15. Bioinformatics FOAM 2013: Program and Abstracts phylogenetic results using multilocus molecular datasets to understand biogeographic and coevolutionary processes in a number of different biological systems. The emerging promise of phylogenomic-scale datasets offers an expanded arsenal of tools to understand evolutionary patterns and processes, and brings with it a new set of challenges in analysis and interpretation.
Alec Zwart: Reproducible Research and R
Literate programming (Knuth 1984) systems such as CWeb or Sweave (in R) provide tools to enable the concept of reproducible research (Fomel & Claerbout 2009, Donoho 2010): the idea that a publication describing the results of research can (and should) also include the code and data needed to reproduce the results and figures presented in the publication. In this talk I briefly introduce the motivations for literate programming and reproducible research, and the concept of a compendium (Gentleman & Temple Lang 2007) as a format for distributing reproducible research. I also demonstrate a particularly easy-to-use literate programming system recently developed for the R statistical software, knitr+markdown, which is perfect for simple, reproducible reports of analyses.
  16. 16. Life’s complex… …use bioinformatics
The CSIRO Bioinformatics Core and the Australian Bioinformatics Network are proud to support Bioinformatics FOAM 2013.
The Core aims to complement and augment the efforts of bioinformaticians and bioinformatics teams across CSIRO.
The Australian Bioinformatics Network aims to connect people, resources and opportunities to increase the benefits Australian bioinformatics can deliver.
We wish all delegates a successful meeting.