Free software and bioinformatics

    Free software and bioinformatics - Presentation Transcript

    1. Free software and  biomedical research Alberto Labarga alberto.labarga@scientifik.info
    2. When Craig Venter was asked, “What makes you think you can do a better job with life and genetics than God?”, he  answered…
    3. we have computers
    4. and software too
    5. free software!
    6. biology is a data intensive science
    7. Scientific information available in 2010  will double every 72 hours
    8. data mining
    9. my data is mine!
    10. and your data is mine,  too!
    11. open source open data open access
    12. open science
    13. Sequence (DNA/RNA) Comparative genomics & phylogeny Protein sequence analysis & Regulation of gene  evolution expression; transcription  factors & micro RNAs Protein structure & function:  computational crystallography Protein families,  motifs and domains Chemical biology Protein interactions & complexes: modelling and prediction Pathway analysis Data integration & literature mining Image analysis Systems  modelling
    14. The first Atlas of Protein Sequence and Structure presented information about 65 proteins.
    15. In 1981 the EMBL Nucleotide Sequence Data Library was created. Version 2 was composed of 811 sequences, around 1 million bases entered by hand.
    16. Smith TF, Waterman MS (1981). "Identification of common molecular subsequences." J Mol Biol. 147 (1): 195-7.
    17. Altschul SF, et al. (1990). "Basic Local Alignment Search Tool." J Mol Biol. 215 (3): 403-10. 15,306 citations
    18. J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving  the sensitivity of progressive multiple sequence alignment. Nuc.  Acids. Res. 22, 4673 ‐ 4680
    19. In 1995 the European Bioinformatics Institute was created.
    20. http://emboss.sourceforge.net/ EMBOSS (The European Molecular Biology Open Software Suite) is a free Open Source software analysis package that provides a comprehensive set of sequence analysis programs specially developed for the needs of the molecular biology user community. The first requirements were based on a list of long-standing problems in existing commercial software (GCG) and the need for public source code. Within EMBOSS you will find around 200 programs (applications). Current version is 6.0.1.
    21. Main Programs in EMBOSS • Retrieve sequences from database • Sequence alignment • Nucleic gene finding and translation • Protein secondary structure prediction • Rapid database searching with sequence patterns • Protein motif identification, including domain analysis • Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats • Codon usage analysis for small genomes • Rapid identification of sequence patterns in large scale sequence sets • Presentation tools for publication
    22. open-bio.org • The Open Bioinformatics Foundation is a non-profit, volunteer-run organization focused on supporting open source programming in bioinformatics. • Its main activities are: – Underwriting and supporting the BOSC conferences – Organizing and supporting developer-centric "hackathon" events (Bio*)
    23. O’Reilly Books and Conferences
    24. http://www.ensembl.org
    25. http://www.uniprot.org
    26. Generic Model Organism Database project http://gmod.org
    27. DAS Concept Annotation server A Annotation server B Annotation server C Client Reference server http://www.biodas.org
    28. DAS Server • DAS request to retrieve features on a segment: • http://das.ensembl.org/das/ens_36_omim_genes/features?segment=1:1,1000000 • Result:
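
A minimal Python sketch of the features request on slide 28. It rebuilds the same URL and parses a DASGFF-style response. The sample XML is an illustrative stand-in (the slide showed the result as an image), and the helper names `das_features_url` and `parse_features` are hypothetical, not part of any DAS library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def das_features_url(server, source, segment):
    """Build a DAS 'features' request URL for one segment (e.g. '1:1,1000000')."""
    # Keep ':' and ',' unescaped so the URL reads like the one on the slide.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server}/{source}/features?{query}"

def parse_features(xml_text):
    """Yield one dict per FEATURE element found in a DASGFF document."""
    root = ET.fromstring(xml_text.strip())
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            yield {
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            }

# Made-up response used only to exercise the parser (assumption, not real data).
SAMPLE_RESPONSE = """
<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="1" start="1" stop="1000000">
      <FEATURE id="example_feature_1" label="Example gene">
        <TYPE id="gene">gene</TYPE>
        <START>10000</START>
        <END>20000</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

if __name__ == "__main__":
    print(das_features_url("http://das.ensembl.org/das", "ens_36_omim_genes", "1:1,1000000"))
    for feature in parse_features(SAMPLE_RESPONSE):
        print(feature)
```

Because every annotation server answers the same request shape, a client can send this query to several servers and merge the parsed features on one reference segment.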
    29. Das viewer
    30. http://www.ebi.ac.uk/dasty/
    31. Sequencing instruments: Illumina / Solexa Genetic Analyzer • Roche / 454 Genome Sequencer • Applied Biosystems SOLiD • Applied Biosystems ABI 3730XL • Throughputs shown: 3000 Mb/run, 100 Mb/run, 1 Mb/day
    32. Sequencing • Fragment assembly problem: the Shortest Superstring Problem; Velvet (Zerbino, 2008) • Sequence comparison: pairwise and multiple sequence alignments; dynamic programming algorithms, heuristic methods; PSI-BLAST (Altschul et al., 1997), SSAHA (2001), MUMmerGPU (2008) • Gene finding: Hidden Markov Models, pattern recognition methods; GenScan (Burge & Karlin, 1997)
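
The dynamic algorithm entry under sequence comparison can be illustrated with local alignment in the style of Smith and Waterman (slide 16). This is a minimal sketch under stated assumptions: a linear gap penalty and arbitrary illustrative scores (match 2, mismatch -1, gap -1). It is not how the heuristic tools named on the slide work.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

The table costs time and memory proportional to the product of the two sequence lengths, which is why database-scale searches rely on heuristics instead.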
    33. Genomes Nucleotides Proteins Structures Other molecules Interactions Experiments Literature Ontologies http://www.ebi.ac.uk/Databases/
    34. Practical course on databases and the integration of biological information
    35. Challenges of Data Integration • Different types of data (sequence, function, literature etc.) • Different data formats (FASTA, EMBL, Genbank, tab  delimited etc.) • Different storage formats (ASCII flatfile, XML, RDBMS) • No standard formats for common fields (citations,  descriptions, dates etc.) • Volume and size of data
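
One of the format conversions listed on slide 35, sketched in Python: reading FASTA records and writing them out as tab-delimited rows. The sample records are made up for illustration, and `parse_fasta` and `to_tab_delimited` are hypothetical helper names. Real pipelines would more likely use an established toolkit.

```python
import io

def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def to_tab_delimited(records):
    """Render records as 'id<TAB>description<TAB>length<TAB>sequence' lines."""
    lines = []
    for header, seq in records:
        identifier, _, description = header.partition(" ")
        lines.append("\t".join([identifier, description, str(len(seq)), seq]))
    return "\n".join(lines)

# Illustrative input only (assumption, not taken from a real database).
SAMPLE_FASTA = """>seq1 first example record
MRCSISLVLG
LLALEVALAR
>seq2
ACGT
"""

if __name__ == "__main__":
    print(to_tab_delimited(parse_fasta(io.StringIO(SAMPLE_FASTA))))
```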
    36. BioMart is a simple and robust data integration system for large scale data querying, providing researchers with fast and flexible access to biological databases http://www.biomart.org/
    37. Web Services http://www.ebi.ac.uk/Tools/
    38. Challenges when using tools in unison • Manually transfer data from one application to  another • Understand disparate data formats • Convert file formats where appropriate • Manage and understand disparate application  environments e.g. web browser, desktop  application
    39. submission curation ws ws ws ws ws dataflow workflow
    40. RESTful web services GET, POST HTML,XML,PNG REST: REpresentational State Transfer http://www.ebi.ac.uk/Tools/webservices/rest/dbfetch/uniprot/slpi_human
    41. Any web page is a web service http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniprot&id=alk1_human&style=html&format=default
    42. Friendly URL and XML documents • http://www.ebi.ac.uk/Tools/webservices/rest/dbfetch/uniprot/slpi_human • http://www.ebi.ac.uk/Tools/webservices/rest/dbfetch/uniprot/slpi_human/xml • http://www.ebi.ac.uk/Tools/webservices/rest/dbfetch/uniprot/slpi_human/fasta
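
The friendly URL pattern on slide 42 (database, entry identifier, and an optional format as path segments) can be sketched as a small Python helper. The base URL is the one shown on the slides and may no longer be served. `dbfetch_url` and `fetch_entry` are hypothetical names.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address as printed on the slide (assumption: it may have moved since).
BASE = "http://www.ebi.ac.uk/Tools/webservices/rest/dbfetch"

def dbfetch_url(db, entry_id, fmt=None, base=BASE):
    """Compose base/db/id[/format], escaping each path segment."""
    parts = [db, entry_id] + ([fmt] if fmt else [])
    return "/".join([base] + [quote(part, safe="") for part in parts])

def fetch_entry(db, entry_id, fmt=None, timeout=30):
    """Issue a plain HTTP GET for the entry and return the body as text."""
    with urlopen(dbfetch_url(db, entry_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    # Only the URLs are printed here; no network request is made.
    for fmt in (None, "xml", "fasta"):
        print(dbfetch_url("uniprot", "slpi_human", fmt))
```

Since the resource is addressed entirely by its URL, the same entry can be requested from a browser, a script, or a workflow tool without a dedicated client library.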
    43. Biomart query
        <Query virtualSchemaName="central_server_1">
          <Dataset name="hsapiens_gene_ensembl">
            <Attribute name="ensembl_gene_id"/>
            <Attribute name="ensembl_transcript_id"/>
            <Filter name="chromosome_name" value="1"/>
            <Filter name="band_end" value="p36.33"/>
            <Filter name="band_start" value="q44"/>
          </Dataset>
          <Dataset name="msd">
            <Attribute name="pdb_id"/>
            <Attribute name="experiment_type"/>
            <Filter name="experiment_type" value="NMR"/>
          </Dataset>
        </Query>
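
A hedged sketch of building the same kind of query document in code rather than by hand, which sidesteps quoting mistakes. Dataset, attribute, and filter names are copied from slide 43. `build_query` is a hypothetical helper, and nothing is sent to a BioMart server here.

```python
import xml.etree.ElementTree as ET

def build_query(schema, datasets):
    """Serialize a Query document.

    datasets is a list of (dataset_name, attribute_names, filters) tuples,
    where filters is a list of (filter_name, value) pairs.
    """
    query = ET.Element("Query", virtualSchemaName=schema)
    for name, attributes, filters in datasets:
        dataset = ET.SubElement(query, "Dataset", name=name)
        for attribute in attributes:
            ET.SubElement(dataset, "Attribute", name=attribute)
        for filter_name, value in filters:
            ET.SubElement(dataset, "Filter", name=filter_name, value=value)
    return ET.tostring(query, encoding="unicode")

if __name__ == "__main__":
    print(build_query(
        "central_server_1",
        [
            ("hsapiens_gene_ensembl",
             ["ensembl_gene_id", "ensembl_transcript_id"],
             [("chromosome_name", "1")]),
            ("msd",
             ["pdb_id", "experiment_type"],
             [("experiment_type", "NMR")]),
        ],
    ))
```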
    44. SOAP services SOAP: Simple Object Access Protocol fetchData(uniprot,wap_rat,default,xml)
    45. wsdbfetch fetchData (db, id, format, style) entry
    46. Perl client
        use SOAP::Lite;
        my $WSDL = 'http://www.ebi.ac.uk/Tools/webservices/wsdl/WSDbfetch.wsdl';
        my $soap = SOAP::Lite->service($WSDL);
        # fetchData dbName:id <format> <style>
        # (wap_rat is the entry used on slide 44)
        my $result = $soap->fetchData('uniprot:wap_rat', 'default', 'raw');
        die $soap->call->faultstring if $soap->call->fault;
        foreach my $i (@$result) { print "$i\n"; }
    47. EBI web services (analysis tools) run(params, data) jobid checkStatus (jobid) status getResults (jobid) results available poll (jobid, type) result file
    48. Perl client
        use SOAP::Lite;
        my $WSDL = 'http://www.ebi.ac.uk/Tools/webservices/wsdl/WSFasta.wsdl';
        my $fasta_client = SOAP::Lite->service($WSDL);
        # job parameters
        my %params = ();
        $params{'program'}  = 'fasta3';
        $params{'database'} = 'uniprot';
        $params{'email'}    = 'your@email.com';
        $params{'async'}    = 1;
        # input: a literal sequence, or a database reference (commented out)
        my $data = {type => "sequence",
                    content => "MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECI"};
        # my $data = {type => "sequence",
        #             content => "uniprot:slpi_human"};
        # submit the job; the service returns a job identifier
        my $jobid = $fasta_client->runFasta(
            SOAP::Data->name('params')->type(map => \%params),
            SOAP::Data->name(content => [$data]));
        print $fasta_client->poll($jobid);
    49. Perl client (cont.)
        # set a loop for checking job submission status
        # RUNNING, NOT_FOUND, ERROR, DONE
        my $status = $fasta_client->checkStatus($jobid);
        while ($status eq "RUNNING") {
            sleep 10;
            $status = $fasta_client->checkStatus($jobid);
        }
        # when job is done, poll for the results
        my $result;
        $result = $fasta_client->poll($jobid) if ($status eq "DONE");
        print $result if defined $result;
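
The asynchronous pattern from slides 47 to 49 (submit, check status, fetch results) sketched in Python against a stand-in client, so that it runs without the EBI service. `FakeJobService` and `wait_for_result` are hypothetical. Only the method names `run`, `checkStatus`, and `poll` and the status strings come from the slides.

```python
import time

class FakeJobService:
    """In-memory stand-in that reports RUNNING a fixed number of times."""

    def __init__(self, polls_until_done=2):
        self._polls_until_done = polls_until_done
        self._remaining = {}

    def run(self, params, data):
        jobid = f"job-{len(self._remaining) + 1}"
        self._remaining[jobid] = self._polls_until_done
        return jobid

    def checkStatus(self, jobid):
        if jobid not in self._remaining:
            return "NOT_FOUND"
        if self._remaining[jobid] > 0:
            self._remaining[jobid] -= 1
            return "RUNNING"
        return "DONE"

    def poll(self, jobid, result_type="output"):  # result_type value is an assumption
        return f"results for {jobid}"

def wait_for_result(client, jobid, interval=10.0, max_checks=360, sleep=time.sleep):
    """Check status until the job leaves RUNNING, then fetch the result."""
    status = client.checkStatus(jobid)
    checks = 1
    while status == "RUNNING" and checks < max_checks:
        sleep(interval)
        status = client.checkStatus(jobid)
        checks += 1
    if status != "DONE":
        raise RuntimeError(f"job {jobid} did not finish: status {status}")
    return client.poll(jobid)

if __name__ == "__main__":
    service = FakeJobService()
    job = service.run({"program": "fasta3", "database": "uniprot"}, [])
    print(wait_for_result(service, job, interval=0))
```

Capping the number of status checks keeps a stuck job from blocking the caller forever, which the open-ended loop on slide 49 does not guard against.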
    50. http://taverna.sourceforge.net/
    51. http://www.myexperiment.org/users/471
    52. high throughput genomics
    53. data management
    54. http://base.thep.lu.se/ https://carmaweb.genome.tugraz.at/
    55. Why must we support standards? • Unambiguous representation, description and communication – Final results and metadata • Interoperability – Data management and analysis • Integration of OMICS -> systems biology
    56. What to standardize? • CONTENT: Minimal/Core Information to be reported -> MIBBI (http://www.mibbi.org) • SEMANTIC: Terminology Used -> Ontologies, OBI (http://obi-ontology.org) • SYNTAX: Data Model, Data Exchange -> FuGE (http://fuge.sourceforge.net/), ISA-TAB, MAGE-TAB, PRIDE
    57. MIBBI: Standard Content • Promoting Coherent Minimum Reporting Requirements for Biological and Biomedical Investigations: The MIBBI Project, Taylor et al., Nature Biotechnology
    58. data analysis
    59. Microarray analysis workflow (diagram): Biological question -> Experimental design -> Microarray experiment -> Image analysis -> Expression quantification -> Pre-processing, Normalization -> Analysis (Testing, Estimation, Prediction, Clustering) -> Biological verification and interpretation (RT-PCR)
    60. r‐project.org R is an open source implementation of the S Language  • Many statistical and machine learning algorithms • Good visualization capabilities • Possible to write scripts that can be reused • Sophisticated package creation and distribution system • Supports many data technologies: XML, DBI, SOAP • Interacts with other languages: C; Perl; Python; Java • R is largely platform independent: Unix; Windows; OSX • R has an active user community  • cran.r‐project.org
    61. BioConductor • Access wide range of powerful statistical and graphical tools • Facilitate the integration of biological metadata in the analysis of  experimental data • Allow the rapid development of extensible, scalable, and  interoperable software;  • Promote high‐quality documentation and reproducible research. • Provide training in computational and statistical methods for the  analysis of genomic data.  http://www.bioconductor.org/
    62. Bioconductor Packages/libraries • Two releases each year that follow the biannual releases of R • 294 software packages • 490 metadata packages • >700 citations • Chart: number of software packages per release, from 1.1 to 2.3 (294 packages)
    63. Bioconductor for Microarray Analysis Quickly becoming the accepted approach • Open source • Flexible • (fairly) simple to use - intuitive • Wide applications – many packages
    64. affy package Pre-processing oligonucleotide chip data: • diagnostic plots, • background correction, • probe-level normalization, • computation of expression measures. plotAffyRNADeg barplot.ProbeSet image plotDensity
    65. mva package heatmap
    66. proteomics
    67. http://www.agml.org/
    68. Trans‐Proteomic Pipeline (TPP) is a collection of integrated tools for MS/MS proteomics http://tools.proteomecenter.org http://proteowizard.sourceforge.net http://www.thegpm.org/TANDEM
    69. Bioclipse Editor View View Console Properties http://www.bioclipse.net/
    70. Work with spectra: Spectrum plugin
    71. Work with sequences: BioJava plugin
    72. CMLRSS plugin: Chemistry on the web
    73. cytoscape http://www.cytoscape.org
    74. pyMol http://www.pymol.org
    75. image processing
    76. Open Microscopy Environment • OME is a multi‐site collaborative effort among academic  laboratories and a number of commercial entities that  produces open tools to support data management for  biological light microscopy. • The original OME server is an application written in Perl  running under Apache. It is accessed using a Web User  Interface, via a Java API, or using a plugin for ImageJ. • The server can support images in a wide range of file  formats. This model is also extendable allowing custom  data to be stored in the server. • It supports multiple users and provides appropriate  security for private research and collaboration. http://openmicroscopy.org
    77. OMERO
    78. OMERO
    79. beyond software
    80. At $150,000, the Polonator is the cheapest instrument on the market, says Harvard University's George Church, whose lab developed the technology in conjunction with Dover Systems. Plus, the tool uses five-fold less reagent than other platforms, and is the smallest instrument available. http://www.polonator.org/
    81. http://www.igem.org http://www.bioparts.org/
    82. where is the stuff
    83. http://bioinformatics.oxfordjournals.org
    84. http://nar.oxfordjournals.org
    85. http://www.biomedcentral.com/bmcbioinformatics/
    86. http://genomebiology.com/software/
    87. the future
    88. [Phil Bourne] Growth of open access scientists digital natives, always online, hybrids catalysts for change
    89. • Making scientific research “re‐useful” — We help  people and organizations open and mark their research  and data for reuse.  • Enabling “one‐click” access to research materials — We help streamline the materials‐transfer process so  researchers can easily replicate, verify and extend  research.  • Integrating fragmented information sources — We  help researchers find, analyze and use data from  disparate sources by marking and integrating the  information with a common, computer‐readable  language. 

    Alberto Labarga, 9 months ago
