Cool Informatics Tools and Services
      for Biomedical Research


                David Ruau, PhD.
                August 1st, 2012                           @druau



     Sponsored by the Office of Postdoctoral Affairs and
                 the Lane Medical Library
BIG DATA
Big Data in Biomedicine




http://www.nature.com/news/gene-data-to-hit-milestone-1.11019
We live in a Big Data world
Course outline


1. Analyzing genomic data
   1. Traditional bioinformatics tools
   2. Microarrays/gene lists without any code
   3. Microarrays/gene lists with code
   4. NGS and mRNA-seq
2. Beyond genomics
   1. Protein-protein interaction networks
3. General data handling tools
   1. Storing your data
   2. Data are dirty
4. Statistics made easy
5. Graphics rule!
6. Demystifying “the work”! (the code)
7. Conclusion + Q&A
Traditional bioinformatics tools
Bioinformatics software to solve everyday problems.

The EMBOSS tool suite: http://emboss.sourceforge.net/
One web portal is: http://mobyle.pasteur.fr/cgi-bin/portal.py
- DNA / amino acid pairwise global and local alignment
- Sequence feature analysis (CpG islands, gene scan, restriction enzyme sites,
  2D/3D structure...)
- Protein structure and domains
- Similarity search (BLAST, PHI-BLAST, PSI-BLAST, DELTA-BLAST...)
- Phylogenetics (trees from multiple alignments)
- ...
[Figure: tree built with the UPGMA joining method]
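For readers who want to see what the UPGMA joining method does, here is a minimal sketch in base R. It assumes a made-up distance matrix between four hypothetical sequences (A to D); in practice the distances would come from a multiple alignment. UPGMA is average-linkage hierarchical clustering, which base R provides through hclust().

```r
# Toy pairwise distances between four hypothetical sequences (made-up values).
d <- matrix(c( 0,  2,  6, 10,
               2,  0,  6, 10,
               6,  6,  0, 10,
              10, 10, 10,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# Average linkage on a distance matrix is the UPGMA joining method.
tree <- hclust(as.dist(d), method = "average")

# Distances at which clusters are joined, in merge order.
merge_heights <- tree$height
print(merge_heights)

# plot(tree)  # uncomment to draw the dendrogram
```

Note that hclust() reports the distance at which two clusters are joined; a phylogenetics program would usually draw each branch at half that distance.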
Traditional bioinformatics tools
Bioinformatics software to solve everyday problems.

Some tools are provided through database interfaces such as NCBI Entrez.
- The UCSC genome browser.
   - The ENCODE project results
   - For example: visualize GC content and restriction enzyme sites in your
       gene of interest.
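The genome browser shows GC content and restriction sites as tracks. As a rough illustration of what those tracks contain, here is a small base R sketch. The sequence is made up, the function names are hypothetical, and EcoRI (GAATTC) is just one example enzyme.

```r
# Fraction of G and C bases in a DNA string (case-insensitive).
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# GC content in consecutive, non-overlapping windows of a given width.
gc_windows <- function(dna, width) {
  starts <- seq(1, nchar(dna) - width + 1, by = width)
  sapply(starts, function(s) gc_content(substr(dna, s, s + width - 1)))
}

# Start positions of a restriction site (EcoRI, GAATTC, by default).
# Returns -1 when the site is absent.
find_site <- function(dna, site = "GAATTC") {
  as.integer(gregexpr(site, toupper(dna), fixed = TRUE)[[1]])
}

seq_example <- "ATGCGCGAATTCATATATGGCC"  # made-up sequence
print(gc_content(seq_example))
print(gc_windows(seq_example, width = 11))
print(find_site(seq_example))
```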
Stating the obvious

Just because you have a GUI
  does not mean the analysis is
      brain-dead simple.
Analyzing genomic data
Analyzing microarray gene expression data without any code.

GenePattern: http://genepattern.broadinstitute.org/gp/
Analyzing genomic data

   Upload your expression data as a text file.
   GenePattern takes RES and GCT files. Conversion tools are provided
   to transform CEL files to GCT or RES.
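If your expression values are already in R, a GCT file is easy to write by hand. The sketch below assumes the GCT version 1.2 layout (a "#1.2" line, a line with the number of rows and columns, a header starting with Name and Description, then one row per gene). The function name write_gct and the toy matrix are hypothetical.

```r
# Write a numeric expression matrix (genes in rows, samples in columns)
# to a tab-delimited GCT v1.2 file.
write_gct <- function(expr, path, description = rownames(expr)) {
  con <- file(path, open = "wt")
  on.exit(close(con))
  writeLines("#1.2", con)                                     # version line
  writeLines(paste(nrow(expr), ncol(expr), sep = "\t"), con)  # rows, columns
  writeLines(paste(c("Name", "Description", colnames(expr)),
                   collapse = "\t"), con)                     # header
  body <- cbind(rownames(expr), description, expr)
  write.table(body, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Made-up expression values for two genes in two samples.
expr <- matrix(c(7.1, 8.3, 5.0, 5.2), nrow = 2, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("sample1", "sample2")))
gct_file <- tempfile(fileext = ".gct")
write_gct(expr, gct_file)
gct_lines <- readLines(gct_file)
print(gct_lines)
```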
Analyzing genomic data


StarBiogene http://web.mit.edu/star/biogene/index.html (Java web app)
- Part of GenePattern, but provides a pipeline-style process online

SeqExpress http://www.seqexpress.com/ (Windows only)
- Alternative independent application (less activity than GenePattern)

Expander http://acgt.cs.tau.ac.il/expander/
- Alternative independent application (less activity than GenePattern)

RMAExpress http://rmaexpress.bmbolstad.com/
- Useful for quality control of your microarrays.

Cluster http://bonsai.hgc.jp/~mdehoon/software/cluster/
- This is the original program to analyze microarray results. It has no pre-processing
   functionality, so you need to pre-process separately (using RMAExpress, for example).

SAM http://www-stat.stanford.edu/~tibs/SAM/ (Significance Analysis of Microarrays)
- To extract the differentially expressed (DE) genes. This is an Excel plug-in. Again, you
  need to pre-process separately.
Analyzing genomic data

Commercial solution
GeneSpring GX (first 20 days are free)
Access through subscription at Stanford via CMGM: http://cmgm3.stanford.edu
Interpreting your results

Interpreting a gene list relies on external knowledge.
Several resources / tools are available to help.

     KEGG:               http://www.genome.jp/kegg/
                         pathway database
     REACTOME:           http://www.reactome.org
                         pathway 2.0 database
     Gene Ontology:      http://www.geneontology.org/
                         the ultimate resource for gene function, processes, localization
     BioMart:            http://www.biomart.org/
                         portal providing access to multiple databases
     GSEA:               http://www.broadinstitute.org/gsea/index.jsp
                         part of GenePattern, but also available in R
     DAVID:              http://david.abcc.ncifcrf.gov/
                         to perform an over-representation analysis
     BiNGO:              http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html
                         over-representation analysis, but produces graphical results (Cytoscape)
     BioGPS:             http://biogps.org/
                         to find out where in the body, or in which cell line, your gene is expressed
Interpreting your results

Reactome
• Made to be used programmatically
      • Cytoscape (a network tool) has a plugin for Reactome.
      Just provide a gene list, or a list of genes plus the number of samples in which each
      gene is mutated (for Cox survival analysis).




- Retrieve a network from a gene list
- Do network analysis
- Perform Gene Ontology analysis
- Survival analysis
http://www.reactome.org/userguide/Usersguide.html#FI_Network_Tool
Interpreting your results

DAVID database
Performs fast over-representation analysis against different databases
- KEGG, Reactome, OMIM (diseases), GeneRIF (literature), protein domains, etc.




Protein domains
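What an over-representation analysis computes can be sketched in a few lines of base R. This is a plain one-sided hypergeometric test with made-up counts; the function name ora_pvalue is hypothetical, and DAVID's own scoring may differ from this textbook version.

```r
# Over-representation p-value for one annotation term
# (one-sided hypergeometric test).
#   n_universe : genes in the background
#   n_term     : background genes annotated with the term
#   n_list     : genes in your gene list
#   n_overlap  : genes in your list annotated with the term
ora_pvalue <- function(n_universe, n_term, n_list, n_overlap) {
  # P(X >= n_overlap) when drawing n_list genes without replacement
  phyper(n_overlap - 1, n_term, n_universe - n_term, n_list,
         lower.tail = FALSE)
}

# Made-up example: 10 of the 100 genes in the list carry a term that
# annotates 200 of 20000 background genes.
p <- ora_pvalue(n_universe = 20000, n_term = 200, n_list = 100, n_overlap = 10)
print(p)
```

The same p-value can be obtained with fisher.test(..., alternative = "greater") on the corresponding 2 x 2 table. With many terms tested at once, the p-values should be corrected for multiple testing, for example with p.adjust().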
Interpreting your results

BioGPS: exploring expression across tissues and cell lines




                               Look at other libraries of
                               tissues
Interpreting your results

RMAExpress and quality control of microarrays

Several tests exist to check whether a microarray performed correctly.

Hall of fame of failed microarrays:
http://plmimagegallery.bmbolstad.com/
Analyzing public gene expression data
Analyzing public microarrays with code (kind of...)
Analyzing public gene expression data




              Then click on the “TOP 250” button
Analyzing public gene expression data

                                      Top 250 genes




                                     R code
Next Generation Sequencing
The main NGS platforms are:
• Roche/454 (Genome Sequencer; GS)
• Illumina/Solexa (Genome Analyzer)
• SOLiD (Applied Biosystems)
Upcoming challengers:
• Ion Torrent (Life Technologies)
• Oxford Nanopore
                                                    What you should request




                                    Done by the
                                    core facility
Analyzing mRNA-seq
Analyzing mRNA-seq data: 4 steps.
1- Alignment and trimming of reads:
      [no GUI]                                           [with GUI and commercial]
      TopHat (alignment and splice junction mapping)     GenomeStudio from Illumina
      Cufflinks (assembly and RPKM estimates)            GenomeQuest [looks pretty awesome]
      Galaxy provides access to TopHat and Cufflinks.

2- Calling variants and indels:
     GATK (http://www.broadinstitute.org/gsa/wiki/index.php/Home_Page)
     VarScan (http://varscan.sourceforge.net/)
     SHRIMP2; VARiD; Atlas-SNP2; SomaticSniper...
     Interpretation of variants: SIFT (Galaxy)

3- Finding differentially expressed genes
      Cuffdiff (Galaxy)
      DEXSeq (R)

4- Visualization:
     SAVANT (http://genomesavant.com/savant/)
     IGV (http://www.broadinstitute.org/software/igv)
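The RPKM estimates mentioned in step 1 are reads per kilobase of transcript per million mapped reads. A minimal base R sketch of that formula follows, with made-up counts and gene lengths. Cufflinks itself uses a statistical model to assign reads to transcripts, so this only illustrates the unit, not the Cufflinks method.

```r
# RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
rpkm <- function(counts, gene_length_bp) {
  total_mapped <- sum(counts)
  counts * 1e9 / (gene_length_bp * total_mapped)
}

# Made-up read counts and gene lengths for three hypothetical genes.
counts  <- c(geneA = 500,  geneB = 1500, geneC = 8000)
lengths <- c(geneA = 1000, geneB = 3000, geneC = 2000)

rpkm_values <- rpkm(counts, lengths)
print(rpkm_values)
```

geneA and geneB get the same RPKM even though geneB has three times more reads, because geneB is three times longer.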
How to use Galaxy?
Analyzing mRNA-seq data: introducing Galaxy

      http://galaxy.psu.edu/
Working in the cloud




Dudley JT, and Butte AJ. 2010. In silico research in the era of cloud computing. Nat
Biotechnol 28: 1181–1185.
Summary mRNA-seq
Galaxy
This is a compendium of software. It even includes UNIX tools and EMBOSS.
Take-home message:
FASTQ files > TopHat > Cuffdiff > IGV (for differential expression)
FASTQ files > TopHat > GATK > IGV (for variant detection)




Where to find help: http://seqanswers.com

Analyzing RNA-seq using R
DEXSeq is an R / Bioconductor package.
R is a statistical programming language
widely used in bioinformatics
Additional tools for genomics
-- GenomeSpace: http://www.genomespace.org
     Collection of tools: GenePattern, Galaxy, Cytoscape, Genomica, etc. (apparently
free). Data are stored in the cloud on Amazon virtual machines.

If you do not want to do it yourself:
-- Science Exchange: https://www.scienceexchange.com/
      Science jobs for hire! This is where top core facilities compete to provide the
best service.
-- Assay Depot: https://www.assaydepot.com/
      like Home Depot but for science




-- TaskRabbit: http://www.taskrabbit.com/
      If science takes too much of your time!
Beyond genomics: results interpretation
Interpreting your gene list with protein-protein interaction networks.

iHOP: http://www.ihop-net.org/UniPub/iHOP/




Ingenuity Pathway Analysis (commercial):
access through CMGM at Stanford
Beyond genomics: results interpretation
Looking into PPI databases:
    IntAct: http://www.ebi.ac.uk/intact/
    BioGRID: http://thebiogrid.org/ (multi-gene search coming soon)
    HPRD: http://www.hprd.org/index_html

What about open-source solutions for searching for interactions between the genes in
your gene list?
• Cytoscape http://cytoscape.org
     •   BioNetBuilder http://chianti.ucsd.edu/cyto_web/plugins/
     •   ...
• R for programmatic access to databases
     •   http://brainchronicle.blogspot.com

The advantage of using R is that results are
reproducible and you can share your method
more easily than with a point-and-click interface.
Data management and manipulation

REDCap: http://project-redcap.org/
    Web app for building and managing online surveys and databases




To find participants: https://www.researchmatch.org

MySQL for a professional relational database.
   Requires some programming skills in SQL and database design.


Applications to query and build databases (goodbye command line):
[OS X]: Sequel Pro
[Windows]: SQLyog; Toad for MySQL...
Data are dirty...
How to clean your data more efficiently than doing everything by hand?

 12:10:00   9999999    POCT Comment                 GLUCOSE BY METER
 21:24:00        51    O2 Saturation, ISTAT (Ven)   ISTAT EG7, VENOUS
  5:39:00        91    Glu                          GLUCOSE BY METER
 10:58:00   9999999    Comments                     BLOOD CULTURE (2 AEROBIC BOTTLES)
  9:36:00   9999999    Report Status                BLOOD CULTURE (2 AEROBIC BOTTLES)
 16:25:00        25    CO2, Ser/Plas                METABOLIC PANEL, COMPREHENSIVE
  8:12:00       132    Glucose, Ser/Plas            METABOLIC PANEL, BASIC
  8:06:00        5.7   MONO, %                      CBC WITH DIFF
  8:01:00        9.6   Glucose                      METABOLIC PANEL, BASIC
 13:22:00       16.2   CO2 (a)                      BLOOD GASES, ARTERIAL
  4:45:00        2.7   MONO                         CBC WITH DIFF



DataWrangler @ Stanford
http://vimeo.com/19185801

Google Refine, from just down the road.
A bit less intuitive than Wrangler.


For more complex data transformations: reshape2 package in R
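Here is a rough idea of what scripted cleaning looks like on a few rows like the ones above, using base R only (reshape2 is not needed for this step). The column names, the reading of 9999999 as a placeholder for "no numeric result", and the name mapping for glucose are all assumptions made for the sketch.

```r
# A few rows from the table above; column names are hypothetical.
labs <- data.frame(
  time  = c("12:10:00", "21:24:00", "5:39:00", "16:25:00", "8:12:00"),
  value = c(9999999, 51, 91, 25, 132),
  test  = c("POCT Comment", "O2 Saturation, ISTAT (Ven)", "Glu",
            "CO2, Ser/Plas", "Glucose, Ser/Plas"),
  stringsAsFactors = FALSE
)

# Assumption: 9999999 is a placeholder for "no numeric result".
labs$value[labs$value == 9999999] <- NA

# Map spelling variants of the same analyte to one label
# (hypothetical mapping: anything starting with "Glu" is glucose).
labs$analyte <- ifelse(grepl("^Glu", labs$test), "glucose", tolower(labs$test))

# Extract the hour of the day from the time stamp.
labs$hour <- as.integer(sub(":.*", "", labs$time))

# Keep only the rows with a usable numeric value.
clean <- labs[!is.na(labs$value), ]
print(clean)
```

Because every step is written down, the same cleaning can be rerun unchanged when a new export arrives.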
Statistics made easy...
Excel... Obviously. But what else when you want something more powerful?

• Switch to statistical software like R.
   • R graphical interface: Deducer (http://www.deducer.org/)
     •   http://www.youtube.com/watch?v=T6kOvlMaFCA


The case for starting to use R
1. Powerful statistical procedures
     • R has become the lingua franca for statistical programming
2. Packages for everything from
     • Flow cytometry
     • DNA microarrays
     • RNA-seq
     • Google graph API
     • ... See http://goo.gl/RwER7
3. Graphics, graphics, graphics...
     • R graphical manual: http://goo.gl/qSHMQ
Graphics in R
Data Science Visualization: Circos

CIRCOS: http://circos.ca/
To visualize genome-scale interaction and
functional information




                                            CIRCOS is a Perl program. Some light
                                            programming is needed. But it is worth it!
Data Science Visualization
Tableau: http://www.tableausoftware.com/ Great for geolocated data
Data Science Visualization
Google Visualization: https://developers.google.com/chart/interactive/docs/gallery

Requires data in JSON format. Fortunately, a bridge with R is possible.




      Earthquake in Japan
Data Science Visualization
 Google Visualization: https://developers.google.com/chart/interactive/docs/gallery

 Motion chart
 http://www.youtube.com/watch?v=rnF-7TCIe08




R commands:
> library(googleVis)  # provides gvisMotionChart() and the Fruits demo data set
> M1 <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
> plot(M1)
Demystifying the work
  It's all about “reproducible research”

  Sharing your analytical process (i.e. what you did) is as important as the final manuscript.

  How do you share what you did with a graphical interface?

  The solution is to use a programming language, like R if suitable, and share your code.

Several tools can make your life easier.
RStudio or Deducer
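A shared script can be as short as the sketch below. It assumes simulated data and a hypothetical function name (run_analysis); the point is only that a fixed random seed, plus the code itself, lets anyone rerun the analysis and get the same number.

```r
# A tiny, fully scripted analysis: simulate two groups and compare them.
run_analysis <- function(seed) {
  set.seed(seed)  # fix the random number generator for repeatability
  expr <- data.frame(
    group = rep(c("control", "treated"), each = 10),
    value = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
  )
  t.test(value ~ group, data = expr)$p.value
}

p_first  <- run_analysis(42)
p_second <- run_analysis(42)
print(identical(p_first, p_second))

# Record the R version and loaded packages alongside the results.
print(sessionInfo())
```

Keeping the output of sessionInfo() with the results documents which R and package versions produced them.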


Come to the workshop in 2 weeks!
The kitchen
TextMate and Notepad++ for coding

Use a version control system like Git, hosted on GitHub or Bitbucket

To make research reproducible when data are not available:
DataThief: http://www.datathief.org/

To follow the latest buzz in science: Twitter     @druau




 Some R books. Most of those books are available online
 for free through the Stanford Library.
Q&A

This class was sponsored by the Office of Postdoctoral Affairs and
the Lane Library


Offline questions to druau@stanford.edu




                                Thanks!

Editor's Notes

  • #3 The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions
  • #6 The course will be organized around use case scenarios
  • #7 to #10 The EMBOSS suite is just one example of where to find software. Much more software is available online.
  • #11, #12 Analyzing gene expression data with a graphical interface simplifies the task, but that does not make it brain-dead simple.
  • #13 EXPANDER (EXpression Analyzer and DisplayER) is a Java-based tool for the analysis of gene expression data. SeqExpress is also for gene expression data.
  • #14 It relies on some R packages in the background.
  • #33, #34 Demo of graphics capabilities