Presentation at ZSJ 2013 by Shigehiro Kuraku
 


Slides from an oral presentation given in Japanese at ZSJ 2013 Meeting in Okayama, Japan, in September 2013.


    Presentation Transcript

    • ‘the complete set of phylogenetic trees derived from the proteome of an organism’ (Sicheritz-Pontén and Andersson, 2001. Nucleic Acids Res. 29: 545)
      genome-wide events + gene family-specific events
      August 2012, at Daitoku-ji Temple, Kyoto
    • Hypothesis A / Hypothesis B / Hypothesis C
      [Three tree diagrams; taxa in each: chicken, shark, lamprey, hagfish (Cyclostomes), human, amphioxus, tunicate]
      - Mol. phylogeny of 55 gene families: Kuraku et al., 2009. MBE
      - Globin gene phylogeny: Hoffmann et al., 2010. PNAS
      - Sea lamprey genome analysis: Smith, Kuraku et al., 2013. Nature Genetics
      - Composition of Hox/Dlx clusters: Neidert et al., 2001. PNAS; Irvine et al., 2002. J Exp Zool B; Force et al., 2002. J Exp Zool B; etc.
      - Mol. phylogeny of 33 gene families: Escriva et al., 2002. MBE
      - Amphioxus genome: Putnam et al., 2008. Nature
      - ParaHox clusters: Furlong et al., 2007. MBE
    • Kuraku and Kuratani, 2011
      Heuristic ML, JTT+G4; ML-BP/NJ-BP
    • (Kuraku & Kuratani, 2011. Genome Biol. Evol.) (cf. hidden paralogy)
    • Genome Resource & Analysis Unit, Center for Developmental Biology, RIKEN, Kobe, Japan
      Informatics; Modern sequencing; Molecular Developmental Biology
    • Sanger sequencing, cell sorting with FACS, clone distribution, etc.
      Illumina HiSeq1500: ~150 bp reads in Rapid Run mode; installed in November 2011
    • Not only sequencing: Kuraku et al., 2013. Nucleic Acids Res.; Amemiya et al., 2013
    • Our experiences at GRAS
      ・Main applications: RNA-seq & ChIP-seq
      ・Diverse non-model organisms for RNA-seq
      ・Troubleshooting with tight wet-dry communication
      ・Many requests with limited sample amounts
    • For retrieving complete genome and original transcriptome
      ・Sequencers ‘can’ produce ‘data’ from problematic samples
        Low-quality DNA/RNA, contamination, over-amplification, …
      ・Look carefully for acceptable pricing and service contents
        e.g. How many reads do you need?
      ・Longer Illumina reads are not necessarily beneficial
        ~150 bp on HiSeq & ~300 bp on MiSeq (as of September 2013)
        Prep of libraries with longer inserts
    • Species             | Sequenced at    | Gene model by         | Sequencing technology | Published in       | # of authors | Started in
      sea lamprey         | Wash. Univ.     | Yandell lab / Ensembl | Sanger                | Nat. Genet. (2013) | 59           | 2005?
      soft-shelled turtle | BGI             | BGI / Ensembl         | Illumina              | Nat. Genet. (2013) | 34           | 2010
      coelacanth          | Broad Institute | Broad / Ensembl       | Illumina              | Nature (2013)      | 91           | 2011
    • Sequenced at Wash. Univ. Genome Institute; international consortium (Smith, Kuraku, et al. 2013. Nature Genetics)
      Contributed analysis: vertebrate ‘new genes’; GC & codon usage bias; myelin-associated genes
      In-house annotation effort:
        Trained gene prediction setting available at Augustus web server
        GC-content & codon usage bias (Qiu et al., 2011. BMC Genomics)
        Horizontal gene transfer (Kuraku et al., 2012. Genome Biol. Evol.)
      http://www.ensembl.org/Petromyzon_marinus/Info/Index (as of September 2013; release 73)
        Coding genes: 10,415
        Incomplete genome assembly: Pax6 missing
        Incomplete gene annotation: Fgf8/17-A missing
    • Amino acid composition
      Methods: correspondence analysis (CA) of the frequencies of 20 amino acids
      Deviation of ‘gene model’ in lamprey genome
      Smith, Kuraku, et al. 2013. Nature Genetics
    • Codon usage bias
      Methods: RSCU (Sharp et al., 1986) and ENc (Wright, 1990)
      Species compared: sea lamprey, stickleback, Tetraodon, Takifugu, platypus, medaka, dog, human, mouse, ghost shark, zebrafish, chicken, anole lizard, opossum, X. tropicalis
      Heavy use of GC-rich codons (Qiu et al., 2011. BMC Genomics)
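The RSCU measure named on this slide can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the pipeline used by Qiu et al.: the function name `rscu` and the toy input sequence are hypothetical, the standard genetic code is assumed, and an in-frame coding sequence is expected. For each amino acid, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the nuclear standard code applies to the sequence being analysed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*":
            continue  # stop codons are left out
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)  # count under equal codon usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    # Toy alanine-only sequence: GCC x2, GCG x1, GCT x1, GCA x0.
    for codon, value in sorted(rscu("GCCGCCGCGGCT").items()):
        print(codon, round(value, 2))
```

Values above 1 mark codons used more often than expected under equal usage, so a GC-rich bias would show up as high RSCU for codons ending in G or C.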
    • Genomic DNA
      → [Sanger, 454, Illumina, and/or PacBio; heterochromatin etc.]
      → Raw reads
      → [Assembly; repeats, regions with low depth]
      → Genome assembly (contigs/scaffolds)
      → [Gene prediction (after ‘training’); ‘unusual’ genes]
      → ‘Gene model’ (protein-coding sequences)
      Reference: transcriptome, annotated genes in GenBank
    • Genomic DNA
      → [Sanger, 454, Illumina, and/or PacBio]
      → Raw reads
      → [Assembly]
      → Genome assembly (contigs/scaffolds)
      → [Gene prediction (after ‘training’)]
      → ‘Gene model’ (protein-coding sequences)
      Reference: transcriptome, annotated genes in GenBank
    • (cf. Assemblathon2: Bradnam et al., 2013)
      ‘NG50’ instead of N50
      CEGMA (Parra et al., 2007): coverage of CEGs
      CGAL, REAPR, ALE: evaluation by identifying misassemblies
      QUAST: computation of assembly summary
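The difference between N50 and NG50 mentioned on this slide can be shown with a short sketch. The function names and the toy scaffold lengths below are hypothetical and are not taken from QUAST or the Assemblathon scripts. N50 is the length of the scaffold at which the running total of length-sorted scaffolds first reaches half of the assembly size; NG50 uses half of the estimated genome size instead, which makes assemblies of different total size comparable.

```python
def _length_at_half(lengths, reference_total):
    """Length of the scaffold at which the cumulative sum of scaffolds,
    sorted longest first, reaches half of reference_total.
    Returns None if the assembly never reaches that point."""
    half = reference_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


def n50(lengths):
    """N50: the reference is the total assembled length."""
    return _length_at_half(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the reference is an estimated genome size (an external input)."""
    return _length_at_half(lengths, genome_size)


if __name__ == "__main__":
    scaffolds = [80, 70, 50, 40, 30, 20]  # toy lengths, total 290
    print(n50(scaffolds))         # half of 290 is reached at the 70 scaffold
    print(ng50(scaffolds, 400))   # half of 400 is reached at the 50 scaffold
    print(ng50(scaffolds, 1000))  # assembly covers less than half: None
```

In this toy case the incomplete assembly looks better by N50 than by NG50, which is the reason for preferring NG50 when comparing assemblies of the same genome.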
    • 248 core eukaryotic genes (CEGs)
      Species          | Assembly release      | # of CEGs found (including ‘partial’) | Published?
      human            | GRCh37 (hg19)         | 248 | First draft in 2001
      mouse            | GRCm38 (mm10)         | 239 | First draft in 2002
      X. tropicalis    | JGI_4.2               | 239 | Hellsten et al., 2010
      coelacanth       | LatCal1               | 236 | Amemiya et al., 2013
      spotted gar      | LepOcu1               | 235 |
      soft-shell turtle | PelSin_1.0           | 232 | Wang et al., 2013
      anole lizard     | AnoCar2.0             | 231 | Alföldi et al., 2011
      zebrafish        | Zv9                   | 230 | Howe et al., 2013
      chicken          | galGal4               | 220 |
      chicken          | WASHUC2.63 (galGal3)  | 210 | First draft in 2004
      Japanese lamprey | LetCam1               | 199 | Mehta et al., 2013
      sea lamprey      | PerMar1               | 172 | Smith et al., 2013
      little skate     | version2              | 77  |
      elephant shark   | (1.4x)                | 58  |
      unpublished; unpublished; Venkatesh et al., 2007
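The CEG counts in this table are easier to compare as a fraction of the 248 core genes. The sketch below does that arithmetic for a few rows copied from the table; the function name `completeness` and the choice of rows are only illustrative.

```python
TOTAL_CEGS = 248  # size of the core eukaryotic gene set used by CEGMA

# CEGs found (including 'partial'), copied from selected rows of the table.
cegs_found = {
    "human GRCh37 (hg19)": 248,
    "coelacanth LatCal1": 236,
    "chicken galGal4": 220,
    "Japanese lamprey LetCam1": 199,
    "sea lamprey PerMar1": 172,
    "little skate version2": 77,
    "elephant shark (1.4x)": 58,
}


def completeness(found, total=TOTAL_CEGS):
    """Percentage of the CEG set detected in an assembly."""
    return 100.0 * found / total


if __name__ == "__main__":
    for name, found in sorted(cegs_found.items(), key=lambda kv: -kv[1]):
        print(f"{name:28s} {found:3d}  {completeness(found):5.1f}%")
```

By this measure the sea lamprey assembly recovers roughly 69% of the CEGs, against 100% for the human assembly.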
    • (cf. Assemblathon2: Bradnam et al., 2013)
      ‘NG50’ instead of N50
      CEGMA (Parra et al., 2007): coverage of CEGs
      CGAL, REAPR, ALE: evaluation by identifying misassemblies
      QUAST: computation of assembly summary
      ‘Annotation Turnover’ and ‘AED’ (Eilbeck et al., 2009)
      Also, run CEGMA to check transcript diversity?
    • – Nakamura et al., 2013
    • - Phylogenetic properties of the species of interest
        e.g. ploidy level, distance to close relatives, …
        www.genomesize.com, www.timetree.org
      - Any clue about its molecular attributes?
        e.g. GC-content, repeats, intron/UTR length, …
        Using existing resources at SRA & Sanger traces at NCBI dbEST
    • - Genome or transcriptome to sequence?
        Any existing or emerging resources?
      - RNA-seq: sequence identification or quantification?
      - Sample prep mostly determines the fate of the project
        Quantification with Qubit; rRNA removal controlled with BioAnalyzer
        Replication > Depth (Rapaport et al., 2013. Genome Biol.)
      - Rigorous QC of prepared libraries before sequencing
        ChIP-qPCR before ChIP-seq
    • - Fostering more productive sequencing facilities in Japan
        GRAS accepts visits from facility managers and staff
      - Education of researchers with dual (wet/dry) capabilities
        ‘A sequencer or a bioinformatician?’
        Learning material: ‘Unix & Perl for Biologists’ by Korf Lab
        http://korflab.ucdavis.edu/unix_and_Perl/
      - Importing latest information from overseas
      → shigehiro-kuraku@cdb.riken.jp