Assessing the impact of transposable element variation on mouse phenotypes and traits

1. Assessing the impact of transposable element variation on mouse phenotypes and traits
   Thomas Keane, Vertebrate Resequencing Informatics, Wellcome Trust Sanger Institute (WTSI), Cambridge, UK
   Christoffer Nellåker and Chris Ponting, MRC Functional Genomics Unit, University of Oxford, Oxford, UK
   Thomas Keane, WTSI, 14th May, 2011
2. Transposable Elements (TEs)
   - Transposons are segments of DNA that can move within the genome
      - A minimal 'genome': the ability to replicate and change location
   - TEs dominate the landscape of mammalian genomes
      - 38-45% of rodent and primate genomes
      - Genome size is proportional to the number of TEs
   - Class 1 (RNA intermediate) and Class 2 (DNA intermediate)
   - Potent genetic mutagens
      - Disrupt expression of genes
      - Genome reorganisation and evolution
      - Transduction of flanking sequence
   - TEs are active amongst laboratory mouse strains
   - Mouse Genomes Project: whole-genome sequencing of 17 key laboratory mouse strains
      - 13 classical laboratory strains and 4 wild-derived inbred strains
      - Average of ~25x Illumina sequencing per strain
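As a rough guide to what "~25x" means, mean sequencing depth is the total number of sequenced bases divided by the genome length. The sketch below rests on two assumptions that do not come from the slides: a mouse genome of about 2.7 Gb and hypothetical 2 x 100 bp read pairs.

```python
# Back-of-the-envelope sequencing depth arithmetic.
# ASSUMPTIONS (not from the slides): ~2.7 Gb mouse genome, 2 x 100 bp read pairs.
GENOME_SIZE_BP = 2_700_000_000
READ_LENGTH_BP = 100
READS_PER_PAIR = 2


def bases_needed(depth: int, genome_size: int = GENOME_SIZE_BP) -> int:
    """Total sequenced bases required to reach a mean depth of `depth`."""
    return depth * genome_size


def read_pairs_needed(depth: int) -> int:
    """Number of read pairs needed to reach `depth`, rounded up."""
    bases_per_pair = READ_LENGTH_BP * READS_PER_PAIR
    return -(-bases_needed(depth) // bases_per_pair)  # ceiling division


if __name__ == "__main__":
    print(f"{bases_needed(25) / 1e9:.1f} Gb")        # 67.5 Gb
    print(f"{read_pairs_needed(25):,} read pairs")   # 337,500,000 read pairs
```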
3. Agouti Mouse Model
   - Dolinoy, PNAS 2007;104:13056–13061
4. Mouse TEs
   - 3 main classes of TEs in the mouse genome
      - Long interspersed nuclear elements (LINEs)
      - Short interspersed nuclear elements (SINEs)
      - Endogenous retrovirus superfamily (ERVs): ETn, IAP, MuLV, IS2, MaLR, VL30, RLTR
   - Key questions
      - What is the true extent and distribution of TEs in the germline of laboratory mouse strains?
      - What can we learn about the selective pressure acting on TEs maintained in the germline?
      - How much phenotypic variation, and how many complex traits, can we associate with TEs?
5. TE Calling
   - Terminology
      - B6+: present in the reference genome (C57BL/6J)
      - B6-: not present in the reference
      - TEV: transposable element variant
   - Computational calling methods
      - B6+: SVMerge* pipeline, which integrates calls from several read-pair-based SV 'deletion' (!) callers (Kim Wong, WTSI)
      - B6-: RetroSeq** pipeline developed, which identifies discordant mate pairs and compares them to a library of known TE sequences
   - Size estimation
      - Full-length element (~5-8 kb) vs. solo LTR (<1 kb)
      - 30-40x physical coverage from long-fragment (~3 kb) end reads (15 strains)
      - Test whether the insertion point is spanned by 3 kb fragment read pairs
   * Wong K, Keane TM, Stalker J, Adams DJ (2010) SVMerge: Enhanced structural variant and breakpoint detection by integration of multiple detection methods and local assembly, Genome Biology, 11:R128
   ** RetroSeq is available from https://github.com/tk2/RetroSeq
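The B6- idea can be illustrated with a simple clustering step. A pair counts as evidence when one mate maps uniquely to the reference and the other mate matches a known TE sequence. Nearby pairs of this kind are then grouped into a candidate insertion call. This is a minimal sketch and not RetroSeq's code. The `ReadPair` record, the MAPQ threshold, the clustering window and the support threshold are all hypothetical. The sketch also assumes the mates have already been compared against the TE library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Hypothetical, simplified read-pair record."""
    anchor_chrom: str            # chromosome of the uniquely mapped mate
    anchor_pos: int              # position of the uniquely mapped mate
    anchor_mapq: int             # mapping quality of that mate
    mate_te_hit: Optional[str]   # TE family matched by the other mate, if any


MIN_MAPQ = 20          # assumed threshold for a "unique" anchor
CLUSTER_WINDOW = 500   # assumed max gap between anchors in one cluster
MIN_SUPPORT = 3        # assumed minimum supporting pairs per call


def candidate_insertions(pairs):
    """Cluster TE-supporting discordant pairs into candidate insertions.

    Returns a list of (chrom, start, end, te_family, support) tuples.
    """
    hits = sorted(
        (p for p in pairs if p.anchor_mapq >= MIN_MAPQ and p.mate_te_hit),
        key=lambda p: (p.anchor_chrom, p.mate_te_hit, p.anchor_pos),
    )
    calls, cluster = [], []

    def flush():
        # Emit the current cluster as a call if it has enough support.
        if len(cluster) >= MIN_SUPPORT:
            first, last = cluster[0], cluster[-1]
            calls.append((first.anchor_chrom, first.anchor_pos,
                          last.anchor_pos, first.mate_te_hit, len(cluster)))

    for p in hits:
        prev = cluster[-1] if cluster else None
        if prev and (p.anchor_chrom != prev.anchor_chrom
                     or p.mate_te_hit != prev.mate_te_hit
                     or p.anchor_pos - prev.anchor_pos > CLUSTER_WINDOW):
            flush()
            cluster = []
        cluster.append(p)
    flush()
    return calls
```

Sorting by TE family before position keeps evidence for different element types apart. As a result, a SINE and an IAP hit at the same locus are never merged into a single call.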
6. B6+ TEV Example
   - The C57BL/6NJ strain carries the ERV; it is absent in the DBA/2J strain
      - Read pairs spanning the flanks denote absence
   [Figure panels: DBA/2J, C57BL/6NJ]
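For a B6+ element, a strain that lacks the element produces read pairs whose mates flank the reference copy. The apparent insert size of these pairs exceeds the library size by roughly the element length. A minimal sketch of that test follows. The expected insert size, tolerance and support threshold are hypothetical defaults, not SVMerge parameters.

```python
def count_absence_pairs(pairs, te_start, te_end,
                        expected_insert=500, tolerance=150):
    """Count read pairs that bracket a reference TE and imply its absence.

    pairs: (left_mate_start, right_mate_end) reference coordinates per pair.
    expected_insert / tolerance: hypothetical library fragment size settings.
    """
    te_len = te_end - te_start
    count = 0
    for left, right in pairs:
        if not (left < te_start and right > te_end):
            continue  # pair does not flank the element
        # In a strain lacking the TE, the real fragment is roughly the
        # apparent span on the reference minus the element length.
        implied_fragment = (right - left) - te_len
        if abs(implied_fragment - expected_insert) <= tolerance:
            count += 1
    return count


def te_absent(pairs, te_start, te_end, min_support=3, **kwargs):
    """Call the reference TE absent if enough flanking pairs support it."""
    return count_absence_pairs(pairs, te_start, te_end, **kwargs) >= min_support
```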
7. B6- TEV Example
   - NOD/ShiLtJ: full-length (~8 kb) IAP insertion
      - Not spanned by 3 kb fragment reads
   [Figure panels: 3 kb fragments; zoomed into breakpoint]
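The size test can be sketched as follows. With ~3 kb fragments, a read pair can bridge an insertion point only if the inserted sequence is short enough for both mates to fall outside it. A solo LTR (<1 kb) is therefore usually spanned, while a full-length element (~5-8 kb) is not. The flank margin and support threshold below are hypothetical. Reading "not spanned" as "long" is only safe at high physical coverage, such as the 30-40x quoted above.

```python
def size_status(pairs_3kb, breakpoint, min_flank=100, min_spanning=2):
    """Assign a size status to a non-reference TE insertion.

    pairs_3kb: (left_mate_end, right_mate_start) reference coordinates of
    read pairs from the ~3 kb long-fragment library near the breakpoint.
    Returns "<3kb" if enough pairs bridge the breakpoint (e.g. a solo LTR),
    otherwise ">3kb" (consistent with a full-length element).
    """
    spanning = sum(
        1
        for left_end, right_start in pairs_3kb
        if left_end <= breakpoint - min_flank
        and right_start >= breakpoint + min_flank
    )
    return "<3kb" if spanning >= min_spanning else ">3kb"
```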
8. TEV Catalog
   - 103,798 TEVs detected
      - 28,951 SINEs
      - 40,074 LINEs
      - 34,773 ERVs
   - Evolutionary context
      - MP (maximum parsimony) consensus tree based on the strain distribution patterns (SDPs) of TEs
      - B6+: 44,401 insertions within the C57BL/6J lineage
      - B6-: 59,397 TEV insertions outside the C57BL/6J lineage
   - TEs are more frequent in wild-derived strains
      - 13.8-22.4 vs. 4.2-6.3 TEs per Mb (wild-derived vs. classical strains)
   - Notable expansion/contraction of certain classes
      - ERVs are expanding relative to the other classes
      - IAPs are active amongst ERVs
   [Figure: MP consensus tree of the strains with TEV counts per class; labels not recoverable from the extracted text]
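A strain distribution pattern (SDP) records which strains carry a given TEV. Tallying SDPs produces the input to a parsimony analysis; the tree building itself is not shown here. The strain list and calls below are toy data, not results from the catalogue.

```python
from collections import Counter

# Hypothetical strain order; each SDP is a 0/1 string in this order.
STRAINS = ["C57BL/6NJ", "DBA/2J", "NOD/ShiLtJ"]


def sdp(carriers, strains=STRAINS):
    """Return the SDP of one TEV as a 0/1 string (1 = strain carries it)."""
    return "".join("1" if s in carriers else "0" for s in strains)


def tally_sdps(tev_carriers):
    """Count how many TEVs share each SDP.

    tev_carriers: mapping of TEV id -> set of strains carrying the element.
    """
    return Counter(sdp(carriers) for carriers in tev_carriers.values())


toy = {
    "tev1": {"DBA/2J"},
    "tev2": {"DBA/2J"},
    "tev3": {"DBA/2J", "NOD/ShiLtJ"},
}
print(tally_sdps(toy))
```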
9. Callset Validation
   - B6+
      - Manually annotated all of Chr19 across 8 strains (Flint group, Oxford)
      - PCR validation of 250 randomly selected calls across 8 strains
   - B6-
      - PCR validation of 109 calls across 8 strains (Binnaz Yalcin, Oxford)
      - The SINE false positive rate was initially found to be high
         - Further filtering of low-complexity sequence, microsatellites and simple repeats was required
         - This reduced the false positive rate from ~30% to 9%
      - False negatives were determined by examining the SDP from the PCR data
      - Size status assignment is accurate
         - >95% of SINEs were assigned <3 kb status
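Validation rates can be computed by comparing calls with PCR genotypes, site by site and strain by strain. A call that PCR does not confirm is a false positive. A PCR-confirmed insertion in a strain where none was called is a false negative, which is how the PCR SDPs reveal missed calls. The sketch assumes that "false positive rate" means the fraction of called insertions that PCR does not confirm. That definition, and the dictionary layout, are assumptions.

```python
def validation_rates(called, pcr):
    """Compare called vs PCR-confirmed presence over (tev, strain) sites.

    called, pcr: dicts mapping (tev_id, strain) -> bool (element present?).
    Only sites genotyped by PCR are scored.
    Returns (false_positive_rate, false_negative_rate), where the first is
    the fraction of positive calls that PCR does not confirm (assumed).
    """
    tp = fp = fn = 0
    for site, present in pcr.items():
        predicted = called.get(site, False)
        if predicted and present:
            tp += 1
        elif predicted and not present:
            fp += 1
        elif present and not predicted:
            fn += 1
    fp_rate = fp / (tp + fp) if tp + fp else 0.0
    fn_rate = fn / (tp + fn) if tp + fn else 0.0
    return fp_rate, fn_rate
```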
10. Structure of ERV Families
   [Figure: per-family panels for MuLV, VL30, IS2, ETn, RLTR1B, RLTR45, IAP, RLTR10 and MaLR; axis labels not recoverable from the extracted text]
   - IAP Type I: 7.3 kb (full length)
      - 5' LTR (~430 nt), gag-pol genes (usually defective), 3' LTR
   - Solo LTR element: produced by recombination of the flanking LTRs
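A solo LTR forms when recombination between the two flanking LTRs excises the internal region together with one LTR. This sketch applies that arithmetic to the IAP figures on the slide (7.3 kb full length, ~430 nt LTRs). The result shows why full-length elements and solo LTRs fall into such different size classes.

```python
# IAP figures from the slide: full-length element ~7.3 kb, each LTR ~430 nt.
FULL_LENGTH_BP = 7_300
LTR_BP = 430


def internal_length(full_length: int, ltr: int) -> int:
    """Internal (gag-pol) region: full length minus both flanking LTRs."""
    return full_length - 2 * ltr


def bases_lost_to_solo_ltr(full_length: int, ltr: int) -> int:
    """LTR-LTR recombination removes the internal region plus one LTR."""
    return full_length - ltr


print(internal_length(FULL_LENGTH_BP, LTR_BP))         # 6440
print(bases_lost_to_solo_ltr(FULL_LENGTH_BP, LTR_BP))  # 6870
```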
11. Genomic Sequence Context
   [Figure: genomic sequence context of TEVs; plot content not recoverable from the extracted text]
12. 5' and 3' Relative Densities
   [Figure: relative densities, panels labelled 5', 3', Sense and Anti-sense; plot content not recoverable from the extracted text]
13. Density and Orientation within Genes
   - A distinct anti-sense bias is observed in all types
      - The bias in first introns differs significantly between ERVs and SINEs
   - The orientation bias remains constant despite divergence of the element
      - Biphasic selection process
   - Assuming no sense/anti-sense insertion bias:
      - This implies that half of sense-oriented ERVs and one third of SINEs/LINEs are deleterious
   [Figure: TE density and orientation within genes; plot content not recoverable from the extracted text]
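The deleterious fraction follows from the orientation counts. Suppose insertions arise in sense and anti-sense orientation equally often, and anti-sense insertions are treated as neutral (a simplifying assumption made here). The anti-sense count then estimates how many sense insertions originally occurred. The shortfall in sense-oriented elements is the fraction removed by selection, d = 1 - S/A. On this reading, "half of sense ERVs" corresponds to a 1:2 sense:anti-sense ratio, and "one third" corresponds to 2:3. The counts below are illustrative, not the observed data.

```python
def deleterious_sense_fraction(sense: int, antisense: int) -> float:
    """Fraction of sense-oriented insertions inferred to have been purged.

    Assumes no orientation bias at insertion and neutral anti-sense
    insertions, so `antisense` estimates how many sense insertions occurred.
    A negative result would mean sense elements are over-represented.
    """
    if antisense <= 0:
        raise ValueError("need at least one anti-sense insertion")
    return 1.0 - sense / antisense


# Illustrative counts only.
print(deleterious_sense_fraction(1000, 2000))            # 0.5
print(round(deleterious_sense_fraction(2000, 3000), 3))  # 0.333
```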
14. QTLs associated with TEs
   Table 3: QTLs associated with SVs (Yalcin et al., under review)

   | Phenotype | Chr | SV start | SV stop | Ancestral event | Gene | SV overlap | LogP |
   |---|---|---|---|---|---|---|---|
   | Mean platelet volume | 1 | 175158884 | 175158885 | insertion | Fcer1a | upstream | 52.833 |
   | OFT Total activity | 2 | 144402772 | 144402974 | SINE insertion | Sec23b | intron | 15.721 |
   | Hippocampus cellular proliferation marker | 4 | 49690364 | 49690365 | SINE insertion | Grin3a | intron | 20.119 |
   | Home cage activity | 4 | 108951264 | 108951265 | ERV insertion | Eps15 | upstream | 15.922 |
   | T-cells: %CD3 | 4 | 130038389 | 130038390 | SINE insertion | Snrnp40 | intron | 12.129 |
   | Wound healing | 7 | 90731819 | 90731820 | ERV insertion | Tmc3 | upstream | 22.216 |
   | Red cells: mean cellular haemoglobin | 7 | 111398000 | 111480000 | insertion | Trim5 | exon | 13.016 |
   | Red cells: mean cellular haemoglobin | 7 | 111504957 | 111505193 | deletion | Trim30b | UTR | 12.806 |
   | Red cells: mean cellular volume | 8 | 87957244 | 87957245 | LINE insertion | 4921524J17Rik | upstream | 18.141 |
   | Serum urea concentration | 11 | 115106122 | 115106250 | deletion | Tmem104 | UTR | 13.404 |
   | Hippocampus cellular proliferation marker | 13 | 113783196 | 113783359 | deletion | Gm6320 | upstream | 17.456 |
   | T-cells: CD4/CD8 ratio | 17 | 34483680 | 34483681 | deletion | H2-Ea | upstream | 82.858 |

   Start and stop coordinates are given for build37 of the mouse genome, so insertions into the reference are given as consecutive base pairs (columns SV start and SV stop). The part of the gene overlapped is reported in the column SV overlap. LogP is the negative logarithm of the P-value for association between the SV and the phenotype, as assessed in outbred HS mice (ref. 22).
   [Slide callout: SINE deletion]
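A few lines of code can convert the LogP column back into P-values and check the coordinate convention described under the table. The sketch assumes base-10 logarithms, since the table only says "negative logarithm". It uses three rows copied from the table.

```python
ROWS = [
    # (gene, chrom, sv_start, sv_stop, event, log_p) copied from Table 3
    ("Fcer1a", 1, 175158884, 175158885, "insertion", 52.833),
    ("Eps15", 4, 108951264, 108951265, "ERV insertion", 15.922),
    ("Trim30b", 7, 111504957, 111505193, "deletion", 12.806),
]


def p_value(log_p: float) -> float:
    """Invert LogP = -log10(P); base 10 is an assumption."""
    return 10.0 ** (-log_p)


def is_reference_insertion_point(start: int, stop: int) -> bool:
    """Insertions into build37 are reported as consecutive base pairs."""
    return stop == start + 1


# Print rows from strongest to weakest association.
for gene, chrom, start, stop, event, log_p in sorted(ROWS, key=lambda r: -r[5]):
    print(f"{gene}: chr{chrom} {event} P={p_value(log_p):.2e} "
          f"point_insertion={is_reference_insertion_point(start, stop)}")
```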
15. Eps15 IAP Candidate
   - Eps15: epidermal growth factor receptor pathway substrate 15
   [Figure: Eps15/Eps15 vs. Eps15 -/- mice; panels "Whole Arena Total Distance" and "Number Of Entries To Centre"; other labels not recoverable from the extracted text]
16. Conclusions
   - An unprecedented catalog (>100k) of mouse TEVs was identified
   - False positive and false negative rates are low
   - Wild-derived strains contain significantly more TEs
   - The evolutionary context shows an expansion of ERVs in the mouse lineage
   - Distinct anti-sense bias for all elements within genes
   - We estimate that half of sense-oriented ERVs and one third of SINEs/LINEs are deleterious
17. Acknowledgements
   - Mouse TE Project: Christoffer Nellåker (Oxford), Wayne Frankel (Jax), Chris Ponting (Oxford)
   - Mouse Genomes Project
      - Sanger: Petr Danecek, Kim Wong, David Adams, Richard Durbin, Sanger Sequencing Teams
      - EBI: Ewan Birney
      - Wellcome Trust Centre Oxford: Jonathan Flint et al., Binnaz Yalcin, Avigail Agam, Richard Mott
      - Jackson Lab: Laura Reinholdt, Leah Rae Donahue
   - Further information: http://www.sanger.ac.uk/mousegenomes
   - Contacts: thomas.keane@sanger.ac.uk, christoffer.nellaker@gmail.com, chris.ponting@dpag.ox.ac.uk

×