Aug2014 working group report characterization bioinformatics


Published in: Health & Medicine

  1. Characterization/Bioinformatics Working Group
     Chunlin Xiao
     Mike Eberle
  2. Characterization
     • What are the barriers to submitting data via SRA?
       – Experienced submitters are okay with the process
       – First-time submitters need simpler instructions for submission
       – Plan for accepting BioNano long read sequence
     • What raw sequence data is currently available?
       – NA12878 (available): Illumina, 454, SOLiD
       – Pedigree (available): Illumina Platinum Genomes, CG
       – PGP trios (in progress): Illumina, CG, Ion AmpliSeq exome
       – Long reads (in progress): CG LFR, Illumina Moleculo, PacBio (older), BioNano
  3. Integration of SNPs/indels
     • Merged NA12878 calls available
       – GIAB + RTG + PG
       – Readme file explains rules
     • Next step is integrating Illumina, CG & Ion Torrent for PGP trios
       – Proposal slide will be available for comments
       – Test first on NA12878 and apply to trios
  4. Merging/integrating calls
     • Current data release directory includes multiple versions of files, which is somewhat confusing for users
       – Create a subdir under the ftp/release directory containing just one vcf file, one bed file for regions, and one README file
     • Need to develop a merging tool
       – Illumina is working on one for PG
       – RTG has one for normalizing data
       – GA4GH will need to develop a tool for vcf comparison
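The vcf comparison mentioned on this slide could look roughly like the sketch below. It is only an illustration of the idea, not the tool that Illumina, RTG, or GA4GH planned: the function names `variant_keys` and `compare_calls` are hypothetical, and the sketch matches variants by exact (chromosome, position, REF, ALT) text. It does no normalization, so the same indel written in two different representations would be reported as a mismatch, which is the problem a normalizing tool addresses.

```python
from typing import Dict, Iterable, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect one key per ALT allele from the data lines of a VCF."""
    keys: Set[Variant] = set()
    for line in vcf_lines:
        # Skip blank lines and the "##" meta / "#CHROM" header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list ALT alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_calls(a: Iterable[str], b: Iterable[str]) -> Dict[str, Set[Variant]]:
    """Naive exact-match comparison of two call sets (no normalization)."""
    keys_a, keys_b = variant_keys(a), variant_keys(b)
    return {
        "shared": keys_a & keys_b,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
    }
```

With two uncompressed files, `compare_calls(open("a.vcf"), open("b.vcf"))` returns the shared and private calls; the file names here are placeholders.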
  5. Long Read Technologies
     • Incorporating long read technologies?
       – PacBio data/calls combined with BioNano creates long contigs/scaffolds
       – Sequence & calls will be available soon (submitted)
     • How should we call structural variants?
       – Spiral Genetics is developing an assembly-based approach
       – NIST is incorporating a set of rules to score SVs (~180 annotations per SV)
       – bcbio: caller combining multiple callers
       – All call sets have a bias towards deletions; still a lot of work to do
     • Adding hg38 reference based call set
       – Start by aligning with the alternate haplotypes and then mask these regions
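One simple way to put a number on the deletion bias noted above is to tally SV types in a call set. The sketch below is an assumption-laden illustration, not a method from the working group: it assumes the call set is a VCF whose records carry the `SVTYPE` key in the INFO column, and the helper names `sv_type_counts` and `deletion_fraction` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def sv_type_counts(vcf_lines: Iterable[str]) -> Counter:
    """Count records per SVTYPE value found in the INFO column."""
    counts: Counter = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        sv_type = "UNKNOWN"  # used when the record has no SVTYPE entry
        for entry in fields[7].split(";"):
            if entry.startswith("SVTYPE="):
                sv_type = entry[len("SVTYPE="):]
                break
        counts[sv_type] += 1
    return counts


def deletion_fraction(counts: Counter) -> float:
    """Fraction of all counted records whose SVTYPE is DEL."""
    total = sum(counts.values())
    return counts["DEL"] / total if total else 0.0
```

Running this on each call set side by side gives a quick view of how strongly each one leans towards deletions.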
