The missing data issue for HiSeq runs

Critical run files can be missing or corrupt after the Run folder has been transferred from the HiSeq storage to the cluster storage. This presentation discusses the issue and suggests four workarounds.

  • ILLUMINA: The bclToQseq converter only needs the *.stats files to pass forward the cluster position information and the intensity averages. The former stays unchanged from one cycle to the next within the same tile, and the latter is only used for building IVC plots. So the effect of replacing one file with a copy from another cycle is an IVC plot that is not 100% accurate at the given tile/cycle. Since you would normally be interested in averages across all tiles, the effect is minimal. Still, this is just a workaround and certainly not a long-term solution.

    1. The missing data issue<br />and the data resurrection miracle<br />[ElCierne]<br />December 10, 2010<br />
    2. What is the missing data issue?<br />Critical Run files are missing or corrupt after the Run folder was transferred from the HiSeq storage to the cluster storage<br />Consequences<br />Config.xml might need to be corrected<br />Missing *.bcl and *.stats files can be recreated<br />Missing *.filter or *.pos.txt files cause the loss of a tile<br />
    3. What causes the missing data issue?<br />Files are not transferred correctly<br />Millisecond network hang-ups, which Windows does not detect<br />RTA did not generate the files in the first place<br />HiSeq computer overload<br />Mismanagement of parallel threads (two processes accessing the same file)<br />
    4. Why is it an issue?<br />The usual workflow crashes: bclConverter does not proceed if there are missing files.<br />
    5. Solutions to recoverable missing data issues<br />1. Copy *.stats from the same tile of a different cycle<br />PRO: fast<br />CON: fudge, trusts RTA, requires a separate workflow for missing *.bcl files<br />2. Recalculate *.stats from *.dif, *.filter and *.bcl (Sanger)<br />PRO: accurate and fast<br />CON: requires a separate workflow for missing *.bcl files, trusts RTA<br />3. Calculate *.qseq from *.cif for the missing tile (QBI)<br />PRO: handles missing *.stats and *.bcl<br />CON: slow, trusts RTA<br />4. Calculate *.qseq from *.cif for all tiles<br />PRO: handles missing *.stats and *.bcl; recalculates everything, so no potentially corrupt RTA *.bcl/*.stats files are used<br />CON: slow (days)<br />
    6. New workflow with OLB<br />Identify the missing files, calculate *.qseq for them with OLB, and merge these with the *.qseq files from the normal workflow to proceed<br />
    7. Details: if *.stats or *.bcl was missing<br />Start the offline base caller (OLB) for the missing tiles<br />Comment out the missing tiles in config.xml and start bclConverter to convert the intact tiles<br />(or use setupBclToQseq + bcl2qseq directly with --ignore-missing-bcl or --ignore-missing-stats)<br />Merge the *.qseq files generated by OLB and bclConverter in one directory (BaseCalls_<date>_<user>)<br />Start GERALD to convert to fastq (_sequence.txt)<br />
    8. Solution requires *.cif files to be saved<br />Intensity files (*.cif) are not stored by default<br />Remember to tick the box for saving intensities when starting a run<br />Or make it the default: in c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config add<br /><add key="DeleteIntensityFiles" value="0" /><br />
    9. Acknowledgement<br />Thanks to<br />Dr. Steven Leonard, Informatics Division, The Sanger Institute.<br />Eugene, Illumina tech support.<br />
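Slide 6 starts by identifying the missing files. A minimal Python sketch of that check follows. The directory layout it walks (`L00<lane>/C<cycle>.1/s_<lane>_<tile>.bcl` and `.stats` under `BaseCalls`) is an assumption about RTA output of this era, not something the slides state, and `find_missing_cycle_files` is a hypothetical helper; adjust lanes, tiles, cycles and paths to your run.

```python
from pathlib import Path


def find_missing_cycle_files(basecalls_dir, lane, tiles, cycles,
                             suffixes=(".bcl", ".stats")):
    """List (cycle, tile, suffix) for every expected per-cycle file that is absent.

    ASSUMED layout (hypothetical, adjust to your RTA version):
        <basecalls_dir>/L00<lane>/C<cycle>.1/s_<lane>_<tile><suffix>
    """
    lane_dir = Path(basecalls_dir) / f"L{lane:03d}"  # lane 1 -> "L001"
    missing = []
    for cycle in cycles:
        cycle_dir = lane_dir / f"C{cycle}.1"
        for tile in tiles:
            for suffix in suffixes:
                # is_file() is False both for a missing file and a missing cycle folder
                if not (cycle_dir / f"s_{lane}_{tile}{suffix}").is_file():
                    missing.append((cycle, tile, suffix))
    return missing
```

For example, `find_missing_cycle_files("Data/Intensities/BaseCalls", 1, [1101, 1102], range(1, 102))` would list the gaps for lane 1; the distinct tiles in the result are the ones to hand to OLB and comment out of config.xml (slide 7).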
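Slide 2 notes that files can also arrive corrupt rather than missing. A cheap test for truncation is to compare each `.bcl` file's size with the cluster count in its header. The sketch assumes the uncompressed BCL layout: a 4-byte little-endian cluster count, then one byte per cluster with the base in bits 0-1, the quality in bits 2-7, and a zero byte meaning no call. That layout is an assumption on my part rather than something from the slides, and gzipped `.bcl.gz` files from later RTA versions would need decompressing first. `check_bcl` and `decode_bcl_byte` are hypothetical names.

```python
import struct
from pathlib import Path


def check_bcl(path):
    """Return None if an uncompressed .bcl looks intact, else a description of the problem.

    ASSUMED layout: 4-byte little-endian cluster count N, followed by exactly N bytes.
    """
    data = Path(path).read_bytes()
    if len(data) < 4:
        return "shorter than the 4-byte header"
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    expected = 4 + n_clusters
    if len(data) != expected:
        return f"size {len(data)} != expected {expected} for {n_clusters} clusters"
    return None


def decode_bcl_byte(b):
    """Split one cluster byte into (base, quality); a zero byte is a no-call."""
    if b == 0:
        return ("N", 0)
    return ("ACGT"[b & 0x03], b >> 2)  # bits 0-1: base, bits 2-7: quality
```

A size check only catches truncation, such as an interrupted transfer; a file of the right length with damaged bytes will still pass.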
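Workaround 1 on slide 5 (copying `.stats` from the same tile of a different cycle) can be scripted roughly as below. Following the Illumina note, the donor file supplies cluster positions that do not change between cycles plus intensity averages that only feed the IVC plot, so picking the nearest cycle keeps that plot as close as possible. `patch_missing_stats` and the `C<cycle>.1/s_<lane>_<tile>.stats` layout are assumptions for illustration. This does nothing for a missing `.bcl`, which still needs the OLB route.

```python
import shutil
from pathlib import Path


def patch_missing_stats(lane_dir, lane, tile, cycle, n_cycles):
    """Copy s_<lane>_<tile>.stats into cycle `cycle` from the nearest cycle that has it.

    Returns the donor cycle, `cycle` itself if the file is already present,
    or None if no other cycle has a .stats file for this tile.
    ASSUMED layout: <lane_dir>/C<cycle>.1/s_<lane>_<tile>.stats
    """
    lane_dir = Path(lane_dir)
    name = f"s_{lane}_{tile}.stats"
    target = lane_dir / f"C{cycle}.1" / name
    if target.is_file():
        return cycle
    # Search outward from the damaged cycle: cycle-1, cycle+1, cycle-2, cycle+2, ...
    for dist in range(1, n_cycles):
        for donor in (cycle - dist, cycle + dist):
            src = lane_dir / f"C{donor}.1" / name
            if 1 <= donor <= n_cycles and src.is_file():
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(src, target)
                return donor
    return None
```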
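The merge step on slide 7 (OLB output for the rescued tiles plus bclConverter output for the intact ones, collected in one `BaseCalls_<date>_<user>` directory) might look like this, assuming the usual `s_<lane>_<read>_<tile>_qseq.txt` file names. `merge_qseq` is a hypothetical helper. It refuses duplicate names, because a tile present in both outputs suggests it was not commented out of config.xml, and copying blindly would silently overwrite one version with the other.

```python
import shutil
from pathlib import Path


def merge_qseq(src_dirs, dest_dir):
    """Copy every *_qseq.txt from src_dirs into a new dest_dir; return the sorted file names.

    Raises FileExistsError if the same name comes from two sources,
    or if dest_dir already exists.
    """
    origin = {}
    for src in map(Path, src_dirs):
        for f in sorted(src.glob("*_qseq.txt")):
            if f.name in origin:
                raise FileExistsError(f"{f.name} found in {origin[f.name]} and {src}")
            origin[f.name] = src
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=False)  # do not mix into an existing folder
    for name, src in origin.items():
        shutil.copy2(src / name, dest / name)
    return sorted(origin)
```

GERALD would then be pointed at the merged directory to produce the `_sequence.txt` files.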
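Slide 8's config change can also be applied with a script, for instance when preparing several instrument PCs. This sketch assumes the `<add key=... value=... />` entry sits under `<configuration>/<appSettings>`, the standard .NET location for such entries; check the actual RTA.exe.config and keep a backup before running it. `keep_intensity_files` is a hypothetical helper. It updates an existing `DeleteIntensityFiles` entry instead of adding a second one, and it asks ElementTree to keep XML comments (Python 3.8+).

```python
import xml.etree.ElementTree as ET


def keep_intensity_files(config_path):
    """Set <add key="DeleteIntensityFiles" value="0" /> in a .NET-style config file.

    ASSUMES the entry belongs under <configuration>/<appSettings>.
    """
    # Preserve XML comments when the file is rewritten (insert_comments needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(config_path, parser=parser)
    root = tree.getroot()
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == "DeleteIntensityFiles":
            entry.set("value", "0")  # flip an existing setting rather than duplicate it
            break
    else:
        ET.SubElement(settings, "add", key="DeleteIntensityFiles", value="0")
    tree.write(config_path, encoding="utf-8", xml_declaration=True)
```

Usage would be `keep_intensity_files("c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config")`, followed by restarting the control software so RTA picks up the change.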
