Cloud burst tutorial

Reference: http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.php?title=Sample_Results

[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ ll
total 24332
-rw-r--r-- 1 hadoop hadoop 1995984 Jun 15 17:19 100k.3.txt
-rw-r--r-- 1 hadoop hadoop 1995984 Dec 5 2008 100k.3.txt.gold
-rw-r--r-- 1 hadoop hadoop 4493593 Jun 15 17:06 100k.br
-rw-r--r-- 1 hadoop hadoop 4388895 Dec 5 2008 100k.fa
-rw-r--r-- 1 hadoop hadoop 1177790 Jun 15 17:06 100k.fa.map
-rw-r--r-- 1 hadoop hadoop 8337 Dec 5 2008 cloudburst.err.gold
-rw-r--r-- 1 hadoop hadoop 57014 Jul 9 2010 CloudBurst.jar
-rw-r--r-- 1 hadoop hadoop 4067962 Jul 9 2010 ConvertFastaForCloud.jar
-rw-r--r-- 1 hadoop hadoop 4067959 Jul 9 2010 PrintAlignments.jar
-rw-r--r-- 1 hadoop hadoop 1452 Jul 9 2010 README.txt
drwxr-xr-x 2 hadoop hadoop 4096 Jun 15 17:19 results
-rw-r--r-- 1 hadoop hadoop 579773 Jun 15 17:06 s_suis.br
-rw-r--r-- 1 hadoop hadoop 2040970 Dec 5 2008 s_suis.fa
-rw-r--r-- 1 hadoop hadoop 21 Jun 15 17:06 s_suis.fa.map
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ cat README.txt
Sample data for CloudBurst
==========================

CloudBurst has several parameters to control the sensitivity of the
alignment algorithm. Here it finds the unambiguous best alignment for
100,000 reads allowing up to 3 mismatches when mapping to the corresponding
S. suis genome.

== Sample input data

s_suis.fa: Streptococcus suis reference genome sequence
100k.fa: 100,000 36bp Illumina reads available from http://www.sanger.ac.uk/Projects/S_suis/

== Format the input data

$ java -jar ConvertFastaForCloud.jar s_suis.fa s_suis.br
$ java -jar ConvertFastaForCloud.jar 100k.fa 100k.br
s_suis.br: reference genome in CloudBurst binary format
100k.br: Reads in CloudBurst binary format
... (omitted) ...
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head s_suis.fa
>Streptococcus_suis
atgaaccaagaacaacttttttggcaacgatttattgaattggcaaaggtaaattttaagccatctatttatgatttttatgtcgctgatgcaaaattactcggaatcaaccagcaagttgccaatattttcttaaatcgtccatttaaaaaagatttctgggaaaaaaacttcgaagagttaatgattgccgctagttttgaaagctacggagagcctcttaccatccaatatcaattt
... (omitted) ...
acagaggatgaacaggagattaggaatactacaaacacaagaagttcaatagttcaccaggtacagacacttgagccggctactcctcaagaaacttttaaaccggttcattctgatataaaatcccagtacacctttgctaattttgtacaaggagacaataatcactgggcaaaggctgcagctttagctgtatctgataacctaggtgagctctacaatccattattcatttttggtggtcctggtcttggaaaaactcatattttaaatgcgattggaaataaggttctagccgat
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ wc -l s_suis.fa
33460 s_suis.fa
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head 100k.fa
>1
GCCTGTTCTTTACATGATTTTTGGTCTAGTGTATGG
>2
AACCGCTGTAAAGGCTTCTGCCACACCGATTTCTTG
>3
GAGGTGATTGTGGTATTGT.GGTAAATCGGTGATTG
>4
GCTTTAGCCGACCTGAACT.GACTACAAGTTGACCA
>5
AAAGGCTACCCGCGGTTGAACCTTACGTGACACATT
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ tail 100k.fa
>99996
AATGCCCGTAACAACGGGCTTTTATCTTGTTCTAAA
>99997
GTCAGATAGCGCAGGAATTTCAAAGGAATTTGGACC
>99998
AGTTAACTCTTCAGCTGTAAAGTTGTAGTTTTCTAA
>99999
GCGGCATAAATTGGATAAAGAAAGAACTGAAGGACA
>100000
GTTACCATGTATTGTGACAGATAACCACGGTGGAGT
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -mkdir /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -put ../CloudBurst-1.1.0/s_suis.br /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -put ../CloudBurst-1.1.0/100k.br /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar
Usage: CloudBurst refpath qrypath outpath minreadlen maxreadlen k allowdifferences filteralignments #mappers #reduces #fmappers #freducers blocksize redundancy
1. refpath: path in hdfs to the reference file
2. qrypath: path in hdfs to the query file
3. outpath: path to a directory to store the results (old results are automatically deleted)
4. minreadlen: minimum length of the reads
5. maxreadlen: maximum read length
6. k: number of mismatches / differences to allow (higher number requires more time)
7. allowdifferences: 0: mismatches only, 1: indels as well
8. filteralignments: 0: all alignments, 1: only report unambiguous best alignment (results identical to RMAP)
9. #mappers: number of mappers to use. suggested: #processor-cores * 10
10. #reduces: number of reducers to use. suggested: #processor-cores * 2
11. #fmappers: number of mappers for filtration alg. suggested: #processor-cores
12. #freducers: number of reducers for filtration alg. suggested: #processor-cores
13. blocksize: number of qry and ref tuples to consider at a time in the reduce phase. suggested: 128
14. redundancy: number of copies of low complexity seeds to use. suggested: # processor cores
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar /data/cloudburst/s_suis.br /data/cloudburst/100k.br /data/results 36 36 3 0 1 240 48 24 24 128 16 >& cloudburst.err
[hadoop@skcc-nebdap02 hadoop]$ cat cloudburst.err
refath: /data/cloudburst/s_suis.br
qrypath: /data/cloudburst/100k.br
outpath: /data/results-alignments
MIN_READ_LEN: 36
MAX_READ_LEN: 36
K: 3
SEED_LEN: 9
FLANK_LEN: 30
ALLOW_DIFFERENCES: 0
FILTER_ALIGNMENTS: true
NUM_MAP_TASKS: 240
NUM_REDUCE_TASKS: 48
BLOCK_SIZE: 128
REDUNDANCY: 16
Removing old results
12/06/15 17:11:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/15 17:11:28 INFO mapred.FileInputFormat: Total input paths to process : 2
12/06/15 17:11:28 INFO mapred.JobClient: Running job: job_201206112243_0018
12/06/15 17:11:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 17:11:47 INFO mapred.JobClient: map 12% reduce 0%
12/06/15 17:11:48 INFO mapred.JobClient: map 14% reduce 0%
12/06/15 17:11:49 INFO mapred.JobClient: map 15% reduce 0%
12/06/15 17:11:50 INFO mapred.JobClient: map 17% reduce 0%
12/06/15 17:11:51 INFO mapred.JobClient: map 19% reduce 0%
12/06/15 17:11:52 INFO mapred.JobClient: map 21% reduce 0%
12/06/15 17:11:53 INFO mapred.JobClient: map 36% reduce 0%
12/06/15 17:11:54 INFO mapred.JobClient: map 40% reduce 0%
12/06/15 17:11:55 INFO mapred.JobClient: map 45% reduce 0%
12/06/15 17:11:56 INFO mapred.JobClient: map 49% reduce 0%
12/06/15 17:11:57 INFO mapred.JobClient: map 56% reduce 0%
12/06/15 17:11:58 INFO mapred.JobClient: map 57% reduce 0%
12/06/15 17:11:59 INFO mapred.JobClient: map 74% reduce 0%
12/06/15 17:12:00 INFO mapred.JobClient: map 80% reduce 1%
12/06/15 17:12:01 INFO mapred.JobClient: map 80% reduce 2%
12/06/15 17:12:02 INFO mapred.JobClient: map 83% reduce 3%
12/06/15 17:12:03 INFO mapred.JobClient: map 91% reduce 4%
12/06/15 17:12:05 INFO mapred.JobClient: map 95% reduce 6%
12/06/15 17:12:06 INFO mapred.JobClient: map 95% reduce 9%
12/06/15 17:12:07 INFO mapred.JobClient: map 95% reduce 10%
12/06/15 17:12:08 INFO mapred.JobClient: map 100% reduce 14%
12/06/15 17:12:09 INFO mapred.JobClient: map 100% reduce 17%
12/06/15 17:12:10 INFO mapred.JobClient: map 100% reduce 18%
12/06/15 17:12:11 INFO mapred.JobClient: map 100% reduce 22%
12/06/15 17:12:13 INFO mapred.JobClient: map 100% reduce 23%
12/06/15 17:12:14 INFO mapred.JobClient: map 100% reduce 28%
12/06/15 17:12:15 INFO mapred.JobClient: map 100% reduce 31%
12/06/15 17:12:17 INFO mapred.JobClient: map 100% reduce 51%
12/06/15 17:12:18 INFO mapred.JobClient: map 100% reduce 65%
12/06/15 17:12:19 INFO mapred.JobClient: map 100% reduce 70%
12/06/15 17:12:20 INFO mapred.JobClient: map 100% reduce 87%
12/06/15 17:12:21 INFO mapred.JobClient: map 100% reduce 92%
12/06/15 17:12:22 INFO mapred.JobClient: map 100% reduce 94%
12/06/15 17:12:23 INFO mapred.JobClient: map 100% reduce 98%
12/06/15 17:12:26 INFO mapred.JobClient: map 100% reduce 100%
12/06/15 17:12:31 INFO mapred.JobClient: Job complete: job_201206112243_0018
12/06/15 17:12:32 INFO mapred.JobClient: Counters: 31
12/06/15 17:12:32 INFO mapred.JobClient: Job Counters
12/06/15 17:12:32 INFO mapred.JobClient: Launched reduce tasks=48
12/06/15 17:12:32 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2980992
12/06/15 17:12:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 17:12:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 17:12:32 INFO mapred.JobClient: Rack-local map tasks=158
12/06/15 17:12:32 INFO mapred.JobClient: Launched map tasks=241
12/06/15 17:12:32 INFO mapred.JobClient: Data-local map tasks=83
12/06/15 17:12:32 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=1106915
12/06/15 17:12:32 INFO mapred.JobClient: File Input Format Counters
12/06/15 17:12:32 INFO mapred.JobClient: Bytes Read=5587101
12/06/15 17:12:32 INFO mapred.JobClient: File Output Format Counters
12/06/15 17:12:32 INFO mapred.JobClient: Bytes Written=2707836
12/06/15 17:12:32 INFO mapred.JobClient: FileSystemCounters
12/06/15 17:12:32 INFO mapred.JobClient: FILE_BYTES_READ=140515797
12/06/15 17:12:32 INFO mapred.JobClient: HDFS_BYTES_READ=6112267
12/06/15 17:12:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=288167030
12/06/15 17:12:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2707836
12/06/15 17:12:32 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 17:12:32 INFO mapred.JobClient: Map output materialized bytes=140584917
12/06/15 17:12:32 INFO mapred.JobClient: Map input records=100032
12/06/15 17:12:32 INFO mapred.JobClient: Reduce shuffle bytes=140436273
12/06/15 17:12:32 INFO mapred.JobClient: Spilled Records=5558658
12/06/15 17:12:32 INFO mapred.JobClient: Map output bytes=134956851
12/06/15 17:12:32 INFO mapred.JobClient: Total committed heap usage (bytes)=57936314368
12/06/15 17:12:32 INFO mapred.JobClient: CPU time spent (ms)=1693370
12/06/15 17:12:32 INFO mapred.JobClient: Map input bytes=5073092
12/06/15 17:12:32 INFO mapred.JobClient: SPLIT_RAW_BYTES=24638
12/06/15 17:12:32 INFO mapred.JobClient: Combine input records=0
12/06/15 17:12:32 INFO mapred.JobClient: Reduce input records=2774585
12/06/15 17:12:32 INFO mapred.JobClient: Reduce input groups=254196
12/06/15 17:12:32 INFO mapred.JobClient: Combine output records=0
12/06/15 17:12:32 INFO mapred.JobClient: Physical memory (bytes) snapshot=57459982336
12/06/15 17:12:32 INFO mapred.JobClient: Reduce output records=81128
12/06/15 17:12:32 INFO mapred.JobClient: Virtual memory (bytes) snapshot=754874736640
12/06/15 17:12:32 INFO mapred.JobClient: Map output records=2779329
CloudBurst Finished
Alignment time: 65.36
NUM_FMAP_TASKS: 24
NUM_FREDUCE_TASKS: 24
Removing old results
12/06/15 17:12:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/15 17:12:32 INFO mapred.FileInputFormat: Total input paths to process : 48
12/06/15 17:12:39 INFO mapred.JobClient: Running job: job_201206112243_0019
12/06/15 17:12:40 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 17:12:54 INFO mapred.JobClient: map 62% reduce 0%
12/06/15 17:12:55 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 17:13:06 INFO mapred.JobClient: map 100% reduce 16%
12/06/15 17:13:07 INFO mapred.JobClient: map 100% reduce 33%
12/06/15 17:13:09 INFO mapred.JobClient: map 100% reduce 58%
12/06/15 17:13:10 INFO mapred.JobClient: map 100% reduce 75%
12/06/15 17:13:12 INFO mapred.JobClient: map 100% reduce 87%
12/06/15 17:13:13 INFO mapred.JobClient: map 100% reduce 91%
12/06/15 17:13:15 INFO mapred.JobClient: map 100% reduce 100%
12/06/15 17:13:20 INFO mapred.JobClient: Job complete: job_201206112243_0019
12/06/15 17:13:20 INFO mapred.JobClient: Counters: 31
12/06/15 17:13:20 INFO mapred.JobClient: Job Counters
12/06/15 17:13:20 INFO mapred.JobClient: Launched reduce tasks=24
12/06/15 17:13:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=207232
12/06/15 17:13:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 17:13:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 17:13:20 INFO mapred.JobClient: Rack-local map tasks=5
12/06/15 17:13:20 INFO mapred.JobClient: Launched map tasks=48
12/06/15 17:13:20 INFO mapred.JobClient: Data-local map tasks=43
12/06/15 17:13:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=245651
12/06/15 17:13:20 INFO mapred.JobClient: File Input Format Counters
12/06/15 17:13:20 INFO mapred.JobClient: Bytes Read=2707836
12/06/15 17:13:20 INFO mapred.JobClient: File Output Format Counters
12/06/15 17:13:20 INFO mapred.JobClient: Bytes Written=2485042
12/06/15 17:13:20 INFO mapred.JobClient: FileSystemCounters
12/06/15 17:13:20 INFO mapred.JobClient: FILE_BYTES_READ=2188332
12/06/15 17:13:20 INFO mapred.JobClient: HDFS_BYTES_READ=2713260
12/06/15 17:13:20 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6039532
12/06/15 17:13:20 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2485042
12/06/15 17:13:20 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 17:13:20 INFO mapred.JobClient: Map output materialized bytes=2195100
12/06/15 17:13:20 INFO mapred.JobClient: Map input records=81128
12/06/15 17:13:20 INFO mapred.JobClient: Reduce shuffle bytes=2153793
12/06/15 17:13:20 INFO mapred.JobClient: Spilled Records=162088
12/06/15 17:13:20 INFO mapred.JobClient: Map output bytes=2028200
12/06/15 17:13:20 INFO mapred.JobClient: Total committed heap usage (bytes)=14471921664
12/06/15 17:13:20 INFO mapred.JobClient: CPU time spent (ms)=95390
12/06/15 17:13:20 INFO mapred.JobClient: Map input bytes=2703324
12/06/15 17:13:20 INFO mapred.JobClient: SPLIT_RAW_BYTES=5424
12/06/15 17:13:20 INFO mapred.JobClient: Combine input records=81128
12/06/15 17:13:20 INFO mapred.JobClient: Reduce input records=81044
12/06/15 17:13:20 INFO mapred.JobClient: Reduce input groups=76511
12/06/15 17:13:20 INFO mapred.JobClient: Combine output records=81044
12/06/15 17:13:20 INFO mapred.JobClient: Physical memory (bytes) snapshot=13169172480
12/06/15 17:13:20 INFO mapred.JobClient: Reduce output records=74502
12/06/15 17:13:20 INFO mapred.JobClient: Virtual memory (bytes) snapshot=193761902592
12/06/15 17:13:20 INFO mapred.JobClient: Map output records=81128
FilterAlignments Finished
Filtering time: 48.481
Total Running time: 113.841
[hadoop@skcc-nebdap02 hadoop]$
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -get /data/results/ ../CloudBurst-1.1.0/results
[hadoop@skcc-nebdap02 hadoop]$ cd ../CloudBurst-1.1.0
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ java -jar PrintAlignments.jar results | sort -nk4 > 100k.3.txt
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head -n 20 100k.3.txt
1 766133 766169 1 1 +
1 297899 297935 2 0 -
1 1325118 1325154 4 1 +
1 145970 146006 7 1 -
1 553513 553549 8 0 -
1 1779842 1779878 9 0 -
1 86299 86335 10 0 -
1 1503808 1503844 11 2 +
1 397758 397794 12 0 +
1 241778 241814 13 0 -
1 626711 626747 14 0 +
1 142141 142177 15 1 +
1 1401129 1401165 16 1 -
1 306289 306325 17 1 +
1 628571 628607 18 1 -
1 815172 815208 19 0 -
1 1624600 1624636 20 0 +
1 13779 13815 21 0 +
1 129064 129100 22 1 +
1 1382938 1382974 24 2 +
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ tail -n 20 100k.3.txt
1 1796768 1796804 99976 2 -
1 1021128 1021164 99978 0 -
1 1350005 1350041 99980 1 +
1 799280 799316 99981 2 -
1 139518 139554 99983 0 +
1 57158 57194 99985 0 +
1 1663030 1663066 99986 2 +
1 549235 549271 99987 0 -
1 1400509 1400545 99988 0 +
1 880593 880629 99989 0 +
1 918064 918100 99990 0 +
1 937994 938030 99992 1 -
1 94456 94492 99993 0 +
1 1144320 1144356 99994 0 +
1 1441627 1441663 99995 0 +
1 1281557 1281593 99996 0 +
1 1323611 1323647 99997 2 -
1 800095 800131 99998 0 -
1 1956458 1956494 99999 1 +
1 134848 134884 100000 2 -
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ wc -l 100k.3.txt
74502 100k.3.txt
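PrintAlignments writes one alignment per line. Judging only from the head/tail output above, the six columns appear to be reference id, reference start, reference end, read id, number of differences, and strand; that reading is an assumption, not something taken from CloudBurst's documentation. Under that assumption, the following minimal awk-based sketch tallies the result file by difference count and by strand. The function name summarize_alignments is hypothetical and not part of CloudBurst.

```shell
#!/bin/sh
# Summarize a PrintAlignments result file such as 100k.3.txt.
# ASSUMPTION: whitespace-separated columns are
#   $1 ref id, $2 ref start, $3 ref end, $4 read id, $5 differences, $6 strand
# (inferred from sample output, not from CloudBurst documentation).
summarize_alignments() {
  awk '
    BEGIN { max = 0 }
    NF >= 6 {
      total++
      diffs[$5 + 0]++                   # alignments per number of differences
      if ($5 + 0 > max) max = $5 + 0    # largest difference count seen
      strand[$6]++                      # alignments per strand (+ or -)
      if (!($4 in reads)) { reads[$4] = 1; nreads++ }   # distinct read ids
    }
    END {
      printf "alignments: %d\n", total
      printf "distinct reads: %d\n", nreads
      for (d = 0; d <= max; d++)
        printf "differences=%d: %d\n", d, diffs[d] + 0
      printf "strand +: %d\n", strand["+"] + 0
      printf "strand -: %d\n", strand["-"] + 0
    }' "$1"
}
```

Running summarize_alignments 100k.3.txt should report 74502 alignments, agreeing with wc -l above, provided every line has six fields. The sample directory also ships a reference result, 100k.3.txt.gold, so cmp 100k.3.txt 100k.3.txt.gold is a quick way to check a run against it.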

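The CloudBurst usage message gives rule-of-thumb values for the last six arguments in terms of processor cores. The sketch below simply encodes those rules in POSIX sh; the function name suggest_cloudburst_args is hypothetical. The run in this session passed 240 48 24 24 128 16, which matches the rules for 24 cores except for redundancy (16 rather than 24); the cluster's actual core count is not stated in the session, so 24 is only an inference.

```shell
#!/bin/sh
# Print the last six CloudBurst arguments
#   #mappers #reduces #fmappers #freducers blocksize redundancy
# using the "suggested" rules from the CloudBurst usage message,
# given the total number of processor cores.
suggest_cloudburst_args() {
  cores=$1
  # Reject empty or non-numeric input.
  case $cores in
    ''|*[!0-9]*) echo "usage: suggest_cloudburst_args <cores>" >&2; return 1 ;;
  esac
  mappers=$((cores * 10))   # #mappers: #processor-cores * 10
  reduces=$((cores * 2))    # #reduces: #processor-cores * 2
  blocksize=128             # blocksize: suggested 128
  # #fmappers, #freducers and redundancy: #processor-cores
  echo "$mappers $reduces $cores $cores $blocksize $cores"
}
```

For example, suggest_cloudburst_args 24 prints 240 48 24 24 128 24, and the unquoted output can be spliced into the command line after the first eight arguments, as in bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar /data/cloudburst/s_suis.br /data/cloudburst/100k.br /data/results 36 36 3 0 1 $(suggest_cloudburst_args 24).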