Fast Variant Calling with ADAM and avocado

Slides from the BioBankCloud Hadoop/NGS Workshop at KTH Kista on 2/19/2015. Discusses early work on variant calling using ADAM and avocado.

Published in: Engineering

  1. Fast Variant Calling with ADAM and avocado
     Frank Austin Nothaft, UC Berkeley AMPLab
     fnothaft@berkeley.edu, @fnothaft
     2/19/2015
  2. Data Intensive Genomics
     • New population-scale experiments will sequence 10-100k samples
     • 100k samples @ 60x WGS will generate ~20PB of read data and ~300TB of genotype data
     • End-to-end pipeline latency is important to clinical work
     • We want to jointly analyze samples to uncover low frequency variations
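The per-sample footprint implied by these totals can be worked out directly. The short calculation below is only an arithmetic illustration of the slide's figures; it assumes decimal units (1 PB = 10^15 bytes), which the slide does not state.

```python
# Totals quoted on the slide; decimal (SI) units are an assumption.
SAMPLES = 100_000
READ_DATA_BYTES = 20 * 10**15      # ~20 PB of read data
GENOTYPE_BYTES = 300 * 10**12      # ~300 TB of genotype data

GB = 10**9
reads_per_sample_gb = READ_DATA_BYTES // SAMPLES // GB
genotypes_per_sample_gb = GENOTYPE_BYTES // SAMPLES // GB

print(f"reads per sample:     ~{reads_per_sample_gb} GB")
print(f"genotypes per sample: ~{genotypes_per_sample_gb} GB")
```

Read data is therefore roughly two orders of magnitude larger than genotype data for each sample.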
  3. How can we improve analysis productivity?
     • Flat file formats sacrifice interoperability but do not improve performance
     • Common sort order invariants imposed by tools compromise correctness
     • Genomics APIs tend to be at a lower level of abstraction, which compromises productivity
  4. Our building block: ADAM
     • ADAM is an open source, high performance, distributed platform for genomic analysis
     • ADAM defines a:
       1. Data schema and layout on disk*
       2. Programming interface for distributed processing of genomic data**
       3. Command line interface
     * Via Parquet and Avro
     ** Work on Python integration is underway
  5. ADAM’s guiding principle: Use a schema as a narrow waist
     The stack, from top to bottom (layer: role; examples; purpose):
     • Application: Transformations; variant calling & analysis, RNA-seq analysis, etc.; users define analyses via transformations
     • Presentation: Enriched Models; enriched Read/Variant; enriched models provide convenient methods on common models
     • Evidence Access: MapReduce/DBMS; Spark, Spark-SQL, Hadoop; the evidence access layer efficiently executes transformations
     • Schema: Data Models; Avro schema for reads, variants, and genotypes; schemas define the logical structure of basic genomic objects
     • Materialized Data: Columnar Storage; load data from Parquet and legacy formats; common interfaces map logical schema to bytes on disk
     • Data Distribution: Parallel FS; HDFS, Tachyon, HPC file systems, S3; the parallel file system layer coordinates distribution of data
     • Physical Storage: Attached Storage; disk, SSD, block store, memory cache; decoupling storage enables performance/cost tradeoff
  6. Data Format
     • Schema can be updated without breaking backwards compatibility
     • Normalize metadata fields into schema for O(1) metadata access
     • Models are “dumb”; enhance as necessary with rich objects

     record AlignmentRecord {
       union { null, Contig } contig = null;
       union { null, long } start = null;
       union { null, long } end = null;
       union { null, int } mapq = null;
       union { null, string } readName = null;
       union { null, string } sequence = null;
       union { null, string } mateReference = null;
       union { null, long } mateAlignmentStart = null;
       union { null, string } cigar = null;
       union { null, string } qual = null;
       union { null, string } recordGroupName = null;
       union { int, null } basesTrimmedFromStart = 0;
       union { int, null } basesTrimmedFromEnd = 0;
       union { boolean, null } readPaired = false;
       union { boolean, null } properPair = false;
       union { boolean, null } readMapped = false;
       union { boolean, null } mateMapped = false;
       union { boolean, null } firstOfPair = false;
       union { boolean, null } secondOfPair = false;
       union { boolean, null } failedVendorQualityChecks = false;
       union { boolean, null } duplicateRead = false;
       union { boolean, null } readNegativeStrand = false;
       union { boolean, null } mateNegativeStrand = false;
       union { boolean, null } primaryAlignment = false;
       union { boolean, null } secondaryAlignment = false;
       union { boolean, null } supplementaryAlignment = false;
       union { null, string } mismatchingPositions = null;
       union { null, string } origQual = null;
       union { null, string } attributes = null;
       union { null, string } recordGroupSequencingCenter = null;
       union { null, string } recordGroupDescription = null;
       union { null, long } recordGroupRunDateEpoch = null;
       union { null, string } recordGroupFlowOrder = null;
       union { null, string } recordGroupKeySequence = null;
       union { null, string } recordGroupLibrary = null;
       union { null, int } recordGroupPredictedMedianInsertSize = null;
       union { null, string } recordGroupPlatform = null;
       union { null, string } recordGroupPlatformUnit = null;
       union { null, string } recordGroupSample = null;
       union { null, Contig } mateContig = null;
     }

     Schemas at https://www.github.com/bigdatagenomics/bdg-formats
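The “dumb model plus rich object” idea can be sketched in a few lines. The Python below is an illustration only, not ADAM's API: `AlignmentRecord` mirrors a handful of the schema fields (renamed to snake_case), `RichAlignmentRecord` and its methods are hypothetical names, and half-open `[start, end)` coordinates are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignmentRecord:
    """Plain data holder: a subset of the schema fields, each with a default."""
    contig: Optional[str] = None      # the schema uses a Contig record; a name here
    start: Optional[int] = None
    end: Optional[int] = None
    mapq: Optional[int] = None
    sequence: Optional[str] = None
    read_mapped: bool = False


class RichAlignmentRecord:
    """Hypothetical wrapper that adds convenience methods to the plain record."""

    def __init__(self, record: AlignmentRecord):
        self.record = record

    def reference_length(self) -> Optional[int]:
        r = self.record
        if not r.read_mapped or r.start is None or r.end is None:
            return None
        return r.end - r.start        # assumes half-open [start, end)

    def overlaps(self, contig: str, start: int, end: int) -> bool:
        r = self.record
        if not r.read_mapped or r.start is None or r.end is None:
            return False
        return r.contig == contig and r.start < end and start < r.end


# Because every field has a default, a record written before a field
# existed can still be constructed without it.
read = RichAlignmentRecord(AlignmentRecord(contig="chr1", start=100, end=150, read_mapped=True))
unmapped = RichAlignmentRecord(AlignmentRecord(sequence="ACGT"))
```

Keeping the record itself free of behaviour is what lets the schema stay a narrow interface between storage and analysis code.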
  7. Parquet
     • ASF Incubator project, based on Google Dremel
     • High performance columnar store with support for projections and push-down predicates
     • Short read data stored in Parquet achieves a 25% improvement in size over compressed BAM
     Image from Parquet format definition: https://www.github.com/apache/incubator-parquet-format
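To make projections and push-down predicates concrete, here is a toy column store in pure Python. It is a sketch of the idea only and does not use the Parquet library: the `TABLE` contents and the `scan` function are invented for illustration, and real Parquet readers can additionally skip whole row groups using stored statistics.

```python
# A toy columnar table: one list per column, as opposed to one record per row.
TABLE = {
    "readName": ["r1", "r2", "r3", "r4"],
    "mapq": [60, 0, 30, 60],
    "sequence": ["ACGT", "TTGA", "CCAT", "GGTA"],
    "qual": ["IIII", "####", "FFFF", "IIII"],
}


def scan(table, projection, predicate_column, predicate):
    """Return rows containing only the projected columns.

    The predicate is evaluated against a single column first (the
    "push-down"), so unselected columns are never touched and
    non-matching rows are never materialized.
    """
    keep = [i for i, value in enumerate(table[predicate_column]) if predicate(value)]
    return [{name: table[name][i] for name in projection} for i in keep]


# Project two of the four columns and keep reads with mapq >= 30.
rows = scan(TABLE, ["readName", "mapq"], "mapq", lambda q: q >= 30)
for row in rows:
    print(row)
```

In a row-oriented format such as BAM, the same query has to decode every field of every record before it can filter.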
  8. Backwards Compatibility
     • Short reads: compatible with SAM, BAM, FASTQ
       • Convert on read and write
       • Working on CRAM support
     • Variants, genotypes, and variant annotations schemas can convert to/from VCF
     • Support wide variety of genomic annotation formats (e.g., GTF, BED, narrowPeak)
  9. ADAM’s API Design
     • ADAM is built on top of Apache Spark, which provides the RDD abstraction (in effect, distributed arrays)
     • Common primitives include:
       • Aggregates: BQSR, Indel Realignment
       • Bucketing: Duplicate Marking, Concordance
       • Region Joins: Variant Calling and Filtration
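The bucketing primitive can be sketched on a single machine. The example below is a deliberately simplified stand-in for duplicate marking, not ADAM's implementation: reads are plain dictionaries, the bucket key is (contig, start, strand), and the `score` field is a hypothetical per-read quality score. Production duplicate marking also considers clipping, read pairs, and library.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Group reads into buckets by alignment position, then flag all but
    the highest-scoring read in each bucket as a duplicate."""
    buckets = defaultdict(list)
    for read in reads:
        key = (read["contig"], read["start"], read["negative_strand"])
        buckets[key].append(read)

    marked = []
    for bucket in buckets.values():
        best = max(bucket, key=lambda r: r["score"])  # first read wins ties
        for read in bucket:
            marked.append({**read, "duplicate": read is not best})
    return marked


reads = [
    {"name": "a", "contig": "chr1", "start": 100, "negative_strand": False, "score": 30},
    {"name": "b", "contig": "chr1", "start": 100, "negative_strand": False, "score": 40},
    {"name": "c", "contig": "chr1", "start": 100, "negative_strand": True, "score": 10},
    {"name": "d", "contig": "chr2", "start": 100, "negative_strand": False, "score": 20},
]
duplicates = sorted(r["name"] for r in mark_duplicates(reads) if r["duplicate"])
print(duplicates)
```

On a cluster, the grouping step corresponds to a shuffle by key, after which each bucket can be processed independently.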
  10. ADAM’s Performance
     • Achieve linear scalability out to 128 nodes for most tasks
     • 2-4x improvement over {GATK, samtools, Picard} on single node
     Analysis run using Amazon EC2; single node was hs1.8xlarge, cluster was m2.4xlarge
     Scripts available at https://www.github.com/fnothaft/bdg-recipes.git, “sigmod” branch
  11. ADAM: Implementation
     • 27k LOC (94% Scala)
     • Apache 2 licensed OSS
     • 33 contributors across 12 institutions
  12. BDG: ADAM’s Ecosystem
     • ADAM: Core API + CLIs
     • bdg-formats: Data schemas
     • RNAdam: RNA analysis on ADAM
     • avocado: Distributed local assembler
     • PacMin: Long read assembly
     • eggo: Datasets
  13. Downstream focus: Genome Resequencing
     • We’re working on two approaches:
       • avocado: find variants via local reassembly
       • PacMin: use long reads to find variants via de novo assembly
     • We’ll focus on avocado today
  14. What are the challenges?
     • For accurate INDEL discovery, we want to reassemble variants, but reassembly is expensive
     • We need to statistically integrate over a large collection of samples to discover low frequency variants
     • The reference genome is not always representative
  15. avocado performs efficient de Bruijn reassembly
     • Several high accuracy variant callers (GATK, Platypus, Scalpel) reassemble reads aligned at genomic regions
     • Typically use a de Bruijn graph: nodes are k-mers, and edges represent observed transitions between k-mers
     Figure: the sequence ACACTGCACT split into its 3-mers (ACA, CAC, ACT, CTG, TGC, GCA, CAC, ACT) and the resulting graph over the nodes ACA, CAC, ACT, CTG, TGC, GCA
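The graph construction described on this slide can be sketched as follows, using the slide's own example sequence with k = 3. This is a minimal illustration, not avocado's code; the function names `kmers` and `de_bruijn_edges` are chosen here for the example.

```python
from collections import Counter


def kmers(sequence, k):
    """All overlapping substrings of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_edges(reads, k):
    """Count observed transitions between consecutive k-mers.

    Nodes are k-mers; an edge (a, b) means k-mer b was seen immediately
    after k-mer a in some read. The count is the number of observations.
    """
    edges = Counter()
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            edges[(a, b)] += 1
    return edges


edges = de_bruijn_edges(["ACACTGCACT"], 3)
for (a, b), count in sorted(edges.items()):
    print(f"{a} -> {b} (seen {count}x)")
```

Because CAC and ACT each occur twice in the sequence, the graph has six distinct nodes and contains a cycle back through CAC.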
  16. Efficient Local Reassembly
     • Current methods elaborate all paths through the graph, perform O(hn) realignments at O(l_r l_h) cost, score O(h^2) haplotype pairs
     • Instead, identify “bubbles” and emit statistics directly from the graph:
       • Eliminate expensive realignment!
       • Variant alleles are provably canonical.
     Figure: the graph ACA → CAC → ACT branches into two paths, CTG → TGC → GCA and CTT → TTC → TCA. Labels: Reference: CTGA; Bubble: CTTA
     h: number of haplotypes (paths), n: number of reads, l_r: read length, l_h: haplotype length
     Proofs that alleles are canonical are too long for slides; will gladly share offline.
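One way to picture reading variants off bubbles, without any realignment, is the sketch below. It is not avocado's algorithm and carries none of its canonicality guarantees: it handles one read at a time, assumes every reference k-mer is unique, and the function name `find_bubbles` and the example sequences are invented for illustration.

```python
def kmers(sequence, k):
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def find_bubbles(reference, read, k):
    """Return (ref_position, ref_allele, alt_allele) for each bubble.

    A bubble is a run of read k-mers absent from the reference, flanked
    on both sides by k-mers that are present in the reference.
    """
    ref_kmers = kmers(reference, k)
    ref_pos = {kmer: i for i, kmer in enumerate(ref_kmers)}
    if len(ref_pos) != len(ref_kmers):
        raise ValueError("this sketch requires unique reference k-mers")

    read_kmers = kmers(read, k)
    bubbles = []
    i = 0
    while i < len(read_kmers):
        if read_kmers[i] in ref_pos:
            i += 1
            continue
        j = i                                   # scan to the end of the non-reference run
        while j < len(read_kmers) and read_kmers[j] not in ref_pos:
            j += 1
        if i > 0 and j < len(read_kmers):       # anchored on both sides
            left = ref_pos[read_kmers[i - 1]]   # last anchor before the bubble
            right = ref_pos[read_kmers[j]]      # first anchor after the bubble
            ref_gap = right - (left + k)
            read_gap = j - (i - 1 + k)
            overlap = max(0, -ref_gap, -read_gap)   # anchors may overlap at deletions
            ref_start = left + k - overlap
            bubbles.append((ref_start,
                            reference[ref_start:right],
                            read[i - 1 + k - overlap:j]))
        i = j
    return bubbles


REFERENCE = "ACGTTGCAAT"
print(find_bubbles(REFERENCE, "ACGTCGCAAT", 3))    # substitution
print(find_bubbles(REFERENCE, "ACGTTAGCAAT", 3))   # insertion
print(find_bubbles(REFERENCE, "ACGTTCAAT", 3))     # deletion
```

Each allele is recovered from the k-mer anchors alone, so no read is ever aligned against a candidate haplotype.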
  17. Otherwise,
     [Slide filled with a dense listing of several hundred 19-mers taken from reads, many containing N bases; listing omitted]
     This is 200 bases from…
  18. performing reassembly
     [Slide filled with a second dense listing of 19-mers taken from reads; listing omitted]
CGCCAGGCATTAAATTCAA GCCAGGCATTAAATTCAAG AGCTCCAAGAGACAAGTTT GCTCCAAGAGACAAGTTTT CTCCAAGAGACAAGTTTTG TATTCCTTTGACAGGCCGC ATTCCTTTGACAGGCCGCC CAGTGTATATTGGTGGCTA AGTGTATATTGGTGGCTAT TACCACTGACAGGCCGCCA ACCACTGACAGGCCGCCAG ACGGGCCGCCAGTCATTAA CGGGCCGCCAGTCATTAAA GAAAAAAAGGCAGCCTAGG AAAAAAAGGCAGCCTAGGAAAAAAAAGGCAGCCTAGGC AAAAAAGGCAGCCTAGGAGAAAAAAGGCAGCCTAGGCG CAGCCTCGGAGAAAGCAAC TCAAACTCCAAGAGACAAA CAAACTCCAAGAGACAAAC AAACTCCAAGAGACAAACT CTATCCCCCTGACGGCCCG TATCCCCCTGACGGCCCGC ATCCCCCTGACGGCCCGCC GTGATTTGCCCAGGAGGGG TGATTTGCCCAGGAGGGGG GATTTGCCCAGGAGGGGGC ACCTTTCACACTTGCCTCA CCTTTCACACTTGCCTCAG TCAGTGTATATATGAGGCT CCAAGAGACAAACTCTTGA AGTTATATTTTGAAAAATC GTTATATTTTGAAAAATCA TTATATTTTGAAAAATCAT AAGCTTCAACCCTGGCCTC GGACGGCCCGCCAGTCATT GACGGCCCGCCAGTCATTA AAGGCAGCCTCGGAGAAAG AGGCAGCCTCGGAGAAAGC GGCAGCCTCGGAGAAAGCA CCACAGTGAAAATTTGTGC CACAGTGAAAATTTGTGCC ACAGTGAAAATTTGTGCCT GAAAGTACCTTCCCCTAAA AAAGTACCTTCCCCTAAAG GTTCCTTCCCCTAAAGCTT TTCCTTCCCCTAAAGCTTT ACTTGCCTAGGTGAATATA TCCTTCCCCTAAAGCTTTC GAAGATAGACTCAAGGGAC AAGATAGACTCAAGGGACA AGATAGACTCAAGGGACAA AATNNTCAGAAAACTGAGA ATNNTCAGAAAACTGAGAA TTCACTCTTGCCTCAGTGT TCACTCTTGCCTCAGTGTA CACTCTTGCCTCAGTGTAT AGGTGGCTATTCCTTTGAC GGTGGCTATTCCTTTGACA GTGGCTATTCCTTTGACAG AAGGATAGACTTTCTAGAA GGTGGGTATCCCGCTGACA GTGGGTATCCCGCTGACAG AAGTTTTGGAAAAAAAGGC AGTTTTGGAAAAAAAGGCA GTTTTGGAAAAAAAGGCAG TTGCCTAGGTGAATATAGG TGCCTAGGTGAATATAGGT NNNNNNANATTNTNANAAA NNNNNANATTNTNANAAAT NNNNANATTNTNANAAATN TCAGTGTATATATGGGGCT CAGTGTATATATGGGGCTA AGTGTATATATGGGGCTAT AGGATAGACTTTCTAGAAA GGATAGACTTTCTAGAAAG TCCCCCTAAAGCTTTCACA CCCCCTAAAGCTTTCACAC CCCCTAAAGCTTTCACACT GCGCGAACCCACGGACAGG CGCGAACCCACGGACAGGC GCGAACCCACGGACAGGCC TGGGTATACCACTGACAGG GGGTATACCACTGACAGGC TGCCTATACCACGGACGGC GCCTATACCACGGACGGCC CAGAAAACTGAGAATCAAG GTGCCTACCCCACTCCCGG ATTCAAACTCCAAGAGACA TTCAAACTCCAAGAGACAA TGGAAAAAAAGGCAGCCTA GGAAAAAAAGGCAGCCTAG TCATTAAATTCAAGCTCCA TCCGGACCCCCAGTCATAA GCTTTCACACTTGCCTCGG CTTTCACACTTGCCTCGGT TTTCACACTTGCCTCGGTG CTTCCCTCAGTGTATATAT TTCCCTCAGTGTATATATG 
TCCCTCAGTGTATATATGT GTGAAAATTTGTGCCTACC TGAAAATTTGTGCCTACCC GGTATACCACTGACAGGCC AATGCAAGCTCCAAGAGAC AACCCTGGCCTCAGTGTAT ACCCTGGCCTCAGTGTATA CCCTGGCCTCAGTGTATAT AGTACCTTCCCCTAAAGCT GTACCTTCCCCTAAAGCTT TACCTTCCCCTAAAGCTTT ATACCACGGACGGCCCGCC TACCACGGACGGCCCGCCA ACCACGGACGGCCCGCCAG CAGTGAAAATTTGTGCCTA AGTGAAAATTTGTGCCTAC CCCACTGACGGGCCGCCAG CCACTGACGGGCCGCCAGT GCCGCCAGTCATTAAATTC CCGCCAGTCATTAAATTCA CGCCAGTCATTAAATTCAA ATTTTGAAAAAACATCAGA TTTTGAAAAAACATCAGAA TTTGAAAAAACATCAGAAA GAAAAAACATCAGAAAACT AAAAAACATCAGAAAACTG AAAAACATCAGAAAACTGA GTCATTAAATTCAACCACC TCATTAAATTCAACCACCA GGACAAAGCAGTAAAATGT CTATACCACTTACGGGCCG TATACCACTTACGGGCCGC ATACCACTTACGGGCCGCC TCATTAAATTCAAACTCCA AATATAGGTGGGTATCCCG ATATAGGTGGGTATCCCGC CCTGGCCTCAGTGTATATA GAGAAAGCAACCGGATTTT AGAAAGCAACCGGATTTTT GAAAGCAACCGGATTTTTC ATAGATTTTCTAGAAAGTT TAGATTTTCTAGAAAGTTC AGATTTTCTAGAAAGTTCC GATAGACTTTCTAGAAAGT GTATATATGTGGGTATACC CCTAAAGCTTCAACCCTGG CCCCTAAAGCTTTCACAGT CCCTAAAGCTTTCACAGTT CCTAAAGCTTTCACAGTTG TCCAAGAGACAAACTCTTG TCCCCTAAAGCTTTCACTC CCCCTAAAGCTTTCACTCT CCCTAAAGCTTTCACTCTT AGAATTTCTAGAAAGTTCA GAATTTCTAGAAAGTTCAT AATTTCTAGAAAGTTCATTATTTTCTAGAAAGTTCCTT TTTTCTAGAAAGTTCCTTC CTAAGAATCAAGGATAGAA TAAGAATCAAGGATAGAATTAAGAATCAAGGATAGAAG AAGAATCAAGGATAGAAGT CTCAGTGTATATTGGTGGC TCAGTGTATATTGGTGGCT GCTATCCCCCTGACGGCCC TAAAGCTTCCACACTTGCC CTTGAAAAAAAGCCAGCCT CAGTAAAATGTGTAATTTC AGTAAAATGTGTAATTTCA NNNANATTNTNANAAATNN NNANATTNTNANAAATNNT NANATTNTNANAAATNNTC CCTAGGAGAAAGAAACATG TCTTCCCACTTCCGGACCC CTTCCCACTTCCGGACCCC TNANAAATNNTCAGAAAAC NANAAATNNTCAGAAAACT ANAAATNNTCAGAAAACTG GACGGGCCGCCAGTCCTTA ACGGGCCGCCAGTCCTTAA CGGGCCGCCAGTCCTTAAA TGTAAATATGTGGCTATAC GTAAATATGTGGCTATACC TAAATATGTGGCTATACCA CCCTGACGGCCCGCCAGTC CCTGACGGCCCGCCAGTCA CTGACGGCCCGCCAGTCAT CTAAAGCTTTCACAGTTGA TAAAGCTTTCACAGTTGAC CTACACCACCGACGGCCCG TACACCACCGACGGCCCGC ACACCACCGACGGCCCGCC TCATTCCCCTAAAGCCTTC CATTCCCCTAAAGCCTTCA ATTCCCCTAAAGCCTTCAC TGTTATATTTTGAAAACTC GTTATATTTTGAAAACTCA ATTAAATTCAAGCCCCAAG TTAAATTCAAGCCCCAAGA TAAATTCAAGCCCCAAGAG 
CCCCTAAAGCTTCCACACT GGATAGATTTTCTAGAAAG GATAGATTTTCTAGAAAGT GCTATACCACTGACAGGCC ACTCTTGAAAAAAAGGCAG CTCTTGAAAAAAAGGCAGC CACACTTGCCTCAGTGTAT ACACTTGCCTCAGTGTATA CACTTGCCTCAGTGTATAT CAGTCTTTACTGGTGCTCT AGTCTTTACTGGTGCTCTTATGTAACAAATGTGATTTG TGTAACAAATGTGATTTGC GTAACAAATGTGATTTGCC AAAAGGCAGCCTCGGAGAA AAAGGCAGCCTCGGAGAAA GATAGAATTTCTAGAAAGT ATAGAATTTCTAGAAAGTT TAGAATTTCTAGAAAGTTC AAGTACCTTCCCCTAAAGC CAAGAGACAAACTCTTGAA GCTATCCCCCTGAGGGGCC CTATCCCCCTGAGGGGCCG TCCAGTCATTAAATTCAAG CCAGTCATTAAATTCAAGC CAGTCATTAAATTCAAGCTCAGTCATTAAATTCAAGCC CCCACGGACAGGCCGCCAG AACTCTTGAAAAAAAGGCA TTCCACACTTGCCTAGGTG TCCACACTTGCCTAGGTGA CCCCTAAGGCTTTCACACT CCCTAAGGCTTTCACACTT CTTGCCTCAGTGTAAATAT TTGCCTCAGTGTAAATATG TGCCTCAGTGTAAATATGT CTCTTGAAAAAAAGCCAGC TCTTGAAAAAAAGCCAGCC AACCACCAAGAGACAAACT ACCACCAAGAGACAAACTC CTGAGAATCAAGGATAGAA TGAGAATCAAGGATAGAAT TGGCTATCCCCCTGACAGG GGCTATCCCCCTGACAGGC GCTATCCCCCTGACAGGCC AGCTTTCACACTTGCCTCA TATACTTTGAAAAATCATC CTTGCCTAGGTGAATATAG TTGAAAAAACATCAGAAAA TGAAAAAACATCAGAAAAC TACCACTTACGGGCCGCCA GCCCTAAAGATTTCACACT GCCGCCAGTCATTAAATGC CCGCCAGTCATTAAATGCA CGCCAGTCATTAAATGCAA GAAATTTCCTTCCCCTAAA TTGCCCAGGAGGGGGCGTC TGCCCAGGAGGGGGCGTCC GCCCAGGAGGGGGCGTCCA AACAAACTCTTGAAAAAAA ACAAACTCTTGAAAAAAAG AAATCATCAGAAAACTGAG TCCCCTAAAGCTTTCACACTCCCCTAAAGCTTTCACAG CTGGTGCTCTTCCCACTTC TGGTGCTCTTCCCACTTCC TCCAAGAAACAAACTCTTG AAGCTTCCACACTTGCCTA AAAGCAACATGATTTTTCTAAAGCAACATGATTTTTCA GATAGACTCAAGGGACAAA TGTGTAATTTCATGAGTGG GACGGGCCGCCAGTCATTA AAGCTTTCACACTTGCCTC AGCTTTCACACTTGCCTCG GGGGGCGTCCAGTCATTAA GGGGCGTCCAGTCATTAAA GGAGATAGCAACATGATTT GAGATAGCAACATGATTTT AGATAGCAACATGATTTTT TAGGAGAAAGCAGCATGAT TGCCCTCCCCCTAAAGCTT GCCCTCCCCCTAAAGCTTC CCCTCCCCCTAAAGCTTCA TGGGGTCTCCAGTCATTAA GGGGTCTCCAGTCATTAAA GGGTATCCCGCTGACAGGC GGTATCCCGCTGACAGGCC GTATCCCGCTGACAGGCCC AAAGTTCTTTGCCCTAAAG AAGTTCTTTGCCCTAAAGA AGTTCTTTGCCCTAAAGAT GTTCTAGAAAGTTCTTTGC GCCCGCCAGGCATTAAATT GCCTCAGTGTATATATGGG CCTCAGTGTATATATGGGG CTCAGTGTATATATGGGGC ACCTTCCCCTAAAGCTTTC AGCCAGCCTAGGAGAAAGC GGCCGCCAGTCATTAAATG 
CAGCCTAGGAGATAGCAAC AGCCTAGGAGATAGCAACA GAGGGGGCGTCCAGTCATT AGGGGGCGTCCAGTCATTA ATTTTGAAAACTCATCAGA TTTTGAAAACTCATCAGAA TTTGAAAACTCATCAGAAA ATTTGCCCAGGAGGGGGCG TTTGCCCAGGAGGGGGCGT AAATTCAAGCTCCAAGAAA AATTCAAGCTCCAAGAAAC GGACAGGCCGCCAGTCATT AACTCCAAGAGACAAACTC ACTCCAAGAGACAAACTCT GCTATCCCACTGACGGGCC CTATCCCACTGACGGGCCG TATCCCACTGACGGGCCGC CAGGCCCCCAGTCATTAAA CGTAAAGCTTTCACACTTG TAATTTCATGAGTGGGGTC AATTTCATGAGTGGGGTCT GGGCCGCCAGTCCTTAAAT ATGTGGCTATCCCACTGAC TGTGGCTATCCCACTGACATGTGGCTATCCCACTGACG GTGGCTATCCCACTGACAG GCTATACCACTTACGGGCC ACAAAGCAGTAAAATGTGT TATATGTGGCTATACCACT ATATGTGGCTATACCACTGATATGTGGCTATACCACTT AAAGTTCATTCCCCTAAAG ATCCCGCTGACAGGCCCCC GTTCTTTGCCCTAAAGATT CAGCCTAGGAGAAAGAAAC AGAATCAAGGATAGAAGTT TCCAAGAGACAAGTTTTGG GTATATATGTGGCTATCCC TATATATGTGGCTATCCCC ATATATGTGGCTATCCCCC TGACGGGCCGCCAGTCATT CTTGCCACAGTGAAAATTT TTGCCACAGTGAAAATTTG TGCCACAGTGAAAATTTGT CCCTCAGTGTATATATGTG CCTCAGTGTATATATGTGG AAAACTAAGAATCAAGGAT AAACTAAGAATCAAGGATA AACTAAGAATCAAGGATAG GATAGCAACATGATTTTTC GCCTCAGTGTAAATATGTG CCTCAGTGTAAATATGTGG CACTGACAGGCCGCCAGTC ACTGACAGGCCGCCAGTCA CTGACAGGCCGCCAGTCAT CCCTGAGGGGCCGCCAGTC CCTATACCACGGACGGCCC CTATACCACGGACGGCCCG ATGTGACTACACCACCGAC TGTGACTACACCACCGACG GTGACTACACCACCGACGG TTCCTTTGACAGGCCGCCA TCCTTTGACAGGCCGCCAG AGAGACAAGTTTTGGAAAA ANATTNTNANAAATNNTCA AATCAAGGATAGAATTTCT CCCACTTCCCTCAGTGTAT TATCCCGCTGACAGGCCCC TTAACACTTGCCACAGTGA TAACACTTGCCACAGTGAA AACACTTGCCACAGTGAAA CTCCAGTCATTAAATTCAA TATACCACGGACGGCCCGC NNNNNNTTTCTGAATGTTT NNNNNTTTCTGAATGTTTC NNNNTTTCTGAATGTTTCT GTCATTAAATTCAAGCTCA TCATTAAATTCAAGCTCAA CATTAAATTCAAGCTCAAACATAAAATTCAAGCTCCAA ATAAAATTCAAGCTCCAAG TAAAATTCAAGCTCCAAGA CTATCCCCCTGACAGGCCG TATCCCCCTGACAGGCCGC CACTTACGGGCCGCCAGTC ACTTACGGGCCGCCAGTCA CTTACGGGCCGCCAGTCAT TGTTATATTTTGAAAAATC GAATCAAGGATAGACTTTC AATCAAGGATAGACTTTCT AGGAGAAAGCAACATGATT GGAGAAAGCAACATGATTT NAAATNNTCAGAAAACTGA AAATNNTCAGAAAACTGAG GGCTATACCACTTACGGGC CATCAGAAAACTAAGAATC ATCAGAAAACTAAGAATCA TCAGAAAACTAAGAATCAA TTCCTTCCCCTAAACCTTT TCCTTCCCCTAAACCTTTC 
CCTTCCCCTAAACCTTTCA TGGCTATCCCACTGACAGG AATTTCTAGAAATTTCCTT ATATATGAGGCTATACCAC CCCGGGCCGCCAGTCATTA CCGGGCCGCCAGTCATTAA GAATATAGGTGGGTATCCC GCTTTAACACTTGCCACAG CTTTAACACTTGCCACAGT TTTAACACTTGCCACAGTG GAAAACTGAGAATCAAGGA AAAACTGAGAATCAAGGAT CAGTCATTAAATTCAACCA AGTCATTAAATTCAACCAC CTCCAAGAAACAAACTCTT AACCTTTCACACTTGCCTC CGCAGCCTAGGAGATAGCA GCAGCCTAGGAGATAGCAA AAAACATCAGAAAACTGAG AAACATCAGAAAACTGAGA AACATCAGAAAACTGAGAA CCAGTCATAAAATTCAAGC TATATTGGTGGCTATCCCC ATATTGGTGGCTATCCCCC TCGGTGTATATATGTGGCT CGGTGTATATATGTGGCTA GGTGTATATATGTGGCTAT TGTTATATTTTGAAAAAAC AGAAAGTTCCTCCCCCTAA GCTCAAAGAGACAAACTCT AGCCTAGGAGAAAGCAACA GACTACACCACCGACGGCC ACTACACCACCGACGGCCC TCCCACTGACGGGCCGCCA AGTCATAAAATTCAAGCTC GTCATAAAATTCAAGCTCC TCATAAAATTCAAGCTCCA TCCACACTTGCCTCAGTGT ATTTTGAAAAATCATCAGA TTTTGAAAAATCATCAGAA TTTGAAAAATCATCAGAAA TAGAAGTTCTAGAAAGTTC AGAAGTTCTAGAAAGTTCT GAAGTTCTAGAAAGTTCTT ATGTGGCTATACCACTGAC TGTGGCTATACCACTGACGTGTGGCTATACCACTGACA GTGGCTATACCACTGACGG AGTCATTAAATTCAAGCTC GTCATTAAATTCAAGCTCC CTAAAGCTTTCCCACTTCC ATCCCACTGACGGGCCGCC CCTCCCCCTAAGGCTTTCA CGAACCCACGGACAGGCCG GATTTTCTAGAAAGTTCCT AGTGTATATATGTGGGTAT GTGTATATATGTGGGTATA TGTATATATGTGGGTATAC AAATATGTGGCTATACCAC NNNTTTCTGAATGTTTCTT NNTTTCTGAATGTTTCTTA TATGTGGCTATACCACTGATATGTGGCTATACCACTTA CACCACCGACGGCCCGCCA NATTNTNANAAATNNTCAG GAGAATCAAGGATAGAATT GGCTATCCCACTGACAGGC GCTATCCCACTGACAGGCC CTATCCCACTGACAGGCCG TAGAAAGTTCCTCCCCCTA CCTCCCCCTAAAGCTTCAA CTCCCCCTAAAGCTTCAAC TCCCCCTAAAGCTTCAACC GGTGGCTATCCCCCTGAGG CCACGGACGGCCCGCCAGT TAGAAAGTTCTTTGCCCTA AGAAAGTTCTTTGCCCTAA GAAAGTTCTTTGCCCTAAA ATTCAAGCCCCAAGAGACA TTCAAGCCCCAAGAGACAA TCAAGCCCCAAGAGACAAA TCATTAAATTCAAGCCCCA CATTAAATTCAAGCCCCAA TATATGAGGCTATACCACT ACTAAGAATCAAGGATAGA CTAAGAATCAAGGATAGAC TAAGAATCAAGGATAGACT TGGTGGCTATCCCCCTGAG AGTCATTAAATTCAAGCCC GTCATTAAATTCAAGCCCC CCAGGCATTAAATTCAAGC AAGTTCATTCCCCTAAAGC CCCTAAAGCTTTCACACTT CCTAAAGCTTTCACACTTG CTAAAGCTTTCACACTTGC GTAAGAACTGCCCTCCCCC AAACTCTTGAAAAAAAGGC CCCTAAACCTTTCACACTT CCTAAACCTTTCACACTTG CTAAACCTTTCACACTTGC 
TATCCCCCTGAGGGGCCGC ATCCCCCTGAGGGGCCGCC AAAAGGCAGCCTAGGCGAA AAAGGCAGCCTAGGCGAAA AAGGCAGCCTAGGCGAAAG GCTCCAAGAGACAAACCCT TTTCACTCTTGCCTCAGTG TAAACCTTTCACACTTGCC AAACCTTTCACACTTGCCT TATATGTGGCTATCCCCCT ATATGTGGCTATCCCCCTG TATGTGGCTATCCCCCTGA TTCCCCTAAACCTTTCACA TCCCCTAAACCTTTCACAC CCCCTAAACCTTTCACACT CCACACTTGCCTAGGTGAA AGACTCAAGGGACAAAGCA GACTCAAGGGACAAAGCAG GAGGAGAAAGCAACCGGAT AGGAGAAAGCAACCGGATT TATATTTTGAAAAATCATC ATATTTTGAAAAATCATCA TATTTTGAAAAATCATCAG CCAGTCATTAAATTCAAAC TTCTAGAAAGTTCCTTCCC ACCACTGACGGGCCGCCAG CCTCCCCGTAAAGCTTTCA GAAAGAAACATGATTTTTC ATTNTNANAAATNNTCAGA CCCCCTAAAGCTTTCCCAC CCCCTAAAGCTTTCCCACT CCCTAAAGCTTTCCCACTT AAGAATCAAGGATAGACTT AGAATCAAGGATAGACTTT GTCATTAAATGCAAGCTCC TCATTAAATGCAAGCTCCA TCAGTGTAAATATGTGGCT CAGTGTAAATATGTGGCTA AGTGTAAATATGTGGCTAT CTATACCACTGACGGGCCG TATACCACTGACGGGCCGC GAGACAAGTTTTGGAAAAA GGTGAATATAGGTGGGTAT GTGAATATAGGTGGGTATC TGAATATAGGTGGGTATCC AACTCTTGAAAAAAAGCCA ACTCTTGAAAAAAAGCCAG TGCCTCGGTGTATATATGT GCCTCGGTGTATATATGTG CCTCGGTGTATATATGTGG AATATGTGGCTATACCACT AAAAAGGCAGCCTAGGAGA GAATTTCTAGAAAGTTCCT ACCCACGGACAGGCCGCCA AATTTCTAGAAAGTCCCTC ATTTCTAGAAAGTCCCTCC CAAACTCTTGAAAAAAAGCCAAACTCTTGAAAAAAAGG AGGCAGCCTAGGAGAAAGC ATATGTGACTACACCACCG TATGTGACTACACCACCGA GTTATATTTTGAAAAAACA ATTTCTAGAAAGTTCATTC TTTCTAGAAAGTTCATTCC TTCACACTTGCCTCGGTGT AGCCTAGGAGAAAGAAACA GCCTAGGAGAAAGAAACAT GTGTAAATATGTGGCTATA AGGCAGCCTAGGCGAAAGC GGCAGCCTAGGCGAAAGCA TCATCAGAAAACTAAGAAT TAGGAGATAGCAACATGAT AGGAGATAGCAACATGATT ATATGGGGCTATACCACTG TATGGGGCTATACCACTGA ATGGGGCTATACCACTGAC GACAAAGCAGTAAAATGTG CTAGAAAGTTCATTCCCCTCTAGAAAGTTCCTCCCCCT AGAGACAAACCCTTGAAAA NNNNNNNTTTCTGAATGTT GACGGCCCGCCAGGCATTA GTCCAGTCATTAAATTCAA AGGAGGGGGCGTCCAGTCA GGAGGGGGCGTCCAGTCAT TTATATTTTGAAAAAACAT GGAGAAAGCAACCGGATTT TGCCTACCCCACTCCCGGG GCCTACCCCACTCCCGGGC TGAAAACTCATCAGAAAAC GAAAACTCATCAGAAAACT AAAACTCATCAGAAAACTG AGTCAATATATGTGACTAC GTCAATATATGTGACTACA TCAATATATGTGACTACAC AAGTTCTAGAAAGTTCTTT AGTTCTAGAAAGTTCTTTG TTGCCTCGGTGTATATATG AAACTCATCAGAAAACTGA TGAAAAAAAGGCAGCCTAG 
TAACAAATGTGATTTGCCC ATCCCCCTGACAGGCCGCC AAAGGCAGCCTAGGAGAAA AAGGCAGCCTAGGAGAAAG AAATTCAAGCTCAAAGAGA AATTCAAGCTCAAAGAGAC ATTCAAGCTCAAAGAGACA TCCCACCAGTCATTAAATT CCCACCAGTCATTAAATTC CCACCAGTCATTAAATTCA AGAATTTCTAGAAAGTTCC GGAGAAAGAAACATGATTT GAGAAAGAAACATGATTTT TACCACGGACAGGCCGCCA CAACCACCAAGAGACAAAC AAAAAAGCCAGCCTAGGAG AAATTCAAGCCCCAAGAGA AATTCAAGCCCCAAGAGAC GTCTCCAGTCATTAAATTC TCTCCAGTCATTAAATTCA CCTCAGTCAATATATGTGA CTCAGTCAATATATGTGAC TCAGTCAATATATGTGACT AGAAAACTGAGAATCAAGG ATTTCTAGAAATTTCCTTC TTTCTAGAAATTTCCTTCC TAAATGCAAGCTCCAAGAG GAAAGCAGCATGATTATTC TTTCTAGAAAGTCCCTCCC CCCTCCCCCTAAGGCTTTC GCTTTCCCACTTCCCTCAG TGGGTATCCCGCTGACAGG CACCAGTCATTAAATTCAA ACCAGTCATTAAATTCAAC TGACGGGCCGCCAGTCCTT GTCCCTCCCCCTAAGGCTT TCCCTCCCCCTAAGGCTTT TCCCCTAAAGCTTCCACAC CCTAAAGCTTTCCCACTTC ATGTGGCTATCCCCCTGAC TGTGGCTATCCCCCTGACATGTGGCTATCCCCCTGACG GTGGCTATCCCCCTGACAG AATTCAACCACCAAGAGAC AAAGCAGCATGATTATTCA TAAGAACTGCCCTCCCCCT AAGAACTGCCCTCCCCCTA GTGTATATATGGGGCTATA GCTTTCACAGTTGACTCAG CTTTCACAGTTGACTCAGT TTTCACAGTTGACTCAGTG AGCTTTCCCACTTCCCTCA CCTAAAGCTTTCACTCTTG CCCAGTCATTAAATTCAAG CTAAAGCTTTCACTCTTGC TAAGGCTTTCACACTTGCC AAGGCTTTCACACTTGCCT AGGCTTTCACACTTGCCTC TTCTAGAAAGTCCCTCCCC GGCCCCCAGTCATTAAATT GCCCCCAGTCATTAAATTC CCCCCAGTCATTAAATTCA ACCACTTACGGGCCGCCAG ATCAAGGATAGAATTTCTA TTCTAGAAATTTCCTTCCC TCTAGAAATTTCCTTCCCC GAACTGCCCTCCCCCTAAA CAGGAGGGGGCGTCCAGTC CTCAGTGTAAATATGTGGC GGCTTTCACACTTGCCTCA GCCACAGTGAAAATTTGTG AAATATGTGCCTATACCAC AATATGTGCCTATACCACG TTCAAGCTCCAAGAGACAA CTAAGGCTTTCACACTTGC TCCCCCTGACAGGCCGCCA CCCCCTGACAGGCCGCCAG TACCACTGACGGGCCGCCA AATCATCAGAAAACTGAGA GAGACAAACTCTTGAAAAA CGACGGCCCGCCAGGCATT ATATGAGGCTATACCACTG CAAGCCCCAAGAGACAAAC CCCCACTCCCGGGCCGCCA ATATGTGGCTATCCCACTG TATGTGGCTATCCCACTGA AAATGCAAGCTCCAAGAGA CTCAGTGTATATATGTGGGCTCAGTGTATATATGTGGC AGGATAGAAGTTCTAGAAA AAGAGACAAACCCTTGAAA TTGAAAACTCATCAGAAAA CAAGCTCCAAGAAACAAAC TAGGAGAAAGCAACATGAT TCCCCCTGAGGGGCCGCCA CGGACAGGCCGCCAGTCAT AGGCCCCCAGTCATTAAAT CCTACCCCACTCCCGGGCC CCAGTCATTAAATTCAACC ATAGACTCAAGGGACAAAG 
TAGACTCAAGGGACAAAGC CTCGGTGTATATATGTGGC TCAGTGTATATATGTGGGT CAGTGTATATATGTGGGTA TAGGTGGCTATTCCTTTGA CCCCAGTCATTAAATTCAA GTTTATATATGTGGCTATC TTTATATATGTGGCTATCC AGACAAACTCTTGAAAAAA GACAAACTCTTGAAAAAAA CACACTGGCCTCAGTGTAT ACACTGGCCTCAGTGTATA GCCAGTCATTAAATTCAAAGCCAGTCATTAAATTCAAG TTAAAGCTTCCACACTTGC ATTAAATTCAAGCTCAAAG AAACTCTTGAAAAAAAGCC TAGGTGAATATAGGTGGGT CCGACGGCCCGCCAGGCAT TGGGGCTATACCACTGACA GCCAGTCATTAAATGCAAG CCAGTCATTAAATGCAAGC CAGTCATTAAATGCAAGCT TTAAATTCAAGCTCAAAGA TAAATTCAAGCTCAAAGAG TTGAAAAATCATCAGAAAA CGGAGAAAGCAACATGATT CCCCTAAAGCCTTCACACT CCCTAAAGCCTTCACACTT CCTAAAGCCTTCACACTTG AAAAGGCAGCCTAGGAGAA AAAGCTTTCACAGTTGACT AAGCTTTCACAGTTGACTC AGCTTTCACAGTTGACTCA TATATGTGGCTATCCCACTTATATGGGGCTATACCACTTATATGTGACTACACCACC AGTCATTAAATGCAAGCTC TCACACTTGCCTCGGTGTA TCAAGCTCAAAGAGACAAA CCTCAGTGTATATTGGTGG AAAAAGCCAGCCTAGGAGA CCACTTACGGGCCGCCAGT CAGTCAATATATGTGACTA CCCCCCAGTCATTAAATTC AACCCACGGACAGGCCGCC CTTCCCCTAAACCTTTCAC CCAGTCTTTACTGGTGCTC TTCACACTTGCCTCAGTGT TCACACTTGCCTCAGTGTA GTCCCACCAGTCATTAAAT ATACCACTGACGGGCCGCC AAGCCCCAAGAGACAAACC CCCCCTGAGGGGCCGCCAG CCCCTGAGGGGCCGCCAGT GCAGCCTCGGAGAAAGCAA TGACGGCCCGCCAGTCATT AGCAGTAAAATGTGTAATT TCTAGAAAGTTCCTCCCCC AACCTTTCACACTGGCCTC ACCTTTCACACTGGCCTCA CCTTTCACACTGGCCTCAG CTCAAAGAGACAAACTCTT GCAGTAAAATGTGTAATTT TTCCCCTAAAGCCTTCACA TCCCCTAAAGCCTTCACAC CCCAGGAGGGGGCGTCCAG CCAGGAGGGGGCGTCCAGT GTGCTCTTCCCACTTCCGG CGTCCCACCAGTCATTAAA GACCCCCAGTCATAAAATT AGGTGAATATAGGTGGGTA TGGCTATACCACTGACGGG CTATACCACTGACAGGCCG AGAAAGAAACATGATTTTT GAATCAAGGATAGAAGTTC AATCAAGGATAGAAGTTCT ATTCAAGCTCCAAGAGACA TTCTAGAAAGTTCATTCCC TCTAGAAAGTTCATTCCCC ACTTGCCTCAGTGTATATA GGGTCTCCAGTCATTAAAT GGTCTCCAGTCATTAAATT CGTCCAGTCATTAAATTCA TTATATTTTGAAAACTCAT TATATTTTGAAAACTCATC GATTTCACACTTGTGTCAT ATTTCACACTTGTGTCATT GGCGTCCCACCAGTCATTA GCGTCCCACCAGTCATTAA AAAGCTTCCACACTTGCCT AAGCTTCCACACTTGCCTC AGAACTGCCCTCCCCCTAA ACCGACGGCCCGCCAGGCA ATAGAATTTCTAGAAATTT ATCATCAGAAAACTGAGAA GAACCCACGGACAGGCCGC TCCCTTAAAGCTTCCACAC TATTTTGAAAACTCATCAG TCAGTGTATATATGTGGCT 
CAGTGTATATATGTGGCTA TTACGGGCCGCCAGTCATT GGATAGAATTTCTAGAAAT GATAGAATTTCTAGAAATT ATCAAGGATAGAAGTTCTA TCAAGGATAGAAGTTCTAG CAAGGATAGAAGTTCTAGA AAAAAGGCAGCCTAGGCGA TCTAGAAAGTTCCTTCCCC ATATTTTGAAAACTCATCA TGAGTGGGGTCTCCAGTCAACGGACAGGCCGCCAGTCA CCCGCCAGGCATTAAATTC TGTATATATGGGGCTATAC ACCCCCAGTCATAAAATTC TTCAAGCTCAAAGAGACAA CCCCTTGCCTCAGTGTATA GGATAGAATTTCTAGAAAG CATGTAACAAATGTGATTT GGTGCTCTTCCCACTTCCG AAGGATAGAAGTTCTAGAA CAGTCATAAAATTCAAGCT NNNNNNNANATTNTNANAA TGACTACACCACCGACGGC TACGGGCCGCCAGTCATTA CCCAGTCTTTACTGGTGCT CCTAAGGCTTTCACACTTG AGTAATTGTAAGAACTGCC CCCTCCCCGTAAAGCTTTC …a “simple” region of a human genome.
  19. 19. is expensive. [Figure: labels of overlapping 19-base k-mers, omitted]
AGCATTTGAGATGAAGCGT TAATCTGAGTAGGTATTTG AATCTGAGTAGGTATTTGA ATCTGAGTAGGTATTTGAG AGATGTGAGCCAATCTGAG GATGTGAGCCAATCTGAGG ATGTGAGCCAATCTGAGGA TGTGCGCCAATCTGTGGAA ATCTGAGATGAAGAGAAGG TCTGAGATGAAGAGAAGGC GAAGAGAAGGGTTTGCTGT ATGGGCAATAAATAACTTT GGAAGTATTTGAGATGAAG GAAGTATTTGAGATGAAGA AAGTATTTGAGATGAAGAG GAGCCATTCTGAGGAAGTT GCTAATCTGAGTAGGTATTGCCAATCTGAGGAAGTATT TTTCAGTGGGCAATAAATA AGGAAGTATTTGAGATGAA TAGGTATTTGAGATGAAGA GGGCAATAAATAACTTTTA GGCAATAAATAACTTTTAG GCAATAAATAACTTTTAGTGCAATAAATAACTTTTAGG CTTTCAATGGGCAGTAAAT GGAAATAGATGTGAGCCAA TTTTAGGAAAATAGATGTG GTTGATTGCCTTTATGAGG AAGAGAAGGGTTTGCTGTC AGAGAAGGGTTTGCTGTCT CGAGATATTGTTGTGCGCC GAGATATTGTTGTGCGCCA AGATATTGTTGTGCGCCAA GAGGAAGCATTTGAGATGA AGGAAGCATTTGAGATGAA TTGCTGTCTATGAGGAGTG TGCTGTCTATGAGGAGTGCTGCTGTCTATGAGGAGTGT CAATCTGAGGAAGTCTTTG AATCTGAGGAAGTCTTTGA ATCTGAGGAAGTCTTTGAG TCCAGGAAAAGGGCACCTG TTTCTTAGCTTTCAGTGGG ATCTGTGGAAGCATTTGAG GGAAATAGATGTGAGCTAA GAAATAGATGTGAGCTAAT ATAAAATGGTTTCTGAATG GAAGCGAAGGCTTTGCTGT TGGGCAATAAATAACTTTT GTTTCTAAATGTTTCTTAG AAATGGTTTCTGAATGTTT AATGGTTTCTGAATGTTTC AGGAAGTTTTTGAGATGAA GGAAGTTTTTGAGATGAAG TTCTCATAAAATGGTCTCT TTTGAGATGAAGCGTAGGC AGTGCATTAGAATAGAATC TTTAGGGAAATAGAAGTGA TTAGGGAAATAGAAGTGAG AGTGTATTAGAATAGAATC GGCAATAAAAAACTTTTAG GCAATAAAAAACTTTTAGG CAATAAAAAACTTTTAGGG ATAGATGTGAGCTAATCTG CTGAATGATTCTTAGGTTT GTTTCTTAGCTTTCAGTGG CATAAAATGGTCTCTGAAT ATAAAAAACTTTTAGGGAA TACTTTTCGAGATATTGTT TGTTGATTGCCTTTATGAG CATAAAATGGTTTTTGTAT AAGAGAAGGCTGTGCTGTC GGGCAATAAAAAACTTTTA GTTTCTTAGCTTCAATGGG AATAGATGTGAGCTAATCT GTAGGTATTTGAGATGAAG AGGAAAATAGATGTGAGCC CTGAGGAAGCATTTGAGAT TGAGGAAGCATTTGAGATG AGCGAAGGCTTTGCTGTCT AGCCATTCTGAGGAAGTTT CTAATCTGAGTAGGTATTT CTTTTAGGAAAATAGATGT TTAGAATAGAATTGCTCCA CAATAAATAACTTTTAGGG AATAAATAACTTTTAGGGA ATAAATAACTTTTAGGGAA GTGCATTAGAATAGAATCG ATATTGTTGTGCGCCAATC TATTGTTGTGCGCCAATCT TCTGAGGAAGTCTTTGAGA GAAGAGAAGGCTGTGCTGT AGCCAATCTGAGGAAGTAT GCCAATCTGAGGAAGTATC GATATTGTTGTGCGCCAAT TAAAATGGTTTCTGAATGT AAAATGGTTTCTGAATGTT CAATAAATAACTTTTAGTG GGTTTCTAAATGTTTCTTA 
TAGAATAGAATCGCTCCAG ATGGTTTTTGTATGTTTCT GAGGAGTGCATTAGAATAG AATAAAAAACTTTTAGGGA TTTTCGAGATATTGTTGTG TGTTTCTTAGCTTTCAGTG ATGTTTCTTAGCTTTCAGT GAAGAGAAGGCTTTGCTGT TTTCAATGGGCAGTAAATA GGTCACCTGTGTTGATTGC ATAGATGTGAGCCAATCTG TAGATGTGAGCCAATCTGA CTGCCTTTGATGTGTGCTT GAGCCAATCTGAGGAAGTA AAGCGAAGGCTTTGCTGTC AAATAGATGTGAGCTAATC GCACCTGTGTTGATTGCCT GTTTCAATGGGCATTAAAT AGAATAGAATCGCTCCAGG CTTTTCGAGATATTGTTGT ACTTTTCGAGATATTGTTG TGAAGAGAAGGCTTTGCTGTGAAGAGAAGGCTTTGCTT TGCCTTTGATGTGTGCTTT TTTCAATGGGCAATAAATT There are over 3000 20-mers,! and over 30 valid paths!
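To make the k-mer and path counts above concrete, here is a minimal sketch of how k-mers can be extracted from reads, linked into a graph, and how source-to-sink paths can be counted. It is not avocado's implementation: the function names (`kmers`, `build_graph`, `count_paths`) and the toy reads are hypothetical, and the path count assumes the graph has no cycles.

```python
from collections import defaultdict
from functools import lru_cache


def kmers(read, k):
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Link each k-mer to the k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            graph[km]  # touch the key so k-mers with no successor still appear
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


def count_paths(graph, source, sink):
    """Count distinct source-to-sink paths; assumes the graph is acyclic."""
    @lru_cache(maxsize=None)
    def walk(node):
        if node == sink:
            return 1
        return sum(walk(nxt) for nxt in graph[node])
    return walk(source)


# Two toy reads that differ by a single base in the middle.
reads = ["TAACGTTG", "TAACCTTG"]
graph = build_graph(reads, k=3)
n_kmers = len(graph)
n_paths = count_paths(graph, "TAA", "TTG")
```

With these two toy reads the single mismatch opens a "bubble" in the graph, so there are two paths between the shared first and last k-mers. In real data, sequencing errors and true variants multiply such bubbles, which is how a small region can yield thousands of k-mers and dozens of valid paths.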
  20. 20. Genotyping • Use sliding “window” traversal of genome to bucket sites • Currently use a model based on the samtools mpileup genotype likelihood and EM algorithms • Moving to monoallelic “allele graph” model
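As an illustration of the kind of genotype likelihood the samtools mpileup model refers to, the sketch below scores each possible genotype at a biallelic site from per-read observations. It is a simplified sketch under stated assumptions, not avocado's or samtools' code: the function name and input format are hypothetical, each read base is reduced to "matches reference or not" with an error probability strictly between 0 and 1, and the EM step mentioned on the slide is not shown.

```python
import math


def genotype_log_likelihoods(observations, ploidy=2):
    """Score each genotype g, where g is the number of reference alleles.

    observations: list of (is_ref, error_prob) pairs, one per read base.
    Returns a list of log-likelihoods indexed by g = 0 .. ploidy.
    """
    log_likelihoods = []
    for g in range(ploidy + 1):
        total = 0.0
        for is_ref, err in observations:
            if is_ref:
                # The read shows the reference base: either it was sampled
                # from a reference copy and read correctly, or from an
                # alternate copy and misread.
                p = ((ploidy - g) * err + g * (1.0 - err)) / ploidy
            else:
                p = ((ploidy - g) * (1.0 - err) + g * err) / ploidy
            total += math.log(p)
        log_likelihoods.append(total)
    return log_likelihoods


# Five reference and five alternate observations, each with 1% error.
mixed = [(True, 0.01)] * 5 + [(False, 0.01)] * 5
mixed_scores = genotype_log_likelihoods(mixed)
best_mixed = max(range(3), key=lambda g: mixed_scores[g])

# Ten reference observations only.
all_ref_scores = genotype_log_likelihoods([(True, 0.01)] * 10)
best_all_ref = max(range(3), key=lambda g: all_ref_scores[g])
```

For the evenly mixed pileup the heterozygous genotype (one reference allele) scores highest, and for the all-reference pileup the homozygous reference genotype does. Working in log space avoids underflow at high coverage.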
  21. 21. Allele Graphs • Edges of graph define conditional probabilities • Can efficiently marginalize probabilities over graph via belief propagation, exactly solve for argmax • Notes: X = copy number of this allele; Y = copy number of preceding allele; k = number of reads observed; j = number of reads supporting Y → X transition; P_i = probability that read i supports Y → X transition
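The exact-argmax claim can be illustrated on the simplest case, a chain of alleles, where max-product belief propagation reduces to a Viterbi-style pass. This is only a sketch: the slide's edge probability formula is not reproduced here, so the conditional tables below are made-up placeholders, the function name is hypothetical, and a real allele graph would branch rather than form a single chain.

```python
def max_product_chain(prior, transitions):
    """Find the most probable copy-number assignment along a chain.

    prior: list where prior[y] is the probability of copy number y
           for the first allele.
    transitions: list of tables, one per edge, where t[y][x] is the
           conditional probability P(X = x | Y = y).
    Returns (best_probability, best_assignment).
    """
    scores = list(prior)
    backpointers = []
    for t in transitions:
        new_scores = []
        pointers = []
        for x in range(len(t[0])):
            # Keep only the best predecessor state for each value of X.
            best_y = max(range(len(scores)), key=lambda y: scores[y] * t[y][x])
            pointers.append(best_y)
            new_scores.append(scores[best_y] * t[best_y][x])
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best final state back to the start of the chain.
    last = max(range(len(scores)), key=lambda x: scores[x])
    best_probability = scores[last]
    assignment = [last]
    for pointers in reversed(backpointers):
        assignment.append(pointers[assignment[-1]])
    assignment.reverse()
    return best_probability, assignment


# Placeholder numbers: copy numbers 0, 1, 2 and a "sticky" transition table.
prior = [0.1, 0.6, 0.3]
sticky = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
best_probability, assignment = max_product_chain(prior, [sticky, sticky])
```

Because each step keeps only the best-scoring predecessor per state, the cost grows linearly with the chain length instead of exponentially with the number of alleles. Replacing `max` with a sum over predecessors gives the marginalization variant.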
  22. 22. Future Work • When integrating over samples, we should cluster samples by similarity • Working on “multi-region” assembly; will integrate alt references, “similar regions” • Performance and accuracy evaluation on Illumina Platinum pedigree, 1000 Genomes
  23. 23. Acknowledgements • UC Berkeley: Matt Massie, Timothy Danford, André Schumacher, Jey Kottalam, Karen Feng, Eric Tu, Niranjan Kumar, Ananth Pallaseni, Anthony Joseph, Dave Patterson • Mt. Sinai: Arun Ahuja, Neal Sidhwaney, Ryan Williams, Michael Linderman, Jeff Hammerbacher • GenomeBridge: Carl Yeksigian • Cloudera: Uri Laserson • Microsoft Research: Ravi Pandya • UC Santa Cruz: Benedict Paten, David Haussler • And many other open source contributors, especially Michael Heuer, Neil Ferguson, Andy Petrella, Xavier Tordoir • Total of 27 contributors to ADAM/BDG from >12 institutions
