Published on

Published in: Technology, Education
1 Comment
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Biochip

    1. 1. Algorithms for Biochip Design and Optimization Ion Mandoiu Computer Science & Engineering Department University of Connecticut
    2. 2. Overview <ul><li>Physical design of DNA arrays </li></ul><ul><li>DNA tag set design </li></ul><ul><li>Digital microfluidic biochip testing </li></ul><ul><li>Conclusions </li></ul>
    3. 3. Driver Biochip Applications <ul><li>Driver applications </li></ul><ul><ul><li>Gene expression (transcription analysis) </li></ul></ul><ul><ul><li>SNP genotyping </li></ul></ul><ul><ul><li>CNP analysis </li></ul></ul><ul><ul><li>Genomic-based microorganism identification </li></ul></ul><ul><ul><li>Point-of-care diagnosis </li></ul></ul><ul><ul><li>healthcare, forensics, environmental monitoring,… </li></ul></ul><ul><li>As focus shifts from basic research to clinical applications, there are increasingly stringent design requirements on sensitivity, specificity, cost </li></ul><ul><ul><li>Assay design and optimization become critical </li></ul></ul>
    4. 4. <ul><li>Human Genome  3  10 9 base pairs </li></ul><ul><li>Main form of variation between individual genomes: single nucleotide polymorphisms (SNPs) </li></ul><ul><li>Total #SNPs  1  10 7 </li></ul><ul><li>Difference b/w any two individuals  3  10 6 SNPs (  0.1% of entire genome) </li></ul>Single Nucleotide Polymorphisms … ataggtcc C tatttcgcgc C gtatacacggg T ctata … … ataggtcc G tatttcgcgc A gtatacacggg A ctata … … ataggtcc C tatttcgcgc C gtatacacggg T ctata …
    5. 5. Watson-Crick Complementarity <ul><li>Four nucleotide types: A,C,T,G </li></ul><ul><ul><li>A’s paired with T’s (2 hydrogen bonds) </li></ul></ul><ul><ul><li>C’s paired with G’s (3 hydrogen bonds) </li></ul></ul>
    6. 6. SNP genotyping via direct hybridization Hybridization <ul><li>SNP1 with alleles T/G </li></ul><ul><li>SNP2 with alleles A/G </li></ul>Array with 2 probes/SNP Labeled sample A C T C G A A C T C G A Optical scanning used to identify alleles present in the sample
    7. 7. In-Place Probe Synthesis CG AC CG AC ACG AG G AG C Probes to be synthesized A A A A A
    8. 8. In-Place Probe Synthesis CG AC CG AC ACG AG G AG C Probes to be synthesized A A A A A C C C C C C
    9. 9. In-Place Probe Synthesis CG AC CG AC ACG AG G AG C Probes to be synthesized A A A A A C C C C C C G G G G G G
    10. 10. Simplified DNA Array Flow Probe Selection Array Manufacturing Hybridization Experiment Gene expression levels, SNP genotypes,… Analysis of Hybridization Intensities Mask Manufacturing Physical Design: Probe Placement & Embedding Design Manufacturing End User
    11. 11. Unwanted Illumination Effect <ul><li>Unintended illumination during manufacturing  synthesis of erroneous probes </li></ul><ul><li>Effect gets worse with technology scaling </li></ul>
    12. 12. Border Length Minimization Objective <ul><li>Effects of unintended illumination  border length </li></ul>A A A A A C C C C C C G G G G G G border CG AC CG AC ACG AG G AG C
    13. 13. Synchronous Synthesis <ul><li>Periodic deposition sequence, e.g., (ACTG) k </li></ul><ul><ul><li>Each probe grown by one nucleotide in each period </li></ul></ul> # border conflicts b/w adjacent probes = 2 x Hamming distance T G C A T G T G C A … C A period C T A C G T
    14. 14. 2D Placement Problem <ul><li>Find minimum cost mapping of the Hamming graph onto the grid graph </li></ul><ul><li>Special case of the Quadratic Assignment Problem </li></ul>Edge cost = 2 x Hamming distance probe
    15. 15. 2D Placement: Sliding-Window Matching <ul><li>Slide window over entire chip </li></ul><ul><li>Repeat fixed # of iterations (  O(N) time for fixed window size), or until improvement drops below certain threshold </li></ul><ul><li>Proposed by [Doll et al. ‘94] in VLSI context </li></ul>1 3 2 5 4 Select mutually nonadjacent probes from small window 2 2 3 1 4 Re-assign optimally
    16. 16. 2D Placement: Epitaxial Growth <ul><li>Proposed by [PreasL’88, ShahookarM’91] in VLSI context </li></ul><ul><li>Simulates crystal growth </li></ul><ul><li>Efficient “row” implementation </li></ul><ul><ul><li>Use lexicographical sorting for initial ordering of probes </li></ul></ul><ul><ul><li>Fill cells row-by-row </li></ul></ul><ul><ul><li>Bound number of candidate probes considered when filling each cell </li></ul></ul><ul><ul><li>Constant # of lookahead rows  O(N 3/2 ) runtime, N = #probes </li></ul></ul>
    17. 17. 2D Placement: Recursive Partitioning <ul><li>Very effective in VLSI placement [AlpertK’95,Caldwell et al.’00] </li></ul><ul><li>4-way partition using linear time clustering </li></ul><ul><li>Repeat until Row-Epitaxial can be applied </li></ul>
    18. 18. Asynchronous Synthesis A A A C C C T T T G G G A C T G A G T G T G A A Deposition Sequence Probes Synchronous Embedding A G T A G G T A G A A G T A G T ASAP Embedding G
    19. 19. <ul><li>Efficient solution by dynamic programming </li></ul>Optimal Single-Probe Re-Embedding A C T A C G T A C G T Source Sink
    20. 20. In-Place Re-Embedding Algorithms <ul><li>2D placement fixed, allow only probe embeddings to change </li></ul><ul><ul><li>Greedy: optimally re-embed probe with largest gain </li></ul></ul><ul><ul><li>Chessboard: alternate re-embedding of black/white cells </li></ul></ul><ul><ul><li>Sequential: re-embed probes row-by-row </li></ul></ul>CPU %LB CPU %LB CPU %LB 121.4 120.5 Chessboard 1423 54 127.1 125.7 Greedy 120.9 119.9 Sequential 1535 943 500 64 40 100 Chip size
    21. 21. Integration with Probe Selection Probe Selection Physical Design: Placement & Embedding Probe Pools Chip size 100x100 Pool Row-Epitaxial Pool Size 7515 15.2 16 3645 11.8 8 1796 8.2 4 1040 4.3 2 217 - 1 CPU sec. % Improv
    22. 22. Overview <ul><li>Physical design of DNA arrays </li></ul><ul><li>DNA tag set design </li></ul><ul><li>Digital microfluidic biochip testing </li></ul><ul><li>Conclusions </li></ul>
    23. 23. Universal Tag Arrays <ul><li>Brenner 97, Morris et al. 98 </li></ul><ul><ul><li>Array consisting of application independent tags </li></ul></ul><ul><ul><li>Two-part “reporter” probes: aplication specific primers ligated to antitags </li></ul></ul><ul><ul><li>Detection carried by a sequence of reactions separately involving the primer and the antitag part of reporter probes </li></ul></ul>
    24. 24. Universal Tag Array Advantages <ul><li>Cost effective </li></ul><ul><ul><li>Same tag array used for different analyses </li></ul></ul><ul><ul><li> can be mass-produced </li></ul></ul><ul><ul><li>Only need to synthesize new set of reporter probes </li></ul></ul><ul><li>More reliable! </li></ul><ul><ul><li>Solution phase hybridization better understood than hybridization on solid support </li></ul></ul>
    25. 25. SNP Genotyping with Tag Arrays Tag + Primer G A G C antitag <ul><li>Mix reporter probes with unlabeled genomic DNA </li></ul>2. Solution phase hybridization 3. Single-Base Extension (SBE) 4. Solid phase hybridization G A G G A G T G A T C C T C C
    26. 26. Tag Set Design Problem <ul><li>(H1) Tags hybridize strongly to complementary antitags </li></ul><ul><li>(H2) No tag hybridizes to a non-complementary antitag </li></ul>t1 t1 t2 t2 t1 t2 t1 Tag Set Design Problem: Find a maximum cardinality set of tags satisfying (H1)-(H2)
    27. 27. Hybridization Models <ul><li>Melting temperature Tm: temperature at which 50% of duplexes are in hybridized state </li></ul><ul><li>2-4 rule </li></ul><ul><li>Tm = 2 #(As and Ts) + 4 #(Cs and Gs) </li></ul><ul><li>More accurate models exist, e.g., the near-neighbor model </li></ul>
    28. 28. <ul><li>Hamming distance model, e.g., [Marathe et al. 01] </li></ul><ul><ul><li>Models rigid DNA strands </li></ul></ul><ul><li>LCS/edit distance model, e.g., [Torney et al. 03] </li></ul><ul><ul><li>Models infinitely elastic DNA strands </li></ul></ul><ul><li>c-token model [Ben-Dor et al. 00]: </li></ul><ul><ul><li>Duplex formation requires formation of nucleation complex between perfectly complementary substrings </li></ul></ul><ul><ul><li>Nucleation complex must have weight  c, where wt(A)=wt(T)=1, wt(C)=wt(G)=2 (2-4 rule) </li></ul></ul>Hybridization Models (contd.)
    29. 29. c-h Code Problem <ul><li>c-token: left-minimal DNA string of weight  c, i.e., </li></ul><ul><ul><li>w(x)  c </li></ul></ul><ul><ul><li>w(x’) < c for every proper suffix x’ of x </li></ul></ul><ul><li>A set of tags is a c-h code if </li></ul><ul><ul><li>(C1) Every tag has weight  h </li></ul></ul><ul><ul><li>(C2) Every c-token is used at most once </li></ul></ul>c-h Code Problem [Ben-Dor et al.00] Given c and h, find maximum cardinality c-h code
    30. 30. Algorithms for c-h Code Problem <ul><li>[Ben-Dor et al.00] approximation algorithm based on DeBruijn sequences </li></ul><ul><li>Alphabetic tree search algorithm </li></ul><ul><ul><li>Enumerate candidate tags in lexicographic order, save tags whose c-tokens are not used by previously selected tags </li></ul></ul><ul><ul><li>Easily modified to handle various combinations of constraints </li></ul></ul><ul><li>[MT 05, 06] Optimum c-h codes can be computed in practical time for small values of c by using integer programming </li></ul><ul><ul><li>Practical runtime using Garg-Koneman approximation and LP-rounding </li></ul></ul>
    31. 31. Token Content of a Tag <ul><li>c=4 </li></ul><ul><li>CCAGATT </li></ul><ul><li>CC </li></ul><ul><li>CCA </li></ul><ul><li>CAG </li></ul><ul><li>AGA </li></ul><ul><li>GAT </li></ul><ul><li>GATT </li></ul>Tag  sequence of c-tokens End pos: 2 3 4 5 6 7 c-token: CC  CCA  CAG  AGA  GAT  GATT
    32. 32. Layered c-token graph for length-l tags s t c 1 c N l l-1 c/2 (c/2)+1 …
    33. 33. Integer Program Formulation [MPT05] <ul><li>Maximum integer flow problem w/ set capacity constraints </li></ul><ul><li>O( hN) constraints & variables, where N = #c-tokens </li></ul>
    34. 34. Packing LP Formulation
    35. 35. Garg-Konemann Algorithm <ul><li>x  0; y   // y i are variables of the dual LP </li></ul><ul><li>Find min weight s-t path p, where weight(v) = y i for every v  V i </li></ul><ul><li>While weight(p) < 1 do </li></ul><ul><li> M  max i |p  V i | </li></ul><ul><li>x p  x p + 1/M </li></ul><ul><li>For every i, y i  y i ( 1 +  * |p  V i |/M ) </li></ul><ul><li>Find min weight s-t path p, where weight(v) = y i for v  V i </li></ul><ul><li>4. For every p, x p  x p / (1 - log 1+   ) </li></ul>[GK98] The algorithm computes a factor (1-  ) 2 approximation to the optimal LP solution with (N/  )* log 1+  N shortest path computations
    36. 36. LP Based Tag Set Design <ul><li>Run Garg-Konemann and store the minimum weight paths in a list </li></ul><ul><li>Traversing the list in reverse order, pick tags corresponding to paths if they are feasible and do not share c-tokens with already selected tags </li></ul><ul><li>Mark used c-tokens and run the alphabetic tree search algorithm to select additional tags </li></ul>
    37. 37. Periodic Tags [MT05] <ul><li>Key observation: c-token uniqueness constraint in c-h code formulation is too strong </li></ul><ul><ul><li>A c-token should not appear in two different tags, but can be repeated in a tag </li></ul></ul><ul><li>A tag t is called periodic if it is the prefix of (  )  for some “period”  </li></ul><ul><ul><li>Periodic strings make best use of c-tokens </li></ul></ul>
    38. 38. c-token factor graph, c=4 (incomplete) CC AAG AAC AAAA AAAT
    39. 39. Vertex-disjoint Cycle Packing Problem <ul><li>Given directed graph G, find maximum number of vertex disjoint directed cycles in G </li></ul><ul><li>[MT 05] APX-hard even for regular directed graphs with in-degree and out-degree 2 </li></ul><ul><ul><li>h-c/2+1 approximation factor for tag set design problem </li></ul></ul><ul><li>[Salavatipour and Verstraete 05] </li></ul><ul><ul><li>Quasi-NP-hard to approximate within  (log 1-  n) </li></ul></ul><ul><ul><li>O(n 1/2 ) approximation algorithm </li></ul></ul>
    40. 40. Cycle Packing Algorithm <ul><li>Construct c-token factor graph G </li></ul><ul><li>T  {} </li></ul><ul><li>For all cycles C defining periodic tags, in increasing order of cycle length, </li></ul><ul><ul><ul><li>Add to T the tag defined by C </li></ul></ul></ul><ul><ul><ul><li>Remove C from G </li></ul></ul></ul><ul><li>Perform an alphabetic tree search and add to T tags consisting of unused c-tokens </li></ul><ul><li>Return T </li></ul>
    41. 41. Experimental Results h
    42. 42. More Hybridization Constraints… <ul><li>Enforced during tag assignment by </li></ul><ul><ul><li>- Leaving some tags unassigned and distributing primers across multiple arrays [Ben-Dor et al. 03] </li></ul></ul><ul><ul><li>- Exploiting availability of multiple primer candidates [MPT05] </li></ul></ul>t1 t2 t1
    43. 43. Herpes B Gene Expression Assay GenFlex Tags Periodic Tags % Util. # arrays % Util. # arrays % Util. # arrays 76.10 1 99.80 2 97.80 4 5 76.10 1 98.90 2 96.73 4 1 1522 70 78.00 1 99.90 2 98.00 4 5 78.00 1 98.70 2 96.53 4 1 1560 67 72.30 1 100.00 2 96.13 4 5 72.30 1 97.20 2 94.06 4 1 1446 60 2000 tags 1000 tags 500 tags Pool size # pools T m % Util. # arrays % Util. # arrays % Util. # arrays 70.30 2 91.10 2 92.26 4 5 65.40 2 73.65 3 88.46 4 1 1522 70 67.20 2 76.00 3 91.86 4 5 61.15 2 69.70 3 86.33 4 1 1560 67 63.55 2 70.95 3 88.26 4 5 57.05 2 65.35 3 82.26 4 1 1446 60 2000 tags 1000 tags 500 tags Pool size # pools T m
    44. 44. Overview <ul><li>Physical design of DNA arrays </li></ul><ul><li>DNA tag set design </li></ul><ul><li>Digital microfluidic biochip testing </li></ul><ul><li>Conclusions </li></ul>
    45. 45. Digital Microfluidic Biochips [Srinivasan et al. 04] <ul><li>Electrodes typically arranged in rectangular grid </li></ul><ul><li>Droplets moved by applying voltage to adjacent cell </li></ul><ul><li>Can be used for analyses of DNA, proteins, metabolites… </li></ul>[Su&Chakrabarty 06] I/O I/O Cell
    46. 46. Design Challenges <ul><li>Testing </li></ul><ul><ul><li>High electrode failure rate, but can re-configure around </li></ul></ul><ul><ul><li>Performed both after manufacturing and concurrent with chip operation </li></ul></ul><ul><ul><li>Main objective is minimization of completion time </li></ul></ul><ul><li>Module placement </li></ul><ul><ul><li>Assay operations (mixing, amplification, etc.) can be mapped to overlapping areas of the chip if performed at different times </li></ul></ul><ul><li>Droplet routing </li></ul><ul><ul><li>When multiple droplets are routed simultaneously must prevent accidental droplet merging or interference </li></ul></ul>Merging Interference
    47. 47. Concurrent Testing Problem <ul><li>GIVEN: </li></ul><ul><ul><li>Input/Output cells </li></ul></ul><ul><ul><li>Position of obstacles (cells in use by ongoing reactions) </li></ul></ul><ul><li>FIND: </li></ul><ul><ul><li>Trajectories for test droplets such that </li></ul></ul><ul><ul><ul><li>Every non-blocked cell is visited by at least one test droplet </li></ul></ul></ul><ul><ul><ul><li>Droplet trajectories meet non-merging and non-interference constraints </li></ul></ul></ul><ul><ul><ul><li>Completion time is minimized </li></ul></ul></ul>Defect model: test droplet gets stuck at defective electrode [Su et al. 04] ILP-based solution for single test droplet case & heuristic for multiple input-output pairs with single test droplet/pair
    48. 48. ILP Formulation for Unconstrained Number of Droplets <ul><li>Each cell (i,j) visited at least once: </li></ul><ul><li>Droplet conservation: </li></ul><ul><li>No droplet merging: </li></ul><ul><li>No droplet interference: </li></ul><ul><li>Minimize completion time: </li></ul>
    49. 49. Special Case <ul><li>NxN Chip </li></ul><ul><li>I/O cells in Opposite Corners </li></ul><ul><li>No Obstacles </li></ul><ul><li> Single droplet solution needs N 2 cycles </li></ul>
    50. 50. Lower Bound <ul><li>Claim: Completion time is at least 4N – 4 cycles </li></ul>Proof: In each cycle, each of the k droplets place 1 dollar in current cell  3k(k-1)/2 dollars paid waiting to depart  3k(k-1)/2 dollars paid waiting for last droplet  k dollars in each diagonal  1 dollar in each cell
    51. 51. Stripe Algorithm with N/3 Droplets Stripe algorithm has approximation factor of
    52. 52. Stripe Algorithm with Obstacles of width Q <ul><li>Divide array into vertical stripes of width Q+1 </li></ul><ul><li>Use one droplet per stripe </li></ul><ul><li>All droplets visit cells in assigned stripes in parallel </li></ul><ul><ul><li>In case of interference droplet on left stripe waits for droplet in right stripe </li></ul></ul>
    53. 53. Results for 120x120 Chip, 2x2 Obstacles ~20x decrease in completion time by using multiple droplets 19x 570 736.6 1071 1501 10800 25% 20x 580.8 738.4 1046.8 1501 11520 20% 21x 588.2 730.8 1025.8 1501 12240 15% 22x 592.6 734.8 1010.8 1490 12960 10% 23x 596.2 725 982.8 1473 13680 5% 24x 598.8 715.2 953.4 1420 14256 1% 24x 593 710 944 1412 14400 0% k=40 k=30 k=20 k=12 k=1 k=40 vs. k=1 speed-up Average completion time (cycles) Obstacle Area
    54. 54. Overview <ul><li>Physical design of DNA arrays </li></ul><ul><li>DNA tag set design </li></ul><ul><li>Digital microfluidic biochip testing </li></ul><ul><li>Conclusions </li></ul>
    55. 55. Conclusions <ul><li>Biochip design is a fertile area of applications </li></ul><ul><ul><li>Combinatorial optimization techniques can yield significant improvements in assay quality/throughput </li></ul></ul><ul><li>Very dynamic area, driver applications and underlying technologies change rapidly </li></ul>
    56. 56. Acknowledgments <ul><li>Physical design of DNA arrays: A.B. Kahng, P. Pevzner, S. Reda, X. Xu, A. Zelikovsky </li></ul><ul><li>Tag set design: D. Trinca </li></ul><ul><li>Testing of digital microfluidic biochips: R. Garfinkel, B. Pasaniuc, A. Zelikovsky </li></ul><ul><li>Financial support: UCONN Research Foundation, NSF awards 0546457 and 0543365 </li></ul>
    57. 57. Questions?