Douglas Cork
Steven Lembark
HIV-1, W-curves, & Shoe Leather
●   Existing genetics tools fail on HIV-1.
    ●   They make assumptions based on “normal” DNA
        that fail on HIV – or cancer, or plants.
    ●   Correlation tools look at evolution, not state.
●   We are working on tools for clinical analysis.
    ●   The W-curve abstracts DNA into geometry.
    ●   The TSP clusters genes rather than trying to
        impute inheritance.
Sequences Inform Treatment
●   Treating HIV requires sequencing it to choose
    appropriate drugs:
    ●   HIV-1 evolves drug resistance in months.
    ●   Multiple strains in a single patient are common,
        from either multiple sources or evolution.
    ●   Crossover recombination is relatively common due to
        cross-infected cells.
Problem: HIV is Hard to Analyze
●   HIV is a non-correcting retrovirus.
●   It evolves 10,000 times faster than humans or
    influenza – one new strain per patient per day.
●   Genomes for wild types range from 8349 to
    9829 bases, making localized comparisons
    difficult.
●   The single FDA-approved algorithm directing
    treatment from sequence handles only type B;
    the U.S. Army has 15%+ non-B infections.
The Current Tools
●   BLAST, FASTA, and ClustalW perform alignment.
    ●   Table-driven analysis of base transitions.
    ●   Score the entire sequence with a single value.
●   Graphical tools are designed to display
    inheritance rather than state.
    ●   Output is difficult to read in a clinical setting.
Phenogram of Drug-Resistant and Random Samples
●   Tries to show ancestry, not state.
●   Not very good for visual identification of which
    patients are drug resistant.
Trees are not particularly helpful either.
ClustalW of gp120
●   Difficult to compare sequences visually.
●   Not useful for large numbers of sequences.
●   Gaps make analysis difficult.
[Figure: ClustalW alignment of the gp120 region, HIVHXB2CG against AY736838.]
New Tools
●   Clinical vs. evolutionary.
●   Avoid assumptions that break current tools.
●   Suitable for a repeatable process in clinics or
    data mining in research.
●   We are using:
    ●   W-curve for analysis.
    ●   TSP for clustering.
    ●   R for data management & display.
W-curve
●   Geometric abstraction of DNA.
●   Generated by a simple state machine.
●   Geometry allows alignment at a finer scale than
    character strings.
●   Avoids assumptions about transition
    probabilities by taking the figure as-is.
W-Curve Generator is a State Machine
●   C, A, T, G are assigned to the corners of a square.
●   Each successive point moves halfway from the previous
    point to the next base's corner.
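The state machine above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the corner given to each base (`CORNERS`), the starting point at the origin, and the function name `wcurve` are all made up for the example. The slides say only that the four bases sit on the corners of a square.

```python
# Hypothetical corner assignment: the slides do not say which corner
# each base occupies, only that C, A, T, G are on a square.
CORNERS = {
    "A": (-1.0, 1.0),
    "C": (1.0, 1.0),
    "G": (1.0, -1.0),
    "T": (-1.0, -1.0),
}


def wcurve(sequence, corners=CORNERS, start=(0.0, 0.0)):
    """Return one (x, y, z) point per base in the sequence."""
    x, y = start                      # assumed starting point
    points = []
    for z, base in enumerate(sequence.upper(), start=1):
        cx, cy = corners[base]        # KeyError for anything but A, C, G, T
        x = (x + cx) / 2.0            # move halfway toward the base's corner
        y = (y + cy) / 2.0
        points.append((x, y, float(z)))  # Z advances one step per base
    return points


if __name__ == "__main__":
    for point in wcurve("CG"):
        print(point)
```

Ambiguity codes and gap characters are not handled here; a real pipeline would need a policy for them.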
W-curve for “CG”
●   Curve shown in blue.
●   Halfway to C, then halfway to G in X-Y;
    single steps in Z.
●   Cylindrical storage simplifies comparison.
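The slides do not say how the cylindrical form is computed. One plausible reading, shown here purely as an assumption, is the usual conversion of each (x, y, z) point to a radius and angle about the Z axis. The function name `to_cylindrical` is hypothetical.

```python
import math


def to_cylindrical(points):
    """Convert (x, y, z) points to (r, theta, z), theta in radians."""
    return [
        (math.hypot(x, y), math.atan2(y, x), z)
        for x, y, z in points
    ]


if __name__ == "__main__":
    # The first point of a curve that moves halfway toward a corner at (1, 1).
    print(to_cylindrical([(0.5, 0.5, 1.0)]))
```

In this form the angle indicates which corner a point is heading toward and the radius indicates how far it has moved from the center.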
W-curve of Wild HIV-1 POL Gene
W-curve of Wild HIV-1 POL
W-curves of Wild & Drug Resistant Pol
Detail of Wild & Drug Resistant Pol
Distance Metric
●   Bases are arranged in a square to minimize the
    effects of SNPs.
●   Synonymous SNPs are usually in the same quadrant.
●   Points within the same quadrant have a small
    difference; points in opposite quadrants have a
    larger one.
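One consequence of the halfway rule is easy to check: after a single-base difference, the gap between two curves halves with every matching base that follows, so a SNP leaves a short, local disturbance. The sketch below measures the X-Y gap at each Z step. The corner layout and the use of plain Euclidean distance are assumptions for illustration, not necessarily the metric the authors use.

```python
import math

# Hypothetical corner assignment, chosen only for the example.
CORNERS = {
    "A": (-1.0, 1.0),
    "C": (1.0, 1.0),
    "G": (1.0, -1.0),
    "T": (-1.0, -1.0),
}


def wcurve_xy(sequence):
    """X-Y points of the curve; Z is implied by list position."""
    x = y = 0.0
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def pointwise_gaps(seq_a, seq_b):
    """Euclidean X-Y distance between two curves at each shared Z step."""
    return [
        math.dist(p, q)
        for p, q in zip(wcurve_xy(seq_a), wcurve_xy(seq_b))
    ]


if __name__ == "__main__":
    # The sequences differ only at the fourth base (T vs. A).
    for step, gap in enumerate(pointwise_gaps("ACGTACGT", "ACGAACGT"), 1):
        print(step, gap)
```

Under this layout a substitution between adjacent corners opens a gap of 1, while one between diagonal corners opens a gap of √2; both then shrink by half per step.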
Comparison Produces “Chunks”
●   Comparison yields a list of chunks.
●   Curves are aligned within each chunk.
●   Summing the chunks gives a single value for two curves.
●   Analyzing them in detail allows mining local
    similarities and variations.
●   Grouping allows examination of
    crossover-recombination events.
Clustering: Traveling Salesman Problem
●   The TSP is simple to describe, hard to solve:
    ●   Starting and finishing in the same city.
    ●   Visit a list of cities once each.
    ●   Minimize the distance (cost).
●   Optimal solutions will cluster the nearby cities.
●   The problem was always in defining the 
    clusters.
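Because exact solutions are expensive, a heuristic is often enough to see the effect. The sketch below scores a closed tour over a symmetric distance matrix, builds a starting tour greedily, and improves it with 2-opt edge swaps. Both heuristics are stand-ins chosen for brevity; the slides do not say which solver the authors use, and all function names here are hypothetical.

```python
import math


def tour_cost(dist, tour):
    """Length of the closed tour, including the leg back to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def nearest_neighbour_tour(dist, start=0):
    """Greedy tour: always visit the closest unvisited city next."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(dist, tour):
    """Reverse segments while doing so shortens the tour (symmetric dist)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


if __name__ == "__main__":
    # Corners of a unit square, listed so the naive order crosses itself.
    cities = [(0, 0), (1, 1), (1, 0), (0, 1)]
    dist = [[math.dist(a, b) for b in cities] for a in cities]
    tour = two_opt(dist, nearest_neighbour_tour(dist))
    print(tour, tour_cost(dist, tour))
```

2-opt finds a local optimum only; for the few hundred samples typical of a clinic data set that is usually adequate for a first look, but it is not a guarantee of the optimal tour.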
Take a Walk and Cluster Your Genes
●   Climer & Zhang, 2004.
●   Method for detecting N clusters:
    ●   Add N dummy cities to the distance map.
    ●   Each one has the same, small distance to all other
        cities (we use 2-20).
    ●   Dummy cities end up in the inter-cluster gaps.
●   The process is trivial to implement: just add that
    many rows and columns to the original
    comparison matrix.
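The dummy-city step can be sketched as follows: pad the matrix with extra rows and columns, solve the tour, then cut the tour wherever a dummy appears. This is an illustration under assumptions, not the authors' code. In particular, the very large dummy-to-dummy distance (`far`) is an assumption added here so that two dummies never sit side by side, and the exhaustive solver is only practical for a handful of cities.

```python
from itertools import permutations


def add_dummy_cities(dist, k, c, far=1e9):
    """Pad a distance matrix with k dummy rows and columns."""
    n = len(dist)
    size = n + k
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            if i < n and j < n:
                out[i][j] = dist[i][j]   # original comparison value
            elif i >= n and j >= n:
                out[i][j] = far          # assumption: keep dummies apart
            else:
                out[i][j] = c            # same small distance to every city
    return out


def tour_cost(dist, tour):
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def exhaustive_tsp(dist):
    """Optimal tour by trying every ordering; tiny inputs only."""
    rest = range(1, len(dist))
    best = min(((0,) + p for p in permutations(rest)),
               key=lambda t: tour_cost(dist, t))
    return list(best)


def split_at_dummies(tour, n_real):
    """Cut the cyclic tour at each dummy city; return the real clusters."""
    start = next(i for i, city in enumerate(tour) if city >= n_real)
    rotated = tour[start:] + tour[:start]
    clusters = []
    for city in rotated:
        if city >= n_real:
            clusters.append([])          # a dummy opens a new cluster
        else:
            clusters[-1].append(city)
    return sorted(sorted(c) for c in clusters if c)


if __name__ == "__main__":
    # Six "genes" on a line, in two obvious groups.
    positions = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in positions] for a in positions]
    padded = add_dummy_cities(dist, k=2, c=0.5)
    tour = exhaustive_tsp(padded)
    print(split_at_dummies(tour, n_real=len(positions)))
```

The dummies are cheap to reach from anywhere, so the shortest tour spends them on the most expensive hops, which are the gaps between clusters.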
Displaying the Tour
●   Mapping the tour onto a circle gives a good 
    view of the distances.
●   Coloring simplifies inspection.
    ●   Black dots for dummy cities.
    ●   Single type at the top (e.g. wild type).
    ●   Color successive data points using the “rainbow” 
        sequence with a large number of colors.
    ●   Sequences more alike get more similar colors.
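The display rules above can be sketched as a layout and coloring step that any plotting tool could consume. This is a minimal sketch under assumptions: arc length around the circle is taken to be proportional to tour distance, dummy cities are assumed to be the indices at or above `n_real`, and the function names are hypothetical. The authors use R for display; Python's standard library is used here only to keep the example self-contained.

```python
import colorsys
import math


def rotate_to(tour, top):
    """Rotate the cyclic tour so that `top` (e.g. wild type) comes first."""
    i = tour.index(top)
    return tour[i:] + tour[:i]


def circle_layout(dist, tour):
    """Place cities on a unit circle, spaced by distance along the tour."""
    n = len(tour)
    steps = [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]
    total = sum(steps)                # assumed to be greater than zero
    layout, travelled = {}, 0.0
    for city, step in zip(tour, steps):
        angle = 2.0 * math.pi * travelled / total
        # The first city sits at the top; the tour runs clockwise.
        layout[city] = (math.sin(angle), math.cos(angle))
        travelled += step
    return layout


def tour_colors(tour, n_real):
    """Rainbow hues in tour order for real cities, black for dummies."""
    real = [city for city in tour if city < n_real]
    colors = {city: (0.0, 0.0, 0.0) for city in tour if city >= n_real}
    for rank, city in enumerate(real):
        colors[city] = colorsys.hsv_to_rgb(rank / len(real), 1.0, 1.0)
    return colors


if __name__ == "__main__":
    # Three real samples plus one dummy (index 3), all one unit apart.
    dist = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    tour = rotate_to([3, 0, 1, 2], top=0)
    print(circle_layout(dist, tour))
    print(tour_colors(tour, n_real=3))
```

Because hue follows position in the tour, neighbors on the circle get neighboring colors, which is what makes similar sequences look alike.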
Example with 8 D-R, 100 Samples
Multiple uses for color sequence.
●   Track individual over time.
    ●   Progression through colors shows history.
    ●   Clustering highlights progression towards drug 
        resistance.
●   Track sample population.
    ●   Recycling the colors from one initial tour helps show 
        changes in successive graphs.
    ●   Simplifies tracking progression in anonymous 
        populations found in HIV treatment centers.
Visualizing W-curves
●   We use a WebGL-based package, “WebCurve”.
●   Developed at IIT as a web-friendly solution for
    examining 3D geometry.
●   Gracefully handles displaying 100+ sequences
    at 10K bases each on a notebook computer.
●   Available from GitHub; the archive includes a web
    server and code to generate files for display.
Summary
●   W-curve and TSP allow us to cluster genes.
●   They provide more useful output in a clinical
    setting.
●   Color coding the TSP results allows tracking
    changes in a population or the progression of an
    individual over time.

Clustering Genes: W-curve + TSP

  • 2. HIV-1, W-curves, & Shoe Leather ● Existing genetics tools fail on HIV-1. ● They make assumptions based on “normal” DNA that fail on HIV (or cancer, or plants). ● Correlation tools look at evolution, not state. ● We are working on tools for clinical analysis. ● The W-curve abstracts DNA into geometry. ● The TSP clusters genes rather than trying to impute inheritance.
  • 3. Sequences Inform Treatment ● Treating HIV requires sequencing it to choose appropriate drugs: ● HIV-1 evolves drug resistance in months. ● Multiple strains in a single patient are common, either from multiple sources or from evolution. ● Crossover recombination is relatively common due to cross-infected cells.
  • 4. Problem: HIV is Hard to Analyze ● HIV is a non-correcting retrovirus. ● Evolves 10,000 times faster than humans or influenza: one new strain per patient per day. ● Genomes for wild types range from 8349 to 9829 bases, making localized comparisons difficult. ● The single FDA-approved algorithm directing treatment from sequence handles only type-B; the U.S. Army has 15%+ non-B infections.
  • 5. The Current Tools ● BLAST, FASTA, and ClustalW perform alignment. ● Table-driven analysis of base transitions. ● Score the entire sequence with a single value. ● Graphical tools are designed to display inheritance rather than state. ● Output is difficult to read in a clinical setting.
  • 6. Phenogram of Drug-Resistant and Random Samples ● Tries to show ancestry, not state. ● Not very good for visual identification of which patients are drug resistant.
  • 8. ClustalW of gp120 [Figure: ClustalW alignment of HIVHXB2CG against AY736838-gp120_; sequence listing omitted] ● Difficult to compare sequences visually. ● Not useful for large numbers of sequences. ● Gaps make analysis difficult.
  • 9. New Tools ● Clinical vs. evolutionary. ● Avoid assumptions that break current tools. ● Suitable for a repeatable process in clinics or data mining in research. ● We are using: ● W-curve for analysis. ● TSP for clustering. ● R for data management & display.
  • 10. W-curve ● Geometric abstraction of DNA. ● Generated by a simple state machine. ● Geometry allows alignment at a finer scale than character strings. ● Avoids assumptions about transition probabilities by taking the figure as-is.
  • 11. W-Curve Generator is a State Machine ● C, A, T, G are assigned to corners of a square. ● Successive points move halfway to the next base's corner.
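The state machine described on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the slides only say that C, A, T, and G sit on the corners of a square, so the particular corner layout in `CORNERS`, the starting point at the origin, and the function name `w_curve` are assumptions made for illustration.

```python
# Assumed corner layout: the slides do not say which base goes on which corner.
CORNERS = {
    "C": (1.0, 1.0),
    "A": (-1.0, 1.0),
    "T": (-1.0, -1.0),
    "G": (1.0, -1.0),
}


def w_curve(sequence):
    """Return a list of (x, y, z) points for a DNA sequence.

    Each base moves the current X-Y point halfway toward that base's
    corner; Z advances by one step per base.
    """
    x, y = 0.0, 0.0  # assumed starting point: the centre of the square
    points = []
    for z, base in enumerate(sequence.upper(), start=1):
        corner_x, corner_y = CORNERS[base]
        x = (x + corner_x) / 2.0
        y = (y + corner_y) / 2.0
        points.append((x, y, float(z)))
    return points


if __name__ == "__main__":
    for point in w_curve("CG"):
        print(point)
```

With this assumed layout, the sequence "CG" first moves halfway to the C corner and then halfway from there to the G corner, one Z step per base.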
  • 12. W-curve for “CG” ● Curve shown in blue. ● Halfway to C then G in X-Y, single steps in Z. ● Cylindrical storage simplifies comparison.
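Storing the curve in cylindrical form amounts to converting each (x, y, z) point to (r, theta, z). The slides do not show how this is done, so the helper below is only a sketch using the standard conversion, and the name `to_cylindrical` is hypothetical.

```python
import math


def to_cylindrical(points):
    """Convert (x, y, z) points to (r, theta, z).

    r is the distance from the Z axis and theta is the angle in radians
    measured from the positive X axis.
    """
    return [(math.hypot(x, y), math.atan2(y, x), z) for x, y, z in points]


if __name__ == "__main__":
    print(to_cylindrical([(0.5, 0.5, 1.0), (0.75, -0.25, 2.0)]))
```

One plausible reading of the slide is that radius and angle can then be compared separately at each Z step, but the slides do not state the reason.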
  • 16. Distance Metric ● Bases are arranged in a square to minimize the effects of SNPs. ● Synonymous SNPs are usually in the same quadrant. ● Points within the same quadrant have a small difference; opposite quadrants get a larger one.
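The slides do not give a formula for the metric. As an illustration of the quadrant effect only, the sketch below assumes a plain Euclidean distance in X-Y between points at the same Z step, averaged over the overlapping length of the two curves. The function name `mean_xy_distance` and the averaging are assumptions, not the authors' method.

```python
import math


def mean_xy_distance(curve_a, curve_b):
    """Average X-Y distance between points at the same Z step.

    Both curves are lists of (x, y, z) points; only the overlapping
    prefix of the two curves is compared.
    """
    n = min(len(curve_a), len(curve_b))
    if n == 0:
        raise ValueError("both curves need at least one point")
    total = 0.0
    for (ax, ay, _), (bx, by, _) in zip(curve_a, curve_b):
        total += math.hypot(ax - bx, ay - by)
    return total / n


if __name__ == "__main__":
    same_quadrant = mean_xy_distance([(0.5, 0.5, 1.0)], [(0.75, 0.25, 1.0)])
    opposite_quadrant = mean_xy_distance([(0.5, 0.5, 1.0)], [(-0.5, -0.5, 1.0)])
    print(same_quadrant, opposite_quadrant)
```

Two points in the same quadrant are close together, while points in opposite quadrants are separated by most of the square's diagonal, which is the behavior the slide describes.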
  • 17. Comparison Produces “Chunks” ● Comparison yields a list of chunks. ● Curves are aligned within the chunk. ● Summing the chunks gives a single value for two curves. ● Analyzing them in detail allows mining local similarities and variations. ● Grouping allows examination of crossover-recombination events.
  • 18. Clustering: Traveling Salesman Problem ● The TSP is simple to describe, hard to solve: ● Starting and finishing in the same city. ● Visit a list of cities once each. ● Minimize the distance (cost). ● Optimal solutions will cluster the nearby cities. ● The problem was always in defining the  clusters.
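The problem statement on this slide can be made concrete with a tiny example. The sketch below computes the cost of a closed tour from a distance matrix and finds the best tour by brute force. Brute force is only practical for a handful of cities and stands in for a real solver; it is not the method used in the talk, and the names `tour_length` and `brute_force_tsp` are hypothetical.

```python
from itertools import permutations


def tour_length(dist, tour):
    """Total cost of visiting the cities in order and returning to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def brute_force_tsp(dist):
    """Try every ordering that starts at city 0 and keep the cheapest tour."""
    cities = range(1, len(dist))
    candidates = ([0] + list(rest) for rest in permutations(cities))
    return min(candidates, key=lambda tour: tour_length(dist, tour))


if __name__ == "__main__":
    # Four cities on the corners of a square: neighbours cost 1, diagonals 2.
    dist = [
        [0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0],
    ]
    best = brute_force_tsp(dist)
    print(best, tour_length(dist, best))
```

The cheapest tour walks around the edge of the square and never takes a diagonal, which is the sense in which an optimal tour keeps nearby cities together.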
  • 19. Take a Walk and Cluster Your Genes ● Climer & Zhang, 2004. ● Method for detecting N clusters: ● Add N dummy cities to the distance map. ● Each one has the same, small distance to all other cities (we use 2-20). ● Dummy cities end up in the inter-cluster gaps. ● The process is trivial to implement: just add that many rows and columns to the original comparison matrix.
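Adding the rows and columns is short enough to show directly. The sketch below pads a square comparison matrix with dummy cities and then splits a finished tour at the dummies to recover the clusters. The function names are hypothetical, and giving dummy-to-dummy pairs the same small distance is an assumption taken from the slide's wording ("the same, small distance to all other cities"); the original paper may treat those pairs differently.

```python
def add_dummy_cities(dist, n_dummies, dummy_distance):
    """Return a copy of a square distance matrix with dummy cities appended.

    Every dummy is dummy_distance away from every other city (including
    other dummies: an assumption). Dummies get indices len(dist) and up.
    """
    n_real = len(dist)
    size = n_real + n_dummies
    padded = [[dummy_distance] * size for _ in range(size)]
    for i in range(n_real):
        for j in range(n_real):
            padded[i][j] = dist[i][j]
    for i in range(size):
        padded[i][i] = 0
    return padded


def split_tour(tour, n_real):
    """Cut a closed tour at the dummy cities and return the clusters."""
    tour = list(tour)
    dummy_positions = [i for i, city in enumerate(tour) if city >= n_real]
    if not dummy_positions:
        return [tour]
    start = dummy_positions[0]
    rotated = tour[start:] + tour[:start]  # rotated tour begins at a dummy
    clusters, current = [], []
    for city in rotated:
        if city >= n_real:
            if current:
                clusters.append(current)
            current = []
        else:
            current.append(city)
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    print(add_dummy_cities([[0, 5], [5, 0]], 1, 2))
    print(split_tour([0, 1, 4, 2, 3, 5], n_real=4))
```

Because the tour is a closed loop, `split_tour` first rotates it to begin at a dummy so that a cluster wrapping past the end of the list is not cut in two.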
  • 20. Displaying the Tour ● Mapping the tour onto a circle gives a good  view of the distances. ● Coloring simplifies inspection. ● Black dots for dummy cities. ● Single type at the top (e.g. wild type). ● Color successive data points using the “rainbow”  sequence with a large number of colors. ● Sequences more alike get more similar colors.
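A rough sketch of this display step follows. The talk uses R for display, so this Python version is purely illustrative. It places each city on a unit circle at an angle proportional to the distance travelled along the tour, with a chosen city at the top, and assigns hues in tour order with dummies in black. The function names, the clockwise direction, and the use of the full hue range are assumptions.

```python
import colorsys
import math


def circle_layout(dist, tour, top=None):
    """Return (city, x, y) positions on a unit circle.

    The angle between neighbours is proportional to their distance, so
    large gaps in the tour show up as large gaps on the circle.
    """
    tour = list(tour)
    if top is not None:
        k = tour.index(top)
        tour = tour[k:] + tour[:k]  # put the chosen city first (at the top)
    n = len(tour)
    steps = [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]
    total = sum(steps)
    layout, travelled = [], 0.0
    for i, city in enumerate(tour):
        fraction = travelled / total if total else i / n
        angle = math.pi / 2 - 2 * math.pi * fraction  # start at top, clockwise
        layout.append((city, math.cos(angle), math.sin(angle)))
        travelled += steps[i]
    return layout


def rainbow_colors(tour, n_real):
    """Map each city to an RGB triple: hue follows tour order, dummies are black."""
    colors = {}
    for i, city in enumerate(tour):
        if city >= n_real:
            colors[city] = (0.0, 0.0, 0.0)
        else:
            colors[city] = colorsys.hsv_to_rgb(i / len(tour), 1.0, 1.0)
    return colors


if __name__ == "__main__":
    dist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
    print(circle_layout(dist, [0, 1, 2, 3]))
    print(rainbow_colors([0, 4, 1], n_real=4))
```

Since hue depends only on position in the tour, sequences that sit next to each other on the tour receive similar colors.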
  • 24. Multiple uses for color sequence. ● Track an individual over time. ● Progression through colors shows history. ● Clustering highlights progression towards drug resistance. ● Track a sample population. ● Recycling the colors from one initial tour helps show changes in successive graphs. ● Simplifies tracking progression in anonymous populations found in HIV treatment centers.
  • 25. Visualizing W-curves ● We use a WebGL-based package, “WebCurve”. ● Developed at IIT as a web-friendly solution for examining 3D geometry. ● Gracefully handles displaying 100+ sequences at 10K bases each on a notebook computer. ● Available from GitHub; the archive includes a web server and code to generate files for display.
  • 26. Summary ● W-curve and TSP allow us to cluster genes. ● Provides a more useful output in a clinical setting. ● Color coding the TSP results allows tracking changes in a population or the progression of an individual over time.