Graph algorithms have important applications in bioinformatics. Seymour Benzer used graph theory, specifically interval graphs, to study the structure of viral genomes. By infecting bacteria with pairs of disabled bacteriophages, Benzer showed that the genome of these viruses was linear rather than branched, based on whether the resulting graph was an interval graph. This provided insight into viral genome structure using mathematical concepts such as graphs and intervals.
Graph theory - Traveling Salesman and Chinese Postman, by Christian Kehl
Traveling Salesman and Chinese Postman problems
1. Problem Description and Complexity
2. Theoretical Approach
3. Practical Approaches and Possible Solutions
4. Examples
The document discusses the travelling salesman problem (TSP). TSP involves finding the shortest possible route for a salesman to visit each city in a set and return to the starting point. The problem was first studied in the 1800s and became increasingly popular in scientific circles in the 1950s-60s. TSP is an NP-hard optimization problem (its decision version is NP-complete) with many real-world applications such as delivery routing. Exact methods become too slow as the number of cities grows, so heuristic methods are used to find good but not necessarily optimal solutions more quickly. The objective is to minimize the total distance traveled between cities, and TSP instances can be symmetric or asymmetric depending on whether the distance between two cities is the same in both directions.
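As a rough illustration of the heuristic approach mentioned above, the following sketch builds a tour with the nearest-neighbor rule on a symmetric, Euclidean instance. The nearest-neighbor rule is just one common heuristic chosen here for illustration; the summarized document does not say which heuristics it covers. The function names and the four-city instance are hypothetical.

```python
import math

def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )

def nearest_neighbor_tour(points, start=0):
    """Greedy heuristic: always move to the closest city not yet visited.

    Fast, but the resulting tour is not guaranteed to be optimal.
    """
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by the smaller city index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical instance: four cities on the corners of a 4 x 3 rectangle.
cities = [(0, 0), (0, 3), (4, 3), (4, 0)]
tour = nearest_neighbor_tour(cities)
length = tour_length(cities, tour)
```

On this small instance the greedy tour simply walks around the rectangle. On larger instances the same rule can produce tours noticeably longer than the optimum, which is the trade-off between speed and solution quality described above.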
The document describes the key steps in the NGS workflow including library construction, preparation of the substrate, sequencing, and data analysis. It provides examples of fragmenting genomic DNA, constructing libraries for Illumina and Ion Torrent sequencing, and quality control steps like size selection and quantification of libraries. Different applications of NGS are also summarized such as targeted sequencing using probe hybridization or PCR and epigenomics approaches involving ChIP-seq and bisulfite sequencing.
OPTIMIZATION TECHNIQUES
Optimization techniques are methods for achieving the best possible result under given constraints. There are various classical and advanced optimization methods. Classical methods include techniques for single-variable, multi-variable without constraints, and multi-variable with equality or inequality constraints using methods like Lagrange multipliers or Kuhn-Tucker conditions. Advanced methods include hill climbing, simulated annealing, genetic algorithms, and ant colony optimization. Optimization has applications in fields like engineering, business/economics, and pharmaceutical formulation to improve processes and outcomes under constraints.
Genome-wide association study (GWAS) technology has been a primary method for identifying the genes responsible for diseases and other traits for the past ten years. GWAS continues to be highly relevant as a scientific method. Over 2,000 human GWAS reports now appear in scientific journals. Our free eBook aims to explain the basic steps and concepts to complete a GWAS experiment.
The document introduces nonlinear programming (NLP) and contrasts it with linear programming (LP). NLP involves optimization problems with nonlinear objective functions or constraints, which are more difficult to solve than LP problems. Examples are provided to illustrate how NLP searches can fail to find the global optimum. The document also formulates two NLP examples: one involving profit maximization for chair pricing, and another involving investment portfolio selection to minimize risk.
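To make the point about missing the global optimum concrete, here is a minimal sketch of a local search (plain gradient descent) on a one-variable nonconvex function. The function, step size, and starting points are assumptions chosen for illustration and are not taken from the summarized document.

```python
def f(x):
    """A nonconvex objective with two local minima (illustrative choice)."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    """Follow the negative gradient from x0 using a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_right = gradient_descent(2.0)   # settles in the shallower minimum near x = 1.13
x_left = gradient_descent(-2.0)   # settles in the deeper minimum near x = -1.30
```

Both runs stop at a point where the gradient vanishes, but only the run started at -2.0 reaches the lower objective value. A local search has no way to tell that a better minimum exists elsewhere, so the result depends on the starting point.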
Genome-wide association studies (GWAS) have been providing valuable insight to the genetics of common and complex diseases for many years. In this webcast we will walk through one possible workflow for completing GWAS in Golden Helix SNP & Variation Suite (SVS) with special attention paid to adjusting analysis for population stratification.
Mystique Restaurant will operate as a sole proprietorship providing meals, pastries, and beverages. It will be located in Charlieville, Trinidad near many businesses and customers. The restaurant will employ 11 skilled and semi-skilled staff. Funding will come from personal savings, bank loans using property as collateral, and family loans. The entrepreneur's roles are planning, organizing, and evaluating performance. Food will be prepared using technology for quality and efficiency. Local suppliers will provide ingredients. The business must comply with food handling and business licensing regulations to operate legally and avoid shutdown. Ethical waste disposal is important to avoid pollution.
This document provides an introduction to next generation sequencing (NGS) technologies. It begins with an outline of topics to be covered, including the evolution of NGS technologies, their descriptions and comparisons, bioinformatics challenges of NGS data analysis, and some aspects of NGS data analysis workflows and tools. The document then delves into explanations of specific NGS platforms, their performance characteristics, and the sequencing processes. It discusses the large computational infrastructure and data management needs of NGS, as well as quality control, preprocessing of NGS data, and popular analysis tools and workflows.
This document provides information about starting a proposed business called Austin's Car Wash and Guest House. It includes sections on acknowledging assistance received, introducing the business, justifying its location, selecting appropriate labor, identifying sources of capital, defining the entrepreneur's role, describing production methods and levels, discussing use of technology, outlining linkages, addressing potential for growth, noting government regulations, and ethical considerations. The business will provide car washing and guest house services, using various manual and semi-skilled labor. It will be located for accessibility and to provide local employment.
This is a presentation on how to build your problem statement given in the course AR3U012 Methods for Urbanism of the TU Delft (Delft University of Technology). This is prepared for students of urbanism, urban planning and urban design.
This document describes the Chinese postman problem and algorithms for solving it optimally. The problem is to find the shortest route for a postman who must deliver mail along every street of a city and return to the central office. It presents theorems and algorithms, such as Edmonds' algorithm, for finding Eulerian chains that represent the optimal route. It also includes examples and proposals for designing an application that solves the problem.
This document provides guidance on writing an effective problem statement for a research proposal. It defines a research problem as a situation that needs a solution where possible solutions exist. An effective problem statement clearly describes the issue to be addressed in one sentence, with additional paragraphs elaborating on the problem's importance and context. It should identify the variables of interest and relationship between variables to be studied. The problem statement establishes the foundation for the rest of the proposal by framing the scope and focus of the research. It is important to demonstrate that the problem is worth studying by considering factors like its current relevance, future implications, practical applications, and theoretical significance. The problem statement helps motivate the need for the study and generates the research questions to be answered.
This document describes the classic problem of the seven bridges of Königsberg, which gave rise to graph theory. The problem was to determine whether it was possible to walk across all of the city's bridges, crossing each exactly once. Euler solved it by abstracting the details of the city into a graph representation in which the vertices represented the land masses and the edges the bridges. He determined that the walk was not possible because the vertices had odd degrees. ...
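The degree argument can be checked with a short sketch. It relies on the standard criterion that a connected multigraph admits a walk using every edge exactly once only if it has zero or two vertices of odd degree. The labels A to D for the four land masses are an assumption made for illustration.

```python
from collections import Counter

# The seven bridges as edges of a multigraph; A is the central island,
# B, C and D the other land masses (labels are illustrative).
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def degrees(edges):
    """Count how many edge ends touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def odd_vertices(edges):
    """Vertices with an odd number of incident edges."""
    return sorted(v for v, d in degrees(edges).items() if d % 2 == 1)

def has_euler_walk(edges):
    """Degree test only; assumes the graph is connected."""
    return len(odd_vertices(edges)) in (0, 2)

odd = odd_vertices(BRIDGES)
possible = has_euler_walk(BRIDGES)
```

All four land masses have odd degree (5, 3, 3 and 3), so the test fails and no such walk exists.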
The document describes string comparison techniques using matrix algebra and seaweed matrices. It introduces the concept of semi-local string comparison, which involves comparing a whole string to substrings of another string. The key idea is representing string comparison matrices implicitly using seaweed matrices, which represent unit-Monge matrices. This allows developing algebraic techniques for efficiently multiplying such matrices using the algebra of braids and the seaweed monoid. These multiplication techniques can then be applied to problems like dynamic programming string comparison and comparing compressed strings.
The document provides an overview of the KNIME analytics platform and its capabilities. It discusses:
- KNIME's origins, offices, codebase, and application areas including pharma, healthcare, finance, retail, and more.
- The key components of the KNIME platform including data access, transformation, analysis, visualization, and deployment capabilities.
- Integrations with tools like R, Weka, databases, and file formats.
- Community contributions expanding KNIME's functionality in areas like bioinformatics, chemistry, image processing, and more.
The nuclear age is over, and it is becoming ever clearer that 21st-century science will focus on living systems, medicine, and the human being in all its manifestations. This is where the largest financial investments are being made, and it is the field on which humanity pins its greatest hopes. Serious discussions of topics that only recently seemed like science fiction are heard more and more often: can humanity defeat aging, cancer, and other deadly diseases? Will we be able to change our genome at will? Will we be masters of our bodies to the same degree that we are masters of the Earth?
For many decades, biology and medicine developed as descriptive sciences. But as it matures and accumulates information, any science sooner or later switches to a more precise language: the language of mathematics. The Human Genome Project delivered a technological breakthrough that will fuel the life sciences for many years to come, but it also posed many new global questions for today's scientists.
Cancer immunotherapy: a systems biology perspective. Maxim ..., by BioinformaticsInstitute
This document summarizes recent advances in cancer immunotherapy from the perspective of systems biology. It discusses how checkpoint blockade immunotherapy works by addressing the second co-inhibitory checkpoint signal needed for T cell activation. Computational methods are now able to identify tumor-specific neoantigens that can be targeted by immunotherapy. Mouse model studies showed that certain tumors are naturally rejected due to expression of a mutant antigen recognized by T cells, and that antigen-specific T cells are present before immunotherapy treatment. The high mutational load in melanoma makes it particularly responsive to checkpoint blockade. Early work in the 19th century by William Coley observed tumor regression following bacterial infection, which led to development of a toxin mixture that resembled modern vaccine formulations. Members of
http://bioinformaticsinstitute.ru/guests
On Friday, 10 October, at 19:00, Maria Shutova (IOGen RAS) gave an open lecture on cancer research at the Bioinformatics Institute.
Cancer is one of the most common causes of death worldwide. The lecture examines how knowledge of evolution, genome function, and reprogramming, together with bioinformatics methods, has helped us better understand how a tumor develops and propose new treatments for various types of cancer. It covers mouse models of cancer development and the interesting results obtained with them.
http://bioinformaticsinstitute.ru/lectures
Guest lecture at the Bioinformatics Institute, 9 October 2014. Lecturer: Maria Shutova (IOGen RAS).
Over the past ten years, pluripotent cells have been the subject of two Nobel Prizes and many thousands of scientific and popular-science articles. Their unique ability to turn into any cell of the adult organism still gives food for thought both to developmental biologists and to scientists seeking ways to treat genetic diseases. The lecture covers two types of pluripotent cells: "natural" (embryonic stem cells) and "artificial" (induced pluripotent stem cells). We will look separately at how knowledge of the way transcription factors work helped to reprogram cells, and at how these "artificial" pluripotent cells can be used in medicine.
Sequencing as a tool for studying complex human phenotypes: from gen..., by BioinformaticsInstitute
This document summarizes genetic analyses of complex human phenotypes. It describes whole genome sequencing of individuals from bipolar disorder families and finding an association between genetic variation in a chromosome 6 region and amygdala volume. It also discusses rare variant sequencing of metabolic syndrome-related genes in Finnish cohorts, identifying new signals beyond existing GWAS hits. Additionally, it outlines exome and targeted sequencing of Tourette syndrome pedigrees, with a genome-wide significant result in a long non-coding RNA gene linked to the trait.
In his lecture, Andrey Afanasyev talked about startups in biotech and bioinformatics and about his own bioinformatics project, iBinom; examined several biotechnology projects through the eyes of innovators and investors; touched on the search for investment; and shared his personal experience of working with venture funds and development institutions.
This document provides an overview of the ENCODE project and how its data can be accessed through the UCSC Genome Browser. It discusses the different types of ENCODE data available, including mapping data, gene annotations, expression data, regulatory information, and genetic variation. It also explains how to find, view, and download ENCODE tracks from the Genome Browser and where to get more information about ENCODE. The overall goal of the ENCODE project is to identify all functional elements in the human genome.
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefit it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to fix common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. Some approaches can also lead to unnecessary expense, for example using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And, of course, we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
Main news related to the CCS TSI 2023 (2023/1695), by Jakub Marek
An English 🇬🇧 translation of the presentation for the talk I gave on the main changes brought by CCS TSI 2023 at the biggest Czech conference on railway communications and signalling systems, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Best 20 SEO Techniques To Improve Website Visibility In SERP, by Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf, by Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Fueling AI with Great Data with Airbyte Webinar, by Zilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
2. Outline
1. Introduction to Graph Theory
2. The Hamiltonian & Eulerian Cycle Problems
3. Basic Biological Applications of Graph Theory
4. DNA Sequencing
5. Shortest Superstring & Traveling Salesman Problems
6. Sequencing by Hybridization
7. Fragment Assembly & Repeats in DNA
8. Fragment Assembly Algorithms
4. Knight Tours
• Knight Tour Problem: Given an
8 x 8 chessboard, is it possible to
find a path for a knight that visits
every square exactly once and
returns to its starting square?
• Note: In chess, a knight may move
only by jumping two spaces in one
direction, followed by a jump one
space in a perpendicular direction.
http://www.chess-poster.com/english/laws_of_chess.htm
6. 18th Century: N x N Knight Tour Problem
• 1759: Berlin Academy of Sciences
proposes a 4,000-franc prize for the
solution of the more general problem
of finding a knight tour on an N x N
chessboard.
• 1766: The problem is solved by
Leonhard Euler (pronounced "Oiler").
• The prize was never awarded since
Euler was Director of Mathematics
at Berlin Academy and was
deemed ineligible.
Leonhard Euler
http://commons.wikimedia.org/wiki/File:Leonhard_Euler_by_Handmann.png
10. • A graph is a collection (V, E) of two sets:
• V is simply a set of objects, which we
call the vertices of G.
• E is a set of pairs of vertices which
we call the edges of G.
• Simpler: Think of G as a network:
• Nodes = vertices
• Edges = segments connecting the
nodes
Introduction to Graph Theory
http://uh.edu/engines/epi2467.htm
Figure labels: Vertex, Edge
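The definition can be written down almost literally in code. The sketch below is only an illustration: the vertex labels and the helper name `neighbors` are assumptions, and edges are stored as unordered pairs.

```python
# A graph G = (V, E): V is a set of vertices, E a set of unordered vertex pairs.
# The labels "a".."d" are made up for this example.
V = {"a", "b", "c", "d"}
E = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}

def neighbors(v, edges):
    """Return the set of vertices that share an edge with v."""
    return {w for e in edges if v in e for w in e if w != v}

print(sorted(neighbors("a", E)))
```

Storing each edge as a `frozenset` makes (a, b) and (b, a) the same edge, which matches the undirected "network" picture.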
34. • Input: A graph G = (V, E)
• Output: A Hamiltonian cycle in
G, which is a cycle that visits
every vertex exactly once.
• Example: In 1857, William Rowan
Hamilton asked whether the graph
to the right has such a cycle.
• Do you see a Hamiltonian cycle?
Hamiltonian Cycle Problem
35. • Let us form a graph G = (V, E) as
follows:
• V = the squares of a chessboard
• E = the set of edges (v, w) where v
and w are squares on the
chessboard and a knight can jump
from v to w in a single move.
• Hence, a knight tour is just a
Hamiltonian Cycle in this graph!
Knight Tours Revisited
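As a rough sketch of this construction, the code below builds V and E for an n x n board. The function name `knight_graph` and the (row, column) encoding of squares are assumptions made for illustration.

```python
from itertools import product

def knight_graph(n=8):
    """Return (V, E) for the knight-move graph on an n x n chessboard."""
    V = set(product(range(n), repeat=2))          # squares as (row, column)
    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    E = set()
    for (r, c) in V:
        for dr, dc in jumps:
            w = (r + dr, c + dc)
            if w in V:                            # the jump stays on the board
                E.add(frozenset(((r, c), w)))     # undirected edge (v, w)
    return V, E

V, E = knight_graph(8)
print(len(V), len(E))
```

A knight tour is then a Hamiltonian cycle in the graph returned here; a corner square has only two neighbours, which already constrains any tour.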
36. • Theorem: The Hamiltonian Cycle Problem is NP-Complete.
• This result helps explain why knight tours were so difficult to find:
no quick (polynomial-time) method is known for finding Hamiltonian cycles in arbitrary graphs.
Hamiltonian Cycle Problem
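With no fast general method known, the straightforward fallback is exhaustive backtracking, whose running time can grow exponentially with the number of vertices. The sketch below is one such search, not a method taken from the slides; the name `hamiltonian_cycle` and the adjacency-dictionary input format are assumptions.

```python
def hamiltonian_cycle(adj):
    """Return a Hamiltonian cycle as a list of vertices, or None.

    adj maps every vertex to the set of its neighbours (undirected graph).
    """
    vertices = list(adj)
    if len(vertices) < 3:
        return None                      # a cycle needs at least three vertices
    start = vertices[0]
    path, visited = [start], {start}

    def extend():
        if len(path) == len(vertices):
            return start in adj[path[-1]]    # can we close the cycle?
        for w in adj[path[-1]]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                if extend():
                    return True
                visited.remove(w)            # backtrack
                path.pop()
        return False

    return path + [start] if extend() else None

square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(hamiltonian_cycle(square))
```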
37. • Recall the Traveling Salesman Problem (TSP):
• n cities
• Cost of traveling from i to j
is given by c(i, j)
• Goal: Find the tour of all the
cities of lowest total cost.
• Example at right: One
busy salesman!
• So we might like to think of the Hamiltonian Cycle Problem as a
TSP with all costs = 1, where we have some edges missing (there
doesn’t always exist a flight between all pairs of cities).
Hamiltonian Cycle Problem as TSP
http://www.ima.umn.edu/public-lecture/tsp/index.html
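One way to make this correspondence concrete is to give every existing edge cost 1 and every missing edge infinite cost: the graph then has a Hamiltonian cycle exactly when the cheapest tour costs n. The brute-force sketch below assumes vertices numbered 0..n-1 with n >= 3; the function names are illustrative.

```python
import math
from itertools import permutations

def tsp_brute_force(n, cost):
    """Return (best_cost, best_tour) over all tours of cities 0..n-1."""
    best_cost, best_tour = math.inf, None
    for perm in permutations(range(1, n)):      # city 0 is the fixed start
        tour = (0,) + perm + (0,)
        total = sum(cost(a, b) for a, b in zip(tour, tour[1:]))
        if total < best_cost:
            best_cost, best_tour = total, tour
    return best_cost, best_tour

def has_hamiltonian_cycle(n, edges):
    """Decide Hamiltonicity of an undirected graph on vertices 0..n-1 (n >= 3)."""
    present = {frozenset(e) for e in edges}

    def cost(i, j):
        # Existing edges cost 1; a missing "flight" is infinitely expensive.
        return 1 if frozenset((i, j)) in present else math.inf

    best_cost, _ = tsp_brute_force(n, cost)
    return best_cost == n

print(has_hamiltonian_cycle(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

Trying all (n - 1)! tours is only feasible for a handful of cities, which is the practical face of the NP-completeness result.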
38. • The city of Konigsberg, Prussia (today: Kaliningrad, Russia)
was made up of both banks of a river, as well as two islands.
• The riverbanks and the islands were connected with bridges, as
follows:
• The residents wanted to know if they could take a walk from
anywhere in the city, cross each bridge exactly once, and wind
up where they started.
The Bridges of Konigsberg
http://www.math.uwaterloo.ca/navigation/ideas/Zeno/zenocando.shtml
42. • 1735: Enter Euler...his idea: compress each land area down to a
single point, and each bridge down to a segment connecting
two points.
• This is just a graph!
• What we are looking for,
then, is a cycle in this
graph which covers each
edge exactly once.
• Using this setup, Euler
showed that such a cycle cannot exist.
The Bridges of Konigsberg
http://www.math.uwaterloo.ca/navigation/ideas/Zeno/zenocando.shtml
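The core of the argument is a parity check: a closed walk that crosses every bridge exactly once must leave each land area as often as it enters it, so every vertex needs an even number of bridges. The sketch below counts degrees for the seven bridges; the labels A to D (two banks, two islands) are illustrative names, not taken from the slides.

```python
from collections import Counter

# Seven bridges as a multigraph: A and B are the river banks,
# C and D the two islands (labels assumed for this example).
bridges = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def degrees(edges):
    """Count how many edge endpoints touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

deg = degrees(bridges)
odd = sorted(v for v, d in deg.items() if d % 2 == 1)
print(dict(deg), odd)
```

All four land areas have odd degree, so the walk the residents wanted cannot exist.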
43. Eulerian Cycle Problem
• Input: A graph G = (V, E).
• Output: A cycle in G that traverses every edge in E exactly once (called an
Eulerian cycle), if one exists.
• Example: At right is a
demonstration of an
Eulerian cycle.
http://mathworld.wolfram.com/EulerianCycle.html
44. Eulerian Cycle Problem
• Theorem: The Eulerian Cycle Problem can be solved in linear
time.
• So whereas finding a Hamiltonian cycle quickly becomes
intractable for an arbitrary graph, finding an Eulerian cycle is
much easier.
• Keep this fact in mind, as it will become essential.
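The slides do not say which linear-time method is meant; one standard choice is Hierholzer's algorithm, sketched below for an undirected multigraph. The function name and the edge-list input format are assumptions. Each edge is pushed and popped a constant number of times, so the work is proportional to the number of edges.

```python
from collections import defaultdict

def eulerian_cycle(edges):
    """Hierholzer's algorithm for an undirected multigraph.

    edges is a list of (u, v) pairs. Returns a closed walk (list of vertices)
    using every edge exactly once, or None if no such walk exists.
    """
    if not edges:
        return None
    adj = defaultdict(list)              # vertex -> list of (neighbour, edge id)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return None                      # some vertex has odd degree
    used = [False] * len(edges)
    stack, cycle = [edges[0][0]], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()                 # drop edges already traversed
        if adj[v]:
            w, i = adj[v].pop()
            used[i] = True
            stack.append(w)              # walk along an unused edge
        else:
            cycle.append(stack.pop())    # dead end: vertex joins the cycle
    # In a disconnected graph some edges are never reached.
    return cycle if len(cycle) == len(edges) + 1 else None

print(eulerian_cycle([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]))
```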
46. Modeling Hydrocarbons with Graphs
• Arthur Cayley studied chemical
structures of hydrocarbons in the
mid-1800s.
• He used trees (acyclic connected
graphs) to enumerate structural
isomers.
Hydrocarbon Structure
Arthur Cayley
http://www.scientific-web.com/en/Mathematics/Biographies/ArthurCayley01.html
47. T4 Bacteriophages: Life Finds a Way
• Normally, the T4 bacteriophage kills
bacteria
• However, if T4 is mutated (e.g., an
important gene is deleted) it gets
disabled and loses the ability to kill
bacteria
• Suppose a bacterium is infected with
two different disabled mutants:
would the bacterium still survive?
• Amazingly, a pair of disabled viruses
can still kill a bacterium.
• How is this possible?
T4 Bacteriophage
48. Benzer’s Experiment
• Seymour Benzer’s Idea: Infect bacteria with pairs of mutant
T4 bacteriophage (virus).
• Each T4 mutant has an unknown interval deleted from its
genome.
• If the two intervals overlap: T4 pair
is missing part of its genome and
is disabled—bacteria survive.
• If the two intervals do not overlap:
T4 pair has its entire genome and
is enabled – bacteria are killed.
http://commons.wikimedia.org/wiki/File:Seymour_Benzer.gif
Seymour Benzer
50. Benzer’s Experiment and Graph Theory
• We construct an interval graph:
• Each T4 mutant forms a vertex.
• Place an edge between mutant pairs where bacteria survived
(i.e., the deleted intervals in the pair of mutants overlap)
• As the next slides show, the interval graph structure reveals
whether DNA is linear or branched.
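A small sketch of this construction, assuming the deleted intervals are already known: the mutant names and coordinates below are made up for illustration. In the experiment the intervals are unknown and only the edges, that is, which pairs let the bacteria survive, are observed.

```python
from itertools import combinations

def interval_graph(intervals):
    """Build the edge set of an interval graph.

    intervals maps a mutant name to the (start, end) of its deleted region,
    treated as a half-open interval on a linear genome.
    """
    edges = set()
    for (a, (s1, e1)), (b, (s2, e2)) in combinations(intervals.items(), 2):
        if s1 < e2 and s2 < e1:          # the two deletions overlap
            edges.add(frozenset((a, b)))
    return edges

# Hypothetical deletions; the coordinates are invented for this example.
mutants = {"m1": (0, 4), "m2": (3, 7), "m3": (6, 9), "m4": (10, 12)}
print(sorted(tuple(sorted(e)) for e in interval_graph(mutants)))
```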
73. Linear vs. Branched Genomes: Interval Graphs
Figure panels: Linear Genome, Branched Genome
• Simply by comparing the structure of the two interval graphs,
Benzer showed that genomes cannot be branched!
75. DNA Sequencing: History
• Sanger Method (1977):
Labeled ddNTPs terminate
DNA copying at random
points.
• Gilbert Method (1977):
Chemical method to cleave
DNA at specific points (G,
G+A, T+C, C).
• Both methods generate labeled
fragments of varying lengths
that are then separated by electrophoresis.
Frederick Sanger, Walter Gilbert
76. Sanger Method: Generating Read
1. Start at primer
(restriction site).
2. Grow DNA chain.
3. Include ddNTPs.
4. Stop reaction at all
possible points.
5. Separate products
by length, using
gel electrophoresis.
77. Sanger Method: Sequencing
• Shear DNA into millions of
small fragments.
• Read 500 – 700 nucleotides
at a time from the small
fragments.
78. Fragment Assembly
• Computational Challenge: assemble individual short
fragments ("reads") into a single genomic sequence
("superstring").
• Until the late 1990s, the so-called "shotgun fragment assembly" of
the human genome was viewed as an intractable problem,
because it required so much work for a large genome.
• Our computational challenge leads to the formal problem at
the beginning of the next section.
80. Shortest Superstring Problem (SSP)
• Problem: Given a set of strings, find a shortest string that
contains all of them.
• Input: Strings s1, s2, …, sn
• Output: A "superstring" s that contains all strings
s1, s2, …, sn as substrings, such that the length of s is
minimized.
85. SSP: Example
• So our greedy guess of concatenating all the strings together
turns out to be substantially suboptimal (length 24 vs. 10).
• Note: The strings here (000, 001, …, 111) are just the integers from 0 to 7 in base-2 notation.
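The two lengths can be checked directly. The sketch below compares plain concatenation with the length-10 superstring 0001110100 used for this example; the helper name `is_superstring` is an assumption. Length 10 is also the best possible here, since a string of length n has only n - 2 windows of length 3 and eight distinct windows are needed.

```python
from itertools import product

# The eight 3-bit strings 000, 001, ..., 111.
strings = ["".join(bits) for bits in product("01", repeat=3)]

naive = "".join(strings)        # plain concatenation of all strings
candidate = "0001110100"        # the length-10 superstring from the example

def is_superstring(s, parts):
    """True if every string in parts occurs somewhere in s."""
    return all(p in s for p in parts)

print(len(naive), len(candidate), is_superstring(candidate, strings))
```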
86. SSP: Issues
• Complexity: NP-complete (in a few slides).
• Also, this formulation does not take into account the
possibility of sequencing errors, and it is difficult to adapt to
handle that consideration.
90. The Overlap Function
• Given strings si and sj , define overlap(si , sj ) as the length of
the longest prefix of sj that matches a suffix of si .
• Example: align two copies of the same read,
s1 = s2 = aaaggcatcaaatctaaaggcatcaaa:
aaaggcatcaaatctaaaggcatcaaa
               aaaggcatcaaatctaaaggcatcaaa
• The longest proper overlap is aaaggcatcaaa, so overlap(s1 , s2 ) = 12.
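A direct, unoptimized sketch of the overlap function follows. It counts only proper overlaps (shorter than both strings), which is an assumption added here so that a read compared with a copy of itself is not reported as overlapping completely.

```python
def overlap(si, sj):
    """Length of the longest prefix of sj that matches a suffix of si.

    Only proper overlaps are counted (an assumption of this sketch):
    k is always smaller than both string lengths.
    """
    for k in range(min(len(si), len(sj)) - 1, 0, -1):
        if si[-k:] == sj[:k]:
            return k
    return 0

read = "aaaggcatcaaatctaaaggcatcaaa"
print(overlap(read, read))        # the read against a copy of itself
print(overlap("ATC", "TCC"))      # suffix TC matches prefix TC
```

Note that overlap is not symmetric: overlap("ATC", "TCC") and overlap("TCC", "ATC") differ, which is why the overlap graph is directed.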
91. Why is SSP an NP-Complete Problem?
• Construct a graph G as follows:
• The n vertices represent the n strings s1, s2, …, sn.
• For every ordered pair of vertices si and sj , insert a directed edge
from si to sj weighted by overlap( si, sj ).
• Then finding the shortest superstring corresponds to finding the
Hamiltonian path in G with the largest total overlap, i.e. the shortest
Hamiltonian path when each edge is given length −overlap( si, sj ).
• This is a Traveling Salesman Problem (TSP), which we
know to be NP-complete, so SSP is at most as hard as TSP.
• Note: For SSP to be NP-complete we also need the converse, that any TSP can be formulated as a SSP (not difficult). With that, SSP must also be NP-Complete!
95. Reducing SSP to TSP: Example 1
• Take our previous set of
strings S = {000, 001, 010,
011, 100, 101, 110, 111}.
• Then the graph for S is
given at right.
• One minimal Hamiltonian
path gives our previous
superstring, 0001110100.
• Check that this works!
103. Reducing SSP to TSP: Example 2
• S = {ATC, CCA, CAG,
TCC, AGT}
• The graph is provided at
right.
• A minimal Hamiltonian
path gives as shortest
superstring ATCCAGT.
Figure: overlap graph on the vertices ATC, CCA, TCC, AGT, CAG (edge weights 0, 1 and 2); the path spells out ATCCAGT.
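This example is small enough to solve by brute force. The sketch below tries every ordering of the five strings, scores it by total overlap (larger overlap means a shorter superstring, so this is the "minimal" path if edge lengths are taken as negated overlaps), and spells out the resulting string. All function names are illustrative, and the approach only scales to a handful of strings.

```python
from itertools import permutations

def overlap(si, sj):
    """Longest proper suffix of si that is also a prefix of sj."""
    for k in range(min(len(si), len(sj)) - 1, 0, -1):
        if si.endswith(sj[:k]):
            return k
    return 0

def best_hamiltonian_path(strings):
    """Try every vertex order; keep the one with the largest total overlap."""
    best_order, best_total = None, -1
    for order in permutations(strings):
        total = sum(overlap(a, b) for a, b in zip(order, order[1:]))
        if total > best_total:
            best_order, best_total = order, total
    return best_order, best_total

def spell(order):
    """Merge consecutive strings along the path, sharing their overlaps."""
    s = order[0]
    for a, b in zip(order, order[1:]):
        s += b[overlap(a, b):]
    return s

S = ["ATC", "CCA", "CAG", "TCC", "AGT"]
order, total = best_hamiltonian_path(S)
print(order, total, spell(order))
```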
105. Sequencing by Hybridization (SBH): History
• 1988: SBH is suggested as an
alternative sequencing method.
Nobody believes it will ever
work.
• 1991: Light-directed polymer
synthesis is developed by Steve
Fodor and colleagues.
• 1994: Affymetrix develops the
first 64-kb DNA microarray.
Figure captions: First microarray prototype (1989); first commercial DNA microarray prototype w/16,000 features (1994); 500,000 features per chip (2002)
106. • Attach all possible DNA probes of length l to a flat surface,
each probe at a distinct known location. This set of probes is
called a DNA array.
• Apply a solution containing
fluorescently labeled DNA
fragment to the array.
• The DNA fragment hybridizes
with those probes that are
complementary to substrings
of length l of the fragment.
How SBH Works
Hybridization of a DNA Probe
http://members.cox.net/amgough/Fanconi-genetics-PGD.htm
107. How SBH Works
• Using a spectroscopic
detector, determine
which probes hybridize
to the DNA fragment to
obtain the l-mer
composition of the target
DNA fragment.
• Reconstruct the sequence
of the target DNA
fragment from the l-mer
composition.
DNA Microarray
http://www.wormbook.org/chapters/www_germlinegenomics/germlinegenomics.html
108. How SBH Works: Example
• Say our DNA fragment hybridizes to indicate that it contains
the following substrings: GCAA, CAAA, ATAG, TAGG,
ACGC, GGCA.
• Then the most logical
explanation is that our
fragment is the shortest
superstring containing
these strings!
• Here the superstring is:
ATAGGCAAACGC
DNA Microarray Interpreted
115. l-mer Composition
• Spectrum( s, l ): The unordered multiset of all l-mers in a
string s of length n.
• The order of individual elements in Spectrum( s, l ) does not
matter.
• For s = TATGGTGC all of the following are equivalent
representations of Spectrum( s, 3):
{TAT, ATG, TGG, GGT, GTG, TGC}
{ATG, GGT, GTG, TAT, TGC, TGG}
{TGG, TGC, TAT, GTG, GGT, ATG}
• Which ordering do we choose? Typically the one that is
lexicographic, meaning in alphabetical order (think of a
phonebook).
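A short sketch of Spectrum(s, l) that returns the l-mers in lexicographic order, so equal spectra compare equal regardless of where each l-mer occurred in s. The function name `spectrum` is an assumption. A string of length n yields n - l + 1 l-mers.

```python
def spectrum(s, l):
    """All l-mers of s as a lexicographically sorted list (a multiset)."""
    return sorted(s[i:i + l] for i in range(len(s) - l + 1))

print(spectrum("TATGGTGC", 3))
```

Because the result is sorted, two strings have the same spectrum exactly when the two lists are equal, which gives a quick way to test whether different sequences are indistinguishable by their l-mer composition.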
116. Different Sequences, Same Spectrum
• Different sequences may share a common spectrum.
• Example:
Spectrum(GTATCT, 2) = Spectrum(GTCTAT, 2) = {AT, CT, GT, TA, TC}
117. The SBH Problem
• Problem: Reconstruct a string from its l-mer composition
• Input: A set S, representing all l-mers from an (unknown)
string s.
• Output: A string s such that Spectrum( s, l ) = S
• Note: As we have seen, there may be more than one correct
answer. Determining which DNA sequence is actually correct
is another matter.
118. SBH: Hamiltonian Path Approach
• Create a graph G as follows:
• Create one vertex for each member of S.
• Connect vertex v to vertex w with a directed edge (arrow)
if the last l – 1 elements of v match the first l – 1 elements
of w.
• Then a Hamiltonian path in this graph will correspond to a
string s such that Spectrum( s, l ) = S!
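152. SBH: Hamiltonian Path Approach
• Example:
S = {ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT}
• There are actually two Hamiltonian paths in this graph:
• Path 1: ATG → TGC → GCG → CGT → GTG → TGG → GGC → GCA, which gives the string
s = ATGCGTGGCA
• Path 2: ATG → TGG → GGC → GCG → CGT → GTG → TGC → GCA, which gives the string
s = ATGGCGTGCA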
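The Hamiltonian path approach can be sketched by brute force for an example this small. The code below is an illustration only, not an efficient method: it tries every ordering of the l-mers and keeps those in which consecutive l-mers overlap in l – 1 characters. The function name `hamiltonian_strings` is hypothetical.

```python
from itertools import permutations


def hamiltonian_strings(S):
    """Brute-force SBH: try every ordering of the l-mers in S."""
    results = set()
    for order in permutations(S):
        # Consecutive l-mers must overlap in l - 1 characters.
        if all(v[1:] == w[:-1] for v, w in zip(order, order[1:])):
            # Spell the string: first l-mer plus the last character of each later one.
            results.add(order[0] + "".join(w[-1] for w in order[1:]))
    return sorted(results)


S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
print(hamiltonian_strings(S))
# ['ATGCGTGGCA', 'ATGGCGTGCA']
```

With 8 l-mers this examines 8! = 40,320 orderings; the factorial growth is why the approach does not scale.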
153. SBH: A Lost Cause?
• At this point, we should be concerned about using a
Hamiltonian path to solve SBH.
• After all, recall that SSP was an NP-Complete problem, and
we have seen that an instance of SBH is an instance of SSP.
• However, note that SBH is actually a specific case of SSP, so
there is still hope for an efficient algorithm for SBH:
• We are considering a spectrum of only l-mers, and not
strings of any other length.
• Also, we connect two l-mers with an edge if and only if
the overlap between them is exactly l – 1, whereas before we
connected strings if there was any overlap at all.
• Note: Reducing SBH to SSP does not make SBH NP-Complete; that would require a reduction in the other direction, from SSP to SBH.
171. SBH: Eulerian Path Approach
• So instead, let us consider a completely different graph G:
• Vertices = the set of (l – 1)-mers which are substrings of
some l-mer from our set S.
• v is connected to w with a directed edge if the final l – 2
elements of v agree with the first l – 2 elements of w, and
the l-mer formed by overlapping v and w is in S.
• Example: S = {ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT}.
• V = {AT, TG, GG, GC, GT, CA, CG}.
• E = one edge per l-mer in S: AT→TG, TG→GG, TG→GC, GT→TG,
GG→GC, GC→CA, GC→CG, CG→GT.
[Figure: the directed graph G on the vertices AT, TG, GG, GC, GT, CA, CG]
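The construction above can be sketched in a few lines: every l-mer contributes one directed edge from its (l – 1)-mer prefix to its (l – 1)-mer suffix. The function name `de_bruijn` and the adjacency-list representation are choices made for this illustration.

```python
from collections import defaultdict


def de_bruijn(S):
    """Map each (l-1)-mer to the list of (l-1)-mers it points to."""
    graph = defaultdict(list)
    for lmer in S:
        prefix, suffix = lmer[:-1], lmer[1:]
        graph[prefix].append(suffix)  # one edge per l-mer
        graph[suffix]                 # touch the key so sink vertices appear too
    return dict(graph)


S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
for vertex, successors in de_bruijn(S).items():
    print(vertex, "->", successors)
```

The graph has one vertex per distinct (l – 1)-mer and exactly |S| edges, so it stays small even when the Hamiltonian formulation becomes unwieldy.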
189. SBH: Eulerian Path Approach
• Key Point: A sequence reconstruction corresponds
to an Eulerian path in this graph.
• Recall that an Eulerian path is "easy" to find (when one exists, it can
be found in time linear in the number of edges), so we have found a simple solution
to SBH!
• In our example, two solutions:
1. ATGGCGTGCA
2. ATGCGTGGCA
190. But…How Do We Know an Eulerian Path Exists?
• A graph is balanced if for every vertex the number of
incoming edges equals the number of outgoing edges. We
write this for vertex v as:
in(v) = out(v)
• Theorem: A connected graph is Eulerian (i.e. contains an
Eulerian cycle) if and only if each of its vertices is balanced.
• We will prove this by demonstrating the following:
1. Every Eulerian graph is balanced.
2. Every balanced graph is Eulerian.
191. Every Eulerian Graph is Balanced
• Suppose we have an Eulerian graph G. Call C the Eulerian
cycle of G, and let v be any vertex of G.
• For every edge e entering v, we can pair e with an edge leaving
v, which is simply the edge in our cycle C that follows e.
• Therefore it directly follows that in(v)=out(v) as needed, and
since our choice of v was arbitrary, this relation must hold for
all vertices in G, so we are finished with the first part.
192. Every Balanced Graph is Eulerian
• Next, suppose that we have a balanced graph G.
• We will actually construct an Eulerian cycle in G.
• Start with an arbitrary vertex v and form a path in G without
repeated edges until we reach a "dead end," meaning a vertex
with no unused edges leaving it.
• G is balanced, so every time we enter a
vertex w that isn’t v during the course of
our path, we can find an edge leaving w.
So our dead end is v and we have a cycle.
193. Every Balanced Graph is Eulerian
• We have two simple cases for our cycle, which we call C:
1. C is an Eulerian cycle ⇒ G is Eulerian ⇒ DONE.
2. C is not an Eulerian cycle.
• So we can assume that C is not an
Eulerian cycle. Because G is connected, C
then contains a vertex with
untraversed edges.
• Let w be such a vertex, and start a
new path from w. Once again, we
must obtain a cycle, say C’.
194. Every Balanced Graph is Eulerian
• Combine our cycles C and C’ into a bigger cycle C* by
swapping edges at w (see figure).
• Once again, we test C*:
1. C* is an Eulerian cycle ⇒ G is Eulerian ⇒ DONE.
2. C* is not an Eulerian cycle.
• If C* is not Eulerian, we iterate our
procedure. Because G has a finite
number of edges, we must eventually
reach a point where our current cycle
is Eulerian (Case 1 above). DONE.
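The constructive proof above is essentially Hierholzer's algorithm. A minimal sketch follows, assuming the input is a connected, balanced directed graph given as a dictionary of successor lists; the function name and the toy graph are illustrative. The explicit stack replaces the "swap edges at w" step: when the walk reaches a dead end it backs up to the most recent vertex that still has unused edges and splices a new cycle in there.

```python
def eulerian_cycle(graph, start):
    """Build an Eulerian cycle in a connected, balanced directed graph.

    graph maps each vertex to a list of its successors.
    """
    unused = {v: list(ws) for v, ws in graph.items()}  # copy; edges get consumed
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if unused[v]:
            stack.append(unused[v].pop())  # keep walking along an unused edge
        else:
            cycle.append(stack.pop())      # dead end: v is finished, back up
    return cycle[::-1]


toy = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": ["b"]}
print(eulerian_cycle(toy, "a"))
```

Each edge is pushed and popped once, which is where the linear running time comes from.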
195. Euler’s Theorem: Extension
• A vertex v is semi-balanced if either in(v) = out(v) + 1 or
in(v) = out(v) – 1.
• Theorem: A connected graph has an Eulerian path if and only
if it contains at most two semi-balanced vertices and all other
vertices are balanced.
• If G has no semi-balanced vertices, G is Eulerian, and any
Eulerian cycle is also an Eulerian path. DONE.
• If G has two semi-balanced vertices, connect them with a
new edge e, directed from the vertex with in(v) = out(v) + 1 to
the vertex with in(v) = out(v) – 1, so that the graph G + e is
balanced and must be Eulerian. Remove e from the Eulerian
cycle in G + e to obtain an Eulerian path in G.
• Think: Why can G not have just one semi-balanced vertex?
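Putting the pieces together, here is a hedged sketch of SBH reconstruction through an Eulerian path. Instead of adding the extra edge e and removing it afterwards, this version starts the walk at the semi-balanced vertex that has one more outgoing than incoming edge, which has the same effect. The function name `sbh_eulerian` and the error handling are assumptions made for the illustration.

```python
from collections import defaultdict


def sbh_eulerian(S):
    """Reconstruct one string whose l-mer spectrum is S via an Eulerian path."""
    out_edges = defaultdict(list)
    balance = defaultdict(int)  # out(v) - in(v)
    for lmer in S:
        v, w = lmer[:-1], lmer[1:]
        out_edges[v].append(w)
        balance[v] += 1
        balance[w] -= 1
    # Euler's theorem: all vertices balanced, or exactly two semi-balanced.
    if sorted(b for b in balance.values() if b != 0) not in ([], [-1, 1]):
        raise ValueError("no Eulerian path: degree condition fails")
    # Start where there is one surplus outgoing edge, else anywhere.
    start = next((v for v, b in balance.items() if b == 1), next(iter(balance)))
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if out_edges[v]:
            stack.append(out_edges[v].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(S) + 1:
        raise ValueError("no Eulerian path: graph is not connected")
    # Spell the string: first vertex plus the last character of each later one.
    return path[0] + "".join(v[-1] for v in path[1:])


S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
print(sbh_eulerian(S))
```

Which of the two valid reconstructions is returned depends on the order in which edges are consumed; the sketch makes no attempt to enumerate all Eulerian paths.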
196. Some Difficulties with SBH
• Fidelity of Hybridization: It is difficult to detect differences
between probes hybridized with perfect matches and those
with one mismatch.
• Array Size: The effect of low fidelity can be decreased with
longer l-mers, but array size increases exponentially in l.
Array size is limited with current technology.
• Practicality: SBH is still impractical. As DNA microarray
technology improves, SBH may become practical in the future.
197. Some Difficulties with SBH
• Practicality Again: Although SBH is still impractical, it
spearheaded expression analysis and SNP analysis techniques.
• Practicality Again and Again: In 2007 Solexa (now Illumina)
developed a new DNA sequencing approach that generates so
many short l-mers that they essentially mimic a universal DNA
array.
212. Reading an Electropherogram
• Reading an electropherogram requires four processes:
1. Filtering
2. Smoothing
3. Correction for length compressions
4. A method for calling the nucleotides – PHRED
218. Shotgun Sequencing
• Start with a genomic segment.
• Cut many times at random (hence "shotgun").
• Get one or two reads (~500 bp each) from each segment.
219. Fragment Assembly
• Cover region with ~7-fold redundancy.
• Overlap reads and extend to reconstruct the original
genomic region.
220. Read Coverage
• Length of genomic segment: L
• Number of reads: n
• Length of each read: l
• Define the coverage as: C = n l / L
• Question: How much coverage is enough?
• Lander-Waterman Model: Assuming uniform distribution of
reads, C = 10 results in 1 gap in coverage per million
nucleotides.
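The coverage figure can be checked with a short calculation. The sketch below assumes the simplified Lander-Waterman estimate that the expected number of gaps is about n · e^(–C), ignoring any minimum overlap needed to join two reads; the 500 bp read length is taken from the shotgun sequencing slide, and the function names are illustrative.

```python
import math


def coverage(n, l, L):
    """Coverage C = n * l / L for n reads of length l over a segment of length L."""
    return n * l / L


def expected_gaps(n, l, L):
    """Simplified Lander-Waterman estimate of the number of gaps: n * exp(-C)."""
    return n * math.exp(-coverage(n, l, L))


L = 1_000_000   # segment length in nucleotides
l = 500         # read length (assumed, ~500 bp)
n = 20_000      # number of reads, chosen so that C = 10
print(coverage(n, l, L), expected_gaps(n, l, L))
```

With these numbers the estimate is roughly 0.9 gaps, consistent with about one gap per million nucleotides at C = 10.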
221. Challenges in Fragment Assembly
• Repeats: A major problem for fragment assembly.
• More than 50% of the human genome consists of repeats:
• Over 1 million Alu repeats (about 300 bp).
• About 200,000 LINE repeats (1000 bp and longer).
222. DNA Assembly Analogy: Triazzle
• A Triazzle® puzzle has only 16 pieces and looks simple.
• BUT… there are many repeats!
• The repeats make it very difficult to solve.
• This repetition is what makes fragment assembly so difficult.
http://www.triazzle.com/
223. Repeat Classification
Repeat Type: Explanation
• Low-Complexity DNA: e.g. ATATATATACATA…
• Microsatellite repeats: (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
• Gene Families: genes duplicate and then diverge
• Segmental duplications: very long, very similar copies
224. Repeat Classification
Repeat Type: Explanation
• SINE Transposon: Short Interspersed Nuclear Elements (e.g., Alu: ~300 bp long, 10^6 copies)
• LINE Transposon: Long Interspersed Nuclear Elements (~500 - 5,000 bp long, 200,000 copies)
• LTR retroposons: Long Terminal Repeats (~700 bp)
230. Assembly Method: Overlap-Layout-Consensus
• Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA
• Three steps:
1. Overlap: Find potentially overlapping reads.
2. Layout: Merge reads into contigs and contigs into supercontigs.
3. Consensus: Derive the DNA sequence and correct any read errors.
[Figure: the three stages Overlap, Layout, and Consensus, ending in a consensus sequence such as ..ACGATTACAATAGGTT..]
231. Step 1: Overlap
• Find the best match between the suffix of one read and the
prefix of another.
• Due to sequencing errors, we need to use dynamic
programming to find the optimal overlap alignment.
• Apply a filtration method to filter out pairs of fragments that
do not share a significantly long common substring.
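The overlap alignment mentioned above can be sketched with a small dynamic program. The scoring values (+1 match, –1 mismatch, –1 gap) and the function name are assumptions for illustration; real assemblers use tuned scores and banded or filtered variants. The alignment may start anywhere in the first read and end anywhere in the second, so only a suffix of `a` and a prefix of `b` are scored.

```python
def overlap_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score aligning some suffix of a against some prefix of b."""
    # Row 0: nothing of a consumed yet, so b[:j] can only be aligned to gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0 stays 0: the suffix of a may start at any position for free.
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # Last row: all of a is consumed; the prefix of b may end anywhere.
    return max(prev)


print(overlap_score("ACGTTA", "TTACC"))         # exact 3-base overlap "TTA"
print(overlap_score("AAAGATTACA", "GATTTCAGG")) # 7-base overlap with one mismatch
```

The table has len(a) × len(b) cells, which is why a filtration step is applied before running it on every pair of reads.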
233. Step 1: Overlap
• A k-mer that appears N times initiates N^2 comparisons.
• For an Alu that appears 10^6 times, we will have 10^12
comparisons, which is too many.
• Solution: Discard all k-mers that appear more than t ×
Coverage times (t ~ 10).
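A rough sketch of this filtration idea: index every k-mer, skip k-mers shared by too many reads, and keep only the read pairs that share a remaining k-mer as candidates for the expensive overlap alignment. The function name, the `max_occurrences` parameter (standing in for t × Coverage), and the toy reads are assumptions; for simplicity the sketch counts the number of reads containing a k-mer rather than its total number of occurrences.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(reads, k, max_occurrences):
    """Pairs of read indices sharing a k-mer that is not over-represented."""
    index = defaultdict(set)
    for r, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(r)
    pairs = set()
    for kmer, owners in index.items():
        if len(owners) > max_occurrences:
            continue  # likely a repeat: skip to avoid the quadratic blow-up
        pairs.update(combinations(sorted(owners), 2))
    return pairs


reads = ["ACGTAC", "GTACGG", "TTTTTT", "CCTTTT", "TTTTAA"]
print(candidate_pairs(reads, k=4, max_occurrences=2))
# {(0, 1)}
```

Here the k-mer TTTT occurs in three reads and is discarded, so only the pair sharing GTAC survives.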
234. Step 2: Layout
• We next create local multiple alignments from the overlapping
reads.
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
237. Step 2: Layout
• Repeats are a major challenge.
• Do two aligned fragments really overlap, or are they from two
copies of a repeat?
• Solution: repeat masking – hide the repeats!
• Masking results in a high rate of misassembly (~20 %).
• Misassembly means a lot more work at the finishing step.
238. Step 2: Layout
• Repeats shorter than read length are OK.
• Repeats with more base pair differences than the sequencing
error rate are OK.
• To make a smaller portion of the genome appear repetitive, try
to:
• Increase read length
• Decrease sequencing error rate
239. Step 3: Consensus
• A consensus sequence is derived from a profile of the
assembled fragments.
• A sufficient number of reads are required to ensure a
statistically significant consensus.
• Reading errors are corrected.
240. Step 3: Consensus
• Derive multiple alignment from pairwise read alignments.
• Derive each consensus base by weighted voting.
Multiple Alignment:
TAGATTACACAGATTACTGA TTGATGGCGTAA CTA
TAGATTACACAGATTACTGACTTGATGGCGTAAACTA
TAG TTACACAGATTATTGACTTCATGGCGTAA CTA
TAGATTACACAGATTACTGACTTGATGGCGTAA CTA
TAGATTACACAGATTACTGACTTGATGGGGTAA CTA
Consensus String:
TAGATTACACAGATTACTGACTTGATGGCGTAA CTA
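Column-wise voting can be sketched as follows. This is a simplification: the slides speak of weighted voting without giving the weights, so the sketch defaults to equal weights and lets the caller pass others (for example, derived from base quality). Gaps are written as `-`, and the function name is illustrative.

```python
from collections import Counter


def consensus(rows, weights=None):
    """Column-wise weighted vote over equal-length aligned rows ('-' marks a gap)."""
    weights = weights or [1.0] * len(rows)
    result = []
    for column in zip(*rows):
        votes = Counter()
        for symbol, weight in zip(column, weights):
            votes[symbol] += weight
        result.append(max(votes, key=votes.get))  # symbol with the largest total weight
    return "".join(result)


rows = [
    "TAGATTACACAGATTACTGA-TTGATGGCGTAA-CTA",
    "TAGATTACACAGATTACTGACTTGATGGCGTAAACTA",
    "TAG-TTACACAGATTATTGACTTCATGGCGTAA-CTA",
    "TAGATTACACAGATTACTGACTTGATGGCGTAA-CTA",
    "TAGATTACACAGATTACTGACTTGATGGGGTAA-CTA",
]
print(consensus(rows))
# TAGATTACACAGATTACTGACTTGATGGCGTAA-CTA
```

Each isolated read error is outvoted by the other four rows, which is why sufficient coverage matters for a reliable consensus.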
249. Overlap Graph: Hamiltonian Approach
• Each vertex represents a read from the original sequence.
• Vertices are connected by an edge if they overlap.
• A Hamiltonian path in this graph provides a candidate assembly.
[Figure: reads drawn from a sequence containing three copies of a repeat]
266. • So finding an alignment corresponds to finding a Hamiltonian
path in the overlap graph.
• Recall that the Hamiltonian path/cycle problem is NP-complete:
no efficient algorithms are known.
• Note: Finding a Hamiltonian path only looks easy here because we
knew the optimal alignment before constructing the overlap graph.
Overlap Graph: Hamiltonian Approach
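To make the Hamiltonian approach concrete, here is a minimal sketch in Python. The helper names (`overlap`, `overlap_graph`, `hamiltonian_assembly`) and the `min_len` threshold are illustrative assumptions, not part of the original slides. It tries every ordering of the reads, so it is only usable for a handful of reads, which is exactly the point made about NP-completeness.

```python
from itertools import permutations

def overlap(a, b, min_len=1):
    """Length of the longest proper suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def overlap_graph(reads, min_len=1):
    """Directed graph as a dict: (a, b) -> overlap length, for every overlapping pair."""
    graph = {}
    for a in reads:
        for b in reads:
            if a != b:
                k = overlap(a, b, min_len)
                if k > 0:
                    graph[(a, b)] = k
    return graph

def hamiltonian_assembly(reads, min_len=1):
    """Brute force: test every ordering of reads (factorial time) for a Hamiltonian path."""
    graph = overlap_graph(reads, min_len)
    for order in permutations(reads):
        pairs = list(zip(order, order[1:]))
        if all(p in graph for p in pairs):
            seq = order[0]
            for a, b in pairs:
                seq += b[graph[(a, b)]:]  # append only the non-overlapping tail
            return seq
    return None  # no ordering visits every read along overlap edges

reads = ["GTACC", "ATGCG", "GCGTA"]  # made-up toy reads
print(hamiltonian_assembly(reads, min_len=2))
```

With only three reads there are 6 orderings to check; with 20 reads there are already more than 10^18, which is why this approach does not scale.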
267. • The "overlap-layout-consensus" technique implicitly solves
the Hamiltonian path problem and has a high rate of misassembly.
• Can we adapt the Eulerian Path approach borrowed from the
SBH problem?
• Fragment assembly without repeat masking can be done in
linear time with greater accuracy.
EULER Approach to Fragment Assembly
271. [Figure: sequence containing three copies of a repeat]
Repeat Graph: Eulerian Approach
• Gluing each repeat edge together gives a clear progression of the
path through the entire sequence.
• In the repeat graph, an alignment corresponds to an Eulerian path,
which can be found in linear time.
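As a rough illustration of why the Eulerian formulation is tractable, the sketch below finds an Eulerian path with Hierholzer's algorithm, which visits each edge once. The function name `eulerian_path` and the toy edge list are assumptions made for this example; the code also assumes that an Eulerian path exists and does not verify it.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Return the node sequence of an Eulerian path in a directed multigraph.

    edges: list of (u, v) pairs. Assumes an Eulerian path exists.
    """
    out_edges = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for u, v in edges:
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())             # dead end: emit node
    return path[::-1]

# Toy graph: node B is entered twice, loosely like a glued repeat.
edges = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "D")]
print(eulerian_path(edges))
```

Each edge is pushed and popped exactly once, so the running time is proportional to the number of edges, in contrast to the exhaustive search needed for a Hamiltonian path.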
272. [Figure: two repeats, Repeat1 and Repeat2, each occurring twice]
• The repeat graph can be easily constructed with any number of
repeats.
Repeat Graph: Eulerian Approach
275. • Problem: In the previous slides, we constructed the repeat
graph while already knowing the genome structure.
• How do we construct the repeat graph just from fragments?
• Solution: Break the reads into smaller pieces.
Making Repeat Graph From Reads Only
276. Repeat Sequences: Emulating a DNA Chip
• A virtual DNA chip allows one to solve the fragment assembly
problem using our SBH algorithm.
277. Construction of Repeat Graph
• Construction of repeat graph from k-mers: emulates an
SBH experiment with a huge (virtual) DNA chip.
• Breaking reads into k-mers: Transforms sequencing data into
virtual DNA chip data.
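The k-mer decomposition can be sketched as follows. This is a simplified illustration, assuming the usual SBH-style construction in which each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix; the names `kmers` and `de_bruijn_edges` and the toy reads are made up for the example.

```python
def kmers(read, k):
    """All overlapping substrings of length k in a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn_edges(reads, k):
    """One edge (prefix, suffix) per distinct k-mer seen in any read.

    Collecting distinct k-mers mimics the spectrum read off a DNA chip;
    k-mer multiplicities are ignored in this sketch.
    """
    spectrum = set()
    for read in reads:
        spectrum.update(kmers(read, k))
    return sorted((km[:-1], km[1:]) for km in spectrum)

reads = ["ATGCG", "GCGTA"]  # toy reads that overlap in "GCG"
for prefix, suffix in de_bruijn_edges(reads, k=3):
    print(prefix, "->", suffix)
```

Because the two reads share the 3-mer GCG, they contribute a single shared edge, which is how overlapping reads become connected without any pairwise overlap computation.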
278. • Error correction in reads: "Consensus first" approach to
fragment assembly.
• Makes reads (almost) error-free BEFORE the assembly
even starts.
• Uses reads and mate-pairs to simplify the repeat graph
(Eulerian Superpath Problem).
Construction of Repeat Graph
279. • If an error exists in one of the 20-mer reads, the error will be
carried into all of the smaller pieces broken from that read.
• However, that error will not be present in the other reads that
cover the same 20-mer.
• So it is possible to eliminate most single-base errors before
reconstructing the original sequence.
Minimizing Errors
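One simple way to act on this observation is to count k-mers across all reads and treat rare k-mers as likely errors. The sketch below is an assumption-laden illustration rather than the EULER procedure itself: the names `kmer_counts`, `correct_read`, `correct_reads` and the `min_count` threshold are hypothetical, and it only tries a single base substitution per read.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))

def correct_read(read, solid, k):
    """Try one substitution that makes every k-mer of the read 'solid'."""
    def n_weak(s):
        return sum(s[i:i + k] not in solid for i in range(len(s) - k + 1))
    if n_weak(read) == 0:
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base != read[pos]:
                candidate = read[:pos] + base + read[pos + 1:]
                if n_weak(candidate) == 0:
                    return candidate
    return read  # no single substitution fixes it; leave unchanged

def correct_reads(reads, k, min_count=2):
    """k-mers seen at least min_count times are trusted ('solid')."""
    counts = kmer_counts(reads, k)
    solid = {km for km, c in counts.items() if c >= min_count}
    return [correct_read(r, solid, k) for r in reads]

reads = ["ATGCGTAC"] * 3 + ["ATGCCTAC"]  # last toy read has one wrong base
print(correct_reads(reads, k=4))
```

The erroneous base produces several k-mers that appear only once, while the correct k-mers are supported by the other reads, so the single substitution that removes all rare k-mers restores the read.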
280. • Graph theory has a wide range of applications throughout
bioinformatics, including sequencing, motif finding, protein
networks, and many more.
Graph Theory in Bioinformatics
281. • Simons, Robert W. Advanced Molecular Genetics Course,
UCLA (2002).
http://www.mimg.ucla.edu/bobs/C159/Presentations/Benzer.pdf
• Batzoglou, S. Computational Genomics Course, Stanford
University (2004).
http://www.stanford.edu/class/cs262/handouts.html
References