This project developed a new algorithm to infer phylogenetic trees from morphological data involving inapplicable characters. The algorithm consists of a downpass and uppass to assign values to internal nodes and optimize the tree locally at each node. It finds the optimal tree according to parsimony principles. While tested successfully on sample trees, further work is needed to prove optimality for all trees and extend the approach to multi-character data.
PMC Poster - phylogenetic algorithm for morphological data
References

[1] Felsenstein, J. (2004). Inferring Phylogenies.
[2] Brazeau, M. D. (2011). Problematic character coding methods in morphology and their effect. Biological Journal of the Linnean Society, 103, 489-498.
[3] Swofford, D. L. and Maddison, W. P. (1987). Reconstructing ancestral character states under Wagner parsimony. Mathematical Biosciences, 87, 199-229.
[4] De Laet, J. E. (2005). Parsimony and the problem of inapplicables in sequence data. In: Albert, V. A. (ed.), Parsimony, Phylogeny and Genomics.
[5] Budd, G. E. (2002). A paleontological solution to the arthropod head problem. Nature, 417, 271-275.
Background

The literature contains many algorithms for inferring phylogenies [1], often based on the principle of parsimony: the most likely scenario is the one requiring the fewest evolutionary changes. An example is the Fitch algorithm, suited to non-additive data. However, in a morphological study of arthropod appendages we deal with additive data in the form of a character describing the number of appendages. We would also like to include other data, such as appendage colour or shape, but these are dependent on the presence of a certain appendage. This contingency leads to complications involving inapplicable data [2].
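For the non-additive case, the Fitch downpass mentioned above fits in a few lines. The following is an illustrative Python sketch (the poster's own implementation was in R and C); the tree representation and the function name are assumptions made for the example:

```python
# Minimal sketch of the Fitch downpass for a single non-additive character.
# A tree is a nested pair of subtrees; a leaf is a set of observed states.
# Returns (state set at the node, number of changes implied below it).

def fitch_downpass(tree):
    if isinstance(tree, set):                 # leaf: observed state set
        return tree, 0
    left, right = tree
    ls, lc = fitch_downpass(left)
    rs, rc = fitch_downpass(right)
    common = ls & rs
    if common:                                # intersection: no extra change
        return common, lc + rc
    return ls | rs, lc + rc + 1               # union: count one change

# Example: tree ((A,B),(C,D)) with states 0,0,1,1 implies a single change.
states, changes = fitch_downpass((({0}, {0}), ({1}, {1})))
print(states, changes)  # -> {0, 1} 1
```

The key property illustrated here is that only the union/intersection rule at each internal node is needed to count changes; assigning definite ancestral states requires a subsequent uppass.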
Method

We split the problem into two parts: (1) reconstruct the tree with the number of appendages as a single character, and (2) apply an algorithm adapted from the Fitch algorithm to the full data set, whereby only evolutionary changes in present states are taken into account. The first part relies on Wagner parsimony, and algorithms to find optimal trees were proposed decades ago [3]. However, these existing algorithms are either too general or fail to maximize the number of homologies, as required by the Hennig-Farris principle [4]. Hence we have sought a novel algorithm suited to the particular problem studied.
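For the additive (Wagner) case that part (1) relies on, ancestral states are conventionally tracked as intervals of counts rather than sets; see Swofford and Maddison [3]. A minimal Python sketch of the downpass, under the assumption of a binary tree with integer appendage counts at the leaves:

```python
# Sketch of a Wagner (additive) downpass using state intervals.
# A leaf is an integer count; an internal node receives the intersection
# of its children's intervals, or, if they are disjoint, the gap between
# them, whose length is added to the tree length.

def wagner_downpass(tree):
    if isinstance(tree, int):                 # leaf: e.g. number of appendages
        return (tree, tree), 0
    left, right = tree
    (la, lb), lc = wagner_downpass(left)
    (ra, rb), rc = wagner_downpass(right)
    lo, hi = max(la, ra), min(lb, rb)
    if lo <= hi:                              # intervals overlap: no cost
        return (lo, hi), lc + rc
    gap = (min(lb, rb), max(la, ra))          # disjoint: span the gap
    return gap, lc + rc + (gap[1] - gap[0])

# Example: taxa with 0, 1, 3, 3 appendages on the tree ((0,1),(3,3)).
interval, length = wagner_downpass(((0, 1), (3, 3)))
print(interval, length)  # -> (1, 3) 3
```

This counts 0→1 as one step and 1→3 as two, reflecting the ordered nature of an additive character; it does not yet incorporate the homology-maximizing refinement the poster proposes.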
Project Description

Historical biology is primarily concerned with the inference of evolutionary history and relationships. Whereas molecular data can be readily interrogated under a range of models, the reconstruction of phylogenetic trees from morphological data (the mainstay of palaeontologists and conventional taxonomists) has outstanding problems. This project's objective was to adapt existing models of morphological evolution to incorporate peculiarities of morphological data sets.
Contact information: (1) yiteng@live.nl, (2) ms609@cam.ac.uk
Funded by the Post Master Consultancy Scheme
A typical problem that involves morphological data is the arthropod head problem. Arthropods form a phylum which includes the hexapods, crustaceans, myriapods, chelicerates and trilobites. Their heads have complex structures, including one or more pairs of antennae. The evolutionary relationships between these appendages (whether some antennae have the same origin as others) are an ongoing topic of debate among zoologists.

[Figure: schematic representation of the head structures of five groups of arthropods]

Results

A new algorithm has been proposed for inferring the tree in the first part of our approach. It consists of a downpass from the tips to the root and an uppass from the root upwards, assigning definite values to all internal nodes and tips (see flowchart). The algorithm was devised by examining trees locally, considering an internal node together with its immediate ancestor and descendants. By treating the possible data configurations at these nodes exhaustively and optimizing the internal node in each case, we obtained the algorithm displayed to the right.

The algorithm has been implemented in R and C and has been tested on a range of sample trees. In every case the tree found coincides with the best tree obtained by parsimony and the Hennig-Farris principle (example: top-right figure).

[Figures: the algorithm applied to an arbitrary tree (above); flowchart of the algorithm (left); example phylogenetic tree, from [5] (below)]
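The downpass/uppass pattern described in the Results can be illustrated with the classical two-pass scheme for a non-additive character. This sketch is not the poster's novel algorithm; in particular, the simple `min` tiebreak below stands in for the full final-state rules:

```python
# Generic two-pass assignment: the downpass annotates each node with a
# preliminary state set; the uppass then fixes one definite state per
# node, preferring the state chosen for its parent.

def annotate(tree):
    """Downpass: return (state_set, annotated_children_or_None)."""
    if isinstance(tree, set):                 # leaf
        return (tree, None)
    kids = [annotate(k) for k in tree]
    a, b = kids[0][0], kids[1][0]
    inter = a & b
    return (inter if inter else a | b, kids)

def uppass(node, parent_state=None):
    """Uppass: pick a definite state, inheriting the parent's if allowed."""
    states, kids = node
    chosen = parent_state if parent_state in states else min(states)
    if kids is None:                          # leaf: just the chosen state
        return chosen
    return (chosen, tuple(uppass(k, chosen) for k in kids))

# Example: every node of (({0},{0}),({1},{1})) receives a definite state.
final = uppass(annotate((({0}, {0}), ({1}, {1}))))
print(final)  # -> (0, ((0, (0, 0)), (1, (1, 1))))
```

As in the poster's algorithm, the uppass is where definite values reach every internal node; proving that such locally optimal assignments yield a globally optimal tree is the open question noted in the Conclusion.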
Conclusion

The algorithm is a proposed solution to the first part of the problem. Each part of the tree is optimized locally during the uppass, but it remains to prove that the resulting full tree is optimal. Furthermore, ongoing work has produced an algorithm for the second part of the problem for single-character trees. An extension to multi-character trees seems possible, but has not yet been realized.