Presentation of the software package RPhenoscape for the R platform for statistical computing. The package bridges between the ecosystem of packages for comparative phylogenetics in R and the data content and computational semantics services provided by the API of the Phenoscape Knowledgebase. Presented at the 2016 Evolution Meetings in Austin, TX.
RPhenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics
1. RPhenoscape:
Connecting the semantics of
evolutionary morphology to
comparative phylogenetics
Hilmar Lapp (Duke University)
Hong Xu (Duke University)
Jim Balhoff (RTI, Inc.)
Evolution Meetings 2016, Austin, TX
2. RPhenoscape
• A package for accessing the
Phenoscape Knowledgebase from
within R programs
• Programmatic access to:
• Evolutionary character data with
computable semantics
• Machine-reasoning with
computable phenotype data
3. R features a rich ecosystem
for comparative phylogenetics
CRAN Task View on
Phylogenetics and
Comparative Methods at last
count lists 76 packages.
4. Comparative analysis needs comparative data
Magee et al (2014), PLOS ONE
See also Drew et al (2013), PLOS Biology; Stoltzfus et al (2012), BMC Research Notes
5. The lack of reusable digital data
is amplified for morphology
Which matrix are we criticizing?
RC07 published their character list (their appendix 2) and their
matrix (appendix 3). We also have a hitherto unpublished
NEXUS file (presented here as part of Data S3), most likely
sent by [M.R.] to [D.G.] in late 2007 or early 2008, which
purports to contain the same matrix. Surprisingly, the
character list in the paper and that in the file do not agree
on the identities of characters 132–134.
Marjanović and Laurin (2015) Reevaluation of the largest published morphological data matrix for
phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 3:e1596v1
14. RPhenoscape
An R package for API access to the Phenoscape
Knowledgebase
• Evolutionary character data with computable
semantics
• Machine-reasoning with computable
phenotype data:
• Synthetic supermatrix synthesis
• Semantics-based character and state filtering
• Semantic similarity-driven querying and
synthesis
15. Use-case: Querying studies by
morphology and taxonomy
> slist <- pk_get_study_list(taxon = "Ictaluridae",
entity = "pectoral fin")
> slist[,"label"]
Source: local data frame [10 x 1]
label
<chr>
1 Bockmann, F. A. (1998)
2 Chen, X. (1994)
3 De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)
4 Fink, S. V, & Fink, W. L. (1981); Fink, S. V, & Fink, W. L. (1996)
5 Kailola, P. J. (2004)
6 Lundberg, J. G. (1992)
7 Mo, T. (1991)
8 Royero, R. (1999)
9 Vigliotta, T. R. (2008)
10 Wiley, E.O., and Johnson, G.D. (2010)
>
16. Use-case: Querying studies by
morphology and taxonomy
> nex_list <- pk_get_study_xml(as.matrix(slist[2:3, c("id")]))
....This might take a while....
https://scholar.google.com/scholar?q=hylogenetic+studies+of+the+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42
Parse NeXML....
http://dx.doi.org/10.1111/j.1096-3642.2007.00306.x
Parse NeXML....
> nex_list[[1]]
A nexml object representing:
0 phylogenetic tree blocks, where:
block 1 contains NULL phylogenetic trees
block 0 contains phylogenetic trees
155 meta elements
1 character matrices
53 taxonomic units
Taxa: Pseudobagarius leucorhynchus, Liobagrus obesus,
Hypsidoris farsonensis, Erethistes sp. (Chen 1994), Bunocephalus
amaurus, Xyliphius sp. (Chen 1994) ...
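The as.matrix(slist[2:3, c("id")]) idiom above can be tried without network access. The sketch below uses a mock study list in base R; the "study:" ids are hypothetical placeholders, not real Phenoscape identifiers, and the real slist is whatever pk_get_study_list() returns. Presumably the slide wraps the subset in as.matrix() because the study list is a data frame and the id column is kept as a one-column table rather than a plain vector.

```r
# Mock study list standing in for the result of pk_get_study_list().
# The "study:" ids are hypothetical placeholders.
slist <- data.frame(
  id = c("study:1", "study:2", "study:3"),
  label = c("Bockmann, F. A. (1998)", "Chen, X. (1994)",
            "De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)"),
  stringsAsFactors = FALSE
)

# Keep rows 2 and 3 of the id column as a one-column data frame, then
# convert it to a character matrix, mirroring the call on the slide.
ids <- as.matrix(slist[2:3, "id", drop = FALSE])

dim(ids)        # two rows, one column
as.vector(ids)  # the two selected ids as a plain character vector
```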
19. Use-case: Filter matrix using
semantics
> is_desc <- pk_is_descendant('Ictalurus', m$taxa)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
## [12] TRUE TRUE TRUE TRUE
# pk_is_ancestor() also available (and pk_is_extinct())
#
# This is in development for characters, too:
# pk_is_descendant('jaw skeleton', m$chars,
#   relationships = 'part of')
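A logical vector like is_desc is typically used to subset the rows of a character matrix. The following is a minimal base-R sketch under stated assumptions: the matrix, its character names and its state codes are made up for illustration, is_desc is written by hand rather than obtained from pk_is_descendant(), and the real structure of m in the package may differ.

```r
# Mock character matrix: rows are taxa, columns are characters.
# State codes and character names are invented for illustration.
taxa <- c("Ameiurus natalis", "Noturus flavus",
          "Ictalurus punctatus", "Ictalurus furcatus")
states <- matrix(
  c("0", "1", "1", "0",   # char_1, one state per taxon
    "1", "1", "0", "?"),  # char_2, one state per taxon
  nrow = 4,
  dimnames = list(taxa, c("char_1", "char_2"))
)

# Hand-written stand-in for the result of
# pk_is_descendant('Ictalurus', taxa): TRUE for taxa within Ictalurus.
is_desc <- c(FALSE, FALSE, TRUE, TRUE)

# Keep only the rows flagged TRUE; drop = FALSE preserves the matrix
# shape even if a single row remains.
filtered <- states[is_desc, , drop = FALSE]
rownames(filtered)
```

The same pattern would apply to columns once descendant tests for characters are available: index the second dimension with the logical vector instead of the first.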
20. Current limitations
Package is not on CRAN (yet), need to install from
Github:
library("devtools")
install_github("xu-hong/rphenoscape")
Data in Phenoscape concentrated on vertebrates, and
skeletal fin-limb characters.
Semantics-driven matrix synthesis currently limited to
presence-absence characters.
21. Summary
Phenoscape KB has an API for machine access to
computable morphology data and computational
semantics services.
RPhenoscape is a bridge between this API and the
ecosystem of comparative phylogenetics packages in R.
Translates between R user (who uses labels, data
matrices) and Phenoscape KB API (which uses identifiers,
ontology terms, NeXML, etc).
Code on Github: https://github.com/xu-hong/rphenoscape
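The translation between user-facing labels and machine identifiers mentioned in the summary can be pictured as a two-way lookup. This is only an illustrative sketch: the package resolves terms through the Phenoscape KB API, whereas the local table, the function names label_to_id() and id_to_label(), and the "EX:" identifiers below are hypothetical placeholders.

```r
# Hypothetical label-to-identifier table; the "EX:" ids are placeholders,
# not real ontology identifiers.
term_ids <- c(
  "pectoral fin" = "EX:0000001",
  "jaw skeleton" = "EX:0000002"
)

# Translate user-facing labels into identifiers; unknown labels become NA.
label_to_id <- function(labels, lookup = term_ids) {
  unname(lookup[labels])
}

# Translate identifiers back into labels for display.
id_to_label <- function(ids, lookup = term_ids) {
  names(lookup)[match(ids, lookup)]
}

label_to_id(c("pectoral fin", "dorsal fin"))
id_to_label("EX:0000002")
```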
23. Acknowledgements
U.S. National Science Foundation
DBI-1062404, DBI-1062542
National Evolutionary Synthesis Center
(NESCent), NSF #EF-0905606
Phenoscape contributors, Advisory Board, Data
sources (see: http://phenoscape.org/wiki/
Acknowledgments)
RNeXML developers (C. Boettiger, S. Chamberlain)
http://github.com/rOpenSci/RNeXML
24. Get in touch
Repo: github.com/xu-hong/rphenoscape
Github: github.com/hlapp
Twitter: @hlapp