Inferring networks from multiple samples with consensus LASSO (tuxette)
The document discusses network inference from gene expression data. It provides background on DNA, transcription, and gene expression. Gene expression data from microarrays contains measurements of thousands of genes across multiple samples. The goal is to infer a gene network or graph with nodes as genes and edges as strong links between gene expressions. Graphical Gaussian models (GGMs) are commonly used, where the concentration matrix encodes conditional independence relationships between genes. Several approaches are discussed for estimating the concentration matrix from data, including graphical lasso methods that promote sparse solutions.
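As a small illustration of how a concentration matrix encodes the graph in a GGM, the sketch below converts a concentration (precision) matrix into partial correlations and reads off the edges. The 3-gene matrix is a made-up toy example, and the function names and the threshold are assumptions for illustration only; they are not taken from the slides. In practice the matrix would be estimated from data, for example with a graphical lasso.

```python
import math

def partial_correlations(K):
    """Partial correlations from a concentration matrix K:
    rho_ij = -K_ij / sqrt(K_ii * K_jj) for i != j."""
    p = len(K)
    return [
        [1.0 if i == j else -K[i][j] / math.sqrt(K[i][i] * K[j][j])
         for j in range(p)]
        for i in range(p)
    ]

def edges(K, threshold=1e-8):
    """Pairs (i, j), i < j, whose partial correlation is non-zero.
    A zero entry K_ij means genes i and j are conditionally independent
    given all the others, so no edge is drawn between them."""
    rho = partial_correlations(K)
    p = len(K)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(rho[i][j]) > threshold]

# Hypothetical concentration matrix for a chain gene0 - gene1 - gene2:
# gene0 and gene2 are linked only through gene1, so K[0][2] == 0.
K = [
    [1.25, -0.5, 0.0],
    [-0.5, 1.25, -0.5],
    [0.0, -0.5, 1.0],
]

print(edges(K))
```

Genes 0 and 2 would still be marginally correlated in such a chain; the zero in the concentration matrix is what separates a direct link from an indirect one.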
Inferring networks from multiple samples with consensus LASSO (tuxette)
This document provides an overview of biological concepts and network inference methods. It discusses DNA, transcription, gene expression, and how transcriptomic data is obtained. Gene networks can be inferred from expression data using correlations or partial correlations between genes. Network inference focuses on direct relationships between genes and can identify interactions for previously unannotated genes.
Inferring networks from multiple samples with consensus LASSO (tuxette)
This document provides a short overview of network inference using graphical Gaussian models (GGMs). It discusses inferring networks from multiple samples, with the motivation being to identify genes that are linked independently or depending on different conditions. A naive approach of performing independent estimations on each sample is described. Joint network inference using the consensus LASSO method is then introduced to better identify common and condition-specific network structures across multiple related samples.
Meren's pirate presentation at the STAMPS course, covering the basic concepts most binning algorithms use to group contigs into genome bins: sequence composition and differential coverage.
The document summarizes several bio-inspired algorithms including CLONALG, aiNet, ABNET, and Opt-aiNet. CLONALG is a clonal selection algorithm inspired by immune system principles of clonal selection, hypermutation, and affinity maturation. aiNet is an artificial immune network model that uses principles of clonal selection, affinity maturation, and network suppression to perform unsupervised learning and clustering. ABNET is an antibody network based on a feedforward neural network trained with immune system concepts. Opt-aiNet adapts the aiNet model for optimization problems by introducing dynamic population sizing, mutation proportional to fitness, and automatic stopping criteria.
Lecture given for the Data Mining course at Uppsala university in October 2013. The presentation talks about data analysis in the context of genomics, next-generation sequencing, metagenomics etc.
MultiAgent artificial immune system for network intrusion detection (Aboul Ella Hassanien)
This thesis implements a multi-agent anomaly network intrusion detection system inspired by biological immunity to detect and classify network attacks. It proposes five approaches, including using a genetic algorithm to generate anomaly detectors, discretizing continuous features to create homogeneity between different feature types, and applying feature selection techniques. The approaches are evaluated on datasets like NSL-KDD to generate detectors for identifying anomalous network connections using measures like Euclidean, Minkowski, and Hamming distance. While initial results are promising, further work is needed to optimize feature selection and evaluate the approaches on additional datasets and attack types.
The document discusses using artificial immune systems for computer security. It introduces the human immune system and how artificial immune systems (AIS) are inspired by it. Several AIS models are described for detecting viruses, including negative selection, partial matching rules, anomaly detection, and self/non-self models. The BAM (B-cell Algorithmic Model) is presented as a way to detect viral code by comparing patterns to known legal and viral codes using a correlation matrix. The document concludes the BAM model provides an effective way to detect viruses and errors using artificial immune system concepts.
The document discusses exome sequencing and compares the performance of the xGen Exome Research Panel to other commercial exome sequencing panels. Key points:
1) An independent study directly compared the xGen panel to 3 other commercial exome panels and found that the xGen panel had a higher on-target rate and more uniform coverage than the other panels.
2) When deeply sequenced, the xGen panel was able to achieve greater than 20x coverage of over 94% of bases in the RefSeq database with only 40 million reads, which is 2.5-4 times fewer reads than the other panels tested.
3) The coverage profile produced by the xGen panel more closely resembled whole genome sequencing
A FRIENDLY APPROACH TO PARTICLE FILTERS IN COMPUTER VISION (Marcos Nieto)
This is a friendly approach to particle filters. Some hints, examples, and good practices to be able to successfully apply particle filters to solve your computer vision pro
This document discusses the challenges of analyzing large datasets from metagenomic shotgun sequencing experiments. It notes that while sequencing costs have decreased significantly, the computational analysis of the massive amounts of data generated still poses major challenges. It introduces the concept of "digital normalization" as an approach to reduce dataset sizes while retaining most of the biological information by removing redundant reads. The document advocates for making analysis tools and datasets openly accessible to help advance understanding of microbial communities from metagenomics studies.
Digital normalization provides a computationally efficient way to filter high-coverage reads from shotgun sequencing data, reducing data size while retaining most of the information needed for downstream analysis. It works by estimating the coverage of each read without using a reference genome and discarding reads above a given coverage cutoff. The method has been shown to significantly decrease memory requirements for de novo assembly of various data types like metagenomes and transcriptomes, while producing similar or improved assembly results. Future work includes developing reference-free methods for analyzing sequencing data in a streaming fashion before or without assembly.
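The filtering step described above can be sketched as follows. This is a minimal illustration, assuming the median k-mer count of a read is used as its reference-free coverage estimate and that counts are kept in an exact in-memory table; the function names, `k`, and `cutoff` values are illustrative assumptions, and a real implementation would typically use a memory-efficient probabilistic counter and handle reverse complements.

```python
from collections import Counter
from statistics import median

def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only if its estimated coverage is below the cutoff.

    Coverage is estimated without a reference genome as the median
    count of the read's k-mers among the reads kept so far.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no estimate possible
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads update the counts
    return kept

# Toy input: one sequence seen five times and one seen once.
reads = ["ACGTTGCA"] * 5 + ["GGATCCTA"]
kept = digital_normalization(reads, k=4, cutoff=2)
print(len(reads), "reads in,", len(kept), "reads kept")
```

With these toy reads, the redundant copies of the high-coverage sequence are dropped once its estimated coverage reaches the cutoff, while the rare read is retained.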
This document provides an overview of unsupervised machine learning techniques for clustering. It discusses different types of clustering including flat partitions, hierarchical trees, and hard vs soft memberships. Specific clustering algorithms are covered like K-means, hierarchical agglomerative clustering (HAC), DBSCAN, and graph-based clustering. Distance functions and linkage methods for HAC are also summarized. The document concludes with examples of applications for different clustering techniques.
This document summarizes a presentation on analyzing microbial communities using QIIME (Quantitative Insights Into Microbial Ecology). It discusses how to [1] summarize taxonomy from an OTU table, [2] calculate beta diversity using UniFrac to compare communities, and [3] visualize diversity through emperor plots and networks. Additional analysis techniques like sampling design and network analysis are also briefly covered.
2006: Artificial Immune Systems - The Past, The Present, And The Future? (Leandro de Castro)
The document provides an overview of artificial immune systems (AIS), including:
1) It discusses the history and development of AIS from the 1980s onward, including early works that drew inspiration from immunology and key conferences/publications.
2) It outlines the current state of AIS research, including new application areas, algorithmic improvements, and theoretical investigations into convergence and modeling.
3) It speculates on potential future directions for AIS, such as strengthening theoretical foundations, exploring innate immunity and danger theory models, and applying AIS to dynamic environments.
Interpretable Sparse Sliced Inverse Regression for digitized functional data (tuxette)
The document discusses interpretable sparse sliced inverse regression (IS-SIR) for functional data regression. It begins with background on using metamodels as proxies for computationally expensive agronomic models to understand relationships between climate inputs and plant outputs. SIR is presented as a semi-parametric regression technique that identifies relevant subspaces to predict outputs from functional inputs. The proposal involves combining SIR with automatic interval selection to point out interpretable predictor intervals. Simulations are discussed to evaluate the proposed method.
Visualizing and mining networks - Methods and examples in R (tuxette)
General assembly of PEPI IBIS, April 1, 2014
This talk introduces the notion of networks and the basic problems usually associated with them (visualization, finding important vertices, finding modules). The concepts are illustrated with examples using the R software on a real network.
Real Estate Customer Servicing GAP Analysis (Rahul Gaur)
“Customer is King”: The analysis was carried out on customer grievance data collected over a period of one year. The study helped bridge the gap between current and optimum customer servicing processes, which results in better customer satisfaction.
A Pecha Kucha style presentation submitted as part of my Master of Arts Design Management at Birmingham City University - Birmingham Institute of Art and Design
A Design Strategy Case Study of Fiskars Home - Orange-handled scissors.
Graph mining 2: Statistical approaches for graph mining (tuxette)
This document summarizes a talk on statistical approaches for graph mining. It introduces basic graph terminology and describes some standard global and local numerical characteristics for describing graph structure. These characteristics are calculated for a toy graph dataset and compared to random graph null models to identify which characteristics have unexpectedly high or low values compared to the random graphs. Clustering methods for graph mining are also outlined but not described in detail.
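While researching Japanese architecture, I ended up focusing on Japanese architectural space designers who were of personal interest to me. Rather than their basic biographies, I concentrated on their design galleries and museums to learn about their characteristics and philosophy. Based on what I learned, I attempted some 3D work of my own.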
This slide deck presents some of the insights gleaned from a data set in Ntrepid Corporation’s Timestream application that is an open-source collection of reported ISIS-linked activity in Yemen. The full case study is available: http://www.criticalthreats.org/yemen/exploring-isis-yemen-zimmerman-july-24-2015.
The Islamic State in Iraq and al Sham (ISIS) is attempting to expand its footprint in Yemen. ISIS declared an Islamic Caliphate on June 29, 2014, under the leadership of the new Caliph, ISIS leader Abu Bakr al Baghdadi. The return of the Caliphate under Baghdadi placed an obligation on all Muslims to pledge allegiance to him, according to ISIS. Al Qaeda broadly dismisses the legitimacy of the Caliphate under ISIS, and ISIS and al Qaeda are now in competition for the leadership of the global jihadist movement.
Al Qaeda’s Yemen-based affiliate, al Qaeda in the Arabian Peninsula (AQAP), dominates the jihadist fight in the country and it remains the greatest direct threat to the United States from the al Qaeda network. AQAP leadership reaffirmed its allegiance to al Qaeda leader Ayman al Zawahiri in November 2014, publicly rejecting the legitimacy of the Islamic Caliphate. The 2015 collapse of the central Yemeni state created opportunities for AQAP to exploit, and the group is expanding its presence in Yemen.
The initial reaction to ISIS in Yemen was muted, but the group has begun to make inroads as the conflict there protracts. ISIS began claiming regular attacks in Yemen as of March 2015 and now operates in at least eight Yemeni governorates.
Natural processes like weathering, erosion, deposition, landslides, volcanic eruptions, earthquakes and floods shape Earth's landforms and oceans in both constructive and destructive ways. Key ocean landforms include the continental shelf, slope, mid-ocean ridge, rift zone, trench, and ocean basin. Waves, currents, tides and storms continually change coastal features such as beaches, barrier islands, estuaries and inlets through erosion and deposition.
1) Ganesh D. Keskar designed a villa in Jaisalmer, Rajasthan with a ground floor plan that includes a porch, kitchen, guest bedroom, dining area, living area, TV room, common toilet, caretaker bedroom, and utility area.
2) The second floor plan includes a terrace, master bedroom, children's bedroom, courtyard, TV area, play area, balcony, and two bathrooms.
3) Elevations and section drawings along with the site plan were provided. Detailed drawings of the watchman cabin and compound wall were also included.
Building an Information Infrastructure to Support Microbial Metagenomic Sciences (Larry Smarr)
06.01.14
Presentation for the Microbe Project Interagency Team
Title: Building an Information Infrastructure to Support Microbial Metagenomic Sciences
La Jolla, CA
Assembly of metagenomes is challenging due to heterogeneous samples containing many different genomes that overlap. Specialized software like Genovo uses probabilistic models to discover likely sequence reconstructions from shotgun reads. While standard assemblers can be used by tweaking parameters, challenges remain due to repetitive elements and multiple species. Checking contigs for read depth, GC frequency, and tetranucleotide frequency can help identify dominant species.
Tom Delmont: From the Terragenome Project to Global Metagenomic Comparisons: ... (GigaScience, BGI Hong Kong)
This document discusses challenges in comparing metagenomic data from different environments and studies. It argues that when exploring a new environment, multiple methodological approaches should be used to capture natural and methodological variations. When performing global comparisons, methodological variations should be considered for all environments. Defining ecosystems precisely at the microorganism level is important. The author's vision is for projects like the Earth Microbiome Project to use flexible experimental designs informed by different experts to best represent microbial communities.
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ... (Larry Smarr)
06.09.15
Invited Talk
2006 Synthetic Biology Symposium
Aliso Creek Inn
Title: Building a Community Cyberinfrastructure to Support Marine Microbial Ecology Metagenomics
Laguna Beach, CA
The presentation opens with preliminary information about big data, mainly metagenomic data, and discusses the hurdles in analyzing it with conventional approaches. The later part briefly introduces machine learning approaches, with a biological example for each. The final part covers work focused on implementing a machine learning approach, Random Forest, for the functional annotation and taxonomic classification of metagenomic data.
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3 (Gianpaolo Coro)
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations. e-Infrastructures allow scientists residing at distant places to collaborate. They offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, e.g. data transfer, data harmonization, data processing workflows etc. e-Infrastructures are gaining an important place in the field of biodiversity conservation. Their computational capabilities help scientists to reuse models, obtain results in shorter time and share these results with other colleagues. They are also used to access several and heterogeneous biodiversity catalogues.
In this course, the D4Science e-Infrastructure will be used to conduct experiments in the field of biodiversity conservation. D4Science hosts models and contributions by several international organizations involved in the biodiversity conservation field. The course will give students an overview of the models, the practices and the methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, the course will introduce students to the basic concepts under e-Infrastructures, Virtual Research Environments, data sharing and experiments reproducibility.
GBIF BIFA mentoring, Day 4b Event core, July 2016 (Dag Endresen)
GBIF BIFA mentoring in Los Banos, Philippines for the South-East Asian ASEAN Biodiversity Heritage Parks. With Dr. Yu-Huang Wang, Dr. Po-Jen Chiang, and Guan-Shuo Mai from TaiBIF, the GBIF node of Taiwan (Chinese Taipei); and the Biodiversity Informatics team at ASEAN Centre For Biodiversity. http://www.gbif.no/events/2016/gbif-bifa-mentoring.html
The document summarizes a "Barcode Blitz" project between the CSIRO Ecosystem Sciences and the Biodiversity Institute of Ontario to rapidly barcode Australian Lepidoptera specimens. Over 10 weeks, they processed 28,000 specimens representing 8,000 species, taking leg samples for DNA barcoding and digitizing collection records. This created a comprehensive barcode reference library that has supported research into taxonomy, biodiversity, and biosecurity applications.
Zhang et al ecn 2016 building an accessible weevil tissue collection for geno... (taxonbytes)
Poster describing the origin and function of the ASUHIC Weevil Tissue Collection (WTC), see tinyurl.com/weeviltissuecollection; presented at the 2016 Entomological Collections Network Meeting, September 23, 2016, Orlando, Florida. ECN website: http://ecnweb.org/
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 5 (Gianpaolo Coro)
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations. e-Infrastructures allow scientists residing at distant places to collaborate. They offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, e.g. data transfer, data harmonization, data processing workflows etc. e-Infrastructures are gaining an important place in the field of biodiversity conservation. Their computational capabilities help scientists to reuse models, obtain results in shorter time and share these results with other colleagues. They are also used to access several and heterogeneous biodiversity catalogues.
In this course, the D4Science e-Infrastructure will be used to conduct experiments in the field of biodiversity conservation. D4Science hosts models and contributions by several international organizations involved in the biodiversity conservation field. The course will give students an overview of the models, the practices and the methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, the course will introduce students to the basic concepts under e-Infrastructures, Virtual Research Environments, data sharing and experiments reproducibility.
- Life sciences have specific computational challenges as data production from sequencing grows faster than Moore's law and there is a constant need to compare new data to existing data.
- France is developing complementary HPC and HTC infrastructures for life sciences, including the Institut Français de Bioinformatique, France Génomique, and E-Biothon - an HPC platform for research in life sciences.
- E-Biothon provides researchers with access to Blue Gene/P supercomputers and over 200TB of storage for large-scale genome assembly and comparative analyses in projects like studying synteny across microbial genomes and characterizing biodiversity.
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities (Franck Michel)
1. The document discusses different approaches to modeling biodiversity data as linked data, including representing taxa as classes or class instances. Representing them the same way across datasets maximizes interlinking but may limit reasoning abilities.
2. It also outlines the differences between modeling from a thesaurus perspective versus a biological perspective. A thesaurus focuses on describing individuals while an ontology can define classes through necessary and sufficient conditions.
3. While pragmatism in modeling linked data can increase interoperability now, it may reduce opportunities for reasoning and inference in the future as perspectives are not fully aligned. Choosing a clear modeling approach is important.
The Transformation of Systems Biology Into A Large Data Science (Robert Grossman)
Systems biology is becoming a data-intensive science due to the exponential growth of genomic and biological data. Large projects now produce petabytes of data that require new computational infrastructure to store, manage, and analyze. Cloud computing provides elastic resources that can scale to support the increasing data needs of systems biology. Case studies show how clouds are used for large-scale data integration and analysis, running combinatorial analysis over genomic marks, and enabling reanalysis of biological data through elastic virtual machines. The Open Cloud Consortium is working to provide open cloud resources for biological and biomedical research through testbeds and proposed bioclouds.
The document discusses the process of digitizing the collection at the Lyman Entomological Museum, including challenges faced and benefits gained. About 10% of the collection has been digitized so far. Digitization is a time-consuming process that involves verifying specimen identification, adding unique identifiers, georeferencing localities, and cleaning data. Making the data openly accessible online through databases creates opportunities for research.
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...Franck Michel
1. Taxonomic registers are important tools for biodiversity data but linking them as linked open data can be challenging.
2. This document presents TAXREF-LD, a model for representing the French taxonomic register TAXREF as linked data in a way that distinguishes nomenclatural and taxonomic information and can accommodate taxonomic changes.
3. TAXREF-LD was developed to be biologically relevant, compatible with semantic web standards, and able to link to other datasets. It represents TAXREF names and taxa as SKOS concepts and OWL classes with extensive external linking.
A presentation on the AusPlots program detailing it's aims and objectives, what and how data is collected, how it is delivered along with information on collaborations, data use, analysis and future opportunities
This document discusses a global study examining the impacts of storms on freshwater habitats and phytoplankton assemblages. It outlines the study's goals of analyzing data from over 30 lakes to identify how storms affect ecosystems, biodiversity, and community resilience. The study faces challenges in dealing with heterogeneous data from different disciplines and origins. It employs adapted team management, data compilation strategies, and analytical methods like meta-analysis and trait-based approaches to standardize data and facilitate comparisons across sites. Initial results are available on the study website.
Kernel methods for data integration in systems biology tuxette
This document provides an overview of a seminar presentation on kernel methods for data integration in systems biology. It begins with short biographies of the presenter, who is trained as a mathematician and statistician and applies their skills to research in human health and animal genomics using various omics data types. Examples are given of the presenter's past work inferring networks and integrating gene expression and lipid data, as well as expression and 3D DNA location data. The talk will discuss how to integrate multiple omics data from different sources and types using kernels. Kernels allow reducing high-dimensional data to similarity matrices and are not restricted to numeric data. They also allow embedding expert knowledge and provide a framework for statistical learning.
Comparative study of ensemble deep learning models to determine the classific...CSITiaesprime
Sea turtles are reptiles listed on the international union for conservation of nature (IUCN) red list of threatened species and the convention on international trade in endangered species of wild fauna and flora (CITES) Appendix I as species threatened with extinction. Sea turtles are nearly extinct due to natural predators and people who are frequently incorrect or even ignorant in determining which turtles should not be caught. The aim of this study was to develop a classification system to help classify sea turtle species. Therefore, the ensemble deep learning of convolutional neural network (CNN) method based on transfer learning is proposed for the classification of turtle species found in coastal communities. In this case, there are five well-known CNN models (VGG-16, ResNet-50, ResNet-152, Inception-V3, and DenseNet201). Among the five different models, the three most successful were selected for the ensemble method. The final result is obtained by combining the predictions of the CNN model with the ensemble method during the test. The evaluation result shows that the VGG16 - DenseNet201 ensemble is the best ensemble model, with accuracy, precision, recall, and F1-Score values of 0.74, 0.75, 0.74, and 0.76, respectively. This result also shows that this ensemble model outperforms the original model.
This document summarizes the goals and progress of the Open Tree of Life project, which aims to synthesize a complete draft tree of life using existing phylogenetic data. The project has collected phylogenetic data from over 7000 studies and stored it using graph databases. An open public interface allows users to browse, download and query the tree. The project is on track to release an initial draft tree in year 1 and refine it based on user feedback in year 2, while expanding collaborations and incentives for data contributors.
Web Apollo: Lessons learned from community-based biocuration efforts.Monica Munoz-Torres
This presentation tries to highlight the importance and relevance of community-based curation of biological data. It describes the results of harvesting expertise from dispersed researchers assigning functions to predicted and curated peptides, as well as collaborative efforts for standardization of genes and gene product attributes across species and databases.
The presentation provides overview and significance of the TERN long term ecological research network. The presentation was part of the Workshop on Approaches to Terrestrial Ecosystem Data Management : from collection to synthesis and beyond which was held on 9th of March 2016 in University of Queensland.
Racines en haut et feuilles en bas : les arbres en mathstuxette
1. The document discusses methods for clustering and differential analysis of Hi-C matrices, which represent the 3D organization of DNA.
2. It proposes extending Ward's hierarchical clustering to directly use Hi-C similarity matrices while enforcing adjacency constraints. A fast algorithm was also developed.
3. A new method called "treediff" was created to perform differential analysis of Hi-C matrices based on the Wasserstein distance between hierarchical clusterings. Software implementations of these methods were also developed.
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
The document discusses a presentation about multi-omics data integration methods using kernel methods. The presentation introduces kernel methods, how they can be used to integrate heterogeneous omics data, and examples of applications. Specifically, it discusses using kernel methods to perform unsupervised transformation-based integration of multi-omics data. It also presents an application of constrained kernel hierarchical clustering to analyze Hi-C data by directly using Hi-C matrices as kernels.
Méthodologies d'intégration de données omiquestuxette
This document presents a presentation on multi-omics data integration methods given by Nathalie Vialaneix on December 13, 2023. The presentation discusses different types of omics data that can be integrated, both vertically across different levels of omics data on the same samples and horizontally across similar types of omics data on different samples. It also discusses different analysis approaches that can be taken, including supervised and unsupervised methods. The rest of the presentation focuses on unsupervised transformation-based integration methods using kernels.
The document discusses current and future work on analyzing Hi-C data and differential analysis of Hi-C matrices. It describes a clustering method developed to partition chromosomes based on Hi-C matrix similarity. It also introduces a new method called treediff for differential analysis of Hi-C data that calculates the distance between hierarchical clusterings. Current work includes reviewing differential analysis methods, investigating differential subtrees with multiple testing control, and inferring chromatin interaction networks.
Can deep learning learn chromatin structure from sequence?tuxette
This document discusses a deep learning model called ORCA that can predict chromatin structure from DNA sequence. The model uses a neural network with an encoder to extract features from sequence and a decoder to predict Hi-C matrices. It was trained on Hi-C data from multiple cell types and can predict interactions between regions at various resolutions. The model accurately captures features like CTCF-mediated loops and can predict effects of structural variants on chromatin structure. It allows for in silico mutagenesis to study how mutations may alter 3D genome organization.
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
The document discusses multi-omics data integration methods, particularly kernel methods. It describes how kernel methods transform data into similarity matrices between samples rather than relying on variable space. Multiple kernel integration approaches are presented that combine multiple similarity matrices into a consensus kernel in an unsupervised manner, such as through a STATIS-like framework that maximizes the similarity between kernels. Examples of applications to datasets from the TARA Oceans expedition are given.
This document provides an overview of the MetaboWean and Idefics projects. MetaboWean aims to study the co-evolution of gut microbiota and epithelium during suckling-to-weaning transition in rabbits, using metabolomics, metagenomics, and single-cell RNA sequencing data. Idefics integrates multiple omics datasets from human skin samples to understand relationships between microorganisms and molecules and how they are structured in patient groups. The datasets include metagenomics, metabolomics, and proteomics from host and microbiota.
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
ASTERICS is an interactive and integrative data analysis tool for omics data. It uses Rserve and PyRserve with Flask and Vue.js in a Docker container to integrate omics data. The backend uses Rserve and PyRserve with Flask on the server side, while the frontend uses Vue.js. This architecture was chosen for its open source and light design. Data communication between Rserve and PyRserve is limited, requiring an object database. ASTERICS is deployed using three Docker containers for R, Python, and
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
This document summarizes a scientific presentation about molecular biology and omics data analysis. The presentation covers topics related to analyzing large omics datasets using methods like kernel methods, graphical models, and neural networks to learn gene regulation networks and predict phenotypes. Key challenges addressed are handling big data, missing values, non-Gaussian data types like counts and compositional data. The goal is to better understand complex biological systems from multi-omics data.
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
The document summarizes preliminary results from evaluating methods for inferring gene regulatory networks from expression data in Bacillus subtilis. It finds that recall of the known network is generally poor (<20% for random forest), but inferred clusters still retain biological information about common regulators. It plans to confirm results, test restricting edges to sigma factors, and explore other inference methods like Bayesian networks and ARACNE.
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
The document discusses methods for integrating multi-scale omics data using kernel and machine learning approaches. It describes how omics data is large, heterogeneous, and multi-scaled, creating bottlenecks for analysis. Methods discussed for data integration include multiple kernel learning to combine different relational datasets in an unsupervised way. The methods are applied to integrate different datasets from the TARA Oceans expedition to identify patterns in ocean microbial communities. Improving interpretability of the methods and making them more accessible to biological users is discussed.
Journal club: Validation of cluster analysis results on validation datatuxette
This document presents a framework for validating cluster analysis results on validation data. It describes situations where clustering is inferential versus descriptive and recommends using validation data separate from the data used for clustering. A typology of validation methods is provided, including validation based on the clustering method or results, and evaluation using internal validation, external validation, visual properties, or stability measures.
The document discusses the differences between overfitting and overparametrization in machine learning models. It explores how random forests may exhibit a phenomenon known as "double descent" where test error initially decreases then increases with more parameters before decreasing again. While double descent has been observed in other models, the document questions whether it is directly due to model complexity in random forests since very large trees may be unable to fully interpolate extremely large datasets.
Selective inference and single-cell differential analysistuxette
This document discusses selective inference and single-cell differential analysis. It introduces the problem of "double dipping" in the standard single-cell analysis pipeline where the same dataset is used for clustering and differential analysis. Two approaches for addressing this are presented: 1) A method that perturbs clusters before testing for differences, and 2) A test based on a truncated distribution that assumes clusters and genes are given separately. Experiments applying these methods to real single-cell datasets are described. The document outlines challenges in extending these approaches to more complex analyses.
SOMbrero : un package R pour les cartes auto-organisatricestuxette
SOMbrero is an R package that implements self-organizing map (SOM) algorithms. It can handle numeric, non-numeric, and relational data. The package contains functions for training SOMs, diagnosing results, and plotting maps. It also includes tools like a shiny app and vignettes to aid users without programming experience. SOMbrero supports missing data imputation and extends SOM to relational datasets through non-Euclidean distance measures.
Graph Neural Network for Phenotype Predictiontuxette
This document describes a study on using graph neural networks (GNNs) for phenotype prediction from gene expression data. The objectives are to determine if including network information can improve predictions, which network types work best, and if GNNs can learn network inferences. It provides background on GNNs and how they generalize convolutional layers to graph data. The authors implemented a GNN model from previous work as a starting point and tested it on different network types to see which network information is most useful for predictions. Their methodology involves comparing GNN performance to other methods like random forests using 10-fold cross validation.
A short and naive introduction to using network in prediction modelstuxette
The document provides an introduction to using network information in prediction models. It discusses representing a network as a graph with a Laplacian matrix. The Laplacian captures properties like random walks on the graph and heat diffusion. Eigenvectors of the Laplacian related to small eigenvalues are strongly tied to graph structure. The document discusses using the Laplacian in prediction models by working in the feature space defined by the Laplacian eigenvectors or directly regularizing a linear model with the Laplacian. This introduces network information and encourages similar contributions from connected nodes. The approaches are applied to problems like predicting phenotypes from gene expression using a known gene network.
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSSérgio Sacani
The pathway(s) to seeding the massive black holes (MBHs) that exist at the heart of galaxies in the present and distant Universe remains an unsolved problem. Here we categorise, describe and quantitatively discuss the formation pathways of both light and heavy seeds. We emphasise that the most recent computational models suggest that rather than a bimodal-like mass spectrum between light and heavy seeds with light at one end and heavy at the other that instead a continuum exists. Light seeds being more ubiquitous and the heavier seeds becoming less and less abundant due the rarer environmental conditions required for their formation. We therefore examine the different mechanisms that give rise to different seed mass spectrums. We show how and why the mechanisms that produce the heaviest seeds are also among the rarest events in the Universe and are hence extremely unlikely to be the seeds for the vast majority of the MBH population. We quantify, within the limits of the current large uncertainties in the seeding processes, the expected number densities of the seed mass spectrum. We argue that light seeds must be at least 103 to 105 times more numerous than heavy seeds to explain the MBH population as a whole. Based on our current understanding of the seed population this makes heavy seeds (Mseed > 103 M⊙) a significantly more likely pathway given that heavy seeds have an abundance pattern than is close to and likely in excess of 10−4 compared to light seeds. Finally, we examine the current state-of-the-art in numerical calculations and recent observations and plot a path forward for near-future advances in both domains.
PPT on Sustainable Land Management presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Signatures of wave erosion in Titan’s coastsSérgio Sacani
The shorelines of Titan’s hydrocarbon seas trace flooded erosional landforms such as river valleys; however, it isunclear whether coastal erosion has subsequently altered these shorelines. Spacecraft observations and theo-retical models suggest that wind may cause waves to form on Titan’s seas, potentially driving coastal erosion,but the observational evidence of waves is indirect, and the processes affecting shoreline evolution on Titanremain unknown. No widely accepted framework exists for using shoreline morphology to quantitatively dis-cern coastal erosion mechanisms, even on Earth, where the dominant mechanisms are known. We combinelandscape evolution models with measurements of shoreline shape on Earth to characterize how differentcoastal erosion mechanisms affect shoreline morphology. Applying this framework to Titan, we find that theshorelines of Titan’s seas are most consistent with flooded landscapes that subsequently have been eroded bywaves, rather than a uniform erosional process or no coastal erosion, particularly if wave growth saturates atfetch lengths of tens of kilometers.
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxgoluk9330
Ahota Beel, nestled in Sootea Biswanath Assam , is celebrated for its extraordinary diversity of bird species. This wetland sanctuary supports a myriad of avian residents and migrants alike. Visitors can admire the elegant flights of migratory species such as the Northern Pintail and Eurasian Wigeon, alongside resident birds including the Asian Openbill and Pheasant-tailed Jacana. With its tranquil scenery and varied habitats, Ahota Beel offers a perfect haven for birdwatchers to appreciate and study the vibrant birdlife that thrives in this natural refuge.
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...Advanced-Concepts-Team
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of Moon and artificial
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
1. Integrating Tara Oceans datasets using unsupervised multiple kernel learning
Nathalie Villa-Vialaneix
Joint work with Jérôme Mariette
http://www.nathalievilla.org
Probability and Statistics Seminar
Laboratoire J.A. Dieudonné, Université de Nice
Nathalie Villa-Vialaneix | Unsupervised multiple kernel learning 1/41
2. Outline
1 Metagenomic datasets and associated questions
2 A typical (and rich) case study: TARA Oceans datasets
3 A UMKL framework for integrating multiple metagenomic data
4 Application to TARA Oceans datasets
4. What are metagenomic data?
Source: [Sommer et al., 2010]
Abundance data: sparse n × p matrices of count data, with samples in rows and descriptors (species, OTUs, KEGG groups, k-mers, ...) in columns. Generally p ≫ n.
Phylogenetic tree (evolutionary history between species, OTUs, ...): one tree with p leaves, built from the sequences collected in the n samples.
7. What are metagenomic data used for?
They produce a profile of the diversity of a given sample, which makes it possible to compare diversity between conditions.
They are used in various fields: environmental science, microbiota studies, ...
They are processed by computing a relevant dissimilarity between samples (the standard Euclidean distance is not relevant for such data) and by using this dissimilarity in subsequent analyses.
9. β-diversity data: dissimilarities between count data
Compositional dissimilarities: $n_{ig}$ is the count of species $g$ in sample $i$.
Jaccard: the fraction of species specific to either sample $i$ or sample $j$:
$$d_{\mathrm{jac}} = \frac{\sum_g \left( \mathbb{I}_{\{n_{ig}>0,\, n_{jg}=0\}} + \mathbb{I}_{\{n_{jg}>0,\, n_{ig}=0\}} \right)}{\sum_g \mathbb{I}_{\{n_{ig}+n_{jg}>0\}}}$$
Bray-Curtis: the fraction of the sample that is specific to either sample $i$ or sample $j$:
$$d_{\mathrm{BC}} = \frac{\sum_g |n_{ig} - n_{jg}|}{\sum_g (n_{ig} + n_{jg})}$$
Other dissimilarities are available in the R package phyloseq, most of them not Euclidean.
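The two compositional dissimilarities can be illustrated with a minimal sketch in plain Python. The function names and the toy counts below are hypothetical (they do not come from the talk), and counts are assumed to be non-negative and listed in the same species order for both samples.

```python
def jaccard(ni, nj):
    """Jaccard dissimilarity between two count vectors (same species order)."""
    # Species present in exactly one of the two samples
    specific = sum(1 for a, b in zip(ni, nj) if (a > 0) != (b > 0))
    # Species present in at least one of the two samples
    present = sum(1 for a, b in zip(ni, nj) if a + b > 0)
    return specific / present if present else 0.0


def bray_curtis(ni, nj):
    """Bray-Curtis dissimilarity between two count vectors."""
    total = sum(a + b for a, b in zip(ni, nj))
    return sum(abs(a - b) for a, b in zip(ni, nj)) / total if total else 0.0


# Hypothetical counts for 5 species in two samples
sample_i = [10, 0, 3, 0, 7]
sample_j = [4, 2, 0, 0, 7]
d_jac = jaccard(sample_i, sample_j)     # 2 specific species out of 4 present
d_bc = bray_curtis(sample_i, sample_j)  # 11 differing counts out of 33
```

Jaccard only looks at presence or absence, whereas Bray-Curtis also accounts for how much the counts differ.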
11. β-diversity data: phylogenetic dissimilarities
Phylogenetic dissimilarities
For each branch $e$, denote by $l_e$ its length and by $p_{ei}$ the fraction of counts in sample $i$ corresponding to species below branch $e$.
UniFrac: the fraction of the tree specific to either sample $i$ or sample $j$:
$$d_{\mathrm{UF}} = \frac{\sum_e l_e \left( \mathbb{I}_{\{p_{ei}>0,\, p_{ej}=0\}} + \mathbb{I}_{\{p_{ej}>0,\, p_{ei}=0\}} \right)}{\sum_e l_e\, \mathbb{I}_{\{p_{ei}+p_{ej}>0\}}}$$
Weighted UniFrac: the fraction of the diversity specific to sample $i$ or to sample $j$:
$$d_{\mathrm{wUF}} = \frac{\sum_e l_e |p_{ei} - p_{ej}|}{\sum_e (p_{ei} + p_{ej})}$$
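As a rough illustration of the two phylogenetic dissimilarities, the sketch below encodes a tree as a list of branches, each given by its length and the set of leaves below it. This encoding, the function names and the toy tree are assumptions made for the example; the weighted version uses the denominator exactly as written in the formula above (without branch lengths).

```python
def branch_fractions(counts, branches):
    """p_e for one sample: share of its counts carried by the leaves below each branch."""
    total = sum(counts.values())
    return [sum(counts.get(s, 0) for s in leaves) / total for _, leaves in branches]


def unifrac(pi, pj, lengths):
    """Unweighted UniFrac: branch length seen by only one sample / length seen by either."""
    specific = sum(l for l, a, b in zip(lengths, pi, pj) if (a > 0) != (b > 0))
    covered = sum(l for l, a, b in zip(lengths, pi, pj) if a + b > 0)
    return specific / covered


def weighted_unifrac(pi, pj, lengths):
    """Weighted UniFrac with the denominator sum_e (p_ei + p_ej)."""
    num = sum(l * abs(a - b) for l, a, b in zip(lengths, pi, pj))
    den = sum(a + b for a, b in zip(pi, pj))
    return num / den


# Hypothetical tree ((A, B), C): (branch length, leaves below the branch)
branches = [(1.0, {"A"}), (1.0, {"B"}), (0.5, {"A", "B"}), (1.5, {"C"})]
lengths = [l for l, _ in branches]

p_i = branch_fractions({"A": 3, "B": 1}, branches)  # sample i has no C
p_j = branch_fractions({"B": 2, "C": 2}, branches)  # sample j has no A
d_uf = unifrac(p_i, p_j, lengths)
d_wuf = weighted_unifrac(p_i, p_j, lengths)
```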
15. TARA Oceans datasets
The 2009-2013 expedition
Co-directed by Étienne Bourgois and Éric Karsenti.
7,012 datasets collected from 35,000 samples of plankton and water (11,535 Gb of data).
Study the plankton: bacteria, protists, metazoans and viruses representing more than 90% of the biomass in the ocean.
16. TARA Oceans datasets
Science (May 2015) - Studies on:
eukaryotic plankton diversity [de Vargas et al., 2015],
ocean viral communities [Brum et al., 2015],
global plankton interactome [Lima-Mendez et al., 2015],
global ocean microbiome [Sunagawa et al., 2015],
...
→ datasets of different types and from different sources, analyzed separately.
17. Background of this talk
Objectives
Until now: many papers using many methods. No integrated analysis performed.
What do the datasets reveal if integrated in a single analysis?
Our purpose: develop a generic method to integrate phylogenetic, taxonomic and functional community composition together with environmental factors.
18. TARA Oceans datasets that we used
Datasets used
environmental dataset: 22 numeric features (temperature, salinity, ...) [Sunagawa et al., 2015].
bacteria phylogenomic tree: computed from ∼ 35,000 OTUs [Sunagawa et al., 2015].
bacteria functional composition: ∼ 63,000 KEGG orthologous groups [Sunagawa et al., 2015].
eukaryotic plankton composition, split into 4 groups: pico (0.8 − 5 µm), nano (5 − 20 µm), micro (20 − 180 µm) and meso (180 − 2000 µm) [de Vargas et al., 2015].
virus composition: ∼ 867 virus clusters based on shared gene content [Brum et al., 2015].
23. TARA Oceans datasets that we used
Common samples
48 samples,
2 depth layers: surface (SRF) and deep chlorophyll maximum (DCM),
31 different sampling stations.
25. Kernel methods
A kernel is viewed as the dot product in an implicit Hilbert space.
$K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ such that $K(x_i, x_j) = K(x_j, x_i)$ and, $\forall\, m \in \mathbb{N}$, $\forall\, x_1, \ldots, x_m \in \mathcal{X}$, $\forall\, \alpha_1, \ldots, \alpha_m \in \mathbb{R}$,
$$\sum_{i,j=1}^m \alpha_i \alpha_j K(x_i, x_j) \geq 0.$$
⇒ [Aronszajn, 1950]
$$\exists!\, (\mathcal{H}, \langle \cdot, \cdot \rangle),\ \phi : \mathcal{X} \to \mathcal{H} \ \text{such that}\ K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle.$$
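The two conditions in the kernel definition (symmetry and non-negative quadratic forms) can be checked numerically on a small example. The sketch below uses a Gaussian kernel on real numbers, which is one standard example of a kernel and is an assumption of this illustration, not a choice made in the talk; the random draws only probe the condition, they do not prove it.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on real numbers (illustrative choice of kernel)."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))


def quadratic_form(alpha, gram):
    """sum_{i,j} alpha_i alpha_j K(x_i, x_j) for a given Gram matrix."""
    m = len(alpha)
    return sum(alpha[i] * alpha[j] * gram[i][j] for i in range(m) for j in range(m))


random.seed(0)
points = [random.uniform(-3, 3) for _ in range(8)]
gram = [[gaussian_kernel(a, b) for b in points] for a in points]

# Condition 1: K(x_i, x_j) = K(x_j, x_i)
is_symmetric = all(
    gram[i][j] == gram[j][i] for i in range(len(points)) for j in range(len(points))
)

# Condition 2: the quadratic form is non-negative for random coefficient vectors
min_form = min(
    quadratic_form([random.gauss(0, 1) for _ in points], gram) for _ in range(200)
)
```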
27. Exploratory analysis with kernels
A well-known example: kernel PCA [Schölkopf et al., 1998]
PCA performed in the feature space induced by the kernel K.
In practice:
K is centered: $K \leftarrow K - \frac{1}{N} K I_N - \frac{1}{N} I_N K + \frac{1}{N^2} I_N K I_N$, where $I_N$ is the $N \times N$ matrix with all entries equal to 1;
K-PCA is performed by the eigen-decomposition of the (centered) K.
If $(\alpha_k)_{k=1,\ldots,N} \in \mathbb{R}^N$ and $(\lambda_k)_{k=1,\ldots,N}$ are the eigenvectors and eigenvalues, the PC axes are
$$a_k = \sum_{i=1}^N \alpha_{ki}\, \phi(x_i)$$
and the $(a_k)_k$ are orthonormal in the feature space induced by the kernel:
$$\forall\, k, k', \quad \langle a_k, a_{k'} \rangle = \alpha_k^\top K \alpha_{k'} = \delta_{kk'} \quad \text{with } \delta_{kk'} = \begin{cases} 0 & \text{if } k \neq k' \\ 1 & \text{otherwise.} \end{cases}$$
Coordinates of the projections of the observations $(\phi(x_i))_i$:
$$\langle a_k, \phi(x_i) \rangle = \sum_{j=1}^N \alpha_{kj} K_{ji} = K_{i.}\, \alpha_k = \lambda_k \alpha_{ki},$$
where $K_{i.}$ is the i-th row of K.
No representation for the variables (no real variables...).
Other unsupervised kernel methods: kernel SOM [Olteanu and Villa-Vialaneix, 2015, Mariette et al., 2017]
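The K-PCA steps can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used for the talk: the function names are hypothetical, the centering uses the standard double-centering of the kernel matrix, and the retained eigenvalues are assumed to be strictly positive. A linear kernel is used in the example so that the result can be compared with ordinary PCA.

```python
import numpy as np


def center_kernel(K):
    """Double-center a kernel matrix (centers the feature map in feature space)."""
    N = K.shape[0]
    ones = np.ones((N, N)) / N
    return K - ones @ K - K @ ones + ones @ K @ ones


def kernel_pca(K, n_components=2):
    """Return eigenvalues, scaled eigenvectors alpha, projections and centered kernel."""
    Kc = center_kernel(K)
    eigval, eigvec = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]   # keep the largest ones
    lam, vec = eigval[order], eigvec[:, order]
    # Scale so that alpha_k^T Kc alpha_k = 1 (assumes lam > 0 for retained axes)
    alpha = vec / np.sqrt(lam)
    # Projection of observation i on axis k: (Kc alpha_k)_i = lam_k * alpha_ki
    coords = Kc @ alpha
    return lam, alpha, coords, Kc


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # hypothetical numeric data
K = X @ X.T                    # linear kernel
lam, alpha, coords, Kc = kernel_pca(K, n_components=2)
```

With a linear kernel the coordinates coincide, up to sign, with the scores of a standard PCA on the centered data; with another kernel the same code yields a non-linear PCA.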
32. Usefulness of K-PCA
Non linear PCA
Source: By Petter Strandmark - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=3936753
Nathalie Villa-Vialaneix | Unsupervised multiple kernel learning 17/41
33. Usefulness of K-PCA
[Mariette et al., 2017] K-PCA for non-numeric datasets. Here, a qualitative time series: job trajectories after graduation, from the French survey “Generation 98” [Cottrell and Letrémy, 2005].
Color indicates the mode of each trajectory.
34. From multiple dissimilarities to multiple kernels
1. Several (non-Euclidean) dissimilarities $D^1, \ldots, D^M$ are transformed into similarities with [Lee and Verleysen, 2007]:
$$K^m(x_i, x_j) = -\frac{1}{2} \left( D^m(x_i, x_j) - \frac{1}{N} \sum_{k=1}^N D^m(x_i, x_k) - \frac{1}{N} \sum_{k=1}^N D^m(x_k, x_j) + \frac{1}{N^2} \sum_{k, k'=1}^N D^m(x_k, x_{k'}) \right)$$
2. If the result is not positive, clipping or flipping (removing the negative part of the eigenvalue decomposition, or taking its opposite) produces a kernel [Chen et al., 2009].
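A minimal sketch of the double-centering transformation from a dissimilarity matrix to a similarity matrix, written with separate row and column means. The function name `dissimilarity_to_similarity` is hypothetical, and the clipping or flipping step is not covered because it needs a full eigen-decomposition.

```python
def dissimilarity_to_similarity(D):
    """Double-center a square dissimilarity matrix (list of lists)."""
    n = len(D)
    row_means = [sum(row) / n for row in D]
    col_means = [sum(D[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [
            -0.5 * (D[i][j] - row_means[i] - col_means[j] + grand_mean)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy example (assumption): squared Euclidean distances between 1, 2 and 3.
points = [1.0, 2.0, 3.0]
D = [[(a - b) ** 2 for b in points] for a in points]
K = dissimilarity_to_similarity(D)
```

When D holds squared Euclidean distances, as in this toy example, the result is the Gram matrix of the centered points, so no clipping or flipping is needed.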
38. From multiple kernels to an integrated kernel
How to combine multiple kernels?
naive approach: $K^* = \frac{1}{M} \sum_m K^m$
supervised framework: $K^* = \sum_m \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]
unsupervised framework, but the input space is $\mathbb{R}^d$ [Zhuang et al., 2011]: $K^* = \sum_m \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to
minimize the distortion between all training data, $\sum_{ij} K^*(x_i, x_j) \|x_i - x_j\|^2$;
AND minimize the error made when approximating the original data by the kernel embedding, $\sum_i \left\| x_i - \sum_j K^*(x_i, x_j)\, x_j \right\|^2$.
Our proposal: 2 UMKL frameworks which do not require data to have values in $\mathbb{R}^d$.
41. STATIS like framework
[L’Hermier des Plantes, 1976, Lavit et al., 1994]
Similarities between kernels:
$$C_{mm'} = \frac{\langle K^m, K^{m'} \rangle_F}{\|K^m\|_F \, \|K^{m'}\|_F} = \frac{\mathrm{Trace}(K^m K^{m'})}{\sqrt{\mathrm{Trace}((K^m)^2)\, \mathrm{Trace}((K^{m'})^2)}}.$$
($C_{mm'}$ is an extension of the RV-coefficient [Robert and Escoufier, 1976] to the kernel framework)
$$\text{maximize}_v \; \sum_{m=1}^M \left\langle K^*(v), \frac{K^m}{\|K^m\|_F} \right\rangle_F = v^\top C v$$
for $K^*(v) = \sum_{m=1}^M v_m K^m$ and $v \in \mathbb{R}^M$ such that $\|v\|_2 = 1$.
Solution: first eigenvector of $C$ ⇒ set $\beta = \frac{v}{\sum_{m=1}^M v_m}$ (consensual kernel).
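The STATIS-like weights can be sketched as follows. This is a hedged illustration: the function names are hypothetical, kernels are plain lists of lists, and the first eigenvector of C is approximated by power iteration started from a uniform positive vector, which is a choice made here and not something specified on the slides.

```python
import math


def frobenius_inner(A, B):
    """Frobenius inner product; equals Trace(A B) for symmetric matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def kernel_similarity_matrix(kernels):
    """Matrix C of cosine similarities between kernels."""
    norms = [math.sqrt(frobenius_inner(K, K)) for K in kernels]
    M = len(kernels)
    return [
        [frobenius_inner(kernels[m], kernels[p]) / (norms[m] * norms[p]) for p in range(M)]
        for m in range(M)
    ]


def leading_eigenvector(C, n_iter=1000, tol=1e-13):
    """Power iteration for the first eigenvector of a symmetric matrix."""
    M = len(C)
    v = [1.0 / math.sqrt(M)] * M
    for _ in range(n_iter):
        w = [sum(C[m][p] * v[p] for p in range(M)) for m in range(M)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    return v


def statis_weights(kernels):
    """Weights beta = v / sum(v), with v the first eigenvector of C."""
    v = leading_eigenvector(kernel_similarity_matrix(kernels))
    total = sum(v)
    return [x / total for x in v]


# Toy example (assumption): two proportional kernels and the identity.
K1 = [[2.0, 1.0], [1.0, 2.0]]
K2 = [[6.0, 3.0], [3.0, 6.0]]
K3 = [[1.0, 0.0], [0.0, 1.0]]
C = kernel_similarity_matrix([K1, K2, K3])
beta = statis_weights([K1, K2, K3])
```

In this toy example the two proportional kernels receive equal weights, and the identity kernel, which is less similar to the other two, receives a slightly smaller one.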
43. A kernel preserving the original topology of the data I
From an idea similar to that of [Lin et al., 2010], find a kernel such that the local geometry of the data in the feature space is similar to that of the original data.
Proxy of the local geometry:
$K^m \longrightarrow \mathcal{G}^m_k$ (k-nearest neighbors graph) $\longrightarrow A^m_k$ (adjacency matrix)
$\Rightarrow W = \sum_m \mathbb{I}_{\{A^m_k > 0\}}$ or $W = \sum_m A^m_k$
Adjacency matrix image from: By S. Mohammad H. Oloomi, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=35313532
44. A kernel preserving the original topology of the data I
Feature space geometry measured by
$$\Delta_i(\beta) = \left\langle \phi^*_\beta(x_i), \begin{pmatrix} \phi^*_\beta(x_1) \\ \vdots \\ \phi^*_\beta(x_N) \end{pmatrix} \right\rangle = \begin{pmatrix} K^*_\beta(x_i, x_1) \\ \vdots \\ K^*_\beta(x_i, x_N) \end{pmatrix}$$
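A minimal sketch of the k-nearest-neighbor proxy, assuming that neighbors are ranked by the distance induced by each kernel, $d^2(x_i, x_j) = K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j)$, and that the adjacency matrices are binary and not symmetrized. The slides do not state these details, and the names `knn_adjacency` and `combined_graph` are hypothetical.

```python
def knn_adjacency(K, k):
    """Binary adjacency matrix of the k-nearest-neighbor graph of a kernel."""
    n = len(K)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        # Squared distance in the feature space induced by the kernel.
        distances = sorted(
            (K[i][i] + K[j][j] - 2 * K[i][j], j) for j in range(n) if j != i
        )
        for _, j in distances[:k]:
            A[i][j] = 1
    return A


def combined_graph(kernels, k):
    """W as the sum over kernels of the k-nearest-neighbor adjacency matrices."""
    n = len(kernels[0])
    W = [[0] * n for _ in range(n)]
    for K in kernels:
        A = knn_adjacency(K, k)
        for i in range(n):
            for j in range(n):
                W[i][j] += A[i][j]
    return W


# Toy example (assumption): points 0, 1 and 10 with the linear kernel.
points = [0.0, 1.0, 10.0]
K = [[a * b for b in points] for a in points]
A = knn_adjacency(K, 1)
W = combined_graph([K, K], 1)
```

Each entry of W counts in how many datasets j is among the k nearest neighbors of i, so pairs that are close in several datasets get a larger weight in the objective.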
46. A kernel preserving the original topology of the data II
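Sparse version
$$\text{minimize}_{\beta} \; \sum_{i,j=1}^N W_{ij} \left\| \Delta_i(\beta) - \Delta_j(\beta) \right\|^2$$
for $K^*_\beta = \sum_{m=1}^M \beta_m K^m$ and $\beta \in \mathbb{R}^M$ s.t. $\beta_m \geq 0$ and $\sum_{m=1}^M \beta_m = 1$.
Since $\Delta_i(\beta) = \sum_m \beta_m \Delta_i^m$, this is
$$\Leftrightarrow \text{minimize}_{\beta} \; \sum_{m,m'=1}^M \beta_m \beta_{m'} S_{mm'}$$
for $\beta \in \mathbb{R}^M$ such that $\beta_m \geq 0$ and $\sum_{m=1}^M \beta_m = 1$,
for $S_{mm'} = \sum_{i,j=1}^N W_{ij} \left\langle \Delta_i^m - \Delta_j^m, \Delta_i^{m'} - \Delta_j^{m'} \right\rangle$ and $\Delta_i^m = \begin{pmatrix} K^m(x_i, x_1) \\ \vdots \\ K^m(x_i, x_N) \end{pmatrix}$.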
47. A kernel preserving the original topology of the data II
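Non-sparse version
$$\text{minimize}_{v} \; \sum_{i,j=1}^N W_{ij} \left\| \Delta_i(v) - \Delta_j(v) \right\|^2$$
for $K^*_v = \sum_{m=1}^M v_m K^m$ and $v \in \mathbb{R}^M$ s.t. $v_m \geq 0$ and $\|v\|_2 = 1$.
Since $\Delta_i(v) = \sum_m v_m \Delta_i^m$, this is
$$\Leftrightarrow \text{minimize}_{v} \; \sum_{m,m'=1}^M v_m v_{m'} S_{mm'}$$
for $v \in \mathbb{R}^M$ such that $v_m \geq 0$ and $\|v\|_2 = 1$,
for $S_{mm'} = \sum_{i,j=1}^N W_{ij} \left\langle \Delta_i^m - \Delta_j^m, \Delta_i^{m'} - \Delta_j^{m'} \right\rangle$ and $\Delta_i^m = \begin{pmatrix} K^m(x_i, x_1) \\ \vdots \\ K^m(x_i, x_N) \end{pmatrix}$.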
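The reduction of the topology-preserving objective to a quadratic form can be checked numerically. The sketch below builds S from inner products between row differences of the kernels and compares the quadratic form with the objective computed directly on the combined kernel; all function names are hypothetical and the toy kernels and graph are assumptions.

```python
def row_difference(K, i, j):
    """Delta_i - Delta_j, where Delta_i is the i-th row of the kernel K."""
    return [a - b for a, b in zip(K[i], K[j])]


def topology_matrix(kernels, W):
    """S[m][p] = sum_ij W_ij <Delta_i^m - Delta_j^m, Delta_i^p - Delta_j^p>."""
    M, n = len(kernels), len(W)
    S = [[0.0] * M for _ in range(M)]
    for i in range(n):
        for j in range(n):
            if W[i][j] == 0:
                continue
            diffs = [row_difference(K, i, j) for K in kernels]
            for m in range(M):
                for p in range(M):
                    inner = sum(a * b for a, b in zip(diffs[m], diffs[p]))
                    S[m][p] += W[i][j] * inner
    return S


def quadratic_form(S, weights):
    M = len(S)
    return sum(weights[m] * S[m][p] * weights[p] for m in range(M) for p in range(M))


def direct_objective(kernels, W, weights):
    """sum_ij W_ij ||Delta_i - Delta_j||^2 computed on the combined kernel."""
    n = len(W)
    K_star = [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
    return sum(
        W[i][j] * sum(d * d for d in row_difference(K_star, i, j))
        for i in range(n)
        for j in range(n)
    )


# Toy example (assumption): two 2 x 2 kernels and a single symmetric edge.
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 1.0], [1.0, 2.0]]]
W = [[0, 1], [1, 0]]
weights = [0.25, 0.75]
S = topology_matrix(kernels, W)
```

Because the combined kernel is linear in the weights, `quadratic_form(S, weights)` and `direct_objective(kernels, W, weights)` agree for any weight vector, whichever constraint (simplex or unit norm) is then imposed.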
50. Optimization issues
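Sparse version writes $\min_\beta \beta^\top S \beta$ s.t. $\beta \geq 0$ and $\|\beta\|_1 = \sum_m \beta_m = 1$ ⇒ standard QP problem with linear constraints (e.g., package quadprog in R).
Non-sparse version writes $\min_\beta \beta^\top S \beta$ s.t. $\beta \geq 0$ and $\|\beta\|_2 = 1$ ⇒ QCQP (quadratically constrained quadratic programming) problem, hard to solve.
Equivalent to the following problem: $\min_{\beta, B} \mathrm{Trace}(S_2 X)$ s.t. $\mathrm{Trace}(AX) = 1$, $\mathrm{Trace}(A_j X) \geq 0$ and $B = \beta \beta^\top$, with:
$$X = \begin{pmatrix} 1 & \beta^\top \\ \beta & B \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 0_M^\top \\ 0_M & I_M \end{pmatrix}, \quad A_j = \begin{pmatrix} 0 & 1_j^\top \\ 1_j & 0_{MM} \end{pmatrix}.$$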
51. Optimization issues
The non-sparse problem is relaxed into the following problem: $\min_{\beta, B} \mathrm{Trace}(S_2 X)$ s.t. $\mathrm{Trace}(AX) = 1$ and $\mathrm{Trace}(A_j X) \geq 0$, where
$$X = \begin{pmatrix} 1 & \beta^\top \\ \beta & B \end{pmatrix} \text{ is positive semi-definite}, \quad A = \begin{pmatrix} 0 & 0_M^\top \\ 0_M & I_M \end{pmatrix}, \quad A_j = \begin{pmatrix} 0 & 1_j^\top \\ 1_j & 0_{MM} \end{pmatrix}.$$
Semi-definite programming ⇒ efficient solvers exist.
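For the sparse version, the slides point to a standard QP solver (quadprog in R). As a dependency-free illustration only, the sketch below swaps in projected gradient descent with a Euclidean projection onto the simplex; it assumes S is symmetric positive semi-definite, and the function names are hypothetical.

```python
import math


def project_to_simplex(v):
    """Euclidean projection onto {b : b >= 0, sum(b) = 1} (sort-based)."""
    theta = 0.0
    cumulative = 0.0
    for rank, value in enumerate(sorted(v, reverse=True), start=1):
        cumulative += value
        shift = (cumulative - 1.0) / rank
        if value - shift > 0:
            theta = shift
    return [max(x - theta, 0.0) for x in v]


def solve_sparse_weights(S, n_iter=2000):
    """Minimize b^T S b over the simplex by projected gradient descent."""
    M = len(S)
    beta = [1.0 / M] * M
    # 2 * ||S||_F bounds the Lipschitz constant of the gradient 2 S b.
    lipschitz = 2.0 * math.sqrt(sum(x * x for row in S for x in row))
    if lipschitz == 0.0:
        return beta
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = [2.0 * sum(S[m][p] * beta[p] for p in range(M)) for m in range(M)]
        beta = project_to_simplex([b - step * g for b, g in zip(beta, grad)])
    return beta


# Toy examples (assumptions): an interior solution and a sparse one.
beta_dense = solve_sparse_weights([[1.0, 0.0], [0.0, 2.0]])
beta_sparse = solve_sparse_weights([[1.0, 2.0], [2.0, 5.0]])
```

The second toy matrix has its constrained minimum at a vertex of the simplex, which shows how the sum-to-one constraint with non-negativity can set some weights exactly to zero.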
54. A proposal to improve interpretability of K-PCA in our framework
Issue: How to assess the importance of a given species in the K-PCA?
our datasets are either numeric (environmental) or built from an n × p count matrix
⇒ for a given species, randomly permute its counts and redo the analysis (kernel computation, with the same optimized weights, and K-PCA)
the influence of a given species in a given dataset on a given PC subspace is assessed by computing the Crone-Crosby distance between the two PCA subspaces, original and permuted [Crone and Crosby, 1995] (∼ Frobenius norm between the projectors)
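The subspace comparison can be sketched as the Frobenius norm of the difference between two orthogonal projectors. This is an illustration under assumptions: each subspace is given by an orthonormal basis expressed in a common finite-dimensional coordinate system, any normalizing constant of the Crone-Crosby distance is omitted, and the function names are hypothetical.

```python
import math


def projector(basis):
    """Orthogonal projector sum_k a_k a_k^T for an orthonormal basis (a_k)."""
    n = len(basis[0])
    return [[sum(a[i] * a[j] for a in basis) for j in range(n)] for i in range(n)]


def projector_distance(basis_a, basis_b):
    """Frobenius norm between the projectors onto two subspaces."""
    P, Q = projector(basis_a), projector(basis_b)
    return math.sqrt(
        sum((p - q) ** 2 for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q))
    )


# Toy examples (assumptions): lines in the plane.
same_line = projector_distance([[1.0, 0.0]], [[-1.0, 0.0]])
orthogonal_lines = projector_distance([[1.0, 0.0]], [[0.0, 1.0]])
```

Working with projectors makes the measure independent of the sign and ordering of the axes, so a permutation that only flips a PC axis yields a distance of zero.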
55. Outline
1 Metagenomic datasets and associated questions
2 A typical (and rich) case study: TARA Oceans datasets
3 A UMKL framework for integrating multiple metagenomic data
4 Application to TARA Oceans datasets
56. Integrating ’omics data using kernels
M TARA Oceans datasets $(x_i^m)_{i=1,\ldots,N,\; m=1,\ldots,M}$, measured on the same ocean samples $(1, \ldots, N)$, which take values in arbitrary spaces $(\mathcal{X}^m)_m$:
environmental dataset,
bacteria phylogenomic tree,
bacteria functional composition,
eukaryote pico-plankton composition,
. . .
virus composition.
57. Integrating ’omics data using kernels
Environmental dataset: standard Euclidean distance, induced by the linear kernel $K(x_i, x_j) = x_i^\top x_j$.
58. Integrating ’omics data using kernels
Bacteria phylogenomic tree: the weighted UniFrac distance, given by
$$d_{wUF}(x_i, x_j) = \frac{\sum_e l_e \, |p_{ei} - p_{ej}|}{\sum_e (p_{ei} + p_{ej})}.$$
59. Integrating ’omics data using kernels
All composition-based datasets (bacteria functional composition, eukaryote (pico, nano, micro, meso)-plankton composition and virus composition): the Bray-Curtis dissimilarity,
$$d_{BC}(x_i, x_j) = \frac{\sum_g |n_{ig} - n_{jg}|}{\sum_g (n_{ig} + n_{jg})},$$
where $n_{ig}$ is the abundance of gene $g$, summarized at the KEGG orthologous groups level, in sample $i$.
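The Bray-Curtis dissimilarity translates directly into code. A minimal sketch, assuming each sample is a list of non-negative abundances over the same groups; the function names are hypothetical, and returning 0 for two empty samples is a convention chosen here.

```python
def bray_curtis(counts_i, counts_j):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(counts_i, counts_j))
    denominator = sum(a + b for a, b in zip(counts_i, counts_j))
    if denominator == 0:
        return 0.0  # convention for two empty samples (assumption)
    return numerator / denominator


def bray_curtis_matrix(counts):
    """Pairwise dissimilarity matrix for a list of samples."""
    return [[bray_curtis(row_i, row_j) for row_j in counts] for row_i in counts]


# Toy example (assumption): three samples described by three groups.
samples = [[1, 2, 3], [3, 2, 1], [0, 0, 6]]
D = bray_curtis_matrix(samples)
```

The resulting matrix is a dissimilarity, not a kernel, so it still has to be turned into a similarity (double centering, then clipping or flipping if needed) before being combined with the other kernels.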
60. Integrating ’omics data using kernels
Combination of M kernels by a weighted sum
$$K^* = \sum_{m=1}^M \beta_m K^m,$$
where $\beta_m \geq 0$ and $\sum_{m=1}^M \beta_m = 1$.
61. Integrating ’omics data using kernels
Apply standard data mining methods
(clustering, linear model, PCA, . . . ) in the
feature space.
64. Correlation between kernels (STATIS)
Low correlations between the bacteria functional composition and the other datasets.
Strong correlation between environmental variables and small organisms (bacteria, eukaryote pico-plankton and virus).
65. Influence of k (number of neighbors) on $(\beta_m)_m$
k ≥ 5 provides stable results
68. $(\beta_m)_m$ values returned by graph-MKL
The dataset least correlated with the others, the bacteria functional composition, has the highest coefficient.
Three kernels have a weight equal to 0 (sparse version).
69. Proof of concept: using [Sunagawa et al., 2015]
Datasets
139 samples, 3 layers (SRF, DCM and MES)
kernels: phychem, pro-OTUs and pro-OGs
75. Proof of concept: using [Sunagawa et al., 2015]
Proteobacteria (clades SAR11 (Alphaproteobacteria) and SAR86) dominate the sampled areas of the ocean in terms of relative abundance and taxonomic richness.
79. Conclusion and perspectives
Summary
an integrative exploratory method
... particularly well suited for multiple metagenomic datasets
with enhanced interpretability
Perspectives
implement SDP solution and test it
improve biological interpretation
soon-to-be-released R package
81. References
Aronszajn, N. (1950).
Theory of reproducing kernels.
Transactions of the American Mathematical Society, 68(3):337–404.
Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J.,
Gorsky, G., Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S.,
Dimier, C., Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker,
P., Karsenti, E., and Sullivan, M. (2015).
Patterns and ecological drivers of ocean viral communities.
Science, 348(6237).
Chen, Y., Garcia, E., Gupta, M., Rahimi, A., and Cazzanti, L. (2009).
Similarity-based classification: concepts and algorithms.
Journal of Machine Learning Research, 10:747–776.
Cottrell, M. and Letrémy, P. (2005).
How to use the Kohonen algorithm to simultaneously analyse individuals in a survey.
Neurocomputing, 63:193–207.
Crone, L. and Crosby, D. (1995).
Statistical applications of a metric on subspaces to satellite meteorology.
Technometrics, 37(3):324–328.
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I.,
Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O.,
Guidi, L., Horák, A., Jaillon, O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F.,
Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C.,
Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S.,
Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and Karsenti, E. (2015).
Eukaryotic plankton diversity in the sunlit ocean.
Science, 348(6237).
Gönen, M. and Alpaydin, E. (2011).
Multiple kernel learning algorithms.
Journal of Machine Learning Research, 12:2211–2268.
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).
The ACT (STATIS method).
Computational Statistics and Data Analysis, 18(1):97–119.
Lee, J. and Verleysen, M. (2007).
Nonlinear Dimensionality Reduction.
Information Science and Statistics. Springer, New York; London.
L’Hermier des Plantes, H. (1976).
Structuration des tableaux à trois indices de la statistique.
PhD thesis, Université de Montpellier.
Thèse de troisième cycle.
Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J., Roux, S., Vincent, F.,
Bittner, L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F.,
de Meester, L., Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P.,
Sebastian, M., Souffreau, C., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G.,
Not, F., Ogata, H., Speich, S., Stemmann, L., Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M.,
Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015).
Determinants of community structure in the global plankton interactome.
Science, 348(6237).
Lin, Y., Liu, T., and Fuh, C. (2010).
Multiple kernel learning for dimensionality reduction.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:1147–1160.
Mariette, J., Olteanu, M., and Villa-Vialaneix, N. (2017).
Efficient interpretable variants of online SOM for large dissimilarity data.
Neurocomputing, 225:31–48.
Olteanu, M. and Villa-Vialaneix, N. (2015).
On-line relational and multiple relational SOM.
Neurocomputing, 147:15–30.
Robert, P. and Escoufier, Y. (1976).
A unifying tool for linear multivariate statistical methods: the RV-coefficient.
Applied Statistics, 25(3):257–265.
Schölkopf, B., Smola, A., and Müller, K. (1998).
Nonlinear component analysis as a kernel eigenvalue problem.
Neural Computation, 10(5):1299–1319.
Sommer, M., Church, G., and Dantas, G. (2010).
A functional metagenomic approach for expanding the synthetic biology toolbox for biomass conversion.
Molecular Systems Biology, 6(360).
Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A.,
Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka,
F., Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral,
M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P.,
Iudicone, D., Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P.,
Karsenti, E., Raes, J., Acinas, S., and Bork, P. (2015).
Structure and function of the global ocean microbiome.
Science, 348(6237).
Zhuang, J., Wang, J., Hoi, S., and Lan, X. (2011).
Unsupervised multiple kernel clustering.
Journal of Machine Learning Research: Workshop and Conference Proceedings, 20:129–144.