Shnier et al., Persistent homology analysis of brain transcriptome data in autism
Qaiser et al., Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features
Paper memo: persistent homology on biological problems
1. Journal Club Sep. 18, 2020 (Ryohei Suzuki)
Shnier et al., J. R. Soc. Interface 16.158 (2019): 20190531
Qaiser et al., Medical Image Analysis 55 (2019): 1-14
2. Topological data analysis (TDA)
Why TDA?
• TDA provides a metric-invariant summarization of complex, high-dimensional data
(cf. the many normalization modes of RNA-seq)
• TDA robustly captures the global structure of data in an intuitive way
Applications of TDA
• Materials science (crystal structure)
• Network analysis
• Peak detection, etc.
Topology
= the mathematical framework for describing the “shape” of an object in a way that is invariant under continuous deformation
4. Persistent homology
Treating the input as a point set, track how the topological features of the complex obtained by connecting points within a growing radius ε appear and disappear
[Figure: the “barcode”, showing the lifetime of individual connected components and of individual rings from birth to death; a long bar indicates a robust ring structure]
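The filtration described above can be made concrete with a small sketch. This is an illustration under stated assumptions, not code from either paper: it builds the Vietoris-Rips complex up to triangles, indexes the filtration by pairwise distance instead of the radius ε (an edge at distance d corresponds to radius d/2 when balls of radius ε are used), and applies the standard boundary-matrix reduction over Z/2 to pair each birth with a death. The function name `rips_persistence` and the `max_eps` cutoff are hypothetical. Its cost grows as O(n³) in the number of points, so real datasets would call for a dedicated library such as GUDHI or Ripser.

```python
import itertools
import math


def rips_persistence(points, max_eps):
    """H0 and H1 persistence pairs of a Vietoris-Rips filtration (Z/2 coefficients).

    An edge enters at the distance between its endpoints and a triangle at its
    longest edge; with balls of radius eps, divide these values by 2.
    """
    n = len(points)
    simplices = [(0.0, 0, (i,)) for i in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        d = math.dist(points[i], points[j])
        if d <= max_eps:
            simplices.append((d, 1, (i, j)))
    for i, j, k in itertools.combinations(range(n), 3):
        d = max(math.dist(points[a], points[b]) for a, b in ((i, j), (i, k), (j, k)))
        if d <= max_eps:
            simplices.append((d, 2, (i, j, k)))
    # Filtration order: by value, with faces before cofaces at equal value.
    simplices.sort(key=lambda s: (s[0], s[1], s[2]))
    index = {verts: pos for pos, (_, _, verts) in enumerate(simplices)}

    # Boundary columns as sets of face positions; mod-2 addition = symmetric difference.
    columns = [
        {index[face] for face in itertools.combinations(verts, dim)} if dim > 0 else set()
        for _, dim, verts in simplices
    ]

    pivot_owner = {}  # lowest (largest) row index -> column that owns it
    diagrams = {0: [], 1: []}
    paired = set()
    for j, col in enumerate(columns):
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]  # add an earlier reduced column (mod 2)
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth, death = simplices[i][0], simplices[j][0]
            if death > birth:  # drop zero-length bars
                diagrams[simplices[i][1]].append((birth, death))
    # Unpaired vertices and edges are classes that never die within max_eps.
    for pos, (value, dim, _) in enumerate(simplices):
        if pos not in paired and dim < 2:
            diagrams[dim].append((value, math.inf))
    return diagrams
```

For four points at the corners of a unit square, `rips_persistence([(0, 0), (1, 0), (1, 1), (0, 1)], max_eps=2.0)` gives three H0 bars ending at 1, one H0 bar that never dies, and a single H1 bar from 1 to √2: the ring closes when the four sides appear and is filled in once the diagonals, and with them the triangles, enter.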
5. Persistence diagram
Scatter plot of the birth time (x-axis) and death time (y-axis) of topological components
→ represents information about the global structure of the data
Further analysis
- Calculating summary values, e.g. the sum of cycle lengths SLk
- Classification of diagrams as images
[Figure from https://www.pnas.org/content/113/26/7035, annotated with one robust ring and several transient rings]
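One widely used way to turn a diagram into an image for classification is the persistence image of Adams et al. (2017); the slide does not say which vectorization the cited work uses, so the following is only a sketch of that general idea. It assumes finite (birth, death) pairs, maps each to (birth, persistence), weights it linearly by persistence, and evaluates an isotropic Gaussian at the grid points instead of integrating over pixels. `grid_size`, `extent` and `sigma` are hypothetical defaults.

```python
import numpy as np


def persistence_image(diagram, grid_size=20, extent=(0.0, 1.0, 0.0, 1.0), sigma=0.05):
    """Rasterize (birth, death) pairs into a grid_size x grid_size image.

    Rows index persistence (death - birth), columns index birth.
    Bars with infinite death are skipped.
    """
    bmin, bmax, pmin, pmax = extent
    b_axis = np.linspace(bmin, bmax, grid_size)
    p_axis = np.linspace(pmin, pmax, grid_size)
    B, P = np.meshgrid(b_axis, p_axis, indexing="xy")  # B[r, c] = b_axis[c], P[r, c] = p_axis[r]
    image = np.zeros((grid_size, grid_size))
    for birth, death in diagram:
        if not np.isfinite(death):
            continue
        pers = death - birth
        # Gaussian bump centred on (birth, persistence), weighted by persistence.
        image += pers * np.exp(-((B - birth) ** 2 + (P - pers) ** 2) / (2 * sigma ** 2))
    return image
```

The flattened image (`img.ravel()`) can then be fed to any standard vector classifier; the linear weight makes transient bars near the diagonal contribute little.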
6. Goal: discover the transcriptomic characteristics of the brains of autism spectrum disorder (ASD) patients
• ASD is known to be highly heritable, but no key genetic variant contributing to the disease has been found. Rather, >100 genes are thought to contribute to the risk.
• Several studies have shown transcriptomic differences, e.g., the downregulation of neuronal synaptic genes and the upregulation of immune genes in ASD patients
• More comprehensive studies are required to understand the disease
Approach: directly apply persistent homology to expression data
• To examine the inter-patient and inter-gene geometry of the ASD and healthy groups
[Figure: patient space. Densely packed topology = patients have similar expression profiles; sparsely packed topology = patients have heterogeneous expression]
7. Dataset and study overview
Datasets
• Dataset 1: microarray (9934 genes, 29 ASD / 29 control) [1], log2-transformed
• Dataset 2: RNA-seq (22399 genes, 82 ASD / 82 control) [2], RPKM & log2-transformed
Procedure
• Calculate the inter-sample and inter-gene distance matrices for the ASD and control expression data
  • Dissimilarity measure: 1 - r (r = Pearson correlation)
• Compute the persistence diagrams
• Derive the summary values
  • SDT0 = sum of the death times of connected components
  • Euler characteristic = SL0 - SL1 + SL2
※ SLk is the sum of the lifespans of connected components (k=0), rings (k=1), and voids (k=2).
[1] Voineagu et al. (2011), Nature 474, 380-384
[2] Parikshak et al. (2016), Nature 540, 423-427
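The summary values defined above reduce each diagram to a single number. A minimal sketch, assuming each diagram is a list of (birth, death) pairs and that bars which never die (death = ∞) are excluded, since the slide does not say how they are handled; the function names are hypothetical.

```python
import math


def lifespan_sum(pairs):
    """SL_k: total lifespan (death - birth) of the finite bars of one diagram."""
    return sum(d - b for b, d in pairs if math.isfinite(d))


def sdt0(h0_pairs):
    """SDT_0: sum of the death times of the finite connected-component bars."""
    return sum(d for _, d in h0_pairs if math.isfinite(d))


def euler_summary(diagrams):
    """SL_0 - SL_1 + SL_2, where diagrams maps k -> list of (birth, death) pairs."""
    return sum((-1) ** k * lifespan_sum(diagrams.get(k, [])) for k in range(3))


# Toy diagrams: three components merging at 1, one ring living from 1 to 1.5.
diagrams = {0: [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, math.inf)], 1: [(1.0, 1.5)]}
print(sdt0(diagrams[0]), euler_summary(diagrams))  # 3.0 2.5
```

With a Rips filtration every component is born at 0, so SDT0 coincides with SL0 in this setting.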
[Figure: expression matrix with genes as rows and samples as columns; comparing columns gives the inter-sample space, comparing rows gives the inter-gene space]
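The first procedure step, computing the two dissimilarity matrices, can be sketched with NumPy. This assumes an expression matrix with genes as rows and samples as columns (as in the illustration), already log2-transformed; `correlation_distances` is a hypothetical helper name. Genes with zero variance would yield NaN correlations and would need to be filtered out first.

```python
import numpy as np


def correlation_distances(expr):
    """Return (inter_sample, inter_gene) dissimilarity matrices, 1 - Pearson r.

    expr: 2D array, genes x samples.
    """
    inter_sample = 1.0 - np.corrcoef(expr, rowvar=False)  # columns (samples) as variables
    inter_gene = 1.0 - np.corrcoef(expr)                   # rows (genes) as variables
    return inter_sample, inter_gene
```

To compare the groups, the function would be applied separately to the ASD columns and the control columns, e.g. `correlation_distances(expr[:, is_asd])` with a hypothetical boolean mask `is_asd`. The inter-gene matrix is large: for the 22,399 genes of Dataset 2 it holds about 5 × 10⁸ float64 entries, roughly 4 GB.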
9. Results (inter-gene)
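[Figure panels: persistence diagrams of the ASD (ASD-PD) and control (Control-PD) groups, diff SDT0, diff Euler; p-values shown: 0.316 and 0.403 (top row), 0.998 and 0.997 (bottom row)]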
Author’s conclusion:
ASD and healthy groups do not differ significantly in their transcriptomic organization
Insignificant??
Dense topology → gene expression correlates well among samples
Sparse topology → less correlation
10. Goal: fast tumor-region segmentation on whole-slide images (WSI) of colorectal cancer (CRC)
• CRC is the third/second most commonly diagnosed cancer in males/females
• Fast automatic detection of possible tumor regions is vital for clinical use
• CNN-based methods are actively studied, but suffer from high computational cost
Approach: use PH-inspired features to classify patches
• PH of image pixels is computed via thresholding
• Birth/death time distributions are used as features
• Comparing the features with ~100 exemplars yields a very fast classification model
11. Connecting pixels by thresholding
• A common way to compute persistent homology for 2D image data
• As the threshold is lowered, connected components appear and then vanish
(merge) one after another.
Left image from: https://www.nature.com/articles/s41598-018-36798-y
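The thresholding scheme can be sketched with a union-find structure. The slide does not state which image channel, connectivity, or direction Qaiser et al. use, so this sketch assumes a single grayscale channel, 4-connectivity, and a threshold lowered from bright to dark (a superlevel-set filtration). When two components meet, the younger one (born at a lower intensity) dies, which is the usual elder rule. The function name is hypothetical.

```python
import numpy as np


def threshold_persistence_h0(image):
    """0-dim persistence of a 2D grayscale image as the threshold is lowered.

    Pixels with intensity >= t are switched on as t decreases; 4-connected
    neighbours are joined. A component is born at its brightest pixel and dies
    when it merges into an older (brighter) one. The component that never
    merges is reported with death = -inf.
    """
    image = np.asarray(image, dtype=float)  # avoid wrap-around when negating uint8 images
    w = image.shape[1]
    parent, birth, pairs = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for flat in np.argsort(-image, axis=None, kind="stable"):  # brightest pixels first
        r, c = divmod(int(flat), w)
        value = float(image[r, c])
        parent[(r, c)], birth[(r, c)] = (r, c), value
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb not in parent:  # out of bounds, or not yet above the threshold
                continue
            a, b = find((r, c)), find(nb)
            if a == b:
                continue
            young, old = (a, b) if birth[a] < birth[b] else (b, a)
            if birth[young] > value:  # skip zero-persistence merges
                pairs.append((birth[young], value))
            parent[young] = old
    pairs.extend((birth[root], -np.inf) for root in {find(p) for p in list(parent)})
    return pairs
```

For the 3 × 3 image `[[5, 1, 4], [1, 1, 1], [3, 1, 2]]`, the peaks at 4, 3 and 2 each die at threshold 1 when they merge into the component of the brightest pixel, which never dies.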
12. Persistent homology profiles (PHP)
• From the thresholding result, probability distributions of the birth/death times, called PHPs, are constructed (green lines)
• These distributions are treated as feature vectors
• By comparing the PHP of the input data with those of exemplar tumor (T) and normal (N) images, classification can be performed quickly
[Figure: birth and death PHPs, with the tumor mean and normal mean profiles]
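A PHP-style feature and the exemplar comparison could look like the following sketch. The bin count, the 8-bit intensity range, the use of normalized histograms as the "probability distributions", and the L1 distance to each class mean are all assumptions for illustration, not details taken from the paper; `php` and `classify` are hypothetical names. The input is a list of (birth, death) pairs, for instance from a thresholding filtration of a patch; bars that never die are ignored.

```python
import numpy as np


def php(pairs, bins=16, value_range=(0.0, 255.0)):
    """Concatenated normalized histograms of birth times and death times."""
    births = np.array([b for b, d in pairs if np.isfinite(d)])
    deaths = np.array([d for b, d in pairs if np.isfinite(d)])
    hb, _ = np.histogram(births, bins=bins, range=value_range)
    hd, _ = np.histogram(deaths, bins=bins, range=value_range)
    hb = hb / max(hb.sum(), 1)  # guard against patches with no finite bars
    hd = hd / max(hd.sum(), 1)
    return np.concatenate([hb, hd])


def classify(profile, tumor_exemplars, normal_exemplars):
    """Label a patch by the nearer class-mean exemplar PHP (L1 distance)."""
    tumor_mean = np.mean(tumor_exemplars, axis=0)
    normal_mean = np.mean(normal_exemplars, axis=0)
    d_t = np.abs(profile - tumor_mean).sum()
    d_n = np.abs(profile - normal_mean).sum()
    return "tumor" if d_t < d_n else "normal"
```

Because only two mean vectors are compared per patch, the cost at inference time is independent of the size of the training set, which is the source of the speed advantage claimed on the slide.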
13. Exemplar selection using CNN activation
• The training dataset contains ~100,000 patches
→ the PHP of an input should be compared with a small set of representative exemplars, not with every patch
• Improper selection of exemplars causes overfitting to prominent texture patterns
• The authors propose a CNN-based selection strategy in which patches with different levels of feature activation are equally represented
[Figure: select k exemplars from each bin of activation strength, from the highest 1/Q to the lowest 1/Q]
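The binning strategy can be sketched as follows, assuming each training patch has already been reduced to one scalar activation strength (how that scalar is derived from the CNN is not specified here). Patches are ranked by activation, split into Q equally sized bins from highest to lowest, and k patches are drawn at random from each bin; the random draw within a bin and the function name are assumptions. One would presumably run it separately for tumor and normal patches.

```python
import numpy as np


def select_exemplars(activations, n_bins, k, seed=0):
    """Pick up to k patch indices from each of n_bins activation-strength bins.

    activations: 1D array with one scalar activation strength per patch.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(activations)[::-1]  # strongest activation first
    chosen = []
    for bin_indices in np.array_split(order, n_bins):  # highest 1/Q ... lowest 1/Q
        take = min(k, len(bin_indices))
        chosen.extend(rng.choice(bin_indices, size=take, replace=False).tolist())
    return chosen
```

Sampling the same number from every bin keeps weakly activated patches from being crowded out by the most prominent textures.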
14. Quantitative classification results
• The proposed algorithm outperforms existing methods in terms of F1-score on two distinct datasets
• Generalization has room for improvement, but is the best among the tested methods
• Why does it work well? → PHP efficiently captures the connectivity between cells in a rotation-invariant way, which is difficult for convnets
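For reference, the F1-score used in the comparison is the harmonic mean of precision and recall for the positive class (here assumed to be tumor). A minimal sketch with hypothetical label strings:

```python
def f1_score(y_true, y_pred, positive="tumor"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Unlike plain accuracy, this ignores true negatives, so a segmenter cannot score well simply by labeling the (usually dominant) normal tissue correctly.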
15. Qualitative segmentation results
Comments
• Comparison with recent deep encoder-decoder models was not conducted
• Batch effects (e.g., contrast) may significantly influence the calculation of PHP
16. Reflection
• (+) Persistent homology provides unique information about the global structure of a dataset that is difficult to compute in raw-data space, which should be useful for very high-dimensional data with large noise
• (-) Persistent homology only provides highly summarized statistics, discarding information about the contributions of individual data points, e.g., which gene sets contribute in ASD patients
• Combining it with CNNs, which are very good at discovering local features, seems a promising idea for image analysis