A Review on Covid Detection using Cross Dataset Analysis (IRJET Journal)
This document provides an overview of deep learning approaches used for COVID-19 detection using cross-dataset analysis of CT scans. It discusses how cross-dataset analysis aims to improve model accuracy by handling limitations like generalization problems, dataset bias, and robustness to variation in image quality. Several studies that have used techniques like transfer learning, data augmentation, and pre-processing on CT scan datasets are summarized. The studies found that models trained on one dataset performed best on similar datasets, and accuracy dropped when testing on datasets with more variation in images. Overall, the document reviews progress in cross-dataset COVID detection using CT scans, but notes there are still opportunities to address limitations and improve model adaptation across diverse datasets.
Visual data mining combines traditional data mining methods with information visualization techniques to explore large datasets. There are three levels of integration between visualization and automated mining methods - no/limited integration, loose integration where methods are applied sequentially, and full integration where methods are applied in parallel. Different visualization methods exist for univariate, bivariate and multivariate data based on the type and dimensions of the data. The document describes frameworks and algorithms for visual data mining, including developing new algorithms interactively through a visual interface. It also summarizes a document on using data mining and visualization techniques for selective visualization of large spatial datasets.
This document describes a novel graph embedding procedure based on simplicial complexes for graph classification tasks. Simplicial complexes are mathematical objects that can capture multi-way relationships in data beyond pairwise relationships. The proposed approach uses simplicial complexes to extract meaningful substructures from graphs, clusters these substructures to form an alphabet, and then embeds each graph as a symbolic histogram over the alphabet. This moves the problem into a metric space where standard machine learning algorithms can be applied. The approach is tested on 30 graph classification benchmarks and two protein analysis applications to demonstrate its effectiveness.
Deep Conditional Adversarial learning for polyp Segmentation (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper22.pdf
Debapriya Banik and Debotosh Bhattacharjee : Deep Conditional Adversarial learning for polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This approach addresses the Medico automatic polyp segmentation challenge, part of MediaEval 2020. We have proposed a deep conditional adversarial learning based network for the automatic polyp segmentation task. The network comprises two interdependent models, namely a generator and a discriminator. The generator is a fully convolutional network (FCN) employed to predict the polyp mask, while the discriminator enforces the segmentation to be as similar as possible to the real segmented mask (ground truth). Our proposed model achieved competitive results on the test dataset provided by the organizers of the challenge.
The document proposes a new method called the Brownian correlation metric prototypical network (BCMPN) for fault diagnosis of rotating machinery. The BCMPN uses a multi-scale mask preprocessing mechanism to improve model performance. It extracts multi-scale features using dilated convolution and an effective lightweight channel attention module. For classification, it measures the difference between the joint characteristic function and the product of the marginal distributions using the Brownian distance, unlike existing methods that use Euclidean or cosine distance. Experiments on a gear dataset and laboratory data show that the BCMPN performs better than other methods for problems with few training samples and zero samples in the target domain.
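The Brownian (distance) correlation mentioned above has a simple empirical form: double-center the pairwise distance matrices of the two samples and correlate them. A minimal numpy sketch for 1-D samples (illustrative only; the BCMPN applies the idea to learned feature embeddings):

```python
import numpy as np

def distance_correlation(x, y):
    """Brownian (distance) correlation between two 1-D samples; it is
    zero only when the samples are statistically independent."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])                # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()  # double-centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                             # squared distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(distance_correlation(x, 2 * x + 1))             # ~1 for a linear relation
print(distance_correlation(x, rng.normal(size=200)))  # near 0 for independent samples
```

Unlike Euclidean or cosine distance, this statistic also responds to nonlinear dependence, which is the motivation the abstract gives for choosing it.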
POSTERIOR RESOLUTION AND STRUCTURAL MODIFICATION FOR PARAMETER DETERMINATION ... (IJCI JOURNAL)
When only a few lower modes' data are available to evaluate a large number of unknown parameters, it is difficult to acquire information about all of them. The challenge in this kind of updating problem is first to gain confidence in the parameters that are evaluated correctly using the available data, and second to obtain information about the remaining parameters. In this work, the first issue is resolved by employing the sensitivity of the modal data used for updating. Once it is established which parameters are evaluated satisfactorily using the available modal data, the remaining parameters are evaluated employing modal data of a virtual structure. This virtual structure is created by adding or removing some known stiffness to or from some of the stories of the original structure. A 12-story shear building is considered for the numerical illustration of the approach. Results of the study show that the present approach is an effective tool for system identification when only a few data are available for updating.
Perceiver CPI is a nested cross-attention network for compound-protein interaction prediction that aims to address drawbacks in existing models. It uses a perceiver-style architecture with compound and protein inputs processed separately before attention-based fusion. The model outperforms state-of-the-art baselines on benchmark datasets, achieving lower error and higher accuracy on tasks including novel pairs, compounds, and proteins. Future work could leverage 3D protein structures from AlphaFold, transfer learning, and improved interpretability.
Advanced machine learning for metabolite identification (Dai-Hai Nguyen)
The document proposes a machine learning approach called ADAPTIVE for metabolite identification from mass spectrometry. It has two steps: 1) a learning step and 2) a candidate retrieval step. The learning step involves learning mappings from molecular structures to molecular vectors and from spectra to molecular vectors to maximize correlation. The candidate retrieval step takes a query spectrum, converts it to a molecular vector, and searches a database of molecular vectors derived from structures. Experiments on a benchmark dataset show the method achieves better predictive performance than existing methods while being more computationally efficient.
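The candidate-retrieval step can be sketched independently of the learning step: once both encoders map into a shared molecular-vector space, identification reduces to nearest-neighbour search. The 4-dimensional vectors and metabolite names below are toy assumptions, not ADAPTIVE's actual embeddings:

```python
import numpy as np

def retrieve(query_vec, db_vecs, db_names, top_k=2):
    """Rank candidate structures by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    D = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    scores = D @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [(db_names[i], float(scores[i])) for i in order]

# Hypothetical molecular vectors for three candidate metabolites.
db = np.array([[1.0, 0.1, 0.0, 0.2],
               [0.0, 1.0, 0.9, 0.1],
               [0.5, 0.5, 0.5, 0.5]])
names = ["glucose", "citrate", "alanine"]
query = np.array([0.1, 0.9, 1.0, 0.0])   # vector predicted from a query spectrum
print(retrieve(query, db, names))
```

The learning step (maximising correlation between the two mappings) is what makes such a shared space meaningful; this sketch only covers the lookup.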
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH (cscpconf)
Due to the intangible nature of "software", accurate and reliable software effort estimation is a challenge in the software industry. Very accurate estimates of software development effort cannot be expected because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in software engineering datasets because the data come from diverse sources. This heterogeneity can be reduced by defining relationships between the data values and classifying them into different clusters. This study focuses on how combining clustering and regression techniques can mitigate the loss of predictive accuracy caused by data heterogeneity. A clustered approach creates subsets of data with a degree of homogeneity that enhances prediction accuracy. The study also observed that ridge regression performs better than the other regression techniques used in the analysis.
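The cluster-then-regress idea can be sketched with scikit-learn: cluster the projects, fit one ridge model per cluster, and compare against a single global ridge fit. The synthetic two-regime effort data below are an illustrative assumption, not the paper's dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))                 # e.g. a size metric
regime = (X[:, 0] > 5).astype(int)                    # hidden heterogeneity
y = np.where(regime == 0, 2 * X[:, 0], 15 + 0.5 * X[:, 0]) \
    + rng.normal(0, 0.3, 200)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
models = {c: Ridge(alpha=1.0).fit(X[km.labels_ == c], y[km.labels_ == c])
          for c in (0, 1)}

def predict(x):
    """Route each sample to its cluster's ridge model."""
    labels = km.predict(x)
    return np.array([models[c].predict(x[i:i + 1])[0]
                     for i, c in enumerate(labels)])

global_model = Ridge(alpha=1.0).fit(X, y)
err_clustered = np.mean((predict(X) - y) ** 2)
err_global = np.mean((global_model.predict(X) - y) ** 2)
print(err_clustered, err_global)   # the per-cluster fit should be much tighter
```

The clustered fit wins here precisely because each subset is homogeneous, which is the mechanism the abstract describes.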
Deep learning optimization for drug-target interaction prediction in COVID-19... (IJECEIAES)
Exponentially increasing bioinformatics data has raised a new problem: computation time. The amount of data that needs to be processed is not matched by increases in hardware performance, which burdens researchers with long computation times, especially in drug-target interaction prediction, where the computational complexity is exponential. One focus of high-performance computing research is using the graphics processing unit (GPU) to perform multiple computations in parallel. This study examines how well the GPU performs when used for deep learning problems to predict drug-target interactions. It used the gold-standard drug-target interaction (DTI) data and the coronavirus disease (COVID-19) dataset. The stages of the research are data acquisition, data preprocessing, model building, hyperparameter tuning, performance evaluation, and COVID-19 dataset testing. The results indicate that using a GPU in deep learning models can speed up the training process by 100 times. In addition, the hyperparameter tuning process is also greatly helped by the GPU, which can make it up to 55 times faster. When tested on the COVID-19 dataset, the model showed good performance, with 76% accuracy, 74% F-measure, and a speed-up value of 179.
This document describes Visinets, a web-based software tool for pathway modeling and dynamic visualization. The tool uses causal mapping (CMAP) as its mathematical approach, which graphically represents biological networks and interactions. The authors tested Visinets by: 1) building an executable EGFR-MAPK pathway model using its graphical modeling interface; and 2) translating an existing ODE-based insulin signaling model into CMAP format. The testing confirmed CMAP's potential for broad pathway modeling and visualization applications. Visinets offers pathway analysis and dynamic simulation in real time through its web-based graphical interface, providing an alternative for biomedical research.
IRJET - Plant Leaf Disease Diagnosis from Color Imagery using Co-Occurrence M... (IRJET Journal)
This document presents a method for classifying plant leaf diseases from color images using texture and color features extracted from the images along with an artificial neural network classifier. The proposed system first preprocesses the input images, then extracts color features like mean and standard deviation of HSV color space and texture features like energy, contrast, homogeneity and correlation using a gray level co-occurrence matrix. These features are then used to train a backpropagation neural network classifier to automatically classify test images into disease categories. Experimental results show the backpropagation network provides high accuracy for plant disease classification, with 97.2% accuracy on validation data and lower error rates than support vector machines.
IRJET- Plant Leaf Disease Diagnosis from Color Imagery using Co-Occurrence Ma... (IRJET Journal)
This document presents a method for classifying plant leaf diseases from color images using texture and color features. The proposed system first preprocesses input images, then extracts features like color (mean, standard deviation of HSV channels) and texture (energy, contrast, homogeneity, correlation from GLCM). These features are used to train a backpropagation neural network classifier. The system was tested on images of six plant diseases and showed minimum training error and good classification accuracy. This automated approach could help inexperienced farmers and experts more accurately diagnose plant diseases.
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi... (IRJET Journal)
This document proposes a fusion method for tag-based image reranking and similarity finding based on topic diversity. It focuses on retrieving images using a two-phase online and offline process. In the online phase, images are processed using tag graph construction, community detection, community ranking, and image similarity ranking. It then merges topic diversity ranking and image similarity ranking to create a fusion model for retrieving images. In the offline phase, query keywords are given to the fusion model to retrieve hierarchical images. The goal is to promote topic coverage performance and leverage both relevance and diversity of search results.
IRJET - Symmetric Image Registration based on Intensity and Spatial Informati... (IRJET Journal)
This document presents a proposed system for symmetric image registration based on intensity and spatial information using a technique called the Coloured Simple Algebraic Algorithm (CSAA). The system first preprocesses color images, extracts features, then classifies images as symmetric or asymmetric using a neural network. It is shown to provide accurate and robust registration of medical and biomedical images. The system is implemented and evaluated on sample images, demonstrating it can successfully identify symmetric versus asymmetric images. The proposed approach aims to improve on existing techniques for intensity-based image registration tasks.
A simplified predictive framework for cost evaluation to fault assessment usi... (IJECEIAES)
Software engineering is an integral part of any software development scheme and frequently encounters bugs, errors, and faults. Predictive evaluation of software faults mitigates this challenge to a large extent; however, no benchmarked framework has been reported for it yet. Therefore, this paper introduces a computational framework for the cost evaluation method to facilitate better predictive assessment of software faults. Based on lines of code, the proposed scheme adopts a machine-learning approach to perform predictive analysis of faults. It presents an analytical framework of the correlation-based cost model integrated with multiple standard machine learning (ML) models, e.g., linear regression, support vector regression, and artificial neural networks (ANN). These learning models are trained to predict software faults with higher accuracy. The study assesses the outcomes using error-based performance metrics in detail to determine how well each learning model performs and how accurately it learns. It also examines the factors contributing to the training loss of the neural networks. The validation results demonstrate that, compared to linear regression and support vector regression, the neural network achieves a significantly lower error score for software fault prediction.
Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors (ITIIIndustries)
Outlier detection (OD) is widely used in many fields, such as finance, information and medicine, for cleaning up datasets while keeping the useful information. In a traffic system, it alerts the transport department and drivers to abnormal traffic situations such as congestion and traffic accidents. This paper presents a density-based bounded LOF (BLOF) method for large-scale traffic video data in Hong Kong. Dimension reduction by principal component analysis (PCA) was applied to the spatial-temporal traffic signals. Previously, a density-based local outlier factor (LOF) method had been performed on a two-dimensional (2D) PCA-processed spatial plane. In this paper, a three-dimensional (3D) PCA-processed spatial space for classical density-based OD is first compared with the results from the 2D counterpart. In our experiments, the classical density-based LOF OD was applied to the 3D PCA-processed data domain, which is new in the literature, and compared to the previous 2D domain. The average DSR increased by about 2% in the PM sessions: 91% (2D) versus 93% (3D). Also, comparing the classical density-based LOF and the new BLOF OD methods, the average DSR in the supervised approach increased from 94% (LOF) to 96% (BLOF) for the AM sessions and from 93% (LOF) to 95% (BLOF) for the PM sessions.
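The PCA-then-LOF pipeline can be sketched with scikit-learn. The synthetic 10-dimensional signals below stand in for the Hong Kong traffic data, and the parameters are illustrative, not the paper's:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(300, 10))      # routine traffic signals
anomaly = rng.normal(8, 1, size=(5, 10))       # congestion-like outliers
X = np.vstack([normal, anomaly])

# Reduce to a 3-D PCA-processed space, as in the paper's 3D variant.
X3 = PCA(n_components=3).fit_transform(X)

# Classical density-based LOF; fit_predict returns -1 for outliers.
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X3)
print((labels[-5:] == -1).all())               # the injected anomalies are flagged
```

The bounded variant (BLOF) would modify the LOF score itself; scikit-learn only provides the classical LOF shown here.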
IRJET- An Improvised Multi Focus Image Fusion Algorithm through Quadtree (IRJET Journal)
The document proposes a new quadtree-based algorithm for multi-focus image fusion. The algorithm divides the input images into 4 equal blocks using a quadtree structure. It then further divides each block into smaller blocks and detects the focused regions in each block using a focus measure and weighted values. The small blocks are then fused using a modified Laplacian mechanism. The fused image is evaluated using SSIM and ESSIM values, which indicate the proposed algorithm performs better fusion than previous methods.
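The block-wise focus decision at the heart of such schemes can be sketched with a sum-of-modified-Laplacian focus measure: for each block, keep the pixels from whichever input is sharper. This is a flat-grid simplification of the paper's quadtree (which recurses into smaller blocks), with illustrative sizes:

```python
import numpy as np

def modified_laplacian(img):
    """Sum of modified Laplacian: |2I - left - right| + |2I - up - down|."""
    p = np.pad(img.astype(float), 1, mode="edge")
    ml = (np.abs(2 * p[1:-1, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:]) +
          np.abs(2 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1]))
    return ml.sum()

def fuse_blocks(a, b, block=4):
    """Per block, take pixels from whichever image is more in focus."""
    out = np.empty_like(a, dtype=float)
    for i in range(0, a.shape[0], block):
        for j in range(0, a.shape[1], block):
            sa = a[i:i + block, j:j + block]
            sb = b[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = (
                sa if modified_laplacian(sa) >= modified_laplacian(sb) else sb)
    return out

rng = np.random.default_rng(0)
sharp = rng.integers(0, 255, (8, 8)).astype(float)
blurred = np.full_like(sharp, sharp.mean())     # a fully defocused version
fused = fuse_blocks(sharp, blurred)
print(np.allclose(fused, sharp))                # sharp blocks win everywhere
```

The quadtree version would split a block further only when its two candidates disagree, which is what lets the method follow irregular focus boundaries.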
This document provides a summary of Md. Ariful Islam's background and qualifications. He is a PhD candidate in Computer Science at Stony Brook University focused on modeling, simulation, and formal verification of complex software and dynamical systems. He has extensive skills in various modeling, simulation, optimization, and programming languages and tools.
IRJET- Fusion based Brain Tumor Detection (IRJET Journal)
1. The document discusses a method for detecting brain tumors using medical image fusion and support vector machines (SVM).
2. It involves fusing two MRI images using SVM to create a single fused image with more information than the original images. Texture and wavelet features are then extracted from the fused image.
3. The SVM classifier classifies the brain tumors as benign or malignant based on the trained and tested features extracted from the fused image.
Graph fusion of finger multimodal biometrics (Anu Antony)
A graph fusion technique, i.e., a weighted graph structure model, is used to characterize finger biometrics, and fusion frameworks are presented for the trimodal images of a finger.
This document provides an overview of salient object detection techniques, including both traditional and deep learning-based methods. It discusses early models of saliency detection based on cognitive theories of human visual attention. Global contrast and diffusion-based methods for salient object detection are described. The use of fully convolutional neural networks for deep learning-based salient object detection is also covered. Both qualitative and quantitative comparisons of detection techniques are presented. The document concludes by noting improvements in recent models from including edge and context information, but that detection remains challenging across a variety of difficult image scenarios.
This document presents a new layout algorithm for visualizing communities in clustered social networks that integrates both structural and profile information. The algorithm (1) calculates dissimilarity matrices using profile and structural data, (2) performs multidimensional scaling to reflect node proximity, and (3) defines an interaction zone between communities. Experiments on Facebook, DBLP, and protein networks show it can identify important boundary nodes and observe community interactions. Future work includes extending the model to include viewpoints and applying it to real applications like marketing analysis.
Influence Analysis of Image Feature Selection Techniques Over Deep Learning Model (IRJET Journal)
This document discusses using different image feature selection techniques and their impact on deep learning models for image classification. It analyzes shape, color, texture, and combined features extracted from images using techniques like local binary patterns (LBP), grid color moments, and Sobel operators. A convolutional neural network (CNN) is used as the deep learning classifier. The performance is evaluated on a diabetic retinopathy detection dataset in terms of classification accuracy. The goal is to determine which feature selection techniques improve accuracy while minimizing computational resources when used with CNNs. A system is proposed that extracts individual features and combined features from images, then classifies them using CNNs to compare the impact of different feature selection approaches.
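One of the texture extractors named above, local binary patterns (LBP), can be sketched with scikit-image; the resulting histogram is the kind of feature vector compared against the CNN's raw-pixel input. The image and the P/R parameters are illustrative assumptions:

```python
import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, (32, 32)).astype(np.uint8)   # stand-in grayscale image

# Uniform LBP with 8 neighbours at radius 1 yields 10 pattern codes (0..9).
lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)
print(hist.shape)   # a 10-bin texture descriptor
```

Grid colour moments and Sobel edge maps would be computed analogously and concatenated for the "combined features" condition.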
The document discusses using machine learning algorithms and supervised learning methods to develop an automated system for detecting nanoparticles and estimating their size and spatial distribution from scanning electron microscope images. The goal is to enable industrial-scale manufacturing of nanomaterials by applying quality control tools. Specifically, the research uses support vector machines and scale-invariant feature transform to extract features from images and classify pixels as nanorods or background in order to predict locations and dimensions of nanorods.
NS-CUK Seminar: V.T.Hoang, Review on "GOAT: A Global Transformer on Large-sca... (ssuser4b1f48)
This document presents GOAT, a scalable global transformer model for graph-structured data. GOAT uses a novel local attention module to absorb rich local information from node neighborhoods, in addition to a global attention mechanism that allows each node to attend to all other nodes. The document reports that GOAT achieves strong performance on large-scale homophilous and heterophilous node classification benchmarks, demonstrating its ability to leverage both local and global graph information for prediction tasks. Ablation studies on codebook size further indicate GOAT's effectiveness at modeling long-range interactions through its global attention.
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACHcscpconf
Due to the intangible nature of “software”, accurate and reliable software effort estimation is a challenge in the software Industry. It is unlikely to expect very accurate estimates of software
development effort because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in the software engineering datasets because data is made available from diverse sources.
This can be reduced by defining certain relationship between the data values by classifying them into different clusters. This study focuses on how the combination of clustering and
regression techniques can reduce the potential problems in effectiveness of predictive efficiency due to heterogeneity of the data. Using a clustered approach creates the subsets of data having a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than other regression techniques used in the analysis.
Estimating project development effort using clustered regression approachcsandit
Due to the intangible nature of “software”, accurate and reliable software effort estimation is a
challenge in the software Industry. It is unlikely to expect very accurate estimates of software
development effort because of the inherent uncertainty in software development projects and the
complex and dynamic interaction of factors that impact software development. Heterogeneity
exists in the software engineering datasets because data is made available from diverse sources.
This can be reduced by defining certain relationship between the data values by classifying
them into different clusters. This study focuses on how the combination of clustering and
regression techniques can reduce the potential problems in effectiveness of predictive efficiency
due to heterogeneity of the data. Using a clustered approach creates the subsets of data having
a degree of homogeneity that enhances prediction accuracy. It was also observed in this study
that ridge regression performs better than other regression techniques used in the analysis.
Deep learning optimization for drug-target interaction prediction in COVID-19...IJECEIAES
The exponentially increasing bioinformatics data raised a new problem: the computation time length. The amount of data that needs to be processed is not matched by an increase in hardware performance, so it burdens researchers on computation time, especially on drug-target interaction prediction, where the computational complexity is exponential. One of the focuses of high-performance computing research is the utilization of the graphics processing unit (GPU) to perform multiple computations in parallel. This study aims to see how well the GPU performs when used for deep learning problems to predict drug-target interactions. This study used the gold-standard data in drug-target interaction (DTI) and the coronavirus disease (COVID-19) dataset. The stages of this research are data acquisition, data preprocessing, model building, hyperparameter tuning, performance evaluation and COVID-19 dataset testing. The results of this study indicate that the use of GPU in deep learning models can speed up the training process by 100 times. In addition, the hyperparameter tuning process is also greatly helped by the presence of the GPU because it can make the process up to 55 times faster. When tested using the COVID-19 dataset, the model showed good performance with 76% accuracy, 74% F-measure and a speed-up value of 179.
This document describes Visinets, a web-based software tool for pathway modeling and dynamic visualization. The tool uses causal mapping (CMAP) as its mathematical approach, which graphically represents biological networks and interactions. The authors tested Visinets by: 1) building an executable EGFR-MAPK pathway model using its graphical modeling interface; and 2) translating an existing ODE-based insulin signaling model into CMAP format. The testing confirmed CMAP's potential for broad pathway modeling and visualization applications. Visinets offers pathway analysis and dynamic simulation in real time through its web-based graphical interface, providing an alternative for biomedical research.
IRJET - Plant Leaf Disease Diagnosis from Color Imagery using Co-Occurrence M...IRJET Journal
This document presents a method for classifying plant leaf diseases from color images using texture and color features extracted from the images along with an artificial neural network classifier. The proposed system first preprocesses the input images, then extracts color features like mean and standard deviation of HSV color space and texture features like energy, contrast, homogeneity and correlation using a gray level co-occurrence matrix. These features are then used to train a backpropagation neural network classifier to automatically classify test images into disease categories. Experimental results show the backpropagation network provides high accuracy for plant disease classification, with 97.2% accuracy on validation data and lower error rates than support vector machines.
IRJET- Plant Leaf Disease Diagnosis from Color Imagery using Co-Occurrence Ma...IRJET Journal
This document presents a method for classifying plant leaf diseases from color images using texture and color features. The proposed system first preprocesses input images, then extracts features like color (mean, standard deviation of HSV channels) and texture (energy, contrast, homogeneity, correlation from GLCM). These features are used to train a backpropagation neural network classifier. The system was tested on images of six plant diseases and showed minimum training error and good classification accuracy. This automated approach could help inexperienced farmers and experts more accurately diagnose plant diseases.
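The texture features named in these summaries come from a gray-level co-occurrence matrix (in practice often via scikit-image's `graycomatrix`/`graycoprops`). A minimal NumPy version for a single "one pixel to the right" offset might look like this; the tiny image and level count are invented for illustration:

```python
import numpy as np

def glcm(img, levels):
    """Gray-level co-occurrence matrix for the 'one pixel to the right'
    offset, normalised to sum to 1."""
    m = np.zeros((levels, levels))
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[i, j] += 1
    return m / m.sum()

def texture_features(p):
    """Energy, contrast and homogeneity from a normalised GLCM
    (correlation is derived analogously from GLCM means and variances)."""
    i, j = np.indices(p.shape)
    return {
        "energy": float((p ** 2).sum()),
        "contrast": float((((i - j) ** 2) * p).sum()),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
    }

# Tiny 4-level "image" standing in for a quantised leaf region.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
feats = texture_features(glcm(img, levels=4))
```

In the reviewed pipeline, feature vectors like `feats` (plus HSV color statistics) would be fed to the backpropagation network as training inputs.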
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...IRJET Journal
This document proposes a fusion method for tag-based image reranking and similarity finding based on topic diversity. It focuses on retrieving images using a two-phase online and offline process. In the online phase, images are processed using tag graph construction, community detection, community ranking, and image similarity ranking. It then merges topic diversity ranking and image similarity ranking to create a fusion model for retrieving images. In the offline phase, query keywords are given to the fusion model to retrieve hierarchical images. The goal is to promote topic coverage performance and leverage both relevance and diversity of search results.
IRJET - Symmetric Image Registration based on Intensity and Spatial Informati...IRJET Journal
This document presents a proposed system for symmetric image registration based on intensity and spatial information using a technique called the Coloured Simple Algebraic Algorithm (CSAA). The system first preprocesses color images, extracts features, then classifies images as symmetric or asymmetric using a neural network. It is shown to provide accurate and robust registration of medical and biomedical images. The system is implemented and evaluated on sample images, demonstrating it can successfully identify symmetric versus asymmetric images. The proposed approach aims to improve on existing techniques for intensity-based image registration tasks.
A simplified predictive framework for cost evaluation to fault assessment usi...IJECEIAES
Software engineering is an integral part of any software development scheme, which frequently encounters bugs, errors, and faults. Predictive evaluation of software faults helps mitigate this challenge to a large extent; however, no benchmarked framework has been reported for it yet. Therefore, this paper introduces a computational cost-evaluation framework to facilitate better predictive assessment of software faults. Based on lines of code, the proposed scheme adopts a machine-learning approach to perform predictive analysis of faults. It presents an analytical framework of a correlation-based cost model integrated with multiple standard machine learning (ML) models, e.g., linear regression, support vector regression, and artificial neural networks (ANN). These learning models are trained to predict software faults with higher accuracy. The study assesses the outcomes in detail using error-based performance metrics to determine how well and how accurately each learning model performs, and also examines the factors contributing to the training loss of the neural networks. The validation results demonstrate that, compared to linear regression and support vector regression, the neural network achieves a significantly lower error score for software fault prediction.
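A hedged sketch of the model-comparison step follows: synthetic lines-of-code-to-fault data and generic scikit-learn regressors stand in for the paper's actual framework and metrics.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Hypothetical data: lines of code -> fault count, roughly linear plus noise.
loc = rng.uniform(100, 10_000, (300, 1))
faults = 0.005 * loc.ravel() + rng.normal(0, 2.0, 300)

X_tr, X_te, y_tr, y_te = train_test_split(loc, faults,
                                          test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "ann": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(16,),
                                      max_iter=5000, random_state=0)),
}

# Error-based comparison, mirroring the paper's evaluation step.
mae = {name: mean_absolute_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
       for name, m in models.items()}
```

Swapping in other error metrics (MSE, RMSE) or the paper's correlation-based cost terms only changes the scoring line.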
Traffic Outlier Detection by Density-Based Bounded Local Outlier FactorsITIIIndustries
Outlier detection (OD) is widely used in many fields, such as finance, information, and medicine, for cleaning up datasets while keeping the useful information. In a traffic system, it alerts the transport department and drivers to abnormal traffic situations such as congestion and accidents. This paper presents a density-based bounded LOF (BLOF) method for large-scale traffic video data in Hong Kong. Dimension reduction by principal component analysis (PCA) was applied to the spatial-temporal traffic signals. Previously, a density-based local outlier factor (LOF) method was performed on a two-dimensional (2D) PCA-processed spatial plane. In this paper, a three-dimensional (3D) PCA-processed spatial space for classical density-based OD is first compared with the results from the 2D counterpart. In the experiments, classical density-based LOF OD was applied to the 3D PCA-processed data domain, which is new in the literature, and compared to the previous 2D domain. The average DSR increased by about 2% in the PM sessions: 91% (2D) versus 93% (3D). Also, comparing the classical density-based LOF and the new BLOF OD methods, the average DSR in the supervised approach increased from 94% (LOF) to 96% (BLOF) for the AM sessions and from 93% (LOF) to 95% (BLOF) for the PM sessions.
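The PCA-then-LOF pipeline can be sketched with scikit-learn. This uses synthetic stand-in data; the paper's bounded-LOF variant and the Hong Kong traffic-video features are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)

# 200 normal "days" of high-dimensional traffic signals (e.g. flattened
# speed/volume profiles) plus 5 clearly abnormal days.
normal = rng.normal(0.0, 1.0, (200, 48))
abnormal = rng.normal(6.0, 1.0, (5, 48))   # e.g. incident/congestion days
X = np.vstack([normal, abnormal])

# Dimension reduction to a 3D PCA space, then density-based LOF scoring.
Z = PCA(n_components=3).fit_transform(X)
pred = LocalOutlierFactor(n_neighbors=20).fit_predict(Z)  # -1 marks outliers

print(pred[-5:])  # the abnormal days are flagged as -1
```

Changing `n_components` from 3 to 2 reproduces the 2D-versus-3D comparison the paper makes.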
IRJET- An Improvised Multi Focus Image Fusion Algorithm through QuadtreeIRJET Journal
The document proposes a new quadtree-based algorithm for multi-focus image fusion. The algorithm divides the input images into 4 equal blocks using a quadtree structure. It then further divides each block into smaller blocks and detects the focused regions in each block using a focus measure and weighted values. The small blocks are then fused using a modified Laplacian mechanism. The fused image is evaluated using SSIM and ESSIM values, which indicate the proposed algorithm performs better fusion than previous methods.
This document provides a summary of Md. Ariful Islam's background and qualifications. He is a PhD candidate in Computer Science at Stony Brook University focused on modeling, simulation, and formal verification of complex software and dynamical systems. He has extensive skills in various modeling, simulation, optimization, and programming languages and tools.
IRJET- Fusion based Brain Tumor DetectionIRJET Journal
1. The document discusses a method for detecting brain tumors using medical image fusion and support vector machines (SVM).
2. It involves fusing two MRI images using SVM to create a single fused image with more information than the original images. Texture and wavelet features are then extracted from the fused image.
3. The SVM classifier classifies the brain tumors as benign or malignant based on the trained and tested features extracted from the fused image.
Graph fusion of finger multimodal biometricsAnu Antony
A graph fusion technique, i.e., a weighted graph structure model, is used to characterize the finger biometrics, and fusion frameworks are presented for the trimodal images of a finger.
This document provides an overview of salient object detection techniques, including both traditional and deep learning-based methods. It discusses early models of saliency detection based on cognitive theories of human visual attention. Global contrast and diffusion-based methods for salient object detection are described. The use of fully convolutional neural networks for deep learning-based salient object detection is also covered. Both qualitative and quantitative comparisons of detection techniques are presented. The document concludes by noting improvements in recent models from including edge and context information, but that detection remains challenging across a variety of difficult image scenarios.
This document presents a new layout algorithm for visualizing communities in clustered social networks that integrates both structural and profile information. The algorithm (1) calculates dissimilarity matrices using profile and structural data, (2) performs multidimensional scaling to reflect node proximity, and (3) defines an interaction zone between communities. Experiments on Facebook, DBLP, and protein networks show it can identify important boundary nodes and observe community interactions. Future work includes extending the model to include viewpoints and applying it to real applications like marketing analysis.
Influence Analysis of Image Feature Selection Techniques Over Deep Learning ModelIRJET Journal
This document discusses using different image feature selection techniques and their impact on deep learning models for image classification. It analyzes shape, color, texture, and combined features extracted from images using techniques like local binary patterns (LBP), grid color moments, and Sobel operators. A convolutional neural network (CNN) is used as the deep learning classifier. The performance is evaluated on a diabetic retinopathy detection dataset in terms of classification accuracy. The goal is to determine which feature selection techniques improve accuracy while minimizing computational resources when used with CNNs. A system is proposed that extracts individual features and combined features from images, then classifies them using CNNs to compare the impact of different feature selection approaches.
The document discusses using machine learning algorithms and supervised learning methods to develop an automated system for detecting nanoparticles and estimating their size and spatial distribution from scanning electron microscope images. The goal is to enable industrial-scale manufacturing of nanomaterials by applying quality control tools. Specifically, the research uses support vector machines and scale-invariant feature transform to extract features from images and classify pixels as nanorods or background in order to predict locations and dimensions of nanorods.
NS-CUK Seminar: V.T.Hoang, Review on "GOAT: A Global Transformer on Large-sca...ssuser4b1f48
This document presents GOAT, a scalable global transformer model for graph-structured data. GOAT uses a novel local attention module to absorb rich local information from node neighborhoods, in addition to a global attention mechanism that allows each node to attend to all other nodes. The document reports that GOAT achieves strong performance on large-scale homophilous and heterophilous node classification benchmarks, demonstrating its ability to leverage both local and global graph information for prediction tasks. Ablation studies on codebook size further indicate GOAT's effectiveness at modeling long-range interactions through its global attention.
NS-CUK Seminar: H.B.Kim, Review on "Cluster-GCN: An Efficient Algorithm for ...ssuser4b1f48
This document summarizes the Cluster-GCN method for training graph convolutional networks (GCNs) in a memory-efficient and scalable way. The key contributions of Cluster-GCN are that it achieves the best memory usage for training GCNs on large graphs, especially deep GCNs, while maintaining training speed comparable to or faster than existing methods. Experimental results demonstrate that Cluster-GCN can efficiently train very deep GCNs on large graphs and achieve state-of-the-art performance.
This document summarizes a research paper on Gated Graph Sequence Neural Networks (GGSNN). GGSNN is a model that incorporates time dependencies and higher-order relationships in graphs using GRU-based methods. It generates an output sequence to allow for graph-level analysis, and the model can be used for a wide range of tasks involving logical formulas. It uses GRUs to compute gradients via backpropagation through time, allowing it to capture long-term dependencies between output time steps. Node representations in GGSNN can be updated over time using label data, unlike previous graph neural networks.
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...ssuser4b1f48
1) The document proposes a deep learning framework called DeepLGF to predict drug-drug interactions by combining local and global feature extraction from biomedical knowledge graphs.
2) DeepLGF uses graph neural networks and knowledge graph embedding methods to extract local drug features from chemical structures and biological functions, and global features from the relationships between drugs and other biological entities.
3) Experimental results on prediction tasks using several drug interaction datasets demonstrate that DeepLGF outperforms other state-of-the-art models and has promising applications in drug development and clinical use.
NS-CUK Seminar: H.B.Kim, Review on "Inductive Representation Learning on Lar...ssuser4b1f48
1. The document summarizes the GraphSAGE framework for inductive node embedding proposed by Hamilton et al.
2. GraphSAGE leverages node features to learn an embedding function that generalizes to unseen nodes using a sample and aggregate approach.
3. Across citation, Reddit, and other datasets, GraphSAGE improves classification F1-scores by 51% on average compared to using node features alone and outperforms strong baselines.
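The sample-and-aggregate step can be illustrated with a single mean-aggregator layer in NumPy. This is a simplified sketch: the toy graph, features, and identity weights are invented for illustration, and real GraphSAGE additionally samples a fixed-size neighborhood, learns the weight matrices, and L2-normalizes the outputs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sage_layer(x, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: each node combines its own features with
    the mean of its neighbours' features."""
    h = np.zeros_like(x @ w_self.T)
    for v, nbrs in neighbors.items():
        agg = x[nbrs].mean(axis=0) if nbrs else np.zeros(x.shape[1])
        h[v] = relu(w_self @ x[v] + w_neigh @ agg)
    return h

# Toy graph: edges 0-1, 0-2, 1-2, 2-3 (adjacency lists).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])
w = np.eye(2)  # identity weights keep the example checkable by hand
h = sage_layer(x, neighbors, w, w)
```

Because the layer depends only on a node's features and its neighbor list, the same trained weights can embed nodes unseen during training, which is what makes the method inductive.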
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...ssuser4b1f48
This document proposes a new self-supervised learning framework called Relational Graph Representation Learning (RGRL). RGRL aims to learn node representations that preserve relationships between nodes even after augmentation. It does this by focusing training on low-degree nodes and using both global and local contexts to sample anchor nodes. Experiments on 14 real-world datasets show RGRL outperforms previous methods on tasks like node classification and link prediction.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help mitigate climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks: methods, applications and evaluations", Bioinformatics 2020
1. Hyo Eun Lee
Network Science Lab
Dept. of Biotechnology
The Catholic University of Korea
E-mail: gydnsml@gmail.com
2023.06.28
Bioinformatics 2020
2.
Introduction
• Motivation and tasks
• Biomedical graph
• Purpose
Method
• Graph embedding methods
• Application of Graph embedding on biomedical network
Result
• Dataset and experimental set-up
• Link prediction / Node classification results
• Influence of hyperparameters
Discussion and Conclusion
3.
Motivation
• Graph embedding is underutilized in biomedical networks
• Graph embedding in biomedical networks can help uncover potential discoveries
1. Introduction
Tasks: Biomedical Link Prediction, Node Classification
• Biomedical Link Prediction Tasks
DDA (Drug–Disease Association)
DDI (Drug–Drug Interaction)
PPI (Protein–Protein Interaction)
• Node Classification Tasks
Medical term semantic type
Protein function prediction
4.
1. Introduction
Biomedical graph
• Graph
nodes: biomedical entities
edges: relations
• Effects of graph analysis: DDA-based prediction of potential drug indications and clinical decision support; detecting lncRNA function
• Embedding method: automatically learn a low-dimensional feature representation
- Preserves the structural information of the graph
- Can be used for downstream tasks
5.
Purpose
1. Investigate the potential of recent advances in graph embedding
2. Link prediction serves 3 critical biomedical applications
3. Formalize the semantic classification of medical terms and classify them using embedding techniques
4. Suggest the proper embedding method and hyperparameter settings for each task
Fig. 1. Pipeline for applying graph embedding methods to biomedical tasks. Low-dimensional node representations are first learned from biomedical networks by graph embedding methods and then used as features to build specific classifiers for different tasks. For (a) matrix factorization-based methods, a data matrix (e.g. the adjacency matrix) is used as the input to learn embeddings through matrix factorization. For (b) random walk-based methods, sequences of nodes are first generated through random walks and then fed into the word2vec model (Mikolov et al., 2013) to learn node representations. For (c) neural network-based methods, architectures and inputs vary across models (see Section 2 for details).
1. Introduction
6.
2. Method
Graph embedding methods
• 11 embedding methods
Types: MF (5), Random Walk (3), Neural Network (3)
7.
2. Method
Graph embedding methods
• First-order proximity: based on direct connections between two objects (local)
• Second-order proximity: considers indirect connections between objects (global)
• High-order proximity: considers the neighbors of neighbors
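The proximity notions on this slide can be made concrete on a small adjacency matrix (a toy example, not taken from the paper):

```python
import numpy as np

# Adjacency matrix of a small undirected 4-node graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def first_order(i, j):
    """First-order proximity: the direct edge weight (local)."""
    return A[i, j]

def second_order(i, j):
    """Second-order proximity: similarity of neighbourhood vectors,
    i.e. how many neighbours two nodes share (a more global signal)."""
    a, b = A[i], A[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Nodes 0 and 3 share no edge, so their first-order proximity is 0; but both link to node 2, so their second-order proximity is non-zero, which is exactly the indirect signal second-order methods preserve.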
8.
Graph embedding methods
• (a) MF-based methods
- Factorize a data matrix into low-dimensional vectors
- Preserves the hidden manifold structure and topological properties
2. Method
HOPE GraRep
9.
Graph embedding methods
• (b) Random walk-based methods
- Generate node sequences to learn node representations
2. Method
DeepWalk node2vec
struc2vec
10.
Graph embedding methods
• (c) Neural network-based methods
- Different methods use different architectures and input information
2. Method
LINE SDNE GAE
11.
2. Method
Application of Graph embedding on biomedical network
• 3 biomedical link prediction tasks (DDA, DDI, PPI) and node classification tasks
Types: Link prediction (3), Node classification (2)
12.
• 1) Link prediction
- Predicting potential interactions between biomedical entities from known and unknown interactions
2. Method
Formalize
• Traditional methods: use biological feature structures, gene ontology, and graph properties
→ Problems: 1. biological features are difficult to apply and use; 2. fit of bio-features
⇒ Graph embedding methods are used to solve this problem
• Supervised or semi-supervised graph inference models are used to make predictions
Application of Graph embedding on biomedical network
13.
• 2) Node classification
- Protein function prediction, Medical terms classification
2. Method
Protein function prediction
• Real experiments are expensive, so graph-based methods were introduced
Medical terms classification
• Models that use the growth of clinical text to improve personalized care and aid judgment
• Medical terms (using UMLS data) and how to measure their co-occurrence to overcome privacy concerns
Application of Graph embedding on biomedical network
Fig. 2. Illustration of (a) how the medical term–term co-occurrence graph is constructed and (b) node type classification in the graph. Our work assumes that the graph is given as in Finlayson et al. (2014) and mainly focuses on (b), i.e. testing various embedding methods on the classification performance.
15.
3. Results
Datasets (7)
Link prediction: DDA (2), DDI (1), PPI (1)
• DDA
- Validated associations of chemicals and disease pathways in CTD
- Drug–disease relationships from NDF-RT in UMLS
• DDI
- Comprehensive data from DrugBank
• PPI
- Homo sapiens PPIs from STRING
Node classification: Term–Term Co-occurrence Graph (1), PPI (1)
• Co-occurrence graph
- Data from Stanford hospitals and clinics, refined using frequency-of-occurrence statistics
• PPI
- Using Mashup data and node2vec
Experimental set-up
Link prediction
• Known interactions (positive): 80% training / 20% testing
• Unknown interactions (the majority): negative sampling
• Evaluation: ROC curve (AUC), accuracy, F1 score
Node classification
• Embeddings are trained on the entire graph
• Nodes with label information: 80% training / 20% testing
• Evaluation: Micro/Macro F1 (%)
• Embedding dimension: 100
• Grid search to tune 1–2 critical hyperparameters
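The link-prediction protocol above can be sketched in plain Python: split known edges 80/20, sample negatives from the non-edges, and compute AUC from the scores. This is a hedged illustration with toy helper functions, not the paper's exact pipeline:

```python
import random

def split_edges(edges, train_frac=0.8, seed=0):
    """Shuffle the known (positive) edges and split them into train/test."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    k = int(len(edges) * train_frac)
    return edges[:k], edges[k:]

def sample_negatives(nodes, edges, n, seed=0):
    """Sample n node pairs that are not known edges (the 'unknown' majority)."""
    rng = random.Random(seed)
    known = {frozenset(e) for e in edges}
    negatives = []
    while len(negatives) < n:
        u, v = rng.sample(nodes, 2)
        if frozenset((u, v)) not in known:
            negatives.append((u, v))
            known.add(frozenset((u, v)))  # avoid duplicate negatives
    return negatives

def auc(pos_scores, neg_scores):
    """AUC = probability that a positive pair outscores a negative one (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

In practice the scores would come from a classifier over embedding-derived edge features, and a library routine such as scikit-learn's `roc_auc_score` would replace the hand-rolled `auc`.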
Link prediction results
Note: Due to the limited space, we only show the AUC value. Other evaluation metrics can be found in Supplementary Material. The best performing method in each category is in bold.
Fig. 3. (a) Comparison with the state-of-the-arts for drug-disease association prediction (LRSSL) (Liang et al., 2017); (b) drug–drug interaction prediction (DeepDDI) (Ryu et al., 2018) and (c) gene (protein)
function prediction (Mashup) (Cho et al., 2016). Same as Mashup, we evaluate their performance on three-level human Biological Process (BP) gene annotations (each containing GO terms with 101–300, 31–
100 and 11–30 genes, respectively). As can be seen, in each task, general graph embedding methods achieve competitive performance against them
Node classification results
Note: The best performing method in each category is in bold. a The source code of GAE provided by the authors does not support a large-scale graph (nodes>40k). We omit its performance on ‘Clini COOC’ here.
The influence of dimension
Fig. 4. The influence of dimensionality on the performance and training time of different embedding methods based on the 'CTD DDA' dataset
Influence of hyperparameters
• Embedding dimension affects prediction performance and time efficiency
- When dimensionality exceeds 100, performance saturates while time cost increases rapidly
4. Discussion and Conclusion
Discussion
• There is a need for a comprehensive evaluation of graph embedding methods on biomedical networks
• Future research:
- Exploring graph embedding methods for other biomedical challenges (such as gene expression analysis and disease diagnosis)
- Investigating the interpretability of graph embeddings and developing ways to incorporate domain knowledge into the embedding process
• Emphasized the importance of open-source tools and datasets, and the need to develop them
Conclusion
• Evaluated 11 graph embedding methods on 7 biomedical datasets
• Found that the embedding methods performed well, showing potential for future predictive work
• Provided guidance on hyperparameter settings and discussed directions for future work
Editor's Notes
Motivation: To date, graph embedding has been applied to social networks or simple bioinformatics networks, but not to biomedical networks with systematic experiments and analysis. Applying it to biomedical networks could therefore lead to potential discoveries.
Task: This paper applies 11 embedding methods to two broad tasks, biomedical link prediction and node classification. In detail, --.