This document describes a study that uses machine learning models to predict active compounds for lung cancer. Specifically:
1) A dataset of molecules was collected from the ChEMBL database and divided into active and non-active groups based on inhibition concentration values. Molecular descriptors were then calculated to encode the chemical structures.
2) Two machine learning models - a neural network and gradient boosting tree classifier - were trained on the molecular descriptors to predict compound activity. Feature selection was also performed to analyze important structural features.
3) The models accurately predicted active compounds for lung cancer based on quantitative structure-activity relationships. Comparative analysis identified important chemical structures contributing to compound effectiveness.
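The two-model setup described above can be sketched with scikit-learn. The descriptor matrix below is random stand-in data (the study used descriptors computed from ChEMBL molecules), and the model hyperparameters are illustrative assumptions, not the paper's settings.

```python
# Hypothetical sketch: a neural network and a gradient-boosted tree
# classifier trained on molecular descriptors, plus a feature-importance
# readout mirroring the feature-selection step. All data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))                  # 400 molecules x 20 descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy active / non-active label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                   random_state=0).fit(X_tr, y_tr)
gbt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print(f"NN accuracy:  {nn.score(X_te, y_te):.2f}")
print(f"GBT accuracy: {gbt.score(X_te, y_te):.2f}")

# The tree model's importances point to influential descriptors.
top = np.argsort(gbt.feature_importances_)[::-1][:3]
print("Top descriptors by importance:", top)
```

With a label constructed from descriptors 0 and 1, both models should recover the signal and the importance ranking should surface those descriptors first.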
Graphical Model and Clustering-Regression Based Methods for Causal Interaction... (ijaia)
The early detection of Breast Cancer, a deadly disease that mostly affects women, is extremely complex because it requires examining various features of the cell type. An efficient approach to diagnosing Breast Cancer at an early stage is therefore to apply artificial intelligence, in which machines are programmed to learn and find patterns that can later be used to detect new changes as they occur. Machine learning is particularly useful in the medical field, which depends on complex genomic measurements such as the microarray technique, and can increase the accuracy and precision of results. With this technology, doctors can diagnose cancer patients quickly and apply the proper treatment in a timely manner. The goal of this paper is therefore to propose a robust Breast Cancer diagnostic system based on genomic analysis via microarray technology. The system combines two machine learning methods: K-means clustering and linear regression.
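A minimal sketch of the K-means plus linear regression combination proposed above, assuming synthetic expression-like data in place of real microarray measurements; the cluster count and per-cluster regression design are illustrative assumptions, not the paper's pipeline.

```python
# Step 1: group samples into clusters of similar expression profiles.
# Step 2: fit a separate linear regression within each cluster.
# The data is synthetic and linear by construction.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))                     # 300 samples x 10 features
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=300)

labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

models = {k: LinearRegression().fit(X[labels == k], y[labels == k])
          for k in range(3)}
for k, m in models.items():
    print(f"cluster {k}: R^2 = {m.score(X[labels == k], y[labels == k]):.3f}")
```

Fitting per cluster lets each regression specialize to a subgroup, which is the usual motivation for pairing clustering with regression.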
B. Pharm 6th Semester, Drug Design: factors, QSAR, drug discovery, drug development, various approaches for drug design, partition coefficient, Hammett's equation, Taft's steric parameter, Hansch analysis
DSAGLSTM-DTA: Prediction of Drug-Target Affinity using Dual Self-Attention an... (mlaij)
Research on drug–target affinity (DTA) aims to effectively narrow the target search space for drug repurposing; reliable prediction of drug–target affinities can therefore minimize the waste of human and material resources. In this work, a novel graph-based model called DSAGLSTM-DTA was proposed for DTA prediction. Unlike previous graph-based drug–target affinity models, it incorporates self-attention mechanisms in the feature extraction process of drug molecular graphs to fully extract effective feature representations. The features of each atom in the 2D molecular graph were weighted by attention score before being aggregated into a molecule representation, and two distinct pooling architectures, centralized and distributed, were implemented and compared on benchmark datasets. In addition, when processing protein sequences, inspired by the protein feature extraction approach in GDGRU-DTA, we interpret protein sequences as time series and extract their features using Bidirectional Long Short-Term Memory (BiLSTM) networks, owing to the context dependence of long amino acid sequences. DSAGLSTM-DTA likewise applies a self-attention mechanism during protein feature extraction to obtain comprehensive protein representations: the final hidden states for each element in the batch were weighted with each unit output of the LSTM, and the result was taken as the final protein feature. Finally, the drug and protein representations were concatenated and fed into a prediction block. The proposed model was evaluated on regression and binary classification datasets, and the results demonstrated that DSAGLSTM-DTA is superior to several state-of-the-art DTA models and exhibits good generalization ability.
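The attention-weighted atom pooling described for the drug branch can be sketched as follows; the scoring vector `w` is a random stand-in for learned parameters, and the feature dimensions are arbitrary assumptions.

```python
# Each atom's feature vector is scored, scores are softmax-normalised,
# and the molecule representation is the weighted sum of atom features.
import numpy as np

def attention_pool(atom_feats: np.ndarray, w: np.ndarray) -> np.ndarray:
    scores = atom_feats @ w                        # one score per atom
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
    return alpha @ atom_feats                      # weighted molecule vector

rng = np.random.default_rng(2)
atoms = rng.normal(size=(12, 8))   # 12 atoms x 8 features per atom
w = rng.normal(size=8)
mol = attention_pool(atoms, w)
print(mol.shape)                   # a single fixed-size molecule embedding
```

The same mechanism applies regardless of atom count, which is what makes attention pooling convenient for variable-size molecular graphs.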
A radiology report serves as an intermediary between a radiologist and the referring clinician for suggesting appropriate treatment to patients, aimed at better healthcare management. It is essentially a tool that assists radiologists in conveying their findings, positive or negative, on a case to patients and clinicians. The objective of this paper is to discuss and propose a Radiology Information & Reporting System (RIRS), highlight challenges governing its implementation, and suggest ways forward towards its effective implementation across public-sector tertiary care institutions in Pakistan. It concludes that the proposed RIRS would potentially offer enormous benefits in terms of cost savings, reporting accuracy, faster processing, and operational efficiency compared with conventional manual radiology reporting procedures and systems.
Computer-Aided Drug Design (CADD) is one of the latest technologies in the world of medicine. This short slide deck offers a brief introduction to CADD; for more depth, please consult the reference book.
Bioinformatics role in Pharmaceutical industries (Muzna Kashaf)
Bioinformatics plays a key role in the pharmaceutical industry by enabling target identification of diseases, rational drug design, compound refinement, and other processes. It facilitates identifying target diseases and compounds, detecting molecular bases of diseases, designing drugs, refining compounds, and testing drug solubility and effects. Bioinformatics supports various stages of drug development including formulation, crystallization determination, polymer modeling, and testing before human use. Its integration into the pharmaceutical industry supports drug discovery, healthcare advances, and realizing the promises of projects like the Human Genome Project.
Data Mining and Big Data Analytics in Pharma (Ankur Khanna)
The document proposes software solutions for drug research, including text mining, data warehousing, data mining, database development, and big data analytics. It discusses common challenges in drug research like the high costs and low success rates. It then describes various solutions like text mining patents and research to help identify new research opportunities and reduce duplication of efforts. It provides examples of how various pharmaceutical companies use data mining and warehousing techniques. Overall, the document pitches different IT solutions that can help pharmaceutical and life sciences companies address their research challenges and make their processes more efficient.
1) AI systems like Adam and Eve have discovered new scientific knowledge by autonomously generating and testing hypotheses about yeast genes using public databases and laboratory experiments.
2) AI is being applied throughout the drug development process, including target identification, compound design and synthesis, clinical trial optimization, and drug repurposing.
3) Partnerships between pharmaceutical companies and AI firms are exploring applications like generating new immuno-oncology treatments, metabolic disease therapies, and cancer treatments through large-scale data analysis.
New Drug Design & Discovery discusses the process of drug discovery and design. It begins with an introduction to how drugs work in the human body to modulate functions. The drug discovery process is then described as a long, expensive endeavor involving chemical synthesis, clinical development, and formulation. Computer-aided drug design uses molecular modeling and structure-based approaches to predict ligand-receptor binding and identify biological targets in silico. Combinatorial chemistry and high-throughput screening allow for the rapid synthesis and testing of large libraries of compounds. The goal is to develop more potent and safer drugs through these computational and high-throughput methods.
This document provides a summary of a student assignment on drug design and toxicology. It discusses several topics:
1) It outlines the drug design process and different types of drug design approaches, including ligand-based and structure-based design.
2) It discusses the importance of toxicology testing in drug development to evaluate safety. Topics covered include emerging safety biomarkers, establishing human first-dose levels, pathway analysis, and genomic biomarker usage.
3) It explores various dynamic QSAR techniques and their applications in drug design and toxicology, as well as examining ADME and toxicology relationships.
Cheminformatics plays a key role in modern drug discovery by helping chemists organize and analyze the vast amounts of chemical data being produced. It combines fields like chemistry, biology, and informatics to transform data into knowledge. Specifically, cheminformatics aids in tasks like identifying drug targets, finding lead compounds, optimizing leads, and conducting pre-clinical trials through methods such as high-throughput screening, structure-activity modeling, and predictive toxicity analysis. It also provides tools for tasks like drawing and searching chemical structures in databases.
Computer aided drug design (CADD) uses computer modeling to help design and discover new drug molecules. It involves designing molecules that are complementary in shape and charge to bind to a biomolecular target like a protein. This can help drugs activate or inhibit the target to produce therapeutic effects. CADD is not a direct route to new drugs but provides information to guide and coordinate drug discovery experiments in a more efficient manner. It is hoped CADD can help save time and money in the drug development process.
Ranadip Pal and his team won a prize in 2012 for using multiple types of 'omics data to more successfully predict drug sensitivity than other teams. Pal then improved on this model by combining five types of data, reducing uncertainty by over 40%. Scientists are increasingly seeing the value of integrated omics analyses to better understand cellular processes. A company called General Metabolics offers integrated omics services to analyze multiple data types from cell samples together, providing insights that individual analyses cannot. This allows researchers to gain a clearer understanding of cellular physiology.
This document discusses the debate between randomized clinical trials (RCTs) and observational studies using big data. While RCTs are better for minimizing bias, observational studies can include more patients and answer questions RCTs cannot. The document outlines several large cancer databases that can help learn from every patient, including SEER and NCDB registries. It describes how these databases are being enriched with additional data sources like EHRs, genomic data, and mobile devices. This evolving use of big data from numerous sources can improve outcomes by better understanding toxicity, costs, and quality of cancer care.
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A... (CSCJournals)
Logistic Regression (LR) is a well-known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems, and it enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popular universal non-linear inference models that have gained extensive popularity in recent years; research activity is considerable and the literature is growing. The goal of this research work is to compare the performance of logistic regression and neural network models on publicly available medical datasets. Both methods were evaluated with sensitivity analysis for classification effectiveness, using classification accuracy to measure the performance of both models. The experimental results confirm that the neural network model with sensitivity analysis gives more accurate results.
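An illustrative comparison in the spirit of the paper, using scikit-learn's bundled breast-cancer dataset as the public medical dataset and classification accuracy as the metric; the paper's sensitivity-analysis step is not reproduced here, and the network size is an arbitrary choice.

```python
# Logistic regression vs. a small neural network on a public medical
# dataset, both scored by held-out classification accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standardize features; both models benefit, the MLP especially.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X_tr, y_tr)

print(f"Logistic regression accuracy: {lr.score(X_te, y_te):.3f}")
print(f"Neural network accuracy:      {nn.score(X_te, y_te):.3f}")
```

On this dataset both models typically score well above 90%, so any gap between them is small; larger or noisier datasets are where the comparison becomes interesting.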
The document discusses the applications of bioinformatics in drug discovery. It describes how bioinformatics supports computer-aided drug design through computational methods to simulate drug-receptor interactions. It also discusses how virtual high-throughput screening can identify compounds that strongly bind to protein targets. The document outlines the key steps in drug design, including identifying the disease target, studying lead compounds, rational drug design techniques, and testing drugs. It emphasizes that bioinformatics can predict important drug characteristics like absorption and toxicity to save costs during development.
Biotechnology And Chemical Weapons Control (guest971b1073)
Biotechnology has the potential to both aid medical research and fuel chemical weapons proliferation. Advances like genomics, microarrays, proteomics, and combinatorial chemistry could be used to rationally design new drugs, but also new chemical weapons. This threatens the Chemical Weapons Convention by enabling covert development of novel agents from unscheduled precursors. Additionally, development of non-lethal chemical weapons could provide cover for lethal programs and erode norms against chemical weapons use. The Review Conference should address these challenges to strengthen the Convention.
Drug-induced liver injury (DILI) is one of the most important reasons for drug development failure at both the pre-approval and post-approval stages. There has been increased interest in developing predictive in vivo, in vitro, and in silico models to identify compounds that cause idiosyncratic hepatotoxicity. In the current study we applied a machine learning Bayesian modeling method with extended connectivity fingerprints and other interpretable descriptors. The model, developed and internally validated using a training set of 295 compounds, was then applied to a test set that was large relative to the training set (237 compounds) for external validation. The resulting concordance of 60%, sensitivity of 56%, and specificity of 67% were comparable to internal validation. The Bayesian model with the ECFC_6 fingerprint and interpretable descriptors suggested several substructures that are chemically reactive and may also be important in DILI-causing compounds, e.g. ketones, diols, and α-methyl styrene-type structures. Using SMARTS filters published by several pharmaceutical companies, we evaluated whether such reactive substructures could be readily detected by any of the published filters. It was apparent that the most stringent filters used in this study, such as the Abbott alerts, which capture thiol traps and other compounds, may be useful in identifying DILI-causing compounds (sensitivity 67%). A significant outcome of the present study is that we provide predictions for many compounds that cause DILI by applying knowledge available from previous studies to computational approaches. These computational models may represent a cost-effective selection criterion prior to costly in vitro or in vivo experimental studies.
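A hedged sketch of a Bayesian classifier over binary structural fingerprints, in the spirit of the model above; the fingerprint bits and the toy label rule are synthetic stand-ins for ECFC_6 features and curated DILI annotations, not the study's data.

```python
# Bernoulli naive Bayes over binary fingerprint bits, reporting the
# sensitivity and specificity metrics quoted in the abstract.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(500, 64))        # 500 compounds x 64 fp bits
y = (X[:, :4].sum(axis=1) >= 2).astype(int)   # toy "DILI" label from 4 bits

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
pred = BernoulliNB().fit(X_tr, y_tr).predict(X_te)

tp = ((pred == 1) & (y_te == 1)).sum()
tn = ((pred == 0) & (y_te == 0)).sum()
sens = tp / (y_te == 1).sum()                 # sensitivity (true positive rate)
spec = tn / (y_te == 0).sum()                 # specificity (true negative rate)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Naive Bayes over binary fingerprints is a common baseline for this kind of substructure-driven activity model because the per-bit likelihood ratios are directly interpretable.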
Bioinformatics plays an important role in drug discovery and development by enabling target identification, rational drug design, compound refinement, and other processes. Key applications of bioinformatics include virtual screening of large compound libraries to identify potential drug leads, homology modeling of protein structures to inform drug design, and similarity searches to find analogs of existing drug molecules. The overall drug development process involves studying the disease, identifying drug targets, designing compounds, testing and refining candidates, and conducting clinical trials. Computational techniques expedite many steps but experimental validation is still needed.
Exploiting drug targets for Immuno-Oncology drug discovery (Vikram Rao)
IO innovators are increasingly pursuing CD molecules as targets for new therapies, for example in developing antibodies for immune modulation and cytotoxicity. This requires a deep understanding of each gene's role in cancer biology. Find out how we can help you understand the relationship between CD genes and checkpoint inhibitors, explore different approaches to patient-response biomarkers, and prioritise novel drug targets.
Research trends in different pharmaceutical areas: Natural product chemistry
Imtiaj Hossain Chowdhury
B’Pharm (Jahangirnagar University), M’Pharm (Jahangirnagar University)
Master’s in Public Health (American International University Bangladesh)
...drug delivery and formulation sciences in the most intelligent way. This should be attained to fulfil the ultimate goal for all scientists: to leave their experimental results over the years as footsteps for followers to walk on.
New regulations requiring toxicity data on chemicals, together with increasing efforts to predict the likelihood of failure earlier in the drug discovery process, are driving greater use of computational models of toxicity. Predicting human toxicity directly from a molecular structure is feasible: by using the experimental properties of known compounds as the basis of predictive models, it is possible to develop structure–activity relationships and resulting algorithms related to toxicity. Several examples have been published recently, including models for drug-induced liver injury (DILI), the pregnane X receptor, P450 3A4 time-dependent inhibition, and transporters associated with toxicities. The versatility and potential of such models in drug discovery may be illustrated by increased efficiency of molecular screening and a reduced number of animal studies. With more computational power available on increasingly smaller devices, as well as many collaborative initiatives to make data and toxicology models available, this may enable the development of mobile apps for predicting human toxicities, further increasing their utilization.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... (IJECEIAES)
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including a global accuracy of 99.286%, a class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of the proposed model. These findings underscore the model's competence in precise brain tumor localization and its potential to advance medical image analysis and healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, emphasizing false positives and resource efficiency.
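The IoU figures quoted above are computed per class as the intersection over the union of predicted and ground-truth masks; a minimal sketch on toy binary masks (the study's masks come from MRI scans):

```python
# Intersection-over-union for binary segmentation masks.
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    inter = np.logical_and(pred, truth).sum()   # pixels both mark as tumor
    union = np.logical_or(pred, truth).sum()    # pixels either marks as tumor
    return inter / union if union else 1.0      # empty-vs-empty counts as match

pred  = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
truth = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(f"IoU = {iou(pred, truth):.3f}")  # intersection=2, union=4 -> 0.500
```

Mean IoU averages this quantity over classes, while weighted IoU weights each class by its pixel count, which is why the two reported figures differ so much on imbalanced tumor masks.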
Embedded machine learning-based road conditions and driving behavior monitoring (IJECEIAES)
Car accident rates have increased in recent years, resulting in losses of human lives, property, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. It is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions, and it effectively detects potential risks to help mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Data collection covered three key road events (normal street and normal drive, speed bumps, and circular yellow speed bumps) and three aggressive driving actions (sudden start, sudden stop, and sudden entry). The gathered data is processed and analyzed using a machine learning system designed for devices with limited power and memory. The developed system achieved 91.9% accuracy, 93.6% precision, and 92% recall. The inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms, requiring 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
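The accuracy, precision, and recall figures reported for the embedded system follow directly from a confusion matrix; a small sketch with toy labels (not the system's actual data):

```python
# Accuracy, precision, and recall from binary predictions.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])   # ground-truth events
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 1])   # model predictions

tp = ((y_pred == 1) & (y_true == 1)).sum()    # true positives
fp = ((y_pred == 1) & (y_true == 0)).sum()    # false positives
fn = ((y_pred == 0) & (y_true == 1)).sum()    # false negatives

accuracy = (y_pred == y_true).mean()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f}")
```

High precision with slightly lower recall, as reported above, means the system rarely raises false alarms but may miss a small fraction of real events.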
More Related Content
Similar to Predicting active compounds for lung cancer based on quantitative structure-activity relationships
A radiology report serves as an intermediary between a radiologist and referring clinician for suggesting appropriate treatment to the patients, aimed at better healthcare management. It is essentially a tool that assists radiologists in conveying their input to the patients and clinicians regarding positive or negative findings on a case. The objective of this paper is to discuss and propose a Radiology Information & Reporting System (RIRS), highlight challenges governing its implementation, and suggest ways forward for its effective implementation across the public sector tertiary care institutions of Pakistan. It is concluded that the proposed RIRS would potentially offer enormous benefits in terms of cost savings, reporting accuracy, faster processing, and operational efficiency compared with conventional manual radiology reporting procedures and systems.
Computer Aided Drug Design (CADD) is one of the latest technologies in the world of medicine. This short slide deck provides a brief introduction to CADD; for a more thorough treatment, please consult the reference book.
Bioinformatics role in Pharmaceutical industries (Muzna Kashaf)
Bioinformatics plays a key role in the pharmaceutical industry by enabling target identification of diseases, rational drug design, compound refinement, and other processes. It facilitates identifying target diseases and compounds, detecting molecular bases of diseases, designing drugs, refining compounds, and testing drug solubility and effects. Bioinformatics supports various stages of drug development including formulation, crystallization determination, polymer modeling, and testing before human use. Its integration into the pharmaceutical industry supports drug discovery, healthcare advances, and realizing the promises of projects like the Human Genome Project.
Data Mining and Big Data Analytics in Pharma (Ankur Khanna)
The document proposes software solutions for drug research, including text mining, data warehousing, data mining, database development, and big data analytics. It discusses common challenges in drug research like the high costs and low success rates. It then describes various solutions like text mining patents and research to help identify new research opportunities and reduce duplication of efforts. It provides examples of how various pharmaceutical companies use data mining and warehousing techniques. Overall, the document pitches different IT solutions that can help pharmaceutical and life sciences companies address their research challenges and make their processes more efficient.
1) AI systems like Adam and Eve have discovered new scientific knowledge by autonomously generating and testing hypotheses about yeast genes using public databases and laboratory experiments.
2) AI is being applied throughout the drug development process, including target identification, compound design and synthesis, clinical trial optimization, and drug repurposing.
3) Partnerships between pharmaceutical companies and AI firms are exploring applications like generating new immuno-oncology treatments, metabolic disease therapies, and cancer treatments through large-scale data analysis.
New Drug Design & Discovery discusses the process of drug discovery and design. It begins with an introduction to how drugs work in the human body to modulate functions. The drug discovery process is then described as a long, expensive endeavor involving chemical synthesis, clinical development, and formulation. Computer-aided drug design uses molecular modeling and structure-based approaches to predict ligand-receptor binding and identify biological targets in silico. Combinatorial chemistry and high-throughput screening allow for the rapid synthesis and testing of large libraries of compounds. The goal is to develop more potent and safer drugs through these computational and high-throughput methods.
This document provides a summary of a student assignment on drug design and toxicology. It discusses several topics:
1) It outlines the drug design process and different types of drug design approaches, including ligand-based and structure-based design.
2) It discusses the importance of toxicology testing in drug development to evaluate safety. Topics covered include emerging safety biomarkers, establishing human first-dose levels, pathway analysis, and genomic biomarker usage.
3) It explores various dynamic QSAR techniques and their applications in drug design and toxicology, as well as examining ADME and toxicology relationships.
Cheminformatics plays a key role in modern drug discovery by helping chemists organize and analyze the vast amounts of chemical data being produced. It combines fields like chemistry, biology, and informatics to transform data into knowledge. Specifically, cheminformatics aids in tasks like identifying drug targets, finding lead compounds, optimizing leads, and conducting pre-clinical trials through methods such as high-throughput screening, structure-activity modeling, and predictive toxicity analysis. It also provides tools for tasks like drawing and searching chemical structures in databases.
Computer aided drug design (CADD) uses computer modeling to help design and discover new drug molecules. It involves designing molecules that are complementary in shape and charge to bind to a biomolecular target like a protein. This can help drugs activate or inhibit the target to produce therapeutic effects. CADD is not a direct route to new drugs but provides information to guide and coordinate drug discovery experiments in a more efficient manner. It is hoped CADD can help save time and money in the drug development process.
Ranadip Pal and his team won a prize in 2012 for using multiple types of 'omics data to more successfully predict drug sensitivity than other teams. Pal then improved on this model by combining five types of data, reducing uncertainty by over 40%. Scientists are increasingly seeing the value of integrated omics analyses to better understand cellular processes. A company called General Metabolics offers integrated omics services to analyze multiple data types from cell samples together, providing insights that individual analyses cannot. This allows researchers to gain a clearer understanding of cellular physiology.
This document discusses the debate between randomized clinical trials (RCTs) and observational studies using big data. While RCTs are better for minimizing bias, observational studies can include more patients and answer questions RCTs cannot. The document outlines several large cancer databases that can help learn from every patient, including SEER and NCDB registries. It describes how these databases are being enriched with additional data sources like EHRs, genomic data, and mobile devices. This evolving use of big data from numerous sources can improve outcomes by better understanding toxicity, costs, and quality of cancer care.
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A... (CSCJournals)
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years; research activity is considerable and the literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. Both methods are evaluated with sensitivity analysis for classification effectiveness, using classification accuracy as the performance measure for both models. The experimental results confirm that the neural network model with sensitivity analysis gives the more accurate results.
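The comparison described above can be sketched in miniature. The following is a toy illustration, not the paper's actual pipeline: a from-scratch logistic regression trained on synthetic two-feature "medical" data, with a simple perturbation-based sensitivity measure per feature. All data and parameter choices are invented for the example.

```python
import math, random

random.seed(0)

# Synthetic binary-outcome dataset: two features, class 1 at higher values.
data = [([random.gauss(1.0, 0.4), random.gauss(1.0, 0.4)], 1) for _ in range(50)] + \
       [([random.gauss(-1.0, 0.4), random.gauss(-1.0, 0.4)], 0) for _ in range(50)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, lr=0.1, epochs=200):
    """Stochastic gradient descent on the log-loss of a 2-feature LR model."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w[0]*x[0] + w[1]*x[1] + b) - y  # dLoss/dLogit
            w = [w[i] - lr * err * x[i] for i in range(2)]
            b -= lr * err
    return w, b

def accuracy(w, b, data):
    return sum((sigmoid(w[0]*x[0] + w[1]*x[1] + b) > 0.5) == (y == 1)
               for x, y in data) / len(data)

def sensitivity(w, b, data, i, eps=0.1):
    """Mean absolute change in predicted probability when feature i shifts by eps."""
    return sum(abs(sigmoid(w[0]*(x[0] + (eps if i == 0 else 0)) +
                           w[1]*(x[1] + (eps if i == 1 else 0)) + b) -
                   sigmoid(w[0]*x[0] + w[1]*x[1] + b))
               for x, _ in data) / len(data)

w, b = train_logistic(data)
acc = accuracy(w, b, data)
sens = [sensitivity(w, b, data, i) for i in range(2)]
```

A neural network comparison would replace `train_logistic` with a small hidden-layer model trained the same way; the sensitivity probe stays identical, which is what makes the two models comparable.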
The document discusses the applications of bioinformatics in drug discovery. It describes how bioinformatics supports computer-aided drug design through computational methods to simulate drug-receptor interactions. It also discusses how virtual high-throughput screening can identify compounds that strongly bind to protein targets. The document outlines the key steps in drug design, including identifying the disease target, studying lead compounds, rational drug design techniques, and testing drugs. It emphasizes that bioinformatics can predict important drug characteristics like absorption and toxicity to save costs during development.
Biotechnology And Chemical Weapons Control (guest971b1073)
Biotechnology has the potential to both aid medical research and fuel chemical weapons proliferation. Advances like genomics, microarrays, proteomics, and combinatorial chemistry could be used to rationally design new drugs, but also new chemical weapons. This threatens the Chemical Weapons Convention by enabling covert development of novel agents from unscheduled precursors. Additionally, development of non-lethal chemical weapons could provide cover for lethal programs and erode norms against chemical weapons use. The Review Conference should address these challenges to strengthen the Convention.
Drug-induced liver injury (DILI) is one of the most important reasons for drug development failure at both pre-approval and post-approval stages. There has been increased interest in developing predictive in vivo, in vitro and in silico models to identify compounds that cause idiosyncratic hepatotoxicity. In the current study we applied machine learning, Bayesian modeling method with extended connectivity fingerprints and other interpretable descriptors. The model that was developed and internally validated (using a training set of 295 compounds) was then applied to a large test set relative to the training set (237 compounds) for external validation. The resulting concordance of 60%, sensitivity of 56%, and specificity of 67% were comparable to internal validation. The Bayesian model with ECFC_6 fingerprint and interpretable descriptors suggested several substructures that are chemically reactive and may also be important for DILI-causing compounds, e.g. ketones, diols and -methyl styrene type structures. Using SMARTS filters published by several pharmaceutical companies we evaluated whether such reactive substructures could be readily detected by any of the published filters. It was apparent that the most stringent filters used in this study, like the Abbott alerts which captures thiol traps and other compounds, may be of utility in identifying DILI-causing compounds (sensitivity 67%). A significant outcome of the present study is that we provide predictions for many compounds that cause DILI by using the knowledge we have available from previous studies for computational approaches. These computational models may represent a cost effective selection criteria prior to costly in vitro or in vivo experimental studies.
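The Bayesian classification step described above can be sketched without cheminformatics libraries. Below is a minimal Laplace-smoothed Bernoulli naive Bayes over hand-made binary "fingerprints"; the bit assignments (ketone, diol, styrene-like) and training labels are invented for illustration and are not the study's ECFC_6 data.

```python
import math

# Toy binary fingerprints: bit 0 = ketone, bit 1 = diol, bit 2 = styrene-like
# (hypothetical substructure flags); label 1 = DILI-positive.
train = [([1, 1, 0], 1), ([1, 0, 1], 1), ([0, 1, 1], 1),
         ([0, 0, 0], 0), ([1, 0, 0], 0), ([0, 0, 0], 0)]

def fit_bernoulli_nb(train, n_bits=3, alpha=1.0):
    """Estimate Laplace-smoothed per-bit probabilities and class priors."""
    counts = {0: [0] * n_bits, 1: [0] * n_bits}
    totals = {0: 0, 1: 0}
    for fp, y in train:
        totals[y] += 1
        for i, bit in enumerate(fp):
            counts[y][i] += bit
    prior = {c: totals[c] / len(train) for c in (0, 1)}
    theta = {c: [(counts[c][i] + alpha) / (totals[c] + 2 * alpha)
                 for i in range(n_bits)] for c in (0, 1)}
    return prior, theta

def predict(fp, prior, theta):
    """Return P(DILI = 1 | fingerprint) via Bayes' rule in log space."""
    logp = {c: math.log(prior[c]) + sum(
                math.log(theta[c][i] if bit else 1 - theta[c][i])
                for i, bit in enumerate(fp))
            for c in (0, 1)}
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return math.exp(logp[1] - m) / z

prior, theta = fit_bernoulli_nb(train)
risky = predict([1, 1, 1], prior, theta)   # all flagged substructures present
clean = predict([0, 0, 0], prior, theta)   # none present
```

In a real workflow the bit vectors would come from a fingerprint generator (e.g. circular fingerprints over a compound library), and the learned per-bit probabilities are what point back to reactive substructures.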
Bioinformatics plays an important role in drug discovery and development by enabling target identification, rational drug design, compound refinement, and other processes. Key applications of bioinformatics include virtual screening of large compound libraries to identify potential drug leads, homology modeling of protein structures to inform drug design, and similarity searches to find analogs of existing drug molecules. The overall drug development process involves studying the disease, identifying drug targets, designing compounds, testing and refining candidates, and conducting clinical trials. Computational techniques expedite many steps but experimental validation is still needed.
Exploiting drug targets for Immuno-Oncology drug discovery (Vikram Rao)
IO innovators are increasingly pursuing CD molecules as targets for new therapies, for example in developing antibodies for immune modulation and cytotoxicity. This requires a deep understanding of each gene's role in cancer biology. Find out how we can help you understand the relationship between CD genes and checkpoint inhibitors, explore different approaches to patient response biomarkers, and prioritise novel drug targets.
Research trends in different pharmaceutical areas: Natural product chemistry
Imtiaj Hossain Chowdhury
B’Pharm (Jahangirnagar University), M’Pharm (Jahangirnagar University)
Master’s in Public Health (American International University Bangladesh)
drug delivery and formulation sciences in the most intelligent way. This should be attained to fulfil the ultimate goal for all scientists: to leave their experimental results over the years as footsteps for followers to walk on.
New regulations requiring toxicity data on chemicals, together with growing efforts to predict the likelihood of failure earlier in the drug discovery process, are increasing the use of computational toxicity models. Predicting human toxicity directly from a molecular structure is feasible: by using the experimental properties of known compounds as the basis of predictive models, it is possible to develop structure-activity relationships and resulting algorithms related to toxicity. Several examples have been published recently, including those for drug-induced liver injury (DILI), the pregnane X receptor, P450 3A4 time-dependent inhibition, and transporters associated with toxicities. The versatility and potential of such models in drug discovery may be illustrated by increasing the efficiency of molecular screening and decreasing the number of animal studies. With more computational power available on increasingly smaller devices, as well as many collaborative initiatives to make data and toxicology models available, this may enable the development of mobile apps for predicting human toxicities, further increasing their utilization.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... (IJECEIAES)
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoring (IJECEIAES)
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... (IJECEIAES)
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. First, a doubly fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC), and second order sliding mode controller (SOSMC). Their results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... (IJECEIAES)
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
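The mismatch-minimization idea behind the neural optimizer above can be sketched simply. The paper trains a neural superstructure by gradient back-propagation; the stand-in below instead uses plain finite-difference gradient descent on the three PID gains over a hypothetical first-order plant, purely to show how gains can be tuned by minimizing tracking error. The plant model, step sizes, and initial gains are all invented for the example.

```python
def simulate(kp, ki, kd, setpoint=1.0, steps=200, dt=0.05):
    """Run a discrete PID loop on a toy first-order plant dy/dt = -y + u;
    return the integrated squared tracking error (the cost to minimize)."""
    y, integral, prev_err, cost = 0.0, 0.0, setpoint, 0.0
    for _ in range(steps):
        err = setpoint - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv   # PID control law
        y += (-y + u) * dt                          # Euler step of the plant
        prev_err = err
        cost += err * err * dt
    return cost

def tune(gains, lr=0.05, eps=1e-3, iters=100):
    """Finite-difference gradient descent on (kp, ki, kd); keep the best seen."""
    g = list(gains)
    best_cost, best_g = simulate(*g), list(g)
    for _ in range(iters):
        base = simulate(*g)
        grad = [(simulate(*[g[j] + (eps if j == i else 0.0) for j in range(3)])
                 - base) / eps for i in range(3)]
        g = [max(0.0, g[i] - lr * grad[i]) for i in range(3)]  # gains stay >= 0
        c = simulate(*g)
        if c < best_cost:
            best_cost, best_g = c, list(g)
    return best_g

initial = (0.5, 0.1, 0.0)
tuned = tune(initial)
```

A neural optimizer generalizes this: instead of one fixed gain vector, a network maps the plant state to gain adjustments, trained on the same mismatch signal.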
An improved modulation technique suitable for a three level flying capacitor ... (IJECEIAES)
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zone (IJECEIAES)
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It examines features such as sea surface temperature (SST) and sea surface height (SSH), together with the classification methods applied to such data. The study underscores the importance of examining potential fishing zones using advanced analytical techniques and thoroughly explores the methodologies employed by researchers, covering both past and current approaches, with a focus on data characteristics and the application of classification algorithms for delineating potential fishing zones. The prediction of potential fishing zones relies significantly on the effectiveness of these algorithms. Previous research has assessed the performance of models such as support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN); in one reported result, SVM classified fisheries test data with 97.6% accuracy versus 94.2% for naïve Bayes. Drawing on recent work in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f... (IJECEIAES)
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make small devices that boost productivity, with the goal of optimizing device density. Scientists are reducing interconnect delays to improve circuit performance, which led to three-dimensional integrated circuit (3D IC) concepts that stack active devices and create vertical connections to diminish latency and shorten interconnects. Electrical coupling is a major concern with 3D integrated circuits. Researchers have developed and tested through-silicon vias (TSVs) and substrates to decrease this coupling. This study illustrates a novel noise-coupling reduction method using several coupling models. A 22% drop in coupling from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... (IJECEIAES)
Climate change's impact on the planet has forced the United Nations and governments to promote green energies and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems has gained stronger momentum due to their numerous advantages over fossil fuel types; the advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. The paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present, and presents the proposed design diagram, which sets the priorities and requirements of the system. The proposed approach allows setups to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy milk farm support the theoretical work and highlight its benefits to existing plants. The short return on investment of the proposed approach supports the paper's novel contribution toward a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch... (IJECEIAES)
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. This study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... (IJECEIAES)
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo... (IJECEIAES)
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
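The NARX idea above — predicting the next output from lagged outputs and lagged exogenous inputs through a nonlinear regressor — can be sketched in a few lines. Everything below is invented for illustration: the toy "battery-like" dynamics, the single-lag structure, and the quadratic input term standing in for the nonlinearity; real work would use measured current/voltage data and a richer model.

```python
import random

random.seed(1)

def true_system(y_prev, u_prev):
    # Hypothetical nonlinear response (illustrative stand-in for a battery).
    return 0.8 * y_prev + 0.5 * u_prev - 0.2 * u_prev ** 2

# Excite the system with a random input signal and record its output.
u = [random.uniform(-1, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(true_system(y[t - 1], u[t - 1]))

def features(y_prev, u_prev):
    """NARX regressor vector: bias, lagged output, lagged input, nonlinear term."""
    return [1.0, y_prev, u_prev, u_prev ** 2]

# Least-squares fit via the normal equations (X^T X) w = X^T target.
X = [features(y[t - 1], u[t - 1]) for t in range(1, 300)]
target = [y[t] for t in range(1, 300)]
n = 4
A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
bvec = [sum(X[k][i] * target[k] for k in range(len(X))) for i in range(n)]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    A = [row[:] for row in A]; b = b[:]
    m = len(b)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]; b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, m))) / A[r][r]
    return x

w = solve(A, bvec)   # recovers the generating coefficients on noise-free data
```

With noise-free data the fit recovers the true coefficients, which is the basic sanity check before applying the same structure to experimental measurements.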
Smart grid deployment: from a bibliometric analysis to a survey (IJECEIAES)
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... (IJECEIAES)
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). When all criteria are compared together, the multi-criteria decision-making analysis yields overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods. Thus, it can be seen from the total weights
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
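The way AHP turns pairwise judgments into percentage weights like those quoted above can be sketched: build a pairwise-comparison matrix, approximate its principal eigenvector, and normalize. The 3x3 matrix below compares three hypothetical detection families and is invented for the example, not taken from the paper.

```python
def ahp_weights(M, iters=100):
    """Approximate the principal eigenvector of a pairwise-comparison matrix
    by power iteration, normalized so the priority weights sum to 1."""
    n = len(M)
    w = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]
    return w

# Hypothetical judgments over three families (signal-processing, passive,
# hybrid): M[i][j] > 1 means criterion i is judged more important than j,
# and reciprocity requires M[j][i] = 1 / M[i][j].
M = [[1.0, 2.0, 5.0],
     [0.5, 1.0, 4.0],
     [0.2, 0.25, 1.0]]
weights = ahp_weights(M)
```

A full AHP study would also compute the consistency ratio of each matrix before trusting its weights; the ranking here simply reflects the row dominance of the judgments.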
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi... (IJECEIAES)
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 5, October 2023, pp. 5755~5763
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i5.pp5755-5763
Journal homepage: http://ijece.iaescore.com
Predicting active compounds for lung cancer based on
quantitative structure-activity relationships
Hamza Hanafi 1, Badr Dine Rossi Hassani 2, M’hamed Aït Kbir 1
1 Intelligent Automation Team, STI Doctoral Center, Abdelmalek Essaadi University, Tangier, Morocco
2 LABIPHABE Laboratory, STI Doctoral Center, Abdelmalek Essaadi University, Tangier, Morocco
Article Info

Article history:
Received Jan 18, 2023
Revised Apr 7, 2023
Accepted Apr 15, 2023

ABSTRACT
Recently, advancements in computational and artificial intelligence (AI) methods have contributed to improving research results in the field of drug discovery. In fact, machine learning techniques have proven especially effective in this regard, aiding in the development of new drug variants and enabling more precise targeting of specific disease mechanisms. In this paper, we propose a quantitative structure-activity relationship-based approach for predicting active compounds related to non-small cell lung cancer. Our approach uses a neural network classifier that learns from the sequential structures and chemical properties of molecules, as well as a gradient boosting tree classifier for comparative analysis. To evaluate the contribution of each feature, we employ Shapley additive explanations (SHAP) summary plots to perform feature selection. Our approach involves a dataset of active and non-active molecules collected from the ChEMBL database. Our results show the effectiveness of the proposed approach in accurately predicting active compounds for lung cancer. Furthermore, our comparative analysis reveals important chemical structures that contribute to the effectiveness of the compounds. Thus, the proposed approach can greatly enhance the drug discovery pipeline and may lead to the development of new and effective treatments for lung cancer.
Keywords:
Bioinformatics
Diagnosis of cancer
Drug discovery
Drug target
Lung cancer
This is an open access article under the CC BY-SA license.
Corresponding Author:
Hamza Hanafi
Intelligent Automation Team, STI Doctoral Studies Center, Abdelmalek Essaadi University
Tangier, Morocco
Email: hamzahanafi1@gmail.com/hamza.hanafi@etu.uae.ac.ma
1. INTRODUCTION
Research works conducted in the field of drug discovery are important and contribute to the
improvement of healthcare quality. Developing a new drug is a long and complex process that relies on
the translation of a new molecular target into a proven therapy with efficient results. Drug discovery is one of the most challenging scientific tasks. Advances in computational biology have broadly improved drug
discovery pipelines. Classical methods directed towards this goal are time-consuming and expensive [1].
Therapeutic studies are crucial for designing new drugs for the benefit of patients, as well as for public health
reasons [2].
Nowadays, computational techniques have expanded their focus and greatly improved pipelines in the
field of pharmacological medicine, as they have demonstrated successful results compared to traditional
methods. Moreover, the remarkable amount of biological data publicly available and carefully stored in
repositories has enabled researchers to explore numerous computational-based methodologies. Predictive
modeling is one of the most widely applied techniques to enhance drug discovery pipelines. Machine learning
(ML) techniques can be utilized to construct models that effectively classify drugs into relevant therapeutic
categories and accurately detect and classify various stages of tumors [3], [4]. Additionally, ML methods can
be used to design new drugs based on the chemical properties of studied molecules [5].
Quantitative structure-activity relationship (QSAR) methods are techniques that apply ML in order to
learn from the relationships between the chemical structure and the biological activity [6] of molecules; additionally, they have helped to establish an empirical statistical model for the computational chemistry toolkit [7]. The chemical structure of molecules is used to calculate molecular descriptors, which essentially describe the physical and chemical properties that distinguish one molecule from another. QSAR-based models can provide insight into which chemical properties are important for inhibiting a biological process. Such information is of great interest to biologists and chemists when designing future molecules with more robust properties.
Bioinformatic methods have successfully enabled researchers to study molecules from a system-level perspective. Bioinformatics uses computational processes to integrate knowledge and expertise from genomics, proteomics,
transcriptomics, population genetics, and molecular phylogenetics. Bioinformatic analysis has enhanced drug
target identification and drug candidate screening. Moreover, it facilitates predictions of drug resistance,
minimized side effects, and has become more essential in drug discovery [8]. Thus, numerous ML-based
algorithms have been proposed to predict interactions among biological entities, as well as to design new drugs
with similar properties for specific medical treatments. One of the main challenges in building an efficient ML classifier is the absence of good-quality data. In fact, the available biological data is heterogeneous and requires a preprocessing step before initiating the training process of the ML models. Moreover, in cancer classification problems, most of the available datasets are imbalanced, as there are far more non-active molecules than active ones [9].
Our contribution is to build a classifier able to predict active compounds for lung cancer. We inferred
active and non-active molecules from the ChEMBL database to constitute our dataset. We computed fingerprint descriptors of the collected molecules to learn from. We took full advantage of the chemical characteristics and the structure of the molecules to build a sequential neural network model. Furthermore, we conducted a comparative study between the multilayer perceptron neural network and the gradient boosting tree classifiers to analyze feature contributions for each model, in order to identify important chemical structures of active
molecules for lung cancer.
This paper is part of a series of research carried out by a team from our laboratory interested in exploring biological data using data mining and machine learning tools [10]–[12]. The paper is organized as follows: section 2 presents some related works; section 3 describes our approach; section 4 discusses the obtained results; and the final section concludes.
2. RELATED WORK
Experiments to identify new therapeutic targets aim at investigating novel molecules and improving the bioavailability of drugs. Traditional methods applied in drug discovery rely on the physical and
chemical structure of the studied molecules. Genome-wide association studies (GWAS) screen a large number
of genomes to identify associations between genetic variants and non-disease traits [13]. This approach has
been widely used to identify single nucleotide polymorphisms (SNPs) associated with diseases and greatly
improved our understanding of biological processes [14]. Identification of drug target sites is another
methodology that many studies rely on. It refers to the discovery of interactions among diverse compounds and
protein targets in the human body. Lee et al. [15] experimentally demonstrated that the duration of in vivo
drug-target binding is highly affected by the drug-target resistance. In another study on targeted therapies for
lung cancer predictions, Larsen et al. [16] suggested integrating genome-wide tumor analysis along with
drug-targeted responsive phenotypes to investigate new therapeutic strategies. This approach requires further
knowledge on the binding sites. Moreover, it involves prior knowledge of related pathways to develop effective
targeted therapeutics.
Structure-based approaches have significantly enhanced virtual screening, de novo design, and lead
optimization [17], [18], based on the availability of ligand structures. On this subject, Almeida et al. [19] used
multiple ligand-based virtual screening approaches to investigate novel potential MARK-3 inhibitors in cancer.
Similarly, Li et al. [20] suggested a nanoparticle-mediated targeted drug as a novel therapeutic for
hepatocellular carcinoma using ligands that recognize hepatoma cells. The main disadvantage of this approach
is that it cannot be used in situations where ligands are unknown. Similarity-based methods have also been
used to design novel compounds. QSAR is a methodology that suggests structurally similar compounds tend
to possess similar biological activities [21]. Numerous studies based on this approach calculate a similarity
score among drug profiles to discover potential drug-drug interactions. Vilar and Hripcsak [22] used several
drug profiles to compute a similarity score between multiple compounds. Correspondingly, Ferdousi et al. [23]
compared diverse molecular profiles and found that the structural profile is the most optimal metric to predict
drug-drug interactions. The major disadvantage of this method is the choice of a suitable threshold for the computed similarity, which is highly affected by the quality of the dataset used and by false-positive interactions. Likewise, classical methods used in drug discovery are time-consuming and less accurate because of the number of reported adverse drug reactions (ADRs).
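The similarity scoring at the heart of these methods can be sketched with a Tanimoto coefficient over binary fingerprints. This is a standard illustrative choice, not necessarily the exact metric used in the cited studies, and the two 6-bit fingerprints below are hypothetical:

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    common = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # bits set in both
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)    # bits set in either
    return common / union if union else 0.0

# Two hypothetical 6-bit fingerprints: 2 bits in common, 4 bits set in either.
fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]
score = tanimoto(fp_a, fp_b)
print(score)  # 0.5
```

A compound pair would then be flagged as a potential interaction when this score exceeds a chosen threshold, which is exactly the threshold-selection problem noted above.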
Computational methods have significantly changed the way novel drugs are designed. Drug discovery pipelines have been largely enhanced, improving our understanding of biological processes. Biological networks are a great way to represent chemical interactions, as they have helped to integrate and model diverse heterogeneous biological data. Hanafi et al. [12] proposed a network-based method combined with an ML algorithm to classify and predict interactions between genes, drugs, and diseases.
Similarly, they were able to rank the top 20 gene-drug pairs related to lung cancer. Topological data analysis
has recently been used to study large-scale biological data. Hanafi et al. [24] built a biological network using data integration methods and explored numerous graph properties to evaluate potential gene-disease interactions. In a study about drug repositioning for non-small cell lung cancer (NSCLC), Huang et al. [25] combined topological parameter-based classification and ML algorithms to explore potential therapeutic drugs for NSCLC; they successfully suggested promising drugs for treating early- and late-stage lung cancer that were supported by the literature and appeared highly effective in clinical trials and in vitro. Similarly, in a study on the identification of small potent molecule inhibitors targeting Src kinase as a therapeutic strategy for lung cancer, Weng et al. [26] constructed a computational model for the in silico screening of Src inhibitors and evaluated the effect of potential candidate compounds based on a QSAR model. The obtained results were promising, as the candidate compounds revealed a significant inhibitory effect against Src activity.
In this paper, we present a computational QSAR-based model, combined with a tree-based classifier
and a neural network model, to predict novel targeted compounds in lung cancer. We created a dataset of
compounds related to NSCLC from the ChEMBL database. We computed molecular descriptors of the
molecules as an 881-bit array, which we used as input features for the learning tasks. Furthermore, to evaluate
our models, we conducted a feature engineering step and compared feature contributions using the SHAP
values method.
3. OUR APPROACH
Our study follows a meticulous approach to proposing active compounds for lung cancer. The
overall methodology is described in Figure 1. We started by collecting bioactive compounds related to
non-small cell lung cancer from the ChEMBL database to construct our dataset. Afterwards, we clustered the
compounds into two groups: highly active and non-active drugs, based on their inhibition concentration value
at 50%, denoted as IC50. The lower the IC50, the more likely the drug is effective in inhibiting NSCLC. Then,
we computed molecular descriptors and initiated two learning tasks to build our models and learn from the
chemical characteristics of the calculated molecular descriptors.
Figure 1. Overall approach followed to predict compound activity for non-small cell lung cancer
3.1. Dataset construction
We constructed our dataset from the ChEMBL database; it is a discovery platform that covers
drug-like compounds [27], [28]. It offers a large variety of data related to drugs and provides insights, tools,
and resources for drug discovery. Researchers use ChEMBL to make associations between diseases and their
relevant targets. It also helps identify small molecules that can be used to target newly sequenced genomes.
ChEMBL integrates its content primarily from the scientific literature, making it a great and accurate tool for
in silico drug design. Each collected compound carries two features used to characterize its binding activity: the IC50 value and the derived activity label. Table 1 shows some samples from our dataset.
The IC50 measures the potency of a molecule in inhibiting a biological process by 50%. It indicates
how much of a substance is needed to inhibit half of a given process. Consequently, drugs with lower IC50
values are highly active and have a value of 1 in the column “Activity in NSCLC”. Conversely, drugs with
higher IC50 values are less active and have a value of 0 in the column “Activity in NSCLC”.
Table 1. Some samples from the used dataset
Compound identifier in ChEMBL IC50 value (nM) Activity in NSCLC
CHEMBL133389 250 1
CHEMBL336075 10,000 0
CHEMBL130758 750 1
CHEMBL372987 7,000 0
CHEMBL201490 18 1
CHEMBL131038 10,000 0
CHEMBL185 8,400 0
CHEMBL159 5 1
CHEMBL11359 6,700 0
CHEMBL132876 40,300 0
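The labeling rule described above can be sketched as follows. The 1,000 nM cutoff is a hypothetical assumption (the paper does not state its exact threshold), chosen only so that the sketch is consistent with the labels shown in Table 1:

```python
# Hypothetical IC50 cutoff in nM; the paper does not state its exact threshold.
ACTIVE_THRESHOLD_NM = 1_000

def label_activity(ic50_nm):
    """Lower IC50 -> stronger inhibition -> labeled active (1), else non-active (0)."""
    return 1 if ic50_nm < ACTIVE_THRESHOLD_NM else 0

# Sample compounds from Table 1: (ChEMBL identifier, IC50 in nM).
samples = [("CHEMBL133389", 250), ("CHEMBL336075", 10_000), ("CHEMBL201490", 18)]
labels = {cid: label_activity(ic50) for cid, ic50 in samples}
print(labels)  # {'CHEMBL133389': 1, 'CHEMBL336075': 0, 'CHEMBL201490': 1}
```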
3.2. Molecular descriptors
Molecular descriptors can be defined as a way to encode the chemical structure of molecules into
numbers, typically represented as an array of bits. Each numerical value denotes the presence or absence of a
certain pattern, such as a hydrogen bond, atom, or fragment. They are used to explore physicochemical and
topological properties to establish the basis for in silico predictive QSAR-based models and are also useful in
performing similarity searches in molecular libraries.
We used the PaDEL-descriptor [29] software to calculate the molecular descriptors of our collected
molecules. The software was developed by the National University of Singapore with the aid of the Chemistry
Development Kit. It uses the simplified molecular input line entry system (SMILES) to compute hundreds of
molecular descriptors and fingerprints. SMILES is the simplest way to represent a molecule based on a line
notation [30]. It is a way to encode a chemical structure using notations that can be read and understood by a
computer. The ChEMBL database provides the SMILES notation for the collected compounds, and Table 2
shows some of our collected molecules with their corresponding SMILES notation.
PubChem is a chemistry database that covers substances, compounds, and bio-assays. It defines a
binary substructure fingerprint for chemical structures. Each compound’s SMILES in our dataset was encoded
into an array of 881 bits consisting of physicochemical properties defined by PubChem. Table 3 shows a
summary description of the bits used by PubChem descriptors.
Table 2. Some samples of collected compounds and their SMILES notation
Compound identifier in ChEMBL SMILES
CHEMBL133389 O=C1C=CC(=O)c2c(O)c(Oc3ccccc3)c(Cl)c(O)c21
CHEMBL336075 Cc1ccc[n+](C2=C([O-])C(=O)c3c(O)ccc(O)c3C2=O)c1
CHEMBL130758 COc1c(Cl)c(Cl)c(OC)c2c1C(=O)C=CC2=O
CHEMBL372987 COC(=O)c1[nH]c2ccc(Cl)cc2c1Sc1ccccc1OC
CHEMBL201490 COC(=O)c1[nH]c2ccc(Cl)cc2c1Sc1cc(OC)cc(OC)c1
CHEMBL131038 COc1cc(OC)c2c3c(oc2c1)C(=O)c1c(OC)ccc(OC)c1C3=O
CHEMBL185 O=c1[nH]cc(F)c(=O)[nH]1
CHEMBL132876 O=C1c2ccccc2C(=O)c2c1oc1cc3c(cc21)OCO3
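To make the presence/absence bit-vector idea concrete, here is a toy sketch. It is not the real 881-bit PubChem fingerprint (computing that requires a cheminformatics toolkit such as PaDEL-descriptor); naive substring checks on SMILES are chemically unreliable and only illustrate the encoding principle:

```python
# Toy presence/absence bit vector over hand-picked SMILES substrings.
# Illustrative only: real PubChem fingerprints use proper substructure
# matching, not string containment.
PATTERNS = ["Cl", "F", "O", "N", "S", "c1ccccc1"]  # example "substructures"

def toy_fingerprint(smiles):
    return [1 if p in smiles else 0 for p in PATTERNS]

# CHEMBL133389 from Table 2: contains Cl and O, so those bits are set.
fp = toy_fingerprint("O=C1C=CC(=O)c2c(O)c(Oc3ccccc3)c(Cl)c(O)c21")
```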
3.3. Learning tasks
The dataset contains 142,852 molecules divided into active and non-active groups
based on the IC50 value. We allocated 90% of the data to the training set and 10% to the test set. The
target feature to predict, activity in NSCLC, takes discrete values: 0 for non-active
compounds and 1 for highly active compounds. We carried out a pre-processing step to reduce the number of
features used as input to the ML models. Starting from the initial 881 features, low-variance features were
removed using a variance threshold of 0.15, i.e., dropping any feature for which roughly 85% of the values are
identical. We ended up with 175 high-variance features, a set compact enough for the models to detect the
regularities present in the dataset; a smaller feature set also makes the training phase more efficient.
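The low-variance filtering step can be sketched with scikit-learn's VarianceThreshold; the binary matrix below is a random stand-in for the real 881-bit fingerprint data. (For a binary feature, "85% of values identical" corresponds to a variance of 0.85 × 0.15 ≈ 0.1275, so a 0.15 cutoff is slightly stricter.)

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
# Stand-in for the 881-bit PubChem fingerprints: 200 random binary vectors.
X = (rng.random((200, 881)) < 0.5).astype(int)
# Make the first 100 columns nearly constant (~2% ones, variance ~0.02),
# so they fall below the 0.15 threshold and get dropped.
X[:, :100] = 0
X[::50, :100] = 1

selector = VarianceThreshold(threshold=0.15)
X_reduced = selector.fit_transform(X)  # keeps only high-variance columns
```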
To build our neural network, we used the Keras [31] library to define a multilayer perceptron
for binary classification. The physicochemical descriptors are fed into a sequential model with three
hidden layers, each a Dense layer. The input layer has 175 units so that it maps onto the feature
vector, and the three hidden layers have 50, 10, and 2 neurons, respectively, with the rectified linear unit
(ReLU) activation function. The output layer has a single node with a sigmoid activation, so the network
maps each descriptor vector to a single activity score. We trained the model using binary cross-entropy as
the loss function and the Adam optimizer. Similarly, we applied a gradient boosting tree classifier that
learns from the molecular descriptor arrays to predict compound activity in NSCLC, using the extreme
gradient boosting (XGBoost) algorithm [32] with its scikit-learn-compatible implementation [33] to train
and evaluate the model.
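The described architecture (175 inputs, hidden layers of 50, 10, and 2 ReLU units, one sigmoid output) can be sketched as a forward pass in plain NumPy. This is only an illustration with random, untrained weights, not the Keras implementation used in the study:

```python
import numpy as np

rng = np.random.default_rng(42)
relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Layer sizes as described in the text: 175 -> 50 -> 10 -> 2 -> 1.
sizes = [175, 50, 10, 2, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b
        # ReLU on hidden layers, sigmoid on the single output node.
        a = sigmoid(z) if i == len(weights) - 1 else relu(z)
    return a

def bce(y, p):
    # Binary cross-entropy, the training loss used for the model.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

x = rng.random(175)  # stand-in for a 175-bit descriptor vector
p = forward(x)       # predicted score that the compound is active
```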
Table 3. Description of bits defined by PubChem
PubChem bit position range Description
From 0 to 114 These bits check for the existence or count of individual chemical atoms
From 115 to 262 These bits check for the existence of rings
From 263 to 326 These bits check for the existence of bonded atom pairs, regardless of their count and order
From 327 to 444 These bits check for the existence of atom nearest-neighbor patterns, taking into account aromaticity-significant bonds
From 445 to 459 These bits check for the existence of detailed atom neighborhood patterns, regardless of count, but where bond orders are specific
From 460 to 712 These bits check for the existence of simple SMARTS patterns, regardless of count, but where bond orders are specific and bond aromaticity matches both single and double bonds
From 713 to 880 These bits check for the existence of complex SMARTS patterns, regardless of count, but where bond orders and bond aromaticity are specific
4. RESULTS AND DISCUSSION
Our neural network is a sequential feedforward model with three hidden layers. Its performance
was evaluated using the log loss function and the accuracy reached during training over 100 epochs. The data
were shuffled and split into batches of 10 samples each; during learning, the model loops over all batches in
every epoch and updates its weights. Figure 2 shows the evolution of the log loss over the training cycles.
The model achieved an accuracy score of 0.96 with a log loss of 0.1166.
Similarly, we calculated the log loss of the decision tree-based classifier to evaluate its performance
over 100 boosting rounds. We tuned the model using a grid search to find optimal hyperparameter values;
it reached its highest performance with 100 boosted trees and a maximum depth of 6 levels. Figure 3 shows
the resulting log loss curve, which decreases throughout training and reaches 0.008 on the validation set.
In addition, the model achieved an F1 score of 0.86 for both predicted classes.
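The hyperparameter tuning step can be sketched with scikit-learn's GridSearchCV. Here scikit-learn's own GradientBoostingClassifier stands in for XGBoost, and synthetic data replaces the descriptor matrix; the grid covers the two hyperparameters reported above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the molecular descriptor matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 6]},
    scoring="neg_log_loss",  # tune against log loss, as in the paper
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_  # chosen number of trees and max depth
```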
These results provide a snapshot of the training process, which was successful for both models, with the
XGBoost model proving more effective than the neural network-based model; both reach high prediction
performance. However, the number of predictors (175) is still high. Consequently, finding the relevant
features is a crucial step toward depicting the important structures of molecules active in lung cancer.
Moreover, it helps set appropriate model parameters that enhance the classification results of our method
when evaluated on unseen molecules. For that reason, we calculated the most relevant features to fully
exploit the capabilities of our models, which can then capture patterns in the structure of novel molecules
with fewer hyperparameters to tune. We plotted the Shapley values using the Shapley additive explanations
(SHAP) method [34], which explains how the model estimates a prediction class for a given molecule; it
also measures each feature's contribution and identifies the most relevant features of the dataset.
Figures 4 and 5 depict the key features identified by the artificial neural network (ANN) and XGBoost models,
respectively, highlighting the essential molecular patterns utilized by both models to make predictions.
Remarkably, there are 11 common features that both models leverage, suggesting that these features play a critical
role in distinguishing active and inactive drugs in NSCLC. To provide a comprehensive overview, Table 4
summarizes these 11 features, shedding light on the structural characteristics that underlie drug efficiency in
NSCLC.
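The idea behind SHAP can be illustrated by computing exact Shapley values for a tiny model by enumerating feature coalitions. This is only a sketch of the definition (the shap library computes these efficiently for real models); "absent" features are replaced by a baseline value:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by coalition enumeration (exponential cost)."""
    n = len(x)

    def v(S):
        # Coalition value: present features keep x, absent ones use baseline.
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = set(S)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(S | {i}) - v(S))
        phi.append(total)
    return phi

# For a linear toy model, Shapley values reduce to w_i * (x_i - baseline_i).
f = lambda z: 2.0 * z[0] + 1.0 * z[1]
phi = shapley_values(f, x=[1.0, 1.0], baseline=[0.0, 0.0])  # -> [2.0, 1.0]
```

By construction, the values sum to f(x) - f(baseline), which is the additivity property that SHAP contribution plots rely on.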
Int J Elec & Comp Eng, Vol. 13, No. 5, October 2023: 5755-5763
Figure 2. Log loss curve obtained for the artificial network model
Figure 3. Log loss curve obtained for the XGBoost model
Figure 4. Feature contribution in the artificial neural network model
Figure 5. Feature contribution in the XGBoost model
We then repeated the training of our XGBoost model using a reduced number of features: the 11 most
relevant features served as input to the learning process. The model achieved an F1 score of 0.98 with a
final mean squared error of 0.02. This score is higher than that reached by the study in [35], where an F1
score of 0.76 was obtained. Moreover, we compared the ability of our model to predict drugs revealed as
highly active in that study [35], and ranked the top-10 highly active molecules in lung cancer; Table 5
lists these drugs.
The top-10 molecules we obtained are all supported by the literature. Erlotinib, ranked 1, is an oral
anticancer drug that inhibits the epidermal growth factor receptor responsible for excessive cell
proliferation in malignant lung tumors [36]. Paclitaxel, ranked 2, is used in combination with cisplatin
(ranked 5) as a first-line therapy for patients whose disease cannot be treated with surgery or radiation
therapy [37], [38]. Specific KRAS mutations are responsible for lung cancer, and patients carrying these
mutations are often resistant to targeted drugs such as those ranked 3 and 7 [39]. Moreover, drugs predicted
to be active in NSCLC by the study in [35] were also present in our top-10 list (ranks 4, 6, 8, 9, and 10)
and have been validated by many studies [40]–[45].
Table 4. Structural patterns that describe drugs’ activity in NSCLC obtained from our ML methods
PubChem bit position Definition of the descriptor NSCLC activity
PubchemFP4 Presence of more than 1 Li atom highly active
PubchemFP165 Presence of more than 4 saturated or aromatic carbon-only rings of size 5 less active or inactive
PubchemFP50 Presence of more than 1 Mg atom highly active
PubchemFP51 Presence of more than 1 Al atom highly active
PubchemFP36 Presence of more than 8 S atoms less active or inactive
PubchemFP104 Presence of more than 1 Sm atom highly active
PubchemFP119 Presence of at least one unsaturated non-aromatic carbon-only ring of size 3 highly active
PubchemFP171 Presence of more than 5 rings of size 5 less active or inactive
PubchemFP2 Presence of more than 16 H atoms highly active
PubchemFP17 Presence of more than 8 N atoms highly active
PubchemFP161 Presence of more than 3 unsaturated non-aromatic carbon-only rings of size 5 less active or inactive
Table 5. Top-10 ranked drugs in lung cancer
Rank Drug name
1 Erlotinib
2 Paclitaxel
3 Atezolizumab
4 Osimertinib
5 Cisplatin
6 Fluoxetine
7 Sotorasib
8 Sulfasalazine
9 Azathioprine
10 Rotenone
5. CONCLUSION
In this paper, we propose a new approach to explore the important structures of molecules active in
lung cancer. We set up two machine learning models to learn from chemical structures and predict novel
drugs highly active in lung cancer, taking full advantage of a QSAR-based method. We conducted a
comparative study to evaluate the performance of the two models on several metrics. We used SHAP values
to perform a feature engineering step and to list the essential chemical structures that contributed most to
the training of our models and to their accurate predictions. Both models showed good results and
successfully ranked the top 10 highly active molecules used as therapy for patients with lung cancer. The
obtained results were compared with the medical research literature and are supported by several studies.
Our methodology demonstrated promising results that can enhance drug discovery pipelines, not only for
lung cancer but also, potentially, for other diseases.
REFERENCES
[1] T. Cheng, Q. Li, Z. Zhou, Y. Wang, and S. H. Bryant, “Structure-based virtual screening for drug discovery: a problem-centric
review,” The AAPS Journal, vol. 14, no. 1, pp. 133–141, Mar. 2012, doi: 10.1208/s12248-012-9322-0.
[2] H.-P. Shih, X. Zhang, and A. M. Aronov, “Drug discovery effectiveness from the standpoint of therapeutic mechanisms and
indications,” Nature Reviews Drug Discovery, vol. 17, no. 1, pp. 19–33, Jan. 2018, doi: 10.1038/nrd.2017.194.
[3] F. S. Hanoon and A. H. Hassin Alasadi, “A modified residual network for detection and classification of Alzheimer’s disease,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 12, no. 4, pp. 4400–4407, Aug. 2022, doi:
10.11591/ijece.v12i4.pp4400-4407.
[4] G. Saranya and A. Pravin, “A comprehensive study on disease risk predictions in machine learning,” International Journal of
Electrical and Computer Engineering (IJECE), vol. 10, no. 4, pp. 4217–4225, Aug. 2020, doi: 10.11591/ijece.v10i4.pp4217-
4225.
[5] A. Aliper, S. Plis, A. Artemov, A. Ulloa, P. Mamoshina, and A. Zhavoronkov, “Deep learning applications for predicting
pharmacological properties of drugs and drug repurposing using transcriptomic data,” Molecular Pharmaceutics, vol. 13, no. 7,
pp. 2524–2530, Jul. 2016, doi: 10.1021/acs.molpharmaceut.6b00248.
[6] S. Kwon, H. Bae, J. Jo, and S. Yoon, “Comprehensive ensemble in QSAR prediction for drug discovery,” BMC Bioinformatics,
vol. 20, no. 1, Dec. 2019, doi: 10.1186/s12859-019-3135-4.
[7] S. Chackalamannil, D. Rotella, and S. E. Ward, Comprehensive medicinal chemistry III, 3rd Edition. Elsevier B.V. ScienceDirect, 2017.
[8] J. Vamathevan and E. Birney, “A review of recent advances in translational bioinformatics: Bridges from biology to medicine,”
Yearbook of Medical Informatics, vol. 26, no. 01, pp. 178–187, Sep. 2017, doi: 10.15265/IY-2017-017.
[9] S. Fotouhi, S. Asadi, and M. W. Kattan, “A comprehensive data level analysis for cancer diagnosis on imbalanced data,” Journal
of Biomedical Informatics, vol. 90, Feb. 2019, doi: 10.1016/j.jbi.2018.12.003.
[10] F. Rafii, B. D. R. Hassani, and M. A. Kbir, “Automatic clustering of microarray data using ART2 neural network,” Journal of
Theoretical and Applied Information Technology, vol. 90, no. 1, pp. 175–184, 2005.
[11] F. Rafii, M. A. Kbir, and B. D. R. Hassani, “MLP network for lung cancer presence prediction based on microarray data,” in 2015
Third World Conference on Complex Systems (WCCS), Nov. 2015, pp. 1–6, doi: 10.1109/ICoCS.2015.7483276.
[12] H. Hanaf, B. Hassani, and M. Kbir, “Predicting gene-drug-disease interactions by integrating heterogeneous biological data through
a network model,” International Journal of Advances in Soft Computing and its Applications, vol. 14, no. 1, pp. 36–48, Apr. 2022,
doi: 10.15849/IJASCA.220328.03.
[13] D. Altshuler, M. J. Daly, and E. S. Lander, “Genetic mapping in human disease,” Science, vol. 322, no. 5903, pp. 881–888, Nov.
2008, doi: 10.1126/science.1156409.
[14] P. Ballesta, D. Bush, F. F. Silva, and F. Mora, “Genomic predictions using low-density SNP markers, pedigree and GWAS
information: A case study with the non-model species eucalyptus cladocalyx,” Plants, vol. 9, no. 1, Jan. 2020, doi:
10.3390/plants9010099.
[15] K. S. S. Lee et al., “Drug-target residence time affects in vivo target occupancy through multiple pathways,” ACS Central Science,
vol. 5, no. 9, pp. 1614–1624, Sep. 2019, doi: 10.1021/acscentsci.9b00770.
[16] J. E. Larsen, T. Cascone, D. E. Gerber, J. V. Heymach, and J. D. Minna, “Targeted therapies for lung cancer,” The Cancer Journal,
vol. 17, no. 6, pp. 512–527, Nov. 2011, doi: 10.1097/PPO.0b013e31823e701a.
[17] X. Lu et al., “The development of pharmacophore modeling: generation and recent applications in drug discovery,” Current
Pharmaceutical Design, vol. 24, no. 29, pp. 3424–3439, Dec. 2018, doi: 10.2174/1381612824666180810162944.
[18] S.-Y. Yang, “Pharmacophore modeling and applications in drug discovery: challenges and recent advances,” Drug Discovery
Today, vol. 15, no. 11–12, pp. 444–450, Jun. 2010, doi: 10.1016/j.drudis.2010.03.013.
[19] J. Almeida, J. Volpini, J. Poiani, and C. Silva, “Ligand-based drug design of novel MARK-3 inhibitors in cancer,” Current Bioactive
Compounds, vol. 10, no. 2, pp. 112–123, Oct. 2014, doi: 10.2174/157340721002141001102743.
[20] M. Li, W. Zhang, B. Wang, Y. Gao, Z. Song, and Q. C. Zheng, “Ligand-based targeted therapy: a novel strategy for hepatocellular
carcinoma,” International Journal of Nanomedicine, vol. 11, pp. 5645–5669, Oct. 2016, doi: 10.2147/IJN.S115727.
[21] M. Akamatsu, “Current state and perspectives of 3D-QSAR,” Current Topics in Medicinal Chemistry, vol. 2, no. 12,
pp. 1381–1394, Dec. 2002, doi: 10.2174/1568026023392887.
[22] S. Vilar and G. Hripcsak, “The role of drug profiles as similarity metrics: applications to repurposing, adverse effects detection and
drug–drug interactions,” Briefings in Bioinformatics, Jun. 2016, doi: 10.1093/bib/bbw048.
[23] R. Ferdousi, R. Safdari, and Y. Omidi, “Computational prediction of drug-drug interactions based on drugs functional similarities,”
Journal of Biomedical Informatics, vol. 70, pp. 54–64, Jun. 2017, doi: 10.1016/j.jbi.2017.04.021.
[24] H. Hanafi, B. D. R. Hassani, and M. A. Kbir, “Biological networks analysis, analytical approaches and use case on protein-protein network interactions,” in Proceedings of the 4th International Conference on Smart City Applications, Oct. 2019, pp. 1–5, doi: 10.1145/3368756.3368996.
[25] C.-H. Huang, P. M.-H. Chang, C.-W. Hsu, C.-Y. F. Huang, and K.-L. Ng, “Drug repositioning for non-small cell lung cancer by
using machine learning algorithms and topological graph theory,” BMC Bioinformatics, vol. 17, no. S1, Dec. 2016, doi:
10.1186/s12859-015-0845-0.
[26] C.-W. Weng et al., “Pharmacophore-based virtual screening for the identification of the novel Src inhibitor SJG-136 against lung
cancer cell growth and motility,” American Journal of Cancer Research, vol. 10, no. 6, pp. 1668–1690, 2020.
[27] A. Gaulton et al., “ChEMBL: a large-scale bioactivity database for drug discovery,” Nucleic Acids Research, vol. 40, no. D1,
pp. D1100–D1107, Jan. 2012, doi: 10.1093/nar/gkr777.
[28] A. Gaulton et al., “The ChEMBL database in 2017,” Nucleic Acids Research, vol. 45, no. D1, pp. D945–D954, Jan. 2017, doi:
10.1093/nar/gkw1074.
[29] C. W. Yap, “PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints,” Journal of
Computational Chemistry, vol. 32, no. 7, pp. 1466–1474, May 2011, doi: 10.1002/jcc.21707.
[30] D. Weininger, “SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules,” Journal
of Chemical Information and Computer Sciences, vol. 28, no. 1, pp. 31–36, Feb. 1988, doi: 10.1021/ci00057a005.
[31] N. Ketkar, “Introduction to keras,” in Deep Learning with Python, Berkeley, CA: Apress, 2017, pp. 97–111, doi: 10.1007/978-1-
4842-2766-4_7.
[32] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp. 785–794,
doi: 10.1145/2939672.2939785.
[33] P. D. Chary and R. P. Singh, “Review on advanced machine learning model: scikit-learn,” International Journal of Scientific
Research and Engineering Development, vol. 3, no. 4, pp. 526–529, 2020. Accessed: Sep. 20, 2022. [Online]. Available:
https://papers.ssrn.com/abstract=3694350.
[34] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in NIPS’17: Proceedings of the 31st
International Conference on Neural Information Processing Systems, 2017, pp. 4768–4777.
[35] R. X. Huang et al., “Lung adenocarcinoma-related target gene prediction and drug repositioning,” Frontiers in Pharmacology,
vol. 13, Aug. 2022, doi: 10.3389/fphar.2022.936758.
[36] C. M. Rocha-Lima and L. E. Raez, “Erlotinib (tarceva) for the treatment of non-small-cell lung cancer and pancreatic cancer,” P &
T: A Peer-reviewed Journal for Formulary Management, vol. 34, no. 10, pp. 559–564, 2009.
[37] S. Ramalingam and C. P. Belani, “Paclitaxel for non-small cell lung cancer,” Expert Opinion on Pharmacotherapy, vol. 5, no. 8,
pp. 1771–1780, Aug. 2004, doi: 10.1517/14656566.5.8.1771.
[38] R. Pirker, G. Krajnik, S. Zöchbauer, R. Malayeri, M. Kneussl, and H. Huber, “Paclitaxel/cisplatin in advanced non-small-cell lung
cancer (NSCLC),” Annals of Oncology, vol. 6, no. 8, pp. 833–835, Oct. 1995, doi: 10.1093/oxfordjournals.annonc.a059324.
[39] L. Huang, Z. Guo, F. Wang, and L. Fu, “KRAS mutation: from undruggable to druggable in cancer,” Signal Transduction and
Targeted Therapy, vol. 6, no. 1, Nov. 2021, doi: 10.1038/s41392-021-00780-4.
[40] X. Tang et al., “Machine learning-based CT radiomics analysis for prognostic prediction in metastatic non-small cell lung cancer
patients with EGFR-T790M mutation receiving third-generation EGFR-TKI Osimertinib treatment,” Frontiers in Oncology,
vol. 11, Sep. 2021, doi: 10.3389/fonc.2021.719919.
[41] K. Hu et al., “Suppression of the SLC7A11/glutathione axis causes synthetic lethality in KRAS-mutant lung adenocarcinoma,”
Journal of Clinical Investigation, vol. 130, no. 4, pp. 1752–1766, Mar. 2020, doi: 10.1172/JCI124049.
[42] Y.-L. Shi, S. Feng, W. Chen, Z.-C. Hua, J.-J. Bian, and W. Yin, “Mitochondrial inhibitor sensitizes non-small-cell lung carcinoma
cells to TRAIL-induced apoptosis by reactive oxygen species and Bcl-XL/p53-mediated amplification mechanisms,” Cell Death &
Disease, vol. 5, no. 12, pp. e1579–e1579, Dec. 2014, doi: 10.1038/cddis.2014.547.
[43] Z. Yang et al., “Antitumor effect of fluoxetine on chronic stress-promoted lung cancer growth via suppressing kynurenine pathway
and enhancing cellular immunity,” Frontiers in Pharmacology, vol. 12, Aug. 2021, doi: 10.3389/fphar.2021.685898.
[44] D. A. Gomes, A. M. Joubert, and M. H. Visagie, “In vitro effects of papaverine on cell proliferation, reactive oxygen species, and
cell cycle progression in cancer cells,” Molecules, vol. 26, no. 21, Oct. 2021, doi: 10.3390/molecules26216388.
[45] K. W. Baik, S. H. Kim, and H. Y. Shin, “Paraneoplastic neuromyelitis optica associated with lung adenocarcinoma in a young
woman,” Journal of Clinical Neurology, vol. 14, no. 2, pp. 246–247, 2018, doi: 10.3988/jcn.2018.14.2.246.
BIOGRAPHIES OF AUTHORS
Hamza Hanafi received his degree in Computer Science engineering in 2017 from
the Faculty of Sciences and Technologies at the University Abdelmalek Essaâdi in Tangier,
Morocco. He is currently a Ph.D. candidate and member of the Intelligent Automation Team at
the STI Doctoral Center, also located in the Faculty of Sciences and Technologies at the
University Abdelmalek Essaâdi. His research interests include computational biology,
bioinformatics, and machine learning. Hamza has published several research articles in
international journals and conferences on computer science. He can be contacted at email:
hamzahanafi1@gmail.com or hamza.hanafi@etu.uae.ac.ma.
Badr Dine Rossi Hassani is a full professor of biology at the Faculty of Sciences
and Technologies (FST) of Tangier, Morocco, and Ph.D. managing director at the LABIPHABE
laboratory. His research interests span several disciplines: cancer research, biotechnology, and
bioinformatics. He is a member of the scientific committees of many international conferences and
journals. He can be contacted at email: badrrossi@gmail.com.
M’hamed Aït Kbir has been a full professor in the computer science department of the
Faculty of Sciences and Technologies (FST) of Tangier, University Abdelmalek Essaâdi, Morocco,
since 2001. As a member of the LIST laboratory since 2007, his research focuses on three main
areas: computer vision (multimedia flow optimization, multimedia document content watermarking,
object recognition, 3D content indexing and retrieval, 3D reconstruction); artificial intelligence
(machine learning, deep learning, planning and search strategies); and bioinformatics (microarray
data decision making, biological data integration, biological network analysis). He is a member of
the scientific committees of many international conferences and journals. As an expert, he
participates in the evaluation of public and private education programs for the ANEAQ and the
Ministry of Higher Education and Scientific Research. He can be contacted by email:
maitkbir@uae.ac.ma.