The document discusses knowledge extraction and visualization using rule-based machine learning. It summarizes work applying evolutionary rule learning to problems like protein structure prediction and analyzing biological data. Key points include:
- The BioHEL rule learning system was used to generate predictive rules from large biological datasets and extract understandable knowledge.
- Case studies demonstrated knowledge extraction and network construction from seed germination microarray data and cancer gene expression data.
- Challenges include ensuring the extracted knowledge is reliable and refining the analysis to provide more domain-specific insights.
Pattern Recognition using Artificial Neural NetworkEditor IJCATR
An artificial neural network (ANN), usually called a neural network, is a computing paradigm inspired by the biological nervous system. In the network, signals are transmitted over connection links. Each link has an associated weight, which is multiplied by the incoming signal. The output signal is obtained by applying an activation function to the net input. Neural networks are one of the most exciting and challenging research areas. As ANNs mature into commercial systems, they are likely to be implemented in hardware. Their fault tolerance and reliability are therefore vital to the functioning of the system in which they are embedded. The pattern recognition system is implemented with a backpropagation network and a Hopfield network to remove distortion from the input. The Hopfield network has high fault tolerance, which helps the system produce accurate output.
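The distortion-removal idea can be illustrated with a minimal Hopfield sketch. This is not the paper's implementation: the Hebbian storage rule, the asynchronous update order, the 8-element pattern and the names `train_hopfield` and `recall` are all assumptions chosen for illustration.

```python
def train_hopfield(patterns):
    """Build a Hebbian weight matrix from bipolar (+1/-1) patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            net = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if net >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]   # hypothetical stored pattern
weights = train_hopfield([stored])

noisy = list(stored)
noisy[0] = -noisy[0]                    # flip one element to simulate distortion
restored = recall(weights, noisy)
print(restored == stored)
```

With a single stored pattern and one flipped element, the network settles back onto the stored pattern, which is the behaviour the abstract relies on for cleaning distorted input.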
Paper memo: persistent homology on biological problemsRyohei Suzuki
Shnier et al., Persistent homology analysis of brain transcriptome data in autism
Qaiser et al., Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
As biomedical databases grow day by day, finding the essential features for disease prediction has become more complex because of high dimensionality and sparsity. In addition, because of the large number of microarray datasets available in biomedical repositories, it is difficult to analyze, predict and interpret feature information using traditional feature selection based classification models. Most traditional feature selection based classification algorithms have computational issues with dimension reduction, uncertainty and class imbalance on microarray datasets. The ensemble classifier is one of the scalable models for the extreme learning machine because of its high efficiency and fast processing speed for real-time applications. The main objective of feature selection based ensemble learning models is to classify high dimensional data with high computational efficiency and a high true positive rate. In the proposed model, an optimized particle swarm optimization (PSO) based ensemble classification model was developed for high dimensional microarray datasets. Experimental results showed that the proposed model has high computational efficiency compared to traditional feature selection based classification models in terms of accuracy, true positive rate and error rate.
Network Biology: A paradigm for modeling biological complex systemsGanesh Bagler
These slides are from the two lectures delivered as part of the 'National Workshop on Network Modelling and Graph Theory' (Dec 14-16, 2017) at the Department of Mathematics, Dibrugarh University, Assam, India.
(1) Network Biology: A paradigm for integrative modeling of biological complex systems -- 14 Dec 2017, 3:30pm
(2) Applications of network modeling in biomedicine -- 15 Dec 2017, 9:00pm
Sponsored by UGC under SAP DRS (II)
(1) Workshop link: https://www.dibru.ac.in/upcoming-events/2981-national-workshop-on-network-modelling-and-graph-theory
(2) The Workshop Flyer: https://www.dibru.ac.in/images/uploaded_files/2017/Nov/National_Workshop_on_Network_Modelling_and_Graph_Theory.pdf
Understanding Protein Function on a Genome-scale through the Analysis of Molecular Networks
Cornell Medical School, Physiology, Biophysics and Systems Biology (PBSB) graduate program, 2009.01.26, 16:00-17:00; [I:CORNELL-PBSB] (Long networks talk, incl. the following topics: why networks w. amsci*, funnygene*, net. prediction intro, memint*, tse*, essen*, sandy*, metagenomics*, netpossel*, tyna*+ topnet*, & pubnet* . Fits easily into 60’ w. 10’ questions. PPT works on mac & PC and has many photos w. EXIF tag kwcornellpbsb .)
Date Given: 01/26/2009
Prediction of Bioprocess Production Using Deep Neural Network MethodTELKOMNIKA JOURNAL
Deep learning has advanced the state-of-the-art methods in genomics, allowing biological data to be analysed with high predictive accuracy. The training of neural networks with several hidden layers, facilitated by deep learning, has attracted increased interest by achieving remarkable results in various fields. The extraction of bioprocess production can thus be implemented through pathway prediction in the genomic metabolic network of Escherichia coli. Metabolic engineering involves the manipulation of genes that have the potential to increase the yield of metabolite production. A mathematical model of this network is the foundation for developing a computational procedure that directs genetic manipulations that would eventually lead to optimized bioprocess production. Because deep learning is well suited to genomics, modelling of biological networks can be implemented. Each layer reveals insight into the biological network, which enables pathway analysis to be carried out in order to extract the target bioprocess production. In this study, a deep neural network has been used to identify any set of gene deletion models that offers optimal results in xylitol production and its growth yield.
Neural Network and Artificial Intelligence.
WHAT IS NEURAL NETWORK?
The method of calculation is based on the interaction of a plurality of processing elements, called neurons, inspired by the biological nervous system.
It is a powerful technique for solving real-world problems.
A neural network is composed of a number of nodes, or units [1], connected by links. Each link has a numeric weight [2] associated with it.
Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.
Artificial neurons are the constitutive units in an artificial neural network.
WHY USE NEURAL NETWORKS?
It has the ability to learn from experience.
It can deal with incomplete information.
It can produce results for inputs it has not been taught to deal with.
It is used to extract useful patterns from given data, e.g. pattern recognition.
Biological Neurons
Four parts of a typical nerve cell:
- DENDRITES: Accept the inputs.
- SOMA: Processes the inputs.
- AXON: Turns the processed inputs into outputs.
- SYNAPSES: The electrochemical contact between the neurons.
ARTIFICIAL NEURONS MODEL
Inputs to the network are represented by the mathematical symbols x1, ..., xn.
Each of these inputs is multiplied by a connection weight, w1, ..., wn.
sum = w1 x1 + ... + wn xn
These products are simply summed and fed through the transfer function f( ) to generate a result, which is then output.
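The weighted sum and transfer function described above can be sketched in a few lines. The function names, the choice of sigmoid and step transfer functions, and the sample inputs and weights are assumptions made for illustration; the slides do not specify a particular f( ).

```python
import math


def neuron_output(inputs, weights, transfer):
    """Compute f(w1*x1 + ... + wn*xn) for one artificial neuron."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return transfer(net)


def sigmoid(z):
    """Smooth transfer function that squashes the sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Threshold transfer function: fire (1) when the sum is non-negative."""
    return 1 if z >= 0 else 0


x = [1.0, 0.5, -1.0]   # hypothetical inputs x1..x3
w = [0.2, 0.4, 0.1]    # hypothetical weights w1..w3

# Weighted sum: 0.2*1.0 + 0.4*0.5 + 0.1*(-1.0) = 0.3
print(neuron_output(x, w, sigmoid))
print(neuron_output(x, w, step))
```

Passing the transfer function as an argument keeps the summation step separate from the choice of f( ), so other functions can be swapped in without touching the neuron itself.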
NEURON MODEL
A neuron consists of:
- Inputs (Synapses): the input signal.
- Weights (Dendrites): determine the importance of the incoming value.
- Output (Axon): output to other neurons or of the NN.
Building Neural Network Through Neuroevolutionbergel
A one-hour class on neuroevolution. After presenting genetic algorithms, two neuroevolution algorithms are presented: a simple one and NEAT. Various examples about artificial life and video games are given.
Turbidity is a measure of water quality. Excessive turbidity poses a threat to health and causes pollution. Most of the available mathematical models of water treatment plants do not capture turbidity. A reliable model is essential for effective removal of turbidity in the water treatment plant. This paper presents a comparison of Hammerstein-Wiener and neural network techniques for estimating turbidity in a water treatment plant. The models were validated using experimental data from the Tamburawa water treatment plant in Kano, Nigeria. Simulation results demonstrated that the neural network model outperformed the Hammerstein-Wiener model in estimating the turbidity. The neural network model may serve as a valuable tool for predicting the turbidity in the plant.
Computational approaches for mapping the human connectomeCameron Craddock
Describes open challenges and ongoing work for mapping the human functional connectome and identifying inter-individual variation in the connectome that maps to phenotype and clinical outcomes. Also describes open science initiatives to help scientists from disparate backgrounds to become involved in this research.
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...Christy Maver
Numenta VP Research Subutai Ahmad presents a talk on "Sparsity in the Neocortex and its Implications for Continual Learning" at the virtual CVPR 2020 workshop. In this talk, he discusses how continuous learning systems can benefit from sparsity, active dendrites and other neocortical mechanisms.
A NEW TECHNIQUE INVOLVING DATA MINING IN PROTEIN SEQUENCE CLASSIFICATIONcscpconf
Feature selection is a more accurate technique in protein sequence classification. Researchers apply well-known classification techniques such as neural networks, genetic algorithms, fuzzy ARTMAP and rough set classifiers for extracting features. This paper presents a review of three different classification models: the fuzzy ARTMAP model, the neural network model and the rough set classifier model. This is followed by a new technique for classifying protein sequences. The proposed model is implemented with a custom-designed tool written in Java and aims to show that it reduces the computational overheads encountered by earlier approaches and also increases the accuracy of classification.
This explains the general algorithmic flow that goes into developing a neural network ensemble hybridized with evolutionary optimization schemes that target the optimization of more than one cost function.
Applications of Machine Learning to Location-based Social NetworksJoan Capdevila Pujol
This work is part of a seminar talk given at Universitat de Girona (UdG). It is basically an introduction to Location-based Social Networks through two Machine Learning applications: a recommendation system and an event discovery technique.
The Internet of Things (IoT) comes with great possibilities as well as major security and privacy issues. Although digital forensics has long been studied in both academia and industry, mobility forensics is relatively new and unexplored. Mobility forensics deals with tools and techniques that work towards forensically sound recovery of data and evidence from mobile devices [1]. In this paper, we explore mobility forensics in the context of IoT. This paper discusses the data collection and classification process for IoT smart home devices in detail. It also contains an attack-scenario-based analysis of the collected data and a proposed mobility forensics model that fits such scenarios.
Cite: K. M. S. Rahman, M. Bishop, and A. Holt, “Internet of Things Mobility Forensics,” INSuRE Conference, 2016.
Describes a link between KM technologies and business strategy through context-specific KM initiatives. Paper presented at CATI 2005, Congresso Anual de Tecnologia de Informação, São Paulo, Brazil.
Airline passenger profiling based on fuzzy deep machine learningAyman Qaddumi
Passenger profiling is a vital part of commercial aviation security. Classical passenger profiling methods are inefficient in handling the rapidly increasing amounts of electronic records. Emerging deep learning models combined with highly parallel computing have exhibited promising performance for feature extraction and abstraction, but their applications in aviation security management have rarely been reported.
As the complexity of choosing optimised and task specific steps and ML models is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.
Although it focuses on end users without expert knowledge, AutoML also offers new tools to machine learning experts, for example to:
1. Perform architecture search over deep representations
2. Analyse the importance of hyperparameters.
Online Machine Learning: introduction and examplesFelipe
In this talk I introduce the topic of Online Machine Learning, which deals with techniques for doing machine learning in an online setting, i.e. where you train your model a few examples at a time, rather than using the full dataset (off-line learning).
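The online setting described above can be sketched as a model that updates itself from one example at a time instead of fitting the whole dataset at once. The class name `OnlineLinearRegressor`, the `learn_one` method, the learning rate and the synthetic stream (drawn from y = 2x + 1) are illustrative assumptions, not material from the talk.

```python
class OnlineLinearRegressor:
    """Linear model trained by stochastic gradient descent, one example at a time."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        """Take one gradient step on the squared error of a single example."""
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLinearRegressor(n_features=1)

# Simulated stream: examples arrive one by one and are never stored as a batch.
for _ in range(200):
    for x_value in (0.0, 1.0, 2.0, 3.0):
        model.learn_one([x_value], 2.0 * x_value + 1.0)

print(model.weights[0], model.bias)
```

Because each update only needs the current example, memory use stays constant however long the stream runs, which is the main practical difference from off-line training.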
BSidesLV 2013 - Using Machine Learning to Support Information SecurityAlex Pinto
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk would like to address some insights into applying Machine Learning techniques to data any of us have easy access to, and try to bring home the point that if all of this technology can be used to show us “better” ads in social media and track our behavior online (and a bit more than that) it can also be used to defend our networks as well.
How multiple experts can be leveraged in a machine learning application without knowing a priori which experts are "good" and which are "bad". See how we can quantify the bounds on the overall results.
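One standard way to combine experts of unknown quality is the weighted majority algorithm, sketched below. The source does not say which method the slides use, so this is an assumed stand-in; the function names, the penalty factor `beta` and the three toy experts are hypothetical.

```python
import math


def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Predict by weighted vote; multiply the weight of each wrong expert by beta."""
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for predictions, truth in zip(expert_predictions, outcomes):
        vote_one = sum(w for w, p in zip(weights, predictions) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, predictions) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != truth:
            mistakes += 1
        weights = [
            w * beta if p != truth else w
            for w, p in zip(weights, predictions)
        ]
    return mistakes, weights


def mistake_bound(best_expert_mistakes, n_experts, beta=0.5):
    """Upper bound on the combined predictor's mistakes for a given beta."""
    numerator = math.log2(n_experts) + best_expert_mistakes * math.log2(1.0 / beta)
    return numerator / math.log2(2.0 / (1.0 + beta))


outcomes = [1, 0, 1, 1, 0, 1, 0, 0]
# Toy experts: always right, always wrong, and one that always predicts 1.
rounds = [(truth, 1 - truth, 1) for truth in outcomes]

mistakes, final_weights = weighted_majority(rounds, outcomes)
print(mistakes, mistake_bound(0, 3))
```

The bound depends only on the number of experts and on the mistakes of the best one, so it holds without knowing in advance which expert that is.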
* GOTO Berlin Conference 2013
Toru Shimogaki / NTT DATA CORPORATION
"The realtime processing for web services"
At Recruit Technologies, we are now concentrating on using streaming data processing and machine learning to analyze online user behavior and improve our services. We have a packaged solution named "Genn.ai" to make these technologies widely available in the Recruit group. It will be open-sourced. Using it, you can extract the power of Storm with simple scripts! In addition, we are making an effort to use the online machine learning middleware "Jubatus" in production with NTT DATA.
http://gotocon.com/berlin-2013/presentation/The%20realtime%20processing%20for%20web%20services
Quantification of variability and uncertainty in systems medicine modelsNatal van Riel
BioSB2016 Conference
Abstract: Computational modelling in systems biology addresses biological processes at different levels and scales. The quantification of model parameters from experimental data is a complicated task. To develop accurate, predictive models it is necessary to analyze how variance in data propagates into parameter estimates and, more importantly, model predictions. The network structure of the biological systems imposes strong constraints on possible solutions of a model. Amounts of data, available at molecular and physiological level, continue to increase. Often, model results are only partly in agreement with data, despite that model parameters are fitted. In contrast to existing belief that calibration of systems biology models to experimental data is prone to overfitting, we argue that dynamical models, despite their size and complexity, are not flexible enough to correctly describe all data.
Approaches are explored to introduce more degrees of freedom in models, but simultaneously enforcing sparsity if extra flexibility is not required. Estimation tools for dynamical systems are complemented with ‘regularization’ methods to reduce the error (bias) in models without escalating uncertainties (variance). This paradigm shift will be illustrated in two examples: 1) modelling of longitudinal data in a cohort of Type 2 Diabetics using different medication, and 2) the application in preclinical research studying the effect of liver X receptor activation on HDL metabolism and liver steatosis.
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
The challenge of accurately characterizing bioassays is a real pain point for many drug discovery organizations. Research has shown that some organizations have legacy assay collections exceeding 20,000 protocols, the great majority of which are not accurately characterized. This problem is compounded by the fact that many new protocol registrations are still not following FAIR (Findability, Accessibility, Interoperability, and Reusability) Data principles.
BioAssay Express is a tool focused on transforming the traditional protocol description from an unstructured free form text into a well-curated data store based upon FAIR Data principles. By using well-defined annotations for assays, the tool enables precise ontology based searches without having to resort to imprecise keyword searches.
This talk explores a number of important new features designed to help scientists accelerate the drug discovery process. Some example use cases include: enabling drug repositioning projects; improving SAR models; identifying appropriate machine learning data sets; and fine-tuning integrative-omic pathways.
An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional. One of the many possibilities is to take the initial prospective ELN entry for a bioassay protocol and feed it directly to an automated instrument. While there are many challenges involved in creating the ELN-to-robot loop, we will provide some insights into our collaborations with UCSF automation experts.
In summary, the ability to quickly and accurately search or analyze bioassay data (public or internal) is a rate limiting problem in drug discovery. We will present the latest developments toward removing this bottleneck.
https://plan.core-apps.com/acs_sd2019/abstract/6f58993d-a716-49ad-9b09-609edde5a3f4
Molecular modelling for in silico drug discoveryLee Larcombe
A slide set based on the small molecule section of "Introduction to in silico drug discovery" with more detail on molecular modelling and simulation aspects. Including a bit more on protein structure prediction
A Mixed Discrete-Continuous Attribute List Representation for Large Scale Cla...jaumebp
This work assesses the performance of the BioHEL data mining method to handle large-scale datasets, and proposes a representation to deal efficiently with domains with mixed discrete-continuous attributes
Single-cell RNA sequencing workshop given at the Ottawa Hospital Research Institute in 2018. Note that the slides contain animations that won't display on SlideShare.
Systems biology & Approaches of genomics and proteomicssonam786
This presentation provides a basic understanding of various genomics and proteomics techniques. Systems biology studies life as a system. It includes the study of living systems using various omic technologies.
This presentation is part of the Pacific Education Institute's content for the STEM Project Based Learning tutorial available through NH e-Learning for Educators as part of the Conservation Education series supported by the Association of Fish and Wildlife Agencies.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Knowledge extraction and visualisation using rule-based machine learning
1. Knowledge extraction and
visualisation using rule-based
machine learning
Dr. Jaume Bacardit
Interdisciplinary Computing and Complex Systems
(ICOS) research group
University of Nottingham
jaume.bacardit@nottingham.ac.uk
ICOS seminar. 11/10/2012
2. Preface
• I came to Nottingham in 2005 to work as a postdoc in a project applying
evolutionary rule learning to protein structure prediction (EPSRC
GR/T07534/01). In the project we managed to:
– Generate predictors that are competitive with the state of the art
– Indeed, extract human-readable explanations providing new
knowledge
– We proposed several improvements to the learning algorithms so they
could scale to big problems
• When I became a lecturer in 2008 I started several collaborations with
experimentalists analysing biological data of all kinds, always with the goal
of extracting knowledge
– Thanks to having sets of rules, it is relatively straightforward to
develop a generic methodology to extract knowledge from them, that
can be applied almost straight away to a variety of datasets
– Still, we are only at the tip of the iceberg, there are many ways in
which this analysis can be made more efficient/reliable/useful
4. A set of rules as a knowledge
representation
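If (X<0.25 and Y>0.75) or (X>0.75 and Y<0.25) then [class shown in figure]
If (X>0.75 and Y>0.75) then [class shown in figure]
If (X<0.25 and Y<0.25) then [class shown in figure]
Everything else [class shown in figure]
[Figure: the rules plotted over the unit square, with X and Y each ranging from 0 to 1]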
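As a minimal sketch, a rule set like this one can be read as an ordered decision list: rules are tried from top to bottom and the first one that matches decides the class. The class names below ("A", "B", "C", "default") are placeholders, since the transcript does not preserve the classes shown in the figure.

```python
# Ordered decision list for the 2D example.
# Class labels are hypothetical placeholders, not taken from the slide.
RULES = [
    (lambda x, y: (x < 0.25 and y > 0.75) or (x > 0.75 and y < 0.25), "A"),
    (lambda x, y: x > 0.75 and y > 0.75, "B"),
    (lambda x, y: x < 0.25 and y < 0.25, "C"),
]
DEFAULT_CLASS = "default"  # the "Everything else" rule


def classify(x, y):
    """Return the class of the first rule whose condition holds."""
    for condition, label in RULES:
        if condition(x, y):
            return label
    return DEFAULT_CLASS
```

Because the list is ordered, a point is only ever explained by one rule, which is what makes the rule set readable as knowledge.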
6. The BioHEL rule learning system
• BioHEL [Bacardit et al., 09] is an evolutionary
learning system that applies the Iterative Rule
Learning (IRL) approach
• Designed explicitly to deal with noisy large-scale
datasets
• IRL was first used in EC by the SIA system
[Venturini, 93]
7. BioHEL’s learning paradigm
– IRL has been used for many years in the ML community,
with the name of separate-and-conquer
– A standard elitist Genetic Algorithm generates each rule
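The separate-and-conquer loop can be sketched as below. This is an illustration under stated assumptions, not BioHEL's code: the rule learner is a plain random search standing in for the genetic algorithm, and the interval rule format, the scoring formula and all function names are hypothetical. Attribute values are assumed to lie in [0, 1].

```python
import random


def covers(rule, x):
    """A rule is a dict {attribute_index: (low, high)}; it covers x if every interval matches."""
    return all(low <= x[a] <= high for a, (low, high) in rule.items())


def learn_one_rule(examples, target, rng, tries=200):
    """Stand-in for the GA: random search for a one-attribute interval rule."""
    best, best_score = None, float("-inf")
    n_attrs = len(examples[0][0])
    for _ in range(tries):
        a = rng.randrange(n_attrs)
        v1, v2 = rng.random(), rng.random()
        rule = {a: (min(v1, v2), max(v1, v2))}
        covered = [label for x, label in examples if covers(rule, x)]
        good = sum(1 for label in covered if label == target)
        # Hypothetical score: reward covered target examples, penalise mistakes.
        score = good - 2 * (len(covered) - good)
        if score > best_score:
            best, best_score = rule, score
    return best, best_score


def iterative_rule_learning(examples, target, seed=0, max_rules=10):
    """Learn a rule, remove the examples it covers, and repeat on what is left."""
    rng = random.Random(seed)
    remaining = list(examples)
    rule_set = []
    while len(rule_set) < max_rules and any(label == target for _, label in remaining):
        rule, score = learn_one_rule(remaining, target, rng)
        if rule is None or score <= 0:
            break  # nothing useful found; the default rule handles the rest
        rule_set.append(rule)
        remaining = [(x, label) for x, label in remaining if not covers(rule, x)]
    return rule_set
```

Each pass only has to explain the examples that earlier rules left uncovered, which is why the resulting rules form an ordered list ending in a default rule.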
8. BioHEL’s characteristics 1/2
• Objective function that tries to balance the
generation of accurate and general rules
– Accurate: not making many mistakes
– General: covering as many examples as possible and covering as much
of the search space as possible
• Attribute list rule representation
– Automatically identifying the relevant attributes for a given rule and
discarding all the other ones
• Ensemble mechanisms
– Exploiting the GA’s stochasticity to construct ensembles of rule sets, all
of them generated from the same data, but with different random
seeds, also ensembles for ordinal classification
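The idea behind the attribute list representation can be sketched as follows: a rule stores only the attributes it considers relevant, so matching touches those attributes and ignores the rest. The class and field names are hypothetical and the sketch assumes continuous attributes with interval conditions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class AttributeListRule:
    attributes: List[int]              # indices of the attributes the rule uses
    bounds: List[Tuple[float, float]]  # one (low, high) interval per listed attribute
    prediction: str

    def matches(self, instance: Sequence[float]) -> bool:
        """Check only the listed attributes; all others are irrelevant to this rule."""
        return all(
            low <= instance[a] <= high
            for a, (low, high) in zip(self.attributes, self.bounds)
        )
```

With thousands of attributes per example and only a handful of conditions per rule, the cost of a match depends on the rule's length rather than on the width of the dataset.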
9. BioHEL’s characteristics 2/2
• The ILAS windowing scheme
– Efficiency enhancement method. Training set divided into strata.
Different GA iterations use different strata for their evaluation using a
round-robin policy
• GPGPU-based fitness evaluation
– Obtaining ~50x speedups on large datasets on its own and ~700x
speedups in combination with ILAS
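The round-robin use of strata can be sketched as below. The function names are hypothetical, and the simple interleaved split is an assumption: the slide does not say how the strata are built.

```python
def make_strata(n_examples, n_strata):
    """Split example indices 0..n_examples-1 into n_strata interleaved strata."""
    return [list(range(s, n_examples, n_strata)) for s in range(n_strata)]


def stratum_for_iteration(strata, iteration):
    """Round-robin policy: GA iteration i is evaluated on stratum i mod n_strata."""
    return strata[iteration % len(strata)]
```

Each fitness evaluation then touches roughly 1/n_strata of the training set, which is where the efficiency gain comes from.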
11. Functional Network Reconstruction for
seed germination
Microarray data obtained from seed tissue of
Arabidopsis Thaliana
122 samples represented by the expression level
of almost 14000 genes
It had been experimentally determined whether
each of the seeds had germinated or not
Can we learn to predict germination/dormancy
from the microarray data?
Bassel et al., Plant Cell 23(9):3101-3116, 2011
12. Generating rule sets
BioHEL was able to predict the
outcome of the samples with
93.5% accuracy (10 x 10-fold
cross-validation)
Learning from a scrambled dataset
(labels randomly assigned to
samples) produced ~50% accuracy
If At1g27595>100.87 and At3g49000>68.13 and At2g40475>55.96 Predict
germination
If At4g34710>349.67 and At4g37760>150.75 and At1g30135>17.66 Predict
germination
If At3g03050>37.90 and At2g20630>96.01 and At3g02885>9.66 Predict germination
If At5g54910>45.03 and At4g18975>16.74 and At3g28910>52.76 and At1g48320>56.80
Predict germination
Everything else Predict dormancy
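To make the semantics of this rule set concrete, here is a minimal sketch that encodes the four rules above and applies them to a sample. The gene identifiers and thresholds are copied from the slide; the function name and the choice to treat a missing gene as failing its condition are assumptions.

```python
# Each rule maps gene identifier -> threshold; every condition has the form "gene > threshold".
GERMINATION_RULES = [
    {"At1g27595": 100.87, "At3g49000": 68.13, "At2g40475": 55.96},
    {"At4g34710": 349.67, "At4g37760": 150.75, "At1g30135": 17.66},
    {"At3g03050": 37.90, "At2g20630": 96.01, "At3g02885": 9.66},
    {"At5g54910": 45.03, "At4g18975": 16.74, "At3g28910": 52.76, "At1g48320": 56.80},
]


def predict(expression):
    """expression: dict of gene identifier -> expression level for one sample."""
    for rule in GERMINATION_RULES:
        # A gene missing from the sample fails its condition (assumption).
        if all(expression.get(gene, float("-inf")) > threshold
               for gene, threshold in rule.items()):
            return "germination"
    return "dormancy"  # default rule: everything else
```

For example, a hypothetical sample with At1g27595=120, At3g49000=70 and At2g40475=60 satisfies the first rule and is predicted as germination.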
13. Identifying regulators
Rule building process is stochastic
Generates different rule sets each time the system is
run
But if we run the system many times, we can see
some patterns in the rule sets
Some genes appear much more frequently than the rest
Some associated to dormancy
Some associated to germination
We generated 10K rule sets for each outcome
Rules predicted one of the two outcomes
Default rule captured the other
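Ranking genes by how often they appear across many rule sets can be sketched as below. The data layout (a rule set is a list of rules, a rule is a collection of gene names) and the choice to count a gene once per rule are assumptions made for illustration.

```python
from collections import Counter


def gene_frequencies(rule_sets):
    """Count, over all rule sets, the number of rules in which each gene appears."""
    counts = Counter()
    for rule_set in rule_sets:
        for rule in rule_set:
            counts.update(set(rule))  # a gene counts once per rule
    return counts
```

Calling `gene_frequencies(rule_sets).most_common(20)` would then list the 20 most frequently used genes.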
15. Generating co-prediction networks of
interactions
• For each of the rules shown before to be
true, all of the conditions in it need to be
true at the same time
– Each rule is expressing an interaction between
certain genes
• From a high number of rule sets we can
identify pairs of genes that co-occur with
high frequency and generate functional
networks with a methodology coined as co-
prediction
• The network shows a different topology when
compared to other types of network
construction methods (e.g. by gene
co-expression)
• Different regions in the network contain the
germination and dormancy genes.
• Other visualisations providing the big picture
exist (Urbanowicz et al., 2012)
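The co-prediction idea can be sketched as counting how often two genes appear in the same rule and keeping the frequent pairs as network edges. The function name and the `min_count` cut-off are hypothetical; the slides do not state how the frequency threshold was chosen.

```python
from collections import Counter
from itertools import combinations


def co_prediction_edges(rule_sets, min_count=2):
    """Return {(gene_a, gene_b): count} for pairs co-occurring in at least min_count rules."""
    pair_counts = Counter()
    for rule_set in rule_sets:
        for rule in rule_set:
            # Sorting gives each unordered pair a single canonical key.
            for a, b in combinations(sorted(set(rule)), 2):
                pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}
```

The resulting dictionary is an edge list with weights, which can be passed to any graph library or network visualisation tool.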
16. Experimental validation
We have experimentally verified this analysis
By ordering and planting knockouts for the highly ranked
genes
We have been able to identify four new regulators of
germination, with phenotypes different from the wild type
17. Same analysis. Different datasets
• We applied the same principle to three cancer
datasets from the literature (E. Glaab et al., PLoS
ONE (2012) 7(7):e39932)
• We checked PubMed to see if the genes linked
together in BioHEL’s rules appeared together in
the literature
• We used Point-Wise Mutual Information (PMI) to
check that the genes are not linked together in
the literature merely by chance
• Compared the PMI scores of the highly ranked
pairs of genes with random pairs
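For reference, PMI compares how often two genes are mentioned together with how often that would happen if their mentions were independent: PMI(a, b) = log( P(a, b) / (P(a) P(b)) ). A minimal sketch from article counts follows; the function name, the base-2 logarithm and the count-based probability estimates are assumptions, not details taken from the study.

```python
import math


def pmi(n_a, n_b, n_ab, n_total):
    """PMI from counts: articles mentioning a, mentioning b, mentioning both, and all articles."""
    if n_ab == 0:
        return float("-inf")  # the pair never co-occurs
    # P(a,b) / (P(a) * P(b)) simplifies to n_ab * n_total / (n_a * n_b).
    return math.log2((n_ab * n_total) / (n_a * n_b))
```

A score of 0 means the pair co-occurs exactly as often as chance predicts; positive scores indicate genes that are mentioned together more often than chance.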
19. And to lots of other datasets!
• These datasets were generated using transcriptomics
technology
– Looks at RNA
• There are lots of other –omics (hundreds of them)
– Proteomics
– Lipidomics
– Metabolomics
– Next-generation sequencing
• Each –omics requires specific preprocessing, but the
learning and knowledge extraction process is exactly
the same
• Lots of datasets out there
20. Another example different from -omics
• Protein Structure Prediction aims to predict the 3D
structure of a protein based on its primary sequence
21. Prediction types of PSP
• There are several kinds of prediction problems within
the scope of PSP
– The main one, of course, is to predict the 3D coordinates
of all atoms of a protein (or at least the backbone) based
on its primary sequence
– There are many structural properties of individual residues
within a protein that can be predicted, for instance:
• The secondary structure state of the residue
• If a residue is buried in the core of the protein or exposed in the
surface
– Accurate predictions of these sub-problems can simplify
the general 3D PSP problem
22. Contact Map prediction
• Prediction, for each pair of residues in a
protein, whether these residues are in
contact (have a small distance between
them in the 3D structure) or not
• This problem can be represented by a
binary matrix. 1= contact, 0 = non
contact. Plotting this matrix reveals the
main traits in the protein structure
• Very sparse: fewer than 2% of residue pairs
are in contact in native structures
• Training sets easily reach millions of
residue pairs
• Our method was one of the top
predictors in the last two editions of the
CASP competition (actually, the best
sequence-based predictor in last CASP)
[Figure: contact map plot, with helices and sheets labelled]
(Bacardit et al., Bioinformatics (2012) 28 (19): 2441-2448)
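The binary contact matrix can be sketched as below, computed from known 3D coordinates with one point per residue. The 8 Å distance threshold is a commonly used convention and is an assumption here; the slide only says "a small distance".

```python
import math


def contact_map(coords, threshold=8.0):
    """coords: list of (x, y, z) tuples, one per residue. Returns a symmetric 0/1 matrix."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1  # 1 = contact, 0 = non-contact
    return cmap
```

For a protein of n residues the matrix has n(n-1)/2 distinct pairs, which is why training sets built from many proteins quickly reach millions of examples.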
23. Steps for CM prediction
1. Prediction of
– Secondary structure (using PSIPRED)
– Solvent Accessibility, Recursive Convex Hull and Coordination Number (using BioHEL [Bacardit et al., 09])
2. Integration of all these predictions plus other
sources of information
3. Final CM prediction (using BioHEL)
24. Characterisation of the contact map
problem
Three types of input information were used
1. Detailed information of three different windows of
residues centered around
The two target residues (2x)
The middle point between them
2. Information about the connecting segment between the
two target residues and
3. Global protein information.
25. Samples and ensembles
• Training set
– Contained 32 million pairs of AA and 631 attributes (+60GB of disk space)
• Samples (x50)
– 50 samples of 660K examples are generated from the training set with a ratio of 2:1 non-contacts/contacts
• Rule sets (x25)
– BioHEL is run 25 times for each sample
• Predictions
– Prediction is done by a consensus of 1250 rule sets
– Confidence of prediction is computed based on the votes distribution in the ensemble
• Whole training process took about 25K CPU hours
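The sampling and voting scheme can be sketched as follows: 50 samples times 25 runs gives the 1250 rule sets, each sample keeps a 2:1 ratio of non-contacts to contacts, and the ensemble predicts by vote. The function names are hypothetical, and taking the confidence to be the fraction of votes for the winning class is an assumption; the slides only say it is based on the votes distribution.

```python
import random

N_SAMPLES = 50
RUNS_PER_SAMPLE = 25
N_RULE_SETS = N_SAMPLES * RUNS_PER_SAMPLE  # 1250 rule sets in the ensemble


def draw_sample(contacts, non_contacts, n_contacts, ratio=2, seed=0):
    """Draw n_contacts contacts and ratio * n_contacts non-contacts without replacement."""
    rng = random.Random(seed)
    return rng.sample(contacts, n_contacts) + rng.sample(non_contacts, ratio * n_contacts)


def consensus(votes):
    """votes: one 0/1 prediction per rule set. Returns (prediction, confidence)."""
    if not votes:
        raise ValueError("at least one vote is required")
    fraction = sum(votes) / len(votes)  # fraction of rule sets voting "contact"
    prediction = 1 if fraction >= 0.5 else 0
    confidence = fraction if prediction == 1 else 1 - fraction
    return prediction, confidence
```

With this definition, a residue pair for which 1000 of the 1250 rule sets vote "contact" is predicted as a contact with confidence 0.8.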
26. Knowledge extraction in contact map
prediction
• Basic analysis is exactly the same
Frequent attributes
Frequent pairs of
attributes
27. But analysis can be much more refined
• Because the representation has a very clear structure
and we have lots of domain knowledge
• For instance, there are several ways to aggregate the
ranks of individual attributes based on characteristics
from the representation/domain
Ranks aggregated by
source of information
Ranks aggregated by
amino acid type
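One simple way to aggregate attribute-level results by a domain characteristic is sketched below. It sums per-attribute frequency counts within each group and sorts the groups; whether the original analysis summed counts or combined ranks in another way is not stated, so treat this as an assumption. The attribute names in the usage note are hypothetical.

```python
from collections import defaultdict


def aggregate_by_group(attribute_counts, group_of):
    """Sum per-attribute counts within each group and return groups sorted by total."""
    totals = defaultdict(int)
    for attribute, count in attribute_counts.items():
        totals[group_of(attribute)] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For instance, with hypothetical attribute names such as "SS_left" and "SA_left", passing `lambda name: name.split("_")[0]` as `group_of` aggregates by source of information.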
29. The knowledge extraction can be
much more refined
• We just looked at what attributes appear in the
rules, but not yet at the shape of the predicates
• Sometimes biasing the representation helps
generate knowledge that is more useful to the
domain experts
– In the experiments with the seed data BioHEL was
constrained to generate only predicates “Att>X”
– But we always have to be careful when introducing
bias
30. Is the knowledge real?
• Data is far from perfect, lots of spurious peaks
• Probably many of the edges in the network are false
positives
• Strategies for filtering the knowledge
– Classic blind feature selection?
– Contrast the knowledge with databases of curated
information about the genes/interactions
• Some of these are quite pricy!
• Or we need strong text mining skills
– Careful balance is needed, we don’t want to filter true
positives
– Using expert knowledge to bias the learning process (Moore
& White, 2006)
31. Modelling the ML problem
• Datasets annotated as “case/controls” are easy
• What happens with N>2 labels?
– Tricky for decision lists, as there is an implicit overlap
between rules
• What happens with continuous annotations?
– There are similar examples in the literature using
model trees (Nepomuceno-Chamorro et al., 2010)
• What happens when the annotation is a time
course?
– Ordinal classification problem
32. References
• BioHEL
– Improving the scalability of rule-based evolutionary learning. J. Bacardit, E.K.
Burke and N. Krasnogor. Memetic Computing journal 1(1):55-67, 2009
– Speeding Up the Evaluation of Evolutionary Learning Systems using GPGPUs.
M. Franco, N. Krasnogor and J. Bacardit. In Proceedings of the 12th Annual
Conference on Genetic and Evolutionary Computation (GECCO2010), 1039-
1046, ACM Press, 2010
– Modelling the Initialisation Stage of the ALKR Representation for Discrete
Domains and GABIL Encoding. M. Franco, N. Krasnogor and J. Bacardit. In
Proceedings of the 13th Annual Conference on Genetic and Evolutionary
Computation - GECCO2011, pages 1291-1298. ACM, 2011
– Post-processing Operators for Decision Lists. M. Franco, N. Krasnogor and J.
Bacardit. In Proceedings of the 14th Annual Conference on Genetic and
Evolutionary Computation - GECCO2012, pages 847-854. ACM, 2012
– Analysing BioHEL using challenging boolean functions. M. Franco, N.
Krasnogor and J. Bacardit. Evolutionary Intelligence, 5(2):87-102, June 2012
33. References
• Knowledge extraction and visualisation
– Prediction of Recursive Convex Hull Class Assignments for Protein Residues. Stout, M.,
Bacardit, J., Hirst, J.D. and Krasnogor, N. Bioinformatics, 24(7):916-923, 2008
– Automated Alphabet Reduction for Protein Datasets. J. Bacardit, M. Stout, J.D. Hirst, A.
Valencia, R.E. Smith and N. Krasnogor. BMC Bioinformatics 10:6, 2009
– Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on
Large-Scale Data Sets. George W. Bassel, Enrico Glaab, Julietta Marquez, Michael J.
Holdsworth and Jaume Bacardit. The Plant Cell, 23(9):3101-3116, 2011
– E. Glaab, J. Bacardit, J.M. Garibaldi and N. Krasnogor. Using Rule-Based Machine
Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer
Gene Expression Data. PLoS ONE 7(7):e39932. 2012. doi:10.1371/journal.pone.0039932
– J. Bacardit, P. Widera, A. Márquez-Chamorro, F. Divina, J.S. Aguilar-Ruiz and Natalio
Krasnogor. Contact map prediction using a large-scale ensemble of rule sets and the
fusion of multiple predicted structural features. Bioinformatics (2012) 28 (19): 2441-
2448. doi:10.1093/bioinformatics/bts472
– HP Fainberg, K. Bodley, J. Bacardit, D. Li, F. Wessely, NP. Mongan, ME. Symonds, L. Clarke
and A. Mostyn, Reduced neonatal mortality in Meishan piglets: a role for hepatic fatty
acids? PLoS ONE, in press, 2012
34. References
• Related work
– Nepomuceno-Chamorro, I.A., Aguilar-Ruiz, J.S., and
Riquelme, J.C. (2010). Inferring gene regression networks
with model trees. BMC Bioinformatics 11: 517
– Moore, J. and White, B., Exploiting expert knowledge in
genetic programming for genome-wide genetic analysis,
Parallel Problem Solving from Nature-PPSN IX, pp. 969-
977, 2006
– R. J. Urbanowicz, A. Granizo-MacKenzie, and J. H. Moore.
Instance-linked attribute tracking and feedback for
michigan-style supervised learning classifier systems. In
GECCO ’12: Proceedings of the 14th annual conference on
Genetic and evolutionary computation , pages 927–934.
ACM Press, 2012
35. Acknowledgements
• Natalio Krasnogor
• Michael Holdsworth
• George Bassel
• Enrico Glaab
• Pawel Widera
• Maria Franco
• Anna Swan
• Hernan Fainberg
• EPSRC GR/T07534/01 & EP/H016597/1