The document discusses methods for outlier detection and diversity analysis of chemical datasets using R-nearest neighbor (R-NN) curves. It describes generating R-NN curves for molecules by calculating the number of neighbors within increasing hypersphere radii and characterizing curves through metrics like the radius of maximum slope. These methods allow visualization and comparison of molecules' local densities and can identify outliers. The document also discusses using locality sensitive hashing to enable R-NN curve analysis of very large datasets in sublinear time.
1. Outlier Detection Using R-NN Curves
Searching for HIV Integrase Inhibitors
Summary
Navigating Molecular Haystacks:
Tools & Applications
Finding Interesting Molecules & Doing it Fast
Rajarshi Guha
Department of Chemistry
Pennsylvania State University
20th April, 2006
2. Past Work
QSAR Applications
Artemisinin analogs
PDGFR inhibitors
Bleaching agents
Linear & non-linear methods
QSAR Methods
Representative QSAR sets
Model interpretation
Model applicability
3. Past Work
Cheminformatics
Automated QSAR pipeline
Contributions to the CDK
Cheminformatics webservices
R packages and snippets
4. Past Work
Chemical Data Mining
Approximate k-NN
Local regression
Outlier detection
VS protocol
6. Outline
1 Outlier Detection Using R-NN Curves
Methods for Diversity Analysis
Generating & Summarizing R-NN Curves
Using R-NN Curves
2 Searching for HIV Integrase Inhibitors
Previous Work on HIV Integrase
A Tiered Virtual Screening Protocol
What Does the Pipeline Give Us?
3 Summary
8. Diversity Analysis
Why is it Important?
Compound acquisition
Lead hopping
Knowledge of the distribution of compounds in a descriptor
space may improve predictive models
9. Approaches to Diversity Analysis
Cell based
Divide space into bins
Compounds are mapped to bins
Disadvantages
Not useful for high dimensional data
Choosing the bin size can be tricky
Schnur, D.; J. Chem. Inf. Comput. Sci. 1999, 39, 36–45
Agrafiotis, D.; Rassokhin, D.; J. Chem. Inf. Comput. Sci. 2002, 42, 117–12
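The cell-based idea on this slide can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `cell_occupancy`, the equal-width bins, and the `bins_per_axis` parameter are hypothetical choices, not taken from the cited papers.

```python
import numpy as np
from collections import Counter

def cell_occupancy(X, bins_per_axis):
    """Map each compound (row of X) to a grid cell and count compounds per cell."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant descriptors
    # Scale each descriptor to [0, 1], then to integer cell indices
    idx = np.floor((X - lo) / span * bins_per_axis).astype(int)
    # The per-axis maximum would land in bin `bins_per_axis`; fold it into the last bin
    idx = np.clip(idx, 0, bins_per_axis - 1)
    return Counter(tuple(int(i) for i in row) for row in idx)

# Four compounds in a 2-descriptor space, 2 bins per axis (4 cells in total)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 0.8], [1.0, 1.0]])
cells = cell_occupancy(X, bins_per_axis=2)
```

The number of cells is `bins_per_axis ** n_descriptors`, so even 2 bins per axis in a 64-descriptor space gives 2^64 cells, nearly all of them empty. This is one way to see why the slide lists high dimensional data as a disadvantage.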
10. Approaches to Diversity Analysis
Distance based
Considers distance between compounds in a space
Generally requires pairwise distance calculation
Can be sped up by kD trees, MVP trees etc.
Agrafiotis, D. K.; Lobanov, V. S.; J. Chem. Inf. Comput. Sci. 1999, 39, 51–58
11. Generating an R-NN Curve
Observations
Consider a query point with a hypersphere, of radius R,
centered on it
For small R, the hypersphere will contain very few or no
neighbors
For larger R, the number of neighbors will increase
When R ≥ Dmax , the neighbor set is the whole dataset
The question is . . .
Does the variation of nearest neighbor count with radius allow us
to characterize the location of a query point in a dataset?
13. Generating an R-NN Curve
Algorithm
Dmax ← max pairwise distance
for molecule in dataset do
    R ← 0.01 × Dmax
    while R ≤ Dmax do
        Find NN's within radius R
        Increment R
    end while
end for
Plot NN count vs. R
[Figure: scatter plot of a dataset with radii labeled 5%, 10%, 30% and 50%]
Guha, R.; Dutta, D.; Jurs, P.C.; Chen, T.; J. Chem. Inf. Model., submitted
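The loop on this slide can be sketched in Python as follows. This is a minimal brute-force illustration, not the authors' implementation: the function name `rnn_curves`, the `n_steps` parameter, the Euclidean metric, and the choice to exclude the query molecule from its own neighbor count are all assumptions.

```python
import numpy as np

def rnn_curves(X, n_steps=100):
    """Return the radii and, per molecule, the neighbor count at each radius."""
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix: O(n^2) time and memory
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    d_max = D.max()
    # Radii from Dmax / n_steps up to exactly Dmax (1% steps when n_steps=100)
    radii = np.linspace(d_max / n_steps, d_max, n_steps)
    # counts[i, j] = number of other molecules within radii[j] of molecule i;
    # the -1 removes the molecule itself (distance 0)
    counts = (D[:, :, None] <= radii[None, None, :]).sum(axis=1) - 1
    return radii, counts

# Toy 1-descriptor dataset: three close points and one isolated point
X = np.array([[0.0], [1.0], [2.0], [10.0]])
radii, counts = rnn_curves(X)
```

Plotting `counts[i]` against `radii` gives the R-NN curve for molecule `i`. In this toy data the isolated point at 10.0 keeps a count of zero until R reaches 8, while the clustered points pick up neighbors almost immediately.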
14. Generating an R-NN Curve
[Figure: two R-NN curves, Number of Neighbors (0 to 250) vs. R (0 to 100); panels: Sparse, Dense]
15. Characterizing an R-NN Curve
Converting the Plot to Numbers
Since R-NN curves are sigmoidal, fit them to the logistic
equation
$$NN = a \cdot \frac{1 + m\,e^{-R/\tau}}{1 + n\,e^{-R/\tau}}$$
m, n should characterize the curve
Problems
Two parameters
Non-linear fitting is dependent on the starting point
For some starting points, the fit does not converge and
requires repetition
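As a quick sanity check on the logistic form on this slide (a worked limit added here, not taken from the slides, and assuming $\tau > 0$), the two ends of the curve are:

$$NN(0) = a \cdot \frac{1 + m}{1 + n}, \qquad \lim_{R \to \infty} NN(R) = a$$

So $a$ sets the plateau height, which for an R-NN curve corresponds to the neighbor count reached once R ≥ Dmax, while $m$ and $n$ together fix the starting value and the shape of the rise.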
16. Characterizing an R-NN Curve
Converting the Plot to a Number
Determine the value of R where the lower tail transitions to the linear portion of the curve
Solution
Determine the slope at various points on the curve
Find R for the first occurrence of the maximal slope (Rmax(S))
Can be achieved using a finite difference approach
[Figure: curve annotated with S' and S''; labels: S = 0]
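A minimal sketch of the finite difference step is shown below. The slide does not say which difference scheme was used, so the forward difference, the function name `r_max_slope`, and the convention of reporting the left end of the steepest interval are assumptions.

```python
import numpy as np

def r_max_slope(radii, counts):
    """Radius at the first occurrence of the maximal slope of an R-NN curve."""
    radii = np.asarray(radii, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Forward finite difference: slope over each consecutive pair of radii
    slopes = np.diff(counts) / np.diff(radii)
    # np.argmax returns the first index at which the maximum occurs
    return radii[np.argmax(slopes)]

# Flat lower tail, then a linear rise, then a plateau
r_star = r_max_slope([1, 2, 3, 4, 5], [0, 0, 3, 6, 6])  # 2.0
```

Taking the first occurrence matters because the linear portion of a sigmoidal curve can hold the same maximal slope over several steps, as in the toy curve above.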
18. Characterizing Multiple R-NN Curves
Problem
Visual inspection of curves is useful for a few molecules
For larger datasets we need to summarize R-NN curves
Solution
Plot Rmax(S) values for each molecule in the dataset
Points at the top of the plot are located in the sparsest regions
Points at the bottom are located in the densest regions
[Figure: Rmax(S) (0 to 80) vs. Serial Number (0 to 250)]
19. How Can We Use It For Large Datasets?
Breaking the O(n^2) barrier
Traditional NN detection has a time complexity of O(n^2)
Modern NN algorithms such as kD-trees
have lower time complexity
restricted to the exact NN problem
Solution is to use approximate NN algorithms such as Locality Sensitive Hashing (LSH)
Bentley, J.; Commun. ACM 1980, 23, 214–229
Datar, M. et al.; SCG ’04: Proc. 20th Symp. Comp. Geom.; ACM Press, 2004
Dutta, D.; Guha, R.; Jurs, P.; Chen, T.; J. Chem. Inf. Model. 2006, 46, 321–333
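To make the LSH idea concrete, here is a small sketch of the p-stable hashing scheme for Euclidean distance described by Datar et al., where each hash is h(v) = floor((a·v + b) / w) with Gaussian `a` and uniform `b`. It is not the implementation benchmarked in these slides: the class name `L2LSH` and the values of `w`, `n_hashes`, and `n_tables` are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

class L2LSH:
    """Toy locality sensitive hash index for approximate R-NN queries."""

    def __init__(self, dim, w=4.0, n_hashes=4, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w
        # One Gaussian projection and one uniform offset per hash function
        self.A = rng.normal(size=(n_tables, n_hashes, dim))
        self.b = rng.uniform(0.0, w, size=(n_tables, n_hashes))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _keys(self, v):
        # h(v) = floor((a . v + b) / w); n_hashes values form one bucket key per table
        h = np.floor((self.A @ v + self.b) / self.w).astype(int)
        return [tuple(row.tolist()) for row in h]

    def add(self, v):
        v = np.asarray(v, dtype=float)
        idx = len(self.points)
        self.points.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(idx)

    def query(self, q, radius):
        q = np.asarray(q, dtype=float)
        candidates = set()
        # Only points sharing a bucket with q in some table are examined
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, ()))
        # Exact distance check on the (hopefully small) candidate set
        return sorted(i for i in candidates
                      if np.linalg.norm(self.points[i] - q) <= radius)

index = L2LSH(dim=2)
for p in [[0.0, 0.0], [0.1, 0.0], [50.0, 50.0]]:
    index.add(p)
hits = index.query([0.0, 0.0], radius=1.0)
```

The query never scans the whole dataset, which is where the speedup comes from. The trade-off is that a true neighbor can be missed when it shares no bucket with the query in any table. Using more tables reduces that risk at the cost of memory.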
20. How Can We Use It For Large Datasets?
Why LSH?
Theoretically sublinear
Shown to be 3 orders of magnitude faster than traditional kNN
Very accurate (> 94%)
[Figure: log [Mean Query Time (sec)] vs. Radius for LSH and kNN, each with 142 and 20 descriptors. Comparison of NN detection speed on a 42,000 compound dataset using a 200 point query set]
Dutta, D.; Guha, R.; Jurs, P.; Chen, T.; J. Chem. Inf. Model. 2006, 46, 321–333
21. Alternatives?
Why not use PCA?
R-NN curves are fundamentally a form of dimension reduction
Principal Components Analysis is also a form of dimension reduction
Disadvantages
Eigendecomposition via SVD is O(n^3)
Difficult to visualize more than 2 or 3 PC's at the same time
[Figure: Principal Component 2 vs. Principal Component 1]
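For comparison, the PCA alternative discussed on this slide can be sketched in a few lines. The function name `pca_scores` and the random stand-in data are assumptions for illustration, and no descriptor scaling is applied here.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each descriptor
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))  # stand-in for a molecules x descriptors matrix
scores = pca_scores(X)
```

A scatter plot of `scores[:, 0]` against `scores[:, 1]` gives the kind of two-component view the slide refers to. Anything captured only by later components is invisible in that plot.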
22. Datasets
Boiling point dataset
277 molecules
Average: MW = 115, S_Tanimoto = 0.20
Calculated 214 descriptors, reduced to 64
Kazius-AMES dataset
4337 molecules
Average: MW = 240, S_Tanimoto = 0.21
Known to have a number of significant outliers
Calculated 142 MOE descriptors, reduced to 45
Goll, E.; Jurs, P.; J. Chem. Inf. Comput. Sci. 1999, 39, 974–983
Kazius, J.; McGuire, R.; Bursi, R.; J. Med. Chem. 2005, 48, 312–320
23. Choosing a Descriptor Space
Boiling point dataset
We had previously generated linear regression models using a
genetic algorithm (GA) to search for descriptor subsets
Best model had 4 descriptors
Kazius-AMES dataset
No models were generated for this dataset
Considered random descriptor subsets
Used a 5-descriptor subset to represent the results
The R-NN curve approach focuses on the distribution of molecules
in a given descriptor space. Hence a good or random descriptor
subset should be able to highlight the utility of the method