Discusses using a definitive screening design to characterize and optimize a glycoprofiling method, and compares the definitive screening results to those from a much larger central composite design.
Use of Definitive Screening Designs to Optimize an Analytical Method
1. Applications of Designed Experiments in the Development of an
Analytical Method for Glycoprofiling
Eliza Yeung, Ph.D.
R&D Strategic Projects Manager/
PC Study Director
Cytovance Biologics, Inc
800 Research Parkway, Ste 200
Oklahoma City, OK, USA
eyeung@cytovance.com
Philip J. Ramsey, Ph.D.
Department of Math. and Stat.
North Haven Group, consultancy
University of New Hampshire
Durham, NH, USA
Cell: 1-315-3518
philip.ramsey@unh.edu
2. Glycoproteins are the largest group of biologically-derived drugs.
ICH Q6B guidelines require extensive physicochemical characterization of biopharmaceuticals, including the inherent structural heterogeneity due to glycosylation (a post-translational modification); lot-to-lot consistency is also required.
Carbohydrate content.
Carbohydrate chain structure.
Oligosaccharide pattern
(antennary profile).
Glycosylation site.
Currently there is no universal technique for glycosylation analysis.
[Figure: common types of N-glycans - high-mannose and complex types (biantennary, triantennary, tetraantennary). USP General Chapter <1084>.]
3. The goal of the research is to develop a robust and cost-effective method to characterize glycoproteins.
High Performance Anion Exchange Chromatography with pulsed amperometric detection (HPAE-PAD) was selected as a potential method for characterization.
The goal is to express glycan peaks as a function of an in-house glucose ladder (GU).
An experimental design approach was selected to characterize and optimize (make robust) the HPAE-PAD method.
[Figure: glucose ladder chromatogram, Response vs. Time (min), 0-50 min; peaks labeled GU=1 through GU=11.]
4. The approach uses the GU ladder as a reference to identify glycoform
peaks from an actual human antibody sample.
Glycoform glycan   G-Unit
A                  3.59
B                  3.89
C                  4.23
D                  4.42
E                  9.17
F                  10.8
[Figure: overlay of human IgG and glucose ladder chromatograms, Response vs. Time (min), 0-50 min; glycoform peaks A-F are identified against GU=1 through GU=11, with neutral and charged regions marked.]
5. Sources of variation in glycoprofiling and analytical target profile.
Analytical Target Profile (ATP)
• A good separation of neutral & charged glycoform glycans in a single run
• Consistent glycan peak assignment
6. Method development workflow:
- Test biomolecule glycans are released by PNGase F digestion of a glycoprotein.
- HPAE-PAD glycoprofiling of the test biomolecule glycans (each glycan is expressed as a GU, glucose unit).
- Define the responses of interest and the factors for the DOE (e.g. resolution between targeted GUs).
- Use the JMP statistical software to design the experiment (DOE).
- Use the glucose ladder for the DOE study.
- Use JMP to select models for each response.
- Use JMP to optimize all responses of interest.
7. A total of 8 runs using a human IgG sample were performed to demonstrate the stability, or run-to-run robustness, of the HPAE-PAD method using the glucose ladder; some of the runs were performed on separate days.
The table of results below shows that the identified glycoform peaks in the IgG sample consistently eluted with respect to the GU peaks.
Glycoform   G-Unit   Std Dev   %CV
A            3.59     0.07     1.9%
B            3.89     0.09     2.4%
C            4.23     0.08     1.9%
D            4.42     0.08     1.7%
E            9.17     0.09     0.9%
F           10.8      0.12     1.1%
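The run-to-run statistics above are simple summary calculations; a minimal sketch of the arithmetic, using hypothetical replicate GU values for one glycoform peak (gu_runs is illustrative, not the study data):

```python
# %CV = sample standard deviation / mean, expressed as a percentage.
import numpy as np

gu_runs = np.array([3.52, 3.61, 3.66, 3.55, 3.49, 3.68, 3.58, 3.63])  # 8 runs

mean_gu = gu_runs.mean()
sd_gu = gu_runs.std(ddof=1)          # sample (n-1) standard deviation
cv_pct = 100.0 * sd_gu / mean_gu

print(f"G-Unit mean = {mean_gu:.2f}, Std Dev = {sd_gu:.2f}, %CV = {cv_pct:.1f}%")
```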
8. Five factors were selected for manipulation in the experiment.
Factor (level)                  -1      0       1
Initial %NaOAc (% A)             0      10      20
Initial %NaOH (% B)             30      40      50
Gradient_01 (mM NaOAc/min)*      0.415  1.25    2.085
Gradient_02 (mM NaOAc/min)*      1.25   2.085   2.915
Gradient_03 (mM NaOAc/min)*      4.72   5.555   6.39
* Gradient_01, _02, and _03 are %A (500 mM NaOAc) increases over 12 min, 12 min, and 18 min respectively, at constant initial %B (200 mM NaOH, 10 mM NaOAc). The values are expressed as mM NaOAc per min.
9. Seven responses were chosen for optimization in the experiment.
Response      Description          Optimization
RT_G03        Retention Time       Target ~ 8.5 min
Resol_G03     Resolution G03-G04   Maximize
Resol_G04     Resolution G04-G05   Maximize
Resol_G05     Resolution G05-G06   Maximize
Resol_G09     Resolution G09-G10   Maximize
Resol_G10     Resolution G10-G11   Maximize
USP Tailing   USP Tailing G04      Monitor (0.8-1.2)
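Targets of the "Maximize" and "Target" type are typically combined into a single objective with desirability functions, the approach JMP's profiler uses. The sketch below is a minimal Derringer-Suich style illustration, not JMP's implementation; the bounds and the predicted values rt_g03 and resol_g03 are hypothetical.

```python
# Combine a target-is-best response and a larger-is-better response into one
# overall desirability via the geometric mean.
import numpy as np

def d_target(y, low, target, high):
    """Desirability for a target-is-best response (e.g. RT_G03 ~ 8.5 min)."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)

def d_maximize(y, low, high):
    """Desirability for a larger-is-better response (e.g. a resolution)."""
    return float(np.clip((y - low) / (high - low), 0.0, 1.0))

# Hypothetical predicted responses at one candidate factor setting.
rt_g03, resol_g03 = 8.6, 8.4
d = [d_target(rt_g03, 7.5, 8.5, 9.5), d_maximize(resol_g03, 4.0, 10.0)]

overall = float(np.prod(d)) ** (1.0 / len(d))   # geometric mean
print(f"individual d = {[round(x, 3) for x in d]}, overall D = {overall:.3f}")
```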
10. There are many choices for the experimental design, and it was expected that nonlinear and interaction effects would occur.
Below is a table of commonly used types of designs.
The number of runs displayed assumes 3 replicate center points for the DSD and Fractional Factorial, and 2 for the CCD.
No. of Factors   Definitive Screening Design   Screening Design (e.g. Fractional Factorial)   Response Surface (e.g. Central Composite Design)
4                12                            11                                              26
5                16                            19                                              28
6                16                            19 or 35                                        46
7                20                            19 or 35                                        80
8                20                            19 or 35                                        82
11. A new class of screening designs was developed by Jones and Nachtsheim (2011a, 2011b), referred to as Definitive Screening Designs (DSD); the designs have been enhanced by Xiao (2012).
For K factors, a DSD requires 2K+1 runs if K is even and 2K+3 runs if K is odd (the extra runs ensure main effect orthogonality).
All factors are run at three levels in a factorial arrangement.
Main effects are orthogonal and free of aliasing (partial or full) with quadratic effects and two-way interaction effects.
No quadratic or two-way interaction effect is fully aliased with another quadratic or two-way interaction effect.
It is possible to estimate every term of a full quadratic model, but not in a single model.
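These properties follow from the fold-over structure of a DSD: the runs are the rows of a conference matrix C (satisfying C'C = (n-1)I), the rows of -C, and a single center run. Below is a minimal sketch (not JMP's generator) that builds the 13-run, K = 5 design this way, using the order-6 Paley conference matrix and dropping one column per the odd-K rule above, and then verifies the orthogonality claims.

```python
# Build a Definitive Screening Design from a conference matrix and verify
# the aliasing properties quoted above. Runs = rows of C, rows of -C, and
# one all-zero center run; every factor appears at three levels (-1, 0, 1).
import numpy as np
from itertools import combinations

def paley_conference_matrix_6():
    chi = {1: 1, 4: 1, 2: -1, 3: -1}            # quadratic character mod 5
    C = np.zeros((6, 6), dtype=int)
    C[0, 1:] = 1
    C[1:, 0] = 1
    for i in range(5):
        for j in range(5):
            if i != j:
                C[i + 1, j + 1] = chi[(j - i) % 5]
    return C

C = paley_conference_matrix_6()
assert np.array_equal(C.T @ C, 5 * np.eye(6, dtype=int))   # conference property

X = np.vstack([C, -C, np.zeros((1, 6), dtype=int)])  # 13 runs, 6 factors
X = X[:, :5]                                          # drop a column: K = 5 (odd)

# Main effects are mutually orthogonal...
assert np.array_equal(X.T @ X, 10 * np.eye(5, dtype=int))
# ...and unaliased with every two-factor interaction and quadratic effect,
# because each interaction/quadratic column is invariant under the C -> -C
# sign flip while every main-effect column changes sign.
for a, b in combinations(range(5), 2):
    inter = X[:, a] * X[:, b]
    assert all(inter @ X[:, j] == 0 for j in range(5))
for k in range(5):
    quad = X[:, k] ** 2
    assert all(quad @ X[:, j] == 0 for j in range(5))

print("13-run DSD for 5 factors:")
print(X)
```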
12. Although the DSD is thought of as a screening design, it can just as easily be thought of as a very efficient response surface design.
Traditional response surface designs, such as the Central Composite Design, are necessarily large because they support estimation of the full quadratic model.
Full quadratic: a model that contains all main effects, all quadratic effects, and all possible two-factor interactions.
For K factors, the number of effects N in the full quadratic model (excluding the intercept) is
    N = 2K + K(K-1)/2
In practice, effect sparsity almost always holds, so it is not necessary to estimate all N terms; in fact, often fewer than 50% are actually important.
13. It was decided to perform both the Definitive Screening Design (DSD) and the larger Central Composite Design (CCD) in parallel, to determine how well the DSD performed compared to the much larger CCD.
Our focus in this talk is primarily on the DSD, but the CCD results will be used as a basis of comparison.
For a set of p potential experimental effects there are 2^p possible models to consider - a rather daunting challenge.
For K = 5 factors there are 20 possible experimental effects in the full quadratic model, resulting in a total of 2^20 = 1,048,576 possible models of varying size in terms of included effects.
Fortunately, only a subset of the 20 potential effects is likely to be important, so our model sizes are considerably smaller than 20.
14. In building predictive models we have two competing issues:
- Under-fitting the model, resulting in biased or inaccurate prediction;
- Over-fitting the model, resulting in inflated prediction error.
Although the classic approach to the under- and over-fitting problem is to find a single best compromise model, this is not necessarily an optimal strategy.
In no way can one consider a single model to be correct; a correct model only exists in a simulation study.
Modern computing power and statistical algorithms allow us to look for models, or a combination of models, that best predict the behavior of the physical system.
15. Two widely accepted measures of fit for a model are:
- AICc = bias-corrected Akaike Information Criterion;
- BIC = Bayesian Information Criterion.
Both criteria punish under- and over-fitting, but in different ways, so they may not agree on the best model(s) - and they often do not (see Burnham and Anderson, 2002).
There is no consensus in the statistical community as to whether the AICc or BIC criterion is preferred; it almost surely depends on the application.
For both AICc and BIC, smaller values indicate better predictive models.
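For reference, the standard definitions behind these criteria (not shown on the slide): for a model with p estimated parameters, maximized likelihood L, and n observations,

    AIC  = -2 ln(L) + 2p
    AICc = AIC + 2p(p + 1) / (n - p - 1)
    BIC  = -2 ln(L) + p ln(n)

The AICc correction term grows as p approaches n, which is why AICc is the appropriate form for small-run designed experiments like the DSD; BIC's ln(n) penalty punishes added terms more heavily than AIC once n exceeds about 7.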
16. A major focus in contemporary statistics and science is research into model selection strategies, with no shortage of ideas and opinions.
We will focus on a straightforward approach referred to as All Possible Models, in which we compute all possible models up to a specified size.
As an example, we fit all one-factor models, all two-factor models, and so on.
Given that the DSD is supersaturated for the full quadratic model - that is, the number of possible effects p exceeds the number of runs n - the largest possible model is determined by the number of experimental settings in the DSD.
The strategy we use is to define a full quadratic model and then fit all possible models up to and including a specified size.
The fitted models are then sorted by AICc and BIC.
17. Instead of AICc and BIC directly, we use a form of them with nice theoretical properties for model selection.
It is best to rank models based on AICc differences, computed as
    Delta_i = AICc_i - AICc_min
where AICc_min is the smallest AICc among the candidate models.
It is also best to work with BIC differences when ranking models, where the differences are computed as
    B_i = BIC_i - BIC_min
Typically, models whose Delta_i or B_i exceed a range of 2 - 4 are excluded from further consideration - this is not a hard and fast rule.
We will use this strategy on the HPAE-PAD experiment; a sketch of the computation follows.
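Below is a minimal sketch of the All Possible Models ranking under Gaussian errors (illustrative only, not JMP's implementation; the data, the effect names, and the fit_all_models helper are hypothetical). For least squares fits, -2 ln(L) reduces to n ln(RSS/n) plus a constant, which is the form used here.

```python
# Enumerate all submodels up to max_size effects, fit each by least squares,
# and rank by the AICc and BIC differences defined above.
import numpy as np
from itertools import combinations

def fit_all_models(X, y, names, max_size):
    n = len(y)
    out = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(X.shape[1]), size):
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            p = size + 2                        # coefficients + intercept + sigma
            aic = n * np.log(rss / n) + 2 * p
            aicc = aic + 2 * p * (p + 1) / (n - p - 1)
            bic = n * np.log(rss / n) + p * np.log(n)
            out.append(([names[c] for c in cols], aicc, bic))
    return out

# Hypothetical demo: 16 runs, 5 candidate effects, 2 of them truly active.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 0.0, 1.0], size=(16, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0.0, 0.3, 16)

models = fit_all_models(X, y, ["x1", "x2", "x3", "x4", "x5"], max_size=3)
aicc_min = min(m[1] for m in models)
bic_min = min(m[2] for m in models)
for terms, aicc, bic in sorted(models, key=lambda m: m[1])[:5]:
    print(f"{terms}: delta_AICc = {aicc - aicc_min:.2f}, B = {bic - bic_min:.2f}")
```

Models whose delta values fall beyond the 2 - 4 range would then be dropped from the candidate set, exactly as described above.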
18. Below is a view of the DSD trials and the experimental results for the response.
The analyses are performed with the JMP statistical software.
19. Below is a plot of the delta AICc and BIC results that helps us identify potential candidate models; the response is RT_03.
AICc suggests models with about p = 5 effects, while BIC suggests models with around p = 7 to 10.
As expected, the two criteria do not agree as to the best models.
20. Using All Possible Models, we fit a number of models for each response and select the best models in terms of fit and prediction error.
Below is a typical model for RT_03.
22. Using function decomposition methods from applied mathematics, we can assess the relative importance of the input factors.
This is a unique capability in JMP 12.
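A generic version of this idea is variance-based (Sobol-style) importance: the first-order importance of factor j is Var(E[Y | x_j]) / Var(Y), evaluated on the fitted prediction function. The sketch below estimates these indices by brute-force Monte Carlo over the coded factor space; the predict function is a hypothetical stand-in, not the fitted RT_03 model, and this is not JMP's proprietary implementation.

```python
# Estimate first-order variance-based importance for each factor of a fitted
# prediction function over the coded [-1, 1] factor space.
import numpy as np

def predict(x):
    """Stand-in fitted model over 3 coded factors (illustrative only)."""
    return (8.5 - 0.6 * x[:, 0] + 0.9 * x[:, 1]
            + 0.4 * x[:, 0] * x[:, 1] + 0.3 * x[:, 2] ** 2)

def first_order_importance(f, k, n_outer=200, n_inner=2000, seed=0):
    rng = np.random.default_rng(seed)
    total_var = f(rng.uniform(-1, 1, size=(20_000, k))).var()
    imp = []
    for j in range(k):
        cond_means = []
        for xj in np.linspace(-1, 1, n_outer):        # sweep factor j
            x = rng.uniform(-1, 1, size=(n_inner, k))
            x[:, j] = xj                               # hold factor j fixed
            cond_means.append(f(x).mean())             # estimate E[Y | x_j]
        imp.append(np.var(cond_means) / total_var)     # Var_j(E[Y|x_j]) / Var(Y)
    return imp

for name, s in zip(["Initial %NaOAc", "Initial %NaOH", "Gradient_01"],
                   first_order_importance(predict, 3)):
    print(f"{name}: first-order importance ~ {s:.2f}")
```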
23. A similar modeling and optimization process was performed using the data from the much larger CCD experiment.
Below is a comparison table of the recommended optimum input settings for the two designs.
There is close agreement on the three most important inputs.
Input            DSD Optimum   CCD Optimum
Initial %NaOAc       1.20          1.55
Initial %NaOH       50.00         50.00
Gradient_01          1.23          0.42
Gradient_02          1.25          2.92
Gradient_03          4.87          4.72
24. Below are two Actual vs. Predicted plots based on the DSD models predicting the CCD responses; the CCD data serve as a validation data set.
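The validation arithmetic itself is simple: score the DSD-trained model's predictions against the held-out CCD observations. A minimal sketch with illustrative placeholder values (not the study data):

```python
# Compare model predictions against an independent validation design.
import numpy as np

y_ccd = np.array([8.4, 8.7, 8.2, 9.1, 8.8, 8.5])  # observed CCD responses (placeholder)
y_hat = np.array([8.5, 8.6, 8.3, 8.9, 8.9, 8.4])  # DSD-model predictions (placeholder)

rmse = float(np.sqrt(np.mean((y_ccd - y_hat) ** 2)))
r2 = 1.0 - np.sum((y_ccd - y_hat) ** 2) / np.sum((y_ccd - y_ccd.mean()) ** 2)
print(f"validation RMSE = {rmse:.2f}, R^2 = {r2:.2f}")
```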
25. The predicted responses at the optimum settings are also very close for the two designs, as the table below shows.
Response   DSD Optimum   CCD Optimum   2*(Std. Errors)
RT_03          8.50          8.50           0.60
Resol_03       8.38          8.68           2.40
Resol_04       8.80          9.97           1.20
Resol_05       8.37          7.73           1.00
Resol_09       4.69          4.14           1.20
Resol_10       3.78          3.54           0.66
26. Robust analytical methods are required by QbD for GMP in biopharmaceuticals, and in general for sound research results.
Definitive Screening Designs are a cost-effective type of experimental design that can be used to characterize and optimize analytical methods, and many other physical phenomena.
In this presentation we have shown that the DSD performed as well as the CCD in optimizing the HPAE-PAD method, despite having approximately half the total number of experimental trials.
Note: when substantial amounts of observational data are available, the methods shown in this talk can often still be used.
27. The following are a couple of additional events that you might be interested in attending.
Session C2, Friday morning: Application of Definitive Screening Design (DSD) to the icIEF Assay Development of Antibodies and Therapeutic Proteins. Dr. Srividya Suryanarayana (R&D Services, Cytovance Biologics, United States).
Session C2, Friday 13:50 - 17:00: Workshop: Modern Design and Analysis of Experiments for Biological Applications Using the JMP® Statistical Software. Dr. Philip J. Ramsey (University of New Hampshire, United States).