This document describes a project that aims to maximize the power of hypothesis tests in single gene experiments with multiple cell types under a cost constraint. It formulates the problem and proposes a sequential sampling procedure to estimate the optimal sample size allocation across cell types. The procedure begins with an initial sample from each cell type and sequentially adds more samples until the estimated optimal sample size is reached for each cell type while ensuring the total cost is within the given budget. Graphical and tabular results from simulations evaluating the procedure are also presented.
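A standard closed-form benchmark for this kind of problem (not necessarily the paper's exact criterion) is cost-constrained Neyman-style allocation: minimizing the summed variance Σ σᵢ²/nᵢ subject to a budget Σ cᵢnᵢ = B gives nᵢ ∝ σᵢ/√cᵢ. A minimal sketch, with hypothetical per-cell-type standard deviations and unit costs:

```python
import math

def optimal_allocation(sigmas, costs, budget):
    """Minimize sum(sigma_i^2 / n_i) subject to sum(c_i * n_i) = budget.
    The Lagrange conditions give n_i proportional to sigma_i / sqrt(c_i)."""
    denom = sum(s * math.sqrt(c) for s, c in zip(sigmas, costs))
    return [budget * s / (math.sqrt(c) * denom) for s, c in zip(sigmas, costs)]

# Two hypothetical cell types: sd 2.0 at unit cost, sd 1.0 at 4x the cost.
n = optimal_allocation([2.0, 1.0], [1.0, 4.0], budget=100.0)
# The allocation spends exactly the budget: 50 cheap samples, 12.5 expensive ones.
```

A sequential procedure like the one described would replace the known σᵢ with running estimates and re-solve this allocation as data accrue.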
This study examined the relationship between perceived service quality and customer satisfaction in Sri Lankan veterinary hospitals. It found that perceived service quality has a strong positive relationship with customer satisfaction. Through a survey of 200 customers, the study identified four key dimensions of service quality: service-oriented commitment, reliability, tangibles, and assurance. Service-oriented commitment was found to be the most important driver of customer satisfaction, while tangibles had the weakest impact on satisfaction. The results indicate that all four dimensions are positively related to customer satisfaction.
A Heuristic Approach to Innovative Problem Solving (IJERA Editor)
A methodic approach to innovative problem solving is suggested. First, Bayesian techniques are analyzed for quantifying, monitoring and predicting the process. The symmetry of Bayes' theorem implies that the chances of success of frail ideas with small base rates can be boosted by highly accurate tests built on solid scientific ground. Second, a hypothesis is presented in which five methodic elements - connection, selection, transformation, balance and finish - are deemed to be necessary and sufficient to explain innovative solutions to complex problems. The hypothesis is supported by the analysis of disruptive innovations in several fields, and by emulation of a database including 40,000 inventions. The reported findings may become useful in the further methodic development of innovative problem solving, especially in the risky and lengthy preconceptual phases.
Student's t distribution; small-sample inference about a population mean and the difference between two means; paired difference tests; inferences about a population variance.
In the presentation, hypothesis testing is explained from scratch. A tree diagram shows which parametric test applies in which situation.
Strategies for setting futility analyses at multiple time points in clinical ... (Lu Mao)
This document outlines strategies for conducting futility analyses in clinical trials with multiple interim analyses. It discusses motivations for futility rules, relevant statistical tools like conditional power and predictive power, and previous studies investigating futility rules based on these tools. The document proposes a theoretical framework for nonbinding futility rules using different statistical scales and numerical simulations in R to optimize the average sample size under the null hypothesis while controlling for power loss and sample size inflation.
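As context for the statistical tools mentioned, conditional power under the current-trend assumption has a simple closed form: with interim z-statistic Z_t at information fraction t and one-sided level α, CP = Φ((Z_t/√t − z_{1−α})/√(1−t)). A small stdlib-only sketch (illustrative, not the document's own code):

```python
from statistics import NormalDist

def conditional_power(z_interim, info_frac, alpha=0.025):
    """P(final z exceeds the critical value | interim z, current trend).
    Assumes the observed drift z_interim / sqrt(t) continues to the end."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha)  # one-sided critical value
    t = info_frac
    return nd.cdf((z_interim / t ** 0.5 - z_crit) / (1 - t) ** 0.5)
```

A nonbinding futility rule of the kind studied might recommend stopping when this quantity falls below some threshold (e.g. 10-20%) at an interim look.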
Scientists using laboratory animals are nowadays under increasing pressure to justify their sample sizes with a "power analysis", both to improve study reproducibility and on ethical grounds.
In this presentation, I review the three methods currently used to determine sample size: "tradition" or "common sense", the "resource equation", and the "power analysis".
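For reference, the power-analysis approach reduces to a closed-form calculation in the simplest case. A hedged sketch of the normal-approximation sample size per group for a two-sided, two-sample comparison with standardized effect size d (Cohen's d):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    two-sample test with standardized effect size d (Cohen's d)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_b = nd.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_size ** 2)

# A "medium" effect (d = 0.5) needs about 63 animals per group at 80% power.
```

Exact t-based calculations (as in dedicated software) give slightly larger answers for small groups.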
The document describes an optimization method for weighting explicit and latent concepts in clinical decision support queries. It proposes representing queries using weighted unigram, bigram and multi-term concepts from the query, top documents and knowledge bases. It then uses Graduated Non-Convexity optimization to determine importance weights for different concept types and sources to maximize retrieval performance. Experimental results on two TREC clinical decision support track query sets show the proposed method outperforms other baselines.
This talk is presented at Bio-inspiring and evolutionary computation: Trends, applications and open issues workshop, 7 Nov. 2015 Faculty of Computers and Information, Cairo University
Use of Definitive Screening Designs to Optimize an Analytical Method (Philip Ramsey)
Discusses using a definitive screening design to characterize and optimize a glycoprofiling method, and compares the definitive screening results to those of a much larger central composite design.
Self-Control in Chimpanzees Relates to General Intelligence (Sachin Eldho)
This document summarizes a study that examined the relationship between general intelligence and delayed gratification in chimpanzees. 40 chimpanzees completed tasks measuring general intelligence (PCTB) and delayed gratification (HDT). Results found that chimpanzees with higher general intelligence scores chose the larger later reward in the HDT more often and accumulated more rewards overall. Intelligence scores were most strongly correlated with an efficiency measure of HDT performance. The findings suggest general intelligence and self-control abilities are linked in chimpanzees, as in humans, and may have common genetic or neural mechanisms.
Computational Pool-Testing with Retesting Strategy (Waqas Tariq)
Pool testing is a cost-effective procedure for identifying defective items in a large population. It also improves the efficiency of the testing procedure when imperfect tests are employed. This study develops a computational pool-testing strategy based on a proposed pool testing with re-testing scheme. Statistical moments based on this applied design have been generated. With the advent of computers in the 1980s, the pool-testing with re-testing strategy under discussion is handled in the context of computational statistics. The study establishes that re-testing reduces misclassifications significantly compared to the Dorfman procedure, although re-testing comes at a cost, i.e., an increase in the number of tests. The re-testing considered improves the sensitivity and specificity of the testing scheme.
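The baseline that re-testing is compared against, Dorfman pooling, has a well-known expected cost: one pooled test per group of k items, plus k individual tests whenever the pool is positive. A minimal sketch (the study's re-testing extension would add further terms):

```python
def dorfman_tests_per_item(p, k):
    """Expected tests per item under Dorfman pooling with prevalence p:
    1/k for the pooled test, plus k retests weighted by the
    probability 1 - (1 - p)^k that the pool tests positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

# At 1% prevalence with pools of 10, roughly 0.20 tests per item are
# needed, versus 1.0 for individual testing.
```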
This document summarizes a simulation study comparing the performance of different meta-analysis methods when assumptions of normality are violated. The study generated simulated datasets with various distributions for true effects and degrees of heterogeneity. It then compared methods like fixed effects, DerSimonian-Laird, maximum likelihood, and permutations in terms of coverage, power, and confidence interval estimation. The results showed that some methods are more robust to non-normal data, with profile likelihood and permutations generally performing best, while other methods like fixed effects and DerSimonian-Laird showed poorer performance.
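Of the methods compared, DerSimonian-Laird is compact enough to sketch: a method-of-moments estimate of the between-study variance τ², followed by inverse-variance weighting. An illustrative stdlib-only implementation (not the study's simulation code):

```python
def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis:
    method-of-moments tau^2 from Cochran's Q, then pooling
    with weights 1 / (v_i + tau^2)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sw  # fixed-effect mean
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]
    mu_re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return mu_re, tau2

# Two studies with effects 0 and 2, equal variances 1:
# the pooled estimate is 1 with estimated tau^2 = 1.
```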
This document discusses 10 different types of data analysis: univariate, bivariate, multivariate, normative, descriptive, classification, evaluative, comparative, and cost-effectiveness analysis. For each type of analysis, the document provides examples of how each could be used to analyze data from research studies, including specifying the variables and statistical tools that could be used. It focuses on explaining the purpose and application of each analytical technique.
Opticon 2017 Experimenting with Stats Engine (Optimizely)
- Stats Engine was built to help companies make data-driven decisions from experiments, as data has become cheaper and more real-time and experimentation is no longer limited to specialists.
- When analyzing experiments, it is important to avoid peeking at interim results and control for multiple comparisons testing across many metrics and variations which can lead to false discoveries.
- Stats Engine addresses these issues through sequential analysis and controlling the false discovery rate when more signals are introduced to experiments.
- Practitioners can use Stats Engine to understand when to stop experiments, how additional variations and metrics impact results, and how to balance risk versus velocity in their experimentation programs.
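The false-discovery-rate control mentioned in the bullets is classically done with the Benjamini-Hochberg step-up procedure (Stats Engine's actual machinery is proprietary and more elaborate). A minimal sketch:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with
    the k smallest p-values, where k is the largest rank such that
    p_(k) <= q * k / m.  Controls the FDR at level q under independence."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank
    return sorted(order[:k_max])  # indices of rejected hypotheses

# With p-values [0.01, 0.04, 0.03, 0.5] at q = 0.05, only the first is rejected.
```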
Bayesian solutions to high-dimensional challenges using hybrid search (Caleb (Shiqiang) Jin)
This dissertation proposes Bayesian methods for variable selection in high-dimensional data. Chapter 2 focuses on Bayesian best subset selection using a hybrid search approach. Specifically, it aims to identify the k most important predictors out of p candidates and find a single best model from the 2^p possible models. A linear regression framework is used where only a few predictors are assumed to be associated with the response. A hybrid search algorithm is developed to solve the non-convex optimization problem of finding the best subset model.
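The exhaustive search that hybrid search approximates can be sketched directly for small p: enumerate all C(p, k) subsets and keep the one with the smallest residual sum of squares. An illustrative baseline (not the dissertation's algorithm), using ordinary least squares without an intercept:

```python
from itertools import combinations
import numpy as np

def best_subset(X, y, k):
    """Brute-force best-subset baseline: among all size-k predictor sets,
    fit OLS and keep the subset with the smallest RSS.  Cost grows as
    C(p, k), which is why hybrid/stochastic search is needed for large p."""
    best_idx, best_rss = None, float("inf")
    for idx in combinations(range(X.shape[1]), k):
        Xs = X[:, idx]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(((y - Xs @ beta) ** 2).sum())
        if rss < best_rss:
            best_idx, best_rss = idx, rss
    return best_idx
```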
Innovative sample size methods for adaptive clinical trials webinar web ver... (nQuery)
View the video here:
https://www.statsols.com/webinar/innovative-sample-size-methods-for-adaptive-clinical-trials
Given the high failure rates and the increased costs of clinical trials, researchers need innovative design strategies to best optimize financial resources and reduce the risk to patients.
Adaptive designs are emerging as a way to reduce the risk and cost associated with clinical trials. The FDA recently published guidance (following the 21st Century Cures Act) and is actively encouraging sponsors to use adaptive trials.
Adaptive design is a clinical trial design that allows adaptations or modifications to aspects of the trial after its initiation without undermining the validity and integrity of the trial.
In this webinar, Ronan will demonstrate nQuery's new Adaptive module focusing on Sample Size Re-Estimation & Group-Sequential Design.
In this webinar you will learn about:
The pros and cons of adaptive designs
Sample Size Re-Estimation
Group-Sequential Design
Conditional Power
Predictive Power
Innovative Sample Size Methods For Clinical Trials (nQuery)
"Innovative Sample Size Methods for Clinical Trials" is hosted to coincide with the Spring 2018 update to nQuery - The leading Sample Size Software.
Hosted by Ronan Fitzpatrick - Head of Statistics and nQuery Lead Researcher at Statsols - you'll learn about the benefits of a range of procedures and how you can implement them into your work:
1) Dose-escalation with the Bayesian Continual Reassessment Method
CRM is a growing alternative to the 3+3 method for Phase I trials finding the Maximum Tolerated Dose (MTD).
See how researchers can overcome 3+3 drawbacks to easily find the required sample size for this beneficial alternative for finding the MTD.
2) Bayesian Assurance with Survival Example
This Bayesian alternative to power has experienced a rapid rise in interest and application from researchers.
See how Assurance is being used by researchers to discover the true “probability of success” of a trial.
3) Mendelian Randomization
Mendelian randomization (MR) is a method that allows testing of a causal effect from observational data in the presence of confounding factors.
However, in order to design efficient Mendelian randomization studies, it is essential to calculate the appropriate sample sizes. We demonstrate how to achieve this.
4) Negative Binomial Distribution
The negative binomial model has been increasingly used to model count data. One of the challenges of applying the negative binomial model in clinical trial design is sample size estimation.
We demonstrate how best to determine the appropriate sample size in the presence of challenges such as unequal follow-up or dispersion.
The chi-square test is used to determine if an observed frequency distribution differs from an expected theoretical distribution. It can test for independence between two variables or goodness of fit between observed and expected distributions. Some key requirements for the chi-square test are that observations are independent and sample sizes are sufficiently large. The test generates a chi-square statistic that is compared to critical values from a chi-square distribution based on the degrees of freedom to determine if the null hypothesis can be rejected. Examples demonstrated how to calculate expected frequencies, the test statistic, degrees of freedom, and make conclusions based on comparing the test statistic to critical values.
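The calculations described (expected frequencies and the test statistic) are short enough to sketch; the final comparison would use a chi-square table or a library quantile function:

```python
def expected_counts(table):
    """Expected cell counts under independence:
    row total * column total / grand total."""
    row_t = [sum(r) for r in table]
    col_t = [sum(c) for c in zip(*table)]
    grand = sum(row_t)
    return [[rt * ct / grand for ct in col_t] for rt in row_t]

def chi_square_stat(observed, expected):
    """Chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

obs = [[10, 20], [30, 40]]
stat = chi_square_stat(obs, expected_counts(obs))
# stat ~ 0.794, below the df=1, alpha=0.05 critical value 3.841:
# fail to reject independence.
```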
This is a lecture presentation covering various aspects of statistical analysis in research. It includes a case study on Barack Obama's 2008 homepage, focusing on multivariate testing to measure conversion rates of different homepage configurations. The lecture discusses statistical tests, the concept of a null hypothesis, p-values, and the chi-square (χ²) test. It also covers normal and binomial distributions, contingency tables, expected values, and how to compute and interpret χ² values. Python code for statistical analysis, particularly using the SciPy library, is also presented, demonstrating how to perform chi-square and t-tests in Python. This material is likely used in an educational setting, focusing on applying statistical methods in practical scenarios.
All content within this presentation is the property of Royal Holloway, University of London. Unauthorized use, duplication, or distribution of the materials contained herein is strictly prohibited.
The document discusses important statistical terms related to sampling. It defines population as the set of all measurements of interest to the researcher, and sample as a subset of the population. Sampling is done to get information about large populations at lower cost and with greater accuracy compared to studying the entire population. The document outlines different types of sampling methods including probability and non-probability sampling, and provides examples like simple random sampling, systematic sampling, and cluster sampling. It also discusses factors like sampling size, sampling error, and type 1 and type 2 errors.
This document discusses non-parametric tests, which are statistical tests that can be used when assumptions of parametric tests are not met. It provides examples of common non-parametric tests including the sign test and Wilcoxon signed-rank test. The sign test analyzes differences between paired observations by assigning a "+" or "-" sign. The Wilcoxon signed-rank test also assigns ranks to the differences and calculates the sum of positive and negative ranks to determine if the null hypothesis can be rejected. The document provides step-by-step explanations and examples of applying these non-parametric tests to paired sample data to test for differences between populations.
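The sign test described above reduces to an exact binomial calculation: drop ties, count positive differences, and compute a two-sided tail probability under Binomial(n, 0.5). A minimal sketch:

```python
from math import comb

def sign_test_p(diffs):
    """Two-sided exact sign test for paired differences.
    Ties (zero differences) are dropped; under H0 the number of
    positive signs is Binomial(n, 0.5)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    pos = sum(d > 0 for d in nonzero)
    k = min(pos, n - pos)  # smaller tail count
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Six positive differences out of six give p = 2/64 ~ 0.031.
```

The Wilcoxon signed-rank test refines this by also ranking the magnitudes of the differences, gaining power when magnitudes are informative.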
Seminar of February 9, 2012 for the ICOS group in the University of Nottingham.
Abstract: The Protein Structure Prediction (PSP) problem is to determine the three-dimensional structure of a protein using only the information contained in its amino acid sequence. The PSP problem is one of the most important open problems in structural bioinformatics, because the 3D structure determines the protein's function and would be of enormous help in designing new drugs for diseases such as cancer or Alzheimer's. Among the main data structures used to represent protein structures, two are widely used: contact maps and distance maps. Contact maps represent binary proximities (contact or non-contact) between each pair of amino acids in a protein. Distance maps represent the distances between these amino acid pairs. However, contact and distance maps are very difficult to predict. In fact, the accuracy achieved by protein contact map predictors at Top L/5 in the last Critical Assessment of Techniques for Protein Structure Prediction competition (CASP9) is only about 22%, and clearly must be improved. In this seminar, the author will present an approach to predicting protein structures based on a nearest-neighbors scheme, in which protein fragments are assembled according to their physico-chemical similarities, using information extracted from known protein structures. This method produces a distance map, which provides more information about the structure of a protein than a contact map and which can be converted into a contact map with different thresholds. The prediction procedure starts with feature selection on the 544 amino acid physico-chemical properties of the AAindex repository, resulting in different property sets that were used for prediction. The author will show some recent results from his approach and, finally, outline some of his current research and future work.
SinGAN - Learning a Generative Model from a Single Natural Image (Jishnu P)
SinGAN is a generative adversarial network (GAN) that can learn the distribution of a single natural image and generate new realistic samples from that image distribution. Unlike other GANs that require large datasets, SinGAN only needs a single image for training. It uses a multi-scale architecture with multiple generators and discriminators at different scales. SinGAN was shown to generate high quality samples for tasks like super resolution, image editing, and animation from a single image. It also has some failure cases like generating unrealistic samples at the boundaries.
The document discusses the evolution of CAPTCHAs from first generation distorted text to reCAPTCHAs that helped digitize books by using words humans could read but computers could not. It then discusses how NoCAPTCHA reCAPTCHA was developed to address issues like accessibility for those with disabilities. The document also summarizes a student project that used deep learning methods like CNNs and transfer learning to break single character CAPTCHAs with high accuracy, showing the need for more advanced CAPTCHAs.
Stencil computation research project presentation #1Jishnu P
The document discusses optimization techniques used in an auto-tuning framework for parallel multicore stencil computations. It describes loop unrolling, cache blocking, and arithmetic simplification implemented as AST transformations. Cache blocking exposes temporal locality and increases cache reuse by organizing data into blocks that fit in cache. The framework applies these serial optimizations and additional parallelization strategies by modifying the AST to reflect the chosen parallelization before code generation.
This document describes an MCQ answering system that uses textual entailment and text similarity measures. The system first preprocesses the question, passage and answer options to resolve references. It then retrieves relevant sentences from the passage and calculates similarity scores. The possible combinations of sentences and answer options are analyzed using textual entailment and similarity scoring. The option with the highest combined score is selected as the answer. System performance is evaluated using the c@1 metric which encourages reducing incorrect answers while maintaining correct ones.
Cs403 Parellel Programming Travelling Salesman ProblemJishnu P
This document discusses parallelizing the traveling salesman problem. It proposes two approaches: 1) dividing the permutations of routes among threads, having each calculate costs and return the minimum, and 2) sharing a variable for minimum cost across threads and updating it concurrently as threads evaluate routes. Pseudocode is provided for both approaches. Speedup calculations assume ideal parallelization but also account for reductions from unconnected cities.
Ansible Overview - System Administration and MaintenanceJishnu P
This presentation provides an overview of Ansible as an automation tool. Ansible has much more capabilities but here I have discussed its functionalities as an automation tool.
CS404 Pattern Recognition - Locality Preserving ProjectionsJishnu P
This document summarizes a presentation on Locality Preserving Projections (LPP), a dimensionality reduction technique. LPP preserves the local neighborhood structure of the data when projecting it to a lower-dimensional space. It constructs an adjacency graph representing the connections between nearby data points, chooses weights for these connections, and computes eigenmaps to perform the dimensionality reduction while maintaining locality. LPP is compared to Principal Component Analysis (PCA), the most common dimensionality reduction method, which aims to preserve global variance rather than local neighborhood structure.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
1. Introduction Methodology Results Conclusions Future Work Contributions
B.Tech. Project
Attaining maximum power under cost constraints
in single gene experiments
P Jishnu Jaykumar (201352005)
Abhijeet Singh Panwar (201351005)
under the supervision of
Dr. Bhargab Chattopadhyay
Indian Institute of Information Technology,
Vadodara
May 5, 2017
P Jishnu Jaykumar Abhijeet Singh Panwar BTP-2017 1/27
2. Introduction
Single gene experiments are assays for querying ribonucleic acid species abundance in individual cells or pools of cells.
They are of enormous importance in varied fields such as evolution, pathology, and drug development.
However, these experiments are usually performed with a sample size pre-specified during experimental design, which may lead to under-powered or over-powered hypothesis tests.
This has driven a shift toward experimental design paradigms that use a minimal number of replicates to maximize power.
Several studies have analyzed the sample size required to maximize power in hypothesis testing in RNA abundance experiments.
7. Introduction - Contd.
Recently, for the two-stage experimental design framework, algorithms like SCOTTY have been proposed.
But SCOTTY fails if there are more than two cell types.
Also, two-stage procedures are imperfect if the pilot sample is unrepresentative.
Thus the focus is to maximize the power of the hypothesis test related to a single gene belonging to more than two cell types under a cost constraint.
11. Some facts
In experimental research, limited funding is allotted beforehand to carry out the sampling process.
Thus, under a cost constraint, we have to carry out the test of equivalence or of inferiority (or superiority) with minimum error.
Note that we fix the probability of a type-I error at α (0.05 in our case).
So the idea is to minimize the probability of a type-II error, or equivalently to maximize the power, under the cost constraint.
12. Formulation
Suppose there are K cell types belonging to a particular gene.
For the $i$th cell type, suppose $X_{i1}, \ldots, X_{i n_i}$ are iid random variables, not necessarily normal, with mean $\mu_i$ and variance $\sigma_i^2$.
Consider $\delta = \mathbf{c}'\boldsymbol{\mu} = \sum_{i=1}^{K} c_i \mu_i$, where $\mathbf{c} = (c_1, \ldots, c_K)$ is known.
The estimator of $\delta$ is $\hat{\delta}_n = \sum_{i=1}^{K} c_i \bar{X}_{n_i}$.
16. Formulation Contd.
$\gamma_i$'s: sample size allocation ratios (unknown), with $n_i = \gamma_i n_1$ and $\gamma_1 = 1$.
$\mathrm{Var}(\hat{\delta}_n) = \sum_{i=1}^{K} c_i^2 \sigma_i^2 / n_i = \frac{1}{n_1} \sum_{i=1}^{K} c_i^2 \sigma_i^2 / \gamma_i$.
Using the CLT,
$$\frac{\sqrt{n_1}\,(\hat{\delta}_n - \delta)}{\sqrt{\sum_{i=1}^{K} c_i^2 \sigma_i^2 / \gamma_i}} \xrightarrow{D} N(0, 1).$$
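For concreteness, the estimator and its variance can be sketched numerically (illustrative Python; the data and contrast vector are hypothetical, and sample variances stand in for the unknown $\sigma_i^2$):

```python
import numpy as np

def delta_hat(samples, c):
    """Plug-in estimator: delta_hat = sum_i c_i * mean(X_i)."""
    return sum(ci * np.mean(x) for ci, x in zip(c, samples))

def var_delta_hat(samples, c):
    """Estimated Var(delta_hat) = sum_i c_i^2 * s_i^2 / n_i,
    with s_i^2 the sample variance of cell type i."""
    return sum(ci**2 * np.var(x, ddof=1) / len(x) for ci, x in zip(c, samples))

rng = np.random.default_rng(0)
c = [1.0, -1.0, 0.5]                                   # known contrast vector
samples = [rng.normal(mu, 1.0, size=50) for mu in (2.0, 1.5, 1.0)]
d, v = delta_hat(samples, c), var_delta_hat(samples, c)
```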
19. Test for Equivalence
Compare $\delta$ to two equivalence margins $\delta_1 < \delta_2$:
$$H_0: \delta \le \delta_1 \text{ or } \delta \ge \delta_2 \quad \text{against} \quad H_A: \delta_1 < \delta < \delta_2.$$
Consider the test for equivalence. The null hypothesis $H_0$ is rejected at level $\alpha$ if
$$\delta_1 + z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)} < \hat{\delta}_n < \delta_2 - z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)}.$$
The approximate power function is
$$P_{TE}(\delta) = \Phi\left(-z_\alpha + \frac{\delta_2 - \delta}{\sqrt{\mathrm{Var}(\hat{\delta}_n)}}\right) - \Phi\left(z_\alpha - \frac{\delta - \delta_1}{\sqrt{\mathrm{Var}(\hat{\delta}_n)}}\right).$$
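As a sketch, this power function can be evaluated numerically (illustrative Python with made-up margins and variance; `power_equivalence` is our name, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def power_equivalence(delta, delta1, delta2, var_hat, alpha=0.05):
    """Approximate power of the equivalence test:
    P_TE(delta) = Phi(-z_a + (d2 - d)/SE) - Phi(z_a - (d - d1)/SE)."""
    z = NormalDist().inv_cdf(1 - alpha)   # z_alpha
    se = sqrt(var_hat)
    phi = NormalDist().cdf
    return phi(-z + (delta2 - delta) / se) - phi(z - (delta - delta1) / se)

# Power peaks when delta lies midway between the margins.
p_mid = power_equivalence(0.0, -1.0, 1.0, var_hat=0.04)
p_edge = power_equivalence(0.8, -1.0, 1.0, var_hat=0.04)
```

Shrinking `var_hat` (i.e., sampling more) pushes the power toward 1 for any `delta` strictly inside the margins.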
22. Test for Inferiority
Comparing $\delta$ to $\delta_0$, we have
$$H_0: \delta \le -|\delta_0| \quad \text{against} \quad H_A: \delta > -|\delta_0|.$$
Consider the inferiority test. The null hypothesis $H_0$ is rejected at level $\alpha$ if
$$\hat{\delta}_n > -|\delta_0| + z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)}.$$
The approximate power function is
$$P_{TI}(\delta) = \Phi\left(-z_\alpha + \frac{\delta + |\delta_0|}{\sqrt{\mathrm{Var}(\hat{\delta}_n)}}\right).$$
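Likewise, a numeric sketch of $P_{TI}$ (illustrative Python; the inputs are hypothetical). At $\delta = -|\delta_0|$, the boundary of the null, the power reduces to $\alpha$, as expected:

```python
from math import sqrt
from statistics import NormalDist

def power_inferiority(delta, delta0, var_hat, alpha=0.05):
    """Approximate power: P_TI(delta) = Phi(-z_a + (delta + |delta0|)/SE)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(-z + (delta + abs(delta0)) / sqrt(var_hat))

p_boundary = power_inferiority(-1.0, 1.0, var_hat=0.04)  # equals alpha = 0.05
p_inside = power_inferiority(0.5, 1.0, var_hat=0.04)     # power under H_A
```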
25. Maximizing Power Under Cost Constraints
Suppose we have a certain amount of money to carry out sampling, say ₹$A_0$.
If it costs ₹$a_i$ to sample each observation from the $i$th cell type, then $A_0 = \sum_{i=1}^{K} a_i n_i \implies n_1 = \frac{A_0}{\sum_{i=1}^{K} a_i \gamma_i}$.
In order to maximize power, the optimal allocation ratio is
$$\gamma_i = \frac{|c_i|\,\sigma_i}{|c_1|\,\sigma_1} \sqrt{\frac{a_1}{a_i}} \quad (1)$$
The optimal sample size for the $i$th cell type is
$$n_{i0} = \frac{A_0\,|c_i|\,\sigma_i}{\sum_{l=1}^{K} |c_l|\,\sigma_l \sqrt{a_l a_i}} \quad (2)$$
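The two formulas above can be sketched directly (illustrative Python; the contrast, standard deviations, costs, and budget are hypothetical):

```python
import math

def optimal_allocation(c, sigma, a, A0):
    """Allocation ratios gamma_i (Eq. 1) and optimal sample sizes n_i0 (Eq. 2)
    for budget A0, where a_i is the per-observation sampling cost."""
    K = len(c)
    gamma = [abs(c[i]) * sigma[i] / (abs(c[0]) * sigma[0]) * math.sqrt(a[0] / a[i])
             for i in range(K)]
    denom = sum(abs(c[l]) * sigma[l] * math.sqrt(a[l]) for l in range(K))
    n = [A0 * abs(c[i]) * sigma[i] / (math.sqrt(a[i]) * denom) for i in range(K)]
    return gamma, n

gamma, n = optimal_allocation(c=[1.0, -1.0, 0.5], sigma=[1.0, 2.0, 1.5],
                              a=[4.0, 1.0, 2.0], A0=1000.0)
# By construction gamma[0] == 1, n_i = gamma_i * n_1, and the allocation
# exhausts the budget: sum_i a_i * n_i == A0.
```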
29. Sequential Procedure
Step 1: First obtain $N_{i0} = m\,(\ge 2)$ observations from each of the cell types, and for the $i$th cell type check
$$m \ge \frac{A_0\,|c_i|\,S_{i,N_{i0}}}{\sum_{l=1}^{K} |c_l|\,S_{l,N_{l0}} \sqrt{a_l a_i}},$$
where $S_{i,N_{i0}}$ denotes the sample standard deviation of the $i$th cell type based on $N_{i0}$ observations. If the condition is satisfied for the $i$th cell type, no further observations are obtained from that cell type. Otherwise, go to Step 2 after carrying out Step 1 for all cell types.
30. Sequential Procedure Contd.
Step 2: Add $m'$ observations for the $i$th cell type, set $N_{i0} = m + m'$, and check
$$m + m' \ge \frac{A_0\,|c_i|\,S_{i,N_{i0}}}{\sum_{l=1}^{K} |c_l|\,S_{l,N_{l0}} \sqrt{a_l a_i}}.$$
If the condition is satisfied for the $i$th cell type, no further sampling needs to be done for that cell type. Otherwise, repeat Step 2 until the condition is satisfied for all cell types.
31. Sequential Procedure and Characteristics
One can find the optimal sample size using the stopping rule: $N_{i0}$ is the smallest integer $n_{i0}$ such that
$$n_{i0} \ge \frac{A_0\,|c_i|\,S_{i,n_{i0}}}{\sum_{l=1}^{K} |c_l|\,S_{l,n_{l0}} \sqrt{a_l a_i}} \quad (3)$$
The total cost of sampling $\sum_{i=1}^{K} N_{i0}$ observations, $N_{i0}$ being the estimated final optimal sample size for the $i$th cell type computed using the stopping rule in Equation 3, is at most ₹$A_0$.
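The two steps of the procedure can be sketched as a loop (illustrative Python; the sampling callback, batch size `m`, and simulated data are hypothetical stand-ins, not the project's toolkit code):

```python
import math
import random

def sequential_sample_sizes(draw, c, a, A0, m=5, max_batches=1000):
    """Sequential estimation of the sample sizes N_i0.

    draw(i, k): returns k new observations from cell type i.
    Each cell type stops once N_i0 >= A0 |c_i| S_i / (sum_l |c_l| S_l sqrt(a_l a_i)),
    where S_i is its current sample standard deviation (the stopping rule).
    """
    K = len(c)
    data = [list(draw(i, m)) for i in range(K)]   # Step 1: m observations each
    done = [False] * K
    for _ in range(max_batches):
        sd = []
        for d in data:                            # current sample SDs S_{i,N_i0}
            mu = sum(d) / len(d)
            sd.append(math.sqrt(sum((x - mu) ** 2 for x in d) / (len(d) - 1)))
        base = sum(abs(c[l]) * sd[l] * math.sqrt(a[l]) for l in range(K))
        for i in range(K):
            bound = A0 * abs(c[i]) * sd[i] / (math.sqrt(a[i]) * base)
            done[i] = done[i] or len(data[i]) >= bound  # once stopped, stay stopped
        if all(done):
            break
        for i in range(K):                        # Step 2: m more where needed
            if not done[i]:
                data[i].extend(draw(i, m))
    return [len(d) for d in data]

random.seed(1)
means = [2.0, 1.5, 1.0]
draw = lambda i, k: [random.gauss(means[i], 1.0) for _ in range(k)]
N = sequential_sample_sizes(draw, c=[1.0, -1.0, 0.5], a=[1.0, 1.0, 2.0], A0=300.0)
```

In this sketch the per-type thresholds sum (in cost) to `A0` at each recomputation, so the final sampling cost stays close to the budget, up to the last batch of size `m` per cell type.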
32. Contd.
The optimal allocation ratio defined in Equation 1, which yields the minimum variance, depends on the population variances of the K cell types. In practice, the population variance of each cell type is unknown; therefore, the sample size required to obtain the maximum power of the test cannot be computed directly. This is what motivates the sequential procedure.
33. Graphical Representation
Flowchart describing the sequential procedure developed.
36. Graph
Figure: Accuracy w.r.t. all the distributions taken into account for the simulation study.
37. About the toolkit
A toolkit (web application) for the sequential procedure was also developed.
Input: an .xlsx file containing data in a predefined format.
Output: optimal sample sizes for each cell type specified in the input file.
Intermediate stages of the procedure can also be viewed.
It is temporarily hosted at http://139.59.74.69/alpha_version/
38. Conclusions
Using the sequential procedure we found the optimal sample sizes of the cell types required to obtain maximum power under cost constraints.
The simulation study showed that, on average, the estimated optimal sample size from our procedure is close to the theoretical sample size in most situations, except for the mixture of Normal, Weibull, and Chi distributions.
The procedure developed is independent of the underlying distributions.
41. Conclusions Contd.
SCOTTY is limited to the normal distribution and hence cannot handle datasets following other distributions.
Due to this, our procedure gains an advantage over SCOTTY.
We have developed a toolkit for estimating the optimal sample sizes of different cell types corresponding to a single gene and for drawing an inference about the hypothesis test.
45. Future Work
The project consisted of a simulation study and toolkit development for the sequential procedure with a single gene having multiple cell types.
Future activities include developing a procedure for a set of genes with multiple cell types.
In addition, the toolkit will be extended with extra features and made to run for the set-of-genes scenario as well.
The final outcome would be a rich web and desktop application.
49. Contributions
Abhijeet:
- Writing the R code for the sequential procedure.
- Simulation study using the Gamma, Weibull, Normal-Exponential, Chi, non-central Chi-square, and Logistic distributions.
Jishnu:
- R code review and bug fixing.
- Transforming the R code to Python code.
- Simulation study using the Normal distribution, mixtures of normal distributions, and the mixture of Normal + Weibull + Chi distributions.
- Toolkit development.
50. References
[BSM+13] Michele A. Busby, Chip Stewart, Chase A. Miller, Krzysztof R. Grzeda, and Gabor T. Marth, Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression, Bioinformatics 29 (2013), no. 5, 656-657.
[GCL11] Jiin-Huarng Guo, Hubert J. Chen, and Wei-Ming Luh, Sample size planning with the cost constraint for testing superiority and equivalence of two independent groups, British Journal of Mathematical and Statistical Psychology 64 (2011), no. 3, 439-461.
[GS91] Bhaskar Kumar Ghosh and Pranab Kumar Sen, Handbook of sequential analysis, CRC Press, 1991.
[JT99] Christopher Jennison and Bruce W. Turnbull, Group sequential methods with applications to clinical trials, CRC Press, 1999.
[LG16] Wei-Ming Luh and Jiin-Huarng Guo, Sample size planning for the noninferiority or equivalence of a linear contrast with cost considerations, Psychological Methods 21 (2016), no. 1, 13.
[Lip90] Mark W. Lipsey, Design sensitivity: statistical power for experimental research, vol. 19, Sage, 1990.
[LR06] Erich L. Lehmann and Joseph P. Romano, Testing statistical hypotheses, Springer Science & Business Media, 2006.
[MDS08] Nitis Mukhopadhyay and Basil M. de Silva, Sequential methods and their applications, CRC Press, 2008.
[SS81] Pranab Kumar Sen, Sequential nonparametrics: invariance principles and statistical inference, Wiley, New York, 1981.
53. Any Questions?
54.
"Statistics, likelihoods, and probabilities mean everything to men, nothing to God." - Richelle E. Goodrich
Thank You.