Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, whether the framework used for analysis is randomisation-based or model-based produces puzzling differences in inferences. This can easily be shown by starting, on the one hand, with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder’s theory of general balance (as implemented in GenStat®), or starting, on the other hand, with a plausible variance component approach through a mixed model. However, it can be shown that these differences are related not so much to the modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-run debate over the use of fixed or random effect meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise in investigating chronic rare diseases but that careful consideration of matters of purpose, design and analysis is necessary to make best use of them.
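As a minimal sketch of the distinction the abstract draws between inference about the average patient and prediction for a future patient, the following simulates a hypothetical series of n-of-1 trials (all names and numbers are illustrative, not the talk's own code) and shows that the two questions call on different measures of uncertainty:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical set-up: each patient completes several treatment/control
# cycles; the true treatment effect varies from patient to patient.
n_patients, n_cycles = 12, 6
true_mean_effect, between_patient_sd, within_sd = 1.0, 0.5, 0.3

patient_effects = rng.normal(true_mean_effect, between_patient_sd, n_patients)
# Within-patient cycle differences (treatment minus control).
diffs = rng.normal(patient_effects[:, None], within_sd, (n_patients, n_cycles))

per_patient = diffs.mean(axis=1)   # each patient's estimated effect
overall = per_patient.mean()       # estimate for the "average patient"

# Uncertainty about the mean effect (average-patient question)...
se_mean = per_patient.std(ddof=1) / np.sqrt(n_patients)
# ...versus the spread relevant to predicting a *new* patient's effect.
sd_new_patient = per_patient.std(ddof=1)

print(overall, se_mean, sd_new_patient)
```

The prediction question always carries the larger uncertainty, which is one route into the fixed- versus random-effect contrast the abstract mentions.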
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDEAL”).
The statistical revolution of the 20th century was largely concerned with developing methods for analysing small datasets. Student’s paper of 1908 was the first in the English literature to address the problem of second order uncertainty (uncertainty about the measures of uncertainty) seriously and was hailed by Fisher as heralding a new age of statistics. Much of what Fisher did was concerned with problems of what might be called ‘small data’, not only as regards efficient analysis but also as regards efficient design and in addition paying close attention to what was necessary to measure uncertainty validly.
I shall consider the history of some of these developments, in particular those that are associated with what might be called the Rothamsted School, starting with Fisher and having its apotheosis in John Nelder’s theory of General Balance and see what lessons they hold for the supposed ‘big data’ revolution of the 21st century.
Improving predictions: Lasso, Ridge and Stein's paradox
Maarten van Smeden
Slides of masterclass "Improving predictions: Lasso, Ridge and Stein's paradox" at the (Dutch) National Institute for Public Health and the Environment (RIVM)
The Seven Habits of Highly Effective Statisticians
Stephen Senn
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician in the hope that it will help you make the most of whatever luck life throws you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn’t.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself ‘I love the small of data in the morning’.
Webinar slides: how to reduce sample size ethically and responsibly
nQuery
[Webinar] How to reduce sample size...ethically and responsibly | In this free webinar, you will learn various design strategies to help reduce the sample size of your study in an ethical and responsible manner. Practical examples will be used throughout.
The Rothamsted school meets Lord's paradox
Stephen Senn
Lord’s ‘paradox’ is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and McKenzie, who use it to demonstrate the power of Pearl’s causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton’s idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox John Nelder outlined a powerful causal calculus for analyzing designed experiments based on a careful distinction between block and treatment structure. This represents an important advance in formalizing the approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance, continued with Yates and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as The Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord’s paradox and come to a solution that is not the same as the one reached by Pearl and McKenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.
Clinical trials: quo vadis in the age of covid?
Stephen Senn
A discussion of the role of clinical trials in the age of COVID. My contribution to the phastar 2020 life sciences summit https://phastar.com/phastar-life-science-summit
An early and overlooked causal revolution in statistics was the development of the theory of experimental design, initially associated with the "Rothamsted School". An important stage in the evolution of this theory was the experimental calculus developed by John Nelder in the 1960s with its clear distinction between block and treatment factors in designed experiments. This experimental calculus produced appropriate models automatically from more basic formal considerations but was, unfortunately, only ever implemented in Genstat®, a package widely used in agriculture but rarely so in medical research. In consequence its importance has not been appreciated and the approach of many statistical packages to designed experiments is poor. A key feature of the Rothamsted School approach is that identification of the appropriate components of variation for judging treatment effects is simple and automatic.
The impressive, more recent causal revolution in epidemiology, associated with Judea Pearl, seems to have no place for components of variation, however. By considering the application of Nelder’s experimental calculus to Lord’s Paradox, I shall show that solutions that have been proposed using the more modern causal calculus are problematic. I shall also show that lessons from designed clinical trials have important implications for the use of historical data and big data more generally.
This year marks the 70th anniversary of the Medical Research Council randomised clinical trial (RCT) of streptomycin in tuberculosis led by Bradford Hill. This is widely regarded as a landmark in clinical research. Despite its widespread use in drug regulation and in clinical research more widely and its high standing with the evidence-based medicine movement, the RCT continues to attract criticism. I show that many of these criticisms are traceable to a failure to understand two key concepts in statistics: probabilistic inference and design efficiency. To these methodological misunderstandings can be added the practical one of failing to appreciate that entry into clinical trials is not simultaneous but sequential.
I conclude that although randomisation should not be used as an excuse for ignoring prognostic variables, it is valuable and that many standard criticisms of RCTs are invalid.
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
There are many valid criticisms of P-values but the criticism that they are largely responsible for the reproducibility crisis has been accepted rather lightly in some quarters. Whatever the inferential statistic that is used, it is quite illogical to assume that as the sample size increases it will tend to show more evidence against the null hypothesis. This applies to Bayesian posterior probabilities as much as it does to P-values. In the context of P-values it can be referred to as the trend towards significance fallacy but more generally, for reasons I shall explain, it could be referred to as the anticipated evidence fallacy.
The anticipated evidence fallacy is itself an example of the overstated evidence fallacy. I shall also discuss this fallacy and other relevant matters affecting reproducible science including the problem of false negatives.
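The claim that more data should not, in itself, be expected to produce more evidence against a true null hypothesis can be illustrated with a small simulation (a sketch, not the talk's own material; the test and numbers are illustrative). Under the null, the P-value of a valid test is uniformly distributed whatever the sample size, so its average stays near 0.5:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def p_value(sample):
    # Two-sided one-sample z-test of mean 0 with known sd 1.
    z = sample.mean() * math.sqrt(len(sample))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def mean_p(n, reps=2000):
    # Average P-value over repeated null samples of size n.
    return np.mean([p_value(rng.normal(0.0, 1.0, n)) for _ in range(reps)])

small, large = mean_p(20), mean_p(2000)
print(small, large)  # both close to 0.5: no trend towards significance
```

A hundredfold increase in sample size leaves the distribution of the P-value under the null unchanged, which is the point of the "anticipated evidence" label.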
There are many questions one might ask of a clinical trial, ranging from ‘what was the effect in the patients studied?’ to ‘what might the effect be in future patients?’ via ‘what was the effect in individual patients?’. The extent to which the answers to these questions are similar depends on various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear model and true multivariate based modelling. These distinctions don’t matter much for some purposes and under some circumstances but for others they do.
Innovative Sample Size Methods For Clinical Trials
nQuery
"Innovative Sample Size Methods for Clinical Trials" is hosted to coincide with the Spring 2018 update to nQuery - The leading Sample Size Software.
Hosted by Ronan Fitzpatrick - Head of Statistics and nQuery Lead Researcher at Statsols - you'll learn about the benefits of a range of procedures and how you can implement them into your work:
1) Dose-escalation with the Bayesian Continual Reassessment Method
CRM is a growing alternative to the 3+3 method for Phase I trials finding the Maximum Tolerated Dose (MTD).
See how researchers can overcome 3+3 drawbacks to easily find the required sample size for this beneficial alternative for finding the MTD.
2) Bayesian Assurance with Survival Example
This Bayesian alternative to power has experienced a rapid rise in interest and application from researchers.
See how Assurance is being used by researchers to discover the true “probability of success” of a trial.
3) Mendelian Randomization
Mendelian randomization (MR) is a method that allows testing of a causal effect from observational data in the presence of confounding factors.
However, in order to design efficient Mendelian randomization studies, it is essential to calculate the appropriate sample sizes required. We demonstrate what to do to achieve this.
4) Negative Binomial Distribution
The negative binomial model has been increasingly used to model count data. One of the challenges of applying the negative binomial model in clinical trial design is sample size estimation.
We demonstrate how best to determine the appropriate sample size in the presence of challenges such as unequal follow-up or dispersion.
Talk given at RSS 2016 Manchester
I consider the problems that the ASA faced in getting a P-value statement together, not in terms of the process, but by looking at the expressed opinions of 21 published commentaries on the agreed statement. I then trace the history of the development of P-values. I show that the perceived problem with P-values is not just one of a supposed inadequacy of frequentist statistics but reflects a struggle at the very heart of Bayesian inference. I conclude that replacing P-values by automatic Bayesian approaches is unlikely to abolish controversy. It may be better to try and embrace diversity than to pretend it is not there.
Chapter 8
Sampling
Sampling involves decisions about who or what will be tested, observed, or interviewed in your study (Morse, 2007)
Key questions to address:
Who should and should not be included?
How many should be included?
Probability
Probability is the likelihood that an event or a condition will occur
You can express probability in terms of the chance the event will occur or in percentages
Levels of Significance
Levels of significance specify how large a difference must be before it is regarded as too large to be attributed to chance
These levels are set by the researcher at the outset of a study
Probability Samples
Probability samples are formed to ensure that each subject has an equal chance of being included, so that an unbiased sample can be obtained
A sampling design explains how the subjects are chosen and should include:
Number of subjects
How they will be assessed, screened, and selected
Inclusion and exclusion criteria
Random selection is accomplished by having:
Identifying all possible participants
Giving every potential participant an equal chance of being selected
Variations of random sampling include:
Stratified: randomly select from each stratum
Cluster: sample groups rather than individuals
Multistage: sample from multiple sets of clusters
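The sampling variations listed above can be sketched in a few lines of Python (a hypothetical sampling frame is assumed; the field names and group sizes are illustrative, not from the chapter):

```python
import random

random.seed(0)

# Hypothetical sampling frame: 100 people in 10 clusters (e.g. clinics),
# each tagged with a stratum (e.g. group "A" or "B").
frame = [{"id": i, "cluster": i // 10, "stratum": "A" if i % 2 else "B"}
         for i in range(100)]

# Simple random sample: every subject has an equal chance of selection.
simple = random.sample(frame, 10)

# Stratified: randomly select the same number from each stratum.
strata = {"A": [p for p in frame if p["stratum"] == "A"],
          "B": [p for p in frame if p["stratum"] == "B"]}
stratified = [p for group in strata.values() for p in random.sample(group, 5)]

# Cluster: sample whole groups rather than individuals.
chosen = random.sample(range(10), 2)
cluster_sample = [p for p in frame if p["cluster"] in chosen]
```

Multistage sampling would simply repeat the cluster step, sampling individuals within the chosen clusters.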
Nonprobability Sampling
Reasons why researchers use nonprobability samples are:
Limited resources for developing an accurate sampling frame or purchasing lists of potential subjects
Information needed to identify all potential subjects is not available
Limited number of subjects
Subjects are difficult to find or difficult to persuade to participate in study
Subjects do not complete study
Experimental mortality
Types of nonprobability samples include:
Quota sampling: select a specified number of participants from each group
Convenience sampling: enroll those who are available
Snowball network or referral sampling: begin with known individuals and ask them to refer others who meet selection criteria
Tracking and Reporting
Sample Development
In order to improve the reporting of randomized controlled trials (RCTs), the Consolidated Standards of Reporting Trials (CONSORT) were developed
A flow diagram that can be used for tracking sample development
CONSORT Flow Diagram
Source: Altman, D.G., Schulz, K.F., Moher, D., Egger, M., Davidoff, F., Elbourne, D., Gøtzsche, P.C., & Lang, T. (2001). The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Annals of Internal Medicine, 134(8), 663-694.
Example of Flowchart
Source: Buchbinder, R., Osborne, R.H., Ebeling, P.R., Wark, J.D., Mitchell, P.M., Wriedt, C., Graves, S.D., Staples, M.P., & Murphy, B. (2009). A randomized trial of vertebroplasty for painful osteoporotic vertebral fractures. The New England Journal of Medicine, 361 ...
Biostatistics in clinical research involves the application of statistical methods to analyze and interpret data from clinical trials. It plays a crucial role in study design, sample size determination, data analysis, and result interpretation. Biostatisticians ensure that clinical research findings are valid, reliable, and meaningful, contributing to evidence-based medicine. Their expertise helps researchers make informed decisions, assess treatment efficacy, and draw accurate conclusions about the safety and effectiveness of interventions.
5 essential steps for sample size determination in clinical trials
nQuery
In this free webinar hosted by nQuery Researcher & Statistician Eimear Keyes, we map out the 5 essential steps for sample size determination in clinical trials. At each step, Eimear will highlight the important function it plays and how to avoid the errors that will negatively impact your sample size determination and therefore your study.
Watch the Video: https://www.statsols.com/webinar/the-5-essential-steps-for-sample-size-determination
Practical Methods To Overcome Sample Size Challenges
nQuery
Watch the video at: https://www.statsols.com/webinars/practical-methods-to-overcome-sample-size-challenges
In this webinar hosted by Ronan Fitzpatrick - Head of Statistics and nQuery Lead Researcher at Statsols - we will examine some of the most common practical challenges you will experience while calculating sample size for your study. These challenges will be split into two categories:
1. Overcoming Sample Size Calculation Challenges
(Survival Analysis Example)
We will examine practical methods to overcome common sample size calculation issues by focusing on one of the more complex areas for sample size determination: survival analysis. We will cover difficulties and potential issues surrounding challenges such as:
Drop Out: How to deal with expected dropouts or censoring. We compare the simple loss-to-follow-up method with integrating a dropout process into the sample size model.
Planning Uncertainty: How best to deal with the inevitable uncertainty at the planning stage? We examine how best to apply a sensitivity analysis and Bayesian approaches to explore the uncertainty in your sample size calculations.
Choosing the Effect Size: Various approaches and interpretations exist for how to find the effect size value. We examine those contrasting interpretations and determine the best method and also how to deal with parameterization options.
2. Overcoming Study Design Challenges
(Vaccine Efficacy Example)
The Randomised Controlled Trial (RCT) is considered the gold standard in trial design in drug development. However, there are often practical impediments which mean that adjustments or pragmatic approaches are needed for some trials and studies.
We will examine practical methods how to overcome common study design challenges and how these affect your sample size calculations. In this webinar, we will use common issues in vaccine study design to examine difficulties surrounding issues such as:
Case-Control Analysis: We will examine how to deal with study constraints and how to deal with analyses done during an observational study.
Alternative Randomization Methods: How best to address randomization in your vaccine trial design when full randomization is difficult, expensive or impractical. We examine how sample size calculations are affected with cluster or Mendelian randomization.
Rare Events: How does an outcome being rare affect the types of study design and statistical methods chosen in your study.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink on data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that flls the heliosphere originates from multiple
sources in the solar corona and is highly structured. It is often described
as high-speed, relatively homogeneous, plasma streams from coronal
holes and slow-speed, highly variable, streams whose source regions are
under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify
solar wind sources and understand what drives the complexity seen in the
heliosphere. By combining magnetic feld modelling and spectroscopic
techniques with high-resolution observations and measurements, we show
that the solar wind variability detected in situ by Solar Orbiter in March
2022 is driven by spatio-temporal changes in the magnetic connectivity to
multiple sources in the solar atmosphere. The magnetic feld footpoints
connected to the spacecraft moved from the boundaries of a coronal hole
to one active region (12961) and then across to another region (12957). This
is refected in the in situ measurements, which show the transition from fast
to highly Alfvénic then to slow solar wind that is disrupted by the arrival of
a coronal mass ejection. Our results describe solar wind variability at 0.5 au
but are applicable to near-Earth observatories.
How to judge approximations? Pitfalls of statistics
A. Savin

Outline: Introduction; Overview; Properties (from experiment, from calculations); Statistical indicators (many indicators; indicators can yield different ranking; when the mean has no meaning; indicators are affected by sampling); Human decisions (living with uncertainties; accurate values or trends; domain of validity; utility; psychology of decision; publishing reliable results); Conclusions

1. Most take things upon trust
Bartolomeo Civalleri (1), Roberto Dovesi (1), Erin R. Johnson (3), Pascal Pernot (4), Davide Presti (2), Andreas Savin (5)
(1) Department of Chemistry and NIS Centre of Excellence, University of Torino (Italy)
(2) Department of Chemical and Geological Sciences and INSTM research unit, University of Modena and Reggio-Emilia, Modena (Italy)
(3) Chemistry and Chemical Biology, School of Natural Sciences, University of California, Merced (USA)
(4) Laboratoire de Chimie Physique d'Orsay, Université Paris-Sud (France)
(5) Laboratoire de Chimie Théorique, CNRS and Sorbonne University UPMC Univ Paris 6 (France)
Winterschool on Computational Chemistry 2015
2. Most take things upon trust
"... some (and those of the most) taking things upon trust, misemploy their power of assent ..." (John Locke)
4. Success of benchmarks: quantifying experience
[Figure: number of publications (left axis, 0 to 300) and number of citations (right axis, 0 to 12,000) per year, 1993 to 2013]
5. What this talk is about: statistics
Dealing with a large amount of numbers:
- efficient algorithms
- performant computers
- new methods, e.g., DFT
Statistical methods are used to concentrate information:
- widely used in environmental sciences, medicine, finance, ...
- very useful
- but with pitfalls
In spite of their mathematical rigor, statistical indicators do not remove the need for human decisions.
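The indicators named in the outline (bias, typical size, worst case) each concentrate the same list of errors into a different single number. A minimal sketch of such indicators; the error values are invented for illustration and do not come from the talk:

```python
# Condensing a set of model-vs-reference errors into summary indicators.
# The error values below are made up for illustration.
import math

errors = [0.4, -1.2, 0.8, 2.5, -0.3, 1.1]  # model minus reference, e.g. kcal/mol

n = len(errors)
mean_error = sum(errors) / n                      # signed bias
mae = sum(abs(e) for e in errors) / n             # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes outliers more strongly
max_abs = max(abs(e) for e in errors)             # worst case

print(f"ME={mean_error:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}  MAX={max_abs:.2f}")
```

Because each indicator emphasizes a different aspect of the error distribution, two methods can be ranked differently depending on which indicator is chosen.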
6. Predicting with or without understanding
Physical models with systematic improvement: understanding.
- Improvement can be seen with optimism
- Limitations: cost and time; absence of rigorous bounds
Statistical models (correlations): predicting without knowing the underlying cause.
- Legitimate when used with necessary care
- Limitations: the choice of quantities (properties) entering the model; the statistical treatment; the conclusions drawn
7. Overview
- Properties (quantities) analyzed
- Quality of approximation (model)
- Decisions to take (human)
(How do the preceding points affect the design of methods?)
8. Unjustified correlations
Happiness(t) = w0 + w1 Σ_{j=1}^{t} γ^(t−j) CR_j + w2 Σ_{j=1}^{t} γ^(t−j) EV_j + w3 Σ_{j=1}^{t} γ^(t−j) RPE_j
(R.B. Rutledge et al., PNAS 2014)
- Was happiness properly defined?
- Were the factors determining it properly chosen?
- How good is the agreement of the data with the model?
- Do we learn how to get happier?
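The model is a weighted sum of exponentially discounted event histories (certain rewards CR, expected values EV, reward prediction errors RPE). A toy evaluation of that functional form; the weights, discount factor γ, and event values below are invented for illustration and are not the fitted values from the PNAS paper:

```python
# Toy evaluation of the discounted-sum happiness model:
#   Happiness(t) = w0 + w1*sum_j g**(t-j)*CR_j + w2*sum_j g**(t-j)*EV_j
#                     + w3*sum_j g**(t-j)*RPE_j
# All numbers below are invented; they are not Rutledge et al.'s fitted values.

def discounted(xs, gamma, t):
    # sum over j = 1..t of gamma**(t-j) * x_j  (lists are 0-indexed, so x_j = xs[j-1])
    return sum(gamma ** (t - j) * xs[j - 1] for j in range(1, t + 1))

def happiness(t, w0, w1, w2, w3, gamma, CR, EV, RPE):
    return (w0
            + w1 * discounted(CR, gamma, t)
            + w2 * discounted(EV, gamma, t)
            + w3 * discounted(RPE, gamma, t))

CR  = [1.0, 0.0, 2.0]   # certain rewards at trials 1..3
EV  = [0.5, 0.5, 0.5]   # expected values of gambles
RPE = [0.0, -1.0, 1.0]  # reward prediction errors

h = happiness(3, w0=0.2, w1=0.4, w2=0.3, w3=0.5, gamma=0.5, CR=CR, EV=EV, RPE=RPE)
```

Recent events dominate: with γ = 0.5 the event at trial 3 carries weight 1, trial 2 weight 0.5, and trial 1 weight 0.25.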
9. Justified correlations: predicting without understanding
Captain James Cook forced his crew to eat sauerkraut, without knowing that lack of vitamin C produces scurvy.
- Properties: sauerkraut (containing vitamin C) and the number of sailors getting scurvy
- Agreement: very good (although no statistics)
- Acting: Cook acted, and avoided scurvy
10. Justified or unjustified?
Comparison of data obtained by a model (e.g., a density functional approximation, DFA) and reference values (experimental, or calculated by a more advanced model).
[Figure: scatter plot of B3LYP results against reference values]
11. Overview
- Properties we study
- Measures of satisfaction (statistical indicators)
- Decisions to take (human)
12. Origin of properties
Do we get the properties we need
- from experiment?
- from calculations?
13. Reference data from experiment
Do we get properties from experimental data?
- Error bars
- Corrections
- Models used to analyze the data
14. Error bars
15. Is 1 kcal/mol chemical accuracy?
Results for the G1 set (atomization energies for 55 molecules):
"The experimental data reported here are taken from a combination of NIST-JANAF tables and Huber and Herzberg ... Most experimental errors are small, i.e., < 0.5 kcal/mol, although several are somewhat larger, e.g., CS has an experimental error of 6 kcal/mol ... For several species experimental errors are unavailable." (J.C. Grossman, 2002)
16. Is 1 kcal/mol chemical accuracy?
It is difficult to extract experimental error bars from published data (cf. J. Cioslowski et al., 2000).
17. Corrections to experimental data
18. Temperature dependence of lattice constants
Lattice constants are measured to 5 significant digits. How many digits remain valid at 0 K? (Herbstein, Acta Cryst. B 2000)
19. Models behind experimental data
20. Fundamental band gaps
R. T. Poole, J. G. Jenkin, J. Liesegang, R. C. Leckey, Phys. Rev. B 11, 5679 (1975)
- Independent-particle model
- Origin of the data: PES and inverse PES? exciton structure? ...
21. Spurious effects on experimental data?
"For example, Taylor and Hartman tentatively placed the valence band of LiF at about 13 eV below the vacuum level on the basis of an edge in the photoelectric yield curve. However, their yield curve continues to fall rapidly at lower photon energies, and this may be interpreted as a threshold of approximately 10 eV, which compares favorably with our estimate of 9.8 eV for this quantity." (R. T. Poole, J. G. Jenkin, J. Liesegang, R. C. Leckey, Phys. Rev. B 11, 5679, 1975)
Problems for band gaps: nuclear motion, surface effects, etc.
22. Ideal and real experimental data
ZnO, Weinen et al. (report from cpfs.mpg.de)
Reproduce the spectrum, not the gap. (H. Tjeng, in Lausanne, 2014)
23. Reference data from calculations
The same quantity can be calculated with both the reference method and the model.
- Is the theory behind the calculation capable of providing the desired quantity?
- Can we trust calculated data?
24. Is the theory behind the calculation capable of providing the desired quantity?
25. Calculating fundamental band gaps with different methods
The fundamental gap is IP − EA.
- Provided by exact Green's functions
- Not provided by Hartree-Fock*
- Not provided by exact Kohn-Sham orbital energies (1966: Sham-Kohn; 1983: Perdew-Levy, Sham-Schlüter, ...)*
- Exact Kohn-Sham calculations would provide exact results using two separate calculations, for X and for X−
- Density functional hybrids*?
* Just correlation for most calculations?
26. Do we use the right theory?
(Image: smartandgreen.eu)
Do we need fundamental or optical band gaps?
27. Can we trust calculated data?
Higher-quality calculations may not
- have the accuracy needed for comparisons with lower-level methods
- be suited to act as a "filter" (to decide whether a lower-level calculation is good or bad)
28. Accuracy of "reliable" calculations
29. How accessible are sufficiently accurate data?
Mean absolute errors (kcal/mol) for the G1 set (atomization energies for 55 molecules), J.C. Grossman, 2002:
B3LYP: 2.5    DMC: 2.9    CCSD(T)/aug-cc-pVQZ: 2.8    CCSD(T)/CBS: 1.3
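Each entry in the table is a mean absolute error taken over the molecules of the set. A sketch of that computation with three invented molecules (placeholder values, not the actual G1 data):

```python
# MAE of a method against reference atomization energies (kcal/mol).
# The numbers below are invented placeholders, not the actual G1-set values.
reference = {"CH4": 392.5, "H2O": 219.3, "CO2": 381.9}
method    = {"CH4": 390.1, "H2O": 221.0, "CO2": 378.8}

mae = sum(abs(method[m] - reference[m]) for m in reference) / len(reference)
print(f"MAE = {mae:.2f} kcal/mol")
```

Note that a single summary number like this hides whether the reference values themselves carry comparable uncertainty.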
30. How accessible are sufficiently accurate data?
Does the reference have the same (in)accuracy?
31. Effect of improving the calculated reference data
Weak-interaction benchmark data set (S22):
- given set of molecules
- 2006 calculations (Jurecka et al.)
- 2011 calculations (Marshall et al.)
The percentage of "correct" results changes from 55% to 86%.
("Correct": B3LYP corrected for dispersion by XDM agrees with the reference within ±0.5 kcal/mol.)
[Figure: scatter plot of B3LYP results against reference values]
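The 55% vs. 86% figures are within-tolerance counts of the same model results taken against the 2006 and 2011 reference sets respectively. A sketch of that count with invented interaction energies (not the S22 values):

```python
# Fraction of model results "correct", i.e. within ±0.5 kcal/mol of the reference.
# All energies below are invented for illustration, not the S22 data.
def percent_correct(model, reference, tol=0.5):
    hits = sum(1 for m, r in zip(model, reference) if abs(m - r) <= tol)
    return 100.0 * hits / len(model)

model    = [-7.1, -4.8, -2.0, -1.3, -0.4]
ref_2006 = [-6.2, -4.5, -2.6, -1.2, -0.5]   # older reference values
ref_2011 = [-6.9, -4.6, -1.8, -1.1, -0.3]   # improved reference values

# The model did not change; only the reference did, yet the score changes.
p_old = percent_correct(model, ref_2006)
p_new = percent_correct(model, ref_2011)
```

The point of the slide survives the toy numbers: the apparent quality of a fixed model depends on the quality of the reference it is judged against.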
32. Filtering using "reliable" calculations
Perform reference calculations with a different method, and refrain from accepting results when the two methods disagree.
Example: perform, point by point, an expensive calculation to verify a cheap one.
33. Why filtering is not necessarily an improvement
Cases:
                        Filter selects   Filter rejects
Result reliable         a (correct)      b (error)
Result unreliable       c (error)        d (correct)
Fraction of reliable results:
- before selection by the filter: (a + b)/(a + b + c + d)
- after selection by the filter: a/(a + c)
Filtering brings no improvement when (a + b)/(a + b + c + d) ≥ a/(a + c), i.e., when bc ≥ ad.
34. Filters not always successful
Band gaps in 28 cubic crystals.

The PBE filter is too restrictive, but reliable:
                        PBE filter selects (39%)   PBE filter rejects (61%)
Result reliable (93%)   11                         15
Result unreliable (7%)  0                          2
100% (11 out of 11) of the results selected by the PBE filter are reliable.

The PBE0 filter selects reasonably well, and is useless:
                        PBE0 filter selects (92%)  PBE0 filter rejects (8%)
Result reliable (93%)   24                         2
Result unreliable (7%)  2                          0
The results selected by the PBE0 filter are reliable about as often as without the filter, but some systems are excluded.
PBEsol and PBE0 make the same mistakes: their unreliable results are close to each other and are thus wrongly selected (d = 0, so ad = 0), while some reliable PBEsol results are not close to PBE0 and are rejected (b > 0 and c > 0, so bc > 0). Thus ad < bc.
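The two contingency tables above can be checked directly against the bc ≥ ad criterion of the preceding slide; a small sketch using the counts from the tables:

```python
# Does a filter improve the fraction of reliable results?
# a: reliable & selected, b: reliable & rejected,
# c: unreliable & selected, d: unreliable & rejected.
def filter_helps(a, b, c, d):
    before = (a + b) / (a + b + c + d)   # fraction reliable without filtering
    after = a / (a + c)                  # fraction reliable among selected results
    # No improvement exactly when b*c >= a*d, i.e. when before >= after.
    return after > before, before, after

# PBE filter on PBEsol band gaps: a=11, b=15, c=0, d=2 -> bc = 0 < ad = 22
helps_pbe, before_pbe, after_pbe = filter_helps(11, 15, 0, 2)

# PBE0 filter: a=24, b=2, c=2, d=0 -> bc = 4 >= ad = 0, so no improvement
helps_pbe0, before_pbe0, after_pbe0 = filter_helps(24, 2, 2, 0)
```

The PBE filter raises the reliable fraction from 26/28 to 11/11, while the PBE0 filter lowers it slightly (24/26), matching the slide's verdicts.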
35. Reference data. Summary
Pitfalls:
- Experimental data: error bars; corrections applied; model used
- Calculated data: may not be accurate enough
36. Trust
A. Savin
Introduction
Overview
Properties
From experiment
From calculations
Statistical indicators
Many indicators
Indicators can yield
different ranking
When the mean has
no meaning
Indicators are
affected by sampling
Human decisions
Living with
uncertainties
Accurate values or
trends
Domain of validity
Utility
Psychology of
decision
Publishing reliable
results
Conclusions
Benchmarks
Model and benchmark?
Benchmark and reality
Reference data. Conclusion
To judge the quality of a method, we compare to benchmarks. These can be inappropriate.
There is a need for critical analysis of the accuracy of the reference data from the perspective of the problem under study.
Once we have decided on the reference data, we have to define a measure quantifying agreement with it: statistical indicators.
Overview
Properties we study
Measures of satisfaction (statistical indicators)
Decisions to take (human)
Diagnostic tools
[Image: Weighing of the Heart, Book of the Dead, Papyrus of Ani, British Museum]
With large amounts of numbers: a need for representative numbers.
Statistical indicators. Overview
Many indicators (mean, median, mode, ...)
Role of sampling
Many indicators
Indicators can yield different ranking
When the mean has no meaning
Indicators can yield different ranking
Indicators do not yield the same ordering of methods

Absolute errors: |x_i|
- Mean: (1/n) Σ_{i=1..n} |x_i|
- Median: half of the |x_i| are below it, half above
- Maximum: max(|x_1|, |x_2|, ..., |x_n|)

Results for the G3/99 benchmark set (kcal/mol):

Method     Mean   Median   Max
B3LYP      4      2        34
LC-ωPBE    5      4        25

Which method is better?
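The inversion is easy to reproduce. The error lists below are hypothetical, chosen only to illustrate how the three indicators can disagree: one method wins on the median, the other on the mean and the maximum.

```python
import statistics

def indicators(errors):
    """Summarize a list of errors by three common indicators
    of the absolute error."""
    abs_err = [abs(e) for e in errors]
    return {"mean": statistics.mean(abs_err),
            "median": statistics.median(abs_err),
            "max": max(abs_err)}

# Hypothetical errors: method A is mostly accurate but has one outlier,
# method B is uniformly mediocre.
a = [1, 1, 2, 2, 30]
b = [4, 4, 5, 5, 6]
print(indicators(a))   # mean 7.2, median 2, max 30
print(indicators(b))   # mean 4.8, median 5, max 6
```

A wins by the median, B by the mean and the maximum — the ranking depends on the indicator chosen.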
Condensing data by indicators
Radiation around Fukushima Daiichi NPS
NNSA 04/03/2011
Evacuation at 30 km, exclusion zone 20 km
Radiation may be more important at >30 km than at 10 km.
Mean: bad indicator
Origin of problems
Error distribution, and its mean
[Figure: two error distributions with their means — one where the mean is relevant, one where it is irrelevant]
A simple model for parametrized approximations
A mathematically (= clearly) defined problem:

Model                      Analogy
x ∈ (0, 1)                 choice of system (random)
(1 + x)²                   exact result
1 + mx                     approximation
m ∈ (2, 3)                 parameter
y = (1 + mx) − (1 + x)²    error of the approximation

Objective: "Recommend the best m"
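The model is small enough to explore numerically. A sketch of sampling the error distribution for a few values of the parameter m (the sample size and the chosen m values are arbitrary):

```python
import random

def error(x: float, m: float) -> float:
    """Error y of the linear approximation 1 + m*x to the exact (1 + x)**2."""
    return (1 + m * x) - (1 + x) ** 2

# Sample the error with x uniform on (0, 1), as in the model.
random.seed(0)
xs = [random.random() for _ in range(10_000)]
for m in (2.0, 2.5, 3.0):
    errs = [error(x, m) for x in xs]
    print(f"m={m}: mean error {sum(errs) / len(errs):+.3f}, "
          f"max |error| {max(abs(e) for e in errs):.3f}")
```

Analytically the mean error is (m − 2)/2 − 1/3, so no m in (2, 3) makes both the mean error and the maximum error vanish; "best" depends on the indicator.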
A set of simple models
- m = 2: using the exact value of the function and its derivative at the origin
- m = 3: using the exact value of the function at the origin and at the endpoint
- 2 < m < 3: using the exact value of the function at the origin, plus some other similarity criterion
Approximations do not yield normally distributed errors
Model and its error distribution
Absolute error distributions
Origin of the difference between medians (red), max, mode, ...
[Figure: histograms of absolute errors for two density functional methods, B3LYP and LC-ωPBE, on the G3/99 set]
When the mean has no meaning
[Figure: right triangle with base 1 and angle α; the opposite side has height h = tan α]
α is uniformly distributed on (0, π/2).
What is the mean value of h?
∫₀^{π/2} tan(α) dα = ∫₀^∞ h d arctan(h) = ∫₀^∞ h/(1 + h²) dh = ∞
Variance also diverges.
cf. Lorentzian shape for peaks in spectroscopy
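A quick numerical check (a sketch; the exact numbers depend on the random seed): sample means of h = tan α keep fluctuating instead of converging, as expected for a distribution without a mean.

```python
import math
import random

# h = tan(alpha), with alpha uniform on (0, pi/2), has no finite mean:
# sample means do not settle down as the sample grows.
random.seed(1)

def sample_mean(n: int) -> float:
    """Mean of n draws of tan(alpha)."""
    return sum(math.tan(random.uniform(0, math.pi / 2)) for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(n, sample_mean(n))
```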
No mean, no variance, ...?
B3LYP distribution of atomization energy errors (G3/99)
Distributed as (2/(πa)) (1 + (h/a)²)⁻¹ ?
[Figure: histogram of B3LYP absolute errors (kcal/mol) on G3/99]
Mean on the sample: 4 kcal/mol
Explanation: small errors accumulate in larger systems
Shape of distributions
Nearly uniform distribution of PBE absolute errors, G3/99
[Figure: histogram of PBE absolute errors (kcal/mol) on G3/99]
Small errors accumulate in larger systems. A simple model
Error of the approximation when detaching an atom from a chain with n bonds (n + 1 atoms): ≈ x
Error in the atomization energy: x·n
Mean error over chains with n = 1, ..., m bonds:
(1/m) Σ_{n=1..m} x·n = (1/m) · x·m(m + 1)/2 = x(m + 1)/2,
which diverges as m → ∞.
The error of the atomization energy per atom → x as m → ∞.
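The divergence is easy to see numerically (x = 0.1 is an arbitrary per-bond error chosen for illustration):

```python
# Mean atomization-energy error over chains of 1..m bonds, when each bond
# contributes a fixed error x. Analytically this is x*(m+1)/2, so it grows
# without bound, while the error per atom tends to x.
def mean_chain_error(x: float, m: int) -> float:
    return sum(x * n for n in range(1, m + 1)) / m

x = 0.1
for m in (10, 100, 1000):
    print(m, mean_chain_error(x, m))   # 0.55, 5.05, 50.05
```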
Small errors accumulate in larger systems. Atomization energies
[Figure: MAE and MAE/atom for the G1, G2, and G3 benchmark sets with different functionals: B3LYP, CAM-B3LYP, LC-ωPBE, B97-1, BLYP, PBE0, PW86PBE, PBE, BH&HLYP]
Indicators are affected by sampling
Uncertainty of mean
Finite sampling brings uncertainty of mean
Simple example: uniform sampling on the interval (0, 1).
[Figure: histogram of 100 means, each computed from a sample of 100 values]
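The experiment can be reproduced in a few lines (the seed is arbitrary):

```python
import random
import statistics

# 100 means, each of a sample of 100 uniform values on (0, 1). The spread
# of the means around 0.5 is the sampling uncertainty of the mean.
random.seed(42)
means = [statistics.mean(random.random() for _ in range(100))
         for _ in range(100)]
print(min(means), max(means))
print(statistics.stdev(means))   # close to 1/sqrt(12*100), about 0.029
```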
Finite sampling brings uncertainty of mean
G3/99 MAEs (and subsets with randomly reduced sample size, from 221 down to 22 systems)
[Figure: MAE (kcal/mol) of B97-1, CAM-B3LYP, and B3LYP on the full G3/99 set and on three random subsets]
Finite sampling brings uncertainty of mean
Benchmarks for weak interactions
[Figure: mean absolute error (kcal/mol) of B3LYP, BLYP, LC-ωPBE, PW86PBE, BH&HLYP, CAM-B3LYP, PBE0, B97-1, and PBE on the KB49, S22, S66, and S115 benchmark sets]
Are differences between methods important?
Statistical indicators. Summary
Different statistical indicators can lead to different conclusions, and may even not exist.
Sampling is unavoidable, and brings uncertainty.
Composite Portraiture
Statistical indicators. Conclusions
Statistical indicators (MAE, etc.) are useful, maybe unavoidable, but can mislead, and thus should be used with care.
In spite of their mathematical formulation, supplementary criteria are needed.
Actions (decisions) after knowing the results of
the (statistical) analysis
Living with uncertainties
Accurate values or accurate trends
Domain of validity
Utility
Psychology of decision
Publishing only correct results
Living with uncertainties
Uncertainties in reference values affect judgment
of calculated data
Lattice constants in some cubic crystals: MSE±RMSD
LDA: -3.5±2.7 pm
HSEsol (best among tested functionals): 0.0±1.5 pm
Is the source of the error in the reference data?
Is there a need for better functionals?
Uncertainty in reference data propagates to the ranking of functionals
What is agreement with experimental data, if we do not know how accurate the experimental data are?
Percentage of computed (dispersion-corrected) results in agreement with experimental (G3/99) atomization energies, within ±x kcal/mol:

Method      x = 0.5   x = 1   x = 2   x = 4
BLYP        9         14      24      42
B3LYP       9         22      44      73
CAM-B3LYP   7         13      29      57
Accurate values or trends?
Which method is better?
[Figure: two error distributions centered around zero — one with a good mean (more accurate), one with a good variance (better trend)]
MAE is not a good indicator: it mixes systematic and random errors.
Mean errors and variance for band gap calculations
[Figure: mean error (ME) vs. standard deviation (σ) of band-gap errors for 16 methods: HF, LDA, PBE, PBEsol, PBE0, PBEsol0, B97, B3LYP, HSE06, HSEsol, HISS, LC-ωPBE, LC-ωPBEsol, RSHXLDA, ωB97, ωB97X]
Prefer HISS?
Correcting for the mean error of band gaps
A constant shift is easy to correct, yielding a new approximation: error → error − ME (then choose by σ).
[Figure: mean error (ME) vs. standard deviation (σ) after the shift, for the same 16 methods]
Prefer LC-ωPBEsol?
With one parameter, HF can be made as good as HISS.
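The shift itself is a one-liner. The error lists below are made up for illustration: method A is biased but consistent, method B unbiased but noisy; after removing each method's mean error, A wins on σ.

```python
import statistics

# Subtracting the mean error (ME) removes the systematic part of the error;
# what remains to compare between methods is the spread (sigma).
def shift_by_mean(errors):
    me = statistics.mean(errors)
    return [e - me for e in errors]

a = [2.1, 1.9, 2.0, 2.2, 1.8]      # biased (ME = 2.0) but consistent
b = [-0.5, 0.6, -0.4, 0.5, -0.2]   # unbiased but noisy
for name, errs in (("A", a), ("B", b)):
    shifted = shift_by_mean(errs)
    print(name, statistics.mean(errs), statistics.stdev(shifted))
```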
Correcting lattice constants
Domain of validity
Relevance of the benchmark data set
Questions:
- Is the benchmark relevant for the problem of interest? E.g., not when the benchmark was designed for one property and is used for another.
- Is the benchmark biased? It is based upon systems that may differ from the system under study, e.g., because of a shift of interest over time.
Example: do not expect good large band gaps based upon experience with small band gaps.
[Figure: HSE06 calculated vs. experimental band gaps (eV)]
Which method is better?
Band gaps of a set of crystals: better to use HSE06 or HISS?
[Figure: absolute error (eV) vs. band gap (eV) for HSE06 and HISS]
Avoiding mixing?
Not always possible: there are systems where different "components" are needed simultaneously, and the relative importance of the "components" is not reproduced the same way by different methods.
When mixing produces inversions
We split the S115 benchmark set into:
1. H-bonded (HB)
2. weakly interacting (WI)
One of the methods gives better MAEs for both benchmark sets.
Are there situations where the worse method gives better results?
Yes, when the weighting of HB and WI is different!
[Figure: MAE of B3LYP and PBE0 for the HB and WI interaction types]
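A sketch of how the weighting matters, with hypothetical subset MAEs (not the slide's values): when method X is better on HB and method Y is better on WI, the overall ranking flips with the HB fraction of the application.

```python
def weighted_mae(mae_hb: float, mae_wi: float, w_hb: float) -> float:
    """Overall MAE when a fraction w_hb of the cases are H-bonded."""
    return w_hb * mae_hb + (1 - w_hb) * mae_wi

# Hypothetical subset MAEs (kcal/mol): X better on HB, Y better on WI.
x_hb, x_wi = 4.0, 9.0
y_hb, y_wi = 6.0, 7.0
for w in (0.25, 0.5, 0.75):
    print(w, weighted_mae(x_hb, x_wi, w), weighted_mae(y_hb, y_wi, w))
# At w=0.25 method Y wins; at w=0.75 method X wins.
```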
Best method not for all properties
Mean absolute errors:

Method   Lattice constant (pm)   Bulk modulus (GPa)
HSEsol   1                       7
PBE0     4                       5
Utility
Utility of a calculation
Choose between methods A and B:

Result            A (expensive)   B (cheap)
A better than B   good            bad
A as good as B    good            good
A worse than B    bad             good
Quantify the utility of a calculation
"Gain" obtained by choosing between A and B.
Scoring: cheap +1, expensive −1; good +2, bad −2.

Result            A (expensive)   B (cheap)
A better than B   good (+2−1)     bad (−2+1)
A as good as B    good (+2−1)     good (+2+1)
A worse than B    bad (−2−1)      good (+2+1)
Quantify the utility of a calculation for lattice constants
Choice between A (PBEsol) and B (LDA) for lattice constants:

Result            A (expensive)   B (cheap)   Probability
A better than B   1               -1          1/2
A as good as B    1               3           1/4
A worse than B    -3              3           1/4
"Gain"            0               1
MAE               2.4 pm          3.5 pm

MAE (or probability) and "gain" give contradictory recommendations.
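The expected "gain" is just a probability-weighted sum of the scores. A sketch using the slide's scoring (cheap +1 / expensive −1, good +2 / bad −2) and the outcome probabilities above:

```python
# Outcomes for A (PBEsol, expensive) vs B (LDA, cheap):
# (gain for A, gain for B, probability of the outcome)
outcomes = [
    (+2 - 1, -2 + 1, 0.50),   # A better than B
    (+2 - 1, +2 + 1, 0.25),   # A as good as B
    (-2 - 1, +2 + 1, 0.25),   # A worse than B
]
gain_a = sum(ga * p for ga, _, p in outcomes)
gain_b = sum(gb * p for _, gb, p in outcomes)
print(gain_a, gain_b)   # 0.0 and 1.0, matching the slide's "Gain" row
```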
Psychology of decision
Market behavior
What is the best method?
Safest result?
Minimize maximal error
Most stable error on average?
Share portfolio
Dow Jones Industrial Average on 2 June 2014:

Company          Yearly change
Pfizer Inc       -3.26
Walt Disney Co   9.96

http://money.cnn.com/data/markets/dow
A share portfolio does not ensure the highest gain, but it enhances stability (assuming long-term gains at the stock exchange).
Using a portfolio of methods?
Using different methods, in the right proportion, can enhance the stability of results.
Do 25% of the calculations with RPA and 75% with RPAx.
[Figure: errors of RPA and RPAx for the HB7 and WI8 interaction types; data from W. Zhu et al., JCP 2010]
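A sketch of the portfolio idea with made-up per-type errors (the slide's values are only shown graphically): each method is weak on one interaction type, and the 25/75 mix trades the best single-type performance for a smaller spread across types.

```python
# Hypothetical errors: RPA worse on H-bonded, RPAx worse on weak interactions.
errors = {"RPA":  {"HB7": 3.0, "WI8": 2.0},
          "RPAx": {"HB7": 2.0, "WI8": 3.0}}
weights = {"RPA": 0.25, "RPAx": 0.75}

def portfolio_error(itype: str) -> float:
    """Average error per calculation when each method handles its
    weighted share of the calculations."""
    return sum(weights[m] * errors[m][itype] for m in weights)

for itype in ("HB7", "WI8"):
    print(itype, portfolio_error(itype))   # 2.25 and 2.75
# Spread across interaction types: 0.5 for the mix vs. 1.0 for either
# method alone -- more stable, though not the best anywhere.
```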
83 / 91
Mixing by the community: the invisible hand of the market.
The probability to predict the correct result
Assumption: new calculations yield acceptable results, distributed as in the benchmark set.
The probability to obtain a good result is given by $p = a/t$, where $a$ is the number of results within the accepted threshold and $t$ is the total number of results in the benchmark set.
The probability to obtain at least $k$ correct answers out of 10 (binomial distribution):
$$\sum_{j=k}^{10} \binom{10}{j} \, p^j (1-p)^{10-j}$$
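The binomial tail can be evaluated directly. `p_at_least` is a hypothetical helper name, and `p = 0.8` is an assumed illustrative success probability:

```python
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k acceptable results out of n independent
    calculations, each acceptable with probability p (binomial tail)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# If 80% of the benchmark results fall within the threshold (p = 0.8),
# the chance that all 10 new results are acceptable is 0.8**10, about 0.11
print(round(p_at_least(10, 10, 0.8), 3))
```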
Probability to publish n correct results out of ten
The probability to obtain B3LYP+XDM atomization energies with absolute errors per atom less than a chosen maximum acceptable value, for a set of n systems, assuming the same error distribution as the B3LYP errors per atom on the G3/99 atomization-energy set.
[Figure: probability vs. maximum acceptable absolute error (kcal/mol), with curves for n = 1, 2, 5, 10]
Even with a very large tolerance (3 kcal/mol per atom), it is more probable not to have all ten results right than to have them all right.
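The slide's recipe (estimate p = a/t from a benchmark at a given tolerance, then raise it to the power n) can be sketched with invented numbers; the list below is not the actual G3/99 error data:

```python
# Hypothetical benchmark absolute errors (kcal/mol per atom); illustrative only
bench_errors = [0.3, 1.1, 0.7, 2.4, 0.2, 1.8, 0.9, 3.5, 0.5, 1.4]

def p_all_within(tol: float, n: int) -> float:
    """Probability that all n new results fall within tol, assuming the same
    error distribution as the benchmark set (p = a/t, then p**n)."""
    p = sum(e <= tol for e in bench_errors) / len(bench_errors)
    return p**n

# With tol = 3.0, 9 of the 10 benchmark errors qualify, so p = 0.9 and
# the chance of ten-for-ten is 0.9**10, well below one half
print(round(p_all_within(3.0, 10), 3))
```

Even a per-result success rate that looks comfortable (90%) leaves less than an even chance of publishing ten correct results in a row.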
Human decisions. Summary
- The choice is not necessarily the same when made for the best accuracy as when made for the best trend.
- The community's choice (experience with one class of systems, properties, ...) might not be the best suited to the problem under study.
- Criteria from decision theory can orient choices differently from "diagnostic tools" (such as the MAE).
- Statistics tell us that there are many unreliable results in the literature.
Human decisions. Conclusions
Making decisions is unavoidable.
Specifying the criteria for the decisions made should be part of the study.
Conclusions
What should I do ...?
[Image: Auguste Rodin, Le Penseur, Musée Rodin]
Conclusions
Many pitfalls. Recommendations
Learning statistics and decision theory is useful when working
with large amounts of data
The good old effort to understand should not be forgotten.
At the 35th Midwest Theoretical Chemistry Conference, Ames (2003):
Speaker: "I use DFT because it is an easy-to-use black box and does not require much thinking."
K. Ruedenberg: "Why is it a bad thing to think?"
Calculations get easy, but expertise is still needed.
Some (difficult) questions to try to answer
- What do we want to know from the calculation?
- Is the theory we use capable of providing it?
- What accuracy do we need?
- Do we expect the approximations we make to give the necessary accuracy?
- On what is our judgment based (knowledge, experience, advice, impact factor, ...)?
- Are the reference data significant for our problem?
- How do we judge the accuracy of the approximation (sufficient data, significant indicators, ...)?
- Is their accuracy sufficient for our purpose?
- If the accuracy is not sufficient, what are we willing to give up?
- ...
Progress in science does not necessarily come from better accuracy
Nearly uniform distribution of BH&HLYP absolute errors on the G3/99 set.
[Figure: histogram of BH&HLYP absolute errors (kcal/mol) vs. frequency]
BH&H: at the origin of hybrids.