Presented at the 2013 Computer-Aided Drug Design Gordon Research Conference and subtitled 'Eu prefiro a minha comida cozida e meus dados brutos' (I prefer my food cooked and my data raw). The chicken in the title slide photo had been walking around the day before the photo was taken.
Tales of correlation inflation (2013 CADD GRC)
1. Tales of correlation inflation
(I prefer my food cooked and my data raw)
Peter W Kenny, Universidade de São Paulo
2. Correlation
• Strong correlation implies good predictivity
– I have observed a correlation so you must use my rule
• Multivariate data analysis (e.g. PCA) usually involves transformation to an orthogonal basis
• Applying cutoffs (e.g. MW restriction) to data can distort correlations
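To make the point about cutoffs concrete, here is a minimal Python sketch on hypothetical data. The MW range, slope, noise level and the MW < 400 cutoff are all invented for illustration and are not taken from the talk. Restricting the range of X in this way typically weakens the observed correlation; keeping only the extremes of X would tend to inflate it instead.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # seeded so the example is reproducible
# Hypothetical data: X plays the role of MW (uniform 200-800),
# Y rises linearly with X plus Gaussian noise with SD = 1
mw = [rng.uniform(200, 800) for _ in range(2000)]
y = [0.01 * m + rng.gauss(0.0, 1.0) for m in mw]

r_full = pearson(mw, y)

# Apply a cutoff: keep only the points with MW < 400
kept = [(m, v) for m, v in zip(mw, y) if m < 400]
r_cut = pearson([m for m, _ in kept], [v for _, v in kept])

print(f"R (all data) = {r_full:.2f}, R (MW < 400) = {r_cut:.2f}")
```

The underlying relationship and noise are identical in both calculations; only the range of X differs, yet R drops noticeably after the cutoff.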
3. Quantifying strengths of relationships between continuous variables
• Correlation measures
– Pearson product-moment correlation coefficient (R)
– Spearman's rank correlation coefficient (ρ)
– Kendall rank correlation coefficient (τ)
• Quality-of-fit measures
– Coefficient of determination (R²) is the fraction of the variance in Y that is explained by the model
– Root mean square error (RMSE)
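The measures listed above can be computed directly from their definitions. The sketch below uses plain Python on a small hypothetical data set. Two choices are assumptions made for this example: Kendall's τ is implemented as τ-a, which assumes there are no ties, and RMSE is averaged over n rather than n - 2.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient R."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson R computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def fit_quality(x, y):
    """Fit y = a + b*x by least squares; return (R^2, RMSE) of that fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    ss_res = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 1, 4, 3, 5]
r2, rmse = fit_quality(x, y)
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y), r2, rmse)
```

For a straight-line fit, R² is simply the square of Pearson R, while RMSE carries the units of Y, which is why the two answer different questions.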
4. Size of effect for categorical X
Difference in mean values of Y for X = A and X = B:
• Scale by standard deviation: Cohen's d (independent of sample size)
• Scale by standard error: Student's t (depends on sample size)
R² can be seen as analogous to Cohen's d
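The contrast between Cohen's d and Student's t can be seen by drawing samples of different sizes from the same two distributions. This is a sketch under stated assumptions: the groups are hypothetical normal samples, N(6, 1) for X = A and N(4, 1) for X = B (so the true d is 2), and both statistics use the pooled standard deviation.

```python
import math
import random
from statistics import mean, variance

def pooled_sd(a, b):
    """Pooled standard deviation of two samples."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))

def cohens_d(a, b):
    # difference in means scaled by the standard deviation
    return (mean(a) - mean(b)) / pooled_sd(a, b)

def student_t(a, b):
    # difference in means scaled by the standard error of that difference
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se

rng = random.Random(7)  # seeded for reproducibility
results = {}
for n in (10, 1000):
    a = [rng.gauss(6.0, 1.0) for _ in range(n)]  # hypothetical Y values for X = A
    b = [rng.gauss(4.0, 1.0) for _ in range(n)]  # hypothetical Y values for X = B
    results[n] = (cohens_d(a, b), student_t(a, b))
    print(n, round(results[n][0], 2), round(results[n][1], 1))
```

With equal group sizes n, t equals d multiplied by the square root of n/2, so t keeps growing with sample size while d stays near the true effect size.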
5. Correlation Inflation in Flatland
N = 1202: R = 0.247 (95% CI: 0.193 to 0.299); ρ = 0.215 (P < 0.0001); τ = 0.148 (P < 0.0001)
N = 8: R = 0.972 (95% CI: 0.846 to 0.995); ρ = 0.970 (P < 0.0001); τ = 0.909 (P = 0.0018)
See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI
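The slide does not say how the 95% confidence intervals for R were obtained. The values are consistent with the usual Fisher z-transformation, so the sketch below assumes that method.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson R via the Fisher z-transformation."""
    z = math.atanh(r)                              # Fisher z, approximately normal
    se = 1.0 / math.sqrt(n - 3)                    # standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2) # about 1.96 for a 95% interval
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo_big, hi_big = pearson_ci(0.247, 1202)
lo_small, hi_small = pearson_ci(0.972, 8)
print(f"N = 1202: {lo_big:.3f} to {hi_big:.3f}")
print(f"N = 8:    {lo_small:.3f} to {hi_small:.3f}")
```

For N = 1202 this reproduces 0.193 to 0.299. For N = 8 it gives roughly 0.85 to 0.995; the small gap from the slide's lower limit is presumably because R = 0.972 is itself rounded. The much wider interval for N = 8 shows how little a high R from a handful of points pins down.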
6. Preparation of synthetic data sets
Kenny & Montanari (2013) JCAMD 27:1-13 DOI
Add Gaussian noise (SD = 10) to Y
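The slide only specifies the noise (Gaussian, SD = 10). A minimal sketch of preparing such a synthetic set is shown below; the underlying linear relationship, its slope and intercept, and the integer X values are hypothetical choices for illustration, not the ones used by Kenny & Montanari.

```python
import random
import statistics

def make_synthetic(xs, slope, intercept, noise_sd=10.0, seed=1):
    """Return Y = intercept + slope * X + Gaussian noise (mean 0, SD = noise_sd).

    The linear 'true' relationship is a hypothetical choice for illustration.
    """
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    return [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]

xs = [x for x in range(11) for _ in range(400)]  # hypothetical integer X, 400 points per value
ys = make_synthetic(xs, slope=2.0, intercept=50.0)
noise = [y - (50.0 + 2.0 * x) for x, y in zip(xs, ys)]
print(round(statistics.stdev(noise), 2))  # should be close to 10
```

Seeding the generator makes it possible to regenerate exactly the same data set when comparing different ways of summarizing it.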
7. Correlation inflation by hiding variation
See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI; Leeson & Springthorpe (2007) NRDD 6:881-890 DOI
Data is naturally binned (X is an integer) and the mean value of Y is calculated for each value of X. In some studies, averaged data is only presented graphically and it is left to the reader to judge the strength of the correlation.
[Figure panels: R = 0.34, R = 0.30, R = 0.31 (top row); R = 0.67, R = 0.93, R = 0.996 (bottom row)]
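The inflation described here is easy to reproduce: compute R on the individual points, then again on the mean Y for each integer value of X. The data below are hypothetical (Y = 2X plus Gaussian noise with SD = 10, 100 points per X value) and were chosen only to illustrate the effect.

```python
import math
import random
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)  # seeded for reproducibility
# Hypothetical synthetic set: integer X from 0 to 10, Y = 2X + Gaussian noise (SD = 10)
xs = [x for x in range(11) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 10.0) for x in xs]

r_raw = pearson(xs, ys)

# "Naturally binned" analysis: replace each group by its mean Y
levels = sorted(set(xs))
means = [mean([y for x, y in zip(xs, ys) if x == lv]) for lv in levels]
r_avg = pearson(levels, means)

print(f"R (individual points) = {r_raw:.2f}, R (means per X) = {r_avg:.3f}")
```

Averaging removes the within-group variation from view, so the correlation among the 11 means is much stronger than the correlation among the 1100 points that produced them.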
8. Masking variation with standard error
See Gleeson (2008) JMC 51:817-834 DOI
Partition by value of X into 4 bins with equal numbers of data points and display the 95% confidence interval for the mean (green) and mean ± SD (blue) for each bin.
[Figure panels: R = 0.12, R = 0.29, R = 0.28]
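The binning described above can be sketched as follows. The data are hypothetical, and the 95% interval uses the large-sample normal value 1.96 (an assumption; a t quantile would be more exact for small bins). The point is that the confidence interval for the mean shrinks as N grows, while the SD, which describes the actual spread of the data, does not.

```python
import math
import random
from statistics import mean, stdev

Z95 = 1.96  # large-sample normal approximation for a 95% interval (assumption)

def equal_count_bins(x, y, n_bins=4):
    """Sort by X and split into n_bins groups of (nearly) equal size; return the Y values per bin."""
    ys = [v for _, v in sorted(zip(x, y))]
    size = len(ys) // n_bins
    return [ys[i * size:(i + 1) * size] if i < n_bins - 1 else ys[i * size:]
            for i in range(n_bins)]

def summarize(bin_y):
    """Return (mean, SD, half-width of the 95% CI for the mean) for one bin."""
    m, sd = mean(bin_y), stdev(bin_y)
    return m, sd, Z95 * sd / math.sqrt(len(bin_y))

def simulate(n, seed):
    """Hypothetical data: X uniform on 0-100, Y = 0.1X + Gaussian noise (SD = 10)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 100) for _ in range(n)]
    y = [0.1 * xi + rng.gauss(0.0, 10.0) for xi in x]
    return [summarize(b) for b in equal_count_bins(x, y)]

small, large = simulate(40, seed=1), simulate(4000, seed=1)
for (m1, sd1, ci1), (m2, sd2, ci2) in zip(small, large):
    print(f"N=40: SD {sd1:5.2f}, CI +/-{ci1:5.2f} | N=4000: SD {sd2:5.2f}, CI +/-{ci2:5.2f}")
```

Plotting only the confidence intervals for the N = 4000 case would make the bins look well separated even though the points within each bin overlap heavily.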
9. The error of standard error
ANOVA for binned data sets
N       Bins   Degrees of freedom   F        P
40      4      3                    0.2596   0.8540
400     4      3                    12.855   < 0.0001
4000    4      3                    115.35   < 0.0001
4000    2      1                    270.91   < 0.0001
4000    8      7                    50.075   < 0.0001
"In each plot provided, the width of the errors bars and the difference in the mean values of the different categories are indicative of the strength of the relationship between the parameters." Gleeson (2008) JMC 51:817-834 DOI
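The growth of F with N in the table can be illustrated with a one-way ANOVA written from its definition. In this sketch the data are a hypothetical toy example: copying the same two groups twice leaves the group means unchanged, yet F rises. P values would additionally need the F-distribution survival function (for example `scipy.stats.f.sf`), which is omitted to keep the sketch dependency-free.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    all_y = [v for g in groups for v in g]
    grand = mean(all_y)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1            # number of bins minus one
    df_within = len(all_y) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[1, 2, 3], [4, 5, 6]]              # hypothetical binned data
doubled = [g * 2 for g in groups]            # same values, twice as many points
print(one_way_anova_f(groups))
print(one_way_anova_f(doubled))
```

The between-bins degrees of freedom are the number of bins minus one, which matches the third column of the table.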
10. Know your data
• Assays are typically run in replicate, making it possible to estimate assay variance
• Every assay has a finite dynamic range and it may not always be obvious what this is for a particular assay
• Dynamic range may have been sacrificed for throughput but this, by itself, does not make the assay bad
• We need to be able to analyse in-range and out-of-range data within a single unified framework
– See Lind (2010) QSAR analysis involving assay results which are only known to be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI
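Replicate measurements allow the assay variance to be estimated. A common approach, sketched here as an assumption rather than the author's procedure, pools the within-compound variances, weighting each compound by its degrees of freedom (n - 1). The replicate values are hypothetical.

```python
import math
from statistics import variance

def pooled_replicate_sd(replicate_sets):
    """Pooled within-compound SD; compounds measured only once carry no information and are skipped."""
    usable = [r for r in replicate_sets if len(r) > 1]
    num = sum((len(r) - 1) * variance(r) for r in usable)
    df = sum(len(r) - 1 for r in usable)
    return math.sqrt(num / df)

# Hypothetical pIC50 replicates for three compounds (the third was measured once)
replicates = [[5.0, 5.2], [6.0, 6.4], [7.3]]
print(round(pooled_replicate_sd(replicates), 3))
```

An estimate like this sets a floor on how well any model can be expected to reproduce the measured values.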
11. Depicting variation with percentile plots
This graphical representation of data makes it easy to visualize variation and can be used with mixed in-range and out-of-range data. See Colclough et al (2008) BMCL 16:6611-6616 DOI
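Percentiles work with mixed data because an out-of-range result still has a known rank: it lies above (or below) every in-range value. The sketch below computes the numbers behind such a plot. It assumes the nearest-rank percentile definition and results that are only known to exceed an upper limit; the function names and data are hypothetical.

```python
import math

def nearest_rank(sorted_vals, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    k = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[k - 1]

def percentile_summary(measured, upper_limit, n_above, ps=(10, 25, 50, 75, 90)):
    """Percentiles for in-range values plus n_above results known only to exceed upper_limit."""
    vals = sorted(measured) + [math.inf] * n_above  # out-of-range results rank above everything
    out = {}
    for p in ps:
        v = nearest_rank(vals, p)
        out[p] = f">{upper_limit}" if math.isinf(v) else v
    return out

# Hypothetical solubility-style data: eight in-range results, two reported as ">10"
print(percentile_summary([1, 2, 3, 4, 5, 6, 7, 8], upper_limit=10, n_above=2))
```

Only the percentiles that fall among the out-of-range results lose their exact value; the rest of the distribution is still fully described.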
12. Binning continuous data restricts your options for analysis and places the burden of proof on you to show that your conclusions are independent of the binning scheme. Think before you bin!
"Averaging the binned data was your idea so don't try blaming me this time!"
13. Some stuff to think about
• Model continuous data as continuous data
– RMSE is most relevant to prediction but you still need R²
– Fitted parameters may provide insight (e.g. solubility is more sensitive than potency to lipophilicity)
• When selecting training data, think in terms of Design of Experiments (e.g. evenly spaced values of X)
• Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50)
• Never make statements about the strength of a relationship when you've hidden variation in the data (unless you want a starring role in Correlation Inflation 2)
• To be meaningful, a measure of the spread of a distribution must be independent of sample size
• Reviewers/editors: mercilessly purge manuscripts of statements like "A negative correlation was observed between X and Y" or "A and B are correlated/linked"
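The pIC50 suggestion amounts to working on a logarithmic scale: pIC50 = -log10(IC50 in mol/L). A minimal sketch, assuming IC50 values reported in nM (the values themselves are hypothetical):

```python
import math
from statistics import mean

def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); the input is in nM (1 nM = 1e-9 M)."""
    return -math.log10(ic50_nm * 1e-9)

ic50_nm = [10, 100, 1000, 10000]  # hypothetical potencies spanning four orders of magnitude
pic50 = [pic50_from_ic50_nm(v) for v in ic50_nm]
print(mean(ic50_nm), [round(p, 2) for p in pic50], round(mean(pic50), 2))
```

On the IC50 scale the mean (about 2778 nM) is dominated by the weakest compound, whereas the pIC50 values are evenly spaced, which is much closer to the roughly symmetric distribution that least-squares methods assume.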