Technology Trends for Private Member Clubs: A Look at Modern Club Systems for... — Net at Work
See firsthand how today's Club Management Systems provide timely access to valuable member information - resulting in increased member satisfaction, improved operational efficiency, and an overall positive impact on profitability.
During this 60-minute session we will discuss technology trends for private member clubs and demonstrate what today's leading modern club systems provide, including:
Quick & Easy Access to Member Information, Behavior and Preferences
Comprehensive Member Billing
Powerful Member Marketing Capability
Complete Operational Analytic Reporting: Financial - Social - Activity - by Member
Modern Financial Systems with Powerful Analytics Reporting
Analysis on Data Fusion Techniques for Combining Conflicting Beliefs — ijsrd.com
The consensus operator provides a method for combining possibly conflicting beliefs within the Dempster-Shafer belief theory, and represents an alternative to the traditional Dempster's rule. In everyday discourse, dogmatic beliefs are expressed by observers when they have a strong and rigid opinion about a subject of interest. Such beliefs can be expressed and formalised within the Dempster-Shafer belief theory. This paper describes how the consensus operator can be applied to dogmatic conflicting opinions, i.e. when the degree of conflict is very high. It overcomes shortcomings of Dempster's rule and other operators that have been proposed for combining possibly conflicting beliefs.
On Severity, the Weight of Evidence, and the Relationship Between the Two — jemille6
Margherita Harris
Visiting fellow in the Department of Philosophy, Logic and Scientific Method at the London
School of Economics and Political Science.
ABSTRACT: According to the severe tester, one is justified in declaring to have evidence in support of a
hypothesis just in case the hypothesis in question has passed a severe test, one that it would be very
unlikely to pass so well if the hypothesis were false. Deborah Mayo (2018) calls this the strong
severity principle. The Bayesian, however, can declare to have evidence for a hypothesis despite not
having done anything to test it severely. The core reason for this has to do with the
(infamous) likelihood principle, whose violation is not an option for anyone who subscribes to the
Bayesian paradigm. Although the Bayesian is largely unmoved by the incompatibility between
the strong severity principle and the likelihood principle, I will argue that the Bayesian’s never-ending
quest to account for yet another notion, one that is often attributed to Keynes (1921) and that is
usually referred to as the weight of evidence, betrays the Bayesian’s confidence in the likelihood
principle after all. Indeed, I will argue that the weight of evidence and severity may be thought of as
two (very different) sides of the same coin: they are two unrelated notions, but what brings them
together is the fact that they both make trouble for the likelihood principle, a principle at the core of
Bayesian inference. I will relate this conclusion to current debates on how to best conceptualise
uncertainty by the IPCC in particular. I will argue that failure to fully grasp the limitations of an
epistemology that envisions the role of probability to be that of quantifying the degree of belief to
assign to a hypothesis given the available evidence can be (and has been) detrimental to an
adequate communication of uncertainty.
Frequentist inference only seems easy, by John Mount — Chester Chen
This is part of Alpine ML Talk Series:
The talk is called "Frequentist inference only seems easy" and is about the theory of simple statistical inference (based on material from this article: http://www.win-vector.com/blog/2014/07/frequenstist-inference-only-seems-easy/ ). The talk includes some simple dice games (I bring dice!) that really break the rote methods commonly taught as statistics. This is actually a good thing, as it gives you time and permission to work out how common statistical methods are properly derived from basic principles. This takes a little math (which I develop in the talk), but it changes some statistics from "do this" to "here is why you calculate like this." It should appeal to people interested in the statistical and machine learning parts of data science.
Hypothesis: meaning, need for hypothesis, qualities of a good hypothesis, types of hypothesis, null and alternative hypotheses, sources of hypothesis, formulation of hypothesis, hypothesis testing
The replication crisis: are P-values the problem and are Bayes factors the so... — StephenSenn2
Today's posterior is tomorrow's prior. — Dennis Lindley [1]
It has been claimed that science is undergoing a replication crisis and that when looking for culprits, the cult of significance is the chief suspect. It has also been claimed that Bayes factors might provide a solution.
In my opinion, these claims are misleading, and part of the problem is our understanding of the purpose and nature of replication, which has only recently been subject to formal analysis [2]. What we are or should be interested in is truth. Replication is a coherence, not a correspondence, requirement [3], and one that has a strong dependence on the size of the replication study [4].
Consideration of Bayes factors raises a puzzling question. Should the Bayes factor for a replication study be calculated as if it were the initial study? If the answer is yes, the approach is not fully Bayesian and furthermore the Bayes factors will be subject to exactly the same replication ‘paradox’ as P-values. If the answer is no, then in what sense can an initially found Bayes factor be replicated and what are the implications for how we should view replication of P-values?
A further issue is that little attention has been paid to false negatives and, by extension to true negative values. Yet, as is well known from the theory of diagnostic tests, it is meaningless to consider the performance of a test in terms of false positives alone.
I shall argue that we are in danger of confusing evidence with the conclusions we draw, and that any reforms of scientific practice should concentrate on producing evidence that is as reliable as it can be qua evidence. There are many basic scientific practices in need of reform. Pseudoreplication [5], for example, and the routine destruction of information through dichotomisation [6] are far more serious problems than many matters of inferential framing that seem to have excited statisticians.
References
1. Lindley DV. Bayesian statistics: A review. SIAM; 1972.
2. Devezer B, Navarro DJ, Vandekerckhove J, Ozge Buzbas E. The case for formal methodology in scientific reform. R Soc Open Sci. Mar 31 2021;8(3):200805. doi:10.1098/rsos.200805
3. Walker RCS. Theories of Truth. In: Hale B, Wright C, Miller A, eds. A Companion to the Philosophy of Language. John Wiley & Sons; 2017:532-553: chap 21.
4. Senn SJ. A comment on replication, p-values and evidence by S.N.Goodman, Statistics in Medicine 1992; 11:875-879. Letter. Statistics in Medicine. 2002;21(16):2437-44.
5. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological monographs. 1984;54(2):187-211.
6. Senn SJ. Being Efficient About Efficacy Estimation. Research. Statistics in Biopharmaceutical Research. 2013;5(3):204-210. doi:10.1080/19466315.2012.754726
D. Mayo: Replication Research Under an Error Statistical Philosophy — jemille6
D. Mayo (Virginia Tech) slides from her talk June 3 at the "Preconference Workshop on Replication in the Sciences" at the 2015 Society for Philosophy and Psychology meeting.
5. THE WHY?
The majority of predictive analytics today is based on historic data. As data scientists, we hunt and gather historic data, train models using a variety of algorithms, test and validate them, and finally deploy them to score. This, however, limits us to situations that have been considered in our training data. Our predictions don't incorporate any external or environmental factors not already trained on.
6. PHILOSOPHY
"So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality."
"As complexity rises, precise statements lose meaning and meaningful statements lose precision."
8. DEFINITION: RANDOMNESS
Box A contains 8 small and 8 big apples. You randomly pick an apple from the box. There is uncertainty about the size of the apple that will come out. The size of the apple can be described by a random variable that takes 2 different values. There are 2 distinct possible outcomes, and there is no uncertainty after the apple is taken out and observed. The probability of the selected apple being big or small is 50%.
9. DEFINITION: FUZZINESS
Now, you have a box (Box B) that contains 16 apples with different sizes ranging from very small to very large. You have one of these apples in hand. There is uncertainty in describing the size of the apple, but this time the uncertainty is not about the lack of knowledge of the outcome. It is about the lack of a boundary between small and large. You use a fuzzy membership function to define the membership value of the apple in the set "large" or "small". Everything is known except how to label the objects (apples), how to describe them, and how to draw the boundaries of the sets.
11. DEMPSTER-SHAFER – WHY?
• Dempster-Shafer theory has been used in AI and reliability modeling [1].
• Dempster-Shafer led the path to fuzzy logic. It is a base theory for uncertainty modeling.
• Dempster–Shafer theory (a.k.a. the theory of belief functions) is a generalization of the Bayesian theory of subjective probability.
• Whereas Bayesian theory requires probabilities for each question of interest, belief functions allow us to base degrees of belief (strength of evidence in favor of some proposition) for one question on probabilities for a related question.
[1] http://www.gnedenko-forum.org/Journal/2007/03-042007/article19_32007.pdf
12. BRIEF HISTORY:
• 17th-century roots.
• 1968 > Dempster developed means for combining degrees of belief derived from independent items of evidence.
• 1976 > Shafer developed the Dempster-Shafer theory.
• 1988-90 > The theory was criticized by researchers as confusing probabilities of truth with probabilities of provability [2][3].
• 1998 > Kłopotek and Wierzchoń proposed to interpret the Dempster–Shafer theory in terms of statistics of decision tables.
• 2011 > Dr. Hemant Baruah, The Theory of Fuzzy Sets: Beliefs and Realities.
• 2012 > Dr. Jøsang proved that Dempster's rule of combination actually is a method for fusing belief constraints.
• 2013 > Dr. John Yen, "Can Evidence Be Combined in the Dempster-Shafer Theory?"
[2] http://sartemov.ws.gc.cuny.edu/files/2012/10/Artemov-Beklemishev.-Provability-logic.pdf
[3] https://en.wikipedia.org/wiki/Provability_logic
14. EXAMPLE:
Dempster–Shafer theory has 2 components:
• Obtaining degrees of belief.
• Combining degrees of belief.
Suppose my friend Betty, whom I judge reliable with probability 0.9 and unreliable with probability 0.1, tells me a limb fell on my car. This statement must be true if she is reliable. So her testimony alone justifies a 0.9 degree of belief that a limb fell on my car and a zero (not 0.1) degree of belief that no limb fell on my car.
Note: A zero degree of belief here does not mean that I am sure no limb fell on my car. It merely means that, based on Betty's testimony, there is no reason to believe that no limb fell on my car. The belief function assigns 0.9 and 0 together.
15. COMBINING BELIEFS
I have another friend, Sally. Independently of Betty, she tells me a limb fell on my car. The event that Betty is reliable is independent of the event that Sally is reliable, so we multiply the probabilities of these events:
The probability that both are reliable is 0.9 x 0.9 = 0.81,
the probability that neither is reliable is 0.1 x 0.1 = 0.01, and
the probability that at least one is reliable is 1 - 0.01 = 0.99.
Since both said a limb fell on my car, at least one of them being reliable implies that a limb did fall on my car, and hence I may assign this event a degree of belief of 0.99.
Reference: http://www.glennshafer.com/assets/downloads/articles/article48.pdf
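The two-witness arithmetic above can be restated as a short runnable sketch (this is not code from the talk, just the slide's calculation):

```python
# Two independent witnesses, each reliable with probability 0.9,
# both assert the same event ("a limb fell on my car").
p_betty, p_sally = 0.9, 0.9

p_neither_reliable = (1 - p_betty) * (1 - p_sally)  # 0.1 * 0.1 = 0.01
p_at_least_one = 1 - p_neither_reliable             # 1 - 0.01 = 0.99

# At least one reliable witness implies a limb did fall,
# so the combined degree of belief is 0.99.
```

Dempster's rule of combination, introduced later in the deck, generalizes this product-of-independent-reliabilities argument.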
16. COMBINING BELIEFS: CONTRADICTION
Now suppose Sally contradicts Betty: she tells me no limb fell on my car. In this case both cannot be right, and hence both cannot be reliable – either only one is reliable or neither is reliable.
The prior probabilities:
Only Betty is reliable and Sally is not: 0.9 x 0.1 = 0.09.
Only Sally is reliable and Betty is not: 0.9 x 0.1 = 0.09.
Neither is reliable: 0.1 x 0.1 = 0.01.
17. COMBINING BELIEFS: CONTRADICTION (CONTINUED)
The posterior probabilities:
Only Betty is reliable while Sally is not, given that at least one is unreliable:
P(Betty is reliable, Sally is not) / P(at least one is unreliable) = 0.09 / [1 - (0.9 x 0.9)] = 9/19.
Only Sally is reliable while Betty is not, given that at least one is unreliable:
P(Sally is reliable, Betty is not) / P(at least one is unreliable) = 0.09 / [1 - (0.9 x 0.9)] = 9/19.
Neither of them is reliable, given that at least one is unreliable:
P(Sally is unreliable and Betty is unreliable) / P(at least one is unreliable) = (0.1 x 0.1) / [1 - (0.9 x 0.9)] = 1/19.
Hence we have a 9/19 degree of belief that a limb fell on my car (Betty reliable) and a 9/19 degree of belief that no limb fell on my car (Sally reliable).
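The conditioning arithmetic for the contradictory-testimony case can be restated as a runnable sketch (illustrative code, not from the talk):

```python
# Conflicting testimony: Betty says "limb fell", Sally says "no limb fell".
# Each witness is reliable with probability 0.9; both cannot be reliable.
p = 0.9

prior_only_betty = p * (1 - p)        # 0.09
prior_only_sally = (1 - p) * p        # 0.09
prior_neither    = (1 - p) * (1 - p)  # 0.01

# Condition on the impossible case (both reliable) being ruled out.
norm = 1 - p * p                      # 1 - 0.81 = 0.19
post_betty   = prior_only_betty / norm  # 9/19: belief a limb fell
post_sally   = prior_only_sally / norm  # 9/19: belief no limb fell
post_neither = prior_neither / norm     # 1/19: no information either way
```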
18. BASIC DEMPSTER-SHAFER
We obtain degrees of belief for one question (Did a limb fall on my car?) from probabilities for another question (Is the witness reliable?) [4].
[4] http://www.glennshafer.com/assets/downloads/articles/article48.pdf
20. NEXT STEPS
• Quantify fuzziness in binary prediction.
• Obtain improved probabilities for binary classification based on a prior classification algorithm, using modified Dempster-Shafer.
21. DATASETS TESTED
Higgs Boson
Titanic
Cdystonia
Parkinson's Telemonitoring
Abalone
Yeast
Skin_Nonskin
Contraceptive Method Choice
Diabetic Retinopathy
Mammographic Mass
Breast Cancer Wisconsin (Diagnostic)
Breast Cancer Wisconsin (Prognostic)
Haberman's Survival
Statlog (Heart)
SPECT Heart
SPECTF Heart
Hepatitis
Sponge
Audiology
Soybean
Ecoli
Post-Operative Patient
Zoo
EEG Eye State
Wilt
Bach
Nursery
Phishing Websites
Pets
Contraceptive Method
First Order
Spambase
Hill Valley
Satellite Image
Agaricus Lepiota
King Rook vs King Pawn
Firm Teacher
Fertility
Dermatology
Mice Protein
LSVT Voice
Thyroid
22. MODIFIED DS SUMMARY
Road Forward
• Scale performance with data dimension.
• Correct for highly imbalanced classes, which may lead to a high class probability when there is none (high/low belief conflict).
• Test different prior algorithms.
Positives
• Can significantly increase performance metrics, and to a greater precision.
• The algorithm is not a black box; it can be understood relatively easily.
• Prior probabilities can come from algorithms other than logistic regression.
23. JAR OF MARBLES
What is the majority color in the jar? Two people give their opinions (mass values):

                    Person 1   Person 2
Majority is Red       70%        50%
Majority is Blue      20%        30%
Half Blue and Red     10%        20%

A Dempster-Shafer student could then say he/she believes:

Red    Blue   Half-half
81%    17%    2%
24. DEMPSTER'S RULE OF COMBINATION

$$(m_1 \oplus m_2)(A) = \frac{1}{1-K} \sum_{B \cap C = A} m_1(B)\, m_2(C) = m_{1,2}(A), \quad A \neq \emptyset$$

Combines multiple pieces of evidence (mass functions). Emphasizes the agreement, and ignores the conflict of the mass functions, by using the normalization factor 1-K.

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C), \quad K \neq 1$$

K is the measure of conflict between the two mass sets, meaning 1-K is the measure of agreement.
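As a sketch of how the rule can be implemented (illustrative code, not the talk's implementation; mass functions are represented as dicts keyed by frozensets), the Betty/Sally conflict example reproduces the 9/19, 9/19, 1/19 split:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal
    elements are frozensets over the same frame of discernment."""
    combined = {}
    conflict = 0.0  # K: total mass on pairs with empty intersection
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule undefined (K = 1)")
    # Normalize by the agreement 1 - K.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# Betty/Sally example from the slides. Frame: {"limb", "no_limb"}.
theta = frozenset({"limb", "no_limb"})
betty = {frozenset({"limb"}): 0.9, theta: 0.1}
sally = {frozenset({"no_limb"}): 0.9, theta: 0.1}

posterior = dempster_combine(betty, sally)
# posterior masses: 9/19 for {"limb"}, 9/19 for {"no_limb"}, 1/19 for theta
```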
25. DS NOTATION
The plausibility function of a class sums the mass values of all sets that intersect that class. It measures what is left over after subtracting evidence against a belief:

$$pl(A) = \sum_{B \cap A \neq \emptyset} m(B)$$

The belief function of a class sums the mass values of all the non-empty subsets of that class – the strength of evidence in favor of some proposition:

$$bel(A) = \sum_{B \subseteq A} m(B)$$

These quantities bracket the mass value:

$$m(A) \leq bel(A) \leq pl(A)$$
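A minimal sketch of belief and plausibility under a dict-of-frozensets representation (illustrative, not the talk's code); the masses below are the 9/19, 9/19, 1/19 values from the conflicting-witness example:

```python
def belief(m, a):
    """bel(A): total mass of all focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """pl(A): total mass of all focal elements intersecting A."""
    return sum(v for b, v in m.items() if b & a)

theta = frozenset({"limb", "no_limb"})
m = {frozenset({"limb"}): 9/19, frozenset({"no_limb"}): 9/19, theta: 1/19}

a = frozenset({"limb"})
# m(A) <= bel(A) <= pl(A): here 9/19 <= 9/19 <= 10/19.
bel_a, pl_a = belief(m, a), plausibility(m, a)
```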
26. MODIFIED DS: OBTAINING POSTERIOR

$$C(A|E') = \frac{m(A|E')/P(A)}{\sum_{B \subset \Theta} m(B|E')/P(B)}$$

Given some evidence and prior probabilities for a class, we can obtain a posterior or updated mass value.
The Basic Certainty Value C is a normalized ratio of a hypothesis subset's mass to its prior probability.
E′ – Evidence space: a set of mutually exclusive outcomes or possible values of an evidence source.
R. P. S. Mahler, D. Fixsen. The modified Dempster-Shafer approach to classification.
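A rough sketch of the Basic Certainty Value normalization (the mass and prior values below are illustrative, not results from the talk):

```python
def basic_certainty(mass, prior):
    """Normalize each hypothesis's mass by its prior probability,
    then renormalize so the values sum to 1."""
    ratios = {a: mass[a] / prior[a] for a in mass}
    total = sum(ratios.values())
    return {a: r / total for a, r in ratios.items()}

# Hypothetical masses and priors for a jar-of-marbles-style frame.
mass  = {"red": 0.81, "blue": 0.17, "half": 0.02}
prior = {"red": 0.60, "blue": 0.40, "half": 1.00}

bcv = basic_certainty(mass, prior)  # updated mass values, summing to 1
```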
27. JAR OF MARBLES
What is the majority color in the jar?

                    Person 1   Person 2
Majority is Red       70%        50%
Majority is Blue      20%        30%
Half Blue and Red     10%        20%

Prior Information: Red 60%, Blue 40%

A Dempster-Shafer student could then say he/she believes:

           Red     Blue    Half-half
DS         81%     17%     2%
Modified   84.9%   13.5%   1.6%
29. OBTAINING MASS VALUES
For each feature:
1. Partition the training data by class.
2. Calculate the max/min values of each partition.
3. Determine which max/min value situation it is by looking at the ranges of each partition.
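Steps 1 and 2 can be sketched as follows; the slides do not spell out the rules for classifying the max/min "situations", so the overlap check at the end is only an assumption:

```python
def class_ranges(values, labels):
    """Per-class (min, max) range of one feature (steps 1-2)."""
    ranges = {}
    for v, c in zip(values, labels):
        lo, hi = ranges.get(c, (v, v))
        ranges[c] = (min(lo, v), max(hi, v))
    return ranges

# Toy feature with two classes; values are made up for illustration.
values = [1.0, 2.5, 3.0, 4.2, 5.1, 6.0]
labels = [0, 0, 0, 1, 1, 1]

ranges = class_ranges(values, labels)
(lo0, hi0), (lo1, hi1) = ranges[0], ranges[1]
# Step 3 (assumed): one "situation" check is whether the ranges overlap.
overlap = lo1 <= hi0 and lo0 <= hi1  # False here: the ranges are disjoint
```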
30. EXAMPLES
So far we have tested the DS algorithm on 43 datasets from:
• the UCI Machine Learning Repository
• the R package "Datasets"
31. EMAIL SPAM
• Email spam comes from a diverse group of individuals; we are predicting on individuals' definitions of spam or not.
• 58 features, primarily different word and character frequencies.
• 4601 observations.
https://archive.ics.uci.edu/ml/datasets/Spambase

Median metrics based on 10-fold CV:

Metric           LR        LRDS
Accuracy         0.63587   0.63913
AUC              0.76317   0.76265
F1 Score         0.45753   0.45840
Precision        0.52277   0.52470
True Positive    0.39503   0.40151
False Positive   0.01245   0.01384
33. SPECT HEART
• Cardiac Single Proton Emission Computed Tomography images.
• Predicting on normal or abnormal heart.
• 23 features on partial human diagnoses of abnormality.
• 267 observations.
Source: https://archive.ics.uci.edu/ml/datasets/Skin+Segmentation

Median metrics based on 10-fold CV:

Metric           LR        LRDS
Accuracy         0.55660   0.72641
AUC              0.89846   0.89846
F1 Score         0.52168   0.57441
Precision        0.52618   0.66678
True Positive    0.14074   0.50020
False Positive   0.01562   0.036458
36. REFERENCES
• Application of Dempster-Shafer in reliability modeling: http://www.gnedenko-forum.org/Journal/2007/03-042007/article19_32007.pdf
• Provability logic: http://sartemov.ws.gc.cuny.edu/files/2012/10/Artemov-Beklemishev.-Provability-logic.pdf
• Wikipedia, provability logic: https://en.wikipedia.org/wiki/Provability_logic
• Dr. Shafer's article: http://www.glennshafer.com/assets/downloads/articles/article48.pdf
• Home page of Dr. Lotfi Zadeh: https://www2.eecs.berkeley.edu/Faculty/Homepages/zadeh.html
• L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning - I," Information Sciences, vol. 8, no. 3, pp. 199-249, July 1975.
• R. E. Bellman and L. A. Zadeh, "Decision making in a fuzzy environment," Management Science, vol. 17, no. 4, pp. B141-B164, Dec. 1970.
• L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338-353, June 1965.
• Alexander Karlsson. R: Evidence Combination in R. R Foundation for Statistical Computing, MIT, 2012.
• R. P. S. Mahler, D. Fixsen. The modified Dempster-Shafer approach to classification. IEEE Std. 1516-2000, pages 96-104, Jan 1997.
• Jiwen Guan, Jasmina Pavlin, and Victor R. Lesser. Combining Evidence in the Extended Dempster-Shafer Theory, pages 163-178. Springer London, London, 1990.
• M. Lichman. UCI machine learning repository, 2013.
• Simon Parsons and London Wca Px. Some qualitative approaches to applying the Dempster-Shafer theory, 1994.
39. DS NOTATION

$$\Theta = \{a, b\}$$
A hypothesis about the possibilities is a subset of all possibilities.

$$2^{\Theta} = \{\emptyset, \{a\}, \{b\}, \Theta\}$$
The power set is the set of all possible subsets of Θ, including Θ itself and the empty set ∅.

$$m: 2^{\Theta} \to [0, 1]$$
Mass function: each element of the power set has a mass value assigned to it.

$$m(\emptyset) = 0$$
For the purposes of our algorithm, the mass of the empty set is zero.

$$\sum_{A \subseteq \Theta} m(A) = 1$$
The sum of masses (for each row) is 1 (currently).
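A small sketch of this notation in code (illustrative; the frame and mass values below are made up for the example):

```python
from itertools import combinations

def power_set(theta):
    """All subsets of the frame of discernment, as frozensets."""
    items = sorted(theta)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

theta = frozenset({"a", "b"})
subsets = power_set(theta)  # [empty set, {a}, {b}, {a, b}]

# A valid mass function over the power set: m(empty) = 0, masses sum to 1.
m = {frozenset(): 0.0,
     frozenset({"a"}): 0.6,
     frozenset({"b"}): 0.3,
     theta: 0.1}
```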
Hi, this is Taposh Roy and Austin Powell from Kaiser Permanente. This is part of the ongoing research work done by Austin.
In this talk we will discuss the following:
The motivation for combining fuzziness and predictability.
The math behind the Dempster-Shafer method.
Our analysis combining logistic regression with Dempster-Shafer.
A review of our results and their interpretation.
The majority of predictive analytics today is based on historic data.
As data scientists, we hunt and gather historic data, train models using a variety of algorithms, test and validate them, and finally deploy them to score.
This, however, limits us to situations that have historic information. Our predictions don't incorporate any external or environmental factors not already trained on.
Can you guess who said this?
This is Dr. Zadeh; his research has been pioneering in this area of belief functions.
Before we go further, I am going to talk about a few concepts.
Box A contains 8 small and 8 big apples. You randomly pick an apple from the box. There is uncertainty about the size of the apple that will come out. The size of the apple can be described by a random variable that takes 2 different values with 50% probability each. There are 2 distinct possible outcomes and there is no uncertainty after the apple is taken out and observed.
You have a box (Box B) that contains 10 apples with different sizes ranging from very small to very large. You have one of these apples in hand. There is uncertainty in describing the size of the apple but this time the uncertainty is not about the lack of knowledge of the outcome. It is about the lack of the boundary between small and large. You use a fuzzy membership function to define the membership value of the apple in the set ``large`` or ``small``. Everything is known except how to label the objects (apples), how to describe them and draw the boundaries of the sets.
AI researchers in the early 1980s, when they were trying to adapt probability theory to expert systems.
We could get one probability confidence interval using something like a normality assumption for the true parameter. (1) A disadvantage is the usual issue associated with normality assumptions and sample size. (2) This is for the final performance metric, not a per-observation upper and lower bound.
"To a greater precision" in the first point refers to the fact that, in the following plots, the boxplots have smaller bounds for LRDS.
Person A and Person B probabilities should be introduced as mass values
Anyone who has ever done a probability problem has done a jar/urn problem with a certainty
Should note that for the example we are not considering a binary classification. In our case, we would only be considering, for example, Red and Green.
Plausibility: it also ranges from 0 to 1 and measures the extent to which evidence in favor of ~p leaves room for belief in p.
Will address what a “measure” is
In addition to combining information from different features, it is possible to use the same Bayesian process to integrate information from other data sources, possibly as a way of weighting your masses. By way of example, say that we did logistic regression on the breast cancer dataset to obtain probabilities. We could then obtain probabilities from another binary prediction algorithm, such as Naïve Bayes, and combine the two to obtain mass values.
This equation is from the paper "A Reasoning Model Based on an Extended Dempster-Shafer Theory" by John Yen.
The Basic Certainty Value of a hypothesis subset is the normalized ratio of the subset’s mass to its prior probability
This figure assumes that the dataset of interest only has 4 features.
In the real-world example, mass values would be someone’s opinion of the probability of a certain event.
Storyline for the results:
First, point out that no data cleaning, feature engineering, outlier removal, etc. was done, so the performance is all relative.
Introduce the results with Titanic, since it is a well-known dataset.
The results tend to show that LRDS does slightly better in general with accuracy than with AUC; however, this is the metric we are more interested in. So from an operational standpoint we have had good results.
Precision: “how many selected items are relevant”
Recall: “how many relevant items are selected”
PR curves are:
Better when looking at highly skewed datasets.
A curve dominates in ROC space if and only if it dominates in PR space.
There are some disadvantages when it comes to optimizing, but PR was a performance indicator.
3 main points of this slide:
Where there is a big difference in the performance metrics, it is in LRDS's favor; there has not been a decrease in performance across our metrics for LRDS.
The exception to the above is the false positive rate.
There is no difference in the PR curves, but a large difference in the performance metrics.
Note that for F1, LRDS is actually greater; the difference is shown backwards.