This is an almost self-contained explanation of our recent result. The contents are based on our talk at the RIMS2014 conference.
Recently, many fundamental and important results in statistical decision theory have been extended to the quantum setting. The quantum Hunt-Stein theorem and quantum local asymptotic normality are typical successful examples.
In our recent preprint, we show the quantum minimax theorem, which is also an extension of a well-known result, the minimax theorem in statistical decision theory, first shown by Wald and generalized by Le Cam. Our assertions hold for every closed convex set of measurements and for general parametric models of density operators. On the other hand, Bayesian analysis based on least favorable priors has been widely used in classical statistics and is expected to play a crucial role in quantum statistics. Following this trend, we also show the existence of least favorable priors, which seems to be new even in classical statistics.
Quantum Minimax Theorem in Statistical Decision Theory (RIMS2014)
1. This presentation was given at the RIMS conference, Kyoto, Japan.
ftanaka@sigmath.es.osaka-u.ac.jp
Quantum Minimax Theorem
in Statistical Decision Theory
November 10, 2014
Public Version
田中冬彦 (Fuyuhiko TANAKA)
Graduate School of Engineering Science, Osaka University
If you are interested in details, please see quant-ph 1410.3639; citations are also welcome!
2. Prologue:
What is Bayesian Statistics?
(Non-technical Short Version)
*It is for non-statisticians, esp. researchers in quantum information.
3. Mission of Statistics
Mission of Statistics
To give a desirable and plausible method according to statistical models, the purpose of inference (point estimation, prediction, hypothesis testing, model selection, etc.), and loss functions.
*Precise evaluation of estimation error is secondary.
Difference from Pure Math
Statistics is required to meet various needs in the world
→ Significant problems in statistics have changed with the times.
(This trivial fact does not seem to be well known outside of statisticians.)
4. What’s Bayesian Statistics? (1/2)
Bayesian Statistics
Statistics admitting the usage of prior distributions
prior distribution = prob. dist. on the unknown parameter
(We do not care about the interpretation of this prob.)
*Interpretation heavily depends on the individual researcher.
5. What’s Bayesian Statistics? (2/2)
Relation to frequentists' statistics (traditional statistics)
1. Bayesian statistics has enlarged the framework and satisfies various needs. (It is not an alternative to frequentists' statistics!)
2. Sticking to frequentists' methods like MLEs amounts to a preference for a specific prior distribution (narrow-minded!). (Indeed, MLEs often fail; their independence of the loss is unsuitable.)
3. Frequentists' results give good starting points for Bayesian analysis.
For details, see, e.g., C. Robert, The Bayesian Choice, Chap. 11.
[Diagram: frequentists' (traditional) statistics contained within Bayesian statistics]
6. Misunderstanding
"Since priors are not objective, Bayesian methods should not be used for scientific results."
Objective priors (noninformative priors) are proposed for such usage.
Bayesian statistics does not deny the frequentists' way, and statistical inference based on an objective prior often agrees with the frequentists' one (ex: MLE = MAP estimate under the uniform prior).
It remains unsolved which noninformative prior should be used.
The essence of this problem is how to choose a good statistical method with a small sample. It is NOT inherent to Bayesian statistics, but has been a central issue in traditional statistics.
Frequentists and mathematicians avoid this and only consider mathematically tractable models and/or asymptotic theory (a large-sample approximation!).
In Bayesian statistics, this problem is transformed into the choice of a noninformative prior.
7. Further Reference
Preface in RIMS Kokyuroku 1834 (Theory and applications of statistical inference in quantum theory)
POINT
- Points out some misunderstandings of statistics for non-experts (esp. physicists)
- Defines the word "quantum statistical inference" as an academic term
Sorry, but it is written in Japanese.
8. CONTENTS
0. Notation
1. Introduction
2. Quantum Statistical Models and Measurements
3. Quantum Statistical Decision Problems
4. Risk Functions and Optimality Criteria
5. Main Result
6. Minimax POVMs and Least Favorable Priors
7. Summary
10. Description of Probability in QM
Quantum Mechanics (QM): described by operators in a Hilbert space
Probability: described by a density operator and a measurement.
Density operator ρ + measurement (POVM) M(dx) → probability distribution μ(dx) = Tr ρ M(dx)
[Flow] ρ(θ) → POVM M(dx) → outcome x ~ μ(dx) → statistical processing of data, a(x)
11. Density Operators
Density operators (density matrices, states)
S(H) := {all d.o.s on a Hilbert space} = {ρ ∈ L₁(H) | Tr ρ = 1, ρ ≥ 0}
When dim H = n:
ρ = U diag(p_1, ..., p_n) U*,  U unitary,  p_j ≥ 0,  Σ_{j=1}^{n} p_j = 1  (so that Tr ρ = 1 and ρ ≥ 0).
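As a small numerical illustration of the spectral form above, the following sketch (assuming numpy; the eigenvalues and the unitary are hypothetical) builds ρ = U diag(p_1, ..., p_n) U* and checks Tr ρ = 1 and ρ ≥ 0.

```python
import numpy as np

# Hypothetical spectral data: eigenvalues p_j >= 0 summing to one,
# and a unitary U (obtained here from a QR decomposition of a random matrix).
p = np.array([0.6, 0.3, 0.1])
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))

rho = U @ np.diag(p) @ U.conj().T                    # rho = U diag(p_1, ..., p_n) U*

assert np.isclose(np.trace(rho).real, 1.0)           # Tr rho = 1
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)     # rho >= 0
```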
12. Measurements in QM (1/2)
Mathematical model of measurements
= Positive Operator Valued Measure (POVM)
POVM (def. for finite sets)
Sample space = possible values for a measurement: {x_1, x_2, ..., x_k}
POVM {M_j}_{j=1,...,k} = a family of linear ops. satisfying
M_j := M({x_j}) ≥ O  (positive operator, i.e., positive semidefinite matrix),
Σ_j M_j = I.
13. Measurements in QM (2/2)
How to construct a probability space (Axiom of QM)
d.o. ρ ∈ S(H)
sample space {x_1, x_2, ..., x_k}
POVM (measurement) {M_j}_{j=1,...,k}
Then the prob. of obtaining the measurement outcome x_j is p_j := Tr ρ M_j.
* The above prob. satisfies the required conditions (positive and summing to one) due to both definitions, of d.o.s and of POVMs: M_j ≥ O, Σ_j M_j = I, ρ ≥ 0, and Tr ρ = 1.
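This construction can be checked numerically; the following sketch (assuming numpy, with a hypothetical qubit state and a hypothetical two-outcome POVM) verifies the POVM conditions and computes p_j = Tr ρ M_j.

```python
import numpy as np

# Hypothetical qubit example: a density operator and a two-outcome POVM.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])             # Tr rho = 1, rho >= 0
M = [np.array([[0.8, 0.1], [0.1, 0.4]]),             # M_1 >= O
     np.array([[0.2, -0.1], [-0.1, 0.6]])]           # M_2 >= O, M_1 + M_2 = I

# POVM conditions: each element positive semidefinite, elements summing to I.
assert all(np.all(np.linalg.eigvalsh(Mj) >= -1e-12) for Mj in M)
assert np.allclose(sum(M), np.eye(2))

# Probabilities of the outcomes: p_j = Tr(rho M_j).
p = np.array([np.trace(rho @ Mj).real for Mj in M])
print(p, p.sum())   # nonnegative, summing to one
```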
15. Quantum Bayesian Hypothesis Testing (1/2)
1. Alice chooses one letter (denoted by an integer) and sends the corresponding quantum state (described by a density operator): θ ∈ {1, 2, ..., k},  ρ_1, ρ_2, ..., ρ_k ∈ S(H).
2. Bob receives the quantum state, performs a measurement, and guesses the letter Alice really sent to him. The whole process is described by a POVM {M_j}_{j=1,2,...,k}.
3. The proportion of letters is given by a distribution π(1), ..., π(k) ≥ 0, Σ_{i=1}^{k} π(i) = 1 (called a prior distribution).
[Illustration: a sequence of letters sent by Alice, e.g., 2 3 2 1 4 ...]
16. Quantum Bayesian Hypothesis Testing (2/2)
* The probability that Bob guesses j when Alice sends i:  p(B = j | A = i) = Tr ρ_i M_j.
* Bob's loss (estimation error) when he guesses j while Alice sends i:  w(i, j).
* Bob's risk for the i-th letter:  R_M(i) = Σ_{j=1}^{k} w(i, j) p(B = j | A = i).
* Bob's average risk (his task is to minimize it by using a good POVM):  Σ_{i=1}^{k} π(i) R_M(i).
17. Summary of Q-BHT
Alice: chooses one letter θ ∈ {1, 2, ..., k} and sends the corresponding quantum state ρ_1, ρ_2, ..., ρ_k ∈ S(H).
Proportion of each letter (prior distribution): π(1), ..., π(k) ≥ 0, Σ_{i=1}^{k} π(i) = 1.
Bob: guesses the letter by choosing a measurement (POVM) {M_j}_{j=1,2,...,k}.
Average risk:  r_{π,M} := Σ_{i=1}^{k} π(i) R_M(i) = Σ_{i=1}^{k} Σ_{j=1}^{k} π(i) w(i, j) Tr ρ_i M_j.
Bayes POVM w.r.t. π = POVM minimizing the average risk.
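To make these quantities concrete, here is a short sketch (assuming numpy; the two states, the POVM, the 0-1 loss, and the prior are hypothetical) that computes the per-letter risks R_M(i), the average risk r_{π,M}, and also the worst-case risk used on the next slide. A Bayes POVM would be one minimizing the average risk over all POVMs.

```python
import numpy as np

# Hypothetical two-letter example with the 0-1 loss w(i, j) = 1 - delta_ij.
rho = [np.array([[1.0, 0.0], [0.0, 0.0]]),      # rho_1
       np.array([[0.5, 0.5], [0.5, 0.5]])]      # rho_2
M   = [np.array([[1.0, 0.0], [0.0, 0.0]]),      # outcome "guess 1"
       np.array([[0.0, 0.0], [0.0, 1.0]])]      # outcome "guess 2"  (M_1 + M_2 = I)
w   = 1.0 - np.eye(2)                           # 0-1 loss w(i, j)
pi  = np.array([0.5, 0.5])                      # prior (proportion of letters)

# p(B = j | A = i) = Tr(rho_i M_j)
P = np.array([[np.trace(ri @ Mj).real for Mj in M] for ri in rho])

R = (w * P).sum(axis=1)       # per-letter risks R_M(i) = sum_j w(i, j) p(j | i)
avg_risk = float(pi @ R)      # average risk r_{pi, M}   (Bayes criterion)
worst_case = float(R.max())   # worst-case risk          (minimax criterion)
print(R, avg_risk, worst_case)
```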
18. Minimax POVM
1. Alice chooses one letter θ ∈ {1, 2, ..., k} and sends the corresponding quantum state ρ_1, ..., ρ_k ∈ S(H).
2. Bob guesses the letter by choosing a measurement (POVM) {M_j}_{j=1,2,...,k}. Bob has no prior information, so he minimizes the worst-case risk.
The worst-case risk:  r*_M := sup_i R_M(i).
Minimax POVM = POVM minimizing the worst-case risk.
19. Quantum Minimax Theorem (Simple Ver.)
Setting as above: Alice chooses one letter θ ∈ {1, 2, ..., k} and sends the corresponding quantum state ρ_1, ..., ρ_k ∈ S(H); Bob guesses the letter by choosing a measurement (POVM) {M_j}_{j=1,2,...,k}.
Theorem (Hirota and Ikehara, 1982)
inf_M sup_i R_M(i) = sup_π inf_M Σ_{i=1}^{k} π(i) R_M(i).
The minimax POVM agrees with the Bayes POVM w.r.t. the worst-case prior.
*The prior that is worst for Bob is called a least favorable prior.
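One half of this equality is elementary and holds for any risk function (the substantial part of the theorem is the reverse inequality):
For every prior π and every POVM M,  Σ_i π(i) R_M(i) ≤ sup_i R_M(i).
Taking the infimum over M on both sides and then the supremum over π gives
sup_π inf_M Σ_i π(i) R_M(i) ≤ inf_M sup_i R_M(i).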
21. Example (2/2)
Quantum states:
ρ_1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
ρ_2 = [[1/2, 1/2, 0], [1/2, 1/2, 0], [0, 0, 0]],
ρ_3 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]].
0-1 loss:  w(i, j) = 1 - δ_ij.
LFP:  π_LF(1) = π_LF(2) = 1/2,  π_LF(3) = 0.
The minimax POVM is constructed as a Bayes POVM w.r.t. the LFP.
Important implication
When the state is completely unknown, it is not necessarily wise to find an optimal POVM with respect to the uniform prior.
(Statisticians have known this fact for decades...)
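For readers who want to check the example numerically, here is a sketch (assuming numpy and cvxpy with an SDP-capable solver) that computes a Bayes POVM for a given prior by solving the corresponding semidefinite program: for the 0-1 loss, minimizing the average risk is the same as maximizing Σ_i π(i) Tr ρ_i M_i. Note that the LFP above puts zero mass on the third letter, so the Bayes POVM w.r.t. the LFP is not unique and the solver's output need not be the minimax one (cf. the uniqueness remark on a later slide).

```python
import numpy as np
import cvxpy as cp

# The three states of the example.
rho = [np.diag([1.0, 0.0, 0.0]),
       np.array([[0.5, 0.5, 0.0],
                 [0.5, 0.5, 0.0],
                 [0.0, 0.0, 0.0]]),
       np.diag([0.0, 0.0, 1.0])]
k, d = 3, 3

def bayes_povm(prior):
    """A Bayes POVM for the 0-1 loss: maximize sum_i prior[i] Tr(rho_i M_i)
    over all POVMs {M_i} (a semidefinite program)."""
    M = [cp.Variable((d, d), PSD=True) for _ in range(k)]
    success = sum(prior[i] * cp.trace(rho[i] @ M[i]) for i in range(k))
    cp.Problem(cp.Maximize(success), [sum(M) == np.eye(d)]).solve()
    return [Mi.value for Mi in M]

def risks(M):
    """Per-letter risks R_M(i) = 1 - Tr(rho_i M_i) for the 0-1 loss."""
    return [1.0 - np.trace(rho[i] @ M[i]).real for i in range(k)]

print(risks(bayes_povm([1/3, 1/3, 1/3])))   # uniform prior
print(risks(bayes_povm([1/2, 1/2, 0.0])))   # the least favorable prior above
```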
22. Rewritten in Technical Terms
1. Nature
Parameter space Θ = {1, 2, ..., k};  quantum statistical model {ρ_θ} ⊂ S(H).
2. Experimenter and Statistician
Decision space U = {1, 2, ..., k};  POVM {M_u}_{u∈U} over U.
Loss function  w(θ, u).
Risk function  R_M(θ) = Σ_{u∈U} w(θ, u) Tr ρ_θ M_u.
Theorem
inf_M sup_θ R_M(θ) = sup_π inf_M Σ_θ π(θ) R_M(θ).
23. Main Result (Brief Summary)
Quantum Minimax Theorem
inf_{M∈Po(U)} sup_{θ∈Θ} R_M(θ) = sup_{π∈P(Θ)} inf_{M∈Po(U)} ∫ R_M(θ) π(dθ),
where  R_M(θ) := ∫_U w(θ, u) Tr ρ(θ) M(du).
cf) classical version (over decision rules δ ∈ D):
inf_{δ∈D} sup_{θ∈Θ} R_δ(θ) = sup_{π∈P(Θ)} inf_{δ∈D} ∫ R_δ(θ) π(dθ).
Conditions, assumptions, and the essence of the proof are explained below.
24. Previous Works
Wald (1950): Statistical decision theory
Holevo (1973): Quantum statistical decision theory
Recent results and applications (many!): Kumagai and Hayashi (2013); Guta and Jencova (2007); Hayashi (1998)
Le Cam (1964): (classical) minimax theorem in statistical decision theory; the first version was given by Wald (1950)
We show: the quantum minimax theorem in quantum statistical decision theory.
26. Formal Definition of Statistical Models
(Naïve) quantum statistical model = a parametric family of density operators.
Ex. (qubit model)
ρ(θ) = (1/2) [[1 + θ_3, θ_1 - iθ_2], [θ_1 + iθ_2, 1 - θ_3]],  θ = (θ_1, θ_2, θ_3) ∈ R³.
Basic Idea (Holevo, 1976)
A quantum statistical model is defined by a map ρ: Θ → L₁(H), θ ↦ ρ(θ).
Next, we see the conditions required for this map.
27. Regular Operator-Valued Function (1/2)
Definition
Θ: a locally compact metric space;  K ⊂ Θ: a compact set;
K_δ := {(θ, θ') ∈ K × K : d(θ, θ') ≤ δ},  δ > 0;
L₁(H): trace-class operators on a Hilbert space.
For a map T: Θ → L₁(H), we define
ω_K(T; δ) := inf{ ||X||₁ : X ≥ T(θ) - T(θ'), (θ, θ') ∈ K_δ }.
Definition
An operator-valued function T is regular if  lim_{δ→0} ω_K(T; δ) = 0  for every compact K ⊂ Θ.
28. Regular Operator-Valued Function (2/2)
Remark 1
lim_{δ→0} ω_K(T; δ) = 0  ⟹  T is uniformly continuous w.r.t. the trace norm on every compact set:
sup_{(θ,θ')∈K_δ} ||T(θ) - T(θ')||₁ → 0  as δ → 0.
The converse does not hold if dim H = ∞.
Remark 2
The regularity assures that the operator-valued integral  ∫ f(θ) T(θ) π(dθ)  is well-defined
for every f ∈ C₀(Θ) and π ∈ P(Θ).
(Here ω_K(T; δ) := inf{ ||X||₁ : X ≥ T(θ) - T(θ'), (θ, θ') ∈ K_δ }, as on the previous slide.)
29. Quantum Statistical Models
Definition
S(H) := {X ∈ L₁(H) : Tr X = 1, X ≥ 0}.
ρ: Θ → S(H) is called a quantum statistical model if it satisfies the conditions below.
Conditions
1. Identifiability (one-to-one map):  ρ(θ_1) = ρ(θ_2) ⟹ θ_1 = θ_2.
2. Regularity:  lim_{δ→0} ω_K(ρ; δ) = 0 for every compact K ⊂ Θ.
↑ Necessary for our arguments
30. Measurements (POVMs)
U: decision space (locally compact space)
A(U): Borel sets (the smallest sigma-algebra containing all open sets)
Definition
A positive-operator valued function M on A(U) is called a POVM if it satisfies
M(B) ≥ 0 for every B ∈ A(U),
M(∪_{j=1}^{∞} B_j) = Σ_{j=1}^{∞} M(B_j) for B_1, B_2, ... ∈ A(U) with B_i ∩ B_j = ∅ (i ≠ j),
M(U) = I.
Po(U): all POVMs over U.
31. Born Rule
Axiom of Quantum Mechanics
For the quantum system described by ρ and a measurement described by a POVM M(dx),
the measurement outcome is distributed according to
x ~ μ(dx) := Tr ρ M(dx).
[Flow] ρ → POVM M(dx) → x ~ μ(dx)
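For a finite decision space this axiom is easy to simulate; the following sketch (assuming numpy, with a hypothetical state and POVM) computes the Born-rule distribution and draws outcomes from it.

```python
import numpy as np

rho = np.array([[0.5, 0.25], [0.25, 0.5]])            # hypothetical state
M = [np.array([[0.8, 0.1], [0.1, 0.4]]),              # hypothetical two-outcome POVM
     np.array([[0.2, -0.1], [-0.1, 0.6]])]

p = np.array([np.trace(rho @ Mj).real for Mj in M])   # Born rule: p_j = Tr(rho M_j)
rng = np.random.default_rng(1)
outcomes = rng.choice(len(M), size=1000, p=p)         # simulated measurement data
print(p, np.bincount(outcomes) / 1000)                # empirical frequencies approximate p
```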
33. Basic Setting
Situation
For the quantum system specified by the unknown parameter θ, experimenters extract information for some purposes.
[Flow] ρ(θ) → POVM M(dx) → x ~ μ_θ(dx) → a(x)
Typical sequence of tasks
1. Choose a measurement (POVM).
2. Perform the measurement over the quantum system.
3. Take an action a(x) among the choices, based on the measurement outcome x.
The map a is formally called a decision function.
34. Decision Functions
Examples
- Estimate the unknown parameter:  a(x) = x̄ = (x_1 + ... + x_n)/n.
- Estimate the unknown d.o. rather than the parameter, e.g.,
ρ̂(x) = [[a_1(x), a_2(x), 0], [a_2(x)*, a_3(x), 0], [0, 0, 1 - a_1(x) - a_3(x)]].
- Validate the entanglement/separability:  a(x) ∈ {0, 1}.
- Construct a confidence region (credible region):  [a_L(x), a_R(x)].
35. Remarks for Non-statisticians
Remarks
1. If the quantum state is completely known, then the distribution of the measurement outcome is also known and the best action can be chosen.
2. The action has to be made from finite data.
3. Precise estimation of the parameter is only a typical example.
[Flow] ρ(θ) → POVM M(dx) → x ~ μ_θ(dx) → a(x)
36. Loss Functions
The performance of decision functions is compared by adopting a loss function (smaller is better).
Ex: Parameter Estimation
Quantum statistical model  {ρ(θ) : θ ∈ Θ ⊂ R^m}
Action space = Θ
Loss function (squared error)  w(θ, a) = |a - θ|²
Later we see the formal definition of the loss function.
37. Measurements Over Decisions
Basic Idea
From the beginning, we only consider POVMs over the action space.
Put together:
[Flow] ρ(θ) → POVM M(dx) → x ~ μ_θ(dx) → a(x)
becomes
[Flow] ρ(θ) → POVM N(da) → a ~ ν_θ(da)
38. Quantum Statistical Decision Problems
Formal Definitions
Θ: parameter space (locally compact space)
ρ(θ): quantum statistical model
U: decision space* (locally compact space)
w: Θ × U → R ∪ {+∞}: loss function (lower semicontinuous**; bounded below)
The triplet (ρ(θ), U, w) is called a quantum statistical decision problem.
cf) (classical) statistical decision problem (p, U, w):
p_θ(dx): statistical model;  U: decision space;  w: Θ × U → R ∪ {+∞}: loss function.
*We follow Holevo's terminology instead of saying "action space".
**We impose a slightly stronger condition on the loss than Le Cam and Holevo.
40. Comparison of POVMs
The loss (e.g., squared error) depends on both the unknown parameter θ and our decision u, which is a random variable (u ~ μ_θ(du) = Tr ρ(θ) M(du)).
In order to compare two POVMs, we focus on the average of the loss w.r.t. the distribution of u.
[Flow] ρ(θ) → POVM M(du) → u ~ μ_θ(du) → E[w(θ, u)]
[Flow] ρ(θ) → POVM N(du') → u' ~ ν_θ(du') → E'[w(θ, u')]
The two POVMs are compared at the same θ.
41. Risk Functions
Definition
The risk function of M ∈ Po(U):
R_M(θ) := ∫_U w(θ, u) Tr ρ(θ) M(du) = E_θ[w(θ, u)],  where u ~ Tr ρ(θ) M(du).
Remarks for Non-statisticians
1. Smaller risk is better.
2. Generally there exists no POVM achieving the uniformly smallest risk among POVMs.
Since R_M(θ) depends on the unknown parameter, we need additional optimality criteria.
42. Optimality Criteria (1/2)
Definition
A POVM M* is said to be minimax if it minimizes the supremum of the risk function, i.e.,
sup_θ R_{M*}(θ) = inf_{M∈Po(U)} sup_θ R_M(θ).
Historical Remark
Bogomolov showed the existence of a minimax POVM (Bogomolov, 1981) in a more general framework.
(The description of quantum states and measurements there is different from the recent one.)
43. Optimality Criteria (2/2)
P(Θ): all probability distributions on Θ (defined on the Borel sets)
In Bayesian statistics, π ∈ P(Θ) is called a prior distribution.
Definition
The average risk of M ∈ Po(U) w.r.t. π:
r_{π,M} := ∫ R_M(θ) π(dθ).
A POVM M_π is said to be Bayes w.r.t. π if it minimizes the average risk, i.e.,
r_{π,M_π} = inf_{M∈Po(U)} r_{π,M}.
Historical Remark
Holevo showed the existence of a Bayes POVM (Holevo, 1976; see also Ozawa, 1980) in a more general framework.
44. Ex. of Loss Functions and Risk Functions
Parameter Estimation
{ρ(θ) : θ ∈ Θ ⊂ R^m},  U = Θ,  w(θ, u) = |u - θ|²
R_M(θ) = ∫_U |u - θ|² Tr ρ(θ) M(du) = E_θ[|u - θ|²]
Construction of a Predictive Density Operator
{ρ(θ)^{⊗n} : θ ∈ Θ ⊂ R^p},  U = S(H^{⊗m}),  w(θ, u) = D(ρ(θ)^{⊗m} || u)
R_M(θ) = ∫_U D(ρ(θ)^{⊗m} || u) Tr ρ(θ)^{⊗n} M(du)
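As an added sketch of the first formula with a finite decision space (assuming numpy; the one-parameter qubit model, the POVM, and the decision values are hypothetical), the risk of the naive estimator below works out to R_M(θ) = Σ_j |u_j - θ|² Tr ρ(θ) M_j = θ(1 - θ).

```python
import numpy as np

def rho(theta):
    """Hypothetical one-parameter qubit model: rho(theta) = diag(theta, 1 - theta)."""
    return np.diag([theta, 1.0 - theta])

# POVM over the decision space U = {1.0, 0.0}: measure the computational basis
# and report the corresponding vertex of [0, 1] as the estimate of theta.
decisions = [1.0, 0.0]
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def risk(theta):
    """R_M(theta) = sum_j |u_j - theta|^2 Tr(rho(theta) M_j)  (squared-error loss)."""
    return sum((u - theta) ** 2 * np.trace(rho(theta) @ Mj).real
               for u, Mj in zip(decisions, M))

print([round(risk(t), 4) for t in (0.0, 0.25, 0.5, 1.0)])   # equals theta * (1 - theta)
```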
46. Main Result (Quantum Minimax Theorem)
Theorem (*)
Let Θ and U be compact metric spaces.
Then the following equality holds for every quantum statistical decision problem:
inf_{M∈Po(U)} sup_{θ∈Θ} R_M(θ) = sup_{π∈P(Θ)} inf_{M∈Po(U)} ∫ R_M(θ) π(dθ),
where the risk function is given by  R_M(θ) := ∫_U w(θ, u) Tr ρ(θ) M(du).
Corollary
For every closed convex subset Q ⊂ Po(U), the above assertion holds (with Po(U) replaced by Q).
*See quant-ph 1410.3639 for the proof.
47. Key Lemmas For Theorem
Compactness of the POVMs (Holevo, 1976)
If the decision space U is compact, then Po(U) is also compact.
Equicontinuity of risk functions (lemma by FT)
Let w: Θ × U → R be the loss function. If w is bounded and continuous, then
F := {R_M : M ∈ Po(U)} ⊂ C(Θ)
is (uniformly) equicontinuous. The equicontinuity implies the compactness of F ⊂ C(Θ) under the uniform convergence topology.
We show the main theorem by using Le Cam's result together with both lemmas.
However, it is not a consequence of these previous results.
49. Previous Works (LFPs)
Wald (1950): Statistical decision theory
Jeffreys (1961): Jeffreys prior
Bernardo (1979, 2005): Reference prior; reference analysis
Komaki (2011): Latent information prior
Tanaka (2012): MDP prior (reference prior for the pure-states model)
Objective priors in Bayesian analysis are indeed least favorable priors.
50. Main Result (Existence Theorem of LFP)
Definition
If a prior π_LF achieves the supremum, i.e., the following holds:
inf_{M∈Po(U)} ∫ R_M(θ) π_LF(dθ) = sup_{π∈P(Θ)} inf_{M∈Po(U)} ∫ R_M(θ) π(dθ),
where  R_M(θ) := ∫_U w(θ, u) Tr ρ(θ) M(du),
then the prior π_LF is called a least favorable prior (LFP).
Theorem
Let Θ and U be compact metric spaces and w a continuous loss function.
Then, for every decision problem, there exists an LFP.
51. Pathological Example
Remark
Even on a compact space, a bounded lower semicontinuous (and not continuous) loss function does not necessarily admit an LFP.
Example:  Θ = [0, 1].
[Figure: graph of the risk function R_M(θ) on [0, 1], with a jump so that its supremum is not attained]
1 = sup_{θ∈[0,1]} R_M(θ) = sup_π ∫_{[0,1]} R_M(θ) π(dθ),
but for every prior π,  ∫_{[0,1]} R_M(θ) π(dθ) < 1.
52. Minimax POVM is constructed using LFP
Corollary
Let Θ and U be compact metric spaces.
If an LFP exists for a quantum statistical decision problem, then every minimax POVM is a Bayes POVM with respect to the LFP.
In particular, if the Bayes POVM w.r.t. the LFP is unique, then it is minimax.
Corollary 2
For every closed convex subset Q ⊂ Po(U), the above assertion holds.
Many of the theoretical results are derived from the quantum minimax theorem.
54. Discussion
1. We show the quantum minimax theorem, which gives a theoretical basis for quantum statistical decision theory, in particular for objective Bayesian analysis.
2. For every closed convex subset of POVMs, all of the assertions still hold.
Our result also has practical meaning for experimenters.
Future Works
- Show other decision-theoretic results and previously known results by using the quantum minimax theorem
- Propose a practical algorithm for finding a least favorable prior
- Define a geometrical structure on quantum statistical decision problems
If you are interested in details, please see quant-ph 1410.3639; citations are also welcome!