This presentation was given at the RIMS conference,
Kyoto, JAPAN
ftanaka@sigmath.es.osaka-u.ac.jp
Quantum Minimax Theorem
in Statistical Decision Theory
November 10, 2014
Public Version
Fuyuhiko TANAKA (田中冬彦)
Graduate School of Engineering Science, Osaka University
If you are interested in the details, please see quant-ph 1410.3639; citations are also welcome!
Prologue:
What is Bayesian Statistics?
(Non-technical Short Version)
*Intended for non-statisticians, esp. researchers in quantum information.
Mission of Statistics
To give a desirable and plausible method according to the statistical model,
the purpose of inference (point estimation, prediction, hypothesis testing, model
selection, etc.), and the loss function.
*Precise evaluation of the estimation error is secondary.
Difference from Pure Math
Statistics is required to meet various needs in the world
→ The significant problems in statistics have changed with the times
(This trivial fact does not seem to be widely known outside of statisticians.)
What's Bayesian Statistics? (1/2)
Bayesian Statistics
= Statistics admitting the usage of prior distributions
prior distribution = prob. dist. on the unknown parameter
(We do not care about the interpretation of this probability.)
*The interpretation heavily depends on the individual researcher.
What's Bayesian Statistics? (2/2)
Relation to frequentists' statistics (traditional statistics)
1. Bayesian statistics has enlarged the framework and satisfies
various needs. (It is not an alternative to frequentists' statistics!)
2. Sticking to frequentists' methods like MLEs amounts to a preference for a
specific prior distribution (narrow-minded!)
(Indeed, MLEs often fail; their independence of the loss is unsuitable.)
3. Frequentists' results give good starting points for Bayesian analysis.
For details, see, e.g., C. Robert, The Bayesian Choice, Chap. 11.
[Diagram: Bayesian Statistics contains Frequentists' (traditional) statistics.]
Misunderstanding
"Since priors are not objective, Bayesian methods should not be used for
scientific results."
Objective priors (noninformative priors) have been proposed for exactly such usage.
Bayesian statistics does not deny the frequentists' way, and statistical inference based on an objective prior
often agrees with the frequentists' one. (ex: MLE = MAP estimate under the uniform prior)
It remains unsolved which noninformative prior should be used.
The essence of this problem is how to choose a good statistical
method with a small sample. It is NOT inherent to Bayesian statistics,
but has been a central issue in traditional statistics.
Frequentists and mathematicians avoid this and only consider mathematically tractable
models and/or asymptotic theory (i.e., large-sample approximation!)
In Bayesian statistics, this problem is transformed into the choice of a noninformative prior.
Further Reference
Preface in RIMS Kokyuroku 1834 (Theory and applications of statistical
inference in quantum theory)
POINT
- Points out some misunderstandings of Statistics
for non-experts (esp. physicists)
- Defines the word "quantum statistical inference"
as an academic term
Sorry, but it is written in Japanese.
CONTENTS 
0. Notation 
1. Introduction 
2. Quantum Statistical Models and Measurements 
3. Quantum Statistical Decision Problems 
4. Risk Functions and Optimality Criteria 
5. Main Result 
6. Minimax POVMs and Least Favorable Priors 
7. Summary
0. Notation
Description of Probability in QM
Quantum Mechanics (QM): described by operators on a Hilbert space
Probability: described by a density operator and a measurement
Density operator $\rho$ → Measurement (POVM) $M(\mathrm{d}x)$ → Probability distribution
$$\mu_\rho(\mathrm{d}x) = \mathrm{Tr}\, \rho M(\mathrm{d}x)$$
Measurement outcome $x \mapsto a(x)$: statistical processing of data
Density Operators
Density operators (density matrices, states)
$\mathcal{S}(\mathcal{H}) := \{ \rho \in \mathcal{L}_1(\mathcal{H}) \mid \mathrm{Tr}\,\rho = 1, \ \rho \geq 0 \}$: all d.o.s on a Hilbert space $\mathcal{H}$
When $\dim \mathcal{H} = n < \infty$,
$$\rho = U \begin{pmatrix} p_1 & & 0 \\ & \ddots & \\ 0 & & p_n \end{pmatrix} U^*, \qquad p_j \geq 0, \quad \mathrm{Tr}\,\rho = \sum_j p_j = 1.$$
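As a concrete illustration (my own sketch, not part of the original slides), here is a minimal NumPy snippet that builds a density matrix in the diagonalized form above and checks the defining conditions; the function names are hypothetical:

```python
import numpy as np

def random_density_matrix(n, seed=0):
    """Build rho = U diag(p) U* from a random unitary U and probabilities p."""
    rng = np.random.default_rng(seed)
    # Random unitary via QR decomposition of a complex Gaussian matrix
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    u, _ = np.linalg.qr(z)
    p = rng.dirichlet(np.ones(n))          # p_j >= 0, sum_j p_j = 1
    return u @ np.diag(p) @ u.conj().T

def is_density_matrix(rho, tol=1e-10):
    """Check that rho is Hermitian, has unit trace, and is positive semidefinite."""
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    unit_trace = abs(np.trace(rho) - 1) < tol
    psd = np.all(np.linalg.eigvalsh(rho) >= -tol)
    return hermitian and unit_trace and psd

rho = random_density_matrix(3)
assert is_density_matrix(rho)
```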
Measurements in QM (1/2)
Mathematical model of measurements
= Positive Operator Valued Measure (POVM)
POVM (def. for finite sets)
Sample space $\Omega = \{x_1, x_2, \ldots, x_k\}$ = possible values of a measurement
POVM $\{M_j\}_{j=1,\ldots,k}$ = a family of linear ops. satisfying
$$M_j := M(\{x_j\}) \geq O, \qquad \sum_j M_j = I$$
($M_j \geq O$: positive operator, i.e., positive semidefinite matrix)
Measurements in QM (2/2)
How to construct a probability space (Axiom of QM)
d.o. $\rho \in \mathcal{S}(\mathcal{H})$
sample space $\Omega = \{x_1, x_2, \ldots, x_k\}$
POVM (Measurement) $\{M_j\}_{j=1,\ldots,k}$
Then the prob. of obtaining the measurement outcome $x_j$ is
$$p_j := \mathrm{Tr}\, \rho M_j.$$
* The above prob. satisfies the usual conditions (positive and summing to one)
due to the defs of d.o.s and POVMs:
$$M_j \geq O, \quad \sum_j M_j = I \qquad \text{and} \qquad \rho \geq 0, \quad \mathrm{Tr}\, \rho = 1.$$
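A minimal numerical sketch (my addition, assuming the finite-dimensional setting above) that checks a finite POVM and computes the outcome distribution $p_j = \mathrm{Tr}\,\rho M_j$:

```python
import numpy as np

def is_povm(ms, tol=1e-10):
    """Check M_j >= O for all j and sum_j M_j = I."""
    n = ms[0].shape[0]
    psd = all(np.all(np.linalg.eigvalsh(m) >= -tol) for m in ms)
    complete = np.allclose(sum(ms), np.eye(n), atol=tol)
    return psd and complete

def born_probabilities(rho, ms):
    """Outcome distribution p_j = Tr(rho M_j) for a density matrix and a POVM."""
    return np.real([np.trace(rho @ m) for m in ms])

# Two-outcome example: projective measurement in the computational basis
rho = np.array([[0.75, 0.25], [0.25, 0.25]])
ms = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
assert is_povm(ms)
print(born_probabilities(rho, ms))   # [0.75, 0.25]
```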
1. Introduction
Quantum Bayesian Hypothesis Testing (1/2)
1. Alice chooses one alphabet (denoted by an integer $i \in \{1, 2, \ldots, k\}$) and sends the
corresponding quantum state (described by a density operator):
$$\{\rho_1, \rho_2, \ldots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$$
2. Bob receives the quantum state, performs a measurement, and
guesses the alphabet Alice really sent to him.
The whole process is described by a POVM $\{M_j\}_{j=1,2,\ldots,k}$.
3. The proportion of alphabets is given by a distribution
$$\pi_1, \ldots, \pi_k \geq 0, \qquad \pi_1 + \cdots + \pi_k = 1$$
(called a prior distribution).
[Figure: a sequence of chosen alphabets and the corresponding sequence of states.]
Quantum Bayesian Hypothesis Testing (2/2)
* The probability that Bob guesses $j$ when Alice sends $i$:
$$p(B = j \mid A = i) = \mathrm{Tr}\, \rho_i M_j$$
* Bob's loss (estimation error) when he guesses $j$ while Alice sends $i$: $w(i, j)$
* Bob's risk for the $i$-th alphabet:
$$R_M(i) = \sum_{j=1}^{k} w(i, j)\, p(B = j \mid A = i)$$
* Bob's average risk (his task is to minimize it by using a good POVM):
$$\sum_{i=1}^{k} \pi_i R_M(i)$$
Summary of Q-BHT
Alice chooses one alphabet $i \in \{1, 2, \ldots, k\}$ and sends one quantum state from
$\{\rho_1, \rho_2, \ldots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
Proportion of each alphabet (prior distribution):
$$\pi_1 + \cdots + \pi_k = 1, \qquad \pi_i \geq 0$$
Bob guesses the alphabet by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\ldots,k}$.
Average risk:
$$r_{\pi, M} := \sum_{i=1}^{k} \pi_i R_M(i) = \sum_{i=1}^{k} \sum_{j=1}^{k} \pi_i\, w(i, j)\, \mathrm{Tr}\, \rho_i M_j$$
Bayes POVM w.r.t. $\pi$ = POVM minimizing the average risk
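A short numerical sketch (my own, not from the slides) of the average risk $r_{\pi,M}$ under the 0-1 loss; `risks`, `avg_risk`, and the example data are hypothetical:

```python
import numpy as np

def risks(states, povm, loss):
    """R_M(i) = sum_j w(i,j) Tr(rho_i M_j) for each alphabet i."""
    k = len(states)
    p = np.real([[np.trace(states[i] @ povm[j]) for j in range(k)]
                 for i in range(k)])            # p[i, j] = Tr rho_i M_j
    return (loss * p).sum(axis=1)

def avg_risk(prior, states, povm, loss):
    """Average (Bayes) risk r_{pi,M} = sum_i pi_i R_M(i)."""
    return float(prior @ risks(states, povm, loss))

# 0-1 loss: w(i,j) = 1 - delta_ij
k = 2
loss = 1.0 - np.eye(k)
states = [np.diag([1.0, 0.0]), np.diag([0.5, 0.5])]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
print(avg_risk(np.array([0.5, 0.5]), states, povm, loss))   # 0.25
```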
Minimax POVM
1. Alice chooses one alphabet $i \in \{1, 2, \ldots, k\}$ and sends one quantum state from
$\{\rho_1, \rho_2, \ldots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
2. Bob guesses the alphabet by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\ldots,k}$.
Bob has no prior information, so he minimizes the worst-case risk
$$r^*_M := \sup_i R_M(i).$$
Minimax POVM = POVM minimizing the worst-case risk
Quantum Minimax Theorem (Simple Ver.)
1. Alice chooses one alphabet $i \in \{1, 2, \ldots, k\}$ and sends one quantum state from
$\{\rho_1, \rho_2, \ldots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
2. Bob guesses the alphabet by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\ldots,k}$.
Theorem (Hirota and Ikehara, 1982)
$$\inf_M \sup_i R_M(i) = \sup_\pi \inf_M \sum_{i=1}^{k} \pi_i R_M(i)$$
The minimax POVM agrees with the Bayes POVM w.r.t. the worst-case prior.
*The worst-case prior (for Bob) is called a least favorable prior.
Example (1/2)
Quantum states:
$$\rho_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
\rho_2 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
\rho_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
0-1 loss: $w(i, j) = 1 - \delta_{ij}$
Minimax POVM:
$$M_1 = \frac{1}{2}\begin{pmatrix} 1 + 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1 - 1/\sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
M_2 = \frac{1}{2}\begin{pmatrix} 1 - 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1 + 1/\sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
M_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
*This is a counterexample to Theorem 2 in Hirota and Ikehara (1982), whose proof seems not
mathematically rigorous.
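A numerical check (my own sketch; the matrices are the reconstruction above, so treat it as illustrative) that the stated POVM is valid and equalizes the risks on $\rho_1$ and $\rho_2$:

```python
import numpy as np

s = 1 / np.sqrt(2)
rho = [np.diag([1.0, 0, 0]),
       np.array([[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 0]]),
       np.diag([0, 0, 1.0])]
M = [0.5 * np.array([[1 + s, -s, 0], [-s, 1 - s, 0], [0, 0, 0]]),
     0.5 * np.array([[1 - s, s, 0], [s, 1 + s, 0], [0, 0, 0]]),
     np.diag([0, 0, 1.0])]

# POVM conditions: each M_j >= O and sum_j M_j = I
assert all(np.linalg.eigvalsh(m).min() > -1e-12 for m in M)
assert np.allclose(sum(M), np.eye(3))

# Risks under 0-1 loss: R_M(i) = 1 - Tr(rho_i M_i)
R = [1 - np.trace(rho[i] @ M[i]).real for i in range(3)]
print(R)   # approx [0.1464, 0.1464, 0.0]; worst-case risk = (1 - 1/sqrt(2))/2
```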
Example (2/2)
Quantum states: the same $\rho_1, \rho_2, \rho_3$ as in Example (1/2)
0-1 loss: $w(i, j) = 1 - \delta_{ij}$
LFP:
$$\pi_{LF}(1) = \pi_{LF}(2) = 1/2, \qquad \pi_{LF}(3) = 0$$
The minimax POVM is constructed as a Bayes POVM w.r.t. the LFP.
Important Implication
When the state is completely unknown, it is not necessarily wise to find an optimal POVM
with the uniform prior.
(Statisticians have known this fact for decades...)
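To illustrate the least favorable prior numerically, here is a sketch (my own; not from the slides) that computes the Bayes risk $\inf_M r_{\pi,M}$ for priors of the form $\pi = (p, 1-p, 0)$ via the Helstrom formula for two-state discrimination, $\inf_M r = \tfrac{1}{2}(1 - \|p\rho_1 - (1-p)\rho_2\|_1)$, and confirms the maximum is at $p = 1/2$:

```python
import numpy as np

rho1 = np.diag([1.0, 0.0])                      # restrict to span{e1, e2}
rho2 = np.array([[0.5, 0.5], [0.5, 0.5]])

def bayes_risk(p):
    """Helstrom: inf_M Bayes risk for prior (p, 1-p) on two states
    equals (1 - ||p*rho1 - (1-p)*rho2||_1) / 2."""
    gamma = p * rho1 - (1 - p) * rho2
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()
    return 0.5 * (1 - trace_norm)

ps = np.linspace(0, 1, 1001)
risks = [bayes_risk(p) for p in ps]
i = int(np.argmax(risks))
print(ps[i], risks[i])   # approx 0.5, (1 - 1/sqrt(2))/2 = 0.1464...
```

The maximal Bayes risk equals the worst-case risk of the minimax POVM in Example (1/2), in accordance with the minimax theorem.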
Rewritten in Technical Terms
1. Nature
Parameter space $\Theta = \{1, 2, \ldots, k\}$; quantum statistical model $\{\rho_\theta\} \subset \mathcal{S}(\mathcal{H})$
2. Experimenter and Statistician
Decision space $U = \{1, 2, \ldots, k\}$; POVM $\{M_u\}_{u \in U}$ on $U$
Loss function $w(\theta, u)$
Risk function
$$R_M(\theta) = \sum_{u \in U} w(\theta, u)\, \mathrm{Tr}\, \rho_\theta M_u$$
Theorem
$$\inf_M \sup_\theta R_M(\theta) = \sup_\pi \inf_M \sum_\theta \pi_\theta R_M(\theta)$$
Main Result (Brief Summary)
Quantum Minimax Theorem
$$\inf_{M \in \mathcal{P}o(U)} \sup_{\theta \in \Theta} R_M(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{M \in \mathcal{P}o(U)} \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta)$$
where
$$R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u).$$
cf) classical minimax theorem:
$$\inf_{d \in D} \sup_{\theta \in \Theta} R_d(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{d \in D} \int_\Theta R_d(\theta)\, \pi(\mathrm{d}\theta)$$
Conditions, assumptions, and the essence of the proof are explained below.
Previous Works
Wald (1950) Statistical Decision Theory
Holevo (1973) Quantum Statistical Decision Theory
Recent results and applications (many!):
Kumagai and Hayashi (2013), Guta and Jencova (2007), Hayashi (1998)
Le Cam (1964) (Classical) minimax theorem in statistical decision theory;
the first version was given by Wald (1950).
We show: the Quantum Minimax Theorem in quantum statistical decision theory
2. Quantum Statistical Models 
and Measurements
Formal Definition of Statistical Models
(Naive) quantum statistical model = a parametric family of density operators
Ex. (qubit model)
$$\rho(\xi) = \frac{1}{2}\begin{pmatrix} 1 + \xi^3 & \xi^1 - i\xi^2 \\ \xi^1 + i\xi^2 & 1 - \xi^3 \end{pmatrix}, \qquad \xi^1, \xi^2, \xi^3 \in \mathbf{R}, \quad (\xi^1)^2 + (\xi^2)^2 + (\xi^3)^2 \leq 1$$
Basic Idea (Holevo, 1976)
A quantum statistical model is defined by a map
$$\rho : \Theta \to \mathcal{L}_1(\mathcal{H}), \qquad \theta \mapsto \rho(\theta).$$
Next, we see the required conditions for this map.
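A small sketch (my addition) of the qubit model above as a map $\xi \mapsto \rho(\xi)$, with a check that valid Bloch vectors give valid density matrices:

```python
import numpy as np

def qubit_state(x1, x2, x3):
    """Bloch parametrization: rho(xi) = (I + xi . sigma) / 2."""
    assert x1**2 + x2**2 + x3**2 <= 1 + 1e-12, "Bloch vector must lie in the unit ball"
    return 0.5 * np.array([[1 + x3, x1 - 1j * x2],
                           [x1 + 1j * x2, 1 - x3]])

rho = qubit_state(0.3, 0.4, 0.5)
print(np.trace(rho).real, np.linalg.eigvalsh(rho))   # trace 1, eigenvalues >= 0
```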
Regular Operator-Valued Function (1/2)
Definition
$\Theta$: locally compact metric space; $K \subseteq \Theta$: compact set
$$K_\delta := \{(\theta, \theta') \in K \times K : d(\theta, \theta') \leq \delta\}, \qquad \delta > 0$$
$\mathcal{L}_1(\mathcal{H})$: trace-class operators on a Hilbert space $\mathcal{H}$
For a map $T : \Theta \to \mathcal{L}_1(\mathcal{H})$, we define
$$\omega_K(T; \delta) := \inf\left\{ \epsilon : \| T(\theta) - T(\theta') \|_1 \leq \epsilon, \ (\theta, \theta') \in K_\delta \right\}$$
Definition
An operator-valued function $T$ is regular if, for every compact set $K$,
$$\lim_{\delta \to 0} \omega_K(T; \delta) = 0.$$
Regular Operator-Valued Function (2/2)
Remark 1
$\lim_{\delta \to 0} \omega_K(T; \delta) = 0$ means that $T$ is uniformly continuous w.r.t. the trace norm on every compact set:
$$\sup_{(\theta, \theta') \in K_\delta} \| T(\theta) - T(\theta') \|_1 \to 0 \quad \text{as} \quad \delta \to 0.$$
The converse does not hold if $\dim \mathcal{H} = \infty$.
Remark 2
The regularity assures that the operator-valued integral
$$\int_\Theta f(\theta) T(\theta)\, \pi(\mathrm{d}\theta)$$
is well-defined for every $f \in C_0(\Theta)$ and $\pi \in \mathcal{P}(\Theta)$.
Quantum Statistical Models
Definition
$$\mathcal{S}(\mathcal{H}) := \{X \in \mathcal{L}(\mathcal{H}) : \mathrm{Tr}\, X = 1, \ X \geq 0\}$$
$\rho : \Theta \to \mathcal{S}(\mathcal{H})$ is called a quantum statistical model
if it satisfies the conditions below.
Conditions
1. Identifiability (one-to-one map): $\rho(\theta_1) \neq \rho(\theta_2)$ for $\theta_1 \neq \theta_2$
2. Regularity: $\lim_{\delta \to 0} \omega_K(\rho; \delta) = 0$ for every compact $K \subseteq \Theta$
↑ Necessary for our arguments
Measurements (POVMs)
$U$: decision space (locally compact space)
$\mathcal{A}(U)$: Borel sets (the smallest sigma algebra containing all open sets)
Definition
A positive-operator valued function $M$ on $\mathcal{A}(U)$ is called a POVM if it satisfies
$$M(B) \geq O, \quad B \in \mathcal{A}(U),$$
$$M\left( \bigcup_{j=1}^{\infty} B_j \right) = \sum_{j=1}^{\infty} M(B_j), \quad B_i \cap B_j = \emptyset \ (i \neq j),$$
$$M(U) = I.$$
$\mathcal{P}o(U)$: all POVMs over $U$
Born Rule
Axiom of Quantum Mechanics
For the quantum system described by $\rho$
and a measurement described by a POVM $M(\mathrm{d}x)$,
the measurement outcome is distributed according to
$$x \sim \mu_\rho(\mathrm{d}x) = \mathrm{Tr}\, \rho M(\mathrm{d}x).$$
3. Quantum Statistical 
Decision Problems
Basic Setting
Situation
For the quantum system specified by the unknown parameter $\theta$,
experimenters extract information for some purposes.
$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → outcome $x \sim \mu_\rho(\mathrm{d}x)$ → action $a(x)$
Typical sequence of tasks:
1. To choose a measurement (POVM)
2. To perform the measurement over the quantum system
3. To take an action a(x) among choices, based on the measurement
outcome x
($a$ is formally called a decision function)
Decision Functions
Example
- Estimate the unknown parameter:
$$a(x) = \frac{x_1 + \cdots + x_n}{n}$$
- Estimate the unknown d.o. rather than the parameter:
$$a(x) = \begin{pmatrix} a_1(x) & a_2(x) & 0 \\ a_2(x)^* & a_3(x) & 0 \\ 0 & 0 & 1 - a_1(x) - a_3(x) \end{pmatrix}$$
- Validate entanglement/separability: $a(x) \in \{0, 1\}$
- Construct a confidence region (credible region): $[a_L(x), a_R(x)]$
Remarks for Non-statisticians
1. If the quantum state is completely known, then the distribution of the
measurement outcome is also known and the best action can be chosen directly.
2. In practice, the action has to be made from finite data.
3. Precise estimation of the parameter is only a typical example.
$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → outcome $x \sim \mu_\rho(\mathrm{d}x)$ → action $a(x)$
Loss Functions
The performance of decision functions is compared by adopting
a loss function (smaller is better).
Ex: Parameter Estimation
Quantum statistical model: $\{\rho(\theta) : \theta \in \Theta \subseteq \mathbf{R}^m\}$
Action space: $U = \Theta$
Loss function (squared error):
$$w(\theta, a) = \| \theta - a \|^2$$
Later we will see the formal definition of the loss function.
Measurements Over Decisions
Basic Idea
From the beginning, we only consider POVMs over the action space.
Put together, the two-step scheme
$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → outcome $x$ → action $a(x)$
is replaced by a single POVM over actions:
$\rho(\theta)$ → POVM $N(\mathrm{d}a)$ → action $a \sim \mathrm{Tr}\, \rho(\theta) N(\mathrm{d}a)$
Quantum Statistical Decision Problems
Formal Definitions
$\Theta$: parameter space (locally compact space)
$\rho(\theta)$: quantum statistical model
$U$: decision space* (locally compact space)
$w : \Theta \times U \to \mathbf{R} \cup \{+\infty\}$: loss function (lower semicontinuous**; bounded below)
The triplet $(\rho(\theta), U, w)$ is called a quantum statistical decision problem.
cf) statistical decision problem $(p_\theta, U, w)$:
$p_\theta(\mathrm{d}x)$: statistical model
$U$: decision space
$w : \Theta \times U \to \mathbf{R} \cup \{+\infty\}$: loss function
*We follow Holevo's terminology instead of saying "action space".
**We impose a slightly stronger condition on the loss than Le Cam and Holevo.
4. Risk Functions and 
Optimality Criteria
Comparison of POVMs
The loss (e.g., squared error) depends on both the unknown parameter $\theta$ and
our decision $u$, which is a random variable ($u \sim \mu_\theta(\mathrm{d}u) = \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u)$).
In order to compare two POVMs, we focus on the average of the loss
w.r.t. the distribution of $u$:
POVM $M(\mathrm{d}u)$: $\mathrm{E}[w(\theta, u)]$, $u \sim \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u)$
POVM $N(\mathrm{d}u')$: $\mathrm{E}'[w(\theta, u')]$, $u' \sim \mathrm{Tr}\, \rho(\theta) N(\mathrm{d}u')$
Both are compared at the same $\theta$.
Risk Functions
Definition
The risk function of $M \in \mathcal{P}o(U)$:
$$R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u) = \mathrm{E}\,[w(\theta, u)], \qquad u \sim \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u)$$
Remarks for Non-statisticians
1. Smaller risk is better.
2. Generally there exists no POVM achieving the uniformly smallest
risk among POVMs.
Since $R_M(\theta)$ depends on the unknown parameter, we need
additional optimality criteria.
Optimality Criteria (1/2)
Definition
A POVM $M^*$ is said to be minimax if it minimizes the supremum of the risk function, i.e.,
$$\sup_\theta R_{M^*}(\theta) = \inf_{M \in \mathcal{P}o(U)} \sup_\theta R_M(\theta)$$
Historical Remark
Bogomolov showed the existence of a minimax POVM (Bogomolov,
1981) in a more general framework.
(His description of quantum states and measurements is different from the recent one.)
Optimality Criteria (2/2)
$\mathcal{P}(\Theta)$: all probability distributions on $\Theta$ (defined on the Borel sets)
In Bayesian statistics, $\pi \in \mathcal{P}(\Theta)$ is called a prior distribution.
Definition
The average risk of $M \in \mathcal{P}o(U)$ w.r.t. $\pi$:
$$r_{\pi, M} := \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta)$$
A POVM $M_\pi$ is said to be Bayes if it minimizes the average risk, i.e.,
$$r_{\pi, M_\pi} = \inf_{M \in \mathcal{P}o(U)} r_{\pi, M}$$
Historical Remark
Holevo showed the existence of a Bayes POVM (Holevo, 1976; see also
Ozawa, 1980) in a more general framework.
Ex. of Loss Functions and Risk Functions
Parameter Estimation
$\{\rho(\theta) : \theta \in \Theta \subseteq \mathbf{R}^m\}$, $U = \Theta$, $w(\theta, u) = \|\theta - u\|^2$
$$R_M(\theta) = \int_U \|\theta - u\|^2\, \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u) = \mathrm{E}\, \|\theta - u\|^2$$
Construction of a Predictive Density Operator
$\{\rho(\theta)^{\otimes n} : \theta \in \Theta \subseteq \mathbf{R}^p\}$, $U = \mathcal{S}(\mathcal{H}^{\otimes m})$, $w(\theta, u) = D(\rho(\theta)^{\otimes m} \| u)$
$$R_M(\theta) = \int_U D(\rho(\theta)^{\otimes m} \| u)\, \mathrm{Tr}\, \rho(\theta)^{\otimes n} M(\mathrm{d}u)$$
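A rough Monte Carlo sketch (my own illustration) of the squared-error risk $R_M(\theta) = \mathrm{E}\|\theta - u\|^2$ for a simple finite-outcome measurement; the model and the estimator are hypothetical:

```python
import numpy as np

def risk_squared_error(theta, n_copies, n_mc=200_000, seed=1):
    """Monte Carlo estimate of R(theta) = E|theta - u|^2, where u is the
    fraction of '0' outcomes when measuring rho(theta) = diag(theta, 1-theta)
    in the computational basis on n_copies independent copies."""
    rng = np.random.default_rng(seed)
    counts = rng.binomial(n_copies, theta, size=n_mc)   # Born rule: P(0) = theta
    u = counts / n_copies                               # decision u = a(x): sample mean
    return np.mean((theta - u) ** 2)

# Compare with the exact risk theta * (1 - theta) / n
theta, n = 0.3, 50
print(risk_squared_error(theta, n), theta * (1 - theta) / n)
```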
5. Main Result
Main Result (Quantum Minimax Theorem)
Theorem (*)
Let $\Theta$ and $U$ be compact metric spaces.
Then the following equality holds for every quantum statistical decision problem:
$$\inf_{M \in \mathcal{P}o(U)} \sup_{\theta \in \Theta} R_M(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{M \in \mathcal{P}o(U)} \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta),$$
where the risk function is given by
$$R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u).$$
Corollary
For every closed convex subset $Q \subseteq \mathcal{P}o(U)$, the above assertion holds with $\mathcal{P}o(U)$ replaced by $Q$.
*See quant-ph 1410.3639 for the proof.
Key Lemmas For Theorem
Compactness of the set of POVMs (Holevo, 1976)
If the decision space $U$ is compact, then $\mathcal{P}o(U)$ is also compact.
Equicontinuity of risk functions (Lemma by FT)
Let $w : \Theta \times U \to \mathbf{R}$ be a loss function.
If $w$ is bounded and continuous, then
$$\mathcal{F} := \{ R_M : M \in \mathcal{P}o(U) \} \subseteq C(\Theta)$$
is (uniformly) equicontinuous. The equicontinuity implies
the compactness of $\mathcal{F} \subseteq C(\Theta)$
under the uniform convergence topology.
We show the main theorem by using Le Cam's result together with both lemmas;
however, it is not a consequence of their previous results.
6. Minimax POVMs 
and Least Favorable Priors
Previous Works (LFPs)
Wald (1950) Statistical Decision Theory
Jeffreys (1961) Jeffreys prior
Bernardo (1979, 2005) Reference prior; reference analysis
Komaki (2011) Latent information prior
Tanaka (2012) MDP prior (reference prior for pure-states models)
Objective priors in Bayesian analysis are indeed least
favorable priors.
Main Result (Existence Theorem of LFP)
Definition
If a prior $\pi_{LF}$ achieves the supremum, i.e., the following holds:
$$\inf_{M \in \mathcal{P}o(U)} \int_\Theta R_M(\theta)\, \pi_{LF}(\mathrm{d}\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{M \in \mathcal{P}o(U)} \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta),$$
then the prior $\pi_{LF}$ is called a least favorable prior (LFP),
where
$$R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\, \rho(\theta) M(\mathrm{d}u).$$
Theorem
Let $\Theta$ and $U$ be compact metric spaces and $w$ a continuous loss function.
Then, for every decision problem, there exists an LFP.
Pathological Example
Remark
Even on a compact space, a bounded lower semicontinuous (and not
continuous) loss function does not necessarily admit an LFP.
Example: $\Theta = [0, 1]$
[Figure: graph of a lower semicontinuous risk function $R_M(\theta)$ on $[0,1]$ whose supremum $1$ is approached but not attained.]
$$1 = \sup_{\theta \in [0,1]} R_M(\theta) = \sup_{\pi} \int_{[0,1]} R_M(\theta)\, \pi(\mathrm{d}\theta),$$
but for every prior $\pi$,
$$\int_{[0,1]} R_M(\theta)\, \pi(\mathrm{d}\theta) < 1,$$
so no prior attains the supremum.
Minimax POVM is constructed using LFP
Corollary
Let $\Theta$ and $U$ be compact metric spaces.
If an LFP exists for a quantum statistical decision problem, then
every minimax POVM is a Bayes POVM with
respect to the LFP.
In particular, if the Bayes POVM w.r.t. the LFP is unique, then it is
minimax.
Corollary 2
For every closed convex subset $Q \subseteq \mathcal{P}o(U)$, the above assertion holds.
Many theoretical results are derived from the quantum
minimax theorem.
7. Summary
Discussion
1. We show the quantum minimax theorem, which gives a theoretical basis for
quantum statistical decision theory, in particular for objective Bayesian
analysis.
2. For every closed convex subset of POVMs, all of the assertions still hold.
Our result also has practical meaning for experimenters.
Future Works
- Show other decision-theoretic results and previously known results by using
the quantum minimax theorem
- Propose a practical algorithm for finding a least favorable prior
- Define a geometrical structure on quantum statistical decision problems
If you are interested in the details, please see quant-ph 1410.3639; citations are also welcome!
