UNIVERSIDAD IBEROAMERICANA
Entropy in the Mexican capital market
by
Wasim Alexis Mobayed Davids
A work submitted in partial fulfillment for the
degree of Bachelor in Engineering Physics
in the
Universidad Iberoamericana
Departamento de Física y Matemáticas
November 2016
“There is no useful information contained in historical price movement of securities.”
Louis Bachelier
“If you are a good economist, a virtuous economist, you are reborn as a physicist. But
if you are an evil, wicked economist, you are reborn as a sociologist.”
Paul Krugman
“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If
it doesn’t agree with experiment, it’s wrong.”
Richard Feynman
Figure 1: It always seems to lead back to math . . .
Taken from xkcd webcomics http://bit.ly/2fyuJC7
UNIVERSIDAD IBEROAMERICANA
Abstract
Universidad Iberoamericana
Departamento de Física y Matemáticas
Bachelor in Engineering Physics
by Wasim Alexis Mobayed Davids
We present a case study of the Mexican debt market using entropy, a concept recently
developed in the context of asset pricing, for the first time in the (reviewed) literature. The
scope of this work is not only to carry out an empirical exercise, but also to provide an
intuitive and understandable review of asset pricing, starting from the basic pricing
equation and developing it further to understand the underlying ideas behind asset
pricing theory. Additionally, an alternative proof of the existence of a state-price vector
and of the absence of arbitrage as its consequence is presented using functional analysis.
Our results are consistent with the theory developed by Backus, Chernov & Zin, as
we have found that the data describe a negative horizon dependence curve.
Being a work submitted in partial fulfillment for the degree of Bachelor in Engineering
Physics, this document also presents the author’s acquired learning of financial theory,
attempting to find parallels with the physical sciences wherever there is room for
comparison.
Acknowledgements
I would like to thank my advisor, Dr. José Miguel Torres, for all his valuable help during
the realization of this work, as well as my examiners, Dr. Alfredo Sandoval Villalbazo
and Dr. Carlos Ponzio. As a general comment on this work,
I would like to mention that as a physics student I have always been fascinated by the
way that human behavior and the social sciences can be so precisely reproduced through
mathematical models. After taking a stochastic calculus course with Dr. Torres in
2016, I started discovering the great parallels and similarities between the
intricate mathematical methods used in theoretical physics and the formal foundations
of financial and economic theory. This led me to develop a great interest in subjects
that had been absolutely foreign to me during my undergraduate education. Thus, I
would also like to thank Dr. Torres for introducing me to what might well be the line
of study I would like to pursue in my graduate degree.
Contents
Abstract
Acknowledgements
Preface
1 Introduction
1.1 Brief introduction to asset pricing theory
1.2 The stochastic discount factor
2 Dynamic asset pricing
2.1 Elements of functional analysis
2.1.1 Distances and convex sets
2.1.2 The separating hyperplane theorem
2.1.3 Farkas’ lemma
2.2 Arbitrage and state prices
3 Entropy in finance
3.1 Risk-neutral vs real (physical) probabilities
3.2 Entropy
4 Case study for Mexico
4.1 General overview of the Mexican treasury bills
4.2 Methodology
4.3 Results
4.4 Conclusions and future work
Bibliography
Preface
The scientific method may well be the single greatest invention ever created by mankind.
Following its simple and elegant recipe we have been able to explore everything from the
intricacies and whimsical nature of the genetic code to the unnatural and counter-intuitive
qualities of relativistic phenomena. Its importance is evidenced by its results, as it has proved
to be tremendously successful in fields as (seemingly) unrelated to the exact sciences
as psychology and sociology, just to name a few.
In essence, the scientific method is based on the measurable characteristics of a studied
event and, as such, anything that can be numerically quantified can be scrutinized
under the scope of the scientific method. The quintessential example of the direct
applicability of the scientific method must be physics. Do not take my word for it (after
all, I may be biased towards making this statement); a long tradition of philosophy of
science supports my claim. Rare as it may seem, physics and mathematics have
only been commonly regarded as a marriage for a single-digit number of centuries.
The early physical sciences thrived on the empiricism that modern-day science seems
to lack more and more in our times.
Thomas Kuhn proposed that science evolves through the evolution of the underlying
assumptions we make about the world when studying a problem, as he explained in
The Structure of Scientific Revolutions [1]. Against all conventional wisdom, Kuhn
postulated that the history of science is characterized by a progression of paradigm
shifts, in which scientists’ very understanding of the nature of science changes. Not only
does this Kuhnian theory of scientific revolutions describe such events as the discovery of
atoms, but it has also been alleged to apply to mathematical developments such as the
process of the legitimization of irrational numbers. One major scientific revolution was
led (arguably) by Isaac Newton with the invention of calculus and its direct application to
the physical sciences. Even in those early days, economists started using those informal
notions of mathematics to try to explain social phenomena. Gottfried Achenwall, a
German jurist and economist, is considered to be one of the fathers of statistics, as he
was one of the first social scientists to incorporate rigorous mathematical analysis into his
work. After all, it is no coincidence that statistics, one of the pillars of modern-day
science, has its etymology in the word state.
Economics and finance soon became as intertwined with mathematics as physics once
had, leading to an extremely successful partnership from which powerful results have
arisen. Parallels between physics and economics are evident to anyone who studies
one of these fields and picks up a book on the other. It is well known that Louis Bachelier
was the first person to model the stochastic process now referred to as Brownian motion
as part of his PhD thesis concerning the theory of speculation. Some years later, a
well-known physicist by the name of Albert Einstein published On the Motion of Small
Particles Suspended in a Stationary Liquid, as Required by the Molecular Kinetic Theory
of Heat, and was wrongly given the credit of being the first to solve this enigma. This story
is not mentioned as a mere distraction from the topic we deal with in this
work, but rather to exemplify how these disciplines bring each other forward more than
specialists tend to recognize. Physicists and economists have both made great
mathematical efforts for the sake of generality, and have met more than once down the
road of formalism. As an attempt to become part of this growing collaboration, the
author presents this work to contribute to the best of his knowledge, limited as it still
may be . . .
Chapter 1
Introduction
This work is divided into four basic parts, the first of which is this
introductory chapter, which can be skipped by the impatient reader who is familiar with
the basic asset pricing theory that will be covered. The second part of the text is a
formal proof of the existence of state-price vectors, as well as of other important results in
finance, using functional analysis. After this, a case study of the Mexican capital market
will be developed based on the concept of entropy developed by Álvarez & Jermann [2],
and Backus, Chernov & Zin [3]. Chapter 3 will revisit the idea of entropy developed
by these authors and chapter 4 will analyze real bond prices from an extensive database
to provide an empirical link to the theory.
Because this work is to be submitted as partial fulfillment for an Engineering Physics
degree, the reader is likely to be more familiar with scientific terminology
than with that of finance. Because of this, references to physical parallels will be
made wherever there is room for comparison, and basic finance and economics principles
will be explained thoroughly.
With this said, we will proceed to develop a basic introduction to asset pricing theory
based on John Cochrane’s premise, which states that the basic pricing equation is the
principle from which all special cases in valuation are derived. For a more detailed
explanation, the reader is invited to consult Asset pricing [4] by
the aforementioned physicist/economist.
1.1 Brief introduction to asset pricing theory
The basic question asset pricing theory tries to answer is simple to state: how much
is an asset worth? By asset we understand a resource with economic value that an
individual or corporation owns with the expectation that it will provide a future benefit.
The really attractive side of this theory is its wide applicability; whether we
want to examine the price of an option, stock, bond or any other investment, asset pricing
theory proves to be a powerful ally. One intuitive hypothesis we could make about the
price of an asset is that it is directly related to the expected payoff it produces. However,
uncertainty of payoffs poses a risk that should be taken into account when pricing it,
since not all assets are subject to the same risk of default. We also have to account
for the delay of the payment, since the investor incurs greater risk by waiting longer
for its maturity. All of these intuitions are condensed quite precisely in the basic pricing
equation, as will hopefully become evident in this introductory chapter. In its simplest
form, it reads:
$$P = E[mx]. \tag{1.1}$$
That is, the price $P$ is the expected value of the payoff $x$ discounted by a random variable
$m$ called the stochastic discount factor. We will proceed to understand this equation
based on a consumption pricing model derived from simple economic principles. The
intuition for this can be understood by examining the following examples:
• For a stock: We buy a stock today at a price $P_t$; the payoff will
be tomorrow’s price plus a dividend. That is:
$$P_t \to x_{t+1} = P_{t+1} + d_{t+1}. \tag{1.2}$$
• For a bond: Supposing we get a dollar for our investment at a discounted price $P_t$,
then we would have:
$$P_t \to 1. \tag{1.3}$$
Alternatively, we could invest that dollar and get the risk-free rate.
• For a bet: Suppose you make a bet with your buddies on the next football game.
The entry price would be $P_t = 0$ since no money is put down, and you could
either win or lose the settled quantity depending on the outcome of the match.
$$P_t = 0 \to \{0, 1\}. \tag{1.4}$$
All of these cases have one common characteristic: the randomness of payoffs (except
for bonds, of course). Thus, we have to introduce the term state-price security, also
known as an Arrow-Debreu security. We can understand these securities as contracts
that agree to pay a specific payoff if a particular state occurs at a particular time in the
future and pay zero in all other states. The price of this security is the state
price of this particular state of the world, and the state prices of all states may be collected
in a vector. All of the possible outcomes constitute a state space in which these vectors live. From a
physicist’s point of view, this sounds an awful lot like the state space used in quantum
mechanics. We will further examine these notions and similarities in the next chapter
when proving the existence of state-price vectors and its consequences.
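As a minimal sketch of how state prices connect to the basic pricing equation (1.1), consider a hypothetical world with three states. The probabilities, discount-factor values and payoffs below are illustrative assumptions, not data from this work, and the state price of each state is taken to be its probability times the discount factor in that state (the standard relation, assumed here ahead of its formal treatment). Under that assumption, pricing by $E[mx]$ and pricing by the state-price-weighted sum of payoffs give the same number.

```python
# Hypothetical three-state example (all numbers are illustrative assumptions).
probs = [0.25, 0.50, 0.25]      # physical probabilities of each state
m = [1.20, 0.95, 0.70]          # stochastic discount factor in each state
x = [80.0, 100.0, 120.0]        # payoff of some asset in each state

def expectation(values, probabilities):
    """Expected value of a discrete random variable."""
    return sum(p * v for p, v in zip(probabilities, values))

# Basic pricing equation (1.1): P = E[m x]
price = expectation([mi * xi for mi, xi in zip(m, x)], probs)

# State prices: the price today of one unit of payoff in a single state.
state_prices = [p * mi for p, mi in zip(probs, m)]
price_from_state_prices = sum(s * xi for s, xi in zip(state_prices, x))

print(price, price_from_state_prices)
```

Each entry of `state_prices` can be read as the price of the Arrow-Debreu security that pays one unit in that state and zero elsewhere.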
Now we try to find what a payoff is worth to an investor. For this, we need
a convenient mathematical framework to capture the desires of the potential investor.
Since we are evaluating what an investor is willing to pay to get a random payoff, our
function needs to capture two evident facts about human behavior: people prefer money
now and at a lower risk. This leads us to what economists call the utility function.
Utility measures the welfare or satisfaction of a consumer as a function of the consumption of
goods. Assumptions of rationality of individual actors need to be made when modeling
this function, leading to a somewhat biased form of our hypothesis. These functions can
be as complex as the problem needs, but for our purposes we will model investors by a
utility function defined over current and future values of consumption as shown:
$$U(c_t, c_{t+1}) = u(c_t) + \beta E_t[u(c_{t+1})], \tag{1.5}$$
where $c_t$ is consumption today and the second term is the expected utility of consumption
tomorrow, discounted by some number. The parameter $\beta$ in the second term is
called the subjective discount factor (not to be confused with the stochastic discount
factor), which typically takes a value of $\beta \sim 0.95$ on an annual basis.
The internal utility function $u$ needs to capture the fact that the satisfaction of the investor
clearly rises with consumption, but also that this rise in utility occurs
at a declining rate. To put it in plain words: your third pizza slice is not as satisfying as
your first; this is called the law of diminishing marginal utility. It implies our function
needs to flatten as consumption grows and, accordingly, reach low and even negative
values as consumption becomes small. For this, a power function (also known as the
isoelastic utility function) is proposed with the following form:
$$u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma}. \tag{1.6}$$
The parameter $\gamma$ is a measure of risk aversion, which will become clearer once we analyze
the form of this function. Another important thing to analyze is the derivative of
the utility function with respect to consumption. This is called marginal utility and
it measures how much utility changes with consumption, diminishing as consumption
increases as we would expect. To visualize these quantities, we take the limit $\gamma \to 1$ using
l’Hôpital’s rule to get:
$$\lim_{\gamma \to 1} u(c) = \log(c), \qquad u'(c) = c^{-\gamma} = c^{-1}. \tag{1.7}$$
The resulting functions are plotted below:
Figure 1.1: Utility (blue) and marginal utility (orange) plots. We can clearly see
that the concavity of the logarithmic function is consistent with our requirement that utility
flatten as consumption increases. Marginal utility decreases with growing
consumption, as expected.
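The following is a minimal sketch of equations (1.6) and (1.7), meant only to let the reader reproduce the qualitative shape of the curves above. The function names `utility` and `marginal_utility` and the sample consumption values are our own illustrative choices.

```python
import math

def utility(c, gamma):
    """Power (isoelastic) utility, eq. (1.6); log utility when gamma equals 1."""
    if abs(gamma - 1.0) < 1e-12:
        return math.log(c)
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

def marginal_utility(c, gamma):
    """Derivative of the power utility with respect to consumption: c^(-gamma)."""
    return c ** (-gamma)

# Utility rises with consumption while marginal utility falls.
for c in (0.5, 1.0, 2.0, 4.0):
    print(c, utility(c, 1.0), marginal_utility(c, 1.0))

# Numerically, eq. (1.6) with gamma close to 1 approaches log(c), as in eq. (1.7).
near_log = utility(2.0, 1.0 + 1e-6)
print(near_log, math.log(2.0))
```

Evaluating `utility` at a value of `gamma` slightly away from 1 is a quick numerical check of the limit obtained with l’Hôpital’s rule.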
As mentioned before, $\gamma$ is a parameter of risk aversion, which can be clearly seen by
examining the shape of the utility function. To exemplify: suppose we have consumption
set at a value $\bar{c}$ with its respective utility and then take a bet in which we can either win
or lose an amount $\Delta$ of some good. Suppose the bet makes us gain or lose the
same amount and the two states are equally probable, like a coin toss. Should we win,
following the law of diminishing marginal utility, our increase in utility
would not be as big as the decrease from losing. If we examine this notion further, we
realize that the expected value of the utility will be lower than the utility of the original
$\bar{c}$ if we decide to take that bet. That is:
$$E[u(\bar{c} \pm \Delta)] < u(\bar{c}). \tag{1.8}$$
This means the consumer is made worse off if he is forced to take the bet. Fundamentally,
what this reveals is that people dislike losses more than they value gains. We
are capturing this feature of human psychology through the concavity of the function,
which is regulated by the parameter $\gamma$. Now we have a powerful mathematical form that
lets us separate the two crucial items expressed in the basic pricing equation. The discount
factor $\beta$ tells us how much people dislike delay, giving more or less value to what
happens in the future. A larger value of $\gamma$ makes the utility function more curved,
symbolizing greater risk aversion.
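Inequality (1.8) can be checked numerically. The sketch below assumes a hypothetical consumption level $\bar{c} = 1$, a bet size $\Delta = 0.5$ and a handful of illustrative values of $\gamma$; none of these numbers come from the data used later in this work.

```python
import math

def utility(c, gamma):
    """Power utility of eq. (1.6), with log utility when gamma equals 1."""
    if gamma == 1.0:
        return math.log(c)
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

def utility_loss_from_bet(c_bar, delta, gamma):
    """u(c_bar) minus the expected utility of a fair +/- delta coin toss."""
    expected = 0.5 * utility(c_bar + delta, gamma) + 0.5 * utility(c_bar - delta, gamma)
    return utility(c_bar, gamma) - expected

c_bar, delta = 1.0, 0.5          # hypothetical consumption level and bet size
gammas = [0.5, 1.0, 2.0, 5.0]    # hypothetical risk-aversion values
losses = [utility_loss_from_bet(c_bar, delta, g) for g in gammas]
for g, loss in zip(gammas, losses):
    print(g, round(loss, 4))
```

In this example the loss of expected utility is positive for every $\gamma$ and grows as $\gamma$ increases, which is the sense in which a more curved utility function represents a more risk-averse investor.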
We must bear in mind that the objective of any investor is to maximize his utility with
whatever resources he has. As such, we will let our investor buy or sell payoffs $x_{t+1}$
at a price $p_t$. The amount he decides to buy or sell will be determined by his budget
constraint. Ideally, if he decides to buy $\xi$ securities today, the payoff tomorrow should at
least compensate him for the utility lost by paying the
price of the assets. In other words, he pays some today and loses the price of the security,
but in exchange gets a little bit more tomorrow through the payoff. The optimal investment
is the one in which the marginal cost of losing the consumption today equals the
marginal benefit of getting the payoff tomorrow. The problem is then translated to:
$$\max_{\xi} \; u(c_t - \xi p_t) + \beta E_t[u(c_{t+1} + \xi x_{t+1})]. \tag{1.9}$$
By setting the derivative with respect to $\xi$ to zero we obtain the first-order condition
for an optimal consumption choice:
$$p_t u'(c_t) = \beta E_t[u'(c_{t+1}) x_{t+1}]. \tag{1.10}$$
Rearranging this expression, we solve for the price, leaving us one step from
the familiar expression of the pricing equation we presented earlier:
$$p_t = E_t\left[\beta \frac{u'(c_{t+1})}{u'(c_t)} x_{t+1}\right]. \tag{1.11}$$
If we group the terms multiplying the payoff in the last equation, we recover the
pricing equation (1.1):
$$m = \beta \frac{u'(c_{t+1})}{u'(c_t)}, \qquad P = E[mx]. \tag{1.12}$$
Most of the theory in asset pricing is derived from this equation, in which the term we
designated as $m$ is called the stochastic discount factor.
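The first-order condition can be verified numerically. The sketch below assumes a hypothetical two-state economy with power utility ($\gamma = 2$, $\beta = 0.95$) and made-up consumption and payoff values. It prices the asset with the discount factor of equation (1.12) and then checks that, at that price, the objective of equation (1.9) has zero slope at $\xi = 0$, so the investor has no incentive to buy or sell.

```python
beta, gamma = 0.95, 2.0            # hypothetical preference parameters
c_today = 1.0                      # consumption today
probs = [0.5, 0.5]                 # two equally likely states tomorrow
c_tomorrow = [0.9, 1.1]            # consumption tomorrow in each state
x = [1.0, 2.0]                     # payoff of the asset in each state

def u(c):
    """Power utility, eq. (1.6)."""
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

def u_prime(c):
    """Marginal utility of the power utility."""
    return c ** (-gamma)

# Stochastic discount factor of eq. (1.12), state by state.
m = [beta * u_prime(c1) / u_prime(c_today) for c1 in c_tomorrow]
price = sum(p * mi * xi for p, mi, xi in zip(probs, m, x))

def objective(xi):
    """Two-period utility of eq. (1.9) when buying xi units at `price`."""
    future = sum(p * u(c1 + xi * xs) for p, c1, xs in zip(probs, c_tomorrow, x))
    return u(c_today - xi * price) + beta * future

# Central finite difference of the objective at xi = 0.
h = 1e-5
slope_at_zero = (objective(h) - objective(-h)) / (2.0 * h)
print(price, slope_at_zero)
```

Because the objective is concave in $\xi$, a vanishing slope at $\xi = 0$ means that holding the original consumption plan is optimal at the price $E[mx]$.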
1.2 The stochastic discount factor
A further examination of the discount factor’s form will reveal where it gets its stochastic
attribute from. Not only does this term capture the two psychological behaviors we
studied in the last section, but it also brings an important element into discussion:
uncertainty. The payoffs definitely bring this challenge to the table, but we can also see
that the discount factor is a function of tomorrow’s utility, which is of course uncertain
as it is directly related to tomorrow’s consumption. As an example, the discount factor
for the power utility function we’ve been using would take the following form:
$$m\big|_{u'(c) = c^{-\gamma}} = \beta \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}. \tag{1.13}$$
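To see where the randomness enters, the sketch below evaluates equation (1.13) for three hypothetical consumption-growth scenarios. The default parameter values $\beta = 0.95$ and $\gamma = 2$ and the growth rates are illustrative assumptions only.

```python
def sdf_power_utility(c_today, c_tomorrow, beta=0.95, gamma=2.0):
    """Stochastic discount factor of eq. (1.13): beta * (c_{t+1}/c_t)^(-gamma)."""
    return beta * (c_tomorrow / c_today) ** (-gamma)

# Hypothetical gross consumption growth: recession, flat, boom.
for growth in (0.95, 1.00, 1.05):
    print(growth, round(sdf_power_utility(1.0, growth), 4))
```

The discount factor is larger in the state where consumption falls and smaller where it rises, so a payoff delivered in bad times is weighted more heavily than the same payoff delivered in good times.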
An important thing worth noting at this point is that this model holds only for valuing
assets after the investment has been made. This equation characterizes how a consumer
with discount factor $\beta$ and risk aversion $\gamma$ feels about value once he has received the payoff. This
chicken-and-egg dilemma is solved by taking the perspective that the investor looks at
the payoffs and prices, and adjusts his consumption until it lines up with the price-payoff
equilibrium.
The stochastic discount factor (SDF) extends to any security, no matter what its payoffs are,
the intuition we discussed earlier about the price of an asset
being related to its payoffs. If we bought a
risk-free asset, its price in an uncertainty-free situation would be:
$$p_t = \frac{1}{R_f} x_{t+1}, \tag{1.14}$$
where $R_f$ is the gross risk-free rate. Assuming this rate is greater than one, this would
turn the $1/R_f$ factor into a discount factor and we would say the asset sells at a discount.
This is the common case for treasury bonds such as T-bills or Mexican federal treasury
certificates, commonly referred to as CETES.
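A minimal sketch of equation (1.14) follows. The payoff at maturity and the gross rate are hypothetical numbers chosen for illustration and do not correspond to any actual CETES auction.

```python
def risk_free_price(payoff, gross_rate):
    """Price today of a certain payoff tomorrow, eq. (1.14)."""
    return payoff / gross_rate

# Hypothetical bill paying 10 units at maturity with a gross rate of 1.05.
face_value = 10.0
gross_rate = 1.05
price = risk_free_price(face_value, gross_rate)
discount = face_value - price
print(round(price, 4), round(discount, 4))
```

As long as the gross rate exceeds one, the price is below the payoff at maturity, which is what selling at a discount means.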
Riskier assets will have lower prices than their risk-free counterparts, so we have to discount
them further with a risk-adjusted discount factor. The pricing equation generalizes
this notion to all of the classical issues of the theory of finance, which tells us
something fundamental about the stochastic discount factor: we can incorporate all
risk corrections into a single and unique discount factor. We will further study this
factor in the next chapter to understand it as a crucial element that safeguards the law
of one price.
Chapter 2
Dynamic asset pricing
In this chapter we look more closely at the stochastic discount factor, state-price vectors
and their implications. Through this close examination we will prove that the approach
we explained previously is valid without making any assumptions about utility functions or
the completeness of markets (a market is complete when all
Arrow-Debreu securities are available).
As we anticipated at the end of the introductory chapter, the existence of a unique
discount factor is basically a restatement of the law of one price, or absence of arbitrage.
By this law, if two asset portfolios have the same payoffs and risks, then they must
have the same price. Violation of this basic principle would present the opportunity of
an arbitrage profit, where an investor could sell a portfolio at an expensive price and
then buy the same portfolio at a cheap price. For example, the price of a Happy Meal should
be the sum of the individual prices of a small burger, soda, fries, and a toy. Hedgers
and traders are constantly looking for arbitrage opportunities, but the proliferation of
high-speed computing and information technology is making arbitrage opportunities
increasingly rare in financial markets. As it is often said in finance: there is no such thing as a free lunch.
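The bundle example can be turned into a toy arbitrage check. The item prices and the bundle price below are made-up numbers used only to illustrate the logic of the law of one price.

```python
# Hypothetical prices of the individual items and of the bundle.
item_prices = {"burger": 20.0, "soda": 10.0, "fries": 12.0, "toy": 8.0}
bundle_price = 45.0

replication_cost = sum(item_prices.values())   # cost of building the bundle
mispricing = replication_cost - bundle_price   # positive: the bundle is cheap

if mispricing > 0:
    strategy = "buy the bundle, sell the items separately"
elif mispricing < 0:
    strategy = "buy the items separately, sell the bundle"
else:
    strategy = "no arbitrage: the law of one price holds"
print(strategy, abs(mispricing))
```

Whenever the two ways of obtaining the same payoff have different prices, the difference is a riskless profit; trading on it is what pushes the prices back together.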
It is quite intuitive to think that if the law of one price holds, then the stochastic discount
factor must be unique. Otherwise, following the basic pricing equation $P = E[mx]$, two
assets with the same payoff could have different market prices. As we will see in the
following section of this text, we can guarantee the absence of arbitrage if and only
if there is a state-price vector. The reader is invited to consult Functional Analysis [5] by
Stein & Shakarchi for deeper understanding of the mathematical framework used in the
following sections.
2.1 Elements of functional analysis
In this section we will cover the basic elements of functional analysis that will be used
to give a theoretical foundation to this chapter. To some extent, these will be used
to prove the fundamental theorem of asset pricing and justify the need for a state-price
vector. The proof presented in the next section is mainly based on convex analysis, in
particular the separating hyperplane theorem and Farkas’ lemma. Gyula Farkas was a
Hungarian mathematician of the late nineteenth century. Not much is known about him
other than his work and the fact that he practiced both law and mathematics: further
evidence of the healthy coexistence of the social and exact sciences.
Most of the concepts developed throughout this chapter are inspired by Stein & Shakarchi
[5] or the author’s personal notes from his undergraduate topology courses
(credit and gratitude to Dr. Edmundo Palacios). We will be using matrix notation to
indicate sets, and all vectors we define in the following pages will be considered column
vectors unless stated otherwise. All of these proofs are intended to be as intuitive
and easy to read as possible, so please excuse the author for lacking mathematical
rigor or seriousness at times. It’s all good fun and with the best intentions in mind.
2.1.1 Distances and convex sets
Dealing with distances and closeness between points is the first element that needs to be
covered in order to give consistency to our mathematical framework. For convenience,
we will be working with a Euclidean space equipped with a continuous norm. For closed
sets, it can be proved that there always exists a nearest point to a given point. We will
not prove this since it is beyond the scope of this work, but the basic idea
behind showing that a set $R$ is closed is to prove that its complement $R^c$
is open. This can be done by finding, for every $u \in R^c$, a ball of radius $\epsilon > 0$ centered at $u$
such that $B(u, \epsilon) \subset R^c$. That is, the ball is entirely contained in the complement of our set
$R$, so the complement is open and, by definition, $R$ is a closed set. The existence of the nearest point is
guaranteed; in general, however, it need not be unique. For closed convex sets it is unique,
an interesting feature of convexity.
In a Euclidean space, a convex set is a region such that, for every pair of points within
it, the straight line segment joining them lies within the region. To give an example, a solid
volume like a ball or a cube is convex, but any shape with an indent or hollow space inside it is not.
To prove convexity of a set $C$ we need to verify that $\lambda x + (1 - \lambda) y \in C$ for all
$x, y \in C$ and all $\lambda \in [0, 1]$.
That is, a set is convex if, $\forall\, x, y \in C$, $C$ contains the line segment between $x$ and $y$.
Graphically:
Figure 2.1: Graphical representation of a convex and non-convex set.
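The definition suggests a simple numerical experiment: sample pairs of points from a set and look for a convex combination that leaves it. The sketch below is an assumption-laden illustration (two hypothetical planar sets, a fixed random seed and a finite number of samples), so finding no counterexample is evidence of convexity, not a proof.

```python
import random

def in_disk(point):
    """Unit disk: a convex set."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0

def in_annulus(point):
    """Ring with a hole in the middle: not convex."""
    return 0.25 <= point[0] ** 2 + point[1] ** 2 <= 1.0

def looks_convex(contains, samples=2000, seed=0):
    """Search for x, y in the set whose convex combination leaves the set."""
    rng = random.Random(seed)
    points = []
    while len(points) < samples:
        candidate = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if contains(candidate):
            points.append(candidate)
    for x, y in zip(points[::2], points[1::2]):
        lam = rng.random()
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        if not contains(z):
            return False      # found a counterexample: not convex
    return True               # no counterexample found (not a proof)

print(looks_convex(in_disk), looks_convex(in_annulus))
```

The ring fails the test because a segment joining two points on opposite sides of the hole passes through the hole, exactly the hollow-shape situation described above.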
Another important element related to convexity that we will need to define for the sake
of the following theorems is the concept of a convex cone. A convex cone is simply a
set that is closed under linear combinations with positive coefficients. That is, a
set $C$ is a convex cone if $\alpha x + \beta y \in C$ for all $x, y \in C$ and all $\alpha, \beta \in \mathbb{R}_{++}$.
Figure 2.2: A convex cone (light blue). Inside of it, the light red convex cone consists
of all points $\alpha x + \beta y$ with $\alpha, \beta > 0$, for the depicted $x$ and $y$. Both regions extend
infinitely in the upper-right direction.
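As a small illustration of the definition, the sketch below uses the nonnegative quadrant of the plane as a hypothetical convex cone and contrasts it with the unit disk, which is convex but not a cone. The points and coefficients are arbitrary choices.

```python
def in_cone(point):
    """Nonnegative quadrant of the plane, a simple convex cone."""
    return point[0] >= 0.0 and point[1] >= 0.0

def in_disk(point):
    """Unit disk: convex, but not closed under positive scaling."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0

def positive_combination(x, y, alpha, beta):
    """Return alpha*x + beta*y for strictly positive alpha and beta."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("coefficients must be strictly positive")
    return (alpha * x[0] + beta * y[0], alpha * x[1] + beta * y[1])

x, y = (1.0, 0.5), (0.2, 3.0)            # two hypothetical points in the cone
for alpha, beta in [(0.1, 0.1), (1.0, 2.0), (50.0, 1000.0)]:
    z = positive_combination(x, y, alpha, beta)
    print(z, in_cone(z))

# The disk contains (0.5, 0.5) but not three times that point.
inside = (0.5, 0.5)
scaled = positive_combination(inside, inside, 1.5, 1.5)
print(in_disk(inside), in_disk(scaled))
```

However large the positive coefficients become, the combination stays in the quadrant, whereas the disk is left as soon as a point is scaled far enough.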
2.1.2 The separating hyperplane theorem
A hyperplane generalizes the notion of the classical plane we graph in $\mathbb{R}^3$, and
the way it is constructed reminds us of our first lesson in multivariate calculus, where
a plane was described through the dot product with a vector perpendicular to it. More formally, we
can define it as a set $H \subseteq \mathbb{R}^n$ that satisfies $H = \{x \in \mathbb{R}^n : a^\dagger x = \alpha\}$ for some nonzero
vector $a$ and $\alpha \in \mathbb{R}$, where $a$ is a normal vector to the hyperplane. We can
define a half-space as either of the sides our hyperplane divides the space $\mathbb{R}^n$ into, which
satisfy:
$$H^+_{a,\alpha} = \{x \in \mathbb{R}^n : a^\dagger x \geq \alpha\}, \qquad H^-_{a,\alpha} = \{x \in \mathbb{R}^n : a^\dagger x \leq \alpha\}. \tag{2.1}$$
As such, hyperplanes can be used to separate two sets that reside on either of the
half-spaces they create. Consider two sets $A$ and $B$ in $\mathbb{R}^n$.
It is said that the hyperplane $H_{a,\alpha}$ strongly separates $A$ and $B$ if there is an $\epsilon > 0$
such that $A \subseteq H^+_{a,\alpha+\epsilon}$ and $B \subseteq H^-_{a,\alpha-\epsilon}$ or vice versa, so that neither set
intersects the hyperplane at any point.
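The definition of strong separation can be checked directly for finite sets of points. In the sketch below, the function name `strongly_separates`, the normal vector, the level $\alpha$ and the point sets are all hypothetical; the function returns the largest $\epsilon$ for which the definition holds, or `None` if there is none.

```python
def dot(a, x):
    """Inner product of two vectors given as tuples."""
    return sum(ai * xi for ai, xi in zip(a, x))

def strongly_separates(a, alpha, set_a, set_b, eps=1e-9):
    """Check whether {x : a.x = alpha} strongly separates two finite point sets.

    Returns the largest margin found if it exceeds eps, otherwise None.
    """
    # Try both orientations: A above and B below, or the other way round.
    margin_ab = min(min(dot(a, x) - alpha for x in set_a),
                    min(alpha - dot(a, x) for x in set_b))
    margin_ba = min(min(dot(a, x) - alpha for x in set_b),
                    min(alpha - dot(a, x) for x in set_a))
    margin = max(margin_ab, margin_ba)
    return margin if margin > eps else None

a, alpha = (1.0, 1.0), 2.0
set_a = [(2.0, 2.0), (3.0, 1.5)]
set_b = [(0.0, 0.0), (1.0, 0.5)]
print(strongly_separates(a, alpha, set_a, set_b))
print(strongly_separates(a, alpha, set_a, [(1.0, 1.0)]))
```

In the second call the point $(1, 1)$ lies on the hyperplane itself, so no positive $\epsilon$ exists and the sets are not strongly separated by this particular hyperplane.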
As the reader may imagine, not all sets can be separated by a hyperplane, and this
is why we defined convexity previously. We will prove that a closed convex set can always be
separated from a point outside it by a hyperplane via the separating hyperplane theorem. This theorem states:
Theorem: Let $C \subseteq \mathbb{R}^n$ be a nonempty closed convex set and let $z \in \mathbb{R}^n$ with $z \notin C$. Then $C$
and $z$ can be strongly separated.
Figure 2.3: Figure 1 is a convex set like we defined earlier; figure 2, however, is not.
Taken from Caltech’s notes on functional analysis [6].
Proof (based on Stein & Shakarchi [5] and Eggleston [7]):
First let $C \subseteq \mathbb{R}^n$ and $z \in \mathbb{R}^n$, $z \notin C$, as in the statement of the theorem.
Now let $p$ be the unique nearest point to $z$ in $C$. Since $C$ is a closed convex
set, the existence and uniqueness of $p$ hold. Let $x \in C$ and $\lambda \in [0, 1]$. Again, using an argument of
convexity, we have that $(1 - \lambda)p + \lambda x \in C$ and, with $p$ being the nearest point to $z$, it
follows that:
$$\|(1 - \lambda)p + \lambda x - z\| \geq \|p - z\|, \quad \text{i.e.,} \quad \|(p - z) + \lambda(x - p)\| \geq \|p - z\|. \tag{2.2}$$
Equation (2.2) simply states that no point on the segment between $p$ and $x$ is closer to $z$
than $p$ is. The intuition behind the inequality we want
to reach is one that makes explicit that the points involved lie on separate half-spaces
like the ones we have defined before. Squaring both sides and expanding the inner products
we obtain:
$$\|p - z\|^2 + 2\lambda (p - z)^\dagger (x - p) + \lambda^2 \|x - p\|^2 \geq \|p - z\|^2. \tag{2.3}$$
Finally, we subtract the term $\|p - z\|^2$, divide by $\lambda > 0$, let $\lambda \to 0^+$,
and multiply by $-1$ to obtain the reversed inequality:
$$(z - p)^\dagger (x - p) \leq 0 \quad \forall x \in C. \tag{2.4}$$
If we now consider the hyperplane $H$ which contains $p$ and has normal vector $a = z - p$,
that is, $H = \{x \in \mathbb{R}^n : a^\dagger x = \alpha\}$ with $\alpha = a^\dagger p$, equation (2.4) shows that $C \subseteq H^-_{a,\alpha}$ and,
moreover, that $z \notin H^-_{a,\alpha}$ since $z \neq p$ (because $z \notin C$). Now we consider a second hyperplane
$H^*$ parallel to $H$ (that is, with the same normal vector) containing the point $(1/2)(z + p)$; it is then
clear that $H^*$ separates $z$ and $C$ as desired.
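The construction in the proof can be followed step by step on a concrete example. The sketch below assumes that $C$ is the unit square $[0, 1]^2$ (a closed convex set whose nearest point is obtained by clipping each coordinate) and takes a hypothetical outside point $z = (2, 1.5)$. Since the quantities involved are linear in $x$, checking the four corners of the square is enough to cover all of $C$.

```python
def dot(a, b):
    """Inner product of two vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def project_onto_box(z, lower=0.0, upper=1.0):
    """Nearest point to z in the box [lower, upper]^n (a closed convex set)."""
    return tuple(min(max(zi, lower), upper) for zi in z)

z = (2.0, 1.5)                              # hypothetical point outside the box
p = project_onto_box(z)                     # nearest point of C to z
a = tuple(zi - pi for zi, pi in zip(z, p))  # normal vector a = z - p
midpoint = tuple(0.5 * (zi + pi) for zi, pi in zip(z, p))
alpha_star = dot(a, midpoint)               # level of the hyperplane H*

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Inequality (2.4): (z - p).(x - p) <= 0 for x in C, checked at the corners.
ineq_24 = [dot(a, tuple(xi - pi for xi, pi in zip(x, p))) for x in corners]
print(p, a, alpha_star, ineq_24)

# z lies strictly above H*, the whole box strictly below it.
print(dot(a, z) > alpha_star, all(dot(a, x) < alpha_star for x in corners))
```

The hyperplane through the midpoint of $z$ and $p$ leaves a positive margin on both sides, which is the strong separation claimed by the theorem.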
2.1.3 Farkas’ lemma
Making use of all the concepts we have developed before, we present Farkas’ lemma,
a direct consequence of the separating hyperplane theorem. Its primary function is to
determine whether a system of linear inequalities has a solution or not. This result states
that a vector either lies in a convex cone or there exists a hyperplane that separates
the vector from the cone. This is a strict either/or, as there are no other alternatives for the cone
and vector. This means that, of these two statements, exactly one holds:
never both and never neither.
Lemma: Let $A$ be an $m \times n$ matrix and $C \in \mathbb{R}^n$. Then
exactly one of the following systems of inequalities has a solution:
1.
$$Ax \leq O \ \text{and} \ C^\dagger x > 0 \ \text{for some} \ x \in \mathbb{R}^n. \tag{2.5}$$
2.
$$A^\dagger y = C \ \text{and} \ y \geq O \ \text{for some} \ y \in \mathbb{R}^m, \tag{2.6}$$
where $O$ denotes the zero vector.
Proof (based on Kosaku [8] and Lee [9]):
We will denote the columns of $A^\dagger$ by $\{a_1, a_2, ..., a_m\}$. System 2, represented by equation
(2.6), will have a solution if $C$ lies in the convex cone spanned by $\{a_1, a_2, ..., a_m\}$. On the
other hand, system 1, represented by equation (2.5), will be feasible if the closed convex
cone $\{x : Ax \leq O\}$ and the open half-space $\{x : C^\dagger x > 0\}$ have a nonempty intersection.
Now, we will suppose that system 2 is solvable. That means there exists $y \geq O$ such
that $A^\dagger y = C$. Now let $x$ be such that $Ax \leq O$. As a consequence,
$C^\dagger x = y^\dagger A x \leq 0$. This means system 1 has no solution, which is consistent with the
lemma.
For our proof to be complete, however, we must prove that system 1 does
have a solution whenever system 2 does not; for this, we will suppose that system 2 has no
solution and see what follows. We construct the set $S = \{x : x = A^\dagger y, \ y \geq O\}$. It is clear
that this is a convex set (in fact, a closed convex cone) and that $C \notin S$. For a nonempty closed convex set $S$ in
$\mathbb{R}^n$ and a point $w \notin S$, strong separation guarantees that there exist a nonzero vector $p$ and a scalar $\alpha$ such that
$p^\dagger w > \alpha$ and $p^\dagger x \leq \alpha$ for each $x \in S$. We will not prove this again but rather use the
result to say that there must be a vector $p \in \mathbb{R}^n$ and a scalar $\alpha$ such that $p^\dagger C > \alpha$ and
$p^\dagger x \leq \alpha \ \forall\, x \in S$.
Since the zero vector is included in $S$, we have $\alpha \geq 0$, which implies that $p^\dagger C > 0$. It also
holds that $\alpha \geq p^\dagger A^\dagger y = y^\dagger A p \ \forall\, y \geq O$. Now, because $y \geq O$ can
be made as large as we want, the last inequality means that $Ap \leq O$. The importance
of this somewhat confusing development is that we have constructed a vector $p \in \mathbb{R}^n$ such
that $Ap \leq O$ and $C^\dagger p > 0$, which means that system 1 has a solution when system 2
does not. This completes the second part of the proof, as we can see that the solutions of
systems 1 and 2 are mutually exclusive.
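For a concrete feel of the two alternatives, the sketch below treats only the special case of an invertible $2 \times 2$ matrix $A$, which is a simplifying assumption of ours and not part of the lemma. In that case $A^\dagger y = C$ has a unique solution: if it is nonnegative, system 2 holds; otherwise some $y_i < 0$ and the vector $x$ solving $Ax = -e_i$ satisfies system 1, because $C^\dagger x = y^\dagger A x = -y_i > 0$. The function name `farkas_2x2` and the test matrices are hypothetical.

```python
def dot(a, b):
    """Inner product of two vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def farkas_2x2(A, C):
    """Decide which Farkas alternative holds for an invertible 2x2 matrix A.

    System 1: A x <= 0 and C.x > 0.   System 2: A^T y = C and y >= 0.
    """
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("this sketch only handles invertible A")
    # Solve A^T y = C by Cramer's rule; the columns of A^T are the rows of A.
    y = ((C[0] * a22 - a21 * C[1]) / det, (a11 * C[1] - a12 * C[0]) / det)
    if min(y) >= 0.0:
        return "system 2", y
    # Some y_i < 0: pick x with A x = -e_i, so that C.x = y.(A x) = -y_i > 0.
    w = (-1.0, 0.0) if y[0] < 0.0 else (0.0, -1.0)
    x = ((w[0] * a22 - a12 * w[1]) / det, (a11 * w[1] - a21 * w[0]) / det)
    return "system 1", x

A = ((1.0, 1.0), (0.0, 1.0))
print(farkas_2x2(A, (1.0, 3.0)))   # C inside the cone of the rows of A
print(farkas_2x2(A, (2.0, 1.0)))   # C outside the cone
```

For a general $m \times n$ matrix the same decision is usually made with a linear programming solver; the closed-form shortcut above only works because $A$ is square and invertible.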
The geometric interpretation we provided at the beginning of this section may be more
intuitive for the reader, so we will provide an example following it. We
understand Farkas’ lemma geometrically by stating that either:
1. the vector $C$ is in the cone generated by the columns of $A^\dagger$ (it is a nonnegative linear
combination of the rows of $A$), or
2. there is an $x \in \mathbb{R}^n$ that makes an acute angle with the vector $C$ and a non-acute
(right or obtuse) angle with every column of $A^\dagger$.
After much painstaking picture editing in unfriendly software, we present the
following graphic interpretation:
Figure 2.4: This is an example for a matrix A ∈ R^{2×4}. In the left figure we see a
hyperplane containing the origin with vector y as its normal vector. This hyperplane
separates C from the cone spanned by the columns of A, labeled a_n. It is clearly
seen that the vector y forms an acute angle with C and obtuse angles with the column
vectors of A. In the right figure we see the vector C contained in the cone
generated by the aforementioned vectors.
2.2 Arbitrage and state prices
As anticipated, this section will deeply develop the notions of state-price vectors, arbi-
trage, and their co-dependency. To do this, we will work with the concepts developed
and proved earlier. Once again, we will be dealing with uncertainty as it is one of the
central subjects in asset pricing, but we will treat it as state space. As such, our secu-
rity’s price will be the state-price weighted sum of its payoffs in all possible states of the
world.
We will represent uncertainty by a finite set of states {1, 2, 3, ..., S} that define the payoff
of the related asset. Referring to the physicist in all of us, we can think of this as a
“limited” Hilbert space used to represent quantum states. The space for the securities
will be spanned by an N × S matrix D, with each entry D_{i,j} representing the payoff of
asset i in state j. Here S is the number of states and N is the number
of different assets available for our portfolio; we will refer to D as the payoff matrix.
Following these definitions, a portfolio will be constituted by a linear combination of
assets. As such, a portfolio is denoted θ ∈ R^N, and each entry θ_i is the number of units of
asset i included in the portfolio. With these two, we can construct a payoff vector given
by D†θ ∈ R^S. The last element we need to define is the price of the portfolio, which
will be computed as q†θ with q ∈ R^N (each entry q_i is the price of a unit of
security i). An example is given to illustrate these definitions. Let’s consider a portfolio
consisting of three securities, one of them a risk-free asset, in three possible
states of the world. A possible payoff matrix is:
\[
D_{i,j} = \begin{pmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ 4 & 1 & 5 \end{pmatrix} \in \mathbb{R}^{N \times S} = \mathbb{R}^{3 \times 3}. \tag{2.7}
\]
As we can see, security i = 2 produces the same payoff in any state of the world,
and as such, it constitutes a risk-free asset (RFA). We will now define a price vector which
tries to be consistent with the notion that RFAs have higher prices due to the absence
of uncertainty, but stocks can have higher payoffs and should be valued accordingly. If
we give equal probabilities to the three states, then the price will be the expected value
of the payoffs multiplied by a discount factor. Since
this factor is stochastic as we have discussed before, we will define it arbitrarily for this
example. We will discuss the choice of the discount factor to show the intuition, trying
to be consistent.
\[
q_i = \begin{pmatrix} 0.8 \\ 0.95 \\ 3.23 \end{pmatrix} \in \mathbb{R}^{N} = \mathbb{R}^{3}. \tag{2.8}
\]
The price vector was constructed using the following discount factors m_i = (0.8, 0.95, 0.97),
which should be consistent with the payoffs of each asset. If we examine the expectation
of the payoffs (weighing each state with the same probability), then we can see that
securities 1 and 2 have the same expected payoff and would have the same price if only
expectations mattered.
However, asset 2 is risk-free and, as such, should have a higher price than asset 1, which
pays nothing in state 2 as we can see in the payoff matrix (2.7). Similarly, asset 3 has
a lower risk than asset 1 since it always pays, and at a higher expectation than
the others; because of this, it is assigned a larger discount factor and turns out
to be quite expensive. An investor would then look at these and make the best decision
to maximize his utility as we discussed in chapter one. The idea behind a state-price
vector is to generalize the notion of the stochastic discount factor into a more general
identity we will discuss later.
Following the initial definitions, we will proceed to create a portfolio θ with the three
securities described above. Let’s say the investor is willing to invest $43 in some portfolio
and decides to compose it in the following way:
\[
\theta_i = \begin{pmatrix} 5 \\ 7 \\ 10 \end{pmatrix} \in \mathbb{R}^{N} = \mathbb{R}^{3}. \tag{2.9}
\]
We will not discuss whether this was an optimal choice or not (it probably isn’t,
since he is incurring risk by buying asset 1, which has the same expectation as asset 2,
which is an RFA), but rather use it to compute the payoff vector.
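\[
D^{\dagger}\theta = \begin{pmatrix} 1 & 1 & 4 \\ 0 & 1 & 1 \\ 2 & 1 & 5 \end{pmatrix} \begin{pmatrix} 5 \\ 7 \\ 10 \end{pmatrix} = \begin{pmatrix} 52 \\ 17 \\ 67 \end{pmatrix} \in \mathbb{R}^{S} = \mathbb{R}^{3} \tag{2.10}
\]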
Being an element of the state space, this vector expresses the payoff for portfolio θ in
all states of the world. As we can see, state 3 is the most favorable for the investor, as
could be clearly predicted just by looking at the payoff matrix. State 1 is almost as
favorable, as the payment of each risky asset is only one unit lower than in state 3. It is worth noting
that, given the dimensions of the matrices involved in the multiplication, the reason
the payoff vector belongs to the state space is the interchange of the dimensions of the
payoff matrix when calculating its transpose. Even when the numbers of rows and
columns agree, we are dealing with different elements after transposing the original matrix D.
The only element that remains to be calculated is the market value of the portfolio,
which can be easily computed by taking the dot product of the price vector and the
vector of units held in the portfolio:
\[
\text{Market value} = q^{\dagger}\theta = \sum_{i=1}^{N} q_i\,\theta_i = \begin{pmatrix} 0.8 & 0.95 & 3.23 \end{pmatrix} \begin{pmatrix} 5 \\ 7 \\ 10 \end{pmatrix} = 42.95 \in \mathbb{R} \tag{2.11}
\]
The price of portfolio θ is then $42.95, which falls within the $43 budget constraint pre-
sented by the investor. With these concepts clarified, we can now proceed to use them as
part of our analysis. With this new language, we will define arbitrage as a portfolio θ with q†θ ≤ 0,
that is, with negative or no cost, and with D†θ > 0, a positive payoff. In other words,
a portfolio offering “something for nothing.”
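The definitions above can be checked numerically with a minimal sketch in plain Python. The numbers are those of equations (2.7), (2.8) and (2.9); the function names are hypothetical, and the arbitrage test assumes that D†θ > 0 means a strictly positive payoff in every state.

```python
# Payoff matrix D (rows: assets, columns: states), prices q and holdings theta,
# copied from equations (2.7), (2.8) and (2.9).
D = [[1, 0, 2],
     [1, 1, 1],
     [4, 1, 5]]
q = [0.8, 0.95, 3.23]
theta = [5, 7, 10]


def payoff_vector(D, theta):
    """Return D†θ, the portfolio payoff in each state of the world."""
    n_assets, n_states = len(D), len(D[0])
    return [sum(D[i][s] * theta[i] for i in range(n_assets)) for s in range(n_states)]


def market_value(q, theta):
    """Return q†θ, the cost of buying the portfolio."""
    return sum(q_i * theta_i for q_i, theta_i in zip(q, theta))


def is_arbitrage(D, q, theta):
    """Assumed reading of the definition: cost <= 0 and payoff > 0 in every state."""
    return market_value(q, theta) <= 0 and all(x > 0 for x in payoff_vector(D, theta))


payoffs = payoff_vector(D, theta)        # [52, 17, 67], as in (2.10)
cost = market_value(q, theta)            # about 42.95, as in (2.11)
arbitrage = is_arbitrage(D, q, theta)    # False: this portfolio costs money
```

The example portfolio is not an arbitrage simply because it has a positive cost; an arbitrage would need a non-positive cost together with a positive payoff.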
A state-price vector ψ is a vector that satisfies the expression q = Dψ, and we can
understand it as a functional relating prices with payoffs. We will try to make it evident
that ruling out arbitrage guarantees the existence of state-price vectors. Before
going into the proof, some notational clarifications will be made. For notational purposes
we will define:
\[
\mathbb{R}^{n}_{+} = \{x \in \mathbb{R}^{n} \mid x \geq 0\}, \qquad \mathbb{R}^{n}_{++} = \{x \in \mathbb{R}^{n} \mid x > 0\}. \tag{2.12}
\]
Theorem: There is no arbitrage if and only if there is a state-price vector.
We will prove this theorem works both ways (representing arbitrage with variable A and
its absence with ¬A) and, as such:
\[
\neg A \iff \exists\,\psi \tag{2.13}
\]
Proof: The first part of this proof is the left-to-right direction of the statement. As we
will witness, proving the existence of the state-price vector as a consequence of the absence
of arbitrage will be a much more complex task than proving the converse. After proving the
existence of this vector it will be pretty straightforward to see that absence of arbitrage
directly follows from it.
We will begin by stating some important notions that will be used throughout this
demonstration. First, we will define the vector space containing all the elements used in this
proof. Let L be:
\[
L = \mathbb{R} \times \mathbb{R}^{S}_{+} \subseteq \mathbb{R}^{S+1}. \tag{2.14}
\]
The familiar L character used for a normed vector space with finite dimensionality is not
a coincidence for the curious physicist who is noting similarities, since this is purposely
used to denote a Lebesgue space much like the well-known L2 Hilbert space used in
quantum mechanics.
As we defined earlier, the conditions for arbitrage to exist are that q†θ ≤ 0 while D†θ > 0.
The space containing these two quantities is L, since the market price q†θ ∈ R and the
payoff D†θ ∈ R^S, so the pair (q†θ, D†θ) is an element of L. To be able to use Farkas’ lemma
as we will develop next, we need the set of values taken by q†θ and D†θ to be convex.
Following the definition and proof we presented earlier, we can now define a set M =
{(q†θ, D†θ) : θ ∈ R^N} ⊆ L that we will prove to be convex. Adhering to the description
of a convex set, we can say with no loss of generality that the set M is convex because
elements m1, m2 ∈ M satisfy:
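\[
\alpha m_1 + (1-\alpha) m_2 = \alpha \begin{pmatrix} q^{\dagger}\theta_1 \\ D^{\dagger}\theta_1 \end{pmatrix} + (1-\alpha) \begin{pmatrix} q^{\dagger}\theta_2 \\ D^{\dagger}\theta_2 \end{pmatrix} = \begin{pmatrix} q^{\dagger}(\alpha\theta_1 + (1-\alpha)\theta_2) \\ D^{\dagger}(\alpha\theta_1 + (1-\alpha)\theta_2) \end{pmatrix} \in M. \tag{2.15}
\]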
Here α ∈ [0, 1]; ∴ M is convex. Now, because M is convex, the sets of values taken by
q†θ and by D†θ, being projections of M, are convex as well. Duffie proves the existence of
a state-price vector using the Riesz representation theorem, which also underlies the
popular bra-ket notation used in quantum mechanics, as it abstracts the functionals
used to describe quantum states. Elegant as this proof may be, we do not think it
is as intuitive as the reader might wish; this is why we opt to make use of the Farkas’
lemma we have discussed before. We adhered scrupulously to the precept of the brilliant
theoretical physicist, L. Boltzmann, according to whom matters of elegance ought to be
left to the tailor and to the cobbler.
If we have followed the mathematical development this far, it will be clear that this
resembles the first set of inequalities that Farkas’ lemma uses to separate vectors.
Our purpose will be to reach a situation in which the solutions of the two systems are
mutually exclusive.
To have the same form of the problem expressed in equations (3.1) and (2.6), we need
to be looking for the solution of a linear equation for some vector, which we expressed
as y ∈ R^n_{++} according to Farkas’ lemma. We can express our two systems as
follows:
1.
\[
q^{\dagger}\theta \leq 0, \qquad D^{\dagger}\theta > 0 \tag{2.16}
\]
2.
\[
q = D\psi, \qquad \psi > 0. \tag{2.17}
\]
Once we have fully understood Farkas’ lemma (and proved it as well, for the more
formal reader), it is pretty straightforward to see that, by the lemma, the absence of
arbitrage (the infeasibility of the first system) implies that there is a solution
for the second system. Then, this implies there is a non-negative vector
ψ = (ψ_1, ψ_2, ..., ψ_S) such that the price of any asset in our portfolio is given by q = Dψ.
This vector ψ is our desired state-price vector. We have proved that given a
no-arbitrage condition a state-price vector exists. Conversely, once we have ψ, it is clear
that, by Farkas’ lemma, the system q†θ ≤ 0, D†θ > 0 has no solution. Since this
system is essentially the definition of arbitrage, we can rule it out to say that ∃ψ → ¬A.
We have now proved the two directions of the theorem.
Translated to financial terms, Farkas’ lemma tells us that there either is a way of assign-
ing a non-negative price to a dollar in each state in a way that the price of each asset is
just the sum total of the value of its payoffs, or there is a portfolio (with negative price)
whose payoffs are non-negative, which means that you are “being paid to hold it.”
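For the three-asset example of this section, the state-price vector can be computed directly. The sketch below assumes that D is square and invertible, so that q = Dψ has a unique solution; the helper `solve_linear` is a hypothetical name, written with exact fractions to avoid rounding noise.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact rational arithmetic.

    Assumes A is square and invertible (raises StopIteration otherwise).
    """
    n = len(A)
    # Augmented matrix [A | b]; str() keeps decimals such as 0.95 exact.
    M = [[Fraction(str(v)) for v in row] + [Fraction(str(rhs))]
         for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v_r - factor * v_c for v_r, v_c in zip(M[r], M[col])]
    return [row[-1] for row in M]


# Payoff matrix (2.7) and price vector (2.8).
D = [[1, 0, 2],
     [1, 1, 1],
     [4, 1, 5]]
q = [0.8, 0.95, 3.23]

psi = solve_linear(D, q)                 # state prices: 17/25, 21/100, 3/50
strictly_positive = all(p > 0 for p in psi)
```

Every entry of ψ is strictly positive (0.68, 0.21 and 0.06), so by the theorem the example prices admit no arbitrage.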
Chapter 3
Entropy in finance
Entropy is probably one of the most widespread concepts used in the physical sciences.
It is relevant to astrophysicists and string theorists alike, but its meaning differs greatly
depending on the context in which it is used. As a fundamental aspect of
thermodynamics and physics, several different approaches to entropy beyond that of Clausius
and Boltzmann are valid; this is probably one of the reasons why this concept is often
misinterpreted.
Entropy has often been loosely associated with the amount of order or disorder (chaos)
in a thermodynamic system. The classical qualitative description of entropy refers to
changes in the status quo of the system and is generally a measure of the usefulness
of energy and the amount of energy wasted in some transformation from one state to
another. Because of its seemingly wide application potential, it has been adopted as
common scientific argot in a series of disciplines that far exceed the realm of physics,
ranging from information theory to economics and finance. Rather than the
implications and meaning of this concept in other fields, its mathematical structure is
what proves to be valuable to seemingly unrelated problems. Our early thermodynamics
and statistical mechanics courses tell us that the entropy of a system is the
natural logarithm of the number of possible configurations, multiplied by the Boltzmann
constant k_B. That leads us to the familiar equation:
\[
S = k_B \log \Omega. \tag{3.1}
\]
This does not mean that the interpretation as a measure of order holds in other disciplines,
but in essence we do want to capture a measure of dispersion. For our purposes, we
will base this section on the definition of entropy developed by Backus, Chernov, & Zin
in their paper Sources of Entropy in Representative Agent Models [3]. In this work, the
authors use entropy as a measure of dispersion in the pricing kernel of an asset. As
we explained before, the pricing kernel is just another way of referring to the stochastic
discount factor, the notion of which we generalized in the last chapter through the
existence of state-price vectors. Backus et al. claim that excess returns are reflected
in the pricing kernel’s dispersion for risky assets and through the dynamics of risk-free
bond yields.
The reason for the success of entropy in finance lies in the fact that it is more easily
extended to multiple periods of time than other statistical measures such as standard
deviation or variance. In some sense, entropy generalizes the concept of variance.
A second reason for its applicability is that many models in asset pricing have a loglinear
behavior, such as the isoelastic function we studied in chapter one. Finally, using entropy
allows us to account for deviations from the normal distribution for pricing kernels in a simple
manner. In this section we will develop the basic mathematical intuition behind entropy
in finance.
3.1 Risk-neutral vs real (physical) probabilities
We will try to be as consistent as we can with the notation used in previous
chapters and, with this in mind, proceed to define risk-adjusted probabilities. We can
construct a vector of probabilities of the form p ∈ R^S_+ such that p_1 + p_2 + ... + p_S = 1.
We now take a state-price vector ψ for a pair (D, q) and let ψ0 be the sum of its entries
over all states, so that ψ0 = ψ_1 + ... + ψ_S. Let ψ̂_j = ψ_j/ψ0 ∀ states j. This will
constitute a new vector of the form (ψ̂_1, ..., ψ̂_S) that assigns a probability to every
state j. As such, we can now rewrite q = Dψ as:
\[
\frac{q_i}{\psi_0} = \hat{E}(D_i) \equiv \sum_{j=1}^{S} \hat{\psi}_j D_{i,j}. \tag{3.2}
\]
We can interpret this as a “normalized price” of a security with risk-neutral probabilities.
This means that if we had a portfolio θ with D†θ = (1, 1, 1, ..., 1), then ψ0 = θ · q is the
discount for a risk-free investment. Then the price for any security i would be given by
q_i = ψ0 Ê(D_i), returning to the familiar notion of the price of an asset as its discounted
expected payoff, with these artificial probabilities incorporated. This notion will become
clearer in the next exercise.
Now we examine the price of a risk-free asset, for which we will consider, as said before,
that all the entries in its row D_f of the matrix D are ones. Because of this, we have:
\[
q_f = D_f \psi = \sum_{s=1}^{S} \psi_s. \tag{3.3}
\]
We can express the gross interest rate for a risk-free asset as the payoff (which we will
express as 1 for convenience), over its price:
\[
R_f = \frac{1}{q_f} = \frac{1}{\sum_{s=1}^{S} \psi_s}. \tag{3.4}
\]
This equation resembles the classical present value formula in which the future value of
an investment is divided by the gross rate to obtain the current value. If we follow the
notion of the normalized price for any asset i in the portfolio, and then multiply and
divide the equation q_i = Σ_s D_{i,s} ψ_s by the sum of the entire state-price vector’s entries, we
can obtain:
\[
q_i = \sum_{s=1}^{S} D_{i,s}\,\psi_s = \left(\sum_{t=1}^{S} \psi_t\right) \sum_{s=1}^{S} D_{i,s}\,\frac{\psi_s}{\sum_{t=1}^{S} \psi_t}. \tag{3.5}
\]
If we further examine the fraction in the last summation over s, we can see that
it is no more than an element of the vector ψ divided by the sum of all its entries. This
means that this number will always be positive and lie in the (0, 1] interval. We can
conceive this number as a “probability”; we shall call these numbers risk-neutral probabilities
and denote them P∗. If we solve for the gross interest rate expressed in equation (3.4), we can
incorporate it in the price equation, writing the risk-neutral probability as P∗_s, to
obtain:
\[
q_i = \frac{1}{R_f} \sum_{s=1}^{S} D_{i,s}\,P^{*}_{s} = \frac{1}{R_f}\,E^{*}[D_i] \tag{3.6}
\]
This means that the price of any asset is the “expected value” of the payoff discounted
by the risk-free rate. The superscript on the expectation operator in
equation (3.6) and the quotation marks in the last sentence indicate that we are not
using the real probabilities for state s but rather the risk-neutral probabilities. Using
these probabilities directly would mean that an investor is valuing an asset as if it were
not subject to any risk. Because of this, we need to clearly differentiate them from
real or physical probabilities, which we will denote by P_s to distinguish them from their
risk-neutral counterparts.
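A short numerical sketch may make the construction concrete. It reuses the payoff matrix and prices of the example in section 2.2 and assumes the state-price vector ψ = (0.68, 0.21, 0.06), which satisfies q = Dψ for those numbers; the variable names are hypothetical.

```python
# Payoff matrix and prices from the example in section 2.2.
D = [[1, 0, 2],
     [1, 1, 1],
     [4, 1, 5]]
q = [0.8, 0.95, 3.23]

# Assumed state prices; substituting them shows that q = D psi holds.
psi = [0.68, 0.21, 0.06]

psi0 = sum(psi)                      # price of a sure unit payoff, q_f
R_f = 1 / psi0                       # gross risk-free rate, as in (3.4)
p_star = [p / psi0 for p in psi]     # risk-neutral probabilities P*_s

# Price of each asset as its risk-neutral expected payoff discounted by R_f, as in (3.6).
prices = [sum(d * p for d, p in zip(row, p_star)) / R_f for row in D]
```

The recomputed `prices` agree with q up to floating-point rounding, and the entries of `p_star` sum to one even though they differ from the equal physical probabilities of 1/3 assumed in the example.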
To deeply understand what these probabilities try to measure we have to return to some
of the basic concepts we discussed in the first chapter of this text to know why they
are important for a consumer. We remember that an investor’s purpose is to maximize
his utility based on consumption while reducing the risk of his investment. If we have a
risk-averse investor, we have studied that his utility function has a concave form which
we abstracted using the isoelastic function. However, an investor who is risk-neutral will
have a utility function in which losses are valued equally to gains, so his welfare will be
altered in the same manner for any positive or negative change. Thus, a risk-neutral
investor’s utility function is nothing but a straight line. In the next figure we will study
the implications of risk-neutrality and risk-aversion as it will give us clear insight as to
what entropy tries to measure.
In figure 3.1, the concave function is our familiar risk-averse utility function. In contrast,
the straight line (red) represents a risk-neutral investor. We can see that the
expected utility of the risky investment does not lie on our curve. However, if we
find the point of the curve with the same utility by translating this point leftwards, we can see that it
is situated at an x value below the expected value of the investment. This point
is called the certainty equivalent (labeled CE in the figure) and it represents the value
below the expectation of the risky investment that the investor is willing to accept to
“buy” certainty. In the figure we have:
Figure 3.1: Graphical representation of a risk premium and its connection with
entropy, represented by the letter L
\[
U(x) = \left. \frac{x^{1-\gamma}}{1-\gamma} \right|_{\gamma = 0.9}, \qquad U(CE) < U(E[x]).
\]
The difference between the utility of the expected value and the utility of the certainty
equivalent is the least benefit the investor expects from taking a risk. This is expressed
as a difference of wealth x, marked in the figure as a risk premium. Thus, we can relate
this to what we have been studying by noting that if the risk-adjusted investments
yield benefits very close to those of risk-neutral ones, then we might be better off
holding only bonds in our portfolio. Or are we? This is one of the questions entropy
tries to answer, as it examines the dynamics of the parameter L = U(E[x]) − U(CE),
or, in other words, the relation between physical and risk-neutral probabilities.
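The certainty equivalent and the risk premium can be computed for a simple gamble. The sketch below uses the isoelastic utility with γ = 0.9 from the figure; the two-outcome gamble and all function names are hypothetical choices made only for illustration.

```python
GAMMA = 0.9  # risk-aversion coefficient used in figure 3.1


def utility(x, gamma=GAMMA):
    """Isoelastic utility U(x) = x^(1 - gamma) / (1 - gamma)."""
    return x ** (1 - gamma) / (1 - gamma)


def inverse_utility(u, gamma=GAMMA):
    """Wealth level x such that U(x) = u."""
    return ((1 - gamma) * u) ** (1 / (1 - gamma))


# Hypothetical gamble: wealth of 1 or 3 with equal probability.
outcomes = [1.0, 3.0]
probs = [0.5, 0.5]

expected_x = sum(p * x for p, x in zip(probs, outcomes))            # E[x] = 2
expected_u = sum(p * utility(x) for p, x in zip(probs, outcomes))   # E[U(x)]

ce = inverse_utility(expected_u)                 # certainty equivalent: U(CE) = E[U(x)]
risk_premium = expected_x - ce                   # wealth given up to "buy" certainty
L_gap = utility(expected_x) - utility(ce)        # L = U(E[x]) - U(CE)
```

Because the utility function is concave, the certainty equivalent lies below E[x], so both the risk premium and L are positive for any non-degenerate gamble.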
3.2 Entropy
As stated before, entropy tries to measure how much real probabilities diverge from
risk-neutral probabilities. The basic framework that the definition of entropy uses is the
Kullback–Leibler divergence, which quantifies how much two probability distributions
differ. For discrete probability distributions P and Q, the Kullback–Leibler divergence
from Q to P is defined to be:
\[
D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}. \tag{3.7}
\]
In words, it is the expectation of the logarithmic difference between the probabilities
P and Q, where the expectation is taken using the probabilities P. Weighting each
logarithmic difference by P(i) and summing over all events generates this expectation, and in our
particular case, we use it to define relative entropy, which reads:
\[
L_t\!\left(\frac{P^{*}_{t,t+n}}{P_{t,t+n}}\right) = -E_t \log \frac{P^{*}_{t,t+n}}{P_{t,t+n}}. \tag{3.8}
\]
For the sake of the notation used in the first part of this chapter, we will keep using P∗
for risk-neutral probabilities and P for their real or physical counterparts. The true probability
for a state at time t + 1 conditioned on time t is given by P_{t,t+1} = P(x_{t+1}|x_t), hence the
subscripts in the equation above. The intuition behind entropy suggests that a greater
difference between true and risk-neutral probabilities should be associated with a larger
risk premium. The goal of this tool is to connect the properties of excess returns to
features of pricing kernels, a term which is nothing but a synonym for the stochastic discount
factor, as will soon be clear. Risk premiums will then be associated with variability in
the P∗_{t,t+n}/P_{t,t+n} ratio. For a better handling of equation (3.8), we will make use of the
fact that E_t[P∗_{t,t+n}/P_{t,t+n}] = 1, and thus log E_t[P∗_{t,t+n}/P_{t,t+n}] = 0, to rewrite entropy as:
\[
L_t\!\left(\frac{P^{*}_{t,t+n}}{P_{t,t+n}}\right) = \log E_t \frac{P^{*}_{t,t+n}}{P_{t,t+n}} - E_t \log \frac{P^{*}_{t,t+n}}{P_{t,t+n}}. \tag{3.9}
\]
This rearrangement of the equation makes it easy to see that if the ratio is constant, it
will be equal to one and the entropy will be zero. Because the logarithm is concave,
Jensen’s inequality tells us that entropy cannot take negative values and that it
increases with variability in the ratio. These are the basic characteristics of a dispersion
measure, which hopefully will allow us to link theoretical ideas to real data like asset
returns and bond yields.
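The properties just described can be verified on a small discrete example. The sketch below evaluates expression (3.9) for a single period and compares it with the Kullback–Leibler divergence (3.7); the two probability vectors and the function names are hypothetical.

```python
import math


def entropy(ratio, probs):
    """L = log E[ratio] - E[log ratio], with expectations under the true probabilities."""
    mean = sum(p * r for p, r in zip(probs, ratio))
    mean_log = sum(p * math.log(r) for p, r in zip(probs, ratio))
    return math.log(mean) - mean_log


def kl_divergence(P, Q):
    """Kullback-Leibler divergence from Q to P for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q))


# Hypothetical true and risk-neutral probabilities over three states.
p_true = [1 / 3, 1 / 3, 1 / 3]
p_star = [0.5, 0.3, 0.2]

ratio = [ps / p for ps, p in zip(p_star, p_true)]
L = entropy(ratio, p_true)                       # entropy of the ratio P*/P
L_constant = entropy([1.0, 1.0, 1.0], p_true)    # a constant ratio gives zero entropy
```

In this example L coincides with the divergence `kl_divergence(p_true, p_star)`, is strictly positive, and drops to zero when the ratio is constant.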
Now we will proceed to rewrite entropy in terms that are more familiar to us. At this
point it is worth noticing that this is just a re-statement of the basic pricing equation
we have been developing since chapter one with a new perspective telling us that the
stochastic discount factor is just a ratio of probabilities of the form:
\[
q = E\!\left[ D_i\,\frac{P^{*}_{t,t+n}}{P_{t,t+n}} \right] \quad \text{where} \quad m = \frac{P^{*}_{t,t+n}}{P_{t,t+n}}. \tag{3.10}
\]
Now, going back to our pricing equation and substituting the payoff x by the gross
return, which is related to it by r = x/q, we obtain:
\[
E_t[m_{t,t+n}\,r_{t,t+n}] = 1, \quad \text{because } q = E[mx] \text{ and } r = \frac{x}{q}. \tag{3.11}
\]
The t+n subscript used in equation (3.11) indicates we are dealing with a pricing kernel
(or stochastic discount factor) and gross returns on an n-period time horizon. This
kernel can be decomposed into a product of single-period kernels as such:
\[
q_t = E_t\big[m_{t,t+1}\,E_{t+1}[m_{t+1,t+2}\,q_{t+2}]\big], \tag{3.12}
\]
and so on, recursively, until reaching period n, where the last term at the end
of the series will be E_{t+n−1}[m_{t+n−1,t+n} q_{t+n}]. In words, it means that the stochastic
discount factor and the gross return on an asset for n periods can be decomposed into
products of one-period pricing kernels and one-period returns, respectively.
\[
m_{t,t+n} = \prod_{j=1}^{n} m_{t+j-1,t+j}, \qquad r_{t,t+n} = \prod_{j=1}^{n} r_{t+j-1,t+j}.
\]
Now we define conditional entropy by incorporating the stochastic discount factor into our
previous definition and taking it to an n-period horizon to obtain:
\[
L_t(m_{t,t+n}) = \log E_t m_{t,t+n} - E_t \log m_{t,t+n}. \tag{3.13}
\]
We now take conditional entropy and compute its expected value to obtain a mean value
which we will simply refer to as entropy. Then, we will scale it by the time horizon
to finally obtain a mean entropy per period. We can examine a conditional entropy per
period by setting the n value to a desired number. Entropy provides an upper bound for
the mean of excess returns and, as such, we can assume that any return will be “under”
this limit. This will allow us to set convenient inequalities from which interesting results
can be derived. Mathematically:
\[
E L_t(m_{t,t+n}) = E\big[\log E_t m_{t,t+n} - E_t \log m_{t,t+n}\big], \qquad I(n) = \frac{1}{n}\,E L_t(m_{t,t+n}).
\]
To be able to determine these inequalities, it is worth remembering that for a risk-free
asset we have:
\[
E[m\,r_f] = 1, \qquad E[m] = \frac{1}{r_f} = q_f. \tag{3.14}
\]
Now, we can examine conditional entropy for one period and relate it to one-period
excess returns. An excess return will be represented by the risk premium generated
by the difference between the return of a risky asset and that of a risk-free one. As we stated before, entropy
provides an upper bound for these returns and, thus, its value will always be greater than or
equal to the risk premium’s expectation. That is:
\[
I(1) = E L_t(m_{t,t+1}) \geq E\big[\log r_{t,t+1} - \log r^{1}_{t,t+1}\big]. \tag{3.15}
\]
In the words of Backus et al.: mean excess log returns are bounded above by the (mean
conditional) entropy of the pricing kernel [3]. It will be useful for the next part of the
procedure to notice that, for the return of any asset, the concavity of the logarithmic function
makes the next inequality hold:
\[
\log E[m_{t,t+1}\,r_{t,t+1}] = \log(1) = 0 \geq E \log(m_{t,t+1}\,r_{t,t+1}) = E \log(m_{t,t+1}) + E \log(r_{t,t+1}). \tag{3.16}
\]
The E log(m_{t,t+1}) term in the equation is conveniently included in the definition of the
conditional entropy (see (3.13)), which allows us to incorporate L_t into the inequality by
substituting the expectation of the pricing kernel’s logarithm with log E m_{t,t+1} − L_t to
obtain:
\[
L_t \geq \log E[m_{t,t+1}] + E[\log r_{t,t+1}]. \tag{3.17}
\]
Substituting the expected value of the pricing kernel with the risk-free rate’s inverse
(E[m] = 1/r_f), we finally obtain the entropy bound we anticipated:
\[
L \geq E \log r - \log r_f \tag{3.18}
\]
If we take the expectation of both sides of the last equation, we recover the expression
for the upper bound presented in equation (3.15). Now we will make a brief parenthesis
to develop a concept that will be used in the following section: cumulants.
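Before turning to cumulants, the entropy bound (3.18) can be checked numerically. The sketch below builds many randomly generated one-period economies with four states; the ranges of the random draws and the function names are arbitrary, hypothetical choices. In each economy the asset is priced with q = E[mx], so that E[mr] = 1 holds by construction.

```python
import math
import random


def expectation(values, probs):
    """Expected value of a discrete random variable."""
    return sum(p * v for p, v in zip(probs, values))


def entropy_and_excess_return(probs, m, x):
    """Return (L, E log r - log r_f) for a one-period discrete economy."""
    q = expectation([m_s * x_s for m_s, x_s in zip(m, x)], probs)   # price q = E[m x]
    r = [x_s / q for x_s in x]                                      # gross return r = x / q
    r_f = 1 / expectation(m, probs)                                 # from E[m] = 1 / r_f
    L = math.log(expectation(m, probs)) - expectation([math.log(v) for v in m], probs)
    excess = expectation([math.log(v) for v in r], probs) - math.log(r_f)
    return L, excess


random.seed(0)
results = []
for _ in range(1000):
    weights = [random.uniform(0.1, 1.0) for _ in range(4)]
    probs = [w / sum(weights) for w in weights]
    m = [random.uniform(0.2, 2.0) for _ in range(4)]    # hypothetical pricing kernel
    x = [random.uniform(0.5, 3.0) for _ in range(4)]    # hypothetical payoff
    results.append(entropy_and_excess_return(probs, m, x))

# The bound L >= E log r - log r_f should hold in every simulated economy.
bound_holds = all(L >= excess - 1e-12 for L, excess in results)
```

The small tolerance only absorbs floating-point rounding; the inequality itself follows from the concavity of the logarithm used in (3.16).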
Cumulants are a set of quantities that provide an alternative to the moments of a
distribution. Using cumulants provides the basis for an alternative manner to produce
analytical results instead of dealing directly with probability density functions. They
will be useful for us because, as will be clear in the next paragraphs, we will be able
to rewrite expectations with analytical expressions that make our mathematical
handling easier and more revealing. The cumulant generating function is simply the natural
logarithm of the moment-generating function M_X(t) := E[e^{tX}], t ∈ R. Differentiating the
moment-generating function n times with respect to the variable t and evaluating at zero
produces the nth moment of the distribution. We show the definition of the cumulant
generating function and the calculation
of the first cumulant:
\[
K(t) = \log E\big[e^{tX}\big], \qquad \left. \frac{\partial K(t)}{\partial t} \right|_{t=0} = E[X]. \tag{3.19}
\]
The cumulants κ_n can be obtained as the coefficients of the Maclaurin series of K(t). This makes
the series be centered at zero, and is consistent with the calculation of the cumulants we
presented above by differentiating the first expression n times and evaluating the result
at zero. The series expansion reads:
\[
K(t) = \sum_{n=1}^{\infty} \kappa_n \frac{t^{n}}{n!} = \mu t + \sigma^{2}\,\frac{t^{2}}{2} + \dots \tag{3.20}
\]
where κ_n represents the nth cumulant in the series.
For our purposes, we will find it convenient to use cumulants to determine the rela-
tion between one-period entropy and the conditional distribution of log mt,t+1. The
corresponding cumulant generating function is:
\[
K_t(s) = \log E_t\big[e^{s \log m_{t,t+1}}\big], \tag{3.21}
\]
and its corresponding series expansion:
\[
K_t(s) = \sum_{j=1}^{\infty} \kappa_{jt}\,\frac{s^{j}}{j!}. \tag{3.22}
\]
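If we evaluate the generating function at s = 1 and compute the first cumulant, we can see that:
\[
K_t(1) = \log E_t\big[e^{\log m_{t,t+1}}\big] = \log E_t[m_{t,t+1}],
\]
\[
\kappa_{1t} = \left. \frac{\partial K_t(s)}{\partial s} \right|_{s=0} = \left. \frac{\partial}{\partial s} \right|_{s=0} \log E_t\big[e^{s \log m_{t,t+1}}\big] = E_t \log m_{t,t+1}.
\]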
We now substitute these expressions in the definition of conditional entropy (3.13) to
obtain:
\[
L_t(m_{t,t+1}) = \log E_t m_{t,t+1} - E_t \log m_{t,t+1} = K_t(1) - \kappa_{1t}
= \frac{\kappa_{2t}(\log m_{t,t+1})}{2!} + \frac{\kappa_{3t}(\log m_{t,t+1})}{3!} + \frac{\kappa_{4t}(\log m_{t,t+1})}{4!} + \dots \tag{3.23}
\]
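As a worked special case of (3.23), suppose (an assumption made only for illustration) that log m_{t,t+1} is conditionally normal with mean μ_t and variance σ_t². Its cumulant generating function is then quadratic in s, and the expansion stops after the second term:

```latex
% Assumption: \log m_{t,t+1} \mid \text{information at } t \sim N(\mu_t, \sigma_t^2)
\begin{align*}
K_t(s) &= \log E_t\big[e^{s \log m_{t,t+1}}\big] = \mu_t s + \tfrac{1}{2}\sigma_t^{2} s^{2}, \\
\kappa_{1t} &= \mu_t, \qquad \kappa_{2t} = \sigma_t^{2}, \qquad \kappa_{jt} = 0 \quad (j \geq 3), \\
L_t(m_{t,t+1}) &= K_t(1) - \kappa_{1t} = \tfrac{1}{2}\sigma_t^{2}.
\end{align*}
```

In this lognormal case entropy is simply half the variance of the log pricing kernel, which is the sense in which entropy generalizes variance.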
It is easily verifiable through an examination of the cumulant generating function that
the first cumulant is the mean and the second is the variance, followed by the cumulants
associated with skewness, kurtosis and so forth.
This means that if the distribution of log m_{t,t+1} is normal, then any cumulant of order j ≥ 3
will be zero. Of course, we do not expect the logarithm of the pricing kernel to be
normally distributed, and because of this we need a convenient mathematical form to measure
deviations from normality. We now define horizon dependence as the difference in
entropy over horizons of n and one, respectively:
\[
H(n) = I(n) - I(1) = \frac{1}{n}\,E L_t(m_{t,t+n}) - E L_t(m_{t,t+1}). \tag{3.24}
\]
If the one-period pricing kernels over the n periods are independent and identically distributed,
then we would expect the n-period entropy to be just a scaled version of the one-period
entropy. This would mean that the entropy per period would be the same for I(1) and
I(n):
\[
E L_t(m_{t,t+n}) = n\,E L_t(m_{t,t+1}) \;\rightarrow\; H(n) = 0 \tag{3.25}
\]
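The scaling in (3.25) can be checked by brute force for a small discrete kernel. The sketch below assumes a hypothetical one-period kernel with three equally spaced outcomes, drawn independently each period, and enumerates every n-period path; the names are illustrative only.

```python
import math
from itertools import product

# Hypothetical one-period pricing kernel: three outcomes with their probabilities.
m_values = [0.9, 1.0, 1.1]
m_probs = [0.25, 0.5, 0.25]


def entropy(values, probs):
    """L = log E[m] - E[log m] for a discrete distribution."""
    mean = sum(p * v for p, v in zip(probs, values))
    mean_log = sum(p * math.log(v) for p, v in zip(probs, values))
    return math.log(mean) - mean_log


def n_period_entropy(n):
    """Entropy of the product of n independent one-period kernels."""
    values, probs = [], []
    for path in product(range(len(m_values)), repeat=n):
        values.append(math.prod(m_values[i] for i in path))
        probs.append(math.prod(m_probs[i] for i in path))
    return entropy(values, probs)


I_1 = n_period_entropy(1)          # entropy per period at horizon 1
I_3 = n_period_entropy(3) / 3      # entropy per period at horizon 3
H_3 = I_3 - I_1                    # horizon dependence, zero up to rounding
```

Any non-zero horizon dependence measured in data is therefore evidence of departures from the independent and identically distributed case.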
Again, we find room for comparison with physics, as this is a generalization of a
well-known characteristic of random walks, often used to model diffusion problems like
Brownian motion: the proportionality of variance to the time interval.
For a particle in a known fixed position at t = 0, the central limit theorem tells us that
after a large number of independent steps in the random walk, the walker’s position is
distributed according to a normal distribution whose total variance is proportional to the
number of steps. However, as we have
anticipated before, this is far from being the case in reality, as history has shown us that
horizon dependence reflects important departures from the independent and identically
distributed case. However unfortunate this is for the sake of predictability, it allows us
to have some measure of the pricing kernel’s dynamics and, even more importantly, it is
observable through its connection to bond yields. We will rewrite conditional entropy
incorporating bonds into its definition to make this connection explicit.
\[
L_t(m_{t,t+n}) = \log E_t m_{t,t+n} - E_t \log m_{t,t+n} = \log q^{n}_{t} - E_t \sum_{j=1}^{n} \log m_{t+j-1,t+j}, \tag{3.26}
\]
where q^n_t is the price of a bond for an n-period horizon. Entropy is therefore:
\[
I(n) = \frac{1}{n}\,E \log q^{n}_{t} - E \log m_{t,t+1}. \tag{3.27}
\]
Now we need to relate bond prices to their yields, and for this we will make use of both the
time-horizon expression and the conditional entropy. We remember that the price of a
bond is given by the present value formula (consider a payment of one unit):
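\[
q^{n}_{f} = \frac{1}{(1+r)^{n}} = Y^{-n} \;\rightarrow\; \log q^{n}_{f} = -n \log Y = -n\,y \;\rightarrow\; y^{n}_{t} = -\frac{1}{n} \log q^{n}_{t}, \tag{3.28}
\]
where Y = 1 + r is the gross yield per period and y = log Y is the corresponding log yield.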
As we can see, we have computed the logarithm of the price as the entropy definition
requires, obtaining a relationship with its yield. We can now take these yields and plug
them into the entropy equation to obtain:
\[
H(n) = -E\big(y^{n}_{t} - y^{1}_{t}\big). \tag{3.29}
\]
This last expression for horizon dependence as a function of yield spreads tells us a lot.
It tells us that if the mean yield curve is increasing, horizon dependence will be negative, as the
yield for an n-period horizon is larger than the single-period one. A positive horizon
dependence would reveal a decreasing yield curve. The observation of excess
returns for stocks has shown that one-period entropy is larger than that of bonds,
which is typically less than 0.1 % in most cases for observable time horizons. These
bounds are used by Backus et al. as diagnostics for candidate pricing kernels. Thus
far, we have proved the existence of a pricing kernel and, with the help of entropy,
suggested a possible measurement instrument for it. Now, we will proceed to study,
for the first time in (the revised) literature, the case of entropy for the Mexican debt
market.
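Equations (3.28) and (3.29) translate directly into a small computation. The sketch below uses made-up per-period log yields, not data from the study, and hypothetical function names, only to show how horizon dependence would be estimated from a sample of yields.

```python
import math


def log_yield(price, n):
    """Per-period log yield of an n-period bond paying one unit: y = -(1/n) log q."""
    return -math.log(price) / n


def horizon_dependence(yields_n, yields_1):
    """Sample estimate of H(n) = -E(y^n_t - y^1_t) from paired yield observations."""
    spreads = [y_n - y_1 for y_n, y_1 in zip(yields_n, yields_1)]
    return -sum(spreads) / len(spreads)


# Hypothetical per-period log yields observed on three dates (not real data).
yields_1 = [0.0040, 0.0042, 0.0038]     # one-period bonds
yields_12 = [0.0050, 0.0051, 0.0049]    # twelve-period bonds

H_12 = horizon_dependence(yields_12, yields_1)   # negative: upward-sloping mean yield curve

# Round trip between a bond price and its log yield.
price_12 = math.exp(-12 * 0.0050)
recovered_yield = log_yield(price_12, 12)
```

With these illustrative numbers the mean spread is 0.001 per period, so H(12) is about −0.001, the sign expected for an ascending yield curve.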
Chapter 4
Case study for Mexico
After a long and winding development of asset-pricing theory and one of its latest
applications, we present an empirical study in the hope that it will provide an intuitive
perspective on the concepts discussed. Our primary interest is to show that the yield
curve for bonds in Mexico is ascending and, as such, will have a negative horizon
dependence. We will show that bonds have small excess returns, that is, smaller than most
equity indexes.
The data used for this very brief case study was provided by Dr. Torres from his personal
archive and is available from the author should the reader want to consult it further.
The database is extensive both in content and in variety, and it is a very representative
sample of the Mexican debt market, with records for treasury bills (CETES), interbank
equilibrium interest rates in Mexico (TIIE), M-bonds (coupon bonds), etc. So far we have
been discussing bonds without really defining them, so, even though this issue should
probably have been addressed earlier, it is relevant to start with a brief background on
the development of this financial instrument in our country.
Our database has some chaotic points throughout time, as two major financial crises
affected the prices and yields of bonds in our country. Although the author was just a
toddler at the time, the financial crisis of 1994 may be remembered by the reader. This
economic crisis, which had international repercussions, was primarily ignited by a lack of
international reserves, causing the devaluation of the Mexican peso during the first days
of President Ernesto Zedillo's administration. A few weeks after the peso's devaluation
began, President Clinton requested from Congress the authorization of a credit line of 20
billion dollars for the Mexican government, so that it could guarantee full compliance
with its obligations denominated in that currency. Although it is beyond the scope of this
work, it is worth saying that among the causes of this economic crisis was the
implementation of the North American Free Trade Agreement, which made Mexico an attractive
place for investment. President Salinas de Gortari took advantage of this situation and
used it to finance his administration through the issuance of CETES. These bonds had a
short maturity term and were bought and sold in pesos, but were protected against
devaluations by being quoted in dollars. This meant that when these instruments reached
maturity the holder was paid at the spot exchange rate, which discouraged investors from
buying dollars, as there was already an instrument with equal or higher yields. Investors
were strongly attracted to these instruments, which left the currency overvalued. This was
a major disincentive to real investment and caused a decrease in trade and exports.
We can see another spike in rates during the 1998 Russian financial crisis. We will not
study the latter here, but it is worth noting that one of the mechanisms used to stabilize
the Russian market involved swapping enormous volumes of maturing GKOs (Russia's
government bonds) for long-term European bonds that would later be issued (and bought by
many European countries). The following figure illustrates both episodes:
Figure 4.1: Average CETES rates over time, generated from the database described above.
Two points are marked: MTC and RFC label the Mexican Tequila Crisis and the Russian
financial crisis, respectively. After these two episodes rates decreased substantially
and have stabilized over the last decade.
4.1 General overview of the Mexican treasury bills
Mexican federal treasury bills, commonly referred to as CETES, are the oldest debt
obligation issued by the Federal Government. They were issued for the first time in
January 1978 and have constituted a fundamental pillar of the Mexican capital market
ever since. These titles belong to the family of zero-coupon bonds, which pay no interest
to the holder other than the discount at which they are offered. CETES are issued at a
discount to par value, so that the interest is effectively rolled up to maturity (and
usually taxed as such); the bondholder receives the full principal amount on the
redemption date. Banco de México issues these bonds at four maturity terms: 28 days,
91 days, 182 days, and 364 days. However, the Bank has also issued these bills at terms as
short as a week and as long as 728 days.
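To make the discount mechanism concrete, the following sketch prices a zero-coupon bill
from a quoted discount rate and recovers the implied simple yield. The face value of 10
pesos and the 360-day count are assumptions on our part (neither is stated in the text),
and the 7% rate is purely illustrative.

```python
FACE_VALUE = 10.0  # assumed nominal value in pesos (not stated in the text)
DAY_COUNT = 360    # assumed money-market day-count convention

def price_from_discount(discount_rate, days):
    """Price of a bill sold at a simple annual discount rate."""
    return FACE_VALUE * (1.0 - discount_rate * days / DAY_COUNT)

def yield_from_price(price, days):
    """Simple annualized yield earned by holding the bill to maturity."""
    return (FACE_VALUE / price - 1.0) * DAY_COUNT / days

# Illustrative 7% discount rate applied to the four standard terms.
for days in (28, 91, 182, 364):
    price = price_from_discount(0.07, days)
    print(f"{days:3d} days  price={price:.4f}  yield={yield_from_price(price, days):.4%}")
```

Because the holder pays less than the face value up front, the yield on the amount
actually invested is always slightly above the quoted discount rate.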
These titles are issued primarily through weekly auctions in which participants submit
bids stating the amount they wish to acquire and the discount rate they are willing to
accept. This is often called the primary market, and its historical data can be found in
the Bank of Mexico's records at a weekly resolution. The secondary market, in turn, lets
investors buy and sell these titles among themselves, subjecting their price to the
regular interplay of supply and demand. These bills are often used as underlying assets
in the derivatives and futures market.
Banco de México, as the financial agent of the Mexican Federal Government, has the
mandate to carry out weekly primary auctions of its securities according to a
predetermined calendar as well as a coordinated strategy released each quarter by the
Secretariat of Finance and Public Credit (Hacienda). These exercises are held as Dutch
auctions, in which there are multiple winning bidders: securities are allocated in
ascending order of the discount rates bid, without exceeding the maximum amount indicated
in the offering. This type of auction encourages more aggressive bids from
intermediaries, thus promoting lower interest rates. Auctions are held every Tuesday at
10:00 AM, and each auction is announced on the preceding Friday. Banks and other
financial institutions must be authorized in order to participate and must adhere to a
set of rules set by the Bank of Mexico. After the bidding, the results are published at
11:30 AM and settled on Thursday. The following announcement was published in newspapers
in 2003.
Figure 4.2: Emission of CETES in 2003. Four different bond packages are issued with
1, 3, 6 and 12 month maturities. Taken from [10]
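The allocation rule described above (bids served in ascending order of discount rate
until the offered amount is exhausted) can be sketched as follows. The bidders, amounts
and rates are hypothetical, and two details are assumptions of this sketch rather than
rules taken from the text: the marginal bid is partially filled, and each winner keeps
the rate it bid.

```python
def allocate(bids, amount_offered):
    """Allocate an offering among bids given as (bidder, amount, discount_rate).

    Bids are served from the lowest discount rate upward; the last served bid
    may be partially filled so the total never exceeds amount_offered.
    """
    remaining = amount_offered
    awards = []
    for bidder, amount, rate in sorted(bids, key=lambda bid: bid[2]):
        if remaining <= 0:
            break
        filled = min(amount, remaining)
        awards.append((bidder, filled, rate))
        remaining -= filled
    return awards

# Hypothetical bids: amounts in millions of pesos, rates as decimal fractions.
bids = [("A", 100, 0.071), ("B", 200, 0.069), ("C", 150, 0.070)]
for bidder, filled, rate in allocate(bids, amount_offered=300):
    print(f"{bidder}: {filled} at {rate:.3%}")
```

Serving the lowest rates first is what rewards aggressive bidding: a participant who asks
for a higher rate risks being left out once the offered amount runs out.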
4.2 Methodology
This methodology section develops the basic ideas behind a yield curve and why it is
relevant to our study. This was made evident in the last equation we derived for entropy,
but we will try to make it as intuitive as possible. The methodology for constructing
this curve is trivial, so the important part of our study is understanding what the curve
tells us and how it is related to entropy. Yield curves are also called term structures
of interest rates, and we will use these names interchangeably. The term structure of
interest rates shows the relationship between interest rates or bond yields and different
maturities. It is an important indicator for an economy, as it reflects the expectations
of market participants about future changes in interest rates and their assessment of
monetary policy conditions. We purposely use the term assessment here to make explicit
that the impression people have of the well-being of the economy is directly related to
how much they value an investment. A poor expectation of the country's future will result
in a low valuation of longer-period bonds, as they will be riskier from the investor's
perspective.
A yield curve that assigns higher rates to longer-period bonds slopes upward and is
typically concave (its convexity is negative). Such a curve is often called normal, in
that it represents the expected shift in yields as maturity dates extend out in time, and
it is most commonly associated with positive economic growth. In the opposite case we
speak of an inverted yield curve, which means that short-term bonds have higher rates
than long-period ones. History has shown us that inversions of the yield curve have
preceded many of the U.S. recessions and, due to this correlation, an inverted curve is
also read as a prediction of lower interest rates in the future. A flat curve is the last
possibility for our term structure of interest rates. It would mean that investors think
interest rates will remain the same in the future. Graphically:
Figure 4.3: Possible shapes of the yield curve, approximated with logarithmic, inverse
and constant functions for normal, inverted and flat curves, respectively.
As we have discussed before, bonds prove to be the best ally we have for reflecting on
the dynamics of the pricing kernel. In more concrete terms, we know that the cash flows
of bonds are (ideally) fixed, so their prices and their respective yields and returns are
functions of nothing other than the pricing kernel. The proofs developed in this document
show that this pricing kernel must exist, but it cannot be observed directly. We can
think of the pricing of a bond as a "reverse engineering" activity, in which properties
of the pricing kernel are inferred from bond prices. To construct the yield curve, we
take the arithmetic mean of all reported rates for each term and plot the means in order
of increasing term.
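The construction just described amounts to a per-term average over the days on which
every term is reported. A minimal sketch follows, using made-up daily records and a
hypothetical row layout (a dictionary keyed by term in days), since the layout of the
actual database is not described here.

```python
from statistics import mean

TERMS = (28, 91, 182, 364)  # CETES terms in days

def mean_yield_curve(rows):
    """Average the rate of each term over the days that report all four terms."""
    complete = [row for row in rows if all(row.get(t) is not None for t in TERMS)]
    curve = {t: mean(row[t] for row in complete) for t in TERMS}
    return curve, len(complete)

# Made-up daily records; the second one lacks the 182-day rate and is dropped.
rows = [
    {28: 0.070, 91: 0.072, 182: 0.074, 364: 0.076},
    {28: 0.072, 91: 0.074, 182: None, 364: 0.078},
    {28: 0.074, 91: 0.076, 182: 0.078, 364: 0.080},
]

curve, n_days = mean_yield_curve(rows)
for term in TERMS:
    print(f"{term:3d} days  mean rate = {curve[term]:.4f}   (n = {n_days})")
```

Discarding incomplete days keeps the four averages comparable, because every term is then
averaged over exactly the same dates.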
4.3 Results
We present our results for CETES at 28-, 91-, 182- and 364-day maturities:
Table 4.1: Descriptive statistics generated from the database described above. This is a
daily record of bond rates from August 14, 1995 to September 26, 2016, when the data was
queried. Some rows lacked information for one or more terms, so we kept only the days on
which all four rates were reported, reaching a sample size of 4875 days.
Descriptive statistics for CETES rates (1995-2016)

Term                  28 days       91 days       182 days      364 days
Mean                  0.108600401   0.11308304    0.115932446   0.118480772
Standard Error        0.001362825   0.001417789   0.001431194   0.001443432
Median                0.073007      0.0745        0.075683      0.076
Mode                  0.071         0.0725        0.075         0.0755
Standard Deviation    0.095154072   0.098991742   0.099927666   0.100782151
Sample Variance       0.009054297   0.009799365   0.009985538   0.010157042
Kurtosis              3.419469465   2.904737779   2.441350969   2.203240772
Skewness              1.821707186   1.73184007    1.64887389    1.599187532
The data shows a logarithmic tendency. The dotted line is a logarithmic fit, which results
in $R(P) = 0.0039 \log(P) + 0.0957$, where $R$ is the expected rate as a function of the
term $P$ in days and $\log$ is the natural logarithm, with a coefficient of determination
$R^2 = 0.99971$.
Figure 4.4: We observe a clear logarithmic tendency for the data. The equation is
presented in the graphic along with the respective error bars for each average.
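The logarithmic fit can be checked from the four mean rates of Table 4.1 with an ordinary
least-squares regression of the rate on the natural logarithm of the term in days. The
sketch below uses only the standard library. The function name is ours, and the original
fit may well have been produced with other software.

```python
import math

terms = [28, 91, 182, 364]  # term P in days
mean_rates = [0.108600401, 0.11308304, 0.115932446, 0.118480772]  # Table 4.1 means

def fit_log_curve(periods, rates):
    """Least-squares fit of rate = slope * ln(period) + intercept."""
    x = [math.log(p) for p in periods]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(rates) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in rates)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, rates))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = fit_log_curve(terms, mean_rates)
print(f"R(P) = {slope:.4f} ln(P) + {intercept:.4f},  R^2 = {r_squared:.5f}")
```

With only four points a high coefficient of determination is easy to reach, so the fit is
better read as a compact description of the mean curve than as evidence for a specific
functional form.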
Recalling the expression for horizon dependence in terms of bond yields that we derived
in the previous chapter, we have:
$$H(n) = -E\left[y_t^n - y_t^1\right]. \tag{4.1}$$
If the pricing kernel were independent and identically distributed, we would have
$H(n) = 0$ and would expect a flat yield curve. Just as we anticipated, this is
(thankfully) not the case. If the mean yield curve slopes upward, then $H(n)$ is negative
and slopes downward. We have shown that the average slopes of yield curves are mirrored
by the behavior of entropy over different time horizons. In the particular case of bonds,
a characteristic feature appears to be the convergence of forward rates to a constant
value.
Figure 4.5: Each interval between consecutive terms shows a different average slope. This
means that pricing kernels are not independent and identically distributed and that
entropy varies with the investment horizon. As with entropy, we can infer the magnitude
of horizon dependence from asset prices: negative horizon dependence is associated with
an increasing mean yield curve and positive mean yield spreads.
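As a rough numerical illustration of (4.1), the mean rates of Table 4.1 can be turned
into horizon-dependence values. Two simplifications are assumed here and are not taken
from the thesis: the 28-day bill plays the role of the one-period bond, and the reported
rates are used as they stand rather than being converted to per-period log yields.

```python
# Mean CETES rates from Table 4.1, keyed by term in days.
mean_rates = {
    28: 0.108600401,
    91: 0.11308304,
    182: 0.115932446,
    364: 0.118480772,
}

shortest = min(mean_rates)  # the 28-day bill stands in for the one-period bond

# H(n) = -(E[y^n] - E[y^1]), evaluated for every longer term.
horizon_dependence = {
    term: -(rate - mean_rates[shortest])
    for term, rate in mean_rates.items()
    if term != shortest
}

for term in sorted(horizon_dependence):
    print(f"{term:3d} days  H = {horizon_dependence[term]:+.6f}")
```

Under these simplifications every value is negative and grows in magnitude with the term,
which is the pattern an upward-sloping mean yield curve implies.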
4.4 Conclusions and future work
It has been a long journey from the basic pricing equation to where we stand right now.
We have been able to achieve what we hope is an intuitive perspective on asset pricing,
from its very foundations to a practical example for our country's debt market. A lot of
work remains to be done, as we have only uncovered the tip of the iceberg. The first and
most natural step is to extend this analysis to other financial instruments and examine
the possibility of measuring their entropy. An interesting exercise would be to compare
the horizon dependence of the national IPC index with that of bonds, in order to sort out
risk premiums and validate the upper bound we discussed in the previous chapter.
While examining references for this document, we found a more recent work on the concept
of coentropy as a measure of dependence between variables. Studying that article could
take us further in our study of asset pricing and probably give us a better grasp of how
these powerful tools can be used for the proper valuation of bonds. As for the
theoretical part of this work, we believe there is always room for improvement when it
comes to making a formal proof of a mathematical concept. Overall, this semester-long
experience has proved challenging and motivating enough to make the author seriously
consider pursuing a graduate degree in economic theory, which is probably the biggest and
most ambitious piece of future work this endeavour has left pending.
This work is probably (and also hopefully) the last work the author will present during
his undergraduate degree. If one can be allowed some sentimentality, I want to thank
every single person who has been there for me during this incredible period. Like the
pricing kernel, I am just the multiplication of each of your individual efforts propelling
me forward.
Bibliography
[1] Thomas S. Kuhn. Die Struktur wissenschaftlicher Revolutionen. Suhrkamp, 1967.
[2] Fernando Alvarez and Urban J. Jermann. Using asset prices to measure the persistence
of the marginal utility of wealth. Econometrica, 73(6):1977–2016, 2005. ISSN
00129682, 14680262. URL http://www.jstor.org/stable/3598756.
[3] David Backus, Mikhail Chernov, and Stanley Zin. Sources of Entropy in Representative
Agent Models. 2011. doi: 10.3386/w17219.
[4] J.H. Cochrane. Asset Pricing: (Revised Edition). Princeton University Press, 2009.
ISBN 9781400829132.
[5] Elias M. Stein and Rami Shakarchi. Functional analysis: introduction to further
topics in analysis. Princeton University Press, 2011.
[6] California Institute of Technology. Notes on functional analysis. URL
http://people.hss.caltech.edu/~kcb/notes/separatinghyperplane.pdf.
[7] Harold Gordon Eggleston. Convexity. Cambridge Tracts in Mathematics and Mathematical
Physics, 1958.
[8] Kôsaku Yosida. Functional analysis. Springer-Verlag, 1980.
[9] Jon Lee. A first course in linear programming. Cambridge University Press, 2004.
[10] Banco de México. URL http://www.banxico.org.mx/.
  • 1. UNIVERSIDAD IBEROAMERICANA Entropy in the Mexican capital market by Wasim Alexis Mobayed Davids A work submitted in partial fulfillment for the degree of Bachelor in Engineering Physics in the Universidad Iberoamericana Departamento de F´ısica y Matem´aticas November 2016
  • 2. “There is no useful information contained in historical price movement of securities.” Louis Bachelier “If you are a good economist, a virtuous economist, you are reborn as a physicist. But if you are an evil, wicked economist, you are reborn as a sociologist.” Paul Krugman “It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” Richard Feynman Figure 1: It always seems to lead back to math . . . Taken from xkcd webcomics http://bit.ly/2fyuJC7
  • 3. UNIVERSIDAD IBEROAMERICANA Abstract Universidad Iberoamericana Departamento de F´ısica y Matem´aticas Bachelor in Engineering Physics by Wasim Alexis Mobayed Davids We present a case study for the Mexican debt market using entropy, a concept recently developed in the context of asset pricing, for the first time in (revised) literature. The scope of this work is not only to make an empiric exercise, but rather to provide an intuitive and understandable review of asset pricing, starting from the basic pricing equation and developing it further to understand the underlying ideas behind asset pricing theory. Additionally, an alternative proof of the existence of a state-price vector and the absence of arbitrage as its consequence is presented using functional analysis. Our results are consistent with the theory developed by Backus, Chernov, & Zin as we have found that data successfully describes a negative horizon dependence curve. Being a work submitted in partial fulfillment for the degree of bachelor in engineering physics, this document also presents the author’s acquired learning of financial theory, attempting to find parallelism with the physical sciences each time there is room for comparison.
  • 4. Acknowledgements I would like to thank my advisor, Dr. Jos´e Miguel Torres for all his valuable help during the realization of this work as well as my examiners, Dr. Alfredo Sandoval Villalbazo and Dr. Carlos Ponzio. As a general comment and clarification towards this work, I would like to mention that as a physics student I’ve always been fascinated by the way that human behavior and social sciences can be so precisely reproduced through mathematical models. After taking a stochastic calculus course with Dr. Torres in 2016, I started discovering the great existing parallelisms and similarities between the intricate mathematical methods used in theoretical physics and the formal foundations of financial and economic theory. This led me to develop a great interest in subjects that had been absolutely foreign to me during my undergraduate formation. Thus, I would also like to thank Dr. Torres for introducing me to what might as well be the line of studies I would like to pursue in my graduate degree. iii
  • 5. Contents Abstract ii Acknowledgements iii Preface v 1 Introduction 1 1.1 Brief introduction to asset pricing theory . . . . . . . . . . . . . . . . . . 2 1.2 The stochastic discount factor . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Dynamic asset pricing 8 2.1 Elements of functional analysis . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.1 Distances and convex sets . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.2 The separating hyperplane theorem . . . . . . . . . . . . . . . . . 11 2.1.3 Farkas’ lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 Arbitrage and state prices . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3 Entropy in finance 21 3.1 Risk-neutral vs real (physical) probabilities . . . . . . . . . . . . . . . . . 22 3.2 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4 Case study for Mexico 33 4.1 General overview of the Mexican treasury bills . . . . . . . . . . . . . . . 35 4.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.4 Conclusions and future work . . . . . . . . . . . . . . . . . . . . . . . . . 40 Bibliography 41 iv
  • 6. Preface The scientific method may solely be the greatest invention ever created by mankind. Following its simple and elegant recipe we have been able to explore the intricacies and whimsical nature of the genetic code to the unnatural and counter-intuitive qualities of relativistic phenomena. Its importance is evidenced through its results, as it has proved to be tremendously successful in fields as (similarly) unrelated to the exact sciences such as psychology and sociology, just to name a few. In essence, the scientific method is based on the measurable characteristics of a studied event, and, as so, anything that is prone to be numerically quantified can be scrutinized under the scope of the scientific method. The quintessential example of the direct applicability of the scientific method must be physics. Do not take my word for it (after all, I may be biased towards giving this statement) but a long tradition of scientific philosophy may ascertain my claim. Rare as it may seem, physics and mathematics have only existed as a commonly referred marriage since a single-digit number of centuries. The early physical sciences thrived with the empiricism that modern day science seems to lack more and more in our times. Thomas Kuhn proposed that science evolves through the evolution of the underlying assumptions we make about the world when studying a problem, as he explained in The Structure of Scientific Revolutions [1]. Against all conventional wisdom, Kuhn postulated that the history of science is characterized by a progression of paradigm shifts, in which scientists’ very understanding of the nature of science changes. Not only does this Kuhnian theory of scientific revolutions describe such events as the discovery of atoms, but it has also been alleged to apply to mathematical developments such as the process of the legitimization of irrational numbers. 
One major scientific revolution was led (arguably) by Isaac Newton with the invention of calculus and its direct application to the physical sciences. Even in those early days, economists started using those informal v
  • 7. Preface vi notions of mathematics to try to explain social phenomena. Gottfried Achenwall, a german jurist and economist is considered to be one of the fathers of statistics as he was one of the first social scientists to incorporate rigorous mathematical analysis to his work. After all, it is not a coincidence that statistics, being one of the pillars of modern day science has its etymology in the word state. Economics and finance soon became as intertwined with mathematics as physics once did, leading to an extremely successful partnership from which powerful results have arisen. Parallelisms between physics and economics are evident for anyone who studies one of these fields and picks up a book of the other. It is well known that Louis Bachelier was the first person to model the stochastic process now referred to as Brownian motion as part of his PhD thesis concerning the theory of speculation. Some years later, a well-known physicist by the name of Albert Einstein published On the Motion of Small Particles Suspended in a Stationary Liquid, as Required by the Molecular Kinetic Theory of Heat, and wrongly given the credit of being the first solver of this enigma. This story was not mentioned as a mere distraction from the topic we try to deal with in this work, but rather to exemplify how these disciplines bring each other forward more than specialists tend to recognize. Physicists and economists have both incurred in great mathematical efforts for the sake of generality, and have met more than once down the road of formalism. As an attempt to become part of this growing collaboration, the author presents this work to contribute to the best of his knowledge, limited as it still may be . . .
  • 8. Chapter 1 Introduction
This work is divided into four basic parts. The first is this introductory chapter, which can be skipped by the impatient reader who is already familiar with the basic asset pricing theory it covers. The second part is a formal proof of the existence of state-price vectors, together with other important issues in finance, using functional analysis. After this, a case study of the Mexican capital market is developed based on the concept of entropy introduced by Álvarez & Jermann [2] and Backus, Chernov & Zin [3]. Chapter 3 recovers the idea of entropy developed by these authors, and chapter 4 analyzes real bond prices from an extensive database to provide an empirical link to the theory. Because this work is submitted in partial fulfillment of an Engineering Physics degree, the reader is likely to be more familiar with scientific terminology than with that of finance. For this reason, physical parallels will be drawn whenever there is room for comparison, and basic principles of finance and economics will be explained thoroughly. With this said, we proceed to develop a basic introduction to asset pricing theory based on John Cochrane's premise that the basic pricing equation is the principle from which all special cases in valuation are derived. For a more detailed explanation, the reader is invited to consult Asset Pricing [4] by the aforementioned physicist/economist.
  • 9. 1.1 Brief introduction to asset pricing theory
The basic question asset pricing theory tries to answer is simple to state: how much is an asset worth? By asset we understand a resource with economic value that an individual or corporation owns with the expectation that it will provide a future benefit. The really attractive side of this theory is its wide applicability; whether we want to examine the price of an option, stock, bond or any other investment, asset pricing theory proves to be a powerful ally. One intuitive hypothesis about the price of an asset is that it is directly related to the expected payoff it produces. However, uncertainty of payoffs poses a risk that should be taken into account when pricing it, since not all assets are subject to the same risk of default. We also have to account for the delay of the payment, since the investor incurs greater risk by waiting longer for maturity. All of these intuitions are condensed quite precisely in the basic pricing equation, as will hopefully become evident in this introductory chapter. In its simplest form, it reads:
$$P = E[mx]. \tag{1.1}$$
That is, the price $P$ is the expected value of the payoff $x$ discounted by a random variable $m$ called the stochastic discount factor. We will proceed to understand this equation based on a consumption pricing model derived from simple economic principles. The intuition can be understood by examining the following examples:
• For a stock: We pay the price $P_t$ of a stock today; the payoff will then be tomorrow's price plus a dividend. That is:
$$P_t \to x_{t+1} = P_{t+1} + d_{t+1}. \tag{1.2}$$
• For a bond: Supposing we get a dollar for our investment at a discounted price $P_t$, we would have:
$$P_t \to 1. \tag{1.3}$$
Alternatively, we could invest that dollar and get the risk-free rate.
  • 10. • For a bet: Suppose you make a bet with your buddies on the next football game. The entry price would be $P_t = 0$ since no money is put down, and you could either win or lose the settled quantity depending on the outcome of the match:
$$P_t = 0 \to \{0, 1\}. \tag{1.4}$$
All of these cases have one common characteristic: the randomness of payoffs (except for bonds, of course). Thus, we have to introduce the term state-price security, also known as an Arrow-Debreu security. We can understand these securities as contracts that agree to pay a specific payoff if a particular state occurs at a particular time in the future, and zero in all other states. The price of this security is the state price of that particular state of the world, and the collection of state prices may be represented by a vector. All of the possible outcomes constitute a state space in which these vectors live. From a physicist's point of view, this sounds an awful lot like the state space used in quantum mechanics. We will further examine these notions and similarities in the next chapter when proving the existence of state-price vectors and its consequences.
Now we try to find what a payoff is worth to an investor. For this, we need a convenient mathematical framework to capture the desires of the potential stakeholder. Since we are evaluating what an investor is willing to pay to get a random payoff, our function needs to capture two evident facts about human behavior: people prefer money now, and at a lower risk. This leads us to what economists call the utility function. Utility measures the welfare or satisfaction of a consumer as a function of the consumption of goods. Assumptions about the rationality of individual actors need to be made when modeling this function, leading to a somewhat biased form of our hypothesis.
These functions can be as complex as the problem needs, but for our purposes we will model investors by a utility function defined over current and future values of consumption:
$$U(c_t, c_{t+1}) = u(c_t) + \beta E_t[u(c_{t+1})], \tag{1.5}$$
where $c_t$ is consumption today and the second term is the expected utility of consumption tomorrow, discounted by the factor $\beta$. The parameter $\beta$ is called the subjective discount factor (not to be confused with the stochastic discount factor), and typically takes a value of $\beta \sim 0.95$ on an annual basis.
  • 11. The period utility function $u$ needs to capture the fact that the satisfaction of the investor rises with consumption, but at a declining rate. To put it in plain words: your third pizza slice is not as satisfying as your first; this is called the law of diminishing marginal utility. It implies that our function needs to flatten out as consumption grows and, accordingly, reach low and even negative values as consumption becomes small. For this, a power function (also known as the isoelastic utility function) is proposed, with the following form:
$$u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma}. \tag{1.6}$$
The parameter $\gamma$ is a measure of risk aversion, which will become clearer once we analyze the form of this function. Another important quantity is the derivative of the utility function with respect to consumption. This is called marginal utility; it measures how much utility changes with consumption, and it diminishes as consumption increases, as we would expect. To visualize these quantities, we take the limit $\gamma \to 1$ using l'Hôpital's rule to get:
$$\lim_{\gamma \to 1} u(c) = \log(c), \qquad u'(c) = c^{-\gamma} = c^{-1}. \tag{1.7}$$
To visualize these functions:
Figure 1.1: Utility (blue) and marginal utility (orange) plots. The concavity of the logarithmic function matches our requirement that utility flatten out as consumption increases. Marginal utility decreases with growing consumption, as expected.
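As a minimal sketch of equations (1.6) and (1.7), the following Python snippet evaluates the power utility and its derivative and checks numerically that the utility approaches log(c) as γ approaches 1. The function names and the sample consumption levels are illustrative assumptions, not part of the original text.

```python
import math

def utility(c, gamma):
    """Power (isoelastic) utility, eq. (1.6); equals log(c) in the gamma -> 1 limit."""
    if c <= 0:
        raise ValueError("consumption must be positive")
    if gamma == 1.0:
        return math.log(c)
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

def marginal_utility(c, gamma):
    """Derivative of the power utility with respect to consumption, u'(c) = c^(-gamma)."""
    return c ** (-gamma)

# Hypothetical consumption level used only for illustration.
c = 2.0
near_log = utility(c, 1.0 + 1e-6)   # gamma very close to 1
exact_log = utility(c, 1.0)         # the logarithmic limit

# Marginal utility evaluated on increasing consumption levels (gamma = 1).
levels = [0.5, 1.0, 2.0, 4.0]
marginals = [marginal_utility(x, 1.0) for x in levels]

print(near_log, exact_log)
print(marginals)
```

The list of marginal utilities is decreasing, which is the law of diminishing marginal utility stated above.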
  • 12. As mentioned before, $\gamma$ is a parameter of risk aversion, which can be seen by examining the shape of the utility function. To exemplify: suppose our consumption is set at a value $\bar{c}$, with its respective utility, and we then take a bet in which we can either win or lose an amount $\Delta$ of some good. Suppose the bet makes us gain or lose the same amount and the two states are equally probable, like a coin toss. Should we win, the law of diminishing marginal utility tells us that our increase in utility would not be as big as the decrease from losing. If we examine this notion further, we realize that the expected utility after taking the bet is lower than the utility of the original $\bar{c}$. That is:
$$E[u(\bar{c} \pm \Delta)] < u(\bar{c}). \tag{1.8}$$
This means the consumer is made worse off if he is forced to take the bet. Fundamentally, what this reveals is that people dislike losses more than they value gains. We capture this feature of human psychology through the concavity of the function, which is regulated by the parameter $\gamma$. We now have a powerful mathematical form that lets us separate the two crucial items expressed in the basic pricing equation. The discount factor $\beta$ tells us how much people dislike delay, giving more or less value to what happens in the future. A larger value of $\gamma$ makes the utility function more curved, representing greater risk aversion.
We must bear in mind that the objective of any investor is to maximize his utility with whatever resources he has. As such, we will let our investor buy or sell the payoff $x_{t+1}$ at a price $p_t$. The amount he decides to buy or sell will be determined by his budget constraint. Ideally, if he decides to buy $\xi$ securities today, the payoff tomorrow should at least restore the utility he gave up by paying the price of the assets.
In other words, he pays some today and loses the price of the security, but in exchange gets a little more payoff tomorrow. The optimal investment is the one at which the marginal cost of giving up consumption today equals the marginal benefit of receiving the payoff tomorrow. The problem then translates to:
$$\max_{\xi} \; u(c_t - \xi p_t) + \beta E_t[u(c_{t+1} + \xi x_{t+1})]. \tag{1.9}$$
  • 13. By setting the derivative with respect to $\xi$ to zero, we obtain the first-order condition for an optimal consumption choice:
$$p_t u'(c_t) = \beta E_t[u'(c_{t+1}) x_{t+1}]. \tag{1.10}$$
Rearranging this expression, we solve for the price, which leaves us one step from the familiar expression of the pricing equation presented earlier:
$$p_t = E_t\left[\beta \frac{u'(c_{t+1})}{u'(c_t)} x_{t+1}\right]. \tag{1.11}$$
If we group the terms multiplying the payoff in the last equation, we recover the pricing equation (1.1):
$$m = \beta \frac{u'(c_{t+1})}{u'(c_t)}, \qquad P = E[mx]. \tag{1.12}$$
Most of the theory in asset pricing is derived from this equation, in which the term we designated as $m$ is called the stochastic discount factor.
1.2 The stochastic discount factor
A further examination of the discount factor's form reveals where it gets its stochastic attribute from. Not only does this term capture the two psychological behaviors we studied in the last section, but it also brings an important element into the discussion: uncertainty. The payoffs certainly bring this challenge to the table, but we can also see that the discount factor is a function of tomorrow's marginal utility, which is of course uncertain, as it depends directly on tomorrow's consumption. As an example, the discount factor for the power utility function we have been using takes the following form:
$$m\,\big|_{u'(c) = c^{-\gamma}} = \beta \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}. \tag{1.13}$$
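The following minimal sketch combines equations (1.12) and (1.13): it builds the power-utility discount factor in a two-state economy and prices two payoffs with P = E[mx]. The consumption levels, probabilities, payoffs and the values of β and γ are hypothetical numbers chosen only for illustration.

```python
def sdf(c_today, c_tomorrow, beta, gamma):
    """Stochastic discount factor for power utility, eq. (1.13)."""
    return beta * (c_tomorrow / c_today) ** (-gamma)

def price(probs, m, x):
    """Basic pricing equation P = E[m x] over a finite set of states."""
    return sum(p * mi * xi for p, mi, xi in zip(probs, m, x))

# Hypothetical two-state economy (illustrative assumptions).
beta, gamma = 0.95, 2.0
c_today = 1.0
c_tomorrow = [0.9, 1.1]   # state 1: recession, state 2: boom
probs = [0.5, 0.5]

# One realization of m per state: m is random because consumption is.
m = [sdf(c_today, c, beta, gamma) for c in c_tomorrow]

bond = [1.0, 1.0]    # pays one unit in both states
stock = [0.8, 1.2]   # same expected payoff, but pays more in the boom

p_bond = price(probs, m, bond)
p_stock = price(probs, m, stock)
print(m, p_bond, p_stock)
```

In this sketch the discount factor is larger in the recession state, where consumption is scarce, so the payoff that does well in the boom is worth less than the sure payoff with the same expectation.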
  • 14. An important thing worth noting at this point is that this model holds only for valuing assets after the investment has been made. The equation characterizes how a consumer with discount factor $\beta$ and risk aversion $\gamma$ feels about value once he has received the payoff. This chicken-and-egg dilemma is resolved by taking the perspective that the investor looks at the payoffs and prices, and adjusts his consumption until it lines up with the price-payoff equilibrium. The SDF generalizes the intuition we discussed earlier, that the price of an asset is related to its payoffs, to any security whatsoever. If we bought a risk-free asset, its price in an uncertainty-free situation would be:
$$p_t = \frac{1}{R_f} x_{t+1}, \tag{1.14}$$
where $R_f$ is the gross risk-free rate. Assuming this rate is greater than one, $1/R_f$ acts as a discount factor and we say the asset sells at a discount. This is the common case for treasury bonds such as T-bills or Mexican federal treasury certificates, commonly referred to as CETES.
Riskier assets have lower prices than their risk-free counterparts, so we have to discount them further with a risk-adjusted discount factor. The pricing equation generalizes this notion so that it can be used in all of the classical problems of the theory of finance, which tells us something important about the stochastic discount factor: we can incorporate all risk corrections into a single discount factor. We will study this factor further in the next chapter, to understand it as a crucial element that safeguards the law of one price.
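The link between the risk-free discount of equation (1.14) and the further risk adjustment can be made explicit with a short worked derivation. This is a sketch under two assumptions that are not spelled out in the text above: a risk-free asset with price one and payoff $R_f$ exists, and time subscripts are dropped for brevity. The second line uses only the definition of covariance.

```latex
\begin{align*}
1 &= E[m R_f] = R_f\,E[m]
  \;\Longrightarrow\; R_f = \frac{1}{E[m]}, \\
p &= E[mx] = E[m]\,E[x] + \operatorname{cov}(m, x)
   = \frac{E[x]}{R_f} + \operatorname{cov}(m, x).
\end{align*}
```

The first term is the risk-free present value of the expected payoff; the covariance term is the risk correction. A payoff that tends to be low when $m$ is high has a negative covariance with $m$ and therefore sells for less than its risk-free present value.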
  • 15. Chapter 2 Dynamic asset pricing
In this chapter we look more closely at the stochastic discount factor, state-price vectors and their implications. Through this close examination we will prove that the approach explained previously is valid without making assumptions about utility functions or the completeness of markets (a complete market is one in which all Arrow-Debreu securities are available). As we anticipated at the end of the introductory chapter, the existence of a unique discount factor is basically a restatement of the law of one price, or the absence of arbitrage. By this law, if two asset portfolios have the same payoffs and risks, then they must have the same price. A violation of this basic principle would present the opportunity of an arbitrage profit, where an investor could sell a portfolio at an expensive price and then buy the same portfolio at a cheap price. That is, the price of a happy meal should be the sum of the individual prices of a small burger, soda, fries, and a toy. Hedgers and traders are constantly looking for arbitrage opportunities, but the proliferation of high-speed computing and information technology is driving arbitrage to extinction in financial markets. As it is often said in finance: there is no such thing as a free lunch. It is quite intuitive to think that if the law of one price holds, then the stochastic discount factor must be unique. Otherwise, following the basic pricing equation $P = E[mx]$, two assets with the same payoff could have different market prices. As we will see in the following sections, the absence of arbitrage is guaranteed if and only if there is a state-price vector. The reader is invited to consult Functional Analysis [5] by
  • 16. Stein & Shakarchi for a deeper understanding of the mathematical framework used in the following sections.
2.1 Elements of functional analysis
In this section we cover the basic elements of functional analysis that give a theoretical foundation to this chapter. They will be used to prove the fundamental theorem of asset pricing and to justify the need for a state-price vector. The proof presented in the next section is mainly based on convex analysis, in particular the separating hyperplane theorem and Farkas' lemma. Gyula Farkas was a Hungarian mathematician of the late nineteenth century. Not much is known about him other than his work and the fact that he practiced both law and mathematics; another proof of the healthy coexistence of the social and exact sciences. Most of the concepts developed throughout this chapter are inspired by Stein & Shakarchi [5] or by the author's personal notes from his undergraduate topology courses (credit and gratitude to Dr. Edmundo Palacios). We will be using matrix notation to indicate sets, and all vectors defined in the following pages are column vectors unless stated otherwise. These proofs are intended to be as intuitive and easy to read as possible, so please excuse the author for lacking mathematical rigor or seriousness at times. It's all good fun and with the best intentions in mind.
2.1.1 Distances and convex sets
Dealing with distances and closeness between points is the first element that needs to be covered in order to give consistency to our mathematical framework. For convenience, we will work in a Euclidean space equipped with a norm. For a closed set, it can be proved that there always exists a point of the set nearest to any given point.
We will not prove this, since it is beyond the scope of this work, but the basic idea behind showing that a set $R$ is closed is to prove that its complement $R^c$ is open. This can be done by finding, for every $u \in R^c$, a ball of radius $\varepsilon > 0$ centered at $u$ such that $B(u, \varepsilon) \subset R^c$. That is, the ball is entirely contained in the complement of $R$, so the complement is open and, by definition, $R$ is a closed set. A nearest point always exists for a closed set, but in general it need not be unique; uniqueness is an interesting feature of convex sets.
  • 17. In a Euclidean space, a convex set is a region such that the straight line segment joining any pair of its points lies entirely within the region. To give an example, a solid volume like a ball or a cube is convex, but any shape with an indent or a hollow space inside it is not. To prove convexity of a set $C$ we need to verify that $\lambda x + (1 - \lambda) y \in C$ for all $x, y \in C$ and all $\lambda \in [0, 1]$. That is, a set is convex if, for all $x, y \in C$, $C$ contains the line segment between $x$ and $y$. Graphically:
Figure 2.1: Graphical representation of a convex and non-convex set.
Another important element related to convexity that we need to define for the sake of the following theorems is the concept of a convex cone. A convex cone is simply a set that is closed under linear combinations with positive coefficients. That is, a cone $C$ is a convex one if $\alpha x + \beta y \in C$ for all $x, y \in C$ and all $\alpha, \beta \in \mathbb{R}_{++}$.
Figure 2.2: A convex cone (light blue). Inside of it, the light red convex cone consists of all points $\alpha x + \beta y$ with $\alpha, \beta > 0$, for the depicted $x$ and $y$. Both regions extend infinitely in the upper right direction.
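The definition of convexity above can be probed numerically. The sketch below is a randomized search, not a proof: it samples pairs of points in a set and looks for a convex combination that leaves the set. The two test sets (a unit disc and an annulus with a hole of radius 0.5), the sample sizes and the seed are all illustrative assumptions.

```python
import random

def in_disc(p, radius=1.0):
    """Membership test for the closed disc (small tolerance for rounding)."""
    return p[0] ** 2 + p[1] ** 2 <= radius ** 2 + 1e-12

def in_annulus(p, inner=0.5, outer=1.0):
    """Membership test for a disc with a hollow centre, which is not convex."""
    r2 = p[0] ** 2 + p[1] ** 2
    return inner ** 2 <= r2 <= outer ** 2

def find_violation(member, samples=2000, seed=0):
    """Search for x, y in the set and lam in [0, 1] with lam*x + (1-lam)*y outside it.

    Returns a counterexample (x, y, lam), or None if none was found.
    Finding none does not prove convexity; finding one disproves it.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < 200:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if member(p):
            points.append(p)
    for _ in range(samples):
        x, y = rng.choice(points), rng.choice(points)
        lam = rng.random()
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        if not member(z):
            return x, y, lam
    return None

disc_violation = find_violation(in_disc)
annulus_violation = find_violation(in_annulus)
print(disc_violation, annulus_violation is not None)
```

For the annulus a counterexample is easy to see by hand as well: the midpoint of (0.75, 0) and (-0.75, 0) is the origin, which lies in the hole.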
  • 18. 2.1.2 The separating hyperplane theorem
A hyperplane generalizes the notion of the classical plane we graph in $\mathbb{R}^3$, and much of the way it is constructed reminds us of our first lesson in multivariate calculus, where a plane was described through the dot product with a normal vector. More formally, we can define it as a set $H \subseteq \mathbb{R}^n$ of the form $H = \{x \in \mathbb{R}^n : a^\dagger x = \alpha\}$ for some nonzero vector $a$ and $\alpha \in \mathbb{R}$, where $a$ is a normal vector to the hyperplane. We can define a half-space as either of the sides into which the hyperplane divides the space $\mathbb{R}^n$:
$$H^+_{a,\alpha} = \{x \in \mathbb{R}^n : a^\dagger x \ge \alpha\}, \qquad H^-_{a,\alpha} = \{x \in \mathbb{R}^n : a^\dagger x \le \alpha\}. \tag{2.1}$$
As such, hyperplanes can be used to separate two sets that reside on either of the half-spaces they create. Consider two sets $A$ and $B$ in $\mathbb{R}^n$. It is said that the hyperplane $H_{a,\alpha}$ strongly separates $A$ and $B$ if there is an $\varepsilon > 0$ such that $A \subseteq H^+_{a,\alpha+\varepsilon}$ and $B \subseteq H^-_{a,\alpha-\varepsilon}$ or vice versa, so that neither set intersects the hyperplane at any point. As the reader may imagine, not all sets can be separated by a hyperplane, and this is why we defined convexity previously. We will prove that a closed convex set can always be separated from a point outside it via the separating hyperplane theorem. This theorem states:
Theorem: Let $C \subseteq \mathbb{R}^n$ be a closed convex set and let $z \in \mathbb{R}^n$ with $z \notin C$. Then $C$ and $z$ can be strongly separated.
Figure 2.3: Figure 1 is a convex set like we defined earlier; figure 2, however, is not. Taken from Caltech's notes on functional analysis [6].
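The statement of the theorem can be illustrated numerically before turning to its proof. The sketch below assumes that C is the closed unit disc in the plane and that z = (2, 1); both are hypothetical choices. It builds a hyperplane from the point of C nearest to z, with normal vector a = z - p and passing through the midpoint of z and p, which is one standard construction, and then checks on a grid of sample points that the disc and z fall strictly on opposite sides.

```python
import math

def nearest_point_in_disc(z):
    """Nearest point of the closed unit disc to z (radial projection)."""
    norm = math.hypot(z[0], z[1])
    if norm <= 1.0:
        return z
    return (z[0] / norm, z[1] / norm)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

z = (2.0, 1.0)                        # a point outside the disc (assumption)
p = nearest_point_in_disc(z)          # nearest point of C to z
a = (z[0] - p[0], z[1] - p[1])        # normal vector a = z - p
mid = ((z[0] + p[0]) / 2, (z[1] + p[1]) / 2)
alpha_star = dot(a, mid)              # hyperplane a.x = alpha_star through the midpoint

# Evaluate a.x - alpha_star on a polar grid of points of the disc.
disc_side = [
    dot(a, (r * math.cos(t), r * math.sin(t))) - alpha_star
    for r in (0.0, 0.5, 1.0)
    for t in [2 * math.pi * k / 360 for k in range(360)]
]
z_side = dot(a, z) - alpha_star

print(max(disc_side), z_side)
```

Every sampled point of the disc gives a negative value and z gives a positive one, so the two lie in opposite open half-spaces, with a gap on each side of the hyperplane.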
  • 19. Proof (based on Stein & Shakarchi [5] and Eggleston [7]): First let $C \subseteq \mathbb{R}^n$ and $z \in \mathbb{R}^n$, $z \notin C$, as in the statement of the theorem. Now let $p$ be the unique point of $C$ nearest to $z$. Since $C$ is closed and convex, such a $p$ exists and is unique. Let $x \in C$ and $\lambda \in [0, 1]$. Again by convexity, $(1 - \lambda)p + \lambda x \in C$, and with $p$ being the nearest point to $z$ it follows that:
$$\|(1 - \lambda)p + \lambda x - z\| \ge \|p - z\|, \quad \text{i.e.,} \quad \|(p - z) + \lambda(x - p)\| \ge \|p - z\|. \tag{2.2}$$
Equation (2.2) simply restates that no point of $C$ on the segment between $p$ and $x$ is closer to $z$ than $p$ is. The inequality we want to reach is one that makes explicit that the points involved lie on separate half-spaces like the ones defined before. Squaring both sides and expanding the inner products, we obtain:
$$\|p - z\|^2 + 2\lambda(p - z)^\dagger(x - p) + \lambda^2\|x - p\|^2 \ge \|p - z\|^2. \tag{2.3}$$
Finally, we subtract the term $\|p - z\|^2$, divide by $\lambda > 0$, let $\lambda \to 0^+$, and multiply by $-1$ to obtain the reversed inequality:
$$(z - p)^\dagger(x - p) \le 0 \quad \forall x \in C. \tag{2.4}$$
If we now consider the hyperplane $H$ that contains $p$ and has normal vector $a = z - p$, that is, $H = \{x \in \mathbb{R}^n : a^\dagger x = \alpha\}$ with $\alpha = a^\dagger p$, equation (2.4) shows that $C \subseteq H^-_{a,\alpha}$ and, moreover, that $z \notin H^-_{a,\alpha}$, since $a^\dagger z - \alpha = \|z - p\|^2 > 0$ because $z \ne p$ (as $z \notin C$). Now consider a second hyperplane $H^*$ parallel to $H$ (that is, with the same normal vector) containing the point $\tfrac{1}{2}(z + p)$; it is clear that $H^*$ strongly separates $z$ and $C$, as desired.
2.1.3 Farkas' lemma
Making use of all the concepts developed before, we present Farkas' lemma, a direct consequence of the separating hyperplane theorem. Its primary function is to
  • 20. determine whether a system of linear inequalities has a solution or not. The result states that a vector either lies in a convex cone or there exists a hyperplane that separates the vector from the cone. This is a strict either/or, as there are no other alternatives for the cone and the vector: of the two systems below, exactly one has a solution, never both and never neither.
Lemma: Let $A$ be a matrix of dimensions $m \times n$ and $C \in \mathbb{R}^n$. Then exactly one of the following systems has a solution:
$$1.\quad Ax \le O \ \text{ and } \ C^\dagger x > 0 \ \text{ for some } x \in \mathbb{R}^n. \tag{2.5}$$
$$2.\quad A^\dagger y = C \ \text{ and } \ y \ge O \ \text{ for some } y \in \mathbb{R}^m, \tag{2.6}$$
where $O$ denotes the zero vector.
Proof (based on Kosaku [8] and Lee [9]): We denote the columns of $A^\dagger$ by $\{a_1, a_2, \ldots, a_m\}$. System 2, represented by equation (2.6), has a solution if $C$ lies in the convex cone spanned by $\{a_1, a_2, \ldots, a_m\}$. On the other hand, system 1, represented by equation (2.5), is feasible if the closed convex cone $\{x : Ax \le O\}$ and the open half-space $\{x : C^\dagger x > 0\}$ have a nonempty intersection.
Suppose first that system 2 is solvable. That means there exists $y \ge O$ such that $A^\dagger y = C$. Now let $x$ be such that $Ax \le O$. Consequently, $C^\dagger x = y^\dagger A x \le 0$. This means system 1 has no solution, which is consistent with the lemma. For the proof to be complete, however, we must show that system 1 does have a solution whenever system 2 does not; for this, we suppose that system 2 has no solution and see what follows.
We construct the set $S = \{x : x = A^\dagger y, \ y \ge O\}$. This is a closed convex set, and $C \notin S$. For a nonempty closed convex set $S \subseteq \mathbb{R}^n$ and a point $y \notin S$, one can prove that there exist a nonzero vector $p$ and a scalar $\alpha$ such that $p^\dagger y > \alpha$ and $p^\dagger x \le \alpha$ for each $x \in S$. We will not prove this but rather use the result to say that there must be a vector $p \in \mathbb{R}^n$ and a scalar $\alpha$ such that $p^\dagger C > \alpha$ and $p^\dagger x \le \alpha \ \forall x \in S$.
  • 21. Since the zero vector is included in $S$, we have $\alpha \ge 0$, and this implies that $p^\dagger C > 0$. It also holds that $\alpha \ge p^\dagger A^\dagger y = y^\dagger A p \ \ \forall y \ge O$. Now, because the entries of $y \ge O$ can be made as large as we want, the last inequality means that $Ap \le O$. The point of this somewhat confusing development is that we have constructed a vector $p \in \mathbb{R}^n$ such that $Ap \le O$ and $C^\dagger p > 0$, which means that system 1 has a solution when system 2 does not. This completes the second part of the proof: exactly one of systems 1 and 2 has a solution.
The geometric interpretation given at the beginning of this section may be more intuitive for the reader, so we provide an example following it. Geometrically, Farkas' lemma states that either:
1. the vector $C$ lies in the cone generated by the columns of $A^\dagger$ (that is, $C$ is a non-negative linear combination of the rows of $A$), or
2. there is a vector $x \in \mathbb{R}^n$ that makes an acute angle with $C$ and a non-acute (right or obtuse) angle with every column of $A^\dagger$.
After much painstaking picture editing in non-friendly editing software, we present the following graphic interpretation:
Figure 2.4: This is an example for a matrix $A \in \mathbb{R}^{2 \times 4}$. In the left figure we see a hyperplane containing the origin with vector $y$ as its normal vector. This hyperplane separates $C$ from the cone spanned by the columns of $A$, labeled $a_n$. It is clearly seen that the vector $y$ forms an acute angle with $C$ and obtuse angles with the column vectors of $A$. In the right figure we see the vector $C$ contained in the cone generated by the aforementioned vectors. (In the figure's notation the generators of the cone are the columns of $A$ and the separating vector is called $y$.)
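The two alternatives of the lemma can be checked on a small example. The sketch below is restricted to a hypothetical 2 × 2 matrix A whose rows are (1, 0) and (1, 1): system 2 is solved exactly with Cramer's rule, and system 1 is searched by brute force over unit vectors. A general implementation would use a linear programming solver instead; the function names, the matrix and the two test vectors are illustrative assumptions.

```python
import math

# Rows a_1, a_2 of a hypothetical 2x2 matrix A; they span the cone between 0 and 45 degrees.
A = [(1.0, 0.0), (1.0, 1.0)]

def solve_system_2(A, C):
    """Solve A^T y = C by Cramer's rule (2x2 only); return y if y >= 0, else None."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    y1 = (C[0] * a22 - C[1] * a21) / det
    y2 = (a11 * C[1] - a12 * C[0]) / det
    return (y1, y2) if y1 >= 0 and y2 >= 0 else None

def solve_system_1(A, C, steps=360):
    """Search unit vectors x with A x <= 0 and C.x > 0; return one, or None if none is found."""
    for k in range(steps):
        t = 2 * math.pi * k / steps
        x = (math.cos(t), math.sin(t))
        rows_ok = all(r[0] * x[0] + r[1] * x[1] <= 1e-12 for r in A)
        if rows_ok and C[0] * x[0] + C[1] * x[1] > 1e-9:
            return x
    return None

inside = (2.0, 1.0)    # equals 1*(1,0) + 1*(1,1), so it lies in the cone
outside = (0.0, 1.0)   # would need a negative weight on (1,0)

print(solve_system_2(A, inside), solve_system_1(A, inside))
print(solve_system_2(A, outside), solve_system_1(A, outside))
```

For the vector inside the cone only system 2 returns a solution, and for the vector outside it only system 1 does, which is the mutual exclusivity stated in the lemma.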
  • 22. 2.2 Arbitrage and state prices
As anticipated, this section develops in depth the notions of state-price vectors and arbitrage, and their co-dependency. To do this, we will work with the concepts developed and proved earlier. Once again we deal with uncertainty, as it is one of the central subjects in asset pricing, but we treat it as a state space. As such, a security's price will be the state-price weighted sum of its payoffs in all possible states of the world. We represent uncertainty by a finite set of states $\{1, 2, 3, \ldots, S\}$ that define the payoff of the related asset. Appealing to the physicist in all of us, we can think of this as a finite-dimensional version of the Hilbert space used to represent quantum states. The securities are described by an $N \times S$ matrix $D$, with each entry $D_{i,j}$ representing the payoff of asset $i$ in state $j$. $S$ is the number of states in our state space and $N$ is the number of different assets available; we will refer to $D$ as the payoff matrix.
Following these definitions, a portfolio is a linear combination of assets. We denote it by $\theta \in \mathbb{R}^N$, where entry $i$ is the number of units of asset $i$ included in the portfolio. With these two objects we can construct a payoff vector given by $D^\dagger\theta \in \mathbb{R}^S$. The last element we need to define is the price of the portfolio, which is computed as $q^\dagger\theta$ with $q \in \mathbb{R}^N$ (each entry is the price of a unit of security $i$). An example is given to illustrate these definitions. Let us consider three securities, one of them a risk-free asset, in three possible states of the world. A possible payoff matrix is:
$$D = \begin{pmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ 4 & 1 & 5 \end{pmatrix} \in \mathbb{R}^{N \times S} = \mathbb{R}^{3 \times 3}. \tag{2.7}$$
As we can see, security $i = 2$ produces the same payoff in any state of the world and, as such, constitutes a risk-free asset.
We will now define a price vector that tries to be consistent with the notion that risk-free assets (RFAs) command higher prices due to the absence of uncertainty, while stocks can have higher payoffs and should be valued accordingly. If we give equal probabilities to the three states, then the price of each asset will be the expected value of its payoffs reduced by an appropriate discount factor. Since
  • 23. this factor is stochastic, as we have discussed before, we will define it arbitrarily for this example. We will discuss the choice of the discount factor to show the intuition, trying to be consistent.
$$q = \begin{pmatrix} 0.8 \\ 0.95 \\ 3.23 \end{pmatrix} \in \mathbb{R}^N = \mathbb{R}^3. \tag{2.8}$$
The price vector was constructed using the discount factors $m_i = (0.8, 0.95, 0.98)$, which should be consistent with the payoffs of each asset. If we examine the expectation of the payoffs (weighing each state with the same probability), we see that securities 1 and 2 have the same expected payoff, so they would have the same price if the expectation were all that mattered. However, asset 2 is risk-free and, as such, should have a higher price than asset 1, which pays nothing in state 2, as we can see in the payoff matrix (2.7). Similarly, asset 3 has a lower risk than asset 1, since it always pays, and it has a higher expected payoff than the others; because of this, it is assigned a larger discount factor and turns out to be quite expensive. An investor would then look at these prices and payoffs and make the decision that maximizes his utility, as we discussed in chapter one. The idea behind a state-price vector is to generalize the notion of the stochastic discount factor into a more general identity that we will discuss later.
Following the initial definitions, we proceed to create a portfolio $\theta$ with the three securities described above. Let us say the investor is willing to invest \$43 in some portfolio and decides to compose it in the following way:
$$\theta = \begin{pmatrix} 5 \\ 7 \\ 10 \end{pmatrix} \in \mathbb{R}^N = \mathbb{R}^3. \tag{2.9}$$
We will not discuss whether this was an optimal choice (it probably is not, since he is taking on risk by buying asset 1, which has the same expected payoff as asset 2, an RFA), but rather use it to compute the payoff vector.
  • 24.
$$D^\dagger\theta = \begin{pmatrix} 1 & 1 & 4 \\ 0 & 1 & 1 \\ 2 & 1 & 5 \end{pmatrix} \begin{pmatrix} 5 \\ 7 \\ 10 \end{pmatrix} = \begin{pmatrix} 52 \\ 17 \\ 67 \end{pmatrix} \in \mathbb{R}^S = \mathbb{R}^3. \tag{2.10}$$
Being an element of the state space, this vector expresses the payoff of portfolio $\theta$ in each state of the world. As we can see, state 3 is the most favorable for the investor, as could be predicted just by looking at the payoff matrix. State 1 is almost as favorable, as assets 1 and 3 each pay only one unit less than in state 3. It is worth noting that, given the dimensions of the matrices involved in the multiplication, the payoff vector belongs to the state space because the dimensions of the payoff matrix are interchanged when taking its transpose. Even when the numbers of rows and columns agree, as in this example, we are dealing with different objects after transposing the original matrix $D$. The only element that remains to be calculated is the market value of the portfolio, which is the dot product of the asset prices and the corresponding number of units in the portfolio:
$$\text{Market value} = q^\dagger\theta = \sum_{i=1}^{N} q_i \theta_i = \begin{pmatrix} 0.8 \\ 0.95 \\ 3.23 \end{pmatrix} \cdot \begin{pmatrix} 5 \\ 7 \\ 10 \end{pmatrix} = 42.95 \in \mathbb{R}. \tag{2.11}$$
The price of portfolio $\theta$ is then \$42.95, which falls within the \$43 budget constraint of the investor. With these concepts clarified, we can now use them as part of our analysis. In this new language, we define an arbitrage as a portfolio $\theta$ with $q^\dagger\theta \le 0$, that is, with negative or no cost, and $D^\dagger\theta > 0$, a positive payoff. In other words, a portfolio offering "something for nothing." A state-price vector $\psi$ is a vector that satisfies $q = D\psi$, and we can understand it as a functional relating prices to payoffs. We will try to make it evident that ruling out arbitrage gives way to the existence of state-price vectors. Before going into the proof, some notational clarifications will be made.
For notational purposes we define:
$$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x \ge 0\}, \qquad \mathbb{R}^n_{++} = \{x \in \mathbb{R}^n \mid x > 0\}. \tag{2.12}$$
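The worked example of equations (2.7) to (2.11) can be reproduced in a few lines. The sketch below uses only the numbers given in the text; the function names are illustrative, and the arbitrage test assumes that $D^\dagger\theta > 0$ is read as a strictly positive payoff in every state. It also computes the discount factor implied by each price under equal state probabilities, which for asset 3 comes out near 0.97 rather than the quoted 0.98.

```python
D = [[1, 0, 2],   # asset 1: payoffs in states 1, 2, 3
     [1, 1, 1],   # asset 2: risk-free asset
     [4, 1, 5]]   # asset 3
q = [0.8, 0.95, 3.23]   # price per unit of each asset, eq. (2.8)
theta = [5, 7, 10]      # units held of each asset, eq. (2.9)

def payoff_vector(D, theta):
    """D^T theta: payoff of the portfolio in each state."""
    n_states = len(D[0])
    return [sum(D[i][s] * theta[i] for i in range(len(D))) for s in range(n_states)]

def market_value(q, theta):
    """q^T theta: cost of the portfolio today."""
    return sum(qi * ti for qi, ti in zip(q, theta))

def is_arbitrage(D, q, theta):
    """Non-positive cost together with a strictly positive payoff in every state (assumed reading)."""
    return market_value(q, theta) <= 0 and all(x > 0 for x in payoff_vector(D, theta))

payoffs = payoff_vector(D, theta)
value = market_value(q, theta)
# Price divided by the equal-probability expected payoff of each asset.
implied = [q[i] / (sum(D[i]) / len(D[i])) for i in range(len(D))]

print(payoffs, round(value, 2), is_arbitrage(D, q, theta))
print([round(f, 3) for f in implied])
```

The portfolio costs a positive amount, so it is not an arbitrage under this definition.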
  • 25. Theorem: There is no arbitrage if and only if there is a state-price vector.
We will prove that this theorem works both ways. Representing the existence of an arbitrage by $A$, so that its absence is $\neg A$, the statement reads:
$$\neg A \iff \exists\,\psi. \tag{2.13}$$
Proof: The first part of this proof is the left-to-right direction of the statement. As we will witness, proving the existence of the state-price vector as a consequence of the absence of arbitrage is a much more complex task than proving the converse. After proving the existence of this vector, it will be straightforward to see that the absence of arbitrage follows directly from it.
We begin by stating some important notions that will be used throughout this demonstration. First, we define the vector space containing all the elements used in this proof. Let $L$ be:
$$L = \mathbb{R} \times \mathbb{R}^S_+ \subseteq \mathbb{R}^{S+1}. \tag{2.14}$$
The familiar letter $L$ used for a normed vector space is not a coincidence for the curious physicist who is noting similarities, since it is purposely chosen to recall a Lebesgue space, much like the well-known $L^2$ Hilbert space used in quantum mechanics.
As we defined earlier, the conditions for arbitrage to exist are that $q^\dagger\theta \le 0$ while $D^\dagger\theta > 0$. The space containing these two quantities is $L$, since the market price $q^\dagger\theta \in \mathbb{R}$ and the payoff $D^\dagger\theta \in \mathbb{R}^S$. To be able to use Farkas' lemma as we will do next, we need the images of the functionals $q^\dagger\theta$ and $D^\dagger\theta$ to be convex sets. Following the definition presented earlier, we define the set $M = \{(q^\dagger\theta, D^\dagger\theta) : \theta \in \mathbb{R}^N\} \subseteq L$, which we will prove to be convex. Adhering to the description of a convex set, we can say with no loss of generality that $M$ is convex because any elements $m_1, m_2 \in M$ satisfy:
  • 26.
$$\alpha m_1 + (1-\alpha) m_2 = \alpha \begin{pmatrix} q^\dagger\theta_1 \\ D^\dagger\theta_1 \end{pmatrix} + (1-\alpha) \begin{pmatrix} q^\dagger\theta_2 \\ D^\dagger\theta_2 \end{pmatrix} = \begin{pmatrix} q^\dagger(\alpha\theta_1 + (1-\alpha)\theta_2) \\ D^\dagger(\alpha\theta_1 + (1-\alpha)\theta_2) \end{pmatrix} \in M. \tag{2.15}$$
Also, $\alpha \in [0, 1]$, $\therefore M$ is convex. Now, because $M$ is convex and either of the functionals $q^\dagger\theta$, $D^\dagger\theta \subseteq M$, each of these is convex as well.
Duffie proves the existence of a state-price vector using the Riesz representation theorem, which also underlies the popular bra-ket notation of quantum mechanics, as it abstracts the functionals used to describe quantum states. Elegant as this proof may be, we do not think it is as intuitive as the reader might wish; this is why we opt to make use of the Farkas' lemma we have discussed before. We adhered scrupulously to the precept of the brilliant theoretical physicist L. Boltzmann, according to whom matters of elegance ought to be left to the tailor and to the cobbler.
If we have followed the mathematical development this far, it will be clear that this resembles the first set of inequalities that Farkas' lemma uses to separate vectors. Our purpose will be to reach a situation in which the solutions of the two systems are mutually exclusive. To have the same form of the problem expressed in equations (3.1) and (2.6), we need to look for the solution of a linear equation for some vector, which we expressed as $y \in \mathbb{R}^n_{++}$ according to Farkas' lemma. We can express our two systems as follows:
1. $$q^\dagger\theta \le 0, \qquad D^\dagger\theta > 0 \tag{2.16}$$
2. $$q = D\psi, \qquad \psi > 0. \tag{2.17}$$
Once we have fully understood Farkas' lemma (and proved it as well, for the more formal reader), it is straightforward to see that, by the lemma, the absence of arbitrage (the infeasibility of the first system) implies that there is a solution
  • 27. for the second system of equations. This in turn implies that there is a non-negative vector $\psi = \{\psi_1, \psi_2, \ldots, \psi_S\}$ such that the price of any asset in our portfolio is given by $q = D\psi$. This vector $\psi$ is our desired state-price vector. We have proved that, given a no-arbitrage condition, a state-price vector exists.
Conversely, once we have $\psi$, Farkas' lemma makes it clear that the system $q^\dagger\theta \le 0$, $D^\dagger\theta > 0$ has no solution. Since this system is essentially the definition of arbitrage, we can rule it out and say that $\exists\,\psi \Rightarrow \neg A$. We have now proved the two directions of the theorem.
Translated to financial terms, Farkas' lemma tells us that either there is a way of assigning a non-negative price to a dollar in each state such that the price of each asset is just the sum total of the value of its payoffs, or there is a portfolio (with negative price) whose payoffs are non-negative, which means that you are “being paid to hold it.”
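As an illustration of the theorem, one can solve $q = D\psi$ for the three-asset market of (2.10) and (2.11) and check that every entry of the solution is positive. This is a minimal sketch under our own assumptions: the helper `solve_linear` is hypothetical, the resulting values of $\psi$ are computed here rather than reported in the text, and an exact solve only works because this example has as many assets as states with a non-singular payoff matrix.

```python
from fractions import Fraction

# Payoff matrix D (rows = assets, columns = states): the transpose of the
# matrix D† used in (2.10). Prices q are those of (2.11).
D = [
    [1, 0, 2],
    [1, 1, 1],
    [4, 1, 5],
]
q = [Fraction("0.8"), Fraction("0.95"), Fraction("3.23")]


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions.

    Assumes a is square and non-singular (next() raises otherwise).
    """
    n = len(b)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)]
           for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


psi = solve_linear(D, q)
print([float(p) for p in psi])    # [0.68, 0.21, 0.06]
print(all(p > 0 for p in psi))    # True
```

A strictly positive solution exists, so by the theorem this example market admits no arbitrage.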
  • 28. Chapter 3
Entropy in finance
Entropy is probably one of the most widespread concepts used in the physical sciences. It is relevant to astrophysicists and string theorists alike, but its meaning differs greatly depending on the context in which it is used. As a fundamental aspect of thermodynamics and physics, several different approaches to entropy beyond that of Clausius and Boltzmann are valid; this is probably one of the reasons why this concept is often misinterpreted. Entropy has often been loosely associated with the amount of order or disorder (chaos) in a thermodynamic system. The classical qualitative description of entropy refers to changes in the status quo of the system and is generally a measure of the usefulness of energy and the amount of energy wasted in some transformation from one state to another.
Because of its seemingly wide application potential, it has been adopted as common scientific argot in a series of disciplines that far exceed the realm of physics, ranging from information theory to economics and finance. Rather than the implications and meaning of this concept being extrapolated to other fields, it is its mathematical structure that proves valuable for seemingly unrelated problems. Our early thermodynamics and statistical mechanics courses tell us that the entropy of a system is the natural logarithm of the number of possible configurations, multiplied by the Boltzmann constant $k_B$. That leads us to the familiar equation:
$$S = k_B \log \Omega. \tag{3.1}$$
  • 29. This does not mean that the interpretation as a measure of order holds in other disciplines but, in essence, we do want to capture a measure of dispersion. For our purposes, we will base this section on the definition of entropy developed by Backus, Chernov, & Zin in their paper Sources of Entropy in Representative Agent Models [3]. In this work, the authors use entropy as a measure of dispersion in the pricing kernel of an asset. As we explained before, the pricing kernel is just another way of referring to the stochastic discount factor, the notion of which we generalized in the last chapter through the existence of state-price vectors. Backus et al. claim that excess returns are reflected in the pricing kernel's dispersion for risky assets and in the dynamics of risk-free bond yields.
The success of entropy in finance rests on the fact that it extends more easily to multiple periods of time than other statistical measures such as standard deviation or variance. In some sense, entropy generalizes the concept of variance. A second reason for its applicability is that many models in asset pricing have a loglinear behavior, such as the isoelastic function we studied in chapter one. Finally, using entropy allows us to account for deviations from the normal distribution for pricing kernels in a simple manner. In this section we will develop the basic mathematical intuition behind entropy in finance.
3.1 Risk-neutral vs real (physical) probabilities
We will try to remain consistent with the notation used in previous chapters and, with this in mind, proceed to define risk-adjusted probabilities. We can construct a vector of probabilities of the form $p \in \mathbb{R}^S_+$ such that $p_1 + p_2 + \ldots + p_S = 1$. We now take a state-price vector $\psi$ for a tuple $(D, q)$ and let $\psi_0$ be the sum of its entries over all states, $\psi_0 = \psi_1 + \ldots + \psi_S$. Let $\hat\psi_j = \psi_j/\psi_0$ for all states $j$.
This constitutes a new vector $(\hat\psi_1, \ldots, \hat\psi_S)$ of probabilities, one for every state $j$. As such, for every asset $i$ we can now rewrite $q = D\psi$ as:
$$\frac{q_i}{\psi_0} = \hat{E}(D_i) \equiv \sum_{j=1}^{S} \hat\psi_j D_{i,j}. \tag{3.2}$$
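The normalization in (3.2) can be sketched numerically. The state-price vector used below is the one that solves $q = D\psi$ for the example market of (2.10) and (2.11); it is computed for this sketch and is not stated in the original text, and all variable names are ours.

```python
# Example market: payoff matrix D (rows = assets, columns = states) and prices.
D = [[1, 0, 2], [1, 1, 1], [4, 1, 5]]
q = [0.8, 0.95, 3.23]
# Assumed state-price vector, i.e. the solution of q = D psi for this market.
psi = [0.68, 0.21, 0.06]

psi_0 = sum(psi)                       # sum of the state prices
psi_hat = [p / psi_0 for p in psi]     # normalized state prices


def expectation(payoffs, probs):
    """Expected payoff under the given probabilities."""
    return sum(x * p for x, p in zip(payoffs, probs))


# Left- and right-hand sides of (3.2) for each asset i.
normalized_prices = [qi / psi_0 for qi in q]
expected_payoffs = [expectation(row, psi_hat) for row in D]

for lhs, rhs in zip(normalized_prices, expected_payoffs):
    print(round(lhs, 6), round(rhs, 6))
```

In this example the second asset pays one unit in every state, so its price, 0.95, coincides with $\psi_0$.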
  • 30. We can interpret this as a “normalized price” of a security with risk-neutral probabilities. This means that if we had a portfolio $\theta$ with $D^\dagger\theta = (1, 1, 1, \ldots, 1)$, then $\psi_0 = \theta \cdot q$ is the discount for a risk-free investment. The price of any security $i$ would then be given by $q_i = \psi_0 \hat{E}(D_i)$, returning to the familiar notion of the price of an asset as its discounted payoff, with this notion of artificial probabilities incorporated. This will become clearer in the next exercise. Now we examine the price of a risk-free asset, for which we consider, as said before, that all of its entries in the matrix $D$ are ones. Because of this, we have:
$$q_f = D_f \psi = \sum_{S} \psi_S. \tag{3.3}$$
We can express the gross interest rate for a risk-free asset as the payoff (which we set to 1 for convenience) over its price:
$$R_f = \frac{1}{q_f} = \frac{1}{\sum_{S} \psi_S}. \tag{3.4}$$
This equation resembles the classical present value formula, in which the future value of an investment is divided by the gross rate to obtain the current value. If we follow the notion of the normalized price for any asset $i$ in the portfolio, and then multiply and divide the equation $q_i = \sum_S D_{i,S}\psi_S$ by the sum of the entries of the entire state-price vector, we obtain:
$$q_i = \sum_{S} D_{i,S}\psi_S = \sum_{t} \psi_t \sum_{S} D_{i,S} \frac{\psi_S}{\sum_{t} \psi_t}. \tag{3.5}$$
If we further examine the fraction in the last summation over $S$, we can see that it is no more than an element of the vector $\psi$ divided by the sum of all its entries. This means that this number will always be positive and lie in the $(0, 1]$ interval. We can conceive of this number as a “probability”; we shall call these numbers risk-neutral probabilities and denote them by $P^*$. If we use the gross interest rate expressed in equation (3.4), we can incorporate it into the price equation, writing the risk-neutral probability as $P^*_S$, to obtain:
  • 31.
$$q_i = \frac{1}{R_f} \sum_{S} D_{i,S} P^*_S = \frac{1}{R_f} E^*[D_i] \tag{3.6}$$
This means that the price of any asset is the “expected value” of the payoff discounted by the risk-free rate. The superscript on the expectation operator in equation (3.6) and the quotation marks in the last sentence indicate that we are not using the real probabilities for state $S$ but rather the risk-neutral probabilities. Using these probabilities directly would mean that an investor is valuing an asset as if it were not subject to any risk. Because of this, we need to clearly differentiate them from real or physical probabilities, which we will denote by $P_S$ to distinguish them from their risk-neutral analogues.
To understand deeply what these probabilities try to measure, we have to return to some of the basic concepts we discussed in the first chapter of this text and see why they are important for a consumer. We remember that an investor's purpose is to maximize his utility based on consumption while reducing the risk of his investment. For a risk-averse investor, we have seen that the utility function has a concave form, which we abstracted using the isoelastic function. However, an investor who is risk-neutral will have a utility function in which losses are valued equally to gains, so his well-being will be altered in the same manner by any positive or negative change. Thus, a risk-neutral investor's utility function is nothing but a straight line. In the next figure we will study the implications of risk-neutrality and risk-aversion, as it will give us clear insight into what entropy tries to measure.
In figure 3.1, the concave function is our familiar risk-averse utility function. In contrast, the straight line (red) represents a risk-neutral investment. We can see that the point given by the expected value and the expected utility does not lie on our curve.
However, if we find the point at which it does by translating this point leftwards, we can see that it is situated at an x value below the expected value of the risky investment. This point is called the certainty equivalent (labeled CE in the figure) and it represents the value below the expectation of the risky investment that the investor is willing to take to “buy” certainty. In the figure exhibited above we have:
  • 32. Figure 3.1: Graphical representation of a risk premium and its connection with entropy, represented by the letter L.
$$U(x) = \left.\frac{x^{1-\gamma}}{1-\gamma}\right|_{\gamma=0.9}, \qquad U(CE) < U(E[x]).$$
The difference between the utility for the expected value and the utility for the certainty equivalent is the least benefit the investor expects from taking a risk. This is expressed as a difference of wealth $x$, signaled in the figure as a risk premium. Thus, we can relate this to what we have been studying: if risk-adjusted investments yield benefits very close to those of risk-neutral ones, then we might be better off holding only bonds in our portfolio. Or are we? This is one of the questions entropy tries to answer, as it examines the dynamics of the parameter $L = U(E[x]) - U(CE)$ or, in other words, the relation between physical and risk-neutral probabilities.
3.2 Entropy
As stated before, entropy tries to measure how much real probabilities diverge from risk-neutral probabilities. The basic framework that the definition of entropy uses is the
  • 33. Kullback–Leibler divergence, which quantifies how much two probability distributions differ. For discrete probability distributions $P$ and $Q$, the Kullback–Leibler divergence from $Q$ to $P$ is defined to be:
$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}. \tag{3.7}$$
In words, it is the expectation of the logarithmic difference between the probabilities $P$ and $Q$, where the expectation is taken using the probabilities $P$. Weighting the logarithmic difference of each event by its probability generates the expectation, and in our particular case we use it to define relative entropy, which reads:
$$L_t\left(\frac{P^*_{t,t+n}}{P_{t,t+n}}\right) = -E_t \log \frac{P^*_{t,t+n}}{P_{t,t+n}}. \tag{3.8}$$
In keeping with the notation used in the first part of this chapter, we continue to use $P^*$ for risk-neutral probabilities and $P$ for their real or physical analogues. The true probability of a state at time $t+1$ conditioned on time $t$ is given by $P_{t,t+1} = P(x_{t+1} \mid x_t)$, hence the subscripts in the equation above. The intuition behind entropy suggests that a greater difference between true and risk-neutral probabilities should be associated with a larger risk premium. The goal of this tool is to connect the properties of excess returns to features of the pricing kernel, which is nothing but a synonym for the stochastic discount factor, as will soon be clear. Risk premiums will then be associated with variability in the ratio $P^*_{t,t+n}/P_{t,t+n}$.
To handle equation (3.8) more easily, we make use of the fact that $E_t[P^*_{t,t+n}/P_{t,t+n}] = 1$, and thus $\log E_t[P^*_{t,t+n}/P_{t,t+n}] = 0$, to rewrite entropy as:
$$L_t\left(\frac{P^*_{t,t+n}}{P_{t,t+n}}\right) = \log E_t \frac{P^*_{t,t+n}}{P_{t,t+n}} - E_t \log \frac{P^*_{t,t+n}}{P_{t,t+n}}. \tag{3.9}$$
This rearrangement of the equation makes it easy to see that if the ratio is constant, it will be equal to one and the entropy will be zero. Because the logarithm is concave, we can see that entropy cannot take negative values and that it increases with the variability of the ratio.
These are the basic characteristics of a dispersion measure, which hopefully will allow us to link theoretical ideas to real data like asset returns and bond yields.
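The properties just described can be illustrated with a small numerical sketch. The two probability vectors below are hypothetical values chosen only for illustration, and the function names are ours; the code evaluates (3.9) for a single period and compares it with the Kullback–Leibler divergence (3.7) from the risk-neutral to the physical probabilities.

```python
import math


def entropy(p_star, p):
    """Evaluate (3.9): log E[P*/P] - E[log(P*/P)], expectations taken under P."""
    ratio = [ps / pp for ps, pp in zip(p_star, p)]
    mean_ratio = sum(pp * r for pp, r in zip(p, ratio))
    mean_log_ratio = sum(pp * math.log(r) for pp, r in zip(p, ratio))
    return math.log(mean_ratio) - mean_log_ratio


def kl_divergence(p, q):
    """Kullback-Leibler divergence (3.7) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


p = [0.5, 0.3, 0.2]        # hypothetical physical probabilities
p_star = [0.6, 0.3, 0.1]   # hypothetical risk-neutral probabilities

print(entropy(p_star, p))       # positive: the two measures differ
print(kl_divergence(p, p_star)) # same value up to rounding
print(entropy(p, p))            # zero: constant ratio equal to one
```

Under these assumptions the entropy coincides with $D_{KL}(P \,\|\, P^*)$, is never negative, and vanishes only when both sets of probabilities agree.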
  • 34. Now we will proceed to rewrite entropy in terms that are more familiar to us. At this point it is worth noticing that this is just a restatement of the basic pricing equation we have been developing since chapter one, with a new perspective telling us that the stochastic discount factor is just a ratio of probabilities of the form:
$$q = E\left[D_i \frac{P^*_{t,t+n}}{P_{t,t+n}}\right] \quad \text{where} \quad m = \frac{P^*_{t,t+n}}{P_{t,t+n}}. \tag{3.10}$$
Now, going back to our pricing equation and replacing the payoff $x$ by the gross return, which is related to it by $r = x/q$, we obtain:
$$E_t[m_{t,t+n}\, r_{t,t+n}] = 1, \quad \text{because } q = E[mx] \text{ and } r = \frac{x}{q}. \tag{3.11}$$
The $t+n$ subscript used in equation (3.11) indicates that we are dealing with a pricing kernel (or stochastic discount factor) and gross returns over an $n$-period time horizon. This kernel can be decomposed into a series of multiplied single-period kernels as follows:
$$q_t = E_t[m_{t,t+1} E_{t+1}[m_{t+1,t+2}\, q_{t+2}]], \tag{3.12}$$
and so on, recursively, until reaching period $n$, where the last argument at the end of the series will be $E_{t+n-1}[m_{t+n-1,t+n}\, q_{t+n}]$. In words, the stochastic discount factor and the gross return on an asset for $n$ periods can be decomposed into products of one-period pricing kernels and returns:
$$m_{t,t+n} = \prod_{j=1}^{n} m_{t+j-1,t+j}, \qquad r_{t,t+n} = \prod_{j=1}^{n} r_{t+j-1,t+j}.$$
Now we define conditional entropy by incorporating the stochastic discount factor into our previous definition and taking it to an $n$-period horizon:
$$L_t(m_{t,t+n}) = \log E_t m_{t,t+n} - E_t \log m_{t,t+n}. \tag{3.13}$$
  • 35. We now take conditional entropy and compute its expected value to obtain a mean value, which we will simply refer to as entropy. Then, we scale it by the time horizon to finally obtain a mean entropy per period. We can examine the entropy per period for any horizon by setting $n$ to the desired value. Entropy provides an upper bound for the mean of excess returns and, as such, we can assume that any return will be “under” this limit. This will allow us to set convenient inequalities from which interesting results can be derived. Mathematically:
$$E L_t(m_{t,t+n}) = E\left[\log E_t m_{t,t+n} - E_t \log m_{t,t+n}\right], \qquad I(n) = \frac{1}{n} E L_t(m_{t,t+n}).$$
To be able to determine these inequalities, it is worth remembering that for a risk-free asset we have:
$$E[m\, r_f] = 1 \;\Rightarrow\; E[m] = \frac{1}{r_f} = q_f. \tag{3.14}$$
Now, we can examine conditional entropy for one period and relate it to one-period excess returns. An excess return is represented by the risk premium generated by the difference between the returns of a risky asset and a non-risky one. As we stated before, entropy provides an upper bound for these returns and, thus, its value will always be greater than or equal to the expectation of the risk premium. That is:
$$I(1) = E L_t(m_{t,t+1}) \ge E[\log r_{t,t+1} - \log r^1_{t,t+1}]. \tag{3.15}$$
In the words of Backus et al.: mean excess log returns are bounded above by the (mean conditional) entropy of the pricing kernel [3].
It will be useful for the next part of the procedure to notice that, for a risk-free asset, the concavity of the logarithmic function makes the next inequality hold:
  • 36.
$$\log E[m_{t,t+1}\, r_{t,t+1}] = \log(1) = 0 \ge E \log(m_{t,t+1}\, r_{t,t+1}) = E \log(m_{t,t+1}) + E \log(r_{t,t+1}). \tag{3.16}$$
The $E \log(m_{t,t+1})$ term in the equation is conveniently included in the definition of conditional entropy (see (3.13)), which allows us to incorporate $L_t$ into the inequality by substituting the expectation of the pricing kernel's logarithm with $\log E\, m_{t,t+1} - L_t$ to obtain:
$$L_t \ge \log E[m_{t,t+1}] + E[\log r_{t,t+1}]. \tag{3.17}$$
Substituting the expected value of the pricing kernel with the inverse of the risk-free rate ($E[m] = 1/r_f$), we finally obtain the entropy bound we anticipated:
$$L \ge E \log r - \log r_f \tag{3.18}$$
If we take the expectation of both sides of the last equation, we recover the expression for the upper bound presented in equation (3.15). Now we will open a brief parenthesis to develop a concept that will be used in the following section: cumulants.
Cumulants are a set of quantities that provide an alternative to the moments of a distribution. Using cumulants provides an alternative way to produce analytical results instead of dealing directly with probability density functions. They will be useful for us because, as will be clear in the next paragraphs, they allow us to rewrite expectations as analytical expressions that make our mathematical handling easier and more revealing. The cumulant generating function is simply the natural logarithm of the moment-generating function $M_X(t) := E[e^{tX}]$, $t \in \mathbb{R}$. Differentiating the moment-generating function $n$ times with respect to the variable $t$ and evaluating at zero produces the $n$th moment of the distribution. We show the definition of the cumulant generating function and the calculation of the first cumulant:
$$K(t) = \log E[e^{tX}], \qquad \left.\frac{\partial K(t)}{\partial t}\right|_{t=0} = E[X]. \tag{3.19}$$
  • 37. The cumulants $\kappa_n$ can be obtained as the coefficients of a power series, using a Maclaurin series. This makes the series centered at zero, and is consistent with the calculation of the cumulants presented above: differentiating the first expression $n$ times and evaluating the result at zero. The series expansion reads:
$$K(t) = \sum_{n=1}^{\infty} \kappa_n \frac{t^n}{n!} = \mu t + \sigma^2 \frac{t^2}{2} + \ldots \tag{3.20}$$
where $\kappa_n$ represents the $n$th cumulant in the series.
For our purposes, we will find it convenient to use cumulants to determine the relation between one-period entropy and the conditional distribution of $\log m_{t,t+1}$. The corresponding cumulant generating function is:
$$K_t(s) = \log E_t[e^{s \log m_{t,t+1}}], \tag{3.21}$$
and its corresponding series expansion:
$$K_t(s) = \sum_{j=1}^{\infty} \kappa_{jt} \frac{s^j}{j!}. \tag{3.22}$$
If we evaluate the generating function at $s = 1$ and compute the first cumulant from its derivative at $s = 0$, we can see that:
$$K_t(1) = \log E_t[e^{\log m_{t,t+1}}] = \log E_t[m_{t,t+1}]$$
$$\kappa_{1t} = \left.\frac{\partial K_t(s)}{\partial s}\right|_{s=0} = \left.\frac{\partial}{\partial s}\right|_{s=0} \log E_t[e^{s \log m_{t,t+1}}] = E_t \log m_{t,t+1}$$
We now substitute these expressions in the definition of conditional entropy (3.13) to obtain:
$$L_t(m_{t,t+1}) = \log E_t m_{t,t+1} - E_t \log m_{t,t+1} = K_t(1) - \kappa_{1t} = \frac{\kappa_{2t}(\log m_{t,t+1})}{2!} + \frac{\kappa_{3t}(\log m_{t,t+1})}{3!} + \frac{\kappa_{4t}(\log m_{t,t+1})}{4!} + \ldots \tag{3.23}$$
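The expansion (3.23) can be checked numerically on a toy example. The discrete distribution of $\log m_{t,t+1}$ below is hypothetical and chosen only for illustration; the sketch computes the entropy from its definition (3.13) and compares it with the cumulant series truncated after the fourth term, using the standard expressions of the second to fourth cumulants in terms of central moments.

```python
import math

# Hypothetical one-period distribution of log m (values and probabilities).
log_m = [-0.30, -0.05, 0.10]
prob = [0.2, 0.5, 0.3]


def expect(values):
    """Expectation of the given values under the probabilities above."""
    return sum(p * v for p, v in zip(prob, values))


# Entropy from its definition (3.13): log E[m] - E[log m].
entropy = math.log(expect([math.exp(x) for x in log_m])) - expect(log_m)

# Cumulants of log m obtained from its central moments.
mean = expect(log_m)
mu2 = expect([(x - mean) ** 2 for x in log_m])
mu3 = expect([(x - mean) ** 3 for x in log_m])
mu4 = expect([(x - mean) ** 4 for x in log_m])
kappa2 = mu2                  # variance
kappa3 = mu3                  # third cumulant
kappa4 = mu4 - 3 * mu2 ** 2   # fourth cumulant

# Right-hand side of (3.23), truncated after the fourth cumulant.
truncated = kappa2 / 2 + kappa3 / 6 + kappa4 / 24

print(entropy, truncated, kappa2 / 2)
```

For this distribution the truncated series is very close to the exact entropy, while the variance term alone is less accurate, which reflects the contribution of the higher-order cumulants.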
  • 38. It is easily verifiable through an examination of the cumulant generating function that the first cumulant is the mean, followed by the variance, skewness, kurtosis and so forth. This means that if the distribution of $\log m_{t,t+1}$ is normal, then any cumulant of order $j \ge 3$ will be zero. Of course, we do not expect pricing kernels to be distributed in such a fashion, and because of this we need a convenient mathematical form to measure deviations from normality. We now define horizon dependence as the difference in entropy over horizons of $n$ and one, respectively:
$$H(n) = I(n) - I(1) = \frac{1}{n} E L_t(m_{t,t+n}) - E L_t(m_{t,t+1}). \tag{3.24}$$
If all the one-period pricing kernels within an $n$-period horizon are independent and identically distributed, then we would expect the entropy of the $n$-period kernel to be just a scaled version of that of a one-period kernel. This would mean that the entropy per period would be the same for $I(1)$ and $I(n)$:
$$E L_t(m_{t,t+n}) = n\, E L_t(m_{t,t+1}) \;\Rightarrow\; H(n) = 0 \tag{3.25}$$
Again, we find room for comparison with physics, as this is a generalization of a well-known characteristic of random walks, often used to model diffusion problems like Brownian motion. This characteristic is the proportionality of variance to the time interval. For a particle in a known fixed position at $t = 0$, the central limit theorem tells us that after a large number of independent steps in the random walk, the walker's position is distributed according to a normal distribution of total variance. However, as we have anticipated, this is far from being the case in reality, as history has shown us that horizon dependence reflects important departures from the independent and identically distributed case. However unfortunate this is for the sake of predictability, it allows us to have some measure of the pricing kernel's dynamics and, even more importantly, it is observable through its connection to bond yields.
We will rewrite conditional entropy incorporating bonds into its definition to make this connection explicit.
$$L_t(m_{t,t+n}) = \log E_t m_{t,t+n} - E_t \log m_{t,t+n} = \log q^n_t - E_t \sum_{j=1}^{n} \log m_{t+j-1,t+j}, \tag{3.26}$$
  • 39. where $q^n_t$ is the price of a bond with an $n$-period horizon. Entropy is therefore:
$$I(n) = \frac{1}{n} E \log q^n_t - E \log m_{t,t+1}. \tag{3.27}$$
Now we need to relate bond prices to their yields, and for this we will make use of both the time-horizon expression and the conditional entropy. We remember that the price of a bond given by the present value formula is (considering a payment of one unit):
$$q^n_f = \frac{1}{(1+r)^n} \;\Rightarrow\; \log q^n_f = -n \log(1+r) \equiv -n\,y \;\Rightarrow\; y^n_t = -\frac{1}{n} \log q^n_t, \tag{3.28}$$
where $y$ denotes the logarithm of the gross yield. As we can see, we have computed the logarithm of the price, as the entropy definition requires, obtaining a relationship with its yield. We can now take these yields and plug them into the entropy equation to obtain:
$$H(n) = -E(y^n_t - y^1_t). \tag{3.29}$$
This last expression for horizon dependence as a function of yield spreads tells us a lot. It tells us that if the mean yield curve increases, horizon dependence will be negative, as the yield for an $n$-period horizon is larger than the single-period one. A positive horizon dependence would reveal a decreasing yield curve. The observation of excess returns for stocks has shown that their one-period entropy is larger than that of bonds, which is typically less than 0.1 % in most cases for observable time horizons. These bounds are used by Backus et al. as diagnostics for candidate pricing kernels.
Thus far, we have proved the existence of a pricing kernel and, with the help of entropy, suggested a possible measurement instrument for it. Now, we will proceed to study, for the first time in (the revised) literature, the case of entropy for the Mexican debt market.
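The link between bond prices, yields and horizon dependence in (3.28) and (3.29) can be sketched as follows. The bond prices below are hypothetical and stand in for mean prices of zero-coupon bonds paying one unit; the function and variable names are ours.

```python
import math


def log_yield(price, n):
    """Yield of an n-period zero-coupon bond paying one unit, as in (3.28)."""
    return -math.log(price) / n


# Hypothetical mean prices, indexed by maturity in periods.
prices = {1: 0.99, 4: 0.955, 12: 0.86}

yields = {n: log_yield(p, n) for n, p in prices.items()}

# Horizon dependence (3.29): H(n) = -(y^n - y^1), written as y^1 - y^n.
horizon_dependence = {n: yields[1] - y for n, y in yields.items()}

for n in sorted(prices):
    print(n, round(yields[n], 5), round(horizon_dependence[n], 5))
```

With these illustrative prices the yield curve slopes upward, so the horizon dependence is negative and decreases with maturity.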
  • 40. Chapter 4
Case study for Mexico
After a long and winding development of asset-pricing theory and one of its latest applications, we present an empirical study in the hope that it will provide an intuitive perspective on the concepts discussed. Our primary interest is to show that the yield curve for bonds in Mexico is ascending and, as such, has a negative horizon dependence. We will show that bonds have small excess returns, that is, smaller than most equity indexes.
The data used for this very brief case study was provided by Dr. Torres as part of his personal archive and is at the disposal of the author should the reader want to consult it further. This database is extensive both in content and in variety, for it is a very representative sample of the Mexican debt market, showing records for treasury bills (CETES), interbank offering rates in Mexico (TIIE), M-bonds (coupon bonds), etc. So far we have been discussing bonds without really defining them, so even though this issue should probably have been addressed before, it is relevant to start off with a brief background of this financial instrument's development in our country.
Our database has some chaotic points throughout time, as two major financial crises impacted the prices and yields of bonds in our country. Although the author was just a toddler at the time, the financial crisis of 1994 may be remembered by the reader. This economic crisis with international repercussions was primarily ignited by a lack of international reserves, causing the devaluation of the Mexican peso during the first days of president Ernesto Zedillo's administration. A few weeks before the beginning of the process of the peso's devaluation, president Clinton requested from Congress the authorization
  • 41. of a credit line of 20 billion dollars for the Mexican government, so it could guarantee full compliance with its obligations denominated in this currency.
Although it is beyond the scope of this work, it is worth saying that among the causes involved in this economic crisis was the implementation of the North American Free Trade Agreement, which made Mexico an attractive place for investment. President Salinas de Gortari took advantage of this situation and used it to finance his administration through the issuance of CETES. These bonds had a short maturity term and were bought and sold in pesos, but were protected against devaluations by being quoted in dollars. This meant that when these instruments reached maturity, the holder was paid at the spot exchange rate, which discouraged investors from buying dollars, as there was already an instrument with equal or higher yields. This drew investors towards these instruments in enormous numbers, which led to an overvalued currency. This was a major disincentive to real investment, which caused a decrease in trade and exports.
We can see another spike in rates during the 1998 Russian financial crisis. We will not go into studying the latter, but it is worth noting that one of the mechanisms to stabilize the Russian market involved swapping out enormous volumes of maturing GKOs (Russia's government bonds) into long-term European bonds that would later be issued (and bought by many European countries). The following figure is very revealing of these situations:
Figure 4.1: Average CETES rates over time, generated using the mentioned database. Two points are signaled: MTC and RFC label the Mexican Tequila Crisis and the Russian financial crisis, respectively. After these two we have seen a substantial decrease in rates, which have stabilized over the last decade.
  • 42. 4.1 General overview of the Mexican treasury bills
Mexican federal treasury bills, commonly referred to as CETES, are the oldest debt obligation issued by the Federal Government. They were issued for the first time in January 1978 and have constituted a fundamental pillar of the Mexican capital market ever since. These titles belong to the family of zero-coupon bonds, which pay no interest to the holder except for the natural discount at which they are offered. CETES are offered at a price below their nominal value; they are issued at a substantial discount to par value, so that the interest is effectively rolled up to maturity (and usually taxed as such); the bondholder receives the full principal amount on the redemption date. Banco de México issues these bonds at four maturity terms: 28 days, 91 days, 182 days, and 364 days. However, the Bank has also issued these bills at terms as short as one week and as long as 728 days.
The primary way of issuing these titles is through weekly auctions in which participants present bids for the amount they wish to acquire as well as the interest rate they are willing to pay. This is often called the primary market, and its historical data can be found in the Bank of Mexico's records with a weekly resolution. The secondary market, however, allows consumers to buy and sell these titles at will, subjecting their price to regular supply and demand. These bills are often used as underlying assets in the derivatives and futures market.
Banco de México, as the financial agent of the Mexican Federal Government, has the mandate to carry out weekly primary auctions of its securities according to a predetermined calendar as well as a coordinated strategy released by the Secretariat of Finance and Public Credit (Hacienda) each quarter. These exercises are held as Dutch auctions, where there are multiple winning bidders.
Because of this, securities are allocated in ascending order of the corresponding discount rates bid, without exceeding the maximum amount indicated in the offering. This type of auction encourages more aggressive bids from intermediaries, thus promoting lower interest rates. Auctions are held every Tuesday at 10:00 AM; the auction is announced on the preceding Friday. Banks and other financial institutions need to be granted authorization in order to participate, and must adhere to a set of rules set by the Bank of Mexico. After the bidding, results are published at 11:30 AM, and settled
  • 43. on Thursday. The following announcement was published in newspapers in 2003.
Figure 4.2: Issuance of CETES in 2003. Four different bond packages are issued with 1, 3, 6 and 12 month maturities. Taken from [10]
4.2 Methodology
This methodology section develops the basic ideas behind a yield curve and why they are relevant to our study. This was made evident in the last equation we derived for entropy, but we will try to make it as intuitive as possible. The methodology for constructing this curve is trivial, so the important part of our study is understanding what this curve tells us and how it is related to entropy.
Yield curves are also called term structures of interest rates, and we will use these names interchangeably. The term structure of interest rates shows the relationship between interest rates or bond yields and different maturities. It is an important element of an economy, as it reflects the expectations of market participants about future changes in interest rates and their assessment of monetary policy conditions. Here we purposely used the term assessment in order to make explicit that the impression
  • 44. people have of the well-being of the economy is directly related to how much they value an investment. A poor expectation of the country's future will result in a low valuation of longer-period bonds, as they will be riskier from the consumer's perspective.
The convexity of a yield curve is negative if we value longer-period bonds with a higher rate; such curves are often called normal, in that they represent the expected shift in yields as maturity dates extend out in time. A normal curve is most commonly associated with positive economic growth. In the opposite case we speak of an inverted yield curve, which means that short-term bonds have higher rates than long-period ones. History has shown us that inversions of the yield curve have preceded many of the U.S. recessions and, due to this correlation, an inversion is also a prediction of lower interest rates in the future. A flat curve is the last possibility for our term structure of interest rates. This would mean that investors think that interest rates will remain the same in the future. Graphically:
Figure 4.3: Different possibilities of yield curves approximated with logarithmic, inverse and constant functions for normal, inverted and flat curves, respectively.
As we have discussed before, bonds prove to be the best ally we have to reflect on the dynamics of the pricing kernel. In more concrete terms, and with our developed knowledge, we know that the cash flows of bonds are fixed (ideally), so their prices and respective yields and returns are functions of nothing else than the pricing kernel. Our mathematical knowledge and the proofs we have developed in this document show that this pricing kernel must exist, but it cannot be observed directly. We can think of the pricing of a bond as a “reverse engineering” activity, in which properties of the pricing kernel are inferred from bond prices.
To construct the yield curve, we take the arithmetic mean of all reported rates for each term and plot the means in order of increasing term.
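The construction just described can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually used for the study: the function name `mean_yield_curve`, the row layout (one dictionary per day, keyed by term in days) and the three sample rows are all hypothetical. Days lacking any of the four rates are discarded before averaging, which mirrors how the sample for this chapter was restricted.

```python
from statistics import mean

TERMS = (28, 91, 182, 364)  # CETES maturities in days


def mean_yield_curve(rows):
    """Average the reported rate for each term over complete days only.

    rows: iterable of dicts mapping term in days -> rate (None if missing).
    Returns (curve, n_days): curve is a list of (term, mean rate) pairs in
    increasing term order, n_days is the number of complete days used.
    """
    complete = [r for r in rows if all(r.get(t) is not None for t in TERMS)]
    if not complete:
        raise ValueError("no day reports all four rates")
    curve = [(t, mean(r[t] for r in complete)) for t in TERMS]
    return curve, len(complete)


# Hypothetical daily records, made up purely for illustration.
sample_rows = [
    {28: 0.070, 91: 0.072, 182: 0.075, 364: 0.076},
    {28: 0.072, 91: 0.074, 182: None, 364: 0.078},  # incomplete, skipped
    {28: 0.074, 91: 0.076, 182: 0.077, 364: 0.080},
]

curve, n_days = mean_yield_curve(sample_rows)
for term, rate in curve:
    print(f"{term:>3} days: {rate:.4f}")
print("days used:", n_days)
```

Plotting the resulting pairs with the term on the horizontal axis gives the mean yield curve.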
  • 45. 4.3 Results
We present our results for CETES at 28-, 91-, 182- and 364-day maturities:
Table 4.1: Descriptive statistics generated for the mentioned database. This is a daily record of bond rates from August 14, 1995 to September 26, 2016, when the data was queried. Some rows lacked a rate for one or more terms, so we kept only the days on which all four rates were reported, giving a sample size of 4875 days.
Descriptive statistics for CETES rates (1995-2016)
Term                 28 days       91 days       182 days      364 days
Mean                 0.108600401   0.11308304    0.115932446   0.118480772
Standard Error       0.001362825   0.001417789   0.001431194   0.001443432
Median               0.073007      0.0745        0.075683      0.076
Mode                 0.071         0.0725        0.075         0.0755
Standard Deviation   0.095154072   0.098991742   0.099927666   0.100782151
Sample Variance      0.009054297   0.009799365   0.009985538   0.010157042
Kurtosis             3.419469465   2.904737779   2.441350969   2.203240772
Skewness             1.821707186   1.73184007    1.64887389    1.599187532
The data show a logarithmic trend. The dotted line in Figure 4.4 is a logarithmic fit, which results in R(P) = 0.0039 log(P) + 0.0957, where R is the expected rate as a function of the term P, with a coefficient of determination R^2 = 0.99971.
Figure 4.4: We observe a clear logarithmic trend in the data. The equation is shown in the graph along with the respective error bars for each average.
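As a check on the reported fit, the sketch below regresses the four mean rates of Table 4.1 on the logarithm of the term by ordinary least squares. The text does not state how the fit was produced, so three things here are assumptions: that log denotes the natural logarithm, that P is measured in days, and that the fit uses only the four means. The function name `log_fit` is hypothetical.

```python
import math

# Terms in days and mean rates copied from Table 4.1.
terms = [28, 91, 182, 364]
mean_rates = [0.108600401, 0.11308304, 0.115932446, 0.118480772]


def log_fit(p, r):
    """Least-squares fit of r = slope * ln(p) + intercept.

    Returns (slope, intercept, r_squared).
    """
    x = [math.log(v) for v in p]  # natural logarithm (assumption)
    n = len(x)
    x_bar = sum(x) / n
    r_bar = sum(r) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((ri - r_bar) ** 2 for ri in r)
    sxy = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, r))
    slope = sxy / sxx
    intercept = r_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # valid for simple linear regression
    return slope, intercept, r_squared


slope, intercept, r_squared = log_fit(terms, mean_rates)
print(f"R(P) = {slope:.4f} ln(P) + {intercept:.4f}, R^2 = {r_squared:.5f}")
```

Under these assumptions the coefficients agree with the reported equation to the precision given in the text, which suggests that the fit was indeed made on the natural logarithm of the term in days.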
  • 46. Recalling the expression for horizon dependence in terms of bond yields that we derived in the last section, we have:
H(n) = -E[y^n - y^1],    (4.1)
where y^n is the yield on a bond of maturity n. In the independent and identically distributed case we have H(n) = 0, and we would expect a flat mean yield curve. Just as we anticipated, this is (thankfully) not the case. If the mean yield curve slopes upward, then H(n) is negative and slopes downward. We have shown that the average slopes of yield curves are mirrored by the behavior of entropy over different time horizons. In the particular case of bonds, a notable feature seems to be the convergence of forward rates to a constant value.
Figure 4.5: For each time horizon between terms we observe a different average slope. This means that pricing kernels are not independent and identically distributed and that entropy varies with the investment horizon.
As with entropy, we can infer the magnitude of horizon dependence from asset prices: negative horizon dependence is associated with an increasing mean yield curve and positive mean yield spreads.
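Equation (4.1) can be evaluated directly from the sample means in Table 4.1, since over a common sample the mean of a difference is the difference of the means. The sketch below is only indicative and rests on two assumptions not stated in the text: the 28-day CETES is treated as the one-period bond, and the rates are used as reported, without converting them to per-period yields. The function name `horizon_dependence` is hypothetical.

```python
# Mean rates from Table 4.1, keyed by term in days.
MEAN_RATES = {
    28: 0.108600401,
    91: 0.11308304,
    182: 0.115932446,
    364: 0.118480772,
}


def horizon_dependence(mean_rates, base_term=28):
    """Return {n: -(E[y^n] - E[y^1])} for every term longer than base_term.

    base_term plays the role of the one-period bond (assumption: 28 days).
    """
    y1 = mean_rates[base_term]
    return {
        n: -(y - y1)
        for n, y in sorted(mean_rates.items())
        if n > base_term
    }


H = horizon_dependence(MEAN_RATES)
for n, h in H.items():
    print(f"H({n} days) = {h:.6f}")
```

Every value is negative and grows in magnitude with the term, which is the negative horizon dependence described above.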
  • 47. 4.4 Conclusions and future work
It has been a long journey from the basic pricing equation we first presented to where we stand now. We have achieved what we hope is an intuitive perspective of asset pricing, from its very foundations to a practical example for our country’s debt market. A lot of work remains to be done, as we have only uncovered the tip of the iceberg. The first and most natural step is to extend this analysis to other financial instruments and examine the possibility of measuring their entropy. An interesting exercise would be to compare the horizon dependence of the national IPC index with that of bonds, in order to sort out risk premiums and validate the upper bound we discussed in the last chapter. While examining references for this document, we found a more recent work on the concept of coentropy as a measure of dependence between variables. Studying this article could take us further in our study of asset pricing and probably give us a better grasp of how these powerful tools can be used for the proper valuation of bonds. As for the theoretical part of this work, we believe there is always room for improvement when it comes to making a formal proof of a mathematical concept. Overall, this semester-long experience has proved challenging and motivating enough to make the author seriously consider pursuing a graduate degree in economic theory, which is probably the biggest and most ambitious piece of future work this endeavour has left pending. This work is probably (and also hopefully) the last the author will present during his undergraduate degree. If one can be allowed some sentimentality, I want to thank every single person who has been there for me during this incredible period. Like the pricing kernel, I am just the multiplication of each of your individual efforts propelling me forward.
  • 48. Bibliography
[1] Thomas S. Kuhn. Die Struktur wissenschaftlicher Revolutionen. Suhrkamp, 1967.
[2] Fernando Alvarez and Urban J. Jermann. Using asset prices to measure the persistence of the marginal utility of wealth. Econometrica, 73(6):1977–2016, 2005. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/3598756.
[3] David Backus, Mikhail Chernov, and Stanley Zin. Sources of Entropy in Representative Agent Models. 2011. doi: 10.3386/w17219.
[4] J.H. Cochrane. Asset Pricing: (Revised Edition). Princeton University Press, 2009. ISBN 9781400829132.
[5] Elias M. Stein and Rami Shakarchi. Functional analysis: introduction to further topics in analysis. Princeton University Press, 2011.
[6] California Institute of Technology. Notes on functional analysis. URL http://people.hss.caltech.edu/~kcb/notes/separatinghyperplane.pdf.
[7] Harold Gordon Eggleston. Convexity. Cambridge Tracts in Mathematics and Mathematical Physics, 1958.
[8] Kôsaku Yosida. Functional analysis. Springer-Verlag, 1980.
[9] Jon Lee. A first course in linear programming. Cambridge University Press, 2004.
[10] Banco de México. URL http://www.banxico.org.mx/.