Autorizada la entrega del proyecto del alumno:
Rubén Salgado Fernández

EL DIRECTOR DEL PROYECTO
Carlos Maté Jiménez
Fdo.:                Fecha: 12/06/2007

Vº Bº DEL COORDINADOR DE PROYECTOS
Claudia Meseguer Velasco
Fdo.:                Fecha: 12/06/2007
UNIVERSIDAD PONTIFICIA DE COMILLAS
ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)
INGENIERO EN ORGANIZACIÓN INDUSTRIAL

PROYECTO FIN DE CARRERA

Bayesian Regression System for Interval-Valued Data.
Application to the Spanish Continuous Stock Market

AUTOR: Salgado Fernández, Rubén
MADRID, Junio 2007
Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to carry out this project. With him, I have learnt not only about Statistics and research, but also how to enjoy them.

Special thanks to my parents. Their love and everything they have taught me in this life are what have made it possible for me to be the person I am now.

Thanks to my brothers, my sister and the rest of my family for their support and for the stolen time.

Thanks to Charo for putting up with my bad mood in the bad moments, for supporting me and for giving me the inspiration to go ahead.

Madrid, June 2007
Resumen

En los últimos años los métodos Bayesianos se han extendido y se han venido utilizando de forma exitosa en muchos y variados campos tales como marketing, medicina, ingeniería, econometría o mercados financieros. La principal característica que hace destacar al análisis Bayesiano de datos (ANBAD) frente a otras alternativas es que no sólo tiene en cuenta la información objetiva procedente de los datos del suceso en estudio, sino también el conocimiento anterior al mismo. Los beneficios que se obtienen de este enfoque son múltiples ya que, cuanto mayor sea el conocimiento de la situación, con mayor fiabilidad se podrán tomar las decisiones y éstas serán más acertadas. Pero no siempre todo han sido ventajas. El ANBAD, hasta hace unos años, presentaba una serie de dificultades que limitaban su desarrollo a los investigadores. Si bien la metodología Bayesiana existe como tal desde hace bastante tiempo, no se ha empezado a emplear de manera generalizada hasta los años 90. Esta expansión ha sido propiciada en gran parte por el avance en el desarrollo computacional y la mejora y perfeccionamiento de distintos métodos de cálculo, como los métodos de cadenas de Markov-Monte Carlo.

En especial, esta metodología se ha mostrado extraordinariamente útil en la aplicación a los modelos de regresión, ampliamente adoptados. En múltiples ocasiones en la práctica se dan situaciones en las que se requiere analizar la relación entre dos variables cuantitativas. Los dos objetivos fundamentales de este análisis serán, por un lado, determinar si dichas variables están asociadas y en qué sentido se da dicha asociación (es decir, si los valores de una de las variables tienden a aumentar -o disminuir- al aumentar los valores de la otra); y por otro, estudiar si los valores de una variable pueden ser utilizados para predecir el valor de la otra. Un modelo de regresión trata de proporcionar información sobre uno o varios sucesos a través de su relación con el comportamiento de otros. Con la metodología Bayesiana se permite incorporar el conocimiento del investigador al análisis, haciendo los resultados más precisos, ya que no se aíslan los resultados a los datos de una determinada muestra.
Por otro lado, se está empezando a aceptar que el siglo XXI en el ámbito de la estadística va a ser el siglo de la "estadística del conocimiento", a diferencia del anterior, que fue el de la "estadística de los datos". El concepto básico para construir dicha estadística es el de dato simbólico, y se han desarrollado métodos estadísticos para algunos tipos de datos simbólicos.

En la actualidad, la exigencia del mercado, la demanda y, en general, del mundo crece. Esto implica que cada vez sea mayor el deseo de predecir la ocurrencia de un evento o de poder controlar el comportamiento de ciertas cantidades con el menor error posible, con el fin de ofrecer mejores productos, obtener mayores beneficios o adelantos científicos y mejores resultados.

Sobre esta realidad, este proyecto trata de responder a dichas necesidades proporcionando una amplia documentación sobre varias de las técnicas más utilizadas y más punteras a día de hoy, como son el análisis Bayesiano de datos, los modelos de regresión y los datos simbólicos, y proponiendo diferentes técnicas de regresión. De igual forma se desarrollará una herramienta que permita poner en práctica todos los conocimientos adquiridos. Dicha aplicación estará dirigida al mercado bursátil español y permitirá al usuario utilizarla de manera sencilla y amigable. En cuanto al desarrollo de esta herramienta, se empleará uno de los lenguajes más novedosos y con más proyección del momento: R.

Se trata, por tanto, de un proyecto que combina las técnicas más novedosas y con mayor proyección, tanto en materia teórica, como es la regresión Bayesiana aplicada a datos de tipo intervalo, como en materia práctica, como es el empleo del lenguaje R.
Abstract

In recent years, Bayesian methods have spread and been successfully used in many different fields such as Marketing, Medicine, Engineering, Econometrics and Financial Markets. The main characteristic that makes Bayesian Data Analysis (BADAN) stand out against other alternatives is that it takes into account not only the objective information coming from the analysed event, but also the knowledge available before it. The benefits obtained from this approach are considerable, since the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of its computational difficulties. Its expansion has been favoured mainly by the advances in computing and by the improvement of different calculation methods, such as Markov chain Monte Carlo methods.

In particular, this Bayesian methodology has turned out to be extraordinarily useful when applied to regression models, which are widely adopted. There are many occasions in practice in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of such an analysis are, on the one hand, to determine whether the variables are associated and in what sense that association comes about (that is, whether the values of one of the variables tend to rise, or to decrease, when the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of others. With the Bayesian methodology it is possible to add the researcher's knowledge to the analysis, thus making the results more accurate, since they are not restricted to the data of one particular sample.

On the other hand, in the field of Statistics it is more and more accepted that the 21st century will be the century of the "Statistics of knowledge", in contrast to the last one, which was that of the "Statistics of data". The most basic concept on which to build such Statistics is the symbolic datum; furthermore, statistical methods have been developed for some types of symbolic data.

Nowadays, the requirements of the market, and the demands of the world in general, keep growing. This implies a continuously increasing desire to predict the occurrence of an event, or to be able to control the behaviour of certain quantities with the minimum error, with the aim of offering better products, obtaining more benefits or scientific advances and better outcomes.

Under this frame, this project tries to respond to such needs by offering an extensive documentation about several of the most widely applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Similarly, a tool has been developed that allows the reader to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user work with it easily. As far as the development of this tool is concerned, one of the most innovative and promising languages of the moment has been used: R.

So, the project combines the techniques that are most innovative and with the most projection, both in theoretical questions, such as Bayesian regression applied to interval-valued data, and in practical questions, such as the employment of the R language.
List of Figures

1.1 Project Work Packages
2.1 Univariate Normal Example
6.1 Interval time series
7.1 Classical Regression with single values in training set
7.2 Classical Regression with single values in testing set
7.3 Classical Regression with interval-valued data
7.4 Centre Method (2000) in training set
7.5 Centre Method (2000) in testing set
7.6 Centre and Radius Method in training set
7.7 Centre and Radius Method in testing set
7.8 Bayesian Centre and Radius Method in testing set
7.9 Classical Regression with single values in training set
7.10 Classical Regression with single values in testing set
7.11 Centre Method (2000) in training set
7.12 Centre Method (2000) in testing set
7.13 Centre and Radius Method in training set
7.14 Centre and Radius Method in testing set
7.15 Bayesian Centre and Radius Method in testing set
9.1 BARESIMDA MDI
10.1 Interface between BARESIMDA and R
10.2 Interface between BARESIMDA and Excel
10.3 Logical Architecture
C.1 Load Data Menu
C.2 Select File Dialog
C.3 Display Loaded Data
C.4 Define New Variable
C.5 Enter New Variable Name
C.6 Display New Variable
C.7 Edit Variable
C.8 Select Variable to Be Edited
C.9 Enter New Name
C.10 Confirmation
C.11 New Row Data
C.12 Type Data
C.13 Look and Feel Menu
C.14 Look and Feel Styles
C.15 New Look and Feel
C.16 Type of User Menu
C.17 Select Type of User
C.18 Non-Symbolic Classical Regression Menu
C.19 Select Non-Symbolic Variables in Simple Regression
C.20 Brief Report
C.21 Analysis Options in Non-Symbolic Classical Simple Regression
C.22 New Prediction in Non-Symbolic Classical Simple Regression
C.23 Graphics Options in Non-Symbolic Classical Simple Regression
C.24 Save Options in Non-Symbolic Classical Simple Regression
C.25 Non-Symbolic Classical Multiple Regression Menu
C.26 Select Variables in Non-Symbolic Classical Multiple Regression
C.27 Analysis Options in Non-Symbolic Classical Multiple Regression
C.28 Graphics Options in Non-Symbolic Classical Multiple Regression
C.29 Save Options in Non-Symbolic Classical Multiple Regression
C.30 Intercept in Non-Symbolic Classical Multiple Regression
C.31 Non-Symbolic Bayesian Simple Regression Menu
C.32 Select Variables in Non-Symbolic Bayesian Simple Regression
C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression
C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression
C.35 Save Options in Non-Symbolic Bayesian Simple Regression
C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression
C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression
C.39 Non-Symbolic Bayesian Multiple Regression Menu
C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression
C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression
C.42 Save Options in Non-Symbolic Bayesian Multiple Regression
C.43 Model Options in Non-Symbolic Bayesian Multiple Regression
C.44 Symbolic Classical Simple Regression Menu
C.45 Select Variables in Symbolic Classical Simple Regression
C.46 Analysis Options in Symbolic Classical Simple Regression
C.47 Graphics Options in Symbolic Classical Simple Regression
C.48 Symbolic Classical Multiple Regression Menu
C.49 Select Variables in Symbolic Classical Multiple Regression
C.50 Analysis Options in Symbolic Classical Multiple Regression
C.51 Graphics Options in Symbolic Classical Multiple Regression
C.52 Symbolic Bayesian Simple Regression
C.53 Select Variables in Symbolic Bayesian Simple Regression
C.54 Analysis Options in Symbolic Bayesian Simple Regression
C.55 Graphics Options in Symbolic Bayesian Simple Regression
C.56 Model Options in Symbolic Bayesian Simple Regression
C.57 Symbolic Bayesian Multiple Regression Menu
C.58 Select Variables in Symbolic Bayesian Multiple Regression
C.59 Graphics Options in Symbolic Bayesian Multiple Regression
List of Tables

2.1 Distributions in Bayesian Data Analysis
2.2 Comparison between Univariate and Multivariate Normal
2.3 Conjugate distributions for other likelihood distributions
4.1 Bayes Factor Interpretation
4.2 Sensitivity Summary I
4.3 Sensitivity Summary II
5.1 Multiple and Simple Regression Comparison
5.2 Sensitivity analysis of parameter β
5.3 Sensitivity analysis of parameter σ²
5.4 Classical and Bayesian regression comparison
5.5 Main Prior Distributions Summary
5.6 Main Posterior Distributions Summary
5.7 Prior and Posterior Parameters Summary
5.8 Main Posterior Predictive Distributions Summary
6.1 Multivalued Data Example
6.2 Modal-multivalued Example
7.1 Error Measures for Classical Regression with single values
7.2 Error Measure for Centre Method (2000)
7.3 Error Measure for Centre Method (2002)
7.4 Error Measures for Centre and Radius Method
7.5 Error Measures in Bayesian Centre and Radius Method
7.6 Error Measures for Classical Regression with single values
7.7 Error Measure for Centre Method (2000)
7.8 Error Measure for Centre Method (2002)
7.9 Error Measures for Centre and Radius Method
7.10 Error Measures in Bayesian Centre and Radius Method
11.1 Estimated material costs
11.2 Amortization Costs
11.3 Summarized Budget
Contents

Acknowledgements
Resumen
Abstract
List of Figures
List of Tables
Contents

1 Introduction
  1.1 Project Motivation
  1.2 Objectives
  1.3 Methodology

2 Bayesian Data Analysis
  2.1 What is Bayesian Data Analysis?
  2.2 Bayesian Analysis for Normal and other distributions
    2.2.1 Univariate Normal distribution
    2.2.2 Multivariate Normal distribution
    2.2.3 Other distributions
  2.3 Hierarchical Models
  2.4 Nonparametric Bayesian

3 Posterior Simulation
  3.1 Introduction
  3.2 Markov chains
  3.3 Monte Carlo Integration
  3.4 Gibbs sampler
  3.5 Metropolis-Hastings sampler and its special cases
    3.5.1 Metropolis-Hastings sampler
    3.5.2 Metropolis sampler
    3.5.3 Random-walk sampler
    3.5.4 Independence sampler
  3.6 Importance sampling

4 Sensitivity Analysis
  4.1 Introduction
  4.2 Bayes Factor
  4.3 Alternative Stats to Bayes Factor
  4.4 Highest Posterior Density Intervals
  4.5 Model Comparison Summary

5 Regression Analysis
  5.1 Introduction
  5.2 Classical Regression Model
  5.3 The Bayesian Approach
  5.4 Normal Linear Regression Model subject to inequality constraints
  5.5 Normal Linear Regression Model with Independent Parameters
  5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation
    5.6.1 Heteroscedasticity
    5.6.2 Correlation
  5.7 Models Summary

6 Symbolic Data
  6.1 What is symbolic data analysis?
  6.2 Interval-valued variables
  6.3 Classical regression analysis with Interval-valued data
  6.4 Bayesian regression analysis with Interval-valued data

7 Results
  7.1 Spanish Continuous Stock Market data sets
  7.2 Direct Relation between Variables
  7.3 Uncorrelated Variables

8 A Guide to Statistical Software Today
  8.1 Introduction
  8.2 Commercial Packages
    8.2.1 The SAS System for Statistical Analysis
    8.2.2 Minitab
    8.2.3 BMDP
    8.2.4 SPSS
    8.2.5 S-PLUS
    8.2.6 Others
  8.3 Public License Packages
    8.3.1 R
    8.3.2 BUGS
  8.4 Analysis Packages with Statistical Libraries
    8.4.1 Matlab
    8.4.2 Mathematica
    8.4.3 Others
  8.5 Some General Languages with Statistical Libraries
    8.5.1 Java
    8.5.2 C++
  8.6 Developed Software Tool: BARESIMDA

9 Software Requirements Specification
  9.1 Purpose
  9.2 Intended Audience
  9.3 Functionality
    9.3.1 Classical Regression with crisp data
    9.3.2 Classical Regression with interval-valued data
    9.3.3 Bayesian Regression with crisp data
    9.3.4 Bayesian Regression with interval-valued data
    9.3.5 Data Manipulation
    9.3.6 Portability
    9.3.7 Maintainability
  9.4 External Interfaces
    9.4.1 User Interfaces
    9.4.2 Software Interfaces

10 Software Architecture Study
  10.1 Hardware/Software Architecture
  10.2 Logical Architecture

11 Project Budget
  11.1 Engineering Costs
  11.2 Investment and Elements Costs
    11.2.1 Summarized Budget

12 Conclusions
  12.1 Bayesian Regression applied to Symbolic Data
  12.2 BARESIMDA Software Tool
  12.3 Future Developments
  12.4 Summary

A Probability Distributions
  A.1 Discrete Distributions
    A.1.1 Binomial
    A.1.2 Geometric
    A.1.3 Poisson
  A.2 Continuous Distributions
    A.2.1 Uniform
    A.2.2 Univariate Normal
    A.2.3 Exponential
    A.2.4 Gamma
    A.2.5 Inverse-Gamma
    A.2.6 Chi-square
    A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-square
    A.2.8 Univariate Student-t
    A.2.9 Beta
    A.2.10 Multivariate Normal
    A.2.11 Multivariate Student-t
    A.2.12 Wishart
    A.2.13 Inverse-Wishart

B Installation Guide
  B.1 From source folder
  B.2 From installer

C User's Guide
  C.1 Data Entry
    C.1.1 Loading an Excel file
    C.1.2 Defining a new variable
    C.1.3 Editing an existing variable
    C.1.4 Deleting an existing variable
    C.1.5 Typing in a new data row
    C.1.6 Deleting an existing data row
    C.1.7 Modifying existing data
  C.2 Configuration
    C.2.1 Setting the look & feel
    C.2.2 Selecting the type of user
  C.3 Non-Symbolic Regression
    C.3.1 Simple Classical Regression
    C.3.2 Multiple Classical Regression
    C.3.3 Simple Bayesian Regression
    C.3.4 Multiple Bayesian Regression
  C.4 Symbolic Regression
    C.4.1 Simple Classical Regression
    C.4.2 Multiple Classical Regression
    C.4.3 Simple Bayesian Regression
    C.4.4 Multiple Bayesian Regression

D Obtaining and Installing R
  D.1 Binary distributions
  D.2 Installation from source
  D.3 Package installation

E Obtaining and Installing Java Runtime Environment
  E.1 Microsoft Windows
  E.2 Linux
    E.2.1 Installation of Self-Extracting Binary
    E.2.2 Installation of RPM File
  E.3 UNIX

Bibliography
Chapter 1
Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models, often hierarchical models, used to describe available data are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone. It is the logic of contemporary society and science. According to [Rupp04], whether to apply the Bayesian methodology is no longer under discussion; the question is when it has to be done.

Bayesian methods have matured and improved in several ways during the last fifteen years.
They are increasingly attractive to researchers, and successful applications of Bayesian data analysis have appeared in many different fields, including Actuarial Science, Biometrics, Finance, Market Research, Marketing, Medicine, Engineering and Social Science. It is not only that the Bayesian approach produces appropriate answers to many current important problems, but also that there is an evident need for it, given the inapplicability of conventional statistics to many of them.

Thus, the main characteristic offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem to be handled. This means obtaining better and more reliable results as the prior knowledge becomes more and more precise. But Bayesian Statistics was held back until the mid 1990s by its computational complexity. Since then, it has undergone a great expansion, favoured by the development and improvement of different computational methods in this field, such as Markov chain Monte Carlo.

This methodology has been shown to be extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets researchers incorporate their knowledge into the analysis, improving the results, since they no longer depend only on the sampling data.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent to the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we have in mind, which are limited and hard to extract with classical Statistics. According to [Bill02], this responds to the current need to move from a Statistics of data in the past century to a Statistics of knowledge in the 21st century.
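To make the idea of interval-valued symbolic data concrete, the following minimal R sketch (not part of the original text; every value in it is made up) aggregates a hypothetical series of daily closing prices into one [min, max] interval per month, which is exactly the kind of summarization referred to above.

```r
# Illustrative sketch: summarizing classical single-valued data into
# interval-valued symbolic data (all numbers are hypothetical).
set.seed(1)
daily <- data.frame(
  month = rep(sprintf("2007-%02d", 1:6), each = 20),     # six months, 20 trading days each
  close = 14 + cumsum(rnorm(120, mean = 0, sd = 0.1))    # made-up daily closing prices
)

# One row per month: the interval [lower, upper] replaces ~20 single values
intervals <- aggregate(close ~ month, data = daily,
                       FUN = function(x) c(lower = min(x), upper = max(x)))
intervals <- do.call(data.frame, intervals)
print(intervals)
```

Each row of `intervals` is one interval-valued observation of the kind used by the regression methods for interval-valued data discussed later in the thesis.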
Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.

Dealing with this outlook, this project is intended to respond to those requirements by providing a wide and exhaustive documentation about some of the currently most used and advanced techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this document, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to put into practice and check all the acquired knowledge.

Therefore, this is a project that combines the most recent techniques with major future implications in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other one employed to make the computations.

Regarding a more personal motivation, several factors were taken into consideration by the author when accepting this project:

  • A great challenge: it is an ambitious project with a high technical complexity related to both its theoretical basis and its technological basis. This represents a very good letter of introduction for entering the labour market.
  • Good timing: the project was designed to be finished before June 2007, which means being able to finish the degree in June and to enter the labour market in September.
  • Some very interesting issues: on the one hand, it deals with the ever-present need of forecasting and modelling observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal interests.
  • A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivation factor.
  • The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.
  • A research scholarship: the possibility of being in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.
1.2 Objectives

This project aims to achieve the following goals:

  • To provide a wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. From this starting point, documentation about Bayesian regression will be developed, as well as the software tool designed.
  • To build a software tool to fit Bayesian regression models to interval-valued data, finding out the most efficient way to design the graphical user interface. This must be as user-friendly as possible.
  • To find out, from the tests carried out with the application, the most efficient way to offer the system to future clients.
  • To design a survey to measure the quality of the tool and users' satisfaction.
  • Possibly, to write an article for a scientific journal.

1.3 Methodology

As the title of the project indicates, the final purpose is the development of an application aimed at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied to the Bayesian methodology, and symbolic data.

Within this phase, Bayesian data analysis will be studied first, trying to synthesize and extract its most important elements. Special attention will be given to posterior simulation and computational algorithms. Then, regression models will be treated, quickly reviewing the classical approach before going deeper into the different Bayesian regression models, applying a great part of what was explained about the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage refers to the development of the software application, employing an incremental methodology for programming and testing iterative prototypes.
This methodology has been considered the most suitable for this project, since it will let us introduce successive models into the application.

The following figure shows the structure of the work packages the project is divided into:

Figure 1.1: Project Work Packages
Chapter 2
Bayesian Data Analysis

2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data.

Regarding data analysis, it can be divided into two kinds of analysis: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of a statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from the data resulting from a sampling through classical methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge which the researcher has about the problem being treated. Since it is not worthwhile to review the frequentist approach in full here, an extended revision of the different classical methods related to it can be found in [Mont02].

Data Analysis
  • Exploratory
  • Confirmatory
      • Frequentist
      • Bayesian
As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

  • To set up a full probability model, through a joint probability distribution for all observable and unobservable quantities in the problem.
  • To condition on the observed data, obtaining the posterior distribution.
  • Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.

The joint probability distribution f(θ, y), where θ denotes the unknown parameter (or the vector of parameters when there are several) and y the set of sampled data, is obtained by means of

    f(θ, y) = f(y|θ) f(θ)                                    (2.1)

So this distribution is the product of two densities, which are referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) to be studied after the data have been observed. Here an important problem stands out in relation to the parametric approach, due to the fact that the probability model the researcher chooses might not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When y is considered fixed, so that the expression is a function of θ, the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.

The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the researcher can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results.
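Before turning to improper and conjugate priors, the following minimal R sketch (not taken from the thesis; all values are invented) illustrates the first two steps above for a normal likelihood with known standard deviation and a normal prior on the mean: the unnormalized posterior f(θ)f(y|θ) of (2.1) is evaluated on a grid and then normalized numerically.

```r
# Minimal grid-approximation sketch of "set up a full probability model" and
# "condition on observed data" (all numbers are illustrative).
set.seed(2)
y     <- rnorm(30, mean = 14.2, sd = 0.5)    # hypothetical observed data
sigma <- 0.5                                 # sampling standard deviation, assumed known

theta <- seq(12, 16, length.out = 2001)      # grid over the unknown mean
prior <- dnorm(theta, mean = 14, sd = 1)     # prior knowledge f(theta)
lik   <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = sigma)))  # f(y|theta)

post  <- prior * lik                         # unnormalized posterior, as in eq. (2.1)
step  <- theta[2] - theta[1]
post  <- post / sum(post * step)             # normalize, i.e. divide by f(y)

# Summary measures of the kind discussed at the end of this section
c(post_mean = sum(theta * post) * step, post_mode = theta[which.max(post)])
```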
are "improper", in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to make sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but to associate an insignificant weight (around zero) with it.

Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier. These are the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a posterior distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property of having the same functional form as the likelihood. However, it is not always possible to find this kind of distribution, and the researcher may have to handle many different distributions to be able to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.

In relation to the prior, which distribution should be chosen? There are three different points of view, corresponding to different styles of Bayesians:

• Classical Bayesians consider the prior a necessary evil, so priors that introduce as little information as possible should be chosen.

• Modern parametric Bayesians consider the prior a useful convenience, so priors with desirable properties such as conjugacy should be chosen. They remark that, given a distributional choice, prior hyper-parameters that introduce as little information as possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. Prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should therefore be chosen.

Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' theorem, the posterior distribution f(θ | y) is obtained:

    f(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(\theta) f(y \mid \theta)}{f(y)}    (2.2)

where

    f(y) = \int f(\theta) f(y \mid \theta) \, d\theta    (2.3)
is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution, since it does not involve θ and the interest lies in learning about θ. So, with y fixed, it can be said that the posterior distribution is proportional to the joint probability distribution f(θ, y).

Once the posterior distribution has been calculated, some kind of summary measure will be required to describe the uncertainty about the parameter θ. This is due to the fact that the posterior distribution is a (possibly high-dimensional) object whose direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others; its choice will depend on the requirements of the problem. The posterior distribution is therefore of great importance, since it lets the researcher manage the uncertainty about θ and provides information about it, taking into account both his prior knowledge and the data collected by sampling.

According to [Maté06], it is not difficult to deduce that the posterior inference will coincide with the non-Bayesian one as long as the estimate that the researcher gives a priori for the parameter θ is the same as the one resulting from the sampling.

Once the data y have been observed, a new, as yet unobserved quantity ỹ from the same process can be predicted through the posterior predictive distribution f(ỹ | y):

    f(\tilde{y} \mid y) = \int f(\tilde{y}, \theta \mid y) \, d\theta = \int f(\tilde{y} \mid \theta, y) f(\theta \mid y) \, d\theta = \int f(\tilde{y} \mid \theta) f(\theta \mid y) \, d\theta    (2.4)

To sum up, the basic idea is to update the prior distribution f(θ) through Bayes' theorem by observing the data y in order to get a posterior distribution f(θ | y). Then a summary measure or a prediction for new data can be obtained from f(θ | y). Table 2.1 reflects what has been said.
  Distribution   Expression              Information required            Result
  Likelihood     f(y|θ)                  Data                            Distribution f(y|θ)
  Prior          f(θ)                    Researcher's knowledge          Parameter distribution f(θ)
  Joint          f(y|θ) f(θ)             Likelihood and prior            f(θ, y)
  Posterior      f(θ) f(y|θ) / f(y)      Prior and joint distributions   f(θ|y)
  Predictive     ∫ f(ỹ|θ) f(θ|y) dθ      New data and posterior          f(ỹ|y)

  Table 2.1: Distributions in Bayesian Data Analysis

2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with mean µ and unknown variance σ²:

    y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)    (2.5)

As can be seen in Appendix A, the likelihood function for a single observation is

    f(y \mid \mu, \sigma^2) \propto (\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2}(y - \mu)^2 \right)    (2.6)

This means that the likelihood function is proportional to a Normal distribution, omitting those terms that are constant.

Now let us consider n independent observations y1, y2, ..., yn. According to the previous section, the parameters to be estimated, θ, are µ and σ²:
    \theta = (\theta_1, \theta_2) = (\mu, \sigma^2)    (2.7)

A full probability model must be set up through a joint probability distribution:

    f(\theta, (y_1, y_2, \ldots, y_n)) = f(\theta, y) = f(y \mid \theta) f(\theta)    (2.8)

The likelihood function for a sample of n iid observations is in this case

    f(y \mid \theta) = f(y \mid \mu, \sigma^2) \propto \prod_{i=1}^{n} (\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2}(y_i - \mu)^2 \right)    (2.9)

As recommended previously, a conjugate prior will be chosen; in fact, it will be a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

    f(\theta) = f(\mu, \sigma^2) = f(\mu \mid \sigma^2) f(\sigma^2)    (2.10)

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ given σ² is Normal (details about these distributions in Appendix A):

    \mu \mid \sigma^2 \sim N(\mu_0, \sigma^2 V_0)    (2.11)

    \sigma^2 \sim \text{Scaled Inv-}\chi^2(\nu_0, s_0^2)    (2.12)

So the joint prior distribution is

    f(\theta) = f(\mu, \sigma^2) = f(\mu \mid \sigma^2) f(\sigma^2) \propto N\text{-Inv-}\chi^2(\mu_0, s_0^2 V_0, \nu_0, s_0^2)    (2.13)

Its four parameters can be identified as the location and scale of µ and the degrees of freedom and scale of σ², respectively.

As a natural conjugate prior was employed, the joint posterior distribution will have the same form as the prior. So, conditioning on the data and applying Bayes' theorem, we have

    f(\theta \mid y) = f(\mu, \sigma^2 \mid y) \propto f(y \mid \mu, \sigma^2) f(\mu, \sigma^2) \propto N\text{-Inv-}\chi^2(\mu_1, s_1^2 V_1, \nu_1, s_1^2)    (2.14)

where it can be shown that
    \mu_1 = (V_0^{-1} + n)^{-1} (V_0^{-1}\mu_0 + n\bar{y})    (2.15)

    V_1 = (V_0^{-1} + n)^{-1}    (2.16)

    \nu_1 = \nu_0 + n    (2.17)

    \nu_1 s_1^2 = \nu_0 s_0^2 + (n-1)s^2 + \frac{V_0^{-1} n}{V_0^{-1} + n}(\bar{y} - \mu_0)^2    (2.18)

All these formulae show that Bayesian inference combines prior and sample information.

The first expression means that the posterior mean µ1 is a weighted average of the prior mean µ0 and the sample mean ȳ, divided by the sum of their respective weights, which are V0^{-1} and the sample size n.

The second expression represents the weight carried by the posterior mean, and it can be seen as a compromise between the sample size and the weight given to the prior mean.

The third expression indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last expression explains the posterior sum of squared errors as a combination of the prior and empirical sums of squared errors, plus a term that measures the conflict between prior and sample information.

A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].

The marginal posterior distributions follow directly:

    \mu \mid \sigma^2, y \sim N(\mu_1, \sigma^2 V_1)    (2.19)

    \sigma^2 \mid y \sim \text{Scaled Inv-}\chi^2(\nu_1, s_1^2)    (2.20)

If we integrate out σ², the marginal for µ is a t-distribution (see Appendix A for details):

    \mu \mid y \sim t_{\nu_1}(\mu_1, s_1^2 V_1)    (2.21)
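The update formulas (2.15)-(2.18) are straightforward to translate into code. The sketch below is only an illustration of those four expressions, not part of the application developed in this project; the data and hyperparameter values are hypothetical.

```python
import numpy as np

def normal_invchi2_update(mu0, V0, nu0, s0_sq, y):
    """Conjugate update (2.15)-(2.18) for the Normal model with a
    N-Inv-chi^2 prior.  Returns the posterior hyperparameters."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    s_sq = y.var(ddof=1)            # sample variance with n-1 denominator

    prec0 = 1.0 / V0                # prior "weight" V0^{-1}
    mu1 = (prec0 * mu0 + n * ybar) / (prec0 + n)        # (2.15)
    V1 = 1.0 / (prec0 + n)                              # (2.16)
    nu1 = nu0 + n                                       # (2.17)
    s1_sq = (nu0 * s0_sq + (n - 1) * s_sq
             + (prec0 * n / (prec0 + n)) * (ybar - mu0) ** 2) / nu1   # (2.18)
    return mu1, V1, nu1, s1_sq

# Illustrative data only (not from the project): 10 observations around 5.
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=10)
print(normal_invchi2_update(mu0=4.0, V0=0.01, nu0=5, s0_sq=4.0, y=y))
```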
Let us see an application to the Spanish stock market. Suppose that the monthly close values of the Ibex 35 are normally distributed. Taking the values at which the Spanish index closed during the first two weeks of January 2006, it can be shown that the sample mean was 10893.29 and the standard deviation 61.66. The non-Bayesian approach would therefore infer a Normal distribution with that mean and standard deviation. Now suppose that, had we asked an analyst about the Ibex 35 evolution in January, he would have strongly affirmed that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and that, hence, the standard deviation would be higher, around 100. Then, according to the previous formulas, the posterior parameters would be

    \mu_1 = (100 + 10)^{-1}(100 \times 10870 + 10 \times 10893.29) = 10872.12

    V_1 = (100 + 10)^{-1} = 0.0091

    \nu_1 = 100 + 10 = 110

    s_1 = \sqrt{\dfrac{100 \times 100^2 + 9 \times 61.66 + \frac{1000}{110}(10893.29 - 10870)^2}{110}} = 95.60

This means that there is a difference of around 20 points between the Bayesian and the non-Bayesian estimations of the mean close value for January. Once January had passed, both results could be compared, and the Bayesian estimation turned out to be closer to the actual mean close value and standard deviation, 10871.2 and 112.44.
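As a quick check of the arithmetic above, the posterior mean, scale and degrees of freedom of the example can be reproduced in a few lines; the prior weight V0^{-1} = 100 and ν0 = 100 are the values used in the text, and the data enter only through their summary statistics (n = 10, ȳ = 10893.29).

```python
# Reproducing the posterior mean, scale and degrees of freedom of the
# Ibex 35 example with the prior weight V0^{-1} = 100 used in the text.
prec0, n = 100, 10
mu0, ybar = 10870.0, 10893.29

mu1 = (prec0 * mu0 + n * ybar) / (prec0 + n)
V1 = 1.0 / (prec0 + n)
nu1 = 100 + n

print(round(mu1, 2))   # 10872.12
print(round(V1, 4))    # 0.0091
print(nu1)             # 110
```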
In Figure 2.1 it can be seen how the blue line, representing the Bayesian estimation, is closer to the cyan line, representing the actual mean close value, than the red line, representing the frequentist estimation.

Figure 2.1: Univariate Normal Example (densities of the frequentist and Bayesian estimations together with the actual mean close value in January)

2.2.2 Multivariate Normal distribution

Now let us consider an observable vector y of d components with the multivariate Normal distribution

    y \sim N(\mu, \Sigma)    (2.22)

where the first parameter is the mean column vector and the second one is the variance-covariance matrix. Extending what was said above to the multivariate case, we have

    f(y \mid \mu, \Sigma) \propto |\Sigma|^{-1/2} \exp\left( -\tfrac{1}{2}(y - \mu)' \Sigma^{-1} (y - \mu) \right)    (2.23)

and, for n iid observations,

    f(y_1, y_2, \ldots, y_n \mid \mu, \Sigma) \propto |\Sigma|^{-n/2} \exp\left( -\tfrac{1}{2} \sum_{i=1}^{n} (y_i - \mu)' \Sigma^{-1} (y_i - \mu) \right)    (2.24)

A multivariate generalization of the Scaled Inverse-χ² is the Inverse Wishart distribution (see details in Appendix A), so the joint prior distribution is

    f(\theta) = f(\mu, \Sigma) \propto N\text{-Inv-Wishart}\left( \mu_0, \frac{\Lambda_0}{k_0}, \nu_0, \Lambda_0 \right)    (2.25)

due to the fact that

    \mu \mid \Sigma \sim N\left( \mu_0, \frac{\Sigma}{k_0} \right)    (2.26)

    \Sigma \sim \text{Inv-Wishart}(\nu_0, \Lambda_0^{-1})    (2.27)
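The following sketch shows how a draw (µ, Σ) from the prior (2.26)-(2.27) could be simulated. The hyperparameter values are purely illustrative, and scipy's scale argument is assumed here to play the role of Λ0; Inverse Wishart parametrization conventions differ between references, so this should be checked against the one used in Appendix A.

```python
# A minimal sketch of drawing (mu, Sigma) from the Normal-Inverse-Wishart
# prior (2.26)-(2.27); hyperparameter values are illustrative only.
import numpy as np
from scipy import stats

d = 2
mu0 = np.zeros(d)             # prior location of mu
k0 = 5.0                      # prior "sample size" for mu
nu0 = d + 2                   # prior degrees of freedom for Sigma
Lambda0 = np.eye(d)           # prior scale matrix (assumed to match scipy's `scale`)

rng = np.random.default_rng(1)
Sigma = stats.invwishart(df=nu0, scale=Lambda0).rvs(random_state=rng)
mu = stats.multivariate_normal(mean=mu0, cov=Sigma / k0).rvs(random_state=rng)

print("Sigma draw:\n", Sigma)
print("mu | Sigma draw:", mu)
```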
The posterior results are analogous to those given for the univariate case, but applying these distributions. Interested readers can find more information in [Gelm04] or [Cong06]. A summary is shown in Table 2.2 in order to convey the most important ideas.

                             Univariate Normal                           Multivariate Normal
  Expression                 y ~ N(µ, σ²)                                y ~ N(µ, Σ)
  Parameters to estimate     µ, σ²                                       µ, Σ
  Prior distributions        µ|σ² ~ N(µ0, σ0²/k0)                        µ|Σ ~ N(µ0, Σ/k0)
                             σ² ~ Inv-χ²(ν0, σ0²)                        Σ ~ Inv-Wishart(ν0, Λ0⁻¹)
                             µ, σ² ~ N-Inv-χ²(µ0, σ0²/k0, ν0, σ0²)       µ, Σ ~ N-Inv-Wishart(µ0, Λ0/k0, ν0, Λ0)
  Posterior distributions    µ|σ² ~ N(µ1, σ1²/k1)                        µ|Σ ~ N(µ1, Σ/k1)
                             σ² ~ Inv-χ²(ν1, σ1²)                        Σ ~ Inv-Wishart(ν1, Λ1⁻¹)
                             µ, σ² ~ N-Inv-χ²(µ1, σ1²/k1, ν1, σ1²)       µ, Σ ~ N-Inv-Wishart(µ1, Λ1/k1, ν1, Λ1)

  Table 2.2: Comparison between Univariate and Multivariate Normal

2.2.3 Other distributions

Just as has been done with the Normal distribution, a Bayesian analysis could be carried out for other distributions. For instance, the exponential distribution is commonly used in reliability analysis. Because this project will deal with the Normal distribution for the likelihood, the analysis with other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions
for other likelihood distributions. More details can be found in [Cong06], [Gelm04] or [Rossi06].

  Likelihood     Parameter   Conjugate prior   Prior hyperparameters   Posterior hyperparameters
  Bin(y|n, θ)    θ           Beta              α, β                    α + y, β + n − y
  P(y|θ)         θ           Gamma             α, β                    α + nȳ, β + n
  Exp(y|θ)       θ           Gamma             α, β                    α + 1, β + y
  Geo(y|θ)       θ           Beta              α, β                    α + 1, β + y

  Table 2.3: Conjugate distributions for other likelihood distributions

2.3 Hierarchical Models

Hierarchical data arise when the data are structured in groups or related among themselves. When this occurs, standard techniques either assume that these groups belong to entirely different populations or ignore the aggregate information entirely.

Hierarchical models provide a way of pooling the information for the disparate groups without assuming that they belong to precisely the same population.

Suppose we have collected data about some random variable Y from m different populations, with n observations for each population. Let yij represent observation j from population i. Now suppose yij ~ f(θi), where θi is a vector of parameters for population i. Furthermore, θi ~ f(Θ), where Θ may also be a vector. Until this point, we have only rewritten what was said previously.
Now let us extend the model and assume that the parameters Θ that govern the distribution of the θ's are themselves random variables, assigning a prior distribution to them as well:

    \Theta \sim f(\psi)    (2.28)

where f(ψ) is called the hyperprior. The vector parameter ψ of the hyperprior may be "known", representing our prior beliefs about Θ, or, in theory, we can also assign a probability distribution to these quantities and proceed to another layer of hierarchy.

According to [Gelm04], the idea of exchangeability will be used to create a joint probability model for all the parameters θ. A formal definition of exchangeability is the following: "The parameters θ1, θ2, ..., θn are exchangeable in their joint distribution if f(θ1, θ2, ..., θn) is invariant to permutations of the indexes 1, 2, ..., n."

This means that if no information other than the data is available to distinguish any of the θi from the others, and no ordering of the parameters can be made, one must assume symmetry among the parameters in the prior distribution. So we can treat the parameters of each sub-population as exchangeable units. This can be formulated as

    f(\theta_1, \theta_2, \ldots, \theta_n \mid \Theta) = \prod_{i=1}^{n} f(\theta_i \mid \Theta)    (2.29)

The joint prior distribution is now

    f(\theta_1, \theta_2, \ldots, \theta_n, \Theta) = f(\theta_1, \theta_2, \ldots, \theta_n \mid \Theta) f(\Theta)    (2.30)

And conditioning on the data yields

    f(\theta_1, \theta_2, \ldots, \theta_n, \Theta \mid y) \propto f(\theta_1, \theta_2, \ldots, \theta_n, \Theta) \, f(y \mid \theta_1, \theta_2, \ldots, \theta_n, \Theta)    (2.31)

Perhaps the most important point in practice is that non-hierarchical models are usually inappropriate for hierarchical data, while non-hierarchical data can still be modelled following the hierarchical structure by assigning concrete values to the hyperprior parameters.

This kind of model will be used in Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.
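The hierarchy just described is easy to simulate. The sketch below uses Normal distributions at every level purely for illustration; none of the numbers come from the project.

```python
# A minimal sketch of simulating data from the hierarchy described above:
# Theta ~ f(psi), theta_i ~ f(Theta), y_ij ~ f(theta_i), with Normal
# choices at every level purely for illustration.
import numpy as np

rng = np.random.default_rng(42)

m, n = 5, 20                       # m populations, n observations each
psi = (0.0, 1.0)                   # "known" hyperprior parameters (mean, sd)

Theta = rng.normal(psi[0], psi[1])                  # hyperparameter draw
theta = rng.normal(Theta, 0.5, size=m)              # exchangeable group parameters
y = rng.normal(theta[:, None], 1.0, size=(m, n))    # y_ij | theta_i

print("group parameters theta_i:", np.round(theta, 2))
print("group sample means      :", np.round(y.mean(axis=1), 2))
```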
For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].

2.4 Nonparametric Bayesian

The nonparametric approach overcomes, or at least reduces, the limitations of the parametric approach that have been mentioned throughout this chapter. This kind of analysis can be performed through the so-called Dirichlet Process, which allows us to express in a simple way the prior distribution, or the distribution family, of F, where F is the distribution function of the variable under study. This process has a parameter, a measure α, which once normalized yields a probability distribution. According to [Maté06], a Dirichlet Process for F(t) requires knowing:

• A previous proposal for F(t), denoted F0(t), which corresponds to the distribution function that reflects the prior knowledge of the researcher and is given by

    F_0(t) = \frac{\alpha(t)}{M}    (2.32)

• A measure of the confidence in that previous proposal, denoted by M, whose value can vary between 0 and ∞, depending on whether the confidence lies entirely in the data or in the previous proposal, respectively.

It can be shown that the posterior estimate of F(t), denoted F̂n(t), after sampling n data is given by

    \hat{F}_n(t) = p_n F_0(t) + (1 - p_n) F_n(t)    (2.33)

where Fn(t) is the empirical distribution function and p_n = M / (M + n).

More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].
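Formula (2.33) can be illustrated directly: the posterior estimate of the distribution function is simply a mixture of the prior proposal F0 and the empirical distribution function, with weight pn = M/(M + n). The prior proposal and the data in the sketch below are hypothetical choices for illustration only.

```python
# A minimal sketch of the posterior estimate (2.33) under a Dirichlet
# process prior: a weighted mixture of the prior proposal F0 and the
# empirical distribution function Fn, with weight p_n = M / (M + n).
import numpy as np
from scipy import stats

def dp_posterior_cdf(t, data, F0, M):
    data = np.asarray(data, dtype=float)
    n = data.size
    p_n = M / (M + n)
    Fn = np.mean(data[None, :] <= np.asarray(t)[:, None], axis=1)  # empirical CDF
    return p_n * F0(t) + (1.0 - p_n) * Fn

rng = np.random.default_rng(7)
data = rng.normal(1.0, 2.0, size=30)          # observed sample (illustrative)
F0 = stats.norm(loc=0.0, scale=1.0).cdf       # prior proposal for F
t = np.linspace(-4, 6, 5)

print(np.round(dp_posterior_cdf(t, data, F0, M=10.0), 3))
```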
With this approach, not only is the limitation of the parametric approach concerning the probability model of the variable under study avoided, since no distributional hypothesis is required, but it also allows us to give a quantified weight to the prior knowledge provided by the researcher, depending on the confidence placed in that knowledge.
Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and understood density, so summary statistics, such as the posterior mean and variance of the parameters of interest, will not be analytically available. It is at this point where the importance of Bayesian computation arises, and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.

In this regard, the most important simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.

As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from a specific probability distribution, then design a Markov chain whose long-time equilibrium is that distribution, write a computer program to simulate the Markov chain, run it for a time long enough to be confident that approximate equilibrium has been attained, and then record the state of the Markov
chain as an approximate draw from equilibrium.

The technique has been developed strongly, in different fields and with rather different emphases: in the computer science community concerned with the study of random algorithms (where the emphasis is on whether the resulting algorithm scales well with increasing size of the problem), in the spatial statistics community (where one is interested in understanding what kinds of patterns arise from complex stochastic models), and also in the applied statistics community (where it is applied largely in Bayesian contexts, enabling researchers to formulate statistical models which would otherwise be resistant to effective statistical analysis).

The development of the theoretical work also benefits the development of statistical applications. MCMC simulation techniques have been applied to develop practical statistical inferences for almost all problems in (bio)statistics, for example problems in longitudinal data analysis, image analysis, genetics, contagious disease epidemics, random spatial patterns, and financial statistical models such as GARCH and stochastic volatility models.

The simplicity of the underlying principle of MCMC is a major reason for its success. However, a substantial complication arises as the underlying target problem becomes more complex; namely, how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], n = 100 independent samples are usually enough for reasonable posterior summaries, but in some cases more samples are needed to ensure greater accuracy.

3.2 Markov chains

The essential theory required for developing Monte Carlo methods based on Markov chains is presented here. The most fundamental result is that certain Markov chains converge to a unique invariant distribution and can be used to estimate expectations with respect to this distribution. But in order to reach this conclusion, some concepts need to be defined first.

A Markov chain is a series of random variables X0, ..., Xn, forming a stochastic process in which only the value of Xn−1 influences the distribution of Xn. Formally,

    P(X_n = x_n \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(X_n = x_n \mid X_{n-1} = x_{n-1})    (3.1)

where the variables Xn take values in a common set called the state space of the Markov chain.
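As a small illustration of this definition, the following sketch simulates a homogeneous chain on two states and, using the transition-matrix notation introduced in the next paragraphs, shows its rows converging to the same invariant distribution. The transition probabilities are arbitrary illustrative values; for this matrix the invariant distribution can also be solved exactly, giving π = (0.6, 0.4).

```python
# A two-state homogeneous Markov chain: P_ij = P(X_{n+1} = j | X_n = i).
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

print(np.linalg.matrix_power(P, 50))   # every row approaches pi = (0.6, 0.4)

# Simulating the chain gives the same long-run fraction of time in each state.
rng = np.random.default_rng(0)
x, counts = 0, np.zeros(2)
for _ in range(100_000):
    x = rng.choice(2, p=P[x])          # next state depends only on the current one
    counts[x] += 1
print(counts / counts.sum())
```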
The usual language to refer to the different situations in which a Markov chain can be found is the following. If Xn = i, it is said that the chain is in state i at step n, or that it has the value i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the main tool used to study it: the transition probabilities P(Xn+1 = j | Xn = i), collected in the transition matrix P = (Pij) with Pij = P(Xn+1 = j | Xn = i), which give the probability of moving from state i to state j.

Since in the most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the one-step transition probabilities P(X1 = j | X0 = i). In this regard, a Markov chain Xt is homogeneous if P(Xn+1 = j | Xn = i) = P(X1 = j | X0 = i) for all n, i, j.

Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, if P is the one-step transition matrix of a homogeneous Markov chain and Pn is its n-step transition matrix, then Pn = P^n.

On the other hand, we will need the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a homogeneous Markov chain. Then a vector π is an invariant distribution of the chain Xt if it satisfies:

a) πj ≥ 0 with Σj πj = 1.

b) π = πP.

That is, a stationary distribution over the states of a Markov chain is one that persists forever once it is reached.

The concept of an ergodic state requires making other definitions clear, namely recurrence and aperiodicity:

• The state i is recurrent if, starting from i, the chain returns to i with probability 1, that is, P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise it is transient. Moreover, i is positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.
• The period of a state i, denoted by di, is defined as di = gcd{n ≥ 1 : [P^n]ii > 0}. The state i is aperiodic if di = 1, and periodic if di is greater than 1.

A state is then ergodic if it is positive recurrent and aperiodic. The last concept to define is irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if, for all i, j ∈ C:

• i and j have the same period.

• i is transient if and only if j is transient.

• i is null recurrent if and only if j is null recurrent.

Having all these concepts in mind, the following lemma tells us when a Markov chain has a stationary distribution:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain will have one and only one stationary distribution if, and only if, all the states are positive recurrent. In that case, its entries are given by πi = 1/µi, where µi denotes the expected return time of state i.

The relation with the long-run behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

    [P^n]_{ij} \longrightarrow \frac{1}{\mu_j} \quad \text{for all } i, j \in S \text{ as } n \to \infty    (3.2)

3.3 Monte Carlo Integration

Monte Carlo integration estimates the expectation E[g(θ)] by obtaining samples θt, t = 1, ..., n, from the posterior distribution p(θ | y) and averaging:

    E[g(\theta)] \approx \frac{1}{n} \sum_{t=1}^{n} g(\theta_t)    (3.3)

where g(θ) is the function of interest. Note that the samples θt need not be independent: if they are generated by a Markov chain whose stationary distribution is p(θ | y), the average above still converges to the desired expectation.
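As an illustration of (3.3), the sketch below estimates a posterior expectation by averaging draws. A Gamma distribution stands in for the posterior so that the Monte Carlo estimate can be compared with the exact value; this is only a toy check, not the project's application.

```python
# Monte Carlo integration (3.3): estimate E[g(theta)] from posterior draws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
posterior = stats.gamma(a=3.0, scale=2.0)      # stand-in for p(theta | y)

theta = posterior.rvs(size=10_000, random_state=rng)   # theta_t, t = 1..n
g = lambda th: th ** 2                                  # function of interest

mc_estimate = g(theta).mean()                    # (1/n) * sum g(theta_t)
exact = posterior.var() + posterior.mean() ** 2  # E[theta^2] for comparison

print(round(mc_estimate, 2), round(exact, 2))
```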
3.4 Gibbs sampler

In many models it is not easy to draw directly from the posterior distribution p(θ | y). However, if the parameter θ is partitioned into several blocks as θ = (θ1, ..., θp), then the full conditional posterior distributions, p(θ1 | y, θ2, ..., θp), ..., p(θp | y, θ1, ..., θp−1), may be simple to draw from. For instance, in the Normal linear regression model it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the full conditional distributions would be p(β | y, σ²) and p(σ² | y, β), which are very useful in the Normal independent model that will be explained later.

The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

1. Set a starting value θ^(0) = (θ2^(0), ..., θp^(0)).

2. Take random draws:
   θ1^(1) from p(θ1 | y, θ2^(0), ..., θp^(0))
   θ2^(1) from p(θ2 | y, θ1^(1), θ3^(0), ..., θp^(0))
   ...
   θp^(1) from p(θp | y, θ1^(1), ..., θ_{p−1}^(1))

3. Repeat step 2 as necessary.

4. Reject the draws that are still affected by the starting value θ^(0) and average the rest of the draws, applying Monte Carlo integration.

For instance, in the Normal regression model we would have (see the code sketch after this list):

1. Set a starting value θ2^(0) = (σ²)^(0).

2. Take random draws:
   θ1^(1) = β^(1) from p(β | y, σ² = (σ²)^(0))
   θ2^(1) = (σ²)^(1) from p(σ² | y, β = β^(1))

3. Repeat step 2 as necessary.

4. Eliminate the draws affected by the starting value and average the rest, applying Monte Carlo integration.
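A minimal sketch of this two-block scheme is given below for a Normal linear regression with an independent Normal prior on β and an Inverse-Gamma prior on σ², a common choice that is not necessarily the exact specification used later in the project. The data are simulated and the prior values are hypothetical; the point is only to show the structure of the iterations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data, for illustration only.
n, true_beta, true_sigma = 100, np.array([1.0, 2.0]), 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ true_beta + rng.normal(scale=true_sigma, size=n)

# Hypothetical priors: beta ~ N(b0, B0), sigma^2 ~ Inv-Gamma(a0, d0).
b0, B0 = np.zeros(2), np.eye(2) * 100.0
a0, d0 = 2.0, 1.0

n_iter, burn_in = 5000, 1000
beta, sigma2 = np.zeros(2), 1.0          # starting values (step 1)
draws = []

B0_inv = np.linalg.inv(B0)
XtX, Xty = X.T @ X, X.T @ y

for t in range(n_iter):
    # Draw beta from its full conditional p(beta | y, sigma^2): Normal.
    B1 = np.linalg.inv(B0_inv + XtX / sigma2)
    b1 = B1 @ (B0_inv @ b0 + Xty / sigma2)
    beta = rng.multivariate_normal(b1, B1)

    # Draw sigma^2 from its full conditional p(sigma^2 | y, beta): Inverse-Gamma.
    resid = y - X @ beta
    a1 = a0 + n / 2.0
    d1 = d0 + 0.5 * resid @ resid
    sigma2 = 1.0 / rng.gamma(a1, 1.0 / d1)

    if t >= burn_in:                      # step 4: discard the burn-in
        draws.append(np.append(beta, sigma2))

draws = np.array(draws)
print("posterior means (beta0, beta1, sigma^2):", draws.mean(axis=0).round(3))
```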
The dropped values which are affected by the starting point are called the burn-in. In general, any set of values discarded in an MCMC simulation is called the burn-in, and the size of the burn-in period is the subject of current research in MCMC methods.

As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].

3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not. Like the algorithms explained above, this method is based on formulating a Markov chain, but it uses a proposal distribution q(· | θt), which depends on the current state θt, to generate a new proposed sample θ*. This proposal is accepted as the next state with probability given by

    \alpha(\theta_t, \theta^*) = \min\left\{ 1, \frac{p(\theta^* \mid y)\, q(\theta_t \mid \theta^*)}{p(\theta_t \mid y)\, q(\theta^* \mid \theta_t)} \right\}    (3.4)

If the point θ* is not accepted, the chain does not move and θ_{t+1} = θt. According to [Mart01], the steps to follow are (a code sketch is given at the end of this subsection):

1. Initialize the chain to θ0 and set t = 0.

2. Generate a candidate point θ* from q(· | θt).

3. Generate U from a Uniform(0, 1) distribution.

4. If U ≤ α(θt, θ*), set θ_{t+1} = θ*; otherwise set θ_{t+1} = θt.

5. Set t = t + 1 and repeat steps 2 through 5.

6. Take the average of the draws g(θ1), ..., g(θn).

Note that it is not only recommendable but essential that the proposal distribution q(· | θt) be easy to sample from.
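The algorithm can be sketched for a single positive parameter as follows. The unnormalized target density is chosen only so that the answer can be checked (a Gamma kernel with shape 3 and rate 2, hence mean 1.5), and the log-normal proposal is deliberately asymmetric, so the q terms in (3.4) do not cancel. All names and values are illustrative, not part of the project's application.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

def log_target(theta):
    # log of theta^2 * exp(-2*theta), an unnormalized Gamma(3, rate 2) kernel
    if theta <= 0:
        return -np.inf
    return 2.0 * np.log(theta) - 2.0 * theta

sigma = 0.5                      # proposal spread (tuning parameter)
def log_q(x, given):             # log-normal proposal density q(x | given)
    return stats.lognorm.logpdf(x, s=sigma, scale=given)

n_iter, burn_in = 20000, 2000
theta = 1.0                      # step 1: initialize the chain
draws = []

for t in range(n_iter):
    # step 2: candidate from q(. | theta_t)
    theta_star = rng.lognormal(mean=np.log(theta), sigma=sigma)
    # steps 3-4: accept with probability alpha(theta_t, theta*) from (3.4)
    log_alpha = (log_target(theta_star) + log_q(theta, theta_star)
                 - log_target(theta) - log_q(theta_star, theta))
    if np.log(rng.uniform()) <= min(0.0, log_alpha):
        theta = theta_star
    if t >= burn_in:
        draws.append(theta)

print("posterior mean estimate:", round(float(np.mean(draws)), 3))   # close to 1.5
```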
There are some special cases of this method; the most important ones are briefly explained below. In addition, according to [Gelm04], it can be shown that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm, one in which the proposed point is always accepted.

3.5.2 Metropolis sampler

This method is the particular case of the Metropolis-Hastings sampler in which the proposal distribution is symmetric. That is,

    q(\theta^* \mid \theta_t) = q(\theta_t \mid \theta^*)    (3.5)

for all θ* and θt. Then the probability of accepting the new point is

    \alpha(\theta_t, \theta^*) = \min\left\{ 1, \frac{p(\theta^* \mid y)}{p(\theta_t \mid y)} \right\}    (3.6)

The same procedure seen for the Metropolis-Hastings sampler has to be followed.

3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form

    q(\theta^* \mid \theta_t) = q(|\theta_t - \theta^*|)    (3.7)

The candidate point is θ* = θt + z, where z, drawn from q, is called the increment random variable. Then the probability of accepting the new point is

    \alpha(\theta_t, \theta^*) = \min\left\{ 1, \frac{p(\theta^* \mid y)}{p(\theta_t \mid y)} \right\}    (3.8)

The same procedure seen for the Metropolis-Hastings sampler has to be followed. A short sketch of this sampler is given at the end of this section.

3.5.4 Independence sampler

The last variation has a proposal distribution such that

    q(\theta^* \mid \theta_t) = q(\theta^*)    (3.9)

so it does not depend on θt. Then, substituting into (3.4), the probability of accepting the new point is

    \alpha(\theta_t, \theta^*) = \min\left\{ 1, \frac{p(\theta^* \mid y)\, q(\theta_t)}{p(\theta_t \mid y)\, q(\theta^*)} \right\}    (3.10)
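Finally, a short self-contained sketch of the random-walk sampler of Section 3.5.3 is given below. The increment z is Normal(0, τ²), a symmetric choice, so the acceptance probability reduces to (3.8); the target is an unnormalized standard Normal log-density, used only so that the output can be checked.

```python
import numpy as np

rng = np.random.default_rng(4)
log_target = lambda th: -0.5 * th ** 2    # log p(theta | y) up to a constant

tau = 1.0                                  # spread of the increment variable z
theta, draws = 0.0, []

for t in range(20000):
    theta_star = theta + rng.normal(0.0, tau)           # candidate theta* = theta_t + z
    if np.log(rng.uniform()) <= log_target(theta_star) - log_target(theta):
        theta = theta_star                               # accept with probability (3.8)
    if t >= 2000:                                        # discard the burn-in
        draws.append(theta)

print("mean, sd of draws:", round(float(np.mean(draws)), 2), round(float(np.std(draws)), 2))  # near 0 and 1
```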