Bayesian Regression System for Interval-Valued Data

Authorization is given for the submission of the project by the student Rubén Salgado Fernández.

THE PROJECT DIRECTOR
Carlos Maté Jiménez
Signed:          Date: 12/06/2007

APPROVAL OF THE PROJECT COORDINATOR
Claudia Meseguer Velasco
Signed:          Date: 12/06/2007
UNIVERSIDAD PONTIFICIA COMILLAS
ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)
INGENIERO EN ORGANIZACIÓN INDUSTRIAL

FINAL YEAR PROJECT

Bayesian Regression System for Interval-Valued Data. Application to the Spanish Continuous Stock Market

AUTHOR: Salgado Fernández, Rubén
MADRID, June 2007
Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to undertake this project. With him I have learnt not only about statistics and research, but also how to enjoy them. Special thanks to my parents: their love, and all they have taught me in this life, have made me the person I am now. Thanks to my brothers, my sister and the rest of my family for their support and for the stolen time. Thanks to Charo for putting up with my bad moods at the worst moments, for supporting me and for giving me the inspiration to go ahead.

Madrid, June 2007
Resumen

In recent years, Bayesian methods have spread and come to be used successfully in many varied fields, such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that distinguishes Bayesian data analysis from other alternatives is that it takes into account not only the objective information coming from the data of the event under study, but also the knowledge available prior to it. The benefits obtained from this approach are many, since the greater the knowledge of the situation, the more reliable the decisions that can be taken and the more accurate they will be. But it has not always been all advantages: until a few years ago, Bayesian data analysis presented a series of difficulties that limited its development by researchers. Although the Bayesian methodology has existed as such for quite a long time, it did not begin to be employed in a generalized way until the 1990s. This expansion has been driven largely by advances in computing and by the improvement and refinement of different calculation methods, such as Markov chain Monte Carlo methods.

This methodology has proved extraordinarily useful, in particular, in its application to the widely adopted regression models. On many occasions in practice, situations arise in which the relationship between two quantitative variables must be analyzed. The two fundamental objectives of this analysis are, on the one hand, to determine whether those variables are associated and in which direction the association goes (that is, whether the values of one of the variables tend to increase, or decrease, as the values of the other increase); and, on the other, to study whether the values of one variable can be used to predict the value of the other. A regression model tries to provide information about one or more events through their relationship with the behaviour of others. The Bayesian methodology makes it possible to incorporate the researcher's knowledge into the analysis, making the results more precise, since the results are not confined to the data of one particular sample.

On the other hand, it is beginning to be accepted that, in the field of statistics, the twenty-first century will be the century of the "statistics of knowledge", in contrast to the previous one, which was that of the "statistics of data". The basic concept for building such statistics is the symbolic datum, and statistical methods have been developed for some types of symbolic data.

At present, the demands of the market and, in general, of the world keep growing. This implies an ever greater desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the smallest possible error, in order to offer better products, obtain greater profits or scientific advances, and achieve better results.

Against this reality, this project tries to respond to those needs by providing extensive documentation on several of the most widely used and most advanced techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Likewise, a tool will be developed to put all the acquired knowledge into practice. This application will be aimed at the Spanish stock market and will let the user operate it in a simple and friendly way. For the development of this tool, one of the newest languages with the greatest projection of the moment will be employed: R.

It is, therefore, a project that combines the newest techniques with the greatest projection, both in theory, Bayesian regression applied to interval-valued data, and in practice, the use of the R language.
Abstract

In recent years, Bayesian methods have spread and been used successfully in many different fields, such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that makes Bayesian data analysis remarkable compared with the alternatives is that it takes into account not only the objective information coming from the analyzed event, but also the knowledge available before it. The benefits obtained from this approach are considerable: the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of computational difficulties. Its expansion has been favoured mainly by advances in computing and by the improvement of different calculation methods, such as Markov chain Monte Carlo methods.

This Bayesian methodology has proved extraordinarily useful in its application to regression models, which are widely adopted. There are many situations in real life in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of this analysis are, on the one hand, to determine whether such variables are associated and in what direction that association goes (that is, whether the values of one variable tend to rise, or to fall, as the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of others. With the Bayesian methodology it is possible to add the researcher's knowledge to the analysis, making the results more accurate, since they are not restricted to the data of one particular sample.

On the other hand, in statistics it is more and more accepted that the twenty-first century will be the century of the "statistics of knowledge", in contrast to the last one, which was that of the "statistics of data". The basic concept on which to build such statistics is symbolic data, and statistical methods have been developed for some types of symbolic data.

Nowadays, the requirements of the market, and the demands of the world in general, keep growing. This implies a continuously increasing desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the minimum error, in order to offer better products and obtain greater benefits, scientific advances and better outcomes.

Against this background, this project tries to respond to those needs by offering extensive documentation on several of the most widely applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by suggesting different regression techniques. Likewise, a tool has been developed that allows the reader to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user apply the methods easily. As far as the development of this tool is concerned, one of the most innovative and promising languages of the moment has been used: R.

The project is thus a combination of the most innovative and promising techniques, both in theoretical questions, such as Bayesian regression applied to interval-valued data, and in practical questions, such as the use of the R language.
List of Figures

1.1 Project Work Packages
2.1 Univariate Normal Example
6.1 Interval time series
7.1 Classical Regression with single values in training set
7.2 Classical Regression with single values in testing set
7.3 Classical Regression with interval-valued data
7.4 Centre Method (2000) in training set
7.5 Centre Method (2000) in testing set
7.6 Centre and Radius Method in training set
7.7 Centre and Radius Method in testing set
7.8 Bayesian Centre and Radius Method in testing set
7.9 Classical Regression with single values in training set
7.10 Classical Regression with single values in testing set
7.11 Centre Method (2000) in training set
7.12 Centre Method (2000) in testing set
7.13 Centre and Radius Method in training set
7.14 Centre and Radius Method in testing set
7.15 Bayesian Centre and Radius Method in testing set
9.1 BARESIMDA MDI
10.1 Interface between BARESIMDA and R
10.2 Interface between BARESIMDA and Excel
10.3 Logical Architecture
C.1 Load Data Menu
C.2 Select File Dialog
C.3 Display Loaded Data
C.4 Define New Variable
C.5 Enter New Variable Name
C.6 Display New Variable
C.7 Edit Variable
C.8 Select Variable to Be Edited
C.9 Enter New Name
C.10 Confirmation
C.11 New Row Data
C.12 Type Data
C.13 Look And Feel Menu
C.14 Look And Feel Styles
C.15 New Look And Feel
C.16 Type Of User Menu
C.17 Select Type Of User
C.18 Non-Symbolic Classical Regression Menu
C.19 Select Non-Symbolic Variables in Simple Regression
C.20 Brief Report
C.21 Analysis Options in Non-Symbolic Classical Simple Regression
C.22 New Prediction in Non-Symbolic Classical Simple Regression
C.23 Graphics Options in Non-Symbolic Classical Simple Regression
C.24 Save Options in Non-Symbolic Classical Simple Regression
C.25 Non-Symbolic Classical Multiple Regression Menu
C.26 Select Variables in Non-Symbolic Classical Multiple Regression
C.27 Analysis Options in Non-Symbolic Classical Multiple Regression
C.28 Graphics Options in Non-Symbolic Classical Multiple Regression
C.29 Save Options in Non-Symbolic Classical Multiple Regression
C.30 Intercept in Non-Symbolic Classical Multiple Regression
C.31 Non-Symbolic Bayesian Simple Regression Menu
C.32 Select Variables in Non-Symbolic Bayesian Simple Regression
C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression
C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression
C.35 Save Options in Non-Symbolic Bayesian Simple Regression
C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression
C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression
C.39 Non-Symbolic Bayesian Multiple Regression Menu
C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression
C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression
C.42 Save Options in Non-Symbolic Bayesian Multiple Regression
C.43 Model Options in Non-Symbolic Bayesian Multiple Regression
C.44 Symbolic Classical Simple Regression Menu
C.45 Select Variables in Symbolic Classical Simple Regression
C.46 Analysis Options in Symbolic Classical Simple Regression
C.47 Graphics Options in Symbolic Classical Simple Regression
C.48 Symbolic Classical Multiple Regression Menu
C.49 Select Variables in Symbolic Classical Multiple Regression
C.50 Analysis Options in Symbolic Classical Multiple Regression
C.51 Graphics Options in Symbolic Classical Multiple Regression
C.52 Symbolic Bayesian Simple Regression
C.53 Select Variables in Symbolic Bayesian Simple Regression
C.54 Analysis Options in Symbolic Bayesian Simple Regression
C.55 Graphics Options in Symbolic Bayesian Simple Regression
C.56 Model Options in Symbolic Bayesian Simple Regression
C.57 Symbolic Bayesian Multiple Regression Menu
C.58 Select Variables in Symbolic Bayesian Multiple Regression
C.59 Graphics Options in Symbolic Bayesian Multiple Regression
List of Tables

2.1 Distributions in Bayesian Data Analysis
2.2 Comparison between Univariate and Multivariate Normal
2.3 Conjugate distributions for other likelihood distributions
4.1 Bayes Factor Interpretation
4.2 Sensitivity Summary I
4.3 Sensitivity Summary II
5.1 Multiple and Simple Regression Comparison
5.2 Sensitivity analysis of parameter β
5.3 Sensitivity analysis of parameter σ²
5.4 Classical and Bayesian regression comparison
5.5 Main Prior Distributions Summary
5.6 Main Posterior Distributions Summary
5.7 Prior and Posterior Parameters Summary
5.8 Main Posterior Predictive Distributions Summary
6.1 Multivalued Data Example
6.2 Modal-multivalued Example
7.1 Error Measures for Classical Regression with single values
7.2 Error Measure for Centre Method (2000)
7.3 Error Measure for Centre Method (2002)
7.4 Error Measures for Centre and Radius Method
7.5 Error Measures in Bayesian Centre and Radius Method
7.6 Error Measures for Classical Regression with single values
7.7 Error Measure for Centre Method (2000)
7.8 Error Measure for Centre Method (2002)
7.9 Error Measures for Centre and Radius Method
7.10 Error Measures in Bayesian Centre and Radius Method
11.1 Estimated material costs
11.2 Amortization Costs
11.3 Summarized Budget
Contents

Acknowledgements
Resumen
Abstract
List of Figures
List of Tables
Contents

1 Introduction
  1.1 Project Motivation
  1.2 Objectives
  1.3 Methodology

2 Bayesian Data Analysis
  2.1 What is Bayesian Data Analysis?
  2.2 Bayesian Analysis for Normal and other distributions
    2.2.1 Univariate Normal distribution
    2.2.2 Multivariate Normal distribution
    2.2.3 Other distributions
  2.3 Hierarchical Models
  2.4 Nonparametric Bayesian

3 Posterior Simulation
  3.1 Introduction
  3.2 Markov chains
  3.3 Monte Carlo Integration
  3.4 Gibbs sampler
  3.5 Metropolis-Hastings sampler and its special cases
    3.5.1 Metropolis-Hastings sampler
    3.5.2 Metropolis sampler
    3.5.3 Random-walk sampler
    3.5.4 Independence sampler
  3.6 Importance sampling

4 Sensitivity Analysis
  4.1 Introduction
  4.2 Bayes Factor
  4.3 Alternative Stats to Bayes Factor
  4.4 Highest Posterior Density Intervals
  4.5 Model Comparison Summary

5 Regression Analysis
  5.1 Introduction
  5.2 Classical Regression Model
  5.3 The Bayesian Approach
  5.4 Normal Linear Regression Model subject to inequality constraints
  5.5 Normal Linear Regression Model with Independent Parameters
  5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation
    5.6.1 Heteroscedasticity
    5.6.2 Correlation
  5.7 Models Summary

6 Symbolic Data
  6.1 What is symbolic data analysis?
  6.2 Interval-valued variables
  6.3 Classical regression analysis with Interval-valued data
  6.4 Bayesian regression analysis with Interval-valued data

7 Results
  7.1 Spanish Continuous Stock Market data sets
  7.2 Direct Relation between Variables
  7.3 Uncorrelated Variables

8 A Guide to Statistical Software Today
  8.1 Introduction
  8.2 Commercial Packages
    8.2.1 The SAS System for Statistical Analysis
    8.2.2 Minitab
    8.2.3 BMDP
    8.2.4 SPSS
    8.2.5 S-PLUS
    8.2.6 Others
  8.3 Public License Packages
    8.3.1 R
    8.3.2 BUGS
  8.4 Analysis Packages with Statistical Libraries
    8.4.1 Matlab
    8.4.2 Mathematica
    8.4.3 Others
  8.5 Some General Languages with Statistical Libraries
    8.5.1 Java
    8.5.2 C++
  8.6 Developed Software Tool: BARESIMDA

9 Software Requirements Specification
  9.1 Purpose
  9.2 Intended Audience
  9.3 Functionality
    9.3.1 Classical Regression with crisp data
    9.3.2 Classical Regression with interval-valued data
    9.3.3 Bayesian Regression with crisp data
    9.3.4 Bayesian Regression with interval-valued data
    9.3.5 Data Manipulation
    9.3.6 Portability
    9.3.7 Maintainability
  9.4 External Interfaces
    9.4.1 User Interfaces
    9.4.2 Software Interfaces

10 Software Architecture Study
  10.1 Hardware/Software Architecture
  10.2 Logical Architecture

11 Project Budget
  11.1 Engineering Costs
  11.2 Investment and Elements Costs
    11.2.1 Summarized Budget

12 Conclusions
  12.1 Bayesian Regression applied to Symbolic Data
  12.2 BARESIMDA Software Tool
  12.3 Future Developments
  12.4 Summary

A Probability Distributions
  A.1 Discrete Distributions
    A.1.1 Binomial
    A.1.2 Geometric
    A.1.3 Poisson
  A.2 Continuous Distributions
    A.2.1 Uniform
    A.2.2 Univariate Normal
    A.2.3 Exponential
    A.2.4 Gamma
    A.2.5 Inverse-Gamma
    A.2.6 Chi-square
    A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-Square
    A.2.8 Univariate Student-t
    A.2.9 Beta
    A.2.10 Multivariate Normal
    A.2.11 Multivariate Student-t
    A.2.12 Wishart
    A.2.13 Inverse-Wishart

B Installation Guide
  B.1 From source folder
  B.2 From installer

C User's Guide
  C.1 Data Entry
    C.1.1 Loading an Excel file
    C.1.2 Defining a new variable
    C.1.3 Editing an existing variable
    C.1.4 Deleting an existing variable
    C.1.5 Typing in a new data row
    C.1.6 Deleting an existing data row
    C.1.7 Modifying existing data
  C.2 Configuration
    C.2.1 Setting the look & feel
    C.2.2 Selecting the type of user
  C.3 Non-Symbolic Regression
    C.3.1 Simple Classical Regression
    C.3.2 Multiple Classical Regression
    C.3.3 Simple Bayesian Regression
    C.3.4 Multiple Bayesian Regression
  C.4 Symbolic Regression
    C.4.1 Simple Classical Regression
    C.4.2 Multiple Classical Regression
    C.4.3 Simple Bayesian Regression
    C.4.4 Multiple Bayesian Regression

D Obtaining and Installing R
  D.1 Binary distributions
  D.2 Installation from source
  D.3 Package installation

E Obtaining and installing Java Runtime Environment
  E.1 Microsoft Windows
  E.2 Linux
    E.2.1 Installation of Self-Extracting Binary
    E.2.2 Installation of RPM File
  E.3 UNIX

Bibliography
Chapter 1: Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both of these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are framed precisely as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models used to describe available data, often hierarchical models, are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone: it is the logic of contemporary society and science. According to [Rupp04], whether to apply the Bayesian methodology is no longer under discussion; the question is when it has to be applied.

Bayesian methods have matured and improved in several ways during the last fifteen years. They are increasingly attractive to researchers, and successful applications of Bayesian
data analysis have appeared in many different fields, including actuarial science, biometrics, finance, market research, marketing, medicine, engineering and social science. It is not only that the Bayesian approach produces appropriate answers to many currently important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them.

Thus, the main characteristic offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem to be handled. The more precise the prior knowledge, the better and more reliable the results obtained. But Bayesian Statistics was held back until the mid-1990s by its computational complexity. Since then, it has undergone a great expansion, favoured by the development and improvement of different computational methods in this field, such as Markov chain Monte Carlo.

This methodology has shown itself extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets the researcher incorporate her or his knowledge into the analysis, improving the results, since they no longer depend only on the sampling data.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent in the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in mind, which are difficult to capture with classical statistics. According to [Bill02], this responds to the current need to change from a statistics of data, in the past century, to a statistics of knowledge in the twenty-first century.

Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.

Facing this outlook, this project is intended to respond to those requirements by providing a
wide and exhaustive documentation about some of the currently most used and advanced techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this document, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to put into practice and check all the acquired knowledge.

Therefore, this is a project that combines the most recent techniques with major future implications in theoretical issues, as Bayesian regression applied to interval-valued data is, with a technological part dealing with the problem of interconnecting two software programs: one that displays the graphical user interface and another that performs the computations.

Regarding a more personal motivation, when accepting this project several factors were taken into consideration by the author:

• A great challenge: it is an ambitious project with a high technical complexity, related both to its theoretical basis and to its technological basis. This represents a very good calling card for entering the labour market.

• Good timing: this project was designed to be finished before June 2007, which means being able to finish the degree in June and enter the labour market in September.

• Some very interesting issues: on the one hand, it deals with the ever-present need to forecast and model observations and situations in order to get the best possible results; on the other hand, it focuses on the stock market, which matches my personal hobbies.

• A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

• The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

• A research scholarship: the possibility of being in the Industrial Organization department of the university, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.
1.2 Objectives

This project aims to achieve the following objectives:

• To provide wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. From this point, documentation about Bayesian regression will be developed, as well as the software tool designed.

• To build a software tool to fit Bayesian regression models to interval-valued data, finding the most efficient way to design the graphical user interface, which must be as user-friendly as possible.

• To find, from the tests carried out with the application, the most efficient way to offer the system to future clients.

• To design a survey to measure the quality of the tool and users' satisfaction.

• To explore the possibility of writing an article for a scientific journal.

1.3 Methodology

As the title of the project indicates, the final purpose is the development of an application aimed towards stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied to the Bayesian methodology, and symbolic data. Within this phase, Bayesian data analysis will be studied first, trying to synthesize its most important elements, with special dedication to posterior simulation and computational algorithms. Then regression models will be treated, quickly reviewing the classical approach before going deeper into the different Bayesian regression models, applying a great part of what was explained about the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage concerns the development of the software application, employing an incremental methodology for programming and testing iterative prototypes. This methodology has been
considered the most suitable for this project, since it lets us introduce successive models into the application. The following figure shows the structure of the work packages into which the project is divided:

[Figure 1.1: Project Work Packages]
Chapter 2: Bayesian Data Analysis

2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data.

Data analysis can be divided into two kinds: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches, depending on the adopted approach. The first, known as frequentist, makes inferences from the data resulting from a sample through classical methods. The second, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge that the researcher has about the problem treated. Since it is not worthwhile to explain the frequentist approach at length here, an extended review of the classical methods related to it can be found in [Mont02].

    Data Analysis
      - Exploratory
      - Confirmatory
          - Frequentist
          - Bayesian
As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

• To set up a full probability model, through a joint probability distribution for all observable and unobservable quantities in a problem.

• To condition on observed data, obtaining the posterior distribution.

• Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.

The joint probability distribution f(θ, y), where θ may equally be a vector of several parameters (the same formulas apply), is obtained by means of

    f(θ, y) = f(y|θ) f(θ)    (2.1)

where y is the set of sampled data. So this distribution is the product of two densities, referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) under study after the data have been observed. Here an important problem arises in relation to the parametric approach: the probability model that the researcher chooses may not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When y is considered fixed, so that the expression is a function of θ, the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.

The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the researcher can draw his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little influence on the results. But most non-informative priors
are "improper" in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to make sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated with it.

Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier. These are the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property of having the same form as the likelihood. But it is not always possible to find this kind of distribution, and the researcher then has to handle many distributions to be able to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.

In relation to the prior, what distribution should be chosen? There are three different points of view, corresponding to different styles of Bayesians:

• Classical Bayesians consider the prior a necessary evil: priors that interject the least information possible should be chosen.

• Modern parametric Bayesians consider the prior a useful convenience: priors with desirable properties, such as conjugacy, should be chosen. They remark that, given a distributional choice, prior hyperparameters that interject the least information possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' Theorem, the posterior distribution f(θ|y) follows:

    f(θ|y) = f(θ, y)/f(y) = f(θ) f(y|θ)/f(y)    (2.2)

where

    f(y) = ∫ f(θ) f(y|θ) dθ    (2.3)

(the integral being taken over the whole parameter space)
is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution, since it does not involve θ and the interest lies in learning about θ. So, with y fixed, it can be said that the posterior distribution is proportional to the joint probability distribution f(θ, y).

Once the posterior distribution is calculated, some kind of summary measure will be required to estimate the uncertainty about the parameter θ. This is due to the fact that the posterior distribution is a high-dimensional object whose direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others; the choice will depend on the requirements of the problem. The posterior distribution is thus of great importance, since it lets the researcher manage the uncertainty about θ and provides information about it that takes into account both his prior knowledge and the data collected by sampling.

According to [Maté06], it is not difficult to deduce that posterior inference will coincide with non-Bayesian inference as long as the estimate that the researcher gives to the parameter θ is the same as the one resulting from the sampling.

Once the data y have been observed, a new unknown observable quantity ỹ of the same process can be predicted through the posterior predictive distribution f(ỹ|y):

    f(ỹ|y) = ∫ f(ỹ, θ|y) dθ = ∫ f(ỹ|θ, y) f(θ|y) dθ = ∫ f(ỹ|θ) f(θ|y) dθ    (2.4)

where the last equality holds because ỹ and y are conditionally independent given θ.

To sum up, the basic idea is to update the prior distribution f(θ) through Bayes' theorem by observing the data y, in order to get a posterior distribution f(θ|y); then a summary measure or a prediction for new data can be obtained from f(θ|y). The sketch below illustrates this cycle numerically, and Table 2.1 summarizes the distributions involved.
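The whole updating cycle can be reproduced numerically in a few lines of R, the language used for the software tool developed in this project. The following sketch approximates the posterior and the posterior predictive distribution on a grid, for a Beta prior and a Binomial likelihood (one of the conjugate pairs listed later in Table 2.3); the prior parameters and the data are purely illustrative, not taken from the text.

    # Sketch of the Bayesian updating cycle on a grid of parameter values.
    theta <- seq(0.001, 0.999, length.out = 999)   # grid over the parameter
    h     <- theta[2] - theta[1]                   # grid spacing
    prior <- dbeta(theta, 2, 2)                    # researcher's prior knowledge, f(theta)
    y <- 7; n <- 10                                # observed data: 7 successes in 10 trials
    lik  <- dbinom(y, n, theta)                    # likelihood function, f(y|theta)
    post <- prior * lik                            # proportional to the posterior, eq. (2.2)
    post <- post / (sum(post) * h)                 # normalize by f(y), eq. (2.3)
    post_mean <- sum(theta * post) * h             # a posterior summary measure
    # posterior predictive for the number of successes in m new trials, eq. (2.4)
    m <- 5
    pred <- sapply(0:m, function(k) sum(dbinom(k, m, theta) * post) * h)

The same numerical recipe (multiply prior by likelihood, normalize, then integrate against the sampling density) applies to any one-parameter model, whether or not a conjugate prior is available.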
Table 2.1: Distributions in Bayesian Data Analysis

    Distribution   Expression             Information Required                  Result
    Likelihood     f(y|θ)                 Data                                  f(y|θ)
    Prior          f(θ)                   Researcher's knowledge                f(θ)
    Joint          f(y|θ) f(θ)            Likelihood and prior distributions    f(θ, y)
    Posterior      f(θ) f(y|θ)/f(y)       Prior and joint distributions         f(θ|y)
    Predictive     ∫ f(ỹ|θ) f(θ|y) dθ     New data and posterior distribution   f(ỹ|y)

2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with mean µ and unknown variance σ²:

    y|µ, σ² ~ N(µ, σ²)    (2.5)

As shown in Appendix A, the likelihood function for a single observation is

    f(y|µ, σ²) ∝ (σ²)^(−1/2) exp(−(y − µ)²/(2σ²))    (2.6)

This means that the likelihood function is proportional to a Normal distribution, omitting the terms that are constant.

Now let us consider n independent observations y₁, y₂, ..., yₙ. According to the previous section, the parameters θ to be estimated are µ and σ²:
    θ = (θ₁, θ₂) = (µ, σ²)    (2.7)

A full probability model must be set up through a joint probability distribution:

    f(θ, (y₁, y₂, ..., yₙ)) = f(θ, y) = f(y|θ) f(θ)    (2.8)

The likelihood function for a sample of n iid observations is in this case

    f(y|θ) = f(y|µ, σ²) ∝ ∏ᵢ₌₁ⁿ (σ²)^(−1/2) exp(−(yᵢ − µ)²/(2σ²))    (2.9)

As recommended previously, a conjugate prior will be chosen; in fact, a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

    f(θ) = f(µ, σ²) = f(µ|σ²) f(σ²)    (2.10)

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ given σ² is Normal (details about these distributions in Appendix A):

    µ|σ² ~ N(µ₀, σ²V₀)    (2.11)

    σ² ~ Inv-χ²(ν₀, s₀²)    (2.12)

So the joint prior distribution is

    f(θ) = f(µ|σ²) f(σ²) ∝ N-Inv-χ²(µ₀, s₀²V₀; ν₀, s₀²)    (2.13)

Its four parameters can be identified as the location and scale of µ and the degrees of freedom and scale of σ², respectively.

Since a natural conjugate prior was employed, the joint posterior distribution has the same form as the prior. Conditioning on the data, and according to Bayes' Theorem, we have

    f(θ|y) = f(µ, σ²|y) ∝ f(y|µ, σ²) f(µ, σ²) ∝ N-Inv-χ²(µ₁, s₁²V₁; ν₁, s₁²)    (2.14)

where it can be shown that
    µ₁ = (V₀⁻¹ + n)⁻¹ (V₀⁻¹ µ₀ + n ȳ)    (2.15)

    V₁ = (V₀⁻¹ + n)⁻¹    (2.16)

    ν₁ = ν₀ + n    (2.17)

    ν₁ s₁² = ν₀ s₀² + (n − 1) s² + (V₀⁻¹ n / (V₀⁻¹ + n)) (ȳ − µ₀)²    (2.18)

All these formulae show that Bayesian inference combines prior and sample information.

The first formula says that the posterior mean µ₁ is a weighted mean of the prior mean µ₀ and the sample mean ȳ, divided by the sum of their respective weights, which are V₀⁻¹ and the sample size n.

The second gives the posterior scale of the mean, and can be seen as a compromise between the sample size and the weight given to the prior mean.

The third indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last formula expresses the posterior sum of squared errors as a combination of the prior and empirical sums of squared errors, plus a term that measures the conflict between prior and sample information.

A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].

The marginal posterior distributions follow directly:

    µ|σ², y ~ N(µ₁, σ²V₁)    (2.19)

    σ²|y ~ Inv-χ²(ν₁, s₁²)    (2.20)

If we integrate out σ², the marginal for µ is a t-distribution (see Appendix A for details):

    µ|y ~ t_ν₁(µ₁, s₁²V₁)    (2.21)
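As a minimal sketch of equations (2.15)-(2.18), the following R function (the function and argument names are our own) carries out the conjugate update from the sample summaries; applied to the figures of the Ibex 35 example below, it reproduces the posterior values reported there.

    # Conjugate Normal / N-Inv-chi^2 update, equations (2.15)-(2.18).
    # n, ybar, ssq: sample size, sample mean and sample variance s^2.
    # mu0, V0inv, nu0, s0sq: prior location, prior weight V0^-1,
    # prior degrees of freedom and prior scale.
    normal_conjugate_update <- function(n, ybar, ssq, mu0, V0inv, nu0, s0sq) {
      mu1  <- (V0inv * mu0 + n * ybar) / (V0inv + n)             # (2.15)
      V1   <- 1 / (V0inv + n)                                    # (2.16)
      nu1  <- nu0 + n                                            # (2.17)
      s1sq <- (nu0 * s0sq + (n - 1) * ssq +
               V0inv * n / (V0inv + n) * (ybar - mu0)^2) / nu1   # (2.18)
      list(mu1 = mu1, V1 = V1, nu1 = nu1, s1sq = s1sq)
    }

    # Ibex 35 example below: ten close values with mean 10893.29 and
    # standard deviation 61.66; analyst's prior: mean 10870, scale 100.
    post <- normal_conjugate_update(n = 10, ybar = 10893.29, ssq = 61.66^2,
                                    mu0 = 10870, V0inv = 100,
                                    nu0 = 100, s0sq = 100^2)
    post$mu1          # 10872.12
    sqrt(post$s1sq)   # about 97.2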
Let us see an application to the Spanish stock market. Suppose that the monthly close values associated with the Ibex 35 are normally distributed. If we take the values at which the Spanish index closed during the first two weeks of January 2006 (n = 10 trading days), the sample mean was 10893.29 and the sample standard deviation was 61.66, so the non-Bayesian approach would infer a Normal distribution with that mean and standard deviation. Suppose now that, had we asked an analyst about the evolution of the Ibex 35 in January, he would have strongly affirmed that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and that, accordingly, the standard deviation would be higher, around 100. Then, taking V₀⁻¹ = 100 and ν₀ = 100 in the previous formulas, the posterior parameters would be

    µ₁ = (100 + 10)⁻¹ (100 × 10870 + 10 × 10893.29) = 10872.12
    V₁ = (100 + 10)⁻¹ = 0.0091
    ν₁ = 100 + 10 = 110
    s₁ = √[(100 × 100² + 9 × 61.66² + (1000/110) × (10893.29 − 10870)²)/110] = 97.19

This means that there is a difference of about 20 points between the Bayesian and the non-Bayesian estimates of the mean close value of January. Once January had passed, we could compare both results and note that the Bayesian estimates were closer to the actual mean close value and standard deviation, which turned out to be 10871.2 and 112.44. Figure 2.1 shows that the density estimated under the Bayesian approach is centred closer to the real mean close value in January than the one estimated under the frequentist approach.

[Figure 2.1: Univariate Normal Example — estimated densities under the frequentist and Bayesian approaches, together with the real mean close value in January]

2.2.2 Multivariate Normal distribution

Now let us consider an observable vector y of d components with the multivariate Normal distribution:

    y ~ N(µ, Σ)    (2.22)

where the first parameter is the mean column vector and the second one is the variance-covariance matrix. Extending what was said above to the multivariate case, we have:
    f(y|µ, Σ) ∝ |Σ|^(−1/2) exp(−(1/2) (y − µ)ᵀ Σ⁻¹ (y − µ))    (2.23)

And for n iid observations:

    f(y₁, y₂, ..., yₙ|µ, Σ) ∝ |Σ|^(−n/2) exp(−(1/2) ∑ᵢ₌₁ⁿ (yᵢ − µ)ᵀ Σ⁻¹ (yᵢ − µ))    (2.24)

A multivariate generalization of the Scaled Inverse-χ² is the Inverse-Wishart distribution (see details in Appendix A), so the joint prior distribution is

    f(θ) = f(µ, Σ) ∝ N-Inv-Wishart(µ₀, Λ₀/k₀; ν₀, Λ₀)    (2.25)

due to the fact that

    µ|Σ ~ N(µ₀, Σ/k₀)    (2.26)

    Σ ~ Inv-Wishart(ν₀, Λ₀⁻¹)    (2.27)
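A draw from this prior can be sketched in R with base tools. The code below assumes the chapter's parameterization, under which Σ ~ Inv-Wishart(ν₀, Λ₀⁻¹) means Σ⁻¹ ~ Wishart(ν₀, Λ₀⁻¹); the dimension and the hyperparameter values are illustrative.

    # One draw of (mu, Sigma) from the N-Inv-Wishart prior of (2.26)-(2.27).
    library(MASS)                      # provides mvrnorm
    d       <- 2                       # dimension of y (illustrative)
    mu0     <- c(0, 0); k0 <- 1        # prior location and weight
    nu0     <- 5; Lambda0 <- diag(d)   # prior degrees of freedom and scale
    # Sigma^-1 ~ Wishart(nu0, Lambda0^-1), so invert a Wishart draw
    W     <- rWishart(1, df = nu0, Sigma = solve(Lambda0))[, , 1]
    Sigma <- solve(W)                  # Sigma ~ Inv-Wishart(nu0, Lambda0^-1)
    mu    <- mvrnorm(1, mu = mu0, Sigma = Sigma / k0)   # eq. (2.26)

Repeating these draws many times and plugging each (mu, Sigma) into the Normal density gives a Monte Carlo picture of the prior predictive distribution, anticipating the simulation techniques of Chapter 3.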
The posterior results are obtained exactly as in the univariate case, but using these distributions. Interested readers can find more information in [Gelm04] or [Cong06]. A summary is shown in Table 2.2 in order to collect the most important ideas.

Expression:               univariate $y \sim N(\mu, \sigma^2)$;  multivariate $y \sim N(\mu, \Sigma)$
Parameters to estimate:   univariate $\mu, \sigma^2$;  multivariate $\mu, \Sigma$
Prior distributions:      univariate $\mu|\sigma^2 \sim N(\mu_0, \sigma_0^2/k_0)$, $\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$, $\mu, \sigma^2 \sim N\text{-Inv-}\chi^2(\mu_0, \sigma_0^2/k_0; \nu_0, \sigma_0^2)$;
                          multivariate $\mu|\Sigma \sim N(\mu_0, \Sigma/k_0)$, $\Sigma \sim \text{Inv-Wishart}(\nu_0, \Lambda_0^{-1})$, $\mu, \Sigma \sim N\text{-Inv-Wishart}(\mu_0, \Lambda_0/k_0; \nu_0, \Lambda_0)$
Posterior distributions:  univariate $\mu|\sigma^2 \sim N(\mu_1, \sigma_1^2/k_1)$, $\sigma^2 \sim \text{Inv-}\chi^2(\nu_1, \sigma_1^2)$, $\mu, \sigma^2 \sim N\text{-Inv-}\chi^2(\mu_1, \sigma_1^2/k_1; \nu_1, \sigma_1^2)$;
                          multivariate $\mu|\Sigma \sim N(\mu_1, \Sigma/k_1)$, $\Sigma \sim \text{Inv-Wishart}(\nu_1, \Lambda_1^{-1})$, $\mu, \Sigma \sim N\text{-Inv-Wishart}(\mu_1, \Lambda_1/k_1; \nu_1, \Lambda_1)$

Table 2.2: Comparison between Univariate and Multivariate Normal

2.2.3 Other distributions

Just as has been done with the Normal distribution, a Bayesian analysis can be carried out for other distributions. For instance, the exponential distribution is commonly used in reliability analysis. Because this project deals with the Normal distribution for the likelihood, the analysis with other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions for other likelihood distributions. More details can be found in [Cong06], [Gelm04] or [Rossi06].

Likelihood           Parameter   Conjugate prior   Prior hyperparameters   Posterior hyperparameters
$Bin(y|n,\theta)$    $\theta$    Beta              $\alpha, \beta$         $\alpha + y,\ \beta + n - y$
$P(y|\theta)$        $\theta$    Gamma             $\alpha, \beta$         $\alpha + n\bar{y},\ \beta + n$
$Exp(y|\theta)$      $\theta$    Gamma             $\alpha, \beta$         $\alpha + 1,\ \beta + y$
$Geo(y|\theta)$      $\theta$    Beta              $\alpha, \beta$         $\alpha + 1,\ \beta + y$

Table 2.3: Conjugate distributions for other likelihood distributions
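As a minimal illustration of how the updates in Table 2.3 work in practice, the following Python sketch (with purely hypothetical numbers chosen by us) performs the Beta-Binomial update from the first row:

    from scipy import stats

    # Hypothetical example: Beta(2, 2) prior on a success probability theta,
    # then y = 7 successes observed in n = 10 Bernoulli trials.
    alpha, beta_, n, y = 2.0, 2.0, 10, 7

    # Conjugate update from Table 2.3: Beta(alpha + y, beta + n - y)
    posterior = stats.beta(alpha + y, beta_ + n - y)
    print(posterior.mean())          # posterior mean of theta: 9/14 ~ 0.643
    print(posterior.interval(0.95))  # central 95% credible interval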
2.3 Hierarchical Models

Hierarchical data arise when observations are structured in groups or related to one another. When this occurs, standard techniques either assume that these groups belong to entirely different populations or ignore the aggregate information entirely.

Hierarchical models provide a way of pooling the information for the disparate groups without assuming that they belong to precisely the same population.

Suppose we have collected data about some random variable $Y$ from $m$ different populations with $n$ observations for each population. Let $y_{ij}$ represent observation $j$ from population $i$. Now suppose $y_{ij} \sim f(\theta_i)$, where $\theta_i$ is a vector of parameters for population $i$. Furthermore, $\theta_i \sim f(\Theta)$, where $\Theta$ may also be a vector. Up to this point, we have only restated what was said previously.
Now let us extend the model and assume that the parameters $\Theta$ that govern the distribution of the $\theta$'s are themselves random variables, and assign a prior distribution to these variables as well:

$$\Theta \sim f(\psi) \quad (2.28)$$

where $f(\psi)$ is called the hyperprior. The vector parameter $\psi$ of the hyperprior may be "known" and represent our prior beliefs about $\Theta$; in theory, we can also assign a probability distribution to these quantities as well and proceed to another layer of hierarchy.

According to [Gelm04], the idea of exchangeability will be used to create a joint probability distribution model for all the parameters $\theta$. A formal definition of exchangeability is:

"The parameters $\theta_1, \theta_2, \ldots, \theta_n$ are exchangeable in their joint distribution if $f(\theta_1, \theta_2, \ldots, \theta_n)$ is invariant to permutations of the indexes $1, 2, \ldots, n$."

This means that if no information other than the data is available to distinguish any of the $\theta_i$ from any of the others, and no ordering of the parameters can be made, one must assume symmetry among the parameters in the prior distribution. So we can treat the parameters of each sub-population as exchangeable units. This can be formulated by:

$$f(\theta_1, \theta_2, \ldots, \theta_n|\Theta) = \prod_{i=1}^{n} f(\theta_i|\Theta) \quad (2.29)$$

The prior joint distribution is now:

$$f(\theta_1, \theta_2, \ldots, \theta_n, \Theta) = f(\theta_1, \theta_2, \ldots, \theta_n|\Theta) f(\Theta) \quad (2.30)$$

And conditioning on the data yields:

$$f(\theta_1, \theta_2, \ldots, \theta_n, \Theta|y) \propto f(\theta_1, \theta_2, \ldots, \theta_n, \Theta) f(y|\theta_1, \theta_2, \ldots, \theta_n, \Theta) \quad (2.31)$$

Perhaps the most important point in practice is that non-hierarchical models are usually inappropriate for hierarchical data, while non-hierarchical data can be modelled following the hierarchical structure by assigning concrete values to the hyperprior parameters.

This kind of model will be used in Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.
For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].

2.4 Nonparametric Bayesian

To overcome the limitations that have been mentioned throughout this chapter, it is the nonparametric approach which manages to get through and reduce the restrictions of the parametric approach. This kind of analysis can be performed through the so-called Dirichlet process, which allows us to express in a simple way the prior distribution or the distribution family of $F$, where $F$ is the distribution function of the studied variable. This process has a parameter, called $\alpha$, which is normalized into a probability distribution. According to [Mate06], a Dirichlet process for $F(t)$ requires knowing:

• A prior proposal for $F(t)$, $F_0(t)$, which corresponds to the distribution function expressing the prior knowledge that the engineer has, and is given by
$$F_0(t) = \frac{\alpha(t)}{M} \quad (2.32)$$

• A measure of the confidence in the prior proposal, denoted by $M$, whose values can vary between 0 and $\infty$, depending on whether there is total confidence in the data or in the prior proposal, respectively.

It can be shown that the posterior distribution for $F(t)$, $\hat{F}_n(t)$, after sampling $n$ data points, is given by

$$\hat{F}_n(t) = p_n F_0(t) + (1 - p_n) F_n(t) \quad (2.33)$$

where $F_n(t)$ is the empirical distribution function and $p_n = \frac{M}{M+n}$.

More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].
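A minimal Python sketch of equation (2.33), under illustrative assumptions of ours (a Normal prior guess $F_0$ and a confidence value $M$ chosen arbitrarily):

    import numpy as np
    from scipy import stats

    def dp_posterior_cdf(t, data, F0, M):
        """Posterior mean CDF of a Dirichlet process, equation (2.33):
        pn * F0(t) + (1 - pn) * Fn(t), with pn = M / (M + n)."""
        n = len(data)
        pn = M / (M + n)
        Fn = np.mean(data[:, None] <= t, axis=0)   # empirical CDF at each t
        return pn * F0(t) + (1 - pn) * Fn

    # Illustrative use: prior guess N(0, 1), moderate confidence M = 5
    data = np.random.normal(0.5, 1.0, size=50)
    t = np.linspace(-3, 4, 8)
    print(dp_posterior_cdf(t, data, stats.norm(0, 1).cdf, M=5.0))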
With this approach, not only is the parametric limitation concerning the probability model of the variable under study avoided, since no distributional hypothesis is required, but it also allows us to give a quantified weight to the prior knowledge provided by the engineer, depending on the confidence in the certainty of that knowledge.
Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and understood density, so summary statistics, such as the posterior mean and variance of parameters of interest, will not be analytically available. It is at this point where the importance of Bayesian computation arises, and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.

In this regard, the most important simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.

As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from a specific probability distribution, then design a Markov chain whose long-time equilibrium is that distribution, write a computer program to simulate the Markov chain, run it for a time long enough to be confident that approximate equilibrium has been attained, then record the state of the Markov
chain as an approximate draw from equilibrium.

The technique has been developed strongly, and with rather different emphases, in the computer science community concerned with the study of random algorithms (where the emphasis is on whether the resulting algorithm scales well with increasing size of the problem), in the spatial statistics community (where one is interested in understanding what kinds of patterns arise from complex stochastic models), and also in the applied statistics community (where it is applied largely in Bayesian contexts, enabling researchers to formulate statistical models which would otherwise be resistant to effective statistical analysis).

The development of the theoretical work also benefits the development of statistical applications. MCMC simulation techniques have been applied to develop practical statistical inference for almost all problems in (bio)statistics, for example, problems in longitudinal data analysis, image analysis, genetics, contagious disease epidemics, random spatial patterns, and financial statistical models such as GARCH and stochastic volatility.

The simplicity of the underlying principle of MCMC is a major reason for its success. However, a substantial complication arises as the underlying target problem becomes more complex; namely, how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], $n = 100$ independent samples are usually enough for reasonable posterior summaries, but in some cases more samples are needed to assure more accuracy.

3.2 Markov chains

The essential theory required in developing Monte Carlo methods based on Markov chains is presented here. The most fundamental result is that certain Markov chains converge to a unique invariant distribution and can be used to estimate expectations with respect to this distribution. But in order to reach this conclusion, some concepts need to be defined first.

A Markov chain is a series of random variables, $X_0, \ldots, X_n$, also called a stochastic process, in which only the value of $X_{n-1}$ influences the distribution of $X_n$. Formally:

$$P(X_n = x_n|X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(X_n = x_n|X_{n-1} = x_{n-1}) \quad (3.1)$$
where the $X_n$ have a common range called the state space of the Markov chain.

The common language used to refer to the different situations in which a Markov chain can be found is the following. If $X_n = i$, it is said that the chain is in state $i$ at step $n$, or that it has the value $i$ at step $n$. This language confers on the chain a certain dynamic character, which is corroborated by the main tool used to study it: the transition probabilities $P(X_{n+1} = j|X_n = i)$, represented by the transition matrix $P = (P_{ij})$ with $P_{ij} = P(X_{n+1} = j|X_n = i)$. This gives the probability of moving from state $i$ to state $j$.

Due to the fact that in the most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the initial probability, $P_0 = P(X_1 = j|X_0 = i)$. In this regard, a Markov chain $X_t$ is homogeneous if $P(X_{n+1} = j|X_n = i) = P(X_1 = j|X_0 = i)$ for all $n, i, j$.

Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, given the transition matrices $P$ and, for step $n$, $P_n$ of a homogeneous Markov chain, then $P_n = P^n$.

On the other hand, we will look at the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that $X_t$ is a homogeneous Markov chain. Then, a vector $\pi$ is an invariant distribution of the chain $X_t$ if it satisfies:

a) $\pi_j \geq 0$ with $\sum_j \pi_j = 1$.

b) $\pi = \pi P$.

That is, a stationary distribution over the states of a Markov chain is one that persists forever once it is reached.

The concept of an ergodic state requires making other definitions clear, such as recurrence and aperiodicity:

• The state $i$ is recurrent if $P(X_n = i \text{ for some } n \geq 1 | X_0 = i) = 1$. Otherwise, it is transient. Moreover, $i$ is positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.
• The period of a state $i$, denoted by $d_i$, is defined as $d_i = \gcd\{n : [P^n]_{ii} > 0\}$. The state $i$ is aperiodic if $d_i = 1$, and periodic if it is greater.

Then a state is ergodic if it is positive recurrent and aperiodic. The last concept to define is irreducibility. A set of states $C \subseteq S$, where $S$ is the set of all possible states, is irreducible if for all $i, j \in C$:

• $i$ and $j$ have the same period.
• $i$ is transient if and only if $j$ is transient.
• $i$ is null recurrent if and only if $j$ is null recurrent.

Now, having all these concepts in mind, we can determine whether a Markov chain has a stationary distribution with the next lemma:

Lemma 3.2.1. Let $X_t$ be a homogeneous and irreducible Markov chain. The chain will have exactly one stationary distribution if, and only if, all the states are positive recurrent. In that case, it will have entries given by $\pi_i = \mu_i^{-1}$, where $\mu_i$ denotes the expected return time of state $i$.

The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let $X_t$ be a homogeneous, irreducible and aperiodic Markov chain. Then

$$[P^n]_{ij} \longrightarrow \frac{1}{\mu_j} \quad \text{for all } i, j \in S \text{ as } n \to \infty \quad (3.2)$$

3.3 Monte Carlo Integration

Monte Carlo integration estimates the integral $E[g(\theta)]$ by obtaining samples $\theta^t$, $t = 1, \ldots, n$, from the posterior distribution $p(\theta|y)$ and averaging

$$\widehat{E[g(\theta)]} = \frac{1}{n}\sum_{t=1}^{n} g(\theta^t) \quad (3.3)$$

where the function $g(\theta)$ represents the function of interest to estimate. Note that if the samples $\theta^t$, $t = 1, \ldots, n$, are generated by a process having $p(\theta|y)$ as its stationary distribution, the $\theta^t$ form a Markov chain and the average above can still be used to estimate posterior expectations.
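A minimal Python sketch of equation (3.3); the posterior draws are simulated here from a known Normal distribution purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for draws theta_t from p(theta|y); here a N(2, 0.5^2) posterior
    draws = rng.normal(2.0, 0.5, size=10_000)

    # Monte Carlo estimates of E[g(theta)|y] for two functions of interest
    print(draws.mean())        # g(theta) = theta   -> ~2.0
    print(np.mean(draws**2))   # g(theta) = theta^2 -> ~4.25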
3.4 Gibbs sampler

In many models it is not easy to draw directly from the posterior distribution $p(\theta|y)$. However, if the parameter $\theta$ is partitioned into several blocks as $\theta = (\theta_1, \ldots, \theta_p)$, then the full conditional posterior distributions, $p(\theta_1|y, \theta_2, \ldots, \theta_p), \ldots, p(\theta_p|y, \theta_1, \ldots, \theta_{p-1})$, may be simple to draw from in order to obtain a sequence of draws of $\theta_1, \ldots, \theta_p$. For instance, in the Normal linear regression model it is convenient to set $p = 2$, with $\theta_1 = \beta$ and $\theta_2 = \sigma^2$, and the full conditional distributions would be $p(\beta|y, \sigma^2)$ and $p(\sigma^2|y, \beta)$, which are very useful in the Normal independent model that will be explained later.

The Gibbs sampler is defined by iterative sampling from each of those $p$ conditional distributions:

1. Set a starting value, $\theta^0 = (\theta_2^0, \ldots, \theta_p^0)$.
2. Take random draws:
   - $\theta_1^1$ from $p(\theta_1|y, \theta_2^0, \ldots, \theta_p^0)$
   - $\theta_2^1$ from $p(\theta_2|y, \theta_1^1, \theta_3^0, \ldots, \theta_p^0)$
   - ...
   - $\theta_p^1$ from $p(\theta_p|y, \theta_1^1, \ldots, \theta_{p-1}^1)$
3. Repeat step 2 as necessary.
4. Reject those draws still affected by the starting value $\theta^0$, that is, the first draws of the sequence, and average the rest of the draws applying Monte Carlo integration.

For instance, in the Normal regression model we would have:

1. Set a starting value, $\theta_2^0 = (\sigma^2)^0$.
2. Take random draws:
   - $\theta_1^1 = \beta^1$ from $p(\beta|y, (\sigma^2)^0)$
   - $\theta_2^1 = (\sigma^2)^1$ from $p(\sigma^2|y, \beta^1)$
3. Repeat step 2 as necessary.
4. Eliminate the initial draws affected by the starting value and average the rest of the draws applying Monte Carlo integration.
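The following Python sketch illustrates the scheme above for the simpler Normal model with unknown mean and variance under semi-conjugate priors (an illustration of ours, with prior values chosen arbitrarily; the regression version follows the same pattern with the conditionals given in Chapter 5):

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.normal(5.0, 2.0, size=100)     # illustrative data
    n, ybar = len(y), y.mean()

    # Semi-conjugate priors: mu ~ N(m0, v0), sigma2 ~ Inv-chi2(nu0, s0^2)
    m0, v0, nu0, s0_sq = 0.0, 100.0, 1.0, 1.0

    mu, sigma2 = ybar, y.var()             # starting values
    draws = []
    for it in range(2000):
        # 1) mu | sigma2, y  ~  Normal
        v1 = 1.0 / (1.0 / v0 + n / sigma2)
        m1 = v1 * (m0 / v0 + n * ybar / sigma2)
        mu = rng.normal(m1, np.sqrt(v1))
        # 2) sigma2 | mu, y  ~  scaled inverse chi-square
        nu1 = nu0 + n
        s1_sq = (nu0 * s0_sq + np.sum((y - mu) ** 2)) / nu1
        sigma2 = nu1 * s1_sq / rng.chisquare(nu1)
        draws.append((mu, sigma2))

    burned = np.array(draws[500:])         # discard the burn-in
    print(burned.mean(axis=0))             # posterior means of (mu, sigma2)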
Those discarded values which are affected by the starting point are called the burn-in. Generally, any set of values discarded in an MCMC simulation is called the burn-in. The size of the burn-in period is the subject of current research in MCMC methods.

As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].

3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not. Like the algorithms explained above, it is based on formulating a Markov chain, but it uses a proposal distribution, $q(\cdot|\theta^t)$, which depends on the current state $\theta^t$, to generate a new proposed sample $\theta^*$. This proposal is accepted as the next state with probability given by

$$\alpha(\theta^t, \theta^*) = \min\left\{1, \frac{p(\theta^*|y)\,q(\theta^t|\theta^*)}{p(\theta^t|y)\,q(\theta^*|\theta^t)}\right\} \quad (3.4)$$

If the point $\theta^*$ is not accepted, then the chain does not move and $\theta^{t+1} = \theta^t$. According to [Mart01], the steps to follow are:

1. Initialize the chain to $\theta^0$ and set $t = 0$.
2. Generate a candidate point $\theta^*$ from $q(\cdot|\theta^t)$.
3. Generate $U$ from a uniform (0,1) distribution.
4. If $U \leq \alpha(\theta^t, \theta^*)$ then set $\theta^{t+1} = \theta^*$; otherwise set $\theta^{t+1} = \theta^t$.
5. Set $t = t + 1$ and repeat steps 2 through 5.
6. Take the average of the draws $g(\theta^1), \ldots, g(\theta^n)$.

Note that it is not only recommendable but also essential that the proposal distribution $q(\cdot|\theta^t)$ be easy to sample from.
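A minimal Python sketch of the algorithm, using a symmetric random-walk proposal (a special case discussed in the next subsections) so that the $q$ terms cancel in (3.4); the target is an illustrative standard Normal posterior of our choosing:

    import numpy as np

    rng = np.random.default_rng(2)

    def log_post(theta):
        # Illustrative unnormalized log posterior: a standard Normal target
        return -0.5 * theta**2

    theta, chain = 0.0, []
    for t in range(10_000):
        # Random-walk proposal q(.|theta_t): symmetric, so q terms cancel
        proposal = theta + rng.normal(0.0, 1.0)
        log_alpha = log_post(proposal) - log_post(theta)
        if np.log(rng.uniform()) <= log_alpha:   # accept with probability alpha
            theta = proposal
        chain.append(theta)

    chain = np.array(chain[1000:])               # drop the burn-in
    print(chain.mean(), chain.std())             # ~0 and ~1 for this target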
There are some special cases of this method; the most important ones are briefly explained below. In addition, it can be shown, according to [Gelm04], that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm in which the proposal point is always accepted.

3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution has to be symmetric. That is,

$$q(\theta^*|\theta^t) = q(\theta^t|\theta^*) \quad (3.5)$$

for all $\theta^*$ and $\theta^t$. Then, the probability of accepting the new point is

$$\alpha(\theta^t, \theta^*) = \min\left\{1, \frac{p(\theta = \theta^*|y)}{p(\theta = \theta^t|y)}\right\} \quad (3.6)$$

The same procedure seen for the Metropolis-Hastings sampler has to be followed.

3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form

$$q(\theta^*|\theta^t) = q(|\theta^t - \theta^*|) \quad (3.7)$$

The candidate point is $\theta^* = \theta^t + z$, where $z$ is called the increment random variable from $q$. Then, the probability of accepting the new point is

$$\alpha(\theta^t, \theta^*) = \min\left\{1, \frac{p(\theta = \theta^*|y)}{p(\theta = \theta^t|y)}\right\} \quad (3.8)$$

The same procedure seen for the Metropolis-Hastings sampler has to be followed.

3.5.4 Independence sampler

This last variation has a proposal distribution such that

$$q(\theta^*|\theta^t) = q(\theta^*) \quad (3.9)$$

So it does not depend on $\theta^t$. Then, the probability of accepting the new point is
$$\alpha(\theta^t, \theta^*) = \min\left\{1, \frac{p(\theta^*|y)\,q(\theta^t)}{p(\theta^t|y)\,q(\theta^*)}\right\} = \min\left\{1, \frac{w(\theta^*)}{w(\theta^t)}\right\} \quad (3.10)$$

where

$$w(\theta) = \frac{p(\theta|y)}{q(\theta)} \quad (3.11)$$

It is important to remark that, for this method to work well, the proposal distribution $q$ should be very similar to the posterior distribution $p(\theta|y)$.

The same procedure seen for the Metropolis-Hastings sampler has to be followed.

3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method. The idea behind this method is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others, so instead of taking a simple average, importance sampling takes a weighted average.

Let $q(\theta)$ be a density from which it is easy to obtain random draws $\theta^{(s)}$ for $s = 1, \ldots, S$. Then $q(\theta)$ is called the importance function, and the importance sampling estimate can be defined: the function

$$\hat{g}_S = \frac{\sum_{s=1}^{S} w(\theta^{(s)})\,g(\theta^{(s)})}{\sum_{s=1}^{S} w(\theta^{(s)})}, \quad \text{where } w(\theta^{(s)}) = \frac{p(\theta = \theta^{(s)}|y)}{q(\theta = \theta^{(s)})},$$

converges to $E[g(\theta)|y]$ as $S \to \infty$.

In fact, $w(\theta^{(s)})$ can also be computed as $w(\theta^{(s)}) = \frac{p^*(\theta^{(s)}|y)}{q^*(\theta^{(s)})}$, where the new densities are proportional to the old ones.

For more information and details about Markov chain Monte Carlo methods and their application, the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].
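To close the chapter, a minimal Python sketch of the self-normalized importance sampling estimator defined above, with an illustrative Normal target and a heavier-tailed t importance function (both chosen by us):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    # Target p(theta|y): standard Normal (illustrative);
    # importance function q: Student t with 3 degrees of freedom
    target, q = stats.norm(0, 1), stats.t(df=3)

    theta = q.rvs(size=20_000, random_state=rng)
    w = target.pdf(theta) / q.pdf(theta)        # importance weights w(theta)

    # Self-normalized estimate of E[g(theta)|y] for g(theta) = theta^2
    print(np.sum(w * theta**2) / np.sum(w))     # ~1.0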
Chapter 4

Sensitivity Analysis

4.1 Introduction

There will be many occasions on which the researcher, having selected a model, wants to consider the possibility of choosing another model, or simply to compare the two. A tool is then needed to compare both models and select one of them. This will also be useful for variable selection in the regression models. In this section, Bayesian model comparison is briefly discussed, highlighting those methods which will be most useful.

In the Bayesian field, common methods for model comparison are based on the following: separate estimation, comparative estimation and simultaneous estimation.

Comparative estimation is based on distance measures such as the entropy distance, and the underlying idea is that the more parsimonious model may be preferred between two models whose distance between their posterior or posterior predictive distributions is sufficiently small.

Simultaneous model estimation lets us compare many models at the same time, and the main methods are reversible jump MCMC (RJMCMC) and birth and death MCMC (BDMCMC).

Separate estimation compares two models which are not necessarily nested, and the most used tools are the posterior predictive distributions and the posterior probability of the model. Since the methods of this type are the most accepted, we will explain some of them, highlighting the most important ones.
4.2 Bayes Factor

This is probably the dominant method of Bayesian model testing. It is the analogue of likelihood ratio tests within the frequentist framework, and the basic intuition is that prior and posterior information are combined in a ratio that provides evidence in favour of one model specification versus another.

Let us suppose we have two models to compare, $M_1$ and $M_2$. Let $p(M_1)$ and $p(M_2)$ be the prior probabilities of models $M_1$ and $M_2$, respectively, and $p(M_1|y)$ and $p(M_2|y)$ their posterior probabilities. Then the Bayes Factor is:

$$B(y) = \frac{p(y|M_1)}{p(y|M_2)} = \frac{p(M_1|y)\,p(M_2)}{p(M_2|y)\,p(M_1)} \quad (4.1)$$

This means that the Bayes Factor favours the model for which the marginal likelihood of the data, namely $p(y|M_i)$, is maximal. Therefore, the value of the factor gives evidence of the preference between two models. According to [Jeff61], the following interpretation is suggested:

Bayes Factor              Interpretation
$B(y) < 1/10$             Strong evidence for $M_2$
$1/10 < B(y) < 1/3$       Moderate evidence for $M_2$
$1/3 < B(y) < 1$          Weak evidence for $M_2$
$1 < B(y) < 3$            Weak evidence for $M_1$
$3 < B(y) < 10$           Moderate evidence for $M_1$
$B(y) > 10$               Strong evidence for $M_1$

Table 4.1: Bayes Factor Interpretation
The marginal likelihood usually involves an integral which can be analytically evaluated only in some special cases. So, while Bayes Factors are rather intuitive, they are often quite difficult or even impossible to calculate from a practical point of view. Because of this, there are alternatives to this method.

4.3 Alternative Statistics to the Bayes Factor

Let $\hat{\theta}$ be the posterior mean of the posterior distribution, and let us assume that the Bayes estimate of the parameters $\theta$ is approximately equal to the maximum likelihood estimate. Then the following statistics, some of which are also used in frequentist statistics, can be useful diagnostics (a computational sketch follows the list):

• The Likelihood Ratio, which will always favour the unrestricted model, and where the ratio is:
$$\text{Ratio} = -2[\log(p(\hat{\theta}_{Restricted}|y)) - \log(p(\hat{\theta}_{Full}|y))] \quad (4.2)$$
The ratio is distributed as a $\chi^2_p$, where $p$ is the number of parameters, including the intercept.

• The Akaike Information Criterion (AIC), where a ratio between $AIC_1$ (AIC for $M_1$) and $AIC_2$ (AIC for $M_2$) of less than 1 indicates that $M_1$ is better. This method does not require the models to be nested, and it favours more complicated models. The statistic is:
$$AIC = -2\log(p(\hat{\theta}|y)) + 2p \quad (4.3)$$
where $p$ is the number of parameters, including the intercept. It tends to perform better than the previous one.

• The Bayesian Information Criterion (BIC), also known as the Schwarz Criterion (SC), Schwarz Information Criterion (SIC) or Schwarz Bayesian Criterion (SBC). As with the AIC, this method can be used for non-nested models. The BIC is:
$$BIC = -2\log(p(\hat{\theta}|y)) + p\log(n) \quad (4.4)$$
where $p$ is the number of parameters, including the intercept, and $n$ is the sample size. Given any two estimated models, the model with the lower value of BIC is the one to be preferred. Since this method promotes model parsimony by penalizing models with increased complexity (larger $p$) relative to the sample size $n$, it may be preferred to the AIC.

• The Deviance Information Criterion (DIC), a newer statistic introduced by the developers of the WinBUGS software, who explain it in detail in [Spie03]. The main and most important difference with respect to the previous methods is that this is not an approximation of the Bayes Factor. It is a hierarchical modelling generalization of the AIC and the BIC, and it is particularly useful when the posterior distributions have been obtained by simulation. The DIC is:
$$DIC = -\frac{4}{L}\sum_{l=1}^{L}\log(p(y|\theta^{(l)})) + 2\log(p(y|\hat{\theta})) \quad (4.5)$$
where $\theta^{(l)}$ is the draw obtained in iteration $l$ of the posterior simulation. This method also penalizes higher dimensional models, and it may be preferred to the previous ones, mainly in the linear models context.
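The criteria above are straightforward to compute once the maximized log-likelihood is available; a minimal Python sketch with hypothetical log-likelihood values of our choosing:

    import numpy as np

    def aic(loglik, p):
        """Akaike Information Criterion, equation (4.3)."""
        return -2.0 * loglik + 2.0 * p

    def bic(loglik, p, n):
        """Bayesian Information Criterion, equation (4.4)."""
        return -2.0 * loglik + p * np.log(n)

    # Hypothetical comparison: model 1 (3 parameters) vs model 2 (5 parameters)
    ll1, ll2, n = -520.4, -518.9, 200
    print(aic(ll1, 3), aic(ll2, 5))      # lower is better
    print(bic(ll1, 3, n), bic(ll2, 5, n))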
4.4 Highest Posterior Density Intervals

All the techniques mentioned above typically require the elicitation of informative priors. However, there may be Bayesians who are interested in model comparison with a non-informative prior. In such a case, there are other techniques which can be used. Since the most common one in regression analysis is the Highest Posterior Density Interval (HPDI), we will explain only this method and refer the interested reader to the citations below.

Before defining the idea of the HPDI, the concept of a credible set must be made clear. Let us assume that $\omega$ is the region over which the coefficients $\beta$ are defined. Then $C \subseteq \omega$ is a $100(1-\alpha)\%$ credible set with respect to $\beta$ if:

$$p(\beta \in C|y) = 1 - \alpha \quad (4.6)$$

Since there are commonly numerous credible intervals, it is usual to choose the one with the smallest area, namely the Highest Posterior Density Interval.

Formally, a $100(1-\alpha)\%$ highest posterior density interval for $\beta$ is a $100(1-\alpha)\%$ credible interval for $\beta$ with the property that it has a smaller area than any other $100(1-\alpha)\%$ credible interval for $\beta$.
This is the Bayesian analogue of confidence intervals within the frequentist framework, but now the meaning is more in line with common sense.

More information about all these methods and other variants of the Bayes Factor can be found in greater detail in [Aitk97], [Berg98], [Chen00], [Cong06] or [Koop03].

4.5 Model Comparison Summary

A model comparison summary can be found in Tables 4.2 and 4.3, where the mark symbols mean:

• * Good
• ** Better
• *** Still better
• **** Probably the best
• Bayes Factor. Formula: $B(y) = \frac{p(y|M_1)}{p(y|M_2)}$. Interpretation: $B(y) < 1/10$, strong evidence for $M_2$; $1/10 < B(y) < 1/3$, moderate evidence for $M_2$; $1/3 < B(y) < 1$, weak evidence for $M_2$; $1 < B(y) < 3$, weak evidence for $M_1$; $3 < B(y) < 10$, moderate evidence for $M_1$; $B(y) > 10$, strong evidence for $M_1$. Mark: *

• Likelihood Ratio. Formula: $\text{Ratio} = -2[\log p(\hat{\beta}_{Restricted}|y) - \log p(\hat{\beta}_{Full}|y)]$. Interpretation: Ratio $> \chi^2_p$, reject the restricted model; Ratio $< \chi^2_p$, do not reject the restricted model. Mark: *

• AIC. Formula: $AIC = -2\log p(\hat{\beta}|y) + 2p$. Interpretation: $AIC_1/AIC_2 < 1$, $M_1$ is better than $M_2$; $AIC_1/AIC_2 > 1$, $M_2$ is better than $M_1$. Mark: **

Table 4.2: Sensitivity Summary I
• BIC. Formula: $BIC = -2\log p(\hat{\beta}|y) + p\log(n)$. Interpretation: $BIC_1/BIC_2 < 1$, $M_1$ is better than $M_2$; $BIC_1/BIC_2 > 1$, $M_2$ is better than $M_1$. Mark: ***

• DIC. Formula: $DIC = -\frac{4}{L}\sum_{l=1}^{L}\log p(y|\beta^{(l)}) + 2\log p(y|\hat{\beta})$. Interpretation: $DIC_1/DIC_2 < 1$, $M_1$ is better than $M_2$; $DIC_1/DIC_2 > 1$, $M_2$ is better than $M_1$. Mark: ****

• HPDI. Formula: $p(\beta \in C|y) = 1 - \alpha$ with $C$ the region of smallest area. Interpretation: there is a probability of $100(1-\alpha)\%$ of $\beta$ being in the region $C$. Mark: ****

Table 4.3: Sensitivity Summary II
Chapter 5

Regression Analysis

5.1 Introduction

Regression analysis is a statistical tool for the investigation of relationships between variables: it models the relationship between one or more random variables $y$, called the response variables, and one or more independent variables $x$, called the predictors. That is, it allows us to examine the conditional distribution of $y$ given $x$, denoted by $p(y|\beta, x)$, when the $n$ observations $(x_i, y_i)$ are exchangeable.

Applications of regression analysis exist in almost every field. In economics, the dependent variable might be the Ibex 35 index and the independent variables the Dow Jones and FTSE 100 indexes. In political science, the dependent variable might be a state's level of welfare spending, and the independent variables measures of public opinion and institutional variables that would cause the state to have higher or lower levels of welfare spending. In sociology, the dependent variable might be a measure of the social status of various occupations, and the independent variables characteristics of the occupations (pay, qualifications, etc.). In psychology, the dependent variable might be an individual's racial tolerance as measured on a standard scale, with indicators of social background as independent variables. In education, the dependent variable might be a student's score on an achievement test, and the independent variables characteristics of the student's family, teachers, or school.

Before explaining Bayesian regression, the classical regression model will be reviewed, focusing on those parts useful for the former.
5.2 Classical Regression Model

The simplest version of this model is the Normal linear model, where the variable $y$ given $X$ follows a Normal distribution whose mean is a linear function of $X$:

$$E(y_i|\beta, X) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \quad \text{for all } i = 1, \ldots, n. \quad (5.1)$$

Even though the mean of $y$ is a linear function of $X$, the observed data do not fit it exactly, and this is due to a random error, namely $\epsilon$; so the appropriate way to reach a probabilistic linear model is through

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i \quad \text{for all } i = 1, \ldots, n, \quad (5.2)$$

where $\epsilon_i$ is the random error term, which has a Normal distribution with mean 0 and variance $\sigma^2$. Since the random variable $y_i$ is the sum of a constant (the mean) and a normally distributed random variable, $y_i$ follows a Normal distribution:

$$y_i \sim N(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \sigma^2) \quad \text{for all } i = 1, \ldots, n \quad (5.3)$$

When the variance of $y$ given $X, \beta$ is assumed to be constant over all observations, the model is called the ordinary linear regression model.

In matrix notation, the Normal linear model can be denoted by

$$Y = X\beta + \epsilon \quad (5.4)$$

and

$$Y \sim N(X\beta, \sigma^2 I) \quad (5.5)$$

where

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \ldots & x_{1p} \\ 1 & x_{21} & \ldots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \ldots & x_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

and $I$ is the identity matrix.

It can be shown that the ordinary least squares estimate of $\beta$, namely $\hat{\beta}$, is
$$\hat{\beta} = (X'X)^{-1}X'Y = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_p \end{pmatrix} \quad (5.6)$$

where

$$X'X = \begin{pmatrix} n & \sum_{i=1}^{n} x_{i1} & \ldots & \sum_{i=1}^{n} x_{ik} \\ \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \ldots & \sum_{i=1}^{n} x_{i1}x_{ik} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik}x_{i1} & \ldots & \sum_{i=1}^{n} x_{ik}^2 \end{pmatrix}, \quad X'Y = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik}y_i \end{pmatrix}$$

As well, it can be shown that

$$E(\hat{\beta}) = \beta \quad (5.7)$$

Furthermore, the variances of $\hat{\beta}$ are proportional to the elements of the matrix $(X'X)^{-1}$, denoted by $C$, which multiplied by the constant $\sigma^2$ gives the covariance matrix. The elements of the diagonal of that matrix are the variances:

$$Var(\hat{\beta}_j) = \sigma^2 C_{jj} \quad \text{for all } j = 0, 1, \ldots, p, \quad (5.8)$$

where $C = (X'X)^{-1}$.

Likewise, the classical estimate of $\sigma^2$ is given in terms of the sum of squared errors, $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, and equals the mean squared error:

$$\hat{\sigma}^2 = MSE = \frac{SSE}{n-p} = \frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n-p} = \frac{Y'Y - \hat{\beta}'X'Y}{n-p} \quad (5.9)$$

where $n$ is the number of observations and $p$ corresponds to the number of parameters $\beta$.

Regarding the individual regression coefficients $\beta$, it will sometimes be interesting to carry out hypothesis tests about them in order to evaluate the potential value of each regressor variable of the model. The statistic to use in these cases is

$$T_0 = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} \quad (5.10)$$
where $C_{jj}$ is the diagonal element of the matrix $(X'X)^{-1}$ corresponding to $\hat{\beta}_j$. The null hypothesis will be rejected if $|T_0| > t_{n-p, \alpha/2}$.

Finally, once the model has been estimated and validated, one of its most important applications consists of making new predictions about the response variable $Y$ when a new explanatory variable $X^*$ is observed. In this case, a point estimate would be

$$\hat{Y}^* = X^{*'}\hat{\beta} \quad (5.11)$$

and a confidence interval for this future observation will be

$$\hat{Y}^* \pm t_{n-p, \alpha/2}\sqrt{\hat{\sigma}^2(1 + X^{*'}(X'X)^{-1}X^*)} \quad (5.12)$$

where

$$X^* = [x_1^* \ x_2^* \ \ldots \ x_k^*]' \quad (5.13)$$

These results can be found in more detail in [Mont02], [Zamo01] or [Mate95].

To understand better all that has been said above, let us see a practical application to the stock markets. Suppose we are interested in investigating the relationship between the Ibex 35 index and the Dow Jones, FTSE 100 and DAX indexes on the previous day. For that purpose, we have the points (taken as the mean of the daily maximum and minimum points) from January to October 2006, that is, the first ten months of 2006. The model to fit is:

$$IBEX35_t = \beta_1 DowJones_{t-1} + \beta_2 FTSE100_{t-1} + \beta_3 DAX_{t-1} + \epsilon_t$$

where $\epsilon_t \sim N(0, \sigma^2)$.

The estimates $\hat{\beta}$, calculated according to what was said before, are:

$$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{pmatrix} = \begin{pmatrix} 1.0147 \\ -2.0085 \\ 2.1082 \end{pmatrix}$$
The estimate of the variance $\sigma^2$ is $\hat{\sigma}^2 = 332.18^2$. So the fitted model is:

$$IBEX35_t = 1.0147 \times DowJones_{t-1} - 2.0085 \times FTSE100_{t-1} + 2.1082 \times DAX_{t-1} + \epsilon_t$$

where $\epsilon_t \sim N(0, 332.18^2)$.

This indicates that when the Dow Jones or the DAX goes up, the Ibex 35 will increase the next day too. However, when the FTSE 100 rises, the Ibex 35 will decrease the next day.

If we use this model to predict the value of the Ibex 35 on November 1st, when the previous day's Dow Jones, FTSE 100 and DAX values are known, we have:

$$IBEX35_t = 1.0147 \times 12067 - 2.0085 \times 6155 + 2.1082 \times 6287 = 13137$$

Finally, a comparison between the multiple and the simple Normal linear regression models is shown in Table 5.1, indicating the different parameters to use in each case. The goal of this comparison is to make clear that simple Normal regression is a particular case of multiple Normal regression in which there is only one regressor variable or predictor.

Function:       multiple $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$;  simple $y = \beta_0 + \beta_1 x + \epsilon$
Mean:           multiple $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$;  simple $\mu = \beta_0 + \beta_1 x$
Variance:       $\sigma^2$ in both cases
Model:          multiple $Y \sim N(\mu, \sigma^2 I)$;  simple $Y \sim N(\mu, \sigma^2)$
$\hat{\beta}$:  multiple $\hat{\beta} = (X'X)^{-1}X'Y$;  simple $\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$
$E[\hat{\beta}]$: $\beta$ in both cases
$Var(\hat{\beta})$: multiple $Var(\hat{\beta}_j) = \sigma^2 C_{jj}$;  simple $Var(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)$, $Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$
$\hat{\sigma}^2$: multiple $\hat{\sigma}^2 = \frac{Y'Y - \hat{\beta}'X'Y}{n-p}$;  simple $\hat{\sigma}^2 = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2}$
Prediction:     multiple $\hat{Y}_f \pm t_{n-p,\alpha/2}\sqrt{\hat{\sigma}^2(1 + X_f'(X'X)^{-1}X_f)}$;  simple $\hat{Y}_f \pm t_{n-2,\alpha/2}\sqrt{\hat{\sigma}^2\left(1 + \frac{1}{n} + \frac{(x_f - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)}$
Limitation:     only applicable to data in the same range as the sampled data, in both cases

Table 5.1: Multiple and Simple Regression Comparison
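The classical fit above can be reproduced with a few lines of Python. Since the original data set is not reproduced here, the sketch below generates stand-in series of roughly similar magnitude (all numbers in the simulation are assumptions of ours):

    import numpy as np

    # Stand-in for the daily mid-prices of Jan-Oct 2006; row t of X holds
    # the previous day's foreign index values, as in the model above.
    rng = np.random.default_rng(4)
    dow = rng.normal(11500, 300, 200)
    ftse = rng.normal(5900, 150, 200)
    dax = rng.normal(5700, 200, 200)
    ibex = 1.0 * dow - 2.0 * ftse + 2.1 * dax + rng.normal(0, 300, 200)

    X = np.column_stack([dow, ftse, dax])[:-1]    # regressors at t-1
    y = ibex[1:]                                  # response at t

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # (X'X)^{-1} X'Y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (len(y) - X.shape[1])   # MSE, equation (5.9)

    x_new = np.array([12067.0, 6155.0, 6287.0])          # October 31st values
    print(beta_hat, np.sqrt(sigma2_hat))
    print(x_new @ beta_hat)                              # point prediction (5.11)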
5.3 The Bayesian Approach

The main difference between the classical and Bayesian approaches to regression analysis is that the latter treats the parameters as random variables which have a distribution. The aim of the Bayesian approach is to make inferences through the posterior distribution, based on a prior distribution for the parameters $\beta$ and $\sigma^2$ of the Normal linear model, and to provide a predictive distribution for the model's predictions.

As was said in the preceding section, and according to [Rossi06], the Normal linear regression model is given by:

$$Y = X\beta + \epsilon \quad (5.14)$$

where
$$\epsilon \sim N(0, \sigma^2 I) \quad (5.15)$$

So

$$Y|X, \beta, \sigma^2 \sim N(X\beta, \sigma^2 I) \quad (5.16)$$

For simplicity of notation, we will not explicitly include $X$ in the conditioning set for the regression model. Using the definition of the multivariate Normal density, the likelihood function is obtained:

$$p(Y|\beta, \sigma^2) = \frac{(\sigma^2)^{-n/2}}{(2\pi)^{n/2}} \exp\left(\frac{-1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right) \quad (5.17)$$

It will be convenient to write

$$(Y - X\beta)'(Y - X\beta) \quad (5.18)$$

in terms of the ordinary least squares estimators

$$v = n - p \quad (5.19)$$

$$\hat{\beta} = (X'X)^{-1}X'Y \quad (5.20)$$

$$s^2 = \frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n - p} \quad (5.21)$$

So

$$(Y - X\beta)'(Y - X\beta) = vs^2 + (\beta - \hat{\beta})'X'X(\beta - \hat{\beta}) \quad (5.22)$$

Then

$$p(Y|\beta, \sigma^2) = \frac{1}{(2\pi)^{n/2}}\left[(\sigma^2)^{-p/2} \exp\left(\frac{-1}{2\sigma^2}(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta})\right)\right]\left[(\sigma^2)^{-v/2} \exp\left(\frac{-vs^2}{2\sigma^2}\right)\right] \quad (5.23)$$

As said before, $n$ corresponds to the number of observations and $p$ refers to the number of parameters $\beta$. This new form of expressing the likelihood function will be more useful for finding a natural conjugate prior distribution, which will have the same form as the likelihood.
The prior distribution for $\beta$ and $\sigma^2$, denoted by $p(\beta, \sigma^2)$, can be written in a more convenient way by applying the definition of the joint distribution:

$$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2) \quad (5.24)$$

Note that $\beta$ and $\sigma^2$ are assumed to be dependent here, an assumption that will rarely hold in practice. Some authors prefer to work with the error precision, $1/\sigma^2$, instead of the variance $\sigma^2$.

All this is very similar to what was explained in the Bayesian analysis of the Normal distribution. The term in the first bracket of the likelihood function suggests the form of a Normal distribution for the parameter $\beta$ given $\sigma^2$. So

$$p(\beta|\sigma^2) \propto (\sigma^2)^{-p/2} \exp\left(\frac{-1}{2\sigma^2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right) \quad (5.25)$$

and, hence,

$$\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0) \quad (5.26)$$

According to [Rossi06], the term in the second bracket of the likelihood function suggests the form of an inverse gamma distribution for the parameter $\sigma^2$ (see Appendix A). So

$$p(\sigma^2) \propto (\sigma^2)^{-(\frac{v_0}{2}+1)} \exp\left(\frac{-v_0 s_0^2}{2\sigma^2}\right) \quad (5.27)$$

and, hence,

$$\sigma^2 \sim \text{Inv-G}\left(\frac{v_0}{2}, \frac{v_0 s_0^2}{2}\right) \quad (5.28)$$

Note that there is an extra term $(\sigma^2)^{-1}$ here which is not suggested by the form of the likelihood explained above. This term can be rationalized by viewing the conjugate prior as arising from the posterior of a sample of size $v_0$ with sufficient statistics $s_0^2, \beta_0$, formed with the noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$, which will be briefly explained later.

So the natural conjugate prior distribution of the parameters $\beta$ and $\sigma^2$ is:

$$p(\beta, \sigma^2) \propto (\sigma^2)^{-(\frac{p+v_0}{2}+1)} \exp\left(\frac{-1}{2\sigma^2}\left[v_0 s_0^2 + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right) \quad (5.29)$$

and, hence,
$$\beta, \sigma^2 \sim N\text{-Inv-}\chi^2(\beta_0, V_0 s_0^2; v_0, s_0^2) \quad (5.30)$$

where the prior hyper-parameters $\beta_0$, $V_0$, $v_0$ and $s_0^2$ express the knowledge that the researcher has about the problem and her or his confidence in it. Furthermore, the parameter $\beta_0$ measures the marginal effect of the explanatory variable on the dependent variable. As well, $V_0$ indicates the uncertainty about the prior information and plays the same role as $(X'X)^{-1}$ does in the classical approach; $v_0$ represents a fictitious data set, so it plays a role similar to $n$, and $s_0^2$ is an imaginary $s^2$ for those fictitious data. In terms of the distribution, $\beta_0$ and $V_0\sigma^2$ represent the location and scale of $\beta$, respectively, and $v_0$ and $s_0^2$ the degrees of freedom and scale of $\sigma^2$, respectively.

Since a conjugate prior distribution has been used, the posterior distribution will have the same form. That is, the posterior distribution will be a Normal-Scaled Inverse $\chi^2$ with posterior hyper-parameters $\beta_1$, $V_1$, $v_1$ and $s_1^2$. According to [Rossi06] and [Koop03], it can be shown that

$$\beta, \sigma^2|y \sim N\text{-Inv-}\chi^2(\beta_1, V_1 s_1^2; v_1, s_1^2) \quad (5.31)$$

The relation between the prior and posterior hyper-parameters, according to [Koop03], is:

$$V_1 = (V_0^{-1} + X'X)^{-1} \quad (5.32)$$

$$\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta}) \quad (5.33)$$

$$v_1 = v_0 + n \quad (5.34)$$

$$v_1 s_1^2 = v_0 s_0^2 + vs^2 + (\hat{\beta} - \beta_0)'[V_0 + (X'X)^{-1}]^{-1}(\hat{\beta} - \beta_0) \quad (5.35)$$
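Equations (5.32)-(5.35) translate directly into code; a minimal Python sketch of ours that returns the posterior hyper-parameters given the data and the prior:

    import numpy as np

    def conjugate_posterior(X, y, beta0, V0, nu0, s0_sq):
        """Posterior hyper-parameters of the natural conjugate Normal
        linear model, equations (5.32)-(5.35)."""
        n, p = X.shape
        XtX = X.T @ X
        beta_hat = np.linalg.solve(XtX, X.T @ y)     # OLS estimate (5.20)
        v = n - p
        s_sq = (y - X @ beta_hat) @ (y - X @ beta_hat) / v

        V1 = np.linalg.inv(np.linalg.inv(V0) + XtX)
        beta1 = V1 @ (np.linalg.solve(V0, beta0) + XtX @ beta_hat)
        nu1 = nu0 + n
        d = beta_hat - beta0
        s1_sq = (nu0 * s0_sq + v * s_sq
                 + d @ np.linalg.solve(V0 + np.linalg.inv(XtX), d)) / nu1
        return beta1, V1, nu1, s1_sq

    # Illustrative use with simulated data and a weakly informative prior
    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.0, -0.5]) + rng.normal(0, 0.3, 100)
    print(conjugate_posterior(X, y, np.zeros(2), 10.0 * np.eye(2), 1.0, 1.0))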
As mentioned in the Bayesian Data Analysis chapter, a measure is needed to summarize the posterior distribution, and this is usually the posterior mean, namely $E(\beta|y)$. According to what was said in previous chapters, the marginal for $\beta$ will be a multivariate t-distribution (see Appendix A):

$$\beta|y \sim t_{v_1}(\beta_1, s_1^2 V_1) \quad (5.36)$$

where

$$E(\beta|y) = \beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta}) \quad (5.37)$$

and

$$Var(\beta|y) = \frac{v_1 s_1^2}{v_1 - 2}V_1 \quad (5.38)$$

So the posterior mean is a weighted average of the ordinary least squares estimate, $\hat{\beta}$, and the prior mean, $\beta_0$, where the weights are proportional to the observed data, $X'X$, and to the importance given to the prior, $V_0^{-1}$, respectively. This should make clear that as the prior variance for $\beta$ is decreased, greater posterior weight is placed on prior beliefs relative to the data, so the posterior mean will be closer to the prior mean.

The elements of the diagonal of the matrix $\frac{v_1 s_1^2}{v_1-2}V_1$ are the variances of $\beta_0, \beta_1, \ldots, \beta_p$:

$$Var(\beta_j|y) = \frac{v_1 s_1^2}{v_1 - 2}V_{1jj} \quad \text{for all } j = 0, 1, \ldots, p \quad (5.39)$$

Likewise, the marginal posterior for $\sigma^2$ is:

$$\sigma^2|y \sim \text{Inv-}\chi^2(v_1, s_1^2) \quad (5.40)$$

and, hence,

$$E(\sigma^2|y) = \frac{v_1 s_1^2}{v_1 - 2} \quad (5.41)$$

$$Var(\sigma^2|y) = \frac{2v_1^2 s_1^4}{(v_1 - 2)^2(v_1 - 4)} \quad (5.42)$$

So, as we increase the number of fictitious data $v_0$, $v_1$ tends towards $v_0$ and, hence, $s_1^2$ gets closer to $s_0^2$.

Tables 5.2 and 5.3 show how the different posterior parameters of interest vary depending on the prior parameters $V_0$ (considering $V_0$ as $cI_k$) and $v_0$ and the sample size $n$.

Table 5.2 means that if the size of the sample increases towards infinity, then the prior information that the researcher gives has very little or almost no importance, as also occurs if the precision of the prior distribution for $\beta$ decreases (that is, $V_0$ increases) towards 0. The difference between the two cases is that in the former the variance of $\beta$ is lower than in the latter.

The number of fictitious data does not seem to affect the posterior mean, but it does affect the posterior variance, which increases (resp. decreases) as the fictitious data increase (resp. decrease).
Action              E[β|y]                     Var[β|y]
n    Increase       Closer to OLS estimates    Closer to 0
     Decrease       Closer to β0               Further from 0
V0   Increase       Closer to OLS estimates    Further from 0
     Decrease       Closer to β0               Closer to 0
ν0   Increase       Not affected               Increases
     Decrease       Not affected               Decreases

Table 5.2: Sensitivity analysis of parameter β

Table 5.3 refers to the parameter $\sigma^2$, and it means that if the fictitious data increase, then the information given by the researcher will have much more weight in the posterior mean of $\sigma^2$ than the real data have, and the variance will be lower too. The other way round occurs when the number of real data increases: the data information will then carry the most important weight and the prior information will have hardly any value. Another interesting result is that, as the precision of the prior distribution for $\beta$ decreases (that is, $V_0$ increases), the posterior mean of $\sigma^2$ will approximate the number of real data times the ordinary least squares estimate.

On a different issue, the fact that the natural conjugate prior implies that prior information enters in the same manner as data information helps with prior elicitation. When several priors can be applied to the same problem, two strategies can be adopted to overcome possible criticisms. First, a prior sensitivity analysis can be carried out to demonstrate that the results are the same with the different priors chosen. But if the results are sensitive to the choice of prior, the Bayesian approach allows for the scientifically honest reporting of such a state of affairs. There has been work done on extreme bounds analysis for quantities such as the posterior mean of a parameter. [Poir95] provides a detailed
Action              E[σ²|y]                    Var[σ²|y]
n    Increase       Closer to OLS estimates    Closer to 0
     Decrease       Closer to s0²              Further from 0
V0   Increase       Closer to vs²              Closer to 0
     Decrease       Closer to OLS estimates    Further from 0
ν0   Increase       Closer to s0²              Closer to 0
     Decrease       Closer to OLS estimates    Further from 0

Table 5.3: Sensitivity analysis of parameter σ²

discussion of this issue. A second strategy is to use a non-informative prior to let the data speak loudly and be predominant over the prior information. For example, let us set $v_0 = 0$ and $V_0^{-1} = 0$. Then

$$\beta, \sigma^2|y \sim N\text{-Inv-}\chi^2(\beta_1, V_1 s_1^2; v_1, s_1^2) \quad (5.43)$$

where

$$V_1 = (X'X)^{-1} \quad (5.44)$$

$$\beta_1 = \hat{\beta} \quad (5.45)$$

$$v_1 = n \quad (5.46)$$

$$v_1 s_1^2 = vs^2 \quad (5.47)$$

With this non-informative prior, all of these formulae involve only data information and equal the ordinary least squares results. Bayesians often write this prior as:
$$p(\beta, \sigma^2) \propto \sigma^{-2} \quad (5.48)$$

Finally, one of the goals of the Bayesian approach is to provide a predictive model to predict an unobserved data point generated from the same model as the data set with $n$ observations ($\epsilon^* \sim N(0, \sigma^2)$ with the same $\beta$). This is denoted by:

$$Y^* = X^*\beta + \epsilon^* \quad (5.49)$$

where $Y^*$ is not observed and $\epsilon^*$ is independent of $\epsilon$. Bayesian prediction is based on calculating

$$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)\,p(\beta, \sigma^2|y)\,d\beta\,d\sigma^2 \quad (5.50)$$

The key to obtaining the prediction is to find out the form of $p(y^*|y, \beta, \sigma^2)$, since the posterior $p(\beta, \sigma^2|y)$ has already been calculated, and to check whether $p(y^*|y)$ is easy to integrate or, on the contrary, a posterior simulator has to be employed.

Since $\epsilon^*$ is independent of $\epsilon$, $Y^*$ is independent of $Y$, and $p(y^*|y, \beta, \sigma^2)$ can be written as $p(y^*|\beta, \sigma^2)$, which is a multivariate Normal, as seen before:

$$p(y^*|\beta, \sigma^2) = \frac{(\sigma^2)^{-T/2}}{(2\pi)^{T/2}} \exp\left(-\frac{1}{2\sigma^2}(y^* - X^*\beta)'(y^* - X^*\beta)\right) \quad (5.51)$$

Multiplying this by the posterior obtained previously and integrating yields a multivariate t:

$$y^*|y \sim t_{v_1}(X^*\beta_1, s_1^2(I_T + X^*V_1X^{*'})) \quad (5.52)$$

where $T$ is the number of observed $X^*$. It is easy to see that:

$$E(y^*|y) = X^*\beta_1, \qquad Var(y^*|y) = \frac{v_1 s_1^2}{v_1 - 2}(I_T + X^*V_1X^{*'}) \quad (5.53)$$
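When simulation is preferred to the analytical result (5.52), the predictive distribution can be sampled by composition; a minimal Python sketch of ours:

    import numpy as np

    def predictive_draws(X_star, beta1, V1, nu1, s1_sq, n_draws=5000, seed=0):
        """Simulate the predictive distribution (5.52) by composition:
        sigma2 ~ Inv-chi2(nu1, s1^2), beta|sigma2 ~ N(beta1, sigma2*V1),
        y*|beta,sigma2 ~ N(X*beta, sigma2*I)."""
        rng = np.random.default_rng(seed)
        T = X_star.shape[0]
        out = np.empty((n_draws, T))
        L = np.linalg.cholesky(V1)
        for s in range(n_draws):
            sigma2 = nu1 * s1_sq / rng.chisquare(nu1)
            beta = beta1 + np.sqrt(sigma2) * (L @ rng.standard_normal(len(beta1)))
            out[s] = X_star @ beta + np.sqrt(sigma2) * rng.standard_normal(T)
        return out   # each column: predictive draws for one row of X_star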
A brief summary comparing the classical and Bayesian approaches is displayed in Table 5.4 to note the coincidences and differences between them.

Classical Regression                                              Bayesian Regression
$\hat{\beta} = (X'X)^{-1}X'Y$                                     $\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$
$\hat{\sigma}^2 = \frac{Y'Y - \hat{\beta}'X'Y}{n-p}$              $s_1^2 = \frac{\nu_0 s_0^2 + \nu s^2 + (\hat{\beta}-\beta_0)'[V_0 + (X'X)^{-1}]^{-1}(\hat{\beta}-\beta_0)}{\nu_1}$
$E[\hat{\beta}] = \beta$                                          $E[\beta|y] = \beta_1$
$Var(\hat{\beta}_j) = \sigma^2 C_{jj}$                            $Var(\beta_j|y) = \frac{\nu_1 s_1^2}{\nu_1-2}V_{1jj}$
$Y^*|y \sim t_{n-p}(X^*\hat{\beta}, \hat{\sigma}^2(I_T + X^*(X'X)^{-1}X^{*'}))$   $Y^*|y \sim t_{\nu_1}(X^*\beta_1, s_1^2(I_T + X^*V_1X^{*'}))$

Table 5.4: Classical and Bayesian regression comparison

A very interesting and more exhaustive comparison between these two approaches can be read in the article by [Urba92], where the author explains the advantages and disadvantages of using each of them.

5.4 Normal Linear Regression Model subject to inequality constraints

In this section, suppose we want to impose inequality constraints on the coefficients of the Normal linear regression model, such as $\beta \in A$, where $A$ is the region of all valid values of the coefficients. This is quite simple in Bayesian regression, since the constraints are imposed through the prior distribution:

$$p(\beta, \sigma^2) \sim N\text{-Inv-}\chi^2(\beta_0, V_0 s_0^2; v_0, s_0^2)\,1(\beta \in A) \quad (5.54)$$

where $\beta_0$, $V_0$, $v_0$ and $s_0^2$ are prior hyper-parameters to be chosen and $1(\beta \in A)$ is the indicator function, which equals 1 if $\beta \in A$ and 0 otherwise.

Likewise, the posterior distribution for $\beta$ is now:
$$p(\beta|y) \propto t_{v_1}(\beta_1, s_1^2 V_1)\,1(\beta \in A) \quad (5.55)$$

where $\beta_1$, $V_1$, $v_1$ and $s_1^2$ were defined previously.

So the only difference when introducing inequality constraints is that we must now add the indicator function.

This may seem very easy, but for a general choice of $A$ neither analytical posterior results nor Gibbs sampling work. The most suitable method is importance sampling, which has already been explained. In this case, according to [Koop03], the importance function is:

$$q(\beta) = t_{v_1}(\beta_1, s_1^2 V_1) \quad (5.56)$$

The strategy consists of obtaining draws $y^{*(s)}$ from $p(y^*|\beta^{(s)}, \sigma^{2(s)})$ using the draws $\beta^{(s)}$ and $\sigma^{2(s)}$ which were obtained from the posterior distribution. Then, using these draws $y^{*(s)}$ in the importance sampling, the mean and the variance can be calculated.

Another, simpler way consists of ignoring the constraints until the end of the simulation and then discarding those draws which violate the restrictions. According to [Gelm04], this works reasonably well if the constraints do not eliminate a large portion of the draws.

5.5 Normal Linear Regression Model with Independent Parameters

Now suppose that the parameters $\beta$ and $\sigma^2$ are independent, so

$$p(\beta, \sigma^2) = p(\beta)p(\sigma^2) \quad (5.57)$$

With the same likelihood function as that used in the previous section, this assumption implies that $\beta$ follows a multivariate Normal distribution with mean $\beta_0$, as occurred with $\beta$ and $\sigma^2$ dependent, but with variance $V_0$, and $\sigma^2$ has exactly the same Scaled-Inv-$\chi^2$ distribution used previously. That is:

$$\beta \sim N(\beta_0, V_0), \qquad \sigma^2 \sim \text{Inv-}\chi^2(v_0, s_0^2) \quad (5.58)$$

The prior joint distribution is
$$p(\beta, \sigma^2) \propto \exp\left(-\frac{1}{2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right)(\sigma^2)^{-(\frac{v_0}{2}+1)} \exp\left(-\frac{v_0 s_0^2}{2\sigma^2}\right) \quad (5.59)$$

$$\beta, \sigma^2 \sim N\text{-Inv-}\chi^2(\beta_0, V_0; v_0, s_0^2) \quad (5.60)$$

As the posterior joint distribution is proportional to the prior times the likelihood:

$$p(\beta, \sigma^2|Y) \propto \exp\left(-\frac{1}{2}\left[\frac{(Y - X\beta)'(Y - X\beta)}{\sigma^2} + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)(\sigma^2)^{-(\frac{n+v_0}{2}+1)} \exp\left(-\frac{v_0 s_0^2}{2\sigma^2}\right) \quad (5.61)$$

Since this function does not take the form of any well-known density, it is useful to find the conditional distributions for $\beta$, $p(\beta|Y, \sigma^2)$, and for $\sigma^2$, $p(\sigma^2|Y, \beta)$, because with them any information from $p(\beta, \sigma^2|Y)$ can be obtained through posterior simulation with the Gibbs sampler already explained in previous chapters.

According to [Koop03], it can be shown that those conditional distributions are:

$$p(\beta|Y, \sigma^2) \propto \exp\left(-\frac{1}{2}(\beta - \beta_1)'V_1^{-1}(\beta - \beta_1)\right) \quad (5.62)$$

$$p(\sigma^2|Y, \beta) \propto (\sigma^2)^{-(\frac{n+v_0}{2}+1)} \exp\left(-\frac{1}{2\sigma^2}\left[(Y - X\beta)'(Y - X\beta) + v_0 s_0^2\right]\right) \quad (5.63)$$

And this all yields:

$$\beta|y, \sigma^2 \sim N(\beta_1, V_1) \quad (5.64)$$

$$\sigma^2|y, \beta \sim \text{Inv-}\chi^2(v_1, s_1^2) \quad (5.65)$$

where

$$V_1 = \left(V_0^{-1} + \frac{1}{\sigma^2}X'X\right)^{-1} \quad (5.66)$$

$$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \frac{1}{\sigma^2}X'Y\right) \quad (5.67)$$

$$v_1 = n + v_0 \quad (5.68)$$

$$s_1^2 = \frac{(Y - X\beta)'(Y - X\beta) + v_0 s_0^2}{v_1} \quad (5.69)$$
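The conditionals (5.64)-(5.69) lead directly to the Gibbs sampler of Chapter 3; a minimal Python sketch of ours:

    import numpy as np

    def gibbs_independent(X, y, beta0, V0, nu0, s0_sq,
                          iters=5000, burn=1000, seed=0):
        """Gibbs sampler for the Normal linear model with independent
        N(beta0, V0) and Inv-chi2(nu0, s0^2) priors, equations (5.64)-(5.69)."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        XtX, Xty = X.T @ X, X.T @ y
        V0_inv = np.linalg.inv(V0)
        beta, sigma2 = np.zeros(p), y.var()
        keep_b, keep_s = [], []
        for it in range(iters):
            # beta | y, sigma2 ~ N(beta1, V1), equations (5.66)-(5.67)
            V1 = np.linalg.inv(V0_inv + XtX / sigma2)
            beta1 = V1 @ (V0_inv @ beta0 + Xty / sigma2)
            beta = rng.multivariate_normal(beta1, V1)
            # sigma2 | y, beta ~ Inv-chi2(nu1, s1^2), equations (5.68)-(5.69)
            nu1 = nu0 + n
            resid = y - X @ beta
            s1_sq = (nu0 * s0_sq + resid @ resid) / nu1
            sigma2 = nu1 * s1_sq / rng.chisquare(nu1)
            if it >= burn:
                keep_b.append(beta)
                keep_s.append(sigma2)
        return np.array(keep_b), np.array(keep_s)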
The fact that the posterior distribution has an unknown form also affects the prediction for $y^*$, $p(y^*|y)$. As already said for the posterior predictive in the Bayesian Approach section, the interest is in $p(y^*|y, \beta, \sigma^2)$. Since $y$ and $y^*$ are independent of one another,

$$p(y^*|y, \beta, \sigma^2) = p(y^*|\beta, \sigma^2) \quad (5.70)$$

And hence

$$p(y^*|\beta, \sigma^2) = \frac{(\sigma^2)^{-T/2}}{(2\pi)^{T/2}} \exp\left(-\frac{1}{2\sigma^2}(y^* - X^*\beta)'(y^* - X^*\beta)\right) \quad (5.71)$$

As the analytical solution of the integral of this expression is not trivial, the importance of the Gibbs sampler arises again and, combining it with Monte Carlo integration, any posterior and predictive inference can be done. The strategy consists of obtaining draws $y^{*(s)}$ from $p(y^*|\beta^{(s)}, \sigma^{2(s)})$ using the draws $\beta^{(s)}, \sigma^{2(s)}$ which were obtained from the posterior distribution. Then, using these draws $y^{*(s)}$ in the Monte Carlo integration, the mean and the variance can be calculated.

5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation

Until now the variances have been supposed equal and without correlation, but this is not very realistic. In this section we are going to relax that assumption and consider the next model:

$$Y = X\beta + \epsilon \quad (5.72)$$

where

$$\epsilon \sim N(0, \Sigma) \quad (5.73)$$

That is, we are considering heteroscedasticity and correlation. According to [Koop03], since $\Sigma$ is a positive definite matrix, a matrix $P$ can be found that satisfies $P\Sigma P' = I$, and it can be shown that

$$Y^* = X^*\beta + \epsilon^* \quad (5.74)$$

where

$$\epsilon^* \sim N(0, \sigma^2 I) \quad (5.75)$$
and

$$Y^* = PY \quad (5.76)$$

$$X^* = PX \quad (5.77)$$

$$\epsilon^* = P\epsilon \quad (5.78)$$

Then, the likelihood function to consider now is:

$$p(Y|\beta, \sigma^2, \Sigma) = \frac{1}{(2\pi)^{n/2}}\left[(\sigma^2)^{-p/2} \exp\left(-\frac{1}{2\sigma^2}(\beta - \hat{\beta}_\Sigma)'X'\Sigma^{-1}X(\beta - \hat{\beta}_\Sigma)\right)\right]\left[(\sigma^2)^{-v/2} \exp\left(-\frac{vs_\Sigma^2}{2\sigma^2}\right)\right] \quad (5.79)$$

where:

$$v = n - p \quad (5.80)$$

$$\hat{\beta}_\Sigma = (X^{*'}X^*)^{-1}X^{*'}Y^* \quad (5.81)$$

$$s_\Sigma^2 = \frac{(Y^* - X^*\hat{\beta}_\Sigma)'(Y^* - X^*\hat{\beta}_\Sigma)}{v} \quad (5.82)$$

which is very similar to the likelihood used with equal variances.

Using the prior distributions described in the previous section, we have:

$$p(\beta, \sigma^2, \Sigma) = p(\beta)p(\sigma^2)p(\Sigma) \quad (5.83)$$

where $\beta$ is normally distributed with prior parameters $\beta_0$, $V_0$, and $\sigma^2$ is a scaled inverse chi-square with parameters $v_0$ and $s_0^2$.

Hence, knowing that the posterior distribution is proportional to the prior times the likelihood:

$$p(\beta, \sigma^2, \Sigma|Y) \propto p(\Sigma)\exp\left(-\frac{1}{2}\left[\frac{(Y^* - X^*\beta)'(Y^* - X^*\beta)}{\sigma^2} + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)(\sigma^2)^{-(\frac{n+v_0}{2}+1)} \exp\left(-\frac{v_0 s_0^2}{2\sigma^2}\right) \quad (5.84)$$
This suggests a Normal distribution for the posterior conditional of $\beta$ and a scaled inverse chi-square for the posterior conditional of $\sigma^2$, as occurred before. Therefore:

$$\beta|Y, \sigma^2, \Sigma \sim N(\beta_1, V_1) \quad (5.85)$$

$$\sigma^2|Y, \beta, \Sigma \sim \text{Inv-}\chi^2(v_1, s_1^2) \quad (5.86)$$

where

$$V_1 = \left(V_0^{-1} + \frac{X'\Sigma^{-1}X}{\sigma^2}\right)^{-1} \quad (5.87)$$

$$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \frac{X'\Sigma^{-1}X\hat{\beta}_\Sigma}{\sigma^2}\right) \quad (5.88)$$

$$v_1 = n + v_0 \quad (5.89)$$

$$s_1^2 = \frac{(Y - X\beta)'\Sigma^{-1}(Y - X\beta) + v_0 s_0^2}{v_1} \quad (5.90)$$

According to [Koop03], the posterior conditional for $\Sigma$ yields:

$$p(\Sigma|Y, \beta, \sigma^2) \propto p(\Sigma)|\Sigma|^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta)\right) \quad (5.91)$$

So we have come to the point where the form that $\Sigma$ takes is crucial.

5.6.1 Heteroscedasticity

Suppose we suspect that there is no correlation among the errors, but that their variances are different. Hence, we will have $n$ variances $\omega_i$ for the $n$ errors $\epsilon_i$.

It could be that the researcher has an idea of the form of $\Sigma$ and assumes that

$$\omega_i = h(x_i, \alpha) = (1 + \alpha_1 x_{i1} + \cdots + \alpha_p x_{ip})^2 \quad (5.92)$$

That is, the variances are related to some or all of the independent variables. The researcher should choose a prior for $\alpha$, and then Bayesian inference can be carried out through a Metropolis-Hastings algorithm such as the random walk.

If the researcher knows that the error variances are different but has no idea of their form, then a prior for $\Sigma$ has to be chosen. According to [Koop03]:
$$p(\Sigma) = \prod_{i=1}^{n} p(\omega_i) \quad (5.93)$$

where

$$\omega_i \sim \text{Inv-}\chi^2(v_\omega, 1) \quad (5.94)$$

But now a hyper-prior distribution should be fixed for $v_\omega$, such that

$$p(\Sigma) = p(\Sigma|v_\omega)p(v_\omega) \quad (5.95)$$

That is, we are using a hierarchical prior to treat the heteroscedasticity. According to [Gelm04], a Metropolis-Hastings algorithm can be used to draw posterior simulations.

5.6.2 Correlation

Now let us assume that there is some correlation among the errors through a time or space relationship, such that the error in one period depends on that of the previous periods. This is a type of regression called autoregressive, and it can be considered a time series. For example, if we are considering the relation between the Ibex 35 values one day and the previous ones, we could say that there is a correlation between the errors in the relation between Fridays and previous days and those in the relation between the values on Thursdays, Wednesdays or Tuesdays and the previous days. That is:

$$\epsilon_t = \rho_1 \epsilon_{t-1} + \rho_2 \epsilon_{t-2} + \cdots + \rho_p \epsilon_{t-p} + u_t \quad (5.96)$$

where

$$u_t \sim N(0, \sigma^2) \quad (5.97)$$

We will assume stationarity. This means, in a general way, that the probability distribution does not vary through time. Some time series do not seem to be stationary, but their differences do. The main difference to take into account is the first one: the first difference of $\epsilon_t$ indicates the variation in $\epsilon$ between period $t$ and periods $t-1, t-2, \ldots, t-p$.

According to [Koop03], the irregular component $u_t$ can be formulated in the following way:

$$\rho(L)\epsilon_t = u_t \quad (5.98)$$
where $L$ is called the lag operator, with the property that $L\epsilon_t = \epsilon_{t-1}$, and $\rho(L) = (1 - \rho_1 L - \cdots - \rho_p L^p)$. So, if we have the regression model

$$Y_t = X_t\beta + \epsilon_t \quad (5.99)$$

then it is possible to find a model such that

$$Y_t^* = X_t^*\beta + u_t, \quad u_t \sim N(0, \sigma^2) \quad (5.100)$$

where

$$Y_t^* = \rho(L)Y_t \quad (5.101)$$

$$X_t^* = \rho(L)X_t \quad (5.102)$$

Therefore, using an independent Normal scaled-inverse-chi-square prior for $\beta$ and $\sigma^2$, it yields:

$$\beta|Y, \sigma^2, \rho \sim N(\beta_1, V_1) \quad (5.103)$$

$$\sigma^2|Y, \beta, \rho \sim \text{Inv-}\chi^2(v_1, s_1^2) \quad (5.104)$$

where

$$V_1 = \left(V_0^{-1} + \frac{X^{*'}X^*}{\sigma^2}\right)^{-1} \quad (5.105)$$

$$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \frac{X^{*'}Y^*}{\sigma^2}\right) \quad (5.106)$$

$$v_1 = v_0 + T - p \quad (5.107)$$

$$s_1^2 = \frac{(Y^* - X^*\beta)'(Y^* - X^*\beta) + v_0 s_0^2}{v_1} \quad (5.108)$$

And now, as occurred with heteroscedasticity, a prior should be selected for $\rho$. Let us choose a multivariate Normal subject to the constraint $\rho \in \phi$, where $\phi$ is the stationary region. Then,

$$p(\rho) \sim N(\rho_0, V_{\rho_0})\,1(\rho \in \phi) \quad (5.109)$$

$$\rho|Y, \beta, \sigma^2 \sim N(\rho_1, V_{\rho_1})\,1(\rho \in \phi) \quad (5.110)$$
where $\rho_0$ and $V_{\rho_0}$ are the prior parameters which the researcher should establish, and $\rho_1$ and $V_{\rho_1}$ are the posterior parameters, with the following relation:

$$V_{\rho_1} = \left(V_{\rho_0}^{-1} + \frac{E'E}{\sigma^2}\right)^{-1} \quad (5.111)$$

$$\rho_1 = V_{\rho_1}\left(V_{\rho_0}^{-1}\rho_0 + \frac{E'\epsilon}{\sigma^2}\right) \quad (5.112)$$

where $E$ is a matrix containing the errors through time, from $t-1$ to $t-p$, and $\epsilon$ is the vector of current errors.

According to [Koop03], a Gibbs sampler can be used to draw posterior simulations.

5.7 Models Summary

Since the main models to be used in the subsequent application are the homoscedastic, non-autocorrelated ones, their main ideas are summarized in Tables 5.5, 5.6, 5.7 and 5.8.
• Case $p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$ (conjugate): $\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)$; $\sigma^2 \sim \text{Inv-}\chi^2(v_0, s_0^2)$; jointly $\beta, \sigma^2 \sim N\text{-Inv-}\chi^2(\beta_0, V_0 s_0^2; v_0, s_0^2)$.

• Case $p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$ with constraints: $\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)\,1(\beta \in A)$; $\sigma^2 \sim \text{Inv-}\chi^2(v_0, s_0^2)$; jointly $\beta, \sigma^2 \sim N\text{-Inv-}\chi^2(\beta_0, V_0 s_0^2; v_0, s_0^2)\,1(\beta \in A)$.

• Case $p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$ (independent): $\beta \sim N(\beta_0, V_0)$; $\sigma^2 \sim \text{Inv-}\chi^2(v_0, s_0^2)$; jointly $\beta, \sigma^2 \sim N\text{-Inv-}\chi^2(\beta_0, V_0; v_0, s_0^2)$.

• Case $p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$ with constraints: $\beta \sim N(\beta_0, V_0)\,1(\beta \in A)$; $\sigma^2 \sim \text{Inv-}\chi^2(v_0, s_0^2)$; jointly $\beta, \sigma^2 \sim N\text{-Inv-}\chi^2(\beta_0, V_0; v_0, s_0^2)\,1(\beta \in A)$.

Table 5.5: Main Prior Distributions Summary
Case 1 (conjugate):
    p(β, σ² | y) ∝ p(y | β, σ²) p(β | σ²) p(σ²)
    β, σ² | y ~ N-Inv-χ²(β_1, V_1 s_1²; v_1, s_1²)
    Key: obtain the marginal distributions, draw directly from them and summarize.

Case 2 (conjugate, constrained):
    p(β, σ² | y) ∝ p(y | β, σ²) p(β | σ²) p(σ²)
    β, σ² | y ~ N-Inv-χ²(β_1, V_1 s_1²; v_1, s_1²) 1(β ∈ A)
    Key: obtain the marginal distributions, draw directly from them, discard invalid draws and summarize.

Case 3 (independent):
    p(β, σ² | y) ∝ exp{−½ [(Y − Xβ)'(Y − Xβ)/σ² + (β − β_0)' V_0^{−1} (β − β_0)]}
                  × (σ²)^{−((n + v_0)/2 + 1)} exp{−v_0 s_0²/(2σ²)}
    Key: obtain the conditional distributions, draw with the Gibbs sampler and summarize.

Case 4 (independent, constrained):
    As in Case 3, multiplied by 1(β ∈ A).
    Key: obtain the conditional distributions, draw with the Gibbs sampler, discard invalid draws and summarize.

Table 5.6: Main Posterior Distributions Summary
Conjugate case, p(β, σ² | y) ∝ p(y | β, σ²) p(β | σ²) p(σ²):

    β_0 → β_1 = V_1 (V_0^{−1} β_0 + X'X β̂)
    V_0 → V_1 = (V_0^{−1} + X'X)^{−1}
    v_0 → v_1 = v_0 + n
    s_0² → s_1² = [v_0 s_0² + νs² + (β̂ − β_0)' (V_0 + (X'X)^{−1})^{−1} (β̂ − β_0)] / v_1

where β̂ is the OLS estimate and νs² = (Y − Xβ̂)'(Y − Xβ̂).

Independent case, p(β, σ² | y) ∝ p(y | β, σ²) p(β) p(σ²):

    β_0 → β_1 = V_1 (V_0^{−1} β_0 + X'Y/σ²)
    V_0 → V_1 = (V_0^{−1} + X'X/σ²)^{−1}
    v_0 → v_1 = v_0 + n
    s_0² → s_1² = [v_0 s_0² + (Y − Xβ)'(Y − Xβ)] / v_1

Table 5.7: Prior and Posterior Parameters Summary
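As an R sketch of the closed-form conjugate update in Table 5.7 (first case); all names are illustrative:

    # Natural-conjugate update of the prior parameters (Table 5.7, conjugate case).
    conjugate.update <- function(X, y, beta0, V0, v0, s02) {
      XtX      <- crossprod(X)
      beta.hat <- solve(XtX, crossprod(X, y))          # OLS estimate
      V1       <- solve(solve(V0) + XtX)
      beta1    <- V1 %*% (solve(V0) %*% beta0 + XtX %*% beta.hat)
      v1       <- v0 + length(y)
      nus2     <- sum((y - X %*% beta.hat)^2)          # residual sum of squares
      quad     <- t(beta.hat - beta0) %*% solve(V0 + solve(XtX)) %*% (beta.hat - beta0)
      s12      <- drop(v0 * s02 + nus2 + quad) / v1
      list(beta1 = beta1, V1 = V1, v1 = v1, s12 = s12)
    }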
In every case, the sampling distribution for a new observation is y* | y, β, σ² ~ N(X*β, σ² I), and the posterior predictive distribution is obtained by integrating out the parameters:

Conjugate, unconstrained:
    p(y* | y) = ∫∫ p(y* | y, β, σ²) p(β | σ², y) p(σ² | y) dβ dσ²

Conjugate, constrained:
    as above, with the draws restricted to β ∈ A.

Independent, unconstrained:
    p(y* | y) = ∫∫ p(y* | y, β, σ²) p(β | y) p(σ² | y) dβ dσ²

Independent, constrained:
    as above, with the draws restricted to β ∈ A.

Key (all cases): obtain draws of y* from p(y* | y, β, σ²) using the previous draws from the posterior simulation, and use Monte Carlo integration to get predictive inferences.

Table 5.8: Main Posterior Predictive Distributions Summary
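A minimal R sketch of this Monte Carlo step for a single new covariate vector x.star, given stored posterior draws (names are illustrative):

    # Posterior predictive simulation by Monte Carlo (Table 5.8).
    predictive.draws <- function(x.star, beta.draws, sigma2.draws) {
      mu <- beta.draws %*% x.star                      # one conditional mean per draw
      rnorm(length(sigma2.draws), mean = mu, sd = sqrt(sigma2.draws))
    }

    # y.star <- predictive.draws(x.star, beta.draws, sigma2.draws)
    # mean(y.star); quantile(y.star, c(0.025, 0.975))  # point and interval forecasts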
Chapter 6

Symbolic Data

6.1 What is symbolic data analysis?

Nowadays there are more and more data available to be analyzed and studied. Technological advances let us gather huge quantities of information about a specific variable, but part of that information is lost because standard statistical methods do not have the flexibility to manage such quantities. For example, let us assume we are studying the evolution of the stock prices of a company. At the end of each month we would have the different values that the stock has taken daily. It seems reasonable to think that the researcher would keep only the daily close prices, or the daily mean prices, rather than handling all of the gathered information.

Symbolic data analysis (SDA) deals with this problem: it lets us analyse vast amounts of information efficiently in order to extract the required knowledge and to represent it better. Continuing with the same example, symbolic data will let the engineer work with the daily maximum and minimum prices of a month, or with a histogram of the monthly prices. In this way, SDA complements other widely used statistical tools, such as candlesticks. More information about candlesticks and other interesting tools can be found in [Lee 06] and [Irpi05]. For instance, Figure 6.1 illustrates an interval time series for the daily maximum and minimum Ibex 35 values in January 2006.

So the possibilities of symbolic data are evident. For instance, let us think of an application to warrants. A warrant is a right, without obligation, to buy (a call warrant) or to sell (a put warrant) something at an agreed price (the strike). With a predicted stock price range, one could choose the best put warrant or the most suitable call warrant, and obtain higher profits.
Figure 6.1: Interval time series of daily minimum and maximum Ibex 35 values, January 2006.

Behind the aggregation method used by SDA lies the notion of a symbolic object. This is a mathematical model of a concept (see [Dida95]) which, basically, lets us select some individuals from a group.

Going further into SDA, and according to [Bill06a], three main kinds of symbolic data can be considered: multi-valued, interval-valued and modal-valued.

As far as the first kind is concerned, a multi-valued symbolic random variable Y is one that takes one or more values from the list of values in its domain Y. The complete list of possible values in Y is finite, and the values may be well-defined categorical or quantitative values.

For example, let us take all the companies that have formed the Ibex 35 index since its beginning. Then we could define a variable Y = blue chips in the Ibex 35 with 15 observations w_u = year. Thus we have, for instance, that during the first year, 1992 (w_u = w_1), Telefónica, Repsol, Endesa, SCH and BBVA were considered to be the blue chips, whereas in 2007 (w_u = w_15) Santander, Telefónica, BBVA, Endesa and Repsol YPF are considered to be the blue chips.

Likewise, an interval-valued symbolic random variable Y is one that takes values in an interval.
w_u  | Year | Y = Blue chips in Ibex 35
w_1  | 1992 | {Telefónica, Repsol, Endesa, SCH, BBVA}
...  | ...  | ...
w_15 | 2007 | {Telefónica, Repsol YPF, Endesa, BBVA, Santander}

Table 6.1: Multivalued Data Example

That interval can be closed or open at either end. This type is very important in SDA; furthermore, it can capture the tendencies of centralization and dispersion of a dataset. Let us recall the example of the daily stock prices of a company in a month: this information can be recorded as the daily maximum and minimum values during the month. As this is one of the most interesting types of symbolic data for our purpose, we will take it up again below.

Finally, let a random variable Y take possible values {η_k : k = 1, 2, . . . } over a domain Y. Then a modal-valued outcome is one formed by the values η_k together with an associated measure π_k. The latter is usually a weight, probability or relative frequency, but it can also be capacities, necessities, possibilities, credibilities and related entities.

A modal multi-valued variable can now be defined. This is a variable whose observed outcome takes values that are a subset of the domain, each with its respective measure. For example, we could define a variable Z = Importance of the companies in the Ibex 35 index. Thus, for instance, we have that the most important company in 1992 was Telefónica, while in 2007 Santander is the company with the highest weight in the index.

Another example: let us suppose we define a variable Y = Maximum daily stock price for enterprises in the Spanish Continuous Stock Market. For the company Endesa we could have:
w_1 (1992): {Telefónica, 13.7; Repsol, 9.7; Endesa, 9.2; SCH, 8.0; BBVA, 7.2; Iberdrola, 6.9; Santander, 5.9; Banco Popular, 3.8; Banesto, 3.6; Banco Exterior, 3.0; Cepsa, 2.5; Tabacalera, 2.4; Acesa, 2.1; Unión FENOSA, 2.0; Gas Natural, 1.9; Sevillana de Electricidad, 1.8; Fuerzas E. Cataluña, 1.7; Bankinter, 1.6; Dragados, 1.4; Aguas de Barcelona, 1.3; Mapfre, 1.3; Asland, 1.2; FCC, 1.1; Portland Valderribas, 1.0; Hidrocantábrico, 0.8; Vallehermoso, 0.8; Metrovacesa, 0.8; Acerinox, 0.7; Viscofán, 0.6; Cubiertas y MZOV, 0.5; Sarrió, 0.4; Uralita, 0.4; Huarte, 0.3; Urbis, 0.3; Agromán, 0.2}

...

w_15 (2007): {Telefónica, 16.0; Repsol YPF, 5.9; Endesa, 7.5; BBVA, 13.0; Iberdrola, 5.6; Santander, 17.2; Banco Popular, 3.4; Banesto, 0.5; Unión FENOSA, 1.8; Gas Natural, 1.5; Bankinter, 0.9; Cor. Mapfre, 0.7; FCC, 1.2; Sacyr Vallehermoso, 1.0; Metrovacesa, 0.5; Acerinox, 1.0; Inditex, 3.0; ACS Const., 2.9; B. Sabadell, 2.1; Altadis, 2.0; Abertis A, 2.0; G. Ferrovial, 1.6; Acciona, 1.4; Gamesa, 1.0; Enagás, 0.8; REE, 0.8; Cintra, 0.7; Agbar, 0.7; Telecinco, 0.6; Iberia, 0.5; Indra A, 0.5; Fadesa, 0.5; Sogecable, 0.4; Antena 3 TV, 0.4; NH Hoteles, 0.4}

Table 6.2: Modal-multivalued Example

Y(Endesa) = {38.7, 0.125; 38.75, 0.125; 38.8, 0.250; 38.85, 0.250; 38.9, 0.125; 39, 0.125}

This means that we assign a probability of 0.125 to the possibility that Endesa's maximum daily price is 38.7, a probability of 0.125 to the possibility that it is 38.75, a probability of
0.25 to the possibility that it is 38.8, and so on.

Another very interesting variant of this type are the modal interval-valued variables. That is, instead of taking a value with a probability, the variable can take any value in an interval with a probability. Continuing with the previous example:

Y(Endesa) = {[38.7, 38.75), 0.125; [38.75, 38.85), 0.125; [38.8, 38.9), 0.25}

For more information and other types of data, the reader is referred to [Bill06a], [Huiw06] and [Arro06].

6.2 Interval-valued variables

As has already been mentioned, summarizing a dataset is one of the three possible sources or reasons from which interval data may result. According to [Huiw06], the other two sources are the imprecision of measurement and the expert's knowledge, including uncertainty.

Now, suppose E is a set of m symbolic objects with observations Y(u), u = 1, . . . , m. Let us suppose we are interested in a particular random variable Y_j ≡ Z, and that the realization of Z for the observation w_u is the interval Z(w_u) = [a_u, b_u] = ξ. Then, according to [Bill06a], the empirical density function of Z is

f(ξ) = (1/m) Σ_{u∈E} I_u(ξ) / ||Z(u)||,    ξ ∈ ℝ                  (6.1)

where I_u(·) indicates whether ξ is or is not in the interval Z(u), and ||Z(u)|| is the length of that interval. Likewise, it can be shown that the symbolic empirical mean is given by

Z̄ = (1/2m) Σ_{u∈E} (b_u + a_u)                                    (6.2)

and the symbolic empirical variance is given by

S² = (1/3m) Σ_{u∈E} (b_u² + b_u a_u + a_u²) − (1/4m²) [Σ_{u∈E} (b_u + a_u)]²    (6.3)
These formulas are consistent with the hypothesis of uniformity within the intervals. While the symbolic mean can be understood intuitively as a centre of gravity, the symbolic variance is not so easy to interpret. In fact, it might seem more reasonable to formulate the variance as

S² = (1/4m) Σ_{u∈E} (b_u + a_u)² − (1/4m²) [Σ_{u∈E} (b_u + a_u)]²    (6.4)

that is, the variance of the midpoints. But this last formulation does not take into account the internal variation of the intervals, while the former does; hence the former is higher.

For example, let us consider the daily maximum and minimum values of the Ibex 35 during December 2006. Then, according to the above, the symbolic mean in that month was

Z̄ = (1/2m) Σ_{u∈E} (high_u + low_u) = 14116.38

and the empirical symbolic variance is

S² = (1/3m) Σ_{u∈E} (b_u² + b_u a_u + a_u²) − (1/4m²) [Σ_{u∈E} (b_u + a_u)]² = 28006.

If we had calculated the variance taking only the midpoints, the result would have been

S² = (1/4m) Σ_{u∈E} (b_u + a_u)² − (1/4m²) [Σ_{u∈E} (b_u + a_u)]² = 26023,

which is lower than the value obtained previously because it does not take into consideration the internal variation of the intervals.
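These statistics are straightforward to compute; a short R sketch, with a and b the vectors of interval minima and maxima:

    # Symbolic mean and variance for interval-valued data, equations (6.2)-(6.4).
    symbolic.stats <- function(a, b) {
      m      <- length(a)
      z.bar  <- sum(a + b) / (2 * m)                                         # (6.2)
      s2     <- sum(b^2 + a * b + a^2) / (3 * m) - sum(a + b)^2 / (4 * m^2)  # (6.3)
      s2.mid <- sum((a + b)^2) / (4 * m)          - sum(a + b)^2 / (4 * m^2) # (6.4)
      list(mean = z.bar, var = s2, var.midpoints = s2.mid)
    }
    # The internal-variation term makes var >= var.midpoints,
    # as in the December 2006 Ibex 35 example (28006 vs. 26023).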
Although it might seem that interval-valued data bring nothing but advantages, according to [Huiw06] there are two major limitations when applying multivariate analysis to an interval dataset. The first is that the computational work is heavy; the second is that the hyperrectangle may enlarge the range of the original dataset and reduce the accuracy of the analysis.

The methodology for applying interval data to multivariate analysis involves transforming the symbolic data matrix into a numerical matrix, that is, reducing p-dimensional observations to s-dimensional components (where usually s << p). This is called Principal Component Analysis. There are two main methods that carry this out: the Vertices Method and the Centres Method. The former builds a matrix with 2^p rows and p columns from each hyperrectangle in the p-dimensional space, where each row contains the coordinates of one vertex of the hyperrectangle in R^p. The latter, on the other hand, works with the average value of every variable for each category of data. A more extended review of these two methods can be found in [Bill06a]. [Huiw06] points out some limitations of these methods and proposes a new type of symbolic data: factor interval data. Since symbolic data is a wide field, the reader is referred to all the above citations.

6.3 Classical regression analysis with interval-valued data

Regarding classical multiple regression, there are three current approaches to be considered, though one of them is just a regression fit. Let us begin with the most intuitive and finish with the most conceptual.

Since we now have intervals instead of single values, it is natural to take the midpoints and proceed as in classical multiple regression, using the resulting model to make new predictions from a new interval by applying it to each extreme of that interval. Moreover, [DeCa05] remark on the need to establish the constraint β_i ≥ 0 to ensure that the lower extreme of the predicted interval stays below the upper extreme, and suggest the algorithm presented by [Laws74] to handle this constraint. We suggest the alternative of getting enough draws from the posterior distribution of β, discarding those that are negative, and averaging; a sketch follows.
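As a minimal R sketch, assuming beta.draws is an R × k matrix of posterior draws:

    # Enforcing the non-negativity constraint by rejection: keep only the
    # posterior draws of beta whose components are all non-negative, then average.
    valid    <- apply(beta.draws >= 0, 1, all)
    beta.est <- colMeans(beta.draws[valid, , drop = FALSE])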
Let us recall the example shown for classical multiple regression, but now taking the maximum and minimum values of the Ibex 35, Dow Jones, FTSE 100 and DAX over the first ten months of 2006. Taking the midpoints of those intervals, we obtain the same result as in the classical multiple regression:

IBEX35_t = 1.0102 DowJones_{t−1} − 2.0144 FTSE100_{t−1} + 2.1229 DAX_{t−1} + ε_t

where ε_t ~ N(0, 332.71²).

We could use this model to predict a new observation for November 1st, applying it to each extreme of the intervals:

max(IBEX35_t) = 1.0102 × 12161 − 2.0144 × 6149.9 + 2.1229 × 6289.7 = 13242.41
min(IBEX35_t) = 1.0102 × 11986.84 − 2.0144 × 6110.9 + 2.1229 × 6237.55 = 13040.7

So the prediction would be [13040.7, 13242.41]. A disadvantage of this approach is that it does not take the interval length into account.

To solve that problem, [DeCa05] and [DeCa04] suggest a second regression for the interval range. They refer to this new approach as the constrained centre and range method (CCRM); in this case the constraint is applied to the range regression instead of to the centres regression. We will employ the radii instead of the ranges. So, continuing with the previous example, we would use the radii of the different indexes to build the following model:

RadiusIBEX35_t = 0.35 RadiusDowJones_{t−1} + 0.484 RadiusFTSE100_{t−1} + 0.272 RadiusDAX_{t−1} + ε_t

where ε_t ~ N(0, 26.31²).

With this new approach, the prediction can be calculated from the midpoint and the radius of the interval:

MidpointIBEX35_t = 1.0102 × 12073.65 − 2.0144 × 6130.4 + 2.1229 × 6262.125 = 13141.3
RadiusIBEX35_t = 0.35 × 86.81 + 0.484 × 19.5 + 0.272 × 24.575 = 46.53

Now the prediction would be [13094.75, 13187.81].
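In R, this centre-and-radius fit can be sketched with two ordinary least-squares regressions (the data frame and variable names are illustrative; note that plain lm() does not impose the non-negativity constraint on the radius coefficients, which would require, for example, the rejection step sketched above):

    # CCRM sketch: one regression for the midpoints and one for the radii,
    # combined into an interval prediction.
    fit.mid <- lm(ibex ~ dowjones + ftse100 + dax, data = midpoints)
    fit.rad <- lm(ibex ~ dowjones + ftse100 + dax, data = radii)

    c.hat <- predict(fit.mid, newdata = new.midpoints)
    r.hat <- predict(fit.rad, newdata = new.radii)
    c(lower = c.hat - r.hat, upper = c.hat + r.hat)   # predicted interval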
Finally, the last approach is the use of the symbolic mean, the symbolic variance and the symbolic covariance to build the regression; that is, a symbolic regression is used instead of the classical one. This approach requires a different way of estimating.

Recall the classical univariate multiple regression model:

Y = β_0 + X_1 β_1 + · · · + X_p β_p + ε                            (6.5)

where

ε ~ N(0, σ²)                                                      (6.6)

Taking mean values we have:

Ȳ = β_0 + X̄_1 β_1 + · · · + X̄_p β_p + ε̄                            (6.7)

and, setting the derivative of the sum of squared errors with respect to β_0 to zero, it is easily deduced that

ε̄ = 0                                                            (6.8)

This means that the mean error is zero whenever there is a constant term in the model, a very important point for what follows. We can then obtain an equivalent model:

Y − Ȳ = (X_1 − X̄_1) β_1 + · · · + (X_p − X̄_p) β_p + ε              (6.9)

where Y − Ȳ is the new dependent variable and X − X̄ is the new matrix of independent variables. β can be estimated in the following way:

β̂ = S_XX^{−1} S_XY                                                 (6.10)

where

S_XX = | var(X_1)        cov(X_1, X_2)   . . .  cov(X_1, X_p) |
       | cov(X_1, X_2)   var(X_2)        . . .  cov(X_2, X_p) |
       | . . .                                                |
       | cov(X_1, X_p)   cov(X_2, X_p)   . . .  var(X_p)      |
and

S_XY = | cov(X_1, Y) |
       | cov(X_2, Y) |
       | . . .       |
       | cov(X_p, Y) |

where the independent term is not taken into account (so there is no column of ones in the matrix X). The independent term β_0 is estimated as

β̂_0 = Ȳ − Σ_{j=1}^{p} β̂_j X̄_j                                      (6.11)

In this way, the symbolic variance, the symbolic covariance and the symbolic mean for interval-valued variables can be used to estimate β.
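A two-line R sketch of (6.10)-(6.11), where Sxx, Sxy, xbar and ybar would be built from the symbolic statistics of the previous section (names are illustrative):

    # Covariance-based estimation of the regression coefficients, (6.10)-(6.11).
    beta.hat  <- solve(Sxx, Sxy)               # slopes from (co)variances
    beta0.hat <- ybar - sum(beta.hat * xbar)   # intercept from the means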
However, this approach has the limitation that, in order to employ the symbolic statistics and this way of estimating, it is necessary to include the independent term in the regression model. In fact, the most important point is that this last approach, suggested by [Bill06a], is just a regression fit, since no residual term is defined for symbolic data.

6.4 Bayesian regression analysis with interval-valued data

Once we know how interval-valued data can be employed in classical regression, let us see how they can be included in the Bayesian approach. For this purpose we will employ the CCRM proposed by [DeCa05].

According to what has been said above and in the chapter on Bayesian regression, there is nothing new to be done: the problem reduces to two Bayesian regressions, one for the centres and another for the radii, with the non-negativity constraint applied to the latter. As we saw for Bayesian regression, the constraint is much easier to incorporate into the Bayesian approach than into the classical one.

Moreover, by introducing the Bayesian approach into regression with symbolic data, the engineer can incorporate more information into the problem than with Bayesian regression on traditional data. This is because two regressions are now being made, so the expert can state whether the centres will increase or decrease, and the same for the radii. In this sense, an opinion like:

'I think that Dow Jones will have less importance over the Ibex 35 and DAX will have more relevance than they have had until now, and there will be more volatility.'

would mean, for instance, that the prior mean for the Dow Jones midpoint coefficient will be lower than indicated by the data; on the contrary, the prior mean for the DAX midpoint coefficient would be greater, and similarly for the prior means of the radii.
Chapter 7

Results

To show the usefulness of the Bayesian Centre and Radius approach proposed in this project, this chapter considers experiments fitting linear regression models to real interval-valued data sets drawn from the Spanish Continuous Stock Market.

7.1 Spanish Continuous Stock Market data sets

We have considered two situations in the Spanish Continuous Stock Market. On the one hand, we have used the monthly minimum and maximum prices of BBVA and BSCH from January 2000 to June 2007 in order to show how the classical regression approach applied to interval-valued data can be improved through the Bayesian Centre and Radius approach when the variables are directly related. This will also let us see other advantages of the proposed approach over classical regression with single values.

On the other hand, we have taken the daily minimum and maximum prices of two other Spanish Continuous Stock Market companies, Dogi and Zardoya, from January 2006 to December 2006, in order to show that the Bayesian Centre and Radius approach outperforms the other approaches even when the variables are not related, that is, when they are uncorrelated.

7.2 Direct Relation between Variables

In this case, 66 of the total 89 months are assigned to the training set, and the other 23 months to the testing set.
Let us begin with the classical regression approach applied to the midpoints of the monthly minimum and maximum prices of BBVA and BSCH in the considered training period. These data yield the following model:

BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + ε

where ε ~ N(0, 0.5237²).

Figures 7.1 and 7.2 show that this model fits well enough for both the training and the testing sets.

[Figure 7.1: Classical Regression with single values in the training set; x-axis: midpoints of BBVA prices.]

If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Square Error and Root Mean Square Error) for each set, we obtain Table 7.1. The model is good, but we are using only the midpoints to fit new data when much more data are available; we are therefore wasting information we have gathered. This can be seen graphically in Figure 7.3, which suggests the model is not as good as it seemed before: there is too much available information for such a simple result, and one could expect more from those data.

Thus, another approach, known as the Centre Method, could be considered: applying the fitted model to each maximum and minimum price to get predicted maximum and minimum prices. This provides the results displayed in Figures 7.4 and 7.5.

According to [Bill00], the total deviation is given by

ε_CentreMethod2000 = ε_lower + ε_upper                             (7.1)
[Figure 7.2: Classical Regression with single values in the testing set; x-axis: midpoints of BBVA prices.]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.4208 | 0.2660 | 0.5157
Testing  | 0.2321 | 0.3831 | 0.2446 | 0.4946

Table 7.1: Error Measures for Classical Regression with single values

The resulting error measures can be seen in Table 7.2. Now we have a fitted interval for each observed interval, so this approach does seem to take advantage of the extracted data.

Now let us look at the error measures according to the Centre Method proposed by [Bill02], where the sum of square errors is given by

SSE_CentreMethod2002 = Σ_{i=1}^{n} (ε²_lower + ε²_upper)            (7.2)

and, thus, the mean absolute error is given by
[Figure 7.3: Classical Regression applied to interval-valued data; x-axis: minimum and maximum BBVA prices.]

[Figure 7.4: Centre Method (2000) in the training set.]

MAE_CentreMethod2002 = Σ_{i=1}^{n} (|ε_lower| + |ε_upper|) / n      (7.3)

Table 7.3 shows that this new definition of the error does not improve much on the previous one.

However, let us compare these last approaches with the Centre and Radius Method. In this case we will have the following models:

BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + ε_Midpoint
[Figure 7.5: Centre Method (2000) in the testing set.]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.8416 | 1.0638 | 1.0314
Testing  | 0.4643 | 0.7663 | 0.9784 | 0.9891

Table 7.2: Error Measures for Centre Method (2000)

where ε_Midpoint ~ N(0, 0.5237²), and

BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + ε_Radius

where ε_Radius ~ N(0, 0.1458²).
Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.8917 | 0.5922 | 0.7695
Testing  | 0.4643 | 0.7717 | 0.5125 | 0.7159

Table 7.3: Error Measures for Centre Method (2002)

According to [DeCa07], the sum of squares of deviations is given by

SSE_CentreRadiusMethod = Σ_{i=1}^{n} (ε²_Midpoint + ε²_Radius)      (7.4)

Therefore, the mean absolute error is given by

MAE = Σ_{i=1}^{n} (|ε_Midpoint| + |ε_Radius|) / n                   (7.5)

[Figure 7.6: Centre and Radius Method in the training set.]

The results shown in Figures 7.6 and 7.7 and in Table 7.4 clearly show that the error measures are lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter.
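These error definitions translate directly into R; a short sketch with illustrative argument names:

    # Interval error measures used in this chapter.
    mae.centre.2002 <- function(lo, up, lo.hat, up.hat) {
      mean(abs(lo - lo.hat) + abs(up - up.hat))          # equation (7.3)
    }
    mae.centre.radius <- function(mid, rad, mid.hat, rad.hat) {
      mean(abs(mid - mid.hat) + abs(rad - rad.hat))      # equation (7.5)
    }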
[Figure 7.7: Centre and Radius Method in the testing set.]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.5233 | 0.2866 | 0.5353
Testing  | 0.1837 | 0.4712 | 0.2558 | 0.5058

Table 7.4: Error Measures for Centre and Radius Method

Now, let us take into consideration an expert's knowledge about the Spanish Continuous Stock Market and see the results of the Bayesian Centre and Radius Method. Obviously, the Bayesian methodology is mainly useful in the testing set, since that is where the unobserved data lie.

Bearing in mind the previous Centre and Radius model, an expert could think that BSCH would improve slightly with respect to BBVA and assign the prior distribution seen in 5.36 with the following prior parameters for the midpoints:
β_0 = (1.3008, 0.64)'    V_0 = 10^{−9} I    s_0² = 0.5237²    v_0 = 10^7

Then the final midpoint model would be

BSCH_Midpoint = 1.3008 + 0.64 × BBVA_Midpoint + ε_Midpoint

Let us assume that the expert considers that the volatility will not vary, and assigns vague prior parameters to the radius distribution:

β_0 = (0.106, 0.6188)'    V_0 = 10^6 I    s_0² = 0.1458²    v_0 = 4

Then the final radius model would be

BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + ε_Radius

The results for the testing set are shown in Figure 7.8 and in Table 7.5. They show that the proposed Bayesian Centre and Radius Method improves on all the previous approaches, since it lets us manage more information than classical regression and obtains better results than the Centre and the Centre and Radius methods.
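Under the conjugate prior of 5.36, this fit can be sketched in R with bayesm::runireg, whose Prior argument takes the prior precision A = V_0^{−1} together with (ν, s²); the data objects are illustrative:

    # Bayesian Centre and Radius fit via bayesm::runireg (conjugate prior).
    library(bayesm)

    prior.mid <- list(betabar = c(1.3008, 0.64), A = diag(2) / 1e-9,
                      nu = 1e7, ssq = 0.5237^2)   # tight, informative prior
    prior.rad <- list(betabar = c(0.106, 0.6188), A = diag(2) / 1e6,
                      nu = 4, ssq = 0.1458^2)     # vague prior

    out.mid <- runireg(Data = list(y = mid.bsch, X = cbind(1, mid.bbva)),
                       Prior = prior.mid, Mcmc = list(R = 10000))
    out.rad <- runireg(Data = list(y = rad.bsch, X = cbind(1, rad.bbva)),
                       Prior = prior.rad, Mcmc = list(R = 10000))
    # Interval forecasts then combine the two posterior predictive means,
    # centre +/- radius, as in the classical CCRM.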
[Figure 7.8: Bayesian Centre and Radius Method in the testing set.]

Set     | ME     | MAE    | MSE    | RMSE
Testing | 0.0126 | 0.4409 | 0.1997 | 0.4469

Table 7.5: Error Measures for the Bayesian Centre and Radius Method

7.3 Uncorrelated Variables

In this second case, 170 of the total 255 days are assigned to the training set, and the remaining days to the testing set.

The classical regression with the midpoints of the price ranges yields the following model:

Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + ε

where ε ~ N(0, 0.2882²).

Figures 7.9 and 7.10 show that this model does not fit well for either the training or the testing set. If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Square Error and Root Mean Square Error) for each set, we obtain Table 7.6.
[Figure 7.9: Classical Regression with single values in the training set; x-axis: midpoints of Zardoya prices.]

[Figure 7.10: Classical Regression with single values in the testing set; x-axis: midpoints of Zardoya prices.]

The Centre Method could be applied to get predicted maximum and minimum prices. This method yields the following model:

Dogi_Midpoint = 5.6570 + 0.0792 × Zardoya_Midpoint + ε

where
Set      | ME      | MAE    | MSE    | RMSE
Training | 0       | 0.4231 | 0.2268 | 0.4763
Testing  | -0.3518 | 0.3651 | 0.1642 | 0.4052

Table 7.6: Error Measures for Classical Regression with single values

ε ~ N(0, 7.2137²)

Note that the slope has changed since, according to [DeCa04], it cannot be negative: this ensures that the fitted maximum is greater than the fitted minimum. This provides the results shown in Figures 7.11 and 7.12.

[Figure 7.11: Centre Method (2000) in the training set; x-axis: minimum and maximum Zardoya prices.]

Table 7.7 shows the resulting error measures.
[Figure 7.12: Centre Method (2000) in the testing set; x-axis: minimum and maximum Zardoya prices.]

Set      | ME      | MAE    | MSE     | RMSE
Training | -7.1288 | 7.1288 | 51.8315 | 7.1994
Testing  | -8.0653 | 8.0653 | 65.1544 | 8.0718

Table 7.7: Error Measures for Centre Method (2000)

It is very clear that this model is not accurate. This example exposes the main weak point of this approach: the positivity constraint imposed on the coefficients, which makes an inverse relationship between variables impossible. This is reflected in the error measures, which are very high.

Now let us look at the error measures according to the Centre Method proposed by [Bill02]. This new definition of the error improves on the previous one.

However, let us compare these last approaches with the Centre and Radius Method. In this case we will have the following models:

Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + ε_Midpoint
Set      | ME      | MAE    | MSE     | RMSE
Training | -7.1288 | 7.1288 | 25.9183 | 5.0910
Testing  | -8.0653 | 8.0653 | 32.5825 | 5.7081

Table 7.8: Error Measures for Centre Method (2002)

where ε_Midpoint ~ N(0, 0.2882²), and

Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + ε_Radius

where ε_Radius ~ N(0, 0.0259²).

The results can be seen in Figures 7.13 and 7.14 and in Table 7.9.

Set      | ME      | MAE    | MSE    | RMSE
Training | 0       | 0.4385 | 0.2273 | 0.4768
Testing  | -0.3426 | 0.3882 | 0.1655 | 0.4068

Table 7.9: Error Measures for Centre and Radius Method

As occurred with a direct relationship between the variables, the error measures are again lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter even
[Figure 7.13: Centre and Radius Method in the training set; x-axis: minimum and maximum Zardoya prices.]

[Figure 7.14: Centre and Radius Method in the testing set; x-axis: minimum and maximum Zardoya prices.]

when there is not a clear relationship.

Now, let us see what happens when the Bayesian methodology is introduced. Bearing in mind the previous Centre and Radius model, an expert could think that the situation is going to change drastically and assign the following prior parameters to the prior distribution explained in 5.36 for the midpoints:
β_0 = (3.1, 0.02)'    V_0 = 10^{−8} I    s_0² = 0.2882²    v_0 = 10^6

So the final midpoint model would be

Dogi_Midpoint = 3.1 + 0.02 × Zardoya_Midpoint + ε_Midpoint

and the following prior parameters to the prior distribution for the radii:

β_0 = (0.0283, 0.08)'    V_0 = 10^6 I    s_0² = 0.0259²    v_0 = 4

So the final radius model would be

Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + ε_Radius

The results for the testing set are shown in Figure 7.15 and in Table 7.10.

Set     | ME     | MAE    | MSE    | RMSE
Testing | 0.1031 | 0.2008 | 0.0443 | 0.2104

Table 7.10: Error Measures for the Bayesian Centre and Radius Method
[Figure 7.15: Bayesian Centre and Radius Method in the testing set; x-axis: minimum and maximum Zardoya prices.]

The Bayesian Centre and Radius Method again performs better than the rest of the approaches, even under unfavourable conditions.

Therefore, we can conclude that the Bayesian Centre and Radius Method has the same advantages as the Centre and Radius Method described by [DeCa07], plus the considerable advantages of the Bayesian methodology, all of which yields smaller errors in new predictions. An important future development would be to build a Bayesian symbolic regression model with uniformly distributed errors.
Chapter 8

A Guide to Statistical Software Today

8.1 Introduction

Statistical software blends in one direction with relational database software such as Oracle or Sybase (software we do not discuss here) and, in the other direction, with mathematical software such as MATLAB. Mathematical software exhibits not only statistical capabilities flowing from code for matrix manipulation, but also optimization and symbolic manipulation useful for statistical purposes. This chapter is an assessment of the state of the statistical software arena as of 2007. It touches upon a few commercial packages, a few general public license packages, a few analysis packages with statistical add-ons, and a few general purpose languages with statistical libraries.

We begin with the most important commercial packages, such as SAS, Minitab, BMDP, SPSS and S-PLUS, followed by some of the public license statistical and Bayesian software, such as R or BUGS, and then some general purpose mathematical software and some general programming languages with statistical libraries.

Finally, the role of the developed application in the current statistical scene is discussed, remarking on its main advantages and disadvantages.
8.2 Commercial Packages

8.2.1 The SAS System for Statistical Analysis

SAS began as a statistical analysis system in the late 1960s, growing out of a project in the Department of Experimental Statistics at North Carolina State University. The SAS Institute was founded in 1976. Since that time, the SAS System has expanded to become an evolving system for complete data management and analysis; SAS is really much more than a simple software system. As an example of its great reach, it is worth mentioning that it is used by 90 percent of the companies on the Fortune 500 list. This expansion is probably due to the fact that SAS management has aligned itself with the recent "statistical-like" advances within the computer science community, such as data mining. This clever integration of mathematical and statistical methodologies, database technology, and business applications has helped propel SAS to the top of the commercial statistical software arena.

The architecture of the SAS approach is called the SAS Intelligence Platform, a closely integrated set of hardware and software components that allow users to fully exploit the business intelligence (BI) that can be extracted from their client base. Among the products making up the SAS System are products for the management of large databases; statistical analysis of time series; statistical analysis of most classical statistical problems, including multivariate analysis, linear models (as well as generalized linear models) and clustering; and data visualization and plotting. More precisely, the SAS Intelligence Platform consists of the following components:

• The SAS Enterprise ETL Servers
• The SAS Intelligence Storage
• The SAS Enterprise BI Server
• The SAS Analytic Technologies

One of the strengths of SAS is that the package containing the capabilities one normally associates with a data analysis package is upgraded with each release to reflect the latest algorithmic developments in the statistical field.

The SAS System is available on PC and UNIX based platforms, as well as on mainframe computers, so it covers the main options except Macintosh.
As one could guess from what has been said above, this system is aimed mainly at industrial, scientific and statistical users with very high needs and knowledge, who do not mind spending time learning to use such a complex system.

Some useful URLs are:

• http://www.sas.com/, the main URL for SAS
• http://is.rice.edu/~radam/prog.html, which contains some user-developed tips on using SAS

Other statistical systems of the same general vintage as SAS are MINITAB, BMDP and SPSS. All of these systems began as mainframe systems, but have evolved to smaller scale systems as computing has evolved.

8.2.2 Minitab

Minitab Inc. was formed more than 20 years ago around its flagship product, the MINITAB statistical software. MINITAB provides tools to analyze data across a variety of disciplines, and is targeted at users at every level: scientists, business and industrial users, faculty, and students.

Regarding the operating system, MINITAB is available on the most widely used computer platforms, including Windows, DOS, Macintosh, OpenVMS and Unix.

In contrast to SAS, MINITAB is quite easy to learn and use: there is no lengthy learning process and little need for unwieldy manuals. This may well be the main reason why MINITAB is used extensively in the educational community.

For more details about this software visit the URL http://www.minitab.com/.

8.2.3 BMDP

BMDP has its roots as a bio-medical analysis package from the late 1960s. In many ways it has remained true to its origins, as evidenced by its long list of clients, which includes such biomedical giants as Bristol-Myers Squibb, Merck and Glaxo Wellcome. There are three main distributions:
BMDP New System Personal Edition, BMDP Classic for PCs (Release 7), and BMDP New System Professional Edition. While BMDP New System has an easy-to-use interface that makes data analysis possible with simple point-and-click and fill-in-the-blank interactions, the Professional Edition combines the full suite of BMDP Classic for PCs Release 7 statistics with the powerful data management and front-end data exploration features of the Personal Edition.

A reference URL for BMDP is http://www.ppgsoft.com/bmdp00.html.

8.2.4 SPSS

SPSS is a multinational software company, founded in the late 1960s, that provides statistical product and service solutions for survey research, marketing and sales analysis, quality improvement, scientific research, government reporting and education.

SPSS starts with the SPSS Base, which includes the most popular statistics, complete graphics, and broad data management and reporting capabilities. The SPSS products form a modular system and include SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS Categories, SPSS CHAID, SPSS LISREL 7, SPSS Developer's Kit, SPSS Exact Tests, Teleform, and MapInfo. Although this software was originally designed for mainframe use, SPSS has adapted to market demand and has releases for Windows, Mac and UNIX.

A reference URL for SPSS is http://www.spss.com/.

8.2.5 S-PLUS

While there are many different packages for performing statistical analysis, one that offers some of the greatest flexibility regarding the implementation of user-defined functions and the customization of one's environment is S-PLUS, one of the two implementations of the S language (R is the other, and will be reviewed later).

S is an exceptionally well-developed tool for statistical research and analysis, and is especially strong for statistical graphics, the output of data analysis through which both raw data and results are displayed for analysts and clients alike. S was originally developed at AT&T Bell Labs (later split into AT&T Laboratories and Lucent Bell Labs) by a team of researchers including Richard A. Becker,
John M. Chambers, Allan Wilks, William S. Cleveland and Trevor Hastie. The original description of the S language, written by Becker, Chambers and Wilks in 1988, earned the 1998 Software System Award of the Association for Computing Machinery (ACM). The aim of the language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully".

A good introduction to the application of S to statistical analysis problems is contained in [Cham92] and [Cham83]. More recent work focusing on the statistical capabilities of the S-PLUS system can be found in [Vena02].

S-PLUS is manufactured and supported by the Statistical Sciences Corporation, now a division of MathSoft. It runs on both PC and UNIX based platforms. In addition, the company offers easy links to call S-PLUS from within C/FORTRAN, or to call compiled C/FORTRAN functions within the S-PLUS environment. Statistical Sciences has made great efforts to keep the software current with the needs of the statistical community, releasing dedicated modules targeted at specific application areas.

The S-PLUS home page can be reached at http://www.mathsoft.com/. This site contains an interesting comparison between SAS and S-PLUS.

8.2.6 Others

Other statistically oriented packages enjoying good reputations are SYSTAT, DataDesk, JMP and StatGraphics. SYSTAT originated as a PC-based package developed by Leland Wilkinson and is now owned by SPSS; the current version is 6.0, a Microsoft Windows oriented product. DataDesk, by contrast, is a Macintosh-based product authored by Paul Velleman of Cornell University. The currently released version is 5.0.1, a GUI-based product containing many innovative graphical data analysis and statistical analysis features. More information about DataDesk can be found at the URL http://www.lightlink.com/datadesk/. JMP is another SAS product, highly visualization oriented; it is a stand-alone product for PC and Macintosh platforms, and information on it can be found at http://www.sas.com/. StatGraphics is an education-oriented statistical package used mainly in universities, which offers a user-friendly interface. A good reference showing how to use StatGraphics can be found in [Maté95].
8.3 Public License Packages

8.3.1 R

R is an Open Source implementation of the well-known S language, originated at the University of Auckland, New Zealand, in the early 1990s. It works on multiple computing platforms such as Unix systems or Windows, but its most important characteristic is that a software system existing under the Open Source paradigm benefits from having "many pairs of eyes" examine the software, which helps ensure its quality. An example of the rapid development of this software is that in 1997, only two years after the public release in June 1995, the leading team had to select a core group of around 10 members, which was made responsible for changes to the source code.

R is, for the most part, a command-line based language organized into various packages. Basic packages are installed by default, and the user can download and install a great variety of additional packages. There are also several major projects that are "R spin-offs", such as "Bioconductor", an R package collection for gene expression analysis, or "Omega", another project focused on providing a seamless interface between R and a number of other languages (PERL, PYTHON, MATLAB). Two packages have to be mentioned because of their importance for this project: JRI and bayesm. The first deals with the problem of communicating Java with R; it lets us create a graphical user interface using Swing in Java while making all the statistical computations in R. The second, developed by [Rossi06], contains the main functions to be used in Bayesian analysis. It is precisely in Bayesian data analysis where R can outperform the other statistical packages.

More information about R can be found at http://www.r-project.org/.
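As a flavour of bayesm, the following minimal example (with simulated data and default priors) draws from the posterior of a univariate regression with runireg, the kind of call BARESIMDA issues through JRI:

    # Minimal bayesm example: posterior draws for a Normal linear regression.
    library(bayesm)
    set.seed(1)
    X <- cbind(1, runif(100))
    y <- drop(X %*% c(1, 2) + rnorm(100, sd = 0.5))

    out <- runireg(Data = list(y = y, X = X), Mcmc = list(R = 2000))
    colMeans(out$betadraw)       # posterior means, close to (1, 2)
    mean(out$sigmasqdraw)        # posterior mean of sigma^2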
8.3.2 BUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods. The project began in 1989 in the MRC Biostatistics Unit and led initially to the "Classic" BUGS program, and then on to the WinBUGS software developed jointly with the Imperial College School of Medicine at St. Mary's, London. Development now also includes the OpenBUGS project at the University of Helsinki, Finland.

The main advantage of this software is, as with R, the flexibility it offers the researcher to model whatever he needs, though it is slightly more complex to learn than R. For this reason, Phil Woodward developed BugsXLA, an Excel add-in that not only allows the user to specify a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors and the control of the MCMC run itself.

More information can be found at http://www.mrc-bsu.cam.ac.uk/bugs/.

8.4 Analysis Packages with Statistical Libraries

8.4.1 Matlab

MATLAB is an interactive computing environment that can be used for scientific and statistical data analysis and visualization. The basic data object in MATLAB is the matrix. The user can perform numerical analysis, signal processing, image processing and statistics on matrices, thus being freed from programming considerations inherent in other programming languages such as C and FORTRAN. There are versions of MATLAB for Unix platforms, PCs running Microsoft Windows, and Macintosh. Because the functions are platform independent, MATLAB provides the user with maximum reusability of their work.

MATLAB comes with many functions for basic data analysis and graphics. Most of these are written as M-file functions, which are basically text files that the user can read and adapt for other uses. The user also has the ability to create their own M-file functions and script files, thus making MATLAB a programming language. The addition of the MATLAB C-Compiler and C-Math Library allows the user to produce executable code from their MATLAB library of functions, yielding faster execution times and stand-alone applications.

For researchers who need more specific functionality, MATLAB offers several modules or toolboxes. These typically focus on areas that might not be of interest to the general scientific community. Basically, the toolboxes are collections of M-file functions that implement algorithms and functions common to an area of interest.
One of the most useful capabilities of MATLAB is its set of tools for visualizing data. MATLAB supports standard two and three dimensional scatter plots along with surface plots, and in addition provides the user with a graphics property editor. As with R, there is a considerable amount of contributed MATLAB code available on the internet; one notably useful source of code is the MATLAB home page at http://www.mathworks.com/, where more information about this software can be found.

8.4.2 Mathematica

Mathematica is a computer algebra system originally developed by Stephen Wolfram and sold by his company, Wolfram Research. It has numerical and graphical features and powerful symbolic processing capabilities, but is comparatively complex to learn. Information on Mathematica is available at URL http://www.wolfram.com/.

8.4.3 Others

Other mathematical software worth noting is MAPLE, with powerful symbolic processing capabilities, and MATHCAD, a package combining numerical, symbolic and graphical features. More information about these packages can be found at their official web sites:

• http://www.maplesoft.com/
• http://www.mathsoft.com/

8.5 Some General Languages with Statistical Libraries

8.5.1 Java

It is difficult to assess the state of the art of Java statistical libraries, in that there may be many custom user-developed packages we are unaware of. Given this caveat, there are three main packages to mention.

The first one is StatCrunch, which gives the user the capability to perform interactive exploratory data analysis, logistic regression, nonparametric procedures, regression and regression diagnostics,
and others. The reader is referred to a review that appeared in [West04].

Another source of Java-based statistics functions is the Apache Software Foundation Jakarta math project, which seeks to provide common mathematical functionality to the Java user community.

The final source for Java-based statistical analysis is the Visual Numerics JMSL package. It provides the user with an integrated set of statistical, visualization, data mining, neural network and numerical packages. The reader is referred to http://www.vni.com/products/imsl/jmsl/jmsl.html for additional discussion of JMSL.

8.5.2 C++

C++ is another object-oriented programming language, like Java, with various statistical libraries. Two libraries are worth mentioning: Goose, and Probability and Statistics.

The first is dedicated to statistical computation and provides support for t-tests, F-tests, Kruskal-Wallis tests, Spearman tests and others, with an implementation of simple linear regression models. More information can be found at http://www.gnu.org/software/goose/goose.html.

The second is aimed at Microsoft Windows developers and consists of five packages: statistics, discrete probability, standard probability distributions, hypothesis testing, and correlation and regression. A strength of these modules is their support for various interfaces, including C# and C++ .NET. The reader is referred to the URL http://www.webcabcomponents.com/dotNET/dotnet/pss/.

8.6 Developed Software Tool: BARESIMDA

The software tool developed throughout this project is, as has been said above, based on Java and R, both public license software. It has not been developed with the intention of creating a complete statistical package that could be an alternative to any of the above packages: it is evidently very difficult to incorporate all the facilities those programs have, and much more so in a one-year period for a single developer. In fact, BARESIMDA focuses only on regression analysis
procedures with different approaches and data types. In that sense, the developed tool gathers classical and Bayesian regression and lets the user analyze Normal regression models in a very simple way, through a very intuitive graphical user interface. This is particularly important for the Bayesian approach, which has a complex theoretical basis that many users may not be familiar with.

Another advantage, maybe the most important one, over the rest of the statistical packages is that BARESIMDA incorporates regression analysis with interval data in both the classical and the Bayesian approach. Not only does it display the analytical results, it also shows graphically the goodness of fit and the centre and radius tendencies.

With this first version of BARESIMDA, we have sought to start down the road towards public license software that can take advantage of the Java graphical user interface with Swing and of the statistical libraries in R.
Chapter 9

Software Requirements Specification

This chapter gives a complete description of the functions to be performed by the BARESIMDA software; it will assist potential users in determining whether the software meets their needs, or how it should be modified to do so.

It also reduces the development effort, since preparing the Software Requirements Specification (SRS) forces the developer to consider rigorously all of the requirements before design begins, reducing later redesign, recoding and retesting. Careful review of the requirements in the SRS can reveal omissions, misunderstandings and inconsistencies early in the development cycle, when these problems are easy to correct. Likewise, it provides a basis for estimating costs and schedules, and a baseline for verification and validation.

9.1 Purpose

The aim of this system is to provide a tool to build different types of regression analysis and to check the advantages and disadvantages of each of the developed approaches.

9.2 Intended Audience

The software is intended to be handled by different types of users, such as:

• Inexperienced people with a minimal knowledge of what regression is and what it consists of.
• Students and people with a medium degree of knowledge about regression and minimal information about the Bayesian paradigm.

• Graduates and experienced people with a deep knowledge of regression and Bayesian analysis who want to learn about symbolic regression.

9.3 Functionality

The software is expected to provide the features described in the following points.

9.3.1 Classical Regression with crisp data

This refers to the analytic and graphical analysis of multiple and simple classical Normal regression models with crisp data. In particular, the software has to provide the following facilities:

• Regression analysis summary with estimated parameters.
• ANOVA table.
• Normality test.
• Heteroscedasticity test.
• Autocorrelated errors test.
• Prediction of new data.
• Complementary graphics to inspect the fitted model.

9.3.2 Classical Regression with interval-valued data

Just as with crisp data, a regression analysis must be possible with symbolic data, specifically with interval-valued data. All the functions described previously must be implemented for the centres and radii regressions. In addition, the software will display graphically the adequacy of the fitted model to the original interval-valued data.
9.3.3 Bayesian Regression with crisp data

The user must be capable of creating two different Bayesian models: Normal and Independent Normal. Since the main characteristic of the Bayesian paradigm is the possibility of introducing subjective information, the application will provide a very intuitive dialog to retrieve the user's beliefs about the different parameters. The software will display the estimated parameters, provide a Normality test for the residuals, and offer input fields to make new predictions.

9.3.4 Bayesian Regression with interval-valued data

As with classical regression, it must be possible to carry out Bayesian regression with interval-valued data, so that the user can incorporate prior information about the centres and the radii. The analysis options are the same as those for crisp data, with additional graphics to assess the adequacy of the fitted interval-valued data to the observed data.

9.3.5 Data Manipulation

The user will be able to type in new data by hand or to load an existing Excel file into the application. Likewise, he will be able to save to an Excel file both the source data and the following resulting data:

• Residuals
• Normalized residuals
• Studentized residuals
• Fitted values
• Predicted values

9.3.6 Portability

The application must be executable on the main platforms, such as Windows, Linux and Unix.

9.3.7 Maintainability

Likewise, the tool must be well structured so as to be easily maintainable, since changes and extensions are quite probable in the future.
9.4 External Interfaces

9.4.1 User Interfaces

The application to be developed will have a Multiple Document Interface (MDI) with a high degree of usability. The former means that its windows will reside under a single parent window, as Figure 9.1 shows.

Figure 9.1: BARESIMDA MDI

This will avoid filling up the operating system task management interface, as the windows are hierarchically organized, and it will let the user hide/show/minimize/maximize them as a whole.

The second characteristic means that the user will not have to think too much about what the application does or how it does it.

There will be an option to configure the application look so it can be adapted to the user's preferences. The user will be able to set the windows look to:

• Unix
• Windows
• Windows Classic
• Java

Likewise, the user will be able to indicate whether she or he is an experienced or an inexperienced user, which will help her or him specify prior information in Bayesian regression.

9.4.2 Software Interfaces

BARESIMDA will connect to a statistics package, which will be responsible for making all computations and returning them to BARESIMDA. All these operations must be transparent to the end user, through an interface that lets both programs interact; this makes the application more usable.

Regarding input and output data, an interface will be necessary to read from and write to an Excel file.
Chapter 10

Software Architecture Study

10.1 Hardware/Software Architecture

The application will be programmed in Java and built, executed and tested with SDK version 1.4.2 or later. Specifically, the graphical user interface will be developed using Swing, one of the most powerful tools for developing a user-friendly mechanism for interacting with an application, giving it a distinctive "look" and "feel". Its libraries are part of the Java Foundation Classes (JFC), Java's libraries for cross-platform GUI development. For more information on JFC visit http://java.sun.com/products/jfc/. This lets us develop the main interface on a particular system and then execute it on any platform, allowing users of different operating systems to keep the look and feel of their own platform.

The software chosen to carry out the statistical processing is R, since it is distributed under a public license, like Java; it gives the developer a high degree of flexibility to program the models he wants to build, and it is expanding rapidly among statisticians and scientists.

BARESIMDA and R communicate through the Java-to-R interface, called JRI. This is a .jar library, obtainable from http://rosuda.org/JRI/, which allows running R inside Java applications as a single thread. Basically, it loads the R dynamic library into Java and provides a Java API to R functionality. JRI uses native code, but it supports all platforms where Sun's Java (or a compatible implementation) is available, including Windows, Mac OS X, Sun and Linux. More information about this interface can be found in the reference cited above.
Figure 10.1: Interface between BARESIMDA and R

As was indicated in the previous chapter, BARESIMDA is required to read and write Excel files. For this purpose, the POI project consists of various parts that fit together to deliver the data in a Microsoft file format to the Java application. Specifically, and according to our requirements, HSSF is the POI project's pure Java implementation of the Excel file format. It provides a way to create, modify, read and write XLS spreadsheets. Being more precise, it offers:

• Low-level structures for those with special needs.
• An event-model API for efficient read-only access.
• A full user-model API for creating, reading and modifying XLS files.

Visit http://jakarta.apache.org/poi/hssf/index.html for more information.

Figure 10.2: Interface between BARESIMDA and Excel
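As a small sketch of the user-model API, the fragment below opens an XLS file and prints every cell as text. The file name is hypothetical, and the cell accessors changed slightly across POI versions, so this is indicative rather than definitive.

    import java.io.FileInputStream;
    import java.util.Iterator;
    import org.apache.poi.hssf.usermodel.HSSFCell;
    import org.apache.poi.hssf.usermodel.HSSFRow;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    public class XlsReader {
        public static void main(String[] args) throws Exception {
            // Open the workbook and take its first sheet (data.xls is hypothetical).
            FileInputStream in = new FileInputStream("data.xls");
            HSSFWorkbook workbook = new HSSFWorkbook(in);
            HSSFSheet sheet = workbook.getSheetAt(0);

            // Walk through the rows and cells, printing each value as text.
            for (Iterator rows = sheet.rowIterator(); rows.hasNext();) {
                HSSFRow row = (HSSFRow) rows.next();
                for (Iterator cells = row.cellIterator(); cells.hasNext();) {
                    HSSFCell cell = (HSSFCell) cells.next();
                    System.out.print(cell.toString() + "\t");
                }
                System.out.println();
            }
            in.close();
        }
    }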
10.2 Logical Architecture

The application will be structured in three levels or layers, where each of them has a well-defined responsibility (a minimal code sketch of this layering is given below):

• gui: it will be responsible for showing the graphical user interface, and for getting the input parameters and requests and passing them to the classes which will process them.

• action: it will contain the main procedures that process the information and build the regression models and analyses. The results will be given back to the calling process. It will be responsible for calling the dao classes too.

• dao: it will be responsible for accessing permanent data, that is, for loading and saving information.

Figure 10.3 shows the relation among these packages.

Figure 10.3: Logical Architecture
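As an illustration of the intended separation of concerns, the sketch below shows how a request could flow from the gui layer through action to dao. All names are hypothetical; they only exemplify the dependency direction, not the actual BARESIMDA classes.

    // Hypothetical dao layer: persistence only.
    interface RegressionDao {
        double[][] loadData(String file);      // read source data
        void saveResiduals(String file, double[] residuals);
    }

    // Hypothetical action layer: computes the model, delegates persistence to dao.
    class RegressionAction {
        private final RegressionDao dao;
        RegressionAction(RegressionDao dao) { this.dao = dao; }

        double[] fit(String file) {
            double[][] data = dao.loadData(file);
            // ... call R through JRI to estimate the model ...
            return new double[] {0.0, 1.0};    // placeholder coefficients
        }
    }

    // Hypothetical gui layer: collects input and displays results, no computation.
    class RegressionPanel {
        private final RegressionAction action;
        RegressionPanel(RegressionAction action) { this.action = action; }

        void onFitButtonPressed(String file) {
            double[] coefficients = action.fit(file);
            // ... display coefficients in the window ...
        }
    }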
Chapter 11

Project Budget

Project costs for this system have been divided into two types, which are discussed in the following sections:

• Engineering costs.
• Investment and materials costs.

There is also a section summarizing the entire expected budget for the project. There is no commercial cost, since the application is intended to be public-license software for free distribution.

11.1 Engineering Costs

A computer engineer working in the environment on which the project is focused is expected to earn around 2500 €/month. There is an additional extra cost of 30% for paying the Social Security. The programmer works 8 hours/day, a mean of 22 days/month. This makes a mean of 176 hours/month. Thus, the cost per hour is 18.46 €/h (the arithmetic is checked in the short calculation at the end of this section).

The estimated time required for the development of the project is divided into the work packages explained at the beginning of this project:

• Bayesian Data Analysis: 168 hours.
• Regression Models: 160 hours.
• Symbolic Data: 64 hours.
• Requirements Specification: 40 hours.
• Architecture Study: 56 hours.
• Design: 80 hours.
• Programming: 416 hours.
• Testing: 40 hours.

The estimated time required for the project is 1024 hours (5 months and 18 days). Thus, the estimated engineering cost is 18903.04 €.
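As a check of the arithmetic behind these figures, using only the quantities stated above:

\[
\text{cost/hour} = \frac{2500 \times 1.30}{176} \approx 18.46\ \text{€/h},
\qquad
\text{engineering cost} = 1024\ \text{h} \times 18.46\ \text{€/h} = 18903.04\ \text{€}.
\]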
11.2 Investment and Materials Costs

The elements used for the development of this project have been computer and software equipment. These costs can be seen in Table 11.1.

    Element                                                   Price
    Pentium D925 at 3 GHz                                     630 €
    Other expenses (Internet connection, office materials)     60 €
    Total                                                     690 €

Table 11.1: Estimated material costs

The amortization period for this type of equipment is considered complete after 10000 working hours. Moreover, the usage rate is considered to be about 85% of the engineering work hours, thus obtaining the results shown in Table 11.2.

    Concept                               Total
    Hours of use of the material          870.4 hours
    Resources cost/hour                   0.19 €/h
    Total amortization materials cost     165.38 €

Table 11.2: Amortization Costs

Thus, the sum of the engineering and material costs is 19068.42 €. It can be assumed that the investment made is about 5% of the engineering cost, so the investment cost amounts to 945.15 €.

Therefore, the total cost of the project, which is the sum of the engineering, materials and investment costs, is estimated to be 20013.57 €.

11.2.1 Summarized Budget

The overall expected budget can be observed in Table 11.3.
    Cost           Total
    Engineering    18903.04 €
    Material       165.38 €
    Investment     945.15 €
    Total          20013.57 €

Table 11.3: Summarized Budget
Chapter 12

Conclusions

12.1 Bayesian Regression Applied to Symbolic Data

Dealing with a current, recent research topic such as symbolic data requires a high level of English, since it is the universal language. On the other hand, a project like this, which depends on ongoing research, is harder to move forward, since it does not deal with an established subject.

A good research task requires rigorous documentation and a complete bibliography. There must be enough well-cited references to let readers find more information about the points that interest them.

Bayesian methodology is called to be a fundamental element in business processes oriented towards predicting and forecasting new situations and quantities. Although I have really enjoyed this project a lot, I suspect that, with a more complete previous background in Bayesian data analysis, I could have saved some of the initial time spent learning concepts that later turn out to be obvious. This would have let me extend the project to other fields, such as regression with hierarchical models or nonparametric Bayesian regression, where the authentic Bayesian potential resides. However, this is probably due to the fact that the more one knows about a subject, the more one likes it and the more one wants to learn about it, so the problem would never end. In this regard, the project has met and exceeded the initial personal expectations, arousing a great interest in the research field and teaching me to value this hard but exciting arena.
If I could change anything about the project planning, I would have tried to condense the study stage in order to spend more time applying the software tool to more real problems and situations. Nevertheless, this would be difficult to carry out, since the project is developed within an academic year, in which many other activities take place.

12.2 BARESIMDA Software Tool

Fortunately, public-license software is growing enormously. This gives everybody more options to choose from.

In that sense, R is a great tool for programming new models, but it requires, on one hand, very high statistical knowledge, since the requirements of people with a low-to-medium level of Statistics are already satisfied by current statistical software. On the other hand, it requires a medium programming level to be able to carry out one's ideas. Moreover, the way in which R handles data is tedious for someone used to working with matrix representations.

Interconnecting different interfaces or applications is usually a difficult task, especially when there is very little documentation about establishing the connection on both sides. This problem is very important, and it is not usually taken into consideration when integrating different environments.

Concerning Java, all the possibilities and facilities that this programming language offers are really incredible. This makes the programming task much easier.

12.3 Future Developments

As can be deduced from all that has been said above and in previous chapters, the project could have many different extensions. The most important ones are:

• Bayesian regression with hierarchical models for interval-valued data.
• Bayesian time series for interval-valued data.
• Bayesian linear regression for histogram-valued data.
• Nonparametric Bayesian regression for interval-valued data.
• Bayesian vector autoregression for interval-valued data.
• Bayesian regression for functional data.
• Bayesian symbolic regression with uniformly distributed errors.

Likewise, the software tool can be improved by adding some conventional statistical functions, in order to obtain public-license statistical software with a user-friendly graphical interface.

12.4 Summary

On the one hand, we have built a new Bayesian regression model for interval-valued data that fits better than other existing approaches, provided that the prior information is accurate. As has been shown, it works well both for directly related variables and for uncorrelated variables. This is an important advance in the symbolic data field since, to the best of our knowledge, there is no other Bayesian approach for this kind of data.

On the other hand, a new software tool letting the user perform Bayesian symbolic regression has been developed. Again, to the best of our knowledge, there is no other package with the same user-friendly interface and the same facilities. Furthermore, it offers the possibility of performing both standard and Bayesian regression, with either classical or symbolic data.

As a result of this project, the author and the director are working together on a paper about the past, present and future of regression, which is intended to be submitted to ANALES. In the same way, another possible article about Bayesian symbolic regression is planned for a more prominent journal such as Computational Statistics and Data Analysis (CSDA).
Appendix A

Probability Distributions

A number of probability distributions, together with their density or probability mass function, mean and variance, have been used or mentioned previously. For ease of reference, their definitions are regrouped in this chapter, together with a short discussion of their key properties. More information about these distributions in a Bayesian context can be found in [Gelm04] or [Maté93].

A.1 Discrete Distributions

A.1.1 Binomial

The Binomial distribution is perhaps the most commonly encountered discrete distribution in Statistics, and it is used in quality control by attributes and in sampling techniques with replacement. Consider a sequence of n independent trials, each of which can result in one of just two possible outcomes, namely success and failure. Further assume that the probability of success, p, is the same for each trial. Let Y denote the number of successes observed in the n trials. Then Y has a Binomial distribution with parameters n and p. Properly, a discrete random variable, Y, has a Binomial distribution with parameters n and p, denoted Y ~ Bin(n, p), if its probability mass function is given by:

    f(y | n, p) = \binom{n}{y} p^y (1-p)^{n-y}    (A.1)

where n > 0, y = 0, 1, ..., n and 0 ≤ p ≤ 1. Likewise, the mean and variance are defined as:
    E(Y) = np    (A.2)

    Var(Y) = np(1-p)    (A.3)

A.1.2 Geometric

The Geometric distribution is related in a certain way to the previous one. Consider the same situation as before, with a sequence of independent trials with a constant success probability p in each trial. In this case the number of trials varies until the first success is obtained; that is, it models the number of trials until the first success, and it is common in reliability analysis. Formally, a discrete random variable, Y, has a Geometric distribution with parameter p, denoted Y ~ Geo(p), if its probability mass function is given by:

    f(y | p) = (1-p)^{y-1} p    (A.4)

where 0 ≤ p ≤ 1 and y = 1, 2, .... In the same way, the mean and variance are given by:

    E(Y) = \frac{1}{p}    (A.5)

    Var(Y) = \frac{1-p}{p^2}    (A.6)

A.1.3 Poisson

The Poisson distribution is commonly used to represent count data, such as the number of shares sold in a fixed time period. It is also usual to see it in reliability analysis. Strictly, a discrete random variable, Y, has a Poisson distribution with parameter λ, denoted Y ~ P(λ), if its probability mass function is given by:

    f(y | \lambda) = \frac{e^{-\lambda} \lambda^y}{y!}    (A.7)

where λ ≥ 0 and y = 0, 1, 2, ....
In the same way, the mean and variance are given by:

    E(Y) = \lambda    (A.8)

    Var(Y) = \lambda    (A.9)

A.2 Continuous Distributions

A.2.1 Uniform

The Uniform distribution is used to represent a variable that is known to lie in an interval and is equally likely to be found anywhere in the interval. Its main characteristic is that if a variable, X, has probability distribution F(x), then the variable Y = F(X) is uniform on the interval [0, 1]. Properly, a continuous random variable, Y, has a Uniform distribution over the interval [a, b], denoted Y ~ U(a, b), if its probability density function is given by:

    f(y | a, b) = \begin{cases} \frac{1}{b-a} & a \le y \le b \\ 0 & \text{otherwise} \end{cases}    (A.10)

where -∞ < a < b < ∞. The mean and variance are likewise specified by:

    E(Y) = \frac{a+b}{2}    (A.11)

    Var(Y) = \frac{(b-a)^2}{12}    (A.12)

A.2.2 Univariate Normal

The Normal distribution, also called the Gaussian distribution, is ubiquitous in statistical work. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean and standard deviation, respectively. The standard Normal distribution is the Normal distribution with a mean of zero and a variance of one. Formally, a continuous random variable, Y, has a Normal distribution with mean µ and variance σ², denoted Y ~ N(µ, σ²), if its probability density function is given by:

    f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)    (A.13)
where σ² > 0, -∞ < µ < ∞ and y ∈ ℝ. Likewise, the mean and variance are given by:

    E(Y) = \mu    (A.14)

    Var(Y) = \sigma^2    (A.15)

A.2.3 Exponential

This distribution is used to model the time, t, between independent events that happen at a constant rate, λ. Therefore, this is the distribution of waiting times for the next event in a Poisson process, and it is a special case of the Gamma distribution with α = 1. Formally, a continuous random variable, Y, has an Exponential distribution with parameter λ, denoted Y ~ Exp(λ), if its probability density function is given by:

    f(y | \lambda) = \lambda \exp(-\lambda y)    (A.16)

where λ ≥ 0 and y ≥ 0. Similarly, the mean and variance are given by:

    E(Y) = \frac{1}{\lambda}    (A.17)

    Var(Y) = \frac{1}{\lambda^2}    (A.18)

A.2.4 Gamma

The Gamma distribution is a general type of statistical distribution that is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.

In a Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of the Normal variance and for the mean parameter of the Poisson distribution.

In a formal way, a continuous random variable, Y, has a Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ~ Gamma(α, β), if its probability density function is given by:

    f(y | \alpha, \beta) = \frac{y^{\alpha-1} \exp(-y/\beta)}{\beta^{\alpha} \Gamma(\alpha)}    (A.19)

where α > 0, β > 0 and y > 0. Similarly, the mean and variance are given by:

    E(Y) = \alpha\beta    (A.20)

    Var(Y) = \alpha\beta^2    (A.21)
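As a short illustration of the conjugacy claim above (a standard derivation, not taken from the original text): if y_1, ..., y_n are Poisson(λ) observations and λ ~ Gamma(α, β) in the shape-scale parameterization of (A.19), then

\[
p(\lambda \mid y) \propto \left[\prod_{i=1}^{n} e^{-\lambda}\lambda^{y_i}\right] \lambda^{\alpha-1} e^{-\lambda/\beta}
= \lambda^{\alpha + \sum_i y_i - 1}\, e^{-\lambda\,(n + 1/\beta)},
\]

so the posterior is again a Gamma distribution, Gamma(α + Σᵢ yᵢ, (n + 1/β)⁻¹).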
A.2.5 Inverse-Gamma

If Y⁻¹ has a Gamma distribution with parameters α and β, then Y has the Inverse-Gamma distribution. In a Bayesian context, this distribution is the conjugate prior distribution for the Normal variance.

Formally, a continuous random variable, Y, has an Inverse-Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ~ Inv-Gamma(α, β), if its probability density function is given by:

    f(y | \alpha, \beta) = \frac{\beta^{\alpha} y^{-\alpha-1} \exp(-\beta/y)}{\Gamma(\alpha)}    (A.22)

where α > 0, β > 0 and y > 0. Similarly, the mean and variance are given by:

    E(Y) = \frac{\beta}{\alpha-1}, \quad \alpha > 1    (A.23)

    Var(Y) = \frac{\beta^2}{(\alpha-1)^2 (\alpha-2)}, \quad \alpha > 2    (A.24)

A.2.6 Chi-square

It is an essential distribution in statistical inference and in goodness-of-fit tests. The χ²_v distribution is a special case of the Gamma distribution, with shape parameter α = v/2 and scale parameter β = 2.
Since it is a special case, we need not define the density function, mean and variance again, as they can easily be deduced from the Gamma distribution.

A.2.7 Inverse Chi-square and Scaled Inverse Chi-square

As the χ²_v distribution is a special case of the Gamma distribution, the inverse χ²_v distribution is a special case of the Inverse-Gamma distribution, with shape parameter α = v/2 and scale parameter β = 1/2. So, for the density function, mean and variance, see the Inverse-Gamma distribution. We also define the scaled inverse χ² distribution, which is useful for variance parameters in Normal models. A continuous random variable, Y, has a scaled inverse χ² distribution with v degrees of freedom and scale s, denoted Y ~ Scaled-Inv-χ²(v, s²), if its probability density function is given by:

    f(y | v, s) = \frac{(v/2)^{v/2}}{\Gamma(v/2)}\, s^{v}\, y^{-(v/2+1)} \exp\left(-\frac{v s^2}{2y}\right)    (A.25)

The mean and variance are defined by:

    E(Y) = \frac{v}{v-2}\, s^2, \quad v > 2    (A.26)

    Var(Y) = \frac{2 v^2 s^4}{(v-2)^2 (v-4)}, \quad v > 4    (A.27)

Note that this is the same as Inv-Gamma(α = v/2, β = (v/2) s²).

A.2.8 Univariate Student-t

The Student's t-distribution is a probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small. In regression analysis, it is used to represent the posterior predictive distribution in Normal regression. As an anecdote, it is worth mentioning that this distribution was published by William Gosset in 1908, but he was not allowed to publish it under his own name, so the paper was written under the pseudonym Student. Strictly, a continuous random variable, Y, has a Student's t-distribution with v degrees of freedom, denoted Y ~ t(v), if its probability density function is given by:

    f(y | v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\, \Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{y^2}{v}\right)^{-\frac{v+1}{2}}    (A.28)
where v > 0 and y ∈ ℝ. In the same way, the mean and variance are given by:

    E(Y) = 0, \quad v > 1    (A.29)

    Var(Y) = \frac{v}{v-2}, \quad v > 2    (A.30)

A.2.9 Beta

In probability theory and Statistics, the Beta distribution is a family of continuous distributions defined on the interval [0, 1], differing in the values of their two non-negative shape parameters, α and β. In a Bayesian context, the Beta is the conjugate prior distribution for the Binomial probability. A continuous random variable, Y, has a Beta distribution with parameters α and β, denoted Y ~ Beta(α, β), if its probability density function is given by:

    f(y | \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, y^{\alpha-1} (1-y)^{\beta-1}    (A.31)

where α > 0, β > 0 and 0 ≤ y ≤ 1. The mean and variance are given by:

    E(Y) = \frac{\alpha}{\alpha+\beta}    (A.32)

    Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}    (A.33)

A.2.10 Multivariate Normal

The multivariate Normal distribution extends the univariate Normal distribution model to fit vector observations. A p-dimensional vector of continuous random variables, Y = (Y₁, Y₂, ..., Y_p), is said to have a multivariate Normal distribution with vector of means µ and variance-covariance matrix Σ if its probability density function is given by:

    f(y | \mu, \Sigma) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y-\mu)' \Sigma^{-1} (y-\mu)\right]    (A.34)
Likewise, the mean and variance are given by:

    E(Y) = \mu    (A.35)

    Var(Y) = \Sigma    (A.36)

A.2.11 Multivariate Student-t

It is a multivariate generalization of the Student's t-distribution. Rigorously, a continuous random variable has a multivariate Student's t-distribution with v degrees of freedom, location µ = (µ₁, ..., µ_d) and symmetric, positive definite d × d scale matrix Σ, denoted Y ~ t(v, µ, Σ), if its probability density function is given by:

    f(y | v, \mu, \Sigma) = \frac{\Gamma\left(\frac{v+d}{2}\right)}{\Gamma\left(\frac{v}{2}\right) v^{d/2} \pi^{d/2}}\, |\Sigma|^{-1/2} \left(1 + \frac{1}{v}(y-\mu)' \Sigma^{-1} (y-\mu)\right)^{-\frac{v+d}{2}}    (A.37)

In the same way, the mean and variance are defined by:

    E(Y) = \mu, \quad v > 1    (A.38)

    Var(Y) = \frac{v}{v-2}\, \Sigma, \quad v > 2    (A.39)

A.2.12 Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. It is a multivariate generalization of the Gamma distribution. The integral is finite if the degrees of freedom parameter, v, is greater than or equal to the dimension, k.

Formally, a continuous random variable, Y, has a Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted Y ~ Wishart_v(S), if its probability density function is given by (for W positive definite):

    f(W | v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{-v/2}\, |W|^{(v-k-1)/2} \exp\left[-\frac{1}{2}\,\mathrm{tr}(S^{-1} W)\right]    (A.40)

Similarly, the mean is defined by:

    E(Y) = vS    (A.41)
A.2.13 Inverse-Wishart

If W⁻¹ ~ Wishart_v(S), then W has the Inverse-Wishart distribution. This is the conjugate prior distribution for the multivariate Normal covariance matrix. Formally, a continuous random variable, Y, has an Inverse-Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted Y ~ Inv-Wishart_v(S⁻¹), if its probability density function is given by (for W positive definite):

    f(W | v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{v/2}\, |W|^{-(v+k+1)/2} \exp\left[-\frac{1}{2}\,\mathrm{tr}(S W^{-1})\right]    (A.42)

Similarly, the mean is defined by:

    E(Y) = (v-k-1)^{-1} S    (A.43)
Appendix B

Installation Guide

B.1 From the source folder

The source folder contains the following files and folders:

• BARESIMDA.jar: the executable application file. Java Runtime Environment 1.4.2 or later, R 2.4.1 or later and the libraries provided in the folders below are required.

• R Libraries: the libraries to be moved into the R software library folder, %R_HOME%\library.

• Java Library: it contains the file to be moved into %JAVA_HOME%\lib\ext.

%R_HOME% and %JAVA_HOME% refer to the paths in which R and Java are installed, respectively. For instance, in Windows, if you have installed them into the root directory C:\, you should have C:\R\R-2.4.1\library and C:\Java\lib\ext.

B.2 From the installer

An installer will be provided to make the installation process much easier. No previously installed program is necessary, since the installer will install the Java Runtime Environment and R. As a result of executing this installer, a new folder and a shortcut icon will be created.
Appendix C

User's Guide

C.1 Data Entry

C.1.1 Loading an Excel file

1. Select the File menu item in the menu bar.

Figure C.1: Load Data Menu

2. Put the mouse over the Load element and click on it.

3. The dialog box shown in Figure C.2 will be displayed. Click on the Search button to select the Excel file to load, and indicate the sheet number in the field with that label. If the first row in the data sheet is a header with the variable names, then click OK to load the data. Otherwise, deselect the variable names option and click OK.

4. The data will then be displayed in the Data window as it would be in an Excel sheet (see Figure C.3).

C.1.2 Defining a new variable

1. Ensure that the Data window is the active window.
Figure C.2: Select File Dialog

Figure C.3: Display Loaded Data

2. Define the new variable by clicking on the New Variable button (see Figure C.4).

3. You will be required to type in the name of the new variable. Type it in and click OK (see Figure C.5).

4. A new column will be added to the spreadsheet with the new variable as header (see Figure C.6).

5. If you want to define several new variables, repeat from step 2 as necessary.
Figure C.4: Define New Variable

Figure C.5: Enter New Variable Name

Figure C.6: Display New Variable
C.1.3 Editing an existing variable

1. Ensure that the Data window is the active window.

2. Click on the Edit Variable button (see Figure C.7).

Figure C.7: Edit Variable

3. A dialog will be displayed. Select the variable to edit and continue (see Figure C.8).

Figure C.8: Select Variable to Be Edited

4. A new dialog will be shown and you will be required to type in the new name of the variable. Type it in, and the variable will be stored with the new name (see Figure C.9).
Figure C.9: Enter New Name

C.1.4 Deleting an existing variable

1. Ensure that the Data window is the active window.

2. Click on the Delete Variable button and a dialog will be displayed.

3. Select the variable to delete and continue. A confirmation dialog will be shown. Confirm that it is the variable to be deleted, and the variable and its data will be removed from the application (see Figure C.10).

Figure C.10: Confirmation

C.1.5 Typing in a new data row

1. Ensure that the Data window is the active window.

2. Click on the New Row button. If any variable has been defined previously, a row will be added to the spreadsheet with as many columns as there are defined variables (see Figure C.11).

3. Double-click on the cell to edit and enter the new value. When you finish, press Enter (see Figure C.12).

4. Repeat steps 2 and 3 as necessary.
Figure C.11: New Row Data

Figure C.12: Type Data

C.1.6 Deleting an existing data row

1. Ensure that the Data window is the active window.

2. Select the data row or rows to be deleted. Then click on the Delete Row button. A confirmation dialog will be displayed.

3. Confirm, and all data in those rows will be removed.
C.1.7 Modifying existing data

1. Ensure that the Data window is the active window.

2. Select the data cell to be modified and double-click on it. You will be able to edit the cell value. When you finish, press Enter.

C.2 Configuration

C.2.1 Setting the look & feel

1. Select the Look&Feel item in the Configuration element of the menu bar (see Figure C.13).

Figure C.13: Look And Feel Menu

2. Select the Look&Feel style you want. The available options are: Metal (Java style), CDE/Motif (Unix/Linux style), Windows and Windows Classic (see Figure C.14).

Figure C.14: Look And Feel Styles

3. When you have selected your option (for instance CDE/Motif), the application appearance will be modified (see Figure C.15).

C.2.2 Selecting the type of user

1. Select the Type Of User item in the Configuration element of the menu bar (see Figure C.16).

2. A dialog will be displayed. Select the type of user you are and accept (see Figure C.17). This will be useful for defining prior information in Bayesian regression.
Figure C.15: New Look And Feel

Figure C.16: Type Of User Menu

Figure C.17: Select Type Of User
C.3 Non-Symbolic Regression

C.3.1 Simple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then, select Simple Regression (see Figure C.18).

Figure C.18: Non-Symbolic Classical Regression Menu

2. You will be required to select the independent and dependent variables from the defined variables. Select them and continue (see Figure C.19).

Figure C.19: Select Non-Symbolic Variables in Simple Regression

3. A brief report will be displayed in the Classical Simple Regression window, indicating that for more details you should see the Analysis Options in the ToolBar (see Figure C.20).

4. From this point, you can:

(a) Change the dependent and independent variables in the Variables Options, by selecting them again as was done before.
Figure C.20: Brief Report

(b) Select tests and analyses in the Analysis Options, by clicking on the desired analysis options. The analysis options available are shown in Figure C.21.

Figure C.21: Analysis Options in Non-Symbolic Classical Simple Regression

To make new predictions, you will have to select the predict option, introduce the new observed value and press OK (see Figure C.22).

(c) Select graphics in the Graphics Options, by clicking on the desired graphics options. The graphics options available are shown in Figure C.23.

(d) Save some results in the Save Options, by clicking on the desired save options and selecting the file where they are going to be saved. The save options available are shown in Figure C.24.
Figure C.22: New Prediction in Non-Symbolic Classical Simple Regression

Figure C.23: Graphics Options in Non-Symbolic Classical Simple Regression

C.3.2 Multiple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then, select Multiple Regression (see Figure C.25).

2. You will be required to select the dependent and independent variables from the defined variables. Select them and continue (see Figure C.26).
Figure C.24: Save Options in Non-Symbolic Classical Simple Regression

Figure C.25: Non-Symbolic Classical Multiple Regression Menu

Figure C.26: Select Variables in Non-Symbolic Classical Multiple Regression

3. From this point a new Multiple Classical Regression window is created, and the procedure is similar to that described in Simple Classical Regression. The user is therefore referred to that section to see how to select variable, analysis, graphics and save options.

(a) The available Analysis Options can be seen in Figure C.27. There are two new analysis options: backward and forward selection. These will let you
know which independent variables really influence the dependent variable.

Figure C.27: Analysis Options in Non-Symbolic Classical Multiple Regression

(b) The available Graphics Options are shown in Figure C.28.

Figure C.28: Graphics Options in Non-Symbolic Classical Multiple Regression

(c) The available Save Options can be seen in Figure C.29.

4. You will be able to select whether there is an intercept in the model or not by clicking on the Model option (see Figure C.30).
Figure C.29: Save Options in Non-Symbolic Classical Multiple Regression

Figure C.30: Intercept in Non-Symbolic Classical Multiple Regression

C.3.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then, select Simple Regression (see Figure C.31).

Figure C.31: Non-Symbolic Bayesian Simple Regression Menu

2. You will be required to select the dependent and independent variables from the defined variables, as was done in Simple Classical Regression. Select them and continue (see Figure C.32).

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Simple
Figure C.32: Select Variables in Non-Symbolic Bayesian Simple Regression

Classical Regression, although for Bayesian regression these options are more limited. However, the procedure is the same.

(a) The available Analysis Options are shown in Figure C.33.

Figure C.33: Analysis Options in Non-Symbolic Bayesian Simple Regression

(b) The available Graphics Options can be seen in Figure C.34.

(c) The available Save Options are shown in Figure C.35.

5. In Bayesian regression, new options are available in the ToolBar:

(a) Specifying Prior Information, by clicking on the Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information. If you have selected Experienced User in the Type Of User option in the Configuration menu, you will see a dialog like that shown in Figure C.38.
Figure C.34: Graphics Options in Non-Symbolic Bayesian Simple Regression

Figure C.35: Save Options in Non-Symbolic Bayesian Simple Regression

Figure C.36: Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
Otherwise, you will see the dialog shown in Figure C.37.

Figure C.37: Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.

Figure C.38: Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression

C.3.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then, select Multiple Regression (see Figure C.39).

Figure C.39: Non-Symbolic Bayesian Multiple Regression Menu

2. You will be required to select the dependent and independent variables from the defined variables, as was done in Multiple Classical Regression. Select them and continue.

3. A new Bayesian Multiple Regression window will be created. From this point the procedure is the same as in Bayesian Simple Regression.

(a) Analysis Options are shown in Figure C.40.
Figure C.40: Analysis Options in Non-Symbolic Bayesian Multiple Regression

(b) Graphics Options can be seen in Figure C.41.

Figure C.41: Graphics Options in Non-Symbolic Bayesian Multiple Regression

(c) Save Options are shown in Figure C.42.

Figure C.42: Save Options in Non-Symbolic Bayesian Multiple Regression

(d) Model Options are those shown in Figure C.43.

C.4 Symbolic Regression

C.4.1 Simple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select Simple Regression (see Figure C.44).
Figure C.43: Model Options in Non-Symbolic Bayesian Multiple Regression

Figure C.44: Symbolic Classical Simple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables. Select them and continue (see Figure C.45).

Figure C.45: Select Variables in Symbolic Classical Simple Regression

3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another one for the radii. In this case, there are more graphics options.

(a) Analysis Options are shown in Figure C.46.

(b) Graphics Options can be seen in Figure C.47.
Figure C.46: Analysis Options in Symbolic Classical Simple Regression

Figure C.47: Graphics Options in Symbolic Classical Simple Regression

(c) Save Options are the same as those in Non-Symbolic Regression.
C.4.2 Multiple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select Multiple Regression (see Figure C.48).

Figure C.48: Symbolic Classical Multiple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.49). Ensure that the first maximum independent variable selected is the one corresponding to the minimum independent variable chosen.

Figure C.49: Select Variables in Symbolic Classical Multiple Regression

3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another one for the radii. In this case, there are more graphics options.

(a) Analysis Options are shown in Figure C.50.

(b) Graphics Options can be seen in Figure C.51.

(c) Save Options are the same as those in Non-Symbolic Regression.
Figure C.50: Analysis Options in Symbolic Classical Multiple Regression

Figure C.51: Graphics Options in Symbolic Classical Multiple Regression

C.4.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select Simple Regression (see Figure C.52).

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.53).
Figure C.52: Symbolic Bayesian Simple Regression

Figure C.53: Select Variables in Symbolic Bayesian Simple Regression

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the midpoint and radius parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Non-Symbolic Regression.

(a) The available Analysis Options are shown in Figure C.54.

Figure C.54: Analysis Options in Symbolic Bayesian Simple Regression

(b) The available Graphics Options can be seen in Figure C.55.
Figure C.55: Graphics Options in Symbolic Bayesian Simple Regression

(c) Save Options are the same as those in Non-Symbolic Regression.

5. As in Non-Symbolic Regression, new options are available in the ToolBar for Bayesian analysis:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar (see Figure C.56).

C.4.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select Multiple Regression (see Figure C.57).

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.58). Ensure that the first
Figure C.56: Model Options in Symbolic Bayesian Simple Regression

Figure C.57: Symbolic Bayesian Multiple Regression Menu

maximum independent variable selected is the one corresponding to the minimum independent variable chosen.

Figure C.58: Select Variables in Symbolic Bayesian Multiple Regression

3. A new Bayesian Multiple Regression window will be created. The estimated mean and standard deviation of the midpoint and radius parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Non-Symbolic Regression.

(a) Analysis Options are the same as those in Non-Symbolic Regression.
(b) Graphics Options are shown in Figure C.59.

Figure C.59: Graphics Options in Symbolic Bayesian Multiple Regression

(c) Save Options are the same as those in Non-Symbolic Regression.

5. As in Non-Symbolic Regression, new options are available in the ToolBar for Bayesian analysis:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.
Appendix D

Obtaining and Installing R

The way to obtain R is to download it from one of the CRAN (Comprehensive R Archive Network) sites. The main site is http://cran.r-project.org. It has a number of mirror sites worldwide, which may be closer to you and give faster download times.

Installation details tend to vary over time, so you should read the accompanying documents and any other information offered on CRAN.

D.1 Binary distributions

The version for recent variants of Microsoft Windows comes as a single SetupR.exe file, on which you simply double-click with the mouse and then follow the on-screen instructions. When the process is completed, you will have an entry under Programs on the Start menu for invoking R, as well as a desktop icon.

For Linux distributions that use the RPM package format (RedHat, Mandrake, LinuxPPC and SuSE) and also for Alpha Unix (OSF/Tru64), .rpm files of R and the recommended add-on packages can be installed using the rpm command. Packages for the Debian APT package manager are also available.

For the Macintosh platforms there are two different binary distributions: the "Carbon" R and the "Darwin" R. The first version is intended to run natively on MacOS systems from 8.6 to OS X, and the second one as a usual Unix command under OS X. The Darwin R also requires an X window manager like XDarwin to use the X11 graphics device.
Carbon R comes in a single .sit archive file that you simply decompress by dragging the file onto StuffIt Expander, and then move the resulting folder rmxyz into your favourite applications folder. The Darwin version is a .tgz archive, which can be installed, after decompression, with some (fairly trivial) manual adjustments.

Darwin R can also be installed using "fink". Fink installs all dynamic libraries that might be needed, and it can update R to newer versions when available.

D.2 Installation from source

Installation from source code is possible on all supported platforms, although it is nontrivial on Macintosh and Windows, mainly because the build environment is not part of the system. On Unix-like systems (Macintosh OS X included), the process can be as simple as unpacking the sources and writing

./configure
make
make install

and then you would unpack the recommended package bundle, change to its directory and enter

R CMD INSTALL *.tar.gz

The above works on widely used platforms, provided that the relevant compilers and support libraries are installed. If your system is more esoteric or you want to use special compilers or libraries, then you may need to dig deeper.

For Windows and Carbon Macintosh, the directories src/gnuwin32 and src/macintosh have an INSTALL file with detailed information about the procedure to follow.
D.3 Package installation

To install R packages such as bayesm under Unix/Linux or Windows, you can connect to the Internet, start R, and enter

install.packages("bayesm", .libPaths()[1])

The Windows version provides a convenient menu interface for the operation.

If your R machine is not connected to the Internet, you can also download the package as a file and install that. For Windows and the Carbon version of Macintosh, you need to get the binary package (.zip or .sit extension). For Windows, installation from a local .zip file is possible via a menu entry. For Macintosh users, the procedure is described in the Macintosh FAQ. For Unix and Linux, you can issue the following at the shell prompt (the -l option allows you to give a private library):

R CMD INSTALL bayesm

On Unix and Linux systems you will need superuser permissions to install. Otherwise you can set up a private library directory and install into that. Use the R_LIBS environment variable to use your private library subsequently. A similar issue arises if R is installed on a read-only file system in a Windows environment. Further details can be found in the help page for library.

Information and further Internet resources for R can be obtained from CRAN and the R homepage at http://www.r-project.org. Notice in particular the mailing lists, the user-contributed documents, and the FAQs.
Appendix E

Obtaining and Installing the Java Runtime Environment

The way to obtain the Java Runtime Environment (JRE) is to download it from the Sun Microsystems official site. The main site is http://java.sun.com, from where you can select the version to be downloaded. The link to download the current version, which is J2SE v1.4.2_14 JRE, is http://java.sun.com/j2se/1.4.2/download.html.

E.1 Microsoft Windows

You must have administrative permissions in order to install the Java 2 Runtime Environment on Microsoft Windows 2000 and XP. The download page provides the following two choices of installation. Continue based on your choice.

1. Windows Installation: after clicking the "Download" link for the JRE, a dialog box pops up. Choose the open option to start a small program which then prompts you for more information about what you want to install.

2. Windows Offline Installation: after clicking the JRE "Download" link for the "Windows Offline Installation", a dialog box pops up. Choose the save option to save the downloaded file without installing it. Run this file by double-clicking on the installer's icon. Then follow the instructions that the installer provides. When done with the installation, you can delete the downloaded file to recover disk space.
E.2 Linux

Java 2 Runtime Environment 1.4.2 is available in two installation formats:

1. Self-extracting binary file: this file can be used to install the Java 2 Runtime Environment in a location chosen by the user. It can be installed by anyone (not only root users), and it can easily be installed in any location. As long as you are not the root user, it cannot displace the system version of the Java platform supplied by Linux. To use this file, see Installation of Self-Extracting Binary below.

2. RPM packages: an rpm.bin file which contains RPM packages, installed with the rpm utility. It requires root access to install, and it installs by default in a location that replaces the system version of the Java platform supplied by Linux. To use this bundle, see Installation of RPM File below.

Choose the installation format that is most suitable to your needs.

E.2.1 Installation of Self-Extracting Binary

Use these instructions if you want to use the self-extracting binary file to install the Java 2 Runtime Environment. If you want to install RPM packages instead, see Installation of RPM File.

1. Download and check the download file size to ensure that you have downloaded the full, uncorrupted software bundle. You can download to any directory you choose; it does not have to be the directory where you want to install the Java 2 Runtime Environment. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary. Run this command:

chmod +x j2re-1_4_2_14-linux-i586.bin

3. Change directory to the location where you would like the files to be installed. The next step installs the Java 2 Runtime Environment into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepended by the path to it. For example, if the file is in the current directory, prepend it with "./" (necessary if "." is not in the
PATH environment variable):

./j2re-1_4_2_14-linux-i586.bin

The binary code license is displayed, and you are prompted to agree to its terms. The Java 2 Runtime Environment files are installed in a directory called j2re1.4.2_14 in the current directory.

E.2.2 Installation of RPM File

Use these instructions if you want to install the Java 2 Runtime Environment in the form of RPM packages. If you want to use the self-extracting binary file instead, see Installation of Self-Extracting Binary.

1. Download and check the file size. You can download to any directory you choose. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Extract the contents of the downloaded file. Change directory to where the downloaded file is located and run these commands to first set the executable permissions and then run the binary to extract the RPM file:

chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin
./j2re-1_4_2_14-linux-i586-rpm.bin

Note that the initial "./" is required if you do not have "." in your PATH environment variable.

The script displays a binary license agreement, which you are asked to agree to before installation can proceed. Once you have agreed to the license, the install script creates the file j2re-1_4_2_14-linux-i586.rpm in the current directory.

1. Become root by running the su command and entering the super-user password.
2. Run the rpm command to install the packages that comprise the Java 2 Runtime Environment:

rpm -iv j2re-1_4_2_14-linux-i586.rpm

3. Delete the .bin and .rpm files if you want to save disk space.

4. Exit the root shell.

E.3 UNIX

1. Check the download file size. You can download to any directory you choose; it does not have to be the directory where you want to install the J2RE. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary:

On SPARC processors: chmod +x j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: chmod +x j2re-1_4_2_14-solaris-i586.sh

3. Change directory to the location where you would like the files to be installed. The next step installs the J2RE into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepending the path to it. For example, if the downloaded file is in the current directory, prepend it with "./":

On SPARC processors: ./j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: ./j2re-1_4_2_14-solaris-i586.sh

The binary code license is displayed, and you are prompted to agree to its terms. The J2RE files are installed in a directory called j2re1.4.2_14 in the current directory.
More information about the installation process on different kinds of operating systems can be found on the Sun Microsystems official site mentioned above.
Bibliography

[Aitk97] Aitkin, M., The calibration of P-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood, Statistics and Computing 7 (4), 253-261, 1997.

[Arro06] Arroyo, J. and Maté, C., Introducing interval time series: accuracy measures, COMPSTAT, Rome, 2006.

[Berg05] Berg, B.A., Introduction to Markov Chain Monte Carlo Simulations and their Statistical Analysis, National University of Singapore 7, 2005.

[Berg98] Berger, J. and Pericchi, L., Accurate and stable Bayesian model selection: the median intrinsic Bayes factor, The Indian Journal of Statistics 60 (1), 1-18, 1998.

[Bill00] Billard, L. and Diday, E., Regression Analysis for Interval-Valued Data, Data Analysis, Classification and Related Methods: Proceedings of the Seventh Conference of the International Federation of Classification Societies, Namur, Belgium, 2000.

[Bill02] Billard, L. and Diday, E., From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis, Journal of the American Statistical Association 98 (462), 470-487, 2002.

[Bill06a] Billard, L. and Diday, E., Symbolic Data Analysis: Conceptual Statistics and Data Mining, Wiley, England, 2006.

[Bill06b] Billard, L. and Diday, E., Symbolic Data Analysis: what is it?, COMPSTAT, Rome, 2006.

[Cham83] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A., Graphical Methods for Data Analysis, Wadsworth, 1983.

[Cham92] Chambers, J.M. and Hastie, T.J., Statistical Models in S, Chapman & Hall/CRC, 1992.

[Chen00] Chen, M., Shao, Q. and Ibrahim, J.G., Monte Carlo Methods in Bayesian Computation, Springer, New York, 2000.
[Chen03] Cheng, R. and Sahu, S., A fast distance based approach for determining the number of components in mixtures, Canadian Journal of Statistics 31, 3-22, 2003.

[Cong06] Congdon, P., Bayesian Statistical Modelling, Wiley, England, 2006.

[Dalg02] Dalgaard, P., Introductory Statistics with R, Springer, New York, 2002.

[DeCa04] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A., A New Method to Fit a Linear Regression Model for Interval-Valued Data, KI 2004: Advances in Artificial Intelligence: 27th Annual German Conference on AI, 295-306, Springer, Ulm, Germany, 2004.

[DeCa05] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A., Applying Constrained Linear Regression Models to Predict Interval-Valued Data, KI 2005: Advances in Artificial Intelligence 3698, 92-106, Springer, Koblenz, Germany, 2005.

[DeCa07] De Carvalho, F.A.T. and Lima Neto, E.A., Centre and Range method for fitting a linear regression model to symbolic interval data, Computational Statistics and Data Analysis, 2007.

[Dida95] Diday, E., Probabilist, Possibilist and Belief Objects for Knowledge Analysis, Annals of Operations Research 55, 227-276, 1995.

[Gelf90] Gelfand, A.E. and Smith, A.F.M., Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85, 398-409, 1990.

[Gelm04] Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B., Bayesian Data Analysis, Chapman & Hall/CRC, Boca Raton, Florida, 2004.

[Gilk95] Gilks, W.R., Best, N. and Tan, K.K.C., Adaptive rejection Metropolis sampling within Gibbs sampling, Applied Statistics 44, 455-472, 1995.

[Gosh03] Ghosh, J.K. and Ramamoorthi, R.V., Bayesian Nonparametrics, Springer, New York, 2003.

[Hast70] Hastings, W.K., Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 97-109, 1970.

[Huiw06] Huiwen, W., Mok, H.M.K. and Dapeng, L., Factor interval data analysis and its application, COMPSTAT, Rome, 2006.

[Irpi05] Irpino, A., "Spaghetti" PCA analysis: An extension of principal component analysis to time dependent interval data, Pattern Recognition Letters, 2005.
[Jeff61] Jeffreys, H., Theory of Probability, Oxford University Press, 1961.

[Kend05] Kendall, W.S., Liang, F. and Wang, J-S., Markov chain Monte Carlo: Innovations and Applications, National University of Singapore 7, 2005.

[Koop03] Koop, G., Bayesian Econometrics, Wiley, England, 2003.

[Laws74] Lawson, C.L. and Hanson, R.J., Solving Least Squares Problems, Prentice-Hall, New York, 1974.

[Lee06] Lee, C-H.L., Liu, A. and Chen, W-S., Pattern Discovery of Fuzzy Time Series for Financial Prediction, IEEE 18 (5), 2006.

[Mart01] Martinez, W.L. and Martinez, A.R., Computational Statistics Handbook with MATLAB, Chapman & Hall/CRC, Boca Raton, Florida, 2001.

[Maté93] Maté, C. and Sarabia, A., Problemas de Probabilidad y Estadística, CLAGSA, Madrid, 1993.

[Maté95] Maté, C., Curso General sobre StatGraphics II, Universidad Pontificia Comillas, Madrid, 1995.

[Maté06] Maté, C., Análisis Bayesiano de Datos, Asociación Española para la Calidad, Madrid, 2006.

[Metr53] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E., Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1092, 1953.

[Mont02] Montgomery, D.C. and Runger, G.C., Probabilidad y Estadística Aplicadas a la Ingeniería, Wiley, 2002.

[Mull04] Muller, P. and Quintana, F.A., Nonparametric Bayesian Data Analysis, Statistical Science 19, 95-110, 2004.

[Poir95] Poirier, D., Intermediate Statistics and Econometrics: A Comparative Approach, The MIT Press, Cambridge, 1995.

[Rossi06] Rossi, P.E., Allenby, G. and McCulloch, R., Bayesian Statistics and Marketing, Wiley, New York, 2006.
[Rupp04] Rupp, A.A., Dey, D.K. and Zumbo, B.D., To Bayes or Not to Bayes, From Whether to When: Applications of Bayesian Methodology to Modeling, Structural Equation Modeling: A Multidisciplinary Journal 11 (3), 424-451, 2004.

[Spie03] Spiegelhalter, D., Thomas, A., Best, N., Gilks, W. and Lunn, D., BUGS: Bayesian inference using Gibbs sampling, 2003.

[Urba92] Urbach, P., Regression Analysis: Classical and Bayesian, The British Journal for the Philosophy of Science 43 (3), 311-342, 1992.

[Vena02] Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, Springer, New York, 2002.

[West04] West, R.W., Wu, T. and Heydt, D., An introduction to StatCrunch 3.0, Journal of Statistical Software 9 (6), 2004.

[Zamo01] Zamora, M.M. and Estavillo, J., Modelo de regresión normal clásico, 2001.