Knowledge Discovery in Stock Market


Perhaps more than any other kind of time series data, financial markets have been scrutinized by countless mathematicians, economists, investors and speculators over hundreds of years. Even in modern times, despite all scientific advances, the effort of predicting future movements of the stock market sometimes still bears resemblance to the ancient alchemistic aspirations of turning base metals into gold. That is not to say that there is no genuine scientific effort in studying financial markets, but distinguishing serious research from charlatanism (or even fraud) remains remarkably difficult.




    • Knowledge Discovery in the Stock Market
      Supervised and Unsupervised Learning with BayesiaLab
      Stefan Conrady, stefan.conrady@conradyscience.com
      Dr. Lionel Jouffe, jouffe@bayesia.com
      June 29, 2011
      Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
    • Knowledge Discovery in the Stock Market with Bayesian Networks
      Table of Contents
      Tutorial
        Highlights
        Background & Objective
        Notation
        Dataset
        Data Preparation and Transformation
        Data Import
        Determining Discretization Intervals
        Modeling Mode
        Unsupervised Learning
        Bayesian Network versus Correlation Matrix
        Inference with Bayesian Networks
        Inference with Hard Evidence
        Inference with Soft Evidence
        Bayesian Network Metrics
        Arc Force
        Mutual Information
        Correlation
        Summary - Unsupervised Learning
        Supervised Learning
        Inference with Supervised Learning
        Adaptive Questionnaire
        Summary - Supervised Learning
      Appendix
        Appendix
        Markov Blanket
        Bayes’ Theorem
      About the Authors
        Stefan Conrady
        Lionel Jouffe
      Contact Information
        Conrady Applied Science, LLC
        Bayesia S.A.S.
      Copyright
    • Tutorial
      Highlights
      • Unsupervised Learning with BayesiaLab can rapidly generate plausible structures of unfamiliar problem domains, as illustrated in this paper with examples from the U.S. stock market.
      • Supervised Learning with BayesiaLab delivers reliable models in high-dimensional domains, providing both powerful predictive performance and a platform for simulating domain dynamics.
      • Knowledge representation with Bayesian networks is highly intuitive and effectively provides computable knowledge that allows inference and reasoning under uncertainty.
      Background & Objective
      Perhaps more than any other kind of time series data, financial markets have been scrutinized by countless mathematicians, economists, investors and speculators over hundreds of years. Even in modern times, despite all scientific advances, the effort of predicting future movements of the stock market sometimes still bears resemblance to the ancient alchemistic aspirations of turning base metals into gold. That is not to say that there is no genuine scientific effort in studying financial markets, but distinguishing serious research from charlatanism (or even fraud) remains remarkably difficult.
      We neither aspire to develop a crystal ball for investors nor do we expect to contribute to the economic and econometric literature. However, we find the wealth of data in the financial markets to be fertile ground for experimenting with knowledge discovery algorithms and for generating knowledge representations in the form of Bayesian networks. This area can perhaps serve as a very practical proof of the powerful properties of Bayesian networks, as we can quickly compare machine-learned findings with our own understanding of market dynamics. For instance, the prevailing opinions among investors regarding the relationships between major stocks should be reflected in any structure that is to be discovered by our algorithms.
      More specifically, we will utilize the unsupervised and supervised learning algorithms of the BayesiaLab software package to automatically generate Bayesian networks from daily stock returns over a six-year period. We will examine 459 stocks from the S&P 500 index, for which observations are available over the entire timeframe. We selected the S&P 500 as the basis for our study, as the companies listed on this index are presumably among the best-known corporations worldwide, so even a casual observer should be able to critically review the machine-learned findings. In other words, we are trying to machine-learn the obvious, as any mistakes in this process would automatically become self-evident.
      Quite often experts’ reaction to such machine-learned findings is, “well, we already knew that.” That is the very point we want to make, as machine-learning can, within seconds, catch up with human expertise accumulated over years, and then rapidly expand beyond what is already known.
      The power of such algorithmic learning will be still more apparent in entirely unknown domains. However, if we were to machine-learn the structure of a foreign equity market for expository purposes in this paper, chances are that many readers would not immediately be able to judge the resulting structure as plausible or
    • In addition to generating human-readable and interpretable structures, we want to illustrate how we can immediately use machine-learned Bayesian networks as “computable knowledge” for automated inference and prediction. Our objective is to gain both a qualitative and quantitative understanding of the stock market by using Bayesian networks. In the quantitative context, we will also show how BayesiaLab can reliably carry out inference with multiple pieces of uncertain and even conflicting evidence. The inherent ability of Bayesian networks to perform computations under uncertainty makes them highly suitable for a wide range of real-world applications.
      Continuing the practice established in our previous white papers, we attempt to present the proposed approach in the style of a tutorial, so that each step can be immediately replicated (and scrutinized) by any reader equipped with the BayesiaLab software.¹ This reflects our desire to establish a high degree of transparency regarding all proposed methods and to minimize the risk of Bayesian networks being perceived as a black-box technology.
      Notation
      To clearly distinguish between natural language, software-specific functions and example-specific variable names, the following notation is used:
      • Bayesian network and BayesiaLab-specific functions, keywords, commands, etc., are capitalized and shown in bold type.
      • Names of attributes, variables and nodes are italicized.
      ¹ The preprocessed dataset with daily return data is available for download from our financial/
    • Dataset
      The S&P 500 is a free-float capitalization-weighted index of the prices of 500 large-cap common stocks actively traded in the United States, which has been published since 1957. The stocks included in the S&P 500 are those of large publicly held companies that trade on either of the two largest American stock market exchanges: the New York Stock Exchange and the NASDAQ. For our case study we have tracked the daily closing prices of all stocks included in the S&P 500 index from January 3, 2005 through December 30, 2010, only excluding those stocks which were not traded continuously over the entire study period. This leaves a total of 459 stock prices with 1,510 observations each.
      [Figure: daily closing prices plotted against observation index (0 to about 1,500), one panel per stock: A, AA, AAPL, ABC, ABT, ACE, ADBE, ADI, ADM, ADP, ADSK, AEE, AEP, AES, AET, AFL, AGN, AIG, AIV, AIZ, AKAM, AKS, ALL, ALTR, AMAT, AMD, AMGN, AMT, AMZN, AN, ANF, AON, APA, APC, APD, APH]
    • Data Preparation and Transformation
      Rather than treating the time series in levels, we will difference the stock prices and compute the daily returns. More specifically, we will take differences of the logarithms of the levels, which is a good approximation of the daily stock return in percentage terms. After this transformation, 1,509 observations remain and a selection of the first 36 stocks (in alphabetical order) is shown below.
      [Figure: daily log returns plotted against observation index (0 to about 1,500), one panel per stock: A, AA, AAPL, ABC, ABT, ACE, ADBE, ADI, ADM, ADP, ADSK, AEE, AEP, AES, AET, AFL, AGN, AIG, AIV, AIZ, AKAM, AKS, ALL, ALTR, AMAT, AMD, AMGN, AMT, AMZN, AN, ANF, AON, APA, APC, APD, APH]
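The transformation described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `log_returns` and the price values are made up, and the tutorial itself does not state which tool was used for this preprocessing step.

```python
import math

def log_returns(prices):
    """Differences of log prices: r_t = ln(p_t) - ln(p_{t-1})."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    logs = [math.log(p) for p in prices]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Made-up closing prices, purely for illustration.
closes = [100.0, 101.0, 99.5, 99.5]
returns = log_returns(closes)
```

Differencing n prices yields n - 1 returns, which is why the 1,510 price observations become 1,509 return observations. For small moves the log return is close to the percentage change (here ln(1.01) is roughly 0.00995 for a 1% rise).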
    • Data Import
      We use BayesiaLab’s Data Import Wizard to load all 459 time series² into memory from a comma-separated file. BayesiaLab automatically detects the column headers, which contain the ticker symbols³ as variable names.
      The next step identifies the data types contained in the dataset and, as expected, BayesiaLab finds 459 continuous variables.
      ² Although the dataset has a temporal ordering, for expository simplicity we will treat each time interval as an independent observation.
      ³ A ticker symbol is a short abbreviation used to uniquely identify publicly traded
    • There are no missing values in the dataset and we do not want to filter out any observations, so the next screen of the Data Import Wizard can be skipped entirely.
      The next step, however, is critical. As part of every data import process into BayesiaLab we must discretize any continuous variables, which means all 459 variables in our particular case.
      BayesiaLab offers a number of algorithms to automatically discretize the continuous variables and one of the most practical ones, for subsequent Unsupervised Learning, is the K-Means algorithm. It provides a very quick way to capture the salient characteristics of probability density curves and creates suitable thresholds for binning purposes.
      Determining Discretization Intervals
      Analyst judgement is required, though, for choosing an appropriate number of intervals. A common heuristic found in the statistical literature is five observations per parameter. We adapt this as a guide for the minimum number of observations required for each cell in any of the yet-to-be-learned Conditional Probability Tables (CPT).
      In our particular case we already know that we will initially perform Unsupervised Learning with the Maximum Weight Spanning Tree algorithm. This tree structure implies that each Node will have only one parent, which, in turn, means that the size of each CPT is the number of parent states times the number of child states. Choosing five intervals for the discretization process would thus mean a CPT size of 25 cells.⁴
      With a uniform distribution of the states this would suggest that we have approximately 60 observations per cell (1,509 observations divided by 25 cells), which would clearly be more than enough. However, upon visual inspection of the actual distributions of the variables, the uniform distribution assumption definitely does not hold. The graph below shows the distribution of variable AA:
      ⁴ Other learning algorithms do not have this one-parent constraint and, for instance, a five-interval discretization with three parents per node would generate CPTs consisting of 625 cells. Even when assuming uniform distributions, the available observations would be insufficient for estimation
    • Rather, looking at this graph, it may be more appropriate to assume a normal distribution.⁵ Given that each Node will have one parent, we would perhaps further assume a bivariate normal distribution for the joint distribution of each pair of Nodes. We need to emphasize that we are not attempting to fit distributions per se, but that we are rather trying to find a heuristic that allows us to establish the minimum number of observations needed to characterize the tail ends of the distributions.
      An assumed bivariate normal distribution would yield a discrete probability density function similar to what is shown in the table below. In other words, this is what we would expect the Conditional Probability Table (CPT) to approximately look like, once we have discretized the states and learned the CPT from the actual occurrences. However, we have not yet discretized the states, much less estimated the CPT. In fact, we have not yet determined how many discretization levels are appropriate. So, it is a catch-22 and hence the need for a heuristic.
      Our heuristic is that we use our qualitative understanding of the distributions to determine a reasonable number of intervals that provides a minimum number of samples for the tails. More formally, the “thinnest tail” is the minimal local joint probability (MLJP). Assuming 5 states for parent and child each, and with a total of 1,509 observations, this would translate into approximately 4 observations for the MLJP (highlighted in red).
      [Tables: assumed joint probabilities (CPT) for five States of Parent Node by five States of Child Node, and the corresponding expected numbers of Observations out of 1,509]
      Although the number of expected samples for the MLJP appears to be below the recommended minimum, we will for now proceed on this basis and set the number of intervals to 5. Only upon completion of the discretization, and after learning the network including the CPTs, will we know for sure whether this was indeed a reasonable assumption or not.
      ⁵ We omit plotting the distributions of all variables, but all the variables’ distributions do indeed resemble the
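The sizing arithmetic and the binning idea from this section can be sketched as follows. This is a minimal illustration under stated assumptions, not BayesiaLab's implementation: the source does not describe how its K-Means discretization is initialized or how thresholds are derived, so `kmeans_thresholds` (plain one-dimensional Lloyd's algorithm with quantile initialization and midpoints between neighboring centroids as thresholds) and `cpt_cells` are hypothetical helpers, and the sample returns are made up.

```python
def kmeans_thresholds(values, k, max_iter=100):
    """1-D k-means (Lloyd's algorithm); returns k - 1 bin thresholds."""
    data = sorted(values)
    n = len(data)
    # Assumption: start the centroids at evenly spaced quantiles.
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids.sort()
    # Thresholds are the midpoints between neighboring centroids.
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

def cpt_cells(states, parents):
    """Cells in one CPT when the node and each parent have `states` states."""
    return states ** (parents + 1)

observations = 1509
per_cell_tree = observations / cpt_cells(5, 1)           # one parent: 25 cells
per_cell_three_parents = observations / cpt_cells(5, 3)  # three parents: 625 cells

sample = [-0.05, -0.04, 0.0, 0.001, 0.05, 0.06]  # made-up returns
thresholds = kmeans_thresholds(sample, 3)
```

With one parent and five states, 1,509 observations spread evenly would give about 60 per cell. With three parents the same data would give fewer than three per cell, below the five-observation guide mentioned above.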
    • Clicking Finish will now perform the discretization. A progress bar will be shown to track the state of this process.
      Modeling Mode
      Upon conclusion, the variables are delivered as blue Nodes into the Graph Panel of BayesiaLab and by default we are now in the Modeling Mode. The original variable names, which were stored in the first line of the database, become our Node Names.
    • At this point it is practical to add Node Comments to the Node Names. Node Comments are typically used in BayesiaLab for longer and more descriptive titles, which can be turned on or off, depending on the desired view of the graph. Here, we associate a dictionary of the complete company names with the Node Comments, while the more compact ticker symbols remain as Node Names.⁶
      The syntax for this association is rather straightforward: we simply define a text file which includes one Node Name per line. Each Node Name is followed by the equal sign (“=”), or alternatively TAB or SPACE, and then by the full company name, which will serve as the Node Comment.
      This file can then be loaded into BayesiaLab via Data>Associate Dictionary>Node>Comments.
      Once the comments are loaded, a small call-out symbol will appear next to each Node Name. This indicates that Node Comments are available for display.
      ⁶ To maintain a compact presentation, we will typically use the ticker symbol when referencing a particular stock rather than the full company name.
    • As the name implies, selecting View>Display Node Comments (or alternatively the keyboard shortcut “M”) will reveal the company names.
    • Node Comments can be displayed for either all Nodes or only for selected ones.
      Before proceeding with the first learning step, it is also recommended to briefly switch into the Validation Mode (F5) and to check the distributions of the states of the Nodes. The Monitors of the first nine Nodes are shown below. At first glance, the distributions appear to be plausible representations of the historical return
    • Unsupervised Learning
      To perform the first Unsupervised Learning algorithm on our dataset, we switch back into Modeling Mode (F4) and select Learning>Association Discovering>Maximum Spanning Tree.⁷ This starts the Maximum Weight Spanning Tree algorithm, which is the fastest of the Unsupervised Learning algorithms and thus recommended at the beginning of most studies.⁸ As the name implies, this algorithm generates a tree structure, i.e. it permits only one parent per Node. This constraint is one of the reasons for the extreme learning speed of this algorithm.⁹ Performing the algorithm with a file of this size should only take a few seconds.
      ⁷ In BayesiaLab nomenclature, Unsupervised Learning is listed in the Learning menu as “Association Discovering”.
      ⁸ Several other Unsupervised Learning algorithms are available in BayesiaLab, including Taboo, EQ, SopLEQ and Taboo Order.
      ⁹ It goes beyond the scope of this tutorial to discuss the different types of learning algorithms and their specific
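To give a feel for why a one-parent tree can be learned so quickly, here is a generic sketch of a maximum weight spanning tree over discretized variables. It is not BayesiaLab's algorithm: the source does not say which edge score BayesiaLab uses, so we assume empirical mutual information as the weight (the classic Chow-Liu construction) and build the tree with Kruskal's algorithm. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def maximum_spanning_tree(columns):
    """Kruskal's algorithm on mutual-information edge weights.

    `columns` maps a variable name to its list of discretized states.
    Returns the chosen edges as (weight, name_a, name_b) tuples.
    """
    edges = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(sorted(columns), 2)),
        reverse=True,
    )
    parent = {name: name for name in columns}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # skip edges that would close a cycle
            parent[root_a] = root_b
            tree.append((weight, a, b))
    return tree

# Toy data: Y copies X exactly, Z is only loosely related to both.
toy = {
    "X": [0, 0, 1, 1, 0, 1, 0, 1],
    "Y": [0, 0, 1, 1, 0, 1, 0, 1],
    "Z": [0, 1, 0, 1, 0, 1, 0, 1],
}
tree = maximum_spanning_tree(toy)
```

A tree over n variables has n - 1 edges, so for 459 stocks only 458 of the 105,111 candidate pairs are kept. The sketch yields an undirected tree; orienting the arcs away from a chosen root would give each Node a single parent.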
    • At first glance, however, the resulting network does not appear simple and tree-like at all.
      This can be quickly resolved with BayesiaLab’s built-in layout algorithms. Selecting View>Automatic Layout (shortcut “P”) rearranges the network instantly to reveal a much more intuitive
    • The resulting, reformatted Bayesian network representing the stock returns can now be read and interpreted immediately.¹⁰ ¹¹
      For instance, we can zoom into the branch of the Bayesian network which contains Procter & Gamble (PG). BayesiaLab offers a search function (shortcut Ctrl-F or ⌘-F), which helps find individual nodes very easily.
      ¹⁰ A separate, high-resolution PDF of this Bayesian network can be downloaded financial/SP500_V13.pdf. This allows those readers without an active BayesiaLab installation to explore the network graph in much greater detail.
      ¹¹ For expositional clarity we have only learned contemporaneous relationships and, as a result, potential lag structures will not appear in this network. However, in BayesiaLab, Unsupervised Learning can be generalized to a temporal application. A white paper specifically focusing on learning temporal (or dynamic) Bayesian networks is planned for the near future.
    • The neighborhood of Procter & Gamble contains many familiar company names, mostly from the CPG industry.¹² Perhaps these companies appear all-too-obvious and the reader may wonder what insight is gained at this point. Chances are that even a casual observer of the industry would have mentioned Kimberly-Clark, Colgate-Palmolive and Johnson & Johnson as businesses operating in the same field as Procter & Gamble, which would therefore presumably have somewhat related stock price movements.
      The key point is that without any prior knowledge of this domain a computer algorithm automatically extracted this structure, i.e. a Bayesian network, which intuitively matches the understanding that we have established over years as consumers of these companies’ products.
      Clearly, if this were an unfamiliar domain, the knowledge gain for the reader would be far greater. However, a lesser-known domain would presumably prevent the reader’s intuitive verification of the machine-discovered structure here.
      ¹² CPG stands for Consumer Packaged Goods.
    • Bayesian Network versus Correlation Matrix
      The benefit of the concise representation as a Bayesian network is further demonstrated by juxtaposing it with a correlation matrix, which would perhaps be the first step in a traditional statistical analysis of this domain. Even when using heat map-style color-coding, the sheer number of relationships¹³ makes an immediate visual interpretation of the correlation matrix very difficult (see the subset of 25 by 25 cells from the correlation matrix below).

           A AA AAPL ABC ADI ADM ADP ADSK AEE AEP AES AET AFL AGN AIV AIZ AKAM AKS ALL ALTR AMAT AMD AMGN AMT AMZN
      A    1 0.570668 0.46678 0.408163 0.533252 0.425324 0.535525 0.495613 0.531351 0.486749 0.490094 0.384297 0.476417 0.465186 0.506165 0.450875 0.4315 0.533276 0.490529 0.521889 0.541416 0.454983 0.388191 0.526454 0.447969
      AA   0.570668 1 0.412423 0.363121 0.432512 0.49727 0.513374 0.453742 0.540668 0.487494 0.555778 0.386198 0.505749 0.417878 0.533665 0.525495 0.433653 0.691676 0.558741 0.443481 0.502896 0.406542 0.357239 0.532022 0.369067
      AAPL 0.46678 0.412423 1 0.236667 0.43525 0.323588 0.403402 0.417302 0.340484 0.322327 0.319482 0.289725 0.334087 0.328982 0.402068 0.340316 0.38855 0.432112 0.351426 0.444068 0.463454 0.395558 0.330339 0.437053 0.450858
      ABC  0.408163 0.363121 0.236667 1 0.329262 0.298421 0.416881 0.31158 0.440094 0.417974 0.347976 0.408529 0.294418 0.391646 0.33699 0.360633 0.288028 0.340885 0.39043 0.318401 0.309671 0.244243 0.36276 0.347773 0.269919
      ADI  0.533252 0.432512 0.43525 0.329262 1 0.321593 0.483858 0.482746 0.425898 0.371848 0.343594 0.314271 0.389693 0.366576 0.462091 0.371839 0.426141 0.460124 0.423266 0.691107 0.638214 0.495377 0.330517 0.467126 0.420969
      ADM  0.425324 0.49727 0.323588 0.298421 0.321593 1 0.378516 0.322902 0.452433 0.403492 0.417093 0.305003 0.366817 0.304062 0.366267 0.358504 0.389176 0.452943 0.392224 0.352995 0.339473 0.274791 0.266671 0.414046 0.313261
      ADP  0.535525 0.513374 0.403402 0.416881 0.483858 0.378516 1 0.452686 0.542809 0.527541 0.456298 0.372908 0.50101 0.486193 0.526986 0.507023 0.406286 0.476395 0.514611 0.513513 0.515278 0.394056 0.406387 0.48288 0.41627
      ADSK 0.495613 0.453742 0.417302 0.31158 0.482746 0.322902 0.452686 1 0.421398 0.402325 0.442238 0.349215 0.417223 0.389226 0.447525 0.405751 0.392804 0.43849 0.41419 0.46149 0.497755 0.396007 0.333145 0.45594 0.383973
      AEE  0.531351 0.540668 0.340484 0.440094 0.425898 0.452433 0.542809 0.421398 1 0.756735 0.590583 0.424766 0.513378 0.475327 0.474898 0.473565 0.321768 0.452686 0.537636 0.447271 0.436028 0.31983 0.390525 0.465076 0.32218
      AEP  0.486749 0.487494 0.322327 0.417974 0.371848 0.403492 0.527541 0.402325 0.756735 1 0.565275 0.403458 0.42596 0.440173 0.419188 0.458727 0.318872 0.422276 0.459285 0.396228 0.417472 0.292099 0.398822 0.446867 0.314108
      AES  0.490094 0.555778 0.319482 0.347976 0.343594 0.417093 0.456298 0.442238 0.590583 0.565275 1 0.378383 0.476892 0.40224 0.420327 0.453099 0.34483 0.492532 0.476188 0.349014 0.398017 0.315139 0.308978 0.438492 0.28071
      AET  0.384297 0.386198 0.289725 0.408529 0.314271 0.305003 0.372908 0.349215 0.424766 0.403458 0.378383 1 0.370713 0.421565 0.364347 0.420521 0.249157 0.360531 0.427641 0.290668 0.279035 0.275143 0.321026 0.401321 0.280863
      AFL  0.476417 0.505749 0.334087 0.294418 0.389693 0.366817 0.50101 0.417223 0.513378 0.42596 0.476892 0.370713 1 0.418877 0.588516 0.588617 0.351403 0.446767 0.634718 0.390395 0.459462 0.364762 0.285856 0.50493 0.359955
      AGN  0.465186 0.417878 0.328982 0.391646 0.366576 0.304062 0.486193 0.389226 0.475327 0.440173 0.40224 0.421565 0.418877 1 0.422619 0.396071 0.323589 0.388559 0.443402 0.332295 0.393542 0.347243 0.345897 0.461649 0.336944
      AIV  0.506165 0.533665 0.402068 0.33699 0.462091 0.366267 0.526986 0.447525 0.474898 0.419188 0.420327 0.364347 0.588516 0.422619 1 0.558192 0.408232 0.49093 0.644666 0.485371 0.541239 0.390922 0.30768 0.512831 0.397449
      AIZ  0.450875 0.525495 0.340316 0.360633 0.371839 0.358504 0.507023 0.405751 0.473565 0.458727 0.453099 0.420521 0.588617 0.396071 0.558192 1 0.353718 0.45162 0.616235 0.378966 0.430116 0.315676 0.343417 0.513195 0.347806
      AKAM 0.4315 0.433653 0.38855 0.288028 0.426141 0.389176 0.406286 0.392804 0.321768 0.318872 0.34483 0.249157 0.351403 0.323589 0.408232 0.353718 1 0.438362 0.364883 0.435992 0.428331 0.368554 0.245363 0.419715 0.385661
      AKS  0.533276 0.691676 0.432112 0.340885 0.460124 0.452943 0.476395 0.43849 0.452686 0.422276 0.492532 0.360531 0.446767 0.388559 0.49093 0.45162 0.438362 1 0.478014 0.420897 0.475609 0.423204 0.337167 0.508704 0.390437
      ALL  0.490529 0.558741 0.351426 0.39043 0.423266 0.392224 0.514611 0.41419 0.537636 0.459285 0.476188 0.427641 0.634718 0.443402 0.644666 0.616235 0.364883 0.478014 1 0.436321 0.503192 0.387605 0.312268 0.525026 0.351342
      ALTR 0.521889 0.443481 0.444068 0.318401 0.691107 0.352995 0.513513 0.46149 0.447271 0.396228 0.349014 0.290668 0.390395 0.332295 0.485371 0.378966 0.435992 0.420897 0.436321 1 0.645041 0.490712 0.332572 0.480285 0.443469
      AMAT 0.541416 0.502896 0.463454 0.309671 0.638214 0.339473 0.515278 0.497755 0.436028 0.417472 0.398017 0.279035 0.459462 0.393542 0.541239 0.430116 0.428331 0.475609 0.503192 0.645041 1 0.481282 0.354883 0.482778 0.435212
      AMD  0.454983 0.406542 0.395558 0.244243 0.495377 0.274791 0.394056 0.396007 0.31983 0.292099 0.315139 0.275143 0.364762 0.347243 0.390922 0.315676 0.368554 0.423204 0.387605 0.490712 0.481282 1 0.230527 0.390012 0.318144
      AMGN 0.388191 0.357239 0.330339 0.36276 0.330517 0.266671 0.406387 0.333145 0.390525 0.398822 0.308978 0.321026 0.285856 0.345897 0.30768 0.343417 0.245363 0.337167 0.312268 0.332572 0.354883 0.230527 1 0.327344 0.330847
      AMT  0.526454 0.532022 0.437053 0.347773 0.467126 0.414046 0.48288 0.45594 0.465076 0.446867 0.438492 0.401321 0.50493 0.461649 0.512831 0.513195 0.419715 0.508704 0.525026 0.480285 0.482778 0.390012 0.327344 1 0.412541
      AMZN 0.447969 0.369067 0.450858 0.269919 0.420969 0.313261 0.41627 0.383973 0.32218 0.314108 0.28071 0.280863 0.359955 0.336944 0.397449 0.347806 0.385661 0.390437 0.351342 0.443469 0.435212 0.318144 0.330847 0.412541 1

      Admittedly, there are a number of statistical techniques available which can help in this situation, but the point is that generating a Bayesian network (e.g. with the Maximum Weight Spanning Tree algorithm we used) takes the practitioner about the same amount of time as computing a correlation matrix, yet the former yields a much richer picture.
      Beyond visual interpretability, there is another key distinction between these two representations. Whereas the correlation matrix is merely descriptive, the Bayesian network is actually computable. By its very nature, any Bayesian network is a functioning model. On the other hand, with the correlation matrix one could not predict the value of one stock given the observation of several others. For this purpose, we would have to fit and estimate specific models, e.g. a regression. In a Bayesian network, however, we can use the graph of the Bayesian network itself for computing inference. For instance, given that we observe the values of JNJ and CL, we immediately obtain an updated value for PG and, at the same time, also updated values for all other Nodes in the network. We refer to this property as omnidirectional inference, which reflects the updating of beliefs given evidence according to Bayes’ Rule.¹⁴ We shall illustrate carrying out omnidirectional inference in the next section.
      Inference with Bayesian Networks
      We have shown that the Maximum Weight Spanning Tree algorithm can generate a readily-interpretable and fully-computable Bayesian network from daily stock return data. However, we have not yet explained in detail what this structure represents specifically.
      Each Arc in this structure represents a probabilistic relationship between a pair of Nodes. The parameters¹⁵ of these relationships are encoded in Conditional Probability Tables. In the example of the PG and JNJ relationship shown below, the table defines the probabilities of the states of PG, given the states of JNJ. This table can be accessed in the Modeling Mode by simply double-clicking on the desired Node, which opens up the Node Editor.
      ¹³ (459² − 459) / 2 = 105,111
      ¹⁴ See appendix for a brief summary of Bayes’ Theorem.
      ¹⁵ We use the term “parameter” rather loosely in this context, as Bayesian networks are entirely nonparametric models in
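As an illustration of what such a table contains, the sketch below estimates P(child | parent) from paired, already discretized observations by simple relative frequencies. This is an assumption on our part: the source does not state how BayesiaLab estimates its CPTs (for example, whether any smoothing or prior is applied). The function name `estimate_cpt`, the state labels and the data are made up.

```python
from collections import Counter, defaultdict

def estimate_cpt(parent_states, child_states):
    """Relative-frequency estimate of P(child | parent) from paired samples."""
    counts = defaultdict(Counter)
    for p, c in zip(parent_states, child_states):
        counts[p][c] += 1
    return {
        p: {c: n / sum(row.values()) for c, n in row.items()}
        for p, row in counts.items()
    }

# Made-up daily states for a parent Node and its child Node.
parent = ["up", "up", "up", "up", "down", "down", "down", "down"]
child = ["up", "up", "up", "down", "down", "down", "down", "up"]
cpt = estimate_cpt(parent, child)
# Each row cpt[p] is a probability distribution over the child's states.
```

Each row of the result corresponds to one parent state, which is the same layout as reading one row of the table in the Node Editor.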
    • For clarity, we show the relevant portion of the network for JNJ and PG below plus an enlarged version of the conditional probability table from the Node Editor:
      This says, among other things, that given we observe a JNJ return greater than 1.2%, there would be a 50.9% probability that we would observe a PG return greater than 1.2% (see bottom right cell in the above table). More formally we can also write P(PG > 0.012 | JNJ > 0.012) = 50.9%.
      The upper left cell says that, given we observe a JNJ return smaller than -0.9%, there is a 46.5% probability that we will observe a PG return smaller than -1.3%, i.e. P(PG <= -0.013 | JNJ <= -0.009) = 46.5%.¹⁶
      If we follow the network “downstream,” i.e. from PG to KMB, we see that their relationship is quantified in yet another conditional probability table.
      ¹⁶ As the discretization intervals were generated by the K-Means algorithm, the bins do not necessarily have the same interval size, which we see in this
    • This can be interpreted in the same way: given that we observe a return of PG greater than 1.2%, there is a 42.4% probability that we would also observe a KMB return of higher than 1.2%. This kind of inference is perhaps the simplest type, as we can directly read the table, i.e. “given this, then that.”
      Inference with Hard Evidence
      Beyond reviewing the conditional probability tables directly in Modeling Mode in the Node Editor, as above, we can carry out inference conveniently in the Validation Mode (shortcut F5) of BayesiaLab.
      This allows setting evidence and observing inference directly via the Monitors in the Monitor Panel (right side of screenshot). We will now highlight JNJ and PG and focus on their Monitors only. Prior to setting any evidence, we will simply see their marginal distributions in the Monitors. As we would expect, we see the returns distributed around 0 and the expected value of the returns is
    • Observing a specific state of a Node is equivalent to setting evidence, and we can do that directly on the histograms inside the Monitors. For instance, we can double-click on the state JNJ > 0.012, which sets it to a 100% probability, as indicated by the green bar. Setting such evidence will automatically propagate it throughout the network and we can immediately observe the new distribution of PG. The gray arrows indicate how the distributions have changed compared to before setting evidence.

      So far, this provides no more insight than what we could read from the Conditional Probability Table (CPT) in the Node Editor of the PG Node. What is not readily accessible from the CPT is the inverse probability, which requires carrying out inference in the opposite direction of the Arc, i.e. setting evidence on PG and computing JNJ. Bayes’ Rule specifies the necessary computation in this case. [17]

      [17] See appendix for more details about Bayes’ Rule. Although this calculation is straightforward, application errors are unfortunately commonplace. The error is so common that it is now widely known as the Prosecutor’s Fallacy. In a recent white paper, Paradoxes and Fallacies, we dedicated a chapter to this
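The inversion of an Arc with Bayes’ Rule can be illustrated with a minimal sketch. Only the two corner probabilities 46.5% and 50.9% come from the conditional probability table quoted in this paper; the three-state discretization, the marginal distribution of JNJ and all remaining table entries are hypothetical values chosen for illustration, and the function name `invert_arc` is our own.

```python
import numpy as np

# Hypothetical three-state discretization; the paper's actual bins may differ.
JNJ_STATES = ["<=-0.009", "(-0.009, 0.012]", ">0.012"]

# Hypothetical marginal distribution P(JNJ).
p_jnj = np.array([0.15, 0.70, 0.15])

# P(PG | JNJ): one row per JNJ state, one column per PG state.
# Only the corner cells 0.465 and 0.509 are quoted in the text;
# all other entries are made up so that each row sums to 1.
p_pg_given_jnj = np.array([
    [0.465, 0.455, 0.080],
    [0.080, 0.840, 0.080],
    [0.060, 0.431, 0.509],
])


def invert_arc(prior, cpt, child_state):
    """Return P(parent | child = child_state) via Bayes' Rule."""
    joint = prior * cpt[:, child_state]  # P(parent, child = child_state)
    return joint / joint.sum()           # divide by P(child = child_state)


# Evidence on the child: PG <= -1.3% (state index 0).
p_jnj_given_pg_low = invert_arc(p_jnj, p_pg_given_jnj, child_state=0)
for state, prob in zip(JNJ_STATES, p_jnj_given_pg_low):
    print(f"P(JNJ {state} | PG <= -0.013) = {prob:.3f}")
```

With these made-up inputs, P(JNJ <= -0.009 | PG <= -0.013) does not equal P(PG <= -0.013 | JNJ <= -0.009) = 0.465. Treating the two as interchangeable is exactly the application error that the Prosecutor’s Fallacy describes.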
    • In BayesiaLab the inference computation of JNJ is automatic once we set evidence to PG. To illustrate this, we arbitrarily set the PG return to <= -1.3% and we can immediately see the updated distribution of JNJ.

      So far, this could have been computed quite easily by directly applying Bayes’ Rule. It becomes a bit more challenging when we look at more than two Nodes at the same time. This time we will examine JNJ, PG and KMB (their relevant subnetwork is shown for reference below).

      Once again, prior to setting any evidence, the Monitors show the marginal distributions of JNJ, PG and
    • Upon setting JNJ > 0.012, we can now see how the evidence not only propagates to PG, but also further “downstream” to KMB:

      We can also invert the chain of inference by simply setting evidence at the other end of the network, e.g. KMB >
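Propagation along the chain JNJ → PG → KMB, in either direction, can be sketched by brute-force enumeration of the joint distribution. This is not how BayesiaLab computes inference internally; it is a minimal illustration that is only practical for very small networks. The values 0.509 and 0.424 are the table cells quoted in this paper, while the three-state discretization, the JNJ marginal and every other probability are hypothetical.

```python
import numpy as np

JNJ, PG, KMB = 0, 1, 2  # axis positions in the joint array

# Hypothetical marginal P(JNJ) over three states (low, middle, high).
p_jnj = np.array([0.15, 0.70, 0.15])

# P(PG | JNJ) and P(KMB | PG); rows are parent states, columns child states.
# Only 0.465, 0.509 and 0.424 are quoted in the text, the rest is made up.
p_pg_given_jnj = np.array([
    [0.465, 0.455, 0.080],
    [0.080, 0.840, 0.080],
    [0.060, 0.431, 0.509],
])
p_kmb_given_pg = np.array([
    [0.450, 0.470, 0.080],
    [0.090, 0.820, 0.090],
    [0.066, 0.510, 0.424],
])

# Chain rule: P(JNJ, PG, KMB) = P(JNJ) * P(PG | JNJ) * P(KMB | PG).
joint = np.einsum("j,jp,pk->jpk", p_jnj, p_pg_given_jnj, p_kmb_given_pg)


def posterior(joint, evidence, query_axis):
    """P(query | hard evidence), with evidence given as {axis: state index}."""
    restricted = joint.copy()
    for axis, state in evidence.items():
        keep = np.zeros(joint.shape[axis])
        keep[state] = 1.0                      # zero out all other states
        shape = [1] * joint.ndim
        shape[axis] = -1
        restricted = restricted * keep.reshape(shape)
    other_axes = tuple(a for a in range(joint.ndim) if a != query_axis)
    marginal = restricted.sum(axis=other_axes)
    return marginal / marginal.sum()


HIGH = 2
downstream = posterior(joint, {JNJ: HIGH}, KMB)          # JNJ high -> KMB
upstream = posterior(joint, {KMB: HIGH}, JNJ)            # KMB high -> JNJ
middle = posterior(joint, {JNJ: HIGH, KMB: HIGH}, PG)    # both ends -> PG
print(downstream, upstream, middle)
```

The same `posterior` function serves all three directions because the evidence is applied to the joint distribution before marginalizing, which is what makes the inference omnidirectional.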
    • Or, we can set evidence on both ends, i.e. on JNJ and KMB, and then read the inference in the middle, for PG.

      This inference will probably not surprise us: we now have an 80% probability that PG will have a return greater than 1.2%, given that we set both JNJ and KMB to > 0.012.

      Inference with Soft Evidence

      We are not limited to only setting “hard evidence,” as we did above. In the real world, observations often provide “soft evidence” only. So, instead of setting any of these variables to a state with a 100% probability, thus making them “hard evidence,” we can use BayesiaLab to set any evidence according to its nature, even when it is uncertain.

      For illustration purposes, we will now generate two kinds of “soft evidence,” one for JNJ and one for KMB.

      1. We set the evidence directly by right-clicking on the JNJ Monitor and selecting Enter Probabilities: We can now adjust the histogram by dragging the bars to the desired probability levels which reflect our subjective
    • Clicking the light-green button confirms our choice of probabilities. In addition, we right-click on the Monitor again to Fix Probabilities, meaning that we want to hold these values regardless of any subsequent evidence we enter.

      2. Assuming that we have a more general expectation regarding the KMB return, without having any beliefs regarding the probabilities of specific states, we can set the expected mean of the entire KMB distribution. For instance, we set the expected mean of the states of KMB to -1% by right-clicking the KMB Monitor and selecting Distribution for Target Value/
    • We type “-0.01” into the dialog box, which generates a new KMB distribution with the desired mean value of -0.01 or -1%. It is obvious that an infinite number of combinations could generate a mean value of -1%. However, as an aid to the analyst, BayesiaLab computes which distribution with a mean value of -1% would be “closest” to the a-priori distribution.

      Not only are these observations “soft,” in this example they are also of the opposite sign, i.e. JNJ has a positive mean of the return and KMB has a negative mean of the return.

      As a result, carrying out inference generates a more uniform probability distribution for PG (rather than a narrower distribution), effectively increasing our uncertainty about the state of PG compared to the marginal distribution. The knowledge gain for the analyst is that greater volatility for PG must be expected.

      We have limited our example to inference within a small subnetwork of only three Nodes, but we could have performed the same approach over the entire Bayesian network of 459 Nodes. With this, the analyst has the complete freedom to set an unlimited number of all different kinds of evidence, both hard and soft, and to carry out inference “backwards” and “forwards” within the network. For users of the BayesiaLab software, the automatic computation of inference and the instant visual updating of the Monitors is comparable to recalculating all cells in a large
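One way to make the notion of a “closest” distribution with a given mean concrete is sketched below. The paper does not state which distance BayesiaLab uses, so this sketch assumes closeness is measured by the Kullback-Leibler divergence from the a-priori distribution. Under that assumption the solution has the form q_i ∝ p_i · exp(λ · x_i), and λ can be found by bisection because the mean increases monotonically with λ. The state values, the prior and the function name `tilt_to_mean` are hypothetical.

```python
import numpy as np


def tilt_to_mean(prior, values, target_mean, iterations=200):
    """Distribution with the requested mean that minimizes KL(q || prior).

    Assumes every prior probability is strictly positive and that the
    target mean lies strictly between the smallest and largest state value.
    """
    prior = np.asarray(prior, dtype=float)
    values = np.asarray(values, dtype=float)
    if not values.min() < target_mean < values.max():
        raise ValueError("target mean must lie inside the range of values")

    def tilted(lam):
        log_w = np.log(prior) + lam * values
        log_w -= log_w.max()              # guard against overflow
        w = np.exp(log_w)
        return w / w.sum()

    # Bracket the solution, then bisect on the tilting parameter.
    lo, hi = -1.0, 1.0
    while tilted(lo) @ values > target_mean:
        lo *= 2.0
    while tilted(hi) @ values < target_mean:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tilted(mid) @ values < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))


# Hypothetical representative return per KMB state and hypothetical prior.
state_values = np.array([-0.02, -0.005, 0.0, 0.005, 0.02])
prior = np.array([0.10, 0.20, 0.40, 0.20, 0.10])

soft_evidence = tilt_to_mean(prior, state_values, target_mean=-0.01)
print(soft_evidence, soft_evidence @ state_values)
```

The resulting distribution shifts probability mass toward the negative states while keeping the overall shape of the prior as far as the mean constraint permits.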
    • Bayesian Network Metrics

      As shown in these examples, the Arcs represent the probabilistic relationships between Nodes. In addition to visually interpreting the network structure, and beyond carrying out inference, we can also review the “summary statistics” of the network and its components with several metrics.

      It is important to point out that we use the information theory-based concepts of Entropy, Arc Force and Mutual Information as central metrics in generating and analyzing Bayesian networks. This is a clear departure from commonly used metrics in traditional statistics, such as covariance and correlation. While these information theory-based metrics may appear novel to end-users of research, they have many advantages. Most importantly, we can entirely discard the (often incorrect) assumption regarding linearity and normal distributions. As a result, highly nonlinear dynamics can be easily captured in a Bayesian network.

      Arc Force

      For instance, the importance of each Arc can be highlighted by displaying the associated Arc Force and its contribution with respect to the overall network. From within the Validation Mode, the Arc Force can be displayed by selecting Analysis>Graphic>Arc Force (or with the shortcut “F”).
    • Mutual Information

      A perhaps more accessible interpretation is possible by displaying the Mutual Information, which can be obtained by selecting Analysis>Graphic>Arcs’ Mutual Information. [18]

      The Mutual Information I(X,Y) measures how much (on average) the observation of random variable Y tells us about the uncertainty of X, i.e. by how much the entropy of X is reduced if we have information on Y. Mutual Information is a symmetric metric, which reflects the uncertainty reduction of X by knowing Y as well as of Y by knowing X.

      In our example, knowing the value of PG on average reduces the uncertainty of the value of KMB by 0.2843 bits, which means that it reduces its uncertainty by 13.27% (shown in blue, in the direction of the arc). Conversely, knowing KMB reduces the uncertainty of PG by 13.09% (shown in red, in the opposite direction of the arc).

      [18] Although interpreting Mutual Information is somewhat more intuitive, in the case of a network tree, Mutual Information is identical to Arc Force. For Bayesian networks that are not trees, this distinction becomes very
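The relationship between Mutual Information in bits and the relative uncertainty reduction in percent can be sketched as follows. The joint distribution below is hypothetical and does not reproduce the PG and KMB figures; the function names are our own. The sketch uses the standard identity I(X,Y) = H(X) + H(Y) - H(X,Y) and reports the relative reduction as I(X,Y) / H(X).

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability cells."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information_bits(joint):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) for a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return entropy_bits(p_x) + entropy_bits(p_y) - entropy_bits(joint)


# Hypothetical joint distribution P(X, Y); rows are X states, columns Y states.
joint_xy = np.array([
    [0.10, 0.05, 0.01],
    [0.05, 0.58, 0.05],
    [0.01, 0.05, 0.10],
])

mi = mutual_information_bits(joint_xy)
reduction_of_x = mi / entropy_bits(joint_xy.sum(axis=1))  # share of H(X)
reduction_of_y = mi / entropy_bits(joint_xy.sum(axis=0))  # share of H(Y)
print(f"I(X,Y) = {mi:.4f} bits, {reduction_of_x:.2%} of H(X), "
      f"{reduction_of_y:.2%} of H(Y)")
```

The same value in bits yields different percentages whenever the two marginal entropies differ. Read this way, the reported 0.2843 bits together with 13.27% and 13.09% would imply entropies of roughly 2.14 bits for KMB and 2.17 bits for PG.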
    • Correlation

      While we emphasize the importance of Arc Force and Mutual Information as measures capable of capturing nonlinear relationships, BayesiaLab also allows displaying Pearson’s R for the network (select Analysis>Graphic>Pearson’s Correlation or shortcut “G”).

      By displaying Pearson’s correlation coefficient, we implicitly make the assumption of linear relationships between the connected Nodes, which may often not hold in practice. Special care must thus be taken when interpreting low values of R, as they may reflect nonlinearity rather than independence. On the other hand, R values close to 1 do indeed suggest the presence of a linear relationship. Furthermore, Pearson’s R can be very helpful for determining the sign of the relationship between variables. BayesiaLab will color-code positive and negative correlations by highlighting the associated Arcs in blue and red respectively. Finally, correlation is typically a much more familiar metric to most audiences who are not familiar with Mutual Information.

      Summary - Unsupervised Learning

      In summary, Unsupervised Learning is an excellent approach to obtain a general understanding of simultaneous relationships between many variables in a dataset. The learned Bayesian network allows immediate visual interpretation plus immediate computation of omnidirectional inference based on any type of evidence, including uncertain and conflicting observations. Given these properties, Unsupervised Learning with Bayesian networks becomes a universal and robust tool for knowledge discovery and modeling in unknown problem
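The caution about low values of Pearson’s R can be illustrated with a small constructed example, which is not taken from the stock market dataset. Here Y is fully determined by X through Y = X², yet the correlation is zero, while the Mutual Information equals the full entropy of Y.

```python
import numpy as np

# Five equally likely values of X and a purely nonlinear dependency Y = X^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

pearson_r = np.corrcoef(x, y)[0, 1]


def entropy_bits(values):
    """Empirical Shannon entropy in bits of a sequence of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Encode each (x, y) pair as one label to obtain the joint entropy.
pairs = np.array([f"{a}|{b}" for a, b in zip(x, y)])
mutual_information = entropy_bits(x) + entropy_bits(y) - entropy_bits(pairs)

print(f"Pearson's R = {pearson_r:.3f}")
print(f"Mutual Information = {mutual_information:.3f} bits")
```

A correlation-based view would report no relationship here, whereas the information-theoretic metric shows that knowing X removes all uncertainty about Y.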
    • Supervised Learning

      Upon gaining a general understanding of a domain, questions typically arise regarding individual variables and how to predict them specifically. Even though we can use Unsupervised Learning to discover a network structure and use it for prediction, Supervised Learning is often a more appropriate method when studying a specific target variable. With a single target variable defined, BayesiaLab’s learning algorithms concentrate on fitting a (generative) model to that target rather than fitting a model that balances the fit in terms of all variables.

      To remain consistent with the example we started earlier, we will once again use PG for illustration purposes. More specifically, we will characterize PG as the Target Node. We can do so by right-clicking on the node and then selecting Set as Target Node from the contextual menu (or by double-clicking the Node while holding “T”).

      Now that we have defined a Target Node, we can perform a range of Supervised Learning algorithms implemented in BayesiaLab. [19]

      The Markov Blanket [20] algorithm is suitable for this kind of application and its speed is particularly helpful when dealing with hundreds or even thousands of variables. Furthermore, BayesiaLab offers the Augmented Markov Blanket, which starts with the Markov Blanket structure and then uses an unsupervised search to find the probabilistic relations that hold between each variable belonging to the Markov Blanket. [21] This unsupervised search requires additional computation time but generally results in an improved predictive performance of the model.

      The learning process can be started by selecting Learning>Target Node Characterization>Augmented Markov Blanket from the menu. [22]

      [19] For expositional clarity we will only learn contemporaneous relationships and, as a result, potential lag structures will not appear in the resulting networks. However, in BayesiaLab, Supervised Learning can be generalized to a temporal application.

      [20] See appendix for a definition of the Markov Blanket.

      [21] Intuitively, the “augmented” part of the network plays the same role as the interaction terms between independent variables in a regression.

      [22] In BayesiaLab nomenclature, Supervised Learning is listed in the Learning menu as “Target Node Characterization”.
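To make the notion of a Markov Blanket concrete, the following sketch reads the blanket of a node off a known network structure: the node’s parents, its children, and its children’s other parents. This is not BayesiaLab’s learning algorithm, which discovers the blanket from data; the graph below and the function name `markov_blanket` are hypothetical and serve only as an illustration.

```python
def markov_blanket(parents, node):
    """Markov Blanket of `node` in a DAG given as {child: set of parents}."""
    blanket = set(parents.get(node, ()))                  # parents
    children = {c for c, ps in parents.items() if node in ps}
    blanket |= children                                   # children
    for child in children:
        blanket |= set(parents[child])                    # children's parents
    blanket.discard(node)                                 # exclude the node
    return blanket


# Hypothetical structure: JNJ -> PG -> KMB <- CL <- CLX
dag = {
    "PG": {"JNJ"},
    "KMB": {"PG", "CL"},
    "CL": {"CLX"},
}

print(sorted(markov_blanket(dag, "PG")))
```

In this made-up graph, CLX lies outside the blanket of PG: once JNJ, KMB and CL are known, CLX carries no further information about PG.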
    • As we still have our previous network that was generated through Unsupervised Learning, we need to confirm the deletion of that original network before proceeding with Supervised Learning.

      After a few seconds, we will see the result of the Supervised Learning process. Our Target Node PG is now connected to all variables in its Markov Blanket. This means that, given the knowledge of the Nodes in the Markov Blanket, PG is independent of the remaining network. This effectively identifies the subset of variables which are most important for predicting the value of the Target Node, PG.

      As stated in the introduction, it is not our intention to forecast stock prices per se, but rather to identify meaningful and relevant structures in the market. Such a structure is this Augmented Markov Blanket and a stock market analyst can use it to identify a relevant subset of stocks for an in-depth analysis, perhaps with the objective of establishing a buy/sell recommendation or to directly trade on such knowledge.

      Once we have this network, we can use it to analyze these Nodes’ relationships in a number of ways within BayesiaLab. For instance, we can select Analysis>Graphic>Target Mean Analysis, which graphs PG as a function of the other Nodes in the
    • Alternatively, by selecting Analysis>Report>Target Analysis>Correlation with the Target Node, we obtain a table displaying the Mutual Information between the Nodes in the network and the Target Variable,
    • By clicking Quadrants these values can be displayed as a graph:

      Inference with Supervised Learning

      To illustrate potential applications of Supervised Learning, beyond interpretation, we have created a simple simulation of possible stock market conditions. Despite the hypothetical nature of these scenarios, the underlying Bayesian network was learned from actual market data (as is the case for this entire white paper) and, as a result, the computed inference based on these assumed conditions is “real.”

      One could imagine this purely hypothetical scenario: Colgate-Palmolive and Johnson & Johnson are involved in a patent lawsuit and an investment analyst speculates about the impact of the imminent verdict in this court case. It is fairly easy to imagine that a verdict in favor of Johnson & Johnson would result in a boost to its stock price and simultaneously cause a sharp drop for Colgate-Palmolive’s stock. Conversely, a win for Colgate-Palmolive would result in just the opposite. However, our question is how either outcome would affect Procter & Gamble’s return, PG. We can best answer this question by simulating either outcome within the Bayesian network we learned.

      Prior to setting any evidence, our marginal distributions of returns would be as follows, i.e. this is what we would expect any given day without any other knowledge:

      If we were now to believe in a Johnson & Johnson win in combination with a Colgate-Palmolive loss and the corresponding stock price movement for both of them, we could create the following scenario:

      The gray arrows now highlight the impact on all other stocks in this model, including our target variable, PG. The model suggests that the new distribution for PG would now be distinctly bimodal as opposed to the normal
    • Now considering the opposite verdict, i.e. a Colgate-Palmolive win and a Johnson & Johnson defeat, we can once again assume their resulting stock price movements and then infer the impact on PG.

      This time, a gain for PG would be much more probable.

      So, if an analyst had a deep understanding of the subject matter (or insider knowledge [23]) and hence could anticipate the patent trial’s outcome, he should, everything else being equal, update his beliefs regarding the Procter & Gamble stock return according to the computed inference of our model.

      It is important to stress that this doesn’t mean we have discovered a causal pathway, but rather that we are taking advantage of historically observed associations between returns, which have generated a model in the form of a Bayesian network. The Bayesian network simply allows us to consistently exploit our learned knowledge.

      Adaptive Questionnaire

      The Bayesian network from above can perhaps also serve to illustrate how evidence-gathering can be optimized in BayesiaLab. Once again, this is purely hypothetical, but let’s assume that a stock trader seeks to predict tomorrow’s return of PG. Tomorrow, as it turns out, earnings will also be released for numerous other stocks in the CPG industry, excluding PG. With limited time, our stock trader needs to focus his research resources on those stocks that will be most informative of the PG return. BayesiaLab has a convenient function, Adaptive Questionnaire, which allows the analyst to adapt his evidence-seeking process as per the most recent information obtained and given the previously learned Bayesian network (shown again below for reference).

      [23] It should be noted that insider trading can refer to both legal and illegal conduct. See
    • The function can be called by selecting Inference>Adaptive Questionnaire. The following pop-up window then prompts us to select and confirm the Target.

      Initially, the analyst’s research should begin with CL as the most informative Node, which is listed at the top of all Monitors, right below the Target,
    • Let’s now assume he receives a tip, suggesting that CL earnings are coming in much higher than expected. He translates these updated, subjective beliefs into “soft” evidence and thus sets P(CL>0.017)=60%, P(CL<=0.017)=30%, P(CL<=0.05)=10%, plus the remaining states to zero.

      Upon entering this probability distribution, the Adaptive Questionnaire will move CL to the bottom (green bars with gray background) and scroll up the next most important Node to study, in this case KMB.

      Upon setting this evidence, the probabilities need to be fixed by right-clicking the Monitor and selecting Fix Probabilities.

      This is important as other simultaneous beliefs have yet to be set. By not fixing the probabilities of CL, subsequent evidence could inadvertently update the probabilities that were just defined.

      Next, the analyst may obtain inconclusive views from his sources on KMB and thus he cannot set any new evidence to this particular Node, although it would be the most informative evidence at this point. Rather, he moves on to CLX, which is widely believed to meet the expected earnings without any surprises. As a result, our analyst sets hard negative evidence on either end of the return distribution, meaning that he anticipates no major swings either way: P(CLX<=-0.11)=0 and P(CLX>0.13)=0. Upon setting this evidence, and once again fixing it, the Adaptive Questionnaire presents a new order of Nodes. Interestingly, given the evidence set on CLX, KMB has declined in importance with respect to PG.

      In the new order JNJ is next and our analyst determines that the stock will definitely gain based on insider rumors he heard. He translates this insight into a certain JNJ return greater than 1.2% and sets it as “hard” evidence accordingly.

      Given all the evidence he gathered, although some of it may be vague, the analyst concludes that there is now a 90% probability of a PG return greater than 0.3%. Perhaps more importantly, the chance of a return of -1.3% or below has diminished to virtually zero. This translates into an expected mean return of 1.5% versus the a-priori expectation of 0%.

      With the Bayesian network generated through Unsupervised Learning and the subsequent application of the Adaptive Questionnaire, the analyst has optimized his information-seeking process and thus spent the least amount of resources for a maximum reduction of uncertainty regarding the variable of
    • Summary - Supervised Learning

      In many ways, Supervised Learning with BayesiaLab resembles traditional modeling and can thus be benchmarked against a wide range of statistical techniques. In addition to its predictive performance, BayesiaLab offers an array of analysis tools, which can provide the analyst with a deeper understanding of the domain’s underlying dynamics. The Bayesian network also provides the basis for a wide range of scenario simulation and optimization algorithms implemented in BayesiaLab. Beyond mere one-time predictions, BayesiaLab allows dealing with evidence interactively and incrementally, which makes it a highly adaptive tool for real-time
    • Appendix

      Markov Blanket

      In many cases, the Markov Blanket algorithm is a good starting point for any predictive model, whether used for scoring or classification. This algorithm is extremely fast and can even be applied to databases with thousands of variables and millions of records.

      The Markov Blanket for a node A is the set of nodes composed of A’s parents, its children, and its children’s other parents (=spouses).

      The Markov Blanket of the node A contains all the variables which, if we know their states, will shield the node A from the rest of the network. This means that the Markov Blanket of a node is the only knowledge needed to predict the behavior of that node A. Learning a Markov Blanket selects relevant predictor variables, which is particularly helpful when there is a large number of variables in the database. (In fact, this can also serve as a highly efficient variable selection method in preparation for other types of modeling, outside the Bayesian network framework.)

      Bayes’ Theorem

      Bayes’ theorem relates the conditional and marginal probabilities of discrete events A and B, provided that the probability of B does not equal zero:

      $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

      In Bayes’ theorem, each probability has a conventional name:

      • P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense that it does not take into account any information about B. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.
      • P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
      • P(B|A) is the conditional probability of B given A. It is also called the likelihood.
      • P(B) is the prior or marginal probability of B.

      Bayes’ theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

      About the Authors

      Stefan Conrady

      Stefan Conrady is the cofounder and managing partner of Conrady Applied Science, LLC, a privately held consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed the authorized sales and consulting partner of Bayesia S.A.S. for North America.

      Stefan Conrady studied Electrical Engineering and has extensive management experience in the fields of product planning, marketing and analytics, working at Daimler and BMW Group in Europe, North America and Asia. Prior to establishing his own firm, he headed the Analytics & Forecasting group at Nissan North America.

      Lionel Jouffe

      Dr. Lionel Jouffe is cofounder and CEO of France-based Bayesia S.A.S. Lionel Jouffe holds a Ph.D. in Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing BayesiaLab since 1999 and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of consumer research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, who has deployed BayesiaLab globally since
    • Contact Information

      Conrady Applied Science, LLC
      312 Hamlet’s End Way
      Franklin, TN 37067
      USA
      +1 888-386-8383
      info@conradyscience.com
      www.conradyscience.com

      Bayesia S.A.S.
      6, rue Léonard de Vinci
      BP 119
      53001 Laval Cedex
      France
      +33(0)2 43 49 75 69
      info@bayesia.com
      www.bayesia.com

      Copyright

      © 2011 Conrady Applied Science, LLC and Bayesia S.A.S. All rights reserved.

      Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:

      • You may print or download this document for your personal and noncommercial use only.
      • You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady Applied Science, LLC and Bayesia S.A.S as the source of the material.
      • You may not, except with our express written permission, distribute or commercially exploit the content. Nor may you transmit it or store it in any other website or other form of electronic retrieval