Breast Cancer Diagnostics with Bayesian Networks



The Wisconsin Breast Cancer Database (WBCD) is a widely studied and publicly available dataset from the field of breast cancer diagnostics. The creators of this database, Wolberg, Street, Heisey and Mangasarian, made an important contribution with their research towards automating diagnostics with image processing and machine learning.

Beyond the medical field, many statisticians and computer scientists have proposed a wide range of classification models based on the WBCD. These methods have continuously raised the benchmark for diagnostic performance.

Our white paper reevaluates the Wisconsin Breast Cancer Database within the framework of Bayesian networks, which, to our knowledge, has not been done before. We demonstrate how the BayesiaLab software can quickly and simply create a Bayesian network model whose performance is on par with virtually all existing models developed from the WBCD over the last 15 years.


Breast Cancer Diagnostics with Bayesian Networks
Interpreting the Wisconsin Breast Cancer Database with BayesiaLab

Stefan Conrady, stefan.conrady@conradyscience.com
Dr. Lionel Jouffe, jouffe@bayesia.com

May 20, 2013
Table of Contents

Case Study & Tutorial
  Introduction
  Background
    Wisconsin Breast Cancer Database
    Notation
  Model Development
    Data Import
    Unsupervised Learning
    Model 1: Markov Blanket
    Model 1: Performance
    K-Folds Cross-Validation
    Model 2: Augmented Markov Blanket
    Model 2a: Performance
    Structural Coefficient
    Model 2b: Augmented Markov Blanket (SC=0.3)
    Model 2b: Performance
    Conclusion
  Model Inference
    Interactive Inference
    Adaptive Questionnaire
    Target Interpretation Tree
  Summary

Appendix
  Framework: The Bayesian Network Paradigm
    Acyclic Graphs & Bayes's Rule
    Compact Representation of the Joint Probability Distribution
  References
  Contact Information
    Bayesia USA
    Bayesia Singapore Pte. Ltd.
    Bayesia S.A.S.
  Copyright
Case Study & Tutorial

Introduction

Data classification is one of the most common tasks in the field of statistical analysis, and countless methods have been developed for this purpose over time. A common approach is to develop a model based on known historical data, i.e. where the class membership of a record is known, and to use this generalization to predict the class membership of a new set of observations.

Applications of data classification permeate virtually all fields of study, including the social sciences, engineering, biology, etc. In the medical field, classification problems often appear in the context of disease identification, i.e. making a diagnosis about a patient's condition. The medical sciences have a long history of developing a large body of knowledge that links observable symptoms with known types of illnesses. It is the physician's task to use the available medical knowledge to make inferences based on the patient's symptoms, i.e. to classify the medical condition in order to enable appropriate treatment.

Over the last two decades, so-called medical expert systems have emerged, which are meant to support physicians in their diagnostic work. Given the sheer amount of medical knowledge in existence today, it should not be surprising that significant benefits are expected from such machine-based support in terms of medical reasoning and inference.

In this context, several papers by Wolberg, Street, Heisey and Mangasarian became much-cited examples. They proposed an automated method for the classification of Fine Needle Aspirates¹ through image processing and machine learning, with the objective of achieving greater accuracy in distinguishing between malignant and benign cells for the diagnosis of breast cancer. At the time of their study, the practice of visual inspection of FNA yielded inconsistent diagnostic accuracy. The proposed new approach would reliably increase this accuracy to over 95%. This research was quickly translated into clinical practice and has since been applied with continued success.

As part of their studies in the late 1980s and 1990s, the research team generated what became known as the Wisconsin Breast Cancer Database, which contains measurements of hundreds of FNA samples and the associated diagnoses. This database has been extensively studied, even outside the medical field. Statisticians and computer scientists have proposed a wide range of techniques for this classification problem and have continuously raised the benchmark for predictive performance.

Our objective with this paper is to present Bayesian networks as a highly practical framework for working with this kind of classification problem. We intend to demonstrate how the BayesiaLab software can extremely quickly, and relatively simply, create Bayesian network models that achieve the performance of the best custom-developed models, while requiring only a fraction of the development time.

Furthermore, we wish to illustrate how Bayesian networks can help researchers and practitioners generate a deeper understanding of the underlying problem domain. Beyond merely producing predictions, we can use Bayesian networks to precisely quantify the importance of individual variables and employ BayesiaLab to help identify the most efficient path towards a diagnosis.

BayesiaLab's speed of model building, its excellent classification performance, plus the ease of interpretation provide researchers with a powerful new tool. Bayesian networks and BayesiaLab have thus become a driver in accelerating research.

¹ Fine needle aspiration (FNA) is a percutaneous ("through the skin") procedure that uses a fine gauge needle (22 or 25 gauge) and a syringe to sample fluid from a breast cyst or remove clusters of cells from a solid mass. With FNA, the cellular material taken from the breast is usually sent to the pathology laboratory for analysis.
Background

To provide context for this study, we quote Mangasarian, Street and Wolberg (1994), who conducted the original research on breast cancer diagnosis with digital image processing and machine learning:

  Most breast cancers are detected by the patient as a lump in the breast. The majority of breast lumps are benign, so it is the physician's responsibility to diagnose breast cancer, that is, to distinguish benign lumps from malignant ones. There are three available methods for diagnosing breast cancer: mammography, FNA with visual interpretation and surgical biopsy. The reported sensitivity, i.e. ability to correctly diagnose cancer when the disease is present, of mammography varies from 68% to 79%, of FNA with visual interpretation from 65% to 98%, and of surgical biopsy close to 100%. Therefore mammography lacks sensitivity, FNA sensitivity varies widely, and surgical biopsy, although accurate, is invasive, time consuming and costly. The goal of the diagnostic aspect of our research is to develop a relatively objective system that diagnoses FNAs with an accuracy that approaches the best achieved visually.

Wisconsin Breast Cancer Database

This breast cancer database was created through the clinical work of Dr. William H. Wolberg at the University of Wisconsin Hospitals in Madison. As of 1992, Dr. Wolberg had collected 699 instances of patient diagnoses in this database, consisting of two classes: 458 benign cases (65.5%) and 241 malignant cases (34.5%).

The following eleven attributes² are included in the database:

1. Sample code number
2. Clump Thickness (1-10)
3. Uniformity of Cell Size (1-10)
4. Uniformity of Cell Shape (1-10)
5. Marginal Adhesion (1-10)
6. Single Epithelial Cell Size (1-10)
7. Bare Nuclei (1-10)
8. Bland Chromatin (1-10)
9. Normal Nucleoli (1-10)
10. Mitoses (1-10)
11. Class (benign/malignant)

² "Attribute" and "variable" are used interchangeably throughout the paper.
Attributes #2 through #10 were computed from digital images of fine needle aspirates (FNA) of breast masses. These features describe the characteristics of the cell nuclei in the image. Attribute #11, Class, was established via subsequent biopsies or via long-term monitoring of the tumor.

We will not go into detail here regarding the definition of the attributes and their measurement. Rather, we refer the reader to the papers referenced in the bibliography.

The Wisconsin Breast Cancer Database is available to any interested researcher from the UC Irvine Machine Learning Repository.³ We use this database in its original format without any further transformation, so our results can be directly compared to the dozens of methods that have been developed since the original study.

Notation

To clearly distinguish between natural language, software-specific functions and study-specific variable names, the following notation is used:

• BayesiaLab-specific functions, keywords, commands, etc., are capitalized and printed in bold type. You can look up such terms in the BayesiaLab Library for more details.
• The names of variables, attributes, nodes, and node states are capitalized and italicized.

³ UC Irvine Machine Learning Repository.
Model Development

Data Import

Our modeling process begins with importing the database,⁴ which is formatted as a text file with comma-separated values. Therefore, we start with Data | Open Data Source | Text File.

The Data Import Wizard then guides us through the required steps. In the first dialogue box of the Data Import Wizard, we click on Define Typing and specify that we wish to set aside a Test Set from the database.

⁴ If we exclude the variable Sample code number, this database can also be used with the publicly available evaluation version of BayesiaLab, which is limited to a maximum of ten nodes. Deleting this variable does not affect the workflow or the results of the analysis.
Following common practice, we randomly select 20% of the 699 records as the Test Set; consequently, the remaining 80% will serve as our Learning Set.⁵

In the next step, the Data Import Wizard suggests the data format for each variable. Attributes 2 through 10 are identified as continuous variables, and Class is read as a discrete variable. Only for the first variable, Sample code number, do we have to specify Row Identifier, so it is not mistaken for a continuous predictor variable.

Next, the Information Panel reports that we have a total of 16 missing values in the entire dataset. We can also see that the column Bare Nuclei is labeled with a small question mark, indicating the presence of missing values in this particular column.

⁵ "Learning/Test Set" and "Learning/Test Sample" are used interchangeably in this paper.
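As an illustrative sketch of this sampling step (plain Python, not part of BayesiaLab), a fixed random seed reproduces the same 560/139 split on every run, which is the same reason the Fixed Seed option matters later when comparing results:

```python
import random

def split_learning_test(n, test_fraction=0.2, seed=42):
    """Randomly set aside a test set; the seed makes the split reproducible.
    Returns (learning_indices, test_indices)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * test_fraction)   # 20% of 699 -> 139 test records
    return idx[cut:], idx[:cut]

learn, test = split_learning_test(699)
print(len(learn), len(test))  # 560 139
```

The seed value 42 is an arbitrary choice for illustration; any fixed value gives a repeatable split.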
We now need to specify the type of Missing Values Imputation. Given the small size of the dataset and the small number of missing values, we choose the Structural EM method.⁶

A critical element of the data import process is the discretization of all continuous variables. On the next screen we click Select All Continuous to apply the same discretization algorithm across all continuous variables. Alternatively, we could choose the type of discretization individually by variable; however, we will not discuss this option any further in this paper.

As the objective of this exercise is classification, we choose the Decision Tree algorithm from the drop-down menu in the Multiple Discretization panel. This discretizes each variable for maximum information gain with respect to the Target Class.

⁶ For more details on missing values imputation with Bayesian networks, see Conrady and Jouffe (2012).
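To illustrate the idea behind decision-tree discretization, the following is a simplified, single-split sketch (not BayesiaLab's actual algorithm): it searches for the cut point on a continuous variable that maximizes information gain with respect to a binary class. The toy data is hypothetical:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split(values, labels):
    """Return the threshold on a continuous variable that maximizes
    information gain w.r.t. the class labels (one tree split)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between equal values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        if base - remainder > best_gain:
            best_gain = base - remainder
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_t

# Toy 1-10 scores: low values benign ("b"), high values malignant ("m")
values = [1, 2, 2, 3, 8, 9, 10, 10]
labels = ["b", "b", "b", "b", "m", "m", "m", "m"]
print(best_split(values, labels))  # 5.5
```

Applying such splits recursively, up to the desired number of intervals, yields class-aware bin boundaries rather than equal-width or equal-frequency bins.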
Bayesian networks are entirely non-parametric, probabilistic models, and their estimation requires a certain minimum number of observations. To help with the selection of the number of discretization levels (or Intervals), we use the heuristic of five observations per parameter and probability cell. Given that we have a relatively small database with only 560 observations,⁷ three discretization intervals for each variable appear to be an appropriate choice. If we used a higher number of Intervals, we would need more observations for a reliable estimation of the parameters.

Upon clicking Finish, we immediately see a representation of the newly imported database in the form of a fully unconnected Bayesian network in the Graph Panel. Each variable is now represented as a blue node.

⁷ 560 cases are in the training set (80%) and 139 are in the test set (20%).
The question mark symbol associated with the Bare Nuclei node indicates that there are missing values for this variable. Hovering over the question mark with the mouse pointer while pressing the "i" key will show the number of missing values.

Optionally, BayesiaLab can display an import report summarizing the obtained discretizations for all variables.
Unsupervised Learning

When exploring a new domain, we generally recommend performing Unsupervised Learning on the newly imported database. This is also the case here, even though our principal objective is predictive modeling, for which Supervised Learning will later be the main tool.

Learning | Unsupervised Structural Learning | EQ initiates the EQ algorithm, which is suitable for an initial review of the database. For larger databases with significantly more variables, the Maximum Weight Spanning Tree is a very fast algorithm and can be used instead.

Upon learning, the initial Bayesian network looks like this:

In its "raw" form, the crossing arcs make this network somewhat tricky to read. BayesiaLab has a number of layout algorithms that can quickly "disentangle" such a network and produce a much more user-friendly format.
We can select View | Automatic Layout, or alternatively use the shortcut "P".

Now we can visually review the learned network structure and compare it to our own domain knowledge. This allows for a "sanity check" of the database and the variables, and it may highlight any inconsistencies.

Beyond visually inspecting the network structure, BayesiaLab allows us to visualize the quantitative part of this network. To do this, we first need to switch into the Validation Mode by clicking on the highlighted button in the lower left-hand corner of the Graph Panel, or alternatively by using the "F5" key as a shortcut.

We can now display the Pearson Correlation between the nodes that are directly linked in the graph by selecting Analysis | Visual | Pearson's Correlation from the menu.
Each arc's thickness is now proportional to the Pearson Correlation between the connected nodes. Also, the blue and red colors indicate positive and negative correlations respectively. Any unexpected sign of correlation would thus become apparent very quickly. In our example, we only have positive correlations, and thus all arcs are blue.

Additionally, callouts indicate that further information can be displayed. We can opt to display this numerical information via View | Display Arc Comments.
This function is also available via a button in the menu.

Model 1: Markov Blanket

Now that we have performed an initial review of the dataset with the Unsupervised Learning step, we can return to the Modeling Mode by clicking on the corresponding button in the lower left-hand corner of the screen, or by using the shortcut "F4".⁸

This allows us to proceed to the modeling stage. Given our objective of predicting the state of the variable Class, i.e. benign versus malignant, we define Class as the Target Variable by right-clicking on the node and selecting Set as Target Variable from the contextual menu. Alternatively, we can double-click on Class while holding the shortcut key "T" pressed. We need to specify this explicitly so that the subsequent Supervised Learning algorithm can use Class as the dependent variable.

This setting is confirmed by the "bullseye" appearance of the new Target Node.

⁸ We will mostly omit further references to switching between Modeling Mode (F4) and Validation Mode (F5). The required modes can generally be inferred from the context.
Upon this selection, all Supervised Learning algorithms become available under Learning | Supervised Learning.

In many cases, the Markov Blanket algorithm is a good starting point for a predictive model. This algorithm is extremely fast and can even be applied to databases with thousands of variables and millions of records, even though database size is not a concern in this particular study.

Upon learning the Markov Blanket for Class, and once again applying the Automatic Layout, the resulting Bayesian network looks as follows:

Markov Blanket Definition

The Markov Blanket for a node A is the set of nodes composed of A's parents, its children, and its children's other parents (i.e. spouses). The Markov Blanket of node A contains all the variables which, if we know their states, shield node A from the rest of the network. This means that the Markov Blanket of a node is the only knowledge needed to predict the behavior of that node. Learning a Markov Blanket selects relevant predictor variables, which is particularly helpful when there is a large number of variables in the database. In fact, this can also serve as a highly efficient variable selection method in preparation for other types of modeling, e.g. neural networks.
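To make the definition concrete, here is a minimal sketch (plain Python, not BayesiaLab's implementation) that extracts a node's Markov Blanket from a DAG given as a list of (parent, child) edges; the toy DAG below is hypothetical:

```python
def markov_blanket(node, edges):
    """Markov blanket of `node`: its parents, its children,
    and its children's other parents (spouses)."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    spouses = {p for p, c in edges if c in children and p != node}
    return parents | children | spouses

# Toy DAG: A -> T, T -> B, C -> B  (C is a spouse of T via their child B)
edges = [("A", "T"), ("T", "B"), ("C", "B")]
print(sorted(markov_blanket("T", edges)))  # ['A', 'B', 'C']
```

Conditioning on {A, B, C} renders T independent of every other node in the network, which is exactly why the disconnected nodes in the learned model carry no additional predictive information.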
This network suggests that Class has a direct probabilistic relationship with all variables except Marginal Adhesion and Single Epithelial Cell Size, which are both disconnected. The lack of a connection with the Target indicates that these nodes are independent of Class given the nodes in the Markov Blanket.

Note: We can see in the graph learned earlier with the EQ algorithm that Uniformity of Cell Shape is the node that makes these two nodes conditionally independent of Class.

Beyond distinguishing between predictors (connected nodes) and non-predictors (disconnected nodes), we can further examine the relationships with the Target Node Class by highlighting the Mutual Information of the arcs connecting the nodes. This function is accessible within the Validation Mode via Analysis | Visual | Arcs' Mutual Information.
We will also go ahead and immediately select View | Display Arc Comments.

The thickness of the arcs is now proportional to the Mutual Information, i.e. the strength of the relationship between the nodes. Intuitively, Mutual Information measures the information that X and Y share: it measures how much knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent, then knowing X does not provide any information about Y and vice versa, so their Mutual Information is zero. At the other extreme, if X and Y are identical, then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa.

Formal Definition of Mutual Information

I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)
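The formula can be checked numerically with a short, self-contained sketch (illustrative only). It reproduces the two limiting cases described above: independence gives zero, and identical variables give the full entropy of one variable:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum over (x,y) of p(x,y) * log(p(x,y) / (p(x) p(y))), in nats.
    `joint` maps (x, y) pairs to probabilities p(x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():          # accumulate the marginals
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent binary variables: I(X;Y) = 0
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Identical binary variables: I(X;Y) = H(X) = log 2
ident = {(0, 0): 0.5, (1, 1): 0.5}
```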
In the top part of the comment box attached to each arc, the Mutual Information of the arc is shown. Expressed as a percentage and highlighted in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔ child node). And, at the bottom, we have the relative Mutual Information in the opposite direction of the arc (child node ➔ parent node).

Model 1: Performance

As we are not equipped with specific domain knowledge about the variables, we will not further interpret these relationships but rather run an initial test of the Network Performance. We want to know how well this Markov Blanket model can predict the states of the Class variable, i.e. Benign versus Malignant. This test is available via Analysis | Network Performance | Target.

Using our previously defined Test Set to validate the model, we obtain the following, rather encouraging results:
Of the 88 Benign cases in the test set, 3 were incorrectly identified, which corresponds to a false positive rate of 3.41%. More importantly, all 51 Malignant cases were identified correctly (true positives), with no false negatives. The overall performance can be expressed as the Total Precision, which is computed as the total number of correct predictions (true positives + true negatives) divided by the total number of cases in the Test Set, i.e. (85 + 51) ÷ 139 = 97.84%.

As the selection of the Learning Set and the Test Set during the data import process is random, BayesiaLab may learn slightly different networks based on different Learning Sets after each data import. Hence, your own network performance evaluation could deviate from what is shown above, unless you chose the same Fixed Seed for the random number generator when you defined Data Typing during the data import process.
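The Total Precision arithmetic can be verified in a few lines of Python; the confusion-matrix counts below simply mirror the test-set figures reported above and are not BayesiaLab output:

```python
def total_precision(confusion):
    """Total Precision = correct predictions / all cases.
    `confusion` maps (actual, predicted) -> count."""
    correct = sum(n for (actual, predicted), n in confusion.items()
                  if actual == predicted)
    return correct / sum(confusion.values())

# Test-set result from the paper: 85 of 88 Benign and all 51 Malignant correct
cm = {("Benign", "Benign"): 85, ("Benign", "Malignant"): 3,
      ("Malignant", "Malignant"): 51, ("Malignant", "Benign"): 0}
print(round(total_precision(cm) * 100, 2))  # 97.84
```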
K-Folds Cross-Validation

To mitigate sampling artifacts that may occur in a one-off test, we can systematically learn networks on a sequence of different subsets and then aggregate the test results. Analogous to the original papers on this topic, we perform K-Folds Cross-Validation, which iteratively selects K different Learning Sets and Test Sets and then, based on those, learns the networks and tests their performance.

The Cross-Validation can be started via Tools | Cross Validation | Targeted Evaluation | K-Folds.

We use the same learning algorithm as before, i.e. the Markov Blanket, and choose 10 as the number of sub-samples to be analyzed. Of the total dataset of 699 cases, each of the ten iterations creates a Test Set of 69 randomly drawn samples and uses the remaining 630 as the Learning Set. BayesiaLab learns one network per Learning Set and then tests its performance on the respective Test Set.
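As a sketch of the sampling scheme (not BayesiaLab's internal procedure), the following partitions the 699 records into 10 disjoint test folds; because 699 is not divisible by 10, fold sizes come out at 69 or 70:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, shuffled test folds.
    Each fold's complement serves as the corresponding Learning Set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(699, 10)
# Every record appears in exactly one test fold
print(sorted(len(f) for f in folds))  # fold sizes of 69 and 70
```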
The summary, including the synthesized results, is shown below.

These results confirm the good performance of this model. The Total Precision is 97%, with a false negative rate of 2%. This means 2% of the cases were predicted as Benign while they were actually Malignant.
Clicking Comprehensive Report produces a summary, which can also be saved in HTML format. This is convenient for subsequent editing, as the generated HTML file can be opened and edited as a spreadsheet.

Sampling Method: K-Folds
Learning Algorithm: Markov Blanket
Target: Class
Relative Gini Index Mean: 98.53%
Relative Lift Index Mean: 99.37%
Total Precision: 96.85%

Value                 Benign    Malignant
Gini Index            33.95%    64.59%
Relative Gini Index   98.50%    98.55%
Mean Lift             1.42      2.04
Relative Lift Index   99.74%    99%

Occurrences           Benign (458)   Malignant (241)
Benign (446)          441            5
Malignant (253)       17             236

Reliability           Benign (458)   Malignant (241)
Benign (446)          98.88%         1.12%
Malignant (253)       6.72%          93.28%

Precision             Benign (458)   Malignant (241)
Benign (446)          96.29%         2.07%
Malignant (253)       3.71%          97.93%

R: 0.93817485358
R²: 0.88017205588

As our Markov Blanket model is already performing at a level comparable to the models that have been published in the literature, we might be tempted to conclude our analysis at this point. However, we will attempt to see whether further performance improvements are possible.

Model 2: Augmented Markov Blanket

BayesiaLab offers an extension to the Markov Blanket algorithm, namely the Augmented Markov Blanket, which performs Unsupervised Learning on the nodes in the Markov Blanket. This allows identifying influence paths between the predictor variables and can potentially help improve the prediction performance.
This algorithm can be started via Learning | Supervised Learning | Augmented Markov Blanket.

As expected, the resulting network is somewhat more complex than the standard Markov Blanket.

If we save the original Markov Blanket and the new Augmented Markov Blanket under different file names, we can use Tools | Compare | Structure to highlight the differences between the two. Given that the addition of three arcs is immediately visible, this function may appear to be overkill for our particular example. However, in more complex situations, Structure Comparison can be rather helpful, so we will spell out the details. We choose the original network and the newly learned network as the Reference Network and the Comparison Network respectively.

Upon selection, a table provides a list of common arcs and those arcs that have been added in the Comparison Network, which was learned with the Augmented Markov Blanket algorithm:
Clicking Charts provides a visual representation of these differences. The additional arcs, compared to the original Markov Blanket network, are now highlighted in blue. Conversely, had any arcs been deleted, those would be shown in red.

Model 2a: Performance

We now proceed to the performance evaluation of this new Augmented Markov Blanket network, analogous to the Markov Blanket model: Analysis | Network Performance | Target.

Given that we had originally split the dataset into a Learning Set and a Test Set, the Network Performance evaluation is once again carried out separately on both subsets.
Interestingly, the performance on the Test Set is better than on the Learning Set. This indicates that overfitting is not a problem here.
A summary for either subset can be saved by clicking Comprehensive Report. The out-of-sample Test Set report is generally the more important one; it is shown below.

Target: Class
Relative Gini Index Mean: 99.53%
Relative Lift Index Mean: 99.85%
Total Precision: 98.56%

Value                 Benign    Malignant
Gini Index            36.52%    63.01%
Relative Gini Index   99.53%    99.53%
Mean Lift             1.45      1.99
Relative Lift Index   99.92%    99.79%

Occurrences           Benign (88)   Malignant (51)
Benign (86)           86            0
Malignant (53)        2             51

Reliability           Benign (88)   Malignant (51)
Benign (86)           100%          0%
Malignant (53)        3.77%         96.23%

Precision             Benign (88)   Malignant (51)
Benign (86)           97.73%        0%
Malignant (53)        2.27%         100%

R: 0.97499525394
R²: 0.95061574521

As with the earlier model, we repeat K-Folds Cross-Validation for the Augmented Markov Blanket. The results are shown below, first as a screenshot and then as a spreadsheet generated via Comprehensive Report.
Sampling Method: K-Folds
Learning Algorithm: Augmented Markov Blanket
Target: Class
Relative Gini Index Mean: 98.52%
Relative Lift Index Mean: 99.37%
Total Precision: 96.85%

Value                 Benign    Malignant
Gini Index            33.95%    64.58%
Relative Gini Index   98.50%    98.55%
Mean Lift             1.42      2.04
Relative Lift Index   99.75%    98.99%

Occurrences           Benign (458)   Malignant (241)
Benign (448)          442            6
Malignant (251)       16             235

Reliability           Benign (458)   Malignant (241)
Benign (448)          98.66%         1.34%
Malignant (251)       6.37%          93.63%

Precision             Benign (458)   Malignant (241)
Benign (448)          96.51%         2.49%
Malignant (251)       3.49%          97.51%

R: 0.93877413371
R²: 0.88129687412

Despite the greater complexity of this new network, we do not see an improvement in any of the performance measures.
Structural Coefficient

Up to this point, the difference in network complexity was only a function of the choice of learning algorithm. We will now address the Structural Coefficient (SC), which is the only parameter adjustable across all the learning algorithms in BayesiaLab. In essence, this parameter determines a kind of significance threshold, and thus it influences the degree of complexity of the induced networks.

By default, the Structural Coefficient is set to 1, which reliably prevents the learning algorithms from overfitting the model to the data. In studies with relatively few observations, the analyst's judgment is needed to determine a potential downward adjustment of this parameter. On the other hand, when datasets are very large, increasing the parameter to values higher than 1 will help manage the network complexity.

Given the fairly simple network structure of the Markov Blanket model, complexity was of no concern. The Augmented Markov Blanket is more complex, but still very manageable. The question is, could a more complex network provide greater precision without overfitting? To answer this question, we perform a Structural Coefficient Analysis, which generates several metrics that help in making the trade-off between complexity and precision: Tools | Cross Validation | Structural Coefficient Analysis.

BayesiaLab prompts us to specify the range of the Structural Coefficient to be examined and the number of iterations to be performed. It is worth noting that the Minimum Structural Coefficient should not be set to 0, or even close to 0. A value of 0 would imply a fully connected network, which can take a very long time to learn depending on the number of variables, or even exceed the memory capacity of the computer running BayesiaLab.

Number of Iterations determines the interval steps to be taken within the specified range of the Structural Coefficient. Given the relatively light computational load, we choose 25 iterations. With more complex models, we might be more conservative, as each iteration re-learns and re-evaluates the network. Furthermore, we select to compute all metrics.
The resulting report shows how the network changes as a function of the Structural Coefficient. This can be interpreted as the degree of confidence the analyst should have in any particular arc in the structure.
Clicking Graphs shows a synthesized network consisting of all structures generated during the iterative learning process.

The reference structure is represented by black arcs, which show the original network learned immediately prior to the start of the Structural Coefficient Analysis. The blue-colored arcs are not contained in the reference structure, but they appear in networks that have been learned as a function of the different Structural Coefficients (SC). The thickness of the arcs is proportional to the frequency with which individual arcs appear in the learned networks.

More important for us, however, is determining the correct level of network complexity for reliable and accurate prediction performance while avoiding overfitting the data. We can plot several different metrics in this context by clicking Curve.
Typically, the "elbow" of the L-shaped curve above identifies a suitable value for the Structural Coefficient (SC). More formally, we would look for the point on the curve where the second derivative is maximized. By visual inspection, an SC value of around 0.3 appears to be a good candidate for that point. The portion of the curve where SC values approach 0 shows the characteristic pattern of overfitting, which is to be avoided.

We will also plot the Target's Precision alone as a function of the SC. On the surface, the curve for the Learning Set resembles an L-shape too, but the curve moves only within roughly 2 percentage points, i.e. between 97% and 99%. For practical purposes, this means that the curve is virtually flat.
As a result, the Structure/Target's Precision Ratio, i.e.

Structure / Target's Precision

is primarily a function of the numerator, i.e. the Structure, as the denominator, Target's Precision, is nearly constant across a wide range of SC values, as per the graph above.

If both Learning and Test Sets are available, a Validation Measure γ can be computed to help choose the most appropriate Structural Coefficient. This measure is based on the Test Set's mean negative log-likelihood (returned by the network learned from the Learning Set) and on the variances of the negative log-likelihood of the Test Set and Learning Set (returned by the network learned from the Learning Set):

γ = μ_LL,Test × max(1, σ²_LL,Test / σ²_LL,Learning)

The range between roughly 0.3 and 0.6, i.e. the section around the minimum of the curve, suggests suitable values for the Structural Coefficient.
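The measure γ can be computed directly from the per-record negative log-likelihoods of the two sets. A minimal sketch; the arrays below are made-up values for illustration (in practice BayesiaLab returns these quantities internally):

```python
import numpy as np

def validation_measure(nll_test, nll_learning):
    """gamma = mean(NLL_test) * max(1, var(NLL_test) / var(NLL_learning)).
    Penalizes structures whose likelihoods are more dispersed on the Test
    Set than on the Learning Set; lower values are better."""
    nll_test = np.asarray(nll_test, dtype=float)
    nll_learning = np.asarray(nll_learning, dtype=float)
    ratio = nll_test.var() / nll_learning.var()
    return nll_test.mean() * max(1.0, ratio)

# hypothetical per-record negative log-likelihoods
gamma = validation_measure([2.1, 1.9, 2.4, 2.0], [1.8, 2.0, 1.9, 2.1])
print(round(gamma, 2))  # -> 5.88
```

One would evaluate γ for each candidate SC value and pick the SC near the minimum, as the curve discussion above suggests.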
Model 2b: Augmented Markov Blanket (SC=0.3)

Given the results from the Structural Coefficient Analysis, we now wish to relearn the network with an SC value of 0.3. The SC value can be set by right-clicking on the background of the Graph Panel and then selecting Edit Structural Coefficient from the Contextual Menu, or alternatively via the menu, i.e. Edit | Edit Structural Coefficient.

Once we relearn the network, using the same Augmented Markov Blanket algorithm as before, we obtain a more complex network. The key question is, will this increase in complexity improve the performance or perhaps be counterproductive?
Model 2b: Performance

We repeat the Network Performance Analysis and generate the Comprehensive Report for the Test Set.

Target: Class
Relative Gini Index Mean: 99.75%
Relative Lift Index Mean: 99.93%
Total Precision: 98.56%
R: 0.97908818201
R²: 0.95861366815

Value                 Benign    Malignant
Gini Index            36.60%    63.15%
Relative Gini Index   99.75%    99.75%
Mean Lift             1.45      1.99
Relative Lift Index   99.96%    99.90%

Occurrences           Benign (88)   Malignant (51)
Benign (86)           86            0
Malignant (53)        2             51

Reliability           Benign (88)   Malignant (51)
Benign (86)           100%          0%
Malignant (53)        3.77%         96.23%

Precision             Benign (88)   Malignant (51)
Benign (86)           97.73%        0%
Malignant (53)        2.27%         100%
Secondly, we perform K-Folds Cross-Validation:

Sampling Method: K-Folds
Learning Algorithm: Augmented Markov Blanket
Target: Class
Relative Gini Index Mean: 98.28%
Relative Lift Index Mean: 99.37%
Total Precision: 96.71%
R: 0.94052337963
R²: 0.88458422762

Value                 Benign    Malignant
Gini Index            33.86%    64.42%
Relative Gini Index   98.28%    98.28%
Mean Lift             1.42      2.04
Relative Lift Index   99.69%    99.05%

Occurrences           Benign (458)   Malignant (241)
Benign (447)          441            6
Malignant (252)       17             235

Reliability           Benign (458)   Malignant (241)
Benign (447)          98.66%         1.34%
Malignant (252)       6.75%          93.25%

Precision             Benign (458)   Malignant (241)
Benign (447)          96.29%         2.49%
Malignant (252)       3.71%          97.51%

Conclusion

All models reviewed, Model 1 (Markov Blanket), Model 2a (Augmented Markov Blanket, SC=1), and Model 2b (Augmented Markov Blanket, SC=0.3), have performed at very similar levels in terms of classification performance. Total Precision and false positives/negatives are shown as the key metrics in the summary table below.

Summary

                                     Test Set (n=139)                 10-Fold Cross-Validation (n=699)
                                     Total      False      False      Total      False      False
                                     Precision  Positives  Negatives  Precision  Positives  Negatives
Markov Blanket (SC=1)                97.84%     3          0          96.85%     17         5
Augmented Markov Blanket (SC=1)      98.56%     2          0          96.85%     16         6
Augmented Markov Blanket (SC=0.3)    98.56%     2          0          96.71%     17         6

Reestimating these models with more observations could potentially change the results and might more clearly differentiate the classification performance. For now, we select the Augmented Markov Blanket (SC=1), and it will serve as the basis for the next section of this paper, Model Inference.
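The key metrics in the summary table follow directly from the confusion matrices above. A minimal sketch, treating Malignant as the positive class and using the Test Set counts of Model 2b (the function name is ours, not BayesiaLab's):

```python
def classification_summary(tp, fp, fn, tn):
    """Total Precision is overall accuracy; false positives are benign
    cases classified malignant, false negatives the reverse."""
    total = tp + fp + fn + tn
    return {
        "total_precision": (tp + tn) / total,
        "false_positives": fp,
        "false_negatives": fn,
    }

# Model 2b, Test Set: 51 malignant correctly flagged, 2 benign flagged
# malignant, 0 malignant missed, 86 benign correctly cleared
print(classification_summary(tp=51, fp=2, fn=0, tn=86))
# -> {'total_precision': 0.9856..., 'false_positives': 2, 'false_negatives': 0}
```

Note that (51 + 86) / 139 reproduces the 98.56% Total Precision reported in the Comprehensive Report.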
Model Inference

Without further discussion of the merits of each model specification, we will now show how the learned Augmented Markov Blanket model can be applied in practice and used for inference. First, we need to go to Validation Mode (F5). We can now bring up all the Monitors in the Monitor Panel by selecting all the nodes (Ctrl+A) and double-clicking on any one of them. More conveniently, the Monitors can be displayed by right-clicking inside the Monitor Panel and selecting Sort | Target Correlation from the Contextual Menu. Alternatively, we can do the same via Monitor | Sort | Target Correlation.

Monitors are then automatically created for all the nodes correlated with the Target Node. The Monitor of the Target Node is placed first in the Monitor Panel, followed by the other Monitors in order of their correlation with the Target Node, from highest to lowest.
Interactive Inference

For instance, we can now use BayesiaLab to review the individual predictions made based on the model. This feature is called Interactive Inference, which can be accessed from the menu via Inference | Interactive Inference. Also, we have a choice of using either the Learning Set or the Test Set for inference. For our purposes, we choose the Test Set.

The Navigation Bar allows scrolling through each record of the Test Set. Record #0 can be seen below with all the associated observations highlighted in green. Given the observations shown, the model predicts a
99.97% probability that Class is Benign (the Monitor of the Target Node is highlighted in red). Most cases are rather clear-cut, as above, with probabilities for either diagnosis around 99% or higher. However, there are a number of exceptions, such as case #11. Here, the probability of malignancy is approximately 75%.

Adaptive Questionnaire

In situations when only individual cases are under review, rather than a batch of cases from a database, BayesiaLab can provide case-by-case diagnosis support with the Adaptive Questionnaire.

For a Target Node with more than two states, the Adaptive Questionnaire requires that we define a Target State. Setting the Target State allows BayesiaLab to compute Binary Mutual Information and then focus
on the defined Target State. Technically, setting the Target State is not necessary in our particular example, as the Target Node is binary.

The Adaptive Questionnaire can be started from the menu via Inference | Adaptive Questionnaire. We can set Based on a Target State to Malignant, as we want to highlight this particular state.

Furthermore, we can set the cost of collecting observations via the Cost Editor, which can be started via the Edit Costs button. This is helpful when certain observations are more costly to obtain than others.⁹ Unfortunately, our example is not ideally suited to illustrate this feature, as the FNA attributes are all collected at the same time, rather than consecutively. However, one can imagine that in other contexts a physician will start the diagnosis process by collecting easy-to-obtain data, such as blood pressure, before proceeding to more elaborate (and more expensive) diagnostic techniques, such as performing an angiogram.

⁹ Beyond monetary measures, “cost” could reflect, for instance, the degree of pain associated with a surgical procedure.
Once the Adaptive Questionnaire is started, BayesiaLab presents the Monitor of the Target Node (red) and its marginal probability, with the Target State highlighted. Again, as shown below, the Monitors are automatically ordered in the sequence of their importance, from high to low, with regard to diagnosing the Target State of the Target Node.

This means that the ideal first piece of evidence is Uniformity of Cell Size. Let us suppose this metric is equal to 3 (<=4.5) for the case under investigation. Upon setting this first observation, BayesiaLab will compute the new probability distribution of the Target Node, given the evidence. We see that the probability of Class=Malignant has increased to 58.53%. Given the evidence, BayesiaLab also recomputes the ideal new order of questions and now presents Bare Nuclei as the next most relevant question.

Let us now assume that Bare Nuclei is not available for observation. We instead set the node Clump Thickness to Clump Thickness<=4.5.
Given this latest piece of evidence, the probability distribution of Class is once again updated, as is the array of questions. The small gray arrows inside the Monitors indicate how the probabilities have changed compared to the prior iteration.

It is important to point out that not only the Target Node is updated as we set evidence. Rather, all nodes are updated upon setting evidence, reflecting the omnidirectional nature of inference within a Bayesian network.

We can continue this process of updating until we have exhausted all available evidence, or until we have reached an acceptable level of certainty regarding the diagnosis.

Target Interpretation Tree

Although its tree structure is not displayed, the Adaptive Questionnaire is a dynamic tree for seeking evidence. More specifically, it is a tree that applies to one specific case, given its observed evidence. The Target Interpretation Tree, by contrast, is a static tree that is induced from all cases. As such, it provides a more general approach to searching for the optimum sequence of gathering evidence.
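Both evidence-seeking features rest on the same idea: rank the candidate observations by their (conditional) mutual information with the Target Node and ask for the most informative one first. A minimal greedy sketch of that loop, using a made-up joint distribution in place of the learned network (all names and numbers are illustrative and do not reflect BayesiaLab's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical joint over three binary findings and a binary Class node
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()
TARGET = 3  # last axis is the Class node

def marginal(p, axes):
    """Sum out every axis not listed in `axes` (original axis order kept)."""
    return p.sum(axis=tuple(a for a in range(p.ndim) if a not in axes))

def mi_with_target(p, f):
    """(Conditional) mutual information I(finding f; Class) under joint p."""
    pxy = marginal(p, (f, TARGET))
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())

# greedy questionnaire: ask for the currently most informative finding,
# then condition the joint on the observed state
remaining = [0, 1, 2]
while remaining:
    best = max(remaining, key=lambda f: mi_with_target(joint, f))
    observed = 1  # stand-in for the clinician's actual observation
    mask = np.zeros(2)
    mask[observed] = 1.0
    shape = [1] * joint.ndim
    shape[best] = 2
    joint = joint * mask.reshape(shape)  # zero out contradicted states
    joint /= joint.sum()                 # renormalize on the evidence
    remaining.remove(best)

posterior = marginal(joint, (TARGET,))   # P(Class | all three observations)
```

Conditioning drives the observed finding's mutual information with the target to zero, which is exactly why a node with evidence set shows a Conditional Mutual Information of 0 in the Mapping analysis discussed later.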
The Target Interpretation Tree can be started from the menu via Analysis | Target Interpretation Tree. Upon starting this function, we need to set several options. We define the Search Stop Criteria, setting the Maximum Size of Evidence to 3 and the Minimum Joint Probability to 1 (percent). Furthermore, we check the Center on State box and select Malignant from the drop-down menu. This way, Malignant will be highlighted in each node of the to-be-generated tree.

By default, the tree is presented in a top-down format. Often, it may be more convenient to change the layout to a left-to-right format via the Switch Position button in the upper left-hand corner of the window that contains the tree.
The following tree is presented in the left-to-right layout. This tree prescribes in which sequence evidence should be sought to gain the maximum amount of information towards a diagnosis. Going from left to right, we see how the probability distribution for Class changes given the evidence set thus far.

The leftmost node in the tree, without any evidence set, shows the marginal probability distribution of Class. The bottom panel of this node shows Uniformity of Cell Size as the most important evidence to seek.
The three branches that emerge from the node represent the possible states of Uniformity of Cell Size, i.e. the hard evidence we can observe. If we set evidence analogously to what we did in the Adaptive Questionnaire, we will choose the middle branch with the value Uniformity of Cell Size<=4.5 (2/3).

This evidence updates the probabilities of the Target State, now predicting a 58.53% probability of Class=Malignant. At the same time, we can see the next best piece of evidence to seek. Here, it is Bare Nuclei, which will provide the greatest information gain towards the diagnosis of Class. The information gain is quantified with the Score displayed at the bottom of the node.

The Score is the Conditional Mutual Information of the node Bare Nuclei with regard to the Target Node, divided by the cost of observing the evidence if the option Utilize Evidence Cost was checked. In our case, as we did not check this option, the Score is equal to the Conditional Mutual Information.

We can quickly verify the Score of 7.1% by running the Mapping function. First, we set the evidence on Uniformity of Cell Size (<=4.5) and then run Analysis | Visual | Mapping.
The Mapping window features drop-down menus for Node Analysis and Arc Analysis. However, we are only interested in Node Analysis, and we select Mutual Information with the Target Node as the metric to be displayed.

The size of the nodes, beyond a fixed minimum size,¹⁰ is now proportional to the Mutual Information with the Target Node. To see the specific values, we right-click on the background of the window and select Display Scores on Nodes from the Contextual Menu.

¹⁰ The minimum and maximum sizes can be changed via Edit Sizes from the Contextual Menu in the Mapping Window.
This shows us that, given Uniformity of Cell Size<=4.5, the Mutual Information of Bare Nuclei with the Target Node is 0.0711, or 7.1%. Note that the node on which evidence has already been set, i.e. Uniformity of Cell Size, shows a Conditional Mutual Information of 0.

So, learning Bare Nuclei will bring the highest information gain among the remaining variables. For instance, if we now observed Bare Nuclei>5.5 (3/3), the probability of Class=Malignant would reach 98.33%.
Finally, BayesiaLab also reports the joint probability of each tree node, i.e. the probability that all pieces of evidence in a branch, up to and including that tree node, would occur. This tells us that the joint probability of Uniformity of Cell Size<=4.5 and Bare Nuclei>5.5 is 5.32%.

As opposed to this somewhat artificial illustration of a Target Interpretation Tree in the context of FNA-based diagnosis, Target Interpretation Trees are often prepared for emergency situations, such as triage classification, in which rapid diagnosis with constrained resources is essential. We believe that our example still conveys the idea of “optimum escalation” in obtaining evidence towards a diagnosis.

Summary

By using Bayesian networks as the framework and BayesiaLab as the tool, we have shown a practical new modeling and analysis approach based on the widely studied Wisconsin Breast Cancer Database. BayesiaLab can rapidly machine-learn reliable models, even without prior domain knowledge and without a hypothesis. The classification performance of the BayesiaLab-generated Bayesian network models is on par with all studies on this topic published to date. Beyond the predictive performance, BayesiaLab enables a range of analysis and interpretation functions, which can help the researcher gain deeper domain knowledge and perform inference more efficiently.
Appendix

Framework: The Bayesian Network Paradigm¹¹

Acyclic Graphs & Bayes's Rule

Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such models are known as directed graphical models; within cognitive science and artificial intelligence, such models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose rule for updating probabilities in the light of new evidence is the foundation of the approach.

Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated case of continuous probability distributions. In the discrete case, Bayes' theorem relates the conditional and marginal probabilities of events A and B, provided that the probability of B does not equal zero:

P(A|B) = P(B|A) P(A) / P(B)

In Bayes' theorem, each probability has a conventional name:

P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense that it does not take into account any information about B; however, event B need not occur after event A. In the nineteenth century, the unconditional probability P(A) in Bayes's rule was called the “antecedent” probability; in deductive logic, the antecedent set of propositions and the inference rule imply consequences. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.

P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.

P(B|A) is the conditional probability of B given A.
It is also called the likelihood.

P(B) is the prior or marginal probability of B, and acts as a normalizing constant.

Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

The initial development of Bayesian networks in the late 1970s was motivated by the need to model the top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing earlier, ad hoc rule-based schemes.

¹¹ Adapted from Pearl (2000), used with permission.
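As a worked illustration of Bayes' theorem above, consider a hypothetical diagnostic test; all numbers are invented for the example and are not taken from the WBCD study:

```python
# Hypothetical prior and conditionals for a diagnostic test
p_malignant = 0.10              # P(A): prior probability of malignancy
p_pos_given_malignant = 0.95    # P(B|A): probability of a positive test if malignant
p_pos_given_benign = 0.08       # false-positive rate of the test

# P(B) by the law of total probability (the normalizing constant)
p_pos = (p_pos_given_malignant * p_malignant
         + p_pos_given_benign * (1 - p_malignant))

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
posterior = p_pos_given_malignant * p_malignant / p_pos
print(round(posterior, 3))  # -> 0.569
```

Despite the test's high sensitivity, the low prior keeps the posterior probability of malignancy near 57%, which is why sequential evidence-gathering, as in the Adaptive Questionnaire, matters.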
The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the gender of a patient, a feature of an object, the occurrence of an event), and the links represent statistical (informational) or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node given its parents in the network. The network supports the computation of the posterior probabilities of any subset of variables given evidence about any other subset.

Compact Representation of the Joint Probability Distribution

“The central paradigm of probabilistic reasoning is to identify all relevant variables x1, . . . , xN in the environment [i.e. the domain under study], and make a probabilistic model p(x1, . . . , xN) of their interaction [i.e. represent the variables' joint probability distribution].”

Bayesian networks are very attractive for this purpose as they can, by means of factorization, compactly represent the joint probability distribution of all variables.

“Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and subsequently computing probabilities of interest, conditioned on this evidence. The rules of probability, combined with Bayes' rule, make for a complete reasoning system, one which includes traditional deductive logic as a special case.” (Barber, 2012)
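The factorization mentioned above is what makes the representation compact: instead of storing the full joint table, a network stores one conditional probability table per node. A minimal sketch for a three-node chain X → Y → Z, with made-up conditional probability tables:

```python
import numpy as np

# CPTs for the chain X -> Y -> Z (hypothetical numbers)
p_x = np.array([0.7, 0.3])                   # P(X)
p_y_x = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(Y|X), rows indexed by X
p_z_y = np.array([[0.6, 0.4], [0.3, 0.7]])   # P(Z|Y), rows indexed by Y

# joint via the factorization P(X, Y, Z) = P(X) P(Y|X) P(Z|Y)
joint = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]
assert np.isclose(joint.sum(), 1.0)

# inference against the arc direction: P(X | Z=1)
p_xz = joint.sum(axis=1)                     # marginalize out Y -> P(X, Z)
posterior_x = p_xz[:, 1] / p_xz[:, 1].sum()  # condition on Z=1
```

For N binary nodes, a full joint table needs 2^N − 1 parameters, whereas this chain needs only 1 + 2 + 2 = 5, and the same stored CPTs support inference in any direction, as the P(X | Z=1) computation shows.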
References

Abdrabou, E. A. M. L., and A. E. B. M. Salem. “A Breast Cancer Classifier Based on a Combination of Case-Based Reasoning and Ontology Approach” (n.d.).

Conrady, Stefan, and Lionel Jouffe. “Missing Values Imputation: A New Approach to Missing Values Processing with Bayesian Networks,” January 4, 2012.

, E. A., K. A. Faisal, T. Helmy, F. Azzedin, and A. Al-Suhaim. “Evaluation of Breast Cancer Tumor Classification with Unconstrained Functional Networks Classifier.” In The 4th ACS/IEEE International Conf. on Computer Systems and Applications, 281–287, 2006.

Hung, M. S., M. Shanker, and M. Y. Hu. “Estimating Breast Cancer Risks Using Neural Networks.” Journal of the Operational Research Society 53, no. 2 (2002): 222–231.

Karabatak, M., and M. C. Ince. “An Expert System for Detection of Breast Cancer Based on Association Rules and Neural Network.” Expert Systems with Applications 36, no. 2 (2009): 3465–3469.

Mangasarian, Olvi L., W. Nick Street, and William H. Wolberg. “Breast Cancer Diagnosis and Prognosis via Linear Programming.” Operations Research 43 (1995): 570–577.

Mu, T., and A. K. Nandi. “Breast Cancer Diagnosis from Fine-Needle Aspiration Using Supervised Compact Hyperspheres and Establishment of Confidence of Malignancy” (n.d.).

Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.

Pearl, Judea, and Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, November 2000.

Wolberg, W. H., W. N. Street, D. M. Heisey, and O. L. Mangasarian. “Computer-derived Nuclear Features Distinguish Malignant from Benign Breast Cytology.” Human Pathology 26, no. 7 (1995): 792–796.

Wolberg, William H., W. Nick Street, and O. L. Mangasarian. “Machine Learning Techniques to Diagnose Breast Cancer from Image-Processed Nuclear Features of Fine Needle Aspirates” (n.d.).

Wolberg, William H., W. Nick Street, and Olvi L. Mangasarian. “Breast Cytology Diagnosis Via Digital Image Analysis” (1993).
Contact Information

Bayesia USA
312 Hamlet's End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
info@bayesia.us
www.bayesia.us

Bayesia Singapore Pte. Ltd.
20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 2690
info@bayesia.sg
www.bayesia.sg

Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com

Copyright © 2013 Bayesia S.A.S., Bayesia USA and Bayesia Singapore. All rights reserved.