Modeling Vehicle Choice and Simulating Market Share
with Bayesian Networks
A case study about predicting the U.S. market share of the Porsche Panamera using
the Bayesia Market Simulator
Stefan Conrady, stefan.conrady@bayesia.us
Dr. Lionel Jouffe, jouffe@bayesia.com
December 18, 2010
Revised April 20, 2013
www.bayesia.us
Table of Contents
Modeling Vehicle Choice and Simulating Market Share with Bayesian Networks
Abstract
Objective
About the Authors
Stefan Conrady
Lionel Jouffe
Acknowledgements
Introduction
Bayesian Networks for Choice Modeling
Case Study
Porsche Panamera
Common Forecasting Practices
Tutorial
Notation
Data Preparation
Consumer Research
Variable Selection
Set of Choice Alternatives
Filtered Values (Censored States)
Data Modeling
Data Import
Missing Values
Discretization
Variable Classes and Forbidden Arcs
Unsupervised Learning
Simulation
Product Scenario Baseline
Product Scenario Simulation
Substitution and Cannibalization
Market Scenario Simulation
Simulating Market Share with the Bayesia Market Simulator
 www.bayesia.us | www.bayesia.sg
Limitations
Outlook
Summary
Appendix
Utility-Based Choice Theory
Multinomial Logit Models
Stated Preference Data
Revealed Preference Data
NVES Variables
Framework: The Bayesian Network Paradigm
Acyclic Graphs & Bayes’s Rule
Compact Representation of the Joint Probability Distribution
References
Contact Information
Bayesia USA
Bayesia Singapore Pte. Ltd.
Bayesia S.A.S.
Copyright
Modeling Vehicle Choice and Simulating Market
Share with Bayesian Networks
Abstract
We present a new method and the associated workflow for estimating market shares of future products
based exclusively on pre-introduction data, such as syndicated studies conducted prior to product launch.
Our approach provides a highly practical, fast and economical alternative to conducting new primary re-
search.
With Bayesian networks as the framework, and by employing the BayesiaLab and Bayesia Market Simulator
software packages, this approach helps market researchers and product planners to reliably perform market
share simulations on their desktop computers1 , which would have been entirely inconceivable in the past.
This innovative approach is explained step-by-step in a study about the introduction of the new Porsche
Panamera in the U.S. market. The results confirm that market share simulation with Bayesian networks is
feasible even in niche markets that provide relatively few observations.
We believe that making this method and the tools accessible to practitioners is an important contribution to
real-world marketing. We are confident that for many companies this approach can yield a step-change in
their forecasting ability.
Objective
This tutorial is intended for marketing practitioners who are exploring the use of Bayesian networks for
their work. The example in this tutorial is meant to illustrate the capabilities of BayesiaLab with a real-
world case study and actual consumer data. Beyond market researchers, analysts in many fields will hope-
fully find the proposed methodology valuable and intuitive. In this context, many of the technical steps are
outlined in great detail, such as data preparation and network learning, as they are applicable to research
with BayesiaLab in general, regardless of the domain.
This paper is part of a series of tutorials that explore a broad range of real-world applications of
Bayesian networks.
About the Authors
Stefan Conrady
Stefan Conrady is the Managing Partner of Bayesia USA, which he co-founded in 2010. Bayesia USA serves
as the North American sales and consulting organization for France-based Bayesia S.A.S. Their mission is to
promote Bayesian networks as a new research framework for knowledge discovery and reasoning within
complex domains.
1 BayesiaLab and Bayesia Market Simulator can run on a wide range of operating systems, including Windows, OS X,
Linux/Unix, etc.
Stefan studied Electrical Engineering and has extensive management experience in the fields of product
planning, marketing and analytics, working at Daimler and BMW Group in Europe, North America and
Asia. Prior to establishing Bayesia USA, he was heading the Analytics & Forecasting group at Nissan North
America.
Lionel Jouffe
Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia S.A.S. Lionel Jouffe holds a Ph.D. in
Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and
his team have been developing BayesiaLab since 1999, and it has emerged as the leading software package
for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys
broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian
networks, especially in the context of market research, is highlighted by Bayesia’s strategic partnership with
Procter & Gamble, which has deployed BayesiaLab globally since 2007.
Acknowledgements
Strategic Vision, Inc.2 (SVI) has generously made their 2009 New Vehicle Experience Survey available as a
data source for this case study. In this context, special thanks go to Alexander Edwards, President, Automo-
tive Division of Strategic Vision.
We would also like to thank Jeff Dotson3, John Fitzgerald4 and Frank Koppelman5 for their ongoing coach-
ing and their valuable comments on this paper. However, all errors remain the responsibility of the authors.
Finally, Kenneth Train’s6 books and articles have been very helpful over the years as we explored the field of
consumer choice modeling.
Introduction
For the vast majority of businesses, market share is a key performance indicator. Market share is used as a
metric that allows comparing competitive performance independently from overall market size and its fluc-
tuations.
2 www.strategicvision.com
3 Assistant Professor of Marketing, Vanderbilt University, Owen Graduate School of Management.
4 President, Fitzgerald Brunetti Productions, Inc., New York.
5 Professor Emeritus of Civil and Environmental Engineering, Robert R. McCormick School of
Engineering and Applied Science, Northwestern University.
6 Adjunct Professor of Economics and Public Policy, University of California, Berkeley.
In the product planning process, the expected market share is critical, along with the overall market fore-
cast, as together they define the sales volume expectation. For obvious reasons, sales volume is a key ele-
ment in most business cases.
As a result, it is critical for decision makers to correctly predict the future market shares of products not yet
developed. The task of such market share forecasts typically falls to marketing and market research
departments, which are most closely involved with understanding consumer behavior and, more specifically,
the product choices consumers make.
If we fully understood the consumer’s decision making process and observed all components of it, we could
simply generate a deterministic model for predicting future consumer choices. However, we do not and it is
obvious that many elements contributing to a consumer’s purchase decision are inherently unobservable.
Despite our limited comprehension of the true human choice process, there are a number of tools that still
allow modeling consumer choice with what is observable, and accounting for what will remain unknow-
able. In this context, and based on the seminal works of Nobel-laureate Daniel McFadden7, choice modeling
has emerged as an important tool in understanding and simulating consumer choice.
Such choice models serve as a representation of the “real world” and thus become what Judea Pearl likes to
call “oracles” that allow us to “deliberately reason about the consequences of actions we have not yet
taken.”8
Bayesian Networks for Choice Modeling
Using Bayesian networks9 as the general framework for modeling a domain or system has many advantages,
which Darwiche (2010) summarizes as follows:
“Bayesian networks provide a systematic and localized method for structuring probabilistic information
about a situation into a coherent whole […]”
“Many applications can be reduced to Bayesian network inference, allowing one to capitalize on Bayesian
network algorithms instead of having to invent specialized algorithms for each new application.”
Given the very attractive properties of Bayesian networks for representing a wide range of problem
domains, it seems appropriate to apply them to choice modeling too. In particular, the BayesiaLab software
package has made it very convenient to automatically machine-learn fairly large and complex Bayesian
networks from observational data.
7 Daniel McFadden received, jointly with James Heckman, the 2000 Nobel Memorial Prize in Economic Sciences;
McFadden’s share of the prize was “for his development of theory and methods for analyzing discrete choice”.
8 A recurring quote from Judea Pearl’s many lectures on causality.
9 A Bayesian network is a graphical model that represents the joint probability distribution over a set of random vari-
ables and their conditional dependencies via a directed acyclic graph (DAG). See the appendix for a brief introduction.
Beyond the convenience and speed of estimating Bayesian networks with BayesiaLab, there are three fun-
damental differences in modeling consumer choice with Bayesian networks compared to traditional discrete
choice models.10
Whereas utility-based choice models, such as multinomial logit models (MNL), will “flatten” the vector of
attribute utilities into a single scalar value, Bayesian networks do not inherently restrict all the dimensions
relating to choice. For example, learning a Bayesian network from observed vehicle choices might reveal
that fuel economy and vehicle price are subject to tradeoff, while safety might be a nonnegotiable basic
requirement for the consumer. Correctly recognizing such dynamics is obviously critical for making
predictions about future consumer choices.
Bayesian networks are nonparametric and, therefore, do not require the specification of a functional form.
No assumptions need to be made regarding the form of links between variables. Thus, potentially nonlinear
patterns are not an issue for model estimation or simulation.
Bayesian networks are inherently probabilistic, and, as such, there is no need to specify an error term. In a
traditional choice model, an error term would be needed to make it non-deterministic.
In BayesiaLab all computations are natively discrete and thus no transformation functions, such as logit or
probit, are needed. Given that we are dealing with discrete consumer choices, this all-discrete approach is an
advantage.
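For contrast with the first point above, the multinomial logit form can be sketched in a few lines of Python. This is only an illustration: the attribute names, coefficient values and alternatives are hypothetical and were not estimated from any data. The sketch shows how several attributes are collapsed into one scalar utility per alternative before choice probabilities are computed.

```python
import math


def mnl_probabilities(alternatives, coefficients):
    """Return multinomial logit choice probabilities.

    alternatives: dict mapping alternative name -> dict of attribute values
    coefficients: dict mapping attribute name -> utility weight
    """
    # Each alternative's attributes are "flattened" into one scalar utility.
    utilities = {
        name: sum(coefficients[attr] * value for attr, value in attrs.items())
        for name, attrs in alternatives.items()
    }
    # Subtract the maximum utility for numerical stability before exponentiating.
    max_u = max(utilities.values())
    exp_u = {name: math.exp(u - max_u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}


# Hypothetical attribute values and coefficients (illustration only)
alternatives = {
    "Vehicle A": {"price_10k": 9.0, "mpg": 20.0},
    "Vehicle B": {"price_10k": 12.0, "mpg": 18.0},
}
coefficients = {"price_10k": -0.3, "mpg": 0.05}

mnl_probs = mnl_probabilities(alternatives, coefficients)
```

In this form, a shortfall in one attribute can always be compensated by another, which is the restriction that a learned Bayesian network does not impose by construction.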
For our case study, we use BayesiaLab 5.0 Professional Edition to learn a Bayesian network from consumer
choices in the form of stated preference (SP) or revealed preference (RP) data.11 ,12 The learned Bayesian
network allows us to compute the posterior probability distribution in each choice situation, including hy-
pothetical product alternatives (and even hypothetical consumers). As a result, we obtain a choice probabil-
ity as a function of product and consumer attributes.
In order to obtain a product’s projected market share, we then need to simulate choice probabilities across
all product scenarios and across all individuals in the population under study. For this specific purpose,
Bayesia S.A.S. has developed the Bayesia Market Simulator, which uses the Bayesian networks generated by
BayesiaLab. Both tools will play a central role in this case study.
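The aggregation step described above can be illustrated with a minimal Python sketch that computes market shares as the weighted average of per-respondent choice probabilities. The probabilities, the weights and the function name are hypothetical; the internal procedure of the Bayesia Market Simulator is not described in this section and may differ.

```python
def market_shares(choice_probs, weights):
    """Weighted average of per-respondent choice probabilities.

    choice_probs: list of dicts mapping product name -> choice probability
    weights: list of survey weights, one per respondent
    """
    total_weight = sum(weights)
    weighted_sums = {}
    for probs, w in zip(choice_probs, weights):
        for product, p in probs.items():
            weighted_sums[product] = weighted_sums.get(product, 0.0) + w * p
    return {product: s / total_weight for product, s in weighted_sums.items()}


# Hypothetical posterior choice probabilities for three respondents
choice_probs = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.2, "B": 0.5, "C": 0.3},
    {"A": 0.1, "B": 0.1, "C": 0.8},
]
weights = [10.0, 30.0, 20.0]  # hypothetical survey weights

shares = market_shares(choice_probs, weights)
```

Because each respondent's probabilities sum to one, the resulting shares also sum to one across the products in the scenario.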
Case Study
To illustrate the entire market share estimation process with Bayesian networks, we have derived a case
study from the U.S. auto industry. More specifically, we will model consumer choice behavior in the
high-end vehicle market based on 2009 survey data. This is an interesting point in time as it precedes the launch
of the new Porsche Panamera in model year 2010 (MY 2010), which will be the focus of our study.
10 A very brief overview about utility-based choice models is provided in the appendix.
11 The properties of Stated Preference (SP) and Revealed Preference (RP) data are explained in the appendix.
12 Although we focus here exclusively on machine-learning consumer behavior, within the BayesiaLab framework we
can also utilize expert knowledge about consumer behavior. For instance, vehicle dealers and their salespeople will have
extensive knowledge about how consumers behave in the showroom. A special Knowledge Elicitation module in
BayesiaLab can formally capture such expertise and build a new Bayesian network from it or augment an existing one.
Knowledge Elicitation with BayesiaLab will be the subject of a separate tutorial to be published in the near future.
Porsche Panamera
After the highly successful Cayenne, a four-door luxury SUV, the Panamera is Porsche’s second vehicle with
four doors. Clearly influenced by the legendary 911’s styling, the Panamera offers sports-car looks and per-
formance while comfortably accommodating four passengers. It enters a segment with well-established
contenders, such as the Mercedes-Benz S-Class13, the BMW 7-series14 and the Audi A815, shown below in that
order.
13 MY 2010 shown
14 MY 2009 shown
15 MY 2009 shown
Beyond these traditional premium sedans, there are a number of less conventional products that one can
assume to be in the Panamera’s competitive field. The coupe-like Mercedes-Benz CLS16 would presumably
fall into this category.
16 MY 2010 shown
Finally, the new Panamera may draw customers away from Porsche’s own product offerings, such as the
Cayenne17, an effect that is often referred to as “product substitution” or “product cannibalization.”
It is not our intention to speculate about potential product interactions, but rather to attempt learning from
revealed consumer behavior in a very formal way with Bayesian networks.
In order not to prematurely restrict our consumer choice set, we have defined a broad set of competitors for
our purposes and included all non-domestic luxury vehicles18 (including Light Trucks) priced above
$75,000.19
What was certainly a very real task for Porsche’s product planning team in recent years, i.e. predicting the
Panamera market share, now becomes the topic of our case study and tutorial. Our objective is to predict
what market share the Panamera will achieve without conducting any new research, strictly using RP data
from before the product launch.
17 MY 2009 shown
18 We followed the SVI segmentation and included “Luxury Car”, “Premium Coupe”, “Premium Convertible/Roadster”
and “Luxury Utility” in our selection.
19 The $75,000 threshold was chosen as it marks the lower end of the Panamera price range.
Common Forecasting Practices
Although we have no knowledge of the specific forecasting methods at Porsche, we know from industry
experience that volume and market share forecasts are often determined through a long series of negotia-
tions20 between stakeholders, typically with an optimistic marketing group on one side and a skeptical CFO
on the other. While expert consensus may indeed be a reasonable heuristic for business planning, the lack of
forecasting formalisms is often justified by saying that forecasting is at least as much art as it is science.
The authors believe strongly that there is great risk in relying too heavily on “art”, which is inherently non-
auditable, and have thus been pursuing easily tractable, but scientifically sound methods to support manage-
rial decision making, especially in the context of forecasting. With this in mind, this very formal and struc-
tured forecasting exercise was consciously chosen as the topic of the tutorial.
Tutorial
In this tutorial, we will explain each step from data preparation to market share simulation using Bayesia-
Lab and Bayesia Market Simulator, according to the following outline:
• Data preparation (external)
• BayesiaLab:
• Data import
• Data modeling
• Baseline product scenario generation (external)
• Bayesia Market Simulator:
• Network import
• Definition of scenarios
• Market share simulation
Notation
To clearly distinguish between natural language, software-specific functions and study-specific variable
names, the following notation is used:
BayesiaLab and Bayesia Market Simulator functions, keywords, commands, etc., are shown in bold type.
Variable/node names are capitalized and italicized.
20 As an interesting aside, these negotiations are usually Markovian in nature, i.e. the starting point of today’s negotia-
tion only depends on the outcome of the previous negotiation.
Data Preparation
Consumer Research
This tutorial utilizes the 2009 New Vehicle Experience Survey, a syndicated study conducted annually by
Strategic Vision, Inc., which surveys new vehicle buyers in the U.S. This study is widely used in the auto
industry, and it serves as one of the primary market research tools. NVES contains over 1,000 variables and
close to 200,000 respondent records. In large auto companies, hundreds of analysts typically have access to
NVES, most often through the mTAB interface provided by Productive Access, Inc. (PAI).21
Variable Selection
Compared to traditional statistical models, Bayesian networks require much less “care” in terms of variable
selection as overparameterization is generally not an issue. Although we could easily start with all 1,000+
variables, for expositional clarity we will initially select only about 50 variables22 from the following cate-
gories, which we assume to capture relevant characteristics of both the consumer and the product:
Vehicle/product attributes, e.g. brand, segment, number of cylinders, transmission, drive type, etc.
Consumer demographics, e.g. age, income, gender, etc.
Vehicle-related consumer attitudes, e.g. “I want to look good when driving my vehicle”, “I want a basic,
no-frills vehicle that does the job,” etc.
Set of Choice Alternatives
Beyond variable selection, we must also define the set of choice alternatives and assume which vehicles a
potential Panamera customer would consider. Not only that, but we also need to make sure that the
alternatives to each of the Panamera’s choice alternatives are included. For instance, if we included the Porsche
Cayenne in the choice set, then the Mercedes-Benz M-Class and the BMW X5 should be included too, and
so on. One might argue that the vehicle purchase might be an alternative to a kitchen renovation or the
purchase of a boat. Expert knowledge is clearly required at this point as to how far to expand the choice set.
Furthermore, SVI’s NVES can also help us in this regard as it contains questions about what vehicles actual
buyers did consider and which vehicles they disposed of in the context of their most recent purchase.23
As mentioned in the case study introduction, we included “Luxury Car”, “Premium Coupe”, “Premium
Convertible/Roadster” and “Luxury Utility”24 in the choice set and we further restricted it by excluding all
domestic vehicles and vehicles priced below $75,000. For this segment of assumed Panamera competitors,
we have approximately 1,200 unweighted observations in the 2009 NVES, which, on a weighted basis, re-
flect approximately 25,000 vehicles purchased in 2009.
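The selection rules for the choice set can be written down as a simple filter. The following Python sketch assumes hypothetical field names (segment, domestic, price, weight) and made-up records; it is not the NVES data layout. It shows how the unweighted number of observations and the weighted number of vehicles are obtained from the same filtered set.

```python
# Segment names as stated in the text; the threshold is the $75,000 price floor.
SEGMENTS = {
    "Luxury Car",
    "Premium Coupe",
    "Premium Convertible/Roadster",
    "Luxury Utility",
}
PRICE_FLOOR = 75_000


def in_choice_set(record):
    """Keep non-domestic vehicles in the selected segments priced at or above the floor."""
    return (
        record["segment"] in SEGMENTS
        and not record["domestic"]
        and record["price"] >= PRICE_FLOOR
    )


# Hypothetical respondent records (field names and values are illustrative only)
records = [
    {"segment": "Luxury Car", "domestic": False, "price": 92_000, "weight": 21.5},
    {"segment": "Luxury Car", "domestic": True, "price": 88_000, "weight": 19.0},
    {"segment": "Luxury Utility", "domestic": False, "price": 74_000, "weight": 20.0},
    {"segment": "Premium Coupe", "domestic": False, "price": 110_000, "weight": 18.5},
    {"segment": "Midsize Car", "domestic": False, "price": 80_000, "weight": 22.0},
]

choice_set = [r for r in records if in_choice_set(r)]
unweighted_n = len(choice_set)                      # number of respondent records
weighted_n = sum(r["weight"] for r in choice_set)   # vehicles represented
```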
21 www.paiwhq.com
22 A list of all variables used is given in the appendix. It should be noted that even 50 variables would create a major
computational challenge with MNL models.
23 Martin Krzywinski’s visualization tool, Circos, is highly recommended for the interpretation of cross-shopping behav-
ior: www.mkweb.bcgsc.ca/circos/
24 According to SVI’s segment definition.
Filtered Values (Censored States)
Although we can be less rigorous regarding the maximum number of variables in BayesiaLab, we still need
to be conscious of the information contained in them.
For instance, we need to distinguish unobserved values from non-existing values, although at first glance
both appear to be “simple” missing values in the database. BayesiaLab has a unique feature that allows
treating non-existing values as Filtered Values or Censored States.
To explain Filtered Values, we need to resort to an automotive example from outside our specific study. We
assume that we have two questions about trailer towing. We first ask, “do you use your vehicle for tow-
ing?”, and then, “what is the towing weight?” If the response to the first question is “no”, then a value for
the second one cannot exist, which in BayesiaLab’s nomenclature is a Filtered Value or Censored State. In
this case, we actually must not impute a value for towing weight; instead a Filtered Value code will indicate
this special condition.
On the other hand, a respondent may answer “yes”, but then fail to provide a towing weight. In this case, a
true value for the towing weight exists, but we cannot observe it. Here, it is entirely appropriate to impute a
missing value as we will explain as part of the Data Import procedure.
To indicate Filtered Values to BayesiaLab, we will need to apply a study-specific logic and recode the rele-
vant variables in the original database. Most statistical software packages have a set of functions for this
kind of task.
For example, in STATISTICA this can be done with the Recode function.
Alternatively, this recoding logic can also be expressed with the following pseudo code:
IF towing=yes THEN towing weight=unchanged
IF towing=no THEN towing weight=FV (Filtered Value)
A simple Excel function will achieve the same, and it is assumed that the reader can implement this without
further guidance.
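For readers who prepare their data with a scripting language, the same recoding logic might look as follows in Python. The marker string "FV" is a placeholder taken from the pseudo code above, not necessarily the code BayesiaLab expects, and the function and variable names are hypothetical.

```python
FILTERED = "FV"  # placeholder marker for a Filtered Value (assumption)


def recode_towing_weight(towing, towing_weight):
    """Return the towing weight, or the Filtered Value marker if it cannot exist.

    towing: "yes" or "no"
    towing_weight: a number, or None if the respondent gave no answer
    """
    if towing == "no":
        # A towing weight cannot exist, so it must not be imputed.
        return FILTERED
    # Unchanged; None remains a true missing value that may be imputed later.
    return towing_weight


# Hypothetical respondent answers: (towing, towing weight)
rows = [("yes", 3500), ("no", None), ("yes", None)]
recoded = [recode_towing_weight(towing, weight) for towing, weight in rows]
```

The third row illustrates the distinction made above: the respondent tows, so a weight exists but was not observed, and the value stays missing rather than being marked as filtered.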
Although Filtered Values are very important in many research contexts, hence the emphasis here, our case
study does not require using them.
Data Modeling
Data Import
To start the analysis with BayesiaLab, we first import the database, which needs to be formatted as a CSV
file.25 With Data>Open Data Source>Text File, we start the Data Import wizard, which immediately pro-
vides a preview of the data file.
The table displayed in the Data Import wizard shows the individual variables as columns and the respon-
dent records as rows. There are a number of options available, such as for Sampling. However, this is not
necessary in our example given the relatively small size of the database.
Clicking the Next button prompts a data type analysis, which provides BayesiaLab’s best guess regarding
the data type of each variable.
Furthermore, the Information box provides a brief summary regarding the number of records, the number
of missing values, filtered states, etc.
25 CSV stands for “comma-separated values”, a common format for text-based data files. As an alternative to this im-
port format, BayesiaLab offers a JDBC connection, which is practical when accessing large databases on servers.
For this example, we will need to override the default data type for the Unique Identifier variable as each
value is a nominal record identifier rather than a numerical scale value. We can change the data type by
highlighting the Unique Identifier column and clicking the Row Identifier check box, which changes the
color of the Unique Identifier column to beige.
Although it is not imperative to maintain a Row Identifier, and we could instead assign the Not Distributed
status to the Unique Identifier variable, it can be quite helpful for finding individual respondent records at a
later point in the analysis.
As the respondent records in the NVES survey are weighted, we need to select the Weight by clicking on the
Combined Base Weight variable, which will turn the column green.
Missing Values
In the context of data import, it is important to point out how missing values are treated in BayesiaLab. The
native, automatic processing of missing values reveals a particular strength of BayesiaLab.
In traditional statistical analysis, the analyst has to choose from a number of methods to handle missing
values in a database, but, unfortunately, many of them have serious drawbacks. Perhaps the most common
method is case-wise deletion, which simply excludes records that contain any missing values. Casually
speaking, this means throwing away lots of good data (the non-missing values) along with the bad (the
missing values). Another method is means-imputation, by which any missing value is filled in with the vari-
able’s mean. Inevitably, this reduces the variance of the variable and thus has an impact on its summary
statistics, which is clearly undesirable considering the intended analysis. In the case of discrete distributions,
means-imputation typically also introduces a bias. There are other, better techniques, but they typically
demand significant computational effort and thus often turn into a labor-intensive standalone project
rather than being just a preparatory step.
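The variance-reducing effect of means-imputation is easy to verify numerically. The following Python sketch uses made-up values and the standard library only; it is meant purely as an illustration of the drawback described above.

```python
from statistics import mean, pvariance

# Hypothetical observed values of a variable; three further values are missing.
observed = [22.0, 35.0, 41.0, 58.0, 64.0]
n_missing = 3

# Means-imputation: every missing value is replaced by the observed mean.
imputed = observed + [mean(observed)] * n_missing

mean_before, mean_after = mean(observed), mean(imputed)
var_before, var_after = pvariance(observed), pvariance(imputed)
```

The mean is unchanged, but the variance shrinks because the imputed records contribute no deviation from the mean while still increasing the number of cases.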
Without going into too much detail at this point, BayesiaLab can estimate all missing values given the
learned network structure using the Expectation Maximization (EM) algorithm. As a result, we obtain a
complete database without “making things up.” In traditional statistics, the equivalent would be to say that
neither the mean nor the variance of the variables is affected by the imputation process.
Continuing in our data import process, the next screen provides options as to how to treat the missing val-
ues. Clicking the small upside-down triangle next to the variable names brings up a window with key statis-
tics of the selected variable, in this case Age Bracket.
The very basic functions of filtering, i.e. case-wise deletion, and mean/modal value imputation are available.
However, at this point, we can take advantage of BayesiaLab’s advanced missing values processing
algorithms. We will select Dynamic Completion, which will continuously “fill in” and “update” the missing
values according to the conditional distribution of the variable, as defined by the current structure of the
network. However, as our network is not yet connected and hence does not have a structure, BayesiaLab will
draw from the marginal distribution of each variable to “tentatively” establish placeholder values for each
missing value.
A screenshot from STATISTICA, where we have done most of the preprocessing, shows the marginal distri-
bution of the Age Bracket variable in the form of a histogram.26
The missing Age Bracket values will be drawn from this marginal distribution and are used as placeholders
until we can use the structure of the Bayesian network to re-estimate our missing values. As Dynamic Com-
pletion implies, BayesiaLab performs this on a continuous basis in the background, so at any point we
would have the best possible estimates for the missing values, given the current network structure.
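The initial placeholder step can be sketched in Python as a draw from the observed marginal distribution. This is a simplified illustration with hypothetical data and function names; it does not reproduce BayesiaLab's Dynamic Completion, which subsequently re-estimates these values from the learned network structure.

```python
import random
from collections import Counter


def fill_from_marginal(values, rng):
    """Replace None entries with draws from the marginal distribution of observed values."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    states = list(counts)
    frequencies = [counts[s] for s in states]
    return [
        v if v is not None else rng.choices(states, weights=frequencies, k=1)[0]
        for v in values
    ]


# Hypothetical Age Bracket responses; None marks a missing value
age_brackets = ["45-54", "55-64", None, "45-54", "35-44", None, "55-64", "45-54"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
filled = fill_from_marginal(age_brackets, rng)
```

Drawing from the marginal distribution, rather than inserting the modal value everywhere, leaves the expected distribution of the variable unchanged.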
Discretization
The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type
of discretization that must be performed on all continuous variables.27 We will use the Purchase Price vari-
able to explain the process. Highlighting a variable will show the default discretization algorithm while the
graph panel is initially blank.
26 The normal curve in the histogram is just for illustration purposes. BayesiaLab always uses the actual discrete distri-
bution, not a parametric approximation.
27 BayesiaLab requires discrete distributions for all variables.
By clicking on the Type drop-down menu, the choice of discretization algorithms appears.
Selecting Manual will show a cumulative graph of the Purchase Price distribution, and we can see that it
ranges from $75,000 to $180,000.28
28 $75,000 was previously selected as the lower boundary for this particular vehicle segment. $180,000 was the highest
reported price in NVES.
We could now manually select binning thresholds by way of point-and-click directly on the graph panel.
This might be relevant if there were government regulations in place with specific vehicle price thresholds.29
For our purposes, however, we want to create price categories that are meaningful in the context of our ve-
hicle segment and five bins may seem like a reasonable starting point.
Clicking Generate Discretization will prompt us to select the type of discretization and the number of
desired intervals. Without having a priori knowledge about the distribution of the Purchase Price variable, we may
want to start with the Equal Distances algorithm.
The resulting view shows the generated intervals, and, by clicking on the interval boundaries, we can see the
percentage of cases falling into the adjacent intervals.
We learn from this that our bottom two intervals contain 89% of the cases, whereas the top two intervals
contain just under 5% of the cases. This suggests that we may not have enough granularity to characterize
the bulk of the market towards the bottom end of the price spectrum. Perhaps we also have too few cases
within the top two intervals. So we will generate a new discretization, now with four intervals, and select
KMeans as the type this time.
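The difference between the two discretization approaches can be reproduced outside of BayesiaLab with a short Python sketch. The prices below are made up (in thousands of dollars) and merely mimic a skewed distribution; the functions are a plain equal-width binning and a basic one-dimensional k-means, which are assumptions for illustration and not BayesiaLab's own implementations.

```python
def equal_distance_thresholds(values, n_bins):
    """Thresholds that split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def kmeans_thresholds(values, k, n_iter=100):
    """Thresholds halfway between the centers found by a basic 1-D k-means."""
    data = sorted(values)
    # Deterministic start: centers at evenly spaced positions in the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def bin_shares(values, thresholds):
    """Share of cases falling into each interval defined by the thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for x in values:
        counts[sum(x >= t for t in thresholds)] += 1
    return [c / len(values) for c in counts]


# Hypothetical, right-skewed purchase prices in thousands of dollars
prices = [76, 78, 79, 80, 82, 83, 85, 86, 88, 90,
          92, 95, 98, 102, 108, 115, 125, 140, 160, 180]

equal_thresholds = equal_distance_thresholds(prices, 5)
equal_shares = bin_shares(prices, equal_thresholds)
kmeans_cuts = kmeans_thresholds(prices, 4)
kmeans_shares = bin_shares(prices, kmeans_cuts)
```

With skewed data such as these, the equal-width bins place most cases in the lowest interval, whereas the k-means thresholds follow the density of the data.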
29 The now-expired luxury tax for passenger cars in the U.S. would be an example for such a policy.
The resulting bins appear much more suitable to describe our domain.
We will proceed similarly with the only other continuous variable in the database, i.e. Age Bracket.
Clicking Finish completes the import process, and 49 variables (columns) from our database are now shown
as blue nodes in the Graph Panel, which is the main window for network editing.
Note
For choosing discretization algorithms beyond this
example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in the order of
priority, K-Means, Equal Distances or Equal
Frequencies.
The six nodes in the far-left column reflect product attributes (green); the second-from-left column shows
ten demographic attributes (yellow) and all remaining nodes to the right represent 33 vehicle-related atti-
tudes (red). This initial view represents a fully unconnected Bayesian network.
Also, to simplify our nomenclature, we will combine the demographic attributes (yellow) and the vehicle-
related attitudes (red) and refer to them together as “Market” variables (now all red).
Variable Classes and Forbidden Arcs
One is now tempted to immediately start with Unsupervised Learning to see how all these variables relate to
each other. However, there are two reasons why we need to introduce another step at this point:
Our mission is to model the interactions between product variables and market variables so we can see the
consumer response to products. For instance, we are more interested in learning P(Transmission= “Manual”
| Attitude = “Driving is one of my favorite things”) than we are in P(Age < 45 | Number of children under 6
= 2). Hence we focus the learning algorithm on the area of interest, i.e. product attributes vis-à-vis market
attributes.
We must not learn the dependencies between the product variables themselves because they would simply
reflect today’s product offerings and their contingencies, e.g. P(Vehicle Segment=“4-door sedan” |
Brand=“Porsche”)=0. We do want to understand what is available today, but we certainly do not want to
encode today’s product scenarios as constraints in the network. Instead, we want to be able to introduce
new scenarios, which are not available today.
To focus learning in a specific area, we need to take an indirect approach and tell BayesiaLab “what not to
learn.” So, to prevent the algorithm from learning the product-to-product variable relationships, we will
“forbid” such arcs.
We first create a Class by highlighting all product nodes then right-clicking them. From the menu, we then
select Properties>Classes>Add.
When prompted for a name, we can choose something descriptive, so we give this new Class the label
“Product”.
Having introduced this Class of node, we can now very easily manage Forbidden Arcs. More specifically, we
want to make all arcs within the Class Product forbidden. A right-click anywhere on the Graph Panel
opens up the menu from which we can select Edit Forbidden Arcs.
In the Forbidden Arc Editor, we can select the Class Product both as start and end.
We now repeat the above steps and also create Forbidden Arcs for the Market variables.
As a result, these Forbidden Arc relationships will appear in the Forbidden Arc Editor and will remain there
unless we subsequently choose to modify them.
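The effect of these constraints can be illustrated outside of BayesiaLab. The following minimal sketch, which is not BayesiaLab’s implementation, enumerates all directed arcs that start and end within the same Class and shows that only arcs between the two Classes remain admissible. The node lists are abbreviated and hypothetical, and the helper names are our own.

```python
from itertools import permutations


def forbidden_arcs(classes):
    """All directed arcs whose start and end nodes belong to the same class."""
    forbidden = set()
    for members in classes.values():
        forbidden.update(permutations(members, 2))
    return forbidden


def is_allowed(arc, forbidden):
    """An arc may be learned only if it is not on the forbidden list."""
    return arc not in forbidden


# Abbreviated, hypothetical node lists (the case study has 6 product
# and 43 market nodes).
classes = {
    "Product": ["Brand", "Segment", "Price"],
    "Market": ["Age Bracket", "Gender",
               "Driving is one of my favorite things to do"],
}
forbidden = forbidden_arcs(classes)

print(is_allowed(("Brand", "Price"), forbidden))    # product-to-product arc
print(is_allowed(("Gender", "Brand"), forbidden))   # market-to-product arc
```

With n nodes in a Class, n × (n − 1) directed arcs are excluded, which is why defining the constraint at the Class level is far more convenient than listing individual arcs.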
We are also reminded about the presence of Forbidden Arcs by the symbol in the lower right corner of the
screen.
Unsupervised Learning
Now that the learning constraints are in place, we continue to learn the network by selecting Learning>As-
sociation Discovering>EQ.30
30 EQ is one of the unsupervised learning algorithms implemented in BayesiaLab. Koller and Friedman (2009) provide a
comprehensive introduction to learning algorithms.
The resulting network may appear somewhat unwieldy at first glance, but upon closer inspection we can see
that arcs exist only between Product variables (green) and Market variables (red), which is precisely what
we intended by establishing Forbidden Arcs.
However, we will not analyze this structure any further, but rather use it solely as a statistical device to be
used in the Bayesia Market Simulator. We simply need to save the network in its native xbl file format, so
the Bayesia Market Simulator can subsequently import it.
Simulation
With the Bayesia Market Simulator we have the ability to simulate “alternate worlds” for both the Product
variables as well as for the Market variables. In most applications, however, marketing analysts will want to
primarily study new Product scenarios assuming the Market remains invariant, meaning that consumer
demographics and attitudes remain the same.31
It will be the task of the analyst to define new product scenarios, which will need to include all products
assumed to be in the marketplace for the to-be-projected timeframe, in our case 2010.32 As many products
carry over from one year to the next, e.g. from model year 2010 to model year 2011, it is very helpful to use
31 The year-to-year invariance assumption of the market has been challenged by many marketing executives during the
most recent recession. In this context, many media headlines also proclaimed a paradigm shift in consumer behavior.
The authors have believed - then as well as now - that more has remained the same than has changed in terms of con-
sumer attitudes.
32 For expositional simplicity, we make no distinction between model year and calendar year.
the currently available products as a baseline scenario, upon which changes can be built. Quite simply, we
need to take inventory of the product landscape today. In the current version of Bayesia Market Simulator
this step is not yet automated, so a practical procedure for generating the baseline scenario is described in
the following section.
Product Scenario Baseline
The idea is that all available product configurations were manifested in the market in 2009 and thus cap-
tured in the 2009 NVES.33
It still requires careful consideration as to how many Product variables should be included to generate the
baseline product scenario. We want to create a type of coordinate system that allows us to identify products
through their principal characteristics. For instance, the following attributes would uniquely define a
“Mercedes-Benz S550 4Matic”:
Brand=“Mercedes-Benz”
Engine Type=“V8”
Drive Type=“AWD”
Transmission=“Automatic”
Segment=“High Premium”34
Price=“>$85,795 AND <= $99,378”
Relating consumer attributes and attitudes to these individual product attributes, rather than to the vehicle
as a whole, will then allow us to construct hypothetical products during our simulation. To stay with the
Mercedes example, we could define a new product by setting the engine type to “V6” and changing the
price to “<$85,795”.
It is easy to imagine how one can get the number of permutations to exceed the number of consumers. For
instance, in the High Premium segment, we could further differentiate between short wheelbase and long
wheelbase versions, which would increase the number of baseline product scenarios. We want to find a rea-
sonable balance between product granularity and the ratio of consumers to product scenarios, although we
cannot provide the reader with a hard-and-fast rule.
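A quick calculation shows how rapidly the number of theoretical permutations grows. In the following minimal sketch, the number of states per attribute is a hypothetical placeholder and not taken from the NVES; only the record count of 1,203 corresponds to our database.

```python
from math import prod

# Hypothetical number of states per product attribute (assumed values).
state_counts = {
    "Brand": 30,
    "Engine Type": 5,
    "Drive Type": 3,
    "Transmission": 2,
    "Segment": 15,
    "Price": 4,
}

combinations = prod(state_counts.values())
records = 1203  # number of survey records in the case study

print("theoretical product configurations:", combinations)
print("configurations per consumer record:", combinations / records)
```

Most theoretical combinations never exist in the market, but each additional attribute multiplies the space of possible scenarios, so the number of consumers observed per scenario shrinks accordingly.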
Pricing is obviously a very important part of the product scenario configuration and here we are confronted
with the reality that no two customers pay exactly the same for the identical product, and the survey data
makes this very evident. Furthermore, there are numerous product features outside our “coordinate sys-
tem”, e.g. an optional $6,000 high-end audio system, that would materially affect the price point of an indi-
vidual vehicle, but which would not move the vehicle into a different category from a consumer’s perspec-
tive. With options, an S550 can easily reach a price of over $100,000. Still we would want such a high-end
33 In our example, we judge this to be a reasonable simplification, even though a small number of automobiles at the
very top end of the market, e.g. the Rolls-Royce Phantom, may not be captured in the survey.
34 Using the Strategic Vision segmentation nomenclature, “High Premium” defines a large four-door luxury sedan.
S550 to be grouped with the standard S550. Thus, it is important to define reasonable price brackets that
cover the price spectrum of each vehicle and minimize model fragmentation.
During the Data Import stage, BayesiaLab has discretized all continuous numerical values, including Price,
and created discrete states. If these discrete states are adequate considering the price positioning and price
spectrum of the vehicles under study, we can now leverage this existing binning for generating all current
product scenarios and select Data>Save Data.
In the subsequently appearing dialogue box, we need to select Use the States’ Long Name. It is important
that Use Continuous Values is not checked; otherwise we will lose the discretized states of the Price vari-
able.
This will export all variables and all records, including values from previously performed missing value im-
putations. The output will be in a semicolon-delimited text file, which can be easily imported into Excel or
any statistical application, such as SPSS or STATISTICA. The purpose of loading this into an external appli-
cation is to manipulate the database to extract the unique product combinations available in the market.
In Excel this can be done very quickly by deleting all columns unrelated to the product configuration, which
leaves us with just the product attributes.
In Excel 2010 (for Windows) and Excel 2011 (for Mac), a very convenient feature allows us to
quickly remove all duplicates, which is exactly what we want to achieve. We want to know all the unique
product configurations currently in the market.
This leaves us with a table of approximately 100 unique product scenario combinations available at the
time of the survey.
To make these unique product scenarios available for subsequent use in the Bayesia Market Simulator, we
need to save the table as a semicolon-delimited CSV file. This is important to point out as most programs
will save CSV files by default as comma-delimited files.
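For readers who prefer a script over a spreadsheet, the same steps (keeping only the product columns, removing duplicates and writing a semicolon-delimited file) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the column names, the sample rows and the function names are hypothetical, and whether the simulator expects a header row should be verified against its documentation.

```python
import csv

# Assumed column names; they must match the header of the exported file.
PRODUCT_COLUMNS = ["Brand", "Engine Type", "Drive Type",
                   "Transmission", "Segment", "Price"]


def unique_product_scenarios(rows, columns=PRODUCT_COLUMNS):
    """Keep only the product columns and drop duplicate combinations,
    preserving the order in which they first appear."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(columns, key)))
    return unique


def convert(export_path, baseline_path, columns=PRODUCT_COLUMNS):
    """Read the semicolon-delimited export and write the unique scenarios."""
    with open(export_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src, delimiter=";"))
    scenarios = unique_product_scenarios(rows, columns)
    with open(baseline_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, delimiter=";")
        writer.writeheader()
        writer.writerows(scenarios)
    return scenarios


# Hypothetical records: two buyers of the same configuration, one other buyer.
s550 = {"Brand": "Mercedes-Benz", "Engine Type": "V8", "Drive Type": "AWD",
        "Transmission": "Automatic", "Segment": "High Premium",
        "Price": ">$85,795 AND <= $99,378"}
sample_rows = [
    {**s550, "Gender": "Male"},
    {**s550, "Gender": "Female"},
    {**s550, "Drive Type": "RWD", "Gender": "Male"},
]
baseline = unique_product_scenarios(sample_rows)
print(len(baseline), "unique product scenarios")
```

Passing delimiter=";" explicitly avoids the comma default mentioned above.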
Product Scenario Simulation
Now that we have the Bayesian network describing the overall market (as an xbl file) as well as the baseline
product scenarios (as a csv file), we can proceed to open the Bayesia Market Simulator.
Clicking File>Open will prompt us to open the xbl network file we previously generated with BayesiaLab.
Upon loading we will see the principal interface of the Bayesia Market Simulator. On the left panel, all
nodes of the network appear as variables. We will now need to separate all variables into Market Variables
and Scenario Variables by clicking the respective arrow buttons. In our case, the aptly named Market vari-
ables are the Market Variables in BMS nomenclature and Product variables are the Scenario Variables.
All variables must be allocated before being able to continue to Scenario Editing. This also implies that
Product variables, which are not to be included as Scenario Variables, must be excluded from the Bayesian
network file. If necessary, we will return to BayesiaLab to make such edits.
As we are working with RP data, every record in our database reflects one vehicle purchase, i.e. “reveals”
one choice, and therefore we need to leave the Target Variable and Target State fields blank. These fields
would only be used in conjunction with SP data, which includes a variable indicating acceptance versus re-
jection.
Clicking Scenario Editing opens up a new window. We can now manually add any product scenarios we
wish to simulate. Given the potentially large number of scenarios, it will typically be better to load the base-
line product scenarios, which were saved earlier.
We can do that by selecting Offer>Import Offers.
We now select to open the semicolon-delimited CSV file with the baseline product scenarios. It is very im-
portant that the CSV file is formatted precisely as specified, for instance, without any extra blank lines.
In case there are any import issues, it can be helpful to review the CSV file in a text editor and to visually
inspect the formatting.
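Such a visual inspection can be complemented with a simple automated check. The following minimal sketch flags blank lines and rows whose number of semicolon-separated fields differs from the first row. These are generic consistency checks, not the simulator’s actual import rules, and the naive split does not handle semicolons inside quoted fields.

```python
def check_semicolon_csv(text):
    """Return a list of human-readable problems found in the file content."""
    problems = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines = lines[:-1]  # a single trailing newline is normal
    expected = None
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
            continue
        fields = line.split(";")
        if expected is None:
            expected = len(fields)  # the first non-blank row sets the width
        elif len(fields) != expected:
            problems.append(
                f"line {number}: {len(fields)} fields, expected {expected}")
    return problems


# Hypothetical file contents.
good = "Brand;Price\nAudi;High\n"
with_blank_line = "Brand;Price\n\nAudi;High\n"
comma_delimited_row = "Brand;Price\nAudi,High\n"

print(check_semicolon_csv(good))
print(check_semicolon_csv(with_blank_line))
print(check_semicolon_csv(comma_delimited_row))
```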
Upon successful import, all baseline product scenarios will appear in the Scenario Editing dialogue.
The analyst can now add any new product scenarios or delete those products, which are no longer expected
to be in the market.35 By clicking Add Offer an additional scenario will be added at the bottom of the prod-
uct scenario list. In the case of long product scenario lists, this may require scrolling all the way down.
35 To maintain expositional simplicity, we have added all Panamera versions for the entire year 2010 and not changed
any other product scenarios. It should be pointed out that the V6 version of the Porsche Panamera was introduced only
in mid-2010. BMW has also launched an additional six-cylinder version of the 7-series as well as AWD variants, which
are not reflected in the simulation. Finally, Jaguar has released a new XJ in 2010, while that year marked the runout of
the old-generation Audi A8.
Clicking on the product attributes of any scenario prompts drop-down menus to appear with the available
attribute states, e.g. RWD or AWD.36 This also allows us to change the attributes of existing products according
to the analyst’s requirements.
For our case study, we will add the following versions of the Panamera as new product scenarios:
Panamera (V6, RWD)
Panamera 4 (V6, AWD)
Panamera S (V8, RWD)
Panamera 4S (V8, AWD)
Panamera Turbo (V8 Turbo, RWD)
To characterize all of them as large 4-door luxury sedans, which is the key distinction versus previous Por-
sche products, we will assign the “High Premium” attribute to them.
36 RWD and AWD stand for rear-wheel drive and all-wheel drive, respectively.
Once this is completed, we need to obtain a database that represents the consumer base, on which these new
product scenarios will be “tried out”. This can either be done by associating the original database, from
which the network was learned, or by creating a new, artificial one that reflects the joint probability distri-
bution of the learned Bayesian network.
The latter can be achieved by selecting Database>Generate.
It is up to the analyst to determine the size of the database to be generated. Although there is no fixed rule,
a database that is too small will limit the observability of products with a very small market share.
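A rough calculation illustrates why the size of the database matters for low-volume products. The sketch below assumes, for simplicity, that records are drawn independently and that a product is chosen with a fixed probability equal to its market share; the share of 0.1% is a hypothetical value, and the function names are our own.

```python
def prob_observed(share, n_records):
    """Probability that a product with the given share is chosen at least
    once among n independently drawn records."""
    return 1.0 - (1.0 - share) ** n_records


def expected_count(share, n_records):
    """Expected number of records in which the product is chosen."""
    return share * n_records


share = 0.001  # hypothetical niche product with a 0.1% market share
for n in (1203, 10000, 100000):
    print(n, round(expected_count(share, n), 1), round(prob_observed(share, n), 3))
```

Under these assumptions, a database of roughly the size of our survey would contain only about one expected buyer of such a product, so its simulated share would rest on very few records.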
Alternatively, we can also associate the original database, which contains the survey responses. In our case,
the original database contains 1,203 records, which is very reasonable in terms of computational require-
ments.
Once a database is associated, clicking the Simulation button will start the market share estimation process.
With the given complexity of our network and around 100 product scenarios, the simulation should take no
longer than 30 seconds on a typical desktop computer.
Upon completion, the simulation results will appear in the form of a pie chart and a table. One can go back
and review the scenarios by clicking the Scenario Editing button.
The aggregated simulated market shares can also be copied from the results table and pasted into Excel or
any other application for further editing and presentation purposes. An example is provided below, showing
the simulated market shares of the brands under study in the High Premium segment.
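[Figure: pie chart “Simulated High Premium Market Shares ($75,000+)” with slices for Audi, BMW, Jaguar, Lexus, Mercedes and Porsche. Slice values as extracted, without brand assignment: 1%, 21%, 3%, 10%, 53%, 12%.]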
As can be seen from the results, the Porsche Panamera’s predicted market share appears to be compatible
with the reported running rate for calendar year 2010, which was available at the time of writing. Unfortu-
nately, we do not know how this compares to Porsche’s expectations, but the Panamera seems to be quite
successful overall.
Substitution and Cannibalization
The fully simulated database can also be saved as a semicolon-delimited CSV file, which will allow review-
ing the choice probability for each product scenario by individual consumer in a spreadsheet.
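As an illustration of what can be done with such per-consumer choice probabilities, the following minimal sketch averages them into market shares. It assumes that each consumer counts equally (no survey weights) and uses a hypothetical in-memory layout with three consumers and three product scenarios; we do not claim that this reproduces the internal computation of the Bayesia Market Simulator.

```python
def market_shares(choice_probabilities):
    """Average each scenario's choice probability over all consumers."""
    n = len(choice_probabilities)
    scenarios = choice_probabilities[0].keys()
    return {s: sum(row[s] for row in choice_probabilities) / n
            for s in scenarios}


# Hypothetical choice probabilities; each row is one consumer and sums to 1.
consumers = [
    {"S550": 0.7, "Panamera S": 0.2, "750i": 0.1},
    {"S550": 0.1, "Panamera S": 0.6, "750i": 0.3},
    {"S550": 0.4, "Panamera S": 0.1, "750i": 0.5},
]
shares = market_shares(consumers)
for scenario, share in shares.items():
    print(f"{scenario}: {share:.1%}")
```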
We can literally examine the new, simulated choices record-by-record and see which customers have made
the switch to the Panamera. Applying conditional formatting to the spreadsheet can also be very helpful.
The above screenshot, for example, shows a selection of actual Mercedes buyers, who would either consider
or pick the Porsche Panamera in this simulation. High choice probabilities are shown in shades of red, while
near-zero probabilities are depicted in dark blue.
It is equally interesting to examine which Porsche buyers would pick the Panamera over their current vehicle
choice.
Not surprisingly, our simulation suggests high probabilities of Panamera choice for several current Cayenne
owners. One is tempted to take this a step further and calculate a rate of cannibalization. In this particular
survey, however, the sample size is too small to attempt doing so. Otherwise, such a computation would be
simple arithmetic.
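To indicate what this arithmetic could look like, the sketch below computes the fraction of the Panamera’s expected buyers who actually purchased a Porsche. This is only one possible definition of a cannibalization rate and is our assumption; the three records and their probabilities are hypothetical and far too few for any real conclusion.

```python
def cannibalization_rate(records, new_product, own_brand):
    """Fraction of the new product's expected buyers whose actual purchase
    was a vehicle of the manufacturer's own brand."""
    total = sum(r["probabilities"][new_product] for r in records)
    from_own = sum(r["probabilities"][new_product]
                   for r in records if r["actual_brand"] == own_brand)
    return from_own / total if total else 0.0


# Hypothetical simulated choice probabilities for the new product.
records = [
    {"actual_brand": "Porsche", "probabilities": {"Panamera": 0.6}},
    {"actual_brand": "Mercedes-Benz", "probabilities": {"Panamera": 0.3}},
    {"actual_brand": "BMW", "probabilities": {"Panamera": 0.1}},
]
rate = cannibalization_rate(records, "Panamera", "Porsche")
print(f"cannibalization rate: {rate:.0%}")
```

The complement of this rate would correspond to conquest volume, i.e. expected buyers substituting the new product for a competitor’s vehicle.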
Market Scenario Simulation
Although experimenting with product scenarios is expected to be the primary use of the Bayesia Market
Simulator, it is also possible to change the market scenarios.
For example, this can be used to simulate the impact of policy changes. One could hypothesize that legisla-
tion would prohibit or severely penalize ownership of vehicles of a certain size or of a specific engine type in
urban areas.37
Upon editing the market segments, the simulation can be rerun to obtain the new market share results.
37 Given the draconian restrictions on motorists in Central London, this example is presumably not very far-fetched.
Limitations
This approach can simulate product and market scenarios consisting of variations of configurations, which
can be observed with sufficient sample today. However, the impact of entirely new technologies cannot be
simulated on this basis. As a result, projecting the market share of the all-electric Nissan Leaf38 would not
be possible, whereas estimating the share of a hypothetical three-row BMW crossover vehicle would be feasible.
In all cases, it requires the analyst’s expert knowledge and judgment to determine the adequacy and
equivalency of product attributes observable today.
Outlook
Several natural extensions to the presented methodology exist; however, it would go beyond the scope
of this paper to present them. A brief summary shall suffice for now, and we will go into greater detail in
forthcoming case studies in this series:
Beyond learning from data, we can use expert knowledge to create or augment Bayesian networks. Bayesia-
Lab offers a Knowledge Elicitation module, which formally captures expert knowledge and encodes it in a
Bayesian network. In the absence of market data, this is an excellent approach to have decision makers reason
collectively (and in a formally correct manner) about future states of the world.
We can extend the concept of product attributes to consumers’ product satisfaction ratings. This will allow
estimating the market share impact as a function of changes in consumer ratings. For instance, an auto-
maker could reason about the volume impact from a vehicle facelift, which is expected to raise the con-
sumer rating of “styling”.
The product cannibalization or substitution rate can be estimated based on the simulated choice behavior,
given that there is sufficient sample size. So, for most mainstream products, this seems to be realistic.
With the ability to study consumer choice at the model level, we can also aggregate these results to the seg-
ment level. Alternatively, using a less granular approach, we can model the entire market at the segment and
brand level, which would allow studying market changes at a larger scale.
Beyond simulating “hard” policy changes affecting the market, e.g. excluding a product class from a certain
geography, we can also use BayesiaLab to simulate new populations with small changes in average con-
sumer attitudes versus the originally surveyed population. For instance, such an artificially modified popula-
tion could be more environmentally conscious, and one could apply opinions prevalent on the West Coast
to the whole country. Bayesia Market Simulator can then generate new market shares based on these new
hypothetical market conditions.
Summary
BayesiaLab and Bayesia Market Simulator are unique in their ability to use Bayesian networks for choice
modeling and market share simulation. The presented workflow provides a comprehensive method for
38 The all-electric Leaf was launched by Nissan in the U.S. in December of 2010.
simulating market shares of future products based on their key characteristics, without requiring new and
costly experiments.
As a result, BayesiaLab and Bayesia Market Simulator allow using a vast range of existing research for mar-
ket share predictions. Given the significant resources many corporations have allocated over many years to
conducting consumer surveys, these BayesiaLab tools offer an entirely new way to turn the accumulated
research data into practical market oracles.
Appendix
Utility-Based Choice Theory
In today’s choice modeling practice, utility-based choice theory plays a dominant role.
The first concept of utility-based choice theory is that each individual chooses the alternative that yields him
or her the highest utility.
The second idea refers to being able to collapse a vector describing attributes of choice alternatives into a
single scalar utility value for the chooser. For instance, a vector of attributes for one choice alternative, e.g.
[Price, Fuel Economy, Safety Rating], would translate into one scalar value, e.g. [5], specific to each chooser.
The following example is meant to illustrate both:
For Consumer A:
Utility of Product 1:
[Price=$25,000, Fuel Economy=25MPG, Safety Rating=4 stars] = 7 ✓
Utility of Product 2:
[Price=$29,000, Fuel Economy=23MPG, Safety Rating=5 stars] = 5.5
For Consumer B:
Utility of Product 1:
[Price=$25,000, Fuel Economy=25MPG, Safety Rating=4 stars] = 4
Utility of Product 2:
[Price=$29,000, Fuel Economy=23MPG, Safety Rating=5 stars] = 7.5 ✓
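The choice rule in this example can be written down in a few lines. The sketch below uses the utility values from the example above and simply picks, for each consumer, the alternative with the highest utility; the function name choose is our own.

```python
# Utility values taken from the example above.
utilities = {
    "Consumer A": {"Product 1": 7.0, "Product 2": 5.5},
    "Consumer B": {"Product 1": 4.0, "Product 2": 7.5},
}


def choose(utility_by_product):
    """Return the alternative with the highest utility for one consumer."""
    return max(utility_by_product, key=utility_by_product.get)


choices = {consumer: choose(u) for consumer, u in utilities.items()}
print(choices)
```

Both consumers face identical product attributes; only the mapping from the attribute vector to the scalar utility differs between them.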
This concept implies that consumers make tradeoffs, either explicitly or implicitly, and that there exists an
amount x of “Fuel Economy” that is equivalent in utility to an amount y of “Safety”. The reader may rea-
sonably object that not even a fuel economy of 100MPG would make it acceptable to drive a vehicle that is
rated very poorly on safety.
Also, we do not know a priori what the utility values are nor can we measure them. Neither do we know in
advance how individual product and consumer attributes relate to these unobservable utilities. However,
there are methods that allow us to estimate these unknown variables and, based on this knowledge, they
allow us to predict choice in the future. One such method is briefly highlighted in the following.
Multinomial Logit Models
In the domain of choice modeling, MultiNomial Logit models (MNL) have become the workhorse of the
industry, but here we only want to provide a cursory overview, so the reader can compare the approach
presented in the case study with current practice.
MNL models provide a functional form for describing the relationship between the utilities of alternatives
and the probability of choice.
For instance, using an MNL model for a choice situation with three vehicle alternatives, Altima, Accord and
Camry, the probability of choosing the Altima can be expressed as:
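$$\Pr(\text{Altima}) = \frac{\exp(V_{\text{Altima}})}{\exp(V_{\text{Altima}}) + \exp(V_{\text{Accord}}) + \exp(V_{\text{Camry}})}$$
$V_{\text{Altima}}$ in this case stands for the utility of the Altima alternative. The utilities $V_{\text{Altima}}$, $V_{\text{Accord}}$, and $V_{\text{Camry}}$ are
a function of the product attributes, e.g.
$$V_{\text{Altima}} = \beta_1 \times \text{Cost}_{\text{Altima}} + \beta_2 \times \text{FuelEconomy}_{\text{Altima}} + \beta_3 \times \text{SafetyRating}_{\text{Altima}}$$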
As we can observe tangible attributes like vehicle cost,
fuel economy and safety rating, and we can also observe who bought which vehicle, we can estimate the
unknown parameters. Once we have the parameters, we can simulate choices based on new, hypothetical
product attributes, such as a better fuel economy for the Altima or a lower price for the Camry.
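The mechanics of such a simulation can be sketched as follows. The parameter values and vehicle attributes are hypothetical placeholders chosen purely for illustration, not estimates from any data; in practice the parameters would be estimated from observed choices.

```python
import math

# Hypothetical parameters: cost in $1,000, fuel economy in MPG, safety in stars.
betas = {"Cost": -0.10, "FuelEconomy": 0.05, "SafetyRating": 0.30}

# Hypothetical product attributes.
vehicles = {
    "Altima": {"Cost": 24.0, "FuelEconomy": 27.0, "SafetyRating": 4.0},
    "Accord": {"Cost": 25.0, "FuelEconomy": 26.0, "SafetyRating": 5.0},
    "Camry": {"Cost": 24.5, "FuelEconomy": 26.0, "SafetyRating": 4.0},
}


def utility(attributes, betas):
    """Linear-in-parameters utility V = sum of beta_k * attribute_k."""
    return sum(betas[name] * value for name, value in attributes.items())


def mnl_probabilities(vehicles, betas):
    """Multinomial logit choice probabilities for all alternatives."""
    v = {name: utility(attrs, betas) for name, attrs in vehicles.items()}
    v_max = max(v.values())  # subtracting the maximum avoids overflow
    exp_v = {name: math.exp(value - v_max) for name, value in v.items()}
    total = sum(exp_v.values())
    return {name: e / total for name, e in exp_v.items()}


baseline = mnl_probabilities(vehicles, betas)

# Hypothetical scenario: a better fuel economy for the Altima.
scenario = {name: dict(attrs) for name, attrs in vehicles.items()}
scenario["Altima"]["FuelEconomy"] = 32.0
improved = mnl_probabilities(scenario, betas)

print({k: round(p, 3) for k, p in baseline.items()})
print({k: round(p, 3) for k, p in improved.items()})
```

Subtracting the largest utility before exponentiating leaves the probabilities unchanged because the common factor cancels in the ratio.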
The parameters of MNL models can be estimated both from “stated preference” (SP) data, i.e. asking con-
sumers about what they would choose, and “revealed preference” (RP) data, i.e. observing what they have
actually chosen. There are numerous variations and extensions to the class of MNL models and the reader
is referred to Train (2003) and Koppelman (2006) for a comprehensive introduction.
Stated Preference Data
Stated preference data typically comes from experiments, i.e. consumer surveys or product clinics. In this
context, conjoint experiments have become a very popular choice elicitation method and a wide range of
tools have been developed for this particular approach. In conjoint studies, consumers would typically be
given a set of artificially generated product choices along with their attributes, from which preference re-
sponses are then elicited. There are many variations of this method that all attempt to address some of the
inherent challenges related to dealing with responses to hypothetical questions.
The Sawtooth software package has become the de facto industry standard for such conjoint studies.39
Revealed Preference Data
In contrast to SP data, revealed preference data is purely derived from passive observations. As the name
implies, the consumer choice is revealed by their actual behavior rather than by their stated intent in a hypo-
thetical situation. A key benefit is that it is typically easier and more economical to obtain passive observa-
tions than to conduct formal experiments. A conceptual limitation of RP data relates to the fact that non-
39 A wide range of tools is available from Sawtooth Software, Inc., www.sawtoothsoftware.com.
yet-existing products can obviously not be chosen by consumers in the present market environment. Thus
simulating market shares of hypothetical products requires “assembling” them from components and at-
tributes of products, which are already available in the market. This inherently limits the exploration of en-
tirely new technologies, which have little in common with the technologies they may replace.
Studies based on RP data have become very popular for researching travel mode choice, as is also docu-
mented in a large body of research. In market research related to CPG products or durable goods, using RP
data is somewhat less common.
We speculate that one of the reasons for the lack of popularity outside the world of academia is the absence
of easy-to-use software packages. Only recently, with the release of Easy Logit Modeling (ELM)40, specify-
ing and estimating multinomial logit models has become practical for a much broader audience. Although
ELM has successfully removed the burden of manual coding, countless iterations of specification and esti-
mation remain a very time-consuming task of the analyst.
NVES Variables
The following variables from the 2009 Strategic Vision NVES were included in this case study:
• UNIQUE IDENTIFIER
• Combined Base Weight
• New Model Purchased - Make/Model/Series (Alpha Order)
• New Model Purchased - Brand
• New Model Purchased - Region Origin
• New Model Segment
• Segmentation 2
• Type Of Transmission
• Number Of Cylinders (VIN)
• Drive Type (VIN)
• Fuel Type
• Gender
• Marital Status
• Age Bracket
40 Easy Logit Modeling is available from ELM-Works, Inc., www.elm-works.com. ELM can estimate models based on
both RP and SP data, although we only mention it in the RP context.
• Children Under 6
• Children 6 To 12
• Children 13 To 17
• Total Family Pre-Tax Income
• Ethnic Group
• Location Of Residence
• Customer Region Classification #1
• I Seek Variety in My Life
• I'm Curious and Open to Experiences
• Luxury is Not Important Unless it Has Purpose
• I Enjoy Expressing Myself Creatively
• I See Life as Full of Endless Possibilities
• Driving is one of my favorite things to do
• I really don't enjoy driving
• Whenever I get a chance, I love to go for a drive
• When I drive for fun, I mainly prefer to relax and listen to music or talk
• I want vehicles that provide that open-air driving experience
• I prefer a vehicle that has the capability to outperform others
• I prefer vehicles that provide superior straight ahead power
• I prefer vehicles that provide superior handling and cornering agility
• I prefer a balance of comfort and performance
• I prefer vehicles that provide the softest, most comfortable ride quality
• I just want the basics on my vehicle - no extras
• Value equals balance of costs, comfort & performance
• I prefer vehicles that project a tough and workmanlike image
• Vehicles are a 'tool' or a part of the 'gear' in an active outdoors lifestyle
• I Want to be able to tow heavy loads
• I want to be able to traverse any terrain
• I want the most versatility in my interior
• I want a basic, no frills vehicle that does the job
• My choice of vehicle reflects my personality
• I want a vehicle that says a lot about my success in life / career
• I will switch brand for features or price
• There are lots of different brands of vehicles that I would consider buying
• I prefer sofa-like comfort over a cockpit-like interior
• I want a vehicle that provides the quietest interior
• I want to look good when driving my vehicle
• I want my vehicle to stand out in a crowd
• I would pay significantly more for environmentally friendly vehicle
• Price is most important to me when buying a new vehicle
• Purchase Price (100's)
Framework: The Bayesian Network Paradigm41
Acyclic Graphs & Bayes’s Rule
Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the
work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such
models are known as directed graphical models; within cognitive science and artificial intelligence, such
models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose
rule for updating probabilities in the light of new evidence is the foundation of the approach.
Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated
case of continuous probability distributions. In the discrete case, Bayes’ theorem relates the conditional and
marginal probabilities of events A and B, provided that the probability of B does not equal zero:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In Bayes’ theorem, each probability has a conventional name:
P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense
that it does not take into account any information about B; however, the event B need not occur after
event A. In the nineteenth century, the unconditional probability P(A) in Bayes’s rule was called the “ante-
cedent” probability; in deductive logic, the antecedent set of propositions and the inference rule imply con-
sequences. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.
P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is de-
rived from or depends upon the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called the likelihood.
P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Bayes theorem in this form gives a mathematical representation of how the conditional probability of event
A given B is related to the converse conditional probability of B given A.
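As a small worked illustration of the theorem, the following Python sketch computes a posterior from a prior and two likelihoods. The function name and all numbers are hypothetical and chosen only for illustration; P(B) is obtained from the law of total probability, which is an assumption about how the normalizing constant is supplied.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) via Bayes' theorem."""
    # P(B) by the law of total probability; this is the normalizing constant.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0.0:
        raise ValueError("P(B) must not equal zero")
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05.
p_a_given_b = posterior(0.01, 0.9, 0.05)
print(round(p_a_given_b, 4))
```

With these illustrative inputs the posterior is 0.009 / 0.0585, or roughly 0.15: observing B raises the probability of A well above its prior of 0.01, yet A remains unlikely because the prior was so small.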
The initial development of Bayesian networks in the late 1970s was motivated by the need to model the
top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for
bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of
Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing
earlier, ad hoc rule-based schemes.
41 Adapted from Pearl (2000), used with permission.
The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the
gender of a patient, a feature of an object, the occurrence of an event) and the links represent statistical
(informational) or causal dependencies among the variables. The dependencies are quantified by conditional
probabilities for each node given its parents in the network. The network supports the computation of the
posterior probabilities of any subset of variables given evidence about any other subset.
Compact Representation of the Joint Probability Distribution
“The central paradigm of probabilistic reasoning is to identify all relevant variables x1, . . . , xN in the
environment [i.e. the domain under study], and make a probabilistic model p(x1, . . . , xN) of their
interaction [i.e. represent the variables’ joint probability distribution].”
Bayesian networks are very attractive for this purpose as they can, by means of factorization, compactly
represent the joint probability distribution of all variables.
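To make the idea of factorization concrete, the sketch below builds the joint distribution of three binary variables from local conditional probability tables and compares parameter counts with an unfactored table. The chain structure X1 → X2 → X3, the variable names and all probabilities are hypothetical; they are not taken from the case study.

```python
from itertools import product

# Hypothetical conditional probability tables for a chain X1 -> X2 -> X3.
p_x1 = {0: 0.7, 1: 0.3}                                   # P(X1)
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # P(X2 | X1)
p_x3_given_x2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P(X3 | X2)


def joint(x1, x2, x3):
    """P(X1, X2, X3) as the product of each node's table given its parent."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]


# The factored model is a proper joint distribution: it sums to 1.
total = sum(joint(*states) for states in product((0, 1), repeat=3))


def full_table_params(n):
    """Free parameters of an unfactored joint table over n binary variables."""
    return 2 ** n - 1


def chain_params(n):
    """Free parameters of a chain of n binary variables (one root, n - 1 children)."""
    return 1 + 2 * (n - 1)


print(total, full_table_params(20), chain_params(20))
```

For 20 binary variables, the unfactored table needs 1,048,575 free parameters while the chain needs 39. Real networks are usually denser than a chain, so the savings depend on how many parents each node has.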
“Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and
subsequently computing probabilities of interest, conditioned on this evidence. The rules of probability,
combined with Bayes’ rule make for a complete reasoning system, one which includes traditional deductive
logic as a special case.” (Barber, 2012)
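The reasoning step described in the quote can be sketched as inference by enumeration: fix the evidence, sum the joint probability over all consistent states, and normalize. The three-variable chain, the function names and the probabilities below are hypothetical, and brute-force enumeration is used only because it is easy to read; it is not presented as the algorithm used by BayesiaLab.

```python
from itertools import product

VARIABLES = ("x1", "x2", "x3")

# Hypothetical conditional probability tables for a chain x1 -> x2 -> x3.
P_X1 = {0: 0.7, 1: 0.3}
P_X2_GIVEN_X1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
P_X3_GIVEN_X2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}


def joint(state):
    """Joint probability of one complete assignment {"x1": .., "x2": .., "x3": ..}."""
    return (P_X1[state["x1"]]
            * P_X2_GIVEN_X1[state["x1"]][state["x2"]]
            * P_X3_GIVEN_X2[state["x2"]][state["x3"]])


def query(target, evidence):
    """Return P(target | evidence) by summing the joint over consistent states."""
    unnormalized = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(VARIABLES)):
        state = dict(zip(VARIABLES, values))
        # Keep only the states that agree with the observed evidence.
        if all(state[name] == value for name, value in evidence.items()):
            unnormalized[state[target]] += joint(state)
    normalizer = sum(unnormalized.values())  # equals P(evidence)
    return {s: p / normalizer for s, p in unnormalized.items()}


# Diagnostic direction: evidence on the last node updates belief in the first.
print(query("x1", {"x3": 1}))
# Predictive direction: evidence on the first node updates belief in the last.
print(query("x3", {"x1": 1}))
```

The same function answers queries in both directions along the links, which is the bidirectional inference capability mentioned earlier in this appendix. Enumeration grows exponentially with the number of variables, so it is suitable only for toy examples.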
References
Barber, David. “Bayesian Reasoning and Machine Learning.” http://www.cs.ucl.ac.uk/staff/d.barber/brml.
———. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2011.  
Darwiche, Adnan. “Bayesian networks.” Communications of the ACM 53, no. 12 (12, 2010): 80.  
Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.  
Koppelman, Frank, and Chandra Bhat. “A Self Instructing Course in Mode Choice Modeling: Multinomial and Nested
Logit Models.” January 31, 2006.
Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. J. Jones, and M. A. Marra. “Circos: An
information aesthetic for comparative genomics.” Genome Research 19, no. 9 (6, 2009): 1639-1645.  
Neapolitan, Richard E., and Xia Jiang. Probabilistic Methods for Financial and Marketing Informatics. 1st ed. Morgan
Kaufmann, 2007.  
Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.  
Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search, Second Edition. 2nd ed. The
MIT Press, 2001.  
Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. 1st ed.
The MIT Press, 1985.  
Train, Kenneth E. Discrete Choice Methods with Simulation. Cambridge University Press, 2003.  
Contact Information
Bayesia USA
312 Hamlet’s End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
info@bayesia.us
www.bayesia.us
Bayesia Singapore Pte. Ltd.
20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 2690
info@bayesia.sg
www.bayesia.sg
Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com
Copyright
© 2013 Bayesia S.A.S., Bayesia USA and Bayesia Singapore. All rights reserved.
Modeling Vehicle Choice and Simulating Market Share with Bayesian Networks

  • 1. Modeling Vehicle Choice and Simulating Market Share with Bayesian Networks A case study about predicting the U.S. market share of the Porsche Panamera using the Bayesia Market Simulator Stefan Conrady, stefan.conrady@bayesia.us Dr. Lionel Jouffe, jouffe@bayesia.com December 18, 2010 Revised April 20, 2013 www.bayesia.us
  • 2. Table of Contents Modeling Vehicle Choice and Simulating Market Share with Bayesian Net- works Abstract 4 Objective 4 About the Authors 4 Stefan Conrady 4 Lionel Jouffe 5 Acknowledgements 5 Introduction 5 Bayesian Networks for Choice Modeling 6 Case Study 7 Porsche Panamera 8 Common Forecasting Practices 11 Tutorial 11 Notation 11 Data Preparation 12 Consumer Research 12 Variable Selection 12 Set of Choice Alternatives 12 Filtered Values (Censored States) 13 Data Modeling 14 Data Import 14 Missing Values 16 Discretization 17 Variable Classes and Forbidden Arcs 22 Unsupervised Learning 25 Simulation 26 Product Scenario Baseline 27 Product Scenario Simulation 29 Substitution and Cannibalization 37 Market Scenario Simulation 39 Simulating Market Share with the Bayesia Market Simulator ii www.bayesia.us | www.bayesia.sg
  • 3. Limitations 40 Outlook 40 Summary 40 Appendix Utility-Based Choice Theory 42 Multinomial Logit Models 43 Stated Preference Data 43 Revealed Preference Data 43 NVES Variables 44 Framework: The Bayesian Network Paradigm 47 Acyclic Graphs & Bayes’s Rule 47 Compact Representation of the Joint Probability Distribution 48 References 49 Contact Information Bayesia USA 50 Bayesia Singapore Pte. Ltd. 50 Bayesia S.A.S. 50 Copyright 50 Simulating Market Share with the Bayesia Market Simulator www.bayesia.sg iii
  • 4. Modeling Vehicle Choice and Simulating Market Share with Bayesian Networks Abstract We present a new method and the associated workflow for estimating market shares of future products based exclusively on pre-introduction data, such as syndicated studies conducted prior to product launch. Our approach provides a highly practical, fast and economical alternative to conducting new primary re- search. With Bayesian networks as the framework, and by employing the BayesiaLab and Bayesia Market Simulator software packages, this approach helps market researchers and product planners to reliably perform market share simulations on their desktop computers1 , which would have been entirely inconceivable in the past. This innovative approach is explained step-by-step in a study about the introduction of the new Porsche Panamera in the U.S. market. The results confirm that market share simulation with Bayesian networks is feasible even in niche markets that provide relatively few observations. We believe that making this method and the tools accessible to practitioners is an important contribution to real-world marketing. We are confident that for many companies this approach can yield a step-change in their forecasting ability. Objective This tutorial is intended for marketing practitioners, who are exploring the use of Bayesian network for their work. The example in this tutorial is meant to illustrate the capabilities of BayesiaLab with a real- world case study and actual consumer data. Beyond market researchers, analysts in many fields will hope- fully find the proposed methodology valuable and intuitive. In this context, many of the technical steps are outlined in great detail, such as data preparation and network learning, as they are applicable to research with BayesiaLab in general, regardless of the domain. This paper is part of a series of tutorials, which are exploring a broad range of real-world applications of Bayesian networks. 
About the Authors Stefan Conrady Stefan Conrady is the Managing Partner of Bayesia USA, which he co-founded in 2010. Bayesia USA serves as the North American sales and consulting organization for France-based Bayesia S.A.S. Their mission is to Simulating Market Share with the Bayesia Market Simulator 4 www.bayesia.us | www.bayesia.sg 1 BayesiaLab and Bayesia Market Simulator can run on a wide range of operating systems, including Windows, OS X, Linux/Unix, etc.
  • 5. promote Bayesian networks as a new research framework for knowledge discovery and reasoning within complex domains. Stefan studied Electrical Engineering and has extensive management experience in the fields of product planning, marketing and analytics, working at Daimler and BMW Group in Europe, North America and Asia. Prior to establishing Bayesia USA, he was heading the Analytics & Forecasting group at Nissan North America. Lionel Jouffe Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia S.A.S. Lionel Jouffe holds a Ph.D. in Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, who has deployed BayesiaLab globally since 2007. Acknowledgements Strategic Vision, Inc.2 (SVI) has generously made their 2009 New Vehicle Experience Survey available as a data source for this case study. In this context, special thanks go to Alexander Edwards, President, Automo- tive Division of Strategic Vision. We would also like to thank Jeff Dotson3, John Fitzgerald4 and Frank Koppelman5 for their ongoing coach- ing and their valuable comments on this paper. However, all errors remain the responsibility of the authors. Finally, Kenneth Train’s6 books and articles have been very helpful over the years as we explored the field of consumer choice modeling. Introduction For the vast majority of businesses, market share is a key performance indicator. 
Market share is used as a metric that allows comparing competitive performance independently of overall market size and its fluctuations.

[2] www.strategicvision.com
[3] Assistant Professor of Marketing, Vanderbilt University, Owen Graduate School of Management.
[4] President, Fitzgerald Brunetti Productions, Inc., New York.
[5] Professor Emeritus of Civil and Environmental Engineering, Robert R. McCormick School of Engineering and Applied Science, Northwestern University.
[6] Adjunct Professor of Economics and Public Policy, University of California, Berkeley.
In the product planning process, the expected market share is critical, along with the overall market forecast, as together they define the sales volume expectation. For obvious reasons, sales volume is a key element in most business cases.

As a result, it is critical for decision makers to correctly predict the future market shares of products not yet developed. The task of producing such market share forecasts typically falls to marketing and market research departments, which are most closely involved with understanding consumer behavior and, more specifically, the product choices consumers make.

If we fully understood the consumer’s decision making process and observed all components of it, we could simply generate a deterministic model for predicting future consumer choices. However, we do not, and it is obvious that many elements contributing to a consumer’s purchase decision are inherently unobservable.

Despite our limited comprehension of the true human choice process, there are a number of tools that still allow modeling consumer choice with what is observable, while accounting for what will remain unknowable. In this context, and based on the seminal works of Nobel laureate Daniel McFadden [7], choice modeling has emerged as an important tool in understanding and simulating consumer choice.
Such choice models serve as a representation of the “real world” and thus become what Judea Pearl likes to call “oracles” that allow us to “deliberately reason about the consequences of actions we have not yet taken.” [8]

Bayesian Networks for Choice Modeling

Using Bayesian networks [9] as the general framework for modeling a domain or system has many advantages, which Darwiche (2010) summarizes as follows:

“Bayesian networks provide a systematic and localized method for structuring probabilistic information about a situation into a coherent whole […]”

“Many applications can be reduced to Bayesian network inference, allowing one to capitalize on Bayesian network algorithms instead of having to invent specialized algorithms for each new application.”

Given the very attractive properties of Bayesian networks for representing a wide range of problem domains, it seems appropriate to apply them to choice modeling too. In particular, the BayesiaLab software package has made it very convenient to automatically machine-learn fairly large and complex Bayesian networks from observational data.

[7] Daniel McFadden received, jointly with James Heckman, the 2000 Nobel Memorial Prize in Economic Sciences; McFadden’s share of the prize was “for his development of theory and methods for analyzing discrete choice”.
[8] A recurring quote from Judea Pearl’s many lectures on causality.
[9] A Bayesian network is a graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). See the appendix for a brief introduction.
Beyond the convenience and speed of estimating Bayesian networks with BayesiaLab, there are three fundamental differences in modeling consumer choice with Bayesian networks compared to traditional discrete choice models. [10]

Whereas utility-based choice models, such as multinomial logit models (MNL), will “flatten” the vector of attribute utilities into a single scalar value, Bayesian networks do not inherently restrict all the dimensions relating to choice. For example, learning a Bayesian network from observed vehicle choices might reveal that fuel economy and vehicle price are subject to tradeoff, while safety might be a nonnegotiable basic requirement for the consumer. Correctly recognizing such dynamics is obviously critical for making predictions about future consumer choices.

Bayesian networks are nonparametric and, therefore, do not require the specification of a functional form. No assumptions need to be made regarding the form of links between variables. Thus, potentially nonlinear patterns are not an issue for model estimation or simulation.

Bayesian networks are inherently probabilistic, and, as such, there is no need to specify an error term. In a traditional choice model, an error term would be needed to make it non-deterministic.

In BayesiaLab all computations are natively discrete and thus no transformation functions, such as logit or probit, are needed. Given that we are dealing with discrete consumer choices, this all-discrete approach is an advantage.

For our case study, we use BayesiaLab 5.0 Professional Edition to learn a Bayesian network from consumer choices in the form of stated preference (SP) or revealed preference (RP) data. [11], [12] The learned Bayesian network allows us to compute the posterior probability distribution in each choice situation, including hypothetical product alternatives (and even hypothetical consumers). As a result, we obtain a choice probability as a function of product and consumer attributes.
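To make the notion of a choice probability more tangible, the following minimal sketch estimates P(product | consumer segment) from weighted counts and then averages those probabilities over all respondents. It is not a Bayesian network and does not use BayesiaLab; the records, segment labels and function names are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import defaultdict

# Hypothetical respondent records: (consumer segment, chosen product, survey weight).
records = [
    ("enthusiast", "sports sedan", 2.0),
    ("enthusiast", "sports sedan", 1.0),
    ("enthusiast", "luxury sedan", 1.0),
    ("comfort", "luxury sedan", 3.0),
    ("comfort", "sports sedan", 1.0),
]


def choice_probabilities(rows):
    """Estimate P(product | segment) from weighted counts."""
    totals = defaultdict(float)
    joint = defaultdict(float)
    for segment, product, weight in rows:
        totals[segment] += weight
        joint[(segment, product)] += weight
    return {key: w / totals[key[0]] for key, w in joint.items()}


def market_share(rows, probs, product):
    """Weighted average of each respondent's probability of choosing `product`."""
    total_weight = sum(w for _, _, w in rows)
    weighted = sum(w * probs.get((seg, product), 0.0) for seg, _, w in rows)
    return weighted / total_weight


probs = choice_probabilities(records)
share = market_share(records, probs, "sports sedan")
print(probs[("enthusiast", "sports sedan")], share)
```

In this toy data the averaged choice probabilities reproduce the observed weighted share of the product, which is the consistency one would expect from any baseline before hypothetical products are introduced.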
In order to obtain a product’s projected market share, we then need to simulate choice probabilities across all product scenarios and across all individuals in the population under study. For this specific purpose, Bayesia S.A.S. has developed the Bayesia Market Simulator, which uses the Bayesian networks generated by BayesiaLab. Both tools will play a central role in this case study.

[10] A very brief overview of utility-based choice models is provided in the appendix.
[11] The properties of Stated Preference (SP) and Revealed Preference (RP) data are explained in the appendix.
[12] Although we focus here exclusively on machine-learning consumer behavior, within the BayesiaLab framework we can also utilize expert knowledge about consumer behavior. For instance, vehicle dealers and their salespeople will have extensive knowledge about how consumers behave in the showroom. A special Knowledge Elicitation module in BayesiaLab can formally capture such expertise and build a new Bayesian network from it or augment an existing one. Knowledge Elicitation with BayesiaLab will be the subject of a separate tutorial to be published in the near future.

Case Study

To illustrate the entire market share estimation process with Bayesian networks, we have derived a case study from the U.S. auto industry. More specifically, we will model consumer choice behavior in the high-end vehicle market based on 2009 survey data. This is an interesting point in time as it precedes the launch of the new Porsche Panamera in model year 2010 (MY 2010), which will be the focus of our study.

Porsche Panamera

After the highly successful Cayenne, a four-door luxury SUV, the Panamera is Porsche’s second vehicle with four doors. Clearly influenced by the legendary 911’s styling, the Panamera offers sports-car looks and performance while comfortably accommodating four passengers. It enters a segment with well-established contenders, such as the Mercedes-Benz S-Class [13], the BMW 7-series [14] and the Audi A8 [15], shown below in that order.

[13] MY 2010 shown
[14] MY 2009 shown
[15] MY 2009 shown
Beyond these traditional premium sedans, there are a number of less conventional products that one can assume to be in the Panamera’s competitive field. The coupe-like Mercedes-Benz CLS [16] would presumably fall into this category.

[16] MY 2010 shown
Finally, the new Panamera may draw customers away from Porsche’s own product offerings, such as the Cayenne [17], an effect that is often referred to as “product substitution” or “product cannibalization.”

It is not our intention to speculate about potential product interactions, but rather to attempt to learn from revealed consumer behavior in a very formal way with Bayesian networks.

In order not to prematurely restrict our consumer choice set, we have defined a broad set of competitors for our purposes and included all non-domestic luxury vehicles [18] (including Light Trucks) priced above $75,000. [19]

[17] MY 2009 shown
[18] We followed the SVI segmentation and included “Luxury Car”, “Premium Coupe”, “Premium Convertible/Roadster” and “Luxury Utility” in our selection.
[19] The $75,000 threshold was chosen as it marks the lower end of the Panamera price range.

What was certainly a very real task for Porsche’s product planning team in recent years, i.e. predicting the Panamera market share, now becomes the topic of our case study and tutorial. Our objective is to predict what market share the Panamera will achieve without conducting any new research, strictly using RP data from before the product launch.

Common Forecasting Practices

Although we have no knowledge of the specific forecasting methods at Porsche, we know from industry experience that volume and market share forecasts are often determined through a long series of negotiations [20] between stakeholders, typically with an optimistic marketing group on one side and a skeptical CFO on the other. While expert consensus may indeed be a reasonable heuristic for business planning, the lack of forecasting formalisms is often justified by saying that forecasting is at least as much art as it is science.

The authors believe strongly that there is great risk in relying too heavily on “art”, which is inherently non-auditable, and have thus been pursuing easily tractable but scientifically sound methods to support managerial decision making, especially in the context of forecasting. With this in mind, this very formal and structured forecasting exercise was consciously chosen as the topic of the tutorial.

Tutorial

In this tutorial, we will explain each step from data preparation to market share simulation using BayesiaLab and Bayesia Market Simulator, according to the following outline:
• Data preparation (external)
• BayesiaLab:
  • Data import
  • Data modeling
• Baseline product scenario generation (external)
• Bayesia Market Simulator:
  • Network import
  • Definition of scenarios
  • Market share simulation

Notation

To clearly distinguish between natural language, software-specific functions and study-specific variable names, the following notation is used:
• BayesiaLab and Bayesia Market Simulator functions, keywords, commands, etc., are shown in bold type.
• Variable/node names are capitalized and italicized.
[20] As an interesting aside, these negotiations are usually Markovian in nature, i.e. the starting point of today’s negotiation only depends on the outcome of the previous negotiation.
Data Preparation

Consumer Research

This tutorial utilizes the 2009 New Vehicle Experience Survey (NVES), a syndicated study conducted annually by Strategic Vision, Inc., which surveys new vehicle buyers in the U.S. This study is widely used in the auto industry, and it serves as one of the primary market research tools. NVES contains over 1,000 variables and close to 200,000 respondent records. In large auto companies, hundreds of analysts typically have access to NVES, most often through the mTAB interface provided by Productive Access, Inc. (PAI). [21]

Variable Selection

Compared to traditional statistical models, Bayesian networks require much less “care” in terms of variable selection as overparameterization is generally not an issue. Although we could easily start with all 1,000+ variables, for expositional clarity we will initially select only about 50 variables [22] from the following categories, which we assume to capture relevant characteristics of both the consumer and the product:
• Vehicle/product attributes, e.g. brand, segment, number of cylinders, transmission, drive type, etc.
• Consumer demographics, e.g. age, income, gender, etc.
• Vehicle-related consumer attitudes, e.g. “I want to look good when driving my vehicle”, “I want a basic, no-frills vehicle that does the job,” etc.

Set of Choice Alternatives

Beyond variable selection, we must also define the set of choice alternatives and make assumptions about which vehicles a potential Panamera customer would consider. Not only that, but we also need to make sure that the alternatives to each of the Panamera’s alternatives are included. For instance, if we included the Porsche Cayenne in the choice set, then the Mercedes-Benz M-Class and the BMW X5 should be included too, and so on. One might even argue that the vehicle purchase is an alternative to a kitchen renovation or the purchase of a boat. Expert knowledge is clearly required at this point as to how far to expand the choice set.
Furthermore, SVI’s NVES can also help us in this regard as it contains questions about which vehicles actual buyers considered and which vehicles they disposed of in the context of their most recent purchase. [23]

As mentioned in the case study introduction, we included “Luxury Car”, “Premium Coupe”, “Premium Convertible/Roadster” and “Luxury Utility” [24] in the choice set and we further restricted it by excluding all domestic vehicles and vehicles priced below $75,000. For this segment of assumed Panamera competitors, we have approximately 1,200 unweighted observations in the 2009 NVES, which, on a weighted basis, reflect approximately 25,000 vehicles purchased in 2009.

[21] www.paiwhq.com
[22] A list of all variables used is given in the appendix. It should be noted that even 50 variables would create a major computational challenge with MNL models.
[23] Martin Krzywinski’s visualization tool, Circos, is highly recommended for the interpretation of cross-shopping behavior: www.mkweb.bcgsc.ca/circos/
[24] According to SVI’s segment definition.
Filtered Values (Censored States)

Although we can be less rigorous regarding the maximum number of variables in BayesiaLab, we still need to be conscious of the information contained in them. For instance, we need to distinguish unobserved values from non-existing values, although at first glance both appear to be “simple” missing values in the database. BayesiaLab has a unique feature that allows treating non-existing values as Filtered Values or Censored States.

To explain Filtered Values, we need to resort to an automotive example from outside our specific study. We assume that we have two questions about trailer towing. We first ask, “do you use your vehicle for towing?”, and then, “what is the towing weight?” If the response to the first question is “no”, then a value for the second one cannot exist, which in BayesiaLab’s nomenclature is a Filtered Value or Censored State. In this case, we must not impute a value for towing weight; instead a Filtered Value code will indicate this special condition.

On the other hand, a respondent may answer “yes”, but then fail to provide a towing weight. In this case, a true value for the towing weight exists, but we cannot observe it. Here, it is entirely appropriate to impute the missing value, as we will explain as part of the Data Import procedure.

To indicate Filtered Values to BayesiaLab, we will need to apply a study-specific logic and recode the relevant variables in the original database. Most statistical software packages have a set of functions for this kind of task. For example, in STATISTICA this can be done with the Recode function. Alternatively, this recoding logic can also be expressed with the following pseudo code:

IF towing=yes THEN towing weight=unchanged
IF towing=no THEN towing weight=FV (Filtered Value)

A simple Excel function will achieve the same, and it is assumed that the reader can implement this without further guidance.
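For readers who preprocess their data in a scripting language instead, the recoding rule above can be sketched as follows. The marker string "FV", the field names and the use of None for an unobserved value are assumptions made for illustration; the actual code that BayesiaLab expects for a Filtered Value is not specified here.

```python
FILTERED = "FV"  # hypothetical marker for a Filtered Value (Censored State)


def recode_towing_weight(towing, towing_weight):
    """Apply the recoding rule to one respondent record.

    towing: "yes", "no", or None if the question was not answered.
    towing_weight: a number, or None if no weight was reported.
    """
    if towing == "no":
        # The value cannot exist, so it must not be imputed later.
        return FILTERED
    # towing == "yes" (or unanswered): leave the value unchanged.
    # A None here is a genuine missing value that may be imputed.
    return towing_weight


rows = [
    {"towing": "yes", "towing_weight": 3500},
    {"towing": "no", "towing_weight": None},
    {"towing": "yes", "towing_weight": None},
]
recoded = [recode_towing_weight(r["towing"], r["towing_weight"]) for r in rows]
print(recoded)
```

The third record illustrates the distinction drawn in the text: the respondent tows, so a weight exists but is unobserved, and the value stays missing instead of being marked as filtered.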
Although Filtered Values are very important in many research contexts, hence the emphasis here, our case study does not require using them.

Data Modeling

Data Import

To start the analysis with BayesiaLab, we first import the database, which needs to be formatted as a CSV file. [25] With Data>Open Data Source>Text File, we start the Data Import wizard, which immediately provides a preview of the data file.

The table displayed in the Data Import wizard shows the individual variables as columns and the respondent records as rows. There are a number of options available, such as for Sampling. However, this is not necessary in our example given the relatively small size of the database.

Clicking the Next button prompts a data type analysis, which provides BayesiaLab’s best guess regarding the data type of each variable. Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values, filtered states, etc.

[25] CSV stands for “comma-separated values”, a common format for text-based data files. As an alternative to this import format, BayesiaLab offers a JDBC connection, which is practical when accessing large databases on servers.
For this example, we will need to override the default data type for the Unique Identifier variable as each value is a nominal record identifier rather than a numerical scale value. We can change the data type by highlighting the Unique Identifier column and clicking the Row Identifier check box, which changes the color of the Unique Identifier column to beige.

Although it is not imperative to maintain a Row Identifier, and we could instead assign the Not Distributed status to the Unique Identifier variable, it can be quite helpful for finding individual respondent records at a later point in the analysis.

As the respondent records in the NVES survey are weighted, we need to select the Weight by clicking on the Combined Base Weight variable, which will turn the column green.
Missing Values

In the context of data import, it is important to point out how missing values are treated in BayesiaLab. The native, automatic processing of missing values reveals a particular strength of BayesiaLab.

In traditional statistical analysis, the analyst has to choose from a number of methods to handle missing values in a database, but, unfortunately, many of them have serious drawbacks. Perhaps the most common method is case-wise deletion, which simply excludes records that contain any missing values. Casually speaking, this means throwing away lots of good data (the non-missing values) along with the bad (the missing values). Another method is means-imputation, by which any missing value is filled in with the variable’s mean. Inevitably, this reduces the variance of the variable and thus has an impact on its summary statistics, which is clearly undesirable considering the intended analysis. In the case of discrete distributions, means-imputation typically also introduces a bias. There are other, better techniques, which typically demand significant computational effort and thus often turn into a labor-intensive standalone project rather than being just a preparatory step.

Without going into too much detail at this point, BayesiaLab can estimate all missing values given the learned network structure using the Expectation Maximization (EM) algorithm. As a result, we obtain a complete database without “making things up.” In traditional statistics, the equivalent would be to say that neither the mean nor the variance of the variables is affected by the imputation process.

Continuing in our data import process, the next screen provides options as to how to treat the missing values. Clicking the small upside-down triangle next to the variable names brings up a window with key statistics of the selected variable, in this case Age Bracket.

The very basic functions of filtering, i.e. case-wise deletion, and mean/modal value imputation are available. However, at this point, we can take advantage of BayesiaLab’s advanced missing values processing algorithms. We will select Dynamic Completion, which will continuously “fill in” and “update” the missing values according to the conditional distribution of the variable, as defined by the current structure of the network. However, as our network is not yet connected and hence does not have a structure, BayesiaLab will draw from the marginal distribution of each variable to “tentatively” establish placeholder values for each missing value.

A screenshot from STATISTICA, where we have done most of the preprocessing, shows the marginal distribution of the Age Bracket variable in the form of a histogram. [26]

The missing Age Bracket values will be drawn from this marginal distribution and are used as placeholders until we can use the structure of the Bayesian network to re-estimate our missing values. As Dynamic Completion implies, BayesiaLab performs this on a continuous basis in the background, so at any point we would have the best possible estimates for the missing values, given the current network structure.

Discretization

The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization that must be performed on all continuous variables. [27] We will use the Purchase Price variable to explain the process. Highlighting a variable will show the default discretization algorithm while the graph panel is initially blank.

[26] The normal curve in the histogram is just for illustration purposes. BayesiaLab always uses the actual discrete distribution, not a parametric approximation.
[27] BayesiaLab requires discrete distributions for all variables.
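Returning briefly to the missing-value discussion above, the small sketch below contrasts means-imputation with placeholder draws from the marginal (empirical) distribution. It does not reproduce BayesiaLab’s Dynamic Completion or the EM algorithm; the observed codes, the number of missing values and the random seed are hypothetical.

```python
import random
import statistics

# Hypothetical Age Bracket codes from respondents who answered the question.
observed = [1, 2, 2, 3, 3, 3, 4, 5]
n_missing = 4  # hypothetical number of respondents with no answer

# Means-imputation: every gap receives the same value, so the spread shrinks.
mean_value = statistics.mean(observed)
mean_imputed = observed + [mean_value] * n_missing

# Placeholder draws from the marginal distribution of the observed values.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
marginal_imputed = observed + rng.choices(observed, k=n_missing)

var_observed = statistics.pvariance(observed)
var_mean_imputed = statistics.pvariance(mean_imputed)

print(var_observed, var_mean_imputed)
print(marginal_imputed)
```

Means-imputation leaves the mean untouched but always lowers the variance when values are missing, and it can produce values such as 2.875 that are not valid states of a discrete variable. Draws from the marginal distribution only ever produce states that were actually observed.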
By clicking on the Type drop-down menu, the choice of discretization algorithms appears. Selecting Manual will show a cumulative graph of the Purchase Price distribution, and we can see that it ranges from $75,000 to $180,000. [28]

[28] $75,000 was previously selected as the lower boundary for this particular vehicle segment. $180,000 was the highest reported price in NVES.
We could now manually select binning thresholds by way of point-and-click directly on the graph panel. This might be relevant if there were government regulations in place with specific vehicle price thresholds. [29] For our purposes, however, we want to create price categories that are meaningful in the context of our vehicle segment, and five bins seem like a reasonable starting point.

Clicking Generate Discretization will prompt us to select the type of discretization and the number of desired intervals. Without having a priori knowledge about the distribution of the Price variable, we may want to start with the Equal Distances algorithm.

The resulting view shows the generated intervals, and, by clicking on the interval boundaries, we can see the percentage of cases falling into the adjacent intervals. We learn from this that our bottom two intervals contain 89% of the cases, whereas the top two intervals contain just under 5% of the cases. This suggests that we may not have enough granularity to characterize the bulk of the market towards the bottom end of the price spectrum. Perhaps we also have too few cases within the top two intervals. So we will generate a new discretization, now with four intervals, and select K-Means as the type this time.

[29] The now-expired luxury tax for passenger cars in the U.S. would be an example of such a policy.
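The difference between the two binning approaches can be sketched outside of BayesiaLab as follows. The prices are hypothetical and not taken from NVES, and the one-dimensional k-means shown here is a plain textbook version (Lloyd's algorithm) with a simple deterministic start; BayesiaLab's own K-Means implementation may differ in its details.

```python
def equal_distance_edges(values, n_bins):
    """Interior thresholds splitting [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def kmeans_edges(values, n_bins, iterations=50):
    """1-D k-means; thresholds are midpoints between neighboring cluster centers."""
    data = sorted(values)
    # Deterministic start: centers spread evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * n_bins)] for i in range(n_bins)]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in data:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the previous center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    centers.sort()
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def bin_counts(values, edges):
    """Cases per interval; a value equal to a threshold falls into the upper interval."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v >= e for e in edges)] += 1
    return counts


# Hypothetical, right-skewed purchase prices in dollars.
prices = [75_000, 76_000, 78_000, 80_000, 82_000, 85_000, 88_000, 90_000,
          95_000, 99_000, 105_000, 112_000, 125_000, 150_000, 180_000]

equal_counts = bin_counts(prices, equal_distance_edges(prices, 5))
kmeans_counts = bin_counts(prices, kmeans_edges(prices, 4))
print(equal_counts, kmeans_counts)
```

With skewed data such as this, equal-width intervals pile most cases into the lowest bin and leave the upper bins nearly empty, which mirrors the pattern described in the text. Thresholds derived from cluster centers follow the data more closely.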
The resulting bins appear much more suitable to describe our domain. We will proceed similarly with the only other continuous variable in the database, i.e. Age Bracket.

Clicking Finish completes the import process, and 49 variables (columns) from our database are now shown as blue nodes in the Graph Panel, which is the main window for network editing.

Note
For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in order of priority, K-Means, Equal Distances or Equal Frequencies.
The six nodes in the far left column reflect product attributes (green); the second-from-left column shows ten demographic attributes (yellow) and all remaining nodes to the right represent 33 vehicle-related attitudes (red). This initial view represents a fully unconnected Bayesian network.

Also, to simplify our nomenclature, we will combine the demographic attributes (yellow) and the vehicle-related attitudes (red) and refer to them together as “Market” variables (now all red).
Variable Classes and Forbidden Arcs

One is now tempted to immediately start with Unsupervised Learning to see how all these variables relate to each other. However, there are two reasons why we need to introduce another step at this point:

Our mission is to model the interactions between product variables and market variables so we can see the consumer response to products. For instance, we are more interested in learning P(Transmission=“Manual” | Attitude=“Driving is one of my favorite things”) than we are in P(Age < 45 | Number of children under 6 = 2). Hence we focus the learning algorithm on the area of interest, i.e. product attributes vis-à-vis market attributes.

We must not learn the dependencies between the product variables themselves because they would simply reflect today’s product offerings and their contingencies, e.g. P(Vehicle Segment=“4-door sedan” | Brand=“Porsche”)=0. We do want to understand what is available today, but we certainly do not want to encode today’s product scenarios as constraints in the network. Instead, we want to be able to introduce new scenarios, which are not available today.

To focus learning on a specific area, we need to take an indirect approach and tell BayesiaLab “what not to learn.” So, to prevent the algorithm from learning the product-to-product variable relationships, we will “forbid” such arcs.

We first create a Class by highlighting all product nodes and then right-clicking them. From the menu, we then select Properties>Classes>Add.
When prompted for a name, we can choose something descriptive, so we give this new Class the label “Product”.

Having introduced this Class of nodes, we can now very easily manage Forbidden Arcs. More specifically, we want to make all arcs within the Class Product forbidden. A right-click anywhere on the Graph Panel opens up the menu from which we can select Edit Forbidden Arcs.
In the Forbidden Arc Editor, we can select the Class Product both as start and end.

We now repeat the above steps and also create Forbidden Arcs for the Market variables. As a result, these Forbidden Arc relationships will appear in the Forbidden Arc Editor and will remain there unless we subsequently choose to modify them.
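Conceptually, forbidding all arcs within a Class amounts to removing every directed pair of same-class nodes from the set of arcs a learning algorithm may consider. The sketch below illustrates this idea only; the node names, the class assignment and both helper functions are hypothetical and unrelated to BayesiaLab's internal representation.

```python
from itertools import permutations

# Hypothetical node-to-class assignment; node names are illustrative only.
node_class = {
    "Brand": "Product",
    "Price": "Product",
    "Transmission": "Product",
    "Age Bracket": "Market",
    "Income": "Market",
}


def forbidden_arcs(classes):
    """All directed arcs whose start node and end node belong to the same class."""
    return {(a, b) for a, b in permutations(classes, 2) if classes[a] == classes[b]}


def is_allowed(arc, forbidden):
    """True if a structure-learning algorithm may consider this directed arc."""
    return arc not in forbidden


forbidden = forbidden_arcs(node_class)
print(len(forbidden))
print(is_allowed(("Price", "Income"), forbidden))
print(is_allowed(("Brand", "Price"), forbidden))
```

Only arcs that cross between the two classes remain candidates, which is the constraint the two Forbidden Arc entries are meant to impose.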
We are also reminded about the presence of Forbidden Arcs by the symbol in the lower right corner of the screen.

Unsupervised Learning

Now that the learning constraints are in place, we continue to learn the network by selecting Learning>Association Discovering>EQ. [30]

[30] EQ is one of the unsupervised learning algorithms implemented in BayesiaLab. Koller and Friedman (2009) provide a comprehensive introduction to learning algorithms.
The resulting network may appear somewhat unwieldy at first glance, but upon closer inspection we can see that arcs exist only between Product variables (green) and Market variables (red), which is precisely what we intended by establishing Forbidden Arcs.

However, we will not analyze this structure any further, but rather use it solely as a statistical device to be used in the Bayesia Market Simulator. We simply need to save the network in its native xbl file format, so the Bayesia Market Simulator can subsequently import it.

Simulation

With the Bayesia Market Simulator we have the ability to simulate “alternate worlds” for both the Product variables and the Market variables. In most applications, however, marketing analysts will want to primarily study new Product scenarios assuming the Market remains invariant, meaning that consumer demographics and attitudes remain the same. [31]

[31] The year-to-year invariance assumption of the market has been challenged by many marketing executives during the most recent recession. In this context, many media headlines also proclaimed a paradigm shift in consumer behavior. The authors have believed, then as well as now, that more has remained the same than has changed in terms of consumer attitudes.
[32] For expositional simplicity, we make no distinction between model year and calendar year.

It will be the task of the analyst to define new product scenarios, which will need to include all products assumed to be in the marketplace for the to-be-projected timeframe, in our case 2010. [32] As many products carry over from one year to the next, e.g. from model year 2010 to model year 2011, it is very helpful to use the currently available products as a baseline scenario, upon which changes can be built. Quite simply, we need to take inventory of the product landscape today.

In the current version of Bayesia Market Simulator this step is not yet automated, so a practical procedure for generating the baseline scenario is described in the following section.

Product Scenario Baseline

The idea is that all available product configurations were manifested in the market in 2009 and thus captured in the 2009 NVES. [33]

It still requires careful consideration as to how many Product variables should be included to generate the baseline product scenario. We want to create a type of coordinate system that allows us to identify products through their principal characteristics. For instance, the following attributes would uniquely define a “Mercedes-Benz S550 4Matic”:
• Brand=“Mercedes-Benz”
• Engine Type=“V8”
• Drive Type=“AWD”
• Transmission=“Automatic”
• Segment=“High Premium” [34]
• Price=“>$85,795 AND <= $99,378”

Relating consumer attributes and attitudes to these individual product attributes, rather than to the vehicle as a whole, will then allow us to construct hypothetical products during our simulation. To stay with the Mercedes example, we could define a new product by setting the engine type to “V6” and changing the price to “<$85,795”.

It is easy to imagine how the number of permutations can come to exceed the number of consumers. For instance, in the High Premium segment, we could further differentiate between short wheelbase and long wheelbase versions, which would increase the number of baseline product scenarios. We want to find a reasonable balance between product granularity and the ratio of consumers to product scenarios, although we cannot provide the reader with a hard-and-fast rule.
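A quick back-of-the-envelope calculation shows how fast such a coordinate system grows. The numbers of states per attribute below are hypothetical and not taken from NVES; only the approximate sample size of 1,200 comes from the text. The same sketch also shows how a hypothetical product can be derived from an existing one by overriding individual coordinates.

```python
from math import prod

# Hypothetical number of states per Product variable (not NVES figures).
states_per_attribute = {
    "Brand": 10,
    "Engine Type": 4,
    "Drive Type": 3,
    "Transmission": 2,
    "Segment": 4,
    "Price": 4,
}
observations = 1_200  # approximate unweighted sample size quoted earlier

# Upper bound on distinct configurations if every combination were possible.
theoretical = prod(states_per_attribute.values())

# The example product from the text, expressed as a set of coordinates.
s550 = {
    "Brand": "Mercedes-Benz",
    "Engine Type": "V8",
    "Drive Type": "AWD",
    "Transmission": "Automatic",
    "Segment": "High Premium",
    "Price": ">$85,795 AND <= $99,378",
}

# A hypothetical variant: same coordinates except engine type and price bracket.
s_v6 = {**s550, "Engine Type": "V6", "Price": "<$85,795"}
changed = {key for key in s550 if s550[key] != s_v6[key]}

print(theoretical, observations)
print(sorted(changed))
```

Even with these modest state counts, the theoretical number of configurations is several times the number of respondents, so each additional Product variable should be weighed against the cases available per scenario.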
Pricing is obviously a very important part of the product scenario configuration and here we are confronted with the reality that no two customers pay exactly the same for the identical product, and the survey data makes this very evident. Furthermore, there are numerous product features outside our “coordinate sys- tem”, e.g. an optional $6,000 high-end audio system, that would materially affect the price point of an indi- vidual vehicle, but which would not move the vehicle into a different category from a consumer’s perspec- tive. With options, an S550 can easily reach a price of over $100,000. Still we would want such a high-end Simulating Market Share with the Bayesia Market Simulator www.bayesia.us | www.bayesia.sg 27 33 In our example, we judge this to be a reasonable simplification, even though a small number of automobiles at the very top end of the market, e.g. the Rolls-Royce Phantom, may not be captured in the survey. 34 Using the Strategic Vision segmentation nomenclature, “High Premium” defines a large four-door luxury sedan.
  • 28. S550 to be grouped with the standard S550. Thus, it is important to define reasonable price brackets that cover the price spectrum of each vehicle and minimize model fragmentation.

During the Data Import stage, BayesiaLab discretized all continuous numerical values, including Price, and created discrete states. If these discrete states are adequate considering the price positioning and price spectrum of the vehicles under study, we can now leverage this existing binning for generating all current product scenarios and select Data>Save Data.

In the dialogue box that subsequently appears, we need to select Use the States’ Long Name. It is important that Use Continuous Values is not checked; otherwise we will lose the discretized states of the Price variable.

This will export all variables and all records, including values from previously performed missing value imputations. The output is a semicolon-delimited text file, which can easily be imported into Excel or any statistical application, such as SPSS or STATISTICA. The purpose of loading the file into an external application is to manipulate the database to extract the unique product combinations available in the market. In Excel this can be done very quickly by deleting all columns unrelated to the product configuration, which leaves us with just the product attributes.
  • 29. Excel 2010 (for Windows) and Excel 2011 (for Mac) offer a very convenient feature that quickly removes all duplicates, which is exactly what we want to achieve: we want to know all the unique product configurations currently in the market.

This leaves us with a table of approximately 100 unique product scenario combinations available at the time of the survey.

To make these unique product scenarios available for subsequent use in the Bayesia Market Simulator, we need to save the table as a semicolon-delimited CSV file. This is important to point out, as most programs save CSV files as comma-delimited files by default.

Product Scenario Simulation

Now that we have the Bayesian network describing the overall market (as an xbl file) as well as the baseline product scenarios (as a csv file), we can proceed to open the Bayesia Market Simulator.
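For readers who prefer a script to the Excel steps, the extraction of unique product configurations described above can be sketched in Python. This is only an illustration under stated assumptions: the column names and the three sample records are hypothetical stand-ins for the exported survey data, and the exact header layout expected by the Bayesia Market Simulator is not specified here.

```python
import csv
import io

# Hypothetical product columns; a real export would use the states' long names.
PRODUCT_COLUMNS = ["Brand", "Engine Type", "Drive Type", "Segment", "Price"]


def unique_product_scenarios(rows, product_columns):
    """Return the distinct product configurations in order of first appearance."""
    seen = set()
    unique = []
    for row in rows:
        # Keep only the product attributes; all other columns are ignored.
        key = tuple(row[col] for col in product_columns)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(product_columns, key)))
    return unique


def to_semicolon_csv(scenarios, product_columns):
    """Serialize the scenarios as semicolon-delimited text without blank lines."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=product_columns, delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(scenarios)
    return buffer.getvalue()


# Three made-up survey records: two buyers of the same configuration, one of another.
price = ">$85,795 AND <= $99,378"
survey_rows = [
    {"Brand": "Mercedes-Benz", "Engine Type": "V8", "Drive Type": "AWD",
     "Segment": "High Premium", "Price": price, "Gender": "Male"},
    {"Brand": "Mercedes-Benz", "Engine Type": "V8", "Drive Type": "AWD",
     "Segment": "High Premium", "Price": price, "Gender": "Female"},
    {"Brand": "BMW", "Engine Type": "V8", "Drive Type": "RWD",
     "Segment": "High Premium", "Price": price, "Gender": "Male"},
]

baseline = unique_product_scenarios(survey_rows, PRODUCT_COLUMNS)
csv_text = to_semicolon_csv(baseline, PRODUCT_COLUMNS)
print(csv_text)
```

Writing the delimiter explicitly avoids the comma default mentioned above, and the fixed line terminator keeps stray blank lines out of the file.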
  • 30. Clicking File>Open will prompt us to open the xbl network file we previously generated with BayesiaLab.

Upon loading we will see the principal interface of the Bayesia Market Simulator. On the left panel, all nodes of the network appear as variables. We now need to separate all variables into Market Variables and Scenario Variables by clicking the respective arrow buttons. In our case, the aptly named Market variables are the Market Variables in BMS nomenclature and the Product variables are the Scenario Variables.
  • 31. All variables must be allocated before we can continue to Scenario Editing. This also implies that Product variables that are not to be included as Scenario Variables must be excluded from the Bayesian network file. If necessary, we will return to BayesiaLab to make such edits.

As we are working with RP data, every record in our database reflects one vehicle purchase, i.e. “reveals” one choice, and therefore we need to leave the Target Variable and Target State fields blank. These fields would only be used in conjunction with SP data, which includes a variable indicating acceptance versus rejection.

Clicking Scenario Editing opens up a new window. We can now manually add any product scenarios we wish to simulate. Given the potentially large number of scenarios, it will typically be better to load the baseline product scenarios, which were saved earlier.
  • 32. We can do that by selecting Offer>Import Offers.

We now open the semicolon-delimited CSV file with the baseline product scenarios. It is very important that the CSV file is formatted precisely as specified, for instance, without any extra blank lines. In case there are any import issues, it can be helpful to review the CSV file in a text editor and to visually inspect the formatting.
  • 33. Upon successful import, all baseline product scenarios will appear in the Scenario Editing dialogue. The analyst can now add any new product scenarios or delete those products that are no longer expected to be in the market.35 Clicking Add Offer adds an additional scenario at the bottom of the product scenario list. In the case of long product scenario lists, this may require scrolling all the way down.

35 To maintain expositional simplicity, we have added all Panamera versions for the entire year 2010 and not changed any other product scenarios. It should be pointed out that the V6 version of the Porsche Panamera was introduced only in mid-2010. BMW has also launched an additional six-cylinder version of the 7-series as well as AWD variants, which are not reflected in the simulation. Finally, Jaguar released a new XJ in 2010, while that year marked the runout of the old-generation Audi A8.
  • 34. Clicking on the product attributes of any scenario prompts drop-down menus to appear with the available attribute states, e.g. RWD or AWD.36 This also allows changing attributes of existing products according to the analyst’s requirements.

For our case study, we will add the following versions of the Panamera as new product scenarios:

Panamera (V6, RWD)
Panamera 4 (V6, AWD)
Panamera S (V8, RWD)
Panamera 4S (V8, AWD)
Panamera Turbo (V8 Turbo, RWD)

To characterize all of them as large 4-door luxury sedans, which is the key distinction versus previous Porsche products, we will assign the “High Premium” attribute to them.

36 RWD and AWD stand for rear-wheel drive and all-wheel drive, respectively.
  • 35. Once this is completed, we need to obtain a database that represents the consumer base on which these new product scenarios will be “tried out”. This can be done either by associating the original database, from which the network was learned, or by creating a new, artificial one that reflects the joint probability distribution of the learned Bayesian network.

The latter can be achieved by selecting Database>Generate. It is up to the analyst to determine the size of the database to be generated. Although there is no fixed rule, too small a database will limit the observability of products with a very small market share.

Alternatively, we can associate the original database, which contains the survey responses. In our case, the original database contains 1,203 records, which is very reasonable in terms of computational requirements.

Once a database is associated, clicking the Simulation button will start the market share estimation process.
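The remark about database size can be made concrete with a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each record is an independent draw that picks a given product with probability equal to its market share; this is a simplification and not a description of how the Bayesia Market Simulator computes its results. The function names are hypothetical.

```python
def expected_buyers(n_records, share):
    """Expected number of records choosing a product with the given share."""
    return n_records * share


def prob_unobserved(n_records, share):
    """Probability that no record at all chooses the product,
    assuming independent draws (an illustrative simplification)."""
    return (1.0 - share) ** n_records


n = 1203  # size of the original survey database in this case study
for share in (0.01, 0.001):
    print(
        f"share={share:.1%}: expected buyers={expected_buyers(n, share):.1f}, "
        f"P(no buyers)={prob_unobserved(n, share):.3f}"
    )
```

Under these assumptions, a product with a one-percent share would be backed by only about a dozen records in a database of 1,203, which illustrates why very small shares are hard to observe in a small database.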
  • 36. With the given complexity of our network and around 100 product scenarios, the simulation should take no longer than 30 seconds on a typical desktop computer.

Upon completion, the simulation results will appear in the form of a pie chart and a table. One can go back and review the scenarios by clicking the Scenario Editing button.
  • 37. The aggregated simulated market shares can also be copied from the results table and pasted into Excel or any other application for further editing and presentation purposes. An example is provided below, showing the simulated market shares of the brands under study in the High Premium segment.

[Figure: pie chart “Simulated High Premium Market Shares ($75,000+)” for Audi, BMW, Jaguar, Lexus, Mercedes and Porsche; slice labels as extracted: 1%, 21%, 3%, 10%, 53%, 12%]

As can be seen from the results, the Porsche Panamera’s predicted market share appears to be compatible with the reported running rate for calendar year 2010, which was available at the time of writing. Unfortunately, we do not know how this compares to Porsche’s expectations, but the Panamera seems to be quite successful overall.

Substitution and Cannibalization

The fully simulated database can also be saved as a semicolon-delimited CSV file, which allows reviewing the choice probability for each product scenario by individual consumer in a spreadsheet.
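As an illustration of what can be done with such an export outside a spreadsheet, the following sketch averages per-consumer choice probabilities into market shares. The file layout (one row per consumer, one column per product scenario) and all numbers are assumptions made for this example; the actual export format of the Bayesia Market Simulator may differ.

```python
import csv
import io

# Hypothetical export: one row per consumer, one column per product scenario.
simulated_csv = """Consumer;S550;750i;Panamera S
1;0.70;0.20;0.10
2;0.10;0.30;0.60
3;0.40;0.50;0.10
"""


def market_shares(csv_text, id_column="Consumer"):
    """Average the choice probabilities of each product over all consumers."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    totals = {}
    n_consumers = 0
    for row in reader:
        n_consumers += 1
        for product, value in row.items():
            if product == id_column:
                continue
            totals[product] = totals.get(product, 0.0) + float(value)
    if n_consumers == 0:
        raise ValueError("no consumer records found")
    return {product: total / n_consumers for product, total in totals.items()}


shares = market_shares(simulated_csv)
for product, share in shares.items():
    print(f"{product}: {share:.1%}")
```

Because each consumer's probabilities sum to one, the averaged shares also sum to one, which is a useful sanity check on any export processed this way.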
  • 38. We can literally examine the new, simulated choices record-by-record and see which customers have made the switch to the Panamera. Applying conditional formatting to the spreadsheet can also be very helpful.

The above screenshot, for example, shows a selection of actual Mercedes buyers who would either consider or pick the Porsche Panamera in this simulation. High choice probabilities are shown in shades of red, while near-zero probabilities are depicted in dark blue.

It is equally interesting to examine which Porsche buyers would pick the Panamera over their current vehicle choice.
  • 39. Not surprisingly, our simulation suggests high probabilities of Panamera choice for several current Cayenne owners.

One is tempted to take this a step further and calculate a rate of cannibalization. In this particular survey, however, the sample size is too small to attempt doing so. Otherwise, such a computation would be simple arithmetic.

Market Scenario Simulation

Although experimenting with product scenarios is expected to be the primary use of the Bayesia Market Simulator, it is also possible to change the market scenarios.

For example, this can be used to simulate the impact of policy changes. One could hypothesize that legislation would prohibit or severely penalize ownership of vehicles of a certain size or of a specific engine type in urban areas.37

Upon editing the market segments, the simulation can be rerun to obtain the new market share results.

37 Given the draconian restrictions on motorists in Central London, this example is presumably not very far-fetched.
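Returning to the cannibalization rate discussed earlier on this page: with a sufficiently large sample, the “simple arithmetic” could look like the sketch below. The definition used here (the fraction of the new product's simulated demand that comes from buyers of the same brand) is one possible convention chosen for illustration, and the records are invented.

```python
# Hypothetical records: (brand actually purchased, simulated probability of
# choosing any Panamera version). All values are made up for illustration.
simulated = [
    ("Porsche", 0.60),
    ("Porsche", 0.10),
    ("Mercedes-Benz", 0.30),
    ("BMW", 0.20),
    ("Lexus", 0.00),
]


def cannibalization_rate(records, own_brand):
    """Fraction of the new product's simulated demand drawn from own-brand buyers."""
    total = sum(probability for _, probability in records)
    if total == 0:
        raise ValueError("the new product attracts no simulated demand")
    own = sum(probability for brand, probability in records if brand == own_brand)
    return own / total


rate = cannibalization_rate(simulated, "Porsche")
print(f"Share of simulated Panamera demand from Porsche buyers: {rate:.1%}")
```

The complement of this figure would be the share of demand substituted away from competing brands.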
  • 40. Limitations

This approach can simulate product and market scenarios consisting of variations of configurations that can be observed with sufficient sample size today. However, the impact of entirely new technologies cannot be simulated on this basis. As a result, projecting the market share of the all-electric Nissan Leaf38 would not be possible, whereas estimating the share of a hypothetical three-row BMW crossover vehicle would be feasible. In all cases, it requires the analyst’s expert knowledge and judgment to determine the adequacy and equivalency of product attributes observable today.

38 The all-electric Leaf was launched by Nissan in the U.S. in December of 2010.

Outlook

There are several natural extensions to the presented methodology, but presenting them in full would go beyond the scope of this paper. A brief summary shall suffice for now, and we will go into greater detail in forthcoming case studies in this series:

Beyond learning from data, we can use expert knowledge to create or augment Bayesian networks. BayesiaLab offers a Knowledge Elicitation module, which formally captures expert knowledge and encodes it in a Bayesian network. In the absence of market data, this is an excellent approach for having decision makers collectively (and in a formally correct manner) reason about future states of the world.

We can extend the concept of product attributes to consumers’ product satisfaction ratings. This will allow estimating the market share impact as a function of changes in consumer ratings. For instance, an automaker could reason about the volume impact of a vehicle facelift that is expected to raise the consumer rating of “styling”.

The product cannibalization or substitution rate can be estimated based on the simulated choice behavior, provided that the sample size is sufficient. For most mainstream products, this seems to be realistic.

With the ability to study consumer choice at the model level, we can also aggregate these results to the segment level. Alternatively, using a less granular approach, we can model the entire market at the segment and brand level, which would allow studying market changes at a larger scale.

Beyond simulating “hard” policy changes affecting the market, e.g. excluding a product class from a certain geography, we can also use BayesiaLab to simulate new populations with small changes in average consumer attitudes versus the originally surveyed population. For instance, such an artificially modified population could be more environmentally conscious, and one could apply opinions prevalent on the West Coast to the whole country. The Bayesia Market Simulator can then generate new market shares based on these new hypothetical market conditions.

Summary

BayesiaLab and Bayesia Market Simulator are unique in their ability to use Bayesian networks for choice modeling and market share simulation. The presented workflow provides a comprehensive method for
  • 41. simulating market shares of future products based on their key characteristics, without requiring new and costly experiments.

As a result, BayesiaLab and Bayesia Market Simulator allow using a vast range of existing research for market share predictions. Given the significant resources many corporations have allocated over many years to conducting consumer surveys, these BayesiaLab tools offer an entirely new way to turn the accumulated research data into practical market oracles.
  • 42. Appendix

Utility-Based Choice Theory

In today’s choice modeling practice, utility-based choice theory plays a dominant role. The first concept of utility-based choice theory is that each individual chooses the alternative that yields him or her the highest utility. The second concept is that a vector describing the attributes of a choice alternative can be collapsed into a single scalar utility value for the chooser. For instance, a vector of attributes for one choice alternative, e.g. [Price, Fuel Economy, Safety Rating], would translate into one scalar value, e.g. [5], specific to each chooser. The following example is meant to illustrate both concepts (✓ marks the chosen alternative):

For Consumer A:
Utility of Product 1: [Price=$25,000, Fuel Economy=25MPG, Safety Rating=4 stars] = 7 ✓
Utility of Product 2: [Price=$29,000, Fuel Economy=23MPG, Safety Rating=5 stars] = 5.5

For Consumer B:
Utility of Product 1: [Price=$25,000, Fuel Economy=25MPG, Safety Rating=4 stars] = 4
Utility of Product 2: [Price=$29,000, Fuel Economy=23MPG, Safety Rating=5 stars] = 7.5 ✓

This concept implies that consumers make tradeoffs, either explicitly or implicitly, and that there exists an amount x of “Fuel Economy” that is equivalent in utility to an amount y of “Safety”. The reader may reasonably object that not even a fuel economy of 100MPG would make it acceptable to drive a vehicle that is rated very poorly on safety.

Also, we do not know a priori what the utility values are, nor can we measure them. Neither do we know in advance how individual product and consumer attributes relate to these unobservable utilities. However, there are methods that allow us to estimate these unknown quantities and, based on this knowledge, to predict future choices. One such method is briefly highlighted in the following.
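The two-consumer example above can be restated in a few lines of Python. The utility values are taken directly from the example; the dictionary layout and the function name are merely illustrative choices.

```python
# Scalar utilities per consumer and product, as given in the example above.
utilities = {
    "Consumer A": {"Product 1": 7.0, "Product 2": 5.5},
    "Consumer B": {"Product 1": 4.0, "Product 2": 7.5},
}


def choose(product_utilities):
    """Return the alternative with the highest utility for one consumer."""
    return max(product_utilities, key=product_utilities.get)


choices = {consumer: choose(values) for consumer, values in utilities.items()}
for consumer, product in choices.items():
    print(f"{consumer} chooses {product}")
```

The same products receive different utilities from different consumers, which is why the two consumers end up with different choices.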
  • 43. Multinomial Logit Models

In the domain of choice modeling, multinomial logit (MNL) models have become the workhorse of the industry. Here we only want to provide a cursory overview, so that the reader can compare the approach presented in the case study with current practice.

MNL models provide a functional form for describing the relationship between the utilities of alternatives and the probability of choice. For instance, using an MNL model for a choice situation with three vehicle alternatives, Altima, Accord and Camry, the probability of choosing the Altima can be expressed as:

\[ \Pr(\text{Altima}) = \frac{\exp(V_{\text{Altima}})}{\exp(V_{\text{Altima}}) + \exp(V_{\text{Accord}}) + \exp(V_{\text{Camry}})} \]

V_Altima in this case stands for the utility of the Altima alternative. The utilities V_Altima, V_Accord, and V_Camry are a function of the product attributes, e.g.

\[ V_{\text{Altima}} = \beta_1 \times \text{Cost}_{\text{Altima}} + \beta_2 \times \text{FuelEconomy}_{\text{Altima}} + \beta_3 \times \text{SafetyRating}_{\text{Altima}} \]

As we can observe tangible attributes like vehicle cost, fuel economy and safety rating, and we can also observe who bought which vehicle, we can estimate the unknown parameters β1, β2 and β3. Once we have the parameters, we can simulate choices based on new, hypothetical product attributes, such as a better fuel economy for the Altima or a lower price for the Camry.

The parameters of MNL models can be estimated both from “stated preference” (SP) data, i.e. asking consumers about what they would choose, and from “revealed preference” (RP) data, i.e. observing what they have actually chosen.

There are numerous variations and extensions to the class of MNL models, and the reader is referred to Train (2003) and Koppelman (2006) for a comprehensive introduction.

Stated Preference Data

Stated preference data typically comes from experiments, i.e. consumer surveys or product clinics. In this context, conjoint experiments have become a very popular choice elicitation method and a wide range of tools has been developed for this particular approach. In conjoint studies, consumers would typically be given a set of artificially generated product choices along with their attributes, from which preference responses are then elicited. There are many variations of this method that all attempt to address some of the inherent challenges related to dealing with responses to hypothetical questions. The Sawtooth software package has become the de facto industry standard for such conjoint studies.39

39 A wide range of tools is available from Sawtooth Software, Inc., www.sawtoothsoftware.com.

Revealed Preference Data

In contrast to SP data, revealed preference data is purely derived from passive observations. As the name implies, the consumer choice is revealed by their actual behavior rather than by their stated intent in a hypothetical situation. A key benefit is that it is typically easier and more economical to obtain passive observations than to conduct formal experiments. A conceptual limitation of RP data relates to the fact that non-
  • 44. yet-existing products can obviously not be chosen by consumers in the present market environment. Thus, simulating market shares of hypothetical products requires “assembling” them from components and attributes of products that are already available in the market. This inherently limits the exploration of entirely new technologies, which have little in common with the technologies they may replace.

Studies based on RP data have become very popular for researching travel mode choice, as is also documented in a large body of research. In market research related to CPG products or durable goods, using RP data is somewhat less common.

We speculate that one of the reasons for the lack of popularity outside the world of academia is the absence of easy-to-use software packages. Only recently, with the release of Easy Logit Modeling (ELM)40, has specifying and estimating multinomial logit models become practical for a much broader audience. Although ELM has successfully removed the burden of manual coding, the countless iterations of specification and estimation remain a very time-consuming task for the analyst.

40 Easy Logit Modeling is available from ELM-Works, Inc., www.elm-works.com. ELM can estimate models based on both RP and SP data, although we only mention it in the RP context.

NVES Variables

The following variables from the 2009 Strategic Vision NVES were included in this case study:

• UNIQUE IDENTIFIER
• Combined Base Weight
• New Model Purchased - Make/Model/Series (Alpha Order)
• New Model Purchased - Brand
• New Model Purchased - Region Origin
• New Model Segment
• Segmentation 2
• Type Of Transmission
• Number Of Cylinders (VIN)
• Drive Type (VIN)
• Fuel Type
• Gender
• Marital Status
• Age Bracket
  • 45.
• Children Under 6
• Children 6 To 12
• Children 13 To 17
• Total Family Pre-Tax Income
• Ethnic Group
• Location Of Residence
• Customer Region Classification #1
• I Seek Variety in My Life
• I'm Curious and Open to Experiences
• Luxury is Not Important Unless it Has Purpose
• I Enjoy Expressing Myself Creatively
• I See Life as Full of Endless Possibilities
• Driving is one of my favorite things to do
• I really don't enjoy driving
• Whenever I get a chance, I love to go for a drive
• When I drive for fun, I mainly prefer to relax and listen to music or talk
• I want vehicles that provide that open-air driving experience
• I prefer a vehicle that has the capability to outperform others
• I prefer vehicles that provide superior straight ahead power
• I prefer vehicles that provide superior handling and cornering agility
• I prefer a balance of comfort and performance
• I prefer vehicles that provide the softest, most comfortable ride quality
• I just want the basics on my vehicle - no extras
• Value equals balance of costs, comfort & performance
• I prefer vehicles that project a tough and workmanlike image
• Vehicles are a 'tool' or a part of the 'gear' in an active outdoors lifestyle
• I Want to be able to tow heavy loads
  • 46.
• I want to be able to traverse any terrain
• I want the most versatility in my interior
• I want a basic, no frills vehicle that does the job
• My choice of vehicle reflects my personality
• I want a vehicle that says a lot about my success in life / career
• I will switch brand for features or price
• There are lots of different brands of vehicles that I would consider buying
• I prefer sofa-like comfort over a cockpit-like interior
• I want a vehicle that provides the quietest interior
• I want to look good when driving my vehicle
• I want my vehicle to stand out in a crowd
• I would pay significantly more for environmentally friendly vehicle
• Price is most important to me when buying a new vehicle
• Purchase Price (100's)
  • 47. Framework: The Bayesian Network Paradigm41

Acyclic Graphs & Bayes’s Rule

Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such models are known as directed graphical models; within cognitive science and artificial intelligence, such models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose rule for updating probabilities in the light of new evidence is the foundation of the approach.

Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated case of continuous probability distributions. In the discrete case, Bayes’ theorem relates the conditional and marginal probabilities of events A and B, provided that the probability of B does not equal zero:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]

In Bayes’ theorem, each probability has a conventional name:

P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense that it does not take into account any information about B; however, the event B need not occur after event A. In the nineteenth century, the unconditional probability P(A) in Bayes’s rule was called the “antecedent” probability; in deductive logic, the antecedent set of propositions and the inference rule imply consequences. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.

P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.

P(B|A) is the conditional probability of B given A. It is also called the likelihood.

P(B) is the prior or marginal probability of B, and acts as a normalizing constant.

Bayes’ theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

The initial development of Bayesian networks in the late 1970s was motivated by the need to model the top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing earlier, ad hoc rule-based schemes.

41 Adapted from Pearl (2000), used with permission.
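A small numerical example may help to connect the named quantities in Bayes’ theorem. The sketch below uses invented probabilities for two hypothetical events (A: a buyer chooses an all-wheel-drive vehicle; B: the buyer lives in a snowy region); P(B) is obtained from the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) via Bayes' theorem."""
    # P(B) by the law of total probability; it acts as the normalizing constant.
    evidence = (
        likelihood_b_given_a * prior_a
        + likelihood_b_given_not_a * (1.0 - prior_a)
    )
    if evidence == 0:
        raise ValueError("P(B) must not equal zero")
    return likelihood_b_given_a * prior_a / evidence


# Invented numbers for illustration only.
p_awd = 0.3                 # prior P(A)
p_snow_given_awd = 0.6      # likelihood P(B|A)
p_snow_given_not_awd = 0.2  # P(B|not A)

p_awd_given_snow = posterior(p_awd, p_snow_given_awd, p_snow_given_not_awd)
print(f"P(AWD | snowy region) = {p_awd_given_snow:.4f}")
```

Observing B raises the probability of A above its prior here because B is three times as likely under A as under its complement.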
  • 48. The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the gender of a patient, a feature of an object, the occurrence of an event) and the links represent statistical (informational) or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node given its parents in the network. The network supports the computation of the posterior probabilities of any subset of variables given evidence about any other subset.

Compact Representation of the Joint Probability Distribution

“The central paradigm of probabilistic reasoning is to identify all relevant variables x1, . . . , xN in the environment [i.e. the domain under study], and make a probabilistic model p(x1, . . . , xN) of their interaction [i.e. represent the variables’ joint probability distribution].”

Bayesian networks are very attractive for this purpose as they can, by means of factorization, compactly represent the joint probability distribution of all variables.

“Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and subsequently computing probabilities of interest, conditioned on this evidence. The rules of probability, combined with Bayes’ rule make for a complete reasoning system, one which includes traditional deductive logic as a special case.” (Barber, 2012)
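To illustrate factorization and inference on a toy scale, the sketch below defines a hypothetical three-node network (Income → Segment, Income → Drive) with invented conditional probability tables. The joint distribution is the product of each node's table given its parent, and a posterior is computed by setting evidence and summing over the unobserved variable. This brute-force enumeration is only meant to show the principle; it is not how BayesiaLab performs inference.

```python
from itertools import product

# Hypothetical network: Income -> Segment and Income -> Drive (all invented).
p_income = {"high": 0.3, "low": 0.7}
p_segment_given_income = {
    "high": {"premium": 0.6, "mainstream": 0.4},
    "low": {"premium": 0.1, "mainstream": 0.9},
}
p_drive_given_income = {
    "high": {"AWD": 0.5, "RWD": 0.5},
    "low": {"AWD": 0.2, "RWD": 0.8},
}
SEGMENTS = ("premium", "mainstream")
DRIVES = ("AWD", "RWD")


def joint(income, segment, drive):
    """Factorized joint: p(income) * p(segment | income) * p(drive | income)."""
    return (
        p_income[income]
        * p_segment_given_income[income][segment]
        * p_drive_given_income[income][drive]
    )


def posterior_income(segment):
    """P(Income | Segment=segment), summing the joint over the unobserved Drive."""
    unnormalized = {
        income: sum(joint(income, segment, drive) for drive in DRIVES)
        for income in p_income
    }
    total = sum(unnormalized.values())
    return {income: value / total for income, value in unnormalized.items()}


# The factorized joint must sum to one over all eight configurations.
total_mass = sum(joint(i, s, d) for i, s, d in product(p_income, SEGMENTS, DRIVES))
post = posterior_income("premium")
print(f"total probability mass: {total_mass:.3f}")
print(f"P(Income=high | Segment=premium) = {post['high']:.2f}")
```

With three binary variables the full joint table would need seven free parameters, whereas this factorization needs five; the savings grow rapidly as the number of variables increases.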
  • 49. References

Barber, David. “Bayesian Reasoning and Machine Learning.” http://www.cs.ucl.ac.uk/staff/d.barber/brml.
Barber, David. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2011.
Darwiche, Adnan. “Bayesian networks.” Communications of the ACM 53, no. 12 (2010): 80.
Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.
Koppelman, Frank, and Chandra Bhat. “A Self Instructing Course in Mode Choice Modeling: Multinomial and Nested Logit Models.” January 31, 2006.
Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. J. Jones, and M. A. Marra. “Circos: An information aesthetic for comparative genomics.” Genome Research 19, no. 9 (2009): 1639-1645.
Neapolitan, Richard E., and Xia Jiang. Probabilistic Methods for Financial and Marketing Informatics. 1st ed. Morgan Kaufmann, 2007.
Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.
Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. 2nd ed. The MIT Press, 2001.
Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. 1st ed. The MIT Press, 1985.
Train, Kenneth E. Discrete Choice Methods with Simulation. Cambridge University Press, 2003.
  • 50. Contact Information

Bayesia USA
312 Hamlet’s End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
info@bayesia.us
www.bayesia.us

Bayesia Singapore Pte. Ltd.
20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 2690
info@bayesia.sg
www.bayesia.sg

Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com

Copyright

© 2013 Bayesia S.A.S., Bayesia USA and Bayesia Singapore. All rights reserved.