# Understand Customer Decision Making: Discrete Choice Models with RStan

## Understand Customer Decision Making: Discrete Choice Models with RStan

In our introductory article, we explained how discrete choice models can generate insights into customers' decision-making behaviour. To illustrate this with an example, we reproduced some of the analyses from Elshiewy et al. (2017). The main results of the modelling using multinomial logistic regression (abbreviated: MNL modelling) were presented in our first article, without going into technical or methodological details. In this article we show how to estimate an MNL model using RStan, the R interface of the statistical software Stan. Stan is a programming language for statistical inference and is written in C++. It is used to specify complex (Bayesian) statistical models. Although there are packages such as rstanarm for regularly used models, one advantage of Stan is its high flexibility in specifying customized models. In particular, Stan allows the use of expert knowledge from marketing professionals that is not included in standard software packages. Since the results obtained with MNL models are often too undercomplex for practical use, in a second step we present so-called random parameter/mixed MNL models. These models are state of the art in marketing research and allow to account for consumer heterogeneity. In particular, different sensitivities to advertising or brand loyalty can be measured, allowing marketing managers to target consumers individually instead of using uniform strategies. We show the usefulness of mixed MNL modelling estimated in Stan compared to simple MNL modelling and present the Stan code of the models.

## Data Basis

The data set used comes from Jain et al. (1994) and was analysed in Elshiewy et al. (2017). It contains revealed preference data for 136 consumers whose decision for one of four cracker brands (Keebler, Nabisco, Sunshine, own brand) was observed at different time points (between 14 and 77). For each consumer n (individual), each brand j (alternative) and each choice situation t, the dataset contains three cracker-specific marketing mix variables (prices of the crackers and information about promotional activities, i.e. in-store and out-of-store advertising). This information is characterised by the fact that it is available both for the cracker brand chosen by the buyer and for the available alternatives. In addition to this alternative-specific information, discrete choice models can also deal with individual-specific variables such as age or gender, which are not available here.

## Data Preparation and Analysis

We install and load the following R packages:

```
library(dplyr)
library(mlogit)
library(rstan)
```

We get access to the data (which is included in the `mlogit`

package) by the following commands:

```
data(Cracker)
cracker <- Cracker
```

The data is available in so-called wide format where each row represents one purchase decision. In wide format there are as many columns as alternatives for each alternative-specific variable. Alternative-specific variables have the same prefix (the name of the marketing mix variable), followed by a separator (for example, a full stop) and an identifier for the alternative. The first marketing mix variable collected is the price of the crackers (`price`

). We normalise the price of a cracker with respect to its weight and convert to $ / oz. Further marketing mix variables are `disp`

(in-store advertising) and `feat`

(out-of-store advertising). As in Elshiewy et al. (2017), we calculate a fourth variable `lastChoice`

, which measures some type of brand loyalty. For this purpose, we use the information about which brand was chosen in the previous purchase. We delete the first purchase decision of each person, as the value of the variable `lastChoice`

is unknown for them. Please note that the id already present in the data set is the `personId n`

, which provides information about which person made the purchase decision. We rename the id to `personId`

and create a `choiceId`

and a `personChoiceId`

.

```
cracker <- cracker %>%
rename(personId = id) %>%
group_by(personId) %>%
mutate(price.keebler = price.keebler / 100,
price.sunshine = price.sunshine / 100,
price.nabisco = price.nabisco / 100,
price.private = price.private / 100,
lastChoice = lag(choice)) %>%
filter(!is.na(lastChoice)) %>%
mutate(personChoiceId = row_number()) %>%
ungroup() %>%
mutate(choiceId = row_number()) %>%
as.data.frame()
```

## Estimating a Multinomial Logit Model with RStan

In the following, we demonstrate how to estimate a multinomial logit model using RStan. The MNL model is a standard procedure in marketing research for analysing purchase decisions. The model we estimate is characterised by generic coefficients (i.e. one coefficient per variable) for the marketing variables. MNL models of this type are also called conditional logit models. Generic coefficients are useful whenever variables are examined that produce the same benefit to buyers regardless of the alternative. For example, a discount of one euro is worth exactly one euro regardless of the alternative brands.

A Stan model is specified in a .stan file. The file consists of at least the three blocks `data`

, `parameters`

and `model`

, in which scalars, integers, vectors, matrices and arrays for data are declared and the model is specified. Detailed information can be found here. As an alternative to a .stan file, you can also specify the blocks in a character vector within an R script, which we do below.

An MNL model with intercepts (here referred to as vector alpha) and generic parameters for alternative-specific variables (beta) can be specified as follows:

```
m1 <- "
data {
int<lower=2> nAlternatives; // number of alternatives/outcomes
int<lower=1> nChoices; // number of decisions
int<lower=1> nVariables; // number of covariates
int<lower=0,upper=nAlternatives> Y[nChoices];
matrix[nAlternatives,nVariables] X[nChoices];
}
parameters {
vector[nAlternatives-1] alpha_raw;
vector[nVariables] beta;
}
transformed parameters {
vector[nAlternatives] alpha;
alpha = append_row(0, alpha_raw);
}
model {
for (i in 1:nChoices)
Y[i] ~ categorical_logit(alpha + X[i]*beta);
}"
```

We then pass a list of data with the objects `nAlternatives`

, `nChoices`

, `nVariables`

, `X`

and `Y`

specified in the data block to the `stan()`

function from the `rstan`

package. In combination with the code in `m1`

, this will estimate the model parameters.

In our case, `nAlternatives`

= 4, `nChoices`

= 3156 and `nVariables`

= 4. `X`

is a list of length `nChoices`

containing in each entry a 4x4 data matrix filled with the values of the marketing variables for each alternative. `Y`

represents the brand (represented by a number between 1 and 4) that was chosen for each purchase situation. To put the data into the correct format, we use the function `mlogit.data()`

, which switches the data from wide to long format. Detailed information on how to use `mlogit.data()`

can be found here.

```
cracker_mlogit <- mlogit.data(cracker, id = "choiceId", choice = "choice", varying= c(2:13), shape = "wide", sep = ".")
cracker_mlogit$lastChoice <- as.integer(cracker_mlogit$lastChoice == cracker_mlogit$alt)
cracker_long <- cracker_mlogit %>% as.data.frame()
disp <- split(cracker_long$disp, ceiling(seq_along(cracker_long$disp)/4))
feat <- split(cracker_long$feat, ceiling(seq_along(cracker_long$feat)/4))
price <- split(cracker_long$price, ceiling(seq_along(cracker_long$price)/4))
lastChoice <- split(cracker_long$lastChoice, ceiling(seq_along(cracker_long$lastChoice)/4))
X <- mapply(cbind, disp, feat, price, lastChoice, SIMPLIFY = FALSE)
Y <- as.numeric(factor(cracker$choice, levels = c("keebler", "nabisco", "private", "sunshine")))
stanData1 <- list(nChoices = 3156, nAlternatives = 4, nVariables = 4, Y = Y, X = X)
```

We now estimate the model using R-Stan as follows:

```
p1 <- stan(model_code = m1, data= stanData1, init_r = 0.1,
iter= 400, warmup = 200, chains = 4,
control = list(max_treedepth = 10, adapt_delta = 0.8))
```

The coefficients and other important information are stored in `p1`

.

Please note that there are a number of recommended steps and measures for evaluating the goodness of fit of the estimate that are not discussed in detail here. It should be noted, however, that one should always test the specified probability model on simulated data (and parameters) before using it for parameter estimation of real data. In this article, we were able to benchmark the model against the model discussed in the article by Elshiewy et al. (2017). Our parameter estimates are consistent with those in the article.

The results of our estimates are shown in the table below. They have already been discussed in our previous article and are therefore only briefly summarized in the conclusion.

### MNL model with RStan

Mean estimate | Mean std. error | |
---|---|---|

Nabisco (Intercept) | 1.14 | 0.01 |

Private (Intercept) | -0.59 | 0.01 |

Sunshine (Intercept) | -0.64 | 0.01 |

Price | -3.60 | 0.01 |

Feature | 0.73 | 0.01 |

Display | 0.17 | 0.00 |

Lastchoice | 2.06 | 0.00 |

## Estimation of a Mixed Logit Model with RStan

Since the MNL model has a number of limitations, the coefficients may be biased. One limitation is that the model is not able to account for heterogeneity between individuals. Instead of assuming that parameters vary from person to person, they make the simplified assumption that there is only one common parameter for each variable. Models that can account for heterogeneity in the coefficients, on the other hand, are called random coefficient logit, hierarchical Bayesian or mixed logit models, depending on the discipline.

In the following, we estimate a mixed logit model for which we assume that the distribution of the coefficients across individuals (also known as "individual part-worths" in marketing) is multivariate-normally distributed. For the representation of the coefficient matrix, we use the Cholesky decomposition for performance reasons.

```
m2 <- "
data {
int N; // number of rows
int nDecisions; // number of person-choice sets
int nPersons; // number of persons
int nVariables; // number of covariates that vary by choice
int nAlternatives; // number of alternatives
vector[N] Y; // binary indicator for choice
matrix[N, nVariables] X; // choice attributes
int decision_person[nDecisions]; // index for person
int start[nDecisions]; // the starting observation for each decision
int end[nDecisions]; // the ending observation for each decision
}
parameters {
vector[nVariables] beta; // hypermeans of the part-worths
vector[nVariables] tau; // diagonal of the part-worth covariance matrix
matrix[nPersons, nVariables] z; // individual random effects (unscaled)
cholesky_factor_corr[nVariables] L_Omega;
}
transformed parameters {
matrix[nPersons, nVariables] beta_individual = rep_matrix(beta', nPersons) + z*diag_pre_multiply(tau, L_Omega);
}
model {
vector[N] log_prob;
// priors on the parameters
to_vector(z) ~ normal(0, 1);
tau ~ normal(0, 0.5);
beta ~ normal(0, 0.5);
L_Omega ~ lkj_corr_cholesky(4);
for(t in 1:nDecisions) {
vector[nAlternatives] utilities;
utilities = X[start[t]:end[t]]*beta_individual[decision_person[t]]';
log_prob[start[t]:end[t]] = log(softmax(utilities));
}
target += log_prob' * Y;
}
"
```

Again, we have to pass a list of objects to the `stan()`

function. The objects are `N`

, `nAlternatives`

, `nDecisions`

, `nPersons`

, `nVariables`

, `Y`

and `X`

and indices (`decision_person`

, `start`

and `end`

). In `Y`

, the purchase decision is coded in long format as a dummy variable. We add constants for three of the four alternatives. The data can be generated in R as follows:

```
X <- cracker_long[c("lastChoice", "disp", "feat", "price")]
X$Keebler <- rep(c(1,0,0,0), n = 3156)
X$Nabisco <- rep(c(0,1,0,0), n = 3156)
X$Sunshine <- rep(c(0,0,0,1), n = 3156)
Y <- cracker_long[c("choice")] %>% unlist() %>% as.numeric()
indexes <- data.frame(person = cracker_long$personId,
decision = cracker_long$choiceId,
row = seq(1:nrow(cracker_long))) %>%
group_by(decision) %>%
summarize(decision_person = first(person),
start = first(row),
end = last(row))
stanData2 <- list(N = 12624, nChoices = 3156, nPersons = 136, nVariables = 7, nAlternatives = 4, Y = Y, X = X, decision_person = indexes$decision_person, start = indexes$start, end = indexes$end)
```

We then estimate the model using:

```
p2 <- stan(model_code = m2, data= stanData2, init_r = 0.1,
iter= 1000, warmup = 500, chains = 4,
control = list(max_treedepth = 10, adapt_delta = 0.8))
```

Please note that increased computer capacity is needed for this modeling. The coefficients and other important information are stored in `p2`

.

Unlike the MNL model, a mixed logit model does not provide a (generic) coefficient per variable, but coefficients per variable and person, which can then be interpreted either at the person level, or aggregated across all persons (as mean sensitivities). The following table presents the mean coefficients (i.e., the aggregates over all persons).

Mean estimate | Mean std. error | SD estimate | |
---|---|---|---|

Keebler (Intercept) | -0.21 | 0.01 | 0.24 |

Nabisco (Intercept) | 2.10 | 0.01 | 0.23 |

Sunshine (Intercept) | -0.53 | 0.01 | 0.22 |

Price | -2.06 | 0.01 | 0.36 |

Feature | 0.73 | 0.01 | 0.19 |

Display | 0.28 | 0.00 | 0.14 |

Lastchoice | 0.75 | 0.00 | 0.12 |

## Conclusion

Discrete choice models are of importance for marketing analysis, as they provide valuable insights into customer buying behavior. For estimating the discrete choice models in this article, we used RStan, the R interface of the statistical programming language Stan. In the first step, we showed how to estimate an MNL model with RStan. An MNL model allows to generate a number of interesting insights. For example, a negative sign for the `price`

variable means that falling prices are associated with higher utility. Decreasing prices thus increase the purchasing probability. Furthermore, positive signs for the variables `feature`

and `display`

indicate that both advertising strategies are useful and increase the purchase probability on average. In addition, the model shows that customers are brand-loyal on average (positive sign for `LastChoice`

).

Please note that with an MNL model only statements about average sensitivities can be made and therefore only uniform advertising strategies can be developed. However, this is not sufficient, as it is quite likely that advertising strategies have positive effects on average, but some individual customers are negatively influenced by additional advertising.Therefore, it has become state-of-the-art to use models that allow customer-specific analyses and take into account the heterogeneity between customers. For this purpose, we presented the RStan code of a mixed logit model. A mixed logit model estimates coefficients per variable and customer. In Table 2 we presented the average coefficients of a mixed logit model over all customers, which makes them comparable to those of the MNL model. Here it is noticeable that there are differences between the (average) parameters of the mixed logit model and those of the MNL model. In particular, (mean) brand loyalty has decreased in the mixed logit model, as the coefficient is significantly smaller. This phenomenon was also found in the article by Elshiewy et al. (2017) (with slightly different values). It is likely that the coefficients of the MNL model are biased and thus brand loyalty is falsely overestimated in the MNL model.

In addition to the mean sensitivities, individual sensitivities can be analyzed, which allows to make statements about individual customers. We found that although price sensitivity is negative on average, there are some customers with positive price sensitivity. In addition, we found that there are some customers who do not respond positively to the advertising. For these customers, different advertising strategies should be developed by marketing managers. Overall, a mixed logit model can provide deeper and more correct insights than an MNL model. In particular, individual decisions can be better understood and consumers can thus be addressed individually. Although there are some packages for mixed logit models available in R, more complex adjustments based on expert knowledge (complex distribution assumptions, parameter bounds) cannot always be made directly. This is the strength of RStan, which allows very flexible model specifications. Model-based statements and forecasts can thus be improved.

### Literature

- Jain, D. C., Vilcassim, N. J., and Chintagunta, P. K. (1994). A random-coefficients logit brand-choice model applied to panel data. Journal of Business & Economic Statistics, 12(3), 317–328.
- Elshiewy, O., Guhl, D. and Boztug, Y. (2017): Multinomial Logit Models in marketing - From Fundamentals to State-of-the-Art. Marketing ZFP - Journal of Research and Marketing, 39(3), 32–55.