```
import arviz as az
import numpy as np
import pandas as pd
import warnings
import bambi as bmb
="ignore", category=FutureWarning) warnings.simplefilter(action
```

# Plot Comparisons

`comparisons`

and `plot_comparisons`

are a part of Bambi’s sub-package `plots`

that feature a set of functions used to interpret complex regression models. This sub-package is inspired by the R package marginaleffects. These two functions allow the modeler to **compare** the predictions made by a model for different contrast and covariate values. Below, it is described why comparing predictions is useful in interpreting generalized linear models (GLMs), how this methodology is implemented in Bambi, and how to use `comparisons`

and `plot_comparisons`

. It is assumed that the reader is familiar with the basics of GLMs. If not, refer to the Bambi Basic Building Blocks example.

Due to the link function in a GLM, there are typically three quantities of interest to interpret:

- the linear predictor \(\eta = X\beta\) where \(X\) is an \(n\) x \(p\) matrix of explanatory variables.
- the mean \(\mu = g^{-1}(\eta)\) where the link function \(g(\cdot)\) relates the linear predictor to the mean of the outcome variable \(\mu = g^{-1}(\eta) = g^{-1}(X\beta)\)
- the response variable \(Y \sim \mathcal{D}(\mu, \theta)\) where \(\mu\) is the mean parameter and \(\theta\) is (possibly) a vector that contains all the other “auxillary” parameters of the distribution.

Often, with GLMs, \(\eta\) is linear in the parameters, but nonlinear in relation of inputs to the outcome \(Y\) due to the link function \(g\). Thus, as modelers, we are usually more interested in interpreting (2) and (3). For example, in logistic regression, the linear predictor is on the log-odds scale, but the quantity of interest is on the probability scale. In Poisson regression, the linear predictor is on the log-scale, but the response variable is on the count scale. Referring back to logistic regression, a specified difference in one of the \(x\) variables does *not* correspond to a constant difference in the the probability of the outcome.

It is often helpful with GLMs, for the modeler and audience, to have a summary that gives the expected difference in the outcome corresponding to a unit difference in each of the input variables. Thus, the goal of `comparisons`

and `plot_comparisons`

is to provide the modeler with a summary and visualization of the average predicted difference.

## Average Predictive Differences

Here, we adopt the notation from Chapter 14.4 of Regression and Other Stories to describe average predictive differences. Assume we have fit a Bambi model predicting an outcome \(Y\) based on inputs \(X\) and parameters \(\theta\). Consider the following scalar inputs:

\[w: \text{the input of interest}\] \[c: \text{all the other inputs}\] \[X = (w, c)\]

Suppose for the input of interest, we are interested in comparing \(w^{\text{high}}\) to \(w^{\text{low}}\) (perhaps age = \(60\) and \(40\) respectively) with all other inputs \(c\) held constant. The *predictive difference* in the outcome changing **only** \(w\) is:

\[\text{average predictive difference} = \mathbb{E}(y|w^{\text{high}}, c, \theta) - \mathbb{E}(y|w^{\text{low}}, c, \theta)\]

Selecting the maximum and minimum values of \(w\) and averaging over all other inputs \(c\) in the data gives you a new “hypothetical” dataset and corresponds to counting all pairs of transitions of \((w^\text{low})\) to \((w^\text{high})\), i.e., differences in \(w\) with \(c\) held constant. The difference between these two terms is the average predictive difference.

### Computing Average Predictive Differences

The objective of `comparisons`

and `plot_comparisons`

is to compute the expected difference in the outcome corresponding to three different scenarios for \(w\) and \(c\) where \(w\) is either provided by the user, else a default value is computed by Bambi (described in the default values section). The three scenarios are:

- user provided values for \(c\).
- a grid of equally spaced and central values for \(c\).
- empirical distribution (original data used to fit the model) for \(c\).

In the case of (1) and (2) above, Bambi assembles all pairwise combinations (transitions) of \(w\) and \(c\) into a new “hypothetical” dataset. In (3), Bambi uses the original \(c\), but replaces \(w\) with the user provided value or the default value computed by Bambi. In each scenario, predictions are made on the data using the fitted model. Once the predictions are made, comparisons are computed using the posterior samples by taking the difference in the predicted outcome for each pair of transitions. The average of these differences is the average predictive difference.

Thus, the goal of `comparisons`

and `plot_comparisons`

is to provide the modeler with a summary and visualization of the average predictive difference. Below, we demonstrate how to compute and plot average predictive differences with `comparisons`

and `plot_comparions`

using several examples.

## Zero Inflated Poisson

We model and predict how many fish are caught by visitors at a state park using survey data. Many visitors catch zero fish, either because they did not fish at all, or because they were unlucky. We would like to explicitly model this bimodal behavior (zero versus non-zero) using a Zero Inflated Poisson model, and to compare how different inputs of interest \(w\) and other covariate values \(c\) are associated with the number of fish caught. The dataset contains data on 250 groups that went to a state park to fish. Each group was questioned about how many fish they caught (`count`

), how many children were in the group (`child`

), how many people were in the group (`persons`

), if they used a live bait and whether or not they brought a camper to the park (`camper`

).

```
= pd.read_csv("https://stats.idre.ucla.edu/stat/data/fish.csv")
fish_data = ["count", "livebait", "camper", "persons", "child"]
cols = fish_data[cols]
fish_data "child"] = fish_data["child"].astype(int)
fish_data["livebait"] = pd.Categorical(fish_data["livebait"])
fish_data["camper"] = pd.Categorical(fish_data["camper"]) fish_data[
```

```
= bmb.Model(
fish_model "count ~ livebait + camper + persons + child",
fish_data, ="zero_inflated_poisson"
family
)
= fish_model.fit(
fish_idata =1000,
draws=0.95,
target_accept=1234,
random_seed=4
chains )
```

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [psi, Intercept, livebait, camper, persons, child]
```

`Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 45 seconds.`

### User Provided Values

First, an example of scenario 1 (user provided values) is given below. In both `plot_comparisons`

and `comparisons`

, \(w\) and \(c\) are represented by `contrast`

and `conditional`

, respectively. The modeler has the ability to pass their own values for `contrast`

and `conditional`

by using a dictionary where the key-value pairs are the covariate and value(s) of interest. For example, if we wanted to compare the number of fish caught for \(4\) versus \(1\) `persons`

conditional on a range of `child`

and `livebait`

values, we would pass the following dictionary in the code block below. By default, for \(w\), Bambi compares \(w^\text{high}\) to \(w^\text{low}\). Thus, in this example, \(w^\text{high}\) = 4 and \(w^\text{low}\) = 1. The user is not limited to passing a list for the values. A `np.array`

can also be used. Furthermore, Bambi by default, maps the order of the dict keys to the main, group, and panel of the matplotlib figure. Below, since `child`

is the first key, this is used for the x-axis, and `livebait`

is used for the group (color). If a third key was passed, it would be used for the panel (facet).

```
= bmb.interpret.plot_comparisons(
fig, ax =fish_model,
model=fish_idata,
idata={"persons": [1, 4]},
contrast={"child": [0, 1, 2], "livebait": [0, 1]},
conditional
) 7, 3) fig.set_size_inches(
```

`Default computed for unspecified variable: camper`

The plot above shows that, comparing \(4\) to \(1\) persons given \(0\) children and using livebait, the expected difference is about \(26\) fish. When not using livebait, the expected difference decreases substantially to about \(5\) fish. Using livebait with a group of people is associated with a much larger expected difference in the number of fish caught.

`comparisons`

can be called to view a summary dataframe that includes the term \(w\) and its contrast, the specified `conditional`

covariate, and the expected difference in the outcome with the uncertainty interval (by default the 94% highest density interval is computed).

```
bmb.interpret.comparisons(=fish_model,
model=fish_idata,
idata={"persons": [1, 4]},
contrast={"child": [0, 1, 2], "livebait": [0, 1]},
conditional )
```

`Default computed for unspecified variable: camper`

term | estimate_type | value | child | livebait | camper | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|---|---|---|

0 | persons | diff | (1, 4) | 0 | 0 | 1 | 4.834472 | 2.563472 | 7.037150 |

1 | persons | diff | (1, 4) | 0 | 1 | 1 | 26.423188 | 23.739729 | 29.072748 |

2 | persons | diff | (1, 4) | 1 | 0 | 1 | 1.202003 | 0.631629 | 1.780965 |

3 | persons | diff | (1, 4) | 1 | 1 | 1 | 6.571943 | 5.469275 | 7.642248 |

4 | persons | diff | (1, 4) | 2 | 0 | 1 | 0.301384 | 0.143676 | 0.467608 |

5 | persons | diff | (1, 4) | 2 | 1 | 1 | 1.648417 | 1.140415 | 2.187190 |

But why is `camper`

also in the summary dataframe? This is because in order to peform predictions, Bambi is expecting a value for each covariate used to fit the model. Additionally, with GLM models, average predictive comparisons are conditional in the sense that the estimate depends on the values of all the covariates in the model. Thus, for unspecified covariates, `comparisons`

and `plot_comparisons`

computes a default value (mean or mode based on the data type of the covariate). Thus, \(c\) = `child`

, `livebait`

, `camper`

. Each row in the summary dataframe is read as “comparing \(4\) to \(1\) persons conditional on \(c\), the expected difference in the outcome is \(y\).”

### Multiple contrast values

Users can also perform comparisons on multiple contrast values. For example, if we wanted to compare the number of fish caught between \((1, 2)\), \((1, 4)\), and \((2, 4)\) `persons`

conditional on a range of values for `child`

and `livebait`

.

```
= bmb.interpret.comparisons(
multiple_values =fish_model,
model=fish_idata,
idata={"persons": [1, 2, 4]},
contrast={"child": [0, 1, 2], "livebait": [0, 1]}
conditional
)
multiple_values
```

`Default computed for unspecified variable: camper`

term | estimate_type | value | child | livebait | camper | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|---|---|---|

0 | persons | diff | (1, 2) | 0 | 0 | 1 | 0.527627 | 0.295451 | 0.775465 |

1 | persons | diff | (1, 2) | 0 | 1 | 1 | 2.883694 | 2.605690 | 3.177685 |

2 | persons | diff | (1, 2) | 1 | 0 | 1 | 0.131319 | 0.067339 | 0.195132 |

3 | persons | diff | (1, 2) | 1 | 1 | 1 | 0.717965 | 0.592968 | 0.857893 |

4 | persons | diff | (1, 2) | 2 | 0 | 1 | 0.032960 | 0.015212 | 0.052075 |

5 | persons | diff | (1, 2) | 2 | 1 | 1 | 0.180270 | 0.123173 | 0.244695 |

6 | persons | diff | (1, 4) | 0 | 0 | 1 | 4.834472 | 2.563472 | 7.037150 |

7 | persons | diff | (1, 4) | 0 | 1 | 1 | 26.423188 | 23.739729 | 29.072748 |

8 | persons | diff | (1, 4) | 1 | 0 | 1 | 1.202003 | 0.631629 | 1.780965 |

9 | persons | diff | (1, 4) | 1 | 1 | 1 | 6.571943 | 5.469275 | 7.642248 |

10 | persons | diff | (1, 4) | 2 | 0 | 1 | 0.301384 | 0.143676 | 0.467608 |

11 | persons | diff | (1, 4) | 2 | 1 | 1 | 1.648417 | 1.140415 | 2.187190 |

12 | persons | diff | (2, 4) | 0 | 0 | 1 | 4.306845 | 2.267097 | 6.280005 |

13 | persons | diff | (2, 4) | 0 | 1 | 1 | 23.539494 | 20.990931 | 26.240169 |

14 | persons | diff | (2, 4) | 1 | 0 | 1 | 1.070683 | 0.565931 | 1.585718 |

15 | persons | diff | (2, 4) | 1 | 1 | 1 | 5.853978 | 4.858957 | 6.848519 |

16 | persons | diff | (2, 4) | 2 | 0 | 1 | 0.268423 | 0.124033 | 0.412274 |

17 | persons | diff | (2, 4) | 2 | 1 | 1 | 1.468147 | 1.024800 | 1.960934 |

Notice how the contrast \(w\) varies while the covariates \(c\) are held constant. Currently, however, plotting multiple contrast values can be difficult to interpret since the contrast is “abstracted” away onto the y-axis. Thus, it would be difficult to interpret which portion of the plot corresponds to which contrast value. Therefore, it is currently recommended that if you want to plot multiple contrast values, call `comparisons`

directly to obtain the summary dataframe and plot the results yourself.

### Default contrast and conditional values

Now, we move onto scenario 2 described above (grid of equally spaced and central values) in computing average predictive comparisons. You are not required to pass values for `contrast`

and `conditional`

. If you do not pass values, Bambi will compute default values for you. Below, it is described how these default values are computed.

The default value for `contrast`

is a *centered difference* at the mean for a contrast variable with a numeric dtype, and *unique levels* for a contrast varaible with a categorical dtype. For example, if the modeler is interested in the comparison of a \(5\) unit increase in \(w\) where \(w\) is a numeric variable, Bambi computes the mean and then subtracts and adds \(2.5\) units to the mean to obtain a *centered difference*. By default, if no value is passed for the contrast covariate, Bambi computes a one unit centered difference at the mean. For example, if only `contrast="persons"`

is passed, then \(\pm\) \(0.5\) is applied to the mean of persons. If \(w\) is a categorical variable, Bambi computes and returns the unique levels. For example, if \(w\) has levels [“high scool”, “vocational”, “university”], Bambi computes and returns the unique values of this variable.

The default values for `conditional`

are more involved. Currently, by default, if a dict or list is passed to `conditional`

, Bambi uses the ordering (keys if dict and elements if list) to determine which covariate to use as the main, group (color), and panel (facet) variable. This is the same logic used in `plot_comparisons`

described above. Subsequently, the default values used for the `conditional`

covariates depend on their ordering **and** dtype. Below, the psuedocode used for computing default values covariates passed to `conditional`

is outlined:

```
if v == "main":
if v == numeric:
return np.linspace(v.min(), v.max(), 50)
elif v == categorical:
return np.unique(v)
elif v == "group":
if v == numeric:
return np.quantile(v, np.linspace(0, 1, 5))
elif v == categorical:
return np.unique(v)
elif v == "panel":
if v == numeric:
return np.quantile(v, np.linspace(0, 1, 5))
elif v == categorical:
return np.unique(v)
```

Thus, letting Bambi compute default values for `conditional`

is equivalent to creating a hypothetical “data grid” of new values. Lets say we are interested in comparing the number of fish caught for the contrast `livebait`

conditional on `persons`

and `child`

. This time, lets call `comparisons`

first to gain an understanding of the data generating the plot.

```
= bmb.interpret.comparisons(
contrast_df =fish_model,
model=fish_idata,
idata="livebait",
contrast=["persons", "child"],
conditional
)
10) contrast_df.head(
```

```
Default computed for contrast variable: livebait
Default computed for conditional variable: persons, child
Default computed for unspecified variable: camper
```

term | estimate_type | value | persons | child | camper | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|---|---|---|

0 | livebait | diff | (0, 1) | 1 | 0 | 1 | 1.694646 | 1.252803 | 2.081207 |

1 | livebait | diff | (0, 1) | 1 | 1 | 1 | 0.422448 | 0.299052 | 0.551766 |

2 | livebait | diff | (0, 1) | 1 | 2 | 1 | 0.106202 | 0.063174 | 0.152961 |

3 | livebait | diff | (0, 1) | 1 | 3 | 1 | 0.026923 | 0.012752 | 0.043035 |

4 | livebait | diff | (0, 1) | 2 | 0 | 1 | 4.050713 | 3.299138 | 4.782635 |

5 | livebait | diff | (0, 1) | 2 | 1 | 1 | 1.009094 | 0.755449 | 1.249551 |

6 | livebait | diff | (0, 1) | 2 | 2 | 1 | 0.253511 | 0.163468 | 0.355214 |

7 | livebait | diff | (0, 1) | 2 | 3 | 1 | 0.064224 | 0.032733 | 0.100539 |

8 | livebait | diff | (0, 1) | 3 | 0 | 1 | 9.701813 | 8.391122 | 11.084168 |

9 | livebait | diff | (0, 1) | 3 | 1 | 1 | 2.415233 | 1.922802 | 2.927737 |

Before we talk about the summary output, you should have noticed that messages are being logged to the console. By default `interpret`

is *verbose* and logs a message to the console if a default value is computed for covariates in `conditional`

and `contrast`

. This is useful because unless the documentation is read, it can be difficult to tell which covariates are having default values computed for. Thus, Bambi has a config file `bmb.config["INTERPRET_VERBOSE"]`

where we can specify whether or not to log messages. By default, this is set to true. To turn off logging, set `bmb.config["INTERPRET_VERBOSE"] = False`

. From here on, we will turn off logging.

As `livebait`

was encoded as a categorical dtype, Bambi returned the unique levels of \([0, 1]\) for the contrast. `persons`

and `child`

were passed as the first and second element and thus act as the main and group variables, respectively. It can be see from the output above, that an equally spaced grid was used to compute the values for `persons`

, whereas a quantile based grid was used for `child`

. Furthermore, as `camper`

was unspecified, the mode was used as the default value. Lets go ahead and plot the commparisons.

`"INTERPRET_VERBOSE"] = False bmb.config[`

```
= bmb.interpret.plot_comparisons(
fig, ax =fish_model,
model=fish_idata,
idata="livebait",
contrast=["persons", "child"],
conditional
) 7, 3) fig.set_size_inches(
```

The plot shows us that the expected differences in fish caught comparing a group of people who use livebait and no livebait is not only conditional on the number of persons, but also children. However, the plotted comparisons for `child`

= \(3\) is difficult to interpret on a single plot. Thus, it can be useful to pass specific `group`

and `panel`

arguments to aid in the interpretation of the plot. Therefore, `subplot_kwargs`

allows the user to manipulate the plotting by passing a dictionary where the keys are `{"main": ..., "group": ..., "panel": ...}`

and the values are the names of the covariates to be plotted. Below, we plot the same comparisons as above, but this time we specify `group`

and `panel`

to both be `child`

.

```
= bmb.interpret.plot_comparisons(
fig, ax =fish_model,
model=fish_idata,
idata="livebait",
contrast=["persons", "child"],
conditional={"main": "persons", "group": "child", "panel": "child"},
subplot_kwargs={"figsize":(12, 3), "sharey": True},
fig_kwargs=False
legend )
```

### Unit level contrasts

Evaluating average predictive comparisons at central values for the conditional covariates \(c\) can be problematic when the inputs have a large variance since no single central value (mean, median, etc.) is representative of the covariate. This is especially true when \(c\) exhibits bi or multimodality. Thus, it may be desireable to use the empirical distribution of \(c\) to compute the predictive comparisons, and then average over a specific or set of covariates to obtain the average predictive comparisons. To achieve unit level contrasts, do not pass a parameter into `conditional`

and or specify `None`

.

```
= bmb.interpret.comparisons(
unit_level =fish_model,
model=fish_idata,
idata="livebait",
contrast=None,
conditional
)
# empirical distribution
print(unit_level.shape[0] == fish_model.data.shape[0])
10) unit_level.head(
```

`True`

term | estimate_type | value | camper | child | persons | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|---|---|---|

0 | livebait | diff | (0, 1) | 0 | 0 | 1 | 0.864408 | 0.627063 | 1.116105 |

1 | livebait | diff | (0, 1) | 1 | 0 | 1 | 1.694646 | 1.252803 | 2.081207 |

2 | livebait | diff | (0, 1) | 0 | 0 | 1 | 0.864408 | 0.627063 | 1.116105 |

3 | livebait | diff | (0, 1) | 1 | 1 | 2 | 1.009094 | 0.755449 | 1.249551 |

4 | livebait | diff | (0, 1) | 0 | 0 | 1 | 0.864408 | 0.627063 | 1.116105 |

5 | livebait | diff | (0, 1) | 1 | 2 | 4 | 1.453235 | 0.964674 | 1.956434 |

6 | livebait | diff | (0, 1) | 0 | 1 | 3 | 1.233247 | 0.900295 | 1.569891 |

7 | livebait | diff | (0, 1) | 0 | 3 | 4 | 0.188019 | 0.090328 | 0.289560 |

8 | livebait | diff | (0, 1) | 1 | 2 | 3 | 0.606361 | 0.390571 | 0.818549 |

9 | livebait | diff | (0, 1) | 1 | 0 | 1 | 1.694646 | 1.252803 | 2.081207 |

```
# empirical (observed) data used to fit the model
10) fish_model.data.head(
```

count | livebait | camper | persons | child | |
---|---|---|---|---|---|

0 | 0 | 0 | 0 | 1 | 0 |

1 | 0 | 1 | 1 | 1 | 0 |

2 | 0 | 1 | 0 | 1 | 0 |

3 | 0 | 1 | 1 | 2 | 1 |

4 | 1 | 1 | 0 | 1 | 0 |

5 | 0 | 1 | 1 | 4 | 2 |

6 | 0 | 1 | 0 | 3 | 1 |

7 | 0 | 1 | 0 | 4 | 3 |

8 | 0 | 0 | 1 | 3 | 2 |

9 | 1 | 1 | 1 | 1 | 0 |

Above, `unit_level`

is the comparisons summary dataframe and `fish_model.data`

is the empirical data. Notice how the values for \(c\) are identical in both dataframes. However, for \(w\), the values are different. However, these unit level contrasts are difficult to interpret as each row corresponds to *that* unit’s contrast. Therefore, it is useful to average over (marginalize) the estimates to summarize the unit level predictive comparisons.

#### Marginalizing over covariates

Since the empirical distrubution is used for computing the average predictive comparisons, the same number of rows (250) is returned as the data used to fit the model. To average over a covariate, use the `average_by`

argument. If `True`

is passed, then `comparisons`

averages over all covariates. Else, if a single or list of covariates are passed, then `comparisons`

averages by the covariates passed.

```
# marginalize over all covariates
bmb.interpret.comparisons(=fish_model,
model=fish_idata,
idata="livebait",
contrast=None,
conditional=True
average_by )
```

term | estimate_type | value | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|

0 | livebait | diff | (0, 1) | 3.649691 | 2.956185 | 4.333621 |

Passing `True`

to `average_by`

averages over all covariates and is equivalent to taking the mean of the `estimate`

and uncertainty columns. For example:

```
= bmb.interpret.comparisons(
unit_level =fish_model,
model=fish_idata,
idata="livebait",
contrast=None,
conditional
)
"estimate", "lower_3.0%", "upper_97.0%"]].mean() unit_level[[
```

```
estimate 3.649691
lower_3.0% 2.956185
upper_97.0% 4.333621
dtype: float64
```

#### Average by subgroups

Averaging over all covariates may not be desired, and you would rather average by a group or specific covariate. To perform averaging by subgroups, users can pass a single or list of covariates to `average_by`

to average over specific covariates. For example, if we wanted to average by `persons`

:

```
# average by number of persons
bmb.interpret.comparisons(=fish_model,
model=fish_idata,
idata="livebait",
contrast=None,
conditional="persons"
average_by )
```

term | estimate_type | value | persons | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|---|

0 | livebait | diff | (0, 1) | 1 | 1.374203 | 1.011290 | 1.708711 |

1 | livebait | diff | (0, 1) | 2 | 1.963362 | 1.543330 | 2.376636 |

2 | livebait | diff | (0, 1) | 3 | 3.701510 | 3.056586 | 4.357385 |

3 | livebait | diff | (0, 1) | 4 | 7.358662 | 6.047642 | 8.655654 |

```
# average by number of persons and camper by passing a list
bmb.interpret.comparisons(=fish_model,
model=fish_idata,
idata="livebait",
contrast=None,
conditional=["persons", "camper"]
average_by )
```

term | estimate_type | value | persons | camper | estimate | lower_3.0% | upper_97.0% | |
---|---|---|---|---|---|---|---|---|

0 | livebait | diff | (0, 1) | 1 | 0 | 0.864408 | 0.627063 | 1.116105 |

1 | livebait | diff | (0, 1) | 1 | 1 | 1.694646 | 1.252803 | 2.081207 |

2 | livebait | diff | (0, 1) | 2 | 0 | 1.424598 | 1.078389 | 1.777154 |

3 | livebait | diff | (0, 1) | 2 | 1 | 2.344439 | 1.872191 | 2.800661 |

4 | livebait | diff | (0, 1) | 3 | 0 | 2.429459 | 1.871578 | 2.964242 |

5 | livebait | diff | (0, 1) | 3 | 1 | 4.443540 | 3.747840 | 5.170052 |

6 | livebait | diff | (0, 1) | 4 | 0 | 3.541921 | 2.686445 | 4.391176 |

7 | livebait | diff | (0, 1) | 4 | 1 | 10.739204 | 9.024702 | 12.432764 |

It is still possible to use `plot_comparisons`

when passing an argument to `average_by`

. In the plot below, the empirical distribution is used to compute unit level contrasts for `livebait`

and then averaged over `persons`

to obtain the average predictive comparisons. The plot below is similar to the second plot in this notebook. The differences being that: (1) a pairwise transition grid is defined for the second plot above, whereas the empirical distribution is used in the plot below, and (2) in the plot below, we marginalized over the other covariates in the model (thus the reason for not having a `camper`

or `child`

group and panel, and a reduction in the uncertainty interval).

```
= bmb.interpret.plot_comparisons(
fig, ax =fish_model,
model=fish_idata,
idata="livebait",
contrast=None,
conditional="persons"
average_by
)7, 3) fig.set_size_inches(
```

## Logistic Regression

To showcase an additional functionality of `comparisons`

and `plot_comparisons`

, we fit a logistic regression model to the titanic dataset with interaction terms to model the probability of survival. The titanic dataset gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. The attributes are social class (first class, second class, third class, crewmember), age, sex (0 = female, 1 = male), and whether or not the person survived (0 = deceased, 1 = survived).

```
= pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv", index_col=0)
dat
"PClass"] = dat["PClass"].str.replace("[st, nd, rd]", "", regex=True)
dat["PClass"] = dat["PClass"].str.replace("*", "0").astype(int)
dat["PClass"] = dat["PClass"].replace(0, np.nan)
dat["PClass"] = pd.Categorical(dat["PClass"], ordered=True)
dat["SexCode"] = pd.Categorical(dat["SexCode"], ordered=True)
dat[
= dat.dropna(axis=0, how="any") dat
```

```
= bmb.Model(
titanic_model "Survived ~ PClass * SexCode * Age",
=dat,
data="bernoulli"
family
)= titanic_model.fit(
titanic_idata =500,
draws=500,
tune=0.95,
target_accept=1234
random_seed )
```

```
Modeling the probability that Survived==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, PClass, SexCode, PClass:SexCode, Age, PClass:Age, SexCode:Age, PClass:SexCode:Age]
```

`Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 127 seconds.`

### Comparison types

`comparisons`

and `plot_comparisons`

also allow you to specify the type of comparison to be computed. By default, a difference is used. However, it is also possible to take the ratio where comparisons would then become *average predictive ratios*. To achieve this, pass `"ratio"`

into the argument `comparison_type`

. Using different comparison types offers a way to produce alternative insights; especially when there are interaction terms as the value of one covariate depends on the value of the other covariate.

```
= bmb.interpret.plot_comparisons(
fig, ax =titanic_model,
model=titanic_idata,
idata={"PClass": [1, 3]},
contrast=["Age", "SexCode"],
conditional="ratio",
comparison_type={"main": "Age", "group": "SexCode", "panel": "SexCode"},
subplot_kwargs={"figsize":(12, 3), "sharey": True},
fig_kwargs=False
legend
)
```

The left panel shows that the ratio of the probability of survival comparing `PClass`

\(3\) to \(1\) conditional on `Age`

is non-constant. Whereas the right panel shows an approximately constant ratio in the probability of survival comparing `PClass`

\(3\) to \(1\) conditional on `Age`

.

```
%load_ext watermark
%watermark -n -u -v -iv -w
```

```
Last updated: Thu Aug 15 2024
Python implementation: CPython
Python version : 3.11.9
IPython version : 8.24.0
bambi : 0.14.1.dev12+g64e57423.d20240730
pandas: 2.2.2
numpy : 1.26.4
arviz : 0.18.0
Watermark: 2.4.3
```