Model

Model(self, formula, data, family='gaussian', priors=None, link=None, categorical=None, potentials=None, dropna=False, auto_scale=True, noncentered=True, center_predictors=True, extra_namespace=None)

Specification of model class.

Parameters

Name Type Description Default
formula str or bambi.formula.Formula A model description written using the formula syntax from the formulae library. required
data pandas.DataFrame A pandas dataframe containing the data on which the model will be fit, with column names matching variables defined in the formula. required
family str or bambi.families.Family A specification of the model family (analogous to the family object in R). Either a string, or an instance of class bambi.families.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are "bernoulli", "beta", "binomial", "categorical", "gamma", "gaussian", "negativebinomial", "poisson", "t", and "wald". Defaults to "gaussian". 'gaussian'
priors dict Optional specification of priors for one or more terms. A dictionary where the keys are the names of terms in the model, "common", or "group_specific", and the values are instances of class Prior. If priors are unset, uses automatic priors inspired by the R rstanarm library. None
link str or Dict[str, str] The name of the link function to use. Valid names are "cloglog", "identity", "inverse_squared", "inverse", "log", "logit", "probit", and "softmax". Not all the link functions can be used with all the families. If a dictionary, keys are the names of the target parameters and the values are the names of the link functions. None
categorical str or list The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in the data will be used to infer handling. In cases where numeric columns are to be treated as categorical (e.g., group specific factors coded as numerical IDs), explicitly passing variable names via this argument is recommended. None
potentials list of 2-tuples Optional specification of potentials. A potential is an arbitrary expression added to the likelihood; this is generally useful for adding constraints to models that are difficult to express otherwise. The first element of a 2-tuple is the name of a variable in the model, the second a lambda function expressing the desired constraint. If a constraint involves n variables, you can pass n 2-tuples or pass a tuple whose first element is an n-tuple and whose second element is a lambda function with n arguments. The number and order of the lambda function's arguments have to match the number and order of the variable names. None
dropna bool When True, rows with any missing values in either the predictors or outcome are automatically dropped from the dataset in a listwise manner. False
auto_scale bool If True (default), priors are automatically rescaled to the data (to be weakly informative) any time default priors are used. Note that any priors explicitly set by the user will always take precedence over default priors. True
noncentered bool If True (default), uses a non-centered parameterization for normal hyperpriors on grouped parameters. If False, naive (centered) parameterization is used. True
center_predictors bool If True (default), and if there is an intercept in the common terms, the data is centered by subtracting the mean. The centering is undone after sampling to provide the actual intercept in all distributional components that have an intercept. Note that this changes the interpretation of the prior on the intercept because it refers to the intercept of the centered data. True
extra_namespace dict Additional user supplied variables with transformations or data to include in the environment where the formula is evaluated. Defaults to None. None
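
The way center_predictors changes the meaning of the intercept, and how the original intercept is recovered afterwards, can be illustrated without Bambi. The following is a minimal sketch using ordinary least squares in pure Python; it is not Bambi's implementation, and the data and the helper name `ols` are made up for illustration. Fitting against the centered predictor yields an intercept equal to the mean response, and subtracting the slope times the predictor mean gives back the intercept on the original scale.

```python
from statistics import mean

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

# Center the predictor by subtracting its mean
x_bar = mean(x)
x_centered = [xi - x_bar for xi in x]

# Intercept of the centered fit is the expected response at the mean of x
intercept_centered, slope = ols(x_centered, y)

# Undo the centering to recover the intercept on the original scale
intercept_original = intercept_centered - slope * x_bar

# Direct fit on the uncentered predictor, for comparison
intercept_direct, slope_direct = ols(x, y)
print(intercept_centered, intercept_original, intercept_direct)
```

Because the intercept of the centered fit is the expected response at the average predictor value, a prior placed on it is a prior on that quantity, not on the response at x = 0.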

Methods

Name Description
build Set up the model for sampling/fitting.
compute_log_likelihood Compute the model’s log-likelihood
fit Fit the model using PyMC.
graph Produce a graphviz Digraph from a built Bambi model.
plot_priors Samples from the prior distribution and plots its marginals.
predict Predict method for Bambi models
prior_predictive Generate samples from the prior predictive distribution.
set_alias Set aliases for the terms and auxiliary parameters in the model
set_priors Set priors for one or more existing terms.

build

Model.build(self)

Set up the model for sampling/fitting.

Creates an instance of the underlying PyMC model and adds all the necessary terms to it.

Returns

Type Description
None

compute_log_likelihood

Model.compute_log_likelihood(self, idata, data=None, inplace=True)

Compute the model’s log-likelihood

NOTE: This is a new feature and it may not work in all cases.

Parameters

Name Type Description Default
idata InferenceData The InferenceData instance returned by .fit(). required
data pandas.DataFrame or None An optional data frame with values for the predictors and the response on which the model’s log-likelihood function is evaluated. If omitted, the original dataset is used. None
inplace bool If True it will modify idata in-place. Otherwise, it will return a copy of idata with the log_likelihood group added. True

Returns

Type Description
InferenceData or None
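
As a rough illustration of what a pointwise log-likelihood contains, the sketch below evaluates a Gaussian log density for every combination of posterior draw and observation. It is a standalone example in pure Python under stated assumptions: a Gaussian likelihood with a linear mean, hypothetical data and posterior draws, and a hypothetical helper `normal_logpdf`. It does not reproduce how Bambi computes the values.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

# Hypothetical observed data
x = [0.0, 1.0, 2.0]
y = [1.1, 1.4, 2.2]

# Hypothetical posterior draws of (intercept, slope, sigma)
draws = [(1.0, 0.5, 0.3), (0.9, 0.6, 0.25), (1.1, 0.45, 0.35)]

# One row per posterior draw, one column per observation
log_likelihood = [
    [normal_logpdf(yi, a + b * xi, s) for xi, yi in zip(x, y)]
    for a, b, s in draws
]

for row in log_likelihood:
    print([round(value, 3) for value in row])
```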

fit

Model.fit(self, draws=1000, tune=1000, discard_tuned_samples=True, omit_offsets=True, include_mean=False, inference_method='mcmc', init='auto', n_init=50000, chains=None, cores=None, random_seed=None, **kwargs)

Fit the model using PyMC.

Parameters

Name Type Description Default
draws int The number of samples to draw from the posterior distribution. Defaults to 1000. 1000
tune int Number of iterations to tune. Defaults to 1000. Samplers adjust the step sizes, scalings or similar during tuning. These tuning samples are drawn in addition to the number specified in the draws argument, and will be discarded unless discard_tuned_samples is set to False. 1000
discard_tuned_samples bool Whether to discard posterior samples of the tune interval. Defaults to True. True
omit_offsets bool Omits offset terms in the InferenceData object returned when the model includes group specific effects. Defaults to True. True
include_mean bool Compute the posterior of the mean response. Defaults to False. False
inference_method str The method to use for fitting the model. By default, "mcmc". This automatically assigns an MCMC method best suited for each kind of variable, like NUTS for continuous variables and Metropolis for non-binary discrete ones. Alternatively, "vi", in which case the model will be fitted using variational inference as implemented in PyMC using the fit function. Finally, "laplace", in which case a Laplace approximation is used; this is not recommended other than for pedagogical use. To get a list of JAX based inference methods, call bmb.inference_methods.names['bayeux']. This will return a dictionary of the available methods such as blackjax_nuts, numpyro_nuts, among others. 'mcmc'
init str Initialization method. Defaults to "auto". The available methods are: * auto: Use "jitter+adapt_diag" and if this method fails it uses "adapt_diag". * adapt_diag: Start with a identity mass matrix and then adapt a diagonal based on the variance of the tuning samples. All chains use the test value (usually the prior mean) as starting point. * jitter+adapt_diag: Same as "adapt_diag", but use test value plus a uniform jitter in [-1, 1] as starting point in each chain. * advi+adapt_diag: Run ADVI and then adapt the resulting diagonal mass matrix based on the sample variance of the tuning samples. * advi+adapt_diag_grad: Run ADVI and then adapt the resulting diagonal mass matrix based on the variance of the gradients during tuning. This is experimental and might be removed in a future release. * advi: Run ADVI to estimate posterior mean and diagonal mass matrix. * advi_map: Initialize ADVI with MAP and use MAP as starting point. * map: Use the MAP as starting point. This is strongly discouraged. * adapt_full: Adapt a dense mass matrix using the sample covariances. All chains use the test value (usually the prior mean) as starting point. * jitter+adapt_full: Same as "adapt_full", but use test value plus a uniform jitter in [-1, 1] as starting point in each chain. 'auto'
n_init int Number of initialization iterations. Only works for "advi" init methods. 50000
chains int The number of chains to sample. Running independent chains is important for some convergence statistics and can also reveal multiple modes in the posterior. If None, then set to either cores or 2, whichever is larger. None
cores int The number of chains to run in parallel. If None, it is equal to the number of CPUs in the system unless there are more than 4 CPUs, in which case it is set to 4. None
random_seed int or list of ints A list is accepted if cores is greater than one. None
**kwargs For other kwargs see the documentation for PyMC.sample(). {}
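
The defaults described for chains and cores can be written out as two small rules. This is a sketch of the rules as stated in the table above, not code taken from Bambi or PyMC; the function names are hypothetical.

```python
def default_cores(n_cpus):
    """Cores used when cores=None: the CPU count, capped at 4."""
    return min(n_cpus, 4)

def default_chains(cores):
    """Chains used when chains=None: cores or 2, whichever is larger."""
    return max(cores, 2)

# Show the resulting defaults for a few machine sizes
for n_cpus in (1, 2, 4, 16):
    cores = default_cores(n_cpus)
    print(f"cpus={n_cpus} cores={cores} chains={default_chains(cores)}")
```

On a single-CPU machine this still yields two chains, which is what makes between-chain convergence statistics available by default.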

Returns

Type Description
An ArviZ InferenceData instance if inference_method is "mcmc" (default), "laplace", or one of the MCMC methods in bmb.inference_methods.names['bayeux']['mcmc'].
An Approximation object if inference_method is "vi".

graph

Model.graph(self, formatting='plain', name=None, figsize=None, dpi=300, fmt='png')

Produce a graphviz Digraph from a built Bambi model.

Requires graphviz, which may be installed most easily with conda install -c conda-forge python-graphviz

Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.

Parameters

Name Type Description Default
formatting str One of "plain" or "plain_with_params". Defaults to "plain". 'plain'
name str Name of the figure to save. Defaults to None, no figure is saved. None
figsize tuple Maximum width and height of figure in inches. Defaults to None, the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None. None
dpi int Dots per inch of the figure to save. Defaults to 300. Only works if name is not None. 300
fmt str Format of the figure to save. Defaults to "png". Only works if name is not None. 'png'

Returns

Type Description
graphviz.Digraph The graph

Example

model = Model("y ~ x + (1|z)", data)
model.build()
model.graph()

model = Model("y ~ x + (1|z)", data)
model.fit()
model.graph()

plot_priors

Model.plot_priors(self, draws=5000, var_names=None, random_seed=None, figsize=None, textsize=None, hdi_prob=None, round_to=2, point_estimate='mean', kind='kde', bins=None, omit_offsets=True, omit_group_specific=True, ax=None, **kwargs)

Samples from the prior distribution and plots its marginals.

Parameters

Name Type Description Default
draws int Number of draws to sample from the prior predictive distribution. Defaults to 5000. 5000
var_names str or list A list of names of variables for which to compute the prior predictive distribution. Defaults to None which means to include both observed and unobserved RVs. None
random_seed int Seed for the random number generator. None
figsize tuple Figure size. If None it will be defined automatically. None
textsize float Text size scaling factor for labels, titles and lines. If None it will be autoscaled based on figsize. None
hdi_prob float or str Plots highest density interval for chosen percentage of density. Use "hide" to hide the highest density interval. Defaults to 0.94. None
round_to int Controls formatting of floats. Defaults to 2 or the integer part, whichever is bigger. 2
point_estimate str Plot point estimate per variable. Values should be "mean", "median", "mode" or None. Defaults to "mean". 'mean'
kind str Type of plot to display ("kde" or "hist"). For discrete variables this argument is ignored and a histogram is always used. 'kde'
bins integer or sequence or auto Controls the number of bins, accepts the same keywords matplotlib.pyplot.hist() does. Only works if kind == "hist". If None (default) it will use "auto" for continuous variables and range(xmin, xmax + 1) for discrete variables. None
omit_offsets bool Whether to omit offset terms in the plot. Defaults to True. True
omit_group_specific bool Whether to omit group specific effects in the plot. Defaults to True. True
ax numpy array-like of matplotlib axes or bokeh figures A 2D array of locations into which to plot the densities. If not supplied, ArviZ will create its own array of plot areas (and return it). None
**kwargs Passed as-is to matplotlib.pyplot.hist() or matplotlib.pyplot.plot() function depending on the value of kind. {}

Returns

Type Description
matplotlib axes

predict

Model.predict(self, idata, kind='mean', data=None, inplace=True, include_group_specific=True, sample_new_groups=False)

Predict method for Bambi models

Obtains in-sample and out-of-sample predictions from a fitted Bambi model.

Parameters

Name Type Description Default
idata InferenceData The InferenceData instance returned by .fit(). required
kind str Indicates the type of prediction required. Can be "mean" or "pps". The first returns draws from the posterior distribution of the mean, while the latter returns draws from the posterior predictive distribution (i.e. the posterior probability distribution for a new observation) in addition to the posterior distribution of the mean. Defaults to "mean". 'mean'
data pandas.DataFrame or None An optional data frame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used. None
inplace bool If True it will modify idata in-place. Otherwise, it will return a copy of idata with the predictions added. If kind="mean", a new variable ending in "_mean" is added to the posterior group. If kind="pps", it appends a posterior_predictive group to idata. If any of these already exist, it will be overwritten. True
include_group_specific bool Determines if predictions incorporate group-specific effects. If False, predictions are made with common effects only (i.e. group specific are set to zero). Defaults to True. True
sample_new_groups bool Specifies if it is allowed to obtain predictions for new groups of group-specific terms. When True, each posterior sample for the new groups is drawn from the posterior draws of a randomly selected existing group. Since different groups may be selected at each draw, the end result represents the variation across existing groups. The method implemented is equivalent to sample_new_levels="uncertainty" in brms. False
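
The sampling scheme described for sample_new_groups can be sketched in a few lines. The example below is a standalone illustration in pure Python, not Bambi's implementation; the helper `sample_new_group` and the posterior draws are hypothetical. For every draw index it picks one existing group at random and reuses that group's value for the same draw.

```python
import random
import statistics

def sample_new_group(group_draws, rng):
    """Draw values for an unseen group.

    group_draws[g][i] is posterior draw i of the effect for existing group g.
    For each draw index, the value comes from a randomly selected existing group.
    """
    groups = list(group_draws)
    n_draws = len(group_draws[groups[0]])
    return [group_draws[rng.choice(groups)][i] for i in range(n_draws)]

rng = random.Random(1234)

# Hypothetical posterior draws of a group-specific intercept for three groups
group_draws = {
    "a": [rng.gauss(-1.0, 0.1) for _ in range(1000)],
    "b": [rng.gauss(0.0, 0.1) for _ in range(1000)],
    "c": [rng.gauss(1.0, 0.1) for _ in range(1000)],
}

new_group = sample_new_group(group_draws, rng)

# The new group's spread reflects the variation across existing groups
print(statistics.pstdev(group_draws["a"]), statistics.pstdev(new_group))
```

The draws for the new group are much more dispersed than those of any single existing group, which is the sense in which the result "represents the variation across existing groups".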

Returns

Type Description
InferenceData or None
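
The difference between kind="mean" and kind="pps" can be illustrated with a small simulation. This is a sketch under stated assumptions, not a call into Bambi: it assumes a Gaussian model with a linear mean, and the posterior draws for the hypothetical parameters `a`, `b` and `sigma` are simulated instead of coming from a fitted model.

```python
import random
import statistics

rng = random.Random(42)
x_new = 2.0  # hypothetical predictor value for a new observation

# Hypothetical posterior draws for the model y ~ Normal(a + b * x, sigma)
a = [rng.gauss(1.0, 0.1) for _ in range(2000)]
b = [rng.gauss(0.5, 0.05) for _ in range(2000)]
sigma = [abs(rng.gauss(1.0, 0.05)) for _ in range(2000)]

# Analogue of kind="mean": one value of the mean response per posterior draw
mean_draws = [ai + bi * x_new for ai, bi in zip(a, b)]

# Analogue of kind="pps": one simulated observation per posterior draw
pps_draws = [rng.gauss(m, s) for m, s in zip(mean_draws, sigma)]

print(statistics.stdev(mean_draws))  # uncertainty about the mean only
print(statistics.stdev(pps_draws))   # also includes observation noise
```

The posterior predictive draws are wider because they combine uncertainty about the mean with the noise of an individual observation.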

prior_predictive

Model.prior_predictive(self, draws=500, var_names=None, omit_offsets=True, random_seed=None)

Generate samples from the prior predictive distribution.

Parameters

Name Type Description Default
draws int Number of draws to sample from the prior predictive distribution. Defaults to 500. 500
var_names str or list A list of names of variables for which to compute the prior predictive distribution. Defaults to None which means both observed and unobserved RVs. None
omit_offsets bool Whether to omit offset terms from the returned InferenceData. Defaults to True. True
random_seed int Seed for the random number generator. None
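
For context on omit_offsets: the sketch below assumes that the offset terms are the standard-normal auxiliary variables introduced by the non-centered parameterization (the noncentered argument of Model). That link is an assumption made for this illustration, and the code is a standalone pure-Python example, not Bambi's implementation. It shows that shifting and scaling standard-normal offsets produces the same group effects as sampling them directly.

```python
import random

mu, sigma = 0.0, 2.5  # hypothetical hyperparameter values
n_groups = 5

# Non-centered: sample standard-normal offsets, then shift and scale them
rng = random.Random(0)
offsets = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
effects_noncentered = [mu + sigma * z for z in offsets]

# Centered: sample the group effects directly from Normal(mu, sigma)
rng_centered = random.Random(0)
effects_centered = [rng_centered.gauss(mu, sigma) for _ in range(n_groups)]

# With the same seed, both routes give the same group effects
for nc, c in zip(effects_noncentered, effects_centered):
    print(round(nc, 4), round(c, 4))
```

Under this reading, the offsets carry no information beyond the group effects themselves, which is why they can be omitted from the output by default.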

Returns

Type Description
InferenceData InferenceData object with the groups prior, prior_predictive and observed_data.

set_alias

Model.set_alias(self, aliases)

Set aliases for the terms and auxiliary parameters in the model

Parameters

Name Type Description Default
aliases dict A dictionary where key represents the original term name and the value is the alias. required

Returns

Type Description
None

set_priors

Model.set_priors(self, priors=None, common=None, group_specific=None)

Set priors for one or more existing terms.

Parameters

Name Type Description Default
priors dict Dictionary of priors to update. Keys are names of terms to update; values are the new priors (either a Prior instance, or an int or float that scales the default priors). None
common Prior, int, or float A prior specification to apply to all common terms included in the model. None
group_specific Prior, int, or float A prior specification to apply to all group specific terms included in the model. None

Returns

Type Description
None