API Reference
This reference provides detailed documentation for all modules, classes, and methods in the current release of Bambi.
bambi.models
- class bambi.models.Model(formula, data, family='gaussian', priors=None, link=None, categorical=None, potentials=None, dropna=False, auto_scale=True, automatic_priors='default', noncentered=True, priors_cor=None)
Specification of model class.
- Parameters:
formula (str) – A model description written using the formula syntax from the formulae library.
data (pandas.DataFrame or str) – The dataset to use. Either a pandas DataFrame, or the name of the file containing the data, which will be passed to pd.read_csv().
family (str or bambi.families.Family) – A specification of the model family (analogous to the family object in R). Either a string, or an instance of class bambi.families.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are "bernoulli", "beta", "binomial", "categorical", "gamma", "gaussian", "negativebinomial", "poisson", "t", and "wald". Defaults to "gaussian".
priors (dict) – Optional specification of priors for one or more terms. A dictionary where the keys are the names of terms in the model, "common" or "group_specific", and the values are instances of class Prior when automatic_priors is "default".
link (str) – The name of the link function to use. Valid names are "cloglog", "identity", "inverse_squared", "inverse", "log", "logit", "probit", and "softmax". Not every link function can be used with every family.
categorical (str or list) – The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in data will be used to infer handling. When numeric columns are to be treated as categorical (e.g., group-specific factors coded as numerical IDs), explicitly passing variable names via this argument is recommended.
potentials (list of 2-tuples) – Optional specification of potentials. A potential is an arbitrary expression added to the likelihood. This is generally useful for adding constraints to models that are difficult to express otherwise. The first element of a 2-tuple is the name of a variable in the model, the second a lambda function expressing the desired constraint. If a constraint involves n variables, you can pass n 2-tuples, or pass a tuple whose first element is an n-tuple and whose second element is a lambda function with n arguments. The number and order of the lambda function's arguments must match the number and order of the variable names.
dropna (bool) – When True, rows with any missing values in either the predictors or outcome are automatically dropped from the dataset in a listwise manner.
auto_scale (bool) – If True (default), priors are automatically rescaled to the data (to be weakly informative) any time default priors are used. Note that any priors explicitly set by the user will always take precedence over default priors.
automatic_priors (str) – An optional specification to compute automatic priors. "default" means to use a method inspired by the R rstanarm library.
noncentered (bool) – If True (default), uses a non-centered parameterization for normal hyperpriors on grouped parameters. If False, the naive (centered) parameterization is used.
priors_cor (dict) – An optional value for eta in the LKJ prior for the correlation matrix of group-specific terms. Keys in the dictionary indicate the groups, and values indicate the value of eta. This is a very experimental feature. Defaults to None, which means priors for the group-specific terms are independent.
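The difference between the two settings of noncentered can be illustrated without Bambi at all. The following is a minimal sketch using only the standard library; the function names and the values of mu and sigma are made up for illustration, and this is not how Bambi or PyMC implement the parameterization internally. It only shows that both forms describe the same distribution while sampling different underlying quantities.

```python
import random
import statistics


def centered_draws(mu, sigma, n, rng):
    """Centered: sample each group effect directly from Normal(mu, sigma)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def noncentered_draws(mu, sigma, n, rng):
    """Non-centered: sample a standard-normal offset, then shift and scale it."""
    offsets = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mu + sigma * z for z in offsets]


rng = random.Random(0)
centered = centered_draws(2.0, 0.5, 20_000, rng)
noncentered = noncentered_draws(2.0, 0.5, 20_000, rng)

# Both parameterizations target the same distribution;
# only the quantity that is sampled differs.
print(statistics.mean(centered), statistics.mean(noncentered))
print(statistics.stdev(centered), statistics.stdev(noncentered))
```

In the non-centered form the sampler explores the standard-normal offsets, which is presumably why fit() exposes an omit_offsets option to leave those auxiliary terms out of the returned results.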
- build()
Set up the model for sampling/fitting.
Creates an instance of the underlying PyMC model and adds all the necessary terms to it.
- property common_terms
Return dict of all common effects in model.
- fit(draws=1000, tune=1000, discard_tuned_samples=True, omit_offsets=True, include_mean=False, inference_method='mcmc', init='auto', n_init=50000, chains=None, cores=None, random_seed=None, **kwargs)
Fit the model using PyMC.
- Parameters:
draws (int) – The number of samples to draw from the posterior distribution. Defaults to 1000.
tune (int) – Number of iterations to tune. Defaults to 1000. Samplers adjust the step sizes, scalings or similar during tuning. These tuning samples are drawn in addition to the number specified in the draws argument, and will be discarded unless discard_tuned_samples is set to False.
discard_tuned_samples (bool) – Whether to discard posterior samples of the tune interval. Defaults to True.
omit_offsets (bool) – Omits offset terms in the InferenceData object returned when the model includes group-specific effects. Defaults to True.
include_mean (bool) – Compute the posterior of the mean response. Defaults to False.
inference_method (str) – The method to use for fitting the model. By default, "mcmc". This automatically assigns an MCMC method best suited for each kind of variable, like NUTS for continuous variables and Metropolis for non-binary discrete ones. Alternatively, "vi", in which case the model will be fitted using variational inference as implemented in PyMC using the fit function. Finally, "laplace", in which case a Laplace approximation is used; this is not recommended other than for pedagogical use. To use the PyMC numpyro and blackjax samplers, use "nuts_numpyro" or "nuts_blackjax" respectively. Both methods will only work if you can use NUTS sampling, so your model must be differentiable.
init (str) – Initialization method. Defaults to "auto". The available methods are:
  * auto: Use "jitter+adapt_diag" and, if this method fails, use "adapt_diag".
  * adapt_diag: Start with an identity mass matrix and then adapt a diagonal based on the variance of the tuning samples. All chains use the test value (usually the prior mean) as starting point.
  * jitter+adapt_diag: Same as "adapt_diag", but use the test value plus a uniform jitter in [-1, 1] as starting point in each chain.
  * advi+adapt_diag: Run ADVI and then adapt the resulting diagonal mass matrix based on the sample variance of the tuning samples.
  * advi+adapt_diag_grad: Run ADVI and then adapt the resulting diagonal mass matrix based on the variance of the gradients during tuning. This is experimental and might be removed in a future release.
  * advi: Run ADVI to estimate posterior mean and diagonal mass matrix.
  * advi_map: Initialize ADVI with MAP and use MAP as starting point.
  * map: Use the MAP as starting point. This is strongly discouraged.
  * adapt_full: Adapt a dense mass matrix using the sample covariances. All chains use the test value (usually the prior mean) as starting point.
  * jitter+adapt_full: Same as "adapt_full", but use the test value plus a uniform jitter in [-1, 1] as starting point in each chain.
n_init (int) – Number of initialization iterations. Only works for "advi" init methods.
chains (int) – The number of chains to sample. Running independent chains is important for some convergence statistics and can also reveal multiple modes in the posterior. If None, then set to either cores or 2, whichever is larger.
cores (int) – The number of chains to run in parallel. If None, it is equal to the number of CPUs in the system unless there are more than 4 CPUs, in which case it is set to 4.
random_seed (int or list of ints) – A list is accepted if cores is greater than one.
**kwargs – For other kwargs see the documentation for PyMC.sample().
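The default rules for cores, chains and the tuning samples are easy to misread, so here is a small sketch that restates them as plain Python. The helper names (resolve_cores, resolve_chains, iterations_per_chain) are hypothetical and exist only for illustration; they are not part of Bambi or PyMC, and the real defaults are resolved inside the sampler.

```python
import os


def resolve_cores(cores=None, n_cpus=None):
    """If cores is None, use the CPU count, capped at 4."""
    if cores is not None:
        return cores
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return min(n_cpus, 4)


def resolve_chains(chains=None, cores=1):
    """If chains is None, use cores or 2, whichever is larger."""
    if chains is not None:
        return chains
    return max(cores, 2)


def iterations_per_chain(draws=1000, tune=1000, discard_tuned_samples=True):
    """Return (iterations run, samples kept) for a single chain."""
    total = draws + tune  # tuning samples are drawn in addition to draws
    kept = draws if discard_tuned_samples else total
    return total, kept


cores = resolve_cores(n_cpus=8)
chains = resolve_chains(cores=cores)
print(cores, chains, iterations_per_chain())
```

Under these rules a machine with a single CPU still runs two chains by default, just not in parallel.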
- Returns:
An ArviZ InferenceData instance if inference_method is "mcmc" (default), "nuts_numpyro", "nuts_blackjax" or "laplace".
An Approximation object if inference_method is "vi".
- graph(formatting='plain', name=None, figsize=None, dpi=300, fmt='png')
Produce a graphviz Digraph from a built Bambi model.
Requires graphviz, which may be installed most easily with
conda install -c conda-forge python-graphviz
Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the Python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.
- Parameters:
formatting (str) – One of "plain" or "plain_with_params". Defaults to "plain".
name (str) – Name of the figure to save. Defaults to None, in which case no figure is saved.
figsize (tuple) – Maximum width and height of the figure in inches. Defaults to None, in which case the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None.
dpi (int) – Dots per inch of the figure to save. Defaults to 300. Only works if name is not None.
fmt (str) – Format of the figure to save. Defaults to "png". Only works if name is not None.
Example
>>> # data: a pandas DataFrame with columns y, x and z
>>> model = Model("y ~ x + (1|z)", data)
>>> model.build()
>>> model.graph()

>>> model = Model("y ~ x + (1|z)", data)
>>> model.fit()
>>> model.graph()
- property group_specific_terms
Return dict of all group specific effects in model.
- property intercept_term
Return the intercept term.
- property offset_terms
Return dict of all offset effects in model.
- plot_priors(draws=5000, var_names=None, random_seed=None, figsize=None, textsize=None, hdi_prob=None, round_to=2, point_estimate='mean', kind='kde', bins=None, omit_offsets=True, omit_group_specific=True, ax=None)
Samples from the prior distribution and plots its marginals.
- Parameters:
draws (int) – Number of draws to sample from the prior predictive distribution. Defaults to 5000.
var_names (str or list) – A list of names of variables for which to compute the prior predictive distribution. Defaults to None, which means to include both observed and unobserved RVs.
random_seed (int) – Seed for the random number generator.
figsize (tuple) – Figure size. If None, it will be defined automatically.
textsize (float) – Text size scaling factor for labels, titles and lines. If None, it will be autoscaled based on figsize.
hdi_prob (float or str) – Plots the highest density interval for the chosen percentage of density. Use "hide" to hide the highest density interval. Defaults to 0.94.
round_to (int) – Controls formatting of floats. Defaults to 2 or the integer part, whichever is bigger.
point_estimate (str) – Plot point estimate per variable. Values should be "mean", "median", "mode" or None. Defaults to "mean".
kind (str) – Type of plot to display ("kde" or "hist"). For discrete variables this argument is ignored and a histogram is always used.
bins (int, sequence or "auto") – Controls the number of bins; accepts the same values as matplotlib.pyplot.hist(). Only works if kind == "hist". If None (default), it will use "auto" for continuous variables and range(xmin, xmax + 1) for discrete variables.
omit_offsets (bool) – Whether to omit offset terms in the plot. Defaults to True.
omit_group_specific (bool) – Whether to omit group-specific effects in the plot. Defaults to True.
ax (numpy array-like of matplotlib axes or bokeh figures) – A 2D array of locations into which to plot the densities. If not supplied, ArviZ will create its own array of plot areas (and return it).
**kwargs – Passed as-is to matplotlib.pyplot.hist() or matplotlib.pyplot.plot(), depending on the value of kind.
- Returns:
axes
- Return type:
matplotlib axes
- predict(idata, kind='mean', data=None, inplace=True, include_group_specific=True)
Predict method for Bambi models.
Obtains in-sample and out-of-sample predictions from a fitted Bambi model.
- Parameters:
idata (InferenceData) – The InferenceData instance returned by .fit().
kind (str) – Indicates the type of prediction required. Can be "mean" or "pps". The first returns draws from the posterior distribution of the mean, while the latter returns draws from the posterior predictive distribution (i.e. the posterior probability distribution for a new observation). Defaults to "mean".
data (pandas.DataFrame or None) – An optional data frame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.
include_group_specific (bool) – If True, make predictions including the group-specific effects. Otherwise, predictions are made with common effects only (i.e. group-specific effects are set to zero).
inplace (bool) – If True, it will modify idata in-place. Otherwise, it will return a copy of idata with the predictions added. If kind="mean", a new variable ending in "_mean" is added to the posterior group. If kind="pps", it appends a posterior_predictive group to idata. If any of these already exist, it will be overwritten.
- Return type:
InferenceData or None
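The distinction between kind="mean" and kind="pps" can be sketched with a toy example. The code below does not call Bambi: the posterior draws are made-up numbers for a Gaussian model y ~ x with an identity link, and the helper names predict_mean and predict_pps are hypothetical. It only illustrates that "mean" draws carry parameter uncertainty, while "pps" draws add observation noise on top.

```python
import random
import statistics

rng = random.Random(1)

# Toy posterior draws (made-up numbers, not output from a fitted Bambi model).
posterior = [
    {
        "intercept": rng.gauss(1.0, 0.1),
        "slope": rng.gauss(2.0, 0.1),
        "sigma": abs(rng.gauss(0.5, 0.05)),
    }
    for _ in range(200)
]
new_x = [0.0, 1.0, 2.0]


def predict_mean(posterior, xs):
    """One row per posterior draw, one column per observation.

    With an identity link, the linear predictor is already the mean."""
    return [[d["intercept"] + d["slope"] * x for x in xs] for d in posterior]


def predict_pps(posterior, xs, rng):
    """Add observation noise around each mean to mimic draws for new observations."""
    means = predict_mean(posterior, xs)
    return [
        [rng.gauss(mu, d["sigma"]) for mu in row]
        for d, row in zip(posterior, means)
    ]


mean_draws = predict_mean(posterior, new_x)
pps_draws = predict_pps(posterior, new_x, rng)

# Spread at the first new observation: the predictive draws are wider.
mean_spread = statistics.pstdev([row[0] for row in mean_draws])
pps_spread = statistics.pstdev([row[0] for row in pps_draws])
print(mean_spread, pps_spread)
```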
- prior_predictive(draws=500, var_names=None, omit_offsets=True, random_seed=None)
Generate samples from the prior predictive distribution.
- Parameters:
draws (int) – Number of draws to sample from the prior predictive distribution. Defaults to 500.
var_names – A list of names of variables for which to compute the prior predictive distribution. Defaults to None, which means both observed and unobserved RVs.
random_seed (int) – Seed for the random number generator.
- Returns:
InferenceData object with the groups prior, prior_predictive and observed_data.
- Return type:
InferenceData
- set_alias(aliases)
Set aliases for the terms and auxiliary parameters in the model
- Parameters:
aliases (dict) – A dictionary where key represents the original term name and the value is the alias.
- set_priors(priors=None, common=None, group_specific=None)
Set priors for one or more existing terms.
- Parameters:
priors (dict) – Dictionary of priors to update. Keys are names of terms to update; values are the new priors (either a Prior instance, or an int or float that scales the default priors). Note that a tuple can be passed as the key, in which case the same prior will be applied to all terms named in the tuple.
common (Prior, int, float or str) – A prior specification to apply to all common terms included in the model.
group_specific (Prior, int, float or str) – A prior specification to apply to all group-specific terms included in the model.
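The tuple-key convention for priors can be pictured as a simple dictionary expansion. The sketch below uses a hypothetical helper, expand_prior_keys, and a plain string as a stand-in for a Prior instance; it is not Bambi's code, only an illustration of how one entry keyed by a tuple maps to one entry per term.

```python
def expand_prior_keys(priors):
    """Expand tuple keys so that every term name maps to its own prior."""
    expanded = {}
    for key, prior in priors.items():
        # A tuple key names several terms that share the same prior.
        names = key if isinstance(key, tuple) else (key,)
        for name in names:
            expanded[name] = prior
    return expanded


# "Normal(0, 1)" is a placeholder for a Prior instance; 0.5 stands for a
# number that scales the default prior.
priors = {("x1", "x2"): "Normal(0, 1)", "1|group": 0.5}
expanded = expand_prior_keys(priors)
print(expanded)
```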
- property term_names
Return names of all terms in order of addition to model.
bambi.priors
Classes to represent prior distributions and methods to set automatic priors.
- class bambi.priors.Prior(name, auto_scale=True, **kwargs)
Abstract specification of a term prior.
- Parameters:
name (str) – Name of prior distribution. Must be the name of a PyMC distribution (e.g., "Normal", "Bernoulli", etc.).
auto_scale (bool) – Whether to adjust the parameters of the prior or use them as passed. Defaults to True.
kwargs (dict) – Optional keywords specifying the parameters of the named distribution.
bambi.families
Classes to construct model families.
- class bambi.families.Family(name, likelihood, link)
A specification of model family.
- Parameters:
name (str) – The name of the family. It can be any string.
likelihood (Likelihood) – A bambi.families.Likelihood instance specifying the model likelihood function.
link (str or Link) – The name of the link function or a bambi.families.Link instance. The link function transforms the linear model prediction to the mean parameter of the likelihood function.
Examples
>>> import bambi as bmb
Replicate the Gaussian built-in family.
>>> sigma_prior = bmb.Prior("HalfNormal", sigma=1)
>>> likelihood = bmb.Likelihood("Gaussian", parent="mu", sigma=sigma_prior)
>>> family = bmb.Family("gaussian", likelihood, "identity")
>>> # Then you can do
>>> # bmb.Model("y ~ x", data, family=family)
Replicate the Bernoulli built-in family.
>>> likelihood = bmb.Likelihood("Bernoulli", parent="p")
>>> family = bmb.Family("bernoulli", likelihood, "logit")
- class bambi.families.Likelihood(name, parent=None, **kwargs)
Representation of a Likelihood function for a Bambi model.
Notes:
* parent must not be in kwargs.
* parent is inferred from the name if it is a known name.
- Parameters:
name (str) – Name of the likelihood function. Must be a valid PyMC distribution name.
parent (str) – Optional specification of the name of the mean parameter in the likelihood. This is the parameter whose transformation is modeled by the linear predictor.
kwargs – Keyword arguments that indicate prior distributions for auxiliary parameters in the likelihood.
- class bambi.families.Link(name, link=None, linkinv=None, linkinv_backend=None)
Representation of a link function.
This object contains two main functions. One is the link function itself, which maps values in the response scale to the linear predictor. The other is the inverse of the link function, which maps values of the linear predictor to the response scale.
The great majority of users will never interact with this class unless they want to create a custom Family with a custom Link. This is automatically handled for all the built-in families.
- Parameters:
name (str) – The name of the link function. If it is a known name, it's not necessary to pass any other arguments because the functions are already defined internally. If not known, all of link, linkinv and linkinv_backend must be specified.
link (function) – A function that maps the response to the linear predictor. Known as the \(g\) function in GLM jargon. Does not need to be specified when name is a known name.
linkinv (function) – A function that maps the linear predictor to the response. Known as the \(g^{-1}\) function in GLM jargon. Does not need to be specified when name is a known name.
linkinv_backend (function) – Same as linkinv, but it must work with the PyMC backend (i.e. it must work with Aesara tensors). Does not need to be specified when name is a known name.
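As an illustration of what the link and linkinv arguments are expected to be, here is the logit pair written with the standard library. This is a sketch of the mathematical functions only; the names logit and inv_logit are chosen for this example, and a linkinv_backend version would need the same formula expressed with tensor operations, which is not shown here.

```python
import math


def logit(p):
    """Link g: maps a probability in (0, 1) to the linear predictor scale."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link g^{-1}: maps the linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# The two functions undo each other on the open interval (0, 1).
p = 0.3
eta = logit(p)
print(eta, inv_logit(eta))
```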
bambi.data
Code for loading datasets.
- bambi.data.clear_data_home(data_home=None)
Delete all the content of the data home cache.
- Parameters:
data_home (str) – The path to the Bambi data directory. By default, a folder named "bambi_data" in the user home folder.
- bambi.data.load_data(dataset=None, data_home=None)
Load a dataset.
Run with no parameters to get a list of all available data sets.
The directory in which datasets are saved can also be set with the environment variable BAMBI_HOME. The checksum of the dataset is checked against a hardcoded value to watch for data corruption. Run bmb.clear_data_home() to clear the data directory.
- Parameters:
dataset (str) – Name of dataset to load.
data_home (str, optional) – Where to save remote datasets
- Return type:
pandas.DataFrame
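The way the data directory and the checksum fit together can be sketched as follows. Both helpers are hypothetical and only mirror the description of load_data; in particular, giving an explicit data_home precedence over BAMBI_HOME and using SHA-256 as the hash are assumptions made for this example, not documented behavior.

```python
import hashlib
import os


def resolve_data_home(data_home=None, environ=None):
    """Pick the data directory: explicit argument, then BAMBI_HOME, then ~/bambi_data.

    The precedence order is an assumption made for this sketch."""
    environ = os.environ if environ is None else environ
    if data_home is None:
        data_home = environ.get("BAMBI_HOME", os.path.join("~", "bambi_data"))
    return os.path.expanduser(data_home)


def checksum_matches(content, expected_hex):
    """Compare the digest of downloaded bytes with a known value.

    SHA-256 is assumed here; the hash actually used is not stated in this reference."""
    return hashlib.sha256(content).hexdigest() == expected_hex


print(resolve_data_home(environ={"BAMBI_HOME": "/tmp/bambi"}))
print(checksum_matches(b"abc", hashlib.sha256(b"abc").hexdigest()))
```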