Model
Model(
formula,
data,
family='gaussian',
priors=None,
link=None,
categorical=None,
potentials=None,
dropna=False,
auto_scale=True,
noncentered=True,
center_predictors=True,
extra_namespace=None,
)
Specification of model class
Parameters
formula : str or Formula-
A model description written using the formula syntax from the formulae library.
data : pd.DataFrame-
A pandas dataframe containing the data on which the model will be fit, with column names matching variables defined in the formula.
family : str or bambi.Family = 'gaussian'-
A specification of the model family (analogous to the family object in R). Either a string, or an instance of class bambi.Family. If a string is passed, a family with the corresponding name must be defined in the defaults loaded at Model initialization. Valid pre-defined families are "bernoulli", "beta", "binomial", "categorical", "gamma", "gaussian", "negativebinomial", "poisson", "t", and "wald". Defaults to "gaussian".
priors : dict = None-
Optional specification of priors for one or more terms. A dictionary where the keys are the names of terms in the model, "common", or "group_specific", and the values are instances of class Prior. If priors are unset, automatic priors inspired by the R rstanarm library are used.
link : str or dict of str to str = None-
The name of the link function to use. Valid names are "cloglog", "identity", "inverse_squared", "inverse", "log", "logit", "probit", and "softmax". Not all link functions can be used with all families. If a dictionary, keys are the names of the target parameters and the values are the names of the link functions.
categorical : str or list of str = None-
The names of any variables to treat as categorical. Can be either a single variable name, or a list of names. If categorical is None, the data type of the columns in the data will be used to infer handling. In cases where numeric columns are to be treated as categorical (e.g., group-specific factors coded as numerical IDs), explicitly passing variable names via this argument is recommended.
potentials : list of 2-tuples = None-
Optional specification of potentials. A potential is an arbitrary expression added to the likelihood; it is generally useful for adding constraints to models that are difficult to express otherwise. The first element of a 2-tuple is the name of a variable in the model, the second a lambda function expressing the desired constraint. If a constraint involves n variables, you can pass n 2-tuples, or pass a tuple whose first element is an n-tuple and whose second element is a lambda function with n arguments. The number and order of the lambda function's arguments have to match the number and order of the variable names.
dropna : bool = False-
When True, rows with any missing values in either the predictors or outcome are automatically dropped from the dataset in a listwise manner.
auto_scale : bool = True-
If True (default), priors are automatically rescaled to the data (to be weakly informative) any time default priors are used. Note that any priors explicitly set by the user will always take precedence over default priors.
noncentered : bool = True-
If True (default), uses a non-centered parameterization for normal hyperpriors on grouped parameters. If False, the naive (centered) parameterization is used.
center_predictors : bool = True-
If True (default), and if there is an intercept in the common terms, the data is centered by subtracting the mean. The centering is undone after sampling to provide the actual intercept in all distributional components that have an intercept. Note that this changes the interpretation of the prior on the intercept because it refers to the intercept of the centered data.
extra_namespace : dict = None-
Additional user-supplied variables with transformations or data to include in the environment where the formula is evaluated. Defaults to None.
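The effect of center_predictors can be sketched outside Bambi. The example below is only an illustration under stated assumptions: it uses ordinary least squares with NumPy rather than Bayesian sampling, and the data and variable names are hypothetical. It shows that fitting on a centered predictor yields an intercept that refers to the response at the mean of the predictor, and that the original-scale intercept is recovered by subtracting the slope times the predictor mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a predictor far from zero (hypothetical values)
x = rng.normal(loc=50.0, scale=5.0, size=200)
y = 3.0 + 0.5 * x + rng.normal(scale=0.1, size=200)

# Fit on the centered predictor: the intercept now refers to y at x = mean(x)
x_mean = x.mean()
design = np.column_stack([np.ones_like(x), x - x_mean])
intercept_centered, slope = np.linalg.lstsq(design, y, rcond=None)[0]

# Undo the centering to recover the intercept on the original scale
intercept_original = intercept_centered - slope * x_mean

print(f"centered intercept: {intercept_centered:.2f}")
print(f"original intercept: {intercept_original:.2f}")
```

This is why a prior placed on the intercept is a statement about the centered intercept (close to the mean of the response) rather than about the value of the response when all predictors are zero.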
Methods
| Name | Description |
|---|---|
| build | Set up the model for sampling/fitting |
| compute_log_likelihood | Compute the model’s log-likelihood |
| fit | Fit the model using PyMC |
| graph | Produce a graphviz Digraph from a built Bambi model. |
| plot_priors | Samples from the prior distribution and plots its marginals. |
| predict | Predict method for Bambi models |
| prior_predictive | Generate samples from the prior predictive distribution. |
| r2_score | R² for Bayesian regression models. |
| set_alias | Set aliases for the terms and auxiliary parameters in the model |
| set_priors | Set priors for one or more existing terms. |
build
Model.build()
Set up the model for sampling/fitting
Creates an instance of the underlying PyMC model and adds all the necessary terms to it.
compute_log_likelihood
Model.compute_log_likelihood(idata, data=None, inplace=True)
Compute the model’s log-likelihood
NOTE: This is a new feature and it may not work in all cases.
Parameters
idata : InferenceData-
The InferenceData instance returned by .fit().
data : pd.DataFrame or None = None-
An optional data frame with values for the predictors and the response on which the model’s log-likelihood function is evaluated. If omitted, the original dataset is used.
inplace : bool = True-
If True it will modify idata in-place. Otherwise, it will return a copy of idata with the log_likelihood group added.
Returns
:InferenceData or None
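As a rough illustration of what a pointwise log-likelihood looks like, the sketch below evaluates a Gaussian log-density for every observation under every posterior draw. It assumes a Gaussian family and hypothetical array shapes (chains, draws, observations); the helper name is invented for this example and this is not Bambi's implementation.

```python
import numpy as np

def gaussian_pointwise_log_likelihood(y, mu, sigma):
    """Log-density of each observation under each posterior draw.

    y:     observed response, shape (n_obs,)
    mu:    posterior draws of the mean, shape (n_chains, n_draws, n_obs)
    sigma: posterior draws of the standard deviation, shape (n_chains, n_draws)
    """
    sigma = sigma[..., np.newaxis]  # broadcast over observations
    z = (y - mu) / sigma
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * z**2

# Hypothetical posterior draws: 2 chains, 3 draws, 4 observations
rng = np.random.default_rng(1)
y = np.array([0.1, -0.4, 1.2, 0.7])
mu = rng.normal(size=(2, 3, 4))
sigma = np.abs(rng.normal(size=(2, 3))) + 0.5

log_lik = gaussian_pointwise_log_likelihood(y, mu, sigma)
print(log_lik.shape)
```

An array of this shape, one value per chain, draw, and observation, is the kind of content a log_likelihood group holds.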
fit
Model.fit(
draws=1000,
tune=1000,
discard_tuned_samples=True,
omit_offsets=True,
include_mean=None,
include_response_params=False,
inference_method='pymc',
init='auto',
n_init=50000,
chains=None,
cores=None,
random_seed=None,
**kwargs,
)
Fit the model using PyMC
Parameters
draws : int = 1000-
The number of samples to draw from the posterior distribution. Defaults to 1000.
tune : int = 1000-
Number of iterations to tune. Defaults to 1000. Samplers adjust the step sizes, scalings or similar during tuning. These tuning samples are drawn in addition to the number specified in the draws argument, and will be discarded unless discard_tuned_samples is set to False.
discard_tuned_samples : bool = True-
Whether to discard posterior samples of the tune interval. Defaults to True.
omit_offsets : bool = True-
Omits offset terms in the InferenceData object returned when the model includes group-specific effects. Defaults to True.
include_mean : (bool, optional, deprecated) = None-
This argument is deprecated and will be removed in future versions. Use include_response_params instead.
include_response_params : bool = False-
Include parameters of the response distribution in the output. These usually take more space than other parameters as there’s one of them per observation. Defaults to False.
inference_method : str = 'pymc'-
The method to use for fitting the model. By default, "pymc". This automatically assigns an MCMC method best suited for each kind of variable, like NUTS for continuous variables and Metropolis for non-binary discrete ones. NUTS implementations include "pymc", "nutpie", "blackjax", and "numpyro". Alternatively, "vi", in which case the model will be fitted using variational inference as implemented in PyMC using the fit function. Finally, "laplace", in which case a Laplace approximation is used; this is not recommended other than for pedagogical use.
init : str = 'auto'-
Initialization method. Defaults to "auto". The available methods are:
* auto: Use "jitter+adapt_diag" and, if this method fails, fall back to "adapt_diag".
* adapt_diag: Start with an identity mass matrix and then adapt a diagonal based on the variance of the tuning samples. All chains use the test value (usually the prior mean) as starting point.
* jitter+adapt_diag: Same as "adapt_diag", but use test value plus a uniform jitter in [-1, 1] as starting point in each chain.
* advi+adapt_diag: Run ADVI and then adapt the resulting diagonal mass matrix based on the sample variance of the tuning samples.
* advi+adapt_diag_grad: Run ADVI and then adapt the resulting diagonal mass matrix based on the variance of the gradients during tuning. This is experimental and might be removed in a future release.
* advi: Run ADVI to estimate posterior mean and diagonal mass matrix.
* advi_map: Initialize ADVI with MAP and use MAP as starting point.
* map: Use the MAP as starting point. This is strongly discouraged.
* adapt_full: Adapt a dense mass matrix using the sample covariances. All chains use the test value (usually the prior mean) as starting point.
* jitter+adapt_full: Same as "adapt_full", but use test value plus a uniform jitter in [-1, 1] as starting point in each chain.
n_init : int = 50000-
Number of initialization iterations. Only works for "advi" init methods.
chains : int = None-
The number of chains to sample. Running independent chains is important for some convergence statistics and can also reveal multiple modes in the posterior. If None, then set to either cores or 2, whichever is larger.
cores : int = None-
The number of chains to run in parallel. If None, it is equal to the number of CPUs in the system unless there are more than 4 CPUs, in which case it is set to 4.
random_seed : int or list of ints = None-
A list is accepted if cores is greater than one.
kwargs : dict = {}-
For other kwargs see the documentation for PyMC.sample().
Returns
:InferenceData or Approximation-
It returns an InferenceData if inference_method is "pymc", "nutpie", "blackjax", "numpyro", or "laplace", and an Approximation object if "vi".
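The default rules for cores and chains described above can be written out as a small function. This is a sketch of the documented behavior only; resolve_cores_and_chains is a hypothetical helper that does not exist in Bambi or PyMC, and the cpu_count argument is added here so the rule can be tried without depending on the current machine.

```python
import os

def resolve_cores_and_chains(chains=None, cores=None, cpu_count=None):
    """Apply the documented defaults for `cores` and `chains` (hypothetical helper)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if cores is None:
        # Number of CPUs, capped at 4
        cores = min(cpu_count, 4)
    if chains is None:
        # Either `cores` or 2, whichever is larger
        chains = max(cores, 2)
    return cores, chains

print(resolve_cores_and_chains(cpu_count=8))
print(resolve_cores_and_chains(cpu_count=1))
print(resolve_cores_and_chains(chains=3, cpu_count=2))
```

Under these rules at least two chains are always sampled by default, even on a single-CPU machine, which keeps multi-chain convergence diagnostics available.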
graph
Model.graph(formatting='plain', name=None, figsize=None, dpi=300, fmt='png')
Produce a graphviz Digraph from a built Bambi model.
Requires graphviz, which may be installed most easily with:
conda install -c conda-forge python-graphviz
Alternatively, you may install the graphviz binaries yourself, and then pip install graphviz to get the Python bindings. See http://graphviz.readthedocs.io/en/stable/manual.html for more information.
Parameters
formatting : str = 'plain'-
One of "plain" or "plain_with_params". Defaults to "plain".
name : str = None-
Name of the figure to save. Defaults to None, in which case no figure is saved.
figsize : tuple = None-
Maximum width and height of figure in inches. Defaults to None, in which case the figure size is set automatically. If defined and the drawing is larger than the given size, the drawing is uniformly scaled down so that it fits within the given size. Only works if name is not None.
dpi : int = 300-
Points per inch of the figure to save. Defaults to 300. Only works if name is not None.
fmt : str = 'png'-
Format of the figure to save. Defaults to "png". Only works if name is not None.
Returns
:graphviz.Digraph-
The graph
Examples
from bambi import Model
# data is assumed to be a pandas DataFrame with columns y, x, and z
model = Model("y ~ x + (1|z)", data)
model.fit()
model.graph()
plot_priors
Model.plot_priors(
draws=5000,
var_names=None,
filter_vars=None,
kind='kde',
ci_kind=None,
ci_prob=None,
point_estimate=None,
plot_collection=None,
backend=None,
labeller=None,
aes_by_visuals=None,
visuals=None,
stats=None,
figsize=None,
omit_offsets=True,
omit_group_specific=True,
random_seed=None,
bins=None,
hdi_prob=None,
round_to=None,
**pc_kwargs,
)
Samples from the prior distribution and plots its marginals.
Parameters
draws : int = 5000-
Number of draws to sample from the prior predictive distribution. Defaults to 5000.
var_names : str or list of str = None-
A list of names of variables for which to compute the prior predictive distribution. Defaults to None, which means to include both observed and unobserved RVs.
filter_vars : (like, regex) = "like"-
If None, interpret var_names as the real variable names. If "like", interpret var_names as substrings of the real variable names. If "regex", interpret var_names as regular expressions on the real variable names. Forwarded to arviz_plots.plot_dist.
kind : str = 'kde'-
Type of plot to display ("kde" or "hist"). For discrete variables this argument is ignored and a histogram is always used. Forwarded to arviz_plots.plot_dist.
ci_kind : (eti, hdi) = "eti"-
Which credible interval to use. Defaults to arviz_base.rcParams["stats.ci_kind"]. Forwarded to arviz_plots.plot_dist.
ci_prob : float = None-
Indicates the probability that should be contained within the plotted credible interval. Defaults to arviz_base.rcParams["stats.ci_prob"]. Forwarded to arviz_plots.plot_dist.
point_estimate : str = None-
Plot point estimate per variable. Values should be "mean", "median", "mode" or None. When None (default), use arviz_base.rcParams["stats.point_estimate"]. Forwarded to arviz_plots.plot_dist.
plot_collection : arviz_plots.PlotCollection = None-
The plot collection to use. Forwarded to arviz_plots.plot_dist.
backend : (matplotlib, plotly, bokeh) = "matplotlib"-
The backend to use for plotting. If None, it inspects whether plot_collection is None. If plot_collection is not None, it uses plot_collection.backend. Otherwise, it uses arviz_base.rcParams["plot.backend"]. Forwarded to arviz_plots.plot_dist.
labeller : arviz_base.labels.BaseLabeller = None-
The labeller. If None, it uses arviz_base.labels.BaseLabeller. Forwarded to arviz_plots.plot_dist.
aes_by_visuals : mapping of {str : sequence of str} = None-
Forwarded to arviz_plots.plot_dist. See aes_by_visuals there.
visuals : mapping of {str : mapping or bool} = None-
Forwarded to arviz_plots.plot_dist. See visuals there.
stats : mapping = None-
Forwarded to arviz_plots.plot_dist. See stats there.
figsize : tuple = None-
Figure size. If None it will be defined automatically.
omit_offsets : bool = True-
Whether to omit offset terms in the plot. Defaults to True.
omit_group_specific : bool = True-
Whether to omit group-specific effects in the plot. Defaults to True.
random_seed : int or None = None-
Seed for random number generator. Passed down to Model.prior_predictive.
bins : (int, optional, deprecated) = None-
This argument is deprecated and will be removed in future versions.
hdi_prob : (float or str, optional, deprecated) = None-
Plots highest density interval for chosen percentage of density. Use "hide" to hide the highest density interval. This argument is deprecated and will be removed in future versions.
round_to : (int, optional, deprecated) = None-
Controls formatting of floats. Defaults to 2 or the integer part, whichever is bigger. This argument is deprecated and will be removed in future versions.
pc_kwargs : dict = {}-
Passed to arviz_plots.PlotCollection.wrap.
Returns
predict
Model.predict(
idata,
kind='response_params',
data=None,
inplace=True,
include_group_specific=True,
sample_new_groups=False,
random_seed=None,
)
Predict method for Bambi models
Obtains in-sample and out-of-sample predictions from a fitted Bambi model.
Parameters
idata : InferenceData-
The InferenceData instance returned by .fit().
kind : str = 'response_params'-
Indicates the type of prediction required. Can be "response_params" or "response". The first returns draws from the posterior distribution of the likelihood parameters, while the latter returns the draws from the posterior predictive distribution (i.e. the posterior probability distribution for a new observation) in addition to the posterior distribution. Defaults to "response_params".
data : pd.DataFrame or None = None-
An optional data frame with values for the predictors that are used to obtain out-of-sample predictions. If omitted, the original dataset is used.
inplace : bool = True-
If True it will modify idata in-place. Otherwise, it will return a copy of idata with the predictions added. If kind="response_params", a new variable with the name of the parent parameter, e.g. "mu" and "sigma" for a Gaussian likelihood, or "p" for a Bernoulli likelihood, is added to the posterior group. If kind="response", it appends a posterior_predictive group to idata. If any of these already exist, it will be overwritten.
include_group_specific : bool = True-
Determines if predictions incorporate group-specific effects. If False, predictions are made with common effects only (i.e. group-specific effects are set to zero). Defaults to True.
sample_new_groups : bool = False-
Specifies if it is allowed to obtain predictions for new groups of group-specific terms. When True, each posterior sample for the new groups is drawn from the posterior draws of a randomly selected existing group. Since different groups may be selected at each draw, the end result represents the variation across existing groups. The method implemented is equivalent to sample_new_levels="uncertainty" in brms.
random_seed : (int, RandomState or Generator) = None-
Seed for the random number generator.
Returns
:InferenceData or None
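The sample_new_groups behavior described above can be illustrated with a short NumPy sketch. It assumes the posterior draws of a group-specific effect are stored as an array of shape (n_draws, n_groups); the function name and the data are hypothetical, and this is not Bambi's internal code. For each draw, one existing group is picked at random and its value at that draw is used for the new group.

```python
import numpy as np

def sample_new_group(existing_group_draws, rng):
    """Draw values for an unseen group from existing groups (hypothetical helper).

    existing_group_draws: array of shape (n_draws, n_groups)
    Returns the sampled values, shape (n_draws,), and the chosen group indices.
    """
    n_draws, n_groups = existing_group_draws.shape
    # A different existing group may be selected at each posterior draw
    chosen = rng.integers(0, n_groups, size=n_draws)
    return existing_group_draws[np.arange(n_draws), chosen], chosen

rng = np.random.default_rng(42)
existing = rng.normal(size=(1000, 5))  # 1000 draws for 5 existing groups
new_group, chosen = sample_new_group(existing, rng)
print(new_group.shape)
```

Because the selected group changes from draw to draw, the spread of the resulting values reflects both the within-group posterior uncertainty and the variation across existing groups.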
prior_predictive
Model.prior_predictive(
draws=500,
var_names=None,
omit_offsets=True,
random_seed=None,
)
Generate samples from the prior predictive distribution.
Parameters
draws : int = 500-
Number of draws to sample from the prior predictive distribution. Defaults to 500.
var_names : str, list of str or None = None-
A list of names of variables for which to compute the prior predictive distribution. Defaults to None, which means both observed and unobserved RVs.
omit_offsets : bool = True-
Whether to omit offset terms from the returned InferenceData. Defaults to True.
random_seed : int or None = None-
Seed for the random number generator.
Returns
:InferenceData-
InferenceData object with the groups prior, prior_predictive and observed_data.
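Conceptually, prior predictive sampling draws parameters from their priors and then simulates responses from the likelihood given those parameters. The NumPy sketch below shows this for a simple Gaussian regression. The priors, predictor values, and variable names are assumptions made for illustration; they are not Bambi's default priors or its implementation.

```python
import numpy as np

rng = np.random.default_rng(1234)
draws = 500
x = np.linspace(-1.0, 1.0, 20)  # hypothetical predictor values

# Step 1: draw parameters from (assumed) priors
intercept = rng.normal(0.0, 1.0, size=draws)
slope = rng.normal(0.0, 1.0, size=draws)
sigma = np.abs(rng.normal(0.0, 1.0, size=draws))  # half-normal prior

# Step 2: simulate one dataset per prior draw from the likelihood
mu = intercept[:, None] + slope[:, None] * x
y_sim = rng.normal(mu, sigma[:, None])

print(y_sim.shape)
```

The parameter draws play the role of the prior group and the simulated responses play the role of the prior_predictive group.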
r2_score
Model.r2_score(idata, summary=True)
R² for Bayesian regression models.
The R², or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model. It is computed as the variance of the predicted values divided by the sum of the variance of the predicted values and the variance of the residuals. For details of the Bayesian R² see [1].
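The definition above can be sketched in a few lines of NumPy. The example assumes predictions are stored as an array with one row per posterior draw and one column per observation; the function name, the simulated data, and the stand-in posterior draws are hypothetical, so this illustrates the formula rather than how Bambi computes it internally.

```python
import numpy as np

def bayesian_r2(y_true, y_pred):
    """One R² value per posterior draw.

    y_true: observed response, shape (n_obs,)
    y_pred: predicted values, shape (n_draws, n_obs)
    """
    var_pred = y_pred.var(axis=1)
    var_resid = (y_true - y_pred).var(axis=1)
    return var_pred / (var_pred + var_resid)

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y_true = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

# Hypothetical posterior draws of the intercept and slope
intercepts = rng.normal(1.0, 0.05, size=200)
slopes = rng.normal(2.0, 0.05, size=200)
y_pred = intercepts[:, None] + slopes[:, None] * x

r2_draws = bayesian_r2(y_true, y_pred)
summary = {"r2": r2_draws.mean(), "r2_std": r2_draws.std()}
print(summary)
```

Because the denominator is the sum of two non-negative variances, each draw's R² lies between 0 and 1, and the mean and standard deviation over draws give a summary of the kind returned when summary is True.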
Parameters
idata : InferenceData-
The InferenceData instance returned by .fit(). It should contain the posterior_predictive group, otherwise it will be computed and added to idata.
summary : bool = True-
If True, it returns a summary of the Bayesian R². Otherwise, it returns the posterior samples of the Bayesian R².
Returns
:pandas.Series-
A series with the following indices:
* r2: mean value for the Bayesian R²
* r2_std: standard deviation of the Bayesian R²
References
[1] Gelman et al. R-squared for Bayesian regression models. The American Statistician, 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100. Preprint: http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf
set_alias
Model.set_alias(aliases)
Set aliases for the terms and auxiliary parameters in the model
Parameters
aliases : dict of str to str-
A dictionary where key represents the original term name and the value is the alias.
Returns
:None
set_priors
Model.set_priors(priors=None, common=None, group_specific=None)
Set priors for one or more existing terms.
Parameters
priors : dict or None = None-
Dictionary of priors to update. Keys are names of terms to update; values are the new priors (either a Prior instance, or an int or float that scales the default priors).
common : (Prior, int, float or None) = None-
A prior specification to apply to all common terms included in the model.
group_specific : (Prior, int, float or None) = None-
A prior specification to apply to all group specific terms included in the model.