%load_ext autoreload
%autoreload 2
Gaussian Processes
This article demonstrates how to use Bambi with Gaussian Processes with 1-dimensional predictors. Bambi supports Gaussian Processes through the approximation known as Hilbert Space Gaussian Processes (HSGP).
HSGP is a framework that falls under the class of low-rank approximations. These are based on forming a basis function approximation with \(m\) basis functions, where \(m\) is usually much smaller than \(n\), the number of observations.
For references see Hilbert Space Methods for Reduced-Rank Gaussian Process Regression and Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming.
If you prefer a video format, have a look at Introduction to Hilbert Space GPs in PyMC given by Bill Engels.
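In a nutshell, and following the references above, the GP prior over a function \(f\) is replaced by a finite basis expansion. For a single predictor it can be sketched as

\[
f(x) \approx \sum_{j=1}^{m} \sqrt{S_\theta\!\left(\sqrt{\lambda_j}\right)}\; \phi_j(x)\, \beta_j,
\qquad \beta_j \sim \mathcal{N}(0, 1),
\]

where \(\phi_j\) and \(\lambda_j\) are the eigenfunctions and eigenvalues of the Laplacian on an interval \([-L, L]\) containing the (centered) data, and \(S_\theta\) is the spectral density of the covariance kernel with parameters \(\theta\) (amplitude and lengthscale). Conventions vary slightly across references and implementations, so take this as intuition rather than the exact expression Bambi evaluates.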
from formulae import design_matrices
import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from bambi.interpret import plot_predictions
from matplotlib.lines import Line2D
A basic example
Let’s get started by simulating some data from a smooth function. The goal is to fit a normal likelihood model where a Gaussian process term contributes to the mean.
rng = np.random.default_rng(seed=121195)

size = 100
x = np.linspace(0, 50, size)
b = 0.1 * rng.normal(size=6)
sigma = 0.15

dm = design_matrices("0 + bs(x, df=6, intercept=True)", pd.DataFrame({"x": x}))
X = np.array(dm.common)
f = 10 * X @ b
y = f + rng.normal(size=size) * sigma
df = pd.DataFrame({"x": x, "y": y})
fig, ax = plt.subplots(figsize=(9, 6))
ax.scatter(x, y, s=30, alpha=0.8)
ax.plot(x, f, color="black");
Now let’s simply create and fit the model. We use the `hsgp()` function to initialize an HSGP term in the model formula. Notice we pass the variable `x` and values for two other arguments, `m` and `c`, that we’ll cover later.
model = bmb.Model("y ~ 0 + hsgp(x, m=10, c=2)", df)
model
Formula: y ~ 0 + hsgp(x, m=10, c=2)
Family: gaussian
Link: mu = identity
Observations: 100
Priors:
target = mu
HSGP contributions
hsgp(x, m=10, c=2)
cov: ExpQuad
sigma ~ Exponential(lam: 1.0)
ell ~ InverseGamma(alpha: 3.0, beta: 2.0)
Auxiliary parameters
sigma ~ HalfStudentT(nu: 4.0, sigma: 0.2745)
In the model description we can see the contribution of the HSGP term. It consists of two things: the name of the covariance kernel and the priors for its parameters. In this case, it’s an Exponentiated Quadratic covariance kernel with parameters `sigma` (amplitude) and `ell` (lengthscale). The prior for the amplitude is `Exponential(1)` and the prior for the lengthscale is `InverseGamma(3, 2)`.
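As a quick side check, we can look at what this default lengthscale prior implies numerically. The sketch below uses `scipy` (available since PyMC depends on it); `InverseGamma(alpha, beta)` corresponds to `scipy.stats.invgamma(a=alpha, scale=beta)`. Whether the prior refers to the original or a rescaled predictor depends on the `scale` argument discussed later.

```python
# Side check (not part of the model): what does InverseGamma(alpha=3, beta=2) imply for ell?
from scipy import stats

ell_prior = stats.invgamma(a=3, scale=2)
print(ell_prior.mean())              # prior mean of the lengthscale
print(ell_prior.ppf([0.05, 0.95]))   # central 90% prior interval
```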
idata = model.fit(random_seed=121195)
print(idata.sample_stats["diverging"].sum().to_numpy())
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, hsgp(x, m=10, c=2)_sigma, hsgp(x, m=10, c=2)_ell, hsgp(x, m=10, c=2)_weights_raw]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 9 seconds.
There were 251 divergences after tuning. Increase `target_accept` or reparameterize.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
251
={"layout": "constrained"}); az.plot_trace(idata, backend_kwargs
The fit is horrible. To fix that we can use better priors. But before doing that, it’s important to note that HSGP terms have a unique characteristic in that they do not receive priors themselves. Rather, the associated parameters of an HSGP term, such as `sigma` and `ell`, are the ones that are assigned priors. Therefore, we need to assign the HSGP term a dictionary of priors instead of a single prior.
prior_hsgp = {
    "sigma": bmb.Prior("Exponential", lam=2),         # amplitude
    "ell": bmb.Prior("InverseGamma", mu=10, sigma=1)  # lengthscale
}

# This is the dictionary we pass to Bambi
priors = {
    "hsgp(x, m=10, c=2)": prior_hsgp,
    "sigma": bmb.Prior("HalfNormal", sigma=10)
}
model = bmb.Model("y ~ 0 + hsgp(x, m=10, c=2)", df, priors=priors)
model
Formula: y ~ 0 + hsgp(x, m=10, c=2)
Family: gaussian
Link: mu = identity
Observations: 100
Priors:
target = mu
HSGP contributions
hsgp(x, m=10, c=2)
cov: ExpQuad
sigma ~ Exponential(lam: 2.0)
ell ~ InverseGamma(mu: 10.0, sigma: 1.0)
Auxiliary parameters
sigma ~ HalfNormal(sigma: 10.0)
Notice the priors were updated in the model summary. Now we’re ready to fit the model!
idata = model.fit(random_seed=121195)
print(idata.sample_stats["diverging"].sum().to_numpy())
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, hsgp(x, m=10, c=2)_sigma, hsgp(x, m=10, c=2)_ell, hsgp(x, m=10, c=2)_weights_raw]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 22 seconds.
There were 11 divergences after tuning. Increase `target_accept` or reparameterize.
We recommend running at least 4 chains for robust computation of convergence diagnostics
11
={"layout": "constrained"}); az.plot_trace(idata, backend_kwargs
The marginal posteriors look somewhat better, but we still have lots of divergences. What else can we do? Change the parametrization!

The `hsgp()` function has a `centered` argument which defaults to `False`, so Bambi uses a non-centered parametrization by default. But we can actually change that. Let’s try it!
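As a rough sketch of what this changes (following the standard HSGP formulation from the references above; Bambi’s exact internals may differ in the details), the non-centered version samples standardized weights and scales them by the square root of the spectral density, while the centered version puts that scale directly into the prior of the weights:

\[
\text{non-centered:}\quad \tilde{\beta}_j \sim \mathcal{N}(0, 1), \quad \beta_j = \sqrt{S_\theta\!\left(\sqrt{\lambda_j}\right)}\, \tilde{\beta}_j
\qquad\qquad
\text{centered:}\quad \beta_j \sim \mathcal{N}\!\left(0, \sqrt{S_\theta\!\left(\sqrt{\lambda_j}\right)}\right)
\]

This also explains the parameter names in the sampler output: the non-centered model exposes a `_weights_raw` parameter, while the centered model exposes `_weights` directly.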
prior_hsgp = {
    "sigma": bmb.Prior("Exponential", lam=2),         # amplitude
    "ell": bmb.Prior("InverseGamma", mu=10, sigma=1)  # lengthscale
}

# This is the dictionary we pass to Bambi
priors = {
    "hsgp(x, m=10, c=2, centered=True)": prior_hsgp,
    "sigma": bmb.Prior("HalfNormal", sigma=10)
}
model = bmb.Model("y ~ 0 + hsgp(x, m=10, c=2, centered=True)", df, priors=priors)
model
Formula: y ~ 0 + hsgp(x, m=10, c=2, centered=True)
Family: gaussian
Link: mu = identity
Observations: 100
Priors:
target = mu
HSGP contributions
hsgp(x, m=10, c=2, centered=True)
cov: ExpQuad
sigma ~ Exponential(lam: 2.0)
ell ~ InverseGamma(mu: 10.0, sigma: 1.0)
Auxiliary parameters
sigma ~ HalfNormal(sigma: 10.0)
idata = model.fit(random_seed=121195)
print(idata.sample_stats["diverging"].sum().to_numpy())
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, hsgp(x, m=10, c=2, centered=True)_sigma, hsgp(x, m=10, c=2, centered=True)_ell, hsgp(x, m=10, c=2, centered=True)_weights]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 19 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
0
={"layout": "constrained"}); az.plot_trace(idata, backend_kwargs
Awesome! That looks much better now.
We still get all the nice things from Bambi when using GPs. An example of this is the `plot_predictions()` function, which enables us to generate a visualization of the adjusted mean with credible bands automatically.
fig, ax = plt.subplots(figsize=(9, 6))
ax.scatter(df["x"], df["y"], s=30, color="0.5", alpha=0.5)
plot_predictions(model, idata, "x", ax=ax)
ax.set(xlabel="Predictor", ylabel="Observed");
Default computed for conditional variable: x
And on top of that, it’s possible to get draws from the posterior predictive distribution and plot credible bands for it. All we need is the `.predict()` method from the model class.
= pd.DataFrame({"x": np.linspace(0, 50, num=500)})
new_data ="response", data=new_data)
model.predict(idata, kind= idata.posterior_predictive["y"].to_numpy().reshape(2000, 500)
pps = np.quantile(pps, q=(0.025, 0.975), axis=0)
qts
= plt.subplots(figsize=(9, 6))
fig, ax "x"], qts[0], qts[1], color="C0", alpha=0.6)
ax.fill_between(new_data["x"], df["y"], s=30, color="C1", alpha=0.9)
ax.scatter(df[="black", ls="--")
ax.plot(x, f, colorset(xlabel="Predictor", ylabel="Observed");
ax.
= [Line2D([], [], color="black", ls="--"), Line2D([], [], color="C0")]
handles = ["True curve", "Posterior predictive distribution"]
labels ; ax.legend(handles, labels)
How does `hsgp()` work?
`hsgp()` is a transformation that is available in the namespace where the model formula is evaluated. In plain English, `hsgp()` is like a function you can use in your model formulas. You don’t need to worry about the details; Bambi knows how to handle them. But if you still want to see the actual code, you can have a look at the implementation of the `HSGP` class in bambi/transformations.py.
What users do need to care about are the arguments the `hsgp()` transformation supports. There are a bunch of keyword arguments that can be passed after the variable number of non-keyword arguments representing the variables of the HSGP contribution. Below is a brief overview of these arguments and their respective descriptions, followed by a short usage sketch.
- `m`: The number of basis vectors.
- `L`: The boundary of the variable space.
- `c`: The proportion extension factor.
- `by`: This argument specifies the values of a variable used for grouping. It is used to create an HSGP term by group. If left unspecified, the default value is `None`, which means that there is no group variable and all observations belong to the same group.
- `cov`: This argument specifies the name of the covariance function to be used. The default value is `"ExpQuad"`.
- `share_cov`: Determines whether the same covariance function is shared across all groups. This argument is relevant only when `by` is not `None`, and the default value is `True`.
- `scale`: When set to `True`, the predictors are rescaled such that the largest Euclidean distance between two points is 1. This adjustment often improves sampling speed and convergence.
- `iso`: Determines whether to use an isotropic or anisotropic Gaussian Process. With an isotropic GP, the same level of smoothing is applied to all predictors, while an anisotropic GP allows different levels of smoothing for individual predictors. Note that this argument is ignored if only one predictor is provided. The default value is `True`.
- `drop_first`: Whether to exclude the first basis vector or not. The default value is `False`.
- `centered`: Whether to use the centered or the non-centered parametrization. Defaults to `False`.
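As a usage sketch (not run here; it anticipates the `x2`, `fac`, and `data` objects introduced in the next section), several of these arguments can be combined in a single term:

```python
# Illustration only: an HSGP term by group, with group-specific covariance functions
# and the centered parametrization. The covariance kernel could also be selected by
# name with the `cov` argument (the default is "ExpQuad").
bmb.Model(
    "y ~ 0 + hsgp(x2, by=fac, m=12, c=1.5, share_cov=False, centered=True)",
    data
)
```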
The parameters `m`, `L`, and `c` are directly related to the basis vectors of the HSGP approximation. If you want to know more about `m`, `L`, and/or `c`, it’s recommended to have a look at the documentation of the HSGP class in PyMC.
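For intuition, and following the references above (so the exact conventions should be taken with a grain of salt), the eigenvalues and eigenfunctions for a single predictor take the form

\[
\lambda_j = \left(\frac{j\pi}{2L}\right)^{2},
\qquad
\phi_j(x) = \sqrt{\frac{1}{L}}\, \sin\!\left(\sqrt{\lambda_j}\,(x + L)\right).
\]

In practice \(L\) is usually derived from the data as \(c\) times the half-range of the centered predictor, so \(m\) and \(c\) are the values you actually choose.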
So far, we showcased how to use `m`, `c`, and `centered`. In the remainder of this article we’re going to see how `by` and `share_cov` are used when we add a GP contribution by groups.
HSGP by levels of a categorical covariate
In this section we fit a model with an HSGP contribution by levels of a categorical variable. The data was simulated with the `gamSim()` function from the R package {mgcv} by Simon Wood.
data = pd.read_csv("data/gam_data.csv")
data["fac"] = pd.Categorical(data["fac"])
data.head()[["x2", "y", "fac"]]
|   | x2       | y         | fac |
|---|----------|-----------|-----|
| 0 | 0.497183 | 3.085274  | 3   |
| 1 | 0.196003 | -2.250410 | 2   |
| 2 | 0.958474 | 0.070548  | 3   |
| 3 | 0.972759 | -0.230454 | 1   |
| 4 | 0.755836 | 2.173497  | 2   |
Let’s visualize `x2` versus `y` for the different levels in `fac`.
fig, ax = plt.subplots(figsize=(9, 5))
colors = [f"C{i}" for i in pd.Categorical(data["fac"]).codes]
ax.scatter(data["x2"], data["y"], color=colors, alpha=0.6)
ax.set(xlabel="x2", ylabel="y");
We can observe that the relation between `x2` and `y` can be approximated by a smooth non-linear curve for all groups.
Below, we create the model with Bambi. The biggest difference is that we’re passing `by=fac` in the `hsgp()` call. This is all we need to ask Bambi to create multiple GP contribution terms, one per group.
Another trick that was not shown previously is the usage of an alias. `.set_alias()` from the `Model` class allows us to have shorter and more readable names for the components of a model. As you’ll see below, it makes a huge difference when displaying summaries or visualizations for the parameters of the model.
prior_hsgp = {
    "sigma": bmb.Prior("Exponential", lam=3),
    "ell": bmb.Prior("Exponential", lam=3)
}
priors = {
    "hsgp(x2, by=fac, m=12, c=1.5)": prior_hsgp,
    "sigma": bmb.Prior("HalfNormal", sigma=1)
}
model = bmb.Model("y ~ 0 + hsgp(x2, by=fac, m=12, c=1.5)", data, priors=priors)
model.set_alias({"hsgp(x2, by=fac, m=12, c=1.5)": "hsgp"})
model
Formula: y ~ 0 + hsgp(x2, by=fac, m=12, c=1.5)
Family: gaussian
Link: mu = identity
Observations: 300
Priors:
target = mu
HSGP contributions
hsgp(x2, by=fac, m=12, c=1.5)
cov: ExpQuad
sigma ~ Exponential(lam: 3.0)
ell ~ Exponential(lam: 3.0)
Auxiliary parameters
sigma ~ HalfNormal(sigma: 1.0)
model.build()
model.graph()
See how much nicer the names for the HSGP contribution parameters are with the alias!
idata = model.fit(target_accept=0.95, random_seed=121195)
print(idata.sample_stats.diverging.sum().item())
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, hsgp_sigma, hsgp_ell, hsgp_weights_raw]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 34 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
0
az.plot_trace(
    idata,
    var_names=["hsgp_weights", "hsgp_sigma", "hsgp_ell", "sigma"],
    backend_kwargs={"layout": "constrained"}
);
This time we got no divergences, good mixing, and nice convergence on our first try (or perhaps it wasn’t the first try!). One thing that stands out is the shape of the marginal posteriors for some of the beta parameters (the weights of the basis). This may indicate our approximation is using more basis vectors than what’s really needed.
Note: At this point we have used the term ‘basis vector’ several times. This concept is very close to the concept of ‘basis function’. The difference is that a ‘basis vector’ is a ‘basis function’ already evaluated at a set of points. In this case, the set of points is made of the values of the numerical predictor `x2`.
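For example, the number of groups and the number of basis vectors per group can be read off the dimensions of the weights. This is a quick check; the dimension names below follow from the `hsgp` alias set above and may differ if you use other names.

```python
# Quick check of the basis dimensions (dimension names follow the "hsgp" alias)
print(idata.posterior["hsgp_weights"].sizes)
```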
Do you remember how easy it was to use `plot_predictions()` above? Should it be harder now? Well, the answer will surprise you: No!

All we need to do is pass a second variable name, which is mapped to the color in the visualization. Voilà!
fig, ax = plt.subplots(figsize=(9, 5))
colors = [f"C{i}" for i in pd.Categorical(data["fac"]).codes]
ax.scatter(data["x2"], data["y"], color=colors, alpha=0.6)
plot_predictions(model, idata, ["x2", "fac"], ax=ax);
Default computed for conditional variable: x2, fac
We can go one step further and modify the model to use different covariance functions for the different groups. For that purpose, we pass `share_cov=False`. As always, Bambi takes care of all the details.
prior_hsgp = {
    "sigma": bmb.Prior("Exponential", lam=1),
    "ell": bmb.Prior("Exponential", lam=3)
}
priors = {
    "hsgp(x2, by=fac, m=12, c=1.5, share_cov=False)": prior_hsgp,
    "sigma": bmb.Prior("HalfNormal", sigma=1)
}
model = bmb.Model("y ~ 0 + hsgp(x2, by=fac, m=12, c=1.5, share_cov=False)", data, priors=priors)
model.set_alias({"hsgp(x2, by=fac, m=12, c=1.5, share_cov=False)": "hsgp"})
model
Formula: y ~ 0 + hsgp(x2, by=fac, m=12, c=1.5, share_cov=False)
Family: gaussian
Link: mu = identity
Observations: 300
Priors:
target = mu
HSGP contributions
hsgp(x2, by=fac, m=12, c=1.5, share_cov=False)
cov: ExpQuad
sigma ~ Exponential(lam: 1.0)
ell ~ Exponential(lam: 3.0)
Auxiliary parameters
sigma ~ HalfNormal(sigma: 1.0)
model.build()
model.graph()
Have a closer look at the model graph. See that the `hsgp_sigma` and `hsgp_ell` parameters are no longer scalar. There are three of each, one for each group.
idata = model.fit(target_accept=0.95, random_seed=121195)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, hsgp_sigma, hsgp_ell, hsgp_weights_raw]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 55 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
az.plot_trace(
    idata,
    var_names=["hsgp_ell", "hsgp_sigma", "sigma"],
    backend_kwargs={"layout": "constrained"}
);
In fact, we can see that not all the groups have similar posteriors for the covariance function parameters when they are allowed to vary.
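A numerical summary makes the comparison easier (just a convenience; any ArviZ summary call would do).

```python
# Per-group amplitude and lengthscale estimates, with readable names thanks to the alias
az.summary(idata, var_names=["hsgp_sigma", "hsgp_ell"], kind="stats")
```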
Before closing the article, it’s worth looking at a particular but not uncommon pattern when using the HSGP approximation. Let’s have a look at the posterior distributions for the weights of the basis.
=["hsgp_weights"], backend_kwargs={"layout": "constrained"}); az.plot_trace(idata, var_names
Looks like some distributions are extremely flat, and others are extremely tight around zero.
To investigate this further we can manually plot the posterior for the first J basis vectors and see what they look like.
basis_n = 6

fig, axes = plt.subplots(3, 1, figsize=(7, 10))
for i in range(3):
    ax = axes[i]
    values = idata.posterior["hsgp_weights"].sel({"hsgp_by": i + 1})
    for j in range(basis_n):
        az.plot_kde(
            values.sel({"hsgp_weights_dim": j}).to_numpy().flatten(),
            ax=ax,
            plot_kwargs={"color": f"C{j}"}
        );
Indeed, we can see that, at least for the first group, the posteriors of the weights start being too tight around zero once we reach the 6th basis vector. But the posteriors for the weights of the previous basis vectors look nice.

To confirm this, let’s increase the value of `basis_n` to 9 and see what happens.
basis_n = 9

fig, axes = plt.subplots(3, 1, figsize=(7, 10))
for i in range(3):
    ax = axes[i]
    values = idata.posterior["hsgp_weights"].sel({"hsgp_by": i + 1})
    for j in range(basis_n):
        az.plot_kde(
            values.sel({"hsgp_weights_dim": j}).to_numpy().flatten(),
            ax=ax,
            plot_kwargs={"color": f"C{j}"}
        );
Alright, that’s too spiky! Nonetheless, we don’t see that happening for the third group yet, indicating that a higher number of basis vectors is more appropriate for this group.
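If we wanted to act on this, one possible follow-up (a sketch, not run here) would be to refit the by-group model with fewer basis vectors and compare the results, keeping in mind that the third group seems to benefit from more of them.

```python
# A possible follow-up (not run here): fewer basis vectors per group, same priors as before
priors_smaller = {
    "hsgp(x2, by=fac, m=8, c=1.5, share_cov=False)": prior_hsgp,
    "sigma": bmb.Prior("HalfNormal", sigma=1),
}
model_smaller = bmb.Model(
    "y ~ 0 + hsgp(x2, by=fac, m=8, c=1.5, share_cov=False)", data, priors=priors_smaller
)
idata_smaller = model_smaller.fit(target_accept=0.95, random_seed=121195)
```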