formulae is a Python library that implements Wilkinson’s formulas for mixed-effects models.
This package has been written to make it easier to specify models with group effects in Bambi,
a package that makes it easy to work with Bayesian GLMMs in Python, but it could be used
independently as a backend for another library.
The approach in this library is to extend classical statistical formulas in a way similar to the
R package lme4.
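To make the extension concrete: in lme4-style syntax, a formula such as y ~ x + (1|g) combines common terms (an intercept and x) with a group-specific intercept for each level of g. The sketch below is plain Python written only for illustration. It is not formulae's implementation, and the function names common_matrix and group_intercept_matrix are hypothetical. It shows the two kinds of matrices such a formula describes: a common-effects matrix and an indicator matrix for the grouping factor.

```python
# Illustrative sketch only: NOT formulae's implementation.
# The function names below are hypothetical and exist only in this example.


def common_matrix(x):
    """Design matrix for the common terms `1 + x`: an intercept column and x."""
    return [[1.0, float(value)] for value in x]


def group_intercept_matrix(groups):
    """Indicator matrix for a `(1|g)` term.

    One row per observation and one column per level of the grouping factor;
    an entry is 1 when the observation belongs to that level, 0 otherwise.
    """
    levels = sorted(set(groups))
    matrix = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, matrix


# Toy data: four observations that fall into three groups.
x = [0.5, 1.2, 3.1, 2.4]
g = ["a", "b", "a", "c"]

X = common_matrix(x)                   # common (fixed) effects part
levels, Z = group_intercept_matrix(g)  # group-specific (random) effects part

for x_row, z_row in zip(X, Z):
    print(x_row, z_row)
```

Each row of Z contains a single 1, marking the group that observation belongs to. A group-specific slope such as (x|g) would, under the same assumptions, multiply each indicator column by x.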
History, related projects and credits
formulae was built specifically to satisfy the need for a more concise way to specify
mixed-effects models in Bambi. Before formulae, Bambi relied on
Patsy to parse model formulas and build design matrices.
While Patsy is great, flexible, and solid, it does not support formulas with mixed-effects terms.
At that time, Bambi required users to pass random effects as a list separate
from the model formula, which was cumbersome for models with several terms.
It might have been possible to modify Patsy to make it work with mixed-effects formulas,
but a lack of familiarity with the internals of the library, together with the motivation to write
something fully tailored to our needs, prevailed, and formulae development started.
From the very beginning, formulae was built with Bambi's needs in mind. That’s why its main function,
design_matrices(), does not return an object that can be used directly as a design matrix, but
a wrapper for classes that contain the design matrices as well as useful methods and attributes.
These methods and attributes are extensively used within Bambi to build internal objects that give
shape to Bambi models.
formulae was officially incorporated into Bambi a couple of months after its inception. Several
updates, bug fixes, and improvements have taken place since then. While there is still much work to
be done, the current shape of formulae does a good job of meeting Bambi's needs.
Future efforts are more likely to focus on adding new features and making the library
more robust in general, rather than on turning formulae into a high-level library that can be used as
a direct replacement for Patsy.
formulae could not have existed without the following projects, which served as both
inspiration and sources of information:
The work where everything started: Wilkinson, G., & Rogers, C. Symbolic Description of Factorial Models for Analysis of Variance. Journal of the Royal Statistical Society 22, pp. 392–399, 1973.
R: Probably the most popular implementation of Wilkinson’s formulas.
lme4: For the | operator, which extends Wilkinson’s formulas to mixed-effects models, and for helpful information on how to compute mixed-effects matrices.
Patsy: The most widely used implementation of Wilkinson’s formulas in Python. Its implementation helped us to write formulae, especially its module and documentation on evaluation environments.
Formulaic: Another implementation of Wilkinson’s formulas in Python that we came across in the middle of this journey. The backtick operator ` and the quote operator { are taken from this library.
Finally, if you came here because you only need to obtain design matrices for linear models with
fixed effects, you are better off using Patsy or Formulaic. They are much friendlier and go straight to
the point of returning a design matrix. On the other hand, if you are a developer or someone who needs
to automatically generate design matrices for mixed-effects models, give formulae a try and
feel free to reach out to us if you have any questions or suggestions.