Internals#

This reference provides detailed documentation for modules and classes that are important to developers who want to include formulae in their library.

matrices#

These objects are not intended to be used by end users. However, developers working with formulae will need some familiarity with them, especially to take advantage of features such as obtaining a design matrix from an existing design evaluated with new data.

class formulae.matrices.ResponseMatrix(term)[source]#

Representation of the response matrix of a model.

Parameters

term (Response) – The term that represents the response in the model.

design_matrix#

A 2-dimensional numpy array containing the values of the response.

Type

np.array

name#

The name of the response term.

Type

string

kind#

The kind of the response. Can be "numeric", "categoric" or "proportion".

Type

string

as_dataframe()[source]#

Returns self.design_matrix as a pandas.DataFrame.

evaluate(data, env)[source]#

Evaluates self.term inside the data mask provided by data and updates self.design_matrix and self.name.

Parameters
  • data (pandas.DataFrame) – The data frame where variables are taken from.

  • env (Environment) – The environment where values and functions are taken from.

class formulae.matrices.CommonEffectsMatrix(terms)[source]#

Representation of the design matrix for the common effects of a model.

Parameters

terms (list) – A list of Term objects.

design_matrix#

A 2-dimensional numpy array containing the values of the design matrix.

Type

np.array

evaluated#

Indicates if the terms have been evaluated at least once. The terms must have been evaluated before calling self.evaluate_new_data() because we must know the kind of each term to correctly handle the new data passed and the terms here.

Type

bool

terms#

A dictionary that holds all the terms passed at instantiation. The keys are given by the term names.

Type

dict

__getitem__(term)[source]#

Get the sub-matrix that corresponds to a given term.

Parameters

term (string) – The name of the term.

Returns

matrix – A 2-dimensional numpy array that represents the sub-matrix corresponding to the term passed.

Return type

np.array
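
As a rough illustration of how a term name can map to a sub-matrix, the sketch below keeps one column slice per term, in the spirit of the self.slices dictionary that evaluate() populates. The class TinyDesignMatrix and its internals are hypothetical and use plain Python lists instead of numpy arrays; this is not formulae's implementation.

```python
class TinyDesignMatrix:
    """Hypothetical stand-in for a design matrix with named column blocks."""

    def __init__(self, blocks):
        # blocks: dict mapping term name -> list of rows (each row is a list)
        self.slices = {}
        self.rows = None
        start = 0
        for name, block in blocks.items():
            width = len(block[0])
            # Remember which columns belong to this term.
            self.slices[name] = slice(start, start + width)
            start += width
            if self.rows is None:
                self.rows = [list(row) for row in block]
            else:
                for row, extra in zip(self.rows, block):
                    row.extend(extra)

    def __getitem__(self, term):
        if term not in self.slices:
            raise KeyError(f"unknown term: {term}")
        cols = self.slices[term]
        return [row[cols] for row in self.rows]


dm = TinyDesignMatrix({
    "Intercept": [[1.0], [1.0], [1.0]],
    "g": [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
    "x": [[0.5], [1.5], [2.5]],
})
sub = dm["g"]  # the two columns that belong to the term "g"
```

Storing slices rather than copies keeps the full matrix as the single source of truth, and a lookup by name reduces to ordinary column indexing.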

as_dataframe()[source]#

Returns self.design_matrix as a pandas.DataFrame.

evaluate(data, env)[source]#

Obtain design matrix for common effects.

Evaluates self.model inside the data mask provided by data and updates self.design_matrix. This method also sets the values of self.data and self.env.

It also populates the dictionary self.slices.

Parameters
  • data (pandas.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

evaluate_new_data(data)[source]#

Evaluates common terms with new data and returns a new instance of CommonEffectsMatrix.

This method is intended to be used to obtain design matrices for new data and obtain out of sample predictions. Stateful transformations are properly handled if present in any of the terms, which means parameters involved in the transformation are not overwritten with the new data.

Parameters

data (pandas.DataFrame) – The data frame where variables are taken from

Returns

new_instance – A new instance of CommonEffectsMatrix whose design matrix is obtained with the values in the new data set.

Return type

CommonEffectsMatrix

class formulae.matrices.GroupEffectsMatrix(terms)[source]#

Representation of the design matrix for the group specific effects of a model.

The sub-matrix that corresponds to a specific group effect can be accessed by self[term_name], for example self["1|g"].

Parameters

terms (list) – A list of GroupSpecificTerm objects.

design_matrix#

A 2-dimensional numpy array with the values of the design matrix.

Type

np.array

evaluated#

Indicates if the terms have been evaluated at least once. The terms must have been evaluated before calling self.evaluate_new_data() because we must know the kind of each term to correctly handle the new data passed and the terms here.

Type

bool

terms#

A dictionary that holds all the group specific terms. The keys are given by the term names.

Type

dict

__getitem__(term)[source]#

Get the sub-matrix that corresponds to a given term.

Parameters

term (string) – The name of a group specific term.

Returns

matrix – A 2-dimensional numpy array that represents the sub-matrix corresponding to the term passed.

Return type

np.array

evaluate(data, env)[source]#

Evaluate group specific terms.

This evaluates self.terms inside the data mask provided by data and the environment env. It updates self.design_matrix with the result from the evaluation of each term.

This method also sets the values of self.data and self.env. It also populates the dictionary self.terms_info with information related to each term, such as the kind, the columns and rows they occupy in the design matrix and the names of the columns.

Parameters
  • data (pandas.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

evaluate_new_data(data)[source]#

Evaluates group specific terms with new data and returns a new instance of GroupEffectsMatrix.

This method is intended to be used to obtain design matrices for new data and obtain out of sample predictions. Stateful transformations are properly handled if present in any of the group specific terms, which means parameters involved in the transformation are not overwritten with the new data.

Parameters

data (pandas.DataFrame) – The data frame where variables are taken from

Returns

new_instance – A new instance of GroupEffectsMatrix whose design matrix is obtained with the values in the new data set.

Return type

GroupEffectsMatrix

terms#

These are internal components of the model that are not expected to be used by end users. Developers won’t (normally) need to access these objects either. But reading this documentation may help you understand how formulae works, with both its advantages and disadvantages.

class formulae.terms.Variable(name, level=None, is_response=False)[source]#

Representation of a variable in a model Term.

This class and Call are the atomic components of a model term.

Parameters
  • name (string) – The identifier of the variable.

  • level (string) – The level to use as reference. Allows the notation variable["level"] to indicate which event should be modeled as success in binary response models. Can only be used with response terms. Defaults to None.

  • is_response (bool) – Indicates whether this variable represents a response. Defaults to False.

eval_categoric(x, spans_intercept)[source]#

Finishes evaluation of a categoric variable.

Converts the intermediate values in x into a numpy array of shape (n, p), where n is the number of observations and p the number of dummy variables used in the numeric representation of the categorical variable.

Parameters
  • x (np.ndarray or pd.Series) – The intermediate values of the variable.

  • spans_intercept (bool) – Indicates if the encoding of categorical variables spans the intercept or not. Omitted when the variable is numeric.
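
To make the (n, p) shape concrete, here is a minimal sketch of dummy encoding with plain Python lists. It assumes that spans_intercept=True means one column per level (full encoding) and that spans_intercept=False drops the first sorted level as the reference (reduced encoding). The helper encode_categoric is hypothetical and not part of formulae.

```python
def encode_categoric(values, spans_intercept):
    """Return (kept_levels, rows) with one 0/1 column per kept level."""
    levels = sorted(set(values))
    # Assumption: reduced encoding drops the first level as the reference.
    kept = levels if spans_intercept else levels[1:]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


values = ["b", "a", "c", "a"]
full_levels, full = encode_categoric(values, spans_intercept=True)
reduced_levels, reduced = encode_categoric(values, spans_intercept=False)
# full has n = 4 rows and p = 3 columns; reduced has n = 4 rows and p = 2 columns
```

In the full encoding every row sums to one, so the columns together reproduce an intercept column; that is why the reduced form is the natural choice when the model already contains an intercept.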

eval_new_data(data_mask)[source]#

Evaluates the variable with new data.

This method evaluates the variable within a new data mask. If this object is categorical, original encoding is remembered (and checked) when carrying out the new evaluation.

Parameters

data_mask (pd.DataFrame) – The data frame where variables are taken from

Returns

result – The rules for the shape of this array are the rules for self.eval_numeric() and self.eval_categoric(). The first applies for numeric variables, the second for categoric ones.

Return type

np.array

eval_new_data_categoric(x)[source]#

Evaluates the variable with new data when variable is categoric.

This method also checks the levels observed in the new data frame are included within the set of the levels of the original data set. If not, an error is raised.

Parameters

x (np.ndarray or pd.Series) – The intermediate values of the variable.

Returns

result – Numeric numpy array (n, p), where n is the number of observations and p the number of dummy variables used in the numeric representation of the categorical variable.

Return type

np.array
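
The level check described above can be sketched as follows. The function name, its arguments, and the choice of ValueError are assumptions made for illustration; the source only states that an error is raised when the new data contains levels that were not in the original data set.

```python
def encode_new_categoric(new_values, original_levels, kept_levels):
    """Encode new values with the columns remembered from the first pass."""
    unseen = set(new_values) - set(original_levels)
    if unseen:
        # Assumption: the exact exception type used by the library may differ.
        raise ValueError(f"levels not present in the original data: {sorted(unseen)}")
    return [[1 if value == level else 0 for level in kept_levels] for value in new_values]


original_levels = ["a", "b", "c"]
kept_levels = ["b", "c"]  # reduced encoding learned earlier, "a" is the reference

rows = encode_new_categoric(["c", "a"], original_levels, kept_levels)

try:
    encode_new_categoric(["a", "d"], original_levels, kept_levels)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Note that the columns come from the remembered levels, not from the new data, so a new data set that happens to contain a single level still produces a matrix with the original number of columns.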

eval_numeric(x)[source]#

Finishes evaluation of a numeric variable.

Converts the intermediate values in x into a 1d numpy array.

Parameters

x (np.ndarray or pd.Series) – The intermediate values of the variable.

property labels#

Obtain labels of the columns in the design matrix associated with this Variable

set_data(spans_intercept=None)[source]#

Obtains and stores the final data object related to this variable.

Parameters

spans_intercept (bool) – Indicates if the encoding of categorical variables spans the intercept or not. Omitted when the variable is numeric.

set_type(data_mask)[source]#

Determines the type of the variable.

Looks for the name of the variable in data_mask and sets the .kind property to "numeric" or "categoric" depending on the type of the variable. It also stores the result of the intermediate evaluation in self._intermediate_data.

Parameters

data_mask (pd.DataFrame) – The data frame where variables are taken from

property var_names#

Returns the name of the variable as a set.

This is used to determine which variables of the data set being used are actually used in the model. This allows us to subset the original data set and only raise errors regarding missing values when the missingness happens in variables used in the model.

class formulae.terms.Call(call, is_response=False)[source]#

Representation of a call in a model Term.

This class and Variable are the atomic components of a model term.

This object supports stateful transformations defined in formulae.transforms. A transformation of this type defines its parameters the first time it is called, and then can be used to recompute the transformation with memorized parameter values. This behavior is useful when implementing a predict method and using transformations such as center(x) or scale(x). center(x) memorizes the value of the mean, and scale(x) memorizes both the mean and the standard deviation.

Parameters
  • call (formulae.terms.call_resolver.LazyCall) – The call expression returned by the parser.

  • is_response (bool) – Indicates whether this call represents a response. Defaults to False.
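
The idea of a transformation that memorizes its parameters can be sketched in a few lines. The Center class below is a hypothetical stand-in written with plain lists; the actual transformations live in formulae.transforms and may be implemented differently.

```python
class Center:
    """Hypothetical stateful transform that remembers the mean it first saw."""

    def __init__(self):
        self.mean = None

    def __call__(self, x):
        if self.mean is None:
            # First call: learn the parameter from the original data.
            self.mean = sum(x) / len(x)
        # Later calls reuse the stored mean, even when x is new data.
        return [value - self.mean for value in x]


center = Center()
fitted = center([1.0, 2.0, 3.0])  # learns mean = 2.0
new = center([10.0, 20.0])        # centered with the memorized mean
```

Without the memorized mean, the second call would center the new values around their own mean, and predictions for new data would be computed on a different scale than the one used when the model was fitted.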

accept(visitor)[source]#

Accept method called by a visitor.

Visitors are those available in call_utils.py, and are used to work with call terms.

eval_categoric(x, spans_intercept)[source]#

Finishes evaluation of categoric call.

First, it checks whether the intermediate evaluation returned is ordered. If not, it creates a category where the levels are the observed in the variable. They are sorted according to sorted() rules.

Then, it determines the reference level as well as all the other levels. If the variable is a response, the value returned is a dummy with 1s for the reference level and 0s elsewhere. If it is not a response variable, it determines the matrix of dummies according to the levels and the encoding passed.

Parameters
  • x (np.ndarray or pd.Series) – The intermediate values of the variable.

  • spans_intercept (bool) – Indicates if the encoding of categorical variables spans the intercept or not. Omitted when the variable is numeric.

eval_new_data(data_mask)[source]#

Evaluates the function call with new data.

This method evaluates the function call within a new data mask. If the transformation applied is a stateful transformation, it uses the proper object that remembers all parameters or settings that may have been set in a first pass.

Parameters

data_mask (pd.DataFrame) – The data frame where variables are taken from

Returns

result – The rules for the shape of this array are the rules for self.eval_numeric() and self.eval_categoric(). The first applies for numeric calls, the second for categoric ones.

Return type

np.array

eval_new_data_categoric(x)[source]#

Evaluates the call with new data when the result of the call is categoric.

This method also checks the levels observed in the new data frame are included within the set of the levels of the result of the original call. If not, an error is raised.

Parameters

x (np.ndarray or pd.Series) – The intermediate values of the variable.

Returns

result – Numeric numpy array (n, p), where n is the number of observations and p the number of dummy variables used in the numeric representation of the categorical variable.

Return type

np.array

eval_numeric(x)[source]#

Finishes evaluation of a numeric call.

Converts the intermediate values of the call into a numpy array of shape (n, 1), where n is the number of observations. This method is used both in self.set_data and in self.eval_new_data.

Parameters

x (np.ndarray or pd.Series) – The intermediate values resulting from the call.

Returns

result – A dictionary with keys "value" and "kind". The first contains the result of the evaluation, and the latter is equal to "numeric".

Return type

dict

property labels#

Obtain labels of the columns in the design matrix associated with this Call

set_data(spans_intercept=False)[source]#

Finishes the evaluation of the call according to its type.

It does not support multi-level categoric responses yet. If self.is_response is True and the variable is of a categoric type, this method returns a 1d array of 0-1 instead of a matrix. In practice, it just completes the evaluation that started with self.set_type().

Parameters

spans_intercept (bool) – Indicates if the encoding of categorical variables spans the intercept or not. Omitted when the variable is numeric.

set_type(data_mask, env)[source]#

Evaluates function and determines the type of the result of the call.

Evaluates the function call and sets the .kind property to "numeric" or "categoric" depending on the type of the result. It also stores the intermediate result of the evaluation in ._intermediate_data to prevent us from computing the same thing more than once.

Parameters
  • data_mask (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

property var_names#

Returns the names of the variables involved in the call, not including the callee.

This is used to determine which variables of the data set being used are actually used in the model. This allows us to subset the original data set and only raise errors regarding missing values when the missingness happens in variables used in the model.

Uses a visitor of class CallVarsExtractor that walks through the components of the call and returns a list with the name of the variables in the call.

Returns

result – A list of strings with the names of the variables in the call, not including the name of the callee.

Return type

list

class formulae.terms.Term(*components)[source]#

Representation of a model term.

Terms are made of one or more components. Components are instances of Variable or Call. Terms with only one component are known as main effects and terms with more than one component are known as interaction effects. The order of the interaction is given by the number of components in the term.

Parameters

components (Variable or Call) – Atomic components of a term.

data#

The values associated with the term as they go into the design matrix.

Type

np.ndarray

kind#

Indicates the type of the term. Can be one of "numeric", "categoric", or "interaction".

Type

string

name#

The name of the term as it was originally written in the model formula.

Type

string

__add__(other)[source]#

Addition operator. Analogous to set union.

  • "x + x" is equal to just "x"

  • "x + y" is equal to a model with both x and y.

  • "x + (y + z)" adds x to model already containing y and z.

__matmul__(other)[source]#

Simple interaction operator.

This operator is actually invoked as : but internally passed as @ because there is no : operator in Python.

  • "x : x" equals "x"

  • "x : y" is the interaction between "x" and "y"

  • "x:(y:z)" equals "x:y:z"

  • "(x:y):u" equals "x:y:u"

  • "(x:y):(u + v)" equals "x:y:u + x:y:v"

__mul__(other)[source]#

Full interaction operator.

This operator includes both the interaction as well as the main effects involved in the interaction. It is a shortcut for x + y + x:y.

  • "x * x" equals "x"

  • "x * y" equals "x + y + x:y"

  • "x:y * u" equals "x:y + u + x:y:u"

  • "x:y * u:v" equals "x:y + u:v + x:y:u:v"

  • "x:y * (u + v)" equals "x:y + u + v + x:y:u + x:y:v"
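
The rules listed for the : and * operators can be reproduced with a small set-like model. In this sketch a term is a tuple of component names and a model is a list of terms; interact, full_interact and show are hypothetical helpers, not formulae's implementation.

```python
def interact(left, right):
    """':' between two models; each term is a tuple of component names."""
    result = []
    for a in left:
        for b in right:
            # Merge components, dropping repeats so that x:x collapses to x.
            term = a + tuple(c for c in b if c not in a)
            if term not in result:
                result.append(term)
    return result


def full_interact(left, right):
    """'*' keeps the main effects and adds the ':' interaction."""
    result = []
    for term in left + right + interact(left, right):
        if term not in result:
            result.append(term)
    return result


def show(model):
    return " + ".join(":".join(term) for term in model)


x, y, u, v = [("x",)], [("y",)], [("u",)], [("v",)]
xy = interact(x, y)

print(show(full_interact(x, y)))       # x + y + x:y
print(show(full_interact(xy, u + v)))  # x:y + u + v + x:y:u + x:y:v
```

This simplified model treats ("x", "y") and ("y", "x") as different terms, which a real implementation would need to reconcile.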

__or__(other)[source]#

Group-specific operator. Creates a group-specific term.

Intercepts are implicitly added.

  • "x|g" equals "(1|g) + (x|g)"

Distributive over right hand side

  • "(x|g + h)" equals "(1|g) + (1|h) + (x|g) + (x|h)"
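
A minimal sketch of the expansion performed by |, using strings for term names. The function group_specific and its add_intercept flag are hypothetical; they only mirror the two rules stated above (the implicit intercept and the distribution over the right-hand side).

```python
def group_specific(expr_terms, factors, add_intercept=True):
    """Expand 'expr | factors' into a list of group-specific term names."""
    exprs = list(expr_terms)
    if add_intercept and "1" not in exprs:
        # The intercept is added implicitly unless it was removed with "0 +".
        exprs.insert(0, "1")
    # Distribute every expression over every grouping factor.
    return [f"({expr}|{factor})" for expr in exprs for factor in factors]


single = group_specific(["x"], ["g"])
distributed = group_specific(["x"], ["g", "h"])
no_intercept = group_specific(["x"], ["g"], add_intercept=False)

print(" + ".join(distributed))  # (1|g) + (1|h) + (x|g) + (x|h)
```

Passing add_intercept=False corresponds to writing "(0 + x | g)", which the documentation of Model.__or__ describes as equal to "(x|g)".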

__pow__(other)[source]#

Power operator.

It leaves the term as it is. For a power in the math sense do I(x ** n) or {x ** n}.

__sub__(other)[source]#

Subtraction operator. Analogous to set difference.

  • "x - x" returns empty model.

  • "x - y" returns the term "x".

  • "x - (y + z)" returns the term "x".

__truediv__(other)[source]#

Division interaction operator.

  • "x / x" equals just "x"

  • "x / y" equals "x + x:y"

  • "x / z:y" equals "x + x:z:y"

  • "x / (z + y)" equals "x + x:z + x:y"

  • "x:y / u:v" equals "x:y + x:y:u:v"

  • "x:y / (u + v)" equals "x:y + x:y:u + x:y:v"
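
The division rules above follow one pattern: keep the left-hand terms, then nest every right-hand term inside the combination of all left-hand components. The sketch below encodes that pattern with terms as tuples of component names; nest and show are hypothetical helpers, not the library's code.

```python
def nest(left, right):
    """'/' for models given as lists of terms (tuples of component names)."""
    # All components on the left are merged into a single term first.
    joint = ()
    for term in left:
        joint += tuple(c for c in term if c not in joint)
    result = list(left)
    for term in right:
        nested = joint + tuple(c for c in term if c not in joint)
        if nested not in result:
            result.append(nested)
    return result


def show(model):
    return " + ".join(":".join(term) for term in model)


print(show(nest([("x",)], [("z",), ("y",)])))        # x + x:z + x:y
print(show(nest([("x", "y")], [("u",), ("v",)])))    # x:y + x:y:u + x:y:v
```

The same function also reproduces the model-level example "(x + y) / z", which yields "x + y + x:y:z" because both left-hand components are merged before nesting.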

eval_new_data(data)[source]#

Evaluates the term with new data.

Calls .eval_new_data() method on each component in the term and combines the results appropriately.

Parameters

data (pd.DataFrame) – The data frame where variables are taken from

Returns

result – The values resulting from evaluating this term using the new data.

Return type

np.array

get_component(name)[source]#

Returns a component by name.

Parameters

name (string) – The name of the component to return.

Returns

component – The component with name name.

Return type

Variable or Call

property labels#

Obtain labels of the columns in the design matrix associated with this Term

property levels#

Obtain levels of the columns in the design matrix associated with this Term

It is like .labels, without the name of the terms

set_data(spans_intercept)[source]#

Obtains and stores the final data object related to this term.

Calls .set_data() method on each component in the term. Then, it uses the .data attribute on each of them to build self.data and self.metadata.

Parameters

spans_intercept (bool) – Indicates whether full or reduced encoding is used when the type of the variable is categoric.

set_type(data, env)[source]#

Set type of the components in the term.

Calls .set_type() method on each component in the term. For components of class Variable it only passes the data mask. For Call objects it also passes the evaluation environment.

Parameters
  • data (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

property spans_intercept#

Does this term span the intercept?

True if all the components span the intercept

property var_names#

Returns the name of the variables in the term as a set.

Loops through each component and updates the set with the .var_names of each component.

Returns

var_names – The names of the variables involved in the term.

Return type

set

class formulae.terms.GroupSpecificTerm(expr, factor)[source]#

Representation of a group specific term.

Group specific terms are of the form (expr | factor). The expression expr is evaluated as a model formula with only common effects and produces a model matrix following the rules for common terms. factor is inspired by factors in R, but here it is evaluated as an ordered pandas.CategoricalDtype object.

The operator | works as in R package lme4. As its authors say: “One way to think about the vertical bar operator is as a special kind of interaction between the model matrix and the grouping factor. This interaction ensures that the columns of the model matrix have different effects for each level of the grouping factor”

Parameters
  • expr (Intercept or Term) – The term for which we want to have a group specific term.

  • factor (Term) – The factor that determines the groups in the group specific term.

data#

The values associated with the term as they go into the design matrix.

Type

np.ndarray

metadata#

Metadata associated with the term. If "numeric" or "categoric" it holds additional information in the component .data attribute. If "interaction", the keys are the name of the components and the values are dictionaries holding the metadata.

Type

dict

kind#

Indicates the type of the term. Can be one of "numeric", "categoric", or "interaction".

Type

string

eval_new_data(data)[source]#

Evaluates the term with new data.

Converts the variable in factor to the type remembered from the first evaluation and produces the design matrix for this grouping, calls .eval_new_data() on self.expr to obtain the design matrix for the expr side, then computes the design matrix corresponding to the group specific effect.

Parameters

data (pd.DataFrame) – The data frame where variables are taken from.

Returns

Zi

Return type

np.ndarray
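
The "special kind of interaction" between the model matrix and the grouping factor can be pictured as a row-wise product of the expr columns with the indicator columns of the groups. The sketch below uses plain lists; the function group_design and the column ordering it produces are assumptions for illustration and may not match the layout formulae uses.

```python
def group_design(expr_rows, groups, levels):
    """Row-wise product of expr columns with group indicator columns."""
    out = []
    for row, group in zip(expr_rows, groups):
        indicators = [1.0 if group == level else 0.0 for level in levels]
        # Assumed column order: for each expr column, one column per level.
        out.append([value * ind for value in row for ind in indicators])
    return out


expr_rows = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]  # columns: intercept, x
groups = ["a", "b", "a"]                          # grouping factor per row
Z = group_design(expr_rows, groups, levels=["a", "b"])
# Each row is non-zero only in the columns of its own group.
```

With two expr columns and two group levels the result has four columns, so every level of the grouping factor gets its own copy of each effect.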

property name#

Obtain string representation of the name of the term.

Returns

name – The name of the term, such as 1|g or var|g.

Return type

str

property var_names#

Returns the name of the variables in the term as a set.

Obtains both the variables in the expr as well as the variables in factor.

Returns

var_names – The names of the variables involved in the term.

Return type

set

class formulae.terms.Intercept[source]#

Internal representation of a model intercept.

__add__(other)[source]#

Addition operator.

Generally this operator is used to explicitly add an intercept to a model. There may be cases where the result is not a Model, or does not contain an intercept.

  • "1 + 0" and "1 + (-1)" return an empty model.

  • "1 + 1" returns a single intercept.

  • "1 + x" and "1 + (x|g)" return a model with both the term and the intercept.

  • "1 + (x + y)" adds an intercept to the model given by x and y.

__or__(other)[source]#

Group-specific interaction-like operator. Creates a group-specific intercept.

This operation is usually surrounded by parentheses. They are not strictly part of the operator, but they are always used because | has lower precedence than any of the other operators except ~.

This operator is distributed over the right-hand side, which means (1|g + h) is equivalent to (1|g) + (1|h).

__sub__(other)[source]#

Subtraction operator.

This operator removes an intercept from a model if the given model has an intercept.

  • "1 - 1" returns an empty model.

  • "1 - 0" and "1 - (-1)" return an intercept.

  • "1 - (x + y)" returns the model given by x and y unchanged.

  • "1 - (1 + x + y)" returns the model given by x and y, removing the intercept.

eval_new_data(data)[source]#

Returns data for a new intercept.

The length of the new intercept is given by the number of rows in data.

set_data(encoding)[source]#

Creates data for the intercept.

It sets self.data equal to a numpy array of ones of shape (self.len, 1).

set_type(data, env)[source]#

Sets length of the intercept.

property var_names#

Returns empty set, no variables are used in the intercept.

class formulae.terms.NegatedIntercept[source]#

Internal representation of the opposite of a model intercept.

This object is created whenever we use "0" or "-1" in a model formula. It is not expected to appear in a final model. It’s here to help us make operations using the Intercept and deciding when to keep it and when to drop it.

__add__(other)[source]#

Addition operator.

Generally this operator is used to explicitly remove an intercept from a model.

  • "0 + 1" returns an empty model.

  • "0 + 0" returns a negated intercept

  • "0 + x" returns a model that includes the negated intercept.

  • "0 + (x + y)" adds the negated intercept to the model given by x and y.

Even if the final result contains the negated intercept, for example if we do something like "y ~ 0 + x + y + 0", the Model that is obtained removes any negated intercepts that may have been left. They just don’t make sense in a model.

class formulae.terms.Response(term)[source]#

Representation of a response term.

It is mostly a wrapper around Term.

Parameters

term (Term) – The term we want to take as response in the model. Must contain only one component.

__add__(other)[source]#

Modelled as operator.

The operator is ~, but since it is not an operator in Python, we internally replace it with +. It means the LHS is taken as the response, and the RHS as the predictor.

set_data()[source]#

Set data of the response term.

set_type(data, env)[source]#

Set type of the response term.

property var_names#

Returns the name of the variables in the response as a set.

class formulae.terms.Model(*terms, response=None)[source]#

Representation of a model.

Parameters
  • terms (Term) – This object can be instantiated with one or many terms.

  • response (Response) – The response term. Defaults to None which means there is no response.

__add__(other)[source]#

Addition operator. Analogous to set union.

Adds terms to the model and returns the model.

Returns

self – The same model object with the added term(s).

Return type

Model

__matmul__(other)[source]#

Simple interaction operator.

  • "(x + y) : (u + v)" equals "x:u + x:v + y:u + y:v".

  • "(x + y) : u" equals "x:u + y:u".

  • "(x + y) : f(u)" equals "x:f(u) + y:f(u)".

Returns

model – A new instance of the model with all the interaction terms computed.

Return type

Model

__mul__(other)[source]#

Full interaction operator.

  • "(x + y) * (u + v)" equals "x + y + u + v + x:u + x:v + y:u + y:v".

  • "(x + y) * u" equals "x + y + u + x:u + y:u".

Returns

model – A new instance of the model with all the interaction terms computed.

Return type

Model

__or__(other)[source]#

Group specific term operator.

Only models such as "0 + x" arrive here.

  • "(0 + x | g)" equals "(x|g)".

  • "(0 + x | g + y)" equals "(x|g) + (x|y)".

There are several edge cases to handle here. See in-line comments.

Returns

model – A new instance of the model with all the terms computed.

Return type

Model

__pow__(other)[source]#

Power of a set made of Term

Computes all interactions up to order n between the terms in the set.

  • "(x + y + z) ** 2" equals "x + y + z + x:y + x:z + y:z".

Returns

model – A new instance of the model with all the terms computed.

Return type

Model
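
The expansion performed by the power operator amounts to enumerating every combination of terms up to the requested order. A minimal sketch with itertools.combinations follows; the function power is a hypothetical helper that works on term names rather than on Term objects.

```python
from itertools import combinations


def power(terms, n):
    """All interactions of the given terms up to order n, as a formula string."""
    result = []
    for order in range(1, n + 1):
        # Order 1 gives the main effects, order 2 the pairwise interactions, etc.
        for combo in combinations(terms, order):
            result.append(":".join(combo))
    return " + ".join(result)


expanded = power(["x", "y", "z"], 2)
print(expanded)  # x + y + z + x:y + x:z + y:z
```

Because combinations never repeats an element, a term is never interacted with itself, which is consistent with "x : x" being equal to "x".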

__sub__(other)[source]#

Subtraction operator. Analogous to set difference.

  • "(x + y) - (x + u)" equals "y".

  • "(x + y) - x" equals "y".

  • "(x + y + (1 | g)) - (1 | g)" equals "x + y".

Returns

self – The same model object with the removed term(s).

Return type

Model

__truediv__(other)[source]#

Division interaction operator.

  • "(x + y) / z" equals "x + y + x:y:z".

  • "(x + y) / (u + v)" equals "x + y + x:y:u + x:y:v".

Returns

model – A new instance of the model with all the terms computed.

Return type

Model

add_response(term)[source]#

Add response term to model description.

This method is called when something like "y ~ x + z" appears in a model formula.

This method is called via special methods such as Response.__add__().

Returns

self – The same model object but now with a response term.

Return type

Model

add_term(term)[source]#

Add term to model description.

The term added can be of class Intercept, Term, or GroupSpecificTerm. It appends the new term object to the list of common terms or group specific terms as appropriate.

This method is called via special methods such as __add__().

Returns

self – The same model object but now containing the new term.

Return type

Model

property common_components#

Components in common terms in the model.

Returns

components – A list containing all components from common terms in the model.

Return type

list

eval(data, env)[source]#

Evaluates terms in the model.

Parameters
  • data (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

set_types(data, env)[source]#

Set the type of the terms in the model.

Calls .set_type() method on each term in the model.

Parameters
  • data (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

property terms#

Terms in the model.

Returns

terms – A list containing both common and group specific terms.

Return type

list

property var_names#

Get the name of the variables in the model.

Returns

var_names – The names of all variables in the model.

Return type

set

call_resolver#

class formulae.terms.call_resolver.LazyValue(value, lexeme)[source]#

Lazy representation of a value in Python.

This object holds a value (a string or a number). It returns its value only when it is evaluated via .eval().

Parameters
  • value (string or numeric) – The value it holds.

  • lexeme (string) – The string that generated the value it holds

eval(*args, **kwargs)[source]#

Evaluates the value.

Simply returns the value. Other arguments are ignored but required for consistency among all the lazy objects.

Returns

The value this object represents.

Return type

value

class formulae.terms.call_resolver.LazyVariable(name)[source]#

Lazy variable name.

The variable represented in this object does not hold any value until it is explicitly evaluated within a data mask and an evaluation environment.

Parameters

name (str) – The name of the variable it represents.

eval(data_mask, env)[source]#

Evaluates variable.

First it looks for the variable in data_mask. If not found there, it looks in env. Then it just returns the value the variable represents in either the data mask or the evaluation environment.

Parameters
  • data_mask (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

Returns

The value represented by this name in either the data mask or the environment.

Return type

result
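
The lookup order described for LazyVariable.eval() can be sketched with two dictionaries standing in for the data frame and the evaluation environment. The function resolve_name and the NameError it raises are illustrative assumptions; the source does not say what happens when the name is found in neither place.

```python
def resolve_name(name, data_mask, env):
    """Look a name up in the data mask first, then in the environment."""
    if name in data_mask:
        return data_mask[name]
    if name in env:
        return env[name]
    # Assumption: the real behavior for unknown names is not documented here.
    raise NameError(f"name '{name}' is not defined")


data_mask = {"x": [1, 2, 3]}            # stands in for the columns of a data frame
env = {"x": "shadowed", "offset": 10}   # stands in for the evaluation environment

from_data = resolve_name("x", data_mask, env)       # the data mask wins
from_env = resolve_name("offset", data_mask, env)   # falls back to the environment
```

Giving the data mask priority means a column named x always takes precedence over a variable named x in the calling scope, which keeps formulas predictable.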

class formulae.terms.call_resolver.LazyOperator(op, *args)[source]#

Unary and Binary lazy operators.

Operations like a + b are converted into a LazyOperator that is resolved when you explicitly evaluate it.

Parameters
  • op (builtin_function_or_method) – An operator in the operator built-in module. It can be one of add, pos, sub, neg, pow, mul, and truediv.

  • args – One or two lazy instances.

eval(data_mask, env)[source]#

Evaluates the operation.

Evaluates the arguments involved in the operation, calls the Python operator, and returns the result.

Parameters
  • data_mask (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

Returns

The value obtained from the operator call.

Return type

result

class formulae.terms.call_resolver.LazyCall(callee, args, kwargs)[source]#

Lazy representation of a function call.

This class represents a function that can be a stateful transform (a function with memory) whose arguments can also be stateful transforms.

To evaluate these functions we don’t create a string representing Python code and let eval() run it. We take care of all the steps of the evaluation to make sure all the possibly nested stateful transformations are handled correctly.

Parameters
  • callee (string) – The name of the function

  • args (list) – A list of lazy objects that are evaluated when calling the function this object represents.

  • kwargs (dict) – A dictionary of named arguments that are evaluated when calling the function this object represents.

eval(data_mask, env)[source]#

Evaluate the call.

This method first evaluates all its arguments, which are themselves lazy objects, and then proceeds to evaluate the call it represents.

Parameters
  • data_mask (pd.DataFrame) – The data frame where variables are taken from

  • env (Environment) – The environment where values and functions are taken from.

Returns

The result of the call evaluation.

Return type

result
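
To show how lazy objects compose, the sketch below builds a tiny expression tree and evaluates it without generating Python source or calling eval(). The classes Value, Name, Op and FnCall are hypothetical simplifications of the lazy classes documented in this section; they use dictionaries and scalars instead of a data frame and an Environment.

```python
import math
import operator


class Value:
    def __init__(self, value):
        self.value = value

    def eval(self, data_mask, env):
        # Extra arguments are ignored, kept only for a uniform interface.
        return self.value


class Name:
    def __init__(self, name):
        self.name = name

    def eval(self, data_mask, env):
        return data_mask[self.name] if self.name in data_mask else env[self.name]


class Op:
    def __init__(self, op, *args):
        self.op, self.args = op, args

    def eval(self, data_mask, env):
        return self.op(*(arg.eval(data_mask, env) for arg in self.args))


class FnCall:
    def __init__(self, callee, args, kwargs):
        self.callee, self.args, self.kwargs = callee, args, kwargs

    def eval(self, data_mask, env):
        # Arguments are lazy objects themselves, so evaluate them first.
        args = [arg.eval(data_mask, env) for arg in self.args]
        kwargs = {key: val.eval(data_mask, env) for key, val in self.kwargs.items()}
        return env[self.callee](*args, **kwargs)


# Represents round(sqrt(x + 1), ndigits=2); nothing runs until .eval() is called.
inner = FnCall("sqrt", [Op(operator.add, Name("x"), Value(1))], {})
tree = FnCall("round", [inner], {"ndigits": Value(2)})
result = tree.eval({"x": 7.0}, {"sqrt": math.sqrt, "round": round})
```

Because each node controls its own evaluation, a node that wraps a stateful transformation could reuse stored parameters on later evaluations while its neighbors are recomputed from the new data.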