mcglm library

mcglm.mcglm module

class mcglm.mcglm.MCGLM(endog, exog, z, link=None, variance=None, offset=None, ntrial=None, power=None, power_fixed=None, maxiter=50, tol=0.001, tuning=1, weights=None)

Bases: MCGLMMean, MCGLMVariance

MCGLM class that implements multivariate covariance generalized linear models (Bonat and Jørgensen, 2015).

It extends GLMs to multiple responses and dependent observations by specifying only second-moment (mean and covariance) assumptions.

Parameters:

endog (array_like) – 1d array with the endogenous response variable. For multiple responses, pass the responses as a list.
exog (array_like) – The exogenous (design) matrix as a NumPy-like array. Since the library does not add an intercept by default, the user must add it. For multiple responses, pass the design matrices as a Python list.
z (array_like) – List of component matrices of the matrix linear predictor for the covariance structure.
link (array_like, string or None) – Specification for the link function. The MCGLM library implements the following options: identity, logit, power, log, probit, cauchy, cloglog, loglog, negativebinomial. If None, the library uses the identity link. For multiple responses, pass the values as a list.
variance (array_like, string or None) – Specification for the variance function. The MCGLM library implements the following options: constant, tweedie, binomialP, binomialPQ, geom_tweedie, poisson_tweedie. If None, the library uses the constant variance function. For multiple responses, pass the values as a list.
offset (array_like or None) – Offset for continuous or count responses. For multiple responses, pass the values as a list.
ntrial (array_like or None) – Number of trials; required for binomial responses. For multiple responses, pass the values as a list.
power (float, array_like or None) – The power parameter, key for Tweedie-like distributions, as it defines the overall behavior of the model. It applies only to the tweedie, geom_tweedie, and poisson_tweedie variance functions. For multiple responses, pass the values as a list.
power_fixed (array_like or None) – Whether the power parameter is held fixed. The library can also estimate the power parameter when power_fixed is False. For multiple responses, pass the values as a list.
maxiter (int or None) – The maximum number of optimization cycles. Defaults to 50.
tol (float or None) – The minimum absolute change in the parameters required to run another optimization cycle. If the absolute update is lower than tol, the optimization stops. Defaults to 0.001.
tuning (float or None) – The optimization combines two second-order algorithms. The tuning parameter scales the step size of each update, acting together with the second-order derivatives. Defaults to 1.
weights (array_like or None) – Sample weights.

Examples

>>> import statsmodels.api as sm
>>> from mcglm import MCGLM, mc_id
>>> data = sm.datasets.scotland.load()
>>> data.exog = sm.add_constant(data.exog)

>>> model = MCGLM(data.endog, data.exog, z=[mc_id(data.exog)],
...               link="log", variance="tweedie",
...               power=2, power_fixed=False)

>>> model_results = model.fit()
>>> model_results.mu
>>> model_results.pearson_residuals
>>> model_results.aic
>>> model_results.bic
>>> model_results.loglikelihood

Notes

MCGLM is a recent model that provides a framework for fitting multiple non-Gaussian responses, with dependent or independent observations, based only on second-moment assumptions. When instantiating an MCGLM object, the user must specify attributes such as the link and variance functions and the z matrices, which drive the overall behavior of the model. For more details, see the reference article (Bonat and Jørgensen, 2015) and the library documentation.

property df_model: Calculates the degree of freedom for the model.

property df_resid: Calculates the degree of freedom for the model residuals.

fit()[source]: The interface to run the inference for MCGLM statistical model.

class mcglm.mcglm.MCGLMParameters

Bases: object

Under the MCGLM specification, which relies on frequentist inference, the parameter estimators converge asymptotically to a Gaussian distribution whose mean is the true parameter vector and whose covariance matrix is the inverse of the Godambe information matrix. This property enables inference on the parameters, such as hypothesis tests and confidence intervals.

This class implements the methods related to this property.
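As a minimal sketch of that inference step, assume the sensitivity matrix S and the variability matrix V of the estimating functions are already available. In the McGLM framework the Godambe information is J = S^T V^{-1} S, so its inverse is the sandwich S^{-1} V S^{-T}. The helper names godambe_inverse and wald_summary are hypothetical and not part of the library; generate_var_variability and the mcglm.utils sandwich helpers may compute these pieces differently.

```python
import numpy as np
from statistics import NormalDist


def godambe_inverse(S, V):
    """Inverse Godambe information S^{-1} V S^{-T} (the sandwich covariance)."""
    S_inv = np.linalg.inv(S)
    return S_inv @ V @ S_inv.T


def wald_summary(theta, S, V, alpha=0.05):
    """Standard errors, Wald confidence limits and two-sided p-values."""
    theta = np.asarray(theta, dtype=float)
    se = np.sqrt(np.diag(godambe_inverse(S, V)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_values = np.array([2 * (1 - NormalDist().cdf(abs(t))) for t in theta / se])
    return se, theta - z_crit * se, theta + z_crit * se, p_values


# Illustrative inputs only: when S = -V the information identity holds.
S = -np.diag([4.0, 16.0])
V = np.diag([4.0, 16.0])
se, lower, upper, p_values = wald_summary([1.0, 0.5], S, V)
```

When S = -V, the sandwich collapses to V^{-1}, which is the familiar likelihood-based covariance; otherwise the sandwich form keeps the intervals valid under misspecified second moments.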

generate_var_variability(w, c_inv, c_val, c_comp)

class mcglm.mcglm.MCGLMResults(normalized_var_cov, nobs, n_targets, y_names, regression, dispersion, n_iter, residue, rho, tau, power, link, variance, power_fixed, p_log_likelihood, aic, bic, df_resid, df_model, mu, y_values, X, ntrial)

Bases: GLMResults

Class for generating and handling the results of an MCGLM fit. The main output is produced by the summary() method, in the classic statsmodels format. The user can also access attributes such as “aic”, “bic” and “loglikelihood”.

Parameters: GLMResults – statsmodels class for presenting GLM results.

property aic: Akaike Information Criterion -2 * llf + 2 * (df_model + 1)

anova(indexes_covariates=[[1, 2, 2, 2, 2]], covariate_name=[['x1', 'x2']])

property bic

Bayes Information Criterion

deviance - df_resid * log(nobs)

Warning

The current definition is based on the deviance rather than the log-likelihood. This is not consistent with the AIC definition, and after 0.13 both will make use of the log-likelihood definition.

Notes

The log-likelihood version is defined as -2 * llf + (df_model + 1) * log(nobs)
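The information criteria above are plain arithmetic on the fitted quantities. The following sketch restates the three formulas as standalone Python functions; the function names and the input values are made up for illustration and are not taken from any fitted model.

```python
import math


def aic(llf, df_model):
    # Akaike Information Criterion: -2 * llf + 2 * (df_model + 1)
    return -2 * llf + 2 * (df_model + 1)


def bic_deviance(deviance, df_resid, nobs):
    # Current (deviance-based) definition: deviance - df_resid * log(nobs)
    return deviance - df_resid * math.log(nobs)


def bic_llf(llf, df_model, nobs):
    # Log-likelihood version: -2 * llf + (df_model + 1) * log(nobs)
    return -2 * llf + (df_model + 1) * math.log(nobs)


# Made-up values, not taken from a fitted model.
print(aic(llf=-100.0, df_model=3))  # 208.0
print(bic_llf(llf=-100.0, df_model=3, nobs=32))
```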

property bse: The standard errors of the parameter estimates.

property loglikelihood

property mu: See GLM docstring.

property pearson_residuals: Pearson residuals. The Pearson residuals are defined as (endog - mu)/sqrt(VAR(mu)) where VAR is the distribution specific variance function. See statsmodels.families.family and statsmodels.families.varfuncs for more information.

property pvalues: The two-tailed p values for the t-stats of the params.

summary(yname=None, xname=None, title=None, alpha=0.05): Generates a summary report in the style of the classic statsmodels output, showing all estimated parameters for each response.

property tvalues: Return the t-statistic for a given parameter estimate.

property vcov

Submodules

mcglm.dependencies module

An extension of the MCGLM library that provides three builders for the matrix linear predictor: mc_id, mc_ma, and mc_mixed.

mcglm.dependencies.mc_id(data=None): Returns a NumPy diagonal matrix whose dimension equals the length (number of rows) of data.

mcglm.dependencies.mc_ma(id=None, time=None, data=None, order=1)

mc_ma returns the Z components of the matrix linear predictor associated with moving average (MA) models (Feller, W. (1957). An Introduction to Probability Theory and Its Applications. Wiley, New York, 2nd edition).

To illustrate, for three rows, an MA(1) produces the following dependence matrix:

[[0, 1, 0], [1, 0, 1], [0, 1, 0]]

An MA(2) produces:

[[0, 0, 1], [0, 0, 0], [1, 0, 0]]
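Both matrices are symmetric bands with ones where the time lag |i - j| equals the order. A minimal NumPy sketch that reproduces them, assuming a single subject observed at equally spaced times; ma_component is a hypothetical helper and ignores the id/time grouping that mc_ma accepts.

```python
import numpy as np


def ma_component(n, order=1):
    """Symmetric band matrix with ones where |i - j| == order."""
    # np.eye(n, k=...) places ones on the k-th super- (k > 0) or sub- (k < 0) diagonal.
    return np.eye(n, k=order) + np.eye(n, k=-order)


ma1 = ma_component(3, order=1)  # matches the MA(1) matrix above
ma2 = ma_component(3, order=2)  # matches the MA(2) matrix above
```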

mcglm.dependencies.mc_mixed(data=None, formula=None): mc_mixed returns the components of the matrix linear predictor associated with mixed models (Demidenko, E. (2013). Mixed Models: Theory and Applications with R. John Wiley & Sons. doi:10.1002/0471728438).
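For intuition, the standard random-intercept construction builds the component as Z = G G^T, where G is the group-indicator matrix, so two observations share covariance only when they belong to the same group. This is an assumption about the kind of matrix mc_mixed yields for such a formula; random_intercept_component is a hypothetical helper, not the library's code.

```python
import numpy as np


def random_intercept_component(groups):
    """Z = G @ G.T, where G is the indicator matrix of group membership."""
    groups = np.asarray(groups)
    levels = np.unique(groups)
    # G[i, k] = 1 when observation i belongs to level k
    G = (groups[:, None] == levels[None, :]).astype(float)
    return G @ G.T


z_group = random_intercept_component(["a", "a", "b"])
```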

mcglm.mcglmcattr module

class mcglm.mcglmcattr.MCGLMCAttributes

Bases: object

The class “MCGLMCAttributes” computes every operation involving the C matrix, used throughout the mean and variance adjustments. It has two interfaces, “c_inverse” and “c_complete”, one for each of the two adjustment steps of MCGLM.

The interface “c_inverse” builds only the inverse of C, while “c_complete” also returns its derivatives and other components. Quasi-likelihood estimation of the mean needs only the inverse of C, so c_inverse saves computational resources by avoiding unnecessary operations in the mean adjustment step.
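For a single response, the McGLM covariance takes the form C = V^{1/2} Ω V^{1/2}, with V = diag(mu ** power) for a Tweedie variance and Ω = Σ tau_i Z_i. The sketch below assumes exactly that case: the cross-response correlation rho is omitted, and c_matrix is a hypothetical helper rather than the library's method.

```python
import numpy as np


def c_matrix(mu, power, tau, z):
    """C = V^{1/2} Omega V^{1/2}, with V = diag(mu ** power), Omega = sum(tau_i * Z_i)."""
    mu = np.asarray(mu, dtype=float)
    omega = sum(t * np.asarray(zi, dtype=float) for t, zi in zip(tau, z))
    v_sqrt = np.diag(np.sqrt(mu ** power))
    return v_sqrt @ omega @ v_sqrt


C = c_matrix([1.0, 2.0, 3.0], power=1, tau=[2.0], z=[np.eye(3)])
C_inv = np.linalg.inv(C)  # the mean (quasi-score) step needs only this
```

The variance step additionally needs dC/dtau_i = V^{1/2} Z_i V^{1/2} and related products, which is the extra work that separates a complete C computation from the inverse-only one.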

c_complete(mu, power, rho, tau)

Generates the complete list of C components, specifically for the variance step. It builds the sigma and omega matrices from the list of each parameter.

Parameters:

mu (array_like) – A vector with mean parameters.
power (float) – Power parameter.
rho (float) – Correlation parameter.
tau (float) – Dispersion parameter.

Returns:

A tuple with every component of C.

Return type:

tuple

c_inverse(mu, power, rho, tau, full_response=False)

Generates only the inverse of the C matrix, specifically for the mean step. It builds the sigma and omega matrices from the list of each parameter.

Parameters:

mu (array_like) – A vector with mean parameters.
power (float) – Power parameter.
rho (float) – Correlation parameter.
tau (float) – Dispersion parameter.

Returns:

The inverse of the C matrix and its components.

Return type:

array_like or tuple

mcglm.mcglmmean module

class mcglm.mcglmmean.MCGLMMean

Bases: MCGLMCAttributes

MCGLMMean is the class for the first-moment (mean) adjustment within MCGLM inference. It handles the whole lifecycle of the mean step, from mu and its derivatives to the quasi-likelihood calculations. This class has two interfaces: ‘calculate_mean_features’, which calculates the mu attributes, and ‘update_beta’, which applies quasi-likelihood estimation and returns a new beta. This class implements the quasi-score estimating equation (Wedderburn, 1974) and the second-order optimization algorithm (Jennrich, 1969; Widyaningsih et al., 2017).

References

Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3):439–447.

Jennrich, R. I. (1969). A Newton-Raphson algorithm for maximum likelihood factor analysis. Psychometrika, 34.

Widyaningsih, P., Saputro, D. and Putri, A. (2017). Fisher scoring method for parameter estimation of geographically weighted ordinal logistic regression (GWOLR) model. Journal of Physics: Conference Series, 855:012060.

calculate_mean_features(link, beta, X, offset)

Base method to calculate every attribute related to the mean.

Parameters:

link (str) – Link function.
beta (array-like) – Regression Parameters.
X (array-like) – Matrix with covariates.
offset (int, optional) – Offset added to the linear predictor. Defaults to 0.

Returns:

Mean attributes: the raw mean and its derivatives.

Return type:

tuple
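As an illustration of the "raw mean and its derivatives", the sketch below assumes the log link: mu = exp(X beta + offset), and the derivative matrix D = d mu / d beta scales each row of X by d mu_i / d eta_i = mu_i. log_link_mean_features is a hypothetical helper; the library's actual return layout may differ.

```python
import numpy as np


def log_link_mean_features(beta, X, offset=0.0):
    """mu = exp(X @ beta + offset) and D = d mu / d beta = diag(mu) @ X."""
    X = np.asarray(X, dtype=float)
    eta = X @ np.asarray(beta, dtype=float) + offset
    mu = np.exp(eta)
    D = mu[:, None] * X  # row i scaled by d mu_i / d eta_i = mu_i
    return mu, D


mu, D = log_link_mean_features([0.0, np.log(2.0)], [[1.0, 0.0], [1.0, 1.0]])
```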

update_beta(beta, W, power, rho, tau)

The method update_beta takes the current beta and uses the quasi-likelihood estimator to compute the next regression parameters.

Parameters:

beta (array-like) – Regression Parameters.
W (array-like) – Weight matrix
power (float) – Power parameter
rho (float) – Correlation parameters
tau (float) – Dispersion parameters.

Returns:

A tuple with the new regression parameters, the quasi-score, the sensitivity matrix and the variability matrix.

Return type:

tuple
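The quasi-score update can be sketched as one Fisher scoring step: with psi = D^T C^{-1} (y - mu) and sensitivity S = -D^T C^{-1} D, the new estimate is beta - S^{-1} psi. This assumes D and C^{-1} were computed from the current beta and ignores sample weights; quasi_score_step is a hypothetical helper, not update_beta itself.

```python
import numpy as np


def quasi_score_step(beta, y, mu, D, C_inv):
    """One Fisher scoring update for the quasi-score psi = D^T C^{-1} (y - mu)."""
    psi = D.T @ C_inv @ (y - mu)        # quasi-score
    sensitivity = -D.T @ C_inv @ D      # expected derivative of psi
    return beta - np.linalg.solve(sensitivity, psi), psi, sensitivity


# Identity link with C = I: mu = X @ beta and D = X.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=20)
beta0 = np.zeros(2)
beta1, psi, sens = quasi_score_step(beta0, y, X @ beta0, X, np.eye(20))
```

In this linear case a single step from any start lands on the least-squares solution, which is a convenient sanity check; for nonlinear links the step is repeated until the change falls below tol.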

mcglm.mcglmvariance module

class mcglm.mcglmvariance.MCGLMVariance

Bases: MCGLMCAttributes

The MCGLMVariance class handles the second optimization step of MCGLM, which fits the second-moment (variance) assumptions. It implements every step of the variance estimation within the MCGLM algorithm, relying on several attributes that must be set by the inheriting class. A general class must inherit from MCGLMVariance and use its methods; MCGLM sets the fundamental Python attributes and orchestrates the modules for a complete MCGLM fit.

The variance step of the optimization boils down to Pearson estimating equations solved with the chaser algorithm. The latter uses tuning to set the step size of each iteration.
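A minimal sketch of one chaser iteration, following the Pearson estimating function psi_i = tr(W_i (r r^T - C)) with W_i = -dC^{-1}/dtau_i, the sensitivity S_ij = -tr(W_i C W_j C), and the update tau - tuning * S^{-1} psi. It assumes a single response, a Tweedie variance with fixed power, no sample weights and Ω = Σ tau_i Z_i. chaser_step is hypothetical; its w_comp matrices are derivative terms, not the weight matrix W used by the library.

```python
import numpy as np


def chaser_step(tau, y, mu, power, z, tuning=1.0):
    """One chaser update of tau from the Pearson estimating function (single response)."""
    tau = np.asarray(tau, dtype=float)
    mu = np.asarray(mu, dtype=float)
    r = np.asarray(y, dtype=float) - mu
    v_sqrt = np.diag(np.sqrt(mu ** power))
    # dC/dtau_i for C = V^{1/2} (sum_i tau_i Z_i) V^{1/2}
    d_c = [v_sqrt @ np.asarray(zi, dtype=float) @ v_sqrt for zi in z]
    c = sum(t * d for t, d in zip(tau, d_c))
    c_inv = np.linalg.inv(c)
    # w_comp_i = -d(C^{-1})/dtau_i = C^{-1} (dC/dtau_i) C^{-1}
    w_comp = [c_inv @ d @ c_inv for d in d_c]
    # Pearson estimating function: psi_i = tr(w_i (r r^T - C))
    gap = np.outer(r, r) - c
    psi = np.array([np.trace(w @ gap) for w in w_comp])
    # Sensitivity: S_ij = -tr(w_i C w_j C)
    sens = -np.array([[np.trace(wi @ c @ wj @ c) for wj in w_comp] for wi in w_comp])
    # Chaser update: tau_new = tau - tuning * S^{-1} psi
    return tau - tuning * np.linalg.solve(sens, psi)


y = np.array([2.0, 0.0, 3.0, -1.0])
mu = np.ones(4)
tau_new = chaser_step([1.0], y, mu, power=0, z=[np.eye(4)])
```

With power=0 and a single identity component, one step with tuning=1 lands on the mean squared residual, the moment estimator of the dispersion; smaller tuning values shorten the step and trade speed for stability.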

Heavy operations on the C components, pivotal to the chaser step, are implemented in the inherited MCGLMCAttributes class. The method _c_complete, which builds all three attributes, carries most of the variance calculation in this class.

static generate_sensitivity(c_intermediate_components, W)

The method to create the sensitivity matrix.

Parameters:

c_intermediate_components (array_type) – Intermediate components of C matrix.
W (array_type) – A weight matrix.

Returns:

A sensitivity matrix.

Return type:

array_type

update_covariates(mu_attributes, rho, power, tau, W, dispersion, mu)

The method update_covariates implements one iteration cycle of the second-moment (variance) estimation.

Parameters:

mu_attributes (dict) – A dict with mean and derivatives.
rho (array_type) – Parameters of correlation.
power (float) – Power parameter of the Tweedie variance function.
tau (float) – Dispersion parameters.
W (array_type) – A weight matrix.
dispersion (array-type) – A vector with dispersion parameters.
mu (array_type) – A vector with mean parameters.

Returns:

A tuple with the new dispersion vector, the attributes of matrix C and the sensitivity matrix.

Return type:

tuple

mcglm.utils module

mcglm.utils.diagonal(n: int, values: array)

mcglm.utils.mc_matrix_linear_predictor(tau: list, z: list)

mcglm.utils.mc_sandwich(central_matrix, left_matrix, right_matrix)

mcglm.utils.mc_sandwich_csr(central_matrix, left_matrix, right_matrix)

mcglm.utils.mc_sandwich_power(central_matrix, left_matrix, right_matrix)

mcglm.utils.mc_sandwich_power_csr(central_matrix, left_matrix, right_matrix)
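Judging from its signature, mc_matrix_linear_predictor(tau, z) combines the dispersion parameters with the z components. The hypothetical sketch below assumes the identity covariance link of McGLM, Ω = tau_0 Z_0 + tau_1 Z_1 + ..., and mixes an independent component with an MA(1) component.

```python
import numpy as np


def matrix_linear_predictor(tau, z):
    """Omega = tau_0 * Z_0 + tau_1 * Z_1 + ... (identity covariance link)."""
    if len(tau) != len(z):
        raise ValueError("tau and z must have the same length")
    return sum(t * np.asarray(zi, dtype=float) for t, zi in zip(tau, z))


z0 = np.eye(3)                          # independent component
z1 = np.eye(3, k=1) + np.eye(3, k=-1)   # MA(1) band component
omega = matrix_linear_predictor([2.0, 0.5], [z0, z1])
```

Keeping Ω linear in tau is what makes dC/dtau_i a fixed matrix, which keeps the Pearson estimating equations and their sensitivity cheap to evaluate.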

Module contents

The package namespace re-exports the following objects. Their documentation is identical to the entries above: mcglm.MCGLM (mcglm.mcglm.MCGLM), mcglm.MCGLMCAttributes (mcglm.mcglmcattr.MCGLMCAttributes), mcglm.MCGLMMean (mcglm.mcglmmean.MCGLMMean), mcglm.MCGLMVariance (mcglm.mcglmvariance.MCGLMVariance), and mcglm.mc_id, mcglm.mc_ma and mcglm.mc_mixed (mcglm.dependencies).