capellini.utils.transforms

Numerical transformations: CLR, closure, message-passing, shrinkage.

Functions

closure(X[, eps])

Row-wise closure: each row sums to 1.

clr(X[, pseudocount, eps])

Row-wise centered log-ratio transform.

double_clr_transform(V_df, B_df[, ...])

Apply closure and CLR separately to viruses and bacteria.

geometric_mean_safe(x[, eps])

Compute the geometric mean of a series, clipping zeros.

normalize_columns(X)

Normalize each column of a DataFrame to sum to 1.

old_style_clr_transform(df)

CLR transform matching the original notebook's shrinkage preprocessing.

row_normalize(W[, eps])

Row-normalize a matrix so each row sums to 1.

schaefer_strimmer_corr(X)

Schäfer-Strimmer shrinkage correlation estimator.

capellini.utils.transforms.closure(X: ndarray, eps: float = 1e-12) → ndarray

Row-wise closure: each row sums to 1.

Parameters:
  • X – Nonnegative matrix of shape (n_samples, n_features).

  • eps – Small constant to avoid division by zero.

Returns:

Closed matrix with row sums approximately equal to 1.
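A minimal sketch of row-wise closure, written with NumPy only. The helper name closure_sketch and the way eps is applied (as a floor on the row sum) are assumptions made for illustration; the library's own implementation may guard against zero rows differently.

```python
import numpy as np

def closure_sketch(X, eps=1e-12):
    """Hypothetical stand-in for closure(): rescale each row to sum to 1."""
    X = np.asarray(X, dtype=float)
    row_sums = X.sum(axis=1, keepdims=True)
    # Assumption: eps is used as a floor on the denominator so that an
    # all-zero row yields zeros instead of NaN.
    return X / np.maximum(row_sums, eps)

counts = np.array([[2.0, 6.0, 2.0],
                   [0.0, 5.0, 5.0]])
closed = closure_sketch(counts)
print(closed)
print(closed.sum(axis=1))
```

Each row of closed holds the proportions of the corresponding row of counts.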

capellini.utils.transforms.clr(X: ndarray, pseudocount: float = 1e-06, eps: float = 1e-12) → ndarray

Row-wise centered log-ratio transform.

Parameters:
  • X – Nonnegative matrix.

  • pseudocount – Small constant added after clipping so that zero entries have a finite logarithm.

  • eps – Numerical stability constant inside the log.

Returns:

CLR-transformed matrix.
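The centered log-ratio of a row is the log of each entry minus the mean of the logs in that row. The sketch below illustrates this with NumPy. The name clr_sketch is hypothetical, and the exact order in which clipping, pseudocount, and eps are applied is an assumption based on the parameter descriptions above, not a copy of the library code.

```python
import numpy as np

def clr_sketch(X, pseudocount=1e-6, eps=1e-12):
    """Hypothetical stand-in for clr(): row-wise centered log-ratio."""
    X = np.asarray(X, dtype=float)
    # Assumption: negative values are clipped to 0, then the pseudocount is added.
    shifted = np.clip(X, 0.0, None) + pseudocount
    # Assumption: eps is added inside the log for numerical stability.
    log_x = np.log(shifted + eps)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return log_x - log_x.mean(axis=1, keepdims=True)

X = np.array([[1.0, 10.0, 100.0],
              [0.0, 5.0, 5.0]])
Z = clr_sketch(X)
print(Z)
print(Z.sum(axis=1))  # each row sums to (numerically) zero
```

Because the row mean of the logs is subtracted, every transformed row sums to zero up to floating-point error, and the zero entry in the second row stays finite thanks to the pseudocount.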

capellini.utils.transforms.double_clr_transform(V_df: DataFrame, B_df: DataFrame, pseudocount: float = 1e-06, eps: float = 1e-12) → tuple[DataFrame, DataFrame, DataFrame]

Apply closure and CLR separately to viruses and bacteria.

Parameters:
  • V_df – Samples x viruses abundance matrix.

  • B_df – Samples x bacteria abundance matrix.

  • pseudocount – Pseudocount for CLR.

  • eps – Numerical stability constant.

Returns:

Tuple of (V_clr_df, B_clr_df, X_clr_df).
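The following sketch shows what transforming the two blocks separately could look like with pandas. All helper names are hypothetical. In particular, treating X_clr_df as the column-wise concatenation of the two transformed blocks is an assumption; the reference above only names the third element of the tuple.

```python
import numpy as np
import pandas as pd

def _closure_clr_sketch(df, pseudocount=1e-6, eps=1e-12):
    """Hypothetical helper: closure followed by CLR on one block."""
    values = df.to_numpy(dtype=float)
    closed = values / np.maximum(values.sum(axis=1, keepdims=True), eps)
    log_x = np.log(closed + pseudocount)
    clr_values = log_x - log_x.mean(axis=1, keepdims=True)
    return pd.DataFrame(clr_values, index=df.index, columns=df.columns)

def double_clr_sketch(V_df, B_df, pseudocount=1e-6, eps=1e-12):
    """Hypothetical stand-in for double_clr_transform()."""
    V_clr = _closure_clr_sketch(V_df, pseudocount, eps)
    B_clr = _closure_clr_sketch(B_df, pseudocount, eps)
    # Assumption: the combined frame is the two blocks side by side.
    X_clr = pd.concat([V_clr, B_clr], axis=1)
    return V_clr, B_clr, X_clr

samples = ["s1", "s2"]
V_df = pd.DataFrame([[10, 0, 30], [5, 5, 0]], index=samples, columns=["v1", "v2", "v3"])
B_df = pd.DataFrame([[100, 300], [20, 20]], index=samples, columns=["b1", "b2"])

V_clr, B_clr, X_clr = double_clr_sketch(V_df, B_df)
print(X_clr.round(3))
```

Transforming each block on its own means each block is centered within itself: the viral columns of a sample sum to zero and so do the bacterial columns, regardless of the relative sequencing depth of the two blocks.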

capellini.utils.transforms.geometric_mean_safe(x: Series, eps: float = 1e-12) → float

Compute the geometric mean of a series, clipping zeros.

Parameters:
  • x – Input series.

  • eps – Floor value for clipping.

Returns:

Geometric mean as a float.
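A short sketch of a clipped geometric mean, computed in log space. The name geometric_mean_sketch is hypothetical, and clipping from below at eps is an assumption drawn from the "floor value" wording above.

```python
import numpy as np
import pandas as pd

def geometric_mean_sketch(x, eps=1e-12):
    """Hypothetical stand-in for geometric_mean_safe()."""
    # Assumption: values below eps (including zeros) are raised to eps.
    clipped = np.clip(x.to_numpy(dtype=float), eps, None)
    # exp(mean(log(x))) is the geometric mean and avoids overflow in the product.
    return float(np.exp(np.log(clipped).mean()))

gm = geometric_mean_sketch(pd.Series([1.0, 4.0, 16.0]))       # close to 4.0
gm_zero = geometric_mean_sketch(pd.Series([0.0, 1.0]))        # close to 1e-6
print(gm, gm_zero)
```

Without the floor, a single zero would make the geometric mean exactly zero and its log undefined. With the floor, the zero still pulls the result down strongly, as the second call shows.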

capellini.utils.transforms.normalize_columns(X: DataFrame) → DataFrame

Normalize each column of a DataFrame to sum to 1.

Parameters:

X – Input DataFrame (features x samples convention from old notebook).

Returns:

Column-normalized DataFrame.
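As an illustration, column normalization of a features x samples frame can be sketched with pandas as below. The name normalize_columns_sketch is hypothetical. The documented signature has no eps argument, so how the library treats a column that sums to zero is not stated; in this sketch such a column would become NaN.

```python
import pandas as pd

def normalize_columns_sketch(X):
    """Hypothetical stand-in for normalize_columns()."""
    # Divide every column by its own sum; alignment is on column labels.
    return X.div(X.sum(axis=0), axis="columns")

# Features x samples layout, following the convention noted above.
X = pd.DataFrame([[1.0, 2.0], [3.0, 2.0]], index=["f1", "f2"], columns=["s1", "s2"])
X_norm = normalize_columns_sketch(X)
print(X_norm)
```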

capellini.utils.transforms.old_style_clr_transform(df: DataFrame) → DataFrame

CLR transform matching the original notebook’s shrinkage preprocessing.

Input is samples x features. The transform adds 1 to every entry, transposes to features x samples, normalizes each sample (column) to sum to 1, divides by the per-sample geometric mean, takes the log, and transposes back to samples x features.

Parameters:

df – Samples x features DataFrame.

Returns:

CLR-transformed DataFrame (samples x features).
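The steps listed in the description can be sketched one by one with pandas. The function name old_style_clr_sketch is hypothetical and the code follows the prose above, not the notebook itself, so details such as dtype handling are assumptions.

```python
import numpy as np
import pandas as pd

def old_style_clr_sketch(df):
    """Hypothetical stand-in for old_style_clr_transform()."""
    shifted = (df + 1.0).T                          # add 1, then features x samples
    normalized = shifted / shifted.sum(axis=0)      # each sample column sums to 1
    geo_mean = np.exp(np.log(normalized).mean(axis=0))  # per-sample geometric mean
    clr = np.log(normalized / geo_mean)             # log-ratio to the geometric mean
    return clr.T                                    # back to samples x features

df = pd.DataFrame([[0, 3, 8], [1, 1, 1]],
                  index=["s1", "s2"], columns=["f1", "f2", "f3"])
out = old_style_clr_sketch(df)
print(out.round(3))
```

Since CLR is invariant to rescaling a sample, the normalization step does not change the final values; it is kept here only to mirror the described sequence. A sample whose features are all equal, like s2, maps to a row of zeros.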

capellini.utils.transforms.row_normalize(W: ndarray, eps: float = 1e-12) → ndarray

Row-normalize a matrix so each row sums to 1.

Parameters:
  • W – Matrix to normalize.

  • eps – Small constant to avoid division by zero.

Returns:

Row-normalized matrix.
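One plausible use of a row-normalized weight matrix is a single averaging step over neighbors, in the spirit of the message-passing mentioned in the module summary. The sketch below is an assumption on both counts: row_normalize_sketch is a hypothetical reimplementation, and the propagation step is illustrative, not something this reference documents.

```python
import numpy as np

def row_normalize_sketch(W, eps=1e-12):
    """Hypothetical stand-in for row_normalize()."""
    W = np.asarray(W, dtype=float)
    # Assumption: eps floors the row sum so that empty rows stay all-zero.
    return W / np.maximum(W.sum(axis=1, keepdims=True), eps)

# Toy adjacency: node 0 has neighbors 1 and 2, node 1 has neighbor 0,
# node 2 has no outgoing edges.
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
H = np.array([[1.0], [2.0], [3.0]])   # one feature per node

W_norm = row_normalize_sketch(W)
H_next = W_norm @ H                   # each node receives the mean of its neighbors
print(W_norm)
print(H_next.ravel())
```

After normalization each non-empty row is a set of averaging weights, so node 0 receives the mean of the features of nodes 1 and 2, while the isolated node 2 receives zero.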

capellini.utils.transforms.schaefer_strimmer_corr(X: DataFrame) → tuple[DataFrame, dict]

Schäfer-Strimmer shrinkage correlation estimator.

Parameters:

X – Samples x features DataFrame (at least 3 samples required).

Returns:

Tuple of (shrinkage_correlation_DataFrame, diagnostics_dict). The diagnostics dict contains the keys n_samples, n_features, lambda_var, and lambda_corr.

Raises:

ValueError – If fewer than 3 samples are provided.
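For orientation, here is a compact NumPy sketch of the correlation part of Schäfer-Strimmer shrinkage: the sample correlation matrix is shrunk toward the identity with an analytically estimated intensity. The function name shrink_corr_sketch is hypothetical, the sketch assumes no constant columns, and it covers only the correlation intensity (the counterpart of lambda_corr), not the variance shrinkage that lambda_var suggests the library also performs.

```python
import numpy as np

def shrink_corr_sketch(X):
    """Hypothetical sketch: shrink sample correlations toward the identity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n < 3:
        raise ValueError("at least 3 samples are required")
    # Standardize columns (assumes every column has nonzero variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # W[k, i, j] = z_ki * z_kj; its mean over k gives the correlations.
    W = Z[:, :, None] * Z[:, None, :]
    W_bar = W.mean(axis=0)
    R = n / (n - 1) * W_bar
    # Estimated variance of each sample correlation.
    var_R = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)
    off = ~np.eye(p, dtype=bool)
    denom = (R[off] ** 2).sum()
    # Shrinkage intensity: summed variances over summed squared correlations,
    # restricted to off-diagonal entries and clipped to [0, 1].
    lam = 1.0 if denom == 0 else float(np.clip(var_R[off].sum() / denom, 0.0, 1.0))
    R_shrunk = (1.0 - lam) * R
    np.fill_diagonal(R_shrunk, 1.0)
    return R_shrunk, lam

rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))       # 10 samples, 4 features
R_shrunk, lam = shrink_corr_sketch(data)
print(round(lam, 3))
print(R_shrunk.round(3))
```

With few samples relative to features, the estimated intensity moves toward 1 and the off-diagonal correlations are pulled toward zero; with many samples it moves toward 0 and the result approaches the ordinary sample correlation matrix.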