capellini.utils.transforms
Numerical transformations: CLR, closure, message-passing, shrinkage.
Functions
| closure | Row-wise closure: each row sums to 1. |
| clr | Row-wise centered log-ratio transform. |
| double_clr_transform | Apply closure and CLR separately to viruses and bacteria. |
| geometric_mean_safe | Compute the geometric mean of a series, clipping zeros. |
| normalize_columns | Normalize each column of a DataFrame to sum to 1. |
| old_style_clr_transform | CLR transform matching the original notebook's shrinkage preprocessing. |
| row_normalize | Row-normalize a matrix so each row sums to 1. |
| schaefer_strimmer_corr | Schäfer-Strimmer shrinkage correlation estimator. |
- capellini.utils.transforms.closure(X: ndarray, eps: float = 1e-12) → ndarray
Row-wise closure: each row sums to 1.
- Parameters:
X – Nonnegative matrix of shape (n_samples, n_features).
eps – Small constant to avoid division by zero.
- Returns:
Closed matrix with row sums approximately equal to 1.
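As a rough illustration of what row-wise closure does, here is a minimal NumPy sketch. It is not the package's implementation: the name closure_sketch is hypothetical, and adding eps to the row sum is an assumption about how the constant is applied.

```python
import numpy as np


def closure_sketch(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical stand-in for closure(): divide each row by its sum."""
    X = np.asarray(X, dtype=float)
    row_sums = X.sum(axis=1, keepdims=True)
    # Assumption: eps is added to the denominator so all-zero rows do not
    # trigger a division by zero; such rows stay all zero.
    return X / (row_sums + eps)


counts = np.array([[2.0, 6.0, 2.0],
                   [0.0, 0.0, 0.0]])
closed = closure_sketch(counts)
```

Under this assumption the first row becomes approximately (0.2, 0.6, 0.2), while the all-zero row is returned unchanged rather than producing NaN.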
- capellini.utils.transforms.clr(X: ndarray, pseudocount: float = 1e-06, eps: float = 1e-12) → ndarray
Row-wise centered log-ratio transform.
- Parameters:
X – Nonnegative matrix.
pseudocount – Added after clipping to handle zeros safely.
eps – Numerical stability constant inside the log.
- Returns:
CLR-transformed matrix.
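The centered log-ratio subtracts each row's mean log value from the log of every entry, so each transformed row sums to zero. The sketch below is one plausible reading of the parameter descriptions, not the package's code: clr_sketch is a hypothetical name, and the exact order in which clipping, pseudocount, and eps are applied is an assumption.

```python
import numpy as np


def clr_sketch(X: np.ndarray, pseudocount: float = 1e-6, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical stand-in for clr(): log(x) minus the row mean of log(x)."""
    X = np.asarray(X, dtype=float)
    # Assumption: negatives are clipped to 0, then the pseudocount is added.
    shifted = np.clip(X, 0.0, None) + pseudocount
    # Assumption: eps is added inside the log as a further safeguard.
    log_x = np.log(shifted + eps)
    return log_x - log_x.mean(axis=1, keepdims=True)


X = np.array([[1.0, 2.0, 4.0],
              [0.0, 5.0, 5.0]])
Z = clr_sketch(X)
```

Because the row mean of the logs is removed, Z.sum(axis=1) is zero up to floating-point error, and the zero entry in the second row maps to a large negative value instead of minus infinity.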
- capellini.utils.transforms.double_clr_transform(V_df: DataFrame, B_df: DataFrame, pseudocount: float = 1e-06, eps: float = 1e-12) → tuple[DataFrame, DataFrame, DataFrame]
Apply closure and CLR separately to viruses and bacteria.
- Parameters:
V_df – Samples x viruses abundance matrix.
B_df – Samples x bacteria abundance matrix.
pseudocount – Pseudocount for CLR.
eps – Numerical stability constant.
- Returns:
Tuple of (V_clr_df, B_clr_df, X_clr_df).
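The following sketch shows one way the two blocks could be transformed independently. It rests on assumptions: double_clr_sketch and _closure_clr are hypothetical names, and treating X_clr_df as the column-wise concatenation of the two transformed blocks is a guess that the documentation above does not confirm.

```python
import numpy as np
import pandas as pd


def _closure_clr(df: pd.DataFrame, pseudocount: float, eps: float) -> pd.DataFrame:
    """Hypothetical helper: close each row, then apply CLR to it."""
    X = df.to_numpy(dtype=float)
    X = X / (X.sum(axis=1, keepdims=True) + eps)          # closure
    log_x = np.log(np.clip(X, 0.0, None) + pseudocount + eps)
    Z = log_x - log_x.mean(axis=1, keepdims=True)         # CLR centering
    return pd.DataFrame(Z, index=df.index, columns=df.columns)


def double_clr_sketch(V_df, B_df, pseudocount=1e-6, eps=1e-12):
    V_clr = _closure_clr(V_df, pseudocount, eps)
    B_clr = _closure_clr(B_df, pseudocount, eps)
    # Assumption: the third output joins both blocks side by side.
    X_clr = pd.concat([V_clr, B_clr], axis=1)
    return V_clr, B_clr, X_clr


samples = ["s1", "s2"]
V = pd.DataFrame([[1.0, 3.0], [0.0, 2.0]], index=samples, columns=["v1", "v2"])
B = pd.DataFrame([[5.0, 5.0, 10.0], [1.0, 0.0, 9.0]], index=samples, columns=["b1", "b2", "b3"])
V_clr, B_clr, X_clr = double_clr_sketch(V, B)
```

Transforming the blocks separately means each block is centered on its own geometric mean, so viral and bacterial values are not expressed relative to one another.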
- capellini.utils.transforms.geometric_mean_safe(x: Series, eps: float = 1e-12) → float
Compute the geometric mean of a series, clipping zeros.
- Parameters:
x – Input series.
eps – Floor value for clipping.
- Returns:
Geometric mean as a float.
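A geometric mean with a floor can be computed as the exponential of the mean log. This is a minimal sketch under the assumption that values below eps are raised to eps before the log is taken; geometric_mean_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def geometric_mean_sketch(x: pd.Series, eps: float = 1e-12) -> float:
    """Hypothetical stand-in for geometric_mean_safe()."""
    # Assumption: entries below eps (including zeros) are floored at eps.
    values = np.clip(x.to_numpy(dtype=float), eps, None)
    return float(np.exp(np.log(values).mean()))


gm_plain = geometric_mean_sketch(pd.Series([1.0, 4.0, 16.0]))
gm_with_zero = geometric_mean_sketch(pd.Series([0.0, 1.0]))
```

With these inputs gm_plain is approximately 4, and gm_with_zero is approximately 1e-6 (the square root of eps times 1), which shows how strongly a single clipped zero pulls the result down.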
- capellini.utils.transforms.normalize_columns(X: DataFrame) → DataFrame
Normalize each column of a DataFrame to sum to 1.
- Parameters:
X – Input DataFrame (features x samples convention from old notebook).
- Returns:
Column-normalized DataFrame.
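For a features x samples table, column normalization divides every sample column by its total. The sketch below uses a hypothetical name, normalize_columns_sketch, and does not attempt to reproduce how the package treats columns that sum to zero.

```python
import pandas as pd


def normalize_columns_sketch(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for normalize_columns()."""
    # Divide each column by its own sum; axis=1 aligns the sums with columns.
    return X.div(X.sum(axis=0), axis=1)


abund = pd.DataFrame(
    {"s1": [1.0, 3.0], "s2": [2.0, 2.0]},
    index=["taxA", "taxB"],  # rows are features, columns are samples
)
normalized = normalize_columns_sketch(abund)
```

Here column s1 becomes (0.25, 0.75) and column s2 becomes (0.5, 0.5).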
- capellini.utils.transforms.old_style_clr_transform(df: DataFrame) → DataFrame
CLR transform matching the original notebook’s shrinkage preprocessing.
Input is samples x features. The transform adds 1 to every entry, transposes to features x samples, normalizes each sample (column) to sum to 1, divides by the per-sample geometric mean, takes the log, and transposes back.
- Parameters:
df – Samples x features DataFrame.
- Returns:
CLR-transformed DataFrame (samples x features).
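The sequence of steps described above can be sketched with pandas as follows. This is an illustrative reconstruction from the prose, not the package's code; old_style_clr_sketch is a hypothetical name, and the per-sample geometric mean is computed here as the exponential of the mean log.

```python
import numpy as np
import pandas as pd


def old_style_clr_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for old_style_clr_transform()."""
    shifted = (df + 1.0).T                        # add 1, go to features x samples
    normalized = shifted / shifted.sum(axis=0)    # each sample column sums to 1
    geo_mean = np.exp(np.log(normalized).mean(axis=0))  # per-sample geometric mean
    return np.log(normalized / geo_mean).T        # log-ratio, back to samples x features


df = pd.DataFrame(
    [[0.0, 3.0, 7.0], [5.0, 5.0, 0.0]],
    index=["s1", "s2"],
    columns=["f1", "f2", "f3"],
)
out = old_style_clr_sketch(df)
```

Because CLR is invariant to rescaling a sample, the column-normalization step does not change the final values in this sketch; the output equals the CLR of df + 1 and each output row sums to zero up to rounding.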
- capellini.utils.transforms.row_normalize(W: ndarray, eps: float = 1e-12) → ndarray
Row-normalize a matrix so each row sums to 1.
- Parameters:
W – Matrix to normalize.
eps – Small constant to avoid division by zero.
- Returns:
Row-normalized matrix.
- capellini.utils.transforms.schaefer_strimmer_corr(X: DataFrame) → tuple[DataFrame, dict]
Schäfer-Strimmer shrinkage correlation estimator.
- Parameters:
X – Samples x features DataFrame (at least 3 samples required).
- Returns:
Tuple of (shrinkage_correlation_DataFrame, diagnostics_dict). diagnostics_dict contains keys: n_samples, n_features, lambda_var, lambda_corr.
- Raises:
ValueError – If fewer than 3 samples are provided.
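To show the idea behind the correlation part of the estimator, the sketch below shrinks the off-diagonal sample correlations toward zero with an analytically estimated intensity (the ratio of summed estimated variances of the correlations to their summed squares). It is a simplified illustration under stated assumptions: shrinkage_corr_sketch is a hypothetical name, it works on a plain array, it covers only what the diagnostics call lambda_corr and omits the variance shrinkage (lambda_var), and it assumes no constant columns.

```python
import numpy as np


def shrinkage_corr_sketch(X: np.ndarray):
    """Hypothetical sketch: shrink off-diagonal correlations toward zero."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n < 3:
        raise ValueError("at least 3 samples are required")
    # Standardize columns (unit sample variance, ddof=1).
    xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    w = xs[:, :, None] * xs[:, None, :]            # per-sample products, (n, p, p)
    w_bar = w.mean(axis=0)
    r = n / (n - 1) * w_bar                        # sample correlation matrix
    var_r = n / (n - 1) ** 3 * ((w - w_bar) ** 2).sum(axis=0)  # Var(r_ij) estimate
    off = ~np.eye(p, dtype=bool)
    denom = (r[off] ** 2).sum()
    # Shrinkage intensity, clipped to [0, 1].
    lam = 1.0 if denom == 0 else float(np.clip(var_r[off].sum() / denom, 0.0, 1.0))
    shrunk = (1.0 - lam) * r
    np.fill_diagonal(shrunk, 1.0)
    return shrunk, lam


rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))
R_shrunk, lam = shrinkage_corr_sketch(data)
```

The diagonal stays at 1 while every off-diagonal entry is the sample correlation multiplied by 1 - lam, so small sample sizes (large estimated variance) lead to stronger shrinkage.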