Configuration

CAPELLINI is driven by a single YAML configuration file that you provide. The package ships no bundled configs. It records the path of the last config you loaded in ~/.capellini/last_config and reuses that path on the next run.

In the terminal UI, point CAPELLINI at your YAML via Settings → Load config. Programmatic users pass the path directly:

from capellini import CapelliniConfig, CapelliniPipeline

cfg = CapelliniConfig.from_yaml("/path/to/your_config.yaml")
CapelliniPipeline(cfg).run_all()
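
For scripts that should fall back to the remembered config when no path is given, a small helper along these lines may be useful. This is a minimal sketch, not part of CAPELLINI: `resolve_config_path` is a hypothetical name, and it assumes that ~/.capellini/last_config is a plain-text file holding a single path, which the text above does not confirm.

```python
from pathlib import Path
from typing import Optional

# Location named in the text above; its on-disk format is an assumption here.
LAST_CONFIG_MARKER = Path.home() / ".capellini" / "last_config"


def resolve_config_path(explicit: Optional[str] = None,
                        marker: Path = LAST_CONFIG_MARKER) -> Path:
    """Return the explicit path if given, else the remembered one (hypothetical helper)."""
    if explicit is not None:
        candidate = Path(explicit).expanduser()
    elif marker.is_file():
        # Assumption: the marker holds one line of plain text, the config path.
        candidate = Path(marker.read_text(encoding="utf-8").strip()).expanduser()
    else:
        raise FileNotFoundError("no config path given and no remembered config found")
    # Fail early with a clear message instead of deep inside the pipeline.
    if not candidate.is_file():
        raise FileNotFoundError(f"config file does not exist: {candidate}")
    return candidate
```

The returned path can then be converted with `str(...)` and handed to `CapelliniConfig.from_yaml` as in the snippet above.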

Key parameters

| Parameter | Default | Meaning |
| --- | --- | --- |
| `species_level` | `false` | Genus-level (`false`) vs species-level (`true`) target resolution |
| `BACTERIA_TAXONOMY_RANK` | `target_taxids` | Rank used to aggregate bacteria for the network stage |
| `PREVALENCE` | `0.10` | Keep features present in ≥ 10 % of samples |
| `CRISPR_SMOOTH_ALPHA` | `0.95` | Strength of taxonomy smoothing applied to `W` |
| `LAM` (η in the paper) | `0.5` | Strength of CRISPR-informed abundance propagation |
| `N_STEPS` | `1` | Number of message-passing updates |
| `fdr` | `0.05` | SpacePHARER FDR threshold |
| `min_n_spacers` | `3` | MinCED minimum spacers per array |
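
To get a starting point for your own file, the defaults above can be rendered as YAML text. The sketch below uses only the standard library. It assumes the parameters are flat, top-level keys, which this section does not state, and `render_config` is a hypothetical helper rather than a CAPELLINI function.

```python
# Defaults copied from the table above. Treating them as flat, top-level
# YAML keys is an assumption; check your own config for the real layout.
DEFAULTS = {
    "species_level": False,
    "BACTERIA_TAXONOMY_RANK": "target_taxids",
    "PREVALENCE": 0.10,
    "CRISPR_SMOOTH_ALPHA": 0.95,
    "LAM": 0.5,
    "N_STEPS": 1,
    "fdr": 0.05,
    "min_n_spacers": 3,
}


def to_yaml_scalar(value):
    """Format a bool, number, or simple string as a YAML scalar."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    return str(value)


def render_config(overrides=None):
    """Merge overrides into the defaults and render flat `key: value` lines."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch typos such as "prevalence" instead of "PREVALENCE".
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    return "".join(f"{key}: {to_yaml_scalar(val)}\n" for key, val in merged.items())


if __name__ == "__main__":
    print(render_config({"species_level": True, "PREVALENCE": 0.2}), end="")
```

A real config will most likely need further entries, such as the input paths described under Inputs. Those keys are not listed in this section, so they are left out here.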

Inputs

CAPELLINI expects, per cohort:

Raw 16S rRNA amplicon FASTQ files (forward, reverse, or paired).
A viral contig FASTA (e.g. ViroProfiler output).
A sample metadata CSV used to align bacterial and viral abundances.
The SILVA reference (Release 138.1) and SILVA taxmap.

All output folders under base/ are created automatically.

Outputs

For each study, CAPELLINI writes under Enhanced Networks/<study>/:

common/ — aligned, prevalence-filtered V, B, and metadata tables.
shrinkage/ — Schäfer–Strimmer shrinkage correlations on the CLR-stacked \(Z = [B^{\mathrm{CLR}}\ V^{\mathrm{CLR}}]\).
crispr_raw/ and crispr_smooth/ — raw and taxonomy-smoothed CRISPR matrices (\(W\), \(\tilde{W}\)).
xstar/ — host-informed abundances \(Z^*\) from convex and residual message-passing variants.
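
The first two steps above, prevalence filtering and the CLR transform behind \(Z\), can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions, not CAPELLINI's implementation: it treats "present" as a non-zero count and handles zeros with a pseudocount of 1, neither of which is specified in this section. The function names are illustrative only.

```python
import math


def prevalence_filter(counts, threshold=0.10):
    """Keep feature columns that are non-zero in at least `threshold` of samples.

    `counts` is a list of sample rows; returns (filtered rows, kept column indices).
    """
    n_samples = len(counts)
    n_features = len(counts[0])
    kept = [
        j for j in range(n_features)
        if sum(1 for row in counts if row[j] > 0) / n_samples >= threshold
    ]
    return [[row[j] for j in kept] for row in counts], kept


def clr(row, pseudocount=1.0):
    """Centred log-ratio: log of each value minus the mean log over the sample."""
    # Assumption: zeros are handled by adding a pseudocount before the log.
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    # Toy bacterial (B) and viral (V) count tables: 4 samples each.
    B = [[10, 0, 5], [3, 0, 0], [8, 0, 2], [0, 0, 1]]
    V = [[4, 1], [0, 2], [7, 0], [1, 1]]
    B_kept, kept = prevalence_filter(B, threshold=0.5)  # kept == [0, 2]
    # Z stacks the CLR-transformed B and V columns sample by sample.
    Z = [clr(b) + clr(v) for b, v in zip(B_kept, V)]
    print(kept, len(Z), len(Z[0]))
```

Each CLR-transformed block sums to zero within a sample, which is a quick sanity check when inspecting the tables in common/ and shrinkage/.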