Pipeline overview
=================

.. code-block:: text

   Preflight → DADA2 → 3-layer NCBI ID Mapping → SpacePHARER Execution
     → Protein Clusters (ProCs) Estimation → Enhanced Networks Estimation

Stages
------

**Preflight**
   Folder layout; optional fresh-start cleanup that preserves bundled
   references and the input virus FASTA.

**DADA2**
   Denoise 16S reads to ASVs and assign SILVA taxonomy via the bundled
   ``DADA2_Pipe.R`` script.

**3-layer NCBI ID Mapping**
   Download ``names.dmp`` and assign real NCBI taxids to the SILVA taxonomy
   table, then run a taxonomy-aware mapping of ASVs to proGenomes3
   representative genomes via ``mmseqs easy-search`` with three-layer
   fallback (ASV → genus → family) and derivation of the ``target_taxids``
   column.

**SpacePHARER Execution**
   Filter the bundled spacer collection to the cohort, build SpacePHARER
   databases, and run ``predictmatch`` with FDR control to obtain the
   virus–host adjacency :math:`W`.

**Protein Clusters (ProCs) Estimation**
   Protein clustering of bacterial and viral proteins, building the ProCs
   presence/count matrix.

**Enhanced Networks Estimation**
   Common-abundance preprocessing, CLR transformation, Schäfer–Strimmer
   shrinkage correlations, raw and taxonomy-smoothed CRISPR networks

   .. math::

      \tilde{W} = (1 - \alpha) W + \alpha\, K_{\mathrm{vir}}\, W\, K_{\mathrm{bac}},

   and X* message-passing propagation

   .. math::

      Z^*_v = Z_v + \eta (Z_b P_h - Z_v), \quad
      Z^*_b = Z_b + \eta (Z_v P_v - Z_b).
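
The CLR transformation, the taxonomy-smoothed network and the propagation step of the Enhanced Networks Estimation stage can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the pipeline's own code. The function names, the pseudocount of 0.5, and the matrix orientations (samples × features for :math:`Z_v` and :math:`Z_b`, viruses × bacteria for :math:`W`) are hypothetical. In particular, the source does not say how :math:`P_h` and :math:`P_v` are derived; here they are assumed to be column-normalized versions of :math:`W^\top` and :math:`W`.

.. code-block:: python

   import numpy as np


   def clr(counts, pseudocount=0.5):
       """Centered log-ratio transform of a samples x features count matrix.

       The pseudocount is an assumption, added so that zero counts have a log.
       """
       logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
       return logx - logx.mean(axis=1, keepdims=True)


   def smooth_network(W, K_vir, K_bac, alpha):
       """W_tilde = (1 - alpha) * W + alpha * K_vir @ W @ K_bac."""
       return (1.0 - alpha) * W + alpha * (K_vir @ W @ K_bac)


   def normalize_columns(M):
       """Scale each column to sum to 1; all-zero columns stay zero."""
       M = np.asarray(M, dtype=float)
       totals = M.sum(axis=0, keepdims=True)
       return np.divide(M, totals, out=np.zeros_like(M), where=totals > 0)


   def propagate(Z_v, Z_b, P_h, P_v, eta):
       """One propagation step between the viral and bacterial layers.

       Z_v: samples x viruses, Z_b: samples x bacteria,
       P_h: bacteria x viruses, P_v: viruses x bacteria.
       """
       Z_v_star = Z_v + eta * (Z_b @ P_h - Z_v)
       Z_b_star = Z_b + eta * (Z_v @ P_v - Z_b)
       return Z_v_star, Z_b_star


   # Toy example: 3 viruses, 2 bacteria, 2 samples (all values made up).
   W = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
   K_vir, K_bac = np.eye(3), np.eye(2)      # identity kernels: no smoothing
   W_tilde = smooth_network(W, K_vir, K_bac, alpha=0.3)

   P_h = normalize_columns(W_tilde.T)       # assumed: host weights per virus
   P_v = normalize_columns(W_tilde)         # assumed: virus weights per host

   Z_v = clr(np.array([[10, 0, 5], [3, 3, 3]]))
   Z_b = clr(np.array([[20, 4], [7, 7]]))
   Z_v_star, Z_b_star = propagate(Z_v, Z_b, P_h, P_v, eta=0.5)

With :math:`\eta = 0` both layers are returned unchanged, and with :math:`\alpha = 0` the smoothed network reduces to the raw adjacency :math:`W`, which makes either parameter a convenient switch when comparing raw and enhanced networks.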