capellini.config
Configuration dataclass for the CAPELLINI pipeline.
Classes
CapelliniConfig: All settings for the CAPELLINI pipeline, mirroring the notebook Settings section.
- class capellini.config.CapelliniConfig(base: str = '', download_path: str = '', input_fasta_folder: str = '', dada2_folder: str = '', mmseq_folder: str = '', sp_folder: str = '', procs_folder: str = '', enhanced_networks_folder: str = '', silva_ref_path: str = '', silva_taxmap_path: str = '', full_ncbi_taxonomy_path: str = '', ncbi_accessory_path: str = '', virus_fasta_name: str = '', metadata_path: str = '', bacterial_raw_fasta_folder: str = '', species_level: bool = False, fresh_start: bool = False, ref_removal: bool = True, regenerate_16S_reference: bool = False, regenerate_spacers_collection: bool = False, silva_ref_url: str = 'https://zenodo.org/records/4587955/files/silva_nr99_v138.1_train_set.fa.gz', silva_taxmap_url: str = 'https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/taxonomy/tax_slv_ssu_138.1.txt.gz', full_ncbi_taxonomy_url: str = '', ncbi_taxdmp_url: str = 'https://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip', genes_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.genes.representatives.fasta.bz2', bacContigs_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.contigs.representatives.fasta.bz2', protein_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.proteins.representatives.fasta.bz2', direction: str = 'forward', bacteria_fasta_name: str = '16S_DADA2_bacteria.fasta', fasta_generation: bool = True, isolate_ref_16S: bool = True, mapping_saving: bool = True, min_bitscore: int = 50, max_matches: int = 20, add_taxonomy: bool = True, extend_taxonomy: bool = True, min_n_spacers: int = 3, min_length: int = 23, max_length: int = 47, fdr: float = 0.05, keep_spacers_collection: bool = True, remove_decomp_fasta: bool = True, proteins_extraction_path: str = '', clustering_path: str = '', matrix_type: str = 'count', save_single_bacgenome_collection: bool = False, keep_coords: bool = False, filter_1bac_1vir: bool = False, remove_collections: bool = False, 
batch_size: int = 1500, output_root: str = '', overwrite: bool = False, verbose: bool = True, run_common_abundance: bool = True, run_shrinkage_correlations: bool = True, run_raw_crispr_networks: bool = True, run_smooth_crispr: bool = True, run_xstar: bool = True, prevalence: float = 0.1, keep_column: str = 'keep_for_analysis', bacteria_taxonomy_rank: str = 'target_taxids', bacterial_ranks: list = <factory>, bacterial_weights: list = <factory>, crispr_smooth_alpha: float = 0.95, transpose_raw_crispr_after_load: bool = True, pseudocount: float = 1e-06, lam: float = 0.5, n_steps: int = 1, preserve_scale: bool = False, virus_abundance_raw: str = '', bacteria_otu: str = '', bacteria_taxonomy: str = '', phage_host_predictions: str = '', tax_bac_for_smoothing: str = '', tax_vir: str = '', viral_ranks: list = <factory>, viral_weights: list = <factory>, aggregate_viral_rank: str = 'lev0')[source]
Bases: object
All settings for the CAPELLINI pipeline, mirroring the notebook Settings section.
Required fields (no defaults) must be provided explicitly or via from_yaml/from_dict. Derived path fields are computed in __post_init__ when left as empty strings.
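The derived-path behaviour described above can be illustrated with a toy dataclass. This is only a sketch, not the CAPELLINI implementation: `MiniConfig` is a hypothetical stand-in, and the `dada2` sub-folder name is an assumption chosen for illustration. Only the field names `base` and `dada2_folder` come from the signature above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MiniConfig:
    """Hypothetical stand-in for CapelliniConfig with only two path fields."""

    base: str = ""
    dada2_folder: str = ""  # derived from base when left as an empty string

    def __post_init__(self) -> None:
        # Fill in the derived path only if the caller left it empty
        # and a base directory is available to derive it from.
        if not self.dada2_folder and self.base:
            # Assumed layout: the real sub-folder name may differ.
            self.dada2_folder = str(Path(self.base) / "dada2")


# Derived: dada2_folder is computed from base.
derived = MiniConfig(base="/data/study")

# Explicit: a non-empty value is left untouched.
explicit = MiniConfig(base="/data/study", dada2_folder="/scratch/dada2")
```

Using the empty string as the "not set" marker keeps every field a plain `str`, which is convenient when values come from YAML or a dict.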
- bacContigs_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.contigs.representatives.fasta.bz2'
- classmethod default() → CapelliniConfig[source]
Return a config with all default values (paths will be empty until base is set).
- classmethod from_dict(d: dict[str, Any]) → CapelliniConfig[source]
Load config from a plain Python dict.
Old UPPER_CASE network keys are auto-translated to the new lowercase names. STUDY is silently dropped.
YAML values that look like un-interpolated Python f-strings (f"{dada2_folder}/...") are normalised to empty so the path-derivation logic in __post_init__ can fill them in.
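The three normalisation steps described for from_dict could look roughly like the sketch below. It is not the library's code: `normalise_config_dict` is a hypothetical helper, the old-to-new key translation is assumed to be plain lowercasing of any all-caps key, and the regular expression is one possible way to detect f-string-like values.

```python
import re
from typing import Any

# Matches strings that still carry f-string syntax, e.g. 'f"{dada2_folder}/x"'.
_FSTRING_LIKE = re.compile(r"""^f["'].*\{.*\}.*["']$""")


def normalise_config_dict(d: dict[str, Any]) -> dict[str, Any]:
    """Return a cleaned copy of d (hypothetical helper, for illustration)."""
    cleaned: dict[str, Any] = {}
    for key, value in d.items():
        # The STUDY key is dropped without raising or warning.
        if key == "STUDY":
            continue
        # Assumption: old UPPER_CASE keys map to the same name in lowercase.
        if key.isupper():
            key = key.lower()
        # Un-interpolated f-strings become empty strings so that the
        # path derivation in __post_init__ can fill them in later.
        if isinstance(value, str) and _FSTRING_LIKE.match(value.strip()):
            value = ""
        cleaned[key] = value
    return cleaned


# "PREVALENCE" is an assumed example of an old-style key.
raw = {
    "STUDY": "example",
    "PREVALENCE": 0.1,
    "base": "/data",
    "dada2_folder": 'f"{base}/dada2"',
}
cleaned = normalise_config_dict(raw)
```

The cleaned dict would then be passed on to the dataclass constructor, with the emptied `dada2_folder` left for the path derivation to resolve.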
- classmethod from_yaml(path: str | Path) → CapelliniConfig[source]
Load config from a YAML file.
- Parameters:
path – Path to the YAML configuration file.
- Returns:
CapelliniConfig populated from the YAML.
- genes_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.genes.representatives.fasta.bz2'
- protein_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.proteins.representatives.fasta.bz2'
- silva_taxmap_url: str = 'https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/taxonomy/tax_slv_ssu_138.1.txt.gz'