capellini.config
Configuration dataclass for the CAPELLINI pipeline.
Classes
CapelliniConfig – All settings for the CAPELLINI pipeline, mirroring the notebook Settings section.
- class capellini.config.CapelliniConfig(base: str = '', download_path: str = '', input_fasta_folder: str = '', dada2_folder: str = '', mmseq_folder: str = '', sp_folder: str = '', procs_folder: str = '', enhanced_networks_folder: str = '', silva_ref_path: str = '', silva_taxmap_path: str = '', full_ncbi_taxonomy_path: str = '', virus_fasta_name: str = '', metadata_path: str = '', bacterial_raw_fasta_folder: str = '', species_level: bool = False, fresh_start: bool = False, ref_removal: bool = True, regenerate_16S_reference: bool = False, regenerate_spacers_collection: bool = False, genes_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.genes.representatives.fasta.bz2', bacContigs_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.contigs.representatives.fasta.bz2', protein_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.proteins.representatives.fasta.bz2', direction: str = 'forward', bacteria_fasta_name: str = '16S_DADA2_bacteria.fasta', fasta_generation: bool = True, isolate_ref_16S: bool = True, mapping_saving: bool = True, min_bitscore: int = 50, max_matches: int = 20, add_taxonomy: bool = True, extend_taxonomy: bool = True, min_n_spacers: int = 3, min_length: int = 23, max_length: int = 47, fdr: float = 0.05, keep_spacers_collection: bool = True, remove_decomp_fasta: bool = True, proteins_extraction_path: str = '', clustering_path: str = '', matrix_type: str = 'count', save_single_bacgenome_collection: bool = False, keep_coords: bool = False, filter_1bac_1vir: bool = False, remove_collections: bool = False, batch_size: int = 1500, OUTPUT_ROOT: str = '', OVERWRITE: bool = False, VERBOSE: bool = True, RUN_COMMON_ABUNDANCE: bool = True, RUN_SHRINKAGE_CORRELATIONS: bool = True, RUN_RAW_CRISPR_NETWORKS: bool = True, RUN_SMOOTH_CRISPR: bool = True, RUN_XSTAR: bool = True, PREVALENCE: float = 0.1, KEEP_COLUMN: str = 'keep_for_analysis', BACTERIA_TAXONOMY_RANK: str = 'target_taxids', 
BACTERIAL_RANKS: list = <factory>, BACTERIAL_WEIGHTS: list = <factory>, CRISPR_SMOOTH_ALPHA: float = 0.95, TRANSPOSE_RAW_CRISPR_AFTER_LOAD: bool = True, PSEUDOCOUNT: float = 1e-06, LAM: float = 0.5, N_STEPS: int = 1, PRESERVE_SCALE: bool = False, STUDY: str = 'default', virus_abundance_raw: str = '', bacteria_otu: str = '', bacteria_taxonomy: str = '', phage_host_predictions: str = '', tax_bac_for_smoothing: str = '', tax_vir: str = '', viral_ranks: list = <factory>, viral_weights: list = <factory>, aggregate_viral_rank: str = 'lev0')
Bases: object
All settings for the CAPELLINI pipeline, mirroring the notebook Settings section.
Required fields (no defaults) must be provided explicitly or via from_yaml/from_dict. Derived path fields are computed in __post_init__ when left as empty strings.
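The derived-path behaviour described above can be pictured with a small stand-in dataclass. This is only a sketch: SketchConfig is hypothetical, it carries three of the real field names (base, download_path, dada2_folder), and the subfolder names "downloads" and "dada2" are assumptions, since the actual derivation rules are not documented here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchConfig:
    """Hypothetical stand-in for CapelliniConfig with only three fields."""

    base: str = ""
    download_path: str = ""
    dada2_folder: str = ""

    def __post_init__(self) -> None:
        # Without a base directory there is nothing to derive from.
        if not self.base:
            return
        # Assumed layout: each path left empty becomes a subfolder of base.
        if not self.download_path:
            self.download_path = str(Path(self.base) / "downloads")
        if not self.dada2_folder:
            self.dada2_folder = str(Path(self.base) / "dada2")


# dada2_folder is given explicitly, so only download_path is derived.
cfg = SketchConfig(base="project", dada2_folder="custom/dada2")
```

In this sketch an explicitly supplied path always wins over the derived one, which matches the "when left as empty strings" wording above.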
- bacContigs_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.contigs.representatives.fasta.bz2'
- classmethod default() -> CapelliniConfig
Return a config with all default values (paths will be empty until base is set).
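One way to go from an all-defaults config to one with derived paths is dataclasses.replace, which builds a new instance through __init__ and therefore runs __post_init__ again. The sketch below uses a hypothetical SketchConfig with an assumed derivation rule and an illustrative base directory. Whether assigning base directly on an existing CapelliniConfig also recomputes paths is not stated in this page.

```python
from dataclasses import dataclass, replace


@dataclass
class SketchConfig:
    """Hypothetical stand-in; the real class has many more fields."""

    base: str = ""
    download_path: str = ""

    def __post_init__(self) -> None:
        # Assumed derivation rule, for illustration only.
        if self.base and not self.download_path:
            self.download_path = f"{self.base}/downloads"

    @classmethod
    def default(cls) -> "SketchConfig":
        # All fields keep their declared defaults.
        return cls()


defaults = SketchConfig.default()  # download_path is still ""
# replace() constructs a new object, so __post_init__ runs with base set.
cfg = replace(defaults, base="/data/run1")
```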
- classmethod from_dict(d: dict[str, Any]) -> CapelliniConfig
Load config from a plain Python dict.
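A minimal sketch of the from_dict pattern, assuming unknown keys should be rejected rather than silently ignored (the real method may behave differently). SketchConfig is a hypothetical stand-in that borrows three field names and two defaults from the signature above; the rank labels in the example dict are illustrative values.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class SketchConfig:
    """Hypothetical stand-in for CapelliniConfig."""

    min_bitscore: int = 50
    fdr: float = 0.05
    # Mutable defaults need a factory, shown as <factory> in the signature.
    viral_ranks: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "SketchConfig":
        known = {f.name for f in fields(cls)}
        unknown = set(d) - known
        if unknown:
            # Assumption: fail loudly on typos instead of dropping them.
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        return cls(**d)


# Keys missing from the dict fall back to the dataclass defaults.
cfg = SketchConfig.from_dict({"min_bitscore": 80, "viral_ranks": ["lev0", "lev1"]})
```

Rejecting unknown keys is a design choice in this sketch: it surfaces misspelled option names early, at the cost of being stricter with older config dicts.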
- classmethod from_yaml(path: str | Path) -> CapelliniConfig
Load config from a YAML file.
- Parameters:
path – Path to the YAML configuration file.
- Returns:
CapelliniConfig populated from the YAML.
- genes_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.genes.representatives.fasta.bz2'
- protein_reference_url: str = 'http://progenomes3.embl.de/data/repGenomes/progenomes3.proteins.representatives.fasta.bz2'