capellini.stages.ncbi_mapping

NCBI mapping stage: download taxonomy names and assign real NCBI taxids.

Functions

download_ncbi_names(download_path)

Download NCBI taxdmp.zip, extract names.dmp, and delete the zip.

run_ncbi_mapping(cfg)

Load taxonomy table, assign NCBI taxids, and return the updated DataFrame.

capellini.stages.ncbi_mapping.download_ncbi_names(download_path: str | Path) → Path

Download NCBI taxdmp.zip, extract names.dmp, and delete the zip.

Skips the download if names.dmp already exists in download_path.

Parameters:

download_path – Directory where names.dmp will be saved.

Returns:

Path to the names.dmp file.
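The download, extract, and clean-up behavior described above can be sketched roughly as follows. This is a minimal illustration and not the module's actual implementation: the function name fetch_names_dmp, the url parameter, and the TAXDMP_URL constant are assumptions introduced here, and the URL is the public NCBI taxonomy dump location, which the real function may or may not use.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumption: public location of the NCBI taxonomy dump archive.
TAXDMP_URL = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip"


def fetch_names_dmp(download_path, url=TAXDMP_URL):
    """Hypothetical stand-in for download_ncbi_names (illustration only)."""
    target_dir = Path(download_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    names_file = target_dir / "names.dmp"

    # Skip the download entirely when names.dmp is already present.
    if names_file.exists():
        return names_file

    zip_file = target_dir / "taxdmp.zip"
    urllib.request.urlretrieve(url, zip_file)
    try:
        # Extract only names.dmp; the archive's other members are ignored.
        with zipfile.ZipFile(zip_file) as archive:
            archive.extract("names.dmp", path=target_dir)
    finally:
        # Delete the zip whether or not extraction succeeded.
        zip_file.unlink()
    return names_file
```

Under these assumptions, calling fetch_names_dmp twice with the same directory performs the network request only once, because the second call returns as soon as it finds names.dmp.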

capellini.stages.ncbi_mapping.run_ncbi_mapping(cfg: CapelliniConfig) → DataFrame

Load taxonomy table, assign NCBI taxids, and return the updated DataFrame.

Loads the DADA2-produced taxonomy_table_{F|R|P}.csv, downloads the NCBI names file if it is not already present, looks up a real NCBI taxid for each ASV using the finest available rank, and adds the NCBI_taxid and taxid_matched_rank columns.

Parameters:

cfg – Populated CapelliniConfig instance.

Returns:

taxonomy_table DataFrame with NCBI_taxid and taxid_matched_rank columns added.
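To make the "finest available rank" lookup concrete, here is a small sketch of one way it could work. It is not the stage's actual code: the helper names load_scientific_names and lookup_finest_taxid and the RANKS column list (typical DADA2 rank columns) are assumptions made for illustration, and exact matching against NCBI scientific names is only one possible matching rule. The names.dmp parsing relies on the file's documented layout, in which fields are separated by a tab, pipe, tab sequence and each line ends with a tab and a pipe.

```python
from pathlib import Path

# Assumption: rank columns ordered from coarsest to finest.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def load_scientific_names(names_dmp):
    """Build a scientific name -> taxid mapping from a names.dmp file."""
    name_to_taxid = {}
    with open(Path(names_dmp), encoding="utf-8") as handle:
        for line in handle:
            # Strip the line terminator "\t|" and split on the field separator.
            fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
            if len(fields) < 4:
                continue
            taxid, name, _unique_name, name_class = fields[:4]
            # Keep scientific names only; synonyms and common names are skipped.
            if name_class == "scientific name":
                name_to_taxid.setdefault(name, int(taxid))
    return name_to_taxid


def lookup_finest_taxid(row, name_to_taxid, ranks=RANKS):
    """Return (taxid, rank) for the finest rank in row that has a match."""
    for rank in reversed(ranks):
        name = row.get(rank)
        # Missing ranks (None, NaN, empty string) are skipped.
        if isinstance(name, str) and name in name_to_taxid:
            return name_to_taxid[name], rank
    return None, None
```

In this sketch, row is any mapping from rank name to taxon name, so the two returned values correspond to what the NCBI_taxid and taxid_matched_rank columns would hold for one ASV. With a pandas DataFrame, the helper could be applied to each row to fill both columns.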