capellini.stages.ncbi_mapping
NCBI mapping stage: download taxonomy names and assign real NCBI taxids.
Functions
| download_ncbi_names(download_path) | Download NCBI taxdmp.zip, extract names.dmp, and delete the zip. |
| run_ncbi_mapping(cfg) | Load taxonomy table, assign NCBI taxids, and return the updated DataFrame. |
- capellini.stages.ncbi_mapping.download_ncbi_names(download_path: str | Path) → Path
Download NCBI taxdmp.zip, extract names.dmp, and delete the zip.
Skips the download if names.dmp already exists.
- Parameters:
download_path – Directory where names.dmp will be saved.
- Returns:
Path to the names.dmp file.
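The download-extract-delete sequence described above can be sketched with the standard library alone. This is a minimal illustration, not capellini's implementation: the function name download_ncbi_names_sketch and its url parameter are hypothetical, and the NCBI FTP address for taxdmp.zip is an assumption, since this page does not name the source location.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumed source location; the capellini docs do not state which URL is used.
NCBI_TAXDMP_URL = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip"


def download_ncbi_names_sketch(download_path, url=NCBI_TAXDMP_URL) -> Path:
    """Hypothetical sketch: fetch taxdmp.zip, keep only names.dmp, delete the zip."""
    download_path = Path(download_path)
    download_path.mkdir(parents=True, exist_ok=True)
    names_file = download_path / "names.dmp"
    if names_file.exists():
        return names_file  # already present: skip the download entirely

    zip_file = download_path / "taxdmp.zip"
    urllib.request.urlretrieve(url, zip_file)
    try:
        with zipfile.ZipFile(zip_file) as zf:
            # Extract just names.dmp; the other .dmp members are not needed here.
            zf.extract("names.dmp", path=download_path)
    finally:
        zip_file.unlink(missing_ok=True)  # remove the archive even if extraction fails
    return names_file
```

Extracting a single member rather than the whole archive keeps disk usage down, because taxdmp.zip also bundles nodes.dmp and several other large files.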
- capellini.stages.ncbi_mapping.run_ncbi_mapping(cfg: CapelliniConfig) → DataFrame
Load taxonomy table, assign NCBI taxids, and return the updated DataFrame.
Loads the DADA2-produced taxonomy_table_{F|R|P}.csv, downloads the NCBI names file if it is not already present, looks up a real NCBI taxid for each ASV using the finest taxonomic rank available for that ASV, and adds the NCBI_taxid and taxid_matched_rank columns.
- Parameters:
cfg – Populated CapelliniConfig instance.
- Returns:
taxonomy_table DataFrame with NCBI_taxid and taxid_matched_rank columns added.
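The lookup step can be sketched in pandas as follows. This is a hypothetical illustration, not capellini's code: load_scientific_names and assign_taxids are invented names, and rank columns are passed in explicitly in coarse-to-fine order because the exact column names of taxonomy_table_{F|R|P}.csv are not documented here. It relies on the names.dmp layout, where fields are separated by "\t|\t" and each record ends with "\t|". Only names of class "scientific name" are indexed, and homonyms keep whichever taxid appears first. The sketch also falls back to coarser ranks when the finest name has no NCBI match; the docs only say "finest available rank", so whether run_ncbi_mapping does the same is unconfirmed. A bare species epithet (as DADA2's addSpecies produces) would need to be joined with the genus before lookup, which this sketch does not do.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


def load_scientific_names(names_dmp: str | Path) -> dict[str, int]:
    """Hypothetical helper: map scientific name -> NCBI taxid from names.dmp."""
    name_to_taxid: dict[str, int] = {}
    with open(names_dmp, encoding="utf-8") as fh:
        for line in fh:
            # Records look like "tax_id\t|\tname_txt\t|\tunique name\t|\tname class\t|"
            fields = [f.strip() for f in line.rstrip("\n").rstrip("|").split("\t|\t")]
            if len(fields) >= 4 and fields[3] == "scientific name":
                # NCBI has homonyms; this sketch keeps the first taxid seen.
                name_to_taxid.setdefault(fields[1], int(fields[0]))
    return name_to_taxid


def assign_taxids(
    taxonomy: pd.DataFrame, name_to_taxid: dict[str, int], ranks: list[str]
) -> pd.DataFrame:
    """Hypothetical helper: add NCBI_taxid and taxid_matched_rank columns."""
    taxids, matched = [], []
    for _, row in taxonomy.iterrows():
        taxid, hit_rank = pd.NA, pd.NA
        for rank in reversed(ranks):  # ranks are given coarse-to-fine, so start at the finest
            name = row.get(rank)
            if isinstance(name, str) and name in name_to_taxid:
                taxid, hit_rank = name_to_taxid[name], rank
                break  # stop at the finest rank that matches
        taxids.append(taxid)
        matched.append(hit_rank)
    out = taxonomy.copy()  # leave the caller's DataFrame unchanged
    out["NCBI_taxid"] = pd.array(taxids, dtype="Int64")  # nullable integer column
    out["taxid_matched_rank"] = matched
    return out
```

Using pandas' nullable Int64 dtype keeps unmatched ASVs as missing values instead of forcing the whole taxid column to float.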