capellini.stages.ncbi_mapping
NCBI mapping stage: download taxonomy names and assign real NCBI taxids.
Functions
download_ncbi_names(names_dmp_path, taxdmp_url) | Download taxdmp.zip, extract names.dmp into names_dmp_path.
run_ncbi_mapping(cfg) | Load taxonomy table, assign NCBI taxids, and return the updated DataFrame.
- capellini.stages.ncbi_mapping.download_ncbi_names(names_dmp_path: str | Path, taxdmp_url: str) -> Path
Download taxdmp.zip and extract names.dmp into names_dmp_path. Skips the download if names_dmp_path already exists.
- Parameters:
names_dmp_path – Destination path for names.dmp.
taxdmp_url – Source URL of the taxdmp.zip archive.
- Returns:
Path to the names.dmp file.
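The skip-if-present download described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: download_names_sketch is a hypothetical name, and the assumption that names.dmp sits at the top level of the archive is mine.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_names_sketch(names_dmp_path, taxdmp_url):
    """Hypothetical stand-in for download_ncbi_names; not the package's code."""
    names_dmp_path = Path(names_dmp_path)
    if names_dmp_path.exists():
        # Skip the download when the file is already present.
        return names_dmp_path

    names_dmp_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "taxdmp.zip"
        # Fetch the archive to a temporary location.
        with urllib.request.urlopen(taxdmp_url) as resp, open(archive, "wb") as out:
            shutil.copyfileobj(resp, out)
        # Copy only the names.dmp member (assumed to be at the archive root).
        with zipfile.ZipFile(archive) as zf:
            with zf.open("names.dmp") as src, open(names_dmp_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return names_dmp_path
```

Calling the sketch a second time with the same destination returns immediately without touching the network, which mirrors the documented skip behaviour.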
- capellini.stages.ncbi_mapping.run_ncbi_mapping(cfg: CapelliniConfig) -> DataFrame
Load taxonomy table, assign NCBI taxids, and return the updated DataFrame.
Loads the DADA2-produced taxonomy_table_{F|R|P}.csv, downloads the NCBI names file if needed, looks up the real NCBI taxid for each ASV at the finest available rank, and adds the NCBI_taxid and taxid_matched_rank columns.
- Parameters:
cfg – Populated CapelliniConfig instance.
- Returns:
taxonomy_table DataFrame with NCBI_taxid and taxid_matched_rank columns added.
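One way to picture the "finest available rank" lookup is shown below. It is a sketch under stated assumptions, not the function's actual logic: the rank column names, the helper names load_scientific_names and assign_taxid, and the choice to match only "scientific name" records are all hypothetical, and plain dicts stand in for DataFrame rows to keep the example dependency-free. The field layout of names.dmp (tax_id, name_txt, unique name, name class, separated by `|`) follows the NCBI taxonomy dump format.

```python
import io

# Assumed rank columns, finest first; the real table's columns may differ.
RANKS_FINEST_FIRST = ["Genus", "Family", "Order", "Class", "Phylum", "Kingdom"]


def load_scientific_names(handle):
    """Map scientific name -> taxid from names.dmp-formatted lines."""
    name_to_taxid = {}
    for line in handle:
        # names.dmp fields: tax_id | name_txt | unique name | name class
        fields = [f.strip() for f in line.rstrip("\n").split("|")]
        if len(fields) >= 4 and fields[3] == "scientific name":
            # Keep the first taxid seen for a given name.
            name_to_taxid.setdefault(fields[1], int(fields[0]))
    return name_to_taxid


def assign_taxid(row, name_to_taxid, ranks=RANKS_FINEST_FIRST):
    """Return (taxid, matched_rank) for the finest rank with a known name."""
    for rank in ranks:
        name = row.get(rank)
        if name and name in name_to_taxid:
            return name_to_taxid[name], rank
    return None, None


# Toy names.dmp content with two records.
names = io.StringIO(
    "561\t|\tEscherichia\t|\t\t|\tscientific name\t|\n"
    "543\t|\tEnterobacteriaceae\t|\t\t|\tscientific name\t|\n"
)
lookup = load_scientific_names(names)

# An ASV classified only to family level falls back to the family taxid.
asv = {"Family": "Enterobacteriaceae", "Genus": None}
taxid, rank = assign_taxid(asv, lookup)
```

In this sketch the two return values correspond to the NCBI_taxid and taxid_matched_rank columns; an ASV with no matching name at any rank yields (None, None).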