capellini.utils.io

I/O helpers: file reading, writing, downloading, subprocess execution.

Functions

download_if_missing(url, dest, *[, label])

Download url to dest if dest is missing.

open_maybe_gzip(path[, mode, encoding])

Open a file, transparently decompressing it if the path ends with .gz.

read_table(path[, index_col])

Read a CSV, TSV, or Excel file into a DataFrame, including gzip-compressed variants.

sh(cmd[, desc])

Run a shell command, printing the description and the command first, and raise on failure.

write_df(df, path, *[, overwrite, verbose])

Write a DataFrame to CSV, creating parent directories as needed.

capellini.utils.io.download_if_missing(url: str, dest: str | Path, *, label: str | None = None) → Path

Download url to dest if dest is missing.

Mirrors the simple urllib.request.urlretrieve(url, dest) pattern used in the notebook checkpoints. The downloaded file is not decompressed automatically.

Parameters:
  • url – Source URL. If url is empty and dest does not exist, ValueError is raised.

  • dest – Destination path. Parent directories are created if needed.

  • label – Optional human-readable label used in log/print messages.

Returns:

Path to the downloaded (or already-present) file.
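The download-once behavior described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name download_if_missing_sketch and the exact log wording are hypothetical, and the empty-URL check is assumed to run only when the file actually needs downloading, as the url parameter description suggests.

```python
import urllib.request
from pathlib import Path


def download_if_missing_sketch(url: str, dest, *, label=None) -> Path:
    """Hypothetical re-implementation of download_if_missing, for illustration only."""
    dest = Path(dest)
    name = label or dest.name
    if dest.exists():
        # Nothing to do: the file is already on disk, so the URL is never used.
        print(f"[skip] {name} already present at {dest}")
        return dest
    if not url:
        # Assumed behavior: an empty URL is only an error when a download is needed.
        raise ValueError(f"No URL given for missing file {dest}")
    # Create parent directories before writing.
    dest.parent.mkdir(parents=True, exist_ok=True)
    print(f"[download] {name}: {url} -> {dest}")
    # Plain urlretrieve: bytes are saved as-is, with no decompression.
    urllib.request.urlretrieve(url, str(dest))
    return dest
```

Because the existence check comes first, calling the function repeatedly (for example on every notebook run) only hits the network once.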

capellini.utils.io.open_maybe_gzip(path: str | Path, mode: str = 'rt', encoding: str = 'utf-8') → IO

Open a file, transparently decompressing it if the path ends with .gz.

Parameters:
  • path – Path to the file.

  • mode – File open mode.

  • encoding – Text encoding.

Returns:

File-like object.
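A minimal sketch of the extension-based dispatch, assuming the function chooses between gzip.open and the built-in open by looking only at the .gz suffix. The name open_maybe_gzip_sketch is hypothetical. One subtlety it handles: gzip.open treats a bare "r" as binary, so the sketch makes text mode explicit and passes an encoding only in text modes.

```python
import gzip
from pathlib import Path
from typing import IO


def open_maybe_gzip_sketch(path, mode: str = "rt", encoding: str = "utf-8") -> IO:
    """Hypothetical stand-in for open_maybe_gzip, for illustration only."""
    path = Path(path)
    if "b" not in mode and "t" not in mode:
        # gzip.open would read "r" as binary; make text mode explicit instead.
        mode += "t"
    # An encoding may only be given in text mode.
    enc = encoding if "t" in mode else None
    if path.suffix == ".gz":
        # gzip.open supports both text ("rt", "wt") and binary ("rb", "wb") modes.
        return gzip.open(path, mode, encoding=enc)
    return open(path, mode, encoding=enc)
```

Callers can then use the same with-block for plain and compressed files, e.g. `with open_maybe_gzip_sketch("genes.tsv.gz") as fh: header = fh.readline()`.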

capellini.utils.io.read_table(path: str | Path, index_col: int = 0, **kwargs)

Read a CSV, TSV, or Excel file into a DataFrame, including gzip-compressed variants.

Parameters:
  • path – Path to the tabular file.

  • index_col – Position of the column to use as the row index (default 0, the first column).

  • **kwargs – Additional keyword arguments forwarded to the underlying pandas reader function.

Returns:

pd.DataFrame with the file contents.
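How read_table picks a parser is not documented here; the sketch below shows one plausible way to map a file name, including a trailing .gz, to a reader and separator. The helper detect_table_format and the set of recognized extensions (.csv, .tsv, .xlsx, .xls) are assumptions for illustration. It uses only the standard library so it runs without pandas.

```python
from pathlib import Path


def detect_table_format(path):
    """Hypothetical helper: decide how a read_table-like function might parse path.

    Returns (reader, sep, gzipped), where reader is "csv" or "excel".
    """
    suffixes = [s.lower() for s in Path(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        # Look at the extension underneath .gz, e.g. ".tsv" in "x.tsv.gz".
        suffixes = suffixes[:-1]
    ext = suffixes[-1] if suffixes else ""
    if ext in (".xlsx", ".xls"):
        return "excel", None, gzipped
    if ext == ".tsv":
        return "csv", "\t", gzipped
    if ext == ".csv":
        return "csv", ",", gzipped
    raise ValueError(f"Unsupported table extension: {ext or '(none)'}")
```

For a "csv" result, a reader could then call `pandas.read_csv(path, sep=sep, index_col=index_col, **kwargs)`; read_csv's default `compression="infer"` already decompresses paths ending in .gz, so no separate gzip handling is needed on that branch. An "excel" result would go to `pandas.read_excel`.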

capellini.utils.io.sh(cmd: str, desc: str = '') → CompletedProcess

Run a shell command, printing the description and the command first, and raise on failure.

Parameters:
  • cmd – Shell command string.

  • desc – Human-readable description printed before execution.

Returns:

CompletedProcess result.

Raises:

RuntimeError – If the command exits with a non-zero return code. The error message includes the command's full stdout and stderr.
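The run-and-raise pattern described above can be sketched with subprocess.run. This is an assumed shape, not the library's code: the name sh_sketch and the printed prefixes are hypothetical, and the sketch assumes output is captured as text rather than streamed to the terminal.

```python
import subprocess


def sh_sketch(cmd: str, desc: str = "") -> subprocess.CompletedProcess:
    """Hypothetical stand-in for sh(): echo, run, and raise with full output."""
    if desc:
        print(f"# {desc}")
    print(f"$ {cmd}")
    # shell=True so pipes and redirects in cmd work; output is captured as str.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        # Include everything the command printed so failures are debuggable.
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: {cmd}\n"
            f"--- stdout ---\n{result.stdout}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result
```

Raising RuntimeError rather than using `check=True` (which raises CalledProcessError) lets the message carry stdout and stderr directly, which is convenient in notebooks where the exception text is what the user sees.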

capellini.utils.io.write_df(df, path: str | Path, *, overwrite: bool = True, verbose: bool = False, **to_csv_kwargs) → Path

Write a DataFrame to CSV, creating parent directories as needed.

Parameters:
  • df – DataFrame to write.

  • path – Destination file path.

  • overwrite – If False and the file already exists, skip writing.

  • verbose – If True, print a message when the file is saved or skipped.

  • **to_csv_kwargs – Forwarded to DataFrame.to_csv.

Returns:

Path to the written file.
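A minimal sketch of the skip/create/write sequence, assuming the skip branch still returns the existing path (the docs above do not say what is returned on a skip). The name write_df_sketch is hypothetical, and df only needs a to_csv(path, **kwargs) method, so a pandas DataFrame works but is not required to run the example.

```python
from pathlib import Path


def write_df_sketch(df, path, *, overwrite: bool = True, verbose: bool = False,
                    **to_csv_kwargs) -> Path:
    """Hypothetical stand-in for write_df; df needs a to_csv(path, **kw) method."""
    path = Path(path)
    if path.exists() and not overwrite:
        # Assumption: a skipped write still returns the existing path.
        if verbose:
            print(f"[skip] {path} exists (overwrite=False)")
        return path
    # Create missing parent directories before writing.
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, **to_csv_kwargs)
    if verbose:
        print(f"[saved] {path}")
    return path
```

With pandas, a call such as `write_df_sketch(df, "results/tables/summary.csv", index=False)` forwards `index=False` to DataFrame.to_csv and creates results/tables/ if it does not exist yet.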