capellini.utils.io

I/O helpers for file reading, file writing, and subprocess execution.

Functions

open_maybe_gzip(path[, mode, encoding])

Open a file, transparently decompressing if it ends with .gz.

read_table(path[, index_col])

Read a CSV, TSV, or Excel file into a DataFrame, including gzip-compressed variants.

sh(cmd[, desc])

Run a shell command after printing its description and the command itself; raise on failure.

write_df(df, path, *[, overwrite, verbose])

Write a DataFrame to CSV, creating parent directories as needed.

capellini.utils.io.open_maybe_gzip(path: str | Path, mode: str = 'rt', encoding: str = 'utf-8') → IO

Open a file, transparently decompressing if it ends with .gz.

Parameters:
  • path – Path to the file.

  • mode – File open mode.

  • encoding – Text encoding.

Returns:

File-like object.
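A minimal sketch of how a helper like this can work, assuming it checks the filename suffix and uses the standard-library `gzip.open` for `.gz` files. This is an illustration, not the capellini source. One detail worth showing: binary modes such as `'rb'` must not be given an `encoding`, so the sketch drops it in that case.

```python
import gzip
import tempfile
from pathlib import Path
from typing import IO, Union


def open_maybe_gzip(path: Union[str, Path], mode: str = "rt", encoding: str = "utf-8") -> IO:
    """Sketch: open ``path``, decompressing transparently if it ends with '.gz'."""
    path = Path(path)
    # Binary modes ('rb', 'wb') reject an encoding argument, so pass None for them.
    enc = None if "b" in mode else encoding
    if path.suffix == ".gz":
        # gzip.open supports both text ('rt', 'wt') and binary ('rb', 'wb') modes.
        return gzip.open(path, mode, encoding=enc)
    return open(path, mode, encoding=enc)


# Usage: round-trip a line of text through a gzip-compressed file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello.txt.gz"
    with open_maybe_gzip(target, "wt") as fh:
        fh.write("hello\n")
    with open_maybe_gzip(target) as fh:
        print(fh.read(), end="")  # prints "hello"
```

Because the returned object is a regular file handle, using it in a `with` block ensures it gets closed.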

capellini.utils.io.read_table(path: str | Path, index_col: int = 0, **kwargs)

Read a CSV, TSV, or Excel file into a DataFrame, including gzip-compressed variants.

Parameters:
  • path – Path to the tabular file.

  • index_col – Column to use as the row index.

  • **kwargs – Additional keyword arguments forwarded to the underlying pandas reader function.

Returns:

pd.DataFrame with the file contents.
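The sketch below shows one plausible way to dispatch on file type. It assumes, without confirmation from the source, that the format is chosen from the extension after stripping a trailing `.gz`, and that CSV/TSV go through `pandas.read_csv` while Excel goes through `pandas.read_excel`. `read_csv` infers gzip compression from a `.gz` suffix by default. `read_excel` has no compression option, so compressed workbooks are decompressed into memory first.

```python
import gzip
import io
from pathlib import Path
from typing import Union

import pandas as pd


def read_table(path: Union[str, Path], index_col: int = 0, **kwargs) -> pd.DataFrame:
    """Sketch: read CSV/TSV/Excel (optionally .gz) based on the file extension."""
    path = Path(path)
    suffixes = [s.lower() for s in path.suffixes]
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    if compressed:
        suffixes = suffixes[:-1]  # e.g. 'data.tsv.gz' -> ['.tsv']
    ext = suffixes[-1] if suffixes else ""

    if ext in (".xlsx", ".xls"):
        if compressed:
            # read_excel cannot decompress, so unpack the bytes into a buffer first.
            with gzip.open(path, "rb") as fh:
                return pd.read_excel(io.BytesIO(fh.read()), index_col=index_col, **kwargs)
        return pd.read_excel(path, index_col=index_col, **kwargs)

    # Tab-separated for .tsv, comma-separated otherwise; a caller-supplied sep wins.
    kwargs.setdefault("sep", "\t" if ext == ".tsv" else ",")
    # compression='infer' (the read_csv default) handles the '.gz' case.
    return pd.read_csv(path, index_col=index_col, **kwargs)
```

With `index_col=0` (the default), the first column becomes the row index, which matches what `DataFrame.to_csv` writes by default.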

capellini.utils.io.sh(cmd: str, desc: str = '') → CompletedProcess

Run a shell command after printing its description and the command itself; raise on failure.

Parameters:
  • cmd – Shell command string.

  • desc – Human-readable description printed before execution.

Returns:

The subprocess.CompletedProcess object for the finished command.

Raises:

RuntimeError – If the command exits with a non-zero return code. The error message includes the command's full stdout and stderr.
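Here is a minimal sketch of the behaviour described above. It assumes `subprocess.run` with `shell=True` and captured text output, which may differ from the actual implementation. The point it illustrates is that output is captured rather than streamed, so it can be embedded in the `RuntimeError` message when the command fails.

```python
import subprocess


def sh(cmd: str, desc: str = "") -> subprocess.CompletedProcess:
    """Sketch: echo the description and command, run it, raise on failure."""
    if desc:
        print(desc)
    print(f"$ {cmd}")
    # Capture stdout/stderr as text so they can be reported on failure.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: {cmd}\n"
            f"--- stdout ---\n{result.stdout}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result


# Usage (POSIX shell assumed):
res = sh("echo hello", desc="Say hello")
print(res.stdout.strip())  # prints "hello"
```

Capturing output keeps the failure message self-contained. The trade-off is that long-running commands show nothing until they finish.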

capellini.utils.io.write_df(df, path: str | Path, *, overwrite: bool = True, verbose: bool = False, **to_csv_kwargs) → Path

Write a DataFrame to CSV, creating parent directories as needed.

Parameters:
  • df – DataFrame to write.

  • path – Destination file path.

  • overwrite – If False and the file already exists, skip writing and leave the existing file untouched.

  • verbose – If True, print a message when the file is skipped or saved.

  • **to_csv_kwargs – Forwarded to DataFrame.to_csv.

Returns:

Path to the written file.
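The following sketch shows the documented skip/overwrite logic. It is not the capellini source. It assumes the function returns the destination path even when writing is skipped, and the printed message wording is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Union

import pandas as pd


def write_df(df: pd.DataFrame, path: Union[str, Path], *, overwrite: bool = True,
             verbose: bool = False, **to_csv_kwargs) -> Path:
    """Sketch: write ``df`` to CSV, creating parent dirs; optionally skip existing files."""
    path = Path(path)
    if path.exists() and not overwrite:
        if verbose:
            print(f"Skipping existing file: {path}")  # hypothetical message text
        return path  # assumption: the path is returned even when skipped
    # Create any missing parent directories before writing.
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, **to_csv_kwargs)
    if verbose:
        print(f"Saved: {path}")  # hypothetical message text
    return path


# Usage: the nested directories are created automatically.
with tempfile.TemporaryDirectory() as tmp:
    out = write_df(pd.DataFrame({"a": [1]}), Path(tmp) / "results" / "run1" / "a.csv",
                   index=False, verbose=True)
    print(out.read_text(), end="")
```

Extra keyword arguments such as `index=False` pass straight through to `DataFrame.to_csv`.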