capellini.utils.io
I/O helpers: file reading, writing, downloading, subprocess execution.
Functions
| download_if_missing | Download url to dest if dest is missing. |
| open_maybe_gzip | Open a file, transparently decompressing if it ends with .gz. |
| read_table | Read a CSV, TSV, or Excel file into a DataFrame, including gzip-compressed variants. |
| sh | Run a shell command, printing description and command, raising on failure. |
| write_df | Write a DataFrame to CSV, creating parent directories as needed. |
- capellini.utils.io.download_if_missing(url: str, dest: str | Path, *, label: str | None = None) → Path
Download url to dest if dest is missing.
Mirrors the simple urllib.request.urlretrieve(url, dest) pattern used in the notebook checkpoints; no automatic decompression.
- Parameters:
url – Source URL. If it is empty and the file is missing, ValueError is raised.
dest – Destination path. Parent directories are created if needed.
label – Optional human-readable label used in log/print messages.
- Returns:
Path to the downloaded (or already-present) file.
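The download-if-missing behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the capellini implementation: the name download_if_missing_sketch and the exact print messages are assumptions, and only the documented behavior (skip when present, ValueError on an empty URL, parent directories created, no decompression) is taken from the text.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing_sketch(url, dest, *, label=None):
    """Fetch url into dest unless dest already exists (illustrative only)."""
    dest = Path(dest)
    name = label or dest.name  # assumed fallback when no label is given
    if dest.exists():
        print(f"[skip] {name} already present at {dest}")
        return dest
    if not url:
        # Documented behavior: an empty URL is an error only when the file is missing.
        raise ValueError(f"{name} is missing at {dest} and no URL was provided")
    dest.parent.mkdir(parents=True, exist_ok=True)
    print(f"[download] {name}: {url} -> {dest}")
    urlretrieve(url, str(dest))  # plain copy of the bytes, no decompression
    return dest
```

Because an existing file short-circuits the call, repeated runs of a notebook cell that calls such a helper would touch the network only once.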
- capellini.utils.io.open_maybe_gzip(path: str | Path, mode: str = 'rt', encoding: str = 'utf-8') → IO
Open a file, transparently decompressing if it ends with .gz.
- Parameters:
path – Path to the file.
mode – File open mode.
encoding – Text encoding.
- Returns:
File-like object.
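One way to get this kind of transparent handling is to switch between gzip.open and the built-in open based on the file name. The sketch below is an assumption about how such a helper could look, not the library's code; open_maybe_gzip_sketch is a hypothetical name, and the mode handling for binary and bare "r" modes is a choice made here for illustration.

```python
import gzip


def open_maybe_gzip_sketch(path, mode="rt", encoding="utf-8"):
    """Open path, going through gzip when the name ends with .gz (illustrative only)."""
    binary = "b" in mode
    # Binary streams take no encoding; passing one raises ValueError.
    enc = None if binary else encoding
    if str(path).endswith(".gz"):
        # gzip.open treats a bare "r"/"w"/"a" as binary, so request text mode explicitly.
        if not binary and "t" not in mode:
            mode += "t"
        return gzip.open(path, mode, encoding=enc)
    return open(path, mode, encoding=enc)
```

The returned object works as a context manager in both branches, so callers can use the same with-block for plain and compressed files.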
- capellini.utils.io.read_table(path: str | Path, index_col: int = 0, **kwargs)
Read a CSV, TSV, or Excel file into a DataFrame, including gzip-compressed variants.
- Parameters:
path – Path to the tabular file.
index_col – Column to use as the row index.
**kwargs – Additional keyword arguments forwarded to pandas.
- Returns:
pd.DataFrame with the file contents.
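A reader like this typically dispatches on the file extension. The sketch below shows one possible dispatch, under assumptions: the names table_kind and read_table_sketch, the set of recognized extensions, and the choice of pandas readers are all illustrative and not taken from the capellini source. pandas is imported lazily so the extension logic can be used without it, and gzip-compressed Excel files are not handled here.

```python
from pathlib import Path


def table_kind(path):
    """Classify a path as 'csv', 'tsv', or 'excel', ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    ext = suffixes[-1] if suffixes else ""
    if ext == ".csv":
        return "csv"
    if ext == ".tsv":
        return "tsv"
    if ext in {".xlsx", ".xls"}:
        return "excel"
    raise ValueError(f"Unrecognized table extension: {path}")


def read_table_sketch(path, index_col=0, **kwargs):
    """Read a table into a DataFrame based on its extension (illustrative only)."""
    import pandas as pd  # lazy import; requires pandas to be installed

    kind = table_kind(path)
    if kind == "excel":
        return pd.read_excel(path, index_col=index_col, **kwargs)
    # pandas.read_csv infers gzip compression from a .gz suffix by default.
    kwargs.setdefault("sep", "\t" if kind == "tsv" else ",")
    return pd.read_csv(path, index_col=index_col, **kwargs)
```

Note that the default index_col=0 in the documented signature means the first column becomes the row index unless the caller overrides it.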
- capellini.utils.io.sh(cmd: str, desc: str = '') → CompletedProcess
Run a shell command, printing description and command, raising on failure.
- Parameters:
cmd – Shell command string.
desc – Human-readable description printed before execution.
- Returns:
CompletedProcess result.
- Raises:
RuntimeError – If the command exits with a non-zero return code. The message includes the full stdout and stderr.
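The run-print-raise pattern described above maps naturally onto subprocess.run. The following is a minimal sketch under assumptions: sh_sketch is a hypothetical name, and the output capturing, message layout, and "$ " prefix are illustrative choices rather than the library's actual behavior.

```python
import subprocess


def sh_sketch(cmd, desc=""):
    """Run cmd through the shell and raise RuntimeError on failure (illustrative only)."""
    if desc:
        print(desc)
    print(f"$ {cmd}")
    # Capture both streams as text so they can be embedded in the error message.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: {cmd}\n"
            f"--- stdout ---\n{result.stdout}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result
```

Since the command string is passed to the shell unchanged, a helper of this shape should only be given trusted input.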
- capellini.utils.io.write_df(df, path: str | Path, *, overwrite: bool = True, verbose: bool = False, **to_csv_kwargs) → Path
Write a DataFrame to CSV, creating parent directories as needed.
- Parameters:
df – DataFrame to write.
path – Destination file path.
overwrite – If False and the file already exists, skip writing.
verbose – Print a message on skip or save.
**to_csv_kwargs – Forwarded to DataFrame.to_csv.
- Returns:
Path to the written file.
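The skip-or-write logic can be sketched as follows. This is an illustration under assumptions, not the capellini implementation: write_df_sketch is a hypothetical name, the print messages are invented, and returning the path on a skip is assumed from the documented return type. The function only relies on df having a to_csv method, so it needs no pandas import itself.

```python
from pathlib import Path


def write_df_sketch(df, path, *, overwrite=True, verbose=False, **to_csv_kwargs):
    """Write df to CSV at path, creating parent directories (illustrative only)."""
    path = Path(path)
    if path.exists() and not overwrite:
        if verbose:
            print(f"[skip] {path} already exists")
        return path  # assumed: the existing path is returned on skip
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, **to_csv_kwargs)  # any object with a to_csv method works
    if verbose:
        print(f"[saved] {path}")
    return path
```

With pandas installed, a call such as write_df_sketch(frame, "out/results/table.csv", index=False) would create out/results if needed and forward index=False to DataFrame.to_csv.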