file_ops Module

file_ops.py

This module provides utility functions for handling file operations commonly needed in data processing workflows, including compression, encryption, directory management, file movement, extraction, and data import.

It supports operations such as:

  • Creating password-protected ZIP archives.

  • Reading from a .env file.

  • Copying, moving, and renaming files.

  • Creating directories safely.

  • Unzipping CSV files from archives.

  • Reading Excel files into pandas DataFrames.

  • Retrieving and sorting files by name or creation date.

exception utils.file_ops.MissingEnvironmentVariable

Bases: RuntimeError

Exception raised when a required environment variable is missing.

utils.file_ops.create_directories(directories: List[str] | None = None) → None

Creates multiple directories if they don’t exist.

Creates input and output directories by default.

Parameters:

directories (List[str], optional) – A list of directory paths to create. Defaults to ["input", "output"].
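The documented behavior can be approximated with the standard library. The following is a minimal sketch, assuming the function is a thin wrapper around `os.makedirs` with `exist_ok=True`; the name `create_directories_sketch` is hypothetical and the module's real implementation may differ.

```python
import os
import tempfile
from typing import List, Optional


def create_directories_sketch(directories: Optional[List[str]] = None) -> None:
    """Hypothetical stand-in for create_directories, not the module's code."""
    if directories is None:
        # Documented default: an "input" and an "output" directory.
        directories = ["input", "output"]
    for directory in directories:
        # exist_ok=True makes the call harmless when the directory exists.
        os.makedirs(directory, exist_ok=True)


# Usage: create two directories inside a throwaway temporary location.
base = tempfile.mkdtemp()
targets = [os.path.join(base, "input"), os.path.join(base, "output")]
create_directories_sketch(targets)
create_directories_sketch(targets)  # second call changes nothing
```

Passing explicit paths, as in the usage lines, avoids creating the default directories relative to whatever the current working directory happens to be.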

utils.file_ops.get_env_variable(name: str, *, required: bool = True, default: str | None = None, redact: bool = True) → str

Fetches an environment variable and raises a dedicated exception when a required variable is missing.

Parameters:
  • name (str) – The name of the environment variable.

  • required (bool, optional) – If True, raises an exception if the variable is missing. Defaults to True.

  • default (str, optional) – The value to return if the variable is missing and not required. Defaults to None.

  • redact (bool, optional) – If True, hides the value in logs. Defaults to True.

Returns:

The value of the environment variable.

Return type:

str

Raises:

MissingEnvironmentVariable – If the variable is required and missing.
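The interaction between `required`, `default`, and `redact` can be sketched as follows. This is an illustration only, assuming that an unset variable counts as missing, that `required=True` takes precedence over `default`, and that redaction replaces the value with `***` in a debug log line. The `_sketch` name and the locally redefined exception are hypothetical stand-ins so the example runs on its own.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


class MissingEnvironmentVariable(RuntimeError):
    """Redefined locally so the sketch is self-contained."""


def get_env_variable_sketch(
    name: str,
    *,
    required: bool = True,
    default: Optional[str] = None,
    redact: bool = True,
) -> Optional[str]:
    """Hypothetical stand-in for get_env_variable, not the module's code."""
    value = os.environ.get(name)
    if value is None:
        if required:
            # Assumption: a required variable never falls back to the default.
            raise MissingEnvironmentVariable(
                f"Required environment variable {name!r} is not set."
            )
        return default
    # Assumption: redaction only affects what is written to the log.
    logger.debug("Loaded %s=%s", name, "***" if redact else value)
    return value


# Usage with throwaway variable names.
os.environ["DEMO_API_KEY"] = "s3cret"
os.environ.pop("DEMO_MISSING", None)

api_key = get_env_variable_sketch("DEMO_API_KEY")
fallback = get_env_variable_sketch("DEMO_MISSING", required=False, default="n/a")

try:
    get_env_variable_sketch("DEMO_MISSING")
    raised = False
except MissingEnvironmentVariable:
    raised = True
```

Because `MissingEnvironmentVariable` derives from `RuntimeError`, callers that already catch `RuntimeError` will also catch it.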

utils.file_ops.get_git_repo_name(path: str | None = None) → str

Returns the name of the current Git repository.

Returns:

The name of the Git repository, or None if not in a repository.

Return type:

str | None

utils.file_ops.get_git_root(path: str | None = None) → str | None

Returns the root directory of the Git repository containing the path.

Parameters:

path (str, optional) – The path to start searching from. Defaults to the current working directory.

Returns:

The root directory of the Git repository, or None if not in a repository.

Return type:

str | None
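One common way to locate a repository root is to walk up the directory tree until a `.git` entry is found. The sketch below assumes that approach for `get_git_root` and assumes `get_git_repo_name` returns the base name of that root; the module may instead call the `git` executable. Both `_sketch` names are hypothetical.

```python
import os
import tempfile
from typing import Optional


def get_git_root_sketch(path: Optional[str] = None) -> Optional[str]:
    """Hypothetical stand-in for get_git_root, not the module's code."""
    current = os.path.abspath(path or os.getcwd())
    while True:
        # .git is a directory in normal clones and a file in worktrees.
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding a repository.
            return None
        current = parent


def get_git_repo_name_sketch(path: Optional[str] = None) -> Optional[str]:
    """Assumption: the repository name is the root directory's base name."""
    root = get_git_root_sketch(path)
    return os.path.basename(root) if root else None


# Usage: fake a repository layout in a temporary directory.
base = os.path.realpath(tempfile.mkdtemp())
repo = os.path.join(base, "demo_repo")
nested = os.path.join(repo, "src", "pkg")
os.makedirs(os.path.join(repo, ".git"))
os.makedirs(nested)

found_root = get_git_root_sketch(nested)
found_name = get_git_repo_name_sketch(nested)
```

Since both functions can yield None outside a repository, callers should check the result before joining it with other path components.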

utils.file_ops.load_env_once() → None

Loads the .env file from the project root, if it exists.

Does NOT override existing environment variables. Safe to call multiple times.
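The two guarantees stated above, no overriding and safe repeated calls, can be illustrated with a small standard-library sketch. It assumes a simple `KEY=value` file format and takes the file location as an `env_path` argument, which is a hypothetical parameter added for the demonstration; the documented function takes no arguments and may rely on a dedicated .env parsing library.

```python
import os
import tempfile


def load_env_once_sketch(env_path: str) -> None:
    """Hypothetical stand-in for load_env_once, not the module's code."""
    # A function attribute serves as the "already loaded" guard.
    if getattr(load_env_once_sketch, "_loaded", False):
        return
    if not os.path.isfile(env_path):
        return
    with open(env_path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault keeps any value already present in the environment.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
    load_env_once_sketch._loaded = True


# Usage: one variable is already set in the environment, one is new.
env_file = os.path.join(tempfile.mkdtemp(), ".env")
with open(env_file, "w", encoding="utf-8") as handle:
    handle.write("# demo settings\nDEMO_EXISTING=from_file\nDEMO_NEW='hello'\n")

os.environ["DEMO_EXISTING"] = "from_shell"
os.environ.pop("DEMO_NEW", None)

load_env_once_sketch(env_file)
load_env_once_sketch(env_file)  # second call returns immediately
```

Letting values from the shell win over the file means deployment-specific settings can be injected without editing the .env file.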

utils.file_ops.zip_files_with_password(files: List[str], zip_filename: str, password: str, output_dir: str = '.') → None

Creates a password-protected ZIP archive containing the specified files.

Parameters:
  • files (List[str]) – A list of file paths to be included in the ZIP archive.

  • zip_filename (str) – The name of the resulting ZIP file.

  • password (str) – The password to protect the ZIP archive.

  • output_dir (str, optional) – The directory where the ZIP file will be saved. Defaults to the current directory (".").