- UV - Fast Python package manager (manages Python versions automatically)
- Install UV (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
- Clone the repository:
git clone <repository-url>
cd igh-data-transform
- Install Python and dependencies:
# UV will automatically install Python 3.12 if needed
uv sync
You can run the CLI tool without activating the virtual environment using uv run:
# Show available commands
uv run igh-transform --help
Alternatively, activate the virtual environment first:
source .venv/bin/activate # On Linux/Mac
# or
.venv\Scripts\activate # On Windows
# Then run normally
igh-transform --help
Transform raw Bronze layer data to cleaned Silver layer:
uv run igh-transform bronze-to-silver --bronze-db ./data/bronze.db --silver-db ./data/silver.db
This applies cleanup transformations:
- Drops columns that are entirely null (preserves valid_from/valid_to)
- Normalizes whitespace in text fields
- Ready for table-specific column renames and value mappings
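As a rough illustration of the two cleanup rules above, here is a minimal standard-library sketch. It is not the project's implementation: the function names, the list-of-dicts row format, and the exact meaning of "normalize" (trim plus collapsing whitespace runs to a single space) are all assumptions made for the example.

```python
import re

# Columns kept even when entirely null, as named in the list above.
PRESERVED_COLUMNS = {"valid_from", "valid_to"}


def drop_all_null_columns(rows, preserved=PRESERVED_COLUMNS):
    """Return copies of rows without columns that are None in every row."""
    if not rows:
        return []
    keep = [
        col
        for col in rows[0]
        if col in preserved or any(row.get(col) is not None for row in rows)
    ]
    return [{col: row.get(col) for col in keep} for row in rows]


def normalize_whitespace(rows):
    """Trim text values and collapse internal whitespace runs (assumed rule)."""

    def clean(value):
        if isinstance(value, str):
            return re.sub(r"\s+", " ", value).strip()
        return value

    return [{col: clean(val) for col, val in row.items()} for row in rows]


def clean_rows(rows):
    """Apply both hypothetical cleanup steps in sequence."""
    return normalize_whitespace(drop_all_null_columns(rows))
```

Calling clean_rows on rows where a notes column is None everywhere would drop notes but keep valid_to, even if valid_to is also None in every row.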
Transform Silver layer to pre-aggregated Gold layer (not yet implemented):
uv run igh-transform silver-to-gold --silver-db ./data/silver.db
This project uses igh-data-sync to pull data from Microsoft Dataverse before applying transformations.
Setup:
- Configure environment variables - Create a .env file with your Dataverse credentials:
CLIENT_ID=your-azure-client-id
CLIENT_SECRET=your-azure-client-secret
SCOPE=https://your-org.crm.dynamics.com/.default
API_URL=https://your-org.api.crm.dynamics.com/api/data/v9.2/
SQLITE_DB_PATH=./data/dataverse.db
- Run the sync - Pull data from Dataverse to local SQLite:
uv run sync-dataverse
- Verify the data (optional) - Check foreign key integrity:
uv run sync-dataverse --verify
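Before running the sync, it can help to confirm that the .env file contains every variable listed in the setup steps. The following is a small sketch of such a check; parse_env_text and missing_keys are hypothetical helpers written for this example, and treating all five variables as required is an assumption rather than documented igh-data-sync behaviour.

```python
# The five variable names shown in the .env example above.
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "SCOPE", "API_URL", "SQLITE_DB_PATH")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in declared order."""
    return [key for key in required if not values.get(key)]
```

For example, missing_keys(parse_env_text(open(".env").read())) returns an empty list when all five variables are set.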
The synced data will be stored in a SQLite database with SCD2 (Slowly Changing Dimension Type 2) versioning for historical tracking.
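To show what SCD2 versioning means when querying, here is a sketch against an in-memory SQLite database. The account table and its rows are hypothetical, and it assumes the current version of a record is the one whose valid_to is NULL; check the schema produced by the sync before relying on either assumption.

```python
import sqlite3

# Hypothetical table and rows, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT, name TEXT, valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        ("a1", "Old Name", "2024-01-01", "2024-06-01"),
        ("a1", "New Name", "2024-06-01", None),
        ("a2", "Other", "2024-03-01", None),
    ],
)


def current_rows(conn):
    """Rows whose validity interval is still open (assumed: valid_to IS NULL)."""
    return conn.execute(
        "SELECT id, name FROM account WHERE valid_to IS NULL ORDER BY id"
    ).fetchall()


def rows_as_of(conn, date):
    """Rows valid on the given ISO date, treating [valid_from, valid_to) as half-open."""
    return conn.execute(
        "SELECT id, name FROM account "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY id",
        (date, date),
    ).fetchall()
```

Here current_rows returns the latest version of each record, while rows_as_of reconstructs the table as it stood on an earlier date.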
The project uses UV for dependency management. Common commands:
- Add a dependency: uv add <package-name>
- Add a dev dependency: uv add --dev <package-name>
- Update dependencies: uv sync
- Run commands without activating venv: uv run <command>
- Run tests: uv run pytest
- Run tests with coverage: uv run pytest --cov=igh_data_transform --cov-report=term-missing
- Run linter: uv run ruff check src/ tests/
- Adding Transformations - Guide for data analysts on how to add new data transformations