
@fordN fordN commented Jan 9, 2026

This PR creates a shared set of generalized integration tests that can be used across loader implementations. Previously, each loader implemented its own integration test harness and tests, which led to heavy code duplication and significant testing work for anyone implementing a new loader.

The new setup provides a set of shared base tests and a set of shared streaming tests. To use them, each loader/datastore only has to implement four functions against its destination data store: get_row_count, query_rows, cleanup_table, and get_column_names. Loader implementers may also add loader-specific tests for functionality unique to their implementation.
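As a rough illustration of the four-function contract, here is a minimal sketch. The class names, method signatures, the in-memory backend, and the `check_basic_load` helper are all assumptions made for this example; only the four function names come from this PR.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class LoaderTestConfig(ABC):
    """Hypothetical interface: the four hooks a backend supplies."""

    @abstractmethod
    def get_row_count(self, table_name: str) -> int:
        """Return the number of rows stored under table_name."""

    @abstractmethod
    def query_rows(self, table_name: str) -> List[Row]:
        """Return the stored rows as dictionaries."""

    @abstractmethod
    def cleanup_table(self, table_name: str) -> None:
        """Remove the table's data so the next test starts clean."""

    @abstractmethod
    def get_column_names(self, table_name: str) -> List[str]:
        """Return the column names of the stored table."""


class InMemoryTestConfig(LoaderTestConfig):
    """Toy backend (a dict of row lists) standing in for a real data store."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}

    def get_row_count(self, table_name: str) -> int:
        return len(self.tables.get(table_name, []))

    def query_rows(self, table_name: str) -> List[Row]:
        return list(self.tables.get(table_name, []))

    def cleanup_table(self, table_name: str) -> None:
        self.tables.pop(table_name, None)

    def get_column_names(self, table_name: str) -> List[str]:
        rows = self.tables.get(table_name, [])
        return list(rows[0].keys()) if rows else []


def check_basic_load(
    config: LoaderTestConfig,
    load: Callable[[str, List[Row]], None],
    table_name: str,
    rows: List[Row],
) -> None:
    """A shared test body: written once, it only talks to the four hooks."""
    config.cleanup_table(table_name)
    load(table_name, rows)
    assert config.get_row_count(table_name) == len(rows)
    assert config.get_column_names(table_name) == list(rows[0].keys())
    assert config.query_rows(table_name) == rows
```

Because the shared test body never touches a backend directly, a new loader would only need its own subclass of the hook interface to reuse it.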

These changes significantly reduce the overall loader integration testing code, as well as the amount of code a loader developer has to write to integration-test a new loader.

Summary:

  • Before: 5,302 lines of test code across 6 loaders
  • After: 2,626 lines (670 shared + 1,956 backend-specific)
  • Saved: 2,676 lines (50% reduction)

Per-Loader Average

  • Before: ~883 lines per loader (all standalone, duplicated)
  • After: ~326 backend-specific lines per loader (inherits 12 tests automatically)
  • Savings: ~557 lines per loader (63% less work)
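The figures above can be re-derived with a few lines of arithmetic. The variable names are illustrative; the inputs are the line counts quoted in the summary, and the per-loader "after" figure counts only backend-specific lines.

```python
# Line counts quoted in the summary above.
before_total = 5302
shared = 670
backend_specific = 1956
loaders = 6

after_total = shared + backend_specific            # total test code after the change
saved = before_total - after_total                 # lines removed overall
saved_pct = round(100 * saved / before_total)      # overall reduction, in percent

per_loader_before = before_total / loaders         # average standalone suite size
per_loader_after = backend_specific / loaders      # backend-specific code only
per_loader_saved_pct = round(100 * (1 - per_loader_after / per_loader_before))

print(after_total, saved, saved_pct)
print(int(per_loader_before), int(per_loader_after), per_loader_saved_pct)
```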

Resolves #6

fordN added 4 commits January 8, 2026 18:12
Foundational work to enable all loader tests to inherit common test patterns.

Adds generalized streaming test infrastructure and migrates Redis and Snowflake loader tests to use the shared base classes.

Migrates the final three loader test suites to use the shared base test infrastructure.

@fordN fordN self-assigned this Jan 9, 2026
@fordN fordN added the enhancement New feature or request label Jan 9, 2026
@fordN fordN force-pushed the ford/generalize-loader-tests branch from 31d82d2 to 1010d2a Compare January 9, 2026 17:47
@fordN fordN force-pushed the ford/generalize-loader-tests branch from 1010d2a to f910b03 Compare January 9, 2026 17:51
fordN added 8 commits February 2, 2026 13:11
The Iceberg loader was using an outdated reorg deletion method that relied on
a _meta_block_ranges column instead of the current
state_store + _amp_batch_id approach.

Changes:
1. _handle_reorg now uses state_store.invalidate_from_block() to get
   affected batch IDs, matching PostgreSQL/Snowflake/DeltaLake approach

2. _perform_reorg_deletion now filters rows by _amp_batch_id instead of
   trying to parse non-existent _meta_block_ranges JSON column

3. Efficient filtering using set membership checks on batch IDs
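
The batch-ID approach described above can be sketched in plain Python. This is an illustration only: `InMemoryStateStore`, its `record` method, the return type of `invalidate_from_block`, and the row layout are assumptions, not the loader's actual implementation.

```python
from typing import Any, Dict, List, Set, Tuple


class InMemoryStateStore:
    """Hypothetical stand-in for the loader's state store."""

    def __init__(self) -> None:
        # batch_id -> (start_block, end_block)
        self._batches: Dict[str, Tuple[int, int]] = {}

    def record(self, batch_id: str, start_block: int, end_block: int) -> None:
        self._batches[batch_id] = (start_block, end_block)

    def invalidate_from_block(self, block: int) -> Set[str]:
        """Forget every batch that reaches block or later; return their IDs."""
        affected = {
            batch_id
            for batch_id, (_, end_block) in self._batches.items()
            if end_block >= block
        }
        for batch_id in affected:
            del self._batches[batch_id]
        return affected


def perform_reorg_deletion(
    rows: List[Dict[str, Any]], affected_batch_ids: Set[str]
) -> List[Dict[str, Any]]:
    """Keep only rows whose _amp_batch_id is not in the affected set."""
    return [row for row in rows if row["_amp_batch_id"] not in affected_batch_ids]
```

Using a set for the affected IDs keeps each membership check constant-time on average, so the filter is a single pass over the rows.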

Changed test_append_mode to append rows with new IDs (6-10) instead
of reusing the IDs from the first load (1-5), which avoids duplicate key
conflicts in key-value stores like LMDB and Redis.
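
To see why reused IDs are a problem for key-value backends, consider this toy model. The dict-backed store and the helper names are hypothetical; whether a real store raises or silently overwrites on a duplicate key depends on the backend, but in both cases a second load of IDs 1-5 would not grow the row count to 10.

```python
from typing import Any, Dict, List


def make_rows(first_id: int, last_id: int) -> List[Dict[str, Any]]:
    """Build test rows with IDs first_id..last_id inclusive."""
    return [{"id": i, "value": f"row-{i}"} for i in range(first_id, last_id + 1)]


def kv_append(store: Dict[int, Dict[str, Any]], rows: List[Dict[str, Any]]) -> None:
    """Append rows keyed by 'id', rejecting keys that already exist."""
    for row in rows:
        key = row["id"]
        if key in store:
            raise KeyError(f"duplicate key: {key}")
        store[key] = row


store: Dict[int, Dict[str, Any]] = {}
kv_append(store, make_rows(1, 5))    # initial load
kv_append(store, make_rows(6, 10))   # append with fresh IDs succeeds
```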
- Redis stream storage: Corrected test to use f'{table_name}:stream' key
  format to match how Redis loader stores stream data
- LMDB overwrite mode: Fixed _clear_data() to properly delete named
  databases when overwriting data
- LMDB streaming: Added tx_hash column to test data for compatibility
  with key pattern requirements
- Base streaming tests: Updated column references from transaction_hash
  to tx_hash for consistency across all loaders

**Iceberg Loader:**
- Added snapshot_id to metadata for test compatibility
- Modified base loader to pass table_name in kwargs to metadata methods
- Skipped partition_spec test (requires PartitionSpec object implementation)

**PostgreSQL Loader:**
- Fixed _clear_table() to check table existence before TRUNCATE
- Prevents "relation does not exist" errors in overwrite mode
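
A minimal sketch of the existence check, assuming a DB-API style cursor such as the one psycopg2 provides. The function names and the `FakeCursor` stand-in are hypothetical; `to_regclass` is a PostgreSQL function that returns NULL when the named relation does not exist.

```python
from typing import Any, List, Optional, Tuple


def quote_ident(name: str) -> str:
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def clear_table(cursor: Any, table_name: str) -> bool:
    """TRUNCATE the table only if it exists; return True if it was truncated."""
    quoted = quote_ident(table_name)
    cursor.execute("SELECT to_regclass(%s)", (quoted,))
    row = cursor.fetchone()
    if row is None or row[0] is None:
        return False  # nothing to clear, so no "relation does not exist" error
    cursor.execute(f"TRUNCATE TABLE {quoted}")
    return True


class FakeCursor:
    """Stand-in cursor for demonstration; records the SQL it receives."""

    def __init__(self, existing_tables: List[str]) -> None:
        self.existing = {quote_ident(t) for t in existing_tables}
        self.statements: List[str] = []
        self._result: Optional[Tuple[Any, ...]] = None

    def execute(self, sql: str, params: Tuple[Any, ...] = ()) -> None:
        self.statements.append(sql)
        if sql.startswith("SELECT to_regclass"):
            name = params[0]
            self._result = (name,) if name in self.existing else (None,)

    def fetchone(self) -> Optional[Tuple[Any, ...]]:
        return self._result
```

With a real connection, the same function would be called with a live cursor inside the loader's transaction handling, which is not shown here.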

**DeltaLake Loader:**
- Added partition_by property for convenient access
- Added delta_version and files_added metadata aliases
- Fixed test fixture to use unique table paths per test
- Prevents data accumulation across tests

**Test Infrastructure:**
- Updated delta_basic_config fixture to generate unique paths per test
- Prevents cross-test contamination in DeltaLake tests
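
One way to get a unique path per test is to append a random suffix to a base directory. The sketch below uses only the standard library; the helper name, the `table_path` config key, and the prefix are assumptions, and the actual delta_basic_config fixture may be built differently.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Dict


def unique_table_path(base_dir: Path, prefix: str = "delta_test") -> Path:
    """Return a path under base_dir that no other call will produce."""
    return Path(base_dir) / f"{prefix}_{uuid.uuid4().hex}"


def make_delta_config(base_dir: Path) -> Dict[str, str]:
    """Build a per-test config dict (the 'table_path' key is hypothetical)."""
    return {"table_path": str(unique_table_path(base_dir))}


base = Path(tempfile.mkdtemp())
config_a = make_delta_config(base)
config_b = make_delta_config(base)
```

In a pytest suite, a function-scoped fixture could call such a helper with pytest's built-in `tmp_path`, so each test writes to its own location and no data accumulates across tests.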

Key fixes for the Snowflake loader tests:
- Changed loader.conn to loader.connection (Snowflake uses different attribute name)
- Set supports_overwrite = False (Snowflake doesn't support OVERWRITE mode)
- Set requires_existing_table = False (Snowflake auto-creates tables)
- Added cleanup_tables fixture for Snowflake-specific test cleanup
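
Flags like supports_overwrite and requires_existing_table let shared tests adapt to each backend. The sketch below shows one possible way to consume them; the class names, the default values on the base class, and the `plan_test` helper are assumptions for illustration.

```python
from typing import List, Type


class BaseLoaderTestConfig:
    """Hypothetical defaults; the real base class may choose differently."""

    supports_overwrite = True
    requires_existing_table = True


class SnowflakeTestConfig(BaseLoaderTestConfig):
    # Values taken from the commit message above.
    supports_overwrite = False
    requires_existing_table = False


def plan_test(config: Type[BaseLoaderTestConfig], needs_overwrite: bool) -> List[str]:
    """Return the steps a shared test would take for this backend."""
    if needs_overwrite and not config.supports_overwrite:
        return ["skip"]
    steps: List[str] = []
    if config.requires_existing_table:
        steps.append("create_table")
    steps += ["load", "verify"]
    return steps
```

In a pytest-based suite the "skip" branch would typically map to `pytest.skip(...)`, so unsupported modes show up as skipped rather than failed.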
@edgeandnode edgeandnode deleted a comment from github-actions bot Feb 3, 2026

Development

Successfully merging this pull request may close these issues.

Integration test standardization

2 participants