Skip to content

Add transformers#801

Open
abnegate wants to merge 1 commit intomainfrom
feat-transformers
Open

Add transformers#801
abnegate wants to merge 1 commit intomainfrom
feat-transformers

Conversation

@abnegate
Copy link
Member

@abnegate abnegate commented Feb 4, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Document transformation support: Apply custom transformer callbacks to automatically modify and enrich documents during retrieval, creation, updates, and queries. The transformer receives document context, enabling consistent custom processing across all database operations.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a document transformer feature to the Database class, enabling external code to configure and apply custom transformation callbacks to documents during retrieval, creation, and update operations. The implementation includes a transformer property, getter/setter methods, and integration points across document operations with comprehensive test coverage.

Changes

Cohort / File(s) Summary
Core Transformer Implementation
src/Database/Database.php
Added setTransformer() and getTransformer() public methods, protected $transformer property, and protected transform() method to apply transformer callbacks to documents. Integrated transformer invocation at multiple operation endpoints: getDocument, createDocument, updateDocument, createDocuments, upsertDocuments, and find.
E2E Test Suite
tests/e2e/Adapter/Base.php, tests/e2e/Adapter/Scopes/TransformerTests.php
Added TransformerTests trait inclusion to Base test class and comprehensive test trait covering transformer lifecycle, invocation across all document operations, parameter validation, method chaining, and edge cases like empty documents and null clearing.
Unit Tests
tests/unit/TransformerTest.php
Added unit test class validating setter/getter behavior, fluent interface support, null handling, default state, and method chaining on the transformer property.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • fogelito

Poem

🐰 A transformer hops through documents with care,
Enriching each record beyond compare,
Callbacks dancing on create and find,
Leaving metadata trails behind! ✨📝

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add transformers' accurately describes the main change in the pull request, which introduces a new transformer feature to the Database class with public setter/getter methods and comprehensive tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat-transformers

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Database.php (1)

4358-4373: ⚠️ Potential issue | 🟠 Major

Cache hits bypass the transformer (and cached docs are transformed).

If the transformer changes at runtime or depends on request context, cached results become stale or inconsistent. Consider caching the raw decoded document and always applying the transformer on read, or bypassing cache when a transformer is set.

One way to make cache + transformer consistent
         if ($cached) {
             $document = $this->createDocumentInstance($collection->getId(), $cached);

             if ($collection->getId() !== self::METADATA) {
                 if (!$this->authorization->isValid(new Input(self::PERMISSION_READ, [
                     ...$collection->getRead(),
                     ...($documentSecurity ? $document->getRead() : [])
                 ]))) {
                     return $this->createDocumentInstance($collection->getId(), []);
                 }
             }

+            $document = $this->transform($document, $collection);
             $this->trigger(self::EVENT_DOCUMENT_READ, $document);

             return $document;
         }
@@
-        $document = $this->transform($document, $collection);
-
         $relationships = \array_filter(
             $collection->getAttribute('attributes', []),
             fn ($attribute) => $attribute['type'] === Database::VAR_RELATIONSHIP
         );

         // Don't save to cache if it's part of a relationship
         if (empty($relationships)) {
             try {
                 $this->cache->save($documentKey, $document->getArrayCopy(), $hashKey);
                 $this->cache->save($collectionKey, 'empty', $documentKey);
             } catch (Exception $e) {
                 Console::warning('Failed to save document to cache: ' . $e->getMessage());
             }
         }
+
+        $document = $this->transform($document, $collection);
🤖 Fix all issues with AI agents
In `@src/Database/Database.php`:
- Around line 5787-5788: refetchDocuments() is ending up transforming documents
twice because find() now applies transform() and callers later call transform()
again; update the refetch path to use a non-transforming read (add/use a
findRaw/find(..., ['skipTransform' => true]) variant) or change
refetchDocuments() to call the new findRaw helper, and alternatively add a guard
in transform() (e.g., a metadata flag on the document like __transformed) to
no-op when already transformed; locate transform(), find(), and
refetchDocuments() in the diff and implement the findRaw/skipTransform option
and/or the idempotence guard so refetch does not double-transform documents.

In `@tests/e2e/Adapter/Scopes/TransformerTests.php`:
- Around line 142-144: The test calls a non-existent method
Document::attributeExists(), causing runtime failure; update the assertion
inside the foreach over $documents to check the attributes array directly (e.g.
use isset($doc->attributes['_index']) or array_key_exists('_index',
$doc->attributes')) instead of $doc->attributeExists('_index') so the existence
check uses the actual Document attributes storage; modify the assertion line in
TransformerTests.php accordingly.
🧹 Nitpick comments (2)
src/Database/Database.php (1)

1142-1181: Validate transformer return type to avoid hard failures.

If the callback returns anything other than a Document, downstream code will fatal. Consider guarding and throwing a DatabaseException with a clear message.

Proposed hardening
     protected function transform(Document $document, Document $collection): Document
     {
         if ($this->transformer === null || $document->isEmpty()) {
             return $document;
         }
-        return ($this->transformer)($document, $collection, $this);
+        $transformed = ($this->transformer)($document, $collection, $this);
+        if (!$transformed instanceof Document) {
+            throw new DatabaseException('Transformer must return a Document instance');
+        }
+        return $transformed;
     }
tests/unit/TransformerTest.php (1)

11-22: Reflection on mock may not reflect actual behavior.

The test uses reflection to access the transformer property on a mock object. Mocks created with createMock() don't initialize the actual class properties, so $property->getValue($database) returns null because the property doesn't exist on the mock, not because of the Database class's default value.

Consider using the partial mock approach (like in other tests) or testing the public API directly:

♻️ Suggested improvement
 public function testSetTransformer(): void
 {
-    $database = $this->createMock(Database::class);
-
-    // Create a reflection to access the protected property
-    $reflection = new \ReflectionClass(Database::class);
-    $property = $reflection->getProperty('transformer');
-    $property->setAccessible(true);
-
-    // Initially null
-    $this->assertNull($property->getValue($database));
+    $database = $this->getMockBuilder(Database::class)
+        ->disableOriginalConstructor()
+        ->onlyMethods([])
+        ->getMock();
+
+    // Initially null (test via public getter)
+    $this->assertNull($database->getTransformer());
 }

Comment on lines +5787 to +5788
$document = $this->transform($document, $collection);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Refetch path can apply the transformer twice.

When operators trigger refetchDocuments(), it uses find() which now transforms results. These call sites then transform again. If transformers aren’t idempotent, this produces incorrect data. Consider a findRaw/skipTransform path for refetchDocuments() or guard against a second transform after refetch.

Also applies to: 6829-6830

🤖 Prompt for AI Agents
In `@src/Database/Database.php` around lines 5787 - 5788, refetchDocuments() is
ending up transforming documents twice because find() now applies transform()
and callers later call transform() again; update the refetch path to use a
non-transforming read (add/use a findRaw/find(..., ['skipTransform' => true])
variant) or change refetchDocuments() to call the new findRaw helper, and
alternatively add a guard in transform() (e.g., a metadata flag on the document
like __transformed) to no-op when already transformed; locate transform(),
find(), and refetchDocuments() in the diff and implement the
findRaw/skipTransform option and/or the idempotence guard so refetch does not
double-transform documents.

Comment on lines +142 to +144
foreach ($documents as $doc) {
$this->assertTrue($doc->attributeExists('_index'), 'Each document should have _index attribute');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix undefined method attributeExists() — this causes pipeline failure.

The CodeQL analysis correctly identifies that Document::attributeExists() is not a valid method. This will cause a runtime error.

🐛 Proposed fix using isset() on attributes
         // Verify each document was transformed
         foreach ($documents as $doc) {
-            $this->assertTrue($doc->attributeExists('_index'), 'Each document should have _index attribute');
+            $this->assertNotNull($doc->getAttribute('_index'), 'Each document should have _index attribute');
         }
🧰 Tools
🪛 GitHub Actions: CodeQL

[error] 143-143: Call to an undefined method Utopia\Database\Document::attributeExists().

🤖 Prompt for AI Agents
In `@tests/e2e/Adapter/Scopes/TransformerTests.php` around lines 142 - 144, The
test calls a non-existent method Document::attributeExists(), causing runtime
failure; update the assertion inside the foreach over $documents to check the
attributes array directly (e.g. use isset($doc->attributes['_index']) or
array_key_exists('_index', $doc->attributes')) instead of
$doc->attributeExists('_index') so the existence check uses the actual Document
attributes storage; modify the assertion line in TransformerTests.php
accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant