Skip to content

refactor(store): improve cache restore perf & fix go-header store errs#3051

Merged
julienrbrt merged 9 commits intomainfrom
julien/perf
Feb 5, 2026
Merged

refactor(store): improve cache restore perf & fix go-header store errs#3051
julienrbrt merged 9 commits intomainfrom
julien/perf

Conversation

@julienrbrt
Copy link
Member

@julienrbrt julienrbrt commented Feb 5, 2026

Overview

Founds while fixing #3050.

  1. Restoring from store was atrociously long, because it would be loop through the whole state
  2. Go-header expected others errors from store adapter
  3. When syncing from p2p lru cache would evict non executed headers too early.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedFeb 5, 2026, 11:55 AM

@claude
Copy link
Contributor

claude bot commented Feb 5, 2026

Claude finished @julienrbrt's task in 1m 51s —— View job


Code Review Complete ✅

I've performed a comprehensive code review of this PR focusing on architecture, performance, security, concurrency, error handling, testing, and documentation.

Review Summary

Overall Assessment: This is a well-implemented performance optimization with appropriate error handling fixes. The architectural approach is sound, and the changes successfully address the stated goals. However, there are several areas that warrant attention for production readiness and maintainability.


✅ Strengths

  1. Significant Performance Improvement: The shift from iterating all blocks to prefix-based metadata queries is an excellent optimization that should dramatically reduce cache restore times.

  2. Clean Error Handling Fix: Correctly returning header.ErrEmptyStore instead of header.ErrNotFound aligns with go-header expectations and improves semantic clarity.

  3. Well-Structured Code: The separation of concerns between Store.GetMetadataByPrefix, cache restoration logic, and store adapters is clean and maintainable.

  4. Good Test Coverage for Core Functionality: The new TestGetMetadataByPrefix test is thorough, and existing tests were properly updated.


⚠️ Issues & Recommendations

1. Critical: Inconsistent Key Trimming Logic (High Priority)

Location: block/internal/cache/generic_cache.go:234-239

The key extraction logic has redundant handling for leading slashes that may never occur:

hash := strings.TrimPrefix(entry.Key, c.storeKeyPrefix)
if hash == "" || hash == entry.Key {
    // Prefix not found or hash part is empty, indicating an invalid key format.
    continue
}

Analysis: GetMetadataByPrefix in pkg/store/store.go:217-218 already strips the meta prefix and removes leading slashes:

key := strings.TrimPrefix(result.Key, metaKeyPrefix)
key = strings.TrimPrefix(key, "/") // Remove leading slash for consistency

This means entry.Key should never have a leading slash when it reaches RestoreFromStore.

Recommendation: Simplify the logic per Gemini's review comment:

// Extract the hash from the key by removing the prefix.
hash := strings.TrimPrefix(entry.Key, c.storeKeyPrefix)
if hash == entry.Key || hash == "" {
    // Prefix not found or hash part is empty, indicating an invalid key format.
    continue
}

Fix inconsistent key trimming →


2. Concurrency: Missing Error Handling in Query Iterator (Medium Priority)

Location: pkg/store/store.go:208-211

for result := range results.Next() {
    if result.Error != nil {
        return nil, fmt.Errorf("error iterating metadata results: %w", result.Error)
    }
    // ...
}

Issue: The defer results.Close() at line 205 may not execute if an error is returned mid-iteration, potentially leaking the query iterator.

Recommendation: Wrap error returns to ensure cleanup:

var entries []MetadataEntry
var iterErr error
for result := range results.Next() {
    if result.Error != nil {
        iterErr = fmt.Errorf("error iterating metadata results: %w", result.Error)
        break
    }
    // ... process result
}

if iterErr != nil {
    return nil, iterErr
}
return entries, nil

Fix iterator resource leak →


3. Concurrency: Race Condition in StoreAdapter Initialization (Medium Priority)

Location: pkg/store/store_adapter.go:332-348

The Start method has a race condition:

func (a *StoreAdapter[H]) Start(ctx context.Context) error {
    a.mu.Lock()
    defer a.mu.Unlock()

    h, err := a.getter.Height(ctx)  // Line 337: calls external method while holding lock
    if err != nil {
        return err
    }
    // ...
}

Issue: Calling a.getter.Height(ctx) while holding a.mu can cause deadlocks if the getter implementation tries to acquire the same lock or perform operations that block indefinitely.

Recommendation: Minimize lock scope:

func (a *StoreAdapter[H]) Start(ctx context.Context) error {
    h, err := a.getter.Height(ctx)  // Call outside lock
    if err != nil {
        return err
    }

    a.mu.Lock()
    defer a.mu.Unlock()

    if h > 0 {
        a.heightSub.SetHeight(h)
        a.initialized = true
    }

    return nil
}

Fix initialization race condition →


4. Error Handling: Inconsistent Empty Store Detection (Low Priority)

Location: pkg/store/store_adapter.go:383-427

Both Head and Tail methods return header.ErrEmptyStore for empty stores, but the logic differs:

  • Head (line 383): Returns error when both store and pending are empty
  • Tail (line 426): Has a comment "should never happen" but still returns ErrEmptyStore

Recommendation: Add invariant assertions or logging when "should never happen" cases occur to aid debugging:

// Tail should never reach here if Head succeeds
log.Error().Msg("Tail: unexpected empty store state - both store and pending cache are empty")
return zero, header.ErrEmptyStore

5. Testing: Insufficient Coverage for New Functionality (Medium Priority)

CodeCov Report: 52.33% patch coverage with 92 lines missing coverage

Missing Coverage Areas:

  1. GetMetadataByPrefix error paths - No test for query errors during iteration
  2. store_adapter.go - 43 missing lines + 7 partials (lines covering pending cache backpressure, DA hint persistence, and edge cases)
  3. generic_cache.go restoration - Missing tests for malformed metadata entries during restore

Recommendations:

  • Add test for GetMetadataByPrefix query failure scenarios
  • Add tests for StoreAdapter pending cache full scenarios and backpressure behavior
  • Add test for RestoreFromStore with corrupted/invalid metadata entries

Add missing test coverage →%20Test%20query%20error%20during%20iteration%202)%20Test%20corrupted%20metadata%20values%20during%20restore.%20Also%20add%20store_adapter%20tests%20for%20pending%20cache%20backpressure%20when%20maxPendingCacheSize%20is%20reached.&repo=evstack/ev-node&branch=julien/perf)


6. Code Quality: Magic Numbers and Constants (Low Priority)

Location: pkg/store/store_adapter.go:23-28

const (
    maxPendingCacheSize = 10_000
    pruneRetryInterval = 50 * time.Millisecond
)

Issue: These constants lack documentation explaining:

  • Why 10,000 is the chosen limit
  • Performance implications of changing these values
  • Relationship to expected sync rates

Recommendation: Add comments:

const (
    // maxPendingCacheSize limits in-memory pending items to prevent OOM.
    // Set to 10k based on ~500KB memory usage for typical block sizes.
    // Backpressure kicks in when limit is reached, slowing P2P sync.
    maxPendingCacheSize = 10_000

    // pruneRetryInterval controls how often we retry pruning when cache is full.
    // 50ms provides responsive backpressure without excessive CPU usage.
    pruneRetryInterval = 50 * time.Millisecond
)

7. Documentation: Missing Rationale for Architecture Decisions (Low Priority)

The PR description mentions "Go-header expected other errors from store adapter" but doesn't explain why ErrEmptyStore is semantically correct vs ErrNotFound.

Recommendation: Add comments in store_adapter.go explaining the distinction:

// Head returns header.ErrEmptyStore (not ErrNotFound) when the store is empty.
// This distinction is important for go-header syncer which treats:
// - ErrEmptyStore: Store needs initialization, trigger trusted header workflow
// - ErrNotFound: Specific item missing, potentially a data corruption issue

🔒 Security Assessment

No security issues identified. The changes don't introduce:

  • Authentication/authorization concerns
  • Input validation issues (metadata keys are validated by prefix matching)
  • Secret exposure risks
  • Injection vulnerabilities

🚀 Performance Assessment

Excellent performance improvement as stated in the PR description. The optimization correctly identifies that iterating through all blocks to collect hashes is O(n) where n = total blocks, while prefix-based metadata queries are O(m) where m = matching entries (typically much smaller).

Potential future optimization (mentioned by @tac0turtle): Execute blocks as they're restored from store rather than loading all into memory first. This would reduce memory pressure during startup.


📋 Additional Observations

  1. Positive: The removal of LRU cache for store_adapter (commit 4db3dc3) is mentioned but not visible in this PR. This suggests deliberate cache strategy improvements.

  2. Positive: The StoreAdapter pending cache architecture with backpressure (lines 644-665) is well-designed for handling burst P2P sync scenarios.

  3. Positive: DA hint persistence to disk (lines 629-631) provides crash resilience.

  4. Minor: The maxPendingCacheSize constant (10,000) seems conservative - modern systems could likely handle 50-100k without issues. Consider making this configurable.


✅ Approval Recommendation

Approve with minor fixes. The core functionality is solid and achieves the stated performance goals. The identified issues are:

  • 1 critical (inconsistent key trimming - easy fix)
  • 2 medium priority (concurrency improvements - recommended but not blocking)
  • 4 low priority (code quality/docs - nice to have)

The 52% patch coverage is concerning but not blocking given the nature of the changes (store adapters and integration code are harder to test in unit tests and may be better covered by integration tests).


@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the store's cache restoration mechanism to dramatically improve performance by switching from a full state iteration to a more efficient prefix-based metadata query. Concurrently, it addresses a compatibility issue with the go-header syncer by standardizing error responses for empty store conditions, ensuring smoother operation and correct error handling within the system.

Highlights

  • Performance Improvement for Cache Restore: The cache restoration process has been significantly optimized by replacing an inefficient iteration through all blocks with a direct, prefix-based query of metadata entries. This change drastically reduces the time required to restore the cache from the store.
  • Fix for go-header Store Errors: Error handling in store adapters has been corrected to return header.ErrEmptyStore instead of header.ErrNotFound when the store is empty. This aligns with the expectations of the go-header syncer, resolving a previously identified issue.
  • New GetMetadataByPrefix Functionality: A new GetMetadataByPrefix method has been introduced to the store interface and its implementations. This method allows for efficient retrieval of all metadata entries matching a given key prefix, which is fundamental to the cache restore performance enhancement.
  • Tracing for New Store Method: Observability has been enhanced by adding tracing for the newly implemented GetMetadataByPrefix method, ensuring that its performance and behavior can be monitored effectively.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • block/internal/cache/generic_cache.go
    • Imported the strings package for string manipulation.
    • Modified RestoreFromStore to utilize the new GetMetadataByPrefix method, removing the need for an explicit list of hashes.
    • Updated RestoreFromStore to correctly extract the hash from metadata keys by trimming the store key prefix, accounting for potential leading slashes.
  • block/internal/cache/generic_cache_test.go
    • Updated test cases to reflect the new RestoreFromStore signature, removing the hashes argument.
  • block/internal/cache/manager.go
    • Refactored RestoreFromStore to remove the block iteration logic for collecting hashes, now relying on the optimized cache restore methods.
    • Adjusted logging in RestoreFromStore to report the count of entries directly from the cache's length rather than the length of collected hashes.
  • pkg/store/data_store_adapter_test.go
    • Updated assertions in tests to expect header.ErrEmptyStore instead of header.ErrNotFound when the data store is empty, aligning with go-header expectations.
  • pkg/store/header_store_adapter_test.go
    • Updated assertions in tests to expect header.ErrEmptyStore instead of header.ErrNotFound when the header store is empty, aligning with go-header expectations.
  • pkg/store/store.go
    • Imported strings and dsq (datastore query) packages.
    • Added GetMetadataByPrefix method to DefaultStore to query metadata entries by prefix using the underlying datastore.
  • pkg/store/store_adapter.go
    • Modified Head and Tail methods to return header.ErrEmptyStore when the store is empty, providing clearer error semantics for the go-header syncer.
  • pkg/store/tracing.go
    • Implemented tracing for the new GetMetadataByPrefix method to monitor its execution and performance.
  • pkg/store/tracing_test.go
    • Updated tracingMockStore to include a mock function for GetMetadataByPrefix to support testing of the new tracing functionality.
  • pkg/store/types.go
    • Introduced a new Metadata interface encapsulating metadata operations (SetMetadata, GetMetadataByPrefix, DeleteMetadata).
    • Updated the main Store interface to embed the new Metadata interface.
    • Defined MetadataEntry struct to represent key-value pairs returned by metadata queries.
Activity
  • The pull request was created by julienrbrt.
  • The title and description indicate a refactor to improve cache restore performance and fix go-header store errors.
  • The changes directly implement the described performance improvements and error handling corrections.
  • No explicit comments or reviews are available in the provided context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant performance improvement for cache restoration by switching from iterating over all blocks to using a more efficient prefix-based query on the store. It also corrects the error type returned by store adapters for empty stores to align with go-header's expectations. The changes are well-implemented across the store, cache, and testing layers. I have one suggestion to simplify a piece of logic for better clarity.

@codecov
Copy link

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 73.57513% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.57%. Comparing base (0a349c0) to head (abe998c).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/store/store_adapter.go 83.91% 20 Missing and 3 partials ⚠️
pkg/store/tracing.go 0.00% 12 Missing ⚠️
block/internal/cache/generic_cache.go 37.50% 7 Missing and 3 partials ⚠️
pkg/store/store.go 77.77% 2 Missing and 2 partials ⚠️
block/internal/cache/manager.go 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3051      +/-   ##
==========================================
+ Coverage   56.35%   56.57%   +0.22%     
==========================================
  Files         118      118              
  Lines       12036    12151     +115     
==========================================
+ Hits         6783     6875      +92     
- Misses       4511     4535      +24     
+ Partials      742      741       -1     
Flag Coverage Δ
combined 56.57% <73.57%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a further optimzation is to start executing blocks in cache as we are getting them from the store so its not a get everything into memory then execute. we dont have to make that change here but something we can do in the future if we see this being an issue

@julienrbrt julienrbrt added this pull request to the merge queue Feb 5, 2026
Merged via the queue into main with commit b8eda5d Feb 5, 2026
27 of 28 checks passed
@julienrbrt julienrbrt deleted the julien/perf branch February 5, 2026 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants