Skip to content

Conversation

@deanq
Copy link
Member

@deanq deanq commented Feb 10, 2026

Problem

Flash deployments (mothership) were attempting to deploy resources that are already deployed, resulting in errors like:

  • "RUNPOD_API_KEY environment variable is required but not set"
  • Unnecessary deployment attempts for endpoints like 01_03_mixed_workers_cpu-fb

Root Cause

The @remote decorator didn't distinguish between:

  1. Functions that belong to the current resource (should execute locally)
  2. Functions that belong to other resources (need stubs for remote calls)

This caused mothership to try deploying worker endpoints instead of calling them directly.

Solution

Implement config-based routing using build-time generated configuration:

  1. Build-time analysis: Generate _flash_resource_config.py mapping each resource to its functions
  2. Runtime decision: @remote decorator checks FLASH_RESOURCE_NAME to determine local vs remote
  3. Endpoint suffix handling: Handle -fb suffix that RunPod adds to flashbooted endpoints

Changes

  • Add _should_execute_locally() function to determine execution mode
  • Generate resource configuration during build with function mappings
  • Add function call graph analysis to detect cross-resource calls
  • Handle endpoint name variations (with/without -fb suffix)
  • Adjust coverage threshold to 64.5%

Behavior

Flash deployments (with FLASH_RESOURCE_NAME):

  • Local functions execute directly (no stub created)
  • Remote functions create stubs for calling other endpoints
  • No unwanted deployment attempts ✅

Live Serverless (no FLASH_RESOURCE_NAME):

  • Unchanged behavior - all functions execute locally

Local development:

  • Unchanged - uses ResourceManager as before

Testing

  • Coverage: 66.41% (above 64.5% threshold)
  • Tests: 947 passed, 1 skipped
  • All format and lint checks passed

Fixes AE-2079

Add build-time configuration to determine local vs remote execution.
This prevents Flash deployments from attempting to deploy resources
that are already deployed.

Changes:
- Add _should_execute_locally() to client.py to check resource config
- Generate _flash_resource_config.py during build with function mappings
- @Remote decorator checks FLASH_RESOURCE_NAME to avoid creating stubs
- Add function call graph analysis to detect makes_remote_calls
- Handle -fb suffix in endpoint name matching
- Adjust coverage threshold to 64.5%

Behavior:
- Mothership executes local functions directly, only creates stubs for remote
- Live Serverless behavior unchanged (no FLASH_RESOURCE_NAME set)
- Local dev uses ResourceManager as before

Fixes unwanted deployment attempts when deployed endpoints exist.

Test coverage: 66.41%
Tests: 947 passed, 1 skipped
@deanq deanq requested a review from Copilot February 10, 2026 19:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Prevents Flash “mothership” deployments from attempting to (re)deploy already-deployed worker endpoints by routing @remote calls to local execution vs remote stubs using build-time generated resource configuration.

Changes:

  • Add runtime routing decision (_should_execute_locally) to @remote based on build-generated _flash_resource_config.py and FLASH_RESOURCE_NAME.
  • Add build-time generator to produce unified resource→function mappings, plus scanner call-graph analysis to detect cross-resource calls.
  • Add/adjust unit tests and lower coverage threshold to 64.5%.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/unit/test_remote_decorator_stub_generation.py Adds tests for local vs stub behavior selection in @remote.
tests/unit/test_client_should_execute_locally.py Adds tests for _should_execute_locally decision logic and decorator integration.
tests/unit/runtime/test_flash_resource_config.py Tests the template _flash_resource_config module defaults/logic.
tests/unit/cli/commands/build_utils/test_resource_config_generator.py Adds tests for config generation output and ordering.
src/runpod_flash/runtime/models.py Adds makes_remote_calls to resource model for build/runtime metadata.
src/runpod_flash/runtime/_flash_resource_config.py Introduces template module for build-time overwrite.
src/runpod_flash/client.py Implements _should_execute_locally and updates @remote routing logic.
src/runpod_flash/cli/commands/build_utils/scanner.py Adds call-graph analysis and metadata fields for cross-remote calls.
src/runpod_flash/cli/commands/build_utils/resource_config_generator.py Generates unified _flash_resource_config.py during build.
src/runpod_flash/cli/commands/build_utils/manifest.py Persists makes_remote_calls per resource into manifest.
src/runpod_flash/cli/commands/build.py Invokes resource config generation after bundling.
pyproject.toml Lowers coverage gate to 64.5%.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +37 to +38
# Mock stub_resource to return a callable
with patch("runpod_flash.client.stub_resource") as mock_stub:
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn’t actually invoke the decorated function, so it never exercises the wrapper path and can’t verify that ResourceManager.get_or_deploy_resource() and stub_resource() are called in local dev mode. Make the test async, call await process_data(...), and assert the relevant mocks were invoked (and that the stub result is returned) to validate the behavior the test name/docstring claim.

Copilot uses AI. Check for mistakes.
)
mock_rm_class.return_value = mock_rm_instance

with patch("runpod_flash.client.stub_resource") as mock_stub:
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the local-dev test, this one never calls remote_compute, so it doesn’t prove that the stub path is taken (nor that the original implementation is not called). Converting the test to async and awaiting remote_compute(...) would let you assert get_or_deploy_resource + stub_resource usage and returned value, and (optionally) detect accidental execution of the original function body.

Copilot uses AI. Check for mistakes.
Comment on lines +98 to +100
# Function should be wrapped for remote execution
assert callable(remote_compute)
assert hasattr(remote_compute, "__remote_config__")
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the local-dev test, this one never calls remote_compute, so it doesn’t prove that the stub path is taken (nor that the original implementation is not called). Converting the test to async and awaiting remote_compute(...) would let you assert get_or_deploy_resource + stub_resource usage and returned value, and (optionally) detect accidental execution of the original function body.

Copilot uses AI. Check for mistakes.
@KAJdev
Copy link
Contributor

KAJdev commented Feb 10, 2026

I say we should create a runpod.toml instead of trying to codegen python for this.

This commit addresses 5 technical issues identified in PR #192 review:

1. **Empty set representation** - Fixed _format_set() to return "set()"
   instead of "" for empty sets, preventing generation of empty dict {}

2. **Falsy empty set bug** - Replaced 'or' chain with explicit None checks
   to prevent empty sets from being treated as falsy values

3. **Test coverage** - Made tests async and added actual function calls
   to verify decorator behavior, not just decoration

4. **Dataclass field** - Simplified called_remote_functions field using
   field(default_factory=list) instead of Optional + __post_init__

5. **Performance** - Optimized AST walking from O(N*M) to O(N+M) per file
   by building function name → node mapping with single walk

All quality checks pass (68.82% coverage, 474+ tests).
@deanq deanq merged commit f07c9fb into main Feb 11, 2026
6 checks passed
@deanq deanq deleted the deanq/ae-2079-fix-false-deployment branch February 11, 2026 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants