atihkin commented Feb 2, 2026

Summary

Improves CLI error messages when endpoint creation fails due to configuration issues. Addresses MLE-3108.

Changes

  • Hardware error messages: Show the specific reason for failure: hardware not compatible with the model, hardware unavailable, insufficient capacity for the requested replicas, or configuration not supported (with a suggestion to toggle speculative decoding)
  • Client-side validation: Validate min/max replicas, gpu-count, and availability-zone before the API call
  • Model errors: Clearer message when the model is not found or is unavailable for dedicated endpoints
  • Endpoint not found: Clear message with suggestion to list endpoints
  • Permission errors: Clear access denied messages

Example Error Messages

Hardware unavailable:

Error: Cannot create endpoint with 8x H100 for model 'my-model'

The 8x H100 configuration is currently unavailable. This hardware type has no available capacity at this time.

Available hardware options for this model:
...

Hardware available but config fails:

Error: Cannot create endpoint with 8x H100 for model 'my-model'

Hardware is available but this configuration is not supported. Try adding --no-speculative-decoding.

Invalid replicas:

Error: --min-replicas (5) cannot be greater than --max-replicas (2)
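
The client-side checks could look roughly like the sketch below, which runs before any API call and collects every problem it finds. It is an illustration under assumptions rather than the PR's implementation: `validate_endpoint_options` and `known_zones` are hypothetical names, the `--gpu-count` and `--availability-zone` flag spellings are inferred from the option names in the description, and only the min/max replicas message is taken verbatim from the example above. The allowed gpu-count values (1, 2, 4, 8) come from the commit message.

```python
from typing import List, Optional, Sequence

# Allowed values per the commit message: gpu-count must be 1, 2, 4, or 8.
ALLOWED_GPU_COUNTS = (1, 2, 4, 8)


def validate_endpoint_options(
    min_replicas: int,
    max_replicas: int,
    gpu_count: int,
    availability_zone: Optional[str] = None,
    known_zones: Sequence[str] = (),
) -> List[str]:
    """Return a list of error messages; an empty list means the options are valid."""
    errors: List[str] = []

    # Replica counts must be non-negative (placeholder wording for these two).
    if min_replicas < 0:
        errors.append(f"Error: --min-replicas ({min_replicas}) must be non-negative")
    if max_replicas < 0:
        errors.append(f"Error: --max-replicas ({max_replicas}) must be non-negative")

    # min <= max; this message matches the example shown in the PR description.
    if min_replicas > max_replicas:
        errors.append(
            f"Error: --min-replicas ({min_replicas}) cannot be greater than "
            f"--max-replicas ({max_replicas})"
        )

    if gpu_count not in ALLOWED_GPU_COUNTS:
        errors.append(f"Error: --gpu-count ({gpu_count}) must be one of 1, 2, 4, or 8")

    # Only check the zone when one was requested; known_zones is assumed to be
    # fetched elsewhere (for example from the API) before validation runs.
    if availability_zone is not None and availability_zone not in known_zones:
        zones = ", ".join(known_zones) or "none"
        errors.append(
            f"Error: --availability-zone '{availability_zone}' is not valid. "
            f"Available zones: {zones}"
        )

    return errors


if __name__ == "__main__":
    for message in validate_endpoint_options(5, 2, 8):
        print(message)
```

Returning all errors at once, instead of stopping at the first, lets a user fix several bad flags in one pass.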

Test plan

  • Test with invalid hardware configuration
  • Test with unavailable hardware
  • Test with invalid min/max replicas
  • Test with invalid gpu-count
  • Test with invalid availability zone
  • Test with non-existent model
  • Test endpoint not found error

Made with Cursor

- Add specific error messages for hardware configuration issues:
  - Hardware not compatible with model
  - Hardware unavailable (no capacity)
  - Insufficient capacity for replicas
  - Hardware available but config not supported (suggests toggling speculative decoding)

- Add client-side validation with clear errors:
  - min/max replicas: must be non-negative, min <= max
  - gpu-count: must be 1, 2, 4, or 8
  - availability-zone: validates against available zones

- Improve API error handling:
  - Model not found: suggests checking model name
  - Endpoint not found: suggests listing endpoints
  - Permission denied: clear access error message
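
One way to picture the API error handling is a small translation step from a failed response to a user-facing message. The sketch below is hypothetical: it assumes the API signals "not found" with HTTP 404 and "permission denied" with HTTP 403, which this PR does not state, and `describe_api_error` and the message wording are illustrative only.

```python
from typing import Optional


def describe_api_error(status_code: int, resource: str, name: str) -> Optional[str]:
    """Translate an API failure into a friendlier message, or None if unrecognized.

    Assumes 404 means "not found" and 403 means "permission denied"; the real
    API may report these conditions differently.
    """
    if status_code == 404 and resource == "model":
        return (
            f"Error: Model '{name}' was not found or is not available for "
            "dedicated endpoints. Check the model name."
        )
    if status_code == 404 and resource == "endpoint":
        return (
            f"Error: Endpoint '{name}' was not found. "
            "List your endpoints to see the available ones."
        )
    if status_code == 403:
        return f"Error: Access denied. You do not have permission to use {resource} '{name}'."
    # Unknown failures fall through so the caller can show the original error.
    return None


if __name__ == "__main__":
    print(describe_api_error(404, "endpoint", "my-endpoint"))
```

Returning `None` for unrecognized failures keeps the original error visible instead of masking it with a generic message.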

Fixes MLE-3108

Co-authored-by: Cursor <cursoragent@cursor.com>

atihkin requested a review from blainekasten February 2, 2026 20:36
atihkin closed this Feb 2, 2026