
[Serve] Consider custom resources in best-fit node selection for DeploymentScheduler in Ray Serve #51361

Open
ktyxx opened this issue Mar 14, 2025 · 0 comments
Labels
enhancement Request for new feature and/or capability serve Ray Serve Related Issue triage Needs triage (eg: priority, bug/not-bug, and owning component)

Comments


ktyxx commented Mar 14, 2025

Description

Currently, the DeploymentScheduler in Ray Serve does not fully consider custom resources (e.g., resources: {GRAM: 6}) when selecting the best-fit node for deployment.

The function _best_fit_node() primarily evaluates CPU, GPU, and memory, but does not include custom resources defined in ray_actor_options["resources"].
However, custom resources are already considered when filtering available nodes.
This means that a deployment will not be scheduled onto a node without the required custom resources, but it may not be scheduled onto the most optimal node in terms of resource utilization.
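To make the gap concrete, here is a simplified sketch of the described behavior (hypothetical names and structure, not the actual Ray Serve source): the feasibility filter checks every requested key, including custom ones like `GRAM`, but the best-fit score only sums over the standard resource types, so nodes that differ only in leftover custom resources tie.

```python
# Simplified sketch of the described behavior; names and structure are
# illustrative assumptions, not copied from the Ray Serve source.

def best_fit_node(nodes, required):
    """Pick the feasible node that leaves the least slack.

    `nodes` maps node_id -> available resources; `required` is what the
    replica asks for (both plain dicts).
    """
    # Filtering DOES consider every key, including custom ones like "GRAM".
    feasible = {
        node_id: avail
        for node_id, avail in nodes.items()
        if all(avail.get(k, 0) >= v for k, v in required.items())
    }

    def slack(avail):
        # Scoring only looks at standard resources, so two nodes that
        # differ only in leftover GRAM tie here.
        return sum(
            avail.get(k, 0) - required.get(k, 0)
            for k in ("CPU", "GPU", "memory")
        )

    return min(feasible, key=lambda node_id: slack(feasible[node_id]))
```

With two feasible nodes whose standard resources match the request exactly, the tie is broken arbitrarily (here, by dict order), even when one node is the exact custom-resource fit.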

Use case

[Screenshots omitted: node resource states after deployment]

These are the resource states after deployment.

  • Deployment Configuration

ray_actor_options:
  num_cpus: 10.0
  num_gpus: 1.0
  memory: 8.0
  resources:
    GRAM: 6

  1. Two replicas of this deployment were scheduled.
  2. Each replica requires 10 CPU, 1 GPU, 8GB memory, and 6 GRAM.
  • Observed Scheduling Behavior (Inefficient Allocation)
  1. Expected: Both replicas should be deployed on Node A and Node D (since they have exactly 6/6 GRAM available).
  2. Actual: One replica went to Node C (which had 12 GRAM total, but 6 available).
    One replica went to Node D (which had exactly 6 GRAM available).
    This means that Node C now has 6/12 GRAM remaining, leading to potential resource fragmentation.
  • Root Cause Analysis
  1. Ray Serve's _best_fit_node() function currently considers CPU, GPU, and memory but does not prioritize exact fits for custom resources (resources: {GRAM: 6}).
  2. The custom resource is correctly used for filtering nodes but is not included in the best-fit selection process.
  3. As a result, the scheduler sometimes places deployments on nodes with excess custom resources instead of an exact match.
  • Suggested Fix
  1. Modify _best_fit_node() to consider custom resources (resources) in best-fit selection in the same way CPU, GPU, and memory are handled.
  2. Introduce an optional flag (RAY_SERVE_CUSTOM_RESOURCES) so users can enable or disable this stricter scheduling behavior.
import os

class Resources(dict):
    # Custom resources opted into best-fit scoring via the proposed env var,
    # e.g. RAY_SERVE_CUSTOM_RESOURCES="GRAM,FPGA".
    CUSTOM_RESOURCES = {
        r for r in os.environ.get("RAY_SERVE_CUSTOM_RESOURCES", "").split(",") if r
    }

    def get(self, key: str, default=None):
        val = super().get(key)
        if val is not None:
            return val
        # Standard resources and opted-in custom resources default to 0 so
        # they always participate in best-fit scoring.
        if key in ("CPU", "GPU", "memory") or key in self.CUSTOM_RESOURCES:
            return 0
        return default
  • Example Usage of RAY_SERVE_CUSTOM_RESOURCES

export RAY_SERVE_CUSTOM_RESOURCES="GRAM"

  1. This will make the scheduler consider GRAM as a factor in best-fit node selection, alongside CPU, GPU, and memory.
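A minimal sketch of how the fix could look (hypothetical helper names; the real `_best_fit_node` signature may differ): extend the set of scored keys with the custom resources opted in via the env var.

```python
import os

# Custom resources opted in via the proposed env var, e.g. "GRAM".
# This env-var name and the function below are illustrative assumptions.
CUSTOM_RESOURCES = {
    r for r in os.environ.get("RAY_SERVE_CUSTOM_RESOURCES", "").split(",") if r
}

def best_fit_node(nodes, required):
    """Best-fit selection that also scores opted-in custom resources."""
    scored_keys = {"CPU", "GPU", "memory"} | CUSTOM_RESOURCES

    feasible = {
        node_id: avail
        for node_id, avail in nodes.items()
        if all(avail.get(k, 0) >= v for k, v in required.items())
    }

    def slack(avail):
        # Including "GRAM" here makes the exact-fit node (6/6 available)
        # beat a node with 6 of 12 GRAM free.
        return sum(avail.get(k, 0) - required.get(k, 0) for k in scored_keys)

    return min(feasible, key=lambda node_id: slack(feasible[node_id]))
```

With `RAY_SERVE_CUSTOM_RESOURCES="GRAM"` set, the exact-fit node from the use case above would be selected instead of the node with excess GRAM.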
  • Further Improvement Suggestion

    Additionally, instead of simply minimizing remaining resources, a weighted scoring method could be used.
    
    The current _best_fit_node() function selects the node that minimizes leftover resources,
    but it does not consider the balance between different resources.
    For example, if Node A has 1 CPU and 2 GB of memory left while Node B has 2 CPUs and 1 GB of memory left, both nodes are scored equally.
    A weighted scoring method that considers the proportional fit (e.g., (remaining_resource / total_resource))
    could improve scheduling efficiency, especially when deploying multiple replicas.
    This would ensure that custom resources (like GRAM) are not just considered, but also optimally utilized.
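A sketch of the proportional-fit idea (assuming each node reports both total and available capacity; the helper name and key list are hypothetical):

```python
def proportional_fit_score(avail, total, required,
                           keys=("CPU", "GPU", "memory", "GRAM")):
    """Sum of remaining/total across resources after a hypothetical
    placement; a lower score means a tighter, more balanced fit."""
    score = 0.0
    for k in keys:
        capacity = total.get(k, 0)
        if capacity <= 0:
            continue  # node does not offer this resource at all
        remaining = avail.get(k, 0) - required.get(k, 0)
        score += remaining / capacity
    return score
```

The scheduler would then pick the feasible node minimizing this score instead of the raw leftover counts, so a node leaving 25% of its CPU idle scores better than one leaving 50% idle, even if the absolute leftovers are equal.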
    
    Alternative Improvement: Hierarchical Resource Prioritization
    In addition to weighted scoring, another potential enhancement is implementing hierarchical resource prioritization.
    
    Currently, all resources are treated equally when selecting the best-fit node. Instead, allowing users to define a priority order
    (e.g., [Custom Resource > GPU > CPU > Memory]) would ensure that deployments are scheduled onto nodes where the most critical resources
    are most available first.
    
    For instance, if a deployment primarily depends on GRAM, the scheduler would first attempt to place it on nodes with the most available GRAM,
    before considering other factors like CPU or memory. This could prevent fragmentation of custom resources and improve overall resource utilization efficiency.
    
    Would this be a reasonable improvement to introduce alongside the weighted scoring method?
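As a sketch of hierarchical prioritization (hypothetical helper; the priority list would come from user configuration), feasible nodes can be compared lexicographically on the priority order, so the highest-priority resource dominates and later entries only break ties:

```python
def pick_node_by_priority(nodes, required, priority):
    """Among nodes that can fit `required`, prefer the node with the most
    of the highest-priority resource available; later entries in
    `priority` act only as tie-breakers."""
    feasible = [
        (node_id, avail)
        for node_id, avail in nodes.items()
        if all(avail.get(k, 0) >= v for k, v in required.items())
    ]
    if not feasible:
        return None
    best = max(feasible, key=lambda item: tuple(item[1].get(k, 0) for k in priority))
    return best[0]
```

For a GRAM-heavy deployment with `priority=["GRAM", "GPU", "CPU", "memory"]`, the node with the most available GRAM is tried first, regardless of its CPU headroom.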
    
  • Contribution Intent

  1. I am willing to submit a PR for this enhancement and would appreciate feedback on the approach.
    Would this be a viable improvement for future Ray Serve releases?
@ktyxx ktyxx added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 14, 2025
@jcotant1 jcotant1 added the serve Ray Serve Related Issue label Mar 18, 2025