[llm] Roadmap for Data and Serve LLM APIs #51313

Open · 22 tasks
kouroshHakha opened this issue Mar 12, 2025 · 4 comments
Labels: enhancement, llm, RFC

Comments

@kouroshHakha (Contributor)

kouroshHakha commented Mar 12, 2025

This document lists the issues and feature requests we have collected across OSS and other channels. We'll update it with relevant info from issues as we go. If a feature you need is not prioritized here, please feel free to open an RFC or feature request, or post on the Slack community channel.

Core features

Serve

  • [P0] Prefix-aware router
  • [P0] In-place updates for deployments, so adding new models doesn't require redeploying the cluster
  • [P0] TPU support
  • [P0] Display vLLM-emitted metrics: https://github.com/ray-project/ray/pull/51156/files
  • [P1] Open router protocol API for devs/researchers
  • [P1] Prefill disaggregation (PxDy pattern). There are many open questions around this architecture, so it would be interesting to see under what conditions it beats simple chunked-prefill-enabled replicas at the same resource count
  • [P1] Distributed KV cache
  • [P1] Embedding model endpoints cc @tnixon
  • [P1] Heterogeneous accelerator_type: a single deployment that can be scheduled with different engine settings on different accelerator types and shapes, with different priorities (see the config sketch after this list)
  • [P2] Backends other than vLLM (e.g., SGLang) cc @Qiaolin-Yu
  • [P2] Fractional GPU support
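
For concreteness, here is a minimal sketch of today's single-accelerator-type Serve LLM config, which the heterogeneous accelerator_type item above would generalize. This assumes the ray.serve.llm API shipped with recent Ray releases; the multi-type parameter mentioned in the comment is hypothetical.

```python
# Minimal sketch, assuming the ray.serve.llm API (Ray 2.43+).
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    # Today a deployment is pinned to a single accelerator type. The roadmap
    # item would let one deployment fall back across several types, e.g.
    # accelerator_types=["H100", "A100", "A10G"] (hypothetical parameter).
    accelerator_type="A10G",
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(tensor_parallel_size=1),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```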

Data

  • [P0] Heterogeneous accelerator_type in the same pipeline (the pipeline API these items extend is sketched after this list)
  • [P0] Multi-node TP+PP for large DeepSeek models
  • [P1] More vision-language models
  • [P1] TPU support
  • [P2] Backends other than vLLM (e.g., SGLang)
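
For context, here is a minimal sketch of the batch-inference pipeline these Data items extend, assuming the ray.data.llm API shipped with recent Ray releases (the model field has been named `model` in some versions and `model_source` in others):

```python
# Minimal sketch, assuming the ray.data.llm API (Ray 2.43+).
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",  # `model` in some versions
    engine_kwargs={"max_model_len": 4096},
    concurrency=1,
    batch_size=32,
)

processor = build_llm_processor(
    config,
    # Map each input row to a chat request for the engine.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.0, max_tokens=128),
    ),
    # Keep only the generated text in the output rows.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray?"}])
print(processor(ds).take_all())
```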

CI/CD and release pipeline

  • [P0] More release tests on data pipelines
  • [P0] Put gen-config on the critical path of the Serve release tests

Docs and community support

  • [P0] Cover gen-config in the Serve docs
  • [P0] Run doc-tests on examples
  • [P0] Update the vLLM docs with a Ray cluster setup guide and Serve and Data code examples
  • [P1] Example of running DeepSeek R1 (a huge model served multi-node with Ray Serve); a rough sketch follows
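
A rough sketch of what the DeepSeek R1 example might look like: tensor parallelism within a node combined with pipeline parallelism across nodes via vLLM engine arguments. The model ID and parallelism sizes are illustrative, not a tested recipe.

```python
# Rough sketch: multi-node serving of a very large model by combining
# tensor parallelism (within a node) with pipeline parallelism (across
# nodes). Sizes below are illustrative.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek-r1",
        model_source="deepseek-ai/DeepSeek-R1",
    ),
    accelerator_type="H100",
    engine_kwargs=dict(
        tensor_parallel_size=8,    # GPUs per node
        pipeline_parallel_size=2,  # number of nodes
    ),
)

serve.run(build_openai_app({"llm_configs": [llm_config]}))
```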
@kouroshHakha pinned this issue Mar 12, 2025
@kouroshHakha added the enhancement, RFC, and llm labels Mar 12, 2025
@kouroshHakha changed the title from [Q2 Roadmap][ray.llm] Tentative roadmap for Data and Serve LLM APIs to [Roadmap][ray.llm] Tentative roadmap for Data and Serve LLM APIs Mar 12, 2025
@Qiaolin-Yu

@kouroshHakha Hi, I’m very interested in the part about enabling Ray to support SGLang. Would it be possible for me to work on this?

@richardliaw (Contributor)

I think that'd be awesome, @Qiaolin-Yu. How about we coordinate on the Ray Slack?

@richardliaw changed the title from [Roadmap][ray.llm] Tentative roadmap for Data and Serve LLM APIs to [llm] Roadmap for Data and Serve LLM APIs Mar 12, 2025
@Qiaolin-Yu

> I think that'd be awesome, @Qiaolin-Yu. How about we coordinate on the Ray Slack?

Sure!

@tnixon

tnixon commented Mar 13, 2025

I'd like to see the ability to serve embedding models with Ray Serve LLM through a dedicated endpoint, as per:
#50639 (comment)
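
For reference, such an endpoint would presumably follow the OpenAI embeddings API shape, so a client call might look like the sketch below once supported. This endpoint does not exist in Ray Serve LLM as of this issue, and the model ID is hypothetical.

```python
# Hypothetical client call, assuming Ray Serve LLM gains an
# OpenAI-compatible /v1/embeddings endpoint (not available yet).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")
resp = client.embeddings.create(
    model="my-embedding-model",  # hypothetical deployed model ID
    input=["Ray Serve LLM embeddings example"],
)
print(resp.data[0].embedding[:8])
```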
