[Core][V0] Add guidance backend for structured output #14589
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. A reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run the full CI to test the changes comprehensively before merging.
This pull request has merge conflicts that must be resolved before it can be merged.
(Branch force-pushed from 5244b7f to 2db1a0e.)
Hey @russellb, this looks fantastic! On point 3 -- is it possible to share an example of a workload where TPOT regresses vs. xgrammar? I don't typically see this with JSON schemas, so it'd be a helpful case to look for performance improvements.
I'm using these changes to the benchmark suite on top of this PR: #14567. I'm running the following command, changing the structured-output backend each time:

```
python3 benchmarks/benchmark_serving_structured_output.py \
    --port 8432 \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --dataset json-unique \
    --structured-output-ratio 1.0 \
    --structured-output-backend guidance \
    --output-len 300 \
    --num-prompts 90 \
    --request-rate 1
```
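For anyone reproducing this, here is a minimal sketch of what a single structured-output request against the server under test might look like. It only builds the request body for vLLM's OpenAI-compatible `/v1/completions` endpoint; `guided_json` and the per-request `guided_decoding_backend` override are vLLM extra-body parameters (per vLLM's docs at the time), and the schema itself is a made-up example, not the benchmark's actual workload.

```python
import json

# Hypothetical example schema -- the benchmark's json-unique dataset uses
# its own generated schemas; this is just for illustration.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for POST http://localhost:8432/v1/completions.
# `guided_json` constrains output to the schema; `guided_decoding_backend`
# selects the backend per-request (assumed per vLLM's guided-decoding docs).
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Generate a JSON profile for a person: ",
    "max_tokens": 300,
    "guided_json": schema,
    "guided_decoding_backend": "guidance",
}

body = json.dumps(payload)
```

Swapping `"guidance"` for `"xgrammar"` (or omitting the override to use the server default) is enough to compare backends on the same schema.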
This commit is based on the PR vllm-project#10217. It is updated to be compatible with `main`.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
(Branch force-pushed from 2db1a0e to d66d439.)
This has been merged, and I rebased this branch on top of it.
I started looking at this after talking to @joerunde about some performance issues observed in production. While the ultimate goal is to get everyone onto V1, where we expect to provide more drastic improvements, I wanted to see if we could do something to help in V0 in the meantime.
In my testing, the behavior I see is roughly:

1. xgrammar is still the fastest, but it has more limited JSON schema support, so it doesn't solve the challenge here.
2. guidance provides a significant improvement to TTFT (time to first token), which is the biggest concern here: large and complex JSON schemas were observed taking down the server for an excessively long time.
3. guidance shows a hit to TPOT (time per output token) for structured-output requests.
I'm posting this here so further testing can be done to validate whether the performance characteristics provide enough short-term benefit to justify including this in-tree.
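For clarity on the two metrics discussed above, this is a small sketch of how TTFT and TPOT are conventionally derived from per-token arrival timestamps. The timestamps below are synthetic illustration data, not measurements from this PR.

```python
def ttft_and_tpot(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, TPOT) in seconds.

    TTFT: delay from sending the request until the first token arrives.
    TPOT: mean inter-token latency over the remaining tokens.
    """
    ttft = token_times[0] - request_start
    if len(token_times) < 2:
        return ttft, 0.0
    total_decode_time = token_times[-1] - token_times[0]
    tpot = total_decode_time / (len(token_times) - 1)
    return ttft, tpot

# Synthetic example: first token at t=0.50s, then one token every 50 ms.
times = [0.50, 0.55, 0.60, 0.65, 0.70]
ttft, tpot = ttft_and_tpot(0.0, times)
```

The trade-off described above is visible in exactly these terms: guidance lowers the first number (less time compiling the grammar before decoding starts) while raising the second (more per-token masking overhead during decode).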