
benchmarks: simplify test jsonschema #14567

Merged
merged 2 commits into vllm-project:main
Mar 11, 2025

Conversation

russellb
Member

@russellb russellb commented Mar 10, 2025

benchmarks: Simplify test jsonschema

PR #13288 changed this schema from an object to an array. The real issue
was that the original schema used features unsupported by xgrammar.
However, after switching to an array, I saw the failure rate go up a lot in
my benchmark runs. Because the array was unbounded, generation was often
cut off in the middle of a JSON response, so the output failed to parse.
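The failure mode described above is easy to reproduce in isolation. A minimal sketch (the JSON string here is illustrative, not actual benchmark output): truncating an array mid-element leaves text that `json.loads` rejects.

```python
import json

# A complete JSON array parses fine.
complete = '[{"name": "a"}, {"name": "b"}]'
json.loads(complete)

# Cutting generation off partway through the array, as a
# max-output-length limit can, leaves invalid JSON.
truncated = complete[:20]  # '[{"name": "a"}, {"na'
try:
    json.loads(truncated)
    parsed = True
except json.JSONDecodeError:
    parsed = False
```

With a bounded, trivial schema, the output is short enough that this truncation path is rarely hit.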

This change replaces it with a trivial jsonschema, which should keep the
benchmark focused on per-request overhead rather than on generation itself.

The change also broke the json-unique option, since the code for
inserting the unique property assumed the top level was an object. This
change makes json-unique work again.
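A minimal sketch of the idea (the schema and property names here are hypothetical, not the actual benchmark code): with an object at the top level, making each request's schema unique is a plain dict update, which is what the json-unique path assumes.

```python
# Hypothetical trivial top-level-object schema, as an illustration.
trivial_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

def add_unique_property(schema: dict, request_id: int) -> dict:
    """Insert a per-request property so each schema is unique.

    This only works when the top level is an object; an array
    schema has no "properties" to extend, which is why the
    switch to an array broke the json-unique option.
    """
    unique = dict(schema)
    unique["properties"] = {
        **schema["properties"],
        f"unique_{request_id}": {"type": "string"},
    }
    return unique

schema = add_unique_property(trivial_schema, 42)
```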

Signed-off-by: Russell Bryant rbryant@redhat.com


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@russellb
Member Author

I actually want to change the schema back to something closer to what it was before. I'm getting worse correctness results from the benchmark script with this array schema. We limit the output length, so it's likely that we cut the output off somewhere before the array is closed.

@joerunde
Collaborator

> I actually want to change the schema back to closer to what it was before. I'm getting worse correctness results with the benchmark script and this schema using an array. We limit the output length and so it's likely that we cut it off in the middle somewhere before the array is closed.

Should we remove the length limit and allow the model to finish writing valid json? That might be a bit more representative of how structured output is used in practice

@russellb russellb changed the title benchmarks: Fix json-unique dataset for change to schema benchmarks: simplify test jsonschema Mar 10, 2025
@russellb
Member Author

> I actually want to change the schema back to closer to what it was before. I'm getting worse correctness results with the benchmark script and this schema using an array. We limit the output length and so it's likely that we cut it off in the middle somewhere before the array is closed.
>
> Should we remove the length limit and allow the model to finish writing valid json? That might be a bit more representative of how structured output is used in practice

In this latest version, it's unlikely the limit will matter since the jsonschema is so trivial. I figure for this test, we can aim for small so the benchmark is more focused on the per-request overhead.

I'm going to push another dataset option that uses a larger and more complex schema. For that one I would agree we want to either not use a limit, or make it much larger than what I've been using.

@russellb russellb requested a review from aarnphm March 10, 2025 20:49
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Collaborator

@aarnphm aarnphm left a comment


wfm. Let's merge this one in then we can add more complex schemas afterward.

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) March 11, 2025 13:24
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2025
@robertgshaw2-redhat robertgshaw2-redhat merged commit 08a1a11 into vllm-project:main Mar 11, 2025
26 of 27 checks passed