
benchmarks: simplify test jsonschema #14567

Merged
merged 2 commits into vllm-project:main
Mar 11, 2025

Conversation

russellb
Member

@russellb russellb commented Mar 10, 2025

benchmarks: Simplify test jsonschema

PR #13288 changed this schema from an object to an array. The real issue
was that the original schema used features unsupported by xgrammar.
However, after switching to an array, I saw the failure rate go up a lot in
my benchmark runs. Because the array was unbounded, generation was often
cut off in the middle of a JSON response, so the output failed to parse.
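The failure mode described above is easy to reproduce in isolation. A minimal sketch (the JSON string here is illustrative, not actual benchmark output): truncating an array mid-element leaves text that `json.loads` rejects.

```python
import json

# A complete JSON array parses fine.
complete = '[{"name": "a"}, {"name": "b"}]'
json.loads(complete)

# Cutting generation off partway through the array, as a
# max-output-length limit can, leaves invalid JSON.
truncated = complete[:20]  # '[{"name": "a"}, {"na'
try:
    json.loads(truncated)
    parsed = True
except json.JSONDecodeError:
    parsed = False
```

With a bounded, trivial schema, the output is short enough that this truncation path is rarely hit.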

This change replaces it with a trivial jsonschema, which should keep the
benchmark focused on per-request overhead rather than on generation itself.

The change also broke the json-unique option, since the code for
inserting the unique property assumed the top level was an object. This
change makes json-unique work again.
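A minimal sketch of the idea (the schema and property names here are hypothetical, not the actual benchmark code): with an object at the top level, making each request's schema unique is a plain dict update, which is what the json-unique path assumes.

```python
# Hypothetical trivial top-level-object schema, as an illustration.
trivial_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

def add_unique_property(schema: dict, request_id: int) -> dict:
    """Insert a per-request property so each schema is unique.

    This only works when the top level is an object; an array
    schema has no "properties" to extend, which is why the
    switch to an array broke the json-unique option.
    """
    unique = dict(schema)
    unique["properties"] = {
        **schema["properties"],
        f"unique_{request_id}": {"type": "string"},
    }
    return unique

schema = add_unique_property(trivial_schema, 42)
```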

Signed-off-by: Russell Bryant rbryant@redhat.com


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@russellb
Member Author

I actually want to change the schema back to something closer to what it was before. I'm getting worse correctness results from the benchmark script with this array schema. We limit the output length, so it's likely that we cut the output off somewhere before the array is closed.

@joerunde
Collaborator

> I actually want to change the schema back to closer to what it was before. I'm getting worse correctness results with the benchmark script and this schema using an array. We limit the output length and so it's likely that we cut it off in the middle somewhere before the array is closed.

Should we remove the length limit and allow the model to finish writing valid json? That might be a bit more representative of how structured output is used in practice

@russellb russellb changed the title benchmarks: Fix json-unique dataset for change to schema benchmarks: simplify test jsonschema Mar 10, 2025
@russellb
Member Author

> I actually want to change the schema back to closer to what it was before. I'm getting worse correctness results with the benchmark script and this schema using an array. We limit the output length and so it's likely that we cut it off in the middle somewhere before the array is closed.
>
> Should we remove the length limit and allow the model to finish writing valid json? That might be a bit more representative of how structured output is used in practice

In this latest version, it's unlikely the limit will matter since the jsonschema is so trivial. I figure for this test, we can aim for small so the benchmark is more focused on the per-request overhead.

I'm going to push another dataset option that uses a larger and more complex schema. For that one I would agree we want to either not use a limit, or make it much larger than what I've been using.

@russellb russellb requested a review from aarnphm March 10, 2025 20:49
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Collaborator

@aarnphm aarnphm left a comment


wfm. Let's merge this one in then we can add more complex schemas afterward.

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) March 11, 2025 13:24
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2025
@robertgshaw2-redhat robertgshaw2-redhat merged commit 08a1a11 into vllm-project:main Mar 11, 2025
26 of 27 checks passed