[V1][Core] using cached vocab_size for Structured Outputs #14630

aarnphm · 2025-03-11T17:34:43Z

Previously, we obtained vocab_size for xgrammar directly from hf_text_config.

However, in recent versions of xgrammar, the detected vocab_size now includes special tokens, which causes the issue reported in #14534.

Calculating the vocab size from the tokenizer instead ensures support for models with custom tokenizers, such as Olmo.

github-actions · 2025-03-11T17:34:57Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

njhill · 2025-03-11T17:50:15Z

vllm/v1/structured_output/__init__.py

        self.vllm_config = vllm_config

        tokenizer = tokenizer_group.get_lora_tokenizer(None)
+        self.vocab_size = len(tokenizer.get_vocab())


Could use len(tokenizer), which is cached: https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L101

Or it might be better to use tokenizer.max_token_id. I have seen cases where the vocab size is actually larger than the number of tokens in the tokenizer since some were removed.

Will need @Ubospica's confirmation here.
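To illustrate the distinction raised in this thread, here is a minimal sketch of how the number of tokens and the largest token id can disagree when ids have been removed from a vocabulary. `ToyTokenizer` and its contents are hypothetical stand-ins invented for this example; they only mirror the names mentioned above (`get_vocab()`, `len(tokenizer)`, `max_token_id`) and are not vLLM's or xgrammar's actual implementation.

```python
class ToyTokenizer:
    """Hypothetical tokenizer used only to illustrate the sizing question."""

    def __init__(self, vocab):
        # vocab maps token string -> token id
        self._vocab = dict(vocab)

    def get_vocab(self):
        # Rebuilds a dict on every call, which is why a cached length is cheaper.
        return dict(self._vocab)

    def __len__(self):
        return len(self._vocab)

    @property
    def max_token_id(self):
        return max(self._vocab.values())


# Assumed example data: ids 3 and 4 were removed, leaving a gap.
tok = ToyTokenizer({"<s>": 0, "</s>": 1, "a": 2, "b": 5, "<image>": 6})

num_tokens = len(tok.get_vocab())  # counts entries only
mask_size = tok.max_token_id + 1   # covers every id that can occur

print(num_tokens, mask_size)  # prints: 5 7
```

In this toy case, a bitmask sized by the token count would be too short to index token id 6, whereas one sized by `max_token_id + 1` covers it. Whether that is the right size for xgrammar is the open question for confirmation.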

aarnphm · 2025-03-11T18:11:26Z

Will add tests once #14625 is merged

mergify · 2025-03-12T05:18:09Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @aarnphm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm requested review from mgoin and russellb as code owners March 11, 2025 17:34

mergify bot added the v1 label Mar 11, 2025

aarnphm added the structured-output label Mar 11, 2025

njhill reviewed Mar 11, 2025


aarnphm force-pushed the v1/molmo-aria-support-vocab branch 2 times, most recently from 8fa2aa6 to a53c058 Compare March 12, 2025 00:20

aarnphm changed the title ~~[V1][Core] calculating vocab_size from given tokenizer~~ [V1][Core] using cached vocab_size for Structured Outputs Mar 12, 2025

aarnphm mentioned this pull request Mar 12, 2025

[Bug]: [V1] Molmo/Aria not supported on V1 due to xgrammar #14534

Open

1 task

mergify bot added the needs-rebase label Mar 12, 2025

aarnphm added 2 commits March 12, 2025 03:49

fix(v1): support for vision models

e248036

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

chore: use max_tokens_id

8872481

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm force-pushed the v1/molmo-aria-support-vocab branch from 689278a to 8872481 Compare March 12, 2025 07:50

mergify bot removed the needs-rebase label Mar 12, 2025
