Add G-Pass@k Metric #589

jnanliu · 2025-02-26T10:14:10Z

This PR adds support for the G-Pass@k metric from the paper.

G-Pass@k is a generalized version of Pass@k that measures a model's ability to generate at least m correct solutions in k attempts, where m = ⌈τ · k⌉ is controlled by the thresholds parameter. When the threshold is 0, G-Pass@k reduces to Pass@k. G-Pass@k can therefore measure both the potential and the stability of a model.

$$ \text{G-Pass@}k_{\tau} = E_{\text{Questions}} \left[ \sum_{j = \lceil \tau \cdot k \rceil}^{c} \frac{\binom{c}{j} \cdot \binom{n - c}{k - j}}{\binom{n}{k}} \right] $$

$$ \text{mG-Pass@}k_{\tau} = 2\int_{0.5}^{1.0} \text{G-Pass@}k_{\tau} d \tau = \frac{2}{k} \sum_{i= \lceil 0.5 \cdot k \rceil + 1}^{k} \text{G-Pass@}k_{\frac{i}{k}} $$
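As a rough illustration of the two formulas above, here is a minimal sketch in plain Python. It is not the code from this PR: the function names are hypothetical, n is the number of generations sampled per question, and c is the number of those that are correct. The sketch clamps the required number of correct samples to at least 1 so that τ = 0 behaves like Pass@k; that clamp is an assumption of the sketch, not something the formula itself states.

```python
from math import ceil, comb


def g_pass_at_k_count(n: int, c: int, k: int, m: int) -> float:
    """Probability that at least m of k samples, drawn without replacement
    from n generations of which c are correct, are correct (hypergeometric tail)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    total = comb(n, k)
    # j cannot exceed the number of correct generations or the sample size.
    hits = sum(comb(c, j) * comb(n - c, k - j) for j in range(m, min(c, k) + 1))
    return hits / total


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Per-question G-Pass@k for threshold tau."""
    # Assumption of this sketch: require at least one correct sample,
    # so that tau = 0 gives the usual Pass@k.
    m = max(ceil(tau * k), 1)
    return g_pass_at_k_count(n, c, k, m)


def mg_pass_at_k(n: int, c: int, k: int) -> float:
    """Per-question mG-Pass@k: (2 / k) * sum over i = ceil(0.5 * k) + 1 .. k."""
    start = ceil(0.5 * k) + 1
    # Use the integer count i directly to avoid rounding issues with (i / k) * k.
    return 2.0 / k * sum(g_pass_at_k_count(n, c, k, i) for i in range(start, k + 1))


def dataset_g_pass_at_k(correct_counts: list[int], n: int, k: int, tau: float) -> float:
    """Average the per-question value over all questions (the expectation in the formula)."""
    return sum(g_pass_at_k(n, c, k, tau) for c in correct_counts) / len(correct_counts)
```

For example, with n = 4 generations, c = 2 correct, and k = 2, the sketch gives 5/6 at τ = 0 (the Pass@2 value) and 1/6 at τ = 1, where both drawn samples must be correct.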

NathanHB · 2025-03-04T12:59:29Z

Hey! Thanks for the PR :)
Do you also plan to add the math benchmark that comes with it?

jnanliu · 2025-03-05T11:40:42Z

Hey, I have added some tasks in tasks/default_tasks.py that support G-Pass@16 evaluation on the AIME24/25 and MATH500 benchmarks. You can check them out :)

tonysy · 2025-04-07T12:46:46Z

@NathanHB Hi, can this PR be merged now, or does it need further modification?

NathanHB

Looks good! Only a very small nit. Wait for the tests, then it is good to merge.

NathanHB · 2025-04-08T09:56:48Z

src/lighteval/metrics/metrics_sample.py

+        k: Union[int, List[int]],
+        n: int = None,
+        thresholds: List[float] = [0.0, 0.25, 0.5, 0.75, 1.0],


Use list instead of List.

jnanliu · 2025-04-11

I have fixed it.
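For context, the nit refers to the built-in generic types (list[...] rather than typing.List[...]), which work in annotations on Python 3.9 and later. The following is only a sketch of that style, not the signature merged in this PR: normalize_args and DEFAULT_THRESHOLDS are hypothetical names, and replacing the mutable list default with None goes beyond what the reviewer asked for.

```python
from typing import Optional, Union

# Hypothetical constant holding the default thresholds shown in the diff.
DEFAULT_THRESHOLDS = (0.0, 0.25, 0.5, 0.75, 1.0)


def normalize_args(
    k: Union[int, list[int]],
    n: Optional[int] = None,
    thresholds: Optional[list[float]] = None,
) -> tuple[list[int], Optional[int], list[float]]:
    """Illustrative helper: accept a single k or a list of k values and
    fall back to the default thresholds when none are given."""
    ks = [k] if isinstance(k, int) else list(k)
    # Copy the defaults so callers never share one mutable list.
    ths = list(DEFAULT_THRESHOLDS) if thresholds is None else list(thresholds)
    return ks, n, ths
```

Using None as the default and copying the thresholds inside the function avoids sharing a single list object between calls.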

HuggingFaceDocBuilderDev · 2025-04-08T11:40:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

jnanliu · 2025-04-14T15:49:25Z

@NathanHB Hi, I have fixed the nit, please allow the checks :)

jnanliu added 4 commits February 26, 2025 07:27

add gpassk metric

b7e104a

fix pre-commit error

7ae3af0

fix return type check

a7ae7cc

fix metrics

c6e7a48

jnanliu added 2 commits March 5, 2025 09:31

Merge branch 'main' of https://github.com/huggingface/lighteval into g-pass-at-k-dev

c2ddba8

support gpassk for aime24/25 and math_500

947a5ec

NathanHB approved these changes Apr 8, 2025

Merge branch 'main' into g-pass-at-k-dev

d06b184

jnanliu added 3 commits April 12, 2025 00:12

fix List to list

c65701b

Merge branch 'main' into g-pass-at-k-dev

e17639e

Merge branch 'main' into g-pass-at-k-dev

8719d6a
