[Evals] CLI for scoring custom datasets #33

Merged 6 commits on Nov 15, 2024

@yanxi0830 (Contributor) commented Nov 15, 2024

TL;DR

  • Update the datasets register CLI so it can take input files
  • Run app eval on a custom dataset
    • (1) via --dataset-path to read rows from a local file
    • (2) via --dataset-id to read rows from a registered dataset

Test

Run scoring on a custom dataset from a local file

llama-stack-client eval run_scoring braintrust::answer-correctness braintrust::factuality \
--dataset-path <path-to-local-dataset> \
--output-dir ./ \
--num-examples 5
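
For reference, a minimal sketch of preparing a file for `--dataset-path`. It assumes the CLI reads a CSV file with a header row and that the columns match the schema used in the register step (`input_query`, `generated_answer`, `expected_answer`); neither is stated in this PR. The file name and rows are hypothetical.

```shell
# Hypothetical file name. CSV with a header row is an assumed format.
DATASET_FILE="sample_dataset.csv"

# Write a tiny dataset; column names are taken from the schema in the register step.
cat > "$DATASET_FILE" <<'EOF'
input_query,generated_answer,expected_answer
What is the capital of France?,Paris,Paris
What is 2 + 2?,4,4
EOF

# Sanity check: show the header and count the data rows (total lines minus the header).
head -n 1 "$DATASET_FILE"
echo "data rows: $(( $(wc -l < "$DATASET_FILE") - 1 ))"
```

The resulting path could then be passed as `--dataset-path` in the command above.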

1. Register a custom dataset

llama-stack-client datasets register --dataset-id <custom-dataset-id> \
--schema '{"generated_answer": {"type": "string"}, "expected_answer": {"type": "string"}, "input_query": {"type": "string"}}' \
--dataset-path <local-file-path> \
--provider-id huggingface-0
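
Since the `--schema` value is inline JSON, a quoting slip is easy to make. Below is a minimal sketch of keeping it in a shell variable and checking that it parses before registering. The variable name is illustrative, and `python3` is assumed to be on the PATH (used only for its standard `json.tool` module).

```shell
# Schema copied from the register command above; SCHEMA is an illustrative variable name.
SCHEMA='{"generated_answer": {"type": "string"}, "expected_answer": {"type": "string"}, "input_query": {"type": "string"}}'

# Validate the JSON locally (assumes python3 is available; the check is skipped otherwise).
if command -v python3 >/dev/null 2>&1; then
  if printf '%s' "$SCHEMA" | python3 -m json.tool >/dev/null; then
    echo "schema parses as JSON"
  else
    echo "schema is not valid JSON" >&2
  fi
fi
```

The variable can then be passed as `--schema "$SCHEMA"` in the register command.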

2. Run scoring on the registered dataset

llama-stack-client eval run_scoring \
braintrust::answer-correctness braintrust::factuality \
--dataset-id <custom-dataset-id> --output-dir ./ 
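
The two steps can be chained in a small script. This is a sketch only: `DATASET_ID`, `DATASET_PATH`, and the `run` helper are hypothetical, and the flags are the ones shown in this PR. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
set -eu

DATASET_ID="my-custom-dataset"        # hypothetical dataset id
DATASET_PATH="./sample_dataset.csv"   # hypothetical local file
SCHEMA='{"generated_answer": {"type": "string"}, "expected_answer": {"type": "string"}, "input_query": {"type": "string"}}'

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: register the dataset. With set -e, a failure stops the script before scoring.
run llama-stack-client datasets register --dataset-id "$DATASET_ID" \
  --schema "$SCHEMA" \
  --dataset-path "$DATASET_PATH" \
  --provider-id huggingface-0

# Step 2: score the registered dataset.
run llama-stack-client eval run_scoring \
  braintrust::answer-correctness braintrust::factuality \
  --dataset-id "$DATASET_ID" --output-dir ./
```

Printing first makes it easy to inspect the exact arguments, in particular the quoted schema, before anything is sent to the server.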

@yanxi0830 marked this pull request as ready for review November 15, 2024 17:23
@yanxi0830 changed the base branch from main to pretty_table November 15, 2024 17:23
@yanxi0830 changed the title from "[Evals] CLI for scoring application datasets" to "[Evals] CLI for scoring custom datasets" Nov 15, 2024
@ashwinb (Contributor) left a comment

lg!

Base automatically changed from pretty_table to main November 15, 2024 20:49
@yanxi0830 merged commit 5eba7ad into main Nov 15, 2024
1 check passed
@yanxi0830 deleted the app_eval branch November 15, 2024 20:49
3 participants