[Evals] CLI for scoring custom datasets #33

yanxi0830 · 2024-11-15T17:21:02Z

TL;DR

Update the dataset register CLI so it can take local input files
Run app eval scoring on a custom dataset
- (1) via --dataset-path to read rows from a local file
- (2) via --dataset-id to read rows from a registered dataset

Test

Run on custom dataset

llama-stack-client eval run_scoring braintrust::answer-correctness braintrust::factuality \
--dataset-path <path-to-local-dataset> \
--output-dir ./ \
--num-examples 5
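
For readers who want to see what reading rows from a local file and capping them at --num-examples might look like, here is a minimal sketch. It is not the code in run_scoring.py: the helper name load_rows and the supported file types (.csv, .json, .jsonl) are assumptions made for illustration, since the PR description does not say which formats the CLI accepts.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_rows(dataset_path: str, num_examples: Optional[int] = None) -> List[Dict[str, Any]]:
    """Read rows from a local file and keep at most `num_examples` of them.

    Hypothetical helper: the file types handled here are an assumption,
    not a statement about what the CLI supports.
    """
    path = Path(dataset_path)
    suffix = path.suffix.lower()

    if suffix == ".csv":
        # Each CSV record becomes a dict keyed by the header row.
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
    elif suffix == ".jsonl":
        # One JSON object per non-empty line.
        with path.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
    elif suffix == ".json":
        # A single JSON array of row objects.
        with path.open(encoding="utf-8") as f:
            rows = json.load(f)
    else:
        raise ValueError(f"Unsupported dataset file type: {suffix!r}")

    # Mirror the idea of --num-examples: truncate only when a limit is given.
    if num_examples is not None:
        rows = rows[:num_examples]
    return rows
```

Calling load_rows("rows.csv", num_examples=5) on a file with more than five records would return the first five as dictionaries.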

1. Register a custom dataset

llama-stack-client datasets register --dataset-id <custom-dataset-id> \
--schema '{"generated_answer": {"type": "string"}, "expected_answer": {"type": "string"}, "input_query": {"type": "string"}}' \
--dataset-path <local-file-path> \
--provider-id huggingface-0
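
Before registering, it can help to confirm that the local rows actually carry the columns named in --schema. The sketch below is a local pre-check only, under the assumption that rows have already been loaded as a list of dictionaries; the function name find_schema_violations is hypothetical, and nothing here is claimed to be what the CLI or server does during registration.

```python
import json
from typing import Any, Dict, List

# The schema string passed to --schema in the command above.
SCHEMA_JSON = (
    '{"generated_answer": {"type": "string"}, '
    '"expected_answer": {"type": "string"}, '
    '"input_query": {"type": "string"}}'
)


def find_schema_violations(
    rows: List[Dict[str, Any]], schema: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return human-readable problems; an empty list means every row fits.

    Hypothetical pre-check: only the "string" type used in this PR is handled.
    """
    problems: List[str] = []
    for i, row in enumerate(rows):
        for column, spec in schema.items():
            if column not in row:
                problems.append(f"row {i}: missing column {column!r}")
            elif spec.get("type") == "string" and not isinstance(row[column], str):
                problems.append(f"row {i}: column {column!r} is not a string")
    return problems


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    sample = [{"generated_answer": "Paris", "expected_answer": "Paris", "input_query": "Capital of France?"}]
    print(find_schema_violations(sample, schema))
```

An empty result means the sample rows match the three string columns the scoring functions expect.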

2. Run scoring on custom dataset

llama-stack-client eval run_scoring \
braintrust::answer-correctness braintrust::factuality \
--dataset-id <custom-dataset-id> --output-dir ./

src/llama_stack_client/lib/cli/eval/run_scoring.py

ashwinb

lg!

scoring cli

2a5b738

facebook-github-bot added the cla signed label Nov 15, 2024

yanxi0830 marked this pull request as ready for review November 15, 2024 17:23

yanxi0830 changed the base branch from main to pretty_table November 15, 2024 17:23

yanxi0830 changed the title ~~[Evals] CLI for scoring application datasets~~ [Evals] CLI for scoring custom datasets Nov 15, 2024

yanxi0830 added 3 commits November 15, 2024 14:05

scoring cli

f13ce6f

add option for dataset_path to read rows from local file

fc8b982

fix num examples

d96fe97

ashwinb reviewed Nov 15, 2024

src/llama_stack_client/lib/cli/eval/run_scoring.py (outdated, resolved)

fix num examples

1cf9e10

ashwinb approved these changes Nov 15, 2024

naming

b9223eb

Base automatically changed from pretty_table to main November 15, 2024 20:49

yanxi0830 merged commit 5eba7ad into main Nov 15, 2024
1 check passed

yanxi0830 deleted the app_eval branch November 15, 2024 20:49
