BuilderConfig questions #13

xiaos16 · 2025-03-14T06:43:04Z

Hello, when I test tedlium and fleurs-zh, I get these errors:

ERROR - load args is {'path': 'TwinkStart/tedlium', 'name': 'release1'}load dataset error: BuilderConfig 'release1' not found. Available: ['default']

ERROR - load args is {'path': 'google/fleurs', 'name': 'cmn_hans_cn', 'split': 'test'}load dataset error: BuilderConfig 'cmn_hans_cn' not found. Available: ['default']

How can I solve this? Thanks!

UltraEval · 2025-03-18T02:34:07Z

I am unable to reproduce the bug.

To verify the issue, please run the following code and share your results:

from datasets import get_dataset_config_names

configs = get_dataset_config_names("TwinkStart/tedlium")
print(configs)

from datasets import get_dataset_config_names

configs = get_dataset_config_names("google/fleurs")
print(configs)

xiaos16 · 2025-03-18T09:40:26Z

from datasets import get_dataset_config_names
configs = get_dataset_config_names("TwinkStart/tedlium")
print(configs)
['default']
from datasets import get_dataset_config_names
configs = get_dataset_config_names("google/fleurs")
print(configs)
['default']

I run these commands:
'''
python audio_evals/main.py --dataset fleurs-zh --prompt mini-cpm-omni-asr-zh --model MiniCPMo2_6-audio
python audio_evals/main.py --dataset tedlium-release1 --prompt mini-cpm-omni-asr-en --model MiniCPMo2_6-audio
'''

In /UltraEval-Audio-main/registry/dataset/fleurs.yaml, the config is:

fleurs-zh:
  class: audio_evals.dataset.huggingface.Huggingface
  args:
    subset: cmn_hans_cn
    default_task: asr-zh
    name: google/fleurs
    ref_col: raw_transcription
    split: test
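To relate this registry entry to the error log at the top of the thread, here is a minimal sketch of how the `name`, `subset`, and `split` fields appear to map onto the logged load args. The helper names `to_load_args` and `config_available` are hypothetical and are not taken from UltraEval-Audio; the mapping is only inferred from the logged dictionary.

```python
from typing import Any, Dict, List, Optional


def to_load_args(name: str, subset: Optional[str] = None,
                 split: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build load args in the shape shown in the log.

    Assumption inferred from the log: the registry `name` becomes `path`,
    and the registry `subset` becomes the config `name`.
    """
    load_args: Dict[str, Any] = {"path": name}
    if subset is not None:
        load_args["name"] = subset
    if split is not None:
        load_args["split"] = split
    return load_args


def config_available(requested: str, available: List[str]) -> bool:
    """Return True if the requested config is among the reported configs."""
    return requested in available


fleurs_args = to_load_args("google/fleurs", subset="cmn_hans_cn", split="test")
tedlium_args = to_load_args("TwinkStart/tedlium", subset="release1")
```

With the `['default']` lists reported above, `config_available("cmn_hans_cn", ["default"])` is false, which matches the "BuilderConfig ... not found" message: the config name in the YAML is passed through unchanged, but the dataset resolved in this environment only exposes `default`.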

UltraEval · 2025-03-18T09:45:00Z

I know this config. Please run the following code in your Python shell:

from datasets import get_dataset_config_names

configs = get_dataset_config_names("TwinkStart/tedlium")
print(configs)

from datasets import get_dataset_config_names

configs = get_dataset_config_names("google/fleurs")
print(configs)

Then share your results.

xiaos16 · 2025-03-18T09:50:31Z

Yes, both results are ['default']:

from datasets import get_dataset_config_names
configs = get_dataset_config_names("TwinkStart/tedlium")
print(configs)
['default']
from datasets import get_dataset_config_names
configs = get_dataset_config_names("google/fleurs")
print(configs)
['default']

xiaos16 · 2025-03-18T09:51:20Z

UltraEval · 2025-03-18T10:01:28Z

You need to check the following:

Connect to the HF website: https://huggingface.co/datasets/TwinkStart/tedlium

It should look like:

Upgrade the datasets package.
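Both checks can be scripted. The following is a rough sketch using only the Python standard library; the function names `installed_version` and `can_reach` are made up for illustration, and a successful HEAD request only shows that the page is reachable, not that the dataset loads correctly.

```python
import urllib.error
import urllib.request
from importlib import metadata
from typing import Optional


def installed_version(package: str = "datasets") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def can_reach(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL succeeds within the timeout."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False
```

For example, call `installed_version("datasets")` and `can_reach("https://huggingface.co/datasets/TwinkStart/tedlium")`. The package itself can be upgraded with `pip install -U datasets`.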

xiaos16 · 2025-03-18T10:56:57Z

Which version do you use? I use datasets==3.3.2. I also tried 3.4.1, but that did not work for me either.

I've downloaded the data locally.

./TwinkStart/tedlium/release1/test-00000-of-00001.parquet

./google/fleurs/data/cmn_hans_cn/audio/test.tar.gz
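One thing worth ruling out, although this thread does not confirm it as the cause: `load_dataset` treats a `path` that exists as a local directory as a local dataset before it looks on the Hub. If the commands are run from the directory that contains `./TwinkStart/tedlium` and `./google/fleurs`, those folders could be picked up instead of the Hub repositories, and a plain folder typically exposes only a `default` config. The sketch below checks for that situation; `shadowing_local_dir` and `load_local_parquet` are hypothetical helper names.

```python
import os
from typing import Optional


def shadowing_local_dir(repo_id: str, cwd: Optional[str] = None) -> Optional[str]:
    """Return the local directory that matches repo_id, or None if none exists.

    A match means that passing repo_id to load_dataset from this working
    directory would resolve to the local folder rather than the Hub repo.
    """
    base = cwd if cwd is not None else os.getcwd()
    candidate = os.path.join(base, repo_id)
    return candidate if os.path.isdir(candidate) else None


def load_local_parquet(parquet_path: str, split: str = "test"):
    """Load one local parquet file with the generic 'parquet' builder."""
    # Imported lazily so the check above works without datasets installed.
    from datasets import load_dataset

    return load_dataset("parquet", data_files={split: parquet_path}, split=split)
```

If `shadowing_local_dir("TwinkStart/tedlium")` returns a path, either run the evaluation from a different working directory or load the downloaded file explicitly, for example `load_local_parquet("./TwinkStart/tedlium/release1/test-00000-of-00001.parquet")`.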

UltraEval · 2025-03-19T07:25:29Z

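You can try downloading the HF data with the following code:

from datasets import load_dataset
save_path = 'xx'  # placeholder: directory to use as the download cache
dataset = load_dataset('TwinkStart/tedlium', name='release1', cache_dir=save_path)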

xiaos16 · 2025-03-20T03:19:12Z

I will try it, thanks!
