VoxCeleb
A large-scale audio-visual dataset of human speech

VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube.

7,000+ speakers

VoxCeleb contains speech from speakers spanning a wide range of different ethnicities, accents, professions and ages.

Figure: Utterance Lengths

1 million+ utterances

All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and different lighting conditions.

Figure: Gender Distribution

2,000+ hours

VoxCeleb consists of both audio and video. Each segment is at least 3 seconds long.
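
The minimum segment length can be checked on a local clip. The following is a minimal sketch, assuming the clip has been saved as an uncompressed PCM WAV file; the function names and the `MIN_SEGMENT_SECONDS` constant are hypothetical and are not part of any official VoxCeleb tooling.

```python
import wave

# Minimum segment length stated in the dataset description (seconds).
MIN_SEGMENT_SECONDS = 3.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def meets_minimum_length(path, minimum=MIN_SEGMENT_SECONDS):
    """Return True if the clip at `path` lasts at least `minimum` seconds."""
    return wav_duration_seconds(path) >= minimum
```

Clips stored in a compressed format would need to be decoded to WAV first, since the standard-library `wave` module only reads uncompressed PCM.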

Figure: Nationality Distribution

Dataset

There are two versions of this dataset, VoxCeleb1 and VoxCeleb2. VoxCeleb1 consists of more than 150,000 utterances from 1251 celebrities, and VoxCeleb2 consists of more than 1,000,000 utterances from 6112 celebrities.
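
The headline figures of 7,000+ speakers and 1 million+ utterances follow from adding the two versions together. The short sketch below works through that arithmetic, assuming the two versions share no speakers; the utterance counts are the lower bounds quoted above, not exact totals, and the variable names are illustrative.

```python
# Speaker counts as stated above; utterance counts are lower bounds ("more than").
VOXCELEB1 = {"speakers": 1251, "min_utterances": 150_000}
VOXCELEB2 = {"speakers": 6112, "min_utterances": 1_000_000}

# Assumption: no speaker appears in both versions, so the counts simply add.
total_speakers = VOXCELEB1["speakers"] + VOXCELEB2["speakers"]
min_total_utterances = VOXCELEB1["min_utterances"] + VOXCELEB2["min_utterances"]

print(total_speakers)        # 7363
print(min_total_utterances)  # 1150000
```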

For privacy issues with the dataset, please refer to our Dataset Privacy Notice.

Publications

Please cite the following if you make use of the dataset.

A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
Computer Speech and Language, 2019.


INTERSPEECH, 2018.


INTERSPEECH, 2017.



* Equal Contribution

Applications

VoxCeleb can be used for a number of applications including:








Related Links

Static face images for all the identities in VoxCeleb1 can be found in the VGGFace dataset.

Static face images for all the identities in VoxCeleb2 can be found in the VGGFace2 dataset.

If you require text annotation (e.g. for audio-visual speech recognition), also consider using the LRS dataset.

Emotion labels for the faces in VoxCeleb1, obtained using an automatic classifier, are available as part of the 'EmoVoxCeleb' dataset.

Challenge

We host the VoxCeleb Speaker Recognition Challenge (VoxSRC), a speaker recognition challenge held on the VoxCeleb datasets, every year. VoxSRC consists of an online challenge and an accompanying workshop at Interspeech.



Previous Challenges

VoxCeleb Speaker Recognition Challenge 2019 (VoxSRC-2019): Challenge / Workshop / Report
VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-2020): Challenge / Workshop / Report
VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-2021): Challenge / Workshop / Report
VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-2022): Challenge / Workshop / Report

Acknowledgements

This work is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data.