
MelNet

Implementation of MelNet: A Generative Model for Audio in the Frequency Domain

Prerequisites

  • Tested with Python 3.6.8 & 3.7.4, PyTorch 1.2.0 & 1.3.0.
  • pip install -r requirements.txt
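Put together, a fresh setup might look like the sketch below. The repository URL matches this project; the rest is the documented install step.

```bash
git clone https://github.com/Deepest-Project/MelNet.git
cd MelNet
pip install -r requirements.txt
```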

How to train

Datasets

  • Blizzard, VoxCeleb2, and KSS have YAML files provided under config/. For other datasets, fill out your own YAML file following the provided ones (a sketch follows this list).
  • Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension specified by data.extension within the YAML file.
  • Conditional training is currently only implemented for KSS and a subset of the Blizzard dataset.
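As a rough illustration, a dataset YAML might look like the sketch below. Only data.extension is documented above; every other key and value (paths, audio parameters) is a hypothetical placeholder, so consult the provided files under config/ for the actual schema.

```yaml
# Hypothetical dataset config sketch -- only data.extension is documented;
# all other keys and values here are assumed placeholders.
data:
  path: datasets/my_dataset   # assumed: root directory of the audio files
  extension: '*.wav'          # documented: consistent file extension of the dataset
audio:
  sr: 22050                   # assumed: sample rate
  hop_length: 256             # assumed: FFT hop length
```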

Running the code

  • python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
    • Each tier can be trained separately. Since each tier is larger than the one before it (with the exception of tier 1), adjust the batch size per tier. An example invocation follows this list.
      • Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.
    • The -s flag is a boolean that determines whether to train a TTS tier. Since a TTS tier only differs at tier 1, the flag is ignored when [tier number] != 1. Warning: the flag evaluates to True no matter what value follows it; omit it entirely unless you intend to train a TTS tier.
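For concreteness, training tier 2 of Blizzard with a batch size of 4 might look like this; the config filename and run name are illustrative assumptions, while the flags are the documented ones:

```bash
# Hypothetical invocation -- config path and run name are placeholders.
python trainer.py -c config/blizzard.yaml -n blizzard-t2 -t 2 -b 4
```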

How to sample

Preparing the checkpoints

  • The checkpoints must be stored under chkpt/.
  • A YAML file named inference.yaml must be provided under config/.
  • inference.yaml must specify the number of tiers, the names of the checkpoints, and whether the generation is conditional (a sketch follows this list).
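Based only on the three requirements above, a sketch of inference.yaml might look like this; the key names and checkpoint filenames are assumptions, not the implementation's verified schema:

```yaml
# Hypothetical inference.yaml sketch -- key names and filenames are assumed.
conditional: false   # whether this is a conditional generation
n_tiers: 6           # number of tiers
checkpoints:         # checkpoint names, stored under chkpt/
  - tier1.pt
  - tier2.pt
  - tier3.pt
  - tier4.pt
  - tier5.pt
  - tier6.pt
```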

Running the code

  • python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
    • Timestep refers to the frame length of the generated mel spectrogram. The ratio of timesteps to seconds is roughly [sample rate] : [hop length of FFT]; see the conversion sketch after this list.
    • The -i flag is optional and only needed for conditional generation. Surround the sentence with double quotes and end it with a period.
    • Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).
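The ratio above gives seconds ≈ timesteps × [hop length] / [sample rate]. A minimal conversion sketch, with the sample rate and hop length as assumed example values (read the real ones from your dataset YAML):

```python
def timesteps_to_seconds(timesteps: int, sample_rate: int, hop_length: int) -> float:
    """Approximate audio duration of a mel spectrogram with `timesteps` frames."""
    return timesteps * hop_length / sample_rate

# Example with assumed values -- substitute those from your config.
print(timesteps_to_seconds(1000, sample_rate=22050, hop_length=256))  # ~11.6 s
```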

To-do

  • Implement upsampling procedure
  • GMM sampling + loss function
  • Unconditional audio generation
  • TTS synthesis
  • Tensorboard logging
  • Multi-GPU training
  • Primed generation

Implementation authors

License

MIT License
