This project is a comparative study of Generative Adversarial Networks (GANs), centered on Anime Face data. It aims to generate facial images of anime characters, dogs, and humans using sketches and Deep Convolutional GANs (DCGANs). The project is built with the PyTorch framework and was trained on an NVIDIA GeForce RTX GPU machine. It uses the Cats and Dogs, Human Faces, and Anime Faces datasets.
python -m src [train, test] [options]
If you have not completed the preprocessing steps required for this project, it will not run. A pretrained model is provided, and instructions on how to run the project are given in the src/create_dataset folder and in the src folder itself.
For examples, I have uploaded everything to YouTube: the ACGAN training process over 2k epochs.
If more time is available, the project will be refactored to better document the arguments that the train and test.py files accept.
Currently, arguments can be passed with the following command: python -m src train --i 1000, which trains the model for 1000 iterations.
To get more information about the arguments, run the following command: python -m src [train, test] --help.
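The command-line shape described above can be sketched with argparse. This is only an illustration of how a train/test split with an --i option could be wired up, not the project's actual parser: the function name build_parser, the dest name iterations, the default value, and the help strings are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `python -m src [train, test] [options]`."""
    parser = argparse.ArgumentParser(prog="python -m src")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="train a model")
    # `--i` matches the flag shown above; the default of 1000 is an assumption.
    train.add_argument(
        "--i",
        dest="iterations",
        type=int,
        default=1000,
        help="number of training iterations",
    )

    # The real test command may accept options; none are assumed here.
    subparsers.add_parser("test", help="evaluate a trained model")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["train", "--i", "1000"])
print(args.command, args.iterations)
```

With subparsers, argparse generates the --help output for each subcommand automatically, which is one way to get the per-command help described above.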
If this is the first time running the project, refer to the src/create_dataset directory and change any constants within utils/constants.py as needed. This is necessary to ensure that the project is set up correctly and is ready to run.
The Cats and Dogs dataset was used as a placeholder to understand the basic workings of GANs. Although the model learned something about what dogs look like, the results were not very realistic. Overfitting was observed, and further data processing is required to improve the results.
The Human Faces dataset was used for a deeper understanding of the underlying concepts of GANs. After training for 1000 epochs, the results showed some improvement, but training failed to converge because of the gradient ascent change shown in the DCGAN study. Further network testing is required to resolve this issue.
The Anime Faces dataset is the core part of this project and provides the most promising results. To generate the anime facial images, the project uses the illustration2vec (i2v) tool to extract labels automatically. The i2v tool converts the illustrations into a vector representation, which is then used as input for the GANs. In addition, the Anime Faces dataset was manually classified to provide more accurate labels. This involved reviewing each illustration and assigning it to specific categories.
Datasets: The project uses two main datasets:
- Automatically generated dataset using illustration2vec (i2v) tool.
- Manually classified dataset, which is created by reviewing and classifying each illustration into specific categories.
By using these two datasets, the project aims to provide a comprehensive study of the workings of GANs using Anime Face data.
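As a rough illustration of the automatic labelling step, the sketch below turns a list of (tag, probability) pairs into a single class index. It assumes the tagger's scores arrive in that form. The tag names in HAIR_TAGS, the 0.25 threshold, and the function pick_label are hypothetical and are not taken from this project's code.

```python
from typing import List, Optional, Tuple

# Hypothetical label vocabulary; the project's real categories may differ.
HAIR_TAGS = ["blonde hair", "brown hair", "black hair", "blue hair", "pink hair"]


def pick_label(
    tag_scores: List[Tuple[str, float]],
    candidates: List[str],
    threshold: float = 0.25,
) -> Optional[int]:
    """Return the index of the best-scoring candidate tag.

    Returns None when no candidate reaches the threshold, so the image can
    be skipped or passed on for manual classification.
    """
    scores = dict(tag_scores)
    best = max(candidates, key=lambda tag: scores.get(tag, 0.0))
    if scores.get(best, 0.0) < threshold:
        return None
    return candidates.index(best)


# Made-up scores for one illustration.
example = [("1girl", 0.98), ("blue hair", 0.81), ("black hair", 0.12), ("smile", 0.40)]
label = pick_label(example, HAIR_TAGS)
print(label)
```

Images for which pick_label returns None are the natural candidates for the manually classified dataset.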
Constants used in the project can be found in the src/utils/constant.py file. To run the module, use the following command: python -m src. There are two core parameters that need to be set; to run tests, the run_test boolean flag must be switched.
In the future, a refactoring will move these settings into a JSON file for reference, so that they no longer have to be changed manually.
The project now uses auxiliary GANs, one built on a DCGAN with extra layers and one without. The data generated by these two GANs is compared with each other and with that of normal GANs and CGANs.
This comparison of auxiliary GANs against normal GANs and CGANs is intended to show how the auxiliary classifier affects training and output quality.
After working with ACGANs and running into mode collapse, I found that although the labeling was correct, the images produced by the model had a distorted, "acid-like" appearance. To address this, I switched to a method that uses a gradient penalty to help stabilize training. Adding a gradient penalty term to the discriminator loss, alongside a Wasserstein-style loss for the generator and discriminator, encouraged the model to learn more diverse and representative features, which ultimately led to higher-quality generated images.
ACGAN (Auxiliary Classifier GAN) is a type of generative adversarial network (GAN) used for image synthesis tasks. It was introduced in a 2016 paper by Odena et al. and is notable for its use of an auxiliary classifier in the discriminator network. Besides deciding whether an image is real or generated, the discriminator predicts the class label of both real and generated images; those labels are what is used to control the features of the generated images.
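For reference, the two-part objective from the Odena et al. paper can be written as follows, where S is the real/fake source and C the class label. This restates the published formulation and is not necessarily the exact loss implemented in this project.

```latex
\begin{align}
L_S &= \mathbb{E}\left[\log P(S = \text{real} \mid X_{\text{real}})\right]
     + \mathbb{E}\left[\log P(S = \text{fake} \mid X_{\text{fake}})\right] \\
L_C &= \mathbb{E}\left[\log P(C = c \mid X_{\text{real}})\right]
     + \mathbb{E}\left[\log P(C = c \mid X_{\text{fake}})\right]
\end{align}
```

The discriminator is trained to maximize L_S + L_C, while the generator is trained to maximize L_C - L_S.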
One of the key benefits of ACGAN is that generation can be controlled through specific class labels. For example, in a dataset of faces, labels could be assigned to attributes such as hair color, eye color, or gender. This allows the generator to learn to produce images that match specific attributes, and the discriminator to learn to distinguish between real and fake images based on those attributes.
However, ACGAN has a known issue with mode collapse, where the generator produces only limited variations of the same type of image. This occurs because the discriminator's auxiliary classifier is trained to classify the generated images by their class label, which can push the generator toward a few easily classified images per class that all look alike.
WAC-GAN (Wasserstein Auxiliary Classifier GAN) extends ACGAN with the Wasserstein loss and the gradient penalty that Gulrajani et al. introduced for WGAN-GP in a 2017 paper. The goal is to improve the stability of training and reduce the occurrence of mode collapse. The Wasserstein distance measures the distance between the generated and real distributions, while the gradient penalty pushes the norm of the discriminator's gradient with respect to its input toward 1, which softly enforces the Lipschitz constraint that the Wasserstein formulation requires.
By incorporating these techniques, WAC-GAN is able to produce more diverse and higher-quality images than ACGAN. One key advantage of WAC-GAN is its ability to generate images with high fidelity while maintaining control over specific attributes. The Wasserstein distance and gradient penalty also lead to more stable training and better convergence than the ACGAN setup described above.
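A minimal PyTorch sketch of the gradient penalty term, following the WGAN-GP formulation, is shown below. It is not the project's implementation: the function name gradient_penalty, the toy linear critic, the tensor shapes, and the constant LAMBDA_GP are assumptions for illustration (the value 10 is the coefficient used in the Gulrajani et al. paper). The auxiliary classification loss of WAC-GAN is left out.

```python
import torch
from torch import nn


def gradient_penalty(critic: nn.Module, real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """Mean of (||grad of critic at x_hat||_2 - 1)^2 over random interpolates x_hat."""
    batch_size = real.size(0)
    # One mixing coefficient per sample, broadcast over the remaining dimensions.
    eps = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real.detach() + (1 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(x_hat)
    # create_graph=True keeps the penalty differentiable w.r.t. the critic weights.
    (grads,) = torch.autograd.grad(
        outputs=scores,
        inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()


LAMBDA_GP = 10.0  # penalty weight; hypothetical constant name

# Toy critic and random data, only to show how the pieces fit together.
critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))
real = torch.randn(4, 3, 8, 8)
fake = torch.randn(4, 3, 8, 8)

gp = gradient_penalty(critic, real, fake)
# Wasserstein critic loss plus the penalty; the generator loss would be -critic(fake).mean().
critic_loss = critic(fake).mean() - critic(real).mean() + LAMBDA_GP * gp
```

The penalty is evaluated at random points between real and generated samples and is added to the critic loss only; the generator is trained on the critic's score alone.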
In January, the project was rewritten to improve its functionality. The rewritten code can be found in commit 3844e3ec.
Running the Project (updated): There are two ways to run this project. One is to run python -m src [train, test] followed by that file's parameters. The other is to cd into src and run python [train, test] --help. This project also uses MLOps tooling, with Weights & Biases (W&B) for better monitoring and evaluation.