Skip to content

Repository containing the files for Project 2 "Towards Improving Handwritten Chess Notation Classification Using Machine Learning Models", course "Machine Learning", Fall 2024, EPFL

Notifications You must be signed in to change notification settings

MikyLanfra/Chess_Character_Recognition

 
 

Repository files navigation

CS433_Project_2

Running a job on EPFL cluster

Data Generation Pipeline

This guide provides an overview of the data generation pipeline and instructions for using it effectively. All relevant code is located in the src/data_generation_pipeline folder.


Table of Contents

  1. Overview
  2. Getting Started
  3. Running the Pipeline
  4. Configuration Files
  5. Alternative Method

Overview

The data generation pipeline is designed to generate data efficiently and save it to a specified folder. The main components of the pipeline include:

  • Main Script: data_generation.py (entry point for the pipeline)
  • Utility Scripts: image_setup.py, image_generation.py, image_processing.py (helper scripts for generating data)
  • Configuration Files: config_setup.json, config_process.json, config_characters.json (modular configurations for customization)
  • Jupyter Notebook: Data_Generation.ipynb (alternative data generation method for interactive use)

Getting Started

  1. Prerequisites:

    • Ensure you have Python installed (version >= 3.x).
    • Install the required dependencies using:
      pip install -r requirements.txt
    • For the CNN-BiLSTM model, you will need to install the following dependencies:
      pip install -r requirements_cnn-bilstm.txt
    • For the ViT model, you will need to install the following dependencies:
  2. Setup:

    • Navigate to the src/data_generation_pipeline folder:
      cd src/data_generation_pipeline
    • Review and modify the configuration files as needed (see Configuration Files).

Running the Pipeline

To generate data using the main script:

  1. Run the data_generation.py script:

    python data_generation.py
  2. The data will be generated and saved to the output folder specified in the configuration files.


Configuration Files

The pipeline uses the following configuration files for customization:

  • config_setup.json: Defines initial setup parameters for the data generation process.
  • config_process.json: Specifies processing settings, such as transformations or filters.
  • config_characters.json: Contains character-specific configurations for data generation.

To modify the configurations:

  1. Open the respective JSON file in a text editor or IDE.
  2. Update the values according to your requirements.
  3. Save the changes before running the pipeline.

Alternative Method

If you prefer an interactive approach, use the provided Jupyter notebook:

  1. Open Data_Generation.ipynb in Jupyter Notebook or Jupyter Lab.

    jupyter notebook Data_Generation.ipynb
  2. Follow the step-by-step instructions in the notebook to generate data.

  3. The notebook provides a quick and flexible way to experiment with the pipeline.


CNN-BiLSTM Training and Inference

Training the model

You can configure some of the model parameters in the src/models/cnn-bilstm/configs.py file, including the dataset_path which should point to the location of the generated dataset.

To run the training script, make sure you are in the project root directory and run the following command:

python src/models/cnn-bilstm/train_torch.py 

The model will be saved in the models/cnn-bilstm/ directory, in a folder named with the current timestamp. To use the model for inference you should put that model folder name as the MODEL_NAME variable in the src/models/cnn-bilstm/inferenceModel.py script.

Inference with the model

To run the inference script, make sure you are in the project root directory and run the following command:

python src/models/cnn-bilstm/inferenceModel.py 

The training and inference scripts store results using wandb, if you do not have an account set up with wandb, you can comment out the wandb related code in the scripts.



ViT Training and Inference

Training/Test/Validation the model

  1. Install environment_vit.yml using conda
  2. Download pretrained .pth model and put into \vit\saved: https://drive.google.com/file/d/1Wp2YWZF2wsnmVqITLQI8F9tlhly8jEvn/view?usp=sharing
  3. Use jupyter notebook: \VIT\main_pretrained.ipynb

├── environment_vit.yml
├── models
│   └── cnn-bilstm
│       └── best
│           ├── best_model.onnx
│           ├── best_model.pt
│           ├── configs.yaml
│           ├── error_analysis
│           │   ├── error_analysis.csv
│           │   └── preds_and_labels_padded.csv
│           ├── logs
│           │   ├── test
│           │   │   └── events.out.tfevents.1733556571.i01.796664.1
│           │   └── train
│           │       └── events.out.tfevents.1733556571.i01.796664.0
│           ├── model.pt
│           ├── train.csv
│           └── val.csv
├── README.md
├── requirements_cnn-bilstm.txt
├── requirements.txt
├── src
│   ├── data_generation_pipeline
│   │   ├── all_moves_proba.txt
│   │   ├── config_characters.json
│   │   ├── config_process.json
│   │   ├── config_setup.json
│   │   ├── Data_Generation.ipynb
│   │   ├── data_generation.py
│   │   ├── fonts
│   │   │   ├── aAccountantSignature.ttf
│   │   │   ├── Yellowtail-Regular.ttf
│   │   │   └── ylee Mortal Heart, Immortal Memory v.1.11 (TTF).ttf
│   │   ├── image_generation.py
│   │   ├── image_processing.py
│   │   ├── image_setup.py
│   │   ├── Pipeline_Test.ipynb
│   │   └── __pycache__
│   │       ├── image_generation.cpython-310.pyc
│   │       ├── image_generation.cpython-312.pyc
│   │       ├── image_generation.cpython-39.pyc
│   │       ├── image_processing.cpython-310.pyc
│   │       ├── image_processing.cpython-39.pyc
│   │       ├── image_setup.cpython-310.pyc
│   │       ├── image_setup.cpython-312.pyc
│   │       └── image_setup.cpython-39.pyc
│   ├── models
│   │   └── cnn-bilstm
│   │       ├── configs.py
│   │       ├── ctc_beam_search_decoder.py
│   │       ├── error_analysis.py
│   │       ├── inferenceModel.py
│   │       ├── model_architecture.txt
│   │       ├── model.py
│   │       ├── train_job.run
│   │       └── train_torch.py
│   └── utils
│       └── tests
└── VIT
    ├── architecture
    │   ├── main_scratch.ipynb
    │   ├── __pycache__
    │   │   └── transformer.cpython-310.pyc
    │   └── transformer.py
    ├── logs
    │   └── output_2024-12-19_0804.csv
    ├── main_pretrained.ipynb
    ├── saved
    ├── utils
    │   ├── config.py
    │   ├── datasets.py
    │   ├── model.py
    │   ├── plotting.py
    │   ├── __pycache__
    │   │   ├── config.cpython-310.pyc
    │   │   ├── datasets.cpython-310.pyc
    │   │   ├── model.cpython-310.pyc
    │   │   ├── plotting.cpython-310.pyc
    │   │   └── tokeniser.cpython-310.pyc
    │   └── tokeniser.py
    └── visualisation.ipynb

About

Repository containing the files for Project 2 "Towards Improving Handwritten Chess Notation Classification Using Machine Learning Models", course "Machine Learning", Fall 2024, EPFL

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 97.0%
  • Python 3.0%