- Python 3 (>=3.6)
- DuckDB
- Neo4j
- PyTorch
- Pandas
- NumPy
- Scikit-learn
- SciPy
- Docker Compose
Install the necessary packages:
pip install -r requirements.txt
For CUDA support:
CUDACXX=/usr/local/cuda-12/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install pandas numpy scikit-learn scipy implicit
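As a quick sanity check that PyTorch can see the GPU after installation, the snippet below should print True and the device name (assuming a CUDA-enabled PyTorch build is installed):

```python
import torch

# Prints True and the GPU name if PyTorch was built with CUDA and can see the device.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```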
Start the databases:
Neo4j:
cd db/neo4j
docker compose up -d
Please note that the Neo4j database is started with the default password neo4j. The password is changed after the first run.
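To verify that the Neo4j container is reachable before loading any data, a minimal check with the official Python driver can be used (the bolt URI and credentials below are the Neo4j defaults and may differ from your setup):

```python
from neo4j import GraphDatabase

# Default bolt port and initial credentials; adjust to match your compose file.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
driver.verify_connectivity()  # raises an exception if the database is unreachable
print("Neo4j is up")
driver.close()
```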
Postgres:
cd db/postgres
docker compose up -d
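A similar connectivity check for Postgres, assuming psycopg2 is installed; the host, port, and credentials below are placeholders, so use the values from the compose file and config/env.sh:

```python
import psycopg2

# Placeholder connection parameters; take the real ones from db/postgres and config/env.sh.
conn = psycopg2.connect(host="localhost", port=5432, user="postgres",
                        password="postgres", dbname="postgres")
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```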
Update the variables inside the run.sh and config/env.sh files to point to the correct directories.
Download the Amazon Reviews dataset (note that the initial download will take some time):
./run.sh download_amazon_reviews
Filter the dataset to retain users and items with at least 15 interactions:
./run.sh k_core_filtering
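For reference, k-core filtering with k = 15 repeatedly drops users and items with fewer than 15 interactions until the remaining set is stable. A small pandas sketch of the idea, not the script's actual implementation (column names are illustrative):

```python
import pandas as pd

def k_core(df: pd.DataFrame, k: int = 15) -> pd.DataFrame:
    """Keep only users and items with at least k interactions (illustrative sketch)."""
    while True:
        user_counts = df.groupby("user_id")["item_id"].transform("count")
        item_counts = df.groupby("item_id")["user_id"].transform("count")
        mask = (user_counts >= k) & (item_counts >= k)
        if mask.all():
            return df
        df = df[mask]
```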
Split the dataset (last interaction held out, by timestamp, or at random):
./run.sh last_out_split
./run.sh timestamp_split
./run.sh random_split
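The three strategies presumably correspond to holding out each user's last interaction, splitting on timestamp, and splitting at random. A leave-last-out sketch in pandas, again with illustrative column names:

```python
import pandas as pd

def last_out_split(df: pd.DataFrame):
    """Hold out each user's most recent interaction as the test set (illustrative sketch)."""
    df = df.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
    test = df.groupby("user_id").tail(1)   # last interaction per user
    train = df.drop(test.index)
    return train, test
```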
Load the k-core filtered ratings into DuckDB:
./run.sh load_kcore_ratings_duckdb
Load the metadata and reviews into DuckDB (this takes some time, as they are first downloaded in Hugging Face format):
./run.sh load_metadata_reviews_duckdb
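Once loaded, the DuckDB database can be inspected directly from Python. The file path below matches the cleanup step at the end of this section; the example table name is an assumption, so check SHOW TABLES first:

```python
import duckdb

con = duckdb.connect("db/duckdb/amazon-reviews.db", read_only=True)
# List the tables created by the loading steps.
print(con.sql("SHOW TABLES").fetchall())
# Example count query; replace 'reviews' with a table name from the output above.
print(con.sql("SELECT COUNT(*) FROM reviews").fetchone())
con.close()
```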
Load the metadata into Neo4j:
./run.sh load_metadata_neo4j
Extract the relationships, evaluate them, and update the knowledge graph:
./run.sh extract_relationships --relationship SIMILAR_TO_BOOK --max_batches 10
./run.sh evaluate_relationships --relationship SIMILAR_TO_BOOK
./run.sh update_kg --relationship SIMILAR_TO_BOOK
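To sanity-check the pipeline, the extracted relationships can be counted directly in Neo4j (connection details as in the earlier check; replace the password placeholder with your own):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<your-password>"))
with driver.session() as session:
    record = session.run("MATCH ()-[r:SIMILAR_TO_BOOK]->() RETURN count(r) AS n").single()
    print(f"SIMILAR_TO_BOOK relationships: {record['n']}")
driver.close()
```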
Train the BPRMF model:
./run.sh train_bprmf
To resume training from a pre-trained model:
./run.sh train_bprmf_pretrained
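For context, BPRMF optimizes the Bayesian Personalized Ranking objective over (user, positive item, negative item) triples. A minimal PyTorch sketch of the pairwise loss, not the repository's implementation:

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_emb: torch.Tensor,
             pos_item_emb: torch.Tensor,
             neg_item_emb: torch.Tensor) -> torch.Tensor:
    """Pairwise BPR loss: push scores of observed items above sampled negatives."""
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```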
Train the KGAT model:
./run.sh train_kgat
To use pre-trained embeddings:
./run.sh train_kgat_pretrained
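Using pre-trained embeddings typically amounts to initializing the model's embedding tables from saved weights; a hedged sketch (the file path is hypothetical, the run.sh target decides where the pre-trained weights actually live):

```python
import torch
import torch.nn as nn

# Hypothetical path to a saved (num_entities, dim) tensor of pre-trained embeddings.
pretrained = torch.load("pretrained/entity_embeddings.pt")
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)  # keep fine-tuning enabled
```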
To clean up, remove the DuckDB database and reset the Neo4j and Postgres databases:
rm -rf db/duckdb/amazon-reviews.db
cd db/neo4j
sudo ./reset.sh
cd db/postgres
sudo ./reset.sh