This repository contains the source code and dataset used to analyze all GitHub repositories linked in scientific papers. The dataset was obtained by querying the Microsoft Academic Graph, which is licensed under ODC-By. Our analysis focuses on several dimensions related to these repositories and their associated papers.
Our analysis reveals that both the number of stars and the number of forks across repositories follow a power-law distribution. Typically, only one author of a paper contributes to the associated repository. Most GitHub manuals are concise, often comprising only a few sentences. The majority of the source code is written in Python, and the papers linking to these repositories, along with their authors, predominantly belong to the AI field.
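As a rough illustration of what a power-law claim about star or fork counts means, the sketch below estimates the exponent with the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x_i / x_min)). It is not the method used in the paper. The function name `power_law_alpha`, the cutoff `x_min`, and the star counts are all hypothetical and chosen only for illustration. Applying the continuous formula to integer counts is itself an approximation.

```python
import math

def power_law_alpha(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= x_min are treated as part of the power-law tail.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if not tail or log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

# Hypothetical star counts, for illustration only (not taken from the dataset).
stars = [1, 1, 2, 2, 3, 5, 8, 13, 40, 250]
alpha = power_law_alpha(stars, x_min=1.0)
print(f"estimated exponent: {alpha:.2f}")
```

A heavy-tailed sample such as this one, with many small counts and a few very large ones, yields a finite exponent greater than 1. A point estimate alone does not show that the data follow a power law, so a goodness-of-fit check would still be needed.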
For more details, please refer to the following paper:
Michael Färber: "Analyzing the GitHub Repositories of Research Papers." Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL'20), Xi'an, China, 2020.
Please cite this paper if you reference our work.
We would like to thank Erhan Metin for his valuable contributions to this research.