Towards semantic versioning of open pre-trained language model releases on hugging face

Published in Empirical Software Engineering

Abstract

The proliferation of open Pre-trained Language Models (PTLMs) on model registry platforms like Hugging Face (HF) presents both opportunities and challenges for companies building products around them. Similar to traditional software dependencies, PTLMs continue to evolve after a release. However, the current state of release practices of PTLMs on model registry platforms is plagued by a variety of inconsistencies, such as ambiguous naming conventions and inaccessible model training documentation. Given the knowledge gap on current PTLM release practices, our empirical study uses a mixed-methods approach to analyze the releases of 52,227 PTLMs on the most well-known model registry, HF. Our results reveal 148 different naming practices for PTLM releases, with 40.87% of changes to model weight files not represented in the adopted name-based versioning practice or in the models' documentation. In addition, we identified that the 52,227 PTLMs are derived from only 299 different base models (i.e., the original models that were modified to create these 52,227 PTLMs), with Fine-tuning and Quantization being the most prevalent modification methods applied to these base models. Significant gaps in release transparency, in terms of training dataset specifications and model card availability, still exist, highlighting the need for standardized documentation. While we identified a model naming practice that explicitly differentiates between major and minor PTLM releases, we did not find any significant difference in the types of changes that went into either type of release, suggesting that major/minor version numbers for PTLMs are often chosen arbitrarily. Our findings provide valuable insights to improve PTLM release practices, nudging the field towards more formal semantic versioning practices.
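
To make the name-based versioning analysis concrete, here is a minimal sketch (illustrative only, not the authors' replication pipeline, which is in the package cited under Data Availability below) that tallies version-like suffixes across HF repositories using the official huggingface_hub client (see notes 21 and 42 below). The regex and the three practice labels are simplifying assumptions; the study itself identified 148 distinct naming practices.

    import re
    from collections import Counter

    from huggingface_hub import HfApi  # official HF client (notes 21 and 42)

    # Illustrative pattern for trailing version markers such as "-v1", "_v0.2" or ".2".
    # This is an assumption for the sketch; one regex cannot cover all 148 practices.
    VERSION_SUFFIX = re.compile(r"[-_.]v?(\d+)(?:\.(\d+))?$", re.IGNORECASE)

    def versioning_practice(repo_id: str) -> str:
        """Classify a repo name as major.minor (e.g. v1.3), major-only (e.g. v2), or unversioned."""
        name = repo_id.split("/")[-1]  # drop the owner prefix, e.g. "lmsys/"
        match = VERSION_SUFFIX.search(name)
        if match is None:
            return "unversioned"
        return "major.minor" if match.group(2) else "major-only"

    api = HfApi()
    # Small sample for illustration; the paper analyzed 52,227 PTLMs.
    tally = Counter(versioning_practice(m.id) for m in api.list_models(limit=500))
    print(tally)

A tally like this covers only the naming side; the study additionally traced whether changes to the underlying weight files were actually reflected in those names and in the documentation.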




Data Availability

The datasets generated and analyzed during this study are available in the replication package (Ajibode 2024).

Notes

  1. https://huggingface.co/super-cinnamon/fewshot-followup-multi-e5

  2. https://huggingface.co/datasets/gsareen07/llama-2-finetune

  3. https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v1

  4. https://huggingface.co/eachadea/ggml-vicuna-7b-1.1

  5. https://huggingface.co/michellejieli/test_classifier

  6. https://huggingface.co/michellejieli/emotion_text_classifier

  7. https://huggingface.co/080-ai/flintlock_3B_v0.1b

  8. https://huggingface.co/080-ai/tiny-cutlass

  9. By “model name,” we refer to the repository name, such as roneneldan/TinyStories-1M, which differs from the base model name, such as BERT.

  10. https://huggingface.co/starmpcc/Asclepius-13B

  11. https://huggingface.co/starmpcc/Asclepius-7B

  12. https://huggingface.co/THUDM/agentlm-13b

  13. https://huggingface.co/models

  14. https://github.com/onnx/models

  15. https://pytorch.org/hub/

  16. https://modelzoo.co/

  17. http://app.modelhub.ai/

  18. https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2

  19. https://huggingface.co/michellejieli/NSFW_text_classifier

  20. https://huggingface.co/Vezora/Mistral-22B-v0.2

  21. https://huggingface.co/docs/huggingface_hub/package_reference/hf_api

  22. https://huggingface.co/docs/safetensors/index

  23. https://huggingface.co/meta-llama/Llama-2-7b

  24. https://www.surveymonkey.com/mp/sample-size-calculator/

  25. https://huggingface.co/mdhugol/indonesia-bert-sentiment-classification

  26. https://huggingface.co/AdamCodd/yolos-small-person

  27. https://huggingface.co/AdamCodd/donut-receipts-extract

  28. https://huggingface.co/AdamCodd/tinybert-sentiment-amazon

  29. https://huggingface.co/truemansquad/myllm

  30. https://huggingface.co/voidful/bart-distractor-generation-both

  31. https://huggingface.co/Voicelab/trurl-2-13b

  32. https://huggingface.co/lakshyasoni/my_awesome_qa_model

  33. https://huggingface.co/distilgpt2-emailtype-finetune

  34. https://huggingface.co/Haary/TinyLlama-1.1B-usk-v1

  35. https://huggingface.co/SQAI/distilroberta-base_finetune_v1.2

  36. https://huggingface.co/Shijia/furina_pan_loss_5e-06

  37. https://huggingface.co/youdiniplays/filipinolingo_translation

  38. https://huggingface.co/andikamandalaa/indobert-base-uncased-finetuned-indonlu-smsa

  39. https://huggingface.co/abiatarfestus/marian-finetuned-en_ng_bible-en-to-ng

  40. https://huggingface.co/Arkong/chatglm2-6b-torchkeras-2epoch-11-15

  41. https://huggingface.co/thrunlab/Mistral_Sparse_refined_web_relu_2024-03-01

  42. https://huggingface.co/docs/huggingface_hub/en/package_reference/hf_api

  43. https://huggingface.co/docs/transformers/en/main_classes/configuration

  44. https://huggingface.co/meta-llama/Llama-2-7b-chat

  45. https://huggingface.co/meta-llama/Llama-2-7b

  46. https://huggingface.co/rhaymison/cuscuz-7b

  47. https://huggingface.co/saicharan8/telugu-summarization-umt5-small

  48. https://huggingface.co/google/umt5-small

  49. https://huggingface.co/Salesforce/codegen2-16B_P

  50. https://huggingface.co/datasets/bigcode/the-stack-dedup

  51. https://huggingface.co/chihoonlee10/T3Q-Merge-SOLAR12

  52. https://huggingface.co/howey/electra-large-qqp

  53. https://huggingface.co/monologg/koelectra-base-finetuned-sentiment

  54. https://huggingface.co/msintaha/gpt2-finetuned-rocstories

  55. https://huggingface.co/TheDrummer/Llama-3SOME-8B-v1

  56. https://huggingface.co/TheDrummer/Llama-3SOME-8B-v2

  57. https://pypi.org/project/nltk/

  58. https://docs.python.org/3/library/difflib.html (see the model card diff sketch following these notes)

  59. https://huggingface.co/nitky/Superswallow-70b-v0.2

  60. https://huggingface.co/nitky/Superswallow-70b-v0.3

  61. https://huggingface.co/Sandrro/text_to_function_v2

  62. https://huggingface.co/yacine-djm/binary_v4

  63. https://huggingface.co/lmsys/vicuna-33b-v1.3

  64. Semantic Versioning 2.0.0: https://semver.org

  65. https://en.wikipedia.org/wiki/Npm_left-pad_incident

  66. https://en.wikipedia.org/wiki/Therac-25

  67. https://www.linkedin.com/pulse/behind-closed-doors-decision-release-training-data-gpt-4-jatasra-kr4df
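
Several of the notes above name the tooling that this kind of documentation analysis relies on, e.g. difflib (note 58) and the huggingface_hub API (notes 21 and 42). As a hedged illustration of how two model card revisions can be compared (this is not the authors' actual pipeline; the repository is the example from note 63, and README.md is assumed to exist at both revisions):

    import difflib

    from huggingface_hub import HfApi, hf_hub_download

    REPO_ID = "lmsys/vicuna-33b-v1.3"  # example model from note 63

    api = HfApi()
    commits = api.list_repo_commits(REPO_ID)  # commit history, newest first
    oldest, newest = commits[-1].commit_id, commits[0].commit_id

    def read_card(revision: str) -> list[str]:
        """Fetch the model card (README.md) at a given commit and split into lines."""
        path = hf_hub_download(REPO_ID, "README.md", revision=revision)
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()

    # Unified diff between the earliest and the latest model card revision.
    for line in difflib.unified_diff(
        read_card(oldest), read_card(newest),
        fromfile=f"README@{oldest[:7]}", tofile=f"README@{newest[:7]}",
        lineterm="",
    ):
        print(line)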

References

  • Abebe SL, Ali N, Hassan AE (2016) An empirical study of software release notes. Empir Softw Eng 21:1107–1142

  • Ahn D, Almaatouq A, Gulabani M, Hosanagar K (2024) Impact of model interpretability and outcome feedback on trust in ai. In Proceedings of the CHI conference on human factors in computing systems, pp 1–25

  • Ajibode A (2024) Wip-24: Towards semantic versioning of pre-trained language models. https://github.com/SAILResearch/wip-24-adekunle-lm-release

  • Akoglu H (2018) User’s guide to correlation coefficients. Turk J Emerg Med 18(3):91–93

  • Alcobaça E, Siqueira F, Rivolli A, Garcia LPF, Oliva JT, De Carvalho ACPLF (2020) Mfe: towards reproducible meta-feature extraction. J Mach Learn Res 21(111):1–5

  • Ali S, Arcaini P, Pradhan D, Safdar SA, Yue T (2020) Quality indicators in search-based software engineering: an empirical evaluation. ACM Trans Softw Eng Methodol (TOSEM) 29(2):1–29

  • Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers of stochastic parrots: can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 610–623

  • Bhat A, Coursey A, Hu G, Li S, Nahar N, Zhou S, Kästner C, Guo JLC (2023) Aspirations and practice of ml model documentation: moving the needle with nudging and traceability. In Proceedings of the 2023 CHI conference on human factors in computing systems, pp 1–17

  • Bi T, Xia X, Lo D, Grundy J, Zimmermann T (2020) An empirical study of release note production and usage in practice. IEEE Trans Softw Eng 48(6):1834–1852

  • Bobrovskis S, Jurenoks A (2018) A survey of continuous integration, continuous delivery and continuous deployment. In BIR workshops, pp 314–322

  • Boslaugh S (2012) Statistics in a nutshell: A desktop quick reference. O’Reilly Media, Inc.

  • Campbell JL, Quincy C, Osserman J, Pedersen OK (2013) Coding in-depth semistructured interviews: problems of unitization and intercoder reliability and agreement. Sociol Methods Res 42(3):294–320

  • Carvalho L, Seco JC (2021) Deep semantic versioning for evolution and variability. In Proceedings of the 23rd international symposium on principles and practice of declarative programming, pp 1–13

  • Castaño J, Martínez-Fernández S, Franch X, Bogner J (2023) Exploring the carbon footprint of hugging face’s ml models: a repository mining study. In 2023 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM), IEEE, pp 1–12

  • Castaño J, Martínez-Fernández S, Franch X, Bogner J (2024) Analyzing the evolution and maintenance of ml models on hugging face. In 2024 IEEE/ACM 21st international conference on mining software repositories (MSR), IEEE, pp 607–618

  • Cocks K, Torgerson DJ (2013) Sample size calculations for pilot randomized trials: a confidence interval approach. J Clin Epidemiol 66(2):197–201

  • Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, Grave E, Ott M, Zettlemoyer L, Stoyanov V (2019) Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116

  • Crisan A, Drouhard M, Vig J, Rajani N (2022) Interactive model cards: a human-centered approach to model documentation. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency, pp 427–439

  • Decan A, Mens T (2019) What do package dependencies tell us about semantic versioning? IEEE Trans Softw Eng 47(6):1226–1240

  • Decan A, Mens T, Claes M, Grosjean P (2016) When github meets cran: an analysis of inter-repository package dependency problems. In 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol 1. IEEE, pp 493–504

  • Dettmers T, Pagnoni A, Holtzman A, Zettlemoyer L (2024) Qlora: efficient finetuning of quantized llms. Adv Neural Inf Process Syst 36

  • Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  • Ding N, Qin Y, Yang G, Wei F, Yang Z, Su Y, Hu S, Chen Y, Chan CM, Chen W et al (2023) Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat Mach Intell 5(3):220–235

  • Domínguez-Álvarez D, Gorla A (2019) Release practices for iOS and Android apps. In Proceedings of the 3rd ACM SIGSOFT international workshop on app market analytics, pp 15–18

  • Eldan R, Li Y (2023) Tinystories: How small can language models be and still speak coherent english? arXiv preprint arXiv:2305.07759

  • Gong Y, Liu G, Xue Y, Li R, Meng L (2023) A survey on dataset quality in machine learning. Inf Softw Technol, p 107268

  • Gresta R, Durelli V, Cirilo E (2021) Naming practices in java projects: an empirical study. In Proceedings of the XX Brazilian symposium on software quality, pp 1–10

  • Houlsby N, Giurgiu A, Jastrzebski S, Morrone B, De Laroussilhe Q, Gesmundo A, Attariyan M, Gelly S (2019) Parameter-efficient transfer learning for nlp. In International conference on machine learning, PMLR, pp 2790–2799

  • Howard J, Ruder S (2018) Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146

  • Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, Adam H, Kalenichenko D (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2704–2713

  • Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Chaplot DS, de las Casas D, Bressand F, Lengyel G, Lample G, Saulnier L et al (2023a) Mistral 7b. arXiv preprint arXiv:2310.06825

  • Jiang W, Cheung C, Kim M, Kim H, Thiruvathukal GK, Davis JC (2024a) Naming practices of pre-trained models in hugging face

  • Jiang W, Cheung C, Thiruvathukal GK, Davis JC (2023b) Exploring naming conventions (and defects) of pre-trained deep learning models in hugging face and other model hubs. arXiv preprint arXiv:2310.01642

  • Jiang W, Synovic N, Hyatt M, Schorlemmer TR, Sethi R, Lu YH, Thiruvathukal GK, Davis JC (2023c) An empirical study of pre-trained model reuse in the hugging face deep learning model registry. arXiv preprint arXiv:2303.02552

  • Jiang W, Synovic N, Sethi R, Indarapu A, Hyatt M, Schorlemmer TR, Thiruvathukal GK, Davis JC (2022) An empirical study of artifacts and security risks in the pre-trained model supply chain. In Proceedings of the 2022 ACM workshop on software supply chain offensive research and ecosystem defenses, pp 105–114

  • Jiang W, Yasmin J, Jones J, Synovic N, Kuo J, Bielanski N, Tian Y, Thiruvathukal GK, Davis JC (2024b) Peatmoss: A dataset and initial analysis of pre-trained models in open-source software. arXiv preprint arXiv:2402.00699

  • Jones J, Jiang W, Synovic N, Thiruvathukal G, Davis J (2024) What do we know about hugging face? A systematic literature review and quantitative validation of qualitative claims. In Proceedings of the 18th ACM/IEEE international symposium on empirical software engineering and measurement, pp 13–24

  • Kandpal N, Wallace E, Raffel C (2022) Deduplicating training data mitigates privacy risks in language models. In International conference on machine learning, PMLR, pp 10697–10707

  • Kathikar A, Nair A, Lazarine B, Sachdeva A, Samtani S (2023) Assessing the vulnerabilities of the open-source artificial intelligence (ai) landscape: a large-scale analysis of the hugging face platform. In 2023 IEEE international conference on intelligence and security informatics (ISI), IEEE, pp 1–6

  • Kerzazi N, Adams B (2016) Who needs release and devops engineers, and why? In Proceedings of the international workshop on continuous software evolution and delivery, pp 77–83

  • Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality? An empirical case study of Mozilla Firefox. In 2012 9th IEEE working conference on mining software repositories (MSR), IEEE, pp 179–188

  • Kinahan S, Saidi P, Daliri A, Liss J, Berisha V (2024) Achieving reproducibility in eeg-based machine learning. In The 2024 ACM conference on fairness, accountability, and transparency, pp 1464–1474

  • Kirk HR, Jun Y, Volpin F, Iqbal H, Benussi E, Dreyer F, Shtedritski A, Asano Y (2021) Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models. Adv Neural Inf Process Syst 34:2611–2624

  • Lam P, Dietrich J, Pearce DJ (2020) Putting the semantics into semantic versioning. In Proceedings of the 2020 ACM SIGPLAN international symposium on new ideas, new paradigms, and reflections on programming and software, pp 157–179

  • Laukkanen E, Itkonen J, Lassenius C (2017) Problems, causes and solutions when adopting continuous delivery—a systematic literature review. Inf Softw Technol 82:55–79

  • Lawrie D, Morrell C, Feild H, Binkley D (2007) Effective identifier names for comprehension and memory. Innov Syst Softw Eng 3:303–318

  • Liu H, Tam D, Muqeeth M, Mohta J, Huang T, Bansal M, Raffel CA (2022) Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Adv Neural Inf Process Syst 35:1950–1965

  • Liu Y, Chen C, Zhang R, Qin T, Ji X, Lin H, Yang M (2020) Enhancing the interoperability between deep learning frameworks by model conversion. In Proceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 1320–1330

  • Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692

  • Loomes MJ, Nehaniv CL, Wernick P (2005) The naming of systems and software evolvability. In IEEE international workshop on software evolvability (Software-Evolvability’05), IEEE, pp 23–28

  • Mao HH (2020) A survey on self-supervised pre-training for sequential transfer learning in neural networks. arXiv preprint arXiv:2007.00800

  • Martin J. Fine-tuning and deployment. LinkedIn. https://www.linkedin.com/pulse/fine-tuning-deployment-dr-john-martin-yvqyf

  • Michlmayr M, Hunt F, Probert D (2007) Release management in free software projects: practices and problems. In Open source development, adoption and innovation: IFIP working group 2.13 on open source software, June 11–14, 2007, Limerick, Ireland 3, Springer, pp 295–300

  • Min S, Seo M, Hajishirzi H (2017) Question answering through transfer learning from large fine-grained supervision data. arXiv preprint arXiv:1702.02171

  • Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji ID, Gebru T (2019) Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency, pp 220–229

  • Nayebi M, Adams B, Ruhe G (2016) Release practices for mobile apps–what do users and developers think? In 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol 1. IEEE, pp 552–562

  • Novakouski M, Lewis G, Anderson W, Davenport J (2012) Best practices for artifact versioning in service-oriented systems. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, Technical Note CMU/SEI-2011-TN-009

  • Osborne C, Ding J, Kirk HR (2024) The ai community building the future? A quantitative analysis of development activity on hugging face hub. J Comput Soc Sci 7(2):2067–2105

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: Machine learning in python. J Mach Learn Res 12:2825–2830

  • Pérez J, Díaz J, Garcia-Martin J, Tabuenca B (2020) Systematic literature reviews in software engineering—enhancement of the study selection process using cohen’s kappa statistic. J Syst Softw 168:110657

  • OpenAI (2023) Gpt-4 technical report. arXiv preprint arXiv:2303.08774

  • Raemaekers S, van Deursen A, Visser J (2017) Semantic versioning and impact of breaking changes in the maven repository. J Syst Softw 129:140–158

  • Saldana J (2015) The Coding Manual for Qualitative Researchers. Sage Publications

  • Sarzynska-Wawer J, Wawer A, Pawlak A, Szymanowska J, Stefaniak I, Jarkiewicz M, Okruszek L (2021) Detecting formal thought disorder by deep contextualized word representations. Psychiatry Res 304:114135

  • Seacord RC, Hissam SA, Wallnau KC (1998) Agora: a search engine for software components. IEEE Internet Comput 2(6):62

  • Shahin M, Babar MA, Zhu L (2017) Continuous integration, delivery and deployment: a systematic review on approaches, tools, challenges and practices. IEEE Access 5:3909–3943

  • Singh AS, Masuku MB (2014) Sampling techniques & determination of sample size in applied statistics research: An overview. Int J Econ Commer Manag 2(11):1–22

  • Stuckenholz A (2005) Component evolution and versioning state of the art. ACM SIGSOFT Softw Eng Notes 30(1):7

  • Sun S, Cheng Y, Gan Z, Liu J (2019) Patient knowledge distillation for bert model compression. arXiv preprint arXiv:1908.09355

  • Taraghi M, Dorcelus G, Foundjem A, Tambon F, Khomh F (2024) Deep learning model reuse in the huggingface community: challenges, benefit and trends. arXiv preprint arXiv:2401.13177

  • Team G, Mesnard T, Hardin C, Dadashi R, Bhupatiraju S, Pathak S, Sifre L, Rivière M, Kale MS, Love J et al (2024) Gemma: open models based on gemini research and technology. arXiv preprint arXiv:2403.08295

  • Toma TR, Bezemer CP (2024) An exploratory study of dataset and model management in open source machine learning applications

  • Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F et al (2023) Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971

  • Turri V, Morrison K, Robinson KM, Abidi C, Perer A, Forlizzi J, Dzombak R (2024) Transparency in the wild: Navigating transparency in a deployed ai system to broaden need-finding approaches. In The 2024 ACM conference on fairness, accountability, and transparency, pp 1494–1514

  • Vieira SM, Kaymak U, Sousa JMC (2010) Cohen’s kappa coefficient as a performance measure for feature selection. In International conference on fuzzy systems, IEEE, pp 1–8

  • Wadhwani A, Jain P (2020) Machine learning model cards transparency review: using model card toolkit. In 2020 IEEE Pune section international conference (PuneCon), IEEE, pp 133–137

  • Wang H, Li J, Wu H, Hovy E, Sun Y (2022) Pre-trained language models and their applications. Eng

  • Williams LL, Quave K (2019) Chapter 10 – tests of proportions: chi-square, likelihood ratio, Fisher’s exact test. Quantitative anthropology, pp 123–141

  • Wood JR, Wood LE (2008) Card sorting: current practices and beyond. J Usability Stud 4(1):1–6

  • Wortsman M, Ilharco G, Kim JW, Li M, Kornblith S, Roelofs R, Lopes RG, Hajishirzi H, Farhadi A, Namkoong H et al (2022) Robust fine-tuning of zero-shot models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7959–7971

  • Xia B, Bi T, Xing Z, Lu Q, Zhu L (2023) An empirical study on software bill of materials: Where we stand and the road ahead. In 2023 IEEE/ACM 45th international conference on software engineering (ICSE), IEEE, pp 2630–2642

  • Xiu M, Jiang ZMJ, Adams B (2023) An exploratory study of machine learning model stores. IEEE Softw 38(1):114–122

  • Xu T, Zhou Y (2015) Systems approaches to tackling configuration errors: a survey. ACM Comput Surv (CSUR) 47(4):1–41

  • Yang L, Zhang H, Shen H, Huang X, Zhou X, Rong G, Shao D (2021) Quality assessment in systematic literature reviews: a software engineering perspective. Inf Softw Technol 130:106397

  • Yang Z, Shi J, Lo D (2024) Ecosystem of large language models for code. arXiv preprint arXiv:2405.16746

  • Yin Z, Ma X, Zheng J, Zhou Y, Bairavasundaram LN, Pasupathy S (2011) An empirical study on configuration errors in commercial and open source systems. In Proceedings of the 23rd ACM symposium on operating systems principles, pp 159–172

  • Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z et al (2023) A survey of large language models. arXiv preprint arXiv:2303.18223

  • Zhu M, Gupta S (2017) To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878


Funding

This research was supported by an NSERC Discovery Grant.

Author information


Contributions

  • Adekunle Ajibode: Conceptualization, Data Collection, Methodology, Data Analysis, Writing - Original Draft.
  • Abdul Ali Bangash: Methodology, Data Validation, Writing - Review & Editing.
  • Filipe Roseiro Cogo: Methodology, Writing - Review & Editing.
  • Bram Adams: Supervision, Writing - Review & Editing, Conceptual Guidance, Research Direction.
  • Ahmed E. Hassan: Supervision, Research Direction.

Corresponding author

Correspondence to Adekunle Ajibode.

Ethics declarations

Conflicts of Interest/Competing Interests

The authors declare that they have no known competing interests or personal relationships that could have influenced, or could appear to have influenced, the work reported in this article.

Ethical Approval

This study does not involve human participants or animals.

Informed Consent

No human subjects were involved in this study.

Additional information

Communicated by: Markus Borg.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ajibode, A., Bangash, A.A., Cogo, F.R. et al. Towards semantic versioning of open pre-trained language model releases on hugging face. Empir Software Eng 30, 78 (2025). https://doi.org/10.1007/s10664-025-10631-3


  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10664-025-10631-3

Keywords