Knowledge-Based Nonlinear to Linear Dataset Transformation for Chronic Illness Classification

  • Conference paper
Health Information Science (HIS 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14305))

Abstract

Nonlinear patterns are challenging to interpret and validate, and they are resource-intensive for deep learning (DL) and machine learning (ML) algorithms that predict chronic illness. Transforming nonlinear features into a linear representation makes AI results easier for humans to understand and enables the use of traditional, proven ML algorithms. We propose replacing the raw representation of key nonlinear variables in health surveys with counts of terms cross-checked against the chapters of the International Classification of Diseases (ICD) to improve chronic illness classification performance. A specific selection of nonlinear keywords (Male, Female, Diabetes, Cancer, Obese, Overweight, Smoked, Cigarettes, and Sugar) from a health survey, transformed into a purely linear and scaled set of features, enables the Multinomial Naive Bayes (MNB) algorithm to outperform standard dataset preparation and feature selection methods.
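The idea in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract only, not the authors' implementation. The chapter texts, survey records, class labels, and all function names below are hypothetical stand-ins (the toy strings are not real ICD content), min-max scaling is assumed as the scaling step, and the small Multinomial Naive Bayes is written in plain Python with Laplace smoothing.

```python
import math
from collections import Counter

# Hypothetical stand-ins for ICD chapter text; NOT real ICD content.
ICD_CHAPTERS = {
    "neoplasms": "cancer of lung cancer of breast malignant neoplasm smoked tobacco",
    "endocrine": "diabetes mellitus obese overweight sugar diabetes insipidus",
    "respiratory": "chronic bronchitis smoked cigarettes emphysema",
}
KEYWORDS = ["diabetes", "cancer", "obese", "overweight", "smoked", "cigarettes", "sugar"]


def chapter_counts(keyword):
    """Count how often a keyword occurs in each toy chapter text."""
    return [Counter(text.split())[keyword] for text in ICD_CHAPTERS.values()]


def transform(record):
    """Replace the keywords flagged in a survey record with summed chapter counts."""
    totals = [0] * len(ICD_CHAPTERS)
    for kw in KEYWORDS:
        if record.get(kw):
            totals = [t + c for t, c in zip(totals, chapter_counts(kw))]
    return totals


def fit_scaler(rows):
    """Learn per-feature minimum and maximum (assumed min-max scaling)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(row, lo, hi):
    """Scale one row to [0, 1] per feature; constant features map to 0."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]


def fit_mnb(X, y, alpha=1.0):
    """Fit Multinomial Naive Bayes with Laplace smoothing (alpha)."""
    model = {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        sums = [sum(col) for col in zip(*rows)]
        total = sum(sums)
        log_prior = math.log(len(rows) / len(X))
        log_lik = [math.log((s + alpha) / (total + alpha * n_features)) for s in sums]
        model[c] = (log_prior, log_lik)
    return model


def predict_mnb(model, x):
    """Return the class with the highest log posterior score."""
    scores = {c: lp + sum(v * ll for v, ll in zip(x, lls))
              for c, (lp, lls) in model.items()}
    return max(scores, key=scores.get)


# Hypothetical survey records (keyword flags) and hypothetical labels.
records = [
    {"diabetes": 1, "sugar": 1},
    {"obese": 1, "overweight": 1},
    {"smoked": 1, "cigarettes": 1},
    {"cancer": 1, "smoked": 1},
]
labels = ["metabolic", "metabolic", "smoking_related", "smoking_related"]

raw = [transform(r) for r in records]
lo, hi = fit_scaler(raw)
model = fit_mnb([scale(r, lo, hi) for r in raw], labels)

new_record = {"diabetes": 1, "overweight": 1}
print(predict_mnb(model, scale(transform(new_record), lo, hi)))
```

Each survey record ends up as a short vector of non-negative, scaled counts (one per chapter), which is the kind of input a Multinomial Naive Bayes classifier expects. A real pipeline would draw the counts from the actual ICD text and a survey such as the one cited in the notes.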

Notes

  1. https://www.who.int/standards/classifications/classification-of-diseases

  2. https://www.cdc.gov/brfss/annual_data/annual_2020.html

  3. https://seer.cancer.gov/statfacts/html/lungb.html

  4. https://www.census.gov

  5. https://github.com/mjaworsky/LinearVarTransformation

  6. https://docs.google.com/spreadsheets/d/1QJLlhiMgUEo8EL4kXfb0khfjIHmO7X8jMTy0O9_XeZ0/edit?usp=sharing

Author information

Corresponding author

Correspondence to Markian Jaworsky.

Ethics declarations

The work is conducted with approval from the Human Research Ethics Committee of the University of Southern Queensland, Australia (Approval ID: H21REA222).

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jaworsky, M., Tao, X., Yong, J., Pan, L., Zhang, J., Pokhrel, S.R. (2023). Knowledge-Based Nonlinear to Linear Dataset Transformation for Chronic Illness Classification. In: Li, Y., Huang, Z., Sharma, M., Chen, L., Zhou, R. (eds) Health Information Science. HIS 2023. Lecture Notes in Computer Science, vol 14305. Springer, Singapore. https://doi.org/10.1007/978-981-99-7108-4_10

  • DOI: https://doi.org/10.1007/978-981-99-7108-4_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7107-7

  • Online ISBN: 978-981-99-7108-4

  • eBook Packages: Computer Science, Computer Science (R0)
