Knowledge-Based Nonlinear to Linear Dataset Transformation for Chronic Illness Classification

Jaworsky, Markian; Tao, Xiaohui; Yong, Jianming; Pan, Lei; Zhang, Ji; Pokhrel, Shiva Raj

doi:10.1007/978-981-99-7108-4_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14305)

Included in the following conference series:

International Conference on Health Information Science

Abstract

Nonlinear patterns are challenging to interpret and validate, and they make chronic illness prediction resource-intensive for deep learning (DL) and machine learning (ML) algorithms. Transforming nonlinear features into a linear representation supports human understanding of AI results and enables the use of traditional, proven ML algorithms. We propose replacing the raw representation of key nonlinear variables in health surveys with counts of terms cross-checked against the chapters of the International Classification of Diseases (ICD) to improve chronic illness classification performance. A specific selection of nonlinear keywords from a health survey (Male, Female, Diabetes, Cancer, Obese, Overweight, Smoked, Cigarettes, and Sugar), transformed into a purely linear and scaled set of features, enables the Multinomial Naive Bayes (MNB) algorithm to outperform standard dataset preparation and feature selection methods.
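
The abstract does not spell out the transformation, so the following is only a minimal sketch of the general idea: each survey keyword is replaced by how often it occurs in each ICD chapter, and the summed counts are scaled. The chapter texts in `ICD_CHAPTERS`, the function names `term_counts` and `transform`, and the divide-by-maximum scaling are all illustrative assumptions, not the authors' procedure or data.

```python
import re

# Hypothetical stand-ins for ICD chapter text (assumption); real chapters are far longer.
ICD_CHAPTERS = {
    "neoplasms": "malignant neoplasm cancer of lung; cancer of breast; tobacco use",
    "endocrine": "diabetes mellitus; obesity; overweight and obese adults; sugar metabolism",
    "respiratory": "chronic bronchitis in persons who smoked cigarettes; cigarettes and emphysema",
}


def term_counts(keyword: str) -> list[int]:
    """Count whole-word, case-insensitive occurrences of a keyword in each chapter."""
    pattern = re.compile(rf"\b{re.escape(keyword.lower())}\b")
    return [len(pattern.findall(text.lower())) for text in ICD_CHAPTERS.values()]


def transform(response_keywords: list[str]) -> list[float]:
    """Replace a respondent's raw keywords with summed per-chapter counts in [0, 1]."""
    totals = [0] * len(ICD_CHAPTERS)
    for keyword in response_keywords:
        for i, count in enumerate(term_counts(keyword)):
            totals[i] += count
    peak = max(totals)
    # Scaling by the largest count is an assumption; the abstract only says "scaled".
    return [t / peak if peak else 0.0 for t in totals]


# One feature per chapter, in the order neoplasms, endocrine, respiratory.
print(transform(["Cancer", "Smoked", "Cigarettes"]))
```

The resulting features are non-negative, which is the input form a Multinomial Naive Bayes classifier expects; fitting such a classifier on vectors like these would be the next step and is left out here.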

Author information

Authors and Affiliations

School of Mathematics, Physics, and Computing, University of Southern Queensland, Toowoomba, Australia
Markian Jaworsky, Xiaohui Tao & Ji Zhang
School of Business, University of Southern Queensland, Springfield, Australia
Jianming Yong
School of Information Technology, Deakin University, Waurn Ponds, Geelong, Australia
Lei Pan & Shiva Raj Pokhrel

Corresponding author

Correspondence to Markian Jaworsky.

Editor information

Editors and Affiliations

University of Southern Queensland, Darling Heights, Australia
Yan Li
Vrije University, Amsterdam, The Netherlands
Zhisheng Huang
DAV University Jalandhar, Jalandhar, Punjab, India
Manik Sharma
Swinburne University of Technology, Hawthorn, VIC, Australia
Lu Chen
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou

Ethics declarations

Declarations

The work is conducted with approval from the Human Research Ethics Committee of the University of Southern Queensland, Australia (Approval ID: H21REA222).

About this paper

Cite this paper

Jaworsky, M., Tao, X., Yong, J., Pan, L., Zhang, J., Pokhrel, S.R. (2023). Knowledge-Based Nonlinear to Linear Dataset Transformation for Chronic Illness Classification. In: Li, Y., Huang, Z., Sharma, M., Chen, L., Zhou, R. (eds) Health Information Science. HIS 2023. Lecture Notes in Computer Science, vol 14305. Springer, Singapore. https://doi.org/10.1007/978-981-99-7108-4_10

DOI: https://doi.org/10.1007/978-981-99-7108-4_10
Published: 11 October 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7107-7
Online ISBN: 978-981-99-7108-4
eBook Packages: Computer Science, Computer Science (R0)
