Abstract
Nonlinear patterns are challenging to interpret and validate, and they are resource-intensive for deep learning (DL) and machine learning (ML) algorithms to use when predicting chronic illness. Transforming nonlinear features into a linear representation makes AI results easier for humans to understand and allows traditional, proven ML algorithms to be applied. We propose replacing the raw representation of key nonlinear variables in health surveys with counts of terms cross-checked against the chapters of the International Classification of Diseases (ICD) to improve chronic illness classification performance. A specific selection of nonlinear keywords from a health survey (Male, Female, Diabetes, Cancer, Obese, Overweight, Smoked, Cigarettes, and Sugar), transformed into a purely linear and scaled set of features, enables the Multinomial Naive Bayes (MNB) algorithm to outperform standard dataset preparation and feature selection methods.
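The transformation described in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: each survey record is reduced to the keywords it contains, those keywords are counted in the text of each ICD chapter, the counts are min-max scaled, and a small multinomial Naive Bayes classifier is fitted. The `ICD_CHAPTERS` texts, the sample records, the labels, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Keywords named in the abstract.
KEYWORDS = ["male", "female", "diabetes", "cancer", "obese",
            "overweight", "smoked", "cigarettes", "sugar"]

# Hypothetical miniature stand-ins for ICD chapter text (not real ICD content).
ICD_CHAPTERS = {
    "neoplasms": "cancer of the lung; cancer linked to cigarettes smoked",
    "endocrine": "diabetes with obese or overweight status; blood sugar",
}

def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def chapter_counts(record):
    """Count, per ICD chapter, occurrences of the keywords found in a record."""
    present = set(tokenize(record)) & set(KEYWORDS)
    features = []
    for text in ICD_CHAPTERS.values():
        counts = Counter(tokenize(text))
        features.append(sum(counts[k] for k in present))
    return features

def min_max_scale(rows):
    """Scale each column to [0, 1]; a constant column is mapped to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)]
            for row in rows]

def fit_mnb(X, y, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        totals = [sum(col) + alpha for col in zip(*rows)]
        denom = sum(totals)
        log_prior = math.log(len(rows) / len(X))
        model[label] = (log_prior, [math.log(t / denom) for t in totals])
    return model

def predict_mnb(model, x):
    """Return the label with the highest posterior log-probability."""
    def score(label):
        log_prior, log_probs = model[label]
        return log_prior + sum(v * lp for v, lp in zip(x, log_probs))
    return max(model, key=score)

# Hypothetical survey records and labels, for illustration only.
records = ["Female, smoked cigarettes", "Male, diabetes, obese, sugar",
           "Male, cancer"]
labels = ["cancer", "diabetes", "cancer"]

X = min_max_scale([chapter_counts(r) for r in records])
model = fit_mnb(X, labels)
print(predict_mnb(model, X[1]))
```

Because every feature is a non-negative scaled count, the multinomial likelihood applies directly, and each fitted log-probability can be read as the weight of one ICD chapter for one class.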
Ethics declarations
Declarations
The work was conducted with approval from the Human Research Ethics Committee of the University of Southern Queensland, Australia (Approval ID: H21REA222).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jaworsky, M., Tao, X., Yong, J., Pan, L., Zhang, J., Pokhrel, S.R. (2023). Knowledge-Based Nonlinear to Linear Dataset Transformation for Chronic Illness Classification. In: Li, Y., Huang, Z., Sharma, M., Chen, L., Zhou, R. (eds) Health Information Science. HIS 2023. Lecture Notes in Computer Science, vol 14305. Springer, Singapore. https://doi.org/10.1007/978-981-99-7108-4_10
DOI: https://doi.org/10.1007/978-981-99-7108-4_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7107-7
Online ISBN: 978-981-99-7108-4
eBook Packages: Computer Science, Computer Science (R0)