A Data-Driven Study of Prediction Methods for Coronary Heart Disease

He, Xu; Fan, Xindi; Zheng, Wanxi; Ti, Ziming; Li, Chunshan; Zhang, Hua; Zhou, Xuequan

doi:10.1007/978-981-99-4402-6_32

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1844)

Included in the following conference series:

International Conference on Service Science

Abstract

Coronary heart disease (CHD) is a globally recognised, highly prevalent disease with a high risk of death and a low cure rate. The World Health Organization estimates that deaths from heart disease will reach 23 million by 2030. It is therefore imperative to find a fast and effective method for early diagnosis, so that patients can receive early intervention and treatment can be more effective. As machine learning has matured, data analysis and prediction can help doctors efficiently perform a preliminary grouping of a large population and identify those at high risk of developing CHD. In this paper, three data pre-processing methods (SMOTE, Borderline-SMOTE and K-means SMOTE) were combined with four algorithms (Logistic Regression, Random Forest, KNN and SVM) to construct a CHD risk prediction model on an imbalanced data set. After analysing the data characteristics and tuning the parameters, different combinations of these methods were compared, and the better-performing classification method was selected to predict CHD, achieving higher accuracy, precision, AUC and F1 score. Overall, the experiments show that random oversampling and the SMOTE methods can effectively address the data imbalance problem in most cases. The final training accuracy reached up to 99%, and the testing accuracy reached up to 93%.
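The abstract does not show how the oversampling step is implemented. As a rough illustration only, the basic SMOTE idea (creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours) might be sketched as below. The function name `smote`, the Euclidean distance, the parameter defaults and the toy data are all assumptions made for this sketch; they are not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data (not from the paper): 8 majority vs 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate just enough synthetic samples to balance the two classes.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "8 8"
```

In practice, a maintained library implementation would normally be preferred over a hand-written one. The Borderline-SMOTE and K-means SMOTE variants named in the abstract differ mainly in which minority samples are chosen as the base for interpolation.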

Acknowledgement

This work was supported by the National Key Research and Development Program of China (No. 2022YFF0903100).

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China
Xu He, Chunshan Li & Hua Zhang
School of Science, Harbin Institute of Technology, Weihai, China
Xindi Fan & Wanxi Zheng
School of Information Science and Engineering, Harbin Institute of Technology, Weihai, China
Ziming Ti
Research Center of Intelligent Computing for Enterprises & Services, Harbin Institute of Technology, Harbin, China
Xuequan Zhou

Corresponding authors

Correspondence to Chunshan Li, Hua Zhang or Xuequan Zhou.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China
Zhongjie Wang
Beijing University of Posts and Telecommunications, Beijing, China
Shangguang Wang
Harbin Institute of Technology, Harbin, China
Hanchuan Xu

Cite this paper

He, X. et al. (2023). A Data-Driven Study of Prediction Methods for Coronary Heart Disease. In: Wang, Z., Wang, S., Xu, H. (eds) Service Science. ICSS 2023. Communications in Computer and Information Science, vol 1844. Springer, Singapore. https://doi.org/10.1007/978-981-99-4402-6_32

DOI: https://doi.org/10.1007/978-981-99-4402-6_32
Published: 27 July 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4401-9
Online ISBN: 978-981-99-4402-6
eBook Packages: Computer Science, Computer Science (R0)
