Abstract
Exploitation time is an essential factor for vulnerability assessment in cybersecurity management. In this work, we propose an integrated consecutive batch learning framework to predict the probable exploitation time of vulnerabilities. To achieve a better performance, we combine features extracted from both vulnerability descriptions and the Common Vulnerability Scoring System in the proposed framework. In particular, we design an Adaptive Sliding Window Weighted Learning (ASWWL) algorithm to tackle the dynamic multiclass imbalance problem existing in many industrial applications including exploitation time prediction. A series of experiments are carried out on a real-world dataset, containing 24,413 exploited vulnerabilities disclosed between 1990 and 2020. Experimental results demonstrate the proposed ASWWL algorithm can significantly enhance the performance of the minority classes without compromising the performance of the majority class. Besides, the proposed framework achieves the most robust and state-of-the-art performance compared with the other five consecutive batch learning algorithms.







Similar content being viewed by others
References
Afzaliseresht, N., Miao, Y., Michalska, S., Liu, Q., Wang, H.: From logs to stories: human-centred data mining for cyber threat intelligence. IEEE Access 8, 19089–19099 (2020)
Alazab, M., Tang, M.: Deep Learning Applications for Cyber Security. Springer, Berlin (2019)
Anwar, M.M., Liu, C., Li, J.: Discovering and tracking query oriented active online social groups in dynamic information network. World Wide Web 22(4), 1819–1854 (2019)
Bifet, A., Gavaldà, R.: Adaptive learning from evolving data streams. In: International Symposium on Intelligent Data Analysis, pp 249–260. Springer (2009)
Bozorgi, M., Saul, L.K., Savage, S., Voelker, G.M.: Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 105–114. ACM (2010)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
Du, J., Michalska, S., Subramani, S., Wang, H., Zhang, Y.: Neural attention with character embeddings for hay fever detection from twitter. Health Inf. Sci. Sys. 7(1), 1–7 (2019)
Edkrantz, M., Said, A.: Predicting cyber vulnerability exploits with machine learning. In: SCAI, pp 48–57 (2015)
Elwell, R., Polikar, R.: Incremental learning of concept drift in nonstationary environments. IEEE Trans. Neural Netw. 22(10), 1517–1531 (2011)
Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 97–106 (2001)
Islam, M.R., Kabir, M.A., Ahmed, A., Kamal, A.R.M., Wang, H., Ulhaq, A.: Depression detection from social network data using machine learning techniques. Health Inf. Sci. Sys. 6(1), 1–12 (2018)
Jacobs, J., Romanosky, S., Adjerid, I., Baker, W.: Improving vulnerability remediation through better exploit prediction. J. Cybersec. 6(1), tyaa015 (2020)
Jacobs, J., Romanosky, S., Edwards, B., Roytman, M., Adjerid, I.: Exploit prediction scoring system (epss). arXiv:1908.04856 (2019)
Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: an ensemble method for drifting concepts. J. Mach. Learn. Res. 8, 2755–2790 (2007)
Kosina, P., Gama, J.: Very fast decision rules for classification in data streams. Data Min. Knowl. Disc. 29(1), 168–202 (2015)
Li, H., Wang, Y., Wang, H., Zhou, B.: Multi-window based ensemble learning for classification of imbalanced streaming data. World Wide Web 20(6), 1507–1525 (2017)
Li, M., Sun, X., Wang, H., Zhang, Y., Zhang, J.: Privacy-aware access control with trust management in web service. World Wide Web 14(4), 407–430 (2011)
Li, Z., Wang, X., Li, J., Zhang, Q.: Deep attributed network representation learning of complex coupling and interaction. Knowl.-Based Syst. 212, 106618 (2021)
Losing, V., Hammer, B., Wersing, H.: Knn classifier with self adjusting memory for heterogeneous concept drift. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp 291–300 (2016)
Montiel, J., Read, J., Bifet, A., Abdessalem, T.: Scikit-multiflow: a multi-output streaming framework. J. Mach. Learn. Res. 19(72), 1–5 (2018). http://jmlr.org/papers/v19/18-251.html
Rasool, R.U., Ashraf, U., Ahmed, K., Wang, H., Rafique, W., Anwar, Z.: Cyberpulse: a machine learning based link flooding attack mitigation system for software defined networks. IEEE Access 7, 34885–34899 (2019)
Sarki, R., Ahmed, K., Wang, H., Zhang, Y.: Automated detection of mild and multi-class diabetic eye diseases using deep learning. Health Inf. Sci. Sys. 8(1), 1–9 (2020)
Shen, Y., Zhang, T., Wang, Y., Wang, H., Jiang, X.: Microthings: a generic iot architecture for flexible data aggregation and scalable service cooperation. IEEE Commun. Mag. 55(9), 86–93 (2017)
Tang, M., Yin, J., Alazab, M., Cao, J.C., Luo, Y.: Modelling of extreme vulnerability disclosure in smart city industrial environments. IEEE Trans. Indust. Inf., pp. 1–1 (2020)
Tang, M., Alazab, M., Luo, Y.: Big data for cybersecurity: vulnerability disclosure trends and dependencies. IEEE Trans. Big Data 5(3), 317–329 (2019)
Tavabi, N., Goyal, P., Almukaynizi, M., Shakarian, P., Lerman, K.: Darkembed: exploit prediction with neural language models. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Vimalachandran, P., Liu, H., Lin, Y., Ji, K., Wang, H., Zhang, Y.: Improving accessibility of the australian my health records while preserving privacy and security of the system. Health Inf. Sci. Sys. 8(1), 1–9 (2020)
Wang, H., Sun, L., Bertino, E.: Building access control policy model for privacy preserving and testing policy conflicting problems. J. Comput. Syst. Sci. 80(8), 1493–1503 (2014)
Wang, H., Wang, Y., Taleb, T., Jiang, X.: Special issue on security and privacy in network computing. World Wide Web 23(2), 951–957 (2020)
Wang, H., Yi, X., Bertino, E., Sun, L.: Protecting outsourced data in cloud computing through access management. Concur. Comput. Pract. Exp. 28 (3), 600–615 (2016)
Wang, S., Minku, L.L., Yao, X.: A learning framework for online class imbalance learning. In: 2013 IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL), pp 36–45 (2013)
Wang, S., Minku, L.L., Yao, X.: Dealing with multiple classes in online class imbalance learning. In: IJCAI, pp 2118–2124 (2016)
Yi, X., Zhang, Y.: Privacy-preserving distributed association rule mining via semi-trusted mixer. Data Knowl Eng 63(2), 550–567 (2007)
Yin, J., Cao, J., Siuly, S., Wang, H.: An integrated mci detection framework based on spectral-temporal analysis. Int. J. Autom. Comput. 16(6), 786–799 (2019)
Yin, J., Tang, M., Cao, J., Wang, H.: Apply transfer learning to cybersecurity: predicting exploitability of vulnerabilities by description. Knowl-Based Sys., pp. 106529. https://doi.org/10.1016/j.knosys.2020.106529 (2020)
Yin, J., Tang, M., Cao, J., Wang, H., You, M., Lin, Y.: Adaptive online learning for vulnerability exploitation time prediction. In: Web Information Systems Engineering – WISE 2020, pp 252–266. Springer (2020)
Yin, J., You, M., Cao, J., Wang, H., Tang, M., Ge, Y.F.: Data-driven hierarchical neural network modeling for high-pressure feedwater heater group. In: Australasian Database Conference, pp 225–233. Springer (2020)
Zhang, F., Wang, Y., Liu, S., Wang, H.: Decision-based evasion attacks on tree ensemble classifiers. World Wide Web 23(5), 2957–2977 (2020)
Zhang, J., Li, H., Liu, X., Luo, Y., Chen, F., Wang, H., Chang, L.: On efficient and robust anonymization for privacy protection on massive streaming categorical information. IEEE Trans Depend Sec Comput 14(5), 507–520 (2015)
Zhang, J., Tao, X., Wang, H.: Outlier detection from large distributed databases. World Wide Web 17(4), 539–568 (2014)
Acknowledgements
The first author was partially supported by the Research Program of Chongqing University of Arts and Sciences, China (Grant No. P2020RG08) and the Natural Science Foundation of Chongqing, China (Grant No. cstc2019jcyj-msxmX034).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Web Information Systems Engineering 2020
Guest Editors: Hua Wang, Zhisheng Huang, and Wouter Beek
Rights and permissions
About this article
Cite this article
Yin, J., Tang, M., Cao, J. et al. Vulnerability exploitation time prediction: an integrated framework for dynamic imbalanced learning. World Wide Web 25, 401–423 (2022). https://doi.org/10.1007/s11280-021-00909-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-021-00909-z