Abstract
Growing demand for data publishing, together with rising concern about data privacy, has drawn considerable attention to the privacy-preserving data publishing (PPDP) problem from research communities, industry, and governments. The central difficulty in PPDP is the trade-off between preserving privacy and maintaining data quality. In this paper, an information-driven genetic algorithm (ID-GA) is designed to achieve optimal anonymization based on attribute generalization and record suppression. ID-GA has three components: an information-driven crossover operator that efficiently exchanges information between different anonymization solutions; an information-driven mutation operator that promotes information release during anonymization; and a two-dimension selection operator that evaluates the quality of different anonymization solutions. Experimental results verify the advantages of ID-GA in terms of solution accuracy and convergence speed, and the impact of each proposed component is also verified.
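As a rough illustration of the kind of solution such an algorithm searches over, the following minimal Python sketch evaluates one candidate anonymization: a vector of generalization levels is applied to the quasi-identifiers, and records left in groups smaller than k are suppressed. This is not the ID-GA itself or its operators; the toy records, the two generalization hierarchies, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy records (age, zip code); not data from the paper.
RECORDS = [
    (23, "47601"), (27, "47602"), (25, "47678"),
    (41, "47905"), (45, "47909"), (52, "47906"),
]


def generalize_age(age, level):
    """Assumed hierarchy: 0 = exact age, 1 = ten-year band, 2 = fully masked."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Assumed hierarchy: level n masks the last n digits with '*'."""
    level = min(level, len(zip_code))
    return zip_code[:len(zip_code) - level] + "*" * level


def anonymize(records, levels, k):
    """Apply attribute generalization, then suppress records whose
    equivalence class has fewer than k members.

    Returns the released records and the number of suppressed records.
    """
    age_level, zip_level = levels
    generalized = [
        (generalize_age(age, age_level), generalize_zip(zip_code, zip_level))
        for age, zip_code in records
    ]
    class_sizes = Counter(generalized)
    kept = [row for row in generalized if class_sizes[row] >= k]
    return kept, len(generalized) - len(kept)


if __name__ == "__main__":
    # Compare three candidate level vectors under 2-anonymity.
    for levels in [(0, 0), (1, 2), (2, 2)]:
        kept, suppressed = anonymize(RECORDS, levels, k=2)
        print(levels, "released:", len(kept), "suppressed:", suppressed)
```

In this sketch, lower generalization levels keep values more precise but force more records to be suppressed, while higher levels release every record at coarser precision. That tension is the privacy and data-quality trade-off the abstract describes, and a genetic algorithm would search over such level vectors rather than enumerate them.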
Acknowledgements
This work was supported by The Major Key Project of PCL (Grant No. PCL2022A03, PCL2021A02, PCL2021A09).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ge, YF., Wang, H., Cao, J., Zhang, Y. (2022). An Information-Driven Genetic Algorithm for Privacy-Preserving Data Publishing. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_24
DOI: https://doi.org/10.1007/978-3-031-20891-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20890-4
Online ISBN: 978-3-031-20891-1
eBook Packages: Computer Science (R0)