Abstract
Microaggregation is a well-known family of statistical disclosure control methods that can also be used to achieve the k-anonymity privacy model and some of its extensions. Microaggregation can be viewed as a clustering problem in which every cluster must contain at least k elements. In this paper, we present a new microaggregation heuristic based on Lloyd’s clustering algorithm that causes much less information loss than the other microaggregation heuristics in the literature. In our experiments, this superior performance was observed consistently for all minimum cluster sizes k and all data sets tried.
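To make the notions of "clusters of at least k elements" and "information loss" concrete, the following is a minimal sketch of fixed-size univariate microaggregation. It is not the Lloyd-based heuristic presented in the paper. The function names `microaggregate_univariate` and `information_loss` are hypothetical, and the loss measure used here (within-cluster sum of squares divided by total sum of squares, SSE/SST) is assumed as the conventional one for microaggregation.

```python
from statistics import mean


def microaggregate_univariate(values, k):
    """Group sorted values into clusters of at least k records and
    replace each value by its cluster centroid (illustrative baseline only)."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    # Indices of the records, ordered by value.
    order = sorted(range(n), key=lambda i: values[i])

    # Consecutive blocks of exactly k records each.
    n_clusters = n // k
    clusters = [order[j * k:(j + 1) * k] for j in range(n_clusters)]

    # Leftover records (fewer than k) join the last cluster,
    # so every cluster still holds at least k records.
    clusters[-1].extend(order[n_clusters * k:])

    # Replace every record by the mean of its cluster.
    anonymized = [0.0] * n
    for cluster in clusters:
        centroid = mean(values[i] for i in cluster)
        for i in cluster:
            anonymized[i] = centroid
    return anonymized, clusters


def information_loss(values, anonymized):
    """Assumed loss measure: SSE / SST, a ratio between 0 and 1."""
    overall = mean(values)
    sse = sum((v - a) ** 2 for v, a in zip(values, anonymized))
    sst = sum((v - overall) ** 2 for v in values)
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 13]
    anonymized, clusters = microaggregate_univariate(data, k=3)
    print(anonymized)
    print(information_loss(data, anonymized))
```

Because every released value is shared by at least k records, the output satisfies k-anonymity on the aggregated attribute. A variable-size heuristic such as the one the abstract describes aims to lower the SSE/SST ratio by letting cluster sizes adapt to the data rather than fixing them at k.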
Acknowledgments and disclaimer
The following funding sources are gratefully acknowledged: European Commission (project H2020-700540 “CANVAS”), Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705) and Spanish Government (project RTI2018-095094-B-C21 “Consent”). The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or any of the funders.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Soria-Comas, J., Domingo-Ferrer, J., Mulero, R. (2019). Efficient Near-Optimal Variable-Size Microaggregation. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2019. Lecture Notes in Computer Science, vol 11676. Springer, Cham. https://doi.org/10.1007/978-3-030-26773-5_29
DOI: https://doi.org/10.1007/978-3-030-26773-5_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26772-8
Online ISBN: 978-3-030-26773-5
eBook Packages: Computer Science, Computer Science (R0)