Abstract
In this paper we study the challenge of protecting individuals' privacy when large survey rating data sets are published. A recent study shows that personal information in supposedly anonymous movie rating records can be re-identified. Survey rating data usually contains ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues involve personal privacy. Even if survey participants do not reveal any of their ratings, their survey records can potentially be identified using information from other public sources. None of the existing anonymisation principles (e.g., k-anonymity, l-diversity) can effectively prevent such breaches in large survey rating data sets. We tackle the problem by defining a privacy principle called the \((k,\epsilon)\)-anonymity model. Intuitively, the principle requires that, for each transaction t in the given survey rating data T, at least (k − 1) other transactions in T have ratings similar to t, where the degree of similarity is controlled by \(\epsilon\). We formulate the \((k,\epsilon)\)-anonymity model through a graphical representation and study a specific graph-anonymisation problem using graph modification techniques from graph theory. We analyze various cases and develop methods that modify the graph so that it meets the \((k,\epsilon)\) requirements. The methods are applied to two real-life data sets to demonstrate their efficiency and practical utility.
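The principle stated above can be checked directly on a small table. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It assumes that two transactions are ε-similar when they rate exactly the same items and each pair of corresponding ratings differs by at most ε (the paper's precise similarity definition may differ), and the helper names `eps_similar`, `similarity_graph` and `is_k_eps_anonymous` are hypothetical. Building the similarity graph mirrors the graphical formulation: under this assumption, the data set meets the requirement exactly when every vertex has degree at least k − 1.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set

# A rating is an integer score, or None if the respondent did not rate the item
# (assumed encoding for this sketch).
Rating = Optional[int]


def eps_similar(t1: Sequence[Rating], t2: Sequence[Rating], eps: int) -> bool:
    """Assumed similarity: both transactions rate the same items, and every
    pair of corresponding ratings differs by at most eps."""
    if len(t1) != len(t2):
        raise ValueError("transactions must cover the same items")
    for a, b in zip(t1, t2):
        if (a is None) != (b is None):  # one rated the item, the other did not
            return False
        if a is not None and abs(a - b) > eps:
            return False
    return True


def similarity_graph(T: List[Sequence[Rating]], eps: int) -> Dict[int, Set[int]]:
    """Adjacency sets: vertex i is joined to vertex j iff T[i] and T[j] are eps-similar."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(T))}
    for i, j in combinations(range(len(T)), 2):
        if eps_similar(T[i], T[j], eps):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def is_k_eps_anonymous(T: List[Sequence[Rating]], k: int, eps: int) -> bool:
    """True iff every transaction has at least k - 1 other eps-similar transactions,
    i.e. every vertex of the similarity graph has degree >= k - 1."""
    adj = similarity_graph(T, eps)
    return all(len(nbrs) >= k - 1 for nbrs in adj.values())


if __name__ == "__main__":
    # Toy survey: rows are respondents, columns are items (None = not rated).
    T = [[3, 5, None], [4, 5, None], [3, 4, None], [1, None, 2]]
    print(is_k_eps_anonymous(T, k=3, eps=1))  # False: the last row has no eps-similar partner
```

The pairwise construction costs O(n²·m) for n transactions over m items, which is adequate for a toy check. The sketch only tests the requirement; it does not attempt the graph-modification step that the paper uses to make a non-compliant data set satisfy it.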
References
Aggarwal C (2005) On k-anonymity and the curse of dimensionality. In: VLDB, pp 901–909
Atzori M, Bonchi F, Giannotti F, Pedreschi D (2005a) Blocking anonymity threats raised by frequent itemset mining. In: ICDM, pp 561–564
Atzori M, Bonchi F, Giannotti F, Pedreschi D (2005b) k-anonymous patterns. In: PKDD, pp 10–21
Atzori M, Bonchi F, Giannotti F, Pedreschi D (2008) Anonymity preserving pattern discovery. VLDB J 17(4): 703–727
Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymisation. In: ICDE, pp 217–228
Frankowski D, Cosley D, Sen S, Terveen LG, Riedl J (2006) You are what you say: privacy risks of public mentions. In: SIGIR, pp 565–572
Fung BCM, Wang K, Yu PS (2005) Top-down specialization for information and privacy preservation. In: ICDE, pp 205–216
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman, San Francisco
Ghinita G, Tao Y, Kalnis P (2008) On the anonymisation of sparse high-dimensional data. In: ICDE, pp 715–724
Hafner K (2006) And if you liked the movie, a Netflix contest may reward you handsomely. New York Times, Oct 2
Hamming RW (1980) Coding and information theory. Prentice Hall, Englewood Cliffs
Hansell S (2006) AOL removes search data on vast group of web users. New York Times, Aug 8
He Y, Naughton J (2009) Anonymization of set-valued data via top-down, local generalization. In: VLDB
Iyengar V (2002) Transforming data to satisfy privacy constraints. In: KDD, pp 279–288
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24: 881–892
Kifer D, Gehrke J (2006) Injecting utility into anonymized datasets. In: SIGMOD, pp 217–228
LeFevre K, DeWitt D, Ramakrishnan R (2006a) Mondrian multidimensional k-anonymity. In: ICDE, p 25
LeFevre K, DeWitt DJ, Ramakrishnan R (2006b) Workload-aware anonymisation. In: KDD, pp 277–286
Li T, Li N (2009) On the tradeoff between privacy and utility in data publishing. In: KDD, pp 517–526
Li N, Li T, Venkatasubramanian S (2007) t-Closeness: privacy beyond k-anonymity and l-diversity. In: ICDE, pp 106–115
Li T, Li N, Zhang J (2009) Modeling and integrating background knowledge in data anonymization. In: ICDE, pp 6–17
Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: SIGMOD
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-Diversity: privacy beyond k-anonymity. In: ICDE, p 24
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: PODS, pp 223–228
Narayanan A, Shmatikov V (2008) Robust de-anonymisation of large sparse datasets. In: IEEE security and privacy, pp 111–125
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Samarati P, Sweeney L (1998a) Generalizing data to provide anonymity when disclosing information (abstract). In: PODS, p 188
Samarati P, Sweeney L (1998b) Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report SRI-CSL-98-04, SRI Computer Science Laboratory
Sweeney L (1997) Weaving technology and policy together to maintain confidentiality. J Law Med Ethics 25(2–3): 98–110
Sweeney L (2002) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Syst 10(5): 557–570
Verykios VS, Elmagarmid AK, Bertino E, Dasseni E, Saygin Y (2004) Association rule hiding. IEEE Trans Knowl Data Eng 16(4): 434–447
Wang K, Fung BCM (2006) Anonymizing sequential releases. In: KDD, pp 414–423
Wang K, Yu PS, Chakraborty S (2004) Bottom-up generalization: a data mining solution to privacy protection. In: ICDM, pp 249–256
Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. 2nd edn. Morgan Kaufmann, San Francisco
Wong R, Li J, Fu A, Wang K (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: KDD, pp 754–759
Xu Y, Wang K, Fu AW-C, Yu PS (2008) Anonymizing transaction databases for publication. In: KDD, pp 767–775
Zhang Q, Koudas N, Srivastava D, Yu T (2007) Aggregate query answering on anonymized tables. In: ICDE, pp 116–125
Zhou B, Pei J, Luk WS (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Expl 10(2): 12–22
Additional information
Responsible editor: M.J. Zaki.
About this article
Cite this article
Sun, X., Wang, H., Li, J. et al. Publishing anonymous survey rating data. Data Min Knowl Disc 23, 379–406 (2011). https://doi.org/10.1007/s10618-010-0208-4
DOI: https://doi.org/10.1007/s10618-010-0208-4