Publishing anonymous survey rating data

Sun, Xiaoxun; Wang, Hua; Li, Jiuyong; Pei, Jian

doi:10.1007/s10618-010-0208-4

Published: 26 November 2010

Volume 23, pages 379–406, (2011)

Data Mining and Knowledge Discovery

Abstract

In this paper, we study the challenges of protecting the privacy of individuals in large public survey rating data. A recent study shows that individuals in supposedly anonymous movie rating records can be re-identified. Survey rating data usually contain ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues involve personal privacy. Even when survey participants do not reveal any of their ratings, their survey records are potentially identifiable using information from other public sources. None of the existing anonymisation principles (e.g., k-anonymity, l-diversity) can effectively prevent such breaches in large survey rating data sets. We tackle the problem by defining a principle called the ${(k,\epsilon)}$-anonymity model. Intuitively, the principle requires that, for each transaction t in the given survey rating data T, at least (k − 1) other transactions in T have ratings similar to t, where the similarity is controlled by ${\epsilon}$. The ${(k,\epsilon)}$-anonymity model is formulated through its graphical representation, and a specific graph-anonymisation problem is studied using graph modification and graph theory. Various cases are analyzed, and methods are developed to make the updated graph meet the ${(k,\epsilon)}$ requirements. The methods are applied to two real-life data sets to demonstrate their efficiency and practical utility.
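The requirement stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the similarity measure, so the sketch assumes that two transactions are similar when every pair of corresponding ratings differs by at most ${\epsilon}$. The function names (`is_similar`, `similarity_graph`, `satisfies_k_eps`) and the sample ratings are hypothetical. Under that assumption, the graphical view reduces to a degree check: each transaction is a node, an edge joins two similar transactions, and the data meet the ${(k,\epsilon)}$ requirement when every node has at least (k − 1) neighbours.

```python
from itertools import combinations


def is_similar(a, b, eps):
    """Assumed similarity test: every pair of ratings differs by at most eps."""
    return all(abs(x - y) <= eps for x, y in zip(a, b))


def similarity_graph(transactions, eps):
    """Build adjacency sets; an edge joins two transactions that are similar."""
    graph = {i: set() for i in range(len(transactions))}
    for i, j in combinations(range(len(transactions)), 2):
        if is_similar(transactions[i], transactions[j], eps):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def satisfies_k_eps(transactions, k, eps):
    """True if each transaction has at least k - 1 similar other transactions."""
    graph = similarity_graph(transactions, eps)
    return all(len(neighbours) >= k - 1 for neighbours in graph.values())


# Hypothetical survey records: one tuple of ratings per participant.
ratings = [(5, 4, 1), (5, 5, 1), (4, 4, 2), (1, 1, 5)]

print(satisfies_k_eps(ratings[:3], k=3, eps=1))  # first three records only
print(satisfies_k_eps(ratings, k=2, eps=1))      # includes the outlying record
```

In this toy data the fourth record has no similar neighbour, so it fails the check for any k greater than 1. That is the situation the graph-modification methods mentioned in the abstract are meant to repair, although the sketch only detects the violation and does not fix it.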

Author information

Authors and Affiliations

Australian Council for Educational Research, 19 Prospect Hill Road, Camberwell, VIC, Australia
Xiaoxun Sun
Department of Mathematics Computing, University of Southern Queensland, Toowoomba, QLD, Australia
Hua Wang
School of Computer and Information Science, University of South Australia, Adelaide, SA, Australia
Jiuyong Li
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Jian Pei

Corresponding author

Correspondence to Xiaoxun Sun.

Additional information

Responsible editor: M.J. Zaki.

About this article

Cite this article

Sun, X., Wang, H., Li, J. et al. Publishing anonymous survey rating data. Data Min Knowl Disc 23, 379–406 (2011). https://doi.org/10.1007/s10618-010-0208-4

Received: 22 July 2009
Accepted: 15 November 2010
Published: 26 November 2010
Issue Date: November 2011
DOI: https://doi.org/10.1007/s10618-010-0208-4
