A practical approximation algorithm for optimal k-anonymity

Kenig, Batya; Tassa, Tamir

doi:10.1007/s10618-011-0235-9

A practical approximation algorithm for optimal k-anonymity

Published: 27 August 2011

Volume 25, pages 134–168, (2012)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Batya Kenig¹ &
Tamir Tassa¹

833 Accesses
28 Citations
3 Altmetric
Explore all metrics

Abstract

k-Anonymity is a privacy preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, 2007) that was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited only to the theory of k-anonymization. The proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues ℓ-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: ICDT, pp 246–258
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB, pp 487–499
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM-SIGMOD conference on management of data, pp 439–450
Bayardo R, Agrawal R (2005) Data privacy through optimal k-anonymization. In: International conference on data engineering (ICDE), pp 217–228
Byun J-W, Kamra A, Bertino E, Li N (2007) Efficient k-anonymization using clustering techniques. In: DASFAA, pp 188–200
Ghinita G, Karras P, Kalnis P, Mamoulis N (2009) A framework for efficient data anonymization under privacy and accuracy constraints. ACM Trans Database Syst 34(2)
Gionis A, Tassa T (2009) k-anonymization with minimal loss of information. IEEE Trans Knowl Data Eng 21(2): 206–219
Article Google Scholar
Gionis A, Mazza A, Tassa T (2008) k-anonymization revisited. In: International conference on data engineering (ICDE), pp 744–753
Goldberger J, Tassa T (2010) Efficient anonymizations with enhanced utility. TDP 3: 149–175
MathSciNet Google Scholar
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using fp-trees. IEEE Trans Knowl Data Eng 17(10): 1347–1362
Article Google Scholar
Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4): 462–478
Article Google Scholar
Iyengar V (2002) Transforming data to satisfy privacy constraints. In: KDD ’02: proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 279–288
LeFevre K, DeWitt D, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: ACM-SIGMOD conference on management of data, pp 49–60
LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: ICDE, p 25
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and ℓ-diversity. In: Proceedings of IEEE international conference on data engineering (ICDE) 2007, pp 106–115
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) ℓ-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): 3
Article Google Scholar
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: PODS ’04: proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 223–228
Nergiz M, Clifton C (2007) Thoughts on k-anonymization. Data Knowl Eng 63(3): 622–645
Article Google Scholar
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, pp 67–78
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Workshop on research issues in data mining and knowledge discovery, DMKD, pp 21–30
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Article Google Scholar
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information (abstract). In: PODS ’98: proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, p 188
Srikant R, Agrawal R (1995) Minning generalized association rules. In: VLDB, pp 407–419
Sweeney L (2000) Uniqueness of simple demographics in the U.S. population. Laboratory for international Data Privacy LIDAP-WP4
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzz Knowl Based Syst 10(5): 557–570
Article MathSciNet MATH Google Scholar
Truta T, Campan A, Meyer P (2007) Generating microdata with p-sensitive k-anonymity property. In: SDM, pp 124–141
Wong R, Li J, Fu A, Wang K (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: In ACM SIGKDD, pp 754–759
Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: VLDB, pp 139–150

Download references

Author information

Authors and Affiliations

Division of Computer Science, The Open University, Ra’anana, Israel
Batya Kenig & Tamir Tassa

Authors

Batya Kenig
View author publications
You can also search for this author in PubMed Google Scholar
Tamir Tassa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tamir Tassa.

Additional information

Responsible editor: Johannes Gehrke.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kenig, B., Tassa, T. A practical approximation algorithm for optimal k-anonymity. Data Min Knowl Disc 25, 134–168 (2012). https://doi.org/10.1007/s10618-011-0235-9

Download citation

Received: 03 April 2010
Accepted: 11 August 2011
Published: 27 August 2011
Issue Date: July 2012
DOI: https://doi.org/10.1007/s10618-011-0235-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A practical approximation algorithm for optimal k-anonymity

Abstract

Access this article

Similar content being viewed by others

On the Complexity of t-Closeness Anonymization and Related Problems

The Hardness of (ε, m)-anonymity

Optimization algorithm for k-anonymization of datasets with low information loss

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A practical approximation algorithm for optimal k-anonymity

Abstract

Access this article

Similar content being viewed by others

On the Complexity of t-Closeness Anonymization and Related Problems

The Hardness of (ε, m)-anonymity

Optimization algorithm for k-anonymization of datasets with low information loss

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation