Skip to main content
Log in

A practical approximation algorithm for optimal k-anonymity

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

k-Anonymity is a privacy preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, 2007) that was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited only to the theory of k-anonymization. The proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues -diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  • Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: ICDT, pp 246–258

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB, pp 487–499

  • Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM-SIGMOD conference on management of data, pp 439–450

  • Bayardo R, Agrawal R (2005) Data privacy through optimal k-anonymization. In: International conference on data engineering (ICDE), pp 217–228

  • Byun J-W, Kamra A, Bertino E, Li N (2007) Efficient k-anonymization using clustering techniques. In: DASFAA, pp 188–200

  • Ghinita G, Karras P, Kalnis P, Mamoulis N (2009) A framework for efficient data anonymization under privacy and accuracy constraints. ACM Trans Database Syst 34(2)

  • Gionis A, Tassa T (2009) k-anonymization with minimal loss of information. IEEE Trans Knowl Data Eng 21(2): 206–219

    Article  Google Scholar 

  • Gionis A, Mazza A, Tassa T (2008) k-anonymization revisited. In: International conference on data engineering (ICDE), pp 744–753

  • Goldberger J, Tassa T (2010) Efficient anonymizations with enhanced utility. TDP 3: 149–175

    MathSciNet  Google Scholar 

  • Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using fp-trees. IEEE Trans Knowl Data Eng 17(10): 1347–1362

    Article  Google Scholar 

  • Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4): 462–478

    Article  Google Scholar 

  • Iyengar V (2002) Transforming data to satisfy privacy constraints. In: KDD ’02: proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 279–288

  • LeFevre K, DeWitt D, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: ACM-SIGMOD conference on management of data, pp 49–60

  • LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: ICDE, p 25

  • Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and -diversity. In: Proceedings of IEEE international conference on data engineering (ICDE) 2007, pp 106–115

  • Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) -diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): 3

    Article  Google Scholar 

  • Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: PODS ’04: proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 223–228

  • Nergiz M, Clifton C (2007) Thoughts on k-anonymization. Data Knowl Eng 63(3): 622–645

    Article  Google Scholar 

  • Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, pp 67–78

  • Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Workshop on research issues in data mining and knowledge discovery, DMKD, pp 21–30

  • Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027

    Article  Google Scholar 

  • Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information (abstract). In: PODS ’98: proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, p 188

  • Srikant R, Agrawal R (1995) Minning generalized association rules. In: VLDB, pp 407–419

  • Sweeney L (2000) Uniqueness of simple demographics in the U.S. population. Laboratory for international Data Privacy LIDAP-WP4

  • Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzz Knowl Based Syst 10(5): 557–570

    Article  MathSciNet  MATH  Google Scholar 

  • Truta T, Campan A, Meyer P (2007) Generating microdata with p-sensitive k-anonymity property. In: SDM, pp 124–141

  • Wong R, Li J, Fu A, Wang K (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: In ACM SIGKDD, pp 754–759

  • Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: VLDB, pp 139–150

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tamir Tassa.

Additional information

Responsible editor: Johannes Gehrke.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kenig, B., Tassa, T. A practical approximation algorithm for optimal k-anonymity. Data Min Knowl Disc 25, 134–168 (2012). https://doi.org/10.1007/s10618-011-0235-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-011-0235-9

Keywords

Navigation