ABSTRACT
Consider fully dynamic data, where we track data as it gets inserted and deleted. There are well developed notions of private data analyses with dynamic data, for example, using differential privacy. We want to go beyond privacy, and consider privacy together with security, formulated recently as pan-privacy by Dwork et al. (ICS 2010). Informally, pan-privacy preserves differential privacy while computing desired statistics on the data, even if the internal memory of the algorithm is compromised (say, by a malicious break-in or insider curiosity or by fiat by the government or law).
We study pan-private algorithms for basic analyses, like estimating distinct count, moments, and heavy hitter count, with fully dynamic data. We present the first known pan-private algorithms for these problems in the fully dynamic model. Our algorithms rely on sketching techniques popular in streaming: in some cases, we add suitable noise to a previously known sketch, using a novel approach of calibrating noise to the underlying problem structure and the projection matrix of the sketch; in other cases, we maintain certain statistics on sketches; in yet others, we define novel sketches. We also present the first known lower bounds explicitly for pan privacy, showing our results to be nearly optimal for these problems. Our lower bounds are stronger than those implied by differential privacy or dynamic data streaming alone and hold even if unbounded memory and/or unbounded processing time are allowed. The lower bounds use a noisy decoding argument and exploit a connection between pan-private algorithms and data sanitization.
- G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). TKDE, 2003. Google ScholarDigital Library
- G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms, 2005. Google ScholarDigital Library
- I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, 2003. Google ScholarDigital Library
- C. Dwork. Differential privacy in new settings. In SODA, 2010. Google ScholarDigital Library
- C. Dwork, F. Mcsherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006. Google ScholarDigital Library
- C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. In STOC, 2007. Google ScholarDigital Library
- C. Dwork, M. Naor, T. Pitassi, G. Rothblum, and S. Yekhanin. Pan-Private streaming algorithms. In ICS, 2010.Google Scholar
- C. Dwork, T. Pitassi, M. Naor, and G. Rothblum. Differential privacy under continual observation. In STOC, 2010. Google ScholarDigital Library
- A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. In SODA, 2010. Google ScholarDigital Library
- P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. In Journal of ACM, 2006. Google ScholarDigital Library
- D. Kane, J. Nelson, and D. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, 2010. Google ScholarDigital Library
- A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan. Limits of two-party differential privacy. In FOCS, 2010. Google ScholarDigital Library
- F. Mcsherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007. Google ScholarDigital Library
- I. Mironov, O. Pandey, O. Reingold, and S. Vadhan. Computational differential privacy. In CRYPTO, 2009. Google ScholarDigital Library
- J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, 2010. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.Google Scholar
- S. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of American Statistical Association, 1965.Google Scholar
Index Terms
- Pan-private algorithms via statistics on sketches
Recommendations
Towards Guaranteed Privacy in Stream Processing: Differential Privacy for Private Pattern Protection
DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-based SystemsSensor data often contain private information that requires proper protection. Most existing privacy-preserving mechanisms (PPMs) for data streams undermine the utility of the entire data stream and limit the performance of data-driven applications. ...
Improving the Utility of Differentially Private Data Releases via k-Anonymity
TRUSTCOM '13: Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and CommunicationsA common view in some data anonymization literature is to oppose the "old'' k-anonymity model to the "new'' differential privacy model, which offers more robust privacy guarantees. However, the utility of the masked results provided by differential ...
Differentially Private Ensemble Classifiers for Data Streams
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data MiningLearning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a ...
Comments