Abstract
While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Cohen, S., Kimelfeld, B., Sagiv, Y.: Running tree automata on probabilistic XML. In: PODS, pp. 227–236 (2009)
Henzinger, M.R., Heydon, A., Mitzenmacher, M., Najork, M.: On near-uniform URL sampling. Computer Networks 33(1-6) (2000)
Hübler, C., Kriegel, H.-P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: ICDM (2008)
Koutris, P., Upadhyaya, P., Balazinska, M., Howe, B., Suciu, D.: Query-based data pricing. In: PODS (2012)
Koutris, P., Upadhyaya, P., Balazinska, M., Howe, B., Suciu, D.: QueryMarket demonstration: Pricing for online data markets. PVLDB 5(12) (2012)
Koutris, P., Upadhyaya, P., Balazinska, M., Howe, B., Suciu, D.: Toward practical query pricing with QueryMarket. In: SIGMOD (2013)
Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: SIGKDD (2006)
Li, C., Li, D.Y., Miklau, G., Suciu, D.: A theory of pricing private data. In: ICDT (2013)
Li, C., Miklau, G.: Pricing aggregate queries in a data marketplace. In: WebDB (2012)
Lin, B.-R., Kifer, D.: On arbitrage-free pricing for general data queries. PVLDB 7(9), 757–768 (2014)
Lu, X., Bressan, S.: Sampling connected induced subgraphs uniformly at random. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 195–212. Springer, Heidelberg (2012)
Luo, C., Jiang, Z., Hou, W.-C., Yu, F., Zhu, Q.: A sampling approach for XML query selectivity estimation. In: EDBT (2009)
Maiya, A.S., Berger-Wolf, T.Y.: Sampling community structure. In: WWW (2010)
Muschalle, A., Stahl, F., Löser, A., Vossen, G.: Pricing approaches for data markets. In: Castellanos, M., Dayal, U., Rundensteiner, E.A. (eds.) BIRTE 2012. LNBIP, vol. 154, pp. 129–144. Springer, Heidelberg (2013)
Pipino, L., Lee, Y.W., Wang, R.Y.: Data quality assessment. Commun. ACM 45(4) (2002)
Ribeiro, B.F., Towsley, D.F.: Estimating and sampling graphs with multidimensional random walks. In: Internet Measurement Conference (2010)
Tang, R., Wu, H., Bao, Z., Bressan, S., Valduriez, P.: The price is right. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds.) DEXA 2013, Part II. LNCS, vol. 8056, pp. 380–394. Springer, Heidelberg (2013)
Wang, R.Y., Strong, D.M.: Beyond accuracy: What data quality means to data consumers. J. of Management Information Systems 12(4) (1996)
Wang, W., Jiang, H., Lu, H., Yu, J.X.: Containment join size estimation: Models and methods. In: SIGMOD (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tang, R., Amarilli, A., Senellart, P., Bressan, S. (2014). Get a Sample for a Discount. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-10073-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10072-2
Online ISBN: 978-3-319-10073-9
eBook Packages: Computer ScienceComputer Science (R0)