ABSTRACT
Some queries cannot be answered by machines alone. Processing such queries requires human input to provide information that is missing from the database, to perform computationally difficult functions, and to match, rank, or aggregate results based on fuzzy criteria. CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. While CrowdDB leverages many aspects of traditional database systems, there are also important differences. Conceptually, a major change is that the traditional closed-world assumption for query processing does not hold for human input: a value absent from the database is not necessarily unknown, because the crowd may be able to supply it. From an implementation perspective, human-oriented query operators are needed to solicit, integrate, and cleanse crowdsourced data. Furthermore, performance and cost depend on a number of new factors, including worker affinity, training, fatigue, motivation, and location. We describe the design of CrowdDB, report on an initial set of experiments using Amazon Mechanical Turk, and outline important avenues for future work in the development of crowdsourced query processing systems.
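The idea of dropping the closed-world assumption, so that a missing value becomes a question for people instead of an unknown, can be illustrated with a small Python sketch. It is not CrowdDB's implementation: the operator `crowd_probe`, the helpers `ask_crowd` and `majority_vote`, and the simulated workers are all hypothetical, and majority voting is assumed here only as one simple way to cleanse redundant answers. Real workers on a platform such as Amazon Mechanical Turk answer asynchronously and for payment, which this sketch ignores.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical model of a crowd worker: any callable that takes a question
# and returns an answer (or None if the worker skips the task).
Worker = Callable[[str], Optional[str]]


def ask_crowd(question: str, workers: list[Worker]) -> list[str]:
    """Solicit: pose the same question to every worker and keep non-empty answers."""
    answers = []
    for worker in workers:
        reply = worker(question)
        if reply is not None and reply.strip():
            answers.append(reply.strip())
    return answers


def majority_vote(answers: list[str], min_votes: int) -> Optional[str]:
    """Cleanse: return the most common case-folded answer if at least
    `min_votes` workers agree on it, otherwise None."""
    if not answers:
        return None
    counts = Counter(answer.casefold() for answer in answers)
    value, votes = counts.most_common(1)[0]
    return value if votes >= min_votes else None


def crowd_probe(rows: list[dict], column: str, key: str,
                workers: list[Worker], min_votes: int = 2) -> list[dict]:
    """Integrate: fill missing values of `column` with crowd answers.

    Under a closed-world assumption a missing value would simply stay unknown;
    this operator instead treats it as "not asked yet" and poses a question
    built from the row's `key` attribute.
    """
    result = []
    for row in rows:
        filled = dict(row)  # copy so the input rows are not mutated
        if filled.get(column) is None:
            question = f"What is the {column} of {filled[key]}?"
            filled[column] = majority_vote(ask_crowd(question, workers), min_votes)
        result.append(filled)
    return result


# Usage with simulated workers; a real deployment would post tasks to a
# crowdsourcing platform and wait for the answers to arrive.
departments = [
    {"name": "EECS", "url": "eecs.example.edu"},
    {"name": "Physics", "url": None},
]
workers = [
    lambda q: "physics.example.edu",
    lambda q: "Physics.Example.edu",   # same answer, different capitalization
    lambda q: "phys.example.edu",      # a dissenting (possibly wrong) answer
    lambda q: None,                    # a worker who skips the task
]
filled_rows = crowd_probe(departments, "url", "name", workers)
print(filled_rows)
```

Raising `min_votes` in this sketch trades cost for quality: more workers must answer (and be paid) before a value is accepted, which mirrors the abstract's point that cost depends on factors outside a traditional optimizer's view.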