research-article

CrowdDB: answering queries with crowdsourcing

Authors:
Michael J. Franklin, UC Berkeley, Berkeley, CA, USA
Donald Kossmann, ETH Zurich, Zurich, Switzerland
Tim Kraska, UC Berkeley, Berkeley, CA, USA
Sukriti Ramesh, ETH Zurich, Zurich, Switzerland
Reynold Xin, UC Berkeley, Berkeley, CA, USA

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011, Pages 61–72
https://doi.org/10.1145/1989323.1989331

Published: 12 June 2011

ABSTRACT

Some queries cannot be answered by machines only. Processing such queries requires human input for providing information that is missing from the database, for performing computationally difficult functions, and for matching, ranking, or aggregating results based on fuzzy criteria. CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. While CrowdDB leverages many aspects of traditional database systems, there are also important differences. Conceptually, a major change is that the traditional closed-world assumption for query processing does not hold for human input. From an implementation perspective, human-oriented query operators are needed to solicit, integrate and cleanse crowdsourced data. Furthermore, performance and cost depend on a number of new factors including worker affinity, training, fatigue, motivation and location. We describe the design of CrowdDB, report on an initial set of experiments using Amazon Mechanical Turk, and outline important avenues for future work in the development of crowdsourced query processing systems.
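
The idea of soliciting, integrating and cleansing crowdsourced data can be illustrated with a minimal sketch. It is not CrowdDB's implementation: the function names (`fill_missing`, `majority_vote`, `simulated_crowd`), the `name` column, the canned worker answers, and the use of a strict majority vote as the cleansing rule are all assumptions made for illustration. A real system would post tasks to a platform such as Amazon Mechanical Turk instead of calling a local function.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a crowdsourcing platform: takes a question and a
# number of workers, returns one free-text answer per worker.
AskCrowd = Callable[[str, int], List[str]]


def majority_vote(answers: List[str]) -> Optional[str]:
    """Cleanse: normalize answers and keep the value with a strict majority."""
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    value, count = Counter(normalized).most_common(1)[0]
    # Without a strict majority the value stays unknown (None).
    return value if count * 2 > len(normalized) else None


def fill_missing(rows: List[Dict], column: str, ask_crowd: AskCrowd,
                 replicas: int = 3) -> List[Dict]:
    """Solicit and integrate: ask the crowd for each missing value of `column`."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if row.get(column) is None:
            # Assumes every row has a 'name' field to build the question from.
            question = f"What is the {column} of {row['name']}?"
            row[column] = majority_vote(ask_crowd(question, replicas))
        filled.append(row)
    return filled


def simulated_crowd(question: str, n: int) -> List[str]:
    """Canned, made-up worker answers; two agree, one disagrees."""
    return ["Berkeley", "berkeley ", "Stanford"][:n]


rows = [
    {"name": "UC Berkeley", "city": None},        # missing: goes to the crowd
    {"name": "ETH Zurich", "city": "zurich"},     # present: left untouched
]
result = fill_missing(rows, "city", simulated_crowd)
```

In this sketch a missing value is treated as "not yet known" rather than "does not exist", which is one simple way to picture the departure from the closed-world assumption mentioned above.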

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines

Published in
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN: 9781450306614
DOI: 10.1145/1989323
General Chair: Timos Sellis, IMIS/RC Athena
Program Chair: Renée J. Miller, University of Toronto
Publications Chairs: Anastasios Kementsietsidis, IBM T.J. Watson Research Center; Yannis Velegrakis, University of Trento
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
architecture
crowd
crowdsourcing
database
hybrid system
query processing
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 450
- Total Downloads: 3,272
- Downloads (Last 12 months): 222
- Downloads (Last 6 weeks): 15