short-paper

Combining image and text features: a hybrid approach to mobile book spine recognition

Authors:
Sam S. Tsai, Stanford University, Stanford, CA, USA
David Chen, Stanford University, Stanford, CA, USA
Huizhong Chen, Stanford University, Stanford, CA, USA
Cheng-Hsin Hsu, National Tsing Hua University, Hsinchu, Taiwan ROC
Kyu-Han Kim, HP Laboratories, Palo Alto, CA, USA
Jatinder P. Singh, Deutsche Telekom R&D Laboratories USA, Los Altos, CA, USA
Bernd Girod, Stanford University, Stanford, CA, USA

MM '11: Proceedings of the 19th ACM international conference on Multimedia
November 2011
Pages 1029–1032
https://doi.org/10.1145/2072298.2071930

Published: 28 November 2011

ABSTRACT

Local image features have been used successfully for large-scale object recognition, but they are not effective at recognizing book spines on bookshelves, because some book spines contain only text components that do not yield distinguishing image features. To overcome this issue, we develop a new approach that combines a text-based spine recognition pipeline with an image feature-based spine recognition pipeline. The text within the book spine image is recognized and used as keywords to search a book spine text database. The image features of the book spine image are used to search a book spine image database. The search results of the two pipelines are then combined to form the final result. We implement the proposed hybrid book recognition pipeline in a book inventory management system and conduct extensive experiments to evaluate its performance. The experimental results show that while text-based or image feature-based systems alone achieve a recall of only 72%, the proposed hybrid system achieves a recall of approximately 91%.
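
The abstract does not say how the two sets of search results are merged, so the following is only an illustrative sketch of one common option, weighted score fusion, and should not be read as the authors' method. The function names (`normalize`, `combine_results`), the `text_weight` parameter, and the assumption that each pipeline returns a score per candidate book are all hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1] so the two pipelines are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie; treat them as equally strong matches.
        return {book: 1.0 for book in scores}
    return {book: (s - lo) / (hi - lo) for book, s in scores.items()}


def combine_results(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Fuse text-search and image-search scores into one ranked list.

    Hypothetical rule: a weighted sum of normalized scores. A book that
    is missing from one pipeline's results contributes 0 for that pipeline.
    """
    text_norm = normalize(text_scores)
    image_norm = normalize(image_scores)
    fused = {}
    for book in set(text_norm) | set(image_norm):
        fused[book] = (
            text_weight * text_norm.get(book, 0.0)
            + (1.0 - text_weight) * image_norm.get(book, 0.0)
        )
    # Highest fused score first; ties are broken by book identifier.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up scores for three candidate spines, for illustration only.
    text_hits = {"A": 3.0, "B": 1.0}
    image_hits = {"B": 10.0, "C": 5.0}
    print(combine_results(text_hits, image_hits, text_weight=0.25))
```

In this sketch a spine with little distinctive imagery can still rank highly through its text score, which is the intuition the abstract gives for combining the pipelines. Min-max scaling assigns the weakest candidate in each list a score of 0, so a rank-based fusion rule may be preferable when score ranges are unreliable.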


Index Terms

1. Information systems
  1. Information retrieval


Published in
MM '11: Proceedings of the 19th ACM international conference on Multimedia
November 2011
944 pages
ISBN:9781450306164
DOI:10.1145/2072298
General Chairs:
K. Selçuk Candan, Arizona State University, USA
Sethuraman Panchanathan, Arizona State University, USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA
Program Chairs:
Hari Sundaram, Arizona State University, USA
Wu-Chi Feng, Portland State University, USA
Nicu Sebe, University of Trento, Italy
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
image retrieval
text recognition
visual search
Qualifiers
- short-paper
Acceptance Rates
Overall acceptance rate: 995 of 4,171 submissions (24%)