DOI: 10.1145/564376.564403
Article

Text genre classification with genre-revealing and subject-revealing features

Published: 11 August 2002

ABSTRACT

Most classification research has focused on subject, or propositional content. Genre, or style, is a distinct and important property of text, and automatic text genre classification is becoming important for classification and retrieval purposes as well as for some natural language processing research. In this paper, we present a method for automatic genre classification that is based on statistically selected features obtained from both subject-classified and genre-classified training data. The experimental results show that the proposed method outperforms a direct application of a statistical learner often used for subject classification. We also observe that the deviation formula and the discrimination formula using document frequency ratios work as expected. We conjecture that this dual feature set approach can be generalized to improve the performance of subject classification as well.
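
The abstract mentions formulas built on document frequency ratios but does not state them. As a rough illustration only, the following sketch scores each term by its document frequency inside one genre divided by its mean document frequency in the other genres. The function names, the smoothing constant, and the toy corpus are all assumptions made for this example; this is not the paper's deviation or discrimination formula.

```python
from collections import Counter


def document_frequency(docs):
    """Map each term to the fraction of documents that contain it."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))  # count each term once per document
    n = len(docs)
    return {term: c / n for term, c in counts.items()}


def df_ratio_scores(docs_by_genre, smoothing=0.01):
    """Hypothetical score: DF inside a genre over mean DF in the other genres.

    Requires at least two genres. The smoothing constant (an assumption)
    avoids division by zero for terms absent from the other genres.
    """
    df = {g: document_frequency(d) for g, d in docs_by_genre.items()}
    vocab = set().union(*(d.keys() for d in df.values()))
    scores = {}
    for g in df:
        others = [h for h in df if h != g]
        scores[g] = {}
        for t in vocab:
            inside = df[g].get(t, 0.0)
            outside = sum(df[h].get(t, 0.0) for h in others) / len(others)
            scores[g][t] = (inside + smoothing) / (outside + smoothing)
    return scores


def top_terms(scores, genre, k):
    """Return the k highest-scoring terms for a genre (ties broken alphabetically)."""
    return sorted(scores[genre], key=lambda t: (-scores[genre][t], t))[:k]


# Toy, made-up corpus: two genres, two tokenized documents each.
docs_by_genre = {
    "editorial": [["we", "believe", "the", "policy"], ["we", "argue", "the", "tax"]],
    "report": [["the", "police", "said", "today"], ["the", "minister", "said", "the", "tax"]],
}
scores = df_ratio_scores(docs_by_genre)
print(top_terms(scores, "editorial", 1))
print(top_terms(scores, "report", 1))
```

In this toy corpus a term such as "the", which occurs in every document of both genres, gets a ratio of 1 and so carries no genre signal, while terms concentrated in one genre score far higher. A dual feature set in the spirit of the abstract could be built by computing such scores once over genre-classified data and once over subject-classified data, though how the paper combines the two sets is not described here.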


      • Published in

        SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
        August 2002
        478 pages
        ISBN:1581135610
        DOI:10.1145/564376

        Copyright © 2002 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Article

        Acceptance Rates

        SIGIR '02 Paper Acceptance Rate: 44 of 219 submissions, 20%
        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
