Report on the XML mining track at INEX 2005 and INEX 2006: categorization and clustering of XML documents

Published: 01 June 2007

Abstract

This article reports on the two years of the XML Mining track at INEX (2005 and 2006). We focus on the classification and clustering of XML documents. We detail these two tasks and the corpus used for the challenge, then summarize the methods proposed by the participants. Finally, we compare the results obtained during the two years of the track.

Published in

ACM SIGIR Forum, Volume 41, Issue 1
June 2007
100 pages
ISSN: 0163-5840
DOI: 10.1145/1273221

Copyright © 2007 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
