Report on the XML mining track at INEX 2005 and INEX 2006: categorization and clustering of XML documents

Authors: Ludovic Denoyer (University of Paris), Patrick Gallinari (University of Paris)

ACM SIGIR Forum, Volume 41, Issue 1, June 2007, pp. 79–90. https://doi.org/10.1145/1273221.1273230

Published: 01 June 2007

Abstract

This article reports on the two years of the XML Mining track at INEX (2005 and 2006), focusing on the classification and clustering of XML documents. We describe these two tasks and the corpus used for the challenge, then summarize the methods proposed by the participants. Finally, we compare the results obtained over the two years of the track.

Published in

ACM SIGIR Forum, Volume 41, Issue 1 (June 2007), 100 pages
ISSN: 0163-5840
DOI: 10.1145/1273221
Copyright © 2007 Authors
Publisher: Association for Computing Machinery, New York, NY, United States