research-article

Linear Extended Annotation Graphs

Authors:
Vincent Barrellon (INSA-Lyon, Villeurbanne, France)
Pierre-Edouard Portier (INSA-Lyon, Villeurbanne, France)
Sylvie Calabretto (INSA-Lyon, Villeurbanne, France)
Olivier Ferret (Lyon 2, Lyon, France)

DocEng '17: Proceedings of the 2017 ACM Symposium on Document Engineering, August 2017, Pages 9–18. https://doi.org/10.1145/3103010.3103011

Published: 31 August 2017

ABSTRACT

Multistructured (M-S) data models were introduced to express multilevel, concurrent annotation. However, most of these models lack a validation mechanism that is both consistent and efficient. In a previous paper, we introduced extended Annotation Graphs (eAG), a cyclic-graph data model equipped with a novel schema mechanism that, by allowing validation "by construction", bypasses the typical algorithmic cost of traditional methods for validating graph-structured data. Here we introduce LeAG, a markup syntax for eAG annotations over text data. LeAG takes the form of a classic inline markup model. A LeAG annotation can therefore be written in human-readable form in any notepad application and saved as a text file. The syntax is simple and familiar, yet it offers a natural way to express multilayer annotation with (self-)overlap and links. From a theoretical point of view, LeAG inaugurates a hybrid markup paradigm. Syntactically, it is a fully inline model, since all tags are inserted along the annotated resources; still, we show that representing the co-occurrence of independent elements inline requires the annotation to rest upon a notion of reference value, which is typical of stand-off markup. To our knowledge, LeAG is the first inline markup syntax to properly conceptualize the accidental co-occurrence of elements, a notion that is nonetheless fundamental in multilevel annotation.

Published in
DocEng '17: Proceedings of the 2017 ACM Symposium on Document Engineering
August 2017
242 pages
ISBN: 9781450346894
DOI: 10.1145/3103010
General Chair: Kenneth Camilleri (University of Malta, Malta)
Program Chair: Alexandra Bonnici (University of Malta, Malta)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Student Paper
Author Tags
markup model
multilayer annotation
multistructured data
simulation
validation
Qualifiers
- research-article
Acceptance Rates
DocEng '17 Paper Acceptance Rate: 13 of 71 submissions, 18%
Overall Acceptance Rate: 178 of 537 submissions, 33%