ABSTRACT
Multistructured (M-S) data models were introduced to allow the expression of multilevel, concurrent annotation. However, most of these models lack a validation mechanism that is both consistent and efficient. In a previous paper, we introduced extended Annotation Graphs (eAG), a cyclic-graph data model equipped with a novel schema mechanism that, by allowing validation "by construction", bypasses the algorithmic cost typical of traditional methods for validating graph-structured data. Here we introduce LeAG, a markup syntax for eAG annotations over text data. LeAG takes the shape of a classic inline markup model. A LeAG annotation can therefore be written, in human-readable form, in any plain-text editor and saved as a text file. The syntax is simple and familiar, yet it offers a natural way to express multilayer annotation with overlap, self-overlap, and links. From a theoretical point of view, LeAG inaugurates a hybrid markup paradigm. Syntactically, it is a fully inline model, since all tags are inserted along the annotated resources; still, we show that representing the co-occurrence of independent elements inline requires the annotation to rest upon a notion of reference value, which is typical of stand-off markup. To our knowledge, LeAG is the first inline markup syntax to properly conceptualize the accidental co-occurrence of elements, a notion that is nonetheless fundamental to multilevel annotation.