DOI: 10.1145/3103010.3103011
research-article
Best Student Paper

Linear Extended Annotation Graphs

Published: 31 August 2017

ABSTRACT

Multistructured (M-S) data models were introduced to allow the expression of multilevel, concurrent annotation. However, most models lack a validation mechanism that is both consistent and efficient. In an earlier paper, we introduced extended Annotation Graphs (eAG), a cyclic-graph data model equipped with a novel schema mechanism that, by allowing validation "by construction", bypasses the typical algorithmic cost of traditional methods for validating graph-structured data. Here we introduce LeAG, a markup syntax for eAG annotations over text data. LeAG takes the shape of a classic inline markup model. A LeAG annotation can thus be written, in human-readable form, in any notepad application and saved as a text file. The syntax is simple and familiar, yet it offers a natural way to express multilayer annotation with (self-)overlap and links. From a theoretical point of view, LeAG inaugurates a hybrid markup paradigm. Syntactically, it is a fully inline model, since all tags are inserted along the annotated resources; still, we show that representing the co-occurrence of independent elements inline requires the annotation to rest upon a notion of reference value, which is typical of stand-off markup. To our knowledge, LeAG is the first inline markup syntax to properly conceptualize the accidental co-occurrence of elements, a notion that is nonetheless fundamental in multilevel annotation.

Published in

DocEng '17: Proceedings of the 2017 ACM Symposium on Document Engineering
August 2017
242 pages
ISBN: 9781450346894
DOI: 10.1145/3103010

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

DocEng '17 paper acceptance rate: 13 of 71 submissions, 18%. Overall acceptance rate: 178 of 537 submissions, 33%.
