Authors: Syed Tahseen Raza Rizvi 1 ; Dominique Mercier 2 ; Stefan Agne 3 ; Steffen Erkel 4 ; Andreas Dengel 3 and Sheraz Ahmed 3

Affiliations: 1 German Research Center for Artificial Intelligence (DFKI) and Kaiserslautern University of Technology, Germany ; 2 Kaiserslautern University of Technology, Germany ; 3 German Research Center for Artificial Intelligence (DFKI), Germany ; 4 Bosch Thermo-technology, Germany

Keyword(s): Table Detection, Information Extraction, Ontology, PDF Document, Document Analysis, Table Extraction, Relevancy.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Collaboration and e-Services ; e-Business ; Enterprise Engineering ; Enterprise Information Systems ; Enterprise Ontologies ; Formal Methods ; Knowledge Engineering and Ontology Development ; Knowledge Representation and Reasoning ; Knowledge-Based Systems ; Ontologies ; Semantic Web ; Simulation and Modeling ; Soft Computing ; Symbolic Systems

Abstract: This paper presents a novel system for extracting user-relevant tabular information from documents. The system is generic: it can be applied to documents irrespective of their domain and the information they contain. It is also robust to the different layouts used when those documents were created. The system has two main modules: table detection and ontological information extraction. The table detection module extracts all tables from a given technical document, while the ontological information extraction module selects only the relevant tables from among those detected. Generalization is achieved by using ontologies, which enable the system to adapt to a new set of documents from any other domain according to a provided ontology. Furthermore, for each extracted table the system provides a confidence score for its relevancy and an explanation of that score. The system was evaluated on 80 real technical documents of hardware parts, containing 2033 tables, from 20 different brands in the industrial boilers domain. The evaluation results show that the system extracted all of the relevant tables and achieved an overall precision, recall, and F-measure of 0.88, 1, and 0.93, respectively.
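
As a quick check on how the three reported figures relate, the sketch below recomputes the F-measure from the precision and recall given in the abstract. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the balanced F-measure (F1).

    Assumption: the paper's "F-measure" is the balanced F1. The abstract
    does not say which variant was used.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values as reported (already rounded) in the abstract.
reported_precision = 0.88
reported_recall = 1.0

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 from the rounded inputs: {f1:.3f}")  # prints 0.936
```

Under that assumption the rounded inputs give roughly 0.936, close to the reported 0.93. The small gap is consistent with the precision having been rounded before it was reported, though the abstract does not confirm this.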

CC BY-NC-ND 4.0

Paper citation in several formats:
Rizvi, S.; Mercier, D.; Agne, S.; Erkel, S.; Dengel, A. and Ahmed, S. (2018). Ontology-based Information Extraction from Technical Documents. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART; ISBN 978-989-758-275-2; ISSN 2184-433X, SciTePress, pages 493-500. DOI: 10.5220/0006596604930500

@conference{icaart18,
author={Syed Tahseen Raza Rizvi and Dominique Mercier and Stefan Agne and Steffen Erkel and Andreas Dengel and Sheraz Ahmed},
title={Ontology-based Information Extraction from Technical Documents},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2018},
pages={493-500},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006596604930500},
isbn={978-989-758-275-2},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Ontology-based Information Extraction from Technical Documents
SN - 978-989-758-275-2
IS - 2184-433X
AU - Rizvi, S.
AU - Mercier, D.
AU - Agne, S.
AU - Erkel, S.
AU - Dengel, A.
AU - Ahmed, S.
PY - 2018
SP - 493
EP - 500
DO - 10.5220/0006596604930500
PB - SciTePress