Semi-automatic transcription of ancient Egyptian hieroglyphic documents
This project will conceive, design, and develop a digital toolset for optical character recognition (OCR) of the ancient Egyptian hieroglyphs in the Pyramid Texts (PT) and the Coffin Texts (CT), providing a semi-automatic transcription of the original text into the standard encoding known as Manuel de Codage (MdC). Beyond the technical challenge, the OCR-PT-CT project might enable researchers to search for textual and sign parallels much more efficiently.
A proof of concept
From March to December 2022, the OCR-PT-CT project (PIUAH21/AH-036), funded by Universidad de Alcalá and in synergy with the MORTEXVAR project (Comunidad de Madrid) and the GEINTRA and CIARQ research groups, will ensure the quality of its input data by using the text editions that are the current reference in Egyptological research. Access to these editions has been granted by the Oriental Institute (University of Chicago) and James P. Allen (Brown University, Providence).
Thanks to its interdisciplinary team (Egyptology and Engineering), the OCR-PT-CT project will implement a task sequence that keeps in view, at every step, how flexible and broad the data set is and whether it could be reused with other complex writing systems.
The OCR-PT-CT project will propose an OCR system adapted to the chosen corpus that keeps manual encoding to a minimum. It will test techniques for segmenting the hieroglyphic script in these texts, together with classification systems based on deep neural networks. This will allow researchers to check the chosen corpus interactively without having to encode much of the text manually at the sentence level.
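As a purely illustrative sketch of the segmentation-then-classification idea (not the project's actual method), the Python snippet below splits a tiny binary image into connected blobs, passes each crop to a classifier, and joins the resulting sign codes with "-", the MdC sign separator. Everything in it is an assumption made for the example: the names segment_signs, transcribe, toy_classifier and PAGE are hypothetical, the classifier is a lookup table standing in for a deep neural network, the reading order is simply left to right, and treating each connected blob as one sign ignores hieroglyphs drawn with several separate strokes.

```python
from collections import deque
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def segment_signs(image: List[str]) -> List[Box]:
    """Return one bounding box per 8-connected blob of '#' pixels."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != "#" or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Assumed reading order for this toy example: left to right, then top down.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def transcribe(image: List[str], classify: Callable[[List[str]], str]) -> str:
    """Classify each segmented crop and join the codes with the MdC '-' separator."""
    codes = []
    for top, left, bottom, right in segment_signs(image):
        crop = [row[left:right + 1] for row in image[top:bottom + 1]]
        codes.append(classify(crop))
    return "-".join(codes)


def toy_classifier(crop: List[str]) -> str:
    """Hypothetical stand-in for a neural network: looks up the crop size."""
    table = {(2, 2): "G17", (3, 3): "N35", (1, 2): "D21"}
    return table[(len(crop), len(crop[0]))]


# Hypothetical 4x10 binary "page": '#' is ink, '.' is background.
PAGE = [
    "##...#....",
    "##..###...",
    "....#.#.##",
    "..........",
]

print(transcribe(PAGE, toy_classifier))  # prints G17-N35-D21
```

In a real pipeline the lookup table would be replaced by a trained model, and the ordering step would have to respect the column layout and writing direction of the source edition.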
The OCR-PT-CT team
Laura de Diego Otón
César Guerra Méndez
Allen, J.P. 2006a. The Egyptian Coffin Texts, VIII: Middle Kingdom copies of Pyramid Texts (Oriental Institute Publications 132). Chicago: University of Chicago.
Allen, J.P. 2006b. A new concordance of the Pyramid Texts I-VI. Providence: Brown University.
De Buck, A. 1935-1961. The Egyptian Coffin Texts I-VII (Oriental Institute Publications 24, 49, 64, 67, 73, 81 & 87). Chicago: University of Chicago.
Barucci, A., Cucci, C., Franci, M., Loschiavo, M. & Argenti, F. 2021. A Deep Learning Approach to Ancient Egyptian Hieroglyphs Classification. IEEE Access 9: 1-10. (doi 10.1109/ACCESS.2021.3110082)
Cruz Cavalieri D., Bastos-Filho T., Palazuelos-Cagigas S., Sarcinelli-Filho, M. 2015. On Combining Language Models to Improve a Text-based Human-machine Interface. International Journal of Advanced Robotic Systems 12/170: 1-14. (doi 10.5772/61753)
Cruz Cavalieri D., Palazuelos-Cagigas S., Bastos-Filho T., Sarcinelli-Filho, M. 2016. Combination of Language Models for Word Prediction: An Exponential Approach. IEEE/ACM Transactions on Audio, Speech, and Language Processing 99 (doi 10.1109/TASLP.2016.2547743).
Gardiner, A.H. 1957. Egyptian grammar. Being an introduction to the study of hieroglyphs. Oxford / London: Griffith Institute / Oxford University Press.
Gracia Zamacona, C. 2013. A database for the Coffin Texts. In S. Polis & J. Winand (eds.), Texts, languages and information technology in Egyptology (Aegyptiaca Leodiensia 9). Liège: Presses Universitaires de Liège, 139-155.
Gracia Zamacona, C. & J. Ortiz-García. 2021. Handbook of digital Egyptology: Texts (Monografías de Oriente Antiguo 1). Alcalá de Henares: Universidad de Alcalá.
Hu, R., Gayol, C. P., Odobez, J. M. & Gatica-Perez, D. 2017. Analyzing and visualizing ancient Maya hieroglyphics using shape: From computer vision to Digital Humanities. Digital Scholarship in the Humanities 32 (suppl. 2): 179-194.
Nederhof, M.J. & F. Rahman. 2017. A probabilistic model of ancient Egyptian writing. Journal of Language Modelling 5/1: 131-163.
Chung, J. & Delteil, T. 2019. A computationally efficient pipeline approach to full page offline handwritten text recognition. In IEEE (ed.), 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) 5: 35-40.
Wigington, C., Tensmeyer, C., Davis, B., Barrett, W., Price, B. & Cohen, S. 2018. Start, follow, read: End-to-end full-page handwriting recognition. Proceedings of the European Conference on Computer Vision (ECCV), 367-383.
Yang, L., Wang, P., Li, H., Li, Z. & Zhang, Y. 2020. A holistic representation guided attention network for scene text recognition. Neurocomputing 414: 67-75.
On the 7th of October 2022, the OCR-PT-CT project (Universidad de Alcalá) started collaborating with the Museo Arqueológico Nacional (MAN) in Madrid to produce 3D digital models of ancient Egyptian materials. A big thank you to Esther Pons and Isabel Olbés, curators of the Egyptian collection, and Andrés Carretero, director of the MAN, for their interest in technology-based approaches to research on ancient Egypt and its dissemination.
Left to right: Carlos Gracia, Daniel Pizarro, Esther Pons, Isabel Olbés, Sira Palazuelos and Álvaro Hernandez.
With the support of
Hieroglyphs by JSesh