OCR-PT-CT
Semi-automatic transcription of ancient Egyptian hieroglyphic documents
This project has conceived, designed, and developed a digital toolset that performs optical character recognition (OCR) of ancient Egyptian hieroglyphs in the Pyramid Texts (PT) and the Coffin Texts (CT), producing a semi-automatic transcription of the original text into the standard encoding known as Manuel de Codage (MdC). Beyond the technical challenge, the OCR-PT-CT project might enable researchers to search for textual and sign parallels much more efficiently.
A proof of concept
From March to December 2022, the OCR-PT-CT project (PIUAH21/AH-036), funded by Universidad de Alcalá and run in synergy with the MORTEXVAR project (Comunidad de Madrid) and the GEINTRA and CIARQ research groups, ensured the quality of its input data by working from the text editions that are the current reference in Egyptological research. Access to these editions has been granted by the Oriental Institute (University of Chicago) and James P. Allen (Brown University, Providence).
Thanks to its interdisciplinary team (Egyptology and Engineering), the OCR-PT-CT project has implemented a sequence of tasks that keeps the data set flexible and broad enough to be potentially usable with other complex writing systems.
The OCR-PT-CT project has proposed an OCR system adapted to the chosen corpus so that manual encoding is kept to a minimum. The project has tested techniques for segmenting the hieroglyphic script in these texts, together with classification systems based on deep neural networks. This will allow researchers to check the chosen corpus interactively without manually encoding much of the text at the sentence level.
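As a rough illustration of the last step of such a workflow, the sketch below turns already segmented and classified signs into an MdC string. It is not the project's actual implementation: the names `Sign`, `to_mdc`, and `min_conf` are hypothetical, the bounding boxes and confidence values are invented, and the layout rule assumes a single horizontal line read left to right, whereas the source editions may be arranged in columns. It relies only on two basic MdC conventions: `-` separates consecutive groups and `:` stacks signs vertically.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sign:
    code: str    # Gardiner code predicted by a (hypothetical) classifier
    x: float     # left edge of the bounding box
    y: float     # top edge of the bounding box
    w: float     # width of the bounding box
    h: float     # height of the bounding box
    conf: float  # classifier confidence in [0, 1]


def to_mdc(signs: List[Sign], min_conf: float = 0.8) -> Tuple[str, List[str]]:
    """Return an MdC string plus the codes flagged for manual checking."""
    groups: List[List[Sign]] = []
    right_edge = float("-inf")
    # Scan left to right; a sign that starts before the current group's
    # right edge is treated as stacked within that group.
    for s in sorted(signs, key=lambda s: s.x):
        if groups and s.x < right_edge:
            groups[-1].append(s)
            right_edge = max(right_edge, s.x + s.w)
        else:
            groups.append([s])
            right_edge = s.x + s.w
    # Inside a group, order top to bottom and join with ":";
    # join consecutive groups with "-".
    mdc = "-".join(
        ":".join(s.code for s in sorted(g, key=lambda s: s.y)) for g in groups
    )
    # Semi-automatic step: low-confidence signs go to a human reviewer.
    review = [s.code for s in signs if s.conf < min_conf]
    return mdc, review


example = [
    Sign("G17", x=0, y=0, w=10, h=10, conf=0.97),
    Sign("D21", x=12, y=0, w=8, h=4, conf=0.91),
    Sign("N35", x=12, y=6, w=8, h=3, conf=0.60),
    Sign("A1", x=22, y=0, w=8, h=10, conf=0.88),
]
mdc, review = to_mdc(example)
print(mdc, review)  # G17-D21:N35-A1 ['N35']
```

Returning the low-confidence signs separately, rather than marking them inside the MdC string, keeps the encoded output valid while still pointing the researcher to the signs that need a manual check.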
The OCR-PT-CT team
Former members
Beatriz Noria
Jónatan Ortiz
Sika Perdersen
The sources
Allen, J.P. 2006a. The Egyptian Coffin Texts, VIII: Middle Kingdom copies of Pyramid Texts (Oriental Institute Publications 132). Chicago: University of Chicago.
Allen, J.P. 2006b. A new concordance of the Pyramid Texts I-VI. Providence: Brown University.
De Buck, A. 1935-1961. The Egyptian Coffin Texts I-VII (Oriental Institute Publications 24, 49, 64, 67, 73, 81 & 87). Chicago: University of Chicago.
References
Barucci, A., Cucci, C., Franci, M., Loschiavo, M. & Argenti, F., A Deep Learning Approach to Ancient Egyptian Hieroglyphs Classification, IEEE Access 9 (2021), 1-10. (doi 10.1109/ACCESS.2021.3110082)
Barucci, A., Amendola, M., Argenti, F., Canfailla, Ch., Cucci, C., Guidi, T., Python, L., Franci, M. Discovering the ancient Egyptian hieroglyphs with Deep Learning. Rome: Consiglio Nazionale delle Ricerche (CNR), 2023.
Van den Berg, H. 1997. “Manuel de Codage”: A standard system for the computer-encoding of Egyptian transliteration and hieroglyphic texts.
Cruz Cavalieri D., Bastos-Filho T., Palazuelos-Cagigas S., Sarcinelli-Filho, M. 2015. On Combining Language Models to Improve a Text-based Human-machine Interface. International Journal of Advanced Robotic Systems 12/170: 1-14. (doi 10.5772/61753)
Cruz Cavalieri D., Palazuelos-Cagigas S., Bastos-Filho T., Sarcinelli-Filho, M. 2016. Combination of Language Models for Word Prediction: An Exponential Approach. IEEE/ACM Transactions on Audio, Speech, and Language Processing 99 (doi 10.1109/TASLP.2016.2547743).
Gardiner, A.H. 1957. Egyptian grammar. Being an introduction to the study of hieroglyphs. Oxford / London: Griffith Institute / Oxford University Press.
Gracia Zamacona, C. 2013. A database for the Coffin Texts. In S. Polis & J. Winand (eds.), Texts, languages and information technology in Egyptology (Aegyptiaca Leodiensia 9). Liège: Presses Universitaires de Liège, 139-155.
Gracia Zamacona, C. & J. Ortiz-García. 2021. Handbook of digital Egyptology: Texts (Monografías de Oriente Antiguo 1). Alcalá de Henares: Universidad de Alcalá.
Hu, R., Gayol, C. P., Odobez, J. M. & Gatica-Perez, D. 2017. Analyzing and visualizing ancient Maya hieroglyphics using shape: From computer vision to Digital Humanities. Digital Scholarship in the Humanities 32 (suppl. 2): 179-194.
Nederhof, M.J. & F. Rahman. 2017. A probabilistic model of ancient Egyptian writing. Journal of Language Modelling 5/1: 131-163.
Chung, J. & Delteil, T. 2019. A computationally efficient pipeline approach to full page offline handwritten text recognition. 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) 5. IEEE, 35-40.
Wigington, C., Tensmeyer, C., Davis, B., Barrett, W., Price, B. & Cohen, S. 2018. Start, follow, read: End-to-end full-page handwriting recognition. Proceedings of the European Conference on Computer Vision (ECCV), 367-383.
Yang, L., Wang, P., Li, H., Li, Z. & Zhang, Y. 2020. A holistic representation guided attention network for scene text recognition. Neurocomputing 414: 67-75.
News
Collaboration
On 7 October 2022, the OCR-PT-CT project (Universidad de Alcalá) started collaborating with the Museo Arqueológico Nacional (MAN) in Madrid to produce 3D digital models of ancient Egyptian materials. Many thanks to Esther Pons and Isabel Olbés, curators of the Egyptian collection, and Andrés Carretero, director of the MAN, for their interest in technology-based approaches to research on and dissemination of ancient Egypt.
https://www.mortexvar.com/ocr-pt-ct
Left to right: Carlos Gracia, Daniel Pizarro, Esther Pons, Isabel Olbés, Sira Palazuelos and Álvaro Hernandez.
ICAENT 2
Presentation of the OCR-PT-CT project at the 2nd edition of the International Conference Ancient Egypt New Technology held at the University of Naples "L'Orientale" (5-7 July 2023).
ICAENT 2 Programme
With the support of
Hieroglyphs by JSesh