Semi-automatic transcription of ancient Egyptian hieroglyphic documents

This project will design and develop a digital toolset for optical character recognition (OCR) of ancient Egyptian hieroglyphs in the Pyramid Texts (PT) and the Coffin Texts (CT). The goal is a semi-automatic transcription of the original text into the standard encoding known as the Manuel de Codage (MdC). Beyond the technical challenge, the OCR-PT-CT project may enable researchers to search for textual and sign parallels much more efficiently.


A proof of concept

The OCR-PT-CT project (PIUAH21/AH-036) runs from March to December 2022, funded by Universidad de Alcalá and in synergy with the MORTEXVAR project (Comunidad de Madrid) and the GEINTRA and CIARQ research groups. To ensure the quality of the input data, it will use the text editions that are the current reference in Egyptological research. Access to these editions has been granted by the Oriental Institute (University of Chicago) and James P. Allen (Brown University, Providence).

Thanks to its interdisciplinary team (Egyptology and Engineering), the OCR-PT-CT project will implement a task sequence that keeps the data set flexible and broad enough to remain usable with other complex writing systems.

The OCR-PT-CT project will propose an OCR system adapted to the chosen corpus that keeps manual encoding to a minimum. The project will test techniques for segmenting the hieroglyphic script in these texts, together with classification systems based on deep neural networks. This will allow researchers to check the chosen corpus interactively without manually encoding much of the text at the sentence level.
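As an illustration only, the last step of such a pipeline (turning classified signs into MdC) could be sketched as follows. This is not the project's implementation: the `Detection` record, the `to_mdc` function, the 0.8 confidence threshold and the half-height overlap rule are all hypothetical, and the sketch assumes a single text column read top to bottom. It relies only on two MdC conventions: `-` separates successive groups and `*` joins signs placed side by side.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One classified sign (hypothetical output of segmentation + classification)."""
    gardiner: str  # Gardiner sign code, e.g. "G17"
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    w: float       # box width
    h: float       # box height
    score: float   # classifier confidence in [0, 1]


def to_mdc(detections, min_score=0.8, right_to_left=False):
    """Encode the signs of one text column as an MdC string.

    A sign joins the current row when more than half of its height
    overlaps that row; otherwise it starts a new row. Signs in a row
    are joined with "*", rows are joined top to bottom with "-".
    Signs scoring below min_score are returned for manual checking.
    """
    ordered = sorted(detections, key=lambda d: d.y)
    rows = []
    for det in ordered:
        if rows and rows[-1]["bottom"] - det.y > 0.5 * det.h:
            rows[-1]["signs"].append(det)
            rows[-1]["bottom"] = max(rows[-1]["bottom"], det.y + det.h)
        else:
            rows.append({"signs": [det], "bottom": det.y + det.h})

    groups = []
    for row in rows:
        # Order signs within a row according to the reading direction.
        signs = sorted(row["signs"], key=lambda d: d.x, reverse=right_to_left)
        groups.append("*".join(d.gardiner for d in signs))

    review = [d for d in ordered if d.score < min_score]
    return "-".join(groups), review


# Invented example data: four signs in one column, the last one uncertain.
column = [
    Detection("D21", 10, 0, 30, 10, 0.97),
    Detection("N35", 10, 12, 30, 6, 0.95),
    Detection("G17", 5, 20, 18, 20, 0.91),
    Detection("A1", 25, 21, 15, 19, 0.62),
]
mdc, review = to_mdc(column)
print(mdc)                           # D21-N35-G17*A1
print([d.gardiner for d in review])  # ['A1']
```

Returning the low-confidence signs separately reflects the semi-automatic idea: the researcher only checks the flagged signs instead of encoding the whole passage by hand. A real layout analysis would also need to handle stacked groups (`:` in MdC), damaged signs and multiple columns.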


The OCR-PT-CT team


Daniel Pizarro Pérez



Sira Palazuelos Cagigas



Laura de Diego Otón

Engineering student


Jónatan Ortiz García

Egyptologist & Digital Humanities


Sika Pedersen

Doctoral student (Egyptology)


Álvaro Hernández Alonso



Rubén Nieto Capuchino



Carlos Gracia Zamacona

Egyptologist (PI)


Beatriz Noria Serrano

Doctoral student (Egyptology)


César Guerra Méndez

Egyptology student

The sources

Allen, J.P. 2006a. The Egyptian Coffin Texts, VIII: Middle Kingdom copies of Pyramid Texts (Oriental Institute Publications 132). Chicago: University of Chicago.

Allen, J.P. 2006b. A new concordance of the Pyramid Texts I-VI. Providence: Brown University.

De Buck, A. 1935-1961. The Egyptian Coffin Texts I-VII (Oriental Institute Publications 24, 49, 64, 67, 73, 81 & 87). Chicago: University of Chicago.


Barucci, A., Cucci, C., Franci, M., Loschiavo, M. & Argenti, F. 2021. A deep learning approach to ancient Egyptian hieroglyphs classification. IEEE Access 9: 1-10. (doi 10.1109/ACCESS.2021.3110082)

Van den Berg, H. 1997. “Manuel de Codage”: A standard system for the computer-encoding of Egyptian transliteration and hieroglyphic texts.

Cruz Cavalieri D., Bastos-Filho T., Palazuelos-Cagigas S., Sarcinelli-Filho, M. 2015. On Combining Language Models to Improve a Text-based Human-machine Interface. International Journal of Advanced Robotic Systems 12/170: 1-14. (doi 10.5772/61753)

Cruz Cavalieri D., Palazuelos-Cagigas S., Bastos-Filho T., Sarcinelli-Filho, M. 2016. Combination of Language Models for Word Prediction: An Exponential Approach. IEEE/ACM Transactions on Audio, Speech, and Language Processing 99. (doi 10.1109/TASLP.2016.2547743)

Gardiner, A.H. 1957. Egyptian grammar. Being an introduction to the study of hieroglyphs. Oxford / London: Griffith Institute / Oxford University Press.

Gracia Zamacona, C. 2013. A database for the Coffin Texts. In S. Polis & J. Winand (eds.), Texts, languages and information technology in Egyptology (Aegyptiaca Leodiensia 9). Liège: Presses Universitaires de Liège, 139-155.

Gracia Zamacona, C. & J. Ortiz-García. 2021. Handbook of digital Egyptology: Texts (Monografías de Oriente Antiguo 1). Alcalá de Henares: Universidad de Alcalá.

Hu, R., Gayol, C. P., Odobez, J. M. & Gatica-Perez, D. 2017. Analyzing and visualizing ancient Maya hieroglyphics using shape: From computer vision to Digital Humanities. Digital Scholarship in the Humanities 32 (suppl. 2): 179-194.

Nederhof, M.J. & F. Rahman. 2017. A probabilistic model of ancient Egyptian writing. Journal of Language Modelling 5/1: 131-163.

The Hieroglyphic initiative.

Chung, J. & Delteil, T. 2019. A computationally efficient pipeline approach to full page offline handwritten text recognition. In IEEE (ed.), 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) 5: 35-40.

Wigington, C., Tensmeyer, C., Davis, B., Barrett, W., Price, B. & Cohen, S. 2018. Start, follow, read: End-to-end full-page handwriting recognition. Proceedings of the European Conference on Computer Vision (ECCV), 367-383.

Yang, L., Wang, P., Li, H., Li, Z. & Zhang, Y. 2020. A holistic representation guided attention network for scene text recognition. Neurocomputing 414: 67-75.

With the support of

Universidad de Alcalá (UAH)

Hieroglyphs by JSesh