Text Mining the Coffin Texts

Text Mining the Coffin Texts (TM-CT) is a project funded by the Spanish Ministry for Science, Innovation and Universities and hosted at the University of Alcalá (2024-2026).


The main objective of the TM-CT project is to link the MORTEXVAR database with the images from the reference publication of the hieroglyphic version of this corpus (CT I-VIII). TM-CT will provide open access to the full database of the Coffin Texts corpus, with all text variants expected to be available in transliteration (alphabetic chain) and in translation into modern language(s). More specifically, TM-CT will seek to:

1. Trace textual variability across the whole corpus by linking the material philology information already in place (transliteration, translation, in-document location, geographical and chronological distribution of the texts) with the original publication of the hieroglyphic texts (images).
2. Refine the study of variation across the different witnesses by using the database to identify potential textual, dialectal, diachronic and grammatical indicators of change.
3. Generate inventories of spellings/signs using the OCR toolkit designed by the subproject OCR-PT-CT.
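
As a rough illustration of the first and third objectives, the sketch below shows one way a text variant could be tied to a region of a published page image, and how a simple inventory of attested forms could then be counted. It is not the MORTEXVAR schema or the OCR-PT-CT toolkit: the class names, field names, and sample values (`W1`, `form-a`, and so on) are hypothetical placeholders chosen only for this example.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRegion:
    """Hypothetical pointer to a region of a page image in the printed edition."""
    volume: str                       # placeholder label, e.g. "CT I"
    page: int
    bbox: tuple[int, int, int, int]   # x, y, width, height in pixels


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one witness of one passage."""
    spell: str
    witness: str           # placeholder siglum, not a real coffin
    transliteration: str
    translation: str
    region: ImageRegion


def spelling_inventory(variants: list[Variant]) -> Counter:
    """Count how many variant records attest each transliterated form."""
    return Counter(v.transliteration for v in variants)


# Invented placeholder data, for illustration only.
variants = [
    Variant("Spell X", "W1", "form-a", "gloss", ImageRegion("CT I", 1, (10, 20, 50, 200))),
    Variant("Spell X", "W2", "form-a", "gloss", ImageRegion("CT I", 1, (70, 20, 50, 200))),
    Variant("Spell X", "W3", "form-b", "gloss", ImageRegion("CT I", 1, (130, 20, 50, 200))),
]

inventory = spelling_inventory(variants)
```

In a sketch like this, each record keeps the philological data and the image location together, so a query over transliterations can always be traced back to the corresponding place in the hieroglyphic publication.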


Earlier Ancient Egyptian mortuary texts constitute a privileged showcase for the profound ideological and material changes that occurred during the Middle Kingdom, especially between the second half of the Eleventh Dynasty and the first half of the Twelfth Dynasty. A corpus-driven approach appears to be the most efficient and reliable method for a comprehensive assessment of the complex situation of the period. In such an approach, the corpus is not merely the object of study but also a control group; most importantly, this implies that the evidence of the corpus takes precedence over any external model. Corpus size and access to the corpus are fundamental to corpus-driven approaches. Earlier Ancient Egyptian mortuary texts are an ideal candidate because they include two large corpora: the authoritative editions of the Pyramid Texts (PT) and Coffin Texts (CT), which extend over six and seven volumes, respectively, plus one volume for the copies of PT on Middle Kingdom coffins. TM-CT will focus on the CT, a substantial part of which is now available in the beta version of the MORTEXVAR database. This dataset will be enriched with the image dataset from the OCR-PT-CT and TTAE projects, using a pre-trained artificial neural network based on the YOLO v3 strategy and Natural Language Processing techniques.

The resulting database is expected to allow, for the first time, the analysis of variability across the complete corpus and on the original text, in parallel with the transliteration and translation, together with the material philology metadata, much of which is already in place.


One Egyptologist, one engineer, and one database manager will be in charge of the annotation, computer vision, and resulting database, with the PI coordinating and the collaborators providing feedback.

PI: Carlos Gracia Zamacona

Egyptologist: TBA soon

Engineer: TBA soon

Database manager: TBA soon


Gersande Eschenbrenner Diemer, Universidad de Alcalá: Egyptologist (wood analysis).

David Fuentes Jiménez, Universidad de Alcalá: Engineer (computer vision).

Álvaro Hernández Alonso, Universidad de Alcalá: Engineer (electronic design).

Anne Landborg: Egyptologist (Coffin Texts).

Leah Mascia, Universität Hamburg: Egyptologist (text materiality).

Antonio J. Morales, Universidad de Alcalá: Egyptologist (Pyramid Texts).

Sira Palazuelos Cagigas, Universidad de Alcalá: Engineer (natural language processing).

Daniel Pizarro Pérez, Universidad de Alcalá: Engineer (computer vision).

