A multilingual corpus of New Testament translations

651 translations across English, French, Italian, Polish, and Spanish — with verified edition metadata, pre-computed embeddings, and pairwise similarity scores.

[Figure: Diachronic distribution of translations by language (English, French, Italian, Polish, Spanish) and year, 1500 to 2025. Each dot is one translation edition.]

Targum prioritizes depth over linguistic breadth, offering 2.4–5× more translations per language than any prior resource. Each translation is mapped to a canonical edition identifier with documented provenance, which enables both micro-level analysis of translation families and macro-level comparison across confessional traditions.
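As an illustration of how pairwise similarity scores can be derived from per-translation embeddings, here is a minimal sketch. It assumes cosine similarity, which this page does not confirm as the measure used in Targum. The edition identifiers and toy vectors are hypothetical and do not come from the corpus.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity(embeddings):
    """Map each unordered pair of edition ids to a similarity score.

    `embeddings` maps an edition id to its vector. Ids are sorted so that
    every pair key is in a stable (smaller id, larger id) order.
    """
    return {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


# Hypothetical edition ids and toy 3-dimensional vectors (assumptions,
# not real corpus data).
embeddings = {
    "eng-edition-a": [1.0, 0.0, 1.0],
    "eng-edition-b": [1.0, 0.0, 0.9],
    "pol-edition-a": [0.0, 1.0, 0.0],
}

scores = pairwise_similarity(embeddings)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

In this toy setup, the two English editions score close to 1, while each scores 0 against the Polish edition. Scores like these are the kind of signal that could group editions into translation families.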

Rapacz, M., & Smywiński-Pohl, A. (2026). Targum — a Multilingual New Testament Translation Corpus. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026) (pp. 7092–7105). European Language Resources Association (ELRA). https://doi.org/10.63317/2yiotxcyovir

We welcome questions, corrections, and reports of missing translations. For data or parsing issues, please open an issue on GitHub. For copyright concerns, collaboration, or anything else, write to mrapacz@agh.edu.pl.