A multilingual corpus of New Testament translations

651 instances of 334 editions across five languages, with verified canonical metadata.

1 Corinthians 13:12

Βλέπομεν γὰρ ἄρτι δι᾽ ἐσόπτρου ἐν αἰνίγματι

Interlinear: we-see · for · now · through · mirror · in · riddle
Young's Literal Translation: "for we see now through a mirror obscurely"
American Standard Version: "For now we see in a mirror, darkly"
King James Version: "For now we see through a glass, darkly"
Darby: "For we see now through a dim window obscurely"
World English Bible: "For now we see in a mirror, dimly"

Figure: each dot is one edition (334 editions, 5 languages, 1500–2025).

223 works, 334 editions, 651 instances.

Prior corpora typically pick one canonical text per translation family. We keep every edition we can verify — the 1611 King James and the 1769 Cambridge revision are different texts, and a corpus that collapses them loses the diachronic signal that makes within-family comparison interesting.

26% of works carry more than one edition, but those works account for 51% of the editions, so roughly half the corpus sits in works that one-canonical-per-work deduplication would collapse to a single text.
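The shares above can be checked against the headline counts (223 works, 334 editions). The sketch below is a back-of-envelope calculation, assuming that one-canonical-per-work deduplication keeps exactly one edition per work. The variable names are illustrative, and the rounded counts are derived from the published percentages rather than read from the release.

```python
# Counts taken from the corpus summary on this page.
works = 223
editions = 334

# Assumption: deduplication keeps exactly one edition per work.
kept = works
dropped = editions - kept
dropped_share = dropped / editions

# Shares quoted in the text, converted to approximate counts.
multi_edition_works = round(0.26 * works)
editions_in_multi = round(0.51 * editions)

print(f"dropped {dropped} of {editions} editions ({dropped_share:.0%})")
print(f"~{multi_edition_works} multi-edition works hold ~{editions_in_multi} editions")
```

Under that assumption, 111 of the 334 editions (about a third) would be removed outright, and the roughly 170 editions that belong to multi-edition works are the ones the collapse touches.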

41% of editions (137 of 334) are publicly downloadable.

Public-domain and openly-licensed editions ship as plain text via GitHub and Hugging Face. Copyrighted editions are available on request for non-commercial research.

For every edition — including the copyrighted ones — we publish pre-computed verse- and chapter-level embeddings and full pairwise similarity matrices (lexical and semantic), so downstream work on the corpus's structure doesn't require the source text.
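To illustrate what a pairwise similarity matrix over editions contains, here is a minimal sketch built on the five English renderings of 1 Corinthians 13:12 quoted on this page. The measures are assumptions chosen for illustration: Jaccard overlap of word sets stands in for a lexical score, and cosine similarity for a semantic score over embedding vectors. The released matrices may be computed differently, and all function names here are hypothetical.

```python
import math
import re
from itertools import combinations

# English renderings of 1 Corinthians 13:12 quoted on this page.
renderings = {
    "YLT": "for we see now through a mirror obscurely",
    "ASV": "For now we see in a mirror, darkly",
    "KJV": "For now we see through a glass, darkly",
    "Darby": "For we see now through a dim window obscurely",
    "WEB": "For now we see in a mirror, dimly",
}

def tokens(text):
    """Lowercased word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Lexical similarity (assumed measure): shared words over all words."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def cosine(u, v):
    """Semantic similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def pairwise(items, sim):
    """Similarity for every unordered pair of keys in `items`."""
    return {(a, b): sim(items[a], items[b]) for a, b in combinations(items, 2)}

# Lexical matrix over the five renderings, most similar pairs first.
lexical = pairwise(renderings, jaccard)
for (a, b), score in sorted(lexical.items(), key=lambda kv: -kv[1]):
    print(f"{a:>5} vs {b:<5} {score:.2f}")
```

The same `pairwise` helper would accept `cosine` together with a dict that maps edition identifiers to verse embedding vectors, which is one way to read the semantic matrices without access to the source text.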

Read in the browser

Pick an edition below to start reading — public-domain and openly-licensed texts open in your browser, no download required.

Browse all translations →

Get the data

Browse the corpus →

Text corpus
137 editions in the public release; copyrighted editions available on request.
GitHub → Hugging Face →
Embeddings
Verse- and chapter-level, computed with Qwen3-Embedding-0.6B and Qwen3-Embedding-8B.
Hugging Face →
Pairwise similarities
Lexical and semantic, for all within-language edition pairs.
Hugging Face →

Get in touch

We welcome questions, corrections, and reports of missing translations. For data or parsing issues, please open an issue on GitHub. For copyright concerns, collaboration, or anything else, write to mrapacz@agh.edu.pl.

Cite

Rapacz, M., & Smywiński-Pohl, A. (2026). Targum — a Multilingual New Testament Translation Corpus. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026) (pp. 7092–7105). European Language Resources Association (ELRA). https://doi.org/10.63317/2yiotxcyovir