Description of an Interdisciplinary European Project

The European Bible translations of the Reformation era have had a remarkable influence on many European languages. Scholars of historical linguistics and Reformation history largely agree on this point, whereas the reasons for this influence and its extent have been the subject of detailed discussion. Since the 19th century, Bible texts from the early modern period have been reprinted, sometimes with a critical apparatus, and have become the subject of detailed theological and linguistic monographs.

See the collection of essays in: Luthers Deutsch. Sprachliche Leistung und Wirkung, edited by Herbert Wolf, Dokumentation germanistischer Forschung 2 (Frankfurt am Main 1996). The historiography of the vernacular Bible translations already began during the 18th century in several European countries, inter alia: Johann Adolf Schinmeier, Versuch einer vollständigen Geschichte der schwedischen Bibelübersetzungen und Ausgaben mit Anzeige und Beurtheilung ihres Werthes : nebst e. Anh. von einigen seltenen Handschriften und den Lebensumständen der dabei interestirten merkwürdigen Personen (Flensburg/Leipzig 1777–1782).

A general problem in this area of research lies in the size of the Bible, which often exceeds half a million words. Hence most publications are based on a smaller, carefully selected part of the texts, simply because collecting and evaluating the relevant phenomena is so time-consuming. Nevertheless, these efforts have produced some reliable results, and some key methodologies (e.g. in the area of comparative linguistics) could be effectively applied and refined. But because of the limited quantity of text examined, the desire to broaden the textual basis and to strengthen the evidence remains.

The following project description introduces the idea of a new research instrument: a carefully designed and annotated Bible corpus with precise and flexible search functions that make it possible to look for grammatical and lexical features without time-consuming preparation.

1. Short "History" of the Project

The project idea arose during my research on editions of early printed texts. While I was working on the first Lithuanian Bible (translated by Johannes Bretke, 1579-1590), my task was to create a new type of research software that allows the translation to be compared with its proven, probable, or possible sources (in this case, for example, Luther's translation, the Greek New Testament, and the Latin Vulgate). The program serves as a parallel concordance for investigating the lexical and grammatical structures of the Lithuanian text in comparison with the German, Greek, and Latin sources.
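The idea of a parallel concordance can be illustrated with a minimal sketch. It is not the research software described above; it only assumes that every version is available as verse-aligned text keyed by a common reference. The names VERSIONS, build_index, and parallel_concordance are hypothetical, and the verse wording is placeholder text rather than a transcription of any edition.

```python
from collections import defaultdict

# Hypothetical verse-aligned data: version name -> {reference: text}.
# The wording is placeholder text, not taken from any historical edition.
VERSIONS = {
    "translation": {"John 1:1": "alpha beta gamma", "John 1:2": "beta delta"},
    "source_a": {"John 1:1": "one two three", "John 1:2": "two four"},
    "source_b": {"John 1:1": "uno duo tres"},  # this version lacks John 1:2
}


def build_index(verses):
    """Map each lower-cased word form to the references in which it occurs."""
    index = defaultdict(set)
    for ref, text in verses.items():
        for word in text.lower().split():
            index[word].add(ref)
    # Plain string sorting is enough for this toy data; a real tool would
    # sort by canonical book, chapter, and verse order.
    return {word: sorted(refs) for word, refs in index.items()}


def parallel_concordance(versions, base, word):
    """Return one row per verse of `base` containing `word`.

    Each row holds the reference and the parallel text of every version;
    a version that lacks the verse contributes an empty string.
    """
    index = build_index(versions[base])
    rows = []
    for ref in index.get(word.lower(), []):
        row = {"ref": ref}
        for name, verses in versions.items():
            row[name] = verses.get(ref, "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parallel_concordance(VERSIONS, "translation", "beta"):
        print(row)
```

A search in one version thus returns the parallel passages of all others, which is the basic operation behind comparing a translation with its possible sources.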

As I looked through the 16th-century Bible editions from the area around the Baltic Sea (Latvia, Finland, Sweden, Denmark, Germany), their deep influence on the language history of all these countries became obvious. Some examples:

•  The New Testament was the second book printed in the Finnish language; the first was a grammar book. Both were written by Mikael Agricola. In this case the Bible translation founded the literary tradition of its language.

•  In Denmark and Sweden, the Bible translations and their use in the church drew a language boundary in an – until then – dialectal continuum.

•  The translation by Martin Luther (and even its Low German adaptation) was a significant impetus for the decline of Low German as an official and literary language in the northern part of Germany.

Furthermore, the Bible translations played an important role in theological dispute as well as in daily life. They were part of theological discussion, being the basis and the result of theological controversies, and they became an integral part of the lives of a wide majority of people, being a source of comfort and admonition.

The first role became evident, for example, in the Catholic New Testament edited by Hieronymus Emser, based on Luther's translation and changed only where doctrine required corrections, and in the great variety of Dutch Bibles translated from a Lutheran, Catholic, Calvinist, or Mennonite point of view.

The second role of the vernacular translations is manifest in their influence on language development. This influence is most obvious, though not exclusively so, in a great richness of biblical proverbs and common sayings and in a continuous pressure towards standardized language, with far-reaching effects on vocabulary, orthography, and grammar.

Looking at the relations between the German, Danish, Swedish, and Finnish translations, one discovers in most cases a strong influence of Luther's text or its Low German adaptation. The English, the Romance, and the Slavonic translations are mainly based on other primary sources (e.g. the editions by Desiderius Erasmus or Santes Pagninus), but Luther's text was often a reference text "on the desktop" of the translators.

These phenomena led to the idea of creating a Bible corpus containing the main editions in each European language involved, as an electronic basis for synchronic and diachronic research in different linguistic areas as well as on various historical aspects of language and theology. Among the promising perspectives, this corpus might open up

•  a new approach to investigate translation methods (theoretically) and strategies (practically),

•  a new basis for systematic examination of standardization processes in the areas of orthography and grammar, or

•  new possibilities to trace the "coining" process of theological and ecclesiastical terms and expressions.

2. Main "Demands" on the Project

The perspectives mentioned above can be realized only on the basis of carefully selected texts and consistently prepared text data. Regarding the overall structure of the Bible corpus, each language should be represented in the corpus by at least two texts encoded in a diplomatic form, enhanced by linguistic data and completed by graphical data.

2.1 Criteria for Selection of Versions

The selection process must take into account different aspects. The main criteria have to be based on

•  quantitative reasons: the texts should be printed and as widespread as possible;

•  historical reasons: the texts must originate from the Reformation era and should have been essential to ecclesiastical life and language development;

•  material reasons: the texts must form a consistent basis for comparison, i.e. they should represent the relevant linguistic stages and the different theological convictions.

These criteria are interdependent, because only printed texts could spread over a wider area, and this circulation is necessary to produce a substantial influence on language use. In the Reformation era the Bible translations were the main texts exerting such an influence. This statement is valid for most European countries, although the extent of the influence varies from country to country. Additionally, nearly all vernacular translations have a more or less close relation to Luther's translation (and to one of Erasmus', Stephanus', or Münster's Latin, Greek, and Hebrew editions). The selected texts should be representative in two respects: they have to reflect the various theological backgrounds (for example the Lutheran, Reformed, and Catholic translations in the Netherlands as well as in Germany and Switzerland) and the linguistic stages of standardization.

List of Corpus Candidates

Regarding the source texts and the Germanic branch of the Indo-European languages, the criteria mentioned lead to the following selection, which will be the subject of a more detailed discussion:

(a) The Hebrew, Greek, and Latin sources should be part of the corpus: at least one edition of Erasmus' New Testament (Greek and Latin, preferably 1516 or 1519), one of Münster's Old Testament (Hebrew and Latin, 1534 or 1546), and Vulgate versions (one of Stephanus' editions 1527ff. and the Sixto-Clementine edition of 1592).

(b) The main German editions of Luther's translation printed in Wittenberg have to be included, at least the 1522 "September-Testament", the first complete Bible of 1534, the thoroughly revised version of 1541 ("auffs new zugericht"), and the last edition authorized by Luther, printed in 1545.

(c) The other German versions, all of them closely related to Luther's text, should be present in the corpus: the Low German rendering by Bugenhagen and others (New Testament, Hamburg 1523 or Wittenberg 1523, and Bible, Lübeck 1534 or Magdeburg 1536), the Catholic versions (Emser's New Testament, Dresden 1527, and Dietenberger's Bible, Mainz/Cologne 1534), and the Zurich editions (New Testament 1524, Bible 1530 or 1531).

(d) The English Bible translations are embedded in an eventful church history, so that at least five editions are to be included: Tyndale's New Testament (1526), Coverdale's Bible (1535), the Geneva Bible (1560), the Catholic Rheims-Douay Bible (1609/1610), and the King James Bible (1611) as the resulting standard version.

(e) The Dutch versions originate from various theological backgrounds, so that making a limited selection is rather complicated. At first glance the following translations might fulfil the criteria: the Antwerp versions by van Berghen (1524), Vorsterman (1528), and van Liesvelt (1532), the Emden versions by Mierdman/Gheylliaerd (1556) and Biestkens (1560), and the Statenbijbel (1637) that became the standard text of the Netherlands.

(f) At least two versions each should be included for the Danish (New Testament 1524, Bible 1550), the Swedish (New Testament 1526, Bible 1541), and the Icelandic (New Testament 1540, Bible 1584) translations, all of them depending on different editions of Luther's or Bugenhagen's text. The 17th-century Bible editions in these countries gained the status of standard editions (like the Statenbijbel and the King James Bible) and would be valuable supplements: the Danish Bible of 1647, the Swedish Bible of 1646, and the Icelandic Bible of 1644.

This list of relevant Bible versions should be discussed during the project preparation and has to be completed regarding the Romance, the Slavonic, and the Finno-Ugrian languages.

Project Restrictions

The number of relevant texts makes it necessary to split the work into parts that can each be handled within a manageable period.

The first phase of the project should deal with the New Testament, for both a quantitative and a qualitative reason: on the one hand, this portion amounts to approximately one fifth of the whole text; on the other hand, this part of the Bible was the primary text of theological discussion during the Reformation era. Regarding the number of languages, a restriction to the Germanic languages (besides Greek and Latin) will help to get started soon, but additional partners working on Romance, Slavonic, or Finno-Ugrian Bible texts could also be integrated. This phase could be finished within two years. The second phase would cover the Old Testament and could extend the focus to the remaining European languages.

As a result, the Bible corpus will provide an excellent and unique source for linguistic and theological studies, promising new insights into historical linguistics and theological aspects of Reformation history. About 500 years after the beginning of modern times, this project will help to explain some of the roots of modern Europe, focussing on but not restricted to language and belief.

2.2 Encoding and Annotation Standards

In order to get a consistent corpus of comparable texts, all incorporated text data must follow the same standards of character encoding and of editorial and linguistic annotation. The texts will be presented as graphical (page scans) and textual data (full text, diplomatically encoded and linguistically annotated).

The Extensible Markup Language (XML) provides all the features needed for encoding and annotating the corpus texts. The entities and tags defined on the basis of this standard are very flexible and allow the definition of a base set that can be extended if new characters or complex editorial and linguistic data occur. Within the project, the base set must be obligatory, and the extensions have to follow certain rules, such as tag hierarchies and character encoding schemes.

The demand for diplomatic text encoding requires a tag set that preserves both page structure (page, column, line, initial; headline, marginal note, pagination) and content structure (introduction, appendix; book, chapter, paragraph, verse; note). Linguistic metadata could be added on different grammatical levels, especially concerning word features and sentence segmentation.
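How page structure and content structure might coexist in one encoding can be sketched as follows. The tag set of the project is still to be agreed upon, so every element and attribute name here (book, chapter, verse, pb, lb, note) is a hypothetical choice made only for illustration, and the verse wording is illustrative rather than a diplomatic transcription. The sketch assumes that the content structure forms the element hierarchy, while page and line breaks are recorded as empty marker elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: content structure as nested elements, page
# structure as empty markers (<pb/> = page break, <lb/> = line break).
# The wording is illustrative, not a transcription of any edition.
SAMPLE = """\
<book id="John">
  <chapter n="1">
    <verse n="1">
      <pb n="1"/><lb n="1"/>Im Anfang war
      <lb n="2"/>das Wort
      <note type="marginal">placeholder note</note>
    </verse>
  </chapter>
</book>
"""


def verse_text(verse):
    """Return the running text of a verse, leaving out notes and layout."""
    parts = [verse.text or ""]
    for child in verse:
        if child.tag != "note":
            parts.append("".join(child.itertext()))
        # Text following a child element belongs to the verse itself.
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def printed_lines(verse):
    """Return (line number, text) pairs as marked by the <lb/> elements."""
    return [
        (child.get("n"), (child.tail or "").strip())
        for child in verse
        if child.tag == "lb"
    ]


root = ET.fromstring(SAMPLE)
verse = root.find("./chapter[@n='1']/verse[@n='1']")

if __name__ == "__main__":
    print(verse_text(verse))
    print(printed_lines(verse))
```

With such an encoding the same file can be read by content (book, chapter, verse) for comparison across versions and by layout (page, line) for checking against the page scans.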

The lexeme assigned to each word should make it possible to group different (ortho)graphical variants together. A reduced part-of-speech set will leave aside questions of differentiation (e.g. between different types of pronouns) that are difficult and, in the process of annotating, very time-consuming. These questions might be among the first research subjects based on the corpus, for example in order to gain insights into the pronominal usage of different lexemes from a synchronic and a diachronic perspective.
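A minimal sketch may show how a lexeme annotation could tie spelling variants together. The record layout, the reduced tag list, and the names Token, POS_TAGS, variants_by_lexeme, and find_lexeme are assumptions made for this illustration, not the annotation scheme of the project; the tokens and their references are invented sample annotations.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical reduced part-of-speech set; the actual set is still to be
# defined by the project partners.
POS_TAGS = {"NOUN", "VERB", "ADJ", "PRON", "CONJ", "PREP", "ADV", "OTHER"}


@dataclass(frozen=True)
class Token:
    form: str    # spelling as printed (diplomatic form)
    lexeme: str  # normalized headword assigned by the annotator
    pos: str     # tag from the reduced part-of-speech set
    ref: str     # location of the token in the text

    def __post_init__(self):
        if self.pos not in POS_TAGS:
            raise ValueError(f"unknown part-of-speech tag: {self.pos}")


# Invented sample annotations for illustration only.
TOKENS = [
    Token("vnd", "und", "CONJ", "verse-1"),
    Token("vnnd", "und", "CONJ", "verse-2"),
    Token("und", "und", "CONJ", "verse-3"),
    Token("wort", "wort", "NOUN", "verse-1"),
]


def variants_by_lexeme(tokens):
    """Group the printed spellings under their lexeme."""
    groups = defaultdict(set)
    for token in tokens:
        groups[token.lexeme].add(token.form)
    return {lexeme: sorted(forms) for lexeme, forms in groups.items()}


def find_lexeme(tokens, lexeme):
    """Return the references of all tokens of a lexeme, whatever the spelling."""
    return [token.ref for token in tokens if token.lexeme == lexeme]


if __name__ == "__main__":
    print(variants_by_lexeme(TOKENS))
    print(find_lexeme(TOKENS, "und"))
```

A search for the lexeme then finds every occurrence regardless of spelling, while the printed forms remain available for studies of orthographic standardization.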

Sentence segmentation should be based on orthographical divisions and on linguistic definitions of main and subordinate clauses. The syntactic data can be extended to cover part-of-sentence functions, which might be very useful when comparing versions in different languages.

The process of annotating will be supported by a specialized program that helps to reduce typing and structural errors and aids in keeping the texts consistent. The page scans could be presented to the annotator as a part of the user interface. This will make the encoding and annotating process easier and will help to save time.

2.3 Tasks to be Handled

The size of the project requires the cooperation of several institutes providing the linguistic and theological expertise needed for each language. Furthermore, careful coordination is necessary in order to preserve the consistency of all corpus texts and to solve, on a common basis, any problems that arise in encoding and annotation.

The first task for all partners will be to formulate their special interests, which may extend the core aims of the project, and to define additions regarding encoding and annotation that these interests might require. The second one should be to present the special problems of each corpus language in early modern times in general and the questions concerning the Bible translations in particular during an initial conference that will take place in spring 2006. The third one will be to prepare and to submit coordinated funding applications for each project part.

The central coordination has to develop and to provide the means for project-internal communication and project-external presentation, the software for the encoding and annotation process as well as for examination, and the research software, continuously improved by specialized search functions.

This structure of the project combines the possibility of special accents in each part of the project with the essential requirements of a tight cooperation. Thus the common efforts will result in a promising new research instrument increasing the probability of new findings.

The project preparation started in 2004; by December 2005 at least one cooperation partner should have been found for each language represented in the corpus. The next step will be to discuss the selection of relevant texts and the depth of tagging. As a result, there will be a definitive list of editions for each language and a common tag set for all corpus texts.

Concerning the funding of the first part of the project (New Testament, 2 years), it will help to coordinate the national applications, which should be prepared at the beginning of 2006.

In autumn 2006 the national projects should start their work, if and when the funding applications are granted. In 2008 a concluding conference should present the results of the project and prepare its second phase, if this is considered a useful extension of the then existing first version of the Bible corpus.

3. Some "Results" of the Project

The project presented here is based on the idea that a corpus of biblical texts covering the main translations of the 16th century – in some cases up to the standard versions of the 17th century (e.g. Statenbijbel, King James Version) – would help to investigate the process of "coining" theological terms and concepts on the one hand, and the process of "creating" standardized languages regarding orthography, grammar, and lexis on the other hand. Additionally, the study of translation theory could profit, especially with regard to the conflict between literal and literary translation, that is, between faithfulness to the source (Luther, Erasmus) and comprehensibility in the target language (e.g. in Denmark, Sweden, and Switzerland).

Within the first phase of the project only the "main" translations can be digitized and annotated. But the project framework, consisting of several programs (encoding and annotation tools as well as research tools), will be based on open standards (primarily XML), so that other projects working on other editions and revisions can build on the existing texts and/or use the framework. Thus every student or scholar can establish his or her own text database (surrounded by the existing texts) in order to write a thesis or an essay. The resulting text data could be incorporated into the corpus if both the quality and the amount of data are sufficient.

The main idea (and the attainable result) of the Bible corpus project is to provide the (re)search possibilities that many scholars have long wished for. The corpus bibliorum aetatis reformationis will provide

•  comparable and diplomatically encoded electronic editions,

•  representing a text of an intense and widespread influence on language and church history in (diachronically) various revisions and (synchronically) different versions and languages,

•  presented in a flexible research environment.

The expected findings will stimulate theological and linguistic discussion, and many earlier results could be re-examined with more precision and on a broader basis.

