Given that corpora are specially designed to meet the needs of the project at hand, there are as many different corpora as there are projects. Nevertheless, it is possible to identify some general characteristics that corpora may have. For instance, corpora can be monolingual, bilingual, or multilingual. A monolingual corpus is one that contains texts in a single language. Bilingual and multilingual corpora contain texts in two languages and in more than two languages, respectively. Most commonly, such corpora will contain texts in language A alongside their translations into language B, language C, and so on. A bilingual corpus that contains source texts and their translations is sometimes referred to as a bitext, but the more common term is parallel corpus, which can be used to describe both bilingual and multilingual collections. Unfortunately, there can be some confusion surrounding the word "parallel." As described above, the printed parallel texts conventionally used by translators consist of texts that have the same communicative function as the source text, but that were originally written in the target language; in other words, they are not translations of the source text but are texts of the same text type, on the same topic, and so on. Parallel corpora, on the other hand, consist of source texts aligned with their translations. The notion of alignment is an important one if the parallel corpus is to be optimally useful.
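To make the notion of alignment more concrete, here is a minimal Python sketch of a sentence-aligned parallel corpus and a simple bilingual lookup over it. The names `AlignedPair`, `align_one_to_one`, and `bilingual_concordance` are hypothetical, and the English and French sentences are toy data invented for illustration. The sketch assumes that source and target texts have already been split into the same number of segments in the same order; real translations often merge or split sentences, so alignment in practice is considerably harder than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    source: str  # segment originally written in language A
    target: str  # its translation into language B


def align_one_to_one(source_segments, target_segments):
    """Pair segments by position.

    Assumption: the translation kept the same number and order of
    segments as the source, which is often not true of real texts.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("segment counts differ; one-to-one alignment is not possible")
    return [AlignedPair(s, t) for s, t in zip(source_segments, target_segments)]


def bilingual_concordance(pairs, term):
    """Return every aligned pair whose source segment contains the term."""
    needle = term.lower()
    return [pair for pair in pairs if needle in pair.source.lower()]


# Toy data, invented for illustration only.
english = [
    "The corpus is aligned.",
    "Each segment has a translation.",
    "Alignment makes retrieval easy.",
]
french = [
    "Le corpus est aligné.",
    "Chaque segment a une traduction.",
    "L'alignement facilite la recherche.",
]

pairs = align_one_to_one(english, french)
hits = bilingual_concordance(pairs, "segment")
for hit in hits:
    print(hit.source, "|", hit.target)
```

Because each source segment is stored together with its translation, a search in one language retrieves the corresponding passage in the other. This is the property that makes an aligned parallel corpus useful to a translator.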
Other types of corpora include monolingual comparable corpora and bilingual comparable corpora. Monolingual comparable corpora consist of two parts: a collection of texts that have been originally written in language A, and a collection of texts that have been translated into language A from other languages. This type of corpus is useful for researchers interested in studying the nature of translated text; however, it is less useful as a resource for practising translators. Bilingual comparable corpora are akin to the printed parallel texts used by translators: both parts of this corpus contain texts that are of the same text type and on the same subject, but one collection contains texts originally written in language A while the other collection contains texts originally written in language B. Because the two collections do not have a source text-target text relationship, they cannot be aligned. Therefore, although a bilingual comparable corpus contains a potential wealth of useful information for translators, it is very difficult to identify and retrieve the relevant sections of the text in a semi-automated way. A great deal of active research is being carried out with regard to the development and exploitation of monolingual and bilingual comparable corpora, so it may not be long before tools for helping translators to exploit these resources become available.