Corpus-analysis tools have been in use for some time by language professionals such as foreign-language teachers and lexicographers, and translators are becoming increasingly aware of the advantages offered by such tools. As always, however, the advantages must be carefully weighed against the drawbacks before a decision is made about whether using a tool will be truly beneficial in any given situation.
1. Frequency data
Frequency data are not easily obtainable from resources such as dictionaries or printed parallel texts, but word-frequency lists can be easily generated using corpus-analysis tools. Such lists are simple yet powerful. In a translation context, they can help translators to determine which words seem to be "important" in the corpus on the basis of frequency, and translators can compare the frequencies of different words. For example, a translator can use a frequency list to help determine whether a term appears to be commonly used by experts in the subject field or appears to be the idiosyncratic preference of a small group of users. Similarly, when faced with a choice of two or more synonyms, the translator can consult a frequency list to see which of the terms is more commonly used. Of course, frequency alone is not always sufficient for determining whether a given term is appropriate, but the data generated by the frequency list can be further investigated using other features of corpus-analysis tools such as concordancers and collocation generators.
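To make the idea concrete, here is a minimal Python sketch of how a word-frequency list might be generated and used to compare candidate synonyms. It is not the method of any particular corpus-analysis tool: the function names `frequency_list` and `compare_terms`, the tokenization rule, and the sample text are all assumptions made for illustration, and the comparison handles single-word terms only.

```python
import re
from collections import Counter

# Assumed tokenization rule: runs of letters/digits, optionally joined
# by internal hyphens or apostrophes. Real tools may tokenize differently.
TOKEN_PATTERN = re.compile(r"[^\W_]+(?:[-'][^\W_]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def frequency_list(text, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common(top_n)


def compare_terms(text, candidates):
    """Return the corpus frequency of each single-word candidate term."""
    counts = Counter(tokenize(text))
    return {term: counts[term.lower()] for term in candidates}


if __name__ == "__main__":
    # Hypothetical three-sentence "corpus", for illustration only.
    corpus = (
        "The scanner digitizes the page. A flatbed scanner is common. "
        "The digitizer is rarely mentioned."
    )
    print(frequency_list(corpus, top_n=3))
    print(compare_terms(corpus, ["scanner", "digitizer"]))
```

In this toy corpus "scanner" occurs twice and "digitizer" once, but, as noted above, a count of this kind is only a starting point and should be checked against the contexts in which each term appears.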
2. Context
One of the greatest advantages of corpus-analysis tools is that they allow translators to see terms in a variety of contexts simultaneously, which, in turn, allows them to detect various kinds of linguistic and conceptual patterns that are sometimes difficult to spot in isolated printed resources. It is important to emphasize once again, however, that although the corpus-analysis tools present information in a manner that makes it easier to analyze, they do not actually do any analysis; it is up to the translator to interpret the data.
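The simultaneous display of contexts described above is usually presented as a keyword-in-context (KWIC) concordance. The following Python sketch shows one simple way such a display could be produced. The function name `kwic`, the `width` parameter, and the sample text are assumptions made for illustration, not features of any specific tool.

```python
import re


def kwic(text, keyword, width=30):
    """Return one display line per occurrence of keyword, with up to
    `width` characters of context on either side."""
    # Collapse line breaks and repeated spaces so contexts read cleanly.
    flat = " ".join(text.split())
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample text, for illustration only.
    sample = (
        "The scanner digitizes the page. A flatbed scanner is common "
        "in most offices."
    )
    for line in kwic(sample, "scanner", width=20):
        print(line)
```

The sketch only arranges the occurrences so that they can be read side by side; identifying the linguistic or conceptual patterns in them remains the translator's task.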
3. Availability and copyright
As well as interpreting data, the translator must provide them. Corpus-analysis tools are not typically accompanied by corpora; in any case, it is up to the translator to compile a corpus that is suitable for the project at hand. Depending on the languages, subject field, and text type in question, it may be reasonably easy or relatively difficult to compile a corpus.
In the case of monolingual corpora, there is a considerable amount of information available in English on a wide variety of subjects and in a broad range of text types. However, if a translator is working with a less widely used language and in a very specialized subject field, it may be more difficult to find texts to put into a corpus. As the Internet increases in popularity and as access to such technology spreads around the globe, this situation will gradually improve.
With regard to bilingual parallel corpora, availability may also be an issue. There are currently very few pre-constructed corpora of this type available, so translators will likely have to compile their own corpora in the relevant subject field and language pair. This means identifying existing translations and aligning the texts. However, as the popularity of tools such as translation memories begins to increase, it will become easier to either construct or gain access to bilingual parallel corpora.
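As a rough illustration of what aligning a source text with its translation involves, the Python sketch below pairs sentences one to one. It is a deliberately naive approach under strong assumptions: both texts must contain the same number of sentences in the same order, and sentences are split on final punctuation only. The function names `split_sentences` and `align_one_to_one` and the sample texts are hypothetical, and real alignment tools must also cope with sentences that are merged, split, or omitted in translation.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def align_one_to_one(source_text, target_text):
    """Pair the n-th source sentence with the n-th target sentence.

    Raises ValueError when the sentence counts differ, since a strict
    one-to-one pairing is then no longer a safe assumption.
    """
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    if len(source) != len(target):
        raise ValueError(
            f"Sentence counts differ ({len(source)} vs {len(target)}); "
            "manual checking is needed."
        )
    return list(zip(source, target))


if __name__ == "__main__":
    # Hypothetical English source and French translation.
    english = "The printer is on. Load the paper."
    french = "L'imprimante est allumée. Chargez le papier."
    for src, tgt in align_one_to_one(english, french):
        print(f"{src}  <->  {tgt}")
```

The error raised on mismatched counts reflects the point that alignment usually needs human verification before the resulting pairs can be treated as a reliable parallel corpus.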