
Word-Frequency Lists

Published: 2023-07-11 09:21:56   Author: etogether.net   Source: Internet
Summary: Word-frequency lists can be manipulated in a number of ways, and they can be sorted in various orders.


The most basic feature provided by a corpus-analysis tool is a word-frequency list, which allows users to discover how many different words are in a corpus and how often each appears. These two figures are referred to as types and tokens. For illustrative purposes, suppose that a corpus consists of the following sentence:


I really like translation because I think that translation is really, really fun.


This sentence contains a total of thirteen words; therefore, the corpus contains thirteen tokens. However, some of the words appear more than once (I, really, translation); therefore, the corpus contains only nine different words, and these are known as types. In a word-frequency list, the types are presented in a list and the number of tokens (the number of times that word occurs) is shown beside the type. This is illustrated in figure 1.
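The count above can be reproduced with a short Python sketch. It is only an illustration, not the method of any particular corpus-analysis tool: it assumes a token is any run of letters or apostrophes and that case is significant, whereas real tools may tokenize differently.

```python
import re
from collections import Counter

corpus = ("I really like translation because I think that "
          "translation is really, really fun.")

# Assumption: a token is a run of letters or apostrophes; punctuation is dropped.
tokens = re.findall(r"[A-Za-z']+", corpus)

# Counter maps each type to its number of tokens,
# keeping the types in order of first occurrence.
frequencies = Counter(tokens)

print("Tokens:", len(tokens))
print("Types:", len(frequencies))
for word, count in frequencies.items():
    print(f"{word}\t{count}")
```

For the example sentence this reports thirteen tokens and nine types, with really counted three times and I and translation twice each.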


Word-frequency lists can be manipulated in a number of ways. They can be sorted by order of occurrence in the corpus, by alphabetical order, or by frequency, and each of these sorts can be arranged in ascending or descending order. The same word list can therefore be arranged in at least six different ways, as shown in figures 2, 3, and 4.
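The six arrangements can be sketched in Python as follows. The variable names are hypothetical, the tokenization rule is an assumption, and alphabetical comparison ignores case so that I sorts among the lowercase words. Ties in the frequency sorts are broken alphabetically.

```python
import re
from collections import Counter

corpus = ("I really like translation because I think that "
          "translation is really, really fun.")
tokens = re.findall(r"[A-Za-z']+", corpus)  # assumed tokenization rule
frequencies = Counter(tokens)               # keys stay in order of first occurrence

# Order of occurrence: beginning to end of the corpus, and the reverse.
first_to_last = list(frequencies.items())
last_to_first = first_to_last[::-1]

# Alphabetical order: A to Z, and Z to A.
a_to_z = sorted(frequencies.items(), key=lambda item: item[0].lower())
z_to_a = a_to_z[::-1]

# Order of frequency, with alphabetical order as the tie-breaker.
most_frequent_first = sorted(
    frequencies.items(), key=lambda item: (-item[1], item[0].lower()))
least_frequent_first = sorted(
    frequencies.items(), key=lambda item: (item[1], item[0].lower()))

for word, count in most_frequent_first:
    print(f"{word}\t{count}")
```

The two frequency lists are built with separate sort keys rather than by reversing one list, so that words with the same count stay in alphabetical order in both.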



Figure 1 A word-frequency list showing types on the left and tokens on the right.




Figure 2 Word-frequency lists sorted in order of appearance in the corpus, in descending order (from the beginning of the corpus to the end) and ascending order (from the end of the corpus to the beginning).



The single-sentence corpus used in the above examples is purely for illustrative purposes – a translator would not need a computerized tool to analyze a single sentence. Normally, a corpus would be much larger – often on the order of hundreds of thousands or even millions of words. In such cases, the advantage of having a computer to help with counting and sorting becomes clear!


In addition to counting the frequency of words, corpus-analysis tools calculate the ratio of types to tokens. Some corpus-analysis tools can also count the number of sentences and paragraphs and calculate the average length of words, sentences, and paragraphs in the corpus.
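As a rough illustration, these statistics can be computed for the example sentence as shown below. The sketch assumes that the ratio is the number of types divided by the number of tokens, that sentences end with a period, exclamation mark, or question mark, and that word length is measured in characters. Actual tools may define each of these differently.

```python
import re

corpus = ("I really like translation because I think that "
          "translation is really, really fun.")

tokens = re.findall(r"[A-Za-z']+", corpus)  # assumed tokenization rule
types = set(tokens)

# Ratio of types to tokens: 9 types / 13 tokens for the example sentence.
type_token_ratio = len(types) / len(tokens)

# Assumption: ., ! and ? mark sentence boundaries.
sentences = [s for s in re.split(r"[.!?]+", corpus) if s.strip()]

average_word_length = sum(len(t) for t in tokens) / len(tokens)  # in characters
average_sentence_length = len(tokens) / len(sentences)           # in words

print(f"Type/token ratio: {type_token_ratio:.2f}")
print(f"Sentences: {len(sentences)}")
print(f"Average word length: {average_word_length:.2f} characters")
print(f"Average sentence length: {average_sentence_length:.1f} words")
```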



Figure 3 Word-frequency lists sorted in alphabetical order, in descending order (from A to Z) and ascending order (from Z to A).




Figure 4 Word-frequency lists sorted in order of frequency, in descending order (from the most frequent to the least frequent) and ascending order (from the least frequent to the most frequent). When multiple words have the same frequency count, they are further sorted in alphabetical order.



Statistical information of this kind, such as the type-token ratio and the average sentence length, can help translators assess some of the stylistic features of the texts in the corpus.




