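Published: 2022-07-13 09:22:30
Summary: How well an N-gram model performs depends strongly on the corpus it is trained on, in particular on the genre of the corpus and its size in words.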


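To build an intuition for these facts, we use the visualization technique proposed by Shannon (1951), which Miller and Selfridge (1950) had also used. The basic idea is to train N-gram models of various orders and then use each of them to generate random sentences. The unigram case is the easiest to picture. Suppose that all the words of English together cover the probability space between 0 and 1, with each word occupying an interval whose width equals its probability. We choose a random number between 0 and 1 and print the word whose interval contains that value. The same technique works for higher-order N-grams. For bigrams, we first generate a random bigram that begins with <s>, according to the bigram probabilities. We then choose a random bigram that continues from it, that is, one that begins with the second word of the previous bigram (the likelihood of choosing each candidate bigram is proportional to its conditional probability), and so on.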
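The sampling procedure described above can be sketched in Python as follows. This is a minimal illustration under several assumptions the text does not state: the models use plain maximum-likelihood counts with no smoothing, sentences are padded with `<s>` and `</s>`, bigram generation stops when `</s>` is drawn (or after a hypothetical `max_len` cap), and the three-sentence corpus is a made-up toy example, not the Shakespeare data. All function names (`train_unigram`, `train_bigram`, `sample_word`, `generate_unigram`, `generate_bigram`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_unigram(tokens):
    """Maximum-likelihood unigram probabilities: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def train_bigram(sentences):
    """Maximum-likelihood bigram probabilities P(w | prev).

    Each sentence is padded with <s> and </s> (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            counts[prev][word] += 1
    model = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {w: c / total for w, c in nexts.items()}
    return model


def sample_word(dist, rng):
    """Shannon-style draw: lay the words end to end on [0, 1), each covering
    an interval as wide as its probability, pick r uniformly at random, and
    return the word whose interval covers r."""
    r = rng.random()
    cumulative = 0.0
    for word, p in dist.items():
        cumulative += p
        if r < cumulative:
            return word
    # Floating-point round-off can leave the total slightly below 1;
    # fall back to the last word in that case.
    return word


def generate_unigram(model, length, rng):
    """Draw `length` words independently from the unigram distribution."""
    return [sample_word(model, rng) for _ in range(length)]


def generate_bigram(model, rng, max_len=30):
    """Start from <s> and repeatedly sample the next word given the previous
    one. Stopping at </s> and the max_len cap are assumptions of this sketch."""
    prev, out = "<s>", []
    while len(out) < max_len:
        word = sample_word(model[prev], rng)
        if word == "</s>":
            break
        out.append(word)
        prev = word
    return out


# Hypothetical toy corpus, already tokenized into words.
corpus = [
    "sweet prince falstaff shall die".split(),
    "the duke shall die".split(),
    "sweet prince the duke".split(),
]
rng = random.Random(0)
uni = train_unigram([w for s in corpus for w in s])
bi = train_bigram(corpus)
print(" ".join(generate_unigram(uni, 6, rng)))
print(" ".join(generate_bigram(bi, rng)))
```

Because `sample_word` walks the cumulative intervals, each draw is exactly the "pick a number between 0 and 1 and print the word that covers it" step; the bigram generator applies the same step to the conditional distribution of the next word given the previous one. A trigram version would condition on the previous two words instead of one.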


1. Approximating Shakespeare with a unigram model

(a) To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have

(b) Every enter now severally so, let

(c) Hill he late speaks; or! a more to leg less first you enter

(d) Will rash been and by I the me loves gentle me not slavish page, the and hour; ill let

(e) Are where exeunt and sighs have rise excellency took of. Sleep knave we. near; vile like

2. Approximating Shakespeare with a bigram model

(a) What means, sir. I confess she? then all sorts, he is trim, captain.

(b) Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. Live king. Follow.

(c) What we, hath got so she that I rest and send to scold and nature bankrupt, nor the first gentleman?

(d) Enter Menenius, if it so many good direction found'st thou art a strong upon command of fear not a liberal largess given away, Falstaff! Exeunt

(e) Thou whoreson chops. Consumption catch your dearest friend, well, and I know where many mouths upon my undoing all but be, how soon. then; we'll execute upon my love's bonds and we do you will?

(f) The world shall- my lord!

3. Approximating Shakespeare with a trigram model

(a) Sweet prince, Falstaff shall die. Harry of Monmouth's grave.

(b) This shall forbid it should be branded, if renown made it empty.

(c) What is't that cried?

(d) Indeed the duke; and had a very good friend.

(e) Fly, and will rid me these news of price. Therefore the sadness of parting, as they say, 'tis done.

(f) The sweet! How many then shall posthumus end his miseries.


