1. Term Frequency (TF(x)): how often word x occurs in the current document D.
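Below is a minimal sketch in Python. It assumes the document is already tokenized and that TF is normalized by document length, meaning the raw count of x divided by the number of tokens in D. Some texts use the raw count instead. The function name `term_frequency` is illustrative.

```python
from collections import Counter

def term_frequency(doc_tokens):
    """Return TF(x) for every word x in one tokenized document D.

    Assumption: TF(x) = (count of x in D) / (number of tokens in D).
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {word: count / total for word, count in counts.items()}

doc = ["the", "cat", "chased", "the", "mouse"]
tf = term_frequency(doc)
print(tf["the"])  # 2 occurrences out of 5 tokens -> 0.4
```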
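2. Inverse Document Frequency (IDF): N is the total number of documents in the corpus, and N(x) is the number of documents in the corpus that contain word x. The smoothed IDF is computed from these two counts.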
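Smoothed IDF can take several forms. The worked example below assumes one common variant, the one scikit-learn's `TfidfVectorizer` uses with `smooth_idf=True`. It adds 1 to both counts, as if one extra document contained every word. It then adds 1 to the logarithm, so a word found in every document still keeps a nonzero weight. The corpus size here is a made-up example.

```latex
\mathrm{IDF}(x) = \ln\frac{N + 1}{N(x) + 1} + 1

\text{word in 1 of } N = 4 \text{ documents:}\quad \ln\frac{4 + 1}{1 + 1} + 1 = \ln 2.5 + 1 \approx 1.916

\text{word in all 4 documents:}\quad \ln\frac{4 + 1}{4 + 1} + 1 = 1
```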
3. TF-IDF: the product of the two, TF-IDF(x) = TF(x) × IDF(x).
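The following minimal Python sketch puts the three definitions together. It rests on two assumptions that the text does not fix. First, TF is the count of x divided by the length of D. Second, the smoothed IDF is ln((N + 1) / (N(x) + 1)) + 1, scikit-learn's `smooth_idf=True` form. The function name `tf_idf` and the toy corpus are illustrative.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF weights for every word in every tokenized document.

    Assumptions: TF(x) = count of x in D / number of tokens in D, and
    IDF(x) = ln((N + 1) / (N(x) + 1)) + 1. scikit-learn also L2-normalizes
    each document's vector; this sketch does not.
    """
    n_docs = len(corpus)                 # N
    doc_freq = Counter()                 # N(x)
    for doc in corpus:
        doc_freq.update(set(doc))        # count each word at most once per document
    idf = {w: math.log((n_docs + 1) / (df + 1)) + 1 for w, df in doc_freq.items()}

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({w: (c / len(doc)) * idf[w] for w, c in counts.items()})
    return weights

corpus = [
    ["I", "love", "deep", "learning"],
    ["I", "love", "cats"],
    ["deep", "sea", "fish"],
]
for row in tf_idf(corpus):
    print({w: round(v, 3) for w, v in row.items()})
```

In this toy corpus, "I" and "love" occur in two documents while "cats" occurs in only one. At equal TF in the second document, "cats" therefore receives the larger weight.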
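Is 我玩电脑 ("I play with the computer") the same as 电脑玩我 ("the computer plays with me")? Both sentences contain exactly the same words, so a plain bag-of-words model gives them identical representations even though they mean different things. N-grams treat each run of n consecutive words as one unit, which preserves local word order.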
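A small Python check of this point follows. It assumes the two sentences are already segmented into the words 我 / 玩 / 电脑. It shows that word counts cannot tell the sentences apart, while their bigrams differ.

```python
from collections import Counter

# Pre-segmented tokens (assumption: Chinese word segmentation is already done)
s1 = ["我", "玩", "电脑"]   # "I play with the computer"
s2 = ["电脑", "玩", "我"]   # "the computer plays with me"

# Bag of words: only the counts matter, so the two sentences look identical
print(Counter(s1) == Counter(s2))  # True

# Bigrams keep adjacent order, so the sentences become distinguishable
bigrams1 = list(zip(s1, s1[1:]))   # [('我', '玩'), ('玩', '电脑')]
bigrams2 = list(zip(s2, s2[1:]))   # [('电脑', '玩'), ('玩', '我')]
print(bigrams1 == bigrams2)        # False
```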
For the sentence "I love deep learning":
Bi-gram: {I, love}, {love, deep}, {deep, learning}
Tri-gram: {I, love, deep}, {love, deep, learning}
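The n-grams above come from sliding a window of n words across the sentence. A sentence of k tokens therefore yields k - n + 1 n-grams. The helper name `ngrams` below is illustrative.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples (sliding window of size n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ["I", "love", "deep", "learning"]
print(ngrams(sentence, 2))  # [('I', 'love'), ('love', 'deep'), ('deep', 'learning')]
print(ngrams(sentence, 3))  # [('I', 'love', 'deep'), ('love', 'deep', 'learning')]
```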
CBOW (Continuous Bag-of-Words): uses the surrounding context words to predict the current (center) word.
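The next sketch shows what CBOW training examples look like. Each position yields one pair of (context words, center word). The helper name `cbow_pairs` and its `window` parameter, meaning how many words on each side count as context, are hypothetical.

```python
def cbow_pairs(tokens, window=2):
    """Build CBOW training examples: (list of context words, center word).

    `window` is a hypothetical parameter: the number of words taken on each
    side of the center word. Positions near the edges get fewer context words.
    """
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, center))
    return pairs

for context, target in cbow_pairs(["I", "love", "deep", "learning"], window=1):
    print(context, "->", target)
```

During training, CBOW averages the embedding vectors of the context words and learns to predict the center word from that average.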
Skip-Gram: uses the current (center) word to predict its surrounding context words.
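Skip-Gram reverses the direction of CBOW. Every neighbor of the center word becomes its own (center word, context word) example. The helper name `skipgram_pairs` and the `window` parameter are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Build Skip-Gram training examples: one (center word, context word) pair per neighbor.

    `window` is a hypothetical parameter: how many words on each side count as context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["I", "love", "deep", "learning"], window=1))
# [('I', 'love'), ('love', 'I'), ('love', 'deep'), ('deep', 'love'),
#  ('deep', 'learning'), ('learning', 'deep')]
```

Each pair is one prediction task: given the center word's embedding, predict the context word. The same sentence therefore produces more training examples than under CBOW.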