Structural Function Recognition of Academic Texts: Application to Automatic Keyword Extraction
Published: 2018-09-11 09:19
Abstract: Most current research on automatic keyword extraction relies on statistical information about candidate words, such as term frequency and document frequency. It often ignores the internal structure of the academic text in which the candidates appear, which leads to poor extraction results. This paper treats an academic text as a collection of five structural-function domains and proposes a multi-feature combination method that incorporates the structural-function features of the text. The structural function of each part of a paper is identified from its section headings. The method is then implemented with SVM binary classification and with the LambdaMART learning-to-rank algorithm on a collection of papers from computational linguistics. Experimental results show that the combined features substantially improve keyword extraction over the baseline features; in the classification experiment in particular, the relative improvement in accuracy reaches 10.75%. This demonstrates the importance of structural-function features for automatic keyword extraction.
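The abstract does not define the individual features, so the following is only a minimal Python sketch of the general idea, under stated assumptions: section headings are mapped to structural-function domains by cue-word matching, and each candidate's term frequency and IDF are combined with its frequency inside each domain. The domain names, the cue words in `DOMAIN_CUES`, the smoothed IDF formula, and the helpers `classify_heading`, `count_occurrences` and `candidate_features` are all hypothetical illustrations, not the authors' implementation. The resulting vectors would then be passed to an SVM (keyword vs. non-keyword) or to a LambdaMART ranker, neither of which is shown here.

```python
import math
import re

# HYPOTHETICAL domain labels and heading cue words: the abstract says only that a
# paper is split into five structural-function domains identified from section
# headings; these names and cues are illustrative assumptions.
DOMAIN_CUES = {
    "introduction": ("introduction", "background", "motivation"),
    "related_work": ("related work", "literature review", "previous work"),
    "method": ("method", "approach", "model", "framework"),
    "evaluation": ("experiment", "evaluation", "result"),
    "conclusion": ("conclusion", "future work", "summary"),
}
DOMAINS = list(DOMAIN_CUES)


def classify_heading(heading):
    """Return the first domain whose cue word occurs in the heading, else None."""
    text = heading.lower()
    for domain, cues in DOMAIN_CUES.items():
        if any(cue in text for cue in cues):
            return domain
    return None


def count_occurrences(candidate, text):
    """Count whole-word, case-insensitive occurrences of a (multi-word) candidate."""
    pattern = r"\b" + re.escape(candidate.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def candidate_features(candidate, sections, doc_freq, n_docs):
    """Build a feature vector for one candidate in one paper.

    sections: list of (heading, body_text) pairs for the paper.
    doc_freq: mapping from candidate to the number of papers containing it.
    Returns [tf, idf, tf*idf] followed by the candidate's frequency in each
    domain, in DOMAINS order (sections with unrecognised headings add to tf only).
    """
    tf = 0
    per_domain = dict.fromkeys(DOMAINS, 0)
    for heading, body in sections:
        count = count_occurrences(candidate, body)
        tf += count
        domain = classify_heading(heading)
        if domain is not None:
            per_domain[domain] += count
    # Smoothed IDF, one common variant (assumption, not the paper's formula).
    idf = math.log((n_docs + 1) / (doc_freq.get(candidate, 0) + 1))
    return [tf, idf, tf * idf] + [per_domain[d] for d in DOMAINS]


if __name__ == "__main__":
    paper = [
        ("1 Introduction", "Keyword extraction is hard."),
        ("4 Experiments", "We evaluate keyword extraction on ACL papers."),
    ]
    print(candidate_features("keyword extraction", paper, {"keyword extraction": 3}, 10))
```

The two learners use these vectors differently: the SVM treats every (paper, candidate) pair as an independent labelled instance, whereas LambdaMART groups the candidates of each paper and learns to order them within that group.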
Author affiliation: Laboratory of Information Retrieval and Knowledge Mining, School of Information Management, Wuhan University
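Funding: National Natural Science Foundation of China General Program projects "Lexical-Function-Oriented Semantic Recognition of Academic Texts and Knowledge Graph Construction" (71473183) and "Research on Citation Recommendation for Academic Literature Based on Multi-Semantic Information Fusion" (71673211)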
Chinese Library Classification: TP391.1
Article ID: 2236274
Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2236274.html