Semantic Kernel Construction Based on the Co-occurrence Latent Semantic Vector Space Model

Published: 2018-10-29 20:28

[Abstract]: Knowledge discovery over aggregated digital library resources depends on an effective representation of knowledge. As a classical text representation model, the vector space model (VSM) and its derivatives play an important role in information retrieval and knowledge discovery research, but they still have shortcomings. The co-occurrence latent semantic vector space model (CLSVSM), a newer text representation model, clearly improves text clustering accuracy compared with VSM. However, in applications to large-scale text data the co-occurrence matrix is often high-dimensional, which makes the model computationally expensive. This paper therefore constructs a semantic kernel (CLSVSM_K) on top of CLSVSM, following the idea of latent semantic analysis (LSA). CLSVSM_K not only reduces the dimension of the co-occurrence matrix but also merges synonymy information among the text feature words. The paper applies the semantic kernel model to topic clustering of literature. The experimental results show that the method effectively reduces the dimension of the feature-word space and the computational complexity, improves the performance of the clustering algorithm, and raises the accuracy of topic clustering. Applying the model should help with information resource organization, knowledge discovery, and knowledge optimization in digital libraries.
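The abstract does not give the CLSVSM_K formulas, so the following is only a minimal sketch of the general LSA idea it refers to, not the authors' method. It assumes a binary document-term matrix, a plain term co-occurrence count matrix, and a kernel formed by projecting documents onto the top-k eigenvectors of that matrix. The names `cooccurrence_matrix`, `lsa_semantic_kernel`, and the toy matrix `docs` are hypothetical.

```python
import numpy as np


def cooccurrence_matrix(doc_term: np.ndarray) -> np.ndarray:
    """Term-by-term co-occurrence counts (assumed form, not taken from the paper)."""
    presence = (doc_term > 0).astype(float)
    # Entry (i, j) counts the documents that contain both term i and term j.
    return presence.T @ presence


def lsa_semantic_kernel(doc_term: np.ndarray, k: int):
    """Return (reduced documents, document kernel) using k latent directions."""
    cooc = cooccurrence_matrix(doc_term)
    # The co-occurrence matrix is symmetric, so eigh applies; it returns
    # eigenvalues in ascending order, hence the explicit re-sort.
    eigvals, eigvecs = np.linalg.eigh(cooc)
    top = np.argsort(eigvals)[::-1][:k]
    basis = eigvecs[:, top]            # terms x k latent directions
    reduced = doc_term @ basis         # documents x k, the reduced representation
    kernel = reduced @ reduced.T       # documents x documents similarity
    return reduced, kernel


# Toy corpus: 4 documents over 5 feature words (hypothetical data).
docs = np.array(
    [
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 1, 1, 1],
    ],
    dtype=float,
)

reduced, kernel = lsa_semantic_kernel(docs, k=2)
print(reduced.shape, kernel.shape)
```

In this sketch the dimension reduction comes from keeping only k columns of the eigenvector basis. Terms that tend to co-occur load on the same latent direction, which is one way to picture the "merging" of related feature words that the abstract describes. The resulting kernel matrix could then be passed to any kernel-based clustering method.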
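[Author affiliation]: School of Mathematical Sciences, Shanxi University; Institute of Management and Decision, Shanxi University
[Funding]: National Natural Science Foundation of China, "Research on the Construction and Application of the Co-occurrence Latent Semantic Vector Space Model and Its Semantic Kernel" (71503151); Shanxi Province Higher Education Innovative Talent Support Program, "Research on Deep Topic Clustering of Text Information Based on Latent Semantics" (2016052006)
[Classification number]: TP391.1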

Article number: 2298727


Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2298727.html
