Research on Uygur Text Classification and System Development

Published: 2018-05-31 14:26

  Topic: Uygur + text classification. Source: Xinjiang University, master's thesis, 2012


[Abstract]: With the rapid development of computer and network technology, the Internet has come into wide use. The rapid growth of Web information poses a serious challenge for information retrieval: the sheer volume of information makes it harder to find what we need. Text classification plays a key and effective role in organizing unstructured information, and has important applications in information retrieval, search engines, digital library management, and other fields.
Starting from the characteristics and writing rules of Uygur, this thesis builds a relatively large text corpus (20 categories, 300 texts per category). After studying the characteristics and grammar rules of Uygur in depth, a fairly complete stop-word list is built through extensive experiments and manual review. The effect of stemming on the accuracy and speed of Uygur text classification is analyzed. Because reducing the dimensionality of the vector space is an important problem in text classification, the thesis applies a stemming method based on the morphological rules of Uygur, which achieves good dimensionality reduction without hurting classification accuracy. After stemming, the dimensionality is reduced by about 25%.
For feature selection, the CHI (chi-square) statistic is used, and experiments analyze how the number of selected features affects the results. The results show that keeping 3%-5% of the original features works best, relatively speaking. Extensive experiments also analyze the effect of Uygur spelling errors on Uygur text classification. The results show that spelling errors have little effect on classification accuracy, but do have some effect on reducing the dimensionality of the vector space.
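
The abstract names CHI-based feature selection but gives no implementation details. As a rough illustration only, the sketch below scores each term with the standard chi-square statistic over a 2x2 table of document counts and keeps the top fraction of terms. The function names, the `keep_ratio` parameter, the choice of taking the maximum score over classes, and the toy English tokens are all assumptions made for this example, not the thesis's actual code or data.

```python
from collections import defaultdict


def chi_square(a, b, c, d):
    """Chi-square score of a term/class pair from a 2x2 document-count table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class that lack the term
    d: docs outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, keep_ratio=0.05):
    """Rank terms by their maximum chi-square over classes; keep the top fraction.

    keep_ratio is a hypothetical parameter; 0.03-0.05 mirrors the 3%-5%
    range reported in the abstract.
    """
    n_docs = len(docs)
    class_sizes = defaultdict(int)
    df = defaultdict(int)           # term -> number of docs containing it
    df_in_class = defaultdict(int)  # (term, class) -> docs of that class containing it
    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):
            df[term] += 1
            df_in_class[(term, label)] += 1

    scores = {}
    for term, term_df in df.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = df_in_class.get((term, label), 0)
            b = term_df - a
            c = size - a
            d = n_docs - term_df - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best

    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:k], scores


# Toy corpus (English tokens stand in for stemmed Uygur words).
toy_docs = [
    ["goal", "match", "the"],
    ["team", "goal", "the"],
    ["bank", "loan", "the"],
    ["loan", "rate", "the"],
]
toy_labels = ["sport", "sport", "finance", "finance"]
selected, scores = select_features(toy_docs, toy_labels, keep_ratio=0.3)
print(selected)
```

A term such as "the" that appears in every document scores zero, which is the behaviour one wants from a discriminative feature score. Taking the maximum over classes is one common convention; averaging over classes is another.
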
The thesis also studies in some depth the widely used classification algorithms KNN, naive Bayes (NB), and SVM, applies them to Uygur text, and analyzes the performance of each one. Finally, combining the characteristics of the Uygur language with text classification techniques, it builds an experimental platform for Uygur text classification (the Uygur text classification system).
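
Of the three classifiers named above, naive Bayes is the shortest to write out. The sketch below is a minimal multinomial naive Bayes with add-one smoothing, included only to make the idea concrete. The class name, its methods, and the toy data are assumptions for illustration; the thesis does not state which variant or settings it used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # class -> number of training docs
        self.term_counts = defaultdict(Counter)  # class -> term -> occurrences
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.term_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / n_docs)
            for term in tokens:
                if term in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[term] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (English tokens stand in for stemmed Uygur words).
train_docs = [
    ["goal", "match", "the"],
    ["team", "goal", "the"],
    ["bank", "loan", "the"],
    ["loan", "rate", "the"],
]
train_labels = ["sport", "sport", "finance", "finance"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["goal", "team"]))
print(model.predict(["loan", "bank"]))
```

In a pipeline like the one the abstract describes, the token lists would first pass through stop-word removal, stemming, and feature selection, so the classifier would only see the retained terms.
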
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.1





Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1960077.html

