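Research and Implementation of a Bond Knowledge Service Based on Heterogeneous Information
Published: 2018-03-01 00:13
Keywords: heterogeneous information; retrieval result evaluation method; ontology rule adaptation; imbalanced classification. Source: Harbin Institute of Technology, 2013 master's thesis. Document type: degree thesis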
[Abstract]: With the rapid development of the financial industry, online knowledge service platforms for financial products are increasingly recognized by investors. Taking bonds as an example, the large volume of heterogeneous bond information available on the web provides a data source for building an automated bond knowledge service platform. This thesis therefore studies how to acquire heterogeneous information about financial products, how to process it, and how to classify and fuse it, and applies the integrated information to a bond knowledge service platform. The main contents are as follows.
Acquisition of heterogeneous bond information: this covers acquiring and preprocessing structured bond data and unstructured web page data. Bond data come from two sources, fixed financial websites and search engines. For the search engine part, the thesis proposes RDMDRR, a search-engine-based model for evaluating domain-specific retrieval results, which further improves the accuracy and coverage of bond announcement acquisition.
Extraction of heterogeneous bond information: the WHISK algorithm is first used to build an ontology rule base of bond features. An ontology rule adaptation method then prunes the constructed rules to yield a refined rule base, which is applied to the extraction of bond entity information and supplies data for the bond knowledge service.
Classification and fusion of bond information: depending on the bond category, rule-based and machine learning methods are used to classify bonds. Because the classes are imbalanced, the thesis proposes a new feature weighting method that improves on the original TFIDF and applies it to imbalanced classification, raising the recognition rate of minority classes. The bond information is thus categorized accurately and then fused with other bond information to form a fairly complete bond knowledge base.
After these three stages of processing and fusion, the heterogeneous information yields complete bond knowledge, which is integrated into the bond knowledge service platform. Experiments show that the platform changes the knowledge expansion mode of traditional knowledge service platforms, that the accuracy and recall of knowledge acquisition improve at each processing stage, and that the platform is well received by bond investors.
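The improved TFIDF weighting mentioned in the abstract is not specified on this page. As a rough illustration only, the Python sketch below computes plain TF-IDF and then applies a hypothetical class-aware boost that raises the weight of terms concentrated in a minority class. The function names, the add-one smoothing, and the boost formula are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF. docs is a list of token lists; returns one {term: weight} dict per doc."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def class_aware_tfidf(docs, labels, minority):
    """Hypothetical variant (not the thesis's method): boost terms that occur
    relatively more often in the minority class than in the other classes."""
    base = tfidf(docs)
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    df_min, df_maj = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (df_min if y == minority else df_maj).update(set(doc))

    def boost(term):
        # Add-one smoothed document rates per class (an assumed choice).
        rate_min = (df_min[term] + 1) / (n_min + 2)
        rate_maj = (df_maj[term] + 1) / (n_maj + 2)
        # The factor is at least 1, so no term is ever down-weighted.
        return 1.0 + math.log(max(rate_min / rate_maj, 1.0))

    return [{t: w * boost(t) for t, w in doc_w.items()} for doc_w in base]

# Toy corpus: three majority-class documents and one minority-class document.
docs = [["coupon", "rate", "bond"], ["bond", "issue"],
        ["bond", "default", "notice"], ["bond", "rate"]]
labels = ["major", "major", "minor", "major"]
for doc_weights in class_aware_tfidf(docs, labels, "minor"):
    print(doc_weights)
```

In this toy corpus, "bond" appears in every document and so receives zero weight under both schemes, while "default", which appears only in the minority-class document, is weighted above its plain TF-IDF value.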
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3
Article ID: 1549567
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1549567.html