Design and Implementation of a Document Copy Detection System Based on Similarity Estimation

Published: 2019-03-16 15:31
[Abstract]: As computer network applications have developed, the volume of near-duplicate information on the Internet has grown geometrically. The growing number of highly similar documents consumes large amounts of network storage and also degrades the user experience. The openness of information platforms and the easy availability of digital text have made plagiarism and other academic misconduct increasingly common, with serious consequences. To improve information retrieval efficiency and protect intellectual property, designing and implementing a document copy detection system with similarity estimation techniques has both technical significance and practical value.
To detect similar documents quickly and accurately in massive data sets, the thesis studies the theory and methods of document similarity estimation, and designs and implements a document copy detection system based on similarity estimation. The main work is as follows. Building on the minwise similarity estimator, the thesis designs and implements a document similarity detection system made up of three subsystems: document preprocessing, similarity computation, and presentation and export of similarity results. It focuses on project document clustering, the similarity estimation algorithm, coloring of similarity evidence, generation of similarity reports, and statistical analysis of the data. Following the waterfall model of software engineering, the thesis describes the system's business requirements, architecture design, functional design, and main business process design, and for the main functions it gives the implementation environment, the interface design, and the implementation of the key functional modules.
After development and testing, the resulting system is more user-friendly to operate, and both the text extraction rate for documents in various formats (PDF, Word) and the computational efficiency of similarity comparison are significantly improved.
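The abstract names the minwise similarity estimator but does not show how it works. As a rough illustration only, the sketch below estimates the Jaccard similarity of two documents from MinHash signatures. It is not the thesis's implementation: the word-level 3-shingles, the seeded BLAKE2b hash family, the signature length of 128, and all function names are assumptions made for this example.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased).

    Word-level shingling with k=3 is an assumption of this sketch.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of a string; each seed acts as one hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """For each of num_hashes hash functions, keep the minimum hash over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates |A & B| / |A | B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = "the system detects copied documents by estimating their similarity"
    doc_b = "the system detects copied documents by comparing their fingerprints"
    set_a, set_b = shingles(doc_a), shingles(doc_b)
    sig_a, sig_b = minhash_signature(set_a), minhash_signature(set_b)
    print("exact    :", exact_jaccard(set_a, set_b))
    print("estimated:", estimate_jaccard(sig_a, sig_b))
```

The point of the signature is that two documents can be compared in time proportional to the signature length rather than to document size, which is what makes the approach attractive for large collections. A longer signature lowers the variance of the estimate at the cost of more hashing per document.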
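[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1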

Article ID: 2441642


Article link: http://www.sikaile.net/falvlunwen/zhishichanquanfa/2441642.html

