Research on Text Localization and Character Recognition Methods for Scene Images

Published: 2019-05-24 09:04

[Abstract]: Text in scene images carries rich and precise information, and there is broad demand for extracting it in industrial automation, traffic management, automatic translation, and assistive services for people with disabilities. However, scene images are affected by non-uniform illumination, background texture, and the diversity of text appearance, so existing methods extract scene text with low accuracy. Accurately extracting text from scene images has therefore become an active research topic in pattern recognition, and this work has practical value for improving the accuracy and robustness of scene text recognition systems. The main work and contributions of this thesis are as follows.

First, exploiting three properties of text regions (consistent gray values within characters, a convex-shaped distribution of gradient magnitude along the x direction, and the spatial proximity of neighboring characters), the thesis proposes a scene text localization method based on a convolutional neural network (CNN) and support vector machine (SVM) output scores. Typical points of text regions are detected from the convex distribution of x-direction gradient magnitude and the gray-value consistency of characters, and candidate connected components are extracted by clustering these points on position and gray value. k-means clustering is then applied to the regions outside those components to extract additional candidates. Next, a CNN-based SVM classifier for text connected components is used: the CNN extracts texture features of each component, the SVM output score suppresses non-text components, and neighboring components are grouped into candidate text regions. Finally, an SVM verifies each candidate region using its histogram of oriented gradients (HOG) features. On the ICDAR2011 and ICDAR2013 scene text datasets, the localization method achieves F-measures of 76% and 78%, respectively, indicating that it effectively suppresses interference from complex background texture.

Second, exploiting the similarity of character colors within a text line, the thesis proposes a character segmentation method for text regions based on color clustering and gradient vector flow. k-means clustering on the distribution of pixels in color space yields k candidate layers, and geometric features of connected components, such as fill ratio and aspect ratio, are used to select the layer containing the candidate character components. Points in homogeneous regions that lie far from edges are taken as candidate cut pixels, and the cutting path with the minimum cumulative cost is sought, with the squared gray-level difference as the cost. On the ICDAR2013 scene text dataset the method achieves an F-measure of 87.9%, and the experiments show that color clustering effectively suppresses interference from non-uniform illumination and occlusion.

Finally, exploiting the rotation invariance of character structure, the thesis proposes a multi-orientation single-character recognition model. A deformed HOG operator and sampling with a concentric-circle template are used to extract local joint HOG texture features and structural features describing the quadrant relations between sampling points. The two kinds of features are combined into a character feature, a feature dictionary is learned to build a bag-of-words model of characters, and an SVM then recognizes the characters. In character recognition experiments on the ICDAR character dataset, the Chars74K dataset, and a manually collected dataset, the proposed method achieves accuracies of 82%, 87%, and 73%, respectively, indicating that the model is robust to rotation.
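
The color-clustering step of the segmentation method (k-means on pixel colors to obtain k candidate layers) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `kmeans_color_layers`, the random initialization from distinct image colors, the iteration limit, and the use of plain NumPy are all assumptions made for illustration, and the later layer selection by fill ratio and aspect ratio is omitted.

```python
import numpy as np


def _assign(pixels, centres):
    # Squared Euclidean distance of every pixel to every centre, then nearest centre.
    dist = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans_color_layers(image, k, n_iter=20, seed=0):
    """Split an H x W x C image into k boolean layers by k-means on pixel color.

    Hypothetical helper for illustration; returns (layers, centres), where
    layers[j] is an H x W mask of the pixels assigned to cluster j.
    """
    h, w, ch = image.shape
    pixels = image.reshape(-1, ch).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centres from k distinct colors found in the image.
    unique = np.unique(pixels, axis=0)
    if len(unique) < k:
        raise ValueError("image has fewer distinct colors than k")
    centres = unique[rng.choice(len(unique), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(pixels, centres)
        new_centres = centres.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment against the final centres.
    labels = _assign(pixels, centres).reshape(h, w)
    layers = [labels == j for j in range(k)]
    return layers, centres
```

Each returned mask could then be passed to a connected-component analysis so that geometric features are measured per layer, as the abstract describes.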
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.41
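
The abstract's search for a cutting path with minimum cumulative cost, using the squared gray-level difference as the cost, can be sketched with dynamic programming. The details here are assumptions rather than the thesis method: the path runs from the top row to the bottom row, each step may move to column c-1, c, or c+1, the optional `start_cols` argument stands in for the candidate cut pixels, and the gradient vector flow component is not modeled. The name `min_cost_cut_path` is hypothetical.

```python
import numpy as np


def min_cost_cut_path(gray, start_cols=None):
    """Return (total_cost, path) for the cheapest top-to-bottom cut.

    gray: 2-D array of gray levels; path[r] is the column of the cut in row r.
    Step cost (assumption): squared gray difference between consecutive
    path pixels. start_cols optionally restricts where the path may begin.
    """
    g = np.asarray(gray, dtype=np.float64)
    rows, cols = g.shape
    cost = np.full((rows, cols), np.inf)
    back = np.zeros((rows, cols), dtype=np.int64)
    if start_cols is None:
        cost[0, :] = 0.0
    else:
        cost[0, list(start_cols)] = 0.0
    for r in range(1, rows):
        for c in range(cols):
            for pc in (c - 1, c, c + 1):  # predecessor column in the row above
                if 0 <= pc < cols:
                    cand = cost[r - 1, pc] + (g[r, c] - g[r - 1, pc]) ** 2
                    if cand < cost[r, c]:
                        cost[r, c] = cand
                        back[r, c] = pc
    # Trace the cheapest path back from the bottom row.
    end = int(np.argmin(cost[-1]))
    path = [end]
    for r in range(rows - 1, 0, -1):
        path.append(int(back[r, path[-1]]))
    path.reverse()
    return float(cost[-1, end]), path
```

Because this cost only measures how much the gray level changes along the path, a path that stays inside a uniform stroke is as cheap as one that stays in a uniform gap. That is consistent with the abstract restricting candidate cut pixels to points far from edges, which `start_cols` approximates here.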


Article ID: 2484739


Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2484739.html
