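Research on Filling Missing Floating Car Data Based on a Multiple Linear Regression Model
Topic: floating car data + multiple linear regression model; Source: Harbin Institute of Technology, 2015 master's thesis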
[Abstract]: In practice, missing data is a widespread problem. Whether in transportation, socioeconomic research, biomedical research, or many other fields, it is unavoidable. Missing data not only increases the complexity of analysis, it also greatly reduces the efficiency of statistical work and can introduce serious bias into the results. Filling in missing values with methods from mathematical statistics is therefore an indispensable step in data processing when more complete data are needed. This thesis takes floating car data as an example to study methods for filling missing data. The main work is to match floating car data to the Shenzhen road network in order to identify where data are missing in the network, and to propose a multiple linear regression model to fill the missing portion, so that the data cover as much of the road network as possible and can support road-condition publishing guidance that makes travel more convenient. Specifically, considering the spatiotemporal correlation of traffic data, the spatial correlation of the road network is analyzed at multiple scales to obtain the spatial factors relevant to imputation. The temporal correlation of the floating car data is also analyzed to determine the scale of the time window, which lays the foundation for the imputation model developed later. The multiple linear regression model is then applied, combining the spatial and temporal correlations. First, a model is built using spatial correlation only; validation on selected training data shows that it performs poorly, with low accuracy. To improve accuracy, temporal correlation factors are introduced into the model. Comparative validation shows that the multiple linear regression model combining spatiotemporal relationships is more generally applicable for filling missing data. Four situations in which the model applies are summarized, and the three hotspot regions identified by the research group's study of hotspot regions are each filled by traversal. The final part is the empirical analysis. Taking the hotspot region of Futian District as an example, training data are selected to validate the model empirically, and the empirical data are used to calibrate the model's accuracy. Missing data on actual roads are then filled, corroborated against historical data available for the same missing segments, and used for road-condition publishing. The research yields a reliable model that fills missing data by combining spatiotemporal correlation.
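The abstract does not give the model's exact specification, so the following is only a minimal sketch of the general idea it describes: regress the speed of a target link on the speeds of spatially related links in the same time window (spatial factors) and on the target link's own speed in earlier windows (temporal factors), then use the fitted coefficients to fill windows where the target is missing. The function name `fill_missing`, the `(time window, link)` array layout, and the choice of `neighbors` and `lags` are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def fill_missing(speeds, target, neighbors, lags):
    """Fill NaN speeds of one link with an ordinary least squares fit.

    speeds    : float array of shape (T, L); rows are time windows,
                columns are road links, NaN marks a missing value.
    target    : column index of the link to fill (hypothetical layout).
    neighbors : column indices of spatially related links.
    lags      : positive window offsets of the target used as temporal factors.
    Returns the filled copy of `speeds` and the coefficient vector
    [intercept, neighbor coefficients..., lag coefficients...].
    """
    T = speeds.shape[0]
    max_lag = max(lags) if lags else 0
    rows, X = [], []
    for t in range(max_lag, T):
        spatial = [speeds[t, j] for j in neighbors]
        temporal = [speeds[t - k, target] for k in lags]
        feats = spatial + temporal
        # A window is usable only if every explanatory value is observed.
        if not np.any(np.isnan(feats)):
            rows.append(t)
            X.append([1.0] + feats)
    rows, X = np.array(rows), np.array(X)

    y = speeds[rows, target]
    observed = ~np.isnan(y)
    # Fit on windows where the target is observed.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)

    # Predict the target where it is missing but its features are available.
    filled = speeds.copy()
    filled[rows[~observed], target] = X[~observed] @ beta
    return filled, beta
```

Passing `lags=[]` gives the spatial-only variant, which makes it easy to compare the two model forms the abstract contrasts. Windows whose explanatory values are themselves missing are left unfilled in this sketch.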
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: U491