Research on a Spark-Based Bus Passenger Flow Prediction Method

Published: 2018-10-31 11:02
[Abstract]: Urban public transportation is an essential part of urban development and social life, with far-reaching and wide-ranging effects on the urban economy and on residents' daily lives. However, low utilization of transportation resources, traffic congestion, and traffic-related pollution are becoming increasingly serious, and these problems bear directly on people's everyday interests. Bus passenger flow prediction, as a scientific tool, provides important input for urban public transport policy making, system planning, and operations management, and helps transit managers draw up sound operating plans and policies. It is therefore an important means of improving the utilization of transportation resources and strengthening urban functions, and it plays a significant role in relieving congestion and reducing pollution. Random forest is an ensemble model built from multiple decision trees and offers several advantages over other algorithms. In single-machine mode, however, both tree construction and prediction voting run serially, so efficiency is low, and on large datasets the traditional single-machine random forest takes a long time. Spark is a distributed computing platform that handles massive datasets with ease and makes large-scale distributed iterative computation feasible. This thesis combines the strengths of the two, using random forest as the bus passenger flow prediction model and Spark as the platform for its parallel implementation. Starting from existing bus passenger flow data, it uses Spark SQL to compute statistics and extract useful information, and analyzes the travel patterns of bus passengers.
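The abstract does not show the Spark SQL queries used. As a minimal sketch of the kind of aggregation involved, the example below counts boardings per hour, split into weekdays and weekends. It assumes a hypothetical `boardings(card_id, board_time)` table of card-swipe records and runs the query with Python's built-in `sqlite3` as a stand-in for Spark SQL. In Spark SQL the date handling would use functions such as `hour()` and `dayofweek()` instead of SQLite's `strftime()`.

```python
import sqlite3

# Hypothetical card-swipe records (card_id, board_time); the real data layout
# used in the thesis is not described in the abstract.
RECORDS = [
    ("c1", "2017-03-06 07:15:00"),  # Monday
    ("c2", "2017-03-06 07:40:00"),  # Monday
    ("c3", "2017-03-06 18:05:00"),  # Monday
    ("c4", "2017-03-11 10:20:00"),  # Saturday
    ("c5", "2017-03-11 10:45:00"),  # Saturday
]

# Boardings per (day type, hour); SQLite's %w gives 0 = Sunday ... 6 = Saturday.
QUERY = """
SELECT
    CASE WHEN strftime('%w', board_time) IN ('0', '6')
         THEN 'weekend' ELSE 'weekday' END AS day_type,
    CAST(strftime('%H', board_time) AS INTEGER) AS hour,
    COUNT(*) AS boardings
FROM boardings
GROUP BY day_type, hour
ORDER BY day_type, hour
"""


def hourly_flow(records):
    """Load records into an in-memory table and return (day_type, hour, count) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE boardings (card_id TEXT, board_time TEXT)")
        conn.executemany("INSERT INTO boardings VALUES (?, ?)", records)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

For the sample records, `hourly_flow(RECORDS)` returns `[('weekday', 7, 2), ('weekday', 18, 1), ('weekend', 10, 2)]`. Comparing such hourly profiles is one way to surface the weekday/weekend differences the thesis analyzes.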
The thesis examines both the temporal distribution of passenger flow and the dynamic factors that influence it: how flow varies between weekdays and weekends, and how weather, temperature, holidays, and similar factors affect short-term bus passenger flow. To address the long running time of random forest in a single-machine environment, it proposes a Spark-based parallel random forest method that parallelizes both tree construction and voting. Experiments show that the parallel random forest runs more efficiently than the traditional single-machine version. In addition, a comparison with several other regression models shows that the parallel random forest achieves good model fit and prediction accuracy. Most existing work on improving random forest targets classification; relatively little addresses regression. Drawing on previous research, the thesis proposes an improved method for computing sample similarity in a random forest and, based on it, optimizes the voting process with a weighted voting scheme. It also implements an improved feature selection algorithm that narrows the feature subset the random forest samples during feature selection, reducing the influence of unimportant features on prediction. Experiments show that the improved random forest model predicts passenger flow more accurately than the original model.
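The parallelization of tree building and voting described above relies on one property of random forests: the trees are independent. Each tree is grown on its own bootstrap sample, and for regression the "vote" is the average of the tree outputs. The sketch below illustrates only that structure. It is not the thesis's Spark implementation. For brevity, depth-1 regression trees (stumps) replace full trees, and a `ThreadPoolExecutor` stands in for Spark executors. Under CPython's GIL, the threads show the task structure rather than a real speed-up. The names `fit_stump`, `build_tree`, `fit_forest`, and `predict` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_stump(X, y, features):
    """Fit a depth-1 regression tree on the given feature indices, choosing the
    (feature, threshold) split that minimises the summed squared error."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:
        # No usable split (e.g. only constant features): predict the sample mean.
        m = mean(y)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def build_tree(X, y, mtry, seed):
    """Grow one tree on a bootstrap sample using a random subset of mtry features."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
    features = rng.sample(range(len(X[0])), mtry)
    return fit_stump([X[i] for i in idx], [y[i] for i in idx], features)


def fit_forest(X, y, n_trees=20, mtry=1, workers=4):
    """Build the trees concurrently; each task needs only the data and its own seed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: build_tree(X, y, mtry, s), range(n_trees)))


def predict(forest, row, workers=4):
    """Collect every tree's prediction in parallel, then average them (regression vote)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return mean(list(pool.map(lambda tree: tree(row), forest)))
```

On Spark itself, MLlib's `RandomForestRegressor` (in `pyspark.ml.regression`) provides a distributed random forest. The abstract does not say whether the thesis builds on it or on its own lower-level code.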
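The abstract does not define the improved similarity measure or the exact weighting rule. The sketch below shows one common way such a scheme can be assembled, and each of its choices is an assumption:

- Similarity is Breiman's standard random-forest proximity, the fraction of trees in which two samples land in the same leaf.
- The k most similar training samples serve as the query's neighbours.
- Each tree is weighted by the inverse of its mean absolute error on those neighbours, so trees that predicted similar historical cases well count for more.

The names `proximity` and `weighted_vote` are hypothetical.

```python
def proximity(a_leaves, b_leaves):
    """Breiman-style proximity: share of trees in which two samples fall in the same leaf."""
    same = sum(1 for a, b in zip(a_leaves, b_leaves) if a == b)
    return same / len(a_leaves)


def weighted_vote(query_leaves, query_preds, train_leaves, train_preds, y, k=3, eps=1e-9):
    """Combine per-tree predictions for one query, weighting trees by accuracy on similar samples.

    query_leaves[t]     leaf id of the query in tree t
    query_preds[t]      tree t's prediction for the query
    train_leaves[i][t]  leaf id of training sample i in tree t
    train_preds[i][t]   tree t's prediction for training sample i
    y[i]                observed passenger flow of training sample i
    """
    # 1. Keep the k training samples most similar to the query.
    ranked = sorted(range(len(y)),
                    key=lambda i: proximity(query_leaves, train_leaves[i]),
                    reverse=True)
    neighbours = ranked[:k]
    # 2. Weight each tree by the inverse of its mean absolute error on those neighbours
    #    (eps avoids division by zero for a tree with no error).
    weights = []
    for t in range(len(query_preds)):
        err = sum(abs(train_preds[i][t] - y[i]) for i in neighbours) / len(neighbours)
        weights.append(1.0 / (err + eps))
    # 3. Weighted average instead of the plain mean of the tree predictions.
    return sum(w * p for w, p in zip(weights, query_preds)) / sum(weights)
```

When all trees err equally on the neighbours, the weights are equal and the result reduces to the ordinary average. The scheme changes a prediction only when trees differ in how well they fit the query's most similar samples.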
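The improved feature selection algorithm is likewise only summarized. The sketch below illustrates the stated idea: restrict the random feature subset to features that matter. It rests on two assumptions. First, importance scores are already available, for example from a preliminary forest. Second, the candidate pool is simply the `top_k` highest-scoring features. `candidate_features` and `sample_split_features` are hypothetical names.

```python
import random


def candidate_features(importances, top_k):
    """Indices of the top_k features by importance score (hypothetical pruning rule)."""
    ranked = sorted(range(len(importances)), key=lambda f: importances[f], reverse=True)
    return sorted(ranked[:max(1, top_k)])


def sample_split_features(importances, mtry, rng, top_k):
    """Draw the random feature subset for one split from the pruned pool only,
    so low-importance features are never candidates."""
    pool = candidate_features(importances, top_k)
    return rng.sample(pool, min(mtry, len(pool)))
```

For example, `sample_split_features(scores, 2, random.Random(0), top_k=3)` draws two distinct indices from the three best-scoring features. In a standard random forest, every split samples from all features. This variant trades some tree diversity for fewer splits on uninformative features. The abstract does not say how the thesis balances the two.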
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: U491.17; TP181; TP311.13
