Research on a Bus Passenger Flow Prediction Method Based on the Spark Platform
[Abstract]: Urban public transportation is an important part of urban construction and social life, with a far-reaching influence on the urban economy and on residents' daily lives. However, low utilization of traffic resources, traffic congestion, and traffic pollution are increasingly serious problems that directly affect the vital interests of the public. Bus passenger flow prediction provides important information for urban public transport policy making, system planning, and operation management, and helps public transport managers formulate reasonable bus operation plans and policies. It is an important way to improve the utilization of traffic resources and enhance the functioning of the city, and it plays an important role in alleviating congestion and reducing pollution. Random forest is an ensemble model built from multiple decision trees, and it offers advantages over other algorithms. On a single machine, however, the trees are built and their prediction votes are collected serially, so the algorithm is inefficient, and on large data sets a traditional single-machine random forest takes a long time to run. Spark is a distributed computing platform that can process massive data sets and makes large-scale distributed iterative computation practical. Combining the strengths of both, this thesis uses random forest as the bus passenger flow prediction model and Spark as the platform for its parallel implementation. Working from existing bus passenger flow data, it uses Spark SQL to compute statistics, extract useful information, and analyze the travel patterns of bus passengers.
The thesis studies the temporal distribution of passenger flow and the dynamic factors that influence it. It analyzes how bus passenger flow varies between weekdays and weekends, and how weather, temperature, holidays, and other factors affect short-term passenger flow. To address the long running time of random forest on a single machine, it proposes a parallel random forest method on the Spark platform that parallelizes both the tree-building and the voting processes. Experimental results show that the parallel random forest runs more efficiently than the traditional single-machine random forest. In addition, a comparison with several other regression models shows that the parallel random forest achieves good model fit and prediction accuracy. Most existing work on improving random forests targets classification problems, and little of it addresses regression. Drawing on previous research, the thesis proposes an improved method for calculating the similarity between random forest samples. Based on this method, it optimizes the voting process of the random forest and proposes a weighted voting method. It also implements an improved feature selection algorithm that shrinks the feature subset drawn by the random forest during feature selection and reduces the influence of unimportant features on the forest's predictions. Experimental results show that the improved random forest model predicts passenger flow more accurately than the unimproved model.
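The abstract does not give the details of the parallel build or of the weighted vote, so the following is only a minimal sketch of the general idea and not the thesis's method. It assumes that each tree is trained independently on its own bootstrap sample (which is what makes the build parallelizable), uses a thread pool as a stand-in for Spark executors, uses depth-one regression trees in place of full trees, and weights each tree by the inverse of its out-of-bag error rather than by the sample-similarity measure the thesis proposes. The data and all function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Made-up hourly boardings (hour of day, passengers); not data from the thesis.
ROWS = [(h, 500 if h in (7, 8, 9, 17, 18, 19) else 100) for h in range(5, 23)]


def fit_stump(seed, rows=ROWS):
    """Fit a depth-one regression tree on a bootstrap sample.

    Returns (predict_function, squared_error_on_out_of_bag_rows).
    """
    rng = random.Random(seed)          # one private generator per tree
    n = len(rows)
    picked = [rng.randrange(n) for _ in range(n)]
    picked_set = set(picked)
    bag = [rows[i] for i in picked]
    oob = [rows[i] for i in range(n) if i not in picked_set]

    # Default: no split, always predict the mean of the bootstrap sample.
    split, left_mean, right_mean = float("inf"), mean(y for _, y in bag), 0.0
    best_sse = None
    for t in sorted({x for x, _ in bag}):
        left = [y for x, y in bag if x <= t]
        right = [y for x, y in bag if x > t]
        if not right:
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best_sse is None or sse < best_sse:
            best_sse, split, left_mean, right_mean = sse, t, ml, mr

    def predict(x):
        return left_mean if x <= split else right_mean

    held_out = oob or bag              # fall back to in-bag rows if nothing is left out
    err = mean((predict(x) - y) ** 2 for x, y in held_out)
    return predict, err


def train_forest(n_trees, workers=4):
    """Train the trees concurrently; each task depends only on its seed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_stump, range(n_trees)))


def weighted_predict(forest, x, smoothing=1.0):
    """Average tree predictions, weighting each tree by 1 / (error + smoothing)."""
    weights = [1.0 / (err + smoothing) for _, err in forest]
    preds = [predict(x) for predict, _ in forest]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


if __name__ == "__main__":
    forest = train_forest(8)
    print("weighted prediction for 08:00:", weighted_predict(forest, 8))
```

Because every tree gets its own seeded generator, the concurrent build returns the same trees as a serial loop over the same seeds. In CPython a thread pool will not speed up this CPU-bound work; it is used here only to show which steps are independent and could be distributed across a cluster.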
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification numbers]: U491.17; TP181; TP311.13