HTTP Traffic Analysis in a Multi-Resource Server Collaborative Environment
Published: 2019-01-26 20:08
[Abstract]: A few years ago, HTTP-based network services were delivered in a centralized fashion by a handful of service providers, and distributed servers were rare. Typically, a single server offered one particular service and was bound to a fixed IP address. Today, network structure is increasingly complex, and the relationship between an IP address and the content and services it provides has become dynamic and complicated: operators make heavy use of content delivery networks (CDNs) and content caching, cloud-based network services keep emerging, and the coupling between service providers and the infrastructure that hosts their services is weakening. All of this makes network management more difficult. Under these conditions, operators urgently need to understand the composition and usage patterns of HTTP traffic and to determine how it is distributed among service providers, so that network resources can be allocated sensibly. At the same time, the rapid growth of network traffic means that traditional traffic analysis methods can no longer meet the storage and processing requirements of massive data sets, so more efficient and more reliable processing methods are needed. Hadoop is a scalable open-source software framework for reliable distributed processing of massive data, and it has been adopted in a growing number of research fields. This thesis first introduces an HTTP traffic analysis algorithm based on association rules, which uses the Jaccard coefficient to measure traffic correlation, and gives its mathematical description. It then introduces the basic principles of Hadoop and, building on Hadoop, proposes a three-layer architecture for an HTTP traffic analysis system that integrates the otherwise independent functions of traffic collection, storage, processing, and analysis into a complete processing system. Next, it focuses on the IP address identification component in the system's data layer. This component maps server IP addresses to service providers and is the most important part of the HTTP traffic analysis system described here. Finally, using the intermediate results produced by the collection layer and the data layer, the thesis summarizes the distribution patterns of HTTP traffic in the application layer of the system.
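As an illustration of the Jaccard coefficient mentioned in the abstract, the sketch below computes J(A, B) = |A ∩ B| / |A ∪ B| for two sets. Treating each server IP as the set of clients that contacted it is an assumption made only for this example; the abstract does not say which sets the thesis compares. The names `jaccard` and `clients_by_server` and the sample addresses are hypothetical.

```python
def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data (assumption): server IP -> set of clients seen contacting it.
clients_by_server = {
    "192.0.2.10": {"c1", "c2", "c3", "c4"},
    "192.0.2.11": {"c2", "c3", "c4", "c5"},
    "198.51.100.7": {"c9"},
}

# Two servers with largely overlapping client sets score high.
similar = jaccard(clients_by_server["192.0.2.10"], clients_by_server["192.0.2.11"])
# Two servers with no clients in common score zero.
unrelated = jaccard(clients_by_server["192.0.2.10"], clients_by_server["198.51.100.7"])
print(similar, unrelated)
```

A coefficient near 1 suggests two servers are used by nearly the same population, while a coefficient near 0 suggests they are unrelated. Any threshold for grouping servers under one provider would have to come from the thesis itself.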
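[Degree-granting institution]: Beijing University of Posts and Telecommunications (北京郵電大學)
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP393.06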
Article ID: 2415863
Article link: http://www.sikaile.net/guanlilunwen/ydhl/2415863.html