
Design and Implementation of a Job Scheduling System for an E-commerce Data Warehouse

Published: 2018-11-18 18:31
[Abstract]: Data has become a core competitive asset for today's Internet companies, and an efficient job scheduling system is an important tool for managing massive offline data. Whoever can manage this data effectively and mine the valuable information in it holds the strategic high ground. ETL jobs are the core of a data warehouse's daily work, and large numbers of jobs with complex relationships can run efficiently and in order only under the management of a job scheduling system. In the current information economy, where data is a productive force, the daily work of an e-commerce data warehouse is no longer simple data backup and log pulling; any data that can be linked may yield new insight. A job scheduling system must therefore trigger jobs efficiently and stably while also respecting the dependencies among them, ultimately triggering all jobs in order as job chains. These requirements are new challenges for building a job scheduling system. With the arrival of the big data era, big data processing tools based on the Hadoop ecosystem have been widely accepted by the market, and Hive emerged to meet the needs of that era. This system makes support for Hive data processing an important part of the data warehouse, takes full advantage of the stability and high scalability of a Hadoop cluster, and uses a distributed cluster to meet e-commerce companies' demand for a stable, efficient, and economical data warehouse. The new job scheduling system thus supports conventional relational database processing and is also compatible with Hive data processing. At present, most large enterprises at home and abroad build their own data warehouse job scheduling systems. There are also good open-source job scheduling systems (such as Oozie) and good job scheduling frameworks (such as Quartz), but their usage scenarios and features do not match the needs of the enterprise at its current stage of development. Drawing on the scheduling requirements observed in daily work, this thesis designs and develops a customized e-commerce data warehouse job scheduling engine suited to the enterprise's current stage. Data developers can conveniently deploy their jobs on any job machine, and the engine provides unified, efficient management, including periodic triggering, flexible addition of dependencies, load balancing, logging, and monitoring and alerting.
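The idea of triggering jobs in order as a job chain that respects dependencies can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which the abstract does not describe. It assumes jobs are identified by name and that each job lists the upstream jobs it depends on; the job names and the `build_job_chain` helper are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def build_job_chain(dependencies):
    """Return job names ordered so that every job comes after its upstream jobs.

    `dependencies` maps a job name to the set of jobs it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    return list(sorter.static_order())


# Hypothetical ETL jobs, for illustration only.
deps = {
    "load_orders": set(),
    "load_users": set(),
    "join_orders_users": {"load_orders", "load_users"},
    "daily_report": {"join_orders_users"},
}

chain = build_job_chain(deps)

# A circular dependency cannot be scheduled and is reported as an error.
try:
    build_job_chain({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

A real scheduler would also have to handle periodic triggering, load balancing across job machines, logging, and alerting, which this sketch leaves out.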
[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13



Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2340804.html

