Design and Implementation of a Machine-Learning-Based User Feedback Data Center
Published: 2018-06-05 16:37
Topics: user feedback + text classification; Source: Beijing Jiaotong University, master's thesis, 2017
[Abstract]: This work originates from a production project on the Duer (度秘) product line at Baidu and belongs to the field of internet artificial intelligence. Duer is a representative of a new generation of intelligent operating systems: it builds on NLP (Natural Language Processing) technology to identify what users need and provide the corresponding services. The product line receives on the order of 100,000 user comments and feedback items every day, so the data volume is very large. Classifying and filtering this feedback surfaces the problems and suggestions users have about the current product experience, shows directly which parts of the current version are defective or most in need of optimization, guides the requirements for later iterations, and gives quality assurance staff a basis for tracking online issues. Text classification and filtering of this large volume of feedback is therefore the key problem, yet the existing practice is to export part of the data from the online database by hand and manually pick out the useful feedback.

The thesis applies machine learning to design and implement a user feedback data center platform. Once user feedback data is imported, the platform classifies and filters large volumes of feedback text efficiently and accurately, then presents the results by category with statistics, so that the relevant staff can review them and follow up on root-cause investigation and resolution of the reported problems.

The platform is divided into three parts: pulling user feedback data, classifying and filtering the feedback data, and the user feedback data center. Data pulling uses polling APIs (Application Programming Interface) written in Python to fetch all feedback for the product line from the company's unified user feedback platform, reorganize the data format as needed, and store the result in HBase. Classification and filtering uses genetic algorithms and related machine learning algorithms to extract feature words, optimize the classification, and classify and filter the data according to those feature words. The data center is built on PHP and MySQL and provides categorized display of the data, conditional queries, and tracking and handling of feedback issues.

The thesis covers the requirements analysis, overall design, detailed design, and testing and verification of the platform. The author took part in designing and developing the feedback data pulling, the machine-learning-based classification and filtering, and related functions of the data platform. The platform is now live in production, with a data classification pass rate above 91%. It has greatly improved the efficiency of feedback processing, freed staff from manual data handling, and was well received by department leadership and colleagues.
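The abstract names a genetic algorithm for feature-word extraction but gives no details of the encoding, fitness function, or operators. The following is only a minimal sketch of how such a selection step could look, not the thesis's actual method. Everything in it is an assumption made for illustration: the toy samples, the labels `bug` and `suggestion`, the bit-mask encoding, the size penalty of 0.01 per word, and the helper names `classify`, `fitness`, and `select_features`.

```python
import random
from collections import Counter, defaultdict

# Toy labelled feedback (hypothetical examples, not data from the thesis).
SAMPLES = [
    ("app crashes when I open voice search", "bug"),
    ("the app crashes after update", "bug"),
    ("login fails with error code", "bug"),
    ("error when sending a message", "bug"),
    ("please add a dark mode", "suggestion"),
    ("would be great to add reminders", "suggestion"),
    ("please support more languages", "suggestion"),
    ("add an option to export chat", "suggestion"),
]
VOCAB = sorted({w for text, _ in SAMPLES for w in text.split()})

# For each word, count how often it appears under each label.
WORD_LABELS = defaultdict(Counter)
for text, label in SAMPLES:
    for w in set(text.split()):
        WORD_LABELS[w][label] += 1


def classify(text, selected):
    """Sum the label counts of the selected feature words found in the text."""
    votes = Counter()
    for w in sorted(set(text.split()) & selected):
        votes.update(WORD_LABELS[w])
    return votes.most_common(1)[0][0] if votes else None


def fitness(mask):
    """Accuracy on the toy samples minus a small penalty per selected word."""
    selected = {w for w, bit in zip(VOCAB, mask) if bit}
    correct = sum(classify(text, selected) == label for text, label in SAMPLES)
    return correct / len(SAMPLES) - 0.01 * len(selected)


def select_features(generations=30, pop_size=20, mutation_rate=0.05, seed=0):
    """Evolve bit masks over VOCAB; return (selected words, fitness)."""
    rng = random.Random(seed)
    n = len(VOCAB)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the elites
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [w for w, bit in zip(VOCAB, best) if bit], fitness(best)


if __name__ == "__main__":
    words, score = select_features()
    print(words, round(score, 3))
```

The fitness here is measured on the same samples used to build the word counts, which is acceptable only for a toy illustration; a real evaluation would hold out labelled feedback for validation.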
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification numbers]: TP311.52; TP181
Article ID: 1982651
Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/1982651.html