
Rating Process and Rater Beliefs: A Study of the Internal Factors Behind Rater Variability

Published: 2018-05-31 00:30

  Topic: rater beliefs + rating process; Source: Guangdong University of Foreign Studies, doctoral dissertation, 2009


[Abstract]: In subjective tests, rater variability is one of the most important factors affecting reliability, validity, and fairness. Unlike most studies, which describe rater error statistically, this study starts from the raters themselves and examines the internal causes of their scoring differences. By comparing better and poorer raters, it seeks the internal determinants of accurate and consistent scoring, with the aim of providing empirical evidence and useful feedback for improving rater training and scoring procedures and for raising test reliability and validity. The setting is the writing component of the national College English Test Band 4 (CET-4): all participants were raters who had taken part in operational CET-4 scoring, and the rating criteria and essay prompts came from actual CET-4 administrations. The empirical study comprised three data-collection stages: independent scoring, think-aloud protocols, and open-ended semi-structured interviews. After analyzing the raters' scores statistically with the many-facet Rasch model, the author divided the raters into a better group and a poorer group according to how closely their scores agreed with expert scores. Using the verbal reports produced during the think-aloud sessions and the records of one-on-one interviews, the author then compared the two groups' thinking processes during scoring and their rating beliefs.

The analysis revealed differences between the two groups in many respects. First, different raters tended to attend to different features of the essays while scoring. Good raters attended to a more comprehensive range of features, including content, overall organization, discourse features, sentence structure, and vocabulary, whereas poorer raters attended more to isolated, scattered language features such as lexical variety, sentence length and complexity, and the use of connectives. Second, the two groups processed the information they attended to in different ways. Good raters were better at classifying language errors, summarizing information, and drawing inferences, and they monitored their own scoring process and scoring accuracy more effectively. In addition, the raters differed in their rating beliefs. The main difference lay in their knowledge and understanding of what was being rated and of the rating criteria. Compared with poorer raters, good raters defined writing ability more clearly and comprehensively. Accordingly, their definitions of the language features that reflect writing ability were more comprehensive and systematic, and they applied systematic, consistent standards when weighting those features. Their understanding and operational definitions of the abstract descriptors in the rating criteria also covered a more comprehensive set of language features. The results further showed that rating beliefs were more consistent among the good raters and were closer to the experts' expectations and to the construct definition in the test syllabus.

Through this comparison, the author attempts to link raters' scoring outcomes to their internal thinking processes and beliefs, and finds that internal differences among raters, especially differences in belief, are the root of the differences in their scoring behavior. The implication for rater training is that its purpose and focus should be to unify raters' understanding of what is being rated, of the rating instrument, and of their own responsibilities and tasks. Only when raters agree in their internal beliefs and reach a reasonably unified understanding can their scores accurately reflect the intentions of the test developers and administrators, represent the latent ability the test is meant to measure, and, in a sense, form an evaluation community.
[Degree-granting institution]: Guangdong University of Foreign Studies
[Degree level]: Doctorate
[Year degree granted]: 2009
[Classification number]: G424.74

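[Citing literature]

Related journal articles (top 3)

1 Xu Ying; A study of the operational definition of the College English writing ability construct [J]; Testing and Evaluation (College English Teaching and Research Edition); 2012, No. 06

2 Li Hang; A study of CET-6 essay rating reliability based on generalizability theory and the many-facet Rasch model [J]; Foreign Languages and Their Teaching; 2011, No. 05

3 Xu Ying; An empirical study of differences between raters of different genders [J]; Foreign Language Testing and Teaching; 2013, No. 03

Related doctoral dissertations (top 1)

1 Li Hang; The effect of the interaction between raters and rating scales on the results and process of EFL essay rating [D]; Zhejiang University; 2012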



Article ID: 1957512


Article link: http://www.sikaile.net/jiaoyulunwen/jsxd/1957512.html

