Volume 57, Issue 1

Delta: A new measure of agreement between two raters

A. Martín Andrés

Corresponding Author

Facultad de Medicina, Universidad de Granada, Spain

Bioestadistica, Facultad de Medicina, Universidad de Granada, 18071 Granada, Spain (e‐mail: amartina@ugr.es).
P. Femia Marzo

Facultad de Medicina, Universidad de Granada, Spain

First published: 16 December 2010
Citations: 46

Abstract

The most common measure of agreement for categorical data is the coefficient kappa. However, kappa performs poorly when the marginal distributions are very asymmetric, it is not easy to interpret, and its definition is based on the hypothesis of independence of the responses (which is more restrictive than the hypothesis that kappa has a value of zero). This paper defines a new measure of agreement, delta, ‘the proportion of agreements that are not due to chance’, which comes from a model of multiple‐choice tests and does not have these limitations. The paper shows that kappa and delta generally take very similar values, except when the marginal distributions are strongly unbalanced. The case of 2 × 2 tables (which admits very simple solutions) is considered in detail.
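
The sensitivity of kappa to asymmetric marginals can be illustrated with a minimal sketch. It uses the standard definition of Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected under independence. The two 2 × 2 tables below are hypothetical counts chosen for illustration and are not taken from the paper; the function name `cohen_kappa` is likewise an assumption. Delta itself is not computed here, because the abstract does not give its formula.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the main diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected if the two raters responded independently.
    p_e = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tables: both have 80 agreements out of 100 cases.
balanced = [[40, 10],
            [10, 40]]   # marginals 50/50 for both raters
skewed = [[80, 10],
          [10, 0]]      # marginals 90/10 for both raters

print(round(cohen_kappa(balanced), 3))
print(round(cohen_kappa(skewed), 3))
```

Both tables have the same observed agreement (0.8), yet kappa is clearly positive for the balanced table and slightly negative for the skewed one, because the expected agreement p_e rises from 0.5 to 0.82 when the marginals become unbalanced.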
