From missing data to informative GPA predictions: Navigating selection process beliefs with the partial identifiability approach
Corresponding Author
Eduardo Alarcón-Bustamante
Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago de Chile, Chile
Departamento de Evaluación Medición y Registro Educacional (DEMRE), Universidad de Chile, Santiago de Chile, Chile
Millennium Nucleus on Intergenerational Mobility: From Modelling to Policy (MOVI), Santiago de Chile, Chile
Interdisciplinary Laboratory of Social Statistics, Santiago de Chile, Chile
Correspondence
Eduardo Alarcón-Bustamante, Av. Vicuña Mackenna 4860, Macul, Santiago de Chile, Chile.
Email: [email protected]
Contribution: Conceptualization, Investigation, Funding acquisition, Writing - original draft, Writing - review & editing, Visualization, Validation, Methodology, Software, Formal analysis, Project administration, Resources, Supervision, Data curation
Jorge González
Millennium Nucleus on Intergenerational Mobility: From Modelling to Policy (MOVI), Santiago de Chile, Chile
Interdisciplinary Laboratory of Social Statistics, Santiago de Chile, Chile
Faculty of Mathematics, Pontificia Universidad Católica de Chile, Santiago de Chile, Chile
Contribution: Data curation, Supervision, Resources, Project administration, Software, Formal analysis, Methodology, Validation, Visualization, Writing - review & editing, Writing - original draft, Funding acquisition, Investigation, Conceptualization
David Torres Irribarra
Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago de Chile, Chile
Millennium Nucleus on Intergenerational Mobility: From Modelling to Policy (MOVI), Santiago de Chile, Chile
Interdisciplinary Laboratory of Social Statistics, Santiago de Chile, Chile
Contribution: Conceptualization, Investigation, Funding acquisition, Writing - original draft, Writing - review & editing, Visualization, Validation, Methodology, Software, Formal analysis, Project administration, Resources, Supervision, Data curation
Ernesto San Martín
Millennium Nucleus on Intergenerational Mobility: From Modelling to Policy (MOVI), Santiago de Chile, Chile
Interdisciplinary Laboratory of Social Statistics, Santiago de Chile, Chile
Faculty of Mathematics, Pontificia Universidad Católica de Chile, Santiago de Chile, Chile
The Economics School of Louvain, Université Catholique de Louvain, Brussels, Belgium
Contribution: Conceptualization, Investigation, Funding acquisition, Writing - original draft, Writing - review & editing, Visualization, Validation, Methodology, Software, Formal analysis, Project administration, Resources, Supervision, Data curation
Abstract
Predictive validity studies often use regression analyses to evaluate how well college admissions test scores forecast college grade point average (GPA). A problem in college admissions processes is that test scores are observed for all applicants, whereas GPA cannot be observed for applicants who were not selected. The standard solution has relied on strong assumptions to identify the exact value of the regression function in the presence of missing data. In this paper, we present an alternative approach based on the theory of partial identifiability that considers a variety of milder assumptions to learn about the regression function. Using a university admissions dataset, we illustrate how results can vary as a function of the assumptions that one is willing to make about the selection process.
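To make the idea of learning about the regression function without strong assumptions concrete, the following minimal sketch computes the classical worst-case (no-assumption) bounds from the partial identification literature. It is an illustration only, not the authors' implementation: the function name `worst_case_bounds`, the toy data, and the GPA range of 1 to 7 are all hypothetical. The sketch relies on the decomposition E[Y | X] = E[Y | X, selected] P(selected | X) + E[Y | X, not selected] P(not selected | X), where the unobservable term is only known to lie between the minimum and maximum possible GPA.

```python
import numpy as np


def worst_case_bounds(x, y, selected, y_min, y_max):
    """Worst-case bounds on E[Y | X = x] when Y is observed only for selected units.

    Hypothetical helper for illustration; y may hold NaN where selected is False.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    selected = np.asarray(selected, dtype=bool)

    bounds = {}
    for value in np.unique(x):
        in_cell = x == value
        # Share of applicants with this score who were selected.
        p_sel = selected[in_cell].mean()
        seen = in_cell & selected
        # Mean GPA among selected applicants (unused when nobody is selected).
        mean_obs = y[seen].mean() if seen.any() else 0.0
        identified = mean_obs * p_sel
        # The unobserved mean is replaced by its logical extremes.
        lower = identified + y_min * (1.0 - p_sel)
        upper = identified + y_max * (1.0 - p_sel)
        bounds[value.item()] = (lower, upper)
    return bounds


# Toy data (assumed): test scores, selection indicators, and GPA on a 1-7 scale.
scores = [600, 600, 600, 600, 700, 700]
is_selected = [1, 1, 0, 0, 1, 1]
gpa = [5.0, 6.0, np.nan, np.nan, 6.0, 6.5]

bounds = worst_case_bounds(scores, gpa, is_selected, y_min=1.0, y_max=7.0)
for score, (lo, hi) in bounds.items():
    print(score, lo, hi)
```

In this sketch the interval collapses to a point when every applicant with a given score is selected and widens as the share of non-selected applicants grows. Additional assumptions about the selection process would narrow the interval further.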
CONFLICT OF INTEREST STATEMENT
The authors declare that there is no conflict of interest.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in the GitHub repository of the first author at https://github.com/edalarconb/BJMSP2024.