Volume 73, Issue 1 p. 44-71
Original Article

An improved stochastic EM algorithm for large-scale full-information item factor analysis

Siliang Zhang

Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China

Yunxiao Chen

Corresponding Author

Department of Statistics, London School of Economics and Political Science, London, UK

Correspondence should be addressed to Yunxiao Chen, London School of Economics, Columbia House, Room 5.16, Houghton Street, London WC2A 2AE, UK (email: [email protected]).
Yang Liu

Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD, USA

First published: 03 December 2018

Abstract

In this paper, we explore the use of the stochastic EM algorithm (Celeux & Diebolt, 1985, Computational Statistics Quarterly, 2, 73) for large-scale full-information item factor analysis. We introduce several innovations in its implementation: an adaptive-rejection-based Gibbs sampler for the stochastic E step, a proximal gradient descent algorithm for the optimization in the M step, and diagnostic procedures for determining the burn-in size and when to stop the algorithm. These developments build on the theoretical results of Nielsen (2000, Bernoulli, 6, 457), as well as advanced sampling and optimization techniques. The proposed algorithm is computationally efficient and virtually tuning-free, which makes it scalable to large data sets with many latent traits (e.g. more than five) and easy for practitioners to use. Standard errors of the parameter estimates are obtained from the missing-information identity (Louis, 1982, Journal of the Royal Statistical Society, Series B, 44, 226). The performance of the algorithm is evaluated through simulation studies and an application to the IPIP-NEO personality inventory. Extensions of the proposed algorithm to other latent variable models are discussed.
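The alternation between a stochastic E step and an M step can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a unidimensional two-parameter logistic model with a standard normal latent trait. It replaces the adaptive-rejection-based Gibbs sampler with a single random-walk Metropolis update per person, and the proximal gradient algorithm with a few plain gradient ascent steps (no constraints or penalties). The names `stem_2pl`, `mh_scale`, `m_steps` and `lr`, and all default values, are hypothetical. Forming the final estimate by averaging the iterates after a fixed burn-in is also an assumption of the sketch; the abstract's burn-in and stopping diagnostics are not reproduced.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def loglik_per_person(y, theta, a, d):
    # Complete-data log-likelihood of each person's responses.
    # y: (N, J) binary responses; theta: (N,); a, d: (J,).
    z = np.outer(theta, a) + d
    return (y * z - np.logaddexp(0.0, z)).sum(axis=1)

def stem_2pl(y, n_iter=150, burn_in=50, mh_scale=1.0, m_steps=5, lr=1.0, seed=0):
    # Illustrative stochastic EM for a unidimensional 2PL model (assumed setup).
    rng = np.random.default_rng(seed)
    n, j = y.shape
    a, d = np.ones(j), np.zeros(j)   # slopes and intercepts
    theta = np.zeros(n)              # current latent trait draws
    trace = []
    for _ in range(n_iter):
        # Stochastic E step: one random-walk Metropolis update per person,
        # targeting the posterior of theta under a N(0, 1) prior.
        prop = theta + mh_scale * rng.standard_normal(n)
        log_ratio = (loglik_per_person(y, prop, a, d) - 0.5 * prop**2
                     - loglik_per_person(y, theta, a, d) + 0.5 * theta**2)
        accept = np.log(rng.uniform(size=n)) < log_ratio
        theta = np.where(accept, prop, theta)
        # M step: a few gradient ascent steps on the mean complete-data
        # log-likelihood, treating the sampled theta as observed.
        for _ in range(m_steps):
            resid = y - sigmoid(np.outer(theta, a) + d)
            a = a + lr * (resid * theta[:, None]).mean(axis=0)
            d = d + lr * resid.mean(axis=0)
        trace.append(np.concatenate([a, d]))
    # Average the post-burn-in iterates to form the final estimate.
    est = np.mean(trace[burn_in:], axis=0)
    return est[:j], est[j:]

# Simulated data for illustration only (all values below are made up).
rng = np.random.default_rng(1)
n_persons, n_items = 1000, 8
a_true = rng.uniform(0.8, 1.8, size=n_items)
d_true = np.linspace(-1.5, 1.5, n_items)
theta_true = rng.standard_normal(n_persons)
prob = sigmoid(np.outer(theta_true, a_true) + d_true)
y = (rng.uniform(size=(n_persons, n_items)) < prob).astype(float)

a_hat, d_hat = stem_2pl(y)
print(np.round(a_hat, 2))
print(np.round(d_hat, 2))
```

Because each iteration needs only one draw of the latent traits rather than a numerical integral over them, the per-iteration cost of a scheme like this grows slowly with the number of latent traits, which is the property the abstract emphasizes for large-scale problems.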