Volume 71, Issue 2
Original Article

Approximated adjusted fractional Bayes factors: A general method for testing informative hypotheses

Xin Gu

Corresponding Author

E-mail address: xin.gu@liverpool.ac.uk

Department of Methodology and Statistics, Utrecht University, The Netherlands

Department of Geography and Planning, University of Liverpool, UK

Correspondence should be addressed to Xin Gu, Department of Geography and Planning, University of Liverpool, Roxby Building, Liverpool L69 7ZT, UK (email: xin.gu@liverpool.ac.uk).
Joris Mulder

Department of Methodology and Statistics, Tilburg University, The Netherlands

Herbert Hoijtink

Department of Methodology and Statistics, Utrecht University, The Netherlands

CITO Institute for Educational Measurement, Arnhem, The Netherlands

First published: 31 August 2017

Abstract

Informative hypotheses are increasingly being used in psychological sciences because they adequately capture researchers’ theories and expectations. In the Bayesian framework, the evaluation of informative hypotheses often makes use of default Bayes factors such as the fractional Bayes factor. This paper approximates and adjusts the fractional Bayes factor such that it can be used to evaluate informative hypotheses in general statistical models. In the fractional Bayes factor a fraction parameter must be specified which controls the amount of information in the data used for specifying an implicit prior. The remaining fraction is used for testing the informative hypotheses. We discuss different choices of this parameter and present a scheme for setting it. Furthermore, a software package is described which computes the approximated adjusted fractional Bayes factor. Using this software package, psychological researchers can evaluate informative hypotheses by means of Bayes factors in an easy manner. Two empirical examples are used to illustrate the procedure.

1 Introduction

One of the objectives of psychological studies is to test hypotheses that represent scientific expectations. The main tool available for this purpose is null hypothesis significance testing where the goal is to falsify a null hypothesis of ‘no effect’. On the other hand, psychologists may expect, for example, that the learning ability of children is stronger than the learning ability of adolescents, which in turn is stronger than the learning ability of adults, or it is expected that a patient's psychological disease would decrease after the first therapy, and decrease further after subsequent therapies. These expectations cannot be formulated by the traditional null hypothesis. Instead, such expectations can be translated to so‐called informative hypotheses which assume a specific structure of the model parameters (Hoijtink, 2012). An informative hypothesis consists of equality and/or inequality constraints among the parameters of interest in a statistical model. For example, three equal parameters can be represented by an equality constrained hypothesis $H_0: \theta_1 = \theta_2 = \theta_3$, and three ordered parameters can be represented by an inequality constrained hypothesis $H_1: \theta_1 > \theta_2 > \theta_3$. This class of informative hypotheses covers a much broader range of scientific expectations than the class of standard null hypotheses. In addition, by testing competing informative hypotheses directly against each other a researcher obtains a direct answer as to which scientific theory is most supported by the data. The interested reader is referred to http://informative-hypotheses.sites.uu.nl/ for an overview of psychological research in which informative hypotheses were used.

Informative hypothesis testing has drawn a lot of attention both in frequentist statistics (Barlow, Bartholomew, Bremner, & Brunk, 1972; Silvapulle & Sen, 2004) and in Bayesian statistics (Hoijtink, 2012). In the frequentist framework, hypothesis testing with inequality constraints has been studied for over 50 years, starting with (Bartholomew, 1959). Some recent contributions can be found in van de Schoot, Hoijtink, and Deković (2010), and Klugkist, Bullens, and Postma (2012). Bayesian evaluation of informative hypotheses by means of the Bayes factor is relatively new. A decade ago, Klugkist, Laudy, and Hoijtink (2005) started using Bayes factors to evaluate inequality constrained hypotheses in ANOVA models. Follow‐up research appeared in Klugkist and Hoijtink (2007) for Bayesian testing of inequality and about equality constrained hypotheses, in Mulder, Klugkist, van de Schoot, Meeus, Selfhout, and Hoijtink (2009) for Bayesian informative hypothesis testing in repeated measures models, in Klugkist, Laudy, and Hoijtink (2010) for Bayesian evaluation of equality and inequality constrained hypotheses in contingency tables, and in Mulder, Hoijtink, and Klugkist (2010) for Bayesian model selection of equality and inequality constrained hypotheses in the context of multivariate normal linear models. Developments in the use of Bayes factors for informative hypothesis testing are summarized in Hoijtink (2012). However, these studies are limited to assessing informative hypotheses in specific models and cannot yet be applied in other models, such as confirmatory factor analysis or logistic regression. 
More recently, van de Schoot, Hoijtink, Hallquist, and Boelen (2012) have enabled researchers to test inequality constrained hypotheses in structural equation models, Gu, Mulder, Deković, and Hoijtink (2014) have shown how to evaluate inequality constrained hypotheses in general statistical models, and Böing‐Messing, van Assen, Hofman, Hoijtink, and Mulder (2017) have enabled researchers to test informative hypotheses on group variances. Furthermore, the usefulness of the Bayes factor for testing hypotheses in psychological research has been highlighted in various studies in a special issue on the topic (Mulder & Wagenmakers, 2016). Although these studies enable hypothesis testing in a large number of statistical models using the Bayes factor, the available methods for testing hypotheses with both equality and inequality constraints are still limited.

The incessant debate between frequentist and Bayesian hypothesis testing (Wagenmakers, 2007) has highlighted an advantage of the Bayes factor: it quantifies the relative support in the data for one hypothesis against another (Kass & Raftery, 1995). This cannot be done using classical p‐values. Psychological researchers can quantify how much the data favour a hypothesis relative to another hypothesis by means of the Bayes factor. However, the popularity of the Bayes factor is limited for two reasons: the specification of the prior can be a difficult task, especially when prior information is weak or completely unavailable; and the computation can be very intensive when the statistical model is complex. To overcome these barriers, Bayesian statisticians have presented several default Bayes factors based on default priors. Default priors usually do not reflect subjective prior beliefs and have distributional forms chosen such that the Bayes factor can easily be computed. Examples of default Bayes factors are the JZS Bayes factor (Jeffreys, 1961; Rouder, Speckman, Sun, Morey, & Iverson, 2009; Zellner & Siow, 1980), partial Bayes factors (de Santis & Spezzaferri, 1999), the Bayes factor based on expected posterior priors (Pérez & Berger, 2002), the intrinsic Bayes factor (Berger & Pericchi, 1996) and the fractional Bayes factor (O'Hagan, 1995). The last two Bayes factors are closely related to the partial Bayes factor.

In the partial Bayes factor the data are split into two parts: one part is used as a training sample to update an improper non‐informative prior distribution, and the remaining part is used to compute the Bayes factor. The training sample is proper if it renders a proper updated prior. Furthermore, the training sample is called minimal if any of its subsets is not proper (Berger & Pericchi, 2004). Both the intrinsic Bayes factor and the fractional Bayes factor use the partial Bayes factor method (de Santis & Spezzaferri, 1997, 1999). The intrinsic Bayes factor is an average of the partial Bayes factors based on all possible minimal training samples. Because of the use of all possible minimal training samples, the computation of the intrinsic Bayes factor can be intensive especially when the sample size and the size of the minimal training sample are large. Alternatively, the fractional Bayes factor takes a small fraction b of the likelihood of the complete data (O'Hagan, 1995). The updated proper prior in the fractional Bayes approach is then implicitly specified from a non‐informative prior and a fraction of full likelihood (de Santis & Spezzaferri, 1999; Gilks, 1995; Moreno, 1997; Mulder, 2014b). In this paper, we shall refer to updated priors following from the fractional Bayes methodology as fractional priors. The remaining fraction of the likelihood is then used for testing the hypotheses of interest. As will be shown in this paper, the fractional Bayes factor is computationally easy. Recently, Fouskakis, Ntzoufras, and Draper (2015) presented power expected posterior priors, which are similar to fractional priors in the sense that both of them are specified using a fraction of a likelihood function. 
The main difference is that the fractional prior comes from a fraction of the likelihood of the observed data, whereas the power expected posterior prior follows from a fraction of the likelihood of imaginary training data coming from a prior predictive distribution.

In this paper we focus on the fractional Bayes factor as it stands out for its convenience of evaluating informative hypotheses (Mulder, 2014b). Recently, Mulder (2014b) proposed an adjustment of the fractional Bayes factor where the fractional prior was shifted around the null value. This approach resulted in an adjusted fractional Bayes factor that converges faster to a true inequality constrained hypothesis. However, the current applications of (adjusted) fractional Bayes factors in informative hypothesis testing are still within the class of multivariate normal linear models.

This paper proposes an approximation of a fractional Bayes factor to extend its applicability to testing informative hypotheses for more general models. These models can be generalized linear (mixed) models (McCulloch & Searle, 2001) such as logistic regression models and multilevel models, and structural equation models (Kline, 2011) such as path models, confirmatory factor analysis models and latent class models. Due to large‐sample theory (Gelman, Carlin, Stern, & Rubin, 2004, pp. 101–107), the posterior distribution of the parameters in each model can be approximated by a (multivariate) normal distribution. This paper also approximates the implicit fractional prior with a (multivariate) normal distribution as a general methodology to ensure a fast computation of the (adjusted) fractional Bayes factor. Based on these approximations, we can approximate a fractional Bayes factor to evaluate informative hypotheses in general statistical models. In addition, we discuss different choices of the fraction (Gu, Mulder, & Hoijtink, 2016; O'Hagan, 1995), which is a tuning parameter in the fractional prior, and provide a guideline for choosing this fraction. Furthermore, an important issue in Bayesian hypothesis testing is the consistency of the Bayesian procedure. Previous studies have discussed the consistency of the intrinsic Bayes factor (Casella, Giron, & Moreno, 2009), the fractional Bayes factor (de Santis & Spezzaferri, 2001; O'Hagan, 1997), and posterior model probabilities (Moreno, Giron, & Casella, 2015). In this paper, the consistency of the approximated adjusted fractional Bayes factor (AAFBF) will be elaborated and illustrated.

This paper is organized as follows. Section 2 introduces the informative hypothesis in general statistical models, and illustrates how the informative hypothesis is constructed based on researchers’ expectations by means of two empirical examples. Section 3 elaborates the specification of the adjusted fractional prior and the posterior distribution using normal approximations. Based on the specified prior and posterior distributions, the AAFBF is derived and a software package is presented for the evaluation of informative hypotheses in general statistical models. In Section 4 we discuss different choices of the fraction, and conduct a sensitivity study for the fractional Bayes factors with those choices. Section 5 revisits the two empirical examples to show how to evaluate informative hypotheses using the proposed fractional Bayes factors. Section 6 concludes.

2 Informative hypotheses in general statistical models

A statistical model is described by the likelihood function $f(X \mid \theta, \zeta)$, where $X$ denotes the data, $\theta$ contains the parameters that are used to specify informative hypotheses, and $\zeta$ contains the nuisance parameters. Informative hypotheses are constructed using equality and/or inequality constraints based on the theories or expectations of researchers. The general form of the informative hypothesis is given by
$$H_i: S_i\theta = s_i, \quad R_i\theta > r_i, \tag{1}$$
where $S_i$ and $R_i$ are the restriction matrices for equality and inequality constraints in $H_i$, respectively, and $s_i$ and $r_i$ contain constants. Note that the number of rows in $S_i$ equals the number of equality constraints, the number of rows in $R_i$ equals the number of inequality constraints, and the numbers of columns in $S_i$ and $R_i$ equal the length of $\theta$.
For example, hypothesis urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0017 corresponds to
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0018
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0019
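As a minimal sketch of how equality and inequality constraints can be encoded with restriction matrices, the Python snippet below assumes the constraints are written as S θ = s and R θ > r. The hypothesis θ1 = θ2 > θ3, the function name `satisfies` and the tolerance `tol` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def satisfies(theta, S, s, R, r, tol=1e-8):
    """Return True if theta meets S @ theta = s (up to tol) and R @ theta > r."""
    theta = np.asarray(theta, dtype=float)
    eq_ok = np.all(np.abs(S @ theta - s) <= tol)   # equality constraints
    ineq_ok = np.all(R @ theta > r)                # strict inequality constraints
    return bool(eq_ok and ineq_ok)

# Illustrative hypothesis (an assumption for this sketch): theta1 = theta2 > theta3
S = np.array([[1.0, -1.0, 0.0]])   # one row per equality: theta1 - theta2 = 0
s = np.array([0.0])
R = np.array([[0.0, 1.0, -1.0]])   # one row per inequality: theta2 - theta3 > 0
r = np.array([0.0])

in_agreement = satisfies([2.0, 2.0, 1.0], S, s, R, r)
not_in_agreement = satisfies([2.0, 2.0, 3.0], S, s, R, r)
```

Each row of `S` or `R` holds the coefficients of one constraint, so the number of columns equals the number of parameters in θ.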

Note that a range constraint, in which the parameters of interest are constrained between two values, can be written as two inequality constraints. For example, hypothesis $H_1: 0 < \theta < 1$ can be expressed by $R_1\theta > r_1$, where $R_1 = (1, -1)^T$ and $r_1 = (0, -1)^T$. This hypothesis can be seen as one where it is expected that θ is approximately equal to 0.5 with maximal deviation of 0.5, that is, $|\theta - 0.5| < 0.5$, where the maximal deviation of 0.5 should be specified subjectively by the user.

An informative hypothesis $H_i$ can be tested against the unconstrained hypothesis
$$H_u: \theta \text{ unconstrained}, \tag{2}$$
against its complement
$$H_{ic}: \text{not } H_i, \tag{3}$$
which expresses what a researcher does not expect, or against another informative hypothesis
$$H_{i'}: S_{i'}\theta = s_{i'}, \quad R_{i'}\theta > r_{i'}. \tag{4}$$
It should be noted that when an informative hypothesis Hi contains at least one equality constraint, the complement of Hi is the same as the unconstrained hypothesis Hu.

Before evaluating the informative hypotheses, the parameters of interest may need to be standardized in some situations. The need for standardization depends on the statistical model and informative hypothesis under evaluation. On the one hand, the parameters have to be standardized when comparing, for example, coefficients in regression models and factor loadings in confirmatory factor analysis. For example, testing whether the regression coefficient θ1 is larger than θ2 requires the standardization of θ1 and θ2, because a large coefficient can also result from a large scale of the corresponding predictor. On the other hand, it may not be necessary to standardize the parameters urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0029 if they are compared to constants, and it is undesirable to standardize the parameters urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0030 if they represent means. For instance, testing whether a regression coefficient is larger than 0 or testing whether the mean of group 1 is smaller than the mean of group 2 does not require standardization. If standardization is required, Gu et al. (2014) discussed two ways to do this: (1) standardize all observed and latent variables, or (2) use standardized parameters. In the situation considered by Gu et al. (2014), there was little difference between the performances of the two methods. Therefore, researchers can use either of them if necessary.

In what follows, we will use two empirical examples to illustrate how researchers’ expectations can be expressed by informative hypotheses.

2.1 Example 1: Multiple regression

The first example concerns a multiple regression model used in Guber (1999) to investigate the relation between the educational costs of a school and the academic performance of the students. The data were collected in 50 US states (available at www.amstat.org/publications/jse/secure/v7n2/datasets.guber.cfm). The performance of the students is measured by the average total SAT score yi, ranging from 400 to 1,600. Its predictors are the average public school expenditure x1i, the percentage of students taking the SAT exams x2i, and the average pupil–teacher ratio x3i. The descriptives for the dependent variable yi and independent variables x1i, x2i and x3i are shown in Table 1.

Table 1. Descriptives for variables in regression model

                     yi       x1i     x2i     x3i
Mean                 965.92   5.91    35.24   16.86
Standard deviation   74.82    1.36    26.76   2.27

The relationship between student performance and its predictors is given by the regression model
$$y_i = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} + \theta_3 x_{3i} + \varepsilon_i, \tag{5}$$
where θ0 is the intercept, θ1, θ2 and θ3 are the regression coefficients, and εi ~ N(0, σ2) denotes the residuals, with σ2 being the residual variance. For this regression model, the likelihood is
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0032(6)
where n = 50 denotes the sample size, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0033 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0034.
Guber (1999) theorized that higher education expenditures result in better student performance in SAT exams, which implies that the coefficient θ1 of the predictor x1i is positive. In addition, in those states with a small percentage of students taking SATs, the students are expected to do well because they have self‐selected into the SAT exam, which is only required by universities with high prestige. This implies that the coefficient θ2 of the predictor x2i is negative. Furthermore, although a lower pupil–teacher ratio would be associated with better performance, a school needs to spend more money on education and therefore this predictor overlaps with the expenditures. This suggests that the coefficient θ3 of predictor x3i is zero. Consequently, we specify the informative hypothesis
$$H_1: \theta_1 > 0, \quad \theta_2 < 0, \quad \theta_3 = 0, \tag{7}$$
with $\theta = (\theta_1, \theta_2, \theta_3)^T$, $S_1 = (0, 0, 1)$, $s_1 = 0$, $R_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}$, and $r_1 = (0, 0)^T$ in the general form (1). Hypothesis H1 can be tested against its complement
$$H_{1c}: \text{not } H_1. \tag{8}$$

2.2 Example 2: Repeated measures ANOVA

We reanalyse the example of the repeated measures ANOVA used in Howell (2012, p. 462) based on an experiment with relaxation therapy. The experiment investigated the duration of nine patients’ migraine headaches before and after relaxation training. The duration of headaches is measured by the number of hours per week. Our example uses the data for the last 2 weeks of the baseline where patients received no training and the last 2 weeks of training. Therefore, the data shown in Table 2 consist of four dependent variables: the number of hours with a headache per week for nine patients in each of 4 weeks.

Table 2. Data in repeated measures ANOVA

           Baseline           Training
Subject    Week 1   Week 2    Week 3   Week 4
1          21       22        6        6
2          20       19        4        4
3          17       15        4        5
4          25       30        12       17
5          30       27        8        6
6          19       27        7        4
7          26       16        2        5
8          17       18        1        5
9          26       24        8        9

The random effects model for these dependent variables is (Hox, 2010, p. 83)
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0043(9)
where yij, for i = 1, … , 9 and j = 1, … , 4, denotes the four dependent variables, μ denotes the grand mean, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0044 denotes the random difference for person i, which is constant across j, τj denotes the fixed measurement difference for week j, which is constant across i, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0045 is the measurement error with respect to person i and week j. To investigate the effect of relaxation training, we specify the individual differences with a random effect and the treatment differences with a fixed effect. Thus, the mean for each measurement is
$$\mu_j = \mu + \tau_j, \tag{10}$$
and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0047.
The researchers expected a reduction of the duration of headaches after relaxation training. Furthermore, it is reasonable to expect that the mean durations are equal in the first 2 weeks of baseline and in the last 2 weeks of training to ensure that other factors do not influence the duration of headaches. These expectations can be expressed by the informative hypothesis
$$H_1: \mu_1 = \mu_2 > \mu_3 = \mu_4, \tag{11}$$
with $\theta = (\mu_1, \mu_2, \mu_3, \mu_4)^T$, $S_1 = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}$, $s_1 = (0, 0)^T$, $R_1 = (0, 1, -1, 0)$, and $r_1 = 0$ in the general form (1). We compare this hypothesis to another informative hypothesis that the mean number of headache hours continually declines in the 4 weeks:
$$H_2: \mu_1 > \mu_2 > \mu_3 > \mu_4, \tag{12}$$
which only contains inequality constraints $R_2\theta > r_2$ with $r_2 = (0, 0, 0)^T$ and
$$R_2 = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}.$$

The informative hypotheses constructed in these examples can be evaluated using Bayes factors, which will be elaborated in the next section. We will revisit these examples in Section 5 to display the results of the evaluation of these informative hypotheses.

3 Approximated adjusted fractional Bayes factors

The Bayes factor is the cornerstone of Bayesian hypothesis testing. It quantifies the relative evidence in the data for one hypothesis against another. The Bayes factor of an informative hypothesis $H_i$ against another informative hypothesis $H_{i'}$ is defined as the ratio of their marginal likelihoods (Jeffreys, 1961; Kass & Raftery, 1995):
$$BF_{ii'} = \frac{m_i(X)}{m_{i'}(X)}. \tag{13}$$

In Bayesian hypothesis testing, the Bayes factor has a direct interpretation as the relative evidence from the data for one hypothesis against another. If $BF_{ii'} > 1$ ($BF_{ii'} < 1$), this implies that hypothesis $H_i$ ($H_{i'}$) receives more support from the data. Specifically, if $BF_{ii'} = 5$, then the support for Hi is five times larger than for $H_{i'}$. For researchers who are new to Bayes factors we recommend using the guidelines for their interpretation as provided by Kass and Raftery (1995). The degree of evidence in favour of Hi can be classified as unconvincing for $1 < BF_{ii'} < 3$, positive for $3 < BF_{ii'} < 20$, strong for $20 < BF_{ii'} < 150$, and very strong for $BF_{ii'} > 150$. However, these rules for interpreting Bayes factors are not strict and can differ in different contexts.
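The verbal labels of Kass and Raftery (1995) can be written as a small lookup. The sketch below is only a convenience illustration: the function name `evidence_category`, the treatment of the boundary values 3, 20 and 150, and the label returned for Bayes factors at or below 1 are assumptions made here.

```python
def evidence_category(bf):
    """Map a Bayes factor in favour of H_i to a Kass-Raftery style label."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf <= 1:
        # Assumed label: the data do not favour H_i over the other hypothesis.
        return "no support for H_i"
    if bf <= 3:
        return "unconvincing"
    if bf <= 20:
        return "positive"
    if bf <= 150:
        return "strong"
    return "very strong"

label = evidence_category(5.0)
```

As the text notes, such cut-offs are guidelines rather than strict rules, so the labels should be read alongside the Bayes factor itself.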

The informative hypothesis Hi is nested in the unconstrained hypothesis Hu which does not contain any constraints on θ. When comparing Hi to Hu we can use the encompassing prior approach of Klugkist et al. (2005) where a prior is constructed under Hi via a truncation of the unconstrained (or encompassing) prior πu (θ, ζ) under Hu. The prior under Hi is then given by urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0072, where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0073 is a normalizing constant, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0074 is the parameter space of θ in agreement with the informative hypothesis Hi. Consequently, the Bayes factor for the informative hypothesis against the unconstrained hypothesis can be expressed as

urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0075(14)
where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0076 is the posterior distribution of θ and ζ under Hu. For example, for hypothesis H1: θ1 > 0, θ2 < 0, θ3 = 0 in 7 with equality and inequality constraints, where we denote urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0077 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0078, the Bayes factor of H1 against the unconstrained alternative in 14 comes down to
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0079(15)
Note further that for a hypothesis with only equality constraints, such as H0: θ1 = θ2 = θ3 = 0, expression 13 is equal to the well‐known Savage–Dickey density ratio (Dickey, 1971; Wetzels, Grasman, & Wagenmakers, 2010),
$$BF_{0u} = \frac{\pi_u(\theta_1 = 0, \theta_2 = 0, \theta_3 = 0 \mid X)}{\pi_u(\theta_1 = 0, \theta_2 = 0, \theta_3 = 0)}. \tag{16}$$
Finally, for a hypothesis with only inequality constraints, say, H2: θ1 > θ2 > θ3 > 0, expression 13 is equal to the ratio of posterior and prior probabilities that the inequality constraints hold under Hu,
$$BF_{2u} = \frac{\Pr(\theta_1 > \theta_2 > \theta_3 > 0 \mid X)}{\Pr(\theta_1 > \theta_2 > \theta_3 > 0)}. \tag{17}$$
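For an inequality constrained hypothesis, the ratio of posterior to prior probability can be estimated by simple Monte Carlo sampling. The sketch below assumes that both the unconstrained prior and the unconstrained posterior are multivariate normal. The prior and posterior means and covariances are hypothetical values chosen for illustration, and `prob_constraints` is a helper name introduced here.

```python
import numpy as np

def prob_constraints(mean, cov, R, r, n_draws=200_000, seed=0):
    """Estimate Pr(R @ theta > r) for theta ~ N(mean, cov) by Monte Carlo."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    return float(np.mean(np.all(draws @ R.T > r, axis=1)))

# H2: theta1 > theta2 > theta3 > 0, written as R @ theta > r
R = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(3)

# Hypothetical unconstrained prior and posterior (not taken from the paper)
prior_mean, prior_cov = np.zeros(3), 10.0 * np.eye(3)
post_mean, post_cov = np.array([3.0, 2.0, 1.0]), 0.1 * np.eye(3)

prior_prob = prob_constraints(prior_mean, prior_cov, R, r)
post_prob = prob_constraints(post_mean, post_cov, R, r)
bf_2u = post_prob / prior_prob
```

With a prior that is symmetric around zero, the prior probability of this ordering is 1/48 (one of eight sign patterns times one of six orderings), which gives a quick sanity check on the sampler.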

Thus, in order to compute the Bayes factor the unconstrained prior and corresponding unconstrained posterior need to be determined, and subsequently the unconstrained prior and posterior need to be integrated over the constrained region under the informative hypothesis. In this section we propose a novel and general approach using normal distributions to approximate the unconstrained posterior and the unconstrained fractional prior to compute default Bayes factors.

3.1 Fractional prior and posterior

To avoid ad hoc or subjective specification of the unconstrained prior, we consider the approach of O'Hagan (1995), referred to as the fractional Bayes factor. A proper default prior is automatically generated by updating a non‐informative improper prior $\pi_u^N(\theta, \zeta)$ using a fraction b of the likelihood (Gilks, 1995). In the fractional Bayes factor the marginal likelihood of the hypothesis Hu is defined by
$$m_u(X, b) = \iint f(X \mid \theta, \zeta)^{1-b}\, \pi_u(\theta, \zeta \mid X^b)\, d\theta\, d\zeta, \tag{18}$$
where the proper default prior is defined by
$$\pi_u(\theta, \zeta \mid X^b) = \frac{f(X \mid \theta, \zeta)^{b}\, \pi_u^N(\theta, \zeta)}{\iint f(X \mid \theta, \zeta)^{b}\, \pi_u^N(\theta, \zeta)\, d\theta\, d\zeta}. \tag{19}$$
We shall refer to 19 as the fractional prior. Note that the marginal likelihood in the fractional Bayes factor in 18 is closely related to the marginal likelihood in the partial Bayes factor, where a proper default prior is obtained by training a non‐informative prior with a small subset of the data, called a training sample, X(l), while the remaining part of the data, say, X(−l), is used for computing the marginal likelihood. The marginal likelihood in the fractional Bayes factor also follows this idea, but takes a fraction b of the data, denoted by Xb, to train a non‐informative prior and then uses the remaining fraction of the data, X1–b, for computing the marginal likelihood in 18. The advantage of the fractional Bayes factor is that it does not depend on the exact choice of the subset of the data because a fraction of the complete data is used (de Santis & Spezzaferri, 1999; O'Hagan, 1995).
Following similar steps to 14 and integrating out the nuisance parameters, the fractional Bayes factor of an informative hypothesis against the unconstrained hypothesis is given by (Mulder, 2014b)
$$BF_{iu} = \frac{\int_{\theta \in \Theta_i} \pi_u(\theta \mid X)\, d\theta}{\int_{\theta \in \Theta_i} \pi_u(\theta \mid X^b)\, d\theta}. \tag{20}$$

3.2 Normal approximations of the fractional prior and posterior distributions

Due to large‐sample theory (e.g., Gelman et al., 2004, p. 101), the marginal posterior in the numerator of 20 can be approximated using a normal distribution where the mean is equal to the maximum likelihood estimate and the covariance matrix is equal to the inverse of the Fisher information matrix,
$$\pi_u(\theta \mid X) \approx N(\hat{\theta}, \hat{\Sigma}_\theta), \tag{21}$$

where $\hat{\theta}$ and $\hat{\Sigma}_\theta$ denote the maximum likelihood estimate and covariance matrix of θ, respectively. Note that $\hat{\theta}$ and $\hat{\Sigma}_\theta$ can be obtained using statistical software such as Mplus (Muthén & Muthén, 2010) or the R package lavaan (Rosseel, 2012). This will be further elaborated when we return to the empirical examples in Section 5.

The fractional prior in the denominator of 20 is also centred around the maximum likelihood estimate. However, it is based on a fraction b of the data, which implies an approximated covariance matrix of $\hat{\Sigma}_\theta / b$. Consider, for example, a normally distributed data set $x_j \sim N(\theta, \sigma^2)$, $j = 1, \dots, n$, with known σ2. The posterior of θ is given by $N(\hat{\theta}, \sigma_\theta^2)$, where $\hat{\theta}$ equals the sample mean $\bar{x}$ and $\sigma_\theta^2 = \sigma^2 / n$. In this setting the fractional prior of θ would be $N(\hat{\theta}, \sigma_\theta^2 / b)$. For this reason we propose to approximate the fractional prior according to
$$\pi_u(\theta \mid X^b) \approx N(\hat{\theta}, \hat{\Sigma}_\theta / b). \tag{22}$$
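The normal example can be verified in a few lines. The derivation below is a sketch that assumes a flat improper prior $\pi^N(\theta) \propto 1$ for the mean; that choice of non-informative prior is an assumption made here for illustration.

```latex
% Fractional prior for x_j ~ N(theta, sigma^2), j = 1, ..., n, with sigma^2 known,
% assuming the improper prior pi^N(theta) \propto 1.
\begin{align*}
f(X \mid \theta)^{b}
  &\propto \exp\left\{ -\frac{b}{2\sigma^{2}} \sum_{j=1}^{n} (x_j - \theta)^{2} \right\} \\
  &\propto \exp\left\{ -\frac{nb}{2\sigma^{2}} (\theta - \bar{x})^{2} \right\},
\end{align*}
% because \sum_j (x_j - \theta)^2 = \sum_j (x_j - \bar{x})^2 + n(\theta - \bar{x})^2
% and the first term does not depend on theta. Hence
\begin{equation*}
\pi(\theta \mid X^{b}) = N\!\left(\bar{x}, \frac{\sigma^{2}}{nb}\right),
\qquad
\pi(\theta \mid X) = N\!\left(\bar{x}, \frac{\sigma^{2}}{n}\right) \quad (b = 1).
\end{equation*}
```

The fractional prior therefore has the same mean as the posterior and a variance that is larger by a factor 1/b, which is the pattern the normal approximation carries over to the multivariate case.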

3.3 Adjusting the prior mean

Various authors have suggested centring the prior distribution of θ around the focal point of interest; see, for example, Zellner and Siow (1980) and Jeffreys (1961, pp. 268–274) for null hypothesis testing, and Mulder (2014b) for testing informative hypotheses. Suppose, for example, we evaluate H1: θ ≤ 0 against its complement H2: θ > 0. By constructing the priors for θ under H1 and H2 as a truncation of an unconstrained prior that is centred around the focal point 0, the prior distributions for θ under both hypotheses are essentially equivalent; the only difference is the sign. Furthermore, by centring the prior at 0 it is assumed that small effects are more likely a priori than large effects, which is often the case in practice. A more detailed discussion on centring prior means can be found in Mulder (2014b). In this paper, we adjust the prior in 22 as follows:

$$\pi_u^*(\theta \mid X^b) = N(\theta^*, \hat{\Sigma}_\theta / b) \tag{23}$$

where the adjusted prior mean is given by urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0100. For each informative hypothesis, one can define a parameter space urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0101 which contains one or more θ*. For example, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0102 results in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0103, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0104 results in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0105 in which urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0106 can be any value. Note the suggestion that the prior mean for parameters in a range constrained hypothesis is in the middle of the range space (Mulder, Hoijtink, & de Leeuw, 2012), because a range constraint basically implies an approximate equality, which in terms of a restriction for the prior mean becomes an equality. For example, the range constraint −0.2 < θ < 0.2 corresponds to the approximate equality θ ≈ 0 with maximal deviation of 0.2. Thus, the focal point is 0, and therefore we set the prior mean to θ* = 0. Below we will deal with the choice of θ*.

The prior distribution proposed in 23 depends on the informative hypothesis under evaluation, because the prior mean θ* is located on the boundary of the constrained region of the informative hypothesis. When two or more informative hypotheses are under comparison, the intersection of their constrained regions must be non‐empty so that a common unconstrained prior mean θ* exists to evaluate all informative hypotheses against the unconstrained hypothesis. A set of informative hypotheses Hi, i = 1, …, I, is comparable if there exists at least one solution of θ to the set of equations
$$R_{i0}\theta = r_{i0}, \quad R_{i1}\theta = r_{i1}, \quad i = 1, \ldots, I \tag{24}$$
(Mulder et al., 2010). The solution of θ for these equations defines the parameter space urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0108. Examples of comparable hypotheses are H1: θ = 0 versus H2: θ > 0 and H3: θ1 > θ2 > θ3 > 0 versus H4: θ3 > θ2 > θ1. Hypotheses H5: θ1 = θ2 versus H6: θ1 > θ2 + 1 are not comparable because there is no solution of θ1 and θ2 for equations θ1 = θ2 and θ1 = θ2 + 1. It should be noted that the hypothesis H7: θ1 > 0, θ2 > 0, θ2 > θ1−1 cannot be properly evaluated yet because a solution does not exist for equations θ1 = 0, θ2 = 0, and θ2 = θ1−1.
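The comparability condition amounts to checking whether a linear system has at least one solution. The following is a minimal sketch, not part of BaIn: it assumes each constraint has already been rewritten as an equality (every inequality replaced by `=`), and the function name `has_common_solution` is hypothetical. It uses exact rational arithmetic so that consistency is not affected by rounding.

```python
from fractions import Fraction


def has_common_solution(rows, rhs):
    """Return True if the linear system rows * theta = rhs is consistent."""
    # Augmented matrix [rows | rhs] in exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    n_cols = len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, len(aug)) if aug[r][col] != 0), None)
        if pivot is None:
            continue
        aug[pivot_row], aug[pivot] = aug[pivot], aug[pivot_row]
        # Eliminate this column from every other row.
        for r in range(len(aug)):
            if r != pivot_row and aug[r][col] != 0:
                factor = aug[r][col] / aug[pivot_row][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    # The system is inconsistent if some row reads 0 = non-zero.
    return not any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in aug)


# H1: theta = 0 versus H2: theta > 0 (both reduce to theta = 0).
h1_h2 = has_common_solution([[1], [1]], [0, 0])
# H5: theta1 = theta2 versus H6: theta1 > theta2 + 1.
h5_h6 = has_common_solution([[1, -1], [1, -1]], [0, 1])
# H7: theta1 > 0, theta2 > 0, theta2 > theta1 - 1.
h7 = has_common_solution([[1, 0], [0, 1], [-1, 1]], [0, 0, -1])
```

Under these assumptions `h1_h2` is true, while `h5_h6` and `h7` are false, in line with the examples in the text.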

Adjusting the prior mean from $\hat{\theta}$ to θ* results in a slight change of the posterior for θ. In particular, the posterior mean of θ would be slightly shifted towards the prior mean θ*. Large‐sample theory, however, dictates that the prior has a negligible effect on the posterior for large samples. Therefore, we leave the approximated posterior for θ, given by $\pi_u(\theta \mid X) \approx N(\hat{\theta}, \hat{\Sigma}_\theta)$, unaltered. Note that a similar argument is used in the Bayesian information criterion approximation of the Bayes factor (Kass & Raftery, 1995; Schwarz, 1978).

Based on the adjusted fractional prior distribution 23 and the posterior distribution 21, the AAFBF for an informative hypothesis versus the unconstrained hypothesis can be defined as
$$\mathrm{AAFBF}_{iu} = \frac{\int_{\theta \in \Theta_i} \pi_u(\theta \mid X)\, d\theta}{\int_{\theta \in \Theta_i} \pi_u^*(\theta \mid X^b)\, d\theta} \tag{25}$$
where the parameter space $\Theta_i$ is in agreement with the informative hypothesis Hi. The computation of the AAFBF will be elaborated next.

3.4 Bayes factor computation

To compute the AAFBF, we first need to determine the adjusted prior mean θ* in 23. Finding the parameter space urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0115 can be difficult for complicated informative hypotheses (Mulder et al., 2012). However, if we transform the parameters of interest using urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0116 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0117, then the informative hypothesis under consideration becomes urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0118 such that we can simply specify the prior mean vector equal to zero for the new parameter vector urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0119. Note that the range constrained hypothesis (e.g., H1: 0 < θ < 1) is an exception because, as elaborated earlier, the prior mean for θ is centred at θ* = 0.5, which requires urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0120 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0121. The specification of the prior mean for range constraints is given in Appendix A. This parameter transformation was also used in Mulder (2016) for hypotheses with only inequality constraints on correlations. Here we generalize it to equality and inequality constraints on parameters in general statistical models. The parameter transformation of θ to β simplifies the form of the hypothesis without changing the expectation of researchers. For instance, testing whether two parameters are equal (θ1 = θ2) is identical to testing whether their difference is 0 (i.e., β0 = θ1−θ2 = 0). Consequently, the adjusted fractional prior distribution and posterior distribution for the new parameter β are given by

$$\pi_u^*(\beta \mid X^b) = N(0, \hat{\Sigma}_\beta / b) \tag{26}$$
and
$$\pi_u(\beta \mid X) = N(\hat{\beta}, \hat{\Sigma}_\beta) \tag{27}$$

respectively, where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0124 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0125 with urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0126 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0127. Specifically, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0128 where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0129 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0130, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0131 where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0132 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0133.
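As an illustration of the transformation from θ to β, the sketch below assumes the usual linear form $\hat{\beta} = R\hat{\theta} - r$ with covariance $R\hat{\Sigma}_\theta R^T$, where `R` and `r` stack the constraint coefficients and constants. The helper names and the numbers in the example are hypothetical and not taken from the paper.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def transform(theta_hat, sigma_theta, R, r):
    """Map estimates of theta to estimates of beta = R theta - r."""
    beta_hat = [sum(c * t for c, t in zip(row, theta_hat)) - ri for row, ri in zip(R, r)]
    sigma_beta = mat_mul(mat_mul(R, sigma_theta), transpose(R))
    return beta_hat, sigma_beta


# Hypothetical estimates for two parameters and the constraint theta1 = theta2,
# written as beta0 = theta1 - theta2 = 0.
theta_hat = [0.5, 0.2]
sigma_theta = [[0.04, 0.01], [0.01, 0.09]]
beta_hat, sigma_beta = transform(theta_hat, sigma_theta, R=[[1, -1]], r=[0])
```

Here the difference is estimated as 0.3 with variance 0.04 − 2 × 0.01 + 0.09 = 0.11.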

This parameter transformation from θ to β simplifies the computation of the AAFBF. First, the AAFBF for an informative hypothesis with only equality constraints (i.e., Hi: β0 = 0), compared to the unconstrained hypothesis, can be obtained using the Savage–Dickey density ratio (Dickey, 1971; Mulder, 2014b; Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010):
$$\mathrm{AAFBF}_{iu} = \frac{\pi_u(\beta_0 = 0 \mid X)}{\pi_u^*(\beta_0 = 0 \mid X^b)} \tag{28}$$
where $\pi_u^*(\beta_0 = 0 \mid X^b)$ and $\pi_u(\beta_0 = 0 \mid X)$ are the densities of the prior 26 and posterior 27, respectively, for β0 at the point β0 = 0 under Hu. Second, the AAFBF for an informative hypothesis with only inequality constraints (i.e., Hi: β1 > 0), compared to the unconstrained hypothesis, is given by (Hoijtink, 2012; Mulder, 2014b)
$$\mathrm{AAFBF}_{iu} = \frac{\int_{\beta_1 > 0} \pi_u(\beta_1 \mid X)\, d\beta_1}{\int_{\beta_1 > 0} \pi_u^*(\beta_1 \mid X^b)\, d\beta_1} \tag{29}$$
where $\pi_u^*(\beta_1 \mid X^b)$ and $\pi_u(\beta_1 \mid X)$ are the prior 26 and posterior 27, respectively, for β1. Finally, the AAFBF for an informative hypothesis with both equality and inequality constraints (i.e., Hi: β0 = 0, β1 > 0), compared to the unconstrained hypothesis, can be obtained via
$$\mathrm{AAFBF}_{iu} = \frac{\pi_u(\beta_0 = 0 \mid X)}{\pi_u^*(\beta_0 = 0 \mid X^b)} \times \frac{\int_{\beta_1 > 0} \pi_u(\beta_1 \mid \beta_0 = 0, X)\, d\beta_1}{\int_{\beta_1 > 0} \pi_u^*(\beta_1 \mid \beta_0 = 0, X^b)\, d\beta_1} \tag{30}$$
where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0141 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0142 are the prior and posterior distributions of β1 given β0 = 0, respectively. Note that urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0143 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0144.
We let $c_{i0} = \pi_u^*(\beta_0 = 0 \mid X^b)$ and $c_{i1} = \int_{\beta_1 > 0} \pi_u^*(\beta_1 \mid \beta_0 = 0, X^b)\, d\beta_1$, which can be interpreted as the relative complexities of the equality constrained hypothesis and inequality constrained hypothesis, respectively, compared to Hu under prior 26. Then, in general,
$$c_i = c_{i0} \times c_{i1} \tag{31}$$
represents the relative complexity of informative hypothesis Hi (Hoijtink, 2012; Mulder, 2014a), which is a relative measure of the size of the parameter space under an informative hypothesis in comparison to the unconstrained parameter space. For example, the relative complexity of ‘θ1 > θ2, and θ3 unconstrained’ is larger than the relative complexity of ‘θ1 > θ2 > θ3’. This can be understood from the fact that the parameter space of the latter is a subset of the parameter space of the former. Similarly, the relative complexity of ‘θ1 = 0, θ2 unconstrained’ is larger than the relative complexity of ‘θ1 = 0, θ2 = 0’. It is interesting to note that the relative complexity $c_{i0}$ of an equality constrained hypothesis Hi: β0 = 0 becomes smaller when the prior variance of β under Hu becomes larger. The reason is that a larger variance of the unconstrained prior implies that a larger region of the unconstrained parameter space is likely a priori, which means that Hi is simpler relative to the unconstrained hypothesis. Furthermore, we let $f_{i0} = \pi_u(\beta_0 = 0 \mid X)$ and $f_{i1} = \int_{\beta_1 > 0} \pi_u(\beta_1 \mid \beta_0 = 0, X)\, d\beta_1$, which can be interpreted as the measures of relative fit of the equality constrained hypothesis and inequality constrained hypothesis, respectively, compared to Hu. Then
$$f_i = f_{i0} \times f_{i1} \tag{32}$$

expresses the relative fit of Hi (Hoijtink, 2012; Mulder, 2014a), which indicates how well a hypothesis is supported by the data compared to the unconstrained hypothesis. The relative complexity and fit in the AAFBF can be estimated using a procedure similar to that presented in Gu et al. (2014), which only considers inequality constraints. We generalize the method to hypotheses with inequality as well as equality constraints to cover a very large spectrum of informative hypotheses that can be tested.
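For inequality constraints, relative complexity and fit are probabilities and can be estimated by sampling. The sketch below is a simple Monte Carlo illustration, not the estimation procedure used in BaIn. It assumes independent normal parameters, a prior mean of 0 for all three parameters (a point on the boundary θ1 = θ2 = θ3), and hypothetical estimates and standard errors.

```python
import random


def proportion_in_agreement(mean, sds, constraint, n_draws, rng):
    """Estimate the probability that independent normal draws satisfy a constraint."""
    hits = 0
    for _ in range(n_draws):
        draw = [rng.gauss(m, s) for m, s in zip(mean, sds)]
        if constraint(draw):
            hits += 1
    return hits / n_draws


def ordered(t):
    """Constraint of the hypothesis theta1 > theta2 > theta3."""
    return t[0] > t[1] > t[2]


rng = random.Random(1)
theta_hat = [0.6, 0.3, 0.0]   # hypothetical estimates
se = [0.1, 0.1, 0.1]          # hypothetical standard errors (independence assumed)
b = 0.1                       # hypothetical fraction

# Prior: centred on the boundary, variance inflated by 1/b.
prior_sds = [s / b ** 0.5 for s in se]
complexity = proportion_in_agreement([0.0, 0.0, 0.0], prior_sds, ordered, 100_000, rng)
# Posterior: centred on the estimates.
fit = proportion_in_agreement(theta_hat, se, ordered, 100_000, rng)
aafbf_iu = fit / complexity
```

With exchangeable parameters the complexity of a full ordering of three parameters is 1/6 for any b, which the estimate should approximate.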

The computation of the AAFBF is implemented in the software package BaIn (Bayesian evaluation of informative hypotheses) available at http://informative-hypotheses.sites.uu.nl/software/. A user manual for BaIn is given in Appendix B. As input, BaIn requires the maximum likelihood estimate and covariance matrix of the parameters of interest, which can be obtained using other software packages such as Mplus (Muthén & Muthén, 2010) or the free R package lavaan (Rosseel, 2012). Executing BaIn renders the AAFBF for each informative hypothesis Hi under evaluation.

The Bayes factor of an informative hypothesis Hi against its complement $H_{i_c}$ is

$$\mathrm{AAFBF}_{i i_c} = \frac{f_i / c_i}{(1 - f_i) / (1 - c_i)} \tag{33}$$
if Hi does not contain equality constraints, where $f_i$ and $c_i$ are the relative fit 32 and relative complexity 31 of Hi. Otherwise $\mathrm{AAFBF}_{i i_c} = \mathrm{AAFBF}_{iu}$ because the marginal likelihood of the complement of a hypothesis which contains equality constraints is equal to the marginal likelihood of the unconstrained hypothesis. For the comparison of two informative hypotheses Hi and $H_{i'}$, the AAFBF for Hi against $H_{i'}$ can be obtained as
$$\mathrm{AAFBF}_{i i'} = \frac{\mathrm{AAFBF}_{iu}}{\mathrm{AAFBF}_{i'u}} \tag{34}$$

Running BaIn for Hi and $H_{i'}$ renders $\mathrm{AAFBF}_{iu}$ and $\mathrm{AAFBF}_{i'u}$ such that $\mathrm{AAFBF}_{i i'}$ can be computed using 34.
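Once the relative fit and complexity of each hypothesis are available, the remaining Bayes factors follow by simple arithmetic. The sketch below assumes that the Bayes factor against the unconstrained hypothesis is fit divided by complexity and that, for a hypothesis with inequality constraints only, the complement has fit 1 − f and complexity 1 − c. The function names and input values are hypothetical.

```python
def aafbf_unconstrained(fit, complexity):
    """Bayes factor of H_i against the unconstrained hypothesis."""
    return fit / complexity


def aafbf_complement(fit, complexity):
    """Bayes factor of H_i against its complement (inequality constraints only)."""
    return (fit / complexity) / ((1 - fit) / (1 - complexity))


def aafbf_pair(bf_iu, bf_ju):
    """Bayes factor of H_i against another informative hypothesis H_j."""
    return bf_iu / bf_ju


# Hypothetical fit and complexity values for two hypotheses.
bf_1u = aafbf_unconstrained(fit=0.9, complexity=0.5)    # about 1.8
bf_1c = aafbf_complement(fit=0.9, complexity=0.5)       # about 9
bf_2u = aafbf_unconstrained(fit=0.3, complexity=0.25)   # about 1.2
bf_12 = aafbf_pair(bf_1u, bf_2u)                        # about 1.5
```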

4 Choices for b

This section discusses the choices of the fraction b for the specification of fractional priors. We first show the influence of the choices of b on the AAFBF when evaluating informative hypotheses because, as with the original fractional Bayes factor (Conigliani & O'Hagan, 2000), the choice of the fraction b also plays a crucial role in the AAFBF. Then we present two traditional choices and one novel choice of b. At the end of this section, we conduct a sensitivity study to investigate the approximation error of the AAFBF relative to the actual adjusted fractional Bayes factor. It should be noted that this paper uses one common fraction b of the likelihood for prior specification. For this reason the AAFBF should only be used for testing hypotheses based on data that come from one population or balanced data with equal group sizes in the case of multiple populations, similar to the fractional Bayes factor (de Santis & Spezzaferri, 2001).

4.1 The role of b in AAFBF

The influence of the fraction b on the AAFBF is different for the evaluation of equality constraints $\beta_0 = 0$ and of inequality constraints $\beta_1 > 0$. First of all, b is a very influential parameter when evaluating equality constraints $\beta_0 = 0$. The underlying reason is that a small (large) b implies a prior with large (small) variance such that the prior density evaluated at $\beta_0 = 0$ in 28 is small (large). This is illustrated in Figure 1, in which the solid line represents the density of the prior distribution $N(0, \hat{\sigma}_\theta^2 / b)$ with $\hat{\sigma}_\theta^2 = 0.02$ under (a) b = 0.05 and (b) b = 0.2. As can be seen, when testing the hypothesis H1: θ = 0 against Hu, the prior density at θ = 0 is 0.63 under b = 0.05 in Figure 1a, half the value 1.26 under b = 0.2 in Figure 1b. Given an estimate of $\hat{\theta} = 0.2$, the resulting AAFBF for H1 against Hu under b = 0.05 is AAFBF1u = 1.64, whereas under b = 0.2 it is AAFBF1u = 0.82 according to equation 28.

Figure 1. Relative complexities under different values of b.

Secondly, for range constrained hypotheses the effect of b is similar to that for an equality constrained hypothesis: a small (large) b implies a large (small) AAFBF for the range constrained hypothesis against the unconstrained hypothesis. For example, the shaded area in Figure 1 represents the prior probability in line with the range constrained hypothesis H2: −0.5 < θ < 0.5, which implies that the absolute effect is expected to be smaller than 0.5. For a small b = 0.05 the prior probability of −0.5 < θ < 0.5 shown in Figure 1a is 0.57, whereas for a large b = 0.2 the prior probability in Figure 1b is 0.89. Based on $\hat{\theta} = 0.2$ and equation 29 the AAFBF for H2 against Hu under b = 0.05 is AAFBF2u = 1.72, which is different from AAFBF2u = 1.11 under b = 0.2.

Thirdly, the AAFBF is independent of the choice of b for inequality constrained hypotheses which do not contain range constraints. This property was proven in Mulder (2014b) and can also be seen in Figure 1 where the prior probability that the constraint of H3: θ > 0 holds under Hu is equal to 0.5 for both choices of b.
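The three cases above can be checked numerically with univariate normal densities and probabilities. The sketch below assumes an estimate of 0.2 with variance 0.02; these two values are back-solved from the densities and Bayes factors reported in the text and should be read as assumptions. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))


def interval_prob(lo, hi, mean, var):
    return normal_cdf(hi, mean, var) - normal_cdf(lo, mean, var)


theta_hat, var_hat = 0.2, 0.02  # assumed values, back-solved from the reported results

results = {}
for b in (0.05, 0.2):
    prior_var = var_hat / b  # prior N(0, var_hat / b)
    prior_density = normal_pdf(0.0, 0.0, prior_var)
    prior_range = interval_prob(-0.5, 0.5, 0.0, prior_var)
    results[b] = {
        "prior_density": prior_density,
        # H1: theta = 0, posterior density over prior density at 0.
        "bf_equality": normal_pdf(0.0, theta_hat, var_hat) / prior_density,
        "prior_range": prior_range,
        # H2: -0.5 < theta < 0.5, posterior probability over prior probability.
        "bf_range": interval_prob(-0.5, 0.5, theta_hat, var_hat) / prior_range,
        # H3: theta > 0, prior probability does not depend on b.
        "prior_positive": 1 - normal_cdf(0.0, 0.0, prior_var),
    }
```

Under these assumptions the prior densities, the range probabilities, and the Bayes factors for H2 round to the values quoted in the text. The Bayes factor for H1 at b = 0.05 comes out near 1.645, so it agrees with the reported 1.64 only up to rounding.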

The influence of b on the AAFBF is illustrated in Figure 2 when comparing the equality constrained hypothesis H1: θ = 0, the range constrained hypothesis H2: −0.5 < θ < 0.5, and the inequality constrained hypothesis H3: θ > 0 to the unconstrained hypothesis Hu. Given the estimate urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0171 and variance urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0172 for θ, Figure 2 shows the AAFBF for each informative hypothesis under various urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0173. As can be seen, the AAFBF for H1 decreases as b increases, the AAFBF for H2 behaves similarly to that for H1, and the AAFBF for H3 is stable as b changes. This illustrates that the fraction b has to be carefully specified when equality constrained hypotheses and range constrained hypotheses are of interest to the researcher, while any fraction b can be used when only inequality constrained hypotheses without range constraints are formulated by the user. In what follows we will specify b in three different ways.

Figure 2. Influence of b on AAFBF.

4.2 Traditional choices for b

Previous studies have recommended two choices for b for the fractional Bayes factor. The first one comes from Berger and Pericchi (1996) and O'Hagan (1995) who suggested using the minimal training sample for prior specification to leave maximal information in the data for hypothesis testing. This corresponds to b = m/n in the fractional prior, where m is the size of the minimal training sample that makes all parameters identifiable. For example, for the one‐sample t test of H0: θ = 0 where the data are $x_i \sim N(\theta, \sigma^2)$, $i = 1, \ldots, n$, the actual adjusted fractional prior distribution for θ is $t(0,\, s^2/(nb-1),\, nb-1)$, that is, a Student t density with mean 0, scale parameter s²/(nb−1), and degrees of freedom nb−1. In this case, the minimal m is 2 because m = 1 results in b = 1/n and degrees of freedom 0, which is not allowed.

For the AAFBF we propose a similar approach to determine our first choice of b. To estimate β (with length J) we need at least J + 1 observations. Therefore, our first choice of the fraction equals
$$b_{\min} = \frac{J + 1}{n} \tag{35}$$
where J is the number of independent constraints in all the informative hypotheses under investigation, that is, J equals the rank of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0178 for a set of informative hypotheses Hi for i = 1, …, I. Thus, if H3: θ1 = 0 and H4: θ1 > 0, θ2 > 0 are under evaluation, for example, J = 2 when computing the AAFBF for each informative hypothesis against the unconstrained hypothesis because there are two independent constraints.

For multiple regression model 5 in Section 2, J = 3 because H1: θ1 > 0, θ2 < 0, θ3 = 0 can be formulated using a vector β of length 3. With sample size n = 50, the first choice of the fraction b can be set to bmin = 2/25. For repeated measures model 10, J = 3 based on a vector β of length 3 in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0179 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0180, and therefore bmin = 1/9 based on sample size n = 36.
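The two values of bmin quoted above follow from J + 1 observations out of n. A minimal sketch using exact fractions (the function name `b_min` is hypothetical):

```python
from fractions import Fraction


def b_min(J, n):
    """Minimal fraction: J + 1 observations out of a sample of size n."""
    return Fraction(J + 1, n)


b_regression = b_min(J=3, n=50)   # multiple regression example
b_repeated = b_min(J=3, n=36)     # repeated measures example
```

These evaluate to 2/25 and 1/9, matching the values in the text.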

The second way of choosing b is (O'Hagan, 1995)

$$b_{\text{robust}} = \max\left\{\frac{J + 1}{n}, \frac{1}{\sqrt{n}}\right\} \tag{36}$$

which is in general larger than the first choice. O'Hagan (1995) stated that a larger b can reduce the sensitivity of the fractional Bayes factor to the distributional form of the prior. Conigliani and O'Hagan (2000) further derived a measure of the sensitivity of the fractional Bayes factor and proved that this measure is a decreasing function of b. The second choice of b can also be applied to the AAFBF defined in 25. When setting a larger b, the AAFBF becomes more similar to the non‐AAFBF. Thus, the AAFBF is less sensitive to the prior distribution given larger b. We will have more to say on this topic in Section 4.4. Given the sample size n = 50 in the regression model in Section 2, $b_{\text{robust}} = 1/\sqrt{50} \approx 0.141$ is specified to evaluate hypothesis H1. In the case of the repeated measures model with sample size n = 36, one can set $b_{\text{robust}} = 1/6$ for the comparison of H2 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0184.
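The second choice can be sketched in the same way. The code below assumes that 36 takes the form suggested by O'Hagan (1995), namely the larger of (J + 1)/n and 1/√n; that form and the function name `b_robust` are assumptions rather than something confirmed by the text.

```python
import math


def b_robust(J, n):
    """Assumed robust fraction: the larger of (J + 1) / n and 1 / sqrt(n)."""
    return max((J + 1) / n, 1 / math.sqrt(n))


b_regression = b_robust(J=3, n=50)   # 1 / sqrt(50), about 0.141
b_repeated = b_robust(J=3, n=36)     # 1 / 6, about 0.167
```

Both values exceed the corresponding bmin (0.08 and about 0.111), consistent with the statement that the second choice is in general larger than the first.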

4.3 A frequentist choice for b

Gu et al. (2016) recently proposed another method of specifying b by taking into account the frequentist error probabilities. In Bayesian hypothesis testing, the probability of a Bayes factor favouring Hu when Hi is true is
$$p_1 = P(\mathrm{BF}_{iu} < 1 \mid H_i) \tag{37}$$
which corresponds to the Type I error probability if Hi is a traditional null hypothesis, and the probability of a Bayes factor favouring Hi when Hu is true is
$$p_2 = P(\mathrm{BF}_{iu} > 1 \mid H_u) \tag{38}$$
which then corresponds to the Type II error probability. Gu et al. (2016) found that these probabilities are often quite different when using traditional choices of b in the one‐sample t test. This may not be preferable from a frequentist point of view where the goal typically is to control the error probabilities. Here we show how to specify b to control the error probabilities under certain conditions. First, we shall use a one‐sample t test to illustrate the procedure for specifying b based on this method, and then apply it to the AAFBF 28 for general statistical models. Finally, a rule for choosing b is proposed.

4.3.1 One‐sample t test

Consider a one‐sample t test for which data come from $x_i \sim N(\theta, \sigma^2)$, $i = 1, \ldots, n$, where θ denotes the population mean and σ² denotes the population variance, and the hypotheses under consideration are H1: θ = 0 against Hu: θ. The AAFBF for H1 against Hu can be derived using equation 28:
$$\mathrm{AAFBF}_{1u} = b^{-1/2} \exp\left(-\frac{n \bar{x}^2}{2 s^2}\right) \tag{39}$$

where $\bar{x}$ and $s$ denote the sample mean and standard deviation of the data. For this AAFBF the error probabilities 37 and 38 become

urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0191(40)
and
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0192(41)

where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0193 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0194, for l = 1, …, L, are the mean and standard deviation of data urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0195 sampled from urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0196, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0197 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0198 are the mean and standard deviation of data urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0199 sampled from Hu, and $I(\cdot)$ is the indicator function which is 1 if the argument is true and 0 otherwise. When sampling data from Hu, an expected standardized effect size, denoted by βe, needs to be specified under Hu, namely, Hu: θ = βeσ, so that the scaled data are sampled from $N(\beta_e, 1)$ under Hu, where yi = xi/σ. Note that sampling urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0202 based on urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0203, where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0204, is identical to sampling the mean urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0205 based on urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0206. The specification of the standardized effect size βe will be discussed in Section 4.3.3.

In the one‐sample t test, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0207 is the observed standardized effect size known as Cohen's d (Cohen, 1992). It has sampling distributions under H1 and Hu which can be obtained using urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0208 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0209, respectively. Figure 3 shows the distributions of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0210 under H1: θ = 0 (solid line) and Hu: θ = βe (dashed line) given σ² = 1 and n = 20, where βe = .5 is the pre‐specified standardized effect size under Hu. Note that according to Cohen (1992), βe = .2, .5, and .8 correspond to small, medium, and large effects, respectively. If we use bmin = 2/n for the one‐sample t test, the error probabilities in 40 and 41 become urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0211 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0212, whereas if we specify urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0213, the error probabilities are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0214 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0215. These error probabilities are marked in Figure 3a for bmin and Figure 3b for brobust, where the dark grey area represents p1 and the light grey area represents p2. As can be seen, p1 < p2 under both bmin and brobust, which means that we are more likely to incorrectly prefer H1 when Hu is true than incorrectly prefer Hu when H1 is true.
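The error probabilities can be approximated by simulation: draw many data sets under each hypothesis and count how often the Bayes factor points the wrong way. The sketch below assumes the normal-approximation form of the one-sample AAFBF, with posterior N(x̄, s²/n) and prior N(0, s²/(nb)), and uses the unbiased sample variance; both are assumptions of this illustration. It counts every data set with a Bayes factor on the wrong side of 1, so its estimates need not coincide with the shaded areas in Figure 3.

```python
import math
import random


def aafbf_1u(xbar, s, n, b):
    """Assumed AAFBF of H1: theta = 0 against Hu for a one-sample test."""
    return b ** -0.5 * math.exp(-n * xbar ** 2 / (2 * s ** 2))


def mean_sd(x):
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / (len(x) - 1)
    return m, math.sqrt(var)


def error_probabilities(n, b, beta_e, reps, rng):
    """Monte Carlo estimates of p1 (wrongly prefer Hu) and p2 (wrongly prefer H1)."""
    wrong_under_h1 = wrong_under_hu = 0
    for _ in range(reps):
        m1, s1 = mean_sd([rng.gauss(0.0, 1.0) for _ in range(n)])      # data under H1
        mu, su = mean_sd([rng.gauss(beta_e, 1.0) for _ in range(n)])   # data under Hu
        wrong_under_h1 += aafbf_1u(m1, s1, n, b) < 1
        wrong_under_hu += aafbf_1u(mu, su, n, b) > 1
    return wrong_under_h1 / reps, wrong_under_hu / reps


# n = 20, b = b_min = 2 / n, and a medium effect size under Hu.
p1, p2 = error_probabilities(n=20, b=2 / 20, beta_e=0.5, reps=5000, rng=random.Random(7))
```

With these settings the simulated p1 is smaller than p2, in line with the ordering described in the text.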

Figure 3. Sampling distributions of observed effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0216 in one‐sample t test for n = 20 and βe = 0.5 under Hu.

In order to correct for this, Gu et al. (2016) showed how to choose b such that p1 = p2 given sample size n and effect size βe under Hu. A direct way of obtaining such a b is proposed by Morey, Wagenmakers, and Rouder (2016) and illustrated in Figure 3c. As can be seen, the distributions of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0217 under H1: θ = 0 and Hu: θ = βe are symmetric around βe/2. This implies that we can simply specify urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0218 or equivalently urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0219 to attain equal error probabilities, because urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0220 is equal to urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0221. For example, given n = 20 and βe = .5 under Hu in Figure 3c, the dark grey area for p1 is the same size as the light grey area for p2 when setting urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0222. The error probabilities under this setting are p1 = p2 = .139.
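The symmetry argument can be turned into a formula for b. The sketch below assumes the normal-approximation form of the one-sample AAFBF, b^(−1/2) exp(−n d²/2) with d the observed standardized effect size, and solves for the b at which the Bayes factor equals 1 when d = βe/2. The resulting expression and the function names are derived under that assumption and are not quoted from the paper.

```python
import math


def aafbf_from_effect(d, n, b):
    """Assumed one-sample AAFBF as a function of the observed effect size d."""
    return b ** -0.5 * math.exp(-n * d ** 2 / 2)


def equal_error_b(n, beta_e):
    """Fraction b for which the AAFBF equals 1 at d = beta_e / 2."""
    return math.exp(-n * beta_e ** 2 / 4)


b_equal = equal_error_b(n=20, beta_e=0.5)   # exp(-1.25), about 0.287
bf_at_midpoint = aafbf_from_effect(d=0.25, n=20, b=b_equal)
```

By construction `bf_at_midpoint` equals 1, so observed effects below βe/2 favour H1 and effects above it favour Hu.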

4.3.2 General case

The method of choosing b based on equal error probabilities can be generalized to the AAFBF of any Hi: β0 = 0 against Hu: β0 ≠ 0. Based on the adjusted fractional prior 26 and approximated posterior 27, the AAFBF in 28 is
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0223(42)
It is interesting to note that urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0224 in 42 is the test statistic in the Wald test (Engle, 1984) which assumes that β is approximately normally distributed. The test statistic is not only the cornerstone in frequentist hypothesis testing, but also important in default Bayes factors. For example, the Bayes factor proposed by Rouder et al. (2009) for the t test is a function of the t statistic, and the Bayes factor based on Zellner's g prior (Zellner & Siow, 1980) in regression models is a function of the F statistic. The standardized effect size is often defined as a test statistic divided by urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0225 to offset the influence of the sample size (Cohen, 1992), because the effect size should not be affected by the sample size as it expresses the degree to which Hu differs from Hi. Thus, the observed standardized effect size in this case can be defined as
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0226(43)
Then using the steps as in 40 and 41 for the one‐sample t test, the error probabilities of AAFBFs are defined as
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0227(44)
and
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0228(45)

The observed standardized effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0229 is usually within the interval [0,1] for equality constrained hypothesis testing, because urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0230 can be interpreted analogously as Cohen's d or Cohen's f2 (Cohen, 1992), which rarely exceeds 1. First, for a one‐sample t test urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0231, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0232 versus Hu: θ, the maximum likelihood estimate of β = θ is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0233 and the standard deviation is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0234. Then the observed standardized effect size 43 becomes urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0235 which is the same as Cohen's d. Second, we consider the F test of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0236 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0237 in a simple linear regression model urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0238, where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0239 is the intercept, θ1 is the regression coefficient, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0240 is the residual. 
The maximum likelihood estimate of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0241 is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0242 and the standard deviation is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0243, where urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0244 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0245 are the standard deviations of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0246 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0247, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0248 is the correlation coefficient between urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0249 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0250. Note that urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0251 is equal to the coefficient of determination urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0252 in the case of the simple linear regression model. Thus, because the coefficient of determination is equal to urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0253, the observed standardized effect size in 43 becomes urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0254, which is the square root of Cohen's urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0255.

Analogously to the effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0256 in the one‐sample t test, the observed standardized effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0257 also has sampling distributions under urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0258 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0259, which are symmetric around half of the pre‐specified standardized effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0260 under urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0261. Therefore, by setting urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0262, or equivalently
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0263(46)
the test for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0264 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0265 using the AAFBF has equal error probability:
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0266(47)
We now turn to how to specify urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0267 in 46.

4.3.3 A new rule for choosing b

Before presenting the new choice of b based on equal error probabilities, we need to deal with two issues: the range of b for consistent Bayes factors and the specification of standardized effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0268 under urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0269. The consistency of the Bayes factor is an important property in Bayesian hypothesis testing. The Bayes factor for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0270 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0271 is consistent if it goes to infinity as sample size goes to infinity when urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0272 is true, and goes to 0 when urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0273 is true. Morey et al. (2016) found that the prior specification based on frequentist error probabilities may result in inconsistent Bayes factors. Gu et al. (2016) showed how to resolve this by restricting b to urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0274 in the one‐sample t test. As stated in Section 4.2, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0275 is based on the minimal number of observations to specify proper priors, and therefore we will always constrain urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0276 in the AAFBF. Furthermore, we also suggest constraining urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0277 because urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0278 implies that more than half of the likelihood is used for prior specification, which is undesirable in Bayesian tests (Berger & Pericchi, 1996). Consequently, the range of b is set to urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0279.

To obtain b in (46) for equal error probabilities, the standardized effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0280 under urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0281 has to be specified. Given any specific urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0282, a fraction b in (46) can be obtained such that urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0283. However, in practice urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0284 is unknown. Therefore, a distribution for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0285 is specified that covers a range of realistic effect sizes (i.e., urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0286 as already discussed). Here we consider a uniform distribution urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0287 in which every effect size from small to large is equally likely within the interval [0,1] (Gu et al., 2016). Note that this choice for b would be the same as when using urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0288 because the choice of b is independent of the sign of the effect.

Based on the distribution of effects urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0289, the third choice for b for equal error probabilities is given by
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0290 (48)

The integral in (48) can be calculated numerically (see Gu et al., 2016). Although urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0291 cannot always achieve equal error probabilities because we constrain urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0292 and specify urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0293, Gu et al. (2016) show that this choice results in error probabilities that are often about equal for the one‐sample t test. They also showed that the difference between the Type I and Type II error probabilities was typically smaller for this choice than for the more traditional choices of b. We recommend the choice urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0294 when the sample size is small, because in this case the error probabilities urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0295 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0296 are relatively large and the difference between urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0297 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0298 can be quite severe. In the following subsection, we discuss the sensitivity of the AAFBF to different choices of b.

4.4 Sensitivity to prior distributions

In Section 3 we specified the normal prior (26) for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0299 in general statistical models. However, the adjusted fractional prior for the parameters in a specific model is often not normally distributed. Thus, when using a normal approximation of the fractional prior, as in the case of the AAFBF, we may misspecify the prior distribution for the parameters of interest. For example, if the parameter is a probability which is bounded in [0,1] in a binomial model, the (implicit) fractional prior has a beta distribution. Therefore the AAFBF, where the fractional prior is approximated using a normal distribution, may differ from the adjusted fractional Bayes factor based on the actual fractional prior. Thus, it is useful to investigate the sensitivity of the AAFBF when the fractional prior is far from normally distributed.

O'Hagan (1995) argued that the sensitivity of the fractional Bayes factor depends on the magnitude of b. This dependence was proved by Conigliani and O'Hagan (2000). Increasing b reduces the sensitivity to the distributional form of the fractional prior. This is also the case for the adjusted fractional Bayes factor (AFBF) of Mulder (2014b), because a larger b implies that more information in the data is used for prior specification, which makes the distribution of the adjusted fractional prior in the AFBF more similar to a normal distribution. This section uses two simple examples to illustrate how large the difference is between the AAFBF using the normal prior and the AFBF using the actual fractional prior. Furthermore, it is shown that the AAFBFs based on the different fractions show consistent behaviour. In these examples, we focus only on equality constrained hypotheses because, as explained earlier, the AFBF for inequality constrained hypotheses is independent of b.

The first example again concerns the one‐sample t test, where data come from urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0300 with unknown mean and variance, and the hypotheses under consideration are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0301 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0302. In the AAFBF, the default prior (26) for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0303 is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0304, while the actual adjusted fractional prior for a normal mean has a t distribution urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0305 with mean 0, variance urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0306, and degrees of freedom urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0307. It is well known that the t distribution has heavier tails than the normal distribution, such that the density at the mode urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0308 from the normal distribution is larger than the density from the t distribution. Furthermore, as b increases, the degrees of freedom urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0309 increase such that the t distribution urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0310 becomes more similar to the normal distribution urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0311. This implies that for a larger b the AAFBF where the default prior has a normal distribution performs more similarly to the AFBF under the actual fractional prior. This is illustrated in Figure 4.
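The comparison of the two densities at the mode can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes that the t and normal distributions share the same scale parameter (set to 1 here), and the degrees of freedom used are arbitrary illustrative values rather than the ones implied by a particular choice of b.

```python
import math


def normal_pdf_at_mode(scale: float = 1.0) -> float:
    """Density of a normal distribution evaluated at its mode."""
    return 1.0 / (scale * math.sqrt(2.0 * math.pi))


def student_t_pdf_at_mode(df: float, scale: float = 1.0) -> float:
    """Density of a Student t distribution evaluated at its mode.

    Computed on the log scale with lgamma so that large df do not overflow.
    """
    log_const = (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const) / scale


# Illustrative degrees of freedom (assumed values, not taken from the paper).
ratios = {
    df: student_t_pdf_at_mode(df) / normal_pdf_at_mode()
    for df in (1, 5, 30, 200)
}
for df, ratio in ratios.items():
    print(f"df = {df:>3}: t density / normal density at the mode = {ratio:.4f}")
```

Under these assumptions the ratio stays below 1 and moves towards 1 as the degrees of freedom grow, which is the mechanism behind the reduced sensitivity for larger b.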

Figure 4. The logarithms of the adjusted fractional Bayes factor with a Student t prior (solid line) and the AAFBF with a normal prior (dashed line). The black, red, and blue lines correspond to the logarithms of Bayes factors for observed effect sizes urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0312, 0.1, and 0.2, respectively.

Figure 4 shows the logarithms of AFBFs and AAFBFs for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0313 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0314 for different observed effect sizes urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0315, and different fractions urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0316, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0317, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0318. The sample size n varies from 10 to 500. First, as can be seen in Figure 4a, based on urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0319 the logarithms of AAFBFs under the normal prior distribution (dashed line) differ substantially from the logarithms of AFBFs under the t prior distribution (solid line). This difference does not decrease as n increases because when setting urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0320 the degrees of freedom in the t distribution are 1, which is independent of n. This suggests high sensitivity to the functional form of the prior distribution. Second, Figure 4b shows that based on urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0321 there is not much difference between the logarithms of AAFBFs and AFBFs. This implies that the choice of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0322 results in less sensitivity to the functional form of the prior distribution than urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0323. Third, Figure 4c shows the logarithms of AAFBFs and AFBFs under urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0324. As can be seen, with urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0325 there is no sensitivity either.

It is interesting to note that Figure 4 also illustrates the consistency of AAFBFs. The consistency in this example requires that as sample size goes to infinity, the AAFBF for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0326 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0327 approaches infinity when the observed effect size is equal to 0 and goes to zero when the observed effect size is not equal to 0. As can be seen in Figure 4, for an observed effect size urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0328 the logarithm of the AAFBF (black lines) in each figure goes to infinity as sample size n increases. Conversely, the logarithms of the AAFBF based on an observed effect size of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0329 (red lines) and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0330 (blue lines) diverge to minus infinity, which implies decisive evidence for the true unconstrained hypothesis as the sample size goes to infinity.
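The consistency pattern can be reproduced with a short calculation. This is a simplified sketch under stated assumptions, not the computation behind Figure 4: the data variance is taken as known and equal to 1 (so the estimate has variance 1/n), the Bayes factor is taken as the ratio of the normal posterior density to the normal prior density at the null value 0, the prior variance is assumed to be the posterior variance divided by b, and b = 1/n is used purely for illustration. The function names are hypothetical.

```python
import math


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_bf_equality(effect: float, n: int, b: float) -> float:
    """Log Bayes factor of 'effect = 0' against the unconstrained hypothesis."""
    post_var = 1.0 / n        # assumed: unit-variance data, so var(estimate) = 1/n
    prior_var = post_var / b  # assumed: prior variance = posterior variance / b
    log_fit = log_normal_pdf(0.0, effect, post_var)    # posterior density at 0
    log_complexity = log_normal_pdf(0.0, 0.0, prior_var)  # prior density at 0
    return log_fit - log_complexity


for effect in (0.0, 0.1, 0.2):
    values = [log_bf_equality(effect, n, b=1.0 / n) for n in (10, 100, 1000, 10000)]
    print(effect, [round(v, 2) for v in values])
```

In this sketch the log Bayes factor equals 0.5 log(1/b) minus n times the squared effect over 2, so it grows without bound for an effect of 0 and eventually decreases without bound for any non-zero effect.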

Next, we consider a binomial model, where data come from urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0331. The hypotheses under evaluation are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0332 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0333. Since urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0334 is nested in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0335, we can use the AAFBF (28) to evaluate urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0336 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0337. Given data urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0338, the estimate of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0339 is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0340 and the variance is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0341, and therefore the normal adjusted fractional prior (26) is urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0342. On the other hand, following the idea of adjusted fractional Bayes factors, the fractional prior has a beta distribution, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0343, which has a mean of 0.4 and thus β has a prior mean of 0. Note that this prior is centred on the focal point of 0.4 in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0344.
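To give a rough feeling for how well a normal density can stand in for a beta density at the focal point, the sketch below compares a beta density with mean 0.4 to a normal density with the same mean and variance. This is a generic, moment-matched illustration only: the beta parameters are hypothetical and are indexed by a "concentration" value (the sum of the two beta parameters), which is not the parameterization of the fractional prior used in the paper.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of a Beta(alpha, beta) distribution at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    return math.exp(log_norm + log_kernel)


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def density_ratio_at_mean(concentration: float, mean: float = 0.4) -> float:
    """Beta density divided by the moment-matched normal density, both at the mean."""
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    var = mean * (1.0 - mean) / (concentration + 1.0)  # variance of the beta
    return beta_pdf(mean, alpha, beta) / normal_pdf(mean, mean, var)


# Hypothetical concentrations; larger values mimic more information in the prior.
ratios = {k: density_ratio_at_mean(k) for k in (5, 50, 500)}
for k, ratio in ratios.items():
    print(f"concentration = {k:>3}: beta / normal density at 0.4 = {ratio:.4f}")
```

In this illustration the ratio approaches 1 as the concentration increases, in line with the argument that using more information for the prior makes the normal approximation less harmful.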

Figure 5 plots the logarithms of the AFBFs and AAFBFs for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0345 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0346 as the sample size n increases from 10 to 500. The observed data are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0347. As can be seen in Figure 5 there is a considerably smaller approximation error of the AAFBF with respect to the AFBF in comparison to the first example in Figure 4. Again, the difference is largest for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0348 because this fraction is always smaller than urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0349 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0350. Finally, note that the AAFBFs show consistent behaviour for this testing problem.

Figure 5. The logarithms of the adjusted fractional Bayes factor with a beta prior (solid line) and the AAFBF with a normal prior (dashed line). The black, red, and blue lines correspond to the logarithms of Bayes factors under observed effect sizes urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0351, 0.5n, and 0.6n, respectively.

These two examples include the evaluation of equality constrained hypotheses in both continuous data and discrete data. Although the models used are simple, the results of the sensitivity study of adjusted fractional Bayes factors can be applied in the multivariate normal model where the parameters (e.g., the group means in the ANOVA model, the coefficients in the regression model) have a multivariate t distribution, and in the multinomial model where the parameters (e.g., the probabilities in contingency tables) have a Dirichlet distribution, which is the multivariate generalization of the beta distribution. Furthermore, in more complicated settings such as structural equation models and generalized linear models, it can be anticipated that the larger b will result in less sensitive AFBFs because this implies that more data are used to specify the fractional prior such that the normal approximation to the prior has better performance based on large‐sample theory.

Based on the discussion in this section, we propose the following scheme for specifying b in the AAFBF:

  • Choose urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0352 to have a default prior that is based on the idea of a minimal training sample.
  • Choose urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0353 to ensure that the default prior is close to normal.
  • Choose urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0354 to control the frequentist error probabilities when testing an equality constrained hypothesis against the unconstrained alternative.

Note that n and J denote the sample size and the number of independent constraints for all the informative hypotheses, respectively.

5 Results for empirical examples

Let us revisit the examples introduced in Section 2 to illustrate how the AAFBF can be used to evaluate informative hypotheses. In the regression model, three parameters with respect to the regression coefficients are considered in the informative hypothesis urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0355. The first step is to specify the prior and posterior distributions in (26) and (27), which requires the estimates urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0356 and covariance matrix urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0357 of the parameters. These can be obtained by analysing the regression model with the data in Table 1 using one of a number of statistical software packages, such as Mplus (Muthén & Muthén, 2010) and the R package lavaan (Rosseel, 2012). Note that we do not need to standardize the three coefficients as they are compared with zero. Analysis of the data in lavaan gives the maximum likelihood estimates of the parameters, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0358, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0359, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0360, and the covariance matrix
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0361

To obtain the AAFBF for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0362 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0363, the fraction b has to be specified. Based on the sample size of n = 50 and the length of vector urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0364 of J = 3 in this example, the three choices of fraction are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0365, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0366, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0367. Running BaIn with the estimates and covariance matrix of the parameters of interest yields the AAFBFs displayed in Table 3. As can be seen, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0368 is greater than 3 under each choice of b, which implies positive evidence in the data for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0369 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0370 according to the rule of Kass and Raftery (1995).

Table 3. Result for regression model example

| | urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0371 | urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0372 | urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0373 |
| urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0374 | 6.04 | 4.46 | 3.55 |

The hypothesis in the repeated measures ANOVA model consists of four parameters of which the estimates are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0375, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0376, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0377 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0378, and the covariance matrix is
urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0379
Given sample size n = 36 and length of vector urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0380 of J = 3, three choices of b are automatically specified in BaIn as urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0381, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0382, and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0383. Based on these specifications, BaIn renders the AAFBFs urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0384 for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0385 versus urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0386 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0387 for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0388 versus urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0389. The results are shown in Table 4. As can be seen, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0390 is independent of b because the AAFBF for inequality constrained hypotheses is invariant to the choice of b. Then the AAFBF urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0391 for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0392 versus urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0393 can be computed by urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0394 which is shown in the last row in Table 4. The result of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0395 in the last row suggests positive evidence in the data for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0396 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0397.

Table 4. Result for repeated measures ANOVA example

| | urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0398 | urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0399 | urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0400 |
| urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0401 | 4.60 | 3.07 | 2.01 |
| urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0402 | 0.24 | 0.24 | 0.24 |
| urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0403 | 19.2 | 12.8 | 8.38 |
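
The last row of Table 4 can be checked by hand. The sketch below assumes, as the reported numbers suggest, that the Bayes factor between the two informative hypotheses is the ratio of their Bayes factors against the unconstrained hypothesis. The dictionary keys are hypothetical labels for the three choices of b; the values are copied from the first two rows of Table 4.

```python
# Bayes factors of the first hypothesis against the unconstrained hypothesis,
# one per choice of b (labels are placeholders for the three fractions).
bf_1u = {"choice_1": 4.60, "choice_2": 3.07, "choice_3": 2.01}

# Bayes factor of the second (inequality constrained) hypothesis against the
# unconstrained hypothesis; it is the same for every choice of b.
bf_2u = 0.24

# Assumed relation: dividing the two Bayes factors cancels the common
# unconstrained hypothesis.
bf_12 = {label: value / bf_2u for label, value in bf_1u.items()}

for label, value in bf_12.items():
    print(f"{label}: {value:.3f}")
```

Up to rounding of the tabulated inputs, the resulting values agree with the 19.2, 12.8, and 8.38 reported in the last row.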

6 Conclusion

This paper has presented a new approximate Bayesian procedure for the evaluation of informative hypotheses that can be used for virtually any model. The methodology is based on the prior adjusted default Bayes factor of Mulder (2014b). Furthermore, normal approximations were used to ensure fast computations. Numerical results showed that the approximation is close to the prior adjusted fractional Bayes factor. This implies that the proposed AAFBF provides an accurate quantification of the relative evidence between informative hypotheses. Furthermore, different choices were given for the fraction b, similar to those for the fractional Bayes factor of O'Hagan (1995). The first choice relies on the concept of priors containing minimal information. The second choice uses a robustness argument resulting in a default prior distribution that is close to normal. The third choice is based on a frequency argument to control the classical error probabilities. The choice can be made by the user depending on the property which they find most important. By computing the AAFBF for each choice of b we get a complete picture of how much support there is in the data for one hypothesis relative to another when taking into account different philosophies.

We provide a software package, BaIn, with a user manual in Appendix B, for evaluating informative hypotheses. It needs only the maximum likelihood estimates and covariance matrix of the parameters of interest, denoted by urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0404 in this paper. BaIn computes the AAFBF for an informative hypothesis against an unconstrained hypothesis. By computing these quantities for each informative hypothesis against the unconstrained hypothesis, psychology researchers can straightforwardly compute the relative support in the data for pairs of informative hypotheses.

The study in this paper contributes to the quantitative techniques in psychology research in three respects. First, the proposed Bayesian test stimulates psychologists to translate scientific expectations into informative hypotheses that can be tested with the data in a direct manner. Second, the approximate Bayesian procedure allows psychologists to test their informative hypotheses in virtually any statistical model. Third, the software package allows psychologists to apply the new methodology to their own data in an easy manner.

Acknowledgements

This research is supported by the Consortium on Individual Development (CID) which is funded through the Gravitation program of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO grant number 024.001.003).

    Appendix A: Adjusting the prior mean for range constraints

    The specification of the prior mean for urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0405 in range constrained hypotheses consists of two steps:

    1. Find the range constraints in the hypotheses under investigation. A hypothesis contains range constraint(s) if there exist lines in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0406 of which the sum is the zero vector. If there is more than one range constraint in the same hypothesis, then there are multiple sets of two or more lines that are added to zero. For example, the hypothesis urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0407 with urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0408 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0409 contains a range constraint, because urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0410.
    2. Specify the prior mean of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0411 for the range constraints. urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0412 contains the elements related to the range constraints and other inequality constraints. The prior means for those elements of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0413 that represent the edges of a range constraint are specified as urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0414 where K is the number of lines in urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0415 for each range constraint and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0416 is the constant for this range constraint, whereas the prior means for other elements of urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0417 are 0, which is not different from that for equality and inequality constrained hypotheses. For example, for the hypothesis urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0418 the edges of the range constraint are urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0419 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0420. Thus, urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0421 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0422 have prior means of .5, whereas urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0423 has a prior mean of 0.

    Appendix B: BaIn user manual

    The software package BaIn is written in Fortran 90 with the IMSL 5.0 numerical library. It computes Bayes factors to evaluate any informative hypotheses (Section 2) and compare pairs from a set of informative hypotheses if they are comparable (Section 3.3). BaIn can be freely downloaded from the website http://informative-hypotheses.sites.uu.nl/software/bain/. The downloaded folder consists of an executable file (BaIn.exe), an input file (Input.txt), and an output file (Output.txt). Running BaIn.exe with Input.txt located in the same folder produces Output.txt. This appendix shows how to fill in Input.txt so that BaIn.exe can properly read the information. Input.txt mainly contains the estimates and covariance matrix of parameters urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0424 for prior and posterior specification, and the restriction matrix and constant vector for each informative hypothesis.

    The repeated measures ANOVA example in Section 2.2 is used to illustrate a valid specification of the input file. We will first display and then explain the contents of Input.txt when evaluating the informative hypotheses urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0425 (11) and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0426 (12).

    The input text has a strictly fixed structure. There are annotation lines starting with # below which the corresponding information (numbers) has to be given. The first line is the annotation for the number of structural parameters, number of informative hypotheses, and sample size, which means we need to write three numbers in the second line (i.e., 4, 2 and 9). Because the number of structural parameters is 4, four numbers for the estimates of parameters are presented in line 4, and a 4 × 4 covariance matrix is written in lines 6–9. Furthermore, because the number of informative hypotheses is 2, two hypotheses are specified. For the first hypothesis, line 11 specifies 2 and 1 for the numbers of equality and inequality constraints, respectively. Therefore, the augmented restriction matrix with constant vector for equality constraints has two rows shown in lines 13 and 14, and one row for inequality constraints in line 16. For the second hypothesis, the numbers of equality and inequality constraints are 0 and 3 given in line 18, respectively. As can be seen, there is no line with numbers immediately after line 19 because this hypothesis does not contain any equality constraints. In lines 21–23 the augmented restriction matrix for three inequality constraints is displayed.

    The estimates and covariance matrix of structural parameters can be obtained from other statistical software, such as Mplus (Muthén & Muthén, 2010) and the R package lavaan (Rosseel, 2012), and the augmented restriction matrices (R0|r0) and (R1|r1) can be specified based on the informative hypotheses under evaluation. Executing BaIn.exe with this information renders the relative complexities, fits and Bayes factors for informative hypotheses under different choices of b in Output.txt. The results for the repeated measures ANOVA example are as follows:

    The results contain the relative fits and complexities for both equality and inequality constraints, as well as the Bayes factors under different choices of b in each hypothesis. For equality constraints, the relative fit and complexity are the normal posterior and prior densities in (28), and thus can be directly computed. However, the computation of relative fit and complexity for inequality constraints is often difficult and requires sampling from the posterior and prior distributions using Markov chain Monte Carlo methods (Gu et al., 2014). BaIn uses an efficient algorithm, which requires fewer iterations (displayed below fit and complexities) in the Markov chains to accurately estimate the relative fit and complexity. Note that the Bayes factor for informative hypothesis urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0427 against urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0428 can be computed using (34) with urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0429 and urn:x-wiley:00071102:media:bmsp12110:bmsp12110-math-0430.
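
The idea of estimating the relative fit and complexity of an inequality constrained hypothesis by sampling can be sketched as follows. This is plain Monte Carlo with independent normal draws, not the more efficient algorithm implemented in BaIn. It assumes that the fit and complexity are the posterior and prior proportions of draws satisfying the constraints, that the prior is centred at 0 with the posterior variance divided by b, and that the parameters are uncorrelated. All numerical inputs and function names are hypothetical.

```python
import math
import random


def proportion_satisfying(means, sds, constraint, draws=20000, seed=1):
    """Proportion of independent normal draws that satisfy the constraint."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta = [rng.gauss(m, s) for m, s in zip(means, sds)]
        if constraint(theta):
            hits += 1
    return hits / draws


def ordered(theta):
    """Inequality constrained hypothesis: theta_1 > theta_2 > theta_3."""
    return theta[0] > theta[1] > theta[2]


# Hypothetical estimates and standard errors (not taken from the paper).
post_means = [0.6, 0.3, 0.0]
post_sds = [0.1, 0.1, 0.1]


def bayes_factor_vs_unconstrained(b):
    """Return (fit, complexity, Bayes factor) for a given fraction b."""
    prior_sds = [s / math.sqrt(b) for s in post_sds]  # assumed prior scaling
    fit = proportion_satisfying(post_means, post_sds, ordered)
    complexity = proportion_satisfying([0.0, 0.0, 0.0], prior_sds, ordered)
    return fit, complexity, fit / complexity


fit_a, complexity_a, bf_a = bayes_factor_vs_unconstrained(b=0.1)
fit_b, complexity_b, bf_b = bayes_factor_vs_unconstrained(b=0.5)
print(fit_a, complexity_a, bf_a)
print(fit_b, complexity_b, bf_b)
```

Because the prior is centred on the boundary of the constraints, rescaling it through b leaves the proportion of prior draws in agreement with the ordering essentially unchanged (about 1/6 for three ordered parameters), which illustrates why the Bayes factor for a purely inequality constrained hypothesis does not depend on b.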
