Volume 72, Issue 2
Original Article
Open Access

Bayesian evaluation of informative hypotheses for multiple populations

Herbert Hoijtink

Corresponding Author

E-mail address: H.Hoijtink@uu.nl

Department of Methodology and Statistics, Utrecht University, The Netherlands

Correspondence should be addressed to Herbert Hoijtink, Department of Methodology and Statistics, Utrecht University, PO Box 80140, 3508 TC, Utrecht, The Netherlands (email: H.Hoijtink@uu.nl).
Xin Gu

Department of Educational Psychology, East China Normal University, Shanghai, China

Joris Mulder

Department of Methodology and Statistics, Tilburg University, The Netherlands

First published: 21 October 2018
Citations: 1
The copyright line for this article was changed on 28 March 2019 after original online publication.

Abstract

The software package Bain can be used for the evaluation of informative hypotheses with respect to the parameters of a wide range of statistical models. For pairs of hypotheses the support in the data is quantified using the approximate adjusted fractional Bayes factor (BF). Currently, the data have to come from one population or have to consist of samples of equal size obtained from multiple populations. If samples of unequal size are obtained from multiple populations, the BF can be shown to be inconsistent. This paper examines how the approach implemented in Bain can be generalized such that multiple‐population data can properly be processed. The resulting multiple‐population approximate adjusted fractional Bayes factor is implemented in the R package Bain.

1 Introduction

This paper is the most recent addition to a sequence of papers in which an alternative to null hypothesis significance testing has been developed. Important landmarks in this development are Klugkist, Laudy, and Hoijtink (2005a) and Kuiper, Klugkist, and Hoijtink (2010) who added order constrained hypotheses to the classical null hypothesis and showed in the context of analysis-of-variance models how these can be evaluated using the Bayes factor (Kass & Raftery, 1995); Mulder, Hoijtink, and de Leeuw (2012) who generalized the approach to Bayesian evaluation of informative hypotheses (Hoijtink, 2012), that is, hypotheses specified using equality and inequality (or order) constraints among the parameters of multivariate normal linear models; Gu, Mulder, Dekovic, and Hoijtink (2014) who developed a Bayes factor for the evaluation of inequality-constrained hypotheses in a rather wide range of statistical models; and Mulder (2014) and Gu, Mulder, and Hoijtink (2018) who generalized the latter Bayes factor into the approximate adjusted fractional Bayes factor (AAFBF, henceforth abbreviated to BF) which can be used to evaluate informative hypotheses for one-population data for a wide range of statistical models such as normal linear models, logistic regression models, confirmatory factor analysis, and structural equation models. The interested reader is referred to http://informative-hypotheses.sites.uu.nl/, where all the books, dissertations, papers, and software produced during the course of this development are presented.

The BF is simple to compute: the only inputs needed are estimates of the model parameters, the corresponding covariance matrix, and the sample size. However, as will be discussed in this paper, the BF is inconsistent if samples of unequal size are obtained from multiple populations (similar to the fractional Bayes factor of O'Hagan, 1995, as is shown by De Santis & Spezzaferri, 2001). This paper examines how the BF can be generalized into the multiple-population approximate adjusted fractional Bayes factor (MBF). This Bayes factor is simple to compute too: the only inputs needed are estimates of the model parameters, separate estimates of the corresponding covariance matrix for each population, and the sample size obtained from each population. As will be shown, the MBF is consistent and can therefore be used for testing informative hypotheses with respect to multiple populations.

With the availability of the MBF (and corresponding software) researchers have a viable alternative to null hypothesis significance testing. In a wide range of statistical models, the null hypothesis can be replaced by informative hypotheses, and the p-values can be replaced by the MBF. The recent and current critical appraisal of null hypothesis significance testing in the literature will not be reiterated here. However, the interested reader is referred to Cohen (1994) who called the null hypothesis the nil hypothesis because he could not come up with research examples in which the null hypothesis might be a realistic representation of the population of interest. This point of view was further elaborated by Royall (1997, pp. 79–81) who claims that the null hypothesis cannot be true, and consequently, that data are not needed in order to be able to reject it. However, the interested reader is also referred to Wainer (1999) who highlights that there are situations where, without dispute, the null hypothesis is relevant. Landmark papers criticizing the use of p-values and significance levels are Ioannidis (2005) and Wagenmakers (2007), among others. The latter paper also motivates and illustrates the replacement of p-values by Bayesian hypothesis testing using the Bayesian information criterion (Schwarz, 1978; Raftery, 1995). Finally, the interested reader is referred to the American Statistical Association's statement on p-values (Wasserstein & Lazar, 2016) which gives a to-the-point and balanced overview of what can and cannot be done with p-values, and to Benjamin, Berger, Johannesson, Nosek, Wagenmakers, and Johnson (2018) who propose to redefine statistical significance.

The focus of this paper is on the evaluation of informative hypotheses using the Bayes factor. Note that model selection criteria like the Akaike and Bayesian information criteria (Raftery, 1995; Schwarz, 1978) cannot be used (Mulder, Klugkist, Meeus, van de Schoot, Selfhout, & Hoijtink, 2009; Section 3). The penalty for model complexity in both criteria is a function of the number of parameters in the model at hand. Since the number of parameters in an unconstrained hypothesis (e.g. Hu : θ1, θ2, θ3) is the same as in a constrained hypothesis (e.g. H1 : θ1 > θ2 > θ3), the penalty does not reflect the fact that H1 is more parsimonious than Hu. This problem is solved by Kuiper and Hoijtink (2013) who present the generalized order-restricted information criterion (GORIC), which is a generalization of the Akaike information criterion with a penalty term that does properly reflect the fact that H1 is more parsimonious than Hu. However, the GORIC can only be applied in the context of the multivariate normal linear model, while, as discussed above, the range of application of the (M)BF is not limited to the multivariate normal linear model.

As is elaborated in Van de Schoot, Hoijtink, Romeijn, and Brugman (2012), the penalty for model complexity used by the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & Van der Linde, 2002) is also not suited to quantifying how parsimonious an informative hypothesis is. Using a modification of the loss function used by the DIC, they obtain the prior information criterion (PIC) which in the examples provided can be used to evaluate informative hypotheses. However, as was shown by Mulder (2014), using the Bayes factor results in more desirable selection behaviour when testing constrained hypotheses than using the PIC.

Silvapulle and Sen (2004) show how so‐called Type A testing problems (evaluating a null hypothesis against an informative hypothesis) and Type B testing problems (evaluating an informative hypothesis against an unconstrained hypothesis) can be evaluated using p‐values in a wide range of statistical models. Those in favour of null hypothesis significance testing are well advised to consult this book and the R packages restriktor and ic.infer. The main limitation of this approach is that it cannot be used to directly compare two competing informative hypotheses.

Stern (2005) proposes using the posterior density of Hk for k = 1, …, K to select the best hypothesis. However, as is elaborated in Klugkist, Laudy, and Hoijtink (2005b), this amounts to using fk to select the best hypothesis, that is, the complexity ck is ignored. This will work if each hypothesis has the same complexity. However, if, for example, Hu is compared to H1, irrespective of the data, Hu will always be preferred because it has by definition a larger fit than H1 (cf. equation 7).

This paper starts by introducing the BF. Using a simple two-group setup, it will be shown and illustrated that it may show inconsistent behaviour if samples of unequal size are obtained from multiple populations. Subsequently, the BF will be generalized into the MBF and, using the same two-group setup, it will be shown and illustrated that the MBF does not exhibit inconsistent behaviour if samples of unequal size are obtained from multiple populations. Further illustrations of the approach proposed in the context of an analysis-of-covariance (ANCOVA) model and a logistic regression analysis will be provided. Illustrations are executed using the R (https://www.r-project.org) package Bain (https://informative-hypotheses.sites.uu.nl/software/bain/). The R codes and data used in this paper can be found at the bottom of the Bain website (click on the title of this paper). The paper is concluded with a short discussion and contains an Appendix with a further discussion of the consistency of the MBF.

2 The approximate adjusted fractional Bayes factor

Consider a model where θ is a vector of length J containing the structural parameters, and ω a scalar, vector, or matrix containing the nuisance parameters. Hypotheses can be formalized as
$$ H_k:\ S_k\theta = s_k,\quad R_k\theta > r_k, \tag{1} $$
where Sk is a pk × J matrix imposing pk equality constraints on θ, Rk is a qk × J matrix imposing qk inequality constraints, and sk and rk are vectors containing constants of size pk and qk, respectively. Additionally of interest is the unconstrained hypothesis Hu: θ, that is, a hypothesis without constraints on the parameters θ. As will be explained below, this hypothesis has a central role in the computation of the Bayes factor.
Mulder (2014), Gu et al. (2014), Gu (2016), and Gu et al. (2018) show that the relative support in the data for Hk and Hu can be quantified using the approximate adjusted fractional Bayes factor,
$$ BF_{ku} = \frac{f_k}{c_k}, \tag{2} $$
that is, the ratio of the fit and the complexity of Hk relative to Hu. The interested reader should consult the references given for the derivation of equation 2 from the ratio of the marginal likelihoods of Hk and Hu. The Bayes factor from equation 2 is a quantification of the relative support in the data for Hk against Hu. If, for example, BFku = 5, the support in the data is five times larger for Hk than for Hu. It will now be shown how the BF can be computed and why it is called the approximate adjusted fractional Bayes factor. It will also be highlighted that the BF is a member of the family of Bayes factors based on encompassing priors (Klugkist, Kato, & Hoijtink, 2005; Wetzels, Grasman, & Wagenmakers, 2010), that is, Bayes factors for which the prior distribution of the model parameters under Hk is derived from the prior distribution under Hu.
Before providing the formulas for fk and ck, it must be emphasized that the density of the data can be factored according to O'Hagan (1995) as
$$ f(Y \mid X, \theta, \omega) = f(Y \mid X, \theta, \omega)^{b}\, f(Y \mid X, \theta, \omega)^{1-b}, \tag{3} $$

where Y denotes the data that are modelled (e.g., the dependent variable in a multiple regression) and X the data that are not modelled and considered to be fixed (e.g., the predictor variables in a multiple regression). The idea of fractional Bayes factors is to use a fraction b of the information in the likelihood function to specify the prior distribution. Usually the fraction b is chosen such that it corresponds to the size of a minimal training sample (Berger & Pericchi, 1996, 2004). For the evaluation of informative hypotheses, the R package Bain uses b = J*/N, where J* denotes the number of independent constraints in [S1, R1, …, SK, RK] and N the sample size. This choice can be illustrated using a simple example. If H1 : θ1 > θ2 > θ3 and H2 : θ1 = θ2 = θ3, the number of independent constraints is J* = 2, that is, there are two underlying parameters that are combinations of the target parameters with respect to which hypotheses are formulated: θ1 − θ2 and θ2 − θ3. Our choice is motivated by the fact that in the normal linear model, the minimal training sample needed to obtain a proper posterior distribution is equal to the number of parameters. If, for example, a variable is modelled using a normal distribution with unknown mean μ and variance σ², the minimum training sample needed to obtain a proper posterior based on the prior h(μ, σ²) = 1/σ² is 2 (cf. Berger & Pericchi, 2004, Example 1). If a variable is a linear combination of two predictors with normal error, there are four parameters (intercept, two regression coefficients, residual variance) and, consequently, the minimum training sample equals 4.
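The choice b = J*/N can be sketched in a few lines of Python. This is only an illustration under an assumption: J* is computed here as the rank of the stacked constraint matrices, which matches the example in the text (J* = 2) but is not taken from the Bain source code. The function names and the sample size N = 60 are hypothetical.

```python
import numpy as np


def independent_constraints(*constraint_matrices):
    """Number of independent constraints J*, taken here (as an assumption)
    to be the rank of the stacked matrices [S1, R1, ..., SK, RK]."""
    stacked = np.vstack(constraint_matrices)
    return int(np.linalg.matrix_rank(stacked))


def fraction_b(j_star, n):
    """Fraction of the information in the data used for the prior."""
    return j_star / n


# H1: theta1 > theta2 > theta3, written as R1 theta > 0
R1 = np.array([[1.0, -1.0, 0.0],
               [0.0, 1.0, -1.0]])
# H2: theta1 = theta2 = theta3, written as S2 theta = 0
S2 = np.array([[1.0, -1.0, 0.0],
               [0.0, 1.0, -1.0]])

j_star = independent_constraints(R1, S2)
b = fraction_b(j_star, n=60)  # n = 60 is a made-up sample size
print(j_star, b)
```

Both hypotheses constrain the same two contrasts, θ1 − θ2 and θ2 − θ3, so stacking their matrices does not raise the rank above 2.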

Gu et al. (2014), Gu (2016), and Gu et al. (2018) show that, based on equation 3 and an improper uniform prior for θ, a large‐sample approximation (see Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin, 2013, Chapter 4) of the posterior distribution of θ under Hu can be obtained:
$$ g_u(\theta \mid Y, X) \approx N\big(\hat{\theta}, \hat{\Sigma}_{\theta}\big), \tag{4} $$

where $\hat{\theta}$ denotes the maximum likelihood estimate of θ and $\hat{\Sigma}_{\theta}$ the corresponding covariance matrix. Note that the ‘approximate’ in the name ‘approximate adjusted fractional Bayes factor’ reflects the fact that for its computation a normal approximation of the posterior distribution is used. An implication of the approximation is that the BF can only be used if a normal approximation to the posterior distribution of θ is reasonable. If the sample size is not too small (see below), this is the case with unbounded parameters such as means and regression coefficients as they appear in generalized linear models and structural equation models. It is also the case for the fixed regression coefficients (the random effects would be treated as nuisance parameters) in, for example, two-level models. In the latter case, the sample size used is the number of level-two units (and not the number of observations of the dependent variable). This is not necessarily the case with naturally bounded parameters such as variances (naturally bounded to be positive) and probabilities (naturally bounded between 0 and 1), although even there, if the sample size is large, a normal approximation of the posterior distribution may be accurate. The interested reader is referred to Gu et al. (2014) who show that, for the evaluation of inequality-constrained hypotheses in the context of a multiple regression with two predictors, the difference between the approximate BF implemented in Bain and the corresponding non-approximate BF implemented in BIEMS (Mulder et al., 2012) is negligible if the sample size is at least 20. They also show that inequality-constrained hypotheses with respect to the probabilities in a two-by-two contingency table render an approximate BF that is very similar to the non-approximate BF presented by Klugkist, Laudy, and Hoijtink (2010) if the sample size is at least 40.
Although these results give confidence in the performance of the approximate adjusted fractional Bayes factor, further research in the context of different models is needed in order to strengthen these results.

The prior distribution of θ has a covariance matrix which is based on a fraction b of the information in equation 3 and a mean
$$ S_k\theta_B = s_k \ \text{ and } \ R_k\theta_B = r_k \quad \text{for } k = 1, \ldots, K, \tag{5} $$
that is, θB denotes a value of θ on the boundary of all the hypotheses under investigation (Mulder, 2014):
$$ h_u(\theta \mid [Y, X]^{b}) = N\big(\theta_B, \hat{\Sigma}_{\theta}/b\big), \tag{6} $$

where $[Y, X]^{b}$ stresses that the prior distribution is based on a fraction b of the information in the data. Note that θB is called the adjusted mean (Mulder, 2014) of the prior distribution, which explains the ‘adjusted’ in the name ‘approximate adjusted fractional Bayes factor’. As was shown by Mulder (2014), if, for example, H1 : θ > 0 is compared with Hu : θ, the more the data support H1, the smaller the support in the fractional Bayes factor for H1. This phenomenon is addressed if the adjusted fractional Bayes factor is used, that is, if the prior mean is in agreement with equation 5, the more the data are in agreement with H1 the larger the support in the adjusted fractional Bayes factor for H1 (see Mulder, 2014, for further details). Note, furthermore, that $h_u(\theta \mid [Y, X]^{b})$ is a so-called encompassing prior, that is, the prior distribution of θ under Hk is proportional to $h_u(\theta \mid [Y, X]^{b})\, I_{\theta \in H_k}$, where the indicator function is 1 if the argument is true and 0 otherwise (Klugkist et al., 2005a; Wetzels et al., 2010).

There are situations in which there is no solution to equation 5. For example, if hypotheses are specified using range constraints, for example, H1 : |θ| < 0.2 (i.e. H1 : θ > −0.2, θ < 0.2), there is no solution. Bain addresses this problem in the following manner: in equation 5 (and only in this equation) this (part of a) hypothesis is represented as H1 : θ = 0, that is, θB will be equal to the midpoint of the range specified. The rationale is that H1 essentially implies that θ ≈ 0. Another example is given by the hypotheses H1 : θ = 0 and H2 : θ > 2. Although each of these hypotheses can be evaluated by itself, they cannot be compared using the approximate adjusted fractional Bayes factor because there is no solution to equation 5, that is, the two hypotheses are not compatible because hu (·) is different for each hypothesis (Hoijtink, 2012; section 9.9.2.1.). Testing non-compatible hypotheses can be done using BIEMS (Mulder et al., 2012) by instructing the program to use the same unconstrained prior for each of the hypotheses under consideration.

Based on equations 4 and 6, the relative fit and complexity from equation 2 are defined as
$$ f_k = \int_{\theta \in H_k} g_u(\theta \mid Y, X)\, d\theta \tag{7} $$
and
$$ c_k = \int_{\theta \in H_k} h_u(\theta \mid [Y, X]^{b})\, d\theta, \tag{8} $$

respectively. The interested reader is referred to Gu (2016, Chapter 3) for the algorithms with which the fit and complexity are computed. The strength of the BF lies in its simplicity. Its computation is based only on maximum likelihood estimates and the corresponding asymptotic covariance matrix, and the choice of the fraction b, which is completely determined by the sample size N and the number of independent constraints J*.
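For a hypothesis specified using only inequality constraints, the fit and complexity can be read as the proportions of the unconstrained posterior and prior distributions that agree with the constraints. The Python sketch below estimates both proportions by simple Monte Carlo sampling. It is not the algorithm of Gu (2016, Chapter 3), and it does not handle equality constraints (which require densities rather than proportions). The estimates, covariance matrix, sample size, and function name are all hypothetical.

```python
import numpy as np


def proportion_in_agreement(mean, cov, R, r, n_draws=200_000, seed=1):
    """Monte Carlo estimate of P(R theta > r) for theta ~ N(mean, cov)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    return np.all(draws @ R.T > r, axis=1).mean()


# Hypothetical input: estimates, their covariance matrix, and sample size
theta_hat = np.array([1.0, 0.5, 0.0])
sigma_hat = np.diag([0.04, 0.04, 0.04])
n = 75
j_star = 2                      # independent constraints in H1
b = j_star / n

# H1: theta1 > theta2 > theta3, written as R theta > r
R = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
r = np.zeros(2)

# A point on the boundary of H1 (R theta_B = r), used as the prior mean
theta_b = np.zeros(3)

fit = proportion_in_agreement(theta_hat, sigma_hat, R, r)           # posterior
complexity = proportion_in_agreement(theta_b, sigma_hat / b, R, r)  # prior
bf_1u = fit / complexity
print(fit, complexity, bf_1u)
```

Because the prior is centred on the boundary and the three prior variances are equal, each of the six orderings of θ1, θ2, θ3 is equally likely a priori, so the complexity should be close to 1/6.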

The approximate adjusted fractional Bayes factor, also in the paragraphs that follow abbreviated as BF, falls in the category of default, automatic, or pseudo Bayes factors because no priors have to be manually specified. Instead, the prior is automatically constructed using a small fraction of the data, while the remaining fraction is used for hypothesis testing, similar to the fractional Bayes factor (O'Hagan, 1995). The BF is coherent in the sense that BFuk = 1/BFku and BFkk’ = BFku/BFk'u (O'Hagan, 1997; sections 3.1 and 3.2). Note that these coherence properties do not necessarily hold for other default Bayes factors (O'Hagan, 1997; Robert, 2007; p. 240).

As further noted by Robert (2007, p. 242), a potential issue of the fractional Bayes factor, and therefore also of the BF in equation 2, is that there is no clear‐cut procedure to choose the fraction b. We believe, however, that the use of a minimal fraction is reasonable as it results in a minimally informative default prior while maximal information in the data is used for hypothesis testing (Berger & Mortera, 1995). Furthermore, it has been shown that this choice results in consistent testing behaviour (Mulder, 2014; O'Hagan, 1995). Nevertheless, further research on the choice of b would strengthen the approach we present in this paper. The interested reader is referred to Gu, Hoijtink, and Mulder (2016), for one evaluation of the choice of b. Another potential issue highlighted by Robert (2007, p. 242) is that default Bayes factors can be computationally intensive. The BF procedure that is proposed here, however, is very easy to compute: only the maximum likelihood estimates, error covariance matrix and sample size are needed (Gu et al., 2018).

Finally, it is important to note that default Bayes factors may behave as ordinary Bayes factors based on so-called intrinsic priors (Berger & Pericchi, 1996). Currently, however, intrinsic priors have not yet been explored for the BF. Although this too is a topic worthy of further research, from a pragmatic point of view it is more important to know whether the BF is consistent, that is, whether the support for the true hypothesis goes to infinity when the sample size grows to infinity. According to O'Hagan (1997; section 2.1), if hypotheses are nested (in the cases we consider all hypotheses are nested within Hu) and if b → 0 as N → ∞ (which holds for our b), the fractional Bayes factor is consistent. However, De Santis and Spezzaferri (2001) show that the fractional Bayes factor may show inconsistent behaviour if data from multiple populations are sampled. Similarly, as shown in the next section, the BF is also inconsistent if the data are sampled from multiple populations. In line with the solution proposed by De Santis and Spezzaferri (2001) for the fractional Bayes factor, the MBF is an extension of the BF that is consistent when testing hypotheses in the case of multiple populations.

3 Consistency of the approximate adjusted fractional Bayes factor

The consistency of the (M)BF will be discussed in terms of (M)BFku if Hk is specified using only equality constraints. In this case a Bayes factor is called consistent if (M)BFku → ∞ (or 0) when Hk (or Hu) is true and Ng → ∞ at the same rate for all g = 1, …, G. If Hk is specified using only inequality constraints, consistency will be discussed in terms of (M)BFkc, where Hc is the complement of Hk. In this situation, a Bayes factor is called consistent if (M)BFkc → ∞ (or 0) when Hk (or Hc) is true, as Ng → ∞ at the same rate for all g = 1, …, G. Both scenarios imply that the G populations are treated as one population from which a sample of increasing size (proportionally increasing the sample sizes from each of the G populations) is taken. Note that BFkc = BFku/BFcu, where the numerator and denominator can be computed using equation 2. Note, furthermore, that for hypotheses specified using only equality constraints Hc is equivalent to Hu, because the equality constraints have zero prior probability under Hu.
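The relation BFkc = BFku/BFcu can be sketched as follows for a hypothesis specified using only inequality constraints. The sketch assumes that the fit and complexity of the complement Hc are one minus the fit and complexity of Hk, which holds when both quantities are proportions of the posterior and prior distributions. The function names and input values are hypothetical.

```python
def bf_against_unconstrained(fit, complexity):
    """BF_ku = f_k / c_k (equation 2)."""
    return fit / complexity


def bf_against_complement(fit, complexity):
    """BF_kc = BF_ku / BF_cu for an inequality-constrained H_k, assuming
    the complement has fit 1 - f_k and complexity 1 - c_k."""
    bf_ku = bf_against_unconstrained(fit, complexity)
    bf_cu = bf_against_unconstrained(1.0 - fit, 1.0 - complexity)
    return bf_ku / bf_cu


# Hypothetical values: 90% of the posterior and 1/6 of the prior agree with H_k
bf_ku = bf_against_unconstrained(0.9, 1 / 6)
bf_kc = bf_against_complement(0.9, 1 / 6)
print(bf_ku, bf_kc)
```

Since the fit cannot exceed 1, BFku is bounded above by 1/ck, whereas BFkc has no such bound. This is one reason to judge the consistency of inequality-constrained hypotheses against the complement.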

When BFku ↛ ∞ or BFkc ↛ ∞ for the same limit, the Bayes factor is called inconsistent. Another form of inconsistency that will be considered in this paper is whether (M)BFku → ∞ or 0, and (M)BFkc → ∞ or 0, as Ng → ∞ for some populations but not all G populations. This situation applies if a sample of increasing size is obtained from some of the G populations while the sample size from the other populations remains fixed. De Santis and Spezzaferri (2001) showed for this limit that the fractional Bayes factor (O'Hagan, 1995) is inconsistent. In this section it will be illustrated, in line with De Santis and Spezzaferri (2001), that the same holds for the BF. In the next section the MBF will be introduced, which can be seen as an extension of the BF to multiple populations which avoids this form of inconsistency.

Example 1: Comparison of two independent means. Consider the following simple model:
$$ y_i = \theta_1 D_{1i} + \theta_2 D_{2i} + \epsilon_i, \quad \epsilon_i \sim N(0, \omega), \tag{9} $$

where D1i equals 1 for i = 1, …, N1 and 0 otherwise, D2i equals 1 for i = N1 + 1, …, N1 + N2 and 0 otherwise (and, consequently, θ1 and θ2 denote the means in group 1 and group 2, respectively, and ω the residual variance), and N1 and N2 denote the sample sizes of groups 1 and 2, respectively, with N = N1 + N2. Connecting this notation to that of the previous section renders Y = y and X = [D1, D2].

Consider testing H1 : θ1 = θ2 against Hc : θ1 ≠ θ2. Note that the marginal likelihood of Hc is equal to the marginal likelihood of the unconstrained hypothesis Hu : θ1, θ2 because θ1 = θ2 has zero probability assuming a bivariate normal prior for θ1, θ2 under Hu. For the exposition that follows we arbitrarily assume that $\hat{\omega} = 1$. The approximated unconstrained posterior and prior distribution of θ1 and θ2 from equations 4 and 6 are then given by

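$$ g_u(\theta_1, \theta_2 \mid Y, X) \approx N\left( \begin{bmatrix} \hat{\theta}_1 \\ \hat{\theta}_2 \end{bmatrix}, \begin{bmatrix} 1/N_1 & 0 \\ 0 & 1/N_2 \end{bmatrix} \right) \tag{10} $$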

and

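$$ h_u(\theta_1, \theta_2 \mid [Y, X]^{b}) = N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1/(bN_1) & 0 \\ 0 & 1/(bN_2) \end{bmatrix} \right), \tag{11} $$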
respectively, where b = J*/N = 1/N. Note that, with respect to H1, the prior means for θ are in agreement with equation 5.
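If we write δ = θ1 − θ2, then the BF is given by the Savage–Dickey density ratio (Dickey, 1971; Mulder, Hoijtink, & Klugkist, 2010; Wetzels et al., 2010), that is, the ratio of the unconstrained posterior and prior densities of δ evaluated at δ = 0:
$$ BF_{1u} = \frac{f_1}{c_1} = \frac{g_u(\delta = 0 \mid Y, X)}{h_u(\delta = 0 \mid [Y, X]^{b})}. $$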
Let us first of all consider the situation in which N1 and N2 go to ∞ at the same rate, that is, let Ng = ag n, for some positive constant ag, g = 1 or 2, and let n → ∞. If $\hat{\theta}_1 = \hat{\theta}_2$, then
$$ f_1 = \frac{1}{\sqrt{2\pi\,(1/N_1 + 1/N_2)}} = \frac{\sqrt{n}}{\sqrt{2\pi\,(1/a_1 + 1/a_2)}} \to \infty \tag{12} $$
if n → ∞ and
$$ c_1 = \frac{1}{\sqrt{2\pi\,(N/N_1 + N/N_2)}} = \frac{1}{\sqrt{2\pi\,\big((a_1 + a_2)/a_1 + (a_1 + a_2)/a_2\big)}}, \tag{13} $$

which is a constant independent of n. Equations 12 and 13 imply that BF1u → ∞ if n → ∞, which is consistent. If $\hat{\theta}_1 \neq \hat{\theta}_2$,

$$ f_1 = \frac{1}{\sqrt{2\pi\,(1/N_1 + 1/N_2)}} \exp\left\{ -\frac{(\hat{\theta}_1 - \hat{\theta}_2)^2}{2\,(1/N_1 + 1/N_2)} \right\} \to 0 \tag{14} $$
if n → ∞ and c1 remains as in equation 13. This implies that BF1u → 0 if n → ∞, which is consistent.
Now if we fix N1 and let N2 → ∞, then in the limit equation 14 reduces to
$$ f_1 = \frac{1}{\sqrt{2\pi/N_1}} \exp\left\{ -\frac{(\hat{\theta}_1 - \hat{\theta}_2)^2}{2/N_1} \right\}, \tag{15} $$
which is a positive constant, and the middle part of equation 13 reduces to
$$ c_1 = \frac{1}{\sqrt{2\pi\,(N/N_1 + N/N_2)}} \to 0 \tag{16} $$

as N2 → ∞, because N/N1 → ∞ when N1 is fixed. This implies that in the limit BF1u → ∞ also if Hu is true, which is inconsistent behaviour.
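The limiting behaviour derived above can be checked numerically with a short Python sketch of the Savage–Dickey ratio for Example 1. It assumes a residual variance estimate of 1 and the single fraction b = 1/N. The function names are hypothetical, and the difference of 0.7 between the two estimated means is an illustrative value, not one taken from the text.

```python
import math


def normal_density_at_zero(mean, variance):
    """Density of N(mean, variance) evaluated at 0."""
    return math.exp(-mean ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def bf_1u(diff_hat, n1, n2, omega_hat=1.0):
    """BF of H1: theta1 = theta2 against Hu, using one fraction b = 1 / N."""
    b = 1.0 / (n1 + n2)
    post_var = omega_hat / n1 + omega_hat / n2   # posterior variance of delta
    prior_var = post_var / b                     # prior variance of delta
    fit = normal_density_at_zero(diff_hat, post_var)
    complexity = normal_density_at_zero(0.0, prior_var)
    return fit / complexity


# Both groups grow at the same rate: the BF moves in the right direction
for n in (10, 25, 50, 100):
    print(n, n, round(bf_1u(0.0, n, n), 2), round(bf_1u(0.7, n, n), 2))

# Group 1 fixed at N1 = 10 while N2 grows: with a nonzero difference the
# BF first decreases and then increases, i.e. it drifts towards H1
for n2 in (10, 50, 200, 1000, 100000):
    print(10, n2, round(bf_1u(0.7, 10, n2), 2))
```

In the second loop the fit converges to the constant in equation 15 while the complexity keeps shrinking, which is what drives the ratio upwards.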

To get more insight into the (in)consistency, the BF was computed for various numerical examples in Tables 1, 2, and 3. In the case of support for H1 we set urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0025 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0026, and in the case of support for Hu we set urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0027 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0028. In both situations we again let $\hat{\omega} = 1$. As can be seen in Table 1, when N1 = N2 and both increase at the same rate, BF1u → ∞ if H1 is true and BF1u → 0 if Hu is true, that is, the Bayes factor shows consistent behaviour. Table 2 shows that BF1u also shows consistent behaviour if both sample sizes increase at the same rate and N1 ≠ N2. However, as can be seen in Table 3, if there is support for H1, BF1u increases if N2 increases while N1 remains fixed, but if there is support in the data for Hu, BF1u at first decreases but then starts to increase, which implies that the evidence accumulates in the wrong direction. As N2 keeps increasing BF1u goes to infinity as shown above. This is a simple illustration of inconsistent behaviour of the BF where multiple populations are considered while the sample size does not increase for all populations. De Santis and Spezzaferri (2001) show that this behaviour can also be observed for the fractional Bayes factor. The problem is caused by the fact that the prior variances of θ1 and θ2 are dependent on the sample sizes in both groups because b = 1/N (Table 3). As N2 increases, the fraction that is used to construct the default prior for θ1 also goes to zero even though the sample size of group 1 does not increase. This undesirable property can be avoided using population-specific fractions in line with Iwaki (1997), Berger and Pericchi (1998), De Santis and Spezzaferri (1999, 2001), and Mulder (2014).
In the remainder of this paper it will be detailed how this can be done for the BF to obtain the MBF for multiple populations.

Table 1. Investigation of consistent behaviour of the one‐population Bayes factor (BF) and the multiple‐population Bayes factor (MBF) in the case of support for H1 (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0030) or for Hu (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0031) in the case of equal sample sizes for both groups increasing at the same rate
Columns b to BF1u (Hu) refer to the BF; columns b1 to MBF1u (Hu) refer to the MBF. Var(θg) is the prior variance of θg; (H1) and (Hu) mark the case of support for H1 and for Hu, respectively.

N1    N2    b      Var(θ1)  Var(θ2)  BF1u (H1)  BF1u (Hu)  b1     b2     Var(θ1)  Var(θ2)  MBF1u (H1)  MBF1u (Hu)
10    10    .05    2        2        4.47       1.31       .05    .05    2        2        4.47        1.31
25    25    .02    2        2        7.07       0.33       .02    .02    2        2        7.07        0.33
50    50    .01    2        2        10.00      0.02       .01    .01    2        2        10.00       0.02
100   100   .005   2        2        14.14      0.00       .005   .005   2        2        14.14       0.00

Note

  • N1 and N2 denote the sample sizes in groups 1 and 2, respectively; b denotes the fraction of information in the density of the data, and b1 and b2 denote the fraction of information in the density of the data for groups 1 and 2, respectively; urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0040 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0041 denote the prior variances of θ1 and θ2 from equation 11 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0042 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0043 the prior variances of θ1 and θ2 from equation 25. The numbers in italics are referred to in the text.
Table 2. Investigation of consistent behaviour of the one‐population Bayes factor (BF) and the multiple‐population Bayes factor (MBF) in the case of support for H1 (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0044) or for Hu (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0045) in the case of unequal sample sizes for the two groups increasing at the same rate
Columns b to BF1u (Hu) refer to the BF; columns b1 to MBF1u (Hu) refer to the MBF. Var(θg) is the prior variance of θg; (H1) and (Hu) mark the case of support for H1 and for Hu, respectively.

N1    N2    b      Var(θ1)  Var(θ2)  BF1u (H1)  BF1u (Hu)  b1     b2     Var(θ1)  Var(θ2)  MBF1u (H1)  MBF1u (Hu)
10    50    .017   6        1.2      7.74       1.01       .05    .01    2        2        5.77        0.75
25    125   .007   6        1.2      12.25      0.07       .02    .004   2        2        9.13        0.06
50    250   .003   6        1.2      17.32      0.00       .01    .002   2        2        12.91       0.00

Notes

  • N1 and N2 denote the sample sizes in groups 1 and 2, respectively; b denotes the fraction of information in the density of the data, and b1 and b2 denote the fraction of information in the density of the data for groups 1 and 2, respectively; urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0054 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0055 denote the prior variances of θ1 and θ2 from equation 11 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0056 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0057 the prior variances of θ1 and θ2 from equation 25.
Table 3. Investigation of consistent behaviour of the one‐population Bayes factor (BF) and the multiple‐population Bayes factor (MBF) in the case of support for H1 (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0058) or for Hu (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0059) in the case of unequal sample sizes where only the size of group 2 increases

            One-population BF                                  Multiple-population MBF
N1   N2     b      Var θ1   Var θ2   BF1u    BF1u     b1     b2       Var θ1   Var θ2   MBF1u   MBF1u
                                     (H1)    (Hu)                                       (H1)    (Hu)
10   10     .05    2        2        4.47    1.31     .05    .05      2        2        4.47    1.31
10   25     .029   3.5      1.4      5.92    1.03     .05    .02      2        2        5.34    0.93
10   50     .017   6.0      1.2      7.74    1.01     .05    .01      2        2        5.77    0.75
10   100    .009   11       1.1      10.48   1.13     .05    .005     2        2        6.03    0.65
10   200    .005   21       1.05     14.49   1.41     .05    .0025    2        2        6.17    0.60
10   1000   .001   101      1.01     31.78   2.81     .05    .0005    2        2        6.29    0.56

Note

  • N1 and N2 denote the sample sizes in groups 1 and 2, respectively; b denotes the fraction of information in the density of the data, and b1 and b2 denote the fraction of information in the density of the data for groups 1 and 2, respectively; Var θ1 and Var θ2 denote the prior variances of θ1 and θ2, from equation 11 in the BF panel and from equation 23 in the MBF panel; the columns marked (H1) and (Hu) give the Bayes factor when the data support H1 and Hu, respectively. The numbers in italics are referred to in the text.

4 The approximate adjusted fractional Bayes factor for multiple populations

In this section the MBF will be introduced. The developments will be illustrated using the comparison of two independent means. Let g = 1, …, G, where G denotes the number of groups, and Ng the corresponding sample sizes. Let θ = [θ1, …, θg, …, θG, η], where θg denotes the structural parameters that are unique to group g and η the structural parameters that are shared by all the groups. Then, in line with De Santis and Spezzaferri (2001), the density of the data of the multiple population model can be factored as
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0072(17)
where bg denotes the fraction of the information in the likelihood for population g that will be used for the specification of the prior distribution.

Example 1, continued. The following notation will be used to denote which parts of the data belong to groups 1 and 2. The subscripts 1 and 2 in y1, y2 denote data sampled from populations 1 and 2, respectively. Analogously, the second subscript in D11, D12, and D21, D22, denotes data from populations 1 and 2, respectively. Using this notation, the density of the data for the comparison of two independent means can be factored as:

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0073(18)

The covariance matrix of the parameters in equation 17 can be obtained as a function of the observed or expected Fisher information matrix (the interested reader is referred to Efron and Hinkley (1978) for details of the relative (dis)advantages of both types of information). Using the observed Fisher information, this leads to

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0074(19)
where each second‐order derivative is to be evaluated using urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0075, that is, the unconstrained maximum likelihood estimates of the model parameters obtained using the full density of the data from equation 17. If the expected Fisher information is used, the expected value of each entry in the last part of equation 19 has to be taken. The corresponding normal approximation of the posterior distribution of the structural parameters is
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0076(20)
that is, the multiple‐population counterpart of equation 4.

Note that urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0077 can be constructed using the observed Fisher information matrix for the parameters of each group:

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0078(21)

where each second‐order derivative is to be evaluated using urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0079, that is, the maximum likelihood estimates of the model parameters obtained using the full density of the data displayed in equation 17, not only the data for group g. Analogously, equation 21 can be replaced by the corresponding expected Fisher information matrix. Comparing equation 21 for g = 1, …, G to equation 19 shows that the former contains all the elements needed to construct the latter. This is important since the input for Bain consists of the covariance matrices for each group, from which Bain constructs the overall covariance matrix. As will be detailed in the next paragraph, these group‐specific covariance matrices are needed in order to construct the prior distribution based on a fraction bg of the information of the data in each group.

Once urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0080 for g = 1, …, G has been obtained, it is straightforward to obtain the multiple‐population counterpart of the prior distribution displayed in equation 6, which is based on a covariance matrix using a fraction bg of the information in Yg, Xg for g = 1, …, G (see equation 17). Using the mathematical rule that ∂² log p(v, w)^u / ∂v ∂w = u ∂² log p(v, w) / ∂v ∂w, it can be seen that
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0081(22)

Reassembling these matrices (cf. equation 19) renders

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0082(23)
where urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0083 denotes a covariance matrix based on fractions b = [b1, …, bG] of the information in the data, rendering the multiple‐population adjusted fractional prior distribution of the structural parameters
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0084(24)

Note that […]b in equation 24 denotes a prior distribution based on fractions b of the information in the data. It can be interpreted as a default prior that contains the information of group‐specific data fractions, bg, for the parameters of interest. Note, furthermore, that the subscript B in θB,1, …, θB,G, ηB highlights that the prior means of the structural parameters are in agreement with equation 5, that is, centred on the boundary of the hypotheses specified.
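The assembly of the posterior and prior covariance matrices from group‐specific input can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the code of the R package Bain: each group is assumed to contribute one scalar group‐specific parameter θg followed by the shared parameters η, the group covariance matrices are treated as inverses of the group information matrices, and the function names invert and assemble_covariance as well as the input values are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def assemble_covariance(group_covs, weights=None):
    """Combine group covariance matrices for (theta_g, eta) into one matrix.

    weights[g] multiplies the information of group g: 1 gives the posterior
    covariance matrix, b_g gives the fractional prior covariance matrix.
    """
    n_groups = len(group_covs)
    n_shared = len(group_covs[0]) - 1
    if weights is None:
        weights = [1.0] * n_groups
    size = n_groups + n_shared
    info = [[0.0] * size for _ in range(size)]
    for g, (cov, w) in enumerate(zip(group_covs, weights)):
        info_g = invert(cov)                       # information of group g
        info[g][g] = w * info_g[0][0]              # theta_g with theta_g
        for k in range(n_shared):
            # theta_g with shared parameter k (symmetric entries)
            info[g][n_groups + k] = info[n_groups + k][g] = w * info_g[0][k + 1]
            for l in range(n_shared):
                # shared parameters accumulate over the groups
                info[n_groups + k][n_groups + l] += w * info_g[k + 1][l + 1]
    return invert(info)


# Hypothetical covariance matrices for (theta_g, eta) in two groups.
cov_1 = [[0.50, 0.10], [0.10, 0.20]]
cov_2 = [[0.80, -0.10], [-0.10, 0.30]]
posterior_cov = assemble_covariance([cov_1, cov_2])
prior_cov = assemble_covariance([cov_1, cov_2], weights=[0.02, 0.04])
print(posterior_cov)
print(prior_cov)
```

If all fractions are equal, the prior covariance matrix produced by this sketch is simply the posterior covariance matrix divided by the common fraction; with unequal fractions each group is down‐weighted separately before the matrices are reassembled.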

The MBF is the counterpart of BF in equation 2 based on the multiple‐population posterior and prior distributions displayed in equations 20 and 24:
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0085(25)
Example 1, continued. Estimates of θ1, θ2, σ2 are easy to obtain. It is well known that, using the expected Fisher information, the counterpart of equation 21 for the example at hand is
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0086(26)

from which, using equation 19, it is straightforward to obtain that

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0087(27)

The counterpart of equation 24 for the example at hand has θB = [0, 0] and, applying equation 23,

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0088(28)
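For readers who want the matrices of this example written out, the following is a sketch under the assumptions stated in the text (two independent means, a common residual variance estimate, expected Fisher information). The hatted Σ symbols are notation chosen here and may differ from the notation of the displayed equations.

```latex
\hat{\Sigma}_{\theta_g} = \frac{\hat{\sigma}^2}{N_g}, \quad g = 1, 2,
\qquad
\hat{\Sigma}_{\theta} =
\begin{bmatrix}
\hat{\sigma}^2 / N_1 & 0 \\
0 & \hat{\sigma}^2 / N_2
\end{bmatrix},
\qquad
\hat{\Sigma}^{b}_{\theta} =
\begin{bmatrix}
\hat{\sigma}^2 / (b_1 N_1) & 0 \\
0 & \hat{\sigma}^2 / (b_2 N_2)
\end{bmatrix}.
```

With fractions of the form bg = 1/2 × 1/Ng, both prior variances in this sketch reduce to twice the residual variance estimate, which agrees with the value 2 listed in the MBF panels of Tables 2 and 3 if the residual variance estimate equals 1.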

With respect to the computation of equation 21 three situations can be distinguished.

Situation 1. The multivariate normal linear model with group‐specific and joint parameters. In the multivariate normal linear model there are z = 1, …, Z dependent variables and p = 1, …, P predictors with regression coefficients βpz (where the predictor attached to a possible intercept is a column of 1s):
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0089(29)
where urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0090. Multiple populations arise if two or more of the predictors are used to create groups. Two groups with group‐specific intercepts are created if, for example, x1i = 1 if person i is a member of group 1 and 0 otherwise and x2i = 1 if person i is a member of group 2 and 0 otherwise. Group‐specific regression coefficients can additionally be obtained if, for example, x3i = x*i x1i and x4i = x*i x2i (where x*i denotes a continuous predictor for which group‐specific regression coefficients are required), that is, the predictor x*i gets a regression coefficient β3z in group 1 and β4z in group 2. With Z = 1 the model could be
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0091(30)
for group 1, and
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0092(31)

for group 2, and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0093.

For the multivariate normal linear model,
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0094(32)
where θg contains all the group‐specific coefficients, η contains the joint coefficients, ω is a matrix containing the covariance matrix of the residuals, and all the data for the predictors for group g are collected in Xg. Using equation 1, hypotheses with respect to the structural parameters θ = vec(B), where B is a Z × P matrix containing the regression coefficients βpz, can be formulated. Later in this paper an ANCOVA model will be used in Example 2 to illustrate situation 1. Note that the R function lm can be used to estimate the parameters of the multivariate normal linear model.

Situation 2. Models with only group‐specific parameters. When all of the parameters (including ω) in the density of the data are group specific, the covariance matrix in equation 19 will be block diagonal with one block for each group. Consequently, it is straightforward to use R packages tailored to the statistical model of interest to obtain estimates urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0095 for g = 1, …, G, and, for each group, the corresponding covariance matrix urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0096. Note that this does not apply to the example given for situation 1 (equations 30 and 31) because σ2 was not group specific. It would have applied if, in addition to the intercept and regression coefficient, σ2 had been group specific too.

Situation 3. All other situations. In all other situations R packages can be used to obtain the estimates urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0097, but the equations rendering urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0098 based on urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0099 and Yg, Xg for g = 1, …, G will either have to be programmed in R or obtained through the use of R packages like numDeriv, which provides numerical approximations of second‐order derivatives based on the log density of the data of the statistical model of interest. Later in this paper a logistic regression will be used in Example 3 to illustrate this situation. For users with limited experience of statistical modelling and R, the third situation will be difficult to handle: the likelihood function of the statistical model at hand has to be formulated and numDeriv has to be used to estimate the covariance matrix for each group (and not the overall covariance matrix) using the estimates of the model parameters obtained from all groups together. Currently, one annotated example (a logistic regression model) is provided on the Bain website. Users requiring support in the context of other models can send an email to the first author of this paper with the request to add additional examples to the website.
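The idea behind situation 3 can be sketched as follows. The sketch uses Python rather than R, with a hand‐written central finite‐difference approximation in place of the hessian function of numDeriv; the data, the estimates, and the function names are hypothetical, and the model is assumed to be a logistic regression with a group‐specific intercept and one shared slope. The point it illustrates is that the second‐order derivatives are taken from the log density of one group only, while being evaluated at estimates obtained from all groups.

```python
import math


def log_lik_group(params, y, x):
    """Log-likelihood of one group: logit P(y = 1) = theta_g + beta * x."""
    theta, beta = params
    total = 0.0
    for y_i, x_i in zip(y, x):
        linear = theta + beta * x_i
        total += y_i * linear - math.log(1.0 + math.exp(linear))
    return total


def numerical_hessian(func, point, step=1e-4):
    """Central finite-difference approximation of the second-order derivatives."""
    n = len(point)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(d_i, d_j):
                p = list(point)
                p[i] += d_i * step
                p[j] += d_j * step
                return func(p)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * step ** 2)
    return hess


def covariance_from_hessian(hess):
    """Invert minus the 2 x 2 Hessian to obtain a covariance matrix."""
    a, b, c, d = -hess[0][0], -hess[0][1], -hess[1][0], -hess[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data for one group; the estimates stand in for the maximum
# likelihood estimates obtained from the data of all groups together.
y_group = [1, 0, 1, 1, 0, 1]
x_group = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
estimates = [0.50, -0.01]

hessian = numerical_hessian(lambda p: log_lik_group(p, y_group, x_group), estimates)
cov_group = covariance_from_hessian(hessian)
print(cov_group)
```

Repeating the last three lines for every group yields one covariance matrix per group, which is the kind of input described above.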

5 Choosing bg

In the case of one population, based on Gu et al. (2018), the R package Bain uses b = J*/N. The remaining question is how to choose bg for g = 1, …, G in the case of multiple populations. If the size of the sample obtained from each population is the same, it should not matter whether BFku or MBFku is used, that is, the equality BFku = MBFku should hold. Computation of the covariance matrix displayed in equation 23 in the situation where N1 = … = NG can be done using b1 = … = bG = b. Applying this to the penultimate diagonal entry of the covariance matrix renders
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0100(33)
with
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0101
Therefore a reasonable choice is
bg = 1/G × J*/Ng (34)

This choice is in keeping with the concept of a minimal fraction from each population to construct an implicit default prior.
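As a small numerical illustration of this choice, the fractions can be computed directly from J*, the number of groups, and the group sizes. The sketch below is in Python and the function name group_fractions is chosen here; the group sizes and J* = 4 are those of the five‐group ANCOVA example discussed later in the paper.

```python
def group_fractions(j_star, group_sizes):
    """Return b_g = (1/G) * (J*/N_g) for every group."""
    n_groups = len(group_sizes)
    return [j_star / (n_groups * n_g) for n_g in group_sizes]


# Five-group ANCOVA example: J* = 4 and group sizes 60, 55, 64, 43, 18.
b_ancova = group_fractions(4, [60, 55, 64, 43, 18])
print([round(b_g, 4) for b_g in b_ancova])

# With equal group sizes every b_g equals the one-population fraction J*/N.
b_equal = group_fractions(1, [50, 50])
print(b_equal, 1 / 100)
```

The second call shows the property that motivated the choice: for equal group sizes each bg coincides with b = J*/N, so that the one‐population and multiple‐population fractions agree.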

6 Consistency of the multiple population approximate adjusted fractional Bayes factor

De Santis and Spezzaferri (2001) show that their generalized fractional Bayes factor is consistent if N → ∞, that is, if the Ng increase at the same rate (cf. De Santis & Spezzaferri, 2001; Theorem 4.1). It will be illustrated below, via a continuation of Example 1, that the same holds for the MBF, that is, if Ng → ∞, for all g at the same rate, then MBFku → ∞ (or 0) when Hk (or Hu) is true. A more general discussion of the consistency of the MBF is given in the Appendix. The example below also shows that the MBF avoids the inconsistent behaviour shown by the BF when fixing the sample size of one population while letting the sample size of the other population go to infinity.

Example 1, continued. Earlier in this example it was shown that the BF exhibits inconsistent behaviour if the sample size of one group is fixed while the sample size of the other group goes to infinity. When the MBF is used, the posterior distribution is unchanged and identical to equation 10. However, the prior distribution changes from equation 11 to
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0103(35)
because b1 = 1/2 × 1/N1 and b2 = 1/2 × 1/N2. As can be seen, the prior distribution in equation 35 is independent of N1 and N2. This can be interpreted as the amount of prior information being independent of the sample size, which is a desirable property. Also note that the prior mean does not depend on the information in the data but is chosen to be in agreement with equation 5.
The MBF of H1 against Hu is given by
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0106(36)
as N2 → ∞, where gu(·) and hu(·) are obtained based on equations 10 and 35, respectively. As can be seen, if urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0107 and n → ∞, then f1 → ∞ and c1 is constant. This implies that MBF1u → ∞. If urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0108 and n → ∞, then f1 → 0 and c1 is constant. This implies that MBF1u → 0. Stated otherwise, for n → ∞, MBF1u is consistent. Furthermore, if N2 → ∞ while N1 is fixed, then if urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0109, in the limit (see the last term of equation 36) urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0110, which is larger than 1, that is, correctly expresses support for H1. Although for N2 → ∞ MBF1u does not approach ∞, this is reasonable behaviour and the inconsistent behaviour of the BF is avoided. If urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0111, the limiting behaviour of MBF1u is shown by the last term of equation 36. If, for example, N1 = 25 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0112, MBF1u = 8.8, that is, H1 is supported. This too is reasonable, because both the sample size of group 1 and the effect size are small and therefore the effect is not convincingly different from zero. If both are larger, for example, N1 = 49 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0113, MBF1u = 0.03, that is, Hu is supported. As is illustrated, the degree of support for or against H1 is based on the sample size and the effect size. This too is reasonable behaviour and again the inconsistent behaviour of the BF is avoided.

As can be seen in the last two columns in the middle and right‐hand panels of Tables 1 and 2, if both sample sizes are proportionally increasing, both BF1u and MBF1u show consistent behaviour in the sense that (M)BF1u → ∞ if θ1 = θ2 and (M)BF1u → 0 if urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0114. Note that, as required by our choice of bg in equation 34, for equal sample sizes in both groups both Bayes factors are equal (see Table 1).

Furthermore, as can be seen in the last two columns in the middle and right‐hand panels of Table 3, if one sample size is fixed and the other is increasing, in contrast to BF1u, MBF1u does not show inconsistent behaviour, in the sense that MBF1u is monotonically increasing if urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0115 and monotonically decreasing if urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0116. As can be seen, in this situation, when only N2 is increased, MBF1u converges to the bound 6.325 (0.546) when θ1 = θ2 (urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0117), based on the limit in equation 36.

As can be seen by comparing the last number on the last row in Table 1 (N = 200, N1 = N2 = 100) with the last number on the penultimate row in Table 3 (N = 210, N1 = 10, N2 = 200), it makes a huge difference in outcome whether or not the sample sizes are balanced. Evidence in favour of the true hypothesis is larger with balanced than with unbalanced sample sizes.
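The contrast between BF1u and MBF1u in this example can be explored numerically. The Python sketch below assumes normal posterior and prior distributions for θ1 − θ2, a residual variance of 1, and an observed mean difference of either 0 or 0.7. These two settings are assumptions made here; they were chosen because they give values close to the entries of Tables 2 and 3, and the function names are hypothetical.

```python
import math


def normal_density_at_zero(mean, variance):
    """Density of a normal distribution with the given mean and variance at 0."""
    return math.exp(-mean ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def bf_and_mbf(n1, n2, mean_diff, sigma2=1.0, j_star=1):
    """BF1u and MBF1u for H1: theta1 = theta2 against Hu (two independent means)."""
    # Posterior variance of theta1 - theta2.
    post_var = sigma2 / n1 + sigma2 / n2
    fit = normal_density_at_zero(mean_diff, post_var)
    # One-population approach: the same fraction b = J*/N for both groups.
    b = j_star / (n1 + n2)
    prior_var_bf = sigma2 / (b * n1) + sigma2 / (b * n2)
    # Multiple-population approach: b_g = J*/(G * N_g) with G = 2.
    b1, b2 = j_star / (2 * n1), j_star / (2 * n2)
    prior_var_mbf = sigma2 / (b1 * n1) + sigma2 / (b2 * n2)
    bf = fit / normal_density_at_zero(0.0, prior_var_bf)
    mbf = fit / normal_density_at_zero(0.0, prior_var_mbf)
    return bf, mbf


# Only the size of group 2 increases (compare Table 3).
for n2 in (10, 25, 50, 100, 200, 1000):
    bf_h1, mbf_h1 = bf_and_mbf(10, n2, 0.0)
    bf_hu, mbf_hu = bf_and_mbf(10, n2, 0.7)
    print(n2, round(bf_h1, 2), round(bf_hu, 2), round(mbf_h1, 2), round(mbf_hu, 2))
```

Under these assumptions the one‐population value for a mean difference of 0.7 first decreases and then increases as N2 grows, whereas the multiple‐population value decreases monotonically towards a bound, which is the pattern discussed above.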

7 Example 2: Analysis of covariance

Consider the ANCOVA model
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0118(37)
where D1i is equal to 1 if person i is a member of group 1 and 0 otherwise, the other dummy variables are defined analogously, and both covariates are centred such that θ1, …, θ5 denote the covariate‐adjusted means. Equation 37 can be split into five parts, one for each group:
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0119(38)
for g = 1, …, G and Ng persons in each of the groups. Note that each group‐specific parameter vector θg consists of the single adjusted mean θg (g = 1, …, 5), η = [β1, β2], ω = σ2, and Yg = yg, Xg = [Dgg, x1g, x2g], where the second subscript g denotes that the data correspond to the members of group g.
Applying equation 17, the density of the data of this model can be factored as
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0120(39)
Maximum likelihood estimates of the parameters θ1, …, θ5, β1, β2, σ2 of the ANCOVA model from equation 37 can, for example, be obtained using the lm function in R. Subsequently, using a well‐known result from the regression literature, the realization of equation 21 for g = 1, …, 5 is obtained as
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0121(40)

where Xg = [Dgg, x1g, x2g], in which the second subscript indicates to which group the data elements belong. Equation 40 is obtained using the expected Fisher information. Since the expected values of the second‐order cross‐derivatives with respect to θ1, …, θ5, β1, β2 on the one hand and σ2 on the other hand are zero, urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0122 constructed using equation 19 based on the expected Fisher information for only these parameters is identical to the corresponding part in urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0123 (cf. equations 26 and 27).

Inverting urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0124, for g = 1, …, 5, and multiplying by −1 renders the Fisher information matrices for the groups. Using equation 19, these can be assembled into the overall Fisher information matrix which, after inverting and multiplying by −1, renders urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0125. Modifying equation 40 according to equation 22 renders
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0126(41)

that is, the elements of the expected Fisher information matrix for each group g. Reassembling these elements using equation 23 renders urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0127, that is, the covariance matrix of the prior distribution.

This example concludes with an analysis of data from Stevens (1996, appendix A) concerning the effect of the first year of the Sesame Street series on the knowledge of 240 children in the age range 34–69 months. We will use the following variables: y, knowledge of numbers after watching Sesame Street; x1, the knowledge of numbers before watching Sesame Street; x2, a test measuring the mental age of children; and D1, …, D5, dummy variables representing the children's background (1 = disadvantaged inner city, 2 = advantaged suburban, 3 = advantaged rural, 4 = disadvantaged rural, 5 = disadvantaged Spanish‐speaking).

The informative hypotheses of interest are
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0128(42)
and
urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0129(43)

Hypothesis 1 states that the knowledge of numbers after watching Sesame Street does not depend on background correcting for initial knowledge and mental age. Hypothesis 2 states that the advantaged children have a greater knowledge after watching Sesame Street than the disadvantaged children.

Table 4 presents the input the R package Bain needs in order to evaluate H1 and H2, that is, estimates of the adjusted means, regression coefficients, and residual variance, and, for each group, the covariance matrix for the group‐specific adjusted mean and both regression coefficients, computed using urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0130 (cf. equation 40), and the sample size. Table 5 first of all presents the posterior covariance matrix of the structural parameters computed from the group‐specific covariance matrices using equation 19. Then the vector b computed using bg = 1/5 × 4/Ng for g = 1, …, G is presented. Note that J* equals 4 because the number of independent constraints in

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0146(44)

which specifies the equality constraints in H1 and

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0147(45)

which specifies the inequality constraints in H2, is equal to 4, that is, the number of independent rows in the combination of S1 and R2 is equal to 4. Next, the prior covariance matrix of the structural parameters computed using the group‐specific covariance matrices and b and equations 22 and 23 is displayed. Finally, MBF1u, MBF2u, and MBF12 are presented. As can be seen, the support in the data is 2.21 times greater for H1 than for H2, that is, it is slightly more likely that the gain in knowledge of numbers is equal for advantaged and disadvantaged children than that the gain is greater for the advantaged children. More data would be needed to obtain a more decisive conclusion.

Table 4. A five‐group ANCOVA model: Input for the R package Bain

Estimates of the adjusted means, regression coefficients, and residual variance
θ1      θ2      θ3      θ4      θ5      β1     β2     σ2
29.16   34.38   28.90   27.12   30.89   0.70   0.05   84.06

Sample sizes
N1   N2   N3   N4   N5
60   55   64   43   18

Covariance matrix of the group‐specific adjusted mean and both regression coefficients, per group
Group 1                     Group 2
 1.62   −0.05    0.05        3.07   −0.01   −0.10
−0.05    0.02   −0.01       −0.01    0.02   −0.01
 0.05   −0.01    0.01       −0.10   −0.01    0.01

Group 3                     Group 4
 2.32    0.07    0.09        2.21    0.04    0.04
 0.07    0.03   −0.01        0.04    0.03   −0.01
 0.09   −0.01    0.01        0.04   −0.01    0.01

Group 5
 5.47    0.20   −0.20
 0.20    0.09   −0.05
−0.20   −0.05    0.05

Table 5. A five‐group ANCOVA model: Output from the R package Bain

Posterior covariance matrix of the structural parameters
 1.45   −0.09    0.03    0.02   −0.04   −0.01    0.01
−0.09    1.96   −0.24   −0.12    0.14    0.01   −0.03
 0.03   −0.24    1.46    0.07   −0.07    0.00    0.02
 0.02   −0.12    0.07    1.99   −0.03    0.00    0.01
−0.04    0.14   −0.07   −0.03    4.72    0.01   −0.01
−0.01    0.01    0.00    0.00    0.01    0.01   −0.00
 0.01   −0.03    0.02    0.01   −0.01   −0.00    0.00

b
0.013   0.015   0.012   0.019   0.044

Prior covariance matrix of the structural parameters
108.14    −5.52     2.02     0.89    −2.66   −0.85    0.67
 −5.52   129.41   −13.40    −6.77     7.90    0.46   −1.82
  2.02   −13.40   113.04     4.09    −3.86    0.17    0.85
  0.89    −6.77     4.09   107.20    −1.89    0.14    0.41
 −2.66     7.90    −3.86    −1.89   108.07    0.51   −0.72
 −0.85     0.46     0.17     0.14     0.51    0.31   −0.14
  0.67    −1.82     0.85     0.41    −0.72   −0.14    0.17

MBF1u   MBF2u   MBF12
2.94    1.34    2.21

Note

  • The number in italics is referred to in the text.
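The count of independent constraints can be checked by computing the rank of the stacked constraint matrices. The Python sketch below is an illustration only: the matrix S1 encodes θ1 = … = θ5 through four adjacent differences, and the matrix R2 is a hypothetical encoding in which each advantaged group (2 and 3) is compared with each disadvantaged group (1, 4, and 5); the exact matrices used in the paper may be arranged differently. Columns correspond to θ1, …, θ5, β1, β2.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination on fractions."""
    work = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(work[0])):
        pivot = next((r for r in range(rank, len(work)) if work[r][col] != 0), None)
        if pivot is None:
            continue
        work[rank], work[pivot] = work[pivot], work[rank]
        for r in range(len(work)):
            if r != rank and work[r][col] != 0:
                factor = work[r][col] / work[rank][col]
                work[r] = [a - factor * b for a, b in zip(work[r], work[rank])]
        rank += 1
    return rank


# Equality constraints of H1 (assumed layout): adjacent differences are zero.
S1 = [
    [1, -1, 0, 0, 0, 0, 0],
    [0, 1, -1, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0],
    [0, 0, 0, 1, -1, 0, 0],
]
# Inequality constraints of H2 (hypothetical layout): advantaged > disadvantaged.
R2 = [
    [-1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, -1, 0, 0, 0],
    [0, 1, 0, 0, -1, 0, 0],
    [-1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0],
    [0, 0, 1, 0, -1, 0, 0],
]

j_star = matrix_rank(S1 + R2)
print(j_star)
```

Because every row is a contrast among the five adjusted means, the stacked matrix cannot have more than four independent rows under this encoding, which is in line with the value of J* reported in the text.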

8 Example 3: Logistic regression

Example 2 illustrated how urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0148 for g = 1, …, G can be computed if the statistical model at hand is a member of the (multivariate) normal linear model (previously labelled situation 1). In this section it will be illustrated how urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0149 for g = 1, …, G can be obtained for models outside the (multivariate) normal linear modelling framework (previously labelled situation 3) based on the observed Fisher information using the R package numDeriv (https://cran.r-project.org/web/packages/numDeriv/).

Again using the data from Stevens (1996; appendix A), a logistic regression model is specified in which y, whether a child is encouraged to watch Sesame Street (0 = no, 1 = yes), is predicted from gender (D1i equals 1 for a girl and zero otherwise, D2i equals 1 for a boy and zero otherwise), and centred age x:

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0150(46)

The hypothesis of interest is

urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0151(47)

that is, girls are more encouraged than boys and older children are more encouraged than younger children.

The top part of Table 6 presents the input the R package Bain needs in order to evaluate H1. Note that urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0152 and urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0153 are computed using the observed Fisher information matrix rendered by the R package numDeriv using the data for group 1 (D1g,xg) and group 2 (D2g,xg), respectively. In the bottom part of Table 6 the output resulting from Bain is presented. It can be observed that H1 is not supported by the data, with MBF1u = 0.53. Note that, for the example at hand, urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0154 computed using the observed Fisher information matrix is virtually identical to urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0155 computed using the expected Fisher information matrix with the R function glm. Note, however, that this does not always have to be the case. Researchers preferring the expected Fisher information matrix (but see Efron & Hinkley, 1978) will have to replace the computations with numDeriv by formulae for the expected Fisher information for logistic regression models (see, for example, McCullagh & Nelder, 1989, pp. 115–117).

Table 6. A two‐group logistic regression

Input for the R package Bain

Estimates (group‐specific coefficients for groups 1 and 2, regression coefficient of age)
0.50   0.60   −0.01

Sample sizes
N1    N2
125   115

Covariance matrix per group (group‐specific coefficient, regression coefficient of age)
Group 1             Group 2
0.03    0.00         0.04   −0.00
0.00    0.00        −0.00    0.00

Output from the R package Bain

Posterior covariance matrix of the structural parameters
 0.03   −0.00    0.00
−0.00    0.04   −0.00
 0.00   −0.00    0.00

b
0.008   0.009

Prior covariance matrix of the structural parameters
 4.28   −0.02    0.03
−0.02    4.39   −0.04
 0.03   −0.04    0.06

MBF1u
0.53

Note

  • The number in italics is referred to in the text. Also urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0163 does not change when computed using the expected Fisher information.

9 Discussion

In this paper the approximate adjusted fractional Bayes factor BF, which is suited for the evaluation of informative hypotheses if data are sampled from one population, has been generalized to the multiple population approximate adjusted fractional Bayes factor MBF, which is suited for the evaluation of informative hypotheses if data are sampled from one or multiple populations. Both BF and MBF are implemented in the R package Bain.

The result is a versatile and generally applicable approach for the evaluation of informative hypotheses by means of the Bayes factor in a wide range of statistical models. However, as mentioned earlier in the paper, there are a number of topics that deserve further research. The first topic is which sample sizes are required to obtain an accurate normal approximation of the posterior distribution for a wide range of statistical models. The second topic concerns the choice of b, that is, what are the properties of our proposal and what are potential alternatives (the interested reader is referred to Gu et al. (2016) for one study on this topic). The third topic is further development of Bain such that it is easier for users to deal with what was previously called situation 3, that is, models for which numDeriv or other approaches have to be used to obtain the covariance matrix of the parameters of interest for each of the groups in the data set. The fourth topic is more philosophical in nature. It concerns the question whether there is an intrinsic Bayes factor corresponding to our MBF. The fifth topic concerns a modification of the approach presented in this paper such that it can be applied in variable selection problems (see, for example, O'Hara & Sillanpaa, 2009). The spike‐and‐slab prior is known to perform well in variable selection problems with sparse data, for example, regression models with a relatively small ratio of the number of persons to the number of predictors, and in which only a few predictors are expected to have a substantial regression coefficient. Spike‐and‐slab prior‐based variable selection is currently an exploratory approach. In the future we will consider a more confirmatory approach based on an efficient evaluation of sets of informative hypotheses in which it is considered not only if the regression coefficient is substantial, but also its direction, and (partial) orderings of regression coefficients.

Acknowledgments

The first author is supported by the Consortium on Individual Development (CID) which is funded through the Gravitation program of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO grant no. 024.001.003). The third author is supported by a NWO Vidi Grant (number 452‐17‐006).

    Appendix 1

    1 Further discussion of the consistency of the MBF

    We consider two different cases. First, we consider the case where hypothesis Hk only contains inequality constraints, and no equality constraints. Second, we consider the case where Hk contains (only) equality constraints. We will discuss both N → ∞ and one or more but not all of the Ng → ∞.

    As was shown in this paper,
    urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0164(48)

    If Hk only contains inequality constraints, that is, Hk : Rkθ > rk, MBF reduces to

    urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0165(49)
    where the Pr(·) denote the probabilities that the posterior and prior distribution, respectively, support Hk. To give one simple example, if the prior distribution of θ1, θ2, θ3 has θB = [0, 0, 0] and an identity covariance matrix urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0166, then urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0167 because there are six possible orderings of three parameters that each have an equal probability.

    Again let Ng = ag n, where the ag represent the relative sizes of the samples from the G populations, and let θ* denote the true value of θ. If the data support Hk, that is, θ* ∈ Hk, then, if n → ∞, urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0168, and the posterior distribution in the numerator of equation 49 is increasingly concentrated around urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0169 and consequently fk → 1. Analogously, if the data do not support Hk, that is, urn:x-wiley:00071102:media:bmsp12145:bmsp12145-math-0170, fk → 0. This follows from asymptotic theory; see, for example, Gelman et al. (2013, Chapter 4). The prior distribution in the denominator of equation 49 is independent of the Ng and thus independent of n. As can be seen from the combination of equations 22 and 34, for each group the second‐order derivatives (which can for the vast majority of statistical models be written as the sum of Ng contributions) are weighted with bg = J*/G × 1/Ng, that is, asymptotically each element of equation 22 is independent of Ng. Consequently, asymptotically ck is a constant that is independent of n. This is exemplified by equation 35.

    We now have all the ingredients in place to show that MBFkc, where Hc is the complement of Hk, is consistent. Note that, due to the complementary nature of Hc, fc = 1 – fk and cc = 1 – ck and thus
    MBFkc = fk/ck × cc/fc = fk/ck × (1 – ck)/(1 – fk)   (50)

    If θ* ∈ Hk and n → ∞, then MBFkc → 1/ck × cc/0 → ∞, and if θ* ∉ Hk and n → ∞, then MBFkc → 0/ck × cc/1 → 0, which implies consistency.
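    The limiting behaviour of the Bayes factor against the complement can be illustrated numerically. The sketch below is a toy setting and not the computation performed by Bain. It assumes Hk : θ1 > θ2 for two groups of equal size n with unit residual variance, treats the observed difference in means as equal to the true difference (so sampling variability is ignored), approximates the posterior of the difference by a normal distribution with variance 2/n, and takes ck = 0.5 (a prior centred on the boundary of Hk). All function names and numerical values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mbf_kc(f_k, c_k):
    """Bayes factor of H_k against its complement: f_k/c_k * (1 - c_k)/(1 - f_k)."""
    return (f_k / c_k) * ((1.0 - c_k) / (1.0 - f_k))


def fit(true_difference, n):
    """Pr(theta1 > theta2 | data) when the difference has posterior N(d, 2/n)."""
    return normal_cdf(true_difference * math.sqrt(n / 2.0))


C_K = 0.5  # assumed prior probability of theta1 > theta2

for n in (10, 40, 160):
    supported = mbf_kc(fit(0.5, n), C_K)    # theta* inside H_k
    violated = mbf_kc(fit(-0.5, n), C_K)    # theta* outside H_k
    print(f"n={n:4d}  theta* in Hk: {supported:12.1f}   theta* not in Hk: {violated:.6f}")
```

    In this toy setting the Bayes factor grows without bound when the true difference is positive and shrinks towards zero when it is negative, which is the pattern the consistency argument describes.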

    Theorem 4.1 from De Santis and Spezzaferri (2001) for the generalized fractional Bayes factor and our exposition in the context of Example 1 for the MBF provide evidence for consistency if Hk : Skθ = sk. Further evidence is obtained by realizing that each equality constraint (e.g. θ = 0) can be written as an about‐equality constraint |θ| < z for z → 0. If each equality constraint is rewritten in this manner, the exposition given at the beginning of this section applies to Hk : Skθ = sk and also to Hk : Skθ = sk, Rkθ > rk.
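    The about‐equality argument can be made concrete with a small numerical sketch. It assumes a single parameter θ with a normal posterior and a normal prior centred at zero; the means and standard deviations below are hypothetical values chosen for illustration only. As z shrinks, the ratio of the posterior to the prior probability of |θ| < z approaches the ratio of the two densities at θ = 0, which is the quantity used for an exact equality constraint.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def about_equality_ratio(z, post_mean, post_sd, prior_sd):
    """Pr(|theta| < z | posterior) / Pr(|theta| < z | prior), prior centred at 0."""
    f_k = normal_cdf(z, post_mean, post_sd) - normal_cdf(-z, post_mean, post_sd)
    c_k = normal_cdf(z, 0.0, prior_sd) - normal_cdf(-z, 0.0, prior_sd)
    return f_k / c_k


# Hypothetical posterior and prior settings.
POST_MEAN, POST_SD, PRIOR_SD = 0.1, 0.2, 1.0

density_ratio = normal_pdf(0.0, POST_MEAN, POST_SD) / normal_pdf(0.0, 0.0, PRIOR_SD)
for z in (0.5, 0.1, 0.01, 0.001):
    ratio = about_equality_ratio(z, POST_MEAN, POST_SD, PRIOR_SD)
    print(f"z={z:6.3f}  probability ratio={ratio:.4f}  density ratio at 0={density_ratio:.4f}")
```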

    If Ng → ∞ for some but not all of the G groups, an analogous line of reasoning can be used to show that MBF shows reasonable behaviour. If the data support Hk, that is, θ* ∈ Hk, and some of the group sizes increase, then the posterior distribution in the numerator of equation 12 is increasingly concentrated around the parameters corresponding to the groups with increasing group sizes (some of the elements of θ*) and η*. Consequently, fk will become larger but will not attain its maximum value 1.0. Analogously, if θ* ∉ Hk, fk will become smaller, but will not attain its minimum value 0.0. Note that ck is a constant irrespective of whether n → ∞ or only some of the group sizes go to infinity. These ingredients can be used to show that the behaviour of the MBF is reasonable. Looking at equation 14, it can be seen that if θ* ∈ Hk, MBF will increase (to a boundary value, not to infinity) if some of the group sizes go to infinity; and if θ* ∉ Hk, MBF will decrease (to a boundary value, not to zero). A proof and illustration in the context of a simple model can be found in Example 1.
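    The bounded behaviour can be sketched in the same kind of toy setting; this is an illustration under stated assumptions, not the computation in Example 1 or in Bain. It assumes Hk : θ1 > θ2 with unit residual variance, lets only N1 grow while N2 stays fixed, treats the observed means as equal to the true means, approximates the posterior of the difference by a normal distribution with variance 1/N1 + 1/N2, and takes ck = 0.5. The Bayes factor is computed against the complement of Hk, and all names and values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fit_unequal(true_difference, n1, n2):
    """Pr(theta1 > theta2 | data) when the difference has posterior N(d, 1/n1 + 1/n2)."""
    posterior_sd = math.sqrt(1.0 / n1 + 1.0 / n2)
    return normal_cdf(true_difference / posterior_sd)


def bayes_factor_vs_complement(f_k, c_k):
    """f_k/c_k * (1 - c_k)/(1 - f_k)."""
    return (f_k / c_k) * ((1.0 - c_k) / (1.0 - f_k))


C_K = 0.5          # assumed prior probability of theta1 > theta2
TRUE_DIFF = 0.3    # hypothetical true difference, so theta* lies in H_k
N2_FIXED = 20      # the size of group 2 does not grow

# Limit of f_k as N1 -> infinity: the variance 1/N2 of group 2 remains.
f_limit = normal_cdf(TRUE_DIFF * math.sqrt(N2_FIXED))
bf_limit = bayes_factor_vs_complement(f_limit, C_K)

for n1 in (20, 200, 2000, 20000):
    f_k = fit_unequal(TRUE_DIFF, n1, N2_FIXED)
    bf = bayes_factor_vs_complement(f_k, C_K)
    print(f"N1={n1:6d}  fk={f_k:.4f}  Bayes factor={bf:.3f}  (limit {bf_limit:.3f})")
```

    Because the uncertainty contributed by the fixed group never vanishes, fk stays below 1 and the Bayes factor levels off at a finite value instead of diverging.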
