Model-free Genetic Meta-analysis

Some years ago some colleagues and I wrote a paper about the Bayesian meta-analysis of genetic association studies (Minelli et al Statistics in Medicine 2005;24:3845-3861) and periodically since then people have contacted me asking for the WinBUGS code that we used in that paper and those requests have prompted me to reformulate the code in terms of the functions for calling WinBUGS for Stata as described in the book ‘Bayesian Analysis with Stata’.

Here is the background to the paper although of course you can get more details from the original.

Many genetic studies relate a binary outcome (disease yes/no) to a specific genetic variant. Since our autosomal chromosomes occur in pairs, when a particular genetic variant can take one of two forms, say little a and big A, then that individual will we either aa, aA or AA (there is no way to distinguish aA from Aa). In effect then we have a 2×3 table of numbers of people.

aa aA AA
Disease = Yes
Disease = No

We model these data in terms of the odds ratio of disease, but there are two odds ratios one for ORaA = aA vs aa and one for ORAA = AA vs aa. Very often researchers assume that ORAA = ORaA*ORaA and for many diseases and variants this works well because it is like assuming that one’s risk increases equally with each extra copy of the A variant.

However, the assumption that ORAA = ORaA*ORaA does not always apply and in the paper we suggested that we could use a meta-analysis to estimate both the form of the relationship between ORs and their sizes. We expressed the relationship in terms of a parameter

λ = logORaA/logORAA

What is more if we have an estimate of λ for a particular disease and variant, even if that estimate is subject to uncertainty, we can then use the information about ORaA to improve our estimate of the ORAA. Since a known or assumed relationship between ORaA and ORAA is called the mode of inheritance or sometimes the genetic model, we referred to our analysis as ‘genetic model free’.

We fitted this model using WinBUGS and investigated the impact of using different prior distributions using published and simulated data. Here I will show how we can automate such an investigation.

Let us start with some real data. Kato et al (Journal of Hypertension 1999;17:757-763) reported a meta-analysis of studies of a variant in the angiotensinogen gene and hypertension. Here is a summary of their data.

Study  Cases   Controls  
MM TM TT MM TM TT
Hata, 1994 2 20 83 3 34 44
Iwai, 1994 3 17 62 4 30 49
Nishiuma, 1995 3 30 31 17 84 48
Morise, 1995 5 23 52 5 32 63
Sato, 1997 8 39 133 12 62 119
Kamitani, 1994 6 34 68 9 48 47
Kato, 1999 20 214 483 18 134 363

 

So here the two variants are labelled as M and T and in the first study 83 people with hypertension had the TT genotype. The key question is whether the T variant is more or less common in cases than in controls, with the genetic model being of secondary interest.

In the first study the researchers recruited 81 controls and found that they divided in the ratio 3:34:44. This we can model these data by a multinomial distribution with n=81 and probabilities p01,p02,p03. We choose to parameterize β1=1, β2=p02/p01 and β3= p03/p01, then

p01=1/(1+ β2+ β3),    p02= β2/(1+ β2+ β3)    and     p03= β3/(1+ β2+ β3).

Similarly the 105 cases were in the ratio 2:20:83 and if these categories have probabilities q01,q02,q03, we can parameterize them in terms of α1=1, α2=q02/q01 and α3= q03/q01. But α2/β2 is ORTM and α3/β3 is ORTT so we define δ2=logORTM and δ3=logORTT and the formulae for the probabilities in cases can be written,

q01=1/(1+ β2exp(δ2)+ β3 exp(δ3))

q02= β2exp(δ2)/(1+ β2exp(δ2)+ β3 exp(δ3))

q03= β3 exp(δ3)/(1+ β2exp(δ2)+ β3 exp(δ3))

Now the problem is parameterized in terms of β2, β3, δ2, δ3 and the genetic model as described by λ = δ2/δ3.

In a meta-analysis we would expect that β2, β3 could well be different for each study because genetic variants are sometimes more common in one population than another. Similarly δ3 might vary with population because frequently genes interact with aspects of the environment that differ between studies, however we might be willing to assume that the mode of inheritance, λ, is the same for all studies.

We treated the β’s as fixed effects, 2 for each study. δ3 as a random effect N(θ,τ2) and λ as a constant. A basic WinBUGS program for fitting this model is,

model {

       for( i in 1:7 ) {

              p[i,1] <- 1/(1+b[i,1]+b[i,2])

              p[i,2] <- b[i,1]/(1+b[i,1]+b[i,2])

              p[i,3] <- b[i,2]/(1+b[i,1]+b[i,2])

              q[i,1] <- 1/(1+b[i,1]*exp(lambda*d[i])+b[i,2]*exp(d[i]))

              q[i,2] <- b[i,1]*exp(lambda*d[i])/

                   (1+b[i,1]*exp(lambda*d[i])+b[i,2]*exp(d[i]))

              q[i,3] <- b[i,2]*exp(d[i])/

                   (1+b[i,1]*exp(lambda*d[i])+b[i,2]*exp(d[i]))

              d[i] ~ dnorm(theta,tau2)

              ncont[i,1:3] ~ dmulti(p[i,1:3],tcont[i])

              ncase[i,1:3] ~ dmulti(q[i,1:3],tcase[i])

              b[i,1] ~ dnorm(0,0.0001)

              b[i,2] ~ dnorm(0,0.0001)

       }

       lambda ~ dbeta(1,1)

       theta ~ dnorm(0,0.0001)

       tau2 ~ dgamma(0.001,0.001)

}

WinBUGS cannot fit this model and crashes but OpenBUGS runs it without a problem. Working within Stata we read the data from a file call kato.dta which has the same structure as the table shown above and variable names case1, case2 etc. Here is the entire code for fitting the model and viewing the results.

use kato.dta, clear

gen tcase = case1+case2+case3

gen tcont = control3+control2+control1

wbslist (struct case1 case2 case3 , name(ncase) format(%4.0f) ) ///

   (struct control1 control2 control3 , name(ncont) format(%4.0f) ) ///

   (vect tcase tcont) using data.txt, replace

matrix b=J(7,2,0.5)

wbslist (lambda=0.5,theta=0,tau2=1,d=c(7{0})) (matrix b , format(%4.1f)) ///

   using init.txt , replace

tomodel modelfree.do model.txt

/*

model {

       for( i in 1:7 ) {

              p[i,1] <- 1/(1+b[i,1]+b[i,2])

              p[i,2] <- b[i,1]/(1+b[i,1]+b[i,2])

              p[i,3] <- b[i,2]/(1+b[i,1]+b[i,2])

              q[i,1] <- 1/(1+b[i,1]*exp(lambda*d[i])+b[i,2]*exp(d[i]))

              q[i,2] <- b[i,1]*exp(lambda*d[i])/

                  (1+b[i,1]*exp(lambda*d[i])+b[i,2]*exp(d[i]))

              q[i,3] <- b[i,2]*exp(d[i])/

                   (1+b[i,1]*exp(lambda*d[i])+b[i,2]*exp(d[i]))

              d[i] ~ dnorm(theta,tau2)

              ncont[i,1:3] ~ dmulti(p[i,1:3],tcont[i])

              ncase[i,1:3] ~ dmulti(q[i,1:3],tcase[i])

              b[i,1] ~ dnorm(0,0.0001)

              b[i,2] ~ dnorm(0,0.0001)

       }

       lambda ~ dbeta(1,1)

       theta ~ dnorm(0,0.0001)

       tau2 ~ dgamma(0.001,0.001)

}

*/

wbsscript using script.txt, replace ///

       model(model.txt) data(data.txt) init(init.txt) log(log.txt) ///

       burnin(1000) update(10000) set(lambda theta tau2 b) ///

       coda(mf) openbugs

wbsrun using script.txt    , openbugs

wbscoda using mf , clear openbugs

gen OR3 = exp(theta)

gen OR2 = exp(lambda*theta)

gen sd = 1/sqrt(tau2)

mcmcstats *

and here are the results

--------------------------------------------------------------------------
Parameter         n     mean       sd     sem   median         95% CrI
--------------------------------------------------------------------------
 b_1_1       10000   18.437 11.803   0.7491   14.840 (   5.743, 51.549 )
 b_1_2       10000   28.600   18.929   1.7735   22.805 (   8.778, 83.620 )
 b_2_1       10000   11.348   5.987   0.2278   9.906 (   4.217, 27.469 )
 b_2_2       10000   20.655   11.225   0.6528   18.350 (   7.726, 50.830 )
 b_3_1       10000   6.615   1.868   0.0263   6.284 (   3.882, 11.079 )
 b_3_2       10000   3.879   1.185   0.0181   3.678 (   2.176,   6.757 )
 b_4_1       10000   7.459   3.066   0.0735   6.801 (   3.438, 15.370 )
 b_4_2       10000   14.312   6.138   0.1968   12.965 (   6.585, 30.110 )
 b_5_1       10000   5.798   1.577   0.0250   5.535 (   3.404,   9.478 )
 b_5_2       10000   11.550   3.170   0.0560   11.070 (   6.803, 19.010 )
 b_6_1       10000   6.534   2.117   0.0320   6.150 (   3.508, 11.730 )
 b_6_2       10000   6.929   2.412   0.0413   6.516 (   3.471, 12.870 )
 b_7_1       10000   9.813   1.793   0.0383   9.597 (   6.883, 13.910 )
 b_7_2       10000   24.927   5.075   0.1376   24.260 ( 16.790, 36.808 )
 lambda       10000   0.179   0.130   0.0025   0.155 (   0.007,   0.474 )
 tau2         10000   7.486   8.726   0.1713   5.117 (   0.887, 29.250 )
 theta       10000   0.541   0.252   0.0035   0.527 (   0.076,   1.083 )
 OR3         10000   1.775   0.502   0.0069   1.694 (   1.079,   2.954 )
 OR2         10000   1.117   0.135   0.0024   1.075 (   1.001,   1.465 )
 sd           10000   0.488   0.235   0.0042   0.442 (   0.185,   1.062 )
--------------------------------------------------------------------------

I will show a couple of convergence checks before commenting on these results.

traceModelFree

densityModelFree

Mixing is fine and there is no obvious problem with the convergence, so let’s return to the findings.

The genetic model has an average lambda of about 0.18. On this scale 0.5 is the value that most people assume corresponding to ORTT = ORTM*ORTM. This value is not well supported by the analysis and if anything we are closer to lambda=0 when ORTM=1, which is known as the recessive model because we only see increased risk when both variants are T. Here the average effect of TT across the studies is to increase the odds of disease risk by 1.8 compared with MM individuals. The standard deviation is, however, quite large, which suggests that there is a lot of heterogeneity between studies. Perhaps, this variant interacts with something in the environment that differs across the populations or perhaps it is methodological, for instance it might relate to the inclusion criteria of the studies.

This particular analysis assumed a Beta distribution, B(1,1), for the prior on lambda this is flat between 0 and 1 but importantly excludes the possibility that lambda is outside that range. So this prior rules out the possibility the ORTM> ORTT. In the paper we looked at the impact of several different priors and I am sure that you can see other possibilities, such as assuming a fixed ORTT or for putting structure on the betas.

In the paper we then simulated lots of datasets based on the pattern in the Kato study and investigated the fit. Suppose that we want to base our simulation on the posterior mean values of the betas,

use kato.dta, clear

gen tcase = case1+case2+case3

gen tcont = control3+control2+control1

input b1 b2

18.4 28.6

11.3 20.7

6.6 3.9

7.5 14.3

5.8 11.6

6.5 6.9

9.8 24.9

local lambda = 0.18

local sd = 0.49

local theta = 0.54

we can then simulate a new data set and write that to our data file

gen p1 = 1/(1+b1+b2)

gen p2 = b1/(1+b1+b2)

gen p3 = b2/(1+b1+b2)

gen d = rnormal(`theta’,`sd’)

gen OR2 = exp(d)

gen OR1 = exp(`lambda’*d)

gen q1 = 1/(1+b1*OR1+b2*OR2)

gen q2 = b1*OR1/(1+b1*OR1+b2*OR2)

gen q3 = b2*OR2/(1+b1*OR1+b2*OR2)

gen rcase1 = rbinomial(tcase,q1)

gen rcase2 = rbinomial(tcase-rcase1,q2/(q2+q3))

gen rcase3 = tcase-rcase1-rcase2

gen rcont1 = rbinomial(tcont,p1)

gen rcont2 = rbinomial(tcont-rcont1,p2/(p2+p3))

gen rcont3 = tcont-rcont1-rcont2

wbslist (struct rcase1 rcase2 rcase3 , name(ncase) format(%4.0f) ) ///

              (struct rcont1 rcont2 rcont3 , name(ncont) format(%4.0f) ) ///

              (vect tcase tcont) using data.txt, replace

This code creates a new dataset based on the Kato study and write it to the text file data.txt. We could place this code in a loop so that we generate lots of datasets and each time we could run OpenBUGS, read the results and store whatever we choose. Then we could run the do file and go and have tea, because it will take a while if we want a lot of simulations (we analysed 1,000 random datasets for each of several different choices of prior distribution).

Share this page:

Share this page:

John

About John

Professor of Genetic Epidemiology

View more posts by John

Subscribe to John's posts

One response to “Model-free Genetic Meta-analysis”

  1. Pantelis Bagos

    Hi John,
    Congratulations for the nice and useful resource you have here. Just for completeness, a frequentist implementation in STATA, based on your original model can be found here http://www.compgen.org/tools/multivariate-genetic along with some other similar multivariate models (models using binomial likelihood etc).
    Regards
    Pantelis

Leave a Reply

Network-wide options by YD - Freelance Wordpress Developer