Drug-target MR, also known as cis-MR, focuses on a gene known to encode a druggable protein: genetic variants within or near the gene of interest are used to characterise the effect(s) of the drug target on one or more outcomes.16 Ideally, the instruments would be causal variants known to affect the drug target, chosen to maximise precision (statistical power). However, the nature and number of causal variants are typically unknown, which impedes the instrument selection process.16 There is no gold-standard strategy for instrument selection, and the optimal approach often depends on the specific investigation and the available data. Here, we outline several key model decisions.
The location of genetic variants relative to the protein-encoding gene
Cis-acting genetic variants lie within or near the protein-encoding gene of interest. The optimal distance between the variant and the gene has not been standardised, and efforts to define the ideal range for distinguishing cis from trans functions are ongoing.22 Some previous drug-target MR studies used stringent selection criteria for location, such as variants within 100 kilobase pairs (kb) upstream and downstream of a protein-encoding gene or transcription start site.23 In contrast, others have used a broader flanking region (eg, 1 megabase (Mb) or even 5 Mb) to include more variants.24 25 However, a broader flanking region may include variants located at neighbouring genes (trans-acting) that do not encode the drug target of interest, and the analysis may therefore erroneously model effects due to horizontal pleiotropy (ie, a violation of the exclusion restriction assumption).
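As a minimal sketch of the location criterion, the following Python snippet keeps variants that fall within a chosen flanking window around a gene. The `Variant` class, the `select_cis_variants` function, the variant identifiers and all coordinates are hypothetical and for illustration only; the 100 kb default mirrors the stringent criterion mentioned above, and a real analysis would read positions from GWAS summary statistics on a consistent genome build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str
    chrom: str
    pos: int  # base-pair position on the chosen genome build


def select_cis_variants(variants, chrom, gene_start, gene_end, flank_bp=100_000):
    """Return variants within flank_bp of the gene boundaries on the same chromosome."""
    lower = max(0, gene_start - flank_bp)
    upper = gene_end + flank_bp
    return [v for v in variants if v.chrom == chrom and lower <= v.pos <= upper]


# Hypothetical gene spanning 1,000,000-1,050,000 bp on chromosome 1.
variants = [
    Variant("rs_a", "1", 950_000),    # 50 kb upstream: inside a 100 kb window
    Variant("rs_b", "1", 1_020_000),  # inside the gene
    Variant("rs_c", "1", 1_600_000),  # 550 kb downstream: only inside a 1 Mb window
    Variant("rs_d", "2", 1_020_000),  # different chromosome: never selected
]

narrow = select_cis_variants(variants, "1", 1_000_000, 1_050_000)
broad = select_cis_variants(variants, "1", 1_000_000, 1_050_000, flank_bp=1_000_000)
print([v.rsid for v in narrow])
print([v.rsid for v in broad])
```

Widening `flank_bp` admits `rs_c`, which illustrates the trade-off: more candidate instruments, but a greater chance of including variants that act through neighbouring genes.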
Independent versus correlated variants within the protein-encoding gene
Genetic instruments should be conditionally independent genetic predictors of the putative causal trait. In the context of cis-MR, the single genetic variant with the smallest univariate p value in the protein-encoding region is typically used as the instrument. With a single instrument, the putative causal effect estimate can be obtained using the Wald ratio. However, this approach precludes the application of robust methods for sensitivity analyses that require multiple instruments, such as the weighted median and MR-Egger, which are commonly employed in genome-wide MR analyses. In cis-MR, multiple cis-variants may be available if the GWAS is sufficiently large. However, using variants from the same gene region could violate the assumptions underlying these sensitivity analyses owing to shared pleiotropy and non-independence.26 Therefore, genetic colocalisation analysis is often used as a complementary analysis to cis-MR to assess potential biases arising from linkage disequilibrium (LD).27 When multiple causal variants are present within the protein-encoding gene, single-variant MR may not adequately capture all the genetic effects in the region, potentially leading to a loss of statistical power. Conversely, including all genetic variants from the same gene region may result in numerical instability due to multicollinearity among the variants.28 To enhance statistical power, researchers often select multiple candidate variants that are in partial LD as instruments. Various techniques have been introduced for this purpose, including stepwise pruning, conditional analysis, principal component analysis, factor analysis and Bayesian variable selection.28
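The single-instrument case can be sketched in a few lines of Python. The snippet below picks the variant with the smallest exposure p value and computes the Wald ratio with first- and second-order delta-method standard errors. The function names, dictionary keys and summary statistics are hypothetical, and the second-order formula assumes that the variant-exposure and variant-outcome estimates are uncorrelated (as in a two-sample design with non-overlapping samples).

```python
import math


def lead_variant(variants):
    """Pick the variant with the smallest exposure p value."""
    return min(variants, key=lambda v: v["p_exposure"])


def wald_ratio(beta_exposure, se_exposure, beta_outcome, se_outcome):
    """Wald ratio with first- and second-order delta-method standard errors.

    Assumes the two association estimates are uncorrelated.
    """
    estimate = beta_outcome / beta_exposure
    se_first = se_outcome / abs(beta_exposure)
    se_second = math.sqrt(
        se_outcome**2 / beta_exposure**2
        + beta_outcome**2 * se_exposure**2 / beta_exposure**4
    )
    return estimate, se_first, se_second


# Hypothetical summary statistics for two cis-variants.
candidates = [
    {"rsid": "rs_a", "p_exposure": 1e-30,
     "beta_x": 0.2, "se_x": 0.02, "beta_y": 0.1, "se_y": 0.03},
    {"rsid": "rs_b", "p_exposure": 1e-9,
     "beta_x": 0.1, "se_x": 0.02, "beta_y": 0.04, "se_y": 0.03},
]

lead = lead_variant(candidates)
estimate, se_first, se_second = wald_ratio(
    lead["beta_x"], lead["se_x"], lead["beta_y"], lead["se_y"]
)
print(lead["rsid"], estimate, se_first, se_second)
```

The second-order standard error is never smaller than the first-order one, because it adds the uncertainty in the variant-exposure association.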
Stepwise pruning and conditional analysis both depend on a correlation threshold parameter. The correlated-instruments inverse-variance weighted method (also known as the generalised least squares method) is then used to account for the correlation between the retained variants through the genetic covariance matrix.28 There is no consensus on the optimal choice of LD correlation threshold. Empirically, employing larger thresholds (r² ≥ 0.8) will result in a numerically unstable causal effect estimate owing to the inversion of an ill-conditioned genetic correlation matrix.16 Principal component analysis is widely used in GWAS to adjust for population stratification.29 In cis-MR, its goal is to identify linear combinations of variants that are orthogonal to each other, which serve as instruments and explain either 99% or 99.9% of the variance in the genetic data.30 Where weak instrument bias is a concern, factor analysis and Bayesian variable selection (eg, joint analysis of marginal summary statistics) can provide more reliable estimates.28 31
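The pruning, generalised least squares and principal component steps can be illustrated with a short NumPy sketch. It is not the implementation used in the cited studies: the function names, the greedy pruning rule, the r² threshold of 0.4 and all numbers are assumptions for illustration. The sketch assumes that summary statistics are harmonised to the same effect allele, that `rho` is a signed LD correlation matrix from a suitable reference panel, and that a fixed-effect standard error is acceptable. The principal component variant uses an eigendecomposition of the weighted matrix, which is a simplification.

```python
import numpy as np


def ld_prune(pvals, rho, r2_threshold=0.4):
    """Greedy pruning: visit variants by ascending p value and keep a variant
    only if its r^2 with every variant already kept is below the threshold."""
    kept = []
    for i in np.argsort(pvals):
        if all(rho[i, j] ** 2 < r2_threshold for j in kept):
            kept.append(int(i))
    return sorted(kept)


def ivw_correlated(bx, by, se_y, rho):
    """Correlated-instruments IVW (generalised least squares), fixed-effect SE."""
    bx, by, se_y, rho = (np.asarray(a, dtype=float) for a in (bx, by, se_y, rho))
    omega = np.outer(se_y, se_y) * rho         # covariance of outcome associations
    omega_inv_bx = np.linalg.solve(omega, bx)  # Omega^{-1} bx without an explicit inverse
    denom = bx @ omega_inv_bx
    return (omega_inv_bx @ by) / denom, np.sqrt(1.0 / denom)


def ivw_pca(bx, by, se_y, rho, var_explained=0.99):
    """IVW on the leading principal components of the weighted LD matrix."""
    bx, by, se_y, rho = (np.asarray(a, dtype=float) for a in (bx, by, se_y, rho))
    phi = np.outer(bx / se_y, bx / se_y) * rho
    eigvals, eigvecs = np.linalg.eigh(phi)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, var_explained)) + 1, len(bx))
    w = eigvecs[:, :k]                         # loadings of the first k components
    omega_pc = w.T @ (np.outer(se_y, se_y) * rho) @ w
    bx_pc, by_pc = w.T @ bx, w.T @ by
    omega_inv_bx = np.linalg.solve(omega_pc, bx_pc)
    denom = bx_pc @ omega_inv_bx
    return (omega_inv_bx @ by_pc) / denom, np.sqrt(1.0 / denom), k


# Hypothetical data: three cis-variants; variants 0 and 2 are in high LD (r = 0.9).
rho = np.array([[1.0, 0.3, 0.9],
                [0.3, 1.0, 0.2],
                [0.9, 0.2, 1.0]])
pvals = np.array([1e-20, 1e-8, 1e-12])
bx = np.array([0.20, 0.15, 0.18])
by = 0.5 * bx                                  # constructed so the true ratio is 0.5
se_y = np.array([0.03, 0.04, 0.03])

keep = ld_prune(pvals, rho)
est_gls, se_gls = ivw_correlated(bx[keep], by[keep], se_y[keep], rho[np.ix_(keep, keep)])
est_pca, se_pca, n_pc = ivw_pca(bx, by, se_y, rho)
print(keep, est_gls, se_gls, est_pca, se_pca, n_pc)
print(np.linalg.cond(rho))                     # larger values indicate a less stable inversion
```

Because the outcome associations here are constructed as exactly half the exposure associations, both estimators should recover a ratio of 0.5. With real data, inspecting the condition number of the LD matrix at different thresholds is one way to gauge how close the inversion is to instability.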
The choice of exposure to weight the genetic associations
The availability of high-throughput proteomics platforms, such as the aptamer-based multiplex protein assay SomaScan 11K Assay V.5.0 (SomaLogic) and the antibody-based Olink Explore 3072, enables large-scale drug-target MR, also known as ‘proteome-wide MR’. These platforms measure the biological targets of many approved or developmental therapeutics, which may improve drug development yield for targets among the available circulating proteins and reveal novel, previously unrecognised opportunities for drug targeting.
Ideally, drug-target MR is most reliable when protein abundance is used as the exposure, with instruments drawn from protein quantitative trait loci (pQTL) measured in disease-relevant tissues when available. For example, previous studies integrated human proteome data derived from brain tissue to identify and prioritise drug targets for neurological phenotypes (eg, Alzheimer disease).32 Although tissue-specific protein data have not yet been generated at a large scale,33 circulating protein levels are more readily accessible and offer a practical alternative in large biobanks, such as the UK Biobank Pharma Proteomics Project.34 However, these data may not necessarily capture biological effects in the tissue types that are most relevant to the disease being studied.32
When a circulating protein is unavailable or does not accurately represent the protein perturbation arising from drug action, researchers can use other phenotypes upstream or downstream of the druggable protein to search for relevant genetic variants and corresponding weights for MR analyses. The weights refer to the effect sizes of these genetic variants on the relevant phenotypes. For upstream phenotypes, this can be achieved by using blood expression quantitative trait loci (eQTL) from the eQTLGen consortium35 and/or tissue-specific eQTL from the Genotype-Tissue Expression consortium.36 A prior study showed that genetic effects on circulating protein abundance are often, but not exclusively, driven by regulation of transcription.37 However, tissue-specific heterogeneity in eQTL/pQTL, if present, likely reflects actual biological differences between tissues. In drug development, ascertaining tissue specificity is essential for ensuring that the drug has appropriate pharmacokinetic properties while minimising the risk of potential adverse effects in unrelated tissues.15
Where a drug acts specifically on a protein and subsequently influences a downstream trait, drug-target MR using weights from the genetic associations with known downstream traits (biomarker or disease outcome) may provide a valid test for the effect of the protein on disease. An example in cardiovascular medicine is lipid-modifying medication: genetic associations with low-density lipoprotein cholesterol (LDL-C) were used to select instruments at loci encoding known drug targets, namely HMGCR (statins), NPC1L1 (ezetimibe) and PCSK9 (PCSK9 inhibitors), to infer the effects of pharmacological perturbation on coronary artery disease38 and, subsequently, on MASLD.39 Similar examples include using glycated haemoglobin (HbA1c) reduction to identify functional variants related to the action of antidiabetic drugs40 and systolic blood pressure reduction for antihypertensive drugs to inform drug efficacy and potential side effects.41 Alternatively, genetic variants may be weighted by their association with a binary disease outcome that is an intermediate phenotype on the pathway between the exposure and the outcome of interest. For example, recent studies have used variants associated with type 2 diabetes and HbA1c reduction to identify functional variants related to glucagon-like peptide-1 receptor agonists (GLP1-RA).42 43
The guiding principle of instrument selection should be to maximise statistical power while safeguarding against incorrect inferences. If the selected genetic variants in the flanking region do not mimic the effects of pharmacological perturbation of the drug target, this may raise concerns over the validity of the drug-target MR study. Ideally, selected instruments should be validated using positive control outcome(s), such as a clinically confirmed indication of the drug. Employing several instrument selection methods as sensitivity analyses can enhance the reliability of the study.