A Guide to Conducting Polygenic Risk Score Analysis

In precision medicine, polygenic risk scores (PRS) have attracted considerable attention as a tool for estimating an individual's genetic liability to a trait or disease. A PRS is typically computed as a weighted sum of the risk alleles an individual carries, with the weights taken from GWAS effect-size estimates. This article outlines current best practices for calculating and validating PRS, particularly when using UK Biobank data.

Data and Variant Selection

PRS calculation is built on high-quality genome-wide association study (GWAS) summary statistics from large cohorts, such as those of European (EUR) or multi-ancestry origin. Restricting the analysis to common, biallelic SNPs with unambiguous strand orientation is crucial to avoid allele-alignment errors. Strand-ambiguous (palindromic) A/T and C/G SNPs are usually excluded because their orientation cannot be inferred from the alleles alone.
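As a minimal sketch of this filtering step, the Python below assumes the summary statistics have already been parsed into records holding an rsID, effect allele, other allele and effect allele frequency. The `Variant` class, the function names and the 1% minor allele frequency cut-off are illustrative assumptions rather than fixed standards.

```python
from collections import Counter
from dataclasses import dataclass

# Watson-Crick complements; also serves as the set of valid single-base alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class Variant:  # hypothetical record for one row of GWAS summary statistics
    rsid: str
    effect_allele: str
    other_allele: str
    eaf: float  # effect allele frequency


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs look identical on both strands, so orientation cannot be inferred."""
    return COMPLEMENT.get(a1) == a2


def keep_variant(v: Variant, min_maf: float = 0.01) -> bool:
    a1, a2 = v.effect_allele.upper(), v.other_allele.upper()
    # Single-nucleotide alleles only (drops indels and malformed entries).
    if a1 not in COMPLEMENT or a2 not in COMPLEMENT or a1 == a2:
        return False
    if is_strand_ambiguous(a1, a2):
        return False
    # "Common" here means minor allele frequency at or above min_maf.
    return min(v.eaf, 1.0 - v.eaf) >= min_maf


def select_variants(variants, min_maf: float = 0.01):
    # An rsID listed more than once usually signals a multiallelic site or a duplicate.
    counts = Counter(v.rsid for v in variants)
    return [v for v in variants if counts[v.rsid] == 1 and keep_variant(v, min_maf)]
```

Dropping every copy of a repeated rsID, rather than keeping the first, avoids silently choosing one allele pair at a multiallelic site.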

Statistical Methods for Effect Size Estimation

State-of-the-art Bayesian methods such as LDpred2 are generally preferred over traditional clumping and thresholding (C+T). Rather than discarding correlated variants, these methods model linkage disequilibrium (LD) between variants to estimate more accurate effect sizes, which improves predictive power.
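LDpred2 itself offers several models, some fitted with a Gibbs sampler, and is run from the R package bigsnpr. To illustrate only the underlying LD-aware idea, the sketch below implements the closed-form posterior mean of the simpler infinitesimal model (LDpred-inf), which shrinks marginal GWAS effects using the LD matrix. It assumes standardised genotypes, an LD matrix from an ancestry-matched reference panel and a known SNP heritability; it is not LDpred2, and the function name is illustrative.

```python
import numpy as np


def ldpred_inf(beta_hat, ld, n, h2):
    """Posterior mean SNP effects under the infinitesimal model (LDpred-inf).

    beta_hat: marginal GWAS effects for M variants, on standardised genotypes
    ld:       M x M LD (correlation) matrix from an ancestry-matched reference panel
    n:        GWAS sample size
    h2:       SNP heritability explained by these M variants
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld, dtype=float)
    m = beta_hat.shape[0]
    # Shrinkage strength grows with the number of variants and falls with N * h2.
    shrink = m / (n * h2)
    # Solve (R + (M / (N h2)) I) beta = beta_hat instead of forming the inverse.
    return np.linalg.solve(ld + shrink * np.eye(m), beta_hat)
```

For two SNPs in strong LD (r = 0.9) where only the first is causal with effect 0.02, the marginal estimates are [0.02, 0.018]. With a very large sample (say n = 10,000,000, h2 = 0.5) the function returns approximately [0.02, 0.0], attributing the signal to the causal SNP, whereas summing both marginal effects would count it almost twice.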

Calculation Tools

Standard tools for PRS calculation include PLINK (the --score function), PRSice-2 and LDpred2 (implemented in the R package bigsnpr), which together support flexible and reproducible PRS computation.
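The core calculation behind these tools is a weighted sum of effect-allele dosages. The sketch below assumes per-individual dosages and per-variant weights held in plain dictionaries (a hypothetical layout, not any tool's file format) and shows the allele alignment that scoring depends on. Tools differ in their defaults; PLINK 1.9's --score, for example, reports an average over the alleles used rather than a raw sum unless the sum modifier is given, so absolute values may differ from this sketch by a scaling factor. Check the documentation for the version in use.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: {rsid: (counted_allele, dosage)}, dosage between 0 and 2 (hypothetical layout)
    weights: {rsid: (effect_allele, other_allele, beta)} from the base data
    Returns (score, number_of_variants_used).
    """
    score = 0.0
    used = 0
    for rsid, (effect, other, beta) in weights.items():
        if rsid not in dosages:
            continue  # variant absent from the target data
        counted, dosage = dosages[rsid]
        if counted == effect:
            score += beta * dosage
        elif counted == other:
            # Dosage counts the other allele; convert to effect-allele dosage.
            score += beta * (2.0 - dosage)
        else:
            continue  # alleles disagree; skip rather than guess
        used += 1
    return score, used
```

Returning the number of variants used alongside the score makes it easier to spot individuals or batches with many missing variants.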

Validation Practices

PRS are validated in independent samples, often leveraging UK Biobank’s deeply phenotyped and genotyped cohort. Validation includes assessing predictive performance metrics (e.g., Area Under the Curve (AUC), odds ratios), and testing for population stratification or effect heterogeneity across ancestry groups to avoid exacerbating health disparities.
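Two of the quantities used in validation can be sketched with the standard library: standardising the PRS so that effects are expressed per standard deviation, and the AUC, read as the probability that a randomly chosen case has a higher score than a randomly chosen control. The function names are illustrative. The pairwise AUC loop is quadratic and meant only to show the definition; for biobank-scale data a rank-based implementation such as scikit-learn's `roc_auc_score` is more practical. Odds ratios per standard deviation are usually obtained by logistic regression of case status on the standardised PRS, adjusting for covariates such as age, sex and genetic principal components.

```python
import statistics


def standardise(scores):
    """Rescale PRS to mean 0, SD 1 so effect estimates are per standard deviation."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd for s in scores]


def auc(scores, labels):
    """Probability that a random case scores higher than a random control (ties count 1/2).

    labels: 1 for cases, 0 for controls. O(n_cases * n_controls), for illustration only.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75, since three of the four case-control pairs are correctly ordered.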

Ancestry Considerations

Trans-ancestral PRS development and testing for heterogeneity of risk effects across populations address transferability and equity issues. Multi-ancestry GWAS inputs and validation in diverse cohorts reduce bias and improve clinical utility.
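One common way to test for heterogeneity of risk effects across ancestry groups is Cochran's Q, computed from per-group estimates, for example log odds ratios per standard deviation of PRS estimated separately in each group. The sketch below uses inverse-variance (fixed-effect) weights; the function name and input layout are illustrative. Under homogeneity, Q approximately follows a chi-square distribution with k - 1 degrees of freedom, so a p-value can be obtained with, for example, `scipy.stats.chi2.sf(q, df)`.

```python
import math


def cochran_q(betas, ses):
    """Cochran's Q and I^2 for one effect estimated separately in k groups.

    betas: per-group effect estimates (e.g. log odds ratio per SD of PRS)
    ses:   their standard errors
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching betas and ses for at least two groups")
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "pooled_se": math.sqrt(1.0 / sum(weights)),
        "q": q,
        "df": df,
        "i2": i2,
    }
```

With estimates of 0.3 and 0.1 (both with standard error 0.05), the pooled estimate is 0.2, Q is 8 on 1 degree of freedom and I^2 is 0.875, which would suggest substantial heterogeneity between the two groups.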

Clinical Correlation

PRS can be correlated with clinical outcomes and biomarkers, as exemplified in rheumatoid arthritis research, to evaluate their predictive value beyond genetics alone.

QC of Base and Target Data

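Quality control (QC) is required for both the base data (GWAS summary statistics) and the target data (individual-level genotypes and phenotypes). It includes checking file integrity, confirming that both datasets use the same genome build, applying standard GWAS QC, removing strand-ambiguous and duplicate SNPs, and avoiding sample overlap and relatedness between the base and target samples.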
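Two of these QC steps, removing duplicated SNPs and aligning alleles between base and target data, can be sketched as below. The function names and return labels are illustrative. The allele check assumes strand-ambiguous SNPs have already been removed, because for A/T and C/G SNPs a strand flip is indistinguishable from a swap of the effect and other alleles.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def unique_rsids(rsids):
    """Keep only rsIDs that occur exactly once; duplicated IDs are dropped entirely."""
    counts = Counter(rsids)
    return {r for r, c in counts.items() if c == 1}


def harmonise(base_a1, base_a2, target_a1, target_a2):
    """Classify how base alleles (a1 = effect allele) line up with target alleles.

    Returns "match", "swap", "flip", "flip_swap", or None if the alleles cannot be aligned.
    Assumes strand-ambiguous (A/T, C/G) SNPs were already removed.
    """
    base = (base_a1.upper(), base_a2.upper())
    target = (target_a1.upper(), target_a2.upper())
    flipped = (COMPLEMENT.get(base[0]), COMPLEMENT.get(base[1]))
    if base == target:
        return "match"
    if base == target[::-1]:
        return "swap"  # same alleles, opposite coding
    if flipped == target:
        return "flip"  # reported on the opposite strand
    if flipped == target[::-1]:
        return "flip_swap"  # opposite strand and opposite coding
    return None


def aligned_beta(beta, status):
    """Effect size expressed relative to the target data's first allele (target_a1)."""
    if status is None:
        raise ValueError("variant cannot be aligned and should be excluded")
    return -beta if status in ("swap", "flip_swap") else beta
```

Variants for which `harmonise` returns None are best excluded rather than forced into alignment.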

Extraction of SNP Data

Extracting the required SNPs from the genotype files is a crucial early step. For UK Biobank's imputed genotypes, which are distributed in BGEN format, bgenix is the recommended tool.
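bgenix is a command-line program, so from Python it is typically driven through `subprocess`. The sketch below mainly builds the argument lists for a per-chromosome extraction. It assumes each BGEN file already has a .bgi index (bgenix creates one with `-index`) and that the rsIDs to extract are listed in a text file; the file-name pattern, the rsID file and the function names are hypothetical. Check the bgenix documentation for the options supported by the installed version.

```python
import subprocess


def bgenix_extract_command(bgen_path, rsid_file):
    """Argument list asking bgenix to keep only the rsIDs listed in rsid_file."""
    return ["bgenix", "-g", str(bgen_path), "-incl-rsids", str(rsid_file)]


def per_chromosome_commands(bgen_pattern, rsid_file, chromosomes=range(1, 23)):
    """One extraction command per autosome; bgen_pattern is hypothetical, e.g. 'imp_chr{chrom}.bgen'."""
    return {
        chrom: bgenix_extract_command(bgen_pattern.format(chrom=chrom), rsid_file)
        for chrom in chromosomes
    }


def run_extraction(cmd, out_path):
    """Run one command; bgenix writes the subset BGEN to standard output."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

Keeping command construction separate from execution makes it easy to log or dry-run the commands before submitting them to a cluster.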

Interpretation and Presentation of Results

When interpreting and presenting results, the risks of overfitting and multicollinearity should be minimised, particularly when multiple PRSs are used as predictors in the same model.
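One way to check for multicollinearity among several PRSs is the variance inflation factor (VIF). A minimal sketch, assuming the scores are held in a NumPy array with one column per PRS, uses the identity that each VIF is a diagonal element of the inverse correlation matrix. The function name is illustrative, and commonly quoted cut-offs (for example 5 or 10) are rules of thumb rather than fixed thresholds. Overfitting is a separate issue, usually addressed by tuning any parameters in one sample and evaluating performance in another.

```python
import numpy as np


def variance_inflation_factors(prs_matrix):
    """VIF for each PRS column (n_individuals x k array, k >= 2).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing PRS j on the other PRSs;
    this equals the j-th diagonal element of the inverse correlation matrix.
    """
    x = np.asarray(prs_matrix, dtype=float)
    if x.ndim != 2 or x.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two PRS columns")
    corr = np.corrcoef(x, rowvar=False)
    return np.diag(np.linalg.inv(corr))
```

For two PRSs with correlation 0.8, each VIF is 1 / (1 - 0.64), roughly 2.78.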

Clinical Applications of PRS

Despite limitations at the individual level, PRS have generated excitement because of their potential applications in precision medicine, in particular estimating a person's risk of developing a particular disease. However, a 2023 review concluded that the clinical utility of PRS remains unclear and that diversity issues persist. If research into the true value of the scores continues and ethnically diverse data are used, PRS could have a useful clinical impact.
