## William Astle wja24@cam.ac.uk
## Last updated 2017/11/12

This directory contains summary results of meta-analyses of UK10K/1000 Genomes imputed GWAS of 36 blood cell traits. The file TRAIT_MAP.tsv contains a mapping from the trait names used in the filenames to the trait names used in the publication (Astle et al. 2016, Cell 167 1415–1429; http://dx.doi.org/10.1016/j.cell.2016.10.042).

The study subjects contributing to the analyses were the Europeans in the UK Biobank interim (summer 2015) imputed genotype release and the Europeans with imputed genotypes in INTERVAL. Study subjects in INTERVAL who were estimated to be genetically identical with at least one study subject in UK Biobank were removed from INTERVAL.

Phenotypes were inverse rank normalised. Analyses were based on linear mixed models and were performed using BOLT-LMM.

Meta-analyses were:

a) three-way: an inverse variance weighted fixed effects meta-analysis stratified into the UK BiLEVE component of the UK Biobank study (abbreviated UKBIL), the complementary component of the UK Biobank study (abbreviated UKBB) and the INTERVAL study (abbreviated INT). Standard errors were adjusted using Genomic Control at the sub-study level and at the meta level.

b) two-way: an inverse variance weighted fixed effects meta-analysis stratified into the UK BiLEVE component of the UK Biobank study and the complementary component of the UK Biobank study. Standard errors and P-values were not GC corrected. The primary purpose of this set of analyses was to provide a complete set of UK Biobank specific summary statistics for return to UK Biobank.

All results were filtered by a novel meta-analysis heterogeneity score computed from the three-way meta-analysis.

See http://dx.doi.org/10.1016/j.cell.2016.10.042 for a full description of the analyses.
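
As a rough illustration of the inverse variance weighted fixed effects meta-analysis described above, the following Python sketch combines per-sub-study effect estimates and standard errors for a single variant. It is not the code used to produce these files: the function name and the input values are hypothetical, Genomic Control adjustment is omitted, the P-value uses a normal approximation to the Wald statistic, and Cochran's Q and I^2 are computed from their textbook definitions (with I^2 as a fraction), which may differ in detail from the values in the results files.

```python
import math


def ivw_fixed_effects(effects, ses):
    """Inverse variance weighted fixed effects meta-analysis for one variant.

    effects -- per-sub-study additive effect size estimates
    ses     -- per-sub-study standard errors (same order as effects)
    """
    # Each sub-study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    effect = sum(w * b for w, b in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)

    # Two-sided Wald test against effect = 0 (normal approximation).
    z = effect / se
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # Textbook heterogeneity statistics: Cochran's Q and I^2.
    q = sum(w * (b - effect) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"EFFECT": effect, "SE": se, "P": p,
            "COCHQ": q, "COCHQ_DF": df, "I2": i2}


if __name__ == "__main__":
    # Made-up numbers for three sub-studies, for illustration only.
    print(ivw_fixed_effects([0.10, 0.12, 0.08], [0.02, 0.03, 0.025]))
```

Sub-studies with smaller standard errors receive proportionally more weight, so the combined standard error is never larger than the smallest sub-study standard error.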

The results files are in five formats, as follows:

wide_form - complete three-way meta-analysis summary statistics, including sub-study level and meta-level statistics
narrow_form - partial three-way meta-analysis summary statistics, no sub-study level statistics, meta-level statistics only
igv_gwsig - IGV files containing summary statistics on all variants associated significantly with the relevant trait at the critical level alpha=8.31x10^-9
igv_condsig - IGV file containing summary statistics for a set of variants identified by a stepwise model selection procedure to explain the genome-wide significant associations for the relevant trait parsimoniously (corresponding to table S4 of Astle et al.)
ukbb_ukbil_meta - two-way meta-analysis of UKBB and UKBIL sub-studies in the results format requested for data return by UK Biobank (http://www.ukbiobank.ac.uk/wp-content/uploads/2017/08/Return-of-Results_Guidance-Note_aug17.pdf)

The columns of the wide_form, narrow_form, igv_gwsig and igv_condsig files are selected from the following variables:

VARIANT - unique variant ID formed from GRCh37 coordinates as CHR:BP_REF_ALT
ID - dbSNP release 47 rsID
ID_dbSNP49 - dbSNP release 49 rsID
CHR - GRCh37 chromosome coordinate
BP - GRCh37 physical position (base pair) coordinate
GENPOS - genetic map position
REF - GRCh37 reference allele (also the statistical baseline allele) with respect to the positive strand
ALT - alternative allele (also the statistical effect allele) with respect to the positive strand
ALT_MINOR - "TRUE" if the alternative allele is the (in sample) minor allele, "FALSE" otherwise
DIRECTION - sign of EFFECT: "+" if EFFECT>0, "-" otherwise
EFFECT - additive effect size estimate from a fixed effects meta-analysis; based on sub-study GC corrected standard errors
SE - standard error of the meta-analysis effect size estimator; GC corrected and based on sub-study GC corrected standard errors
P - except for igv_condsig files, a P-value for a Wald test against the null hypothesis that the meta-analysis additive effect=0; for igv_condsig files, a conditional P-value for a test against the null hypothesis that the additive effect=0 in a multiple regression of the rank normalised trait on the imputed allele dosages of the variants explaining the genome-wide significant associations for the trait, identified by a model selection procedure; for numerical reasons, P is truncated at 1e-300 in IGV files
MLOG10P - minus log base 10 of P-value for the meta-analysis Wald test
GWSIG - "TRUE" if the Wald test is significant at the critical level alpha=8.31x10^-9, "FALSE" otherwise
ALT_FREQ - in sample frequency of the ALT allele
MA_FREQ - in sample frequency of the (in sample) minor allele
R2 - in sample proportion of trait variance explained calculated as 2*ALT_FREQ*(1-ALT_FREQ)*EFFECT^2
STUDY_DIRECTIONS - triple in {+,-}x{+,-}x{+,-} indicating the signs of EFFECT_UKBB, EFFECT_UKBIL, EFFECT_INT
EFFECT_UKBB - additive effect size estimate from UK Biobank (excluding UK BiLEVE) sub-study
EFFECT_UKBIL - additive effect size estimate from UK BiLEVE sub-study
EFFECT_INT - additive effect size estimate from INTERVAL sub-study
SE_UKBB - standard error of UK Biobank (excluding UK BiLEVE) sub-study effect size estimator (not GC corrected)
SE_UKBIL - standard error of UK BiLEVE sub-study effect size estimator (not GC corrected)
SE_INT - standard error of INTERVAL sub-study effect size estimator (not GC corrected)
P_UKBB - P-value for a Wald test against the null hypothesis that the additive effect=0 using UK Biobank (excluding UK BiLEVE) sub-study data (not GC corrected)
P_UKBIL - P-value for a Wald test against the null hypothesis that the additive effect=0 using UK BiLEVE sub-study data (not GC corrected)
P_INT - P-value for a Wald test against the null hypothesis that the additive effect=0 using INTERVAL sub-study data (not GC corrected)
MLOG10P_UKBB - minus log base 10 of P-value for the UK Biobank (excluding UK BiLEVE) sub-study Wald test (not GC corrected)
MLOG10P_UKBIL - minus log base 10 of P-value for the UK BiLEVE sub-study Wald test (not GC corrected)
MLOG10P_INT - minus log base 10 of P-value for the INTERVAL sub-study Wald test (not GC corrected)
ALT_FREQ_UKBB - in sample frequency of alternative allele in UK Biobank (excluding UK BiLEVE) sub-study
ALT_FREQ_UKBIL - in sample frequency of alternative allele in UK BiLEVE sub-study
ALT_FREQ_INT - in sample frequency of alternative allele in INTERVAL sub-study
ALT_FREQ_MIN - minimum in sample frequency of alternative allele over the three sub-studies
ALT_FREQ_MAX - maximum in sample frequency of alternative allele over the three sub-studies
R2_UKBB - proportion of phenotypic variance explained in UK Biobank (excluding UK BiLEVE) sub-study calculated as 2*ALT_FREQ_UKBB*(1-ALT_FREQ_UKBB)*EFFECT_UKBB^2
R2_UKBIL - proportion of phenotypic variance explained in UK BiLEVE sub-study calculated as 2*ALT_FREQ_UKBIL*(1-ALT_FREQ_UKBIL)*EFFECT_UKBIL^2
R2_INT - proportion of phenotypic variance explained in INTERVAL sub-study calculated as 2*ALT_FREQ_INT*(1-ALT_FREQ_INT)*EFFECT_INT^2
I2 - three-way meta-analysis I^2
COCHQ - three-way meta-analysis Cochran's Q
COCHQ_DF - degrees of freedom for three-way meta-analysis Cochran's Q
COCHQ_MLOG10P - minus log base 10 of P-value for test against null hypothesis that Cochran's Q=0
PYRAMID_HET_SCORE - novel heterogeneity score for three-way meta-analysis
PYRAMID_HET_SIG - "TRUE" if the novel heterogeneity score > 27, "FALSE" otherwise
INFO_UKBB_UKBIL - imputation information score for UK Biobank datasets
INFO_INT - imputation information score for INTERVAL dataset
ANCEST - ancestral allele, where identified
CLUMP - identifier for the linkage disequilibrium clump the variant belongs to; clumps partition all the variants identified by the model selection procedures (i.e. a universal rather than trait specific partition); the partition is the largest such that every pair of variants with r^2>0.8 are in the same clump
JOINT_DIRECTION - sign of JOINT_EFFECT: "+" if JOINT_EFFECT>0, "-" otherwise
JOINT_EFFECT - additive effect size estimate from a multiple regression of the rank normalised trait on the imputed allele dosages of the variants explaining the genome-wide significant associations for the trait, identified by a model selection procedure
JOINT_SE - standard error of the effect size estimator from a multiple regression of the rank normalised trait on the imputed allele dosages of the variants explaining the genome-wide significant associations for the trait, identified by a model selection procedure
JOINT_MLOG10P - minus log base 10 of conditional P-value for a test against the null hypothesis that the additive effect=0 in a multiple regression of the rank normalised trait on the imputed allele dosages of the variants explaining the genome-wide significant associations for the trait, identified by a model selection procedure
DERIV_FREQ - derived (non-ancestral) allele frequency

The columns of the ukbb_ukbil_meta files are as follows:

SNP - dbSNP release 49 rsID
CHR - GRCh37 chromosome coordinate
POS - GRCh37 physical position coordinate
A1 - statistical effect allele (also the alternative allele), positive strand
A2 - statistical baseline allele (also the GRCh37 reference allele), positive strand
A1/A2 - GRCh37 reference allele (identical to column A2), positive strand
EAF - in sample frequency of the statistical effect allele
Beta - additive effect size estimate from a two-way fixed effects meta-analysis of the UK Biobank (excluding UK BiLEVE) and UK BiLEVE sub-studies
se - standard error of the two-way meta-analysis effect size estimator (not GC corrected)
P - P-value for a Wald test against the null hypothesis that the two-way meta-analysis additive effect=0 (not GC corrected)
N - sample size
INFO - imputation information score for UK Biobank datasets
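
Several of the columns described above are simple functions of other columns, so they can be recomputed as a sanity check when loading the files. The Python sketch below shows one way this might be done for a single row. The function names and example values are hypothetical. The code also assumes that alleles never contain an underscore and that "significant at the critical level" means P strictly less than alpha; neither assumption is confirmed by this README.

```python
import math

# Genome-wide significance level quoted in this README.
GWSIG_ALPHA = 8.31e-9


def parse_variant(variant):
    """Split a VARIANT ID of the form CHR:BP_REF_ALT into its components."""
    chrom, rest = variant.split(":", 1)
    bp, ref, alt = rest.split("_")  # assumes alleles contain no underscore
    return chrom, int(bp), ref, alt


def derived_columns(effect, p, alt_freq):
    """Recompute columns that are functions of EFFECT, P and ALT_FREQ."""
    return {
        "DIRECTION": "+" if effect > 0 else "-",
        "MLOG10P": -math.log10(p),
        "GWSIG": p < GWSIG_ALPHA,  # strict inequality is an assumption
        "MA_FREQ": min(alt_freq, 1.0 - alt_freq),
        "R2": 2.0 * alt_freq * (1.0 - alt_freq) * effect ** 2,
    }


if __name__ == "__main__":
    # Made-up row, for illustration only.
    print(parse_variant("1:12345_A_G"))
    print(derived_columns(effect=0.1, p=1e-10, alt_freq=0.25))
```

Because P is truncated at 1e-300 in the IGV files, a MLOG10P value recomputed from P is capped at 300 there.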