Genome-wide studies reveal factors associated with circulating uromodulin and its relationships to complex diseases

Uromodulin (UMOD) is a major risk gene for monogenic and complex forms of kidney disease. The encoded kidney-specific protein uromodulin is highly abundant in urine and related to chronic kidney disease, hypertension, and pathogen defense. To gain insights into potential systemic roles, we performed genome-wide screens of circulating uromodulin using complementary antibody-based and aptamer-based assays. We detected 3 and 10 distinct significant loci, respectively. Integration of antibody-based results at the UMOD locus with functional genomics data (RNA-Seq, ATAC-Seq, Hi-C) of primary human kidney tissue highlighted an upstream variant with differential accessibility and transcription in uromodulin-synthesizing kidney cells as underlying the observed cis effect. Shared association patterns with complex traits, including chronic kidney disease and blood pressure, placed the PRKAG2 locus in the same pathway as UMOD. Experimental validation of the third antibody-based locus, B4GALNT2, showed that the p.Cys466Arg variant of the encoded N-acetylgalactosaminyltransferase had a loss-of-function effect leading to higher serum uromodulin levels. Aptamer-based results pointed to enzymes writing glycan marks present on uromodulin and to their receptors in the circulation, suggesting that this assay permits investigating uromodulin’s complex glycosylation rather than its quantitative levels. Overall, our study provides insights into circulating uromodulin and its emerging functions.


Downstream characterization of GWAS meta-analysis findings
Association with urine uromodulin levels The association between the index SNP at each genome-wide significant locus and the concentrations of uromodulin in the urine were queried in a GWAS meta-analysis of up to 29,512 participants of 13 cohorts (1). Urine uromodulin was quantified with a previously described ELISA (2) in all but one study, that used the same RBM immunoassay as used by the ORIGIN study in this project. Summary statistics for the index SNPs were extracted from results of an inverse-variance weighted fixed-effects meta-analysis of GWAS of urinary uromodulin levels based on well imputed and quality-controlled genotypes (1).

Annotation, enrichment analyses and functional genomics
SNPs were annotated by querying the SNiPA database (3) version 3.4 (November 2020). In SNiPA v3.4, the variant effect annotations were based on the Ensembl 100 release. Based on SNiPA annotations and colocalization analyses, we performed gene prioritization at each locus by assigning each potential candidate gene evidence scores. The prioritization procedure has been described previously (4), with additional evidence scores for colocalization with eQTL and pQTLs that were prioritized over other sources of evidence. In the case of ties, the closest gene was assigned. As gene prioritization from GWAS signals is not always unambiguous, for example because of incomplete tissue representation in some of the underlying data sources, we report two most likely causal genes at three of the loci to account for uncertainty. Gene Ontology and KEGG pathway enrichment analyses based on the resulting 16 potentially causal genes were performed separately for the 4 genes from the antibody and the 12 genes from the aptamer-based assay using the R package clusterProfiler (5) with default parameters. Statistically significant enrichment was defined as Benjamini-Hochberg-adjusted p-value <0.05. Results using the 13 entirely automatically assigned most plausible gene at each locus or the 13 biologically most plausible genes were essentially the same (data not shown).
Macroscopically identifiable cortex and medulla were obtained from grossly uninvolved tumor-adjacent normal tissue in nephrectomy specimens. RNA extraction and RNA-seq was performed by GeneWiz. The paired end fastq files were trimmed and aligned to human reference genome sequence hg38 using STAR 2.7.5b (6)  Publicly available single-nucleus ATAC-seq data generated from 12,720 human kidney cells (8) were obtained at https://susztaklab.com/human_kidney/igv/. Open chromatin peaks in different kidney cell types were saved and displayed together with the bulk RNA-seq and ATAC-seq tracks based on the corresponding data we generated from human kidney cortex and medulla.

Independent SNP selection, statistical fine mapping and credible set annotation
In order to identify additional independent SNPs other than the index SNP at each associated locus, conditional analysis using GCTA-COJO (9) were performed as described previously (10) using European ancestry GWAS summary statistics as input. HRC-imputed genotypes of the

Colocalization with gene expression, plasma protein levels, biomarkers and diseases
To gain insights into the molecular basis of the identified genetic associations and to determine the most likely causal gene at each locus, we performed colocalization analyses of uromodulin-associated SNPs from the European ancestry participants with their respective associations from GWAS summary statistics of gene expression, plasma protein levels, biomarkers and diseases. The precomputed eQTL data from the GTEx Project V8 (11) (36 nonbrain tissues) and the NEPTUNE study (12) (NephQTL from tubulo-interstitial and glomerular kidney portions) were downloaded from the respective web sites. GTEx brain tissues were excluded from the analysis. Association summary statistics of plasma protein levels (2,751 proteins represented by 3,022 SOMAmers) (cis-pQTL) were downloaded from http://www.phpc.cam.ac.uk/ceu/proteins/. Colocalization analysis was also performed between the uromodulin-associated loci identified with the antibody-based assay with the levels of any proteins with which the index SNP showed association in trans, which was true for the QSOX2 protein (trans-pQTL). Phenotypes of 30 biomarkers were from UK Biobank (application #20272) and GWAS were performed using BOLT-LMM. Precomputed associations with 1,404 diseases and binary traits as defined by PheCodes from the UK Biobank were downloaded from https://www.leelabsg.org/resources. Additionally, we performed colocalization with five UMOD-related traits, namely eGFR, CKD, SBP, DBP and urine uromodulin obtained from the CKDGen Consortium (10,13), ICBP consortium (14), and from The detailed procedure of colocalization with eQTL and pQTL has been described previously (4). A cis window was defined by extending the corresponding gene region by 500 kb upstream and downstream. When there was at least one cis-eQTL or cis-pQTL with P<0.001 within ±100 kb of a uromodulin-associated index SNP, colocalization analyses were performed within the eQTL/pQTL cis window as defined in their primary publications (11,12,15). For colocalization with biomarkers and diseases, a colocalization analysis was performed within ±500 kb of an index SNP, when at least one variant of MAF >0.01 had a p-value<0.001. The 'coloc.fast' function from the R package gtx with default parameters and prior definitions were used, an adaptation of the colocalization method introduced by Giambartolomei (16).
Positive colocalization was defined when the posterior probability of a single shared variant underlying the association signals for both traits (H4) was >0.8 (4).
Colocalization using marginal association statistics did not yield positive colocalization with urine uromodulin. Given that the association patterns at the UMOD/PDILT locus are however very similar, and two independent association signals with serum uromodulin were detected at this locus, conditional colocalization analysis with urine uromodulin levels were performed (17). After selecting independent SNPs associated with urine uromodulin as described in "Independent SNP selection", conditional association statistics were generated separately for serum and urine uromodulin for each independent selected SNP by conditioning on the other selected independent SNPs. Colocalization analyses were subsequently performed for each pair of independent SNPs in serum and urine uromodulin using the conditional association statistics as input.

Gene-by-gene interaction analyses
To detect potential genetic interactions, pairwise interaction analyses between the 13 index procedures. Densitometry of signals was carried out by using ImageJ software (22). Possible differences between groups were assessed by non-parametric Mann-Whitney test.
C57BL/6N wild-type mice were housed in a light-and temperature-controlled environment with ad libitum access to tap water and standard chow. Mice were sacrificed at 4 weeks by cervical dislocation following anaesthesia with isoflurane (Minrad International Inc.) for kidney collection. Nephron segment microdissection was performed as previously described (23).
RNA isolation and quantitative RT-PCR: total RNA was extracted micro-dissected mouse nephron segments using the RNAqueous® kit (Invitrogen). One microgram of RNA was used to perform the reverse transcriptase reaction with iScript TM cDNA Synthesis Kit (Bio-Rad).
Changes in target genes' mRNA levels were determined by relative RT-qPCR with a CFX96 TM Supplementary Figure 2: Forest plots of the index SNP at each of the three significant loci from the antibody-based primary meta-analysis to illustrate effect sizes in the individual studies and the combined effect estimate. Effect sizes on the x-axis correspond to SD (standard deviation) changes in analyzed uromodulin. The allele to which the depicted effect refers to is shown next to the SNP identifier.
In the antibody-based regional association plot on chromosome 17, rs146261845 is not available because it has an MAF smaller than 0.01 and was therefore filtered out.
Supplementary Figure 4: Comparison of genetic effect sizes between the primary, eGFRadjusted association analyses (x-axis) and a sensitivity-analysis not adjusted for eGFR (yaxis). Dots represent SNPs present in both analyses and with p-value <1e-5. Panel A shows results from the antibody-based analyses (513 SNPs), and panel B results from the aptamerbased analyses (3244 SNPs). Rho, Spearman's correlation coefficient.
Supplementary Figure 5: Miami plots of sex-stratified primary association analyses of circulating uromodulin. The upper plot shows results from the antibody-based analyses (8,397 men, upwards direction; 5,515 women, downwards direction). The lower plot shows results from the aptamer-based analyses (3,535 men, upwards direction; 3,860 women, downwards direction). The x-axis shows genomic location (GRCh37/hg19 coordinates), and the y-axis the -log10(two-sided p-value) for SNP associations with uromodulin levels. Plots were generated using the EasyStrata v8.6 (26). The red line indicates the genome-wide significance threshold (P<5e-8).