Biogeographic and disease-specific alterations in epidermal lipid composition and single-cell analysis of acral keratinocytes

The epidermis is the outermost layer of skin. Here, we used targeted lipid profiling to characterize the biogeographic alterations of human epidermal lipids across 12 anatomically distinct body sites, and we used single-cell RNA-Seq to compare keratinocyte gene expression at acral and nonacral sites. We demonstrate that acral skin has low expression of EOS acyl-ceramides and the genes involved in their synthesis, as well as low expression of genes involved in filaggrin and keratin citrullination (PADI1 and PADI3) and corneodesmosome degradation, changes that are consistent with increased corneocyte retention. Several overarching principles governing epidermal lipid expression were also noted. For example, there was a strong negative correlation between the expression of 18-carbon and 22-carbon sphingoid base ceramides. Disease-specific alterations in epidermal lipid gene expression and their corresponding alterations to the epidermal lipidome were characterized. Lipid biomarkers with diagnostic utility for inflammatory and precancerous conditions were identified, and a 2-analyte diagnostic model of psoriasis was constructed using a step-forward algorithm. Finally, gene coexpression analysis revealed a strong connection between lipid and immune gene expression. This work highlights (a) mechanisms by which the epidermis is uniquely adapted for the specific environmental insults encountered at different body surfaces and (b) how inflammation-associated alterations in gene expression affect the epidermal lipidome.

Standard Skin Sampling Discs were placed on the skin with sterile tweezers and gloved hands. Adhesive skin discs were pressed down with gauze for 5 seconds. After another 25 seconds, the tape was gently removed, and transferred to D-SQUAME® Standard Storage Cards. Six discs were applied sequentially to each anatomic site that was sampled. The first two discs were discarded to eliminate superficial corneocytes and sebum, and the subsequent four discs were submitted for analysis. Samples were stored at -80℃.

Lipid extraction and quantification
D-SQUAME disks were extracted sequentially with a polar and a nonpolar organic solvent after addition of a known amount of surrogate standard solution (C16 ceramide-d31 (NS ceramideCer(d18:1/16:0))-d31), cis-10-heptadecenoic acid (FA17:1n7) and cholesteryl sulfate-d7). The organic extracts were combined and evaporated to dryness. The dried extract was reconstituted, and an aliquot was analyzed on a Waters UPC2/Sciex QTrap 5500 mass spectrometer SFC-MS/MS system in MRM mode using characteristic parent-fragment mass transitions for each analyte trace. The semi-quantitative determination of individual analytes was based on their peak area compared with the peak areas of their corresponding surrogate standards, for which concentrations were known. C16 ceramide-d31 was used as surrogate standard for all ceramides, all fatty acids were referenced to cis-10-heptadecenoic acid, and cholesteryl sulfate was referenced to cholesteryl sulfate-d7. Absolute concentrations of lipids were determined by comparing their peaks to those of relevant internal standards. Concentrations were given in picomol/disk for individual analytes, as well as each lipid class. Additionally, the percent composition of individual ceramide subtypes of the ceramide fraction was listed for each sample. In total, 351 lipids were simultaneously monitored using this technique.

Single-cell RNA-sequencing
Generation of single cell suspensions for single-cell RNA-sequencing (scRNA-seq) was performed as follows: Samples were incubated overnight in 0.4% dispase (Life Technologies) in Hank's Balanced Saline Solution (Gibco) at 4°C. Epidermis and dermis were separated. Epidermis was digested in 0.25% Trypsin-EDTA (Gibco) with 10U/mL DNase I (Thermo Scientific) for 1 hour at 37°C, quenched with FBS (Atlanta Biologicals), and strained through a 70μM mesh. Dermis was minced, digested in 0.2% Collagenase II (Life Technologies) and 0.2% Collagenase V (Sigma) in plain medium for 1.5 hours at 37°C, and strained through a 70μM mesh. Epidermal and dermal cells were recombined and libraries were generated using the 10X Chromium platform. Libraries were then sequenced on the Illumina NovaSeq 6000 sequencer to generate 151-bp paired end reads. Approximately 6,500 median UMI counts per cell were obtained. Data processing including quality control, read alignment, and gene quantification were conducted using 10X Cell Ranger v3.1 using their default parameters. Empty droplets and cells with UMI counts less than a calculated threshold based on the distribution of UMI counts per cell were removed, resulting in elimination of cells with less than approximately 600 UMI. Log normalized data and ANOVA was used to calculate differential expression among cell types. Data normalization and cell library size correction was performed using the median of ratios method implemented in the R package DESeq2. Clustered cells were mapped to corresponding cell types using "singleR" R package and the Human Primary Cell Atlas as a reference data. Keratinocytes were then further subdivided into basal, spinous, and granular layer keratinocytes by matching their cell cluster gene signatures with putative cell-type specific markers (DST, KRT5, KRT10, and KLK7).

RNA-sequencing (RNA-seq)
Skin biopsies of 4 mm in diameter were taken, snap frozen in liquid nitrogen, and subsequently stored at -80°C until further processing. RNA isolation and sequencing was performed as previously described by our group (2). The paired-end reads were mapped using STAR (3) to human build GRCh37, and only uniquely mapped reads were utilized for subsequent analysis. Raw whole tissue RNA-sequencing FASTQ files were obtained from previously published studies (4). We obtained 20 million reads per sample on average and approximately 86% of the reads were uniquely mapped and used for the downstream analysis." Gene expression levels were quantified (GENCODE v24 was used as reference) and normalized by HTSeq(5) and DESeq2 (6), respectively. Negative binomial model in DESeq2 were used to conduct differential expression analysis. The expression of 83 genes (Supplemental Table 3) that have been reported in the literature to be involved in epidermal lipid metabolism was assessed in biopsy specimens obtained from acral and trunk skin.
Differentially expressed lipid genes across these two anatomic locations were defined as those with a fold change of 2 or greater and an FDR adjusted p value of less than 0.05.

Statistical analysis
All statistical analysis were performed using R software (7). Prior to statistical analysis, samples with values below Limit of Detection (LOD) were assigned a concentration equal to the LOD divided by the square root of two. There were otherwise no missing data. The assumptions of homoscedasticity and heteroscedasticity were evaluated, and non-normal data was transformed using the BoxCox method. Outliers were identified using R package "extremevalues" (8), and when present, were winsorized from the analysis, so that outliers were set equal to the nearest non-outlier value. Following normalization and transformation, linear mixed-effects model was used to identify lipids with body site-specific expression.
FDR-adjusted p-values were calculated using the Benjamini-Hochberg procedure.
Differentially expressed lipids amongst diagnostic groups (AD, PP, SK, AK, and TI) were identified using ANOVA. Heatmaps comparing differential expression of lipids were generated with the R package "pheatmap"(9).
Receiver-operating characteristic (ROC) curves were generated and area under the curves (AUC) were calculated for individual SC lipids that demonstrated diagnostic accuracy as single analyte classifiers of psoriasis or atopic dermatitis using the R package "ROCR"(10).
Step forward logistic regression models were fitted using Firth's bias reduction method with the R package "logistf"(11). Models were validated using the K-fold cross validation method.
ROC curves and AUC values for these models were calculated as just described. Linear regression assumptions about the normality of residuals were examined by use of the Shapiro-Wilk test.
Relatedness between different body groups and between diagnostic groups was determined by calculating the Euclidian Distance as follows: simple matching distance (SMD) was calculated for each respective analyte across the two diagnostic groups and separately for two body groups (for example, between PP and PN or between PF and VF, respectively). The process was repeated for all 351 analytes and across each pair of diagnostic groups and pair of body groups. SMD of each of the analytes were squared and summed, and the square root was obtained. A cluster dendrogram was created based on these values. With this data, a PCA was also created, and data for the PCA was scaled and centered.
Product-moment correlation coefficients (PMCCs) were calculated and graphed for each lipid-lipid combination to show relationships of co-expression between specific lipids. A 2dimensional visual representation of all lipid-lipid correlations was created using a dimensionality reduction technique, t-distributed stochastic neighbor embedding (t-SNE), and calculated with the R package "Rtsne"(12), using the pairwise distance formula, 1-r 2 , where r represents the correlation coefficient. Heinze G, and Puhr R. Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets. Stat Med. 2010;29(7-8):770-7. 12.

Supplemental Figure Legends
Supplemental Figure 1. Ceramide structures. Ceramides are composed of an amino alcohol (e.g. sphingosine) and a fatty acid, which may have an alpha hydroxyl group. Shown here are the epidermal ceramide classes that were monitored by targeted mass spectrometry. An "A" (ADS, AH, AP, and AS) specifies an alpha-hydroxy fatty acid. Supplemental Figure 8. Differential expression of lipid-associated metabolic genes in psoriasis. RNA-seq was performed on RNA extracted from biopsies obtained from psoriasis lesional skin and healthy controls. Expression of lipid-associated metabolic genes is shown as box-and-whisker plots. The upper and lower bars connected to each box indicate the boundaries of the normal distribution, and the box edges mark the first and third quartile boundaries within each distribution. The dark vertical line represents the median. The unpaired Student's t-test was used to calculate significance.

Supplemental
Supplemental Figure 9. Cytokine-induced differential expression of lipid-associated metabolic genes predict lipid alterations in psoriasis. (A) RNA-seq analysis was performed on biopsies obtained from psoriasis lesional (PP), atopic dermatitis lesional (AD) and healthy control skin (NN). Differential expression of representative genes involved in ceramide synthesis are shown. (B) 50 human primary keratinocyte cell lines were cultured with indicated cytokines (including the psoriasis-associated cytokines IFN-g, IL-17A, TNF)(10). RNA was then extracted and gene expression was evaluated with RNA-seq. Results demonstrate that in vitro culture with TNF increases the expression of SPTLC1, SPTLC2 and decreases the expression of SPTLC3. Also, the expression of CERS6 is decreased by IFN-g and TNF. These cytokine-induced alterations in gene expression mimic the altered gene expression in psoriasis lesional skin.  Figure S1). The resulting t-SNE clusters demonstrate that 1) not all lipids of the same class or subclass correlate with one another, and 2) lipids of different classes and different subclasses can correlate with one another. (B) Bar graphs illustrate the percent of ceramides that correlated with each other. Upper graph illustrates the negative correlative relationship between 18 carbon sphingoid base ceramides and 22 carbon sphingoid base ceramides. Lower graph illustrates the positive correlative relationship between the NS and NDS ceramides with unsaturated FA24:1 and the negative correlative relationship these ceramides have with the saturated FA24:0. (C) Correlation matrix representing the patterns of negative correlations among different lipid subclasses. The intensity of the color at the intersect between a column and row represents the average correlation coefficient for that particular lipid subclass combination. The size of the circle within each colored box represents the percent of lipids that negatively correlated. Hierarchical clustering is used to order the lipid subclasses based on their patterns of correlation. From this correlation matrix, it is evident that cholesterol sulfate has a strong negative correlation with many AH and NH ceramides. (D) Ceramides were grouped by the length of their sphingoid bases. Lipid expression across different groups was then assessed and a correlation matrix constructed. The intensity of the color at the intersect between a column and row represents the average negative correlation coefficient for that particular group comparison. The size of the circle within each colored box represents the percent of lipids that correlated. From this correlation matrix, it is evident that 18 carbon sphingoid base ceramides tended to negatively correlate with 22 carbon sphingoid base ceramides. Furthermore, cholesterol sulfate tends to negatively correlate with 26 carbon sphingoid base ceramides (E) Correlation matrix displaying positive correlations between lipid classes. The strongest positive correlations are between lipids of the same class. Also, cholesterol sulfate tends to positively correlate with AS, EOS, and NS ceramides. (F) Correlation matrix displaying negative correlations between lipid classes.
Supplemental Figure 14. Altered gene expression in ELOVL4 low -expressing keratinocytes. Keratinocyte RNA-seq datasets were parsed into three groups based on the ELOVL4 allele they expressed (0/0 representing the ELOVL4 reference allele and 0/1 and 1/1 representing heterozygosity and homozygosity for the ELOVL4 variant, rs62407622). Keratinocytes homozygous for rs62407622 expressed significantly lower levels of ELOVL4. The same observation was noted when psoriasis patients were parsed for the same allele; rs62407622 homozygosity was associated with low ELOLV4 expression in psoriasis lesional skin. Keratinocytes homozygous for the ELOVL4 low variant also expressed significantly lower levels of the lipid genes, CERS3 and SPTLC3; the keratin genes, KRT77 and KRT10; the skin barrier genes, CSTA, KLK7, PDAI1, and TGM5; and lower levels of IL18, an IL1-family member cytokine. Also shown are scatter plots of keratinocyte gene expression (primary keratinocyte cell lines). There was a significant positive correlation between ELOVL4 and each of the aforementioned genes.

Supplemental Tables
Supplemental Table 1. Targeted quantification of epidermal lipids across different biogeographic regions.
Supplemental Table 2. Mixed-effects model of epidermal lipid expression.
Supplemental Table 3. RNA-seq analysis of lipid associated metabolic genes in palm versus trunk skin.
Supplemental Table 4. Targeted quantification of epidermal lipids in psoriasis, atopic dermatitis, and other skin diseases.
Supplemental Table 5. Lipid expression in lesional versus non-lesional psoriasis skin.
Supplemental Table 6. Lipid expression in psoriasis lesional skin compared to other skin diseases.
Supplemental Table 7. Lipid expression in atopic dermatitis lesional skin compared to atopic dermatitis non-lesional skin.
Supplemental Table 8. Differential lipid expression between atopic dermatitis and psoriasis lesional skin.