Skip to main content

A systematic simulation-based meta-analytical framework for prediction of physiological biomarkers in alopecia



Alopecia or hair loss is a complex polygenetic and psychologically devastating disease affecting millions of men and women globally. Since the gene annotation and environmental knowledge is limited for alopecia, a systematic analysis for the identification of candidate biomarkers is required that could provide potential therapeutic targets for hair loss therapy.


We designed an interactive framework to perform a meta-analytical study based on differential expression analysis, systems biology, and functional proteomic investigations. We analyzed eight publicly available microarray datasets and found 12 potential candidate biomarkers including three extracellular proteins from the list of differentially expressed genes with a p-value < 0.05. After expression profiling and functional analysis, we studied protein–protein interactions and observed functional associations of source proteins including WIF1, SPON1, LYZ, GPRC5B, PTPRE, ZFP36L2, HBB, PHF15, LMCD1, KRT35 and VAV3 with target proteins including APCDD1, WNT1, WNT3A, SHH, ESRI, TGFB1, and APP. Pathway analysis of these molecules revealed their role in major physiological reactions including protein metabolism, signal transduction, WNT, BMP, EDA, NOTCH and SHH pathways. These pathways regulate hair growth, hair follicle differentiation, pigmentation, and morphogenesis. We studied the regulatory role of β-catenin, Nf-kappa B, cytokines and retinoic acid in the development of hair growth. Therefore, the differential expression of these significant proteins would affect the normal level and could cause aberrations in hair growth.


Our integrative approach helps to prioritize the biomarkers that ultimately lessen the economic burden of experimental studies. It will also be valuable to discover mutants in genomic data in order to increase the identification of new biomarkers for similar problems.


Alopecia or baldness is characterized by either patchy or complete loss of scalp hair [1]. It affects millions of men and women of all ethnic backgrounds and can be psychologically shocking. It is a complex polygenic disease that can occur at any age affecting the general population between 1 and 2% [1, 2]. The severity of the disease varies widely in the population, but very little is known about genetic and environmental factors. A comprehensive understanding of the physiological mechanisms and signaling pathways involved in the progression of the disease is lacking. How inflammation and infectious agents, genetic and immunological factors are associated with the pathophysiology of alopecia is not clear. The currently available treatment options have variable efficacy and are not satisfactory [3, 4] due to limited understanding of the causes and prognosis. The high prevalence and ineffective cure call for extensive research to establish better therapeutic strategies [5]. A systematic approach is required to identify candidate biomarkers regulating major biological pathways linked to alopecia and potential therapeutic targets for its therapy.

Biomarkers are quantifiable traits that can be used to analyze normal as well as pathological processes [6]. These candidate molecules can be used for prediction of relapse, screening and to examine the response of the treatment of hair loss. Due to a lack of a suitable analytical framework, significant problems have been confronted during the translation of candidate biomarkers from cDNA microarray research into proteomic investigations for clinical applications [7,8,9]. The gene expression profiling of the epidermal and dermal keratinocytes is responsible for the activation of adipogenic factors and the levels of chemical-markers [58] including WIF1, GPRC5B, PTPRE, and LMCD1. These biological molecules are known to regulate hair growth cycle. Recent progress in genomics and proteomics analysis enabled the identification of many proteins and the discovery of new biomarkers.

Microarray and quantitative proteomics generally produce a massive amount of data that need to be further examined to find biomarkers and drug targets. For genome-wide pathological analysis, numerous expression data sets are publicly available. Since data unavailability and platform diversity pose major challenges to the analysis of human diseases, the meta-analysis approach has received significant consideration for analyzing various datasets. This approach offers more detailed analysis, statistical support, and significant results compared to any single study [10, 11]. Simulation-based differential gene expression has enabled the discovery of genetic variants and the generation of a quantitative gene expression scoring system that is critical for disease pathogenesis. In this study, we designed a systematic and logical computational-based meta-analytical framework involving systems biology approaches that would be helpful in identifying the causative agents of hair loss leading to the discovery of biomarkers. Differential expression analysis of eight microarray datasets was performed to determine candidate genes with significant differential regulation scoring system. Identification of key proteins as potential biomarkers based on their expression patterns has become possible through the use of databases which reveal the expression of several genes in tissues [12]. Comprehensive in silico systems-level biological analysis was employed to determine significant alopecia biomarkers through a multistep prioritization method. The physicochemical and functional characteristics of these protein molecules were also studied. Moreover, the protein–protein interactions and pathways networks showed the functional association of our candidate proteins with hair follicle development and regulation of β-catenin, Nf-kappa B, cytokines, and retinoic acid levels. Probing of new biomarkers can deliver valuable insights in both biological and medical research after studying these molecular networks. It also increases biomarker discovery and can be used in large public datasets for better results in other diseases as well.


Identification and accession of eligible source data

The aim of the meta-analysis was to identify potential biomarkers for alopecia to effectively diagnose the causes of hair loss. Alopecia-related expression datasets were identified and retrieved from the Gene Expression Omnibus (GEO) database excluding the non-human studies. Information extracted from each dataset covers GEO accession number, sample type, platform, number of samples, and gene expression data. This study examines the genes commonly covered by hgu133plus2 chips. In these datasets, the Affymetrix Gene Chip Human Genome U133Plus2.0 Array platform and the annotation information (hgu133plus2) of probes were used to check differentially expressed genes. R-platform and BioConductor packages including Affy, AffyQCReport, AffyRNADegradation, AnnotationDbi, Annotate, Biobase, Limma, hgu133a2cdf and hgu133plus2cdf were employed to evaluate the statistical significance of the results (Additional file 1: Fig. S1).

Normalization and differential expression analysis

Normalization was used to compare microarray data sets. The pheno-data files of these data sets were organized in an identifiable format [13]. Background correction, i.e. for perfect matches (PM) and mismatches (MM), was calculated as given in equation (i). Robust Multi-Array Analysis (RMA) was used to remove local artifacts and noise [14, 15].

$$PM_{ijk} = BG_{ijk} + S_{ijk} \left( {\text{i}} \right)$$

where, PM is a perfect match, Background (BG) caused by optical noise and non-specific binding (S); ijk is the signal for probe j of probe set k on array i.

$$BG\left( {PM_{ijk} } \right) \, = E\left[ {S_{ijk} |PM_{ijk} } \right] > 0$$
$$S_{ijk} \sim Exp(\lambda_{ijk} )BG_{ijk} \sim N(\beta i,\sigma^{ 2} )$$

where, PM-data is a perfect match (hybridization) representing combination of background (BG) and signals (S) of expression (E) for a set of probes (ijk), normalized (N) at default parameters. The signals of expressions were used to compute probe affinities. BioConductor “Array Quality Metrics” package was used to analyze the dataset that is normalized to the median expression level of each gene [14, 16, 17]. The expression value of a transcript having a p-value < 0.15 was considered as marginal log transformation and quantile normalization of arrays was used to bring them to the same scale. Gene–gene covariance matrices of each data set, ignoring the missing values, were calculated across all arrays (54675affyids). The formula for transformation is:

$$X_{norm} = F 2^{ - 1} \left( {F 1\left( x \right)} \right)$$

where, F1 and F2 are distribution functions of the actual and reference chips, respectively.

We used the RMA-algorithm to calculate averages between probes in a probe set in order to get a summary of intensities. BioConductor “AffyRNADegradation” package was used for RNA-degradation analysis and to measure the quality of RNA in these samples [18, 19]. In this study, we identified DEGs in each dataset from genomic experiments by pairwise comparison [20] and the Benjamini–Hochberg method was used for multiple testing corrections [21]. DEG sand duplicates pots along with the measurement of quality weights were shortlisted. The moderated statistics were calculated, genes were ranked and p-values were calculated. FDR < 0.05 (false discovery rate), p-value ≤ 0.05, Average Expression Level (AEL) ≥ 40% and absolute log fold change (logFC) > 1 were set as significant cut-off values [22].

Gene overlap significance analysis

Common genes among ranked DEGs (p-value ≤ 0.05) of each microarray dataset included in this study were identified using Compare Two Lists (Bioinformatics & Research Computing, Whitehead Institute

Subcellular localization and identification of secreted proteins

We predicted the subcellular localization of biomarker candidate proteins. We predicted the subcellular localization of alopecia-related significant DEGs using Signal P4.1 [23] and CELLO v.2.5 [24] to find the cellular compartments and categorize the selected proteins for effective applications. In most of the cases, the secretory and extracellular proteins of several cells are important sources of biomarker discovery as they indicate different states of the cells at a real-time [25]. These extracellular proteins that are secreted into the blood or plasma could be a convenient and easy way for patient clinical screening. The abnormal blood and plasma levels of such secretory proteins in various conditions would relate to the pathological conditions. Amino acid sequences as FASTA input format were used. SignalP4.1server was used to predict classical secretory proteins with D-value > 0.45 and SecretomeP2.0 server for non-classical secretory proteins with a neural network (NN) score = 0.5 [26]. Trans-membrane proteins were predicted using TMHMMv.2. with default options [27]. For biomedical text mining, the predicted genes were mapped with alopecia using the Comparative Toxicogenomics Database (CTD) to filter the information and related literature for the association of our selected differentially expressed genes with alopecia [28].

Expression profile and functional enrichment analysis

We performed expression profile analysis based on average values of expression in each sample of each dataset in order to determine variations in gene expression among different sample sets [29, 30]. The functional annotation, gene ontology (GO) and pathway enrichment analysis [31, 32] of alopecia-related genes helped us to reveal biological functions and they were performed using the web-based FunRich [33], DAVID and Enrichr [34] annotation tools.

Physicochemical and functional proteomic analysis

We analyzed the physicochemical properties of selected biomarker proteins by ExPASy ProtParam, ProtScale and Peptide Mass tools [35]. For empirical investigation and validation of the studies of two-dimensional (2-D) gels and mass spectrometry, these tools compute various physical and chemical parameters of proteins. ProtParam was used to calculate various physicochemical parameters, ProtScale for hydrophobicity analysis and Peptide Mass was used for calculating the masses of possible cleavable peptides and any known post-translational modifications. For further structural and functional analysis, PDBePISA interactive tool [36] was employed for the exploration of macromolecular interfaces. We identified all known motifs in our protein samples by Motif Scan web server [37]. Large proteins that make up structural and functional units have evolved by duplication of internal sequence repeats. Therefore, we identified gapped approximate repeats and short composition biased in protein sequences using RADAR web server [38, 39].

Interactomic analysis

The protein–protein interactions (PPIs) reveal the topology and functional interaction of proteins that are useful to assess biological and pathological conditions [40, 41]. We can infer the functional relationship of proteins through genomic associations. In the PPI network, each protein is considered to have a relationship with one or more gene-sets connected with biological or molecular functions [42]. Functions of this biological network may show great differences of activity in the disease state as compared to normal. The functional interactors of source genes were retrieved with a high confidence score (0.999) from the STRING version-10 [43] and HAPPI databases [44]. For visualization and analysis of molecular networks, Cytoscape (version 3.2.1) was used [45] to identify the role of each seeder and target genes in alopecia and hair loss. In the network, the source nodes were categorized according to the degree of association with alopecia and any disturbance in the functionality of genes contributes to a disease phenotype. In Cytoscape, Network Analyzer was used to calculate topological properties of the network.

Pathway-molecular mechanisms analysis in alopecia

The pathway analysis could be an essential tool for biomarker discovery to better understand the underlying biochemical mechanisms. KEGG, Reactome and Wiki pathways were used to curate and map the candidate biomarkers. We constructed an integrated metabolic network of alopecia-related potential biomarkers using PathVisio version-3 tool [46]. In this integrated network, the potential mechanism of each genetic marker in each pathway was studied based on data mining.

Statistical analysis

We performed multiple testing to evaluate the significance of candidate biomarkers. Average expressions of genes between two phenotypic classes were compared, for gene g. The two-sample t-statistic was calculated. The false discovery rate (FDR) [21] is commonly used in the multiple testing of high-dimensional genomic data as a criterion for controlling false positives [47]. FDR = V/R is defined as the expected proportion of false positives among the genes declared significant. Since no null hypothesis was rejected, V/R is considered to be 0 when R = 0. The determination of the number of biological replicates is one of the current methods for controlling true positives. The variance was used to analyze the importance of differences between these groups. Data were expressed as mean ± SD. We performed all statistical analysis using the Limma-package of the R-platform version 3.1.3. The decision rule was set to p-value < 0.05.


cDNA datasets and normalization

We accessed eight GEO datasets related to alopecia, hair loss, and scalp hair follicles cases. The AffyBatch object has the size of the arrays: 1164 × 1164, 732 × 732, 712 × 712, and 448 × 448 features with related Affyids (Table 1). Normalized distances between arrays of DNA chip showed the median expression level and the quantile normalization of the probes. The MA-plots indicated the quality of the individual array of each dataset after normalization (Fig. 1). The gene–gene covariance matrix across all arrays in each dataset by ignoring missing values was computed following a log-transformation of the arrays to make sure they were on the same scale. Decreased amount of RNA hybridization renders low-quality results and lowers the total signal strength. The intensity gradient 3′/5′ depends on the degree of competitive binding of specific and non-specific targets to a particular probe. The RNA degradation-plot of each dataset shows short probe sets near the 3′-end of the transcripts indicating that the 3′/5′ intensity gradient decreases by increasing degrees of saturation (Fig. 2). A single summary statistic for each array in the batch provides an assessment of the severity of RNA-degradation level (Additional file 2: Table S1). The list of tools, databases and software used in this study are shown in Additional file 3: Table S2.

Table 1 List of cDNA datasets
Fig. 1
figure 1

MA plots: individual array quality of each dataset after normalization. M = log2 (I1) − log2 (I2), A = 1/2 [(log2 (I1) + log2 (I2)], where I1 is the intensity of the array studied and I2 is the intensity of a “pseudo-array” that consists of the median across arrays. Typically, we expect the mass of the distribution in an MA plot to be concentrated along the M = 0 axis, and there should be no trend in M as a function of A

Fig. 2
figure 2

RNA degradation plot of each dataset: side-by-side plot produced by plotAffyRNAdeg representing 5′ to 3′ trend presenting an assessment of the severity of degradation and significance level

Identifying DEGs and screening for potential biomarkers

We identified a total of 100 DEGs in each microarray dataset by pairwise comparison between biologically comparable groups. Among these DEGs, the top 20 genes in each dataset were ranked and selected based on FDR (< 0.05), p-value (≤ 0.05) and logFC (> 1) parameters. From the list of ranked genes, 12 common genes (as potential biomarkers) of each dataset were identified (Additional file 4: Table S3). The sub-cellular localization of these genes indicates that WIF1, SPON1, and LYZ are extracellular, GPRC5B is membrane bound, PTPRE, VAV3, and HBB are cytoplasmic, ZFP36L2, PHF15, LMCD1, and KRT35 are nuclear while HBB is a mitochondrial-associated protein. We determined the transmembrane helices of these proteins based on a Hidden Markov Model (Additional file 5: Fig. S2). For disease-gene mapping, we found the role of these genes in alopecia and hair-loss and observed that SPON1 is the enriched term (Fig. 3).

Fig. 3
figure 3

Screening steps to filter potential biomarkers for alopecia. The bar graph indicates the disease-gene mapping (alopecia-potential genes) using the comparative toxicogenomics database (CTD)

Expression profiling and functional enrichment analysis

We analyzed the expression level of 12 candidate biomarkers to observe the global picture of cellular functions between the biological samples of each dataset. This profiling showed the variation in expression levels and the relative activity of these genes in hair loss. In the heat map, the major variations in the expression levels of PHF15, KRT35, HBB, LMCD1, and WIF1 of each dataset are observed (Fig. 4). Pathway-enriched terms indicate a significant association of DEGs in biological pathways related to alopecia (Fig. 5). The pathway enrichment analysis of these genes uncovered important background information about cell growth, energy pathways, metabolism, and signal transduction (Table 2).

Fig. 4
figure 4

Expression profiling (indicates expression level) of potential biomarker candidates in each sample of each dataset

Fig. 5
figure 5

The functional annotation and GO analysis indicate important molecular functions of selected biomarker proteins using FunRich and DAVID annotation tools

Table 2 Pathway enrichment analysis of potential biomarkers using Enrich tool

Identifying physicochemical and functional properties

Proteins possess different functional properties due to their dynamic physical, chemical, structural and amphiphilic nature. To explore the relationship of these properties, we investigated the physicochemical properties of selected potential biomarkers to facilitate the validation of studies using ProtParam, ProtScale, and Peptide Mass tools (Table 3). The total number of amino acid (aa) residues of biomarker proteins including WIF1 (379 aa), SPON1 (807 aa), PTPRE (700 aa) and LYZ (148 aa) showed a significant charge density while the hydrophobicity range at the C and N-terminus enabled an insight of the Îs-subunits of these proteins. The theoretical-pI indicated the solubility of these proteins at a given pH while the instability index showed that HBB and LYZ are more stable in vitro as compared to other proteins.

Table 3 Physicochemical properties of selected potential biomarkers using ProtParam, ProtScale, and PeptideMass tools

For functional proteomic analysis, we summarized and visualized the macromolecular interfaces of these proteins. The known amino acid residues of these proteins in the interface contact matrix are used to describe the molecular surface at the interface. The sequence motif indicated the pattern of biological significance involved in the shape of proteins and binding sites (Additional file 6: Fig. S3). We analyzed the motif sites of these proteins and identified the gapped approximate repeats and composition of residues for structural significance.

Interactomic analysis

To evaluate the topology and functional annotation of proteins, a protein–protein interaction (PPI) network was constructed. The network involved 111 nodes and 104 edges (nodes represent proteins and edges denote interaction), containing high scoring interaction partners (confidence score: > 0.9). The PPI network was largely categorized into three neighborhoods: light yellow and red nodes indicate the secretory and cell-associated potential biomarkers while the remaining blue nodes represent the other target proteins. We found that these candidate biomarkers interact functionally with other important protein targets including APCDD1 [48], WNT1, WNT31 [49], SHH [50], ESRI, TGFB1 [51] and APP [52]. This functional relationship shows the genomic connectivity of these proteins to disease phenotypes (Fig. 6). Of these proteins, SPON1 (spondin1, extracellular matrix protein) is principally interacting with SHH (sonic hedgehog protein facilitates androgen metabolic processes) and WIF1 (WNT inhibitory factor1) while WNT3A (Wnt-3a protein involved in WNT signaling pathway and hair development) is also interacting with WIF1. Similarly, PHF15 (PHD finger protein-15) is associated with the family of PHF11 and PHF12 contributing to clinical phenotype. We assessed the topological properties of the network using Network Analyzer to classify and optimize the network performance. After disease-gene mapping of these network genes using CTD, we observed more than 90 target genes having a functional connection with candidate biomarkers (source/seeder genes) in alopecia. Among them, RAC1, LPAR1, MAPK1, ITM2B, LTF are the enriched terms (Additional file 7: Fig. S4).

Fig. 6
figure 6

Protein–Protein Interaction Network. Interaction of seeder/source nodes (light yellow and red) with target nodes (light blue). The topological properties of the network were analyzed by Network Analyzer

Studying pathway models and associated mechanisms

We studied the role of potential biomarkers in associated pathways to explore the molecular mechanisms of these molecules in alopecia and hair loss. These proteins are critically involved in the regulation of integrated pathways including protein metabolism, signal transduction, Wnt, BMP, Eda, Notch, and Shh pathways. The genetic ontology of these pathways is related to hair follicle differentiation, hair follicle morphogenesis and hair shaft pigmentation (Fig. 7a). Therefore, hair growth and shape depend on these pathways which interact with epithelial and mesenchymal cells. Integration of Wnt with the Eda pathway upregulates the expression of Shh ligand. The cascade of reaction is unified with BMP signaling and regulation of epithelial Notch expression [49]. The levels of β-catenin and Nf-kappa B [53], cytokines [35], retinoic acid [2] and the interaction cycle catalyzed by WIF1, SPON1, SHH, WNT3A, GPCR5, and LYZ are critical for modulating the signaling of Wnt, Shh, Notch and allied pathways. Regular expression patterns of these genes would affect the synthesis of these bio-molecules and dysregulation of these pathways lead to certain abnormalities of hair growth. In Fig. 7b, a circos graph represents the expression levels and interconnection of differentially expressed genes in alopecia.

Fig. 7
figure 7

Reactomic analysis and molecular mechanisms in alopecia. a The role of integrated pathways in hair follicle differentiation, hair follicle shape, hair shaft pigmentation, and hair development. The pathways have been mapped using KEGG, Reactome and Wiki Pathways. Color codes are used to describe reaction steps of pathway model. b Circos Graph generated by Gene Terrain indicates the expression levels and the relationship of differentially expressed genes of each pathway

Statistical analysis

We cross-validated our analysis by statistical algorithms. The significance of biomarker proteins in alopecia and hair loss between two phenotypes was tested based on logFC (> 1), Ave Expr (AEL ≥ 40%), p-value (< 0.05) and t-test statistics (Prob > |t|: 0.92944) (Fig. 8). We determined the false discovery rate (FDR) as a criterion for controlling false positives.

Fig. 8
figure 8

Statistical validation of potential biomarkers based on logFC, AveExpr, p-value and t-test statistics (classifier score)


In recent years, the advances in technological developments have opened up a new era in systems biology, structural, functional and clinical proteomics. Specifically, functional proteomics helps us to understand the fundamental mechanisms of various pathophysiological states and reveals new opportunities for biomarker discovery. The current study critically explores the variations of gene expression levels in alopecia and hair loss. The meta-analysis provides a list of differentially expressed genes of scalp epidermis, hair follicle, dermal papilla and correlates with metabolic pathways. The expression profiling of these genes is indicative of the obvious differences among different tissue samples of clinical phenotypes.

We used various bioinformatics software, tools, and databases to systematically investigate expression data in order to discover specific protein biomarkers. The computational framework employed provides a logical step of the meta-analysis of cDNA microarray datasets associated with the exploration of DEGs. The steps implemented in this study are consistent and complement those technological pipelines described earlier [54, 55].

We filtered 12 common genes as potential biomarkers, namely WIF1, SPON1, LYZ, GPRC5B, PTPRE, ZFP36L2, HBB, PHF15, LMCD1, KRT35, HBB, and VAV3 (p < 0.05), from the list of 100 DEGs based on physicochemical and functional analysis. The dysregulation and functional role of these differentially expressed genes have also been studied in hair physiology [35, 49, 53]. In the protein–protein interaction network, we identified that these possible biomarkers are functionally associated with other interacting protein targets including APCDD1, WNT1, WNT31 [49], SHH [50], ESRI, TGFB1 [51] and APP [52]. The APCDD1 gene inhibits the WNT signaling pathway (WIF1), ultimately interfering with BMP, Eda, and Notch pathways, and this aberration causes alopecia and hair loss [48]. We observed more than 90 target proteins having a functional connection with candidate source genes in alopecia.

These candidate biomarkers are involved in the regulation of biological pathways including protein metabolism, signal transduction, Wnt, BMP, Eda, Notch, and Shh pathways [35, 53, 56]. The gene ontology (GO) is related to hair follicle differentiation, hair follicle morphogenesis and hair shaft pigmentation. Therefore, hair growth and shape depend on these important pathways that interact with epithelial and mesenchymal cells. WNT starts signaling through the signal mediator EDA/EDAR/NF-κB, i.e. nuclear factor kappa-light-chain-enhancer of activated B-cells, by upregulating Shh ligand expression. The pathway cycle is integrated with BMP signaling and upregulation of epithelial Notch expression [49]. Therefore, abnormalities of these pathways lead to irregularities of hair growth. The levels of β-catenin and Nf-kappa B are critical to modulate the signaling of WNT, SHH, and allied pathways [53]. Therefore, mutations in WIF1 would affect this level and hair growth. The cytokines [35] and retinoic acid are known to change skin and hair growth [2]. It has been observed that SPON1, GPC5B, and LYZ metabolize the proteins to synthesize retinoic acid and cytokines. The irregular expression pattern of these genes would change the normal level of these biomolecules affecting hair follicle function. The pathway modeling and integrated network-based analysis showed that these potential biomarkers are important for hair follicle development and variations in β-catenin, Nf-kappa B, cytokines and retinoic acid can be considered as diagnostic indicators of alopecia and hair loss. The protein–protein interaction network provides testable assumptions, albeit validation can only be justified through experimental studies.

The differential expression of WIF1 [57], PTPRE, HBB, and LMCD1 is associated with hair loss, morphogenesis, and alopecia. SPON1 and GPRC5B catalyze post-translational modifications and are integrated with WNT, Eda, Notch, Shh, and RAIG1 pathways. SPON1 is over-expressed in growing hair follicles [58, 59]. GPRC5 is associated with the differentiation of cells involved in the production of hard keratin and cortical cells of the hair shaft [59] and its dysregulation cause hair loss. LYZ-Ndufs4 showed a similar phenotype and disease conditions with significant attenuation of the alopecia [60]. Protein tyrosine phosphatase (PTPRE) and PHD finger protein 15 (PHF15) are differentially expressed in bald individuals responsible for androgenetic alopecia [61]. In animals, it has been reported that the down-regulation of ZFP36L2 gene that synthesizes a proline-rich zinc finger protein causes patchy alopecia [62]. The differential expression of LIM and cysteine-rich domains 1 (LMCD1) causes the aberration of Notch and Hedgehog signaling pathways leading to alopecia and hair-loss [63]. Under-expression of KRT35 and VAV3 genes level down the keratins and vav-3 guanine nucleotide exchange factor respectively, affecting the signaling pathways and hair growth [64, 65]. This study suggests that differential expression of these genes in both normal and mild state may be used as clinical biomarkers. We propose that the combination of several indicators may be more useful in diagnostic sensitivity and specificity instead of using a single biomarker.


Systems biology studies of cDNA datasets helped us to identify potential DEGs of alopecia and hair loss. The methodological framework used revealed genome to phenome association in alopecia and hair loss. Our prioritization approach has found 12 potential physiological biomarkers of alopecia connected with other vital proteins including SHH and APCDD1. However, further molecular studies and research should be considered to authenticate the role of these genes in alopecia for effective treatment.


  1. Safavi KH, Muller SA, Suman VJ, Moshell AN, Melton LJ III. Incidence of alopecia areata in Olmsted County, Minnesota, 1975 through 1989. Mayo Clin Proceed. 1995;70:628–33.

    Article  CAS  Google Scholar 

  2. Everts HB, King LE Jr, Sundberg JP, Ong DE. Hair cycle-specific immunolocalization of retinoic acid synthesizing enzymes Aldh1a2 and Aldh1a3 indicate complex regulation. J Invest Dermatol. 2004;123:258–63.

    Article  CAS  Google Scholar 

  3. Harries MJ, Sun J, Paus R, King LE Jr. Management of alopecia areata. BMJ. 2010;341:c3671.

    Article  CAS  Google Scholar 

  4. Tosti A, Duque-Estrada B. Treatment strategies for alopecia. Exp Opin Pharmacother. 2009;10:1017–26.

    Article  CAS  Google Scholar 

  5. Jabbari A, Cerise JE, Chen JC, Mackay-Wiggan J, Duvic M, Price V, et al. Molecular signatures define alopecia areata subtypes and transcriptional biomarkers. EBioMedicine. 2016;7:240–7.

    Article  Google Scholar 

  6. Bhatt AN, Mathur R, Farooque A, Verma A, Dwarakanath B. Cancer biomarkers-current perspectives. Indian J Med Res. 2010;132:129–49.

    CAS  PubMed  Google Scholar 

  7. Anderson NL. The roles of multiple proteomic platforms in a pipeline for new diagnostics. Mol Cell Proteom. 2005;4:1441–4.

    Article  CAS  Google Scholar 

  8. Paulovich AG, Whiteaker JR, Hoofnagle AN, Wang P. The interface between biomarker discovery and clinical validation: the tar pit of the protein biomarker pipeline. Proteomics Clin Appl. 2008;2:1386–402.

    Article  CAS  Google Scholar 

  9. Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol. 2006;24:971–83.

    Article  CAS  Google Scholar 

  10. Choi JK, Yu U, Kim S, Yoo OJ. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics. 2003;19:i84–90.

    Article  Google Scholar 

  11. Hong F, Breitling R. A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics. 2008;24:374–82.

    Article  CAS  Google Scholar 

  12. Klee EW, Finlay JA, McDonald C, Attewell JR, Hebrink D, Dyer R, et al. Bioinformatics methods for prioritizing serum biomarker candidates. Clin Chem. 2006;52:2162–4.

    Article  CAS  Google Scholar 

  13. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–5.

    Article  CAS  Google Scholar 

  14. Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–93.

    Article  CAS  Google Scholar 

  15. Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F, Spencer F. A model-based background adjustment for oligonucleotide expression arrays. J Am Stat Assoc. 2004;99:909–17.

    Article  Google Scholar 

  16. Fujita A, Sato JR, de Oliveira Rodrigues L, Ferreira CE, Sogayar MC. Evaluating different methods of microarray data normalization. BMC Bioinform. 2006;7:469.

    Article  Google Scholar 

  17. Obenchain V, Lawrence M, Carey V, Gogarten S, Shannon P, Morgan M. VariantAnnotation: a bioconductor package for exploration and annotation of genetic variants. Bioinformatics. 2014;30:2076–8.

    Article  CAS  Google Scholar 

  18. Affymetrix Microarray Suite User Guide version 5.0. Santa Clara, CA. 2000.

  19. Affymetrix Mircoarray Suite User Guide version 5.0. Santa Clara, CA. 2001.

  20. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001;98:5116–21.

    Article  CAS  Google Scholar 

  21. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57:289–300.

    Google Scholar 

  22. Liu X, Wu J, Zhang D, Bing Z, Tian J, Ni M, et al. Identification of potential key genes associated with the pathogenesis and prognosis of gastric cancer based on integrated bioinformatics analysis. Front Genet. 2018;9:265.

    Article  Google Scholar 

  23. Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8:785–6.

    Article  CAS  Google Scholar 

  24. Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, et al. Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics. 2004;20:547–56.

    Article  CAS  Google Scholar 

  25. Stastna M, Van Eyk JE. Secreted proteins as a fundamental source for biomarker discovery. Proteomics. 2012;12:722–35.

    Article  CAS  Google Scholar 

  26. Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel. 2004;17:349–56.

    Article  CAS  Google Scholar 

  27. Möller S, Croning MDR, Apweiler R. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics. 2001;17:646–53.

    Article  Google Scholar 

  28. Clematide S, Rinaldi F. Ranking relations between diseases, drugs and genes for a curation task. J Biomed Semantics. 2012;3:S5.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–8.

    Article  CAS  Google Scholar 

  30. Gellert P, Jenniches K, Braun T, Uchida S. C-It: a knowledge database for tissue-enriched genes. Bioinformatics. 2010;26:2328–33.

    Article  CAS  Google Scholar 

  31. Nam D, Kim S-Y. Gene-set approach for expression pattern analysis. Brief Bioinform. 2008;9:189–97.

    Article  Google Scholar 

  32. Muhammad SA, Fatima N, Syed NIH, Wu X, Yang XF, Chen JY. MicroRNA expression profiling of human respiratory epithelium affected by invasive Candida infection. PLoS ONE. 2015;10:e0136454.

    Article  Google Scholar 

  33. Pathan M, Keerthikumar S, Ang CS, Gangoda L, Quek CY, Williamson NA, et al. FunRich: an open access standalone functional enrichment and interaction network analysis tool. Proteomics. 2015;15:2597–601.

    Article  CAS  Google Scholar 

  34. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 2013;14:128.

    Article  Google Scholar 

  35. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372:774–97.

    Article  CAS  Google Scholar 

  36. Pagni M, Ioannidis V, Cerutti L, Zahn-Zabal M, Jongeneel CV, Hau J, et al. MyHits: improvements to an interactive resource for analyzing protein sequences. Nucleic Acids Res. 2007;35:W433–7.

    Article  Google Scholar 

  37. Heger A, Holm L. Rapid automatic detection and alignment of repeats in protein sequences. Proteins. 2000;41:224–37.

    Article  CAS  Google Scholar 

  38. McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res. 2013;41:W597–600.

    Article  Google Scholar 

  39. Li F, Long T, Lu Y, Ouyang Q, Tang C. The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci USA. 2004;101:4781–6.

    Article  CAS  Google Scholar 

  40. Muhammad SA, Ahmed S, Ali A, Huang H, Wu X, Yang XF, et al. Prioritizing drug targets in Clostridium botulinum with a computational systems biology approach. Genomics. 2014;104:24–35.

    Article  CAS  Google Scholar 

  41. Rachlin J, Cohen DD, Cantor C, Kasif S. Biological context networks: a mosaic view of the interactome. Mol Syst Biol. 2006;2:66.

    Article  Google Scholar 

  42. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–8.

    Article  CAS  Google Scholar 

  43. Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics. 2009;10:S16.

    Article  Google Scholar 

  44. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2:2366–82.

    Article  CAS  Google Scholar 

  45. Kutmon M, van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3: an extendable pathway analysis toolbox. PLoS Comput Biol. 2015;11:e1004085.

    Article  Google Scholar 

  46. Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007;99:147–57.

    Article  Google Scholar 

  47. Nagy N, Farkas K, Kinyó Á, Fazekas B, Szabó KÁ, Kollár E, et al. A synonymous polymorphism of APCDD1 affects translation efficacy and is associated with androgenic alopecia. J Life Sci. 2014;8:106–14.

    CAS  Google Scholar 

  48. Rishikaysh P, Dev K, Diaz D, Qureshi WMS, Filip S, Mokry J. Signaling involved in hair follicle morphogenesis and development. Int J Mol Sci. 2014;15:1647–70.

    Article  CAS  Google Scholar 

  49. Hammerschmidt B, Schlake T. Localization of Shh expression by Wnt and Eda affects axial polarity and shape of hairs. Dev Biol. 2007;305:246–61.

    Article  CAS  Google Scholar 

  50. Foitzik K, Spexard T, Nakamura M, Halsner U, Paus R. Towards dissecting the pathogenesis of retinoid-induced hair loss: all-trans retinoic acid induces premature hair follicle regression (catagen) by upregulation of transforming growth factor-β2 in the dermal papilla. J Invest Dermatol. 2005;124:1119–26.

    Article  CAS  Google Scholar 

  51. Kwak Y-D, Marutle A, Dantuma E, Merchant S, Bushnev S, Sugaya K. Involvement of notch signaling pathway in amyloid precursor protein induced glial differentiation. Eur J Pharmacol. 2011;650:18–27.

    Article  CAS  Google Scholar 

  52. Zhang Y, Tomann P, Andl T, Gallant NM, Huelsken J, Jerchow B, et al. Reciprocal requirements for EDA/EDAR/NF-kB and Wnt/b-catenin signaling pathways in hair follicle induction. Dev Cell. 2009;17:49–61.

    Article  Google Scholar 

  53. Gregoriou S, Papafragkaki D, Kontochristopoulos G, Rallis E, Kalogeromitros D, Rigopoulos D. Cytokines and other mediators in alopecia areata. Mediat Inflamm. 2010;2010:928030.

    Article  Google Scholar 

  54. Awan FM, Naz A, Obaid A, Ali A, Ahmad J, Anjum S, et al. Identification of circulating biomarker candidates for hepatocellular carcinoma (HCC): an integrated prioritization approach. PLoS ONE. 2015;10:e0138913.

    Article  Google Scholar 

  55. Addona TA, Shi X, Keshishian H, Mani DR, Burgess M, Gillette MA, et al. A pipeline that integrates the discovery and verification of plasma protein biomarkers reveals candidate markers for cardiovascular disease. Nat Biotechnol. 2011;29:635–43.

    Article  CAS  Google Scholar 

  56. Li J, Tzu J, Chen Y, Zhang YP, Nguyen NT, Gao J, et al. Laminin-10 is crucial for hair morphogenesis. EMBO J. 2003;22:2400–10.

    Article  CAS  Google Scholar 

  57. Kim YJ, Yoon B, Han K, Park BC. Comprehensive transcriptome profiling of balding and non-balding scalps in trichorhinophalangeal syndrome type i patient. Ann Dermatol. 2017;29:597–601.

    Article  Google Scholar 

  58. Donati G, Proserpio V, Lichtenberger BM, Natsuga K, Sinclair R, Fujiwara H, et al. Epidermal Wnt/β-catenin signaling regulates adipocyte differentiation via secretion of adipogenic factors. Proc Natl Acad Sci USA. 2014;111:E1501–9.

    Article  CAS  Google Scholar 

  59. Inoue S, Nambu T, Shimomura T. The RAIG family member, GPRC5D, is associated with hard-keratinized structures. J Invest Dermatol. 2004;122:565–73.

    Article  CAS  Google Scholar 

  60. Jin Z, Wei W, Yang M, Du Y, Wan Y. Mitochondrial complex I activity suppresses inflammation and enhances bone resorption by tipping the balance of macrophage-osteoclast polarization. Cell Metab. 2014;20:483–98.

    Article  CAS  Google Scholar 

  61. Chew EGY, Tan JHJ, Bahta AW, Ho BS, Liu X, Lim TC, et al. Differential expression between human dermal papilla cells from balding and non-balding scalps reveals new candidate genes for androgenetic alopecia. J Invest Dermatol. 2016;136:1559–67.

    Article  CAS  Google Scholar 

  62. Sanduja S, Blanco FF, Young LE, Kaza V, Dixon DA. The role of tristetraprolin in cancer and inflammation. Front Biosci (Landmark Ed). 2012;17:174–88.

    Article  CAS  Google Scholar 

  63. Shi FT, Yu M, Zloty D, Bell RH, Wang E, Akhoundsadegh N, et al. Notch signaling is significantly suppressed in basal cell carcinomas and activation induces basal cell carcinoma cell apoptosis. Mol Med Rep. 2017;15:1441–54.

    Article  CAS  Google Scholar 

  64. Pratt CH, King LE Jr, Messenger AG, Christiano AM, Sundberg JP. Alopecia areata. Nat Rev Dis Primers. 2017;3:17011.

    Article  Google Scholar 

  65. Eriksson N, Tung JY, Kiefer AK, Hinds DA, Francke U, Mountain JL, et al. Novel associations for hypothyroidism include known autoimmune risk Loci. PLoS ONE. 2012;7:e34442.

    Article  CAS  Google Scholar 

  66. Kim S, Dix DJ, Thompson KE, Murrell RN, Schmid JE, Gallagher JE, et al. Gene expression in head hair follicles plucked from men and women. Ann Clin Lab Sci. 2006;36:115–26.

    CAS  PubMed  Google Scholar 

  67. Garza LA, Yang CC, Zhao T, Blatt HB, et al. Bald scalp in men with androgenetic alopecia retains hair follicle stem cells but lacks CD200-rich and CD34-positive hair follicle progenitor cells. J Clin Invest. 2011;121:613–22.

    Article  CAS  Google Scholar 

  68. Garza LA, Liu Y, Yang Z, Alagesan B, et al. Prostaglandin D2 inhibits hair growth and is elevated in bald scalp of men with androgenetic alopecia. Sci Transl Med. 2012;4:126ra34.

    Article  Google Scholar 

  69. Higgins CA, Chen JC, Cerise JE, Jahoda CA, et al. Microenvironmental reprogramming by three dimensional culture enables dermal papilla cells to induce de novo human hair-follicle growth. Proc Natl Acad Sci USA. 2013;110:19679–88.

    Article  CAS  Google Scholar 

  70. Xing L, Dai Z, Jabbari A, Cerise JE, et al. Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition. Nat Med. 2014;20:1043–9.

    Article  CAS  Google Scholar 

  71. Gazel A, Blumenberg M. Transcriptional effects of inhibiting epidermal growth factor receptor in keratinocytes. Dermatologica Sinica. 2012.

    Article  Google Scholar 

  72. Ohyama M, Kobayashi T, Sasaki T, Shimizu A, et al. Restoration of the intrinsic properties of human dermal papilla in vitro. J Cell Sci. 2012;1:4114–25.

    Article  Google Scholar 

Download references

Authors’ contributions

SM and JC supervised the study. HS, FN, and PR collected the data. SM, JC, and AA analyzed the data. JC and AA provided guidance with study design and data analysis. All authors contributed to manuscript writing and edition. All authors read and approved the final manuscript.


Not applicable.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The data have been presented with the article.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.


Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations


Corresponding authors

Correspondence to Syed Aun Muhammad or Jake Y. Chen.

Additional files

Additional file 1: Fig. S1.

A framework of our study designed to identify physiological biomarkers in alopecia.

Additional file 2: Table S1.

The function summaryAffyRNAdeg of Bioconductor package produced a single summary-statistic for each array in the batch dataset.

Additional file 3: Table S2.

List of Databases, Software, and Tools used in this study.

Additional file 4: Table S3.

Preliminary investigation of common and ranked differentially expressed genes of each microarray dataset.

Additional file 5: Fig. S2.

Prediction of transmembrane helices in selected potential biomarker proteins using TMHMM Server v. 2.0.

Additional file 6: Fig. S3.

Structural and functional properties of potential biomarker candidates. Studies of macromolecular interfaces using PDBePISA interactive tool. Motif scan in protein sequences were studied using Motif tool. Identified gapped approximate repeats and complex repeat architectures using RADAR (Rapid Automatic Detection and Alignment of Repeats) tool.

Additional file 7: Fig. S4.

Data mapping: The role of differentially expressed genes in alopecia was mapped using the Comparative Toxicogenomics Database (CTD).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Muhammad, S.A., Fatima, N., Paracha, R.Z. et al. A systematic simulation-based meta-analytical framework for prediction of physiological biomarkers in alopecia. J of Biol Res-Thessaloniki 26, 2 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: