A potential prognostic model based on miRNA expression profile in The Cancer Genome Atlas for bladder cancer patients

This study aimed to construct prognostic model by screening prognostic miRNA signature of bladder cancer. The miRNA expression profile data of bladder cancer (BC) in The Cancer Genome Atlas (TCGA) were obtained and randomly divided into the training set and the validation set. Differentially expressed miRNAs (DEMs) between BC and normal control samples in the training set were firstly identified, and DEMs related to prognosis were screened by Cox Regression analysis. Then, the MiR Score system was constructed using X-Tile based cutoff points and verified in the validation set. The prognostic clinical factors are selected out by univariate and multivariate Cox Regression analysis. Finally, the mRNAs related to prognosis were screened and the biological pathway analysis was carried out. We identified the 7-miRNA signature was significantly associated with the patient’s Overall Survival (OS). A prognostic model was constructed based on the prognostic 7-miRNA signature, and possessed a relative satisfying predicted ability both in the training set and validation set. In addition, univariate and multivariate Cox Regression analysis showed that age, lymphovascular invasion and MiR Score were considered as independent prognostic factors in BC patients. Furthermore, based on MiR Score prognostic model, several differentially expressed genes (DEGs), such as WISP3 and UNC5C, as well as their related biological pathway(s), including cell–cell adhesion and neuroactive ligand-receptor interaction, were considered to be related to BC prognosis. The prognostic model which was constructed based on the prognostic 7-miRNA signature presented a high predictive ability for BC.

surgery and a high rate of recurrence and mortality in the short time after surgery [5]. As far as patients with BC are concerned, it is impossible to predict which of them will have disease progression. Therefore, it is important to further reveal novel diagnostic and therapeutic methods, as well as underlying risk factors for poor prognosis of BC patients.
MiRNAs (miRNAs) are endogenous non-coding singlestranded small RNA molecules that regulate gene expression by repressing translational efficiency or decreasing target mRNA stability, thereby participating in various key cell biological processes, such as embryonic development, tumor cell proliferation, differentiation, and apoptosis [6,7]. MiRNAs are known to be dysregulated in BC and implicated in the pathogenesis of bladder tumors mainly through their effect on genes involved in two molecular pathways, specifically the gene which codes tumor protein 53 (TP53) [8] and fibroblast growth factor receptor 3 (FGFR3) [9]. Previous studies have reported that the dysregulation of miRNAs is related to the prognosis and the progression of BC [1], such as miR-144-5p [10], miR-199 family [11], and miR-214 [12]. For example, Falzone et al. [13] reported that downregulated hsa-miR-145-5p and hsa-miR-214-3p may modulate the expression of both EMT and NGAL/MMP-9 pathways in BC using bioinformatic analysis. Accordingly, miR-NAs may be considered good candidates as biomarkers for both prognosis and diagnosis of BC. Hence, miRNAs profiling studies from different tissues represent an excellent alternative application for these short sequences as biomarkers with clinical significance. It is widely known that poor prognosis as a major challenge for the treatment of BC leads to a low survival rate of BC patients, and gene mutation and environment exposure have been identified to be associated with this outcome. To our knowledge, however, a prognostic model of BC is rarely reported. In the current study, we aimed to develop a prognostic model for BC that provides some useful insights in improving the prognosis of BC patients and helps to increase their overall survival. For this purpose, the miRNA expression profile data of BC based on The Cancer Genome Atlas (TCGA) were analyzed to screen miRNAs related to BC prognosis and then construct a BC prognostic model using bioinformatic methods. In addition, the related clinical prognostic factors, messenger RNAs (mRNAs) and biological pathways were analyzed based on this model.

Data extraction and grouping
The clinical information, the miRNA expression profile and the mRNA data based on the Illumina HiSeq2000 RNA Sequencing (Illumina, San Diego, CA, USA) platform of a total of 432 samples of BC patients were downloaded from TCGA (https ://gdc-porta l.nci.nih.gov/) [14]. After corresponding to clinical information of them, a total of 428 samples with corresponding information were included in our study, of which 409 were tumor samples (BC group) and 19 were normal control samples (control group). Then, BC group were randomly divided into two groups: 204 tumor samples utilized as the training set and 205 tumor samples utilized as the validation set. The clinical information of tumor samples in the training set and validation set are shown in Table 1.

DEMs screening in the training set
First, miRNAs with median value as 0 in the training set were removed, which means the read counts obtained through the Illumina Hiseq. Next, based on the expression information provided by TCGA, limma package  [15] in R 3.4.1 was used to screen DEMs between BC and normal control samples with the thresholds of false discovery rate (FDR) < 0.05 and |log fold change (FC)| > 1. At last, according to the expression values of DEMs in the training set, bidirectional hierarchical clustering based on centered Pearson correlation algorithm was performed by pheatmap (version 1.0.8, https ://cran.r-proje ct.org/web/packa ges/pheat map/index .html) [16] in R 3.4.1.

DEMs screening related to prognosis
Combined with the clinical prognostic information of BC samples in the training set and the expression levels of DEMs, DEMs related to the OS were screened by Univariate Cox Regression analysis survival package (version 2.41.3, https ://cran.r-proje ct.org/web/packa ges/survi val/ index .html) [17] in R 3.4.1, with the threshold of log-rank p-value < 0.05.

Construction and verification of prognosis risk assessment model based on DEMs levels
Based on DEMs related to prognosis, the optimized DEMs signature was screened using LASSO Cox Regression model [18] of penalized package (version 0.9.50, https ://cran.r-proje ct.org/web/packa ges/penal ized/index .html) [19] in R 3.4.1. The optimized parameter lambda in this model was obtained by 1000 cycles calculation of cross-validation likelihood (cvl) algorithm. Subsequently, the cutoff value of the optimized DEM signature was calculated using X-Tile Bio-Informatics Tool (https ://medic ine.yale.edu/lab/rimm/resea rch/softw are.aspx) [20] with the threshold of Monte-Carlo p-value < 0.05. The sample status was defined according to the cutoff value of each miRNA: status = 1 when the expression level of miRNA > the cutoff value; while status = 0 when the expression level of miRNA < the cutoff value [21]. After that, the risk assessment model (MiR score) was constructed for each sample by the linear combination of miRNA status weighted by regression coefficient as follows: MiR Score = ∑β miRNA n × Status miRNA n . The β represented prognostic regression coefficient and status was defined as previously mentioned. We found that technical biases of miRNA data from the TCGA are not affecting the differential expression due to the presence of sequencing procedures or batches. According to the median value of MiR Score, all samples in the training set were divided into high risk and low risk groups. The Kaplan-Meier (K-M) survival curve analysis was used to estimate the prognosis difference between high risk and low risk groups. Meanwhile, Area Under Receiver Operating Characteristic (AUROC) curve analysis was used to assess the prognostic model. Similar to the above, all samples in the validation set were divided into high risk and low risk groups, and this model was further verified in the validation set and assessed using K-M survival analysis and Receiver Operating Characteristic (ROC) curve.

Further analysis of the prognostic factors
The independent prognostic factors based on the clinical information of tumor samples in the training set and validation set were analyzed using univariate and multivariate Cox Regression analysis survival package (version 2.41.1) in R 3.4.1. Based on these independent prognostic factors, the nomogram of 3-and 5-year survival prediction models were constructed using rms package (version 5.1-2, https ://cran.r-proje ct.org/web/packa ges/rms/ index .html) in R 3.4.1.

The screening of mRNA related to prognosis and pathway analysis
The RNA-seq profile data of BC which paired with miRNA expression profile data were extracted, and then divided into high risk and low risk groups according to MiR Score. Next, differentially expressed genes (DEGs) between high risk and low risk groups were screened using limma package with the thresholds of FDR < 0.05 and log FC > 1. Following this, Gene Ontology (GO) functional annotation associated with biological process analysis [22] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [23] were carried out using Database for Annotation, Visualization and Integrated Discovery (DAVID) program (v 6.8, https :// david .ncifc rf.gov/) [24]. The p value < 0.05 was considered as the cutoffs for significantly statistical difference in functional analyses.

DEMs screening in the training set
Among all samples contained in the training set, miRNAs with median value of read counts as 0, which indicate not expressed miRNAs, were filtered (Fig. 1a). Based on the selective criteria, a total of 134 DEMs were identified between BC and normal control samples, including 18 downregulated and 116 upregulated miRNAs (Fig. 1b). Then, bidirectional hierachical clustering was conducted for these 134 DEMs, indicating that these identified DEMs could significantly distinct tumor samples from the normal controls (Fig. 1c).
The distribution of the MiR Score in the training set and validation set are shown in Fig. 3a left and b left, respectively. ROC curve analysis revealed that the area under curve (AUC) of 3-and 5-year survival were 0.781 and 0.778 in the training set, as well as 0.781 and 0.762 in the validation set, respectively, indicating that this model possessed a relative satisfying predicted ability both in the training set and validation set (Fig. 3a, b middle). Meanwhile, the estimation of K-M survival analysis showed that the OS of patients in the low-risk group was significantly longer than that in the high-risk group (p = 1.905 × 10 −10 , and p = 8.821 × 10 −3 , Fig. 3a, b right) in the training set and validation set, respectively.  1.0702-5.176, p = 3.328 × 10 −2 ) were considered as the independent prognostic factors in BC patients both in the training set and validation set, respectively (Table 3). Furthermore, the nomogram of 3-and 5-year survival prediction models of these independent prognostic factors were constructed as Fig. 4a. The nomogram of 5-year survival prediction showed compliance to actual 5-year survival (Fig. 4b).

Functional enrichment analysis of DEGs related to prognosis
Based on the selective criteria, a total of 389 DEGs were identified between high risk and low risk groups, including 33 downregulated and 356 upregulated genes, in which several DEGs, such as WISP3 and UNC5C were related to biological pathway (Fig. 5a). WISP3 and UNC5C are the significant DEGs between high risk and low risk group. Then, clustering analysis for these DEGs was conducted, indicating that these identified DEGs could significantly distinct high risk from low risk groups (Fig. 5b). To further identify the functional characteristics of DEGs, the functional enrichment analyses of genes were conducted with DAVID. Consequently, the Biology Process (BP) analysis of DEGs revealed that the significant enriched terms primarily concentrated on ion transport, cell-cell adhesion, neurological system process, metal ion transport, cell-cell signaling, cation transport, transmission of nerve impulse, neuron differentiation, synaptic transmission, muscle contraction, and cell morphogenesis involved in neuron differentiation (Fig. 5c). In addition, the KEGG pathway analysis implied that these DEGs were responsible for neuroactive ligand-receptor interaction, calcium signaling pathway, cell adhesion molecules, and gap junction pathways (Fig. 5c, Table 4).

Discussion
In this study, 21 miRNAs related to BC prognosis were identified based on the expression levels in TCGA by univariate and multivariate Cox Regression analysis. Then, seven out of them were further isolated as the optimized prognostic gene signature and a MiR Score prognostic model was constructed based on these seven miRNAs (hsa-mir-1247, hsa-mir-1304, hsa-mir-1911, hsa-mir-204, hsa-mir-33b, hsa-mir-3934 and hsa-mir-526b), which presented a relative highly forecast ability for BC. In addition, age, lymphovascular invasion and MiR Score were identified as the independent prognostic factors in BC patients from TCGA. Furthermore, based on MiR Score prognostic model, several DEGs, such as WISP3 and UNC5C, as well as their related pathway, including cell-cell adhesion and neuroactive ligand-receptor interaction, were considered to be related to BC prognosis. Besides, we found UNC5C is a potential target for hsamir-1911, hsa-mir-3934 and hsa-mir-526b through Targetscan, which belongs to the UNC-5 family of netrin receptors. Netrins are secreted proteins that direct axon extension and cell migration during neural development a b Fig. 2 a The lambda parametric curves by cross-validation likelihood (cvl) algorithm. Horizontal axis and vertical axis represent lambda and cvl, respectively; the intersection of green dotted line represent that the maximum value of cvl was -509.633 when lambda was 11.567. b Coefficient distribution diagram of the optimized seven DEMs related to prognosis based on LASSO Cox Regression model (NCBI, https ://www.ncbi.nlm.nih.gov/). At present, studies have indicated that UNC5C was associated with colorectal cancer [25,26] and Alzheimer's Disease [27]. Thus, further studies are needed to identify the association between UNC5C and hsa-mir-1911, hsa-mir-3934 and hsa-mir-526b in BC patients. The mining of a large amount of genetic data in various diseases have been enhanced due to the rapid technological advances in high-throughput sequencing and bioinformatics [28]. TCGA, as a public and available cancer genomic database, provides comprehensive data for different types of cancer, including mRNA expression data, miRNA expression data, copy number variation, DNA methylation, and clinical information [29]. The data from TCGA have been effectively applied to improve diagnostic and therapeutic methods of cancers, as well as finally cancer prevention [29]. Thus, this study was also performed based on the miRNA expression profile data and clinical information of BC form TCGA. MiRNA expression profiles have been reported to predict the prognosis outcome of cancers [30]. Computationally, univariate and multivariate Cox Regression were the most common method to construct the prognostic models and screen prognostic factors [31]. In this study, the Cox Regression model based on the LASSO, a semi-parametric proportional hazards model, was applied. The availability of this model in survival analysis have been confirmed in recent studies [32,33]. Similarly, in this study, the MiR Score prognostic model constructed by LASSO Cox Regression model showed a higher predictive ability both in training and validation sets. In addition, this study showed that age and lymphovascular invasion were independent prognostic factors in BC patients. Consistent with our results, previous studies have also demonstrated that age and lymphovascular invasion are associated with poor prognosis in BC patients [34], Notably, MiR Score was also been considered as an independent prognostic factor in BC patients, which further showed that the MiR Score prognostic model had a significant predictive ability for BC prognosis.
In this study, the 7-miRNA signature was identified. Specifically, miR-1247, as a tumor suppressor, has been reported in several cancers, including lung cancer [35], hepatocellular cancer [36], and pancreatic cancer [37]. A recent study also has shown that miR-1247 inhibits cell proliferation and invasion through down-regulating its target gene RAB36 in BC [38]. MiR-204, miR-33b, and miR-526b are three reported miRNAs that function as tumor suppressor in cancers. Previous studies have suggested that miR-204 plays an inhibitory effect on cell invasion in gastric cancer cells [39] and non-small cell lung cancer [40]. MiR-33b has been reported to inhibit migration and invasion in osteosarcoma cells [41], melanoma cells [42], and lung adenocarcinoma cells [43]. In addition, miR-526b is revealed to have an inhibitory role in non-small cell lung cancer [44]. However, studies of these three miRNAs as well as miR-1304, miR-1911, and miR-3934 are rarely reported in BC. Therefore, it is important to further reveal the mechanism and prognostic significance of 7-miRNA signature in BC.
Furthermore, this study found that several DEGs, such as WISP3 and UNC5C, were closely associated with BC prognosis. WISP3, also known as cellular communication network factor 6 (CCN6), is a cysteine-rich and glycosylated signaling protein and key regulatory extracellular matrix component [45]. It is one member of the CCN (Cyr61, CTGF, Nov) family which play important roles in several biological functions, including cell proliferation, adhesion, and invasion [45], which suggested the close relationship of WISP3 and cell-cell adhesion. Increasing evidences have demonstrated that WISP3 is abnormally expressed in cancers and play a contrary effect in cancer progression [46]. It is reported that WISP3 is overexpressed as an oncogene in ovarian carcinomas [47]. On the contrary, WISP3 is a tumor suppressor gene that inhibits cell proliferation in breast cancer [48]. Zeng et al. [49] has found that depletion of WISP3 notably inhibited the invasion of BC cells. Our data suggests that inhibition of WISP3 may be a therapeutic strategy for BC. Recent study has found that WISP3 is up-regulated and promotes the cell proliferation and invasion in BC cells, which is consistent with our results. UNC5C (unc-5 netrin receptor C) belongs to the netrin-1 receptor family, and plays key role in cell apoptotic process as a dependence receptor that may be involved in neuroactive ligand-receptor interaction [50]. Previous reports have suggested that UNC5C has tumor suppressive effect in colorectal cancer through promoter methylation [51]. In addition, UNC5C is reported to be downregulated due to specific genetic alterations and inhibits apoptosis of tumor cells by suppressing proapoptotic signals [52]. More recently, the receptors of UNC5 family have been revealed to be involved in the regulation of cell death processes in BC [53,54]. However, there were still a number of limitations in the work. Functional validation was lacked for the feature genes obtained herein. Further investigations for these genes are required with substantial experiments. Nevertheless, this work provides novel insight into the pathogenesis of BC.

Conclusion
In conclusion, the prognostic model based on the prognostic 7-miRNA signature presented a relatively promising predictive ability for BC. The seven prognostic miRNAs may have clinical implications in BC prognosis. However, the prognostic significance of 7-miRNA From left to right represents the changing process of MiRScore value from low to high. c GO and KEGG enrichment analyses of DEGs. Horizontal axis and vertical axis represent gene number and term, respectively; the color of the bar indicates the significant p value, and the closer the color is to orange, the higher the significance