Figure 1 | Journal of Biological Research-Thessaloniki


From: Reproducibility and reliability assays of the gene expression-measurements


Correlation-based reproducibility and reliability assays on real gene expression data. A publicly available rice RNA-Seq dataset comprising two root-sample replicates and two shoot-sample replicates is used to illustrate the power of the correlation coefficient in assaying the reproducibility of read counts as gene expression data. Briefly, reads were mapped to the rice genome, and the mapped total gene, unigene, and total exon read counts were used (A1, A2 and B1, B2). Although both reference gene-based correction (A) and read count-based correction (B) reduced the bias substantially, from ~65% in A1 and B1 to ~12% in A2 and B2, neither increased the correlation coefficient. (C) As the level of noise increases, the correlation becomes weaker. The slopes are nearly the same and close to 1; the lower correlation in C results from its higher level of noise compared with A2 and B2. Given the level of noise in the data, the correlation and the slope of the inter-replicate regression line, while helpful, may not be precise enough for a reproducibility assay. After data correction (A2 and B2), the correlation did not change but the slope improved, i.e. its deviation from 1 decreased. However, 12% of the genes still show an inter-replicate variation in expression of 50% or more, which cannot be detected even when both the correlation and the slope are taken into account. (A1α, B1α) Logarithmic transformation of the data can change the apparent difference between sample replicates, as reflected by the altered slope deviation on the scatter plots. (A2α, B2α) Narrow intervals on the scatter plots make the noise in the dataset visible; the scatter plots show a fairly clear variation of about ±0.5×. 350, 354: untreated rice root sample replicates; 349, 353: untreated rice shoot sample replicates. We evaluated the expression of 24,122 to 24,701 genes, depending on the read type and sample.
The log10-transformed data were used to calculate the Pearson correlation. All correlations were significant (two-tailed test, P < 0.001).
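A minimal sketch, in Python with only the standard library, of why the correlation coefficient and the regression slope can miss poorly reproduced genes. It uses synthetic data, not the rice dataset. Several pieces are assumptions rather than the authors' method: rescaling one replicate to the same total count stands in for the read count-based correction; "50% or greater inter-replicate variation" is taken to mean that the higher replicate count is at least 1.5× the lower one; and the two-tailed P uses a normal approximation to the t distribution, which is adequate only for large gene counts. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_tailed_p(r, n):
    """Approximate two-tailed P for H0: rho = 0.

    t = r * sqrt((n - 2) / (1 - r^2)) is referred to a standard normal
    instead of a t distribution with n - 2 df (assumption: large n).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * NormalDist().cdf(-abs(t))


def fraction_high_variation(rep1, rep2, threshold=0.5):
    """Fraction of genes whose higher count is >= (1 + threshold) x the lower.

    Hypothetical definition of inter-replicate variation; counts must be
    positive (zero-count genes are assumed to have been filtered out).
    """
    flagged = sum(
        1 for a, b in zip(rep1, rep2) if max(a, b) >= (1 + threshold) * min(a, b)
    )
    return flagged / len(rep1)


# Synthetic data: log-uniform expression levels (1 to 10^4), multiplicative
# replicate noise, and a 2x library-size bias in replicate 2.
rng = random.Random(1)
n_genes = 5000
levels = [10 ** rng.uniform(0, 4) for _ in range(n_genes)]
rep1 = [v * math.exp(rng.gauss(0, 0.2)) for v in levels]
rep2 = [2.0 * v * math.exp(rng.gauss(0, 0.2)) for v in levels]

# Hypothetical correction: rescale replicate 2 to replicate 1's total count.
scale = sum(rep1) / sum(rep2)
rep2_corr = [v * scale for v in rep2]

# Raw-scale correlation and slope before and after the correction.
r_before, r_after = pearson_r(rep1, rep2), pearson_r(rep1, rep2_corr)
slope_before, slope_after = ols_slope(rep1, rep2), ols_slope(rep1, rep2_corr)
frac_before = fraction_high_variation(rep1, rep2)
frac_after = fraction_high_variation(rep1, rep2_corr)

# Pearson correlation on log10-transformed counts, as in the figure.
log1 = [math.log10(v) for v in rep1]
log2 = [math.log10(v) for v in rep2_corr]
r_log = pearson_r(log1, log2)
p_log = two_tailed_p(r_log, n_genes)

print(f"raw r:           {r_before:.4f} -> {r_after:.4f}")
print(f"raw slope:       {slope_before:.3f} -> {slope_after:.3f}")
print(f">=50% variation: {frac_before:.1%} -> {frac_after:.1%}")
print(f"log10 r = {r_log:.4f}, approximate two-tailed P = {p_log:.3g}")
```

Multiplying one replicate by a constant leaves Pearson's r unchanged but moves the raw-scale slope toward 1. It also lowers the fraction of genes with at least 50% variation without bringing it to zero, and neither r nor the slope reveals that residual fraction. On the log10 scale the same rescaling only shifts the intercept, so the log-scale r and slope are unaffected.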
