RNA-Seq differential expression workflow in R/Bioconductor
EGFR inhibitor
Experiment RNA was collected at four time points: before the treatment (t=0) and two, six and twenty-four hours after treatment (t=2, t=6 and t=24, respectively). Samples were sequenced in triplicate on an Illumina HiSeq instrument.
Input Data Basic structure of the input data:
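As a rough sketch of what such input might look like, the following builds a toy count matrix (genes in rows, samples in columns) and a matching sample table for the four time points in triplicate. The gene identifiers, sample names and counts are simulated placeholders, not the study's data. The use of DESeq2's `DESeqDataSetFromMatrix` is an assumption about the package behind the later slides, so that step is skipped if DESeq2 is not installed.

```r
# Toy input: simulated counts, NOT the study's data.
set.seed(1)

# Sample table: four time points (hours after treatment), three replicates each.
coldata <- data.frame(
  time      = factor(rep(c(0, 2, 6, 24), each = 3)),
  replicate = rep(1:3, times = 4)
)
rownames(coldata) <- paste0("t", coldata$time, "_rep", coldata$replicate)

# Count matrix: one row per gene, one column per sample, raw integer counts.
n_genes <- 100
counts <- matrix(
  rpois(n_genes * nrow(coldata), lambda = 50),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

# Column names of the counts must line up with the rows of the sample table.
stopifnot(identical(colnames(counts), rownames(coldata)))

# Assumption: DESeq2 is the analysis package; skipped if it is not installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts,
    colData   = coldata,
    design    = ~ time
  )
}
```

Encoding time as a factor treats each time point as its own group; modelling time as a continuous covariate would be a different design choice.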
Input Data
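Inspecting the results table
# Count genes without a p-value
sum( is.na(res$pvalue) )
# Drop those genes
res <- res[ ! is.na(res$pvalue), ]
# Keep genes with an adjusted p-value below 0.01
sig <- res[ which(res$padj < 0.01), ]
# Sort by adjusted p-value, most significant first
sig <- sig[ order(sig$padj), ]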
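To make the filtering steps concrete, here is a minimal sketch on a toy results table. The gene names and p-values are made up, and the `padj` column is computed with a plain Benjamini-Hochberg adjustment from base R; a real results object may compute `padj` differently (for example after additional filtering), so this is an illustration only.

```r
# Toy results table with made-up values, standing in for a real `res` object.
res <- data.frame(
  row.names = paste0("gene", 1:6),
  pvalue    = c(0.00001, NA, 0.04, 0.0002, 0.5, NA)
)

# Benjamini-Hochberg adjustment; NA p-values stay NA.
res$padj <- p.adjust(res$pvalue, method = "BH")

# Same steps as on the slide, applied to the toy table.
n_missing <- sum(is.na(res$pvalue))
res <- res[!is.na(res$pvalue), , drop = FALSE]
sig <- res[which(res$padj < 0.01), , drop = FALSE]
sig <- sig[order(sig$padj), , drop = FALSE]

print(n_missing)
print(sig)
```

Wrapping the comparison in `which()` drops rows where `padj` is `NA` instead of returning all-`NA` rows, which is why it appears in the subsetting step.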
EGFR Statistics
EGFR Expression
MA plot This plot demonstrates that only genes with a large average normalized count contain sufficient information to yield a significant call.
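One way to see why low-count genes rarely reach significance is to simulate genes with no true change and look at how noisy their fold changes are. The sketch below uses Poisson noise only and made-up expression levels, which is a simplification of real RNA-Seq data. With DESeq2, `plotMA(res)` would presumably draw the corresponding plot from a results object; that call is an assumption and is not used here.

```r
# Simulated data: no gene truly changes between the two conditions.
set.seed(42)
n_genes <- 2000

# Assumed expression levels spanning low to high counts.
mu <- 2^runif(n_genes, min = 0, max = 12)

# Three replicates per condition, Poisson noise only (a simplification).
cond_a <- matrix(rpois(n_genes * 3, rep(mu, 3)), ncol = 3)
cond_b <- matrix(rpois(n_genes * 3, rep(mu, 3)), ncol = 3)

mean_a <- rowMeans(cond_a)
mean_b <- rowMeans(cond_b)

A <- (mean_a + mean_b) / 2                  # average count (x axis)
M <- log2((mean_b + 0.5) / (mean_a + 0.5))  # log2 fold change with pseudocount (y axis)

# Genes with an average of zero cannot be placed on a log-scaled x axis.
keep <- A > 0
plot(A[keep], M[keep], log = "x", pch = 20, cex = 0.4,
     xlab = "mean count", ylab = "log2 fold change")
abline(h = 0, col = "red")

# Spread of the fold changes at low versus high average counts.
sd_low  <- sd(M[A < 10])
sd_high <- sd(M[A > 1000])
```

The funnel shape of the resulting plot reflects that fold changes at low counts are dominated by sampling noise, so a given fold change is much weaker evidence there than at high counts.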
Regularized-logarithm transformation Many common statistical methods for exploratory analysis of multidimensional data work best for homoskedastic data. In RNA-Seq data, however, the variance grows with the mean. For genes with high counts, the rlog transformation differs little from an ordinary log2 transformation. For genes with lower counts, the values are shrunk towards each gene's average across all samples.
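The mean-variance relationship can be illustrated with simulated counts. This sketch does not implement the rlog transformation itself (in DESeq2 that would presumably be `rlog(dds)`, an assumption here); it only shows why raw counts and plain log2 counts are both heteroskedastic, using Poisson noise as a simplifying assumption.

```r
# Simulated counts for one low-expressed and one high-expressed gene.
set.seed(7)
n <- 1000
low  <- rpois(n, lambda = 5)
high <- rpois(n, lambda = 1000)

# Raw counts: the spread grows with the mean.
sd_raw <- c(low = sd(low), high = sd(high))

# log2 with a pseudocount: the trend reverses, low counts now have the larger spread.
sd_log <- c(low = sd(log2(low + 1)), high = sd(log2(high + 1)))

print(round(sd_raw, 3))
print(round(sd_log, 3))
```

Because the plain logarithm inflates the noise of low-count genes, shrinking those genes towards their average is what brings the transformed data closer to homoskedastic.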
PCA The differences between time points are much larger than the differences between triplicate samples.
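A PCA of this kind can be sketched with base R's `prcomp`. The expression values below are simulated with a large time effect and small replicate noise, chosen to mimic the pattern described; they are not the study's data. With DESeq2, `plotPCA` on the transformed data would presumably be used instead, which is an assumption.

```r
# Simulated log-scale expression values, NOT the study's data.
set.seed(3)
n_genes <- 200
time <- factor(rep(c(0, 2, 6, 24), each = 3))

# A large per-gene time effect plus small replicate-to-replicate noise.
time_effect <- matrix(rnorm(n_genes * 4, sd = 2), nrow = n_genes)
expr <- time_effect[, as.integer(time)] +
  matrix(rnorm(n_genes * 12, sd = 0.2), nrow = n_genes)
colnames(expr) <- paste0("t", time, "_rep", rep(1:3, times = 4))

# prcomp expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(expr))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the first two components, coloured by time point.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(time), pch = 19,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
legend("topright", legend = paste0("t=", levels(time)), col = 1:4, pch = 19)
```

With four groups, at most three components are needed to separate the group means, so replicates clustering tightly in the leading components is the expected picture when the time effect dominates.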
Heatmap Blocks of genes which covary across samples:
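A heatmap like this is commonly built from the most variable genes, centred on each gene's mean. The sketch below uses simulated values in which the first 30 genes are given a shared shift at t=24 so that a covarying block exists; the gene names, the size of the shift and the choice of 20 genes are all illustrative assumptions.

```r
# Simulated log-scale expression values, NOT the study's data.
set.seed(11)
n_genes <- 300
time <- factor(rep(c(0, 2, 6, 24), each = 3))
expr <- matrix(
  rnorm(n_genes * 12, mean = 8, sd = 0.2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("t", time, "_rep", rep(1:3, times = 4)))
)

# Give the first 30 genes a shared shift at t=24 so that they covary.
expr[1:30, time == "24"] <- expr[1:30, time == "24"] + 3

# Keep the 20 genes with the highest variance across samples.
gene_var <- apply(expr, 1, var)
top <- expr[order(gene_var, decreasing = TRUE)[1:20], ]

# Centre each gene on its own mean so colours show deviation from the gene average.
centered <- top - rowMeans(top)

# Rows and columns are clustered hierarchically by default.
heatmap(centered, scale = "none")
```

Centring by gene removes differences in overall expression level, so the colour pattern reflects how each gene moves across samples.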
Annotation
Pathway
ErbB signaling pathway Molecules in the pathway are shaded by their degree of up/down-regulation.
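The shading idea can be sketched as a mapping from log2 fold change to a diverging colour scale. The gene names and fold changes below are hypothetical, and the green/grey/red palette and the clamping limit of 2 are assumptions for illustration; the slides do not state which tool or colour scheme produced the pathway figures.

```r
# Made-up log2 fold changes for hypothetical genes (not results from the study).
lfc <- c(geneA = -3.0, geneB = -0.8, geneC = 0, geneD = 1.2, geneE = 2.5)

limit  <- 2    # fold changes beyond +/- limit share the extreme colour
n_cols <- 21   # odd, so that zero maps to the middle colour
cols   <- colorRampPalette(c("green", "grey", "red"))(n_cols)

# Clamp to [-limit, limit], then rescale to an index between 1 and n_cols.
clamped <- pmax(pmin(lfc, limit), -limit)
idx     <- round((clamped + limit) / (2 * limit) * (n_cols - 1)) + 1
shade   <- setNames(cols[idx], names(lfc))

print(shade)
```

Clamping keeps a few extreme genes from compressing the colour range for all the others.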
MAPK signaling pathway
Pathways in cancer
Non-small cell lung cancer