This report was generated by the WTSI Pathogen Informatics Differential Expression And GO enrichment analysis pipeline. It should only be used as an overview of your RNA-Seq data and is not intended to be a final analysis.
The R package DEAGO is a wrapper which contains functions to generate QC plots, perform differential expression analysis with DESeq2 and GO term enrichment analysis with topGO.
The Pipeline configuration section below gives the details of the files and settings used to generate this report. These have been imported from /lustre/scratch118/infgen/pathdev/vo1/deago_tutorial/lrt_analysis/deago.config.
parameters <- importConfig("/lustre/scratch118/infgen/pathdev/vo1/deago_tutorial/lrt_analysis/deago.config")
resultsDir <- makeResultDir(parameters$results_directory, parameters$keep_images)
parameters[['results']] <- resultsDir
The summary table below contains the total number of differentially expressed genes and the number of up-regulated (lfc > 2) and down-regulated (lfc < -2) genes for each contrast (padj < 10E-12).
targets <- importTargets(parameters$targets_file)
countdata <- readCountData(targets, parameters$counts_directory, parameters$gene_ids, data_column=7, skip=1, sep='\t')
coldata <- DataFrame( cell_type=factor( tolower(targets$cell_type) ),
treatment=factor( tolower(targets$treatment) ),
condition=factor( tolower(targets$condition) ) )
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~cell_type+treatment+cell_type:treatment)
dds$cell_type <- relevel(dds$cell_type, ref = "ko")
dds$treatment <- relevel(dds$treatment, ref = "ctrl")
dds <- DESeq(dds, test="LRT", reduced=~cell_type+treatment)
# Annotate the dataset only when an annotation file is configured
if ("annotation_file" %in% names(parameters)) {
  dds <- annotateDataset(dds, parameters)
}
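The likelihood ratio test above compares a full design containing the cell_type:treatment interaction with a reduced design containing only the main effects. The sketch below uses base R to show which coefficient separates the two designs. The sample sheet is hypothetical (two replicates per group), and the non-reference level names "wt" and "treated" are assumptions made for illustration.

```r
# Hypothetical 2x2 design with two replicates per group.
# "ko" and "ctrl" are the reference levels, as in the report;
# "wt" and "treated" are assumed names for the other levels.
coldata <- data.frame(
  cell_type = factor(rep(c("ko", "wt"), each = 4), levels = c("ko", "wt")),
  treatment = factor(rep(rep(c("ctrl", "treated"), each = 2), times = 2),
                     levels = c("ctrl", "treated"))
)

# Full model: main effects plus the interaction term
full <- model.matrix(~ cell_type + treatment + cell_type:treatment, data = coldata)

# Reduced model: main effects only
reduced <- model.matrix(~ cell_type + treatment, data = coldata)

# Coefficients present in the full model but not in the reduced model
# are the ones evaluated by the likelihood ratio test.
tested <- setdiff(colnames(full), colnames(reduced))
print(tested)
```

Under these assumptions the only coefficient dropped by the reduced design is the interaction term, so a significant gene is one whose treatment response differs between cell types.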
The reads-per-sample plot shows the total read count (raw and normalized) for each sample. Bar colours represent the experimental condition.
plotReadCounts(dds, resultsDir)
The null-count-percentage plot shows, for each sample, the percentage of genes with zero counts (raw and normalized). Bar colours represent the experimental condition.
plotNullCounts(dds, resultsDir)
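The two per-sample quantities plotted above can be sketched in base R as follows. The count matrix is a made-up example, and this is not necessarily how plotReadCounts or plotNullCounts compute their values internally.

```r
# Hypothetical raw count matrix: 5 genes (rows) x 3 samples (columns)
counts <- matrix(
  c(0, 10, 0,
    5,  0, 0,
    0,  0, 0,
    8,  3, 1,
    2,  7, 4),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

# Total read count per sample
total_reads <- colSums(counts)

# Percentage of genes with zero counts per sample
null_pct <- colSums(counts == 0) / nrow(counts) * 100

print(total_reads)
print(null_pct)
```

A sample with a much lower read total or a much higher null percentage than the others in its condition is a candidate for closer inspection.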
The sample-to-sample distance plot gives an overview of how the samples cluster, based on the Euclidean distances between them computed from the regularized log transformed count data.
plotSampleDistances(dds, resultsDir)
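As a rough illustration of the distance calculation, the sketch below computes pairwise Euclidean distances between samples in base R. The counts are simulated, and log2(count + 1) is used as a simple stand-in for the regularized log transformation, which is an assumption made to keep the example free of package dependencies.

```r
set.seed(1)

# Simulated counts: 10 genes (rows) x 4 samples (columns)
counts <- matrix(
  rpois(40, lambda = 50), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4))
)

# Stand-in transformation (assumption): log2 with a pseudocount, not rlog
log_counts <- log2(counts + 1)

# dist() works on rows, so transpose to put samples in rows
sample_dist <- dist(t(log_counts), method = "euclidean")
dist_matrix <- as.matrix(sample_dist)

print(round(dist_matrix, 3))
```

The resulting symmetric matrix is what a heatmap with hierarchical clustering would typically be drawn from.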
The Principal Component Analysis (PCA) plot shows the samples projected onto the first two principal components, which capture the largest share of the variability in the regularized log transformed count data.
pc_list <- getPrincipalComponents(dds)
pcaPlot(pc_list,resultsDir)
The principal components (PC) scree plot shows the percentage contribution of each PC.
pcaScreePlot(pc_list,resultsDir)
The PC summary shows the percentage contribution (bars in scree plot) and cumulative total (points in scree plot) for each PC.
pcaSummary(pc_list)
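The percentage contribution and cumulative total reported in the PC summary can be derived from the standard deviations of the principal components. The sketch below uses base R prcomp on simulated values; the data and the object names are hypothetical, and getPrincipalComponents may prepare its input differently.

```r
set.seed(42)

# Simulated transformed expression values: 15 genes (rows) x 4 samples (columns)
log_counts <- matrix(
  rnorm(60, mean = 8), nrow = 15,
  dimnames = list(paste0("gene", 1:15), paste0("sample", 1:4))
)

# PCA on samples: transpose so that samples are the observations
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each PC and the running total
pct_var <- pca$sdev^2 / sum(pca$sdev^2) * 100
cum_pct <- cumsum(pct_var)

pc_summary <- data.frame(
  PC = paste0("PC", seq_along(pct_var)),
  percent = round(pct_var, 2),
  cumulative = round(cum_pct, 2)
)
print(pc_summary)
```

The percent column corresponds to the bars of a scree plot and the cumulative column to its points.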
Cook’s distance is a measure of how much a single sample is influencing the fitted coefficients for a gene. A large value of Cook’s distance is intended to indicate an outlier count.
plotCooks(dds,resultsDir)
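To give an intuition for Cook's distance, the sketch below fits an ordinary linear model in base R and shows that a deliberately introduced outlier receives the largest value. This is only an analogy: DESeq2 calculates Cook's distance per gene and per sample within its own generalized linear model, and the data here are invented.

```r
# Invented data: a perfect linear trend with one deliberate outlier
x <- 1:10
y <- 2 * x + 1
y[10] <- 60  # the outlying observation

# Ordinary least squares fit (stand-in for the per-gene GLM)
fit <- lm(y ~ x)

# Cook's distance for each observation
cooks <- cooks.distance(fit)

print(round(cooks, 3))
print(which.max(cooks))
```

The observation that pulls the fitted line the most is the one with the largest Cook's distance, which is the behaviour the boxplot above is used to screen for.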
The density plot shows the distribution of normalised read counts per sample.
plotDensity(dds,resultsDir)
The dispersion estimate plot shows the gene-wise estimates (black), fitted values (red) and final maximum a posteriori estimates used in testing (blue).
plotDispersionEstimates(dds, resultsDir)
lrt_results <- results(dds, alpha=as.numeric(parameters$qvalue))
lrt_results$symbol <- mcols(dds)$symbol
contrasts <- list('lrt'=lrt_results)
writeContrasts(dds, contrasts, resultsDir)
The summary table below contains the total number of differentially expressed genes and the number of up-regulated (lfc > 2) and down-regulated (lfc < -2) genes for each contrast (adjusted p-value < 10E-12).
contrast_summary <- contrastSummary(contrasts, parameters)
datatable(contrast_summary, options = list(dom = 't', colnames=c('contrast', 'up-regulated','down-regulated','total'), columnDefs = list(list(className = 'dt-center', targets = 1:ncol(contrast_summary)))))
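The counting behind a summary table of this kind can be sketched in base R as below. The results table is invented, the adjusted p-value cutoff of 0.05 is a placeholder for the configured q-value, and treating "total" as the sum of up- and down-regulated genes is an assumption; contrastSummary may define it differently.

```r
# Invented results for six genes; NA mimics a gene with no adjusted p-value
res <- data.frame(
  gene = paste0("gene", 1:6),
  log2FoldChange = c(3.1, -2.5, 0.4, 2.2, -4.0, 5.0),
  padj = c(0.001, 0.02, 0.5, 0.2, 0.0001, NA)
)

lfc_cutoff <- 2      # log2 fold change threshold, as stated in the report
padj_cutoff <- 0.05  # placeholder for the configured q-value (assumption)

# Genes passing the significance threshold; NA is treated as not significant
sig <- !is.na(res$padj) & res$padj < padj_cutoff

up <- sum(sig & res$log2FoldChange > lfc_cutoff)
down <- sum(sig & res$log2FoldChange < -lfc_cutoff)

summary_row <- data.frame(
  contrast = "lrt",
  up_regulated = up,
  down_regulated = down,
  total = up + down  # assumption: total counts genes passing both thresholds
)
print(summary_row)
```

Genes with a missing adjusted p-value are excluded explicitly, because a comparison against NA would otherwise propagate NA into the counts.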
The MA plot below compares the mean of the normalized counts against the log fold change, showing one point per gene. Points are colored red if the adjusted p-value is less than 0.05. Points which fall outside the plotting window are drawn as open triangles pointing either up or down.
plotContrastMA(contrasts$'lrt', resultsDir, geneLabels=TRUE)
The volcano plot below compares the log fold change against statistical significance (conventionally the -log10 of the p-value or adjusted p-value), showing one point per gene. Points are colored where the adjusted p-value is less than 0.05. Green points represent down-regulated genes (lfc < -2) and orange points represent up-regulated genes (lfc > 2).
plotVolcano(contrasts$'lrt', resultsDir, geneLabels=TRUE)
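The sketch below shows how the coordinates and colours of a conventional volcano plot can be derived in base R, with log2 fold change on the x axis and -log10 of the adjusted p-value on the y axis. The results table is invented, grey for the remaining points is an assumption, and whether plotVolcano uses raw or adjusted p-values on the y axis is not stated in this report.

```r
# Invented results for five genes
res <- data.frame(
  gene = paste0("gene", 1:5),
  log2FoldChange = c(3.0, -2.5, 0.5, 2.4, -3.2),
  padj = c(0.001, 0.01, 0.8, 0.3, 1e-6)
)

# y-axis value: larger means more significant
res$neg_log10_padj <- -log10(res$padj)

# Colour scheme following the description above; grey is an assumed default
significant <- res$padj < 0.05
res$colour <- "grey"
res$colour[significant & res$log2FoldChange > 2] <- "orange"  # up-regulated
res$colour[significant & res$log2FoldChange < -2] <- "green"  # down-regulated

print(res)
```

These columns can be passed straight to a plotting call, for example plot(res$log2FoldChange, res$neg_log10_padj, col = res$colour, pch = 19).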
The table below contains differentially expressed genes (adjusted p-value < 0.01, log2 fold change <= -2 or >= 2).
prepareContrastTable(contrasts$'lrt')
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu precise (12.04.2 LTS)
##
## Matrix products: default
## BLAS: /software/R-3.4.0/lib/R/lib/libRblas.so
## LAPACK: /software/R-3.4.0/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 methods stats graphics grDevices utils
## [8] datasets base
##
## other attached packages:
## [1] DESeq2_1.16.1 SummarizedExperiment_1.6.5
## [3] DelayedArray_0.2.7 matrixStats_0.52.2
## [5] Biobase_2.36.2 GenomicRanges_1.28.6
## [7] GenomeInfoDb_1.12.3 IRanges_2.10.5
## [9] S4Vectors_0.14.7 BiocGenerics_0.22.1
## [11] DT_0.2 deago_1.1.2
## [13] markdown_0.8
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.5 bit64_0.9-7
## [3] splines_3.4.0 topGO_2.28.0
## [5] assertthat_0.1 Formula_1.2-2
## [7] latticeExtra_0.6-28 blob_1.1.0
## [9] GenomeInfoDbData_0.99.0 ggrepel_0.7.0
## [11] yaml_2.1.14 RSQLite_2.0
## [13] backports_1.1.1 lattice_0.20-33
## [15] limma_3.32.10 glue_1.2.0
## [17] digest_0.6.12 RColorBrewer_1.1-2
## [19] XVector_0.16.0 checkmate_1.8.4
## [21] colorspace_1.3-2 cowplot_0.9.2
## [23] htmltools_0.3.6 Matrix_1.2-11
## [25] plyr_1.8.4 pkgconfig_2.0.1
## [27] XML_3.98-1.9 SparseM_1.77
## [29] genefilter_1.58.1 zlibbioc_1.22.0
## [31] purrr_0.2.4 GO.db_3.4.1
## [33] xtable_1.8-2 scales_0.5.0
## [35] BiocParallel_1.10.1 htmlTable_1.9
## [37] tibble_1.3.4 annotate_1.54.0
## [39] ggplot2_2.2.1 ggpubr_0.1.6
## [41] nnet_7.3-12 lazyeval_0.2.0
## [43] survival_2.41-3 magrittr_1.5
## [45] memoise_1.1.0 evaluate_0.10.1
## [47] foreign_0.8-69 graph_1.54.0
## [49] tools_3.4.0 data.table_1.10.4-2
## [51] stringr_1.2.0 munsell_0.4.3
## [53] locfit_1.5-9.1 cluster_2.0.6
## [55] bindrcpp_0.2 AnnotationDbi_1.38.2
## [57] compiler_3.4.0 rlang_0.2.0
## [59] grid_3.4.0 RCurl_1.95-4.8
## [61] htmlwidgets_0.9 labeling_0.3
## [63] bitops_1.0-6 base64enc_0.1-3
## [65] rmarkdown_1.6 gtable_0.2.0
## [67] reshape_0.8.7 DBI_0.7
## [69] R6_2.2.2 gridExtra_2.3
## [71] knitr_1.17 dplyr_0.7.4
## [73] bit_1.1-12 bindr_0.1
## [75] Hmisc_4.0-3 rprojroot_1.2
## [77] stringi_1.1.5 Rcpp_0.12.13
## [79] geneplotter_1.54.0 rpart_4.1-11
## [81] acepack_1.4.1