1 Introduction

This report was generated by the WTSI Pathogen Informatics Differential Expression And GO enrichment analysis pipeline. It should only be used as an overview of your RNA-Seq data and is not intended to be a final analysis.

The R package DEAGO is a wrapper which contains functions to generate QC plots, perform differential expression analysis with DESeq2 and GO term enrichment analysis with topGO.

The Pipeline configuration section below gives the details of the files and settings used to generate this report. These have been imported from /lustre/scratch118/infgen/pathdev/vo1/deago_tutorial/lrt_analysis/deago.config.

2 Pipeline configuration

parameters <- importConfig("/lustre/scratch118/infgen/pathdev/vo1/deago_tutorial/lrt_analysis/deago.config")
resultsDir <- makeResultDir(parameters$results_directory, parameters$keep_images)
parameters[['results']] <- resultsDir

3 Imported data summary

The sample metadata is read from the targets file, and the count data for each sample is read from the counts directory. Both locations are taken from the pipeline configuration.

targets <- importTargets(parameters$targets_file)
countdata <- readCountData(targets, parameters$counts_directory, parameters$gene_ids, data_column=7, skip=1, sep='\t')

4 DESeq2 analysis

coldata <- DataFrame( cell_type=factor( tolower(targets$cell_type) ),
                      treatment=factor( tolower(targets$treatment) ),
                      condition=factor( tolower(targets$condition) ) )

dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~cell_type+treatment+cell_type:treatment)

dds$cell_type <- relevel(dds$cell_type, ref = "ko")
dds$treatment <- relevel(dds$treatment, ref = "ctrl")

dds <- DESeq(dds, test="LRT", reduced=~cell_type+treatment)
# Annotate the dataset only when an annotation file is configured
if ("annotation_file" %in% names(parameters)) {
  dds <- annotateDataset(dds, parameters)
}
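The likelihood ratio test above compares the full design with the reduced design, so the term being tested is whatever the reduced design leaves out. The sketch below illustrates this with base R only. The sample sheet is a toy one, and the level names "wt" and "treated" are assumptions made for illustration; only "ko" and "ctrl" appear in this report.

```r
# Toy sample sheet (hypothetical): "wt" and "treated" are assumed level names
coldata <- data.frame(
  cell_type = factor(rep(c("ko", "wt"), each = 4), levels = c("ko", "wt")),
  treatment = factor(rep(c("ctrl", "treated"), times = 4),
                     levels = c("ctrl", "treated"))
)

# Design matrices for the full and reduced formulas used in this section
full    <- model.matrix(~ cell_type + treatment + cell_type:treatment, coldata)
reduced <- model.matrix(~ cell_type + treatment, coldata)

# The column present only in the full design is the term the LRT assesses
tested_term <- setdiff(colnames(full), colnames(reduced))
print(tested_term)
```

With this layout the only extra column is the interaction between cell type and treatment, so a small adjusted p-value would suggest that the treatment effect differs between cell types.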

5 QC plots

5.1 Total read counts per sample

The reads per sample plot shows the total read counts (raw and normalized) for each sample. Bar colours represent the experimental condition.

plotReadCounts(dds, resultsDir)

5.2 Null count percentage per sample

The null count percentage per sample plot shows the percentage of genes with no counts (raw and normalized) in each sample. Bar colours represent the experimental condition.

plotNullCounts(dds, resultsDir)
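As a rough illustration of the two quantities plotted in sections 5.1 and 5.2, the sketch below computes total reads and the percentage of zero-count genes per sample. It uses a toy raw count matrix with hypothetical values and is not the code used by plotReadCounts or plotNullCounts.

```r
# Toy raw count matrix (hypothetical values): rows are genes, columns are samples
counts <- matrix(
  c(10, 0, 5,
     0, 0, 8,
    20, 4, 0,
     0, 6, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Total read count per sample
total_reads <- colSums(counts)

# Percentage of genes with a count of zero in each sample
null_percent <- colMeans(counts == 0) * 100

print(total_reads)
print(null_percent)
```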

5.3 Sample-to-sample distances

The sample-to-sample distance plot gives an overview of how the samples cluster based on their Euclidean distance, calculated from the regularized log-transformed count data.

plotSampleDistances(dds, resultsDir)
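The distance calculation itself can be sketched in base R. The matrix below is a toy stand-in for the regularized log-transformed data (hypothetical values); how plotSampleDistances extracts and clusters the real data is not shown here.

```r
# Toy log-scale expression matrix (hypothetical values): genes x samples
expr <- matrix(
  c(1, 1, 4,
    2, 2, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3"))
)

# dist() compares rows, so transpose to get one row per sample
sample_dist <- as.matrix(dist(t(expr), method = "euclidean"))
print(sample_dist)
```

Samples s1 and s2 have identical profiles and so sit at distance 0, while s3 is 5 units away from both.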

5.4 Principal component analysis (PCA)

5.4.1 PCA plot

The Principal Component Analysis (PCA) plot shows the samples projected onto the first two principal components, which capture the largest share of the variability in the regularized log-transformed count data.

pc_list <- getPrincipalComponents(dds)
pcaPlot(pc_list,resultsDir)

5.4.2 PC scree plot

The principal components (PC) scree plot shows the percentage contribution of each PC.

pcaScreePlot(pc_list,resultsDir)

5.4.3 PC summary

The PC summary shows the percentage contribution (bars in scree plot) and cumulative total (points in scree plot) for each PC.

pcaSummary(pc_list)
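The percentage contribution and cumulative total described above can be derived from any PCA fit. The sketch below uses prcomp from base R on random toy data; it is an assumption that this mirrors what getPrincipalComponents and pcaSummary compute, and the variable names are illustrative.

```r
set.seed(1)

# Toy transformed expression matrix (random values): 50 genes x 6 samples
expr <- matrix(rnorm(300), nrow = 50,
               dimnames = list(NULL, paste0("s", 1:6)))

# Samples are the observations, so genes become the columns
pca <- prcomp(t(expr))

# Percentage of variance explained by each PC (the bars in a scree plot)
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Running total across PCs (the points in a scree plot)
cumulative <- cumsum(percent_var)

pc_summary <- data.frame(
  PC = paste0("PC", seq_along(percent_var)),
  percent = round(percent_var, 2),
  cumulative = round(cumulative, 2)
)
print(pc_summary)
```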

5.5 Cook’s distances

Cook’s distance is a measure of how much a single sample is influencing the fitted coefficients for a gene. A large value of Cook’s distance is intended to indicate an outlier count.

plotCooks(dds,resultsDir)
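To give a feel for what Cook's distance measures, the sketch below fits an ordinary linear model to toy data with one deliberately outlying observation. DESeq2 calculates the distances per gene from its own model, so this is only an analogy using base R, with hypothetical values.

```r
# Toy data (hypothetical): the sixth observation is a deliberate outlier
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.1, 2.0, 2.9, 4.2, 5.0, 15.0)

fit <- lm(y ~ x)

# One Cook's distance per observation: how much the fit changes without it
cooks <- cooks.distance(fit)
most_influential <- which.max(cooks)

print(round(cooks, 3))
print(most_influential)
```

The outlying observation has the largest distance, which is the kind of signal the plot above is meant to expose at the sample level.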

5.6 Density plot

The density plot shows the distribution of normalized read counts per sample.

plotDensity(dds,resultsDir)

5.7 Dispersion plot

The dispersion estimate plot shows the gene-wise estimates (black), fitted values (red) and final maximum a posteriori estimates used in testing (blue).

plotDispersionEstimates(dds, resultsDir)

6 Pairwise contrasts

lrt_results <- results(dds, alpha=as.numeric(parameters$qvalue))
lrt_results$symbol <- mcols(dds)$symbol

contrasts <- list('lrt'=lrt_results)
writeContrasts(dds, contrasts, resultsDir)

6.1 Contrast summary

The summary table below contains the total number of differentially expressed genes and the number of up-regulated (lfc > 2) and down-regulated (lfc < -2) genes for each contrast (adjusted p-value < 10E-12).

contrast_summary <- contrastSummary(contrasts, parameters)
datatable(contrast_summary,
          options = list(dom = 't',
                         colnames = c('contrast', 'up-regulated', 'down-regulated', 'total'),
                         columnDefs = list(list(className = 'dt-center',
                                                targets = 1:ncol(contrast_summary)))))
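The counts in a summary of this kind can be sketched as follows. The results table is a toy one with DESeq2-style column names and hypothetical values, the cut-off alpha is illustrative rather than the value used in this report, and it is an assumption that the total counts every gene passing the adjusted p-value cut-off.

```r
# Toy results table (hypothetical values) with DESeq2-style column names
res <- data.frame(
  gene = paste0("gene", 1:6),
  log2FoldChange = c(3.1, -2.5, 0.4, 2.2, -4.0, 1.0),
  padj = c(0.001, 0.02, 0.5, 0.2, 0.0001, NA)
)

alpha <- 0.05  # illustrative cut-off, not the value used in this report
lfc   <- 2     # log2 fold change threshold from the text above

# Genes with a missing adjusted p-value are treated as not significant
significant <- !is.na(res$padj) & res$padj < alpha

summary_row <- data.frame(
  contrast = "lrt",
  up       = sum(significant & res$log2FoldChange > lfc),
  down     = sum(significant & res$log2FoldChange < -lfc),
  total    = sum(significant)
)
print(summary_row)
```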

6.1.0.1 MA plot

The MA plot below compares the mean of the normalized counts against the log fold change, showing one point per gene. Points will be colored red if the adjusted p-value is less than 0.05. Points which fall out of the window are plotted as open triangles pointing either up or down.

plotContrastMA(contrasts$'lrt', resultsDir, geneLabels=TRUE)
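The plotting rules described for the MA plot can be sketched without drawing anything. The table and the plotting window below are hypothetical, and the shape labels are illustrative names rather than anything taken from plotContrastMA.

```r
# Toy results table (hypothetical values) with DESeq2-style column names
res <- data.frame(
  baseMean = c(5, 120, 900, 40, 15000),
  log2FoldChange = c(0.3, -6.5, 2.4, 7.2, -0.1),
  padj = c(0.8, 0.001, 0.03, 0.2, NA)
)

ylim <- c(-4, 4)  # illustrative plotting window for the log fold change

ma <- data.frame(
  mean_expr   = res$baseMean,
  # Points outside the window are drawn on its edge
  lfc_drawn   = pmax(pmin(res$log2FoldChange, ylim[2]), ylim[1]),
  # Highlighted when the adjusted p-value is below 0.05
  significant = !is.na(res$padj) & res$padj < 0.05
)

# Open triangles mark points that fell outside the window
ma$shape <- ifelse(res$log2FoldChange > ylim[2], "triangle_up",
            ifelse(res$log2FoldChange < ylim[1], "triangle_down", "circle"))
print(ma)
```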

6.1.0.2 Volcano plot

The volcano plot below compares the log fold change against the statistical significance of each gene, showing one point per gene. Points will be colored where the adjusted p-value is less than 0.05. Green points represent down-regulated genes (lfc < -2) and orange points represent up-regulated genes (lfc > 2).

plotVolcano(contrasts$'lrt', resultsDir, geneLabels=TRUE)
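The colour rules described for the volcano plot can be sketched as below. The sketch assumes the common convention of log2 fold change on the x-axis and -log10 adjusted p-value on the y-axis, which may differ from what plotVolcano draws. The table holds hypothetical values, and the label "other" is a placeholder for points that are neither green nor orange.

```r
# Toy results table (hypothetical values) with DESeq2-style column names
res <- data.frame(
  log2FoldChange = c(3.0, -2.6, 0.5, 2.8, -3.3),
  padj = c(0.001, 0.01, 0.001, 0.3, NA)
)

volcano <- data.frame(
  x = res$log2FoldChange,
  y = -log10(res$padj)  # assumed y-axis; NA stays NA
)

# Significance and fold change thresholds taken from the text above
sig <- !is.na(res$padj) & res$padj < 0.05
volcano$colour <- ifelse(sig & res$log2FoldChange > 2, "orange",
                  ifelse(sig & res$log2FoldChange < -2, "green", "other"))
print(volcano)
```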

6.1.1 Differential expression

The table below contains differentially expressed genes (adjusted p-value < 0.01, log2 fold change <= -2 | >= 2).

prepareContrastTable(contrasts$'lrt')

7 R session

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu precise (12.04.2 LTS)
## 
## Matrix products: default
## BLAS: /software/R-3.4.0/lib/R/lib/libRblas.so
## LAPACK: /software/R-3.4.0/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    methods   stats     graphics  grDevices utils    
## [8] datasets  base     
## 
## other attached packages:
##  [1] DESeq2_1.16.1              SummarizedExperiment_1.6.5
##  [3] DelayedArray_0.2.7         matrixStats_0.52.2        
##  [5] Biobase_2.36.2             GenomicRanges_1.28.6      
##  [7] GenomeInfoDb_1.12.3        IRanges_2.10.5            
##  [9] S4Vectors_0.14.7           BiocGenerics_0.22.1       
## [11] DT_0.2                     deago_1.1.2               
## [13] markdown_0.8              
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.5            bit64_0.9-7            
##  [3] splines_3.4.0           topGO_2.28.0           
##  [5] assertthat_0.1          Formula_1.2-2          
##  [7] latticeExtra_0.6-28     blob_1.1.0             
##  [9] GenomeInfoDbData_0.99.0 ggrepel_0.7.0          
## [11] yaml_2.1.14             RSQLite_2.0            
## [13] backports_1.1.1         lattice_0.20-33        
## [15] limma_3.32.10           glue_1.2.0             
## [17] digest_0.6.12           RColorBrewer_1.1-2     
## [19] XVector_0.16.0          checkmate_1.8.4        
## [21] colorspace_1.3-2        cowplot_0.9.2          
## [23] htmltools_0.3.6         Matrix_1.2-11          
## [25] plyr_1.8.4              pkgconfig_2.0.1        
## [27] XML_3.98-1.9            SparseM_1.77           
## [29] genefilter_1.58.1       zlibbioc_1.22.0        
## [31] purrr_0.2.4             GO.db_3.4.1            
## [33] xtable_1.8-2            scales_0.5.0           
## [35] BiocParallel_1.10.1     htmlTable_1.9          
## [37] tibble_1.3.4            annotate_1.54.0        
## [39] ggplot2_2.2.1           ggpubr_0.1.6           
## [41] nnet_7.3-12             lazyeval_0.2.0         
## [43] survival_2.41-3         magrittr_1.5           
## [45] memoise_1.1.0           evaluate_0.10.1        
## [47] foreign_0.8-69          graph_1.54.0           
## [49] tools_3.4.0             data.table_1.10.4-2    
## [51] stringr_1.2.0           munsell_0.4.3          
## [53] locfit_1.5-9.1          cluster_2.0.6          
## [55] bindrcpp_0.2            AnnotationDbi_1.38.2   
## [57] compiler_3.4.0          rlang_0.2.0            
## [59] grid_3.4.0              RCurl_1.95-4.8         
## [61] htmlwidgets_0.9         labeling_0.3           
## [63] bitops_1.0-6            base64enc_0.1-3        
## [65] rmarkdown_1.6           gtable_0.2.0           
## [67] reshape_0.8.7           DBI_0.7                
## [69] R6_2.2.2                gridExtra_2.3          
## [71] knitr_1.17              dplyr_0.7.4            
## [73] bit_1.1-12              bindr_0.1              
## [75] Hmisc_4.0-3             rprojroot_1.2          
## [77] stringi_1.1.5           Rcpp_0.12.13           
## [79] geneplotter_1.54.0      rpart_4.1-11           
## [81] acepack_1.4.1