This is a quick walkthrough demonstrating how to generate SWNE plots starting from a counts matrix using a 3k PBMC dataset as an example. You can download the matrix here
First let’s load the required libraries
library(irlba)
library(Matrix)
library(swne)
Next let’s load the matrix, convert it to a sparse matrix to save memory, and filter and trim the genes
counts <- read.table("~/swne/Data/pbmc3k_matrix.tsv.gz", header = T, sep = "\t")
counts <- as(as.matrix(counts), "dgCMatrix")
counts <- FilterData(counts, min.samples.frac = 0.001, trim = 3, min.nonzero.features = 0,
max.sample.sum = Inf)
Most scRNA-seq pipelines only use a subset of highly overdispersed genes for analysis. We’ll pull out those variable genes here, as well as the cluster labels
## Pull out overdispersed genes as defined by Seurat
var.genes <- SelectFeatures(counts, n.features = 3000)
## calculating variance fit ... using gam
length(var.genes)
## [1] 3000
## Pull out cell clusters as defined by Seurat
cell.clusters <- factor(sapply(colnames(counts), ExtractField, field = 2, delim = "_"))
names(cell.clusters) <- colnames(counts)
table(cell.clusters)
## cell.clusters
## B CD14..Mono CD8.T DC FCGR3A..Mono
## 344 480 271 32 162
## Memory.CD4.T Mk Naive.CD4.T NK
## 483 14 697 155
Next we will normalize and run variance stabilization on the counts matrix
norm.counts <- ScaleCounts(counts)
## calculating variance fit ... using gam
The easiest way to generate an SWNE embedding is to use the wrapper function RunSWNE
## Run SWNE
genes.embed <- c("MS4A1", "GNLY", "CD3E", "CD14",
"FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A")
swne.embedding <- RunSWNE(norm.counts, k = 16, var.genes = var.genes, genes.embed = genes.embed,
sample.groups = cell.clusters)
## [1] "3000 variable genes"
## Warning in irlba::irlba(t(A), nv = 50, maxit = 100, center = rowMeans(A)):
## did not converge--results might be invalid!; try increasing work or maxit
## Initial stress : 0.13403
## stress after 10 iters: 0.04646, magic = 0.461
## stress after 20 iters: 0.03907, magic = 0.500
## stress after 30 iters: 0.03877, magic = 0.500
## stress after 40 iters: 0.03874, magic = 0.500
## Plot SWNE
PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = cell.clusters,
do.label = T, label.size = 3.5, pt.size = 1, show.legend = F,
seed = 42)
We can validate the embedded genes by overlaying the expression of one of these key genes onto the plot.
gene.use <- "CD8A"
gene.expr <- norm.counts[gene.use,]
FeaturePlotSWNE(swne.embedding, gene.expr, gene.use, alpha.plot = 0.4, label.size = 3.5, pt.size = 1.25)
Finally, we can extract cluster colors for compatibility with other plotting methods (i.e. Monocle)
color.mapping <- ExtractSWNEColors(swne.embedding, sample.groups = cell.clusters, seed = 42)
color.mapping
## Memory.CD4.T B CD14..Mono NK CD8.T
## "#FF61C3" "#F8766D" "#00C19F" "#93AA00" "#DB72FB"
## Naive.CD4.T FCGR3A..Mono DC Mk
## "#619CFF" "#D39200" "#00B9E3" "#00BA38"