This is a quick walkthrough demonstrating how to generate SWNE plots starting from a counts matrix using a 3k PBMC dataset as an example. You can download the matrix here

First let’s load the required libraries

library(irlba)
library(Matrix)
library(swne)

Next let’s load the matrix, convert it to a sparse matrix to save memory, and filter and trim the genes

counts <- read.table("~/swne/Data/pbmc3k_matrix.tsv.gz", header = T, sep = "\t")
counts <- as(as.matrix(counts), "dgCMatrix")
counts <- FilterData(counts, min.samples.frac = 0.001, trim = 3, min.nonzero.features = 0,
                     max.sample.sum = Inf)

Most scRNA-seq pipelines only use a subset of highly overdispersed genes for analysis. We’ll pull out those variable genes here, as well as the cluster labels

## Pull out overdispersed genes as defined by Seurat
var.genes <- SelectFeatures(counts, n.features = 3000)
## calculating variance fit ... using gam
length(var.genes)
## [1] 3000
## Pull out cell clusters as defined by Seurat
cell.clusters <- factor(sapply(colnames(counts), ExtractField, field = 2, delim = "_"))
names(cell.clusters) <- colnames(counts)
table(cell.clusters)
## cell.clusters
##            B   CD14..Mono        CD8.T           DC FCGR3A..Mono 
##          344          480          271           32          162 
## Memory.CD4.T           Mk  Naive.CD4.T           NK 
##          483           14          697          155

Next we will normalize and run variance stabilization on the counts matrix

norm.counts <- ScaleCounts(counts)
## calculating variance fit ... using gam

The easiest way to generate an SWNE embedding is to use the wrapper function RunSWNE

## Run SWNE
genes.embed <- c("MS4A1", "GNLY", "CD3E", "CD14",
                 "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A")
swne.embedding <- RunSWNE(norm.counts, k = 16, var.genes = var.genes, genes.embed = genes.embed,
                          sample.groups = cell.clusters)
## [1] "3000 variable genes"
## Warning in irlba::irlba(t(A), nv = 50, maxit = 100, center = rowMeans(A)):
## did not converge--results might be invalid!; try increasing work or maxit
## Initial stress        : 0.13403
## stress after  10 iters: 0.04646, magic = 0.461
## stress after  20 iters: 0.03907, magic = 0.500
## stress after  30 iters: 0.03877, magic = 0.500
## stress after  40 iters: 0.03874, magic = 0.500
## Plot SWNE
PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = cell.clusters,
         do.label = T, label.size = 3.5, pt.size = 1, show.legend = F,
         seed = 42)

We can validate the embedded genes by overlaying the expression of one of these key genes onto the plot.

gene.use <- "CD8A"
gene.expr <- norm.counts[gene.use,]
FeaturePlotSWNE(swne.embedding, gene.expr, gene.use, alpha.plot = 0.4, label.size = 3.5, pt.size = 1.25)

Finally, we can extract cluster colors for compatibility with other plotting methods (i.e. Monocle)

color.mapping <- ExtractSWNEColors(swne.embedding, sample.groups = cell.clusters, seed = 42)
color.mapping
## Memory.CD4.T            B   CD14..Mono           NK        CD8.T 
##    "#FF61C3"    "#F8766D"    "#00C19F"    "#93AA00"    "#DB72FB" 
##  Naive.CD4.T FCGR3A..Mono           DC           Mk 
##    "#619CFF"    "#D39200"    "#00B9E3"    "#00BA38"