This is a quick walkthrough demonstrating how to use SWNE to re-analyze an existing single-cell study that looks at both the host transcriptome and Zika viral RNA levels using a Huh7 hepatoma cell line. The study can be found here.
To save time we will be using the pre-computed Seurat object zika_seurat.Robj
, which can be downloaded here.
First let’s load the required libraries
library(Seurat)
library(swne)
library(perturbLM)
library(ggplot2)
Next let’s load the Seurat object
obj <- readRDS("~/swne/Data/zika_seurat.Robj")
Most scRNA-seq pipelines only use a subset of highly overdispersed genes for analysis. However here, we want to use the genes that are most highly correlated with viral load. First let’s pull out the zika reads and log-scale and normalize them
zika.reads <- as.integer(obj$numberZikaReads)
names(zika.reads) <- colnames(obj)
zika.norm.reads <- log((zika.reads/obj$nCount_RNA) * median(obj$nCount_RNA) + 1)
zika.scale.reads <- scale(zika.norm.reads, center = T, scale = T)
Next we can compute a pearson correlation between each scaled gene and the scaled zika reads
scale.expr <- GetAssayData(obj, slot = "scale.data")
gene.virus.pearson <- sapply(rownames(obj), function(g) cor(scale.expr[g,], zika.scale.reads, method = "pearson"))
gene.virus.pearson <- sort(gene.virus.pearson, decreasing = T)
Next we’ll look at the top correlated and anti-correlated genes for candidate genes to embed onto our SWNE plot
top.gene.virus <- c(head(gene.virus.pearson, n = 10), tail(gene.virus.pearson, n = 10),
gene.virus.pearson[c("FURIN", "HSPA2")])
ggBarplot(top.gene.virus, fill.color = "tomato") + coord_flip()
We selected some of the top genes, as well as some genes previously known to be key to flavivirus replication for SWNE embedding
genes.embed <- c("ATF3", ## UPR
"DDIT3", ## UPR
"RND1", ## Permeabilizes cell membrane
"NEURL3", ## Possible anti-viral IFN response
"FGFR4", ## Alters distribution of viral particles
"NR0B2", ## Regulates cholesterol homeostasis
"CCNB1", ## Cell Cycle
"HSPA2", ## Chaperone required for entry
"HMGB1", ## Cell Cycle
"LSM3") ## mRNA degradation
We build the Shared Nearest Neighbors (SNN) graph using Seurat. We’re only going to use genes that are correlated with the viral load from now on.
## Only use genes that are correlated with viral load
genes.use <- names(gene.virus.pearson[abs(gene.virus.pearson) > 0.1])
length(genes.use)
## [1] 2965
obj <- RunPCA(obj, features = genes.use, verbose = F)
ElbowPlot(obj)
obj <- FindNeighbors(obj, k.param = 20, dims = 1:10, prune.SNN = 0.05)
## Computing nearest neighbor graph
## Computing SNN
snn <- obj@graphs$RNA_snn
Normalize the data and find the optimal number of factors to use for NMF
norm.counts <- ExtractNormCounts(obj)
## calculating variance fit ... using gam
k.res <- FindNumFactors(norm.counts[genes.use,], k.range = seq(2,20,2))
Next we run NMF and the SWNE embedding as well as the feature embedding
nmf.res <- RunNMF(norm.counts[genes.use,], k = 10, n.cores = 12, ica.fast = T)
nmf.res$W <- ProjectFeatures(norm.counts, nmf.res$H, n.cores = 12)
swne.emb <- EmbedSWNE(nmf.res$H, SNN = snn, alpha.exp = 1.25, snn.exp = 1, n_pull = 3)
## Initial stress : 0.07600
## stress after 10 iters: 0.03121, magic = 0.018
## stress after 20 iters: 0.02669, magic = 0.500
## stress after 30 iters: 0.02576, magic = 0.500
## stress after 40 iters: 0.02174, magic = 0.500
## stress after 50 iters: 0.02153, magic = 0.500
## stress after 60 iters: 0.02133, magic = 0.500
## stress after 70 iters: 0.02124, magic = 0.500
swne.emb <- EmbedFeatures(swne.emb, nmf.res$W, alpha.exp = 1, features.embed = genes.embed, n_pull = 3)
swne.emb$H.coords$name <- ""
Now we can generate an SWNE plot, with the viral load of each cell overlaid onto the plot
FeaturePlotSWNE(swne.emb, feature.scores = zika.norm.reads, feature.name = "Viral Load")
We extract out the time point (in hrs) and make an SWNE plot by time point. We can see that generally the later time points have a higher viral load but there is also quite a bit of heterogeneity within each time point.
time.point <- droplevels(obj$time..h.)
levels(time.point) <- paste0(levels(time.point), "hr")
PlotSWNE(swne.emb, sample.groups = time.point)