About

The workbench for single cell RNAseq (scRNAseq) is designed to give biologists meaningful access to single cell data, even with limited informatics training. You begin by selecting a dataset for analysis, and the workbench then offers analysis tools following several standard pre-processing steps.

Start by choosing a dataset on the left.

Imported analysis selected

You have selected an analysis bundled with the dataset itself, usually created by the dataset author outside of this workbench. You can use the workbench to perform the actions below that are provided by the analyses the authors have uploaded, but all other analysis steps will be disabled until you create an entirely new analysis.

Compare genes / clusters

The method dropdown menu lists the available statistical tests for your comparison. The “t-test overestimated variance” option may help when clusters contain few cells and variance is difficult to estimate directly. For robust clusters, choose “t-test” (which assumes normally distributed data). The “Wilcoxon-Rank-Sum” option is a non-parametric test and may be helpful when a dataset includes a few genes with very high expression (outliers) or when the data distribution is not known. Multiple testing correction can be performed by either the Benjamini-Hochberg or the Bonferroni method; Bonferroni is the more conservative of the two.
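To make the difference between the two correction methods concrete, here is a minimal sketch in plain Python. The p-values are made up for illustration, and the function names are hypothetical; the workbench's own implementation may differ.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.01, 0.02, 0.04, 0.5]  # hypothetical per-gene p-values
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

for p, b, f in zip(pvals, bonf, bh):
    print(f"p={p:<6} Bonferroni={b:.4f} BH={f:.4f}")
```

For every gene the Benjamini-Hochberg value is no larger than the Bonferroni value, which is why Bonferroni reports fewer significant genes at the same cutoff.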

Name log2FC P-value FDR
No data to display

Find marker genes

This option shows the marker genes for each group of cells, as defined by the clustering method used. You can change the number of genes shown for each cluster by adjusting N.
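The effect of N can be sketched as a simple "top N per cluster" ranking. The gene symbols, scores and the function name below are invented for illustration; how the workbench scores marker genes is not specified here.

```python
def top_markers(scores_by_cluster, n):
    """Return the n highest-scoring gene symbols for every cluster."""
    top = {}
    for cluster, gene_scores in scores_by_cluster.items():
        ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
        top[cluster] = [gene for gene, _ in ranked[:n]]
    return top


# Hypothetical marker scores per cluster (higher means more specific).
scores = {
    "0": {"Sox2": 8.1, "Pax6": 5.4, "Actb": 0.3},
    "1": {"Myo7a": 9.7, "Pou4f3": 7.2, "Sox2": 1.1},
}
markers = top_markers(scores, n=2)
print(markers)
```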


Top ranked genes per cluster (click to select genes of interest)

Marker gene visualization

Select the desired marker genes in the table above and/or type gene symbols (separated by commas) in the field below to visualize them.

  • Unique marker genes selected in table: 0
  • Unique marker genes manually entered: 0
  • Total unique genes selected: 0
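The three counters above can be thought of as set sizes. The sketch below assumes gene symbols are compared case-sensitively and that empty entries between commas are ignored; both are assumptions, and the gene names are placeholders.

```python
def parse_gene_field(text):
    """Split a comma-separated string into a set of non-empty gene symbols."""
    return {symbol.strip() for symbol in text.split(",") if symbol.strip()}


table_selection = {"Sox2", "Pax6"}  # hypothetical genes clicked in the table
manual_entry = parse_gene_field("Pax6, Myo7a , ,Atoh1")

# A gene chosen both ways is only counted once in the total.
total_unique = table_selection | manual_entry

print("Selected in table:", len(table_selection))
print("Manually entered:", len(manual_entry))
print("Total unique:", len(total_unique))
```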

Save New Gene Collection

Enter name to save selected genes as a genecart.

Labeled tSNE

Enter a gene of interest to see its tSNE colored both by expression and cluster / cell type.

Unable to load this image. Perhaps that gene is not found in this dataset?

Clustering (Louvain)


Group Num Cells Markers New label Keep

Louvain clustering is used to find the most likely groups of associated cells within a network; the groups are color-coded in the plot. The number of neighbors affects the smallest possible cluster size. For example, if you are interested in groups that are all larger than 20 cells (based on the gene coloring in the initial PCA), you can try 6, 10 or 15 neighbors. If instead this is a smaller dataset of regular RNA-seq with only biological triplicates, starting with two neighbors makes more sense, because the smallest ‘natural’ group should be 3 replicates. Similarly, if some of the populations in a single cell dataset are very small, a low value such as 3 neighbors could be a useful approach. Keep in mind that the smaller the number of neighbors, the larger the number of clusters.
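The link between the number of neighbors and the number of groups can be illustrated with a toy neighbor graph. This sketch does not run Louvain; it only builds a k-nearest-neighbor graph over made-up points and counts connected components as a stand-in for "groups that could possibly be separated". All names and coordinates are hypothetical.

```python
import math


def knn_edges(points, k):
    """Undirected edges linking every point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def count_components(n_points, edges):
    """Count connected groups with a small union-find."""
    parent = list(range(n_points))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(n_points)})


# Eight hypothetical cells placed along one axis in four tight pairs.
points = [(x, 0.0) for x in (0, 1, 3, 4, 10, 11, 13, 14)]
components = {
    k: count_components(len(points), knn_edges(points, k)) for k in (1, 2, 4)
}
print(components)
```

With few neighbors the graph falls apart into many small pieces; as the number of neighbors grows, the pieces merge, which mirrors the behaviour described above.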

The resolution determines how granular the clustering will be. It is set to 1.3 by default. To decrease the resolution (fewer, larger clusters), lower the value, for example to 1. To increase the resolution (more, smaller clusters), raise it.
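One way to see what the resolution does is to score a few fixed partitions of a tiny graph with a resolution-scaled modularity, Q = sum over clusters of (internal edges / m) - resolution * (cluster degree / 2m)^2. Whether the workbench uses exactly this quality function is an assumption; the graph and partition names are invented for illustration.

```python
def modularity(edges, communities, resolution=1.0):
    """Resolution-scaled modularity of a partition of an unweighted graph."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    q = 0.0
    for members in communities:
        members = set(members)
        internal = sum(1 for a, b in edges if a in members and b in members)
        total_degree = sum(degree.get(node, 0) for node in members)
        q += internal / m - resolution * (total_degree / (2 * m)) ** 2
    return q


# Two triangles of cells joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partitions = {
    "one group": [[0, 1, 2, 3, 4, 5]],
    "two triangles": [[0, 1, 2], [3, 4, 5]],
    "singletons": [[0], [1], [2], [3], [4], [5]],
}


def best_partition(resolution):
    """Name of the candidate partition with the highest score."""
    return max(
        partitions,
        key=lambda name: modularity(edges, partitions[name], resolution),
    )


for gamma in (0.2, 1.3, 3.0):
    print(gamma, best_partition(gamma))
```

Low values favor one large group, intermediate values favor the two triangles, and high values favor splitting everything apart.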

tSNE / UMAP

Choose one (or both). What's the difference?

tSNE
UMAP (faster)

This non-linear dimensionality reduction tool is used to visually group cells that are similar to each other. The number of principal components to include depends on the result of the PCA step. You can vary the number of principal components used and see how it changes the display. The default recommendation is to look for the point on the PCA curve beyond which additional components add minimal variation.
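The "point of minimal added variation" can be approximated with a simple rule: keep leading components until one contributes less than a chosen share of the variance. The variance ratios and the 1% cutoff below are hypothetical, and visual inspection of the curve remains the recommended approach.

```python
def choose_n_pcs(variance_ratios, min_gain=0.01):
    """Count leading components that each explain at least min_gain of variance."""
    n = 0
    for ratio in variance_ratios:
        if ratio < min_gain:
            break
        n += 1
    return n


# Hypothetical fraction of variance explained by PC1, PC2, ...
ratios = [0.30, 0.15, 0.08, 0.04, 0.02, 0.008, 0.007, 0.006]
n_pcs = choose_n_pcs(ratios, min_gain=0.01)
print("Suggested number of principal components:", n_pcs)
```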

Principal Component Analysis (PCA)




The principal component analysis indicates the groups of genes that contribute most to the variability in gene expression in the dataset. For example, PC1 will have the largest effect in dividing the dataset into subtypes of cells or samples. Each principal component is composed of groups of “related” genes. Visualizing your principal component graph and knowing which genes contribute to it (listed in the table) is important for the next step. The number of principal components included will affect your tSNE (t-distributed stochastic neighbor embedding) plot.

Identify highly-variable genes



Options below are ignored if N top genes is used



Next we will use dimensionality reduction methods to look for structure within the data. Before doing that, we want to filter the large gene list down to those genes that are more likely to represent the biologically important variability between cells. Several parameters can be adjusted here to set the sensitivity versus stringency of which genes are included, and you may find that trying different parameters helps you identify new features of the dataset.

As a brief description of the plot, the x-axis represents the average expression of each gene across the dataset. The y-axis is a measure called dispersion, which indicates the variance of that gene across the dataset. The workbench limits the maximum number of highly variable genes to 2,000. Increasing the Min mean raises the minimum expression value a gene must have to be considered highly variable. We suggest that you change the parameters and observe the plot to guide your selection. When you have finished this step, press ‘save these genes’.
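The selection logic can be sketched as follows, using a simplified dispersion (variance divided by mean). Real pipelines typically normalize dispersion within expression bins, so this is an illustration under stated assumptions rather than the workbench's method. The parameter names min_mean, min_disp and n_top, and the expression values, are hypothetical.

```python
from statistics import mean, pvariance


def highly_variable(expression, min_mean=0.0, min_disp=0.0, n_top=None):
    """Select genes from a dict mapping gene -> list of per-cell values."""
    stats = {}
    for gene, values in expression.items():
        mu = mean(values)
        if mu <= 0:
            continue  # dispersion is undefined for unexpressed genes
        stats[gene] = (mu, pvariance(values) / mu)
    if n_top is not None:
        # When a top-N count is given, the threshold options are ignored.
        ranked = sorted(stats, key=lambda g: stats[g][1], reverse=True)
        return ranked[:n_top]
    return [
        gene
        for gene, (mu, disp) in stats.items()
        if mu >= min_mean and disp >= min_disp
    ]


expression = {
    "GeneA": [0.0, 0.0, 10.0, 10.0],  # high mean, high dispersion
    "GeneB": [5.0, 5.0, 5.0, 5.0],    # high mean, no variability
    "GeneC": [0.0, 0.0, 0.0, 1.0],    # low mean
    "GeneD": [2.0, 4.0, 2.0, 4.0],    # moderate mean and dispersion
}
by_threshold = highly_variable(expression, min_mean=1.0, min_disp=0.3)
by_top_n = highly_variable(expression, min_mean=1.0, n_top=2)
print(by_threshold, by_top_n)
```

Note how raising the minimum mean excludes the weakly expressed GeneC, while the top-N mode can still return it because it ranks by dispersion alone.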

QC by mitochondrial content

Filtered shape: genes x obs

No mitochondrial genes with this prefix were found. This could be real, or it could be just because this prefix is case-sensitive. Common options are mt-, Mt- or MT-. (This should be handled for you automatically in a later release.)


Press the plot button to see plots of (a) the number of genes in each cell; (b) the number of read counts per cell; and (c) the percent mitochondrial content. The general recommendation is to keep mitochondrial content below 5% (a fraction of 0.05) to focus on living cells. Based on the data in these plots you may wish to change some of the criteria in your previous step. Press ‘save these genes’ before moving to the next step.
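A minimal sketch of the mitochondrial check, treating the cutoff as a fraction of 0.05 (an assumption) and matching genes by a case-sensitive prefix. The cell names, gene symbols and counts are invented.

```python
def mito_fraction(cell_counts, prefix="mt-"):
    """Fraction of a cell's counts that come from genes starting with prefix."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in cell_counts.items() if gene.startswith(prefix))
    return mito / total


# Hypothetical raw counts per cell.
cells = {
    "cell_1": {"mt-Nd1": 2, "mt-Co1": 1, "Sox2": 97},
    "cell_2": {"mt-Nd1": 30, "mt-Co1": 10, "Sox2": 60},
}
kept = [name for name, counts in cells.items() if mito_fraction(counts) < 0.05]
print("Cells kept:", kept)

# The prefix match is case-sensitive, so the wrong case finds nothing.
print("With prefix 'MT-':", mito_fraction(cells["cell_2"], prefix="MT-"))
```

The last line shows why a prefix in the wrong case reports no mitochondrial genes even when they are present.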

Dataset:

Initial shape:

Filtered shape:

Apply filters as desired.

Exclude cells with < genes
Exclude cells with > genes
Exclude genes in < cells
Exclude genes in > cells
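The four filters above can be sketched on a small count table. This example assumes cells are filtered first and genes second, and that a gene counts as present in a cell when its count is above zero; both are assumptions, as are the parameter names and data.

```python
def filter_counts(matrix, min_genes=None, max_genes=None,
                  min_cells=None, max_cells=None):
    """Filter a dict of cell -> {gene: count}; returns a filtered copy."""

    def within(value, low, high):
        return (low is None or value >= low) and (high is None or value <= high)

    # Cell filter: number of genes detected (count > 0) in each cell.
    kept_cells = {
        cell: genes
        for cell, genes in matrix.items()
        if within(sum(1 for c in genes.values() if c > 0), min_genes, max_genes)
    }
    # Gene filter: number of remaining cells in which each gene is detected.
    cells_per_gene = {}
    for genes in kept_cells.values():
        for gene, c in genes.items():
            if c > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    # Genes never detected in a kept cell are dropped as well.
    kept_genes = {
        g for g, n in cells_per_gene.items() if within(n, min_cells, max_cells)
    }
    return {
        cell: {g: c for g, c in genes.items() if g in kept_genes}
        for cell, genes in kept_cells.items()
    }


matrix = {
    "c1": {"A": 1, "B": 2, "C": 0},
    "c2": {"A": 3, "B": 0, "C": 0},
    "c3": {"A": 1, "B": 1, "C": 4},
}
filtered = filter_counts(matrix, min_genes=2, min_cells=2)
n_genes = len({g for genes in filtered.values() for g in genes})
print("Filtered shape:", n_genes, "genes x", len(filtered), "obs")
```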

Initial composition
Genes with highest fraction of counts per cell

Action log