Would You Like Some Single Cells on Top, Perhaps?
Wednesday, October 11, 2017 - 13:46

If you are even marginally involved in molecular biology these days, single-cell technologies are everywhere, wherever you go and whatever you do: a keynote lecture at a conference, a blog, a recent journal issue grabbed from the library shelf, a Twitter feed. Everywhere! You cannot avoid them. It is no longer possible. Just the other day I called a local takeaway, and the guy taking my order asked whether I wanted some single cells on top of my pizza.

But, frankly speaking, there are some very good reasons for the exponential growth in popularity of single-cell analysis. Although I do not want to dismiss the discoveries made with bulk approaches, we need to be aware of their limitations. The principal one is that they average signals across many cells or, to put it in nice biological terms, they mix apples with oranges. Unless you are making a smoothie, that is hardly ever desirable. If we are interested in the function of neurons and therefore isolate mRNA from a brain sample, we will inevitably get, in addition to neuronal mRNA, mRNA from glial cells, vascular smooth muscle cells, and the occasional lymphocyte or neutrophil.

A way around this limitation is to use a high-resolution tool such as single-cell RNA-Seq and focus on individual cells and groups of cells of the same type. That power, alas, comes at a price. To start with, you typically spend more time QC-ing single-cell RNA-Seq data than bulk RNA-Seq data. To make that step easier, we developed a flexible, interactive tool for filtering cells based on common criteria (Figure 1).

Figure 1. Single-cell RNA-Seq QA/QC using Partek® Flow®. Violin plots enable filtering of cells by total number of reads per cell, number of detected genes per cell, or fraction of mitochondrial reads per cell. Each dot is a cell; blue dots = selected cells, black dots = excluded cells. Cells in the gray area will be filtered out and are shown as black dots on the other plots. To emphasize the distribution of cells with respect to the metric on the y-axis, pink curves (“violins”) are added (data from the study “DroNc-Seq: Single nucleus RNA-seq on mouse archived brain”, available from the Single Cell Portal of the Broad Institute).

Because an elevated fraction of reads mapping to mitochondrial genes is characteristic of broken cells (cytoplasmic mRNA leaks out of a damaged cell, while transcripts enclosed in mitochondria tend to stay behind), filtering on that metric is highly advisable. Broken cells are a known source of transcriptional noise, and including them may compromise the downstream steps.
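To make the metric concrete, here is a minimal Python sketch of a mitochondrial-fraction filter. It is not how Partek Flow implements the step; the data layout (a dict of cell ID to gene counts), the "mt-" prefix used for mouse mitochondrial gene symbols, and the 10% cut-off are all assumptions chosen for illustration.

```python
# Minimal sketch of a mitochondrial-fraction filter (not Partek Flow's code).
# Assumptions: counts are stored as {cell_id: {gene_symbol: read_count}},
# mouse mitochondrial genes start with "mt-" (human data usually uses "MT-"),
# and the 10% cut-off is illustrative only; tune it for each data set.

MITO_PREFIX = "mt-"          # hypothetical naming convention
MAX_MITO_FRACTION = 0.10     # hypothetical threshold


def mito_fraction(gene_counts: dict[str, int]) -> float:
    """Fraction of a cell's reads that map to mitochondrial genes."""
    total = sum(gene_counts.values())
    if total == 0:
        # Empty cells are left to the library-size filter.
        return 0.0
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith(MITO_PREFIX))
    return mito / total


def filter_by_mito(cells: dict[str, dict[str, int]],
                   max_fraction: float = MAX_MITO_FRACTION) -> dict[str, dict[str, int]]:
    """Keep only cells whose mitochondrial fraction is at or below the cut-off."""
    return {cell: genes for cell, genes in cells.items()
            if mito_fraction(genes) <= max_fraction}


if __name__ == "__main__":
    toy = {
        "cell_A": {"Snap25": 80, "Gapdh": 15, "mt-Co1": 5},   # 5% mitochondrial
        "cell_B": {"Snap25": 20, "Gapdh": 10, "mt-Co1": 70},  # 70%: likely broken
    }
    print(sorted(filter_by_mito(toy)))
```

With these made-up counts only cell_A survives; in practice the threshold is usually picked by looking at the distribution, which is exactly what the violin plot in Figure 1 is for.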

Total read count (library size) and the number of detected genes are interpreted in a similar way: both are used to exclude outliers. Again, excluding outliers reduces noise in the data and ensures technically homogeneous starting data. For example, an unusually high number of reads or detected genes may point to a doublet (a technical artifact in which two or more cells are counted as one).
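In Partek Flow the cut-offs are set interactively on the violin plots. As an automated alternative that is common in the field (an assumption on my part, not a description of Partek Flow), outliers can be flagged as cells lying more than a few median absolute deviations (MADs) from the median. The sketch below applies that rule to both metrics; the 3-MAD default and the toy numbers are illustrative.

```python
# Sketch: flag outlier cells by library size and number of detected genes
# using a median-absolute-deviation (MAD) rule. This swaps in a common
# automated heuristic for the interactive selection described in the post;
# n_mads = 3 is an illustrative default, not a recommendation.
from statistics import median


def mad_outliers(values: list[float], n_mads: float = 3.0) -> list[bool]:
    """True for each value lying more than n_mads (unscaled) MADs from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # The rule is undefined when the MAD is zero, so nothing is flagged.
        return [False] * len(values)
    return [abs(v - med) > n_mads * mad for v in values]


def qc_keep(library_sizes: list[float], genes_detected: list[float],
            n_mads: float = 3.0) -> list[bool]:
    """Keep a cell only if it is an outlier on neither metric."""
    lib_out = mad_outliers(library_sizes, n_mads)
    gene_out = mad_outliers(genes_detected, n_mads)
    return [not (a or b) for a, b in zip(lib_out, gene_out)]


if __name__ == "__main__":
    # Made-up values; the last cell has far more reads and genes than the rest,
    # the kind of profile a doublet might produce.
    lib = [1000, 1100, 950, 1050, 1020, 5200]
    genes = [500, 520, 480, 510, 505, 1900]
    print(qc_keep(lib, genes))
```

The rule is two-sided, so it also catches cells with suspiciously few reads. Many pipelines apply it to log-transformed counts, because library sizes tend to be right-skewed; that choice is left out here to keep the sketch short.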

Completing the QC stage gives you a clean, analysis-ready data set. The exploratory technique that has become a hallmark of single-cell RNA-Seq analysis is dimensionality reduction with t-Distributed Stochastic Neighbor Embedding (t-SNE). According to Wikipedia, t-SNE “is particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot”.
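For reference, here is the t-SNE objective as originally formulated by van der Maaten and Hinton (2008). It describes the general method, not Partek Flow's specific implementation. Here x_i is the high-dimensional expression profile of cell i, y_i is the position of that cell in the embedding, N is the number of cells, and each bandwidth σ_i is tuned to match a user-chosen perplexity.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
&\qquad
p_{ij} &= \frac{p_{j\mid i} + p_{i\mid j}}{2N}, \\[4pt]
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
&\qquad
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

Nothing in the cost C depends on whether y_i has two or three coordinates: a 3D t-SNE simply gives each cell one more coordinate to optimize, and therefore more room to keep dissimilar groups apart.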

The key point here, which is easily overlooked, is three dimensions. A quick Google Images search for “single-cell t-SNE” returns plenty of nice scatter plots, and if you glance through them for just a second you will spot the common denominator: they are all 2D t-SNEs. OK, but why am I making such a fuss about it?

Let us take as an example a t-SNE chart based on the aforementioned data set (Figure 2, left). We can spot several cell clusters, including a large one at 12 o’clock. If you were working with a standard 2D t-SNE plot, you would call it a single cluster (zooming in does not help; I tried).

However, with the unique Partek 3D t-SNE chart, the true biology emerges as soon as you start rotating the plot (Figure 2, right): what appeared to be one cell group is actually three clusters. You would have completely missed this with a 2D plot alone. This functionality is crucial when you are looking for rare cell populations, which is what many single-cell studies are essentially about.
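Why can rotation reveal clusters that a fixed view hides? The toy Python script below illustrates the geometry. It is not t-SNE and not how Partek Flow renders its chart: it generates three made-up clusters that differ only along the third axis, then compares how far apart their centres are in 3D, in a fixed 2D view (x, y), and in a view rotated 90 degrees about the y axis (z, y). All coordinates, cluster sizes, and spreads are hypothetical.

```python
# Toy illustration of the projection problem (not t-SNE itself): three clusters
# that are well separated in 3D collapse onto one spot when the z axis is
# dropped, and separate again in a rotated (side) view. All numbers are made up.
import random
from math import dist

random.seed(0)  # fixed seed so the toy data are reproducible


def make_cluster(cx: float, cy: float, cz: float, n: int = 50, spread: float = 0.5):
    """n Gaussian points around (cx, cy, cz)."""
    return [(random.gauss(cx, spread), random.gauss(cy, spread), random.gauss(cz, spread))
            for _ in range(n)]


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def min_centroid_gap(groups):
    """Smallest distance between any two group centroids."""
    cs = [centroid(g) for g in groups]
    return min(dist(a, b) for i, a in enumerate(cs) for b in cs[i + 1:])


# Three hypothetical clusters at "12 o'clock" (y = 10) that differ only in z.
clusters = [make_cluster(0, 10, z) for z in (-6, 0, 6)]

in_3d = min_centroid_gap(clusters)
in_2d = min_centroid_gap([[(x, y) for x, y, _ in g] for g in clusters])  # front view
side = min_centroid_gap([[(z, y) for _, y, z in g] for g in clusters])   # rotated view

print(f"closest centroids  3D: {in_3d:.2f}  front view: {in_2d:.2f}  side view: {side:.2f}")
```

In the front view the three centroids nearly coincide, so the points form what looks like one blob, while the side view recovers a gap of roughly 6 units, the same separation as in 3D.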



Figure 2. 3D t-SNE chart in Partek Flow. Only a single cell cluster (arrowhead) is visible at the top of the chart in the 2D projection (on the left). Rotating the plot in the 3D projection (on the right) reveals that there are actually three clusters (arrowheads) that would otherwise have been missed. Each dot is a single cell (data from the study “DroNc-Seq: Single nucleus RNA-Seq on mouse archived brain”, available from the Single Cell Portal of the Broad Institute).

Thus, you may have a groundbreaking hypothesis, a flawless study design, and raw data as sound as it gets, but without the right analysis tool you may not get the answer you are looking for.

To find out more about the analysis of single-cell RNA-Seq data and new features coming to Partek Flow, please take a look at our recent webinar.

Oh, right, in case you are interested in how the takeaway story ended: I passed. For any kind of “omics”, I always choose Partek; when it comes to pizza, I go for cheese.