Single Cell Resource Guide

Welcome! Though single cell analysis has been around for some time, it can be challenging to understand and perform. This collection of resources has been gathered to help. We hope you find these resources useful.

If you think something could be added to this page to make it more useful, please contact us.


Single Cell Analysis Process

This section gives an overview of the single cell data analysis steps, with tips for each.

Pre-processing

Description: An umbrella term encompassing the steps starting with sequencing output and typically ending with analysis-ready data
Primary Output: Quantified gene/protein matrix file containing the number of reads per gene and per cell
Steps:
  • Alignment
  • Quantification
  • Filtering barcodes
  • Normalization
  • Batch correction

Cell Type Discovery

Description: One of the most common goals of single cell analysis, aiming to detect previously undescribed cell populations
Primary Output: Phenotypic description of the new cell type, e.g. unique gene expression signature
Steps:
  • Scatter plot (t-SNE, UMAP)
  • Clustering
  • Trajectory analysis
  • Biomarker detection

Differential Analysis

Description: Statistical comparison of the study samples
Primary Output:
  • Feature-based: List of genes (or other features) that are expressed at different levels between the samples
  • Cell-based: List of cell types that are present in different quantities between the samples
Steps:
  • Statistical testing

Biological Interpretation

Description: A set of analysis procedures, which relate results of statistical analysis with domain knowledge
Primary Output: List of gene sets (groups) with different expression levels between the study samples
Steps:
  • Enrichment
  • Differential analysis

Tips

Alignment

  • STAR works well for splice-aware alignment and is commonly cited in publications
  • Alternatively, any splice-aware aligner can be used

Quantification

  • Ensembl is the most frequently used source of gene annotation models

Barcode Filtering

  • Balance between high-quality barcodes and maximizing their number
  • Experimental goal drives choice between quality or quantity
  • If the cutoff suggested by the algorithm contains too few barcodes, use the number of cells as a guide
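
The trade-off described above can be sketched roughly as follows. The function name, the count cutoff, and the toy barcodes are all hypothetical and chosen only for illustration; production pipelines typically use more elaborate knee-detection methods. The sketch keeps barcodes above a count cutoff and, if that leaves fewer barcodes than the expected number of cells, falls back to the top-ranked barcodes instead.

```python
def filter_barcodes(counts_per_barcode, min_count, expected_cells):
    """Keep barcodes with at least `min_count` reads; if that keeps fewer
    than `expected_cells`, fall back to the top `expected_cells` barcodes."""
    # Rank barcodes from highest to lowest total count.
    ranked = sorted(counts_per_barcode.items(), key=lambda kv: kv[1], reverse=True)
    kept = [barcode for barcode, n in ranked if n >= min_count]
    if len(kept) < expected_cells:
        # The cutoff was too strict: use the expected cell number as a guide.
        kept = [barcode for barcode, _ in ranked[:expected_cells]]
    return kept


# Toy data (hypothetical): total read count per barcode.
toy_counts = {"AAAC": 5200, "AAAG": 4100, "AACT": 900, "AAGT": 35, "ACGT": 12}
strict = filter_barcodes(toy_counts, min_count=4000, expected_cells=3)
lenient = filter_barcodes(toy_counts, min_count=500, expected_cells=2)
print(strict)
print(lenient)
```

Whether the fallback is appropriate depends on the experimental goal, as noted above: favoring quantity keeps more low-count barcodes, favoring quality keeps fewer.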

Normalization

  • Normalization matters!
  • Carefully inspect the data distribution pre- and post-normalization
  • Keep an eye on the values, as this information may be needed downstream
  • Try different normalization transformations and different settings
  • Tip: to start with, try SCTransform
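
To illustrate what inspecting the distribution before and after normalization can look like, here is a minimal sketch. It uses a simple library-size scaling followed by log(1 + x), which is not SCTransform; the scale factor of 10,000, the function names, and the toy counts are assumptions made for the example.

```python
import math
from statistics import median


def log_normalize(cell_counts, scale=10_000):
    """Scale one cell's counts to a common total, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale) for c in cell_counts]


def describe(values):
    """Return (min, median, max) as a quick summary of the distribution."""
    return min(values), median(values), max(values)


# Toy data (hypothetical): raw counts for five genes in two cells.
cells = [[0, 3, 12, 0, 85], [1, 0, 40, 2, 157]]
for raw in cells:
    normalized = log_normalize(raw)
    print("raw       :", describe(raw))
    print("normalized:", tuple(round(v, 2) for v in describe(normalized)))
```

Recording summaries like these for each transformation you try makes it easier to compare settings and to recall the value range later in the analysis.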

 

Batch correction (optional)

  • If you suspect or know that the data are affected by batch effects, explore them thoroughly
  • Try different tools; some tools use normalized data as input, some use PCA results
  • Be aware of a possible confounding effect between biology and technology
  • Tip: general linear model offers the most flexibility if you have a complex experimental design
  • Tip: try S3 integration
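
As a rough sketch of the linear-model idea, the simplest case is a model with batch as its only factor: removing the batch term is then the same as subtracting each batch's mean and adding back the overall mean. The function name and toy values below are hypothetical, and this is not the method of any particular tool.

```python
from statistics import mean


def center_batches(values, batches):
    """Remove a per-batch offset from one gene's expression values by
    subtracting each batch's mean and adding back the overall mean."""
    overall = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Toy data (hypothetical): one gene measured in six cells from two batches.
gene = [2.0, 2.4, 2.2, 3.0, 3.4, 3.2]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = center_batches(gene, batch)
print([round(v, 2) for v in corrected])
```

If a biological factor is confounded with batch (for example, all treated cells in batch B), this adjustment removes the biological signal as well, which is the confounding risk mentioned above.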

Scatter plot

  • Manual approach: visual detection of cell groups using data exploration tools
  • Common tools include UMAP and t-SNE; UMAP is often computationally faster and often gives clearer separation of cell groups
  • Batch correction (if needed) is critical for visual detection of novel cell types
  • Tip: always use a 3D scatter plot, so that a population is not overlooked because of overlapping data points

Clustering

  • Algorithmic – based on a computational algorithm
  • Graph-based clustering is the most common tool
  • Tip: to “force” clustering into a chosen number of clusters, try K-means clustering
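
To show how K-means produces exactly the requested number of clusters, here is a minimal, dependency-free sketch on 2D points (for example, embedding coordinates). The naive initialization from the first k points, the function name, and the toy data are assumptions for illustration; library implementations use better initialization and convergence checks.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain K-means on 2D points; returns one cluster label per point."""
    centers = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Toy data (hypothetical): two visually separate groups of cells.
points = [(0.0, 0.1), (5.0, 5.2), (0.2, 0.0), (5.1, 4.9), (0.1, 0.3), (4.8, 5.0)]
labels_k2 = kmeans(points, k=2)
labels_k3 = kmeans(points, k=3)
print(labels_k2)
print(labels_k3)
```

Asking for three clusters splits one of the two groups even though the data do not suggest it, which is what "forcing" a cluster number means in practice.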

Trajectory analysis

  • Algorithmic – based on a computational algorithm
  • Branches of a trajectory tree can be interpreted as cell populations

Biomarker detection

  • Differential expression can be performed between cell populations to identify group-specific biomarkers
  • Tip: interpret the biomarker list to gain functional insight into cell groups

Statistical testing

  • Perform differential testing with your biological question in mind
  • Add relevant experimental factors to your model, including control factors
  • Tip: if the data show evidence of a batch effect, include batch in the model
  • Features expressed at a very low level (~background) may be removed before testing, to de-noise the data
  • When choosing the low-expression filter threshold, don’t simply use the defaults; always make an informed decision based on exploratory analysis and descriptive statistics
  • No two tools will produce exactly the same result, and this is expected; however, feature lists should be highly comparable
  • False discovery rate (FDR) is affected by the number of tests performed in parallel. Removing features expressed at a very low level, as well as irrelevant features (e.g., certain gene biotypes), can boost your statistical power (if you are removing features, your decisions need to be justified!)
  • Tip: try the hurdle model (for bulk RNA-Seq, try DESeq2)
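
The effect of the number of tests on FDR can be illustrated with the Benjamini-Hochberg procedure, one common FDR method (the tips above do not say which method a given tool uses). The p-values below are made up for the example, and the "filtered" list simply pretends that the last four features were removed as low-expression features before testing.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical): eight features tested in parallel.
all_features = [0.001, 0.008, 0.02, 0.6, 0.7, 0.8, 0.9, 0.95]
filtered = all_features[:4]  # pretend the last four were filtered out

adj_all = benjamini_hochberg(all_features)
adj_filtered = benjamini_hochberg(filtered)
print([round(p, 4) for p in adj_all[:3]])
print([round(p, 4) for p in adj_filtered[:3]])
```

With fewer tests, the same raw p-values receive smaller adjusted values. The filter has to be chosen before looking at the test results, otherwise the adjustment is no longer valid.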

Gene sets

  • Typically: Gene ontology categories (GO enrichment) or pathways (pathway enrichment)
  • Tip: the Broad Institute hosts several interesting gene sets, which can be used, e.g., to look for miRNAs targeting significant genes or transcription factors controlling their expression

Enrichment

  • Enrichment-based methods start with a list of significant genes (features) and produce a list of enriched gene sets
  • When interpreting enrichment results, focus on ranking: prioritize the gene sets at the top of the list
  • Tip: if a cut-off value is preferred, use an enrichment score of 3.0 or higher
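
A minimal sketch of an enrichment test follows, using a one-sided hypergeometric test for over-representation. Two things here are assumptions rather than statements from this guide: that the tool uses this kind of test, and that the enrichment score is defined as the negative natural log of the enrichment p-value. Under that assumed definition, a score of 3.0 corresponds to a p-value of about 0.05. All counts are invented for the example.

```python
import math


def hypergeom_pvalue(n_universe, n_in_set, n_significant, n_overlap):
    """Probability of seeing at least `n_overlap` gene-set members among
    `n_significant` genes drawn at random from `n_universe` genes."""
    total = math.comb(n_universe, n_significant)
    upper = min(n_in_set, n_significant)
    tail = sum(
        math.comb(n_in_set, k) * math.comb(n_universe - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 1000 genes tested, 40 significant,
# a gene set with 50 members, 6 of which are significant.
p_value = hypergeom_pvalue(n_universe=1000, n_in_set=50, n_significant=40, n_overlap=6)
score = -math.log(p_value)  # ASSUMED definition of "enrichment score"
print(p_value, score)
print(math.exp(-3.0))  # p-value matching a score of 3.0 under this assumption
```

Check how your own tool defines its enrichment score before applying a numeric cut-off.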

Differential analysis

  • Methods based on differential analysis start with gene (feature) expression values, combine genes into sets, and then perform statistical analysis between the study samples. The output is a p-value and a fold change for each set
  • Enrichment-based methods have an inherent bias (as cut-off criteria such as p-value and fold change are arbitrary), which does not affect methods based on differential expression
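
The set-level approach described above can be sketched as follows: average the expression of a set's genes within each sample, then compare the resulting scores between groups. This only illustrates the fold-change part; the p-value would come from a statistical test that is omitted here. Averaging as the way to combine genes, the gene names, the group labels, and the log2-scale toy values are all assumptions for the example.

```python
from statistics import mean


def set_scores(expression, gene_set):
    """Average the log2 expression of a set's genes within each sample."""
    genes = [g for g in gene_set if g in expression]
    n_samples = len(expression[genes[0]])
    return [mean(expression[g][s] for g in genes) for s in range(n_samples)]


def log2_fold_change(scores, groups, case, control):
    """Difference of group means on the log2 scale (a log2 ratio)."""
    case_mean = mean(x for x, g in zip(scores, groups) if g == case)
    control_mean = mean(x for x, g in zip(scores, groups) if g == control)
    return case_mean - control_mean


# Toy data (hypothetical): log2 expression of three genes in four samples.
expression = {
    "GeneA": [1.0, 1.2, 3.0, 3.2],
    "GeneB": [2.0, 2.2, 4.0, 4.2],
    "GeneC": [5.0, 5.0, 5.0, 5.0],
}
groups = ["control", "control", "treated", "treated"]

scores = set_scores(expression, ["GeneA", "GeneB"])
lfc = log2_fold_change(scores, groups, case="treated", control="control")
print(scores, lfc)
```

Because every gene in the set contributes to the score, no significance cut-off on individual genes is needed, which is the difference from enrichment-based methods noted above.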

Partek Flow Single Cell Bioinformatics Training Series

 

This comprehensive series provides three hours of single cell content broken into bite-size videos. Learn step-by-step how to perform single cell analysis and receive expert tips throughout.

Single Cell mRNA-Seq and Protein NGS Assays

This table outlines the most common single cell assays and their key details.

Assay | Vendor | Type of Single Cell Isolation Method | Measurements | Coverage | Short Description
10x Chromium 3′ gene expression | 10x Genomics® | Droplet | mRNA | 3′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed and the 3′ ends of mRNA transcripts are captured to create barcoded cDNA libraries for sequencing
10x Chromium 3′ gene expression + Feature barcoding | 10x Genomics | Droplet | mRNA + Protein | 3′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed, BioLegend® TotalSeq™-B barcode-conjugated antibodies are attached to cell surface proteins, and the 3′ ends of mRNA transcripts and feature barcodes are captured to create barcoded cDNA libraries for sequencing
10x Chromium 5′ gene expression | 10x Genomics | Droplet | mRNA | 5′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed and the 5′ ends of mRNA transcripts are captured to create barcoded cDNA libraries for sequencing
10x Chromium 5′ gene expression + Feature barcoding | 10x Genomics | Droplet | mRNA + Protein | 5′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed, BioLegend® TotalSeq™-C barcode-conjugated antibodies are attached to cell surface proteins, and the 5′ ends of mRNA transcripts and feature barcodes are captured to create barcoded cDNA libraries for sequencing
10x Chromium Visium Spatial Gene Expression | 10x Genomics | Tissue slide | mRNA + histology + spatial coordinates | 3′ | Tissue slices are histologically stained and imaged on a Visium tissue slide. Barcoded tissue spots on the slide capture mRNA from cells to create barcoded cDNA libraries for sequencing
BD Rhapsody™ Targeted mRNA | BD® Biosciences | Microwell | mRNA | 3′ | Single cells are paired with barcoded magnetic capture beads in microwells. Cells are lysed and the 3′ ends of mRNA transcripts from a validated panel of genes are captured. The beads are retrieved and barcoded cDNA libraries are created for sequencing
BD Rhapsody™ Whole Transcriptome Analysis (WTA) | BD Biosciences | Microwell | mRNA | 3′ | Single cells are paired with barcoded magnetic capture beads in microwells. Cells are lysed and the 3′ ends of all mRNA transcripts are captured. The beads are retrieved and barcoded cDNA libraries are created for sequencing
BD Rhapsody™ Targeted mRNA + AbSeq | BD Biosciences | Microwell | mRNA + Protein | 3′ | Single cells are labeled with barcoded conjugated antibodies and paired with barcoded magnetic capture beads in microwells. Cells are lysed and the 3′ ends of mRNA transcripts from a validated panel of genes and antibody barcodes are captured. The beads are retrieved and barcoded cDNA libraries are created for sequencing
Fluidigm C1™ mRNA Seq HT IFC | Fluidigm® | Integrated fluidic circuit | mRNA | 3′ | Single cells are separated into an integrated fluidic circuit with 20 columns x 40 rows (800 capture sites). Cells are lysed in each capture site and the transcripts are processed to create uniquely barcoded cDNA libraries for each single cell
SureCell™ WTA 3′ | Illumina®/Bio-Rad® | Droplet | mRNA | 3′ (strand-specific) | Single cells are encapsulated into droplets, lysed, and barcoded. Barcoded cDNA is pooled for second-strand synthesis. Libraries are generated with direct cDNA tagmentation followed by 3′ enrichment, sample indexing, and downstream sequencing
CosMx™ SMI | NanoString | Tissue on slides | mRNA + protein + histology | Panel specific | The Spatial Molecular Imager quantifies RNAs and proteins using a smart cyclic in situ hybridization chemistry
Evercode™ WT v2 | Parse Biosciences | Cell or nucleus is the reaction vessel | mRNA | 3′ (captures regions that tile across the transcript) | Barcodes are appended to each transcript via split-pool combinatorial barcoding prior to standard library preparation and sequencing
MERFISH | Vizgen | Tissue on slides | mRNA + histology | Panel specific | The spatial distribution of RNA is visualized and quantified by fluorescence microscopy using custom probes
DropSeq | Open source, although commercial implementations exist (e.g. DolomiteBio®) | Droplet | mRNA | 3′ | Single cells are encapsulated into droplets with a barcoded microbead and reagents. Cells are lysed and the 3′ ends of mRNA transcripts are captured to create barcoded cDNA libraries for sequencing
SmartSeq2 | Open source, although commercial implementations exist (e.g. Takara Bio®) | Various (e.g. manual pipetting, FACS, Fluidigm C1™) | mRNA | Full-length | Single cells are separated into wells and lysed. Full-length cDNA libraries are constructed and tagmented for each cell prior to short-read sequencing


Tips and Tricks

This is a collection of blog posts and articles about single cell analysis.

How to select the best single cell quality control thresholds
The answer no one wants to hear

Using trajectory analysis to study cellular differentiation in single cell RNA-Seq experiments
Using trajectory analysis to determine their fate

Tissue transcriptomics—what’s the big deal and why you should do it
Transcriptome-wide studies of gene expression certainly provide invaluable insight into biology on a molecular level, particularly when performed at the single-cell level

Less is more: detecting differential gene expression in single cell RNA-Seq analysis
Which tools to use for single cell analysis

Batch remover for single cell data
Can nuisance batch effects or undesirable numeric or categorical factors be removed?

How to perform single cell RNA sequencing: exploratory analysis
Step one in performing single cell analysis

Bioinformatics approach to spatially resolved transcriptomics
A review of spatial transcriptomic analysis

Frequently Asked Questions About Single Cell Data

Single cell RNA sequencing (scRNA-Seq) analyzes gene expression at the individual cell level. By sequencing the transcriptome of single cells, it can reveal the molecular diversity of cells within a tissue or organism. For example, scRNA-Seq can be used to identify differentially expressed genes between cells and clusters of cells, which allows distinct cell populations and their gene expression profiles to be identified, or to reveal transcriptional dynamics, such as changes in gene expression over time or in response to external stimuli. In addition, it can be used to identify novel cell types and their functions, study rare cell types, or reconstruct cellular trajectories and infer developmental pathways. Overall, scRNA-Seq analysis provides insights into cellular heterogeneity, gene expression regulation, cell type identification, and cellular functions, which can lead to new discoveries.

RNA sequencing is a high-throughput technique used to quantify gene expression by sequencing RNA molecules. It is commonly used to analyze gene expression patterns across different conditions or cell types and involves converting RNA molecules into cDNA fragments, which are then sequenced using next-generation sequencing technologies. Single cell RNA sequencing is a specialized form of RNA sequencing that analyzes gene expression at the single cell level. The major difference between the two is the level of resolution: bulk RNA-Seq measures a mixture of many cells, whereas scRNA-Seq measures individual cells, which allows researchers to identify and analyze gene expression patterns in specific cell types and gain a more detailed understanding of cellular processes and gene regulation.

scRNA-Seq analysis offers several advantages over traditional bulk RNA-Seq. Some of these advantages include:

  • Identifying rare cell types
  • Understanding cellular heterogeneity
  • Capturing cell-to-cell variation
  • Deconvoluting tissue-specific gene expression
  • Reducing batch effects

Single cell analysis studies the properties of individual cells rather than of cells in bulk. This technique has applications in various areas, such as:

  • Understanding cellular heterogeneity
  • Disease diagnosis and prognosis
  • Drug discovery and development
  • Immunology research
  • Neuroscience research
  • Developmental biology

Kathi Gosche