Single Cell Resource Guide

Welcome! Though single cell analysis has been around for some time, it can be challenging to understand and perform. This collection of resources has been gathered to help. We hope you find these resources useful.

If you think something could be added to this page to make it more useful, please contact us.


Single Cell Analysis Process

Here we give an overview of the single cell data analysis steps, with tips for each.

Pre-processing

  • Description: An umbrella term encompassing the steps starting with sequencing output and typically ending with analysis-ready data
  • Primary output: Quantified gene/protein matrix file containing the number of reads per gene and per cell
  • Steps: Alignment, Quantification, Filtering barcodes, Normalization, Batch correction

Cell Type Discovery

  • Description: One of the most common goals of single cell analysis, aiming to detect previously undescribed cell populations
  • Primary output: Phenotypic description of the new cell type, e.g. unique gene expression signature
  • Steps: Scatter plot (t-SNE, UMAP), Clustering, Trajectory analysis, Biomarker detection

Differential Analysis

  • Description: Statistical comparison of the study samples
  • Primary output (feature-based): List of genes (or other features) that are expressed at different levels between the samples
  • Primary output (cell-based): List of cell types that are present in different quantities between the samples
  • Steps: Statistical testing

Biological Interpretation

  • Description: A set of analysis procedures, which relate results of statistical analysis with domain knowledge
  • Primary output: List of gene sets (groups) with different expression levels between the study samples
  • Steps: Enrichment, Differential analysis

Tips

Alignment

  • STAR works well for splice-aware alignment and is commonly cited in publications
  • Alternatively, any splice-aware aligner can be used

Quantification

  • Ensembl is the most frequently used source of gene annotation models

Barcode Filtering

  • Balance between high-quality barcodes and maximizing their number
  • Experimental goal drives choice between quality or quantity
  • If the cutoff suggested by the algorithm retains too few barcodes, use the expected number of cells as a guide
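
The trade-off between quality and quantity can be sketched in a few lines of Python. This is a minimal illustration, not the filtering algorithm of any particular tool: the function name, its two parameters (`expected_cells`, `min_counts`), and the toy barcodes are all hypothetical.

```python
def filter_barcodes(counts_per_barcode, expected_cells=None, min_counts=None):
    """Return the barcodes to keep, ranked by total count (highest first).

    counts_per_barcode: dict mapping barcode -> total read/UMI count.
    expected_cells:     keep at most this many top-ranked barcodes (hypothetical knob).
    min_counts:         drop barcodes below this total count (hypothetical knob).
    """
    ranked = sorted(counts_per_barcode.items(), key=lambda kv: kv[1], reverse=True)
    if min_counts is not None:
        ranked = [(bc, n) for bc, n in ranked if n >= min_counts]
    if expected_cells is not None:
        ranked = ranked[:expected_cells]
    return [bc for bc, _ in ranked]


# Toy data: three cell-like barcodes and three near-empty ones (made-up values).
toy_counts = {"AAAC": 5200, "AAAG": 4100, "AACT": 3900, "AAGT": 40, "ACGT": 25, "AGGT": 10}

strict = filter_barcodes(toy_counts, min_counts=4000)   # favors quality
guided = filter_barcodes(toy_counts, expected_cells=3)  # guided by the number of cells
print(strict)
print(guided)
```

Here the strict count threshold keeps fewer barcodes than the cell-number guide does, which is the situation the last tip describes.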

Normalization

  • Normalization matters!
  • Carefully inspect data distribution pre- and post-normalization
  • Keep an eye on the scale of the normalized values (for example, whether they are log-transformed), as this information may be needed downstream
  • Try different normalization transformations and different settings
  • Tip: to start with, try SCTransform
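
To make "inspect the distribution pre- and post-normalization" concrete, here is a small Python sketch. It does not implement SCTransform; it swaps in a much simpler transformation (scaling each cell to 10,000 total counts, then log(1 + x)) purely for illustration. The function names, the scale factor, and the toy matrix are assumptions.

```python
import math
import statistics


def normalize_log_cp10k(cell_counts, scale=10_000):
    """Scale one cell's counts to `scale` total counts, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0 for _ in cell_counts]
    return [math.log1p(c * scale / total) for c in cell_counts]


def describe(values):
    """A few descriptive statistics for comparing distributions."""
    return {"min": min(values), "median": statistics.median(values), "max": max(values)}


# Toy matrix: rows are cells, columns are genes (made-up numbers).
raw = [
    [0, 3, 10, 87],
    [1, 0, 40, 159],
    [0, 6, 20, 174],
]
normalized = [normalize_log_cp10k(cell) for cell in raw]

before = describe([v for cell in raw for v in cell])
after = describe([v for cell in normalized for v in cell])
print("raw:       ", before)
print("normalized:", after)
```

Because the output is on a natural-log scale, any downstream step that expects raw counts or a different log base would need to know that.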

Batch correction (optional)

  • If you suspect/know that the data is burdened by batch effects, perform thorough exploration
  • Try different tools; some tools use normalized data as input, some use PCA results
  • Be aware of a possible confounding effect between biology and technology
  • Tip: general linear model offers the most flexibility if you have a complex experimental design
  • Tip: try S3 integration
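
As a minimal sketch of the linear-model idea, the Python below handles the simplest possible case: one feature and batch as the only factor. Removing the fitted batch term then reduces to subtracting each batch mean and adding back the overall mean. The function name and the values are hypothetical, and a real design would include the biological factors in the same model.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_means(values, batches):
    """Residuals of the model value ~ batch, shifted back to the overall mean."""
    grand = mean(values)
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + grand for v, b in zip(values, batches)]


# One feature measured in six cells from two sequencing runs (made-up values).
expr = [2.0, 2.4, 2.2, 3.0, 3.4, 3.2]
batch = ["run1", "run1", "run1", "run2", "run2", "run2"]
corrected = remove_batch_means(expr, batch)
print(corrected)
```

This also shows why confounding matters: if all treated cells were in run1 and all controls in run2, the same subtraction would remove the biological difference along with the batch effect.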

Scatter plot

  • Manual approach: visual detection of cell groups using data exploration tools
  • Common tools include UMAP and t-SNE; UMAP is generally faster to compute and often gives clearer separation of cell groups
  • Batch correction (if needed) is critical for visual detection of novel cell types
  • Tip: always check a 3D scatter plot, so as not to overlook a population hidden by superimposed data points

Clustering

  • Algorithmic – based on a computational algorithm
  • Graph-based clustering is the most common tool
  • Tip: to “force” the data into a chosen number of clusters, try K-means clustering
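
The K-means tip can be illustrated with a short pure-Python sketch of Lloyd's algorithm. It is for illustration only: the farthest-first starting rule is an assumption chosen to keep the example deterministic, and the 2D points stand in for cells in a reduced-dimension space.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first start.

    Returns (labels, centers). Assumes 1 <= k <= number of distinct points.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next start: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))

    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Six toy "cells" forming two visually obvious groups (made-up coordinates).
cells = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(cells, k=2)
print(labels)
```

Calling `kmeans(cells, k=3)` on the same points still returns three clusters, which is what "forcing" a cluster number means: the algorithm delivers k groups whether or not the data support them.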

Trajectory analysis

  • Algorithmic – based on a computational algorithm
  • Branches of a trajectory tree are basically cell populations

Biomarker detection

  • Differential expression can be performed between cell populations to identify group-specific biomarkers
  • Tip: interpret the biomarker list to gain functional insight into cell groups

Statistical testing

  • Perform differential testing with your biological question in mind
  • Add relevant experimental factors to your model, including control factors
  • Tip: if the data show evidence of a batch effect, include the batch in the model
  • Features expressed at very low level (~background) may be removed before testing, to de-noise the data
  • When choosing the low threshold filter, don’t simply use the defaults; always make an informed decision based on exploratory analysis and descriptive statistics
  • No two tools will produce exactly the same result and this is expected; however, feature lists should be highly comparable
  • False discovery rate (FDR) is affected by the number of tests performed in parallel. Removing features expressed at a very low level, as well as irrelevant features (e.g., selected by gene biotype), will boost your statistical power (if you are removing features, your decisions need to be justified!)
  • Tip: try the hurdle model (for bulk RNA-seq try DESeq2)
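
The effect of the number of tests on FDR can be shown with a small Python sketch of the Benjamini-Hochberg adjustment. The text does not say which FDR procedure is meant, so Benjamini-Hochberg is an assumption here, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values: four informative features, six near-background features.
informative = [0.001, 0.004, 0.012, 0.030]
background = [0.50, 0.61, 0.72, 0.83, 0.94, 0.97]

with_all = benjamini_hochberg(informative + background)
filtered = benjamini_hochberg(informative)

print(sum(q <= 0.05 for q in with_all), "features pass at FDR 0.05 without filtering")
print(sum(q <= 0.05 for q in filtered), "features pass at FDR 0.05 after filtering")
```

In this toy case one more feature passes after filtering. The gain is only legitimate when the filter is chosen without looking at the test results, which is why the filtering decisions need to be justified.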

Gene sets

  • Typically: Gene ontology categories (GO enrichment) or pathways (pathway enrichment)
  • Tip: the Broad Institute hosts several interesting gene sets, which can be used, e.g., to look for miRNAs targeting significant genes or transcription factors controlling their expression

Enrichment

  • Enrichment-based methods start with a list of significant genes (features) and produce a list of enriched gene sets
  • When interpreting enrichment results, focus on ranking: gene sets that are at the top of the list
  • Tip: if a cut-off value is preferred, use an enrichment score of 3.0 or higher
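
For readers who want to see what an enrichment calculation can look like, here is a hedged Python sketch using a one-sided hypergeometric test. The text does not define how the enrichment score is computed; the sketch assumes it is the negative natural log of the enrichment p-value, under which a score of 3.0 corresponds to p ≈ 0.05. The function name and the gene counts are hypothetical.

```python
from math import comb, log


def enrichment_p_value(n_background, n_in_set, n_significant, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap) by chance."""
    total = comb(n_background, n_significant)
    upper = min(n_in_set, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up example: 20 background genes, a gene set of 5, 4 significant genes,
# 3 of which fall in the gene set.
p = enrichment_p_value(n_background=20, n_in_set=5, n_significant=4, n_overlap=3)
score = -log(p)  # assumed definition of the enrichment score
print(p, score)
```

Under that assumed definition, ranking gene sets by score and ranking them by p-value give the same order, so the advice to focus on the top of the list applies either way.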

Differential analysis

  • Methods based on differential analysis start with gene (feature) expression values, combine genes into sets, and then perform statistical analysis between the study samples. Output is a p-value and fold change for each set
  • Enrichment-based methods have an inherent bias (as cut-off criteria such as p-value and fold change are arbitrary), which does not affect methods based on differential expression

Single Cell Publications

These recent single cell publications all feature Partek Flow software.

Partek Flow Single Cell Bioinformatics Training Series

This comprehensive series provides three hours of single cell content broken into bite-size videos. Learn step-by-step how to perform single cell analysis and receive expert tips throughout.

Single Cell mRNA-Seq and Protein NGS Assays

This table outlines the most common single cell assays and details.

Assay | Vendor | Type of Single Cell Isolation Method | Measurements | Coverage | Short Description
10x Chromium 3′ gene expression | 10x Genomics® | Droplet | mRNA | 3′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed and the 3′ ends of mRNA transcripts are captured to create barcoded cDNA libraries for sequencing
10x Chromium 3′ gene expression + Feature barcoding | 10x Genomics | Droplet | mRNA + Protein | 3′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed, BioLegend® TotalSeq™-B barcode-conjugated antibodies are attached to cell surface proteins, and the 3′ ends of mRNA transcripts and the feature barcodes are captured to create barcoded cDNA libraries for sequencing
10x Chromium 5′ gene expression | 10x Genomics | Droplet | mRNA | 5′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed and the 5′ ends of mRNA transcripts are captured to create barcoded cDNA libraries for sequencing
10x Chromium 5′ gene expression + Feature barcoding | 10x Genomics | Droplet | mRNA + Protein | 5′ | Single cells are encapsulated into droplets with a barcoded gel bead and reagents. Cells are lysed, BioLegend® TotalSeq™-C barcode-conjugated antibodies are attached to cell surface proteins, and the 5′ ends of mRNA transcripts and the feature barcodes are captured to create barcoded cDNA libraries for sequencing
10x Chromium Visium Spatial Gene Expression | 10x Genomics | Tissue slide | mRNA + histology + spatial coordinates | 3′ | Tissue slices are histologically stained and imaged on a Visium tissue slide. Barcoded tissue spots on the slide capture mRNA from cells to create barcoded cDNA libraries for sequencing
BD Rhapsody™ Targeted mRNA | BD® Biosciences | Microwell | mRNA | 3′ | Single cells are paired with barcoded magnetic capture beads in microwells. Cells are lysed and the 3′ ends of mRNA transcripts from a validated panel of genes are captured. The beads are retrieved and barcoded cDNA libraries are created for sequencing
BD Rhapsody™ Whole Transcriptome Analysis (WTA) | BD Biosciences | Microwell | mRNA | 3′ | Single cells are paired with barcoded magnetic capture beads in microwells. Cells are lysed and the 3′ ends of all mRNA transcripts are captured. The beads are retrieved and barcoded cDNA libraries are created for sequencing
BD Rhapsody™ Targeted mRNA + AbSeq | BD Biosciences | Microwell | mRNA + Protein | 3′ | Single cells are labeled with barcoded conjugated antibodies and paired with barcoded magnetic capture beads in microwells. Cells are lysed and the 3′ ends of mRNA transcripts from a validated panel of genes and the antibody barcodes are captured. The beads are retrieved and barcoded cDNA libraries are created for sequencing
Fluidigm C1™ mRNA Seq HT IFC | Fluidigm® | Integrated fluidic circuit | mRNA | 3′ | Single cells are separated into an integrated fluidic circuit with 20 columns × 40 rows (800 capture sites). Cells are lysed in each capture site and the transcripts are processed to create uniquely barcoded cDNA libraries for each single cell
DropSeq | Open source, although commercial implementations exist (e.g. DolomiteBio®) | Droplet | mRNA | 3′ | Single cells are encapsulated into droplets with a barcoded microbead and reagents. Cells are lysed and the 3′ ends of mRNA transcripts are captured to create barcoded cDNA libraries for sequencing
SureCell™ WTA 3′ | Illumina®/Bio-Rad® | Droplet | mRNA | 3′ (strand-specific) | Single cells are encapsulated into droplets, lysed, and barcoded. Barcoded cDNA is pooled for second-strand synthesis. Libraries are generated with direct cDNA tagmentation followed by 3′ enrichment, sample indexing, and downstream sequencing
SmartSeq2 | Open source, although commercial implementations exist (e.g. Takara Bio®) | Various (e.g. manual pipetting, FACS, Fluidigm C1™) | mRNA | Full-length | Single cells are separated into wells and lysed. Full-length cDNA libraries are constructed and tagmented for each cell prior to short-read sequencing


Tips and Tricks

This is a collection of blog posts and articles about single cell analysis.

How to select the best single cell quality control thresholds
The answer no one wants to hear

Using trajectory analysis to study cellular differentiation in single cell RNA-Seq experiments
Using trajectory analysis to determine their fate

Tissue transcriptomics—what’s the big deal and why you should do it
Transcriptome-wide studies of gene expression certainly provide invaluable insight into biology on a molecular level, particularly when performed at the single-cell level

Less is more: detecting differential gene expression in single cell RNA-Seq analysis
Which tools to use for single cell analysis

Batch remover for single cell data
Can nuisance batch effects or undesirable numeric or categorical factors be removed?

How to perform single cell RNA sequencing: exploratory analysis
Step one in performing single cell analysis

Bioinformatics approach to spatially resolved transcriptomics
A review of spatial transcriptomic analysis

Kathi Gosche