Cell Calling and Quality Control

The ScaleBio RNA workflow implements multiple approaches to distinguish real cells from background barcodes in single-cell RNA sequencing data. This guide explains the different cell calling methods and quality control options available.


1. Overview

For QuantumScale RNA, single cells are identified by a combination of RT (sample) barcode, bead barcode, and library (PCR) index. However, not all barcode combinations correspond to real cells; some correspond to background, e.g. empty beads. The workflow implements the following approaches to distinguish cells from background barcodes:

  • CellFinder: EmptyDrops-like approach with ambient RNA profile rescue
  • TopCell Thresholding: Dynamic threshold based on transcript count percentiles
  • Fixed UTC Threshold: User-defined unique transcript count threshold
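
As a rough illustration of how the parameters described in this guide appear to select a method, the sketch below maps the cellFinder and UTC settings to one of the three approaches. The function name and the multi_species flag are hypothetical and only mirror the rules stated in this guide; this is not the workflow's own code.

```python
def select_cell_calling_method(cell_finder: bool = True,
                               utc: int = 0,
                               multi_species: bool = False) -> str:
    """Return the cell calling approach implied by the run parameters.

    Hypothetical helper: it mirrors the rules stated in this guide and
    is not part of the workflow itself.
    """
    # CellFinder is disabled for multi-species (barnyard) samples.
    if cell_finder and not multi_species:
        return "CellFinder"
    # A fixed threshold applies when UTC > 0 and CellFinder is not enabled.
    if utc > 0:
        return "Fixed UTC Threshold"
    # Otherwise the threshold is derived from the data.
    return "TopCell Thresholding"
```

For example, `select_cell_calling_method(cell_finder=False, utc=300)` selects the fixed threshold, while the defaults select CellFinder.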

2. CellFinder (Default Method)

2.1. What is CellFinder?

CellFinder is an EmptyDrops-like cell calling approach that is used by default (cellFinder parameter). Cell barcodes with a unique transcript count above the minimum (minUTC) but below the UTC threshold, whether provided by the user or calculated by the workflow, can be rescued as cells if their expression differs from the ambient RNA profile.

2.2. How It Works

  • Barcodes above minUTC but below the UTC threshold are evaluated
  • Expression differences from ambient RNA profile are calculated
  • Barcodes that differ significantly from the ambient profile at the configured false discovery rate (cellFinderFdr) are rescued as cells
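
The rescue step can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow's implementation: it assumes a p-value per barcode has already been computed by testing its expression against the ambient RNA profile (that test is not shown), and it assumes a Benjamini-Hochberg correction is used to control the false discovery rate, as in EmptyDrops-style methods. The function names and the inclusive lower boundary at minUTC are likewise assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def rescue_candidates(counts, p_values, min_utc, utc_threshold, fdr=0.001):
    """Return barcodes between min_utc and utc_threshold that pass the FDR.

    counts:   dict of barcode -> unique transcript count
    p_values: dict of barcode -> p-value versus the ambient profile
              (assumed to be computed elsewhere)
    """
    # Only barcodes at or above min_utc and below the threshold are tested.
    candidates = [bc for bc, c in counts.items() if min_utc <= c < utc_threshold]
    if not candidates:
        return set()
    adjusted = benjamini_hochberg([p_values[bc] for bc in candidates])
    return {bc for bc, q in zip(candidates, adjusted) if q <= fdr}
```

Barcodes at or above the UTC threshold are called as cells regardless, so only the intermediate range is passed through the test.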

2.3. Enable CellFinder

# In your runParams.yml
cellFinder: true
UTC: 0  # Optional: Set fixed threshold
cellFinderFdr: 0.001

2.4. Parameters

Parameter       Description                                           Default
cellFinder      Enables CellFinder cell calling                       true
UTC             Fixed unique transcript count threshold (optional)    0
cellFinderFdr   False discovery rate for cell rescue                  0.001

Important

CellFinder is disabled for multi-species (barnyard) samples.


3. TopCell Thresholding

3.1. What is TopCell?

TopCell thresholding sets a hard threshold on the unique transcript count per cell-barcode; all barcodes with counts above this threshold are called as cells, while all other barcodes are background.

3.2. How It Works

  1. Barcodes with counts less than the parameter minUTC are filtered from the data.
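  2. Next, the "top cell" count is determined as the topCellPercent percentile of the unique transcript counts of the remaining cell barcodes (if the expectedCells parameter is set, the percentile is applied to that number of cells instead).
  3. The top cell count is then divided by minCellRatio to obtain the unique transcript count threshold for the dataset. For example, a top cell count of 5,000 with the default minCellRatio of 10 gives a threshold of 500.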
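
The three steps above can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, the nearest-rank percentile definition is an assumption (the workflow may interpolate differently), and treating expectedCells as "use the top N barcodes by count" is one reading of the description above.

```python
import math


def topcell_threshold(counts, min_utc=100, top_cell_percent=99,
                      min_cell_ratio=10, expected_cells=0):
    """Return a UTC threshold derived from a robust 'top cell' count.

    counts: iterable of unique transcript counts, one per barcode.
    """
    # Step 1: drop barcodes with fewer than min_utc counts.
    kept = sorted(c for c in counts if c >= min_utc)
    if not kept:
        raise ValueError("no barcodes at or above min_utc")
    # Assumption: with expected_cells set, only the top N barcodes are used.
    if expected_cells > 0:
        kept = kept[-expected_cells:]
    # Step 2: nearest-rank percentile of the remaining counts (assumption).
    rank = max(1, math.ceil(top_cell_percent * len(kept) / 100))
    top_cell = kept[rank - 1]
    # Step 3: divide the top cell count by min_cell_ratio.
    return top_cell / min_cell_ratio
```

With counts of 50, 200, 400, 1000 and 2000 and the default parameters, the barcode with 50 counts is filtered out, the top cell count is 2000, and the resulting threshold is 200.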

3.3. Enable TopCell

# In your runParams.yml
cellFinder: false
minUTC: 100
topCellPercent: 99
minCellRatio: 10

3.4. Parameters

Parameter        Description                                                                     Default
minUTC           Minimum counts to consider a barcode as a potential cell                        100
topCellPercent   Percentile to use for the top cell (robust maximum)                             99
minCellRatio     Ratio between the transcript counts of the top cell and the lower cell threshold   10

4. Fixed UTC Thresholding

4.1. What is Fixed UTC Thresholding?

A fixed unique transcript count threshold is used when the parameter UTC is greater than 0 and cellFinder is not enabled. In this case, every barcode with a total count >= UTC is called as a cell.

4.2. Enable Fixed Threshold

# In your runParams.yml
cellFinder: false
UTC: 300

4.3. How It Works

  • Every barcode with total count ≥ UTC is called as a cell
  • All other barcodes are considered background
  • Can be set in a sample-specific manner by adding a UTC column to the sample barcode table (samples.csv), as in this example:

sample    barcodes   UTC
sample1   1A-6H      300
sample2   7A-12H     500
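
To make the per-sample thresholds concrete, the sketch below reads a UTC column from a sample table and applies the ">= UTC" rule to a set of barcode counts. It is a minimal sketch that assumes samples.csv is comma-separated with the column names shown in the example; the helper names and the barcode labels in the usage note are hypothetical.

```python
import csv
import io

# Assumed comma-separated layout of the example table.
SAMPLES_CSV = """sample,barcodes,UTC
sample1,1A-6H,300
sample2,7A-12H,500
"""


def load_sample_thresholds(csv_text, default_utc=0):
    """Return a dict of sample name -> UTC threshold from a sample table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fall back to default_utc when the UTC column is missing or empty.
    return {row["sample"]: int(row.get("UTC") or default_utc) for row in reader}


def call_cells_fixed(counts_by_barcode, utc):
    """Return barcodes whose total count is >= utc; the rest are background."""
    return {bc for bc, count in counts_by_barcode.items() if count >= utc}
```

For instance, with the thresholds loaded from `SAMPLES_CSV`, `call_cells_fixed({"bc1": 299, "bc2": 300}, thresholds["sample1"])` keeps only `bc2`.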

8. Output Files

For detailed information about cell calling outputs, see:

  • Outputs Overview: General output structure and organization
  • Samples Directory: The samples directory has information about the cells called per sample as well as background ambient barcodes.


Need Help?

For more information, please contact support@scale.bio or visit our support website.