ScaleRna Analysis Overview

Overview

This page gives a step-by-step overview of the ScaleRna analysis pipeline. Each step is described concisely, with links to the relevant documentation. Use it as a quick reference for the workflow and as a pointer to where each stage is documented in more detail.

Step-by-Step Workflow

FASTQ Generation (if starting from BCL files)
- Converts raw Illumina BCL files to FASTQ format using bcl-convert.
- Produces all necessary FASTQ files, including index reads (I1/I2).
- Filters out reads that do not match the expected ScaleBio RNA PCR barcodes.
- If --splitFastq is enabled, generates separate FASTQ files for each PCR index pool and partial bead barcode.
- For more details, see Fastq Generation.
Read Trimming
- Trims poly-A, poly-G stretches, and adapter sequences using cutadapt.
- Cuts reads at the first occurrence of a stretch of 7 As within 8 bp.
- Ensures high-quality input for downstream analysis.
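
The poly-A rule above can be illustrated with a small sketch. The pipeline itself performs this step with cutadapt; the function below is only an illustration, its name `trim_at_poly_a` is hypothetical, and it assumes that "7 As within 8 bp" means a sliding 8 bp window containing at least 7 As.

```python
def trim_at_poly_a(read: str, min_a: int = 7, window: int = 8) -> str:
    """Cut the read at the first window of `window` bases holding >= `min_a` As.

    Illustrative only: the sliding-window reading of the rule is an assumption,
    and the real pipeline delegates trimming to cutadapt.
    """
    for start in range(len(read) - window + 1):
        segment = read[start:start + window]
        if segment.count("A") >= min_a:
            # Cut at the first A of the window so a leading non-A base is kept.
            return read[:start + segment.index("A")]
    # No qualifying window (or read shorter than the window): leave unchanged.
    return read


print(trim_at_poly_a("ACGTACGTAAAAAAAACCC"))
print(trim_at_poly_a("GGGGAAAACAAAATTT"))
```

The second call shows that a single non-A base inside the stretch does not prevent trimming under this interpretation.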
FastQC and MultiQC
- Runs quality control reports on input FASTQ files using FastQC.
- Reports include Q30 scores, adapter content, and base distribution for Read 1 & 2.
- FastQC reports for all FASTQ files, along with the cutadapt metrics output, are combined into one QC file with MultiQC.
- Enable with the --fastqc parameter. See FastQ Output for more.
Barcode Parsing and Sample Demultiplexing
- Extracts and error-corrects single-cell barcodes using the ScaleBio bc_parser tool.
- Error correction allows up to 1 mismatch against expected barcode sequences.
- Separates samples loaded into different RT barcode wells using the samples.csv sheet (Sample Barcode Table).
- Outputs an unaligned BAM file for each sample, with cell barcodes in tags.
- Reads shorter than 16 bp after trimming are discarded.
- If --splitFastq is enabled, further splits reads by RT barcode for parallel processing.
- For more on barcode handling, see Barcodes Output.
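
To make the one-mismatch error correction concrete, here is a minimal sketch of whitelist matching by Hamming distance. It is not the bc_parser implementation; the function names are hypothetical, and rejecting a barcode that is within one mismatch of several expected sequences is an assumption made for this example.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, expected: list[str],
                    max_mismatches: int = 1) -> Optional[str]:
    """Return the expected barcode matching `observed`, or None if unresolved."""
    if observed in expected:
        return observed  # exact match needs no correction
    hits = [bc for bc in expected
            if len(bc) == len(observed) and hamming(bc, observed) <= max_mismatches]
    # Assumption: an ambiguous match (more than one candidate) is rejected.
    return hits[0] if len(hits) == 1 else None


expected_barcodes = ["ACGTACGT", "TTTTCCCC"]
print(correct_barcode("ACGTACGA", expected_barcodes))  # one mismatch
print(correct_barcode("ACGTAAAA", expected_barcodes))  # three mismatches
```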
Genome Alignment
- Aligns transcript reads to the genome using STAR.
- Optionally outputs a sorted BAM file for downstream analysis (--bamOut).
- For alignment output details, see Alignment Output.
Transcript Assignment & Quantification
- Matches aligned reads to genes and transcripts using STARsolo.
- Prioritizes exon matches over intron matches; filters antisense matches.
- Uses gene annotation from the STAR index (GTF file).
- For more, see Alignment Output.
Unique Transcript Counting
- Counts unique transcripts per cell and gene (UMI-based quantification).
- Handles multi-gene reads using the method set by starMulti (see Analysis Parameters).
- Allows up to 6 mapping positions per read by default.
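
The idea behind UMI-based quantification can be sketched as counting distinct UMIs per (cell, gene) pair. This is a simplified illustration, not what the pipeline runs: STARsolo performs the real counting, and the sketch ignores UMI error correction and multi-gene reads. The function name and the record layout are hypothetical.

```python
from collections import defaultdict


def count_unique_transcripts(records):
    """Count distinct UMIs for each (cell barcode, gene) pair.

    `records` is an iterable of (cell_barcode, gene, umi) tuples, one per
    assigned read. Reads sharing all three values collapse to one transcript.
    """
    umis = defaultdict(set)
    for cell, gene, umi in records:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("cellA", "GeneX", "AACCGGTT"),
    ("cellA", "GeneX", "AACCGGTT"),  # PCR duplicate: same cell, gene and UMI
    ("cellA", "GeneX", "TTGGCCAA"),
    ("cellB", "GeneX", "AACCGGTT"),
]
print(count_unique_transcripts(reads))
```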
Cell Filtering (Cell Calling)
- Identifies real cells using cellFinder (EmptyDrops-like) or unique transcript count thresholding.
- cellFinder can "rescue" cells based on expression differences from ambient RNA.
- Thresholds are determined based on unique transcript counts and user parameters (see Cell Calling).
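
As a rough illustration of the thresholding option only, the sketch below keeps barcodes whose unique transcript count reaches a fixed cutoff. How the pipeline actually derives its threshold is described in Cell Calling; the fixed `min_utc` value, the function name, and the barcode labels here are hypothetical, and cellFinder is not modeled at all.

```python
def call_cells(utc_per_barcode: dict[str, int], min_utc: int) -> set[str]:
    """Return barcodes with at least `min_utc` unique transcript counts.

    Assumption: a single fixed cutoff stands in for the data-driven
    threshold that the pipeline computes.
    """
    return {bc for bc, utc in utc_per_barcode.items() if utc >= min_utc}


counts = {"bc1": 2500, "bc2": 40, "bc3": 900, "bc4": 99}
print(sorted(call_cells(counts, min_utc=100)))
```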
Gene Expression Matrix Generation
- Produces sparse gene expression matrices (MTX format) for all and filtered cell barcodes.
- Compatible with Seurat, ScanPy, and similar tools.
- For output file details, see Samples Output.
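
For readers unfamiliar with the MTX format, the following sketch parses a Matrix Market coordinate file with the standard library only. It is meant to show the file layout (header, dimension line, 1-based "row column value" entries); in practice the loaders shipped with Seurat or ScanPy, or `scipy.io.mmread`, are the usual choice. The function name `read_mtx` and the example data are hypothetical, and which axis holds genes versus cell barcodes should be checked in Samples Output.

```python
import io


def read_mtx(handle):
    """Parse a Matrix Market coordinate file into ((n_rows, n_cols), entries).

    `entries` maps 0-based (row, column) pairs to values; absent pairs are zero.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = map(int, line.split())
    entries = {}
    for line in handle:
        if not line.strip():
            continue
        row, col, value = line.split()
        entries[(int(row) - 1, int(col) - 1)] = float(value)  # file is 1-based
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the dimension line")
    return (n_rows, n_cols), entries


example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "% example data, not pipeline output\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
shape, entries = read_mtx(example)
print(shape, entries)
```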
Sample QC Report Generation
- Generates summary reports (HTML and CSV) with mapping metrics, cell counts, and sensitivity metrics for each sample.
- Includes RT barcode distribution and other sample-specific statistics.
- For more on report content, see QC Reports and Outputs.
Library QC Report Generation
- Produces an HTML report with QC metrics for the entire ScaleBio RNA library.
- Focuses on barcode matching rates, read distribution, and data quality.
- See QC Reports for more details.

ScaleBio Tools

In addition to third-party and open-source software, the workflow uses specialized tools developed by ScaleBio. The most important is:

`bc_parser`

Purpose:
- Extracts and error-corrects cell barcodes and UMIs from input FASTQ files.
- Splits (demultiplexes) input FASTQ files into sample BAM files based on cell barcodes (RT wells).
- Generates barcode and read-level metrics for downstream QC and analysis.
Output:
- Produces unaligned BAM (.ubam) files with all barcode information and the transcript sequence in a single file, which makes the output easily compatible with STARsolo (and other tools). Specifically, the barcode sequence in the unaligned BAM file is represented under the CB tag, and the UMI is represented under the UM tag.
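
As an illustration of how the CB and UM tags can be read, the sketch below extracts optional tags from one record in SAM text form (for example, a line printed by `samtools view`). The example record, its barcode and UMI values, and the use of the `Z` (string) tag type are assumptions made for this example, not actual bc_parser output.

```python
def parse_sam_tags(sam_line: str) -> dict[str, str]:
    """Return the optional tags of a SAM text record as {TAG: value}.

    SAM records have 11 mandatory tab-separated fields; optional fields
    follow in TAG:TYPE:VALUE form.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


# Hypothetical unaligned record (flag 4 = unmapped) with made-up tag values.
record = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tFFFFFFFF\tCB:Z:AACCGGTT\tUM:Z:TTGGCCAA"
tags = parse_sam_tags(record)
print(tags["CB"], tags["UM"])
```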

Installation & Availability

Included in Containers:
- All ScaleBio tools are included in the scalerna Docker container image. When running the workflow with -profile docker (or other container engines like Singularity or Podman), these tools are automatically available.
Manual Installation:
- If not using containers, you must install the tools manually. Use the provided script: `bash /PATH/TO/ScaleRNA/envs/download-scale-tools.sh`
- This downloads pre-compiled binaries for Linux (x86_64) and installs them in ScaleRNA/bin.

Additional Notes & Cross-References

For a full list of input requirements, see Analysis Parameters.
For details on output files and directory structure, see Outputs.
For troubleshooting and advanced options, see FAQs and Advanced Topics.

If you have questions about any step, refer to the linked documentation or reach out to the ScaleBio support team (support@scale.bio).

Need Help?

For more information, please contact support@scale.bio or visit our support website.