Alignment Output

The alignment directory contains STAR and STARSolo alignment results for each sample in the analysis.

Directory Structure

alignment
└── QS-SmallKit-PBMCs.QSR-P
    ├── QS-SmallKit-PBMCs.QSR-P.star.align
    │   └── Log.final.out
    └── QS-SmallKit-PBMCs.QSR-P.star.solo
        └── GeneFull_Ex50pAS
            ├── CellReads.stats
            └── raw
                ├── UniqueAndMult-PropUnique.mtx.gz
                ├── barcodes.tsv.gz
                └── features.tsv.gz

Key Files and Directories

STAR Alignment (`*.star.align/`)

The STAR alignment directory contains core alignment results from the STAR aligner, mapping raw sequencing reads to the reference genome. This directory provides essential statistics about read mapping performance and data quality.

Log.final.out

This comprehensive log file contains detailed alignment statistics for assessing sequencing experiment success. It provides key metrics that help evaluate data quality:

Total number of input reads: Raw count of sequencing reads processed by STAR
Uniquely mapped reads: Reads mapping to exactly one genome location (highest quality)
Multi-mapped reads: Reads mapping to multiple locations (repetitive regions or artifacts)
Unmapped reads: Reads that couldn't be aligned (poor quality, contamination, or novel sequences)
Mapping rates and quality metrics: Overall alignment success statistics
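
For scripted QC, these metrics can be pulled out of the log. The following is a minimal sketch, assuming the usual `Log.final.out` layout in which each metric line has the form `name |<tab>value`. The helper names `parse_star_log` and `as_number` are hypothetical, and the numbers in the excerpt are made up for illustration.

```python
def parse_star_log(text):
    """Parse STAR Log.final.out text into a {metric name: value string} dict."""
    metrics = {}
    for line in text.splitlines():
        # Metric lines contain a "|" separator; section headers
        # (e.g. "UNIQUE READS:") and blank lines do not and are skipped.
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def as_number(value):
    """Convert strings such as '1000' or '85.00%' to a float."""
    return float(value.rstrip("%"))


# Illustrative excerpt with made-up numbers, not output from a real run.
example = """\
                          Number of input reads |\t1000
                   Uniquely mapped reads number |\t850
                        Uniquely mapped reads % |\t85.00%
             % of reads mapped to multiple loci |\t10.00%
"""
metrics = parse_star_log(example)
print(as_number(metrics["Uniquely mapped reads %"]))
```

For a real run, pass the contents of `Log.final.out` from the `*.star.align/` directory instead of the example string.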

STARSolo Results (`*.star.solo/`)

The STARSolo directory contains specialized single-cell analysis results beyond standard RNA-seq alignment. STARSolo performs cell barcode detection, UMI counting, and gene expression quantification optimized for single-cell data.

Gene Expression Quantification (`GeneFull_Ex50pAS/`)

This directory contains gene expression quantification results with parameters optimized for single-cell RNA sequencing:

GeneFull: Reads counted across full gene body (exons and introns), capturing nascent transcripts common in single-cell data
Ex50pAS: Per the STAR manual's description of this feature type, reads with more than 50% exonic overlap are prioritized when assigning reads to genes, and reads whose exonic overlap is entirely in the antisense direction are not counted

This strategy captures both mature and nascent transcripts while limiting ambiguous and antisense assignments.

Key Files:

CellReads.stats

This file contains statistics about cell barcode and UMI processing for understanding single-cell data quality:

Cell barcode and UMI processing statistics: Performance metrics for UMIs and cell barcodes
Valid cell barcode information: Counts and quality metrics for detected cell barcodes
UMI deduplication statistics: Data about duplicate UMI removal effectiveness

These statistics help identify issues with cell capture, barcode design, or sequencing depth. Few valid cell barcodes might indicate cell capture problems, while unusual UMI distributions could suggest PCR amplification issues.
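
As a starting point for inspecting this file, here is a minimal sketch that assumes `CellReads.stats` is a tab-separated table with a header row and one row per barcode. The column names `CB` and `nUMIunique` are assumptions about that header and should be checked against a real file; the helper names are hypothetical.

```python
import csv
import io


def read_cell_reads(handle):
    """Read a tab-separated CellReads.stats table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def top_barcodes(rows, column, n=5):
    """Return the n rows with the largest integer value in `column`."""
    return sorted(rows, key=lambda r: int(r[column]), reverse=True)[:n]


# Tiny made-up table; the column names CB and nUMIunique are assumptions
# about the file's header, not confirmed by this document.
example = io.StringIO(
    "CB\tnUMIunique\n"
    "AAAC\t120\n"
    "CCGT\t3\n"
    "GGTA\t950\n"
)
rows = read_cell_reads(example)
best = top_barcodes(rows, "nUMIunique", n=2)
print([r["CB"] for r in best])
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_cell_reads`.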

Raw Expression Data (`raw/`)

This directory contains raw gene expression matrix files in standard 10X Genomics format, compatible with most single-cell analysis tools:

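barcodes.tsv.gz: All cell barcodes detected, before any cell filtering
features.tsv.gz: Gene information (gene IDs, names, types) corresponding to matrix rows
UniqueAndMult-PropUnique.mtx.gz: Main expression matrix in sparse format with UMI counts as values. As the file name suggests, counts combine uniquely mapped and multi-mapped reads, with the latter distributed in proportion to unique counts, so values may be fractional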

File Formats

Matrix Market Format (`.mtx.gz`)

The gene expression data is stored in Matrix Market format:

Compressed sparse matrix representation
Compatible with most single-cell analysis tools

Barcodes File (`barcodes.tsv.gz`)

One cell barcode per line
Corresponds to columns in the expression matrix

Features File (`features.tsv.gz`)

Tab-separated file with gene information
Corresponds to rows in the expression matrix
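
To make the layout concrete, the sketch below parses the Matrix Market coordinate format by hand: a `%%MatrixMarket` banner, optional `%` comment lines, a size line (`rows cols nonzeros`), then one `row col value` entry per line with 1-based indices. The function name `read_mtx_triplets` is hypothetical and the example matrix is made up. In practice a library reader such as `scipy.io.mmread` is the usual choice.

```python
import io


def read_mtx_triplets(handle):
    """Parse Matrix Market coordinate text into (shape, triplets).

    Triplets are (row, col, value) with 0-based indices; in these files
    rows are features and columns are barcodes.
    """
    shape = None
    triplets = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines, the %%MatrixMarket banner, comments
        fields = line.split()
        if shape is None:
            # First non-comment line: n_rows n_cols n_nonzero
            shape = (int(fields[0]), int(fields[1]))
            continue
        # Entries are 1-based in the file; convert to 0-based indices.
        triplets.append(
            (int(fields[0]) - 1, int(fields[1]) - 1, float(fields[2]))
        )
    return shape, triplets


# Made-up 3-gene x 2-barcode matrix with three nonzero entries.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "3 2 3\n"
    "1 1 2\n"
    "3 1 0.5\n"
    "2 2 4\n"
)
shape, triplets = read_mtx_triplets(example)
print(shape, triplets)
```

For a real `.mtx.gz` file, pass a handle from `gzip.open(path, "rt")`.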

Usage examples

These files can be directly loaded into popular single-cell analysis tools for raw data analysis. However, we recommend using the filtered data from the /samples output directory, which contains ScaleBio's cell-filtered data with quality control applied.

Raw Data Analysis (Alignment Output)

R/Seurat:

library(Seurat)

# Load the matrix market files using Seurat's ReadMtx function
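# Paths follow the directory structure shown above: alignment/<sample>.<libIndex2>/<sample>.<libIndex2>.star.solo/...
expression_matrix <- ReadMtx(
  mtx = "alignment/<sample>.<libIndex2>/<sample>.<libIndex2>.star.solo/GeneFull_Ex50pAS/raw/UniqueAndMult-PropUnique.mtx.gz",
  features = "alignment/<sample>.<libIndex2>/<sample>.<libIndex2>.star.solo/GeneFull_Ex50pAS/raw/features.tsv.gz",
  cells = "alignment/<sample>.<libIndex2>/<sample>.<libIndex2>.star.solo/GeneFull_Ex50pAS/raw/barcodes.tsv.gz"
)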

# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = expression_matrix)

Python/Scanpy:

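import scanpy as sc
# read_mtx expects the .mtx file itself, not the directory; transpose so cells are rows
adata = sc.read_mtx("alignment/<sample>.<libIndex2>/<sample>.<libIndex2>.star.solo/GeneFull_Ex50pAS/raw/UniqueAndMult-PropUnique.mtx.gz").T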
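
Note that `sc.read_mtx` loads only the numeric matrix, in the file's orientation (genes as rows), so the matrix needs transposing to cells by genes and the barcode and gene labels must be attached separately. The following sketch shows one way to do that with the standard library. The helper names `read_labels` and `label_matrix` are hypothetical, and it assumes the gene ID is the first tab-separated column of `features.tsv.gz`.

```python
import gzip
import os


def read_labels(raw_dir):
    """Return (barcodes, gene_ids) read from a STARsolo raw/ directory."""
    with gzip.open(os.path.join(raw_dir, "barcodes.tsv.gz"), "rt") as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    with gzip.open(os.path.join(raw_dir, "features.tsv.gz"), "rt") as fh:
        # Assumption: the gene ID is the first tab-separated column.
        gene_ids = [
            line.rstrip("\n").split("\t")[0] for line in fh if line.strip()
        ]
    return barcodes, gene_ids


def label_matrix(adata, barcodes, gene_ids):
    """Attach labels to a cells x genes AnnData-like object, checking shape."""
    if tuple(adata.shape) != (len(barcodes), len(gene_ids)):
        raise ValueError(f"matrix shape {adata.shape} does not match labels")
    adata.obs_names = barcodes
    adata.var_names = gene_ids
    return adata
```

With a cells-by-genes `adata`, calling `label_matrix(adata, *read_labels(raw_dir))` sets `obs_names` to the barcodes and `var_names` to the gene IDs, and raises an error if the matrix was not transposed.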

Recommended: Filtered Data Analysis (Samples Output)

For most analyses, we recommend using the filtered data from the /samples directory, which contains:

Quality-controlled cells: Only cells that pass ScaleBio's cell calling criteria
Additional metadata: Cell metrics and quality indicators
Standard formats: Compatible with most single-cell analysis tools

See Samples Output for detailed usage examples with the filtered data.

Quality Metrics

The alignment process provides several quality metrics:

Mapping Rate: Percentage of reads that align to the reference genome
Unique Mapping Rate: Percentage of reads that map uniquely
Gene Detection: Number of genes detected per cell
UMI Saturation: Measure of sequencing depth adequacy
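
Two of these metrics are simple to compute by hand, which can help when cross-checking reported values. This is a sketch under stated assumptions rather than the pipeline's own calculation: it assumes the overall mapping rate counts both uniquely mapped and multi-mapped reads, and that gene detection means the number of genes with a nonzero count in a cell. The function names are hypothetical.

```python
def mapping_rates(input_reads, unique_reads, multi_reads):
    """Return (overall, unique) mapping rates as percentages of input reads."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    overall = 100.0 * (unique_reads + multi_reads) / input_reads
    unique = 100.0 * unique_reads / input_reads
    return overall, unique


def genes_per_cell(triplets):
    """Count genes with a nonzero value per cell.

    `triplets` holds (gene_index, cell_index, value) entries, with each
    (gene, cell) pair appearing at most once, as in a sparse matrix.
    """
    counts = {}
    for _gene, cell, value in triplets:
        if value > 0:
            counts[cell] = counts.get(cell, 0) + 1
    return counts


# Made-up inputs for illustration.
overall, unique = mapping_rates(1000, 850, 100)
detected = genes_per_cell([(0, 0, 2.0), (2, 0, 0.5), (1, 1, 4.0)])
print(overall, unique, detected)
```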

Pipeline Steps - Where alignment fits in with overall pipeline
Cell Calling - How cells are identified from aligned data
QC Reports - Quality control metrics and reports

Need Help?

For more information, please contact support@scale.bio or visit our support website.
