Samples Output

The samples directory contains processed gene expression matrices, cell metrics, and analysis-ready data files for each sample in the analysis.

Directory Structure

samples
├── QS-SmallKit-PBMCs.QSR-P.allBarcodes.parquet
├── QS-SmallKit-PBMCs.QSR-P.allCells.csv
├── QS-SmallKit-PBMCs.QSR-P.filtered.matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── QS-SmallKit-PBMCs.QSR-P_anndata.h5ad

Key Files and Directories

Cell Metrics (*.allCells.csv)

Comprehensive CSV file containing detailed metrics for each called cell:

Column                 Description
cell_id                Unique identifier composed of detected barcodes
counts                 Number of unique transcript molecules detected
genes                  Number of unique genes detected
totalReads             Total reads demultiplexed to this cell barcode
countedReads           Reads contributing to counts in expression matrix
mappedReads            Reads that aligned to the reference genome
geneReads              Reads that mapped to annotated genes
exonReads              Reads that mapped to exons
intronReads            Reads that mapped to introns
antisenseReads         Reads mapping antisense to annotated exons
mitoReads              Reads mapping to mitochondrial genome
countedMultiGeneReads  Multi-gene reads contributing to expression matrix
Saturation             1 - (UniqueReads / TotalReads) on reads mapped to transcriptome
mitoProp               Proportion of mapped reads aligned to mitochondrial genome
PCR                    PCR (library) barcode alias
RT                     RT plate well position
bead_bc                Bead barcode (microwell identifier)
sample                 Sample name
flags                  Quality control flags
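
The ratio columns can be cross-checked against the raw read counts. The sketch below recomputes the mitochondrial proportion from mitoReads and mappedReads, following the column descriptions above. The function name and the toy values are illustrative assumptions, not part of the pipeline; in practice the table would come from the allCells.csv file.

```python
import pandas as pd


def recompute_mito_prop(cells: pd.DataFrame) -> pd.Series:
    """Proportion of mapped reads that align to the mitochondrial genome."""
    # Mask zero denominators so cells without mapped reads yield 0 instead of inf.
    mapped = cells["mappedReads"].where(cells["mappedReads"] > 0)
    return (cells["mitoReads"] / mapped).fillna(0.0)


# Toy rows with made-up values; real data would be loaded with
# pd.read_csv("samples/your_sample.allCells.csv", index_col="cell_id")
cells = pd.DataFrame(
    {"mappedReads": [1000, 2000, 0], "mitoReads": [50, 400, 0]},
    index=["cellA", "cellB", "cellC"],
)
cells["mitoProp_check"] = recompute_mito_prop(cells)
print(cells)
```

Comparing mitoProp_check with the reported mitoProp column is a quick way to confirm that a file was read with the expected columns and types.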

Gene Expression Matrix (*.filtered.matrix/)

Standard single-cell gene expression format (gzipped Matrix Market files) for passing cells:

Cell Barcodes (barcodes.tsv.gz)

  • Cell barcodes for passing cells (one per line, gzipped)
  • Corresponds to columns in the expression matrix

Gene Information (features.tsv.gz)

  • Gene/feature information (gzipped tab-separated file)
  • Corresponds to rows in the expression matrix

Expression Counts (matrix.mtx.gz)

  • Sparse matrix format gene expression counts (gzipped)
  • Rows: Genes/features
  • Columns: Cells
  • Values: UMI counts per gene per cell
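
The three files can also be read directly in Python without Scanpy. This is a minimal sketch, assuming SciPy and pandas are installed; load_filtered_matrix is a hypothetical helper name, and no column layout is assumed for features.tsv.gz, so its columns are left unnamed.

```python
import gzip
from pathlib import Path

import pandas as pd
from scipy.io import mmread


def load_filtered_matrix(matrix_dir):
    """Return (cells x genes CSR matrix, barcode list, features table)."""
    matrix_dir = Path(matrix_dir)
    # The file stores genes as rows and cells as columns; transpose to
    # cells x genes, the orientation used by AnnData.
    counts = mmread(str(matrix_dir / "matrix.mtx.gz")).T.tocsr()
    with gzip.open(matrix_dir / "barcodes.tsv.gz", "rt") as handle:
        barcodes = [line.strip() for line in handle if line.strip()]
    # pandas infers gzip compression from the .gz extension.
    features = pd.read_csv(matrix_dir / "features.tsv.gz", sep="\t", header=None)
    # Fail early if the three files do not describe the same matrix.
    if counts.shape != (len(barcodes), len(features)):
        raise ValueError("matrix dimensions do not match barcodes/features")
    return counts, barcodes, features
```

A call would look like load_filtered_matrix("samples/your_sample.filtered.matrix"), with your_sample replaced by the actual sample name.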

Python Analysis Object (*_anndata.h5ad)

AnnData format file for Python-based analysis:

  • Gene expression matrix: Sparse matrix of UMI counts
  • Cell metadata: All cell metrics and annotations (allCells.csv)

All Barcodes Data (*.allBarcodes.parquet)

Parquet format file containing all detected barcodes and their properties:

  • Barcode sequences: All detected barcode combinations, including both cells and background barcodes
  • Quality metrics: Barcode quality scores and error rates
  • Sample assignments: Which sample each barcode belongs to
  • Cell calling results: Whether barcodes passed cell calling filters

Usage Examples

Loading Data in R/Seurat

library(Seurat)

# Load filtered matrix using ReadMtx
seurat_counts <- ReadMtx(
    mtx = "samples/your_sample.filtered.matrix/matrix.mtx.gz",
    features = "samples/your_sample.filtered.matrix/features.tsv.gz",
    cells = "samples/your_sample.filtered.matrix/barcodes.tsv.gz"
)

# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = seurat_counts)

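# Load cell metrics (optional). AddMetaData matches rows to cells by row name,
# so use cell_id as row names (assumes cell_id matches barcodes.tsv.gz)
cell_metrics <- read.csv("samples/your_sample.allCells.csv", row.names = "cell_id")
seurat_obj <- AddMetaData(seurat_obj, cell_metrics)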

Note: Replace your_sample with the actual sample name or path. The ReadMtx() function expects the paths to the matrix, features, and barcodes files. This is the recommended way to load Matrix Market format data into Seurat.

Loading Data in Python/Scanpy

import scanpy as sc
import pandas as pd

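# Load AnnData object (read_h5ad takes a single path; wildcards are not expanded)
adata = sc.read_h5ad("samples/your_sample_anndata.h5ad")

# Load cell metrics (optional; adata.obs already holds the cell metadata)
cell_metrics = pd.read_csv("samples/your_sample.allCells.csv", index_col="cell_id")
# Align rows to the cells in adata (assumes cell_id matches adata.obs_names)
adata.obs = cell_metrics.reindex(adata.obs_names)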

Loading Parquet Data

import pandas as pd
import glob

# Find all matching Parquet files
parquet_files = glob.glob("samples/*.allBarcodes.parquet")

# Read and concatenate all files
barcodes = pd.concat([pd.read_parquet(f) for f in parquet_files], ignore_index=True)

Quality Metrics

Cell Quality Indicators

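  • UTC Counts: Number of unique transcript molecules (unique transcript counts) per cell; the counts column
  • Gene Counts: Number of detected genes per cell; the genes column
  • Mitochondrial Proportion: Quality indicator for cell viability; the mitoProp column
  • Saturation: Measure of sequencing depth adequacy; the Saturation column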
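
These indicators are typically combined into a per-cell filter. The sketch below shows one way to do that on the allCells.csv columns; the function name and all threshold values are illustrative assumptions, not recommended cutoffs, and should be chosen per tissue and experiment.

```python
import pandas as pd


def passes_qc(cells, min_counts=500, min_genes=200, max_mito_prop=0.2):
    """Boolean Series that is True for cells meeting every threshold.

    The default thresholds are placeholders for illustration only.
    """
    return (
        (cells["counts"] >= min_counts)
        & (cells["genes"] >= min_genes)
        & (cells["mitoProp"] <= max_mito_prop)
    )


# Toy metrics with made-up values standing in for allCells.csv
cells = pd.DataFrame(
    {
        "counts": [1200, 300, 5000],
        "genes": [800, 150, 2100],
        "mitoProp": [0.05, 0.02, 0.35],
    },
    index=["cellA", "cellB", "cellC"],
)
keep = passes_qc(cells)
print(cells[keep])
```

In this toy table only cellA is kept: cellB falls below the count and gene thresholds, and cellC exceeds the mitochondrial threshold.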

Data Quality Assessment

  • Cell Recovery: Percentage of expected cells detected
  • Gene Detection: Number of genes detected across cells
  • UTC Distribution: Distribution of unique transcript counts per cell
  • Barcode Quality: Error rates and quality scores
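
Several of these sample-level figures can be approximated from the cell metrics table. The following sketch assumes that every row of allCells.csv counts as a detected cell and that the expected cell number is supplied by the user; summarize_sample and its output keys are hypothetical names.

```python
import pandas as pd


def summarize_sample(cells, expected_cells):
    """Summarize cell recovery, gene detection, and the UTC distribution."""
    utc = cells["counts"].quantile([0.1, 0.5, 0.9])
    return {
        "cells_detected": len(cells),
        # Percentage of the expected cells that were detected
        "cell_recovery_pct": 100.0 * len(cells) / expected_cells,
        "median_genes": float(cells["genes"].median()),
        "utc_p10": float(utc.loc[0.1]),
        "utc_median": float(utc.loc[0.5]),
        "utc_p90": float(utc.loc[0.9]),
    }


# Toy metrics with made-up values standing in for allCells.csv
cells = pd.DataFrame(
    {"counts": [100, 200, 300, 400], "genes": [50, 60, 70, 80]}
)
summary = summarize_sample(cells, expected_cells=8)
print(summary)
```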

Need Help?

For more information, please contact support@scale.bio or visit our support website.