Samples Output

The samples directory contains processed gene expression matrices, cell metrics, and analysis-ready data files for each sample in the analysis.

Directory Structure

samples
├── QS-SmallKit-PBMCs.QSR-P.allBarcodes.parquet
├── QS-SmallKit-PBMCs.QSR-P.allCells.csv
├── QS-SmallKit-PBMCs.QSR-P.filtered.matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── QS-SmallKit-PBMCs.QSR-P_anndata.h5ad

Key Files and Directories

Cell Metrics (`*.allCells.csv`)

CSV file with one row of detailed metrics per called cell:

Column	Description
`cell_id`	Unique identifier composed of detected barcodes
`counts`	Number of unique transcript molecules detected
`genes`	Number of unique genes detected
`totalReads`	Total reads demultiplexed to this cell barcode
`countedReads`	Reads contributing to counts in expression matrix
`mappedReads`	Reads that aligned to the reference genome
`geneReads`	Reads that mapped to annotated genes
`exonReads`	Reads that mapped to exons
`intronReads`	Reads that mapped to introns
`antisenseReads`	Reads mapping antisense to annotated exons
`mitoReads`	Reads mapping to mitochondrial genome
`countedMultiGeneReads`	Multi-gene reads contributing to expression matrix
`Saturation`	`1 - (UniqueReads / TotalReads)`, computed over reads mapped to the transcriptome
`mitoProp`	Proportion of mapped reads aligned to mitochondrial genome
`PCR`	PCR (library) barcode alias
`RT`	RT plate well position
`bead_bc`	Bead barcode (microwell identifier)
`sample`	Sample name
`flags`	Quality control flags
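
As an illustration of how these columns might be used, the following minimal sketch flags low-quality cells with pandas. The function name `flag_low_quality` and the thresholds (`max_mito_prop=0.2`, `min_genes=200`) are assumptions chosen for illustration, not recommended values; only the column names `cell_id`, `genes`, and `mitoProp` come from the table above.

```python
import pandas as pd


def flag_low_quality(cells: pd.DataFrame,
                     max_mito_prop: float = 0.2,
                     min_genes: int = 200) -> pd.DataFrame:
    """Return a copy of the cell metrics with a boolean `low_quality` column.

    The thresholds are hypothetical defaults; tune them per experiment.
    """
    flagged = cells.copy()
    flagged["low_quality"] = (
        (flagged["mitoProp"] > max_mito_prop) | (flagged["genes"] < min_genes)
    )
    return flagged


# Small in-memory stand-in for a real *.allCells.csv file
example = pd.DataFrame({
    "cell_id": ["cellA", "cellB", "cellC"],
    "genes": [1500, 120, 900],
    "mitoProp": [0.05, 0.02, 0.35],
})
flagged = flag_low_quality(example)
```

With real data, the input would typically come from `pd.read_csv("samples/your_sample.allCells.csv")`, where `your_sample` stands for the actual sample name.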

Gene Expression Matrix (`*.filtered.matrix/`)

Standard single-cell gene expression format (Matrix Market) for cells that passed cell calling:

Cell Barcodes (`barcodes.tsv.gz`)

Cell barcodes for passing cells (one per line, gzipped)
Corresponds to columns in the expression matrix

Gene Information (`features.tsv.gz`)

Gene/feature information (gzipped tab-separated file)
Corresponds to rows in the expression matrix

Expression Counts (`matrix.mtx.gz`)

Sparse matrix format gene expression counts (gzipped)
Rows: Genes/features
Columns: Cells
Values: UMI counts per gene per cell
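
The three files above can also be read without a single-cell framework. The sketch below is one possible way to do this with SciPy and pandas; the helper name `load_filtered_matrix` is hypothetical, and the code assumes only what is stated above (rows are genes/features, columns are cells, one barcode per line). It makes no assumption about the column layout of `features.tsv.gz` beyond it being tab-separated with no header row.

```python
from pathlib import Path

import pandas as pd
from scipy.io import mmread


def load_filtered_matrix(matrix_dir):
    """Load a *.filtered.matrix directory as (cells x genes matrix, barcodes, features)."""
    matrix_dir = Path(matrix_dir)
    # mmread decompresses .gz files itself; the file stores genes x cells
    genes_by_cells = mmread(str(matrix_dir / "matrix.mtx.gz"))
    # pandas infers gzip compression from the .gz extension
    barcodes = pd.read_csv(
        matrix_dir / "barcodes.tsv.gz", sep="\t", header=None, dtype=str
    )[0]
    features = pd.read_csv(
        matrix_dir / "features.tsv.gz", sep="\t", header=None, dtype=str
    )
    # Transpose to the cells x genes orientation used by most Python tools
    cells_by_genes = genes_by_cells.T.tocsr()
    if cells_by_genes.shape != (len(barcodes), len(features)):
        raise ValueError("matrix shape does not match barcodes/features")
    return cells_by_genes, barcodes, features
```

A call such as `load_filtered_matrix("samples/your_sample.filtered.matrix")` would return the sparse count matrix together with its row (cell) and column (feature) labels; the shape check guards against mismatched files.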

Python Analysis Object (`*_anndata.h5ad`)

AnnData format file for Python-based analysis:

Gene expression matrix: Sparse matrix of UMI counts
Cell metadata: All cell metrics and annotations (the same per-cell data as `*.allCells.csv`)

All Barcodes Data (`*.allBarcodes.parquet`)

Parquet format file containing all detected barcodes and their properties:

Barcode sequences: All detected barcode combinations, including both cells and background barcodes
Quality metrics: Barcode quality scores and error rates
Sample assignments: Which sample each barcode belongs to
Cell calling results: Whether barcodes passed cell calling filters

Usage Examples

Loading Data in R/Seurat

library(Seurat)

# Load filtered matrix using ReadMtx
seurat_counts <- ReadMtx(
    mtx = "samples/your_sample.filtered.matrix/matrix.mtx.gz",
    features = "samples/your_sample.filtered.matrix/features.tsv.gz",
    cells = "samples/your_sample.filtered.matrix/barcodes.tsv.gz"
)

# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = seurat_counts)

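# Load cell metrics (optional); AddMetaData matches rows by name,
# so use cell_id as row names (assumes cell_id matches the barcodes in barcodes.tsv.gz)
cell_metrics <- read.csv("samples/your_sample.allCells.csv", row.names = "cell_id")
seurat_obj <- AddMetaData(seurat_obj, cell_metrics)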

Note: Replace `your_sample` with the actual sample name (for example, `QS-SmallKit-PBMCs.QSR-P`). `ReadMtx()` takes the paths to the matrix, features, and barcodes files and is the recommended way to load Matrix Market format data into Seurat.

Loading Data in Python/Scanpy

import scanpy as sc
import pandas as pd

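# Load AnnData object (read_h5ad needs a concrete path; wildcards are not expanded)
adata = sc.read_h5ad("samples/your_sample_anndata.h5ad")

# Load cell metrics (optional; the .h5ad already carries cell metadata in adata.obs)
cell_metrics = pd.read_csv("samples/your_sample.allCells.csv", index_col="cell_id")
# Align rows to the AnnData cell order before attaching; assumes adata.obs_names match cell_id
adata.obs = cell_metrics.reindex(adata.obs_names)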

Loading Parquet Data

import pandas as pd
import glob

# Find all matching Parquet files
parquet_files = glob.glob("samples/*.allBarcodes.parquet")

# Read and concatenate all files
barcodes = pd.concat([pd.read_parquet(f) for f in parquet_files], ignore_index=True)

Quality Metrics

Cell Quality Indicators

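Unique Transcript Counts (UTC): Number of unique transcript molecules per cell
Gene Counts: Number of detected genes per cell
Mitochondrial Proportion: Proportion of mapped reads from the mitochondrial genome; high values often indicate stressed or dying cells
Saturation: Fraction of reads that duplicate an already-observed molecule; indicates whether deeper sequencing would still yield new transcripts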

Data Quality Assessment

Cell Recovery: Percentage of expected cells detected
Gene Detection: Number of genes detected across cells
UTC Distribution: Distribution of unique transcript counts per cell
Barcode Quality: Error rates and quality scores
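
Several of the indicators listed above can be summarized directly from the per-cell metrics. The following is a rough sketch, not the pipeline's own reporting code: the function name `summarize_quality` and the `expected_cells` parameter (the number of cells the user expected to recover) are hypothetical, and the column names are taken from the `*.allCells.csv` description.

```python
import pandas as pd


def summarize_quality(cells: pd.DataFrame, expected_cells: int) -> dict:
    """Summarize sample-level quality from per-cell metrics (one row per called cell)."""
    return {
        "cells_called": len(cells),
        # Cell recovery as a percentage of the user-supplied expected cell number
        "cell_recovery_pct": 100.0 * len(cells) / expected_cells,
        "median_counts": float(cells["counts"].median()),
        "median_genes": float(cells["genes"].median()),
        "median_mito_prop": float(cells["mitoProp"].median()),
        "median_saturation": float(cells["Saturation"].median()),
    }


# Small in-memory stand-in for a real *.allCells.csv file
example = pd.DataFrame({
    "counts": [1000, 3000, 2000, 4000],
    "genes": [500, 1200, 900, 1500],
    "mitoProp": [0.02, 0.04, 0.10, 0.06],
    "Saturation": [0.30, 0.50, 0.40, 0.60],
})
summary = summarize_quality(example, expected_cells=5)
```

Medians are used here because per-cell count distributions tend to be skewed; other summaries (means, quantiles) may suit a given experiment better.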

Need Help?

For more information, please contact support@scale.bio or visit our support website.