Samples Output
The samples directory contains processed gene expression matrices, cell metrics, and analysis-ready data files for each sample in the analysis.
Directory Structure
samples
├── QS-SmallKit-PBMCs.QSR-P.allBarcodes.parquet
├── QS-SmallKit-PBMCs.QSR-P.allCells.csv
├── QS-SmallKit-PBMCs.QSR-P.filtered.matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── QS-SmallKit-PBMCs.QSR-P_anndata.h5ad
Key Files and Directories
Cell Metrics (*.allCells.csv)
Comprehensive CSV file containing detailed metrics for each called cell:
Column | Description |
---|---|
cell_id | Unique identifier composed of detected barcodes |
counts | Number of unique transcript molecules detected |
genes | Number of unique genes detected |
totalReads | Total reads demultiplexed to this cell barcode |
countedReads | Reads contributing to counts in expression matrix |
mappedReads | Reads that aligned to the reference genome |
geneReads | Reads that mapped to annotated genes |
exonReads | Reads that mapped to exons |
intronReads | Reads that mapped to introns |
antisenseReads | Reads mapping antisense to annotated exons |
mitoReads | Reads mapping to mitochondrial genome |
countedMultiGeneReads | Multi-gene reads contributing to expression matrix |
Saturation | 1 - (UniqueReads / TotalReads) on reads mapped to transcriptome |
mitoProp | Proportion of mapped reads aligned to mitochondrial genome |
PCR | PCR (library) barcode alias |
RT | RT plate well position |
bead_bc | Bead barcode (microwell identifier) |
sample | Sample name |
flags | Quality control flags |
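The Saturation and mitoProp definitions in the table reduce to simple ratios. The sketch below restates them as Python functions. The function and argument names are illustrative only, and the exact read counts the pipeline uses for UniqueReads and TotalReads are not specified beyond "reads mapped to transcriptome", so they are left as explicit inputs.

```python
def saturation(unique_reads: int, total_reads: int) -> float:
    """1 - (UniqueReads / TotalReads), as defined for the Saturation column."""
    if total_reads == 0:
        return 0.0  # assumption: report 0 when there are no reads to evaluate
    return 1.0 - unique_reads / total_reads


def mito_proportion(mito_reads: int, mapped_reads: int) -> float:
    """Proportion of mapped reads aligned to the mitochondrial genome."""
    if mapped_reads == 0:
        return 0.0  # assumption: report 0 when nothing mapped
    return mito_reads / mapped_reads
```

For example, a cell with 1000 transcriptome-mapped reads of which 250 are unique has a saturation of 0.75: three out of four reads are duplicates of a molecule already seen.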
Gene Expression Matrix (*.filtered.matrix/)
Standard single cell gene expression format for passing cells:
Cell Barcodes (barcodes.tsv.gz)
- Cell barcodes for passing cells (one per line, gzipped)
- Corresponds to columns in the expression matrix
Gene Information (features.tsv.gz)
- Gene/feature information (gzipped tab-separated file)
- Corresponds to rows in the expression matrix
Expression Counts (matrix.mtx.gz)
- Sparse matrix format gene expression counts (gzipped)
- Rows: Genes/features
- Columns: Cells
- Values: UMI counts per gene per cell
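To make the row/column orientation concrete, here is a minimal sketch that reads a gzipped Matrix Market coordinate file with the Python standard library only and sums the counts in each column (cell). The function name counts_per_cell is hypothetical; for real analyses a library reader such as scipy.io.mmread or scanpy is the more usual choice.

```python
import gzip
from collections import defaultdict


def counts_per_cell(mtx_path):
    """Return (n_genes, n_cells, {cell_index: total_count}) for a .mtx.gz file."""
    totals = defaultdict(int)
    with gzip.open(mtx_path, "rt") as handle:
        # Skip the banner and comment lines, which start with '%'.
        line = handle.readline()
        while line.startswith("%"):
            line = handle.readline()
        # First non-comment line: rows (genes), columns (cells), stored entries.
        n_genes, n_cells, _n_entries = (int(x) for x in line.split())
        for line in handle:
            if not line.strip():
                continue
            # Each entry is "row column value" with 1-based indices.
            _gene_idx, cell_idx, value = line.split()
            totals[int(cell_idx)] += int(float(value))
    return n_genes, n_cells, dict(totals)
```

The column index i returned here refers to line i of barcodes.tsv.gz (1-based), and the row index refers to the matching line of features.tsv.gz.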
Python Analysis Object (*_anndata.h5ad)
AnnData format file for Python-based analysis:
- Gene expression matrix: Sparse matrix of UMI counts
- Cell metadata: All cell metrics and annotations (as in *.allCells.csv)
All Barcodes Data (*.allBarcodes.parquet)
Parquet format file containing all detected barcodes and their properties:
- Barcode sequences: All detected barcode combinations, including both cells and background barcodes
- Quality metrics: Barcode quality scores and error rates
- Sample assignments: Which sample each barcode belongs to
- Cell calling results: Whether barcodes passed cell calling filters
Usage Examples
Loading Data in R/Seurat
library(Seurat)
# Load filtered matrix using ReadMtx
seurat_counts <- ReadMtx(
  mtx = "samples/your_sample.filtered.matrix/matrix.mtx.gz",
  features = "samples/your_sample.filtered.matrix/features.tsv.gz",
  cells = "samples/your_sample.filtered.matrix/barcodes.tsv.gz"
)
# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = seurat_counts)
# Load cell metrics (optional). AddMetaData matches rows by row name,
# so use cell_id as the row names (assumes cell_id values match the barcodes)
cell_metrics <- read.csv("samples/your_sample.allCells.csv", row.names = "cell_id")
seurat_obj <- AddMetaData(seurat_obj, cell_metrics)
Note: Replace your_sample with the actual sample name or path. The ReadMtx() function expects the paths to the matrix, features, and barcodes files. This is the recommended way to load Matrix Market format data into Seurat.
Loading Data in Python/Scanpy
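import scanpy as sc
import pandas as pd
# Load AnnData object (read_h5ad takes a single file path, not a glob pattern;
# replace your_sample with the actual sample name)
adata = sc.read_h5ad("samples/your_sample_anndata.h5ad")
# Load cell metrics (optional; the AnnData file already carries cell metadata)
cell_metrics = pd.read_csv("samples/your_sample.allCells.csv")
# Align rows to the AnnData cell order before replacing adata.obs
# (assumes adata.obs_names match the cell_id column)
adata.obs = cell_metrics.set_index("cell_id").reindex(adata.obs_names)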
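Attaching metrics to a matrix is only meaningful when the identifiers on both sides agree. As a hedged sketch, the helper below compares the barcodes in barcodes.tsv.gz with the cell_id column of the cell metrics CSV using only the standard library. The function name compare_cell_ids is hypothetical, and it assumes the two files use the same identifier strings, which this page does not state explicitly.

```python
import csv
import gzip


def compare_cell_ids(barcodes_path, metrics_csv_path):
    """Return (barcodes missing from the CSV, cell_ids missing from the matrix)."""
    # barcodes.tsv.gz holds one barcode per line.
    with gzip.open(barcodes_path, "rt") as handle:
        barcodes = {line.strip() for line in handle if line.strip()}
    # The metrics CSV identifies each cell in its cell_id column.
    with open(metrics_csv_path, newline="") as handle:
        cell_ids = {row["cell_id"] for row in csv.DictReader(handle)}
    return sorted(barcodes - cell_ids), sorted(cell_ids - barcodes)
```

Two empty lists mean every cell in the matrix has a metrics row and vice versa; anything else is worth investigating before merging metadata.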
Loading Parquet Data
import pandas as pd
import glob
# Find all matching Parquet files
parquet_files = glob.glob("samples/*.allBarcodes.parquet")
# Read and concatenate all files
barcodes = pd.concat([pd.read_parquet(f) for f in parquet_files], ignore_index=True)
Quality Metrics
Cell Quality Indicators
- UTC Counts: Number of unique transcript molecules per cell (the counts column)
- Gene Counts: Number of detected genes per cell (the genes column)
- Mitochondrial Proportion: Quality indicator for cell viability (the mitoProp column)
- Saturation: Measure of sequencing depth adequacy (the Saturation column)
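These indicators are commonly combined into a per-cell filter. The sketch below applies such a filter to the cell metrics CSV using the counts, genes, and mitoProp columns. The function names and every threshold value are hypothetical placeholders chosen for illustration; they are not recommendations, and suitable cutoffs depend on the tissue and experiment.

```python
import csv


def passes_qc(row, min_counts=500, min_genes=200, max_mito_prop=0.2):
    """Check one metrics row against illustrative (hypothetical) thresholds."""
    return (
        float(row["counts"]) >= min_counts
        and float(row["genes"]) >= min_genes
        and float(row["mitoProp"]) <= max_mito_prop
    )


def filter_cells(csv_path, **thresholds):
    """Return the cell_id of every row in the metrics CSV that passes QC."""
    with open(csv_path, newline="") as handle:
        return [
            row["cell_id"]
            for row in csv.DictReader(handle)
            if passes_qc(row, **thresholds)
        ]
```

A call such as filter_cells("samples/your_sample.allCells.csv", max_mito_prop=0.1) tightens only the mitochondrial cutoff and keeps the other defaults.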
Data Quality Assessment
- Cell Recovery: Percentage of expected cells detected
- Gene Detection: Number of genes detected across cells
- UTC Distribution: Distribution of unique transcript counts per cell
- Barcode Quality: Error rates and quality scores
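Two of these assessments are simple to compute once the per-cell counts are loaded. The sketch below shows cell recovery as a percentage and a small summary of the UTC distribution. The function names are hypothetical, and the expected cell number is an input from the experimental design rather than something read from these files.

```python
from statistics import median


def cell_recovery_percent(cells_detected: int, cells_expected: int) -> float:
    """Percentage of expected cells that were detected."""
    if cells_expected <= 0:
        raise ValueError("cells_expected must be positive")
    return 100.0 * cells_detected / cells_expected


def summarize_counts(counts):
    """Minimum, median, and maximum of per-cell unique transcript counts."""
    ordered = sorted(counts)
    if not ordered:
        raise ValueError("counts must not be empty")
    return {"min": ordered[0], "median": median(ordered), "max": ordered[-1]}
```

For instance, detecting 8000 cells when 10000 were expected gives a recovery of 80.0 percent.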
Need Help?
For more information, please contact support@scale.bio or visit our support website.