Analysis Parameters
Overview
Analysis parameters can be set in either of two ways:
- Parameter File: Create a runParams.yml file and pass it with nextflow run -params-file runParams.yml.
- Command Line: Set parameters directly with nextflow run --parameter=value.
Tip
Nextflow options such as -resume or -params-file are given with a single dash -, while workflow parameters (e.g. --samples) are given with a double dash --. If a parameter is set both in the parameter file (e.g. runParams.yml) and on the command line, the command-line value takes precedence.
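The precedence rule in the tip above can be pictured as a simple dictionary merge. This is only an illustrative sketch, not Nextflow's implementation: the helper resolve_params and the value "rerun.out" are hypothetical names made up for the example.

```python
def resolve_params(file_params, cli_params):
    """Return the effective parameters; command-line values win over file values."""
    effective = dict(file_params)   # start from the parameter file
    effective.update(cli_params)    # command-line values overwrite matching keys
    return effective


# Values as they might appear in runParams.yml
file_params = {"samples": "samples.csv", "outDir": "ScaleRna.out", "bamOut": False}
# Hypothetical override, as if --outDir rerun.out were given on the command line
cli_params = {"outDir": "rerun.out"}

effective = resolve_params(file_params, cli_params)
print(effective["outDir"])   # the command-line value
print(effective["samples"])  # untouched value from the parameter file
```

Parameters that appear only in the file keep their file values; only keys repeated on the command line change.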
Required Parameters
These parameters are essential for every analysis run. You must set these for the workflow to function correctly.
Sequencing Data Source
Choose one of these input methods:
BCL Files (Raw Sequencer Output)
- Parameter: runFolder
- Description: Path to the Illumina sequencer run directory (must contain the RunInfo.xml file). This is used when starting from raw BCL files. The workflow will automatically generate FASTQ files using Illumina bcl-convert. For more details on supported input types and file structure, see Fastq Generation.
- Example:
runFolder: "/path/to/sequencer/run"
FASTQ Files (Pre-generated)
- Parameter: fastqDir
- Description: Directory containing all input FASTQ files for this analysis. Use this option if you already have demultiplexed FASTQ files. File naming and organization must follow the conventions described in Fastq Generation.
- Example:
fastqDir: "/path/to/fastq/files"
Ultima CRAM Files
- Parameter: ultimaCramDir
- Description: Directory containing unaligned CRAM files pre-processed with Ultima trimmer.
- Example:
ultimaCramDir: "/path/to/cramFiles"
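Because the three input parameters above are alternatives, it can be useful to check a parameter set before launching a run. The following is a minimal sketch under the assumption that exactly one of them should be set for a standard run; pick_input_source is a hypothetical helper, and the workflow's own validation may behave differently.

```python
# The three alternative input parameters described in this section
INPUT_PARAMS = ("runFolder", "fastqDir", "ultimaCramDir")


def pick_input_source(params):
    """Return the name of the single input parameter that is set, or raise ValueError."""
    chosen = [name for name in INPUT_PARAMS if params.get(name)]
    if len(chosen) != 1:
        raise ValueError(
            f"Set exactly one of {', '.join(INPUT_PARAMS)}; found {chosen or 'none'}"
        )
    return chosen[0]


print(pick_input_source({"fastqDir": "/path/to/fastq/files", "samples": "samples.csv"}))
```

Setting none of the three, or more than one, raises an error in this sketch instead of guessing which input to use.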
Sample Information
- Parameter: samples
- Description: Path to the sample barcode table (samples.csv). This file lists all samples, their names, RT barcodes, and optional sample-specific settings. The format and required columns are described in detail in the Sample Barcode Table documentation.
- Example:
samples: "samples.csv"
Reference Genome
- Parameter: genome
- Description: Path to the genome configuration file (genome.json). This file specifies the reference genome, annotation files, and index locations required for alignment and quantification. For setup instructions and supported formats, see Genomes.
- Example:
genome: "/genomes/grch38/genome.json"
Output Directory
- Parameter: outDir
- Description: Output directory for all analysis results. If the directory already exists, it will be overwritten. For a complete list of output files and their organization, see Outputs.
- Example:
outDir: "ScaleRna.out"
Scale Bio RNA Kit Version
- Parameter: libStructure
- Description: Defines the version of the Scale Bio single-cell RNA kit used to generate the libraries. The default, libQuantumV1.0.json, matches version 1.0 of the Quantum ScaleRNA kit. For older Scale RNA (3 level) kits, set libStructure to libV1.1.json (Scale RNA v1.1 kit and extended throughput kit).
- Example:
libStructure: "libQuantumV1.0.json"
Advanced/Optional Parameters
These parameters allow you to customize, optimize, or extend the workflow. They are not required for a basic run, but can be very useful for advanced users or special cases.
BAM File Output
- Parameter: bamOut and bcParserBamOut
- Default: false (BAM files not published to output directory)
- When Enabled: Generates STAR alignment BAM files and per-sample unaligned BAM files
- Trade-off: Saves compute time and storage when disabled
- Note: Gene expression matrices (.mtx) are always generated
- Example:
bamOut: false
bcParserBamOut: false
FASTQ File Output
- Parameter: fastqOut
- Default: false (FASTQ files not published to output directory)
- When Enabled: Generates bcl-convert ScaleBio library-level FASTQ files
- Trade-off: Saves compute time and storage when disabled
- Example:
fastqOut: true
Multimapping Read Handling
Parameter | Default | Description |
---|---|---|
starMaxLoci | 6 | Reads that map to more genomic locations than this are filtered out, regardless of transcriptome overlap |
starMulti | PropUnique | The algorithm used by STAR to handle reads that match multiple genes, either because of genomic multimapping or because of overlapping gene annotations; see STARsolo documentation |
roundCounts | false | If multi-gene reads are resolved, they lead to fractional unique transcript counts, e.g. "0.5" for a unique read that matches two genes equally well. If this setting is true, these counts are rounded to the nearest integer |
Example: To change the multi-gene resolution method from the default PropUnique to Unique, which would include only uniquely mapped reads, set the following in your parameter file:
starMulti: "Unique"
How Does starMulti Affect Gene Counts?
Scenario: A read maps to both Gene A and Gene B
Cell 1 has these UMIs:
- Gene A: 10 unique UMIs, 2 multi-gene UMIs (shared with Gene B)
- Gene B: 5 unique UMIs, 2 multi-gene UMIs (shared with Gene A)
starMulti: "PropUnique" (Default)
- Gene A: 10 + (2 × 10/15) = 10 + 1.33 = 11.33 UMIs
- Gene B: 5 + (2 × 5/15) = 5 + 0.67 = 5.67 UMIs
- Result: Multi-gene UMIs are distributed proportionally to the number of unique UMIs per gene (10:5 ratio)
starMulti: "Unique"
- Gene A: 10 UMIs (only unique UMIs counted)
- Gene B: 5 UMIs (only unique UMIs counted)
- Result: Multi-gene UMIs are completely ignored
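The arithmetic in this scenario can be reproduced with a few lines of Python. This is a simplified sketch of the worked example only, not STARsolo's actual algorithm; the function names are hypothetical. The last step illustrates roundCounts: true for these particular values. How the workflow breaks exact .5 ties is not stated here, so the sketch makes no claim about that case.

```python
def prop_unique(unique_counts, shared_umis):
    """Split shared_umis across genes in proportion to their unique UMI counts."""
    total_unique = sum(unique_counts.values())
    if total_unique == 0:
        # No unique evidence to apportion by; this sketch leaves shared UMIs uncounted.
        return {gene: 0.0 for gene in unique_counts}
    return {
        gene: n + shared_umis * n / total_unique
        for gene, n in unique_counts.items()
    }


def unique_only(unique_counts):
    """Count only unique UMIs; shared UMIs are ignored."""
    return dict(unique_counts)


counts = {"Gene A": 10, "Gene B": 5}   # unique UMIs per gene in Cell 1

prop = prop_unique(counts, shared_umis=2)
for gene, value in prop.items():
    print(f"{gene}: {value:.2f}")      # Gene A: 11.33, Gene B: 5.67

uniq = unique_only(counts)

# Illustration of roundCounts: true for these values (neither is an exact .5 tie;
# Python's round() uses round-half-to-even on ties, which may differ from the workflow).
rounded = {gene: round(value) for gene, value in prop.items()}
print(rounded)
```

Note that the PropUnique totals add up to 17, the 15 unique UMIs plus the 2 shared ones, so no UMI is counted twice.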
Cell Calling
- Purpose: Distinguishes real cells from background barcodes
- Documentation: See Cell Calling for detailed methodology
- Parameters: Multiple options for sensitivity and specificity tuning
Compute Resources
Parallelization
- Parameter: splitFastq
- Default: true
- Benefit: Increases parallelization by splitting data by FASTQ files and RT barcodes
- Trade-off: More compute jobs, faster processing
- Small Analyses: Set to false to reduce job count
Resource Limits (Default)
Parameter | Purpose | Default Value |
---|---|---|
taskMaxMemory | Maximum memory per workflow task | 256.GB |
taskMaxCpus | Maximum CPUs per workflow task | 16 |
taskMaxTime | Maximum runtime per workflow task | 48.h |
Special Run Types
Reporting Only
- Parameter: resultDir
- Use Case: Generate updated reports from existing analysis outputs
- Documentation: See Reporting Workflow for details
- Benefit: Skip expensive computation, focus on report generation
Parameter File Example
# Basic configuration
samples: "samples.csv"
genome: "/genomes/grch38/genome.json"
libStructure: "libQuantumV1.0.json"
outDir: "ScaleRna.out"
# Input method (choose one)
runFolder: "/path/to/sequencer/run"
# fastqDir: "/path/to/fastq/files"
# ultimaCramDir: "/path/to/cram/files"
# Advanced settings
bamOut: false
starMaxLoci: 6
starMulti: "PropUnique"
roundCounts: false
splitFastq: true
# Resource limits
taskMaxMemory: "256.GB"
taskMaxCpus: 16
taskMaxTime: "48.h"
Complete Parameter Reference
For a complete list of all available parameters and their default values, see:
- nextflow.config: All workflow parameters and Nextflow system options
- runParams.yml example: Complete parameter file template
Related Documentation
- Sample Barcode Table — Input file format and requirements
- Genomes — Reference genome setup and configuration
- Cell Calling — Cell detection methodology and parameters
- Outputs — Complete output file descriptions
- Reporting Workflow — Generating reports from existing results
Need Help?
For more information, please contact support@scale.bio or visit our support website.