Analysis Parameters

Overview

Analysis parameters can be set in either of two ways:

Parameter File: Create a runParams.yml file and pass it with nextflow run -params-file runParams.yml.
Command Line: Set parameters directly with nextflow run --parameter=value.

Tip

Nextflow options such as -resume or -params-file take a single dash (-), while workflow parameters (e.g. --samples) take a double dash (--). If a parameter is set both on the command line and in the parameter file (e.g. runParams.yml), the command-line value takes precedence.
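
As a minimal sketch of this precedence, assume runParams.yml contains the line below. The workflow path in the comment is a placeholder, not a real location.

```yaml
# runParams.yml
starMulti: "PropUnique"

# If launched as (placeholder workflow path):
#   nextflow run /path/to/workflow -params-file runParams.yml --starMulti Unique
# the run uses starMulti = "Unique", because the command-line value
# takes precedence over the value in the parameter file.
```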

Required Parameters

These parameters must be set for every analysis run; the workflow will not function correctly without them.

Sequencing Data Source

Choose one of these input methods:

BCL Files (Raw Sequencer Output)

Parameter: runFolder
Description:
Path to the Illumina sequencer run directory (must contain the RunInfo.xml file). This is used when starting from raw BCL files. The workflow will automatically generate FASTQ files using Illumina bcl-convert. For more details on supported input types and file structure, see Fastq Generation.
Example: runFolder: "/path/to/sequencer/run"

FASTQ Files (Pre-generated)

Parameter: fastqDir
Description:
Directory containing all input FASTQ files for this analysis. Use this option if you already have demultiplexed FASTQ files. File naming and organization must follow the conventions described in Fastq Generation.
Example: fastqDir: "/path/to/fastq/files"

Ultima CRAM Files

Parameter: ultimaCramDir
Description:
Directory containing unaligned CRAM files pre-processed with Ultima trimmer.
Example: ultimaCramDir: "/path/to/cramFiles"

Sample Information

Parameter: samples
Description:
Path to the sample barcode table (samples.csv). This file lists all samples, their names, RT barcodes, and optional sample-specific settings. The format and required columns are described in detail in the Sample Barcode Table documentation.
Example: samples: "samples.csv"

Reference Genome

Parameter: genome
Description:
Path to the genome configuration file (genome.json). This file specifies the reference genome, annotation files, and index locations required for alignment and quantification. For setup instructions and supported formats, see Genomes.
Example: genome: "/genomes/grch38/genome.json"

Output Directory

Parameter: outDir
Description:
Output directory for all analysis results. If the directory already exists, it will be overwritten. For a complete list of output files and their organization, see Outputs.
Example: outDir: "ScaleRna.out"

Scale Bio RNA Kit Version

Parameter: libStructure
Description: Defines the version of the Scale Bio single-cell RNA kit used to generate the libraries. The default, libQuantumV1.0.json, matches version 1.0 of the Quantum ScaleRNA kit. For older Scale RNA (3-level) kits, set libStructure to libV1.1.json (Scale RNA v1.1 kit and extended throughput kit).
Example: libStructure: "libQuantumV1.0.json"

Advanced/Optional Parameters

These parameters allow you to customize, optimize, or extend the workflow. They are not required for a basic run, but can be very useful for advanced users or special cases.

BAM File Output

Parameter: bamOut and bcParserBamOut
Default: false (BAM files not published to output directory)
When Enabled: Generates STAR alignment BAM files and per-sample unaligned BAM files
Trade-off: Saves compute time and storage when disabled
Note: Gene expression matrices (.mtx) are always generated
Example:
bamOut: false
bcParserBamOut: false

FASTQ File Output

Parameter: fastqOut
Default: false (FASTQ files not published to output directory)
When Enabled: Generates ScaleBio library-level FASTQ files from bcl-convert
Trade-off: Saves compute time and storage when disabled
Example: fastqOut: true

Multimapping Read Handling

Parameter	Default	Description
`starMaxLoci`	`6`	Reads that map to more genomic locations than this are filtered out, regardless of transcriptome overlap
`starMulti`	`PropUnique`	The algorithm STAR uses to handle reads that match multiple genes, whether because of genomic multimapping or overlapping gene annotations; see the STARsolo documentation
`roundCounts`	`false`	When multi-gene reads are resolved, they produce fractional unique transcript counts, e.g. "0.5" for a unique read that matches two genes equally well. If this setting is true, these counts are rounded to the nearest integer

Example: To change the multi-gene resolution method from the default PropUnique to Unique, which counts only uniquely mapped reads, set the following in your parameter file:

starMulti: "Unique"

How Does `starMulti` Affect Gene Counts?

Scenario: A read maps to both Gene A and Gene B

Cell 1 has these UMIs:
- Gene A: 10 unique UMIs, 2 multi-gene UMIs (shared with Gene B)
- Gene B: 5 unique UMIs, 2 multi-gene UMIs (shared with Gene A)

`starMulti: "PropUnique"` (Default)

Gene A: 10 + (2 × 10/15) = 10 + 1.33 = 11.33 UMIs
Gene B: 5 + (2 × 5/15) = 5 + 0.67 = 5.67 UMIs
Result: Multi-gene UMIs are distributed proportionally to the number of unique UMIs per gene (10:5 ratio)
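
The arithmetic above can be written as a general rule. This is a sketch that restates the worked example, not a formula quoted from the STARsolo documentation. The symbols are introduced here for illustration: u_g is the number of unique UMIs for gene g, m is the number of multi-gene UMIs shared by the gene set S, and c_g is the resulting count. It assumes at least one gene in S has a unique UMI.

```latex
c_g = u_g + m \cdot \frac{u_g}{\sum_{h \in S} u_h}, \qquad g \in S
```

Substituting u_A = 10, u_B = 5 and m = 2 reproduces the values above.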

`starMulti: "Unique"`

Gene A: 10 UMIs (only unique UMIs counted)
Gene B: 5 UMIs (only unique UMIs counted)
Result: Multi-gene UMIs are completely ignored
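
If integer counts are needed downstream, roundCounts can be combined with the default method. This is a minimal sketch using only the parameters described above. The rounded values in the comment assume ordinary round-to-nearest behaviour applied to the example counts.

```yaml
starMulti: "PropUnique"
roundCounts: true
# With the example above, Gene A: 11.33 -> 11 and Gene B: 5.67 -> 6
```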

Cell Calling

Purpose: Distinguishes real cells from background barcodes
Documentation: See Cell Calling for detailed methodology
Parameters: Multiple options for sensitivity and specificity tuning

Compute Resources

Parallelization

Parameter: splitFastq
Default: true
Benefit: Increases parallelization by splitting data by FASTQ files and RT barcodes
Trade-off: More compute jobs in exchange for faster processing
Small Analyses: Set to false to reduce job count
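
For a small analysis, the parameter file might disable splitting. This is a minimal sketch; what counts as "small" is not defined here and depends on your data and compute environment.

```yaml
# Reduce the number of compute jobs for a small analysis
splitFastq: false
```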

Resource Limits (Default)

Parameter	Purpose	Default Value
`taskMaxMemory`	Maximum memory per workflow task	`256.GB`
`taskMaxCpus`	Maximum CPUs per workflow task	`16`
`taskMaxTime`	Maximum runtime per workflow task	`48.h`
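
These limits can be lowered in the parameter file to fit a smaller machine. The values below are illustrative assumptions, not recommendations. Whether the workflow completes within them depends on the genome and dataset size.

```yaml
# Hypothetical limits for a node with 64 GB of memory and 8 cores
taskMaxMemory: "64.GB"
taskMaxCpus: 8
taskMaxTime: "24.h"
```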

Special Run Types

Reporting Only

Parameter: resultDir
Use Case: Generate updated reports from existing analysis outputs
Documentation: See Reporting Workflow for details
Benefit: Skip expensive computation, focus on report generation
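
A reporting-only run points resultDir at the outputs of an earlier analysis. This is a minimal sketch with a placeholder path, which assumes resultDir refers to the output directory of the previous run. Other parameters may also be required, as described in Reporting Workflow.

```yaml
# Placeholder path to the outputs of a previous analysis run
resultDir: "/path/to/previous/ScaleRna.out"
```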

Parameter File Example

# Basic configuration
samples: "samples.csv"
genome: "/genomes/grch38/genome.json"
libStructure: "libQuantumV1.0.json"
outDir: "ScaleRna.out"

# Input method (choose one)
runFolder: "/path/to/sequencer/run"
# fastqDir: "/path/to/fastq/files"
# ultimaCramDir: "/path/to/cram/files"

# Advanced settings
bamOut: false
starMaxLoci: 6
starMulti: "PropUnique"
roundCounts: false
splitFastq: true

# Resource limits
taskMaxMemory: "256.GB"
taskMaxCpus: 16
taskMaxTime: "48.h"

Complete Parameter Reference

For a complete list of all available parameters and their default values, see:
- nextflow.config: All workflow parameters and Nextflow system options
- runParams.yml example: Complete parameter file template

Sample Barcode Table — Input file format and requirements
Genomes — Reference genome setup and configuration
Cell Calling — Cell detection methodology and parameters
Outputs — Complete output file descriptions
Reporting Workflow — Generating reports from existing results

Need Help?

For more information, please contact support@scale.bio or visit our support website.