Analysis Parameters

Overview

Analysis parameters can be set in either of two ways:

  • Parameter File: Create a runParams.yml file and pass it with nextflow run -params-file runParams.yml.
  • Command Line: Set parameters directly with nextflow run --parameter=value.

Tip

Nextflow options such as -resume or -params-file take a single dash (-), while workflow parameters (e.g. --samples) take a double dash (--). If a parameter is set in both places, the command-line value overrides the value in the parameter file (e.g. runParams.yml).
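As a sketch of how the two mechanisms combine, suppose runParams.yml contains the entries below. The paths and the directory name other.out are placeholders, and <workflow> in the comment stands for however you normally launch the pipeline; neither is taken from the workflow itself.

```yaml
# runParams.yml (placeholder values)
samples: "samples.csv"
genome: "/genomes/grch38/genome.json"
outDir: "ScaleRna.out"

# Hypothetical invocation; <workflow> is a placeholder for the pipeline name or path:
#   nextflow run <workflow> -params-file runParams.yml -resume --outDir other.out
# -params-file and -resume are Nextflow options (single dash).
# --outDir is a workflow parameter (double dash); its command-line value
# "other.out" takes precedence over "ScaleRna.out" from the file.
```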


Required Parameters

These parameters are essential for every analysis run. You must set these for the workflow to function correctly.

Sequencing Data Source

Choose one of these input methods:

BCL Files (Raw Sequencer Output)

  • Parameter: runFolder
  • Description:
    Path to the Illumina sequencer run directory (must contain the RunInfo.xml file). This is used when starting from raw BCL files. The workflow will automatically generate FASTQ files using Illumina bcl-convert. For more details on supported input types and file structure, see Fastq Generation.
  • Example: runFolder: "/path/to/sequencer/run"

FASTQ Files (Pre-generated)

  • Parameter: fastqDir
  • Description:
    Directory containing all input FASTQ files for this analysis. Use this option if you already have demultiplexed FASTQ files. File naming and organization must follow the conventions described in Fastq Generation.
  • Example: fastqDir: "/path/to/fastq/files"

Ultima CRAM Files

  • Parameter: ultimaCramDir
  • Description:
    Directory containing unaligned CRAM files pre-processed with Ultima trimmer.
  • Example: ultimaCramDir: "/path/to/cramFiles"

Sample Information

  • Parameter: samples
  • Description:
    Path to the sample barcode table (samples.csv). This file lists all samples, their names, RT barcodes, and optional sample-specific settings. The format and required columns are described in detail in the Sample Barcode Table documentation.
  • Example: samples: "samples.csv"

Reference Genome

  • Parameter: genome
  • Description:
    Path to the genome configuration file (genome.json). This file specifies the reference genome, annotation files, and index locations required for alignment and quantification. For setup instructions and supported formats, see Genomes.
  • Example: genome: "/genomes/grch38/genome.json"

Output Directory

  • Parameter: outDir
  • Description:
    Output directory for all analysis results. If the directory already exists, it will be overwritten. For a complete list of output files and their organization, see Outputs.
  • Example: outDir: "ScaleRna.out"

Scale Bio RNA Kit Version

  • Parameter: libStructure
  • Description: This defines the version of the Scale Bio single-cell RNA kit used to generate the libraries. The default, libQuantumV1.0.json, matches version 1.0 of the Quantum ScaleRNA kit. For older Scale RNA (3-level) kits, set libStructure to libV1.1.json (Scale RNA v1.1 kit and extended throughput kit).
  • Example: libStructure: "libQuantumV1.0.json"

Advanced/Optional Parameters

These parameters allow you to customize, optimize, or extend the workflow. They are not required for a basic run, but can be very useful for advanced users or special cases.

BAM File Output

  • Parameter: bamOut and bcParserBamOut
  • Default: false (BAM files not published to output directory)
  • When Enabled: Generates STAR alignment BAM files and per-sample unaligned BAM files
  • Trade-off: Leaving these disabled saves compute time and storage
  • Note: Gene expression matrices (.mtx) are always generated
  • Example:
    bamOut: false
    bcParserBamOut: false
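To publish both kinds of BAM file, both switches would be set to true in the parameter file. This is a minimal sketch; the file type noted in each comment is inferred from the order in which the parameters and outputs are listed above, not from a separate specification.

```yaml
# Publish BAM files to the output directory (both default to false)
bamOut: true          # STAR alignment BAM files (inferred mapping)
bcParserBamOut: true  # per-sample unaligned BAM files (inferred mapping)
```

Expect longer run times and a larger output directory with these enabled.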

FASTQ File Output

  • Parameter: fastqOut
  • Default: false (FASTQ files not published to output directory)
  • When Enabled: Generates ScaleBio library-level FASTQ files with bcl-convert
  • Trade-off: Saves compute time and storage when disabled
  • Example: fastqOut: true

Multimapping Read Handling

| Parameter | Default | Description |
|---|---|---|
| starMaxLoci | 6 | Reads that map to more genomic locations than this are filtered out, regardless of transcriptome overlap |
| starMulti | PropUnique | The algorithm STAR uses to handle reads that match multiple genes, either because of genomic multimapping or because of overlapping gene annotations; see the STARsolo documentation |
| roundCounts | false | Resolving multi-gene reads produces fractional unique transcript counts, e.g. "0.5" for a unique read that matches two genes equally well. If this setting is true, these counts are rounded to the nearest integer |

Example: To change the multi-gene resolution method from the default PropUnique to Unique, which counts only uniquely mapped reads, set the following in your parameter file:

starMulti: "Unique"
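If downstream tools expect integer counts, the default resolution method could instead be kept while rounding is switched on. A sketch using only the multimapping parameters described in this section:

```yaml
starMulti: "PropUnique"  # default: distribute multi-gene UMIs proportionally
roundCounts: true        # round fractional counts to the nearest integer
```

With this combination, a fractional count such as 11.33 would be reported as 11.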

How Does starMulti Affect Gene Counts?

Scenario: A read maps to both Gene A and Gene B

Cell 1 has these UMIs:

  • Gene A: 10 unique UMIs, 2 multi-gene UMIs (shared with Gene B)
  • Gene B: 5 unique UMIs, 2 multi-gene UMIs (shared with Gene A)

starMulti: "PropUnique" (Default)
  • Gene A: 10 + (2 × 10/15) = 10 + 1.33 = 11.33 UMIs
  • Gene B: 5 + (2 × 5/15) = 5 + 0.67 = 5.67 UMIs
  • Result: Multi-gene UMIs are distributed proportionally to the number of unique UMIs per gene (10:5 ratio)
starMulti: "Unique"
  • Gene A: 10 UMIs (only unique UMIs counted)
  • Gene B: 5 UMIs (only unique UMIs counted)
  • Result: Multi-gene UMIs are completely ignored
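The PropUnique arithmetic above can be written more generally. The following is a sketch that generalizes the two-gene example; the symbols are introduced here for illustration and are not taken from the STARsolo documentation. Let $u_g$ be the number of unique UMIs for gene $g$ in a cell, and let $m$ be the number of multi-gene UMIs shared by a set of genes $S$, assuming at least one gene in $S$ has unique UMIs.

```latex
% PropUnique: each gene in S receives a share of the m multi-gene UMIs
% proportional to its unique UMI count.
c_g^{\text{PropUnique}} = u_g + m \cdot \frac{u_g}{\sum_{h \in S} u_h},
\qquad g \in S

% Unique: multi-gene UMIs are ignored.
c_g^{\text{Unique}} = u_g
```

With $u_A = 10$, $u_B = 5$ and $m = 2$, this gives $c_A = 10 + 2 \cdot 10/15 \approx 11.33$ and $c_B = 5 + 2 \cdot 5/15 \approx 5.67$, matching the example. The PropUnique counts sum to 17, so all 15 unique and 2 multi-gene UMIs are accounted for.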

Cell Calling

  • Purpose: Distinguishes real cells from background barcodes
  • Documentation: See Cell Calling for detailed methodology
  • Parameters: Multiple options for sensitivity and specificity tuning

Compute Resources

Parallelization

  • Parameter: splitFastq
  • Default: true
  • Benefit: Increases parallelization by splitting data by FASTQ files and RT barcodes
  • Trade-off: More compute jobs, faster processing
  • Small Analyses: Set to false to reduce job count
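For a small analysis, the parameter file entry would look like this (a minimal sketch; what counts as "small" depends on your data and compute environment):

```yaml
# Small analysis: run fewer jobs by not splitting the input
splitFastq: false
```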

Resource Limits (Default)

| Parameter | Purpose | Default Value |
|---|---|---|
| taskMaxMemory | Maximum memory per workflow task | 256.GB |
| taskMaxCpus | Maximum CPUs per workflow task | 16 |
| taskMaxTime | Maximum runtime per workflow task | 48.h |
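On a workstation or a smaller cluster node, these limits can be lowered in the parameter file. The values below are illustrative placeholders, not recommendations; only the value format follows the defaults listed above.

```yaml
# Illustrative limits for a smaller machine (placeholder values)
taskMaxMemory: "64.GB"
taskMaxCpus: 8
taskMaxTime: "24.h"
```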

Special Run Types

Reporting Only

  • Parameter: resultDir
  • Use Case: Generate updated reports from existing analysis outputs
  • Documentation: See Reporting Workflow for details
  • Benefit: Skip expensive computation, focus on report generation
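A reporting-only run might be configured as sketched below. This assumes that resultDir takes the path to the output directory of a previous run and that outDir names the location for the regenerated reports. Both paths are placeholders, and the Reporting Workflow documentation remains the authority on which other parameters are needed.

```yaml
# Reporting-only run (assumed usage; paths are placeholders)
resultDir: "ScaleRna.out"        # output directory of an earlier analysis run
outDir: "ScaleRna.reports.out"   # where the regenerated reports are written
```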

Parameter File Example

# Basic configuration
samples: "samples.csv"
genome: "/genomes/grch38/genome.json"
libStructure: "libQuantumV1.0.json"
outDir: "ScaleRna.out"

# Input method (choose one)
runFolder: "/path/to/sequencer/run"
# fastqDir: "/path/to/fastq/files"
# ultimaCramDir: "/path/to/cram/files"

# Advanced settings
bamOut: false
starMaxLoci: 6
starMulti: "PropUnique"
roundCounts: false
splitFastq: true

# Resource limits
taskMaxMemory: "256.GB"
taskMaxCpus: 16
taskMaxTime: "48.h"

Complete Parameter Reference

For a complete list of all available parameters and their default values, see:

  • nextflow.config: All workflow parameters and Nextflow system options
  • runParams.yml example: Complete parameter file template


Need Help?

For more information, please contact support@scale.bio or visit our support website.