Merging Multiple Runs

ScaleRna supports combining data from multiple sequencing runs, but the approach depends on your experimental design. There are two distinct scenarios that require different workflows:

Scenario 1: Different Sub-libraries / Plates / Cells on Separate Runs

When different sub-libraries or different plates are sequenced on separate flowcells, each run contains unique biological samples. This scenario allows for post-alignment merging of results.

Key Characteristics:

  • Each sequencing run contains different biological samples
  • Sub-libraries or plates are sequenced independently
  • Results can be combined after individual analysis

Scenario 2: Same Libraries / Plates / Cells on Multiple Runs

When the same libraries are sequenced across multiple flowcells, you have technical replicates of the same biological material. This scenario requires pre-analysis data combination.

Key Characteristics:

  • Same biological samples sequenced on different runs
  • Technical replicates that need unified analysis
  • Data must be combined before processing

Choosing Your Approach

  • Use Scenario 1 workflow if your runs contain different biological samples
  • Use Scenario 2 workflow if your runs contain the same biological samples
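The choice above comes down to whether any library appears on more than one sequencing run. A minimal sketch of that check, assuming you can list the library names sequenced on each run (the function name, run names, and library names are hypothetical and not part of ScaleRna):

```python
def choose_scenario(run_libraries):
    """Return (scenario, shared_libraries) for a mapping of run name -> library names.

    Scenario 1: every library was sequenced on exactly one run.
    Scenario 2: at least one library was sequenced on more than one run.
    """
    runs_per_library = {}
    for run, libraries in run_libraries.items():
        for library in libraries:
            runs_per_library.setdefault(library, set()).add(run)
    # Libraries seen on two or more runs must be combined before analysis.
    shared = sorted(lib for lib, runs in runs_per_library.items() if len(runs) > 1)
    return (2 if shared else 1), shared


# Hypothetical run and library names, for illustration only.
print(choose_scenario({"runA": ["lib1", "lib2"], "runB": ["lib3"]}))
print(choose_scenario({"runA": ["lib1"], "runB": ["lib1"]}))
```

If only some libraries are shared between runs, the returned list shows which ones need the Scenario 2 treatment.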

The following sections provide detailed workflows for each scenario.


Step-by-Step Guide:

Scenario 1: Different sub-libraries / plates / cells sequenced on their own unique sequencing runs

  • When the different sub-libraries are sequenced across multiple flowcells, the data can be combined after genome alignment.

Step 1: Analyze each run or plate separately:

  • Create a samples.csv for each sequencing run or plate.
  • Process each run/plate normally through the workflow.

Step 2: Prepare for merging:

  • Create a new samples.csv listing all samples and plates to be merged.
  • Add a resultDir column that points to the output directory for each run/plate.
  • See merge example.
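The merged samples.csv from Step 2 can be assembled by hand or with a small script. The sketch below is one possible approach, not part of ScaleRna: it reads each per-run samples.csv, carries every existing column over unchanged, and adds the resultDir column pointing at that run's output directory. Only the resultDir column name comes from this guide; the function name, file paths, and the decision to keep all other columns are assumptions, so compare the result against the merge example before use.

```python
import csv


def write_merge_samples(run_sheets, out_csv):
    """Combine per-run sample sheets into one sheet with a resultDir column.

    run_sheets maps each per-run samples.csv path to that run's output directory.
    """
    rows = []
    fieldnames = []
    for sheet, result_dir in run_sheets.items():
        with open(sheet, newline="") as handle:
            for row in csv.DictReader(handle):
                # Point every sample from this run at the run's output directory.
                row["resultDir"] = str(result_dir)
                rows.append(row)
                for column in row:
                    if column not in fieldnames:
                        fieldnames.append(column)
    with open(out_csv, "w", newline="") as handle:
        # restval fills columns that exist in one run's sheet but not another.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call might look like `write_merge_samples({"run1/samples.csv": "/path/to/run1_output", "run2/samples.csv": "/path/to/run2_output"}, "samples_merged.csv")`, where all paths are placeholders.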

Step 3: Run the workflow in merge mode:

nextflow run /PATH/TO/ScaleRna -profile PROFILE -params-file /PATH/TO/ScaleRna/docs/examples/extended-throughput/runParams.yml --reporting --outDir merged_output.ext

Do not specify any input reads at this step (no --runFolder or --fastqDir). The workflow uses the existing results referenced by the resultDir column instead.


Scenario 2: Same Libraries / Plates / Cells Sequenced on Different Sequencing Runs

When the same libraries are sequenced across multiple flowcells, the data must be combined for unified analysis.

Process Steps

Step 1: Generate Individual FASTQ Files

  • Generate FASTQ files for each sequencing run separately
  • Use appropriate samplesheets for each run
  • Ensure proper naming conventions for each run

Step 2: Combine or Organize Files

  • Option A: Use unique naming to distinguish between runs
  • Option B: Concatenate FASTQ files from different runs (NOT RECOMMENDED)
  • Group all FASTQ files in a single directory
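Option A and the grouping step can be scripted. The following is a minimal sketch under stated assumptions: it symlinks every `*.fastq.gz` file from each run's FASTQ directory into one combined directory and stops if two runs produce the same file name, since that means the runs were not given unique names at FASTQ generation. The function name, the `*.fastq.gz` pattern, and the use of symlinks rather than copies are assumptions, not ScaleRna requirements. The script does not rename anything, so the file names must already follow the pipeline's FASTQ naming conventions.

```python
from pathlib import Path


def gather_fastqs(run_dirs, combined_dir):
    """Symlink FASTQ files from several run directories into one directory."""
    combined = Path(combined_dir)
    combined.mkdir(parents=True, exist_ok=True)
    linked = []
    for run_dir in run_dirs:
        for fastq in sorted(Path(run_dir).glob("*.fastq.gz")):
            target = combined / fastq.name
            if target.exists() or target.is_symlink():
                # Identical names would silently hide one run's reads.
                raise FileExistsError(
                    f"{fastq.name} appears in more than one run; "
                    "give each run unique FASTQ names"
                )
            target.symlink_to(fastq.resolve())
            linked.append(target)
    return linked
```

The combined directory is then the value to pass to --fastqDir. If your file system or execution environment does not handle symlinks well, copying the files is a straightforward substitute.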

Step 3: Unified Analysis

  • Supply the combined directory to the --fastqDir parameter
  • Run the ScaleRna pipeline on the combined dataset
  • The pipeline analyzes all sequencing data as one unified dataset

Key Requirements

  • Generate FASTQs for each individual library/run
  • Ensure the ScaleRna pipeline processes the FASTQs from all runs together
  • Maintain proper file organization and naming
  • Use the --fastqDir parameter for the combined dataset

Expected Outcome

  • Single analysis run containing data from multiple sequencing runs
  • Unified gene expression matrices and quality metrics
  • Combined cell calling and demultiplexing results


Need Help?

For more information, please contact support@scale.bio or visit our support website.