Merging Multiple Runs

ScaleRna supports combining data from multiple sequencing runs, but the approach depends on your experimental design. There are two distinct scenarios that require different workflows:

Scenario 1: Different Sub-libraries / Plates / Cells on Separate Runs

When different sub-libraries or different plates are sequenced on separate flowcells, each run contains unique biological samples. This scenario allows for post-alignment merging of results.

Key Characteristics:

Each sequencing run contains different biological samples
Sub-libraries or plates are sequenced independently
Results can be combined after individual analysis

Scenario 2: Same Libraries / Plates / Cells on Multiple Runs

When the same libraries are sequenced across multiple flowcells, you have technical replicates of the same biological material. This scenario requires pre-analysis data combination.

Key Characteristics:

Same biological samples sequenced on different runs
Technical replicates that need unified analysis
Data must be combined before processing

Choosing Your Approach

Use Scenario 1 workflow if your runs contain different biological samples
Use Scenario 2 workflow if your runs contain the same biological samples

The following sections provide detailed workflows for each scenario.

Step-by-Step Guide:

Scenario 1: Different sub-libraries / plates / cells sequenced on their own unique sequencing runs

When different sub-libraries are sequenced on separate flowcells, each run is analyzed on its own and the results are combined after genome alignment.

Step 1: Analyze each run or plate separately:

Create a samples.csv for each sequencing run or plate.
Process each run/plate normally through the workflow.
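
As a minimal sketch of Step 1, the loop below prints one pipeline command per sequencing run without launching anything. The directory layout (runs/<run>/runParams.yml, results/<run>) and the run names are hypothetical, and it assumes each run's parameter file names that run's samples.csv and input reads. Only options that appear elsewhere in this guide are used.

```shell
#!/usr/bin/env bash
# Dry run: print the per-run commands instead of executing them.
# PIPELINE and PROFILE are placeholders; replace them with your own values.
PIPELINE=/PATH/TO/ScaleRna
PROFILE=PROFILE

# Build the command for a single run.
# Assumed (hypothetical) layout: runs/<run>/runParams.yml -> results/<run>
build_run_command() {
    run_name=$1
    printf 'nextflow run %s -profile %s -params-file %s --outDir %s\n' \
        "$PIPELINE" "$PROFILE" "runs/${run_name}/runParams.yml" "results/${run_name}"
}

# Hypothetical run names; one independent analysis per run or plate.
for run_name in run1 run2; do
    build_run_command "$run_name"
done
```

Keeping one output directory per run makes it straightforward to reference each result later when preparing the merge.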

Step 2: Prepare for merging:

Create a new samples.csv listing all samples and plates to be merged.
Add a resultDir column that points to the output directory for each run/plate.
See merge example.
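
The sketch below shows one way a merged samples.csv could be written and sanity-checked before running in merge mode. Only the resultDir column comes from this guide; the sample column name, the sample names, and the results/run1 style paths are assumptions for illustration, and the real file may need further columns, so treat the merge example as the authority.

```shell
#!/usr/bin/env bash
# Illustrative only: stand-in directories replace real pipeline outputs.
workdir=$(mktemp -d)
mkdir -p "$workdir/results/run1" "$workdir/results/run2"

# Hypothetical merged samples.csv: one row per sample, each pointing at the
# output directory of the run that produced it.
cat > "$workdir/samples_merged.csv" <<EOF
sample,resultDir
sampleA,$workdir/results/run1
sampleB,$workdir/results/run2
EOF

# Count rows whose resultDir does not exist on disk.
missing=0
while IFS=, read -r sample result_dir; do
    [ "$sample" = "sample" ] && continue   # skip the header row
    if [ ! -d "$result_dir" ]; then
        echo "missing resultDir for $sample: $result_dir" >&2
        missing=$((missing + 1))
    fi
done < "$workdir/samples_merged.csv"

echo "rows with missing resultDir: $missing"
```

Checking the paths up front catches typos in resultDir before the merge run is launched.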

Step 3: Run the workflow in merge mode:

nextflow run /PATH/TO/ScaleRna -profile PROFILE -params-file /PATH/TO/ScaleRna/docs/examples/extended-throughput/runParams.yml --reporting --outDir merged_output.ext

Do not specify any input reads at this step (no --runFolder or --fastqDir); the workflow reads the existing results from the resultDir locations listed in samples.csv.

Scenario 2: Same Libraries / Plates / Cells Sequenced on Different Sequencing Runs

When the same libraries are sequenced across multiple flowcells, the data must be combined for unified analysis.

Process Steps

Step 1: Generate Individual FASTQ Files

Generate FASTQ files for each sequencing run separately
Use appropriate samplesheets for each run
Ensure proper naming conventions for each run

Step 2: Combine or Organize Files

Option A: Use unique file names to distinguish between runs
Option B: Concatenate FASTQ files from different runs (NOT RECOMMENDED)
With either option, group all FASTQ files in a single directory
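
A minimal sketch of the grouping step for Option A: the hypothetical helper gather_fastqs links the FASTQ files from several run directories into one combined directory and stops if two runs would contribute the same file name. It assumes files end in .fastq.gz and that symbolic links are acceptable in your environment (copy the files instead if not). It does not rename anything, because the naming convention the pipeline expects is not specified here.

```shell
#!/usr/bin/env bash
# gather_fastqs COMBINED_DIR RUN_DIR...
# Symlink every *.fastq.gz from each RUN_DIR into COMBINED_DIR.
# Returns non-zero, without overwriting, if a file name is already present.
gather_fastqs() {
    combined=$1
    shift
    mkdir -p "$combined"
    for run_dir in "$@"; do
        for fastq in "$run_dir"/*.fastq.gz; do
            [ -e "$fastq" ] || continue   # this run directory has no FASTQs
            name=$(basename "$fastq")
            target="$combined/$name"
            if [ -e "$target" ] || [ -L "$target" ]; then
                echo "name collision: $name (from $run_dir)" >&2
                return 1
            fi
            # Link to the absolute path so the link works from any directory.
            ln -s "$(cd "$run_dir" && pwd)/$name" "$target"
        done
    done
}

# Example call (hypothetical directory names):
#   gather_fastqs combined_fastqs fastq_run1 fastq_run2
```

A collision means the runs are not yet uniquely named; rename the files for one run and call the helper again with a fresh combined directory.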

Step 3: Unified Analysis

Supply the combined directory to the --fastqDir parameter
Run the ScaleRna pipeline on the combined dataset
The pipeline analyzes all sequencing data as one unified dataset
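
As a hedged sketch of Step 3, the hypothetical helper below checks that the combined directory actually contains FASTQ files and then prints the command for the unified run rather than launching it. The runParams.yml path and the directory names are placeholders, and the .fastq.gz extension is an assumption; only options that appear elsewhere in this guide are used.

```shell
#!/usr/bin/env bash
# PIPELINE, PROFILE and PARAMS_FILE are placeholders; replace with your values.
PIPELINE=/PATH/TO/ScaleRna
PROFILE=PROFILE
PARAMS_FILE=runParams.yml

# build_unified_command FASTQ_DIR OUT_DIR
# Print the command for one analysis over the combined FASTQ directory.
# Returns non-zero if FASTQ_DIR holds no *.fastq.gz files.
build_unified_command() {
    fastq_dir=$1
    out_dir=$2
    if ! ls "$fastq_dir"/*.fastq.gz >/dev/null 2>&1; then
        echo "no FASTQ files found in $fastq_dir" >&2
        return 1
    fi
    printf 'nextflow run %s -profile %s -params-file %s --fastqDir %s --outDir %s\n' \
        "$PIPELINE" "$PROFILE" "$PARAMS_FILE" "$fastq_dir" "$out_dir"
}

# Example call (hypothetical directory names):
#   build_unified_command combined_fastqs combined_output
```

Printing the command first gives a chance to confirm that a single --fastqDir covers every run before committing compute time.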

Key Requirements

Generate FASTQs for each individual library/run
Ensure the ScaleRna pipeline processes all sets of FASTQs together
Maintain proper file organization and naming
Use the --fastqDir parameter for the combined dataset

Expected Outcome

Single analysis run containing data from multiple sequencing runs
Unified gene expression matrices and quality metrics
Combined cell calling and demultiplexing results

Need Help?

For more information, please contact support@scale.bio or visit our support website.