Merging Multiple Runs
ScaleRna supports combining data from multiple sequencing runs, but the approach depends on your experimental design. There are two distinct scenarios that require different workflows:
Scenario 1: Different Sub-libraries / Plates / Cells on Separate Runs
When different sub-libraries or different plates are sequenced on separate flowcells, each run contains unique biological samples. This scenario allows for post-alignment merging of results.
Key Characteristics:
- Each sequencing run contains different biological samples
- Sub-libraries or plates are sequenced independently
- Results can be combined after individual analysis
Scenario 2: Same Libraries / Plates / Cells on Multiple Runs
When the same libraries are sequenced across multiple flowcells, you have technical replicates of the same biological material. In this scenario, the data from all runs must be combined before analysis.
Key Characteristics:
- Same biological samples sequenced on different runs
- Technical replicates that need unified analysis
- Data must be combined before processing
Choosing Your Approach
- Use Scenario 1 workflow if your runs contain different biological samples
- Use Scenario 2 workflow if your runs contain the same biological samples
The following sections provide detailed workflows for each scenario.
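The choice above can be sketched as a small check on which libraries each run contains. This is a hypothetical helper, not part of ScaleRna; the run and library names are made up for illustration, and it assumes that any library shared between runs calls for the Scenario 2 workflow.

```python
from collections import defaultdict


def choose_scenario(run_libraries):
    """Return 1 or 2 given a mapping of run name -> library names on that run.

    Hypothetical helper for illustration only; not part of ScaleRna.
    """
    runs_per_library = defaultdict(set)
    for run, libraries in run_libraries.items():
        for lib in libraries:
            runs_per_library[lib].add(run)
    # A library present on more than one run means the same material was
    # sequenced repeatedly, so its reads must be combined before analysis.
    shared = {lib for lib, runs in runs_per_library.items() if len(runs) > 1}
    return 2 if shared else 1


# Different plates on each flowcell: merge results after alignment.
print(choose_scenario({"runA": {"plate1"}, "runB": {"plate2"}}))  # prints 1
# Same plate on both flowcells: combine FASTQs before analysis.
print(choose_scenario({"runA": {"plate1"}, "runB": {"plate1"}}))  # prints 2
```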
Step-by-Step Guide:
Scenario 1: Different sub-libraries / plates / cells sequenced on their own unique sequencing runs
- When the different sub-libraries are sequenced across multiple flowcells, the data can be combined after genome alignment.
Step 1: Analyze each run or plate separately:
- Create a samples.csv for each sequencing run or plate.
- Process each run/plate normally through the workflow.
Step 2: Prepare for merging:
- Create a new samples.csv listing all samples and plates to be merged.
- Add a resultDir column that points to the output directory for each run/plate.
- See merge example.
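As a rough illustration of Step 2, the merged samples.csv could be assembled from the per-run sheets with a short script. Only the resultDir column name comes from this guide; the function name, the file paths, and the assumption that rows from each per-run samples.csv can be carried over unchanged are hypothetical, so check the merge example for the columns your version expects.

```python
import csv


def build_merged_samples(run_sheets, merged_path):
    """Write one samples.csv covering all runs, adding a resultDir column.

    run_sheets maps each per-run samples.csv path to that run's output
    directory. Illustrative helper only; not part of ScaleRna.
    """
    rows, fieldnames = [], []
    for sheet, result_dir in run_sheets.items():
        with open(sheet, newline="") as handle:
            reader = csv.DictReader(handle)
            # Keep the union of columns in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # Point every sample at the output directory of its own run.
                row["resultDir"] = str(result_dir)
                rows.append(row)
    if "resultDir" not in fieldnames:
        fieldnames.append("resultDir")
    with open(merged_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call might look like build_merged_samples({"run1/samples.csv": "run1/output", "run2/samples.csv": "run2/output"}, "samples_merged.csv"), where all paths are placeholders.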
Step 3: Run the workflow in merge mode:
nextflow run /PATH/TO/ScaleRna -profile PROFILE -params-file /PATH/TO/ScaleRna/docs/examples/extended-throughput/runParams.yml --reporting --outDir merged_output.ext
Do not specify any input reads at this step (no --runFolder or --fastqDir).
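Before launching the merge, it can help to confirm that every resultDir in the merged samples.csv points at an existing directory. The check below is a minimal sketch under that assumption; the function name is hypothetical, and relative paths are resolved against the current working directory, which may differ from how the workflow itself resolves them.

```python
import csv
from pathlib import Path


def missing_result_dirs(samples_csv):
    """Return (line number, value) pairs for resultDir entries that are
    empty or do not point at an existing directory.

    Illustrative pre-flight check only; not part of ScaleRna.
    """
    problems = []
    with open(samples_csv, newline="") as handle:
        # The header is line 1, so the first data row is line 2.
        for line_no, row in enumerate(csv.DictReader(handle), start=2):
            result_dir = (row.get("resultDir") or "").strip()
            if not result_dir or not Path(result_dir).is_dir():
                problems.append((line_no, result_dir))
    return problems
```

An empty list means every listed output directory was found.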
Scenario 2: Same Libraries / Plates / Cells Sequenced on Different Sequencing Runs
When the same libraries are sequenced across multiple flowcells, the data must be combined for unified analysis.
Process Steps
Step 1: Generate Individual FASTQ Files
- Generate FASTQ files for each sequencing run separately
- Use appropriate samplesheets for each run
- Ensure proper naming conventions for each run
Step 2: Combine or Organize Files
- Option A: Use unique naming to distinguish between runs
- Option B: Concatenate FASTQ files from different runs (NOT RECOMMENDED)
- Group all FASTQ files in a single directory
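Grouping the files for Option A can be sketched as follows. This is an illustrative helper, not part of ScaleRna: it assumes FASTQ files end in .fastq.gz, that symbolic links are acceptable to the pipeline in your environment (copy the files instead if they are not), and that the per-run file names are already unique. It stops rather than overwriting if two runs share a file name.

```python
import os
from pathlib import Path


def gather_fastqs(run_dirs, combined_dir):
    """Symlink every *.fastq.gz from run_dirs into combined_dir.

    Raises FileExistsError if two runs contain the same file name, since
    one file would otherwise shadow the other. Illustrative helper only.
    """
    combined = Path(combined_dir)
    combined.mkdir(parents=True, exist_ok=True)
    linked = []
    for run_dir in run_dirs:
        for fastq in sorted(Path(run_dir).glob("*.fastq.gz")):
            target = combined / fastq.name
            if target.exists() or target.is_symlink():
                raise FileExistsError(
                    f"{fastq.name} appears in more than one run; "
                    "rename it before combining"
                )
            # Link to the absolute path so the link works from any directory.
            os.symlink(fastq.resolve(), target)
            linked.append(target)
    return linked
```

The resulting directory is what would be supplied to --fastqDir in Step 3.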
Step 3: Unified Analysis
- Supply the combined directory to the --fastqDir parameter
- Run the ScaleRna pipeline on the combined dataset
- Pipeline will analyze all sequencing data as one unified dataset
Key Requirements
- Generate FASTQs for each individual library/run
- Ensure the ScaleRna pipeline processes both sets of FASTQs together
- Maintain proper file organization and naming
- Use the --fastqDir parameter for the combined dataset
Expected Outcome
- Single analysis run containing data from multiple sequencing runs
- Unified gene expression matrices and quality metrics
- Combined cell calling and demultiplexing results
Need Help?
For more information, please contact support@scale.bio or visit our support website.