Sequencing Reads
Users can provide sequencing reads as input to the ScaleRna workflow using one of three different input methods:
- Sequencer Run Folder (--runFolder): Raw sequencing data (BCLs) from one Illumina sequencing run
- FASTQ Files (--fastqDir): Demultiplexed FASTQ files from one or multiple sequencing runs
- Ultima CRAM Files (--ultimaCramDir): Unaligned CRAM files pre-processed with Ultima trimmer
All methods will produce equivalent results; the choice depends on the sequencing setup and data availability.
Starting from a Sequencer Run Folder
The workflow can start directly with the output of an Illumina sequencing run (the runFolder containing the data in BCL files). In this case the workflow automatically generates FASTQ files internally using Illumina bcl-convert.
When using --runFolder, you do NOT need to provide a samplesheet.csv file. The workflow automatically generates the appropriate samplesheet based on either the QuantumScale RNA sub-library index (libIndex2) or the ScaleRNA v1.1 library index (libIndex) from your sample barcode table (samples.csv).
nextflow run ScaleBio/ScaleRna \
--runFolder /path/to/runFolder \
--samples samples.csv \
-params-file runParams.yml \
--outDir results
Replace /path/to/runFolder with the top-level sequencer output directory (runFolder), i.e. the directory containing the RunInfo.xml file.
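Before launching the workflow, it can help to confirm that the path passed to --runFolder really is the top-level run directory. The following is a minimal sketch of such a check; the function name is_run_folder is hypothetical and not part of the ScaleRna workflow, and the only criterion used is the one stated above (the presence of RunInfo.xml).

```python
from pathlib import Path


def is_run_folder(path):
    """Return True if `path` is a directory that directly contains RunInfo.xml.

    Hypothetical helper for illustration; not part of the ScaleRna workflow.
    """
    run_dir = Path(path)
    return run_dir.is_dir() and (run_dir / "RunInfo.xml").is_file()
```

A call such as is_run_folder("/path/to/runFolder") returning False usually means the path points one level too high or too low in the sequencer output.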
Starting with FASTQ Files
Alternatively, the workflow can start with pre-generated FASTQ files (e.g. from a core facility). See FASTQ generation for documentation and the samplesheets to use.
nextflow run ScaleBio/ScaleRna \
--fastqDir /path/to/fastq_directory/ \
--samples samples.csv \
--outDir results
Here, fastq_directory contains all FASTQ files for the analysis run:
fastq_directory/
├── QSR-P_01_L001_R1_001.fastq.gz
├── QSR-P_01_L001_R2_001.fastq.gz
├── QSR-P_01_L001_I1_001.fastq.gz
├── QSR-P_01_L001_I2_001.fastq.gz
├── QSR-P_02_L001_R1_001.fastq.gz
...
The exact naming of the FASTQ files does not matter, as long as files for all reads including index reads (R1, R2, I1, I2) are present; see FASTQ generation for details.
Tip
These files do not correspond to individual samples in the QuantumScale RNA run; sample-level demultiplexing is done by the ScaleRNA workflow using information from the sample barcode table.
Starting from Ultima CRAM Files
For Ultima sequencing data, the ScaleRNA workflow starts from unaligned CRAM files pre-processed with Ultima trimmer. The workflow then proceeds straight to alignment (STAR) and produces all library- and sample-level QC reports.
Index Read Extraction Post FASTQ Generation
If you only have pre-generated FASTQ files without separate index reads, the index information may be embedded in read headers. Please download our extraction tool here.
IMPORTANT: This tool should ONLY be used as a failsafe. The recommended procedure is to generate index reads properly by supplying BCL files or using ScaleBio pre-made samplesheets.
The index read extraction tool gathers index nucleotide sequences from the FASTQ read headers. These vary with the library structure, so the tool has a --mode option to select the library type. An explanation of the tool and some examples follow:
Example Header Format:
@A01466R:144:HCTLGDSX7:1:1110:8260:7780 1:N:0:CAGAAGCTAG+GCATCGTATG
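To illustrate where the index sequences sit in such a header, the sketch below splits the example header above into its two index sequences. It is not the makeIndexFqs.py implementation: the function name parse_index_sequences is hypothetical, and the code assumes the comment field after the first space ends with the two index sequences joined by "+", as in the example.

```python
def parse_index_sequences(header):
    """Return (index1, index2) from an Illumina-style FASTQ header line.

    Assumes the comment field (after the first space) ends with
    '<index1>+<index2>', as in the example header. Illustration only.
    """
    _, _, comment = header.strip().partition(" ")
    if not comment:
        raise ValueError(f"header has no comment field: {header!r}")
    # The index pair is the last colon-separated element of the comment.
    barcode = comment.rsplit(":", 1)[-1]
    index1, sep, index2 = barcode.partition("+")
    if not sep or not index1 or not index2:
        raise ValueError(f"no dual index found in header: {header!r}")
    return index1, index2


header = "@A01466R:144:HCTLGDSX7:1:1110:8260:7780 1:N:0:CAGAAGCTAG+GCATCGTATG"
print(parse_index_sequences(header))  # ('CAGAAGCTAG', 'GCATCGTATG')
```

Headers without a "+"-separated pair raise a ValueError here, which flags FASTQ files whose headers do not carry both index sequences.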
Extraction Tool: Use makeIndexFqs.py to extract index reads from headers:
# For QuantumScale libraries
for file in *R1*.fastq.gz; do
python makeIndexFqs.py "$file" --mode QS --outDir output/
done
# For ScaleRNA v1.1 libraries
for file in *R1*.fastq.gz; do
python makeIndexFqs.py "$file" --mode 3lvl --outDir output/
done
Need Help?
For more information, please contact support@scale.bio or visit our support website.