Sequencing Reads

Users can provide sequencing reads as input to the ScaleRna workflow using one of three different input methods:

Sequencer Run Folder (--runFolder): Raw sequencing data (BCLs) from one Illumina sequencing run
FASTQ Files (--fastqDir): Demultiplexed FASTQ files from one or multiple sequencing runs
Ultima CRAM Files (--ultimaCramDir): Unaligned CRAM files pre-processed with Ultima trimmer

All methods will produce equivalent results; the choice depends on the sequencing setup and data availability.

Starting from a Sequencer Run Folder

The workflow can start directly with the output of an Illumina sequencing run (the runFolder containing the data in BCL files). In this case the workflow automatically generates FASTQ files internally using Illumina bcl-convert.

When using --runFolder, you do NOT need to provide a samplesheet.csv file. The workflow automatically generates the appropriate samplesheet based on either the Quantum ScaleRNA sub-library index (libIndex2) or the ScaleRNA v1.1 library index (libIndex) from your sample barcode table (samples.csv).

nextflow run ScaleBio/ScaleRna \
    --runFolder /path/to/runFolder \
    --samples samples.csv \
    --params-file runParams.yml \
    --outDir results

Replace /path/to/runFolder with the top-level sequencer output directory (runFolder), i.e. the directory containing the RunInfo.xml file.
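Before launching, it can help to confirm that the path really is the top-level run folder. The following is a minimal sketch, not part of the ScaleRna workflow: check_run_folder is a hypothetical helper name, and the only thing it tests is the presence of the RunInfo.xml file mentioned above.

```shell
# Hypothetical helper (not part of the workflow): verify that a path looks
# like a top-level sequencer run folder by checking for RunInfo.xml.
check_run_folder() {
    run_folder="$1"
    if [ -f "$run_folder/RunInfo.xml" ]; then
        echo "OK: $run_folder contains RunInfo.xml"
    else
        echo "ERROR: RunInfo.xml not found in $run_folder" >&2
        return 1
    fi
}

# Example use (path is a placeholder):
#   check_run_folder /path/to/runFolder && echo "ready to launch"
```

If the check fails, the path most likely points at a subdirectory of the run folder or at a different directory altogether.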

Starting with FASTQ Files

Alternatively, the workflow can start with pre-generated FASTQ files (e.g. from a core facility); see FASTQ generation for documentation and the samplesheets to use.

nextflow run ScaleBio/ScaleRna \
    --fastqDir /path/to/fastq_directory/ \
    --samples samples.csv \
    --outDir results

Where fastq_directory contains all FASTQ files for the analysis run:

fastq_directory/
├── QSR-P_01_L001_R1_001.fastq.gz
├── QSR-P_01_L001_R2_001.fastq.gz
├── QSR-P_01_L001_I1_001.fastq.gz
├── QSR-P_01_L001_I2_001.fastq.gz
├── QSR-P_02_L001_R1_001.fastq.gz
...

The exact naming of the FASTQ files does not matter, as long as files for all reads including index reads (R1, R2, I1, I2) are present; see FASTQ generation for details.
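As a quick sanity check that all four reads are present, the sketch below looks for the R2, I1 and I2 partner of every R1 file. It is an illustration only: check_fastq_reads is a hypothetical helper, and it assumes bcl-convert-style names like those in the listing above (a `_R1_` token that can be swapped for `_R2_`, `_I1_` or `_I2_`). Files named differently would need a different pattern.

```shell
# Hypothetical helper (not part of the workflow): for every *_R1_* file in a
# directory, report whether the matching R2, I1 and I2 files exist.
# Assumes names such as QSR-P_01_L001_R1_001.fastq.gz.
check_fastq_reads() {
    fastq_dir="$1"
    missing=0
    for r1 in "$fastq_dir"/*_R1_*.fastq.gz; do
        # If the glob matched nothing, $r1 is the literal pattern.
        [ -e "$r1" ] || { echo "No R1 files found in $fastq_dir" >&2; return 1; }
        name=$(basename "$r1")
        for read in R2 I1 I2; do
            # Swap the first _R1_ token in the file name for the other read.
            mate="$fastq_dir/$(printf '%s\n' "$name" | sed "s/_R1_/_${read}_/")"
            if [ ! -e "$mate" ]; then
                echo "MISSING: $mate" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}

# Example use (path is a placeholder):
#   check_fastq_reads /path/to/fastq_directory
```

The function returns a non-zero status if any partner file is missing, so it can be chained in front of the nextflow command with `&&`.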

Tip

These files do not correspond to individual samples in the QuantumScale RNA run; sample-level demultiplexing is done by the ScaleRNA workflow using information from the sample barcode table.

Starting from Ultima CRAM Files

For Ultima sequencing data, the ScaleRNA workflow starts from unaligned CRAM files pre-processed with Ultima trimmer. The workflow then proceeds straight to alignment (STAR) and produces all library-level and sample-level QC reports.

Index Read Extraction Post FASTQ Generation

If you only have pre-generated FASTQ files without separate index reads, the index information may be embedded in read headers. Please download our extraction tool here.

IMPORTANT: This tool should ONLY be used as a failsafe. The recommended procedure is to generate index reads properly, either by supplying BCL files to the workflow or by using the ScaleBio pre-made samplesheets during FASTQ generation.

The index read extraction tool collects the index nucleotide sequences from each FASTQ read header. These sequences vary with the library structure, so the tool provides a --mode option to select the library type. An explanation of the tool and some examples follow:

Example Header Format:

@A01466R:144:HCTLGDSX7:1:1110:8260:7780 1:N:0:CAGAAGCTAG+GCATCGTATG
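To make the header layout concrete, the sketch below splits the index field of the example header into its two sequences using plain shell parameter expansion. It only illustrates where the index information sits in the header; it is not a description of how makeIndexFqs.py works internally, and the variable names are made up for this example.

```shell
# Illustrative only: split the index field of a FASTQ header into two parts.
header='@A01466R:144:HCTLGDSX7:1:1110:8260:7780 1:N:0:CAGAAGCTAG+GCATCGTATG'

# The index pair is the text after the last ':' in the header.
index_pair="${header##*:}"
index1="${index_pair%%+*}"   # sequence before the '+'
index2="${index_pair##*+}"   # sequence after the '+'

echo "index1=$index1"   # prints index1=CAGAAGCTAG
echo "index2=$index2"   # prints index2=GCATCGTATG
```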

Extraction Tool: Use makeIndexFqs.py to extract index reads from headers:

# For QuantumScale libraries
for file in *R1*.fastq.gz; do
  python makeIndexFqs.py "$file" --mode QS --outDir output/
done

# For ScaleRNA v1.1 libraries
for file in *R1*.fastq.gz; do
  python makeIndexFqs.py "$file" --mode 3lvl --outDir output/
done

Need Help?

For more information, please contact support@scale.bio or visit our support website.