Fastq Generation / Demultiplexing

The ScaleRNA analysis workflow can start directly from an Illumina sequencer runFolder (see Sequencing Reads Input). However, it is often more convenient to generate FASTQ files ahead of time and use those as input to the analysis. To generate FASTQ files, use the example samplesheets provided with the workflow along with Illumina bcl-convert (version 3.9 or later).
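As a rough illustration of the FASTQ generation step, the Python sketch below assembles a bcl-convert command line. The function name and the paths (runFolder, samplesheet.csv, fastq) are placeholders chosen for this example, not names used by the workflow; the command is only printed, not executed.

```python
import shlex


def build_bcl_convert_cmd(run_folder, sample_sheet, out_dir, lane_splitting=True):
    """Assemble a bcl-convert command line as a list of arguments.

    All paths are placeholders supplied by the caller (hypothetical here).
    """
    cmd = [
        "bcl-convert",
        "--bcl-input-directory", str(run_folder),
        "--sample-sheet", str(sample_sheet),
        "--output-directory", str(out_dir),
    ]
    if not lane_splitting:
        # Merges all lanes into one file per sample; leave unset to keep
        # the Illumina default of one set of files per lane.
        cmd += ["--no-lane-splitting", "true"]
    return cmd


cmd = build_bcl_convert_cmd("runFolder", "samplesheet.csv", "fastq")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the paths point at a real run folder and one of the provided samplesheets.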

Important

All ScaleBio libraries require these 4 fastq files:

Read 1 (R1)
Read 2 (R2)
Index 1 (I1, i7)
Index 2 (I2, i5)

QuantumScale Samplesheets

All QuantumScale samplesheets are provided with the workflow; see QuantumScale samplesheets. The available configurations include:

Kit Configuration	Description
Small/Medium Kit	QSR-P PCR primer pool configuration
Large/Extra Large Kit	QSR-1 to QSR-8 PCR primer pools
Modular Kit	QSR-1 to QSR-12 PCR primer pools
ScalePlex Integration	QuantumScale with ScalePlex integration

Each configuration is available with and without ScalePlex integration, and with both forward and reverse index2 orientations.

QuantumScale RNA FASTQs

Overview

With QuantumScale samplesheets, FASTQ files are demultiplexed based on the PCR (sub-library) barcode in the index2 (i5) read and a partial bead barcode (the first block of 8 bp) in index1 (i7). For a detailed explanation, please see our QuantumScale Demux Strategy.

We always use all 96 possible bead1 barcode sequences for index1, combined with the 4 sequences for each PCR primer pool (sub-library index) in index2. Based on the QuantumScale kit used, a subset of PCR (sub-library) barcodes will be used in the experiment:

Large and Extra Large kits: PCR primer pools QSR-1 to QSR-8
Small and Medium kits: PCR primer pool QSR-P
Modular kit: PCR primer pools QSR-1 to QSR-12

Splitting Large Fastq Files

To reduce workflow runtime, avoid combining all of the data into a single fastq file. The Illumina default of splitting per lane (i.e. not setting the bcl-convert option --no-lane-splitting) is generally sufficient for smaller runs.

An alternative is to modify samplesheet.csv to assign a different sample_ID to each i5 * i7 barcode combination, following a pattern like QSR-1_1, QSR-1_2, where all samples start with the same libIndex2 followed by _ and the "well" coordinate of the bead barcode (i7). This way the fastq files are grouped together for analysis in the ScaleRNA workflow. The workflow generates its samplesheet this way by default when it is run from a runFolder.
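The naming pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the numeric suffix (1 to 96) is an assumption based on the QSR-1_1, QSR-1_2 example; the provided samplesheets may label the bead barcode "well" differently.

```python
def split_sample_ids(lib_index2_names, n_bead_barcodes=96):
    """Build one sample_ID per (libIndex2, bead barcode) combination.

    The numeric suffix is an assumed stand-in for the bead barcode "well"
    coordinate; check the provided samplesheets for the actual labels.
    """
    return {
        name: [f"{name}_{i}" for i in range(1, n_bead_barcodes + 1)]
        for name in lib_index2_names
    }


ids = split_sample_ids(["QSR-1", "QSR-2"])
print(ids["QSR-1"][:3])
print(len(ids["QSR-2"]))
```

Because every sample_ID keeps the libIndex2 name as its prefix, all files belonging to one sub-library remain identifiable as a group.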

Required Configuration Settings

Index Reads

The index read sequences are essential for QuantumScale RNA analysis because they contain the barcode information, so index read FASTQ files are a required input to the workflow. When using bcl-convert for FASTQ generation, include this samplesheet.csv setting:

CreateFastqForIndexReads,1

OverrideCycles

The index1 (i7) read is 32 bp long to capture the entire bead barcode. However, since there are over 800,000 possible bead barcodes in an experiment, we do not want to create a separate fastq file for each. Instead, only the first 8 bp block of the bead barcode is used during FASTQ demultiplexing. This is enabled with the OverrideCycles setting in samplesheet.csv:

OverrideCycles,Y82;I8U24;I8;Y16

Breakdown of OverrideCycles:

Y82: Read 1 (RNA sequence) - 82bp
I8U24: Index 1 (i7) - First 8bp used for demux (I8), full 32bp preserved for analysis
I8: Index 2 (i5) - Full 8bp used for demux
Y16: Read 2 (RT barcode + molecular barcode) - 16bp

The first 8 cycles of Index 1 and all of Index 2 are used as an index (I) for FASTQ demultiplexing. The complete 32bp sequence of Index 1 is preserved and available for Bead Barcode extraction by the Scale Bio Seq Suite pipeline. Note that the number of bases for read1 (transcript read) and read2 (barcode read) has to match the actual sequencing length of your run. If these differ from the recommended 82 and 16, the Y82 and Y16 entries in OverrideCycles have to be adjusted.
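To make the structure of the OverrideCycles value concrete, the following Python sketch splits it into per-read segments and rebuilds it for a different read length. The helper names are hypothetical and the parser only handles the Y, I, U and N cycle codes; it is an illustration, not part of the workflow.

```python
import re


def parse_override_cycles(value):
    """Split an OverrideCycles string into (code, cycles) pairs per read."""
    reads = []
    for segment in value.split(";"):
        parts = re.findall(r"([YIUN])(\d+)", segment)
        # Reject segments containing anything other than code/number pairs.
        if not parts or "".join(code + n for code, n in parts) != segment:
            raise ValueError(f"unrecognized OverrideCycles segment: {segment!r}")
        reads.append([(code, int(n)) for code, n in parts])
    return reads


def read_lengths(value):
    """Total number of cycles in each read segment."""
    return [sum(n for _, n in read) for read in parse_override_cycles(value)]


def override_cycles(read1_len=82, read2_len=16):
    """Rebuild the setting when read1/read2 were sequenced at other lengths."""
    return f"Y{read1_len};I8U24;I8;Y{read2_len}"


print(parse_override_cycles("Y82;I8U24;I8;Y16"))
print(read_lengths("Y82;I8U24;I8;Y16"))   # read1, index1, index2, read2
print(override_cycles(read1_len=100))
```

Only the Y entries change with run length; the index segments (I8U24 and I8) stay fixed because the demultiplexing scheme depends on them.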

Expected FASTQ File Numbers

The provided samplesheets demultiplex the sequencing data into 96 sets of FASTQ files per sub-library, which the workflow processes in parallel for faster performance. Note that these individual FASTQ files correspond to a random subset of beads, not to individual wells in the QuantumScale plate or to different samples.

Kit Type / Configuration	PCR Sequences Used	Unique FASTQs	Total FASTQ Files (R1/R2/I1/I2)
QuantumScale Small/Medium	QSR-P	96	384
QuantumScale Large/Extra Large	QSR-1 to QSR-8	768	3,072
QuantumScale Modular	QSR-1 to QSR-12	1,152	4,608
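The counts in the table follow from simple arithmetic: 96 FASTQ sets per sub-library, one sub-library per PCR primer pool, and four files (R1, R2, I1, I2) per set. A short Python check, with constant and function names chosen only for illustration:

```python
FASTQ_SETS_PER_SUBLIBRARY = 96  # one set per bead1 barcode used for demux
FILES_PER_SET = 4               # R1, R2, I1, I2

# Number of PCR primer pools (sub-libraries) per kit, as listed in the table.
PCR_POOLS = {
    "QuantumScale Small/Medium": 1,         # QSR-P
    "QuantumScale Large/Extra Large": 8,    # QSR-1 to QSR-8
    "QuantumScale Modular": 12,             # QSR-1 to QSR-12
}


def expected_counts(n_pools):
    """Return (unique FASTQ sets, total FASTQ files) for n_pools sub-libraries."""
    unique = n_pools * FASTQ_SETS_PER_SUBLIBRARY
    return unique, unique * FILES_PER_SET


for kit, n_pools in PCR_POOLS.items():
    unique, total = expected_counts(n_pools)
    print(f"{kit}: {unique} unique, {total} files")
```

These are per-run totals without lane splitting; if bcl-convert splits output per lane, each lane produces its own set of files.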

Sequencing Setup

See the QuantumScale Single Cell RNA Sequencing Guidelines for further details on sequencing and FASTQ generation.

Read Configuration

Read	Purpose	Length
Read 1	RNA sequence	82bp
Read 2	RT (sample) barcode + Molecular Barcode	16bp
Index 1 (i7)	Bead Barcode	32bp
Index 2 (i5)	PCR barcode / sub-library index	8bp

ScaleRNA v1+ Kit FASTQs

RNA Kit v1.0

The v1.0 kit uses a simpler demultiplexing approach where all 96 i7 barcode sequences from the PCR plate are merged into one set of fastq files.

If an index2 (i5) read is used to demultiplex the ScaleBio RNA library from other libraries in the sequencing run, an index2 column can be added with the constant i5 sequence of the ScaleRNA library, which is TGAACCTT[AC] (8 or 10bp) in forward and AAGGTTCA[GT] in reverse orientation.
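The 8bp forward and reverse i5 sequences given above are reverse complements of each other, which can be checked with a small Python helper (the function name is illustrative). The two optional trailing bases ([AC] and [GT]) are taken from the text as-is and are not derived by this helper.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


forward_i5 = "TGAACCTT"
print(reverse_complement(forward_i5))  # AAGGTTCA
```

The same helper can be used to convert other index2 sequences when a sequencer expects the opposite orientation, although using the provided reverse-orientation samplesheets avoids manual conversion.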

RNA Kit v1.1

The v1.1 kit uses a different index read setup:

One i5 sequence (index2 read) for each final distribution (PCR) plate well
A pool of 4 i7 sequences (index1 read) for the whole plate

Some sequencers require the index2 (i.e. i5) sequence in the opposite orientation; examples for both orientations are provided.

Extended Throughput Kit

Each extended throughput plate uses the same set of i5 (index2) sequences, but a different pool of i7 (index1). This creates a separate set of fastq files for each extended throughput plate (RNA-A-AP1, RNA-A-AP2, ...). See Extended Throughput for how to analyze these jointly.

To reduce workflow runtime through parallelization (--splitFastq mode), avoid combining all of the data into a single fastq file. The Illumina default of splitting per lane (i.e. not setting the bcl-convert option --no-lane-splitting) is generally sufficient for smaller runs.

An alternative is to modify samplesheet.csv to assign a different sample_ID to each i5 barcode, following a pattern like RNA-A-AP1_A1, RNA-A-AP1_A2, where all samples start with the same library name followed by _ and a unique tag. This way the fastq files are grouped together for analysis in the ScaleRNA workflow.
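The naming convention can be illustrated with a short Python sketch that groups sample_IDs by the library name before the final underscore. This mirrors the convention described above only; how the ScaleRNA workflow groups files internally is not shown here, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_library(sample_ids):
    """Group sample_IDs by the library name preceding the last underscore."""
    groups = defaultdict(list)
    for sample_id in sample_ids:
        library, _, _tag = sample_id.rpartition("_")
        # IDs without an underscore are treated as their own library.
        groups[library or sample_id].append(sample_id)
    return dict(groups)


groups = group_by_library(["RNA-A-AP1_A1", "RNA-A-AP1_A2", "RNA-A-AP2_A1"])
print(groups)
```

Splitting on the last underscore keeps library names that contain hyphens, such as RNA-A-AP1, intact.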

ScaleRNA v1 Samplesheets

All ScaleRNA v1+ samplesheets are provided with the workflow. The available configurations include:

Kit Type / Seq Configuration	Description
ScaleRNA v1.0	Standard v1.0 kit configuration
ScaleRNA v1.0 with i5	v1.0 kit with index2 demultiplexing
ScaleRNA v1.1	Standard v1.1 kit configuration
ScaleRNA v1.1 (reverse i5)	v1.1 kit with reverse index2 orientation
Extended Throughput v1.1	All three extended throughput plates including the base kit indexes
Extended Throughput v1.1 (reverse i5)	Same as above but with reverse index2
ScalePlex v1	v1.1 kit with ScalePlex integration
Extended Throughput + ScalePlex v1	Extended throughput with ScalePlex integration

Expected FASTQ File Counts

Kit Type / Configuration	Unique FASTQs	Total FASTQ Files (R1/R2/I1/I2)
ScaleRNA 3L v1.1	96	384
ScaleRNA 3L v1.1 + 3 ET	768	3,072

Need Help?

For more information, please contact support@scale.bio or visit our support website.