Fastq Generation / Demultiplexing
The ScaleRNA analysis workflow can start directly from an Illumina sequencer runFolder (see Sequencing Reads Input). However, it is often more convenient to generate FASTQ files ahead of time and use those as input to the analysis. To generate FASTQ files, use the example samplesheets provided with the workflow along with Illumina bcl-convert (version 3.9 or later).
Important
All ScaleBio libraries require these 4 fastq files:
- Read 1 (`R1`)
- Read 2 (`R2`)
- Index 1 (`I1`, `i7`)
- Index 2 (`I2`, `i5`)
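As a quick sanity check before starting the workflow, the requirement above can be verified with a short script. This is a minimal sketch, not part of the ScaleRNA workflow: it assumes the usual bcl-convert output naming (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`, with the lane part absent when lane splitting is disabled), and the helper name `missing_reads` is hypothetical.

```python
import re
from collections import defaultdict

# The four read types every ScaleBio library needs.
REQUIRED_READS = {"R1", "R2", "I1", "I2"}

# Assumed bcl-convert style file name; adjust if your files are named differently.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+(?:_L(?P<lane>\d{3}))?_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def missing_reads(filenames):
    """Return {(sample, lane): missing read types} for incomplete FASTQ sets."""
    found = defaultdict(set)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:  # files that do not follow the assumed pattern are ignored
            found[(match["sample"], match["lane"])].add(match["read"])
    return {
        key: REQUIRED_READS - reads
        for key, reads in found.items()
        if reads != REQUIRED_READS
    }


# Hypothetical file listing with the I2 file missing.
files = [
    "QSR-1_S1_L001_R1_001.fastq.gz",
    "QSR-1_S1_L001_R2_001.fastq.gz",
    "QSR-1_S1_L001_I1_001.fastq.gz",
]
print(missing_reads(files))
```

An empty result means every sample and lane found has all four files.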
QuantumScale Samplesheets
All QuantumScale samplesheets are available in the QuantumScale samplesheets. The available configurations include:
Kit Configuration | Description |
---|---|
Small/Medium Kit | QSR-P PCR primer pool configuration |
Large/Extra Large Kit | QSR-1 to QSR-8 PCR primer pools |
Modular Kit | QSR-1 to QSR-12 PCR primer pools |
ScalePlex Integration | QuantumScale with ScalePlex integration |
Each configuration is available with and without ScalePlex integration, and with both forward and reverse index2 orientations.
QuantumScale RNA FASTQs
Overview
With QuantumScale samplesheets, FASTQ files are demultiplexed based on the PCR (sub-library) barcode in the index2 (`i5`) read and a partial bead barcode (the first block of 8 bp) in index1 (`i7`). For a detailed explanation, please see our QuantumScale Demux Strategy.
We always use all 96 possible bead1 barcode sequences for index1, combined with the 4 sequences for each PCR primer pool (sub-library index) in `index2`. Based on the QuantumScale kit used, a subset of PCR (sub-library) barcodes will be used in the experiment:
- Large and Extra Large kits: PCR primer pools QSR-1 to QSR-8
- Small and Medium kits: PCR primer pool QSR-P
- Modular kit: PCR primer pools QSR-1 to QSR-12
Splitting Large Fastq Files
To reduce workflow runtime, it is best to avoid combining all of the data in one FASTQ file. Generally, the Illumina default of splitting per lane (i.e. not setting the bcl-convert option `--no-lane-splitting`) is good for smaller runs. An alternative is to modify `samplesheet.csv` to assign a different `sample_ID` to each i5 × i7 barcode combination, following a pattern like `QSR-1_1`, `QSR-1_2`, where all samples start with the same libIndex2 followed by `_` and the "well" coordinate of the bead barcode (i7). This way the FASTQ files will be grouped together for analysis in the ScaleRNA workflow. This is also how the workflow generates the samplesheet by default when it is run from a runFolder.
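The naming convention can be illustrated with a small grouping function. This is only a sketch of the idea (everything before the last `_` is the library name); it is not the workflow's actual implementation, and `group_by_library` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_library(sample_ids):
    """Group sample IDs such as 'QSR-1_1' by the part before the last '_'."""
    groups = defaultdict(list)
    for sample_id in sample_ids:
        library, _, _tag = sample_id.rpartition("_")
        # IDs without '_' are treated as a library with a single FASTQ set.
        groups[library or sample_id].append(sample_id)
    return dict(groups)


# Hypothetical sample IDs following the pattern described above.
ids = ["QSR-1_1", "QSR-1_2", "QSR-2_1"]
print(group_by_library(ids))
```

Here `QSR-1_1` and `QSR-1_2` end up in the same `QSR-1` group, while `QSR-2_1` forms its own.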
Required Configuration Settings
Index Reads
The index read sequences are essential for QuantumScale RNA analysis because they contain critical barcode information. Hence, index read FASTQ files are required input to the workflow. When using `bcl-convert` for FASTQ generation, include this `samplesheet.csv` setting:
CreateFastqForIndexReads,1
OverrideCycles
The index1 (i7) read is 32 bp long to capture the entire bead barcode. However, since there are over 800,000 possible bead barcodes in an experiment, we do not want to create a separate FASTQ file for each. Instead, we use only the first 8 bp block of the bead barcode during FASTQ demultiplexing. This is enabled with the `OverrideCycles` setting in `samplesheet.csv`:
OverrideCycles: Y82;I8U24;I8;Y16
Breakdown of `OverrideCycles`:
- `Y82`: Read 1 (RNA sequence) - 82bp
- `I8U24`: Index 1 (i7) - First 8bp used for demux (`I8`), full 32bp preserved for analysis
- `I8`: Index 2 (i5) - Full 8bp used for demux
- `Y16`: Read 2 (RT barcode + molecular barcode) - 16bp
The first 8 cycles of Index 1 and all of Index 2 are used as an index (`I`) for FASTQ demultiplexing. The complete 32bp sequence of Index 1 is preserved and available for Bead Barcode extraction by the Scale Bio Seq Suite pipeline. Note that the number of bases for read1 (transcript read) and read2 (barcode read) has to match the actual sequencing length of your run. If these differ from the recommended 82 and 16, the `Y82` and `Y16` entries in `OverrideCycles` have to be adjusted.
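To make the structure of the setting concrete, the sketch below parses an `OverrideCycles` value into its per-read segments and rebuilds it for different read lengths. It is an illustration only: the function names are hypothetical, and the fixed `I8U24;I8` index part is taken from the recommended value above.

```python
import re


def parse_override_cycles(value):
    """Split e.g. 'Y82;I8U24;I8;Y16' into per-read lists of (type, cycles)."""
    reads = []
    for element in value.split(";"):
        segments = re.findall(r"([YIUN])(\d+)", element)
        # Reject anything that is not a plain sequence of <letter><count> pairs.
        if "".join(kind + count for kind, count in segments) != element:
            raise ValueError(f"Unrecognised OverrideCycles element: {element!r}")
        reads.append([(kind, int(count)) for kind, count in segments])
    return reads


def cycles_per_read(value):
    """Total number of sequencing cycles covered by each read."""
    return [sum(count for _, count in read) for read in parse_override_cycles(value)]


def build_override_cycles(read1_len=82, read2_len=16):
    """Adjust only the Y entries; the index layout stays as recommended."""
    return f"Y{read1_len};I8U24;I8;Y{read2_len}"


print(parse_override_cycles("Y82;I8U24;I8;Y16"))
print(cycles_per_read("Y82;I8U24;I8;Y16"))
print(build_override_cycles(read1_len=100))
```

The per-read totals (82, 32, 8, 16) should match the cycles actually sequenced in your run.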
Expected FASTQ File Numbers
The provided samplesheets will demultiplex the sequencing data into 96 sets of FASTQ files per sub-library, which the workflow will process in parallel for faster performance; note that these individual FASTQ files correspond to a random subset of beads and not to individual wells in the QuantumScale plate or to different samples.
Kit Type / Configuration | PCR Sequences Used | Unique FASTQs | Total FASTQ Files (R1/R2/I1/I2) |
---|---|---|---|
QuantumScale Small/Medium | QSR-P | 96 | 384 |
QuantumScale Large/Extra Large | QSR-1 to QSR-8 | 768 | 3,072 |
QuantumScale Modular | QSR-1 to QSR-12 | 1,152 | 4,608 |
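The numbers in the table follow from 96 FASTQ sets per PCR primer pool and four files (R1/R2/I1/I2) per set. A minimal sketch of that arithmetic, using hypothetical names and assuming one set per sample (i.e. no additional split per lane):

```python
SETS_PER_POOL = 96  # one set per first 8 bp bead barcode block in index1
FILES_PER_SET = 4   # R1, R2, I1, I2

# Number of PCR primer pools (sub-libraries) per kit type, from the table above.
POOLS_PER_KIT = {
    "QuantumScale Small/Medium": 1,        # QSR-P
    "QuantumScale Large/Extra Large": 8,   # QSR-1 to QSR-8
    "QuantumScale Modular": 12,            # QSR-1 to QSR-12
}


def expected_fastq_counts(n_pools):
    """Return (unique FASTQ sets, total FASTQ files) for n_pools sub-libraries."""
    unique = n_pools * SETS_PER_POOL
    return unique, unique * FILES_PER_SET


for kit, n_pools in POOLS_PER_KIT.items():
    unique, total = expected_fastq_counts(n_pools)
    print(f"{kit}: {unique} unique, {total} files")
```

If lane splitting is left enabled, each lane will produce its own copy of these files.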
Sequencing Setup
See the QuantumScale Single Cell RNA Sequencing Guidelines for further details on sequencing and FASTQ generation.
Read Configuration
Read | Purpose | Length |
---|---|---|
Read 1 | RNA sequence | 82bp |
Read 2 | RT (sample) barcode + Molecular Barcode | 16bp |
Index 1 (i7) | Bead Barcode | 32bp |
Index 2 (i5) | PCR barcode / sub-library index | 8bp |
ScaleRNA v1+ Kit FASTQs
RNA Kit v1.0
The v1.0 kit uses a simpler demultiplexing approach where all 96 i7 barcode sequences from the PCR plate are merged into one set of fastq files.
If an index2 (i5) read is used to demultiplex the ScaleBio RNA library with other libraries in the sequencing run, an `index2` column can be added with the constant i5 sequence of the ScaleRNA library, which is `TGAACCTT[AC]` (8 or 10bp) in forward and `AAGGTTCA[GT]` in reverse orientation.
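The 8 bp core of the two orientations are reverse complements of each other, which can be checked with a few lines of code. This sketch covers only the 8 bp core; the optional trailing bases shown in brackets are taken from the text above and are not derived here. `reverse_complement` is a hypothetical helper.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


forward_i5 = "TGAACCTT"
print(reverse_complement(forward_i5))
```

The same helper can be used to convert other `index2` sequences when a sequencer requires the opposite orientation, although the provided samplesheets already cover both.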
RNA Kit v1.1
The v1.1 kit uses a different index read setup:
- One i5 sequence (`index2` read) for each final distribution (PCR) plate well
- A pool of 4 i7 sequences (`index1` read) for the whole plate
Some sequencers require the `index2` (i.e. `i5`) sequence in the opposite orientation; examples for both orientations are provided.
Extended Throughput Kit
Each extended throughput plate uses the same set of i5 (`index2`) sequences, but a different pool of i7 (`index1`). This creates a separate set of FASTQ files for each extended throughput plate (`RNA-A-AP1`, `RNA-A-AP2`, ...). See Extended Throughput for how to analyze these jointly.
To reduce workflow runtime through parallelization (`--splitFastq` mode), it is best to avoid combining all of the data in one FASTQ file. Generally, the Illumina default of splitting per lane (i.e. not setting the bcl-convert option `--no-lane-splitting`) is good for smaller runs.
An alternative is to modify `samplesheet.csv` to assign a different `sample_ID` to each i5 barcode, following a pattern like `RNA-A-AP1_A1`, `RNA-A-AP1_A2`, where all samples start with the same library name followed by `_` and a unique tag. This way the FASTQ files will be grouped together for analysis in the ScaleRNA workflow.
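A list of such IDs can be generated rather than typed by hand. The sketch below assumes that the unique tag is a 96-well plate coordinate (`A1` to `H12`), as the example IDs suggest; this is an assumption, and the pairing of each ID with its i5 sequence still has to come from the provided samplesheets. `per_well_sample_ids` is a hypothetical helper.

```python
import string


def per_well_sample_ids(library, rows=8, cols=12):
    """Build IDs like 'RNA-A-AP1_A1' ... 'RNA-A-AP1_H12' for one library."""
    return [
        f"{library}_{row}{col}"
        for row in string.ascii_uppercase[:rows]
        for col in range(1, cols + 1)
    ]


ids = per_well_sample_ids("RNA-A-AP1")
print(len(ids), ids[:2], ids[-1])
```

Any other tag scheme works equally well, as long as the part before the final `_` is the shared library name.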
ScaleRNA v1 Samplesheets
Samplesheets for all ScaleRNA v1+ kits are provided with the workflow. The available configurations include:
Kit Type / Seq Configuration | Description |
---|---|
ScaleRNA v1.0 | Standard v1.0 kit configuration |
ScaleRNA v1.0 with i5 | v1.0 kit with index2 demultiplexing |
ScaleRNA v1.1 | Standard v1.1 kit configuration |
ScaleRNA v1.1 (reverse i5) | v1.1 kit with reverse index2 orientation |
Extended Throughput v1.1 | All three extended throughput plates including the base kit indexes |
Extended Throughput v1.1 (reverse i5) | Same as above but with reverse index2 |
ScalePlex v1 | v1.1 kit with ScalePlex integration |
Extended Throughput + ScalePlex v1 | Extended throughput with ScalePlex integration |
Expected FASTQ File Counts
Kit Type / Configuration | Unique FASTQs | Total FASTQ Files (R1/R2/I1/I2) |
---|---|---|
ScaleRNA 3L v1.1 | 96 | 384 |
ScaleRNA 3L v1.1 + 3 ET | 768 | 3,072 |
Need Help?
For more information, please contact support@scale.bio or visit our support website.