ScaleRna FAQs
Q1. How do I produce fastq files using the ScaleRna workflow when starting from an Illumina bcl run folder?
By default, ScaleRna v2.0+ excludes fastq files from the output. To export fastq files, set the parameter fastqOut: true in your runParams.yml file.
Q2. I've generated fastqs, but where are my sample specific fastqs?
The workflow expects library-level fastqs and performs sample-level demultiplexing internally. For details, see Sequencing Reads & Fastq Generation. To optimize compute and storage, the workflow produces sample-level demultiplexed unaligned BAM files, which can be published to the output directory using the --bcParserBamOut parameter. These unaligned BAM files can be used as input to STARsolo; other alignment software has not been tested and is therefore not supported.
Q3. How do I generate index reads if I only have fastq files?
The index reads are typically exported within each read header of either R1 or R2. We recommend, if possible, re-running fastq generation from bcl files to maintain quality scores. If that's not possible, we provide a custom script to extract index reads from headers.
Technical Details:
Index Reads for ScaleBio QuantumScale RNA libraries:
For ScaleBio QuantumScale RNA libraries, the RT and bead cell barcodes are included in the index 1 read, while the PCR cell barcode is in the index 2 read. Index read fastq files are therefore a required input (see the pre-made samplesheets). When using bcl-convert, include this setting:
CreateFastqForIndexReads,1
Index Read Extraction from Fastq Headers:
If re-running fastq generation from bcl files is not possible, a custom script, makeIndexFqs.py, extracts the index reads from the header of Read 1 and exports them in fastq format.
For more information, see "Index Read Extraction Post FASTQ Generation" in Sequencing Reads.
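To illustrate the idea of recovering index reads from read headers, here is a minimal Python sketch. It is not the makeIndexFqs.py script and makes several assumptions: headers follow the common Illumina layout ending in ":<index1>+<index2>", the function names are hypothetical, and a placeholder quality character is written because the original index quality scores cannot be recovered from headers (which is why re-running fastq generation from bcl files is preferred).

```python
# Illustrative sketch only; NOT the ScaleBio makeIndexFqs.py script.
# Assumption: headers look like "@inst:run:fc:lane:tile:x:y 1:N:0:ACGTAC+TTGACA".

PLACEHOLDER_QUAL = "F"  # assumption: true index qualities are lost, so a constant is used


def parse_index_from_header(header):
    """Return (index1, index2) taken from the last field of the header comment."""
    parts = header.strip().split()
    if len(parts) < 2:
        raise ValueError("header has no comment field: " + header.strip())
    barcode = parts[1].rsplit(":", 1)[-1]
    index1, _, index2 = barcode.partition("+")
    if not index1 or not set(index1 + index2) <= set("ACGTN"):
        raise ValueError("no index sequence in header: " + header.strip())
    return index1, index2


def make_index_records(fastq_lines):
    """Yield (index1_record, index2_record) fastq strings, one pair per input read.

    fastq_lines is any iterable of lines (for example an open text file).
    An incomplete trailing record is silently dropped.
    """
    for header, _seq, _plus, _qual in zip(*[iter(fastq_lines)] * 4):
        index1, index2 = parse_index_from_header(header)
        name = header.strip()
        yield tuple(
            "{}\n{}\n+\n{}\n".format(name, seq, PLACEHOLDER_QUAL * len(seq))
            for seq in (index1, index2)
        )
```

A caller would write the first element of each pair to an index 1 fastq file and the second to an index 2 fastq file, keeping the same read order as the Read 1 input.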
Q4. What are the index reads and why are they required?
For all ScaleBio libraries, the i5 and i7 index reads contain valuable information necessary for data analysis. The index read sequences:
1. Identify different adapter primers for plate-level identification (i7)
2. Contain the entire cell barcode or a partial cell barcode (depending on the kit)
3. Are used directly within the pipeline and are ultimately exported as one of the cell barcodes
Quick Answers:
- Do the index reads contain sample information? No.
- Why do index reads contain no sample-level information? Sample barcodes are defined by the RT barcode and are located within R1/R2, depending on kit configuration. Standard Illumina demultiplexing does not support Read1/Read2-level barcode filtering; this is done within the ScaleBio pipeline.
Q5. What happens when my index reads are longer than the required length?
The pipeline supports index reads longer than expected; however, users may have trouble demultiplexing with the ScaleBio pre-made samplesheets. This is because the index length must match what is supplied in the samplesheet and corresponds to the OverrideCycles setting. The index read length in the RunInfo.xml file must match that of the samplesheet, including the nucleotide sequence entries in both the index and index2 columns.
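The length-matching requirement above can be checked before demultiplexing. The following Python sketch parses a bcl-convert OverrideCycles string and compares the number of index cycles with the lengths of the index and index2 sequences. The function names and the example values are hypothetical; the real cycle settings come from the samplesheet in use. The sketch assumes each index segment contains at least one "I" cycle.

```python
import re


def index_cycle_lengths(override_cycles):
    """Return the number of index ('I') cycles for each index segment.

    Segments are separated by ';' and built from letter+count elements
    (Y = read, I = index, U = UMI, N = ignored), e.g. "Y50;I10N2;I8;Y50".
    """
    lengths = []
    for segment in override_cycles.split(";"):
        elements = re.findall(r"([YIUN])(\d+)", segment)
        if any(letter == "I" for letter, _ in elements):
            lengths.append(sum(int(n) for letter, n in elements if letter == "I"))
    return lengths


def index_lengths_match(override_cycles, index, index2):
    """True when the samplesheet index sequences match the OverrideCycles index lengths."""
    return index_cycle_lengths(override_cycles) == [len(index), len(index2)]


# Hypothetical example values, for illustration only.
print(index_cycle_lengths("Y50;I10N2;I8;Y50"))
print(index_lengths_match("Y50;I10;I10;Y50", "ACGTACGTAC", "TTGACATTGA"))
```

A mismatch reported by such a check points to the situation described here: either the samplesheet sequences or the cycle settings need to be adjusted so that the lengths agree.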
Solution 1: Make a custom samplesheet.csv in which the common index nucleotide sequence matches the length of the cycle parameters. The demux will then run properly, and the pipeline will ignore any sequence beyond what it expects for the library being analyzed.
Solution 2: Export all reads from the run folder, then trim the index reads to the proper length. The pipeline runs its own internal demux, so any reads that fail Illumina demux will also fail ScaleBio demux; there is therefore no concern about using all reads from a sequencer as input to the ScaleRna workflow.
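For Solution 2, trimming index reads can be done with any fastq trimming tool. As a rough illustration, the Python sketch below truncates every read in a fastq stream to a fixed length. It assumes the extra cycles are at the end of each index read, so the first bases are kept; the function names and file handling are hypothetical and not part of the ScaleRna workflow.

```python
import gzip


def trim_fastq_records(lines, length):
    """Yield fastq lines with sequence and quality truncated to `length` bases.

    Assumption: the bases the pipeline expects are the first `length` cycles.
    An incomplete trailing record is silently dropped.
    """
    for header, seq, plus, qual in zip(*[iter(lines)] * 4):
        yield header
        yield seq.rstrip("\r\n")[:length] + "\n"
        yield plus
        yield qual.rstrip("\r\n")[:length] + "\n"


def trim_fastq_file(in_path, out_path, length):
    """Trim a gzipped fastq file and write a gzipped result."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        fout.writelines(trim_fastq_records(fin, length))
```

Reads already shorter than the target length pass through unchanged, and read order is preserved so the trimmed index files stay in sync with the R1 and R2 files.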
Q6. How do I merge sequencing data from multiple sequencing runs of a single ScaleBio single-cell library?
See Merging and Multi-Run Analysis for instructions on merging data from multiple sequencing runs.
Q7. How does the ScaleRna pipeline assign sample barcodes to my samples?
See Sample Barcode Table for details on barcode parsing, error correction, and sample demultiplexing.
Q8. What are the cell thresholding options and relevant parameters in the ScaleRna workflow?
See Cell Calling for a full explanation of cell thresholding methods and parameters.
Q9. Can I analyze multiple species in one ScaleBio pipeline run?
See Reference Genomes for multi-species analysis guidance.
Q10. What does including multi-gene reads and Prop/Unique mean for my gene count matrices?
See Alignment Output for an explanation of multi-gene reads, Prop/Unique counting modes, and their impact on gene count matrices.
Need Help?
For more information, please contact support@scale.bio or visit our support website.