There are two common modes of DNA sequencing: whole genome sequencing and exome sequencing. Exome sequencing covers only the exonic regions, which typically comprise 1-2% of the whole genome; whole genome sequencing covers the entire genome. Reads coming from the sequencer are aligned to the reference genome, and the resulting BAM file is imported into Strand NGS. For storage size computation, all data upstream of this BAM file can be treated as transient, so only storage for BAM files and subsequent analyses needs to be planned.
The size of a BAM file depends on coverage (the average number of times each base is read) and read length. A few examples are provided in Table 1 below. Please note that sizes in Strand NGS have an overhead. This arises from storage of extra information, which enables fast access and visualization later.
Sample Type | Coverage | No. of Reads | Read Length (bp) | BAM File Size | Strand NGS Size |
Whole Genome | 37.7x | 975,000,000 | 115 | 82 GB | 104 GB |
Whole Genome | 38.4x | 3,200,000,000 | 36 | 138 GB | 193 GB |
Exome | 40x | 110,000,000 | 75 | 5.7 GB | 7.1 GB |
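The relationship between read count, read length and coverage in the table above can be sketched as follows. This is a rough illustration only: the genome size of 3.0 billion bases and the function names are assumptions made for this example, and the published coverage figures may have been computed against a slightly different genome or target size.

```python
# Assumption: a human genome of roughly 3.0 billion bases (not stated in the text).
HUMAN_GENOME_BASES = 3.0e9


def estimate_coverage(num_reads, read_length, target_bases=HUMAN_GENOME_BASES):
    """Average number of times each base is read: total bases sequenced / target size."""
    return num_reads * read_length / target_bases


def overhead_ratio(bam_gb, strand_ngs_gb):
    """How much larger the Strand NGS representation is than the source BAM file."""
    return strand_ngs_gb / bam_gb


# Second whole genome row: 3.2 billion reads of 36 bp
print(round(estimate_coverage(3_200_000_000, 36), 1))  # 38.4

# Strand NGS size relative to BAM size for each row of the table
for bam_gb, ngs_gb in [(82, 104), (138, 193), (5.7, 7.1)]:
    print(round(overhead_ratio(bam_gb, ngs_gb), 2))
```

For the rows shown, the Strand NGS size works out to roughly 1.25 to 1.4 times the BAM file size.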
Allowing for some extra analysis results storage and assuming whole genome samples are done at read lengths of 75 or above, the size of each whole genome sample can be rounded off to about 150 GB and the size of each exome sample to about 8 GB. Space for backups also needs to be taken into consideration. With these assumptions, the total storage requirement for a few scenarios is illustrated in Table 2 below.
Whole Genome Samples | Exome Samples | Space | Space including Backup |
0 | 200 | 1.6 TB | 3.2 TB |
0 | 1000 | 8.0 TB | 16 TB |
100 | 0 | 15 TB | 30 TB |
1000 | 0 | 150 TB | 300 TB |
100 | 1000 | 23 TB | 46 TB |
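The arithmetic behind these scenarios is simple enough to reproduce for your own sample counts. The sketch below assumes the rounded per-sample sizes given above (150 GB per whole genome sample, 8 GB per exome sample), a backup that doubles the space, and decimal units (1 TB = 1000 GB); the function and constant names are illustrative only.

```python
WG_SAMPLE_GB = 150     # rounded size of one whole genome sample
EXOME_SAMPLE_GB = 8    # rounded size of one exome sample
BACKUP_FACTOR = 2      # assumption: one full backup copy doubles the space


def storage_tb(wg_samples, exome_samples, with_backup=False):
    """Estimated storage in TB (1 TB = 1000 GB) for a mix of samples."""
    total_gb = wg_samples * WG_SAMPLE_GB + exome_samples * EXOME_SAMPLE_GB
    if with_backup:
        total_gb *= BACKUP_FACTOR
    return total_gb / 1000


# Last scenario in the table: 100 whole genome and 1000 exome samples
print(storage_tb(100, 1000))                    # 23.0
print(storage_tb(100, 1000, with_backup=True))  # 46.0
```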
2 TB hard drives are available off the shelf; two of these should more than suffice for 250 exome sequencing samples. Strand NGS can be configured to add storage incrementally, so you can start with two 2 TB hard disks and add further disks on demand. If you need to plan for more than 10 TB of storage, we recommend a network storage solution rather than adding disks to a single machine.
Computation speeds for various tasks in Strand NGS v2.9 are given below. These were measured on a 16-core machine, but the same analyses can also be run on a standard laptop with 4 GB of RAM at proportionately reduced speeds. A minimum of 8 GB of RAM is recommended for alignment tasks on large genomes.
Machine details | 16 cores @ 2.7 GHz, 32 GB RAM |
Sample details | DNA reads of a human (NA12878) sample; size of the fastq.gz files: 92 GB; number of reads: 1.16 billion paired-end reads; read length: 150 bp |
Task | Time Taken |
Alignment of DNA reads | 6 hr 26 min (~11.5 million reads/hour/core) |
Import of the aligned reads (includes computation of QC statistics) | 5 hr 59 min |
Local realignment (includes recomputation of QC statistics) | 9 hr 31 min |
Base quality recalibration (includes recomputation of QC statistics) | 8 hr 54 min |
Read Filters (includes recomputation of QC statistics) | 10 hr 41 min |
SNP detection (includes annotating with dbSNP 138) | 5 hr 47 min |
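To budget machine time for a sample of this size, the individual task times above can be added up. The sketch below assumes that every listed task is run once, one after another, on the same 16-core machine; an actual workflow may skip some steps, so treat the total as an upper-end estimate. The names used are illustrative only.

```python
from datetime import timedelta

# Task times copied from the table above (one whole genome sample)
TASK_TIMES = {
    "Alignment of DNA reads": timedelta(hours=6, minutes=26),
    "Import of the aligned reads": timedelta(hours=5, minutes=59),
    "Local realignment": timedelta(hours=9, minutes=31),
    "Base quality recalibration": timedelta(hours=8, minutes=54),
    "Read Filters": timedelta(hours=10, minutes=41),
    "SNP detection": timedelta(hours=5, minutes=47),
}


def total_time(tasks):
    """Sum of all task durations, assuming the tasks run sequentially."""
    return sum(tasks.values(), timedelta())


total = total_time(TASK_TIMES)
hours, remainder = divmod(int(total.total_seconds()), 3600)
print(f"{hours} hr {remainder // 60} min")  # 47 hr 18 min
```

Under these assumptions, a full run from alignment to SNP detection takes roughly two days per whole genome sample on the benchmark machine.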
If you require more information please contact our Support Team.