Version: 1.2
Author: Song Cao
Contact: scao@wustl.edu
Original Release: April 25, 2016
Last Updated: 2026 (LSF chained job dependency support)
If you use VirusScan, please cite:
Song Cao, Michael C. Wendl, Matthew A. Wyczalkowski, Kristine Wylie, Kai Ye, Reyka Jayasinghe, Mingchao Xie, Song Wu, Beifang Niu, Robert Grubb III, Kimberly J. Johnson, Hiram Gay, Ken Chen, Janet S. Rader, John F. DiPersio, Feng Chen, and Li Ding.
Divergent viral presentation among human tumors and adjacent normal tissues.
Scientific Reports, 2016, 6:28294.
VirusScan is a fully automated and modular pipeline for detecting known viral sequences from NGS data.
It is designed to run on LSF (IBM Spectrum LSF / Platform LSF) clusters using Docker.
All required tools are bundled inside the Docker image used by the pipeline.
- End-to-end viral detection from BAM files
- RepeatMasker-based low-complexity filtering
- Host genome BLAST filtering
- Viral BLASTN detection and assignment
- Uses NCBI taxonomy dump files directly
- Per-sample chained LSF job dependencies
- Supports full runs and partial restarts
Pipeline steps for each sample are submitted sequentially using LSF job dependencies:
-w "done(<previous_job>)"
Each sample runs independently, but steps within a sample never start before the previous step finishes.
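The chaining described above can be illustrated with a dry-run sketch that only prints the `bsub` commands for one sample instead of submitting them. The job names (`<sample>_step<N>`) and script names are hypothetical and are not necessarily those that VirusScan generates; queue, resource, and Docker options are omitted for brevity.

```bash
# Print (do not submit) a chain of bsub commands for one sample.
# Job and script names below are illustrative assumptions only.
chain_jobs() {
    local sample="$1"
    local first="$2"
    local last="$3"
    local prev=""
    local step="$first"
    local job dep
    while [ "$step" -le "$last" ]; do
        job="${sample}_step${step}"
        if [ -n "$prev" ]; then
            # Every step after the first waits for the previous job to finish.
            dep="-w \"done(${prev})\" "
        else
            dep=""
        fi
        echo "bsub -J ${job} ${dep}bash ${job}.sh"
        prev="$job"
        step=$((step + 1))
    done
}

chain_jobs sample1 1 3
```

Because each sample gets its own chain, calling the function once per sample directory would yield independent chains, which matches the behaviour described above.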
All software dependencies are included in the Docker image.
No local installation is required for:
- RepeatMasker
- BLAST / BLAST+
- Perl modules (BioPerl, etc.)
- VirusScan helper scripts
Users must provide or configure paths to:
- Viral reference databases
- NCBI nt database (if used)
- NCBI taxonomy dump files (nodes.dmp, names.dmp, merged.dmp)
- Host reference genome (if applicable)
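Before starting a run, it may help to confirm that the taxonomy directory actually contains the three dump files listed above. The following is a minimal sketch; the function name `check_taxdump` is hypothetical and not part of VirusScan.

```bash
# Check that a directory holds non-empty copies of the three NCBI
# taxonomy dump files. Hypothetical helper, not part of VirusScan.
check_taxdump() {
    local taxdir="$1"
    local missing=0
    local f
    for f in nodes.dmp names.dmp merged.dmp; do
        if [ ! -s "$taxdir/$f" ]; then
            echo "missing or empty: $taxdir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_taxdump /path/to/taxdump` returns a non-zero status and lists the affected files on stderr if any of them is absent or empty.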
git clone https://github.com/ding-lab/VirusScan.git
cd VirusScan

run_folder/
├── sample1/
│ └── sample1.bam
├── sample2/
│ └── sample2.bam
Important:
The BAM file name (without the .bam extension) must match the sample directory name, for example sample1/sample1.bam.
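The naming rule can be checked before submitting anything. The sketch below assumes the layout shown above (one subdirectory per sample, each holding `<sample>.bam`); the function name `check_layout` is hypothetical and not part of VirusScan.

```bash
# Report sample directories that lack a BAM file named <sample>/<sample>.bam.
# Hypothetical helper, not part of VirusScan.
check_layout() {
    local run_folder="$1"
    local status=0
    local dir sample
    for dir in "$run_folder"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        sample=$(basename "$dir")
        if [ ! -f "${dir}${sample}.bam" ]; then
            echo "missing: ${sample}/${sample}.bam"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_layout <run_folder>` prints one line per offending sample and returns a non-zero status if any sample fails the check.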
perl VirusScan.pl <run_folder> <step_number>

- run_folder: directory containing sample subdirectories
- step_number: pipeline step to execute
perl VirusScan.pl <run_folder> 0

| Step | Description |
|---|---|
| 0 | Run all steps (1–14) |
| 1 | Extract unmapped non-human reads |
| 2 | Split files for RepeatMasker |
| 3 | Submit RepeatMasker job array |
| 4 | Sequence quality control |
| 5 | Split files for human genome BLAST |
| 6 | Submit human genome BLAST |
| 7 | Parse human genome BLAST results |
| 8 | Pool and split files for BLASTN |
| 9 | Submit BLASTN job array |
| 10 | Parse BLASTN results |
| 11 | Generate BLASTN summary |
| 12 | Assignment report per sample |
| 13 | Assignment summary per sample |
| 14 | Generate final run-level report |
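
To restart a run from an intermediate step and continue through step 14, use one of the following step numbers:

| Step | Runs |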
|---|---|
| 22 | Steps 2–14 |
| 23 | Steps 3–14 |
| 24 | Steps 4–14 |
| 25 | Steps 5–14 |
| 26 | Steps 6–14 |
| 27 | Steps 7–14 |
| 28 | Steps 8–14 |
| 29 | Steps 9–14 |
| 30 | Steps 10–14 |
| 31 | Steps 11–14 |
| 32 | Steps 12–14 |
| 33 | Steps 13–14 |
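The two tables imply a simple numbering rule: restart codes 22 to 33 correspond to "start at step (code minus 20) and run through step 14". The sketch below encodes that rule, assuming that a value of 1 to 14 runs only that single step; the function name `steps_for` is hypothetical and not part of VirusScan.

```bash
# Translate a step number into the "first last" range of steps it runs,
# following the tables above. Hypothetical helper, not part of VirusScan.
steps_for() {
    local code="$1"
    # Reject empty or non-numeric input.
    case "$code" in
        ''|*[!0-9]*) echo "invalid step: $code" >&2; return 1 ;;
    esac
    if [ "$code" -eq 0 ]; then
        echo "1 14"
    elif [ "$code" -ge 1 ] && [ "$code" -le 14 ]; then
        # Assumption: a plain step number runs only that step.
        echo "$code $code"
    elif [ "$code" -ge 22 ] && [ "$code" -le 33 ]; then
        echo "$((code - 20)) 14"
    else
        echo "invalid step: $code" >&2
        return 1
    fi
}
```

For instance, `steps_for 28` prints `8 14`, matching the table row for step 28.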
VirusScan submits all jobs using Docker-enabled LSF queues.
Example submission pattern:
bsub -q <queue> -n <threads> \
-R "select[mem>30000] rusage[mem=30000]" -M 30000000 \
-a 'docker(<docker_image>)' \
-o <out.log> -e <err.log> \
bash <job_script.sh>

Song Cao
Email: scao@wustl.edu