Virome Assembly and Bin Evaluation

Overview

Teaching: min
Exercises: 30 min
Questions
  • How do we bin virus genomes from the assembly file?
  • How do we evaluate the binning results?

Objectives

3. Bin virus genomes

Tools: MaxBin2, CheckV

In this section, we start from a pre-assembled contig file belonging to the same project. This saves time and lets us focus on binning and quality evaluation of the viromics data.

Download the assembled file with either wget or curl:

wget -nc "https://zenodo.org/records/10650983/files/illumina_sample_01_megahit.fa.gz?download=1" -O workshop_day5/illumina_sample_01_megahit.fa.gz

curl -L "https://zenodo.org/records/10650983/files/illumina_sample_01_megahit.fa.gz?download=1" -o workshop_day5/illumina_sample_01_megahit.fa.gz
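After a download it can help to confirm that the archive is intact and contains contigs. The following is a minimal sketch that runs on a tiny stand-in file (example_contigs.fa.gz is a hypothetical name, not part of the workshop data); the same two checks could be pointed at the downloaded assembly.

```shell
# Hypothetical two-contig FASTA standing in for the downloaded assembly.
printf '>contig_1\nACGTACGT\n>contig_2\nGGCC\n' | gzip > example_contigs.fa.gz

# Test the gzip stream for corruption (a truncated download fails here).
gzip -t example_contigs.fa.gz && echo "archive OK"

# Count FASTA header lines, i.e. the number of contigs.
n_contigs=$(gzip -dc example_contigs.fa.gz | grep -c '^>')
echo "contigs: $n_contigs"
```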


Create abundance_counts file

First, we need an abundance_counts file, which MaxBin2 uses later in the binning step (Step 1).

# Index the contig file (conda install bioconda::bwa)
bwa index PRJEB47625/illumina_sample_01_megahit.fa.gz

# Align reads to contigs
bwa mem PRJEB47625/illumina_sample_01_megahit.fa.gz PRJEB47625/ERR6797441_1.fastq.gz PRJEB47625/ERR6797441_2.fastq.gz > aligned_reads.sam

# Convert SAM to BAM (conda install bioconda::samtools)
samtools view -bS aligned_reads.sam > aligned_reads.bam

# Sort BAM file
samtools sort aligned_reads.bam -o sorted_reads.bam

# Index BAM file
samtools index sorted_reads.bam

# Count reads mapped to each contig (columns: contig name, mapped reads);
# the final "*" line of idxstats holds unmapped reads and is dropped
samtools idxstats sorted_reads.bam | cut -f1,3 | grep -v '^\*' > abundance_counts.txt
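As far as the MaxBin2 documentation describes it, the abundance file is a two-column, tab-separated table of contig name and abundance. Raw read counts favor long contigs, so one option is to normalize by contig length. The sketch below assumes a table in the layout of samtools idxstats (contig, length, mapped reads, unmapped reads); the file names and numbers are hypothetical.

```shell
# Hypothetical idxstats-style table: contig, length (bp), mapped, unmapped.
# A real table would come from: samtools idxstats sorted_reads.bam
{
  printf 'contig_1\t2000\t400\t0\n'
  printf 'contig_2\t500\t50\t0\n'
  printf '*\t0\t0\t12\n'
} > example_idxstats.tsv

# Skip the "*" (unmapped) row and report mapped reads per kilobase of contig.
awk -F'\t' '$1 != "*" && $2 > 0 { printf "%s\t%.2f\n", $1, $3 * 1000 / $2 }' \
  example_idxstats.tsv > example_abundance.txt

cat example_abundance.txt
```

Reads per kilobase is only one possible abundance measure; mean depth per contig would serve the same purpose.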

Step 1: Bin virus genomes from the assembly file using MaxBin2

MaxBin2 is a tool designed to bin metagenomic contigs into individual genomes, including viral genomes. To install it, follow the instructions on the MaxBin2 website or run conda install bioconda::maxbin2.

Usage:

cd workshop_day5
wget -c https://zenodo.org/records/10650983/files/illumina_sample_01_megahit.fa.gz -O PRJEB47625/illumina_sample_01_megahit.fa.gz

# Run MaxBin2 for binning
run_MaxBin.pl -contig PRJEB47625/illumina_sample_01_megahit.fa.gz -abund abundance_counts.txt -out bins_directory

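MaxBin2 will generate bins of contigs, each representing a putative genome, including viral genomes. The -out value is used as a file name prefix, so each bin is written as its own FASTA file (for example bins_directory.001.fasta).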
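Before evaluating the bins, a quick size summary shows how many contigs and how many bases each bin holds. This is a minimal sketch on two small stand-in files; the directory example_bins_demo and the example.NNN.fasta names are hypothetical and only mimic the prefix.NNN.fasta naming that MaxBin2 is assumed to use.

```shell
# Hypothetical bin files standing in for MaxBin2 output.
mkdir -p example_bins_demo
printf '>c1\nACGT\n>c2\nAC\n' > example_bins_demo/example.001.fasta
printf '>c3\nGGGCCC\n' > example_bins_demo/example.002.fasta

# For each bin: file name, number of contigs, total bases.
for bin in example_bins_demo/example.*.fasta; do
  awk -v name="$(basename "$bin")" '
    /^>/ { n++; next }          # header line: one more contig
    { bp += length($0) }        # sequence line: add its length
    END { printf "%s\t%d\t%d\n", name, n, bp }
  ' "$bin"
done > example_bins_demo/bin_sizes.tsv

cat example_bins_demo/bin_sizes.tsv
```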

Step 2: Evaluate bins using CheckV

CheckV is used to assess the quality of viral genomes from metagenomic assemblies. To install it with conda, run conda install bioconda::checkv

Usage:

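# Run CheckV on each bin FASTA written by MaxBin2 (needs the CheckV database, e.g. via CHECKVDB)
for bin in bins_directory.*.fasta; do
  checkv end_to_end "$bin" "checkv_output_${bin%.fasta}" -t 4
done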

CheckV evaluates the completeness and contamination of viral bins, providing quality metrics that help in refining and validating your viral genomes.
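CheckV reports its per-sequence metrics in a tab-separated quality_summary.tsv file. A minimal sketch of keeping only the better-quality entries follows; it assumes a checkv_quality column with tiers such as Complete and High-quality, and the table here is a hypothetical, shortened stand-in with made-up values. The column is located by header name so the filter does not depend on column order.

```shell
# Hypothetical, shortened stand-in for CheckV's quality_summary.tsv.
{
  printf 'contig_id\tcontig_length\tcheckv_quality\tcompleteness\tcontamination\n'
  printf 'k141_1\t45000\tHigh-quality\t95.2\t0.0\n'
  printf 'k141_2\t3000\tLow-quality\t12.5\t0.0\n'
  printf 'k141_3\t38000\tComplete\t100.0\t0.0\n'
} > example_quality_summary.tsv

# Keep the header, then rows whose checkv_quality is Complete or High-quality.
awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
  $col["checkv_quality"] == "Complete" || $col["checkv_quality"] == "High-quality"
' example_quality_summary.tsv > example_high_quality.tsv

cat example_high_quality.tsv
```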

Key Points

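  • MaxBin2 bins assembled contigs into putative genomes, using read abundance per contig.
  • CheckV evaluates the completeness and contamination of viral bins.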