hybracter Commands and Input CSV

Commands

  • hybracter install: Installs required databases for hybracter.
  • hybracter hybrid: Assemble multiple genomes from isolates that have long-reads and paired-end short reads.
  • hybracter hybrid-single: Assembles a single genome from an isolate with long-reads and paired-end short reads. It takes similar parameters to Unicycler.
  • hybracter long: Assemble multiple genomes from isolates that have long-reads only.
  • hybracter long-single: Assembles a single genome from an isolate with long-reads only.
  • hybracter install: Downloads and installs the required plassembler database.
  • hybracter test-hybrid: Runs hybracter hybrid on a small test dataset.
  • hybracter test-long: Runs hybracter long on a small test dataset.
hybracter -h 
 _           _                    _            
| |__  _   _| |__  _ __ __ _  ___| |_ ___ _ __ 
| '_ \| | | | '_ \| '__/ _` |/ __| __/ _ \ '__|
| | | | |_| | |_) | | | (_| | (__| ||  __/ |   
|_| |_|\__, |_.__/|_|  \__,_|\___|\__\___|_|   
       |___/


Usage: hybracter [OPTIONS] COMMAND [ARGS]...

  For more options, run: hybracter command --help

Options:
  -h, --help  Show this message and exit.

Commands:
  install        Downloads and installs the plassembler database
  hybrid         Run hybracter with hybrid long and paired end short reads
  hybrid-single  Run hybracter hybrid on 1 isolate
  long           Run hybracter with only long reads
  long-single    Run hybracter long on 1 isolate
  test-hybrid    Test hybracter hybrid
  test-long      Test hybracter long
  config         Copy the system default config file
  citation       Print the citation(s) for hybracter
  version        Print the version for hybracter

Input csv

hybracter hybrid and hybracter long require an input csv file to be specified with --input. No other inputs are required.

  • This file requires no headers.
  • Other than the reads, hybracter requires a value for a lower bound the minimum chromosome length for each isolate in base pairs. It must be an integer.
  • hybracter will denote contigs about this value as chromosome(s) and if it can recover a chromosome, it will denote the isolate as complete.
  • In practice, I suggest choosing 90% of the estimated chromosome size for this value.
  • e.g. for S. aureus, I'd choose 2500000, E. coli, 4000000, P. aeruginosa 5500000.

hybracter hybrid

  • hybracter hybrid requires an input csv file with 5 columns.
  • Each row is a sample.
  • Column 1 is the sample name you want for this isolate.
  • Column 2 is the long read fastq file.
  • Column 3 is the minimum chromosome length for that sample.
  • Column 4 is the R1 short read fastq file
  • Column 5 is the R2 short read fastq file.

e.g.

s_aureus_sample1,sample1_long_read.fastq.gz,2500000,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,5500000,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
s_aureus_sample1 sample1_long_read.fastq.gz 2500000 sample1_SR_R1.fastq.gz  sample1_SR_R2.fastq.gz
p_aeruginosa_sample2 sample2_long_read.fastq.gz 5500000 sample2_SR_R1.fastq.gz  sample2_SR_R2.fastq.gz

hybracter long

hybracter long also requires an input csv with no headers, but only 3 columns.

  • hybracter long requires an input csv file with 3 columns.
  • Each row is a sample.
  • Column 1 is the sample name you want for this isolate.
  • Column 2 is the long read fastq file.
  • Column 3 is the minimum chromosome length for that sample.

e.g.

s_aureus_sample1,sample1_long_read.fastq.gz,2500000
p_aeruginosa_sample2,sample2_long_read.fastq.gz,5500000
s_aureus_sample1 sample1_long_read.fastq.gz 2500000
p_aeruginosa_sample2 sample2_long_read.fastq.gz 5500000

Pacbio Reads

Thanks to Brad Hart (@biobrad) for volunteering this tutorial on how to run Hybracter with PacBio long reads.

Hybracter works with PacBio long reads once they have been extracted from the PacBio BAM file.

To extract the raw fastq long read from the BAM file:

First, install pbtk: https://github.com/PacificBiosciences/pbtk

bash conda create -n pacbio -c bioconda pbtk conda activate pacbio

Next, extract the fastq, requiring a 'pbi' file as an index:

 pbindex yourpacbiofile.bam

Then to extract the fastq file from the bam:

 bam2fastq -o yourfastqfilename yourpacbiofile.bam

The output file will be:

yourfastqfilename.fastq.gz

which you can feed into any of the Hybracter input formats as above.