hybracter Commands and Input CSV
Commands
hybracter install: Installs required databases forhybracter.hybracter hybrid: Assemble multiple genomes from isolates that have long-reads and paired-end short reads.hybracter hybrid-single: Assembles a single genome from an isolate with long-reads and paired-end short reads. It takes similar parameters to Unicycler.hybracter long: Assemble multiple genomes from isolates that have long-reads only.hybracter long-single: Assembles a single genome from an isolate with long-reads only.hybracter install: Downloads and installs the requiredplassemblerdatabase.hybracter test-hybrid: Runshybracter hybridon a small test dataset.hybracter test-long: Runshybracter longon a small test dataset.
hybracter -h
_ _ _
| |__ _ _| |__ _ __ __ _ ___| |_ ___ _ __
| '_ \| | | | '_ \| '__/ _` |/ __| __/ _ \ '__|
| | | | |_| | |_) | | | (_| | (__| || __/ |
|_| |_|\__, |_.__/|_| \__,_|\___|\__\___|_|
|___/
Usage: hybracter [OPTIONS] COMMAND [ARGS]...
For more options, run: hybracter command --help
Options:
-h, --help Show this message and exit.
Commands:
install Downloads and installs the plassembler database
hybrid Run hybracter with hybrid long and paired end short reads
hybrid-single Run hybracter hybrid on 1 isolate
long Run hybracter with only long reads
long-single Run hybracter long on 1 isolate
test-hybrid Test hybracter hybrid
test-long Test hybracter long
config Copy the system default config file
citation Print the citation(s) for hybracter
version Print the version for hybracter
Input csv
hybracter hybrid and hybracter long require an input csv file to be specified with --input. No other inputs are required.
- This file requires no headers.
- Other than the reads,
hybracterrequires a value for a lower bound the minimum chromosome length for each isolate in base pairs. It must be an integer. hybracterwill denote contigs about this value as chromosome(s) and if it can recover a chromosome, it will denote the isolate as complete.- In practice, I suggest choosing 90% of the estimated chromosome size for this value.
- e.g. for S. aureus, I'd choose 2500000, E. coli, 4000000, P. aeruginosa 5500000.
hybracter hybrid
hybracter hybridrequires an input csv file with 5 columns.- Each row is a sample.
- Column 1 is the sample name you want for this isolate.
- Column 2 is the long read fastq file.
- Column 3 is the minimum chromosome length for that sample.
- Column 4 is the R1 short read fastq file
- Column 5 is the R2 short read fastq file.
e.g.
s_aureus_sample1,sample1_long_read.fastq.gz,2500000,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,5500000,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
| s_aureus_sample1 | sample1_long_read.fastq.gz | 2500000 | sample1_SR_R1.fastq.gz | sample1_SR_R2.fastq.gz |
|---|---|---|---|---|
| p_aeruginosa_sample2 | sample2_long_read.fastq.gz | 5500000 | sample2_SR_R1.fastq.gz | sample2_SR_R2.fastq.gz |
- If you use
--auto, you can remove the column with the chromosome length
e.g.
s_aureus_sample1,sample1_long_read.fastq.gz,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
| s_aureus_sample1 | sample1_long_read.fastq.gz | sample1_SR_R1.fastq.gz | sample1_SR_R2.fastq.gz |
|---|---|---|---|
| p_aeruginosa_sample2 | sample2_long_read.fastq.gz | sample2_SR_R1.fastq.gz | sample2_SR_R2.fastq.gz |
hybracter long
hybracter long also requires an input csv with no headers, but only 3 columns.
hybracter longrequires an input csv file with 3 columns.- Each row is a sample.
- Column 1 is the sample name you want for this isolate.
- Column 2 is the long read fastq file.
- Column 3 is the minimum chromosome length for that sample.
e.g.
s_aureus_sample1,sample1_long_read.fastq.gz,2500000
p_aeruginosa_sample2,sample2_long_read.fastq.gz,5500000
| s_aureus_sample1 | sample1_long_read.fastq.gz | 2500000 |
|---|---|---|
| p_aeruginosa_sample2 | sample2_long_read.fastq.gz | 5500000 |
- If you use
--auto, you can remove the column with the chromosome length
s_aureus_sample1,sample1_long_read.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz
| s_aureus_sample1 | sample1_long_read.fastq.gz |
|---|---|
| p_aeruginosa_sample2 | sample2_long_read.fastq.gz |
Pacbio Reads
Thanks to Brad Hart (@biobrad) for volunteering this tutorial on how to run Hybracter with PacBio long reads.
Hybracter works with PacBio long reads once they have been extracted from the PacBio BAM file.
To extract the raw fastq long read from the BAM file:
First, install pbtk: https://github.com/PacificBiosciences/pbtk
bash
conda create -n pacbio -c bioconda pbtk
conda activate pacbio
Next, extract the fastq, requiring a 'pbi' file as an index:
pbindex yourpacbiofile.bam
Then to extract the fastq file from the bam:
bam2fastq -o yourfastqfilename yourpacbiofile.bam
The output file will be:
yourfastqfilename.fastq.gz
which you can feed into any of the Hybracter input formats as above.