hybracter

hybracter is an automated long-read-first bacterial genome assembly pipeline implemented in Snakemake using Snaketool.

Overview

hybracter is designed for assembling bacterial isolate genomes using a long-read-first assembly approach. It scales to large numbers of isolates by exploiting the embarrassingly parallel nature of the task on HPC systems through Snakemake profiles. It is designed for applications where you have isolates with Oxford Nanopore Technologies (ONT) long reads and, optionally, matched paired-end short reads for polishing.

hybracter is designed to straddle the fine line between being as feature-rich as possible, giving you the information you need to decide on the best assembly, and being a one-line automated program. In other words, as awesome as Unicycler, but updated for 2023. Perfect for lazy people like myself.

hybracter is largely based on Ryan Wick's magnificent tutorial and associated paper. hybracter differs in that it adds targeted plasmid assembly with plassembler, contig reorientation with dnaapler, and extra polishing and statistical summaries.

Image

  • A. Reads are quality controlled and subsampled with Filtlong, Porechop, fastp and Seqkit, with optional contaminant removal using modules from trimnami.
  • B. Long-read assembly is conducted with Flye. Each sample is classified according to whether the chromosome(s) were assembled (marked as 'complete') or not (marked as 'incomplete'), based on the given minimum chromosome length.
  • C. For complete isolates, plasmid recovery with Plassembler.
  • D. For all isolates, long read polishing with Medaka.
  • E. For complete isolates, circularised chromosome(s) are reorientated to begin with the dnaA gene with dnaapler.
  • F. For all isolates, if short reads are provided, short-read polishing with Polypolish and Pypolca depending on short-read depth.
  • G. For all isolates, assessment of all assemblies with ALE for hybracter hybrid or Pyrodigal for hybracter long.
  • H. The best assembly is selected (for hybracter hybrid, the last assembly is taken) and output along with final assembly statistics.
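
The branching in steps A to H can be sketched in a few lines of Python. This is only an illustration of the decision logic described above, not hybracter's actual code: the `Sample` class, `classify` and `planned_steps` are hypothetical names, and the rule that a sample is 'complete' when at least one contig reaches the minimum chromosome length is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical container for one isolate (not part of hybracter)."""
    name: str
    contig_lengths: list       # lengths in bp of the assembled contigs
    min_chrom_length: int      # user-supplied minimum chromosome length in bp
    has_short_reads: bool = False


def classify(sample: Sample) -> str:
    # Assumption: 'complete' means at least one contig is at least as long
    # as the minimum chromosome length; otherwise the sample is 'incomplete'.
    if any(n >= sample.min_chrom_length for n in sample.contig_lengths):
        return "complete"
    return "incomplete"


def planned_steps(sample: Sample) -> list:
    """List which of the steps A to H would apply to this sample."""
    complete = classify(sample) == "complete"
    steps = ["A: read QC and subsampling", "B: Flye assembly"]
    if complete:
        steps.append("C: plasmid recovery with Plassembler")
    steps.append("D: long-read polishing with Medaka")
    if complete:
        steps.append("E: reorientation with dnaapler")
    if sample.has_short_reads:
        steps.append("F: short-read polishing with Polypolish and Pypolca")
    # Assessment tool depends on whether short reads are available.
    assessor = "ALE" if sample.has_short_reads else "Pyrodigal"
    steps.append(f"G: assessment with {assessor}")
    steps.append("H: select best assembly and write statistics")
    return steps


if __name__ == "__main__":
    isolate = Sample("isolate_1", [4_800_000, 52_000], 4_000_000, has_short_reads=True)
    print(classify(isolate))
    for step in planned_steps(isolate):
        print(step)
```

In this sketch an incomplete isolate skips steps C and E, and an isolate without short reads skips step F and is assessed with Pyrodigal rather than ALE.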