`hybracter` Advanced Usage

Snakemake Profiles

It is highly recommended to run hybracter using a Snakemake profile if you are using an HPC and have multiple isolates to assemble.

An example Slurm profile is included in the profiles directory; see this link for more detail on profiles for other HPC job schedulers.

You can run hybracter with a profile using --profile, e.g.:

hybracter hybrid --input <input.csv> --output <output_dir> --threads <threads> --profile profiles/hybracter

The guide below has been copied and modified from the Hecatomb documentation.

About Snakemake profiles

Snakemake profiles are a must-have for running Snakemake pipelines on HPC clusters. While they can be a pain to set up, you only need to do this once and then life is easy.

For more information, check the Snakemake documentation on profiles, or this recent blog post on Snakemake profiles.

Profiles for hybracter

The example profile config.yaml file contains all the Snakemake options for jobs that are submitted to the scheduler. hybracter expects the following in the cluster commands:

resources.time for time in minutes
resources.mem_mb for requested memory in MB (megabytes)
threads for requested CPUs
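As a minimal sketch (not the example profile shipped with hybracter), the shell snippet below writes a config.yaml that maps these three variables onto sbatch options. It assumes a Snakemake release that still accepts the cluster and cluster-status options (Snakemake 7.x; Snakemake 8 moved cluster submission to executor plugins). The jobs: 100 limit and the smk- job-name prefix are arbitrary placeholders. Run it from inside your profile directory; it overwrites any existing config.yaml there.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Slurm profile config.yaml into the current directory.
# Assumes Snakemake 7.x (cluster / cluster-status options). Overwrites config.yaml.
cat > config.yaml <<'EOF'
# Each key mirrors a Snakemake command-line option (--cluster, --jobs, ...).
# --parsable makes sbatch print only the job ID, which Snakemake passes to
# the cluster-status script. Slurm reads --mem as MB and --time as minutes.
cluster:
  mkdir -p logs/{rule}/ &&
  sbatch
    --parsable
    --cpus-per-task={threads}
    --mem={resources.mem_mb}
    --time={resources.time}
    --job-name=smk-{rule}
    --output=logs/{rule}/{jobid}.out
    --error=logs/{rule}/{jobid}.err
jobs: 100
cluster-status: ~/.config/snakemake/slurm/slurm-status.py
EOF
```

YAML folds the indented lines under cluster: into a single command string, so Snakemake sees one mkdir-then-sbatch command per job and appends its generated job script to the end.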

We have tried to use what we believe is the most common nomenclature for these variables in Snakemake pipelines in the hope that hybracter is compatible with existing Snakemake profiles and the available Cookiecutter profiles for Snakemake.

We recommend redirecting STDERR and STDOUT messages to log files using the Snakemake variables {rule} and {jobid}, for instance --output=logs/{rule}/{jobid}.out. You should also prepend the scheduler command with a command that creates the log directories in case they don't exist (a missing log directory causes errors with some schedulers), in this example: mkdir -p logs/{rule}/ && sbatch .... This makes troubleshooting easier for jobs that fail due to scheduler issues.
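To make the placeholders concrete, here is a small shell sketch that mimics the mkdir-then-submit pattern for a single job. The function preview_submit, the rule name example_rule, the job ID 7 and jobscript.sh are all hypothetical stand-ins for values Snakemake fills in; the sbatch call is only printed, not executed.

```shell
#!/usr/bin/env bash
# preview_submit RULE JOBID (hypothetical helper): create the per-rule log
# directory, then print (not run) the sbatch call with placeholders filled in.
# jobscript.sh stands in for the job script Snakemake appends to the command.
preview_submit() {
  local rule="$1" jobid="$2"
  mkdir -p "logs/${rule}/" &&
    echo "sbatch --output=logs/${rule}/${jobid}.out --error=logs/${rule}/${jobid}.err jobscript.sh"
}

# Example: rule "example_rule", Snakemake job ID 7
preview_submit example_rule 7
```

Without the mkdir step, logs/example_rule/ would not exist when the job starts, and the scheduler would have nowhere to write the .out and .err files.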

The example profile includes a 'watcher' script. Snakemake won't always pick up when a scheduler prematurely terminates a job, which is why we need a watcher script. This line in the config file tells Snakemake how to check on the status of a job: cluster-status: ~/.config/snakemake/slurm/slurm-status.py (be sure to check and update your file path). The slurm-status.py script will query the scheduler with the jobid and report back to Snakemake on the job's status.
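Snakemake calls the cluster-status command with a job ID as its only argument and expects it to print one of success, failed or running. The bundled watcher is a Python script; as a rough shell sketch of the same contract (a hypothetical slurm-status.sh, not the file shipped with hybracter), a status check could query sacct as below. Treating an empty sacct answer as running is an assumption meant to cover accounting lag right after submission, and the list of Slurm states mapped to running is not exhaustive.

```shell
#!/usr/bin/env bash
# Hypothetical slurm-status.sh: shell sketch of a Snakemake cluster-status
# command. Snakemake runs it with a job ID and reads one word from stdout.

# Translate a Slurm job state into "success", "running" or "failed".
# Assumption: an empty state means sacct has not recorded the job yet.
map_slurm_state() {
  case "$1" in
    COMPLETED) echo success ;;
    ""|PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED|REQUEUED|RESIZING) echo running ;;
    *) echo failed ;;  # e.g. FAILED, TIMEOUT, OUT_OF_MEMORY, NODE_FAIL, CANCELLED
  esac
}

# Query Slurm only when executed directly with a job ID argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  # -X: report the job allocation only, not its individual steps
  state=$(sacct -j "$1" -X --noheader --parsable2 --format=State | head -n 1)
  # States such as "CANCELLED by 1234" carry a suffix; keep the first word.
  map_slurm_state "${state%% *}"
fi
```

This only works if the submit command returns a bare job ID, which is why sbatch is usually called with --parsable in Slurm profiles.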

Profile installation examples

We'll walk through two ways to set up a profile for the Slurm workload manager. If your HPC uses a different workload manager, the process of installing a profile will be similar, although the details will differ.

Copy an example profile

We have provided an example profile for the Slurm workload manager that should work for most HPCs using Slurm. Snakemake will look for profiles in your home directory at:

~/.config/snakemake/
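If you are unsure which profile names Snakemake can resolve from this directory, the hypothetical shell helper below lists every subdirectory that contains a config.yaml. It only looks in the per-user location named above; other locations Snakemake may search, and profiles passed as a full path, are not covered.

```shell
#!/usr/bin/env bash
# list_profiles [DIR] (hypothetical helper): print the name of every
# subdirectory of DIR that holds a config.yaml, i.e. candidate --profile names.
list_profiles() {
  local dir="${1:-$HOME/.config/snakemake}" cfg
  for cfg in "$dir"/*/config.yaml; do
    [ -e "$cfg" ] || continue   # the glob matched nothing
    basename "$(dirname "$cfg")"
  done
}

list_profiles
```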

First, create a directory for your new profile; we'll call it 'slurm':

mkdir -p ~/.config/snakemake/slurm

Now copy the files for the example slurm profile (you can view them here on GitHub):

# go to your new profile directory
cd ~/.config/snakemake/slurm/
# copy the files (either from GitHub or from where you installed hybracter)
wget https://raw.githubusercontent.com/gbouras13/hybracter/main/profiles/slurm/config.yaml
# save the watcher as slurm-status.py, the name used in the rest of this guide
wget -O slurm-status.py https://raw.githubusercontent.com/gbouras13/hybracter/main/profiles/slurm/slurm_status.py

This example includes the necessary config.yaml file for profiles and a watcher script called slurm-status.py. Make the watcher script executable:

chmod +x ~/.config/snakemake/slurm/slurm-status.py

Done! You can now use this profile with hybracter:

hybracter test-hybrid --profile slurm
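Before launching a real assembly, you may want to check that the watcher answers as expected. The sketch below defines a hypothetical poll_status helper that repeatedly calls a status command until it stops printing running, a much simplified imitation of Snakemake's own polling. The guarded block at the end submits a throwaway sleep job with sbatch --parsable --wrap; on your cluster you may need extra options such as an account or partition.

```shell
#!/usr/bin/env bash
# poll_status CMD JOBID [TRIES] [DELAY] (hypothetical helper): call "CMD JOBID"
# until it answers something other than "running", then print that answer.
poll_status() {
  local cmd="$1" jobid="$2" tries="${3:-30}" delay="${4:-10}" state i
  for ((i = 0; i < tries; i++)); do
    state=$("$cmd" "$jobid")
    if [[ "$state" != "running" ]]; then
      echo "$state"
      return 0
    fi
    sleep "$delay"
  done
  echo "running"   # gave up while the job was still running
  return 1
}

# Only on a machine where Slurm is available: submit a short job and watch it.
if command -v sbatch >/dev/null 2>&1; then
  jobid=$(sbatch --parsable --wrap "sleep 30")
  jobid="${jobid%%;*}"   # --parsable may print "jobid;cluster"
  poll_status "$HOME/.config/snakemake/slurm/slurm-status.py" "$jobid"
fi
```

If the final answer is success, Snakemake should be able to track jobs submitted through this profile; failed or a permanent running usually points to a wrong path or a scheduler permission problem.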

Create a profile with cookiecutter

Cookiecutter is a nifty tool for creating projects using a template. For Snakemake profiles, Cookiecutter takes away a lot of the manual configuration steps involved with setting up a profile for a specific scheduler. There are currently Cookiecutter templates for Slurm, SGE, PBS, and several other workload managers, and hybracter is intended to be compatible with these profiles.

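We will walk through installing the Slurm profile using Cookiecutter. To begin, make sure the Snakemake profile directory exists:

mkdir -p ~/.config/snakemake

Move to this directory and run the Cookiecutter command for this profile. Cookiecutter generates the profile in a new subdirectory named after the profile name you enter at the prompt ('slurm' by default), so you don't need to create ~/.config/snakemake/slurm yourself:

cd ~/.config/snakemake/
cookiecutter https://github.com/Snakemake-Profiles/slurm.git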

Follow the prompts and you're done! For our system, we did not need to specify anything, but you may need to specify account information for billing etc. There is more detail on the GitHub page for this profile.

Use the profile with hybracter:

hybracter test-hybrid --profile slurm
