hybracter creates a number of output files in different formats.
Main Output
The main outputs are in the FINAL_OUTPUT directory.
This directory will include:
Summary File
hybracter_summary.tsv file. This gives the summary statistics for your assemblies with the following columns:
| Sample | Complete (True or False) | Total_assembly_length | Number_of_contigs | Most_accurate_polishing_round | Longest_contig_length | Longest_contig_coverage | Number_circular_plasmids |
|---|---|---|---|---|---|---|---|
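As a minimal sketch of how this file might be used downstream, the snippet below lists the samples marked as complete. It assumes the file is tab-separated with a header row, that the sample column is named Sample, that the completeness column name starts with Complete, and that its values are the strings True and False. The function name complete_samples and the example rows are hypothetical.

```python
import csv
import io


def complete_samples(handle):
    """Return the names of samples whose completeness column reads 'True'.

    Assumes a tab-separated file with a header row, as described above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: the completeness column header starts with "Complete".
    complete_col = next(
        name for name in reader.fieldnames if name.startswith("Complete")
    )
    return [
        row["Sample"]
        for row in reader
        if row[complete_col].strip().lower() == "true"
    ]


# Hypothetical two-sample summary, for illustration only.
example = (
    "Sample\tComplete\tTotal_assembly_length\n"
    "sampleA\tTrue\t5000000\n"
    "sampleB\tFalse\t4800000\n"
)
print(complete_samples(io.StringIO(example)))
```

For a real run, the same function could be given an open file handle for hybracter_summary.tsv in the FINAL_OUTPUT directory, for example one opened with `open(path, newline="")`.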
Summary Assemblies
- The complete and incomplete directories will contain the summary assemblies for all samples.
All samples that are denoted by hybracter to be complete will have 5 outputs in the complete directory:
- sample_summary.tsv containing the summary statistics for that sample.
- sample_per_contig_stats.tsv containing the contig names, lengths, GC% and whether the contig is circular.
- sample_final.fasta containing the final assembly for that sample.
- sample_chromosome.fasta containing only the final chromosome(s) assembly for that sample.
- sample_plasmid.fasta containing only the final plasmid(s) assembly for that sample. Note this may be empty. If this is empty, then that sample had no plasmids.
  - Note - there may be a number of non-circular "plasmid" contigs. Be careful assuming these are truly plasmids and check the Plassembler output in supplementary_results. These may be assembly artefacts that should be excluded, or indicate that your long- and short-read sets aren't well matched!
All samples that are denoted by hybracter to be incomplete will have 3 outputs in the incomplete directory:
- sample_summary.tsv containing the summary statistics for that sample.
- sample_per_contig_stats.tsv containing the contig names, lengths, GC% and whether the contig is circular.
- sample_final.fasta containing the final assembly for that sample.
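The file lists above can be checked programmatically. The following is a rough sketch, assuming that "sample" in each file name is replaced by the actual sample name and that the files sit directly inside the complete or incomplete directory. The helper names missing_outputs and has_plasmids are hypothetical and are not part of hybracter.

```python
from pathlib import Path

# File name suffixes taken from the lists above.
COMPLETE_SUFFIXES = [
    "_summary.tsv",
    "_per_contig_stats.tsv",
    "_final.fasta",
    "_chromosome.fasta",
    "_plasmid.fasta",
]
# Incomplete samples only get the first three outputs.
INCOMPLETE_SUFFIXES = COMPLETE_SUFFIXES[:3]


def missing_outputs(directory, sample, complete=True):
    """Return the expected output file names that are absent for a sample."""
    suffixes = COMPLETE_SUFFIXES if complete else INCOMPLETE_SUFFIXES
    return [
        sample + suffix
        for suffix in suffixes
        if not (Path(directory) / (sample + suffix)).is_file()
    ]


def has_plasmids(directory, sample):
    """Return True if the sample's plasmid FASTA exists and is not empty."""
    plasmid = Path(directory) / f"{sample}_plasmid.fasta"
    return plasmid.is_file() and plasmid.stat().st_size > 0
```

A non-empty plasmid FASTA only shows that plasmid contigs were written. Whether non-circular contigs are genuine plasmids still needs the manual check described in the note above.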
Other Outputs
supplementary_results directory
The supplementary_results directory contains a number of supplementary results that you might find useful:
1. comparisons directory
- This directory contains visual representations comparing the effect of each polishing round for each sample, using a modified version of Ryan Wick's compare_assemblies.py script. An example is shown below:
contig_1 37368-37398: ACCATTTTTGTTTTATTTTTTGTAAAGACAC
contig_1 37368-37397: ACCATTTTTGTTTTA-TTTTTGTAAAGACAC
*
contig_1 43247-43277: CAACGTTGTTTTCCCTGAGCCTAAATAACCA
contig_1 43246-43276: CAACGTTGTTTTCCCCGAGCCTAAATAACCA
*
contig_1 44658-44688: CTTGATCTTTATCTATGATTTCATTAATACT
contig_1 44657-44687: CTTGATCTTTATCTACGATTTCATTAATACT
*
- If this file is empty, there are no differences between assemblies
2. intermediate_chromosome_assemblies directory
- This directory contains intermediate chromosome assemblies for all polishing rounds for each sample.
3. flye_individual_summaries directory
- This directory contains individual sample summaries from Flye for all samples.
4. plassembler_individual_summaries directory
- This directory contains individual sample summaries from Plassembler for each sample.
5. plassembler_all_assembly_summary directory
- This directory contains individual sample summaries from Plassembler for all samples.
6. pyrodigal_mean_length_summaries directory
- For long, this directory contains pyrodigal mean CDS length summary files for each polishing round for each sample.
7. pyrodigal_mean_length_summaries_plassembler directory
- For long, this directory contains pyrodigal mean CDS length summary files for each polishing round for each sample for the Plassembler-assembled plasmids.
processing directory
The processing directory will contain a number of intermediate directories whose information you might find useful:
1. flye directory
- This directory will contain the Flye assembly output and associated intermediate files for each sample
2. qc directory
- This directory will contain the filtered, trimmed and contaminant-removed FASTQ reads (where applicable) for each sample, along with the estimated chromosome size (which will be used if you specify --auto)
3. plassembler directory
- This directory will contain the Plassembler assembly output and associated intermediate files for each sample
4. chrom_pre_polish directory
- This directory will contain the pre-polished chromosome assemblies for complete isolates
5. complete and incomplete directories
- These directories will contain the medaka, polypolish and pypolca polishing and dnaapler reorientation intermediate files for each sample
6. ale_out_files directory
- For hybrid, this directory will contain intermediate ALE files for each assembly polishing round internal to hybracter (so can be ignored).
7. ale_scores_complete and ale_scores_incomplete directories
- These directories will contain ALE scores for each assembly polishing round.
stderr directory
- This will contain log files for each program in hybracter.
versions directory
- This will contain the specific versions used for each program in hybracter.
flags directory
- This will contain flag files internal to hybracter (so can be ignored).
completeness directory
- This will contain flag files used to determine completeness, internal to hybracter (so can be ignored).
benchmarks directory
- This will contain benchmarking time and memory usage statistics for each program in hybracter.