Bacterial Genomics - Summary Report - Mock Data

Author

Marcela Torres
Dahlia Walters
Shawn Hawken

Affiliation

Molecular Epidemiology Program, WA DOH

Published

March 27, 2026

Overview New Samples

There are 5 new sequencing results in this BigBacter run.

The summary table below includes sample metadata and maps each sequencing ID to the corresponding WDRS CASE_ID, if available.

WA ID ALT ID WDRS ID Collection Date Patient County Submitter County Submitter Facility
WA1000000 2024JQ-00001 100000000 2024-01-01 Thurston Spokane A Hospital
WA1200000 2024JQ-00002 200000000 2024-01-01 King Pierce A Laboratory
WA1300000 2024JQ-00003 300000000 2024-01-01 Whatcom Whatcom Microbiology LHJ
WA1400000 2024JQ-00004 400000000 2024-01-01 Snohomish King B Hospital
WA1500000 2024JQ-00005 500000000 2024-01-01 Pierce King B Laboratory

The new isolates were analyzed using Gubbins and classified by BigBacter as follows:

ID QUAL TAXA GENOMIC CLUSTER CLUSTER (n=) PARTITION Gubbins PARTITION (n=)

WA1000000 PASS Klebsiella_pneumoniae 4 22 5 1
WA1200000 PASS Klebsiella_pneumoniae 4 22 4 3
WA1300000 PASS Klebsiella_pneumoniae 208 1 NA NA
WA1400000 PASS Klebsiella_pneumoniae 301 6 3 1
WA1500000 PASS Klebsiella_pneumoniae 318 1 NA NA

Sequences that resulted in new genetic clusters are excluded from tree partitioning.

Of the new isolates, the following resulted in new genetic clusters:

ID QUAL TAXA GENOMIC CLUSTER CLUSTER (n=)
WA1300000 PASS Klebsiella_pneumoniae 208 1
WA1500000 PASS Klebsiella_pneumoniae 318 1

Failed Isolates

The following isolates failed quality control.

Recombination

Bacterial recombination is the process by which bacteria exchange genetic material with each other, leading to the acquisition of new DNA sequences in their genomes. It is important to account for recombination in genomic analyses because recombination events can be mistaken for mutation events, which can distort metrics used to characterize relationships between sequences, such as single nucleotide polymorphism (SNP) distances. The bioinformatics pipelines developed at WA PHL use Gubbins, a method to detect and control for recombination. If recombination is detected, the affected sites are masked in the SNP distance calculations and in the phylogenetic trees.

We evaluate recombination in multiple ways. First, the number of sites where recombination was detected is divided by the total length of the core genome. If recombination exceeds 5% in a genomic cluster, the Gubbins outputs are used. If recombination is more than 1% but less than 5%, the Snippy and Gubbins outputs are reviewed jointly to see whether they yield different interpretations. If the interpretations differ, we will most likely use the Gubbins outputs for the genomic interpretation.

TAXA GENOMIC_CLUSTER MAX_%Recomb_Detected
Klebsiella_pneumoniae 4 7.038
Klebsiella_pneumoniae 301 1.803

Sequences that resulted in new genetic clusters are excluded from this calculation.
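The recombination thresholds described in this section can be sketched in a few lines of Python. This is an illustration only, not the code used to generate the report: the function names `recombination_percent` and `review_action` are hypothetical, and because the text does not say what happens at exactly 5% or at 1% and below, the sketch returns "not specified in the report" for those values rather than guessing.

```python
def recombination_percent(recombinant_sites: int, core_genome_length: int) -> float:
    """Percent of core-genome sites where recombination was detected."""
    if core_genome_length <= 0:
        raise ValueError("core_genome_length must be positive")
    return 100.0 * recombinant_sites / core_genome_length


def review_action(percent: float) -> str:
    """Map a recombination percentage to the review step described in the text."""
    if percent > 5.0:
        return "use Gubbins outputs"
    if 1.0 < percent < 5.0:
        return "review Snippy and Gubbins outputs jointly"
    # Exactly 5%, and 1% or below, are not covered by the text.
    return "not specified in the report"


# Maximum percentages reported per genomic cluster in the table above.
reported = {"Klebsiella_pneumoniae_4": 7.038, "Klebsiella_pneumoniae_301": 1.803}
actions = {cluster: review_action(pct) for cluster, pct in reported.items()}
```

Under these rules, genomic cluster 4 (7.038%) falls in the "use Gubbins outputs" branch and genomic cluster 301 (1.803%) falls in the joint-review branch.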

SNP Min and Max Distances

The minimum and maximum SNP distances calculated using Gubbins are summarized below.

Source MAX MIN
1770000000-Klebsiella_pneumoniae-00004-core-snps_dist.gubbins-long 315 1
1770000000-Klebsiella_pneumoniae-00301-core-snps_dist.gubbins-long 25638 62

Sequences that resulted in new genetic clusters are excluded from this calculation.
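As a rough sketch of how a minimum and maximum could be derived from a long-format distance file, the Python below reads three-column rows and ignores self-comparisons. The file layout is an assumption (tab-separated, no header, columns: sample A, sample B, SNP distance), as is the choice to skip self-comparisons; the actual format of the `snps_dist.gubbins-long` files and the report's own calculation may differ. The function name `snp_min_max` and the sample IDs are hypothetical.

```python
import csv
import io


def snp_min_max(handle, delimiter="\t"):
    """Return (min, max) SNP distance over pairs of distinct samples."""
    distances = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        sample_a, sample_b, dist = row
        if sample_a == sample_b:
            continue  # a sample compared with itself always has distance 0
        distances.append(int(dist))
    if not distances:
        raise ValueError("no pairwise distances found")
    return min(distances), max(distances)


# Made-up distances for three hypothetical samples (not data from this report).
example = io.StringIO(
    "S1\tS1\t0\n"
    "S1\tS2\t3\n"
    "S1\tS3\t12\n"
    "S2\tS3\t10\n"
)
lowest, highest = snp_min_max(example)
```

For a real file, the same function could be called with an open file handle in place of the `io.StringIO` object.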

Genomic Linkages

Based on the SNP distances calculated using Gubbins, the following very strong (0–5 SNPs), strong (6–10 SNPs), and intermediate (11–50 SNPs) genomic linkages were identified between the new isolate(s) and other sequences within the corresponding genomic clusters.

ID VeryStrongGenLinks (0-5 SNP) StrongGenLinks (6-10 SNP) InterGenLinks (11-50 SNP)
WA1200000 WA0500000 WA0800000, WA0900000
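The linkage categories above amount to binning SNP distances by threshold. A minimal Python sketch follows, assuming the distances from one new isolate to other sequences in its genomic cluster are already available as a dictionary. The function names and the example IDs and distances are hypothetical and are not taken from this run.

```python
def linkage_category(snp_distance: int):
    """Return the linkage category for a SNP distance, or None above 50 SNPs."""
    if snp_distance < 0:
        raise ValueError("SNP distance cannot be negative")
    if snp_distance <= 5:
        return "very strong"   # 0-5 SNPs
    if snp_distance <= 10:
        return "strong"        # 6-10 SNPs
    if snp_distance <= 50:
        return "intermediate"  # 11-50 SNPs
    return None


def linkages_for(new_id: str, distances: dict) -> dict:
    """Group other sample IDs by linkage category relative to new_id."""
    links = {"very strong": [], "strong": [], "intermediate": []}
    for other_id, dist in sorted(distances.items()):
        if other_id == new_id:
            continue  # do not link an isolate to itself
        category = linkage_category(dist)
        if category is not None:
            links[category].append(other_id)
    return links


# Made-up distances from a hypothetical new isolate to four other sequences.
example_distances = {"ISO_A": 2, "ISO_B": 8, "ISO_C": 37, "ISO_D": 120}
links = linkages_for("ISO_NEW", example_distances)
```

Here `ISO_D` is dropped because 120 SNPs falls outside all three categories.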

Metadata

This is an overview of the metadata for each genomic cluster that contains new isolates. Facilities are the submitting facilities, and counties are the counties of the submitting facilities.

Taxa_GenomicCluster Min_CollDate Max_CollDate All_Counties New_Counties All_Facilities New_Facilities All_IDs New_IDs Same_DOB_Isolates
Klebsiella_pneumoniae_208 01-30-2022 2024-01-01 Skagit, Whatcom Whatcom A Hospital,X Laboratory A Hospital WA1300000 WA1300000 No isolates from the same case
Klebsiella_pneumoniae_301 02-30-2020 2024-01-01 King King B Hospital B Hospital WA1400000 WA1400000 No isolates from the same case
Klebsiella_pneumoniae_318 03-30-2023 2024-01-01 Pierce, Snohomish, King King C Hospital,General Hospital,A Laboratory A Laboratory WA1500000 WA1500000 No isolates from the same case
Klebsiella_pneumoniae_4 04-30-2021 2024-01-01 Chelan, Grant, Spokane Spokane Medical Center, D Health, B Laboratory B Laboratory WA1200000, WA1000000 WA1200000, WA1000000 DOB: 1999-01-01 IDs: WA0800000, WA0900000

Isolates listed as having the same DOB may or may not be from the same case. Check against epi data to confirm that the listed isolates are indeed from the same case.

Resources

The code to generate this report is available here:
https://github.com/NW-PaGe/BacterialGenomicsSummaryOutput

The following bioinformatics methods were used by WA PHL to generate some of the data summarized in this report:
BigBacter bioinformatics pipeline https://github.com/doh-jdj0303/bigbacter-nf
Snippy https://github.com/tseemann/snippy
Gubbins https://github.com/nickjcroucher/gubbins