20  Phylogenetic Tree Interpretation Cheat Sheet

⬜ Developing Hypotheses
⬜ Sample Collection
⬜ Outbreak Investigation
⬜ Sequencing
⬜ Bioinformatics
🟩 Molecular Epidemiology
⬜ Public Health Implementation

Term What it means In other words When to use…
Tip (Leaf) A sampled sequence; the end of a branch “A genome sequence from an organism, like a virus or bacteria” You’re describing the outer points on the tree
Internal Node A common ancestor between sequences (not sampled, but inferred during phylogenetic construction) “A point where the virus split into different lineages” Talking about relationships or ancestry
Clade A branch that includes one ancestor and all of its descendants “A group of viruses that all came from the same source” Pointing out related sequences or transmission clusters
Monophyletic Group A clade where all members share a unique ancestor “Everyone in this group is more related to each other than to anyone else” Emphasizing shared ancestry or outbreak origins
Polytomy An unresolved split with more than two branches; could be due to lack of sampling; may indicate either that we don’t know how the descendent lineages are related or that we think that the descendent lineages diverged simultaneously “The data couldn’t tell us the exact branching order” Explaining uncertainty in relationships
Branch A line that represents genetic change “The amount of evolution that’s happened” Interpreting divergence
Branch Length Genetic change (or time, if time-scaled) “Longer branches indicate more mutations have accumulated over time” Comparing how far sequences have evolved
Root The starting point or most recent common ancestor “Where we believe the outbreak started, based on the reference” Orienting the tree
Outgroup A sequence (or set of sequences) known to fall outside the main group of interest. Outgroups can also be used to root trees, providing directionality in the topology of the trees or evolutionary context “A virus that’s related but not part of the outbreak. OR We use it to figure out where the tree should start and how the others evolved” You want to orient the tree’s root, or distinguish between ingroup and outgroup sequences during tree interpretation
Divergence Observed genetic distance from the root, can be a proportion representing number of substitutions per site, or a whole number when scaled by the genome length to represent number of mutational differences from the root “How far a virus has evolved from the original” Reading the x-axis
Lineage A set of sequences with shared mutations or recent common origin “A group of related viruses that possess a set of specific mutations” Discussing clusters, variants, or introductions
Cluster A set of sequences that are genetically similar “A group of genetically related sequences that may be part of the same outbreak or the same transmission event” Discussing sequences that might be linked due to grouping closely together on a tree
Reversion When a mutation returns to its original state “When a site mutates, then mutates back to its original base. It’ll look as if a change hadn’t occurred when it has” Suspect a lineage is underestimating divergence due to hidden mutation history, especially in fast-evolving viruses or distance-based trees
Homoplasy A mutation that appears in different sequences independently “The same mutation happened in different viruses on their own, like evolution repeating itself” Suspect convergent evolution, possibly due to viral fitness advantages (e.g., increased transmissibility or immune escape)