Look under the hood of the lineages_classification.R script
Published
September 1, 2023
Modified
October 7, 2024
Summary
Label variants of concern (VOC)
Assign lineages to variant classification names
Assign hex code colors to variant names
1 Quickstart
The lineages_classification.R script will import the lineages.csv file we created previously and assign lineages according to their variant classifications. It will also color code the variants based on CDC variant proportion colors. See the links below for more details:
The cdc_class variable indicates whether a lineage is a variant being monitored (VBM), a variant of concern (VOC), or neither. The script parses the lineage string to decide which label applies.
Determine variants being monitored (VBM) with a regular expression (regex) via grepl()
Determine variants of concern (VOC), also with a regex
Use case_when() to label each lineage as VBM, VOC, or neither
Code
library(dplyr)

lineage_data_1 <- active_lineages %>%
  # 'cdc_class' variable code
  mutate(
    # if lineage extracted is in that list, assign to "VBM", else "non VBM"
    # (dots are escaped so they match a literal "."; the pattern is built with
    # paste0() so that no stray whitespace ends up inside the regex)
    vbm_class = ifelse(
      grepl(
        paste0(
          "B\\.1\\.617\\.2|^AY\\.|^B\\.1\\.1\\.7$|^Q\\.|",
          "B\\.1\\.351|B\\.1\\.351\\.|^P\\.1|^P\\.1\\.|^B\\.1\\.427|",
          "^B\\.1\\.429|B\\.1\\.525$|B\\.1\\.526$|B\\.1\\.617\\.1$|",
          "B\\.1\\.617\\.3$|B\\.1\\.621$|B\\.1\\.621\\.1$|P\\.2"
        ),
        lineage_extracted
      ),
      "VBM",
      "non VBM"
    ),
    # assign variant of concern class
    # if variant in that list, label "VOC", else "non VOC"
    voc_class = ifelse(
      grepl("B\\.1\\.1\\.529|XBB", lineage_extracted) |
        grepl("B\\.1\\.1\\.529|XBB", description),
      "VOC",
      "non VOC"
    ),
    # If adding in recombinant omicron
    cdc_class = case_when(
      vbm_class == "VBM" ~ "VBM",
      voc_class == "VOC" ~ "VOC",
      TRUE ~ "non VOC/VBM"
    )
  )
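To see how the three-way labeling behaves, here is a minimal sketch on a toy table. The data frame `toy_lineages`, its rows, and the description text are made up for illustration, and the patterns are shortened versions of the ones above, so this is not the script's full logic.

```r
library(dplyr)

# Hypothetical toy input; the real script works on `active_lineages`
toy_lineages <- tibble(
  lineage_extracted = c("B.1.1.7", "AY.4", "XBB.1.5", "BA.2", "A.1"),
  description       = c("", "", "", "Alias of B.1.1.529.2", "")
)

toy_classified <- toy_lineages %>%
  mutate(
    # Shortened VBM pattern: exact "B.1.1.7" or anything starting with "AY."
    vbm_class = ifelse(
      grepl("^B\\.1\\.1\\.7$|^AY\\.", lineage_extracted),
      "VBM", "non VBM"
    ),
    # VOC if either the lineage or its description mentions B.1.1.529 or XBB
    voc_class = ifelse(
      grepl("B\\.1\\.1\\.529|XBB", lineage_extracted) |
        grepl("B\\.1\\.1\\.529|XBB", description),
      "VOC", "non VOC"
    ),
    # case_when() takes the first matching condition, so VBM wins over VOC
    cdc_class = case_when(
      vbm_class == "VBM" ~ "VBM",
      voc_class == "VOC" ~ "VOC",
      TRUE ~ "non VOC/VBM"
    )
  )

toy_classified$cdc_class
```

In this toy table, "BA.2" is labeled VOC only because its description mentions B.1.1.529, which is why the description column is checked as well as the lineage string.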
Here’s an example of deriving the WHO name (Alpha, Delta, Omicron, etc.). This is only a small excerpt; the list in the actual script is much longer.
Code
# 'who_name' variable code
mutate(who_name = case_when(
  # Variants being monitored (VBM)
  # Alpha: exact match to "B.1.1.7" or starts with "Q."
  lineage_extracted == "B.1.1.7" | grepl("^Q\\.", lineage_extracted) ~ "Alpha",
  # Beta: exact match to "B.1.351" or starts with "B.1.351."
  lineage_extracted == "B.1.351" | grepl("^B\\.1\\.351\\.", lineage_extracted) ~ "Beta",
  # Gamma: exact match to "P.1" or starts with "P.1."
  lineage_extracted == "P.1" | grepl("^P\\.1\\.", lineage_extracted) ~ "Gamma",
  # Epsilon: starts with "B.1.427" or "B.1.429"
  grepl("^B\\.1\\.427|^B\\.1\\.429", lineage_extracted) ~ "Epsilon",
  # (excerpt ends here; the full script continues with the remaining variants)
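The patterns above rely on regular expressions, where an unescaped `.` matches any single character rather than a literal dot. The sketch below compares a loose and a strict pattern on a few made-up, lineage-like strings; the strict pattern is one possible way to express "exactly P.1 or a sublineage of P.1", not necessarily what the script uses.

```r
# Hypothetical lineage-like strings chosen to show the difference
candidates <- c("P.1", "P.1.7", "P.10", "PX1")

# Unescaped dots match any single character, so all four strings match
loose <- grepl("^P.1", candidates)

# Escaped dots match a literal "."; the (\\.|$) group requires either the end
# of the string or a further "." right after "P.1"
strict <- grepl("^P\\.1(\\.|$)", candidates)

data.frame(candidates, loose, strict)
```

Only "P.1" and "P.1.7" pass the strict pattern, while the loose one accepts all four strings.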
Assign the hex color codes to the variant classifications. The colors are chosen to match the CDC variant proportions plot.
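The color step is not shown in this excerpt, so the following is only a sketch of one way to do it, assuming a `who_name` column like the one derived above. The `hex_code` column name, the palette values, and the gray fallback are placeholders, not the CDC colors or the script's actual names.

```r
library(dplyr)

# Placeholder palette; NOT the CDC colors used by the real script
placeholder_palette <- c(
  Alpha = "#1B9E77",
  Beta  = "#D95F02",
  Gamma = "#7570B3"
)

# Hypothetical input with one name that has no palette entry
toy_variants <- tibble(who_name = c("Alpha", "Gamma", "Zeta"))

toy_colored <- toy_variants %>%
  mutate(
    # Look up each name in the palette; names without an entry come back as NA
    # and are replaced by a gray fallback
    hex_code = coalesce(unname(placeholder_palette[who_name]), "#999999")
  )

toy_colored
```

A named vector keeps the name-to-color mapping in one place, which makes it easier to update when the reference palette changes.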
Derive the lineage reporting group by checking whether each variant is in the current monitoring list or the former monitoring list.
Code
# These two lists should be mutually exclusive
currently_monitoring_list <- c(
  "Other Omicron", "BA.1.1", "BA.2", "BA.2.12.1", "BA.2.75", "BA.2.75.2",
  "BA.4", "BA.4.6", "BA.5", "BF.7", "BF.11", "BN.1", "BA.5.2.6",
  "BQ.1", "BQ.1.1", "XBB", "XBB.1.5", "Other"
)
formerly_monitoring_list <- c(
  "Alpha", "Beta", "Delta", "Epsilon", "Eta",
  "Gamma", "Iota", "Kappa", "Mu", "Zeta"
)
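As a rough illustration of how two such lists could drive the reporting group, the sketch below uses shortened lists and a toy table. The `variant_name` and `reporting_group` column names and the group labels are assumptions made for this example; the script's own names and values are not shown here.

```r
library(dplyr)

# Shortened stand-ins for the two monitoring lists
currently_monitoring_list <- c("BA.2", "XBB.1.5", "Other")
formerly_monitoring_list  <- c("Alpha", "Beta", "Delta")

# The lists are meant to be mutually exclusive, so verify that before using them
stopifnot(length(intersect(currently_monitoring_list, formerly_monitoring_list)) == 0)

# Hypothetical input; "Unassigned" appears in neither list
toy_groups <- tibble(variant_name = c("XBB.1.5", "Delta", "Unassigned")) %>%
  mutate(
    reporting_group = case_when(
      variant_name %in% currently_monitoring_list ~ "currently monitoring",
      variant_name %in% formerly_monitoring_list  ~ "formerly monitoring",
      TRUE ~ "not in either list"
    )
  )

toy_groups
```

Checking the intersection up front turns an accidental overlap between the lists into an immediate error instead of a silent misclassification.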
library(readr)
library(dplyr)

# read in the previous day's file
previous_lineage_data <- read_csv("lineage_classifications.csv")

# make sure the comparison column is character in both tables
lineage_data_final$lineage_extracted <- as.character(lineage_data_final$lineage_extracted)
previous_lineage_data$lineage_extracted <- as.character(previous_lineage_data$lineage_extracted)

# compare sizes: length() gives the number of columns, nrow() the number of rows
length(previous_lineage_data)
length(lineage_data_final)
nrow(previous_lineage_data)
nrow(lineage_data_final)

# new_lineage_data <- anti_join(previous_lineage_data, lineage_data_final)
# list of ones not in previous list
new_lineage_data <- lineage_data_final %>%
  filter(!lineage_extracted %in% previous_lineage_data$lineage_extracted)
new_lineage_data
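The commented-out `anti_join()` call above passes the previous table first, which would return lineages that disappeared rather than lineages that are new. The sketch below, on made-up stand-in tables, shows that swapping the arguments gives the same result as the `filter()` approach.

```r
library(dplyr)

# Hypothetical stand-ins for yesterday's and today's classification tables
previous_lineage_data <- tibble(lineage_extracted = c("BA.2", "BQ.1"))
lineage_data_final    <- tibble(lineage_extracted = c("BA.2", "BQ.1", "XBB.1.5"))

# Filter approach, as in the script: keep rows not seen in the previous file
new_by_filter <- lineage_data_final %>%
  filter(!lineage_extracted %in% previous_lineage_data$lineage_extracted)

# anti_join() keeps rows of the first table that have no match in the second,
# so today's table has to come first
new_by_join <- anti_join(
  lineage_data_final,
  previous_lineage_data,
  by = "lineage_extracted"
)

new_by_join
```

Naming the `by` column explicitly avoids joining on every shared column, which matters if other columns such as descriptions change between days.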
Finally, write the results to a CSV file that can be used to make plots and reports.
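The writing step itself is not shown in this excerpt. A minimal sketch, assuming base R's `write.csv()` and the `lineage_classifications.csv` file name used in the read step above, might look like this; the toy table is made up, and the script may use a different function or path.

```r
# Hypothetical final table; the real script would write `lineage_data_final`
lineage_data_final <- data.frame(
  lineage_extracted = c("BA.2", "XBB.1.5"),
  cdc_class         = c("VOC", "VOC"),
  stringsAsFactors  = FALSE
)

# Write to a temporary directory here so the sketch cannot overwrite a real file
output_path <- file.path(tempdir(), "lineage_classifications.csv")
write.csv(lineage_data_final, output_path, row.names = FALSE)

# Read the file back to confirm the round trip
round_trip <- read.csv(output_path, stringsAsFactors = FALSE)
round_trip
```

Setting `row.names = FALSE` keeps an extra index column out of the file, so the next run can read it back with the same column layout.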