NORTHWEST PATHOGEN GENOMICS CENTER OF EXCELLENCE

wadoh_raccoon Python package

A Python package for transforming and linking pathogen sequencing/subtyping metadata.


Example Functions

  • accession matching
  • fuzzy matching
  • data cleaning utilities

Fuzzy Matching

import wadoh_raccoon as tp

# Pass your DataFrames and the columns to match on:
tp.fuzzZ(
    source=df1,
    reference=df2,
    first_name_src="first",
    last_name_src="last",
    dob_src="dob",
    first_name_ref="first",
    last_name_ref="last",
    dob_ref="dob"
)

Source:

first  last     dob
Jon    Doe      1990-01-01
Smith  Jaane    1985-05-15
Alex   Johnson  2000-09-10

Reference:

id  first  last     dob
1   john   Dooe     1990-01-01
2   jane   smith    1985-05-15
3   Alice  Johnson  2020-09-10

Result:

id  first  last   dob
1   JOHN   DOE    1990-01-01
2   JANE   SMITH  1985-05-15
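
The matching idea behind this example can be sketched with the standard library alone. This is a minimal illustration, not wadoh_raccoon's implementation: the helper names name_key and fuzzy_match, the 0.8 similarity threshold, and the rule that dates of birth must agree exactly are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

# Rows copied from the source and reference tables in this example.
source = [
    {"first": "Jon", "last": "Doe", "dob": "1990-01-01"},
    {"first": "Smith", "last": "Jaane", "dob": "1985-05-15"},
    {"first": "Alex", "last": "Johnson", "dob": "2000-09-10"},
]
reference = [
    {"id": 1, "first": "john", "last": "Dooe", "dob": "1990-01-01"},
    {"id": 2, "first": "jane", "last": "smith", "dob": "1985-05-15"},
    {"id": 3, "first": "Alice", "last": "Johnson", "dob": "2020-09-10"},
]


def name_key(first, last):
    """Uppercase both name parts and sort them so swapped first/last names compare equal."""
    return " ".join(sorted([first.strip().upper(), last.strip().upper()]))


def fuzzy_match(source_rows, reference_rows, threshold=0.8):
    """Return ids of reference rows whose name resembles a source row with the same dob.

    Hypothetical helper: the threshold and the exact-dob rule are assumptions.
    """
    matched_ids = []
    for src in source_rows:
        for ref in reference_rows:
            if src["dob"] != ref["dob"]:
                continue  # assumption: dates of birth must agree exactly
            score = SequenceMatcher(
                None,
                name_key(src["first"], src["last"]),
                name_key(ref["first"], ref["last"]),
            ).ratio()
            if score >= threshold:
                matched_ids.append(ref["id"])
    return matched_ids


matched = fuzzy_match(source, reference)
print(matched)
```

Under these assumptions, reference rows 1 and 2 are selected, and the two Johnson rows are never compared because their dates of birth differ.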

Accession Matching

import wadoh_raccoon as tp

# Pass your DataFrames and the accession column to match on:
tp.match_accession(
    source=df1,
    reference=df2,
    accession='NCBI_ACCESSION'
)

Source:

accession
12345
CDC-010023
L00029

Reference:

id  accession
1   12345
2   CDC-010023
3   X0293

Result:

id  accession
1   12345
2   CDC-010023
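
The result in this example is consistent with an exact match on the accession value. The sketch below shows that idea in plain Python. The helper match_by_accession is hypothetical and is not the package's match_accession function, and trimming surrounding whitespace is an assumption made for the sketch.

```python
# Values copied from the source and reference tables in this example.
source_accessions = ["12345", "CDC-010023", "L00029"]
reference = [
    {"id": 1, "accession": "12345"},
    {"id": 2, "accession": "CDC-010023"},
    {"id": 3, "accession": "X0293"},
]


def match_by_accession(accessions, reference_rows):
    """Keep reference rows whose accession appears in the source list (exact match)."""
    wanted = {value.strip() for value in accessions}
    return [row for row in reference_rows if row["accession"].strip() in wanted]


matched = match_by_accession(source_accessions, reference)
print([row["id"] for row in matched])
```

Only the reference rows with ids 1 and 2 survive, since L00029 and X0293 have no counterpart on the other side.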

Data Cleaning Utilities

import polars as pl
from wadoh_raccoon.utils import helpers

df = pl.DataFrame({
    "old_date": [
        "2024-10-30",        # ISO format
        "10-30-2024",        # US format
        "October 30, 2024",  # full month name format
        "45496",             # Excel serial date
    ]
})

# Apply the function to the column
df.with_columns(new_date=helpers.date_format('old_date'))

index  old_date          new_date
0      2024-10-30        2024-10-30
1      10-30-2024        2024-10-30
2      October 30, 2024  2024-10-30
3      45496             None
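
One common way to get this behavior is to try a list of candidate formats in order and return nothing when none fits. The sketch below does that with the standard library. It is not how helpers.date_format is implemented; the function name parse_date and the format list are assumptions chosen to reproduce the table in this example.

```python
from datetime import datetime

# Formats tried in order; this list is an assumption for the sketch.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%B %d, %Y")


def parse_date(text):
    """Return an ISO date string, or None when no candidate format fits."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


old_dates = ["2024-10-30", "10-30-2024", "October 30, 2024", "45496"]
new_dates = [parse_date(value) for value in old_dates]
print(new_dates)
```

The first three strings all come back as 2024-10-30. The Excel serial value matches none of the formats, so it comes back as None, as in the table.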

Try It Yourself

Installation

uv pip install git+https://github.com/NW-PaGe/wadoh_raccoon.git#egg=wadoh_raccoon

To install a specific version, find its git tag in the GitHub Releases section (for example, v0.2.5) and append it to the repository URL in the install command as .git@v0.2.5:

uv pip install git+https://github.com/NW-PaGe/wadoh_raccoon.git@v0.2.5#egg=wadoh_raccoon
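
After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses the standard library's importlib.metadata. The helper name installed_version is hypothetical, and the distribution name "wadoh_raccoon" is assumed from the #egg= fragment in the install command.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is named "wadoh_raccoon", as in the #egg= fragment.
print(installed_version("wadoh_raccoon"))
```

If the package is installed, this prints its version string; otherwise it prints None. Whether that string corresponds to the git tag (for example, 0.2.5 for v0.2.5) depends on how the package sets its version.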

API Reference

reference