| first | last | dob |
|---|---|---|
| Jon | Doe | 1990-01-01 |
| Smith | Jaane | 1985-05-15 |
| Alex | Johnson | 2000-09-10 |

NORTHWEST PATHOGEN GENOMICS CENTER OF EXCELLENCE
wadoh_raccoon Python package
A Python package for transforming and linking pathogen sequencing/subtyping metadata.
Example Functions
- accession matching
- fuzzy matching
- data cleaning utilities

import wadoh_raccoon as tp

# input your dataframes and matching columns:
tp.fuzzZ(
    source=df1,
    reference=df2,
    first_name_src="first",
    last_name_src="last",
    dob_src="dob",
    first_name_ref="first",
    last_name_ref="last",
    dob_ref="dob"
)

| id | first | last | dob |
|---|---|---|---|
| 1 | john | Dooe | 1990-01-01 |
| 2 | jane | smith | 1985-05-15 |
| 3 | Alice | Johnson | 2020-09-10 |

| id | first | last | dob |
|---|---|---|---|
| 1 | JOHN | DOE | 1990-01-01 |
| 2 | JANE | SMITH | 1985-05-15 |
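
The tables above show the idea behind fuzzy matching: records are linked even when names differ in case, spelling, or first/last order, as long as the rest of the record agrees. The following is a minimal standard-library sketch of that idea, not the package's implementation. The helper names (`normalize_name`, `fuzzy_link`), the exact date-of-birth requirement, the sorting of name parts, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize_name(first, last):
    """Uppercase and trim both name parts, then sort them so that a
    swapped first/last name still compares as the same string."""
    return " ".join(sorted(part.strip().upper() for part in (first, last)))


def fuzzy_link(source, reference, threshold=0.8):
    """Hypothetical helper: for each source row, find the best-scoring
    reference row with the same dob and keep it if it meets the threshold."""
    links = []
    for src in source:
        src_name = normalize_name(src["first"], src["last"])
        best_id, best_score = None, 0.0
        for ref in reference:
            if ref["dob"] != src["dob"]:  # assumption: dob must match exactly
                continue
            ref_name = normalize_name(ref["first"], ref["last"])
            score = SequenceMatcher(None, src_name, ref_name).ratio()
            if score > best_score:
                best_id, best_score = ref["id"], score
        if best_id is not None and best_score >= threshold:
            links.append((src, best_id, best_score))
    return links


# Same rows as the example tables above.
source = [
    {"first": "Jon", "last": "Doe", "dob": "1990-01-01"},
    {"first": "Smith", "last": "Jaane", "dob": "1985-05-15"},
    {"first": "Alex", "last": "Johnson", "dob": "2000-09-10"},
]
reference = [
    {"id": 1, "first": "john", "last": "Dooe", "dob": "1990-01-01"},
    {"id": 2, "first": "jane", "last": "smith", "dob": "1985-05-15"},
    {"id": 3, "first": "Alice", "last": "Johnson", "dob": "2020-09-10"},
]

links = fuzzy_link(source, reference)
matched_ids = [ref_id for _, ref_id, _ in links]
print(matched_ids)  # [1, 2]
```

Blocking on an exact field such as date of birth before scoring names keeps the number of comparisons small, at the cost of missing records whose date was entered incorrectly.
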

import wadoh_raccoon as tp

# input your dataframes and matching columns:
tp.match_accession(
    source=df1,
    reference=df2,
    accession='NCBI_ACCESSION'
)

| accession |
|---|
| 12345 |
| CDC-010023 |
| L00029 |

| id | accession |
|---|---|
| 1 | 12345 |
| 2 | CDC-010023 |
| 3 | X0293 |

| id | accession |
|---|---|
| 1 | 12345 |
| 2 | CDC-010023 |
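
Conceptually, the accession example above behaves like an inner join on the accession column: only accessions present in both tables survive. The sketch below shows that behavior in plain Python. It is an illustration under assumptions, not the package's code: `match_accessions_sketch` is a hypothetical name, and the trimming and uppercasing of keys is an assumed normalization step.

```python
def match_accessions_sketch(source, reference, key="accession"):
    """Hypothetical helper: keep source rows whose accession also appears
    in the reference rows, and attach the reference id."""
    # Index reference rows by a normalized accession for constant-time lookup.
    index = {str(row[key]).strip().upper(): row for row in reference}
    matched = []
    for row in source:
        hit = index.get(str(row[key]).strip().upper())
        if hit is not None:
            matched.append({"id": hit["id"], key: row[key]})
    return matched


# Same rows as the example tables above.
source = [
    {"accession": "12345"},
    {"accession": "CDC-010023"},
    {"accession": "L00029"},
]
reference = [
    {"id": 1, "accession": "12345"},
    {"id": 2, "accession": "CDC-010023"},
    {"id": 3, "accession": "X0293"},
]

matched = match_accessions_sketch(source, reference)
print([row["id"] for row in matched])  # [1, 2]
```
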

import polars as pl
from wadoh_raccoon.utils import helpers

df = pl.DataFrame({
    "old_date": [
        "2024-10-30",        # ISO format
        "10-30-2024",        # US format
        "October 30, 2024",  # full month name format
        "45496"              # an Excel serial date
    ]
})

# apply the function
df.with_columns(new_date=helpers.date_format('old_date'))

| index | old_date | new_date |
|---|---|---|
| 0 | 2024-10-30 | 2024-10-30 |
| 1 | 10-30-2024 | 2024-10-30 |
| 2 | October 30, 2024 | 2024-10-30 |
| 3 | 45496 | None |
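
The result table above suggests a "try several layouts, otherwise return nothing" strategy. A minimal standard-library sketch of that strategy follows. It is not how `helpers.date_format` is implemented; the function name `to_iso_date` and the list of accepted formats are assumptions chosen to reproduce the table.

```python
from datetime import datetime

# Assumed list of accepted layouts; the package may support others.
FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%B %d, %Y")


def to_iso_date(text):
    """Hypothetical helper: return the date as YYYY-MM-DD using the first
    layout that parses, or None when no layout matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


old_dates = ["2024-10-30", "10-30-2024", "October 30, 2024", "45496"]
new_dates = [to_iso_date(value) for value in old_dates]
print(new_dates)  # ['2024-10-30', '2024-10-30', '2024-10-30', None]
```

As in the table, the Excel serial value matches none of the layouts and falls through to None rather than being guessed at.
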
Try It Yourself
Installation
uv pip install git+https://github.com/NW-PaGe/wadoh_raccoon.git#egg=wadoh_raccoon

To install a specific version, find the git tag listed in the GitHub Releases section (for example, v0.2.5) and add it to the install command after .git, as in .git@v0.2.5:

uv pip install git+https://github.com/NW-PaGe/wadoh_raccoon.git@v0.2.5#egg=wadoh_raccoon