NORTHWEST PATHOGEN GENOMICS CENTER OF EXCELLENCE

wadoh_raccoon Python package

A Python package for transforming and linking pathogen sequencing/subtyping metadata.

Example Functions

- accession matching
- fuzzy matching
- data cleaning utilities

Fuzzy matching

```python
import wadoh_raccoon as tp

# input your dataframes and matching columns:
tp.fuzzZ(
    source=df1,
    reference=df2,
    first_name_src="first",
    last_name_src="last",
    dob_src="dob",
    first_name_ref="first",
    last_name_ref="last",
    dob_ref="dob"
)
```

first | last | dob |
---|---|---|
Jon | Doe | 1990-01-01 |
Smith | Jaane | 1985-05-15 |
Alex | Johnson | 2000-09-10 |

id | first | last | dob |
---|---|---|---|
1 | john | Dooe | 1990-01-01 |
2 | jane | smith | 1985-05-15 |
3 | Alice | Johnson | 2020-09-10 |

id | first | last | dob |
---|---|---|---|
1 | JOHN | DOE | 1990-01-01 |
2 | JANE | SMITH | 1985-05-15 |
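
This README does not describe how `fuzzZ` scores candidate pairs. As an illustration of the general technique only (not wadoh_raccoon's implementation), here is a minimal standard-library sketch using the example tables above: it only compares rows with an identical date of birth, scores full names with `difflib.SequenceMatcher`, and also tries the source name with first/last swapped. The function names, the swap check, and the 0.8 threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and trim a name so case and stray spaces don't affect scoring."""
    return name.strip().lower()


def name_score(src: dict, ref: dict) -> float:
    """Best similarity between the two full names, also trying the source
    with first/last swapped (catches rows like 'Smith | Jaane')."""
    ref_full = f"{normalize(ref['first'])} {normalize(ref['last'])}"
    as_given = f"{normalize(src['first'])} {normalize(src['last'])}"
    swapped = f"{normalize(src['last'])} {normalize(src['first'])}"
    return max(
        SequenceMatcher(None, as_given, ref_full).ratio(),
        SequenceMatcher(None, swapped, ref_full).ratio(),
    )


def fuzzy_match(source: list[dict], reference: list[dict], threshold: float = 0.8) -> dict[int, int]:
    """Map each source row index to the id of its best-scoring reference row.

    Candidates are limited to reference rows with the exact same dob; a pair
    is kept only if its name similarity reaches `threshold` (hypothetical value).
    """
    matches = {}
    for i, src in enumerate(source):
        candidates = [ref for ref in reference if ref["dob"] == src["dob"]]
        scored = [(name_score(src, ref), ref["id"]) for ref in candidates]
        if scored:
            best_score, best_id = max(scored)
            if best_score >= threshold:
                matches[i] = best_id
    return matches


source = [
    {"first": "Jon", "last": "Doe", "dob": "1990-01-01"},
    {"first": "Smith", "last": "Jaane", "dob": "1985-05-15"},
    {"first": "Alex", "last": "Johnson", "dob": "2000-09-10"},
]
reference = [
    {"id": 1, "first": "john", "last": "Dooe", "dob": "1990-01-01"},
    {"id": 2, "first": "jane", "last": "smith", "dob": "1985-05-15"},
    {"id": 3, "first": "Alice", "last": "Johnson", "dob": "2020-09-10"},
]

print(fuzzy_match(source, reference))  # {0: 1, 1: 2}
```

Requiring an exact dob match keeps the number of name comparisons small and is why Alex Johnson (2000) is never paired with Alice Johnson (2020); a real linkage tool may well tolerate dob typos too.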

Accession matching

```python
import wadoh_raccoon as tp

# input your dataframes and matching columns:
tp.match_accession(
    source=df1,
    reference=df2,
    accession='NCBI_ACCESSION'
)
```

accession |
---|
12345 |
CDC-010023 |
L00029 |

id | accession |
---|---|
1 | 12345 |
2 | CDC-010023 |
3 | X0293 |

id | accession |
---|---|
1 | 12345 |
2 | CDC-010023 |
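
Conceptually, the result above is an inner join on the accession value. The minimal sketch below reproduces it with plain Python on the example data; it is not `match_accession`'s implementation, and the whitespace/uppercase cleanup in the hypothetical `normalize_accession` helper is an assumption added for illustration.

```python
def normalize_accession(accession: str) -> str:
    """Assumed cleanup step: trim whitespace and uppercase before comparing."""
    return accession.strip().upper()


def match_accessions(source: list[str], reference: list[dict]) -> list[dict]:
    """Return the reference rows whose accession appears in `source` (an inner join)."""
    # Index the reference once; if two rows share an accession, the later one wins.
    lookup = {normalize_accession(row["accession"]): row for row in reference}
    matched = []
    for accession in source:
        row = lookup.get(normalize_accession(accession))
        if row is not None:
            matched.append(row)
    return matched


source = ["12345", "CDC-010023", "L00029"]
reference = [
    {"id": 1, "accession": "12345"},
    {"id": 2, "accession": "CDC-010023"},
    {"id": 3, "accession": "X0293"},
]

print(match_accessions(source, reference))
# [{'id': 1, 'accession': '12345'}, {'id': 2, 'accession': 'CDC-010023'}]
```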

Data cleaning utilities

```python
import polars as pl
from wadoh_raccoon.utils import helpers

df = pl.DataFrame({
    "old_date": [
        "2024-10-30",        # ISO format
        "10-30-2024",        # US format
        "October 30, 2024",  # full month name format
        "45496"              # an Excel serial date LOL
    ]
})

# apply the function
df.with_columns(new_date=helpers.date_format('old_date'))
```
index | old_date | new_date |
---|---|---|
0 | 2024-10-30 | 2024-10-30 |
1 | 10-30-2024 | 2024-10-30 |
2 | October 30, 2024 | 2024-10-30 |
3 | 45496 | None |
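
One common way to normalize mixed date strings like these is to try a list of formats in order and keep the first that parses. The sketch below does that with the standard library and reproduces the table above, including `None` for the Excel serial number. The `FORMATS` list and `parse_date` name are assumptions for illustration, not what `helpers.date_format` actually uses.

```python
from datetime import datetime
from typing import Optional

# Formats tried in order; this list is assumed from the example inputs above.
FORMATS = ["%Y-%m-%d", "%m-%d-%Y", "%B %d, %Y"]


def parse_date(raw: str) -> Optional[str]:
    """Return `raw` as an ISO 'YYYY-MM-DD' string, or None if no format fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


old_dates = ["2024-10-30", "10-30-2024", "October 30, 2024", "45496"]
print([parse_date(d) for d in old_dates])
# ['2024-10-30', '2024-10-30', '2024-10-30', None]
```

Format order matters for ambiguous inputs: with `%m-%d-%Y` listed, `01-02-2024` is read as January 2. To apply a plain function like this to the polars frame above, it could be wrapped as `pl.col("old_date").map_elements(parse_date, return_dtype=pl.String)`, though a native polars expression will usually be faster.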
Try It Yourself
Installation

```bash
uv pip install git+https://github.com/NW-PaGe/wadoh_raccoon.git#egg=wadoh_raccoon
```

To install a specific version, find its git tag in the GitHub Releases section (something like v0.2.5) and append it to the repository URL as `.git@v0.2.5`:

```bash
uv pip install git+https://github.com/NW-PaGe/wadoh_raccoon.git@v0.2.5#egg=wadoh_raccoon
```
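
To confirm which release ended up in your environment, the standard library's `importlib.metadata` can report the installed version. This is a small sketch that assumes the distribution is named `wadoh_raccoon` (taken from the `#egg=` fragment) and that the package's build metadata records a version matching the tag; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "wadoh_raccoon") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None if wadoh_raccoon is not installed in this environment.
    print(installed_version())
```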