helpers

wadoh_raccoon.utils.helpers

Functions

Name Description
clean_name Clean name field by stripping non-alpha characters and converting to uppercase.
date_format Format Dates
get_secrets Get secrets
gt_style Style for GT Tables
lazy_height Output the height of a polars frame regardless of it being lazy or eager
mft_upload Upload files to Washington State MFT server
save_raw_values save raw values

clean_name

wadoh_raccoon.utils.helpers.clean_name(col: str)

Clean name field by stripping non-alpha characters and converting to uppercase.

Parameters

col : str

Name of column to clean

Returns

: pl.Expr:

a column of uppercase, non-alpha character, non-whitespace strings

Examples

import polars as pl
from wadoh_raccoon.utils import helpers

df = pl.DataFrame({
    "name": [
        "A$AP rocky",
        "50 cent",
        "sTevIe WoNdEr"
    ]
})

output = df.with_columns(helpers.clean_name("name").alias("clean_name"))

helpers.gt_style(df_inp=output)
index name clean_name
0 A$AP rocky AAPROCKY
1 50 cent CENT
2 sTevIe WoNdEr STEVIEWONDER

date_format

wadoh_raccoon.utils.helpers.date_format(
    df: pl.DataFrame | pl.LazyFrame,
    col: str,
)

Format Dates

Convert string dates into a yyyy-mm-dd format. The function uses pl.coalesce to try to process different formats. For example, it will first try to convert m/d/y, and then if that doesn’t work it will try d/m/y. It’s not perfect, but if someone messes up the date it’s their fault.

Note: it won’t attempt to convert excel dates. If someone sends us excel dates we will file a lawsuit.

Usage

To be applied to a string date column.

Parameters

df : pl.DataFrame | pl.LazyFrame

a polars dataframe (needed to check if col is pl.Date type or not)

col : str

a string column that has a date

Returns

: pl.Expr:

a date column

Examples

import polars as pl
from wadoh_raccoon.utils import helpers

df = pl.DataFrame({
    "dates": [
        "2024-10-30",     # ISO format
        "30/10/2024",     # European format
        "10/20/2024",     # US format
        "10-30-2024",     # US format
        "October 30, 2024",  # Full month name format,
        "45496",           # an excel date LOL
        "2022-12-27 08:26:49"
    ]
})

output = (
    df
    .with_columns(
        new_date=helpers.date_format(df=df,col='dates')
    )
)

helpers.gt_style(df_inp=output)
index dates new_date
0 2024-10-30 2024-10-30
1 30/10/2024 2024-10-30
2 10/20/2024 2024-10-20
3 10-30-2024 2024-10-30
4 October 30, 2024 2024-10-30
5 45496 None
6 2022-12-27 08:26:49 2022-12-27

get_secrets

wadoh_raccoon.utils.helpers.get_secrets(vault, keys)

Get secrets

Retrieve secrets from Azure KeyVault. This function will utilize the keys that are passed to retrieve the corresponding secrets.

**Note: Authenication takes place via DefaultAzureCredential which attempts multiple authentication methods. One method is checking against Azure CLI if logged in.

Usage

Use this function to securely retrieve secret values from Azure KeyVault using the specified key(s). The function accepts either a single key or multiple keys as a list.

Parameters

vault :

Key vault url.

keys :

A single secret key or list of secret keys.

Returns

: str or tuple of str

If a single key is provided, returns the secret value as a string. If a list of keys is provided, returns a tuple of secret values in the same order.

Examples

from wadoh_raccoon.utils import helpers

# Get a single secret
db_password = helpers.get_secrets("keyvault_url", "db-password")

# Get multiple secrets at once
username, password, api_key = helpers.get_secrets(
    "keyvault_url",
    ["db-username", "db-password", "api-key"]
)

gt_style

wadoh_raccoon.utils.helpers.gt_style(
    df_inp: pl.DataFrame,
    title: str = '',
    subtitle: str = '',
    add_striping_inp,
    index_inp,
)

Style for GT Tables

Usage

Apply this style to a Polars DataFrame

Parameters

df_inp : pl.DataFrame

a polars dataframe

title : str = ''

a title for the table (optional)

subtitle : str = ''

a subtitle for the table (optional, must have a title if using a subtitle)

add_striping_inp : = True

striping in the table True or False

index_inp : = True

add a column for the row number and label it index

Returns

: GT

a GT object (great_tables table)

Examples

import polars as pl
from wadoh_raccoon.utils import helpers
df = pl.DataFrame({
    "x": [1,1,2],
    "y": [1,2,3]
})

A table with a title/subtitle:

helpers.gt_style(df_inp=df,title="My Title",subtitle="My Subtitle")
My Title
My Subtitle
index x y
0 1 1
1 1 2
2 2 3

No title/subtitle

helpers.gt_style(df_inp=df)
index x y
0 1 1
1 1 2
2 2 3

Without an index:

helpers.gt_style(df_inp=df,index_inp=False)
x y
1 1
1 2
2 3

Without striping:

helpers.gt_style(df_inp=df,add_striping_inp=False)
index x y
0 1 1
1 1 2
2 2 3

lazy_height

wadoh_raccoon.utils.helpers.lazy_height(lf: pl.DataFrame | pl.LazyFrame)

Output the height of a polars frame regardless of it being lazy or eager

mft_upload

wadoh_raccoon.utils.helpers.mft_upload(
    upload: pl.DataFrame,
    dir: str,
    upload_file_name: str,
    upload_file_extension: str,
    username: str,
    password: str,
    host: str = 'mft.wa.gov',
)

Upload files to Washington State MFT server

Upload Polars DataFrames to the Washington State Managed File Transfer (MFT) server via SFTP. This function converts DataFrames to various file formats and securely transfers them to specified directories on the MFT server.

**Note: Authentication requires explicit credentials to be provided. The function automatically adds the server’s host key for simplified connection handling.

Usage

Use this function to upload processed surveillance data, reports, or other DataFrames to the MFT server for sharing with partners. The function handles file format conversion.

Parameters

upload : polars.DataFrame

The Polars DataFrame to upload.

dir : str

Target directory path on the MFT server (e.g., ‘/outbound/partner’).

upload_file_name : str

Name of the file without extension (e.g., ‘surveillance_report’).

upload_file_extension : str

File extension including the dot. Supported formats: ‘.csv’, ‘.xlsx’, ‘.json’, ‘.parquet’.

username : str

MFT server username.

password : str

MFT server password.

host : str = 'mft.wa.gov'

MFT server hostname. Default is ‘mft.wa.gov’.

Returns

: None

Files are uploaded directly to the MFT server. Success message printed.

Raises

: TypeError

If upload is not a Polars DataFrame.

: ValueError

If upload is empty, required parameters are missing, or upload_file_extension is not supported.

: OSError

If the target directory does not exist or cannot be accessed on the MFT server.

: ConnectionError

If SFTP connection fails.

Examples

import polars as pl
from wadoh_raccoon.utils.helpers import mft_upload, get_secrets

# Create sample DataFrame
df = pl.DataFrame({
    'case_id': [1, 2, 3],
    'pathogen': ['Salmonella', 'E. coli', 'Campylobacter']
})

# Get credentials from Key Vault
mft_user, mft_pass = get_secrets(
    'vault_url', 
    ['mft-username', 'mft-password']
)

# Upload as CSV
mft_upload(
    upload=df,
    dir='DEV_TESTING',
    upload_file_name='weekly_report_2024_01',
    upload_file_extension='.csv',
    username=mft_user,
    password=mft_pass
)

# Upload as Excel
mft_upload(
    upload=df,
    dir='DEV_TESTING',
    upload_file_name='weekly_report_2024_01',
    upload_file_extension='.xlsx',
    username=mft_user,
    password=mft_pass
)

save_raw_values

wadoh_raccoon.utils.helpers.save_raw_values(
    df_inp: pl.DataFrame,
    primary_key_col: str,
)

save raw values

Usage

Converts a polars dataframe into a dataframe with all columns in a struct column. It’s good for saving raw outputs of data.

Parameters

df_inp : pl.DataFrame

a polars dataframe

primary_key_col : str

column name for the primary key (submission key, not person/case key)

Returns

df : pl.DataFrame

a dataframe

Examples

import polars as pl
from wadoh_raccoon.utils import helpers

data = pl.DataFrame({
    "lab_name": ["PHL", "MFT", "ELR","PHL"],
    "first_name": ["Alice", "Bob", "Charlie", "Charlie"],
    "last_name": ["Smith", "Johnson", "Williams", "Williams"],
    "WA_ID": [1,2,4,4]
})

received_submissions_df = (
        helpers.save_raw_values(df_inp=data,primary_key_col="WA_ID")
)

helpers.gt_style(data)
index lab_name first_name last_name WA_ID
0 PHL Alice Smith 1
1 MFT Bob Johnson 2
2 ELR Charlie Williams 4
3 PHL Charlie Williams 4
helpers.gt_style(received_submissions_df)
index submission_number internal_create_date raw_inbound_submission
0 1 2026-02-04 {"PHL","Alice","Smith",1}
1 2 2026-02-04 {"MFT","Bob","Johnson",2}
2 4 2026-02-04 {"ELR","Charlie","Williams",4}
3 4 2026-02-04 {"PHL","Charlie","Williams",4}