Type Casting and Validation
Module to cast dtypes and to validate the lyDATA datasets.
The two main functions here are cast_dtypes() and is_valid(). The
first one can be used to cast the dtypes of the columns in a LyDataFrame
to the expected types according to the schema constructed using
create_full_record_model().
Subsequently, is_valid() can be used to validate every row in the table, again
using the constructed schema.
- lydata.validator.flatten(nested: dict, prev_key: tuple = (), max_depth: int | None = None) -> dict
Flatten the nested dict by creating key tuples for each value at max_depth.
>>> nested = {"tumor": {"1": {"t_stage": 1, "size": 12.3}}}
>>> flatten(nested)
{('tumor', '1', 't_stage'): 1, ('tumor', '1', 'size'): 12.3}
>>> mapping = {"patient": {"#": {"age": {"func": int, "columns": ["age"]}}}}
>>> flatten(mapping, max_depth=3)
{('patient', '#', 'age'): {'func': <class 'int'>, 'columns': ['age']}}
Note that flattening an already flat dictionary may yield unexpected results.
- lydata.validator.unflatten(flat: dict) -> dict
Take a flat dictionary with tuples of keys and create nested dict from it.
>>> flat = {('tumor', '1', 't_stage'): 1, ('tumor', '1', 'size'): 12.3}
>>> unflatten(flat)
{'tumor': {'1': {'t_stage': 1, 'size': 12.3}}}
>>> mapping = {('patient', '#', 'age'): {'func': int, 'columns': ['age']}}
>>> unflatten(mapping)
{'patient': {'#': {'age': {'func': <class 'int'>, 'columns': ['age']}}}}
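The inverse operation can be sketched as walking each key tuple and creating intermediate dictionaries on the way down. The name unflatten_sketch is hypothetical; this is a minimal illustration that reproduces the doctest above, not the library's own code.

```python
def unflatten_sketch(flat: dict) -> dict:
    """Rebuild a nested dict from a dict keyed by tuples (sketch only)."""
    nested: dict = {}
    for keys, value in flat.items():
        current = nested
        # Create (or reuse) one nesting level per key except the last one.
        for key in keys[:-1]:
            current = current.setdefault(key, {})
        current[keys[-1]] = value
    return nested


flat = {("tumor", "1", "t_stage"): 1, ("tumor", "1", "size"): 12.3}
rebuilt = unflatten_sketch(flat)
```

Values that are themselves dictionaries, such as the mapping entry in the doctest, are stored as they are, so the result nests one level deeper than the key tuple.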
- lydata.validator.is_valid(dataset: LyDataFrame, fail_on_error: bool = True) -> bool
Validate the given dataset against the lyDATA schema.
Returns True if all records are valid, otherwise it either raises an error (if fail_on_error is True) or returns False.
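The effect of fail_on_error can be illustrated with a small standalone sketch. It assumes records given as plain dictionaries and a hypothetical check_age() function standing in for the Pydantic schema. Neither is_valid_sketch nor check_age is part of lydata, and the real function may report errors differently.

```python
from collections.abc import Callable, Iterable


def check_age(record: dict) -> None:
    """Hypothetical per-record check standing in for the real schema."""
    age = record.get("age")
    if not isinstance(age, int) or age < 0:
        raise ValueError(f"age must be a non-negative integer, got {age!r}")


def is_valid_sketch(
    records: Iterable[dict],
    check: Callable[[dict], None],
    fail_on_error: bool = True,
) -> bool:
    """Return True if every record passes; otherwise raise or return False."""
    all_valid = True
    for index, record in enumerate(records):
        try:
            check(record)
        except ValueError as error:
            if fail_on_error:
                # Surface the first invalid record immediately.
                raise ValueError(f"record {index} is invalid: {error}") from error
            all_valid = False
    return all_valid


records = [{"age": 61}, {"age": -1}]
lenient_result = is_valid_sketch(records, check_age, fail_on_error=False)
```

With fail_on_error=False the caller gets a plain boolean and can decide how to proceed. With the default, the first invalid record stops the validation with an exception.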
- lydata.validator.cast_dtypes(dataset: LyDataFrame, casters: Mapping[type, str] | None = None, fail_on_error: bool = True) -> LyDataFrame
Cast the dtypes of the dataset to the expected types.
This function uses the annotations of the Pydantic schema to cast the individual columns of the dataset to the expected types. It uses the casters mapping to determine the type to cast to. By default, it uses the mapping from the _get_default_casters() function.
That way, pandas uses e.g. the nullable integer type Int64 if we specify in pydantic that a field can be an integer or None. If you want to use a different mapping, you can pass it as the casters argument.