Enhancing and Augmenting Datasets

Provides functions for augmenting and enhancing the lyDATA tables.

This module does the heavy lifting of inferring the most likely true involvement from several, possibly conflicting, diagnoses and their sensitivities and specificities. It also resolves sub- and superlevel involvement information: if a sublevel is involved, the superlevel is involved as well, and vice versa.

All this is achieved in the combine_and_augment_levels() function, which is also used by the combine(), augment(), and enhance() methods of the LyDataAccessor class.

lydata.augmentor.combine_and_augment_levels(diagnoses: Sequence[DataFrame], specificities: Sequence[float], sensitivities: Sequence[float], method: Literal['max_llh', 'rank'] = 'max_llh', sides: Sequence[Literal['ipsi', 'contra']] | None = None, subdivisions: Mapping[str, Sequence[str]] | None = None) -> DataFrame

Combine diagnoses and add sub-/superlevel involvement info.

Different diagnostic modalities may conflict with each other, e.g. on MRI an LNL may look metastatic, while FNA finds no malignancy. This function combines the available diagnoses into a consensus, based on their sensitivities and specificities. With method="max_llh", the most likely diagnosis is chosen. With method="rank", the single most trustworthy diagnosis is kept.
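To illustrate the idea behind the two methods, here is a minimal sketch for a single LNL. It is not the library's implementation: the helper names max_llh_consensus and rank_consensus are hypothetical, the sensitivity and specificity values are made up for illustration, and the ranking rule (trust a positive finding by its specificity and a negative finding by its sensitivity) is an assumption.

```python
from typing import Optional, Sequence


def max_llh_consensus(
    observations: Sequence[Optional[bool]],
    sensitivities: Sequence[float],
    specificities: Sequence[float],
) -> Optional[bool]:
    """Return the involvement state under which the observations are most likely.

    Hypothetical helper, assuming independent modalities and no prior.
    """
    llh_involved = 1.0
    llh_healthy = 1.0
    seen_any = False
    for obs, sens, spec in zip(observations, sensitivities, specificities):
        if obs is None:  # this modality made no statement about the LNL
            continue
        seen_any = True
        # P(obs | involved): sensitivity for a positive, 1 - sensitivity otherwise
        llh_involved *= sens if obs else 1.0 - sens
        # P(obs | healthy): 1 - specificity for a positive, specificity otherwise
        llh_healthy *= (1.0 - spec) if obs else spec
    if not seen_any:
        return None
    return llh_involved > llh_healthy


def rank_consensus(
    observations: Sequence[Optional[bool]],
    sensitivities: Sequence[float],
    specificities: Sequence[float],
) -> Optional[bool]:
    """Keep the single most trustworthy observation.

    Assumption: a positive is trusted by specificity, a negative by sensitivity.
    """
    best_obs, best_trust = None, -1.0
    for obs, sens, spec in zip(observations, sensitivities, specificities):
        if obs is None:
            continue
        trust = spec if obs else sens
        if trust > best_trust:
            best_obs, best_trust = obs, trust
    return best_obs


# Illustrative values only: MRI reports involvement, FNA reports none.
mri_and_fna = [True, False]
sens = [0.7, 0.9]
spec = [0.8, 0.98]
print(max_llh_consensus(mri_and_fna, sens, spec))
print(rank_consensus(mri_and_fna, sens, spec))
```

In this made-up example both approaches side with the negative FNA finding, because the observations are more likely for a healthy LNL than for an involved one.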

Additionally, the function may add and resolve sub- and superlevel involvement information. For example, some datasets report the overall involvement in LNL II, while others differentiate between the sublevels IIa and IIb. If IIa harbors disease, then LNL II as a whole is involved, too. When subdivisions is specified, the function updates these super- and sublevel involvement patterns consistently.
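The sub-/superlevel logic can be sketched for a single patient and side as follows. This is a simplified illustration, not the library's code: resolve_levels is a hypothetical helper, None stands for "unknown", and letting sublevel information take precedence over a conflicting superlevel report is an assumption.

```python
from typing import Dict, Mapping, Optional, Sequence


def resolve_levels(
    involvement: Mapping[str, Optional[bool]],
    subdivisions: Mapping[str, Sequence[str]],
) -> Dict[str, Optional[bool]]:
    """Make super- and sublevel involvement consistent (hypothetical helper)."""
    result: Dict[str, Optional[bool]] = dict(involvement)
    for superlvl, sublvls in subdivisions.items():
        sub_values = [result.get(sub) for sub in sublvls]
        if any(value is True for value in sub_values):
            # One involved sublevel implies an involved superlevel.
            result[superlvl] = True
        elif sub_values and all(value is False for value in sub_values):
            # All sublevels healthy implies a healthy superlevel.
            result[superlvl] = False
        elif result.get(superlvl) is False:
            # A healthy superlevel implies that every sublevel is healthy.
            for sub in sublvls:
                result[sub] = False
        else:
            # Nothing can be inferred; keep the superlevel as reported.
            result[superlvl] = result.get(superlvl)
    return result


subdivisions = {"II": ["IIa", "IIb"]}
print(resolve_levels({"IIa": True, "IIb": False}, subdivisions))
print(resolve_levels({"II": False}, subdivisions))
```

Note that an involved superlevel alone does not determine which of its sublevels is involved, so in that case the sketch leaves the sublevels unknown.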

The returned DataFrame has a two-level multi-index: the first level holds the sides and the second level holds the involvement levels. This means it is in the same format as the stack of input diagnoses.

See the accessor methods :py:meth:`~lydata.accessor.LyDataAccessor.augment` and :py:meth:`~lydata.accessor.LyDataAccessor.combine` for some examples.