Helpers for Loading Datasets
Module for loading the lydata datasets.
- class lydata.loader.DatasetSpec(year: int | str, institution: str, subsite: str, path: Path | None = None, description: str = '', repo: str = 'rmnldwg/lydata', revision: str = 'main')
Specification of a dataset.
- property name: str
Get the name of the dataset.
>>> spec = DatasetSpec(2023, "clb", "multisite", Path("path"), "description")
>>> spec.name
'2023-clb-multisite'
- lydata.loader.remove_subheadings(elements: list, min_level: int = 1) -> list
Remove anything under min_level headings.
- lydata.loader.get_description(readme: TextIOWrapper | str, short: bool = False, max_line_length: int = 60) -> str
Get a markdown description from a file.
Truncate the description before the first second-level heading if short is set to True.
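The truncation described above can be sketched roughly as follows. This is a minimal illustration on a plain markdown string, not the library's implementation: the helper name truncate_before_second_level is hypothetical, and the sketch assumes second-level headings are written in the "## " form and ignores the max_line_length parameter.

```python
def truncate_before_second_level(markdown: str) -> str:
    """Keep only the text that precedes the first '## ' heading (hypothetical helper)."""
    kept = []
    for line in markdown.splitlines():
        # '## ' matches second-level headings only; '### ' has '#' as its third character.
        if line.startswith("## "):
            break
        kept.append(line)
    return "\n".join(kept).strip()


readme = "# Title\n\nShort summary.\n\n## Details\n\nLong text."
short = truncate_before_second_level(readme)
print(short)
```

Only the title and the summary paragraph remain in this example; everything from "## Details" onward is dropped.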
- lydata.loader.available_datasets(year: int | str = '*', institution: str = '*', subsite: str = '*', where: Literal['disk', 'github'] = 'disk') -> Generator[DatasetSpec, None, None]
Generate specifications (DatasetSpec objects) of available datasets.
>>> avail_gen = available_datasets(where='disk')
>>> sorted([ds.name for ds in avail_gen])
['2021-clb-oropharynx', '2021-usz-oropharynx', '2023-clb-multisite', '2023-isb-multisite']
>>> avail_gen = available_datasets(where='github')
>>> sorted([ds.name for ds in avail_gen])
['2021-clb-oropharynx', '2021-usz-oropharynx', '2023-clb-multisite', '2023-isb-multisite']
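The '*' defaults suggest glob-style matching on year, institution, and subsite. How the library matches internally is not stated here, so the following is only a sketch under that assumption: the helper matching_names is hypothetical and filters a hard-coded list of the dataset names shown above instead of looking at the disk or GitHub.

```python
from fnmatch import fnmatchcase


def matching_names(names, year="*", institution="*", subsite="*"):
    """Yield the names that match a '<year>-<institution>-<subsite>' glob (hypothetical helper)."""
    pattern = f"{year}-{institution}-{subsite}"
    for name in names:
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatchcase(name, pattern):
            yield name


names = [
    "2021-clb-oropharynx",
    "2021-usz-oropharynx",
    "2023-clb-multisite",
    "2023-isb-multisite",
]
clb_only = sorted(matching_names(names, institution="clb"))
from_2021 = sorted(matching_names(names, year=2021))
print(clb_only)
print(from_2021)
```

Leaving every argument at its default yields all four names, which mirrors the unfiltered calls in the doctest above.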
- lydata.loader.load_datasets(year: int | str = '*', institution: str = '*', subsite: str = '*', **load_kwargs) -> Generator[DataFrame, None, None]
Load matching datasets from the disk.
- lydata.loader.load_dataset(year: int | str = '*', institution: str = '*', subsite: str = '*', **load_kwargs) -> DataFrame
Load the first matching dataset from the disk.
Note that datasets loaded (or fetched) with this function will have the dataset specification stored in the attrs attribute. See below for an example of how to access the dataset specification.

>>> ds = load_dataset(year=2021, institution='clb', subsite='oropharynx')
>>> ds.attrs["year"]
'2021'
>>> spec_from_ds = DatasetSpec(**ds.attrs)
>>> spec_from_ds.name
'2021-clb-oropharynx'
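The round trip from stored attributes back to a specification can be illustrated without loading any data. The sketch below is an assumption-laden stand-in: SpecSketch is a hypothetical, reduced version of DatasetSpec with only three fields, a plain dict plays the role of the DataFrame's attrs, and the coercion of year to a string is inferred from the '2021' output in the doctest above.

```python
from dataclasses import asdict, dataclass
from typing import Union


@dataclass
class SpecSketch:
    """Hypothetical, reduced stand-in for DatasetSpec."""

    year: Union[int, str]
    institution: str
    subsite: str

    def __post_init__(self) -> None:
        # Assumption: the year is kept as a string, as the doctest output suggests.
        self.year = str(self.year)

    @property
    def name(self) -> str:
        return f"{self.year}-{self.institution}-{self.subsite}"


# A plain dict standing in for the attrs of a loaded DataFrame.
attrs = asdict(SpecSketch(2021, "clb", "oropharynx"))
rebuilt = SpecSketch(**attrs)
print(attrs["year"], rebuilt.name)
```

Because the stored dict holds exactly the constructor's keyword arguments, unpacking it with ** is enough to rebuild an equivalent specification.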
- lydata.loader.fetch_datasets(year: int | str = '*', institution: str = '*', subsite: str = '*', **load_kwargs) -> Generator[DataFrame, None, None]
Fetch matching datasets from the web.