grain.sources module#
APIs for reading data from various file formats.
List of Members#
- class grain.sources.RandomAccessDataSource(*args, **kwargs)#
Interface for datasets where storage supports efficient random access.
If used with DataLoader, __repr__ must also be implemented to support checkpointing.
If used with multiprocessing, the data source must be picklable.
- __getitem__(index)#
Returns the value for the given index.
This method must be thread-safe and deterministic.
Note that a number of sources take SupportsIndex instead of int for index. Such sources will still support int index and pass the isinstance check with this protocol, but all new source implementations should use int directly.
- Parameters:
index (int) – An integer in [0, len(self)-1].
- Returns:
The corresponding record. File data sources often return the raw bytes but records can be any Python object.
- Return type:
T
- __len__()#
Returns the total number of records in the data source.
- Return type:
int
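The interface above can be illustrated with a minimal sketch. The class name ListDataSource and its constructor are hypothetical and not part of Grain. The sketch assumes only what is stated above: __getitem__ and __len__ are required, and __repr__ is needed for checkpointing with DataLoader. It does not import Grain, since it relies on the interface being satisfied structurally.

```python
from typing import Sequence


class ListDataSource:
    """Hypothetical random-access source backed by an in-memory sequence."""

    def __init__(self, records: Sequence[bytes]):
        # Copy into a tuple so the records cannot change after construction,
        # which keeps __getitem__ deterministic. Only plain attributes are
        # stored, so instances remain picklable.
        self._records = tuple(records)

    def __getitem__(self, index: int) -> bytes:
        # Accept only indices in [0, len(self) - 1], as documented above.
        if not 0 <= index < len(self._records):
            raise IndexError(f"index {index} out of range")
        return self._records[index]

    def __len__(self) -> int:
        return len(self._records)

    def __repr__(self) -> str:
        # A stable description of the source, for use with checkpointing.
        return f"ListDataSource(num_records={len(self._records)})"


source = ListDataSource([b"a", b"b", b"c"])
print(len(source), source[0], repr(source))
```

Reading from an immutable tuple involves no shared mutable state, which is one simple way to meet the thread-safety requirement.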
- class grain.sources.ArrayRecordDataSource(*args, **kwargs)#
Data source for ArrayRecord files.
- Parameters:
paths (array_record.python.array_record_data_source.PathLikeOrFileInstruction | Sequence[array_record.python.array_record_data_source.PathLikeOrFileInstruction])
reader_options (dict[str, str] | None)
- __init__(paths, reader_options=None)#
Creates a new ArrayRecordDataSource object.
See array_record.ArrayRecordDataSource for more details.
- Parameters:
paths (array_record.python.array_record_data_source.PathLikeOrFileInstruction | Sequence[array_record.python.array_record_data_source.PathLikeOrFileInstruction]) – A single path/FileInstruction or list of paths/FileInstructions.
reader_options (dict[str, str] | None) – A dict[str, str] of options passed when creating a reader. For example, {"index_storage_option": "in_memory"} keeps the reader indices in memory, whereas {"index_storage_option": "offloaded"} stores the indices on disk to reduce memory usage.
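As a rough sketch of how the two parameters fit together, the helpers below build a reader_options dict and pass it to the constructor. The helper names make_reader_options and open_array_record are hypothetical. The sketch assumes Grain and array_record are installed and that the class is importable as grain.sources.ArrayRecordDataSource, as the heading above suggests.

```python
from typing import Sequence, Union


def make_reader_options(offload_index: bool = False) -> dict[str, str]:
    """Builds reader_options from the two values named in the description above."""
    value = "offloaded" if offload_index else "in_memory"
    return {"index_storage_option": value}


def open_array_record(paths: Union[str, Sequence[str]], offload_index: bool = False):
    """Hypothetical helper that opens one or more ArrayRecord files."""
    # Imported lazily so make_reader_options works without Grain installed.
    from grain import sources

    return sources.ArrayRecordDataSource(
        paths, reader_options=make_reader_options(offload_index)
    )
```

Calling open_array_record with a single path or a list of paths, and offload_index=True, would select the "offloaded" option for that source.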
- class grain.sources.InMemoryDataSource(*args, **kwargs)#
Simple in-memory data source for sequences that is sharable among multiple processes.
Note
This constrains storable values to the built-in data types int, float, bool, str (less than 10M bytes each), bytes (less than 10M bytes each), and None. It also differs notably from the built-in list type in that these lists cannot change their overall length (i.e. no append, insert, etc.).
- Parameters:
elements (Sequence[Any] | None)
name (str | None)
- __init__(elements=None, *, name=None)#
Creates a new InMemoryDataSource object.
- Parameters:
elements (Sequence[Any] | None) – The elements for the sharable list.
name (str | None) – The name of the datasource.
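The type and size constraints in the note above can be checked before constructing the source. The following is a sketch only: is_storable, make_in_memory_source, and MAX_ITEM_BYTES are hypothetical names, and reading "10M bytes" as 10,000,000 bytes (measured on the UTF-8 encoding for str) is an assumption. It also assumes Grain is installed and exposes grain.sources.InMemoryDataSource.

```python
from typing import Any, Optional, Sequence

# Assumption: "10M bytes" is interpreted here as 10,000,000 bytes.
MAX_ITEM_BYTES = 10_000_000


def is_storable(value: Any) -> bool:
    """Checks one value against the constraints listed in the note above."""
    if value is None or isinstance(value, (bool, int, float)):
        return True
    if isinstance(value, str):
        # Assumption: the size limit applies to the UTF-8 encoded length.
        return len(value.encode("utf-8")) < MAX_ITEM_BYTES
    if isinstance(value, bytes):
        return len(value) < MAX_ITEM_BYTES
    return False


def make_in_memory_source(elements: Sequence[Any], name: Optional[str] = None):
    """Hypothetical helper: validates elements, then builds the data source."""
    bad = [i for i, v in enumerate(elements) if not is_storable(v)]
    if bad:
        raise TypeError(f"elements at positions {bad} are not storable")
    # Imported lazily so the validation above works without Grain installed.
    from grain import sources

    return sources.InMemoryDataSource(elements, name=name)
```

Validating up front gives an error that names the offending positions, rather than relying on whatever the underlying storage raises.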