Data Producer

class piepline.data_producer.data_producer.DataProducer(dataset: piepline.data_producer.datasets.AbstractDataset, batch_size: int = 1, num_workers: int = 0)

    Data Producer. Accumulates one or more datasets and passes their data in batches for processing. Uses PyTorch's built-in DataLoader to improve the performance of data delivery.

    Parameters:
        - dataset: dataset object. Every dataset must implement the methods __getitem__ and __len__
        - batch_size: size of the output batch
        - num_workers: number of worker processes that load data from the datasets and pass it to the output
    get_data(data_idx: int) → object

        Get a single data item by dataset index and data index.

        Parameters:
            - data_idx: index of the data item in this dataset

        Returns: dataset output
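Conceptually, looking up a single item across several accumulated datasets can be pictured as below. This is a plain-Python illustration of the (dataset_idx, data_idx) addressing scheme, not piepline's actual implementation.

```python
# Plain-Python illustration of addressing one item by (dataset_idx, data_idx)
# across several accumulated datasets; hypothetical, not piepline's code.
datasets = [
    ["a0", "a1", "a2"],  # dataset 0
    ["b0", "b1"],        # dataset 1
]


def get_data(dataset_idx: int, data_idx: int) -> str:
    # Select the dataset first, then the item inside it
    return datasets[dataset_idx][data_idx]


print(get_data(0, 2))  # a2
print(get_data(1, 0))  # b0
```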
    get_loader(indices: [str] = None) → torch.utils.data.dataloader.DataLoader

        Get a PyTorch DataLoader object that aggregates this DataProducer. If indices is specified, the DataLoader will output data only for those indices; in this case the indices themselves are not passed through.

        Parameters:
            - indices: list of indices. Each item of the list is a string in the format '{}_{}'.format(dataset_idx, data_idx)

        Returns: DataLoader object
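Building the index strings in the format get_loader expects can be sketched like this (the dataset sizes here are made up for illustration):

```python
# Build index strings in the '{dataset_idx}_{data_idx}' format that
# get_loader accepts; two hypothetical datasets with 3 items each.
indices = [
    '{}_{}'.format(dataset_idx, data_idx)
    for dataset_idx in range(2)
    for data_idx in range(3)
]
print(indices)  # ['0_0', '0_1', '0_2', '1_0', '1_1', '1_2']
```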
    global_shuffle(is_need: bool) → piepline.data_producer.data_producer.DataProducer

        Enable or disable global shuffling. If global shuffling is enabled, batches are compiled from random indices across all datasets; in this case the shuffling of dataset order is ignored.

        Parameters:
            - is_need: whether global shuffling is needed

        Returns: self object
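The effect of global shuffling can be illustrated in plain Python: indices from every dataset are pooled and shuffled together, so a single batch may mix items from different datasets. The dataset sizes below are hypothetical, and this is a sketch of the idea rather than piepline's implementation.

```python
import random

# Illustration of global shuffling: pool the indices of all datasets
# and shuffle them together, ignoring dataset boundaries.
dataset_lengths = [3, 2]  # two hypothetical datasets with 3 and 2 items
all_indices = [
    '{}_{}'.format(dataset_idx, data_idx)
    for dataset_idx, n in enumerate(dataset_lengths)
    for data_idx in range(n)
]

random.seed(0)          # fixed seed so the sketch is reproducible
random.shuffle(all_indices)  # one shuffle over every dataset at once
print(all_indices)
```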