Data Producer

class piepline.data_producer.data_producer.DataProducer(dataset: piepline.data_producer.datasets.AbstractDataset, batch_size: int = 1, num_workers: int = 0)[source]

Data Producer. Accumulates one or more datasets and passes their data in batches for processing. Uses the PyTorch built-in DataLoader to improve data-delivery performance. :param dataset: dataset object. Every dataset must be indexable (contain the methods __getitem__ and __len__) :param batch_size: size of the output batch :param num_workers: number of worker processes that load data from the datasets and pass it to the output
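Any object that implements __getitem__ and __len__ satisfies the dataset interface described above. A minimal sketch (the ListDataset wrapper below is a hypothetical illustration, not part of piepline):

```python
class ListDataset:
    """Minimal dataset: wraps a plain list.

    Hypothetical stand-in; any object with __getitem__ and __len__
    satisfies the interface DataProducer expects.
    """

    def __init__(self, items):
        self._items = items

    def __getitem__(self, idx):
        return self._items[idx]

    def __len__(self):
        return len(self._items)


dataset = ListDataset([{'data': i, 'target': i % 2} for i in range(10)])

# With piepline installed, the dataset would then be wrapped like this:
# producer = DataProducer(dataset, batch_size=4, num_workers=2)
```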

get_data(data_idx: int) → object[source]

Get a single data item by its index in the dataset. :param data_idx: index of the data item in this dataset :return: dataset output

get_loader(indices: [str] = None) → torch.utils.data.dataloader.DataLoader[source]

Get a PyTorch DataLoader object that aggregates the DataProducer. If indices is specified, the DataLoader will output data only for these indices. In this case the indices will not be passed with the batches. :param indices: list of indices. Each item of the list is a string in the format ‘{}_{}’.format(dataset_idx, data_idx) :return: DataLoader object
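The composite index format can be built and parsed in plain Python; build_index and parse_index below are hypothetical helpers shown only to illustrate the '{dataset_idx}_{data_idx}' convention:

```python
def build_index(dataset_idx: int, data_idx: int) -> str:
    # Same format the docstring describes: '{}_{}'.format(dataset_idx, data_idx)
    return '{}_{}'.format(dataset_idx, data_idx)


def parse_index(index: str) -> tuple:
    # Split a composite index back into (dataset_idx, data_idx)
    dataset_part, data_part = index.split('_', 1)
    return int(dataset_part), int(data_part)


# Indices selecting the first three items of dataset 0
indices = [build_index(0, i) for i in range(3)]  # ['0_0', '0_1', '0_2']
```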

global_shuffle(is_need: bool) → piepline.data_producer.data_producer.DataProducer[source]

Enable or disable global shuffling. If global shuffling is enabled, batches are compiled from random indices across all datasets. In this case dataset-order shuffling is ignored. :param is_need: whether global shuffling is needed :return: self object

pass_indices(need_pass: bool) → piepline.data_producer.data_producer.DataProducer[source]

Pass the indices of the data items with every batch. Disabled by default. :param need_pass: whether indices need to be passed :return: self object

pin_memory(is_need: bool) → piepline.data_producer.data_producer.DataProducer[source]

Enable or disable memory pinning during loading. Pinning memory increases data-loading performance (especially when data is loaded to a GPU) but is incompatible with swap. :param is_need: whether memory pinning is needed :return: self object
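Since global_shuffle, pass_indices and pin_memory all return self, configuration calls can be chained. A sketch of that fluent pattern using a hypothetical stand-in class (ProducerConfig is not the real DataProducer):

```python
class ProducerConfig:
    """Hypothetical stand-in mirroring DataProducer's fluent setters."""

    def __init__(self):
        self._global_shuffle = False
        self._pass_indices = False
        self._pin_memory = False

    def global_shuffle(self, is_need: bool) -> 'ProducerConfig':
        self._global_shuffle = is_need
        return self  # returning self enables method chaining

    def pass_indices(self, need_pass: bool) -> 'ProducerConfig':
        self._pass_indices = need_pass
        return self

    def pin_memory(self, is_need: bool) -> 'ProducerConfig':
        self._pin_memory = is_need
        return self


# All three options configured in one chained expression
config = ProducerConfig().global_shuffle(True).pass_indices(True).pin_memory(True)
```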