iirc.lifelong_dataset package

Submodules

iirc.lifelong_dataset.base_dataset module

class iirc.lifelong_dataset.base_dataset.BaseDataset(dataset: Union[List[Tuple[PIL.Image.Image, Tuple[str, ]]], List[Tuple[str, Tuple[str, ]]]], tasks: List[List[str]], setup: str = 'IIRC', using_image_path: bool = False, cache_images: bool = False, essential_transforms_fn: Optional[Callable[[Any], Any]] = None, augmentation_transforms_fn: Optional[Callable[[Any], Any]] = None, test_mode: bool = False, complete_information_mode: Optional[bool] = None, superclass_data_pct: float = 0.6, subclass_data_pct: float = 0.6, superclass_sampling_size_cap: int = 100)

Bases: abc.ABC

A lifelong learning dataset base class with the underlying data changing based on what task is currently activated. This class is an abstract base class.

Parameters
  • dataset (DatasetStructType) – a list of tuples which contains the data in the form of (image, (label,)) or (image, (label1,label2)). The image path (str) can be provided instead if the images would be loaded on the fly (see the argument using_image_path). label is a string representing the class name

  • tasks (List[List[str]]) – a list of lists where each inner list contains the set of classes (class names) that will be introduced in that task (example: [[dog, cat, car], [tiger, truck, fish]])

  • setup (str) – Class Incremental Learning setup (CIL) or Incremental Implicitly Refined Classification setup (IIRC) (default: IIRC_SETUP)

  • using_image_path (bool) – whether the pillow image is provided in the dataset argument, or the image path that would be used later to load the image. set True if using the image path (default: False)

  • cache_images (bool) – cache images that belong to the current task in the memory, only applicable when using the image path (default: False)

  • essential_transforms_fn (Callable[[Any], Any]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) that should be applied to each image. This function is applied only when the augmentation_transforms_fn is set to None (as in the case of a test set) or inside the disable_augmentations context (default: None)

  • augmentation_transforms_fn (Callable[[Any], Any]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) and augmentation transforms (for example, applying random cropping) that should be applied to each image. When this function is provided, essential_transforms_fn is not used except inside the disable_augmentations context (default: None)

  • test_mode (bool) – Whether this dataset is considered a training split or a test split. This info is only helpful when using the IIRC setup (default: False)

  • complete_information_mode (bool) – Whether the dataset is in complete information mode or incomplete information mode. This is only valid when using the IIRC setup. In the incomplete information mode, if a sample has two labels corresponding to a previous task and a current task (example: dog and Bulldog), only the label present in the current task is provided (Bulldog). In the complete information mode, both labels will be provided. In all cases, no label from a future task would be provided. When no value is set for complete_information_mode, this value is defaulted to the test_mode value (complete information during test mode only) (default: None)

  • superclass_data_pct (float) – The percentage of samples sampled for each superclass from its constituent subclasses. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if the superclass “dog” has the subclasses “Bulldog” and “Whippet”, and superclass_data_pct is set to 0.4, then 40% of each of the “Bulldog” samples and “Whippet” samples will be provided when training on the task that has the class “dog” (default: 0.6)

  • subclass_data_pct (float) – The percentage of samples sampled for each subclass if it has a superclass. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if “Bulldog” is one of the subclasses of the superclass “dog”, and superclass_data_pct is set to 0.4 while subclass_data_pct is set to 0.8, then 40% of the “Bulldog” samples will be provided when training on the task that contains “dog”, and 80% of the “Bulldog” samples will be provided when training on the task that contains “Bulldog”. superclass_data_pct and subclass_data_pct don’t need to sum to 1, as samples can be repeated across tasks (in the previous example, 20% of the samples were repeated across the two tasks) (default: 0.6)

  • superclass_sampling_size_cap (int) – The number of subclasses a superclass should contain after which the number of samples doesn’t increase anymore. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if a superclass has 8 subclasses, with superclass_data_pct set to 0.4 and superclass_sampling_size_cap set to 5, then superclass_data_pct for that specific superclass will be adjusted to 0.25 (5 / 8 * 0.4) (default: 100)
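
To make the expected input shapes concrete, here is a minimal sketch of the dataset and tasks arguments in the image-path form (the one used with using_image_path=True). The file paths are made up, the order of the two labels inside a tuple is an assumption, and labels_missing_from_tasks is a hypothetical helper that is not part of the library.

```python
from typing import List, Tuple

# (image_path, labels) pairs; the paths are invented for this sketch.
# Placing the superclass label before the subclass label is an assumption.
dataset: List[Tuple[str, Tuple[str, ...]]] = [
    ("images/0001.jpg", ("dog", "Bulldog")),
    ("images/0002.jpg", ("dog", "Whippet")),
    ("images/0003.jpg", ("cat",)),
    ("images/0004.jpg", ("car",)),
]

# Each inner list holds the class names introduced in that task.
tasks: List[List[str]] = [["dog", "cat"], ["Bulldog", "car"], ["Whippet"]]


def labels_missing_from_tasks(
    dataset: List[Tuple[str, Tuple[str, ...]]], tasks: List[List[str]]
) -> List[str]:
    """Return labels used in `dataset` that no task introduces (illustrative only)."""
    introduced = {cla for task in tasks for cla in task}
    used = {label for _, labels in dataset for label in labels}
    return sorted(used - introduced)
```

A check like this can catch a misspelled class name before the lists are handed to a dataset class.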

dataset_state_dict() → Dict

This function returns a dict that contains the current state of the dataset

Returns

a dictionary with all the attributes (key is attribute name) and their values, except the attributes in the self.non_savable_attributes

Return type

Dict

load_dataset_state_dict(state_dict: Dict) → None

This function loads the object attributes with the values in state_dict

Parameters

state_dict (Dict) – a dictionary with the attribute names as keys and their values
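
The save and restore pattern described by dataset_state_dict and load_dataset_state_dict can be sketched with a toy class. This is not the library's implementation: ToyDataset and the attributes cur_task_id, seen_classes and image_cache are hypothetical, and only non_savable_attributes is a name taken from the text above.

```python
import copy
from typing import Any, Dict, List


class ToyDataset:
    """Toy stand-in used only to illustrate the state dict round trip."""

    def __init__(self) -> None:
        self.cur_task_id: int = -1             # hypothetical attribute
        self.seen_classes: List[str] = []      # hypothetical attribute
        self.image_cache: Dict[str, Any] = {}  # assumed to be excluded from saving
        self.non_savable_attributes = ["image_cache"]

    def dataset_state_dict(self) -> Dict[str, Any]:
        # Copy every attribute except the ones listed as non-savable.
        return {
            key: copy.deepcopy(value)
            for key, value in self.__dict__.items()
            if key not in self.non_savable_attributes
        }

    def load_dataset_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Overwrite attributes with the saved values.
        for key, value in state_dict.items():
            setattr(self, key, copy.deepcopy(value))


source = ToyDataset()
source.cur_task_id = 2
source.seen_classes = ["dog", "cat"]
source.image_cache["images/0001.jpg"] = "decoded image placeholder"

state = source.dataset_state_dict()
restored = ToyDataset()
restored.load_dataset_state_dict(state)
```

After the round trip, restored carries the task position and seen classes of source, while its cache stays empty because image_cache was never written to the state dict.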

reset() → None

Reset the dataset to the starting state

choose_task(task_id: int) → None

Load the data corresponding to task “task_id” and update the seen classes based on it.

Parameters

task_id (int) – The task_id of the task to load

load_tasks_up_to(task_id: int) → None

Load the data corresponding to the tasks up to “task_id” (including that task). When using the IIRC setup, this function is only available when complete_information_mode is set to True.

Parameters

task_id (int) – The task_id of the task to load
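
The difference between choose_task and load_tasks_up_to can be pictured in terms of which class names are in scope. The sketch below works on a plain tasks list only; the two helper functions are hypothetical and do not load any data.

```python
from typing import List

tasks: List[List[str]] = [["dog", "cat"], ["Bulldog", "car"], ["Whippet"]]


def classes_for_task(tasks: List[List[str]], task_id: int) -> List[str]:
    """Classes introduced by a single task (what choose_task focuses on)."""
    return list(tasks[task_id])


def classes_up_to_task(tasks: List[List[str]], task_id: int) -> List[str]:
    """Classes introduced by all tasks up to and including task_id."""
    return [cla for task in tasks[: task_id + 1] for cla in task]
```

With task_id set to 1, the first helper yields only the classes of the second task, while the second helper also includes the classes of the first task.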

get_labels(index: int) → Tuple[str, str]

Return the labels of the sample with index (index) in the current task.

Parameters

index (int) – The index of the sample in the current task, this is a relative index within the current task

Returns

The labels corresponding to the sample. If using CIL setup, or if the other label is masked, then the other str contains the value specified by the NO_LABEL_PLACEHOLDER

Return type

Tuple[str, str]
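
The masking rules for complete and incomplete information mode can be sketched as follows. This is an illustration under assumptions rather than the library's code: the placeholder value, the helper visible_labels, and the choice to put the placeholder in the second position are all hypothetical.

```python
from typing import Sequence, Tuple

NO_LABEL_PLACEHOLDER = "None"  # assumed value, for this sketch only


def visible_labels(
    labels: Sequence[str],
    current_task_classes: Sequence[str],
    seen_classes: Sequence[str],
    complete_information_mode: bool,
) -> Tuple[str, str]:
    """Return two labels, masking the ones that should not be shown."""
    if complete_information_mode:
        # Labels from the current and previous tasks are shown, never future ones.
        allowed = set(seen_classes)
    else:
        # Only labels introduced by the current task are shown.
        allowed = set(current_task_classes)
    kept = [label for label in labels if label in allowed]
    # Pad with the placeholder so that the result always has two entries.
    kept += [NO_LABEL_PLACEHOLDER] * (2 - len(kept))
    return kept[0], kept[1]


current = ["Bulldog", "car"]
seen = ["dog", "cat", "Bulldog", "car"]
incomplete = visible_labels(("dog", "Bulldog"), current, seen, False)
complete = visible_labels(("dog", "Bulldog"), current, seen, True)
```

Here incomplete keeps only “Bulldog” plus the placeholder, while complete keeps both “dog” and “Bulldog”. A label from a task that has not been reached yet, such as “Whippet”, is masked in both modes.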

get_item(index: int) → Tuple[Any, str, str]

Return the image with index (index) in the current task along with its labels. No transformations are applied to the image.

Parameters

index (int) – The index of the sample in the current task, this is a relative index within the current task

Returns

The image along with its labels. If using CIL setup, or if the other label is masked, then the other str contains the value specified by the NO_LABEL_PLACEHOLDER

Return type

Tuple[Any, str, str]

get_image_indices_by_cla(cla: str, num_samples: int = - 1, shuffle: bool = True) → numpy.ndarray

Get the indices of the samples of cla within the cur_task. Warning: if the task data is changed (like by using choose_task() or load_tasks_up_to()), these indices would point to other samples as they are relative to the current task

Parameters
  • cla (str) – The class name

  • num_samples (int) – The number of samples needed for that class, set to -1 to return the indices of all the samples that belong to that class in the current task (default: -1)

  • shuffle (bool) – Whether to return the indices shuffled (default: True)

Returns

The indices of the samples of class cla within the current task (relative indices)

Return type

np.ndarray
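
The warning about relative indices is easier to see with a small sketch. The helper below is hypothetical and returns a plain Python list rather than an np.ndarray; the seed argument is added only to make the sketch reproducible and is not part of the documented signature.

```python
import random
from typing import List, Optional, Sequence, Tuple


def indices_by_class(
    task_labels: Sequence[Tuple[str, ...]],
    cla: str,
    num_samples: int = -1,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[int]:
    """Indices (relative to task_labels) of the samples labelled with cla."""
    indices = [i for i, labels in enumerate(task_labels) if cla in labels]
    if shuffle:
        random.Random(seed).shuffle(indices)
    if num_samples != -1:
        indices = indices[:num_samples]
    return indices


task_a = [("dog",), ("cat",), ("dog",), ("car",)]
task_b = [("truck",), ("fish",), ("tiger",), ("truck",)]

dog_indices = indices_by_class(task_a, "dog", shuffle=False)
# The same positions in another task's data refer to unrelated samples.
stale_lookup = [task_b[i] for i in dog_indices]
```

Because the indices are positions within the current task's data, they should be recomputed after every task switch.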

disable_augmentations() → None

A context where only the essential transformations are applied
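
One way to picture how such a context interacts with essential_transforms_fn and augmentation_transforms_fn is the toy class below. It is a sketch under assumptions, not the library's implementation: ToyTransformSwitch, its transform method and the _apply_augmentations flag are hypothetical, and strings stand in for images.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class ToyTransformSwitch:
    """Toy stand-in that picks between augmentation and essential transforms."""

    def __init__(
        self,
        essential_transforms_fn: Optional[Callable[[Any], Any]] = None,
        augmentation_transforms_fn: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.essential_transforms_fn = essential_transforms_fn
        self.augmentation_transforms_fn = augmentation_transforms_fn
        self._apply_augmentations = True

    @contextmanager
    def disable_augmentations(self) -> Iterator[None]:
        # Switch augmentations off, then restore the previous setting on exit.
        previous = self._apply_augmentations
        self._apply_augmentations = False
        try:
            yield
        finally:
            self._apply_augmentations = previous

    def transform(self, image: Any) -> Any:
        if self._apply_augmentations and self.augmentation_transforms_fn is not None:
            return self.augmentation_transforms_fn(image)
        if self.essential_transforms_fn is not None:
            return self.essential_transforms_fn(image)
        return image


switch = ToyTransformSwitch(
    essential_transforms_fn=lambda image: image + "|essential",
    augmentation_transforms_fn=lambda image: image + "|essential|augmented",
)

outside = switch.transform("img")
with switch.disable_augmentations():
    inside = switch.transform("img")
after = switch.transform("img")
```

Restoring the previous flag in a finally clause keeps the switch consistent even if the body of the with block raises.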

enable_complete_information_mode() → None
enable_incomplete_information_mode() → None

iirc.lifelong_dataset.tensorflow_dataset module

iirc.lifelong_dataset.torch_dataset module

class iirc.lifelong_dataset.torch_dataset.Dataset(dataset: Union[List[Tuple[PIL.Image.Image, Tuple[str, ]]], List[Tuple[str, Tuple[str, ]]]], tasks: List[List[str]], setup: str = 'IIRC', using_image_path: bool = False, cache_images: bool = False, essential_transforms_fn: Optional[Callable[[PIL.Image.Image], torch.Tensor]] = None, augmentation_transforms_fn: Optional[Callable[[PIL.Image.Image], torch.Tensor]] = None, test_mode: bool = False, complete_information_mode: Optional[bool] = None, superclass_data_pct: float = 0.6, subclass_data_pct: float = 0.6, superclass_sampling_size_cap: int = 100)

Bases: iirc.lifelong_dataset.base_dataset.BaseDataset, torch.utils.data.dataset.Dataset

A class inheriting from BaseDataset to be used with PyTorch

Parameters
  • dataset (DatasetStructType) – a list of tuples which contains the data in the form of (image, (label,)) or (image, (label1,label2)). The image path (str) can be provided instead if the images would be loaded on the fly (see the argument using_image_path). label is a string representing the class name

  • tasks (List[List[str]]) – a list of lists where each inner list contains the set of classes (class names) that will be introduced in that task (example: [[dog, cat, car], [tiger, truck, fish]])

  • setup (str) – Class Incremental Learning setup (CIL) or Incremental Implicitly Refined Classification setup (IIRC) (default: IIRC_SETUP)

  • using_image_path (bool) – whether the pillow image is provided in the dataset argument, or the image path that would be used later to load the image. set True if using the image path (default: False)

  • cache_images (bool) – cache images that belong to the current task in the memory, only applicable when using the image path (default: False)

  • essential_transforms_fn (Optional[Callable[[Image.Image], torch.Tensor]]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) that should be applied to each image. This function is applied only when the augmentation_transforms_fn is set to None (as in the case of a test set) or inside the disable_augmentations context (default: None)

  • augmentation_transforms_fn (Optional[Callable[[Image.Image], torch.Tensor]]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) and augmentation transforms (for example, applying random cropping) that should be applied to each image. When this function is provided, essential_transforms_fn is not used except inside the disable_augmentations context (default: None)

  • test_mode (bool) – Whether this dataset is considered a training split or a test split. This info is only helpful when using the IIRC setup (default: False)

  • complete_information_mode (bool) – Whether the dataset is in complete information mode or incomplete information mode. This is only valid when using the IIRC setup. In the incomplete information mode, if a sample has two labels corresponding to a previous task and a current task (example: dog and Bulldog), only the label present in the current task is provided (Bulldog). In the complete information mode, both labels will be provided. In all cases, no label from a future task would be provided. When no value is set for complete_information_mode, this value is defaulted to the test_mode value (complete information during test mode only) (default: None)

  • superclass_data_pct (float) – The percentage of samples sampled for each superclass from its constituent subclasses. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if the superclass “dog” has the subclasses “Bulldog” and “Whippet”, and superclass_data_pct is set to 0.4, then 40% of each of the “Bulldog” samples and “Whippet” samples will be provided when training on the task that has the class “dog” (default: 0.6)

  • subclass_data_pct (float) – The percentage of samples sampled for each subclass if it has a superclass. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if “Bulldog” is one of the subclasses of the superclass “dog”, and superclass_data_pct is set to 0.4 while subclass_data_pct is set to 0.8, then 40% of the “Bulldog” samples will be provided when training on the task that contains “dog”, and 80% of the “Bulldog” samples will be provided when training on the task that contains “Bulldog”. superclass_data_pct and subclass_data_pct don’t need to sum to 1, as samples can be repeated across tasks (in the previous example, 20% of the samples were repeated across the two tasks) (default: 0.6)

  • superclass_sampling_size_cap (int) – The number of subclasses a superclass should contain after which the number of samples doesn’t increase anymore. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if a superclass has 8 subclasses, with superclass_data_pct set to 0.4 and superclass_sampling_size_cap set to 5, then superclass_data_pct for that specific superclass will be adjusted to 0.25 (5 / 8 * 0.4) (default: 100)
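
The arithmetic behind the three sampling parameters can be checked with a short sketch. Both helpers are hypothetical and only reproduce the worked examples in the parameter descriptions. In particular, min_repeated_fraction computes the smallest overlap that two draws from the same pool must share, which is one reading of the 20% figure above and not a statement about how the library picks samples.

```python
def effective_superclass_pct(
    num_subclasses: int,
    superclass_data_pct: float = 0.6,
    superclass_sampling_size_cap: int = 100,
) -> float:
    """Per-subclass fraction used for a superclass once the cap is applied."""
    if num_subclasses > superclass_sampling_size_cap:
        # Scale down so the total sample count stops growing past the cap.
        return superclass_sampling_size_cap / num_subclasses * superclass_data_pct
    return superclass_data_pct


def min_repeated_fraction(superclass_data_pct: float, subclass_data_pct: float) -> float:
    """Smallest fraction of a subclass's samples that both draws must share."""
    return max(0.0, superclass_data_pct + subclass_data_pct - 1.0)


capped = effective_superclass_pct(8, superclass_data_pct=0.4, superclass_sampling_size_cap=5)
uncapped = effective_superclass_pct(2, superclass_data_pct=0.4, superclass_sampling_size_cap=5)
repeated = min_repeated_fraction(0.4, 0.8)
```

With 8 subclasses and a cap of 5, the per-subclass fraction drops from 0.4 to 0.25, so the superclass receives about as many samples as it would with 5 subclasses at 0.4 each.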

Module contents