iirc.lifelong_dataset package¶
Submodules¶
iirc.lifelong_dataset.base_dataset module¶
-
class
iirc.lifelong_dataset.base_dataset.
BaseDataset
(dataset: Union[List[Tuple[PIL.Image.Image, Tuple[str, …]]], List[Tuple[str, Tuple[str, …]]]], tasks: List[List[str]], setup: str = 'IIRC', using_image_path: bool = False, cache_images: bool = False, essential_transforms_fn: Optional[Callable[[Any], Any]] = None, augmentation_transforms_fn: Optional[Callable[[Any], Any]] = None, test_mode: bool = False, complete_information_mode: Optional[bool] = None, superclass_data_pct: float = 0.6, subclass_data_pct: float = 0.6, superclass_sampling_size_cap: int = 100)¶ Bases:
abc.ABC
A lifelong learning dataset base class with the underlying data changing based on what task is currently activated. This class is an abstract base class.
- Parameters
dataset (DatasetStructType) – a list of tuples which contains the data in the form of (image, (label,)) or (image, (label1,label2)). The image path (str) can be provided instead if the images would be loaded on the fly (see the argument using_image_path). label is a string representing the class name
tasks (List[List[str]]) – a list of lists where each inner list contains the set of classes (class names) that will be introduced in that task (example: [[dog, cat, car], [tiger, truck, fish]])
setup (str) – Class Incremental Learning setup (CIL) or Incremental Implicitly Refined Classification setup (IIRC) (default: IIRC_SETUP)
using_image_path (bool) – whether the dataset argument holds the pillow images themselves or the image paths that will be used later to load the images. Set to True if using image paths (default: False)
cache_images (bool) – cache images that belong to the current task in the memory, only applicable when using the image path (default: False)
essential_transforms_fn (Callable[[Any], Any]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) that should be applied to each image. This function is applied only when the augmentation_transforms_fn is set to None (as in the case of a test set) or inside the disable_augmentations context (default: None)
augmentation_transforms_fn (Callable[[Any], Any]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) and augmentation transforms (for example, applying random cropping) that should be applied to each image. When this function is provided, essential_transforms_fn is not used except inside the disable_augmentations context (default: None)
test_mode (bool) – Whether this dataset is considered a training split or a test split. This info is only helpful when using the IIRC setup (default: False)
complete_information_mode (bool) – Whether the dataset is in complete information mode or incomplete information mode. This is only valid when using the IIRC setup. In the incomplete information mode, if a sample has two labels corresponding to a previous task and a current task (example: dog and Bulldog), only the label present in the current task is provided (Bulldog). In the complete information mode, both labels will be provided. In all cases, no label from a future task would be provided. When no value is set for complete_information_mode, this value is defaulted to the test_mode value (complete information during test mode only) (default: None)
superclass_data_pct (float) – The percentage of samples sampled for each superclass from its constituent subclasses. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if the superclass “dog” has the subclasses “Bulldog” and “Whippet”, and superclass_data_pct is set to 0.4, then 40% of each of the “Bulldog” samples and “Whippet” samples will be provided when training on the task that has the class “dog” (default: 0.6)
subclass_data_pct (float) – The percentage of samples sampled for each subclass if it has a superclass. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if the superclass “dog” has “Bulldog” as one of its subclasses, and superclass_data_pct is set to 0.4 while subclass_data_pct is set to 0.8, then 40% of the “Bulldog” samples will be provided when training on the task that contains “dog”, and 80% of the “Bulldog” samples will be provided when training on the task that contains “Bulldog”. superclass_data_pct and subclass_data_pct don’t need to sum to 1 as the samples can be repeated across tasks (in the previous example, at least 20% of the samples are repeated across the two tasks) (default: 0.6)
superclass_sampling_size_cap (int) – The number of subclasses a superclass should contain after which the number of samples doesn’t increase anymore. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if a superclass has 8 subclasses, with superclass_data_pct set to 0.4 and superclass_sampling_size_cap set to 5, then superclass_data_pct for that specific superclass will be adjusted to 0.25 (5 / 8 * 0.4) (default: 100)
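The sampling arithmetic described for superclass_data_pct, subclass_data_pct and superclass_sampling_size_cap can be sketched in plain Python. This is only an illustration of the formulas quoted in the parameter descriptions; the helper names effective_superclass_pct and minimum_overlap are hypothetical and are not part of the iirc package.

```python
def effective_superclass_pct(num_subclasses: int,
                             superclass_data_pct: float,
                             superclass_sampling_size_cap: int) -> float:
    """Hypothetical helper: scale superclass_data_pct down once the
    number of subclasses exceeds the cap (cap / num_subclasses * pct)."""
    if num_subclasses <= superclass_sampling_size_cap:
        return superclass_data_pct
    return superclass_sampling_size_cap / num_subclasses * superclass_data_pct


def minimum_overlap(superclass_data_pct: float, subclass_data_pct: float) -> float:
    """Hypothetical helper: smallest fraction of a subclass's samples that
    must appear in both the superclass task and the subclass task."""
    return max(0.0, superclass_data_pct + subclass_data_pct - 1.0)


# The documented example: 8 subclasses, pct 0.4, cap 5.
print(effective_superclass_pct(8, 0.4, 5))
# Below the cap, the percentage is left unchanged.
print(effective_superclass_pct(3, 0.4, 5))
# The documented example: 40% for "dog" plus 80% for "Bulldog".
print(minimum_overlap(0.4, 0.8))
```

The overlap helper gives a lower bound only; how the library actually picks the repeated samples is not stated in this reference.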
-
dataset_state_dict
() → Dict¶ This function returns a dict that contains the current state of the dataset
- Returns
a dictionary with all the attributes (key is attribute name) and their values, except the attributes in the self.non_savable_attributes
- Return type
Dict
-
load_dataset_state_dict
(state_dict: Dict) → None¶ This function loads the object attributes with the values in state_dict
- Parameters
state_dict (Dict) – a dictionary with the attribute names as keys and their values
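The save and restore behaviour described for dataset_state_dict and load_dataset_state_dict can be pictured with a small stand-in class. This is a minimal sketch under stated assumptions: StatefulSketch and its attributes (cur_task_id, seen_classes, data) are hypothetical, and only the idea of skipping the names in non_savable_attributes comes from the text above.

```python
from typing import Any, Dict, List


class StatefulSketch:
    """Hypothetical stand-in, not the library's BaseDataset."""

    def __init__(self) -> None:
        self.cur_task_id = 0
        self.seen_classes: List[str] = []
        self.data = ["large", "payload"]  # pretend this is too big to store
        self.non_savable_attributes = ["data"]

    def dataset_state_dict(self) -> Dict[str, Any]:
        # Collect every attribute except those listed as non-savable.
        return {
            name: value
            for name, value in self.__dict__.items()
            if name not in self.non_savable_attributes
        }

    def load_dataset_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Overwrite the object's attributes with the stored values.
        for name, value in state_dict.items():
            setattr(self, name, value)


source = StatefulSketch()
source.cur_task_id = 3
source.seen_classes = ["dog", "cat"]
state = source.dataset_state_dict()

restored = StatefulSketch()
restored.load_dataset_state_dict(state)
print(restored.cur_task_id, restored.seen_classes)
```

In this sketch the stored values are references rather than deep copies, which may or may not match the library.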
-
reset
() → None¶ Reset the dataset to the starting state
-
choose_task
(task_id: int) → None¶ Load the data corresponding to task “task_id” and update the seen classes based on it.
- Parameters
task_id (int) – The task_id of the task to load
-
load_tasks_up_to
(task_id: int) → None¶ Load the data corresponding to the tasks up to “task_id” (including that task). When using the IIRC setup, this function is only available when complete_information_mode is set to True.
- Parameters
task_id (int) – The task_id of the task to load
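The difference between choose_task and load_tasks_up_to comes down to which classes are in scope. The sketch below assumes task ids are zero-based indices into the tasks list, which this reference does not state; the helper names are hypothetical and the real methods load samples rather than return class names.

```python
from typing import List

# The example task list from the parameter description, plus one extra task.
tasks: List[List[str]] = [
    ["dog", "cat", "car"],
    ["tiger", "truck", "fish"],
    ["bulldog"],
]


def classes_for_task(tasks: List[List[str]], task_id: int) -> List[str]:
    """Classes in scope after choosing a single task (hypothetical helper)."""
    return list(tasks[task_id])


def classes_up_to_task(tasks: List[List[str]], task_id: int) -> List[str]:
    """Classes in scope after loading every task up to and including task_id."""
    return [cla for task in tasks[: task_id + 1] for cla in task]


print(classes_for_task(tasks, 1))
print(classes_up_to_task(tasks, 1))
```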
-
get_labels
(index: int) → Tuple[str, str]¶ Return the labels of the sample with index (index) in the current task.
- Parameters
index (int) – The index of the sample in the current task, this is a relative index within the current task
- Returns
The labels corresponding to the sample. If using CIL setup, or if the other label is masked, then the other str contains the value specified by the NO_LABEL_PLACEHOLDER
- Return type
Tuple[str, str]
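The label masking that get_labels reflects can be sketched as a pure function. This is an illustration of the rules given for complete_information_mode, not the library's code: visible_labels is a hypothetical helper, the placeholder string is an invented value (the real NO_LABEL_PLACEHOLDER constant is not shown in this reference), zero-based task ids are assumed, and the order of the returned pair is an assumption.

```python
from typing import List, Tuple

NO_LABEL_PLACEHOLDER = "<no-label>"  # hypothetical value, for illustration only


def visible_labels(labels: Tuple[str, ...],
                   tasks: List[List[str]],
                   cur_task_id: int,
                   complete_information_mode: bool) -> Tuple[str, str]:
    """Return the (at most two) labels visible at the current task."""
    if complete_information_mode:
        # Labels from the current task and all previous tasks are visible.
        allowed = {cla for task in tasks[: cur_task_id + 1] for cla in task}
    else:
        # Only labels introduced in the current task are visible.
        allowed = set(tasks[cur_task_id])
    kept = [label for label in labels if label in allowed]
    # Pad with the placeholder so the result always has two entries.
    kept += [NO_LABEL_PLACEHOLDER] * (2 - len(kept))
    return kept[0], kept[1]


tasks = [["dog", "cat"], ["bulldog", "truck"]]
sample_labels = ("dog", "bulldog")

print(visible_labels(sample_labels, tasks, 1, complete_information_mode=False))
print(visible_labels(sample_labels, tasks, 1, complete_information_mode=True))
print(visible_labels(sample_labels, tasks, 0, complete_information_mode=True))
```

In every branch, a label that belongs to a future task is filtered out, matching the statement that no label from a future task is provided.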
-
get_item
(index: int) → Tuple[Any, str, str]¶ Return the image with index (index) in the current task along with its labels. No transformations are applied to the image.
- Parameters
index (int) – The index of the sample in the current task, this is a relative index within the current task
- Returns
The image along with its labels. If using CIL setup, or if the other label is masked, then the other str contains the value specified by the NO_LABEL_PLACEHOLDER
- Return type
Tuple[Any, str, str]
-
get_image_indices_by_cla
(cla: str, num_samples: int = -1, shuffle: bool = True) → numpy.ndarray¶ Get the indices of the samples of cla within the current task. Warning: if the task data is changed (for example by using choose_task() or load_tasks_up_to()), these indices would point to other samples, as they are relative to the current task.
- Parameters
cla (str) – The class name
num_samples (int) – The number of samples needed for that class, set to -1 to return the indices of all the samples that belong to that class in the current task (default: -1)
shuffle (bool) – Whether to return the indices shuffled (default: True, as given in the signature)
- Returns
The indices of the samples of class cla within the current task (relative indices)
- Return type
np.ndarray
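The selection logic of get_image_indices_by_cla can be approximated with the standard library. This is a simplified sketch: indices_by_class is a hypothetical helper, it returns a plain list instead of a numpy.ndarray, and the seed parameter is an addition for reproducibility that the documented method does not have.

```python
import random
from typing import List, Optional, Sequence, Tuple


def indices_by_class(task_labels: Sequence[Tuple[str, ...]],
                     cla: str,
                     num_samples: int = -1,
                     shuffle: bool = True,
                     seed: Optional[int] = None) -> List[int]:
    """Relative indices (within the current task) of samples labelled cla."""
    indices = [i for i, labels in enumerate(task_labels) if cla in labels]
    if shuffle:
        random.Random(seed).shuffle(indices)
    if num_samples != -1:
        # Keep only the requested number of samples.
        indices = indices[:num_samples]
    return indices


# Labels of the samples in a made-up current task.
current_task_labels = [("dog",), ("cat",), ("dog", "bulldog"), ("cat",)]

print(indices_by_class(current_task_labels, "dog", shuffle=False))
print(indices_by_class(current_task_labels, "cat", num_samples=1, seed=0))
```

Because the positions are relative to current_task_labels, they stop being meaningful as soon as a different task's samples are loaded, which is the situation the warning above describes.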
-
disable_augmentations
() → None¶ A context where only the essential transformations are applied
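How the two transform functions and the disable_augmentations context interact can be sketched with contextlib. This is an assumption-laden illustration: TransformSwitchSketch, its transform method and the _apply_augmentations flag are hypothetical, and strings stand in for images so the example runs without any imaging library.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class TransformSwitchSketch:
    """Hypothetical stand-in showing which transform function is applied."""

    def __init__(self,
                 essential_transforms_fn: Optional[Callable[[Any], Any]] = None,
                 augmentation_transforms_fn: Optional[Callable[[Any], Any]] = None) -> None:
        self.essential_transforms_fn = essential_transforms_fn
        self.augmentation_transforms_fn = augmentation_transforms_fn
        self._apply_augmentations = True

    @contextmanager
    def disable_augmentations(self) -> Iterator[None]:
        # Temporarily switch to essential transforms, then restore the flag.
        previous = self._apply_augmentations
        self._apply_augmentations = False
        try:
            yield
        finally:
            self._apply_augmentations = previous

    def transform(self, image: Any) -> Any:
        if self._apply_augmentations and self.augmentation_transforms_fn is not None:
            return self.augmentation_transforms_fn(image)
        if self.essential_transforms_fn is not None:
            return self.essential_transforms_fn(image)
        return image


sketch = TransformSwitchSketch(
    essential_transforms_fn=lambda img: f"tensor({img})",
    augmentation_transforms_fn=lambda img: f"tensor(crop({img}))",
)

print(sketch.transform("img"))
with sketch.disable_augmentations():
    print(sketch.transform("img"))
print(sketch.transform("img"))
```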
-
enable_complete_information_mode
() → None¶
-
enable_incomplete_information_mode
() → None¶
iirc.lifelong_dataset.tensorflow_dataset module¶
iirc.lifelong_dataset.torch_dataset module¶
-
class
iirc.lifelong_dataset.torch_dataset.
Dataset
(dataset: Union[List[Tuple[PIL.Image.Image, Tuple[str, …]]], List[Tuple[str, Tuple[str, …]]]], tasks: List[List[str]], setup: str = 'IIRC', using_image_path: bool = False, cache_images: bool = False, essential_transforms_fn: Optional[Callable[[PIL.Image.Image], torch.Tensor]] = None, augmentation_transforms_fn: Optional[Callable[[PIL.Image.Image], torch.Tensor]] = None, test_mode: bool = False, complete_information_mode: Optional[bool] = None, superclass_data_pct: float = 0.6, subclass_data_pct: float = 0.6, superclass_sampling_size_cap: int = 100)¶ Bases:
iirc.lifelong_dataset.base_dataset.BaseDataset
,torch.utils.data.dataset.Dataset
A class inheriting from BaseDataset to be used with PyTorch
- Parameters
dataset (DatasetStructType) – a list of tuples which contains the data in the form of (image, (label,)) or (image, (label1,label2)). The image path (str) can be provided instead if the images would be loaded on the fly (see the argument using_image_path). label is a string representing the class name
tasks (List[List[str]]) – a list of lists where each inner list contains the set of classes (class names) that will be introduced in that task (example: [[dog, cat, car], [tiger, truck, fish]])
setup (str) – Class Incremental Learning setup (CIL) or Incremental Implicitly Refined Classification setup (IIRC) (default: IIRC_SETUP)
using_image_path (bool) – whether the dataset argument holds the pillow images themselves or the image paths that will be used later to load the images. Set to True if using image paths (default: False)
cache_images (bool) – cache images that belong to the current task in the memory, only applicable when using the image path (default: False)
essential_transforms_fn (Optional[Callable[[Image.Image], torch.Tensor]]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) that should be applied to each image. This function is applied only when the augmentation_transforms_fn is set to None (as in the case of a test set) or inside the disable_augmentations context (default: None)
augmentation_transforms_fn (Optional[Callable[[Image.Image], torch.Tensor]]) – A function that contains the essential transforms (for example, converting a pillow image to a tensor) and augmentation transforms (for example, applying random cropping) that should be applied to each image. When this function is provided, essential_transforms_fn is not used except inside the disable_augmentations context (default: None)
test_mode (bool) – Whether this dataset is considered a training split or a test split. This info is only helpful when using the IIRC setup (default: False)
complete_information_mode (bool) – Whether the dataset is in complete information mode or incomplete information mode. This is only valid when using the IIRC setup. In the incomplete information mode, if a sample has two labels corresponding to a previous task and a current task (example: dog and Bulldog), only the label present in the current task is provided (Bulldog). In the complete information mode, both labels will be provided. In all cases, no label from a future task would be provided. When no value is set for complete_information_mode, this value is defaulted to the test_mode value (complete information during test mode only) (default: None)
superclass_data_pct (float) – The percentage of samples sampled for each superclass from its constituent subclasses. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if the superclass “dog” has the subclasses “Bulldog” and “Whippet”, and superclass_data_pct is set to 0.4, then 40% of each of the “Bulldog” samples and “Whippet” samples will be provided when training on the task that has the class “dog” (default: 0.6)
subclass_data_pct (float) – The percentage of samples sampled for each subclass if it has a superclass. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if the superclass “dog” has “Bulldog” as one of its subclasses, and superclass_data_pct is set to 0.4 while subclass_data_pct is set to 0.8, then 40% of the “Bulldog” samples will be provided when training on the task that contains “dog”, and 80% of the “Bulldog” samples will be provided when training on the task that contains “Bulldog”. superclass_data_pct and subclass_data_pct don’t need to sum to 1 as the samples can be repeated across tasks (in the previous example, at least 20% of the samples are repeated across the two tasks) (default: 0.6)
superclass_sampling_size_cap (int) – The number of subclasses a superclass should contain after which the number of samples doesn’t increase anymore. This is valid only when using the IIRC setup and when test_mode is set to False. For example, if a superclass has 8 subclasses, with superclass_data_pct set to 0.4 and superclass_sampling_size_cap set to 5, then superclass_data_pct for that specific superclass will be adjusted to 0.25 (5 / 8 * 0.4) (default: 100)
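The shapes expected for the dataset and tasks arguments can be illustrated with plain Python structures. The image paths and class names below are made up, the ordering of the two labels inside a tuple is an assumption, and labels_missing_from_tasks is a hypothetical sanity check, not something the library is documented to perform.

```python
from typing import List, Set, Tuple

# Each entry is (image path, labels); paths would be used with using_image_path=True.
dataset: List[Tuple[str, Tuple[str, ...]]] = [
    ("images/0001.jpg", ("dog", "bulldog")),  # sample with two labels
    ("images/0002.jpg", ("cat",)),            # sample with a single label
    ("images/0003.jpg", ("truck",)),
]

# Each inner list holds the class names introduced in that task.
tasks: List[List[str]] = [["dog", "cat"], ["bulldog", "truck"]]


def labels_missing_from_tasks(dataset: List[Tuple[str, Tuple[str, ...]]],
                              tasks: List[List[str]]) -> Set[str]:
    """Labels used in the dataset that no task introduces (hypothetical check)."""
    introduced = {cla for task in tasks for cla in task}
    used = {label for _, labels in dataset for label in labels}
    return used - introduced


print(labels_missing_from_tasks(dataset, tasks))
```

Structures like these would be passed as the dataset and tasks arguments of the constructor documented above, together with using_image_path=True when paths are supplied instead of pillow images.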