Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning

Authors: Wenjin Wang, Yunqing Hu, Qianglong Chen, Yin Zhang

Problem Statement:

Research Goals:

Proposed Approach:

Task Similarity via Nearest-Prototype Distance:

PAR measures the relatedness of two tasks with the KL divergence between their data distributions. However, estimating these distributions directly from raw data such as images is impractical; instead, a model maps the images into a feature space where the distributions can be approximated.

Concretely, the authors feed the images of the current and previous tasks through a pre-trained ResNet18 and use the resulting features to estimate each task's distribution.
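As a rough illustration of this step, the sketch below extracts features with a frozen, ImageNet-pretrained torchvision ResNet18 (torchvision >= 0.13) applied to a DataLoader of (image, label) batches; the helper name and the exact choice of pretrained weights are assumptions, not details from the paper.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Frozen pre-trained feature extractor: drop the classification head,
# keep the 512-d pooled features. (Assumed ImageNet weights.)
backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(loader):
    """Run the frozen backbone over one task's data; return features and labels."""
    feats, labels = [], []
    for images, targets in loader:
        feats.append(backbone(images))
        labels.append(targets)
    return torch.cat(feats), torch.cat(labels)
```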

They only store the mean feature (prototype) of each class from previous tasks. When a new task arrives, each sample x_i of the new task is mapped into the feature space and its distance to the nearest class prototype is computed, both against the prototypes of the new task itself and against those stored for each previous task; this yields the distance distributions p(l) and q(l).
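A minimal sketch of the prototype storage and nearest-prototype distance, assuming Euclidean distance in the feature space; the helper names are illustrative, not the authors' code.

```python
import torch

def class_prototypes(features, labels):
    """Per-class mean feature; only these means need to be stored for past tasks."""
    return {int(c): features[labels == c].mean(dim=0) for c in labels.unique()}

def nearest_prototype_distances(features, prototypes):
    """Distance from each sample feature to the closest class prototype.

    features:   (N, D) features of new-task samples
    prototypes: dict {class_id: (D,) mean feature} for one task
    returns:    (N,) nearest-prototype distances
    """
    proto_matrix = torch.stack(list(prototypes.values()))  # (C, D)
    dists = torch.cdist(features, proto_matrix)            # (N, C) Euclidean distances
    return dists.min(dim=1).values
```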

Finally, they compute the KL divergence between p(l) and q(l) to decide whether the new task is related to any previous task: the divergence is compared against a predefined threshold alpha, and tasks with a divergence below alpha are treated as related.
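The notes do not spell out how the distance values l are turned into p(l) and q(l); one plausible reading, sketched below, histograms the nearest-prototype distances over shared bins and compares the two histograms with KL divergence against the threshold alpha. The binning, smoothing, and function names are assumptions.

```python
import torch

def distance_histogram(distances, bins, lo, hi, eps=1e-8):
    """Normalized histogram of nearest-prototype distances over shared bins."""
    hist = torch.histc(distances, bins=bins, min=lo, max=hi) + eps  # eps avoids log(0)
    return hist / hist.sum()

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions defined on the same bins."""
    return torch.sum(p * (p.log() - q.log()))

def tasks_related(dist_new, dist_old, alpha, bins=50):
    """Treat the tasks as related if the divergence falls below the threshold alpha."""
    lo = float(torch.min(dist_new.min(), dist_old.min()))
    hi = float(torch.max(dist_new.max(), dist_old.max()))
    p = distance_histogram(dist_new, bins, lo, hi)
    q = distance_histogram(dist_old, bins, lo, hi)
    return bool(kl_divergence(p, q) < alpha)
```

In practice this comparison would be repeated against the prototypes of every previous task, and the new task is treated as related to those whose divergence falls below alpha.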

Parameter Allocation:

They adopt cell-based neural architecture search (NAS) to find a suitable architecture for each new task, and propose a relatedness-aware, sampling-based architecture search strategy to improve efficiency.
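These notes do not describe how relatedness enters the search, so the sketch below only illustrates the general idea of a sampling-based strategy that favors reusing cells from the most related previous tasks; the scoring scheme, probabilities, and names are illustrative assumptions rather than the paper's procedure.

```python
import random

def sample_candidate_cells(previous_cells, relatedness, num_samples=5, new_cell_prob=0.2):
    """Sample candidate cells, biased toward those of the most related past tasks.

    previous_cells: dict {task_id: cell_description}
    relatedness:    dict {task_id: score}, higher = more related (e.g. 1 / KL divergence)
    """
    task_ids = list(previous_cells)
    weights = [relatedness[t] for t in task_ids]
    candidates = []
    for _ in range(num_samples):
        if not task_ids or random.random() < new_cell_prob:
            candidates.append("new_random_cell")  # occasionally explore a fresh cell
        else:
            reused = random.choices(task_ids, weights=weights, k=1)[0]
            candidates.append(previous_cells[reused])  # reuse a related task's cell
    return candidates
```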

Experimental Setup:

Evaluation Metrics:

Results:

PAR is evaluated in the task-incremental learning (TIL) setting and performs considerably better than EWC, LwF, MAS, GPM, A-GEM, Learn to Grow, Progressive Networks, and Efficient Continual Learning with Modular Networks and Task-Driven Priors.

Limitations: