Problem Statement:
- Mixtures of experts are usually trained all at once on the data of every task, which makes all experts prone to forgetting and increases the computational burden.
Research Goals:
- To address this limitation, the authors introduce a novel approach named SEED. For each new task, SEED selects a single expert, the one best suited to that task, and fine-tunes only this expert on the task's data.
Proposed Approach:
- The paper's overview figure describes the proposed method very well; a rough sketch of the training-time logic follows below.
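Since the figure itself is not reproduced here, this is a minimal sketch of my understanding of the training loop: fit per-class distributions in each expert's latent space, pick the expert where the new task's classes are best separated, and fine-tune only that expert. Everything here (the toy linear experts, the `separation` criterion) is illustrative, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def class_gaussians(feats, labels):
    """Fit a diagonal Gaussian (mean, variance) per class in latent space."""
    return {c: (feats[labels == c].mean(0), feats[labels == c].var(0) + 1e-6)
            for c in np.unique(labels)}

def separation(gauss):
    """Mean pairwise distance between class means: higher means the new
    task's classes are better separated in this expert's latent space
    (a stand-in for the paper's distribution-overlap criterion)."""
    means = [m for m, _ in gauss.values()]
    return np.mean([np.linalg.norm(a - b)
                    for i, a in enumerate(means) for b in means[i + 1:]])

# Toy "experts": fixed random linear feature extractors (64-d -> 32-d).
experts = [rng.normal(size=(32, 64)) for _ in range(3)]

def select_and_train(x_task, y_task):
    # 1) Score every expert on the new task's data, without any training.
    scores = [separation(class_gaussians(x_task @ W.T, y_task)) for W in experts]
    best = int(np.argmax(scores))
    # 2) Fine-tune ONLY the selected expert on this task's data
    #    (omitted here; in the paper a distillation term limits forgetting).
    print(f"task with classes {sorted(set(y_task))} -> expert {best}")

x = rng.normal(size=(200, 64))
select_and_train(x, rng.integers(0, 2, size=200))  # task 1: classes {0, 1}
select_and_train(x, rng.integers(2, 4, size=200))  # task 2: classes {2, 3}
```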
Experimental Setup:
- Datasets: CIFAR-100, DomainNet, ImageNet-Subset
- Hyperparameters
- Training setup: class-incremental learning (CIL). The paper's figure describes how the CIL splits are constructed; a rough sketch is given below.
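For reference, a small sketch of how a CIL split is typically built (my own helper, not the paper's code): the dataset's classes are shuffled and divided into disjoint groups, one group per task, and no task identity is given at test time.

```python
import numpy as np

def make_cil_tasks(labels, num_tasks, seed=0):
    """Split a dataset's classes into disjoint groups, one per task,
    as in class-incremental learning: each task introduces new classes.
    Returns, per task, the indices of the samples belonging to it."""
    rng = np.random.default_rng(seed)
    classes = rng.permutation(np.unique(labels))
    groups = np.array_split(classes, num_tasks)
    return [np.flatnonzero(np.isin(labels, g)) for g in groups]

# e.g. CIFAR-100 as 10 tasks of 10 classes each (stand-in labels here):
labels = np.random.default_rng(1).integers(0, 100, 50_000)
tasks = make_cil_tasks(labels, num_tasks=10)
print(len(tasks), [len(t) for t in tasks[:3]])
```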
Evaluation metrics:
- Incremental average accuracy and the average of incremental average accuracies (one common formulation is sketched below).
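As I understand these metrics (the paper's exact definition may differ slightly), the incremental average accuracy after step t is the mean accuracy over all tasks seen so far, and the second number averages these values over all steps:

```python
import numpy as np

def incremental_metrics(acc_matrix):
    """acc_matrix[t][j] = accuracy on task j's test classes after training
    on tasks 0..t. Incremental average accuracy after step t is the mean
    over tasks seen so far; the second metric averages these over steps."""
    A = np.asarray(acc_matrix, dtype=float)
    inc_avg = [float(np.mean(A[t, :t + 1])) for t in range(len(A))]
    return inc_avg, float(np.mean(inc_avg))

# Toy 3-task run (rows: after step t, columns: accuracy on task j).
acc = [[0.90, np.nan, np.nan],
       [0.85, 0.88, np.nan],
       [0.80, 0.84, 0.90]]
per_step, overall = incremental_metrics(acc)
print(per_step, overall)  # [0.90, 0.865, 0.8466...], ~0.8706
```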
Results:
Limitations:
- Since the pool of experts is fixed, at some point it will eventually get stuck and start to forget.
- I did not understand 100%: if we fine-tune the expert, even with a regularization term, how do we avoid catastrophically forgetting the previous tasks that this expert was already fine-tuned on?
- Also, the test-time figure is a bit strange. Say the orange expert was trained on cat and chicken, which is why it classifies the image as a cat. But then I would expect the green expert, trained on dog and rabbit, to respond to the cat image with "dog" (because that is the last thing it learned). I am probably missing something, but I have not figured out what yet.
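My best guess at what I am missing: if inference ensembles per-class (log-)likelihoods instead of forcing each expert to output one label, then the green expert simply assigns low likelihood to the cat image under both "dog" and "rabbit", and the orange expert's confident "cat" score dominates. The Gaussian-likelihood ensembling below is my assumption about the mechanism, not the paper's exact formula:

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-density of a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def ensemble_predict(x, experts):
    """experts: list of dicts {class_id: (mean, var)} in latent space
    (for brevity, x is used directly as the feature). Each class is
    scored only by the experts that model it, so an expert that never
    saw 'cat' is never forced to answer 'dog' -- it just gives low
    likelihood to the cat image under all of its own classes."""
    all_classes = {c for e in experts for c in e}
    scores = {c: [] for c in all_classes}
    for e in experts:
        for c, (m, v) in e.items():
            scores[c].append(log_gauss(x, m, v))
    # Average log-likelihood over the experts that model each class.
    return max(scores, key=lambda c: np.mean(scores[c]))

# Toy: orange expert knows cat(0)/chicken(1), green knows dog(2)/rabbit(3).
orange = {0: (np.zeros(4), np.ones(4)), 1: (np.full(4, 3.0), np.ones(4))}
green  = {2: (np.full(4, -3.0), np.ones(4)), 3: (np.full(4, 6.0), np.ones(4))}
x_cat = np.full(4, 0.1)                            # feature near the cat cluster
print(ensemble_predict(x_cat, [orange, green]))    # -> 0 (cat)
```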
Strengths:
- I think it addresses a very realistic, real-life setting.