It depends! The smaller models (7B and 8B parameters) forget more, while the larger ones (20B and 70B) forget less.