“Masked language modeling was a heavily explored and leveraged training objective in earlier language models like BERT.”