Train a mini language model:
https://github.com/jingyaogong/minimind
Learning Rate
RMSNorm
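
As a reminder of what RMSNorm computes, here is a minimal sketch in plain Python. It is the standard textbook form of the formula, not code taken from the minimind repository. The function name `rms_norm`, the `weight` gain vector, and the `eps` default are assumptions made for illustration.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Scale a vector by the inverse of its root-mean-square, then apply a gain.

    x      : list of floats (one hidden-state vector)
    weight : optional per-element gain (hypothetical name; defaults to all ones)
    eps    : small constant added inside the square root for numerical stability
    """
    if weight is None:
        weight = [1.0] * len(x)
    # Mean of squared elements; unlike LayerNorm, the mean is NOT subtracted first.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0]))
```

With the default gain, the output has a root-mean-square of roughly 1 whatever the scale of the input, so `rms_norm([3.0, 4.0])` and `rms_norm([30.0, 40.0])` give nearly the same result.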