Train a mini language model:

https://github.com/jingyaogong/minimind

Learning Rate

RMSNorm
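
A minimal sketch of RMSNorm, assuming the standard formulation: divide each vector by its root mean square, then scale element-wise by a learned weight. The function name `rms_norm`, the `eps` value, and the plain-list representation are assumptions for illustration and are not taken from the minimind repository.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the inverse of its root mean square, then by a per-element weight.

    x and weight are equal-length lists of floats; eps (an assumed default)
    guards against division by zero.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

# Example: with unit weights, the output has a mean square close to 1.
x = [1.0, 2.0, 3.0, 4.0]
y = rms_norm(x, [1.0] * len(x))
```

Unlike LayerNorm, this does not subtract the mean and has no bias term, so only the scale of the vector is normalized.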

Pre-train

SFT

Generate