Train a mini language model:
https://github.com/jingyaogong/minimind
RMSNorm
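
RMSNorm rescales each vector by its root mean square and then applies a learned per-feature gain, with no mean subtraction and no bias. Below is a minimal pure-Python sketch of that formula. It is not minimind's actual implementation, which would operate on tensors, and the names `rms_norm`, `weight`, and `eps` are assumptions chosen for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the inverse of its root mean square, then by a per-feature gain.

    x      -- list of floats (one feature vector)
    weight -- list of floats, same length as x (the learned gain; hypothetical name)
    eps    -- small constant added under the square root to avoid division by zero
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of the squared features (no mean-centering, unlike LayerNorm).
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Normalize, then apply the elementwise gain.
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    # With a gain of all ones, the output has a root mean square close to 1.
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

With the gain initialised to ones, the output vector has a root mean square of approximately 1 regardless of the input scale. Skipping the mean subtraction is what makes this cheaper than LayerNorm.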