Trinity Nano Preview: 6B parameter MoE (1B active, ~800M non-embedding), 56 layers, 128 experts with 8 active per token
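
For context on what "128 experts with 8 active per token" means in practice, here is a hedged sketch of a top-8-of-128 routed FFN layer (hidden sizes, activation, and router details are assumptions for illustration, not Trinity's actual configuration). Only the selected experts run for each token, which is how a model with ~6B total parameters can have only ~1B active per token.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoELayer(nn.Module):
        """Illustrative sparse MoE FFN: 128 experts, 8 routed per token."""

        def __init__(self, d_model=1024, d_ff=512, num_experts=128, top_k=8):
            super().__init__()
            self.top_k = top_k
            # Router scores every expert for every token.
            self.router = nn.Linear(d_model, num_experts, bias=False)
            # Each expert is a small independent FFN (sizes are placeholders).
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            logits = self.router(x)                           # (tokens, num_experts)
            weights, indices = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)              # renormalize over the 8 chosen experts
            out = torch.zeros_like(x)
            # Dispatch each token only to its selected experts; the other 120 never run.
            for slot in range(self.top_k):
                for e in indices[:, slot].unique():
                    mask = indices[:, slot] == e
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
            return out

    # Usage: y = TopKMoELayer()(torch.randn(16, 1024))
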

Trinity Mini: 26B parameter MoE (3B active), fully post-trained reasoning model

They did the pretraining themselves, and the large version is still training on 2,048 B300 GPUs.

