
> trained from scratch on 80B tokens of historical data

How can this thing possibly be even remotely coherent when its pretraining used only a fine-tuning-sized amount of data?