moffkalast 16 hours ago | on: History LLMs: Models trained exclusively on pre-19...

> trained from scratch on 80B tokens of historical data

How can this thing possibly be even remotely coherent when the pretraining data is only about the size of a fine-tuning dataset?