How can this thing possibly be even remotely coherent when it was pretrained on only a fine-tuning-sized amount of data?