
> I don't think there's evidence that this issue would persist after continuing to scale models to be larger and doing more RL

And how much larger do we need to make the models? 2x? 3x? 10x? 100x? How large do they need to get before scaling-up somehow solves everything?

Because: 2x larger means roughly 2x more memory and compute required. Double the cost, or half the capacity. Would people still pay for this tech if it doubled in price? Bear in mind, much of it is already running at a loss even now.
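To make that arithmetic concrete, here is a rough back-of-envelope sketch. The 70B baseline and the fp16 assumption are mine, purely for illustration; the point is that weight memory alone scales linearly with parameter count, before you even count KV cache, activations, or training cost:

```python
# Back-of-envelope sketch (illustrative numbers, not from the thread):
# parameter count -> approximate weight memory at fp16/bf16.

def inference_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, ignoring KV cache and activations."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for scale in (1, 2, 10, 100):
    params = 70 * scale  # hypothetical 70B baseline, scaled up
    print(f"{scale:>4}x -> ~{params}B params, ~{inference_memory_gb(params):.0f} GB just for weights")
```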

And what if 2x isn't good enough? Would anyone pay for a 10x larger model? Can we even realistically run such models as anything other than a very expensive PoC, and for a very short time? And who's to say that even 10x will finally solve things? What if we need 40x? Or 100x?

Oh, and of course: Larger models also require more data to train them on. And while the Internet is huge, it's still finite. And when things grow geometrically, even `sizeof(internet)` eventually runs out ... and, in fact, may have done so already. [1] [2]
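A quick sketch of the data side, assuming a Chinchilla-style rule of thumb of roughly 20 training tokens per parameter and a hypothetical budget of ~30T usable web-text tokens. Both numbers are illustrative assumptions, not measurements, but they show how fast a geometric scale-up eats a fixed corpus:

```python
# Illustrative assumptions: ~20 tokens/param (Chinchilla-style), ~30T usable tokens.
TOKENS_PER_PARAM = 20
WEB_TEXT_BUDGET = 30e12  # assumed usable high-quality tokens

for params_billion in (70, 140, 700, 7000):  # 1x, 2x, 10x, 100x of a 70B baseline
    tokens_needed = params_billion * 1e9 * TOKENS_PER_PARAM
    print(f"{params_billion:>5}B params -> {tokens_needed/1e12:6.1f}T tokens "
          f"({tokens_needed / WEB_TEXT_BUDGET:.2f}x of the assumed budget)")
```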

What if we actually discover that scaling up doesn't even work at all, because of diminishing returns? Oh wait, looks like we did that already: [3]

[1]: https://observer.com/2024/12/openai-cofounder-ilya-sutskever...

[2]: https://biztechweekly.com/ai-training-data-crisis-how-synthe...

[3]: https://garymarcus.substack.com/p/confirmed-llms-have-indeed...



Scaling applies to multiple dimensions simultaneously over time. A frontier model today could be replicated a year later with a model half the size, with a quarter of the FLOPS, etc. I don’t know the real numbers for optimization scaling, but you could check out NanoGPT speedrun [1] as an example.

The best solution in the meantime is giving the LLM a harness that allows tool use, like what coding agents have. I suspect current models are fully capable of solving arbitrarily complex artificial reasoning problems here, provided that they're used in the context of a coding agent tool (a minimal sketch of such a harness follows below).

[1] https://github.com/KellerJordan/modded-nanogpt
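To make "harness" concrete, here is a minimal sketch of that kind of loop. `call_model`, the message format, and the single `run_python` tool are all hypothetical stand-ins, not any particular vendor's API; the shape of the loop is the point:

```python
# Minimal agent-harness sketch: query the model in a loop; it may either answer
# or request the (only) tool, which runs model-written Python and feeds back the output.
import json
import subprocess

def call_model(messages: list[dict]) -> dict:
    """Placeholder for a real LLM API call; expected to return either
    {"tool": "run_python", "code": "..."} or {"answer": "..."}."""
    raise NotImplementedError("wire up your model of choice here")

def run_python(code: str) -> str:
    """Execute model-written code in a subprocess and capture its output."""
    proc = subprocess.run(["python", "-c", code],
                          capture_output=True, text=True, timeout=30)
    return proc.stdout + proc.stderr

def agent_loop(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        if "answer" in reply:
            return reply["answer"]
        # Otherwise run the requested code and hand the result back to the model.
        tool_output = run_python(reply["code"])
        messages.append({"role": "tool", "content": json.dumps({"result": tool_output})})
    return "gave up after max_steps"
```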


> Scaling applies to multiple dimensions simultaneously over time. A frontier model today could be replicated a year later with a model half the size

Models "emergent capabilities" rely on encoding statistical information about text in their learnable params (weights and biases). Since we cannot compress information arbitrarily without loss, there is a lower bound on how few params we can have, before we lose information, and thus capabilities in the models.

So while it may be possible in some cases to get similar capabilities with a slightly smaller model, this development is limited and cannot go on indefinitely. If it were otherwise, we could eventually fit an LLM on the level of GPT-3 into 1 KB of space, and I think we can both agree that this isn't possible.
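A toy bit-count makes the lower-bound argument concrete. The corpus size, the compression ratio, and the model sizes below are all illustrative assumptions, not measurements; the comparison just shows why a kilobyte or megabyte of weights cannot carry pretraining-scale information, while hundreds of billions of parameters plausibly can:

```python
# Toy arithmetic behind "information can only be compressed so far".
# Assumptions: a 500 GB text corpus and an optimistic 10x lossless compression ratio.

def capacity_bits(param_bytes: float) -> float:
    return param_bytes * 8  # upper bound: every stored bit counts as usable information

CORPUS_BYTES = 500e9            # assumed pretraining-scale text corpus
OPTIMISTIC_COMPRESSION = 10     # assume the corpus could be losslessly shrunk 10x
corpus_bits = CORPUS_BYTES * 8 / OPTIMISTIC_COMPRESSION

for label, param_bytes in [("1 KB model", 1024),
                           ("1 MB model", 1024**2),
                           ("175B fp16 model", 175e9 * 2)]:
    ratio = corpus_bits / capacity_bits(param_bytes)
    print(f"{label:>16}: corpus holds ~{ratio:,.1f}x the bits the weights can store")
```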

> giving the LLM a harness that allows tool use like what coding agents have

Given the awful performance of most coding "agents" on anything but the most trivial problems, I am not sure about that at all.


Some problems are just too complex, and the effort to solve them increases exponentially. No LLM can keep up with exponentially increasing effort unless you run it for an adequate number of years.
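A quick sketch of how fast exponential effort outruns any fixed speed-up; the brute-force framing and the 1e9 candidates/second rate are arbitrary assumptions for illustration:

```python
# Exponential search effort vs. a generous fixed checking rate (assumed 1e9/s).
SECONDS_PER_YEAR = 3.15e7
RATE = 1e9  # candidates checked per second (assumed)

for n in (30, 50, 80, 100):
    years = 2**n / RATE / SECONDS_PER_YEAR
    print(f"n={n:>3}: 2^{n} candidates -> ~{years:.2g} years at {RATE:.0e}/s")
```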


What? Fundamentally, information can only be so dense. Current models may be inefficient w.r.t. information density; however, there is a lower bound on the compute required. As a pathological example, we shouldn't expect a megabyte worth of parameters to be able to encode the entirety of Wikipedia.



