
> I don't think there's evidence that this issue would persist after continuing to scale models to be larger and doing more RL

And how much larger do we need to make the models? 2x? 3x? 10x? 100x? How large do they need to get before scaling-up somehow solves everything?

Because: 2x larger means roughly 2x more memory and compute required. Double the cost, or half the capacity. Would people still pay for this tech if it doubled in price? Bear in mind, much of it is already running at a loss even now.
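To make that arithmetic concrete, here is a rough back-of-envelope sketch. The 70B baseline and the fp16 assumption are mine, purely for illustration; the point is that weight memory alone scales linearly with parameter count, before you even count KV cache, activations, or training cost:

```python
# Back-of-envelope sketch (illustrative numbers, not from the thread):
# parameter count -> approximate weight memory at fp16/bf16.

def inference_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, ignoring KV cache and activations."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for scale in (1, 2, 10, 100):
    params = 70 * scale  # hypothetical 70B baseline, scaled up
    print(f"{scale:>4}x -> ~{params}B params, ~{inference_memory_gb(params):.0f} GB just for weights")
```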

And what if 2x isn't good enough? Would anyone pay for a 10x larger model? Can we even realistically run such models as anything other than a very expensive PoC, and for a very short time? And who's to say that even 10x will finally solve things? What if we need 40x? Or 100x?

Oh, and of course: Larger models also require more data to train them on. And while the Internet is huge, it's still finite. And when things grow geometrically, even `sizeof(internet)` eventually runs out ... and, in fact, may have done so already. [1] [2]
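A quick sketch of the data side, assuming a Chinchilla-style rule of thumb of roughly 20 training tokens per parameter and a hypothetical budget of ~30T usable web-text tokens. Both numbers are illustrative assumptions, not measurements, but they show how fast a geometric scale-up eats a fixed corpus:

```python
# Illustrative assumptions: ~20 tokens/param (Chinchilla-style), ~30T usable tokens.
TOKENS_PER_PARAM = 20
WEB_TEXT_BUDGET = 30e12  # assumed usable high-quality tokens

for params_billion in (70, 140, 700, 7000):  # 1x, 2x, 10x, 100x of a 70B baseline
    tokens_needed = params_billion * 1e9 * TOKENS_PER_PARAM
    print(f"{params_billion:>5}B params -> {tokens_needed/1e12:6.1f}T tokens "
          f"({tokens_needed / WEB_TEXT_BUDGET:.2f}x of the assumed budget)")
```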

What if we actually discover that scaling up doesn't even work at all, because of diminishing returns? Oh wait, looks like we did that already: [3]

[1]: https://observer.com/2024/12/openai-cofounder-ilya-sutskever...

[2]: https://biztechweekly.com/ai-training-data-crisis-how-synthe...

[3]: https://garymarcus.substack.com/p/confirmed-llms-have-indeed...



Scaling applies to multiple dimensions simultaneously over time. A frontier model today could be replicated a year later with a model half the size, with a quarter of the FLOPS, etc. I don’t know the real numbers for optimization scaling, but you could check out NanoGPT speedrun [1] as an example.

The best solution in the meantime is giving the LLM a harness that allows tool use, like what coding agents have. I suspect current models are fully capable of solving arbitrarily complex artificial reasoning problems here, provided that they're used in the context of a coding agent tool (a minimal sketch of such a harness follows below).

[1] https://github.com/KellerJordan/modded-nanogpt
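To make "harness" concrete, here is a minimal sketch of that kind of loop. `call_model`, the message format, and the single `run_python` tool are all hypothetical stand-ins, not any particular vendor's API; the shape of the loop is the point:

```python
# Minimal agent-harness sketch: query the model in a loop; it may either answer
# or request the (only) tool, which runs model-written Python and feeds back the output.
import json
import subprocess

def call_model(messages: list[dict]) -> dict:
    """Placeholder for a real LLM API call; expected to return either
    {"tool": "run_python", "code": "..."} or {"answer": "..."}."""
    raise NotImplementedError("wire up your model of choice here")

def run_python(code: str) -> str:
    """Execute model-written code in a subprocess and capture its output."""
    proc = subprocess.run(["python", "-c", code],
                          capture_output=True, text=True, timeout=30)
    return proc.stdout + proc.stderr

def agent_loop(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        if "answer" in reply:
            return reply["answer"]
        # Otherwise run the requested code and hand the result back to the model.
        tool_output = run_python(reply["code"])
        messages.append({"role": "tool", "content": json.dumps({"result": tool_output})})
    return "gave up after max_steps"
```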


> Scaling applies to multiple dimensions simultaneously over time. A frontier model today could be replicated a year later with a model half the size

Models "emergent capabilities" rely on encoding statistical information about text in their learnable params (weights and biases). Since we cannot compress information arbitrarily without loss, there is a lower bound on how few params we can have, before we lose information, and thus capabilities in the models.

So while it may be possible in some cases to get similar capabilities with a slightly smaller model, this development is limited and cannot go on indefinitely. If it were otherwise, we could eventually fit an LLM on the level of GPT-3 into 1 KB of space, and I think we can both agree that this isn't possible.
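A toy bit-count makes the lower-bound argument concrete. The corpus size, the compression ratio, and the model sizes below are all illustrative assumptions, not measurements; the comparison just shows why a kilobyte or megabyte of weights cannot carry pretraining-scale information, while hundreds of billions of parameters plausibly can:

```python
# Toy arithmetic behind "information can only be compressed so far".
# Assumptions: a 500 GB text corpus and an optimistic 10x lossless compression ratio.

def capacity_bits(param_bytes: float) -> float:
    return param_bytes * 8  # upper bound: every stored bit counts as usable information

CORPUS_BYTES = 500e9            # assumed pretraining-scale text corpus
OPTIMISTIC_COMPRESSION = 10     # assume the corpus could be losslessly shrunk 10x
corpus_bits = CORPUS_BYTES * 8 / OPTIMISTIC_COMPRESSION

for label, param_bytes in [("1 KB model", 1024),
                           ("1 MB model", 1024**2),
                           ("175B fp16 model", 175e9 * 2)]:
    ratio = corpus_bits / capacity_bits(param_bytes)
    print(f"{label:>16}: corpus holds ~{ratio:,.1f}x the bits the weights can store")
```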

> giving the LLM a harness that allows tool use like what coding agents have

Given the awful performance of most coding "agents" on anything but the most trivial problems, I am not sure about that at all.


Some problems are just too complex, and the effort to solve them increases exponentially. No LLM can keep up with exponentially increasing effort unless you run it for an adequate number of years.
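A quick sketch of how fast exponential effort outruns any fixed speed-up; the brute-force framing and the 1e9 candidates/second rate are arbitrary assumptions for illustration:

```python
# Exponential search effort vs. a generous fixed checking rate (assumed 1e9/s).
SECONDS_PER_YEAR = 3.15e7
RATE = 1e9  # candidates checked per second (assumed)

for n in (30, 50, 80, 100):
    years = 2**n / RATE / SECONDS_PER_YEAR
    print(f"n={n:>3}: 2^{n} candidates -> ~{years:.2g} years at {RATE:.0e}/s")
```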


What? Fundamentally, information can only be so dense. Current models may be inefficient w.r.t. information density; however, there is a lower bound on the compute required. As a pathological example, we shouldn't expect a megabyte worth of parameters to be able to encode the entirety of Wikipedia.



