I thought that on 64-bit systems the stack could basically be infinite for all practical purposes. Is this only a problem on more limited (i.e. <64-bit) systems, or am I misremembering what I learned about stacks 10+ years ago?
It is always a problem, in the same way that holding on to huge objects is always a problem. Let's think about this in terms of a memory map, so ignoring limits of physical memory. On 64-bit systems we actually have 48 bits of usable address space, so that's 256TB, and the OS typically keeps half of that for the kernel, leaving roughly 128TB for user space. So if you're only going to run 4 threads then each one could have a huge stack of 32TB, but if we wanted to run a huge server with a thread per connection, say half a million of them, then maybe we can only have 256MB per thread.
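To make the "each thread gets its own contiguous slice of the memory map" point concrete, here's a minimal Rust sketch (std::thread::Builder::stack_size is the real API; the 256MB figure and the thread names are just the illustrative numbers from above):

```rust
use std::thread;

fn main() {
    // Per-thread budget from the thought experiment above: 256MB.
    const STACK_SIZE: usize = 256 * 1024 * 1024;

    let handles: Vec<_> = (0..4)
        .map(|i| {
            thread::Builder::new()
                .name(format!("worker-{i}"))
                // Each thread gets its own contiguous stack reservation
                // carved out of the shared virtual address space.
                .stack_size(STACK_SIZE)
                .spawn(move || {
                    println!("thread {i} running with a 256MB stack reservation");
                })
                .expect("failed to spawn thread")
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```

The reservation is address space, not physical memory; pages only get backed as each stack is actually touched.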
Now, 256MB is pretty huge, but we probably want far more of that address space reserved for heap storage and similar things (because managing lifetimes purely on the stack is hard and might require a lot of copying of data), so in practice it will be less than that. You may ask, "Why not start with small stacks and make them bigger as needed?" It's a good question, but since our stacks are contiguous areas of memory and we can't safely relocate them later (we don't know what's in them, and anything pointing into a stack would be left dangling), we still have to space them out in our memory map to allow for as much growth as might ever be needed.
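Here's a rough sketch of that "can't grow past the reservation" behaviour (Rust; the 64KB stack, 4KB buffer, and recursion depth are arbitrary illustrative numbers, and it assumes a platform with Rust's usual guard-page overflow detection):

```rust
use std::hint::black_box;
use std::thread;

// Each frame keeps a 4KB buffer alive, and black_box stops the compiler
// from shrinking the frame or turning the recursion into a loop.
fn burn_stack(depth: u32) -> u32 {
    let buf = [0u8; 4096];
    black_box(&buf);
    if depth == 0 {
        0
    } else {
        // Not a tail call: we still add 1 after the recursive call returns.
        1 + burn_stack(depth - 1)
    }
}

fn main() {
    // A deliberately tiny 64KB stack: after a handful of 4KB frames the
    // recursion walks into the guard page and the runtime aborts the whole
    // process with "has overflowed its stack". The contiguous reservation
    // cannot be grown or relocated after the fact.
    let handle = thread::Builder::new()
        .stack_size(64 * 1024)
        .spawn(|| burn_stack(1_000))
        .expect("failed to spawn thread");

    // Blocks here until the overflow aborts the process.
    let _ = handle.join();
}
```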
We might solve some of these problems by introducing segmented stacks, but this is one of those problems that crosses so many language, runtime, and OS boundaries that it's been hard to do in the general case. It feels like gently nudging people towards writing code that can be tail-call-optimised is easier, just as it's easier to push people to use async than it is to provide systems and APIs that would allow for blocking code and a huge number of threads.
Thanks, I didn't realise the usable address space was so "small". Those numbers are well within reasonable usage, so it's definitely too limiting to just call the stack infinite and not worry about it.
Apart from address space, there’s also simply the question of wasted resources, because in tail-call-optimizable programs, that stack space is by definition not strictly needed (otherwise it couldn’t be optimized away), so you might unnecessarily be wasting many GBs of memory. Lastly, TCO can improve cache locality and thus potentially speed up the program.
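For a concrete picture, here's a Rust sketch (Rust doesn't guarantee tail-call elimination today, so read it as illustrating the two shapes rather than promising the optimization happens):

```rust
// Not tail-recursive: the addition happens *after* the recursive call
// returns, so all n frames must stay alive at once -- O(n) stack.
fn sum_to(n: u64) -> u64 {
    if n == 0 { 0 } else { n + sum_to(n - 1) }
}

// Tail-recursive: the recursive call is the last thing the function does,
// so its frame is "by definition not strictly needed" afterwards. A
// compiler performing TCO reuses that one frame, giving constant stack
// space (and a small, cache-friendly working set), exactly like the
// equivalent hand-written loop.
fn sum_to_tail(n: u64, acc: u64) -> u64 {
    if n == 0 { acc } else { sum_to_tail(n - 1, acc + n) }
}

fn main() {
    // Small n so this runs even with no optimization at all; with TCO (or
    // a loop) sum_to_tail handles arbitrarily large n without growing the
    // stack.
    println!("{}", sum_to(10_000));
    println!("{}", sum_to_tail(10_000, 0));
}
```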
Yes and no. In theory it would be totally fine to "allocate" a stack that was exabytes in size (or whatever extremely large number). Thanks to virtual memory no one really cares... if that memory isn't actually being used. But as your stack grows larger and larger, the OS has to find pages of physical memory to back that allocation, and you're simply going to run out of memory. A high-end home PC or average server is only going to have 64-128 GB of RAM; a very high-end server might have a few TB. Of course, when you start to run out of physical memory the OS will start paging (dumping data from RAM to disk to free up memory), but that will kill the performance of your app even with high-end SSDs. If you keep chugging along with your stack growth you'll probably hit OS limits on page file size next; if not, you'll eventually run out of disk space.
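To see the virtual-vs-physical split in action, here's a Linux-specific Rust sketch (it assumes the libc crate; the 1TB reservation and 1GB touched are arbitrary numbers): the mmap reservation costs essentially nothing, and resident memory only grows as pages are actually written.

```rust
// Linux-specific sketch; assumes `libc = "0.2"` in Cargo.toml.
use std::ptr;

fn main() {
    const RESERVATION: usize = 1 << 40; // 1TB of address space
    const PAGE: usize = 4096;

    // Reserve 1TB of virtual address space. MAP_NORESERVE asks the kernel
    // not to account for it up front, so this succeeds even on a machine
    // with far less RAM -- no physical memory is used yet.
    let base = unsafe {
        libc::mmap(
            ptr::null_mut(),
            RESERVATION,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_NORESERVE,
            -1,
            0,
        )
    };
    assert_ne!(base, libc::MAP_FAILED, "mmap failed");

    // Physical pages are only handed out as they are actually written.
    // Touch the first 1GB, one byte per page: resident memory grows by
    // about 1GB while the remaining ~1023GB stay purely virtual.
    let touched: usize = 1 << 30;
    for offset in (0..touched).step_by(PAGE) {
        unsafe { *(base as *mut u8).add(offset) = 1 };
    }

    println!("reserved 1TB, touched {}MB; compare VIRT vs RES in top", touched >> 20);

    unsafe { libc::munmap(base, RESERVATION) };
}
```

Keep touching pages past what RAM and swap can back and you get exactly the paging and then out-of-memory behaviour described above.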