You can try to write it in Rust, doesn't mean you'll succeed. Rust targets the abstract machine, i.e. the wonderful land of optimizing compilers, which can copy your data anywhere they want and optimize out any attempts to scramble the bytes. What we'd need for this in Rust would be an integration with LLVM, and likely a number of modifications to LLVM passes, so that temporarily moved data can be tracked and erased. The only reason Go can even begin to do this is they have their own compiler suite.
I'm pretty sure you could do it with inline assembly, which targets the actual machine.
You could definitely zero registers that way, and a allocator that zeros on drop should be easy. The only tricky thing would be zeroing the stack - how do you know how deep to go? I wonder what Go's solution to that is...
I meeeeean... plenty of functions allocate internally and don't let the user pass in an allocator. So it's not clear to me how to do this at least somewhat universally. You could try to integrate it into the global allocator, I suppose, but then how do you know which allocations to wipe? Should anything allocated in the secret mode be zeroed on free? Or should anything be zeroed if the deallocation happens while in secret mode? Or are both of these necessary conditions? It seems tricky to define rigidly.
And stack's the main problem, yeah. It's kind of the main reason why zeroing registers is not enough. That and inter-procedural optimizations.
So you’re correct that covering the broadest general case is problematic. You have to block code from doing IO of any form to be safe.
In general though getting to a fairly predictable place is possible and the typical case of key material shouldn’t have highly arbitrary stacks, if you do you’re losing (see io comment above).
https://docs.rs/zeroize/1.8.1/zeroize/ has been effective for some users, it’s helped black box tests searching for key material no longer find it. There are also some docs there on how to avoid common pitfalls and links to ongoing language level discussions on the remaining and more complex register level issues.
It's not clear to me how true your comment is. I think that if things were as unpredictable as you are saying, there would be insane memory leaks all over the place in Rust (let alone C++) that would be the fault of compilers as opposed to programs, which does not align with my understanding of the world.
"Memory leaks" would be a mischaracterisation. "Memory leak" typically refers to not freeing heap-allocated data, while I'm talking about data being copied to temporary locations, most commonly on the stack or in registers.
In a nutshell, if you have a function like
fn secret_func() -> LargeType {
/* do some secret calculations */
LargeType::init_with_safe_Data()
}
...then even if you sanitize heap allocations and whatnot, there is still a possibility that those "secret calculations" will use the space allocated for the return value as a temporary location, and then you'll have secret data leaked in that type's padding.
More realistically, I'm assuming you're aware that optimizing compilers often simplify `memset(p, 0, size); free(p);` to `free(p);`. A compiler frontend can use things like `memset_s` to force rewrites, but this will only affect the locals created by the frontend. It's entirely possible that the LLVM backend notices that the IR wants to erase some variable, and then decides to just copy the data to another location on the stack and work with that, say to utilize instruction-level parallelism.
I'm partially talking out of my ass here, I don't actually know if LLVM utilizes this. I'm sure it does for small types, but maybe not with aggregates? Either way, this is something that can break very easily as optimizing compilers improve, similarly to how cryptography library authors have found that their "constant-time" hacks are now optimized to conditional jumps.
Of course, this ignores the overall issue that Rust does not have a runtime. If you enter the secret mode, the stack frames of all nested invoked functions needs to be erased, but no information about the size of that stack is accessible. For all you know, memcpy might save some dangerous data to stack (say, spill the vector registers or something), but since it's implemented in libc and linked dynamically, there is simply no information available on the size of the stack allocation.
This is a long yap, but personally, I've found that trying to harden general-purpose languages simply doesn't work well enough. Hopefully everyone realizes by now that a borrow checker is a must if you want to prevent memory unsoundness issues in a low-level language; similarly, I believe an entirely novel concept is needed for cryptographical applications. I don't buy that you can just bolt it onto an existing language.