It's very likely that VPNs like this are not CPU-bound, even on somewhat whimpy CPUs. I'd wager even some microcontrollers could sling 500megabits/sec around without trouble.
That CPU is pretty much a toy compared to (say) a brand-new M5 or EPYC chip, but it similarly eclipses almost any MCU you can buy.
Even with fast AES acceleration on the CPU/MCU — which I think some Cortex MCUs have — you’re really going to struggles to get much over 100Mbits of encrypted traffic handling, and that’s before the I/O handling interrupts take over the whole chip to shuttle packets on and off the wire.
Modern crypto is cheap for what you get, but it’s still a lot of extra math in the mix when you’re trying to pump bytes in and out of a constrained device.
You're looking at the wrong thing, WireGuard doesn't use AES, it uses ChaCha20. AES is really, really painful to implement securely in software only, and the result performs poorly. But ChaCha only uses addition rotation and XOR with 32 bit numbers and that makes it pretty performant even on fairly computationally limited devices.
For reference, I have an implementation of ChaCha20 running on the RP2350 at 100MBit/s on a single core at 150Mhz (910/64 = ~14.22 cycles per bytes). That's a lot for a cheap microcontroller costing around 1.5 bucks total. And that's not even taking into account using the other core the RP2350 has, or overclocking (runs fine at 300Mhz also at double the speed).
You’re totally right; I got myself spun around thinking AES instead of of ChaCha because the product I work on (ZeroTier) started with the initially and moved to AES later. I honestly just plain forgot that WireGuard hadn’t followed the same path.
An embarrassing slip, TBH. I’m gonna blame pre-holiday brain fog.