Your suggestion confuses latency with throughput, so it isn't correct.
For example, a modern CPU can execute other instructions while waiting on a cache miss, and can also have multiple cache loads in flight at once (especially for caches shared between cores).
Main memory is asynchronous too, so multiple loads might be in flight per memory channel. The same goes for all the other layers here (multiple SSD transactions in flight at once, multiple network requests, etc.).
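To make that concrete, here's a rough C sketch of my own (not anyone's production code; it assumes a POSIX clock and an ordinary cache hierarchy, and the numbers will vary a lot by machine). It chases one dependent pointer chain, then four independent chains over the same array. The second case does the same total number of loads but usually runs several times faster, because the CPU keeps several cache/memory misses in flight at once:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1u << 24)  /* 16M slots, far bigger than any cache */

    static uint64_t rng = 88172645463325252ULL;
    static uint64_t xorshift64(void) {  /* tiny PRNG, avoids RAND_MAX limits */
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        return rng;
    }

    static double now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        size_t *next = malloc(N * sizeof *next);
        for (size_t i = 0; i < N; i++) next[i] = i;
        /* Sattolo's algorithm: one big random cycle, so every chase
           walks the whole array and nearly every load misses cache. */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = xorshift64() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        double t0 = now();                 /* 1 chain: each load waits */
        size_t p = 0;                      /* on the previous miss     */
        for (size_t i = 0; i < N; i++) p = next[p];
        double dep = now() - t0;

        t0 = now();                        /* 4 chains: same loads, but */
        size_t a = 0, b = 1, c = 2, d = 3; /* the misses overlap        */
        for (size_t i = 0; i < N / 4; i++) {
            a = next[a]; b = next[b]; c = next[c]; d = next[d];
        }
        double indep = now() - t0;

        /* Printing the sums keeps the loops from being optimized away. */
        printf("1 chain: %.2fs   4 chains: %.2fs   (sink %zu)\n",
               dep, indep, p + a + b + c + d);
        return 0;
    }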
Approximately everything in modern computers is async at the hardware level, often with multiple units handling the execution of the "thing". All the way from the network and the SSD down to the ALUs (arithmetic logic units) in the CPU.
Modern CPUs are pipelined (and have been in mainstream chips since at least the mid 90s), so they will be executing one instruction, decoding the next, and retiring (writing out the result of) the previous one all at once. Real pipelines have far more than the 3 basic stages I just listed, and they can also reorder instructions, execute them in parallel, etc.
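You can see the pipeline at work with another sketch of mine (same POSIX-clock assumption; treat the timings as illustrative, since optimizers differ). One chain of dependent multiply-adds is limited by the latency of each operation, while four independent chains do the same amount of work but overlap in the pipeline:

    #include <stdio.h>
    #include <time.h>

    #define ITERS 400000000ULL

    static double now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        /* One dependency chain: every multiply-add waits on the last. */
        double t0 = now();
        unsigned long long s = 1;
        for (unsigned long long i = 0; i < ITERS; i++) s = s * 3 + i;
        double dep = now() - t0;

        /* Four independent chains: same op count, but the pipeline
           can overlap them because no chain waits on another. */
        t0 = now();
        unsigned long long s0 = 1, s1 = 1, s2 = 1, s3 = 1;
        for (unsigned long long i = 0; i < ITERS; i += 4) {
            s0 = s0 * 3 + i;
            s1 = s1 * 3 + i;
            s2 = s2 * 3 + i;
            s3 = s3 * 3 + i;
        }
        double indep = now() - t0;

        printf("1 chain: %.2fs   4 chains: %.2fs   (sink %llu)\n",
               dep, indep, s + s0 + s1 + s2 + s3);
        return 0;
    }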
I'm aware of this to an extent. Do you know of any list of what degree of parallelization to expect out of various components? I know this whole napkin-math thing is mostly futile and the answer should mostly be "go test it", but just curious.
I was interviewing recently and was asked about implementing a web crawler; we then discussed bottlenecks (network fetches for the pages, writing the content to disk, CPU usage for things like parsing the responses) and parallelism, and I wanted to just say "well, I'd test it to figure out what I was bottlenecked on and then iterate on my solution".
Napkin math is how you avoid spending several weeks of your life going down ultimately futile rabbit holes. Yes, they're approximations, often very coarse ones, but done right they do work.
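For your crawler, a first pass might look like the little C program below. Every figure in it is an assumption I made up for illustration; the point is the structure of the estimate, not the numbers:

    #include <stdio.h>

    int main(void) {
        /* All figures below are assumptions, plug in your own. */
        double net_MBps   = 125.0;   /* ~1 Gbit/s link              */
        double page_MB    = 0.1;     /* ~100 KB per page            */
        double parse_MBps = 200.0;   /* parser throughput per core  */
        double disk_MBps  = 1000.0;  /* sequential write to one SSD */
        int    cores      = 8;

        printf("network: %6.0f pages/s\n", net_MBps / page_MB);
        printf("parsing: %6.0f pages/s on %d cores\n",
               parse_MBps * cores / page_MB, cores);
        printf("disk:    %6.0f pages/s\n", disk_MBps / page_MB);
        /* The smallest number is the likely bottleneck. With these
           inputs it's the network, so more parser threads won't help
           until the link gets faster. */
        return 0;
    }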
Your question about what degree of parallelization to expect is unfortunately too vague to really answer. SSDs offer some internal parallelism. Need more parallelism / IOPS? You can stick a lot more SSDs in your machine. Need many machines' worth of SSDs? Disaggregate them, but now you need to think about your network bandwidth, NICs, cross-machine latency, and fault tolerance.
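Same style of estimate for the SSD case, again with assumed round numbers rather than any particular drive's spec sheet:

    #include <stdio.h>

    int main(void) {
        /* Assumed order-of-magnitude figures, not measurements. */
        double iops_per_ssd = 500e3;  /* random-read IOPS, deep queue */
        double target_iops  = 5e6;    /* what the workload needs      */

        printf("~%.0f SSDs needed, before headroom for writes and GC\n",
               target_iops / iops_per_ssd);
        /* Past one chassis you disaggregate, and then NICs, fabric
           bandwidth, and fault tolerance join the napkin. */
        return 0;
    }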
The best engineers I've seen are usually excellent at napkin math.