I read this doc and it completely blew my mind.
https://www.haiku-os.org/legacy-docs/benewsletter/Issue4-8.html
I have done some simple embedded driver development, but graphics cards, even in the 90s, look like beasts to me.
I don't think there are any books on this topic -- the best thing we have is Linux Device Drivers, and I don't think any book is going to dive deep into graphics card driver development. If I want to know the details, I'll probably read the source code of OSS drivers.
I'm wondering if there are more stories or blogs like this (maybe from the 80s too, remember those Hercules cards?). It really warms me up thinking about sitting in a cube, writing code for device drivers, reading docs everywhere, banging my head on every solid wall until I start to see code in the air, quaffing one coffee after another, working deep into the night... I know it's way more romantic than the real story but I can't keep myself from wondering about it.
I think you and other commenters pretty much summarized what it was like. Documentation was often poor, so you would sometimes have to reach out to the folks who had actually written the hardware (or the simulator) for guidance.
That is the easy part of writing a driver, even today. Just follow the specification. The code in a GPU driver is relatively simple, and doesn't vary that much from one generation to the next. In the 90s some features didn't have hardware support, so the driver would do a bunch of math on the CPU instead, which was slow.
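To make the "math on the CPU" part concrete, here is a rough sketch of what such a fallback might look like. Everything in it is made up for illustration (`fake_device`, `has_hw_transform`, `submit_vertex`); it is not taken from any real driver, and I'm assuming vertex transformation as the missing feature just as an example.

```c
#include <stdbool.h>

/* All names here are hypothetical; no real driver API is implied. */
typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[4][4]; } mat4;   /* row-major */

typedef struct {
    bool has_hw_transform;   /* assumed capability flag reported by the chip */
    int  hw_submissions;     /* stand-in for work queued to the hardware */
} fake_device;

/* CPU fallback: multiply a column vector by a 4x4 matrix. */
static vec4 transform_cpu(const mat4 *t, vec4 v)
{
    vec4 r;
    r.x = t->m[0][0]*v.x + t->m[0][1]*v.y + t->m[0][2]*v.z + t->m[0][3]*v.w;
    r.y = t->m[1][0]*v.x + t->m[1][1]*v.y + t->m[1][2]*v.z + t->m[1][3]*v.w;
    r.z = t->m[2][0]*v.x + t->m[2][1]*v.y + t->m[2][2]*v.z + t->m[2][3]*v.w;
    r.w = t->m[3][0]*v.x + t->m[3][1]*v.y + t->m[3][2]*v.z + t->m[3][3]*v.w;
    return r;
}

/* Returns true when the slow CPU path was taken. */
static bool submit_vertex(fake_device *dev, const mat4 *t, vec4 *v)
{
    if (dev->has_hw_transform) {
        dev->hw_submissions++;   /* the hardware would do the transform itself */
        return false;
    }
    *v = transform_cpu(t, *v);   /* no hardware support: do the math here */
    return true;
}
```

The point is only the shape of the branch: the driver checks what the chip can do and quietly does the work itself when the answer is "nothing".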
In contrast, the fun part is when the hardware deviates from the specification, or when the specification left things out and different people filled in the blanks with their own ideas. This is less common nowadays, as the design process has become more refined.
But yeah, debugging hardware bugs essentially boils down to:
(1) writing the simplest test that triggers the unexpected behavior that you had observed in a more complex application, then
(2) providing traces of it to the folks who wrote that part of the hardware or simulator,
(3) waiting a few days for them to painstakingly figure out what is going wrong, clock by clock, and
(4) implementing the workaround that they suggest, often something like "when X condition happens on chips {1.23, 1.24 and 1.25}, then program Y register to Z value, or insert a command to wait for a module to complete before sending new commands".
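For a flavor of step (4), a workaround of the "program Y register to Z value" kind tends to end up as a small revision check in the driver. The sketch below is purely illustrative: the revision codes, `REG_Y`, `VALUE_Z` and the `fake_gpu` register array are all invented, and a real driver would write to memory-mapped registers or a command buffer rather than a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chip revisions and register values, for illustration only. */
enum { REV_1_23 = 0x0123, REV_1_24 = 0x0124, REV_1_25 = 0x0125, REV_1_26 = 0x0126 };
enum { REG_Y = 0x40, VALUE_Z = 0x1 };

typedef struct {
    uint16_t revision;
    uint32_t regs[256];   /* stand-in for the chip's register file */
} fake_gpu;

/* True for the revisions the hardware team reported as affected. */
static bool needs_workaround(uint16_t revision)
{
    static const uint16_t affected[] = { REV_1_23, REV_1_24, REV_1_25 };
    for (size_t i = 0; i < sizeof affected / sizeof affected[0]; i++) {
        if (affected[i] == revision)
            return true;
    }
    return false;
}

/* Programs register Y to Z when condition X holds on an affected chip.
 * Returns true if the workaround was applied. */
static bool apply_workaround(fake_gpu *gpu, bool condition_x)
{
    if (condition_x && needs_workaround(gpu->revision)) {
        gpu->regs[REG_Y] = VALUE_Z;
        return true;
    }
    return false;
}
```

Keeping the affected revisions in one table makes it easy to add or drop a chip when the next stepping fixes (or reintroduces) the bug.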
It was more tedious than anything. Coming up with the simplest way to trigger the behavior could take weeks.
Well, that's what it was like to write user mode drivers. The kernel side was rather different and I wasn't directly exposed to it. Kernel drivers are conceptually simpler and significantly smaller in terms of lines of code, but much harder to debug.