RISC-V Origins and Architecture, Part 1 (thechipletter.substack.com)
85 points by ChuckMcM on Aug 7, 2023 | 29 comments


This is a great overview of the origin of the RISC-V project and I particularly like the history of instructions which tracks back to previous instructions that were "kind of" like the current instruction. I suspect this particular set of articles is helping the RISC-V folks hold at bay the various patent lawsuits they are clearly being threatened with.

One of the worst things about technology was that so many people focused on "If we own the road, we can charge EVERYONE a toll and get rich!" and so, so many business plans were built around gaining exclusive control over key facets of technology. The idea that there could be an instruction set architecture (ISA) and related silicon implementations that anyone could use for free is an existential threat to those who live on the margin generated by the licensing of their ISAs.

If anything can derail ARM's growing influence, it will be this. The winners will be people like Apple with the resources to have their own chips fabbed with their own extensions and put into their own products which can run rings around people trying to use "generic" processors to do the same thing. Could signal a coming resurgence in demand for "computer architects" (old skool type :-))


I don't know whether anyone has ever really threatened a patent lawsuit over the ISA, but it definitely does seem to be a good idea to document the origins of instructions, and that they are either so well known and obvious that they can't be patented, or else that they were patented and the patent has expired.

Another protection is that the membership application for RISC-V International includes certification that RISC-V does not infringe on the applicant's own patents. With companies such as Intel, AMD, IBM, Marvell, Microchip, MIPS, nVIDIA, NXP, Qualcomm, Renesas, Samsung, Siemens, Sony, STM, Texas Instruments being members that covers a lot of ground.

The base ISA (up to RV32GC, RV64GC) very deliberately breaks no new ground. It's just smoother, with known potholes filled in.

Many of the optional extensions do break some new ground. The V extension has some unique features, though the basics are inspired by the Cray-1. RVWMO is based on experience gained with ARM and DEC Alpha memory ordering models, but fixing the problems -- and was actually designed by a world-wide group of academic and industry experts. Both of these are examples of something that simply would not be done as well by a small group of people inside a single company, working to a deadline.


My experience of being in technology companies for > 30 years is that every time something hits "it's everywhere" status people crawl out of the woodwork claiming they have both prior art and a patent on it. The entire submarine patent game was about having dozens of unclear patents in the "application" phase, only to sharpen them up and file them as things like RISC-V debut. Those things become successful and suddenly, poof, the current patent assignee (typically a law firm) starts chasing around suing people and making people's lives miserable until their patent is invalidated.


I believe early on there was a change to a pre-AUIPC instruction due to potential infringement fears (for PC-relative indexing); but everything else was understood to be following well-worn paths.


Interesting. I hadn't heard that, but then I was a late-comer in 2016 with the user ISA already in its current form. I have some vague memory of having seen something about RISC-V a few years earlier than that, and glancing at the ISA, and not being impressed. But by the time the HiFive1 came out it all looked pretty great to me. Not perfect (`SLT[I][U]` should produce 0 or -1 not 0 or 1, for example) but pretty darn good. The C extension also made it vastly more attractive as a practical ISA, and that was a late-comer (even if always "planned" and already mentioned as being developed in the May 2011 spec).
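
A rough illustration of the `SLT` remark, as a Python model rather than real assembly. The motivation shown here (an all-ones result doubles as a bit mask for branchless select) is an assumption about why someone might prefer 0 / -1; the comment itself does not give a reason. The helper names are made up for the sketch.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers

def slt(a, b):
    """Model of RISC-V SLT: writes 1 if a < b (signed), else 0."""
    return 1 if a < b else 0

def select_with_slt(a, b, x, y):
    """Return x if a < b else y, without a branch.

    Because SLT yields 0 or 1, one extra negate is needed to turn the
    result into an all-zeros or all-ones mask. If SLT produced 0 or -1
    directly (hypothetical), that negate would disappear.
    """
    mask = (-slt(a, b)) & MASK32          # 0x00000000 or 0xFFFFFFFF
    return (x & mask) | (y & ~mask & MASK32)
```

With a 0 / -1 result the mask would come straight out of the compare, saving one instruction in sequences like this.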

Looking at the May 2011 spec (https://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-...) there is `RDNPC Rd` which simply loads next PC into Rd. So that's 4 bytes different to modern `AUIPC Rd,0`, and functionally identical (other than return address predictor SNAFU) to modern `JAL Rd,.+4` or 2011 `JAL .+4` (only `x1` supported).

So 32 bit PC-relative addressing needed `RDNPC;LUI;ADD` and then `JALR`/load/store vs modern just `AUIPC` before the main instruction.
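
To make the modern `AUIPC` case concrete, here is a minimal Python sketch of how a 32 bit PC-relative offset can be split into the 20 bit upper part for `AUIPC` and the signed 12 bit part for the following `JALR`/load/store. The function names are invented for illustration, and the model assumes RV32 (addresses wrap at 32 bits).

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def split_pc_relative(offset):
    """Split a signed 32-bit offset into (hi20, lo12).

    The low part is a *signed* 12-bit immediate, so the upper part is
    rounded (+0x800) to compensate when lo12 comes out negative.
    """
    if not -(1 << 31) <= offset < (1 << 31):
        raise ValueError("offset does not fit in 32 bits")
    rounded = (offset + 0x800) >> 12
    hi20 = rounded & 0xFFFFF              # field placed in AUIPC
    lo12 = offset - (rounded << 12)       # in range -2048..2047
    return hi20, lo12

def auipc_then_add(pc, hi20, lo12):
    """Model AUIPC rd, hi20 followed by an instruction adding lo12 (RV32)."""
    upper = sign_extend(hi20 << 12, 32)
    return (pc + upper + lo12) & 0xFFFFFFFF
```

For example, `split_pc_relative(0x800)` gives `(1, -2048)`: the upper part overshoots by 4096 and the negative low part pulls it back.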

If there was something before `RDNPC` then it must have been very short-lived!

There is allegedly an earlier RISC-V design here...

https://inst.eecs.berkeley.edu/~cs250/fa10/handouts/lab2-ris...

... but I get "You don't have permission to access this resource."

There is an updated 2011 version which appears to conform to the May 2011 spec. The instruction listing omits load/store byte/half and there is no relative addressing support at all (except `JAL 4`) but that is probably just simplification for a hardware design class.

https://inst.eecs.berkeley.edu/~cs250/fa11/handouts/lab2-ris...

Chris, do you have access to that fa10 version?


I also get a permission error: but I think this (https://inst.eecs.berkeley.edu/~cs250/fa10/handouts/lab2-ris...) is the same file, and functionally the same as the fa11 version?

I think RDNPC is what I was thinking of, but I can't at all remember what was perceived as "risky" about it. I may be overblowing something from memory. AUIPC is better anyways.


Ohh .. the ISA changed quite a bit between 2010 and May 2011!

- 2010 opcodes left justified like MIPS, 2011 right justified and a func3 like modern RISC-V .. but rd on the far left, not between func3 and opcode.

- register names in formats changed from xa, xb, xc (dst) to rs1, rs2, rd

- 2010 lw uses xa for dst, sw uses xa for src like MIPS, ARM etc. 2011 has rd always in the same place, with sw using rs2 not rd, like modern RISC-V.

- 2010 jalr has no offset, 2011 has 12 bit offset like now.

- 2010 the literals for ANDI, ORI, XORI are zero-extended, 2011 unspecified (so I think all sign-ext)

- 2010 j/jal have 27 bit offset (28 with shift?), 2011 25 bit field.

- 2010 ADD/SUB/shifts have *W suffixes. 2011 is like modern RV32.

- 2010 has SRA / SRAI, 2011 only has logical. Possibly just a simplification for the lab.
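
For the ANDI/ORI/XORI point in the list above, a small Python sketch of what the two immediate conventions mean in practice. This is only a model of a 12 bit immediate field on a 32 bit register; the function names are made up for the example.

```python
def zero_extend_12(field):
    """Zero-extension (as in the 2010 logical immediates): upper bits are 0."""
    return field & 0xFFF

def sign_extend_12(field):
    """Sign-extension (modern RISC-V): bit 11 is copied into the upper bits."""
    field &= 0xFFF
    return field - 0x1000 if field & 0x800 else field

def andi(reg, field, extend, xlen=32):
    """Model ANDI with a chosen extension rule for the 12-bit immediate field."""
    mask = (1 << xlen) - 1
    return reg & (extend(field) & mask)

def xori(reg, field, extend, xlen=32):
    """Model XORI with a chosen extension rule for the 12-bit immediate field."""
    mask = (1 << xlen) - 1
    return (reg ^ (extend(field) & mask)) & mask
```

With the field set to `0xFFF`, the zero-extended `andi` keeps only the low 12 bits, while the sign-extended one treats the immediate as -1 and keeps everything. Sign-extension is also what lets `XORI rd, rs, -1` act as a bitwise NOT.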


If you want to do some further archeology, check out the SMIPS ISA, which I believe dates back to 2005 (which itself gradually evolved from the T0/Scale/6.371 MIPS ISAs; and the earliest, T0, was derived from MIPS-II).


Seems to be just exactly MIPS-II, down to the encoding, but with stuff left out.

http://csg.csail.mit.edu/6.S078/6_S078_2012_www/handouts/smi...


> RVWMO is based on experience gained with ARM and DEC Alpha memory ordering models, but fixing the problems

In what ways?

I thought that it was fairly well understood that LR/SC stuff was suboptimal for (for example) lock-free data structures and that the x86 primitives were quite a bit more suitable for those.

Taking a superficial look at the RVWMO stuff doesn't seem to contradict this.

What am I missing? How was this fixed?


The memory model and atomic ops are unrelated.

The memory model specifies what happens when you DON'T use atomic ops.


> various patent lawsuits they are clearly being threatened with.

Do you have a link we could read about this?


If you're curious about the origins of RISC-V, check out Andrew Waterman's PhD thesis "Design of the RISC-V Instruction Set Architecture" ("Why Develop a New Instruction Set?"). It's quite readable and has a nice comparison of ISA encodings of RISC-V, MIPS, SPARC, Alpha, ARMv7/8, Thumb, OpenRISC, and x86/x86-64.

https://people.eecs.berkeley.edu/~krste/papers/EECS-2016-1.p...


As someone totally ignorant of CPU architecture, I do wonder what was wrong with OpenSPARC.

I remember hearing about it back in college in the 2000s and thinking "wow, open source hardware! This is revolutionary!". I'm sure there is a good reason it didn't work/scale, but does anyone here know?


Andrew Waterman (Chief Engineer, Co-Founder of RISC-V and SiFive) describes the motivation for a new instruction set for RISC-V in "Design of the RISC-V Instruction Set Architecture" (https://people.eecs.berkeley.edu/~krste/papers/EECS-2016-1.p...).

The document contains a short discussion of the pros and cons of most architectures which were still somewhat popular at the time RISC-V was designed (Power/PPC is omitted) - MIPS, SPARC, Alpha, ARMv7/v8, OpenRISC and x86.

Major points that count against SPARC in Waterman's view are:

- SPARC register windows have significant power and area cost and complicate superscalar implementations

- Branches use condition codes which result in more complexity for OOO implementations with register renaming

- Load/store pair instructions also complicate implementations with register renaming (ARMv8 has introduced those, too)

- Moves between the floating-point and integer register files must use the memory system as an intermediary

- The only atomic memory operation is fetch-and-store, which is insufficient to implement many wait-free data structures

The openly available SPARC implementations are either embedded-style cores (e.g. Gaisler's LEON3) or relatively specialized cores for highly parallel workloads such as web serving (T1 and T2). Regular Sun 4m/4u cores were not available in source form, but the SPARC license would have allowed alternative implementations. The license used to be available for a one-time fee of $99, but it seems SPARC can be used free and without royalties now; the $99 is an optional registration fee at SPARC International (https://sparc.org/faq/#q3).


To add to this, Andrew et al. at Berkeley built SPARC cores, so they were well aware of what it took to implement hardware to run SPARC software. I believe some of the really annoying challenges were at the system/privileged architecture level, including issues where off-the-shelf privileged software was either not obeying the specification or instead relied on undefined/underdefined corner cases. I believe RISC-V's atomic read/modify/write CSR behavior was informed by some of these experiences.


Sparc has register windows[0] and branch delay slots[1]. The former was made obsolete by graph-coloring register allocation[2] and the latter by smarter branch predictors. It's certainly possible to design aggressive Sparc implementations, but newer designs such as RISC-V have a lot less legacy baggage.

[0] https://yarchive.net/comp/register_windows.html

[1] https://en.wikipedia.org/wiki/Delay_slot

[2] https://en.wikipedia.org/wiki/Register_allocation#Graph-colo...


To add to everything that's been said, OpenSPARC only covers 32-bit, and is thus of limited usefulness.


Same with OpenRISC, at the time the RISC-V project was started.

ARM too. The RISC-V design was fairly well advanced by the May 2011 spec, and IMAFD (and E) were in essentially their current form in the May 2014 spec.

AArch64 was announced between those two, in October 2011, with the 64 bit Apple iPhone 5s shipping first, in September 2013, and devices using ARM's own cores, such as the Samsung Galaxy S6, in April 2015.

The first commercial RISC-V chip (SiFive FE310) shipped in December 2016.


Any bets on when we’ll see RISC-VI? And which aspects of RISC-V it will change, correcting mistakes or designs that didn’t scale to unanticipated computing environments or requirements?


2100. Or never.

IBM System/360 is still going strong after 60 years, with extensions not replacement. Only the 24 bit memory addresses were short-sighted. Motorola 68000 and ARM made the same mistake, 15 and 20 years later. Let's not even mention x86 memory addressing.

RISC-V has that covered with 64 bit from the outset, and 128 bit addressing included in the planning.

The basic RISC idea of separating load/store from arithmetic has stood the test of time for 60 years too, since 1964's CDC6600, the first supercomputer, and the fastest one for the rest of that decade. The Cray 1 in 1975 is also very much a RISC computer on the scalar side (plus vectors very similar to modern RISC-V V extension).

Both of those machines had two instruction lengths just like current RISC-V: 15 and 30 bits on the CDC6600 and 16 and 32 bits on the Cray 1. RISC-V has had plans for future longer instructions, if necessary, from the outset.

IBM 360 is actually very similar, with the same 2 bits in every instruction specifying whether the instruction is 16, 32, or 48 bits long.

This makes very wide instruction decode simple -- 8 wide, 16 wide, even 32 wide if that is ever practical. Not quite as simple as doing the same thing on a completely fixed length ISA such as AArch64, but simple enough to not be a problem. Unlike x86.
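
A minimal Python sketch of the length rules mentioned above: the two low bits of a RISC-V instruction's first 16 bit parcel, and the two high bits of a System/360 opcode byte. The function names are invented for the example, and the RISC-V side only handles the 16 and 32 bit cases that are in use today. The scan at the end is sequential for clarity; a wide hardware decoder would evaluate the same few bits at every parcel position in parallel.

```python
def riscv_length_bytes(first_parcel):
    """Length of a RISC-V instruction from its first (lowest-addressed) 16-bit parcel."""
    if first_parcel & 0b11 != 0b11:
        return 2                      # compressed (C extension) instruction
    if (first_parcel >> 2) & 0b111 != 0b111:
        return 4                      # standard 32-bit instruction
    raise NotImplementedError("encodings longer than 32 bits are not modelled")

def s360_length_bytes(opcode_byte):
    """Length of a System/360 instruction from the top two bits of its opcode."""
    return (2, 4, 4, 6)[(opcode_byte >> 6) & 0b11]

def instruction_starts(parcels):
    """Indices (in 16-bit parcels) where each RISC-V instruction begins."""
    starts, i = [], 0
    while i < len(parcels):
        starts.append(i)
        i += riscv_length_bytes(parcels[i]) // 2
    return starts
```

For instance, `instruction_starts([0x0001, 0x0013, 0x0000, 0x0001])` yields `[0, 1, 3]`: a compressed NOP, a 32 bit NOP spanning two parcels, then another compressed NOP.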

What else might want to change? More registers? That is easily handled with the planned-for 48 bit or 64 bit instruction formats. The long instructions can address all 64 or 256 or 1024 registers, while the 32 bit instructions can use only 32 of them, and the 16 bit instruction can use (mostly) only 8 of them. Or x86-style REX prefixes could be used. That works, but is worse for wide instruction decode unless the prefix itself indicates the full instruction length.


Thanks. It's interesting to see the commonality (path dependence?) of these ISAs and RISC-V's extensible design. I'm reminded of "How DEC Developed Alpha" (1992), an article describing how DEC designed the Alpha architecture to scale to handle a 1000-fold performance increase they anticipated over the next 25 years:

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=144...


It would have worked too, if Alpha hadn't gone down along with the mismanaged company that owned it.

That's perhaps the biggest thing about RISC-V: no one company can kill it by going out of business (DEC), by getting bored with it (Oracle with SPARC), or by deciding to replace it with something else (Intel with Itanium).

If the company you were buying RISC-V from disappears, you can choose from ten others -- maybe 100 or 1000 others one day. Or, if you have the skills, you can design your own ... maybe in an FPGA if technology has moved along and you were using an old design.

Or if another ISA takes over, you can run a RISC-V emulator on whatever the new ISA is.

All of this without any questions or worries about legality, licenses, patents.


Yeah, only a breaking change like quantum computing going mainstream would change the current course.

FPGAs and special purpose CPUs don't change much, as they have also been around for decades.


I don't get why the base ISA didn't include any atomic instructions. It's an annoying omission when even tiny embedded devices often have more than one core available. And single-core designs could just implement them as no-op; the silicon cost is minuscule.


You can't implement AMOs as no-op! That code will fail horribly!

You need (in a simple core [1]) a very un-RISC read/modify/write sequencer with bus lock. That's not cheap in silicon. Or trap and emulate using LR (which could be NOP) and SC (which could be a plain store) and a few other instructions. That's not fast.

If you have that tiny embedded device with more than one core then you implement RV32IA properly, simple. And if you have a single core and no interrupts or DMA then implement RV32I and save a good bit of silicon.

[1] big designs don't implement AMOs in the CPU core, but out in the peripherals, RAM controller, or cache controller. As far as the CPU core is concerned an AMO is a memory read that sends extra data (a constant and an operation code) out with the memory address.
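
To show why an AMO can't be a no-op and what "emulate using LR and SC" looks like, here is a toy Python model. `ToyMemory` and `emulate_amoadd_w` are hypothetical names made up for this sketch, not anything from a real simulator, and the reservation tracking is far simpler than real hardware.

```python
MASK32 = 0xFFFFFFFF

class ToyMemory:
    """Word-addressed memory with a single LR/SC reservation (toy model)."""

    def __init__(self):
        self.words = {}
        self.reservation = None

    def load_reserved(self, addr):
        """LR.W: load a word and remember the address."""
        self.reservation = addr
        return self.words.get(addr, 0)

    def store(self, addr, value):
        """Plain store; breaks any reservation on the same address."""
        self.words[addr] = value & MASK32
        if self.reservation == addr:
            self.reservation = None

    def store_conditional(self, addr, value):
        """SC.W: returns 0 on success, nonzero on failure (as in RISC-V)."""
        if self.reservation != addr:
            return 1
        self.words[addr] = value & MASK32
        self.reservation = None
        return 0

def emulate_amoadd_w(mem, addr, operand):
    """What a trap handler might do for AMOADD.W: retry LR/SC until it sticks.

    Returns the old memory value, which is what AMOADD.W writes to rd.
    """
    while True:
        old = mem.load_reserved(addr)
        if mem.store_conditional(addr, old + operand) == 0:
            return old
```

The point of the model: the instruction has to both return the old value and update memory, so skipping it leaves counters and locks unchanged. On a single core with no interrupts or DMA the reservation can never be broken, which is why the loop degenerates to a plain load, add, and store there.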


> One of the greatest powers in the universe is grad-student naivety. When you don’t realise something is impossible, you try it anyway.

A similar comment was in a dialogue in the Expanse series. I forget which character said it, but the gist was the same. It's a good lesson.


As a general rule I think people underestimate how hard something is going to be. It’s a good thing though, otherwise they might not even start it.


So first it was Medium, now Substack, oh well.

At least the content is quite good.



