• 0 Posts
  • 10 Comments
Joined 1 year ago
cake
Cake day: June 20th, 2023

help-circle

  • I’m more familiar with RISC-V than I am with ARM though it’s my understanding they’re quite similar.

    • ARM/RISC-V are load-store architectures, meaning they divide instructions between loading/storing and doing computation. x86 on the other hand is a register-memory architecture, having instructions that do both computation as well as loading/storing.

    • ARM/RISC-V also have weaker guarantees as to memory ordering allowing for less synchronization between cores, however RISC-V has an extension to enforce the same guarantees as x86 and Apple’s M-series CPU have a similar extension for ARM. If you want to emulate x86 applications on ARM/RISC-V these kinds of extensions are essential for performance.

    • ARM/RISC-V instructions are variable width but only in a limited sense. They have “compressed instructions” - 2 bytes instead of 4 - to increase instruction density in order to compete with x86’s true variable width instructions. They’re fairly close in instruction density, though compressed instructions are annoying for compilers to handle due to instruction alignment. 4 byte instructions must be aligned to 4 bytes, so if you have 3 instructions A, B and C but only B has a compressed version then you can’t actually use it because there must be 4 bytes between instructions A and C.

    • ARM/RISC-V also makes backwards compatibility entirely optional, Apple’s M-series don’t implement 32-bit mode for instance, whereas x86-64 still has “real mode” for running 16 bit operating systems.

    There’s also a number of other differences, like the number of registers, page table formats, operating modes, etc, but those are the more fundamental ones I can think of.

    Up until your post I had thought it exactly was the size of the instruction set with x86 having lots of very specific multi-step-in-a-single instruction as well as crufty instruction for backwards compatibility (like MPSADBW).

    The MPSADBW thing likely comes from the hackaday article on why “x86 needs to die”. The kinda funny thing about that is MPSADBW is actually a really important instruction for (apparently) video decoding; ARM even has a similar instruction called SABD.

    x86 does have a large number of instructions (even more so if you want to count the variants of each), but ARM does not have a small number of instructions and a lot of that instruction complexity stops at the decoder. There’s a whole lot more to a CPU than the decoder.


  • The original debate from the 80s that defined what RISC and CISC mean has already been settled and neither of those categories really apply anymore. Today all high performance CPUs are superscalar, use microcode, reorder instructions, have variable width instructions, vector instructions, etc. These are exactly the bits of complexity RISC was supposed to avoid in order to achieve higher clock speeds and therefore better performance. The microcode used in modern CPUs is very RISC like, and the instruction sets of ARM64/RISC-V and their extensions would have likely been called CISC in the 80s. All that to say the whole RISC vs CISC thing doesn’t really apply anymore and neither does it explain any differences between x86 and ARM. There are differences and they do matter, but by an large it’s not due to RISC vs CISC.

    As for an example: if we compare the M1 and the 7840u (similar CPUs on a similar process node, one arm64 the other AMD64), the 7840u beats the M1 in performance per watt and outright performance. See https://www.cpu-monkey.com/en/compare_cpu-amd_ryzen_7_7840u-vs-apple_m1. Though the M1 has substantially better battery life than any 7840u laptop, which very clearly has nothing to do with performance per watt but rather design elements adjacent to the CPU.

    In conclusion the major benefit of ARM and RISC-V really has very little to do with the ISA itself, but their more open nature allows manufacturers to build products that AMD and Intel can’t or don’t. CISC-V would be just as exciting.


  • “unified memory” is an Apple marketing term for what everyone’s been doing for well over a decade. Every single integrated GPU in existence shares memory between the CPU and GPU; that’s how they work. It has nothing to do with soldering the RAM.

    You’re right about the bandwidth though, current socketed RAM standards have severe bandwidth limitations which directly limit the performance of integrated GPUs. This again has little to do with being socketed though: LPCAMM supports up to 9.6GT/s, considerably faster than what ships with the latest macs.

    This is why user-replaceable RAM and discrete GPUs are going to die out. The overhead and latency of copying all that data back and forth over the relatively slow PCIe bus is just not worth it.

    The only way discrete GPUs can possibly be outcompeted is if DDR starts competing with GDDR and/or HBM in terms of bandwidth, and there’s zero indication of that ever happening. Apple needs to puts a whole 128GB of LPDDR in their system to be comparable (in bandwidth) to literally 10 year old dedicated GPUs - the 780ti had over 300GB/s of memory bandwidth with a measly 3GB of capacity. DDR is simply not a good choice GPUs.