Wednesday, October 23

Arm Allows Custom Instructions

We’re surrounded by Arm processors, which enjoy a commanding foothold in the consumer market, especially in portable electronics. However, Arm Holdings has never built its business model on manufacturing chips; instead, it licenses its CPU designs to others who make the physical devices. There is a bit of a tightrope to walk, though, because vendors want to differentiate themselves while Arm wants to keep products as similar as possible to allow for portability and reuse of things like libraries and toolchains. So it was a little surprising when Arm recently announced that, for the first time, it would allow vendors to develop custom instructions, at least on the Armv8-M architecture.

We imagine designs like RISC-V are encroaching on Arm’s market share and this is a response to that. Although it is big news, it isn’t necessarily as big as you might think since Arm has allowed other means to do similar things via special coprocessor instructions and memory-mapped accelerators. If you are willing to put in some contact information, they have a full white paper available with a pretty sparse example. The example shows a population count function hand-optimized into 12 Arm instructions. Then it shows a single custom instruction that would do the same job. However, they don’t show the implementation nor do they offer any timing data about speed increases.

Population count (determining how many 1s are in a word) is a good example of where this technique might pay off. You have a small piece of data you can process in a couple of cycles. For many other cases, though, the memory-mapped or coprocessor techniques will still be more appropriate. With a coprocessor or accelerator, you can have a separate piece of hardware working in parallel with the main CPU.
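To make the operation concrete, here is a minimal sketch of a population count in plain C. It is not the white paper's example, and the function name popcount_loop is made up for illustration. It clears the lowest set bit on each pass, so the loop runs once per 1 bit in the word.

```c
#include <stdint.h>

/* Count the 1 bits in a 32-bit word.
   Each pass clears the lowest set bit, so the loop body
   runs once for every bit that is set. */
unsigned popcount_loop(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

For example, popcount_loop(0xF0F0) returns 8. The running time depends on the data, which is part of why a fixed-latency instruction is attractive for this kind of job.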

The custom instructions integrate with the normal Arm pipeline. There are only certain instruction formats you can use and your code gets the same interface as the ALU. That means it can process data, but probably can’t easily do anything to manipulate the processor’s execution.

What’s it good for? Typically, a custom instruction will perform some algorithmic step faster than it can be done using normal instructions. However, since you need to have the license for the Arm core, we aren’t likely to see end users adding instructions (please, prove us wrong). But a chip you buy from a manufacturer that makes Arm chips could add some instructions like “validate IP address.”

The question is, would you use these instructions in the code you write? Doing so would probably lock your software to a specific processor family. That is, Microchip is unlikely to support ST’s custom instructions and vice versa. That’s good for the vendors, but not so good for developers, or for Arm. That seems to us to be the challenge. New instructions will need to be compelling and fast. A developer won’t likely limit their silicon choices just to do something silly in a single instruction on occasion. The vast majority of the time you’d use a C library and you won’t care how many instructions the operation uses.

To get any traction, vendors will need to find an irresistible need, and a custom instruction capable of satisfying that need faster than any other way of doing it. In addition, it needs to be a problem where the timing matters.

Of course, sometimes doing something like a population count can benefit more from a smarter algorithm. Arm, in particular, is very good at doing shifting.
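As an illustration of what a smarter algorithm can look like, here is a sketch of the well-known shift-and-mask (parallel bit count) approach in C. This is an assumption about the kind of technique meant here, not code from Arm's white paper, and the name popcount_swar is hypothetical. It adds up neighboring bit fields in parallel, so it takes a fixed number of shifts, masks, and adds with no loop.

```c
#include <stdint.h>

/* Branch-free population count for a 32-bit word.
   Sums adjacent bit fields in parallel: 1-bit fields into 2-bit sums,
   2-bit sums into 4-bit sums, then 4-bit sums into per-byte sums. */
unsigned popcount_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* per-byte sums */
    /* The multiply accumulates all four byte sums into the top byte. */
    return (unsigned)((uint32_t)(x * 0x01010101u) >> 24);
}
```

For example, popcount_swar(0xF0F0) returns 8. Every step is a shift, mask, add, or multiply, which is the sort of work where a core with cheap shifts does well.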
