What is an NPU chip?

If you have been following developments in computing in recent months, you have probably come across the term NPU more and more often, especially in connection with tasks involving artificial intelligence.

What is an NPU anyway? What does it do? Do we really need it? For many people, it is enough to know which processor and graphics card a computer, tablet or phone contains. Even those of us who test these devices haven't paid much attention to NPU chips so far. But times are changing: the NPU is becoming, or will become, as important as the processor (CPU) and the graphics card (GPU).

What is an NPU?

The NPU is a dedicated processor for accelerating machine learning and artificial intelligence tasks

An NPU, or neural processing unit, is a dedicated processor, or a processing unit within a larger system-on-a-chip (SoC), purpose-built to accelerate neural networks and artificial intelligence tasks. Unlike general-purpose CPUs and GPUs, NPUs are optimized for parallel, data-driven computing, which makes them highly efficient at processing large-scale multimedia data such as videos and images and at running neural-network workloads. They are particularly adept at AI-related tasks such as speech recognition, background blurring in video calls, and photo or video editing procedures such as object detection, erasing or adding elements, and more.
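To make this a little more concrete, here is a minimal sketch of how an application might hand an AI task to the NPU, using ONNX Runtime's execution providers. The provider name and the model.onnx file are assumptions for illustration; which provider actually exposes the NPU depends on the device and drivers.

```python
# Minimal sketch: offloading inference to an NPU via ONNX Runtime.
# Assumes onnxruntime is installed (with an NPU provider such as QNN
# on Qualcomm hardware) and that "model.onnx" is a valid model file.
import numpy as np
import onnxruntime as ort

# Prefer the NPU provider if present; fall back to the CPU otherwise.
wanted = ["QNNExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in wanted if p in ort.get_available_providers()]
session = ort.InferenceSession("model.onnx", providers=providers)

# Run a single inference pass on dummy image-shaped input data.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print("Output shape:", outputs[0].shape)
```

The application code stays the same whichever provider is picked; the runtime simply routes the work to the most capable unit available.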

NPUs are integrated circuits, but they differ significantly from single-function ASICs (Application-Specific Integrated Circuits). While ASICs are designed for a single purpose, such as mining a specific cryptocurrency, NPUs offer greater complexity and flexibility, meeting the diverse requirements of neural-network computation. They accomplish this through specialized programming in software or hardware tailored to the unique requirements of neural-network workloads.

In most consumer products, the NPU is actually integrated into the main processor, as in Intel's Core Ultra series or AMD's new Ryzen 8040 laptop processors. In larger data centers or more specialized industrial operations, however, the NPU may be an entirely separate processor on the motherboard, apart from all other processing units. On phones, NPUs are usually integrated into a system-on-a-chip alongside the processor and graphics cores.

Each manufacturer names its NPU slightly differently. Apple calls it the Neural Engine, Qualcomm chose Hexagon, Google uses the Tensor Processing Unit (TPU), and Huawei the Da Vinci Architecture. Interestingly, Huawei was among the first to integrate an NPU into a smartphone, with the Mate 10.

Why do phones and computers even need NPUs?

When Samsung unveiled its Galaxy AI concept earlier this year, it may have been the first time many people heard NPU chips mentioned. Why? If you had a chance to test its AI features, you may have noticed that some work locally and some require a connection to a remote server. If we want functions to run locally, i.e. directly on our device, we need an NPU, especially if we want those tasks to run smoothly.

Much of today's AI processing is done in the cloud, but this is not ideal for a number of reasons. The first is latency and network requirements: you may not be able to access the tools when you are offline, or you may have to wait through long processing times during peak traffic. Sending data over the internet is also less secure, which matters a great deal when the artificial intelligence in question has access to your personal data, as with Microsoft's now infamous Recall.

Where possible, local operation is better. However, AI functions are no cakewalk; they require quite a bit of computing power. If you are one of the few who installed Stable Diffusion on your PC at the start of the AI craze, you know what kind of hardware it takes to get reasonably solid results. Nobody wants to wait too long, and while the CPU and graphics card can do a lot on their own, from now on that just won't be enough.

The solution is the NPU, which can significantly speed up such tasks. Its performance is often quoted in trillions of operations per second (TOPS), but that is not a very useful metric, because it doesn't tell you what each operation actually does. Instead, it is often better to look for figures showing how quickly the chip processes tokens for large language models, how much power it draws, how accurate it is, how much data it can read or write, and so on.
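To see why a raw TOPS number is only a theoretical ceiling, here is how such a figure is typically derived. The unit count and clock speed below are made-up illustrative values, not the specs of any real chip.

```python
# Illustrative back-of-the-envelope TOPS calculation (hypothetical numbers).
# Peak TOPS is usually derived from the number of multiply-accumulate (MAC)
# units and the clock speed: each MAC counts as two operations
# (one multiply, one add).
mac_units = 4096   # assumed number of parallel MAC units
clock_hz = 1.5e9   # assumed 1.5 GHz clock

peak_ops_per_second = 2 * mac_units * clock_hz
peak_tops = peak_ops_per_second / 1e12
print(f"Peak throughput: {peak_tops:.1f} TOPS")  # -> 12.3 TOPS
```

The result says nothing about numeric precision, memory bandwidth or how real models behave, which is exactly why the other metrics mentioned above matter.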

NPUs and GPUs. What's the difference?

Although many AI and machine learning workloads run on GPUs, there is an important difference between a GPU and an NPU.

Although GPUs are known for their parallel computing capabilities, handling machine learning workloads efficiently alongside graphics requires dedicated circuitry. The most popular Nvidia GPUs include such circuits in the form of Tensor cores, and AMD and Intel have also built similar units into their GPUs, mainly to handle upscaling, one of the most common AI tasks.
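As a small illustration of how software steers work toward these circuits, here is a sketch using PyTorch's mixed-precision mode, under which eligible operations run in float16 and can be picked up by units such as Tensor cores. It assumes PyTorch and a CUDA-capable GPU are available.

```python
# Minimal sketch: steering a matrix multiplication toward a GPU's dedicated
# ML circuits (e.g. Nvidia Tensor cores) via mixed precision in PyTorch.
import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Under autocast, eligible ops run in float16, the format these
# dedicated units are built to accelerate.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = torch.matmul(a, b)

print(c.dtype)  # torch.float16
```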

An NPU essentially takes these circuits out of the GPU, letting the GPU focus fully on its main task, and packages them into an independent dedicated unit. This allows AI-related tasks to be processed more efficiently and with less power consumption, which is why NPUs are becoming an indispensable component in laptops and phones. For the most demanding tasks, though, they will still need at least the help of a graphics processor.

NPU and CPU. What does each one do?

An NPU differs from a central processing unit (CPU) in the workloads it is designed for. A typical processor in a computer or phone handles very general tasks, and it is built for exactly that: it supports a wide range of instructions, caching schemes and ways of invoking functions.

As mentioned earlier, machine learning and artificial intelligence tasks are different and do not need that much flexibility. Not only are they more demanding, they often operate on unusual formats, such as 16-, 8- or 4-bit numbers (see the sketch below). Although CPUs can perform basic machine learning tasks, this is not their primary job, and they are much slower at it.
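To illustrate what those "unusual formats" involve, here is a sketch of symmetric 8-bit quantization in NumPy. The scheme is a common textbook one, not any particular NPU's implementation.

```python
# Illustrative sketch of the low-precision formats NPUs favor: symmetric
# int8 quantization of float32 weights with NumPy.
import numpy as np

weights = np.random.randn(8).astype(np.float32)

# Map the float range onto the signed 8-bit range [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see the small precision loss traded for 4x smaller
# storage and much cheaper integer arithmetic.
restored = quantized.astype(np.float32) * scale
print("max error:", np.abs(weights - restored).max())
```

Hardware built around such narrow integer formats can pack far more arithmetic units into the same silicon and power budget, which is precisely the NPU's advantage.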

Will the NPU become a required component in the future?

Almost certainly, unless you couldn't care less about AI.

If you want to use Microsoft's Copilot+ features, for example, you will need an NPU with at least 40 TOPS of performance. In return, you get local AI functions such as Recall, Cocreator and Live Captions.

At least for now, there aren't many apps that list an NPU among their requirements. This will likely change as developers integrate more on-device features, whether in Adobe's apps, DaVinci Resolve, or perhaps Teams and Zoom.

You will also need to pay attention to this component when buying a phone. With premium phones, you can be sure you will get the best NPU and all the AI features the manufacturer has to offer. In lower price classes, say 500 euros and below, this cannot be taken for granted. If you want to play with artificial intelligence, you will most likely have to look in the 1,000-euro class and up.

At the moment, however, there is no rush to buy a new computer or phone. We are just at the beginning, and at least for now, you haven't missed much.

