Vision processing unit

What is a Vision Processing Unit?

A vision processing unit (VPU) is a type of microprocessor designed to accelerate machine learning and artificial intelligence workloads, in particular machine vision tasks such as image processing. It is one of several specialized chips, alongside the GPU, that are generally useful in machine learning.

In a way, a vision processing unit is similar to a video processing unit, which is used with convolutional neural networks. Like video processing units, VPUs are particularly geared toward image processing. Where a video processing unit is a specific type of graphics processor, the vision processing unit is described as more suitable for running different types of machine vision algorithms. VPUs are built for parallel processing and may include specific resources for getting visual data from cameras. Some are described as “low power and high performance” and may be plugged into interfaces that allow for programmable use. Other aspects of the build vary with manufacturer and design choices.

What is the difference between a VPU and a GPU?

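The difference is easiest to see by starting with the CPU. A CPU is designed to handle a wide range of tasks quickly (as measured by CPU clock speed), but is limited in the number of tasks it can run concurrently. A GPU is designed to render high-resolution images and video quickly by working on many pieces of data concurrently. A VPU narrows the focus further: it targets machine vision and neural network inference at low power rather than general-purpose graphics.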

For highly parallel workloads, GPUs can process data far faster than a CPU, by orders of magnitude in favorable cases, but GPUs are not as versatile as CPUs. CPUs have large and broad instruction sets and manage every input and output of a computer, which a GPU cannot do. In a server environment, there might be 24 to 48 very fast CPU cores. Adding 4 to 8 GPUs to this same server can provide as many as 40,000 additional cores. While individual CPU cores are faster (as measured by CPU clock speed) and smarter (as measured by available instruction sets) than individual GPU cores, the sheer number of GPU cores and the massive parallelism they offer more than make up for the single-core clock speed difference and limited instruction sets.
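The core counts above can be turned into a quick back-of-the-envelope check. The sketch below uses only the figures quoted in the text (48 CPU cores, 8 GPUs, 40,000 GPU cores). The perfectly parallel workload it assumes is a simplification introduced here for illustration, and the function names are hypothetical.

```python
# Figures quoted in the text; the "perfectly parallel work" model is an assumption.
CPU_CORES_MAX = 48
GPU_COUNT_MAX = 8
GPU_CORES_TOTAL = 40_000


def cores_per_gpu(total_cores: int, gpu_count: int) -> float:
    """Average number of cores contributed by each GPU."""
    return total_cores / gpu_count


def break_even_slowdown(gpu_cores: int, cpu_cores: int) -> float:
    """How many times slower one GPU core may be than one CPU core
    before the GPUs' combined throughput falls below the CPUs',
    assuming the work splits perfectly across all cores."""
    return gpu_cores / cpu_cores


if __name__ == "__main__":
    per_gpu = cores_per_gpu(GPU_CORES_TOTAL, GPU_COUNT_MAX)
    slack = break_even_slowdown(GPU_CORES_TOTAL, CPU_CORES_MAX)
    print(f"cores per GPU: {per_gpu:.0f}")
    print(f"break-even per-core slowdown: {slack:.0f}x")
```

Under that assumption, each GPU core could be several hundred times slower than a CPU core and the GPUs would still match the CPUs in aggregate. Real workloads are rarely perfectly parallel, so this is an upper bound on the benefit rather than a prediction.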

GPUs are best suited for repetitive and highly parallel computing tasks. Beyond video rendering, GPUs excel in machine learning, financial simulations, risk modeling, and many other types of scientific computation. In years past, GPUs were used for mining cryptocurrencies such as Bitcoin or Ethereum, but they are generally no longer used at scale for that purpose, having given way to specialized hardware such as Field-Programmable Gate Arrays (FPGA) and then Application-Specific Integrated Circuits (ASIC).

What is a Vision Processing Unit (VPU) used for?

VPUs enable demanding computer vision and edge AI workloads with efficiency. By coupling highly parallel programmable compute with workload-specific hardware acceleration in a unique architecture that minimizes data movement, VPUs achieve a balance of power efficiency and compute performance. VPU technology enables intelligent cameras, edge servers, and AI appliances with deep neural networks and computer vision-based applications in areas such as visual retail, security and safety, and industrial automation.

What are the advantages of the Vision Processing Unit (VPU)?

Processing happens entirely on the edge, with no round trip to the cloud, which means lower latency and more privacy. Intel's VPUs come with a toolkit and SDK called OpenVINO, which can deploy deep learning CNN models trained in TensorFlow and Caffe on the dedicated Neural Compute Engine. When the VPU is packaged as a USB stick, such as the Intel Neural Compute Stick, running your algorithms on the stick frees the rest of the computer, including the GPU, for other programs such as point cloud processing. You can also plug in multiple USB sticks to add more compute as needed.

What is the Intel Vision Processing Unit (VPU)?

Myriad 2

A popular example is the Movidius Myriad 2 VPU, the chip inside the Intel Neural Compute Stick (NCS), which can be used to run inference on pre-trained convolutional networks.

The Myriad 2 VPU is designed as a 28-nm co-processor that provides high-performance tensor acceleration. It offers high-level APIs that let application programmers take advantage of its features easily, and a software-controlled memory subsystem that enables fine-grained control over different workloads.

The architecture of this chip is inspired by the observation that, beyond a certain frequency limit for any particular design and target process technology, the power cost grows quadratically for linear increases in operating frequency. Following this principle, the Myriad 2 VPU favors parallelism over clock speed, with 12 highly parallelizable vector processors named Streaming Hybrid Architecture Vector Engines (SHAVE). Its parallelism and instruction set architecture provide highly sustainable performance efficiency across a range of computer vision applications, including those with low latency requirements on the order of milliseconds.
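The trade-off behind this design principle can be illustrated with a toy model. The sketch below assumes that per-core power scales with the square of the clock frequency and that throughput scales linearly with both frequency and core count. The text states the quadratic relationship only beyond a certain frequency limit, so applying it everywhere is a simplification, and the function names are hypothetical.

```python
def relative_throughput(freq_scale: float, cores: int = 1) -> float:
    """Throughput relative to one baseline core, assuming it scales
    linearly with clock frequency and with the number of cores."""
    return cores * freq_scale


def relative_power(freq_scale: float, cores: int = 1) -> float:
    """Power relative to one baseline core, assuming each core's power
    grows with the square of its clock frequency (toy model)."""
    return cores * freq_scale ** 2


if __name__ == "__main__":
    # Two ways to reach 12x the baseline throughput:
    one_fast_core = (relative_throughput(12.0, 1), relative_power(12.0, 1))
    twelve_slow_cores = (relative_throughput(1.0, 12), relative_power(1.0, 12))
    print("1 core at 12x clock  (throughput, power):", one_fast_core)
    print("12 cores at 1x clock (throughput, power):", twelve_slow_cores)
```

In this model both configurations deliver the same throughput, but the single fast core draws twelve times the power of the twelve slower cores, which is the intuition behind using many modestly clocked vector processors.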

The Myriad 2 VPU aims to provide an order of magnitude higher performance efficiency, allowing high-performance computer vision systems with very low latency to be built while dissipating less than 1 Watt.

Myriad X

Intel’s Myriad X VPU is the third generation and the most advanced VPU from Movidius. It is the first in its class to feature the Neural Compute Engine, a specialized hardware accelerator for deep neural network inference.

The combination of the Neural Compute Engine, 16 SHAVE cores, and very high throughput (Movidius states that it can achieve over one trillion operations per second of peak DNN inferencing throughput) makes Myriad X a popular option for on-device deep neural networks and computer vision applications.

The Myriad X VPU has a native 4K image processing pipeline with support for up to 8 HD sensors connecting directly to the VPU. As with Myriad 2, the Myriad X VPU is programmable via the Myriad Development Kit (MDK), which includes development tools, frameworks, and APIs for implementing custom vision, imaging, and deep neural network workloads on the chip.

How does a Vision Processing Unit (VPU) work?

Decoding and encoding are handled by the OpenVINO toolkit, with one line of code for each.

Preprocessing is done using OpenCV or other libraries, and mostly consists of resizing the image and fitting it to the input requirements of your network.
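The preprocessing step can be sketched without any dependencies. The example below is a pure-Python stand-in for what a library such as OpenCV would normally do: a nearest-neighbor resize followed by a layout change from height x width x channels to channels x height x width. The target size and the channels-first layout are assumptions made for illustration, since the actual requirements depend on your network, and the function names are hypothetical.

```python
from typing import List

# An image as nested lists: rows x columns x channels (HWC layout).
Image = List[List[List[int]]]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize by picking, for every output pixel, the nearest source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [
        [list(image[y * in_h // out_h][x * in_w // out_w]) for x in range(out_w)]
        for y in range(out_h)
    ]


def hwc_to_chw(image: Image) -> List[List[List[int]]]:
    """Reorder rows x columns x channels into channels x rows x columns."""
    channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image] for c in range(channels)]


def preprocess(image: Image, out_h: int, out_w: int) -> List[List[List[int]]]:
    """Resize to the (assumed) network input size, then switch to channels-first."""
    return hwc_to_chw(resize_nearest(image, out_h, out_w))


if __name__ == "__main__":
    # A tiny 2x2 image with 3 channels, upscaled to 4x4.
    tiny = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
    blob = preprocess(tiny, 4, 4)
    print(len(blob), len(blob[0]), len(blob[0][0]))  # channels, height, width
    print(blob[0])  # first channel after resizing
```

In a real pipeline the resize would typically be a single library call with a better interpolation method. The point here is only the shape of the step: match the spatial size, then match the layout the network expects.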

Inference is run through a toolkit function that loads your models trained in TensorFlow or Caffe. The toolkit can target VPU, CPU, GPU, and FPGA devices.
