What is a DPU (Data Processing Unit)?


Data processing units, commonly known as DPUs, are a new class of reprogrammable high-performance processors combined with high-performance network interfaces. They are optimized to perform and accelerate the network and storage functions carried out by data center servers. DPUs plug into a server’s PCIe slot just as a GPU would, and they allow servers to offload network and storage functions from the CPU to the DPU, leaving the CPU free to run the operating system and applications. DPUs often use a reprogrammable FPGA combined with a network interface card to accelerate network traffic, in the same way that GPUs accelerate artificial intelligence (AI) applications by taking mathematical operations off the CPU. In the past, GPUs were used to deliver rich, real-time graphics. Because they can process large amounts of data in parallel, they are also well suited to accelerating AI workloads such as machine learning and deep learning.

DPU-accelerated servers are expected to become popular because they offload network functions from the CPU to the DPU. This frees up CPU processing power, so the CPU can run more applications and run the operating system efficiently without being bogged down by network activity. In fact, some experts claim that 30% of CPU processing power goes towards handling network and storage functions. Offloading these functions to the DPU frees that capacity for virtualized or containerized workloads. Additionally, DPUs can handle network security, firewall tasks, encryption, and infrastructure management.

DPUs are positioned to become the third component in data center servers, alongside CPUs (central processing units) and GPUs (graphics processing units), because of their ability to accelerate and perform network and storage functions. The CPU is used for general-purpose computing. The GPU is used to accelerate artificial intelligence applications. The DPU in a DPU-equipped server is used to process data and move it around the data center.

Overall, DPUs have a bright future because the amount of data stored in data centers keeps growing, which calls for a way to accelerate the storage and networking functions performed by high-performance servers. DPUs can also breathe new life into existing servers by reducing their CPU utilization. Estimates indicate that 30% of CPU utilization goes towards networking functions, so moving them to the DPU gives you that processing power back. As a result, DPUs can extend the useful life of your servers by months or even years, depending on how much of the system’s resources are being used for network functions.
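To make the 30% estimate concrete, here is a minimal Python sketch of the arithmetic. The 32-core server and the idea that the entire networking share moves to the DPU are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def cores_reclaimed(total_cores: int, infra_share: float,
                    offloaded_fraction: float = 1.0) -> float:
    """Estimate how many CPU cores are freed by offloading to a DPU.

    total_cores        -- CPU cores in the server (assumed value)
    infra_share        -- fraction of CPU spent on network/storage work
                          (the article cites an estimate of 0.30)
    offloaded_fraction -- how much of that work actually moves to the DPU
                          (assumption; 1.0 means all of it)
    """
    if not 0.0 <= infra_share <= 1.0:
        raise ValueError("infra_share must be between 0 and 1")
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    return total_cores * infra_share * offloaded_fraction


if __name__ == "__main__":
    # Hypothetical 32-core server using the 30% estimate from the text.
    full = cores_reclaimed(32, 0.30)
    partial = cores_reclaimed(32, 0.30, offloaded_fraction=0.5)
    print(f"Full offload frees about {full:.1f} cores")
    print(f"Half offload frees about {partial:.1f} cores")
```

The real benefit depends on how much of the networking work a given DPU can actually absorb, which is why the offloaded fraction is left as a parameter.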

What are the Components of a DPU? 

A DPU is a system on chip built from three primary elements. The first is a multi-core CPU that is software programmable. The second is a high-performance network interface that enables the DPU to parse, process, and efficiently move data through the network. The third is a rich set of flexible, programmable acceleration engines that offload network and storage functions from the CPU to the DPU. DPUs are often integrated with SmartNICs, offering powerful network data processing.

Nvidia is leading the way when it comes to DPUs. It recently released the Nvidia BlueField 2 DPU, which Nvidia describes as the world’s first data-infrastructure-on-a-chip architecture, optimized for modern data centers. The BlueField 2 DPU allows data center servers to offload network and storage functions from the CPU, leaving the DPU to handle mundane storage and network tasks.

Nvidia DPUs are accessible through the DOCA SDK, enabling a programmable API for DPU hardware. DOCA enables organizations to program DPUs to accelerate data processing for moving data in and out of servers, virtual machines, and containers. DPUs accelerate network functions and handle east-west traffic associated with VMs and containers and north-south traffic flowing in and out of data centers. That said, where DPUs shine is in moving data within a data center because they are optimized for data movement.  

Furthermore, Nvidia states that DPUs are capable of offloading and accelerating all data center security services, including next-generation firewalls, micro-segmentation, data encryption, and intrusion detection. In the past, security was handled by software running on x86 CPUs; offloading it to DPUs frees up CPU resources for other tasks.

What Are the Most Common Features of DPUs? 

DPUs have a ton of features, but here are the most common features that are found on DPUs: 

  • High-speed connectivity via one or multiple 100 Gigabit to 200 Gigabit interfaces 
  • High-speed packet processing 
  • Multi-core processing via Arm- or MIPS-based CPUs (for example, 8x 64-bit Arm CPU cores)
  • Memory controllers offering support for DDR4 and DDR5 RAM 
  • Accelerators 
  • PCI Express Gen 4 Support 
  • Security features 
  • Custom operating system separated from the host system’s OS 

What Are Some of the Most Common DPU Solutions? 

Nvidia has released two DPUs: the Nvidia Mellanox BlueField 2 DPU and the BlueField 2X DPU. The BlueField 2X DPU has everything that the BlueField 2 DPU has, plus an additional Ampere GPU that enables artificial intelligence functionality on the DPU. Nvidia included a GPU on its DPU to assist with security, network, and storage management. For example, machine learning or deep learning can run on the data processing unit itself and be used to identify and stop an attempted network breach. Furthermore, Nvidia has stated that it intends to launch BlueField 3 in 2022 and BlueField 4 in 2023.

Companies such as Intel and Xilinx are also introducing DPUs into the space. That said, some of the offerings from Xilinx and Intel are known as SmartNICs. SmartNICs from Xilinx and Intel use FPGAs to accelerate network and storage functions. They work the same way as data processing units: they offload network and storage functions from the CPU to the SmartNIC, freeing up processing power. Because FPGAs are reprogrammable, they bring parallelism and customization to the data path.

For example, Xilinx offers the Alveo series of SmartNICs with various products, and Intel and its partners offer several FPGA-based SmartNIC solutions to accelerate data processing workloads in large data centers. Intel claims that its SmartNICs “boost data center performance levels by offloading switching, storage, and security functionality onto a single PCIe platform that combines both Intel FPGAs and Intel Xeon Processors.” Intel offers a second, newer SmartNIC solution known as the Silicom FPGA SmartNIC N5010, which combines an Intel Stratix 10 FPGA with an Intel Ethernet 800 Series Adapter, providing organizations with 4x 100 Gigabit Ethernet ports and plenty of bandwidth for data centers.

Why Are DPUs Increasing in Popularity? 

We live in a digital information age where huge amounts of data are generated daily. This is especially true as growing numbers of IoT devices, autonomous vehicles, connected homes, and connected workplaces come online, saturating data centers with data. There is therefore a need for solutions that help data centers cope with the ever-increasing amount of data moving in and out of them, as well as the data moving within them.

DPUs contain a data movement system that accelerates data movement and processing operations, offloading networking functions from a server’s processor to the DPU. This makes DPUs a good way to extract more processing power from a server, especially now that Moore’s Law has slowed down and organizations are turning to hardware accelerators to gain performance. Because more performance can be extracted from existing hardware, a server can run more application workloads, which reduces an organization’s total cost of ownership.

Data processing units and FPGA SmartNICs are gaining popularity, with Microsoft and Google exploring bringing them to their data centers to accelerate data processing and artificial intelligence workloads. Moreover, Nvidia has partnered with VMware to offload networking, security, and storage tasks to the DPU. 

What Are Some Other Performance Accelerators? 

We will now discuss some of the other performance accelerators that are often used in data centers: GPUs (graphics processing units), computational storage, and FPGAs (field-programmable gate arrays).

1. Graphics Processing Units (GPUs) 

Graphics processing units are often deployed in high-performance servers in data centers to accelerate workloads. A server will often offload complicated mathematical calculations to the GPU because the GPU can perform them faster. GPUs employ a parallel architecture built from many cores that are individually smaller than CPU cores, which lets them handle many tasks at once and allows organizations to extract more performance from servers.

Source Credit (Nvidia) 

For example, the average CPU has between four and ten cores, while GPUs have hundreds or thousands of smaller cores that operate together to tackle complex calculations in parallel. CPUs, with fewer and larger cores, are more suitable for sequential data processing. GPU-accelerated servers are great for high-resolution video editing, medical imaging, artificial intelligence, machine learning training, and deep learning training.

GPUs installed in data center servers are great for accelerating deep learning training and machine learning training, which require a lot of computation power that CPUs simply do not offer. GPUs perform artificial intelligence tasks quicker than CPUs because they are equipped with HBM (high bandwidth memory) and hundreds or thousands of cores that can perform floating-point arithmetic significantly faster than traditional CPUs.

For these reasons, organizations use GPUs to train deep learning and machine learning models. The larger the data set and the larger the neural network, the more likely an organization will need a GPU to accelerate the workload. Although CPUs can perform deep learning training and machine learning training, it takes them a long time to complete complex computations. A deep learning training job that takes a few hours on a GPU may take a few days to a few weeks using only a CPU.

Moreover, adding GPUs to data center servers provides significantly better data throughput and offers the ability to process and analyze data with as little latency as possible. Latency refers to the amount of time required to complete a given task, and data throughput refers to the number of tasks completed per unit of time. 
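The two definitions above can be sketched in a few lines of Python. The task counts and timings below are made-up values chosen only to show how the two metrics differ; the function names are hypothetical.

```python
from typing import Sequence


def mean_latency(task_durations: Sequence[float]) -> float:
    """Average time, in seconds, needed to complete a single task."""
    if not task_durations:
        raise ValueError("need at least one task duration")
    return sum(task_durations) / len(task_durations)


def throughput(tasks_completed: int, wall_clock_seconds: float) -> float:
    """Number of tasks completed per second of wall-clock time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return tasks_completed / wall_clock_seconds


if __name__ == "__main__":
    # Hypothetical workload: 8 tasks that each take 2 seconds.
    durations = [2.0] * 8

    # Run one at a time, the batch takes 8 * 2 = 16 seconds.
    sequential = throughput(len(durations), wall_clock_seconds=16.0)

    # Run four at a time, the batch takes 2 rounds * 2 = 4 seconds.
    parallel = throughput(len(durations), wall_clock_seconds=4.0)

    print(f"Latency per task:      {mean_latency(durations)} s")
    print(f"Sequential throughput: {sequential} tasks/s")
    print(f"Parallel throughput:   {parallel} tasks/s")
```

In this toy case parallel execution raises throughput while per-task latency stays the same, which is why the two numbers are worth tracking separately.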

2. Computational Storage Devices (CSD) 

Computational storage has made its way into data centers as a performance accelerator. It processes data at the storage device level, reducing the amount of data that moves between the CPU and the storage device. Computational storage enables real-time data analysis and improves a system’s performance by reducing input/output bottlenecks. Computational storage devices look the same as regular storage devices, but they include a multi-core processor that performs functions such as indexing data as it enters the device and searching the device for specific entries.

Source Credit (AnandTech)

Computational storage devices are increasing in popularity due to the growing need to process and analyze data in real time. Real-time data processing and analysis is possible because the data no longer has to move between the storage device and the CPU. Instead, the data is processed on the storage device itself. Bringing compute power to the storage media, at the exact location where the data resides, enables real-time analysis and decision making.
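The reduction in data movement can be illustrated with a small Python simulation. This is only a sketch: the functions, the record layout, and the record counts are hypothetical, and it models the idea of filtering on the device rather than any real computational storage API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, int]
Predicate = Callable[[Record], bool]


def host_side_search(stored: Sequence[Record],
                     predicate: Predicate) -> Tuple[List[Record], int]:
    """Conventional path: every record is sent to the host CPU,
    which then filters. Returns (matches, records_transferred)."""
    transferred = list(stored)  # all records cross the storage bus
    matches = [r for r in transferred if predicate(r)]
    return matches, len(transferred)


def device_side_search(stored: Sequence[Record],
                       predicate: Predicate) -> Tuple[List[Record], int]:
    """Computational storage path: the device's own processor filters,
    so only matching records are sent to the host."""
    matches = [r for r in stored if predicate(r)]  # runs on the device
    return matches, len(matches)


if __name__ == "__main__":
    # Hypothetical data set: 1,000 records, of which 10 match the query.
    stored = [{"id": i} for i in range(1000)]
    is_match = lambda r: r["id"] % 100 == 0

    _, host_moved = host_side_search(stored, is_match)
    _, device_moved = device_side_search(stored, is_match)
    print(f"Host-side search moved {host_moved} records")
    print(f"Device-side search moved {device_moved} records")
```

Both paths return the same matches; the difference is how many records have to travel between the storage device and the CPU to get them.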

3. FPGA (Field Programmable Gate Array) 

Source Credit (Xilinx)

An FPGA is an integrated circuit made from logic blocks, I/O cells, and other resources that allow users to reprogram and reconfigure the chip according to the specific requirements of the workload it needs to perform. FPGAs are gaining popularity for deep learning and machine learning inference. Additionally, FPGA-based SmartNICs are being used because of their ability to offload network and storage functions from the CPU to the SmartNIC. Network and storage functions can place a significant burden on a system’s CPU, so offloading them to a SmartNIC frees up CPU processing power to run the OS and other critical applications. FPGA-based SmartNICs also allow organizations to optimize the SmartNIC for the specific workload being offloaded, providing customizability that is difficult to find elsewhere.

Bottom Line

At this point, it should come as no surprise that DPUs (data processing units) are gaining popularity in high-performance data center servers. By taking over storage and network functions, they allow the processor to focus on running the operating system and revenue-generating applications. Premio offers a number of DPU servers that use DPUs to make servers more powerful by offloading data processing, network functions, and storage functions from the CPU to the DPU. Nvidia claims that a single BlueField 2 data processing unit can handle the same data center services that would otherwise require 125 CPU cores, allowing DPU servers to work smarter, not harder. So, if you’re interested in buying DPU servers, feel free to contact our DPU server professionals. They will be more than happy to assist you with choosing or customizing a solution that meets your specific requirements.