How Can DPU Servers Improve Data Centers?


DPUs & Data Centers

As the amount of data stored and accessed on data center servers continues to grow, so does the need for performance accelerators that free up server CPU cycles for enterprise applications and the operating system (OS). DPUs (data processing units) free up the host server’s CPU cycles by taking over network, storage, and security functions from the CPU. They can do this because they are equipped with powerful multi-core processors, accelerators, and network interface controllers, and some models also include a GPU. Data center operators are turning to performance accelerators such as DPUs because Moore’s Law has slowed down, and processing power is not advancing at the rate it once did. DPUs allow data center operators to extract more performance from their existing hardware, reducing the need to replace equipment just to gain performance.

How Can DPU Servers Improve Data Centers? 

DPU-accelerated servers can improve data centers by increasing the processing power available for enterprise applications. Adding a DPU (data processing unit) to a server lets the server offload network and storage processing to the DPU, freeing up precious CPU processing power for mission-critical applications and the operating system. DPUs are a new class of programmable processors that Nvidia claims will become a staple in data centers, just as CPUs (central processing units) and GPUs (graphics processing units) have. GPUs made their way into data centers more recently, accelerating AI workloads such as machine learning and deep learning. They were widely adopted because of the massive parallel processing power made possible by their hundreds or thousands of cores.


Now, DPUs are making their way into data centers, freeing up server processing power by moving network and storage functions from servers’ main CPUs to the DPU. Nvidia estimates that approximately 30% of a server’s processing power is dedicated to performing network and storage functions. Therefore, equipping data center servers with DPUs frees up precious processing power for running the OS and other enterprise applications. So, to recap, the CPU is used for general-purpose computing, the GPU accelerates workloads that benefit from massive parallel computing power, and the DPU takes over storage and network functions, such as data processing and moving data throughout the data center.
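
To put the 30% figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes an idealized case in which every network and storage cycle moves to the DPU; the function name capacity_gain is hypothetical and used only for illustration. Real-world gains depend on the workload and on how much of the infrastructure work can actually be offloaded.

```python
def capacity_gain(offload_fraction: float) -> float:
    """Relative increase in CPU capacity available to applications when
    `offload_fraction` of the CPU's work moves to a DPU (idealized)."""
    if not 0.0 <= offload_fraction < 1.0:
        raise ValueError("offload_fraction must be in [0, 1)")
    before = 1.0 - offload_fraction  # share of the CPU left for applications without a DPU
    after = 1.0                      # assumption: all infrastructure work is offloaded
    return after / before - 1.0


# Nvidia's estimate: roughly 30% of CPU power goes to network and storage functions.
print(f"{capacity_gain(0.30):.1%}")  # prints 42.9%
```

In other words, if applications previously had 70% of the CPU and now have all of it, the capacity available to them grows by about 43%, which is larger than the 30% that was freed.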

As the amount of data moving through the data center increases, driven by the explosion of IoT and the availability of 5G connectivity bringing millions of new data-generating devices online, DPUs will be needed to accelerate the storage and networking functions carried out by data center servers. So what exactly is a DPU? We answer this question in detail below.

What is a DPU? 

A DPU, also known as a data processing unit, is made up of a multi-core processor (usually an Arm processor), a network interface controller capable of transferring data at extremely high speeds (100 Gigabits/s to 200 Gigabits/s), a set of acceleration engines that accelerate application performance, and RAM. Nvidia also provides a DPU equipped with an Ampere GPU that is used to run AI applications, including machine learning and deep learning. That DPU can also use AI for real-time security analytics, such as identifying abnormal traffic, which helps an organization detect theft of confidential data or malicious activity on the network. That said, the core feature of DPUs remains their ability to take over network, storage, and security tasks that include isolation, root of trust, key management, elastic block storage, data compression, and much more.


For example, the Nvidia Mellanox BlueField 2 is a data processing unit made by combining the industry-leading ConnectX network adapter with several Arm cores, providing advanced networking, storage, and security features to data centers. Furthermore, the BlueField 2 DPU can transfer data at speeds of up to 200 Gigabits per second while freeing up data center server cores to speed up revenue-generating services.

Overall, the future of DPUs looks bright. As the amount of data generated continues to explode, data centers are looking for ways to extract more performance from their existing servers. Based on Nvidia’s estimate that roughly 30% of a server’s processing power goes to networking and storage functions, equipping a server with a DPU can return that capacity to applications, because those functions move from the CPU to the DPU, which then manages the movement of data throughout the data center. This is great for data center operators because it allows them to extract more performance from their existing hardware without replacing all of their equipment.

What Are the Main Components of a Data Processing Unit? 

Data processing units are a type of SoC (system on chip) with the following components: a high-performance network interface controller, software-programmable multi-core processors (Nvidia uses Arm processors), random access memory (up to 16GB of RAM), and a rich set of programmable performance accelerators. The high-performance network interface controller is tasked with processing, parsing, and transferring data throughout a data center.

Common Features Found on DPUs: 

  • Extremely high-speed connectivity thanks to onboard 100 Gigabit/s to 200 Gigabit/s interfaces 
  • High-speed packet processing 
  • Powerful multi-core CPU
  • Support for DDR4 or DDR5 RAM 
  • Performance accelerators 
  • Support for PCI Express 4.0 
  • Security features 
  • Data and storage management features 

Why Are DPU Servers Becoming More Popular? 

DPU servers are gaining popularity because, as previously mentioned, they leave more processing power available for applications than regular data center servers do. DPU-accelerated servers can offload networking and storage functions to the DPU, freeing up processing power in the server that can be used to run enterprise applications. Processing power continues to grow in importance as more data makes its way to data centers, and data center servers must have enough of it to process and analyze the data coming in. One way to achieve this is to offload networking and storage functions to a DPU. DPUs are optimized specifically for moving data east-west, that is, between servers within the data center.

Since Moore’s Law has slowed down, increasing CPU performance alone is no longer enough to cope with the increase in data, necessitating dedicated hardware such as DPUs to manage data flow and processing. In fact, Intel, Google, and Microsoft are looking at data processing units as a possible solution to cope with the explosion of data.

So, as data volume and velocity increase, DPU servers are gaining popularity because they can move data-related workloads, such as encryption and data protection, from the CPU to the DPU. This allows the CPU to focus on running the operating system (OS) and enterprise applications, improving servers’ overall performance while also delivering storage and networking improvements.

What Are Common Configurations for a DPU Server? 

Let’s explore some of the different server configurations. All of these items can be added to servers in addition to data processing units. 

1. Computational Power 

High-performance servers can be equipped with dual Intel Xeon SP (Scalable Processors), featuring up to 28 cores and 56 threads each. Xeon processors offer blazing-fast performance, which helps organizations meet the growing demands placed on data centers for fast data storage and access. High-core-count processors also enable new services and applications across enterprise, technical computing, storage, and cloud workloads, and Intel Xeon processors provide significant benefits for performance, power efficiency, security, and virtualization. Servers can also be configured with dual AMD Epyc processors, featuring 64 cores and 128 threads each, for a total of 128 cores and 256 threads of ultra-fast processing power. If your workload does not require this much processing power, we have servers that can be configured with a single AMD Epyc processor.

2. High-Performance Storage 

Premio servers can be equipped with high-performance NVMe storage, offering high storage throughput and application responsiveness. Servers equipped with NVMe storage are significantly faster than legacy solutions that still use SATA and SAS interconnects. NVMe SSDs plug directly into a server’s PCIe bus, providing a significant boost in performance and significantly lower latency compared with an SSD that plugs into a SATA controller. As the amount of data being stored and accessed continues to increase, there is a growing need for faster processing and larger-capacity, high-speed data storage.

3. High-Speed Network Connectivity 

High-performance DPU servers have a ton of connectivity via Ethernet LAN ports that are located both onboard and on the DPUs themselves. The motherboard itself comes with 2x Gigabit Ethernet ports and a single management port. However, the amount of connectivity available depends on how many data processing units and regular NICs you configure your system with. Here are the network port specs of some of the most popular DPUs currently available: 

  • Xilinx Alveo U25 – 2x 25 Gigabit Ethernet Ports 
  • Xilinx Alveo U50 – 1x 100 Gigabit Ethernet Port 
  • Xilinx Alveo U200 – 2x 100 Gigabit Ethernet Ports 
  • Xilinx Alveo U250 – 2x 100 Gigabit Ethernet Ports 
  • Xilinx Alveo U280 – 2x 100 Gigabit Ethernet Ports 
  • Nvidia Mellanox BlueField 2 – Dual Ports of 10/25/50/100 Gigabits or a single port of 200 Gigabits 
  • Silicom FPGA SmartNIC N5010 Series – 4x 100 Gigabit Ethernet Ports 

Note: We offer support for either SFP28 for 10/25 Gigabit connectivity or QSFP28 for 100/200 Gigabit connectivity options.
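
As a rough illustration of how the network capacity of a configuration adds up, the following sketch totals the nominal line rate across installed cards. The port counts and rates are copied from the list above; the dictionary layout and the function names (aggregate_gbps, server_total_gbps) are hypothetical, and the result is an upper bound on paper rather than a measured throughput.

```python
# (ports, Gb/s per port) for each card, taken from the list above.
# The BlueField 2 appears twice, once per port configuration it is listed with.
CARDS = {
    "Xilinx Alveo U25": (2, 25),
    "Xilinx Alveo U50": (1, 100),
    "Xilinx Alveo U200": (2, 100),
    "Xilinx Alveo U250": (2, 100),
    "Xilinx Alveo U280": (2, 100),
    "Nvidia Mellanox BlueField 2 (dual 100G)": (2, 100),
    "Nvidia Mellanox BlueField 2 (single 200G)": (1, 200),
    "Silicom FPGA SmartNIC N5010": (4, 100),
}


def aggregate_gbps(ports: int, gbps_per_port: int) -> int:
    """Nominal aggregate line rate of one card in Gb/s."""
    return ports * gbps_per_port


def server_total_gbps(config: dict[str, int]) -> int:
    """Nominal line rate of a server; `config` maps card name -> cards installed."""
    return sum(aggregate_gbps(*CARDS[name]) * count for name, count in config.items())


# Hypothetical build: four single-port 200G BlueField 2 cards plus two Alveo U25 cards.
example = {"Nvidia Mellanox BlueField 2 (single 200G)": 4, "Xilinx Alveo U25": 2}
print(server_total_gbps(example), "Gb/s")  # prints 900 Gb/s
```

The onboard Gigabit Ethernet and management ports are left out because they add little to the total.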

Balanced PCIe Architecture    

DPU servers employ a throughput-optimized configuration that balances storage and network bandwidth. So, even though the attached storage drives offer more aggregate speed than the network, the amount of data handled on each side is kept equal, giving organizations a well-balanced design in which storage and network I/O are matched. Additionally, a balanced PCIe architecture offers balanced performance for workloads that have high data parallelism and use both the CPUs and the DPUs, helping ensure that your server performs optimally.
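
To make the idea of matching storage and network I/O more concrete, here is a rough sizing sketch. It assumes each NVMe SSD sits on its own PCIe 4.0 x4 link and can saturate it, which is an optimistic upper bound, and it ignores protocol overhead beyond line encoding. The function names pcie4_gbps and drives_to_match are hypothetical.

```python
import math

PCIE4_GT_PER_LANE = 16.0         # PCIe 4.0 signalling rate, GT/s per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and later


def pcie4_gbps(lanes: int) -> float:
    """Approximate one-direction PCIe 4.0 bandwidth in Gb/s for a link of `lanes` lanes."""
    return lanes * PCIE4_GT_PER_LANE * ENCODING_EFFICIENCY


def drives_to_match(network_gbps: float, lanes_per_drive: int = 4) -> int:
    """Smallest number of NVMe drives whose combined link bandwidth covers the network rate."""
    return math.ceil(network_gbps / pcie4_gbps(lanes_per_drive))


print(round(pcie4_gbps(4), 1))   # one x4 NVMe link: about 63.0 Gb/s
print(round(pcie4_gbps(16), 1))  # one x16 slot: about 252.1 Gb/s
print(drives_to_match(200))      # drives needed to feed a 200 Gb/s port: 4
```

Under these assumptions, a single 200 Gigabit port is matched by about four NVMe drives, and a PCIe 4.0 x16 slot has enough headroom to carry that port’s traffic. Real drives rarely sustain their full link rate, so an actual build may need more drives per port.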

What is the Difference Between CPUs, GPUs, and DPUs? 

CPUs are designed and built to perform a wide range of tasks as quickly as possible, making them very versatile. CPUs have a large, broad instruction set that directs the CPU to switch the relevant transistors in order to perform the task at hand. GPUs, on the other hand, do not have such a broad instruction set, but they do have an advantage over CPUs for some applications: GPUs have significantly more cores. For example, a typical CPU has 4 to 10 cores, with some server CPUs having up to 64 cores, whereas a typical GPU can have hundreds or thousands of smaller cores. The RTX 3080, for instance, has more than 8700 cores. So, although CPUs are more versatile than GPUs, the sheer number of GPU cores and the parallelism they offer make GPUs a great option for applications that require a ton of mathematical calculations.

Initially, GPUs (graphics processing units) were used to deliver rich, real-time graphics; however, it soon became clear that GPUs could also accelerate other specific applications. Applications that benefit greatly from GPUs include machine learning, deep learning, risk modeling, financial simulations, and many other scientific computations. Just as GPUs accelerate artificial intelligence workloads, DPUs accelerate network and storage functions by taking them over from the CPU. DPUs are great for managing the movement of data through the data center.

What Are the Most Common DPU Solutions? 

Nvidia has released a DPU named the Nvidia Mellanox BlueField 2 Data Processing Unit (DPU). Additionally, Nvidia has released the Mellanox BlueField 2X DPU, which has the same features as the BlueField 2 DPU with an added Ampere GPU. The Ampere GPU can be used to run artificial intelligence applications, such as security anomaly detection, to detect and prevent a network breach. If that’s not enough, Nvidia has already planned the release of two new DPUs: the BlueField 3 for 2022 and the BlueField 4 for 2023.

Furthermore, Intel and Xilinx have introduced their own DPUs into the space; however, they refer to data processing units as SmartNICs. Solutions from both Xilinx and Intel combine FPGAs with network interface controllers to accelerate network and storage functions, just as DPUs do. 

For example, Intel has partnered with Silicom to offer the Silicom FPGA SmartNIC N5010, which combines an Intel Stratix 10 FPGA with an Intel Ethernet 800 Series Adapter, providing plenty of bandwidth thanks to the inclusion of 4x 100 Gigabit Ethernet ports. Xilinx, for its part, offers the Alveo series of SmartNICs to boost data center performance by offloading network, storage, and compute functions to the SmartNIC. The Xilinx Alveo U25 is based on an FPGA platform that provides ultra-high throughput and low latency while avoiding unnecessary data movement and CPU processing.

What Are Some Other Performance Accelerators Used in Data Center Servers? 

1. GPU (Graphics Processing Unit)

Other performance accelerators commonly found in data centers include GPUs (graphics processing units), computational storage devices (CSDs), and FPGAs (field-programmable gate arrays). GPUs (graphics processing units) are often used in data center servers to accelerate complex mathematical workloads. GPUs are great at performing mathematical workloads thanks to the inclusion of thousands of small cores, allowing them to perform many tasks and computations in parallel. As such, graphics processing units are great for artificial intelligence, deep learning, machine learning, high-resolution video editing, medical imaging, and many other demanding workloads. 

2. Computational Storage Device (CSD)

Computational storage is used in data centers as a performance accelerator. It accelerates server performance by enabling servers to process data at the storage device level, giving organizations the ability to perform real-time data analysis with as little latency as possible while reducing input/output bottlenecks. Computational storage devices look similar to regular storage devices, but they are equipped with multi-core processors that process and analyze the data on the storage device itself, allowing an organization to extract valuable, actionable insights at the storage device level. Furthermore, equipping servers with computational storage devices reduces latency, since the data is processed and analyzed on the storage device itself in near real time. Moreover, since the data does not need to be moved and remains on the storage device, this adds security by reducing exposure to vulnerabilities.

3. Field Programmable Gate Array (FPGA)

FPGAs are integrated circuits made from logic blocks, I/O cells, and other resources that allow users to reprogram and reconfigure the chip in different arrangements according to their specific requirements. FPGAs are heavily utilized to perform machine learning and deep learning workloads. Additionally, FPGAs are being used in SmartNICs to accelerate network functions by several orders of magnitude, thanks to the massive parallelism, high bandwidth, and high throughput that FPGAs offer. Overall, FPGA SmartNICs are similar to DPUs because they can offload network and storage functions from the server’s CPU and perform them on separate dedicated hardware, freeing up precious CPU processing power.

The Bottom Line 

At this point, it should not surprise you that DPUs (data processing units) are set to become commonplace in data centers. As the amount of data coming into data centers continues to increase, operators are pushed to maximize the performance of their systems to cope with the influx. DPUs allow data center operators to extract more performance from their servers by offloading storage and network functions from the host system to the data processing unit. Nvidia claims that a single BlueField 2 DPU can handle the same data center services that would otherwise require 125 CPU cores. Premio has been manufacturing computers in the United States for over 30 years, and it provides a wide variety of high-performance DPU servers that can be customized according to your specific requirements. If you need assistance choosing a DPU server or customizing a solution, please contact us, and one of our server professionals will help you find a solution that meets your needs.

Explore the Following DPU Accelerated Server 

Premio offers a variety of high-performance DPU servers; one such server is the Flache Streams DPU server. This server can be configured with up to 18 data processing units (DPUs) and with dual Intel Xeon processors, dual AMD Epyc processors, or a single AMD Epyc processor. For storage, it can be configured with high-speed NVMe SSDs as well as regular SSDs connected via SATA. Data processing units are added to offload some of the networking, storage, and security functions from the server’s main CPU to the DPU, freeing up precious processing power to run revenue-generating enterprise applications. For example, you can configure one of our DPU servers with multiple Mellanox BlueField 2 DPUs, PCIe NVMe storage, and dual Intel Xeon SP (Scalable Processors), providing you with plenty of CPU processing power, high-speed solid-state storage, and a ton of network capabilities thanks to the inclusion of data processing units.