Best Enterprise DPUs (Data Processing Units), SmartNICs, and FPGAs



Data processing units (DPUs) and SmartNICs are changing data centers across the globe because they speed up servers: by offloading networking, storage, and security functions from a server’s CPU to the DPU, organizations can extract more performance from existing hardware. When it comes to terminology, DPUs and SmartNICs are very similar, and the industry has not agreed on a naming scheme, with some vendors calling these products SmartNICs and others calling them DPUs. So, you might be wondering, what are the best DPUs on the market? Below, we introduce some of the best enterprise DPUs (data processing units) currently available from the different vendors:

1. Xilinx Alveo SmartNIC

Source: Xilinx

Xilinx offers the Alveo Series of SmartNICs, a platform based on FPGAs that enables hardware acceleration and avoids unnecessary data movement. The Alveo Series can accelerate compute-intensive applications, including machine learning inference, data analytics, video transcoding, and many other workloads. Xilinx estimates that the Alveo Series delivers up to 90X higher performance on these workloads than a CPU. The Alveo Series is adaptable because users can reprogram the cards to match their specific requirements. This allows users to accelerate new workloads without changing hardware, reducing the total cost of ownership. Re-programmability is especially important because algorithms evolve faster than silicon design cycles, so the hardware has to adapt as the algorithms change. One of the DPUs/SmartNICs offered by Xilinx is the Alveo U250, which can be configured with up to 64 Gigabytes of ECC RAM, 2x 100 Gigabit QSFP28 Ethernet ports, and 16 lanes of PCI Express 3.0, operating at up to 8 GT/s per lane. The Alveo U250 can offload common compute-intensive functions such as data processing, networking, and security from the host server’s CPU to the DPU.

Xilinx Alveo U25 

  • 2x 10 Gigabit or 2x 25 Gigabit Ethernet  
  • 2GB to 4GB DDR4-2400 RAM
  • PCIe Gen 3 x16 
  • 75W TDP 
  • Passively cooled
  • Support for PXE and UEFI 

Xilinx Alveo U50 

  • 1x 100 Gigabit Ethernet 
  • HBM2 – 8GB Capacity 
  • PCIe Gen 3 x16 
  • 75W TDP 
  • Passively cooled 
  • Support for PXE and UEFI 

Xilinx Alveo U200 

  • 2x 100 Gigabit Ethernet  
  • 64GB DDR4 RAM 
  • PCIe Gen 3 x16 
  • 225W TDP 
  • Active cooling 
  • Support for PXE and UEFI 

Xilinx Alveo U250 

  • 2x 100 Gigabit Ethernet  
  • 64GB DDR4 RAM 
  • PCIe Gen 3 x16 
  • 225W TDP 
  • Active cooling 
  • Support for PXE and UEFI 
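
To put the U250 figures quoted above in perspective, here is a rough back-of-the-envelope sketch comparing the host link (PCIe Gen 3 x16 at 8 GT/s per lane) with the card's two 100 Gigabit Ethernet ports. The function names are illustrative only, and the 128b/130b factor is the standard PCIe 3.0 line encoding rather than a number taken from Xilinx. Real-world throughput will be lower once protocol overhead is included.

```python
# Illustrative arithmetic only; ignores packet headers, flow control,
# and Ethernet framing, so real throughput is lower than these figures.

PCIE_GEN3_GT_PER_LANE = 8.0      # GT/s per lane, as quoted in the text
PCIE_GEN3_ENCODING = 128 / 130   # PCIe 3.0 uses 128b/130b line encoding


def pcie_gbps(gt_per_lane: float, lanes: int, encoding: float) -> float:
    """Raw PCIe link bandwidth in Gbit/s, per direction."""
    return gt_per_lane * encoding * lanes


def ethernet_gbps(ports: int, gbps_per_port: float) -> float:
    """Aggregate Ethernet line rate in Gbit/s, per direction."""
    return ports * gbps_per_port


host_link = pcie_gbps(PCIE_GEN3_GT_PER_LANE, 16, PCIE_GEN3_ENCODING)
network = ethernet_gbps(2, 100)

print(f"PCIe Gen 3 x16 host link: {host_link:.1f} Gbit/s")
print(f"2x 100GbE network side:   {network:.0f} Gbit/s")
print(f"Host link as share of line rate: {host_link / network:.0%}")
```

Under these assumptions the network ports can deliver more traffic than the host link can carry, which is one reason processing data on the card itself, rather than moving all of it to the CPU, can pay off.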

2. Nvidia Mellanox BlueField 2 DPU

Source: Nvidia

Nvidia offers the Nvidia Mellanox BlueField 2 data processing unit, which combines an industry-leading ConnectX network adapter, a powerful multi-core processor, and a number of other performance accelerators in a single re-programmable package, allowing organizations to reprogram the device to run the latest algorithms. The BlueField 2 can offload some of the main functions from the server’s CPU to the DPU, overcoming bottlenecks and freeing up CPU cycles for revenue-generating enterprise applications. The BlueField 2 DPU comes equipped with 8 ARMv8 cores, DDR4 RAM, and an Intelligent Ethernet Adapter supporting 10/25/50/56/100 Gigabit/s connectivity and up to 32 lanes of PCIe Gen 3.0/4.0. The BlueField 2 DPU can take on storage applications such as All-Flash Arrays, data compression, data decompression, and deduplication.

Additionally, this DPU can handle storage controller tasks by offloading them from the main host CPU to the data processing unit itself. Furthermore, it’s equipped with RDMA-based technology that provides remote storage access performance equal to that of local storage with minimal CPU overhead. The BlueField DPU really shines as a high-performance network interface, parsing, processing, and transferring data to speed up the rest of the network. Moreover, the Mellanox BlueField 2 is equipped with acceleration engines that can offload and accelerate AI tasks, such as machine learning and deep learning, as well as storage and telecommunications workloads, freeing up a server’s CPU to perform other revenue-generating tasks.

Nvidia Mellanox BlueField 2 DPU Specs 

  • Dual 10/25/50/100 Gigabit Ethernet ports or 1x 200 Gigabit Ethernet port
  • 8GB or 16GB of on-board DDR4 RAM w/ ECC Support 
  • 8 or 16 Lanes of PCIe Gen 4.0 Connectivity 
  • 8 ARMv8 Cores 
  • Secure Boot 
  • Remote Boot 

3. Silicom FPGA SmartNIC N5010

Source: Silicom USA

The Silicom FPGA SmartNIC N5010 is built around the Intel Stratix 10 FPGA. The N5010 is a high-performance accelerator card that can be optimized to process packets and manage traffic. It supports 4x 100 Gigabit/s ports for a total of up to 400Gbps. This SmartNIC can accelerate a wide variety of functions, including functions in telecommunication infrastructure. For example, it can improve and accelerate 5G network services by offloading CPU-intensive tasks, such as network functions, security features, and telemetry features, from a server’s CPU to the SmartNIC. With these tasks offloaded, all of the server’s cores can focus on processing value-added services for 5G applications, such as deep packet inspection, endpoint detection, adaptive bitrate streaming, and other applications. Overall, FPGA-based SmartNICs give communication service providers the flexibility to deliver new features at affordable prices, because the hardware can adapt to ever-changing communication needs.

Silicom FPGA SmartNIC N5010 Specs 

  • 4x 100 Gigabit Ethernet  
  • 32GB DDR4 RAM with ECC 
  • HBM 8GB 
  • PCIe Gen 4.0 x16 
  • Passive or Active Cooling  
  • 225 Watt TDP 

4. Broadcom Stingray SmartNIC


Source: Broadcom

The Broadcom Stingray combines a powerful network controller, a high-performance ARM CPU, PCI Express 3.0, performance accelerators, and DDR4 RAM to offload compute-intensive applications from a host server’s CPU to the SmartNIC. The Stingray is capable of delivering a high packet rate and low latency. Its performance accelerators provide packet inspection and processing power in the hardware itself, so organizations can move common flow-processing workloads from the server’s CPU to the SmartNIC, freeing up the server’s CPU to run revenue-generating applications.

Stingray SmartNIC Specs 

  • 8x ARM A72 Cores at 3.00GHz 
  • 1x 100 Gigabit Ethernet 
  • 8GB or 16GB of DDR4 RAM 
  • Support for PCIe Gen 3.0 x8 
  • Cryptography engine 
  • Secure boot 

5. Marvell Octeon LiquidIO III SmartNIC

Source: Marvell

Marvell offers the LiquidIO III SmartNIC for inline network and security acceleration. The LiquidIO III supports a full networking software stack based on Linux and DPDK. The LiquidIO III uses the PCI Express form factor, enabling data centers to offload and accelerate certain workloads. The included network adapter can manage, schedule, steer, and prioritize traffic based on queue management, packet marking, congestion notification, and priority-based scheduling. The LiquidIO III can offload various functions from the host system’s CPU to the SmartNIC, freeing up precious CPU cycles for other high-priority applications. Additionally, it can offload and accelerate crypto operations, packet processing, security protocols, virtual switching, traffic management, and tunneling functions.

LiquidIO III Specs 

  • Multi-core processor with 36 ARMv8 cores at 2.2GHz
  • 16GB DDR4 + ECC RAM  
  • Up to 5x 100 Gigabit Ethernet Ports or 2x 50 Gigabit Ethernet Ports 
  • Support for PCIe Gen 4.0 x16 

6. Fungible DPU

Source: Fungible

The Fungible DPU is designed to accelerate the processing of data-centric workloads within data centers. The Fungible DPU functions as a data traffic controller, moving network traffic to CPUs and GPUs. This data processing unit enables a high-speed data center fabric between DPU-enabled compute and storage servers. The main benefit of the Fungible DPU is that it allows data centers to disaggregate compute and storage elements, removing the physical limits of individual servers so that data center resources can be pooled and aggregated dynamically over a high-speed data fabric. Overall, the Fungible DPU is ideal for dynamic composability and resource pooling of CPUs, FPGAs, GPUs, SSDs, and HDDs, enabling such resources to be shared among many remote servers over Fungible’s secure, low-latency True Fabric.

Fungible F1 DPU Specifications 

  • 52 latest-generation MIPS64 cores @ 1.6GHz
  • 64 PCIe Gen3/Gen4 Lanes 
  • 16 Dual Mode Configurable Controllers 
  • Programmable DMA Engines 
  • 8GB HBM Modules with ECC 
  • Two Channel DDR4 with ECC 
  • 10x 100GE, 10x 40GE, 20x 50GE, 40x 25GE, or 40x 10GE ports

Bottom Line

As you can tell from this blog post, there are many DPUs (data processing units) for you to choose from. When choosing a DPU, you should consider the core count, acceleration engines, RAM, and Ethernet connectivity of the solution. Keep in mind that servers can be equipped with several DPUs, so if the power of a single DPU is not enough, you can configure your DPU server with additional DPUs. If you need assistance selecting a DPU server, contact our DPU server professionals, and they will help you choose a solution that meets your specific requirements. Premio has been manufacturing servers and embedded computing solutions in the United States for over 30 years and has a proven track record of providing reliable and robust computing solutions.
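
As a rough illustration of the selection criteria mentioned above (core count, RAM, and Ethernet connectivity), the sketch below filters a small catalog built from the spec lists in this post. The names `Dpu`, `CATALOG`, and `shortlist` are hypothetical, the largest listed RAM and port options are used where a range is given, and FPGA-only cards are entered with zero general-purpose cores as a simplifying assumption. Acceleration engines are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dpu:
    name: str
    cores: int          # general-purpose cores (assumed 0 for FPGA-only cards)
    ram_gb: int         # largest on-board DDR4 option listed in this post
    ports: int          # number of Ethernet ports
    gbps_per_port: int  # speed of each port in Gbit/s

    @property
    def total_gbps(self) -> int:
        """Aggregate Ethernet line rate across all ports."""
        return self.ports * self.gbps_per_port


# Figures copied from the spec lists above; not vendor-verified.
CATALOG = [
    Dpu("Xilinx Alveo U250", cores=0, ram_gb=64, ports=2, gbps_per_port=100),
    Dpu("Nvidia Mellanox BlueField 2", cores=8, ram_gb=16, ports=2, gbps_per_port=100),
    Dpu("Silicom N5010", cores=0, ram_gb=32, ports=4, gbps_per_port=100),
    Dpu("Broadcom Stingray", cores=8, ram_gb=16, ports=1, gbps_per_port=100),
    Dpu("Marvell LiquidIO III", cores=36, ram_gb=16, ports=5, gbps_per_port=100),
]


def shortlist(catalog, min_cores=0, min_ram_gb=0, min_total_gbps=0):
    """Return the entries that meet every minimum requirement."""
    return [
        d for d in catalog
        if d.cores >= min_cores
        and d.ram_gb >= min_ram_gb
        and d.total_gbps >= min_total_gbps
    ]


# Example: at least 8 cores and 200 Gbit/s of aggregate Ethernet.
for dpu in shortlist(CATALOG, min_cores=8, min_total_gbps=200):
    print(f"{dpu.name}: {dpu.cores} cores, {dpu.ram_gb}GB RAM, {dpu.total_gbps} Gbit/s")
```

A filter like this only narrows the field; workload fit, software support, and power and cooling limits still need to be checked against each vendor's documentation.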