Data Insights from the Cloud And Out to the Edge

As enterprise cloud providers demand performance-driven processing, storage, and connectivity, purpose-built hardware servers will change how traditional scale-out architectures access, transfer, and store enormous amounts of data. New hardware architectures require the latest technologies to meet modern demands in the data center and out to the edge. Legacy rackmount server designs are rapidly changing to enable faster and more efficient processing of the real-time data used for a new wave of machine intelligence.

DPU accelerated servers bring a convergence of the latest technologies in data center architectures that enable new performance benchmarks for incredible compute power, efficiency, and scalability from the data center network and out to the edge.

Data Processing. Performance Acceleration Matters

Performance Increase

High-performance Computing (HPC)

Machine Learning


Deep Learning


Data Analytics Web 2.0


Cloud Management


Massive Data Storage

FlacheStreams DPU Accelerated Server

The FlacheStreams DPU accelerated rackmount server is designed to deliver high performance by enabling new architectures built on the latest CPUs, GPUs, and DPUs. This purpose-built server addresses the most complex data center workloads in modern public, private, and hybrid cloud infrastructures that require balanced hardware acceleration.


  • 2U Rackmountable Chassis
  • 12 Unique Hot-Swappable Carrier Trays for DPU Cards
  • Single or Dual AMD EPYC, or Intel Scalable Processors


  • Up to 36x 25Gb/s Network Accelerator Ports
  • 2x 96-Lane PCIe Switch Boards for Peer-to-Peer Communication
  • 2x GbE RJ45 LAN ports, 1x IPMI Management RJ45 port


  • Scales up to 18 DPUs or 14 DPUs + 2 Full-Height Double Width 350W GPGPUs
  • Supports up to 300TB+ of NVMe SSD Storage
  • CPU Cores: From 8 cores to 128 cores
  • Up to 18x 8 Armv8 A72 cores (64-bit) DPU Accelerators
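As a rough sanity check, the headline figures in the lists above can be multiplied out. The sketch below is illustrative only: the constant and function names are made up for this example, it assumes a fully populated 18-DPU configuration, and it assumes the 36 network ports are spread evenly across those DPUs. The aggregate figure is a theoretical line rate, not a measured throughput.

```python
# Illustrative arithmetic on the published maximums; all names are hypothetical.
MAX_DPUS = 18             # "Scales up to 18 DPUs"
ARM_CORES_PER_DPU = 8     # "8 Armv8 A72 cores" per DPU
NETWORK_PORTS = 36        # "Up to 36x 25Gb/s Network Accelerator Ports"
PORT_SPEED_GBPS = 25      # line rate per port, in Gb/s


def summarize():
    """Return derived totals for a fully populated chassis."""
    return {
        # Assumption: ports are distributed evenly across the DPUs.
        "ports_per_dpu": NETWORK_PORTS // MAX_DPUS,
        "total_arm_cores": MAX_DPUS * ARM_CORES_PER_DPU,
        "aggregate_line_rate_gbps": NETWORK_PORTS * PORT_SPEED_GBPS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions, a full chassis works out to 2 ports per DPU, 144 Arm cores on the DPUs, and 900 Gb/s of aggregate port line rate.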

Hardware Accelerations:

  • Compression and decompression acceleration
  • NVMe-oF™ acceleration
  • SR-IOV
  • TCP/IP Transport Offloads

Product Collection:

FlacheSAN2N12C-DM 2U 12 Dual AMD EPYC DPU Server

  • 2x AMD EPYC Milan CPUs up to 128 cores
  • 4TB RDIMM 3200MHz
  • 16x PCIe Add-in-Card Slots

FlacheSAN2N12C-UM 2U 12 Single AMD EPYC DPU Server

  • 1x AMD EPYC Milan CPU, up to 64 cores
  • 4TB RDIMM 3200MHz
  • 16x PCIe Add-in-Card Slots

FlacheSAN2N12C-D5 2U 12 Dual Intel Scalable Processors DPU Server

  • 2x Intel Scalable CPUs
  • 16x DIMMs, RDIMM 2933MHz
  • 16x PCIe Add-in-Card Slots

How Can a DPU Accelerated Server Be Configured?

Network Application:

Supports SmartNICs, FPGAs, and other accelerators that offload work from the CPU to increase network throughput and decrease latency.

Storage Application:

12x PCIe 3.0 x8 Add-on-Card slots hold NVMe SSD devices that store up to 300TB of data.

Computational Application:

128 AMD EPYC cores (2x 64 cores), 2x double-width GPUs, and up to 18x DPUs or computational SSDs enable customers to run the most computationally demanding applications.

HPC Application:

With two 96-lane switch boards that fan out to 12x PCIe x8 Add-on-Cards and 4x PCIe x16 Add-on-Cards, the system allows SmartNICs and SSDs to communicate with each other via peer-to-peer DMA. This eliminates unnecessary data movement, which decreases latency and frees up computational resources.
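The slot fan-out described above can be checked with a simple lane budget. The sketch below is a back-of-the-envelope calculation, not a description of the actual board design: the names are hypothetical, the PCIe 3.0 rate is taken from the storage configuration above, and how any remaining switch lanes are used (for example, as uplinks to the host CPUs) is not stated in the text and is only an assumption.

```python
# Lane-budget sketch for the slot layout described above; names are hypothetical.
SWITCH_BOARDS = 2
LANES_PER_SWITCH = 96
SLOTS = {8: 12, 16: 4}   # lane width -> number of add-on-card slots

# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b encoding.
PCIE3_GT_PER_S = 8.0
PCIE3_ENCODING = 128 / 130


def downstream_lanes(slots=SLOTS):
    """Total lanes consumed by the add-on-card slots."""
    return sum(width * count for width, count in slots.items())


def lane_budget():
    """Return (total switch lanes, lanes used by slots, lanes left over)."""
    total = SWITCH_BOARDS * LANES_PER_SWITCH
    used = downstream_lanes()
    return total, used, total - used


def pcie3_gbytes_per_s(lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return PCIE3_GT_PER_S * PCIE3_ENCODING * lanes / 8


if __name__ == "__main__":
    total, used, spare = lane_budget()
    print(f"switch lanes: {total}, slot lanes: {used}, remaining: {spare}")
    print(f"PCIe 3.0 x8 slot: about {pcie3_gbytes_per_s(8):.2f} GB/s per direction")
```

The slots account for 160 of the 192 switch lanes, leaving 32 lanes. A PCIe 3.0 x8 slot has a theoretical ceiling of roughly 7.88 GB/s in each direction, which is the upper bound on any single peer-to-peer transfer between two x8 cards.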

Hardware Acceleration in the Data Center

Data centers today are still predominantly powered by x86 CPU-based computing. However, with Moore’s Law proving to be more of a challenge for semiconductor innovation, data center infrastructures require newer technologies and methods to extract performance from existing hardware. Today’s data center solutions now rely on a variety of performance acceleration technologies that address the computational demands for faster processing, storage, network, and security in AI-driven applications.

  • x86 servers accounted for $20.93 billion in revenue in Q4 2020, or 92.8 percent of all server revenue (IDC)
  • Modern data center infrastructures use hardware technologies in CPUs, GPUs, DPUs, and FPGAs for performance-driven scale-out architectures

First, GPUs were used to accelerate AI workloads. Now computational storage devices and data processing units are making their way into data centers for better resource utilization across hardware architectures. GPUs, and now DPUs, accelerate workloads and take overhead off traditional processors, freeing up valuable CPU resources for other critical applications. Today, popular data center performance accelerators include multi-core processors, GPUs (graphics processing units), computational storage devices, and DPUs (data processing units), all working together to increase performance and eliminate unnecessary bottlenecks.

General Purpose Powerhouse: X86 Multi-Core Processors

Multi-core processors are extremely popular in data centers because the more cores a data center server CPU has, the more program instructions it can execute per cycle. As such, most data center servers are equipped with multi-core CPUs, which accelerate a server’s performance by providing more processing power for enterprise applications.

Deep Learning and Machine Intelligence: GPUs (Graphics Processing Units)

GPUs are popular in data centers as performance accelerators because they excel at AI workloads, such as machine learning and deep learning training. They are equipped with thousands of cores, enabling them to perform many computations in parallel and significantly reducing the time it takes to complete complex workloads.

Computational NAND Flash Storage: Blazing Fast NVMe

High-performance servers can be configured with computational storage devices that accelerate storage functions by moving data processing from the host CPU onto the storage device itself. This is made possible by processing and memory technologies built in at the storage level, allowing servers to process data in real time with ultra-low latency.

A SmarterNIC: DPUs (Data Processing Units)

Data processing units are making their way into data center servers because of their ability to offload storage, networking, and security functions from the host server’s CPU to the DPU, freeing up CPU cycles for running enterprise applications and the operating system.

Driving Trends That Require New Solutions

The Problem: CPU resources are wasted managing other critical data processing and storage tasks in the data center, creating a bottleneck.

Solutions: SmartNICs, FPGAs, GPUs, and now DPUs are shaping new hardware architectures for incredible performance and ultra-low latency.

The rapid growth of data driven by 5G, together with the higher bandwidth of PCIe Gen 4, has made edge computing necessary to process that data. Specific add-on cards, known as SmartNICs, FPGAs, and DPUs, help to alleviate the workload by focusing on network processing tasks so that the CPU is better utilized for general-purpose workloads. Tune in to Premio’s Rugged Edge Survival Podcast to learn about the benefits of DPUs.

These [DPUs] are going to represent one of the three major pillars of computing going forward. The CPU is for general-purpose computing, the GPU is for accelerated computing, and the DPU, which moves data around the data center, for data processing.

- NVIDIA CEO Jensen Huang

What Are DPU (Data Processing Unit) Based SmartNICs?

DPUs, short for data processing units, are a new class of programmable processor that is often combined with a high-performance network interface. Offloading the processing of network data from the host server CPU directly to the DPU frees up precious CPU resources to manage other mission-critical applications in the data center. DPUs are optimized to accelerate networking, storage, and security management functions directly on the network interface card.

DPUs can free up 30% to 50% of host CPU utilization

DPUs act as SmartNICs for efficient gate-keeping of data management and workloads

DPUs can bypass the CPU and decipher data required for GPU acceleration in deep learning applications
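To make the 30% to 50% range concrete, the sketch below converts it into core-equivalents for a given host. It assumes the percentage applies uniformly across all host cores, which is a simplification; `freed_core_equivalents` is a hypothetical helper, and real savings depend on the workload.

```python
def freed_core_equivalents(host_cores, low=0.30, high=0.50):
    """Return the (low, high) number of core-equivalents freed by offload.

    Assumption: the quoted utilization range applies evenly to every core.
    """
    if host_cores < 0:
        raise ValueError("host_cores must be non-negative")
    return host_cores * low, host_cores * high


if __name__ == "__main__":
    # 128 cores corresponds to the dual 64-core configuration.
    low, high = freed_core_equivalents(128)
    print(f"freed core-equivalents: {low:.1f} to {high:.1f}")
```

For a 128-core host, that range corresponds to roughly 38 to 64 cores' worth of capacity returned to applications.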

PCIe Gen 4 Support

DPUs support PCIe 4.0 (Peripheral Component Interconnect Express), a standard interface that connects high-end accelerators to your server.

DPUs are made from a few primary components:

  • A high-performance, software-programmable multi-core Arm processor tightly coupled to the other SoC components
  • A high-performance network interface that allows the DPU to parse, process, and efficiently move data through a network
  • A rich set of flexible, programmable acceleration engines that offload network, storage, security, and AI workloads from the server’s CPU to the DPU

DPU Features

Data Security

High-speed networking connectivity from 100GbE to 200GbE

High-Speed Packet Processing Without CPU

Powerful Multi-Core Processor

Performance Accelerators

Data and Local Storage Management

Why Are DPUs Important for Data Centers and Edge Computing Infrastructure?

DPUs are important for data centers and edge networks because they can free up valuable server CPU resources for other general-purpose processing. DPU SmartNICs also relieve specific network bottlenecks by managing data packets without any interaction with the server’s host CPUs. Many new edge computing applications also require powerful hardware acceleration of incoming data for real-time machine learning and training of deep learning algorithms.

Network Virtualization

Supports network virtualization more efficiently than x86 CPUs by offloading network workloads from the CPU, such as SR-IOV, vFirewall, OVS, and overlay network traffic encapsulation.

Packet Pacing

Enables smoother delivery of content by limiting the rate of outbound packets to keep network traffic more stable while retaining the same throughput and quality.
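The idea behind packet pacing can be illustrated in a few lines of software. This is a minimal sketch of the general technique (spacing departures so the output never exceeds a target rate), not the DPU's hardware implementation; the `pace` function and its parameters are hypothetical names chosen for this example.

```python
def pace(packet_sizes_bytes, rate_bps, start=0.0):
    """Return a send time in seconds for each packet.

    Packets are released back to back in time, but each one delays the
    next by the time it would occupy a link running at rate_bps, so a
    burst is spread out instead of leaving all at once.
    """
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    send_times = []
    next_send = start
    for size in packet_sizes_bytes:
        send_times.append(next_send)
        # Serialization time of this packet at the target rate.
        next_send += size * 8 / rate_bps
    return send_times


if __name__ == "__main__":
    # A burst of four 1500-byte packets paced to 12 Mb/s.
    for i, t in enumerate(pace([1500] * 4, 12_000_000)):
        print(f"packet {i}: send at {t * 1000:.3f} ms")
```

In this example each 1500-byte packet takes 1 ms at 12 Mb/s, so the four packets leave about 1 ms apart rather than as a single burst. The same total data is sent, only more evenly.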

Increase 5G Traffic Routing Performance

Supports vRAN, front-haul I/O, forward error correction, precision time stamping, eCPRI windowing, and real-time transmission hardware acceleration.

Trusted SmartNIC, FPGA, and DPU Partners

NVIDIA offers the NVIDIA Mellanox BlueField-2 data processing unit (DPU), which combines the industry-leading ConnectX network adapter, a powerful multi-core processor, and a number of other performance accelerators in a single re-programmable package, allowing organizations to reprogram the device to run the latest algorithms.

Xilinx offers the Alveo Series of SmartNIC cards that are based on FPGAs, enabling hardware acceleration and avoiding unnecessary data movement. The Xilinx Alveo is capable of accelerating compute-intensive applications, including machine learning inference, data analytics, video transcoding, and many other workloads.

Why Choose Premio For DPU Acceleration Servers

Expertise in the Design, Engineering and Manufacturing (ODM) of server hardware for key enterprise markets

  • Server products purpose-built for workloads in High-performance Compute, Scale-Out Data Storage, Machine and Deep Learning, Public and Private Cloud Management
  • 30+ years of extensive design expertise in server and storage architecture for performance, storage, and high-speed connectivity
  • Global turnkey manufacturing and support infrastructure in the USA to accelerate scalable mass deployments in server and storage solutions
  • Deep understanding of IoT and data center technologies in computation, storage, and connectivity designed for performance workloads
  • Regulatory testing and compliance options for server and storage solutions in North American markets

Frequently Asked Questions (FAQs)

Just as GPUs are used to offload mathematically intensive tasks from the CPU, DPUs (data processing units) are used to offload network and storage functions from the CPU, freeing up valuable CPU resources that would otherwise be spent on mundane network and storage functions.

Think of the CPU as the brain of the computer, dedicated to handling general-purpose computing. GPUs have their place in the data center because of their ability to accelerate certain workloads, such as deep learning for artificial intelligence, thanks to the parallel computing that they offer. DPUs are now making their way into data centers because of their ability to process and decipher mission-critical data without using host CPU resources, saving time and reducing latency.

Not all servers are configured with DPUs. In fact, DPU acceleration servers are now being introduced into data centers because of greater demands for accelerating network and storage functions carried out by servers. Depending on the workload and application, DPUs can provide a huge benefit for modern data center infrastructure.

The main benefit of servers outfitted with DPUs is that they are capable of offloading data parsing, data processing, and data transfer from the host server’s CPU to the DPU, freeing up the CPU for a variety of enterprise applications.

DPUs are equipped with powerful multi-core processors and acceleration engines that are great for AI, machine learning, security, telecommunication, storage, and many other modern-day data center applications.