Push the limits of your PCIe FPGA solutions with GPUDirect® RDMA
We bridge the FPGA and GPU worlds, enabling real-time execution of compute-intensive tasks such as AI, image and signal processing, and complex simulations with low latency and high throughput. This new capability, RDMA with NVIDIA GPUDirect®, is now integrated into our Development Kit (DK), delivering out-of-the-box acceleration to all customers.
What is GPUDirect® RDMA?
To address the limitations of traditional data transfers, NVIDIA developed GPUDirect® RDMA, a technology that accelerates and simplifies data exchanges.
With RDMA (Remote Direct Memory Access), data flows directly between an FPGA card and GPU memory via PCI Express, bypassing the CPU and system memory. The result is dramatically reduced latency and unmatched data transfer efficiency.
This approach enables real-time processing: advanced applications such as AI, image processing, and signal processing can run directly on incoming data.
Unlock the full potential of your data: gain speed, fluidity, and power with GPUDirect® RDMA.
In the standard DMA model, the FPGA acquires data in real time via its input interfaces (1) and temporarily stores it in its onboard DDR memory. This data is then transferred to the host system memory (2) using DMA. The CPU is then notified and copies the data into a GPU-accessible buffer (3), so that the GPU can start processing. Once the computation is complete, the results are sent back to the host memory (4), where they are retrieved by the CPU (5).
This approach results in additional memory copies and CPU intervention, which increases latency, unnecessarily consumes processor resources, and reduces overall system efficiency.
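The five steps above can be written down as a small table and counted. The sketch below is only an illustrative model of the description in the text: the `Step` type, its field names, and the per-step classification are assumptions made for the example, not part of any TECHWAY or NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str              # step number used in the text
    action: str             # what happens at this step
    uses_cpu: bool          # the CPU has to act for this step to happen
    via_host_memory: bool   # data or results sit in host system memory


# Standard DMA path, transcribed from steps (1) to (5) in the text.
# The True/False flags are this example's reading of the description.
STANDARD_DMA = [
    Step("(1)", "FPGA acquires data into onboard DDR", False, False),
    Step("(2)", "FPGA DMA writes data to host memory", False, True),
    Step("(3)", "CPU copies data to a GPU-accessible buffer", True, True),
    Step("(4)", "GPU results are written back to host memory", False, True),
    Step("(5)", "CPU retrieves the results", True, True),
]


def summarize(steps):
    """Count how many steps need the CPU or pass through host memory."""
    return {
        "steps": len(steps),
        "cpu_steps": sum(s.uses_cpu for s in steps),
        "host_memory_steps": sum(s.via_host_memory for s in steps),
    }


summary = summarize(STANDARD_DMA)
cpu_labels = [s.label for s in STANDARD_DMA if s.uses_cpu]
print(summary, cpu_labels)
```

In this reading, the incoming data needs two steps, (2) and (3), and one CPU intervention before the GPU can begin work, which is the overhead the text refers to.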
[Figure: data-path diagrams for Standard DMA, steps (1) to (5), and GPUDirect® RDMA, steps (A) to (C)]
Conversely, GPUDirect® RDMA allows the FPGA to transfer data directly from its onboard memory to the GPU memory via PCIe (A), bypassing the host CPU and system memory. This direct path eliminates redundant copies, reduces CPU involvement, and lets the GPU process incoming data with significantly lower latency. The computed results are then sent back to host memory (B) and made available to the CPU (C).
This streamlined architecture reduces system overhead, lowers latency, and delivers higher throughput, making it particularly well-suited for real-time and high-performance applications.
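To make the latency argument concrete, here is a back-of-envelope sketch of the time that elapses before the GPU can start working on one block of data on each path. Everything numeric in it is an assumption chosen for illustration (block size, sustained PCIe bandwidth, CPU notification overhead), not a TECHWAY or NVIDIA measurement, and the function names are hypothetical. The model also assumes each transfer completes before the next one starts, with no overlap or pipelining.

```python
def staged_input_time_s(size_bytes, pcie_bytes_per_s, cpu_overhead_s):
    """Standard DMA: FPGA -> host memory, CPU handling, then host -> GPU."""
    fpga_to_host = size_bytes / pcie_bytes_per_s
    host_to_gpu = size_bytes / pcie_bytes_per_s
    return fpga_to_host + cpu_overhead_s + host_to_gpu


def direct_input_time_s(size_bytes, pcie_bytes_per_s):
    """GPUDirect RDMA: one FPGA -> GPU memory transfer over PCIe."""
    return size_bytes / pcie_bytes_per_s


# Assumed figures, for illustration only.
BLOCK_BYTES = 8 * 1024 * 1024   # one 8 MiB acquisition block
PCIE_BYTES_PER_S = 8e9          # assumed sustained PCIe bandwidth
CPU_OVERHEAD_S = 20e-6          # assumed interrupt and copy setup cost

staged = staged_input_time_s(BLOCK_BYTES, PCIE_BYTES_PER_S, CPU_OVERHEAD_S)
direct = direct_input_time_s(BLOCK_BYTES, PCIE_BYTES_PER_S)

print(f"staged path: {staged * 1e3:.3f} ms")
print(f"direct path: {direct * 1e3:.3f} ms")
print(f"ratio: {staged / direct:.2f}x")
```

Under these assumptions the staged path takes a little more than twice as long as the direct path, because the same payload crosses the bus twice and waits on the CPU in between. Real figures depend on the PCIe generation and lane count, the system topology, and how much the transfers overlap.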
Boost your performance by unleashing the full power of the GPU
GPUDirect® RDMA offloads computational tasks from the FPGA to the GPU, allowing the FPGA to focus on data acquisition and preprocessing. With their parallel architecture and high-bandwidth memory, GPUs provide fast and efficient processing of large data volumes while reducing the load on FPGA logic resources. For developers, this approach facilitates migration from firmware to software, leveraging CUDA® tools and the NVIDIA ecosystem. Combining a TECHWAY FPGA and an NVIDIA GPU results in a powerful, flexible, and cost-effective system optimized for real-time data and image processing.
TECHWAY PCIe Development Kit – Integrated GPUDirect® RDMA
TECHWAY integrates GPUDirect® RDMA as a native feature in its PCIe development kit, offering ultra-fast, ultra-low-latency data transfers between FPGAs and GPUs. All TECHWAY products based on AMD/Xilinx Kintex-7 and UltraScale+ FPGAs benefit from this capability, ensuring immediate compatibility and scalability across the range.
With a unified PCIe architecture and common development kit, developers can easily migrate from one FPGA platform to another, accelerating integration and drastically reducing time-to-market. The TECHWAY driver, optimized for multi-GPU environments, enables powerful and scalable designs where multiple NVIDIA GPUs collaborate without throughput limitations.
Fully integrated with NVIDIA kernel modules and CUDA® tools, the development kit ensures continuous updates, long-term maintenance, and direct access to GPU resources via RDMA. With dedicated PCIe APIs to efficiently manage transfers, TECHWAY offers developers a turnkey solution that combines performance, simplicity, and sustainability.