.. _device_ipc: =========================== Sharing GPU Memory with IPC =========================== Umpire provides a ``DeviceIpcAllocator`` strategy that allows GPU memory to be shared between processes on the same node using CUDA or HIP IPC handles. This strategy uses the shared memory infrastructure in Umpire to coordinate the sharing of GPU memory between processes. How It Works ------------ The ``DeviceIpcAllocator`` works by: 1. Having one process (the "leader", determined by the "scope"-local MPI rank 0) physically allocate GPU memory 2. The leader process gets an IPC handle for that memory and stores it in CPU shared memory 3. Other processes retrieve the IPC handle from CPU shared memory and import the GPU memory 4. MPI barriers are used to synchronize between processes and ensure safe access This allows multiple processes to share GPU memory efficiently, which is useful for multi-process applications that need to operate on the same data. Using DeviceIpcAllocator ------------------------ A ``DeviceIpcAllocator`` can be created using the Umpire ResourceMangager. It doesn't require any additional arguments apart from a name. By default, device memory will be allocated using the "DEVICE" resource. Optionally, you can provide a device allocator, as well as a "scope" argument that determines who will share each GPU allocation. Here's an example: .. literalinclude:: ../../../examples/cookbook/recipe_device_ipc.cpp :language: cpp The scope argument can either be "socket" or "node". The "socket" scope means that all processes using the same socket (as identified by the PCI address of the GPU) will share the GPU memory, while the "node" scope means that all processes on the same node (as determined by MPI_COMM_TYPE_SHARED) will share the GPU memory. Limitations ----------- - Requires UMPIRE_ENABLE_MPI to be enabled - Requires UMPIRE_ENABLE_IPC_SHARED_MEMORY to be enabled