Using IPC Shared Memory

Umpire supports Inter-Process Communication (IPC) Shared Memory on the HOST memory resource. With IPC Shared Memory, multiple processes access a common region of memory, which they can use to communicate with each other and synchronize their actions.
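
To make the idea of a common memory space concrete, here is a minimal sketch that does not use Umpire at all. It assumes a Linux/POSIX system where mmap accepts MAP_SHARED | MAP_ANONYMOUS, and the function name read_value_written_by_child is made up for illustration. It is not meant to show how Umpire implements its shared memory resource, only that a write made by one process becomes visible to another process that shares the mapping.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Illustrative only (not Umpire code): map one shared region, fork, let the
// child write into the region, and return what the parent reads back.
// Errors are handled with assert to keep the sketch short.
std::uint64_t read_value_written_by_child()
{
  void* mem = mmap(nullptr, sizeof(std::uint64_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  assert(mem != MAP_FAILED);

  auto* data = static_cast<std::uint64_t*>(mem);
  *data = 0;

  const pid_t pid = fork();
  assert(pid >= 0);

  if (pid == 0) {
    *data = 0xDEADBEEF; // The child writes into the shared region...
    _exit(0);           // ...and exits without running parent cleanup.
  }

  // Waiting for the child to exit is the synchronization point here.
  int status = 0;
  waitpid(pid, &status, 0);

  const std::uint64_t seen = *data; // The parent sees the child's write.
  munmap(mem, sizeof(std::uint64_t));
  return seen;
}
```

Because the mapping is shared rather than private, the parent reads the value the child stored, even though the two are separate processes.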

To use Umpire’s IPC Shared Memory allocators, build Umpire with the UMPIRE_ENABLE_IPC_SHARED_MEMORY option set to On. IPC Shared Memory can be used with MPI either enabled or disabled.

First, to get started with the shared memory allocator, set up the traits. For example:

auto traits{umpire::get_default_resource_traits("SHARED")};

Here, traits is a struct that holds the properties of your shared allocator. Set the maximum size of the allocator with traits.size, and set its scope with traits.scope.

For example, you can set the scope to socket:

traits.scope = umpire::MemoryResourceTraits::shared_scope::socket;

If you do not set it, the scope defaults to “node”.

Next, create the shared memory allocator:

auto& rm = umpire::ResourceManager::getInstance();
auto node_allocator{rm.makeResource("SHARED::node_allocator", traits)};

Note

The name of a Shared Memory allocator MUST contain “SHARED”. This is how Umpire identifies the allocator as a Shared Memory allocator, and the name is also used for discovery by other ranks on the node.
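
The naming rule can be checked before a name is ever passed to makeResource. The two helpers below are hypothetical and are not part of Umpire. They assume only what the note states, namely that the name has to contain the text “SHARED”, and they follow the “SHARED::” prefix used in this guide.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an Umpire function): returns true if `name`
// contains the text "SHARED" somewhere in it.
bool has_shared_tag(const std::string& name)
{
  return name.find("SHARED") != std::string::npos;
}

// Hypothetical helper: builds an allocator name with the "SHARED::" prefix
// used in this guide, so the result always satisfies has_shared_tag().
std::string make_shared_name(const std::string& suffix)
{
  std::string name{"SHARED::" + suffix};
  assert(has_shared_tag(name));
  return name;
}
```

For instance, make_shared_name("node_allocator") produces "SHARED::node_allocator", while has_shared_tag("node_allocator") returns false.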

Now you can allocate and deallocate shared memory with:

void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
...
node_allocator.deallocate(ptr);

Important Notes About Shared Memory

Because we are dealing with shared memory, the Shared Memory allocators have a few unique characteristics that set them apart from other Umpire allocators.

  1. Once you allocate shared memory, that block of memory has a fixed size. If you need a larger block, you will have to create a new one.

  2. If you want to see how much memory is available for a shared memory allocator, use the getActualSize() function.

  3. File descriptors are used for the shared memory. These files will be under /dev/shm.

  4. Umpire does not need to have MPI enabled in order to provide IPC Shared Memory. However, if you wish to associate shared memory with MPI communicators, Umpire must be built with MPI enabled.
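
The first point, that a block of shared memory cannot grow in place, can be sketched without Umpire. The example below stands in for a shared allocation with an anonymous shared mapping, which is an assumption made purely for illustration (Linux/POSIX, MAP_SHARED | MAP_ANONYMOUS). The helper names make_block and replace_with_larger_block are hypothetical. The pattern is: create a new, larger block, copy the old contents over, then release the old block.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not Umpire code): create a fixed-size shared block.
// Errors are handled with assert to keep the sketch short.
void* make_block(std::size_t size)
{
  void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  assert(mem != MAP_FAILED);
  return mem;
}

// Hypothetical helper: the old block keeps its size, so "growing" means
// creating a new block, copying the data, and releasing the old block.
void* replace_with_larger_block(void* old_block, std::size_t old_size,
                                std::size_t new_size)
{
  assert(new_size >= old_size);
  void* new_block = make_block(new_size);
  std::memcpy(new_block, old_block, old_size);
  munmap(old_block, old_size);
  return new_block;
}
```

Note that any process still holding a pointer into the old block would have to switch to the new one, which is why it pays to choose the size of a shared block carefully up front.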

There are a few helper functions provided in the Umpire.hpp header that will be useful when working with Shared Memory allocators. For example, you can grab the MPI communicator for a particular Shared Memory allocator with:

MPI_Comm shared_allocator_comm = umpire::get_communicator_for_allocator(node_allocator, MPI_COMM_WORLD);

Note that the node_allocator is the Shared Memory allocator we created above. Additionally, we can double check that an allocator has the SHARED memory resource by asserting:

UMPIRE_ASSERT(node_allocator.getAllocationStrategy()->getTraits().resource == umpire::MemoryResourceTraits::resource_type::shared);

You can see a full example here:

//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2016-24, Lawrence Livermore National Security, LLC and Umpire
// project contributors. See the COPYRIGHT file for details.
//
// SPDX-License-Identifier: (MIT)
//////////////////////////////////////////////////////////////////////////////

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <thread>

#include "mpi.h"
#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/Umpire.hpp"
#include "umpire/config.hpp"
#include "umpire/resource/HostSharedMemoryResource.hpp"
#include "umpire/util/MemoryResourceTraits.hpp"

//
// For debugging purposes, this program uses the number of command-line
// arguments as a flag to indicate the mode it should run in.  The modes are:
//
// 1) If run with no argument (ac==1), it will run as an MPI program.
// 2) If run with 1 argument (ac==2), it will run as the parent non-mpi program.
// 3) If run with 2 arguments (ac==3), it will run as the child non-mpi program.
//
// This will allow someone to launch this program as a Parent and Child in two
// separate debugger session windows (possible in gdb or vscode).  When running
// in the debugger session windows, no MPI will be used and the debugger must
// be used for synchronization (by setting breakpoints).
//
int main(int ac, char** av)
{
  const bool use_mpi{ac == 1};
  const bool i_am_parent{ac == 2};

  if (use_mpi) {
    MPI_Init(&ac, &av);
  }

  auto& rm = umpire::ResourceManager::getInstance();

  //
  // Set up the traits for the allocator
  //
  auto traits{umpire::get_default_resource_traits("SHARED")};
  traits.size = 1 * 1024 * 1024; // Maximum size of this Allocator

  //
  // Default scope for allocator is NODE.  SOCKET is another option of interest
  //
  traits.scope = umpire::MemoryResourceTraits::shared_scope::node; // default

  //
  // Create (or attach to) the allocator
  //
  auto node_allocator{rm.makeResource("SHARED::node_allocator", traits)};

  //
  // Resource of this allocator is SHARED
  //
  UMPIRE_ASSERT(node_allocator.getAllocationStrategy()->getTraits().resource ==
                umpire::MemoryResourceTraits::resource_type::shared);

  //
  // Get communicator for this allocator
  //
  MPI_Comm shared_allocator_comm = MPI_COMM_NULL; // Only set when running with MPI
  int foreman_rank{0};
  int shared_rank{0};

  if (use_mpi) {
    shared_allocator_comm = umpire::get_communicator_for_allocator(node_allocator, MPI_COMM_WORLD);
    MPI_Comm_rank(shared_allocator_comm, &shared_rank);
  } else { // Running non-mpi in debugger
    shared_rank = i_am_parent ? foreman_rank : foreman_rank + 1;
  }

  //
  // Allocate shared memory
  //
  void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
  uint64_t* data{static_cast<uint64_t*>(ptr)};

  if (shared_rank == foreman_rank)
    *data = 0xDEADBEEF;

  if (use_mpi) {
    MPI_Barrier(shared_allocator_comm);
  } else {
    if (!i_am_parent) {
      shared_rank++; // Set a breakpoint here to synchronize
    }
  }

  UMPIRE_ASSERT(*data == 0xDEADBEEF);

  node_allocator.deallocate(ptr);

  if (use_mpi) {
    MPI_Finalize();
  }

  return 0;
}