Using Shared Memory Allocators

Umpire provides two kinds of Shared Memory capability. The first is Inter-Process Communication (IPC) Shared Memory, which can be used with or without MPI. The second is MPI3 Shared Memory, which requires MPI. Both provide a convenient way to share memory among the processes on a node or socket, but each has its own characteristics and usage details, which this section of the documentation outlines.

IPC Shared Memory

Umpire supports Inter-Process Communication (IPC) Shared Memory on the HOST memory resource. IPC Shared Memory lets multiple processes access a common memory region, which they can use to communicate with each other and synchronize their actions.
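To make the idea of a common memory region concrete, here is a minimal sketch that uses plain POSIX calls rather than Umpire. It assumes a Linux-like system where mmap with MAP_SHARED | MAP_ANONYMOUS and fork are available. The helper name share_value_with_child is hypothetical, and the code says nothing about how Umpire implements its allocators internally.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Umpire): a child process writes `value`
// into a shared page and the parent reads it back after the child exits.
std::uint64_t share_value_with_child(std::uint64_t value)
{
  // MAP_SHARED makes writes visible to every process that maps the page;
  // MAP_ANONYMOUS means the page is not backed by a file.
  void* mem = mmap(nullptr, sizeof(std::uint64_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  assert(mem != MAP_FAILED);

  auto* data = static_cast<std::uint64_t*>(mem);
  *data = 0;

  const pid_t pid = fork();
  assert(pid >= 0);

  if (pid == 0) {
    *data = value; // The child writes into the shared page...
    _exit(0);      // ...and exits without running parent cleanup code.
  }

  // Waiting for the child to exit is the synchronization step here.
  int status = 0;
  waitpid(pid, &status, 0);

  const std::uint64_t seen = *data; // The parent observes the child's write.
  munmap(mem, sizeof(std::uint64_t));
  return seen;
}
```

Calling share_value_with_child(0xDEADBEEF) returns the value written by the child. The Umpire example later on this page follows the same pattern: one rank writes, a synchronization step follows, and then every rank reads.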

To use Umpire’s IPC Shared Memory allocators, set the UMPIRE_ENABLE_IPC_SHARED_MEMORY flag to On. IPC Shared Memory works with MPI either enabled or disabled.

First, to get started with the shared memory allocator, set up the traits. For example:

auto traits{umpire::get_default_resource_traits("SHARED::POSIX")};

The traits object above is a struct that holds the properties of your shared allocator. You can set the maximum size of the allocator with traits.size and its scope with traits.scope.

For example, you can set the scope to socket:

traits.scope = umpire::MemoryResourceTraits::shared_scope::socket;

However, by default the scope will be set to “node”.

Next, create the shared memory allocator:

auto node_allocator{rm.makeResource("SHARED::node_allocator", traits)};

Note

The name of a Shared Memory allocator MUST contain “SHARED”. This lets Umpire identify it as a Shared Memory allocator. The name is also used for discovery by other ranks on the node.

Now you can allocate and deallocate shared memory with:

void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
...
node_allocator.deallocate(ptr);

Note

A name is required in order to allocate memory with IPC Shared Memory allocators. However, if that is not feasible, you can instead use the umpire::strategy::NamingShim strategy. This allows you to call allocate with a single argument, the size in bytes. Check out the cookbook recipe to learn more.
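The idea behind such a shim can be sketched without Umpire. In the sketch below, NamedAllocatorSketch and NamingShimSketch are hypothetical stand-ins. The "prefix plus running counter" naming scheme is an assumption made for illustration, not a description of how umpire::strategy::NamingShim generates names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an allocator that requires a name per allocation.
// It uses malloc and only records the names; it is not a shared memory
// allocator and not part of Umpire.
struct NamedAllocatorSketch {
  std::vector<std::string> names;

  void* allocate(const std::string& name, std::size_t bytes)
  {
    names.push_back(name);
    return std::malloc(bytes);
  }

  void deallocate(void* ptr) { std::free(ptr); }
};

// Hypothetical shim: supplies a generated name so callers pass only a size.
class NamingShimSketch {
 public:
  NamingShimSketch(std::string prefix, NamedAllocatorSketch& wrapped)
      : m_prefix{std::move(prefix)}, m_wrapped{wrapped}
  {
  }

  void* allocate(std::size_t bytes)
  {
    // Assumed naming scheme: prefix plus a running counter.
    const std::string name{m_prefix + "_" + std::to_string(m_counter++)};
    return m_wrapped.allocate(name, bytes);
  }

  void deallocate(void* ptr) { m_wrapped.deallocate(ptr); }

 private:
  std::string m_prefix;
  NamedAllocatorSketch& m_wrapped;
  std::size_t m_counter{0};
};
```

With a shim constructed as NamingShimSketch shim{"shim", base}, two calls to shim.allocate record the names "shim_0" and "shim_1" in base. In a real multi-process setting, the ranks sharing an allocation would presumably need to arrive at the same name for it. The cookbook recipe is the place to check how Umpire handles this.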

See the bottom of this page for a full example of how to use IPC Shared Memory Allocators with Umpire.

MPI3 Shared Memory

In addition to IPC Shared Memory, Umpire also supports MPI3 Shared Memory on the HOST memory resource. As the name suggests, this allocator uses the MPI3 API to provide the shared memory through which processes communicate with each other and synchronize their actions.

To use Umpire’s MPI3 Shared Memory allocators, the UMPIRE_ENABLE_MPI3_SHARED_MEMORY flag should be set to On. Note that if you are using MPI3 Shared Memory, then MPI must be enabled.

To create an allocator with the MPI3 Shared Memory resource, you can do the following:

auto traits{umpire::get_default_resource_traits("SHARED::MPI3")};
auto node_allocator{rm.makeResource("SHARED::mpi3_alloc", traits)};

See the bottom of this page for a full example of how to use MPI3 Shared Memory Allocators with Umpire.

Using Both IPC and MPI3 Shared Memory Allocators

It is possible to enable both IPC and MPI3 Shared Memory Allocators at the same time.

To create these Shared Memory allocators, you can do the following:

auto mpi3_traits{umpire::get_default_resource_traits("SHARED::MPI3")};
// or
auto ipc_traits{umpire::get_default_resource_traits("SHARED::POSIX")};

// then create an allocator, passing the matching traits object:
auto mpi3_node_allocator{rm.makeResource("SHARED::mpi3_alloc", mpi3_traits)};
// or
auto ipc_node_allocator{rm.makeResource("SHARED::ipc_alloc", ipc_traits)};

// and allocate with
mpi3_node_allocator.allocate(1024 * sizeof(double));
// or
ipc_node_allocator.allocate("my_SHARED_alloc", 1024 * sizeof(double));

Note

It is best practice to use the full name, “SHARED::MPI3” or “SHARED::POSIX”, when setting up the traits for a shared memory allocator. However, when both IPC and MPI3 resources are enabled, using “SHARED” will default to the MPI3 memory resource. Additionally, the name used with the makeResource call could also just be “SHARED”, but it must include either the “SHARED” or the “SHARED::” prefix. Finally, while a name is not needed for MPI3 allocate calls, it is required for IPC allocations.
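The naming rules in this note can be summarized as a small model. The functions below are hypothetical and are not Umpire code. Only the "MPI3 wins when both are enabled" rule and the "SHARED" prefix requirement come from the note. Falling back to whichever single kind is enabled is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical model of how a requested resource name maps to a shared
// memory kind. This restates the note; it is not Umpire's implementation.
std::string resolve_shared_resource(const std::string& requested, bool ipc_enabled, bool mpi3_enabled)
{
  // Full names select exactly one kind, provided that kind is enabled.
  if (requested == "SHARED::MPI3" && mpi3_enabled) {
    return requested;
  }
  if (requested == "SHARED::POSIX" && ipc_enabled) {
    return requested;
  }

  if (requested == "SHARED") {
    // MPI3 wins when both kinds are enabled (stated in the note). Using
    // whichever single kind is enabled otherwise is an assumption.
    if (mpi3_enabled) {
      return "SHARED::MPI3";
    }
    if (ipc_enabled) {
      return "SHARED::POSIX";
    }
  }

  throw std::invalid_argument("unavailable shared resource: " + requested);
}

// True if an allocator name intended for makeResource starts with "SHARED",
// which also covers names that start with "SHARED::".
bool has_shared_prefix(const std::string& name)
{
  return name.rfind("SHARED", 0) == 0;
}
```

Under this model, resolve_shared_resource("SHARED", true, true) yields "SHARED::MPI3", which is why spelling out "SHARED::POSIX" is the safer choice when IPC is what you want.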

Full Shared Memory Examples

This section shows two full code examples, one for IPC Shared Memory and one for MPI3 Shared Memory.

The following example shows how to create, use, and clean up an IPC Shared Memory allocator. It can run with or without MPI, as the example shows. Note that this example could easily be adapted to the MPI3 Shared Memory type if needed.

//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2016-25, Lawrence Livermore National Security, LLC and Umpire
// project contributors. See the COPYRIGHT file for details.
//
// SPDX-License-Identifier: (MIT)
//////////////////////////////////////////////////////////////////////////////

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <thread>

#include "mpi.h"
#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/Umpire.hpp"
#include "umpire/config.hpp"
#include "umpire/resource/HostSharedMemoryResource.hpp"
#include "umpire/strategy/NamedAllocationStrategy.hpp"
#include "umpire/util/MemoryResourceTraits.hpp"

//
// For debugging purposes, this program uses the number of command line
// arguments as a flag to indicate the mode it should run in.  The modes are:
//
// 1) If run with no argument (ac==1), it will run as an MPI program.
// 2) If run with 1 argument (ac==2), it will run as the parent non-mpi program.
// 3) If run with 2 arguments (ac==3), it will run as the child non-mpi program.
//
// This will allow someone to launch this program as a Parent and Child in two
// separate debugger session windows (possible in gdb or vscode).  When running
// in the debugger session windows, no MPI will be used and the debugger must
// be used for synchronization (by setting breakpoints).
//
int main(int ac, char** av)
{
  const bool use_mpi{ac == 1};
  const bool i_am_parent{ac == 2};

  if (use_mpi) {
    MPI_Init(&ac, &av);
  }

  auto& rm = umpire::ResourceManager::getInstance();

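  //
  // Set up the traits for the allocator.  Note: when both IPC and MPI3 are
  // enabled, "SHARED" selects the MPI3 resource; use "SHARED::POSIX" to
  // request IPC explicitly.
  //
  auto traits{umpire::get_default_resource_traits("SHARED")};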
  traits.size = 1 * 1024 * 1024; // Maximum size of this Allocator

  //
  // Default scope for allocator is NODE.  SOCKET is another option of interest
  //
  traits.scope = umpire::MemoryResourceTraits::shared_scope::node; // default

  //
  // Create (or attach to) the allocator
  //
  auto node_allocator{rm.makeResource("SHARED::node_allocator", traits)};

  auto named_node_allocator{
      rm.makeAllocator<umpire::strategy::NamedAllocationStrategy>("My Node Allocator", node_allocator)};

  //
  // Resource of this allocator is SHARED
  //
  UMPIRE_ASSERT(node_allocator.getAllocationStrategy()->getTraits().resource ==
                umpire::MemoryResourceTraits::resource_type::shared);

  //
  // Get communicator for this allocator
  //
  MPI_Comm shared_allocator_comm;
  int foreman_rank{0};
  int shared_rank{0};

  if (use_mpi) {
    shared_allocator_comm = umpire::get_communicator_for_allocator(node_allocator, MPI_COMM_WORLD);
    MPI_Comm_rank(shared_allocator_comm, &shared_rank);
  } else { // Running non-mpi in debugger
    shared_rank = i_am_parent ? foreman_rank : foreman_rank + 1;
  }

  //
  // Allocate shared memory
  //
  void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
  void* ptr2{named_node_allocator.allocate("allocation two", 1024)};
  uint64_t* data{static_cast<uint64_t*>(ptr)};

  if (shared_rank == foreman_rank)
    *data = 0xDEADBEEF;

  if (use_mpi) {
    MPI_Barrier(shared_allocator_comm);
  } else {
    if (!i_am_parent) {
      shared_rank++; // Set a breakpoint here to synchronize
    }
  }

  UMPIRE_ASSERT(*data == 0xDEADBEEF);

  node_allocator.deallocate(ptr);
  named_node_allocator.deallocate(ptr2);

  if (use_mpi) {
    umpire::cleanup_cached_communicators(); // Frees the shared_allocator_comm created above
    MPI_Finalize();
  }

  return 0;
}

The following example shows how to create, use, and verify the MPI3 Shared Memory Allocator. Note that although a name is needed when creating the MPI3 Shared Memory allocator, a name is not needed when allocating memory.

#include <mpi.h>

#include <cstdint>
#include <iostream>

#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/Umpire.hpp"
#include "umpire/config.hpp"
#include "umpire/resource/HostMpi3SharedMemoryResource.hpp"
#include "umpire/strategy/NamedAllocationStrategy.hpp"
#include "umpire/util/MemoryResourceTraits.hpp"

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  auto& rm = umpire::ResourceManager::getInstance();

  // Use MPI3 shared memory resource
  // Note: Could also use "SHARED"
  auto traits = umpire::get_default_resource_traits("SHARED::MPI3");
  traits.size = 1 * 1024 * 1024; // 1 MB

  // Node scope is required for mpi3 shared memory
  traits.scope = umpire::MemoryResourceTraits::shared_scope::node;

  // Create allocator using MPI3 shared memory
  auto mpi3_shm_allocator = rm.makeResource("SHARED::mpi3_alloc", traits);

  // Get communicator for the allocator
  MPI_Comm shm_comm = umpire::get_communicator_for_allocator(mpi3_shm_allocator, MPI_COMM_WORLD);

  int rank = 0;
  MPI_Comm_rank(shm_comm, &rank);

  // Allocate shared memory, doesn't need a name for allocation
  uint64_t* data = static_cast<uint64_t*>(mpi3_shm_allocator.allocate(sizeof(uint64_t)));

  if (rank == 0) {
    *data = 0xCAFEBABE;
  }

  MPI_Barrier(shm_comm);

  // All ranks should see the same value
  std::cout << "Rank " << rank << " sees value: " << std::hex << *data << std::endl;

  mpi3_shm_allocator.deallocate(data);

  umpire::cleanup_cached_communicators();
  MPI_Finalize();

  return 0;
}