Using Shared Memory Allocators
Umpire provides two kinds of Shared Memory capability. The first is Inter-Process Communication (IPC) Shared Memory, which can be used with or without MPI. The second is MPI3 Shared Memory, which requires MPI3. Both provide a convenient way to share memory among the processes on a node or socket, but each has its own characteristics and usage details, which are outlined in this section of the documentation.
IPC Shared Memory
Umpire supports Inter-Process Communication (IPC) Shared Memory on the HOST memory resource. IPC Shared Memory lets multiple processes access a common memory space, which they can use to communicate with each other and synchronize their actions.
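To make the idea of a common memory space concrete, the sketch below shares one mapping between a parent and a child process that attaches to it by name. It is not Umpire's implementation: it assumes a POSIX system, uses a file-backed `mmap` under `/tmp` as a simple stand-in for a named shared memory segment, and the helper names (`map_named_segment`, `round_trip_through_shared_memory`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: map `bytes` bytes of the file at `path` as memory
// shared between processes. With create == true the file is created and
// sized first; otherwise an existing file is attached.
// Returns nullptr on failure.
void* map_named_segment(const std::string& path, std::size_t bytes, bool create)
{
  assert(bytes > 0);
  const int flags = create ? (O_CREAT | O_RDWR) : O_RDWR;
  const int fd = open(path.c_str(), flags, 0600);
  if (fd == -1) {
    return nullptr;
  }
  if (create && ftruncate(fd, static_cast<off_t>(bytes)) == -1) {
    close(fd);
    return nullptr;
  }
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd); // The mapping stays valid after the descriptor is closed
  return p == MAP_FAILED ? nullptr : p;
}

// The parent writes `value` into slot 0. A child process attaches to the same
// segment by name and copies slot 0 into slot 1. Returns what the parent then
// reads from slot 1 (0 if anything failed).
uint64_t round_trip_through_shared_memory(uint64_t value)
{
  const std::string path{"/tmp/shared_sketch_" + std::to_string(getpid())};
  const std::size_t bytes{2 * sizeof(uint64_t)};

  auto* parent_view = static_cast<uint64_t*>(map_named_segment(path, bytes, true));
  if (parent_view == nullptr) {
    return 0;
  }
  parent_view[0] = value;
  parent_view[1] = 0;

  const pid_t child = fork();
  if (child == 0) {
    // Child: attach by name instead of relying on the inherited mapping
    auto* child_view = static_cast<uint64_t*>(map_named_segment(path, bytes, false));
    if (child_view == nullptr) {
      _exit(1);
    }
    child_view[1] = child_view[0];
    _exit(0);
  }
  if (child > 0) {
    int status{0};
    waitpid(child, &status, 0); // Child exit orders its write before our read
  }

  const uint64_t seen{parent_view[1]};
  munmap(parent_view, bytes);
  unlink(path.c_str());
  return seen;
}
```

Calling `round_trip_through_shared_memory(0xDEADBEEF)` should return the same value when the child attached successfully. Here the child's exit and `waitpid` order the accesses; processes that run concurrently need explicit synchronization.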
To use Umpire’s IPC Shared Memory allocators, the UMPIRE_ENABLE_IPC_SHARED_MEMORY flag
should be set to On. Note that you can use IPC Shared Memory with MPI enabled or disabled.
First, to get started with the shared memory allocator, set up the traits. For example:
auto traits{umpire::get_default_resource_traits("SHARED")};
The traits object above is a struct of properties for your shared allocator. You can
set the maximum size of the allocator with traits.size and its scope with traits.scope.
For example, you can set the scope to socket:
traits.scope = umpire::MemoryResourceTraits::shared_scope::socket;
If you do not set it, the scope defaults to “node”.
Next, create the shared memory allocator:
auto node_allocator{rm.makeResource("SHARED::node_allocator", traits)};
Note
The name of a Shared Memory allocator MUST contain “SHARED”. This is how Umpire identifies it as a Shared Memory allocator. The name is also used for discovery by other ranks on the node.
Now you can allocate and deallocate shared memory with:
void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
...
node_allocator.deallocate(ptr);
Note
A name is required in order to allocate memory with IPC Shared Memory allocators. However, if that isn’t feasible, you
can instead use the umpire::strategy::NamingShim strategy. This allows you to call allocate with only 1 argument
for the size in bytes. Check out the cookbook recipe to learn more.
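The idea behind a naming shim can be sketched without Umpire: wrap an allocator that requires a name and generate that name automatically. Everything below is hypothetical. `NamedAllocatorStub`, `NamingShimSketch`, and the counter-based naming scheme are assumptions made for illustration, and the sketch does not show how umpire::strategy::NamingShim is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an allocator that requires a name per allocation.
// It uses malloc internally and only records the name for inspection.
class NamedAllocatorStub {
 public:
  void* allocate(const std::string& name, std::size_t bytes)
  {
    void* p = std::malloc(bytes);
    if (p != nullptr) {
      names_[p] = name;
    }
    return p;
  }

  void deallocate(void* p)
  {
    names_.erase(p);
    std::free(p);
  }

  // Returns the name recorded for `p`, or an empty string if unknown
  std::string name_of(void* p) const
  {
    const auto it = names_.find(p);
    return it == names_.end() ? std::string{} : it->second;
  }

 private:
  std::unordered_map<void*, std::string> names_;
};

// Hypothetical shim: exposes allocate(bytes) and forwards to the named
// allocator with a generated name of the form "<prefix>_<counter>".
class NamingShimSketch {
 public:
  NamingShimSketch(NamedAllocatorStub& inner, std::string prefix)
      : inner_(inner), prefix_(std::move(prefix))
  {
  }

  void* allocate(std::size_t bytes)
  {
    return inner_.allocate(prefix_ + "_" + std::to_string(counter_++), bytes);
  }

  void deallocate(void* p)
  {
    inner_.deallocate(p);
  }

 private:
  NamedAllocatorStub& inner_;
  std::string prefix_;
  std::size_t counter_{0};
};
```

In this sketch the generated names are deterministic, so two processes that perform the same sequence of allocations would produce the same names. That property is the reason a counter was chosen here; whether Umpire uses the same scheme is not something this sketch establishes.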
See the bottom of this page for a full example of how to use Shared Memory Allocators with Umpire.
MPI3 Shared Memory
In addition to IPC Shared Memory, Umpire also supports MPI3 Shared Memory on the HOST memory resource. As the name suggests, this allocator uses the MPI3 API to provide the Shared Memory mechanisms that allow processes to communicate with each other and synchronize their actions.
To use Umpire’s MPI3 Shared Memory allocators, the UMPIRE_ENABLE_MPI3_SHARED_MEMORY flag
should be set to On. Note that if you are using MPI3 Shared Memory, then MPI must be enabled.
See the bottom of this page for a full example of how to use Shared Memory Allocators with Umpire.
Full IPC Shared Memory Recipe
The following example shows how to create, use, and destroy the IPC Shared Memory Allocator, with or without MPI. To use MPI3 Shared Memory Allocators instead, update the included header file. No other code changes are necessary, provided the program is run in MPI mode (with no command-line arguments).
//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2016-25, Lawrence Livermore National Security, LLC and Umpire
// project contributors. See the COPYRIGHT file for details.
//
// SPDX-License-Identifier: (MIT)
//////////////////////////////////////////////////////////////////////////////
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <thread>
#include "mpi.h"
#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/Umpire.hpp"
#include "umpire/config.hpp"
#include "umpire/resource/HostSharedMemoryResource.hpp"
#include "umpire/strategy/NamedAllocationStrategy.hpp"
#include "umpire/util/MemoryResourceTraits.hpp"
//
// For debugging purposes, this program uses the number of command-line
// arguments as a flag to indicate the mode it should run in. The modes are:
//
// 1) If run with no argument (ac==1), it will run as an MPI program.
// 2) If run with 1 argument (ac==2), it will run as the parent non-mpi program.
// 3) If run with 2 arguments (ac==3), it will run as the child non-mpi program.
//
// This will allow someone to launch this program as a Parent and Child in two
// separate debugger session windows (possible in gdb or vscode). When running
// in the debugger session windows, no MPI will be used and the debugger must
// be used for synchronization (by setting breakpoints).
//
int main(int ac, char** av)
{
  const bool use_mpi{ac == 1};
  const bool i_am_parent{ac == 2};
  if (use_mpi) {
    MPI_Init(&ac, &av);
  }
  auto& rm = umpire::ResourceManager::getInstance();
  //
  // Set up the traits for the allocator
  //
  auto traits{umpire::get_default_resource_traits("SHARED")};
  traits.size = 1 * 1024 * 1024; // Maximum size of this Allocator
  //
  // Default scope for allocator is NODE. SOCKET is another option of interest
  //
  traits.scope = umpire::MemoryResourceTraits::shared_scope::node; // default
  //
  // Create (or attach to) the allocator
  //
  auto node_allocator{rm.makeResource("SHARED::node_allocator", traits)};
  auto named_node_allocator{
      rm.makeAllocator<umpire::strategy::NamedAllocationStrategy>("My Node Allocator", node_allocator)};
  //
  // Resource of this allocator is SHARED
  //
  UMPIRE_ASSERT(node_allocator.getAllocationStrategy()->getTraits().resource ==
                umpire::MemoryResourceTraits::resource_type::shared);
  //
  // Get communicator for this allocator
  //
  MPI_Comm shared_allocator_comm;
  int foreman_rank{0};
  int shared_rank{0};
  if (use_mpi) {
    shared_allocator_comm = umpire::get_communicator_for_allocator(node_allocator, MPI_COMM_WORLD);
    MPI_Comm_rank(shared_allocator_comm, &shared_rank);
  } else { // Running non-mpi in debugger
    shared_rank = i_am_parent ? foreman_rank : foreman_rank + 1;
  }
  //
  // Allocate shared memory
  //
  void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
  void* ptr2{named_node_allocator.allocate("allocation two", 1024)};
  uint64_t* data{static_cast<uint64_t*>(ptr)};
  if (shared_rank == foreman_rank)
    *data = 0xDEADBEEF;
  if (use_mpi) {
    MPI_Barrier(shared_allocator_comm);
  } else {
    if (!i_am_parent) {
      shared_rank++; // Set a breakpoint here to synchronize
    }
  }
  UMPIRE_ASSERT(*data == 0xDEADBEEF);
  node_allocator.deallocate(ptr);
  named_node_allocator.deallocate(ptr2);
  if (use_mpi) {
    umpire::cleanup_cached_communicators(); // Frees the shared_allocator_comm created above
    MPI_Finalize();
  }
  return 0;
}
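Outside of MPI, the recipe above relies on debugger breakpoints to order the foreman's write before the other process's read. One possible alternative, shown here as a sketch rather than anything Umpire provides, is a ready flag stored in the shared region itself. The example assumes a POSIX system where `std::atomic<uint32_t>` is lock-free, uses an anonymous shared mapping plus `fork` in place of an Umpire allocator, and all names (`SharedBlock`, `foreman_publishes_value`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout of the shared region
struct SharedBlock {
  std::atomic<uint32_t> ready{0}; // Stays 0 until the foreman has written data
  uint64_t data{0};
};

// The foreman (parent) writes `value` and raises the flag. The other process
// (child) waits for the flag and then checks the value. Returns true if the
// child observed the foreman's value.
bool foreman_publishes_value(uint64_t value)
{
  void* raw = mmap(nullptr, sizeof(SharedBlock), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (raw == MAP_FAILED) {
    return false;
  }
  auto* block = new (raw) SharedBlock; // Construct in place in shared memory

  // A lock-based atomic would not be safe to share between processes
  if (!block->ready.is_lock_free()) {
    munmap(raw, sizeof(SharedBlock));
    return false;
  }

  const pid_t child = fork();
  if (child < 0) {
    munmap(raw, sizeof(SharedBlock));
    return false;
  }
  if (child == 0) {
    // Non-foreman: poll until the foreman signals that data is valid
    while (block->ready.load(std::memory_order_acquire) == 0) {
      usleep(100);
    }
    _exit(block->data == value ? 0 : 1);
  }

  // Foreman: write the payload first, then release the flag
  block->data = value;
  block->ready.store(1, std::memory_order_release);

  int status{0};
  waitpid(child, &status, 0);
  munmap(raw, sizeof(SharedBlock));
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The release store paired with the acquire load is what guarantees the child sees `data` once it sees `ready == 1`. A polling loop is the simplest choice for a sketch; a process-shared mutex and condition variable would avoid the busy wait.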