Device Allocators¶
The DeviceAllocator is designed for memory allocations on the GPU. Currently there is only support for CUDA, although HIP support is coming soon.
Creating a Device Allocator¶
To create a DeviceAllocator, users can call the umpire::make_device_allocator
host function.
This function takes an allocator, the total amount of memory the DeviceAllocator will have, and a name
for the new DeviceAllocator object, as shown below. Currently, a maximum of 64 unique DeviceAllocators can be
created at a time.
auto& rm = umpire::ResourceManager::getInstance();
auto allocator = rm.getAllocator("UM");
auto device_allocator = umpire::make_device_allocator(allocator, sizeof(double), "my_device_alloc");
When the DeviceAllocator is created, the size
parameter that is passed to the umpire::make_device_allocator
function is the total memory, in bytes, available to that allocator. Whenever the allocate
function is
called on the GPU, it is simply atomically incrementing a counter which offsets a pointer to the start of that
memory.
Retrieving a DeviceAllocator Object¶
The umpire::get_device_allocator
host/device function returns the DeviceAllocator object that corresponds
to the ID or name given. The DeviceAllocator class also includes a helper function, umpire::is_device_allocator
,
to query whether or not a given ID corresponds to an existing DeviceAllocator. Below is an example of using the ID to
obtain the DeviceAllocator object:
umpire::DeviceAllocator alloc = umpire::get_device_allocator(0);
And next is an example of using the name instead:
umpire::DeviceAllocator alloc = umpire::get_device_allocator("my_device_alloc");
With the umpire::get_device_allocator
function, there is no need to keep track of a DeviceAllocator, since function
call stacks can become quite complex. Users can instead use this function to obtain it inside whichever host or device
function they need.
Under the hood, the umpire::get_device_allocator
uses global arrays which can be accessed by both
the host and device. The global array is indexed by DeviceAllocator ID, which is returned by DeviceAllocator::getID()
.
Because we are using global arrays on host and device, the arrays need to be “set up” after at least one DeviceAllocator
has been created, but before any kernels which use a DeviceAllocator are called. This process is done by calling
the UMPIRE_SET_UP_DEVICE_ALLOCS()
macro. This just ensures that the host and device global arrays are updated and
pointing at the same memory.
Note
In order to use the full capabilities of the DeviceAllocator, relocatable device code must be enabled.
Resetting Memory on the DeviceAllocator¶
The memory that has been used with the DeviceAllocator is only freed at the end of a program when the
ResourceManager is torn down. However, there is a way to overwrite old or outdated data. Users can call
the DeviceAllocator::reset()
method which will allows old data to be overwritten. Below is an example:
The above code snippet shows the reset()
function being called from the host. Calling the function from the host
utilizes the ResourceManager and Umpire’s memset
operation under the hood. Therefore, there is some kind of
synchronization guaranteed. However, if the reset()
function is called on the device, there is no synchronization
guaranteed, so the user must be very careful not to reset memory that other GPU threads still need.
//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2016-21, Lawrence Livermore National Security, LLC and Umpire
// project contributors. See the COPYRIGHT file for details.
//
// SPDX-License-Identifier: (MIT)
//////////////////////////////////////////////////////////////////////////////
#include "camp/camp.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/device_allocator_helper.hpp"
#if defined(UMPIRE_ENABLE_CUDA)
using resource_type = camp::resources::Cuda;
#elif defined(UMPIRE_ENABLE_HIP)
using resource_type = camp::resources::Hip;
#endif
/*
* Very simple kernels that use only the first thread to "get" the
* existing DeviceAllocator and to allocate a double. (One kernel retrieves
* the DeviceAllocator object by ID, the other by name.)
* Making sure that the data_ptr is pointing to the device allocated double,
* it sets the value of that double which will be checked later.
*/
__global__ void my_kernel(double** data_ptr)
{
if (threadIdx.x == 0) {
// _sphinx_tag_get_dev_allocator_id_start
umpire::DeviceAllocator alloc = umpire::get_device_allocator(0);
// _sphinx_tag_get_dev_allocator_id_end
double* data = static_cast<double*>(alloc.allocate(1 * sizeof(double)));
*data_ptr = data;
data[0] = 1024;
}
}
__global__ void my_other_kernel(double** data_ptr)
{
if (threadIdx.x == 0) {
// _sphinx_tag_get_dev_allocator_name_start
umpire::DeviceAllocator alloc = umpire::get_device_allocator("my_device_alloc");
// _sphinx_tag_get_dev_allocator_name_end
double* data = static_cast<double*>(alloc.allocate(1 * sizeof(double)));
*data_ptr = data;
data[0] = 42;
}
}
int main(int argc, char const* argv[])
{
auto resource = camp::resources::Resource{resource_type{}};
// Create my allocators.
// _sphinx_tag_make_dev_allocator_start
auto& rm = umpire::ResourceManager::getInstance();
auto allocator = rm.getAllocator("UM");
auto device_allocator = umpire::make_device_allocator(allocator, sizeof(double), "my_device_alloc");
// _sphinx_tag_make_dev_allocator_end
// Checking that a DeviceAllocator exists...
if (umpire::is_device_allocator(0)) {
std::cout << "I found a DeviceAllocator! " << std::endl;
}
double** ptr_to_data = static_cast<double**>(allocator.allocate(sizeof(double*)));
// Make sure that device and host side DeviceAllocator pointers are synched
// _sphinx_tag_macro_start
UMPIRE_SET_UP_DEVICE_ALLOCATORS();
// _sphinx_tag_macro_end
my_kernel<<<1, 16>>>(ptr_to_data);
resource.get_event().wait();
std::cout << "After first kernel, found value: " << (*ptr_to_data)[0] << std::endl;
// DeviceAllocator only has enough memory for one double. We need to reset it!
device_allocator.reset();
my_other_kernel<<<1, 16>>>(ptr_to_data);
resource.get_event().wait();
std::cout << "After second kernel, found value: " << (*ptr_to_data)[0] << std::endl;
return 0;
}