Device Allocators

The DeviceAllocator is designed for memory allocations on the GPU. Currently there is only support for CUDA, although HIP support is coming soon.

Creating a Device Allocator

To create a DeviceAllocator, users can call the umpire::make_device_allocator host function. This function takes an allocator, the total amount of memory the DeviceAllocator will have, and a name for the new DeviceAllocator object, as shown below. Currently, a maximum of 64 unique DeviceAllocators can be created at a time.

  auto& rm = umpire::ResourceManager::getInstance();
  auto allocator = rm.getAllocator("UM");
  auto device_allocator = umpire::make_device_allocator(allocator, sizeof(double), "my_device_alloc");

When the DeviceAllocator is created, the size parameter that is passed to the umpire::make_device_allocator function is the total memory, in bytes, available to that allocator. Whenever the allocate function is called on the GPU, it is simply atomically incrementing a counter which offsets a pointer to the start of that memory.

Retrieving a DeviceAllocator Object

The umpire::get_device_allocator host/device function returns the DeviceAllocator object that corresponds to the ID or name given. The DeviceAllocator class also includes a helper function, umpire::is_device_allocator, to query whether or not a given ID corresponds to an existing DeviceAllocator. Below is an example of using the ID to obtain the DeviceAllocator object:

    umpire::DeviceAllocator alloc = umpire::get_device_allocator(0);

And next is an example of using the name instead:

    umpire::DeviceAllocator alloc = umpire::get_device_allocator("my_device_alloc");

With the umpire::get_device_allocator function, there is no need to keep track of a DeviceAllocator, since function call stacks can become quite complex. Users can instead use this function to obtain it inside whichever host or device function they need.

Under the hood, the umpire::get_device_allocator uses global arrays which can be accessed by both the host and device. The global array is indexed by DeviceAllocator ID, which is returned by DeviceAllocator::getID(). Because we are using global arrays on host and device, the arrays need to be “set up” after at least one DeviceAllocator has been created, but before any kernels which use a DeviceAllocator are called. This process is done by calling the UMPIRE_SET_UP_DEVICE_ALLOCS() macro. This just ensures that the host and device global arrays are updated and pointing at the same memory.

Note

In order to use the full capabilities of the DeviceAllocator, relocatable device code must be enabled.

Resetting Memory on the DeviceAllocator

The memory that has been used with the DeviceAllocator is only freed at the end of a program when the ResourceManager is torn down. However, there is a way to overwrite old or outdated data. Users can call the DeviceAllocator::reset() method which will allows old data to be overwritten. Below is an example:

The above code snippet shows the reset() function being called from the host. Calling the function from the host utilizes the ResourceManager and Umpire’s memset operation under the hood. Therefore, there is some kind of synchronization guaranteed. However, if the reset() function is called on the device, there is no synchronization guaranteed, so the user must be very careful not to reset memory that other GPU threads still need.

//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2016-21, Lawrence Livermore National Security, LLC and Umpire
// project contributors. See the COPYRIGHT file for details.
//
// SPDX-License-Identifier: (MIT)
//////////////////////////////////////////////////////////////////////////////
#include "camp/camp.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/device_allocator_helper.hpp"

#if defined(UMPIRE_ENABLE_CUDA)
using resource_type = camp::resources::Cuda;
#elif defined(UMPIRE_ENABLE_HIP)
using resource_type = camp::resources::Hip;
#endif

/*
 * Very simple kernels that use only the first thread to "get" the
 * existing DeviceAllocator and to allocate a double. (One kernel retrieves
 * the DeviceAllocator object by ID, the other by name.)
 * Making sure that the data_ptr is pointing to the device allocated double,
 * it sets the value of that double which will be checked later.
 */
__global__ void my_kernel(double** data_ptr)
{
  if (threadIdx.x == 0) {
    // _sphinx_tag_get_dev_allocator_id_start
    umpire::DeviceAllocator alloc = umpire::get_device_allocator(0);
    // _sphinx_tag_get_dev_allocator_id_end
    double* data = static_cast<double*>(alloc.allocate(1 * sizeof(double)));
    *data_ptr = data;
    data[0] = 1024;
  }
}

__global__ void my_other_kernel(double** data_ptr)
{
  if (threadIdx.x == 0) {
    // _sphinx_tag_get_dev_allocator_name_start
    umpire::DeviceAllocator alloc = umpire::get_device_allocator("my_device_alloc");
    // _sphinx_tag_get_dev_allocator_name_end
    double* data = static_cast<double*>(alloc.allocate(1 * sizeof(double)));
    *data_ptr = data;
    data[0] = 42;
  }
}

int main(int argc, char const* argv[])
{
  auto resource = camp::resources::Resource{resource_type{}};

  // Create my allocators.
  // _sphinx_tag_make_dev_allocator_start
  auto& rm = umpire::ResourceManager::getInstance();
  auto allocator = rm.getAllocator("UM");
  auto device_allocator = umpire::make_device_allocator(allocator, sizeof(double), "my_device_alloc");
  // _sphinx_tag_make_dev_allocator_end

  // Checking that a DeviceAllocator exists...
  if (umpire::is_device_allocator(0)) {
    std::cout << "I found a DeviceAllocator! " << std::endl;
  }

  double** ptr_to_data = static_cast<double**>(allocator.allocate(sizeof(double*)));

  // Make sure that device and host side DeviceAllocator pointers are synched
  // _sphinx_tag_macro_start
  UMPIRE_SET_UP_DEVICE_ALLOCATORS();
  // _sphinx_tag_macro_end

  my_kernel<<<1, 16>>>(ptr_to_data);
  resource.get_event().wait();
  std::cout << "After first kernel, found value: " << (*ptr_to_data)[0] << std::endl;

  // DeviceAllocator only has enough memory for one double. We need to reset it!
  device_allocator.reset();

  my_other_kernel<<<1, 16>>>(ptr_to_data);
  resource.get_event().wait();
  std::cout << "After second kernel, found value: " << (*ptr_to_data)[0] << std::endl;

  return 0;
}