Device Allocators

The DeviceAllocator is designed for memory allocations on the GPU.

Creating a Device Allocator

To create a DeviceAllocator, users can call the umpire::make_device_allocator host function. This function takes an allocator, the total amount of memory the DeviceAllocator will have, and a unique name for the new DeviceAllocator object, as shown below. A maximum of 64 unique DeviceAllocators can be created at a time.

  auto& rm = umpire::ResourceManager::getInstance();
  auto allocator = rm.getAllocator("UM");
  auto device_allocator = umpire::make_device_allocator(allocator, sizeof(double), "my_device_alloc");

When the DeviceAllocator is created, the size parameter that is passed to the umpire::make_device_allocator function is the total memory, in bytes, available to that allocator. Whenever the allocate function is called on the GPU, it is simply atomically incrementing a counter which offsets a pointer to the start of that memory. In other words, the total size from all of the allocates performed on the device with the DeviceAllocator may not exceed the size that was used when creating the device allocator.

To see what the total memory, in bytes, available to the allocator is, simply call the DeviceAllocator::getTotalSize() function.

Retrieving a DeviceAllocator Object

After creating a DeviceAllocator, we can immediately start using that allocator to allocate device memory. To do this, we have the umpire::get_device_allocator host/device function which returns the DeviceAllocator object corresponding to the name (or ID) given. The DeviceAllocator class also includes a helper function, umpire::is_device_allocator, to query whether or not a given name (or ID) corresponds to an existing DeviceAllocator. Below is an example of using the name to obtain the DeviceAllocator object:

    umpire::DeviceAllocator alloc = umpire::get_device_allocator("my_device_alloc");

With the umpire::get_device_allocator function, there is no need to keep track of a DeviceAllocator, since function call stacks can become quite complex. Users can instead use this function to obtain it inside whichever host or device function they need.

Note

When compiling without relocatable device code (RDC), the UMPIRE_SET_UP_DEVICE_ALLOCATORS() macro must be called in every translation unit that will use the umpire::get_device_allocator function.

Resetting Memory on the DeviceAllocator

The memory that has been used with the DeviceAllocator is only freed at the end of a program when the ResourceManager is torn down. However, there is a way to overwrite old or outdated data. Users can call the DeviceAllocator::reset() method which will allows old data to be overwritten. Below is an example:

The above code snippet shows the reset() function being called from the host. Calling the function from the host utilizes the ResourceManager and Umpire’s memset operation under the hood. Therefore, there is some kind of synchronization guaranteed. However, if the reset() function is called on the device, there is no synchronization guaranteed, so the user must be very careful not to reset memory that other GPU threads still need.

To see the current size of the DeviceAllocator (aka, the current amount of memory, in bytes, being used), call the DeviceAllocator::getCurrentSize() function.

//////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2016-24, Lawrence Livermore National Security, LLC and Umpire
// project contributors. See the COPYRIGHT file for details.
//
// SPDX-License-Identifier: (MIT)
//////////////////////////////////////////////////////////////////////////////
#include "camp/camp.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/device_allocator_helper.hpp"

#if defined(UMPIRE_ENABLE_CUDA)
using resource_type = camp::resources::Cuda;
#elif defined(UMPIRE_ENABLE_HIP)
using resource_type = camp::resources::Hip;
#endif

/*
 * Very simple kernel that uses only the first thread to "get" the
 * existing DeviceAllocator and allocate a double.
 * Making sure that the data_ptr is pointing to the device allocated double,
 * it sets the value of that double which will be checked later.
 */
__global__ void my_kernel(double** data_ptr)
{
  if (threadIdx.x == 0) {
    // _sphinx_tag_get_dev_allocator_name_start
    umpire::DeviceAllocator alloc = umpire::get_device_allocator("my_device_alloc");
    // _sphinx_tag_get_dev_allocator_name_end
    double* data = static_cast<double*>(alloc.allocate(1 * sizeof(double)));
    *data_ptr = data;
    data[0] = alloc.getCurrentSize();
  }
}

int main()
{
  auto resource = camp::resources::Resource{resource_type{}};

  // Create my allocators.
  // _sphinx_tag_make_dev_allocator_start
  auto& rm = umpire::ResourceManager::getInstance();
  auto allocator = rm.getAllocator("UM");
  auto device_allocator = umpire::make_device_allocator(allocator, sizeof(double), "my_device_alloc");
  // _sphinx_tag_make_dev_allocator_end

  // Checking that the DeviceAllocator just created can be found...
  if (umpire::is_device_allocator("my_device_alloc")) {
    std::cout << "I found a DeviceAllocator!" << std::endl;
    std::cout << "The total size is: " << device_allocator.getTotalSize() << std::endl;
  }

  double** ptr_to_data = static_cast<double**>(allocator.allocate(sizeof(double*)));

  // See ReadTheDocs DeviceAllocator documentation for more info about macro usage!
  UMPIRE_SET_UP_DEVICE_ALLOCATORS();

#if defined(UMPIRE_ENABLE_CUDA)
  my_kernel<<<1, 16>>>(ptr_to_data);
#elif defined(UMPIRE_ENABLE_HIP)
  hipLaunchKernelGGL(my_kernel, dim3(1), dim3(16), 0, 0, ptr_to_data);
#endif

  resource.get_event().wait();
  std::cout << "After calling kernel, found value: " << (*ptr_to_data)[0] << std::endl;

  // DeviceAllocator only has enough memory for one double. We need to reset it!
  device_allocator.reset();
  std::cout << "After calling reset, the current size is: " << device_allocator.getCurrentSize() << std::endl;

#if defined(UMPIRE_ENABLE_CUDA)
  my_kernel<<<1, 16>>>(ptr_to_data);
#elif defined(UMPIRE_ENABLE_HIP)
  hipLaunchKernelGGL(my_kernel, dim3(1), dim3(16), 0, 0, ptr_to_data);
#endif

  resource.get_event().wait();
  std::cout << "After calling kernel again, found value: " << (*ptr_to_data)[0] << std::endl;

  allocator.deallocate(ptr_to_data);

  return 0;
}