CUB
cub::DeviceHistogram Struct Reference

Detailed description

DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of sample data residing within global memory.

Overview
A histogram counts the number of observations that fall into each of the disjoint categories (known as bins).
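To make the definition concrete, here is a minimal host-side sketch of the counting that a histogram performs. It is a plain C++ reference, not part of CUB; the name ReferenceHistogram is hypothetical, and the sketch assumes each sample, cast to an integer, already lies in [0, BINS-1].

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical host-side reference (not a CUB function): counts how many
// samples fall into each of BINS bins.
template <int BINS, typename Sample>
std::array<unsigned int, BINS> ReferenceHistogram(const std::vector<Sample>& samples)
{
    std::array<unsigned int, BINS> histogram{};  // all counters start at zero
    for (const Sample& s : samples) {
        const int bin = static_cast<int>(s);
        assert(bin >= 0 && bin < BINS);  // samples must already be valid bin ids
        ++histogram[bin];
    }
    return histogram;
}
```

A reference like this can be handy for checking device results on small inputs before moving to large ones.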
Usage Considerations
  • Dynamic parallelism. DeviceHistogram methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported. When calling these methods from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Performance
[Figure: histo_perf.png (performance plot, image not reproduced here)]

Definition at line 66 of file device_histogram.cuh.

Static Public Methods

Single-channel samples
template<int BINS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SingleChannelSorting (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_samples, HistoCounter *d_histogram, int num_samples, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a device-wide histogram using fast block-wide sorting.
 
template<int BINS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SingleChannelSharedAtomic (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_samples, HistoCounter *d_histogram, int num_samples, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a device-wide histogram using shared-memory atomic read-modify-write operations.
 
template<int BINS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SingleChannelGlobalAtomic (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_samples, HistoCounter *d_histogram, int num_samples, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a device-wide histogram using global-memory atomic read-modify-write operations.
 
Interleaved multi-channel samples
template<int BINS, int CHANNELS, int ACTIVE_CHANNELS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION
cudaError_t 
MultiChannelSorting (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_samples, HistoCounter *d_histograms[ACTIVE_CHANNELS], int num_samples, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a device-wide histogram from multi-channel data using fast block-sorting.
 
template<int BINS, int CHANNELS, int ACTIVE_CHANNELS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION
cudaError_t 
MultiChannelSharedAtomic (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_samples, HistoCounter *d_histograms[ACTIVE_CHANNELS], int num_samples, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a device-wide histogram from multi-channel data using shared-memory atomic read-modify-write operations.
 
template<int BINS, int CHANNELS, int ACTIVE_CHANNELS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION
cudaError_t 
MultiChannelGlobalAtomic (void *d_temp_storage, size_t &temp_storage_bytes, InputIterator d_samples, HistoCounter *d_histograms[ACTIVE_CHANNELS], int num_samples, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a device-wide histogram from multi-channel data using global-memory atomic read-modify-write operations.
 

Member Function Documentation

template<int BINS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceHistogram::SingleChannelSorting ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIterator  d_samples,
HistoCounter *  d_histogram,
int  num_samples,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inline static

Computes a device-wide histogram using fast block-wide sorting.

  • Delivers consistent throughput regardless of sample diversity.
  • Histograms having a large number of bins (e.g., thousands) may adversely affect shared memory occupancy and performance (or even the ability to launch).
  • Performance is often improved when referencing input samples through a texture-caching iterator (e.g., cub::TexObjInputIterator).
  • This operation requires an allocation of temporary device storage. When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
  • When calling this method from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Snippet
The code snippet below illustrates the computation of an 8-bin histogram of single-channel unsigned char samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh>
// Declare, allocate, and initialize device pointers for input and histogram
int num_samples; // e.g., 12
unsigned char *d_samples; // e.g., [2, 6, 7, 5, 3, 0, 2, 1, 7, 0, 6, 2]
unsigned int *d_histogram; // e.g., [ , , , , , , , ]
...
// Wrap d_samples device pointer in a random-access texture iterator
cub::TexObjInputIterator<unsigned char> d_samples_tex_itr;
d_samples_tex_itr.BindTexture(d_samples, num_samples * sizeof(unsigned char));
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceHistogram::SingleChannelSorting<8>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histogram, num_samples);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Compute histogram
cub::DeviceHistogram::SingleChannelSorting<8>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histogram, num_samples);
// Unbind texture iterator
d_samples_tex_itr.UnbindTexture();
// d_histogram <-- [2, 1, 3, 1, 0, 1, 2, 2]
Template Parameters
BINS            Number of histogram bins per channel.
InputIterator   [inferred] Random-access input iterator type for reading input samples. Values of InputIterator::value_type, when cast to an integer, must fall in the range [0..BINS-1]. May be a simple pointer type.
HistoCounter    [inferred] Integer type for counting sample occurrences per histogram bin.
Parameters
[in]      d_temp_storage      Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]  temp_storage_bytes  Reference to size in bytes of d_temp_storage allocation.
[in]      d_samples           Input samples.
[out]     d_histogram         Array of BINS counters of integral type HistoCounter.
[in]      num_samples         Number of samples to process.
[in]      stream              [optional] CUDA stream to launch kernels within. Default is stream 0.
[in]      debug_synchronous   [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 129 of file device_histogram.cuh.
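The idea behind a sorting-based histogram can be sketched on the host as follows. This is only an illustration of the general technique (sort the samples, then measure the length of each run of equal values), not CUB's actual block-wide kernel; the name SortedRunHistogram is hypothetical. Because each bin is written once from a run length, the work does not depend on how often samples collide in the same bin, which is one way to read the "consistent throughput" note above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical host-side sketch of a sort-based histogram (not CUB code).
// The samples are taken by value so the caller's data is left unsorted.
template <int BINS>
std::array<unsigned int, BINS> SortedRunHistogram(std::vector<unsigned char> samples)
{
    std::array<unsigned int, BINS> histogram{};
    std::sort(samples.begin(), samples.end());

    auto run_begin = samples.begin();
    while (run_begin != samples.end()) {
        assert(*run_begin < BINS);  // samples must already be valid bin ids
        // End of the run of values equal to *run_begin.
        auto run_end = std::upper_bound(run_begin, samples.end(), *run_begin);
        // One write per non-empty bin: the run length is the bin count.
        histogram[*run_begin] = static_cast<unsigned int>(run_end - run_begin);
        run_begin = run_end;
    }
    return histogram;
}
```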

template<int BINS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceHistogram::SingleChannelSharedAtomic ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIterator  d_samples,
HistoCounter *  d_histogram,
int  num_samples,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inline static

Computes a device-wide histogram using shared-memory atomic read-modify-write operations.

  • Input samples having lower diversity can cause performance to be degraded due to serializations from bin-collisions.
  • Histograms having a large number of bins (e.g., thousands) may adversely affect shared memory occupancy and performance (or even the ability to launch).
  • Performance is often improved when referencing input samples through a texture-caching iterator (e.g., cub::TexObjInputIterator).
  • This operation requires an allocation of temporary device storage. When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
  • When calling this method from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Snippet
The code snippet below illustrates the computation of an 8-bin histogram of single-channel unsigned char samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh>
// Declare, allocate, and initialize device pointers for input and histogram
int num_samples; // e.g., 12
unsigned char *d_samples; // e.g., [2, 6, 7, 5, 3, 0, 2, 1, 7, 0, 6, 2]
unsigned int *d_histogram; // e.g., [ , , , , , , , ]
...
// Wrap d_samples device pointer in a random-access texture iterator
cub::TexObjInputIterator<unsigned char> d_samples_tex_itr;
d_samples_tex_itr.BindTexture(d_samples, num_samples * sizeof(unsigned char));
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceHistogram::SingleChannelSharedAtomic<8>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histogram, num_samples);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Compute histogram
cub::DeviceHistogram::SingleChannelSharedAtomic<8>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histogram, num_samples);
// Unbind texture iterator
d_samples_tex_itr.UnbindTexture();
// d_histogram <-- [2, 1, 3, 1, 0, 1, 2, 2]
Template Parameters
BINS            Number of histogram bins per channel.
InputIterator   [inferred] Random-access input iterator type for reading input samples. Values of InputIterator::value_type, when cast to an integer, must fall in the range [0..BINS-1]. May be a simple pointer type.
HistoCounter    [inferred] Integer type for counting sample occurrences per histogram bin.
Parameters
[in]      d_temp_storage      Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]  temp_storage_bytes  Reference to size in bytes of d_temp_storage allocation.
[in]      d_samples           Input samples.
[out]     d_histogram         Array of BINS counters of integral type HistoCounter.
[in]      num_samples         Number of samples to process.
[in]      stream              [optional] CUDA stream to launch kernels within. Default is stream 0.
[in]      debug_synchronous   [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 217 of file device_histogram.cuh.
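A common way to picture the shared-memory strategy is "privatization": each block accumulates into its own small histogram and the private copies are then merged. The host-side sketch below mimics that structure with sequential tiles. It is an analogy under stated assumptions, not CUB's kernel: PrivatizedHistogram and tile_size are hypothetical names, and the per-tile array merely stands in for a block's shared-memory counters. It also hints at why very large BINS values are costly here, since every tile needs BINS private counters.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogy of per-block privatized histograms (not CUB code).
template <int BINS>
std::array<unsigned int, BINS> PrivatizedHistogram(const std::vector<unsigned char>& samples,
                                                   std::size_t tile_size)
{
    assert(tile_size > 0);
    std::array<unsigned int, BINS> global{};

    for (std::size_t tile_begin = 0; tile_begin < samples.size(); tile_begin += tile_size) {
        const std::size_t tile_end = std::min(tile_begin + tile_size, samples.size());

        // Private histogram for this tile (stand-in for shared memory).
        std::array<unsigned int, BINS> local{};
        for (std::size_t i = tile_begin; i < tile_end; ++i) {
            assert(samples[i] < BINS);  // samples must already be valid bin ids
            ++local[samples[i]];
        }

        // Merge the private counts into the final histogram.
        for (int bin = 0; bin < BINS; ++bin) {
            global[bin] += local[bin];
        }
    }
    return global;
}
```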

template<int BINS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceHistogram::SingleChannelGlobalAtomic ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIterator  d_samples,
HistoCounter *  d_histogram,
int  num_samples,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inline static

Computes a device-wide histogram using global-memory atomic read-modify-write operations.

  • Input samples having lower diversity can cause performance to be degraded due to serializations from bin-collisions.
  • Performance is not significantly impacted when computing histograms having large numbers of bins (e.g., thousands).
  • Performance is often improved when referencing input samples through a texture-caching iterator (e.g., cub::TexObjInputIterator).
  • This operation requires an allocation of temporary device storage. When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
  • When calling this method from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Snippet
The code snippet below illustrates the computation of an 8-bin histogram of single-channel unsigned char samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh>
// Declare, allocate, and initialize device pointers for input and histogram
int num_samples; // e.g., 12
unsigned char *d_samples; // e.g., [2, 6, 7, 5, 3, 0, 2, 1, 7, 0, 6, 2]
unsigned int *d_histogram; // e.g., [ , , , , , , , ]
...
// Wrap d_samples device pointer in a random-access texture iterator
cub::TexObjInputIterator<unsigned char> d_samples_tex_itr;
d_samples_tex_itr.BindTexture(d_samples, num_samples * sizeof(unsigned char));
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceHistogram::SingleChannelGlobalAtomic<8>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histogram, num_samples);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Compute histogram
cub::DeviceHistogram::SingleChannelGlobalAtomic<8>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histogram, num_samples);
// Unbind texture iterator
d_samples_tex_itr.UnbindTexture();
// d_histogram <-- [2, 1, 3, 1, 0, 1, 2, 2]
Template Parameters
BINS            Number of histogram bins per channel.
InputIterator   [inferred] Random-access input iterator type for reading input samples. Values of InputIterator::value_type, when cast to an integer, must fall in the range [0..BINS-1]. May be a simple pointer type.
HistoCounter    [inferred] Integer type for counting sample occurrences per histogram bin.
Parameters
[in]      d_temp_storage      Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]  temp_storage_bytes  Reference to size in bytes of d_temp_storage allocation.
[in]      d_samples           Input samples.
[out]     d_histogram         Array of BINS counters of integral type HistoCounter.
[in]      num_samples         Number of samples to process.
[in]      stream              [optional] CUDA stream to launch kernels within. Default is stream 0.
[in]      debug_synchronous   [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 305 of file device_histogram.cuh.
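As a rough host-side analogy to the global-memory atomic strategy, the sketch below keeps a single array of std::atomic counters and updates it with fetch_add, an atomic read-modify-write. It is an illustration under assumptions rather than CUB's kernel: AtomicHistogram is a hypothetical name, and the loop runs on one host thread for determinism, whereas on a device many threads would execute the same update concurrently. With only one counter array of BINS entries in total, the number of bins affects memory footprint but not any per-block resource, while samples that repeatedly hit the same counter would contend with each other.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical host-side analogy of a global atomic histogram (not CUB code).
template <int BINS>
std::vector<unsigned int> AtomicHistogram(const std::vector<unsigned char>& samples)
{
    // One shared counter per bin, explicitly zeroed.
    std::vector<std::atomic<unsigned int>> counters(BINS);
    for (auto& counter : counters) {
        counter.store(0u);
    }

    for (unsigned char s : samples) {
        assert(s < BINS);  // samples must already be valid bin ids
        // Atomic read-modify-write on the bin's counter.
        counters[s].fetch_add(1u, std::memory_order_relaxed);
    }

    // Copy the final counts into a plain vector for the caller.
    std::vector<unsigned int> histogram(BINS);
    for (int bin = 0; bin < BINS; ++bin) {
        histogram[bin] = counters[bin].load();
    }
    return histogram;
}
```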

template<int BINS, int CHANNELS, int ACTIVE_CHANNELS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceHistogram::MultiChannelSorting ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIterator  d_samples,
HistoCounter *  d_histograms[ACTIVE_CHANNELS],
int  num_samples,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inline static

Computes a device-wide histogram from multi-channel data using fast block-sorting.

  • The total number of samples across all channels (num_samples) must be a whole multiple of CHANNELS.
  • Delivers consistent throughput regardless of sample diversity.
  • Histograms having a large number of bins (e.g., thousands) may adversely affect shared memory occupancy and performance (or even the ability to launch).
  • Performance is often improved when referencing input samples through a texture-caching iterator (e.g., cub::TexObjInputIterator).
  • This operation requires an allocation of temporary device storage. When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
  • When calling this method from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Snippet
The code snippet below illustrates the computation of three 8-bin histograms from an input sequence of quad-channel (interleaved) unsigned char samples. (E.g., RGB histograms from RGBA pixel samples.)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh>
// Declare, allocate, and initialize device pointers for input and histograms
int num_samples; // e.g., 20 (five pixels with four channels each)
unsigned char *d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2),
// (0, 6, 7, 5), (3, 0, 2, 6)]
unsigned int *d_histograms[3]; // e.g., [ [ , , , , , , , ];
// [ , , , , , , , ];
// [ , , , , , , , ] ]
...
// Wrap d_samples device pointer in a random-access texture iterator
cub::TexObjInputIterator<unsigned char> d_samples_tex_itr;
d_samples_tex_itr.BindTexture(d_samples, num_samples * sizeof(unsigned char));
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceHistogram::MultiChannelSorting<8, 4, 3>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histograms, num_samples);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Compute histograms
cub::DeviceHistogram::MultiChannelSorting<8, 4, 3>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histograms, num_samples);
// Unbind texture iterator
d_samples_tex_itr.UnbindTexture();
// d_histograms <-- [ [1, 0, 1, 2, 0, 0, 0, 1];
// [3, 0, 0, 0, 0, 0, 2, 0];
// [0, 0, 2, 0, 0, 0, 1, 2] ]
Template Parameters
BINS              Number of histogram bins per channel.
CHANNELS          Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed).
ACTIVE_CHANNELS   [inferred] Number of channels actively being histogrammed.
InputIterator     [inferred] Random-access input iterator type for reading input samples. Values of InputIterator::value_type, when cast to an integer, must fall in the range [0..BINS-1]. May be a simple pointer type.
HistoCounter      [inferred] Integer type for counting sample occurrences per histogram bin.
Parameters
[in]      d_temp_storage      Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]  temp_storage_bytes  Reference to size in bytes of d_temp_storage allocation.
[in]      d_samples           Pointer to the input sequence of sample items. The samples from different channels are assumed to be interleaved (e.g., an array of 32b pixels where each pixel consists of four RGBA 8b samples).
[out]     d_histograms        Array of active channel histogram pointers, each pointing to an output array having BINS counters of integral type HistoCounter.
[in]      num_samples         Total number of samples to process in all channels, including non-active channels.
[in]      stream              [optional] CUDA stream to launch kernels within. Default is stream 0.
[in]      debug_synchronous   [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 412 of file device_histogram.cuh.
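The roles of CHANNELS, ACTIVE_CHANNELS, and num_samples in the interleaved layout can be sketched with a small host-side reference. This is not CUB code: ReferenceMultiChannelHistogram is a hypothetical name, and the sketch assumes the active channels are the first ACTIVE_CHANNELS entries of each CHANNELS-wide pixel (as in the RGB-from-RGBA example). It may be useful for cross-checking small inputs by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for interleaved multi-channel histograms
// (not a CUB function). Assumes the first ACTIVE_CHANNELS channels are active.
template <int BINS, int CHANNELS, int ACTIVE_CHANNELS>
std::array<std::array<unsigned int, BINS>, ACTIVE_CHANNELS>
ReferenceMultiChannelHistogram(const std::vector<unsigned char>& samples)
{
    static_assert(ACTIVE_CHANNELS <= CHANNELS, "cannot histogram more channels than exist");
    // The total sample count must be a whole multiple of CHANNELS.
    assert(samples.size() % CHANNELS == 0);

    std::array<std::array<unsigned int, BINS>, ACTIVE_CHANNELS> histograms{};
    for (std::size_t pixel = 0; pixel < samples.size(); pixel += CHANNELS) {
        for (int channel = 0; channel < ACTIVE_CHANNELS; ++channel) {
            const unsigned char s = samples[pixel + channel];
            assert(s < BINS);  // samples must already be valid bin ids
            ++histograms[channel][s];
        }
        // Channels in [ACTIVE_CHANNELS, CHANNELS) are skipped but still
        // count toward the total number of samples.
    }
    return histograms;
}
```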

template<int BINS, int CHANNELS, int ACTIVE_CHANNELS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceHistogram::MultiChannelSharedAtomic ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIterator  d_samples,
HistoCounter *  d_histograms[ACTIVE_CHANNELS],
int  num_samples,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inline static

Computes a device-wide histogram from multi-channel data using shared-memory atomic read-modify-write operations.

  • The total number of samples across all channels (num_samples) must be a whole multiple of CHANNELS.
  • Input samples having lower diversity can cause performance to be degraded due to serializations from bin-collisions.
  • Histograms having a large number of bins (e.g., thousands) may adversely affect shared memory occupancy and performance (or even the ability to launch).
  • Performance is often improved when referencing input samples through a texture-caching iterator (e.g., cub::TexObjInputIterator).
  • This operation requires an allocation of temporary device storage. When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
  • When calling this method from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Snippet
The code snippet below illustrates the computation of three 8-bin histograms from an input sequence of quad-channel (interleaved) unsigned char samples. (E.g., RGB histograms from RGBA pixel samples.)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh>
// Declare, allocate, and initialize device pointers for input and histograms
int num_samples; // e.g., 20 (five pixels with four channels each)
unsigned char *d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2),
// (0, 6, 7, 5), (3, 0, 2, 6)]
unsigned int *d_histograms[3]; // e.g., [ [ , , , , , , , ];
// [ , , , , , , , ];
// [ , , , , , , , ] ]
...
// Wrap d_samples device pointer in a random-access texture iterator
cub::TexObjInputIterator<unsigned char> d_samples_tex_itr;
d_samples_tex_itr.BindTexture(d_samples, num_samples * sizeof(unsigned char));
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceHistogram::MultiChannelSharedAtomic<8, 4, 3>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histograms, num_samples);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Compute histograms
cub::DeviceHistogram::MultiChannelSharedAtomic<8, 4, 3>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histograms, num_samples);
// Unbind texture iterator
d_samples_tex_itr.UnbindTexture();
// d_histograms <-- [ [1, 0, 1, 2, 0, 0, 0, 1];
// [3, 0, 0, 0, 0, 0, 2, 0];
// [0, 0, 2, 0, 0, 0, 1, 2] ]
Template Parameters
BINS              Number of histogram bins per channel.
CHANNELS          Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed).
ACTIVE_CHANNELS   [inferred] Number of channels actively being histogrammed.
InputIterator     [inferred] Random-access input iterator type for reading input samples. Values of InputIterator::value_type, when cast to an integer, must fall in the range [0..BINS-1]. May be a simple pointer type.
HistoCounter      [inferred] Integer type for counting sample occurrences per histogram bin.
Parameters
[in]      d_temp_storage      Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]  temp_storage_bytes  Reference to size in bytes of d_temp_storage allocation.
[in]      d_samples           Pointer to the input sequence of sample items. The samples from different channels are assumed to be interleaved (e.g., an array of 32b pixels where each pixel consists of four RGBA 8b samples).
[out]     d_histograms        Array of active channel histogram pointers, each pointing to an output array having BINS counters of integral type HistoCounter.
[in]      num_samples         Total number of samples to process in all channels, including non-active channels.
[in]      stream              [optional] CUDA stream to launch kernels within. Default is stream 0.
[in]      debug_synchronous   [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 510 of file device_histogram.cuh.

template<int BINS, int CHANNELS, int ACTIVE_CHANNELS, typename InputIterator , typename HistoCounter >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceHistogram::MultiChannelGlobalAtomic ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIterator  d_samples,
HistoCounter *  d_histograms[ACTIVE_CHANNELS],
int  num_samples,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inline static

Computes a device-wide histogram from multi-channel data using global-memory atomic read-modify-write operations.

  • The total number of samples across all channels (num_samples) must be a whole multiple of CHANNELS.
  • Input samples having lower diversity can cause performance to be degraded due to serializations from bin-collisions.
  • Performance is not significantly impacted when computing histograms having large numbers of bins (e.g., thousands).
  • Performance is often improved when referencing input samples through a texture-caching iterator (e.g., cub::TexObjInputIterator).
  • This operation requires an allocation of temporary device storage. When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
  • When calling this method from kernel code, be sure to define the CUB_CDP macro in your compiler's macro definitions.
Snippet
The code snippet below illustrates the computation of three 8-bin histograms from an input sequence of quad-channel (interleaved) unsigned char samples. (E.g., RGB histograms from RGBA pixel samples.)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh>
// Declare, allocate, and initialize device pointers for input and histograms
int num_samples; // e.g., 20 (five pixels with four channels each)
unsigned char *d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2),
// (0, 6, 7, 5), (3, 0, 2, 6)]
unsigned int *d_histograms[3]; // e.g., [ [ , , , , , , , ];
// [ , , , , , , , ];
// [ , , , , , , , ] ]
...
// Wrap d_samples device pointer in a random-access texture iterator
cub::TexObjInputIterator<unsigned char> d_samples_tex_itr;
d_samples_tex_itr.BindTexture(d_samples, num_samples * sizeof(unsigned char));
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceHistogram::MultiChannelGlobalAtomic<8, 4, 3>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histograms, num_samples);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Compute histograms
cub::DeviceHistogram::MultiChannelGlobalAtomic<8, 4, 3>(d_temp_storage, temp_storage_bytes, d_samples_tex_itr, d_histograms, num_samples);
// Unbind texture iterator
d_samples_tex_itr.UnbindTexture();
// d_histograms <-- [ [1, 0, 1, 2, 0, 0, 0, 1];
// [3, 0, 0, 0, 0, 0, 2, 0];
// [0, 0, 2, 0, 0, 0, 1, 2] ]
Template Parameters
BINS              Number of histogram bins per channel.
CHANNELS          Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed).
ACTIVE_CHANNELS   [inferred] Number of channels actively being histogrammed.
InputIterator     [inferred] Random-access input iterator type for reading input samples. Values of InputIterator::value_type, when cast to an integer, must fall in the range [0..BINS-1]. May be a simple pointer type.
HistoCounter      [inferred] Integer type for counting sample occurrences per histogram bin.
Parameters
[in]      d_temp_storage      Device allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]  temp_storage_bytes  Reference to size in bytes of d_temp_storage allocation.
[in]      d_samples           Pointer to the input sequence of sample items. The samples from different channels are assumed to be interleaved (e.g., an array of 32b pixels where each pixel consists of four RGBA 8b samples).
[out]     d_histograms        Array of active channel histogram pointers, each pointing to an output array having BINS counters of integral type HistoCounter.
[in]      num_samples         Total number of samples to process in all channels, including non-active channels.
[in]      stream              [optional] CUDA stream to launch kernels within. Default is stream 0.
[in]      debug_synchronous   [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 609 of file device_histogram.cuh.


The documentation for this struct was generated from the following file: