 cub | Optional outer namespace(s) |
  CachingDeviceAllocator | A simple caching allocator for device memory allocations |
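For example, a minimal sketch of the allocator's reuse pattern (buffer size and names are illustrative):

```cpp
#include <cub/cub.cuh>

int main()
{
    cub::CachingDeviceAllocator allocator;      // caches device allocations for reuse

    void *d_buffer = NULL;
    allocator.DeviceAllocate(&d_buffer, 1024 * sizeof(int));  // serviced by the cache or cudaMalloc
    // ... use d_buffer in kernels ...
    allocator.DeviceFree(d_buffer);             // returns the block to the cache rather than cudaFree
    return 0;
}
```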
  If | Type selection (IF ? ThenType : ElseType) |
  Equals | Type equality test |
  NullType | A simple "NULL" marker type |
  Int2Type | Allows for the treatment of an integral constant as a type at compile-time (e.g., to achieve static call dispatch based on constant integral values) |
  CubVector | Exposes a member typedef Type that names the corresponding CUDA vector type if one exists. Otherwise Type refers to the CubVector structure itself, which will wrap the corresponding x, y, etc. vector fields |
  Uninitialized | A storage-backing wrapper that allows types with non-trivial constructors to be aliased in unions |
  ItemOffsetPair | An item value paired with a corresponding offset |
  KeyValuePair | A key identifier paired with a corresponding value |
  DoubleBuffer | Double-buffer storage wrapper for multi-pass stream transformations that require more than one storage array for streaming intermediate results back and forth |
  Log2 | Statically determine log2(N), rounded up |
  PowerOfTwo | Statically determine if N is a power-of-two |
  BaseTraits | Basic type traits |
  NumericTraits | Numeric type traits |
  Traits | Type traits |
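A brief sketch of how several of these compile-time utilities combine; the Halve overloads are ours, added to illustrate static dispatch via Int2Type:

```cpp
#include <cub/cub.cuh>
#include <cstdio>

// Overloads selected at compile time by an Int2Type tag (illustrative names)
__host__ __device__ int Halve(int x, cub::Int2Type<1>) { return x >> 1; }  // power-of-two path
__host__ __device__ int Halve(int x, cub::Int2Type<0>) { return x / 2;  }  // generic path

int main()
{
    // Compile-time math
    printf("Log2<1000>     = %d\n", cub::Log2<1000>::VALUE);        // 10 (rounded up)
    printf("PowerOfTwo<64> = %d\n", cub::PowerOfTwo<64>::VALUE);    // 1

    // Type selection and equality testing
    typedef cub::If<(sizeof(void*) == 8), long long, int>::Type OffsetT;
    printf("64-bit offsets = %d\n", (int) cub::Equals<OffsetT, long long>::VALUE);

    // Int2Type turns the integral constant into a type, steering overload resolution
    printf("Halve(10)      = %d\n", Halve(10, cub::Int2Type<cub::PowerOfTwo<64>::VALUE>()));
    return 0;
}
```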
  ArgIndexInputIterator | A random-access input wrapper for pairing dereferenced values with their corresponding indices (forming ItemOffsetPair tuples) |
  CacheModifiedInputIterator | A random-access input wrapper for dereferencing array values using a PTX cache load modifier |
  CacheModifiedOutputIterator | A random-access output wrapper for storing array values using a PTX cache store modifier |
  ConstantInputIterator | A random-access input generator for dereferencing a sequence of homogeneous values |
  CountingInputIterator | A random-access input generator for dereferencing a sequence of incrementing integer values |
  TexObjInputIterator | A random-access input wrapper for dereferencing array values through texture cache. Uses newer Kepler-style texture objects |
  TexRefInputIterator | A random-access input wrapper for dereferencing array values through texture cache. Uses older Tesla/Fermi-style texture references |
  TransformInputIterator | A random-access input wrapper for transforming dereferenced values |
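For instance, the generator and transforming iterators compose without touching memory; the Square functor below is ours, supplied as the conversion operator:

```cpp
#include <cub/cub.cuh>
#include <cstdio>

// Conversion functor consumed by TransformInputIterator (name is ours, not CUB's)
struct Square
{
    __host__ __device__ __forceinline__ int operator()(int x) const { return x * x; }
};

int main()
{
    cub::CountingInputIterator<int> counting(0);   // generates 0, 1, 2, ...
    cub::ConstantInputIterator<int> ones(1);       // generates 1, 1, 1, ...

    // Lazily square the counting sequence on dereference
    cub::TransformInputIterator<int, Square, cub::CountingInputIterator<int> >
        squares(counting, Square());

    printf("%d %d %d\n", counting[5], ones[5], squares[5]);  // 5 1 25
    return 0;
}
```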
  Equality | Default equality functor |
  Inequality | Default inequality functor |
  InequalityWrapper | Inequality functor (wraps equality functor) |
  Sum | Default sum functor |
  Max | Default max functor |
  ArgMax | Arg max functor (keeps the value and offset of the first occurrence of the largest item) |
  Min | Default min functor |
  ArgMin | Arg min functor (keeps the value and offset of the first occurrence of the smallest item) |
  Cast | Default cast functor |
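These functors are the operators consumed by the block-, warp-, and device-wide algorithms below; they can also be invoked directly:

```cpp
#include <cub/cub.cuh>
#include <cstdio>

int main()
{
    cub::Sum      sum_op;
    cub::Max      max_op;
    cub::Min      min_op;
    cub::Equality eq_op;

    printf("%d\n", sum_op(2, 3));       // 5
    printf("%d\n", max_op(2, 3));       // 3
    printf("%d\n", min_op(2, 3));       // 2
    printf("%d\n", (int) eq_op(2, 3));  // 0 (not equal)
    return 0;
}
```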
  BlockDiscontinuity | The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block. |
   TempStorage | The operations exposed by BlockDiscontinuity require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
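A minimal single-block sketch of run-head flagging (kernel and buffer names are ours):

```cpp
#include <cub/cub.cuh>

// 128 threads x 4 items: flag the first item of each run of equal keys
__global__ void FlagRunHeads(int *d_keys, int *d_flags)
{
    typedef cub::BlockDiscontinuity<int, 128> BlockDiscontinuity;
    __shared__ typename BlockDiscontinuity::TempStorage temp_storage;

    int keys[4], head_flags[4];
    for (int i = 0; i < 4; ++i)
        keys[i] = d_keys[threadIdx.x * 4 + i];      // blocked arrangement

    // head_flags[i] = 1 wherever keys[i] differs from its predecessor
    BlockDiscontinuity(temp_storage).FlagHeads(head_flags, keys, cub::Inequality());

    for (int i = 0; i < 4; ++i)
        d_flags[threadIdx.x * 4 + i] = head_flags[i];
}
```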
  BlockExchange | The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block. |
   TempStorage | The operations exposed by BlockExchange require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
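A sketch of converting a striped arrangement to a blocked one; the single-array, in-place method shown matches the CUB 1.x releases this listing reflects (later releases take separate input and output arrays):

```cpp
#include <cub/cub.cuh>

__global__ void StripedToBlockedKernel(int *d_data)
{
    typedef cub::BlockExchange<int, 128, 4> BlockExchange;
    __shared__ typename BlockExchange::TempStorage temp_storage;

    int items[4];
    for (int i = 0; i < 4; ++i)
        items[i] = d_data[i * 128 + threadIdx.x];   // striped load: stride = block size

    BlockExchange(temp_storage).StripedToBlocked(items);
    // items[] is now blocked: thread i holds elements 4*i .. 4*i+3
}
```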
  BlockHistogram | The BlockHistogram class provides collective methods for constructing block-wide histograms from data samples partitioned across a CUDA thread block. |
   TempStorage | The operations exposed by BlockHistogram require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
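A single-block sketch binning 512 byte-valued samples into 256 bins (names are ours):

```cpp
#include <cub/cub.cuh>

__global__ void BlockHistogramKernel(unsigned char *d_samples, unsigned int *d_histogram)
{
    // 128 threads, 4 samples per thread, 256 bins
    typedef cub::BlockHistogram<unsigned char, 128, 4, 256> BlockHistogram;
    __shared__ typename BlockHistogram::TempStorage temp_storage;
    __shared__ unsigned int smem_histogram[256];

    unsigned char samples[4];
    for (int i = 0; i < 4; ++i)
        samples[i] = d_samples[threadIdx.x * 4 + i];

    // Cooperatively zero and populate the shared-memory histogram
    BlockHistogram(temp_storage).Histogram(samples, smem_histogram);
    __syncthreads();

    // Copy out: two bins per thread
    d_histogram[threadIdx.x]       = smem_histogram[threadIdx.x];
    d_histogram[threadIdx.x + 128] = smem_histogram[threadIdx.x + 128];
}
```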
  BlockLoad | The BlockLoad class provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block. |
   TempStorage | The operations exposed by BlockLoad require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
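A sketch of a vectorized blocked load; note that the first template parameter is the input iterator type in the CUB 1.x era of this listing (later releases take the value type instead):

```cpp
#include <cub/cub.cuh>

__global__ void BlockLoadKernel(int *d_in, int *d_out)
{
    // 128 threads x 4 items, vectorized loads where alignment permits
    typedef cub::BlockLoad<int*, 128, 4, cub::BLOCK_LOAD_VECTORIZE> BlockLoad;
    __shared__ typename BlockLoad::TempStorage temp_storage;

    int items[4];
    BlockLoad(temp_storage).Load(d_in, items);  // thread i receives elements 4*i .. 4*i+3

    d_out[threadIdx.x] = items[0] + items[1] + items[2] + items[3];
}
```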
  BlockRadixSort | The BlockRadixSort class provides collective methods for sorting items partitioned across a CUDA thread block using a radix sorting method. |
   TempStorage | The operations exposed by BlockRadixSort require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
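A single-block sketch sorting 512 integer keys (names are ours):

```cpp
#include <cub/cub.cuh>

__global__ void BlockSortKernel(int *d_keys)
{
    typedef cub::BlockRadixSort<int, 128, 4> BlockRadixSort;
    __shared__ typename BlockRadixSort::TempStorage temp_storage;

    int keys[4];
    for (int i = 0; i < 4; ++i)
        keys[i] = d_keys[threadIdx.x * 4 + i];

    BlockRadixSort(temp_storage).Sort(keys);    // ascending, blocked result

    for (int i = 0; i < 4; ++i)
        d_keys[threadIdx.x * 4 + i] = keys[i];
}
```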
  BlockReduce | The BlockReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread block. |
   TempStorage | The operations exposed by BlockReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
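The canonical usage pattern, reducing one tile per block (kernel name is ours):

```cpp
#include <cub/cub.cuh>

// One partial sum per thread block
__global__ void BlockSumKernel(int *d_in, int *d_block_sums)
{
    typedef cub::BlockReduce<int, 128> BlockReduce;
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int thread_data = d_in[blockIdx.x * 128 + threadIdx.x];

    // The aggregate is only valid in thread 0
    int block_sum = BlockReduce(temp_storage).Sum(thread_data);
    if (threadIdx.x == 0)
        d_block_sums[blockIdx.x] = block_sum;
}
```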
  BlockScan | The BlockScan class provides collective methods for computing a parallel prefix sum/scan of items partitioned across a CUDA thread block. |
   TempStorage | The operations exposed by BlockScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
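A minimal exclusive prefix-sum sketch for a single 128-thread block:

```cpp
#include <cub/cub.cuh>

__global__ void BlockPrefixSumKernel(int *d_data)
{
    typedef cub::BlockScan<int, 128> BlockScan;
    __shared__ typename BlockScan::TempStorage temp_storage;

    int thread_data = d_data[threadIdx.x];
    BlockScan(temp_storage).ExclusiveSum(thread_data, thread_data);  // in-place is allowed
    d_data[threadIdx.x] = thread_data;
}
```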
  BlockStore | The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory. |
   TempStorage | The operations exposed by BlockStore require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
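A load/transform/store sketch that also shows the union'd TempStorage reuse mentioned above (iterator-type template parameters follow the CUB 1.x conventions of this listing):

```cpp
#include <cub/cub.cuh>

// Load, double, and store 512 ints per block in blocked arrangements
__global__ void ScaleKernel(int *d_in, int *d_out)
{
    typedef cub::BlockLoad<int*, 128, 4>  BlockLoad;
    typedef cub::BlockStore<int*, 128, 4> BlockStore;
    __shared__ union
    {
        typename BlockLoad::TempStorage  load;   // union'd to reuse shared memory
        typename BlockStore::TempStorage store;
    } temp_storage;

    int items[4];
    BlockLoad(temp_storage.load).Load(d_in, items);
    __syncthreads();                             // required before reusing the storage

    for (int i = 0; i < 4; ++i)
        items[i] *= 2;

    BlockStore(temp_storage.store).Store(d_out, items);
}
```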
  DeviceHistogram | DeviceHistogram provides device-wide, parallel operations for constructing histogram(s) from a sequence of data samples residing within global memory. |
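All of the Device* interfaces share a two-call convention: a first call with a NULL temporary-storage pointer only computes the required allocation size, and a second call does the work. The sketch below uses the HistogramEven entry point from later CUB releases; the exact method names have varied across versions:

```cpp
#include <cub/cub.cuh>

// 256-bin histogram of byte samples, bins evenly spaced over [0, 256)
void BuildHistogram(unsigned char *d_samples, int num_samples, int *d_histogram)
{
    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    // First call sizes the temporary allocation; second call runs
    cub::DeviceHistogram::HistogramEven(d_temp_storage, temp_storage_bytes,
        d_samples, d_histogram, 257, 0.0f, 256.0f, num_samples);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceHistogram::HistogramEven(d_temp_storage, temp_storage_bytes,
        d_samples, d_histogram, 257, 0.0f, 256.0f, num_samples);
    cudaFree(d_temp_storage);
}
```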
  DevicePartition | DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within global memory. |
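A sketch using the same two-call convention; flagged items are compacted to the front of the output, and the unselected remainder is written (in reverse order) to the back:

```cpp
#include <cub/cub.cuh>

void PartitionFlagged(int *d_in, char *d_flags, int *d_out,
                      int *d_num_selected, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    cub::DevicePartition::Flagged(d_temp_storage, temp_storage_bytes,
        d_in, d_flags, d_out, d_num_selected, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DevicePartition::Flagged(d_temp_storage, temp_storage_bytes,
        d_in, d_flags, d_out, d_num_selected, num_items);
    cudaFree(d_temp_storage);
}
```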
  DeviceRadixSort | DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within global memory. |
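A sketch that also shows DoubleBuffer in action; CUB ping-pongs between the two buffers across sorting passes:

```cpp
#include <cub/cub.cuh>

void SortKeys(int *d_key_buf, int *d_key_alt_buf, int num_items)
{
    cub::DoubleBuffer<int> d_keys(d_key_buf, d_key_alt_buf);

    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes, d_keys, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes, d_keys, num_items);

    // The sorted sequence lives in d_keys.Current()
    cudaFree(d_temp_storage);
}
```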
  DeviceReduce | DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within global memory. |
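A minimal device-wide sum (the wrapper name is ours):

```cpp
#include <cub/cub.cuh>

void SumArray(int *d_in, int *d_sum, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_sum, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_sum, num_items);
    cudaFree(d_temp_storage);
}
```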
  DeviceScan | DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within global memory. |
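A minimal device-wide exclusive prefix sum (the wrapper name is ours):

```cpp
#include <cub/cub.cuh>

void PrefixSum(int *d_in, int *d_out, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    cub::DeviceScan::ExclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceScan::ExclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaFree(d_temp_storage);
}
```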
  DeviceSelect | DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within global memory. |
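A sketch of run-length compaction with DeviceSelect::Unique, which keeps the first item of each run of consecutive equal values:

```cpp
#include <cub/cub.cuh>

// d_num_selected receives the count of items written to d_out
void CompactRuns(int *d_in, int *d_out, int *d_num_selected, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    cub::DeviceSelect::Unique(d_temp_storage, temp_storage_bytes,
        d_in, d_out, d_num_selected, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceSelect::Unique(d_temp_storage, temp_storage_bytes,
        d_in, d_out, d_num_selected, num_items);
    cudaFree(d_temp_storage);
}
```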
  WarpScan | The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp. |
   TempStorage | The operations exposed by WarpScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
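A sketch running four independent warp scans in a 128-thread block; note one TempStorage per logical warp:

```cpp
#include <cub/cub.cuh>

__global__ void WarpPrefixSumKernel(int *d_data)
{
    typedef cub::WarpScan<int> WarpScan;
    __shared__ typename WarpScan::TempStorage temp_storage[4];  // one per warp

    int warp_id = threadIdx.x / 32;
    int thread_data = d_data[threadIdx.x];

    WarpScan(temp_storage[warp_id]).InclusiveSum(thread_data, thread_data);
    d_data[threadIdx.x] = thread_data;
}
```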
  WarpReduce | The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp. |
   TempStorage | The operations exposed by WarpReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
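A matching warp-reduction sketch; each warp's aggregate is valid only in its lane 0:

```cpp
#include <cub/cub.cuh>

__global__ void WarpSumKernel(int *d_in, int *d_warp_sums)
{
    typedef cub::WarpReduce<int> WarpReduce;
    __shared__ typename WarpReduce::TempStorage temp_storage[4];  // 128 threads = 4 warps

    int warp_id = threadIdx.x / 32;
    int thread_data = d_in[threadIdx.x];

    int warp_sum = WarpReduce(temp_storage[warp_id]).Sum(thread_data);
    if (threadIdx.x % 32 == 0)
        d_warp_sums[warp_id] = warp_sum;
}
```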