 cub::ArgIndexInputIterator< InputIterator, Offset > | A random-access input wrapper for pairing dereferenced values with their corresponding indices (forming ItemOffsetPair tuples) |
 cub::ArgMax | Arg max functor (keeps the value and offset of the first occurrence of the largest item) |
 cub::ArgMin | Arg min functor (keeps the value and offset of the first occurrence of the smallest item) |
 cub::BaseTraits< _CATEGORY, _PRIMITIVE, _NULL_TYPE, _UnsignedBits > | Basic type traits |
 cub::BaseTraits< NOT_A_NUMBER, false, false, RemoveQualifiers< T >::Type > | |
  cub::NumericTraits< RemoveQualifiers< T >::Type > | |
   cub::Traits< T > | Type traits |
 cub::BaseTraits< NOT_A_NUMBER, false, false, T > | |
  cub::NumericTraits< T > | Numeric type traits |
 cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block |
 cub::BlockExchange< T, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block |
 cub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockHistogram class provides collective methods for constructing block-wide histograms from data samples partitioned across a CUDA thread block |
 cub::BlockLoad< InputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockLoad class provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block |
 cub::BlockRadixSort< Key, BLOCK_DIM_X, ITEMS_PER_THREAD, Value, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockRadixSort class provides collective methods for sorting items partitioned across a CUDA thread block using a radix sorting method |
 cub::BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread block |
 cub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockScan class provides collective methods for computing a parallel prefix sum/scan of items partitioned across a CUDA thread block |
 cub::BlockStore< OutputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory |
 cub::CacheModifiedInputIterator< MODIFIER, ValueType, Offset > | A random-access input wrapper for dereferencing array values using a PTX cache load modifier |
 cub::CacheModifiedOutputIterator< MODIFIER, ValueType, Offset > | A random-access output wrapper for storing array values using a PTX cache store modifier |
 cub::CachingDeviceAllocator | A simple caching allocator for device memory allocations |
 cub::Cast< B > | Default cast functor |
 cub::ConstantInputIterator< ValueType, Offset > | A random-access input generator for dereferencing a sequence of homogeneous values |
 cub::CountingInputIterator< ValueType, Offset > | A random-access input generator for dereferencing a sequence of incrementing integer values |
 cub::CubVector< T, vec_elements > | Exposes a member typedef Type that names the corresponding CUDA vector type if one exists. Otherwise Type refers to the CubVector structure itself, which will wrap the corresponding x, y, etc. vector fields |
 cub::DeviceHistogram | DeviceHistogram provides device-wide, parallel operations for constructing histogram(s) from a sequence of data samples residing within global memory |
 cub::DevicePartition | DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within global memory |
 cub::DeviceRadixSort | DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within global memory |
 cub::DeviceReduce | DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within global memory |
 cub::DeviceScan | DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within global memory |
 cub::DeviceSelect | DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within global memory |
 cub::DoubleBuffer< T > | Double-buffer storage wrapper for multi-pass stream transformations that require more than one storage array for streaming intermediate results back and forth |
 cub::Equality | Default equality functor |
 cub::Equals< A, B > | Type equality test |
 cub::If< IF, ThenType, ElseType > | Type selection (IF ? ThenType : ElseType ) |
 cub::Inequality | Default inequality functor |
 cub::InequalityWrapper< EqualityOp > | Inequality functor (wraps equality functor) |
 cub::Int2Type< A > | Allows for the treatment of an integral constant as a type at compile-time (e.g., to achieve static call dispatch based on constant integral values) |
 cub::ItemOffsetPair< _T, _Offset > | An item value paired with a corresponding offset |
 cub::KeyValuePair< _Key, _Value > | A key identifier paired with a corresponding value |
 cub::Log2< N, CURRENT_VAL, COUNT > | Statically determine log2(N), rounded up |
 cub::Max | Default max functor |
 cub::Min | Default min functor |
 cub::NullType | A simple "NULL" marker type |
 cub::PowerOfTwo< N > | Statically determine if N is a power-of-two |
 cub::Sum | Default sum functor |
 cub::TexObjInputIterator< T, Offset > | A random-access input wrapper for dereferencing array values through texture cache. Uses newer Kepler-style texture objects |
 cub::TexRefInputIterator< T, UNIQUE_ID, Offset > | A random-access input wrapper for dereferencing array values through texture cache. Uses older Tesla/Fermi-style texture references |
 cub::TransformInputIterator< ValueType, ConversionOp, InputIterator, Offset > | A random-access input wrapper for transforming dereferenced values |
 cub::Uninitialized< T > | A storage-backing wrapper that allows types with non-trivial constructors to be aliased in unions |
 cub::Uninitialized< _TempStorage > | |
  cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockDiscontinuity require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockExchange< T, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockExchange require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockHistogram require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockLoad< InputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_WARP_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockLoad require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockRadixSort< Key, BLOCK_DIM_X, ITEMS_PER_THREAD, Value, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockRadixSort require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockStore< OutputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_WARP_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIterator, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockStore require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH >::TempStorage | The operations exposed by WarpReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH >::TempStorage | The operations exposed by WarpScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
 cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH > | The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp |
 cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH > | The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp |