Pulse · NVIDIA/cutlass

March 7, 2025 – March 14, 2025

30 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[QST] Why tma_load.get_slice(0) here always need 0?
#1929 commented on Mar 8, 2025 • 0 new comments
[QST] Getting a template error trying to use cutlass's depthwise 2D convolution with pytorch
#2077 commented on Mar 8, 2025 • 0 new comments
[QST] in implicit gemm conv, why does not support split-k when group !=1 ?
#2049 commented on Mar 9, 2025 • 0 new comments
[QST] bfloat16 x int8 GEMM
#1936 commented on Mar 11, 2025 • 0 new comments
[BUG] Unaligned access in test/unit/gemm/threadblock/batched_gemv.cu
#2003 commented on Mar 11, 2025 • 0 new comments
[BUG] Tmem tiled copy with non power-of-2 size fails to compile
#2094 commented on Mar 12, 2025 • 0 new comments
[QST] How to apply StreamK to hopper warp specialized GEMM
#2075 commented on Mar 12, 2025 • 0 new comments
[QST] Adding a flag in Tensor Ref Class
#2080 commented on Mar 12, 2025 • 0 new comments
[QST]The Persistent Tile Scheduler in CUTLASS?
#1685 commented on Mar 12, 2025 • 0 new comments
[QST] How to define a new custom kernel
#1930 commented on Mar 12, 2025 • 0 new comments
[QST] Global variable inside conv2d kernel
#1987 commented on Mar 12, 2025 • 0 new comments
[QST] Question about example 69
#2096 commented on Mar 13, 2025 • 0 new comments
[QST] Why does GenerateSM90_TensorOp_16b_WGMMA_alignx_gemm not generate C.element = DataType.void?
#2144 commented on Mar 13, 2025 • 0 new comments
[QST] From index into a coordinate (or coordinate into an index), it has two different implementations, how should one distinguish and understand the scenarios for their use?
#2128 commented on Mar 13, 2025 • 0 new comments
[QST]How Does TMA Work in CUTLASS for Writing from Shared Memory to Global Memory?
#2008 commented on Mar 13, 2025 • 0 new comments
[BUG] wmma should be enabled w/ clang.
#2006 commented on Mar 13, 2025 • 0 new comments
[QST]Behavior of TMA Store and Wait Mechanism in CUTLASS
#2002 commented on Mar 13, 2025 • 0 new comments
[BUG] Functionality TensorOp 80+ s8 * s8 + s32 => {s32, s8} not working
#1981 commented on Mar 13, 2025 • 0 new comments
[QST] Is there a Cutlass GEMM example to read inputs with custom padding?
#1922 commented on Mar 13, 2025 • 0 new comments
[BUG] Logic issue in nondeterministic reduction mode of Stream-K tile scheduler.
#2027 commented on Mar 13, 2025 • 0 new comments
[QST] Are there plans to add specialisations for Sm90?
#1123 commented on Mar 14, 2025 • 0 new comments
[BUG] calling cast_smem_ptr_to_uint(device fn) from make_gmma_desc(host device fn) is not allowed
#1997 commented on Mar 14, 2025 • 0 new comments
[QST]Is the Key Difference Between mbarrier and barrier Their Handling of Producer-Consumer Count?
#1999 commented on Mar 14, 2025 • 0 new comments
[QST]How to Handle Synchronization with Different Thread Counts for Producer and Consumer in CUTLASS?
#1998 commented on Mar 14, 2025 • 0 new comments
[BUG]Is tcgen05.fence supported by Cutlass-3.8.0 ?
#2098 commented on Mar 14, 2025 • 0 new comments
[QST] Permutation layout for contiguous stores
#2127 commented on Mar 14, 2025 • 0 new comments
[QST] Question about UniversalFMA
#2101 commented on Mar 14, 2025 • 0 new comments
[QST] FP8 with row-wise scaling on Ada-Lovelace
#1937 commented on Mar 14, 2025 • 0 new comments
Use Compile-Time Constexpr If
#2035 commented on Mar 9, 2025 • 0 new comments
Allow cluster sizes across m,n,k to be reported in cutlass profiler
#2078 commented on Mar 14, 2025 • 0 new comments

Overview

2 Pull requests merged by 2 people

5 Pull requests opened by 5 people

6 Issues closed by 6 people

8 Issues opened by 7 people

30 Unresolved conversations