[CuTeDSL] Atom Shapes: MMA, TMA, and TMEM

March 31, 2026

When you call a partition function in CuTe DSL, the shape of the returned tensor contains an atom shape: a leading mode that reflects the constraints of the underlying hardware instruction. This article walks through three atom types (MMA, TMA, and TMEM) using example shapes.

1. MMA Atom Shape

tCgA = thr_mma.partition_A(gA)
# Shape: ((128, 16), 1, 4, ...)
#        └─mma_atom─┘

2. TMA Atom Shape

tAsA, tAgA = cute.nvgpu.cpasync.tma_partition(
    (((128, 16), 1, 4), 64, 64),  # input
    ...
)
# Input:  (((128, 16), 1, 4), 64, 64)
# Output: ((8192, 1), 64, 64)
#         └─tma_atom─┘
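
The size bookkeeping in the TMA example can be checked with a few lines of plain Python. This is only a sketch on nested tuples, not a call into CuTe DSL: the helper name `size` and the variables `tma_input` and `tma_output` are assumptions made for illustration, and the shapes are copied from the example above.

```python
from math import prod


def size(shape):
    """Total element count of a nested shape (an int or a nested tuple of ints)."""
    if isinstance(shape, int):
        return shape
    return prod(size(mode) for mode in shape)


# Shapes copied from the example above (illustrative tuples, not real tensors).
tma_input = (((128, 16), 1, 4), 64, 64)
tma_output = ((8192, 1), 64, 64)

# Mode 0 of the input collapses into the leading extent of the output atom mode.
atom_elems = size(tma_input[0])
print("elements in input mode 0:", atom_elems)
print("leading extent of tma_atom:", tma_output[0][0])

# The total size is the same before and after the regrouping.
print("total size preserved:", size(tma_input) == size(tma_output))
```

Comparing total sizes rather than the tuples themselves is deliberate: the nesting changes across the partition, so only the element count is expected to match.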

3. TMEM Atom Shape

tTRtC = tmem_thr_copy.partition_S(tEPItAcc)
# Shape: (((64, 32), 1), 1, 1, 1, 4, 1, 1)
#        └─ tmem_atom ─┘  epi_rests, and others (mma_rests, tma_rests, all_rests)

Comparison Table

Aspect           | MMA                      | TMA                   | TMEM
-----------------|--------------------------|-----------------------|----------------------------------------
Atom Shape       | (mma_atom_m, mma_atom_k) | (tma_atom, num_atoms) | ((tmem_atom_n, tmem_atom_m), num_atoms)
Num Atom Repeats | Implicit (1)             | Explicit              | Explicit
Num Atom Tiles   | Outside atom             | Implicit (1)          | Outside atom (like MMA)
Inner Structure  | (M/N, K)                 | (elements, num)       | ((repetition, lanes), num)

Key Observations

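  1. MMA and TMEM: the remaining atom-tile counts (the rest modes) sit outside the atom shape.
  2. TMA: the flattened size is the same before and after tma_partition; in the example above, 128 × 16 × 1 × 4 = 8192 elements end up in the atom mode.
  3. Inner format similarity: TMEM combines MMA's outer behavior (tiles outside the atom) with TMA's inner (atom, num_instrs) structure.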
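
The observations above can be illustrated on the example shapes with plain Python tuples. This is a rough sketch, not CuTe DSL code: `split_atom` and `depth` are hypothetical helpers written for this article, and the MMA shape leaves out the trailing modes that the example elides.

```python
def split_atom(shape):
    """Split a partitioned shape into (atom_mode, rest_modes); mode 0 is the atom."""
    return shape[0], shape[1:]


def depth(shape):
    """Nesting depth of a shape: 0 for an int, 1 for a flat tuple, and so on."""
    if isinstance(shape, int):
        return 0
    return 1 + max((depth(mode) for mode in shape), default=0)


# Example shapes from the sections above, as plain tuples.
mma = ((128, 16), 1, 4)  # trailing modes elided in the article are omitted here
tma = ((8192, 1), 64, 64)
tmem = (((64, 32), 1), 1, 1, 1, 4, 1, 1)

for name, shape in [("MMA", mma), ("TMA", tma), ("TMEM", tmem)]:
    atom, rest = split_atom(shape)
    print(f"{name}: atom={atom} depth={depth(atom)} rest={rest}")
```

In this sketch the MMA and TMA atom modes are flat tuples, while the TMEM atom mode nests one level deeper, which matches the ((repetition, lanes), num) row of the comparison table.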